Recent Research Progresses on Optimal System Reliability Design
ABSTRACT. Optimal system reliability design is an important research field in reliability engineering. Since the 1950s, extensive studies have been conducted on various aspects of this issue. This field remains highly active today due to the need to develop new generations of complex engineering systems, such as 5G telecom networks and high-performance computing clusters, which are expected to be highly reliable to meet the stringent, dynamic, and often real-time quality demands of system operators and end-users.
Over the past five years, numerous new studies on optimal system reliability design have been published, addressing the theoretical challenges posed by the new engineering systems. This presentation will systematically review these works with a focus on theoretical advancements, including the models and methods for the redundancy allocation problem, redundancy allocation under mixed uncertainty, the joint reliability-redundancy allocation problem, and joint redundancy allocation and maintenance optimization. Through analysis and discussion, we will outline future research directions.
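As a minimal illustration of the basic redundancy allocation problem, the sketch below evaluates a series system of parallel subsystems and searches small allocations by brute force. It assumes independent, identical, active-redundant units within each subsystem and a single cost budget; the function names and the `max_units` cap are hypothetical and are not taken from the reviewed works.

```python
from itertools import product


def system_reliability(unit_reliabilities, counts):
    """Reliability of a series system whose i-th subsystem has counts[i] parallel units."""
    reliability = 1.0
    for r, n in zip(unit_reliabilities, counts):
        # A parallel subsystem fails only if all n independent units fail.
        reliability *= 1.0 - (1.0 - r) ** n
    return reliability


def best_allocation(unit_reliabilities, unit_costs, budget, max_units=4):
    """Exhaustively search allocations with 1..max_units units per subsystem (illustrative only)."""
    best_rel, best_counts = 0.0, None
    for counts in product(range(1, max_units + 1), repeat=len(unit_reliabilities)):
        cost = sum(c * n for c, n in zip(unit_costs, counts))
        if cost <= budget:
            rel = system_reliability(unit_reliabilities, counts)
            if rel > best_rel:
                best_rel, best_counts = rel, counts
    return best_rel, best_counts
```

Exhaustive search is only practical for very small instances, which is one reason the literature relies on heuristic and exact optimization methods.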
Data Fusion and Feature Selection for Multiclass Classification
ABSTRACT. In many application areas, a phenomenon can be reflected through multimodal data. Multimodal data contain different information and can provide a more comprehensive representation or understanding of the phenomenon than unimodal data. However, multimodal classification requires careful handling of data fusion and dimensionality reduction. In order to solve the multiclass classification problem with multimodal data, this study conducts early fusion, which first concatenates features of all modalities and uses extended adaptive lasso (EA LASSO) to select features. Multilayer perceptron (MLP) is then used to build a multiclass classification model. In the demonstration, the proposed fusion structure effectively classified patients into cognitively normal, mild cognitive impairment, or having Alzheimer's disease with an accuracy of 89%.
Reliable comparison and determination of performance of classification models
ABSTRACT. A number of studies have focused on developing artificial intelligence (AI) technologies to help industries to meet the challenges presented during advances toward smartization and digitalization, subsequently producing many types of deep learning (DL) classification models. However, using the single values or means of evaluation metrics to compare the performance of classification models may lead to misjudgment. In view of this and considering the fact that accuracy is the most commonly used metric to evaluate the performance of classification models, we defined a performance gap index (PGI) GAP. Due to the unavoidable amount of uncertainty and randomness in evaluation metric values, we employed inferential statistics to define the triangular-shaped fuzzy number (TFN) of GAP, and further proposed the statistical test model of GAP based on TFN of GAP as well as formulated an even more stringent decision rule with the objective of reliably comparing the performance of the classification model developed in the current study
Deep reinforcement learning based resilient microgrid expansion planning considering power supply flexibility
ABSTRACT. Compared with centralized power systems, microgrids have the advantages of reducing transmission losses and integrating distributed power generation, especially renewable energy. In addition to working in conjunction with the main grids for power generation, microgrids can also provide power supply independently when separated from the main grids. Considering the power supply flexibility of microgrids, a long-term microgrid expansion planning problem regarding the investments in power generation and storage facilities is investigated. In this work, microgrids use renewable energy to supply clean and efficient power. The main grids, as the upstream power systems, can not only make up for the shortage of power supply from the microgrids but also reclaim the extra power from the microgrids. The power supply resilience brought about by the microgrids when the main grid is interrupted due to extreme events is discussed. A new microgrid expansion planning model is developed with the goal of improving the power supply reliability of the microgrid, reducing power supply costs and environmental pollution. In particular, the impacts of extreme weather on the operation of the microgrids and the actual operating characteristics of power generation and storage facilities in the microgrids are taken into account. A deep reinforcement learning algorithm is used to solve this dynamic and complex long-term microgrid expansion planning problem. Finally, a case study is established based on real data and simulation to validate the effectiveness of the proposed model, and the performance of the algorithm in solving large-scale long-term expansion planning problems is also demonstrated.
Optimal Designs for Gamma Accelerated Degradation Tests with Quadratic Stress Relation
ABSTRACT. In reliability engineering, accelerated degradation tests (ADTs) are commonly used to assess the lifetime of high-reliability products. The gamma accelerated degradation model is frequently employed to analyze the monotonic degradation paths of test units in ADTs. Typically, existing literature on optimal design assumes a linear relation between the model's degradation drift rate and stress. However, some fields of chemistry suggest that the reaction rate and stress exhibit super-Arrhenius behavior, indicating a quadratic relation between the reaction rate and stress. Therefore, this study aims to derive approximate optimal designs for gamma accelerated degradation models with this quadratic relation, using three criteria: V-optimality, D-optimality, and A-optimality. Applying the general equivalence theorem, we demonstrate that the optimal number of stress levels is three. Additionally, we determine the optimal stress levels and the corresponding proportions of test units.
Evaluation of software reliability models based on Hawkes process
ABSTRACT. The Hawkes process (HP) is widely used in seismology, finance, criminology, and other fields. This paper applies the HP to software reliability: using 8 sets of software fault detection time data, we carry out a preliminary evaluation of HP-based software reliability models (SRMs) with different excitation functions. The results are compared with those of the well-known non-homogeneous Poisson process (NHPP)-based SRMs and show that the proposed HP-based SRMs achieve a better goodness-of-fit than the NHPP-based SRMs in a fraction of cases.
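For orientation, a Hawkes process can be sketched through its conditional intensity and log-likelihood. The example below assumes an exponential excitation function with hypothetical parameters `mu` (baseline rate), `alpha` (jump size), and `beta` (decay rate); the excitation functions actually evaluated in the paper are not specified here.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t: baseline plus decaying contributions of past events."""
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in event_times if s < t)
    return mu + excitation


def hawkes_log_likelihood(event_times, horizon, mu, alpha, beta):
    """Log-likelihood of sorted fault detection times observed on [0, horizon]."""
    # Sum of log-intensities evaluated just before each event.
    log_sum = sum(
        math.log(hawkes_intensity(t, event_times, mu, alpha, beta)) for t in event_times
    )
    # Compensator: integral of the intensity over [0, horizon].
    compensator = mu * horizon + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (horizon - s))) for s in event_times
    )
    return log_sum - compensator
```

Setting `alpha` to zero reduces the model to a homogeneous Poisson process, which gives a simple sanity check on the likelihood.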
Reliability and failure mechanisms of thermal imaging modules undergone highly accelerated stress tests
ABSTRACT. In this work, the reliability of thermal imaging modules that had undergone highly accelerated stress tests was investigated, and failure mechanisms such as lens degradation, die detachment, loss of vacuum, and electrical failure were observed and extensively analyzed. The module, being a complex assembly, can experience individual or co-occurring failure mechanisms; some of these co-occurrences and their interdependencies are also addressed. In addition, we explain the necessity of an annealing procedure introduced between the application of a saturated highly accelerated stress test to the module and the subsequent health check of the module. Essentially, this annealing procedure helps prevent undesired moisture-related electrical failure of the module and helps differentiate intrinsic degradation of the module from that caused by residual moisture trapped inside the module during the stress test.
Maintenance Policy for Deteriorating k-out-of-n Systems Subject to Random Shocks
ABSTRACT. We investigate a k-out-of-n system in which each unit deteriorates according to a gamma process and is susceptible to various random external shocks. Upon arrival of a shock, the deterioration states of the relevant units experience a jump. There is an interdependence between individual units' deterioration state and the magnitude of the system's jump. At equally spaced time intervals, the deterioration state and jump magnitude of each unit are completely observed. At the beginning of each time interval, the decision-maker determines maintenance actions (either to do nothing or replace) for each unit based on the deterioration state and cumulative jump magnitude caused by random shocks, aiming to minimize the total expected maintenance cost over an infinite horizon. We optimize the condition-based policy using a Markov decision process and show that the total expected discounted cost monotonically increases with both the deterioration state and the jump magnitude.
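The deterioration mechanism described above can be sketched as a simulation. The sketch assumes a stationary gamma process, shocks arriving as a Poisson process, exponentially distributed jump sizes that do not depend on the deterioration state (a simplification of the interdependence in the abstract), and a k-out-of-n system that works while at least k units are below a failure threshold. All parameter names are hypothetical.

```python
import random


def simulate_unit(periods, dt, shape_rate, scale, shock_rate, mean_jump, seed=0):
    """Deterioration level of one unit at each inspection epoch."""
    rng = random.Random(seed)
    level, path = 0.0, [0.0]
    for _ in range(periods):
        # Gamma-process increment over one interval: shape = shape_rate * dt.
        level += rng.gammavariate(shape_rate * dt, scale)
        # Poisson shock arrivals within the interval, each adding a random jump.
        elapsed = rng.expovariate(shock_rate)
        while elapsed < dt:
            level += rng.expovariate(1.0 / mean_jump)
            elapsed += rng.expovariate(shock_rate)
        path.append(level)
    return path


def system_works(levels, threshold, k):
    """Assumed k-out-of-n:G logic: at least k units are below the failure threshold."""
    return sum(level < threshold for level in levels) >= k
```

Both the gamma increments and the jumps are non-negative, so each simulated path is non-decreasing between replacements.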
Joint optimization of maintenance and spare unit management with partial observations
ABSTRACT. This research focuses on a series system consisting of non-identical units. A monitoring device provides partial observations related to system deterioration at equally spaced time periods. At the beginning of each period, the decision-maker determines whether to continue operation or perform maintenance based on deterioration and inventory information. When replacement is determined as the maintenance action, decisions regarding spare unit management—such as unit selection, inventory levels, and ordering timing—are made concurrently. This study proposes an optimal joint policy for maintenance and spare unit management to minimize expected total costs indefinitely. Additionally, the performance of the proposed policy is evaluated by comparing it with several benchmark policies.
On Planning Accelerated Life Tests for Model Selection
ABSTRACT. Accelerated life testing aims to predict the lifetime of items under normal conditions, using an accelerated lifetime model based on chemical theory for the prediction. The application of accelerated life testing to reliability evaluation requires that the model be known. However, in cases such as the development of new materials, it may not be possible to confidently choose a model beforehand. When information on the relationship between stress and the lifetime distribution expressed by the accelerated lifetime model is ambiguous, model selection is necessary. Wakimoto et al. (2024) proposed using EIC as a criterion for selection. Their study focused on examining the performance of model selection without considering experimental costs. Therefore, this study designs experimental plans for model selection to evaluate the identification performance of EIC under more realistic settings. Additionally, it compares the identification performance of EIC under the uniform plan and the experimental plan.
Dynamic Assessment of Mission Reliability for Autonomous Vehicles Considering Mission Criticality
ABSTRACT. With the growing concern for autonomous vehicle (AV) reliability, various statistical metrics have been developed to measure their long-term and average behaviors. However, these metrics overlook the characteristics during the phased-mission operations of AVs and the dynamically changing mission demands. This article introduces a novel method to dynamically assess AV mission reliability, which applies to the complex, time-varying mission demands and considers the differences in mission criticality at different phases. Functional data analysis methods are utilized to quantify the gap between key performance indicators and dynamic mission demands. A case study on the lateral control mission of AVs validates the effectiveness of the proposed methods, where the lane width is the key performance indicator.
Optimal cybersecurity investment and green hydrogen supply chain reliability under supply disruption risk
ABSTRACT. There is growing concern about the impacts of renewable energy (RE) on the reliability of the green hydrogen supply chain (HSC). Although a number of studies have concentrated on RE-related systems, the lack of reliability and economic assessment of RE sources for the HSC under cyber-physical systems hinders the competitiveness of the commercial hydrogen market. To address this obstacle, this study proposes a cyber-physical power system (CPPS) for the HSC that considers the supply disruption risk of RE. By evaluating the role of cybersecurity and renewable-based energy availability, hydrogen producers and renewable electricity suppliers can obtain the maximum profit by determining the optimal investment in the CPPS. The results imply that the higher the producers' investment in cybersecurity for green hydrogen, the lower the price suppliers offer them, because the reliability of information sharing is ensured throughout the CPPS. In particular, a positive exponential relationship describes the impact of the cybersecurity level on the hydrogen producers’ profit. Further analysis of the RE supply disruption risk indicates that increasing production efficiency and capacity planning is a crucial strategy for hydrogen producers to mitigate the influence of the disruption risk on the green HSC.
Advancing Layer of Protection Analysis (LOPA) and Augmented Analytics System to Optimize SIL in Train Automatic Door Systems
ABSTRACT. Determining safety requirements for safety systems is crucial for hazard prevention. The train manufacturing industry in Indonesia adheres to the International Electrotechnical Commission (IEC) standard to assess product safety, using Safety Integrity Level (SIL) certification. Validating product test results through SIL Assessment calculation ensures compliance with safety standards. Accelerating product failure detection is essential for enhancing safety. This study focuses on validating security levels and expediting failure detection within the Door System. The SIL Assessment yields SIL 0, below the company's standard, prompting a comprehensive evaluation of the Safety Instrumented System (SIS) for the Door System. Layer of Protection Analysis (LOPA) suggests enhancing SIL values. SIL 0 indicates adequate risk reduction, No Risk (NR) indicates no potential risk, and SIL 1 suggests increasing SIL for risk reduction. Additionally, a Machine Learning classification algorithm is developed to predict failures swiftly. This integrated approach enhances safety and operational efficiency.
Describing Behavior of Safety-Related Systems with Proof-Testing and Its Applications
ABSTRACT. We describe the behavior of E/E/PE safety-related systems using a continuous-time Markov chain with proof-testing, a maintenance activity aimed especially at undetected dangerous faults. An undetected dangerous failure or fault cannot be detected by the self-diagnostic system generally implemented in E/E/PE safety-related systems. Therefore, a major concern in the operation phase is when to conduct proof-testing to maintain the designed safety level. Analyzing our continuous-time Markov chain yields useful insights into optimal proof-testing intervals and into resolving the discontinuity of IEC 61508-based safety assessment that depends on the operation mode of the E/E/PE safety-related system. We address these points and show numerical examples for them, following definitions in IEC 61508, the international basic standard for the functional safety of E/E/PE safety-related systems.
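To show why the proof-test interval matters, the sketch below computes the average probability of failure on demand (PFDavg) for a deliberately simplified case. It assumes a single channel, a constant dangerous undetected failure rate `lambda_du`, and perfect proof tests that restore the channel to as-good-as-new with negligible test time. It is not the Markov model of the paper, and the function names are hypothetical.

```python
import math


def pfd_avg_exact(lambda_du, test_interval):
    """Time average of 1 - exp(-lambda_du * t) over one proof-test interval."""
    x = lambda_du * test_interval
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return 1.0 - (-math.expm1(-x)) / x


def pfd_avg_approx(lambda_du, test_interval):
    """Common first-order approximation lambda_du * T / 2, valid for small lambda_du * T."""
    return lambda_du * test_interval / 2.0
```

Under these assumptions, lengthening the proof-test interval raises the average unavailability roughly in proportion, which is the trade-off behind choosing an interval.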
Estimation Method Based on SRGM Using Uncertain Fault Information on Open BTS
ABSTRACT. A bug tracking system (BTS) is used during debugging to manage reported faults in software development. BTS data have been used in reliability evaluation because they record detailed information that can be analyzed. They are also used for open source software (OSS) reliability evaluation. However, the BTS of an OSS project rejects more faults than that of a commercial project. Furthermore, it takes a long time for a reported fault to be confirmed. Therefore, the latest software reliability growth model (SRGM) cannot be applied when it is constructed from confirmed faults only. In this paper, we analyze the relationship between the reported faults and the fixed faults using hypertext transfer protocol (HTTP) client OSS fault data. Moreover, we propose a method for evaluating software reliability even in situations where it is unclear whether a reported fault is correct. As a result, we find a trend toward correlation between the confirmed faults and the rejected faults. Moreover, we estimate the reliability growth curves of corrected faults with high accuracy using only reported faults.
Subsampling Approach for Massive Lifetime Data Analysis
ABSTRACT. Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method for big data. While subsampling techniques have been extensively developed to reduce the data volume, there is a notable gap in addressing the unique challenge of handling extensive reliability data, in which a large proportion of the data is commonly censored. In this article, we propose an efficient subsampling method for reliability analysis in the presence of censored data, intended to estimate the parameters of the lifetime distribution.
Moreover, a novel method is proposed for subsampling from severely censored data, i.e., data in which only a tiny proportion of observations is complete. The subsampling-based estimators are given, and their asymptotic properties are derived. The optimal subsampling probabilities are derived through the L-optimality criterion, which minimizes the trace of the product of the asymptotic covariance matrix and a constant matrix. Because the optimal subsampling strategy depends on parameter estimates from the full data, which are unknown, efficient algorithms are proposed to implement the subsampling methods. A real-world hard drive dataset and simulation studies are used to demonstrate the superior performance of the proposed methods.
A Bayesian Robust Regression Method for Corrupted Space Solar Array Data Reconstruction
ABSTRACT. Satellites play a significant role in production, daily life, and military operations. As an important component of a satellite, the Space Solar Array (SSA) powers the entire satellite platform to guarantee normal operation, so the satellite owner or ground station must monitor the power data to ensure the health status of the SSA. However, data transmission between satellites and ground stations often suffers corruption due to bandwidth limitations or environmental interference. In addition, the performance of the devices on a satellite decreases gradually during operation, leading to reduced data quality and causing outliers or even censored data. Consequently, conventional statistical methods are not applicable to the original SSA data, and one must first recover or reconstruct the data to deal with the missing data and outliers. Fortunately, domain knowledge from satellite engineers is often available and can serve as prior information for models. To handle severe data corruption, we propose a data reconstruction algorithm, TRIP, which incorporates prior information into a robust regression method and greatly enhances the algorithm's ability to cope with a large number of outliers and corruptions.
In this work, we analyze the power data of an anonymous SSA recorded for 300 days by a low earth orbit (LEO) satellite. The data have several important characteristics besides periodicity: (i) they exhibit an overall change in trend and several jump points; (ii) the data number 20 million in total, thereby requiring a denoising method with low time complexity; (iii) most problematically, there are many outliers that, unlike random noise, have a certain pattern, which makes it difficult to eliminate them using conventional methods. We apply the TRIP algorithm for data recovery, and the result shows that it recovers the true information perfectly. Based on the recovered data, we select a representative point in each period to extract the overall trend of the SSA power, and this trend clearly demonstrates the degradation of solar cells and device failures. These cleaned data can be used for further degradation analysis and provide a solid foundation for other reliability analyses.
Performability Assessment Measure Based on Deep Learning for Open Source Software
ABSTRACT. We focus on edge computing. The software environment of edge computing includes several open source software (OSS) components. It is therefore important to consider the characteristics of OSS in the context of edge computing. We focus on reliability assessment measures based on deep learning for edge computing. In particular, we propose performability as a reliability assessment measure derived from the proposed deep learning model. This paper also shows several numerical examples of the performability using actual fault big data sets.
Developing the MAIC Model for the Key Component of Spindle Floating Seat Utilizing Process Capability Indices
ABSTRACT. The industrial community frequently utilizes the Six Sigma method and the process capability index as tools. These methods not only enhance process quality but also achieve energy conservation and carbon reduction. However, limited sample sizes are common, attributed to factors such as destructive testing, costly detection methods, and inadequate technological capabilities, all of which compromise the reliability of statistical approaches. Therefore, this paper integrates the process capability index with a fuzzy testing model to develop execution models for the Six Sigma MAIC improvement model. The proposed model not only reduces the likelihood of misjudgment due to sampling errors but also enhances testing accuracy, particularly in cases of small sample sizes.
First, we use the asymmetric tolerance index for the target functional and derive the upper confidence limit of the process capability indices. Based on this, we deduce the minimum value of the index estimator. Next, we construct a radar chart and combine it with cause-and-effect analysis to analyze the parameters that affect the problem. We then use the Taguchi design of experiments to improve the quality levels of individual quality characteristics. Finally, this study demonstrates the applicability of the proposed approach with an illustrative example.
Evaluate the yield rate of multi-process products using the Six Sigma DMAIC method
ABSTRACT. With rising consumer awareness, the emphasis on quality management and risk control has significantly increased. Without an effective quality assurance system, manufacturing costs can rise due to yield issues, and inconsistent quality may lead to market backlash. This study applies the Six Sigma DMAIC methodology to wheelchair accessory parts, focusing on the critical stages of Define, Measure, Analyze, Improve, and Control.
Initially, during the sampling stage, the production process is planned, incorporating inspection points into the key processes using a production flowchart. Failure Modes and Effects Analysis (FMEA) is used to identify potential failures and to assess their severity, frequency, and detectability in order to implement preventive measures. Simultaneously, the control plan outlines quality characteristics, from mechanical equipment and inspection methods to process parameters, evaluation methods, inspection ratios, frequencies, and analysis methods, and proposes improvement recommendations when anomalies are detected.
Finally, the process capability index is used to evaluate the quality stability of the product in continuous production, aiming to control defects within three standard deviations, reduce complaints, transportation, and technical maintenance carbon emissions, promote sustainable development, reduce production costs, and improve customer loyalty.
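As a small worked example of this evaluation step, the sketch below estimates the common indices Cp and Cpk from sample measurements. It assumes a normally distributed characteristic with two-sided specification limits; the abstract does not state which index the study uses, and the function name is hypothetical.

```python
import statistics


def capability_indices(samples, lsl, usl):
    """Estimate Cp and Cpk from measurements and lower/upper specification limits."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation
    cp = (usl - lsl) / (6.0 * sd)
    # Cpk uses the distance from the mean to the nearer specification limit.
    cpk = min(usl - mean, mean - lsl) / (3.0 * sd)
    return cp, cpk
```

A Cpk of at least 1 means the nearer specification limit lies at least three standard deviations from the process mean, which matches the three-standard-deviation target mentioned above.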
Constructing a Variables Two-Plan Sampling System for Validation of Product Reliability
ABSTRACT. Ensuring that products maintain normal functionality during the warranty period is crucial for companies to control warranty costs and sustain brand reputation. The variable acceptance sampling plan based on the lifetime performance index is a practical technique for verifying product lifespan because it provides the necessary failure numbers and batch acceptance criteria required for lifespan testing. In this paper, we propose a two-plan sampling system based on the lifetime performance index that features a flexible sampling mechanism. Compared to existing methods based on the lifetime performance index, the proposed method offers superior cost-effectiveness.
A modified repetitive group sampling plan with a switching rule
ABSTRACT. The conventional repetitive group sampling plan (RGSP) has showcased its cost-efficiency and adaptability in lot decision-making over recent years. However, it struggles to adjust when facing diverse quality submissions. This study introduces a revised approach to RGSP, integrating variables inspection based on unilateral process capability indices. The method features a switchable mechanism for resampling, alternating between reduced and normal inspections. Comparative analysis with the single sampling plan demonstrates that the proposed method demands fewer average samples to meet identical quality criteria. Furthermore, it produces operating characteristic curves with enhanced discriminatory ability across varying quality levels. To facilitate practical implementation, this paper presents a tabulation of solved plan parameters tailored to specific quality conditions, allowing for easy reference.
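For background, the conventional RGSP can be sketched in its attributes form, which is simpler than the variables-based plan with a switching rule proposed here. The sketch assumes a binomial count of nonconforming items in a sample of size `n`: the lot is accepted if the count is at most `c1`, rejected if it exceeds `c2`, and resampled otherwise. The function names are hypothetical.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(D <= c) for D ~ Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(c + 1))


def rgs_performance(n, c1, c2, p):
    """Lot acceptance probability and average sample number of an attributes RGS plan."""
    accept = binom_cdf(c1, n, p)        # accept on a single sample
    reject = 1.0 - binom_cdf(c2, n, p)  # reject on a single sample
    decide = accept + reject            # probability that a sample ends the procedure
    return accept / decide, n / decide
```

When `c1` equals `c2` no resampling occurs and the plan reduces to a single sampling plan with average sample number `n`.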