ESREL2023: EUROPEAN SAFETY AND RELIABILITY CONFERENCE 2023
PROGRAM FOR THURSDAY, SEPTEMBER 7TH

09:00-11:00 Session 20A: S.23: Dependent failure behaviour in risk/reliability modelling, maintenance and Prognostics and Health Management (PHM) II
Location: Room 100/3023
09:00
Optical surface analysis with Support Vector Machines based on two different measurement techniques

ABSTRACT. The optical appearance of high-precision, fine-ground surfaces is an important quality feature of these products. Their manufacturing process is rather complex and depends on a variety of process parameters (e.g. feed rate, cutting speed) that have a direct impact on the surface topography. The lasting quality of a product can therefore be improved by an optimized configuration of the process parameters.

To improve on conventional condition monitoring methods, a new image-processing approach is needed that provides a faster and more cost-efficient analysis of the produced surfaces. For this reason, different optical techniques based on image analysis have been developed over the past years.

In this study, images of fine-ground surfaces were generated under constant boundary conditions in a test rig built up in a laboratory. The gathered image material, combined with classically measured surface topography values, is used as the training data for machine learning analyses.

In real-world applications, data often exhibit imbalanced class distributions, which can result in biased machine learning models. Since producing rejects would be economically prohibitive for the manufacturer of these surfaces, the data used here likewise have an imbalanced distribution. Because the data basis plays an essential role in the training of machine learning models, the practical challenge is often to find measurement methods that are cost-efficient, fast and adaptable to the process while still providing sufficient accuracy. The arithmetic average roughness (Ra), measured with two different methods, was used as the target variable in this study. The measurement methods, tactile and optical confocal measurement, provide different precision and different scatter. This results in two data sets with unequally imbalanced distributions and different statistical variance.

The target values are available both as a class label and as a continuous value, so that both classification and regression analyses with Support Vector Machines (SVMs) can be performed. SVMs are a class of machine learning algorithms that are well suited to analyses based on extracted features. In order to find suitable parameters for the SVMs, a comprehensive parameter study was performed for the different data sets. Finally, the influence of the different measurement methods, given the same input data, is analyzed and discussed in detail.
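
As background to the parameter study mentioned above, the sketch below shows one common way such an SVM study can be set up with scikit-learn for both the classification and the regression target; the feature matrix, target values and parameter grids are placeholders, not the study's actual data or pipeline.

```python
# Sketch only: grid search over SVM hyperparameters for the classification and the
# regression variant of the Ra target. X, y_ra and y_class are placeholders standing
# in for the extracted image features and the measured roughness values.
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                          # placeholder image features
y_ra = np.abs(X[:, 0]) + 0.1 * rng.normal(size=200)     # placeholder continuous Ra values
y_class = (y_ra > np.median(y_ra)).astype(int)          # placeholder Ra class labels

# classification: balanced class weights and a balanced score for the imbalanced data
clf = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced")),
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]},
    cv=5, scoring="balanced_accuracy")
clf.fit(X, y_class)

# regression on the continuous Ra value
reg = GridSearchCV(
    make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    {"svr__C": [1, 10, 100], "svr__epsilon": [0.01, 0.1]},
    cv=5, scoring="neg_mean_absolute_error")
reg.fit(X, y_ra)

print("best SVC parameters:", clf.best_params_)
print("best SVR parameters:", reg.best_params_)
```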

09:15
An Integrated Approach for Failure Diagnosis and Analysis of Industrial Systems Based on Multi-Class Multi-Output Classification: A Complex Hydraulic Application
PRESENTER: Bahman Askari

ABSTRACT. For complex systems, a fault of one or several components does not necessarily lead to a failure of the system, but if the failed components are not immediately replaced, they may drive other components into an idle state. In this work, a data-driven model with a two-step decision approach is proposed to provide a comprehensive analysis of the potential failures and their causes. In the first step, a Multi-Class Multi-Output (MCMO) classification technique is used to diagnose potential failures based on sensor signals, and, in the second step, Failure Analysis (FA) is applied to investigate the root causes of those failures. The proposed approach is applied to a multi-component Hydraulic System (HS) case study, showing its effectiveness in improving system reliability, reducing downtime, and minimizing the impact of failures on system operations. The results show that MCMO classification is a promising approach for multi-component system failure diagnosis that offers several advantages over conventional methods.

09:30
A Reflection on Time and Parameters in Common Mode/Cause Failures
PRESENTER: Pavel Krcal

ABSTRACT. Common cause, or common mode, failures (CCF) are often major contributors in large-scale risk analyses of complex systems, such as nuclear probabilistic risk assessment. In this paper we discuss two challenges and limitations of the most widely used approaches to CCF: the parametric approach and the time-dependent behavior.

A simultaneous failure of several components by a common cause is typically modeled by a single event (a CCF event). Parametric CCF models define failure probabilities of CCF events based on their multiplicity, i.e., how many components fail simultaneously. For example, the Multiple Greek Letters (MGL) or Alpha Factor models define the failure probability of each CCF multiplicity as a fraction of the original event, and these fractions can be calculated from the model parameters. Time-dependent behavior is the ability of the model to consider that the information about the component status may vary in time, thereby affecting the likelihood of a CCF occurring.
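
For background, the Alpha Factor relations in their common textbook (non-staggered testing) form are reproduced below; this is standard reference material and not necessarily the parametrization discussed in the paper.

```latex
% Alpha Factor model, textbook form for non-staggered testing (cf. NUREG/CR-5485):
% Q_t  = total failure probability of one component,
% m    = size of the common cause group,
% Q_k  = probability of a CCF basic event failing exactly k specific components.
\[
  Q_k \;=\; \frac{k}{\binom{m-1}{k-1}} \, \frac{\alpha_k}{\alpha_t} \, Q_t ,
  \qquad
  \alpha_t \;=\; \sum_{j=1}^{m} j\,\alpha_j ,
  \qquad k = 1, \dots, m .
\]
```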

The prevailing approach to CCF quantification uses parametric models. The time-dependent behavior is either not considered or limited to sequential or staggered testing for the Alpha Factor parametric model. This paper explores possibilities to generalize the time-dependent behavior and the benefits this would bring. We also discuss whether using a more general definition of the CCF probability, rather than a parametric model, could bring flexibility and resolve some issues with CCF in current real-life probabilistic safety assessment models, such as the possibility of defining several CCF groups covering the same events.

09:45
Mathematical Formulation of Markov Decision Process to address Maintenance Policy in PV farms
PRESENTER: Luciana Casacio

ABSTRACT. In order to face the challenges incurred by climate change, the industry has been striving to improve the overall performance of PV systems. Unsolved challenges remain concerning reliability, numerous unforeseen outages, and high operation and maintenance (O&M) costs. In this context, this work increases the operational performance of PV plants by improving current methodologies for O&M in PV systems. We develop a maintenance approach based on a Markov decision process model to analyse the data from PV power plants, prioritise actions, advise asset replacement, and schedule preventive maintenance tasks based on past experience and the PV system condition. The results allow economic improvement through downtime reduction and early detection of system underperformance.
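
As an illustration of the kind of model referred to above, the sketch below solves a toy maintenance Markov decision process by value iteration; the states, transition probabilities and costs are assumptions for illustration, not the model developed in the paper.

```python
# Toy maintenance MDP solved by value iteration (illustrative assumptions only).
import numpy as np

S = 4                      # degradation states: 0 = as new ... 3 = failed
actions = ["do_nothing", "repair", "replace"]
p_degrade = 0.3            # assumed per-period degradation probability

def transition(s, a):
    """Probability vector over next states for state s under action a."""
    p = np.zeros(S)
    if a == "replace":
        p[0] = 1.0
    elif a == "repair":
        p[max(s - 1, 0)] = 1.0          # repair improves the state by one level
    else:                               # do nothing: may degrade by one level
        if s == S - 1:
            p[s] = 1.0
        else:
            p[s], p[s + 1] = 1 - p_degrade, p_degrade
    return p

def cost(s, a):
    downtime = 50.0 if s == S - 1 else 0.0          # assumed cost of a failed unit
    return downtime + {"do_nothing": 0.0, "repair": 5.0, "replace": 20.0}[a]

gamma, V = 0.95, np.zeros(S)
for _ in range(500):                                 # value iteration
    V = np.array([min(cost(s, a) + gamma * transition(s, a) @ V for a in actions)
                  for s in range(S)])
policy = [min(actions, key=lambda a: cost(s, a) + gamma * transition(s, a) @ V)
          for s in range(S)]
print(policy)   # typically a threshold-type policy over the degradation states
```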

10:00
A multivariate Poisson model with flexible dependence structure

ABSTRACT. Multivariate distributions are indispensable tools for modeling complex data structures with multiple dependent variables. Despite extensive research on discrete multivariate distributions, the multivariate Poisson distribution remains inadequately defined. However, multivariate Poisson counts are not rare and have gained considerable attention in scientific fields such as reliability engineering. Accurately specifying the dependence structure presents a significant challenge in analyzing such data. Although several methods have been proposed in the literature to address this issue, they have limitations in covering all feasible correlations. Currently, there is an outstanding question regarding the development of a multivariate Poisson model that is easily interpretable and effectively handles dependent Poisson counts. In this study, we present a novel multivariate Poisson model that leverages multivariate reduction techniques (MRT) to enable greater flexibility in the dependence structure, particularly for negative correlations, than classical constructions. Our proposed model generalizes existing MRT-based methods, coinciding with them when some of its parameters are preset. We demonstrate the feasible regions of correlations and show that our model overcomes the limitations of previous methods, making it well suited for analyzing multivariate Poisson counts. Furthermore, we establish several probabilistic properties, including the probability mass function, the probability-generating function, and the Pearson correlation coefficient. We also provide a detailed discussion of maximum likelihood estimation and an algorithm for generating multivariate Poisson random variables. Our model's superiority is demonstrated through simulations and a real-world example.
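
For context, the classical common-shock (multivariate reduction) construction of a bivariate Poisson vector is recalled below; it illustrates the non-negative-correlation limitation that the proposed model is designed to overcome, and is not the paper's new construction.

```latex
% Classical common-shock (multivariate reduction) construction of a bivariate Poisson
% vector: with independent Y_0 ~ Poisson(lambda_0), Y_1 ~ Poisson(lambda_1),
% Y_2 ~ Poisson(lambda_2),
\[
  X_1 = Y_1 + Y_0, \qquad X_2 = Y_2 + Y_0
  \quad\Longrightarrow\quad
  \operatorname{Cov}(X_1, X_2) = \lambda_0 \;\ge\; 0 .
\]
% The covariance can never be negative, which is the restriction on the feasible
% correlation region that the proposed generalization addresses.
```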

10:15
A new aggregation methodology for the HPP Health Index
PRESENTER: Luis Guimarães

ABSTRACT. Hydroelectric Power Plants (HPPs) support the flexibility of the European Union Electric Power System with various services (regulation capability, fast frequency control, fast start/stop, fast generating-to-pumping mode transition, high ramping rate, inertia emulation, and fault ride-through capacity, among others). New technology solutions, such as variable speed, are being studied to provide further flexibility in the framework of the XFLEX project. However, these additional capabilities impose new challenges on HPP Operations and Maintenance (O&M). This work aims to increase HPP availability under this new paradigm. Proper health indexes (HI) should reflect machine degradation, which is a critical element in health monitoring, fault diagnosis, and remaining useful life prediction. We explore indicators related specifically to hydraulic machines, namely the mechanical efficiency and the discharge of run-of-river Kaplan turbines, to develop a model that calculates the HPP global health index as a measure of the impact of each selected operating point for energy production on the HPP condition. This work proposes using Data Envelopment Analysis (DEA) to derive the weights when aggregating the indicators into a global health indicator. Analyzing the global HI improves the equipment's performance, leading to operating points with better power production.
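
As a rough illustration of how DEA can produce such weights, the sketch below solves the CCR multiplier-form linear program with SciPy for a set of hypothetical operating points; the input/output indicators and their values are invented placeholders, not the XFLEX data or the authors' formulation.

```python
# CCR multiplier-form DEA used to derive aggregation weights (assumed setup, not the
# paper's model). Inputs x could be, e.g., discharge; outputs y could be efficiency-
# related health indicators of one operating point ("DMU").
import numpy as np
from scipy.optimize import linprog

# placeholder data: 5 operating points (DMUs), 1 input, 2 output indicators
X = np.array([[1.0], [1.2], [0.9], [1.1], [1.3]])           # inputs  (I = 1)
Y = np.array([[0.92, 0.80], [0.95, 0.78], [0.88, 0.85],
              [0.93, 0.82], [0.97, 0.75]])                   # outputs (R = 2)
n, I = X.shape
R = Y.shape[1]

def dea_weights(o):
    """Solve the CCR multiplier LP for DMU o; returns (efficiency, u, v)."""
    c = np.concatenate([-Y[o], np.zeros(I)])                 # maximize u . y_o
    A_ub = np.hstack([Y, -X])                                # u . y_j - v . x_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(R), X[o]]).reshape(1, -1)  # v . x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(1e-6, None)] * (R + I), method="highs")
    return -res.fun, res.x[:R], res.x[R:]

for o in range(n):
    eff, u, v = dea_weights(o)
    print(f"DMU {o}: efficiency={eff:.3f}, output weights u={np.round(u, 3)}")
```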

09:00-11:00 Session 20B: Organizational Factors and Safety Culture
09:00
Enforcement of the safety culture in the LPG and LNG sector: findings of Seveso controlling activities

ABSTRACT. This paper concerns the results that emerged from the analysis of the control activities carried out by the National Institute for Environmental Protection and Research (ISPRA) in the context of the Italian implementation of Directive 2012/18/EU on the control of major accidents (the so-called "Seveso III" Directive), on industrial establishments in the LPG (Liquefied Petroleum Gas) and LNG (Liquefied Natural Gas) sector. Starting from a presentation of the European and Italian technical regulation, a sample of the following types of establishments is analysed in terms of technical and organizational aspects: urban storage and distribution of LPG; underground storage of LPG; multi-site LPG storage companies; and LNG storage and regasification terminals. The main conclusions of the assessments of Safety Reports and of the inspections of Safety Management Systems conducted on some national case studies are explained, with examples of accidents that occurred at industrial sites, based on the operational experience gathered from the site managers during the control activities, with the aim of emphasizing the return of experience and the lessons learned. Finally, some types of non-compliance found during the control activities in the LPG and LNG sector are presented, in terms of organizational and management aspects aimed at the enforcement of the safety culture of the industrial establishments. They regard the evaluation of: Safety Reports, from a structural, organizational, and documentary completeness point of view; and Safety Management System inspections, in terms of training, operational control and emergency management activities.

09:15
The role of mentors in crisis preparedness and safety management training

ABSTRACT. To sustain a high level of safety and crisis preparedness in high-risk contexts, simulation and training are imperative. Preparing to handle what may happen must take place in a learning setting that is properly planned and executed, in order to achieve an appropriate response when accidents occur. Based on a four-year research project, Sætren and Stenhammer (2022) have developed a model for how to carry out simulations and training to ensure a high level of safety and crisis preparedness when accidents occur.

The learning wheel consists of several components that must be integrated in simulation and training for quality learning to occur (see Figure 1). The model includes the participation of mentors who have extensive experience with preparedness in high-risk contexts. Mentors have been found to be beneficial during training for building high-reliability organizations that can handle a crisis when it arrives (Lekka and Sugden 2011). However, there is little empirical research on the mentor's role during the planning and execution of training. We therefore seek to build novel insight into the mentor's role during training in order to optimize the learning process.

The research question is: How can the role of mentors be optimized for learning processes in safety and crisis preparedness training?

Method: We have conducted interviews with two experienced mentors, one from a university strategic crisis preparedness team and one from a police department. They have participated as mentors in several simulations at Nordlab, a simulation lab at Nord University, in addition to having extensive experience from other exercises and real-life occurrences. We plan to conduct an additional two to four interviews in January and February 2023. The planned method of analysis is reflexive thematic analysis.

Results: Preliminary findings indicate, firstly, that mentors have an important role prior to the simulation, to ensure context realism in the preparation of the simulation case. Second, that mentors are important during the simulation, to build further context realism by explaining how the simulation might unfold in real life, and to build psychological safety by being available to the students for reflection and exploration of possible actions to take. Third, that mentors participate in the debrief after the simulation, to contribute to the evaluation and to steer the dialogue in the debrief towards the desired learning objectives. These findings show that, to optimize the role of mentors in simulation and training on safety and crisis preparedness in high-risk contexts, the mentor ought to be woven into all components of the learning wheel to develop high-reliability organizations.

Figure 1 Nordlab’s pedagogical wheel: factors promoting an optimal simulated learning process in emergency exercises in high-risk and high-sensitivity environments (Sætren et al., 2022)

Sætren, G.B., Stenhammer, H.C., et al. (2022). Computer-assisted management training for emergency response professionals in challenging environments. Safety in Extreme Environments, 4, 277–290. https://doi.org/10.1007/s42797-022-00066-0
Lekka, C. and Sugden, C. (2011). The successes and challenges of implementing high reliability principles: A case study of a UK oil refinery. Process Safety and Environmental Protection, 89(6), 443–451. https://doi.org/10.1016/j.psep.2011.07.003

09:30
Case managers’ Assessment of the Value of Managing Reports in a Deviation Management System - an Exploratory Study

ABSTRACT. Incident reporting systems are generally applied to support organizations in learning from incidents and accidents to prevent unwanted events from recurring [1], [2]. However, some organizations are struggling in their work to reduce their number of incidents [3]. This has been suggested to be partly attributed to shortcomings related to learning from the events [4]. Studies have revealed a range of barriers to learning from incidents [5], including under-reporting, poor quality of reports and ineffective follow-up on recommendations in the reports.

Case managers play a key role in learning from an incident reporting system: unless cases are managed, no lessons will be learned from the reports. Within the constraints of their organizations, case managers govern the timeline and quality of the case management process. The paper explores which case characteristics impact case managers' assessment of the effect/effectiveness of case management. A high effect/effectiveness score is assumed to motivate engagement in case management.

The paper is based on data from an incident reporting system in a research organisation. Incidents, near misses, improvement suggestions and positive feedback are reported in the system. At the time of the study, the reporting system had been running for a year and consisted of 396 cases. Of these, 64 had not been opened, 79 were in the process of being managed, and 253 were closed. Of the closed cases, 141 had been evaluated for their effect/effectiveness by the case manager. We hypothesise that cases with safety and security incidents and near misses will be associated with high effect/efficiency scores, due to the learning outcome, and that cases concerning more common challenges will be associated with lower effect/efficiency scores, because they are handled routinely. In general, we hypothesise that cases with titles clearly capturing their content will be opened faster than cases with more diffuse titles.

The study is currently ongoing and uses mixed methods: (1) Statistics are calculated based on the quantitative data contained in the reporting system. (2) Cause analysis quality and completeness are characterised using a simple rating scale. (3) Thematic analysis is performed based on the level of completeness and clarity of the case titles and content descriptions.

The results may be used to support reviews of case management effectiveness in an organization, and thus to promote organizational learning.

References:
[1] Johnson, C. (2003). Failure in Safety-Critical Systems: A Handbook of Incident and Accident Reporting. Glasgow University Press, Glasgow, Scotland.
[2] Margaryan, A., Littlejohn, A., Stanton, N. (2017). Research and development agenda for Learning from Incidents. Safety Science, 99(A), pp. 5–13.
[3] Drupsteen, L., Steijger, D.M.J., Groeneweg, J., Zwetsloot, G.I.J.M. (2011). What are the bottlenecks in the learning from incidents process? IChemE Hazards XXII conference, Liverpool, United Kingdom.
[4] Kjellen, U. (2000). Prevention of accidents through experience feedback. Taylor & Francis, London and New York.
[5] ESReDA guidelines (2015). Barriers to learning from incidents and accidents. ESReDA Project Group Dynamic Learning as the Follow-up from Accident Investigations: http://www.esreda.org/

09:45
Three Iterations for Safety Analyses

ABSTRACT. A major contribution to a safety case comes from safety analyses such as FTA, FMEA and ETA. While different safety standards and guidelines often just require a certain analysis for the safety case, others define how an analysis shall be done. Document-oriented standards explain for what part of the safety case the results of an analysis are foreseen, whereas process-oriented standards focus more on the phase for which an analysis shall be produced. However, no standard describes when it is reasonable to start a certain safety analysis and to what extent it shall be done in order to optimize the positive impact of the analysis on the architecture and design of a product. In addition, such a recommendation for a starting point could help project managers to better plan their projects in terms of "cannot start earlier than". From our experience, we see three different iterations for a safety analysis: initial, preliminary and final. Each of these iterations has a different purpose. The purpose of an initial analysis is to give a first estimation of a safety concept. It typically reveals major problems or shows the feasibility of a concept. The preliminary iteration gives feedback about a more concrete design proposal. This is typically based on a sample (e.g. B-sample). This iteration highlights points that need to be improved, e.g. by enhancing safety measures. The purpose of the final analysis is to provide evidence on the final design for the safety demonstration and thus for the safety case and the assessment.

10:00
Approaches to residential fire safety – a systematic literature review
PRESENTER: Razieh Amiri

ABSTRACT. Home fires can still cause fatalities, severe injuries, and damage to the household's assets [1-3]. Statistics indicate that current fire safety measures have not been equally effective for all types of residents; in many countries, the majority of fire fatalities concern vulnerable groups [4-5]. The main objective of this paper is to conduct a systematic literature review in the field of residential fire safety, to evaluate the studied fire safety measures from two perspectives. First, it is investigated which aspects of fire safety are addressed, including individual needs, the technical and physical environment, as well as the social and organizational environment. Second, it is investigated whether the studies focus more on preventing the causes and/or the consequences. Out of the initial 1303 studies from three search engines, 438 studies were selected for further assessment. The results show that the amount of research in residential fire safety has generally been increasing over the last decades, and only in recent years have studies addressed all three aspects: individual needs, the technical and physical environment, as well as the social and organizational environment. Also, comparing the total number of articles focusing on preventing the causes and/or consequences, the existing literature seems to lean toward measures reducing the consequences of fire. The studies identified as addressing both causes and consequences mostly address more than one of the three aspects. Research that focuses on the interplay of individual needs, the technical and physical environment, and the social and organizational environment in residential fire safety can lead to new insights and better prioritization of measures. This can also be useful in terms of prioritizing and aiding the implementation of measures, thus providing policymakers, governments, and authorities with advice that can contribute to achieving higher fire safety levels in homes. This understanding can be crucial for developing new solutions and measures and for identifying and targeting the most important ones connected to fire causes and hazardous events, and it can only be achieved through multidisciplinary cooperation with different groups of expertise including health care, social science and humanities, technical fire safety, relevant authorities, and stakeholders.

[1] C. Subramaniam, "Human factors influencing fire safety measures," Disaster Prev. Manag., vol. 13, no. 2, pp. 110–116, 2004, doi:10.1108/09653560410534243.
[2] A. Steen-Hansen, K. Storesund, and C. Sesseng, "Learning from fire investigations and research – A Norwegian perspective on moving from a reactive to a proactive fire safety management," Fire Saf. J., vol. 120, p. 103047, Mar. 2021, doi:10.1016/J.FIRESAF.2020.103047.
[3] USFA, "Topical Fire Report Series – Fire Death Rate Trends: An International Perspective," vol. 12, no. 8, Jul. 2011, Accessed: Mar. 2022.
[4] P. Cassidy, N. McConnell, and K. Boyce, "The older adult: Associated fire risks and current challenges for the development of future fire safety intervention strategies," Fire Mater., vol. 45, no. 4, pp. 553–563, Jun. 2021, doi:10.1002/FAM.2823.
[5] A. Jonsson, M. Runefors, S. Särdqvist, and F. Nilson, "Fire-Related Mortality in Sweden: Temporal Trends 1952 to 2013," Fire Technol., vol. 52, no. 6, pp. 1697–1707, Nov. 2016, doi:10.1007/S10694-015-0551-5.

10:15
Methods for integrating anticipatory and adaptive approaches in proactive crisis management – a literature review

ABSTRACT. Organizations involved in providing essential services to society often perform a range of activities to prevent and prepare for potential future crises. Since organizations may be confronted both with events that can be foreseen beforehand and with events or conditions that cannot be foreseen, preventive and preparatory activities should be designed with both possibilities in mind. The main reason for this is that the character of an event can affect the effectiveness of preventive and preparatory measures. In the case of foreseeable events, anticipatory strategies are often useful for developing plans and procedures and for deciding what resources should be employed if a crisis occurs. In the case of unforeseeable events or conditions, on the other hand, the organization needs to use adaptive strategies aimed at developing the ability to solve new tasks, re-organize, provide services in new ways, or identify and use resources that were not accessible to the organization prior to the event. An essential part of proactive crisis management is to conduct assessments aimed at creating an understanding of an organization's current risks, vulnerabilities and capabilities. Such an understanding is essential to inform the design and selection of preventive and preparatory measures. These assessments must provide an understanding of the organization's ability to deal with both foreseen and unforeseen events. In most cases, however, existing assessment methods and practices are either anticipatory (focusing on the foreseen) or adaptive (focusing on the unforeseen), and the two perspectives are rarely integrated in the same method or assessment. Here it is argued that there may be great value in performing assessments where the two perspectives are integrated, aiming to provide a holistic understanding of an organization's ability to deal with both the foreseen and the unforeseen. In this paper, the results of a systematic literature review are presented. In the review, assessment methods suggested in the research literature whose explicit aim is to assess an organization from both an anticipatory and an adaptive perspective are identified and described. In addition, the usefulness of these methods for public sector organizations providing essential services to society is reflected on, and needs for further research are outlined.

09:00-11:00 Session 20C: S.10: Advances in Reliability Engineering and Risk Management in Oil and Gas Industries II
09:00
On the Relevancy of Systems Thinking in Permanent Plug and Abandonment Regulations in Norway
PRESENTER: Rune Vikane

ABSTRACT. The Norwegian Continental Shelf (NCS) contains more than 2500 production wells, all requiring permanent Plug and Abandonment (P&A) at the end of their lifetime. Norwegian petroleum regulations for P&A have until recently had a vision of zero well leakage in perpetuity, as communicated through the standard NORSOK D-010. An implication of this vision, since any well leakage is unacceptable, is a perspective that makes it acceptable to consider each well in isolation and to ignore systemic effects. The 2021 revision of the NORSOK D-010 standard therefore represents a paradigm shift in P&A regulation by replacing the vision with a non-zero quantitative risk acceptance criterion, now allowing for low-rate well leakage. This change influences the perspective and challenges the principle of ignoring systemic effects. The objective of this article is to consider the relevancy and potential role of systems thinking in Norwegian P&A regulations. Leveson's STAMP framework is used as the basis for this analysis. The system-wide effects investigated include the aggregated well leakage rates in different area perspectives, and whether feedback loops concerning actual leakage are adequate for proper management of the well leakage risk. The analysis indicates a benefit of systems thinking for P&A activities. The well density is very high in certain areas of the NCS, and well characteristics associated with an elevated well leakage risk may be common to several wells in an area. We argue that a systems approach to P&A is required for proper management of the well leakage risk in fields where the well characteristics indicate an elevated well leakage risk. Leveson's STAMP theory postulates that feedback loops are essential to ensure that a system operates according to its design. The analysis indicates that the detection and monitoring of well leakage fail to provide the feedback needed to verify that the system is operating properly. It may also be difficult to establish best technologies and best practices in P&A without a well-functioning feedback system. As such, a systems-theoretical approach represents a way to improve current regulations.

09:15
A Methodology Proposal for Estimating the Costs Associated with Failure Effects of Subsea Equipment

ABSTRACT. A methodology is proposed for estimating the costs associated with failure effects in subsea equipment. The methodology takes into account the cost of operating losses when a failure mode inhibits the main function of the system and prevents the creation of value, as well as the costs of indirect losses: environmental costs caused by the degradation of the environment due to the emission of pollutants, human costs caused by human losses (injury, illness or death), and financial costs caused by the reduction of customer orders depending on the type of failure mode. The concept of the Value of Statistical Life (VSL) is used to account for life losses. The development of this methodology is based on a systematic review of the literature using the Scopus, Web of Science, Science Direct and Google Scholar databases. The review found that most of the literature focuses on modeling the different types of costs involved in the equipment life cycle, rather than on the costs directly related to failure effects.
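
A generic additive decomposition consistent with the cost categories listed above could take the following form; this is purely illustrative and not the authors' exact formulation.

```latex
% Illustrative additive decomposition for the expected cost of a failure mode f
% (not the authors' formulation):
\[
  C(f) \;=\;
  \underbrace{c_{\mathrm{prod}}\, T_{\mathrm{down}}(f)}_{\text{operating losses}}
  \;+\; C_{\mathrm{env}}(f)
  \;+\; \underbrace{\mathrm{VSL}\cdot \mathbb{E}\!\left[N_{\mathrm{fat}}(f)\right]}_{\text{human losses}}
  \;+\; C_{\mathrm{fin}}(f),
\]
% where T_down(f) is the expected downtime, c_prod the value of lost production per unit
% time, C_env and C_fin the environmental and financial losses, and E[N_fat(f)] the
% expected number of fatalities monetized through the Value of Statistical Life.
```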

09:30
Human Factors and safety in automated and remote operations in oil and gas: A review
PRESENTER: Stig Ole Johnsen

ABSTRACT. This paper explores the Human Factors of automation and remote operations through a review of the safety literature. The literature was selected through keyword search and snowballing. We have prioritized empirical papers and explored safety issues from a systemic perspective. Automation is designed to assist operators in high- and low-workload situations. When unexpected events occur and automation fails, it can lead to loss of situational awareness (SA) and reduce system safety. The motivation for remote operations has been to reduce costs and remove operators from hazards. We have not found any systematic literature reviews of safety related to automation or remote operations. Findings indicate that poor design is a root cause in about 50% of the cases. A challenge found in accident investigations is that too many causal factors are categorized as human error. Suggested good practices for user-centric design in control facilities are ecological interface design, eye tracking, and the design of appropriate alarms. There is a lack of communication between system developers and end-users. There is still the challenge of vigilance when monitoring highly automated systems. Automation seems to support safety when it is based on careful design. We see the need for further exploration of remote operations and automation in safety-critical operations and suggest selecting specific cases together with the industry to document experiences and safety challenges.

09:45
Safety Artifacts in Oil and Gas Industry: An Analysis of Permit to Work Process
PRESENTER: Francisco Silva

ABSTRACT. Safety-critical activities performed in the oil and gas industry need to be constantly assessed through the Permit-To-Work (PTW) process. The PTW is a formal process to communicate safety-critical tasks and control certain types of work identified as potentially hazardous. Despite its relevance beyond risk analysis, the imagined purpose of this safety artifact sometimes differs from its function in practice, as it may be treated as a mere enabling device, detached from its real purpose. The objective of this study is to analyse the PTW process in the oil and gas industry. The context was a PTW authorized for cargo handling between an oil rig and a supply vessel, and data were collected through observations, interviews, and document analysis. The Functional Resonance Analysis Method (FRAM) was adopted to model the Work-As-Done (WAD) and the Work-As-Imagined (WAI). The analysis allowed the identification of four factors that could be linked to the differences between the artifact as imagined and in practice: lack of system integration on the rig; information centralized on the shift leader; compliance with the task registration; and lack of feedback concerning the operation. This study illustrates how this systematic approach helps to understand daily safety-critical operations and to improve solutions for coping with daily variability, in contrast to the linear approaches commonly adopted in the industry, which focus on eliminating it.

10:00
Development of a software tool to implement reliability assessment of developing technologies
PRESENTER: Caio Souto Maior

ABSTRACT. The reliability assessment of O&G equipment under development is a critical matter for the industry and has been increasingly pursued by stakeholders. In this context, it is essential to aggregate information from different taxonomic levels and from different development phases, which poses a challenge to traditional reliability tools. The use of a multilevel reliability model, in conjunction with the Bayesian approach, is one possible pathway to estimate the reliability of a developing technology. However, given the large amount of data and the specialist knowledge needed to apply Bayesian methods, a computational tool is fundamental to implement the reliability estimation. In this work, a reliability software tool developed to handle multilevel Bayesian reliability models specifically designed for the O&G industry is presented. The tool allows the user to input equipment information and characteristics, and provides reliability estimates throughout the development process, giving decision-makers the means to use reliability as a key decision variable along the technology development pathway.
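
As a minimal illustration of Bayesian aggregation of test evidence (a single-level conjugate update, not the multilevel model described above), consider the following sketch; the prior parameters and test data are assumptions.

```python
# Single-level Gamma-Poisson conjugate update of a failure rate (illustrative only).
from scipy import stats

# assumed prior on the failure rate lambda [failures/hour], e.g. from expert judgment
# or from an earlier development phase: Gamma(shape=a0, rate=b0)
a0, b0 = 2.0, 2.0e5

# assumed new evidence from qualification tests: n failures over T accumulated hours
n_failures, T_hours = 1, 1.5e5

# conjugate update: posterior is Gamma(a0 + n, b0 + T)
a1, b1 = a0 + n_failures, b0 + T_hours
post = stats.gamma(a=a1, scale=1.0 / b1)

print(f"posterior mean failure rate: {post.mean():.2e} /h")
print(f"90% credible interval: {post.ppf(0.05):.2e} .. {post.ppf(0.95):.2e} /h")
print(f"reliability over an 8760 h mission (point estimate): "
      f"{float(stats.expon(scale=1 / post.mean()).sf(8760)):.3f}")
```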

10:15
Software reliability analysis in the O&G industry: a review with applications
PRESENTER: Eduardo Menezes

ABSTRACT. With the increasing digitalization of the O&G industry, the presence of software applications has been growing consistently. These applications are frequently related to critical functions, and their failure can lead to undesirable consequences. In this context, it is fundamental to develop reliability analysis for software in the O&G industry. Software reliability presents special characteristics when compared to other O&G equipment, including the process of reliability growth at each software release and the use of dedicated reliability models. This work reviews the concepts and methods of software reliability and applies them to a completion interface software responsible for managing the communication between subsea and topside equipment. The test planning for software reliability evaluation is also developed and explained, given the particular requirements of the considered application.
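
As one example of the dedicated reliability-growth models referred to above, the sketch below fits the classical Goel-Okumoto NHPP model to cumulative failure counts; the test data are invented placeholders and the model choice is illustrative, not necessarily the one used in the paper.

```python
# Fitting the Goel-Okumoto NHPP software reliability growth model
# m(t) = a * (1 - exp(-b t)) to cumulative failure counts (illustrative data).
import numpy as np
from scipy.optimize import curve_fit

t = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)     # test time [h], placeholder
cum_failures = np.array([4, 7, 9, 11, 12, 13, 13, 14], dtype=float)

def goel_okumoto(t, a, b):
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(goel_okumoto, t, cum_failures, p0=[20.0, 0.05])
print(f"expected total faults a = {a_hat:.1f}, detection rate b = {b_hat:.3f} /h")

# failure intensity and expected residual faults at the end of testing
t_end = t[-1]
intensity = a_hat * b_hat * np.exp(-b_hat * t_end)
print(f"current failure intensity: {intensity:.3f} failures/h")
print(f"expected residual faults: {a_hat - goel_okumoto(t_end, a_hat, b_hat):.1f}")
```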

10:30
Design and validation of a digital twin for fault detection at O&G facility
PRESENTER: Eduardo Menezes

ABSTRACT. Fault detection has been used by oil and gas companies for some time and can be found in most areas. One of the relevant components is the centrifugal pump, which is responsible for transporting a fluid from one reservoir to another. This equipment is critical to O&G refineries, and problems in its operation can lead to production losses. Given its difficult and costly maintenance, it is fundamental to have means for the diagnosis and prognosis of this component. Digital twins (DTs) have been increasingly researched for emulating physical assets in a computational environment, in order to understand their behavior under several operating conditions and to carry out failure analyses. The present work develops a DT for a hydraulic centrifugal pump system, considering simulations of normal and anomalous operation and comparisons with open-source databases. The DT includes modeling of hydraulic and thermal exchange phenomena, as well as the different control sequences applied to the pumping system.
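
A minimal sketch of how a physics-based pump model can serve as a digital-twin surrogate for residual-based fault detection is given below; the head-flow curve, parameter values and degradation effect are assumptions, not the authors' model.

```python
# Residual-based anomaly check against a simple healthy pump curve (illustrative only).
import numpy as np

H0, k = 80.0, 2.0e3         # assumed shut-off head [m] and head-curve coefficient [m/(m3/s)^2]

def expected_head(q_m3s, wear=0.0):
    """Head predicted by the (healthy or degraded) pump curve H = (1 - wear)*H0 - k*Q^2."""
    return (1.0 - wear) * H0 - k * q_m3s**2

# synthetic "measurements" from a pump with 8% impeller wear plus sensor noise
rng = np.random.default_rng(0)
q = np.linspace(0.05, 0.15, 50)
h_meas = expected_head(q, wear=0.08) + rng.normal(0.0, 0.5, size=q.size)

# residual between the healthy digital-twin prediction and the measurements
residual = expected_head(q) - h_meas
if residual.mean() > 3.0 * residual.std(ddof=1) / np.sqrt(q.size):
    print(f"anomaly flagged: mean head deficit {residual.mean():.1f} m vs. healthy model")
else:
    print("operation consistent with the healthy pump curve")
```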

09:00-11:00 Session 20D: S.11: Dynamic risk assessment and emergency management for complex human-machine systems I

Many modern engineering systems have features such as non-constant failure and repair rates, dependencies between component or sub-system failures, and complex maintenance strategies. Commonly used methodologies such as fault tree/event tree analysis do not adequately analyse such systems. This session will explore novel developments in methods used for predicting the failure probability or failure frequency of systems performing normal or phased missions.

Location: Room 2A/2065
09:00
A weighted fuzzy CREAM for assessing human error probability in risky experiment operation
PRESENTER: Qinhao Zhang

ABSTRACT. Laboratory safety accidents have recently been recognized as a new topic for safety research, and existing research shows that human factors are a leading cause of laboratory accidents. However, at present, research specifically addressing Human Error Probability (HEP) quantification for risky experiments in laboratories is limited. Therefore, it is necessary to apply Human Reliability Analysis (HRA) techniques to quantify the HEP during risky experiments in the laboratory. Accordingly, this study proposes a weighted fuzzy Cognitive Reliability and Error Analysis Method (CREAM) to deal with HEP quantification for laboratory safety.

The proposed method comprehensively considers each CPC's weight, each CPC's fuzzy membership degree, and the weight of each activated fuzzy if-then rule to determine the final membership degree of each control mode. Afterwards, the desired HEP value can be estimated by centre-of-area defuzzification.
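
A minimal sketch of the final aggregation and defuzzification step is shown below; the membership degrees are invented placeholders and the control-mode probability intervals follow the commonly cited CREAM ranges only approximately.

```python
# Sketch of the final aggregation / defuzzification step only. The control-mode
# membership degrees are invented placeholders; the log10(HEP) intervals approximate
# the commonly cited CREAM ranges for the four COCOM control modes.
import numpy as np

membership = {"strategic": 0.10, "tactical": 0.55,
              "opportunistic": 0.30, "scrambled": 0.05}

hep_log10 = {"strategic": (-5.3, -2.0), "tactical": (-3.0, -1.0),
             "opportunistic": (-2.0, -0.3), "scrambled": (-1.0, 0.0)}

# build the aggregated fuzzy output on the log10(HEP) axis using clipped rectangular sets
x = np.linspace(-6.0, 0.0, 601)
mu = np.zeros_like(x)
for mode, degree in membership.items():
    lo, hi = hep_log10[mode]
    mu = np.maximum(mu, np.where((x >= lo) & (x <= hi), degree, 0.0))

# centre-of-area defuzzification (discrete approximation of the centroid)
centroid = np.sum(mu * x) / np.sum(mu)
print(f"estimated HEP ≈ 10^{centroid:.2f} = {10 ** centroid:.2e}")
```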

This study selects the 12·26 laboratory explosion accident at Beijing Jiaotong University as an example; through the proposed method, the HEP value and the human reliability level are obtained. Finally, targeted measures can be taken to avoid human errors.

09:15
Human related risk assessment during operating dangerous experiments in laboratory
PRESENTER: Zexing Zhang

ABSTRACT. Laboratory safety has become a key concern which has attracted much attention from governments and academic institutes. Moreover, according to published reports, about 70% of accidents are closely related to human-related risks. Therefore, for laboratory safety, it is necessary to evaluate human-related risk during the operation of an experiment. However, there is limited research analysing laboratory safety from the perspective of human-related risk. In order to assess human-related risk, this study provides an integrated method which combines Hierarchical Task Analysis (HTA), the Human Error Assessment and Reduction Technique (HEART) and a risk matrix. The HTA method is used to decompose experiments into several steps for further analysis. Then, the widely used human reliability method HEART is used to calculate the human error probability of each decomposed step by evaluating the corresponding Error Producing Conditions (EPCs). Finally, a risk matrix is constructed to evaluate the human-related risk level of each step. The proposed method is applied to a risky experiment (the effect of a magnetic field on acetylene explosion rate and pressure); the human-related risk level and several risky steps are identified.
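
For illustration, the standard HEART combination of a nominal error probability with weighted Error Producing Conditions is sketched below; the nominal value, multipliers and assessed proportions of affect are placeholders, not values taken from the paper or from the HEART tables.

```python
# Sketch of the HEART combination of a nominal HEP with weighted Error Producing
# Conditions for one decomposed task step. The nominal HEP, multipliers and assessed
# proportions of affect (APOA) below are placeholders.
nominal_hep = 0.02                      # assumed nominal HEP of the chosen generic task type

epcs = [                                # (assumed EPC multiplier, assessed APOA)
    (11.0, 0.4),
    (4.0, 0.2),
    (3.0, 0.5),
]

hep = nominal_hep
for multiplier, apoa in epcs:
    hep *= (multiplier - 1.0) * apoa + 1.0    # standard HEART weighting of each EPC

print(f"assessed HEP for this step: {hep:.3f}")
# a risk matrix would then combine this probability with the severity of the step's outcome
```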

09:30
Innovative and immersive crisis management training device
PRESENTER: Justin Larouzée

ABSTRACT. Accidents such as Fukushima Daiichi demonstrate the limits of crisis management training. During a crisis, decision-makers may rely on their experience or on applying procedures, but they often need creative skills. Indeed, a crisis is characterized by incomplete or contradictory information, changing goals, and time pressure. Thus, decision-makers must find solutions in an emotional context of stress and fatigue. Traditional crisis management training relies on experience feedback and on drilling procedures in pre-identified scenarios (explained theoretically or in serious games). This has contributed to reinforcing organizational response, but it is not suited to developing creativity. To do so, we designed and tested an immersive training device to disrupt learners' cognition, making them aware of the importance of their senses, emotions, and representations in the decision-making process. During the session, learners must make decisions based on what surrounds them. Before exiting the room, they must evaluate their performance (which is impossible, since the sequence is designed to be meaningless). The debriefing showed how difficult it was for the learners to admit the absurdity of the exercise, as revealed by the strength and ingenuity of their rationalization mechanisms. Our ambition was to complement theoretical lessons. However, the first results have inspired a 3-5-year research program to study the articulation between lived experience, feelings, memory, and self-narrative in the face of a crisis.

09:45
Optimizing Observation Strategy in Emergency Response by Combining Bayesian Network and Multi-Criteria Decision Analysis
PRESENTER: Moritz Schneider

ABSTRACT. Within the realm of infrastructure protection, it is essential to quickly identify potential risks in case of safety or security incidents. When alarms are triggered, the full extent of threats or damage is sometimes not clear. For example, unknown hazardous materials may be released or the structural integrity of a building could be compromised. In such cases, reconnaissance activities are required. Here, we study how the use of autonomous systems equipped with portable sensors may support scenario identification and thus help to decrease risks for emergency response personnel during scenario exploration. The process of reconnaissance can be viewed as an optimisation problem with many different criteria that affect the selection of an optimal route through a location. Besides the gain in information about the situation, other criteria such as the safety of the autonomous system should be considered. As these criteria can be conflicting, the application of multi-criteria decision analysis (MCDA) methods might prove beneficial. In this work, we present a first approach to optimise the observation strategy in emergency response. A Bayesian network is established to infer key aspects of the situation based on new information provided by the sensors of the autonomous system. A sequential multi-criteria decision analysis is performed based on predefined criteria and the current information obtained from the Bayesian network. The approach is illustrated by a simplified generic case study of a small building with multiple rooms. First results show that even simplified situations may lead to complex decision-making processes.
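
A minimal sketch of the two ingredients described above, a Bayesian belief update from a sensor reading followed by a weighted-sum multi-criteria score over candidate observation routes, is given below; the sensor model, criteria, weights and routes are invented placeholders, and a full Bayesian network is replaced here by a single binary hazard variable.

```python
# Discrete Bayesian update of P(hazard) plus a weighted-sum MCDA score over routes
# (illustrative assumptions throughout).
import numpy as np

p_hazard = 0.3                           # prior belief that a hazardous release is present
p_detect, p_false = 0.85, 0.10           # assumed sensor detection / false-alarm probabilities

def update_belief(p, sensor_positive: bool) -> float:
    """Bayes update of P(hazard) after one sensor reading."""
    like_h = p_detect if sensor_positive else 1.0 - p_detect
    like_n = p_false if sensor_positive else 1.0 - p_false
    return like_h * p / (like_h * p + like_n * (1.0 - p))

p_hazard = update_belief(p_hazard, sensor_positive=True)

# candidate routes scored on (information gain proxy, risk to the platform, time), all in [0, 1]
routes = {"corridor_A": np.array([0.8, 0.6, 0.5]),
          "corridor_B": np.array([0.5, 0.2, 0.7]),
          "stairwell":  np.array([0.9, 0.8, 0.3])}
weights = np.array([0.5, 0.3, 0.2])      # assumed decision-maker preferences
benefit = np.array([1, -1, -1])          # information is a benefit, risk and time are costs

scores = {r: float(weights @ (benefit * v)) for r, v in routes.items()}
best = max(scores, key=scores.get)
print(f"updated P(hazard) = {p_hazard:.2f}; selected route: {best} (score {scores[best]:.2f})")
```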

10:00
Safety and risk assessment of offshore wind turbines: the human factor perspective

ABSTRACT. The detrimental impacts of wind farm growth on public health and the environment have been extensively analysed. However, offshore workers are directly exposed to the adverse implications of offshore wind turbine operations, and specific research on this topic is limited. Among these implications are harsh weather conditions, particularly when ice formation is anticipated; ice falling from the blades is particularly hazardous. Wind turbines generate electromagnetic radiation, which can harm workers. Electrocution from the turbine's high-voltage wires increases the risk of harm when working in and around a wind project. Transferring personnel to the turbines for installation, inspection, and maintenance is the most hazardous aspect of operating an offshore wind farm. Since turbines are only accessible by boat or helicopter, the water level also plays a pivotal role in their accessibility. If the magnitude of the waves increases to a dangerous level while maintenance employees are performing operational work, they might be stranded. Wind turbine noise may also disrupt the sleep of offshore workers residing on platforms adjacent to the turbine. Shadow flicker, which occurs when the blades of a wind turbine rotate in bright sunlight and produce moving shadows, can induce seizures in photosensitive and epileptic individuals.

The purpose of this study is to identify any reported associations between wind turbine operations, harmful factors and suspected health-related effects on offshore workers. A risk assessment will be conducted, and mitigation measures will be provided to address the identified risks. This study also emphasizes the necessity of developing an effective tool for analysing the underlying hazards in wind turbine power generation. Digital Twin technology will be used to model and test the future risks of the new energy system.

10:15
A resilience evaluation methodology of emergency systems under known external shocks
PRESENTER: Xu An

ABSTRACT. Under unanticipated external shocks, technical-human-organizational failures in emergency operations can lead to the collapse of the emergency system. Resilience is a recognized metric to evaluate system performance loss when the system is subject to undesired disruptions. In this paper, a hybrid model integrating the System-Theoretic Accident Model and Processes (STAMP) with dynamic Bayesian networks (DBNs) is proposed to construct resilience assessment models. First, we identify the impact of the types, intensity, and decomposed factors of known external disasters on the system. Second, regarding emergency operations as a multi-step process, we utilize STAMP to analyze the increasing complexity and coupling of the hierarchical control and feedback structure in emergency scenarios. Third, the STAMP model is used to develop DBNs for establishing resilience evaluation models. Finally, the effects of shock categories, system configurations, and maintenance strategies on the resilience of emergency systems are investigated.

10:30
Multi-unit risk and context informed monitoring and decision-making

ABSTRACT. Single-unit probabilistic safety analysis (SU PSA) of a given NPP is a necessary, but not sufficient, condition for risk assessment and monitoring, comprehensive safety characterization, and decision-making for performing various operations in normal and emergency conditions. This is due to some shortcomings of the PSA that make it unrealistic and necessitate its mandatory complementation with deterministic safety analysis (DSA). These shortcomings are the following:
1. Risk has an integral character, whereas many deterministic analyses a priori do not; they have a particular character, aimed at establishing facts and at subjectively determining and reducing a limited number of representative sequences and boundary cases with a similar accident progression (the simulation uses an integral model, but for a particular case). In addition, the exchange of information between DSA and PSA is incomplete, and their interaction is suboptimal.
2. Risk is uncertain, due to ignorance (epistemic uncertainty) and to the impossibility of clearly separating and distributing in time and space the random from the regular (stochastic uncertainty).
3. Site risk must be based on multi-unit (MU) PSA, which requires a compatible, rather than independent, examination of all units, fuel storages and radioactive materials for the most complete possible spectrum of combined initiating events and internal and external hazards at the site.
4. Risk is not only static but also dynamic, which gives rise to the need for operability in monitoring, distribution (globally and locally) and reckoning in time (short and long term) for decision-making, considering regulatory requirements and quantitative rules (if any) for the research, planning and evaluation of safety measures and the implementation of technical solutions.
5. Risk includes not only the calculated part but also residual risk, due to the omission or non-correlation in the PSA model of multiple initiating events and hazards, common cause failures (CCF) of structures, systems and components (SSC), changes in configuration availability, and reliability parameters of basic events and conditions; the extensively growing models lead to difficulties in calculation, organization, uncertainty and sensitivity assessment, and in conversion into more operational tools for monitoring statistical risk that support conscious and intuitive decision-making.
6. Risk is influenced by the processes of cognition, thinking, communication, decision and action of the person or team (or of trained artificial intelligence, if natural intelligence is replaced), which implies a comprehensive study of the dependencies, factors and conditions (context) of the situation in which the installation operates and in which the person or group finds themselves and makes decisions.
The investigation and overcoming of these shortcomings can be done separately, but it can also be done by searching for a common means for their joint identification, qualification, quantification and elimination through a procedure for monitoring risk and context in a given situation. The paper presents the possibilities for such joint overcoming by means of a risk monitoring tool based on PSA, supplemented with the context quantification procedure of the "Performance Evaluation of Teamwork" method, for multi-unit risk- and context-informed monitoring, decision-making and management.

09:00-11:00 Session 20E: Healthcare and Medical Industry

Healthcare and Medical Industry. This session explores the adoption of new technologies in healthcare, their risks and risk mitigation actions. The session also presents the results of risk perception studies in healthcare and advanced risk assessment methodologies for healthcare systems.

Location: Room 100/4013
09:00
Identifying Variation in the Newborn Life Support Procedure: An Automated Method
PRESENTER: Alfian Tan

ABSTRACT. This research develops an automated method to recognize variations in the Newborn Life Support (NLS) procedure. Compliance with the NLS standard guideline is essential to prevent adverse consequences for the newborn. Video recordings of resuscitation are frequently used in research to identify types of variation and understand how to minimize unwanted ones. Despite their benefits, it takes a significant amount of time and human resources to manually evaluate the procedure from videos. Therefore, an automated method could help. In this study, a variation recognition system based on an action recognition technique is built. In the first step, automatic object segmentation is performed on every NLS action image. In the second stage, a number of features describing the availability and movement of medical objects, as well as associations among actions, are extracted and fed into machine learning models. The results show that the strategy of considering the associations among actions and the preliminary prediction of actions succeeded in improving model performance. However, the overall recognition system still performs only fairly and covers only the wet-towel removal step of the procedure; nevertheless, it has been useful for assessing the adherence of the recorded procedure to the NLS guideline. This study is an initial work that will advance toward the integration of automated variation recognition with reliability modeling work on the NLS procedure.

09:15
Optimizing performance and safety for a particle therapy accelerator

ABSTRACT. Performance and safety are two key attributes that guarantee the effectiveness of a medical device for safe clinical application. As in other sectors, the two attributes are generally inversely interdependent: the higher the performance, the lower the safety, and vice versa. Performance and safety are analyzed before the certification of the medical device via the claimed benefits and risks: the claimed benefits shall outweigh the risks, which must be evaluated as acceptable. This principle also applies to design upgrades, by considering the benefit-risk ratio. A design upgrade is approved if the benefit-risk ratio increases or remains acceptable. The analysis of the benefits and the risks is either quantitative or qualitative, depending on the complexity of the device and the available clinical data. In this respect, it is important that the design of a medical device is associated with measurable performance- and safety-related quantities, in the conceptual design phase and then during clinical use. For example, this makes it possible to identify whether the safety systems need to be upgraded (and how) in order to allow a higher performance. But sometimes either the benefits or the risks are uncertain, which is the case for medical devices for the most innovative treatment methods. There is no hard border between approval and rejection, but rather a set of principles that apply in given circumstances. From the manufacturing point of view, a medical device shall keep pace with the progress of the medical sector in the safest possible way (the safety-first principle). This means developing a trade-off strategy between performance and safety, based on measurable quantities that reflect benefits and risks for the patients with the least possible uncertainty. The medical device MAPTA (MedAustron Particle Therapy Accelerator) of MedAustron (Lower Austria) started clinical operation with proton beams in 2016 and has since been upgraded to include other radiation heads and new ion species. The design upgrades have allowed several performance increases for patient benefit, such as the reduction of the treatment time (by optimization of the workflow), the delivery of new ion species (allowing different treatment indications) and the optimization of the treatment plan delivery (by improving the beam quality and the homogeneity of the dose distribution). This paper gives an overview of the practice and the principles that have been applied at MedAustron in evaluating the performance increases (the benefits) and trading them off against safety (the risks).

09:30
Real-time integrated learning and decision making for cumulative shock degradation
PRESENTER: Joachim Arts

ABSTRACT. Problem definition: Unexpected failures of equipment can have severe consequences and costs. Such unexpected failures can be prevented by performing preventive replacement based on real-time degradation data. We study a component that degrades according to a compound Poisson process and fails when the degradation exceeds the failure threshold. An online sensor measures the degradation in real time, but interventions are only possible during planned downtime.

Academic/practical relevance: We characterize the optimal replacement policy that integrates real-time learning from the online sensor. We demonstrate the effectiveness in practice with a case study on interventional x-ray machines, where filaments are subjected to cumulative shock degradation. The data set of this case study is available in the online companion. As such, it can serve as a benchmark data set for future studies on stochastically deteriorating systems.

Methodology: The degradation parameters vary from one component to the next but cannot be observed directly; the component population is heterogeneous. These parameters must therefore be inferred by observing the real-time degradation signal. We model this situation as a partially observable Markov decision process (POMDP) so that decision making and learning are integrated. We collapse the information state space of this POMDP to three dimensions so that optimal policies can be analyzed and computed tractably.

Results: The optimal policy is a state dependent control limit. The control limit increases with age but may decrease as a result of other information in the degradation signal. Numerical case study analyses reveal that integration of learning and decision making leads to cost reductions of 10.50% relative to approaches that do not learn from the real-time signal and 4.28% relative to approaches that separate learning and decision making.

Managerial implications: Real-time sensor information can reduce the cost of maintenance and unplanned downtime by a considerable amount. The integration of learning and decision making is tractably possible for industrial systems with our state space collapse. Finally, the benefit of our model increases with the amount of data available for initial model calibration, whereas additional data are much less valuable for approaches that ignore population heterogeneity.
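
As an illustration of the setting studied here, the following minimal Python sketch simulates cumulative compound Poisson shock degradation and evaluates a static control-limit replacement policy at planned downtimes. All numerical parameters are illustrative assumptions, and the sketch does not reproduce the paper's POMDP policy, which additionally learns the heterogeneous degradation parameters from the real-time signal.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameters (assumptions, not taken from the paper)
    LAM, MEAN_SHOCK = 2.0, 1.0     # shock rate per period, mean shock size
    THRESHOLD = 30.0               # failure threshold
    C_PREV, C_CORR = 1.0, 10.0     # preventive vs corrective replacement cost
    HORIZON = 10_000               # number of planned-downtime periods

    def simulate(control_limit):
        """Cost rate of a control-limit policy: replace preventively at a planned
        downtime if degradation exceeds the limit; pay the corrective cost if the
        failure threshold was crossed during the preceding period."""
        level, cost = 0.0, 0.0
        for _ in range(HORIZON):
            shocks = rng.poisson(LAM)                        # compound Poisson increment
            level += rng.exponential(MEAN_SHOCK, shocks).sum()
            if level >= THRESHOLD:                           # failed before downtime
                cost += C_CORR
                level = 0.0
            elif level >= control_limit:                     # preventive replacement
                cost += C_PREV
                level = 0.0
        return cost / HORIZON

    # Crude search over static control limits
    limits = np.linspace(5.0, THRESHOLD, 26)
    best = min(limits, key=simulate)
    print(f"best static control limit ~ {best:.1f}, cost rate ~ {simulate(best):.3f}")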

09:45
An analysis of resilience and risk assessment in digital healthcare systems
PRESENTER: Georgia Tzitzili

ABSTRACT. The complexity and criticality of healthcare services, including the emerging demand to overcome crises like the COVID-19 pandemic, along with continuing technological changes, affect overall healthcare resilience (Biddle et al., 2020; Wiig & O’Hara, 2021). Resilience, which describes the system’s capacity to respond to and recover from a disruption or threat (Aven, 2011), is essential in risk management and policy making to provide high-quality and available healthcare services. In this work, an analysis of healthcare information systems’ resilience and of the risk management of the information technology (IT) infrastructure is presented, based on an extensive literature review. The aim is to identify the several aspects of healthcare information systems’ resilience and its contribution to risk management. In the past, healthcare IT included management software, access to e-prescription and electronic health records; nowadays, however, with the COVID-19 pandemic acting as a digital accelerator, healthcare systems offer access to diagnostic and therapeutic decision-making methods, telemedicine, teleconsultation, robotics, the Medical Internet of Things and other IT-related services (El-Sherif et al., 2022; Shen et al., 2021). According to Biddle et al. (2020), healthcare resilience mostly refers to service delivery, health workforce or governance issues. On the other hand, only in very few cases is the dimension of healthcare information systems’ resilience considered, in aspects such as the security protocols adopted, the lack of a cloud-based framework, different cloud-computing technologies and security layers, the lack of healthcare professionals’ IT security awareness and unsafe remote patient monitoring technologies. Although healthcare IT resilience is not considered in the strategic planning of health organizations, new surveys reveal the need to address the risks and the vulnerable parts of the system that may arise, especially in the fields of cybersecurity and education, which are identified as the areas mainly attracting researchers’ interest. Even small investments in the area may increase resilience (Shrivastava et al., 2021). The main outcome of our work is the revealed lack of cybersecurity resilience strategies and risk management frameworks in public healthcare organizations. The findings address the need for further resilience and risk assessment modeling in the IT healthcare domain.

Aven, T. (2011). On Some Recent Definitions and Analysis Frameworks for Risk, Vulnerability, and Resilience: On Some Recent Definitions and Analysis Frameworks. Risk Analysis, 31(4), 515–522. Biddle, L., Wahedi, K., & Bozorgmehr, K. (2020). Health system resilience: A literature review of empirical research. Health Policy and Planning, 35(8), 1084–1109. https://doi.org/10.1093/heapol/czaa032 El-Sherif, D. M., Abouzid, M., Elzarif, M. T., Ahmed, A. A., Albakri, A., & Alshehri, M. M. (2022). Telehealth and Artificial Intelligence Insights into Healthcare during the COVID-19 Pandemic. Healthcare, 10(2), 385. Shen, Y.-T., Chen, L., Yue, W.-W., & Xu, H.-X. (2021). Digital Technology-Based Telemedicine for the COVID-19 Pandemic. Frontiers in Medicine, 8, 646506. Shrivastava, U., Hazarika, B., & Rea, A. (2021). Restoring clinical information system operations post data disaster: The role of IT investment, integration and interoperability. Industrial Management & Data Systems, 121(12), 2672–2696. Wiig, S., & O’Hara, J. K. (2021). Resilient and responsive healthcare services and systems: Challenges and opportunities in a changing world. BMC Health Services Research, 21(1), 1037.

10:00
Improving reliability of radiopharmaceuticals synthesis system used in nuclear medicine
PRESENTER: Nicolae Brinzei

ABSTRACT. The use of radioactive isotopes in nuclear medicine is crucial for the diagnosis and treatment of many diseases, and the multi-purpose synthesis system plays a vital role in producing radiopharmaceuticals. However, due to the harsh radioactive operational conditions, the system must have high reliability to avoid failures that could have serious consequences. The purpose of the work presented in this paper is to develop a probabilistic assessment of system reliability using a mathematical model based on graphs derived from Hasse diagrams, together with a reliability block diagram to compare the results. By dividing the synthesis system into small subsystems, the reliability of each subsystem and of its possible architectures has been quantified. This approach allows identifying the components that have a crucial impact on system reliability and clarifying their failure rates, taking into account operational conditions. The results show that it is possible to assess the reliability of a multi-purpose synthesis system used to produce radiopharmaceuticals for nuclear medicine using the proposed modelling and probabilistic assessment approach.
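
For readers unfamiliar with reliability block diagram evaluation, the following minimal Python sketch quantifies a series system with one redundant (parallel) subsystem under constant failure rates. The subsystem architecture and failure rates are hypothetical and are not taken from the paper.

    import math

    def r_exp(rate, t):
        """Reliability of a single component with a constant failure rate."""
        return math.exp(-rate * t)

    def series(rs):
        out = 1.0
        for r in rs:
            out *= r
        return out

    def parallel(rs):
        out = 1.0
        for r in rs:
            out *= (1.0 - r)
        return 1.0 - out

    # Hypothetical subsystem architecture (failure rates in 1/h, mission time in h)
    t = 1000.0
    valves     = parallel([r_exp(2e-5, t), r_exp(2e-5, t)])   # 1-out-of-2 redundancy
    pump       = r_exp(5e-6, t)
    controller = r_exp(1e-5, t)

    system = series([valves, pump, controller])
    print(f"system reliability at t = {t:.0f} h: {system:.4f}")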

10:15
A Fuzzy-based Multi-dimensional Risk Assessment Model for the Healthcare System
PRESENTER: Hanqin Zhang

ABSTRACT. The adoption of multi-criteria risk assessment methods, such as probability and impact matrices, in the healthcare sector poses several risks, one of which is subjectivity and bias in decision making. To this end, the aim of this research is to develop a multi-criteria healthcare risk assessment model in a fuzzy environment. To achieve this, a three-step research methodology was employed. The study identifies 29 risk factors, with insufficient team cooperation and communication, poor response to health pandemics, and misdiagnosis being the top three risks. The findings of this study might be of value to relevant medical personnel and organisations in effectively managing risks and developing appropriate risk interventions.

10:30
Risk Seeking Attitudes towards COVID-19 Vaccination and the role of HCPs (Health Care Professionals) in Norway and Pakistan.
PRESENTER: Frederic Bouder

ABSTRACT. COVID-19 is a pandemic disease caused by the SARS-CoV-2 virus. Most people who fall ill with COVID-19 experience mild to severe symptoms and recover without special treatment; however, people suffering from severe symptoms require special treatment and medication. For this purpose, vaccinations were introduced to prevent COVID-19. However, several risks and threats have been associated with COVID-19 vaccination. The purpose of this paper is to identify the COVID-19 vaccination risks perceived by people in Norway and Pakistan.

This research study attempts to highlight the risks identified by Healthcare Professionals (HCPs) while providing COVID-19 vaccination. The study adopts a qualitative research design in which a total of 20 interviews were conducted with HCPs: 10 interviews in Pakistan and 10 in Norway. The research indicates that casualties and motivational conversations with HCPs are the main motivations for people to take the COVID-19 vaccination, whereas myths and the unavailability of accurate information are the risks faced by people in Pakistan. HCPs from Norway indicated that the willingness of the patients and support from family members are the motivations encouraging people to take vaccines in Norway. However, the risk-seeking attitude of people is influenced by fear of side effects, negative media reports, and a lack of trust in the vaccines. A follow-up study with a longitudinal design should also be carried out to determine the risk-seeking attitude of people before and after taking the COVID-19 vaccination.

09:00-11:00 Session 20F: Security II
Location: Room 100/5017
09:00
A pragmatic capability-based framework for national security risk governance
PRESENTER: Monica Endregard

ABSTRACT. The capabilities needed to protect national security and conduct crisis management in a comprehensive defence context depend on increasingly interconnected and complex ICT infrastructures and systems. Straightforward examples of such critical ICT-based capabilities are the creation of situational awareness, coordination, command and control, communication and exchange of information. As a consequence, ICT security, and in particular the protection of availability and integrity, is of crucial importance. However, protective security work from a national security perspective has traditionally focused on the confidentiality of information and on preventing unauthorized access to ICT systems that handle this information. In addition, security has typically been designed to comply with pre-defined protection requirements focused on protection against specific threats, but without the necessary emphasis on the larger picture of how and for what purposes ICT infrastructures and services are used [1]. Trends in risk and security research, as well as the new Norwegian security legislation launched in 2019, put mission outcomes forward as the key drivers for identifying security criteria and prioritising security measures. This entails a shift in the balance of protective security work from a predominantly rule-based regime to a risk-based approach. Mission criticality should guide the identification and prioritisation of security measures to achieve an appropriate level of security for organisations performing activities and operating information systems and infrastructures of importance for national security [2]. The implementation of the new security regime requires a shift of mind-set as well as a new, systematic and holistic approach to create the connection from high-level national security goals via capabilities down to the technical level of ICT systems. The paper suggests a pragmatic capability-based framework for national security governance inspired by system-theoretic approaches to security [3]. The military capability and context of air space surveillance and assertion of sovereignty is used as a case to develop and illustrate the framework. The first step encompasses information elicitation: national security interests are put in relation to specific (military) mission goals, which in turn point to individual tasks at the operational level. The aim is to create a hierarchy and traceability from high-level security interests to the criticality of the ICT systems underpinning military tasks. The second step constitutes the risk assessment part. For modern digital systems and infrastructures, complexity and uncertainty contribute to major challenges in describing and evaluating risks in order to decide on security measures, thus requiring integrated and dynamic risk and security management founded on a broad knowledge base. Both internal and external dependencies with the associated uncertainty should be described. [1] Young, W. & Leveson, N. (2013). Systems thinking for safety and security. In Proceedings of the 29th Annual Computer Security Applications Conference (ACSAC ’13). ACM, New York, NY, USA, 1–8. [2] Act of 1 June 2018, No. 24 relating to national security (Security Act). https://lovdata.no/dokument/NLE/lov/2018-06-01-24 [3] Carter, B.T., Bakirtzis, G., Elks, C.R. & Fleming, C.H. (2018). “A systems approach for eliciting mission-centric security requirements.” IEEE.

09:15
Societal Security as a System-of-Systems: Customs Agencies’ Cross-Sectoral Contributions

ABSTRACT. The aim of this paper is to provide new insights into cross-sectoral, cooperative aspects of societal security. Societal security can be defined as the continuous outcome of a resilience-based system-of-systems whose purpose is to protect society against a wide range of risks. One of its subsystems is the field of customs. The role of customs agencies in societal security is not well understood. In particular, the cross-sectoral, cooperative aspects have been overlooked both in official guidance and in practice. This paper analyzes the contributions of the Norwegian Customs (NC) to societal security from a system-of-systems perspective, using data from governance documents, official reports, and a list of the NC’s collaborators. The main findings are that (1) customs is a node in societal security, and its role is much wider than earlier recognized; that (2) the current framework for societal security does not adequately account for agencies whose normal contributions are outside their own sector; and that (3) this lack of understanding impedes efficient and effective measures. Insights on societal security as a system-of-systems are summarized in a jigsaw puzzle analogy.

09:30
Data-driven analysis of the airport cargo screening process
PRESENTER: Jacek Skorupski

ABSTRACT. Shipments transported by air are screened, as are passengers and their luggage. Screening aims to detect objects or substances that could be used to commit an act of unlawful interference. These checks are governed by international regulations, and their implementation is monitored because of the possibly severe consequences of allowing prohibited substances to be transported. The regulations are subject to frequent amendments aimed at the mandatory introduction of ever newer equipment and control techniques into widespread use. This is necessary because of the parallel development of methods and techniques used by those planning to commit an act of unlawful interference. Therefore, it is essential to study the effects of using the aforementioned new equipment variants for the cargo security checkpoint (CSC) and new inspection procedures. The two most vital criteria in this evaluation are the effectiveness of detecting prohibited items and substances and the capacity of the CSC. These criteria are contradictory, which forces the search for compromise solutions. Due to the possible wide variety of shipments, the screening process is complex, and its parameterization under specific conditions is challenging. The research aimed to analyze the security screening process, considering the number of items included in a single load and its weight. In doing so, it should be noted that these shipments are usually commercial and must be delivered to a specific consignee. This necessitates a specific approach to the inspection process, which seeks to preserve the integrity of each shipment. If effective inspection of a shipment’s contents requires disassembly, it is necessary to reintegrate it into the original cargo unit. This makes the inspection process time-consuming, which is a significant problem for on-time delivery. This is all the more important because most cargo shipments are transported by aircraft also carrying passengers and their luggage. The main focus of the research was on CSC throughput, depending on the type of shipment being inspected. This was done using the results of measurements from 2021, which recorded, among other things, the type and time of the inspections performed. A model based on a Bayesian Network was developed to examine the probability of performing particular types of inspections depending on the weight and number of pieces comprising a single cargo unit and the execution times of the inspection procedures. An essential part of the research was developing a tool for predicting the inspection time of an entire batch of shipments to be transported on a single flight. For this purpose, a Naive Bayes Classifier was used, which proved effective in the application studied. The developed model served to better understand the course of the cargo screening process at the airport and as a verification tool for the micro-scale simulation model of the studied process, implemented in the form of a colored, timed, stochastic Petri net. The methods used and the tools created made it possible to evaluate different variants of CSC equipment and other solutions for the screening process in terms of throughput. They thus provided the basis for the bicriteria analysis mentioned earlier.
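
The batch-time prediction step can be illustrated with a small sketch using scikit-learn's Gaussian Naive Bayes on synthetic data. The feature set (weight, number of pieces), the three inspection classes and the mean processing times per class are assumptions for illustration only and do not correspond to the 2021 measurement data.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(1)

    # Synthetic stand-in for checkpoint measurements (illustrative only):
    # features = [shipment weight in kg, number of pieces], label = inspection class
    # 0: x-ray only, 1: x-ray + trace detection, 2: manual unpacking
    n = 2000
    weight = rng.gamma(shape=2.0, scale=40.0, size=n)
    pieces = rng.integers(1, 30, size=n)
    score = 0.01 * weight + 0.1 * pieces + rng.normal(0, 0.5, n)
    label = np.digitize(score, [1.5, 3.0])        # heavier/larger loads -> deeper checks

    X = np.column_stack([weight, pieces])
    clf = GaussianNB().fit(X, label)

    # Predicted inspection class for a new batch, then the expected batch processing time
    batch = np.array([[35.0, 3], [120.0, 18], [300.0, 25]])
    mean_minutes = {0: 2.0, 1: 6.0, 2: 25.0}      # hypothetical mean times per class
    pred = clf.predict(batch)
    print("expected batch time [min]:", sum(mean_minutes[c] for c in pred))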

09:45
Lessons learned from performing cyber-security research on critical infrastructures

ABSTRACT. The last five years have marked a paradigm shift in focus and awareness of cyber security across industries. In Norway this has been strongly motivated by governmental influence, in particular through updated rules and regulations [1]. Security and cyber-security as topical areas have brought challenges when integrating with safety, as discussed in [2]. One initiative addressing cyber challenges has been the 4-year cyber research program CybWin (2019-2022), which has taken a holistic, practical approach to the cyber security of Norwegian critical infrastructures (CI). Previous project reporting has covered the organization, setup and challenges concerning the laboratory infrastructure and how experiments were conducted [3], as well as knowledge and training needs for critical infrastructure stakeholders.

A motivation for the CybWin project was the unavailability of cyber ranges, cyber security infrastructures and systems, and relevant cyber security data sets pertaining to CI. An updated survey [4] from 2021 provides an overview of existing systems and infrastructures for cyber research and highlights variability both in the types and sizes of the available data sets and in the realism of the systems and infrastructures. In response, a cyber security center (CSC) research infrastructure was established at the Institute for Energy Technology in Norway. The CSC was developed iteratively in line with the evolving cyber research requirements of CybWin, resulting in capabilities to perform controlled cyber-attack experiments on TRL9 system enclaves in Air Traffic Management and energy grid systems.

The paper presents an overview of the research performed in the CSC, highlighting both capabilities and shortcomings of the current laboratory infrastructure. The CybWin CI enclaves are presented in more detail to provide insights into the types and variability of systems and knowledge required for the performed cyber research. We discuss the suitability of the infrastructure with regard to supporting different types of systems and experiments, as well as supporting the needs of different stakeholders related to CI operations. Experiences from cyber research in the CSC in the period 2019-2022 indicate a strong need for access to expertise in information technology and operational technology systems and operations, cyber-attack and cyber defense competence, human factors knowledge and experimental research competence for complex systems. To address the practical cyber security challenges pointed out in [2], we provide suggestions on how industry and academia can be involved in practice to obtain access to the competence needed to ensure realistic results and a wide impact.

[1] Ministry of Justice and Public Security: "Act relating to national security" and "Regulations relating to the protective security work of undertakings", 2019. [2] Simensen, J.E., Gran, B. A., "Information- and Cyber-Security Practices as Inhibitors to Digital Safety", Risk, Reliability and Societal Safety - ESREL 2021, 19-23 September 2021, Angers, France. [3] J. E. Simensen, P-A., Jørgensen, A. L. Toppe, "Experience from performing controlled technical cyber experiments on Critical Infrastructure as hybrid events", ESREL 2022, Dublin, Ireland, 29th Aug-1st September 2022. [4] Conti, M., Doradel, D., Turrin, F.: "A Survey on Industrial Control System Testbeds and Datasets for Security Research", IEEE Communications Surveys & Tutorials, 2021, DOI:10.1109/COMST.2021.3094360

09:00-11:00 Session 20G: Autonomous Driving Safety II
09:00
Challenges and emerging practices in design of automation

ABSTRACT. Automation is expected to improve operational efficiency, as well as increase safety and quality consistency. But the more automation is added to a system, the lower the situation awareness of the operators, and the less likely they will be able to take over control when needed. In safety-critical systems this could have severe effects, and several recent accidents illustrate that poor implementation of meaningful human control in automated systems and remote control is a major accident cause. As the capabilities of autonomy increase, the frequency of human intervention will decrease. But for the foreseeable future there will be some level of human-system interaction, and the success of these semi-autonomous systems will be highly dependent on human-autonomy interfaces. Early research into automation stated that this new technology challenges sensemaking and requires more, not less, interaction design and interface design. This underlines the importance of investigating design guidelines for human-autonomy interfaces, to ensure that these systems are designed in a way that aligns with human capabilities, both strengths and weaknesses.

With this as a background, we conducted semi-structured interviews with 14 experts involved in design to identify challenges in design when introducing automation, as well as to identify emerging practices in use. The interview notes were subject to a thematic analysis, which resulted in the two main themes, "Challenges in design" and "Design in practice", each of which has its associated sub-themes. The main themes and sub-themes illustrate the current challenges in automation technology design, as well as the practices adopted by designers of these technologies. The main themes are linked in such a way that challenges in design contribute to shaping design practice. However, aspects of the current design practices also contribute to several challenges. The resulting themes underline the need to update both design methods and design standards to overcome these challenges, as well as to ensure that best practices are used in real design processes. There is a growing awareness that the introduction of automation requires innovation and development in design methods and standards to ensure that human factors are considered in an early phase of technology development. The design and introduction of new automated technology in safety-critical environments is often done by deploying the technology first and seeking user acceptance afterwards, resulting in a problematic totality characterized by complexity and risky practices.

The results from our work provide a multifaceted picture both of design practices in use today and of the challenges related to these practices in safety-critical environments. Ensuring design processes that consider human factors by involving users in an early phase seems to be a prerequisite for successful design in safety-critical environments. The development of both design methods and standards to facilitate this seems to be an important way forward.

09:10
Verification of Self-Healing and Self-Protecting Properties in Autonomous Vehicles

ABSTRACT. The ongoing developments and research towards autonomous vehicles in the automotive domain lead to various challenges. One of the main challenges is to design and develop a dependable system and verify its safety, reliability, and security. The current design options and ideas cannot provide a satisfactory solution due to the shift from manually controlled systems to highly complex and interconnected systems [1]. In addition, the cost requirements must also be considered within the technical solution to develop economically reasonable autonomous vehicles.

In the scientific community, so-called "Organic Computing" offers a promising approach to overcome these adversities. The Organic Computing approach proposes the introduction of self*-properties into the system. These include, among others, self-configuration, self-organization, self-management, self-healing, and self-protecting, with the idea that the system can carry out specific, intelligent tasks independently of human commands. Therefore, observer/controller architectures [2] are often used to guarantee that the intended purpose of the overall vehicle is met. The controller has three options to influence the system's behavior: influencing local decision rules, the system structure, and the environment. The best option for the current situation is determined, selected, and forwarded to the actuators. Then the resulting behavior is observed, and machine learning techniques can be used to improve the selection process over time.

Regarding safety and security, the self*-properties, particularly self-healing and self-protecting, are introducing new possibilities. Self-healing is the property of the system to be able to detect, diagnose and repair failures on its own, and self-protecting is the property of the system to protect itself against (security) attacks [1]. The emerging benefit is the increase in the system’s safety and security parameters while maintaining the resources and costs at a reasonable level.

Current research focuses mainly on architectures and functional implementations of self*-properties. However, to our knowledge, approaches for modeling and verifying the desired impact of these properties still need to be developed. The verification of self*-properties is essential for a suitable and sufficient design. Therefore, this paper identifies and defines the resulting ideas regarding self-healing and self-protecting and their implications for other self*-properties. Based on these definitions, modeling concepts to verify and quantify these properties with regard to safety and security are examined. Then an existing assessment tool called "ERIS" [3] is extended by this modeling concept. Lastly, exemplary assessments are performed to validate the quality of the implemented modeling concept.

[1] Tomforde, S., Sick, B., & Müller-Schloer, C.. (2017). Organic Computing in the Spotlight. [2] Richter, Urban & Mnif, Moez & Branke, Juergen & Müller-Schloer, Christian & Schmeck, Hartmut. (2006). Towards a generic observer/controller architecture for Organic Computing. 112-119 [3] Rinaldo, R. and D. Hutter (2020). Integrated analysis of safety and security hazards in automotive systems. In Computer Security, Volume 12501 of LNCS, Guildford, UK. Springer. ESORICS 2020, workshop CyberICPS.

09:20
Towards Modelling Sensor Failures in Automotive Driving Simulators
PRESENTER: Rhea C. Rinaldo

ABSTRACT. Testing autonomous vehicles is a costly and tedious task, but essential to ensure the safe operation of the vehicle in any conceivable scenario. Therefore a plethora of diverse, high quality test scenarios are required to train and validate the autonomous function. This is extremely challenging, as the amount and diversity of required test scenarios can hardly be covered by test drives alone. In order to increase the diversity of test scenarios and stage emergency situations like accidents safely, many manufacturers rely on automotive driving simulators. In general these simulators extensively model some 3D environment and simulate an ego vehicle in it. The usual output comprises the perceived data of various vehicle sensors, ground truth information and meta information such as collision indicators. This data can either be used as input for training purposes, or for testing purposes by connecting the trained function directly to the simulator.

Currently these simulators follow a "functional paradigm" that focuses on a realistic representation of the environment and implements sensing components that perceive it ideally. However, in reality sensors do not produce an ideal output, as they are subject to various environmental effects like interfering signals, radiation, reflections, obscuration through mud, water, foliage etc., vibration, and natural ageing and wear. Furthermore, the environment can be flawed; for instance, traffic signs may be damaged or obscured. Consequently, we suspect that in the future a shift from the functional paradigm to a "fault paradigm" will become necessary. The idea is that simulators include faults and failures of the infrastructure, the ego vehicle and its sensors, with the goal of achieving a more realistic representation of driving in the actual world. Moreover, the generated data is valuable for the training and testing of fault detection mechanisms (in combination with ground truth data), for example the identification of an obscured sensor, ghost targets or even a security attack.

To support this, we present first steps towards a fault paradigm. We make use of the open-source simulator Carla and extend it with several failure modes of radar sensors. In a first step, a literature study on the various environmental effects that can affect automotive radars is performed and the current implementation of the Carla radar is discussed. Afterwards, an extension of the C++ Carla base with a "Faulty Radar Sensor" that implements several failure modes is presented. We distinguish four base failure modes that allow for further parametrization: data transfer errors due to loose connections or jamming attacks, blockage by obscuration, reconfiguration of the sensor position due to bad mounting or a preceding collision, and signal disturbances through intentional or unintentional interference and reflections. We aim to close the paper by showing experimental results and discussing the effects on autonomous functions as well as future extensions of this topic.
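
As an illustration of the four base failure modes, the following Python sketch perturbs a list of generic radar detections. It is not the paper's C++ Carla extension; the detection fields, mode names and parameter values are assumptions chosen only to make the idea concrete.

    import random
    from dataclasses import dataclass

    @dataclass
    class RadarDetection:          # simplified stand-in for one radar detection
        depth: float               # range [m]
        azimuth: float             # [rad]
        altitude: float            # [rad]
        velocity: float            # radial velocity [m/s]

    def inject_radar_faults(detections, mode, rng=random.Random(0)):
        """Apply one of the four base failure modes to a single radar frame."""
        if mode == "transfer_error":           # loose connection / jamming: drop the frame
            return []
        if mode == "blockage":                 # obscuration: suppress distant targets
            return [d for d in detections if d.depth < 15.0]
        if mode == "misalignment":             # bad mounting / prior collision: constant bias
            return [RadarDetection(d.depth, d.azimuth + 0.05, d.altitude, d.velocity)
                    for d in detections]
        if mode == "interference":             # reflections / interference: noise + ghost target
            noisy = [RadarDetection(d.depth + rng.gauss(0, 0.5), d.azimuth,
                                    d.altitude, d.velocity) for d in detections]
            noisy.append(RadarDetection(rng.uniform(5, 80), rng.uniform(-0.5, 0.5), 0.0, 0.0))
            return noisy
        return list(detections)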

09:30
Performance Index Modeling from Fault Injection Analysis for an Autonomous Lane-Keeping System
PRESENTER: Parthib Khound

ABSTRACT. Faulty sensor data could not only undermine the stability but also drastically compromise the safety of autonomous systems. The reliability of the functional operation can be significantly enhanced if a monitoring module can evaluate the risk to the system for a particular fault in the sensor. Based on the estimated risk, the system can execute the necessary safety operation. To develop a risk evaluation algorithm, data connecting the faults and their effects should be gathered. This paper focuses on quantifying the effects caused by given fault types with different strengths. The considered system is a lane-keeping robot and the sensor is an RGB camera. There exist many fault injection methods, based on signal conditioning [1], or applied to AI networks [2, 3], sensor signals [4, 5], etc. In this paper, the faults are injected into the input image of the RGB camera. This emulates possible hardware and environmental faults affecting the sensor. Some common examples of the fault types are condensation, blur, broken lens, etc. The strength represents a defined intensity of a fault type at three levels, i.e., slight, medium, and extreme. For the fault injection, we will be using our in-house tool [6].

In [4], perception errors are statistically injected into the bounding boxes and the brake-torque effects on the wheels of an autonomous vehicle are presented. In [5], the effects of different fault types in the RGB camera on an autonomous driving application are analyzed and discussed. Indeed, different fault types have certain effects on the functional operation. Additionally, the strength of a fault intensifies the effects to different degrees. For example, a slightly blurry image would not necessarily alter the lane-keeping performance of the robot, but if the image is extremely blurred, the lane-keeping operation could fail completely. Here, the quality of the lane following for different fault types and strengths will be quantified with reference to the normal operation on a prescribed course.
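
The following Python/OpenCV sketch illustrates image-level fault injection for two fault types at the three strength levels. It is a generic illustration, not the in-house tool [6]; the kernel sizes and blending weights are assumptions.

    import numpy as np
    import cv2  # OpenCV

    # Three fault strengths as used conceptually above (kernel sizes are assumptions)
    BLUR_STRENGTH = {"slight": 3, "medium": 9, "extreme": 31}

    def inject_blur(image_bgr, strength):
        """Emulate a blurred / defocused camera by Gaussian filtering the frame (uint8 BGR)."""
        k = BLUR_STRENGTH[strength]
        return cv2.GaussianBlur(image_bgr, (k, k), 0)

    def inject_condensation(image_bgr, strength, rng=np.random.default_rng(0)):
        """Crude condensation model: blend the image with a bright, low-frequency veil."""
        alpha = {"slight": 0.15, "medium": 0.35, "extreme": 0.6}[strength]
        h, w = image_bgr.shape[:2]
        noise = rng.uniform(180, 255, (h, w)).astype(np.uint8)
        veil = cv2.cvtColor(cv2.GaussianBlur(noise, (51, 51), 0), cv2.COLOR_GRAY2BGR)
        return cv2.addWeighted(image_bgr, 1 - alpha, veil, alpha, 0)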

References: [1] C. Yang, et al., “A fault-injection strategy for traction drive control systems,” IEEE Trans. Indust. Electron., vol. 64, no. 7, pp. 5719-5727, (2017). [2] Y. Liu, et al., “Fault injection attack on deep neural network,” in IEEE/ACM Int. Conf. Com.-Aided Design (ICCAD), pp. 131-138, (2017). [3] P. Su and D. Chen. “Using fault injection for the training of functions to detect soft errors of DNNs in automotive vehicles,” in Int. Conf. Dependability Complex Syst., pp. 308-318, (2022). [4] P. Mitra, et al., “Towards modeling of perception errors in autonomous vehicles,” in IEEE 21st Int. Conf. Intell. Transp. Sys. (ITSC), pp. 3024-3029, (2018). [5] F. Secci and A. Ceccarelli, “On failures of RGB cameras and their effects in autonomous driving applications,” in IEEE 31st Int. Symp. Soft. Rel. Eng. (ISSRE), pp. 13-24, (2020). [6] O. Mohammed (2022), “Fault injecting tool,” [Online]. Available: https://github.com/omarMohammed-USI/Faults-injecting-tool-USI, (accessed Dec. 9, 2022).

09:40
Enhancing Safety Assurance for Automated Driving Systems by Supporting Operation Simulation and Data Analysis
PRESENTER: Dejiu Chen

ABSTRACT. Automated Driving Systems (ADS) employ various techniques for operation perception, task planning and vehicle control. For driving on public roads, it is critical to guarantee the operational safety of such systems by attaining a Minimal Risk Condition (MRC) despite unexpected environmental disruptions, human errors, functional faults and security attacks. This paper proposes a methodology to automatically identify potentially highly critical operational conditions by leveraging design-time information in terms of vehicle architecture models and environment models. To identify the critical operating conditions, these design-time models are combined systematically with a variety of fault models for revealing the system behaviours in the presence of anomalies. The contributions of this paper are summarized as follows: 1) The design of a method for extracting related internal and external operational conditions in different system models. 2) The design of software services for identifying critical parameters and synthesizing operational data with fault injection. 3) The design of support for operation simulation and data analysis. In this paper, we present the design of a framework that aims at assuring the operational safety of ADS by connecting design-time system models and run-time monitoring service data. The design-time system models contain the specifications of the system architecture and the external environment. A software service (referred to as the critical environmental parameters identifier) is introduced to derive potentially safety-critical environmental conditions and the corresponding operational scenarios to be simulated. To verify and validate the robustness of the intended functionalities of the ADS, another software service (referred to as the critical vehicle parameters identifier) is employed to elicit potentially critical internal operational conditions of related system components of a vehicle for a specification of fault injection. For the simulation, these generated specifications of operational scenarios and faults are imported into a chosen simulation platform (e.g., CARLA). A system operation analyzer is designed to collect and analyze the operational data generated by the simulation runs. This analyzer automatically classifies the failure cases and the related component anomalies. The results are then fed back to the system models for the enrichment of system knowledge.

09:50
Complementary object detection: Improving reliability of object candidates using redundant detection approaches

ABSTRACT. In today's advanced technical applications, like autonomous driving, machine learning is utilized. Most of these methods work well in certain and/or trained situations but can fail in unknown or uncertain situations. Therefore, overreliance might lead to safety-critical situations. Detecting objects is a key task for the operation of safety-relevant automated systems, like autonomous vehicles. During operation, a set of static and dynamic objects must be continuously detected with high reliability. This requirement results from the consequences of failures for the system’s decision making and action realization. To address potential failures of an object detection system, different redundant approaches can be used. Object detection is usually performed by learning-based methods using models trained on sensor data. Recent research aims for fusion, combining different modalities and architectures to utilize their advantages. Due to the performance potential of learning-based methods and the resulting research interest, a wide variety of promising approaches is available. It is known that different approaches benefit from different features or conditions. It can be assumed that a combination of diverse approaches compensates for each other’s drawbacks and leads to improved reliability and robustness of the final prediction. However, the quantification of the reliability of a particular prediction is given by a detection score predicted by the trained model. While a higher score indicates higher confidence, it does not reflect the actual uncertainty of the prediction. It remains difficult to decide whether to accept or reject a prediction due to the lack of situational knowledge and of a reliable quality indicator. In this contribution, the fusion of detections from multiple detection systems at the detection level is studied using different opinion pooling strategies. The predicted detection score is calibrated using the true positive rate at a score level. This results in a standardized score across different detection approaches. Afterwards, detection candidates of different approaches are associated and a new detection candidate is generated in a fusion stage. Therefore, missed or false positive detections of one approach can be compensated based on a redundant set of predicted object candidates. The aim is to highlight certain detections, to reduce the detection score of false positive detections, and to provide a reliable measure of certainty during online operation. The fused results show improved performance compared to a single approach. Future work will aim at considering tracking and timeline information in order to further improve the fusion performance.
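
The calibration and pooling steps can be illustrated with a small sketch: raw detection scores are mapped to the empirical true-positive rate per score bin on validation data, and the calibrated scores of associated detections are combined by a linear opinion pool. The binning, weights and toy data are assumptions, and the association step (e.g. by IoU) is assumed to have been done beforehand.

    import numpy as np

    def calibrate(scores_val, is_tp_val, bins=10):
        """Map raw detection scores to the empirical true-positive rate per score bin,
        giving a calibrated score that is comparable across detectors."""
        edges = np.linspace(0.0, 1.0, bins + 1)
        idx = np.clip(np.digitize(scores_val, edges) - 1, 0, bins - 1)
        tp_rate = np.array([is_tp_val[idx == b].mean() if np.any(idx == b) else 0.0
                            for b in range(bins)])
        return lambda s: tp_rate[np.clip(np.digitize(s, edges) - 1, 0, bins - 1)]

    def linear_pool(calibrated_scores, weights=None):
        """Linear opinion pooling of calibrated scores from associated detections."""
        return float(np.average(calibrated_scores, weights=weights))

    # Toy usage with two detectors whose boxes were already associated
    cal_a = calibrate(np.array([0.2, 0.6, 0.9, 0.8]), np.array([0, 1, 1, 1]))
    cal_b = calibrate(np.array([0.3, 0.7, 0.95]), np.array([0, 1, 1]))
    fused = linear_pool([cal_a(0.85), cal_b(0.9)])
    print(f"fused calibrated score: {fused:.2f}")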

10:00
Safety hazard identification of inspection and maintenance operations for Automated Driving Systems in Mobility as a Service

ABSTRACT. Cooperative decision-making between humans and automated agents operating at various levels of autonomy (LoA) is an increasing trend observed across multiple industries and research areas. Assessing emerging properties and unintended behaviors in complex engineering systems is key to developing policies to prevent and mitigate risks during operation stages. An aspect often overlooked in analyses of autonomous system operation is developing and enforcing adequate inspection and maintenance policies. In this work, the Concurrent Task Analysis (CoTA) method is used to analyze the operation of a Level 4 Automated Driving System (L4 ADS) fleet employed for Mobility as a Service (MaaS). The method is employed to define tasks and responsibilities key to supporting the safe operation of the ADS vehicles based on a functional breakdown of the system, the development of operational scenarios, and the identification of safety hazards. The CoTA describes the interaction between distinct fleet operator agents (e.g., fleet monitoring and vehicle maintenance), identifies critical tasks, and traces cascading and latent failures between them. This paper presents the CoTA of the inspection and maintenance operational phases and discusses the safety implications on the fleet operator’s safety responsibilities to ensure adequate operation of the ADS fleet.

09:00-11:00 Session 20H: S39: Reinforcement Learning, Transfer Learning and Preventive Maintenance
09:00
Learnable wavelet transform and domain adversarial learning for enhanced bearing fault diagnosis

ABSTRACT. Unsupervised domain adaptation techniques have been widely used to detect the health conditions of rolling bearings. Despite the importance of cross-domain fault diagnosis, it has not received much attention for applications in noisy environments. To address this issue, we propose a novel architecture that combines learnable wavelet packet transform with domain adversarial neural networks (DANN-LWPT). The proposed method involves utilizing the learnable wavelet packet transform (LWPT) and wavelet packet transform (WPT) to decompose and reconstruct signals from the source and target domains. These reconstructed signals are then fed into a domain adversarial neural network (DANN). We introduce a guidance loss that dynamically enforces similarity between the source and target domain signals in the time-frequency domain during the process of decomposition and reconstruction, promoting the learning of domain-invariant and discriminative features. We compare our proposed method with other representative domain adaptation approaches, and the evaluation results show its superiority.
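
The adversarial part of such an architecture typically relies on a gradient reversal layer. The following PyTorch sketch shows this component only; the learnable wavelet packet transform and the guidance loss of the paper are not reproduced, and the feature dimensions and reversal weight lambda are arbitrary placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        """Gradient reversal layer at the core of DANN: identity in the forward
        pass, gradient multiplied by -lambda in the backward pass."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    class DomainDiscriminator(nn.Module):
        def __init__(self, n_features=64, lam=1.0):
            super().__init__()
            self.lam = lam
            self.head = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 2))

        def forward(self, features):
            return self.head(GradReverse.apply(features, self.lam))

    # Sketch of one training step (feature extractor and fault classifier assumed given):
    # f_src, f_tgt = extractor(x_src), extractor(x_tgt)
    # loss = F.cross_entropy(classifier(f_src), y_src) \
    #        + F.cross_entropy(disc(torch.cat([f_src, f_tgt])),
    #                          torch.cat([torch.zeros(len(f_src)), torch.ones(len(f_tgt))]).long())
    # (the paper additionally adds a guidance loss between the LWPT and WPT reconstructions)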

09:15
Calibrated Self-Training for Cross-Domain Bearing Fault Diagnosis
PRESENTER: Florent Forest

ABSTRACT. Fault diagnosis of rolling bearings is a crucial task in Prognostics and Health Management, as rolling elements are ubiquitous in industrial assets. Data-driven approaches based on deep neural networks have achieved great progress in this area. However, they require the collection of large representative labeled data sets. Yet, industrial assets are often operated in working conditions different from the one in which the labeled data have been collected, requiring a transfer between working conditions. In this work, we tackle classification of bearing fault types and severity levels under varying operating conditions, in the setting of unsupervised domain adaptation (UDA), where labeled data are available in a source domain and only unlabeled data are available in a different but related target domain. In self-training UDA methods, based on pseudo-labeling of target samples, one major challenge is to avoid error accumulation due to low-quality pseudo-labels. Most such methods select pseudo-labels based on their prediction confidence. However, it is well known that pseudo-labels are often over-confident and badly calibrated in the target domain. In this work, we aim to address these challenges and propose to incorporate post-hoc calibration, such as the well-known temperature scaling, into the self-training process to increase the quality of selected pseudo-labels. We propose calibrated versions of two self-training algorithms, Calibrated Pseudo-Labeling and Calibrated Adaptive Teacher, achieving competitive results on the Paderborn University (PU) benchmark.
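
Temperature scaling itself is compact; a minimal PyTorch sketch is given below, assuming a labeled source validation split for fitting T and a fixed confidence threshold of 0.9 for pseudo-label selection. Both are assumptions for illustration, not the paper's exact procedure.

    import torch
    import torch.nn.functional as F

    def fit_temperature(logits, labels, max_iter=50):
        """Post-hoc temperature scaling: find T > 0 minimizing the NLL of
        softmax(logits / T) on held-out labeled data."""
        log_t = torch.zeros(1, requires_grad=True)          # optimize log T for positivity
        opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

        def closure():
            opt.zero_grad()
            loss = F.cross_entropy(logits / log_t.exp(), labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()

    # Illustrative usage: calibrate on a labeled source validation split, then use the
    # calibrated confidences to select pseudo-labels on the unlabeled target domain.
    val_logits, val_labels = torch.randn(512, 4), torch.randint(0, 4, (512,))   # stand-ins
    T = fit_temperature(val_logits, val_labels)
    target_probs = F.softmax(torch.randn(1024, 4) / T, dim=1)
    selected = target_probs.max(dim=1).values > 0.9
    print(f"T = {T:.2f}, selected {int(selected.sum())} target pseudo-labels")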

09:30
Federated Transfer Learning for Condition-based Maintenance in Nuclear Power Plants
PRESENTER: Vivek Agarwal

ABSTRACT. The current fleet of nuclear power plants in the United States is transitioning from a time-consuming, labor-intensive, and cost-prohibitive preventive maintenance strategy to a condition-based predictive maintenance (PdM) strategy. A risk-informed PdM strategy will be discussed in the presentation. One aspect of the risk-informed PdM strategy is scalability; the presentation will therefore focus on a first-of-a-kind federated transfer learning (FTL) framework [1] (Fig. 1) for the condition-based maintenance of a circulating water system. The FTL framework developed and presented here details how an aggregated model obtained under federated learning at one plant site can diagnose the same fault mode at a different plant site using transfer learning. The fault used for demonstration is waterbox fouling in the circulating water system. The FTL framework was verified using a multi-kernel adaptive support vector machine and an artificial neural network; details are given in [1]. The results compare the performance of individual machine learning models with aggregated models as part of federated learning. In the case of transfer learning, the results of individual machine learning models are compared with transferred aggregated models with and without training on data from a new plant site. Finally, this paper presents some challenges associated with the development of the FTL framework.
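
The aggregation step of federated learning can be sketched as a FedAvg-style weighted parameter average. The model, the two-site setup and the sample-count weights below are placeholders, not the multi-kernel adaptive support vector machine or neural network used in [1].

    import copy
    import torch

    def federated_average(client_state_dicts, client_weights):
        """FedAvg-style aggregation: weighted average of model parameters trained
        locally at each plant site, without sharing the underlying plant data."""
        total = sum(client_weights)
        avg = copy.deepcopy(client_state_dicts[0])
        for key in avg:
            avg[key] = sum(w * sd[key] for sd, w in zip(client_state_dicts, client_weights)) / total
        return avg

    # Illustrative usage with two plant-site models of identical architecture
    model_a = torch.nn.Linear(8, 2)
    model_b = torch.nn.Linear(8, 2)
    global_state = federated_average([model_a.state_dict(), model_b.state_dict()],
                                     client_weights=[1200, 800])   # e.g. local sample counts
    model_a.load_state_dict(global_state)   # transferred aggregate, to be fine-tuned at a new site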

09:45
Condition monitoring of railway overhead catenary through point cloud processing

ABSTRACT. The railway overhead catenary (ROC) is a linear asset spread over a large area. Different regions of the linear asset are exposed to different climate conditions, such as temperature, wind, and ice accretion, and to different operating conditions. If these conditions disrupt its functionality, failure results, leading to line closure. Because the ROC is a linear asset, condition monitoring (CM) is difficult due to the large distances and climate conditions, costly due to the special equipment required on site, and disruptive to scheduled traffic because it occupies the tracks. Hence, there is a need for technologies to monitor the condition of the ROC through a cloud-based approach with a faster response time. Light Detection and Ranging (LiDAR) can be used for CM of the ROC. It collects spatial data in the form of 3D point clouds in various domains such as construction, mining and railways. LiDAR devices are mounted on locomotives operating in regular traffic. The point cloud data is processed to extract the railway assets, such as tracks, masts and catenary, and the surrounding vegetation. Further processing of the point cloud data can be used to extract the exact location and position of the assets. One failure mode of the ROC occurs when the distance between two specific wires is less than specified. This paper develops a cloud-based approach to measure the distance between specific wires through processing of the point cloud data. This approach forms the foundation for data augmentation and the development of hybrid digital twins (DT) of the railway overhead catenary.
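
A minimal sketch of the distance computation is given below, assuming the two wires have already been segmented from the point cloud into separate clusters. The synthetic geometry and the 0.5 m alert limit are illustrative assumptions rather than actual specifications.

    import numpy as np
    from scipy.spatial import cKDTree

    def min_wire_distance(contact_wire_pts, catenary_wire_pts):
        """Minimum distance between two extracted wire point clusters
        (N x 3 arrays in a common track coordinate frame)."""
        tree = cKDTree(catenary_wire_pts)
        d, _ = tree.query(contact_wire_pts, k=1)
        return float(d.min())

    # Illustrative usage with synthetic points standing in for LiDAR-extracted wires
    s = np.linspace(0, 50, 500)
    contact  = np.column_stack([s, np.zeros_like(s), 5.3 + 0.01 * np.sin(s)])
    catenary = np.column_stack([s, np.zeros_like(s), 6.5 - 0.3 * np.cos(s / 8)])
    d_min = min_wire_distance(contact, catenary)
    print(f"minimum wire separation: {d_min:.2f} m", "OK" if d_min > 0.5 else "ALERT")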

10:00
Predictive maintenance of mobile mining machinery: A case study for dumpers

ABSTRACT. The health of mobile mining machinery is critical to achieving effectiveness and efficiency in mining production. However, the performance of mobile mining machinery, such as dumpers, is influenced by factors such as the operational environment, machine reliability, the maintenance regime, human factors, etc., which lead to dumper downtime. These downtimes have significant consequences for the overall equipment effectiveness (OEE) and lead to decreased capacity, increased maintenance costs and reduced availability. Enabling prognostics and health management (PHM) can contribute to improving the OEE in mining production. Conventionally, the existing solutions focus mostly on the reliability and maintainability analysis of dumpers using failure data, maintenance data, operation data, etc. Though several existing methods utilize condition monitoring techniques, there is less focus on monitoring the engine vibration and the impact on the health of the driver. In addition, the existing solutions are offline-based rather than real-time and scalable. Hence, the objective of this paper is to develop a concept for the enablement of PHM for the engine and driver comfort of dumpers. Furthermore, a cloud-based solution for condition monitoring of dumpers has been designed and developed. The solution can be used to assess the engine vibrations and seat vibrations and to estimate the remaining useful life (RUL) of the selected features using standards. The cloud-based architecture is implemented on the AI Factory platform, which enables PHM for the improvement of OEE. This platform also facilitates the enablement of a digital twin for components and systems within dumpers or other mobile mining machinery.

10:15
Predictive maintenance of multi-component aircraft system using convolutional neural networks and deep reinforcement learning
PRESENTER: Juseong Lee

ABSTRACT. Predictive maintenance is a new approach to replacing components based on the data-driven Remaining-Useful-Life (RUL) prognostics. However, implementing predictive maintenance remains challenging for aircraft. First, as aircraft maintenance requires high reliability, it is necessary to quantify the uncertainty of the predicted RUL. Moreover, the maintenance of multi-component systems should be planned considering the updated RUL distributions of individual components and complex cost models. This paper proposes an integrated method for the predictive maintenance of multi-component aircraft systems. We estimate the probability distribution of RUL using convolutional neural networks and Monte Carlo dropouts. Then, deep reinforcement learning (DRL) is applied to plan the replacement of multiple components based on individual RUL distributions. This method considers the uncertainty of RUL predictions, risk of component failure, time-varying maintenance costs, and maintenance slot costs. A case study on the predictive replacement of two turbofan engines illustrates the proposed method. By considering the probability distribution of RUL and grouping some replacements, the proposed DRL-based predictive maintenance provides lowered long-term maintenance cost.
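
The uncertainty quantification step can be sketched with Monte Carlo dropout in PyTorch: dropout is kept active at inference and repeated forward passes yield a predictive RUL distribution. The network, sensor dimensions and sample counts below are placeholders, and the model is untrained, so the printed numbers serve only as a shape check.

    import torch
    import torch.nn as nn

    class RULNet(nn.Module):
        """Tiny 1-D CNN with dropout; MC dropout at inference yields an RUL distribution."""
        def __init__(self, n_channels=14):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.Dropout(0.2),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(16, 1))

        def forward(self, x):          # x: (batch, channels, time)
            return self.net(x)

    @torch.no_grad()
    def mc_dropout_rul(model, x, n_samples=100):
        """Keep dropout active at test time and sample the predictive RUL distribution."""
        model.train()                  # enables dropout; no gradients are computed
        preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.std(dim=0)

    # Illustrative usage on one sensor window per engine
    model = RULNet()
    window = torch.randn(2, 14, 30)          # 2 engines, 14 sensors, 30 cycles
    mean_rul, std_rul = mc_dropout_rul(model, window)
    print(mean_rul.squeeze().tolist(), std_rul.squeeze().tolist())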

10:30
Adversarial Multi-Agent Reinforcement Learning for Fault-Tolerant Design of Complex Systems
PRESENTER: Joachim Grimstad

ABSTRACT. As systems become more complex, designing and operating them becomes increasingly challenging, which increases the demands on engineers and operators. Ensuring the safety and reliability of such systems is crucial; however, traditional design methodologies may not prove adequate to manage the growing complexity of systems. In this paper, a conceptual approach to designing fault-tolerant complex systems is proposed. The approach extends Model-Based System Engineering (MBSE) with zero-sum game models. These models allow Adversarial Multi-Agent Reinforcement Learning (Adv-MARL) techniques to explore various strategies and assess outcomes. They also have the potential to identify vulnerabilities that can be addressed by refining the system design.

11:00
Digital Twin-based hybrid PHM framework for monitoring package-level degradation

ABSTRACT. The underlying idea of Digital Twin is a continuously updated virtual representation of an object, system, or process which replicates all phases in the lifecycle of its physical counterpart. Originally conceptualized in 2003, the term Digital Twin took shape after it was defined in NASA’s roadmap in 2010. But it has picked up a lot of traction in the past five years, becoming a widely used phrase in the context of products, processes, businesses, and more. The concept initially evolved in the context of aerospace and manufacturing applications, and was eventually embraced by many other industries such as healthcare and electronics.

The number of electronic devices used for various applications has grown steeply in the last ten years, with some of the applications requiring the devices to withstand harsher environments. Thus, prognostics and health management (PHM) of microelectronics has gained more importance than ever before. So far, the implementation of the Digital Twin has been adapted and contextualized based on the field of application; thus, it does not have a single fit-for-all definition or a standardized workflow. Therefore, it is crucial to create a clear framework for the implementation of a Digital Twin system for PHM of microelectronics. This paper presents such a framework based on the five-dimensional model of the Digital Twin.

First, both physics-based and data-driven approaches for lifetime prediction are discussed along with their individual limitations. Then, using both as building blocks, the implementation of a hybrid approach is presented. Other additional requirements for the hybrid approach, such as modelling the physics of degradation of materials, are also discussed. Fundamental differences between a model and a Digital Twin are addressed, and three different complexity levels (weak, cloud, and edge) of its connections to the physical product are described. The conflict between using a cloud and an edge approach for data-driven models, as well as the advantages of utilizing both approaches together, is discussed. Lastly, an example of implementing the hybrid approach for monitoring package-level thermal and hygroscopic degradation using a continuously updated model is discussed.

09:00-11:00 Session 20I: S.22: Development and application of methods for enhancing the reliability of electronic devices
09:00
Two Algorithms For Defect Detection in Wafer Fabrication

ABSTRACT. We propose two algorithms for defect detection in high-dimensional measurement data from wafer fabrication. The measurement data may be taken from different measurement steps and may include continuous values like electrical quantities and values from discrete value ranges like states of count registers.

The first of these algorithms is based on computing a similarity indicator of the form op(<x,y>/H(x)) with several possibilities for the operator op, where x (device under test) and y (some training chip out of a randomly selected set serving as samples) are obtained by scaling, followed by applying one of several thresholding methods.

The second of these algorithms is based on analyzing mode positions of the conditional distributions of positive and negative objects found in some random sample set. Based on this analysis, measurements can be ranked according to some specific definition of relevancy, which also implies a method for dimensional reduction. Then every non-sample chip is evaluated by counting for how many measurements of the selection the mode criterion is satisfied.

Both algorithms are designed to depend on only a few variables to optimize over, which makes finding an approximation of a global optimum, not just some local optimum, easy.
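
A minimal sketch of the first algorithm's similarity indicator is given below, assuming H(x) is the squared norm of x, op is the mean over the sample chips, and thresholding is a simple fixed cut-off. These concrete choices and the synthetic data are assumptions, since the paper optimizes over several operators and thresholding methods.

    import numpy as np

    def similarity_indicator(x, samples, op=np.mean):
        """Indicator of the form op(<x, y> / H(x)) over reference sample chips y.
        Here H(x) is taken as the squared norm of x and op as the mean -- both are
        assumptions for illustration."""
        x = np.asarray(x, dtype=float)
        h = float(np.dot(x, x)) or 1.0
        return op([float(np.dot(x, y)) / h for y in samples])

    def flag_defects(chips, samples, threshold):
        """Flag chips whose indicator falls below a threshold tuned on training data."""
        return [similarity_indicator(c, samples) < threshold for c in chips]

    # Purely illustrative data: rows = chips, columns = scaled measurements
    rng = np.random.default_rng(2)
    good = rng.normal(5.0, 1.0, size=(60, 20))
    samples = good[:10]                      # randomly selected training chips
    defective = good[10:12].copy()
    defective[:, :2] = 50.0                  # two out-of-spec measurement values
    print(flag_defects(np.vstack([good[12:14], defective]), samples, threshold=0.8))
    # expected: the two corrupted chips are flagged, the two good chips are not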

For each of the two algorithms, we present an algorithmic method for dimensional reduction. Applying this as an extra preprocessing step may reduce the necessary amount of measurements, in some settings even to as little as three per chip.

The implementations of these algorithms have been applied to measurement data of more than 100,000 chips from real-world wafer fabrication of different semiconductor products in the course of the iRel40 project. A selection of representative results can be presented, where we use Cohen's Kappa value, accuracy, the sizes of the sample sets and the necessary number of measurements as measures of success.

One ability of the second algorithm is learning to detect defects in certain lots of front-end measurement data with small sample sets and/or only few measurements per chip, where optimizing for minimum sample set size and optimizing for minimum number of measurements by dimensional reduction may show surprisingly small numbers in certain applications.

The time complexity of both algorithms is essentially multi-linear in the input data sizes, up to logarithmic factors.

Furthermore, we can give some stochastic analysis of the defect indicator underlying the first algorithm, whereby the subject of analysis is the quotient of two integer random variables, leading to approximation formulas consisting of summed exponentials. We also defined a modified indicator allowing for somewhat simpler expressions in analysis.

References:

[1] https://arxiv.org/abs/2111.03727v1

[2] https://arxiv.org/abs/2108.11757v1

09:15
Transient thermal analysis, test verification and structural optimization on the integrated flywheel

ABSTRACT. With the requirements of agility and miniaturization in the aerospace industry, the high-integration concept has been widely applied to the design of conventional mechanic-electric products. Taking the brand-new flywheel as an example, the mechanism and the control parts have been merged in recent years. Meanwhile, the maneuvering-condition test of the flywheel usually leads to higher heat consumption and more stringent thermal reliability requirements on the components, which are used to determine the design margin and the derating level. In order to protect the product and reduce the failure risk, it is necessary to analyze the transient thermal distribution of the flywheel and the dissipation paths of the PCB layout with the help of simulation. Additionally, it is feasible to improve the thermal reliability by using structural optimization methods. Based on the statements above, the transient thermal analysis of the integrated mechanic-electric flywheel is conducted and the simulation model is established with FEA software. Under the maneuvering conditions, particular attention is paid to the influence of heat consumption variation on the flywheel's circuits. The temperature-time curves of the components of interest are plotted and the location of the highest temperature value is pointed out. It is shown that the components with high temperatures are primarily concentrated in the power module and the drive module. After that, the thermal balancing test is carried out to verify the simulation accuracy. It is concluded that the simulation results show good consistency with the test data, and the computational errors are mostly less than 3°C. Finally, structural optimization of the component placement is carried out to improve the thermal reliability of the integrated flywheel through definition of the temperature target, the region constraint and the cycle iteration.

09:30
An all-digital HW monitoring system at run-time to improve the reliability of electronic devices
PRESENTER: Luigi Pomante

ABSTRACT. This paper describes a use case developed in the context of the ECSEL iRel40 European research project. The main goal of this use case is to show how an all-digital hardware monitoring system can be exploited at run-time to improve reliability, with a focus on verification activities. For this purpose, a pacemaker has been developed in two versions. One version is based on a Commercial Off-The-Shelf microcontroller, where some properties have been verified by means of a classical offline trace-analysis approach. The other version is based on a soft-core on a Field Programmable Gate Array, where the same properties have been verified at run-time by means of the adopted all-digital hardware monitoring system. The comparison of the two verification approaches shows how it is possible (i) to reduce the time needed to perform verification and (ii) to verify more complex properties than classical Built-In Self-Test approaches allow.

09:45
Data-Driven Remaining Useful Life Estimation of Discrete Power Electronic Devices

ABSTRACT. Robust and accurate prognostics models for the estimation of remaining useful life (RUL) are becoming an increasingly important aspect of research in the reliability and safety of modern electronic components and systems. In this work, a data-driven approach to the prognostics problem is presented. In particular, machine learning models are trained to predict the RUL of wire-bonded silicon carbide (SiC) metal–oxide–semiconductor field-effect transistors (MOSFETs) subjected to power cycling until failure. During power cycling, the ON-state voltage and various temperature measurements are continuously collected. As the data set contains full run-to-failure trajectories, the problem of estimating RUL is naturally formulated as supervised learning. Three neural network architectures were trained, evaluated, and compared on the RUL problem: a temporal convolutional neural network (TCN), a long short-term memory neural network (LSTM) and a convolutional gated recurrent neural network (Conv-GRU). While the results show that all networks perform well on held-out test data when the test samples have an aging acceleration similar to the samples in the training data set, performance on out-of-distribution data is significantly lower. To this end, we discuss potential research directions to improve model performance in such scenarios.
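As a rough illustration of the supervised formulation described above, the sketch below trains a small LSTM regressor on windowed degradation signals; the placeholder data, window length and network size are assumptions for illustration and do not reproduce the architectures or data of the study.

# Minimal sketch of RUL regression as supervised learning on windowed sensor
# sequences (random placeholder data); shapes and hyperparameters are
# illustrative assumptions, not those of the study.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, F = 50, 4                    # window length, number of signals (e.g. ON-state voltage, temperatures)
X = np.random.rand(256, T, F)   # placeholder windows cut from run-to-failure trajectories
y = np.random.rand(256, 1)      # placeholder normalized RUL targets

model = keras.Sequential([
    layers.Input(shape=(T, F)),
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),            # predicted (normalized) remaining useful life
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)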

10:00
There is no superior maintenance style in asset management

ABSTRACT. Enhancing the reliability of Electronic Components and Systems (ECS) means optimizing certain (usually undetermined) characteristics of a system in a certain (usually undetermined) phase of the life cycle. The ECSEL JU European Project “Intelligent Reliability 4.0” [1] rephrases this objective as enhancing and ensuring the reliability of ECS by reducing their failure rates along the entire value chain. But does this also encompass the maintenance style and activities? Inspection, fault detection, diagnostics and prognostics at component and subsystem levels will contribute to the quantitative assessment of performance and other properties of system- and application-level functions that can be linked to a multidimensional impact space specific to the organisational context. The public-private, context-driven weights of this impact space, or Health Indices, determine to a large extent which maintenance style is most effective and affordable. A framework is proposed and presented to obtain a consistent and objective relationship between the observed corporate values and the actions performed to maintain them at their proper levels. Application of this framework is demonstrated using examples from the field of power systems. The examples illustrate the role of fault detection, diagnostics and subsequent prognostics in relation to the maintenance strategies and the intended contribution to the corporate values. Attention is paid to the bottom part of the bathtub curve: “the range that covers most of the operational life, where maintenance style, repair and system configuration largely determine the system performance” [3]. Operational life can be extended by repair or replacement of components in order to reset the wear-out process and keep the system failure rate and availability at acceptable levels. The maintenance style will be affected by the possibilities and limitations of diagnostics for monitoring the wear-out condition of a component, or by the ability to classify a component as an early wear-out or a normal component. The availability of a workforce may also determine the maintenance style. The optimization of the mix of maintenance styles is the subject of Reliability Centered Maintenance [3]. Examples of water treeing, partial discharge detection and thermal aging will be given.

References:

[1] Intelligent Reliability 4.0, URL https://www.irel40.eu/, last checked: 30 January 2023.

[2] R. Ross, Reliability Analysis for Asset Management of Electric Power Grids, Hoboken, NJ: Wiley-IEEE Press, 2019.

[3] R. Ross, “Prognostics and Health Management for Power Electronics and Electrical Power Systems”, in Proceedings of the 9th International Conference on Condition Monitoring and Diagnosis 2022 (CMD 2022), Kitakyushu, 2022.

[4] R. Ross, “Diagnostics, Maintenance & Replacement Strategies Applied to Insulators & Cable Systems”, in Proceedings of the INMR World Congress, Berlin, 2022.

10:15
Using Predictive Analytics Approaches to Investigate Climatic Reliability and Humidity Robustness issues of PCBA under Different Conditions
PRESENTER: Sajjad Bahrebar

ABSTRACT. Nowadays, the climatic reliability and humidity robustness of electronic devices have become a significant issue for both consumer and industrial electronics for various reasons. One reason for this growing problem is the widespread use of electronics in many locations. The climatic reliability of electronics is attributed to the interaction of external climatic conditions with the characteristics of the printed circuit board assembly (PCBA), the main part of every electronic device, which can compromise the performance of electronics through electrochemical failure processes. Improving the reliability of electronics requires a detailed understanding of the synergistic and interaction effects of various controllable factors, such as humidity, temperature, pitch distance, voltage, contamination type, and contamination level. Moreover, it is crucial for reliability assessment to understand the relative importance of the factors and their levels, so that remedial action can be taken at an early stage by selecting the best PCBA material and soldering process and by optimizing the design for particular applications and climatic conditions. This study presents the most suitable approach and prediction model for the input datasets, using a combination of statistical analysis, probabilistic approaches, and machine learning algorithms to predict LC, TTF, failure state, and high-risk conditions, which provides a better perspective on PCBA reliability and helps to reduce electronic waste due to failure.
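As a generic illustration of such a tabular prediction task, the sketch below fits a random-forest classifier to synthetic data with the factors listed above and reports feature importances; the feature ranges, the toy failure rule and the model choice are assumptions and not the approach selected in the study.

# Generic illustration with synthetic data: predicting a binary failure state
# from the controllable factors listed above and inspecting their relative
# importance. Ranges, failure rule and model choice are assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.uniform(40, 99, n),          # relative humidity [%]
    rng.uniform(20, 60, n),          # temperature [degC]
    rng.choice([0.3, 0.5, 1.0], n),  # pitch distance [mm]
    rng.uniform(5, 24, n),           # bias voltage [V]
    rng.uniform(0, 10, n),           # contamination level [ug/cm^2]
])
y = (X[:, 0] > 80) & (X[:, 4] > 5)   # toy failure rule for demonstration only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("relative factor importances:", clf.feature_importances_)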

10:30
Process Data Analysis for Improved Burn-In Strategies Based on Complementary AI Models
PRESENTER: Lars Langenberg

ABSTRACT. Safety-critical applications of semiconductors, such as in automotive systems, require either 100% control or advanced sampling and test strategies to ensure that the high quality targets are met. Burn-in (BI) is the established test method in which devices are measured under stress conditions, such as increased temperature, to screen out early-life failures. Advanced sampling and test strategies have been developed to reduce BI times and/or sample sizes without affecting the defined quality targets, e.g. based on flexible sampling plans, previous BI studies [1] or AI models trained on burn-in process data [2]. This paper presents a novel approach to assess the health status of a wafer lot based on its individual path through the production process. In semiconductor manufacturing, Advanced Process Control (APC) is already deeply embedded in the process. While originally intended for process control and automation on the shop floor, an APC system provides a vast amount of data that allows the specific production process of a given wafer lot to be analysed comprehensively. In our work, real APC data was retrieved from a 15-month period during which BI tests revealed several failures. The investigated datasets contain time-series data from all processing steps, such as the involved machines and actions, their key process parameters, and the detected deviations. Each dataset contains about 2 to 3 million samples per wafer lot and up to 52 features. The data do not include any raw sensor data or visual inspection data, which are typically used for AI-based wafer analysis; APC data can therefore be viewed as high-level meta-information that aggregates the underlying raw data. As expected, the datasets are highly imbalanced due to the very low defect rates; nevertheless, 10 sets with known BI defects were retrieved. An LSTM-based autoencoder model was selected first and trained on the wafer lots without known failures. The comparison of the loss functions indicated a slight increase, by about 26%, in the mean average error of the “bad” lots compared to the “good” lots. Still, after data reduction and variation of the training data, the accuracy remained unsatisfactory. Therefore, a complementary binary classification model, a deep neural network with convolutional layers and a fully connected output layer, was trained on the same datasets. The new model confirms the results of the LSTM autoencoder, but with a much higher accuracy of > 90%, as validated with the available test data. The resulting classification probability can now be used as a health indicator to optimize the BI strategy: wafer lots with a high health indicator will need less or no BI testing compared to lots with a low health indicator. In addition, the combined approach generates further insights which a single AI method cannot achieve. Anomalies in the loss function of the autoencoder indicate deviations from an average APC dataset. In cases of a low health indicator, these anomalies point to specific process steps or machines that are responsible for the AI-based health assessment and, therefore, may be root causes of the observed failures.

References: [1] https://doi.org/10.1002/qre.2896 [2] https://doi.org/10.3850/978-981-18-2016-8_763-cd
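As a minimal sketch of the autoencoder-based first step described in the abstract above, the code below trains an LSTM autoencoder on sequences from “good” lots and uses the reconstruction error as an anomaly score; the placeholder data, down-sampled sequence length and layer sizes are assumptions and do not reflect the production APC pipeline.

# Minimal sketch of the LSTM-autoencoder step: train on "good" lots only and
# use the per-lot reconstruction error as an anomaly score. Placeholder data,
# sequence length and layer sizes are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, F = 200, 52                     # down-sampled time steps per lot, APC features
X_good = np.random.rand(64, T, F)  # placeholder sequences from lots without BI failures

autoencoder = keras.Sequential([
    layers.Input(shape=(T, F)),
    layers.LSTM(32),                             # encoder: compress the sequence
    layers.RepeatVector(T),                      # repeat latent vector for the decoder
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(F)),     # reconstruct the feature sequence
])
autoencoder.compile(optimizer="adam", loss="mae")
autoencoder.fit(X_good, X_good, epochs=2, batch_size=8, verbose=0)

# Higher reconstruction error suggests a deviation from an "average" APC trajectory.
X_new = np.random.rand(4, T, F)
errors = np.mean(np.abs(autoencoder.predict(X_new, verbose=0) - X_new), axis=(1, 2))
print("per-lot reconstruction error:", errors)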

10:45
Impact of thermo-mechanical effects on the measurement behavior of a molded SO16 System-in-Package current sensor investigated by a physics-based design-for-reliability approach
PRESENTER: Heiner Moeller

ABSTRACT. Future mobility applications will generate new demands on electronic components and systems. Due to the progressing electrification, accurate and reliable information from current sensors is essential during operation to control the whole electrical system. Harsh environmental conditions as well as specific mission profiles already have to be considered during the development phase [1] of appropriate sensor devices. Thermo-mechanical effects caused by temperature changes during production and operation, as well as influences from design- or production-related tolerances, can lead to unfavourable situations. Unfortunately, several aspects and quantitative influences are uncertain at an early design stage, and some inherent unknowns remain throughout the whole product life cycle. By incorporating parametric sensitivity studies and modern CAE methods, it is possible to address these issues and achieve a deeper understanding of the physical device behaviour. In this work, a physics-based design-for-reliability approach is presented that investigates the thermo-mechanical behaviour of a current sensor device during production and operation. The analysed moulded SO16 System-in-Package (SiP) is exposed to different temperature loads which constantly affect the intrinsic stress/strain distribution at the internal AMR sensor surface. This can lead to a deviation of the final offset voltage and result in a sensor calibration inaccuracy. Consequently, this would affect the overall measurement performance of the whole AMR sensor-based device during operation. To understand the device behaviour and help improve the later product reliability for the intended operating environment, various aspects had to be considered. Production-related influences from materials (Young’s modulus, CTE, post-mould-cure shrinkage, etc.) and processes (dwell temperatures and dwell times) are important for the initial device state, as shown in a previous publication [2]. Furthermore, a certain mould compound ageing was recognised during different reliability tests. This normally manifests as a growing oxidized layer at the outer surface of the moulded package with changed material properties. Depending on the mission profile during operation, this is suspected to have a significant impact on the stress/strain at the AMR sensor and thus possibly on the measurement accuracy. As a result, the generated design-for-reliability “environment” represents a comprehensive approach combining production process and operation for examining possible thermo-mechanical effects.

Acknowledgment: The authors gratefully acknowledge the funding of this work by the project iRel40 from the ECSEL Joint Undertaking (JU) under grant agreement No. 876659 within the European Horizon 2020 programme. The project received co-funding from the German BMBF (grant agreements No. 16MEE0084S and 16MEE0100) and the Saxon Ministry for Economics and Labour (SMWA) in Germany.

[1] Kapur, C.K., Pecht, M.: Reliability engineering, Chapter 11 Probabilistic Design for Reliability and the Factor of Safety, 2014

[2] H. Moeller; H. Knoll; P. Hille; R. Dudek; S. Rzepka; “Improving the production quality and robustness of a SO16 sensor package by a simulation based digital twin approach”, 9th Electronics System-Integration Technology Conference (ESTC) 2022, 13th – 16th September 2022, Sibiu, Romania

11:00
Prediction of the Number of Defectives in a Production Batch of Semiconductor Devices
PRESENTER: Ibrahim Ahmed

ABSTRACT. The production of semiconductor devices must satisfy high reliability standards. For this reason, manufactured devices are subjected to burn-in, i.e., extensive testing under accelerated stress conditions, which is costly and time-consuming. The present work develops a model for predicting the quality of the devices from data collected during the production process. The developed modelling approach is based on: a) a combination of Piecewise Aggregate Approximation (PAA) and Principal Component Analysis (PCA) for the extraction of features relevant for inferring the quality of the devices from signal measurements collected during production; and b) a model based on Probabilistic Support Vector Regression (PSVR) for predicting the number of defective devices in the production batch. The model is validated on synthetic data, which emulate signal measurements collected during the production of semiconductor devices. The obtained results show that the proposed model is able to predict the number of defective devices in a production batch with satisfactory accuracy.
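As a minimal sketch of the feature-extraction chain in a), the code below applies piecewise aggregate approximation to each signal and compresses the result with PCA; a plain support vector regressor then stands in for the probabilistic SVR of step b). All data and shapes are placeholder assumptions.

# Minimal sketch of the feature-extraction chain: PAA of each production
# signal followed by PCA. A plain SVR stands in for the probabilistic SVR;
# data and shapes are illustrative placeholders, not the paper's synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR

def paa(signal, n_segments):
    """Mean value of each of n_segments equal-length pieces of a 1-D signal."""
    usable = len(signal) // n_segments * n_segments
    return signal[:usable].reshape(n_segments, -1).mean(axis=1)

rng = np.random.default_rng(0)
n_batches, signal_len, n_segments = 120, 1000, 20
signals = rng.normal(size=(n_batches, signal_len))        # placeholder process signals
defectives = rng.integers(0, 10, size=n_batches)          # placeholder targets per batch

X_paa = np.vstack([paa(s, n_segments) for s in signals])  # PAA features
X_feat = PCA(n_components=5).fit_transform(X_paa)         # compressed features

model = SVR().fit(X_feat, defectives)                     # stand-in for PSVR
print("predicted number of defectives (first 5 batches):", model.predict(X_feat[:5]))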

11:00-11:30 Coffee Break
11:30-12:10 Session 21: Plenary talk: Professor Jan Erik Vinnem

Professor Jan Erik Vinnem, Emeritus Professor at NTNU, on Maritime Safety.

13:00-14:00 Lunch Break