SEAA 2024: 50TH EUROMICRO CONFERENCE SERIES ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS
PROGRAM FOR WEDNESDAY, AUGUST 28TH


10:30-10:45 Coffee Break
10:45-12:15 Session 4A: SEAA Session-1: SM (1)
Location: Room 108
10:45
Predicting Software Functional Size Using Natural Language Processing: An Exploratory Case Study

ABSTRACT. Software Size Measurement (SSM) plays an essential role in software project management as it enables the acquisition of software size, which is the primary input for development effort and schedule estimation. However, many small and medium-sized companies cannot perform objective SSM and Software Effort Estimation (SEE) due to the lack of resources and an expert workforce. This results in inadequate estimates and projects exceeding the planned time and budget. Therefore, organizations need to perform objective SSM and SEE using minimal resources and without an expert workforce. In this research, we conducted an exploratory case study to predict the functional size of software project requirements using state-of-the-art large language models (LLMs). To this end, we fine-tuned BERT and BERT_SE with a set of user stories and their respective functional sizes in COSMIC Function Points (CFP). We gathered the user stories from different project requirement documents. Although we used a relatively small dataset to train the models, we obtained promising size prediction results, with 0.74 accuracy in total size and between 0.84 and 0.92 accuracy in data movement size.
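
For readers unfamiliar with the setup, the following is a minimal sketch of the kind of fine-tuning pipeline the abstract describes, framing CFP prediction as single-output regression on a Hugging Face BERT model; the user stories, labels, and hyperparameters are illustrative, not the authors' actual data or configuration.

    # Minimal sketch: fine-tuning BERT to predict COSMIC Function Points (CFP)
    # from user stories, framed as single-output regression. Data and
    # hyperparameters are illustrative, not the paper's actual setup.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=1, problem_type="regression")

    stories = ["As a user, I want to register an account ...",
               "As an admin, I want to export monthly reports ..."]
    sizes = torch.tensor([[3.0], [5.0]])  # CFP labels (hypothetical)

    batch = tokenizer(stories, padding=True, truncation=True, return_tensors="pt")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for _ in range(3):  # a few epochs over the toy batch
        out = model(**batch, labels=sizes)  # MSE loss for regression heads
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.eval()
    with torch.no_grad():
        pred = model(**batch).logits  # predicted CFP per user story
    print(pred.squeeze(-1))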

11:10
Towards the Construction of a Software Benchmarking Dataset via Systematic Literature Review

ABSTRACT. Effort estimation is a fundamental task during the planning of software projects. Prediction models usually rely on two essential factors: software size and effort data. Measuring the size of the software can be done at various stages of the project with the desired accuracy. Nevertheless, the industry faces challenges when it comes to collecting reliable actual effort data. Consequently, organizations encounter difficulties in establishing effort prediction models. Benchmarking datasets are available, but in most cases they have huge variances that make them less useful for effort prediction. In this study, we aimed to answer whether a software benchmarking dataset can be created by gathering data from the literature. To the best of our knowledge, no comprehensive dataset gathers the functional size and effort data of studies from the literature. For this purpose, we performed a systematic literature review to find studies that include projects measured with the COSMIC Functional Size Measurement (FSM) method and the related effort. As a result, we formed a dataset of 337 records from 18 studies that shared the corresponding size and effort data. Although we performed a limited search, we created a larger dataset than many in the literature. In light of our review, we found that most studies did not share their dataset, and many lacked case details such as the implementation environment and the scope of the software development life cycle activities included in the effort data. We also compared the dataset with the ISBSG repository and found that our dataset has less variation in productivity. Our review showed that creating a software benchmarking dataset by gathering data from the literature is feasible. In conclusion, this study addresses gaps in the literature through a cost-free and easily extendable dataset.

11:35
Using generative AI to support standardization work – the case of 3GPP

ABSTRACT. Standardization processes build upon consensus between partners, which depends on their ability to identify points of disagreement and resolve them. Large standardization organizations, like 3GPP or ISO, rely on leaders of work packages who can correctly and efficiently identify disagreements, discuss them, and reach a consensus. This task, however, is effort- and labor-intensive as well as costly. In this paper, we address the problem of identifying similarities, dissimilarities, and discussion points using large language models. In a design science research study, we work with one of the organizations which leads several workgroups in the 3GPP standard. Our goal is to understand how well language models can support the standardization process in becoming more cost-efficient, faster, and more reliable. Our results show that generic models for text summarization correlate well with domain experts' and delegates' assessments (Pearson correlation between 0.66 and 0.98), but that domain-specific models are needed to provide better discussion materials for the standardization groups.
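
As a small illustration of the reported evaluation step, the snippet below computes a Pearson correlation between model-produced scores and expert ratings; all values are invented for illustration.

    # Sketch: correlating model-based similarity scores with expert assessments,
    # as in the paper's evaluation (values below are made up for illustration).
    from scipy.stats import pearsonr

    model_scores  = [0.82, 0.55, 0.91, 0.33, 0.74, 0.60]  # e.g., summary similarity
    expert_rating = [4.5,  3.0,  4.8,  2.0,  4.0,  3.4]   # e.g., delegate's 1-5 score

    r, p_value = pearsonr(model_scores, expert_rating)
    print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")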

12:00
Different Strokes for Different Folks: A Comparison of Developer and Tester Views on Testing

ABSTRACT. In this paper, we re-analyse the data from a previous study by Straubinger et al., which asked 284 industrial IT staff about their views on testing. That study included both developers and dedicated testers in its data, but treated the two roles as one. In this paper, we posit that the two roles (i.e., developer and tester) are different enough that each should be analysed separately; testers have unique insights into testing, so separating their views and experiences from those of developers is important. To this end, we analyse six of the same research questions as the original study, using separate developer and tester data. Results showed that for almost every question we re-visited, testers differed in their opinions from developers, whether on the type of testing they did, measures of code quality, effort to write tests, or motivation for testing.

10:45-12:15 Session 4B: SEAA Session-2: SMSE (1)
Location: Room 107
10:45
A Systematic Mapping Study on Teaching of Security Concepts in Programming Courses

ABSTRACT. Context: To effectively defend against ever-evolving cybersecurity threats, software systems should be made as secure as possible. To achieve this, software developers should understand potential vulnerabilities and apply secure coding practices. To prepare these skilled professionals, it is important that cybersecurity concepts are included in programming courses taught at universities. Objective: To present a comprehensive and unbiased literature review on the teaching of cybersecurity concepts in programming courses taught at universities. Method: We perform a Systematic Mapping Study. We present six research questions, define our selection criteria, and develop a classification scheme. Results and Conclusions: We select 24 publications. Our results show a wide range of research contributions. We also outline guidelines and identify opportunities for future studies. The guidelines include coverage of security knowledge categories and evaluation of contributions. We suggest that future studies should cover security issues, negative impacts, and countermeasures, as well as apply evaluation techniques that examine students' knowledge. The opportunities for future studies relate to advanced courses, security knowledge frameworks, and programming environments. Furthermore, there is a need for a holistic security framework that covers the security concepts identified in this study and is suitable for education.

11:10
The Past, Present, and Future of Research on the Continuous Development of AI

ABSTRACT. Since 2020, 33 literature reviews have systematically synthesized research on the continuous development of AI, also known as Machine Learning Operations (MLOps), reflecting the increasing prevalence of AI models across various fields and the multifaceted challenges in their development, integration, and deployment. Yet the lack of a comprehensive analysis of these literature reviews and the topics they cover complicates selecting relevant ones and anticipating future trends. In addition, these literature reviews gathered 1397 primary sources describing aspects of AI's continuous development, integration, and deployment, a hidden gem for understanding past and present work and deriving insights into the future of AI's continuous development.

With this work, we 1) systematically collect and summarise the existing 33 literature reviews; 2) exploit the minimal overlap between the literature reviews' primary sources to offer holistic insights into frequently addressed topics and their interrelations, encompassing Software Engineering (SE) practices for AI, the AI development pipeline, and associated challenges; and 3) discuss future research directions for the continuous development, integration, and deployment of AI, basing our arguments on clusters identified in the primary sources of the reviewed literature. This discussion focuses on AI model reliability and resource consumption, emphasizing the interrelation of proposed solutions and their effects on the whole pipeline.

11:35
Artificial Intelligence Methods in Software Refactoring: A Systematic Literature Review

ABSTRACT. Refactoring is an important process in software engineering, aiming to improve code quality without altering behavior. This article presents a systematic literature review (SLR) on artificial intelligence in the domain of software refactoring. Following a rigorous methodology consisting of automated data extraction, snowballing techniques, and manual validation, we created a dataset of 156 articles. The focus of the investigation was to identify which refactoring stages are addressed. The results show that, in terms of research type, most contributions propose solutions, while other forms of research, such as evaluation, validation, and experience reports, are less represented in publications. Refactoring detection attracts the most interest in research contributions, while other refactoring stages, such as prioritization or testing, are less investigated. The most commonly used AI methods include Random Forests, Genetic Algorithms, SVMs, CNNs, and Decision Trees. Based on this literature review, we have identified research trends and opportunities for future research.

12:00
User Experience and Security in Digital Health Applications: Results from a Rapid Review

ABSTRACT. In recent years, there has been growing interest in the adoption of medical web and mobile applications and, more generally, applications in the digital health (DHEAL) field. These applications are designed for a wide range of users, from novices to experts, including end-users with and without disabilities, often without adequately considering their unique software security needs. In this short paper, we present the results of a Rapid Review (RR) to identify existing approaches and methods to assess User Experience (UX), usability, accessibility, and/or security in DHEAL applications. This RR has been conducted in the context of a research project (“DHEAL–COM: Digital Health Solutions in Community Medicine”). Among other goals, the objective of DHEAL–COM is to delve into the complex relationship between UX (and its variants, like usability and accessibility) and security, i.e., to understand to what extent the principle of acceptability in security is taken into account when developing DHEAL applications. The outcomes of our RR should provide evidence to the stakeholders involved in the DHEAL–COM project and to researchers and practitioners who work in the DHEAL context. The findings of our RR emerge from 39 papers and can be summarized as follows: (i) there are several methods to assess usability; (ii) the most common methods focus only on common usability aspects, and in a few cases these methods concern the accessibility and credibility of the content; (iii) there are several methods to assess security, most of which are dictated by legislative rules; (iv) although the difficulty of finding a compromise between usability and security is clear in many cases, there are neither solutions nor approaches to deal with both of them.

10:45-12:15 Session 4C: SEAA Session-3: DAIDE (1)
Location: Room 105
10:45
Analyzing the Potency of Pretrained Transformer Models for Automated Program Repair

ABSTRACT. Manually finding and fixing bugs is cumbersome work, which consumes valuable resources in the software development cycle. In this work, we examine the capability of pretrained transformer models to tackle the task of automated program repair. Previous research has focused on inherently different machine learning architectures for this use case. Our contributions include a novel dataset for fine-tuning the models, the introduction of a windowing technique augmenting the pretrained model, and an evaluation on the commonly used Defects4J benchmark along with an ablation study. The findings demonstrate that leveraging our dataset leads to enhanced model performance surpassing Bugs2Fix. Our model enhancements significantly boost overall performance, enabling the resulting models to achieve parity with the current state of the art by fixing 30 bugs in 27 minutes on Defects4J. This shows that pretrained transformers are promising for the task of automated bug fixing and should be considered by future research. However, as with the existing state-of-the-art solutions, the performance still needs to be improved to provide practical benefits to end users.
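
The abstract does not detail the windowing technique; one plausible reading, sketched below under that assumption, is to cut a fixed-size window of context around the suspected buggy line so that long files fit a transformer's input limit.

    # Hypothetical illustration of a windowing step for program repair:
    # keep only max_lines of context centered on the suspected buggy line,
    # so long source files fit within a transformer's input limit.
    def window(source: str, buggy_line: int, max_lines: int = 20) -> str:
        lines = source.splitlines()
        half = max_lines // 2
        start = max(0, buggy_line - half)
        end = min(len(lines), start + max_lines)
        start = max(0, end - max_lines)  # re-anchor when near the end of file
        return "\n".join(lines[start:end])

    code = "\n".join(f"line {i}" for i in range(100))
    print(window(code, buggy_line=42))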

11:10
Inter-organizational Data Sharing Processes – an exploratory analysis of incentives and challenges

ABSTRACT. Businesses across different areas of interest increasingly depend on data, particularly for machine learning (ML) applications. To ensure data provisioning, inter-organizational data sharing has been proposed, e.g., in the form of data ecosystems.

The aim of this study was to perform an exploratory investigation into the data sharing practices that exist in business-to-business (B2B) and business-to-customer (B2C) relations, in order to shape a knowledge foundation for future research. We launched a qualitative survey, using interviews as the data collection method. We conducted and analyzed eleven interviews with representatives from seven different companies across several industries, aiming to find key practices as well as differences and similarities between approaches, so that we could formulate future research goals and questions.

We grouped the core findings of this study into three categories: organizational aspects of data sharing, where we noticed the importance of data sharing and data ownership as business drivers; technical aspects of data sharing, related to data types, formats, maintenance, and infrastructures; and challenges, with privacy being the highest concern, alongside data volumes and the cost of data.

11:35
Experimentation in Software Ecosystems: a Systematic Literature Review

ABSTRACT. Context: Software ecosystems have transformed many industries, redefining collaboration and value co-creation. The success of such ecosystems depends on the dynamism of the network of users on their different sides. Consequently, decision-making in such multifaceted and interconnected environments is more complex than in conventional software products. Online controlled experiments are considered the gold standard for aiding decision-making in software engineering processes. Experiments are extensively used to reduce bias and estimation noise for design, engineering, and business decisions. However, experimentation in software ecosystems is inherently more complex, as it deals with atypical sources of bias and technical complications. Primary studies of experimentation approaches in software ecosystems are scattered across multiple domains and disciplines, and secondary research on the topic is scarce, as highlighted in different tertiary studies. Hence, we conducted this study.

Objectives: To explore primary research on experimentation in software ecosystems; to summarize current approaches, toolboxes, and solutions that practitioners and researchers facing similar problems can use to inform their approaches; and to outline underexplored research areas and provide recommendations for practitioners.

Method: We conducted a systematic literature review. The search strategy, application of exclusion and inclusion criteria, and subsequent quality assessment resulted in 63 relevant studies. A data extraction process was designed and carried out to collect data relevant to the study objectives. The extracted data underwent descriptive and thematic syntheses and analyses, in addition to cross-analysis along relevant axes.

Contributions: The study makes four contributions. First, a distillation of the themes and patterns in the available research on the topic. Second, a practical summary of the experimental designs specific to each software ecosystem type. Third, an actionable roadmap for practitioners seeking to achieve experimentation maturity in software ecosystems. Fourth, an outline of the underexplored research areas.

12:00
Semantic-Aware Multi-modal Information Retrieval in a Data Lake – A Literature Review

ABSTRACT. Machine Learning (ML) continues to permeate a growing number of application domains. Generative AI, such as Large Language Models (LLMs), also sees broad adoption to process multi-modal data such as text, images, audio, and video. While training such models is usually motivated by using ever more data, handling such data efficiently has already become a very practical challenge in industry: twice as much data is certainly not twice as good. Rather, the opposite matters, since understanding the inherent quality and diversity of the underlying data lakes is a growing challenge for application-specific ML as well as for fine-tuning LLMs. Furthermore, information retrieval (IR) from such growing data lakes is challenged when considering the temporal dimension present in time-series signal data (e.g., video or lidar data in the automotive case) to determine its semantic value. This study focuses on the different semantic-aware techniques to extract embeddings from mono-, multi-, and cross-modal data for IR. We provide an overview of their evolution and highlight challenges to overcome in the context of time-series data. Articles were collected to trace the state-of-the-art techniques over time and to understand how embeddings became the most popular approach. The study provides a catalogue of data processing tools with a focus on applications of embeddings for three different categories of data modalities.
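
As a minimal illustration of embedding-based IR over a data lake, the sketch below ranks items by cosine similarity between precomputed item embeddings and a query embedding; the vectors are random stand-ins for real mono- or multi-modal embeddings.

    # Sketch of embedding-based retrieval: rank items in a data lake by cosine
    # similarity between their (precomputed) embeddings and a query embedding.
    # Vectors here are random stand-ins for real mono-/multi-modal embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    index = rng.normal(size=(1000, 384))           # 1000 items, 384-dim embeddings
    query = rng.normal(size=384)

    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)

    scores = index_n @ query_n                     # cosine similarities
    top5 = np.argsort(scores)[::-1][:5]            # best-matching item ids
    print(top5, scores[top5])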

12:15-13:30 Lunch Break
14:45-16:15 Session 6A: SEAA Session-4: SM (2)
Location: Room 108
14:45
Exploring Benefits of Bellwether Projects in Cross-Project IR-based Fault Localization

ABSTRACT. CONTEXT: Information retrieval-based bug localization (IRBL) is a promising approach for efficiently identifying buggy software modules in response to user bug reports. Supervised learning techniques were adopted to improve bug localization performance, but they brought the cold-start problem due to insufficient training data. Recent studies focused on transfer learning techniques to utilize cross-project data. These techniques improved performance but left open the question of how to select better cross-project data. OBJECTIVE: To evaluate the effectiveness of bellwethers, i.e., exemplary cross-project datasets that are better than the others for training a cross-project bug localization model. METHOD: With a bug localization method, the performance of cross-project bug localization was observed to find bellwether projects and to evaluate their effective usage for bug localization. RESULTS: One cross-project dataset was dominantly better than the others. Also, it was often helpful to mix a bellwether project with the small amount of available within-project data to improve localization performance. CONCLUSION: A practical implication is to select cross-project data supported by cross-project bug localization results on other projects. Mixing it with target project data is often beneficial at an early phase.

15:10
An Empirical Study on the Relation between Programming Languages and the Emergence of Community Smells

ABSTRACT. To provide a measurable representation of social issues in software teams, the research community defined a set of anti-patterns that may lead to the emergence of both social and technical debt, i.e., “community smells”. Researchers have investigated community smells from different perspectives; in particular, they have analyzed how product-related aspects of software development, such as architecture and the introduction of a new language, could influence community smells. However, how technical project characteristics relate to the emergence of community smells is still unknown. Different from those works, we aim to investigate how adopting specific programming languages might influence the socio-technical alignment and congruence of the development community, possibly affecting its overall ability to communicate and collaborate and leading to the emergence of social anti-patterns, i.e., community smells. We studied the relationship between the most used programming languages and community smells in 100 open-source projects on GITHUB. Key results of the study show a low statistical correlation for specific community smells like Prima Donna Effects, Solution Defiance, and Organizational Skirmish, highlighting that, for some programming languages, adoption may not be an indicator of the presence or absence of community smells.

15:35
Software companies' responses to hybrid working

ABSTRACT. [Context]: The COVID-19 pandemic has disrupted the global market and workplace landscape. In response, hybrid work arrangements have become popular in the software business sector. This way of working has both positive and negative impacts on software companies. [Objective]: This study investigates software companies' responses to hybrid working in the post-pandemic situation. [Method]: We conducted a large-scale survey to achieve our objective. Our results are based on a qualitative analysis of 124 valid responses. [Results]: The main result of our study is a taxonomy of the impacts of hybrid working on software companies at the individual, team, and organisation levels. We found more positive than negative responses at the individual and organisational levels. At the team level, positive and negative impacts received a similar number of responses. [Conclusion]: The results seem to indicate that hybrid working became credible with the wave of COVID-19, with 83 positive responses outweighing the 41 negative responses. Software company respondents witnessed better work-life balance, productivity, and efficiency in hybrid working.

16:00
A Confirmation Study on the Removal of Dead Code from Java Desktop Applications

ABSTRACT. In this paper, we present the results of a confirmation study on the impact of dead-method removal on the internal structure of source code, the time to compile source code, and the space to store compilation results (i.e., executable code). To that end, we considered 23 open-source Java desktop applications hosted on GitHub. We removed the dead methods from each of these applications to obtain two versions: one with dead methods (i.e., the original version) and one without them (i.e., the cleaned version). For each application, we compared the two versions to determine whether, and to what extent, the removal of dead methods affects the internal structure of source code and the usage of resources such as compilation time and space to store executable code. We observed that, after removing dead methods, the internal structure of source code significantly improves, while both the time to compile source code and the space to store compilation results significantly diminish. We also performed correlation analyses that allowed us to conclude that the more dead methods are removed, the greater the improvement to the internal structure of source code and the smaller the space needed to store executable code.

14:45-16:15 Session 6B: SEAA Session-5: CPS (1) & Tools Session
Location: Room 107
14:45
Model-Based Reliability, Availability, and Maintainability Analysis for Satellite Systems with Collaborative Maneuvers via Stochastic Games

ABSTRACT. Space-based navigation systems (GPS, GLONASS) rely on satellites that operate in orbit and have lifetimes of 10 years or more. Engineers employ Reliability, Availability, and Maintainability (RAM) analysis during the design phase to maximize a satellite's mean time between failures (MTBF). These design parameters help to optimize maintenance plans, enhance overall reliability, and extend the satellite's lifespan. The paper presents a novel approach using concurrent stochastic games (CSG) to model a single satellite, with logical and formal specifications of RAM properties in rPATL. We leverage the PRISM-games model checker for quantitative analysis while considering collaborative behaviors between the involved players in orbit and on the ground. This CSG-based approach offers a rich design space in which the actors involved in satellite maintenance, modeled as players, can collaborate and learn optimal strategies.

15:10
Parallelized Code Generation from Simulink Models for Event-driven and Timer-driven ROS 2 Nodes

ABSTRACT. In recent years, the complexity and scale of embedded systems, especially in the rapidly developing field of autonomous driving systems, have increased significantly. This has led to the adoption of software and hardware approaches such as Robot Operating System (ROS) 2 and multi-core processors. Traditional manual program parallelization faces challenges, including maintaining data integrity and avoiding concurrency issues such as deadlocks. While model-based development (MBD) automates this process, it encounters difficulties with the integration of modern frameworks such as ROS 2 in multi-input scenarios. This paper proposes an MBD framework to overcome these issues, categorizing ROS 2-compatible Simulink models into event-driven and timer-driven types for targeted parallelization. As a result, it extends the conventional parallelization by MBD and supports ROS 2-based models with multiple inputs. The evaluation results show that after applying parallelization with the proposed framework, all patterns show a reduction in execution time, confirming the effectiveness of parallelization.
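
To make the two node categories concrete, here is a minimal rclpy sketch of an event-driven node (subscription callback) and a timer-driven node (periodic callback), run on a multi-threaded executor so callbacks may execute in parallel; the topic name and message type are hypothetical, not taken from the paper.

    # Illustration of the two ROS 2 node categories the framework distinguishes
    # (rclpy; topic name and message type are hypothetical).
    import rclpy
    from rclpy.executors import MultiThreadedExecutor
    from rclpy.node import Node
    from std_msgs.msg import String

    class EventDrivenNode(Node):
        """Reacts whenever a message arrives on its input topic."""
        def __init__(self):
            super().__init__("event_driven_node")
            self.create_subscription(String, "sensor_in", self.on_msg, 10)

        def on_msg(self, msg):
            self.get_logger().info(f"processing event: {msg.data}")

    class TimerDrivenNode(Node):
        """Runs its computation on a fixed period, independent of inputs."""
        def __init__(self):
            super().__init__("timer_driven_node")
            self.create_timer(0.1, self.on_tick)  # 10 Hz

        def on_tick(self):
            self.get_logger().info("periodic step")

    def main():
        rclpy.init()
        executor = MultiThreadedExecutor()  # callbacks may run in parallel
        executor.add_node(EventDrivenNode())
        executor.add_node(TimerDrivenNode())
        executor.spin()

    if __name__ == "__main__":
        main()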

15:35
Model-Based Gamification Design with Web-Agon: An Automated Analysis Tool for Gamification

ABSTRACT. Designing effective gamified solutions is a difficult and highly complex task. Supporting tools for the requirements analyst are very rare, and most existing tools provide automation only for reasoning over complex knowledge models. From our experience within EU projects and from interviews conducted with analysts using gamification tools, this paper distills crucial lessons learned on automating gamification analysis and design. We employed these lessons to guide the development of Web-Agon, a web-based solution that automates reasoning over models to support the analyst. Web-Agon, based on the Acceptance/Gamification Requirements Agon Framework, facilitates systematic gamification analysis of software systems. This approach, driven by acceptance (psychological, sociological, behavioral) requirements, has proven effective in designing systems that positively engage users. Based on models and gamification principles, Web-Agon contributes to building user-centered, engaging software systems. We evaluated the effectiveness of our tool through a case study on Participatory Architectural Change Management in ATM systems, using Web-Agon for system gamification. We obtained positive results in terms of supporting the analyst in a structured, systematic, and automated way, reducing potential errors thanks to automated functionalities, as well as speeding up the gamification process.

15:50
Robin: A Systematic Literature Mapping Management Tool

ABSTRACT. Systematic literature mapping is an essential part of research methodology, and conducting one is challenging. Researchers query publications from various sources, which need to be filtered, categorized, and cleared of duplicates. The number of publications usually ranges from hundreds to thousands. The whole process is often performed iteratively and repeatedly, especially at the start of the mapping study, which only further increases the effort. When a team of researchers conducts a mapping study, the members may have different opinions on filtering and categorizing papers, which must be resolved. To our knowledge, this problem is poorly supported by open-source tools. To address these issues, we present a tool called Robin, which facilitates managing the steps of conducting a mapping study within a team. It provides search tools, categorization, and a platform for team members to define their criteria for including and excluding papers. In addition, Robin is connected to publicly available publication search platforms such as the IEEE and Scopus APIs. Robin is written in Python/Django and can be installed as a web application.

16:05
From Group Psychology to Software Engineering Research to Automotive R&D: Measuring Team Development at Volvo Cars

ABSTRACT. From 2019 to 2022, Volvo Cars successfully translated our research discoveries regarding group dynamics within agile teams into widespread industrial practice. We wish to illuminate the insights gained through the process of garnering support, providing training, executing implementation, and sustaining a tool embraced by approximately 700 teams and 9,000 employees. This tool was designed to empower agile teams and propel their internal development. Our experiences underscore the necessity of comprehensive team training, the cultivation of a cadre of trainers across the organization, and the creation of a novel software solution. In essence, we deduce that an automated concise survey tool, coupled with a repository of actionable strategies, holds remarkable potential in fostering the maturation of agile teams, but we also share many of the challenges we encountered during the implementation.

14:45-16:15 Session 6C: SEAA Session-6: DAIDE (2)
Location: Room 105
14:45
Experimentation in Industrial Software Ecosystems: an Interview Study

ABSTRACT. Industrial software ecosystems refer to a network of interdependent actors, co-creating value through a shared technological platform specifically tailored to industrial sectors. Developing, maintaining, and orchestrating such platforms involves many challenges that require complex decision making. Experimentation can help alleviate this complexity and reduce decision uncertainty and bias. However, experimentation requires certain organizational, infrastructural, and data-related prerequisites which can be uniquely challenging to achieve in industrial software ecosystems. Through semi-structured interviews with 25 industry professionals involved in various roles across 17 ecosystems, we analyze the difficulties faced in conducting effective experiments in such environments. The interview protocol covered aspects related to the methodologies, data handling processes, and current experimentation practices, as well as the challenges faced by practitioners who engage in experimentation initiatives. The study findings reveal technical, organizational, and market-related challenges, detailing the complexities facing experimentation initiatives in industrial software ecosystems. The findings are presented in an actionable manner, following a model that allows business-oriented alignment of architecture, process, and organizational evolution strategies. The study identifies key impediments, such as data integration difficulties, stringent regulatory environments, and prevailing organizational cultures that hinder continuous experimentation practices. Our analysis provides a foundation for understanding the unique challenges facing experimentation efforts in industrial software ecosystems and offers insights into potential strategies to improve the effectiveness of these initiatives.

15:10
The State of Generative AI Adoption from Software Practitioners' Perspective: An Empirical Study

ABSTRACT. Context: Generative AI (GenAI) brings new opportunities to the software industry and, in a broader context, to the digital economy. Investment in AI technology may bring hope to developing countries seeking to escape the “middle-income trap.” Objective: This study aimed to explore and capture practitioners' perception of GenAI adoption in the fast-paced software industry in the context of developing countries. Method: We conducted online focus group discussions with 18 practitioners in various roles to collect qualitative data. The practitioners have an average of 7.8 years of working experience and have used GenAI for over a year. We employed thematic analysis and the Human-AI Collaboration and Adaptation Framework (HACAF) to identify the factors influencing GenAI adoption, such as awareness, use cases, and challenges. Results: GenAI adoption is evident among practitioners. We identified 22 use cases, three of which were novel, i.e., contextualizing solutions, assisting the internal audit process, and benchmarking the internal software development process. We also discovered seven key challenges associated with GenAI adoption, two of which were novel, namely, no matching use cases and unforeseen benefits. These challenges slow GenAI adoption and potentially hinder developing countries from entering a high-skill industry. Conclusion: While the adoption of GenAI technology is promising, industry-academia collaboration is needed to find solutions and strategies to address the challenges and maximize its potential benefits.

15:35
Unveiling Data Preprocessing Patterns in Computational Notebooks

ABSTRACT. Data preprocessing, which includes data integration, cleaning, and transformation, is often a time- and effort-intensive step due to its fundamental importance. This crucial phase is integral to ensuring the quality and suitability of data for subsequent stages, such as feature engineering and model training. This paper explores the current state of data preprocessing in the context of Machine Learning (ML) and data-driven systems. With a focus on Python-based notebooks, we investigate how prevalent data preparation practices are in computational notebooks focused on ML model development. This paper presents the results of an analysis of 184,570 computational notebooks collected from Kaggle, a platform hosting data science competitions. Despite the crucial role played by data preprocessing in guaranteeing model performance, our results expose a significant lack of emphasis on data preprocessing activities in the examined notebooks. Notably, users holding the highest rankings tend to skip data preprocessing steps and focus on model-related activities. Although other users incorporate data preprocessing methods more frequently, the overall prevalence remains relatively limited. We discovered that data preparation practices such as missing-value handling are present in 20% to 60% of the notebooks depending on the competition, whereas outlier handling is present in less than 20% of the analyzed scripts, and the most frequently applied practices are data transformation methods.
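
For concreteness, the toy sketch below applies the three practice categories the study counts (missing-value handling, outlier handling, and data transformation) on an invented pandas data frame.

    # Sketch of the preprocessing practices counted in the study, on a toy frame:
    # missing-value handling, IQR-based outlier handling, and a transformation.
    import pandas as pd

    df = pd.DataFrame({"age": [23, None, 35, 29, 120],
                       "income": [40, 52, None, 47, 49]})

    # Missing values: impute with the column median (seen in 20-60% of notebooks).
    df = df.fillna(df.median(numeric_only=True))

    # Outliers: clip to the 1.5*IQR fences (seen in <20% of notebooks).
    q1, q3 = df["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["age"] = df["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Transformation: z-score standardization (the most frequent practice).
    df = (df - df.mean()) / df.std()
    print(df)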

16:15-17:00 Session 7: Coffee Break + Poster Session
Location: Auditorium Hall
16:15
A Systematic Analysis of MLOps Features and Platforms

ABSTRACT. While many companies aim to use Machine Learning (ML) models, transitioning to deployment and practical application of such models can be very time-consuming and technically challenging. To address this, MLOps (ML Operations) offers processes, tools, practices, and patterns to bring ML models into operation. A large number of tools and platforms have been created to support developers in creating practical solutions. However, specific needs vary strongly in a situation-dependent manner, and a good overview of their characteristics is missing, making the architect's task very challenging. We conducted a systematic literature review (SLR) of MLOps platforms, describing their qualities, features, tactics, and patterns. In this paper, we map the design space of MLOps platforms, guided by the Attribute-Driven Design (ADD) methodology, to provide software architects with a tool that supports their work in the platform area.

16:25
Nextgen of Search and Rescue operations through trustworthy doped drones fleet

ABSTRACT. Search and rescue operations are critical, involve logistical and human infrastructures, and require highly trustworthy technology. A multi-layer secure architecture on top of a drone mesh is presented to address these challenges. It is underpinned by security features such as blockchain and CIA-compliant communication paradigms.

16:40
Using Large Language Models for Source Code Documentation

ABSTRACT. Writing good software documentation requires significant effort, and Large Language Models (LLMs) could potentially streamline that process. The question thus arises whether current LLMs can generate valid code documentation for classes and methods on the basis of the bare code alone. According to the literature, various such models can generate documentation that is on par with or even superior to reference documentation. In our experimental study, we found that the model GPT-4 by OpenAI leads to poor results when measuring similarity to the reference documentation at the class level. Thus, GPT-4 is not yet usable for generating class documentation. At the method level, however, the model achieved higher similarity ratings and can be considered applicable for the use case.
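
The abstract does not name its similarity measure; as a simple stand-in, one can compare a generated docstring against the reference with a string-similarity ratio, as sketched below.

    # The abstract does not specify its similarity metric; as a simple stand-in,
    # compare a generated docstring against the reference with difflib's ratio.
    from difflib import SequenceMatcher

    reference = "Returns the number of active sessions for the given user."
    generated = "Return how many sessions are currently active for a user."

    similarity = SequenceMatcher(None, reference, generated).ratio()  # 0.0-1.0
    print(f"similarity = {similarity:.2f}")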

16:50
Project security using analytical time evaluation techniques

ABSTRACT. Time planning of IT projects is difficult due to the large number of uncertainties in the software development process, the involvement of people, and difficulties with time estimates and complexity evaluation.

Mistakes in software development planning open the way to architectural, functional, and integrative deficiencies of the product and the surrounding processes. These can be expressed as problems of correctness and security, and eventually constitute business risks that materialize in the case of improper scheduling and project failure.

Using a simple synthetic project, we compare three analytic techniques to evaluate project duration. These three techniques can work with uncertain input parameters, represent projects as stochastic activity networks, and use simple computations to approximate distributions of the start time and end time of all tasks.
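
The three techniques are not named in the abstract; the classic PERT approximation below, which is not necessarily one of them, illustrates how a duration distribution can be approximated from uncertain per-task estimates with simple computations.

    # Classic PERT-style approximation (not necessarily one of the paper's three
    # techniques): derive mean/variance of a path's duration from per-task
    # optimistic (a), most likely (m), and pessimistic (b) estimates.
    import math

    tasks = [(2, 4, 9), (1, 2, 4), (3, 5, 11)]  # (a, m, b) in days, hypothetical

    mean = sum((a + 4 * m + b) / 6 for a, m, b in tasks)
    var = sum(((b - a) / 6) ** 2 for a, m, b in tasks)

    # Normal approximation of the total path duration:
    print(f"expected duration ~ {mean:.1f} days, std dev ~ {math.sqrt(var):.1f} days")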

17:00-18:30 Session 8A: SEAA Session-7: SM (3)
Location: Room 108
17:00
Measuring Software Development Waste in Open-Source Software Projects

ABSTRACT. Software Development Waste (SDW) is defined as any resource-consuming activity that does not add value to the client or the organization developing the software. SDW impacts the overall efficiency and productivity of a software project as its scale and size grow. Although engineering leaders usually put effort into minimizing waste, the lack of definitive measures to track and manage SDW is a cause for concern. To address this gap, we propose five measures, namely Stale Forks, Project Diversification Index, PR Rejection Rate, Backlog Inversion Index, and Feature Fulfillment Rate, to identify the unused-artifact, building-the-wrong-feature/product, and backlog-mismanagement types of SDW. We apply these measures to ten open-source projects and share our observations on applying them in practice to manage SDW.
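
The paper's exact definitions are not given in the abstract; the sketch below shows one plausible formulation of two of the named measures.

    # Plausible formulations of two of the proposed measures (the paper's exact
    # definitions may differ): PR Rejection Rate and the share of Stale Forks.
    def pr_rejection_rate(closed_unmerged: int, total_closed: int) -> float:
        """Share of closed pull requests that were never merged."""
        return closed_unmerged / total_closed if total_closed else 0.0

    def stale_fork_share(forks_without_commits: int, total_forks: int) -> float:
        """Share of forks with no commits of their own (unused artifacts)."""
        return forks_without_commits / total_forks if total_forks else 0.0

    print(pr_rejection_rate(42, 180))   # e.g., 23% of closed PRs rejected
    print(stale_fork_share(950, 1200))  # e.g., 79% of forks are stale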

17:25
Investigating team maturity in an agile automotive reorganization

ABSTRACT. About seven years ago, Volvo Cars initiated a large-scale agile transformation. Amidst this journey, a significant restructuring of the R&D department took place. Our study aims to illuminate how team maturity levels are affected during such comprehensive reorganizations. We collected data from 63 teams to understand the effects of organizational changes on these agile teams. Additionally, qualitative data was gathered to validate our findings and explore underlying reasons. Contrary to what was expected, the reorganization did not significantly alter the distribution of team maturity. High turnover rates and frequent reorganizations were identified as key reasons why the less mature teams remained in the early stages of team development. Conversely, teams in the more mature category remained stable at a higher maturity stage, primarily because the teams themselves remained largely intact, with only management structures changing. In conclusion, while reorganizations may hinder some teams' development, others maintain stability at a higher level of maturity despite substantial managerial changes.

17:40
Decoding Difficulties in Implementing Agile Principles

ABSTRACT. Agile adoption is strongly related to adherence to the four values and, consequently, to the 12 principles of the Agile Manifesto. All principles are equally important, and at the same time dependencies between them exist. Difficulties in implementing these principles can be perceived differently within an Agile team. Our exploratory study seeks to improve the understanding of difficulties related to agile principles, their causes, and experiences in mitigating them. Thematic analysis was applied to survey responses covering different experiences in Agile teams. Quantitative analysis provides an overview of which principles are perceived as most difficult, while qualitative results lead to a root cause analysis and a classification of mitigation methods. Our study determined that “Welcome changing requirements” is the most difficult principle to implement. The causes of difficulties are diverse and can be classified into seven categories, while the solutions refer mainly to project and process work and customer communication.

17:00-18:30 Session 8B: SEAA Session-8: SMSE (2)
Location: Room 107
17:00
On the need for configurable travel recommender systems: A systematic mapping study

ABSTRACT. Travel Recommender Systems (TRSs) have been proposed to ease the burden of choice in the travel domain by providing valuable suggestions based on user preferences. Despite the broad similarities in the functionalities and data provided by TRSs, these systems are significantly influenced by the diverse and heterogeneous contexts in which they operate. This plays a crucial role in determining the accuracy and appropriateness of the travel recommendations they deliver. For instance, in contexts like smart cities and natural parks, diverse runtime information (such as traffic conditions and trail status, respectively) should be utilized to ensure the delivery of pertinent recommendations aligned with user preferences within the specific context. However, there is a trend of building TRSs from scratch for different contexts, rather than supporting developers with configuration approaches that promote reuse, minimize errors, and accelerate time-to-market. To illustrate this gap, in this paper we conduct a systematic mapping study to examine the extent to which existing TRSs are configurable for different contexts. The conducted analysis reveals a lack of configuration support assisting TRS providers in developing TRSs closely tied to their operational context. Our findings shed light on uncovered challenges in the domain, thus fostering future research focused on new methodologies enabling providers to handle TRS configurations.

17:25
A Systematic Mapping Study on Quality-in-Use and Sustainability in Software

ABSTRACT. The relevance of sustainable software is growing, drawing attention from both academic researchers and practitioners. Understanding its impact on end-users' perceived quality, often termed quality-in-use, is crucial. This paper aims to explore the relationship between software sustainability and quality-in-use as reported in the literature. We conducted a systematic mapping study using Scopus, one of the largest peer-reviewed databases, to explore this relationship. Twenty-seven papers were selected and analyzed based on publication year, source, application domain, research type, empirical type, sustainability dimensions, quality-in-use attributes, and the relationship between sustainability and quality-in-use. Results indicate this area of study is in its early stages, with few papers published annually. Conferences were the primary publication channels, with healthcare being the most examined application domain. Papers were categorized into solution proposals and evaluation research, with over half undergoing empirical evaluation. The environmental dimension, particularly power consumption, was the most studied sustainability dimension. Quality-in-use was explored through various attributes, with 63% of the selected papers identifying a relationship between sustainability and quality-in-use. This mapping study highlights the lack of a common definition within the software engineering community regarding software sustainability and quality-in-use, calling for further research to elucidate the impact of sustainable software products on quality-in-use.

17:50
A Systematic Mapping Study on SDN Controllers for Enhancing Security in IoT Networks

ABSTRACT. Context: The increase in Internet of Things (IoT) devices gives rise to an increase in deceptive manipulations by malicious actors, who should be prevented from targeting IoT networks. Cybersecurity threats have evolved and become dynamically sophisticated, such that they could exploit any vulnerability found in IoT networks. However, with the introduction of the Software Defined Network (SDN) into IoT networks as the central monitoring unit, IoT networks are less vulnerable and less prone to threats. Objective: To present a comprehensive and unbiased overview of the state of the art in enhancing the security of IoT networks using SDN controllers. Method: We review the current body of knowledge on enhancing the security of IoT networks using SDN with a Systematic Mapping Study (SMS) following the established guidelines. Results: The SMS result comprises 33 primary studies analyzed against four major research questions. The SMS highlights current research trends and identifies gaps in SDN-IoT network security. Conclusion: We conclude that the SDN controller architecture commonly used for securing IoT networks is the centralized controller architecture. However, this architecture is not without its limitations. Additionally, the predominant technique utilized for risk mitigation is machine learning.

17:00-18:30 Session 8C: SEAA Session-9: DAIDE (3) & STREAM (1)
Location: Room 105
17:00
AGORA: An Approach for Generating Acceptance Test Cases from Use Cases

ABSTRACT. This paper introduces AGORA, an innovative approach that leverages Large Language Models to automate the definition of acceptance test cases from use cases. AGORA consists of two phases that exploit prompt engineering to 1) identify test cases for specific use cases and 2) generate detailed acceptance test cases. AGORA was evaluated through a controlled experiment involving industry professionals, comparing the effectiveness and efficiency of the proposed approach with the manual method. The results showed that AGORA can generate acceptance test cases with a quality comparable to that obtained manually, while improving process efficiency by over 90% and requiring a fraction of the time. Furthermore, user feedback indicated high satisfaction with the proposed approach. These findings underscore the potential of AGORA as a tool to enhance the efficiency and quality of the software testing process.
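
The abstract does not disclose AGORA's prompts; the sketch below merely illustrates the two-phase prompt-engineering structure it describes, with hypothetical prompt templates and a placeholder call_llm function that would be wired to a real LLM client.

    # Hypothetical two-phase prompting in the spirit of AGORA (the paper's actual
    # prompts are not given in the abstract); call_llm is a placeholder for any
    # chat-completion API.
    def call_llm(prompt: str) -> str:  # stand-in; wire to a real LLM client
        raise NotImplementedError

    def identify_test_cases(use_case: str) -> str:
        # Phase 1: enumerate the acceptance test cases needed to cover a use case.
        return call_llm(
            "Given the following use case, list the acceptance test cases "
            f"(title + objective) needed to cover it:\n\n{use_case}")

    def generate_acceptance_test(use_case: str, test_case_title: str) -> str:
        # Phase 2: expand one identified test case into a detailed acceptance test.
        return call_llm(
            "Write a detailed acceptance test (preconditions, steps, expected "
            f"results) for '{test_case_title}' of this use case:\n\n{use_case}")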

17:25
A Catalog of Cost Patterns and Antipatterns for Infrastructure as Code

ABSTRACT. Cloud adoption is historically driven by cost considerations. As the complexity of the software systems deployed on the cloud continuously increases, and with it also the need to manage larger and more complex infrastructures, Infrastructure as Code (IaC) approaches become invaluable tools. However, not many existing works have looked into the cost implications of IaC use for cloud-based software. In this work we build on an existing dataset that has looked into cost-related commits on IaC artifacts in open-source repositories in order to identify recurring solutions and ineffective practices in cost management. We present a catalog of patterns and antipatterns organizing our findings, and discuss its implication for practitioners and researchers.

17:50
Exploring Complexity Issues in Junior Developer Code using Static Analysis and FCA

ABSTRACT. We report on an exploratory evaluation that combines static analysis with formal concept analysis (FCA) to investigate complexity issues in source code produced as part of a mandatory course in computer science. Our dataset includes over 500 Python and Java projects that represent student solutions to four semesters' worth of programming assignments. We employ the latest version of SonarQube, configured with an extended set of analysis rules, and focus on code complexity issues, which are known to impact code readability and maintainability. We study the distribution and composition of these complexity issues and employ formal concept analysis to study the relation between them and other issue types. We present the results of a comparative evaluation of the distribution of code complexity issues between Python and Java. Our most important results are synthesized in a series of remarks to help practitioners and educators mitigate complexity issues in junior developer code, as well as to assist junior developers in improving their coding skills. Finally, the dataset and SonarQube configuration are available as an open data package that enables replicating or extending our work.
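
To illustrate the formal concept analysis step, the naive sketch below enumerates all formal concepts of a toy context relating files to SonarQube-style issue types; the context is invented, and real FCA tooling scales far better than this brute-force enumeration.

    # Naive formal concept analysis over a toy context (files x issue types),
    # illustrating the kind of issue co-occurrence analysis the paper performs.
    from itertools import combinations

    context = {  # object -> set of attributes (hypothetical student submissions)
        "a.py": {"high_complexity", "long_method"},
        "b.py": {"high_complexity", "deep_nesting"},
        "c.py": {"high_complexity", "long_method", "deep_nesting"},
    }
    attributes = set().union(*context.values())

    def extent(intent_set):  # objects having all attributes of the intent
        return {o for o, attrs in context.items() if intent_set <= attrs}

    def intent(objs):        # attributes shared by all objects of the extent
        return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

    # Every pair (extent(S), intent(extent(S))) is a formal concept, and all
    # concepts arise this way when S ranges over attribute subsets.
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            ext = extent(set(subset))
            concepts.add((frozenset(ext), frozenset(intent(ext))))

    for ext, shared in sorted(concepts, key=lambda c: len(c[0])):
        print(sorted(ext), "<->", sorted(shared))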

18:15
The Python Software Quality Dataset

ABSTRACT. With Python's ascension as a dominant programming language, particularly in the fields of artificial intelligence and data science, the need for comprehensive datasets focusing on software quality within Python projects has become increasingly noticeable. This study introduces a detailed dataset designed to address this gap, enriching academic resources in software engineering. The dataset encompasses a wide array of software quality metrics on up to 80 projects, including 51,765,853 SonarQube issues, 268,506 SonarQube code quality metrics, 11,915 software refactoring records, and 155,127 pairs of bug-inducing and bug-fixing commits, along with 863,931 GitHub issue tracker entries. This extensive collection serves as a versatile tool for various research activities, enabling analysis of the relationships between technical debt and software refactorings, correlations between refactoring processes and bug resolution, and their overall impact on software maintainability and reliability. Furthermore, the dataset provides a foundation for developing AI-driven predictive models to detect future software quality issues and refactoring opportunities. By offering a comprehensive and multifaceted dataset, this study significantly contributes to understanding and improving software quality in Python projects.