SAC_2025: THE 40TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING
PROGRAM FOR FRIDAY, APRIL 4TH

09:00-10:30 Session 17A: AIED
Location: ROSA DEI VENTI
09:00
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs

ABSTRACT. In education, the capability of Large Language Models (LLMs) to generate human-like text has inspired work on how they can increase the efficiency of learning and teaching. We study the affordability of these models for educators and students by investigating how LLMs answer multiple-choice questions (MCQs) w.r.t. hardware constraints and refinement techniques. We explore this space by using generic pre-trained LLMs (the 7B, 13B, and 70B variants of LLaMA-2) to answer 162 undergraduate-level MCQs from a course on Programming Languages (PL); the MCQ dataset is a contribution of this work, which we make publicly available. Specifically, we dissect how different factors, such as using readily-available material, namely (parts of) the course's textbook, for fine-tuning and quantisation (to decrease resource usage), can change the accuracy of the responses. The main takeaway is that smaller textbook-based fine-tuned models outperform generic larger ones (whose pre-training requires conspicuous resources), making the usage of LLMs for answering MCQs resource- and material-wise affordable.

09:18
Fine-Tuning GPT-3.5-Turbo for Automatic Feedback Generation

ABSTRACT. Scaling up the delivery of effective feedback remains an open challenge in education. Existing automatic feedback generation (AFG) methods fall short in providing feedback highly tailored to tasks, students, and instructors’ preferences, simultaneously. Recent evidence suggests that Large Language Models (LLMs), with their ability to follow instructions and generate text, could address this limitation. Existing studies have generated feedback using GPT models in their ready-to-use Chat version, using almost exclusively prompting strategies to direct the model towards the desired output. Results are largely positive; however, space for improvement remains. For the first time, the present study reports observations and results from fine-tuning GPT-3.5-turbo for AFG for open-ended situational judgment questions from the high-stakes test Casper. The LLM was fine-tuned using a small set of hand-written feedback examples, and independent judges and text experts evaluated model performance using a rubric based on qualities of effective feedback identified in the literature. Moreover, a survey study measured users’ satisfaction with automatic feedback. Results show that, although not perfect, the fine-tuned model generated outputs largely aligned with the desired qualities and often aligned with the given guidelines, satisfying the majority of users. The strengths and weaknesses of our model are discussed, and directions for future research are suggested.

09:36
Assessing the Real-World Impact of Disagreement Between Human Graders and LLMs

ABSTRACT. Applying artificial intelligence models to grade student answers is a popular application. Lately Large Language Models (LLMs) have shown promising results. However, the disagreement between human graders and LLMs is often considered too large for practical adoption. In this paper, we investigate the real-world impact of this disagreement on final grades. Instead of focusing on individual answers, we simulate the grading process of an entire exam. We use an unmodified LLM (OpenAI GPT-3.5 Turbo) with one-shot prompting for grading individual answers to short answer questions from computer science courses at a German university. Our main contributions are the evaluation of the real-world impact on examination grades in contrast to correctness of individual student answers, the simulation of grading strategies common in human grading practice, and the discussion of the results in the context of observed inter-rater variabilities among human graders. The findings confirm the natural expectation that the impact of the disagreement is lower for final grades than when looking at individual answers. We quantify this effect and compare it to a grading obtained by simulating a second human grader.

09:54
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling

ABSTRACT. Generative artificial intelligence (GenAI) can reshape education and learning. While large language models (LLMs) like ChatGPT dominate current educational research, multimodal capabilities—such as text-to-speech and text-to-image—are less explored. This study uses topic modeling to map the research landscape of multimodal and generative AI in education. An extensive literature search using Dimensions.ai yielded 4175 articles. Employing a topic modeling approach, latent topics were extracted, resulting in 38 interpretable topics organized into 14 thematic areas. Findings indicate a predominant focus on text-to-text models in educational contexts, with other modalities underexplored, overlooking the broader potential of multimodal approaches. The results suggest a research gap, stressing the importance of more balanced attention across different AI modalities and educational levels. In summary, this research provides an overview of current trends in generative AI for education, underlining opportunities for future exploration of multimodal technologies to fully realize the transformative potential of artificial intelligence in education.

09:00-10:30 Session 17B: EMBS
Location: LIBECCIO
09:00
A Scalable Approach for Memory Optimization in AUTOSAR Schedule Tables

ABSTRACT. Modern embedded automotive software uses AUTOSAR for software development. This software is organized into a set of runnables that represent basic functionality. For optimal resource utilization, runnables are grouped into tasks. In AUTOSAR, a schedule table is used for the deterministic time triggering of task activations or events. In the state of the art, the schedule table is generated at design time. The memory demand of the schedule table is highly sensitive to the application parameters, i.e., the periods and offsets of the runnables and task types. Variations in these parameter values can very significantly increase the memory demand of the schedule table, even for small applications consisting of, e.g., only four runnables. In this paper, we propose an alternative approach in which the memory demand of the schedule table is insensitive to variations in the values of the periods and offsets of runnables, and task types. Using multiple case studies, we show that, although our approach introduces a slight runtime overhead, it reduces memory demand significantly.
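
To illustrate why a design-time schedule table can be sensitive to periods and offsets, the sketch below counts the distinct activation instants within one hyperperiod. It assumes a simplified model that is not taken from the paper: the table spans the least common multiple of all periods and stores one entry per distinct instant. The function name `expiry_points` and both parameter sets are hypothetical.

```python
from math import lcm  # available from Python 3.9

def expiry_points(runnables):
    """Return the sorted distinct activation instants in one hyperperiod.

    runnables: list of (period, offset) pairs in integer time units,
    with 0 <= offset < period (simplifying assumption).
    """
    hyperperiod = lcm(*(period for period, _ in runnables))
    instants = set()
    for period, offset in runnables:
        # Every activation of this runnable inside [0, hyperperiod).
        instants.update(range(offset, hyperperiod, period))
    return sorted(instants)

# Two hypothetical four-runnable applications.
harmonic = [(5, 0), (10, 0), (20, 0), (40, 0)]
coprime = [(5, 0), (7, 0), (11, 0), (13, 0)]

print(len(expiry_points(harmonic)))  # 8
print(len(expiry_points(coprime)))   # 2125
```

Under this simplified model, changing four harmonic periods to four pairwise coprime ones raises the entry count from 8 to 2125, which is the kind of sensitivity the abstract describes.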

09:18
Exploiting Omega Network and Inexact Accumulative Parallel Counter to Enhance Energy Efficiency in Stochastic Computing

ABSTRACT. Stochastic computing (SC) has garnered great interest due to its energy efficiency and robustness against external noise, yet the long latency of stochastic computations and the considerable overheads caused by conversions between binary numbers and stochastic numbers persist as notable challenges. This paper introduces a novel parallel random number generator (RNG) and accumulative parallel counters (APCs) to address both challenges. In particular, we propose a new parallel RNG design based on an Omega network to bolster the randomness of generated numbers, thereby enhancing accuracy and reducing latency. Additionally, we introduce a novel APC design technique leveraging approximate 4-2 compressors to improve hardware efficiency while preserving the accuracy of SC computations. When implemented using a 65-nm CMOS technology, our proposed SC architecture outperforms other SC alternatives in terms of both hardware efficiency and computation accuracy. Specifically, our APC designs exhibit substantial enhancements of up to 30.1×, 26.6×, 5.9×, and 151× in area, power, delay, and energy, respectively, compared to traditional APCs. Also, we validate the efficacy of the proposed SC design through an image processing application, demonstrating superior processing quality alongside significantly enhanced hardware efficiency.
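
As background for readers new to SC, the sketch below shows the textbook unipolar encoding, in which a value in [0, 1] is represented by the fraction of ones in a random bitstream and multiplication reduces to a bitwise AND. It is a software illustration only and does not model the Omega-network RNG or the approximate APCs proposed in the paper; all function names are hypothetical.

```python
import random

def to_stream(value, length, rng):
    """Binary-to-stochastic conversion: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def from_stream(bits):
    """Stochastic-to-binary conversion: count the ones (the role a counter plays)."""
    return sum(bits) / len(bits)

def sc_multiply(stream_a, stream_b):
    """Unipolar SC multiplication: AND of two independent bitstreams."""
    return [a & b for a, b in zip(stream_a, stream_b)]

rng = random.Random(0)  # fixed seed so the run is repeatable
length = 4096
a = to_stream(0.5, length, rng)
b = to_stream(0.25, length, rng)
estimate = from_stream(sc_multiply(a, b))
print(f"estimate of 0.5 * 0.25: {estimate:.3f}")
```

The estimate approaches 0.125 only as the streams get longer, which is the latency and accuracy trade-off the abstract refers to.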

09:36
Optimizing Compute Core Assignment for Dynamic Batch Inference in AI Inference Accelerator

ABSTRACT. Modern AI inference accelerators offer high-performance and power-efficient computations for machine learning models. Most accelerators employ static inference to enhance performance, which requires models to be compiled with predetermined input batch sizes and intermediate tensor shapes. However, static inference can lead to program failures or inefficient execution when processing batched data of varying sizes, a scenario known as dynamic batch inference. This work addresses this challenge by focusing on the emerging multicore AI inference accelerators that offer flexible compute core assignment. We propose to dynamically partition the input batch data into smaller batches, and create multiple model instances to process each partition in parallel. The challenge lies in how to determine the optimal number of model instances, the proper batch size for each handling model, and the assignment of compute cores among the models, to minimize the inference time. To solve the problem, we construct an accurate profiling-based cost model and devise a dynamic programming algorithm to determine the best configuration. Experimental results indicate that our method achieves 3.05x higher throughput on average in multi-person pose estimation benchmarks, compared to the EdgeTPU-like inference strategy.

09:54
Efficient Scheduling of Weakly-Hard Real-Time Tasks with Sufficient Schedulability Condition

ABSTRACT. Many real-time tasks, particularly control tasks, can accommodate occasional missed deadlines thanks to robust algorithms. These tasks can be effectively modeled using the weakly-hard model, which specifies the maximum number of tolerable deadline misses, denoted as m_i, within a sequence of K_i executions. Research indicates that utilizing the weakly-hard model can significantly reduce the over-provisioning typically required in the design of real-time systems. Therefore, different scheduling algorithms and schedulability analyses have been proposed in the last few years. However, state-of-the-art scheduling analyses do not scale with larger values of K_i. We present a new job-level fixed priority scheduling algorithm whose schedulability analysis scales with K_i. Furthermore, our scheduling algorithm leverages the tolerable continuous deadline misses to assign priorities to jobs. Schedulability analyses show that the computation time of our analysis is up to 100 faster compared to the approaches in the literature, while also improving the schedulability ratio for a total utilization of 0.9.
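
The weakly-hard constraint itself can be checked on a recorded trace with a sliding window, as in the minimal sketch below. This only illustrates the (m, K) definition, not the scheduling algorithm or schedulability analysis of the paper; the function name is hypothetical.

```python
from typing import Sequence

def satisfies_weakly_hard(misses: Sequence[bool], m: int, k: int) -> bool:
    """Return True if no window of k consecutive jobs has more than m misses.

    misses[i] is True when job i missed its deadline.
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    in_window = 0
    for i, missed in enumerate(misses):
        in_window += int(missed)
        if i >= k:
            # Job i - k has just left the window of the last k jobs.
            in_window -= int(misses[i - k])
        if in_window > m:
            return False
    return True

trace = [False, True, False, False, True, False]
print(satisfies_weakly_hard(trace, m=1, k=3))                # True
print(satisfies_weakly_hard([True, True, False], m=1, k=3))  # False
```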

09:00-10:30 Session 17C: IMFBS & GIA
Location: BORA
09:00
X3A: Efficient Multimodal Deepfake Detection with Score-Level Fusion

ABSTRACT. Advances in deepfake generation have highlighted the necessity for sophisticated detection methods and realistic datasets to ensure that models generalize effectively. While traditional datasets focused on unimodal manipulations, the emergence of multimodal datasets, which include audio-visual forgeries, increased the complexity of deepfake detection. The recent release of the LAV-DF and AV-Deepfake1M datasets featured partial manipulations in multimodal content and underscored the need for effective video-level detection methods to identify these forgeries. In this work, we propose X3A, an efficient multimodal video deepfake detection model exploiting two powerful unimodal models with probabilistic score-level fusion. X3A leverages the advantage of using raw visual and audio inputs without relying on hand-crafted features. We conducted extensive experiments on multiple multimodal deepfake benchmark datasets and achieved superior performance on multimodal deepfake detection, successfully detecting entirely and partially manipulated scenarios. Our X3A model demonstrates an accuracy of 0.9960 and an AUC of 0.9999 on the most challenging AV-Deepfake1M benchmark, surpassing all existing models.

09:18
GAN or DM? In-depth Analysis and Evaluation of AI-generated Face Data for Generalizable Deepfake Detection

ABSTRACT. Deepfake detection remains challenging, particularly when identifying deepfakes generated by unseen forgery methods. Recent studies have shown that detectors trained on forgery data from Generative Adversarial Networks (GAN) cannot generalize well on data from Diffusion Models (DM) and vice versa. As generative methods such as GAN and DM are significantly advanced for creating highly photorealistic images, it becomes crucial to develop generalized methods to detect forgeries generated from different generation methods. While research on generalizable detectors is gaining momentum, the impact of training data on detectors’ generalization ability has yet to be extensively studied, especially concerning synthetic human face images. In this work, we train popular deep neural networks using face data generated by various generative models and thoroughly analyze their generalizability. Our results reveal significant differences in model performance based on the forgery method used to generate the training data. Notably, we identify specific scenarios that significantly enhance model generalization, contradicting previous research finding that models trained on DM-generated data would achieve higher generalization performance than those trained on GAN-generated data. These findings emphasize the crucial role of training data selection in enhancing the generalization capabilities of deepfake detectors. By strategically selecting and combining datasets, we can develop more robust detection systems, laying a foundation for future research in creating reliable and universal deepfake detection methods.

09:36
Finger Vein Spoof GANs: Issues in Presentation Attack Detector Training

ABSTRACT. Four GAN-based I2I translation techniques for unpaired data are employed for the synthesis of biometric finger vein presentation attack instrument (PAI) samples corresponding to three public presentation attack datasets. These synthetic samples are used to train presentation attack detectors (PAD) using a variety of distinct feature sets in their classifier. The primary objective of this work is to assess the usefulness of these synthetic data for augmenting PAI datasets, and our analysis reveals that CycleGAN generated PAI samples are best suited to train PAD while DRIT generated data are hardly suited at all for this task. This result corresponds well to visual appearance and objective quality measures of the synthetic PAI samples.

However, there are some particularities connected to the nature of the PAD training: Different types of features used in PAD can lead to very different behaviour of the PAD system trained with synthetic data. For example, Fourier or LBP feature sets must not be used as these respond more to the embedded GAN model fingerprints than to visual similarity of synthetic and real PAI samples. On the other hand, pre-trained neural network features, Haralick features, and surprisingly, also simple features like histograms or localised variance and entropy can be used in the PAD system and lead to stable PAI sample detection results across all datasets and GAN-types (except DRIT) considered. Consequently, results indicate which synthesis technique / feature extraction scheme combinations should be considered when augmenting real PAI samples with synthetic ones in PAD training, and which combinations should be avoided.

09:54
Features Leverage in Graph Models for Mineral Prospectivity Mapping

ABSTRACT. Mineral Prospectivity Mapping (MPM), the process of identifying areas with high potential for mineral deposits, can be divided into two main categories: knowledge-driven and data-driven. Knowledge-driven techniques rely on expert opinion on geological data, while data-driven techniques employ ML models to predict the probabilities of mineral occurrences based on known geological datasets. Recently, with the advancement of machine learning (ML) methods, data-driven MPM has gained significant improvements. Notably, graph-based approaches overcome the disadvantages of previously used approaches (pixel-based, image-based) and have demonstrated better performances. However, the graph construction in current methods is based solely on spatial distances between pixels, regardless of their geological attributes. In this paper, we introduce a novel graph construction approach that combines spatial distances with other distances obtained from feature mining. Our experiments show that this combination outperforms existing graphs, and can be considered as a promising approach to integrate feature mining into data-driven models in MPM.

09:00-10:30 Session 17D: ST
Location: GRECALE
09:00
A Circular Construction Product Ontology for End-of-Life Decision-Making

ABSTRACT. Efficient management of end-of-life (EoL) products is critical for advancing circularity in supply chains, particularly within the construction industry where EoL strategies are hindered by heterogenous lifecycle data and data silos. Current tools like Environmental Product Declarations (EPDs) and Digital Product Passports (DPPs) are limited by their dependency on seamless data integration and interoperability which remain significant challenges. To address these, we present the Circular Construction Product Ontology (CCPO), an applied framework designed to overcome semantic and data heterogeneity challenges in EoL decision-making for construction products. CCPO standardises vocabulary and facilitates data integration across supply chain stakeholders enabling lifecycle assessments (LCA) and robust decision-making. By aggregating disparate data into a unified product provenance, CCPO enables automated EoL recommendations through customisable SWRL rules aligned with European standards and stakeholder-specific circularity SLAs, demonstrating its scalability and integration capabilities. The adopted circular product scenario depicts CCPO's application while competency question evaluations show its superior performance in generating accurate EoL suggestions highlighting its potential to greatly improve decision-making in circular supply chains and its applicability in real-world construction environments.

09:18
LogBabylon: A Unified Framework for Cross-Log File Integration and Analysis

ABSTRACT. Logs are critical resources that record events, activities, or messages produced by software applications, operating systems, servers, or network devices. However, consolidating the heterogeneous logs and cross-referencing them is challenging and complicated. Manually analyzing the log data is time-consuming and prone to errors. LogBabylon is a centralized log data consolidating solution that leverages Large Language Models (LLMs) integrated with Retrieval-Augmented Generation (RAG) technology. LogBabylon interprets the log data in a human-readable way and adds insight analysis of the system performance and anomaly alerts. It provides a paramount view of the system landscape, enabling proactive management and rapid incident response. LogBabylon consolidates diverse log sources and enhances the extracted information's accuracy and relevancy. This facilitates a deeper understanding of log data, supporting more effective decision-making and operational efficiency. Furthermore, LogBabylon streamlines the log analysis process, significantly reducing the time and effort required to interpret complex datasets. Its capabilities extend to generating context-aware insights, offering an invaluable tool for continuous monitoring, performance optimization, and security assurance in dynamic computing environments.

09:36
Taxonomy Expansion through Collaborative LLM Mapping

ABSTRACT. Hierarchical taxonomies serve as essential structures for organizing and analyzing concepts across various domains, including healthcare, finance, and economics. However, maintaining their accuracy and relevance presents significant challenges, requiring experts to identify and revise new concepts constantly. In this context, distributional semantics techniques emerge as a valuable solution by suggesting terms that may be associated with existing concepts.

This research introduces an innovative method for enhancing taxonomies by integrating related terms using contextual word embeddings as encoders. We present TAXMAP (TAxonomy eXpansion through Collaborative LLM MAPping), a system designed to autonomously expand any hierarchical taxonomy with new terms utilizing three generative models. Moreover, TAXMAP includes a human validation component to guarantee the selection of the most relevant terms for incorporation.

Our framework was implemented in an EU initiative aimed at refining the official European Skill taxonomy, ESCO, by incorporating over 40,000 digital terms gathered from the Web, thus aligning ESCO skills with the evolving needs of the labor market. As a result, 924 terms were proposed, with 757 of them validated by domain experts as accurate matches. By employing a suite of large language models (LLMs) as encoders, our framework effectively overcomes the limitations of generative models, reducing errors and ensuring high precision in taxonomy enrichment. Additionally, the initial deployment of TAXMAP significantly decreased the human effort required for the project. We evaluated the robustness of our system against a baseline based on ESCO's hierarchy, achieving an impressive 81% Positive Predictive Value (PPV) when combining all three models.

09:54
SkiLLMo: Normalized ESCO Skill Extraction through Transformer Models
PRESENTER: Antonio Serino

ABSTRACT. In recent years, natural language processing (NLP) technologies have made a significant contribution in addressing a number of labour market tasks. One of the most interesting challenges is the automatic extraction of competences from unstructured texts.

This paper presents a pipeline for efficiently extracting and standardizing skills from job advertisements using NLP techniques. The proposed methodology leverages open-source Transformer and Large Language Models (LLMs) to extract skills and map them to the European labour market taxonomy, ESCO.

To address the computational challenges of processing lengthy job advertisements, a BERT model was fine-tuned to identify text segments likely containing skills. This filtering step reduces noise and ensures that only relevant content is processed further. The filtered text is then passed to an LLM, which extracts implicit and explicit hard and soft skills through prompt engineering. The extracted skills are subsequently matched with entries in a vector store containing the ESCO taxonomy to achieve standardization.

Evaluation by domain experts shows that the pipeline achieves a precision of 91% for skill extraction, 80% for skill standardization and a combined overall precision of 79%. These results demonstrate the effectiveness of the proposed approach in facilitating structured and standardized skill extraction from job postings.

10:30-11:00 Coffee Break
11:00-12:30 Session 18A: AIED
Location: ROSA DEI VENTI
11:00
Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance

ABSTRACT. Educational data mining (EDM) is a part of applied computing that focuses on automatically analyzing data from learning contexts. Early prediction for identifying at-risk students is a crucial and widely researched topic in EDM. It enables instructors to support at-risk students to stay on track, preventing student dropout or failure. Previous studies have predicted students' learning performance to identify at-risk students by using machine learning on data collected from e-learning platforms. However, most studies aimed to identify at-risk students utilizing the entire course data after the course finished. This does not correspond to the real-world scenario that at-risk students may drop out before the course ends. To address this problem, we introduce an RNN-Attention-KD (knowledge distillation) framework to predict at-risk students early throughout a course. It leverages the strengths of Recurrent Neural Networks (RNNs) in handling time-sequence data to predict students' performance at each time step and employs an attention mechanism to focus on relevant time steps for improved predictive accuracy. At the same time, KD is applied to compress the time steps to facilitate early prediction. In an empirical evaluation, RNN-Attention-KD outperforms traditional neural network models in terms of recall and F1-measure. For example, it obtained recall and F1-measure of 0.49 and 0.51 for Weeks 1-3 and 0.51 and 0.61 for Weeks 1-6 across all datasets from four years of a university course. Then, an ablation study investigated the contributions of different knowledge transfer methods (distillation objectives). We found that hint loss from the hidden layer of RNN and context vector loss from the attention module on RNN could enhance the model's prediction performance for identifying at-risk students. These results are relevant for EDM researchers employing deep learning models.

11:18
The Use of Generative Artificial Intelligence for Upper Secondary Mathematics Education Through the Lens of Technology Acceptance

ABSTRACT. In this study, we investigate students’ perceptions of using Generative Artificial Intelligence (GenAI) in upper secondary mathematics education. Data representing how key constructs of the Technology Acceptance Model (Perceived Usefulness, Perceived Ease of Use, Perceived Enjoyment, and Intention to Use) influence the adoption of AI tools were collected from Finnish high school students. First, a structural equation model for a comparative study with [19] was constructed and analyzed. Then, an extended model with the additional construct of Compatibility, which represents the alignment of AI tools with students’ educational experiences and needs, was proposed and analyzed. The results demonstrate the strong influence of perceived usefulness on the intention to use GenAI, emphasizing the statistically significant role of perceived enjoyment in determining perceived usefulness and ease of use. The inclusion of compatibility improved the model’s explanatory power, particularly in predicting perceived usefulness. This study contributes to a deeper understanding of how AI tools can be integrated into mathematics education and highlights key differences between the Finnish educational context and previous studies based on structural equation modeling.

11:36
UKTA: Unified Korean Text Analyzer

ABSTRACT. Evaluating writing quality is complex and time-consuming, often delaying feedback to learners. While automated writing evaluation tools are effective for English, Korean automated writing evaluation tools face challenges due to their inability to address multi-view analysis, error propagation, and evaluation explainability. To overcome these challenges, we introduce UKTA (Unified Korean Text Analyzer), a comprehensive Korean text analysis and writing evaluation system. UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores. Our approach enhances accuracy and quadratic weighted kappa over existing baselines, positioning UKTA as a leading multi-perspective tool for Korean text analysis and writing evaluation.
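
For readers unfamiliar with the metric, quadratic weighted kappa measures agreement between two raters on an ordinal scale and penalises disagreements by the squared distance between scores. The sketch below is a plain-Python version of the standard definition. It is not UKTA's evaluation code, and the function name and integer label encoding (0 to num_labels - 1) are assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """Quadratic weighted kappa for integer scores in range(num_labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_labels < 2:
        raise ValueError("need at least two labels")
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for x, y in zip(rater_a, rater_b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_labels))
              for j in range(num_labels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0.0:
        # Both raters gave the same single score to every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected

print(quadratic_weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], 3))  # 1.0
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3))        # -1.0
```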

11:54
Measuring (meta)emotion, (meta)motivation, and (meta)cognition using digital trace data: A systematic review of K-12 self-regulated learning

ABSTRACT. Artificial intelligence (AI) has demonstrated significant potential in enhancing digital learning by offering personalized and adaptive experiences that meet learners' individual needs. However, while self-regulated learning (SRL) skills are critical for success in digital environments, AI-driven learner models mainly focus on cognitive processes, with limited integration of SRL skills. This systematic review synthesizes research from 1990 to 2024, analyzing digital trace data from various learning platforms to identify which data serve as indicators of the three phases and areas of SRL in K-12 digital learning. Our findings have identified digital trace data that can be used as indicators of (meta)emotion, (meta)motivation, and (meta)cognition across the three phases of SRL. Additionally, the findings revealed gaps in tracing (meta)motivation and (meta)emotion, especially in the preparatory and appraisal phases of SRL. While a variety of data traces in the (meta)cognitive area are addressed, the analysis of study results highlights the challenges of meaningfully interpreting the learning process. Despite the challenges, the literature review reveals that research on trace data for supporting SRL is evolving, with great potential for integrating SRL tracing with adaptive SRL scaffolding.

11:00-12:30 Session 18B: EMBS
Location: LIBECCIO
11:00
SeismicSense: Phase Picking of Seismic Events with Embedded Machine Learning

ABSTRACT. Analyzing seismic data is essential for understanding natural geological processes and anthropogenic activities, particularly in localizing seismic events. Recent advances in seismic analysis mainly rely on resource-intensive machine learning approaches, which, however, cannot be applied in resource-constrained environments (e.g. underwater, underground or rural areas). To address this, we present SeismicSense, a lightweight neural network (NN)-based approach for seismic data analysis at the sensor level, which enables detecting seismic events and localizing them by picking seismic event phases. Our cascading architecture utilizes an initial NN to filter out non-earthquake events (false positives). Once SeismicSense identifies an earthquake, it then identifies the P- and S-phases, which are critical for accurately localizing seismic activity. Precise identification of these phases allows for the determination of a seismic activity's origin and magnitude, facilitating rapid and accurate response efforts. This also efficiently reduces data transmissions, allowing for selective communication when detecting seismic events, such as earthquakes. While being 20 times smaller than state-of-the-art models and requiring only 186 KB of RAM, SeismicSense achieves F1-scores of 99.4% for earthquake, 98% for P-wave, and 96% for S-wave detections. Furthermore, we demonstrate the efficacy of integer acceleration of modern MCUs, which leads to an 18-fold reduction in inference time on Cortex-M MCUs when compared to non-accelerated inference.

11:18
Software-Hardware Binding for Protection of Sensitive Data in Embedded Software

ABSTRACT. Embedded software used in industrial systems frequently relies on data that ensures the correct and efficient operation of these systems. Thus, companies invest considerable resources in fine-tuning this data, making it their valuable intellectual property (IP). We present a novel protection mechanism for this IP that combines fingerprints extracted from hardware with Boolean logic. Unlike usual copy protection approaches, illegal copies of the software still run on cloned devices, but sub-optimally. According to our security evaluation, only a complex dynamic analysis of the protected software running on the genuine target device can unveil the secret data. This makes the protection offered by our method more difficult to bypass. Notably, our approach does not require additional hardware, relying only on relatively simple updates to the software. We evaluate our protection mechanism by binding the parameters of a PID controller to a microcontroller unit (MCU) by using a physically unclonable function (PUF) based on its SRAM.

11:36
asmMBA: Robust Virtualization Obfuscation with Assembly-Based Mixed Boolean-Arithmetic

ABSTRACT. Commercial virtualization obfuscation tools like VMProtect and Themida, which rely on transforming original code into virtual instructions, have been successfully reverse engineered by attackers. To safeguard the intellectual property of the virtualization obfuscation architecture from reverse engineering, recent works have applied complex Mixed Boolean-Arithmetic (MBA) obfuscation to the handler code responsible for the core functions of the virtualization obfuscation. In this paper, we first show that a state-of-the-art MBA-based protection method such as Loki can be efficiently deobfuscated and then we introduce Loki-Blast. The proposed method effectively simplifies nested MBA expressions, revealing weaknesses in current MBA-based obfuscation methods used in virtualization obfuscation tools. In light of these vulnerabilities, we propose asmMBA, a novel assembly-based MBA obfuscation technique. Applying MBA transformations directly at the assembly level, asmMBA introduces a layer of complexity that complicates the static and dynamic analysis, which enables the software to effectively resist modern deobfuscation tools like MBA-Blast and Chosen-Instruction Attack. Our evaluation shows that asmMBA can generate up to 10^42 distinct obfuscated versions of a simple program depending on the protection level. This makes it difficult for attackers to acquire reusable knowledge from the target program, and it also significantly increases the complexity of program analysis. We experimentally demonstrate that asmMBA expressions are not deobfuscated by the MBA deobfuscation tool. These results demonstrate that asmMBA provides strong protection against deobfuscation attacks while maintaining manageable performance overhead, making it a practical solution for real-world software protection.
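For readers unfamiliar with Mixed Boolean-Arithmetic, the following minimal Python sketch illustrates the general idea using a well-known textbook identity, x + y = (x XOR y) + 2·(x AND y), on 32-bit words. It is not the asmMBA transformation from the paper (which operates at the assembly level); the function names and the 32-bit width are assumptions chosen for illustration only.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit wraparound arithmetic (illustrative width)


def add_plain(x: int, y: int) -> int:
    """Ordinary 32-bit addition."""
    return (x + y) & MASK32


def add_mba(x: int, y: int) -> int:
    """Same result via the identity x + y == (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK32


def add_mba_nested(x: int, y: int) -> int:
    """One more rewriting layer, using x ^ y == (x | y) - (x & y)."""
    xor_part = (x | y) - (x & y)
    return (xor_part + 2 * (x & y)) & MASK32
```

All three functions compute the same value; nesting such rewritings is what makes MBA expressions hard to read, and simplifying them back is the task of MBA deobfuscation tools.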

11:54
Probabilistic Timing Estimates in Scenarios Under Testing Constraints

ABSTRACT. Measurement-based probabilistic (MBP) methods like Extreme Value Theory (EVT) and Markov's Inequality have been exploited to derive probabilistic Worst-Case Execution Time (pWCET) estimates. Usually, the reliability and accuracy of pWCET techniques have been evaluated on medium to large sample sizes, N ∈ [10^3, 10^5]. However, several works increasingly advocate for containing the cost of carrying out the test campaign by reducing the number of executions (i.e. the sample size) required by pWCET analysis. Specific scenarios, for example, impose inherent limitations on the collection of timing measurements due to cost and availability of appropriate testing facilities. In this work, we analyze the impact of small sample sizes on MBP. Our analysis shows that classical EVT models for tail estimation require a threshold that estimates where the tail of the distribution begins. In low sample scenarios, the uncertainty in determining this threshold can compromise the reliability of EVT estimates. We also assess the impact of small samples on RESTK, a time forecast method based on Markov's Inequality. Our results with synthetic data and representative kernels show that RESTK provides the best trade-off in terms of trustworthiness and tightness for small samples, partly due to not relying on the selection of any threshold, as opposed to EVT.
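As background, Markov's Inequality states that for a non-negative random variable X and any t > 0, P(X ≥ t) ≤ E[X] / t. The sketch below shows how this yields a threshold-free execution-time bound for a target exceedance probability. It is only the plain inequality with the sample mean standing in for E[X] (itself an approximation), not the RESTK method from the paper; the function name and inputs are hypothetical.

```python
from statistics import mean
from typing import Sequence


def markov_bound(samples: Sequence[float], exceedance_prob: float) -> float:
    """Return t such that Markov's Inequality gives P(X >= t) <= exceedance_prob.

    Assumption: the sample mean is used as an estimate of E[X].
    """
    if not samples or any(s < 0 for s in samples):
        raise ValueError("need a non-empty sample of non-negative times")
    if not 0 < exceedance_prob <= 1:
        raise ValueError("exceedance_prob must be in (0, 1]")
    # From E[X] / t <= p it follows that t >= E[X] / p.
    return mean(samples) / exceedance_prob
```

No tail threshold has to be chosen here, which is the property the abstract contrasts with EVT. The plain bound is typically very loose, so practical methods refine it.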

11:00-12:30 Session 18C: GIA & DS
Location: BORA
11:00
Mapillary Street Vegetation Scoring: End-to-End Process

ABSTRACT. This paper introduces a novel framework for quantifying urban street greenery by integrating crowdsourced street-level panoramic images with advanced computer vision techniques. By leveraging 360° images captured by volunteers and applying semantic segmentation using the Mask2Former model, this study provides a detailed, human-centered perspective on urban greenery. The framework not only quantifies greenery at the street level but also establishes meaningful correlations with national tree coverage data derived from aerial imagery and LiDAR. Key contributions include the development and validation of a robust methodology for scoring street-level greenery, demonstrating its alignment with large-scale national tree data, and showcasing its potential to enhance urban environmental analysis. These findings provide valuable insights into the automation of green space accessibility analysis and have significant implications in cities.

11:18
MorphoLayerTrace (MLT): A Modified Automated Radio-Echo Sounding Englacial Layer-tracing Algorithm for Englacial Layer Annotation in Ice Penetrating Radar Data

ABSTRACT. Modeling ice flow is a critical component of sea level rise projections, yet the datasets available to enhance our understanding of large-scale ice dynamics remain limited. Extracting the englacial layer configuration of the Greenland ice sheet offers valuable insights into the age of the ice, which can inform studies of past snow accumulation, glacier sliding, and provide context for modern glacier change. Although these englacial layers have been extensively surveyed using ice-penetrating radar/radio-echo sounding, the resulting radargram imagery, with fine-grained ice layers, is often labeled manually or semi-automatically. This is a labor-intensive process that hinders integration into glacier models. In this paper, we propose an improved automatic annotation method, MorphoLayerTrace (MLT), building upon the Automated Radio-Echo Sounding Englacial Layer-tracing Package (ARESELP). Our approach enhances englacial layer tracing by utilizing peak distance thresholds and morphological image processing to select reliable seed points, significantly improving layer continuity. Our technique is designed to operate effectively on both individual radargram frames and multi-frame sets, enabling better performance over extended distances. We evaluate the method using 100 radargram frames collected across North Greenland, demonstrating its ability to trace more layers and maintain greater continuity compared to previous methods. Furthermore, we introduce novel validation metrics, such as the Layer Proportion Score (LPS) and the Multi-Frame Layer Consistency (MFLC) score, which provide a more robust and ground truth-independent evaluation of annotation quality. Our results show that while the method excels in short-range layer detection over the prior layer tracing methods, further refinement is needed for maintaining long-range continuity across multiple frames, offering a promising direction for future development in automatic englacial layer annotation.

11:36
Sanity Checks in Smart Home Sensor Streams

ABSTRACT. The integrity of sensor datasets used in smart home applications is crucial for tasks like activity recognition and automation. We identify common validity issues such as event ordering errors, lifecycle inconsistencies, and data corruption, which are often overlooked but can significantly affect the reliability of analyses. We present a toolbox based on the BeepBeep stream processing library that enables efficient verification of these sanity checks on data streams. Our analysis of several publicly available smart home datasets reveals that most of them violate key assumptions about sensor behavior, emphasizing the need for pre-validation.
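To make the three classes of validity issues concrete, here is a minimal Python sketch of a streaming checker. It does not use BeepBeep (a Java library) and is not the authors' toolbox; the event format `(timestamp, sensor_id, state)`, the ON/OFF state vocabulary, and the function name are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event format: (timestamp, sensor_id, state)
Event = Tuple[float, str, str]


def check_stream(events: Iterable[Event]) -> Iterator[str]:
    """Yield a message for each violated sanity check, in stream order."""
    last_ts: Optional[float] = None
    last_state: Dict[str, str] = {}
    for ts, sensor, state in events:
        # Event ordering: timestamps must never go backwards.
        if last_ts is not None and ts < last_ts:
            yield f"ordering: {sensor} at {ts} arrives after {last_ts}"
        last_ts = ts if last_ts is None else max(last_ts, ts)
        # Data corruption: only known states are accepted.
        if state not in ("ON", "OFF"):
            yield f"corruption: {sensor} reports unknown state {state!r}"
            continue
        # Lifecycle: a sensor should alternate between ON and OFF.
        if last_state.get(sensor) == state:
            yield f"lifecycle: {sensor} repeats {state} at {ts}"
        last_state[sensor] = state
```

Because the checker is a generator that keeps only the last timestamp and the last state per sensor, it processes events one at a time without buffering the stream.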

11:54
Efficient Instance Selection in Tree-Based Models for Data Streams Classification

ABSTRACT. Learning from continuous data streams is a relevant area within machine learning, focusing on the creation and updating of predictive models in real time as new data becomes available for training and prediction. Among the most widely used methods for this type of task, Hoeffding Trees are highly valued for their simplicity and robustness across a variety of applications and are considered the primary choice for generating decision trees in data stream contexts. However, Hoeffding Trees tend to continuously expand as new data is incorporated, resulting in increased processing time and memory consumption, often without providing significant gains in accuracy. In this study, we propose an instance selection scheme that combines different strategies to regularize Hoeffding Trees and their variants, mitigating excessive growth without compromising model accuracy. The method selects misclassified instances and a fraction of correctly classified instances during the training phase. After extensive experimental evaluation, the instance selection scheme demonstrates superior predictive performance compared to the original models (without selection), for both real and synthetic datasets for data streams, using a reduced subset of examples. Additionally, the method achieves relevant improvements in processing time, model complexity, and memory consumption, highlighting the effectiveness of the proposed instance selection scheme.
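The selection rule described above (train on every misclassified instance plus a fraction of the correctly classified ones) can be sketched as a test-then-train loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `predict_one`/`learn_one` interface, the `keep_fraction` parameter, and the toy `MajorityClass` learner standing in for a Hoeffding Tree are all hypothetical.

```python
import random
from collections import Counter


class MajorityClass:
    """Toy online learner used as a stand-in for a Hoeffding Tree (assumption)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Predict the most frequent label seen so far, or None before training.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1


def train_with_selection(model, stream, keep_fraction=0.2, seed=0):
    """Test-then-train loop; returns how many instances were used for training."""
    rng = random.Random(seed)
    used = 0
    for x, y in stream:
        misclassified = model.predict_one(x) != y
        # Always learn from mistakes; learn from correct predictions
        # only with probability keep_fraction.
        if misclassified or rng.random() < keep_fraction:
            model.learn_one(x, y)
            used += 1
    return used
```

With `keep_fraction=1.0` the loop degenerates to ordinary training on every instance, so the parameter controls how much of the stream the model actually consumes.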

12:12
ASML-REG: Automated Machine Learning for Data Stream Regression

ABSTRACT. Online learning scenarios present a significant challenge for AutoML techniques due to the dynamic nature of data distributions, where the optimal model and configuration may change over time. While most research in machine learning for data streams has primarily focused on classification algorithms, regression methods have received significantly less attention. To address this gap, we propose ASML-REG, an Automated Streaming Machine Learning framework designed specifically for regression tasks on data streams. ASML-REG continuously explores a vast and diverse space of pipeline configurations, adapting to evolving data by focusing on the current best design, performing adaptive random searches in promising areas, and maintaining an ensemble of top-performing pipelines. Our experiments with real and synthetic datasets demonstrate that ASML-REG significantly outperforms current state-of-the-art data stream regression algorithms.

11:00-12:30 Session 18D: RE
Location: GRECALE
11:00
UTL: A Unified Language for Requirements Templates

ABSTRACT. Requirements specification is an important phase of the software development life cycle, especially for safety-critical systems (SCS) due to their high number of requirements and certification constraints. The use of templates to specify requirements has been proposed in the literature as they strike a balance between the ambiguity of natural language and the difficulty of using formal languages. However, existing template-based approaches use different notations, rarely provide tool support, and generally target specific types of requirements. Thus, it is often necessary to create new custom templates, but it is difficult to do so given that there is no well-defined process to follow and no unified notation to reuse. To fill this gap, we propose the Unified Templates Language (UTL), a unified language for the definition of requirements templates and a process for using the language. We leverage model-driven engineering (MDE) to build UTL. Using MDE supports the creation and evolution of templates, and it eases the extension, maintainability, and implementation of UTL. UTL was proposed to support an industrial partner in the certification of an SCS, and implemented within a requirements specification tool. In this paper, we introduce the abstract syntax, concrete syntax, well-formedness rules and the semantics of UTL. We also provide a systematic process for creating templates using UTL. We evaluate the ability of UTL to specify different types of templates, and its usability and usefulness through a user study. The results show that UTL covers different kinds of templates, and, together with its supporting tool, it eases the creation of templates.

11:18
Advances in Requirements Engineering for Well-Being, Aging, and Health: A Systematic Mapping Study

ABSTRACT. Context. The increasing focus on well-being, aging, and health (WBAH) is impacting various fields, including Requirements Engineering (RE). This is particularly relevant, as RE involves defining, documenting, and managing software system requirements to ensure that developed systems effectively address user needs. However, there remains a limited understanding of how RE practices can be tailored to create WBAH-centric systems. Aims. This study aims to explore the current landscape of RE practices used in the development of software systems that support WBAH. Method. We conducted a systematic mapping study, analyzing articles from reputable conferences and journals. Results. Our review identified 50 articles published between 2013 and 2023 that examine RE efforts related to specifying and documenting WBAH systems. The predominant topics discussed include usability and evaluation techniques, elicitation and modeling methods, security and privacy concerns for healthcare applications, and the adaptation of RE for specific populations, such as the elderly. Conclusion. This study highlights significant knowledge gaps that present opportunities for further investigation by RE researchers.

11:36
Enhancing Device-Goal-Norm Modeling for Ambient Assisted Living with Large Language Models

ABSTRACT. In the rapidly evolving landscape of elderly care, designing personalized Ambient Assisted Living (AAL) applications relying on IoT-enabled devices presents complex challenges at the intersection of technology, human needs, and ethical considerations. This paper presents an innovative approach that enhances the design process of these applications by integrating large language models (LLMs) to automatically generate interaction patterns between IoT devices and users' needs and preferences. These patterns are used to select the most appropriate devices and their interaction modalities for each user at run-time. The LLM-enhanced process not only reduces analysts' workload but may also reveal nuanced interactions between IoT devices and user needs that might be missed in traditional methods, where the knowledge of the characteristics of available devices may be incomplete or out of date. The proposed approach, on the one hand, facilitates the design of highly adaptive and personalized assistive technologies and, on the other hand, demonstrates the potential of combining artificial intelligence with human expertise to improve software system design.

11:54
SimRE: A Requirements Similarity Tool for Software Product Lines

ABSTRACT. A Software Product Line (SPL) is a paradigm that effectively describes families of products based on reuse. Requirements engineering in this domain is a complex task, especially when new products are introduced. In this context, identifying similarities between new and existing requirements can help avoid additional effort and duplication. On the other hand, with the accelerated progress of deep learning in textual analysis, particularly through pre-trained models, several opportunities for semantic analysis have emerged, including semantic similarity. However, most of the research in this area has focused on the English language, with limited attention given to other languages, such as Spanish.

In this paper, we introduce a novel tool, SimRE, that helps SPL engineers automatically identify the similarity of requirements written in Spanish using multilingual pre-trained models. We conducted a quasi-experiment to validate and compare the performance of pre-trained models within the tool. This analysis aimed to provide a better understanding of their effectiveness in the Spanish language, particularly in the context of a Geographic Information System (GIS) product line.