SAC_2025: THE 40TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING
PROGRAM FOR THURSDAY, APRIL 3RD

09:00-10:00 Session 13: Keynote: Andrea Cavallaro

Title: The Pursuit of Privacy in the AI Age

Abstract:
We generate and share vast amounts of data that reveal a detailed portrait of our lives, exposing our identity, behaviors, and preferences. To enable individuals to exercise greater control over their personal information, I will present novel approaches to identify and protect sensitive attributes within data. I will present feature representations that effectively disentangle sensitive information from non-sensitive attributes. Furthermore, I will present perturbation techniques designed to obfuscate sensitive attributes while preserving or potentially enhancing the overall quality of the data, thereby safeguarding sensitive information from unwanted inference.

Location: PLENARY Room
10:00-10:30 Coffee Break
10:30-12:00 Session 14A: MLA
Location: ROSA DEI VENTI
10:30
Collaborative filtering through weighted similarities of user and item embeddings

ABSTRACT. The adoption of neural networks and other complex models in recommender systems has increased markedly in recent years, and these models are often considered state-of-the-art. However, award-winning research has demonstrated that traditional matrix factorization methods can still be competitive with these newer models, offering simplicity in implementation and lower computational demands. Hybrid models integrating matrix factorization algorithms are a common strategy, effectively combining different methods to mitigate their limitations. This paper introduces a novel ensemble method that merges user-item and item-item recommendations through weighted similarity to generate top-N recommendations. Unlike previous approaches, our proposal employs the same user and item embeddings for both recommendation strategies, enhancing model efficiency. Experimental results indicate that our ensemble method achieves competitive performance and exhibits stability across diverse datasets, performing well in scenarios favoring either user-item or item-item recommendations. Furthermore, the model eliminates the need for embedding-specific fine-tuning, enabling seamless reuse of hyperparameters from the underlying algorithm without performance degradation. This contributes to the model's simplicity and efficiency. Our code is open-sourced at https://anonymous.4open.science/r/weighted-sims-7B45/.
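
As an illustration of the idea of merging user-item and item-item recommendations through weighted similarity over shared embeddings, the following is a minimal sketch and not the authors' implementation. The cosine similarity, the mixing weight `alpha`, the averaging over the user's history, and all function names are assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def blended_scores(user_vec, item_vecs, history, alpha=0.5):
    """Score each unseen item as alpha * user-item + (1 - alpha) * item-item similarity.

    The same item embeddings are used for both terms (hypothetical weighting scheme).
    """
    scores = {}
    for item, vec in item_vecs.items():
        if item in history:
            continue  # do not recommend items the user already interacted with
        user_item = cosine(user_vec, vec)
        item_item = (
            sum(cosine(item_vecs[h], vec) for h in history) / len(history)
            if history else 0.0
        )
        scores[item] = alpha * user_item + (1 - alpha) * item_item
    return scores

def top_n(scores, n):
    """Return the n highest-scoring item ids."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Setting `alpha` to 1.0 reduces the sketch to pure user-item scoring and 0.0 to pure item-item scoring, which is one simple way to cover datasets that favor either strategy.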

10:48
Network-based instance hardness measures for classification problems

ABSTRACT. Instance hardness measures allow one to characterize and understand why some instances are harder to classify than others in a classification dataset. An instance can be hard to classify for different reasons, such as being in an overlapping region of the classes or a region of poor data representativeness. While there are many instance hardness measures in the related literature, they are mainly concerned with measuring class overlap. This paper also addresses measuring sparsity in a dataset by building a proximity graph from data and extracting some network-based measures from the nodes. Experimentally, we show that some of these measures are effective in characterizing instance hardness and complement the ones from the literature by measuring the density of the regions where the instances are located.

11:06
Data Balancing for Mitigating Sampling Bias in Machine Learning

ABSTRACT. We increasingly integrate technology into our daily activities, and using Machine Learning (ML) algorithms in various domains has become a common practice. However, in crucial sectors where algorithmic decisions significantly impact people's lives, there is a need to scrutinize these decisions more carefully. Using these algorithms in critical areas, such as courtrooms, raises concerns about potential bias and prejudice, directly affecting the fairness and impartiality of these tools. There is an urgent need to create algorithms supporting ethical decisions. This paper proposes using data balancing techniques to mitigate the sampling bias present in datasets, aiming to make subsequent ML algorithm training more impartial. A version of the ADASYN algorithm is developed, which performs data balancing at both the class level and at the level of protected attributes, enhancing the diversity and representativeness of the protected groups in the datasets. Experimental results show the technique can promote greater fairness in the predictions of different ML models while keeping a good trade-off with overall accuracy.

11:24
MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation

ABSTRACT. In this paper, we introduce MoRSE (Mixture of RAGs Security Experts), the first specialised AI chatbot for cybersecurity. MoRSE aims to provide comprehensive and complete knowledge about cybersecurity. MoRSE uses two RAG (Retrieval Augmented Generation) systems designed to retrieve and organize information from multidimensional cybersecurity contexts. MoRSE differs from traditional RAGs by using parallel retrievers that work together to retrieve semantically related information in different formats and structures. Unlike traditional Large Language Models (LLMs) that rely on Parametric Knowledge Bases, MoRSE retrieves relevant documents from Non-Parametric Knowledge Bases in response to user queries. Subsequently, MoRSE uses this information to generate accurate answers. In addition, MoRSE benefits from real-time updates to its knowledge bases, enabling continuous knowledge enrichment without retraining. We have evaluated the effectiveness of MoRSE against other state-of-the-art LLMs, evaluating the system on 600 cybersecurity-specific questions. The experimental evaluation has shown that the improvement in terms of relevance and correctness of the answer is more than 10% compared to known solutions such as GPT-4 and Mixtral 7x8.

11:42
PULLM: A Multimodal Framework for Enhanced 3D Point Cloud Upsampling Using Large Language Models

ABSTRACT. Point cloud upsampling is a critical task in 3D computer vision, aiming to generate dense and uniformly distributed point sets from sparse inputs. While current self-supervised methods show promise, they often struggle with preserving fine-grained geometric details, especially for highly sparse point clouds. To address these limitations, we propose PointUpsampleLLM (PULLM), a novel multimodal framework that leverages the power of large language models (LLMs) to enhance 3D point cloud upsampling. PULLM integrates a pretrained Point Cloud LLM (PointLLM) with visual features extracted from point clouds, learning a unified representation that captures both geometric and semantic information. At the core of our approach is the Feature Aware Translator (FAT) module, which effectively bridges the modality gap between visual and textual features, enhancing the spatial understanding of the LLM. PULLM generates textual descriptions of point clouds on-the-fly, eliminating the need for large paired datasets. Extensive experiments on the PU1K and PUGAN benchmarks demonstrate that PULLM consistently outperforms state-of-the-art methods, achieving significant improvements in Chamfer Distance, Hausdorff Distance, and Point-to-Plane distance metrics. For instance, on the PUGAN dataset with sparse inputs, PULLM achieves a 56.15% improvement in Chamfer Distance over the best baseline. Our qualitative results further illustrate PULLM's superior ability to preserve fine details and generate high-quality upsampled point clouds across various object types and geometries.
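
For readers unfamiliar with the Chamfer Distance metric mentioned above, here is a small brute-force sketch. Conventions differ between papers (squared versus unsquared distances, sums versus means), and the abstract does not say which one is used, so the symmetric mean-of-squared-distances form below is an assumption.

```python
import math

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: mean squared nearest-neighbour distance, summed over
    both directions. Brute force, O(len(p) * len(q)).
    """
    def one_way(src, dst):
        # For every source point, squared distance to its closest point in dst.
        return sum(min(math.dist(a, b) ** 2 for b in dst) for a in src) / len(src)

    return one_way(p, q) + one_way(q, p)
```

A lower value means the upsampled cloud lies closer to the reference cloud; identical sets give 0.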

10:30-12:00 Session 14B: IRMAS
Location: LIBECCIO
10:30
Communication Isolation For Multi-Robot Systems Using ROS2

ABSTRACT. This article examines how to design communication isolation in a fleet of robots using the Robot Operating System (ROS2). Several communication methods are analyzed and compared based on various criteria (isolation scale, isolation strength, network usage, etc.). A specific scenario is implemented for each communication method to demonstrate their feasibility and highlight some of their limitations. Finally, experimental results on both simulations and real robots compare five communication strategies, providing guidelines to help select the best method depending on the project's requirements.

10:48
MARTES: Multi-Agent Reinforcement learning Training Environment for Scheduling

ABSTRACT. Digitization fostered by the Industry 4.0 paradigm and smart factories leads to more connectivity and data abundance but also to a more dynamic industrial environment that makes scheduling an even harder problem. Large factories with complex configurations like Hybrid Flow-Shops (HFS) cannot rely on centralized, reactive, and non-adaptive heuristics, or on metaheuristics that produce high-quality schedules but are time-expensive. We propose MARTES, a Multi-Agent Reinforcement Learning (MARL) Training Environment for Scheduling. In this work, MARTES trains models to be used in HFS scenarios. The resulting models decide among different dispatching rules to select which job to process next. The results show that MARTES models yield high-quality schedules, outperforming traditional dispatching rules like First Come First Serve, Earliest Deadline First, or Shortest Job First by up to 26.4%, increasing the number of deadlines met by more than 30%, and improving tardiness by up to 50.5% in time-constrained scenarios. MARTES models can also compete in performance with heuristics such as NEH and metaheuristics such as genetic (GA) or iterative greedy (IG) algorithms, differing by less than 1% in makespan for large instances. Time-wise, NEH can be up to 2 orders of magnitude slower than MARTES models' training times. GA or IG execution times can be similar to MARTES models' training times but require additional executions when changes occur in the factory pipeline, unlike MARTES models.
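
The three baseline dispatching rules named above can be sketched as follows. This only illustrates the rules themselves, not MARTES or its learned policy; the `Job` fields and the `next_job` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arrival: float   # time the job entered the queue
    deadline: float  # due time
    duration: float  # processing time on the machine

# Each rule maps a job to a priority key; the job with the smallest key runs next.
RULES = {
    "FCFS": lambda job: job.arrival,   # First Come First Serve
    "EDF": lambda job: job.deadline,   # Earliest Deadline First
    "SJF": lambda job: job.duration,   # Shortest Job First
}

def next_job(queue, rule):
    """Pick the next job from a non-empty queue under the given dispatching rule."""
    return min(queue, key=RULES[rule])
```

An agent that chooses among dispatching rules, as the abstract describes, would in this sketch amount to selecting the `rule` argument at each decision point.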

11:06
A Semantic Mapping Framework for Service Robots

ABSTRACT. Service robots are undergoing a massification process similar to what happened with personal computers and cell phones a few decades ago. Their ubiquitous coexistence and interaction with humans requires that their representation models of the workspace go beyond metric information used for safe navigation. They are also required to assign semantic meaning to objects and places, i.e. to build semantic maps, in order to understand scenes and engage in human-like interactions. This paper proposes the Semantic MAPping (SMAP) framework to provide a service robot operating in human populated environments with a semantic mapping layer on top of a metric SLAM layer. SMAP is modular, expandable, and efficient enough to run locally on the robot. It has been implemented in Robot Operating System 2 (ROS2) using modular Docker containers. Preliminary experiments with a Pioneer 3-DX mobile robot having a system on module Nvidia Jetson AGX Xavier demonstrated its potential for future service robotics applications.

11:24
Lightweight Decentralized Neural Network-Based Strategies for Multi-Robot Patrolling

ABSTRACT. The problem of decentralized multi-robot patrol has previously been approached primarily with hand-designed strategies for minimization of "idleness" over the vertices of a graph-structured environment. Here we present two lightweight neural network-based strategies to tackle this problem, and show that they significantly outperform existing strategies in both idleness minimization and against an intelligent intruder model. Our results also indicate important considerations for future strategy design.
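
The abstract does not define "idleness"; in the patrolling literature it commonly means the time elapsed since a vertex was last visited. Under that assumption, a minimal sketch of the quantities a strategy would try to minimize looks like this (function names are hypothetical):

```python
def idleness(last_visit, now):
    """Idleness per vertex: time elapsed since its most recent visit."""
    return {vertex: now - t for vertex, t in last_visit.items()}

def average_idleness(last_visit, now):
    """Mean idleness over all vertices of the patrol graph."""
    values = idleness(last_visit, now).values()
    return sum(values) / len(values)

def worst_idleness(last_visit, now):
    """Largest idleness over all vertices, i.e. the most neglected vertex."""
    return max(idleness(last_visit, now).values())
```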

11:42
ST-CBS: Spatio-Temporal Conflict Based Search in Continuous Space for Multi-Agent Pathfinding

ABSTRACT. In this paper, we consider the problem of Multi-Agent Pathfinding (MAPF) in continuous space to find conflict-free paths. The difficulty of the problem arises from two primary factors. First, the involvement of multiple agents leads to combinatorial decision-making, escalating the search space exponentially. Second, the continuous space presents potentially infinite states and actions. We propose a two-level approach, Spatio-Temporal Conflict Based Search (ST-CBS). For the low level, we develop Unidirectional Spatio-Temporal RRT* to generate a path in a spatio-temporal state space for each agent without considering inter-agent conflicts. At the high level, ST-CBS performs a best-first search on a Constraints Tree to resolve conflicts found in the paths of agents. Our method offers benefits in terms of deadlock prevention and faster computation. In the experiment, ST-CBS achieves higher success rates even with a large number of agents (100% for 20 and more agents) and faster planning and execution time compared to recent MAPF algorithms.

10:30-12:00 Session 14C: CPS
Location: BORA
10:30
Rabbitail: A Tail Latency-Aware Scheduler for Deep Learning Recommendation Systems with Hierarchical Embedding Storage

ABSTRACT. Deep learning-based recommendation systems are critical for many online platforms, but face challenges in managing large embedding tables while meeting strict latency requirements. This paper presents Rabbitail, a novel inference scheduler designed for recommendation systems utilizing hierarchical DRAM-SSD embedding storage. Rabbitail employs a cache-aware approach, classifying inferences into hit and miss categories based on embedding cache lookup results. This allows hit inferences to proceed immediately to top MLPs without waiting for slower SSD retrievals. For miss inferences, Rabbitail implements an on-demand embedding lookup strategy and a reordering mechanism to optimize SSD retrieval. Additionally, it uses dedicated resource allocation for prompt processing of miss queue MLP tasks and employs batch splitting to manage maximum execution times. Evaluations using real-world datasets demonstrate that Rabbitail significantly reduces end-to-end model inference tail latency, achieving a 53.7% lower p99 tail latency compared to the baseline while maintaining throughput.
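
The hit/miss classification step can be pictured with the short sketch below. It assumes that an inference counts as a hit only when every embedding it needs is already in the DRAM cache; the abstract does not spell out this criterion, and the data layout and names are hypothetical.

```python
def classify(requests, cache):
    """Split inference requests into hit and miss queues.

    requests: iterable of (request_id, embedding_ids) pairs.
    cache: set of embedding ids currently held in the DRAM cache.
    Assumption: a request is a hit only if all of its embeddings are cached.
    """
    hits, misses = [], []
    for request_id, embedding_ids in requests:
        if all(e in cache for e in embedding_ids):
            hits.append(request_id)    # can proceed without any SSD retrieval
        else:
            misses.append(request_id)  # needs at least one SSD lookup first
    return hits, misses
```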

10:48
Relay Memory Analysis via Constraint Satisfaction Modeling for Malware Detection in the Electrical Power Grid

ABSTRACT. In this paper, we describe an artificial intelligence (AI) method that recognizes valid power grid physics data in the memory dump of an industrial computer, otherwise known as a protective relay. Protective relays are special-purpose computers that run algorithms specialized in monitoring and controlling power grid equipment such as power transformers. We designed and implemented in Python a backtracking search algorithm based on the IEC 61850 specification, which seeks to validate the physics meaning of analyzed data. These data are assigned to the domains of power quantities, which are then structured into a constraint hypergraph. Our approach performs node and arc consistency checks and revisions on the constraint hypergraph, generates power quantities for data assignments during the search, and then generates a search tree while exploring assignments of data to power quantities that are physics compliant. If a complete assignment is reached, the analyzed data are deemed physics compliant. If otherwise a solution is not possible, the analyzed data are deemed to include malicious code and data.

The rationale behind our work is that the bytes of shellcode or memory addresses, which are commonly injected by exploits and malware, fail to show a relation with the physics of power equipment.

11:06
WMN-CDA: Contrastive Domain Adaptation for Wireless Mesh Network Configuration

ABSTRACT. Wireless Mesh Networks (WMNs) have become essential for a wide range of applications, such as industrial automation, environmental monitoring, and smart cities. Network operators encounter significant challenges when selecting WMN parameters to ensure good network performance under various ambient operating conditions. Current domain adaptation methods designed for WMN configurations transfer the network configuration knowledge learned from simulations (source domain) to the physical deployment (target domain) at the domain level and fail to consider the simulation-to-reality gap variance under different network configurations, which causes misalignment and overfitting issues. To address such issues, we introduce the WMN Contrastive Domain Adaptation (WMN-CDA) framework, which leverages contrastive learning to transfer the network configuration knowledge learned from simulations to the physical deployment in a discriminative way at a granular scale. WMN-CDA employs the Network Configuration and Simulation-to-Reality contrastive losses to align feature representations and provide good network configuration predictions for physical deployments. We have implemented WMN-CDA and evaluated it with the data collected from a physical testbed with 50 devices and four wireless simulators. Experimental results show significant improvements over the baseline.

11:24
Harnessing Sub-blocks Erase of NAND flash for Secure Deletion Performance Enhancement on CPS

ABSTRACT. With the growing awareness of data security, the secure deletion of digital data has become a common practice for security-conscious users. In particular, while Cyber-Physical Systems (CPS) are widely used for healthcare, transportation, and industrial automation, these systems often involve sensitive data. As the solid-state NAND flash has become the main storage medium of CPS, secure deletion of NAND flash has become a focal point for the CPS community. However, due to the nature of out-of-place updates and asymmetric access and erase units, directly enforcing erase-based secure deletion mechanisms on NAND flash could lead to significant performance degradation. To resolve such a concern, this paper proposes the sub-block erase secure deletion (SESD) strategy, which utilizes the sub-block erase of 3D NAND flash to enhance the efficiency of secure deletion. Firstly, the previous sub-block-aware runtime victim finder is redesigned to streamline the criteria for selecting victim sub-blocks, aiding the space allocation and secure deletion efficiency. Secondly, a hotness-aware allocator is included to track the update frequency of logical pages in a block hot-level list for lowered secure deletion overhead. Evaluation results demonstrate that the proposed strategy can reduce write amplification by up to 68.94%, compared with the conventional block-erase-based secure deletion schemes.

11:42
Leveraging BRSKI to Protect the Hardware Supply Chain of Operational Technology: Opportunities and Challenges

ABSTRACT. The increase of interconnected Operational Technology (OT) devices leads to a need for scalable, yet secure onboarding to establish a trust relationship between a new device and its operator domain. The protocol Bootstrapping Remote Secure Key Infrastructure (BRSKI) is a promising candidate to automatically establish such trust relationships and secure the OT hardware supply chain, especially when used in combination with hardware-based cryptographic device identities. Although there is a reference implementation, BRSKI has not seen many real-world applications yet. We develop a testbed to investigate possible causes by analyzing the capabilities of the BRSKI reference implementation, optimizing specific aspects, and extending its functionality to utilize trusted platform modules protecting the keys bound to the device's identity. Subsequently, we conduct a feasibility study to assess whether BRSKI can be used in conformity with IEC 62443. Our findings suggest that BRSKI provides promising opportunities to secure the OT hardware supply chain but also that there is potential for improvement in its implementation.

10:30-12:00 Session 14D: SEC
Location: GRECALE
10:30
U Can Touch This! Microarchitectural Timing Attacks via Machine Clears

ABSTRACT. Microarchitectural timing attacks exploit subtle timing variations caused by hardware behaviors to leak sensitive information. In this paper, we introduce MCHammer, a novel side-channel technique that leverages machine clears induced by self-modifying code detection mechanisms. Unlike most traditional techniques, MCHammer does not require memory access or waiting periods, making it highly efficient. We compare MCHammer to the classical Flush+Reload technique, improving in terms of trace granularity, providing a powerful side-channel attack vector. Using MCHammer, we successfully recover keys from a deployed implementation of a cryptographic tool. Our findings highlight the practical implications of MCHammer and its potential impact on real-world systems.

10:48
MemBERT: Foundation model for memory forensics

ABSTRACT. Foundation models have demonstrated significant advancements in natural language processing and computer vision, yet their potential in cybersecurity is unexplored. Current memory forensics tools and machine learning models often suffer from limited versatility and adaptability, presenting a crucial research gap. To address this, we introduce MemBERT, a foundation model designed explicitly for memory forensics. MemBERT is trained on extensive process dump data, with and without metadata inclusion, to capture intricate patterns present in the main memory. Its potential impact on cybersecurity practices could be significantly similar to the effects of foundation models in natural language processing. We aim to streamline memory forensics by reducing the manual effort and coding traditionally required by cybersecurity practitioners. Through comprehensive experimentation, we demonstrate MemBERT’s efficiency in a downstream task of extracting OpenSSH encryption keys and other memory structures from raw process dumps. The results reveal that the robust embeddings generated by MemBERT significantly help identify structures within memory. Additionally, we demonstrate that our model’s embeddings can be compressed with minimal loss of accuracy, further highlighting its efficiency. Our findings with MemBERT go beyond just its performance in a specific task. They indicate that MemBERT substantially advances memory forensics, providing a versatile and powerful tool for cybersecurity professionals. This research not only addresses the existing limitations of the current forensics process model but also sets the stage for the broader application of foundation models in the cybersecurity domain.

11:06
Swiss Cheese CAPTCHA: A Novel Multi-barrier Mechanism for Bot Detection
PRESENTER: P. Sahithi Reddy

ABSTRACT. A Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is one of the primary barriers between notorious bots and legitimate human users. However, advancements in Artificial Intelligence (AI) have enabled malicious bots to circumvent CAPTCHA challenges effectively. As a result, several types of CAPTCHA have been rendered ineffective.

In this paper, we introduce Swiss Cheese CAPTCHA, a novel sensor-based solution designed to be easily solvable by humans while presenting multiple obstructions for bots (similar to the Swiss Cheese Model) even when the sensor outputs can be predicted and interfered with. We leverage a range of human cognitive abilities and the Generic Sensor API in modern devices to provide robust protection against automated attacks by making it more computationally expensive for bots to produce a valid answer within a stipulated time. We conducted two user studies to assess our proposal's effectiveness: one involving 116 participants to assess the likability and improve the design, and the other, with 107 participants, to investigate the impact of the design changes on cognitive abilities. Our results from these studies show an average completion time of 4.76 seconds and 6.12 seconds, with a success rate of 90.3% and 83.25%, respectively. By analyzing the 2141 resultant trajectories from both user studies, we assess the learnability, error recovery rate, efficiency, and satisfaction of users using the scheme. Finally, we devise an automated attack against our proposal to analyze its security in the real world; we find the probability of attack success is low.

11:24
Anomaly Detection and Mitigation for Electric Vehicle Charging-Based Attacks on the Power Grid

ABSTRACT. With the increasing adoption of Electric Vehicles (EVs), power grids have to deal with the resulting increase in EV charging loads. A generic method of handling EV loads is load balancing, which requires cooperation from the involved systems, i.e., EVs and Charge Points (CPs). However, if we consider the potential of compromised EVs/CPs, existing load balancing methods fail and the threat of EV charging-based attacks on grid stability arises. In this paper, we address this issue by proposing a combined concept for the detection and mitigation of related attacks. Specifically, we propose a two-step Intrusion Detection System (IDS) that first detects attacks with a potential impact on the grid and in a second step identifies the systems involved in an attack. The design of the IDS enables two attack type-dependent mitigation methods that either correct manipulated data or counteract malicious changes in charging load. Our evaluation identifies specific design choices that enable a good attack detection performance. Additionally, our evaluation shows the effectiveness of the proposed mitigation methods and their relation to IDS performance.

11:42
Improving Anomaly Detection for Electric Vehicle Charging with Generative Adversarial Networks

ABSTRACT. Intrusion Detection Systems (IDSs) are often considered to be an important security mechanism for different use-cases. The Electric Vehicle (EV) charging use-case is one example, with various research articles proposing IDS solutions. One issue in this context, however, is the lack of representative datasets with a variety of realistic attack scenarios, which are vital for evaluating IDSs. Especially concerning the cyber-physical aspects of EV charging, representative datasets are missing, and related work usually relies on generating random anomalies or manual attack insertions in datasets of normal charging behavior. This can result in unrealistic or biased attack data. In this paper, we address this issue by proposing a Generative Adversarial Network (GAN)-based IDS training method for EV charging. For this, a GAN is used against a pre-trained IDS to generate attack data that avoids detection. Afterwards, the IDS can be re-trained under consideration of the new attack data in order to eliminate persistent gaps or biases in detection. We implement and evaluate the GAN-based training system. Our evaluation shows the ability of the GAN to identify flaws in existing IDSs. Additionally, we show the effectiveness of re-training IDSs with the GAN output.

12:00-13:30 SAC Luncheon
13:30-15:00 Session 15A: MLA
Location: ROSA DEI VENTI
13:30
Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation

ABSTRACT. Sequential recommendation aims to predict users' next actions by analyzing their historical behavior. Lately, contrastive learning has become prominent in this domain, especially when user interactions with items are sparse. Although data augmentation methods have flourished in fields like computer vision, their potential in sequential recommendation remains under-explored. Thus, we present Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation (HCLRec), a novel framework that harnesses multiple augmentation techniques to create diverse views on user sequences. This framework systematically employs existing augmentation techniques, creating a hierarchy to generate varied views. First, we augment the input sequences to various views using multiple augmentations. Through the continuous composition of these augmentation methods, we formulate both low-level and high-level view pairs. Second, an effective sequence-based encoder is used to embed input sequences, complemented by the supplementary blocks to capture users' nonlinear behaviors, which are further varied by augmentations. Input sequences are routed to subsequent layers based on the number of augmentations applied, helping the model discern intricate sequential patterns intensified by these augmentations. Finally, contrastive losses are calculated between view pairs of the same level within each layer. This allows the encoder to learn from the contrastive losses between augmented views of the same level, and the gap caused by different information between the low-level views and high-level views by multiple augmentations is reduced. In evaluations, HCLRec outperforms state-of-the-art methods by up to 7.22% and demonstrates its effectiveness in handling sparse data.

13:48
Hybrid Flow Shop Scheduling through Reinforcement Learning: A systematic literature review
PRESENTER: Victor Pugliese

ABSTRACT. This paper reviews the application of Reinforcement Learning (RL) in solving Hybrid Flow Shop Scheduling (HFS) problems, a complex manufacturing scheduling challenge. HFS involves processing jobs through multiple stages, each with multiple machines that can work in parallel, aiming to optimize objectives like makespan, tardiness, and energy consumption. While traditional methods are well-studied, the application of RL to HFS problems is relatively new. The review analyzes 26 studies identified through IEEE Xplore, Scopus, and Web of Science databases (as of April 2024), categorizing them based on RL algorithms, problem types, and objectives. Our analysis reveals the increasing adoption of advanced RL methods like Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to handle the complexities of HFS, often achieving superior performance compared to metaheuristics and scheduling heuristics. Furthermore, we explore the trend of integrating RL with other optimization techniques and discuss the potential for real-world applications, model interpretability, and the consideration of additional constraints and uncertainties. This review provides valuable insights into the current state and future directions in HFS using RL.

14:06
Neighbor-Based Decentralized Training Strategies for Multi-Agent Reinforcement Learning
PRESENTER: Davide Domini

ABSTRACT. Multi-agent deep reinforcement learning has demonstrated significant potential as a promising framework for developing autonomous agents capable of operating within complex, multi-agent environments in a wide range of domains, like robotics, traffic management, and video games. Centralized training with decentralized execution has emerged as the predominant training paradigm, demonstrating significant effectiveness in learning complex policies. However, its reliance on a centralized learner necessitates that agents acquire policies offline and subsequently execute them online. This constraint motivates the exploration of decentralized training methodologies. Despite their greater flexibility, decentralized approaches often face critical challenges, such as slower convergence rates, heightened instability, and lower performance compared to centralized methods. Therefore, this paper proposes three neighbor-based decentralized training strategies based on the well-known Deep Q-Learning algorithm and investigates their effectiveness as a viable alternative to centralized training. We evaluate experience sharing, k-nearest neighbor averaging, and k-nearest neighbor consensus methods in a cooperative multi-agent environment and compare their performance against centralized training and totally decentralized training. Our results show that neighbor-based methods can achieve comparable performance to centralized training while offering improved scalability and communication efficiency.
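
One possible reading of k-nearest neighbor averaging is sketched below: each agent replaces its parameters with the mean of its own parameters and those of its k spatially closest agents. The abstract does not describe the actual procedure, so the Euclidean neighbor selection, the flat parameter lists, and the function names are all assumptions made for illustration.

```python
import math

def k_nearest(positions, agent, k):
    """Ids of the k agents closest to `agent` by Euclidean distance."""
    others = [a for a in positions if a != agent]
    others.sort(key=lambda a: math.dist(positions[agent], positions[a]))
    return others[:k]

def neighbor_average(params, positions, k):
    """Return new parameters where each agent averages with its k nearest neighbors.

    params: dict agent_id -> list of floats (e.g. flattened network weights).
    positions: dict agent_id -> coordinate tuple.
    All averages are computed from the old parameters (synchronous update).
    """
    updated = {}
    for agent, own in params.items():
        group = [agent] + k_nearest(positions, agent, k)
        updated[agent] = [
            sum(params[g][i] for g in group) / len(group) for i in range(len(own))
        ]
    return updated
```

Only neighbor-to-neighbor exchange is needed in this sketch, which is the kind of locality that would avoid a centralized learner.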

14:24
SC-Block++: A Blocking Algorithm Based on Adaptive Flood Regularization

ABSTRACT. The rapid surge in the number of Web shops presents a challenge for consumers: navigating through the vast amount of stores and products available. Therefore, entity resolution has become an important task to aggregate product information across different Web shops. As entity resolution is a computationally demanding process, its pipelines are divided into two: a blocking phase, which uses a computationally cheap method to select candidate product pairs, and a matching phase with a computationally expensive method to identify matching pairs from the set of candidate pairs. In this paper, we propose SC-Block++, an extension to a state-of-the-art blocking algorithm SC-Block. SC-Block utilizes a RoBERTa base transformer model, trained using Supervised Contrastive Learning, to position the product records in an embedding space, and produces a set of candidate pairs using a nearest-neighbour search. We extend the training procedure of the RoBERTa base transformer model by incorporating Adaptive Flood Regularization (AdaFlood), a regularization method aimed to prevent overfitting and to improve the generalization performance of the model. We compare SC-Block++ to SC-Block, and other benchmark methods on three different data sets, and find that SC-Block++ is able to construct candidate pairs more effectively than the other blocking schemes.
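
The blocking phase described above (embed records, then take nearest neighbours as candidate pairs) can be sketched as follows. The sketch uses hand-written vectors and a brute-force cosine search in place of RoBERTa embeddings and an indexed nearest-neighbour search, so it only illustrates the shape of the step; all names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def block(left, right, k):
    """Candidate pairs: each record in `left` paired with its k most similar records in `right`.

    left, right: dict record_id -> embedding vector (assumed to be precomputed).
    """
    pairs = set()
    for left_id, left_vec in left.items():
        ranked = sorted(right, key=lambda r: cosine(left_vec, right[r]), reverse=True)
        pairs.update((left_id, right_id) for right_id in ranked[:k])
    return pairs
```

The more expensive matching phase would then run only on the returned pairs instead of on every possible combination of records.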

13:30-15:00 Session 15B: ISDE
Location: LIBECCIO
13:30
Soilcast: a Multitask Encoder-Decoder AI Model for Precision Agriculture

ABSTRACT. This paper introduces Soilcast, an advanced multitask encoder-decoder predictive model designed to accurately forecast soil moisture in agricultural fields. By leveraging data from multiple sources and locations, Soilcast enhances resilience against overfitting, a common issue with traditional Long Short-Term Memory (LSTM) models. Tested on over 1,000 agricultural fields within the region of XXXX, in YYYY (masked as requested), Soilcast demonstrated superior performance compared to pure LSTM models, reducing mean squared error and mean absolute error by 10% and 15%, respectively, on average across datasets. The model's flexible architecture allows for both generalization across diverse datasets and specialization for specific fields, ensuring accurate daily soil moisture predictions, which are crucial for effectively optimizing irrigation. Additionally, Soilcast achieved a classification accuracy exceeding 92% in predicting soil moisture stress, outperforming single-task models in both robustness and generalization. These results position Soilcast as a valuable tool for improving water efficiency in response to climate challenges, fostering sustainable precision agriculture practices.

13:48
Fusing Expert Knowledge and Internet of Things Data for Digital Twin Models: Addressing Uncertainty in Expert Statements

ABSTRACT. Extracting Digital Twin models by fusing expert knowledge with Internet of Things data remains a challenging and open research area. Existing literature offers very limited approaches for seamless and systematic extraction of Digital Twin models from these combined sources. In this paper, we address the research gap by proposing a novel approach that considers and integrates the uncertainty inherent in human expert knowledge into the extraction processes of Digital Twin models. Given that experts possess unique experiences, contextual understandings and judgements, their knowledge can be highly divergent, complex, ambiguous, and even incorrect or incomplete. Consequently, not all expert knowledge statements should be equally weighted in the resulting simulation models. Our contributions include a comprehensive literature review on the uncertainty in expert knowledge and the proposal of an approach to integrate this uncertainty in the extraction of Digital Twin models from fused expert knowledge and IoT data. We demonstrate our approach through a case study in reliability assessment.

14:06
Deep Reinforcement Learning of Simulated Students Multimodal Mobility Behavior: Application to the City of Toulouse

ABSTRACT. This study presents a Deep Reinforcement Learning (DRL) approach to address the multimodal mobility behavior of daily commuters, focusing specifically on students' home-university multimodal trips. The proposed mesoscopic model addresses key limitations of recent macroscopic and microscopic models by balancing individual mobility preferences with significant group-level student factors. At its core, the model employs a Proximal Policy Optimization (PPO)-based agent that learns to match student navigation behavior in a multimodal transportation network, considering the student's group mobility factors such as vehicle ownership and origin-destination regions. Experiments conducted on a SUMO (Simulation of Urban MObility) simulated dataset of university students' trips in the Toulouse metropolitan area demonstrate the model's performance in both unimodal and multimodal transportation scenarios. The resulting policy offers potential applications in predicting future multimodal mobility behavior, optimizing resource allocation for communities with regular travel needs, and developing more efficient and environmentally friendly urban transportation systems.

14:24
A DIF-Driven Threshold Tuning Method for Improving Group Fairness

ABSTRACT. To promote social good, current decision support systems based on machine learning must not propagate society's various types of discrimination. Consequently, a desirable behavior for classifiers used in decision-making is that their results do not favor or disadvantage any specific sociodemographic group. One way to achieve this behavior is through post-processing methods, which apply threshold tuning to select the decision boundary that enhances the impartiality of the trained model's decisions. Various strategies have been proposed to determine the optimal threshold, but finding the trade-off between fairness and predictive performance remains challenging. Recently, the application of Differential Item Functioning (DIF) concepts has proven effective for this purpose in model selection, which is a similar application. This finding makes using DIF in threshold tuning appealing and an unexplored contribution to the literature on fairness in machine learning. This paper addresses this gap and proposes DIF-PP, a novel post-processing method based on DIF. We experimentally evaluated our method against three baselines using fifteen datasets, six classification algorithms with sixteen settings for each one, four group fairness metrics, one predictive performance measure, one multi-criteria measure, and one statistical significance test. Our experimental results indicate that DIF-PP provides the best trade-off between group fairness metrics and predictive performance, making it the optimal choice for threshold tuning of binary classifiers applied to decision-making tasks involving people.
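
To make the notion of post-processing by threshold tuning concrete, the following is a generic sketch. It is not DIF-PP and uses no Differential Item Functioning: it assumes two groups, uses the statistical parity gap as the fairness measure and plain accuracy as the performance measure, and combines them with a hypothetical `weight` parameter. All function names and the toy data are illustrative assumptions.

```python
def positive_rate(scores, groups, group, threshold):
    """Share of members of `group` that receive a positive decision."""
    selected = [s >= threshold for s, g in zip(scores, groups) if g == group]
    return sum(selected) / len(selected)

def accuracy(scores, labels, threshold):
    """Share of decisions that agree with the true binary labels."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def tune_threshold(scores, labels, groups, candidates, weight=1.0):
    """Pick the candidate threshold maximizing accuracy minus weight * parity gap."""
    g1, g2 = sorted(set(groups))  # assumes exactly two groups

    def objective(t):
        gap = abs(positive_rate(scores, groups, g1, t)
                  - positive_rate(scores, groups, g2, t))
        return accuracy(scores, labels, t) - weight * gap

    return max(candidates, key=objective)

# Toy classifier scores for eight people in two groups.
scores = [0.9, 0.7, 0.4, 0.2, 0.8, 0.6, 0.5, 0.1]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
candidates = [0.3, 0.45, 0.65]

fair_choice = tune_threshold(scores, labels, groups, candidates, weight=1.0)
accurate_choice = tune_threshold(scores, labels, groups, candidates, weight=0.0)
print(fair_choice, accurate_choice)
```

On this toy data the fairness-weighted objective selects a different threshold than accuracy alone, which is the trade-off the abstract refers to.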

13:30-15:00 Session 15D: SEC
Location: GRECALE
13:30
Strengthening Application Security through Integrity Protection of System Call Usage

ABSTRACT. Attackers who exploit program vulnerabilities leverage system calls provided by the OS kernel to execute malicious actions, posing significant security risks. This work introduces Scynteg, a new framework for protecting system call usage in modern applications. Scynteg enforces control flow integrity of system call invocations and system call argument integrity through a combination of a secure kernel module and an LLVM-based compiler. We show that Scynteg effectively protects C/C++ programs against control-flow hijacking and non-control-data attacks on system call arguments. Our prototype implementation for Linux on Arm leverages hardware security extensions to effectively protect applications while incurring a modest performance overhead.

13:48
Stealth Extension Exfiltration (SEE) Attacks: Stealing User Data without Permissions via Browser Extensions
PRESENTER: Chaejin Lim

ABSTRACT. Web browser extensions have become essential tools in modern browsing, offering enhanced functionality and customization. However, these extensions also introduce a new attack surface, expanding the scope for security vulnerabilities in web browsers. This paper presents Stealth Extension Exfiltration (SEE) attacks, a novel threat that exploits the mismanagement of browser extension permissions. SEE attacks enable malicious extensions to bypass security measures and perform unauthorized actions, such as sending arbitrary HTTP requests, misusing the fetch API to access local files, and exfiltrating sensitive user data without explicit user permissions. Our large-scale analysis of 57,831 real-world browser extensions reveals vulnerabilities that could potentially affect up to 351 million users. We provide concrete examples of these attacks, demonstrating how they can stealthily evade detection while compromising user privacy and security. We reported these risks to the Google security team, who acknowledged the threat posed by SEE attacks. To address these vulnerabilities, we propose mitigation strategies that include enforcing a stricter separation between host permissions and content scripts, as well as implementing more granular access control for sensitive APIs.

14:06
Detecting Cache-based Side-Channel Attacks by Leveraging Mesh Interconnect Traffic Monitoring
PRESENTER: Xingchao Bian

ABSTRACT. Cache-based side-channel attacks (CSCAs) pose a significant threat to computer systems by exploiting timing differences in cache memory accesses to infer a victim's memory access patterns. However, while prior studies have developed various techniques and countermeasures to detect and prevent CSCAs, maintaining their effectiveness under heavy workload conditions remains challenging.

In this paper, we introduce a novel approach that integrates traffic monitoring within the processor interconnect with deep learning techniques to detect various CSCAs. Our method effectively identifies unusual behaviors associated with attacks, even when the system is under significant concurrent workloads. To assess the effectiveness of our detection method under heavy workloads, we performed comprehensive evaluations using the PARSEC 3.0 multi-core benchmark suite. This suite was employed as a concurrent workload, running alongside the victim and the attacker processes. Our detection mechanism was tested against well-known CSCAs, including Flush+Reload, Prime+Probe, Evict+Reload, and Flush+Flush. Compared to established CSCA detection methods, our approach shows superior performance, particularly under heavy workloads, achieving an average F1-score of 0.94 on a 10-core system, an improvement of 0.33 over previous methods. We expanded our study to 16-core and 28-core processors, where our method maintained an average F1-score of 0.94 across all four attack types, with an average runtime overhead of 2.0% across the three systems.

14:24
Data Distribution and Redistribution - A formal and practical Analysis of the DDS Security Standard

ABSTRACT. The Data Distribution Service (DDS) is a popular communication middleware for the Internet of Things (IoT), providing its own security mechanisms specified in the DDS Security standard. In this work, we formally analyze the authentication handshake protocol and the encryption algorithm used in DDS. We discover a replay vulnerability in the encryption algorithm, implement a proof-of-concept attack on an open-source implementation of DDS, and review security-relevant changes in the recently published version 1.2.

14:42
GUARD: Generic API De-obfuscation and Obfuscated Malware Unpacking with sIAT

ABSTRACT. Within the field of malware analysis, the application programming interface (API) is pivotal for identifying and understanding threats, thereby enabling the development of effective countermeasures. In particular, API obfuscation presents significant challenges in malware analysis, obscuring the malware's inner operations and hindering effective analysis. Despite the importance of resolving obfuscated APIs, there exists a notable research gap, as recent efforts have overlooked the challenges posed by API obfuscation. Additionally, previous unpacking studies have not made their executable files and data public, hindering replication and follow-up research. To address this research gap, we propose an emulation-based generic API de-obfuscation and unpacking method, called GUARD. Our method employs an obfuscated call emulation combined with a stack-layout analysis algorithm and a scattered import address table (sIAT), effectively restoring original APIs from packed files. Our evaluations against sophisticated commercial packers, including Themida and VMProtect, demonstrate the method's capability to successfully restore APIs and unpack files previously unaddressed by existing research while improving malware detection rate by as much as 24%.

15:00-15:30 Coffee Break
15:30-17:00 Session 16A: MLA
Location: ROSA DEI VENTI
15:30
Comparative Study of Lexical and Semantic Approaches in Closed-domain Product Search

ABSTRACT. One of the key challenges in information retrieval from closed-domain documents is the prevalence of technical and abbreviated terms specific to the domain. This scenario often hinders users from effectively searching for relevant information, even when available. The presentation and structure of product information are crucial for the reliability of a product search system from a user perspective. Lexical and semantic search methods are commonly employed in such applications. In this work, we evaluate the trade-offs of these techniques across two datasets with distinct domains and structures: electronic invoices and a governmental product catalog. Our results suggest that lexical search algorithms, such as BM25, tend to retrieve more relevant products faster, whereas semantic search methods rank the relevant documents more effectively.

15:48
Towards Robust Facial Recognition: Gabor Filter-Based Feature Extraction for NIR-VIS Heterogeneous Face Recognition

ABSTRACT. Face recognition systems have advanced significantly in recent years, spurred by computational improvements that have enabled the development of robust deep-learning models. Nevertheless, some challenges persist in real-world applications, such as variations in illumination conditions. Near-infrared (NIR) cameras have been proposed as a possible solution to mitigate this problem. However, most face recognition systems are trained on visible spectrum (VIS) image datasets, necessitating cross-domain mapping strategies for deploying such cameras. This work introduces a novel training strategy for enhancing NIR-VIS Heterogeneous Face Recognition (HFR) systems, considering illumination variance and dataset limitations. We propose integrating Gabor filters, which are applied to extract invariant features from VIS images for use in the NIR domain, together with Principal Component Analysis (PCA) for feature reduction and the Mahalanobis distance for classification. This method aims to improve the robustness and accuracy of facial recognition across diverse lighting conditions. We demonstrate significant performance improvements on the CASIA NIR-VIS 2.0, CARL and BUAAVISNIR datasets when applying Gabor filters. We measured model performance improvements of up to 76% on the Rank-five metric in the best scenario, while compromising only 9% (in the worst scenario) of its results in VIS scenarios. The proposal maintained a comparable execution time to the traditional model without the Gabor filter feature extraction step, adding, in the worst case, a minimal overhead of only 13 milliseconds for the GhostFaceNet architecture using the Gabor filtering process. These results suggest that the proposed approach is feasible for real-world applications, especially in environments with limited computational resources.

16:06
MELA: Multi-Event Localization Answering Framework for Video Question Answering

ABSTRACT. The field of Video Question Answering (VideoQA) addresses the challenge of answering questions about content within videos. Recent VideoQA models leveraging large language models (LLMs) transform frame features extracted by vision processors so that LLMs can understand and utilize them. While the approaches adopting LLMs boosted understanding of each video frame, they tend to overlook multiple event concepts within the video, such as human-object interactions, which are carried by temporal changes in visual information. To leverage multiple-event information in LLMs, we propose the Multi-Event Localization Answering (MELA) framework, a novel method that detects multiple events from video and utilizes them for keyframe localization and question answering. By analyzing the relationships between events mentioned in the question and other events in the video, MELA identifies the set of essential events related to the question. The Multi-event Localizer in MELA identifies and selects keyframes from the frames within the video segments presenting essential events. Then, the Event-aware Answerer determines the answer to the question by using the selected keyframes and video events detected using the event detector. The use of event information significantly improves MELA's ability to interpret complex human-object interactions, showing improved performance on the STAR VideoQA dataset in both fine-tuning and zero-shot settings compared to baseline techniques. We show an in-depth analysis of our framework, consisting of the impact of Multi-event Localizer and Event-aware Answerer, comparison with baseline Localizer, and impact of the event detector module.

15:30-17:00 Session 16B: ISDE
Location: LIBECCIO
15:30
A Retrieval-Augmented Framework For Meeting Insight Extraction

ABSTRACT. Meetings are vital for collaboration and decision-making in professional environments, yet recalling key details from past discussions can be challenging and this impacts productivity. In this paper, we address this issue by developing a solution that extracts crucial insights from historical meetings using Retrieval Augmented Generation (RAG) techniques. Users can easily upload meeting records and query for relevant information. A core feature of our proposed system is grouping meetings based on abstractive summaries, using state-of-the-art clustering algorithms extensively trained for accuracy. Upon user inquiry, the system identifies the most relevant cluster and retrieves related conversations from the Pinecone vector store database. These conversations, paired with custom prompts, are processed through a Large Language Model (LLM) to generate precise responses. Our optimization efforts focus on exploring various encoders and LLMs, with fine-tuning to ensure seamless integration and high performance. This approach tackles challenges in meeting summarization, content discovery, and user-friendly information retrieval.

15:48
Deciphering Social Behaviour: a Novel Biological Approach For Social Users Classification

ABSTRACT. Social media platforms continue to struggle with the growing presence of social bots—automated accounts that can influence public opinion and facilitate the spread of disinformation. Over time, these social bots have advanced significantly, making them increasingly difficult to distinguish from genuine users. Recently, new groups of bots have emerged, utilizing Large Language Models to generate content for posting, further complicating detection efforts. This paper proposes a novel approach that uses algorithms to measure the similarity between DNA strings, commonly used in biological contexts, to classify social users as bots or not. Our approach begins by clustering social media users into distinct macro species based on the similarities (and differences) observed in their timelines. These macro species are subsequently classified as either bots or genuine users, using a novel metric we developed that evaluates their behavioral characteristics in a way that mirrors biological comparison methods. This study extends beyond past approaches that focus solely on identical behaviors via analyses of the accounts' timelines. By incorporating new metrics, our approach systematically classifies non-trivial accounts into appropriate categories, effectively peeling back layers to reveal non-obvious species.

16:06
Supervised Ensemble-based Causal DAG Selection

ABSTRACT. Causal Discovery (CD) identifies cause-and-effect relationships from data using statistical learning. Several CD algorithms have been proposed relying on different assumptions, e.g. about the statistical relations among variables. However, which assumptions actually hold for a specific case study is not known a priori. Given a dataset obtained by sampling the joint distribution of all variables of a generative causal model, in general each algorithm could reconstruct a different Directed Acyclic Graph (DAG): some will be closer to the ground truth (GT) DAG than others, depending also on the applicability of the respective assumptions to the case study. As a consequence, given a collection of heterogeneous case studies, a hypothetical GT-aware oracle, able to select the best DAG out of the set of reconstructed DAGs, will outclass the average performance of the individual algorithms of the ensemble. In this work, we propose a supervised approach, relying on multilabel classification, to select the DAGs closest to GT by only comparing the topologies of the reconstructed DAGs. We carried out the study on a wide synthetic data set of causal models, sampling DAG topologies up to ten vertices, and using a representative set of linear and non-linear statistical dependencies. Whereas the best individual CD algorithm yields, on average, a distance from GT three times larger than the oracle, our algorithm features an average distance from GT only about 10% larger than the oracle.

15:30-17:00 Session 16C: PL
Location: BORA
15:30
[Invited Talk] Formal and Practical Aspects of Domain-Specific Languages

ABSTRACT. Domain-specific languages (DSLs) assist a software developer (or end-user) in writing a program using idioms similar to the abstractions found in a specific problem domain. Indeed, the enhanced software productivity and reliability benefits that have been reported from DSL usage are hard to ignore, and DSLs are flourishing. However, tool support for DSLs is lacking when compared to the capabilities provided for standard General-Purpose Languages (GPLs). For example, support for unit testing of DSL programs, as well as DSL debuggers, is rare. A Systematic Mapping Study (SMS) has been performed to better understand the DSL research field and to identify research trends and any possible open issues. In this talk, appropriate methodologies and tools needed to support the development of DSLs will be discussed, as well as some open problems of DSL development (e.g., combining DSLs).

 

Bio: Marjan Mernik received the MSc and PhD degrees in Computer Science from the University of Maribor in 1994 and 1998, respectively. He is currently a professor at the University of Maribor, Faculty of Electrical Engineering and Computer Science. He was a visiting professor at the University of Alabama at Birmingham, Department of Computer and Information Sciences. His research interests include programming languages, compilers, domain-specific (modeling) languages, grammar-based systems, grammatical inference, and evolutionary computations. He is a member of the ACM and EAPLS. He is the Editor-in-Chief of the Journal of Computer Languages, as well as an Associate Editor of the Applied Soft Computing Journal, Information Sciences Journal, and Swarm and Evolutionary Computation Journal. He was named a Highly Cited Researcher for the years 2017 and 2018. More information about his work is available at https://lpm.feri.um.si/en/members/mernik/.

16:06
A Mechanized Formalization of an FRP Language with Effects

ABSTRACT. Functional Reactive Programming (FRP) is a functional programming paradigm designed for systems interacting with their environment. The Yampa library, a Haskell implementation, allows users to construct signal functions that synchronously process input streams to produce output streams. While this library facilitates concise and robust coding, managing I/O is cumbersome. To address this issue, the Wormholes library extends Yampa with constructs to bind I/O to resource names, accessible in an imperative style. Few FRP languages are formalized, and Wormholes added challenging features. This article presents a mechanized formalization of a slightly modified version of Wormholes, improving the result and correcting some issues.

16:24
A Platform-Independent Software-Intensive Workflow Modeling Language And An Open-Source Visual Programming Tool: A Bottom-Up Approach Using Ontology Integration Of Industrial Workflow Engines

ABSTRACT. Many contemporary software-intensive services are developed as workflows of collaborative and interdependent tasks. Industrial workflow platforms (i.e., engines) such as Airflow and Kubeflow automatically execute and monitor the workflow specified in platform-specific code. The code-based workflow specification becomes complex and error-prone as services grow in complexity. Furthermore, differences in platform-specific workflow specifications cause inefficiencies when porting workflows between platforms, even if the different platforms handle semantically the same workflow.

In this paper, we propose a bottom-up approach for developing a platform-independent software-intensive workflow modeling language. The approach systematically extends the UML activity diagram by building platform-independent ontologies of the workflow specification from the given target industrial workflow engines. Based on the approach, we develop a platform-independent Workflow Modeling Language (WorkflowML) that covers four famous workflow engines (Airflow, Kubeflow, Argo workflow, and Metaflow). Furthermore, we implement an open-source visual programming tool for WorkflowML using the ADOxx metamodeling platform. We validate our approach by evaluating the expressiveness of WorkflowML based on modeling case studies of 42 simple workflows and two real-case workflow-based services. The evaluation results validate that WorkflowML serves as an effective common visual language for target workflow engines, supported by an open-source visual programming tool.

16:42
Breadth-first Cycle Collection Reference Counting: Theory and a Rust Smart Pointer Implementation

ABSTRACT. We present a new garbage collection reference counting algorithm capable of collecting reference cycles, overcoming a known limitation of traditional reference counting. The algorithm’s key features include resilience to errors during tracing, support for object finalisation, no need for supplementary heap memory during collection, and a fast breadth-first tracing approach that avoids stack overflows. We implement the algorithm as a Rust library that is idiomatic and highly compatible with the Rust ecosystem and that leverages Rust’s type system and borrow checker to minimise unsafe code and prevent undefined behaviour. We report benchmarks that show that our proposal performs comparably to popular Rust alternatives and outperforms them when dealing with garbage cycles.

15:30-17:00 Session 16D: SEC
Location: GRECALE
15:30
Towards a Comprehensive Evaluation of Voltage-Based Fingerprinting for the CAN Bus

ABSTRACT. The Controller Area Network (CAN), a standard communication protocol in modern vehicles, lacks inherent security features, making it susceptible to attacks. While various defense mechanisms have been proposed, their practical implementation in resource-constrained vehicles remains limited. This paper presents a comprehensive evaluation framework for voltage-based fingerprinting, a promising technique for identifying and mitigating CAN bus attacks. This framework compares the performance of four different machine learning (ML) models, analyzes the impact of distinct sections within the CAN voltage frame, explores various waveform and feature types, and considers practical deployment factors such as detection latency and sampling rate. Notably, the paper investigates the CAN ringing phenomenon and its potential for efficient ECU identification. Results demonstrate that the proposed framework offers robust classification performance while ensuring real-world feasibility.

15:48
Detection of Device Triggerable Vulnerabilities in Android Companion Apps through Interactive Triaging

ABSTRACT. We are increasingly relying on Internet of Things (IoT) devices for most of our daily tasks. However, IoT devices are riddled with security vulnerabilities. Most IoT devices have an associated Mobile Companion App (CApp) that enables users to control and access these devices remotely in a user-friendly manner. CApps are manufactured by the device vendors, and they trust these IoT devices. This blind trust results in DtM vulnerabilities, where attackers can compromise CApps by exploiting the corresponding IoT device. In this paper, we present RearFind, the first static analysis technique to find DtM vulnerabilities in CApps. We also design an interactive triaging technique to reduce false positive alerts through user feedback. Our evaluation shows that RearFind was able to find 5 new (i.e., previously unknown) DtM vulnerabilities. Our interactive triaging technique was able to reduce the false positives by 20%.

16:06
Formally Verifying Robustness and Generalisation of Network Intrusion Detection Models

ABSTRACT. We introduce a new approach for robustness and generalisation of neural network models used in Network Intrusion Detection Systems (NIDS). Models for NIDS must be robust against both natural perturbations (accounting for typical variations in network behaviour) and adversarial attacks (designed to conceal malicious traffic). The standard approach to robustness is a cycle of training to recognise existing attacks followed by generating new attack variations to defeat detection. Besides robustness within a dataset, another problem with research NIDS models trained on limited datasets is the tendency to over-fit to the dataset chosen; this highlights the need for cross-dataset generalisation. Our new approach addresses both problems by incorporating recent formal verification tools for neural networks into traditional NIDS pipelines. These frameworks allow us to characterise the input space, and we also use verification outputs to produce constrained counterexamples that yield new malicious and benign data. Adversarial training on this data then improves both generalisation and adversarial robustness. We demonstrate the ideas with novel specifications for network traffic and by training simple networks that are verifiable. We show that cross-dataset and cross-attack generalisation of our models is good and can outperform more complex models trained using state-of-the-art approaches, which cannot be verified in the same way.

16:24
Clogging DoS Resilient Bootstrapping of Efficient V2V Validation

ABSTRACT. In Vehicular Communication (VC) systems, neighboring vehicles exchange authenticated transportation safety messages, informing about their own mobility and the environment. Verifying all received messages in a dense neighborhood introduces significant cryptographic computation overhead for resource-constrained vehicular On-Board Units (OBUs). Attackers can exploit this to launch Denial of Service (DoS) attacks to clog OBUs by broadcasting bogus messages at a high rate. This attack is particularly effective due to an inherent asymmetry and amplification factor: each safety message is to be validated by all receiving neighboring vehicles. This imbalance can lead to significant delays in sifting benign messages amidst a deluge of bogus messages. Even worse, failure to promptly verify a significant amount of benign messages can paralyze Vehicle-to-Vehicle (V2V) enabled applications. We address this challenge, proposing a mechanism that thwarts such attacks: puzzle-based pre-validation that prioritizes verification of potentially valid messages with yet unknown (i.e., unverified) Pseudonymous Certificates (PCs). Verification of such PCs (and their corresponding messages) can bootstrap the efficient pre-validation of follow-up messages authenticated by the same PCs. We show experimental results confirming our scheme can effectively mitigate unsophisticated clogging DoS attacks that do not attempt to solve puzzles. We further show our scheme also significantly raises the bar for sophisticated adversaries: it can be configured to force attackers to actively solve puzzles for their bogus messages, something possible only by investing in significantly higher (hundreds of times more) computational power than that of the targeted benign vehicles. Last but not least, our scheme can be adaptive while remaining compatible with standardized V2V security.
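
The general idea of checking a cheap puzzle before an expensive signature verification can be sketched as follows. This is a generic hash-based client puzzle and an assumption throughout, not the scheme from the paper: the puzzle construction (SHA-256 over the message plus an 8-byte nonce, with a required number of leading zero bits), the difficulty value, and the names `puzzle_ok`, `solve_puzzle`, and `prevalidate` are all hypothetical.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def puzzle_ok(message: bytes, nonce: int, difficulty: int) -> bool:
    """Receiver side: one hash decides whether the message is worth verifying."""
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

def solve_puzzle(message: bytes, difficulty: int) -> int:
    """Sender side: brute-force a nonce (about 2**difficulty hashes on average)."""
    nonce = 0
    while not puzzle_ok(message, nonce, difficulty):
        nonce += 1
    return nonce

def prevalidate(queue, difficulty):
    """Keep only (message, nonce) entries with a valid solution.

    Only these would be passed on to the costly signature check.
    """
    return [(m, n) for m, n in queue if puzzle_ok(m, n, difficulty)]

DIFFICULTY = 8  # assumed value, kept small so the example runs instantly
message = b"safety message: position and speed"
nonce = solve_puzzle(message, DIFFICULTY)

# A bogus sender that does not solve the puzzle: pick any failing nonce.
bogus = b"bogus message"
bad_nonce = next(n for n in range(1000) if not puzzle_ok(bogus, n, DIFFICULTY))

accepted = prevalidate([(message, nonce), (bogus, bad_nonce)], DIFFICULTY)
print(len(accepted))
```

The asymmetry is the point of the design: the receiver spends one hash per message, while a sender must spend many hashes per message to get past the filter.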