Title: The Pursuit of Privacy in the AI Age
Abstract:
We generate and share vast amounts of data that reveal a detailed portrait of our lives, exposing our identity, behaviors, and preferences. To enable individuals to exercise greater control over their personal information, I will present novel approaches to identify and protect sensitive attributes within data. I will present feature representations that effectively disentangle sensitive information from non-sensitive attributes. Furthermore, I will present perturbation techniques designed to obfuscate sensitive attributes while preserving or potentially enhancing the overall quality of the data, thereby safeguarding sensitive information from unwanted inference.
10:30 | Collaborative filtering through weighted similarities of user and item embeddings ABSTRACT. The adoption of neural networks and other complex models, often considered state-of-the-art, has increased markedly in recommender systems in recent years. However, award-winning research has demonstrated that traditional matrix factorization methods can still be competitive with these newer models, offering simplicity in implementation and lower computational demands. Hybrid models integrating matrix factorization algorithms are a common strategy, effectively combining different methods to mitigate their limitations. This paper introduces a novel ensemble method that merges user-item and item-item recommendations through weighted similarity to generate top-N recommendations. Unlike previous approaches, our proposal employs the same user and item embeddings for both recommendation strategies, enhancing model efficiency. Experimental results indicate that our ensemble method achieves competitive performance and exhibits stability across diverse datasets, performing well in scenarios favoring either user-item or item-item recommendations. Furthermore, the model eliminates the need for embedding-specific fine-tuning, enabling seamless reuse of hyperparameters from the underlying algorithm without performance degradation. This contributes to the model's simplicity and efficiency. Our code is open-sourced at https://anonymous.4open.science/r/weighted-sims-7B45/. |
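As a rough illustration of the idea in the abstract above, the following sketch blends a user-item similarity with an item-item similarity computed from the same embeddings. The abstract does not specify the weighting scheme, so the cosine similarity, the averaging over the user's history, the `weight` parameter, and all names here are assumptions made for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def top_n(user_vec, item_vecs, history, n=2, weight=0.5):
    """Rank unseen items by a weighted mix of two similarities.

    weight=1.0 uses only the user-item score, weight=0.0 only the
    item-item score (hypothetical parameterization, not from the paper).
    """
    scores = {}
    for item, vec in item_vecs.items():
        if item in history:
            continue  # never recommend items the user already interacted with
        user_item = cosine(user_vec, vec)
        # Item-item score: mean similarity to the items in the user's history.
        item_item = (
            sum(cosine(item_vecs[h], vec) for h in history) / len(history)
            if history else 0.0
        )
        scores[item] = weight * user_item + (1 - weight) * item_item
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy embeddings: both strategies share the same item vectors.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
user = [1.0, 0.2]
recs = top_n(user, items, history={"a"}, n=2, weight=0.5)
print(recs)
```

Because both scores are derived from one set of embeddings, no second model has to be trained; only the blending weight is added on top of the underlying factorization.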
10:48 | Network-based instance hardness measures for classification problems PRESENTER: Márcio Basgalupp ABSTRACT. Instance hardness measures allow one to characterize and understand why some instances are harder to classify than others in a classification dataset. An instance can be hard to classify for different reasons, such as being in an overlapping region of the classes or a region of poor data representativeness. While there are many instance hardness measures in the related literature, they are mainly concerned with measuring class overlap. This paper also addresses measuring sparsity in a dataset by building a proximity graph from data and extracting some network-based measures from the nodes. Experimentally, we show that some of these measures are effective in characterizing instance hardness and complement the ones from the literature by measuring the density of the regions where the instances are located. |
11:06 | Data Balancing for Mitigating Sampling Bias in Machine Learning PRESENTER: Márcio Basgalupp ABSTRACT. We increasingly integrate technology into our daily activities, and using Machine Learning (ML) algorithms in various domains has become a common practice. However, in crucial sectors where algorithmic decisions significantly impact people's lives, there is a need to scrutinize these decisions more carefully. Using these algorithms in critical areas, such as courtrooms, raises concerns about potential bias and prejudice, directly affecting the justice and impartiality of these tools. There is an urgent need to create algorithms supporting ethical decisions. This paper proposes using data balancing techniques to mitigate the sample bias present in datasets, aiming to make subsequent ML algorithm training more impartial. A version of the ADASYN algorithm is developed, which performs data balancing at both the class level and at the level of protected attributes, enhancing the diversity and representativeness of the protected groups in the datasets. Experimental results show the technique can promote greater fairness in the predictions of different ML models while keeping a good trade-off with overall accuracy. |
11:24 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation ABSTRACT. In this paper, we introduce MoRSE (Mixture of RAGs Security Experts), the first specialised AI chatbot for cybersecurity. MoRSE aims to provide comprehensive and complete knowledge about cybersecurity. MoRSE uses two RAG (Retrieval Augmented Generation) systems designed to retrieve and organize information from multidimensional cybersecurity contexts. MoRSE differs from traditional RAGs by using parallel retrievers that work together to retrieve semantically related information in different formats and structures. Unlike traditional Large Language Models (LLMs) that rely on Parametric Knowledge Bases, MoRSE retrieves relevant documents from Non-Parametric Knowledge Bases in response to user queries. Subsequently, MoRSE uses this information to generate accurate answers. In addition, MoRSE benefits from real-time updates to its knowledge bases, enabling continuous knowledge enrichment without retraining. We have evaluated the effectiveness of MoRSE against other state-of-the-art LLMs, evaluating the system on 600 cybersecurity specific questions. The experimental evaluation has shown that the improvement in terms of relevance and correctness of the answer is more than 10% compared to known solutions such as GPT-4 and Mixtral 7x8. |
11:42 | PULLM: A Multimodal Framework for Enhanced 3D Point Cloud Upsampling Using Large Language Models ABSTRACT. Point cloud upsampling is a critical task in 3D computer vision, aiming to generate dense and uniformly distributed point sets from sparse inputs. While current self-supervised methods show promise, they often struggle with preserving fine-grained geometric details, especially for highly sparse point clouds. To address these limitations, we propose PointUpsampleLLM (PULLM), a novel multimodal framework that leverages the power of large language models (LLMs) to enhance 3D point cloud upsampling. PULLM integrates a pretrained Point Cloud LLM (PointLLM) with visual features extracted from point clouds, learning a unified representation that captures both geometric and semantic information. At the core of our approach is the Feature Aware Translator (FAT) module, which effectively bridges the modality gap between visual and textual features, enhancing the spatial understanding of the LLM. PULLM generates textual descriptions of point clouds on-the-fly, eliminating the need for large paired datasets. Extensive experiments on the PU1K and PUGAN benchmarks demonstrate that PULLM consistently outperforms state-of-the-art methods, achieving significant improvements in Chamfer Distance, Hausdorff Distance, and Point-to-Plane distance metrics. For instance, on the PUGAN dataset with sparse inputs, PULLM achieves a 56.15% improvement in Chamfer Distance over the best baseline. Our qualitative results further illustrate PULLM's superior ability to preserve fine details and generate high-quality upsampled point clouds across various object types and geometries. |
10:30 | Communication Isolation For Multi-Robot Systems Using ROS2 ABSTRACT. This article examines how to design communication isolation in a fleet of robots using the Robot Operating System (ROS2). Several communication methods are analyzed and compared based on various criteria (isolation scale, isolation strength, network usage, etc.). A specific scenario is implemented for each communication method to demonstrate their feasibility and highlight some of their limitations. Finally, experimental results on both simulations and real robots compare five communication strategies, providing guidelines to help select the best method depending on the project's requirements. |
10:48 | MARTES: Multi-Agent Reinforcement learning Training Environment for Scheduling PRESENTER: Karen Yadira Lliguin León ABSTRACT. Digitization fostered by the Industry 4.0 paradigm and smart factories leads to more connectivity and data abundance but also to a more dynamic industrial environment that makes scheduling an even harder problem. Large factories with complex configurations like Hybrid Flow-Shops (HFS) cannot rely on centralized, reactive, and non-adaptive heuristics, or on metaheuristics that produce high-quality schedules but are time-expensive. We propose MARTES, a Multi-Agent Reinforcement Learning (MARL) Training Environment for Scheduling. In this work, MARTES trains models to be used in HFS scenarios. The resulting models decide among different dispatching rules to select which job to process next. The results show that exploiting MARTES models yields high-quality schedules, outperforming traditional dispatching rules like First Come First Serve, Earliest Deadline First, or Shortest Job First by up to 26.4%, increasing the deadlines met by jobs by more than 30%, and improving tardiness by up to 50.5% in time-constrained scenarios. MARTES models can also compete in performance with heuristics such as NEH and metaheuristics such as genetic (GA) or iterative greedy (IG) algorithms, differing by less than 1% in makespan results for large instances. Time-wise, NEH can be up to 2 orders of magnitude slower than MARTES models' training times. GA or IG execution times can be similar to MARTES models' training times but require additional executions when changes occur on the factory pipeline, unlike MARTES models. |
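The three dispatching rules named in the abstract above can be sketched as simple priority keys over a job queue. This is only a minimal illustration: the `Job` fields, the `RULES` table, and `next_job` are hypothetical names, and the rule is passed in by hand here, whereas in the abstract a trained model decides which rule to apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    arrival: float   # time the job entered the queue
    deadline: float  # due time
    duration: float  # processing time on the machine

# Each dispatching rule is a sort key; the job with the smallest key runs next.
RULES = {
    "FCFS": lambda job: job.arrival,   # First Come First Serve
    "EDF": lambda job: job.deadline,   # Earliest Deadline First
    "SJF": lambda job: job.duration,   # Shortest Job First
}

def next_job(queue, rule):
    """Return the job that the given dispatching rule would process next."""
    return min(queue, key=RULES[rule])

queue = [Job("j1", 0, 30, 8), Job("j2", 1, 12, 5), Job("j3", 2, 20, 3)]
picks = {rule: next_job(queue, rule).name for rule in RULES}
print(picks)
```

On this toy queue each rule selects a different job, which is the situation where choosing the rule per decision, rather than fixing one rule for the whole schedule, can matter.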
11:06 | A Semantic Mapping Framework for Service Robots ABSTRACT. Service robots are undergoing a massification process similar to what happened with personal computers and cell phones a few decades ago. Their ubiquitous coexistence and interaction with humans requires that their representation models of the workspace go beyond metric information used for safe navigation. They are also required to assign semantic meaning to objects and places, i.e. to build semantic maps, in order to understand scenes and engage in human-like interactions. This paper proposes the Semantic MAPping (SMAP) framework to provide a service robot operating in human populated environments with a semantic mapping layer on top of a metric SLAM layer. SMAP is modular, expandable, and efficient enough to run locally on the robot. It has been implemented in Robot Operating System 2 (ROS2) using modular Docker containers. Preliminary experiments with a Pioneer 3-DX mobile robot having a system on module Nvidia Jetson AGX Xavier demonstrated its potential for future service robotics applications. |
11:24 | Lightweight Decentralized Neural Network-Based Strategies for Multi-Robot Patrolling ABSTRACT. The problem of decentralized multi-robot patrol has previously been approached primarily with hand-designed strategies for minimization of "idleness" over the vertices of a graph-structured environment. Here we present two lightweight neural network-based strategies to tackle this problem, and show that they significantly outperform existing strategies in both idleness minimization and against an intelligent intruder model. Our results also indicate important considerations for future strategy design. |
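To make the "idleness" objective in the abstract above concrete, the sketch below tracks the time since each vertex was last visited and moves a single robot greedily to the most idle neighbor. This is a generic hand-designed baseline of the kind the abstract contrasts against, not the neural strategies it proposes; the graph, the function names, and the tie-breaking (first neighbor listed wins) are assumptions.

```python
def step_idleness(idleness, occupied, dt=1):
    """Advance time by dt: occupied vertices reset to 0, all others grow."""
    return {v: 0 if v in occupied else t + dt for v, t in idleness.items()}

def greedy_next(graph, idleness, current):
    """Pick the neighbor of `current` with the highest idleness."""
    return max(graph[current], key=lambda v: idleness[v])

# Fully connected triangle; the robot starts at vertex A.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
idleness = {"A": 0, "B": 0, "C": 0}
position = "A"
route = [position]

for _ in range(4):
    position = greedy_next(graph, idleness, position)
    idleness = step_idleness(idleness, {position})
    route.append(position)

print(route, idleness)
```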
11:42 | ST-CBS: Spatio-Temporal Conflict Based Search in Continuous Space for Multi-Agent Pathfinding ABSTRACT. In this paper, we consider the problem of Multi-Agent Pathfinding (MAPF) in continuous space to find conflict-free paths. The difficulty of the problem arises from two primary factors. First, the involvement of multiple agents leads to combinatorial decision-making, escalating the search space exponentially. Second, the continuous space presents potentially infinite states and actions. We propose a two-level approach, Spatio-Temporal Conflict Based Search (ST-CBS). For the low level, we develop Unidirectional Spatio-Temporal RRT* to generate a path in a spatio-temporal state space for each agent without considering inter-agent conflicts. At the high level, ST-CBS performs a best-first search on a Constraints Tree to resolve conflicts found in the paths of agents. Our method offers benefits in terms of deadlock prevention and faster computation. In the experiment, ST-CBS achieves higher success rates even with a large number of agents (100% for 20 and more agents) and faster planning and execution time compared to recent MAPF algorithms. |
10:30 | U Can Touch This! Microarchitectural Timing Attacks via Machine Clears ABSTRACT. Microarchitectural timing attacks exploit subtle timing variations caused by hardware behaviors to leak sensitive information. In this paper, we introduce MCHammer, a novel side-channel technique that leverages machine clears induced by self-modifying code detection mechanisms. Unlike most traditional techniques, MCHammer does not require memory access or waiting periods, making it highly efficient. Compared with the classical Flush+Reload technique, MCHammer improves trace granularity, providing a powerful side-channel attack vector. Using MCHammer, we successfully recover keys from a deployed implementation of a cryptographic tool. Our findings highlight the practical implications of MCHammer and its potential impact on real-world systems. |
10:48 | MemBERT: Foundation model for memory forensics ABSTRACT. Foundation models have demonstrated significant advancements in natural language processing and computer vision, yet their potential in cybersecurity is unexplored. Current memory forensics tools and machine learning models often suffer from limited versatility and adaptability, presenting a crucial research gap. To address this, we introduce MemBERT, a foundation model designed explicitly for memory forensics. MemBERT is trained on extensive process dump data, with and without metadata inclusion, to capture intricate patterns present in the main memory. Its potential impact on cybersecurity practices could be significantly similar to the effects of foundation models in natural language processing. We aim to streamline memory forensics by reducing the manual effort and coding traditionally required by cybersecurity practitioners. Through comprehensive experimentation, we demonstrate MemBERT’s efficiency in a downstream task of extracting OpenSSH encryption keys and other memory structures from raw process dumps. The results reveal that the robust embeddings generated by MemBERT significantly help identify structures within memory. Additionally, we demonstrate that our model’s embeddings can be compressed with minimal loss of accuracy, further highlighting its efficiency. Our findings with MemBERT go beyond just its performance in a specific task. They indicate that MemBERT substantially advances memory forensics, providing a versatile and powerful tool for cybersecurity professionals. This research not only addresses the existing limitations of the current forensics process model but also sets the stage for the broader application of foundation models in the cybersecurity domain. |
11:06 | Swiss Cheese CAPTCHA: A Novel Multi-barrier Mechanism for Bot Detection PRESENTER: P. Sahithi Reddy ABSTRACT. A Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is one of the primary barriers between notorious bots and legitimate human users. However, advancements in Artificial Intelligence (AI) have enabled malicious bots to circumvent CAPTCHA challenges effectively. As a result, several types of CAPTCHA have been rendered ineffective. In this paper, we introduce Swiss Cheese CAPTCHA, a novel sensor-based solution designed to be easily solvable by humans while presenting multiple obstructions for bots (similar to the Swiss Cheese Model) even when the sensor outputs can be predicted and interfered with. We leverage a range of human cognitive abilities and the Generic Sensor API in modern devices to provide robust protection against automated attacks by making it more computationally expensive for bots to produce a valid answer within a stipulated time. We conducted two user studies to assess our proposal's effectiveness: one involving 116 participants to assess the likability and improve the design, and the other, with 107 participants, to investigate the impact of the resulting design changes on cognitive abilities. Our results from these studies show an average completion time of 4.76 seconds and 6.12 seconds, with a success rate of 90.3% and 83.25%, respectively. By analyzing the 2141 resultant trajectories from both user studies, we assess the learnability, error recovery rate, efficiency, and satisfaction of users using the scheme. Finally, we devise an automated attack against our proposal to analyze its security in the real world; we find the probability of attack success is low. |
11:24 | Anomaly Detection and Mitigation for Electric Vehicle Charging-Based Attacks on the Power Grid ABSTRACT. With the increasing adoption of Electric Vehicles (EVs), power grids have to deal with the resulting increase in EV charging loads. A generic method of handling EV loads is load balancing, which requires cooperation from the involved systems, i.e., EVs and Charge Points (CPs). However, if we consider the potential of compromised EVs/CPs, existing load balancing methods fail and the threat of EV charging-based attacks on grid stability arises. In this paper, we address this issue by proposing a combined concept for the detection and mitigation of related attacks. Specifically, we propose a two-step Intrusion Detection System (IDS) that first detects attacks with a potential impact on the grid and in a second step identifies the systems involved in an attack. The design of the IDS enables two attack type-dependent mitigation methods that either correct manipulated data or counteract malicious changes in charging load. Our evaluation identifies specific design choices that enable a good attack detection performance. Additionally, our evaluation shows the effectiveness of the proposed mitigation methods and their relation to IDS performance. |
11:42 | Improving Anomaly Detection for Electric Vehicle Charging with Generative Adversarial Networks ABSTRACT. Intrusion Detection Systems (IDSs) are often considered to be an important security mechanism for different use-cases. The Electric Vehicle (EV) charging use-case is one example, with various research articles proposing IDS solutions. One issue in this context, however, is the lack of representative datasets with a variety of realistic attack scenarios, which are vital for evaluating IDSs. Especially concerning the cyber-physical aspects of EV charging, representative datasets are missing, and related work usually relies on generating random anomalies or manual attack insertions in datasets of normal charging behavior. This can result in unrealistic or biased attack data. In this paper, we address this issue by proposing a Generative Adversarial Network (GAN)-based IDS training method for EV charging. For this, a GAN is used against a pre-trained IDS to generate attack data that avoids detection. Afterwards, the IDS can be re-trained under consideration of the new attack data in order to eliminate persistent gaps or biases in detection. We implement and evaluate the GAN-based training system. Our evaluation shows the ability of the GAN to identify flaws in existing IDSs. Additionally, we show the effectiveness of re-training IDSs with the GAN output. |
13:30 | Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation ABSTRACT. Sequential recommendation aims to predict users' next actions by analyzing their historical behavior. Lately, contrastive learning has become prominent in this domain, especially when user interactions with items are sparse. Although data augmentation methods have flourished in fields like computer vision, their potential in sequential recommendation remains under-explored. Thus, we present Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation (HCLRec), a novel framework that harnesses multiple augmentation techniques to create diverse views on user sequences. This framework systematically employs existing augmentation techniques, creating a hierarchy to generate varied views. First, we augment the input sequences to various views using multiple augmentations. Through the continuous composition of these augmentation methods, we formulate both low-level and high-level view pairs. Second, an effective sequence-based encoder is used to embed input sequences, complemented by supplementary blocks to capture users' nonlinear behaviors, which are further varied by augmentations. Input sequences are routed to subsequent layers based on the number of augmentations applied, helping the model discern intricate sequential patterns intensified by these augmentations. Finally, contrastive losses are calculated between view pairs of the same level within each layer. This allows the encoder to learn from the contrastive losses between augmented views of the same level, and reduces the gap between low-level and high-level views that arises from the different information introduced by multiple augmentations. In evaluations, HCLRec outperforms state-of-the-art methods by up to 7.22% and demonstrates its effectiveness in handling sparse data. |
13:48 | Hybrid Flow Shop Scheduling through Reinforcement Learning: A systematic literature review PRESENTER: Victor Pugliese ABSTRACT. This paper reviews the application of Reinforcement Learning (RL) in solving Hybrid Flow Shop Scheduling (HFS) problems, a complex manufacturing scheduling challenge. HFS involves processing jobs through multiple stages, each of which has multiple machines that can work in parallel, aiming to optimize objectives like makespan, tardiness, and energy consumption. While traditional methods are well-studied, the application of RL to HFS problems is relatively new. The review analyzes 26 studies identified through IEEE Xplore, Scopus, and Web of Science databases (as of April 2024), categorizing them based on RL algorithms, problem types, and objectives. Our analysis reveals the increasing adoption of advanced RL methods like Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to handle the complexities of HFS, often achieving superior performance compared to metaheuristics and scheduling heuristics. Furthermore, we explore the trend of integrating RL with other optimization techniques and discuss the potential for real-world applications, model interpretability, and the consideration of additional constraints and uncertainties. This review provides valuable insights into the current state and future directions in HFS using RL. |
14:06 | Neighbor-Based Decentralized Training Strategies for Multi-Agent Reinforcement Learning PRESENTER: Davide Domini ABSTRACT. Multi-agent deep reinforcement learning has demonstrated significant potential as a promising framework for developing autonomous agents capable of operating within complex, multi-agent environments in a wide range of domains, like robotics, traffic management, and video games. Centralized training with decentralized execution has emerged as the predominant training paradigm, demonstrating significant effectiveness in learning complex policies. However, its reliance on a centralized learner necessitates that agents acquire policies offline and subsequently execute them online. This constraint motivates the exploration of decentralized training methodologies. Despite their greater flexibility, decentralized approaches often face critical challenges, such as slower convergence rates, heightened instability, and lower performance compared to centralized methods. Therefore, this paper proposes three neighbor-based decentralized training strategies based on the well-known Deep-Q Learning algorithm and investigates their effectiveness as a viable alternative to centralized training. We evaluate experience sharing, k-nearest neighbor averaging, and k-nearest neighbor consensus methods in a cooperative multi-agent environment and compare their performance against centralized training and totally decentralized training. Our results show that neighbor-based methods can achieve comparable performance to centralized training while offering improved scalability and communication efficiency. |
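One plausible reading of the k-nearest neighbor averaging strategy mentioned in the abstract above is that each agent periodically averages its own parameters with those of its k spatially closest peers. The abstract does not say what is averaged or how neighbors are chosen, so the flat parameter lists, Euclidean distance on agent positions, and all function names below are assumptions for illustration.

```python
import math

def k_nearest(positions, agent, k):
    """Return the k agents closest to `agent` by Euclidean distance."""
    others = [a for a in positions if a != agent]
    return sorted(others, key=lambda a: math.dist(positions[agent], positions[a]))[:k]

def knn_average(params, positions, k):
    """Replace each agent's parameters by the mean over itself and its k neighbors.

    All averages are computed from the old parameters, so the update is
    synchronous and does not depend on the order in which agents are visited.
    """
    updated = {}
    for agent, own in params.items():
        group = [agent] + k_nearest(positions, agent, k)
        updated[agent] = [
            sum(params[a][i] for a in group) / len(group) for i in range(len(own))
        ]
    return updated

# Three agents on a line; parameter vectors stand in for Q-network weights.
positions = {"a0": (0.0, 0.0), "a1": (1.0, 0.0), "a2": (5.0, 0.0)}
params = {"a0": [0.0, 2.0], "a1": [2.0, 4.0], "a2": [8.0, 0.0]}
averaged = knn_average(params, positions, k=1)
print(averaged)
```

Each agent only needs to exchange parameters with k peers, which is where a communication advantage over a single central learner could come from.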
14:24 | SC-Block++: A Blocking Algorithm Based on Adaptive Flood Regularization ABSTRACT. The rapid surge in the number of Web shops presents a challenge for consumers: navigating through the vast amount of stores and products available. Therefore, entity resolution has become an important task to aggregate product information across different Web shops. As entity resolution is a computationally demanding process, its pipelines are divided into two: a blocking phase, which uses a computationally cheap method to select candidate product pairs, and a matching phase with a computationally expensive method to identify matching pairs from the set of candidate pairs. In this paper, we propose SC-Block++, an extension to a state-of-the-art blocking algorithm SC-Block. SC-Block utilizes a RoBERTa base transformer model, trained using Supervised Contrastive Learning, to position the product records in an embedding space, and produces a set of candidate pairs using a nearest-neighbour search. We extend the training procedure of the RoBERTa base transformer model by incorporating Adaptive Flood Regularization (AdaFlood), a regularization method aimed to prevent overfitting and to improve the generalization performance of the model. We compare SC-Block++ to SC-Block, and other benchmark methods on three different data sets, and find that SC-Block++ is able to construct candidate pairs more effectively than the other blocking schemes. |
13:30 | Soilcast: a Multitask Encoder-Decoder AI Model for Precision Agriculture ABSTRACT. This paper introduces Soilcast, an advanced multitask encoder-decoder predictive model designed to accurately forecast soil moisture in agricultural fields. By leveraging data from multiple sources and locations, Soilcast enhances resilience against overfitting, a common issue with traditional Long Short-Term Memory (LSTM) models. Tested on over 1,000 agricultural fields within the region of XXXX, in YYYY (masked as requested), Soilcast demonstrated superior performance compared to pure LSTM models, reducing mean squared error and mean absolute error by 10% and 15%, respectively, on average across datasets. The model's flexible architecture allows for both generalization across diverse datasets and specialization for specific fields, ensuring accurate daily soil moisture predictions, which are crucial for effectively optimizing irrigation. Additionally, Soilcast achieved a classification accuracy exceeding 92% in predicting soil moisture stress, outperforming single-task models in both robustness and generalization. These results position Soilcast as a valuable tool for improving water efficiency in response to climate challenges, fostering sustainable precision agriculture practices. |
13:48 | Fusing Expert Knowledge and Internet of Things Data for Digital Twin Models: Addressing Uncertainty in Expert Statements PRESENTER: Michelle Jungmann ABSTRACT. Extracting Digital Twin models by fusing expert knowledge with Internet of Things data remains a challenging and open research area. Existing literature offers very limited approaches for seamless and systematic extraction of Digital Twin models from these combined sources. In this paper, we address the research gap by proposing a novel approach that considers and integrates the uncertainty inherent in human expert knowledge into the extraction processes of Digital Twin models. Given that experts possess unique experiences, contextual understandings and judgements, their knowledge can be highly divergent, complex, ambiguous, and even incorrect or incomplete. Consequently, not all expert knowledge statements should be equally weighted in the resulting simulation models. Our contributions include a comprehensive literature review on the uncertainty in expert knowledge and the proposal of an approach to integrate this uncertainty in the extraction of Digital Twin models from fused expert knowledge and IoT data. We demonstrate our approach through a case study in reliability assessment. |
14:06 | Deep Reinforcement Learning of Simulated Students Multimodal Mobility Behavior: Application to the City of Toulouse ABSTRACT. This study presents a Deep Reinforcement Learning (DRL) approach to address the multimodal mobility behavior of daily commuters, focusing specifically on students' home-university multimodal trips. The proposed mesoscopic model addresses key limitations of recent macro and microscopic models by balancing individual mobility preferences with significant group-level student factors. At its core, the model employs a Proximal Policy Optimization (PPO)-based agent that learns to match student navigation behavior in a multimodal transportation network, considering group-level mobility factors such as vehicle ownership and origin-destination regions. Experiments conducted on a SUMO (Simulation of Urban MObility) simulated dataset of university students' trips in the Toulouse metropolitan area demonstrate the model's performance in both unimodal and multimodal transportation scenarios. The resulting policy offers potential applications in predicting future multimodal mobility behavior, optimizing resource allocation for communities with regular travel needs, and developing more efficient and environmentally friendly urban transportation systems. |
14:24 | A DIF-Driven Threshold Tuning Method for Improving Group Fairness ABSTRACT. To promote social good, current decision support systems based on machine learning must not propagate society's various types of discrimination. Consequently, a desirable behavior for classifiers used in decision-making is that their results do not favor or disadvantage any specific sociodemographic group. One way to achieve this behavior is through post-processing methods, which apply threshold tuning to select the decision boundary that enhances the impartiality of the trained model's decisions. Various strategies have been proposed to determine the optimal threshold, but finding the trade-off between fairness and predictive performance remains challenging. Recently, the application of Differential Item Functioning (DIF) concepts has proven effective for this purpose in model selection, which is a similar application. This finding makes using DIF in threshold tuning appealing and an unexplored contribution to the literature on fairness in machine learning. This paper addresses this gap and proposes DIF-PP, a novel post-processing method based on DIF. We experimentally evaluated our method against three baselines using fifteen datasets, six classification algorithms with sixteen settings for each one, four group fairness metrics, one predictive performance measure, one multi-criteria measure, and one statistical significance test. Our experimental results indicate that DIF-PP provides the best trade-off between group fairness metrics and predictive performance, making it the optimal choice for threshold tuning of binary classifiers applied to decision-making tasks involving people. |
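The post-processing idea described in the abstract above, choosing a decision threshold that trades predictive performance against group fairness, can be sketched with a plain grid search. This is not DIF-PP: Differential Item Functioning is not implemented here. The demographic parity gap, the additive cost, the `fairness_weight` parameter, and the toy data are all assumptions used to show the general shape of threshold tuning.

```python
def evaluate(scores, labels, groups, threshold):
    """Return (accuracy, demographic parity gap) for one threshold."""
    preds = [int(s >= threshold) for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    rates = []
    for g in sorted(set(groups)):
        member_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(member_preds) / len(member_preds))  # positive rate
    return accuracy, max(rates) - min(rates)

def tune_threshold(scores, labels, groups, candidates, fairness_weight=1.0):
    """Pick the candidate minimizing error + fairness_weight * parity gap."""
    def cost(t):
        accuracy, gap = evaluate(scores, labels, groups, t)
        return (1 - accuracy) + fairness_weight * gap
    return min(candidates, key=cost)

# Classifier scores for two sociodemographic groups, "a" and "b".
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.66, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
candidates = [0.3, 0.5, 0.65]

accuracy_only = tune_threshold(scores, labels, groups, candidates, fairness_weight=0.0)
balanced = tune_threshold(scores, labels, groups, candidates, fairness_weight=1.0)
print(accuracy_only, balanced)
```

On this toy data the accuracy-only search and the fairness-weighted search pick different thresholds, which illustrates why finding the trade-off is the hard part.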
13:30 | Strengthening Application Security through Integrity Protection of System Call Usage ABSTRACT. Attackers who exploit program vulnerabilities leverage system calls provided by the OS kernel to execute malicious actions, posing significant security risks. This work introduces Scynteg, a new framework for protecting system call usage in modern applications. Scynteg enforces control flow integrity of system call invocations and system call argument integrity through a combination of a secure kernel module and an LLVM-based compiler. We show that Scynteg effectively protects C/C++ programs against control-flow hijacking and non-control-data attacks on system call arguments. Our prototype implementation for Linux on Arm leverages hardware security extensions to effectively protect applications while incurring a modest performance overhead. |
13:48 | Stealth Extension Exfiltration (SEE) Attacks: Stealing User Data without Permissions via Browser Extensions PRESENTER: Chaejin Lim ABSTRACT. Web browser extensions have become essential tools in modern browsing, offering enhanced functionality and customization. However, these extensions also introduce a new attack surface, expanding the scope for security vulnerabilities in web browsers. This paper presents Stealth Extension Exfiltration (SEE) attacks, a novel threat that exploits the mismanagement of browser extension permissions. SEE attacks enable malicious extensions to bypass security measures and perform unauthorized actions, such as sending arbitrary HTTP requests, misusing the fetch API to access local files, and exfiltrating sensitive user data without explicit user permissions. Our large-scale analysis of 57,831 real-world browser extensions reveals vulnerabilities that could potentially affect up to 351 million users. We provide concrete examples of these attacks, demonstrating how they can stealthily evade detection while compromising user privacy and security. We reported these risks to the Google security team, who acknowledged the threat posed by SEE attacks. To address these vulnerabilities, we propose mitigation strategies that include enforcing a stricter separation between host permissions and content scripts, as well as implementing more granular access control for sensitive APIs. |
14:06 | Detecting Cache-based Side-Channel Attacks by Leveraging Mesh Interconnect Traffic Monitoring PRESENTER: Xingchao Bian ABSTRACT. Cache-based side-channel attacks (CSCAs) pose a significant threat to computer systems by exploiting timing differences in cache memory accesses to infer a victim's memory access patterns. Although prior studies have developed various techniques and countermeasures to detect and prevent CSCAs, maintaining their effectiveness under heavy workload conditions remains challenging. In this paper, we introduce a novel approach that integrates traffic monitoring within the processor interconnect with deep learning techniques to detect various CSCAs. Our method effectively identifies unusual behaviors associated with attacks, even when the system is under significant concurrent workloads. To assess the effectiveness of our detection method under heavy workloads, we performed comprehensive evaluations using the PARSEC 3.0 multi-core benchmark suite. This suite was employed as a concurrent workload, running alongside the victim and attacker processes. Our detection mechanism was tested against well-known CSCAs, including Flush+Reload, Prime+Probe, Evict+Reload, and Flush+Flush. Compared to established CSCA detection methods, our approach shows superior performance, particularly under heavy workloads, achieving an average F1-score of 0.94 on a 10-core system, an improvement of 0.33 over previous methods. We expanded our study to 16-core and 28-core processors, where our method maintained an average F1-score of 0.94 across all four attack types, with an average runtime overhead of 2.0% across the three systems. |
14:24 | Data Distribution and Redistribution - A formal and practical Analysis of the DDS Security Standard ABSTRACT. The Data Distribution Service (DDS) is a popular communication middleware for the Internet of Things (IoT), providing its own security mechanisms specified in the DDS Security standard. In this work, we formally analyze the authentication handshake protocol and the encryption algorithm used in DDS. We discover a replay vulnerability in the encryption algorithm, implement a proof-of-concept attack on an open-source implementation of DDS, and review security-relevant changes in the recently published version 1.2. |
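As background for the replay vulnerability mentioned in the abstract above, here is a generic sketch of how a receiver can reject replayed messages. It does not reproduce the DDS Security protocol or the attack found by the authors. The message layout (8-byte counter, payload, 32-byte HMAC-SHA256 tag), the key, and all names are assumptions for illustration, and the sketch authenticates messages without encrypting them.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key-for-illustration-only"  # hypothetical shared key
TAG_LEN = 32      # HMAC-SHA256 output size in bytes
COUNTER_LEN = 8   # size of the big-endian message counter


def seal(counter, payload, key=DEMO_KEY):
    """Build counter || payload || tag, where the tag covers counter and payload."""
    header = counter.to_bytes(COUNTER_LEN, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    """Accepts each authenticated counter value at most once, in increasing order."""

    def __init__(self, key=DEMO_KEY):
        self.key = key
        self.highest_seen = -1

    def accept(self, message):
        """Return the payload, or None if the message is forged or replayed."""
        header = message[:COUNTER_LEN]
        payload = message[COUNTER_LEN:-TAG_LEN]
        tag = message[-TAG_LEN:]
        expected = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # tag mismatch: forged or corrupted
        counter = int.from_bytes(header, "big")
        if counter <= self.highest_seen:
            return None  # valid tag but old counter: replayed or stale
        self.highest_seen = counter
        return payload


receiver = Receiver()
first = seal(1, b"open valve")
first_result = receiver.accept(first)      # fresh message
replay_result = receiver.accept(first)     # same bytes sent again
next_result = receiver.accept(seal(2, b"close valve"))
```

The point of the sketch is that a valid authentication tag alone does not stop replay: the second delivery of `first` carries a correct tag and is rejected only because the receiver tracks the highest counter it has accepted.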
14:42 | GUARD: Generic API De-obfuscation and Obfuscated Malware Unpacking with sIAT ABSTRACT. Within the field of malware analysis, the application programming interface (API) is pivotal for identifying and understanding threats, thereby enabling the development of effective countermeasures. In particular, API obfuscation presents significant challenges in malware analysis, obscuring the malware's inner operations and hindering effective analysis. Despite the importance of resolving obfuscated APIs, there exists a notable research gap, as recent efforts have overlooked the challenges posed by API obfuscation. Additionally, previous unpacking studies have not made their executable files and data public, hindering replication and follow-up research. To address this research gap, we propose an emulation-based generic API de-obfuscation and unpacking method, called GUARD. Our method employs an obfuscated call emulation combined with a stack-layout analysis algorithm and a scattered import address table (sIAT), effectively restoring original APIs from packed files. Our evaluations against sophisticated commercial packers, including Themida and VMProtect, demonstrate the method's capability to successfully restore APIs and unpack files previously unaddressed by existing research while improving the malware detection rate by as much as 24%. |
15:30 | [Invited Talk] Formal and Practical Aspects of Domain-Specific Languages ABSTRACT. Domain-specific languages (DSLs) assist a software developer (or end-user) in writing a program using idioms similar to the abstractions found in a specific problem domain. Indeed, the enhanced software productivity and reliability benefits that have been reported from DSL usage are hard to ignore, and DSLs are flourishing. However, tool support for DSLs is lacking when compared to the capabilities provided for standard General-Purpose Languages (GPLs). For example, support for unit testing of DSL programs, as well as DSL debuggers, is rare. A Systematic Mapping Study (SMS) has been performed to better understand the DSL research field and to identify research trends and any possible open issues. In this talk, appropriate methodologies and tools needed to support the development of DSLs will be discussed, as well as some open problems of DSL development (e.g., combining DSLs).
Bio: Marjan Mernik received the MSc and PhD degrees in Computer Science from the University of Maribor in 1994 and 1998, respectively. He is currently a professor at the University of Maribor, Faculty of Electrical Engineering and Computer Science. He was a visiting professor at the University of Alabama at Birmingham, Department of Computer and Information Sciences. His research interests include programming languages, compilers, domain-specific (modeling) languages, grammar-based systems, grammatical inference, and evolutionary computations. He is a member of the ACM and EAPLS. He is the Editor-in-Chief of the Journal of Computer Languages, as well as an Associate Editor of the Applied Soft Computing Journal, Information Sciences Journal, and Swarm and Evolutionary Computation Journal. He was named a Highly Cited Researcher for 2017 and 2018. More information about his work is available at https://lpm.feri.um.si/en/members/mernik/. |
16:06 | A Mechanized Formalization of an FRP Language with Effects ABSTRACT. Functional Reactive Programming (FRP) is a functional programming paradigm designed for systems interacting with their environment. The Yampa library, a Haskell implementation, allows users to construct signal functions that synchronously process input streams to produce output streams. While this library facilitates concise and robust coding, managing I/O is cumbersome. To address this issue, the Wormholes library extends Yampa with constructs to bind I/O to resource names, accessible in an imperative style. Few FRP languages are formalized, and Wormholes added challenging features. This article presents a mechanized formalization of a slightly modified version of Wormholes, improving the result and correcting some issues. |
16:24 | A Platform-Independent Software-Intensive Workflow Modeling Language And An Open-Source Visual Programming Tool: A Bottom-Up Approach Using Ontology Integration Of Industrial Workflow Engines ABSTRACT. Many contemporary software-intensive services are developed as workflows of collaborative and interdependent tasks. Industrial workflow platforms (i.e., engines) such as Airflow and Kubeflow automatically execute and monitor the workflow specified in platform-specific code. The code-based workflow specification becomes complex and error-prone as services grow in complexity. Furthermore, differences in platform-specific workflow specifications cause inefficiencies when porting workflows between platforms, even if the different platforms handle semantically the same workflow. In this paper, we propose a bottom-up approach for developing a platform-independent software-intensive workflow modeling language. The approach systematically extends the UML activity diagram by building platform-independent ontologies of the workflow specification from the given target industrial workflow engines. Based on the approach, we develop a platform-independent Workflow Modeling Language (WorkflowML) that covers four famous workflow engines (Airflow, Kubeflow, Argo workflow, and Metaflow). Furthermore, we implement an open-source visual programming tool for WorkflowML using the ADOxx metamodeling platform. We validate our approach by evaluating the expressiveness of WorkflowML based on modeling case studies of 42 simple workflows and two real-case workflow-based services. The evaluation results validate that WorkflowML serves as an effective common visual language for target workflow engines, supported by an open-source visual programming tool. |
16:42 | Breadth-first Cycle Collection Reference Counting: Theory and a Rust Smart Pointer Implementation ABSTRACT. We present a new garbage collection reference counting algorithm capable of collecting reference cycles, overcoming a known limitation of traditional reference counting. The algorithm’s key features include resilience to errors during tracing, support for object finalisation, no need for supplementary heap memory during collection, and a fast breadth-first tracing approach that avoids stack overflows. We implement the algorithm as a Rust library that is idiomatic and highly compatible with the Rust ecosystem and that leverages Rust’s type system and borrow checker to minimise unsafe code and prevent undefined behaviour. We report benchmarks that show that our proposal performs comparably to popular Rust alternatives and outperforms them when dealing with garbage cycles. |
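The limitation of traditional reference counting that the abstract above refers to can be observed in a few lines. The sketch below is not the authors' algorithm and is not written in Rust. It assumes the CPython interpreter, where memory is managed by reference counting plus a separate cycle collector that can be switched off, and the `Node` class and helper names are invented for illustration.

```python
import gc
import weakref


class Node:
    """A minimal object that can point at one other node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_garbage_cycle():
    """Create two nodes that reference each other, then drop all outside references.

    Only a weak reference is returned, so it does not keep the cycle alive.
    """
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    return weakref.ref(a)


gc.disable()  # leave only plain reference counting active (CPython assumption)
try:
    probe = make_garbage_cycle()
    # Each node still holds a reference to the other, so neither count
    # reaches zero and reference counting alone never frees them.
    alive_with_refcounting_only = probe() is not None

    # Running the cycle collector once finds and reclaims the unreachable cycle.
    gc.collect()
    alive_after_cycle_collection = probe() is not None
finally:
    gc.enable()
```

With the cycle collector disabled, the two unreachable nodes stay allocated; they are reclaimed only once a cycle-aware pass runs. A cycle-collecting reference counting scheme of the kind the abstract proposes targets exactly this class of garbage.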