FPS-2024: 17TH INTERNATIONAL SYMPOSIUM ON FOUNDATIONS & PRACTICE OF SECURITY
PROGRAM FOR WEDNESDAY, DECEMBER 11TH

09:10-10:45 Session 7: Preserving privacy and maintaining trust for end users in a complex and digital cyberspace

*includes a paper nominated for Best Paper Award

09:10
Another Walk for Monchi

ABSTRACT. Monchi is a new protocol aimed at privacy-preserving biometric identification. It begins with score computation in the encrypted domain using homomorphic encryption and ends with comparison of these scores to a given threshold using function secret sharing. Here we study the integration, in that context, of the score computation techniques recently introduced by Bassit et al., which eliminate homomorphic multiplications by replacing them with lookup tables. First, we extend this lookup-table biometric recognition solution by adding function secret sharing for the final comparison of scores. Then, we introduce a two-party computation of the scores with lookup tables that fits nicely with the function secret sharing score comparison. Our solutions accommodate well the flight-boarding use case introduced by Monchi.
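For intuition, here is a toy, non-cryptographic sketch of the lookup-table idea: quantize each probe feature and replace the inner-product multiplications with table lookups, then emulate the share-and-compare step that function secret sharing would perform in the real protocol. Dimensions, quantization grid, and threshold are invented for illustration.

```python
import numpy as np

# Toy sketch (not the paper's protocol): inner-product similarity computed
# with additions and table lookups only, in the style of Bassit et al.
LEVELS = 16          # quantization levels per feature (assumed)
DIM = 8              # toy embedding dimension

rng = np.random.default_rng(0)
reference = rng.normal(size=DIM)              # enrolled biometric template

# Precompute tables: tables[i][q] = reference[i] * dequantize(q)
grid = np.linspace(-3, 3, LEVELS)
tables = np.array([[reference[i] * g for g in grid] for i in range(DIM)])

def quantize(x):
    return np.clip(np.searchsorted(grid, x), 0, LEVELS - 1)

probe = reference + 0.1 * rng.normal(size=DIM)   # fresh capture, same user
q = quantize(probe)

# Score = sum of table entries: additions only, no multiplications.
score = sum(tables[i][q[i]] for i in range(DIM))

# In the real protocol the score would be additively secret-shared and
# compared to the threshold via function secret sharing; here we only
# emulate the share-and-reconstruct step.
share0 = rng.normal()
share1 = score - share0
THRESHOLD = 0.5 * np.dot(reference, reference)   # toy acceptance threshold
print("match:", (share0 + share1) >= THRESHOLD)
```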

09:25
An Innovative DSSE Framework: Ensuring Data Privacy and Query Verification in Untrusted Cloud Environments

ABSTRACT. Dynamic Searchable Symmetric Encryption (DSSE) empowers cloud servers to execute search queries on encrypted user documents while maintaining the encryption of both queries and documents. This technology also supports efficient updates to the document set by the user. However, most current DSSE approaches rely on the "honest-but-curious" server model, where the cloud server is expected to follow the protocol without deviation. Given that cloud servers are not always fully trustworthy, this assumption can be problematic in real-world scenarios. In our paper, we introduce a new forward-private DSSE scheme that includes efficient result verification. To achieve this, the server must provide a "verification token" with the search results to confirm the integrity of the data returned. We enhance retrieval efficiency by using a customized B+ tree to store verification token values and a tailored linked-list data structure. Our cost-effective scheme supports both document updates and searches. Security proofs and performance evaluations demonstrate the scheme's practicality, efficiency, and robust security.
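As a rough illustration of the flavor of such schemes (not the paper's construction), the sketch below builds a forward-private index from per-keyword counters and PRF labels, and returns a naive MAC-based "verification token" over the result set; in the actual scheme the tokens live in a customized B+ tree and the untrusted server cannot forge them.

```python
import hmac, hashlib, os

# Minimal DSSE-style sketch: update labels depend on a per-keyword counter,
# so past search tokens cannot match future updates (forward privacy).
K_INDEX = os.urandom(32)   # client secret for index labels
K_VERIFY = os.urandom(32)  # client secret for verification tokens

counters = {}   # client state: keyword -> update count
index = {}      # server state: label -> doc id (toy: stored in the clear)

def prf(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()

def update(keyword, doc_id):
    c = counters.get(keyword, 0)
    index[prf(K_INDEX, f"{keyword}|{c}".encode())] = doc_id
    counters[keyword] = c + 1

def search(keyword):
    c = counters.get(keyword, 0)
    results = [index[prf(K_INDEX, f"{keyword}|{i}".encode())] for i in range(c)]
    token = prf(K_VERIFY, repr(sorted(results)).encode())   # toy proof
    return results, token

update("invoice", "doc-17"); update("invoice", "doc-42")
results, token = search("invoice")
expected = prf(K_VERIFY, repr(sorted(results)).encode())
print("verified:", hmac.compare_digest(token, expected), results)
```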

09:45
Privacy-Preserving Machine Learning Inference for Intrusion Detection

ABSTRACT. Cyber-attacks are becoming increasingly sophisticated, prompting growing interest in anomaly-based Intrusion Detection Systems for their ability to detect zero-day threats. As networks expand, outsourcing detection services to third parties raises concerns about data security, creating a need for privacy-preservation techniques. Homomorphic Encryption (HE) offers a solution, enabling privacy-preserving anomaly detection by allowing computations on encrypted data. Our work focuses on developing private classifiers using HE for intrusion detection. Experiments on the CICIDS2017 dataset using five Machine Learning (ML) classifiers reveal that private Deep Learning (DL) methods achieve performance levels comparable to traditional classifiers. Among these, the Convolutional Neural Network (CNN) strikes a balance between classification speed and detection performance, making it the preferred choice for privacy-preserving intrusion detection.
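A minimal sketch of the general approach, using the python-paillier (`phe`) library as a stand-in: Paillier only supports additions and plaintext-ciphertext products, which suffices for the linear part of a classifier, whereas the paper's DL models would need a leveled scheme such as CKKS. Weights and features below are invented.

```python
# Hedged sketch of privacy-preserving linear scoring (pip install phe).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

weights = [0.8, -1.2, 0.5]      # model held by the (untrusted) detector
features = [0.3, 0.1, 0.9]      # client's private network-flow features

enc_features = [public_key.encrypt(x) for x in features]

# The server computes the encrypted score without seeing the features:
# only ciphertext additions and plaintext-by-ciphertext products are used.
enc_score = sum(w * ex for w, ex in zip(weights, enc_features))

# The client decrypts and applies the decision threshold locally.
print("anomalous:", private_key.decrypt(enc_score) > 0.0)
```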

10:05
Priv-IoT: Privacy-preserving Machine Learning in IoT Utilizing TEE and Lightweight Ciphers

ABSTRACT. The need for lightweight cryptographic primitives is greater than ever due to the rapid advancement of the Internet of Things (IoT) and the increasing presence of resource-constrained devices. In response, NIST has standardized the ASCON lightweight authenticated encryption with associated data (AEAD) and hash algorithm for lightweight cryptography (LWC). Beyond protected IoT data communication, IoT data analytics is crucial for operational efficiency, data-driven innovation, improved decision-making, and predictive maintenance. We consider a real-world Cloud-IoT scenario where an IoT application is connected to a (potentially untrusted) cloud. In this paper, we propose Priv-IoT, a privacy-preserving machine learning (PPML) system with which an IoT application owner can securely transport IoT data to the cloud and enable secure machine learning (ML) on the IoT data. Our secure IoT data transport protocol is based on a lightweight AEAD scheme and a standard security protocol (e.g., TLS) to resist various external and internal attacks. We enable secure ML analytics using a trusted execution environment (e.g., Intel SGX) in the bring-your-own-encryption paradigm. We prototype and evaluate our proposed system using a list of LWC algorithms and fundamental regression algorithms in SGX, and present extensive experimental results on real-world datasets.
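The transport step follows a standard AEAD pattern. In the sketch below, ChaCha20Poly1305 from the `cryptography` package stands in for ASCON (for which a vetted Python binding may not be installed); key provisioning via enclave attestation is only indicated in a comment.

```python
# Hedged sketch of the AEAD transport step; ChaCha20Poly1305 stands in for
# the ASCON AEAD the paper uses, since the pattern is the same.
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()
aead = ChaCha20Poly1305(key)

nonce = os.urandom(12)                    # never reuse a nonce under one key
reading = b'{"sensor":"temp","value":21.7}'
header = b"device-42"                     # authenticated, but not encrypted

ciphertext = aead.encrypt(nonce, reading, header)

# Inside the enclave (e.g., SGX), the same key -- provisioned via remote
# attestation in the bring-your-own-encryption model -- decrypts the data
# before the ML step runs on plaintext inside the trusted boundary.
assert aead.decrypt(nonce, ciphertext, header) == reading
```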

10:25
Enhancing Text Encryption in Cloud Computing Using a Hyper-Chaotic System with Logistic Map

ABSTRACT. Cloud security is of paramount importance for data protection, service continuity, and maintaining user confidence in this advanced technology. Managing data in a cloud environment requires a robust encryption solution and increased operational efficiency. This article examines the application of chaotic image encryption algorithms to secure text. Our method involves converting the text message into an image, followed by the application of a specific image encryption algorithm. The results of our approach demonstrate that the resulting encrypted image has a high level of security and is resistant to differential attacks.
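A toy sketch of the pipeline (using a plain logistic map rather than the paper's hyper-chaotic system): reshape the text bytes into a square image and XOR them with a keystream generated from x_{n+1} = r * x_n * (1 - x_n), where the key is the pair (r, x0) and r near 4 behaves chaotically.

```python
import numpy as np

# Toy sketch of the text-as-image idea; parameters are illustrative only.
def logistic_keystream(n, r=3.99, x0=0.7):
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return np.array(out, dtype=np.uint8)

def text_to_image(text):
    data = np.frombuffer(text.encode(), dtype=np.uint8)
    side = int(np.ceil(np.sqrt(data.size)))
    padded = np.zeros(side * side, dtype=np.uint8)
    padded[: data.size] = data
    return padded.reshape(side, side), data.size

img, n = text_to_image("attack at dawn")
stream = logistic_keystream(img.size).reshape(img.shape)
cipher_img = img ^ stream                      # encryption: XOR with keystream
plain = (cipher_img ^ stream).flatten()[:n]    # decryption with the same key
print(plain.tobytes().decode())
```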

13:30-15:00 Session 8: Intersecting security, privacy, and machine learning techniques to detect, mitigate, and prevent threats

*includes a paper nominated for Best Paper Award

13:30
LocalIntel: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge

ABSTRACT. Security Operations Center (SOC) analysts gather threat reports from openly accessible global threat repositories and process them to tailor the information to their organization's needs, such as developing threat intelligence and security policies. They also depend on internal organizational repositories, which act as private local knowledge databases. These local knowledge databases store credible cyber intelligence and critical operational and infrastructure details. SOC analysts undertake the labor-intensive manual task of combining these global threat repositories and local knowledge databases to create both organization-specific threat intelligence and mitigation policies. Recently, Large Language Models (LLMs) have shown the capability to process diverse knowledge sources efficiently. We leverage this ability to process both kinds of knowledge and to automate organization-specific threat intelligence generation. In this work, we present LocalIntel, a novel automated threat intelligence contextualization framework that retrieves zero-day vulnerability reports from global threat repositories and uses its local knowledge database to determine implications and mitigation strategies to alert and assist the SOC analyst. LocalIntel comprises two key phases: knowledge retrieval and contextualization. Quantitative and qualitative assessment has shown effectiveness in generating up to 93% accurate organizational threat intelligence compared to SOC analyst-generated ground truths, with 64% inter-rater agreement.
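The two-phase flow might look like the sketch below; `fetch_global_report`, `local_kb`, and `llm` are hypothetical placeholders standing in for LocalIntel's actual components.

```python
# Hedged sketch of the two phases; every name below is a hypothetical
# placeholder, not LocalIntel's actual API.

def local_intel(cve_id, fetch_global_report, local_kb, llm):
    # Phase 1: knowledge retrieval
    global_report = fetch_global_report(cve_id)      # e.g., from an NVD/CVE feed
    local_context = local_kb.search(global_report)   # org assets, configs, policies

    # Phase 2: contextualization
    prompt = ("Vulnerability report:\n" + global_report +
              "\n\nOrganizational context:\n" + "\n".join(local_context) +
              "\n\nState the impact on our infrastructure and a mitigation plan.")
    return llm(prompt)   # organization-specific threat intelligence
```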

13:50
Intelligent Green Efficiency for Intrusion Detection

ABSTRACT. Artificial Intelligence (AI) has surged in popularity recently, recording great progress in various industries. However, the environmental impact of AI is a growing concern in terms of the energy consumption and carbon footprint of Machine Learning (ML) and Deep Learning (DL) models, making it essential to investigate Green AI, an attempt to reduce the climate impact of AI systems. This paper presents an assessment of different programming languages and Feature Selection (FS) methods for improving the computational performance of AI, focusing on Network Intrusion Detection (NID) and cyber-attack classification tasks. Experiments were conducted using five ML models - Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory - implemented in four programming languages - Python, Java, R, and Rust - along with three FS methods - Information Gain, Recursive Feature Elimination, and Chi-Square. The results demonstrate that FS plays an important role in enhancing the computational efficiency of AI models without compromising detection accuracy, highlighting languages such as Python and R, which benefit from rich AI library ecosystems. These conclusions can inform the design of efficient and sustainable AI systems that still provide good generalization and reliable detection.
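The three FS methods are all available in scikit-learn; the sketch below applies them to stand-in data (the study itself uses NID datasets, five models, and four languages).

```python
# Sketch of the three FS methods the paper compares, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       chi2, RFE)
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_pos = MinMaxScaler().fit_transform(X)   # chi2 requires non-negative inputs

ig = SelectKBest(mutual_info_classif, k=10).fit(X, y)   # Information Gain
cs = SelectKBest(chi2, k=10).fit(X_pos, y)              # Chi-Square
rfe = RFE(RandomForestClassifier(random_state=0),
          n_features_to_select=10).fit(X, y)            # Recursive Feature Elim.

print("IG keeps:  ", ig.get_support().nonzero()[0])
print("Chi2 keeps:", cs.get_support().nonzero()[0])
print("RFE keeps: ", rfe.get_support().nonzero()[0])
```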

14:10
A Privacy-Preserving Behavioral Authentication System

ABSTRACT. A behavioral authentication (BA) system leverages the behavioral characteristics of users to verify their identity claims through a verification algorithm. A verification algorithm can be developed by training a machine learning (ML) classifier on user profiles, which eliminates the need to maintain a profile database and enhances system performance. However, similar to other ML systems, ML-based BA classifiers are vulnerable to privacy attacks which can leak sensitive behavioral data. To ensure the privacy of behavioral data, we propose a non-crypto-based approach suitable for low-computing devices. Before sharing the profiles with the verifier, each user will apply random projection (RP) to their behavioral profiles. This transformation will guarantee the correctness and security properties of BA systems, as RP has the ability to preserve Euclidean distances among vectors in a metric space with high probability. Our approach also meets the renewability, unlinkability, and irreversibility requirements of privacy-preserving authentication systems. Extensive experiments on various datasets, along with comprehensive security and privacy evaluations, validate the effectiveness of our method. Our approach is general and can be applied to other BA systems.
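The core transform is easy to demonstrate: a random projection matrix, which in the real system would be derived from a user-held secret, approximately preserves pairwise Euclidean distances (Johnson-Lindenstrauss), so verification can run on projected profiles. Dimensions below are assumed.

```python
import numpy as np

# Sketch of the privacy transform: project behavioral profiles through a
# user-held random matrix before sharing them with the verifier.
rng = np.random.default_rng(42)   # in the real system, seeded by a user secret
d, k = 256, 64                    # original and projected dimensions (assumed)
R = rng.normal(0, 1 / np.sqrt(k), size=(d, k))   # random projection matrix

profile_a = rng.normal(size=d)                       # enrolled profile
profile_b = profile_a + 0.1 * rng.normal(size=d)     # fresh sample, same user

orig = np.linalg.norm(profile_a - profile_b)
proj = np.linalg.norm(profile_a @ R - profile_b @ R)
print(f"distance before {orig:.3f} vs. after {proj:.3f} projection")

# Renewability: discarding R and drawing a fresh matrix yields an
# unlinkable new template from the same behavioral data.
```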

14:25
Automated Exploration of Optimal Neural Network Structures for Deepfake Detection

ABSTRACT. The proliferation of Deepfake technology has raised concerns about its potential misuse for malicious purposes, such as defaming celebrities or causing political unrest. While existing methods have reported high accuracy in detecting Deepfakes, challenges remain in adapting to the rapidly evolving Deepfake technology and developing efficient and effective detectors. In this paper, we propose a novel approach to address these challenges by utilizing advanced Neural Architecture Search (NAS) methods, specifically focusing on DARTS, PC-DARTS, and DU-DARTS. Our experimental results demonstrate that the PC-DARTS method achieves the highest test AUC of 0.88 among the techniques investigated, with a learning time of only 2.86 GPU days. This highlights the efficiency and effectiveness of our approach in automatically building Deepfake detection models. Moreover, the models using NAS exhibit competitive performance compared to state-of-the-art architectures such as XceptionNet, EfficientNet, and MobileNet. Our results suggest that the automatic search process using advanced NAS methods can quickly and easily construct adaptive and high-performance Deepfake detection models, providing a promising direction for combating the ever-evolving Deepfake technology. In particular, the results of PC-DARTS emphasize the importance of efficient training time while achieving high test AUC, providing a new perspective on the automatic search for optimal network structures in the context of Deepfake detection.
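The continuous relaxation at the heart of DARTS can be sketched in a few lines of PyTorch: each edge mixes candidate operations with softmax-weighted architecture parameters alpha that are trained by gradient descent alongside the network weights; the candidate set below is invented and far smaller than a real search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the DARTS mixed operation (not the paper's search setup).
class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # candidate op 1
            nn.Conv2d(channels, channels, 5, padding=2),   # candidate op 2
            nn.Identity(),                                 # candidate op 3 (skip)
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)   # continuous relaxation of the choice
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

op = MixedOp(channels=8)
out = op(torch.randn(1, 8, 32, 32))
# After the search, the operation with the largest alpha is kept
# (the discretization step).
print(F.softmax(op.alpha, dim=0))
```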

14:40
An Empirical Study of Black-box based Membership Inference Attacks on a Real-World Dataset
PRESENTER: Yujeong Kwon

ABSTRACT. The recent advancements in artificial intelligence drive the widespread adoption of Machine-Learning-as-a-Service platforms, which offer valuable services. However, these pervasive utilities in the cloud environment unavoidably encounter security and privacy issues. In particular, a membership inference attack (MIA) poses a threat by recognizing the presence of a data sample in the training set of the target model. Prior MIA approaches have repeatedly underlined privacy risks by demonstrating experimental results on standard benchmark datasets such as MNIST and CIFAR; however, the effectiveness of such techniques on a real-world dataset remains questionable. We are the first to perform an in-depth empirical study of black-box MIAs under realistic assumptions, covering six metric-based and three classifier-based MIAs on a high-dimensional image dataset consisting of identification (ID) cards and driving licenses. Additionally, we introduce a Siamese-based MIA that shows similar or better performance than state-of-the-art approaches, and we suggest training a shadow model with autoencoder-reconstructed images. Our major findings show that the performance of MIA techniques may degrade when samples have too many features, and that the MIA configuration or a sample's properties can impact the accuracy of membership inference on members and non-members.
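The simplest metric-based MIA in this family thresholds the target model's confidence, since members tend to receive higher scores than non-members; the sketch below runs it on invented score distributions.

```python
import numpy as np

# Sketch of a confidence-threshold MIA, the most basic metric-based attack.
def confidence_attack(max_softmax_scores, threshold=0.9):
    """max_softmax_scores: top confidence per queried sample (black-box)."""
    return max_softmax_scores >= threshold    # True -> predicted "member"

rng = np.random.default_rng(1)
members = np.clip(rng.normal(0.95, 0.04, 1000), 0, 1)      # toy distributions
non_members = np.clip(rng.normal(0.80, 0.10, 1000), 0, 1)

guesses = confidence_attack(np.concatenate([members, non_members]))
truth = np.array([True] * 1000 + [False] * 1000)
print("attack accuracy:", (guesses == truth).mean())
```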

15:15-16:45 Session 9: New trends in machine learning and AI applied to cybersecurity
15:15
ModelForge: Using GenAI to Improve the Development of Security Protocols
PRESENTER: Martin Duclos

ABSTRACT. Formal methods can be used for verifying security protocols, but their adoption can be hindered by the complexity of translating natural language protocol specifications into formal representations. In this paper, we introduce ModelForge, a novel tool that automates the translation of protocol specifications for the Cryptographic Protocol Shapes Analyzer (CPSA). By leveraging advances in Natural Language Processing (NLP) and Generative AI (GenAI), ModelForge processes protocol specifications and generates a CPSA protocol definition. This approach reduces the manual effort required, making formal analysis more accessible. We evaluate ModelForge by fine-tuning a large language model (LLM) to generate protocol definitions for CPSA, comparing its performance with other popular LLMs. The results from our evaluation show that ModelForge consistently produces quality outputs, excelling in syntactic accuracy, though some refinement is needed to handle certain protocol details. The contributions of this work include the architecture and proof of concept for a translation tool designed to simplify the adoption of formal methods in the development of security protocols.
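Conceptually, the translation step reduces to a guarded LLM call like the sketch below; `generate` is a hypothetical handle to the fine-tuned model, not ModelForge's actual API, and the check is only a stand-in for real syntactic validation.

```python
# Hedged sketch, not ModelForge itself: `generate` is a hypothetical handle
# to the fine-tuned LLM; the CPSA snippet shape follows CPSA's s-expression
# defprotocol syntax.
SYSTEM = ("Translate the protocol specification into a CPSA defprotocol. "
          "Emit only a valid s-expression.")

SPEC = """A sends to B a nonce Na encrypted with B's public key.
B replies with Na paired with its own nonce Nb, encrypted for A."""

def translate_to_cpsa(spec, generate):
    cpsa_definition = generate(system=SYSTEM, user=spec)  # hypothetical LLM call
    if not cpsa_definition.lstrip().startswith("(defprotocol"):
        raise ValueError("output failed the syntactic sanity check")
    return cpsa_definition
```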

15:35
Detecting Energy Attacks in the Battery-less Internet of Things

ABSTRACT. We present a technique to detect energy attacks in the battery-less Internet of Things (IoT). Battery-less IoT devices rely on ambient energy harvesting and are employed in a multitude of applications, including safety-critical ones such as biomedical implants. Due to scarce energy intake and limited energy buffers, their executions become intermittent, alternating periods of active operation with periods of recharging energy buffers. Evidence demonstrates that, by exerting limited control over ambient energy, one can create situations of livelock, denial of service, and priority inversion without physical access to the device. We call these situations energy attacks. Using concepts from approximate intermittent computing and machine learning, we design a technique that can detect energy attacks with 92%+ accuracy, that is, up to 37% better than the baselines, and with up to one-fifth of their energy overhead. By design, our technique does not cause any additional energy failures compared to regular intermittent processing.
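One plausible reading of the detection idea, sketched with invented parameters: learn the distribution of normal recharge intervals and flag the traces a manipulated energy supply would produce (here with scikit-learn's IsolationForest, which is not necessarily the paper's model).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy sketch: model normal energy-harvesting behavior (time spent recharging
# between activations) and flag manipulated traces. All parameters invented.
rng = np.random.default_rng(7)
normal_recharge = rng.gamma(shape=4.0, scale=50.0, size=(500, 1))   # ms
detector = IsolationForest(random_state=7).fit(normal_recharge)

# An energy attack stretches recharge times to starve the device
# (livelock / denial of service); such traces should score as anomalies.
attack_recharge = rng.gamma(shape=4.0, scale=400.0, size=(50, 1))
flags = detector.predict(attack_recharge)     # -1 = anomaly
print("detected:", (flags == -1).mean())
```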

15:55
Is Expert-Labeled Data Worth the Cost? Exploring Active and Semi-Supervised Learning Across Imbalance Scenarios in Financial Crime Detection

ABSTRACT. The ongoing fight against financial crime and fraudulent activities is crucial to safeguarding privacy and security. Addressing this issue is critical for protecting global financial integrity and disrupting illicit attempts. Despite the potential of machine learning, its application to fraud detection encounters unique challenges, including limited labels, extremely rare fraud cases, and a vast volume of instances. This is a challenging imbalance scenario in which learners must rely solely on an exceptionally small set of labeled instances. Active learning and semi-supervised learning have been proposed to handle such situations. However, their effectiveness and usefulness in these scenarios, where different resampling techniques can be applied, have not been compared. This paper tackles this gap by conducting a comparative analysis of active and semi-supervised learning approaches for fraud detection. We investigate how the imbalance ratio affects their effectiveness and observe their sensitivity to different resampling techniques. Our results show that both frameworks improve fraud detection, even as the imbalance increases. Highly imbalanced datasets are more affected by different settings for requesting expert labeling than less imbalanced ones. In less imbalanced domains, active learning tends to outperform other approaches, while in highly imbalanced domains the performance differences are less pronounced. Applying sampling techniques and moderately adjusting the imbalance ratio improves performance across different subsets of the dataset.
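A pool-based active learning loop with uncertainty sampling, one of the approaches compared, can be sketched as follows; the data, imbalance ratio, and batch sizes are invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Sketch: active learning under heavy class imbalance (1% "fraud" class).
X, y = make_classification(n_samples=5000, weights=[0.99], flip_y=0,
                           random_state=0)
rng = np.random.default_rng(0)
pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
labeled = (list(rng.choice(neg, 15, replace=False)) +
           list(rng.choice(pos, 5, replace=False)))   # seed set, both classes
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(10):                          # each round = one expert batch
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    uncertainty = np.abs(proba - 0.5)        # closest to the decision boundary
    ask = [pool[i] for i in np.argsort(uncertainty)[:10]]
    labeled += ask                           # the "expert" reveals these labels
    pool = [i for i in pool if i not in set(ask)]

print("fraud cases among queried labels:", int(y[labeled].sum()))
```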

16:10
ExploitabilityBirthMark: An Early Predictor of the Likelihood of Exploitation

ABSTRACT. In recent years, there has been a steady increase in the number of reported vulnerabilities (CVEs), increasing the workload of organizations trying to update their systems promptly. This underscores the need to prioritize certain critical vulnerabilities over others to prevent cyberattacks effectively. Unfortunately, the current methods for assessing the exploitability of vulnerabilities have substantial shortcomings. In particular, they often consist of prediction models that encode data that may not be available at the time a vulnerability is first reported. In this paper, we introduce an innovative exploitability prediction method that exclusively uses information accessible at the time of a vulnerability's initial publication. Our approach achieves better performance than the most widely used vulnerability exploit prediction algorithms in scenarios where data is subject to the aforementioned limitations.
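The setting can be illustrated with a minimal model that uses only publication-time fields (description text and CVSS base score), so no post-publication signals leak in; the data and features below are invented and this is not the paper's model.

```python
# Sketch of publication-time-only exploitability prediction on toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix

descriptions = [
    "remote code execution via crafted HTTP request",
    "stack buffer overflow in parser allows arbitrary code",
    "information disclosure through verbose error message",
    "denial of service via malformed packet",
]
cvss_base = [[9.8], [8.8], [5.3], [7.5]]   # available at publication time
exploited = [1, 1, 0, 0]                   # ground truth, e.g., from KEV feeds

text = TfidfVectorizer().fit_transform(descriptions)
X = hstack([text, csr_matrix(cvss_base)])  # text features + base score
clf = LogisticRegression().fit(X, exploited)
print(clf.predict_proba(X)[:, 1])          # exploitation likelihood per CVE
```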

16:25
"You still have to study" - On the Security of LLM generated code.

ABSTRACT. We witness increasing use of AI assistants, even for routine (classroom) programming tasks. However, the code generated on the basis of the programmer's so-called "prompt" does not always meet accepted security standards. The quality of the programmer's prompt determines whether the generated code contains weaknesses. We analyse four major LLMs with respect to the security of generated Python and JavaScript code, using the MITRE CWE catalogue as the guiding security definition. Our results show that, across different prompting techniques, some LLMs initially generate code of which 65% is deemed insecure by a trained security engineer. On the other hand, with increasing manual guidance from a skilled engineer, almost all analysed LLMs eventually generate code that is close to 100% secure.
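One way to approximate the study's CWE-guided review at small scale is to run a static analyzer such as Bandit over each generated snippet; the loop below assumes Bandit is installed (pip install bandit) and the snippets are illustrative, not the paper's data.

```python
# Hedged sketch: scan LLM-generated Python snippets with Bandit and report
# the triggered test IDs, as a cheap proxy for manual CWE review.
import json, subprocess, tempfile

snippets = {
    "login": "import os\nos.system('echo ' + input())",   # command injection
    "hashing": "import hashlib\nhashlib.sha256(b'x').hexdigest()",
}

for name, code in snippets.items():
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    report = subprocess.run(["bandit", "-f", "json", path],
                            capture_output=True, text=True)
    issues = json.loads(report.stdout)["results"]
    print(name, "->", [i["test_id"] for i in issues] or "no findings")
```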