Towards Privacy-Preserving Classification-as-a-Service for DGA Detection
ABSTRACT. Domain generation algorithm (DGA) classifiers can be used to detect and block the establishment of a connection between bots and their command-and-control server.
Classification-as-a-service (CaaS) can separate the classification of domain names from the need for real-world training data, which are difficult to obtain but mandatory for well-performing classifiers.
However, domain names as well as trained models may contain privacy-critical information which should not be leaked to either the model provider or the data provider.
Several generic frameworks for privacy-preserving machine learning (ML) have been proposed in the past that can preserve data and model privacy.
Thus, it seems high time to combine state-of-the-art DGA classifiers and privacy-preservation frameworks to enable privacy-preserving CaaS, preserving both data and model privacy for the DGA detection use case.
In this work, we examine the real-world applicability of four generic frameworks for privacy-preserving ML using different state-of-the-art DGA detection models.
Our results show that out-of-the-box DGA detection models are computationally infeasible for privacy-preserving inference in a real-world setting.
We propose model simplifications that achieve a reduction in inference latency of up to 95%, and up to 97% in communication complexity while causing an accuracy penalty of less than 0.17%.
Despite this significant improvement, real-time classification is still not feasible in a traditional two-party setting.
Thus, more efficient secure multi-party computation (SMPC) or homomorphic encryption (HE) schemes are required to enable real-world feasibility of privacy-preserving CaaS for DGA detection.
Preparing for National Cyber Crises Using Non-linear Cyber Exercises
ABSTRACT. Cyber exercises are a well-received and established means to strengthen the problem-solving skills of personnel and to prepare staff for future cyber incidents. While this concept works for the majority of expected issues, where practicing the application of specific processes, tools, and methods to mitigate the effects of large-scale cyber attacks is key, existing cyber exercise approaches are of limited use for crisis management. The reason for this lies in the very nature of a crisis. While 'common' incidents are more predictable and can usually be dealt with using thoroughly prepared standard procedures and well-rehearsed responses, crises are inherently uncertain, and off-the-shelf solutions may even be counterproductive. Complex decisions must be made in short time-frames, influenced by far more stakeholders than internal incidents, including regulators, the media, and even the general public. These decisions can barely be guided by prepared plans or checklists; thus new forms of preparation are required, which challenge the participants to practice decision making under pressure, but also give them the opportunity to reconsider choices, walk alternative paths, and find the best possible solution for a given situation. For this purpose, this paper discusses a new approach for non-linear cyber exercises, which allows branching points to develop a storyline and employs new techniques, such as 'Fast Forward' to quickly progress to the critical stages of long-lasting crises, 'Playback' to consolidate gained skills, and 'Pause-Adapt-Repeat' to play through alternative paths. In this paper, we discuss limiting factors of today's cyber exercises for large-scale cyber crisis preparation, and introduce concepts for non-linear exercises to compensate for these issues.
A new approach for cross-silo federated learning and its privacy risks
ABSTRACT. Federated Learning has witnessed increasing popularity in the past few years for its ability to train Machine Learning models in critical contexts, using private data without moving them. Most approaches in the literature focus on mobile environments, where mobile devices contain the data of single users, and typically deal with image or text data. In this paper, we define \hcsfedavg, a novel federated learning approach tailored for training machine learning models on data distributed across hierarchically organized federated organizations. Our method focuses on the generalization capabilities of the neural network models, providing a new mechanism for the selection of their best weights. In addition, it is tailored for tabular data.
We empirically test our approach on two different tabular datasets, showing very good performance and generalization capabilities.
Then, we also tackle the problem of assessing the privacy risk of users represented in the training data. In particular, we empirically show, by attacking the \hcsfedavg models with the Membership Inference Attack, that the users represented in the training data may be exposed to a high privacy risk.
FOX: Fooling with Explanations. Privacy Protection with Adversarial Reactions in Social Media
ABSTRACT. Social media data has been mined over the years to predict individual sensitive attributes such as political and religious beliefs. Indeed, mining such data can improve the user experience with personalization and freemium services. Still, it can also be harmful and discriminative when used to make critical decisions, such as employment. In this work, we investigate social media privacy protection against attribute inference attacks using machine learning explainability and adversarial defense strategies. More precisely, we propose FOX (FOoling with eXplanations), an adversarial attack framework to explain and fool sensitive attribute inference models by generating effective adversarial reactions. We evaluate the performance of FOX against other SOTA baselines in a black-box setting by attacking five gender attribute classifiers trained on reactions to Facebook pictures, specifically (i) comments generated by Facebook users excluding the picture owner, and (ii) textual tags (i.e., alt-text) generated by Facebook. Our experiments show that FOX successfully fools the classifiers (about 99.7% and 93.2% of the time), outperforms the SOTA baselines, and achieves good transferability of adversarial features.
DaRoute: Inferring trajectories from zero-permission smartphone sensors
ABSTRACT. Nowadays, smartphones are equipped with a multitude of sensors, including GPS, which enable location-based services. However, leakage or misuse of user locations poses a severe privacy threat, motivating operating systems to restrict applications' direct access to these resources. Nevertheless, this work demonstrates how an adversary can deduce sensitive location information by inferring a vehicle's trajectory from inbuilt motion sensors collectible by zero-permission mobile apps. The presented attack incorporates data from the accelerometer, the gyroscope, and the magnetometer. We extract so-called path events from the raw data and match them against reference data from OpenStreetMap. Using real-world data from three different cities, several drivers, and different smartphones, we show that our approach can infer traveled routes with high accuracy within minutes while remaining robust to sensor errors. Our experiments show that even for areas as large as approximately 4500 km², the accuracy of detecting the correct route is as high as 87.14%, significantly outperforming similar approaches from Narain et al. and Waltereit et al.
Introducing a Framework to Enable Anonymous Secure Multi-Party Computation
ABSTRACT. Secure Multi-Party Computation (SMPC) allows a set of parties to securely compute a functionality in a distributed fashion without the need for any trusted external party. Usually, it is assumed that the parties know each other and have already established authenticated channels among each other. In practice, however, the parties sometimes need to stay anonymous. In this paper, we conceptualize a framework that enables the repeated execution of an SMPC protocol for a given functionality such that the parties can keep their participation in the protocol executions private while being sure that only trustworthy parties may take part in a protocol execution. We identify the security properties that an implementation of our framework must meet and introduce a first implementation of the framework that achieves these properties.
ABSTRACT. Smart products, such as toy robots, must comply with multiple legal requirements of the countries they are sold and used in. Currently, compliance with the legal environment requires manually customizing products for different markets. In this paper, we explore a design approach for smart products that enforces compliance with aspects of the European Union's data protection principles within a product's firmware through a toy robot case study. To this end, we present an exchange between computer scientists and legal scholars that identified the relevant data flows, their processing needs, and the implementation decisions that could allow a device to operate while complying with the EU data protection law. By designing a data-minimizing toy robot, we show that the variety, amount, and quality of data that is exposed, processed, and stored outside a user's premises can be considerably reduced while preserving the device's functionality. In comparison with a robot designed using a traditional approach, in which 90% of the collected types of information are stored by the data controller or a remote service, our proposed design leads to the mandatory exposure of only 7 out of 15 collected types of information, all of which are legally required by the data controller to demonstrate consent. Moreover, our design is aligned with the Data Privacy Vocabulary, which enables the toy robot to cross geographic borders and seamlessly adjust its data processing activities to the local regulations.
Impact of environmental conditions on fingerprint system performance
ABSTRACT. The objective of biometric testing is to determine the performance of a biometric system in order to guarantee security and user-experience requirements. Providing trust in biometric systems is key for many manufacturers. Performance is usually measured by computing matching scores between legitimate and impostor samples from a given database.
Different biases, in particular those linked to environmental conditions, can affect the performance of a biometric system. In this paper, we study the impact of acquisition conditions on fingerprint systems, considering quality and accuracy at the same time. We built our own database with controlled acquisition conditions and observe the behavior of three different matchers on these biometric data. Experimental results allow us to quantify the impact of these conditions on performance and draw conclusions for testing biometric systems.
Secure Allocation for Graph-Based Virtual Machines in Cloud Environments
ABSTRACT. Cloud computing systems (CCSs) enable the sharing of physical computing resources through virtualisation, where a group of virtual machines (VMs) can share the physical resources of a given machine. However, this sharing can lead to a so-called side-channel attack (SCA), widely recognised as a potential threat to CCSs. Specifically, malicious VMs can capture information from (target) VMs, i.e., those with sensitive information, by merely being co-located with them on the same physical machine. As such, a VM allocation algorithm needs to be cognizant of this issue and attempt to allocate the malicious and target VMs onto different machines, i.e., the allocation algorithm needs to be security-aware. This paper investigates the allocation patterns of VM allocation algorithms that are more likely to lead to a secure allocation. A driving objective is to reduce the number of VM migrations during allocation. We also propose a graph-based secure VM allocation algorithm (GbSRS) to minimise SCA threats. Our results show that algorithms following a stacking-based behaviour are more likely to produce secure VM allocations than those following spreading or random behaviours.
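The stacking-based behaviour the abstract refers to can be illustrated with a minimal first-fit sketch (this is an assumption for illustration, not the paper's GbSRS algorithm): fill the current machine before opening a new one, which tends to open fewer machines and thus leaves fewer opportunities for a malicious VM to land beside a target on a fresh host.

```python
def stack_allocate(vm_sizes, capacity):
    """Toy stacking-style allocation: keep placing VMs on the current
    machine until it is full, then open a new one. Returns a list of
    machines, each a list of the VM sizes placed on it."""
    machines, current, used = [], [], 0
    for size in vm_sizes:
        if used + size > capacity:      # current machine full: open a new one
            machines.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:                         # flush the last, partially filled machine
        machines.append(current)
    return machines
```

A spreading strategy would instead rotate over all machines, maximising co-location opportunities; comparing such placement patterns is exactly the kind of analysis the paper performs at a much more detailed level.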
ABSTRACT. Garg, Goldwasser and Vasudevan (Eurocrypt 2020) invented the notion of deletion-compliance to formally model the "right to be forgotten", a concept that gives individuals more control over their digital data. A requirement of deletion-compliance is strong privacy for the deletion requesters, since no outside observer should be able to tell whether deleted data was ever present in the first place. Naturally, many real-world systems where information can flow across users are automatically ruled out.
The main thesis of this paper is that deletion-compliance is a standalone notion, distinct from privacy. We present an alternative definition that meaningfully captures deletion-compliance without any privacy implications. This allows a broader class of data collectors to demonstrate compliance with deletion requests and to be paired with various notions of privacy. Our new definition has several appealing properties:
- It is implied by the stronger definition of Garg et al. under natural conditions, and is equivalent when we add a privacy requirement.
- It is naturally composable with minimal assumptions.
- Its requirements are met by data structure implementations that do not reveal the order of operations, a concept known as history-independence.
Along the way, we discuss the many challenges that remain in providing a universal definition of compliance to the "right to be forgotten."
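The history-independence property mentioned in the third point can be illustrated with a toy data structure (an illustrative sketch, not from the paper): its serialized form depends only on the current contents, never on the order of the insert and delete operations that produced them, so an observer of the memory layout cannot tell whether deleted data was ever present.

```python
class HistoryIndependentSet:
    """Toy history-independent set: serialize() is a canonical (sorted)
    representation, so any two operation sequences yielding the same
    contents produce byte-identical serializations."""

    def __init__(self):
        self._items = set()

    def insert(self, x):
        self._items.add(x)

    def delete(self, x):
        self._items.discard(x)          # no trace of x remains afterwards

    def serialize(self):
        return tuple(sorted(self._items))   # canonical, order-independent form
```

Real history-independent implementations (e.g., of hash tables) achieve the same guarantee at the physical memory level, which is what makes them suitable building blocks for deletion-compliant systems.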
SteelEye: An Application-Layer Attack Detection and Attribution Model in Industrial Control Systems using Semi-Deep Learning
ABSTRACT. The security of Industrial Control Systems is of high importance as they play a critical role in the uninterrupted services provided by Critical Infrastructure operators. Due to the large number of devices and their geographical distribution, Industrial Control Systems need efficient automatic cyber-attack detection and attribution methods, which suggests AI-based approaches. This paper proposes a model called SteelEye based on Semi-Deep Learning (SDL) for accurate detection and attribution of cyber-attacks at the application layer in industrial control systems. The proposed model relies on Bag of Features (BoF) for accurate detection of cyber-attacks and utilizes Categorical Boosting (CatBoost) as the base predictor for attack attribution. Empirical results demonstrate that SteelEye remarkably outperforms state-of-the-art cyber-attack detection and attribution methods in terms of accuracy, precision, recall, and F1-score.
A Novel Trust Model In Detecting Final-Phase Attacks in Substations
ABSTRACT. A substation's security is paramount because it is an integral part of the Smart Grid for the transmission and distribution of electricity. Advanced persistent threats (APTs) have become the bane of the substation because they can remain undetected for a period until final attacks are launched. Many existing techniques may not be real-time enough to detect these final attacks. Trust, even though less investigated, can be used to tackle these attacks. In this paper, we present a trust model designed specifically for the Modbus communication protocol that can detect final attacks from APTs when a substation is compromised. This model is formed from the perspective of the substation device and was successfully tested on two publicly available Modbus datasets under three testing scenarios: the external test, the internal test, and the internal test with IP-MAC blacklisting. The first test assumes attackers' IP and MAC addresses are not part of the substation network, and the other two assume otherwise. Our model detected the attacks within each dataset and also revealed the attack behaviour within the two datasets. Our model can also be extended to other protocols, and this has been marked for future work.
SegmentPerturb: Effective Black-Box Hidden Voice Attack on Commercial ASR Systems via Selective Deletion
ABSTRACT. Voice control systems continue becoming more pervasive as they are deployed in mobile phones, smart home devices, automobiles, etc. Commonly, voice control systems have high privileges on the device, such as making a call or placing an order. However, they are vulnerable to voice attacks, which may lead to serious consequences.
In this paper, we propose SegmentPerturb, which crafts hidden voice commands by querying the target models. The general idea of SegmentPerturb is that we separate the original command audio into multiple equal-length segments and apply maximum perturbation to each segment by probing the target speech recognition system. We show that our method is as efficient as, and in some aspects outperforms, methods from previous works. We choose four popular speech recognition APIs and one mainstream smart home device to conduct the experiments. Results suggest that this algorithm can generate voice commands that can be recognized by the machine but are hard for a human to understand.
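The segment-wise probing idea can be sketched as follows (a minimal illustration under assumptions, not the authors' exact algorithm; `transcribe` stands in for a black-box ASR query): split the waveform into equal-length segments and, per segment, binary-search the largest noise level the recognizer still tolerates.

```python
import numpy as np

def segment_perturb(audio, transcribe, n_segments=20, steps=8):
    """Per segment, find (by binary search) the largest noise scale that
    leaves the black-box transcription unchanged, then commit it."""
    rng = np.random.default_rng(0)
    target = transcribe(audio)                      # transcription to preserve
    out = audio.astype(float).copy()
    bounds = np.linspace(0, len(audio), n_segments + 1, dtype=int)
    for a, b in zip(bounds[:-1], bounds[1:]):
        noise = rng.standard_normal(b - a)
        lo, hi, best = 0.0, 1.0, 0.0
        for _ in range(steps):
            mid = (lo + hi) / 2
            trial = out.copy()
            trial[a:b] += mid * noise
            if transcribe(trial) == target:
                best, lo = mid, mid                 # tolerated: push harder
            else:
                hi = mid                            # rejected: back off
        out[a:b] += best * noise                    # commit the accepted level
    return out
```

Each committed perturbation was verified against the recognizer, so the final audio still transcribes correctly while being maximally distorted for a human listener under this search budget.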
User Identification in Online Social Networks using Graph Transformer Networks
ABSTRACT. The problem of user recognition in online social networks is driven by the need for higher security. Previous recognition systems have extensively employed content-based features and temporal patterns to identify and represent distinctive characteristics within user profiles. This work reveals that semantic textual analysis and a graph representation of the user's social network can be utilized to develop a user identification system. A graph transformer network architecture is proposed for the closed-set node identification task, leveraging the weighted social network graph as input. Users retweeting, mentioning, or replying to a target user's tweet are considered neighbors in the social network graph and connected to the target user. The proposed user identification system outperforms all state-of-the-art systems. Moreover, we validate its performance on three publicly available datasets.
User Profiling on Universal Data Insights tool on IBM Cloud Pak for Security
ABSTRACT. User profiling is one of the most important research topics, where organizations endeavour to establish profiles of user activities to detect or predict potential abnormal behaviours. Previous research has mainly focused on detecting and identifying static activities through social media. A universal analysis based on streaming settings to monitor user activities continuously is missing. This paper proposes a framework for user profiling based on UDI platforms to address this issue. Our framework consists of three main steps: simulating realistic scenarios for user activities, proposing and extracting potential features, and applying machine learning models to the simulated datasets. Our experimental results show that the selected machine learning algorithms can distinguish most abnormal behaviours correctly. LODA, RRCF, and LSCP achieve the highest performance among all algorithms. Tree-based algorithms such as Isolation Forest achieve the best results when considering small datasets and speed. Furthermore, the machine learning algorithms' performance demonstrates the high quality of our simulated datasets.
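As a flavour of the tree-based detectors the abstract mentions, the following sketch runs scikit-learn's Isolation Forest on stand-in user-activity features (the feature semantics and injected outliers are illustrative assumptions, not the paper's simulated datasets):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Stand-in activity features (e.g., hypothetical login counts, session
# durations); the last five rows mimic clearly abnormal behaviour.
rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 4))
abnormal = rng.normal(8, 1, size=(5, 4))
X = np.vstack([normal, abnormal])

clf = IsolationForest(random_state=0).fit(X)
labels = clf.predict(X)          # -1 = anomaly, 1 = normal
print((labels[-5:] == -1).sum(), "of 5 injected outliers flagged")
```

In a streaming setting, the same model would be periodically refit on a sliding window of recent activity features.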
Practical Protection of Binary Applications via Transparent Immunization
ABSTRACT. In the past few years, massive data breach attacks on large organizations (e.g., Anthem Inc., Equifax) have compromised sensitive data of tens or even hundreds of millions of people. Many of these massive data breach attacks are due to vulnerabilities in applications (e.g., the Apache web server), and a single unpatched application vulnerability could cause prohibitive loss to the business. The 2017 Equifax data breach attack compromised sensitive data of 148 million people and has cost Equifax $1.4 billion as of May 2019. Unfortunately, the average times to detect and contain a data breach in 2019 were 206 days and 73 days, respectively. There is a pressing need to develop practical and deployable capability to detect and block previously unseen, application-specific cyberattacks on vulnerable binary applications in real-time.
In this paper, we present AppImmu, a practical cyber defense system that can detect and block previously unknown cyberattacks on vulnerable binary applications in real-time with no false positives. Given a potentially vulnerable ELF binary application, AppImmu can transparently and statically immunize it into an immunized version via binary rewriting. At run-time, AppImmu uses kernel-level immunization-based anomaly detection techniques to detect and block previously unknown cyberattacks on immunized binary applications without any prior knowledge of the attacks. We have successfully immunized real-world large binary applications such as Apache, the Java execution environment, the bash shell, and Snort in Linux, and have successfully detected and blocked real-world data breach attacks (e.g., the Apache Struts exploit used in the 2017 Equifax data breach attack, the Shellshock exploit) in true real-time. Our benchmark experiments show that AppImmu incurs less than 6% run-time overhead in overall system performance, 2.1% run-time overhead for applications under typical workload, and 0.71% run-time overhead in the Java execution environment.
Using wrist movements for cyber attacks on examination proctoring
ABSTRACT. Recent advancements in wearable computing have sparked off a wide range of innovative ways through which humans interact with computers. From EEG headbands to eye trackers and smart rings, to mention but a few, there is a fast-growing series of new technologies and (or) apps that leverage these gadgets to improve aspects of human living. One interesting question that is rarely asked, however, is the following: how could the 'bad guys' leverage these new ways of human-machine interaction to create technologies that go against the common good? This paper studies one such scenario. Specifically, we show how a methodical formulation of human-computer interactions made possible by haptic feedback, wrist motion sensing, and stealthy graphical feedback could be used to cheat in an examination. For simplicity we refer to this system as an "attack" on examination systems. The attack is done through collaboration between a knowledgeable student (i.e., a mercenary) and a weak student (i.e., the beneficiary) who depends on the mercenary for solutions. Through a combination of experiments and theoretical modeling, we show the attack to be highly effective. The paper raises the question of whether policies on the usage of sensor-enabled gadgets in examination settings need a rethink in the wake of today's innovative ways of human-computer interaction.
Detection of Demand Manipulation Attacks on a Power Grid
ABSTRACT. An increased usage of IoT devices across the globe has posed a threat to the power grid. When an attacker has access to multiple IoT devices within the same geographical location, they can possibly disrupt the power grid by regulating a botnet of high-wattage IoT devices. Anomaly detection comes in handy to inform the power operator of anomalous behavior during such an attack. However, it is difficult to detect anomalies, especially when such attacks take place obscurely and for prolonged time periods. With this motivation, we compare different anomaly detection systems in terms of detecting these anomalies collectively. We generate attack data using real-world power consumption data across multiple apartments to assess the performance of various prediction-based detection techniques as well as commercial detection applications. After thorough analysis of the results, we discuss the various cases in which an attack is not detected. We then propose a novel dynamic thresholding mechanism, which improves the detection rate up to 97% across different attack scenarios when used with prediction-based anomaly score techniques.
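One simple form a dynamic threshold over prediction-based anomaly scores can take (a minimal sketch under assumptions; the paper's mechanism is not specified here) is a moving threshold of rolling mean plus a multiple of the rolling standard deviation, so the cutoff adapts to slow shifts in consumption patterns:

```python
import numpy as np

def dynamic_threshold(scores, window=50, k=3.0):
    """Flag score i as anomalous when it exceeds the mean + k * std of
    the previous `window` scores. The first `window` points are left
    unflagged because no history is available yet."""
    scores = np.asarray(scores, dtype=float)
    flags = np.zeros(len(scores), dtype=bool)
    for i in range(window, len(scores)):
        hist = scores[i - window:i]
        flags[i] = scores[i] > hist.mean() + k * hist.std()
    return flags
```

A static threshold tuned for one season would either miss obscure, prolonged attacks or flood the operator with false alarms; letting the threshold track recent history is the design intuition behind dynamic schemes.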
API-based Ransomware Detection using Machine Learning-based Threat Detection Models
ABSTRACT. Ransomware is a major malware attack experienced by large corporations and healthcare services. Ransomware employs the idea of cryptovirology, which uses cryptography to design malware. The goal of ransomware is to extort ransom by threatening the victim with the destruction of their data. Ransomware typically involves a 3-step process: analyzing the victim’s network traffic, identifying a vulnerability, and then exploiting it. Thus, the detection of ransomware has become an important undertaking that involves various sophisticated solutions for improving security. To further enhance ransomware detection capabilities, this paper focuses on an Application Programming Interface (API)-based ransomware detection approach in combination with machine learning (ML) techniques. The focus of this research is (i) understanding the life cycle of ransomware on the Windows platform, (ii) dynamic analysis of ransomware samples to extract various features of malicious code patterns, and (iii) developing and validating machine learning-based ransomware detection models on different ransomware and benign samples. Data were collected from publicly available repositories and subjected to sandbox analysis for sampling. The sampled datasets were applied to build a k-nearest neighbor model. The grid search hyperparameter optimization algorithm was employed to obtain the best fit model; the results were cross-validated with the testing datasets. This analysis yielded a high ransomware detection accuracy of ~99.18% for Windows-based platforms and shows the potential for achieving high-accuracy ransomware detection capabilities when using a combination of API calls and an ML model. This approach can be further utilized with existing multilayer security solutions to protect critical data from ransomware attacks.
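The modeling step the abstract describes, a k-nearest-neighbor classifier tuned by grid search with cross-validation, can be sketched with scikit-learn; the feature matrix here is synthetic stand-in data (the paper's API-call features and datasets are not reproduced):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: rows are samples, columns are hypothetical API-call
# frequency features; labels mark ransomware (1) vs. benign (0).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Grid search over k and distance weighting, cross-validated on the
# training split, then evaluated on the held-out test split.
grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.score(X_te, y_te), 3))
```

The reported ~99.18% accuracy comes from the authors' real sandbox-derived API-call datasets; this sketch only shows the shape of the train/tune/validate pipeline.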
A Practical Oblivious Cloud Storage System based on TEE and Client Gateway
ABSTRACT. In this paper, we propose a new oblivious cloud storage system, which is more efficient and scalable than existing schemes due to the combined leverage of an SGX-based trusted execution environment (TEE) on the cloud server side and moderate storage space on the client side. We present the detailed design, and implement and evaluate the system. The evaluation results show that, when the size of the outsourced data is 1-20 GB and the block size is 1-8 KB, a data access throughput between 320 KB/s and 640 KB/s can be attained, and the average query latency for each block is only 2.26-12.80 ms.