A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI
ABSTRACT. Security issues are a common problem for open source packages archived in and delivered through software ecosystems. These issues often manifest themselves as software weaknesses that may lead to concrete software vulnerabilities. This paper examines various security issues in Python packages with static analysis. The dataset is based on a snapshot of all packages stored in the Python Package Index (PyPI). In total, over 197 thousand packages and over 749 thousand security issues are covered. Even under the constraints imposed by static analysis, (a) the results indicate that security issues are prevalent: at least one issue is present in about 46% of the Python packages. In terms of issue types, (b) exception handling and various code injections have been the most common issues, with the subprocess module standing out in this regard. Reflecting the generally small size of the packages, (c) software size metrics are poor predictors of the number of issues revealed through static analysis. With these results and the accompanying discussion, the paper contributes to the field of large-scale empirical studies for better understanding security problems in software ecosystems.
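The following sketch makes one kind of finding concrete. It is a single static check in the spirit of security linters such as Bandit: it walks a module's abstract syntax tree and reports `subprocess` calls made with `shell=True`, a common code-injection pattern. It is an illustration only, not the tooling used in the study. It also assumes calls are written literally as `subprocess.<name>(..., shell=True)`, so aliased imports are not matched.

```python
import ast

def find_shell_true_calls(source: str) -> list:
    """Return the line numbers of subprocess.<func>(..., shell=True) calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only calls spelled as subprocess.<name>(...) are matched.
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "subprocess"):
            continue
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                hits.append(node.lineno)
    return sorted(hits)

SAMPLE = """import subprocess
subprocess.run(user_cmd, shell=True)
subprocess.run(["ls", "-l"])
"""
print(find_shell_true_calls(SAMPLE))  # prints [2]
```

A purely syntactic check like this cannot tell whether `user_cmd` is attacker-controlled. That is one reason static analysis results are an approximation of real vulnerabilities.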
Epistemic Analysis of a Key-Management Vulnerability in LoRaWAN
ABSTRACT. Smart devices in applications such as remote sensing systems use the LoRaWAN protocol to connect with and transmit data to a central server. The device and server use the protocol's handshake procedure to start a communication session and negotiate session encryption keys. However, the session keys remain unchanged throughout communication with the server. Static session keys make the protocol vulnerable to attack: an intruder who compromises the session keys can decrypt past and future messages. This work studies the LoRaWAN handshake procedure and its security properties, namely mutual authentication and secrecy, and proposes a key exchange scheme to mitigate the session key vulnerability. It proposes new epistemic definitions for mutual authentication and secrecy that are clear and precise. To validate them, we prove that the handshake and the new key exchange scheme satisfy these definitions. Based on this validation, we show that the protocol is secure. Finally, the work shows that the new key exchange scheme is feasible for devices with limited processing power, bandwidth, and memory.
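A simple one-way key update (a hash ratchet) shows why static keys are dangerous and what updating keys buys. This is not the key exchange scheme proposed in the paper. It is a hypothetical sketch in which each new session key is derived from the previous one with HMAC-SHA256, truncated to 16 bytes to match LoRaWAN's AES-128 session keys. The label `UPDATE_LABEL` and the function names are assumptions.

```python
import hashlib
import hmac

KEY_LEN = 16  # bytes; LoRaWAN session keys are 128-bit AES keys
UPDATE_LABEL = b"session-key-update"  # hypothetical domain-separation label

def next_session_key(key: bytes) -> bytes:
    """Derive the next epoch's key from the current one via a one-way function."""
    return hmac.new(key, UPDATE_LABEL, hashlib.sha256).digest()[:KEY_LEN]

def key_schedule(initial_key: bytes, epochs: int) -> list:
    """Return the keys for epochs 0..epochs, starting from the negotiated key."""
    keys = [initial_key]
    for _ in range(epochs):
        keys.append(next_session_key(keys[-1]))
    return keys

keys = key_schedule(bytes(16), epochs=3)
print(len(keys), len(set(keys)))  # prints 4 4
```

Because the update is one-way, an intruder who obtains the key of epoch i can compute later keys but not earlier ones. Protecting future traffic as well requires mixing in fresh key material, for example from a new key exchange.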
FORTRESS: FORtified Tamper-Resistant Envelope with Embedded Security Sensor
ABSTRACT. Protecting security modules from hardware-level attacks is very challenging, since the attacker can manipulate the device directly through physical access. To address this issue, various physical security enclosures have been developed with the goal of covering entire hardware modules and thus protecting them from external manipulation.
Novel concepts are battery-less and based on Physical Unclonable Functions (PUFs). They aim to overcome the most severe drawbacks of past devices: the need for active monitoring and, thus, limited battery lifetime.
Although some progress has already been made on certain aspects of PUF-based enclosures, combining and integrating all required components and creating a corresponding architecture for Hardware Security Modules (HSMs) remain open issues.
In this paper, we present FORTRESS, a PUF-based HSM that integrates the tamper-sensitive capacitive PUF-based envelope and its embedded security sensor IC into a secure architecture. We propose a secure life-cycle concept that includes shipment aspects, a full key generation scheme with re-enrollment capabilities, and the next-generation Embedded Key Management System.
With FORTRESS, we take the next step towards the productive operation of PUF-based HSMs.
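PUF responses are noisy, so generating keys from them usually relies on public helper data and error correction. The following is a minimal sketch of the textbook code-offset construction with a repetition code. It assumes the PUF response has already been quantized into bits. FORTRESS's actual key generation and re-enrollment scheme is not described here, and the repetition factor `REP` and the function names are hypothetical.

```python
import hashlib
import secrets

REP = 5  # hypothetical repetition factor: each key bit is protected by 5 response bits

def enroll(response):
    """Pick a random key and compute public helper data from a 0/1 PUF response."""
    n_key_bits = len(response) // REP
    key_bits = [secrets.randbits(1) for _ in range(n_key_bits)]
    codeword = [bit for bit in key_bits for _ in range(REP)]
    helper = [c ^ r for c, r in zip(codeword, response)]
    return key_bits, helper

def reproduce(noisy_response, helper):
    """Recover the key bits from a noisy re-measurement by majority decoding."""
    noisy_codeword = [h ^ r for h, r in zip(helper, noisy_response)]
    return [
        1 if sum(noisy_codeword[i:i + REP]) > REP // 2 else 0
        for i in range(0, len(noisy_codeword), REP)
    ]

def derive_key(key_bits):
    """Condense the recovered key bits into a 256-bit key."""
    return hashlib.sha256(bytes(key_bits)).digest()

response = [secrets.randbits(1) for _ in range(40)]  # stand-in for a quantized PUF response
key_bits, helper = enroll(response)
noisy = list(response)
noisy[0] ^= 1  # a single flipped bit is corrected by majority decoding
print(reproduce(noisy, helper) == key_bits)  # prints True
```

In this model, re-enrollment amounts to calling `enroll` again on a fresh response to obtain new helper data. A repetition code is the simplest choice, but it leaks information when response bits are biased, so practical designs use stronger codes.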
Deterministic and Statistical Strategies to Protect ANNs against Fault Injection Attacks
ABSTRACT. Artificial neural networks are currently used for many tasks, including safety-critical ones such as automated driving. Hence, it is very important to protect them against faults and fault attacks. In this work, we propose two fault injection attack detection mechanisms: one based on the output labels for a reference input, and the other based on neuron activations. First, we calibrate our detectors under normal conditions. Thereafter, we verify them and maximize their fault detection performance. To prove the effectiveness of our solution, we consider widely used neural networks (AlexNet, GoogLeNet, and VGG) with their associated dataset, ImageNet. Our results show that both detectors achieve high fault coverage, typically above 96%. Moreover, the hardware and software implementations of our detectors incur extremely low area and time overhead.
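Both detector types can be sketched in a few lines, assuming the activations of the monitored neurons are available as plain lists of floats:

- **Activation-based check.** It calibrates a per-neuron [min, max] range on fault-free runs and flags any run that leaves that range.
- **Reference-input check.** It compares the label produced for a fixed input with its known golden label.

The range statistic, the `margin` parameter, and the function names are assumptions, not the paper's calibration procedure.

```python
def calibrate(traces):
    """Per-neuron min/max over activation vectors recorded under normal conditions."""
    lows = [min(values) for values in zip(*traces)]
    highs = [max(values) for values in zip(*traces)]
    return lows, highs

def activation_alarm(activations, bounds, margin=0.0):
    """Raise an alarm if any monitored activation leaves its calibrated range."""
    lows, highs = bounds
    return any(a < lo - margin or a > hi + margin
               for a, lo, hi in zip(activations, lows, highs))

def reference_alarm(model, reference_input, golden_label):
    """Raise an alarm if the model's label for a fixed reference input changes."""
    return model(reference_input) != golden_label

bounds = calibrate([[0.1, 0.5], [0.3, 0.4], [0.2, 0.45]])
print(activation_alarm([0.2, 0.45], bounds))  # prints False
print(activation_alarm([0.2, 5.0], bounds))   # prints True
```

A larger `margin` reduces false alarms caused by benign inputs that were not seen during calibration, at the cost of missing faults that shift activations only slightly.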
GAIROSCOPE: Leaking Data from Air-Gapped Computers to Nearby Smartphones using Speakers-to-Gyro Communication
ABSTRACT. It is known that malware can leak data from isolated, air-gapped computers to nearby smartphones using ultrasonic waves. However, this covert channel requires access to the smartphone's microphone, which is highly protected in Android OS and iOS, and might be non-accessible, disabled, or blocked.
In this paper, we present GAIROSCOPE, an ultrasonic covert channel that does not require a microphone on the receiving side. Our malware generates ultrasonic tones at the resonance frequencies of the MEMS gyroscope. These inaudible frequencies produce mechanical vibrations within the smartphone's gyroscope, which can be demodulated into binary information. Notably, the gyroscope in smartphones is considered a 'safe' sensor that can be used in apps and JavaScript. We introduce the adversarial attack model and present related work. We provide the relevant technical background and present the design and implementation of GAIROSCOPE. We present the evaluation results and discuss a set of countermeasures to this threat. Our experiments show that attackers can exfiltrate sensitive information from air-gapped computers to a smartphone located a few meters away via the Speakers-to-Gyroscope covert channel.
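Binary frequency-shift keying (B-FSK) illustrates the encoding side of such a channel. Each bit is sent as a short tone at one of two high frequencies. The receiver decides each bit by comparing the signal energy at both frequencies with the Goertzel algorithm. This sketch demodulates the audio samples directly and abstracts away how the tones show up in the gyroscope's readings. The frequencies, sample rate, and symbol length are hypothetical values, not GAIROSCOPE's parameters.

```python
import math

SAMPLE_RATE = 48_000     # Hz (assumed)
F0, F1 = 19_000, 20_000  # hypothetical tone frequencies for bits 0 and 1
SYMBOL_SAMPLES = 960     # 20 ms per bit; gives 50 Hz bin spacing, so both tones fall on exact bins

def modulate(bits):
    """Emit one fixed-length sine burst per bit (B-FSK)."""
    samples = []
    for bit in bits:
        freq = F1 if bit else F0
        samples.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                       for n in range(SYMBOL_SAMPLES))
    return samples

def goertzel_power(block, freq):
    """Signal power of block at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def demodulate(samples):
    """Decide each bit by which tone carries more energy in its symbol window."""
    bits = []
    for i in range(0, len(samples), SYMBOL_SAMPLES):
        block = samples[i:i + SYMBOL_SAMPLES]
        bits.append(1 if goertzel_power(block, F1) > goertzel_power(block, F0) else 0)
    return bits

print(demodulate(modulate([1, 0, 1, 1, 0])))  # prints [1, 0, 1, 1, 0]
```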
LibBlock - Towards Decentralized Library System based on Blockchain and IPFS
ABSTRACT. In modern times, the definition of a library and its expected functionality have not changed much. It is still a place for us to hold massive collections of information. Traditionally, libraries require physical storage space for writings and publications, but the costs of storing and managing them can be tremendous. Although digitization and computers allow a very high density of information storage, they have not lowered the library's complexity. As our main source of information moves away from physical writings toward digital ones, the new digital library (i.e., a state-run library) faces the challenges of record integrity and storage efficiency. Focusing on this issue, we study the requirements of the Royal Library in Denmark and explore the use of blockchain technology. We introduce a system named LibBlock, which combines smart contracts and IPFS to provide a robust, decentralized, flexible, and adaptive e-Library that enables easy scalability and rigorous record keeping. In the evaluation, we investigate the initial performance of LibBlock on Ethereum and show its viability and efficiency.
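The record-integrity property that LibBlock targets rests on content addressing. A record is identified by a hash of its content, and an append-only registry maps record identifiers to those hashes. The sketch below makes two substitutions:

- A SHA-256 hex digest stands in for an IPFS CID. Real CIDs add multihash and multibase encoding.
- An in-memory dictionary stands in for the smart contract.

The class and method names are hypothetical.

```python
import hashlib

def content_id(data: bytes) -> str:
    """SHA-256 hex digest used as a simplified stand-in for an IPFS CID."""
    return hashlib.sha256(data).hexdigest()

class RecordRegistry:
    """Append-only map from record IDs to content IDs (stand-in for a smart contract)."""

    def __init__(self):
        self._records = {}

    def register(self, record_id: str, data: bytes) -> str:
        if record_id in self._records:
            raise ValueError("record already registered; entries are append-only")
        cid = content_id(data)
        self._records[record_id] = cid
        return cid

    def verify(self, record_id: str, data: bytes) -> bool:
        """Check that retrieved content still matches the registered identifier."""
        return self._records.get(record_id) == content_id(data)

registry = RecordRegistry()
registry.register("manuscript-001", b"scanned page 1")
print(registry.verify("manuscript-001", b"scanned page 1"))  # prints True
print(registry.verify("manuscript-001", b"tampered page"))   # prints False
```

Only the short identifier goes on-chain, while the bulk content lives in IPFS. This split is what makes such designs cheaper than storing documents directly in a blockchain.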
EPF: An Evolutionary, Protocol-Aware, and Coverage-Guided Network Fuzzing Framework
ABSTRACT. Despite the success of coverage-guided fuzzing, the technique is rarely applied to the network domain. There, specifications require fuzzers to handle complex data structures and communication protocols, limiting the effective use of most existing coverage-guided fuzzers. In this paper, we introduce EPF, a coverage-guided, protocol-aware network fuzzing framework. EPF uses population-based simulated annealing to heuristically schedule packet types during fuzzing. Together with a custom genetic algorithm that uses coverage metrics as its fitness function, the framework steers input generation towards coverage maximization. Users can add protocols by defining packet models and state graphs through a Scapy-powered API. We conduct a case study on an implementation of the IEC 60870-5-104 SCADA protocol and compare our proof of concept with AFLNet. Based on a total of 600 CPU days of fuzzing data, we measure effectiveness using bug and coverage metrics. Even without various possible optimizations that are not yet implemented, we show that EPF achieves effectiveness similar to AFLNet's.
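A genetic algorithm that uses coverage as its fitness function can be sketched in a few lines. In this toy version, the coverage oracle is hypothetical. It rewards each correctly matched leading byte of the IEC 60870-5-104 STARTDT act frame (68 04 07 00 00 00), imitating a parser whose branches unlock one byte at a time. Real coverage would come from instrumentation feedback. EPF's simulated-annealing packet-type scheduler and its Scapy packet models are not reproduced.

```python
import random

# IEC 60870-5-104 STARTDT act U-frame, used as the toy parser's expected input.
EXPECTED = bytes([0x68, 0x04, 0x07, 0x00, 0x00, 0x00])

def coverage(packet: bytes) -> int:
    """Hypothetical oracle: one more 'branch' is covered per correctly matched leading byte."""
    hits = 0
    for got, want in zip(packet, EXPECTED):
        if got != want:
            break
        hits += 1
    return hits

def mutate(packet: bytes, rng: random.Random, rate: float = 0.2) -> bytes:
    """Replace each byte with a random one with probability rate."""
    return bytes(rng.randrange(256) if rng.random() < rate else b for b in packet)

def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    cut = rng.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]

def evolve(pop_size=30, generations=200, seed=1):
    rng = random.Random(seed)
    population = [bytes(rng.randrange(256) for _ in EXPECTED) for _ in range(pop_size)]
    history = []  # best coverage seen in each generation
    for _ in range(generations):
        population.sort(key=coverage, reverse=True)
        history.append(coverage(population[0]))
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=coverage), history

best, history = evolve()
print(coverage(best), history[-1])
```

Keeping the best half of each generation guarantees that the best coverage never decreases from one generation to the next. Mutation and crossover supply the exploration.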
Clear the Fog: Towards a Taxonomy of Self-Sovereign Identity Ecosystem Members
ABSTRACT. The current self-sovereign identity (SSI) ecosystem is rapidly changing and ill-defined. Numerous actors, projects, and initiatives produce different SSI solutions, frameworks, protocols, and distributed ledgers. Even though some patterns exist among SSI ecosystem members, no thorough systematization has been made. This paper conducts a systematic gray literature review to structure the SSI ecosystem. Specifically, we derive a four-dimensional taxonomy to describe members of the SSI ecosystem. Then, we classify the ecosystem members into eight archetypes to help locate new and existing members within the SSI ecosystem. We find that SSI ecosystem members either govern the SSI ecosystem and/or networks, implement SSI offerings, or support governing and/or implementing members. The study suggests that, as the SSI ecosystem grows, the number of governing members will grow more slowly than the number of implementing and supporting members.
PIdARCI: Using Assembly Instruction Patterns to Identify, Annotate, and Revert Compiler Idioms
ABSTRACT. Analysis of binary code is a building block of computer security. Especially in malware or firmware analysis, where source code is often unavailable, techniques such as decompilation are used to figure out the functionality of binaries. While decompilation of binary code is a challenging task in itself, compiler optimization complicates it heavily. During the optimization phase, modern compilers often transform human-readable expressions into instruction sequences (compiler idioms) that may be more efficient in terms of speed or size than the direct translation. However, these transformations are often considerably worse in terms of readability for the analyst. Such compiler-specific sequences are not only significantly longer than the apparent translation of the original high-level language operation, but they also have no trivial correlation to the original expression's semantics. Modern decompilers address this issue by reverting compiler idioms using static, manually crafted rules.
In this paper, we introduce a novel approach to find and annotate arithmetic compiler idioms with their corresponding high-level language expressions, which significantly simplifies manual analysis. In contrast to previous approaches, our method requires no manual work to create the pattern database for matching compiler idioms. It also requires significantly less manual labour to derive the transformation rules that calculate the original constants. In our evaluation, we compared PIdARCI against the current academic and commercial state of the art: Ghidra, RetDec, and Hex-Rays / IDA Pro. Our evaluation shows that PIdARCI matches more than 99% of all considered compiler idioms, exceeding the matching rate of the other approaches. Additionally, in contrast to PIdARCI, these approaches rely on manually created patterns. This leads to considerably more manual effort for maintaining rules or adding support for new compiler idioms, e.g., for a new compiler version or architecture.
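Unsigned division by a constant is a typical arithmetic compiler idiom. Optimizing compilers replace the division with a multiplication by a "magic" constant followed by a right shift. For example, GCC on x86-64 compiles an unsigned 32-bit `x / 3` to a multiplication by 0xAAAAAAAB and a total right shift of 33 bits. The sketch below shows the idiom and the arithmetic for reverting it. It illustrates the kind of transformation rule involved, not PIdARCI's pattern-learning approach. It covers only divisors whose idiom needs no extra fix-up instructions; unsigned division by 7, for instance, uses a longer sequence.

```python
MASK32 = 0xFFFF_FFFF

def idiom_div(x: int, magic: int, shift: int) -> int:
    """What the optimized code computes for an unsigned 32-bit x: (x * magic) >> shift."""
    return ((x & MASK32) * magic) >> shift

def recover_divisor(magic: int, shift: int) -> int:
    """Revert the idiom: the divisor is 2**shift / magic, rounded to the nearest integer."""
    return (2 ** shift + magic // 2) // magic

for magic, shift in [(0xAAAAAAAB, 33), (0xCCCCCCCD, 34), (0xCCCCCCCD, 35)]:
    d = recover_divisor(magic, shift)
    # Sanity-check the recovered divisor against real division on a few inputs.
    assert all(idiom_div(x, magic, shift) == x // d for x in (0, 1, 99, 12345, MASK32))
    print(f"x * {magic:#x} >> {shift}  ==  x / {d}")
```

The same magic constant appears with different shifts (here, for divisions by 5 and 10). A rule must therefore read both operands of the idiom, not just the constant.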
Dazed and Confused: What’s Wrong with Crypto Libraries?
ABSTRACT. Recent studies have shown that developers have difficulties using cryptographic APIs, which often leads to security flaws. We tackle this matter by looking into what types of problems exist in various crypto libraries. We manually studied 500 posts on Stack Overflow associated with 20 popular crypto libraries and identified 10 themes in the discussions. Interestingly, only two questions were related to attacks against cryptography. There were 63 discussions in which developers had interoperability issues when working with more than one crypto library. The largest group of posts (112) concerned encryption/decryption problems, and 111 concerned installation/compilation issues of crypto libraries. Overall, we find that the crypto libraries are frequently involved in more than five themes of discussion. We believe these initial findings can help team leaders and experienced developers correctly guide team members in the domain of cryptography. Moreover, future research should investigate the similarity of problems at the API level among popular crypto libraries.
Fool Me Once: A Study of Password Selection Evolution over the Past Decade
ABSTRACT. Passwords have been around for many decades and have tenaciously remained the primary means of identification and authentication. Assuming that the communication channel is not intercepted, the strength of security provided by passwords depends largely on two factors: password selection and the password storage mechanism. While researchers have examined both areas in the past, there is no consensus on whether humanity has moved towards choosing stronger passwords, notwithstanding strong password enforcement policies. One of the key reasons behind this shortcoming is the lack of data about individual credentials in leaked datasets, which usually contain only usernames and passwords.
To the best of our knowledge, we are the first researchers to enrich the attribute set of any user credential database, thus allowing deeper insights. We outline the method we devised for adding new attributes (timestamp and source inference) to one of the largest user credential datasets (roughly 1.4 billion credentials leaked between 2008 and 2021). Subsequently, we use our modified dataset to determine how passwords have evolved over time with respect to strength and whether humankind as a whole has learned from its past mistakes.
ABSTRACT. Passphrases offer an alternative to traditional passwords that aims to be stronger and more memorable. However, users tend to choose short passphrases with predictable patterns that may reduce the security they offer. To explore the potential of long passphrases, we formulate a set of passphrase policies and guidelines aimed at supporting their creation and use. Through a 39-day user study, we analyze the usability and security of passphrases generated using our policies and guidelines. Our analysis indicates that these policies lead to reasonable usability and promising security for some use cases, and that there are some common pitfalls in free-form passphrase creation. Our results suggest that our policies can support the use of long passphrases.
Towards Query-efficient Black-box Adversarial Attack on Text Classification Models
ABSTRACT. Recent work has demonstrated that modern text classifiers based on deep neural networks are vulnerable to adversarial attacks. Text data has received far less study than the image domain, owing to the challenges that researchers confront in the text domain. Despite being highly successful, most adversarial attacks in the text domain ignore the overhead they induce on the victim model. In this paper, we propose a query-efficient black-box adversarial attack on text data that attacks a textual deep neural network while taking into account the amount of overhead it may produce. We show that the proposed attack is as powerful as state-of-the-art adversarial attacks while requiring fewer queries to the victim model. Our evaluation shows promising results.
Deep Federated Learning-Based Cyber-Attack Detection in Industrial Control Systems
ABSTRACT. Due to the differences between Information Technology (IT) and Industrial Control System (ICS) networks, current IT security solutions do not work effectively on ICS networks. Moreover, due to security and privacy concerns, ICS owners usually do not share their network data with third parties to train specific machine learning-based ICS security solutions. To address these issues, this paper presents a scalable deep federated learning-based method. In the proposed method, each client trains an unsupervised deep neural network model on local data and shares its parameters with a server. The server aggregates the clients' parameters, builds a generalized public model, and shares it with all clients. The proposed model is evaluated on a real-world ICS dataset from a water treatment system and compared with two non-federated learning-based methods. The findings show that the proposed method outperforms the other two methods, with the same computational complexity as other deep neural network-based methods in the literature.
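Federated averaging (FedAvg) illustrates the server-side aggregation step: the server averages client parameter vectors, weighting each by the size of the client's local dataset. Whether the paper weights clients this way is an assumption. For brevity, the parameters are flattened into plain lists of floats.

```python
def fed_avg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size (FedAvg)."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # prints [2.5, 3.5]
```

Only parameters cross the network, never raw traffic captures. This is the property that lets ICS owners collaborate without sharing their data.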
Trust Quantification for Autonomous Medical Advisory Systems
ABSTRACT. Autonomous Medical Advisory Systems (AMAS) integrate sensors and implement learning technologies to provide intelligent, real-time recommendations. In this paper, we propose a formal framework for quantifying trust in the sensor layer of AMAS using a Bayesian network. First, we identify the various factors influencing trust in this context. We make the factors granular enough that the probability of each trust factor being in a specific state can be measured. Then, using a probabilistic graphical model, we impose a compact structure on the identified factors so that the posterior probability of the trustworthiness of the entire system or its constituents can be computed. Parametrized cases of the Bayesian network are simulated in MATLAB to demonstrate the applicability and scalability of the model for trust inference.
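Posterior trust in a model of this kind can be computed by enumerating the joint distribution. The toy network below is hypothetical, with four binary variables:

- Sensor health H and calibration C are the two factors.
- Both influence trustworthiness T.
- T in turn influences whether a sensor reading is consistent, O.

All probabilities are made-up illustration values, not the factors or parameters identified in the paper. The paper's simulations were run in MATLAB, not Python.

```python
from itertools import product

# Hypothetical conditional probability tables (illustration values only).
P_H = 0.9                     # P(sensor healthy)
P_C = 0.8                     # P(sensor calibrated)
P_T_GIVEN_HC = {(1, 1): 0.95, (1, 0): 0.6, (0, 1): 0.3, (0, 0): 0.05}
P_O_GIVEN_T = {1: 0.9, 0: 0.2}  # P(consistent reading | T)

def bern(p, value):
    """Probability that a binary variable with P(1) = p takes the given value."""
    return p if value else 1 - p

def joint(h, c, t, o):
    """Joint probability, factorized along the network's edges H,C -> T -> O."""
    return (bern(P_H, h) * bern(P_C, c)
            * bern(P_T_GIVEN_HC[(h, c)], t) * bern(P_O_GIVEN_T[t], o))

def posterior_trust(o):
    """P(T = 1 | O = o) by summing out the hidden factors H and C."""
    numerator = sum(joint(h, c, 1, o) for h, c in product((0, 1), repeat=2))
    evidence = sum(joint(h, c, t, o) for h, c, t in product((0, 1), repeat=3))
    return numerator / evidence

print(round(posterior_trust(1), 3))  # prints 0.953
print(round(posterior_trust(0), 3))  # prints 0.358
```

Exact enumeration grows exponentially with the number of variables. For larger networks, inference algorithms such as variable elimination exploit the compact graph structure instead.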
Traceable and Privacy-Preserving Non-Interactive Data Sharing in Mobile Crowdsensing
ABSTRACT. Data sharing, the practice of making data collected from a crowd of mobile devices available to others through a cloud infrastructure, is one of the key technologies in mobile crowdsensing (MCS). However, the collected data may contain sensitive information, and sharing it in public clouds without proper protection could cause serious security problems, such as privacy leakage, unauthorized access, and secret key abuse. To address these issues, in this paper, we propose a Traceable and privacy-preserving non-Interactive Data Sharing (TIDS) scheme for mobile crowdsensing. Specifically, to achieve privacy-preserving fine-grained data sharing in TIDS, a data owner generates an attribute-based access policy without interacting with data users. Furthermore, we design a ciphertext conversion mechanism to support flexible data sharing. Also, by utilizing traceable Ciphertext-Policy Attribute-Based Encryption (CP-ABE), TIDS allows a trusted authority to trace malicious users who abuse their secret keys without incurring additional computational overhead. Security analysis demonstrates that TIDS can protect the confidentiality of the outsourced data. Experimental results show that TIDS can achieve efficient data sharing in mobile crowdsensing applications.
Designing Personalized OS Update Message based on Security Behavior Stage Model
ABSTRACT. As one of the scales that assess end users' security behavior, the security behavior stage model (SeBeST) [1] is a practical approach to characterize similar groups of users (precontemplation, contemplation, preparation, action, and maintenance stages) and provide customized remedies to improve their security behavior. For example, in OS update message customization, a group that does not update the OS continuously may require a message indicating the ease of OS updates; on the other hand, users who already update need a message indicating the importance of OS updates. In this paper, we propose a personalized OS update message interface based on SeBeST. We conduct two online surveys to evaluate effective appearances and messages for the personalized user interface (UI). First, we assess the interface's appearance individually for three behavior stages (preparation, action, and maintenance), and then we combine the customized messages and the selected appearances for these stages. We confirmed that the appropriate appearance differs for each stage. For example, a highlighted red button is effective for users in the preparation stage, whereas a red background is suitable for users in the action and maintenance stages. Highlighted yellow and green buttons are not suitable for any stage. We found that combining a message indicating the disadvantage of the OS update with a highlighted red button UI is suitable for the preparation and action stages. In addition, we confirmed that the best combination for users in the maintenance stage is a message indicating the ease of OS updates and a mouse-over pop-up UI. Therefore, each user should be shown the message and UI appropriate to their stage.
In this research, we focus on SeBeST and OS updating behavior, as in [1]; however, these findings could be applied to other scales and other security behaviors by changing the experimental design slightly.
Evaluating the Current State of Application Programming Interfaces for Verifiable Credentials
ABSTRACT. One of the challenges to the adoption of the decentralised approach to digital ID is a lack of consensus and standardisation on how different stakeholders within the ecosystem can interoperate. As a means to address this issue, we examine the use of standard Application Programming Interfaces (APIs) to integrate decentralised digital identification systems with pre-existing ones. We first examine the current literature and solutions to (a) assess the attributes necessary to compare and contrast APIs, (b) create a list of API providers within the decentralised digital ID marketplace, and (c) compare the API providers against the established attributes. Using an API Usability and Adoption framework as our lens, we assessed 19 API service providers against their use cases. We found that whilst the APIs are maturing, they remain inconsistent and poorly adopted. A clear standard API could assist in better adoption. The guidance provided can inform organisations implementing digital identity and verifiable credentials along their adoption journey.
Cross the Chasm: Scalable Privacy-Preserving Federated Learning against Poisoning Attack
ABSTRACT. Privacy protection and defense against poisoning attacks are two critical problems hindering the proliferation of federated learning (FL). However, they are two inherently contrary issues. To construct a privacy-preserving FL, solutions tend to transform the original information (e.g., gradient information) so that it becomes indistinguishable. Defending against poisoning attacks, in contrast, requires identifying abnormal information via its distinguishability. Therefore, handling these two issues simultaneously in a unified framework is challenging. In this paper, we build a bridge between them, proposing a scalable privacy-preserving federated learning (SPPFL) scheme against poisoning attacks. Specifically, based on secure multi-party computation (MPC), we construct a secure framework that protects users' privacy during the training process while punishing poisoners via distance evaluation. Besides, we provide a rigorous proof to demonstrate the security of our SPPFL. Furthermore, we conduct extensive experiments to illustrate the performance of our scheme.
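Distance evaluation for spotting poisoned updates can be illustrated in the clear. The idea is to compute each client's distance to a robust reference point and flag clients that lie unusually far from it. In SPPFL this evaluation runs under MPC, whereas the sketch below shows only the plaintext logic. The coordinate-wise median reference, the `factor` threshold, and the function names are assumptions, not the paper's rule.

```python
import math
from statistics import median

def coordinate_median(updates):
    """Coordinate-wise median of the client updates, used as a robust reference point."""
    return [median(column) for column in zip(*updates)]

def flag_poisoners(updates, factor=2.0):
    """Indices of updates farther from the median than factor times the median distance."""
    center = coordinate_median(updates)
    distances = [math.dist(u, center) for u in updates]
    cutoff = factor * median(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]

updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [10.0, -10.0]]
print(flag_poisoners(updates))  # prints [4]
```

A median-based reference is used here because a plain mean would be dragged towards the poisoned update it is meant to detect.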
A Hybrid Approach for Privacy-Preserving Graph Neural Network using SGX
ABSTRACT. Methods based on secure multi-party computation (MPC) for privacy-preserving Graph Neural Networks (GNNs) are still challenged by high communication overhead. Moreover, most MPC-based methods are only secure against semi-honest adversaries, and upgrading the security guarantee to malicious security further increases the communication overhead. Alternatively, Software Guard Extensions (SGX) provide native CPU speed for secure computation while ensuring data confidentiality and code integrity. Unfortunately, previous work has shown that SGX is vulnerable to side-channel attacks that deprive it of confidentiality and preserve only its integrity. To solve these problems, we propose a generic n-party secure computation framework for privacy-preserving GNNs using SGX. This framework can reduce the communication overhead and improve the security guarantee of the protocol without relying on the confidentiality of SGX. Specifically, both the data holders and the server have SGX. Each data holder sends secret shares of its data to the other data holders. They enrich the data and perform MPC efficiently with the assistance of the server. The code integrity of SGX ensures that the data holders and the server must execute according to the protocols, so malicious adversaries cannot deviate from the protocol to breach privacy and security. Even if the confidentiality of SGX were breached, the adversary could only access the ciphertext in MPC, not the plaintext. We conduct experiments on public datasets to demonstrate that our framework achieves performance comparable to traditional GNNs, and we perform a security analysis to validate that our framework satisfies the security and privacy requirements.
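The secret shares exchanged by the data holders can be illustrated with n-party additive secret sharing over the integers modulo 2^64:

- A value is split into n random-looking shares that sum to it.
- Any n-1 shares reveal nothing about the value.
- Additions can be computed locally on shares.

The modulus and the function names are assumptions. The framework's actual sharing scheme, SGX attestation, and server-assisted protocols are not shown.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring: integers modulo 2^64

def share(secret: int, n: int) -> list:
    """Split secret into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reveal(shares: list) -> int:
    """Reconstruct the secret by summing all shares."""
    return sum(shares) % MODULUS

def add_shares(a: list, b: list) -> list:
    """Each party adds its two shares locally; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]

print(reveal(add_shares(share(10, 3), share(32, 3))))  # prints 42
```

Multiplications, unlike additions, need interaction between parties. That interaction is the source of the communication overhead the abstract describes.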
Unmasking Privacy Leakage through Android Apps Obscured with Hidden Permissions
ABSTRACT. Data theft is a major security threat for mobile app users. The growing importance of digitization drives the diversity of available applications, which makes conventional screening mechanisms largely ineffective, specifically for Android smartphones. In this paper, we propose a novel and lightweight method for classifying Android apps into low, medium, and high risk categories. Our approach relies largely on the other permissions (also called hidden permissions) of Android applications, which are available only on the application's official page in the Google Play Store. We propose a linear regression-based technique to classify the apps into different risk categories and show how other permissions can be used as a strong indicator for defining risk categories. We use K-means clustering to validate and explain the decisions of our method. In an evaluation with 500 applications and 101 other permissions, our approach determines the risk factor of each app and provides, for each detection, an explanation that reveals relevant properties of the detected risk.
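The classification step can be sketched in two parts:

- A weighted sum of permission indicators is mapped to risk bands.
- A one-dimensional k-means over the scores serves as a sanity check on the bands.

The weights stand in for the coefficients of a fitted linear regression. The weights, thresholds, permission names, and helper names are all hypothetical, not values learned in the paper.

```python
# Hypothetical weights standing in for fitted linear-regression coefficients.
WEIGHTS = {"READ_CONTACTS": 0.9, "ACCESS_FINE_LOCATION": 0.8, "CAMERA": 0.6,
           "INTERNET": 0.1, "VIBRATE": 0.0}

def risk_score(permissions):
    """Linear score: sum of the weights of the permissions an app requests."""
    return sum(WEIGHTS.get(p, 0.0) for p in permissions)

def risk_category(score, low=0.5, high=1.5):  # hypothetical band thresholds
    if score < low:
        return "low"
    return "medium" if score < high else "high"

def kmeans_1d(values, k=3, iterations=20):
    """Cluster scalar scores; returns the index of the nearest final centroid per value."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]

print(risk_category(risk_score(["CAMERA", "INTERNET"])))  # prints medium
```

If the k-means clusters line up with the threshold bands, the bands reflect natural groupings in the scores rather than arbitrary cut-offs.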
Light-weight Active Security for Detecting DDoS Attacks in Containerised ICPS
ABSTRACT. Containerisation technologies like Docker provide unparalleled flexibility in deploying software. In Industrial Cyber-Physical Systems (ICPS), containerisation promises high scalability, reconfigurability, and dependability. Denial of Service (DoS/DDoS) is a significant security threat in containerised ICPS applications. These applications execute on resource-constrained computers such as PLCs, which cannot support traditional security mechanisms like firewalls that sacrifice performance and throughput.
We propose a novel, light-weight active security approach to detecting DoS/DDoS attacks through frequency analysis of network traffic (packets). Our approach identifies attacks by recording a frequency signature of the flow of packets in an ICPS under normal operation. An attack is then modelled as any anomaly in the network that modifies the frequency profile of network traffic in the ICPS. Our prototype implementation and evaluation show that this active security method is light-weight and suitable for resource-constrained ICPS platforms.
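A frequency signature of this kind can be sketched in three steps:

1. Bin packet arrivals into fixed time slots.
2. Take the magnitude of the discrete Fourier transform of the per-slot counts.
3. Flag a window whose spectrum lies too far from the baseline recorded under normal operation.

The naive DFT, the Euclidean distance, the threshold, and the traffic pattern are assumptions made for illustration. The paper's signature and anomaly criterion may differ.

```python
import cmath
import math

def spectrum(counts):
    """Normalized DFT magnitudes (bins 0..n/2) of a packet-count time series."""
    n = len(counts)
    return [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(counts))) / n
        for k in range(n // 2 + 1)
    ]

def spectral_distance(a, b):
    """Euclidean distance between two magnitude spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_attack(window, baseline, threshold):
    """Flag a window whose frequency profile deviates from the baseline signature."""
    return spectral_distance(spectrum(window), baseline) > threshold

# Normal operation: a PLC polled every 4 time slots (hypothetical pattern).
normal = [10 if t % 4 == 0 else 2 for t in range(32)]
baseline = spectrum(normal)
flood = [c + 50 for c in normal]  # a flood adds a large, sustained packet rate
print(is_attack(normal, baseline, threshold=5.0))  # prints False
print(is_attack(flood, baseline, threshold=5.0))   # prints True
```

The naive DFT is O(n^2) and is used only for readability. On a PLC, an FFT over a short window, or a few Goertzel filters at the expected polling frequencies, would keep the cost low.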
Detection of Induced False Negatives in Malware Samples
ABSTRACT. Malware detection is an important area of cyber security. Computer systems rely on malware detection applications to prevent malware attacks from succeeding. Malware detection is not a straightforward task, as new variants of malware are generated at an increasing rate. Machine learning has been utilised to generate predictive classification models that identify new malware variants which conventional malware detection methods may not detect. Machine learning has, however, been found to be vulnerable to different types of adversarial attacks, in which an attacker is able to negatively affect the classification ability of the ML model. Several defensive measures against adversarial poisoning attacks have been developed, but they often rely on a trusted clean dataset to help identify and remove adversarial examples from the training dataset. The defence in this paper does not require a trusted clean dataset. Instead, it identifies intentional false negatives (zero-day malware classified as benign) at the testing stage by examining the activation weights of the ML model. The defence was able to identify 94.07% of the successful targeted poisoning attacks.
A Novel Intrusion Detection Model for Class-imbalanced Learning Based on SMOTE and Attention Mechanism
ABSTRACT. With the rapid development of the Internet of Things, the continuous emergence of network attacks has brought great threats to network security. Intrusion Detection Systems (IDS) can identify malicious network attacks and have become a powerful tool to ensure network security. Many methods based on deep learning have been applied in intrusion detection systems. However, most of these studies ignore the imbalance of network traffic, even though the focus of intrusion detection is to find a small number of attack samples. Therefore, they have low accuracy in classifying network attack samples, which are far fewer than normal traffic samples. In this article, we establish an intrusion detection model, SE-DAS (SMOTE and Edited Nearest Neighbours with Dual Attention SRU), which uses the SE algorithm to balance the minority samples in network intrusion detection. Specifically, we use a feature attention mechanism to analyze the relationship between historical information and input features and to extract important features. A timing attention mechanism is used to independently select historical information at key time points in the SRU (Simple Recurrent Units) network, improving the stability of the model's detection efficiency. The experimental results on the UNSW-NB15 dataset show that, on minority categories, the model's macro-average area under the ROC curve is 0.037 higher than with the original SMOTE algorithm. The recall rate reaches 98.65%, which is better than that of similar deep learning models.
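Plain SMOTE illustrates the oversampling half of the SE step. Each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The Edited Nearest Neighbours cleaning step is omitted. The values of `k` and the seed, and the function name, are assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote(minority, 5)))  # prints 5
```

Because new points are interpolated rather than copied, the classifier sees a denser minority region instead of repeated duplicates. The ENN step then removes samples whose neighbours mostly belong to the other class.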
Data Storage in the Multi-Cloud: Data Splitting Leveraging on Existing Data
ABSTRACT. Data splitting aims to preserve privacy by partitioning data into fragments that are stored in various cloud storage locations and can therefore be shared across the multi-cloud. It offers an advantage over methods that rely purely on cryptography because it allows data to be stored in the clear and thus supports most data operations. However, most existing data splitting techniques do not consider data that has already been stored or already exists in the multi-cloud. This leads to unnecessary use of resources to re-split data into data fragments that are readily available in the multi-cloud. This work proposes a data splitting framework that leverages existing data in the multi-cloud. The framework improves data splitting mechanisms by reducing the number of operations required to split data and the number of resulting data fragments. This, in turn, reduces the number of cloud storage locations managed by a data owner. The framework searches for third-party data fragments that already exist in the multi-cloud to avoid costly operations, while the other data fragments are outsourced for storage. This work examines the considerations for employing third-party data fragments in data splitting. An analysis was conducted on the applicability of the proposed framework to existing data splitting techniques. The proposed framework was applied to an existing data splitting mechanism to complement its capabilities.
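The reuse step can be sketched as a lookup. Each planned fragment is fingerprinted. Fragments whose fingerprint already appears in an index of third-party fragments are referenced rather than stored again, and only the rest are outsourced. Exact matching by SHA-256 fingerprint, the index structure, and the function names are assumptions. The framework's search procedure and its privacy considerations for third-party fragments are not modelled.

```python
import hashlib

def fingerprint(fragment: tuple) -> str:
    """Fingerprint a fragment (here, a tuple of attribute name and value)."""
    return hashlib.sha256(repr(fragment).encode("utf-8")).hexdigest()

def plan_storage(fragments, existing_index):
    """Split fragments into those that can reuse an existing copy and those to upload."""
    reuse, upload = [], []
    for fragment in fragments:
        if fingerprint(fragment) in existing_index:
            reuse.append(fragment)
        else:
            upload.append(fragment)
    return reuse, upload

index = {fingerprint(("city", "Oslo"))}
reuse, upload = plan_storage([("city", "Oslo"), ("name", "Ann")], index)
print(reuse, upload)  # prints [('city', 'Oslo')] [('name', 'Ann')]
```

Every reused fragment is one fewer upload and possibly one fewer storage location to manage. That saving is the one the abstract attributes to the framework.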