PST2021: 18TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY, AND TRUST 2021
PROGRAM FOR MONDAY, DECEMBER 13TH
Days:
next day
all days

View: session overviewtalk overview

10:15-10:30Short Break
13:50-14:00Short Break
14:00-15:20 Session 1
14:00
EPSim-GS: Efficient and Privacy-Preserving Similarity Range Query over Genomic Sequences

ABSTRACT. Similarity query over genomic sequences has played a significant role in personalized medicine and has applications in various fields, including DNA alignment and genomic sequencing. Since handling genomic sequences requires massive storage and considerable computational capacity, service providers prefer to process similarity queries over genomic sequences on cloud servers rather than at client sides. Due to the sensitivity of genomic sequences, preserving the privacy of queries has attracted considerable attention, and as a result, genomic sequences demand to be outsourced in an encrypted form. Although many schemes have been proposed for similarity queries over encrypted genomic data, they are either inefficient or have limitations in supporting the dynamic update of the dataset. To address the challenge, we propose an efficient and privacy-preserving similarity range query scheme, namely EPSim-GS. First, we introduce how to build a hash table to index the dataset, and present a similarity range query algorithm based on the hash table. Then, we design two cloud-based privacy-preserving protocols based on the Paillier cryptosystem to support the similarity range query algorithm over the encrypted dataset. After that, we propose the privacy-preserving similarity range query scheme (EPSim-GS) by leveraging the two privacy-preserving protocols. We then analyze the security of EPSim-GS and prove that it is privacy-preserving. Finally, we perform experiments to evaluate the scheme's performance, and the results indicate that it is computationally efficient.

14:16
Balancing Efficiency and Security for Network Access Control in Space-Air-Ground Integrated Networks

ABSTRACT. In this paper, we investigate the efficiency of network access control with the co-existence of multiple network operators and propose an efficient and secure network access control architecture (ESNAC) that offers fast identity authentication and access authorization in space-air-ground integrated networks. The major challenge lies in enabling multiple independent network operators to authorize and authenticate mobile users for network access in a secure and efficient way, even they are not mutually trusted. To address this challenge, we introduce a new aggregate anonymous credential mechanism to enable a mobile user to present network access authorization of a group of network operators based on the consolidated anonymous credential that is aggregated from the partial anonymous credentials of the network operators. In addition, the efficient authentication of packet delivery is provided based on a sequential aggregate signature that enables each network operator to sign network packets for authentication and sequentially aggregate signatures for communication efficiency. Finally, we discuss the desired security properties of ESNAC and demonstrated its computational and communication efficiency by comparing with the conventional scheme without aggregation.

14:32
PUPy: A Generalized, Optimistic Context Detection Framework for Implicit Authentication

ABSTRACT. All modern smart devices like smartphones and laptops employ some form of authentication to ensure that access to confidential data by the wrong person is avoided. Implicit authentication aims to limit the amount of explicit authentications that are necessary for the user, using passive approaches to authenticate the user instead. Context detection frameworks aim to reduce explicit authentications by disabling explicit authentication entirely when appropriate. Since explicit and implicit authentication are not mutually exclusive, we can also use context detection frameworks to decide whether explicit or implicit authentication should be used when authentication is required.

We present a novel context detection framework, PUPy, that uses sensed context data to infer and make available three values---privacy, unfamiliarity, and proximity---allowing clients of our framework, like authentication services, to better adapt to different contexts. As opposed to existing work, our context detection framework is based on an optimistic approach to context detection. Our assumption is that the absence of data, like the inability to detect nearby people or devices, can be taken as a sign that a context is safe. Such an optimistic approach may provide less security than a pessimistic approach, but provides a significantly improved user experience due to reducing the number of explicit authentications.

We provide an Android implementation of the framework, including an API that allows other developers to contribute modules to the system. Finally, we conduct statistical analysis of our framework based on a large real-world dataset. We find that PUPy compares favourably to existing works, permitting a 77.2\% reduction in the number of explicit authentications.

14:48
Using CGAN to Deal with Class Imbalance and Small Sample Size in Cybersecurity Problems

ABSTRACT. Predictive modelling in cybersecurity domains usually involves dealing with complex settings. The class imbalance problem is a well-know challenge typically present in the cybersecurity domain. For instance, in a real-world intrusion detection scenario, the number of attacks is expected to be a a very small percentage of the normal cases. Moreover, in these applications, the number of available examples labelled is also small due to the complexity and cost of the labelling process: teams of domain experts need to be involved in the process which becomes expensive, time consuming and prone to errors. To address these problems is critical to the success of predictive modelling in cybersecurity applications.

In this paper we tackle the class imbalance and small sample size through the use of a CGAN-based up-sampling procedure. We carry out an extensive set of experiments that show the positive impact of applying this solution to address the class imbalance and small sample size problems. A large data repository is built and freely provided to the research community containing 114 binary datasets based on real-world cybersecurity problems that are generated with diversified levels of imbalance and sample size. Our experiments show a clear advantage of using the CGAN-based up-sampling method specially for situations where the sample size is small and there is a large imbalance between the problem classes. In the most critical scenarios associated with extreme rarity and very small sample size, an impressive performance boost is achieved. We also explore the behaviour of this approach when the presence of these problems is less marked and we found that, while CGAN-based up-sampling is not able to further improve the minority class performance, it also has no negative impact. Thus, it is a safe to use solution, also in these scenarios.

15:04
Updatable Linear Map Commitments and Their Applications in Elementary Databases

ABSTRACT. Linear map commitments allow the prover to commit to a vector in an l-dimensional space over a finite field, with the ability to open the result of an arbitrary linear map from an l-dimensional space to a q-dimensional space over a finite field. In this paper, we propose updatable feature and perfectly hiding property for linear map commitments. Updatable feature means the prover can update the commitment more efficiently than recompute the commitment when some of the entries in the committed vector are changed. Perfectly hiding property ensures the commitment reveals no information about the committed vector. Then we present the implementation of our updatable linear map commitment (ULMC) on R-ate pairing defined over the 256-bit BN curve recommended in SM9 standard, which provides around 100-bit security. Our implementation shows that the ULMC’s are efficient enough to support the elementary database constructions that simultaneously support batching membership test, linear combination test, updatable feature and authenticity. Finally, we show that the ULMC-powered elementary databases are capable of supporting various applications where privacy and trust are the first priority such as exam result management systems, Internet of Things (IoT) management systems and empowering the business between banks and enterprises.

15:20-15:30Short Break
15:30-16:50 Session 2
15:30
The Race-Timing Prototype

ABSTRACT. The disclosure of transient execution attacks, such as Meltdown and Spectre, has once again highlighted the threat of cache side-channel exploitation, specially as an exploitable hardware component capable of leaking sensitive user data. The existing Prime+Probe and Flush+Reload techniques exploit this side-channel by closely monitoring cache hits and misses, achiev- able by utilizing the processor’s cycle counter as a timer to mea- sure the latency of memory accesses. From the manufacturer’s perspective, reducing the cycle counter’s resolution appears to be a suitable countermeasure against the aforementioned techniques, and AMD now implements akin modifications in their Zen micro-architecture. Yet, this paper introduces The Race-Timing Prototype, in the x86-64 architecture, to prove that a devoted adversaries can effectively distinguish cache hits from misses by only employing carefully-crafted memory race conditions. Furthermore, in view that our prototype does not rely on any proprietary instruction set extension, we demonstrate that our design works on both Intel and AMD processors. Additionally, with propose a new semantic of Out-of-Order Detachment that further allows our Race-Timing measurements to closely match the accuracy of hardware Performance Monitoring Counters when distinguishing fast L1D and L2 cache accesses. On the whole, this work demonstrates that an adversary can, without utilizing cycle counters at all, exploit a cache side-channel with high-precision timing.

15:46
Securing Critical Infrastructure Through Innovative Use Of Merged Hierarchical Deep Neural Networks

ABSTRACT. Multi-clouds are becoming central to large and modern applications including those in business, industry and critical infrastructure sectors. Designers usually deploy these clouds hierarchically to get the best advantage of low latency of the edge clouds and high processing capabilities of the core clouds. The data that flows into the clouds for processing and storage and moves out to other system domains for further use must cross multiple trust boundaries, and as a result, face large attack surfaces. This gives malicious actors abundant opportunity to penetrate and potentially harm organizations or bring down critical services causing widespread disruptions and mayhem. Deep neural network models can be used in innovative ways to protect the confidentiality and integrity of dataflows in the clouds. However, the use of deep learning comes with some challenges. In large multi-location and multi-cloud environments, deep learning models grow rapidly in size and complexity, inhibiting fast training of cloud models and making it difficult to maintain accuracy of detection of known and unknown attacks on the data-in-motion. This impedes their use in critical infrastructure services. We propose innovative distributed-hierarchical-merged models, which make use of cooperative training at the edge and the core clouds, and the power of data and model parallelisms, to achieve rapid training with high accuracy. Our broad objectives in this paper are twofold: Firstly, we show that merged hierarchical deep learning models, working cooperatively in the multi-cloud, significantly reduce the parameters to be trained and results in faster core cloud training time. Secondly, training the merged core model with distribute strategy for data parallelism on CPUs and GPUs further reduces the training time considerably. We also show that while achieving improvement by 15-20% over unmerged models and the accuracy of detection of unknown attacks in the range 96.9-99.5%.

16:02
Towards Change Detection in Privacy Policies with Natural Language Processing

ABSTRACT. Privacy policies notify users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents. Due to the complicated nature of these documents, it gets even harder to understand and take note of any changes of interest or concern when the policies are changed or revised. With advances in machine learning and natural language processing, tools that can automatically annotate sentences of policies have been developed. These annotations can help a user identify and understand relevant parts of a privacy policy. In this paper, we present our attempt to further such annotations by also detecting the important changes that occurred across sentences. Using supervised machine learning models, word-embedding, similarity matching, and structural analysis of sentences, we present a process that takes two different versions of a privacy policy as input, matches the sentences of one version to another based on semantic similarity, and identifies relevant changes between two matched sentences. We present the results and insights of applying our approach on 81 privacy policies manually downloaded from Facebook, WhatsApp, Twitter, Google, LinkedIn and Snapchat, ranging between the period of 1999 to 2020.

16:18
Measurement of Local Differential Privacy Techniques for IoT-based Streaming Data

ABSTRACT. Various Internet of Things (IoT) devices generate complex, dynamically changed, and infinite data streams. Adversaries can cause harm if they can access the user's sensitive raw streaming data. For this reason, protecting the privacy of the data streams is crucial. In this paper, we explore local differential privacy techniques for streaming data. We compare the techniques and report the advantages and limitations. We also present the effect on component (e.g., smoother, perturber) variations of distribution-based local differential privacy. We find that combining distribution-based noise during perturbation provides more flexibility to the interested entity.

16:34
TEE-based Selective Testing of Local Workers in Federated Learning Systems

ABSTRACT. This paper considers a federated learning system composed of a central coordinating server and multiple distributed local workers, all having access to trusted execution environments (TEEs).In order to ensure that the untrusted workers correctly perform local learning, we propose a new TEE-based approach that also combines techniques from applied cryptography, smart contract and game theory. Theoretical analysis and implementation-based evaluations show that, the proposed approach is secure, efficient and practical.

16:50-17:00Short Break
17:00-18:20 Session 3
17:00
IoT Malware Detection Using Function Call Graph Embedding

ABSTRACT. In the era of rapid network development, IoT devices are being deployed more and more widely, and various kinds of malware are gradually appearing with the deployment level. Among the existing static analysis, structure-based analysis such as graph embedding solutions could retrieve the semantic features of targeted binary, thereby receiving lots of attentions. However, when applying the well-known graph embedding solutions on malware analysis, not only the structure of binary, but also the relationship between neighboring nodes should be taken into consideration to achieve a more robust solution. In this paper, we propose a novel IoT malware detection based on Function-Call Graph (FCG) embedding, where caller-callee relationship is considered as important features for analysis. The performance of the proposed method is evaluated on a large- scale dataset consisting of 112K malware and 89k benignware samples collected from seven CPU architectures. The method shows a 99% accuracy on IoT malware detection, outperforms the existing graph embedding solutions. Moreover, when CPU architecture is taken into consideration, the proposed method with support vector machine and multilayer perception classifier can yield even higher performance.

17:16
Intrusion Detection in Internet of Things using Convolutional Neural Networks

ABSTRACT. Internet of Things (IoT) has become a popular paradigm to fulfil needs of the industry such as asset tracking, resource monitoring and automation. As security mechanisms are often neglected during the deployment of IoT devices, they are more easily attacked by complicated and large volume intrusion attacks using advanced techniques. Artificial Intelligence (AI) has been used by the cyber security community in the past decade to automatically identify such attacks. However, deep learning methods have yet to be extensively explored for Intrusion Detection Systems (IDS) specifically for IoT. Most recent works are based on time sequential models like LSTM and there is short of research in CNNs as they are not naturally suited for this problem. In this article, we propose a novel solution to the intrusion attacks against IoT devices using CNNs. The data is encoded that facilitates the convolutional operations to capture the patterns from the sensors data along time that are useful for attacks detection by CNNs. The proposed method is integrated with two classical CNNs: ResNet and EfficientNet, where the detection performance is evaluated. The experimental results show significant improvement in both true positive rate and false positive rate compared to the baseline using LSTM.

17:32
Clustering based opcode graph generation for malware variant detection

ABSTRACT. Malwares are the key means leveraged by threat actors in the cyber space for their attacks. There is a large array of commercial solutions in the market and significant scientific research to tackle the challenge of the detection and defense against malwares. At the same time, attackers also advance their capabilities in creating polymorphic and metamorphic malwares to make it increasingly challenging for existing solutions. To tackle this issue, we propose a methodology to perform malware detection and family attribution. The proposed methodology first performs the extraction of opcodes from malwares in each family and constructs their respective opcode graphs. We explore the use of clustering algorithms on the opcode graphs to detect clusters of malwares within the same malware family. Such clusters can be seen as belonging to different sub-family groups. Opcode graph signatures are built from each detected cluster. Hence, for each malware family, a group of signatures is generated to represent the family. These signatures are used to classify an unknown sample as benign or belonging to one the malware families. We evaluate our methodology by performing experiments on a dataset consisting of both benign files and malware samples belonging to a number of different malware families and comparing the results to existing approach.

17:48
Gaining Location Privacy from Service Flexibility: A Bayesian Game Theoretic Approach

ABSTRACT. When using location-based services (LBSs), a user obtains points-of-interest (PoI) information by providing the LBS platform with his current geo-location. Such a search also leads to potential privacy leakage if an adversary has access to his geo-data. Traditional k-anonymity mechanisms instruct a user to bear the overhead to report his current location together with k-1 dummy locations to confuse the adversary, which only work well given a large number k. Aware of the common practices that a user is actually flexible in service requirement (e.g., as long as the searched PoIs are within his walking distance), we propose a novel approach to help the user gain location privacy from service flexibility for the case of a small number k. By analyzing the strategic interaction between the user and the adversary in a Bayesian game, we prove that the user with service flexibility should never report his real location for searching PoIs nearby. Instead, he should jointly use all k dummy locations to confuse the adversary's inference of his real location. Take k=2 for example, we manage to show that if the adversary is not likely to access both dummy geo-data, the user should report the two dummy locations at two opposite directions of his real location, and otherwise at the same direction. Perhaps surprisingly, our approach may enable the user to benefit from the adversary's access to more geo-data. Finally, extensive simulations using some real data show that our mechanism obviously outperforms k-anonymity mechanism especially under a small number k.

18:04
Searching on Non-Systematic Erasure Codes

ABSTRACT. Non systematic erasure codes provide confidentiality of data and can even provide strong security guarantees with appropriate parameter selection. So far however, they have lacked an elegant and efficient method for keyword search over the codes. While the obvious method of reconstruction before search can be too slow, direct search produces inaccurate results. This has been one of the barriers in the wider adoption of non systematic erasure codes in distributed and secure storage. In this paper we present an elegant solution to this problem by building an index data structure that we call the Search Vector, created from the generator matrix of the erasure code. We show that this method introduces a very small amount of delay in return of a high degree of accuracy in search results. We also analyse the security of the scheme in terms of information leakage and show that the information leaked from the index data structure is very small even when one assumes the worst case scenario of the attacker having access to the Search Vector.