IWSPA 2020: 6TH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS 2020
PROGRAM FOR WEDNESDAY, MARCH 18TH


09:15-10:15 Session 1: Security Metrics and APT

IWSPA Keynote

09:15
Security Metrics and Risk Analysis for Enterprise Systems

ABSTRACT. Protection of enterprise systems from cyber attacks is a challenge. Vulnerabilities are regularly discovered in software systems that are exploited to launch cyber attacks. Security analysts need objective metrics to manage the security risk of an enterprise system. In this talk, we will give an overview of our research on security metrics and challenges for security risk analysis of enterprise systems. A standard model for security metrics will enable us to answer questions such as “are we more secure than yesterday” or “how does the security of one system compare with another?” We will present a methodology for security risk analysis that is based on the model of Attack Graphs and the Common Vulnerability Scoring System (CVSS). We will also present a framework for detection and forensic analysis of Advanced Persistent Threats.

10:45-12:15 Session 2: Multi-modal Data Analysis for Security

Full Paper Session 1

10:45
Mitigating File-Injection Attacks with Natural Language Processing

ABSTRACT. Searchable Encryption can bridge the gap between privacy protection and data utilization. As it leaks access patterns to attain practical search performance, it is vulnerable under advanced attacks. While these advanced attacks show significant privacy leakage, the assumptions of these attacks are often strong and the methods that can be used to mitigate these attacks are limited. In this paper, we investigate one of these advanced attacks, referred to as file-injection attacks, and examine whether this attack is viable in practice. In addition, we also propose a defense method to mitigate file-injection attacks. By leveraging natural language processing, we formulate the generation of injected files in the attack as an automated text generation problem with restrictions on word selection, and then we tackle the problem with n-gram and Recursive Neural Networks. We formulate the defense as a semantic analysis problem, in which we extract linguistic features and address the problem using machine learning. Our experimental results on real-world datasets suggest two interesting observations. First, automatically generating injected files in the attack will result in files with low semantics. Second, it is viable to automatically detect injected files based on semantics and mitigate file-injection attacks.

11:15
Towards Image-Based Dark Vendor Profiling: An Analysis of Image Metadata and Image Hashing in Dark Web Marketplaces

ABSTRACT. Anonymity networks, such as Tor, facilitate the hosting of hidden online marketplaces where dark vendors are able to anonymously trade paraphernalia such as drugs, weapons, and hacking services. Effective dark marketplace analysis and dark vendor profiling techniques support dark web investigations and help to identify and locate these perpetrators. Existing automated techniques are text-based, leaving non-textual artifacts, such as images, out of consideration. Though image data can further improve investigative analysis, there are two primary challenges associated with dark web image analysis: (a) ethical concerns over the presence of child exploitation imagery in illegal markets, and (b) the computational overhead needed to download, analyze, and store image content. In this research, we investigate and address the aforementioned challenges to enable dark marketplace image analysis. Namely, we examine image metadata and explore several image hashing techniques to represent image content, allowing us to collect image-based intelligence and identify reused images among dark marketplaces while preventing exposure to illegal content and decreasing computational overhead. Our study reveals approximately 75% of dark marketplace listings include image data, indicating the importance of considering image content for investigative analysis. Additionally, 2% of considered images were found to contain metadata and approximately 50% of image hashes were repeated among marketplace listings, suggesting the presence of easily obtainable incriminating evidence and frequency of image reuse among dark vendors. Finally, through an image hash analysis, we demonstrate the effectiveness of using image hashing to identify similar images between dark marketplaces.
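
As a rough illustration of how image hashing can flag reused images without anyone viewing them, the sketch below computes a simple average hash and compares hashes by Hamming distance. It is not the method from the paper, whose hashing techniques the abstract does not name. The function names, the listing names, the 4x4 grayscale grids (standing in for images that have already been downscaled), and the distance threshold are all assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Grid = List[List[int]]  # grayscale pixel values, assumed already downscaled


def average_hash(pixels: Grid) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_reused(hashes: Dict[str, int], max_distance: int = 2) -> List[Tuple[str, str]]:
    """Return pairs of listings whose hashes differ in at most max_distance bits."""
    return [
        (first, second)
        for first, second in combinations(sorted(hashes), 2)
        if hamming_distance(hashes[first], hashes[second]) <= max_distance
    ]


# Hypothetical listings: b is a slightly brightened copy of a, c is unrelated.
listing_a = [[10, 10, 200, 200], [10, 10, 200, 200],
             [200, 200, 10, 10], [200, 200, 10, 10]]
listing_b = [[value + 5 for value in row] for row in listing_a]
listing_c = [[200, 10, 200, 10] for _ in range(4)]

hashes = {
    "market1/listing_a": average_hash(listing_a),
    "market2/listing_b": average_hash(listing_b),
    "market3/listing_c": average_hash(listing_c),
}
print(find_reused(hashes))
# prints [('market1/listing_a', 'market2/listing_b')]
```

Only the integer hashes need to be stored and compared, which is one way the storage and exposure concerns mentioned in the abstract could be reduced. A real pipeline would likely rely on an established perceptual hashing library rather than this toy version.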

11:45
Towards Automatic Detection and Explanation of Hate Speech and Offensive Language

ABSTRACT. The use of hate speech and offensive language online has become widely recognized as a critical social problem plaguing today's Internet users. Previous research in the detection of hate speech and offensive language has primarily focused on using machine learning approaches to naively detect hate and offensive language, without explaining the reasons for their detection. In this work, we introduce a novel hate speech and offensive language defense system called HateDefender, which consists of a detection model based on deep Long Short-term Memory (LSTM) neural networks and an explanation model based on the gating signals of LSTMs. HateDefender effectively detects hate speech and offensive language (average accuracy of 90.82% and 89.10% on hate and offensive language, respectively) and explains their factors by pinpointing the exact words that are responsible for causing them. Our system uses these explanations for the effective intervention of such incidents online.

13:30-14:00 Session 3: Tutorial

IWSPA 2020 Tutorial on Adversarial Machine Learning

13:30
Adversarial Machine Learning for Text Data

ABSTRACT. In this tutorial, we investigate the history, evolution and latest research topics in the area of adversarial machine learning for text data. Both classical attacks on spam filters and more recent attacks on deep learning models for text classification problems will be discussed. We then discuss proposed and potential defenses against these attacks. We conclude with some directions for future research.

14:00-15:30 Session 4: Phishing and APT

Full Paper Session 2

14:00
Diverse Datasets and a Customizable Benchmarking Framework for Phishing

ABSTRACT. Phishing is a serious challenge that remains largely unsolved despite the efforts of many researchers. In this Dataset/Tool description paper, we present: (i) our work on creating high quality, diverse and representative email and URL/website datasets for phishing and making them publicly available, and (ii) our benchmarking framework, PhishBench, which automates the extraction of more than 200 features, more than 30 classifiers, and 12 evaluation metrics, for detection of phishing emails, websites and URLs. Using PhishBench, the research community can easily run their models and benchmark their work against the work of others, who have used common dataset sources for emails (Nazario, SpamAssassin, WikiLeaks, etc.) and URLs (PhishTank, APWG, Alexa, etc.).

14:30
Automatic Recognition of Advanced Persistent Threat Tactics for Enterprise Security

ABSTRACT. Advanced Persistent Threats (APTs) have become a concern for many enterprise networks. APTs can remain undetected for a long time and lead to undesirable consequences such as theft of sensitive data and disrupted workflows. To achieve their goals, attackers usually leverage specific tactics that utilize a variety of techniques. This paper explores the recognition of APT tactics through synthesized analysis and correlation of data from various sources. We propose a framework for detecting APT tactics and discuss the application of machine learning techniques to this problem. Our framework can be used by security analysts for effective detection of APT attacks. The evaluation of our approach shows that it can detect APT tactics with high accuracy and a low false positive rate. Therefore, it can be used for tactic-centric APT detection and effective implementation of cyber security response operations.

15:00
Phish-GAN: Generating Phishing Attacks Using Feature-Oriented Adversarial Deep Neural Networks

ABSTRACT. The URL features of websites have frequently been used in building phishing detection techniques, and machine learning is the most widely used approach to identify anomalous patterns in URL features as signs of possible phishing. However, adversaries may have enough knowledge and motivation to bypass URL classification algorithms by creating examples that evade those classifiers. This paper proposes a feature-oriented approach that generates URL-based phishing examples using Generative Adversarial Networks. The created examples can fool black-box machine learning-based phishing detectors, even when those detectors are created using sophisticated approaches, such as those relying on intra-URL similarities. We tested the approach using real phishing datasets, and we also considered feature importance when generating adversarial examples. The results show that GANs are very effective in creating adversarial phishing examples that can fool both simple and sophisticated machine learning phishing detection models.

16:00-16:40 Session 5: Privacy and Attacker Behavior

Short Paper Session

16:00
Privacy-preserving SVM on Outsourced Genomic Data via Secure Multi-party Computation

ABSTRACT. Machine learning methods are employed in many areas, such as medical data research, for their efficient and powerful data mining ability. However, submitting unprotected data to a third party that trains a machine learning model risks data leakage and privacy violation if the third party is compromised by an adversary. Hence, designing a protocol to execute encrypted computation is indispensable. To address this problem, we propose protocols based on secure multi-party computation to train a support vector machine model privately. Utilizing the semi-honest adversary model and oblivious transfer, the proposed protocols enable the training of a non-linear support vector machine on the combined data from various sources without sacrificing the privacy of individuals. The protocols are applied to train a support vector machine model with the radial basis function kernel on HIV sequence data to predict the efficacy of a certain antiviral drug, which works only if the viruses can use only the human CCR5 coreceptor for cell entry. Benchmarked on synthesized data from 10 data sources, each containing 100 labeled samples of randomly generated integers, the protocol consumed 2991.386/166.912 ms of online time on average in arithmetic/boolean circuits, respectively. Cross-validation reached an average F1-score of 0.5819 on the training data with the optimized parameters, which subsequently reached an F1-score of 0.7058 on the testing data set, which consists of protein sequences of CCR5 and its subtypes. The complete training and testing process on the real data, which contains a total of 766 samples with 924 features after encoding, took 43.75/15.84 seconds on average using arithmetic/boolean circuits, respectively, which shows the effectiveness and efficiency of our protocols compared to some of the existing studies in the literature.

16:30
Dissecting Cyberadversarial Intrusion Stages via Interdisciplinary Observations

ABSTRACT. Advanced Persistent Threats (APTs) are professional, sophisticated threats that pose a serious concern to our technologically dependent society. As these threats become more common, conventional response-driven cyberattack management needs to be replaced with anticipatory defense measures. Understanding adversarial behavior and movement is critical to improving our ability to defend proactively. This paper focuses on understanding adversarial movement and adaptation using a case study from a real-time cybersecurity exercise. Through multidisciplinary methodologies from social and hard sciences, this paper presents a mechanism to dissect cyberadversarial intrusion chains to unpack movement and adaptations.