IEEE ISCC 2020: IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS 2020
PROGRAM FOR FRIDAY, JULY 10TH

09:00-10:30 Session 11: Security Session IV
  • Privacy / Anonymity
  • Attacks and Defenses
  • Authentication, Authorization and Accounting
  • Hardware Security
  • Intrusion Detection
  • Moving Target Defense (MTD)
  • Blockchain
09:00
MalFinder: An Ensemble Learning-based Framework For Malicious Traffic Detection
PRESENTER: Candong Rong

ABSTRACT. Malicious events pose a significant threat to the current increasingly interconnected Internet community. Detection based on features of network traffic and machine learning algorithms is a common approach to identify malicious events. The performance of such approaches depends on the features and algorithms used. In this paper, we propose MalFinder, an ensemble learning-based framework for malicious traffic detection. Considering the trend of network traffic encryption and the complexity of decrypting traffic, we utilize statistical features and sequence features to describe network traffic. We extend the dimensions of these two types of features to enhance their capability for representing traffic data. Feature importance analysis and contrast experiments illustrate the effectiveness of our new features. Among our selected classifiers suitable for malicious traffic detection, boosting-based classifiers XGBoost and LightGBM can reduce bias, and bagging-based classifier Random Forest can reduce variance. Stacking, which is the integration method of the classification results used in our framework, can improve the generalization ability of the method. MalFinder achieves 96.58% F-measure and 95.44% accuracy in the malicious traffic detection task on a real-world dataset, results that are better than those of the comparison methods. In terms of unseen malicious traffic discovery, MalFinder still provides good performance with 93.46% F-measure and 91.04% accuracy, which even surpasses the results of the other comparison methods on the task of known malicious traffic detection. Given the scarcity of public datasets for malicious traffic detection, we have released our self-built dataset to support further research.

09:15
Power Range: Forward Private Multi-Client Symmetric Searchable Encryption with Range Queries Support

ABSTRACT. Symmetric Searchable Encryption (SSE) is an encryption technique that allows users to search directly over their outsourced encrypted data while preserving the privacy of both the files and the queries. In this paper, we present Power Range, a dynamic SSE scheme (DSSE) that supports range queries in the multi-client model. We prove that our construction captures the crucial notion of forward privacy in the sense that additions and deletions of files do not reveal any information about the content of past queries. Finally, to deal with the problem of synchronization in the multi-client model, we exploit the functionality offered by Trusted Execution Environments and Intel’s SGX.

09:30
Efficient and Secure Hash Function Scheme Based on RC4 Stream Cipher

ABSTRACT. Existing message authentication algorithms require a high number of rounds, whether they are keyed hash functions like Hash-based Message Authentication Code (HMAC) or block cipher based functions like Cipher-based Message Authentication Code (CMAC) and Galois Message Authentication Code (GMAC). Moreover, the employed compression functions consist of several operations to achieve two main properties: confusion and diffusion. This large number of rounds introduces high overhead for resource-limited systems like Internet of Things (IoT) or delay-sensitive systems that have real-time requirements like Intelligent Transparent Systems. In this paper, a new lightweight message authentication algorithm is proposed to reduce the number of rounds to one. The proposed compression function is based on the RC4 stream cipher to reduce the required overhead in terms of latency and resources. Finally, the security and performance analysis shows that the proposed keyed hash function is resistant to existing security attacks with low resource overhead.

09:45
An extensible IoT Security Taxonomy

ABSTRACT. The Internet of Things (IoT) offers many opportunities for the industrial and private sector through its various functionalities. However, the omnipresent deployment of computing devices also introduces many possibilities for attackers. In order to understand the threats in the IoT environment, it is necessary to establish a common terminology. Previously proposed taxonomies are either ambiguous or cover the IoT only partially and are therefore not applicable. It is also not sufficient to use existing security taxonomies for computer systems as they do not consider the cyber-physical aspect of IoT systems. We propose a taxonomy for IoT threats that is based on the attacked layer and fundamental security principles. Our naming scheme enables an accurate and future-proof description of threats, as well as consistent future extensions. Our taxonomy enables classifying and comparing different IoT attacks with regard to violated security goals and functionalities. We validate our approach based on existing taxonomy and attack descriptions.

10:00
Flush-Detector: More Secure API Resistant to Flush-Based Spectre Attacks on ARM Cortex-A9
PRESENTER: Min He

ABSTRACT. ARM series processors, especially the Cortex-A9 MPCore processor, are increasingly used in IoT and cloud services because of their high performance and flexibility of hardware design. However, they also suffer from various types of security threats, typically flush-based cache attacks. Among these attacks, flush-based Spectre attacks (using Flush + Reload for Spectre attacks) represent a serious threat to the system. They usually induce the victim to speculatively perform operations that would not occur during the correct program execution, and then leak the victim's confidential information to the adversary via cache side channel attacks. So far, there is no widely accepted solution to defend against Spectre attacks. The proposed solutions either lead to large performance losses or sacrifice transparency. In this paper, we propose a secure flush operation API named Flush-Detector to mitigate flush-based Spectre attacks. We present the design and implementation of Flush-Detector to detect and defend against flush-based Spectre attacks on ARM Cortex-A9 MPCore. The attack experimental results show that Flush-Detector can detect flush-based Spectre attacks in real time and reduce the attack success rate to less than 1%. Moreover, performance test results demonstrate that the time consumption of the Flush-Detector API is about 17.7% longer than that of the original cache flush API.

11:00-13:00 Session 12: Tutorial : Big Sequence Management

Data series are a prevalent data type that has attracted lots of interest in recent years. Specifically, there has been an explosive interest in the analysis of large volumes of data series in many different domains, both in business (e.g., in mobile applications) and in the sciences (e.g., in biology). In this tutorial, we focus on applications that produce massive collections of data series, and we provide the necessary background on data series storage, retrieval and analytics. We look at systems historically used to handle and mine data in the form of data series, as well as at the state-of-the-art data series management systems that were recently proposed. Moreover, we discuss the need for fast similarity search for supporting data mining applications, and describe efficient similarity search techniques, indexes and query processing algorithms. Finally, we look at the gaps in modern data series management systems with regard to support for efficient complex analytics, argue in favor of the integration of summarizations and indexes in modern data series management systems, and discuss the role that deep learning techniques can play in this context. We conclude with the challenges and open research problems in this domain.

11:00
Big Sequence Management
PRESENTER: Themis Palpanas
14:00-15:30 Session 13A: Artificial Intelligence (AI) : Session III
  • Artificial Intelligent Systems applications in Computers and Communications
  • AI Technologies
  • Game Theory
  • Machine and Deep Learning of Knowledge
  • Bio-inspired Computing in Communications
  • Data Science and Data Engineering
  • Distributed Knowledge and Processing
14:00
Efficient Malware Originated Traffic Classification by Using Generative Adversarial Networks

ABSTRACT. With the boom in malware-based cyber-security incidents and the growing sophistication of attacks, previous detection methods based on malware sample analysis appear powerless because of their time-consuming and labor-intensive analysis procedures. Existing detection methods based on traffic analysis rely heavily on known traffic patterns, which hinders the detection of zero-day attacks generated by malware variants. In this paper, we propose a deep learning approach, referred to as TrafficGAN, which analyzes HTTP traffic sessions to distinguish between malware-related and normal traffic. We explore the traffic patterns of malware variants by adding noise and a category condition to generate various similar traffic. We then use a discriminative model to find the deviation between abnormal and normal traffic by extracting the essential difference. We increase the diversity of the data by generating samples adversarially, which makes the system more robust in detecting zero-day attacks. We conduct extensive experiments on a public dataset and on our own data collected for specific targets. The results demonstrate that our method outperforms other methods and reduces the susceptibility of specific targets to malware.

14:15
NeuralPot: An Industrial Honeypot Implementation Based On Convolutional Neural Networks

ABSTRACT. Honeypots are powerful security tools, which are developed to shield commercial and industrial networks from malicious activity. Honeypots act as passive and interactive decoys in a network by attracting malicious activity away from critical network devices. Given that the security incidents against industrial and critical infrastructure are getting sophisticated and persistent, advanced security systems are needed. In this paper, a novel industrial honeypot implementation is presented, which is based on the Modbus protocol, entitled NeuralPot. The presented NeuralPot honeypot is able to emulate industrial Modbus entities in order to actively confuse the intruders. It achieves this by introducing two distinct deep neural networks, a Generative Adversarial Network and an Autoencoder Network, which learn Modbus device behavior and generate realistic-looking traffic behavior. Based on the evaluation results, the proposed industrial honeypot performs well in terms of accuracy, similarity, and elapsed time of data generation.

14:30
Knowledge-Based Machine Learning Boosting for Adversarial Task Detection in Mobile Crowdsensing

ABSTRACT. Mobile Crowdsensing (MCS) leverages the Sensing as a Service paradigm to contribute to Internet of Things ecosystems through the non-dedicated sensing capabilities of smart mobile devices. The distributed and non-trusted nature of MCS systems makes them vulnerable to various threats against the devices, MCS platforms, and the participating devices that provide sensory data services. Among these threats, the submission of fake tasks may drain resources at the participating devices and clog sensing server resources at MCS platforms. In this paper, classical machine learning performance is boosted by knowledge-based methods and sequential feature selection, which are proposed for the first time against fake task submission to MCS platforms. Prior Knowledge Input and Prior Knowledge Input with Difference exploit AdaBoost and Decision Tree methods as initial accuracy to improve the accuracy of learning the legitimacy of tasks submitted to MCS platforms. Moreover, Sequential Feature Selection is implemented to investigate further improvements for the detection of task legitimacy in MCS campaigns. Intelligently selecting 5 of the 10 possible features and implementing knowledge-based methods boosts machine learning accuracy from 93.67% to 97.37% for AdaBoost, and from 92.28% to 97.58% for Decision Trees.

14:45
AWMF: All-Weighted Metric Factorization for Collaborative Ranking

ABSTRACT. This paper addresses two problems in matrix factorization: the defect of the dot product and the imbalance of the datasets. Matrix factorization is still the most widely used technology in recommendation systems. However, its dot product does not satisfy the triangle inequality, which restricts the improvement of its recommendation effect. We take inspiration from the distance factor of metric learning, and convert the determinant of user-item relevance from the size of the dot product to the distance of the metric factorization. Furthermore, the number of positive examples is much smaller than the number of negatives in most datasets. Such an unbalanced scenario affects the accuracy of recommendations. Inspired by the positive semidefinite matrix of the popular Mahalanobis distance in the field of metric learning, we fully consider the interaction information between users and items and propose the concept of an all-weighted matrix. Finally, combining the two improved techniques yields the All-Weighted Metric Factorization (AWMF) method, which is applied to the personalized ranking task. Extensive experimental results on three real-world datasets demonstrate that our method outperforms the competitive baselines on several evaluation metrics.

15:00
Reluplex made more practical: Leaky ReLU
PRESENTER: Jin Xu

ABSTRACT. In recent years, Deep Neural Networks (DNNs) have been experiencing rapid development and have been widely used in various fields. However, while DNNs have shown strong capabilities, their security problems have gradually been exposed. Therefore, a formal guarantee of neural network output is needed. Prior to the appearance of the Reluplex algorithm, the verification of DNNs was always a difficult problem. The Reluplex algorithm is designed specifically to verify DNNs with the ReLU activation function. It is an excellent and effective algorithm, but it cannot verify other activation functions. The ReLU activation function gives rise to the “Dead Neuron” problem, which the Leaky ReLU activation function can solve, so it is necessary to verify DNNs based on the Leaky ReLU activation function. Therefore, we propose the Leaky-Reluplex algorithm, which is based on the Reluplex algorithm. The Leaky-Reluplex algorithm can verify DNNs based on the Leaky ReLU activation function.
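
For readers unfamiliar with the two activation functions contrasted in this abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper; the slope value `alpha` is an illustrative assumption.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: negative inputs are scaled by a small slope `alpha`
    (0.01 is an assumed illustrative default, not a value from the paper)."""
    return x if x >= 0 else alpha * x


# For a negative input, ReLU outputs zero with zero slope, so no gradient
# flows back (the "dead neuron" situation); Leaky ReLU keeps a small slope.
print(relu(-2.0), leaky_relu(-2.0, alpha=0.1))
```

Because Leaky ReLU is still piecewise linear with two pieces, it is plausible that a ReLU-oriented verifier can be adapted to it, which is the direction the abstract describes.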

14:00-15:30 Session 13B: Cloud and Edge Computing : Session III
  • Cloud Computing (IaaS, PaaS, and SaaS)
  • Mobile Cloud and Mobile Cloud Networking
  • Fog Computing
  • Distributed Systems Architecture and Management
14:00
Trail: A Blockchain Architecture for Light Nodes

ABSTRACT. In Bitcoin and Ethereum, the nodes require large storage to keep all of the blockchain data, such as transactions, UTXOs, and account states. As of January 2020, the storage size of the Bitcoin blockchain has expanded to 260 GB and will continue to increase. This is a big hurdle to becoming a block proposer or a validator. Although there are many studies on reducing the storage size, with the proposed methods nodes either cannot keep all blocks or cannot generate a block. We propose an architecture called Trail that allows nodes to hold all blocks in a small storage, and to generate and validate blocks and transactions. Trail does not depend on the consensus algorithm or the kind of fork choice rule. In this architecture, the client has the data to prove its own balances, and generates a transaction containing the proof of balances. The nodes in Trail do not keep transactions, UTXOs and account balances. They keep only blocks. The block size is about 8 KB, which is 100 times smaller than Bitcoin's. Further, the block size is constant regardless of the number of accounts and the number of transactions. Compared to traditional blockchains, clients need to keep Merkle proofs additionally. However, with proper data archiving, the storage size on the client device will be about 1.6 MB. Trail allows more users to be block proposers and validators and improves the decentralization of the blockchain.
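
The abstract mentions that clients keep Merkle proofs of their own balances. The sketch below illustrates a generic Merkle inclusion proof only; the leaf encoding (`b"name:balance"`), the use of SHA-256, and the duplicate-last-node rule are assumptions for illustration and are not claimed to match Trail's actual data structures.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return the Merkle root and the inclusion proof for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to the root.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last node
        sibling = index ^ 1          # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical account entries; a node holding only `root` can check the claim.
leaves = [b"alice:10", b"bob:5", b"carol:7"]
root, proof = merkle_root_and_proof(leaves, 2)
print(verify(b"carol:7", proof, root), verify(b"carol:8", proof, root))
```

The proof length grows with the logarithm of the number of leaves, which is consistent with the abstract's point that clients can carry the proofs while nodes store only small blocks.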

14:15
GIN: Better going safe with personalized routes

ABSTRACT. Contextual data characterize distinct regions of the city, making it possible to differentiate them according to security, entertainment, services, among others. Using contextual data to suggest routes helps to understand new aspects of a city that can change users' perceptions of different routes. The impact of each type of contextual data may vary according to the user's profile, which is not taken into account in most of the systems proposed in the literature. In addition, it is necessary to consider the behavior of contextual data, which changes according to the type of data. To mitigate the problems mentioned above, a route suggestion system with space-time risk, called GIN, is proposed. The system consists of three modules, namely: identification of contextual windows, context mapping, and route personalization. In addition, a strategy to decrease the number of route requests is proposed to improve the system's scalability. The evaluation results show that the system adapts to sensitive changes in the user's profile. In addition, positive results were obtained by using the behavior of contextual data to avoid unnecessary requests. This allowed for a reduction of up to 50% of requests made to the system.

14:30
Inference Time Optimization Using BranchyNet Partitioning

ABSTRACT. Deep Neural Network (DNN) inference requires high computation power, which generally involves a cloud infrastructure. However, sending raw data to the cloud can increase the inference time due to the communication delay. To reduce this delay, the first DNN layers can be executed at an edge infrastructure and the remaining ones at the cloud. Depending on which layers are processed at the edge, the amount of data can be highly reduced. However, executing layers at the edge can increase the processing delay. A partitioning problem tries to address this trade-off, choosing the set of layers to be executed at the edge to minimize the inference time. In this work, we address the problem of partitioning a BranchyNet, which is a DNN type where the inference can stop at the middle layers. We show that this partitioning can be treated as the shortest path problem, and thus solved in polynomial time.
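
To illustrate how a partitioning decision can be cast as a shortest path problem, here is a simplified sketch for a plain chain of layers. It deliberately ignores BranchyNet's early exits, so it is not the paper's formulation. All names and timing inputs (`edge_time`, `cloud_time`, `upload_time`) are hypothetical.

```python
import heapq


def partition_by_shortest_path(edge_time, cloud_time, upload_time):
    """Choose how many leading layers to run at the edge.

    edge_time[k], cloud_time[k]: processing time of layer k at edge / cloud.
    upload_time[k]: time to send the data to the cloud after k layers have
    run at the edge (k = 0 means sending the raw input).
    Returns (total_inference_time, number_of_layers_at_edge).
    """
    n = len(edge_time)
    done = ("done", n)
    graph = {done: []}
    for k in range(n + 1):
        on_edge, on_cloud = ("edge", k), ("cloud", k)
        if k < n:
            # At the edge: run the next layer locally, or upload now.
            graph[on_edge] = [(("edge", k + 1), edge_time[k]),
                              (on_cloud, upload_time[k])]
            graph[on_cloud] = [(("cloud", k + 1), cloud_time[k])]
        else:
            graph[on_edge] = [(done, 0.0)]
            graph[on_cloud] = [(done, 0.0)]

    # Dijkstra from "no layers processed yet, data at the edge".
    source = ("edge", 0)
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nxt, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))

    # Walk back from the sink to the last node that was still at the edge.
    node = done
    while prev[node][0] != "edge":
        node = prev[node]
    return dist[done], prev[node][1]


print(partition_by_shortest_path([4, 4, 4], [1, 1, 1], [10, 2, 5]))
```

In this made-up instance the first layer shrinks the data enough that uploading after one edge layer is cheapest. Since every edge weight is non-negative, Dijkstra's algorithm applies and the search runs in polynomial time.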

14:45
Cache Efficient Louvain with Local RCM
PRESENTER: Sanaz Gheibi

ABSTRACT. We develop a cache efficient Louvain community detection algorithm. Its effectiveness is demonstrated by benchmarking it against four existing Louvain algorithms on the Intel Knights Landing (KNL) and Haswell computational platforms using real and synthetic datasets. For a single iteration of Louvain, our algorithm obtains a speedup of up to 76.18% on real datasets on KNL, 51.91% on real networks on Haswell, 71.31% on synthetic networks on KNL, and 59.13% on synthetic networks on Haswell. These percentages using 2 iterations are 62.91%, 43.27%, 67.61%, and 54.43%, respectively.

15:00
Foiling Sybils with HAPS in Permissionless Systems: An Address-based Peer Sampling Service

ABSTRACT. Blockchains and distributed ledgers have brought renewed interest in Byzantine fault-tolerant protocols and decentralized systems, two domains studied for several decades. Recent promising works have in particular proposed to use epidemic protocols to overcome the limitations of popular Blockchain mechanisms, such as proof-of-stake or proof-of-work. These works unfortunately assume a perfect peer-sampling service, immune to malicious attacks, a property that is difficult and costly to achieve. We revisit this fundamental problem in this paper, and propose a novel Byzantine-tolerant peer-sampling service that is resilient to Sybil attacks in open systems by exploiting the underlying structure of wide-area networks.

16:00-17:30 Session 14A: Security Session V
  • Privacy / Anonymity
  • Attacks and Defenses
  • Authentication, Authorization and Accounting
  • Hardware Security
  • Intrusion Detection
  • Moving Target Defense (MTD)
  • Blockchain
16:00
UAV-based Surveillance System: an Anomaly Detection Approach

ABSTRACT. Recent advancements in avionics and electronics systems have led to the increased use of Unmanned Aerial Vehicles (UAVs) in several military and civilian missions. One of the main advantages that makes UAVs attractive is their ability to reach remote regions that are inaccessible to human operators, i.e. to provide a new aerial perspective in visual surveillance. Autonomous visual surveillance systems require real-time anomaly detection. However, there are many difficulties associated with automatic anomaly detection by a UAV, and few contributions describe the detection of abnormal events in videos recorded by a drone. In this paper, we propose an anomaly detection approach for a surveillance mission where videos are acquired by a UAV. We combine deep features extracted using a pretrained Convolutional Neural Network (CNN) with an unsupervised classification method, namely One Class Support Vector Machine (OCSVM). The quantitative results obtained on the dataset used show that our proposed method achieves good results in comparison to existing techniques, with an Area Under Curve (AUC) of 0.93.

16:15
K-Cipher: A Low Latency, Bit Length Parameterizable Cipher

ABSTRACT. We present the design of a novel low latency, bit length parameterizable cipher, called the "K-Cipher". K-Cipher is particularly useful to applications that need to support ultra low latency encryption at arbitrary ciphertext lengths. We can think of a range of networking, gaming and computing applications that may require encrypting data at unusual block lengths for many different reasons, such as to make space for other unencrypted state values. Furthermore, in modern applications, encryption is typically required to complete inside stringent time frames in order not to affect performance. K-Cipher has been designed to meet these requirements. In the paper we present the K-Cipher design and discuss its rationale. We also present results from our ongoing security analysis which suggest that only 2 to 4 rounds are sufficient to make the cipher operate securely. Finally, we present synthesis results from 2-round 32 bit and 64 bit K-Cipher encrypt datapaths, produced using Intel's 10 nm process technology. Our results show that the encrypt datapaths can complete in no more than 767 psec, or 3 clocks in 3.9-4.9 GHz frequencies, and are associated with a maximum area requirement of 1875 square microns.

16:30
Finding Persistent Elements of Anomalous Flows in Distributed Monitoring Systems

ABSTRACT. This paper concentrates on the issue of detecting persistent elements of anomalous flows in a distributed monitoring system, which has many applications such as detecting cyber-attacks, forecasting influenza, and analyzing search keywords. However, only a few studies consider the anomalous flow detection problem in distributed systems. Meanwhile, most of the existing studies on the persistent element detection problem in distributed systems assume that there is only one flow in the data stream, which is not always true in practice. In this paper, we combine the problems of anomalous flow detection and persistent element finding, and propose an efficient mechanism to find the t-persistent elements of p-anomalous flows from element sets of numerous flows in the monitors of a distributed system, where t and p are system parameters that can be defined based on the application requirement. We adopt tight data structures such as bitmaps and Bloom filters to record the elements of different flows and filter out the elements that are not in the t-persistent element set, which helps reduce the communication overhead between monitors and the controller. We also give an analysis of how to get the optimal settings of these tight data structures that minimize the total communication overhead. The experimental results based on real network traces show that the proposed mechanism achieves 76.1% and 69.2% reduction in communication overhead in comparison with a straightforward solution and a state-of-the-art solution based on coding cuckoo filter, respectively.
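
As background for the compact data structures named in this abstract, the following is a minimal Bloom filter sketch with a naive persistence check on top. It is an illustration under assumptions, not the paper's mechanism: the class, the per-epoch filter layout, and the helper `persistent_candidates` are hypothetical, and the filter sizes are arbitrary.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def persistent_candidates(elements, epoch_filters, t):
    """Elements that appear (possibly as false positives) in at least t epochs."""
    return [e for e in elements if sum(e in f for f in epoch_filters) >= t]


# Hypothetical usage: one filter per measurement epoch of a single flow.
epochs = []
for members in (["a", "b"], ["a", "c"], ["a", "b"]):
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for m in members:
        bf.add(m)
    epochs.append(bf)
candidates = persistent_candidates(["a", "b", "c"], epochs, t=3)
```

A monitor could send such fixed-size filters instead of raw element lists, which is the kind of communication saving the abstract refers to; the trade-off is that false positives can admit extra candidates.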

16:45
Towards Comprehensive Detection of DNS Tunnels

ABSTRACT. The Domain Name System (DNS) is a fundamental service of the Internet. The DNS tunnel is one of the most threatening abuses of DNS and has posed a huge threat to user privacy and Internet security. Attackers conceal information in DNS packets to evade detection by firewalls and intrusion detection systems. Newly developed DNS tunnels, which have been used by Advanced Persistent Threat groups, tend to use A and AAAA resource records (RRs) for transmission, making them stealthier and more threatening. Prior DNS tunnel detection approaches mainly focus on subdomains and TXT RRs. Less attention is currently being paid to newly developed DNS tunnels based on A and AAAA RRs. In this paper, we propose a novel approach to detect DNS tunnels, including those newly developed ones that use A and AAAA RRs for transmission. We first investigate the RR types used by different DNS tunnel tools. Novel features are extracted from the domains and the 4 types of RRs that are most commonly used for tunneling to measure the amount and content of information exchanged between authoritative nameservers and clients. We also analyze the detection capabilities of different features. The anomaly detection algorithm is applied separately to the domain-related features and to the features related to the 4 types of RRs. The overlaps of outliers are flagged as DNS tunnels. Our approach has been evaluated on real-world traffic. The experimental results show that our approach can detect all the DNS tunnels with an extremely low false positive rate.

17:00
Defense Against Advanced Persistent Threats: Optimal Network Security Hardening Using Multi-stage Maze Network Game

ABSTRACT. Advanced Persistent Threat (APT) is a stealthy, continuous and sophisticated method of network attack, which can cause serious privacy leakage and millions of dollars in losses. In this paper, we introduce a new game-theoretic framework for the interaction between a defender who uses limited Security Resources (SRs) to harden the network and an attacker who adopts a multi-stage plan to attack the network. The game model, derived from Stackelberg games, is called a Multi-stage Maze Network Game (M²NG), in which the characteristics of APT are fully considered. The possible plans of the attacker are compactly represented using attack graphs (AGs), but the compact representation of the attacker's strategies presents a computational challenge, and reaching the Nash Equilibrium (NE) is NP-hard. We present a method that first translates AGs into a Markov Decision Process (MDP) and then achieves the optimal SR allocation using the policy hill-climbing (PHC) algorithm. Finally, we present an empirical evaluation of the model and analyze the scalability and sensitivity of the algorithm. Simulation results show that our proposed reinforcement learning-based SR allocation is feasible and efficient.

16:00-17:30 Session 14B: Services and Protocols III
  • Advances in Internet Protocols
  • Green Networking
  • Real Time Communication Services
  • Routing and Multicast
  • Network Design, Optimization and Management
  • Network Reliability, Quality of Service and Quality of Experience
  • Fault-Tolerance and Error Recovery
  • Web Services and Service Oriented Architectures
  • Standards Evolution
  • Digital Satellite Communications Service
  • Localisation Protocols
  • Communications Services and Management
  • Crowdsourcing applications
  • Crowdsensing
  • Social Networks
  • Peer-to-Peer Computing
  • Computing applications
  • Software Engineering
  • Big Data, Data Mining and Database Applications
16:00
Exploiting AS-level Routing Properties to Locate Traffic Differentiation in the Internet

ABSTRACT. Network Neutrality states that all traffic in the Internet must be treated equally and thus cannot suffer unfair traffic differentiation (TD). Several solutions for detecting the presence of TD in the Internet have been proposed. However, locating where in the network TD is happening is still an open problem. In this work, we propose a strategy to locate Autonomous Systems (ASes) that are differentiating traffic. The proposed strategy takes advantage of AS-level routing properties to identify valid AS-level paths between end-hosts. It is then possible to select measurement points between which the AS-level paths traverse suspect ASes. Probes are sent from the measurement points and processed using end-to-end TD detectors based on statistical inference. The main idea is to check suspect ASes until only the AS that is actually discriminating traffic is filtered out. We first present results of experiments executed to validate the routing properties employed. Then the efficiency of the proposal for locating TD is evaluated using simulation. The results show that the proposed strategy is effective and efficient.

16:15
Identifying Impacts of Protocol and Internet Development on the Bitcoin Network

ABSTRACT. Improving transaction throughput is an important challenge for Bitcoin. However, shortening the block generation interval or increasing the block size to improve throughput makes sharing blocks within the network slower and increases the number of orphan blocks. Consequently, the security of the blockchain is sacrificed. To mitigate this, it is necessary to reduce the block propagation delay. Because of the contribution of new Bitcoin protocols and the improvements of the Internet, the block propagation delay in the Bitcoin network has been shortened in recent years. In this study, we identify the impacts of compact block relay (an up-to-date Bitcoin protocol) and Internet improvements on the block propagation delay and fork rate in the Bitcoin network from 2015 to 2019. Existing measurement studies could not identify these impacts, but our simulation makes it possible. The experimental results reveal that compact block relay contributes more to shortening the block propagation delay than Internet improvements do. The block propagation delay is reduced by 64.5% for the 50th percentile and 63.7% for the 90th percentile due to Internet improvements, and by 90.1% for the 50th percentile and 87.6% for the 90th percentile due to compact block relay.

16:30
A Parallel Graph Partitioning Approach to Enhance Community Detection in Social Networks
PRESENTER: Tales Lopes

ABSTRACT. Dealing with complex networks is often a challenge due to the high computational cost of analyzing a huge amount of data. Partitioning methods can decrease the complexity of large structures by reducing them to smaller, less connected parts. Also, splitting the data allows the use of multiprocessing to accelerate the execution of data procedures with simultaneity and parallelism. In this paper, we propose a new parallel partitioning algorithm with a focus on assisting community detection in social networks. The algorithm uses a subtree-splitting strategy with defined boundaries to cut the network into n balanced subnetworks. Our proposal stands out for its focus on aiding density-based approaches, such as the NetSCAN clustering algorithm, considering two particulars: (i) keeping the partitions' connectivity; and (ii) allowing node overlap between partitions. Experiments were carried out with different instances to investigate the partitions obtained and evaluate our proposal. Furthermore, the algorithm's performance on a large network is analyzed, and sequential and parallel implementations are compared in terms of execution time and memory consumption. The evidence shows that the proposed algorithm is able to split an extensive data set into balanced partitions with promising performance results.

16:45
ResTor: A Pre-Processing Model for Removing the Noise Pattern in Flow Correlation
PRESENTER: Zhong Guan

ABSTRACT. Flow correlation is a common approach to break the anonymity of anonymous communication. However, unpredictable network noise caused by multiple factors in open Internet raises the bar for existing correlation methods. Traditional methods which compare statistical distance of data flows and deep learning methods such as convolutional neural network behave worse because network noise changes traffic shape. In this paper, we design a pre-processing model called ResTor to perform the noise reduction before actually correlating entering and exiting flows. ResTor takes advantage of the stacked auto-encoder architecture to remove noise in two phases, and treats the byte accumulation sequences which are smoothed at fixed intervals as fitting targets. Experiment results show that the exiting Tor flows processed by ResTor are closer to their corresponding entering flows, thus the correlation task can be finished effectively even using traditional correlation ways: cosine distance and other statistical metrics assisted by ResTor achieves less computational overhead and higher correlation accuracy on Tor compared to the state-of-the-art method of DeepCorr, especially when traffic is obfuscated.
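
The abstract refers to byte accumulation sequences sampled at fixed intervals and to cosine distance as a traditional correlation metric. The sketch below shows one plausible reading of those two building blocks only; it does not implement ResTor's auto-encoder, and the packet representation `(timestamp_seconds, size_bytes)`, the interval, and the bin count are assumptions.

```python
import math


def byte_accumulation(packets, interval, num_bins):
    """Cumulative bytes seen by the end of each fixed-length interval.

    packets: iterable of (timestamp_seconds, size_bytes) pairs, with
    timestamps measured from the start of the flow (assumed format).
    """
    per_bin = [0.0] * num_bins
    for timestamp, size in packets:
        idx = int(timestamp // interval)
        if 0 <= idx < num_bins:
            per_bin[idx] += size
    total, sequence = 0.0, []
    for amount in per_bin:
        total += amount
        sequence.append(total)
    return sequence


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical entering and exiting flows; the exit side is slightly delayed.
entering = byte_accumulation([(0.1, 100), (0.6, 200), (1.2, 50)], 0.5, 3)
exiting = byte_accumulation([(0.3, 100), (0.9, 200), (1.4, 50)], 0.5, 3)
score = cosine_similarity(entering, exiting)
```

A higher score suggests the two flows carry the same traffic. Network noise distorts these sequences, which is the motivation the abstract gives for a denoising step before correlation.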

17:00
A Feedback Mechanism for Prediction-based Anomaly Detection In Content Delivery Networks

ABSTRACT. The CDN (Content Delivery Network) has become an important infrastructure of the Internet to reduce transmission delay and improve end-to-end user experience. However, due to the complexity of CDN systems, building an anomaly detection system is non-trivial. Anomaly detection systems usually suffer from undesirable performance in terms of high rates of false positives and false negatives, which may result from either a poor prediction model or an inappropriate threshold setting. Identifying the root cause of a false detection is critical for diagnosing and improving the performance of anomaly detection. In this paper, we propose a feedback mechanism for prediction-based anomaly detection in CDN. Specifically, based on a small amount of feedback provided by human operators, we introduce a carefully designed metric named Fitting-score to determine whether the prediction model can fit the data well. Then, a threshold adjustment mechanism is proposed to dynamically adjust the thresholds of residual errors. Experiments based on a three-month real CDN dataset collected from a top ISP-operated CDN in China validate that the proposed method can effectively improve the performance of anomaly detection.