BDCAT2020: 7TH IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES
PROGRAM FOR MONDAY, DECEMBER 7TH

09:00-13:10 Session 1: Opening and Keynotes

Conference Opening (General & PC Chairs)

Paul Monks (Chief Scientific Advisor: Business, Energy & Industrial Strategy, UK Government)

Break

Keynote (UCC): Rajiv Ranjan (Newcastle University, UK)

Break

Keynote 3 (Industry talk): (Bloc Digital, UK) – Roche (US or Switzerland)

Break

Keynote (BDCAT): Minyi Guo (Shanghai Jiao Tong University, China)

13:30-15:00 Session 2: Machine Learning: Health and Environment
13:30
Attention Models for PM2.5 Prediction

ABSTRACT. Air pollution is a growing and serious environmental problem, driven mainly by migration into urban areas. Effective air pollution monitoring systems allow pollution to be closely monitored, but monitoring alone is not enough to significantly reduce it. The most valuable output of these systems is the data they collect, which can be used to build pollution prediction models. Many approaches to air pollution prediction have been proposed, but there is no evidence that any of them has been successfully used to reduce air pollution. In recent years, advances in deep learning and the increasing amount of available data have led to many new prediction models. In this paper, we propose two attention-based models for air pollution prediction. Our models differ from previously proposed models by introducing a separate attention factor for each previous timestep when making a prediction. The model learns these attention factors and thus learns how strongly each previous timestep should influence the current prediction. This approach lets the models better capture patterns and dependencies in the data and, in turn, produce better predictions. We show that, by employing this novel architecture, our models outperform two state-of-the-art models.
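The abstract does not specify the attention mechanism, so the following is only a minimal sketch of one common form, dot-product attention, assuming each previous timestep has already been encoded as a feature vector. A score per timestep is normalised with a softmax into weights that sum to 1, and the weighted sum of the timestep vectors forms the context used for the current prediction. The names `attend`, `hidden` and `query` are hypothetical; in a trained model the query and scoring function would be learned parameters rather than fixed values.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden: Sequence[Sequence[float]],
           query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Dot-product attention over previous timesteps (hypothetical helper).

    hidden: one feature vector per previous timestep, oldest first.
    query:  vector for the current prediction step (learned in a real model).
    Returns (weights, context): one weight per timestep and their weighted sum.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


# Usage with toy encodings of three past hourly readings.
hidden = [[0.1, 0.0], [0.5, 0.2], [0.9, 0.4]]
query = [1.0, 1.0]
weights, context = attend(hidden, query)
```

Here the most recent timestep receives the largest weight because its encoding aligns best with the query; a learned model would adjust this automatically rather than relying on recency.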

14:00
A Multi-Modal Deep Learning Approach to the Early Prediction of Mild Cognitive Impairment Conversion to Alzheimer’s Disease

ABSTRACT. Mild cognitive impairment (MCI) has been described as the intermediary stage before Alzheimer’s Disease (AD); however, many people with MCI remain stable or even show improved cognition. Early detection of progressive MCI (pMCI) can therefore be used to identify at-risk individuals, direct additional medical treatment to avert conversion to AD, and provide psychosocial support for the person and their family. This paper presents a novel solution for the early detection of people with pMCI and the classification of AD risk among people with MCI. We propose a model, MudNet, that uses deep learning to simultaneously predict the progressive/stable MCI class and the time to AD conversion, where high-risk pMCI individuals convert to AD within 24 months and low-risk individuals after more than 24 months. MudNet is trained and validated on baseline clinical and volumetric MRI data (n = 559 scans) from participants of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The model takes as input T1-weighted structural MRIs together with clinical data, including neuropsychological test scores (RAVLT, ADAS-11, ADAS-13, ADASQ4, MMSE). On average, our model achieves a binary accuracy of 69.8% for conversion prediction and a categorical accuracy of 66.9% for risk classification.
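To make the two prediction targets concrete, the sketch below derives a binary progressive/stable label and a risk category from the 24-month threshold stated in the abstract, and computes the accuracy metric used for both. It assumes, as an illustration only, that the risk task has three categories (stable, high-risk, low-risk) and that a participant with no observed conversion is stable; the abstract does not confirm either point, and `make_labels` and `accuracy` are hypothetical helpers, not part of MudNet.

```python
from typing import Optional, Sequence, Tuple

CONVERSION_HORIZON_MONTHS = 24  # threshold stated in the abstract


def make_labels(months_to_ad: Optional[float]) -> Tuple[int, str]:
    """Derive (binary pMCI label, risk category) for one MCI participant.

    months_to_ad: months from baseline until an AD diagnosis, or None if no
    conversion was observed (treated here as stable MCI, an assumption).
    """
    if months_to_ad is None:
        return 0, "stable"       # sMCI: no observed conversion
    if months_to_ad <= CONVERSION_HORIZON_MONTHS:
        return 1, "high-risk"    # pMCI converting within 24 months
    return 1, "low-risk"         # pMCI converting after more than 24 months


def accuracy(predicted: Sequence, actual: Sequence) -> float:
    """Fraction of exact matches; applies to binary and categorical labels alike."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Usage with toy follow-up times (months to AD, or None).
follow_up = [None, 12.0, 36.0, 24.0]
labels = [make_labels(m) for m in follow_up]
# [(0, 'stable'), (1, 'high-risk'), (1, 'low-risk'), (1, 'high-risk')]
```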

14:30
Evaluating Deep Learning Algorithms for Real-Time Arrhythmia Detection

ABSTRACT. Cardiovascular diseases, such as heart attack and congestive heart failure, are the leading cause of death both in the United States and worldwide. Current medical practice for diagnosing cardiovascular diseases is not suitable for long-term, out-of-hospital use. A key requirement for long-term monitoring is the ability to detect abnormal cardiac rhythms, i.e., arrhythmia, in real time. In this paper, we present our work on designing real-time sensing and on evaluating machine learning algorithms for real-time arrhythmia detection. Most existing work applies machine learning algorithms to electrocardiogram (ECG) images to detect abnormal patterns; these approaches are not suitable for real-time processing because of their high processing overhead. In our work, we treat ECG data as time series and evaluate various machine learning algorithms in terms of both learning and computational performance. Our experimental results show that the long short-term memory (LSTM) network achieves both high accuracy and high efficiency, demonstrating great potential for online arrhythmia detection.
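Treating ECG data as a time series for real-time use typically means cutting the incoming sample stream into fixed-length windows that a classifier such as an LSTM can consume. The sketch below is a generic sliding-window segmenter, assuming windows of `size` samples whose starts are `hop` samples apart; both parameters are hypothetical, and the paper's actual segmentation (for example, beat-centred windows) is not stated in the abstract.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def sliding_windows(samples: Iterable[float], size: int, hop: int) -> Iterator[List[float]]:
    """Yield fixed-length windows from a (possibly unbounded) sample stream.

    size: samples per window passed to the classifier.
    hop:  samples between consecutive window starts (hop < size overlaps windows).
    Memory stays O(size) no matter how long the stream runs.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    buf: Deque[float] = deque(maxlen=size)  # oldest sample drops out automatically
    for n, x in enumerate(samples, start=1):
        buf.append(x)
        # Emit once the buffer is full and the window start is aligned to hop.
        if n >= size and (n - size) % hop == 0:
            yield list(buf)


# Usage: 4-sample windows with 50% overlap over a toy stream.
windows = list(sliding_windows(range(10), size=4, hop=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Because the function is a generator, each window can be classified as soon as it is complete, which is what online detection requires.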

16:00-18:00 Session 3: Data Processing and Visualisation
16:00
Large-scale Data Integration Using Graph Probabilistic Dependencies

ABSTRACT. The diversity and proliferation of knowledge bases have made data integration one of the key challenges in data science. Imperfect representations of entities, particularly in graphs, add further challenges to data integration. Graph dependencies (GDs) have been investigated in existing studies for data integration and for maintaining data quality on graphs. However, most graphs contain many duplicates with high diversity, so whether dependencies hold over these graphs becomes highly uncertain. In this paper, we propose graph probabilistic dependencies (GPDs), a novel class of dependencies for graphs, to address this uncertainty over large-scale graphs. GPDs provide a probabilistic explanation for handling uncertainty while discovering dependencies over graphs. Furthermore, we present a case study that verifies the correctness of the data integration process based on GPDs. Preliminary results demonstrate the effectiveness of GPDs in reducing redundancies and inconsistencies over the benchmark datasets.
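The abstract does not give the formal definition of a GPD. As a rough illustration of the idea of a dependency that holds with some probability, the sketch below estimates, over pairs of entity nodes, the conditional probability that two nodes agree on an attribute Y given that they agree on attributes X. This is a simplified attribute-level rule, not a graph-pattern dependency, and `dependency_probability` and the toy records are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Node = Dict[str, str]  # attribute name -> value for one entity node


def dependency_probability(nodes: List[Node], lhs: Sequence[str], rhs: str) -> float:
    """Estimate P(pair agrees on rhs | pair agrees on every attribute in lhs).

    Pairs missing any of the attributes are skipped. Returns 0.0 when no pair
    matches the left-hand side, i.e. the rule has no support.
    """
    support = satisfied = 0
    for a, b in combinations(nodes, 2):
        if any(k not in a or k not in b for k in (*lhs, rhs)):
            continue
        if all(a[k] == b[k] for k in lhs):
            support += 1
            if a[rhs] == b[rhs]:
                satisfied += 1
    return satisfied / support if support else 0.0


# Usage with toy duplicate records: three nodes for the same person, one conflicting city.
people = [
    {"name": "A. Smith", "dob": "1980-01-02", "city": "Leeds"},
    {"name": "A. Smith", "dob": "1980-01-02", "city": "Leeds"},
    {"name": "A. Smith", "dob": "1980-01-02", "city": "York"},
    {"name": "B. Jones", "dob": "1975-05-06", "city": "Hull"},
]
p = dependency_probability(people, lhs=["name", "dob"], rhs="city")  # 1 of 3 pairs agree
```

Comparing all pairs is quadratic, so on large-scale graphs some form of candidate blocking would be needed before counting.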

16:30
A Data Indexing Technique to Improve the Search Latency of AND Queries for Large Scale Textual Documents

ABSTRACT. Boolean AND queries (BAQ) are one of the most important types of queries used in text searching. In this paper, we propose a graph-based indexing technique that uses a Graph-Based Index (GBI) structure to improve the search latency of BAQ. We also show how a graph structure represented using a hash table can reduce the number of intersections needed to evaluate BAQ. We compare the performance of the proposed technique with the inverted index, one of the most widely used index structures for textual documents. A detailed performance analysis is performed through prototyping and measurement on a system subjected to a synthetic workload. For further performance insights, we also compare the graph-based indexing technique with Elasticsearch, an enterprise-level search engine that uses an inverted index at its core. The analysis shows that the graph-based indexing technique significantly reduces BAQ execution latency compared with the other techniques.
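For context on the baseline, the sketch below builds a minimal in-memory inverted index and evaluates a Boolean AND query by intersecting posting lists, counting the intersections performed; these intersections are the cost the GBI aims to reduce. It is a simplified illustration, not Elasticsearch's implementation and not the paper's GBI; `build_inverted_index` and `and_query` are hypothetical names, and tokenisation is plain lower-cased whitespace splitting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased whitespace token to the set of document IDs containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def and_query(index: Dict[str, Set[int]], terms: Iterable[str]) -> Tuple[Set[int], int]:
    """Evaluate a Boolean AND query; return (matching doc IDs, intersections performed).

    Posting lists are intersected smallest-first, a standard optimisation that keeps
    intermediate results small; a k-term query needs at most k - 1 intersections.
    """
    postings = sorted((index.get(t.lower(), set()) for t in terms), key=len)
    if not postings:
        return set(), 0
    result = set(postings[0])
    intersections = 0
    for plist in postings[1:]:
        if not result:
            break  # early exit: no document can match any more
        result &= plist
        intersections += 1
    return result, intersections


# Usage with three toy documents.
docs = {1: "big data graph index", 2: "graph index search", 3: "big graph"}
idx = build_inverted_index(docs)
hits, n = and_query(idx, ["graph", "index", "big"])  # {1}, after 2 intersections
```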

17:00
Iris: Amortized, Resource Efficient Visualizations of Voluminous Spatiotemporal Datasets

ABSTRACT. The growth in observational data volumes over the past decade has been accompanied by a need to make sense of the phenomena that underpin them. Visualization is a key component of the data wrangling process that precedes the analyses that inform these insights. This study focuses on interactive visualization of spatiotemporal phenomena from voluminous datasets. Such visualizations introduce challenges relating to interactivity, overlaying multiple datasets and dynamic feature selection, resource capacity constraints, and scaling. In this study, we describe our methodology to address these challenges. We rely on a novel mix of algorithms and systems innovations working in concert to apportion and amortize workloads effectively and to enable interactivity during visualization. In particular, our research prototype, Iris, leverages sketching algorithms, effective query predicate generation and evaluation, coprocessors for hardware acceleration, and convolutional neural network based encoders to render visualizations while preserving responsiveness and interactivity, and it avoids performance hotspots. We also report on several empirical benchmarks demonstrating that our methodology preserves interactivity while using resources effectively to scale.
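The abstract says Iris uses sketching algorithms but does not name them. As an assumed example, the sketch below uses a Count-Min sketch to keep approximate observation counts per spatiotemporal bin (a lat/lon grid cell plus a time bucket), so that a density layer can be rendered from a fixed-size summary instead of rescanning raw data. The binning scheme and the names `CountMinSketch` and `spatiotemporal_key` are hypothetical; a real system might use geohashes or another spatial encoding.

```python
import hashlib
from typing import Iterator, List, Tuple


class CountMinSketch:
    """Count-Min sketch: fixed memory; estimates may overcount but never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _buckets(self, key: str) -> Iterator[Tuple[int, int]]:
        # One hash function per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode(), digest_size=8,
                                     salt=row.to_bytes(16, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Every row overcounts or is exact, so the minimum is the best estimate.
        return min(self.table[row][col] for row, col in self._buckets(key))


def spatiotemporal_key(lat: float, lon: float, epoch_s: int,
                       cell_deg: float = 0.5, bucket_s: int = 3600) -> str:
    """Quantize a point to a lat/lon grid cell and an hourly time bucket (hypothetical scheme)."""
    return f"{int(lat // cell_deg)}:{int(lon // cell_deg)}:{epoch_s // bucket_s}"


# Usage with toy observations (lat, lon, Unix seconds).
sketch = CountMinSketch()
observations = [(40.1, -105.2, 1_600_000_000),
                (40.2, -105.1, 1_600_000_100),
                (41.9, -87.6, 1_600_000_000)]
for lat, lon, t in observations:
    sketch.add(spatiotemporal_key(lat, lon, t))
dense_cell = spatiotemporal_key(40.1, -105.2, 1_600_000_000)  # shared by the first two points
```

The sketch's memory is fixed by `width` and `depth` regardless of data volume, which is the property that lets the summarisation work be done once and amortized across many visualization queries.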

17:30
Agami: Scalable Visual Analytics over Multidimensional Data Streams

ABSTRACT. As the worldwide capability to collect, store, and manage information continues to grow, the resulting datasets become increasingly difficult to understand and extract insights from. Interactive data visualizations offer a promising avenue for efficiently navigating and gaining insights from highly complex datasets, but the velocity of modern data streams often means that precomputed representations or summaries of the data quickly become obsolete. Our system, Agami, provides live-updating, interactive visualizations over streaming data. We leverage in-memory data sketches to summarize and aggregate the information to be visualized, and we use online machine learning models to let users query future feature values. Our approach facilitates low-latency, iterative exploration of data streams and can scale out incrementally to handle increasing stream velocities and query loads. We also provide a thorough evaluation of our data structures and system performance using a real-world meteorological dataset.
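The abstract does not say which sketches Agami uses. One property that makes incremental scale-out workable is mergeability: summaries built independently on different nodes or stream partitions can be combined without revisiting raw data. As an assumed stand-in, the sketch below keeps a running count, mean and variance per feature using Welford's online update and combines two summaries with the pairwise merge formula of Chan et al.; `RunningStats` is a hypothetical name and the readings are toy values.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mergeable count/mean/variance summary (Welford update, Chan et al. merge)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Fold one new stream value into the summary in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two summaries, e.g. from different stream partitions or nodes."""
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance; NaN until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


# Usage: two partitions of a toy temperature stream, summarized separately, then merged.
temps = [12.0, 14.5, 13.0, 15.5, 11.0, 16.0]
left, right, whole = RunningStats(), RunningStats(), RunningStats()
for x in temps[:3]:
    left.add(x)
for x in temps[3:]:
    right.add(x)
for x in temps:
    whole.add(x)
merged = left.merge(right)  # same statistics as summarizing the whole stream at once
```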