DATAMOD 2019: 8TH INTERNATIONAL SYMPOSIUM "FROM DATA TO MODELS AND BACK (DATAMOD)"
PROGRAM FOR TUESDAY, OCTOBER 8TH

09:00-10:00 Session 5

No talks are scheduled, to allow attendance of keynotes at other workshops.

10:30-12:30 Session 6: Data Analysis with Models

Accepted papers

10:30
Gender Recognition in the Wild with Small Sample Size - A Dictionary Learning Approach

ABSTRACT. In this work we address the problem of gender recognition from facial images acquired in the wild. This problem is particularly difficult due to variations in pose, ethnicity, age and image quality. Moreover, we consider the special case in which only a small sample size is available for the training phase. We rely on a feature representation obtained from the well-known VGG-Face Deep Convolutional Neural Network (DCNN) and exploit the effectiveness of a sparsity-driven sub-dictionary learning strategy, which has been shown to represent both local and global characteristics of the training and probe faces. Results on the publicly available LFW dataset are provided to demonstrate the effectiveness of the proposed method.
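
As an illustration of class-wise sparse-representation classification, the sketch below codes a probe feature vector over the concatenation of per-class sub-dictionaries with Orthogonal Matching Pursuit, then assigns the class whose atoms reconstruct the probe with the smallest residual. It is a simplified stand-in, not the authors' method: it skips the dictionary-learning step and uses training feature vectors (assumed to be, for example, VGG-Face descriptors extracted elsewhere) directly as atoms. The function names `omp` and `classify` are hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit: greedy sparse code x such that D @ x approximates y."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def classify(D, atom_labels, y, n_nonzero=5):
    """Return the class whose sub-dictionary best reconstructs the probe y.

    D: (n_features, n_atoms) matrix, one training feature vector per column.
    atom_labels: class label of each column of D.
    """
    atom_labels = np.asarray(atom_labels)
    D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms
    y = y / np.linalg.norm(y)          # unit-norm probe
    x = omp(D, y, n_nonzero)
    residuals = {}
    for c in np.unique(atom_labels):
        # Keep only the coefficients belonging to class c's sub-dictionary.
        x_c = np.where(atom_labels == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get)
```

The sparsity level `n_nonzero` is an assumed tuning parameter; with very few training samples per class it should stay small relative to the number of atoms.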

11:00
An instrumented mobile language learning application for the analysis of usability and learning

ABSTRACT. Mobile applications for language learning (MALL) form a field largely dominated by translation-based learning approaches. Moreover, MALL applications feature a number of common practices that may not effectively support learning or may even increase the number of user errors. In this tool paper, we introduce a language learning application equipped with instrumentation code that collects data about user behavior and uses such data in different ways. The most obvious use is to provide statistics and learning patterns, which users can exploit to adjust their learning approaches and researchers can exploit to study learning processes and attitudes. For the benefit of the user, the collected data can also be exploited to drive the synthesis of exercises that best suit the user's language level and learning approach and are unlikely to cause usability errors.

The main use of the application, however, is as a research tool. It supports testing new forms of exercises, and combinations of them, on samples of users, thus providing valuable information for research in language learning as well as supporting the software development process of new MALL applications. Finally, an additional feature of the tool is the conversion of the collected data into a formal description of the user's behaviour, to be used for formal verification and validation purposes.
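
One simple way to turn logged interaction events into a formal behavioural description is to estimate a discrete-time Markov chain from the observed event sequences. The sketch below is only an assumption about what such a conversion might look like; the event names and the function `transition_probabilities` are hypothetical and do not come from the tool.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate a discrete-time Markov chain from logged event sequences.

    sessions: list of sessions, each an ordered list of event names
    (hypothetical, e.g. "open", "exercise", "wrong").
    Returns {source_event: {next_event: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for events in sessions:
        # Count each observed consecutive pair of events.
        for src, dst in zip(events, events[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        # Maximum-likelihood estimate: relative frequency of each successor.
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs
```

Events that never have a successor (such as a final "close") do not appear as sources; a model checker would typically need them made absorbing explicitly.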

11:30
Analysis and Visualization of Performance Indicators in University Admission Tests

ABSTRACT. This paper presents an analytical study that evaluates the performance of, and detects anomalies in, admission tests for public universities in Italy. Each test is personalized for each student and consists of a series of questions classified into different domains (e.g. maths, science, logic). Since the composition of each test is unique, it is crucial to guarantee a similar level of difficulty for all the tests in a session. For this reason, a domain expert assigns a difficulty level to each question, so the overall difficulty of a test depends on the correct classification of each item. We apply a series of data mining processes to evaluate the performance of the different questions over a period of five years. We use clustering to group the questions according to a series of performance indicators, yielding a data-driven labelling of their difficulty level. The measured level is compared with the level assigned a priori by experts. Misclassifications are then highlighted to the expert, who can refine the question or its classification. Sequential pattern mining is used to check whether biases are present in the composition of the tests and in their performance; this analysis is meant to exclude overlaps or direct dependencies among questions. By analysing co-occurrences, we are able to state that the composition of each test is fair and uniform for all students, even across several sessions. The analytical results are presented to the expert through a visual web application that loads the analytical data and indicators and composes an interactive dashboard. The user may explore the extracted patterns and models by filtering and changing thresholds and analytical parameters.
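
The comparison between data-driven and expert-assigned difficulty can be sketched as follows, under simplifying assumptions: a single hypothetical indicator (the fraction of students answering each question correctly) stands in for the paper's set of performance indicators, plain one-dimensional k-means with three clusters stands in for whatever clustering the authors used, and the cluster with the highest correct-answer rate is read as "easy". Questions whose cluster disagrees with the expert label are returned for review; all names are illustrative.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalars, with evenly spaced initial centroids (k >= 2)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


def flag_misclassified(correct_rate, expert_level):
    """Return {question: (expert_label, data_driven_label)} where the two disagree.

    correct_rate: {question_id: fraction of correct answers} (hypothetical indicator).
    expert_level: {question_id: "easy" | "medium" | "hard"}.
    """
    names = ["easy", "medium", "hard"]
    # Highest correct-answer rate first, so index 0 is the "easy" cluster.
    centroids = sorted(kmeans_1d(list(correct_rate.values()), len(names)), reverse=True)
    flagged = {}
    for q, rate in correct_rate.items():
        j = min(range(len(names)), key=lambda i: abs(rate - centroids[i]))
        if names[j] != expert_level[q]:
            flagged[q] = (expert_level[q], names[j])
    return flagged
```

With several indicators per question, the same idea would use multi-dimensional clustering on standardized features instead of the scalar rate.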

12:00
Anomaly Detection From Log Files Using Unsupervised Deep Learning

ABSTRACT. Computer systems have grown in complexity to the point where manual inspection of system behaviour for the purpose of malfunction detection has become unfeasible. As these systems output voluminous logs of their activity, machine-led analysis of these logs is a growing need, and several solutions already exist. These largely depend on hand-crafted features, require raw log preprocessing and feature extraction, or use supervised learning, which requires a labelled log dataset that is not always easy to obtain. We propose a two-part deep autoencoder model with LSTM units that requires no hand-crafted features and no preprocessing of the data, as it works on raw text, and outputs an anomaly score for each log entry. This anomaly score represents the rarity of a log event in terms of both its content and its temporal context. The model was trained and tested on a dataset of HDFS logs containing 2 million raw lines, half of which were used for training and half for testing. While this model cannot match the performance of a supervised binary classifier, it could be a useful tool as a coarse filter for human inspection of log files where a labelled dataset is unavailable.
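
The final scoring step of a reconstruction-based detector can be sketched without the model itself. Assuming an autoencoder (LSTM-based or otherwise) has already produced a reconstruction of each encoded log entry, the anomaly score below is the mean squared reconstruction error per entry, and the coarse filter returns the highest-scoring fraction of entries for human inspection. This is not the paper's two-part model or its exact score; the entry encoding, the `top_fraction` parameter and the function names are assumptions.

```python
import numpy as np


def anomaly_scores(inputs, reconstructions):
    """Mean squared reconstruction error for each log entry (one row per entry)."""
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    return ((inputs - reconstructions) ** 2).mean(axis=1)


def coarse_filter(scores, top_fraction=0.01):
    """Indices of the highest-scoring entries, most anomalous first."""
    n = max(1, int(round(len(scores) * top_fraction)))
    return np.argsort(scores)[::-1][:n]
```

Ranking by score rather than applying a fixed threshold fits the "coarse filter" use: an analyst reviews a fixed budget of entries without needing labelled data to calibrate a cut-off.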

14:00-15:00 Session 7: Presentation Reports

Presentation Reports

14:00
Economics-driven behaviour intervention support in organizations

ABSTRACT. Security policy-makers (influencers) in an organization set security policies that embody intended behaviours for employees (as decision-makers) to follow. Decision-makers then face choices that are not simply a binary decision of whether to comply or not, but also of how to approach compliance and secure working alongside other workplace pressures, with limited resources for identifying optimal security-related choices. Conflict arises due to information asymmetries present in the relationship, where influencers and decision-makers both consider costs, gains, and losses in ways that are not necessarily aligned. With the need to promote 'good enough' decisions about security-related behaviours under such constraints, we hypothesize that actions to resolve this misalignment can benefit from constructs from both neoclassical economics and behavioural economics. Here we demonstrate how current approaches to security behaviour provisioning in organizations mirror rational-agent economics, even where behavioural economics is embodied in the promotion of individual security behaviours. We develop and present a framework to accommodate bounded security decision-making within an ongoing programme of behaviours that must be provisioned for and supported. We also point to applications of the framework in negotiating sustainable security behaviours, such as policy concordance and just security cultures.

14:20
Preliminary Results on Predicting Robustness of Biochemical Pathways through Machine Learning on Graphs

ABSTRACT. We propose the use of Machine Learning methods specifically designed for graphs in order to predict robustness properties of biochemical pathways on the basis of the structure of the pathway graph alone. Once the model has been trained on a set of classified example graphs, this would make it possible to avoid the burden of performing a huge number of simulations.
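
Graph kernels are one family of machine learning methods designed for graphs. As an illustration only (the abstract does not say which method the authors use), the sketch below computes Weisfeiler-Lehman subtree features, which iteratively relabel each node with its own label followed by the sorted labels of its neighbours, and the corresponding kernel as a dot product of label counts. It treats the pathway graph as undirected with string node labels; both are simplifying assumptions, as are the function names.

```python
from collections import Counter


def wl_features(adj, labels, iterations):
    """Weisfeiler-Lehman subtree label counts.

    adj: {node: list of neighbouring nodes} (undirected graph, assumed).
    labels: {node: initial string label}, e.g. a hypothetical species type.
    """
    feats = Counter(labels.values())
    current = dict(labels)
    for _ in range(iterations):
        # New label = own label + sorted multiset of neighbour labels.
        # (Kept as a readable string instead of hashing it to a short id.)
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        feats.update(current.values())
    return feats


def wl_kernel(f1, f2):
    """Kernel value: dot product of two WL label-count vectors."""
    return sum(count * f2[label] for label, count in f1.items() if label in f2)
```

A Gram matrix of such kernel values over pathways labelled robust or not robust could then be passed to a kernel classifier, for example scikit-learn's `SVC(kernel="precomputed")`.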

14:40
Interpreting Probabilistic Models of Social Group Interactions in Meetings

ABSTRACT. A major challenge in Computational Social Science consists in modelling and explaining the temporal dynamics of human communication. Understanding small group interactions can help shed light on sociological and social psychological questions relating to human communication. Previous work showed how Markov reward models can be used to analyse group interaction in meetings. We further explore the potential of these models by formulating queries over interactions as probabilistic temporal logic properties and analysing them with probabilistic model checking. For this study, we analyse a dataset taken from a standard corpus of scenario and non-scenario meetings and demonstrate the expressiveness of our approach to validate expected interactions and identify new ones.
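
Queries of this kind can be illustrated with bounded reachability, one of the basic properties a probabilistic model checker (PRISM, for instance) evaluates on a discrete-time Markov chain: the probability, from each state, of reaching a target state within k steps, written P=? [ F<=k target ]. The sketch below computes it by iterating k times; it ignores rewards, and the three-state "who is speaking" chain is invented for illustration, not taken from the meeting corpus.

```python
import numpy as np


def bounded_reachability(P, target, steps):
    """Probability of reaching a target state within `steps` steps, per start state.

    P: row-stochastic transition matrix of a DTMC.
    target: boolean mask marking the target states.
    """
    P = np.asarray(P, dtype=float)
    target = np.asarray(target, dtype=bool)
    prob = target.astype(float)  # within 0 steps: 1 only if already in target
    for _ in range(steps):
        # Target states stay at 1; others take one step and inherit the
        # (steps - 1)-bounded probability of their successors.
        prob = np.where(target, 1.0, P @ prob)
    return prob


if __name__ == "__main__":
    # Hypothetical chain: 0 = silence, 1 = speaker A, 2 = speaker B.
    P = [[0.5, 0.3, 0.2],
         [0.4, 0.4, 0.2],
         [0.1, 0.1, 0.8]]
    # Probability that speaker B takes the floor within 2 steps, per start state.
    print(bounded_reachability(P, [False, False, True], 2))
```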