ICA3PP 2016: 16TH INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING
PROGRAM FOR WEDNESDAY, DECEMBER 14TH

09:00-10:00 Session 2: Building Up Intelligible Parallel Computing World

Vladimir Voevodin

  • Deputy Director, Research Computing Center, Lomonosov Moscow State University
  • Head of the Department on Supercomputers and Quantum Informatics, Computational Mathematics and Cybernetics Faculty, MSU

The computing world is changing rapidly. All devices – from mobile phones and personal computers to high-performance supercomputers – are becoming parallel. The huge capacity of modern supercomputers allows complex problems, previously thought impossible, to be solved. The performance of the best supercomputers in the world is measured in petaflops, providing unprecedentedly powerful instruments for research. At the same time, the efficient use of all the opportunities offered by modern computing systems represents a global challenge and requires new knowledge, skills and abilities, in which understanding the key properties of parallel algorithms plays one of the main roles. The talk will address the urgent need for theoretical and practical technologies for the accurate and concerted design of high performance computing systems, highly parallel algorithms, and extreme-scale applications, in order to solve large problems on current and prospective generations of high performance computing systems. The most essential concept behind these technologies is co-design: a very close partnership between all the layers involved in solving these problems on high performance computing systems, namely mathematical methods, algorithms, applications, programming technologies, runtime systems, and the layers of system software and hardware. The notion of co-design is so important for HPC that the following thesis is definitely true: “No efficient co-design technologies, no reasonable exascale in the future”.

Vladimir Voevodin is Deputy Director of the Research Computing Center at Lomonosov Moscow State University. He is Head of the Department “Supercomputers and Quantum Informatics” at the Computational Mathematics and Cybernetics Faculty of MSU, a professor, and a corresponding member of the Russian Academy of Sciences.

Vl. Voevodin specializes in parallel computing, supercomputing, extreme computing, program tuning and optimization, fine structure of algorithms and programs, parallel programming technologies, scalability and efficiency of supercomputers and applications, supercomputing co-design technologies, software tools for parallel computers, and supercomputing education. His research, experience and knowledge formed the basis for the supercomputing center of Moscow State University, which was founded in 1999 and is currently the largest supercomputing center in Russia. He has contributed to the design and implementation of the following tools, software packages, systems and online resources: V-Ray, X-Com, AGORA, Parallel.ru, hpc-education.ru, hpc-russia.ru, LINEAL, Sigma, Top50, OctoShell, Octotron, AlgoWiki. He has published 90 scientific papers, among them 4 books. Vl. Voevodin is one of the founders of the Supercomputing Consortium of Russian Universities established in 2008, which currently comprises more than 60 members. He is a leader of the major national activities on supercomputing education in Russia and General Chair of the two largest Russian supercomputing conferences.

Location: Room Alejandra
10:00-10:30 Coffee Break
10:30-12:30 Session 3A: ICA3PP: Software Systems and Programming (I) and DLMCS
Location: Room Alejandra
10:30
Redundancy Elimination in the ExaStencils Code Generator

ABSTRACT. Optimizing the performance of compute-bound codes requires, among other techniques, the elimination of redundant computations. The well-known concept of common subexpression elimination can achieve this in part, and almost every production compiler conducts such an optimization. However, due to the conservative nature of these compilers, an external redundancy elimination can additionally increase performance. For stencil codes using finite volume discretizations, an extension that eliminates redundancies between loop iterations is also very promising. We integrated both a classic common subexpression elimination and an extended version in the ExaStencils code generator and present their impact on a real-world application.
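The core transformation the abstract builds on, common subexpression elimination, can be illustrated with a small sketch over expression trees. This is a generic illustration under invented names (`cse`, tuple-encoded expressions), not the ExaStencils generator's implementation:

```python
def cse(exprs):
    """Hoist subexpressions that occur more than once into temporaries.

    Each expression is a nested tuple like ('+', ('*', 'a', 'b'), 'c').
    Returns (assignments, rewritten_exprs), where assignments is a list
    of (temp_name, subexpression) pairs evaluated before the expressions.
    """
    counts = {}

    def count(e):
        # Count every occurrence of every compound subexpression.
        if isinstance(e, tuple):
            counts[e] = counts.get(e, 0) + 1
            for child in e[1:]:
                count(child)

    for e in exprs:
        count(e)

    temps = {}        # subexpression -> temp name
    assignments = []  # (temp name, defining expression), in order

    def rewrite(e):
        if not isinstance(e, tuple):
            return e
        # Rewrite children first so nested shared terms are hoisted too.
        rewritten = (e[0],) + tuple(rewrite(c) for c in e[1:])
        if counts.get(e, 0) > 1:
            if e not in temps:
                temps[e] = f"t{len(temps)}"
                assignments.append((temps[e], rewritten))
            return temps[e]
        return rewritten

    return assignments, [rewrite(e) for e in exprs]
```

For example, `cse([('+', ('*', 'a', 'b'), 'c'), ('-', ('*', 'a', 'b'), 'd')])` hoists the shared product into a single temporary that both expressions then reference.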

11:00
A Dataflow IR for Memory Efficient RIPL Compilation to FPGAs

ABSTRACT. Field programmable gate arrays (FPGAs) are fundamentally different from fixed processor architectures because their memory hierarchies can be tailored to the needs of an algorithm. FPGA compilers for high-level languages are not hindered by fixed memory hierarchies. The constraint when compiling to FPGAs is the availability of resources.

In this paper we describe how the dataflow intermediate representation of our declarative FPGA image processing DSL, RIPL, enables us to constrain memory use. We use five benchmarks to demonstrate that memory use with RIPL is comparable to the Vivado HLS OpenCV library without the need for language pragmas to guide hardware synthesis. The benchmarks also show that RIPL is more expressive than the Darkroom FPGA image processing language.

11:30
A new scalable approach for distributed metadata in HPC
SPEAKER: Julio Ortega

ABSTRACT. In recent years, not only has data-intensive storage grown, but compute-intensive workloads also demand high computing power and a high degree of parallelism with good performance and scalability. Many distributed filesystems have focused on how to distribute data across multiple processing nodes, but one of the main problems to solve is the management of an ever-greater number of metadata requests. In fact, some studies have identified optimized metadata management as a key factor in achieving good performance. Applications in high performance computing usually require filesystems able to provide a huge number of operations per second to achieve the required level of performance. Although metadata storage is smaller than data storage, metadata operations consume many CPU cycles, so a single metadata server is no longer sufficient. In this paper we define a completely distributed method that provides efficient metadata management and seamlessly adapts to general-purpose and scientific-computing filesystem workloads. Throughput is measured with a metadata benchmark and compared against several distributed filesystems. The results show great scalability for create operations on a single directory accessed by multiple clients.

12:00
Enabling Android-based devices to high-end GPGPUs

ABSTRACT. The success of Android is based on its unified Java programming model, which allows developers to write platform-independent programs for a variety of target platforms. However, this comes at the cost of performance and some important limitations in general-purpose GPU hardware integration. In this paper we describe the first, to the best of our knowledge, offloading platform that enables Android devices with no GPU support to run Nvidia CUDA kernels by migrating their execution to high-end GPGPU servers. The framework is highly modular and exposes a rich Application Programming Interface (API) to developers, making it highly transparent and hiding the complexity of the network layer. We present first preliminary results, showing that GPGPU offloading is not only possible but also promising in terms of performance.

10:30-12:30 Session 3B: ICA3PP: Parallel and Distributed Algorithms (I)
Location: Room Beatriz I
10:30
A GPU-based backtracking algorithm for permutation combinatorial problems

ABSTRACT. This work presents a GPU-based backtracking algorithm for permutation combinatorial problems based on the Integer-Vector-Matrix (IVM) data structure, a data structure dedicated to permutation combinatorial optimization problems. In this algorithm, load balancing is performed without CPU intervention, inside a work stealing phase invoked after each node expansion phase. The proposed work stealing approach uses a virtual n-dimensional hypercube topology and a triggering mechanism to reduce the overhead incurred by dynamic load balancing. We have implemented this new algorithm for solving instances of the Asymmetric Travelling Salesman Problem by implicit enumeration, a scenario where the cost of node evaluation is low compared to the overall search procedure. Experimental results show that the dynamically load-balanced IVM algorithm reaches speed-ups of up to 17x over a serial implementation using a bitset data structure and up to 2x over its GPU counterpart.

11:00
Buffer Minimization for Rate-optimal Scheduling of Synchronous Dataflow Graphs on Multicore Systems
SPEAKER: Mingze Ma

ABSTRACT. Streaming applications are generally modelled by dataflow graphs, among which Synchronous Dataflow Graphs (SDFGs) are one of the most popular models. Self-timed Execution (STE) based methods have proven very powerful for the analysis and scheduling of SDFGs. In this paper, an extension of STE is presented, based on which an exact algorithm is proposed to find the minimal memory usage for buffering that guarantees rate-optimal execution, that is, execution at maximal throughput, of an SDFG on a multicore system. The search process of the algorithm is shortened significantly by introducing effective accelerating methods. As only heuristic methods exist currently, this paper is the first work to give an exact solution to the buffer-minimization problem for rate-optimal scheduling on multicore systems. Experimental results show that in 15.57% of the considered cases, the proposed exact algorithm obtains lower buffer usage than a state-of-the-art heuristic algorithm, and equal results in the remaining cases. The buffer decrease is up to 56.50%, and 5.42% on average, for these improved cases. In addition, a heuristic is proposed which achieves lower buffer use in 21.83% of the experimental cases, with an average buffer decrease of 6.69% for the improved cases, when compared with a widely used state-of-the-art heuristic from the literature.

11:30
Light Loss-Less Data Compression, with GPU implementation

ABSTRACT. There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Less (LLL) data compression. It is designed so that decompression can be highly parallelized and run very efficiently on the GPU. This makes sense for many applications in which compressed data is read and decompressed many times, so that decompression is performed more frequently than compression. We first show optimal sequential and parallel algorithms for LLL decompression and implement them on the CPU and on the GPU, respectively. To show the potential of the LLL data compression method, we evaluated the running time using five images and compared it with the well-known compression methods LZW and LZSS. Our GPU implementation of LLL decompression runs 41.8-134 times faster than the CPU implementation. Moreover, our experiments show that LLL decompression on the GPU is 2.28-9.32 times faster than LZW decompression and 3.88-15.8 times faster than LZSS decompression, although their compression ratios are comparable.

12:00
Deterministic Construction of Regular Geometric Graphs with Short Average Distance and Limited Edge Length
SPEAKER: Koji Nakano

ABSTRACT. This paper proposes a deterministic method to construct 5-regular geometric graphs with short average distance under the constraints that the set of vertices is a subset of $\mathbb{N}\times\mathbb{N}$ and the length of each edge is at most 4. This problem is motivated by the design of efficient floor plans for parallel computers consisting of a number of computing nodes arranged on a two-dimensional array. In such systems, the degree of the vertices is determined by the number of ports of the routers, and the edge length is limited by a value determined by the cycle time. The goodness of the resulting geometric graph is evaluated by the average shortest path length (ASPL) between vertices, which reflects the average communication delay between computing nodes. The idea of the proposed method is to arrange basic components derived from a $(3,g)$-cage in a two-dimensional manner and to connect adjacent components by parallel edges of length 4 each. Numerical results show that the average distance in the resulting graph is close to the lower bound, with a gap of less than 0.98 when the number of vertices is 432000.

10:30-12:30 Session 3C: ICA3PP: Performance Modeling and Evaluation (I)
Location: Room Beatriz II
10:30
Synthetic Traffic Model of the Graph500 Communications
SPEAKER: Pablo Fuentes

ABSTRACT. As Big Data applications have gained momentum over the last years, the Graph500 benchmark has appeared in an attempt to steer the design of HPC systems towards maximizing performance under memory-constrained application workloads. These workloads have a significant impact on memory use and are also communication intensive. Realistic simulation of such benchmarks for architectural research is challenging, since full-system and trace-driven simulators severely limit the size and detail of the workloads that can be used to evaluate the impact of network improvements. Synthetic traffic workloads are one of the least resource-consuming methods of evaluating performance. In this work, we propose a synthetic traffic model that emulates the behavior of the Graph500 communications. This model has been obtained empirically through a characterization of several executions of the benchmark with different input parameters. We verify the validity of our model against a characterization of the execution of the benchmark with different parameters.

11:00
Road Segment Information based Named Data Networking for Vehicular Environments
SPEAKER: Junlan Xiao

ABSTRACT. Vehicular Ad Hoc Networks (VANETs) can provide strong support for intelligent transportation systems and Internet services. How to disseminate data to vehicular nodes has always been a core concern of VANET research. In this paper, we consider data dissemination via Named Data Networking (NDN), a new and promising Internet technology for realizing content-centric networking. NDN routes and forwards data according to the data content (or ID) rather than the address/location of the data, and can therefore disseminate and share data more efficiently. Our work focuses on how to cope with the mobility of vehicular nodes, which is not considered in the original NDN. Unlike existing NDN designs for vehicular environments, we propose to establish data dissemination routes using road segment information rather than node information. This new approach avoids the effect of topology changes on data routes, keeping the routes stable. A road segment route is instantiated as node routes upon data forwarding demands. The mechanism that realizes this instantiation is the major challenge addressed in our design. Simulations in ndnSIM clearly show that our design performs much better than existing ones.

11:30
A Distributed Formal Model for the Analysis and Verification of Arbitration Protocols on MPSoCs Architecture

ABSTRACT. In digital system design, access control protocols are used to allocate shared resources. Whenever a resource such as a bus is shared, an arbiter is required to grant access to the resource at a particular time. In SoC (System on Chip) architectures, the design, analysis and implementation of such an arbiter has become increasingly important due to its significant impact on the performance and efficiency of these systems. In this paper, we provide a high-level, abstract and formal model of MPSoC architectures. The proposed model provides a way to easily implement, analyze and compare different arbitration protocols. It also makes it possible to study relevant properties such as fairness, mutual exclusion and deadlock freedom at a high level of abstraction. The studied protocols, as well as the MPSoC architecture, have been modeled in a distributed manner, which makes the generation of a distributed implementation more relevant.

10:30-12:30 Session 3D: IUCC (I)
Location: Room Beatriz III
10:30
Mixed Reality Cubicle and Cave Automatic Virtual Environment
SPEAKER: Clement Onime

ABSTRACT. In Cave Automatic Virtual Environments (CAVEs), a computer-generated environment is projected all around a user to fully immerse them and eliminate all reference to the real world. Typically, Virtual Reality (VR) CAVEs also track and respond to the user’s physical orientation, movements and gestures. Mixed reality environments instead focus on combining real-world objects with computer-generated ones. In this paper, we focus on the application of Augmented Reality (AR) as a mixed reality technology on mobile devices such as head-mounted devices, smart-phones and tablets. We present the development of mixed reality applications for mobile (smart-phone and tablet) devices, leading up to the implementation of a mixed reality (AR) cubicle for immersive Three-Dimensional (3D) visualizations. We also present the results of a study on familiarity with both VR and AR technologies among students from two institutions of tertiary education. The paper concludes with a discussion of the planned deployment and upgrade of mixed reality cubicles using mobile VR equipment.

11:00
Toward Ubiquitous Data Processing
SPEAKER: Ichiro Satoh

ABSTRACT. Ubiquitous computing environments generate a large amount of data at the edges of networks, e.g., from sensors that measure or monitor the real world. To analyze such data, we typically need to transmit it to a cluster of high-performance servers, e.g., in data centers or cloud computing platforms, but the cost of this data transmission is heavy. This paper enables ubiquitous computing environments to analyze data in place using a popular approach from big data processing, e.g., MapReduce, which was designed for high-performance servers. The framework proposed in this paper deploys data processing programs at the nodes that contain the target data as a map step, executes the programs on the local data, and finally aggregates the results of the programs at certain nodes as a reduce step. The architecture of the framework, its basic performance, and an application are also described here.

11:30
BaaS-4US: A Framework to Develop Standard Backends as a Service for Ubiquitous Applications

ABSTRACT. The Internet of Things (IoT) is presented as the next computing and interaction paradigm to be adopted by society, after the current advances in Web, mobile and cloud-based technologies. However, a large number of open challenges must still be addressed for its successful deployment. Many proposals focus on hardware and low-level technical issues, but software design techniques and methods can also provide significant advantages by easing the development of IoT applications. This work introduces a framework intended to facilitate the development of standards-based backend service platforms. It follows the ideas behind the Internet of Services (IoS) paradigm and cloud computing, which aim to provide advanced storage, access and analysis of large amounts of ever-growing data.

12:30-14:00 Lunch Break
14:00-16:00 Session 4A: ICA3PP: Software Systems and Programming (II)
Location: Room Alejandra
14:00
Deciding Soundness for Workflow Nets Based on Branching Processes
SPEAKER: Guanjun Liu

ABSTRACT. Workflow nets (WF-nets) are widely used to model and analyse business process systems. As an important property of WF-nets, soundness requires that the system is both deadlock- and livelock-free and that each activity of the system has a chance to be executed. Van der Aalst has proven that the soundness problem is decidable, and we have also shown that it is PSPACE-complete for bounded nets. Soundness detection is generally carried out through the reachability graph, so the state explosion problem is a big obstacle to this technique. The unfolding technique can effectively avoid or alleviate the state explosion problem for Petri nets with many concurrent actions. This paper defines the “basic unfolding”, a finite prefix of the unfolding of a WF-net. Furthermore, a necessary and sufficient condition based on basic unfoldings is proposed for deciding soundness.
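Although the paper's contribution is the unfolding-based criterion, the soundness property itself can be illustrated by brute-force state exploration on a small 1-safe workflow net. The sketch below is only a naive reachability check under invented names (`soundness`, `fire`), not the authors' algorithm:

```python
from collections import deque

def soundness(transitions, initial, final):
    """Explicit-state soundness check for a small 1-safe workflow net.

    transitions: {name: (pre_places, post_places)} with frozenset places;
    a marking is a frozenset of marked places. Sound means: from every
    reachable marking the final marking stays reachable (no deadlock or
    livelock), and every transition can fire at least once.
    """
    def fire(marking, pre, post):
        if not pre <= marking:      # transition enabled only if pre-set marked
            return None
        return (marking - pre) | post

    # Explore all markings reachable from the initial marking.
    reachable, frontier, fired = {initial}, deque([initial]), set()
    while frontier:
        m = frontier.popleft()
        for name, (pre, post) in transitions.items():
            nxt = fire(m, pre, post)
            if nxt is not None:
                fired.add(name)
                if nxt not in reachable:
                    reachable.add(nxt)
                    frontier.append(nxt)

    def can_reach_final(start):
        seen, q = {start}, deque([start])
        while q:
            m = q.popleft()
            if m == final:
                return True
            for pre, post in transitions.values():
                nxt = fire(m, pre, post)
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    q.append(nxt)
        return False

    proper_completion = all(can_reach_final(m) for m in reachable)
    no_dead_transitions = fired == set(transitions)
    return proper_completion and no_dead_transitions
```

For the sequential net i -> t1 -> p -> t2 -> o this reports sound; adding a transition that consumes p without producing anything introduces a deadlock and the check fails. The unfolding-based condition in the paper decides the same property without enumerating all interleavings.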

14:30
Creating Distributed Execution Plans with BobolangNG

ABSTRACT. Execution plans constitute the traditional interface between DBMS front-ends and back-ends; similar networks of interconnected operators are found also outside database systems. Tasks like adapting execution plans for distributed or heterogeneous run-time environments require a plan transformation mechanism which is simple enough to produce predictable results while general enough to express advanced communication schemes required for instance in skew-resistant partitioning. In this paper, we describe the BobolangNG language designed to express execution plans as well as their transformations, based on hierarchical models known from many environments but enhanced with a novel compile-time mechanism of component multiplication. Compared to approaches based on general graph rewriting, the plan transformation in BobolangNG is not iterative; therefore the consequences and limitations of the process are easier to understand and the development of distribution strategies and experimenting with distributed plans are easier and safer.

15:00
A C++ Generic Parallel Pattern Interface for Stream Processing

ABSTRACT. Parallel programming frameworks greatly aid developers in implementing applications that exploit parallel hardware resources. Nevertheless, developers require extra expertise to properly use and tune them for specific parallel platforms. Furthermore, porting applications between different parallel programming models and platforms is not straightforward and in most cases requires considerable effort. Apart from that, the lack of high-level parallel-pattern abstractions in these frameworks increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GrPPI, a generic and reusable high-level parallel pattern interface for stream-based C++ applications. Thanks to its high-level C++ API, this interface allows users to easily expose parallelism in sequential applications using existing parallel frameworks, such as C++ threads, OpenMP and Intel TBB. We evaluate this approach on an image processing use case to demonstrate its benefits from the usability, flexibility, and performance points of view.

15:30
A Portable Lock-free Bounded Queue

ABSTRACT. Attaining efficient and portable lock-free containers is challenging as almost any CPU family implements slightly different memory models and atomic read-modify-write operations. C++11 offers a memory model and operation abstractions that enable portable implementations of non-blocking algorithms. In this paper, we present a first scalable and portable lock-free bounded queue supporting multiple readers and multiple writers. Our design uses unique empty values to decouple writing an element from incrementing the tail during enqueue. Dequeue employs a helping scheme that delays helping in the regular case, thereby reducing contention on shared memory. We evaluate our implementation on a range of architectures featuring weak and strong memory consistency models. Our comparisons with known blocking and lock-free designs demonstrate that the presented implementation scales well on architectures that implement a weak memory consistency model.

14:00-16:00 Session 4B: ICA3PP: Parallel and Distributed Algorithms (II)
Location: Room Beatriz I
14:00
Scaling DBSCAN-like algorithms for event detection systems in Twitter

ABSTRACT. The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast amount of tweets can no longer be performed on a single machine. Second, tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data as per its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian peninsula during the course of several football matches. The results point out the benefits of our proposal against other state-of-the-art techniques in terms of speed-up and detection accuracy.
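A density-aware partitioning of this kind can be sketched as a pre-processing step that buckets points into grid cells and tunes a per-cell DBSCAN radius to the local density. The snippet below is a hedged illustration with invented names (`partition_and_tune`) and an invented tuning heuristic, not the paper's scheme:

```python
import math
from collections import defaultdict

def partition_and_tune(points, cell_size, base_eps):
    """Bucket geo-points (lon, lat) into square grid cells and derive a
    per-cell DBSCAN eps that shrinks as local density grows.

    Returns (cells, params): cells maps a cell key to its points, and
    params maps the same key to that cell's eps. Each cell can then be
    clustered independently, e.g. as one MapReduce/Spark partition.
    """
    cells = defaultdict(list)
    for lon, lat in points:
        key = (int(lon // cell_size), int(lat // cell_size))
        cells[key].append((lon, lat))

    params = {}
    for key, pts in cells.items():
        # Denser cells get a tighter radius; sparse cells keep base_eps.
        params[key] = base_eps / math.sqrt(len(pts))
    return cells, params
```

In a distributed setting each (cell, points, eps) triple becomes the input to one local DBSCAN task; clusters that straddle cell borders would still need a merge step, which this sketch omits.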

14:30
Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems

ABSTRACT. Owing to rising awareness of environmental protection, high performance is not the only aim in system design; energy efficiency has increasingly become an important goal. In line with this goal, heterogeneous systems, which are more energy efficient than CPU-based homogeneous systems, occupy a growing proportion of the Top500 and Green500 lists. Nevertheless, the greater complexity of heterogeneous system design presents challenges in achieving a good tradeoff between performance and energy efficiency for applications running on such systems. To address this performance-energy tradeoff in CPU-GPU heterogeneous systems, we propose a novel feedback control optimization (FCO) method that alternates between device frequency scaling and division of the kernel workload between CPU and GPU. Given a kernel and a workload division, frequency scaling finds near-optimal core frequencies for the CPU and the GPU. Further, an iterative algorithm is proposed for finding a near-optimal workload division that balances the workload between CPU and GPU at the frequencies that were optimal for the previous division. The frequency scaling phase and the workload division phase alternate until the proposed FCO method converges on a configuration comprising a CPU core frequency, a GPU core frequency, and a workload division. Experiments show that, compared with the state-of-the-art GreenGPU method, performance can be improved by 7.9% while energy consumption is reduced by 4.16%.
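The alternating structure the abstract describes (tune frequencies for a fixed workload division, then the division for the fixed frequencies, until nothing changes) can be sketched as a coordinate-descent loop. All names here (`alternate_fco`, `measure`) are invented for illustration; this is not the paper's FCO controller:

```python
def alternate_fco(measure, freqs, divisions, max_iter=10):
    """Alternate between picking the best frequency setting for a fixed
    workload division and the best division for the fixed frequencies.

    measure(freq, division) returns a cost to minimize (e.g. an
    energy-delay product obtained from runtime feedback); freqs and
    divisions are the discrete candidate settings.
    """
    freq, div = freqs[0], divisions[0]
    for _ in range(max_iter):
        # Phase 1: frequency scaling for the current division.
        new_freq = min(freqs, key=lambda f: measure(f, div))
        # Phase 2: workload division at the frequency just chosen.
        new_div = min(divisions, key=lambda d: measure(new_freq, d))
        if (new_freq, new_div) == (freq, div):
            break                     # converged: configuration is stable
        freq, div = new_freq, new_div
    return freq, div
```

With a cost function whose optimum is at frequency 2 and division 0.6, the loop settles on that pair after one sweep; the real method measures the running kernel instead of evaluating a closed-form cost.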

15:00
Implementing Snapshot Objects on Top of Crash-Prone Asynchronous Message-Passing Systems
SPEAKER: Michel Raynal

ABSTRACT. Distributed snapshots, as introduced by Chandy and Lamport in the context of asynchronous failure-free message-passing distributed systems, are consistent global states through which the observed distributed application might have passed. It appears that two such distributed snapshots cannot necessarily be compared (in the sense of determining which of them is the “first”). In contrast, the snapshots introduced in asynchronous crash-prone read/write distributed systems are totally ordered, which greatly simplifies their use by upper-layer applications.

In order to benefit from shared memory snapshot objects, it is possible to simulate a read/write shared memory on top of an asynchronous crash-prone message-passing system, and then build snapshot objects on top of it. This algorithm stacking is costly in both time and messages. To circumvent this drawback, this paper presents algorithms that build snapshot objects directly on top of an asynchronous crash-prone message-passing system. “Directly” means here “without building an intermediate layer such as a read/write shared memory”. To the authors' knowledge, the proposed algorithms are the first to provide such constructions. Interestingly enough, these algorithms are efficient and relatively simple.

15:30
Towards Parallel CFD computation for the ADAPT framework
SPEAKER: Imad Kissami

ABSTRACT. In order to run Computational Fluid Dynamics (CFD) codes on large-scale infrastructures, parallel computing has to be used because of the computationally intensive nature of the problems. In this paper we investigate the ADAPT platform, in which we couple flow Partial Differential Equations with a Poisson equation. This leads to a linear system which we solve using direct methods. The implementation uses the MUMPS parallel multifrontal direct solver and mesh partitioning with METIS to improve the performance of the framework. We also investigate how the mesh partitioning methods are able to optimize the mesh cell distribution for the ADAPT solver. The experience gained in this work will facilitate the move to the 3D version of ADAPT and to a service-oriented view of ADAPT as future work.

14:00-16:00 Session 4C: IUCC (II)
Location: Room Beatriz III
14:00
An Automated Finger-Knuckle-Print Identification System Using Jointly RBF & RFT Classifiers

ABSTRACT. Ensuring the security of a wide variety of applications, such as public security, access control and banking, is becoming an increasingly important problem as modern technology is integrated into the majority of these applications. Thus, establishing the identity of a person is the main method used to ensure a high security level. As a security method, biometric identification has a long tradition and is synonymous with the uniqueness of a person. In this paper, based on the Finger-Knuckle-Print (FKP), we present a multimodal personal identification system using two feature extraction methods, with fusion applied at the matching score level. In this study, the segmented FKP is first represented by Histogram of Oriented Gradients (HOG) descriptors. Subsequently, the Radial Basis Function (RBF) and Random Forest Transform (RFT) models are used to design two sub-systems. The proposed method is validated for its efficacy on the available PolyU FKP database of 165 users. Our experimental results show the effectiveness and reliability of the proposed approach, which achieves both a high identification rate and high accuracy.

14:30
Bluetooth Low Energy based Occupancy Detection for Emergency Management

ABSTRACT. A reliable estimation of an area's occupancy can be beneficial to a large variety of applications, and especially in relation to emergency management. For example, it can help detect areas of priority and assign emergency personnel in an efficient manner. However, occupancy detection can be a major challenge in indoor environments. A recent technology that can prove very useful in that respect is Bluetooth Low Energy (BLE), which is able to provide the location of a user using information from beacons installed in a building. Here, we evaluate BLE as the primary means of occupancy estimation in an indoor environment, using a prototype system composed of BLE beacons, a mobile application and a server. We employ three machine learning approaches (k-nearest neighbours, logistic regression and support vector machines) to determine the presence of occupants inside specific areas of an office space and we evaluate our approach in two independent experimental settings. Our experimental results indicate that combining BLE with machine learning is certainly promising as the basis for occupancy estimation.
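Of the three classifiers evaluated, k-nearest neighbours is the simplest to sketch: a vector of received signal strengths (RSSI) from the installed beacons is matched against labelled training fingerprints. The snippet below is a hedged, from-scratch illustration with invented beacon values and names (`knn_predict`), not the paper's system:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify an RSSI fingerprint by majority vote of its k nearest
    training fingerprints.

    train: list of (rssi_vector, zone_label) pairs, one RSSI value per
    beacon; query: the fingerprint to classify. Uses squared Euclidean
    distance, which preserves the nearest-neighbour ordering.
    """
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A fingerprint close to the training samples of one zone is assigned to that zone even if a minority of the k neighbours disagree; in practice the training set would come from a calibration walk through the office space.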

15:00
Long Life Application Approach for user context management and situation understanding

ABSTRACT. Nowadays mobile devices host many applications that are downloaded and installed directly from stores. The existence of such a large number of apps for a myriad of purposes imposes a huge overhead on users, who are in charge of selecting, installing and executing the appropriate apps, and deleting them when no longer needed. These applications have mostly neglected the user's context, proposing static, non-evolving scenarios of use. To address these concerns, we propose the idea of a single app, constantly executed on the mobile device, that automatically handles software components according to the user's needs. The Long Life Application is a new way to respond to the user's needs dynamically, evolving at run time (including/excluding/migrating business functionalities and updating interaction modes) as the user moves through his surroundings, by understanding occurring events and building contextually described situations.

14:00-16:00 Session 4D: VEAI (I)
Location: Room Beatriz II
14:00
Versatile mixed reality educational spaces: a medical education implementation case

ABSTRACT. Education has tapped into gaming to increase engagement and facilitate knowledge retention through experiential modalities. There have been several high-technological-intensity digital laboratories where learners engage with the subject matter through various activities. There are also several low-intensity web-based solutions that familiarize learners with curriculum material, but in less experientially intensive ways. A ubiquitous route to impactful, low-cost, highly reusable gamified education is a versatile mixed reality educational environment. Using prolific technologies such as inexpensive AR headsets and a versatile, low-maintenance DB backend, a real-world environment can be transformed with 3D graphics and audio into various educational spaces with highly impactful content. Such a simple implementation is presented here for virtual patients in medicine: a simple DB backend supports a presentation frontend developed in Unity, utilizing the Vuforia mixed reality platform. Based on this implementation, future directions are explored.

14:30
Feature Extraction for Emotion Recognition and Modelling using Neurophysiological Data
SPEAKER: Leo Galway

ABSTRACT. The ubiquitous computing paradigm is becoming a reality; we are reaching a level of automation and computing in which people and devices interact seamlessly. However, one of the main challenges is the difficulty users have in interacting with these increasingly complex systems. Ultimately, endowing machines with the ability to perceive users' emotions will enable a more intuitive and reliable interaction. Consequently, using the electroencephalogram (EEG) as a bio-signal sensor, the affective state of a user can be modelled and subsequently utilised in order to achieve a system that can recognise and react to the user's emotions. In this context, this paper investigates feature vector generation from EEG signals for the purpose of affective state modelling based on Russell's Circumplex Model. Investigations are presented that aim to provide the foundation for future work in modelling user affect and interaction experiences through exploitation of different input modalities. The DEAP dataset was used within this work, along with a Support Vector Machine, which yielded reasonable classification accuracies for Valence and Arousal using feature vectors based on statistical measurements, band power from the alpha and beta waves, and Higher Order Crossings of the EEG signal.
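Band power is one of the feature types the abstract mentions. A minimal sketch (naive DFT in pure Python, not the authors' code) of extracting the power of an EEG segment within a frequency band:

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Power of `signal` within [f_lo, f_hi] Hz, computed with a
    naive DFT. `fs` is the sampling rate in Hz."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n)
                     for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n)
                      for t in range(n))
            power += (re * re + im * im) / (n * n)
    return power
```

For a pure 10 Hz tone sampled at 128 Hz, nearly all power lands in the 8-13 Hz (alpha) band; a real feature vector would collect such per-band powers per electrode.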

15:00
Learning Physics in an Immersive and Interactive Virtual Reality Laboratory

ABSTRACT. Understanding physical phenomena is still a challenging task. Three-dimensional interactive simulations are a valuable tool to support understanding. We implemented different physical simulations designed so that students can interact and learn with them. However, student engagement is crucial to support the learning process. One promising way to engage and immerse students in three-dimensional environments is virtual reality applications. In this paper we explore interactive virtual reality experiences implemented with the HTC Vive as an alternative form of learning tool that supports engagement. We examine the user experience in a physics laboratory presented in interactive and immersive VR, looking at engagement, motivation, usability and learning. First results indicate that such experiences are well suited as an addition to traditional in-class learning and support realistic laboratory setups and simulations in an engaging, interesting, and immersive way.

15:30
Engaging immersive video consumers: Challenges regarding 360-degree gamified video applications

ABSTRACT. 360-degree video is a new medium that has gained the attention of the research community, posing challenges for creating more interactive and engaging immersive experiences. The purpose of this study is to introduce a set of technical and design challenges for interactive, gamified 360-degree mixed reality applications that immerse and engage users. The development of gamified applications refers to the incorporation of game elements into the interaction design process to attract and engage the user through playful interaction with the virtual world. The study presents experiments incorporating a series of game elements, such as time pressure challenges, badges and user levels, storytelling narrative, and immediate visual feedback, into the interaction design logic of a mixed reality mobile gaming application that runs in an environment composed of 360-degree video and 3D computer-generated objects. The architecture and overall process for creating such an application are presented, along with a list of design implications and constraints. The paper concludes with future directions for improving the level of immersion and engagement of 360-degree video consumers.

16:00-16:30 Coffee Break
16:30-18:00 Session 5A: ICA3PP: Parallel and Distributed Architectures
Location: Room Alejandra
16:30
Optimized Mapping Spiking Neural Networks onto Network-on-Chip
SPEAKER: Yu Ji

ABSTRACT. Mapping spiking neural networks (SNNs) onto network-on-chips (NoCs) is pivotal to fully utilize the hardware resources of dedicated multi-core processors (CMPs) for SNN simulation. This paper presents such a mapping framework from the aspect of architecture evaluation. Under this framework, we present two strategies: the first tends to put highly communicating tasks together; the second takes the opposite approach, exploiting SNN features to achieve a balanced distribution of neurons according to their activity levels. For communication-intensive and unbalanced SNNs, the latter can alleviate NoC congestion and improve the simulation speed further. The framework also contains a customized NoC simulator to evaluate mapping strategies. Results show that our strategies can achieve a higher simulation speed (up to 1.37 times), while energy consumption is reduced or rises only marginally.
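The second (load-balancing) strategy can be sketched, under simplifying assumptions, as a greedy split of neurons across cores by activity; this is only an illustration of the idea, not the paper's algorithm:

```python
def balanced_mapping(firing_rates, n_cores):
    """Assign each neuron (index) to the core with the lowest
    accumulated activity, considering neurons in descending order of
    firing rate (a greedy longest-processing-time style split)."""
    loads = [0.0] * n_cores
    mapping = {}
    for neuron in sorted(range(len(firing_rates)),
                         key=lambda i: -firing_rates[i]):
        core = loads.index(min(loads))
        mapping[neuron] = core
        loads[core] += firing_rates[neuron]
    return mapping, loads
```

The communication-clustering strategy would instead group neurons by the weight of spike traffic between them, trading balance for locality.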

17:00
Intelligent SPARQL Endpoints: Optimizing Execution Performance by Automatic Query Relaxation and Queue Scheduling

ABSTRACT. The Web of Data is widely considered one of the major global repositories, populated with countless interconnected and structured data, and these linked datasets are continuously and sharply increasing in size. In this context, the SPARQL Protocol and RDF Query Language is commonly used to retrieve and manage stored data by means of SPARQL endpoints, a query processing service especially designed to provide access to these databases. Nevertheless, due to the large amount of data tackled by such endpoints and their structural complexity, these services usually suffer from severe performance issues, including inadmissible processing times. This work aims at overcoming this inefficiency by designing a distributed parallel system architecture that improves the performance of SPARQL endpoints by incorporating two functionalities: 1) a queuing system to avoid bottlenecks during the execution of SPARQL queries; and 2) an intelligent relaxation of the queries submitted to the endpoint at hand whenever the relaxation itself and the consequently lowered complexity of the query are beneficial for the overall performance of the system. To this end, the system relies on a two-fold optimization criterion: the minimization of the query running time, as predicted by a supervised learning model; and the maximization of the quality of the results of the query, as quantified by a measure of similarity. These two conflicting optimization criteria are efficiently balanced by two bi-objective heuristic algorithms sequentially executed over groups of SPARQL queries. The approach is validated on a prototype through several experiments that evince the promising performance of our scheme.
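The queuing functionality can be illustrated with a simple shortest-predicted-runtime-first queue (one plausible policy, sketched here; the paper's bi-objective heuristics are more elaborate, and `predict_runtime` stands in for its supervised learning model):

```python
import heapq

def schedule(queries, predict_runtime):
    """Order queries shortest-predicted-first so that cheap queries do
    not wait behind expensive ones at the endpoint."""
    heap = [(predict_runtime(q), i, q) for i, q in enumerate(queries)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, q = heapq.heappop(heap)
        order.append(q)
    return order
```

With query length as a stand-in cost model, a short `ASK` query would be dequeued before a broad `SELECT`.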

16:30-18:00 Session 5B: ICA3PP: Applications of Parallel and Distributed Computing (I)
Location: Room Beatriz I
16:30
Improving the Performance of Cardiac Simulations in a Multi-GPU Architecture Using a Coalesced Data and Kernel Scheme

ABSTRACT. Computational modeling is a useful tool to study many distinct and complex phenomena. In this work we focus on models that describe the electrical and mechanical behavior of the heart, under normal and pathological conditions. The high complexity of the associated biophysical processes translates into complex mathematical and computational models, which in turn translate into cardiac simulators that demand a lot of computational power to execute. Therefore, most state-of-the-art cardiac simulators are implemented to run on parallel architectures. In this paper we evaluate a new coalesced data and kernel scheme used to reduce the execution costs of cardiac simulations that run on multi-GPU environments. The new scheme was tested on an important part of the simulator, the solution of the systems of Ordinary Differential Equations (ODEs). The results show that the proposed scheme is very effective. The execution time to solve the systems of ODEs on the multi-GPU environment was reduced by half compared to a scheme that does not implement the proposed data and kernel coalescing. As a result, the total execution time of cardiac simulations was reduced by 25%.
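Data coalescing is, at bottom, a memory-layout choice. A toy sketch (illustrative only, not the paper's scheme) of converting an array-of-structures of per-cell ODE state into a structure-of-arrays layout, which is what lets GPU threads in a warp read contiguous memory:

```python
def to_soa(cells):
    """Transpose an array-of-structures (one state tuple per cardiac
    cell) into a structure-of-arrays layout: one contiguous array per
    state variable, so thread i reading variable j touches soa[j][i]."""
    n_vars = len(cells[0])
    return [[cell[j] for cell in cells] for j in range(n_vars)]
```

In an actual CUDA kernel the same transposition determines whether per-variable loads coalesce into a single memory transaction.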

17:00
An Efficient Implementation of LZW Compression in the FPGA
SPEAKER: Xin Zhou

ABSTRACT. The LZW compression algorithm is one of the most famous dictionary-based lossless compression algorithms. It is exploited in the UNIX file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters into a string of codes using a dictionary that maps strings to codes. The dictionary is created dynamically by repeatedly adding newly appearing substrings during the conversion. The main contribution of this paper is to present a new hardware architecture for accelerating LZW compression using an FPGA. In the proposed architecture, we efficiently use dual-port block RAMs embedded in the FPGA to implement a hash table that is used as the dictionary. Using the two independent ports of the block RAM, reading and writing operations on the hash table are performed simultaneously. Additionally, we can read eight values in the hash table in one clock cycle by partitioning the hash table into eight tables. A single proposed circuit runs faster than sequential LZW compression on a single CPU, where the clock frequency of the FPGA is 165.56MHz. Since the proposed hardware implementation of LZW compression is compactly designed, we have succeeded in implementing 24 identical circuits in an FPGA, where the clock frequency of the FPGA is 163.35MHz. Our implementation of 24 proposed circuits attains a speed-up factor of 23.51 over sequential LZW compression on a single CPU.
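For reference, the software algorithm being accelerated fits in a few lines; this is the textbook LZW, not the paper's FPGA design:

```python
def lzw_compress(data):
    """Textbook LZW: grow a dictionary of substrings while emitting codes."""
    dictionary = {chr(c): c for c in range(256)}
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # newly appearing substring
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary on the fly while decoding."""
    dictionary = {c: chr(c) for c in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = dictionary[code] if code in dictionary else w + w[0]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)
```

The dictionary lookup in the inner loop is exactly what the paper's hash table in dual-port block RAM parallelizes.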

17:30
Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows

ABSTRACT. The computational complexity and the constantly increasing amount of input data for scientific computing models are threatening their scalability. In addition, this is leading towards more data-intensive scientific computing, raising the need to combine techniques and infrastructures from the HPC and Big Data worlds. This paper presents a methodological approach to cloudify generalist iterative scientific workflows, with a focus on improving data locality and preserving performance. To evaluate this methodology, it was applied to a hydrological simulator, EnKF-HGS. The design was implemented using Apache Spark, and assessed in a local cluster and in Amazon Elastic Compute Cloud (EC2) against the original version to evaluate performance and scalability.

16:30-18:00 Session 5C: CSS 2016: 8th International Symposium on Cyberspace Safety and Security
Location: Room Beatriz III
16:30
An Empirical Study of Denial of Service (DoS) against VoIP
SPEAKER: James Yu

ABSTRACT. This research is motivated by persistent hacking activities against our VoIP servers, where we observed millions of intrusion attempts from all over the world. From the syslog data on a server, we observed the REGISTER flooding attack. We then conducted a more thorough study by capturing every incoming and outgoing SIP packet from the tcpdump data. In addition to the REGISTER flooding attack, we also identified the INVITE flooding attack. Our major findings are (1) a REGISTER flooding rate of 200 msg/sec has the potential to deplete CPU resources and cause a Denial of Service (DoS) attack, and (2) an INVITE flooding rate of only 110 msg/sec could cause a DoS attack because of process stack overflow. We also discuss different approaches to preventing these SIP-based DoS attacks.
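A message-rate check of the kind that could flag such floods can be sketched as a sliding-window counter (illustrative only; the class, threshold and API are hypothetical, not part of the study):

```python
from collections import deque

class FloodDetector:
    """Flag a source once its SIP message rate over a 1-second sliding
    window exceeds a threshold (e.g. on the order of the ~200 REGISTER
    msg/sec the study reports as enough to exhaust CPU)."""

    def __init__(self, threshold_per_sec):
        self.threshold = threshold_per_sec
        self.window = deque()  # timestamps of recent messages

    def on_message(self, timestamp):
        """Record one message; return True if the rate is excessive."""
        self.window.append(timestamp)
        while self.window and timestamp - self.window[0] > 1.0:
            self.window.popleft()
        return len(self.window) > self.threshold
```

In practice the counter would be kept per source address, since the observed attacks came from many distinct origins.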

17:00
Behaviour-based anomaly detection of cyber-physical attacks on a robotic vehicle

ABSTRACT. Security is one of the key challenges in cyber-physical systems, because by their nature any cyber attack against them can have physical repercussions. This is a critical issue for autonomous vehicles: if their communications or computation are compromised, they can cause considerable physical damage due to their mobility. Our aim here is to facilitate the automatic detection of cyber attacks on a robotic vehicle. For this purpose, we have developed a detection mechanism which monitors real-time data from a large number of sources onboard the vehicle, including its sensors, networks and processing. Following a learning phase, where the vehicle is trained in a non-attack state on what values are considered normal, it is then subjected to a series of different cyber-physical and physical-cyber attacks. We approach the problem as binary classification: whether the vehicle is able to self-detect when it is under attack. Our experimental results show that the approach is promising for most attacks that the vehicle is subjected to. We further improve performance by using weights that accentuate the less common anomalies, strengthening the detection mechanism against unknown attacks.
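The weighting idea in the last sentence can be illustrated by inverse-frequency class weights (a common scheme, assumed here for illustration; the abstract does not specify the exact weighting used):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency, so rare anomaly
    classes contribute more to the overall detection score."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}
```

Applied to a training log dominated by normal behaviour, the few attack samples receive the larger weight, which is the accentuation effect described.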

17:30
On The Evaluation of Security Properties of Containerized Systems

ABSTRACT. Security of virtualization systems has become a central topic in the field of computer security. Although such systems are widely deployed, a uniform approach to the definition and verification of their security is still missing.

In this paper we take the first step towards this unification. We consider two models for defining security threats for virtualisation systems, namely one defined by Reshetova et al. and one defined by the Common Criteria Recognition Arrangement. We argue that the two models are equivalent in the sense that they define the same security perimeter. Such equivalence makes it possible to derive the security of a system in one model given its security in the other model.

As a use case we consider the security of the Docker virtualisation system.

16:30-18:00 Session 5D: VEAI (II)
Location: Room Beatriz II
16:30
Interactive Virtual Archaeology: Constructing the prehistoric past at Avebury henge
SPEAKER: Liz Falconer

ABSTRACT. Avebury Henge is situated approximately 20 miles north of Stonehenge in Wiltshire, U.K. and contains the largest known Neolithic stone circle in Europe. It is part of the Avebury, Stonehenge and Associated Sites World Heritage Site but is less well-known than Stonehenge, despite being an impressive monument with a ditch and bank system more than 400m in diameter. In the past 20 years or so many computer-based simulations of archaeological sites around the world have been created, but many of these digital reconstructions lack both the ability to enable personal presence of the user in the virtual landscape itself, and the facility for users to interact with others and the virtual environment around them in real time. The project described in this paper aims to research the issues of presence and interaction in virtually reconstructed ancient landscapes through a reconstruction of Avebury Henge in a virtual world environment, as it might have been circa 2,300 BCE. This paper describes and evaluates the construction phase of the project in a virtual world environment.

17:00
An Evaluation of the Efficacy of a Perceptually Controlled Immersive Environment for Learning Acupuncture

ABSTRACT. This paper presents a basic but functional Perceptual User Interface (PUI) controlled immersive environment (IE) on an electronic learning platform (e-Learning) in order to deliver educational material relating to the NADA (National Acupuncture Detoxification Association) protocol for acupuncture. The purpose of this study is to compare the learning efficacy of the PUI-controlled IE e-Learning application with that of a typical Graphical User Interface (GUI) IE e-Learning application; both are also compared to a more traditional learning method. Additionally, this paper evaluates user interface (UI) sentiment towards the systems and determines any correlations with demographics and system preference.

17:30
Integrating virtual worlds with Learning Management Systems: the MULTIS approach

ABSTRACT. Learning Management Systems (LMS) provide minimal support for the educational use of virtual worlds. Existing integration efforts approach the problem from within the virtual world, providing hooks to services in the LMS and requiring teachers/trainers to set up and manage virtual world activities from within the virtual world platform. We present the inverse approach, leveraging the LMS to support virtual world educational/training activities. In our approach, the LMS automatically provides the teacher/trainer with setup, control, tracking, and storage. It is the result of a joint effort by academic and corporate teams, implemented in the Formare LMS for the OpenSimulator and Second Life Grid virtual world platforms. We explain, with concrete cases, how the MULTIS architecture can be used for integration; this approach can be implemented in other LMS and virtual world platforms to overcome the limitations of existing systems for the organizational management of e-learning activities.