13:30 | Capturing Provenance for Runtime Data Analysis in Computational Science and Engineering Applications ABSTRACT. Capturing provenance data for runtime analysis presents several challenges in high-performance computational science and engineering applications. The main challenges are avoiding significant overhead in data capture, loading, and runtime query support, and coupling provenance capture mechanisms with applications built on highly efficient numerical libraries and visualization frameworks targeted at high-performance environments. This work presents DfA-prov, an approach to capturing provenance data and domain data aimed at high-performance applications. |
13:35 | UniProv - Provenance Management for UNICORE Workflows in HPC Environments ABSTRACT. The goal of comprehensive provenance tracking in the scientific environment should be the inclusion of the entire life cycle of data management. The data collection process thus begins with the registration of lab-generated or sensor-generated data, continues with organizing and managing data in storage repositories and processing analysis and simulation data on clusters and HPC systems, and ends with referencing and verifying computational results in scientific publications. In the associated provenance tracking life cycle, UniProv initially concentrates on the processing and simulation of data in scientific workflows, used in particular on supercomputers in the HPC environment. In this context, UniProv aims to create the core of a provenance management framework that can be extended to integrate different sources of the scientific provenance cycle. Here, UniProv should facilitate the creation, standardized formalization, storage, and retrieval of provenance information. |
13:40 | Towards a PROV Ontology for Simulation Models SPEAKER: Andreas Ruscheinski ABSTRACT. Simulation models and data are the primary products of simulation studies. Although the provenance of simulation data and the support of single simulation experiments have received a lot of attention, this is not the case for simulation models. The question of how a simulation model has been generated requires integrating diverse simulation experiments and entities at different levels of abstraction within and across entire simulation studies. Based on a concrete simulation model, we will use the PROV Data Model (PROV-DM) and illuminate the benefits of the PROV-DM approach to identify and relate entities and activities that contributed to the generation of a simulation model, thereby taking first steps in defining a PROV-DM ontology for simulation models. |
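The core PROV-DM idea this abstract relies on, relating entities and activities that contributed to a model, can be illustrated with a minimal stand-alone sketch. The statements below use plain Python tuples rather than a PROV library, and all names (parameter set, calibration run, reference data) are invented for illustration, not taken from the paper.

```python
# Minimal sketch of PROV-DM-style statements for a simulation study.
# Entities, activities, and the 'used' / 'wasGeneratedBy' relations are
# stored as plain sets of tuples; all concrete names are illustrative.

def entity(prov, name):
    prov["entities"].add(name)

def activity(prov, name):
    prov["activities"].add(name)

def used(prov, act, ent):
    prov["used"].add((act, ent))

def was_generated_by(prov, ent, act):
    prov["wasGeneratedBy"].add((ent, act))

def generation_chain(prov, ent):
    """Trace back which entities transitively contributed to `ent`."""
    sources, frontier = set(), {ent}
    while frontier:
        e = frontier.pop()
        for (e2, act) in prov["wasGeneratedBy"]:
            if e2 == e:
                for (act2, src) in prov["used"]:
                    if act2 == act and src not in sources:
                        sources.add(src)
                        frontier.add(src)
    return sources

prov = {"entities": set(), "activities": set(),
        "used": set(), "wasGeneratedBy": set()}
entity(prov, "parameter_set")
entity(prov, "reference_data")
entity(prov, "simulation_model")
activity(prov, "calibration_run")
used(prov, "calibration_run", "parameter_set")
used(prov, "calibration_run", "reference_data")
was_generated_by(prov, "simulation_model", "calibration_run")

print(sorted(generation_chain(prov, "simulation_model")))
# prints ['parameter_set', 'reference_data']
```

Answering "how was this model generated?" then reduces to following `wasGeneratedBy` and `used` edges backwards, which is exactly the integration across experiments the abstract describes.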
13:45 | Capturing the Provenance of Internet of Things Deployments ABSTRACT. This paper introduces the System Deployment Provenance Ontology and an associated set of provenance templates. These can be used to describe Internet of Things deployments. |
13:50 | Towards Transparency of IoT Message Brokers ABSTRACT. In this paper, we propose an ontological model for documenting the provenance of MQTT message brokers to enhance the transparency of interactions between IoT agents. |
13:55 | Provenance-based Root Cause Analysis for Revenue Leakage Detection: Telecommunication Case Study ABSTRACT. The Revenue Assurance (RA) function represents a top priority for most telecom operators worldwide. Revenue leakage, if not prevented, can cause significant revenue loss for an operator and, depending on the severity of the leakage, affect its profitability and continuity. Detecting and preventing revenue leakage is a key process for assuring the efficiency, accuracy, and effectiveness of telecom systems and processes. There are two general revenue leakage detection approaches: big data analytics and rule-based. Both seek to detect abnormal usage, profit trend behavior, and revenue leakage based on certain patterns or predefined rules; however, both are mainly human-driven and fail to automatically debug and drill down to the root causes of leakage anomalies and issues. In this work, a rule-based RA approach that deploys a provenance-based model is proposed. This model represents the workflow of critical RA functions, enriched with contextual and semantic information that may detect implied critical leakage issues and generate potential leakage alerts. A query model is developed for the provenance model that can be applied over the captured data to automate, facilitate, and improve the current process of investigating, debugging, and drilling down into the root causes of revenue leakages. |
14:00 | Implementing Data Provenance in Health Data Analytics Software ABSTRACT. Data provenance is a technique that describes the history of digital objects. In health applications, it can be used to deliver auditability and transparency, leading to increased trust in software. When implementing provenance in end-user scenarios, on top of standard provenance requirements, it is important to properly contextualize the provenance features within the domain and ensure their usability. We have developed a novel user interface, embedded into the Imolytics data analysis tool and based on our Provenance Template technology, to help the end user consume provenance information. In this demonstration, we show how the interface can be used to examine the audit trail of analysis results and to spot when two analytical methods start producing different results. In addition to the novel provenance UI, this is the first implementation of standards-based data provenance in a commercial data analytics software tool. |
14:05 | Case-Based Reasoning decision support using the DecPROV ontology for decision modelling ABSTRACT. Decisions are modelled using a new, specialised Semantic Web provenance ontology. This allows decisions to be managed in graph databases and common instance components to be globally addressed. New decisions are compared to those in a Case Base to provide best-practice advice. This is a Decision Support System (DSS) which also assists other DSSs by revealing contemporary practice in standardised ways, with details for decision categorisation. |
14:10 | Research Data Alliance's Provenance Pattern Working Group ABSTRACT. The Research Data Alliance (RDA) currently has a Working Group called Provenance Patterns that is collecting use cases about, patterns for, and implementations of provenance in order to capture the state of the art across research data institutions and to share best practices to improve community skill. The RDA is not a standards body but a practice body, so its working groups, like Provenance Patterns, are keen to promote existing standards and tools and to provide feedback to standards organisations, tool makers, and researchers, so that current standards and tools can be improved and the next generation created with a broad set of inputs. This poster will convey the specific goals and timelines of the Working Group. |
14:15 | Bottleneck Patterns in Provenance ABSTRACT. A bottleneck, in general, is a point of congestion in a system which impacts its efficiency and productivity and may lead to delays. Identifying and then fixing bottlenecks is an important step in maintaining and improving a system. To detect bottlenecks, we must understand the flow of processes and the dependencies between resources; thus, provenance information is an appropriate form of input to address this matter. In this paper, bottleneck patterns based on provenance graphs are proposed. These patterns are used to define the structures bottlenecks may take based on their classification, and they offer a way to detect possible bottlenecks. An example from soybean distribution is used to illustrate this preliminary work. |
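One structural form such a pattern could take, a node through which many dependency edges converge, can be sketched in a few lines. This is a guess at the flavour of the approach, not the paper's patterns: the edge data, the fan-in criterion, and the threshold below are all invented for illustration.

```python
# Illustrative sketch: flag potential bottlenecks in a provenance graph as
# nodes whose fan-in (number of incoming dependency edges) meets a threshold.
# The soybean-distribution edges and the threshold are invented.

from collections import Counter

def bottleneck_candidates(edges, threshold=3):
    """edges: (source, target) dependency pairs; returns overloaded targets."""
    fan_in = Counter(target for _, target in edges)
    return {node for node, degree in fan_in.items() if degree >= threshold}

edges = [
    ("farm_a", "silo"), ("farm_b", "silo"), ("farm_c", "silo"),
    ("farm_d", "silo"), ("silo", "port"), ("port", "ship"),
]
print(bottleneck_candidates(edges))  # {'silo'}
```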
14:20 | Architecture for Template-driven Provenance Recording ABSTRACT. Provenance templates define abstract patterns of provenance data and have been shown to be useful when implementing support for provenance capture in existing software tools. Their strength is in exposing only the relevant provenance capture actions through a service interface, whilst hiding the complexities associated with managing the provenance data. We present an architecture for the creation and management of libraries of provenance documents constructed using such templates. |
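The template mechanism this abstract builds on, an abstract provenance graph whose variables are later bound to concrete values, can be illustrated with a toy sketch. The `var:` placeholder syntax and all names here are assumptions for illustration, not the notation used by the authors' Provenance Template technology.

```python
# Minimal sketch of template-driven provenance recording: a template is a
# list of labelled edges whose labels may contain variables ("var:x"), and
# instantiation binds those variables to concrete values.
# The "var:" syntax and all names are illustrative assumptions.

def instantiate(template_edges, bindings):
    """Replace 'var:x' placeholders in a template with bound values."""
    def resolve(label):
        if label.startswith("var:"):
            return bindings[label[4:]]
        return label
    return [(resolve(s), rel, resolve(t)) for s, rel, t in template_edges]

template = [
    ("var:result", "prov:wasGeneratedBy", "var:run"),
    ("var:run", "prov:used", "var:dataset"),
]
doc = instantiate(template, {"result": "plot_42",
                             "run": "analysis_run_7",
                             "dataset": "cohort_2018"})
print(doc)
```

A capture service then only needs to accept bindings for a named template, which is the "expose only the relevant capture actions" idea in the abstract.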
14:25 | Combining Provenance Management and Schema Evolution ABSTRACT. The combination of provenance management and schema evolution using the CHASE algorithm is the focus of our research in the area of research data management. The aim is to combine the construction of a CHASE inverse mapping to calculate the minimal part of the original database -- the "minimal sub-database" -- with a CHASE-based schema mapping for schema evolution. |
14:30 | Provenance for Entity Resolution ABSTRACT. Data provenance can support the understanding and debugging of complex data processing pipelines, which are, for instance, common in data integration scenarios. One task in data integration is entity resolution (ER), i.e., the identification of multiple representations of the same real-world entity. This paper focuses on provenance modeling and capture for typical ER tasks. While our definition of ER provenance is independent of the actual language or technology used to define an ER task, the method we implement as a proof of concept instruments ER rules specified in HIL, a high-level data integration language. |
14:35 | Where Provenance in Database Storage ABSTRACT. Where-provenance is a relationship between a data item and the location from which this data was copied. In a DBMS, a typical use case is the connection that exists between the output of a query and the particular data value(s) that originated it. Normal DBMS operations create a variety of auxiliary copies of the data (e.g., indexes, materialized views (MVs), cached copies). These copies exist over time with relationships that evolve continuously: A) indexes maintain the copy with a reference to the origin value, B) MVs maintain the copy without a reference to the source table, C) cached copies are created once and are never maintained. Typically, this where-provenance is not computed or maintained. We show that forensic analysis of storage can derive the where-provenance of data items, and we show how this computed where-provenance can be useful for forensic reports and evidence from corrupted databases, or for the validation and repair of tampered DBMS storage. |
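The copy-to-origin relationship at the heart of this abstract can be sketched as a simple chain of storage locations. This is a minimal illustration of the concept only; the table, index, and page/slot names are invented, and the paper derives these links forensically from raw storage rather than recording them as below.

```python
# Sketch of where-provenance across auxiliary DBMS copies: each copy records
# the storage location it was copied from, and tracing follows the chain
# back to the original location. All location strings are invented.

class WhereProvenance:
    def __init__(self):
        self.origin = {}                  # copy location -> source location

    def record_copy(self, copy_loc, source_loc):
        self.origin[copy_loc] = source_loc

    def trace(self, loc):
        """Follow copy links back to the original storage location."""
        while loc in self.origin:
            loc = self.origin[loc]
        return loc

wp = WhereProvenance()
wp.record_copy("index:emp_name/page7", "table:emp/page2/slot5")
wp.record_copy("cache:q42/row1", "index:emp_name/page7")
print(wp.trace("cache:q42/row1"))  # table:emp/page2/slot5
```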
14:40 | Streaming Provenance Compression ABSTRACT. Operating system data provenance has a range of applications, such as security monitoring, debugging heterogeneous runtime environments, and profiling complex applications. However, fine-grained collection of provenance over extended periods of time can result in large amounts of metadata. Xie et al. describe an algorithm that leverages the subgraph similarity in provenance graphs and locality of reference to perform batch compression. We build on their effort to construct an online version that can perform streaming compression in SPADE. Further, our approach also provides performance and compression improvements over their baseline. |
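The intuition behind similarity- and locality-based compression can be conveyed with a toy sketch: when a streamed node's dependency set was seen recently, emit a back-reference instead of repeating it. This only illustrates the idea; the actual algorithm in SPADE (after Xie et al.) is considerably more sophisticated, and all node names and the window size below are invented.

```python
# Toy sketch of locality-based streaming compression of provenance records:
# if a node's parent set matches one seen within a sliding window, emit a
# back-reference to the node that introduced it instead of the full set.
# Illustrative only; not the SPADE algorithm.

from collections import OrderedDict

def compress_stream(nodes, window=64):
    """nodes: iterable of (node, frozenset_of_parents); yields compact records."""
    recent = OrderedDict()            # parent set -> node that introduced it
    for node, parents in nodes:
        if parents in recent:
            yield (node, "ref", recent[parents])    # back-reference
        else:
            yield (node, "full", parents)           # full record
            recent[parents] = node
            if len(recent) > window:
                recent.popitem(last=False)          # evict oldest entry

stream = [("p1", frozenset({"bash"})),
          ("p2", frozenset({"bash"})),
          ("p3", frozenset({"bash", "cron"}))]
print(list(compress_stream(stream)))
```

Processes spawned by the same parent in quick succession share parent sets, which is the locality of reference the compression exploits.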
14:45 | Evidence of Power-law structure in Provenance graphs ABSTRACT. One of the major issues with system-based provenance is the storage and processing of traces in provenance-related workloads. The generalised representation for Whole-system-Provenance workloads is the graph. These graphs are not well understood, and current work focuses on their extraction and processing without a thorough characterisation being in place. This paper studies the topology of such graphs and discusses the implications of the resulting understanding. We analyse multiple Whole-system-Provenance graphs, derived from tracing processes running on machines, and discuss their structural and topological properties. Our observations allow for a novel understanding of the structure of Whole-system-Provenance graphs: we suggest that the graphs so generated have properties similar to those of a larger class of graphs, namely power-law graphs. Despite having time-evolving properties, they remain true to the properties of power-law graphs. |
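The kind of degree analysis behind such a claim can be sketched briefly: compute the degree distribution of a provenance-like graph and observe that a few hub nodes account for most edges, the signature a power-law tail would imply. The synthetic star-shaped trace below (one busy process touching many files) is invented for illustration and is not data from the paper.

```python
# Sketch of degree-distribution analysis on a provenance-like graph.
# A power-law structure shows up as a few very-high-degree hubs among
# many low-degree nodes. The synthetic edges below are invented.

from collections import Counter

def degree_distribution(edges):
    """Return an (undirected) degree count per node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Star-like synthetic trace: one busy process touching many files.
edges = [("proc0", f"file{i}") for i in range(50)]
edges += [(f"proc{i}", f"file{i}") for i in range(1, 5)]

deg = degree_distribution(edges)
hub = max(deg, key=deg.get)
print(hub, deg[hub])  # proc0 50
```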
14:50 | Quine: a Temporal Graph System for Provenance Storage and Analysis ABSTRACT. This demonstration introduces “Quine”, a prototype graph database and processing system designed specifically for provenance analysis, with capabilities that include: fine-grained graph versioning to support querying historical data after it has changed; standing queries to execute callbacks as data matching arbitrary queries is streamed in; and queries through time to express arbitrary causal ordering on past data. The system uses a novel combination of schema-less data storage and a strongly-typed query language to enable well-typed analyses of structures and types unexpected when the database was initialized. The system is designed to handle very large data, with support for partitioning the graph to run across any number of hosts/shards on a network. |
14:55 | A Graph Testing Framework for Provenance Network Analytics ABSTRACT. Provenance Network Analytics is a method of analyzing provenance that assesses a collection of provenance graphs by training a machine learning algorithm to make predictions about the characteristics of data artefacts based on their provenance graph metrics. The shape of a provenance graph can vary according to the modelling approach chosen by data analysts, which is likely to affect the accuracy of machine learning algorithms. We therefore propose a framework for capturing provenance using semantic web technologies that allows the use of multiple provenance models at runtime in order to test their effects. |
15:00 | Provenance for astrophysical data ABSTRACT. In the context of astronomy projects, provenance information is important to enable scientists to trace back the origin of a dataset, e.g. an image, spectrum, catalog, or a single point in a spectral energy distribution diagram or a light curve. It is used to learn about the people and organizations involved in a project and to assess the quality of the dataset as well as its usefulness for their own scientific work. As part of the data model group in the International Virtual Observatory Alliance (IVOA), we are working on the definition of a provenance data model for astronomy, which shall describe how provenance metadata can be modeled, stored, and exchanged. The data model is being implemented for different projects and use cases so we can ensure its applicability and suitability to real-world problems. |
15:05 | Data Provenance in Agriculture ABSTRACT. Soils are probably the most important natural resource in agriculture, and soils security is one of the most critical growing global issues. Soils security is an emerging concept motivated by sustainable development. Soils experiments require huge amounts of high-quality data and are very hard to reproduce, yet there are few studies about the provenance of such experiments. We present OpenSoils, which shares curated soil data and knowledge about data-centric soils experiments. OpenSoils is a provenance-oriented and lightweight computational e-infrastructure that collects, stores, describes, curates, and harmonizes various soil datasets. OpenSoils is one of the first open-science-based computational frameworks for soils security in the literature. |
15:10 | Extracting Provenance Metadata from Privacy Policies ABSTRACT. Privacy policies are legal documents that describe activities over personal data, such as its collection, usage, processing, sharing, and storage. Expressing this information as provenance metadata can aid in legal accountability as well as in modelling data usage in real-world use cases. In this paper, we describe our early work on the identification, extraction, and representation of provenance information within privacy policies. We discuss the adoption of entity extraction approaches using concepts and keywords defined by the GDPRtEXT resource, along with the annotated privacy policy corpus from the UsablePrivacy project. We use the previously published GDPRov ontology (an extension of PROV-O) to model the provenance information extracted from privacy policies. |
15:15 | Provenance-Enabled Stewardship of Human Data in the GDPR era ABSTRACT. Within life-science research, the upcoming EU General Data Protection Regulation (GDPR) has a significant operational impact on organisations that use and exchange controlled-access Human Data. One implication of the GDPR is data bookkeeping. In this poster, we describe a software tool, the Data Information System (DAISY), designed to record the data-protection-relevant provenance of Human Data held and exchanged by research organisations. |