
11:30-13:00 Session 13: TaPP - Research session I: Provenance use cases and applications
Using Provenance for Generating Automatic Citations

ABSTRACT. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While an entire Research Object may be citable using a URI or a DOI, given the diversity of elements, it is often desirable to cite sub-components of a research object. In this paper, we present an approach for automatically generating citations of sub-components of a research object by using recorded provenance traces of a research object. These human readable citations can help identify, authorize, date and retrieve the published sub-components of research objects, which can themselves be large. Finally, the generated citations are suggestions that can be grouped and combined for a higher level citation.

Pointer Provenance in a Capability Architecture

ABSTRACT. We design and implement a framework for tracking pointer provenance, using our CHERI fat-pointer capability architecture to facilitate analysis of security implications of program pointer flows in both userspace and privileged code, with minimal instrumentation. CHERI enforces pointer provenance validity at the architectural level, in the presence of complex pointer arithmetic and type casting. CHERI features present new opportunities for provenance research: we discuss possible analysis techniques and highlight lessons and open questions from our work.

Provenance-based Intrusion Detection: Opportunities and Challenges

ABSTRACT. Attackers constantly evade intrusion detection systems as new attack vectors sidestep their defense mechanisms. Provenance provides a detailed, structured history of the interactions of digital objects within a system. It is ideal for intrusion detection as it offers a holistic, attack-vector-agnostic view of system execution. We believe that graph analysis on provenance graphs fundamentally strengthens detection robustness. Towards this goal, we discuss opportunities and challenges associated with provenance-based intrusion detection and offer our insights based on our past experience.

14:00-15:30 Session 14: TaPP - Keynote: Provenance and Probabilities in Relational Databases: From Theory to Practice
Provenance and Probabilities in Relational Databases: From Theory to Practice

ABSTRACT. We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
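As a rough illustration of the pipeline this abstract outlines (provenance annotations first, probabilities second), the sketch below computes Boolean provenance in disjunctive normal form for a join-then-project query and then derives the probability of an output tuple by brute-force enumeration of possible worlds. It is not taken from the talk: the relations `R` and `S`, the tuple identifiers `r1`, `r2`, `s1`, `s2`, the probabilities, and all function names are assumptions made for the example, and practical systems would use more compact representations such as circuits rather than enumeration.

```python
from itertools import product

# Boolean provenance in DNF: a set of monomials, each monomial a frozenset
# of tuple identifiers that must all be present for the output to exist.

def p_or(a, b):
    """Alternative derivations (union / projection): keep both sets of monomials."""
    return a | b

def p_and(a, b):
    """Joint derivations (join): combine every monomial of a with every monomial of b."""
    return {m1 | m2 for m1 in a for m2 in b}

def join_project(r, s):
    """Join R(a, b) with S(b, c) on b, then project onto (a, c)."""
    out = {}
    for (a, b), prov_r in r.items():
        for (b2, c), prov_s in s.items():
            if b == b2:
                key = (a, c)
                out[key] = p_or(out.get(key, set()), p_and(prov_r, prov_s))
    return out

def probability(prov, probs):
    """Probability that the DNF holds, assuming independent tuples.

    Enumerates all possible worlds, so it is exponential in the number of
    variables and only suitable for tiny examples.
    """
    variables = sorted({v for m in prov for v in m})
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        present = {v for v, keep in zip(variables, world) if keep}
        if any(m <= present for m in prov):
            weight = 1.0
            for v, keep in zip(variables, world):
                weight *= probs[v] if keep else 1.0 - probs[v]
            total += weight
    return total

# Hypothetical input relations, each tuple annotated with its own identifier.
R = {("a1", "b1"): {frozenset({"r1"})}, ("a1", "b2"): {frozenset({"r2"})}}
S = {("b1", "c1"): {frozenset({"s1"})}, ("b2", "c1"): {frozenset({"s2"})}}

result = join_project(R, S)
prov = result[("a1", "c1")]          # (r1 AND s1) OR (r2 AND s2)
probs = {"r1": 0.5, "r2": 0.5, "s1": 0.5, "s2": 0.5}
p = probability(prov, probs)
print(sorted(sorted(m) for m in prov), p)
```

The same `join_project` skeleton would work with other annotation domains by swapping `p_or` and `p_and` for another semiring's addition and multiplication, for example integer `+` and `*` to count derivations.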

16:00-17:00 Session 15: TaPP - Research session II: Provenance enabled systems
Curator: Provenance Management for Modern Distributed Systems

ABSTRACT. Data provenance is a valuable tool for protecting and troubleshooting distributed systems. Careful design of the provenance components reduces the impact on the design, implementation, and operation of the distributed system. In this paper, we present Curator, a provenance management toolkit that can be easily integrated with microservice-based systems and other modern distributed systems. This paper describes the design of Curator and discusses how we have used Curator to add provenance to distributed systems. We find that our approach results in no changes to the design of these distributed systems and minimal additional code and dependencies to manage. In addition, Curator uses the same scalable infrastructure as the distributed system and can therefore scale with the distributed system.

Wrattler: Reproducible, live and polyglot notebooks

ABSTRACT. Notebook systems such as Jupyter have become a popular programming environment for data science because they support interactive data exploration and provide a convenient way of interleaving code, comments and visualizations. However, most notebook systems use an architecture that makes reproducibility and versioning difficult and limits the interaction model.

In this paper, we present Wrattler, a new notebook system built around provenance that addresses the above issues. Wrattler separates state management from script evaluation and controls the evaluation using a dependency graph maintained in the web browser. This allows richer forms of interactivity and efficient evaluation through caching, guarantees reproducibility, and makes it possible to support versioning.
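The idea of driving evaluation from a dependency graph with caching can be sketched as follows. This is a minimal illustration and not Wrattler's actual implementation: the `Notebook` class, its methods, and the hashing scheme are hypothetical. Each cell is identified by a hash of its source text combined with the hashes of the cells it depends on, so an edit invalidates only the edited cell and its dependents, while results for earlier versions stay in the cache.

```python
import hashlib

class Notebook:
    """Hypothetical notebook that caches cell results by dependency-aware hash."""

    def __init__(self):
        self.cells = {}       # cell name -> (source text, dependency names, function)
        self.cache = {}       # node hash -> computed value
        self.evaluations = 0  # number of times a cell body actually ran

    def define(self, name, source, deps, fn):
        """Add or replace a cell; fn receives the values of deps in order."""
        self.cells[name] = (source, deps, fn)

    def node_hash(self, name):
        """Hash of the cell's source plus the hashes of everything it depends on."""
        source, deps, _ = self.cells[name]
        h = hashlib.sha256(source.encode())
        for dep in deps:
            h.update(self.node_hash(dep).encode())
        return h.hexdigest()

    def evaluate(self, name):
        """Return the cell's value, running it only if no cached result matches."""
        key = self.node_hash(name)
        if key not in self.cache:
            _, deps, fn = self.cells[name]
            args = [self.evaluate(dep) for dep in deps]
            self.cache[key] = fn(*args)
            self.evaluations += 1
        return self.cache[key]

nb = Notebook()
nb.define("data", "[1, 2, 3]", [], lambda: [1, 2, 3])
nb.define("total", "sum(data)", ["data"], lambda data: sum(data))
nb.define("label", "'report'", [], lambda: "report")

first = nb.evaluate("total")   # runs "data" and "total"
nb.evaluate("label")           # runs "label"

# Editing "data" changes its hash and, transitively, the hash of "total".
nb.define("data", "[1, 2, 3, 4]", [], lambda: [1, 2, 3, 4])
second = nb.evaluate("total")  # re-runs "data" and "total" only
nb.evaluate("label")           # unchanged hash, served from the cache
print(first, second, nb.evaluations)
```

Because cache keys depend only on source text and upstream hashes, reverting `data` to its earlier source would find the previous results still cached, which hints at how such a design can support versioning.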