Komadu Provenance Collection Tool

Collect and visualize provenance data

Komadu is a standalone provenance collection tool that can be added to an existing cyberinfrastructure for the purpose of collecting and visualizing provenance data.

Komadu supports the W3C PROV specification and is the successor to the Karma provenance tool, which is based on the Open Provenance Model (OPM).

How it works

Komadu provides a Web Services API and a Messaging API both for collecting provenance and for querying the collected data. Provenance collection is driven by notifications, each of which represents a particular event related to some activity, entity or agent. The query API is mainly used to find details about a particular activity, entity or agent and to generate the provenance graph for it.
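To make the notification model concrete, here is a minimal in-memory sketch. It is not Komadu's API: the `Notification` and `ProvenanceStore` names, and the choice to model each notification as a single PROV relation between two node IDs, are hypothetical. It only shows how relation-shaped notifications can populate typed Activity, Entity and Agent nodes that a "find details" query can then read back.

```python
from dataclasses import dataclass

# Hypothetical notification shape: one PROV relation between two nodes.
@dataclass(frozen=True)
class Notification:
    relation: str  # e.g. "used", "wasGeneratedBy", "wasAssociatedWith"
    source: str    # ID of the node the relation starts from
    target: str    # ID of the node the relation points to

# PROV node types implied by each relation: (type of source, type of target).
RELATION_TYPES = {
    "used": ("Activity", "Entity"),
    "wasGeneratedBy": ("Entity", "Activity"),
    "wasAssociatedWith": ("Activity", "Agent"),
    "wasAttributedTo": ("Entity", "Agent"),
}

class ProvenanceStore:
    """Hypothetical store: records node types and relations from notifications."""

    def __init__(self):
        self.node_types = {}  # node ID -> "Activity" | "Entity" | "Agent"
        self.edges = []       # every ingested Notification, in arrival order

    def ingest(self, n: Notification) -> None:
        src_type, dst_type = RELATION_TYPES[n.relation]
        self.node_types.setdefault(n.source, src_type)
        self.node_types.setdefault(n.target, dst_type)
        self.edges.append(n)

    def details(self, node_id: str) -> dict:
        """Return a node's PROV type and every relation it takes part in."""
        return {
            "type": self.node_types[node_id],
            "relations": [e for e in self.edges if node_id in (e.source, e.target)],
        }

store = ProvenanceStore()
store.ingest(Notification("used", "run-1", "input.csv"))
store.ingest(Notification("wasGeneratedBy", "output.csv", "run-1"))
print(store.details("run-1")["type"])
```

Keeping the node type implied by the relation means a client never has to send separate "declare node" messages, which is one plausible design; Komadu's actual notification schema may differ.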

Provenance visualization is most useful when it can handle very large structures, offer different views and support interactivity. Such visualization helps users navigate their experiment information with a mental map of what is happening in the experiment, compare different experimental runs quantitatively, and carry out model selection through effective collaboration between the user and the discovery system. Komadu comes with a command-line tool that converts a generated graph into a CSV (comma-separated values) file, which can be imported into most visualization tools, such as Cytoscape.

Key Features of Komadu Version 1.0

Support for W3C PROV specification

Komadu is fully compliant with the W3C PROV specification and generates provenance graphs according to the PROV-XML schema.

Easy to use API

Komadu's ingest and query APIs closely follow the W3C PROV standards, so they are easy to understand and use for anyone familiar with the specification.

Ability to collect provenance information of any application

Unlike Karma, Komadu is not tightly coupled to workflows and scientific provenance collection. The Komadu API uses generic terms and operations so that any type of provenance can be captured and queried.

Contextless graph generation

Karma generates provenance graphs based on a context workflow URI, which must be shared across all workflows and services within the system. Komadu removes this limitation: it can generate a graph starting from any given node by following that node's relationships.
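Contextless generation amounts to a graph traversal from a starting node. The sketch below is an illustration, not Komadu's algorithm: it assumes edges are `(relation, source, target)` tuples and, as an assumption of this sketch, follows relations in both directions with a breadth-first search, so the same call works whether the start node is an Activity, Entity or Agent.

```python
from collections import defaultdict, deque

Edge = tuple[str, str, str]  # (relation, source ID, target ID)

def subgraph_from(start: str, edges: list[Edge]) -> set[Edge]:
    """Collect every edge reachable from `start`, following relations both ways."""
    adjacent = defaultdict(list)  # node ID -> edges touching that node
    for edge in edges:
        _, src, dst = edge
        adjacent[src].append(edge)
        adjacent[dst].append(edge)

    seen_nodes = {start}
    found: set[Edge] = set()
    queue = deque([start])
    while queue:  # breadth-first search over nodes
        node = queue.popleft()
        for edge in adjacent[node]:
            found.add(edge)
            _, src, dst = edge
            for neighbour in (src, dst):
                if neighbour not in seen_nodes:
                    seen_nodes.add(neighbour)
                    queue.append(neighbour)
    return found

edges = [
    ("used", "run-1", "input.csv"),
    ("wasGeneratedBy", "output.csv", "run-1"),
    ("used", "run-2", "other.csv"),  # unrelated run: not reachable from output.csv
]
print(sorted(subgraph_from("output.csv", edges)))
```

No shared context URI is needed: the graph is whatever the start node's relationships reach. A real implementation might limit depth or direction (for example, ancestors only) for lineage queries.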

Support for Activity, Entity and Agent graphs

Komadu can generate the provenance graph starting from any node, whether it is an Activity, Entity or Agent. This allows users to track the lineage of any component they are interested in.

Backward compatibility for context graphs

Komadu still supports context-based graphs for workflows, which keeps it backward compatible with Karma.

Asynchronous notification processing

Komadu processes notifications asynchronously, which keeps ingestion fast and responsive: all incoming raw notifications are first stored in the database, and a background thread pool processes them later. The behaviour of the thread pool can be controlled through a set of parameters in the komadu.properties file.
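The store-first, process-later pattern can be sketched with a queue and a thread pool. This is a hypothetical illustration rather than Komadu's code: a `queue.Queue` stands in for the raw-notification table, the `workers` argument stands in for the thread-pool parameters in komadu.properties (whose real names are not shown here), and "processing" just splits a comma-separated notification into a tuple.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

class AsyncIngester:
    """Accept raw notifications immediately; process them later on a worker pool."""

    def __init__(self, workers: int = 4):
        self.raw = queue.Queue()  # stands in for the raw-notification table
        self.processed = []       # results produced by the background workers
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, notification: str) -> None:
        # Fast path for the caller: store the raw notification and return at once.
        self.raw.put(notification)
        # Schedule one background task; each task consumes exactly one item.
        self._pool.submit(self._process_one)

    def _process_one(self) -> None:
        notification = self.raw.get()
        result = tuple(notification.split(","))  # placeholder for real parsing
        with self._lock:
            self.processed.append(result)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)  # wait for pending notifications

ingester = AsyncIngester(workers=2)
for n in ["used,run-1,input.csv", "wasGeneratedBy,output.csv,run-1"]:
    ingester.submit(n)
ingester.shutdown()
print(sorted(ingester.processed))
```

Because workers run concurrently, results may be recorded in any order; a production design would also persist the queue so unprocessed notifications survive a restart.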

Customizable cache expiration

All generated graphs are cached in the database and reused until the cache interval expires. This interval can be configured through a parameter in the komadu.properties file.
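Time-based cache expiration can be sketched as follows. The `GraphCache` class is hypothetical and keeps entries in memory rather than in a database; `interval_seconds` plays the role of the configurable cache interval, and the injectable `clock` exists only so the behaviour can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional

class GraphCache:
    """Cache generated graphs and expire them after a fixed interval (seconds)."""

    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (stored_at, graph)

    def put(self, key: str, graph: str) -> None:
        self._entries[key] = (self.clock(), graph)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, graph = entry
        if self.clock() - stored_at >= self.interval:
            del self._entries[key]  # expired: caller must regenerate the graph
            return None
        return graph

now = [0.0]  # fake clock for the demonstration
cache = GraphCache(60, clock=lambda: now[0])
cache.put("run-1", "<prov:document/>")
now[0] = 30.0
print(cache.get("run-1"))  # still within the interval
now[0] = 61.0
print(cache.get("run-1"))  # expired
```

A longer interval saves regeneration work but may serve a graph that misses recently ingested notifications; that trade-off is what the configurable interval controls.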

Visualization of graphs

Komadu comes with a tool that converts the generated XML-based PROV graph into a CSV file, which can be imported into most visualization tools, such as Cytoscape.
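The conversion idea can be illustrated with a small PROV-XML-to-edge-list sketch. It is not Komadu's converter, and the `source,relation,target` column layout is an assumption of this sketch; it relies only on the standard PROV-XML structure, in which a relation element such as `prov:used` holds child elements whose `prov:ref` attributes name the related nodes. A table with source and target columns like this can be loaded into Cytoscape as a network.

```python
import csv
import io
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"

# Relation element -> (child holding the edge source, child holding the target).
RELATIONS = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasAssociatedWith": ("activity", "agent"),
}

def prov_xml_to_csv(xml_text: str) -> str:
    """Turn supported PROV-XML relations into a source,relation,target CSV."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    for elem in root:  # document order
        ns, _, local = elem.tag[1:].partition("}")
        if ns != PROV or local not in RELATIONS:
            continue  # node declarations and unsupported relations are skipped
        src_tag, dst_tag = RELATIONS[local]
        src = elem.find(f"{{{PROV}}}{src_tag}").get(f"{{{PROV}}}ref")
        dst = elem.find(f"{{{PROV}}}{dst_tag}").get(f"{{{PROV}}}ref")
        writer.writerow([src, local, dst])
    return out.getvalue()

SAMPLE = """<prov:document xmlns:prov="http://www.w3.org/ns/prov#" xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:input"/>
  <prov:activity prov:id="ex:run1"/>
  <prov:entity prov:id="ex:output"/>
  <prov:used>
    <prov:activity prov:ref="ex:run1"/>
    <prov:entity prov:ref="ex:input"/>
  </prov:used>
  <prov:wasGeneratedBy>
    <prov:entity prov:ref="ex:output"/>
    <prov:activity prov:ref="ex:run1"/>
  </prov:wasGeneratedBy>
</prov:document>"""

print(prov_xml_to_csv(SAMPLE))
```

Node attributes (labels, types) could go in a second CSV keyed by node ID if the visualization should colour Activities, Entities and Agents differently.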

Test cases to cover all ingest and query API operations

Komadu comes with a set of tests that cover almost all the supported operations. Users can follow these as examples when developing their own Komadu clients.

Contributors

  • Beth Plale, Project director
  • Isuru Suriarachchi
  • Quan Zhou
  • Devarshi Ghoshal

Licensing

The Komadu Provenance Collection Tool is licensed under the Apache License, Version 2.0 (the "License"). The code is copyrighted, and the copyright is owned by The Trustees of Indiana University. Komadu is a product of the Data to Insight Center of the Pervasive Technology Institute at Indiana University.

See digital data provenance for more information.