Persistent Identifiers (PIDs) and Data Management - Experiences and Use Cases

Streaming Sensor Data 

The use case focused on air quality sensor data collected from devices installed mostly in Taiwan by the air quality microsensing project (https://pm25.lass-net.org/), a collaboration between Taiwanese academic institutions, the computer industry, and the Taipei city government. The streaming data for our use case was collected in 2017.

To enable easy referencing and re-use from a repository, PIDs were assigned to daily sensor feeds of the data. Ongoing research and testing focus on the RDA data collections API, software that facilitates the management of collections and their member objects and provides flexible search functionality for the data objects they represent.
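As a rough illustration of the idea, the sketch below groups sensor readings into daily feeds, gives each feed an identifier, and gathers the feeds into a collection with member objects. It is not the project's code and does not call the RDA collections API: the `mint_pid` helper, the `prefix.example` prefix, the record field names, and the sample readings are all assumptions made for this example. A real deployment would request identifiers from a PID service instead.

```python
from collections import defaultdict
from datetime import datetime, timezone
import uuid

# Placeholder prefix, NOT a registered PID prefix (assumption for illustration).
HYPOTHETICAL_PREFIX = "prefix.example"


def mint_pid(name: str) -> str:
    """Stand-in for a real PID service: derive a stable identifier from a name."""
    return f"{HYPOTHETICAL_PREFIX}/{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


def group_by_day(readings):
    """Group readings (dicts with a Unix 'timestamp') by UTC calendar day."""
    feeds = defaultdict(list)
    for reading in readings:
        moment = datetime.fromtimestamp(reading["timestamp"], tz=timezone.utc)
        feeds[moment.date().isoformat()].append(reading)
    return dict(feeds)


def build_collection(readings, title):
    """Build a collection record whose members are the daily feeds."""
    feeds = group_by_day(readings)
    members = [
        {"pid": mint_pid(f"{title}/{day}"), "day": day, "count": len(items)}
        for day, items in sorted(feeds.items())
    ]
    return {"pid": mint_pid(title), "title": title, "members": members}


# Made-up sample readings (device names and values are illustrative only).
readings = [
    {"device": "dev-a", "timestamp": 1483228800, "pm25": 12.0},  # 2017-01-01 00:00 UTC
    {"device": "dev-b", "timestamp": 1483232400, "pm25": 15.5},  # 2017-01-01 01:00 UTC
    {"device": "dev-a", "timestamp": 1483315200, "pm25": 9.1},   # 2017-01-02 00:00 UTC
]
collection = build_collection(readings, "air-quality-2017")
for member in collection["members"]:
    print(member["day"], member["count"], member["pid"])
```

Assigning one identifier per day rather than per reading keeps the number of PIDs manageable while still letting a citation point to a specific, fixed slice of the stream.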

For more information, data and code visit: 

Rice Genomics Galaxy Workflows

The PRAGMA-RDA Data Service Galaxy application provides persistent IDs and registration for data objects generated by scientific analyses carried out on cloud virtual machines (VMs) in PRAGMA (http://www.pragma-grid.net/). The objective of the project is to improve the sharing of data objects, specifically those from genomic analyses by the International Rice Research Institute (http://irri.org/) community. The service is designed to be reusable in other cases where VMs are used for analysis and PIDs are used to enhance the sharing and reusability of results.

As part of our research at IU, we explore provenance extraction. Our design employs a hands-off technique that gathers information about a researcher's rice genomics analysis while the analysis is running. Such information includes who performed the analysis, when it was performed, and under what conditions. We store provenance as part of the PID record, in the PID Kernel Information. Figures 1 and 2 below demonstrate the difference between the information published as PID Kernel Information (left) and the full provenance trace of a workflow execution (right).
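The difference between a full trace and the compact record can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the field names in both records, the sample analysis step, and the example location URL are all hypothetical, and the actual PID Kernel Information attributes should be taken from the relevant RDA recommendation.

```python
from datetime import datetime, timezone


def run_with_provenance(step, user, params):
    """Run step(**params) and record who ran it, when, and with what parameters."""
    started = datetime.now(timezone.utc).isoformat()
    result = step(**params)
    ended = datetime.now(timezone.utc).isoformat()
    trace = {
        "agent": user,                # who performed the analysis
        "activity": step.__name__,    # which step was executed
        "startedAt": started,         # when it started
        "endedAt": ended,             # when it finished
        "parameters": dict(params),   # under what conditions
    }
    return result, trace


def kernel_summary(trace, location):
    """Reduce a full trace to a few fields for a compact PID record.

    The field names here are illustrative assumptions.
    """
    return {
        "digitalObjectLocation": location,
        "dateCreated": trace["endedAt"],
        "wasGeneratedBy": trace["activity"],
        "creator": trace["agent"],
    }


def filter_variants(min_quality):
    """Hypothetical analysis step: keep made-up quality scores above a threshold."""
    scores = [12, 35, 48, 7]
    return [score for score in scores if score >= min_quality]


result, trace = run_with_provenance(
    filter_variants, user="researcher-1", params={"min_quality": 30}
)
summary = kernel_summary(trace, "https://repository.example/objects/1")
print(result)
print(sorted(summary))
```

The researcher only calls the analysis step; the wrapper collects the trace as a side effect. The summary keeps a handful of fields that fit in a PID record, while details such as the parameters remain in the full trace.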