Persistent Identifiers (PIDs) and Data Management - Experiences and Use Cases

Streaming Sensor Data

This use case focuses on air-quality sensor data collected from devices, most of them installed in Taiwan by the air quality microsensing project (https://pm25.lass-net.org/), a collaboration among Taiwanese academic institutions, the computer industry, and the Taipei city government. The streaming data used in this use case was collected in 2017.

To make the data easy to reference and reuse from a repository, PIDs were assigned to the daily sensor feeds. Ongoing research and testing focuses on the RDA Data Collections API, which supports managing collections and their member objects and provides flexible search functionality over the data objects those collections represent.
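The per-feed PID assignment and the collection/member structure described above can be illustrated with a short Python sketch. It groups readings by device and UTC calendar day, derives a deterministic identifier for each daily feed, and registers the feeds as members of an in-memory collection that can be searched by member properties. Everything here is hypothetical: the `prefix.example` prefix, the hash-based `mint_pid` helper, the `Collection` class, and the sample readings are stand-ins, not the SEADTrain client, the RDA Collections API, or values from the real dataset. Grouping by UTC rather than Taiwan local time is also an assumption.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical prefix; real identifiers would come from a PID service.
PID_PREFIX = "prefix.example"


@dataclass
class Reading:
    device_id: str
    timestamp: datetime  # timezone-aware
    pm25: float


def daily_feeds(readings):
    """Group readings into one feed per (device, UTC calendar day)."""
    feeds = defaultdict(list)
    for r in readings:
        day = r.timestamp.astimezone(timezone.utc).date().isoformat()
        feeds[(r.device_id, day)].append(r)
    return feeds


def mint_pid(device_id, day):
    """Derive a deterministic, hypothetical PID for one daily feed."""
    suffix = hashlib.sha256(f"{device_id}/{day}".encode()).hexdigest()[:16]
    return f"{PID_PREFIX}/{suffix}"


@dataclass
class Collection:
    """Minimal collection: its own PID plus member PIDs with properties."""
    pid: str
    members: dict = field(default_factory=dict)  # member PID -> properties

    def add_member(self, member_pid, **props):
        self.members[member_pid] = props

    def find(self, **criteria):
        """Return member PIDs whose properties match every criterion."""
        return sorted(
            pid for pid, props in self.members.items()
            if all(props.get(k) == v for k, v in criteria.items())
        )


utc = timezone.utc
readings = [  # illustrative values only
    Reading("dev-A", datetime(2017, 3, 1, 8, 0, tzinfo=utc), 21.0),
    Reading("dev-A", datetime(2017, 3, 1, 9, 0, tzinfo=utc), 24.5),
    Reading("dev-A", datetime(2017, 3, 2, 8, 0, tzinfo=utc), 18.0),
    Reading("dev-B", datetime(2017, 3, 1, 8, 0, tzinfo=utc), 30.2),
]

collection = Collection(pid=f"{PID_PREFIX}/sensor-feeds-2017")
for (device, day), feed in daily_feeds(readings).items():
    collection.add_member(mint_pid(device, day), device_id=device, date=day,
                          n_readings=len(feed))

print(len(collection.members))                  # 3
print(len(collection.find(date="2017-03-01")))  # 2
```

Because the suffix is derived from device and day, re-running the ingest for the same feed yields the same identifier; a production service would more likely mint identifiers through its PID provider and keep the mapping in a persistent store.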

For more information, data, and code, visit:

https://github.com/RDACollectionsWG/specification

https://github.com/Data-to-Insight-Center/SEADTrain

Rice Genomics Galaxy Workflows

The PRAGMA-RDA Data Service Galaxy application provides persistent IDs and registration for data objects generated by scientific analyses run on cloud virtual machines (VMs) in PRAGMA (http://www.pragma-grid.net/). The project's objective is to improve the sharing of data objects, specifically those produced by genomic analyses in the International Rice Research Institute (http://irri.org/) community. The service is designed to be reusable in other settings where VMs are used for analysis and PIDs are used to enhance the sharing and reusability of results.
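A minimal sketch of what registering an analysis output might look like, assuming a toy in-memory registry: each output receives a PID whose record stores its current location, a SHA-256 checksum, and a data type. If the file later moves (for example, off a VM that is about to be shut down), only the record changes, so anyone holding the PID can still resolve the object and verify its integrity. The `Registry` class, the record fields, the `prefix.example` prefix, and the example URLs are all hypothetical and do not reproduce the PRAGMA Data Service API.

```python
import hashlib
import uuid
from dataclasses import dataclass, field

# Hypothetical prefix; real identifiers would come from a PID service.
PID_PREFIX = "prefix.example"


@dataclass
class PIDRecord:
    location: str   # where the data object can currently be fetched
    checksum: str   # SHA-256 of the object's bytes, for integrity checks
    data_type: str  # illustrative type label
    metadata: dict = field(default_factory=dict)


class Registry:
    """Toy in-memory registry: register, resolve, relocate, and verify objects."""

    def __init__(self):
        self._records = {}

    def register(self, content, location, data_type, **metadata):
        pid = f"{PID_PREFIX}/{uuid.uuid4().hex}"
        checksum = hashlib.sha256(content).hexdigest()
        self._records[pid] = PIDRecord(location, checksum, data_type, metadata)
        return pid

    def resolve(self, pid):
        return self._records[pid]

    def move(self, pid, new_location):
        """Point the PID at a new location; the identifier itself is unchanged."""
        self._records[pid].location = new_location

    def verify(self, pid, content):
        """Check that retrieved bytes match the checksum recorded at registration."""
        return hashlib.sha256(content).hexdigest() == self._records[pid].checksum


registry = Registry()
output = b"sample\tgenotype\nsample_1\tAA\n"  # stand-in for an analysis output
pid = registry.register(output, "http://vm-1.example/results/out.tsv",
                        data_type="genotype-table", workflow="tassel")
registry.move(pid, "http://archive.example/results/out.tsv")

print(registry.resolve(pid).location)  # http://archive.example/results/out.tsv
print(registry.verify(pid, output))    # True
```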

As part of our research at IU, we explore provenance extraction. Our design uses a hands-off technique that automatically gathers information about a researcher’s rice genomics analysis while the analysis is running, including who performed the analysis, when it was performed, and under what conditions. We store this provenance in the PID record as PID Kernel Information. Figures 1 and 2 below illustrate the difference between the information published as PID Kernel Information (left) and the full provenance trace of a workflow execution (right).
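The capture-then-condense idea described above can be sketched in Python: a decorator records who ran each step, when, and with which parameters and host environment while the step runs, and a second function reduces that full trace to the few fields one might keep in a PID record. This is a hypothetical illustration only; the decorator, the placeholder `align_reads`/`call_variants` steps, and the kernel field names are assumptions, not the IU implementation or any published PID Kernel Information profile.

```python
import functools
import getpass
import inspect
import platform
import time
from datetime import datetime, timezone

# Full provenance trace for this session (toy in-memory store).
TRACES = []


def _current_user():
    """Best-effort user name; getpass.getuser() can fail in some containers."""
    try:
        return getpass.getuser()
    except Exception:  # KeyError, OSError or ImportError, depending on platform
        return "unknown"


def capture_provenance(step):
    """Record who ran a step, when, and under what conditions, with no
    extra logging code in the step itself."""
    sig = inspect.signature(step)

    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # include default parameter values too
        started = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        result = step(*args, **kwargs)
        TRACES.append({
            "step": step.__name__,
            "agent": _current_user(),
            "started": started.isoformat(),
            "duration_s": time.perf_counter() - t0,
            "host": platform.node(),
            "python": platform.python_version(),
            "parameters": {k: repr(v) for k, v in bound.arguments.items()},
        })
        return result
    return wrapper


def kernel_information(traces):
    """Condense a full trace into a few fields small enough for a PID record.
    Field names are illustrative, not a published kernel profile."""
    return {
        "steps": [t["step"] for t in traces],
        "agents": sorted({t["agent"] for t in traces}),
        "first_started": min(t["started"] for t in traces),
    }


@capture_provenance
def align_reads(sample, reference="ref-genome"):
    return f"{sample}.bam"  # placeholder for a real analysis step


@capture_provenance
def call_variants(bam, min_quality=30):
    return bam.replace(".bam", ".vcf")  # placeholder for a real analysis step


vcf = call_variants(align_reads("sample_1"))
kernel = kernel_information(TRACES)
print(len(TRACES))      # 2
print(kernel["steps"])  # ['align_reads', 'call_variants']
```

The split mirrors the contrast in the text: the full trace keeps every parameter and environment detail, while the condensed summary stays small enough to travel with the identifier.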

More Information, Data and Code

Streaming Sensor Data Metadata and PID assignment:

GitHub Code: https://github.com/Data-to-Insight-Center/SEADTrain/tree/master/sead-client

GitHub Data: https://github.com/Data-to-Insight-Center/SEADTrain/tree/master/data

GitHub Wiki: https://github.com/Data-to-Insight-Center/SEADTrain/wiki/PID-Creation

GitHub Wiki: https://github.com/Data-to-Insight-Center/SEADTrain/wiki/SEADTrain-Data-Description

Rice Genomics Galaxy Workflows Metadata and PID assignment:

GitHub Code: https://github.com/Data-to-Insight-Center/RDA-PRAGMA-Data-Service/tree/master/pragmapit-ext

GitHub Code: https://github.com/Data-to-Insight-Center/RDA-PRAGMA-Data-Service/tree/master

GitHub Wiki: https://github.com/Data-to-Insight-Center/RDA-PRAGMA-Data-Service/wiki/Welcome-to-PRAGMA-Data-Service-Prototype

GitHub Wiki: https://github.com/Data-to-Insight-Center/RDA-PRAGMA-Data-Service/wiki/RDA-PRAGMA-Data-Identity-Service-API-Documents

Data Publishing Workflow Services:

Streaming Sensor Data: http://d2i-dev.d2i.indiana.edu:8081/iusc-azure-search/search.html

Rice Genomics Galaxy Workflow: http://202.90.159.39:8079/dataidentity/irri-search.html?DataTypePID=20.5000.347/1af9b7467412d3982998&DataTypeName=IRRI%20Rice%20Genomes%20tassel%20workflow
