Persistent Identifiers (PIDs) and Data Management - Data Management Studies

Data Repository for Research Data Objects

In the PRAGMA project, the data repository is designed to manage scientific data objects across domain boundaries. It presents an interface that can host both long-tail data objects and large data sets.

The repository is implemented on MongoDB, a NoSQL database that can be distributed across multiple machines while maintaining replicas on others. With MongoDB as the backend, we use a single framework to store both metadata and data, and we let users decide what information to record as a data object's metadata.
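The idea of keeping data and user-chosen metadata in one record can be illustrated with a minimal sketch. The field names (name, created, metadata, data), the helper build_data_object, and the sample values are all hypothetical and are not taken from the PRAGMA repository's actual schema.

```python
from datetime import datetime, timezone


def build_data_object(name, payload, metadata=None):
    """Return one document holding both the data and its metadata.

    The layout is an assumption made for illustration only.
    """
    return {
        "name": name,
        "created": datetime.now(timezone.utc).isoformat(),
        # Users decide which keys appear here; nothing is mandatory.
        "metadata": dict(metadata or {}),
        "data": payload,
    }


# Made-up example values.
doc = build_data_object(
    "sample-object",
    {"values": [1, 2, 3]},
    {"domain": "ecology", "license": "CC-BY-4.0"},
)

# With a MongoDB driver such as pymongo, a document like this could be
# stored with collection.insert_one(doc).
```

Because a document store does not enforce a fixed schema, two objects can carry entirely different metadata keys and still be stored side by side.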

For more details about the PRAGMA data repository, visit the GitHub page:
https://github.com/Data-to-Insight-Center/PRAGMA-Data-Repository/tree/master

Digital Object Interface Protocol Study (Forthcoming)

DOIP (Digital Object Interface Protocol) is a protocol and authentication mechanism used to interact with digital objects through a connection to the digital object server. In this protocol, information is represented as sets of key-value pairs, following the message grammar below:

{
<message> := <messagetype> ':' <segment><newline>
<segment> :=
<segment> := <kvpair>
<segment> := <segment> '&' <kvpair>
<kvpair> := <key>
<kvpair> := <key> '=' <value>
<messagetype> := <encodedtoken>
<key> := <encodedtoken>
<value> := <encodedtoken>
}
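To make the grammar concrete, here is a minimal sketch of a serializer and parser for a single message line. It assumes that an encoded token is a percent-encoded string, which the grammar above does not specify. The function names encode_message and parse_message and the sample message are hypothetical, not part of any DOIP library.

```python
from urllib.parse import quote, unquote


def encode_message(message_type, pairs):
    """Serialize a message type and (key, value) pairs into one line.

    A value of None produces a bare key, matching <kvpair> := <key>.
    Percent-encoding of tokens is an assumption made for this sketch.
    """
    parts = []
    for key, value in pairs:
        token = quote(key, safe="")
        if value is not None:
            token += "=" + quote(value, safe="")
        parts.append(token)
    return quote(message_type, safe="") + ":" + "&".join(parts) + "\n"


def parse_message(line):
    """Parse one message line back into (message_type, pairs)."""
    if not line.endswith("\n"):
        raise ValueError("message must end with a newline")
    head, sep, segment = line[:-1].partition(":")
    if not sep:
        raise ValueError("missing ':' after the message type")
    pairs = []
    if segment:  # an empty segment is allowed by the grammar
        for kv in segment.split("&"):
            key, eq, value = kv.partition("=")
            pairs.append((unquote(key), unquote(value) if eq else None))
    return unquote(head), pairs


message = encode_message("retrieve", [("id", "20.500/abc"), ("verbose", None)])
```

Encoding every reserved character keeps ':', '&' and '=' unambiguous as delimiters, so the parser can split on them directly.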

Because our extended PID system supports two transport protocols, DOIP and HTTP (HyperText Transfer Protocol), we need to evaluate which one better fits our use cases and select the protocol accordingly.

Experimental methods and findings

As part of the baseline study, we measure the network behavior of both HTTP and DOIP. Network behavior is measured as the overall response time minus the service time, for both send and receive. The measurement uses a workload of 5,000 events.

Type of Protocol    Mean (in ms)    Standard Deviation (in ms)
HTTP                7.42            4.19
DOIP                3.92            3.76

As the table shows, DOIP is almost twice as fast as HTTP based on response time (a mean of 3.92 ms versus 7.42 ms) and is somewhat more stable, given its smaller standard deviation. However, as a custom protocol, DOIP may face more obstacles to widespread adoption.
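The network-behavior metric used in this study (response time minus service time, summarized by mean and standard deviation) can be sketched as follows. The function names and the three sample timings are made up for illustration. The use of the sample standard deviation is also an assumption, since the study does not say which estimator was used.

```python
import statistics


def network_times(response_ms, service_ms):
    """Per-event network time: overall response time minus service time."""
    if len(response_ms) != len(service_ms):
        raise ValueError("need one service time per response time")
    return [r - s for r, s in zip(response_ms, service_ms)]


def summarize(samples):
    """Return (mean, sample standard deviation) in the same unit as the input."""
    return statistics.mean(samples), statistics.stdev(samples)


# Made-up timings in milliseconds, not data from the study.
response = [10.0, 12.0, 14.0]
service = [4.0, 5.0, 6.0]
mean_ms, stdev_ms = summarize(network_times(response, service))
```

Applied to the reported means, the ratio 7.42 / 3.92 is about 1.89, which is the basis for describing DOIP as almost twice as fast.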