Indiana University

 

D2I Current Projects

Data Management Research

Data to Insight Center guidance on preparing an NSF Data Management Plan:  "Indiana University Guidance on Preparing NSF Data Management Plan"

D2I is funding up to five research grants to address aspects of data management, including but not limited to extensible infrastructure for data interoperability, data to insight, data lifecycles, digital data curation, and scientific data preservation. The grants are intended to support the early development of innovative research that is likely to compete favorably for competitive external funding in the future and/or to produce translational deliverables such as new software, a demonstration collection, or other tangible output that advances the goals of the Data to Insight Center.

Proposal Page
Data Management Plan

 
Funding: Lilly Endowment
July 12, 2010 - current
 
Digital Data Provenance - Karma
As digital research data collections become more accessible, it becomes increasingly important to address data validity and quality by recording and managing information about where each data object originated, which processes were applied to the data products, and by whom. This project is developing tools for provenance generation and collection and for case-based reasoning. The tools and collected data are also available for download for wider community use.
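To make the idea concrete, here is a minimal sketch of a provenance record that tracks the three things named above: where a data object originated, which processes were applied, and by whom. This is an illustration only and not Karma's actual data model or API; the class names, fields, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessStep:
    """One process applied to a data object (hypothetical structure)."""
    name: str          # what was done
    agent: str         # who or what did it
    inputs: List[str]  # identifiers of the data objects consumed


@dataclass
class ProvenanceRecord:
    """Origin and processing history of one data object (hypothetical structure)."""
    data_id: str
    origin: str
    steps: List[ProcessStep] = field(default_factory=list)

    def record_step(self, name: str, agent: str, inputs: List[str]) -> None:
        # Append a step in the order it was applied.
        self.steps.append(ProcessStep(name, agent, list(inputs)))

    def agents(self) -> List[str]:
        # Distinct agents, in order of first appearance.
        seen: List[str] = []
        for step in self.steps:
            if step.agent not in seen:
                seen.append(step.agent)
        return seen


# Illustrative values only.
record = ProvenanceRecord(data_id="product-001", origin="sensor-archive")
record.record_step("calibrate", "alice", ["raw-001"])
record.record_step("regrid", "workflow-engine", ["product-001"])
record.record_step("quality-check", "alice", ["product-001"])
print(record.agents())
```

Keeping the steps as an ordered list means the history can be replayed or audited in the order the processes were applied.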
Project Page
 
Internal Links
 Discussion Board
Funding: NASA | GENI
October 2009 - current
 
Geni-NetKarma
The project will collect provenance of the data generated by GENI. A GENI Provenance Registry (NetKarma) will capture the workflow of GENI slice creation, the topology of the slice, its operational status, and other measurement statistics, and will correlate them with the experimental data. NetKarma will allow researchers to see the exact state of the network and to store the configuration of the experiment and its slice.
Project Page
GENI Project Wiki
Internal Links
 Discussion Board
Funding: GENI
October 2009 - current
 
Hierarchical MapReduce
MapReduce is a model for processing huge datasets in embarrassingly parallel applications using a large number of compute resources. Typical MapReduce frameworks, however, are limited to scheduling jobs within a single cluster. A single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extend the MapReduce framework to a hierarchical framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the “Map-Reduce-Global Reduce” model, in which computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the data set and maps the partitions onto multiple “local” MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the Global Reduce function.
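The Map-Reduce-Global Reduce model described above can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not the project's implementation: each "local cluster" is just a function call, the round-robin partitioning is an arbitrary choice, and every function name here is hypothetical.

```python
from collections import defaultdict


def run_local_mapreduce(records, map_fn, reduce_fn):
    """Run Map, then Reduce, on one simulated local cluster."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_hierarchical(dataset, n_clusters, map_fn, reduce_fn, global_reduce_fn):
    """Simulated global controller: split, dispatch, then Global Reduce."""
    # Split the dataset into one partition per local cluster (round-robin).
    partitions = [dataset[i::n_clusters] for i in range(n_clusters)]
    # Each local cluster runs Map and Reduce on its own partition.
    local_results = [run_local_mapreduce(p, map_fn, reduce_fn) for p in partitions]
    # Gather the local outputs by key and apply Global Reduce.
    collected = defaultdict(list)
    for result in local_results:
        for key, value in result.items():
            collected[key].append(value)
    return {key: global_reduce_fn(key, values) for key, values in collected.items()}


def word_map(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def sum_reduce(key, values):
    # Summation works as both the Reduce and the Global Reduce step here.
    return sum(values)


lines = ["a b a", "b c", "a c c"]
counts = run_hierarchical(lines, 2, word_map, sum_reduce, sum_reduce)
print(counts)
```

Word count works here because summation is associative and commutative, so partial sums from each cluster can be combined again at the global level. A computation without that property would need a different Global Reduce function than its local Reduce.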
Project Page
 
Internal Links
 Discussion Board
Funding: PTI
Jan 2011 - current
 



NASA-InstantKarma
The project will improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community.  It will customize and integrate Karma into NASA data production by collecting and disseminating provenance of AMSR-E (Advanced Microwave Scanning Radiometer - Earth Observing System) standard data products.
Project Page
 
Internal Links
 Discussion Board
Funding: NASA
April 2010 - current
 
LEAD II 
LEAD II is a follow-on to the successful Linked Environments for Atmospheric Discovery (LEAD) project, an NSF-funded large-scale ITR. LEAD II carries the vision of LEAD forward into new areas as it explores research challenges in hybrid computing and in the manipulation and use of weather data in non-weather applications. LEAD II supported Vortex2 and is currently supporting Kathleen Baker with her USDA crop disease research.
Project Page
LEAD Portal
Internal Links
 Discussion Board
 Developers Website
Funding: Microsoft Research, USDA & Data to Insight Center
September 2009 - current
 
Sigiri
Sigiri is a lightweight job management and abstraction service that supports job specifications such as JSDL and RSL. A web service interface allows integration with various scientific workflow systems, and each step in job submission and management is decoupled to increase scalability.
Project Page
 
Internal Links
 Discussion Board
Funding: Data to Insight Center
Current
 
Social-Ecological Informatics
Social-ecological researchers study the interactions of the environment, users, and governance of environmental resources. The research undertaken by the Social-Ecological Informatics group applies database and data management, information retrieval, knowledge management, human-computer interaction design, and ontological tools and approaches to enhancing the value of social-ecological data for research and policy use. We are investigating general means to archive longitudinal databases of survey-based environmental data using XML and DDI. By applying Latent Semantic Analysis to free text in a social-ecological database, we illuminate concepts that serve as enhanced metadata. By linking to data sources on the web through Linked Data, we investigate the harvesting of additional metadata. We are also exploring ontological infrastructure to link social-ecological databases with the higher-level models that describe actions and outcomes in complex social-ecological systems.
Project Page
 
Internal Links
  Discussion Board
Funding: NSF
October 2010 - current
 



Streamflow
 
Streamflow integrates data streams into a standard workflow system through a programming model that introduces new workflow semantics. These semantics enable scientific workflow designers to incorporate data streams into an experiment without major changes to the infrastructure. Streamflow uses XBaya as a graphical client program for workflow composition, execution, and monitoring.
Project Page
 
Internal Links
 Discussion Board
Funding: Data to Insight Center
Current
 
XMC Cat
XMC Cat is a web service toolkit for capturing and storing metadata during the execution of scientific workflows to enable data discovery and reuse. Its advantages include adaptability to domain schemata through configuration instead of code changes, support for automatic capture of metadata through curation plugins, and search and browse capabilities through a web-based GUI that dynamically adjusts to the domain schema. This allows XMC Cat to be deployed in different scientific domains without requiring new code to be written. It is currently in use in the LEAD Science Gateway.
Project Page
 
Internal Links
 Discussion Board
Funding: Data to Insight Center
October 2005 - current
 

Archived Projects >>