D2I Current Projects
| Data Management Research | ||||||||
Data to Insight Center guidance on preparing an NSF Data Management Plan: "Indiana University Guidance on Preparing NSF Data Management Plan". D2I is funding up to five research grants that address aspects of data management, including but not limited to extensible infrastructure for data interoperability, data to insight, data lifecycles, digital data curation, and scientific data preservation. The grants are intended to support the early development of innovative research that is likely to compete favorably for external funding in the future and/or to produce translational deliverables, such as new software, a demonstration collection, or other tangible output, that advance the goals of the Data to Insight Center. | ||||||||
| Proposal Page | ||||||||
| Data Management Plan | ||||||||
| Funding: Lilly Endowment | ||||||||
| July 12, 2010 - current | ||||||||
| Digital Data Provenance - Karma | ||||||||
| As digital research data collections become more accessible, it becomes increasingly important to address data validity and quality by recording and managing information about where each data object originated, which processes were applied to the data products, and by whom. This project is developing tools for provenance generation and collection and for case-based reasoning. The tools and collected data are available for download for wider community use. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: NASA | GENI | ||||||||
| October 2009 - current | ||||||||
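The kind of information described above (where a data object originated, which processes were applied, and by whom) can be pictured as a small record structure. The following is a minimal sketch only. It is not Karma's data model or API, and every class, field, file, and agent name in it is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessStep:
    """One process applied to a data product (hypothetical structure)."""
    name: str        # what was done
    agent: str       # by whom
    inputs: list     # which data objects were consumed
    timestamp: str   # when the step was recorded (UTC, ISO 8601)


@dataclass
class ProvenanceRecord:
    """Provenance of a single data object (hypothetical structure)."""
    data_object: str
    origin: str
    steps: list = field(default_factory=list)

    def record_step(self, name, agent, inputs):
        # Append a step together with the time at which it was recorded.
        stamp = datetime.now(timezone.utc).isoformat()
        self.steps.append(ProcessStep(name, agent, list(inputs), stamp))

    def lineage(self):
        # Origin first, then each applied process in recorded order.
        return [self.origin] + [f"{s.name} (by {s.agent})" for s in self.steps]


# Illustrative file and agent names, not taken from any real collection.
record = ProvenanceRecord(data_object="sst_daily.nc", origin="satellite-feed")
record.record_step("regrid", agent="alice", inputs=["sst_raw.nc"])
record.record_step("quality-filter", agent="bob", inputs=["sst_regridded.nc"])

for entry in record.lineage():
    print(entry)
```

Keeping the origin separate from the ordered list of steps makes it straightforward to answer both "where did this come from" and "what happened to it since".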
| GENI-NetKarma | ||||||||
| The project will collect provenance of the data generated by GENI. A GENI Provenance Registry (NetKarma) will capture the workflow of GENI slice creation, the topology of the slice, its operational status, and other measurement statistics, and correlate them with the experimental data. NetKarma will allow researchers to see the exact state of the network and to store the configuration of the experiment and its slice. | ||||||||
| Project Page | ||||||||
| GENI Project Wiki | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: GENI | ||||||||
| October 2009 - current | ||||||||
| Hierarchical MapReduce | ||||||||
| MapReduce is a programming model for processing very large datasets in embarrassingly parallel applications using a large number of compute resources. Typical MapReduce frameworks, however, can only schedule jobs within a single cluster. A single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extend MapReduce to a hierarchical framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the “Map-Reduce-Global Reduce” model, in which computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the data set and assigns the partitions to multiple “local” MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the Global Reduce function. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: PTI | ||||||||
| Jan 2011 - current | ||||||||
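The “Map-Reduce-Global Reduce” model described for Hierarchical MapReduce can be illustrated with a small single-process sketch. This is not the project's implementation: the function names are hypothetical, the "local clusters" are simulated as list partitions processed one after another, and word counting is chosen only as a familiar example.

```python
from collections import defaultdict


def local_map_reduce(records, map_fn, reduce_fn):
    """Run Map, then Reduce, over one local cluster's partition."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def hierarchical_map_reduce(dataset, n_clusters, map_fn, reduce_fn, global_reduce_fn):
    """Simulate the global controller (assumed structure, for illustration)."""
    # Step 1: split the data set into one partition per local cluster.
    partitions = [dataset[i::n_clusters] for i in range(n_clusters)]
    # Step 2: each local cluster runs Map and Reduce on its partition.
    # A real system would run these in parallel on separate clusters.
    local_results = [local_map_reduce(p, map_fn, reduce_fn) for p in partitions]
    # Step 3: collect local results per key and apply Global Reduce.
    merged = defaultdict(list)
    for result in local_results:
        for key, value in result.items():
            merged[key].append(value)
    return {key: global_reduce_fn(key, values) for key, values in merged.items()}


def word_map(line):
    # Map: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def sum_values(key, values):
    # Used for both Reduce and Global Reduce, since addition is associative.
    return sum(values)


lines = ["a b a", "b c", "a c c"]
counts = hierarchical_map_reduce(lines, 2, word_map, sum_values, sum_values)
print(counts)
```

In this example Reduce and Global Reduce are the same function. That works for associative operations such as sums; an operation such as a mean would need a different Global Reduce (for example, combining per-cluster sums and counts).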
| NASA-InstantKarma | ||||||||
| The project will improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community. It will customize and integrate Karma into NASA data production by collecting and disseminating provenance of AMSR-E (Advanced Microwave Scanning Radiometer - Earth Observing System) standard data products. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: NASA | ||||||||
| April 2010 - current | ||||||||
| LEAD II | ||||||||
| LEAD II is a follow-on to the successful NSF-funded large-scale ITR project Linked Environments for Atmospheric Discovery (LEAD). LEAD II carries the vision of LEAD forward into new areas as it explores research challenges in hybrid computing and in the manipulation and use of weather data in non-weather applications. LEAD II supported Vortex2 and is currently supporting Kathleen Baker with her USDA crop disease research. | ||||||||
| Project Page | ||||||||
| LEAD Portal | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Developers Website | ||||||||
| Funding: Microsoft Research, USDA & Data to Insight Center | ||||||||
| September 2009 - current | ||||||||
| Sigiri | ||||||||
| Sigiri is a lightweight job management and abstraction service that supports job specification languages such as JSDL and RSL. A web service interface allows integration with various scientific workflow systems, and each step in job submission and management is decoupled to increase scalability. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: Data to Insight Center | ||||||||
| Current | ||||||||
| Social-Ecological Informatics | ||||||||
| Social-ecological researchers study the interactions of the environment, users, and governance of environmental resources. The research undertaken by the Social-Ecological Informatics group applies database and data management, information retrieval, knowledge management, human-computer interaction design, and ontological tools and approaches to enhance the value of social-ecological data for research and policy use. We are investigating general means of archiving longitudinal databases of survey-based environmental data using XML and DDI. By applying Latent Semantic Analysis to free text in a social-ecological database, we surface concepts that serve as enhanced metadata. By linking to data sources on the web through Linked Data, we investigate the harvesting of additional metadata. We are also exploring ontological infrastructure that links social-ecological databases to the higher-level models that describe actions and outcomes in complex social-ecological systems. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: NSF | ||||||||
| October 2010 - current | ||||||||
| Streamflow | ||||||||
| Streamflow integrates data streams into a standard workflow system through a programming model that introduces new workflow semantics. These semantics let scientific workflow designers incorporate data streams into an experiment without major changes to the infrastructure. Streamflow uses XBaya as a graphical client for workflow composition, execution, and monitoring. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: Data to Insight Center | ||||||||
| Current | ||||||||
| XMC Cat | ||||||||
| XMC Cat is a web service toolkit for capturing and storing metadata during the execution of scientific workflows to enable data discovery and reuse. Its advantages include adaptability to domain schemata through configuration instead of code changes, support for automatic capture of metadata through curation plugins, and search and browse capabilities through a web-based GUI that dynamically adjusts to the domain schema. This allows XMC Cat to be deployed in different scientific domains without requiring new code to be written. It is currently in use in the LEAD Science Gateway. | ||||||||
| Project Page | ||||||||
| | ||||||||
| Internal Links | ||||||||
| Discussion Board | ||||||||
| Funding: Data to Insight Center | ||||||||
| October 2005 - current | ||||||||
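The idea of adapting to a domain schema through configuration instead of code changes, as described for XMC Cat, can be sketched in a few lines. This is an illustration of the general technique only and does not reflect XMC Cat's actual configuration format or interfaces. The schemata, field names, and the `capture_metadata` function are all hypothetical.

```python
# Hypothetical domain schemata expressed as plain configuration data.
WEATHER_SCHEMA = {
    "station_id": {"type": str, "required": True},
    "temperature_c": {"type": float, "required": True},
    "forecast_model": {"type": str, "required": False},
}

ECOLOGY_SCHEMA = {
    "site": {"type": str, "required": True},
    "species_count": {"type": int, "required": True},
}


def capture_metadata(schema, raw):
    """Extract the fields a schema names from raw workflow output.

    The same code serves every domain; only the schema changes.
    Fields not named in the schema are ignored.
    """
    metadata = {}
    for name, rule in schema.items():
        if name in raw:
            # Coerce the raw value to the type the schema declares.
            metadata[name] = rule["type"](raw[name])
        elif rule["required"]:
            raise ValueError(f"missing required metadata field: {name}")
    return metadata


weather = capture_metadata(
    WEATHER_SCHEMA,
    {"station_id": "KBMG", "temperature_c": "21.5", "extra": 1},
)
ecology = capture_metadata(
    ECOLOGY_SCHEMA,
    {"site": "plot-7", "species_count": "12"},
)
print(weather)
print(ecology)
```

Supporting a new domain in this sketch means writing a new schema dictionary, while `capture_metadata` stays unchanged.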








