XML Metadata Concept Catalog (XMC Cat)

PTI no longer actively supports XMC Cat, but the source code is still available.

About

XMC Cat is a metadata catalog that stores rich metadata describing data objects that are themselves stored in files, storage repositories, or on the web. It is an open source web service written in Java that utilizes the Axis2 web service engine and Apache Tomcat.

Its features include adaptability to domain schemata through configuration instead of code changes, support for automatic capture of metadata through the use of curation plugins, and search and browse capabilities through a web-based GUI that is dynamically generated from a domain schema. This allows XMC Cat to be deployed in different scientific and educational domains without requiring new code to be written.

XMC Cat is currently in use in the LEAD Science Gateway.

Metadata's role in XMC Cat

Metadata schemas used in science and education are composed of complex concepts that describe the data products generated by a community. XMC Cat exploits this unique feature of scientific metadata to both efficiently store metadata and perform detailed data discovery queries.

This concept-based approach also enables the automatic generation of the data search GUI and easier deployment of metadata catalogs in diverse scientific domains.

How it works

One or more XML metadata schemas are partitioned into the concepts they contain, and metadata can then be ingested and validated incrementally using concepts as the unit of storage. These concepts are also shredded into their individual elements to allow detailed data discovery through a point-and-click search GUI.

This combination of concepts as the unit of metadata storage and shredded metadata provides efficiency for both insert and query operations by enabling the rapid rebuilding of the XML metadata in response to detailed data discovery queries. This approach also enables the query interface to dynamically adapt to the native data type of each metadata element, be it numeric, string, temporal, or spatial.
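As a rough illustration of the two ideas above, the following Java sketch partitions a small metadata document into concepts, keeps each concept as a whole XML fragment, and shreds its leaf elements into typed rows. It is not XMC Cat's code. The sample element names, the class ConceptShredSketch, the Row type, and the rule that every direct child of the root element is one concept are all assumptions made for this example; XMC Cat itself is adapted to a domain schema through configuration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: not part of XMC Cat. All names here are hypothetical.
class ConceptShredSketch {

    // Hypothetical sample metadata; the element names are made up.
    static final String SAMPLE =
        "<metadata>"
        + "<theme><keyword>radar</keyword></theme>"
        + "<extent><north>41.5</north><south>37.8</south></extent>"
        + "</metadata>";

    // One shredded leaf value, tagged with the concept it came from.
    static final class Row {
        final String concept;
        final String path;
        final String value;
        final boolean numeric;

        Row(String concept, String path, String value, boolean numeric) {
            this.concept = concept;
            this.path = path;
            this.value = value;
            this.numeric = numeric;
        }
    }

    // Assumption: every direct child element of the root is one concept.
    static List<Element> partition(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<Element> concepts = new ArrayList<>();
            NodeList kids = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                if (kids.item(i) instanceof Element) {
                    concepts.add((Element) kids.item(i));
                }
            }
            return concepts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse metadata", e);
        }
    }

    // Serializes a concept as one XML fragment: the unit that is stored whole.
    static String toFragment(Element concept) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(concept), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize concept", e);
        }
    }

    // Shreds a concept into one row per leaf element.
    static List<Row> shred(Element concept) {
        List<Row> rows = new ArrayList<>();
        collect(concept, concept.getTagName(), concept.getTagName(), rows);
        return rows;
    }

    private static void collect(Element e, String concept, String path, List<Row> rows) {
        boolean hasChildElement = false;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k instanceof Element) {
                hasChildElement = true;
                Element c = (Element) k;
                collect(c, concept, path + "/" + c.getTagName(), rows);
            }
        }
        if (!hasChildElement) {
            String v = e.getTextContent().trim();
            rows.add(new Row(concept, path, v, isNumeric(v)));
        }
    }

    // Crude type detection; a real catalog would take types from the schema.
    static boolean isNumeric(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Element concept : partition(SAMPLE)) {
            System.out.println("stored fragment: " + toFragment(concept));
            for (Row r : shred(concept)) {
                System.out.println("  row: " + r.path + " = " + r.value
                    + (r.numeric ? " (numeric)" : " (string)"));
            }
        }
    }
}
```

In this sketch the shredded rows are what a search would run against, while the stored fragment lets the matching metadata be returned without reassembling it element by element. The numeric flag stands in for the per-element data type that a query interface could use to choose, for example, a range comparison instead of a text match.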

Query interfaces customized for each community are constructed automatically from the metadata schema for which that community deploys XMC Cat. Additionally, the XML Beans and XSLT code needed to configure XMC Cat for a domain schema can be generated through a point-and-click web interface.

Resources

Both the client and the server can be downloaded as binaries or compiled from the source tarball. The source code is configured to be built using Maven2, and the build script will generate the client, the server, and some additional utilities used in XMC Cat.

Since XMC Cat is a web service described by a WSDL, clients can also use the tool of their choice to build a client. In the server installation of XMC Cat, we use the XML Beans data binding.

Please see the XMC Cat manual available on the Data to Insight wiki for installation instructions and details on using XMC Cat.

Starting with version 2.0, downloads for XMC Cat are available from SourceForge.

Publications

For more information, please see XMC Cat on Google Scholar.

Contributors

  • Scott Jensen
  • Beth Plale
  • Yiming Sun
  • Shobana Krishnan
  • Bina Bhaskar
  • Kavitha Chandrasekar
  • Kalani Ruwanpathirana
  • Bimalee Salpitikorala

Sponsors

Data to Insight Center
Indiana University Pervasive Technology Institute
2009-present

National Science Foundation
Linked Environments for Atmospheric Discovery
2003-2009