The Enrich service is built primarily to show
- how adding certain kinds of descriptive details can streamline data consumer activities, and
- how certain descriptive details can be added automatically to streamline data producer activities.
Currently, select tabular datasets from data.gov are harvested and
processed automatically to demonstrate the value of adding
descriptive details that are useful to consumers. That automated
process identifies 1) the column names in those datasets and 2) the
syntactic nature of the values in each column, e.g., that column 1
is an integer with values ranging between 100 and 5000, that
column 2 is a date in yyyy-mm-dd format, and so on. We call the
records that store the generated descriptions "data type" records.
Those data type records can be manually edited or enhanced further.
For example, semantic information could be added, such as that
column 1 is not just an integer but also a "temperature expressed
in Fahrenheit".
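The kind of syntactic profiling described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code Enrich actually runs: the function names, the output field names, and the short list of recognized date formats are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Candidate date layouts, paired with the label reported in the profile.
# This short list is an assumption; a real profiler would need many more.
DATE_FORMATS = [("%Y-%m-%d", "yyyy-mm-dd"), ("%m/%d/%Y", "mm/dd/yyyy")]


def _all_parse(values, parse):
    """Return True if `parse` succeeds on every value."""
    try:
        for value in values:
            parse(value)
        return True
    except ValueError:
        return False


def profile_column(name, values):
    """Infer a syntactic description for one column of string values."""
    values = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not values:
        return {"name": name, "type": "unknown"}
    if _all_parse(values, int):
        numbers = [int(v) for v in values]
        return {"name": name, "type": "integer",
                "min": min(numbers), "max": max(numbers)}
    if _all_parse(values, float):
        numbers = [float(v) for v in values]
        return {"name": name, "type": "number",
                "min": min(numbers), "max": max(numbers)}
    for fmt, label in DATE_FORMATS:
        if _all_parse(values, lambda v: datetime.strptime(v, fmt)):
            return {"name": name, "type": "date", "format": label}
    return {"name": name, "type": "string"}


def profile_table(csv_text):
    """Profile every column of a CSV document whose first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return [
        profile_column(name, [row[i] for row in body if i < len(row)])
        for i, name in enumerate(header)
    ]


# Hypothetical two-column sample mirroring the example in the text.
sample = "reading,sampled_on\n100,2015-03-01\n5000,2015-03-02\n"
for description in profile_table(sample):
    print(description)
# {'name': 'reading', 'type': 'integer', 'min': 100, 'max': 5000}
# {'name': 'sampled_on', 'type': 'date', 'format': 'yyyy-mm-dd'}
```

Semantic details such as "temperature expressed in Fahrenheit" cannot be inferred this way, which is why the records remain open to manual enhancement.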
To view a semantically enhanced example, search for "back bay" in
the search bar, and click on the "Dataset Preview" button of the
first result with the title "Back Bay National Wildlife Refuge
Water Quality Data".
The Enrich service currently leverages existing infrastructure and
standards:
- To store "data type" records, the Enrich service uses
a "data type registry" behind the scenes. We have
registered several hundred concepts (e.g., pressure, temperature,
length) and measurement units (e.g., Fahrenheit, Celsius)
in that data type registry. The National Library of Medicine has made
those concepts and measurement units available for public
use.
- To find and display datasets and their corresponding data type
records on the Enrich user interface by dataset title, data source, and so on, a metadata
registry is also used. We harvested the metadata records that
data.gov provided along with the datasets.
- In order to link datasets, data type records, and
metadata records together, handles (from the Handle System) are
used.
The software at
cordra.org is used
to manage both the data type records and the metadata records.
More information about handles can be found at
handle.net.
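To make the linking idea concrete, here is a small in-memory sketch of how handles might tie a dataset, its data type record, and its metadata record together. It does not call the Handle System or Cordra: the dictionary stands in for the registries, and the handle prefix "12345", the field names, and the file name are hypothetical.

```python
# In-memory stand-in for the registries, keyed by handle.
# Every handle, field name, and file name here is made up for illustration.
REGISTRY = {
    "12345/dataset-1": {
        "kind": "dataset",
        "location": "water_quality.csv",
    },
    "12345/type-1": {
        "kind": "data_type",
        "columns": ["reading", "sampled_on"],
    },
    "12345/meta-1": {
        "kind": "metadata",
        "title": "Back Bay National Wildlife Refuge Water Quality Data",
        "dataset": "12345/dataset-1",    # link by handle
        "data_type": "12345/type-1",     # link by handle
    },
}


def resolve(handle, registry=REGISTRY):
    """Look up a record by handle; stands in for real handle resolution."""
    try:
        return registry[handle]
    except KeyError:
        raise LookupError(f"unknown handle: {handle}") from None


def find_by_title(text, registry=REGISTRY):
    """Return (dataset, data type) record pairs whose title contains `text`."""
    matches = []
    for record in registry.values():
        if record["kind"] != "metadata":
            continue
        if text.lower() in record["title"].lower():
            matches.append((resolve(record["dataset"], registry),
                            resolve(record["data_type"], registry)))
    return matches


for dataset, data_type in find_by_title("back bay"):
    print(dataset["location"], data_type["columns"])
```

Because the metadata record stores only handles, the dataset and its data type record can live in different systems and still be found from a single title search.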
Currently, the data type records are represented in JSON and could
easily be represented in, say, RDF. The data type record
structure does not conform to any existing standard, but efforts
to evaluate relevant existing standards are ongoing.
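As an illustration of why moving from JSON to an RDF-style representation is straightforward, the sketch below flattens a data type record into subject-predicate-object triples. The record layout, field names, and handle are hypothetical, since the actual Enrich record structure is not reproduced here, and the output is only triple-shaped data, not valid RDF with IRIs and a vocabulary.

```python
import json

# A hypothetical data type record; every field name is an assumption.
RECORD_JSON = """
{
  "id": "12345/type-1",
  "columns": [
    {"name": "reading", "type": "integer", "min": 100, "max": 5000,
     "concept": "temperature", "unit": "Fahrenheit"},
    {"name": "sampled_on", "type": "date", "format": "yyyy-mm-dd"}
  ]
}
"""


def to_triples(record):
    """Flatten a record into (subject, predicate, object) triples."""
    triples = []
    for index, column in enumerate(record["columns"], start=1):
        subject = f"{record['id']}#column{index}"
        # Link the record to the column, then describe the column itself.
        triples.append((record["id"], "hasColumn", subject))
        for key, value in column.items():
            triples.append((subject, key, value))
    return triples


for triple in to_triples(json.loads(RECORD_JSON)):
    print(triple)
```

Each column becomes its own subject, so syntactic details (type, range, format) and semantic details (concept, unit) end up as uniform statements about that column.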
For more information about Enrich and for additional background
information, click
here.