ENRICH YOUR DATASETS

We generate descriptive details useful for dataset consumers

Learn more about the Enrich service here.

SHARE

Share your datasets with our service to improve the consumer experience

ENRICH

Review and update the descriptive details that our service generated automatically

CONSTRUCT

Construct ideal datasets by transforming values or by assembling one or more enriched datasets

Share your datasets.

Specify a link to pull the dataset from

Name

URL

Create Dataset

Share another dataset

Processed Dataset.

We have successfully processed the shared dataset.
You can edit the descriptive details below by clicking on the dataset preview.

Find datasets we enriched.

Search is powered by Cordra.

Search Results (showing results from 3 matches)


Construct your ideal dataset.

Datasets selected for construction from above are shown here.

The Dataset you Selected

You have selected this dataset for merging with other datasets.

Compatible Sets

Compatible Datasets

We have identified datasets that are compatible with the dataset that you have selected.
Select a dataset below and click the Merge button to see a few rows of the merged dataset.

We could not identify any datasets in our system that are compatible with the dataset that you have selected. If you believe this is an error, please let us know by clicking here.

Merged Dataset

Results of merged dataset


Background

The Enrich service is primarily built to show:

how adding certain kinds of descriptive details streamlines data consumer activities, and
how certain descriptive details can be added automatically to streamline data producer activities.

Currently, selected tabular datasets from data.gov are harvested and processed automatically to demonstrate the value of adding descriptive details that are useful to consumers. That automated process 1) identifies the column names in those datasets and 2) identifies the syntactic nature of the values in each column, e.g., that column 1 is an integer with values ranging between 100 and 5000, that column 2 is a date in the format yyyy-mm-dd, and so on. We call the records that store the generated descriptions "data type" records. Those data type records can be manually edited or enhanced further. For example, semantic information could be added, such as that column 1 is not just an integer but also a "temperature expressed in Fahrenheit".
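The kind of syntactic profiling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the Enrich service runs: the function names `profile_column` and `profile_table`, the output keys, and the second date format are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical candidate formats; the text above only mentions yyyy-mm-dd.
DATE_FORMATS = {"yyyy-mm-dd": "%Y-%m-%d", "mm/dd/yyyy": "%m/%d/%Y"}


def profile_column(values):
    """Guess the syntactic type of one column from its string values."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return {"type": "empty"}
    # Integer column: every value parses as an int; record the observed range.
    try:
        numbers = [int(v) for v in values]
        return {"type": "integer", "min": min(numbers), "max": max(numbers)}
    except ValueError:
        pass
    # Date column: every value parses with the same candidate format.
    for label, fmt in DATE_FORMATS.items():
        try:
            for v in values:
                datetime.strptime(v, fmt)
            return {"type": "date", "format": label}
        except ValueError:
            continue
    return {"type": "string"}


def profile_table(csv_text):
    """Return a {column name: profile} mapping for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: profile_column([(row.get(name) or "") for row in rows])
        for name in (reader.fieldnames or [])
    }
```

Calling `profile_table` on a small CSV string returns one profile per column, which is roughly the syntactic half of what a data type record would hold; semantic details such as "temperature expressed in Fahrenheit" would still have to be added by a person.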

To view a semantically enhanced example, search for "back bay" in the search bar, and click on the "Dataset Preview" button of the first result with the title "Back Bay National Wildlife Refuge Water Quality Data".

The Enrich service currently builds on existing infrastructure and standards:

To store "data type" records, the Enrich service uses a "data type registry" behind the scenes. We have registered several hundred concepts (e.g., pressure, temperature, length, etc.) and measurement units (e.g., Fahrenheit, Celsius, etc.) in that data type registry. The National Library of Medicine has made those concepts and measurement units available for public consumption.
To let the Enrich user interface find and display datasets and their data type records by dataset title, data source, etc., a metadata registry is also used. We harvested the metadata records that data.gov provided along with the datasets.
To link datasets, data type records, and metadata records together, handles (from the Handle System) are used.

The software at cordra.org is used for managing both the data type records and the metadata records. More information about handles can be found at handle.net.

Currently, the data type records are represented in JSON, but they could just as easily be represented in another format, such as RDF. The data type record structure does not conform to any existing standard, but efforts to evaluate existing standards are ongoing.
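To make the JSON-versus-RDF point concrete, here is a minimal Python sketch. The actual record structure is not documented on this page, so every field name, the record identifier, and the `https://example.org/enrich/` namespace below are placeholders invented for illustration; the output is a list of N-Triples-style lines, not a validated RDF serialization.

```python
import json

# Hypothetical data type record; the real Enrich layout may differ entirely.
record = {
    "id": "example/datatype-1",
    "columns": [
        {"name": "temp", "type": "integer", "min": 100, "max": 5000,
         "concept": "temperature", "unit": "Fahrenheit"},
        {"name": "sampled_on", "type": "date", "format": "yyyy-mm-dd"},
    ],
}

BASE = "https://example.org/enrich/"  # placeholder namespace, not a real vocabulary


def to_triples(rec):
    """Flatten the record into subject-predicate-object lines."""
    subject = f"<{BASE}{rec['id']}>"
    triples = []
    for col in rec["columns"]:
        col_node = f"<{BASE}{rec['id']}/column/{col['name']}>"
        triples.append(f"{subject} <{BASE}hasColumn> {col_node} .")
        for key, value in col.items():
            # json.dumps yields a double-quoted, escaped string literal.
            literal = json.dumps(str(value))
            triples.append(f"{col_node} <{BASE}{key}> {literal} .")
    return triples


if __name__ == "__main__":
    print(json.dumps(record, indent=2))   # the JSON form
    print("\n".join(to_triples(record)))  # the same facts as triples
```

Both forms carry the same facts; the triple form simply names each column as a resource so that other records could refer to it.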

For more information about Enrich and for additional background information, click here.