Metadata & Discovery @ Pitt

This guide will assist researchers in understanding the basics of metadata and selecting appropriate metadata standards.

Linked Data and the Semantic Web

Linked data is the basis of the Semantic Web. Berners-Lee et al. describe the Semantic Web as “an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”¹ Information is structured so that computers can draw relationships between resources. Those relationships are known as linked data.

Creating shareable, high-quality structured metadata makes your research more discoverable and easier to link to related datasets.

Interested in learning more? There are many open-access resources for getting started on Linked Data. A select few follow:

  • Linked Data. World Wide Web Consortium (2015)
    • A concise primer by the W3C that includes a brief description of the Semantic Web and examples of linked datasets.
  • Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer (2011)
    • Provides a conceptual and technical introduction to the field of Linked Data.

1. Berners-Lee, Tim, James Hendler, and Ora Lassila. 2001. "The Semantic Web." Scientific American 284 (5): 34-43.

Best Practices for Preparing Your Data

Four Rules for the Semantic Web

Tim Berners-Lee proposes four general guidelines for preparing linked data for the semantic web.

  1. Use URIs as names for things. This includes naming relationships between things. Using real-world URIs in your data is crucial to link your data to other datasets.
  2. Use HTTP (web) URIs so that people can look up those names. Make sure the URIs come from an established controlled vocabulary with an active community that maintains and supports the standard.
  3. When someone looks up a URI, provide useful information, using standards like controlled vocabularies.
  4. Include links to other URIs, so people and computers can discover more things.
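The rules above can be checked mechanically, at least in part. The following is a minimal sketch, not an official validator: it assumes a description is stored as a Python dictionary whose keys name relationships and whose values name things. The helper names (is_http_uri, external_links), the sample record, and its title text are hypothetical; the URIs are real Dublin Core and VIAF identifiers.

```python
from urllib.parse import urlparse


def is_http_uri(name: str) -> bool:
    """Rules 1 and 2: the name is a URI that can be looked up on the web."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def external_links(description: dict, home_host: str) -> list:
    """Rule 4: values that are web URIs hosted somewhere other than home_host."""
    return [
        value
        for value in description.values()
        if is_http_uri(value) and urlparse(value).netloc != home_host
    ]


# Hypothetical record: keys name relationships, values name things.
record = {
    "http://purl.org/dc/terms/type": "http://purl.org/dc/dcmitype/StillImage",
    "http://purl.org/dc/terms/creator": "https://viaf.org/viaf/50566653",
    "http://purl.org/dc/terms/title": "Portrait of an author",  # plain text, not a URI
}

# Every relationship is named with a web URI (rules 1 and 2).
print(all(is_http_uri(key) for key in record))

# Values that point at data maintained elsewhere (rule 4).
print(external_links(record, home_host="example.org"))
```

Rule 3 (returning useful information when a URI is looked up) depends on the server that hosts the URI, so it cannot be checked from the data alone.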

Five-Star Linked Open Data

Even further, he suggests a five-star scheme for Linked Open Data (LOD):

★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ All the above, plus: use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context

 

Source: Berners-Lee, Tim. 2006. Linked Data. Last updated June 18, 2009.
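Because each star builds on the ones below it, a rating can be worked out by checking the criteria in order and stopping at the first one that is not met. The sketch below illustrates that cumulative logic only; the Dataset class and its field names are assumptions made for this example and are not part of the scheme.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical yes/no answers, one per star of the scheme.
    on_web_with_open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool
    links_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars in order; a star only counts if every earlier star is met."""
    criteria = [
        ds.on_web_with_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV file that does not yet use RDF: three stars,
# even though it happens to link to other data.
csv_file = Dataset(True, True, True, False, True)
print(star_rating(csv_file))
```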

Key Terms in Linked Data

Linked Open Data (LOD)

Linked Data which is released under an open license, which does not impede its reuse for free. — Tim Berners-Lee, Linked Data.

Examples of large linked open data sets include DBpedia and Wikidata.

Resource Description Framework (RDF)

A suite of Semantic Web standards developed by the World Wide Web Consortium (W3C). These standards create a structure for making simple statements about resources so that machines can interpret relationships. These statements are called triples, which are subject-predicate-object statements used to describe the relationships between entities in a linked data environment.

Resources: RDF 1.1 Primer

Schema

A set of elements for structuring data (e.g., MARC, MODS, EAD, RDFS).

URI

A URI, or Uniform Resource Identifier, is a unique, controlled term used to identify something.

One type of URI is a URN, or uniform resource name, which is an established, standardized label for a particular entity. The other type of URI is a URL, which provides an internet location for a resource. Machine-interpretable URIs are usually in URL form. These URLs may lead human users to further information about these resources, but not all URIs need to point to a human-readable webpage.

An example of a URI for a still image (picture, map, etc.) from the Dublin Core type vocabulary is http://purl.org/dc/dcmitype/StillImage. Another example gives a URI for the American author named Mark Twain as https://viaf.org/viaf/50566653.
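The difference between a URN and a URL shows up in the scheme, the part of the URI before the first colon. As a rough sketch, Python's standard urllib.parse module can be used to read the scheme. The uri_kind function, its return labels, and the sample URN are illustrative assumptions; the list of URL schemes is deliberately short and not exhaustive.

```python
from urllib.parse import urlparse


def uri_kind(uri: str) -> str:
    """Classify a string by its URI scheme (simplified for illustration)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in ("http", "https", "ftp"):
        return "URL"
    # Anything else with a scheme is still a URI; no scheme means a plain label.
    return "other" if scheme else "not a URI"


print(uri_kind("http://purl.org/dc/dcmitype/StillImage"))
print(uri_kind("https://viaf.org/viaf/50566653"))
print(uri_kind("urn:isbn:0451450523"))  # sample URN, chosen for illustration
print(uri_kind("Mark Twain"))
```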

Triples

Also known as Semantic Triples, triples are subject-predicate-object statements used to describe the relationships between entities in a linked data environment. They are the building blocks of Linked Data. For example, to describe a book titled "Some Book" written by an author named Jane Doe, the triple may be something like: Jane Doe (subject) is the author of (predicate) "Some Book" (object). In the Semantic Web, each component of a triple is usually given as a URI.
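The Jane Doe example can be sketched in code as a three-part record. This is an illustration of the data structure only, not how any particular RDF library stores triples. All example.org URIs, the second book, and the objects helper are made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs standing in for the person, the relationship, and the books.
JANE_DOE = "http://example.org/person/Jane_Doe"
AUTHOR_OF = "http://example.org/relation/author"
SOME_BOOK = "http://example.org/books/Some_Book"
OTHER_BOOK = "http://example.org/books/Other_Book"

# A graph is just a collection of triples.
graph = {
    Triple(JANE_DOE, AUTHOR_OF, SOME_BOOK),
    Triple(JANE_DOE, AUTHOR_OF, OTHER_BOOK),
}


def objects(triples, subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return sorted(
        t.obj for t in triples if t.subject == subject and t.predicate == predicate
    )


# Which books is Jane Doe the author of?
print(objects(graph, JANE_DOE, AUTHOR_OF))
```

Because every component is a URI, a second dataset that uses the same URI for Jane Doe can be merged into this graph without any name matching.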

Semantic Web

Considered the next stage of development after the World Wide Web, the Semantic Web is a vision of a World Wide Web of linked data. In the Semantic Web, all data on the Web is structured and machine-readable. This enables computers to infer relationships between resources, increasing the means for human discovery of new realms of knowledge. One contemporary precursor to the Semantic Web is the Google Knowledge Panel, which brings together data from many sources into a compact and simple info box.

Turtle

Terse RDF Triple Language (better known as Turtle) is a syntax and format for storing and representing data in the Resource Description Framework (RDF) model. Data in Turtle format is presented in triple form, using URIs rather than words to represent values. For instance, information about a book could be presented in field/value form or in Turtle format:

Tabular format: While machines may be able to store this information, this format is only interpretable by humans.

Title: Adventures of Huckleberry Finn.

Author: Mark Twain

Turtle format: While not as easy for humans to read in its raw form, Turtle is interpretable by machines, which can infer enough from it to display the information in a more user-friendly format.

 <http://example.org/person/Mark_Twain>
    <http://example.org/relation/author>
    <http://example.org/books/Huckleberry_Finn> .
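To show what "interpretable by machines" means in practice, the sketch below reads the statement above back into its three URIs and writes it out again on one line. It is a deliberately limited illustration: it handles only a single statement made of three full URIs in angle brackets and none of Turtle's other features (prefixes, literals, lists). The function names are made up for this example, and real projects would normally rely on an established RDF library instead.

```python
import re

# One statement: three <...> URIs separated by whitespace, ended by a period.
_STATEMENT = re.compile(
    r"\s*<([^<>\s]+)>\s+<([^<>\s]+)>\s+<([^<>\s]+)>\s*\.\s*"
)


def parse_statement(text: str) -> tuple:
    """Split a simple three-URI statement into (subject, predicate, object)."""
    match = _STATEMENT.fullmatch(text)
    if match is None:
        raise ValueError("not a simple three-URI statement")
    return match.groups()


def to_turtle(subject: str, predicate: str, obj: str) -> str:
    """Write a triple of URIs as a one-line statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."


statement = """
 <http://example.org/person/Mark_Twain>
    <http://example.org/relation/author>
    <http://example.org/books/Huckleberry_Finn> .
"""

subject, predicate, obj = parse_statement(statement)
print(subject)
print(predicate)
print(obj)
print(to_turtle(subject, predicate, obj))
```

The tabular form ("Author: Mark Twain") would fail this parser, which is the point: without URIs and a defined syntax, a program has nothing unambiguous to read.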

Other Resources for Linked Data and the Semantic Web