Linked data is the basis of the Semantic Web. Berners-Lee et al. describe the Semantic Web as “an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”¹ Information is structured so that computers can draw relationships between resources. Those relationships are what is known as linked data.
Creating shareable, high-quality structured metadata makes your research more discoverable and easier to link to related datasets.
Interested in learning more? There are many open-access resources for getting started on Linked Data. A select few follow:
1. Berners-Lee, Tim, James Hendler, and Ora Lassila. 2001. "The Semantic Web." Scientific American 284 (5): 34-43.
Four Rules for the Semantic Web
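Tim Berners-Lee proposes four general guidelines for preparing linked data for the Semantic Web. In summary:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that people can discover more things.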
Five-Star Linked Open Data
Even further, he suggests a five-star scheme for Linked Open Data (LOD):
★ | Available on the web (whatever format) but with an open licence, to be Open Data |
★★ | Available as machine-readable structured data (e.g. excel instead of image scan of a table) |
★★★ | as (2) plus non-proprietary format (e.g. CSV instead of excel) |
★★★★ | All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff |
★★★★★ | All the above, plus: Link your data to other people’s data to provide context |
Source: Berners-Lee, Tim. 2006. Linked Data. Last updated June 18, 2009.
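The scheme is cumulative: each star assumes all of the earlier ones. The short Python sketch below illustrates that reading. The criterion names and the `lod_stars` function are assumptions made for this example, not part of Berners-Lee's text, and the dataset is hypothetical.

```python
# The five criteria in star order; the names paraphrase the scheme above
# and are assumptions made for this sketch.
CRITERIA = [
    "open_licence",
    "machine_readable",
    "non_proprietary_format",
    "w3c_standards",
    "linked_to_other_data",
]

def lod_stars(dataset: dict) -> int:
    """Count stars cumulatively, stopping at the first criterion not met."""
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars

# A hypothetical dataset published as CSV under an open licence.
csv_dataset = {
    "open_licence": True,
    "machine_readable": True,
    "non_proprietary_format": True,
    "w3c_standards": False,
    "linked_to_other_data": False,
}
print(lod_stars(csv_dataset))
```

Under these assumptions, a dataset that skips an earlier criterion earns no credit for later ones, which matches the "all the above, plus" wording of the scheme.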
Linked Open Data (LOD)
Linked Data which is released under an open license, which does not impede its reuse for free. — Tim Berners-Lee, Linked Data.
Examples of large linked open data sets include DBpedia and Wikidata.
Resource Description Framework (RDF)
A suite of Semantic Web standards developed by the World Wide Web Consortium (W3C). These standards create a structure for making simple statements about resources so that machines can interpret relationships. These statements are called triples, which are subject-predicate-object statements used to describe the relationships between entities in a linked data environment.
Resources: RDF 1.1 Primer
Schema
A set of elements for structuring data (e.g., MARC, MODS, EAD, RDFS).
URI
A URI, or Uniform Resource Identifier, is a unique, controlled term used to identify something.
One type of URI is a URN, or uniform resource name, which is an established, standardized label for a particular entity. The other type of URI is a URL, which provides an internet location for a resource. Machine-interpretable URIs are usually in URL form. These URLs may lead human users to further information about these resources, but not all URIs need to point to a human-readable webpage.
An example of a URI for a still image (picture, map, etc.) from the Dublin Core type vocabulary is http://purl.org/dc/dcmitype/StillImage. Another example gives a URI for the American author named Mark Twain as https://viaf.org/viaf/50566653.
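To make the URN/URL distinction concrete, here is a minimal Python sketch that sorts URIs by their scheme using only the standard library. The function name `classify_uri` and the short list of schemes treated as URLs are assumptions made for this illustration, and the ISBN URN is a made-up placeholder.

```python
from urllib.parse import urlparse

# Schemes treated as locators in this sketch (an assumption, not an
# exhaustive list).
URL_SCHEMES = {"http", "https", "ftp"}

def classify_uri(uri: str) -> str:
    """Return 'URN', 'URL', or 'other URI' based on the URI scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in URL_SCHEMES:
        return "URL"
    return "other URI"

examples = [
    "http://purl.org/dc/dcmitype/StillImage",
    "https://viaf.org/viaf/50566653",
    "urn:isbn:0000000000",  # hypothetical URN, for illustration only
]
for uri in examples:
    print(classify_uri(uri), uri)
```

The sketch only inspects the scheme; it does not check whether a URL actually resolves to a webpage.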
Triples
Also known as Semantic Triples, triples are subject-predicate-object statements used to describe the relationships between entities in a linked data environment. They are the building blocks of Linked Data. For example, to describe a book titled "Some Book" written by an author named Jane Doe, the triple may be something like Jane Doe is the author of "Some Book." In the Semantic Web, each component of a triple is usually given as a URI.
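As a rough illustration, the Jane Doe example can be modeled in Python as a three-part record and written out as one N-Triples line. All three URIs are hypothetical addresses under example.org, and the `Triple` class and `to_ntriples` helper are names invented for this sketch.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" is a Python builtin name, so the field is called obj

# Hypothetical URIs under example.org, used only for illustration.
jane = "http://example.org/person/Jane_Doe"
author_of = "http://example.org/relation/author"
some_book = "http://example.org/books/Some_Book"

statement = Triple(jane, author_of, some_book)

def to_ntriples(t: Triple) -> str:
    """Serialize one all-URI triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."

print(to_ntriples(statement))
```

The helper handles only triples whose three parts are URIs; literal values such as a title string would need additional quoting rules.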
Semantic Web
Considered the next stage of development after the World Wide Web, the Semantic Web is a vision of a World Wide Web of linked data. In the Semantic Web, all data on the Web is structured and machine-readable. This enables computers to infer relationships between resources, giving people more ways to discover new realms of knowledge. One contemporary precursor to the Semantic Web is the Google Knowledge Panel, which brings together data from many sources into a compact and simple info box.
Turtle
Terse RDF Triple Language (better known as Turtle) is a syntax and format for storing and representing data in the Resource Description Framework (RDF) model. Data in Turtle format is presented in triple form, using URIs rather than words to represent values. For instance, information about a book could be presented in field/value form or in Turtle format:
Field/value format: While machines may be able to store this information, this format is only interpretable by humans.
Title: Adventures of Huckleberry Finn
Author: Mark Twain
Turtle format: While not as easy for humans to understand in its raw form, Turtle is interpretable by machines, which can infer enough to display the information in a more human-friendly format.
<http://example.org/person/Mark_Twain>
<http://example.org/relation/author>
<http://example.org/books/Huckleberry_Finn> .
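To suggest how a machine might turn this statement back into something readable, the Python sketch below pulls the three URIs out of the text and derives a display label from each. It is not a Turtle parser: it assumes a single statement written only with full `<...>` URIs, and the labeling rule (last path segment, underscores replaced by spaces) is a naive assumption made for this example. Real applications would normally rely on an RDF library such as rdflib.

```python
import re
from typing import Tuple

turtle = """
<http://example.org/person/Mark_Twain>
<http://example.org/relation/author>
<http://example.org/books/Huckleberry_Finn> .
"""

def parse_simple_statement(text: str) -> Tuple[str, str, str]:
    """Extract the three URIs from one statement written only with <...> URIs."""
    uris = re.findall(r"<([^>]*)>", text)
    if len(uris) != 3 or not text.strip().endswith("."):
        raise ValueError("expected exactly three URIs followed by a period")
    return uris[0], uris[1], uris[2]

def label(uri: str) -> str:
    """Derive a display label from the last path segment (a naive assumption)."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ")

subject, predicate, obj = parse_simple_statement(turtle)
print(f"{label(subject)} | {label(predicate)} | {label(obj)}")
```

In practice, display labels usually come from additional triples that attach a human-readable name to each URI rather than from the URI text itself.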