Metadata & Discovery @ Pitt

This guide will assist researchers in understanding the basics of metadata and selecting appropriate metadata standards.

What are taxonomies and controlled vocabularies?

Although the terms are sometimes used interchangeably, taxonomies and controlled vocabularies are both controlled lists of terms used to organize information. A controlled vocabulary is "an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing and searching" (Harpring). Controlled vocabularies can be used in any setting that collects and uses information, such as academic research, libraries, corporations, and governmental organizations. Common types of controlled vocabularies include term lists, authority files, and thesauri.

Because controlled vocabularies require the use of predefined terms, they can be challenging to adopt and apply, but using controlled vocabularies during data or metadata creation supports consistency and accuracy. There are well-established vocabularies for personal and corporate names, geographic names, topics, concepts, resource types and genres, and languages.

Sources:

Patricia Harpring. "What Are Controlled Vocabularies?" in Introduction to Controlled Vocabularies

Heather Hedden. "Taxonomies and controlled vocabularies best practices for metadata" in Journal of Digital Asset Management (Oct 2010, vol. 6, no. 5) 

Types of Controlled Vocabularies

There are many types of controlled vocabularies, from simple term lists to complex machine-readable ontologies. A few of the most commonly used types are described below. Understanding the type and scope of a vocabulary is useful when selecting appropriate controlled vocabularies for a project.

Term Lists

A term list, sometimes called a pick list, is the simplest type of controlled vocabulary. To create a term list, an agreed-upon list of words and/or phrases is developed to identify a specific characteristic of a person, event, object, or other "thing". To use a term list, the user must select a term that appears in the list. No synonyms or related terms are identified; it is just a simple list of terms. Term lists usually work best when only a small number of terms is needed, such as lists of file formats or object types.
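As a rough illustration of how a term list constrains data entry, the sketch below checks a value against a pick list. The list contents and the function name `validate_term` are hypothetical and chosen only for this example; a real project would use its own agreed-upon terms.

```python
# Hypothetical pick list of file formats; the terms are illustrative only.
FILE_FORMATS = frozenset({"PDF", "TIFF", "JPEG", "CSV", "XML"})


def validate_term(value, term_list):
    """Return the value if it appears in the term list; otherwise raise ValueError."""
    if value not in term_list:
        raise ValueError(f"{value!r} is not in the term list")
    return value
```

Because a term list has no synonyms or variants, the check is an exact match: in this sketch "PDF" is accepted, while "pdf" or "Portable Document Format" would be rejected.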

Authority Files

The next level of complexity in controlled vocabularies is the authority file. Like term lists, authority files provide a consistent list of terms to describe different kinds of resources, but they also include cross-references from variant or alternate terms. Authority files often include other contextual or biographical information to assist users with disambiguation. Providing the preferred term along with its alternate versions gives the metadata creator more context. Additionally, alternate and variant terms can be indexed in databases, allowing users to find the appropriate resources even if they do not know the preferred term. Authority files are commonly used to identify proper forms of names, as in the Library of Congress Name Authority File.
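One possible way to picture an authority file is as a set of records, each with a preferred term, variant terms that point to it, and a contextual note. The sketch below is a simplified assumption of that structure: the class `AuthorityRecord`, the functions `build_index` and `resolve`, and the sample record are all illustrative and are not taken from any actual authority file format.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    preferred: str                                 # the authorized form of the term
    variants: list = field(default_factory=list)   # cross-referenced alternate forms
    note: str = ""                                 # contextual information for disambiguation


def build_index(records):
    """Map every preferred and variant form (case-insensitively) to its record."""
    index = {}
    for record in records:
        for label in [record.preferred, *record.variants]:
            index[label.casefold()] = record
    return index


def resolve(name, index):
    """Return the preferred form for a name, or None if the name is unknown."""
    record = index.get(name.casefold())
    return record.preferred if record else None


# Illustrative record only; consult the actual authority file for real headings.
RECORDS = [
    AuthorityRecord(
        preferred="Twain, Mark, 1835-1910",
        variants=["Clemens, Samuel Langhorne, 1835-1910"],
        note="American author and humorist.",
    )
]
INDEX = build_index(RECORDS)
```

Indexing the variants alongside the preferred form is what lets a search on an alternate name lead to the same record as a search on the preferred name.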

Taxonomies

In recent years, "taxonomy" has become a generic term for any kind of controlled vocabulary, particularly in business applications. From an information science perspective, a taxonomy is a hierarchical classification or categorization system in which "all the terms belong to a single hierarchical structure and have parent/child or broader/narrower relationships to other terms.²" Taxonomies allow for classification according to a predetermined system. Whereas authority files have preferred terms and variant terms, a taxonomy also includes a hierarchy that designates both broader and narrower terms.
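The broader/narrower relationships in a taxonomy can be sketched as a simple parent map in which every term has at most one parent. The terms and the function names `broader_terms` and `narrower_terms` below are hypothetical, chosen only to show the single-hierarchy structure.

```python
# Hypothetical taxonomy: each term maps to its single parent (None marks the root).
PARENT_OF = {
    "Vehicles": None,
    "Cars": "Vehicles",
    "Bicycles": "Vehicles",
    "Electric cars": "Cars",
}


def broader_terms(term, parent_of):
    """Return the chain of broader terms, from the immediate parent up to the root."""
    chain = []
    current = parent_of[term]
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain


def narrower_terms(term, parent_of):
    """Return the immediate narrower (child) terms in alphabetical order."""
    return sorted(child for child, parent in parent_of.items() if parent == term)
```

In this sketch, the broader terms of "Electric cars" are "Cars" and then "Vehicles", and the narrower terms of "Vehicles" are "Bicycles" and "Cars".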

Thesauri

A thesaurus is a kind of dictionary that represents all the concepts for a specific domain in a consistent manner and labels each concept with a preferred term. Like the previously described types of controlled vocabularies, thesauri contain preferred terms, variant terms, and broader and narrower terms. Additionally, a thesaurus includes related terms, which may or may not be part of the same hierarchical structure as the term. A commonly used thesaurus for describing art, architecture, and material culture objects is the Getty Art & Architecture Thesaurus. For instance, the Getty Art & Architecture Thesaurus entry for "cellular telephones" gives alternate terms such as mobile phone in English, Handy in German, and shǒu in Pinyin Chinese. The entry also provides a definition and a hierarchical categorization of the term.
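To show how a thesaurus entry combines these pieces, the sketch below models a concept with a preferred term, variant terms, and broader, narrower, and related terms. The preferred term and the English and German variants come from the example above; the class `Concept`, the function `find_concept`, the definition text, and the broader and related terms are placeholders assumed for illustration and are not copied from the Getty Art & Architecture Thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str
    variants: dict = field(default_factory=dict)   # variant label -> language
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)    # may sit outside this hierarchy
    definition: str = ""


def find_concept(label, thesaurus):
    """Return the concept whose preferred or variant label matches, ignoring case."""
    key = label.casefold()
    for concept in thesaurus:
        labels = [concept.preferred, *concept.variants]
        if any(key == candidate.casefold() for candidate in labels):
            return concept
    return None


THESAURUS = [
    Concept(
        preferred="cellular telephones",
        variants={"mobile phone": "English", "Handy": "German"},
        broader=["telephones"],              # placeholder, not the actual AAT hierarchy
        related=["telephone networks"],      # placeholder related term
        definition="Placeholder definition for illustration.",
    )
]
```

A lookup on any variant, such as "Handy", leads to the same concept as a lookup on the preferred term, and the related terms give a searcher further places to look beyond the hierarchy.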

Recommended Controlled Vocabularies

General Purpose:

Sciences:

Social and Behavioral Sciences:

Arts and Humanities:

Other Resources on Taxonomies and Controlled Vocabularies