Pitt community: write to Digital Scholarship Services or use our AskUs form
Pitt health sciences researchers: contact Data Services, Health Sciences Library System
Dominic Bordelon, dbordelon@pitt.edu
"Data Sharing @ Pitt" by University of Pittsburgh Library System is licensed for reuse under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
If you share tabular data such as spreadsheets, you should also describe what the columns in your spreadsheet(s) mean. This is accomplished with a data dictionary. A data dictionary is a separate table or spreadsheet, with one row for each column in the data of interest.
For example, if you have a spreadsheet with two columns, treatment_group and tumor_size, then the corresponding data dictionary should have two rows, one for treatment_group and one for tumor_size. Each row then describes its variable: its human-readable meaning, its data type, and possibly other attributes. This kind of information (information about other data) is also called metadata.
Figure: An example data dictionary creation process. Data adapted from WPRDC.
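If your data is already in a CSV file, a script can draft the skeleton of a data dictionary for you: one row per column, with a guessed data type and an empty description for a person to fill in. The sketch below is illustrative only. The field names variable, description, and data_type, the three type labels, and the sample values are assumptions, not a required format.

```python
import csv
import io


def infer_type(values):
    """Guess a simple data type label from a column's non-empty string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "unknown"

    def parses(cast):
        # True only if every value can be converted with the given function.
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if parses(int):
        return "integer"
    if parses(float):
        return "number"
    return "text"


def draft_data_dictionary(csv_text):
    """Return one data dictionary row per column; descriptions are left blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        column = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "description": "",  # to be written by someone who knows the data
            "data_type": infer_type(column),
        })
    return entries


# Hypothetical data using the two columns from the example above.
data = """treatment_group,tumor_size
control,2.4
drug_a,1.9
drug_b,3.1
"""

for entry in draft_data_dictionary(data):
    print(entry)
```

Automatic type guessing is only a starting point. Meaning, units, and allowed values still have to come from the people who collected the data.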
For any categorical variables in your data, you should also describe the levels (categories) each variable can take and what they mean. Information about categorical variables, such as each level's label, meaning, and assignment criteria, is collected in a codebook, which may be included within the data dictionary (if the categories are simple) or kept as a separate document. A dedicated codebook document may go into much more detail and is sometimes encoded in a machine-readable format such as XML.
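A simple codebook can be thought of as a mapping from each level's label to its meaning. The sketch below assumes a hypothetical codebook for treatment_group, with invented level meanings. It shows two uses: rendering a short codebook as a single text cell for a data dictionary, and finding values in the data that the codebook does not document.

```python
# Hypothetical codebook for the categorical variable treatment_group:
# each level (category label) maps to its meaning.
TREATMENT_GROUP_CODEBOOK = {
    "control": "Received placebo",
    "drug_a": "Received drug A",
    "drug_b": "Received drug B",
}


def levels_summary(codebook):
    """Render a short codebook as one text cell, for embedding in a data dictionary."""
    return "; ".join(f"{label} = {meaning}" for label, meaning in codebook.items())


def undocumented_levels(values, codebook):
    """Return the distinct values that appear in the data but have no codebook entry."""
    return sorted({v for v in values if v not in codebook})


print(levels_summary(TREATMENT_GROUP_CODEBOOK))
# control = Received placebo; drug_a = Received drug A; drug_b = Received drug B

observed = ["control", "drug_a", "drug_b", "drug_c", "control"]
print(undocumented_levels(observed, TREATMENT_GROUP_CODEBOOK))  # ['drug_c']
```

A codebook with assignment criteria or many levels would not fit in one cell, which is when a separate codebook document makes more sense.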
Each of your tabular data sets (i.e., each spreadsheet) should have an accompanying data dictionary, although the same data dictionary can describe multiple files so long as they all follow the same format.
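When one data dictionary covers several files, it helps to confirm that every file really uses the same columns. The sketch below compares each file's header row with the variables listed in a shared dictionary. The file names, their contents, and the variable list are hypothetical, and the check looks only at column names (not column order or data types).

```python
import csv
import io


def header_mismatches(csv_text, dictionary_variables):
    """Compare a file's column names with the variables a data dictionary describes.

    Returns (missing, extra): dictionary variables absent from the file,
    and file columns the dictionary does not describe.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [v for v in dictionary_variables if v not in header]
    extra = [c for c in header if c not in dictionary_variables]
    return missing, extra


# Variables listed in a hypothetical shared data dictionary.
variables = ["treatment_group", "tumor_size"]

# Hypothetical contents of two files meant to share that dictionary.
files = {
    "site1.csv": "treatment_group,tumor_size\ncontrol,2.4\n",
    "site2.csv": "treatment_group,tumor_size_mm\ndrug_a,19\n",
}

for name, text in files.items():
    missing, extra = header_mismatches(text, variables)
    if missing or extra:
        print(f"{name}: missing {missing}, undocumented {extra}")
    else:
        print(f"{name}: matches the data dictionary")
```

Here site2.csv would be flagged because its renamed column no longer matches the dictionary, a sign that it needs its own dictionary or a corrected header.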
Data dictionaries and codebooks help others to interpret your data post-publication, but they can also be useful during the life of a longer-running and/or collaborative project, where such documentation facilitates consistency across files and team members. For this reason, you may want to consider developing and maintaining data dictionaries and codebooks early in (and throughout) your data collection process.
Typical fields (columns) in a data dictionary include: