Data Management @ Pitt

Learn about the principles of (research) data management.

What is data quality?

Data quality is, simply put, "data that are fit for use by data consumers" (Wang and Strong 1996). Note that "perfection" is not a goal. Many frameworks for data quality have been proposed, such as Total Data Quality Management (TDQM) and Data Quality Assessment (DQA), as well as frameworks in specific domains such as health care or Web data. Different frameworks emphasize different dimensions (characteristics/attributes) of the data. These dimensions are "highly context dependent and their relevancy and importance can vary between organizations and types of data" (Cichy and Rass 2019). Across the many frameworks, Cichy and Rass identify the following dimensions as the most common in defining data quality:

  • Completeness (sufficient breadth, depth, and scope)
  • Accuracy (correct, reliable, certified)
  • Timeliness (age of data is appropriate for the task at hand)
  • Consistency (same format / compatibility with pre-existing data)
  • Accessibility (available, easily retrievable)

These dimensions give us areas of focus when assessing and attempting to improve data quality. Consequences of poor data quality may include inaccuracy, greater uncertainty, and even misguided decision making.
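
As a rough illustration of how two of these dimensions might be turned into something measurable, the sketch below computes a simple completeness score (share of non-missing values in a field) and flags records that may be too old for the task at hand. The records, field names, and the 365-day threshold are all hypothetical; which measures make sense depends on your own data and context.

```python
from datetime import date

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "temp_c": 21.5, "collected": date(2024, 5, 1)},
    {"id": 2, "temp_c": None, "collected": date(2024, 5, 2)},
    {"id": 3, "temp_c": 19.0, "collected": date(2023, 1, 15)},
    {"id": 4, "temp_c": 22.1, "collected": None},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def stale_ids(rows, field, as_of, max_age_days):
    """IDs of rows whose date in `field` is older than `max_age_days`."""
    return [
        row["id"]
        for row in rows
        if row.get(field) is not None
        and (as_of - row[field]).days > max_age_days
    ]


# Completeness: 3 of the 4 records have a temperature.
temp_completeness = completeness(records, "temp_c")

# Timeliness: assumes (for illustration) that data older than one year
# is too old for the task.
old_rows = stale_ids(records, "collected", date(2024, 6, 1), 365)
```

Scores like these do not say whether the data are "good" in any absolute sense; they only make it easier to compare the data against whatever fitness-for-use threshold your group has agreed on.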

Data quality tips

  1. Plan ahead how you will represent and store data (data modeling), using tidy data principles.
  2. Double-check hand-entered data. If there are large quantities to check, or if someone else entered the data, spot check a random sample. Consider also checking extreme outlier values.
  3. Keep an untouched copy of the data, so that you can always recover the original observations.
  4. Document your workflows, especially for data cleaning and preprocessing, and save any code that you run on your data.
  5. Implement validation checks, for example, ensuring that all values in a column are numeric, or that all values are within an expected range.
  6. Assign review and quality checking of data to a team member.
  7. Decide how the dimensions of data quality described in the above section apply to your situation, and establish relevant data governance policies for your group.
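
Tips 2 and 5 can be sketched in a few lines of code. The example below is a minimal illustration, not a complete validation tool: it checks that every value in a column is numeric and within an expected range, and it draws a random sample of row numbers for manual spot checking. The column of heights, the range limits, and the function names are hypothetical.

```python
import random


def validate_column(values, low, high):
    """Return (index, value, reason) for entries that fail either check."""
    problems = []
    for i, value in enumerate(values):
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Covers missing values (None) and text that is not a number.
            problems.append((i, value, "not numeric"))
            continue
        if not (low <= number <= high):
            problems.append((i, value, "out of range"))
    return problems


def spot_check_sample(n_rows, k, seed=None):
    """Pick up to k distinct row indices at random for manual review."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), min(k, n_rows)))


# Hypothetical hand-entered heights in centimeters.
heights_cm = ["172", "165.5", "abc", "1700", None, "158"]

# Validation check: numeric and within an assumed plausible range.
problems = validate_column(heights_cm, low=50, high=250)

# Random spot check: rows to compare against the original source by hand.
to_review = spot_check_sample(len(heights_cm), k=3, seed=42)
```

Passing a fixed `seed` makes the sample reproducible, which helps when documenting the workflow (tip 4). Running checks like these on a copy of the data leaves the original observations untouched (tip 3).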

More resources for data quality