Data Sharing @ Pitt

Learn about the principles and how-to of sharing academic research data.

Why prep your data?

If you want to share data effectively, there is some effort involved. So, why bother? Here are a few reasons to consider:

  • Findings can be validated or verified more easily with prepared data (reproducibility).
  • Methods that are fully documented can be applied in new studies (replicability).
  • Well-prepared data and well-documented processes may be more persuasive to reviewers and readers because the PI’s evidence is readily available and legible.
  • The PI may save time by receiving fewer requests for interpretation from colleagues who want to validate the findings or reuse the data, because the needed information is already available online.
  • The PI is likely to reference and/or reuse the data themselves at some point in the future; prepared data are easier for the creator to make sense of after a long time away.
  • Clear definitions of the variables in two separate datasets allow a researcher to assess whether the data can be integrated/combined and what transformations may be needed (e.g., different units).
  • Risk of information loss in the long term (due to accident, misremembering, etc.) is greatly reduced, because the PI has transferred their knowledge about the data into permanent documentation.

Data preparation activities

To prepare your data for sharing, there are a variety of activities to consider.

Each activity is described in detail in the linked subpage, also available in the navigation bar.

Getting this work done

The material discussed above may seem like a lot to deal with at the end of a project. For this reason, we recommend starting activities such as file organization and naming and data dictionary development very early in the project, then practicing the established conventions and making periodic updates (e.g., adding a new row to the data dictionary when a new column is added to the data). You may need to revise as you go, which is also OK, as long as all parts of the project are consistent with each other at any given moment.
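
One way to keep a data dictionary in step with the data is to run a small check from time to time. The Python sketch below compares the column names in a CSV data file against the rows of a CSV data dictionary. It assumes, purely for illustration, that the dictionary stores variable names in a column called "variable"; the function name and the sample data are hypothetical, so adjust them to your own conventions.

```python
import csv
import io


def dictionary_gaps(data_file, dictionary_file, name_field="variable"):
    """Compare data columns with data dictionary entries.

    Both arguments are open text files (or file-like objects).
    Returns (missing, stale):
      missing = columns in the data with no dictionary row
      stale   = dictionary rows with no matching data column
    """
    # The first row of the data file is assumed to be the header.
    data_columns = next(csv.reader(data_file))
    # name_field is an assumed column name in the dictionary.
    documented = {row[name_field] for row in csv.DictReader(dictionary_file)}
    missing = [col for col in data_columns if col not in documented]
    stale = sorted(documented - set(data_columns))
    return missing, stale


# Small in-memory example; real use would pass files from open(...).
data = io.StringIO("id,height_cm,weight_kg\n1,170,65\n")
dictionary = io.StringIO(
    "variable,description\n"
    "id,Participant ID\n"
    "height_cm,Height in centimeters\n"
    "age,Age in years\n"
)
missing, stale = dictionary_gaps(data, dictionary)
print("Undocumented columns:", missing)
print("Dictionary rows without a column:", stale)
```

A check like this only reports differences. Deciding what each variable means, and writing that down, is still up to you.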

If you are handy with Python, R, or bash scripting, you can also automate some of this work, such as renaming large batches of files to fit the convention. However, ⚠ make a copy of your project first, and ensure that the script works perfectly on the copy, before applying it to your real files. There is a very real risk of data loss here! 💀 Proceed with caution.
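
As a minimal sketch of such a script, the Python example below plans a batch rename first and changes nothing until you ask it to. The naming convention used here (lowercase, with runs of other characters replaced by an underscore) is only an assumption for illustration, as are the function names and the folder name in the usage lines. Replace them with your own, and run the script on a copy of your project.

```python
import re
from pathlib import Path


def plan_renames(folder):
    """Return (old_path, new_path) pairs without touching any files.

    Assumed convention: lowercase names; any run of characters other
    than a-z, 0-9, dot, underscore, or hyphen becomes one underscore.
    """
    plans = []
    for old in sorted(Path(folder).iterdir()):
        if not old.is_file():
            continue  # leave subfolders alone
        new_name = re.sub(r"[^a-z0-9._-]+", "_", old.name.lower())
        if new_name != old.name:
            plans.append((old, old.with_name(new_name)))
    return plans


def apply_renames(plans, dry_run=True):
    """Print the planned renames; perform them only if dry_run is False.

    Stops before renaming anything if two files would end up with the
    same name or if a target name is already taken. Returns the number
    of files actually renamed.
    """
    targets = [new for _, new in plans]
    if len(set(targets)) != len(targets):
        raise ValueError("Two files would be given the same name.")
    for _, new in plans:
        if new.exists():
            raise FileExistsError(f"Target already exists: {new}")
    renamed = 0
    for old, new in plans:
        if dry_run:
            print(f"Would rename: {old.name} -> {new.name}")
        else:
            old.rename(new)
            renamed += 1
    return renamed


# Usage (hypothetical folder name; point it at a COPY of your project):
# plans = plan_renames("project_copy/data")
# apply_renames(plans)                 # dry run: only prints the plan
# apply_renames(plans, dry_run=False)  # performs the renames
```

Separating the plan from the action lets you read the full list of changes before any file is touched, and the collision checks guard against one file silently overwriting another.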

More resources for data preparation

See also the resources linked on individual "Data preparation activities" subpages.