Data Sharing @ Pitt

Learn about the principles and how-to of sharing academic research data.

Rationales for (and benefits of) data sharing

What is "data sharing"? In academic research contexts, data sharing refers to the posting of raw and/or processed data in a repository for use by other researchers. Typically, access to such repositories is free and open to the general public via the internet, although access may need to be controlled if the data are sensitive. Publishers now often require that the data supporting a manuscript's findings be shared in an open repository when the manuscript is submitted; however, researchers may also choose to share data outside of the traditional publication scope. Some disciplines, such as genomics, rely heavily on readily available big data. Likewise, data centers such as the National Radio Astronomy Observatory (NRAO) Archive support research by persistently offering data produced by facility-scale instruments.

Here are some of the reasons for data sharing, according to agencies and proponents:

  • Data sharing makes for better science because it enables validation, replication, and secondary analysis by fellow researchers and potential collaborators.
  • Planning to share data incentivizes good data management practices, which in turn improve the PI's own science.
  • Activity by bad actors (data fabrication, p-hacking, etc.) is easier to detect and hold to account if data are shared.
  • Online availability of data offers greater opportunity for researchers who are underprivileged in their access to instrumentation or other needed material.
  • From an ethical perspective, the products of publicly funded research should be available to the public to the extent possible.

Data sharing is also part of the Open Science movement and an extension of the Open Access (OA) movement. Data that have been shared are sometimes called "Open Data," but note that this term may also refer to civic and governmental data.

Considerations for data sharing

Before posting your data, there are several points to consider:

  • The University claims ownership of the data (Nordenberg 2009). Inventions must be registered with the Innovation Institute, and any release of associated data must be approved prior to sharing. Similarly, data may be subject to a Data Use Agreement that prevents sharing.
  • Data which may compromise an individual's right to privacy cannot be publicly shared. Because the true privacy risks associated with a datum can be difficult to determine, there is a risk that well-intentioned researchers will nevertheless "leak" sensitive information.
  • Data presenting a national security or public safety risk cannot be publicly shared.
  • Quality data sharing does require some effort from the researcher. (Fortunately, when undertaken early, most practices for quality shared data will benefit your project's overall data management.)
  • Depending on the amount of data your project produces, there may be costs associated with data sharing. (And these may be in addition to costs associated with data management.)
  • Some researchers are understandably concerned about being "scooped" by competitors, i.e., that someone online will find something the PI overlooked in the data.
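
The privacy consideration above can be made concrete with a small check. The following is a minimal sketch, not an official or sufficient de-identification procedure: it counts how many records share each combination of indirectly identifying fields (sometimes called quasi-identifiers). The records, the field names, and the choice of which fields count as quasi-identifiers are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical records with direct identifiers (names, IDs) already removed.
records = [
    {"zip": "15213", "birth_year": 1984, "sex": "F", "score": 71},
    {"zip": "15213", "birth_year": 1984, "sex": "F", "score": 64},
    {"zip": "15217", "birth_year": 1990, "sex": "M", "score": 58},
    {"zip": "15232", "birth_year": 1975, "sex": "F", "score": 80},
]

# Assumption: these fields could be combined with outside information
# to single out a person, even though none is an identifier by itself.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Size of the smallest group; 1 means at least one row is unique."""
    return min(group_sizes(rows, keys).values(), default=0)


def unique_rows(rows, keys):
    """Rows whose combination of `keys` appears exactly once in the data."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    k = smallest_group_size(records, QUASI_IDENTIFIERS)
    risky = unique_rows(records, QUASI_IDENTIFIERS)
    print(f"smallest group size: {k}")
    print(f"rows unique on {QUASI_IDENTIFIERS}: {len(risky)}")
```

A record that is unique on these fields is easier to link back to a person, even with names removed. A check like this only flags one kind of risk; it does not replace review by the appropriate compliance office.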

Data sharing requirements

The University of Pittsburgh does not require researchers to share their data. However, the University does have a data retention policy of seven years (Nordenberg 2009). Additional data guidance and other resources can be found on the University's Human Research Protection Office website.

The only major funder currently requiring data sharing in proposals is the National Institutes of Health (NIH), under a policy in effect since January 2023. Other agencies are expected to join the NIH in this requirement soon. Find out more about NIH data sharing requirements from Pitt's Health Sciences Library System (HSLS).

The National Science Foundation (NSF) has announced NSF Public Access Plan 2.0, which requires (effective for proposals submitted January 2025 onward) Open-Access publication of manuscripts and sharing of supporting data in repositories. Budget allowances are made for costs associated with data management and sharing. More specific plans and requirements may be announced by individual NSF directorates.

Both NIH and NSF are responding to the 2022 memo from the White House Office of Science and Technology Policy (OSTP), issued by Alondra Nelson, who was then performing the duties of OSTP Director. The memo mandates that federal granting agencies require data sharing by December 31, 2025. It also rules out embargoes on shared data; some sharing schemes so far have embargoed (delayed) data sharing until a year after manuscript publication. The memo names better science as a goal, along with equitable access to research outcomes, and cites the research community's response to COVID-19 as a particular success story of data sharing.

Increasingly, publishers are requiring that supporting data be shared. A list of publisher policies can be found at the Publisher Data Availability Policies Index.

Getting started with data sharing

Thinking about sharing your own data? Check out the pages below (or in the navigation bar) to learn more.