Pitt community: write to Digital Scholarship Services or use our AskUs form
Pitt health sciences researchers: contact Data Services, Health Sciences Library System
Dominic Bordelon, dbordelon@pitt.edu
"Data Sharing @ Pitt" by University of Pittsburgh Library System is licensed for reuse under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
If your research involves human subjects, part of your responsible research practice, as overseen by the Institutional Review Board, will be to ensure that data are shared only in ways consistent with participants' informed consent, and only in ways such that participants cannot be identified.
What constitutes identifiable information? First, we should consider two types of identification: (1) identifying a subject based on what the data of interest say about them, and (2) identifying a subject's presence in the data of interest, based on other information available to the seeker. Both of these may be risks to your participants. Furthermore, research has shown that merely fragmentary information may be enough for a (human or algorithmic) snooper to reconstruct a target's identity.
With these risks in mind, it is important to protect subject identities by removing (or randomly changing) a certain amount of information. A straightforward approach is to implement the HIPAA Safe Harbor Standard, which lists specific data elements to be removed or altered.
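As a rough illustration of what removing or altering data elements can look like in practice, here is a minimal Python sketch. The field names (`name`, `email`, `zip`, `visit_date`, and so on) are hypothetical, and the sketch covers only a few of the Safe Harbor rules (dropping direct identifiers, grouping ages over 89, truncating ZIP codes, and reducing dates to the year). It is not a complete or authoritative implementation of the standard, so check the full list of elements before relying on anything like it.

```python
# Hypothetical column names; adapt to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn", "street_address"}


def deidentify(record):
    """Return a copy of `record` with a few Safe Harbor style changes applied."""
    # Drop fields that directly identify a person.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Ages over 89 are grouped into a single "90+" category.
    if isinstance(cleaned.get("age"), int) and cleaned["age"] > 89:
        cleaned["age"] = "90+"

    # Keep only the first three ZIP digits. (Safe Harbor also requires
    # replacing them with 000 for sparsely populated areas; not handled here.)
    if "zip" in cleaned:
        cleaned["zip"] = str(cleaned["zip"])[:3]

    # Reduce dates to the year. Assumes ISO strings such as "2021-03-14".
    if "visit_date" in cleaned:
        cleaned["visit_date"] = str(cleaned["visit_date"])[:4]

    return cleaned


if __name__ == "__main__":
    participant = {
        "name": "Jane Doe",
        "email": "jane@example.org",
        "age": 93,
        "zip": "15260",
        "visit_date": "2021-03-14",
        "diagnosis": "asthma",
    }
    print(deidentify(participant))
```

The function returns a new dictionary rather than modifying the original, so the identifiable source data can stay in secure storage while only the cleaned copy is prepared for sharing.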
Another possibility is to use differential privacy methods, which carefully introduce statistical noise to the data such that individual identities are "scrambled" (protected) but the sample remains statistically similar enough for analysis. There are a variety of differential privacy methods depending on characteristics of the data, and each method is tunable according to sample statistics and the desired strength of privacy protection (noise level). Since this area is both complex and high-risk, you should consult a statistician to apply these methods properly.
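To make the idea of tunable noise concrete, here is a minimal Python sketch of one well-known technique, the Laplace mechanism, applied to a simple count. Note that it adds noise to a released statistic rather than to each individual record, which is only one of several approaches. The function names and the example ages are hypothetical, and `epsilon` is the usual privacy parameter (smaller values mean more noise and stronger protection). Treat this as a teaching sketch, not a substitute for a statistician's guidance.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the items in `values` that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [34, 71, 68, 45, 80, 29]  # hypothetical sample
    # The result varies from run to run because the noise is random.
    print(private_count(ages, lambda age: age >= 65, epsilon=0.5))
```

Lowering `epsilon` widens the noise and strengthens protection at the cost of accuracy; choosing it well for a real dataset is exactly the kind of decision that warrants expert consultation.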
Other limitations on your project's data sharing may involve commercial concerns (e.g., for patentable technology) or national security. While funders are increasing their expectations with regard to data sharing, there should remain exemptions for data elements that are "sensitive" for any of the above reasons.