
Research Data Management @ Pitt

This guide will assist researchers in planning for the various stages of managing their research data and in preparing data management plans required with funding proposals.

About rights and permissions

As a data creator, you have certain rights over the work and an opportunity to license your data appropriately to facilitate sharing and re-use. How copyright and licensing apply depends on several factors, including whether your data set contains quantitative data, qualitative data, or sensitive information.
 
Best practices include:
  • Understanding the nature of your dataset and whether your data are subject to copyright.
  • Making your data as open and reusable as possible, ideally by dedicating it to the Public Domain.
  • Identifying any restrictions on sharing data, e.g. from Terms of Use.
  • Asserting your rights under the Doctrine of Fair Use if necessary.
  • Considering carefully any ethical questions involved in sharing your data openly and choosing licensing and access options to match.
The information presented here is a brief overview of a very complicated topic. Please get in touch with the Research Data Management team for help with any of the rights and permissions considerations below. 

Resources from the HRPO and the Office of Research

The Human Research Protection Office (HRPO; formerly the IRB) protects the rights and welfare of human subjects in studies carried out at the University of Pittsburgh. 

  • Researchers who are working with human subjects may be required to submit a Data and Safety Monitoring Plan (DSMP) to the HRPO. Learn about DSMPs and HRPO requirements here.
  • The HRPO states, "The Principal Investigator (PI) is responsible for ensuring that research data is secure when it is collected, stored, transmitted, or shared." Find guidance from the IRB on keeping data secure on the IRB website.
  • For researchers working with health information, review the IRB's HIPAA Privacy Rule Guidance for Researchers.

Data Use Agreements may be necessary to share data that is generated from projects receiving certain funding types. The University of Pittsburgh's Office of Research explains that a DUA "is a contractual document used for the transfer of data that has been developed by nonprofit, government or private industry, where the data is nonpublic or is otherwise subject to some restrictions on its use." Learn more about DUAs from the Office of Research here and here.

Rights and quantitative data

By quantitative data, we mean data that are numerical values or measurements of facts about the universe. Because facts are not subject to copyright, most quantitative data are not copyrightable in the United States and copyright laws usually do not apply or are not enforceable. 

However, the arrangement, selection, and coordination of the data set as a whole may be subject to copyright. This depends on the creativity involved with arranging and displaying the data. 

Many researchers believe in the importance of sharing data openly to facilitate the greatest possible reuse of the data. For example, Dryad and the Panton Principles for Open Data strongly recommend that data be contributed to the public domain. When a data set is dedicated to the Public Domain, the creator declares that others may use the data set in its current form; any potential copyright in the arrangement, selection, and coordination of the data set is thereby dedicated to the Public Domain as well. Below are two examples of licenses that a data set creator can apply to a quantitative data set to dedicate it to the Public Domain.

Rights and qualitative data

By qualitative data, we mean data that contain observations, texts, conversations, artistic or creative works, which are usually collected in the humanities and social science fields. Some examples of qualitative data include text corpora, interviews, photographs, and social media output. Because these are often creative expressions made by individuals that are fixed in a tangible form, many of these data sets are subject to copyright and permission may need to be obtained for their use. For those compiling qualitative data sets, privacy, ethics, and licenses are of key concern.

For those collecting interviews or other recordings and documentation made by research subjects, clear guidelines for the usage and ownership of these materials should be set out in a Consent Form and cleared with the IRB. This is also the case when research work is conducted via the Internet.

  • Considerations and Recommendations Concerning Internet Research and Human Subjects Research Regulations from the US Department of Health and Human Services

  • Guidelines for Ethical Conduct in Participant Observation (University of Toronto) - contains advice on what to consider when writing a consent form and protocol.

  • Communicating Qualitative Research Study Designs to Research Ethics Review Boards (2011) by Carolyn Ells - a discussion on ethical issues in collecting data for qualitative research studies and how to construct a protocol that reflects these considerations.

Researchers must identify whether the data are in the public domain, subject to licensing terms, or eligible for use under the doctrine of Fair Use. Because these data sets often involve substantial transformative use, a Fair Use argument may be particularly powerful for qualitative data sets.

 

  • Copyright and Intellectual Property Toolkit: Public Domain - contains information on how to determine if an item is in the public domain.

  • Copyright and Intellectual Property Toolkit: Fair Use - contains information about the doctrine of Fair Use and tools for making a Fair Use argument.

  • Understanding Fair Use: Transformative Use - read more about the "Fifth Factor" of Fair Use, Transformation.

When you obtain data from the Internet via scraping tools, the restrictions in the source's Terms of Service and Developer Policies apply. This is especially relevant for data from social media websites.

 

  • Fair Use in the Age of Social Media - an article covering the basics of Fair Use in social media contexts.

  • Challenges of Using Twitter as a Data Source - covers some of the issues with using and sharing qualitative social media data sets, including licensing issues. See also Twitter's Developer Policy, which applies to those creating data sets by scraping Twitter.

Access control and permissions for sensitive data

For data sets that contain sensitive information, e.g. data from human subjects research, access control may be an option. Mixed levels of access control may be put in place for some data, combining controlled access to confidential data with standard access to non-confidential data.

Anonymizing data

Before data collected during research with human subjects is published, researchers should ensure the removal of any personally identifiable information (or PII). A documented plan for anonymizing the data will serve to mitigate the risk to participants, encourage consistency in practices among the research team throughout the project, and help future users to understand what decisions were made during the anonymization.

Some approaches for anonymization include:

  • Avoid the collection of identifying information that is unnecessary for the study
  • Remove direct identifiers (e.g. participant names, addresses, phone numbers) from the data and, when appropriate, replace this information with a code (e.g. a participant number or pseudonym in place of a name)
  • Aggregate variables or reduce the precision of reporting when possible to lower the potential for identification. For example, rather than recording full birth dates or precise ages of participants, the research team may decide to record year of birth or age range.
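
The approaches above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the field names (name, address, phone, birth_date, age, response), the "P001"-style participant codes, and the ten-year age bands are hypothetical and would need to match your own data and anonymization plan. Removing direct identifiers in this way does not by itself guarantee that participants cannot be re-identified from the remaining variables.

```python
from datetime import date

# Hypothetical field names; adjust to your own data dictionary.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}
# Precise values that are replaced by coarser ones below.
PRECISE_FIELDS = {"birth_date", "age"}


def age_range(age, width=10):
    """Return a coarse age band such as '30-39' instead of an exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records):
    """Return (anonymized_records, key) for a list of participant dicts.

    The key maps each assigned participant code back to the original name.
    It should be stored separately from the shared data, or destroyed.
    """
    anonymized, key = [], {}
    for number, record in enumerate(records, start=1):
        code = f"P{number:03d}"
        key[code] = record["name"]
        # Drop direct identifiers and the precise date and age values.
        cleaned = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS | PRECISE_FIELDS
        }
        # Replace the name with a participant code.
        cleaned["participant_id"] = code
        # Reduce precision: year of birth and an age band only.
        cleaned["birth_year"] = record["birth_date"].year
        cleaned["age_range"] = age_range(record["age"])
        anonymized.append(cleaned)
    return anonymized, key


if __name__ == "__main__":
    sample = [
        {
            "name": "Jane Doe",
            "address": "1 Main St",
            "phone": "555-0100",
            "birth_date": date(1988, 4, 12),
            "age": 36,
            "response": "agree",
        }
    ]
    shared, lookup = anonymize(sample)
    print(shared)
```

Keeping the code-to-name key in a separate, access-controlled location lets the research team link records during the project without exposing names in the shared data set.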

Licensing options

Beyond the Public Domain licensing options above, there are some other licensing options that can apply to data sets. Creative Commons licenses allow creators to specify the rights for reuse - typically with attribution to the creator, but potentially also including bans on commercial use and derivatives. It is not recommended to prohibit derivative works on a data set, as this will compromise the usability of that data set.

  • How to License Research Data by the Digital Curation Centre (UK)

  • Copyright and Intellectual Property Toolkit: Creative Commons, Copyleft, and Other Licenses

  • Open Data Commons - licenses specifically created for data reuse, including a Public Domain dedication as well as an Attribution Required license.

 

Licenses can work in tandem with access control, Fair Use, and ethical considerations detailed above. For complex situations, contact us for guidance.

Intellectual property considerations

Copyright law protects the original creative expressions that are fixed in either physical or digital form. The US Copyright Law provides examples of creative works that are protected -- including literary works, musical works, and motion pictures -- and works that are not eligible for copyright protection -- including ideas, processes, and concepts. Factual information has been interpreted as being outside of the protection of the copyright law, which has implications for data. Peter Hirtle of Cornell University Library cautions, "Not all data is in the public domain. A project might, for example, be built around copyrighted photographs; the photographs are part of the project’s 'data.' But in many cases, the data in a data management system as well as the metadata describing that data will be factual, and hence not protected by copyright." For Hirtle's useful guidance, see his "Introduction to Intellectual Property Rights in Data Management."

Even if datasets are not protected under copyright, researchers who are not the creators may be uncertain whether they are indeed allowed to use them for their own work. Licenses that clearly outline the terms of use can help to alleviate this uncertainty and to promote the use of the data. Creative Commons licenses and Open Data Commons licenses are two noteworthy instruments for specifying the terms of use for datasets.