Data Use Agreements (DUAs) may be necessary to share data generated by projects that receive certain types of funding. The University of Pittsburgh's Office of Research explains that a DUA "is a contractual document used for the transfer of data that has been developed by nonprofit, government or private industry, where the data is nonpublic or is otherwise subject to some restrictions on its use." The Office of Research provides further information about DUAs.
By quantitative data, we mean data that are numerical values or measurements of facts about the universe. Because facts are not subject to copyright, most quantitative data are not copyrightable in the United States and copyright laws usually do not apply or are not enforceable.
However, the arrangement, selection, and coordination of the data set as a whole may be subject to copyright. Whether it is depends on the degree of creativity involved in selecting, arranging, and presenting the data.
Many researchers believe that data should be shared openly to enable the greatest possible reuse. For example, Dryad and the Panton Principles for Open Data strongly recommend that data be contributed to the public domain. When creators dedicate a data set to the Public Domain, they declare that others may use the data set in its current form; any copyright in the arrangement, selection, and coordination of the data set is therefore also dedicated to the Public Domain. Below are two examples of licenses that a data set creator can apply to a quantitative data set to dedicate it to the Public Domain.
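A public domain dedication is easier for others to find when it is also recorded in machine-readable metadata stored alongside the data. The sketch below assumes the Frictionless Data Package convention, in which a `datapackage.json` descriptor carries a `licenses` list. The package name, title, and data file are hypothetical placeholders, and CC0 1.0 is used only as one example of a public domain dedication.

```python
import json
from pathlib import Path

# Hypothetical descriptor for an illustrative data set; replace the names with your own.
DESCRIPTOR = {
    "name": "example-measurements",   # hypothetical package name
    "title": "Example Measurements",  # hypothetical human-readable title
    "licenses": [
        {
            "name": "CC0-1.0",  # SPDX identifier for the CC0 1.0 public domain dedication
            "path": "https://creativecommons.org/publicdomain/zero/1.0/",
            "title": "CC0 1.0 Universal",
        }
    ],
    "resources": [
        {"name": "measurements", "path": "measurements.csv"},  # hypothetical data file
    ],
}


def descriptor_json() -> str:
    """Serialize the descriptor as pretty-printed JSON."""
    return json.dumps(DESCRIPTOR, indent=2)


def write_descriptor(directory: Path) -> Path:
    """Write datapackage.json into an existing directory and return its path."""
    target = directory / "datapackage.json"
    target.write_text(descriptor_json(), encoding="utf-8")
    return target
```

For example, `write_descriptor(Path("my-dataset"))` would place the descriptor next to the data files in a hypothetical `my-dataset` folder, so the dedication travels with the data.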
By qualitative data, we mean data such as observations, texts, conversations, and artistic or creative works, which are usually collected in the humanities and social sciences. Examples include text corpora, interviews, photographs, and social media output. Because these are often creative expressions by individuals that are fixed in a tangible form, many such data sets are subject to copyright, and permission may be needed to use them. For those compiling qualitative data sets, privacy, ethics, and licensing are key concerns.
For those collecting interviews or other recordings and documentation made by research subjects, clear guidelines for the use and ownership of these materials should be set out in a Consent Form and approved by the IRB. The same applies to research conducted via the Internet.
Considerations and Recommendations Concerning Internet Research and Human Subjects Research Regulations from the US Department of Health and Human Services
Guidelines for Ethical Conduct in Participant Observation (University of Toronto) - contains advice on what to consider when writing a consent form and protocol.
Communicating Qualitative Research Study Designs to Research Ethics Review Boards (2011) by Carolyn Ells - a discussion on ethical issues in collecting data for qualitative research studies and how to construct a protocol that reflects these considerations.
Researchers must identify whether the data are in the public domain, are subject to licensing terms, or may be used under Fair Use. Because qualitative data sets often make substantially transformative use of their source material, a Fair Use argument may be particularly strong for them.
Copyright and Intellectual Property Toolkit: Public Domain - contains information on how to determine if an item is in the public domain.
Copyright and Intellectual Property Toolkit: Fair Use - contains information about the doctrine of Fair Use and tools for making a Fair Use argument.
Understanding Fair Use: Transformative Use - read more about the "Fifth Factor" of Fair Use, Transformation.
When data are obtained from the Internet with scraping tools, the restrictions in each website's Terms of Service and Developer Policies apply. This is especially important for social media websites.
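A site's `robots.txt` file is one machine-readable signal of which paths automated clients are asked to avoid. It does not replace reading the Terms of Service or Developer Policy, which can impose further limits on collecting and redistributing data. The sketch below uses Python's standard `urllib.robotparser` module. The rules, user-agent name, and URLs are hypothetical, and the rules are parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it is published at the site's /robots.txt path.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
agent = "ExampleResearchBot"  # hypothetical user-agent name for a research scraper
print(may_fetch(parser, agent, "https://example.org/public/page.html"))
print(may_fetch(parser, agent, "https://example.org/private/data.html"))
print(parser.crawl_delay(agent))  # seconds to wait between requests, if the site sets one
```

Here the public page is allowed, the `/private/` page is not, and the site asks for a five-second delay between requests. A scraper that passes these checks may still be barred by the site's terms from sharing the data it collects.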
Fair Use in the Age of Social Media - an article covering the basics of Fair Use in social media contexts.
Challenges of Using Twitter as a Data Source - covers some of the issues with using and sharing qualitative social media data sets, including licensing issues. See also Twitter's Developer Policy, which applies to those creating data sets by scraping Twitter.
For data sets that contain sensitive information, e.g. data from human subjects research, access control may be an option. Mixed levels of access control can also be used, combining controlled access to confidential data with standard access to non-confidential data.
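Mixed access can be planned at the level of individual variables. The minimal sketch below assumes a hypothetical tabular data set in which each record carries an arbitrary study ID. It splits the data into a standard-access part containing the non-confidential fields and a controlled-access part containing the confidential fields, with only the study ID shared between them. The field names and the choice of which fields count as confidential are illustrative assumptions; in practice that classification should follow the approved IRB protocol and consent forms.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical classification of fields; real decisions belong in the approved protocol.
CONFIDENTIAL_FIELDS = {"zip_code", "date_of_birth", "interview_notes"}
LINK_FIELD = "study_id"  # arbitrary study code, not a name or other direct identifier


def split_by_access(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into standard-access and controlled-access subsets.

    Both subsets keep LINK_FIELD so that approved users of the controlled
    subset can join it back to the standard-access subset.
    """
    standard, controlled = [], []
    for record in records:
        open_part = {k: v for k, v in record.items() if k not in CONFIDENTIAL_FIELDS}
        restricted_part = {
            k: v for k, v in record.items()
            if k in CONFIDENTIAL_FIELDS or k == LINK_FIELD
        }
        standard.append(open_part)
        controlled.append(restricted_part)
    return standard, controlled


sample = [
    {"study_id": "P001", "age_band": "30-39", "zip_code": "15213",
     "date_of_birth": "1988-04-02", "interview_notes": "(transcribed notes)"},
]
standard, controlled = split_by_access(sample)
print(standard)
print(controlled)
```

Splitting the fields does not by itself anonymize the standard-access part; it still needs the de-identification review described below.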
Before data collected during research with human subjects are published, researchers should ensure that any personally identifiable information (PII) has been removed. A documented plan for anonymizing the data will mitigate the risk to participants, encourage consistent practices across the research team throughout the project, and help future users understand what decisions were made during anonymization.
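A documented plan can be paired with a script that applies the same rules every time and records what was changed. The sketch below assumes hypothetical field names and shows three common steps: dropping direct identifiers, replacing a participant's name with a keyed pseudonym (HMAC-SHA-256 from Python's standard `hmac` and `hashlib` modules), and generalizing exact ages into ten-year bands. Pseudonymization alone is not full anonymization, because the remaining fields can still allow re-identification, so these rules are illustrative rather than sufficient.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

Record = Dict[str, object]

DIRECT_IDENTIFIERS = {"email", "phone"}  # hypothetical fields removed outright
PSEUDONYM_FIELD = "name"                 # hypothetical field replaced by a keyed code


def pseudonym(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym that cannot be recomputed from the name without the key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:10]


def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(records: List[Record], secret_key: bytes) -> Tuple[List[Record], List[str]]:
    """Apply the rules to every record; return cleaned records and a decision log."""
    log = [
        f"Removed direct identifiers: {', '.join(sorted(DIRECT_IDENTIFIERS))}",
        f"Replaced '{PSEUDONYM_FIELD}' with an HMAC-SHA-256 pseudonym (first 10 hex digits)",
        "Generalized 'age' into ten-year bands",
    ]
    cleaned = []
    for record in records:
        out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        if PSEUDONYM_FIELD in out:
            out["participant"] = pseudonym(str(out.pop(PSEUDONYM_FIELD)), secret_key)
        if "age" in out:
            out["age"] = age_band(int(out["age"]))
        cleaned.append(out)
    return cleaned, log


key = b"replace-with-a-secret-kept-outside-the-data-set"  # hypothetical key; store securely
rows = [{"name": "Jane Doe", "email": "jane@example.org", "age": 34, "response": "yes"}]
cleaned, decisions = anonymize(rows, key)
print(cleaned)
print("\n".join(decisions))
```

The returned decision log can be copied into the project's anonymization plan, so future users can see which rules were applied.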
Some approaches for anonymization include:
Beyond the Public Domain options above, other licenses can also apply to data sets. Creative Commons licenses let creators specify the terms of reuse: typically attribution to the creator, but potentially also restrictions on commercial use and on derivative works. Prohibiting derivative works is not recommended for data sets, because it greatly limits how the data can be reused.
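Creative Commons licenses are built from a small set of conditions: BY (attribution), SA (share-alike), NC (non-commercial), and ND (no derivatives). CC0, by contrast, is a public domain dedication with no conditions. The sketch below decodes SPDX-style identifiers such as `CC-BY-NC-4.0` into those conditions and flags ND, following the advice above. It handles only the Creative Commons identifier pattern; other data licenses, such as those from Open Data Commons, would need their own handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCConditions:
    attribution: bool
    share_alike: bool
    non_commercial: bool
    no_derivatives: bool


def decode_cc(identifier: str) -> CCConditions:
    """Decode an SPDX-style Creative Commons identifier, e.g. 'CC-BY-SA-4.0'."""
    parts = identifier.upper().split("-")
    if parts[0] == "CC0":
        # Public domain dedication: no conditions attached.
        return CCConditions(False, False, False, False)
    if parts[0] != "CC" or len(parts) < 3:
        raise ValueError(f"Not a Creative Commons identifier: {identifier!r}")
    # Drop the leading 'CC' and the last component (normally the version number).
    elements = set(parts[1:-1])
    return CCConditions(
        attribution="BY" in elements,
        share_alike="SA" in elements,
        non_commercial="NC" in elements,
        no_derivatives="ND" in elements,
    )


def check_for_data(identifier: str) -> str:
    """Return a short note; licenses with ND are discouraged for data sets."""
    if decode_cc(identifier).no_derivatives:
        return f"{identifier}: prohibits derivatives; consider a less restrictive license for data"
    return f"{identifier}: permits derivative works"


for license_id in ["CC0-1.0", "CC-BY-4.0", "CC-BY-NC-ND-4.0"]:
    print(check_for_data(license_id))
```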
How to License Research Data by the Digital Curation Centre (UK)
Copyright and Intellectual Property Toolkit: Creative Commons, Copyleft, and Other Licenses
Open Data Commons - licenses specifically created for data reuse, including a Public Domain dedication as well as an Attribution Required license.
Licenses can work in tandem with access control, Fair Use, and ethical considerations detailed above. For complex situations, contact us for guidance.