Dissertation Boot Camp Library Resources @ Pitt: File Management and Sharing

This guide is designed to help students attending dissertation boot camp sessions find and organize research information.

The importance of file management

This page provides guidance on issues related to file management and sharing, including selecting file formats, naming files, and depositing research data in D-Scholarship@Pitt.

Developing a file management approach that makes sense to you can save time and resources in the long run. Memory is fleeting. An inconsistent or vague file naming convention can make it challenging to identify a relevant or current file later in your research process. It can also make it difficult to produce the research files that support your dissertation once your project is complete.

Open availability of your scholarly materials can encourage other research discoveries and foster knowledge. In order for your materials to be usable, however, others need to be able to access, trust, and interpret them. A clearly defined file structure system, an intelligible file naming convention, and accessible file formats can help support reuse.
 
 

Choosing a file format

The format of the electronic data files you work with during your research may be determined by the research equipment and computer hardware and software that you have access to. However, for long-term preservation and ease of sharing, best practices may dictate that the files be converted to a different format after your project has ended. Give some thought to this eventuality at the outset. Considerations include:

  • Will your data be in a format that requires proprietary software to access it?
  • If you will be depositing your data in a repository at the end of your project, does the repository have specific guidelines or requirements with respect to file format?
  • What features of your data might be lost or modified in the conversion to another file format?

Stanford University Libraries - Data Management Services provides a useful overview of preferred file formats. From the Stanford resource:

  • Containers: TAR, GZIP, ZIP

  • Databases: XML, CSV

  • Geospatial: SHP, DBF, GeoTIFF, NetCDF

  • Moving images: MOV, MPEG, AVI, MXF

  • Sounds: WAVE, AIFF, MP3, MXF

  • Statistics: ASCII, DTA, POR, SAS, SAV

  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

  • Tabular data: CSV

  • Text: XML, PDF/A, HTML, ASCII, UTF-8

  • Web archive: WARC

Additional helpful guidelines for selecting file formats can be found at these websites:

Data documentation

For data to be interpretable and useful to others, researchers should document their research workflow, decisions that they make during their research process, and their manipulation of the data. The UK Data Archive outlines a set of best practices for data documentation, which is captured here: 

Good data documentation includes information on:

  • the context of data collection: project history, aim, objectives and hypotheses
  • data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage and secondary data sources used
  • dataset structure of data files, study cases, relationships between files
  • data validation, checking, proofing, cleaning and quality assurance procedures carried out
  • changes made to data over time since their original creation and identification of different versions of data files
  • information on access and use conditions or data confidentiality

At data-level, datasets should also be documented with:

  • names, labels and descriptions for variables, records and their values
  • explanation of codes and classification schemes used
  • codes of, and reasons for, missing values
  • derived data created after collection, with code, algorithm or command file used to create them
  • weighting and grossing variables created
  • data listing with descriptions for cases, individuals or items studied

Variable-level descriptions may be embedded within a dataset itself as metadata. Other documentation may be contained in user guides, reports, publications, working papers and laboratory books (see Managing and Sharing Data UK Data Archive).
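
One lightweight way to record data-level documentation is a data dictionary saved alongside the dataset. The Python sketch below writes one as a CSV file. The column names in FIELDS, the function name write_data_dictionary, and the example variables are illustrative assumptions rather than a format prescribed by the UK Data Archive.

```python
import csv
import io

# Hypothetical columns covering variable names, labels, codes, and missing values.
FIELDS = ["variable", "label", "codes", "missing_value", "reason_missing"]


def write_data_dictionary(entries, handle):
    """Write one row per variable to an open text file or buffer."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        # Keys left out of an entry are written as empty cells.
        writer.writerow(entry)


# Made-up example entries, for illustration only.
example_entries = [
    {"variable": "age", "label": "Age in years at interview",
     "missing_value": "-99", "reason_missing": "Not reported"},
    {"variable": "consent", "label": "Consent given", "codes": "1=yes; 2=no"},
]

buffer = io.StringIO()
write_data_dictionary(example_entries, buffer)
print(buffer.getvalue())
```

To save the dictionary to disk instead of printing it, pass a file opened with open("data_dictionary.csv", "w", newline="") in place of the buffer.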

Good practices for file naming

Before you begin your research, decide on a naming convention for your files. Document the naming convention you choose, and make sure that you and your collaborators follow it. It will save you time and will help others who may use your files in the future!

When developing your naming conventions, consider the following suggestions:

  • Give files a meaningful and descriptive name. A file name might include a combination of elements, such as type of equipment used, date, and researcher's last name. Decide on the best order for elements in a file name; it will affect how the files are sorted.
  • Keep names a reasonable length; some applications won't work well with long file names. A maximum of 25 characters is a good rule of thumb.
  • To separate elements in a file name, consider using underscores (_) or hyphens (-). Avoid using blank spaces in a file name.
  • Use periods only to separate the file name from the file type extension (.txt, .jpg, etc.)
  • If including date as part of the file name, use the standard format yyyymmdd to ensure that files sort in chronological order.
  • If your file name will include a numerical component, such as a subject number or version number, use leading zeros (001, 002, etc.) so that files sort in sequential order.
  • Avoid special characters like ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ‘ “
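
As one way to apply these suggestions consistently, the Python sketch below assembles a file name from descriptive elements, a yyyymmdd date, and a zero-padded number. The function name build_file_name, the order of the elements, and the choice to apply the 25-character rule of thumb to the name without its extension are illustrative assumptions; adapt them to the convention you document for your project.

```python
import re
from datetime import date

MAX_LENGTH = 25  # rule of thumb; applied here to the name without its extension (assumption)


def build_file_name(elements, day, sequence, extension, width=3):
    """Join descriptive elements, a yyyymmdd date, and a zero-padded number."""
    # Keep only ASCII letters, digits, and hyphens inside each element.
    # Spaces, periods, underscores, and special characters are dropped,
    # so the underscore is used only as the separator between elements.
    parts = [re.sub(r"[^A-Za-z0-9-]", "", element) for element in elements]
    parts = [part for part in parts if part]
    parts.append(day.strftime("%Y%m%d"))       # sorts chronologically
    parts.append(str(sequence).zfill(width))   # leading zeros: 001, 002, ...
    stem = "_".join(parts)
    if len(stem) > MAX_LENGTH:
        raise ValueError(f"'{stem}' is longer than {MAX_LENGTH} characters")
    # The only period in the result separates the name from the extension.
    return f"{stem}.{extension.lstrip('.')}"


print(build_file_name(["xrd", "Smith"], date(2024, 3, 5), 7, "csv"))
# prints xrd_Smith_20240305_007.csv
```

Placing the date before the other elements instead would make files sort by date first; which order is best depends on how you expect to browse the files.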

It's not too late to align file names with a consistent file naming convention that you develop. The following are tools and approaches for renaming a collection of files:  
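
One possible approach is a short script. The Python sketch below uses only the standard library; the function names plan_renames and apply_renames and the naming pattern (a prefix followed by a zero-padded number) are illustrative assumptions. Numbers are assigned in alphabetical order of the current file names, which may not match the order you want, so review the plan before applying it and work on a copy of your files.

```python
from pathlib import Path


def plan_renames(folder, prefix, width=3):
    """Return (old_path, new_path) pairs; nothing is renamed at this stage."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    plan = []
    for number, old in enumerate(files, start=1):
        # Keep the original extension (lowercased) and add leading zeros.
        new_name = f"{prefix}_{str(number).zfill(width)}{old.suffix.lower()}"
        plan.append((old, old.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Rename the files, refusing to overwrite anything that already exists."""
    # Check every target first so that a conflict stops the whole batch.
    for old, new in plan:
        if old != new and new.exists():
            raise FileExistsError(f"{new} already exists; nothing was renamed")
    for old, new in plan:
        if old != new:
            old.rename(new)
```

For example, calling plan_renames("interviews", "interview") on a hypothetical folder named interviews returns the proposed pairs so that you can print and check them; passing the result to apply_renames then carries out the changes.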

Sharing Your Research Data in D-Scholarship@Pitt

Deposit the data that supports your dissertation! Many disciplines are gravitating toward a culture of openness and, with funders increasingly requiring shared data, this trend is likely to continue. Early career academics can begin to embrace the principles of open research and transparency by sharing their dissertation and thesis data. Data deposited in D-Scholarship@Pitt can be cited by other users and pointed to as evidence of scholarly impact.

D-Scholarship@Pitt:

  • Accepts nearly any format of file
  • Assigns your data a Digital Object Identifier (DOI), a permanent and unique identifier for a digital object that will help others to find and cite your data
  • Allows you to add information that provides important context for your data so that others can discover, understand, and trust the data files
  • Tracks your work using alternative metrics (“altmetrics”) to help demonstrate your impact and see how others are using your data
  • Is indexed by Google and other major Internet search engines, the Pennsylvania Digital Library, and PITTCat+

The University Library System wants to work with you to help you share your supporting data! We are available to meet for a consultation about preparing your data for deposit in D-Scholarship@Pitt.