Data Management @ Pitt

Learn about the principles of (research) data management.

What is version control?

Version control is a software feature or application that tracks all changes to a document (file) over time and makes past revisions available for retrieval. This can be useful for an individual working alone, but it is especially valuable when several collaborators work on the same files, where managing changes over time quickly becomes complex. These tools originated in software development, where they track changes to lines of code, but version control can also be applied in settings like the Microsoft Office suite, or to binary files, which don't have any "lines of code." In a shared repository, file access is either concurrent, in which case independent edits to the same files must later be merged, or it relies on a "checkout" mechanism that restricts editing to a single user at a time. Some cloud-based storage providers have also implemented versioning in the form of a "file history" feature. (Henderson 2021)
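To make the "checkout" access model concrete, here is a minimal Python sketch, assuming a single text file kept in memory. The `VersionedFile` class and its `checkout`/`checkin` methods are hypothetical names chosen for illustration. The sketch keeps every revision for later retrieval and implements only the lock-based model, not concurrent editing with merging.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedFile:
    """Hypothetical toy model: every saved revision is kept and retrievable.

    Real tools store revisions far more efficiently (e.g. as deltas or
    content-addressed snapshots); this only illustrates the idea.
    """
    revisions: list[str] = field(default_factory=list)
    locked_by: Optional[str] = None  # who currently holds the checkout, if anyone

    def checkout(self, user: str) -> str:
        """Lock-based access: only one user may edit the file at a time."""
        if self.locked_by is not None and self.locked_by != user:
            raise PermissionError(f"file is checked out by {self.locked_by}")
        self.locked_by = user
        return self.latest()

    def checkin(self, user: str, content: str) -> int:
        """Save a new revision, release the lock, and return the revision number."""
        if self.locked_by != user:
            raise PermissionError(f"{user} does not hold the checkout")
        self.revisions.append(content)
        self.locked_by = None
        return len(self.revisions) - 1

    def latest(self) -> str:
        return self.revisions[-1] if self.revisions else ""

    def get(self, rev: int) -> str:
        """Retrieve any past revision by its number."""
        return self.revisions[rev]


doc = VersionedFile()
doc.checkout("alice")
doc.checkin("alice", "Draft 1")
doc.checkout("bob")
try:
    doc.checkout("alice")  # refused: bob holds the lock
except PermissionError as err:
    print(err)  # file is checked out by bob
doc.checkin("bob", "Draft 2")
print(doc.get(0), "|", doc.latest())  # Draft 1 | Draft 2
```

The lock avoids conflicting edits at the cost of making collaborators wait; concurrent systems such as Git make the opposite trade-off and merge independent changes afterward.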

Version control is also an important technology for publishing and maintaining Web content over time, for example on GitHub and Wikipedia. GitHub is an online platform for sharing version-controlled projects (repositories) using the Git software. Git is the de facto standard for file-system version control; notable alternatives include Mercurial and Subversion. Meanwhile, Wikipedia implements a "Page history" feature that lets any user easily see what changes were made to an article, when, and by whom; the ability to roll back to a previous version greatly facilitates moderation, for example against vandalism.
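A page history can be modeled as an append-only list of revisions, each recording who made the edit and when. The Python sketch below is a hypothetical toy, not MediaWiki's implementation; its `rollback` method assumes that rolling back means undoing the most recent run of edits by one author, and it records the revert as a new revision so that the history itself is never erased.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    author: str
    timestamp: datetime
    text: str


class PageHistory:
    """Hypothetical toy "page history": an append-only list of revisions."""

    def __init__(self) -> None:
        self.revisions: list[Revision] = []

    def edit(self, author: str, text: str) -> None:
        self.revisions.append(Revision(author, datetime.now(timezone.utc), text))

    def current(self) -> str:
        return self.revisions[-1].text if self.revisions else ""

    def log(self) -> list[tuple[str, str]]:
        """Who changed the page and when (ISO timestamp), oldest first."""
        return [(r.author, r.timestamp.isoformat()) for r in self.revisions]

    def rollback(self, moderator: str) -> None:
        """Undo the latest run of edits by the most recent author.

        Restores the newest revision made by someone else, saved as a new
        revision by the moderator; the reverted edits remain in the history.
        """
        if not self.revisions:
            return
        last_author = self.revisions[-1].author
        for rev in reversed(self.revisions):
            if rev.author != last_author:
                self.edit(moderator, rev.text)
                return
        # Every revision is by the same author: nothing earlier to restore.


page = PageHistory()
page.edit("Ann", "Pittsburgh is a city in Pennsylvania.")
page.edit("Vandal", "Nonsense")
page.edit("Vandal", "More nonsense")
page.rollback("Mod")
print(page.current())  # Pittsburgh is a city in Pennsylvania.
print([author for author, _ in page.log()])  # ['Ann', 'Vandal', 'Vandal', 'Mod']
```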

Version control with Git and GitHub

Git is open-source software that has become the community standard for collaborative version control. Git tracks changes to the files in a project directory, and the user periodically commits snapshots of these changes to a "repository," a database kept behind the scenes. These snapshots, or "commits," can later be retrieved and inspected to reconstruct the state of the repository at each of those points in time. Git is available as a command-line utility, through several graphical clients, and via integrations in software such as VS Code and RStudio.
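Behind the scenes, Git stores snapshots as content-addressed objects: each file's contents are saved as a "blob" whose ID is a hash of the data, and a commit refers to a tree of these IDs. The Python sketch below reproduces how Git computes a blob ID in its default SHA-1 object format (repositories created with the newer SHA-256 format produce different IDs); for the same bytes it should match what `git hash-object` prints.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a file's contents (a "blob").

    Assumes Git's default SHA-1 object format: the hash covers a header made
    of the word "blob", a space, the size in bytes, and a NUL byte, followed
    by the raw file contents.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the ID depends only on the content, a file that does not change keeps the same ID from one commit to the next, so successive snapshots can share the stored blob instead of saving a new copy.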

GitHub is an online platform for sharing and reusing Git repositories. It implements many social and workflow management features, such as an "Issues" forum for each repository, "Forks" for developing derivative editions of other users' projects, and "Pull Requests" for proposing and accepting collaborative changes to a repository.

Once users have learned how Git and GitHub work, they can also use GitHub to host web pages through its GitHub Pages feature, which publishes a static website directly from a repository.

Resources for Pitt community members to learn Git and GitHub

Version control with Microsoft Office and OneDrive

Microsoft Office files, such as Word documents and PowerPoint presentations, have a feature called Version History which allows the user to restore previous versions of a file.

OneDrive and SharePoint also implement Version History for the files they store. This can be useful for recovering from mistakes, whether you are working alone or with colleagues.

More resources for version control