Text Mining & Analysis @ Pitt

An introduction to text mining and analysis, with resources for finding text data, preparing it for analysis, and choosing methods and tools to analyze it, plus further readings on text mining and its various methods.

Get Help with Text Mining & Analysis

Pitt community: Write to Digital Scholarship Services or use our AskUs form

Pitt health sciences researchers: Contact Data Services, Health Sciences Library System

License

"Text Mining & Analysis @ Pitt" by University of Pittsburgh Library System is licensed for reuse under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Sources of Text Data

Before you can start a text mining or analysis project, you'll need to gather text data and build a corpus (an organized collection of texts). Text data can come from many sources: interviews or surveys you conduct for a research study, primary sources or journal articles you collect, data sets or corpora created by others, or data extracted from the web. If you are looking for text data for your project, the links below lead to pages with resources and tools for some major sources of text data: