Pitt community: Write to Digital Scholarship Services or use our AskUs form
Pitt health sciences researchers: Contact Data Services, Health Sciences Library System
"Text Mining & Analysis @ Pitt" by University of Pittsburgh Library System is licensed for reuse under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Before you can start a text mining or text analysis project, you need to gather text data and build a corpus (an organized collection of texts). Text data can come from many sources: interviews or surveys you conduct for a research study, primary sources or journal articles you collect, data sets or corpora created by others, or data extracted from the web. If you are looking for text data for your project, the links below lead to pages with resources and tools for some major sources of text data: