Text Mining & Analysis @ Pitt

An introduction to text mining and analysis, with resources for finding text data, preparing it for analysis, choosing methods and tools for analyzing it, and further reading on text mining and its methods.

Preparing Text Data

Most text is created and stored for humans to read and use, but text analysis/mining requires your text data to be machine readable (i.e., in a form that a computer can process), structured, and clean. Hence, after gathering your text data, the next step usually entails optical character recognition (OCR) if you're working with document scans or images, and/or text preprocessing (e.g., parsing, cleaning, transforming).
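
To make "cleaning" and "transforming" concrete, here is a minimal sketch in Python using only the standard library. The function names clean_text and tokenize are hypothetical, chosen for illustration, and the specific steps (Unicode normalization, rejoining words hyphenated across line breaks, collapsing whitespace, lowercasing) are assumptions about what a typical project might need, not a prescribed workflow.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Return a normalized, single-line, lowercase version of raw text."""
    # Fold compatibility characters (e.g., ligatures such as "ﬁ") to plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break ("Min-\ning" -> "Mining").
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into simple word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text)


if __name__ == "__main__":
    sample = "Text  Min-\ning &\tAnalysis "
    cleaned = clean_text(sample)
    print(cleaned)
    print(tokenize(cleaned))
```

Which steps are appropriate depends on your research question. Lowercasing, for example, can discard distinctions that matter for tasks such as named entity recognition, so treat each step as optional.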

This guide provides tools and helpful resources for the following text data preparation tasks: