Text Mining & Analysis @ Pitt

An introduction to text mining/analysis and resources for finding text data, preparing text data for analysis, methods and tools for analyzing text data, and further readings regarding text mining and its various methods.

Get Help with Text Mining & Analysis

Pitt community: Write to Digital Scholarship Services or use our AskUs form

Pitt health sciences researchers: Contact Data Services, Health Sciences Library System

License

"Text Mining & Analysis @ Pitt" by University of Pittsburgh Library System is licensed for reuse under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Preparing Text Data

Most text is created and stored for humans to read and use, but text analysis/mining requires your text data to be machine readable (i.e., in a form that a computer can process), structured, and clean. Hence, after gathering your text data, the next step usually entails optical character recognition (OCR) if you're working with document scans or images, and/or text preprocessing (e.g., parsing, cleaning, transforming).
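To make the preprocessing step concrete, here is a minimal sketch in Python that uses only the standard library. The function name `preprocess`, the sample string, and the specific cleaning rules (Unicode normalization, rejoining words hyphenated across line breaks, collapsing whitespace, lowercasing, keeping only alphabetic tokens) are illustrative assumptions, not a procedure prescribed by this guide; the right steps depend on your sources and research question.

```python
import re
import unicodedata


def preprocess(raw_text):
    """Clean one document and split it into lowercase word tokens.

    Illustrative only: these rules are assumptions, not a fixed recipe.
    """
    # Normalize Unicode so compatibility characters (e.g., ligatures) are unified.
    text = unicodedata.normalize("NFKC", raw_text)
    # Rejoin words hyphenated across line breaks, a common artifact in OCR output.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Lowercase, then keep alphabetic runs (with internal apostrophes) as tokens.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


sample = "Text  min-\ning requires\tCLEAN, machine-readable data!"
tokens = preprocess(sample)
print(tokens)
# ['text', 'mining', 'requires', 'clean', 'machine', 'readable', 'data']
```

Note that this sketch discards digits and punctuation and splits in-line hyphenated words such as "machine-readable" into two tokens; whether that is appropriate is a choice to make for each project.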

This guide provides tools and helpful resources for the following text data preparation tasks: