Most text is created and stored for humans to read and use, but text analysis and mining require your text data to be machine-readable (i.e., in a form that a computer can process), structured, and clean. Hence, after gathering your text data, the next step usually entails optical character recognition (OCR), if you're working with document scans or images, and/or text preprocessing (e.g., parsing, cleaning, transforming).
This guide provides tools and helpful resources for the following text data preparation tasks: