Text Mining & Analysis @ Pitt

An introduction to text mining and analysis, with resources for finding text data, preparing it for analysis, and choosing methods and tools for analyzing it, plus further readings on text mining and its various methods.

Text Encoding

Text encoding is the process of selecting and marking specific aspects of a text using a machine-readable markup language, such as XML. The TEI (Text Encoding Initiative) Guidelines, which are based on XML, have become the standard encoding guidelines for the digital representation of literary texts through descriptive markup.
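
As a rough illustration of descriptive markup, the sketch below encodes one sentence with the TEI elements persName and placeName and then reads the marked names back with Python's standard library. The sample sentence and the helper name tagged_text are invented for this example, and the fragment omits the teiHeader that a complete TEI document requires, so treat it as a minimal sketch rather than a valid TEI file.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample text, invented for illustration.
# It omits the <teiHeader> that a complete TEI document requires.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><persName>Elizabeth Bennet</persName> walked to
         <placeName>Netherfield</placeName> to visit
         <persName>Jane</persName>.</p>
    </body>
  </text>
</TEI>"""


def tagged_text(xml_string, element):
    """Return the text of every TEI element with the given local name."""
    root = ET.fromstring(xml_string)
    # ElementTree addresses namespaced tags as "{namespace}localname".
    return [el.text for el in root.iter(f"{{{TEI_NS}}}{element}")]


people = tagged_text(SAMPLE, "persName")
places = tagged_text(SAMPLE, "placeName")
print(people)  # ['Elizabeth Bennet', 'Jane']
print(places)  # ['Netherfield']
```

Because the markup describes what each span is (a person, a place) rather than how it should look, the same encoded file can feed an index of names, a map, or a styled reading edition.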

 

Tools

 

Out-of-the-Box
  • Roma
    For generating P5-compatible schemas and documentation and for developing TEI customizations

  • TEIGarage
    For managing the conversion between TEI documents and a variety of formats

  • Data Dictionary Generator
    For generating encoding documentation by creating profiles of every element and attribute appearing in a TEI file

  • Oxygen XML Editor
    For creating, editing, and publishing XML documents

  • Code Browser
    For navigating and editing source code

  • Notepad++
    For editing source code

 

Example Projects