Pitt community: Write to Digital Scholarship Services or use our AskUs form
Pitt health sciences researchers: Contact Data Services, Health Sciences Library System
Collocation analysis identifies words that frequently appear next to each other in a text or a series of texts, in order to assess the significance of those word pairings.
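To make the idea concrete, here is a minimal sketch of collocation scoring in plain Python. It assumes adjacent word pairs (bigrams) as the unit of analysis and uses pointwise mutual information (PMI) as the association measure; both are illustrative choices, not the method used by any particular tool listed below. The function names `tokenize` and `bigram_pmi`, the `min_count` threshold, and the sample text are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Pairs seen fewer than `min_count` times are skipped, because PMI
    is unreliable for rare pairs. Returns (pair, score) tuples sorted
    from the strongest association to the weakest.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares how often the pair occurs with how often it
        # would occur if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample text, for illustration only.
sample = (
    "The digital humanities lab hosts text mining workshops. "
    "Text mining helps the digital humanities community explore large corpora. "
    "Students in the lab practice text mining every week."
)

ranked = bigram_pmi(tokenize(sample))
for pair, score in ranked:
    print(pair, round(score, 2))
```

On real corpora, a higher `min_count` and stop word filtering are usually needed before the ranking becomes meaningful. Note that this sketch also counts pairs that straddle sentence boundaries.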
Voyant Tools
Web-based reading and analysis environment for digital texts, supporting tasks such as word frequency counts, collocation analysis, concordancing, and visualization (graphs, grids, word clouds, etc.)
AntConc
Freeware, multi-platform, multi-purpose corpus analysis toolkit that hosts a comprehensive set of tools, including a powerful concordancer, word and keyword frequency generators, tools for cluster and lexical bundle analysis, and a word distribution plot
WordHoard
Application for the close reading and scholarly analysis of deeply tagged texts, including word frequencies, concordances, collocations, and scripting
CasualConc
Concordance program for macOS, designed for exploratory-type text analysis and visualization of frequency data, including keyword in context (KWIC) concordance lines, word clusters, collocation analysis, and word count
NLTK (Natural Language Toolkit)
For accessing corpora and lexicons, tokenization, stemming, (part-of-speech) tagging, parsing, transformations, translation, chunking, collocations, classification, clustering, topic segmentation, concordancing, frequency distributions, sentiment analysis, named entity recognition, probability distributions, semantic reasoning, evaluation metrics, manipulating linguistic data (in SIL Toolbox format), language modeling, and other NLP tasks
spaCy
For tokenization, named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
scikit-learn
For classification, regression, clustering, dimensionality reduction, model selection, and preprocessing
tidytext
For converting to and from non-tidy formats, word and document frequency analysis (tf-idf), n-grams and correlations, sentiment analysis with tidy data, and topic modeling
openNLP
For NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution
RcmdrPlugin.temis
For performing a series of text mining tasks such as importing and cleaning a corpus, and analyses such as term and document counts, vocabulary tables, term co-occurrences and document similarity measures, time series analysis, correspondence analysis, and hierarchical clustering
RWeka
For stemming, data transformation, distribution-based balancing of datasets, replacing missing numerical values, dataset resampling, anonymization, normalization, classification, regression, clustering, association rules, and visualization
tm
For importing and handling corpus data, metadata management, stemming, stop word deletion, removal of white space, string processing, count-based analysis methods, text clustering, text classification, and string kernels
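Several of the tools above produce keyword in context (KWIC) concordance lines. As a rough illustration of what such a view contains, the following plain-Python sketch collects a fixed window of words on either side of each occurrence of a keyword. The function name `kwic`, the `width` parameter, and the sample sentence are hypothetical, and none of the listed tools is claimed to work this way internally.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each match.

    `width` is the number of words kept on each side of the keyword.
    Matching is case-insensitive and works on whole word tokens.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(i - width, 0):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Made-up sample text, for illustration only.
sample = (
    "Text mining tools support close reading. "
    "Many scholars pair text mining with visualization."
)

for left, kw, right in kwic(sample, "mining", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} | {kw} | {right}")
```

Dedicated concordancers add sorting, regular expression search, and links back to the source document, which this sketch leaves out.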