Skip to Main Content

Course & Subject Guides

Text Mining & Analysis @ Pitt

An introduction to text mining/analysis and resources for finding text data, preparing text data for analysis, methods and tools for analyzing text data, and further readings regarding text mining and its various methods.

Sentiment Analysis

Sentiment analysis (sometimes called opinion mining) is used for classifying and interpreting the emotional valence of text data from positive to negative. 

 

Tools

 

Out-of-the-Box
Programmatic

Python

  • NLTK (Natural Language Toolkit)
    For accessing corpora and lexicons, tokenization, stemming, (part-of-speech) tagging, parsing, transformations, translation, chunking, collocations, classification, clustering, topic segmentation, concordancing, frequency distributions, sentiment analysis, named entity recognition, probability distributions, semantic reasoning, evaluation metrics, manipulating linguistic data (in SIL Toolbox format), language modeling, and other NLP tasks

  • VADER (Valence Aware Dictionary and sEntiment Reasoner)
    For lexicon and rule-based sentiment analysis; specifically attuned to sentiments expressed in social media, and works well on texts from other domains

  • NLP Architect
    For word chunking, named entity recognition, dependency parsing, intent extraction, sentiment classification, language models, transformations, Aspect Based Sentiment Analysis (ABSA), joint intent detection and slot tagging, noun phrase embedding representation (NP2Vec), most common word sense detection, relation identification, cross document coreference, noun phrase semantic segmentation, term set expansion, topics and trend analysis, optimizing NLP/NLU models

  • Spark NLP
    For tokenization, word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, spell checking, multi-class text classification, transformation (BERT, XLNet, ELMO, ALBERT, and Universal Sentence Encoder), multi-class sentiment analysis, machine translation (+180 languages), summarization and question Answering (Google T5), and many more NLP tasks

  • Polyglot
    For tokenization (165 languages), language detection (196 languages), named entity recognition (40 languages), part-of-speech tagging (16 languages), sentiment analysis (136 languages), word embeddings (137 languages), morphological analysis (135 languages), transliteration (69 languages)

  • Pattern
    For webscraping (Google, Wikipedia, Twitter, Facebook, generic RSS, etc.), web crawling, HTML DOM parsing, part-of-speech tagging, n-gram search, sentiment analysis, vector space modeling, clustering, classification (KNN, SVM, Perceptron), graph centrality and visualization

R

  • tidytext
    For converting to and from non-tidy formats, word and document frequency analysis (tf-idf), n-grams and correlations, sentiment analysis with tidy data, and topic modeling

  • SentimentAnalysis
    For dictionary-based sentiment analysis

  • Syuzhet Package
    For extracting sentiment and sentiment-derived plot arcs from text

  • mscstexta4r
    provides an interface to the Microsoft Cognitive Services Text Analytics API and can be used to perform sentiment analysis, topic detection, language detection, and key phrase extraction

Java

  • CoreNLP
    For deriving linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment analysis, quote attributions, and relations

  • Stanford Sentiment Analysis
    For sentiment analysis using recursive deep learning models