Text Mining & Analysis @ Pitt

An introduction to text mining and analysis, with resources for finding text data, preparing it for analysis, and analyzing it with various methods and tools, plus further readings on text mining and its methods.

Social Media

Social media posts and comments are an expansive data source for text mining and analysis. To access this content, researchers can often use the social media company's own API (application programming interface) or other tools built on these APIs, such as Python libraries.

APIs

API stands for Application Programming Interface: a set of functions and protocols that enables two applications to talk to each other. In other words, an API serves as an intermediary or messenger between applications, databases, and devices. APIs are the code that allows you to use apps like Facebook, send instant messages, and check the weather on your phone. Social media APIs can enable you to extract data from social media platforms.
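
To make the "messenger" idea concrete, here is a minimal Python sketch of the two halves of an API exchange: encoding a search request as a URL, and reading post text out of a JSON reply. The endpoint, the `q` and `limit` parameters, and the response shape are all hypothetical, chosen only for illustration; each real platform documents its own URLs, parameters, and authentication rules.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; real platforms
# publish their own URLs, parameters, and authentication rules.
BASE_URL = "https://api.example.com/v1/posts/search"


def build_request_url(query, limit=10):
    """Encode the search parameters into the URL a client would request."""
    return BASE_URL + "?" + urlencode({"q": query, "limit": limit})


def extract_texts(response_body):
    """Pull the text of each post out of a JSON response body."""
    payload = json.loads(response_body)
    return [post["text"] for post in payload.get("data", [])]


# A made-up response body in a shape many JSON APIs resemble.
SAMPLE_RESPONSE = (
    '{"data": ['
    '{"id": "1", "text": "Snow in Pittsburgh today"}, '
    '{"id": "2", "text": "Library is open late"}'
    "]}"
)

if __name__ == "__main__":
    print(build_request_url("text mining"))
    print(extract_texts(SAMPLE_RESPONSE))
```

The sketch sends nothing over the network. In practice an HTTP client would request the URL and pass the reply to a function like `extract_texts`, and the resulting list of strings is what a text-mining workflow would start from.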

Other Tools

Out-of-the-Box
  • Twitter Archiving Google Sheet (TAGS)
    Free Google Sheet template that lets you set up and run automated collection of search results from Twitter

  • Facepager
    Application for fetching publicly available data from YouTube, Twitter, and other websites via APIs and web scraping

  • Social Feed Manager
    Open source software that harvests social media data and web resources from Twitter, Tumblr, Flickr, and Sina Weibo; empowers researchers, faculty, students, and archivists to collect, manage, and export social media data

  • Social Media Macroscope
    Open-source social media analytics tool that allows researchers to collect and analyze social media data. Its SMILE component (Social Media Intelligence and Learning Environment) can perform functions such as text preprocessing, phrase mining, sentiment analysis, network analysis, and machine learning text classification

  • Social Media Research Toolkit
    List of 50+ social media research tools

Programmatic

Python

  • tweepy
    Easy-to-use Python library for accessing the Twitter API

  • socialreaper
    Social media scraping/data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

  • Twitter Scraper
    Script that grabs all of a user's tweets (going beyond the 3,200 limit)
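
As a rough illustration of the programmatic route, the sketch below wraps tweepy's `Client.search_recent_tweets` call in a small helper that returns only the tweet text. The helper name `collect_tweet_texts`, the `main` function, and the `TWITTER_BEARER_TOKEN` environment variable are assumptions made for this example and are not part of tweepy. Whether the search endpoint is available depends on the access level of your developer account.

```python
def collect_tweet_texts(client, query, max_results=10):
    """Return the text of recent tweets matching `query`.

    `client` is expected to behave like a tweepy.Client. The recent
    search endpoint accepts between 10 and 100 results per request.
    """
    response = client.search_recent_tweets(query=query, max_results=max_results)
    # response.data is None when no tweets match the query.
    return [tweet.text for tweet in (response.data or [])]


def main():
    """Example usage; needs the third-party tweepy package and a token."""
    import os

    import tweepy  # pip install tweepy

    # TWITTER_BEARER_TOKEN is an assumed environment variable name.
    client = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])
    for text in collect_tweet_texts(client, "text mining"):
        print(text)
```

Calling `main()` runs a live search once a bearer token is set. Keeping the collection step in its own function means the returned list of strings can go straight into whatever preprocessing or analysis comes next.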

R

  • streamR
    Open-source package that provides functions to access Twitter's filter, sample, and user streams and to parse the output into data frames.