Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and do some further analysis. In this post I am going to show you how to connect to Twitter from R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account.

First steps in R

Install the required packages twitteR and wordcloud and load them.

install.packages(c("wordcloud", "twitteR"))
library(twitteR)
library(wordcloud)
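
If you rerun the script often, you may not want to reinstall the packages every time. A minimal sketch of installing only what is missing, assuming a hypothetical helper called missing_packages (it is not part of any package):

```r
# Return those package names that cannot be loaded on this machine.
# missing_packages is a hypothetical helper written for this example.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(c("wordcloud", "twitteR"))

# Only install in an interactive session, so a script never downloads
# packages unexpectedly.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```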

Create a twitter app

Before the R package twitteR can send API requests on your behalf, you need to authenticate yourself. For that you need a Twitter App, which you can create at https://apps.twitter.com/. Click “Create New App” and fill in the required fields with your values.

  • Name: Choose a name for your app. Unfortunately, it has to be unique. Most combinations of R and Twitter I could think of were already taken, so I just took veRenaTweeteR 😉
  • Description: Some description.
  • Website: They want you to provide a website URL, e.g. where your app can be downloaded. Since I don’t plan to “publish” my app in any way, I just put in my blog address.
  • Callback URL: You have to enter http://127.0.0.1:1410 to be redirected after authentication.
Here you set everything for your app.

Once you have successfully created your app, go to Keys and Access Tokens. There you will find the consumer key and the consumer secret that you need to authenticate in R.

Here you get the consumer key and the consumer secret.

Authenticating and first steps with twitteR

Save the keys from your Twitter App.

twitter_key<-"your_twitter_key"
twitter_secret<-"your_twitter_secret"
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
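
If you share your script or keep it in version control, you may prefer not to have the keys in the code. A minimal sketch, assuming the keys are stored in environment variables; the names TWITTER_KEY and TWITTER_SECRET and the helper read_credential are my own choices for illustration, not something twitteR prescribes:

```r
# Read a credential from an environment variable and fail early if it is missing.
# The variable names used below are assumptions; pick your own.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set")
  }
  value
}

# Usage (requires the variables to be set, e.g. in ~/.Renviron):
# oauth <- setup_twitter_oauth(read_credential("TWITTER_KEY"),
#                              read_credential("TWITTER_SECRET"))
```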

After this, a browser window will open and ask you to log in with your Twitter account (unless you are already logged in) and to grant permissions to yourAppName. If you set the callback URL correctly, the following text will appear:

This message is shown in the browser after successful authentication.

With the following command we get the 100 newest tweets of user "ExpectAPatronum" (which is me), but you can do it for other users as well. The second line will display the structure of the newest tweet.

myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])

A tweet contains lots of information (from statusSource we can even tell I sent it using the iPhone app!).

Reference class 'status' [package "twitteR"] with 17 fields
 $ text         : chr "Don't agree with everything but still funny! https://t.co/2bMYBDkfGY"
 $ favorited    : logi FALSE
 $ favoriteCount: num 0
 $ replyToSN    : chr(0) 
 $ created      : POSIXct[1:1], format: "2016-01-18 07:21:31"
 $ truncated    : logi FALSE
 $ replyToSID   : chr(0) 
 $ id           : chr "688984546289790976"
 $ replyToUID   : chr(0) 
 $ statusSource : chr "Twitter for iPhone"
 $ screenName   : chr "ExpectAPatronum"
 $ retweetCount : num 0
 $ isRetweet    : logi FALSE
 $ retweeted    : logi FALSE
 $ longitude    : chr(0) 
 $ latitude     : chr(0) 
 $ urls         :'data.frame':	1 obs. of  5 variables:
  ..$ url         : chr "https://t.co/2bMYBDkfGY"
  ..$ expanded_url: chr "https://twitter.com/jennybryan/status/688866722980364289"
  ..$ display_url : chr "twitter.com/jennybryan/sta…""| __truncated__
  ..$ start_index : num 45
  ..$ stop_index  : num 68
 and 53 methods, of which 39 are  possibly relevant:
   getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude,
   getLongitude, getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
   getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource, getText,
   getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
   setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
   setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
   setUrls, toDataFrame, toDataFrame#twitterObj
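
Since every status object has a text field, pulling out the text of all tweets is a one-liner. The sketch below wraps it in a hypothetical helper, extract_texts, that also checks its input first, which gives a clearer message than a failing $ when the timeline request did not return a list of tweets. The example data are plain lists standing in for twitteR status objects, so the code runs without a Twitter connection:

```r
# Extract the text field from a list of tweets, with a basic sanity check.
# extract_texts is a hypothetical helper, not part of twitteR.
extract_texts <- function(tweets) {
  if (!is.list(tweets) || length(tweets) == 0) {
    stop("Expected a non-empty list of tweets")
  }
  vapply(tweets, function(tweet) tweet$text, character(1))
}

# Stand-in data: plain lists instead of twitteR status objects.
example_tweets <- list(list(text = "Hello R"), list(text = "Wordclouds are fun"))
extract_texts(example_tweets)
```

With real data you would call extract_texts(myTweets) instead.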

Creating the wordcloud

With the following code I created the first wordcloud:

set.seed(1234) # to always get the same wordcloud and for better reproducibility
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text})) # to extract only the text of each status object
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # remove urls, usernames, hashtags and words with umlauts (the latter cannot be displayed by all fonts)
wordcloud(clean_words, min.freq=2)
Without any specific settings.
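
One detail worth knowing when filtering words by a pattern: negative indexing with grep() silently returns an empty vector when nothing matches, whereas a logical mask from grepl() keeps everything in that case. A small illustration with made-up words (none of them contains "http"):

```r
words <- c("hello", "world", "rstats")

# grep() returns integer(0) when nothing matches, and negative indexing
# with integer(0) selects nothing at all:
with_grep <- words[-grep("http", words)]
length(with_grep)   # 0, every word is lost

# grepl() returns one TRUE/FALSE per word, so negating it keeps
# all words that do not match:
with_grepl <- words[!grepl("http", words)]
length(with_grepl)  # 3
```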

Making it look nicer

Since I didn't like the default font, nor the ones suggested in the example section of the package, I started to look for other possible fonts. From the help I found out that wordcloud passes any additional parameters on to the function text {graphics}, so you can also set the parameter vfont. This parameter accepts Hershey fonts (8 font families with different faces like bold, italic, ...).

Playing around with that a little I generated a few more wordclouds.

wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
Font serif (plain).
Font script (plain).
Font gothic italian (plain).
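
To compare the Hershey fonts before committing to one, you can draw a sample line per family with text() directly. This is only a sketch: the sample label is made up, and I list the six families meant for ordinary text, leaving out the two symbol families.

```r
# Preview several Hershey font families with text() from base graphics.
hershey_families <- c("serif", "sans serif", "script",
                      "gothic english", "gothic german", "gothic italian")

# Set up an empty plot with one row per font family
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, length(hershey_families) + 1))

# Draw one sample line per family, each in its "plain" face
for (i in seq_along(hershey_families)) {
  text(0.5, i, labels = paste("wordcloud in", hershey_families[i]),
       vfont = c(hershey_families[i], "plain"))
}
```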

Another important ingredient for a nice wordcloud is the font color. wordcloud uses the package RColorBrewer for that (which is automatically installed with wordcloud).

The package RColorBrewer provides several palettes of colors that look nice together. I chose the palette "Pastel1" with 7 colors (minimum is 3, maximum depends on the palette). Of course you can use par to change other settings of the plot.

pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
Font script (plain), gray background and color palette Pastel1.
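
If you are unsure how many colors a palette offers, RColorBrewer can tell you. A short sketch, assuming RColorBrewer is installed (it comes with wordcloud), that looks up the maximum size of the two palettes used in this post and shows what a palette actually is:

```r
library(RColorBrewer)

# Maximum number of colors offered by the two palettes used in this post
brewer.pal.info[c("Pastel1", "Dark2"), "maxcolors"]

# A palette is just a character vector of hex color codes
pal <- brewer.pal(7, "Pastel1")
length(pal)  # 7
pal
```

Calling display.brewer.all() plots every available palette, which makes choosing one easier.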

Other settings

As already seen, you can change the font (vfont) and the color (colors) of the wordcloud. There are a lot more settings in wordcloud:

  • words
  • freq
  • scale (=c(4,.5)): range of the size of the words
  • min.freq (=3): the minimum frequency of a word to be included. I always set it to at least 2.
  • max.words (=Inf): maximum number of words in the wordcloud
  • random.order (=TRUE): otherwise words are plotted in decreasing frequency
  • random.color (=FALSE)
  • rot.per (=.1): the proportion of words that are rotated by 90 degrees
  • colors (= "black")
  • ordered.colors (= FALSE)
  • use.r.layout (=FALSE)
  • fixed.asp (=TRUE)
  • ...: any parameter that can be passed to text (e.g. vfont)
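
To see several of these settings together, here is a sketch that counts the word frequencies itself and passes words and freq explicitly. The vector example_words is made-up stand-in data for clean_words, and the chosen values are just for illustration. The call is only executed if wordcloud is installed.

```r
# Count word frequencies ourselves and pass words and freq explicitly.
# example_words is made-up stand-in data for clean_words.
example_words <- c("rstats", "data", "rstats", "plot", "data", "rstats", "twitter")
word_freq <- sort(table(example_words), decreasing = TRUE)
word_freq

if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(
    words = names(word_freq),
    freq = as.integer(word_freq),
    scale = c(3, 0.5),      # largest and smallest word size
    min.freq = 1,           # keep every word in this tiny example
    max.words = 50,         # upper limit on the number of words
    random.order = FALSE,   # plot words in decreasing frequency
    rot.per = 0.2           # proportion of words rotated by 90 degrees
  )
}
```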

Source code

library(wordcloud)
library(twitteR)

install.packages("extrafont")
library(extrafont)
font_import()

twitter_key<-"your_key"
twitter_secret<-"your_secret"

oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])

tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))

#### wordcloud

set.seed(1234)
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)

length(grep("http", words))
length(grep("@", words))
length(grep("#", words))

clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)]
wordcloud(clean_words, min.freq=2)

#### playing with the settings 

wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))


pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)

#### feature image

pal<-brewer.pal(7, "Dark2")
par(bg="lightgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)

2 thoughts on “Use R to connect to twitter and create a wordcloud of your tweets”

  1. Hey Verena,
    First of all, thanks for the great tutorial. It is exactly what I was looking for.
    Unfortunately, I always get an error message at:
    tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))

    This is the error text:
    Error in t$text : $ operator is invalid for atomic vectors

    Do you know what might be causing this?

    Kind regards,

    Clemens

    1. Hello Clemens!
      I have just tried the code again and it works for me. At first I was afraid that the package or the Twitter API might have changed. Do you get tweets back, or is the list myTweets empty? What is the output of str(myTweets[[1]])?
      Best regards,
      Verena
