Fetching the number of tweets

Depending on how many tweets you request, collecting them can take hours or even days. The query we are about to run should take only a few minutes. With the beepr package, you can trigger an alarm once the query is complete:

if(!require(beepr)){install.packages('beepr')}
library(beepr)

Given point number three, you might not be able to reproduce the exact results that I got. Still, I encourage you to run the code and compare the results; that's a great way to get some practice. Let's get started with search_tweets2():

tweets_dt <- search_tweets2(q = '#rstats',
                            n = 20000,
                            include_rts = TRUE,
                            tweet_mode = 'extended',
                            retryonratelimit = TRUE,
                            token = my_token)

for(i in 0:2){beep(5); Sys.sleep(3)}

The previous code collects tweets that contain #rstats (the q parameter) using search_tweets2() and stores them in a data frame named tweets_dt. There is also a function called search_tweets(), the first of its kind. search_tweets2() offers a small advantage by letting you pass arguments directly to the API; the tweet_mode argument is described in the API's documentation.

The default tweet_mode is already extended, so setting it won't change our outcome. Still, it shows how parameters from the API can be passed to search_tweets2().

The code asked for 20,000 tweets, but I ended up with only 15,999; that is explained by point number three. Setting include_rts and retryonratelimit to true asks search_tweets2() to include retweets and to resume after 15 minutes if the rate limit is reached, respectively. The last line triggers an alarm that sounds three times in a row, with a 3-second interval between beeps.
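To get a feel for how long a query with retryonratelimit might pause, here is a rough sketch in base R. It assumes the figure given in rtweet's documentation for the standard search endpoint (about 18,000 tweets per 15-minute window); the helper name estimate_wait_minutes() is made up for this illustration. The result is only a lower bound on the waiting time and ignores download time.

```r
# Assumption: the standard search limit is roughly 18,000 tweets
# per 15-minute window, as described in rtweet's documentation.
tweets_per_window <- 18000
window_minutes <- 15

# Hypothetical helper: minimum minutes spent waiting on rate-limit resets.
estimate_wait_minutes <- function(n) {
  windows_needed <- ceiling(n / tweets_per_window)
  # The first window needs no waiting; each extra window costs one reset.
  (windows_needed - 1) * window_minutes
}

estimate_wait_minutes(20000)  # one extra window is needed
estimate_wait_minutes(18000)  # fits in a single window
```

Under that assumption, a request for 20,000 tweets needs at least one 15-minute pause, which is why an alarm is handy.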

Setting an alarm with beepr::beep() frees you to do other things while time-consuming code runs. There are plenty of sounds to choose from; my favorite is beep(5), which plays the treasure sound from the Zelda games.
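If you use this pattern often, you could wrap it in a small helper. The following is a minimal sketch, not something provided by beepr: with_alarm() is a hypothetical name, and the fallback to the terminal bell when beepr is not installed is an assumption made so that the example runs anywhere. Because the alarm is registered with on.exit(), it should sound whether the code succeeds or fails.

```r
# Hypothetical helper (not part of beepr): run an expression and
# signal when it finishes, even if it stops with an error.
with_alarm <- function(expr, sound = 5) {
  on.exit({
    if (requireNamespace("beepr", quietly = TRUE)) {
      beepr::beep(sound)
    } else {
      cat("\a")  # fall back to the terminal bell
    }
  }, add = TRUE)
  expr  # lazy evaluation: the expression is actually run here
}

result <- with_alarm({
  Sys.sleep(1)  # stand-in for a slow query
  sum(1:10)
})
```

In practice, the Sys.sleep() call would be replaced by the real query, for example the search_tweets2() call shown earlier.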

Our newly created data frame of Twitter data (tweets_dt) has 42 variables. You can check how many observations and variables you got using the following code:

dim(tweets_dt)
# [1] 15999 42
names(tweets_dt)

The last line outputs all of the variable names. We used the search_tweets2() function provided by rtweet to collect tweets related to #rstats. Several other rtweet functions are often used to work with Twitter data:

  • get_timeline(): Gets a user's timeline
  • stream_tweets(): Collects a live stream of Twitter data
  • post_tweet(): Posts tweets from your console
  • save_as_csv(): Easily saves Twitter data that's created by rtweet
  • read_twitter_csv(): Easily reads Twitter data saved as a .csv

For the full documentation, see https://cran.r-project.org/web/packages/rtweet/rtweet.pdf. I put a similar CSV file online. If you want exactly the same results that I am getting, try this (don't forget to load the rtweet package first):

tweets_dt <- read_twitter_csv(url('http://bit.do/rstats4life'))
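One reason dedicated CSV helpers are useful is that Twitter data frames can contain list columns (several hashtags per tweet, for instance), which base write.csv() cannot write directly. The sketch below illustrates the flattening idea in base R on a made-up two-row data frame. The column names and values are hypothetical, and this is not rtweet's actual implementation.

```r
# Toy data: two made-up tweets with a list column of hashtags.
toy <- data.frame(text = c("Learning #rstats", "#rstats and #dataviz"),
                  stringsAsFactors = FALSE)
toy$hashtags <- list("rstats", c("rstats", "dataviz"))

# Flatten the list column into space-separated strings.
flat <- toy
flat$hashtags <- vapply(flat$hashtags, paste, character(1), collapse = " ")

# Round trip through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(flat, path, row.names = FALSE)
restored <- read.csv(path, stringsAsFactors = FALSE)

restored$hashtags
```

After the round trip, each row holds its hashtags as a single string, so the file stays a plain table that any CSV reader can open.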

So far, we briefly discussed what KDD and data mining could mean. We also learned about some ways to retrieve text from the web using three different packages: httr, rvest, and rtweet. Now, it's time to move further, clean the data, and transform it in an insightful way.
