Using the Twitter river

In the previous recipes, we saw rivers that fetch data from both SQL and NoSQL data stores. In this recipe, we'll discuss how to use the Twitter river to collect tweets from Twitter and store them in ElasticSearch.

Getting ready

You need a working ElasticSearch cluster and an OAuth Twitter token. To obtain one, log in to Twitter's developer site (https://dev.twitter.com/apps/) and create a new app at https://dev.twitter.com/apps/new.
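
Rather than pasting the four OAuth values directly into a file, you may prefer to keep them in environment variables. The following Python sketch assumes hypothetical variable names (TWITTER_CONSUMER_KEY and so on); it collects the values into the shape of the river's oauth object and fails early if any of them is missing.

```python
import os

# Hypothetical variable names: pick whatever your deployment uses.
OAUTH_ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def missing_oauth_vars(environ):
    """Return the names of credential variables that are unset or empty."""
    return [var for var in OAUTH_ENV_VARS.values() if not environ.get(var)]


def load_oauth(environ=None):
    """Return a dict shaped like the river's "oauth" object.

    Reads os.environ unless another mapping is passed in.
    """
    environ = os.environ if environ is None else environ
    missing = missing_oauth_vars(environ)
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return {key: environ[var] for key, var in OAUTH_ENV_VARS.items()}
```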

How to do it...

To use the Twitter river, we need to perform the following steps:

Firstly, we need to install the Twitter river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-twitter). We can install it in the usual way, as follows:
```
bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.4.0
```

The result should be as follows:

```
-> Installing elasticsearch/elasticsearch-river-twitter/1.4.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-twitter/elasticsearch-river-twitter-1.4.0.zip...
Downloading …....DONE
Installed river-twitter into …/elasticsearch/plugins/river-twitter
```
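
The last line of the output reports where the plugin was unpacked. If you install plugins from a script, a check like the following can confirm the result. It is a minimal sketch: es_home is a hypothetical parameter standing for your ElasticSearch installation directory, and the check relies only on the plugins/river-twitter path shown in the output.

```python
import os


def river_twitter_installed(es_home):
    """Return True if the river-twitter plugin directory exists under es_home."""
    plugin_dir = os.path.join(es_home, "plugins", "river-twitter")
    return os.path.isdir(plugin_dir)
```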

Restart your ElasticSearch node to make sure that the river plugin is loaded. In the log, you should see lines like the following:

```
…
[2013-08-18 14:59:10,143][INFO ][node                     ] [Fight-Man] initializing ...
[2013-08-18 14:59:10,163][INFO ][plugins                  ] [Fight-Man] loaded [river-twitter, transport-thrift, jdbc-river], sites []
```
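
If you script the restart, you can look for the plugin in the log instead of checking by eye. This Python sketch extracts the plugin names from the plugins line shown above; the regular expression is tuned to this log layout and is an assumption for other ElasticSearch versions.

```python
import re

# Matches the "loaded [a, b, c]" part of the plugins log line.
LOADED_RE = re.compile(r"\[plugins\s*\].*?loaded \[([^\]]*)\]")


def loaded_plugins(log_line):
    """Return the plugin names listed in a 'loaded [...]' log line, or []."""
    match = LOADED_RE.search(log_line)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]
```

A caller would scan the log and test whether "river-twitter" appears in any returned list.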

We need to create a configuration file (config.json) for the river, as follows:

```
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "type" : "sample",
        "ignore_retweet" : true
    },
    "index" : {
        "index" : "my_twitter_river",
        "type" : "status",
        "bulk_size" : 100
    }
}
```
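
If you generate config.json from a script instead of editing it by hand, the structure maps directly onto nested dictionaries. In this Python sketch the function and parameter names are illustrative; only the JSON keys and the default values come from the configuration above.

```python
import json


def build_river_config(oauth, index="my_twitter_river", doc_type="status",
                       bulk_size=100, stream_type="sample", ignore_retweet=True):
    """Return the Twitter river _meta document as a Python dict."""
    return {
        "type": "twitter",
        "twitter": {
            "oauth": dict(oauth),
            "type": stream_type,
            "ignore_retweet": ignore_retweet,
        },
        "index": {
            "index": index,
            "type": doc_type,
            "bulk_size": bulk_size,
        },
    }


def write_config(path, config):
    """Write the config as indented JSON, ready for curl's -d @file option."""
    with open(path, "w") as fh:
        json.dump(config, fh, indent=4)
```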

Now we can create the river with the current configuration as follows:

```
curl -XPUT 'http://127.0.0.1:9200/_river/twitterriver/_meta' -d @config.json
```
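
The same request can be sent from a program. This Python sketch uses only the standard library; the host and river name default to the values in the curl command, and the function names are hypothetical. Only create_river needs a running node.

```python
import json
import urllib.request


def river_meta_request(config, host="http://127.0.0.1:9200", river="twitterriver"):
    """Build the PUT request equivalent to the curl command above."""
    url = "%s/_river/%s/_meta" % (host, river)
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_river(config, **kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(river_meta_request(config, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```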

The result will be as follows:

```
{"ok":true,"_index":"_river","_type":"twitterriver","_id":"_meta","_version":1}
```

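A script can confirm that the river's _meta document was stored by checking the fields of this response. The sketch below is hypothetical and matches the response format shown above; the ok field is what this ElasticSearch version returns, and later releases may omit it. The _version field is deliberately not checked, because it increases each time the _meta document is rewritten.

```python
import json


def river_registered(response_text, river="twitterriver"):
    """Return True if the response confirms the river's _meta document was stored."""
    doc = json.loads(response_text)
    return (doc.get("ok") is True
            and doc.get("_index") == "_river"
            and doc.get("_type") == river
            and doc.get("_id") == "_meta")
```
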
How it works...

After logging in to Twitter, the Twitter river starts collecting tweets and sends them to ElasticSearch in bulk.
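
To make "in bulk" concrete: the ElasticSearch bulk API takes newline-delimited JSON in which each document is preceded by an action line. The Python sketch below groups a stream of tweets into batches of bulk_size and renders each batch in that format, using the index and type from the configuration above. It illustrates the idea and is not the river's actual code; using the tweet's id_str field as the document ID is an assumption.

```python
import json
from itertools import islice


def batches(items, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(tweets, index="my_twitter_river", doc_type="status"):
    """Render tweets as a bulk API body: one action line plus one source line each."""
    lines = []
    for tweet in tweets:
        # Assumption: the tweet's id_str is used as the document _id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": tweet["id_str"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(tweet))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

With bulk_size set to 100, 250 incoming tweets would become three requests of 100, 100, and 50 documents.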

The river type is twitter, and all client configuration lives in the twitter object. The following are the most common parameters:

oauth: This parameter is an object containing the four keys needed to access the Twitter API. They are generated when you create a Twitter application, and they are as follows:
- consumer_key
- consumer_secret
- access_token
- access_token_secret
type: This is the stream type, which can be one of the following three allowed by the Twitter API:
- sample
- filter (refer to https://dev.twitter.com/docs/api/1.1/post/statuses/filter)
- firehose
raw (default false): If this parameter is true, tweets are indexed in ElasticSearch without any change.
ignore_retweet (default false): If this parameter is true, retweets are skipped.
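
The parameter rules above can be checked before the river is created. The Python function below is a hypothetical pre-flight check, not part of the plugin: it reports missing OAuth keys, a type outside the three stream types, and non-boolean raw or ignore_retweet values. It only validates type when one is given, so it makes no assumption about the plugin's default.

```python
ALLOWED_STREAM_TYPES = {"sample", "filter", "firehose"}
OAUTH_KEYS = {"consumer_key", "consumer_secret", "access_token", "access_token_secret"}


def check_twitter_settings(twitter):
    """Return a list of problems found in a river's "twitter" object (empty if none)."""
    problems = []
    missing = OAUTH_KEYS - set(twitter.get("oauth", {}))
    if missing:
        problems.append("oauth is missing: " + ", ".join(sorted(missing)))
    if "type" in twitter and twitter["type"] not in ALLOWED_STREAM_TYPES:
        problems.append("unknown type: %r" % twitter["type"])
    for flag in ("raw", "ignore_retweet"):
        # Both flags default to false, so an absent flag is fine.
        if not isinstance(twitter.get(flag, False), bool):
            problems.append("%s must be true or false" % flag)
    return problems
```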

There's more…

To control the Twitter stream, we can define an additional filter object.

Defining a filter automatically switches the type to filter. The Twitter filter API allows the following additional parameters to be defined:

tracks: This specifies the keywords to track.
follow: This specifies the IDs of the Twitter users to follow.
locations: This specifies a set of bounding boxes to track.

These are the filtering capabilities that Twitter provides to reduce the number of tweets sent to you and to focus the stream on particular targets.
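
On the wire, Twitter's statuses/filter endpoint takes each criterion as a comma-separated string: track for keywords, follow for user IDs, and locations as longitude,latitude pairs with the south-west corner of each bounding box first. The Python sketch below shows that encoding. It describes the Twitter API parameters, not how the river builds them internally, and the function name is hypothetical.

```python
def twitter_filter_params(tracks=(), follow=(), locations=()):
    """Encode filter criteria as statuses/filter form parameters.

    locations is a sequence of bounding boxes, each given as
    (sw_lon, sw_lat, ne_lon, ne_lat).
    """
    params = {}
    if tracks:
        params["track"] = ",".join(tracks)
    if follow:
        params["follow"] = ",".join(str(user_id) for user_id in follow)
    if locations:
        coords = []
        for box in locations:
            if len(box) != 4:
                raise ValueError("a bounding box needs exactly four coordinates")
            coords.extend(box)
        params["locations"] = ",".join(str(c) for c in coords)
    return params
```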

A filter river config file will look as follows:

```
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "filter" : {
            "tracks" : ["elasticsearch", "cookbook", "packtpub"]
        }
    }
}
```
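
Two details are easy to get wrong when writing this file by hand. Strict JSON parsers, including Python's json module, reject a comma after the last element of an object or array, and generating the file avoids that. And because a filter object switches the river to the filter stream, the effective stream type can be derived from the config. The helper names below are hypothetical, and effective_stream_type only encodes the rule stated above; it does not try to reproduce any default the plugin applies when type is omitted.

```python
import json


def build_filter_config(oauth, tracks):
    """Return a filter river config with the same structure as the one above."""
    return {
        "type": "twitter",
        "twitter": {
            "oauth": dict(oauth),
            "filter": {"tracks": list(tracks)},
        },
    }


def effective_stream_type(config):
    """Apply the rule that defining a filter switches the stream type to filter.

    Returns the explicit "type" otherwise, or None if neither is set.
    """
    twitter = config.get("twitter", {})
    if "filter" in twitter:
        return "filter"
    return twitter.get("type")


# Writing the file with json.dumps(config, indent=4) never emits a trailing comma.
```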
