Using the Twitter river

In the previous recipes, we have seen rivers that fetch data from data stores, both SQL and NoSQL. In this recipe, we'll discuss how to use the Twitter river to collect tweets from Twitter and store them in ElasticSearch.

Getting ready

You need a working ElasticSearch node and a Twitter OAuth token. To obtain the token, log in to Twitter (https://dev.twitter.com/apps/) and create a new app at https://dev.twitter.com/apps/new.

How to do it...

To use the Twitter river, we need to perform the following steps:

  1. First, we need to install the Twitter river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-twitter). We can install the river plugin in the usual way as follows:
    bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.4.0
  2. The result should be as follows:
    -> Installing elasticsearch/elasticsearch-river-twitter/1.4.0...
    Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-twitter/elasticsearch-river-twitter-1.4.0.zip...
    Downloading …....DONE
    Installed river-twitter into …/elasticsearch/plugins/river-twitter
  3. Restart your ElasticSearch node to be sure that the river plugin is loaded. In the log, you should see the following result:
    …
    [2013-08-18 14:59:10,143][INFO ][node                     ] [Fight-Man] initializing ...
    [2013-08-18 14:59:10,163][INFO ][plugins                  ] [Fight-Man] loaded [river-twitter, transport-thrift, jdbc-river], sites []
  4. We need to create a configuration file (config.json) to configure the river, as follows:
    {
        "type" : "twitter",
        "twitter" : {
            "oauth" : {
                "consumer_key" : "*** YOUR Consumer key HERE ***",
                "consumer_secret" : "*** YOUR Consumer secret HERE ***",
                "access_token" : "*** YOUR Access token HERE ***",
                "access_token_secret" : "*** YOUR Access token secret HERE ***"
            },
            "type" : "sample",
            "ignore_retweet" : true
        },
        "index" : {
            "index" : "my_twitter_river",
            "type" : "status",
            "bulk_size" : 100
        }
    }
  5. Now we can create the river with this configuration as follows:
    curl -XPUT 'http://127.0.0.1:9200/_river/twitterriver/_meta' -d @config.json
  6. The result will be as follows:
    {"ok":true,"_index":"_river","_type":"twitterriver",
    "_id":"_meta","_version":1}
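
Once the river is registered, it can be useful to check that tweets are arriving and to know how to remove the river again. The following is a minimal sketch, assuming the node address and the index and river names used in the steps above; the ES_URL variable and the function names are only illustrative helpers, not part of the plugin:

```shell
#!/bin/sh
# Assumed address of the local node, as used in the steps above.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the URL of the count API for a given index.
count_url() {
    printf '%s/%s/_count?pretty\n' "$ES_URL" "$1"
}

# Build the URL of a river inside the _river index.
river_url() {
    printf '%s/_river/%s/\n' "$ES_URL" "$1"
}

# Ask ElasticSearch how many documents the index currently holds.
count_tweets() {
    curl -s -XGET "$(count_url "$1")"
}

# Delete the river so that it stops collecting tweets.
delete_river() {
    curl -s -XDELETE "$(river_url "$1")"
}

# Usage (requires a running node):
#   count_tweets my_twitter_river
#   delete_river twitterriver
```

Calling count_tweets my_twitter_river a few times while the river is running should show the count growing; delete_river twitterriver removes the river without touching the tweets that were already indexed.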

How it works...

After authenticating with Twitter, the river starts collecting tweets and sends them to ElasticSearch in bulk.

The river type is twitter, and all the client configuration lives in the twitter object. The following are the most common parameters:

  • oauth: This is an object containing the four keys required to access the Twitter API. They are generated when you create a Twitter application and are as follows:
    • consumer_key
    • consumer_secret
    • access_token
    • access_token_secret
  • type: This selects which Twitter stream to read and can be one of the three types allowed by the Twitter API, such as sample (used in this recipe) or filter.
  • raw (default false): If this parameter is true, the tweets are indexed in ElasticSearch without any change.
  • ignore_retweet (default false): If this parameter is true, retweets are skipped.
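
Because the four oauth values are secrets, we may prefer not to keep them in a file that is shared or versioned. As a sketch of one possible approach, the following shell function prints the same configuration used in this recipe, taking the keys from environment variables. The TWITTER_* variable names and the function name are assumptions made for this example, and the values are assumed not to contain double quotes or backslashes:

```shell
#!/bin/sh
# Print a river configuration on standard output, reading the four OAuth
# values from environment variables (hypothetical names) instead of
# hard-coding them in the file.
write_river_config() {
    cat <<EOF
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "${TWITTER_CONSUMER_KEY}",
            "consumer_secret" : "${TWITTER_CONSUMER_SECRET}",
            "access_token" : "${TWITTER_ACCESS_TOKEN}",
            "access_token_secret" : "${TWITTER_ACCESS_TOKEN_SECRET}"
        },
        "type" : "sample",
        "ignore_retweet" : true
    },
    "index" : {
        "index" : "my_twitter_river",
        "type" : "status",
        "bulk_size" : 100
    }
}
EOF
}

# Usage:
#   write_river_config > config.json
```

The generated file can then be passed to curl with -d @config.json exactly as in the steps above.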

There's more…

To control the Twitter flow, we need to define an additional filter object.

Defining a filter automatically switches the type to filter. The Twitter filter API allows us to define the following additional parameters:

  • tracks: This is the list of keywords to track.
  • follow: This is the list of IDs of the Twitter users to follow.
  • locations: This is a set of bounding boxes to track.

These are the filtering capabilities that Twitter provides to reduce the number of tweets sent to you and to focus the stream on particular targets.

A filter river config file will look as follows:

{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        },
        "filter" : {
            "tracks" : ["elasticsearch", "cookbook", "packtpub"]
        }
    }
}
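
A filter river is registered in the same way as the sample one. The following is a minimal sketch, assuming that the preceding configuration is saved as filter_config.json and that the river is named twitterfilterriver; both names, as well as the ES_URL variable and the helper functions, are hypothetical:

```shell
#!/bin/sh
# Assumed address of the local node, as used earlier in this recipe.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the URL of the _meta document that defines a river.
river_meta_url() {
    printf '%s/_river/%s/_meta\n' "$ES_URL" "$1"
}

# Register a river: $1 is the river name, $2 the JSON configuration file.
create_river() {
    curl -s -XPUT "$(river_meta_url "$1")" -d @"$2"
}

# Usage (requires a running node with the plugin installed):
#   create_river twitterfilterriver filter_config.json
```

Using a different river name for each configuration keeps the sample and filter rivers separate, so either one can be removed without affecting the other.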

See also
