Chapter 6. Discovering Hunk Integration Apps

Hunk can be used for more than analytics on data stored in Hadoop. In this chapter we will discover other options using special integration applications. These come from the https://splunkbase.splunk.com/ portal, which hosts hundreds of published applications. This chapter is devoted to the integration between Hunk and Mongo, a popular document-oriented NoSQL store.

What is Mongo?

Mongo is a popular NoSQL solution. There are many pros and cons to using Mongo. It's a great choice when you want simple and reasonably fast persistent document storage with a nice JavaScript interface for querying the stored data. We recommend starting with Mongo if you don't really need a strict SQL schema and your data volumes are measured in terabytes. Mongo is amazingly simple compared to the whole Hadoop ecosystem; it's probably the right option for starting to explore the denormalized NoSQL world.
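To give a feel for this simplicity, here is a minimal sketch using the Python pymongo driver; the recommendations database and clicks collection names are placeholders, and a local instance on the default port is assumed:

from pymongo import MongoClient

# Connect to a local Mongo instance on the default port
client = MongoClient('localhost', 27017)
db = client['recommendations']  # hypothetical database name

# Store a document: no schema has to be declared up front
db.clicks.insert_one({'shop_id': 173, 'service': 2, 'time': 1423080002})

# Query it back with a plain filter document
for doc in db.clicks.find({'shop_id': 173}):
    print(doc)

The same filter-document style is what the JavaScript shell uses, so the query syntax carries over directly.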

Installation

Mongo is already installed on the virtual machine and ready to use, so the Mongo installation itself is not described here. We use Mongo version 3.0.
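If you want to confirm which version the running instance is, a quick check from Python (assuming the server listens on the default localhost:27017):

from pymongo import MongoClient

# Ask the server for its build info; 'version' holds the release string
client = MongoClient('localhost', 27017)
print(client.server_info()['version'])  # expect something like '3.0.x'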

You will install the special Hunk app that integrates Mongo and Hunk.

Installing the Mongo app

Visit https://splunkbase.splunk.com/app/1810/#/documentation and download the app. You should use the VM browser to download it:

  1. Click on Splunk Apps.
  2. Click on Manage Apps.
  3. Choose Install app from file.
  4. Select the downloaded app and install it.
  5. If the installation succeeded, you should see the Mongo app among the other installed apps.

Mongo provider

The app creates a Mongo provider that is used to access Mongo data. Go to the Virtual Indexes tab to see the created local-mongodb provider.

Check the provider settings.

You can change the property named vix.mongodb.host if you want to connect to some other Mongo instance.
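If you do point vix.mongodb.host at another instance, it is worth verifying that the host is reachable before Hunk tries to use it. A minimal sketch with pymongo; the hostname mongo.example.com is a placeholder:

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Fail fast instead of hanging if the host is down or unreachable
client = MongoClient('mongo.example.com', 27017, serverSelectionTimeoutMS=3000)
try:
    client.admin.command('ping')  # cheap round trip to the server
    print('Mongo is reachable')
except ServerSelectionTimeoutError as err:
    print('Cannot reach Mongo:', err)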

Creating a virtual index

Now it's time to create virtual indexes based on Mongo collections. The app has a bug in the virtual index creation dialog, so you have to apply the following workaround:

  1. Choose the Hadoop provider.
  2. Change the input path in the Path to data in HDFS field.
  3. Switch back to the Mongo provider.
  4. Repeat these steps for each of the five Mongo collections in order to get five indexes in Hunk (a way to list the collections is sketched after these steps).
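It helps to know exactly which collections you are going to expose. A minimal pymongo sketch, assuming the hypothetical recommendations database:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['recommendations']  # hypothetical database name

# Each of these collections becomes one virtual index in Hunk
for name in db.list_collection_names():  # collection_names() on older pymongo
    print(name)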

Inputting data from the recommendation engine backend

There is a sample of data collected by the recommendation engine backend. When a user clicks on a recommendation, the event is recorded and sent to Mongo. That data is later used to self-tune the recommendation engine. The data is stored in daily collections: Mongo makes it easy to create a new collection per day, which helps to partition the data. The best approach is to think about data partitioning in advance.
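Here is a hedged sketch of how such a backend might route events into daily collections; the clicks_YYYYMMDD naming pattern and the field values are illustrative assumptions:

import time
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['recommendations']  # hypothetical database name

def record_click(event):
    # Write into a collection named after the current day, e.g. clicks_20150204;
    # Mongo creates the collection automatically on the first insert
    day = datetime.now(timezone.utc).strftime('%Y%m%d')
    db['clicks_' + day].insert_one(event)

record_click({'shop_id': 173, 'service': 2, 'time': int(time.time())})

Because each day is its own collection, dropping or archiving old data becomes a cheap per-collection operation rather than a scan over one huge collection.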

Data schemas

Let's explore the data schema. It describes the documents stored in MongoDB and extracted to Hadoop:

{
    "_timestamp": 1423080002,
    "block_id": 4,
    "cross_id": "896bba91c21c620b0902fbec05b3246bce21859c",
    "idvisitor": "783852c991fbefb8",
    "is_napoleon": 2,
    "original_id": null,
    "rec": 2291655,
    "service": 2,
    "shop_id": 173,
    "target_site_id": 0,
    "time": 1423080002,
    "type": 1
}

We are interested in these fields:

  • _timestamp: When the recommendation click event happened
  • shop_id: Where the recommendation has been displayed
  • service: Which service provided the recommendation
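
To pull just these fields back out, here is a minimal pymongo projection; the daily collection name clicks_20150204 matches the sample timestamp but is an assumption:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['recommendations']  # hypothetical database name

# Return only the fields of interest; _id comes back unless excluded
fields = {'_timestamp': 1, 'shop_id': 1, 'service': 1, '_id': 0}
for doc in db['clicks_20150204'].find({}, fields).limit(5):
    print(doc)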

Data mechanics

Let's see how the data is generated. A user comes to an e-commerce site and their browser gets a cookie named cross_id. The cookie gives us a chance to track the user's interaction with the site: which pages they visit and which items they click. A service shows recommendations to the user based on their site activity, and each recommendation has a unique ID, rec_id. The service stores the list of recommendations that were displayed to the user and captures a click event when the user clicks a promoted item from the recommendation set. As a result, we know exactly which recommendations (identified by the unique key rec_id) the user saw and clicked.
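Putting the pieces together, here is a sketch of the click-capture step. The field names follow the schema above (the unique recommendation ID is stored in the rec field), while the function name and collection are illustrative assumptions:

import time
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['recommendations']  # hypothetical database name

def capture_click(cross_id, rec_id, shop_id, service):
    # Record that the visitor identified by the cross_id cookie clicked
    # recommendation rec_id shown on shop_id by the given service
    db.clicks.insert_one({
        'cross_id': cross_id,
        'rec': rec_id,  # unique recommendation ID (rec_id in the prose)
        'shop_id': shop_id,
        'service': service,
        'time': int(time.time()),
    })

capture_click('896bba91c21c620b0902fbec05b3246bce21859c', 2291655, 173, 2)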
