Chapter 2. Building Your First Data Pipeline with ELK

In the previous chapter, we got familiar with each component of the ELK Stack: Elasticsearch, Logstash, and Kibana. We installed and configured each of them. In this chapter, we will build our first basic data pipeline using the ELK Stack. This will show how easily the components fit together to form an end-to-end analytics pipeline.

To run the example in this chapter, we assume that you have already installed Elasticsearch, Logstash, and Kibana as described in Chapter 1, Introduction to ELK Stack.

Input dataset

For our example, we will use the daily Google (GOOG) stock quote dataset covering the six-month period from July 1, 2014 to December 31, 2014. It is a good dataset for seeing how quickly simple data like this can be analyzed with ELK.

Note

This dataset can be easily downloaded from the following source:

http://finance.yahoo.com/q/hp?s=GOOG

Data format for input dataset

The most significant fields of this dataset are Date, Open Price, High Price, Low Price, Close Price, Volume, and Adjusted Close Price.

The following table shows sample rows from the dataset. The actual dataset is in CSV format.

Date          Open    High    Low     Close   Volume     Adj Close
Dec 31, 2014  531.25  532.60  525.80  526.40  1,368,200  526.40
Dec 30, 2014  528.09  531.15  527.13  530.42    876,300  530.42
Dec 29, 2014  532.19  535.48  530.01  530.33  2,278,500  530.33
Dec 26, 2014  528.77  534.25  527.31  534.03  1,036,000  534.03
Dec 24, 2014  530.51  531.76  527.02  528.77    705,900  528.77
Dec 23, 2014  527.00  534.56  526.29  530.59  2,197,600  530.59
Dec 22, 2014  516.08  526.46  516.08  524.87  2,723,800  524.87
Dec 19, 2014  511.51  517.72  506.91  516.35  3,690,200  516.35
Dec 18, 2014  512.95  513.87  504.70  511.10  2,926,700  511.10
Dec 17, 2014  497.00  507.00  496.81  504.89  2,883,200  504.89
Dec 16, 2014  511.56  513.05  489.00  495.39  3,964,300  495.39
Dec 15, 2014  522.74  523.10  513.27  513.80  2,813,400  513.80
Dec 12, 2014  523.51  528.50  518.66  518.66  1,994,600  518.66
Dec 11, 2014  527.80  533.92  527.10  528.34  1,610,800  528.34
Dec 10, 2014  533.08  536.33  525.56  526.06  1,712,300  526.06

We need to put this data in a location where the ELK Stack can access it for further analysis.

We can look at the first few entries of the CSV file using the Unix head command as follows:

$ head GOOG.csv
2014-12-31,531.25244,532.60236,525.80237,526.4024,1368200,526.4024
2014-12-30,528.09241,531.1524,527.13239,530.42242,876300,530.42242
2014-12-29,532.19244,535.48242,530.01337,530.3324,2278500,530.3324
2014-12-26,528.7724,534.25244,527.31238,534.03247,1036000,534.03247
2014-12-24,530.51245,531.76141,527.0224,528.7724,705900,528.7724
2014-12-23,527.00238,534.56244,526.29236,530.59241,2197600,530.59241
2014-12-22,516.08234,526.4624,516.08234,524.87238,2723800,524.87238
2014-12-19,511.51233,517.72235,506.9133,516.35229,3690200,516.35229
2014-12-18,512.95233,513.87231,504.7023,511.10233,2926700,511.10233

Each row holds the quote data for one trading date, with the fields separated by commas in the order Date, Open, High, Low, Close, Volume, and Adj Close.
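To make the row layout concrete, the following minimal Python sketch parses two rows copied from the head output above into typed records. It is only an illustration of the data format, not part of the ELK pipeline itself. The snake_case field names and the parse_quotes helper are hypothetical names chosen for this example, and the sketch assumes the file has no header row, as in the output shown.

```python
import csv
import io
from datetime import date, datetime

# Column order follows the sample table; the snake_case names are
# illustrative and do not come from the dataset itself.
FIELDS = ["date", "open", "high", "low", "close", "volume", "adj_close"]

# Two rows copied verbatim from the head output shown earlier.
SAMPLE = """\
2014-12-31,531.25244,532.60236,525.80237,526.4024,1368200,526.4024
2014-12-30,528.09241,531.1524,527.13239,530.42242,876300,530.42242
"""


def parse_quotes(stream):
    """Yield one dict per CSV row, converting each field to a native type.

    Assumes there is no header row; a header line would fail the
    date conversion below.
    """
    for row in csv.DictReader(stream, fieldnames=FIELDS):
        yield {
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
            "open": float(row["open"]),
            "high": float(row["high"]),
            "low": float(row["low"]),
            "close": float(row["close"]),
            "volume": int(row["volume"]),
            "adj_close": float(row["adj_close"]),
        }


quotes = list(parse_quotes(io.StringIO(SAMPLE)))
for quote in quotes:
    print(quote["date"], quote["close"], quote["volume"])
```

To read the real file instead of the inline sample, one could pass an open file handle, for example open("GOOG.csv", newline=""), to parse_quotes. The same decisions (which column is the date, which columns are numeric) are the ones a parsing stage has to make before the data is indexed.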

Now that we are familiar with the data, we will set up the ELK Stack so that we can parse and process the data using Logstash, index it in Elasticsearch, and then build visualizations in Kibana.
