Chapter 4. Build a Simple Blog with Search Capability using Elasticsearch

In this chapter, we will create a simple blog that can create and delete posts. Then we will work on adding some features to our blog such as the following:

  • Implement a very simple blog with CRUD and admin features
  • Install and work with Elasticsearch and Logstash
  • Try out the PHP client of Elasticsearch
  • Learn to build a tool for working with Elasticsearch
  • Build a cache for searches to our database
  • Build a chart based on our Elasticsearch information

Creating the CRUD and admin system

First, let's build the SQL for our posts table. The database table should contain, at the very least, the post title, post content, post date, and the modified and published dates.

This is what the SQL should look like:

CREATE TABLE posts(
id INT(11) PRIMARY KEY AUTO_INCREMENT,
post_title TEXT,
post_content TEXT,
post_date DATETIME,
modified DATETIME,
published DATETIME
);

Now let's create a function to read the data. A typical blog site also has comments and some SEO-related metadata for each post, but we won't build that part in this chapter. It should be fairly trivial to add a table for comment data and another table for SEO metadata about each post, as sketched below.
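For instance, a hedged sketch of such a comments table, run from PHP, could look like the following. The column names here are illustrative, not part of the chapter's schema:

<?php
$db = new mysqli(); //host, user, password, database

//a hypothetical comments table; each comment points back to its post
$db->query("CREATE TABLE comments (
    id INT(11) PRIMARY KEY AUTO_INCREMENT,
    post_id INT(11),
    comment_body TEXT,
    comment_date DATETIME
)");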

Let's start by creating the admin system. We need to log in, so we'll have to create a simple login-logout script:

//admin.php
<form action="admin.php" method="post">
Username: <input type="text" name="username"><br />
Password: <input type="password" name="password"><br />
<input type="submit" name="submit">
</form>
<?php
session_start();
$db = new mysqli(); //host, user, password, database

function checkPassword($db, $username, $password) {
    //clean up username for sanitization
    $username = $db->real_escape_string($username);

    $result = $db->query("SELECT password FROM users WHERE username = '" . $username . "'");
    if ($result && $result->num_rows > 0) {
        $row = $result->fetch_assoc();
        //password_hash() salts its hashes, so we compare with
        //password_verify() instead of hashing the input again
        return password_verify($password, $row['password']);
    }
    return false;
}

if (isset($_POST['password']) && isset($_POST['username'])) {
    if (checkPassword($db, $_POST['username'], $_POST['password'])) {
        $_SESSION['admin'] = true;
        $_SESSION['logged_in'] = true;
        $_SESSION['expires'] = 3600; //1 hour
        $_SESSION['signin_time'] = time(); //unix time
        header('Location: admin_crud_posts.php');
    } else {
        //log the user out
        header('Location: logout.php');
    }
}

When you log in to admin.php, you set the sessions and are then redirected to the CRUD page.
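Since we store an expiry window in the session, a protected page such as admin_crud_posts.php might begin with a guard like the following. This is a minimal sketch; the redirect target is an assumption:

<?php
//top of admin_crud_posts.php
session_start();

//deny access if the user never logged in or the session is older than its expiry window
if (empty($_SESSION['logged_in']) ||
    (time() - $_SESSION['signin_time']) > $_SESSION['expires']) {
    header('Location: admin.php');
    exit;
}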

The script for the admin CRUD page is as follows:

<?php
$db = new mysqli(); //host, user, password, database

function delete($db, $post_id) {
    $sql_query = "DELETE FROM posts WHERE id = '" . $post_id . "'";
    $db->query($sql_query);
}

function update($db, $postTitle, $postContent, $postAuthor, $postId) {
    $sql_query = "UPDATE posts
       SET post_title = '" . $postTitle . "',
       post_content = '" . $postContent . "',
       post_author = '" . $postAuthor . "'
       WHERE id = '" . $postId . "'";
    $db->query($sql_query);
}

function create($db, $postTitle, $postContent, $postAuthor) {
    $insert_query = "INSERT INTO posts (post_title, post_content, post_author)
       VALUES ('" . $postTitle . "',
       '" . $postContent . "',
       '" . $postAuthor . "')";
    $db->query($insert_query);
}

$query = "SELECT * FROM posts";
$result = $db->query($query);

//display
?>
<table>
<tr>
<td>Title</td>
<td>Content</td>
<td>Author</td>
<td>Administer</td>
</tr>
<?php
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
    $id = $row['id'];
    echo '<tr>';

    echo '<td>' . $row['post_title'] . '</td>';

    echo '<td>' . $row['post_content'] . '</td>';

    echo '<td>' . $row['post_author'] . '</td>';

    echo '<td><a href="edit.php?postid=' . $id . '">Edit</a> ';
    echo '<a href="delete.php?postid=' . $id . '">Delete</a></td>';
    echo '</tr>';
}
echo "</table>";

?>

In the preceding script, we simply defined some functions that handle the CRUD operations for us. To display the data, we loop through the result set and output it in a table.

The edit and delete pages, which provide the user interface and the functions for editing and deleting posts, are as follows:

edit.php:

<?php
$db = new mysqli(); //host, user, password, database

function redirect($home) {
    header('Location: ' . $home);
}

if (!empty($_POST)) {
    $query = "UPDATE posts SET post_title = '" . $_POST['title'] . "', post_content = '" . $_POST['content'] . "' WHERE id = '" . $_POST['id'] . "'";
    $db->query($query);
    redirect('index.php');
} else {
    $id = $_GET['postid'];
    $result = $db->query("SELECT * FROM posts WHERE id = '" . $id . "'");
    $row = $result->fetch_assoc();
?>
<form action="edit.php" method="post">

<input name="title" type="text" value="<?php echo $row['post_title'] ?>">

<input name="content" type="text" value="<?php echo $row['post_content'] ?>">

<input name="id" type="hidden" value="<?php echo $id ?>">

<input type="submit" value="Save">
</form>
<?php
}
?>

Let's create the actual functionality for deleting a post. This is what delete.php will look like:

<?php
$db = new mysqli(); //host, user, password, database

function redirect($home) {
    header('Location: ' . $home);
}

if (isset($_GET['postid'])) {
    $query = "DELETE FROM posts WHERE id = '" . $_GET['postid'] . "'";
    $db->query($query);
    redirect('index.php');
}

Our PHP logger, Monolog, will add the posts to Elasticsearch through the Logstash plugin for Elasticsearch.

We'll set up a Logstash plugin, which first checks if the document exists and, if not, then inserts it.

To update Elasticsearch, we'll need to perform an upsert, which will update the same record if it exists, and if it does not exist, it will create a new one.
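With the Elasticsearch PHP client that we set up later in this chapter, an upsert might look like the following sketch; the index, type, and field names are placeholders:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [
        //apply this partial document if the record already exists...
        'doc' => ['post_title' => 'updated title'],
        //...or index it as a brand new document if it does not
        'doc_as_upsert' => true
    ]
];
$response = $client->update($params);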

Also, we can hide a deleted post from our CRUD listing without actually deleting it from the database, since we'll still need it for retrieval purposes.

For every action that needs to be done, we simply use the post ID passed through $_GET to determine what to do when a link is clicked.

Like any blog, we need a front page for the user to display the posts that are available to read:

index.php:

<html>
<body>
<?php
$db = new mysqli(); //host, user, password, database
$res = $db->query("SELECT * FROM posts LIMIT 10");
while ($post = $res->fetch_assoc()) {
?>
<h1><?php echo $post['post_title'] ?></h1>
<p><?php echo $post['post_content'] ?></p>
<?php
}
?>
</body>
</html>

In the preceding code, we weave in and out of PHP mode so that we can focus on the page layout. The file reads like a template: we can see the general outline of the HTML markup without getting too deep into the details of the PHP code.

Seeding the post table

Without any data, our blog is useless. Therefore, for demonstration purposes, we'll just use a seeder script to automatically populate our table with data.

Let's use a popular library for generating fake content, Faker, which is available at https://github.com/fzaninotto/Faker.

With Faker, all you have to do is install it using composer (composer require fzaninotto/faker) and then include the generated vendor/autoload.php file.

The complete script for generating fake content is as follows:

<?php
require "vendor/autoload.php";

$db = new mysqli(); //host, user, password, database
$faker = Faker\Factory::create();

for ($i = 0; $i < 10; $i++) {
    $post = $faker->paragraph(3, true);
    $title = $faker->text(150);
    $query = "INSERT INTO posts (post_title, post_content, post_date)
       VALUES ('" . $title . "', '" . $post . "', '" . date('Y-m-d H:i:s') . "')";
    $db->query($query);
}

?>

Now let's move on to getting acquainted with Elasticsearch, the database search engine for our blog posts.

What is Elasticsearch?

Elasticsearch is a search server. It's a full-text search engine that comes with an HTTP web interface and schema-free JSON documents. What this means is that we store new searchable data by using JSON. The API to enter these documents uses the HTTP protocol. In this chapter, we will learn how to use PHP and build a rich search engine that can do the following:

  • Set up the Elasticsearch PHP client
  • Add search data to Elasticsearch for indexing
  • Learn how to use keywords for relevance
  • Cache our search results
  • Use Elasticsearch with Logstash to store Apache logs
  • Parse XML for storage into Elasticsearch

Installing Elasticsearch and the PHP client

We'll first install Elasticsearch itself and then create the web interface that consumes it. Elasticsearch simply needs to be downloaded as the latest release for your platform.

The installation instructions are as follows:

  1. Go to https://www.elastic.co/ and download the source file that's related to your computer system, whether it's a Mac OSX, a Linux, or a Windows machine.
  2. After downloading the file to your computer, follow the setup instructions that come with it.
  3. For example, for Mac OSX and Linux operating systems, you can do the following:
  • Install Java 1.8.
  • Download Elasticsearch through curl (in the command line):
curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.1.0/elasticsearch-2.1.0.tar.gz
  • Extract the archive and change directory into it:
tar -zxvf elasticsearch-2.1.0.tar.gz
cd /path/to/elasticsearch/archive
  • Start it up:
cd bin
./elasticsearch

An alternative way to install Elasticsearch for Mac OSX is using homebrew, which is available at http://brew.sh/ . Then, install it by using brew with the following command:

brew install elasticsearch
  4. For Windows operating systems, you just need to click through the wizard installation program, as shown in the following screenshot:

    [Screenshot: the Elasticsearch installation wizard on Windows]

  5. Once that is installed, you also need to install the Logstash agent. The Logstash agent is in charge of sending data to Elasticsearch from various input sources.
  6. You can download it from the Elasticsearch website and follow the installation instructions for your computer system.
  7. For Linux, you can download a tar file, or alternatively use the package manager, which is either apt-get or yum, depending on your flavor of Linux.

You can test Elasticsearch by installing Postman and doing a GET request to http://localhost:9200:

  1. Install Postman by opening Google Chrome and visiting https://www.getpostman.com/. You can install it on Chrome by going to add-ons and searching for Postman.
  2. Once Postman is installed, you can register or skip registration:

    [Screenshot: the Postman registration screen]

  3. Now try doing a GET request to http://localhost:9200:

    [Screenshot: a GET request to http://localhost:9200 in Postman]

  4. The next step is to try out the PHP client library for Elasticsearch. Following is how to do that:
  5. First, include Elasticsearch in your composer.json file:
          { 
          "require":{ 
          "elasticsearch/elasticsearch":"~2.0" 
          } 
          } 
    
  6. Get composer:
          curl -s http://getcomposer.org/installer | php
          php composer.phar install --no-dev
    
  7. Instantiate a new client by including it in your project:
          require 'vendor/autoload.php';

          $client = Elasticsearch\ClientBuilder::create()->build();
    

Now let's try indexing a document. To do so, let's create a PHP file to use the PHP client as follows:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => ['testField' => 'abc']
];

$response = $client->index($params);
print_r($response);

We can also retrieve that document by creating a script with the following code:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id'
];

$response = $client->get($params);
print_r($response);

If we're performing a search, the code is as follows:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];

$response = $client->search($params);
print_r($response);
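Deleting an indexed document follows the same pattern, using the same placeholder names as above:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id'
];

$response = $client->delete($params);
print_r($response);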

In a nutshell, the Elasticsearch PHP client makes it easy to insert, search for, get, and delete documents in Elasticsearch.

Building a PHP Elasticsearch tool

The aforementioned functionality can be used to create a PHP-backed user interface to insert, query, and search for documents using the Elasticsearch PHP client.

Here is a simple form built with Bootstrap (an HTML/CSS framework):

<div class="col-md-6">
<div class="panel panel-info">
<div class="panel-heading">Create Document for indexing</div>
<div class="panel-body">
<form method="post" action="new_document" role="form">
<div class="form-group">
<label class="control-label" for="newTitle">Title</label>
<input type="text" class="form-control" id="newTitle" name="title" placeholder="Title">
</div>
<div class="form-group">
<label class="control-label" for="postContent">Post Content</label>
<textarea class="form-control" rows="5" id="postContent" name="content"></textarea>
<p class="help-block">Add some Content</p>
</div>
<div class="form-group">
<label class="control-label">Keywords/Tags</label>
<div class="col-sm-10">
<input type="text" class="form-control" placeholder="keywords, tags, more keywords" name="tags">
</div>
<p class="help-block">You know, #tags</p>
</div>
<button type="submit" class="btn btn-default">Create New Document</button>
</form>
</div>
</div>
</div>

This is what the form should look like:

[Screenshot: the Create Document for indexing form]

When the user submits the form, we'll need to catch the content, keywords, or tags that the user has input. The following PHP script enters the input into MySQL; we'll then push the same data to Elasticsearch:

//assumes a tags column has been added to the posts table
function insertData($db, $data) {
    $sql = "INSERT INTO posts (post_title, tags, post_content)
       VALUES ('" . $data['title'] . "','" . $data['tags'] . "','" . $data['content'] . "')";
    $db->query($sql);
}

insertData($db, $_POST);

Now let's try to post this document to Elasticsearch as well:

$params = [
    'index' => 'my_posts',
    'type' => 'posts',
    //omit a hardcoded 'id' so Elasticsearch generates a unique one per post
    'body' => [
        'title' => $_POST['title'],
        'tags' => $_POST['tags'],
        'content' => $_POST['content']
    ]
];

$response = $client->index($params);
print_r($response);

Adding documents to our Elasticsearch

Elasticsearch uses indexes to store each data point into its database. From our MySQL database, we need to post the data into Elasticsearch.

Let's discuss how indexing in Elasticsearch actually works. What makes it faster than a conventional MySQL search is that it searches an index instead of scanning the tables.

How does indexing work in Elasticsearch? It uses Apache Lucene to create something called an inverted index. An inverted index lets us look up search terms without scanning every single entry; it is basically a lookup table that lists every word ever entered into the system, together with the documents that contain it.

Here is an overview of the architecture of the ELK stack:

[Diagram: overview of the ELK stack architecture]

In the preceding diagram, we can see that the INPUT SOURCES, usually logs or some other data source, go into Logstash. From Logstash, the data then goes into Elasticsearch.

Once the data reaches Elasticsearch, it goes through some tokenizing and filtering. Tokenizing is the process of dissecting strings into different parts. Filtering is when some terms are sorted into separate indexes. For example, we may have an Apache log index, and then also have another input source, such as Redis, pushing into another searchable index.

The searchable index is the inverted index we mentioned previously. A searchable index is made searchable by storing each term along with a reference to its original content. This is similar to what is done in an indexed database, where we create primary keys and use them as the index to search entire records.

You can have many nodes performing this indexing in a cluster, all handled by the Elasticsearch engine. In the preceding diagram, the nodes are labeled N1 to N4.
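To make the inverted index idea concrete, here is a toy version in plain PHP. This is an illustration only; Lucene's actual data structures are far more sophisticated:

<?php
$documents = [
    1 => 'trying out Elasticsearch',
    2 => 'Elasticsearch is a search server',
];

//map every word to the list of document ids that contain it
$invertedIndex = [];
foreach ($documents as $id => $text) {
    foreach (explode(' ', strtolower($text)) as $word) {
        $invertedIndex[$word][] = $id;
    }
}

//a term lookup is now a single array access instead of a scan of every document
print_r($invertedIndex['elasticsearch']); //Array ( [0] => 1 [1] => 2 )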

Querying Elasticsearch

We now understand each part, so how do we query Elasticsearch? First, let's get introduced to it. When Elasticsearch is running, you can send HTTP requests to http://localhost:9200.

We can do this using the Elasticsearch web API, which allows us to use RESTful HTTP requests to the Elasticsearch server. This RESTful API is the only way to insert records into Elasticsearch.

Installing Logstash

Logstash is simply the central logging system where all the messages going to Elasticsearch will pass through.

To set up Logstash, follow the guide that's available on the Elasticsearch website:

https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html.

Elasticsearch and Logstash work together to get different types of indexed logs into Elasticsearch.

We need to create something called a transport or middleware between the two data points. To do so, we need to set up Logstash. It is known as the ingestion workhorse for Elasticsearch and much more. It is a data collection engine that pipelines data from data source to the destination, which is Elasticsearch. Logstash is basically like a simple data pipeline.

We will create a cronjob, which is basically a scheduled background task, that will take new entries from our posts table and put them into Elasticsearch.

Unix and Linux users who are familiar with the concept of a pipe, | , will be familiar with what a pipeline does.

Logstash simply transforms our raw log messages into a format called JSON.

Tip

JSON, also known as JavaScript Object Notation, is a popular format for transferring data between web services. It is lightweight, and many programming languages, including PHP, have a way to encode and decode JSON-formatted messages.
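In PHP, for example, encoding and decoding a message is one call each:

<?php
$message = array('level' => 'info', 'text' => 'Logging some infos to logstash.');

$json = json_encode($message); //'{"level":"info","text":"Logging some infos to logstash."}'
$decoded = json_decode($json, true); //back to a PHP array

echo $decoded['text'];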

Setting up the Logstash configuration

The input part of a Logstash configuration is concerned with reading and parsing log data correctly. It consists of the input data source and the parser to use. Here is a sample configuration where we will read from a redis input source:

input {
  redis {
    key => "phplogs"
    data_type => "list"
  }
}

But first, to be able to push to redis, we should install and use phpredis, an extension library that allows PHP to insert data into redis.

Installing PHP Redis

Installing PHP Redis should be simple. It's available in most package repositories for Linux platforms. You can read the documentation on how to install it at https://github.com/phpredis/phpredis .

Once you have it installed, you can test that your PHP Redis installation is working by creating the following script and running it:

<?php 
$redis = new Redis() or die("Cannot load Redis module."); 
$redis->connect('localhost'); 
$redis->set('random', rand(5000,6000)); 
echo $redis->get('random'); 

In the preceding example, we're able to start a new Redis connection and from there set a key called random to a number between 5000 and 6000. Finally, we echo out the data that we've just entered by calling echo $redis->get('random').

With that in place, let's create the real PHP code using the logging library for PHP, called Monolog, to store our logs in Redis.

Let's create a composer.json that the logging project will use.

In the terminal, let's initialize composer:

composer init 

It will interactively ask some questions after which it should create a composer.json file.

Now install Monolog, together with the Predis library that the Redis handler will use, by typing the following:

composer require monolog/monolog predis/predis

Let's set up the PHP code that will read from our MySQL database and then push it over to Elasticsearch:

<?php
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\RedisHandler;
use Monolog\Formatter\LogstashFormatter;
use Predis\Client;

$redisHandler = new RedisHandler(new Client(), 'phplogs');
$formatter = new LogstashFormatter('my_app');
$redisHandler->setFormatter($formatter);

// Create a Logger instance
$logger = new Logger('logstash_test', array($redisHandler));
$logger->info('Logging some infos to logstash.');

In the preceding code, we've created a RedisHandler that stores the logs under the key phplogs. We then set a LogstashFormatter instance that uses the application name my_app.

At the end of the script, we create a new logger instance, connect it to the redisHandler, and call the info() method of the logger to log the data.

Monolog separates the responsibilities of the formatter from the actual logging. The logger is responsible for creating the messages, and the Formatter formats the messages into the appropriate format so that Logstash can understand it. Logstash, in turn, pipes it to Elasticsearch, where the data about the log is indexed and is stored in the Elasticsearch index for querying later.

That's the wonderful thing about Elasticsearch. As long as you have Logstash, you can choose from different input sources for Logstash to process and Elasticsearch will do its job of saving the data when Logstash pushes to it.

Encoding and decoding JSON messages

Now that we know how to work with the Monolog library, we need to integrate it into our blog application. We'll do so by creating a cronjob that will check for new blog posts for that day and store them in Elasticsearch through the use of a PHP script.

First, let's create a folder called server_scripts where we put all our cronjobs:

$ mkdir ~/server_scripts 
$ cd ~/server_scripts 

Now, here is our code:

<?php
$db_name = 'test';
$db_pass = 'test123';
$db_username = 'testuser';
$host = 'localhost';
$dbconn = mysqli_connect($host, $db_username, $db_pass, $db_name);
$date_now = date('Y-m-d 00:00:00');
$date_now_end = date('Y-m-d 00:00:00', time() + 86400);
$res = $dbconn->query("SELECT * FROM posts WHERE post_date >= '" . $date_now . "' AND post_date < '" . $date_now_end . "'");

while ($row = $res->fetch_object()) {
  /* do redis queries here */

}
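Inside that loop, a minimal sketch of the Redis push could reuse the Monolog logger from the previous section; the context field names follow our posts table:

<?php
//inside the while loop above, for each new post of the day
$logger->info('new blog post', array(
    'postId' => $row->id,
    'title' => $row->post_title,
    'content' => $row->post_content,
));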

Using Logstash, we can read from our Redis data and let it do its work; Logstash will process the entries and output them to Elasticsearch with the following output plugin configuration:

output {
  elasticsearch_http {
    host => "localhost"
  }
}

Storing Apache logs in Elasticsearch

Monitoring logs is an important aspect of any web application. Most critical systems have what is known as a dashboard, and that is exactly what we will build in this segment with PHP.

As a bonus to this chapter, let's talk about another logging topic, server logs. Sometimes we want to be able to determine the performance of the server at a certain time.

Another thing you can do with Elasticsearch is to store Apache logs. For our application, we can add this so that we know about our users a little bit more.

This could be useful, for example, if we're interested in monitoring the browser a user is using and where users are coming from when they access our site.

To do so, we just have to set up some configuration using the Apache input plugin as follows:

input {
  file {
    path => "/var/log/apache/access.log"
    start_position => "beginning"
    ignore_older => 0
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}

output {
  elasticsearch {}
  stdout {}
}

A Kibana dashboard could be created once you install Kibana alongside Elasticsearch; however, it requires end users to already know how to build the various queries in that tool.

However, there is a need to make it simpler for upper management to view the data without having to know how to create Kibana dashboards.

To spare our end users from learning Kibana, we will simply query the log information when the dashboard page is requested. For the charting library, we will use a popular library known as Highcharts. To get the information, we will need to create a simple query that returns the data to us in JSON format.

To handle the Apache logs, we can use the PHP Elasticsearch client library. It's a simple client library that allows us to query Elasticsearch for the information we need, including the number of hits.

We will create a simple histogram for our website to show the number of accesses that are logged in our database.

For example, we'll use the PHP Elasticsearch SDK to query Elasticsearch and display the Elasticsearch results.

We also have to make the histogram dynamic. Basically, when the user wants to select between certain dates, we should be able to set up Highcharts to just get the data points and create a graph. If you haven't checked out Highcharts, please refer to http://www.highcharts.com/ .


Getting filtered data to display with Highcharts

Like any chart user, we sometimes require the ability to filter down whatever we see in our graph. Instead of relying on Highcharts to give us controls to filter down our data, we should be able to do the filtering by changing the data that Highcharts will render.

In the following Highcharts code, we add a container div for our page; first, we get the data from our Elasticsearch engine using JavaScript:

<script>

$(function () {
    client.search({
        index: 'apachelogs',
        type: 'logs',
        body: {
            query: {
                "range": {
                    "epoch_date": {
                        "gte": <?php echo mktime(0, 0, 0, date('n'), date('j'), date('Y')) ?>,
                        "lt": <?php echo mktime(0, 0, 0, date('n'), date('j') + 1, date('Y')) ?>
                    }
                }
            }
        }
    }).then(function (resp) {
        var hits = resp.hits.hits;
        var logDates = [];
        var logCounts = [];
        //collect the date and count from each hit's _source document
        //(assumes each indexed log document stores date and count fields)
        _.map(hits, function (hit) {
            logDates.push(hit._source.date);
            logCounts.push(hit._source.count);
        });

        $('#container').highcharts({
            chart: {
                type: 'bar'
            },
            title: {
                text: 'Apache Logs'
            },
            xAxis: {
                categories: logDates
            },
            yAxis: {
                title: {
                    text: 'Log Volume'
                },
                plotLines: [{
                    value: 0,
                    width: 1,
                    color: '#87D82F'
                }]
            },
            tooltip: {
                valueSuffix: ' logs'
            },
            plotOptions: {
                series: {
                    cursor: 'pointer',
                    marker: {
                        lineWidth: 1
                    }
                }
            },
            legend: {
                layout: 'vertical',
                align: 'right',
                verticalAlign: 'middle',
                borderWidth: 0
            },
            series: [{
                name: 'Volumes',
                data: logCounts
            }]
        });

    }, function (err) {
        console.trace(err.message);
        $('#container').html('We did not get any data');
    });

});
</script>

<div id="container" style="width:100%; height:400px;"></div>

This is done using JavaScript's filter function and then parsing that data into our Highcharts graph. You'll also need underscore for the filtering function, which will help sort out which data we want to present to the user.

Let's first build the form to filter our Highcharts histogram.

This is what the HTML code for the search filter in the CRUD view will look like:

<form>
<select name="date_start" id="dateStart">
<?php
    //a month ago to a week ago
    $aWeekAgo = date('Y-m-d H:i:s', strtotime('-7 days'));
    $aMonthAgo = date('Y-m-d H:i:s', strtotime('-30 days'));
?>
<option value="<?php echo $aMonthAgo; ?>"><?php echo $aMonthAgo; ?></option>
<option value="<?php echo $aWeekAgo; ?>"><?php echo $aWeekAgo; ?></option>
</select>
<select name="date_end" id="dateEnd">
<?php
    $currentDate = date('Y-m-d H:i:s');
    $nextWeek = date('Y-m-d H:i:s', strtotime('+7 days'));
    $nextMonth = date('Y-m-d H:i:s', strtotime('+30 days'));
?>
<option value="<?php echo $currentDate; ?>"><?php echo $currentDate; ?></option>
<option value="<?php echo $nextWeek; ?>"><?php echo $nextWeek; ?></option>
<option value="<?php echo $nextMonth; ?>"><?php echo $nextMonth; ?></option>
</select>
<button id="filter" name="Filter">Filter</button>
</form>

To enable quick re-rendering of our graph, we attach a click listener to the filter button and simply erase the contents of the div element that contains our Highcharts graph.

The following JavaScript code updates the chart using jQuery and underscore, reusing the same chart code as the first bar chart:

<script src="https://code.jquery.com/jquery-2.2.4.min.js" integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44=" crossorigin="anonymous"></script>

<script type="text/javascript">
$("button#filter").click(function () {
    //the option values are 'Y-m-d H:i:s' strings, so splitting on '-' yields year, month, day
    var dateStart = $('#dateStart').val().split("-");
    var dateEnd = $('#dateEnd').val().split("-");
    //JavaScript Date months are zero-based, so subtract 1
    var epochDateStart = Math.round(new Date(parseInt(dateStart[0]), parseInt(dateStart[1]) - 1, parseInt(dateStart[2])).getTime() / 1000);
    var epochDateEnd = Math.round(new Date(parseInt(dateEnd[0]), parseInt(dateEnd[1]) - 1, parseInt(dateEnd[2])).getTime() / 1000);

    client.search({
        index: 'apachelogs',
        type: 'logs',
        body: {
            query: {
                "range": {
                    "epoch_date": {
                        "gte": epochDateStart,
                        "lt": epochDateEnd
                    }
                }
            }
        }
    }).then(function (resp) {
        var hits = resp.hits.hits; //the hits per day from the Elasticsearch Apache logs
        var logDates = [];
        var logCounts = [];
        _.map(hits, function (hit) {
            logDates.push(hit._source.date);
            logCounts.push(hit._source.count);
        });

        $('#container').highcharts({
            chart: {
                type: 'bar'
            },
            title: {
                text: 'Apache Logs'
            },
            xAxis: {
                categories: logDates
            },
            yAxis: {
                title: {
                    text: 'Log Volume'
                }
            },
            series: [{
                name: 'Volumes',
                data: logCounts
            }]
        });
    });
});
</script>

In the preceding code, we've included the jQuery and underscore libraries. When the filter button is clicked, we read the selected dates from the form, then re-render the div containing the graph by simply flushing the HTML elements inside it and asking Highcharts to re-render the data.

To make this a little cooler, we can use a CSS animation effect so it looks like we're focusing a camera.

This can be done using the jQuery CSS transform techniques, and then resizing it back to normal and reloading a new graph:

$("button#filter").click( function() { 
   //..other code 
  $("#container").animate ({ 
width: [ "toggle", "swing" ], 
height: [ "toggle", "swing" ] 
}); 
    
}); 

Now we've learned how to filter JSON data using JavaScript. Take note that Array.prototype.filter is a relatively modern JavaScript method, introduced in ECMAScript 5. We've used this approach to create the dashboard that upper management needs to be able to generate reports for their own purposes.

We can use the underscore library, which has the filter function.


Let's create the Logstash configuration for Apache's logs to be grokked by Elasticsearch.

All we need to do is point the input Logstash configuration to our Apache logs location (usually a file in the /var/log/apache2 directory).

This is the basic Logstash configuration for Apache, which reads the Apache access log file at /var/log/apache2/access.log:

input {
  file {
    path => '/var/log/apache2/access.log'
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

It uses something called a grok filter that matches anything that resembles an Apache log format and matches the timestamp to the dd/MMM/yyyy:HH:mm:ss Z date format.

If you think of Elasticsearch as the end of the rainbow and Apache logs as the start of the rainbow, then Logstash is like the rainbow that transports the logs from both ends into a format that Elasticsearch can understand.

Grokking is the term used to describe reformatting a message into something that Elasticsearch can interpret. Logstash searches for a pattern, filters for matches of that pattern, and extracts the log's timestamp, message, and other attributes into JSON, which is what Elasticsearch then stores in its database.

Dashboard app for viewing Elasticsearch logs

Let's now create a dashboard for our blog that will allow us to see the data that we have in Elasticsearch, both posts and Apache logs. We'll use the PHP Elasticsearch SDK to query Elasticsearch and display the Elasticsearch results.

We'll just load the latest logs that are in Elasticsearch, and then, if we want to perform a search, we'll create a way to filter and specify what data to search in the logs.

This is what the search filter form will look like:

[Screenshot: the search filter form]

In search.php, we'll create a simple form for searching values in Elasticsearch:

<form action="search_elasticsearch.php" method="post"> 
<table> 
   <tr> 
<td>Select time or query search term 
<tr><td>Time or search</td> 
<td><select> 
    <option value="time">Time</option> 
     <option value="query">Query Term</option> 
<select> 
</td>  
</tr> 
<tr> 
<td>Time Start/End</td> 
  <td><input type="text" name="searchTimestart" placeholder="YYYY-MM-DD HH:MM:SS" > /  
  <input type="text" name="searchTimeEnd" placeholder="YYYY-MM-DD HH:MM:SS" > 
</td> 
</tr> 
<tr> 
<td>Search Term:</td><td><input name="searchTerm"></td> 
</tr> 
<tr><td colspan="2"> 
<input type="submit" name="search"> 
</td></tr> 
</table> 
</form> 

When the user clicks on Submit, we will then show the results to the user.

Our form should simply show us what records we have for that day for both the Apache logs and the blog posts.

This is how we query Elasticsearch for that information on the command line using curl:

$ curl 'http://localhost:9200/_search?q=post_date:>2016-11-15'

Now we'll get a JSON response from Elasticsearch:

{"took":403,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.01989093,"hits":[{"_index":"posts","_type":"post","_id":"1","_score":0.01989093,"_source":{ 
  body: { 
    "user" : "kimchy", 
    "post_date" : "2016-11-15T14:12:12", 
    "post_body" : "trying out Elasticsearch" 
  }  
}}]}} 

We can also use a REST client (a tool for querying RESTful APIs, available as a Firefox add-on) to query the database: just specify the GET method and the path, and set the q variable in the URL to the parameters you want to search for:

[Screenshot: querying Elasticsearch from a REST client]

Simple search engine with result caching

To install the PHP Redis, visit https://github.com/phpredis/phpredis .

Every time the user searches, we can save their recent searches in Redis and just present those results if they already exist. The implementation might look as follows:

<?php
$db = new mysqli(HOST, DB_USER, DB_PASSWORD, DB_NAME); //define the connection details

$redis = new Redis();
$redis->connect('localhost');

if(isset($_POST['search'])) {

    $hashSearchTerm = md5($_POST['search']);
    //get from redis and check if the key exists;
    //if it does, return the cached search result
    $rKeys = $redis->keys('*');

    if(in_array($hashSearchTerm, $rKeys)) {
        //result arrays are stored as JSON strings in redis
        $searchResults = json_decode($redis->get($hashSearchTerm), true);
        echo "<ul>";
        foreach($searchResults as $result) {
            echo "<li>
       <a href='readpost.php?id=" . $result['postId'] . "'>" . $result['postTitle'] . "</a>
       </li>";
        }
        echo "</ul>";
    } else {
        $query = "SELECT * from posts WHERE post_title LIKE '%" . $_POST['search'] . "%' OR post_content LIKE '%" . $_POST['search'] . "%'";

        $result = $db->query($query);
        if($result->num_rows > 0) {
            $queryResults = [];
            echo "<ul>";
            while($row = $result->fetch_array(MYSQLI_BOTH)) {
                $queryResults[] = [
                    'postId' => $row['id'],
                    'postTitle' => $row['post_title'],
                ];

                echo "<li>
       <a href='readpost.php?id=" . $row['id'] . "'>" . $row['post_title'] . "</a>
       </li>";
            }
            echo "</ul>";

            //cache the results for an hour
            $redis->setEx($hashSearchTerm, 3600, json_encode($queryResults));
        }
    }
} //end if $_POST
else {
    echo "No search term in input";
}
?>

Redis is a simple dictionary. It stores a key and the value of that key in its database. In the preceding code, we use it to store a reference to the user's search results so that next time the same search is performed, we can just pull what we have from the Redis data.

In the preceding code, we converted the search term into a hash so that it can be easily identified as the same query that came through and it can be stored easily as the key (which should be one string only, no spaces allowed). If after hashing we find the key in Redis, then we get it from Redis instead of fetching it from the database.

Redis can expire keys by saving the key using the $redis->setEx method, which allows us to store the key and expire it after X number of seconds. In this case, we're storing it for 3,600 seconds, which is equivalent to an hour.

Cache basics

The concept of a cache is to return already searched items to the user so that, when another user searches for the exact same thing, the application no longer needs to do a full fetch from the MySQL database.

The bad thing with having a cache is that you have to perform cache invalidation.

Cache invalidation of Redis data

Cache invalidation is when you need to expire and delete the cache data. This is because your cache may no longer be real time after a while. Of course, after invalidation, you need to renew the data in the cache, which happens when there is a new request to the data. The cache invalidation process can take one of the following three methods:

  • Purge is when we remove content from the cache data right away.
  • Refresh just means get new data and overwrite the already existing data. This means that even though there is a match in the cache, we will refresh that match with the new information fresh from wherever it comes from.
  • Ban is basically adding previously cached content to a ban list. When another client fetches the same information and it is found on the ban list, the cached content gets updated.

We can run a cronjob continuously in the background that will update every cached result with new results for that search; a sketch of such a script follows the crontab entry below.

This is what the crontab entry for a background PHP script that runs every 15 minutes might look like:

0,15,30,45 * * * * php /path/to/phpfile 
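A minimal sketch of that refresh script follows; it assumes each cached search also stored its raw term under a hypothetical '<hash>:term' key, since an md5 hash alone cannot be reversed back into the search term:

<?php
$db = new mysqli(HOST, DB_USER, DB_PASSWORD, DB_NAME);
$redis = new Redis();
$redis->connect('localhost');

foreach ($redis->keys('*:term') as $termKey) {
    $term = $redis->get($termKey); //the raw search term
    $hash = md5($term);

    $result = $db->query("SELECT id, post_title FROM posts
       WHERE post_title LIKE '%" . $db->real_escape_string($term) . "%'");

    $queryResults = [];
    while ($row = $result->fetch_assoc()) {
        $queryResults[] = ['postId' => $row['id'], 'postTitle' => $row['post_title']];
    }

    //overwrite the cached entry with fresh results for another hour
    $redis->setEx($hash, 3600, json_encode($queryResults));
}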

To get Logstash to put data in Redis, we just need to do the following:

# shipper from apache logs to redis data 
output { 
redis { host => "127.0.0.1" data_type => "channel" key => "logstash-%{@type}-%{+yyyy.MM.dd.HH}" } 
} 

This is how the PHP script that deletes data from the cache would work:

<?php
$redis = new Redis();
$redis->connect('localhost');

function getPreviousSearches($redis) {
    //an array of previously searched searchDates
    return json_decode($redis->get('searches'), true) ?: array();
}

$prevSearches = getPreviousSearches($redis);

$prevResults = json_decode($redis->get('prev_results'), true) ?: array();

if (isset($_POST['search'])) {
    $searchDate = $_POST['search'];

    if (in_array($searchDate, $prevSearches) && isset($prevResults[$searchDate])) {
        //expire the stale cache entry for that search date...
        $redis->expire('logstash-' . $searchDate, 0);
        //...but still hand the previous results back to the user
        return $prevResults[$searchDate];
    } else {
        $values = $redis->get('logstash-' . $searchDate);
        $prevResults[$searchDate] = $values;
        $redis->set('prev_results', json_encode($prevResults));
    }
}

In the preceding script, we basically check whether the search date was searched for earlier, and if we have it, we set its cache entry to expire.

If it also appears in the prevResults array, we give that to the user; otherwise, we do a new redis->get command to fetch the results for that search date and cache them.

Using browser localStorage as cache

Another option for cache storage is to save it in the client browser itself. The technology is known as localStorage.

We can use it as a simple cache for the user and store the search results, and if the user wants to search for the same thing, we just check the localStorage cache.

Tip

localStorage can only store around 5 MB of data. But that is quite a lot, considering that a regular text file is just a few kilobytes.

We can make use of the elasticsearch.js client instead of the PHP client to make requests to our Elasticsearch. The browser-compatible version can be downloaded from https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/browser-builds.html .

We can also use Bower to install the elasticsearch.js client:

bower install elasticsearch 

For our purpose, we can take advantage of the jQuery Build by creating a client using jQuery:

var client = new $.es.Client({ 
hosts: 'localhost:9200' 
}); 

We should now be able to use JavaScript to populate the localStorage.

Since we are just querying and displaying on the client side, it's a perfect match!

Take note that we might not be able to log the data that was searched for by using a client-side script. However, we could save the search query history as a model containing the keys of the items that were searched for.

The basic JavaScript searchQuery object would look like the following:

var searchQuery = {
    search: {
        queryItems: [{
            'title': 'someName',
            'author': 'Joe',
            'tags': 'some tags'
        }]
    }
};

We can test whether the client works by running the following JavaScript file:

client.ping({ 
requestTimeout: 30000, 
 
  // undocumented params are appended to the query string 
hello: "elasticsearch" 
}, function (error) { 
if (error) { 
console.error('elasticsearch cluster is down!'); 
  } else { 
console.log('All is well'); 
  } 
}); 

The results could be cached into localStorage by doing the following:

localStorage.setItem('results',JSON.stringify(results)); 

We'll populate the results with data we find from elasticsearch and then just check if the same query was done earlier.

We also need to keep the data fresh. Let's hypothesize that it takes about 15 minutes before a user gets bored and would refresh the page to try to see new information.

In the same manner, we check whether the search results have been displayed in the past:

var searches = JSON.parse(localStorage.getItem('searches'));
//assumes we stored {time: Date.now(), ...} when caching the search;
//refetch when there is no cached search or it is older than 15 minutes
if (!searches || searches.time < Date.now() - 15 * 60 * 1000) {
  //fetch again
  var searchParams = {
    index: 'logdates',
    body: {
      query: {
        match: {
          date: $('#search_date').val()
        }
      }
    }
  };
  client.search(searchParams);
} else {
  //output results from the previous search
  prevResults[$("#search_date").val()];
}

Now, whenever we expire the search criteria, say after about 15 minutes, we will simply clear the cache and put in the new search results that Elasticsearch finds.

Working with streams

Here, we will take advantage of PHP's Monolog library and then stream the data instead of pushing complete strings. The nice thing about working with streams is that they can easily pipe into Logstash and, in turn, store it into Elasticsearch as indexed data. Logstash also has features for creating data streams and streaming the data.

We can directly input our data without even using Logstash, using something that is known as streams. For more information on streams, refer to http://php.net/manual/en/book.stream.php .

Here, for example, is a way to push some data to a PHP script located at http://localhost/dev/streams/php_input.php:

curl -d "Hello World" -d "foo=bar&name=John" http://localhost/dev/streams/php_input.php 

In php_input, we can put the following code:

readfile('php://input');

We'll be getting Hello World&foo=bar&name=John, which means that PHP was able to get the very first string as a stream using the PHP input stream.

To play around with PHP streams, let's create a stream using PHP manually. PHP developers usually have some experience working with stream data already when working with output buffering.

The idea with output buffering is to collect the stream until it's complete and then show it to the user.

This is especially useful when the stream isn't finished yet and we need to wait for the ending character for the data to be completely transferred.

We can push streams into Elasticsearch! This can be done using the Logstash input plugin to handle streams. This is how PHP can output to a stream:

<?php
require 'vendor/autoload.php';
$client = Elasticsearch\ClientBuilder::create()->build();
ob_start();
$log['body'] = array('hello' => 'world', 'message' => 'some test');
$log['index'] = 'test';
$log['type'] = 'log';
echo json_encode($log);
//flush output of echo into $data
$data = ob_get_flush();
$newData = json_decode($data, true); //turn back into an array
$client->index($newData);

Storing and searching XML documents using PHP

We can also work with XML documents and insert them into Elasticsearch. To do so, we can transform the data into JSON and then push the JSON into Elasticsearch.

If you want to check that your XML converts correctly to JSON, try out the XML TO JSON Converter tool at http://codebeautify.org/xmltojson; from there, you can easily see how an XML document maps to JSON:

[Screenshot: the XML TO JSON Converter tool]
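In PHP itself, a minimal sketch of this conversion might use SimpleXML and json_encode before indexing; the index and type names here are assumptions:

<?php
require 'vendor/autoload.php';
$client = Elasticsearch\ClientBuilder::create()->build();

//parse the XML document and re-encode it as JSON
$xml = simplexml_load_string('<post><title>Hello</title><content>World</content></post>');
$json = json_encode($xml);

//index the decoded structure into Elasticsearch
$params = [
    'index' => 'xml_documents',
    'type' => 'post',
    'body' => json_decode($json, true)
];
$client->index($params);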

Using Elasticsearch to search a social network database

In this section, we'll simply use our knowledge to apply it to an existing social network built with PHP.

Let's pretend we have users who want to be able to search their social feed. Here's where we build a full-blown auto-dropdown search system.

Every time the user posts, we need to be able to store all the data in Elasticsearch.

However, in our search queries, we will match search results to the actual words that the user typed. If a post doesn't match the query character by character, we won't display it.

We first need to build the feed. The SQL schema will look as follows:

CREATE TABLE feed (
  id INT(11) PRIMARY KEY AUTO_INCREMENT,
  post_title TEXT,
  post_content TEXT,
  post_topics TEXT,
  post_time DATETIME,
  post_type VARCHAR(255),
  posted_by INT(11) DEFAULT '1'
);

The post_type column will handle the type of post: photo, video, link, or just plain text.

So, if the user added a type of picture, it would be saved as an image type. And when a person searches for a post, they can filter by the type.
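For example, a sketch of such a filtered query with the PHP client might look like this; the search word is hypothetical, and the index and field names follow the submit_status.php script below:

$params = [
    'index' => 'my_feed',
    'type' => 'posts',
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    ['match' => ['content' => 'beach']]
                ],
                //only return posts whose type is image
                'filter' => [
                    ['term' => ['contenttype' => 'image']]
                ]
            ]
        ]
    ]
];
$response = $client->search($params);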

Every time a user saves a new photo or a new post, we will also store the data in Elasticsearch. The SQL insert will look as follows:

INSERT INTO feed (`post_title`, `post_content`, `post_time`, `post_type`, `posted_by`) VALUES ('some title', 'some content', '2015-03-20 00:00:00', 'image', 1);

Now we need to make an input form for when the user inserts a new post. We'll build one that can upload a photo with a title, or just add text:

<h2>Post something</h2> 
 
<form type="post" action="submit_status.php" enctype="multipart/form-data"> 
Title:<input name="title" type="text" /> 
Details: <input name="content" type="text"> 
Select photo:  
<input type="file" name="fileToUpload" id="fileToUpload"> 
<input type="hidden" value="<?php echo $_SESSION['user_id'] ?>" name="user_id"> 
<input name="submit" type="submit"> 
 
</form> 
 

The submit_status.php script will have the following code to save into the database:

<?php
use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$db = new mysqli(HOST, DB_USER, DB_PASSWORD, DATABASE);

$client = ClientBuilder::create()->build();
if(isset($_POST['submit'])) {
    $contentType = (!empty($_FILES['fileToUpload'])) ? 'image' : 'text';

    $db->query("INSERT INTO feed (`post_title`, `post_content`, `post_time`, `post_type`, `posted_by`)
       VALUES ('" . $_POST['title'] . "','" . $_POST['content'] . "','" . date('Y-m-d H:i:s') . "','" . $contentType . "','" . $_POST['user_id'] . "')");

    //save into elasticsearch
    $params = [
        'index' => 'my_feed',
        'type' => 'posts',
        'body' => [
            'contenttype' => $contentType,
            'title' => $_POST['title'],
            'content' => $_POST['content'],
            'author' => $_POST['user_id']
        ]
    ];
    $client->index($params);
}

?>

Displaying randomized search engine results

The preceding feed database table is the table that everyone will post to. We need to enable randomly showing what's on the feed, so rather than storing a precomputed feed, we can fetch posts and randomize them at query time.

By searching from Elasticsearch and randomly rearranging the data, we can make our searches more fun. In a way, this makes sure that people using our social network will be able to see random posts in their feed.

To search from the posts, instead of doing a direct query to SQL, we will search the Elasticsearch database for the data.
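One simple, hedged approach is to let Elasticsearch return the matching posts from the my_feed index we populated earlier and randomize their order in PHP before rendering the feed:

$response = $client->search([
    'index' => 'my_feed',
    'type' => 'posts',
    'body' => [
        //an empty object is needed here so match_all encodes as {} in JSON
        'query' => ['match_all' => new \stdClass()]
    ]
]);

//shuffle the returned posts so the feed shows them in random order
$hits = $response['hits']['hits'];
shuffle($hits);
foreach ($hits as $hit) {
    echo '<h2>' . $hit['_source']['title'] . '</h2>';
    echo '<p>' . $hit['_source']['content'] . '</p>';
}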

First, let's figure out how to create the Elasticsearch indexes we will search against. With Elasticsearch running, we simply do the following to create an index for our posts:

$ curl -XPUT 'http://localhost:9200/posts/' -d '{
"settings":{
"number_of_shards":3,
"number_of_replicas":2
}
}'

We will probably also want to search our friends, and if we have a ton of friends, they won't all be on the feed. So, we just need another index to search called the friends index.

The following code, when run in the Linux command line, will allow us to create a new friends index:

$ curl -XPUT 'http://localhost:9200/friends/' -d '{
"settings":{
"number_of_shards":3,
"number_of_replicas":2
}
}'

So, we can now store data about our friends using the friends index:

$ curl -XPUT 'http://localhost:9200/friends/posts/1' -d '{
"user":"kimchy",
"post_date":"2016-06-15T14:12:12",
"message":"fred the friend"
}'

We'll usually look for friends of friends and we'll, of course, show that to our user if there are any friends with the search query.
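A short sketch of that lookup with the PHP client, against the friends index and the sample document we just stored:

$params = [
    'index' => 'friends',
    'type' => 'posts',
    'body' => [
        'query' => [
            'match' => ['message' => 'fred']
        ]
    ]
];
$response = $client->search($params);

//show any matching friends to the user
foreach ($response['hits']['hits'] as $hit) {
    echo $hit['_source']['user'] . ': ' . $hit['_source']['message'];
}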

Summary

In this chapter, we discussed how to create a blog system, experimented with Elasticsearch, and were able to do the following:

  • Create a simple blog application and store data in MySQL
  • Install Logstash and Elasticsearch
  • Practice working with Elasticsearch using curl
  • Get data into Elasticsearch using the PHP Client
  • Chart information (hits) from Elasticsearch using Highcharts
  • Use the elasticsearch.js client to query Elasticsearch for information
  • Use Redis and localStorage in the browser to work with caching