Handling time-sliced data using aliases

There are situations in which time-sliced data is the only logical solution. For example, if you are indexing logs to your SolrCloud cluster, you will probably want to divide the data into time slices, depending on how much data you have. If you only index error-level logs, you can probably live with monthly collections; if you are indexing all logs from all your applications, daily collections will probably be the way to go. With time-sliced collections, there are a few things the application needs to handle, for example, knowing which collection it should currently send data to and which collection or collections should be used for querying. To simplify this, Solr allows you to use aliases, and this recipe will show you how to handle that.

Getting ready

We assume that we already have our configuration stored in ZooKeeper and we have created a SolrCloud cluster. If you don't know how to do this, refer to the Creating a new SolrCloud cluster recipe in Chapter 7, In the Cloud.

How to do it...

Let's assume that we want to create daily indices, because we use our SolrCloud cluster to store logs coming from different applications in our environment. We also assume that we only want to search in day or week intervals:

  1. We will start by creating an initial collection that will hold our data. To do this, we run a command similar to the following one:
    curl 'localhost:8983/solr/admin/collections?action=CREATE&name=logs_2014-11-10&numShards=1&replicationFactor=1&collection.configName=logs'
    
  2. Now to simplify indexing, we will create an alias called logs_index so that our indexing application always uses the same collection name. We do this by running the following command:
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_index&collections=logs_2014-11-10'
    
  3. We also said that we want to simplify querying so that our UI doesn't need to worry about collections and their names. We need to create two aliases—we do this by running the following commands:
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_search_day&collections=logs_2014-11-10'
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_search_week&collections=logs_2014-11-10'
    
  4. Now, let's create a new daily collection using the following command:
    curl 'localhost:8983/solr/admin/collections?action=CREATE&name=logs_2014-11-11&numShards=1&replicationFactor=1&collection.configName=logs'
    
  5. Once this has been done and the day has ended, we need to run commands to alter our aliases. First, we alter the logs_index alias to point to the new, empty collection. We do this by running the following command:
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_index&collections=logs_2014-11-11'
    
  6. Now we need to update aliases used for searching. We can do this by running the following commands:
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_search_day&collections=logs_2014-11-11'
    curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=logs_search_week&collections=logs_2014-11-10,logs_2014-11-11'
    

Unfortunately, creating the new collections and switching the aliases are the only things we will have to automate ourselves; the rest will be handled by Solr. Now let's see how all this works.
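As an illustration of that automation, the daily rollover from steps 4 to 6 can be sketched as a small script. This is only an outline, not part of the recipe itself: the logs_YYYY-MM-DD naming and the localhost URL follow the examples above, while the seven-day window and the DRY_RUN switch (on by default, so the script only prints the curl commands instead of sending them) are our own assumptions; GNU date is required for the -d option:

```shell
#!/bin/sh
# Daily rollover sketch for the recipe's aliases. With DRY_RUN set (the
# default here), the curl commands are only printed, not executed.
SOLR_URL="${SOLR_URL:-localhost:8983/solr}"
DRY_RUN="${DRY_RUN:-1}"
TODAY="$(date +%F)"                          # e.g. 2014-11-11

run() {
  if [ -n "$DRY_RUN" ]; then echo "curl '$1'"; else curl "$1"; fi
}

# Comma-separated list of the last seven daily collections, oldest first
# (GNU date is assumed for the -d option).
week_collections() {
  sep=""
  for i in 6 5 4 3 2 1 0; do
    printf '%s%s' "$sep" "logs_$(date -d "-${i} day" +%F)"
    sep=","
  done
}

# Step 4: create today's collection.
run "${SOLR_URL}/admin/collections?action=CREATE&name=logs_${TODAY}&numShards=1&replicationFactor=1&collection.configName=logs"
# Step 5: repoint the indexing alias at the new, empty collection.
run "${SOLR_URL}/admin/collections?action=CREATEALIAS&name=logs_index&collections=logs_${TODAY}"
# Step 6: repoint the search aliases.
run "${SOLR_URL}/admin/collections?action=CREATEALIAS&name=logs_search_day&collections=logs_${TODAY}"
run "${SOLR_URL}/admin/collections?action=CREATEALIAS&name=logs_search_week&collections=$(week_collections)"
```

Running it once a day (for example, from cron just after midnight) keeps all three aliases in step with the newest collection.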

How it works...

We started by creating a new collection. The collection is called logs_2014-11-10 and it will be used to store logs from November 10, 2014. However, we don't want our application to know the logic behind the naming of the collections; the naming may change, and we don't want to be forced to change the application at the same time. Because of this, we created two aliases for searching and one for indexing. For indexing, we will always use a single alias pointing to only a single collection; in fact, Solr won't index data if an alias points to more than one collection. The logs_search_day alias will also point to a single collection, the most recent one, while the logs_search_week alias will cover the whole week (in our case, we start with a single collection, which is why it initially covers only one collection).

Alias creation is very simple. We send a request to the same REST API endpoint that we used during collection creation: /admin/collections. We specify the action=CREATEALIAS command and need to provide two things: the name of the alias we want to use (the name parameter) and the comma-separated list of collections that should be grouped under that alias (the collections parameter).
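The two parameters can be seen clearly if we wrap the call in a small helper. The create_alias function name is our own, purely illustrative addition, and it only prints the command here so the parameters can be inspected; the endpoint and parameters are exactly those described above:

```shell
#!/bin/sh
# Hypothetical helper around CREATEALIAS; it echoes the command instead of
# sending it, so the name and collections parameters stay visible.
create_alias() {
  alias_name="$1"     # the name parameter
  collections="$2"    # comma-separated list for the collections parameter
  echo "curl 'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=${alias_name}&collections=${collections}'"
}

create_alias logs_search_week "logs_2014-11-10,logs_2014-11-11"
```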

When a new day comes, after creating the new collection, we need to switch our aliases. We again run the same command that we used to create the logs_index alias, but instead of pointing it to the logs_2014-11-10 collection, we point it to the newest collection, logs_2014-11-11. Solr simply overwrites the old alias definition. A similar thing is done for the aliases used for searching: we point the logs_search_day alias to the newest collection, logs_2014-11-11, and we now point the logs_search_week alias to two collections (we will point it to three collections on the next day, and so on).

The only thing we need to worry about is automating the creation of new collections and the switching of the aliases, because Solr doesn't do that for us.

There's more...

There is one more thing I would like to describe when it comes to handling aliases.

Deleting an alias

In addition to creating aliases, Solr also allows you to delete them. For example, if we wanted to delete an alias called logs_search_day, we would run the following command:

curl 'localhost:8983/solr/admin/collections?action=DELETEALIAS&name=logs_search_day'

As you can see, the only things we need to provide are the action=DELETEALIAS parameter and the name of the alias we want to delete, passed in the name request parameter.
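Alias deletion fits naturally into the cleanup side of the time-slicing scheme: once a daily collection has fallen out of the week window and no alias references it, the collection itself can be deleted with the Collections API DELETE action (which addresses the same endpoint as the CREATE action from the recipe). The sketch below assumes an eight-day cutoff of our own choosing, uses GNU date for the -d option, and only prints the curl commands unless DRY_RUN is emptied:

```shell
#!/bin/sh
# Retirement sketch: remove an obsolete alias and delete a daily collection
# that has aged out of the week window. DRY_RUN (default on) only prints
# the curl commands.
SOLR_URL="${SOLR_URL:-localhost:8983/solr}"
DRY_RUN="${DRY_RUN:-1}"
run() { if [ -n "$DRY_RUN" ]; then echo "curl '$1'"; else curl "$1"; fi; }

# The eight-day cutoff is an assumption; GNU date is required for -d.
old_day="$(date -d '-8 day' +%F)"

run "${SOLR_URL}/admin/collections?action=DELETEALIAS&name=logs_search_day"
run "${SOLR_URL}/admin/collections?action=DELETE&name=logs_${old_day}"
```

The deletion must run only after the logs_search_week alias has been repointed away from the old collection, otherwise queries through the alias would fail.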
