Chapter 11. Deployment

Now that you have identified the information you want to make searchable, built the Solr schema to support your expected queries, and made the tweaks to the configuration you need, you're ready to deploy your Solr-based search platform into production. While the process of deployment may seem simple after all the effort you've gone through in development, it brings its own set of challenges. In this chapter, we'll look at the issues that come up when going from "Solr runs on my desktop" to "Solr is ready for the enterprise".

We'll cover the following topics in this chapter:

  • Implementation methodology
  • Installing Solr into a Servlet container
  • Configuring logging
  • A SearchHandler per search interface
  • Solr cores, and the new admin features
  • Setting up ZooKeeper for SolrCloud
  • Monitoring Solr
  • Securing Solr

Deployment methodology for Solr

Before you can develop a smooth deployment strategy for Solr, there are a number of questions you need to ask yourself. Ideally, the deployment process should be fully scripted and integrated into the existing Configuration Management (CM) process of your application.

Note

Configuration Management is the task of tracking and controlling changes to software. CM aims to make the changes that occur as software evolves visible, which reduces the mistakes those changes cause.

Questions to ask

The questions you'll want to answer, working in conjunction with your operations team, include:

  • How similar is my deployment environment to my development and test environments? If one Solr instance was enough to meet my load requirements in test, can I project that it will also handle the load expected in production, given similar physical hardware?
  • Do I need multiple Solr servers to meet the projected load or for failover? If so, look back at Chapter 10, Scaling Solr.
  • Do I have an existing build tool such as Ant/MSBuild/Capistrano with which to integrate the deployment process? Even better, does my organization use a deployment tool such as Puppet or Chef that I can leverage?
  • How will I import the initial data into Solr? Is this a one-time-only process that might take hours or days to perform and needs to be scheduled ahead of time? Is there a nightly process in the application that will perform this step? Can I trigger the load process from the deploy script?
  • Have I changed the source code required to build Solr to meet my own needs? Do I need to version it in my own source control repository? Can I package my modifications to Solr as discrete components instead of changing the source of Solr and rebuilding?
  • Do I have full access to the data in production, or do I have to coordinate with an operations team who are responsible for controlling access to production? If operations is performing the indexing tasks, are the steps required properly documented and automated?
  • Have I defined acceptance tests for ensuring Solr is returning the appropriate results for a specific search before moving to production?
  • What are the defined performance targets that Solr needs to meet, such as requests per second, time to index data, and time to perform a query? Are these documented as a Service Level Agreement (SLA)?
  • Into what kind of servlet container (Tomcat, Jetty, or JBoss) will Solr be deployed? Does how I secure Solr change depending on the servlet container?
  • What is my monitoring strategy for making sure Solr is performing properly? This isn't just about monitoring Solr's response time or errors; it critically includes the user queries. The single best tool for improving your search relevance is to look at your user queries. A reasonable user query that returns zero results points directly to where you can improve your relevance.
  • Do I need to store index data directories separately from application code directories, for instance, on a separate hard drive? If I have small enough indexes to fit in RAM, can I use a memory-based filesystem? Can I use SSDs?
  • What is my backup strategy for my indexes, if any? If the indexes can be rebuilt quickly from another data source, then backups may not be needed. However, if the indexes are the "Gold Master", such as from crawling the Web for data that can't be re-crawled, or the time to rebuild an index is too great, then frequent backups are crucial.
  • Are any scripted administration tasks required, for example, optimizing indexes, removing old backups, deleting stale data, or rebuilding spell check dictionaries?
  • Am I better off with an externally hosted Solr capability? There are a number of companies that have launched SaaS offerings for Solr, from Acquia offering hosted Solr search specifically for Drupal websites to WebSolr providing a generic Solr hosting option.
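
Two of the questions above, acceptance tests for specific searches and watching for user queries that return zero results, lend themselves to a small script. The following is a minimal sketch in Python, not a definitive implementation. It assumes Solr's standard select handler with a JSON response, and that documents carry a unique key field named id. The host, the core name mycore, and all of the function names are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's select handler, asking for a JSON response.

    base_url and core are placeholders for your own deployment.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def missing_expected_ids(response, expected_ids):
    """Return the expected document ids that are absent from the results.

    `response` is the parsed JSON body of a select request. The unique
    key field is assumed to be called "id".
    """
    docs = response.get("response", {}).get("docs", [])
    returned = {doc.get("id") for doc in docs}
    return [doc_id for doc_id in expected_ids if doc_id not in returned]


def is_zero_result(response):
    """True when a query matched no documents at all."""
    return response.get("response", {}).get("numFound", 0) == 0


if __name__ == "__main__":
    # Hand-written sample shaped like a Solr JSON response, so the sketch
    # runs without a live server.
    sample = {
        "responseHeader": {"status": 0, "QTime": 3},
        "response": {
            "numFound": 2,
            "start": 0,
            "docs": [{"id": "doc1"}, {"id": "doc2"}],
        },
    }
    print(build_select_url("http://localhost:8983/solr", "mycore", "solr deployment"))
    print(missing_expected_ids(sample, ["doc1", "doc3"]))
    print(is_zero_result(sample))
```

In a deploy script, an acceptance check could fetch each URL, parse the JSON body, and fail the deployment if missing_expected_ids returns anything. The same is_zero_result check, run over logged user queries, gives a simple list of searches worth reviewing for relevance problems.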