Setting up ZooKeeper for SolrCloud

ZooKeeper is the technology that keeps all the nodes in SolrCloud in sync, and in Chapter 10, Scaling Solr, we discussed how to leverage it. However, for convenience, we told Solr to run the ZooKeeper service internally (embedded, in-process) by passing the zkRun parameter to Solr on startup. While you could do that in production, you usually shouldn't because then you tie your ZooKeeper service to your Solr nodes. Imagine the scenario where you want to stop and restart Solr: running embedded ZooKeeper means that you also take down one of your ZooKeeper nodes when you stop a Solr node. ZooKeeper has the concept of a quorum of servers that all share the same ensemble configuration, and to have a valid quorum, more than half of the ZooKeeper processes must be functioning. If you have three Solr nodes running embedded ZooKeeper, and you restart two of the Solr nodes at the same time, you no longer have a quorum of 2 out of 3 servers. The remaining ZooKeeper server stops serving requests rather than risk split brain, a situation in which separate groups of servers diverge, and your SolrCloud cluster goes down. Since your Solr nodes are much more volatile in nature than your ZooKeeper nodes, you hamstring the reliability of your ZooKeeper service.
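The quorum arithmetic is easy to check by hand, but a short sketch makes the rule concrete. The function names below are illustrative only and are not part of ZooKeeper or Solr; the sketch assumes the plain majority rule described above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of live servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True when the live servers make up more than half of the ensemble."""
    return live_servers >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, required quorum, and tolerated failures.
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, a four-server ensemble tolerates no more failures than a three-server one, which is one reason ensembles are usually sized at three or five servers.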

Folks are often concerned when you spec out a set of servers for SolrCloud and mention that you need an additional three or five servers to run your ZooKeeper service on, beyond the servers hosting Solr. However, ZooKeeper, in the context of providing configuration management to a cluster of SolrCloud servers, is pretty lightly used and therefore doesn't generate much load. It has two tasks: store the configuration files for a Solr collection, including the locations of all the nodes making up each collection, and send messages to all the Solr nodes when the state changes for one of them, such as a node going up or down, or a new collection being defined. When a message about a state change is sent to Solr, each Solr node queries back to ZooKeeper about the state change and adjusts accordingly. That adjustment may be downloading a new synonyms.txt or a solrconfig.xml and restarting the core to load the new configuration.
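The notify-then-query-back pattern described above can be illustrated with a toy, in-memory model. This is only a sketch of the idea and not how Solr or ZooKeeper are implemented: the class names are made up, the path is a placeholder, and the one assumption carried over from ZooKeeper is that a watch fires once and announces only which path changed, so the client must read the new data and register again.

```python
class ToyConfigStore:
    """In-memory stand-in for a znode tree with one-shot watches (not real ZooKeeper)."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def get(self, path, watcher=None):
        # Reading can optionally leave a watch behind on the path.
        if watcher is not None:
            self._watchers.setdefault(path, []).append(watcher)
        return self._data.get(path)

    def set(self, path, value):
        self._data[path] = value
        # Watches fire once: remove the registered callbacks before notifying.
        pending = self._watchers.pop(path, [])
        for notify in pending:
            notify(path)  # the notification names the path; it carries no data


class ToySolrNode:
    """Hypothetical node that re-reads its configuration whenever it is notified."""

    def __init__(self, name, store, path):
        self.name = name
        self.store = store
        self.path = path
        self.reloads = 0
        self.config = store.get(path, watcher=self.on_change)

    def on_change(self, path):
        # Query back for the new state and re-register the watch.
        self.config = self.store.get(path, watcher=self.on_change)
        self.reloads += 1


if __name__ == "__main__":
    path = "/configs/mbtypes/synonyms.txt"  # placeholder path for illustration
    store = ToyConfigStore()
    store.set(path, "v1")
    nodes = [ToySolrNode(f"solr{i}", store, path) for i in range(3)]
    store.set(path, "v2")
    print([(n.name, n.config, n.reloads) for n in nodes])
```

The store does very little work per change, just a small write and one short message per watcher, which matches the point that ZooKeeper is lightly loaded in this role.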

Neither of these tasks requires extensive disk space, CPU, or memory, so it's very reasonable to run your ZooKeeper nodes on virtual machines. All the heavy work of indexing data and performing queries is done on the Solr nodes, so those are the servers that need to be sized appropriately.

Installing ZooKeeper

Installing ZooKeeper by hand is pretty straightforward, though it's a great thing to automate with tools like Puppet since you have to repeat the same basic steps multiple times!

Solr pretty much keeps up with the latest ZooKeeper; check the release notes for your specific Solr download for the matching version. Download the ZooKeeper package and unpack it to a reasonable directory like /opt/ZooKeeper. In the unpacked directory, there will be a sample configuration file at ./conf/zoo_sample.cfg; copy it to zoo.cfg in the same directory. Edit the file and set the two parameters that point to where the ZooKeeper data and transaction logs are stored. In very high performance situations, you might want dedicated disks for each, but in most cases with SolrCloud, you can have both sets of data stored on the same disk. Also, provide a list of all the servers that are in the ensemble of ZooKeeper servers:

dataDir=/var/lib/zookeeperdata
dataLogDir=/var/log/zookeeper
# servers in the ensemble
server.1=zk1.mycompany.com:2888:3888
server.2=zk2.mycompany.com:2888:3888
server.3=zk3.mycompany.com:2888:3888

There is one oddity in ZooKeeper: you provide a magic file called myid at the root of dataDir (/var/lib/zookeeperdata/myid) that identifies the server. It contains only the number from that server's server.N line in zoo.cfg. On zk1.mycompany.com, that would be the value 1, on zk2.mycompany.com, it would be 2, and so on. You can then start ZooKeeper by running the following command from the installation directory:

>> ./bin/zkServer.sh start zoo.cfg

Repeat these steps on each server, changing the myid file.
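Because the steps are identical on every server apart from the myid value, they lend themselves to a small script. The following is a minimal sketch, assuming the host names, directories, and ports used in the example above; the function names are hypothetical, and the rendered text covers only the lines discussed here, which you would merge into the zoo.cfg copied from the sample rather than use as a complete configuration.

```python
from pathlib import Path


def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeperdata",
                   data_log_dir="/var/log/zookeeper",
                   peer_port=2888, election_port=3888):
    """Return the dataDir, dataLogDir, and server.N lines for the ensemble."""
    lines = [
        f"dataDir={data_dir}",
        f"dataLogDir={data_log_dir}",
        "# servers in the ensemble",
    ]
    # Server IDs start at 1 and follow the order of the host list.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"


def myid_for(hosts, this_host):
    """Return the ID that belongs in the myid file on this_host."""
    return hosts.index(this_host) + 1


def write_myid(data_dir, server_id):
    """Write the myid file at the root of data_dir and return its path."""
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{server_id}\n")
    return path


if __name__ == "__main__":
    ensemble = ["zk1.mycompany.com", "zk2.mycompany.com", "zk3.mycompany.com"]
    print(render_zoo_cfg(ensemble))
    print("myid on zk2:", myid_for(ensemble, "zk2.mycompany.com"))
```

Keeping the host list in one place means the server.N lines and the myid values cannot drift apart, which is the mistake most likely to creep in when the steps are repeated by hand.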

Administering Data in ZooKeeper

There are two ways of interacting with ZooKeeper. One is the SolrCloud-centric script called zkcli.sh that comes with Solr. Assuming you have a local SolrCloud running using ./examples/10/start-musicbrainz-solrcloud.sh, navigate to ./examples/10/solrcloud-working-dir/example/scripts/cloud-scripts. To list out the contents of the data in ZooKeeper, run the list command:

>> ./zkcli.sh -zkhost localhost:9983 -cmd list

You can upload a single value or file using the put and putfile commands, or push up or pull down a complete set of configuration files using the upconfig and downconfig commands.
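If you script these operations, it helps to assemble the command line in one place. The sketch below only builds argument lists for zkcli.sh; the helper names and the ./mbtypes-conf directory are assumptions for illustration, while -zkhost, -cmd, -confdir, and -confname are the options the Solr script accepts for upconfig and downconfig.

```python
import subprocess

ZKCLI = "./zkcli.sh"  # assumes the current directory is cloud-scripts


def zkcli_args(zkhost, cmd, *extra):
    """Build the argument list for one zkcli.sh invocation."""
    return [ZKCLI, "-zkhost", zkhost, "-cmd", cmd, *extra]


def upconfig_args(zkhost, confdir, confname):
    """Arguments that push a local configuration directory up to ZooKeeper."""
    return zkcli_args(zkhost, "upconfig", "-confdir", confdir, "-confname", confname)


def downconfig_args(zkhost, confdir, confname):
    """Arguments that pull a named configuration down into a local directory."""
    return zkcli_args(zkhost, "downconfig", "-confdir", confdir, "-confname", confname)


def run(args):
    """Run the command and raise if it exits with a non-zero status."""
    return subprocess.run(args, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run(...) to execute.
    print(" ".join(zkcli_args("localhost:9983", "list")))
    print(" ".join(downconfig_args("localhost:9983", "./mbtypes-conf", "mbtypes")))
    print(" ".join(upconfig_args("localhost:9983", "./mbtypes-conf", "mbtypes")))
```

Pulling a configuration down, editing it locally, and pushing it back up with the same -confname is a common round trip.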

However, if you want to treat the ZooKeeper data as a simple Unix filesystem with commands such as ls or rm, then you need to use the command-line client that comes with ZooKeeper. To make things more confusing, it is called zkCli.sh, which differs from the Solr script zkcli.sh only in the case of one letter.

>> ./zookeeper-3.4.6/bin/zkCli.sh -server 127.0.0.1:9983

Notice that we pass in the IP address and port of the ZooKeeper ensemble we want to connect to. You can now list out all the config files that belong to the mbtypes collection:

>> ls /configs/mbtypes

To see all the commands available for interacting with the data in ZooKeeper, run help:

>> help