Working with the Solr HDFS connector

Apache Solr can use HDFS to index and store its indices on a Hadoop cluster; it does not rely on a MapReduce-based framework for indexing. The following diagram shows the interaction pattern between Solr and HDFS. You can read more details about Apache Hadoop at http://hadoop.apache.org/docs/r2.4.0/.

(Diagram: interaction between Solr and HDFS)

Let's understand how this can be done.

  1. To start with, the first and most important task is to get Apache Hadoop set up on your machine (a single-node, pseudo-distributed configuration) or to set up a Hadoop cluster. You can download the latest Hadoop tarball or ZIP from http://hadoop.apache.org. The newer generation of Hadoop uses the next-generation MapReduce framework, also known as YARN.
  2. Based on the requirement, you can set up a single node (Documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/SingleCluster.html) or a cluster (Documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/ClusterSetup.html).
  3. Typically, you will need to set up the Hadoop environment and modify several configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, the masters and slaves files, and others); a minimal single-node configuration sketch is provided after this list. Once everything is configured, start (or restart) the Hadoop cluster; the commands for doing so are also sketched after this list.
  4. Once Hadoop is set up, verify the installation by accessing http://<host>:<port>/cluster (the YARN ResourceManager web UI, which listens on port 8088 by default). You will see the following Hadoop cluster status:
    (Screenshot: Hadoop cluster status page)
  5. Now, using the HDFS command, create folders in HDFS to keep your Solr index and Solr logs:
    $ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr
    $ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr-logs
    

    This will create the folders under the root directory, that is, /, on HDFS. You can verify them by running:

    $ $HADOOP_HOME/bin/hdfs dfs -ls /
    
    Found 2 items
    drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:29 /Solr
    drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:27 /Solr-logs
    

    You can also browse the folder structure through the NameNode web UI by accessing http://<host>:50070/.

  6. Once the folders are created, the next step is to point Apache Solr at Hadoop HDFS. This can be done by passing JVM system properties that select the HDFS DirectoryFactory. If you are running Solr on Jetty, you can use the following command:
    java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://<host>:19000/Solr -Dsolr.updatelog=hdfs://<host>:19000/Solr-logs -jar start.jar
    

    You can validate that Solr is running on HDFS by accessing the Solr Admin UI, as shown in the following screenshot:

    (Screenshot: Solr Admin UI with the index stored on HDFS)
  7. If you are using Apache SolrCloud, you can simply point solr.hdfs.home to your HDFS folder and leave the data and update log directories at their defaults; Solr will then create them under solr.hdfs.home. (The equivalent solrconfig.xml settings are sketched after this list.)
    java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=hdfs://<host>:19000/solrhdfs -jar start.jar
    
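For reference, the following is a minimal single-node (pseudo-distributed) Hadoop configuration sketch for step 3. The property names are standard Hadoop 2.x settings, but the host and the port (19000, chosen here only to match the HDFS URLs used above) are placeholders that you must adapt to your own environment.

    <!-- core-site.xml: tells Hadoop clients where the NameNode runs -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://<host>:19000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single node can hold only one replica per block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>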
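Once the configuration files are in place, starting and verifying the daemons (steps 3 and 4) typically looks like the following sketch. Note that formatting the NameNode is required only before the very first start and erases any existing HDFS data.

    # Format the NameNode once, before the very first start
    $ $HADOOP_HOME/bin/hdfs namenode -format

    # Start HDFS (NameNode, DataNode) and YARN (ResourceManager, NodeManager)
    $ $HADOOP_HOME/sbin/start-dfs.sh
    $ $HADOOP_HOME/sbin/start-yarn.sh

    # List the running Java daemons to confirm everything is up
    $ jps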
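Instead of passing all of the settings as JVM arguments (steps 6 and 7), the same properties can be configured in solrconfig.xml. The following is a minimal sketch based on Solr's HdfsDirectoryFactory; the HDFS URL and the Hadoop configuration directory are placeholder values that you should replace with your own.

    <!-- In solrconfig.xml: store the index on HDFS -->
    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <!-- Root HDFS folder under which Solr creates its index directories -->
      <str name="solr.hdfs.home">hdfs://<host>:19000/solrhdfs</str>
      <!-- Optional: location of the Hadoop client configuration files -->
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    </directoryFactory>

    <!-- In the <indexConfig> section: use the HDFS-aware lock implementation -->
    <lockType>${solr.lock.type:hdfs}</lockType>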