Working with the Solr HDFS connector

Apache Solr can use HDFS to index and store its indices on a Hadoop cluster; it does not rely on a MapReduce-based framework for indexing. The following diagram shows the interaction pattern between Solr and HDFS. You can read more details about Apache Hadoop at http://hadoop.apache.org/docs/r2.4.0/.

(Diagram: interaction between Solr and HDFS)

Let's understand how this can be done.

  1. To start with, the first and most important task is to get Apache Hadoop set up on your machine (a single-node, pseudo-distributed configuration) or to set up a Hadoop cluster. You can download the latest Hadoop tarball or ZIP from http://hadoop.apache.org. The newer generation of Hadoop uses the next-generation MapReduce framework, also known as YARN.
  2. Based on the requirement, you can set up a single node (Documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/SingleCluster.html) or a cluster (Documentation: http://hadoop.apache.org/docs/r<version>/hadoop-project-dist/hadoop-common/ClusterSetup.html).
  3. Typically, you will need to set up the Hadoop environment and modify several configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, the masters and slaves files, and others); a minimal single-node configuration sketch is provided after this list. Once everything is configured, start (or restart) the Hadoop cluster; the commands for doing so are also sketched after this list.
  4. Once Hadoop is set up, verify the installation by accessing http://<host>:<port>/cluster (the YARN ResourceManager web UI, which listens on port 8088 by default). You will see the following Hadoop cluster status:
    (Screenshot: Hadoop cluster status page)
  5. Now, using the HDFS command, create folders in HDFS to keep your Solr index and Solr logs:
    $ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr
    $ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr-logs
    

    This will create the folders under the root directory, that is, /, on HDFS. You can verify them by running:

    $ $HADOOP_HOME/bin/hdfs dfs -ls /
    
    Found 2 items
    drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:29 /Solr
    drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:27 /Solr-logs
    

    You can also browse the folder structure through the NameNode web UI by accessing http://<host>:50070/.

  6. Once the folders are created, the next step is to point Apache Solr at Hadoop HDFS. This can be done by passing JVM system properties that select the HDFS DirectoryFactory. If you are running Solr on Jetty, you can use the following command:
    java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://<host>:19000/Solr -Dsolr.updatelog=hdfs://<host>:19000/Solr-logs -jar start.jar
    

    You can validate that Solr is running on HDFS by accessing the Solr Admin UI, as shown in the following screenshot:

    (Screenshot: Solr Admin UI with the index stored on HDFS)
  7. If you are using Apache SolrCloud, you can simply point solr.hdfs.home to your HDFS folder and leave the data and update log directories at their defaults; Solr will then create them under solr.hdfs.home. (The equivalent solrconfig.xml settings are sketched after this list.)
    java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=hdfs://<host>:19000/solrhdfs -jar start.jar
    
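For reference, the following is a minimal single-node (pseudo-distributed) Hadoop configuration sketch for step 3. The property names are standard Hadoop 2.x settings, but the host and the port (19000, chosen here only to match the HDFS URLs used above) are placeholders that you must adapt to your own environment.

    <!-- core-site.xml: tells Hadoop clients where the NameNode runs -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://<host>:19000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single node can hold only one replica per block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>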
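Once the configuration files are in place, starting and verifying the daemons (steps 3 and 4) typically looks like the following sketch. Note that formatting the NameNode is required only before the very first start and erases any existing HDFS data.

    # Format the NameNode once, before the very first start
    $ $HADOOP_HOME/bin/hdfs namenode -format

    # Start HDFS (NameNode, DataNode) and YARN (ResourceManager, NodeManager)
    $ $HADOOP_HOME/sbin/start-dfs.sh
    $ $HADOOP_HOME/sbin/start-yarn.sh

    # List the running Java daemons to confirm everything is up
    $ jps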
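Instead of passing all of the settings as JVM arguments (steps 6 and 7), the same properties can be configured in solrconfig.xml. The following is a minimal sketch based on Solr's HdfsDirectoryFactory; the HDFS URL and the Hadoop configuration directory are placeholder values that you should replace with your own.

    <!-- In solrconfig.xml: store the index on HDFS -->
    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <!-- Root HDFS folder under which Solr creates its index directories -->
      <str name="solr.hdfs.home">hdfs://<host>:19000/solrhdfs</str>
      <!-- Optional: location of the Hadoop client configuration files -->
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    </directoryFactory>

    <!-- In the <indexConfig> section: use the HDFS-aware lock implementation -->
    <lockType>${solr.lock.type:hdfs}</lockType>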