Apache Solr can use HDFS to store its indexes on the Hadoop file system. It does not use a MapReduce-based framework for indexing; it writes index files to HDFS directly. The following diagram shows the interaction pattern between Solr and HDFS. You can read more about Apache Hadoop at http://hadoop.apache.org/docs/r2.4.0/.
Let's understand how this can be done.
First, set up your Hadoop cluster by editing its configuration files (yarn-site.xml, hdfs-site.xml, masters, slaves, and others). Once it is set up, restart the Hadoop cluster. Open http://host:port/cluster in a browser; you will see the Hadoop cluster status.

Now create folders on HDFS to hold the Solr index and the transaction logs:

$ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /Solr-logs
These commands create the folders under the root folder, that is, /, on HDFS. You can verify them by running:
$ $HADOOP_HOME/bin/hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:29 /Solr
drwxr-xr-x   - hrishi supergroup          0 2014-05-11 11:27 /Solr-logs
You can also browse the folder structure by accessing http://<host>:50070/.
Now start Solr with the HDFS directory factory, pointing the index data and the update log at the folders you just created:

java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lock.type=hdfs \
     -Dsolr.data.dir=hdfs://<host>:19000/Solr \
     -Dsolr.updatelog=hdfs://<host>:19000/Solr-logs \
     -jar start.jar
You can validate that Solr is running on HDFS by accessing the Solr admin UI, as shown in the following screenshot:
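Instead of passing system properties on the command line every time, the same setup can be made permanent in solrconfig.xml. A minimal sketch is shown below; the host, port, and path are placeholders matching the example above, and the lockType element belongs inside the indexConfig section:

```xml
<!-- Store the index on HDFS instead of the local file system -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://host:19000/Solr</str>
</directoryFactory>

<!-- Use the HDFS-aware lock implementation (inside <indexConfig>) -->
<lockType>${solr.lock.type:hdfs}</lockType>
```

With this in place, Solr starts on HDFS without any extra -D flags, and the system properties can still override the configured values when needed.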
Alternatively, you can set a single property, solr.hdfs.home, to your HDFS folder and leave the data and update log directories at their defaults; Solr then places both under that HDFS home directory:

java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lock.type=hdfs \
     -Dsolr.hdfs.home=hdfs://<host>:19000/solrhdfs \
     -jar start.jar
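Because every index read now goes over HDFS, it is usually worth enabling the HdfsDirectoryFactory block cache, which caches HDFS blocks in off-heap memory. These are standard HdfsDirectoryFactory properties; the values shown are illustrative starting points, not tuned recommendations:

```shell
# Enable the off-heap block cache to offset HDFS read latency
-Dsolr.hdfs.blockcache.enabled=true
-Dsolr.hdfs.blockcache.direct.memory.allocation=true
# Number of memory slabs to allocate (each slab is roughly 128 MB by default)
-Dsolr.hdfs.blockcache.slab.count=1
```

These flags can be appended to either of the java commands above; remember to size the JVM's direct memory (-XX:MaxDirectMemorySize) to accommodate the slabs.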