Exploring the post tool

Solr provides a command-line tool, the post tool, for indexing different types of content into a Solr server.

To run this tool on Unix, use the following command:

bin/post -c gettingstarted example/exampledocs/books.json

On Windows, things are a bit trickier because bin/post is available only as a Unix shell script.

Instead, we use SimplePostTool, a standalone Java program packaged as post.jar in example/exampledocs. Navigate to example/exampledocs and issue this command to display its help:

java -jar post.jar -h

This prints the full usage documentation of the post tool, including the system properties it supports.

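Issue the following command from the Solr installation directory to run the post tool on Windows. Unlike bin/post, post.jar sends files as application/xml unless told otherwise, so we set the content type explicitly:

java -Dc=gettingstarted -Dtype=application/json -jar example/exampledocs/post.jar example/films/films.json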

This will index content from films.json to the server at localhost:8983.
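
Under the hood, the post tool sends an HTTP POST to the collection's update handler. The sketch below shows a roughly equivalent request made with curl. The function names solr_update_url and post_json_with_curl are hypothetical helpers introduced for illustration, and the commit=true parameter is an assumption that mirrors the tool's default of committing after posting.

```shell
# Build the update URL for a collection.
# Host and port fall back to the post tool's defaults (localhost, 8983).
solr_update_url() {
  collection="$1"
  host="${2:-localhost}"
  port="${3:-8983}"
  printf 'http://%s:%s/solr/%s/update\n' "$host" "$port" "$collection"
}

# Hypothetical helper: post a JSON file with curl instead of post.jar.
# Requires a running Solr server, so it is only defined here, not called.
post_json_with_curl() {
  collection="$1"
  file="$2"
  curl "$(solr_update_url "$collection")?commit=true" \
    -H 'Content-Type: application/json' \
    --data-binary "@$file"
}
```

With a server running, post_json_with_curl gettingstarted example/films/films.json would send the same file as the command above.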

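To index all files with the .xml extension, issue the following command from the Solr installation directory. Note that options are passed as Java system properties in the form -Dname=value, and they must appear before -jar:

java -Dc=gettingstarted -jar example/exampledocs/post.jar *.xml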

Let's say you want to delete the document with ID 23 from the gettingstarted collection/core. Pass the delete command as an argument and set -Ddata=args so that the tool treats the argument as data rather than as a file name:

java -Dc=gettingstarted -Ddata=args -jar example/exampledocs/post.jar "<delete><id>23</id></delete>"
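
When the ID comes from a script rather than being typed by hand, it helps to build the delete message in one place. The following is a minimal sketch; delete_by_id_xml is a hypothetical helper name, and the escaping covers only the characters &, <, and >.

```shell
# Hypothetical helper: build a Solr XML delete-by-ID message.
# Escapes &, <, and > so that the ID cannot break the XML structure.
delete_by_id_xml() {
  escaped=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
  printf '<delete><id>%s</id></delete>\n' "$escaped"
}

# Example (needs a running Solr server, so it is left as a comment):
# java -Dc=gettingstarted -Ddata=args -jar example/exampledocs/post.jar "$(delete_by_id_xml 23)"
```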

Similarly, we can index .json and .csv files. Because the tool sends files as application/xml by default, set -Dtype to the matching content type:

java -Dc=gettingstarted -Dtype=application/json -jar example/exampledocs/post.jar *.json
java -Dc=gettingstarted -Dtype=text/csv -jar example/exampledocs/post.jar *.csv

As you can see, indexing CSV, XML, and JSON documents differs only in the content type.
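
The post tool sends files as application/xml unless a -Dtype value is given, so each format needs its own content type. The sketch below picks the content type from the file extension and prints the resulting command without running it. Both function names are hypothetical, and the mapping covers only the three formats discussed here; it is an illustration, not the tool's own detection logic.

```shell
# Hypothetical helper: map a file name to the content type for -Dtype.
# Returns a non-zero status for extensions this sketch does not cover.
content_type_for() {
  case "$1" in
    *.xml)  echo 'application/xml' ;;
    *.json) echo 'application/json' ;;
    *.csv)  echo 'text/csv' ;;
    *)      return 1 ;;
  esac
}

# Print (but do not run) the post.jar command for a collection and file.
post_command_for() {
  ctype=$(content_type_for "$2") || return 1
  printf 'java -Dc=%s -Dtype=%s -jar example/exampledocs/post.jar %s\n' "$1" "$ctype" "$2"
}
```

Printing the command first makes it easy to check the flags before anything is sent to the server.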

Now let's learn how to index rich documents. To index a Word document, enable auto mode with -Dauto=yes so that the tool detects the content type from the file extension:

java -Dc=gettingstarted -Dauto=yes -jar example/exampledocs/post.jar sample.doc

To index all the .pdf and .doc files in a folder named samplefolder, restrict the file types with -Dfiletypes:

java -Dc=gettingstarted -Dauto=yes -Dfiletypes=doc,pdf -jar example/exampledocs/post.jar samplefolder/
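
Before posting a whole folder, it can be useful to preview which files a doc,pdf filter would pick up. The sketch below lists them with find. The helper name list_doc_and_pdf_files is hypothetical, and restricting the search to the top level is an assumption based on the tool's documented default of not recursing into subfolders.

```shell
# Hypothetical helper: list the .doc and .pdf files directly inside a folder.
# -maxdepth 1 skips subfolders, on the assumption that recursion is off.
list_doc_and_pdf_files() {
  dir="$1"
  find "$dir" -maxdepth 1 -type f \( -name '*.doc' -o -name '*.pdf' \) | sort
}

# Example: list_doc_and_pdf_files samplefolder
```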

Now that we have learned how to use the post tool for indexing, let's see another technique to do the same, known as index handlers.
