Solr provides a command-line tool, bin/post, for indexing different types of content into a Solr server.
To run this tool on Unix, use the following command:
bin/post -c gettingstarted example/exampledocs/books.json
On Windows, it gets a bit trickier, as bin/post is available only as a Unix shell script.
There, we need to use SimplePostTool, a standalone Java program packaged as post.jar in example/exampledocs. Navigate to example/exampledocs and issue this command:
java -jar post.jar -h
This prints the full usage documentation of the post tool, including its options and a few example invocations.
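Issue the following command to run the post tool on Windows; the -Dauto option lets the tool pick the content type from the file extension:
java -Dc=gettingstarted -Dauto -jar example/exampledocs/post.jar example/films/films.json
This indexes the content of films.json into the gettingstarted collection on the server at localhost:8983.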
To index all the XML files in the current directory, issue the following command from the SOLR_HOME directory. Note that Java system properties such as -Dc must be written as -Dname=value and placed before -jar:
java -Dc=gettingstarted -jar example/exampledocs/post.jar *.xml
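Let's say you want to delete a document with ID 23 from the gettingstarted collection/core. Set -Ddata=args so that the tool treats its argument as the message to send rather than as a file name, and issue the following command (double quotes are used because the Windows command prompt does not treat single quotes as quoting characters):
java -Dc=gettingstarted -Ddata=args -jar example/exampledocs/post.jar "<delete><id>23</id></delete>"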
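When the ID comes from a script rather than being typed by hand, the delete message can be generated instead. The following is a minimal sketch, not something shipped with Solr: the helper names xml_escape and delete_by_id_xml are hypothetical, and it assumes a POSIX shell with sed (for example Git Bash or Cygwin on Windows). It only builds the XML; nothing is sent to the server.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) that build a delete-by-ID message.

# Escape the characters that are special inside XML text content.
# The ampersand is handled first so later replacements are not escaped twice.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print the <delete> message for a single document ID.
delete_by_id_xml() {
  printf '<delete><id>%s</id></delete>\n' "$(xml_escape "$1")"
}
```

One way to use it is to pipe the result into the tool with -Ddata=stdin, which sidesteps command-line quoting: delete_by_id_xml 23 | java -Dc=gettingstarted -Ddata=stdin -jar example/exampledocs/post.jar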
Similarly, we can index .json and .csv files by passing the matching content type with -Dtype, as shown here:
java -Dc=gettingstarted -Dtype=application/json -jar example/exampledocs/post.jar *.json
java -Dc=gettingstarted -Dtype=text/csv -jar example/exampledocs/post.jar *.csv
As you can see, there is not much difference in indexing CSV, XML, and JSON documents.
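Because the per-format invocations differ only in the content type, they are easy to generate. Here is a minimal sketch, assuming a POSIX shell; content_type_for and post_command are hypothetical helper names, and the command is printed rather than executed so that it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) for building a SimplePostTool command.

# Map a file name to the content type passed via -Dtype.
# Returns a non-zero status for extensions this sketch does not cover.
content_type_for() {
  case "$1" in
    *.xml)  echo "application/xml" ;;
    *.json) echo "application/json" ;;
    *.csv)  echo "text/csv" ;;
    *)      return 1 ;;
  esac
}

# Print (do not run) the command that would post one file to a collection.
post_command() {
  collection=$1
  file=$2
  ctype=$(content_type_for "$file") || {
    echo "unsupported file: $file" >&2
    return 1
  }
  echo "java -Dc=$collection -Dtype=$ctype -jar example/exampledocs/post.jar $file"
}
```

For example, post_command gettingstarted books.csv prints the CSV variant of the command; the path to post.jar is assumed to be relative to the directory the command is run from.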
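Now let's learn how to index rich documents. Let's say we want to index a Word document. Rich documents need the -Dauto option, which makes the tool guess the content type from the file extension and send the file to the extracting handler, so we issue the following command:
java -Dc=gettingstarted -Dauto -jar example/exampledocs/post.jar sample.doc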
To index only the .pdf and .doc documents in a folder named samplefolder, we issue the following command:
java -Dc=gettingstarted -Dauto -Dfiletypes=doc,pdf -jar example/exampledocs/post.jar samplefolder/
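Before posting a whole folder, it can help to preview which files a given extension list would select. This is a rough sketch built on find, not the post tool's own matching logic; list_matching is a hypothetical helper, it assumes a find that supports -maxdepth (GNU and BSD find do), and it looks only at the top level of the folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): list the files in a folder whose
# extension appears in a comma-separated list such as "doc,pdf".
# Only the top level of the folder is inspected; output is sorted.
list_matching() {
  folder=$1
  types=$2
  for ext in $(printf '%s' "$types" | tr ',' ' '); do
    find "$folder" -maxdepth 1 -type f -name "*.$ext"
  done | sort
}
```

Running list_matching samplefolder doc,pdf prints one matching path per line, which gives a quick check of the folder contents before the real indexing run.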
Now that we have learned how to use the post tool for indexing, let's see another technique to do the same, known as index handlers.