In Solr, we can index remote or local files by enabling remote streaming in solrconfig.xml. Let's see how to use this feature, starting with the following steps to enable remote streaming.
Let's use our newly created languages-example core and modify its solrconfig.xml. We'll replace the requestDispatcher config in our solrconfig.xml file with the following lines:
<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048"
                  addHttpRequestToContext="false"/>
</requestDispatcher>
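Before reloading the core, it can help to confirm that the edited file really turns remote streaming on. The following Python sketch is an illustration, not part of Solr: it parses the requestDispatcher fragment with the standard library's xml.etree.ElementTree and reports whether any requestParsers element sets enableRemoteStreaming="true". The helper name remote_streaming_enabled is hypothetical.

```python
import xml.etree.ElementTree as ET

# The requestDispatcher fragment from above; in a real solrconfig.xml it sits inside <config>.
CONFIG = """<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048"
                  addHttpRequestToContext="false"/>
</requestDispatcher>"""


def remote_streaming_enabled(config_xml: str) -> bool:
    """Hypothetical helper: True if any <requestParsers> sets enableRemoteStreaming="true"."""
    root = ET.fromstring(config_xml)
    # iter() visits the root and all descendants, so this works for the fragment
    # above as well as for a full solrconfig.xml rooted at <config>.
    for parsers in root.iter("requestParsers"):
        # A missing attribute is treated as "false" in this sketch.
        if parsers.get("enableRemoteStreaming", "false").strip().lower() == "true":
            return True
    return False
```

For example, remote_streaming_enabled(CONFIG) returns True, while the same fragment with enableRemoteStreaming="false" returns False.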
The enableRemoteStreaming="true" property enables the remote streaming feature, which lets us index remote or local files. Let's go ahead and index a remote file in our Solr index:
$ curl "http://localhost:8983/solr/languages-example/update?commit=true" -F stream.url=https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/master/Chapter-9/files/multilang-remote.xml
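The same request can also be issued from code. This Python sketch uses only urllib from the standard library. Unlike the curl command, which sends stream.url as a multipart form field, it passes stream.url in the query string, which Solr should also accept as a content-stream parameter. The base URL and the helper names build_update_url and index_remote_file are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: a local Solr on the default port and the core used in this chapter.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "languages-example"
REMOTE_FILE = ("https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/"
               "master/Chapter-9/files/multilang-remote.xml")


def build_update_url(base: str, core: str, stream_url: str, commit: bool = True) -> str:
    """Build an /update URL that asks Solr to fetch stream_url itself."""
    params = {"stream.url": stream_url}
    if commit:
        params["commit"] = "true"
    # urlencode percent-escapes ':' and '/' in the nested URL so it stays one parameter value.
    return f"{base}/{core}/update?{urlencode(params)}"


def index_remote_file(stream_url: str = REMOTE_FILE, timeout: float = 30.0) -> str:
    """Send the update request to Solr and return the raw XML response body as text."""
    with urlopen(build_update_url(SOLR_BASE, CORE, stream_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With Solr running and remote streaming enabled, print(index_remote_file()) should print an XML response similar to the one Solr returns for the curl command.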
We'll get the following response from Solr:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">968</int>
  </lst>
</response>
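A status of 0 in the responseHeader means the request succeeded, and QTime is the time Solr spent processing it, in milliseconds. The following Python sketch, assuming the response body is available as bytes, extracts these values with xml.etree.ElementTree; parse_response_header and update_succeeded are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The response shown above, as raw bytes.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">968</int>
  </lst>
</response>"""


def parse_response_header(body: bytes) -> dict:
    """Return the integer entries of Solr's responseHeader, e.g. status and QTime."""
    root = ET.fromstring(body)
    header = root.find("lst[@name='responseHeader']")
    if header is None:
        raise ValueError("response has no responseHeader")
    return {item.get("name"): int(item.text) for item in header.findall("int")}


def update_succeeded(body: bytes) -> bool:
    # Solr reports status 0 for a successful request.
    return parse_response_header(body).get("status") == 0
```

Here parse_response_header(RESPONSE) gives {"status": 0, "QTime": 968}.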
We can see the indexed documents by running a query from the Query screen of the Solr Admin UI. We'll get this output:
{
  "responseHeader": { "status": 0, "QTime": 8 },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      { "id": "doc4", "language": "en", "content_en": " Hello, This is a remote file which is waiting to get indexed in Solr. " },
      { "id": "doc5", "language": "ru", "content_ru": " Привет , это удаленный файл, который ждет, чтобы получить индексируются в Solr . " },
      { "id": "doc6", "language": "fr", "content_fr": " Bonjour, ceci est un fichier distant qui est en attente pour obtenir indexé dans Solr. " }
    ]
  }
}
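Each document in the output stores its text in a field named after its language code (content_en, content_ru, content_fr). The following Python sketch, assuming the JSON body of a query response is available as a string, lists each document's id, language, and text by following that naming convention; docs_with_text is a hypothetical helper, not a Solr API.

```python
import json


def docs_with_text(response_json: str) -> list:
    """Return (id, language, text) for each document in a Solr JSON query response."""
    data = json.loads(response_json)
    rows = []
    for doc in data["response"]["docs"]:
        lang = doc.get("language", "")
        # Assumes the content_<language> field naming used by this example's documents.
        text = doc.get(f"content_{lang}", "").strip()
        rows.append((doc["id"], lang, text))
    return rows
```

Applied to the output above, it would return one tuple per document, such as ("doc4", "en", "Hello, This is a remote file which is waiting to get indexed in Solr.").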
As we can see from the preceding response, the original documents have been replaced, and we now see only the three documents from the remote file (doc4, doc5, and doc6) indexed in Solr.
We can also debug requests with DumpRequestHandler by adding the following request handler to solrconfig.xml:
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" />
For example, let's view the contents of the remote file that we just indexed into Solr. When we navigate to the following URL, the /debug/dump request handler returns an XML response that includes the contents of the stream:
http://localhost:8983/solr/languages-example/debug/dump?stream.url=https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/master/Chapter-9/files/multilang-remote.xml
The returned XML contains the data of the remote file.
The /debug/dump request handler should be disabled in production: anyone who can reach it can view the contents of remote or local files, which creates a security risk.
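Because of that risk, it can be worth checking a configuration before deploying it. This Python sketch is an assumption-based illustration, not a Solr tool: it scans solrconfig.xml content for requestHandler elements whose class ends in DumpRequestHandler and returns their names. The helper name find_dump_handlers is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_dump_handlers(solrconfig_xml: str) -> list:
    """Return the names of request handlers backed by DumpRequestHandler."""
    root = ET.fromstring(solrconfig_xml)
    names = []
    for handler in root.iter("requestHandler"):
        # Matches the short form solr.DumpRequestHandler as well as a fully qualified class name.
        if (handler.get("class") or "").endswith("DumpRequestHandler"):
            names.append(handler.get("name"))
    return names
```

A deployment script could, for instance, refuse to ship a production configuration whenever find_dump_handlers returns a non-empty list.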