Content streaming

In Solr, we can index remote or local files by enabling remote streaming in solrconfig.xml. Let's follow the steps given here to enable the feature and see how it is used.

Let's use our newly created languages-example core and modify solrconfig.xml. We'll replace the requestDispatcher config in our solrconfig.xml file with the following lines:

  <requestDispatcher handleSelect="false" > 
    <requestParsers enableRemoteStreaming="true" 
      multipartUploadLimitInKB="2048000"
      formdataUploadLimitInKB="2048"
      addHttpRequestToContext="false"/> 
  </requestDispatcher>

Setting enableRemoteStreaming="true" turns on the remote streaming feature, which lets us index remote or local files. Let's go ahead and index a remote file in our Solr index:

$ curl "http://localhost:8983/solr/languages-example/update?commit=true" -F stream.url=https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/master/Chapter-9/files/multilang-remote.xml

We'll get the following response from Solr:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">968</int></lst>
</response>
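
The same request can also be assembled from a script. The following is a minimal Python sketch, assuming a Solr instance at localhost:8983 with remote streaming enabled. The helper name build_stream_update_url is hypothetical, and it only builds the URL rather than sending it. It passes stream.url as a query parameter; stream.file is the counterpart parameter for a file on the Solr server's own filesystem.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance


def build_stream_update_url(core, source, local=False, commit=True):
    """Build an /update URL that asks Solr to pull content from a stream.

    Hypothetical helper: stream.url names a remote resource, stream.file
    names a path on the Solr server's own filesystem.
    """
    params = {"commit": "true" if commit else "false"}
    params["stream.file" if local else "stream.url"] = source
    # urlencode percent-encodes the source so it survives as one parameter.
    return f"{SOLR_BASE}/{core}/update?{urlencode(params)}"


remote_url = build_stream_update_url(
    "languages-example",
    "https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/"
    "master/Chapter-9/files/multilang-remote.xml",
)
print(remote_url)
```

Calling the helper with local=True and a server-side path switches the parameter to stream.file; any path passed that way is a placeholder for a file that must exist on the Solr host.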

We can see the indexed documents in Solr after navigating to the query browser. We'll get this output:

{
  responseHeader : {
    status : 0,
    QTime : 8
  },
  response : {
    numFound : 3,
    start : 0,
    docs : [{
      id : "doc4",
      language : "en",
      content_en : " Hello, This is a remote file which is waiting to get indexed in Solr. "
    }, {
      id : "doc5",
      language : "ru",
      content_ru : " Привет , это удаленный файл, который ждет, чтобы получить индексируются в Solr . "
    }, {
      id : "doc6",
      language : "fr",
      content_fr : " Bonjour, ceci est un fichier distant qui est en attente pour obtenir indexé dans Solr. "
    }
    ]
  }
}

As we can see from the preceding response, the three documents from the remote file (doc4, doc5, and doc6) have been indexed in Solr.
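
To check the result from a script rather than the query browser, the JSON form of the query response can be summarized. This is a minimal Python sketch, assuming the standard responseHeader/response layout; summarize_response is a hypothetical helper, and the sample payload is trimmed from the output shown earlier, with the content fields omitted.

```python
import json


def summarize_response(raw_json):
    """Return (numFound, {id: language}) from a Solr JSON query response."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], {d["id"]: d["language"] for d in body["docs"]}


# Sample trimmed from the query output (content_* fields omitted).
sample = """
{"responseHeader": {"status": 0, "QTime": 8},
 "response": {"numFound": 3, "start": 0, "docs": [
   {"id": "doc4", "language": "en"},
   {"id": "doc5", "language": "ru"},
   {"id": "doc6", "language": "fr"}]}}
"""

found, languages = summarize_response(sample)
print(found, languages)
```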

We can also use DumpRequestHandler to debug the requests that are made by adding the following request handler in solrconfig.xml:

<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" />

For example, let's see the contents of the remote file that we recently indexed into Solr. When we navigate to the following URL, we'll see the XML response from the /debug/dump request handler; it will contain the stream response:

http://localhost:8983/solr/languages-example/debug/dump?stream.url=https://raw.githubusercontent.com/sachin-handiekar/SolrIndexingBook/master/Chapter-9/files/multilang-remote.xml

This URL returns an XML response that contains the data of the remote file.

The /debug/dump request handler should be disabled in production: anyone who can reach it can read the contents of remote or local files, which is a security risk.
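
One way to catch these settings before deployment is to scan solrconfig.xml for them. The following Python sketch is an illustration under stated assumptions, not part of Solr: audit_solrconfig is a hypothetical helper, and the embedded config is a cut-down sample assembled from the snippets in this section.

```python
import xml.etree.ElementTree as ET


def audit_solrconfig(xml_text):
    """Return warnings for streaming-related settings that are risky in production."""
    root = ET.fromstring(xml_text)
    warnings = []
    # Flag any requestParsers element that turns remote streaming on.
    for parsers in root.iter("requestParsers"):
        if parsers.get("enableRemoteStreaming", "false").lower() == "true":
            warnings.append("remote streaming is enabled")
    # Flag any handler backed by DumpRequestHandler, whatever its path.
    for handler in root.iter("requestHandler"):
        if handler.get("class", "").endswith("DumpRequestHandler"):
            warnings.append(f"dump handler registered at {handler.get('name')}")
    return warnings


# Cut-down sample assembled from the snippets in this section.
sample_config = """
<config>
  <requestDispatcher handleSelect="false">
    <requestParsers enableRemoteStreaming="true"
      multipartUploadLimitInKB="2048000"
      formdataUploadLimitInKB="2048"
      addHttpRequestToContext="false"/>
  </requestDispatcher>
  <requestHandler name="/debug/dump" class="solr.DumpRequestHandler" />
</config>
"""

for warning in audit_solrconfig(sample_config):
    print("WARNING:", warning)
```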
