Indexing data using XPath

For simplicity, we'll use FileDataSource, which reads files from the local filesystem. With it, we can import data into Solr from XML files, using XPathEntityProcessor to extract the field values.

Let's go ahead and create a new core named musicCatalog-DIH-XPath in Solr. We can create its configuration files in the same way as the ones we previously created for JdbcDataSource.

In solrconfig.xml, we'll register the DataImportHandler and point it at the data configuration file:

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">xpath-data-config.xml</str>
    </lst>
  </requestHandler>

Next, we'll create a new file called xpath-data-config.xml in the core's conf directory. It declares FileDataSource as the data source and uses XPathEntityProcessor to extract the fields:

<dataConfig>
  <!-- File Data Source -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  
  <document>
    <entity 
      processor="XPathEntityProcessor"
      name="musicCatalog"
      pk="songId"
      url="/path/to/SolrIndexingExamples/Chapter-5/sampleData.xml"
      forEach="/musicCatalog/albums/album"
      transformer="RegexTransformer">

      <field column="songId" xpath="/musicCatalog/albums/album/songId"/>
      <field column="songName" xpath="/musicCatalog/albums/album/songName"/>
      <field column="artistName" xpath="/musicCatalog/albums/album/artistName"/>
      <field column="albumArtist" xpath="/musicCatalog/albums/album/albumArtist"/>
      <field column="albumName" xpath="/musicCatalog/albums/album/albumName"/>
      <field column="songDuration" xpath="/musicCatalog/albums/album/songDuration"/>
      <field column="composer" xpath="/musicCatalog/albums/album/composer"/>
      <field column="rating" xpath="/musicCatalog/albums/album/rating"/>
      <field column="year" xpath="/musicCatalog/albums/album/year"/>
      <field column="genre" xpath="/musicCatalog/albums/album/genre"/>
    </entity>
  </document>
</dataConfig>
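The entity above declares transformer="RegexTransformer", but none of its fields uses it yet. One typical use is the transformer's splitBy attribute. It turns a delimited value such as genre into multiple values, for example <field column="genre" xpath="/musicCatalog/albums/album/genre" splitBy=",\s*"/>. This assumes genre is declared multiValued in the schema, which the single-valued output later in this section suggests is not the case here. The Python sketch below only approximates that splitting step so its effect on the sample value is easy to see. split_by is a hypothetical helper, not part of Solr.

```python
import re


def split_by(value, pattern=r",\s*"):
    """Hypothetical approximation of RegexTransformer's splitBy attribute.

    The field value is split on a regular expression into multiple values.
    Dropping empty pieces is a choice made by this sketch.
    """
    if value is None:
        return []
    return [part for part in re.split(pattern, value) if part]


if __name__ == "__main__":
    genre = "Pop, Electronic, Dance, Adult Contemporary, Teen Pop"
    print(split_by(genre))
    # ['Pop', 'Electronic', 'Dance', 'Adult Contemporary', 'Teen Pop']
```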

The preceding <dataConfig> element indexes a single XML file, which is all we need for this example. To index every XML file in a directory instead, we can nest the XPathEntityProcessor entity inside a FileListEntityProcessor entity:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity
      name="document"
      processor="FileListEntityProcessor"
      baseDir="/path/to/xml-files"
      fileName=".*\.xml$"
      recursive="false"
      rootEntity="false"
      dataSource="null">
      <entity
        processor="XPathEntityProcessor"
        name="musicCatalog"
        pk="songId"
        url="${document.fileAbsolutePath}"
        forEach="/musicCatalog/albums/album"
        transformer="RegexTransformer">

        <!-- Definition of Fields as per the previous example -->
      </entity>
    </entity>
  </document>
</dataConfig>
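FileListEntityProcessor emits one row per matching file, and the inner entity reads each file through ${document.fileAbsolutePath}. The file-selection step can be approximated in Python as shown below. list_xml_files is a hypothetical name. The sketch assumes that fileName is a regular expression searched against each file's name and that recursive="false" skips subdirectories. Note the escaped dot: an unescaped ".*.xml$" would also pick up a file named dataxml.

```python
import os
import re


def list_xml_files(base_dir, file_name=r".*\.xml$", recursive=False):
    """Hypothetical approximation of FileListEntityProcessor's file selection.

    Returns the absolute paths (the value DIH exposes as fileAbsolutePath)
    of files under base_dir whose names match the file_name regular expression.
    """
    pattern = re.compile(file_name)
    matches = []
    for entry in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, entry)
        if os.path.isdir(path):
            # Subdirectories are only descended into when recursive is true
            if recursive:
                matches.extend(list_xml_files(path, file_name, recursive))
        elif pattern.search(entry):
            matches.append(os.path.abspath(path))
    return matches
```

For example, list_xml_files("/path/to/xml-files") would list the files that the configuration above hands to XPathEntityProcessor, one per import row.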

The code that accompanies this book includes a sample XML file (sampleData.xml, referenced in the url attribute above) containing the album data. Its contents look like the following:

<musicCatalog>
  <albums>
    <album>
      <songId>100000010</songId>
      <songName>(Oh No) What You Got</songName>
      <artistName>Justin Timberlake</artistName>
      <albumArtist>Various</albumArtist>
      <albumName>Justified</albumName>
      <songDuration>4.31</songDuration>
      <composer/>
      <rating>3.5</rating>
      <year>2002</year>
      <genre>Pop, Electronic, Dance, Adult Contemporary, Teen Pop</genre>
    </album>
  </albums>
</musicCatalog>

As xpath-data-config.xml shows, FileDataSource reads the contents of the file, and XPathEntityProcessor then extracts the value of each field. For example, artistName is retrieved with the following field definition:

  <field column="artistName" xpath="/musicCatalog/albums/album/artistName"/>

The xpath attribute holds an XPath expression. XPathEntityProcessor evaluates it against the XML document to extract the artistName value, which is then sent to Solr for indexing. Because forEach is set to /musicCatalog/albums/album, each matching album element becomes one Solr document.
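The following Python sketch makes the forEach/xpath mapping concrete using the standard library's ElementTree. Every element matched by forEach becomes one row, and each field's xpath, which must begin with the forEach path, picks a child of that element. It is only an illustration under these assumptions. extract_rows, FIELDS, and SAMPLE are hypothetical names. DIH's own XPath support is a streaming reader for a limited XPath subset, not ElementTree.

```python
import xml.etree.ElementTree as ET

FOR_EACH = "/musicCatalog/albums/album"
COLUMNS = ["songId", "songName", "artistName", "albumArtist", "albumName",
           "songDuration", "composer", "rating", "year", "genre"]
# column -> absolute xpath, mirroring the <field> elements in xpath-data-config.xml
FIELDS = {column: f"{FOR_EACH}/{column}" for column in COLUMNS}


def extract_rows(xml_text, for_each=FOR_EACH, fields=FIELDS):
    """Hypothetical sketch: one dict per forEach match, keyed by column."""
    root = ET.fromstring(xml_text)
    steps = for_each.strip("/").split("/")
    if root.tag != steps[0]:
        return []
    rows = []
    # ElementTree paths are relative to the root element, so drop its name
    for record in root.findall("/".join(steps[1:])):
        row = {}
        for column, xpath in fields.items():
            if not xpath.startswith(for_each + "/"):
                raise ValueError(f"{xpath} is not under {for_each}")
            child = record.find(xpath[len(for_each):].strip("/"))
            if child is not None:
                # An empty element such as <composer/> yields an empty string
                row[column] = child.text or ""
        rows.append(row)
    return rows


SAMPLE = """<musicCatalog>
  <albums>
    <album>
      <songId>100000010</songId>
      <songName>(Oh No) What You Got</songName>
      <artistName>Justin Timberlake</artistName>
      <albumArtist>Various</albumArtist>
      <albumName>Justified</albumName>
      <songDuration>4.31</songDuration>
      <composer/>
      <rating>3.5</rating>
      <year>2002</year>
      <genre>Pop, Electronic, Dance, Adult Contemporary, Teen Pop</genre>
    </album>
  </albums>
</musicCatalog>"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row["songId"], row["artistName"])  # 100000010 Justin Timberlake
```

Like the DIH output later in this section, the empty composer element comes back as an empty string rather than being dropped.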

Let's test our newly created core in Solr. To do this, we'll start our Solr instance and navigate to the Solr Admin UI (http://localhost:8983/solr/#/musicCatalog-DIH-XPath/).

To import the XML data, open the Dataimport tab, select the full-import command, and click on Execute, as shown in this screenshot:

[Screenshot: the Dataimport tab with the full-import command selected]

After we click on Execute, the data import handler indexes the data from the XML file into Solr. Solr then reports how many documents were added/updated or deleted:

[Screenshot: the Dataimport status output after the full-import]
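The same full-import can also be started without the Admin UI by sending the command to the /dataimport handler registered in solrconfig.xml. The Python sketch below builds such requests. command=full-import, command=status, clean, and commit are standard DataImportHandler request parameters. dih_url, send_command, and the localhost base URL are assumptions for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance


def dih_url(core, command, base=SOLR_BASE, **params):
    """Build a request URL for the /dataimport handler of the given core."""
    query = urlencode({"command": command, **params})
    return f"{base}/{core}/dataimport?{query}"


def send_command(core, command, **params):
    """Send a DIH command and return the raw JSON response (needs a running Solr)."""
    with urlopen(dih_url(core, command, wt="json", **params)) as response:
        return response.read().decode("utf-8")
```

For example, send_command("musicCatalog-DIH-XPath", "full-import", clean="true", commit="true") starts the import. send_command("musicCatalog-DIH-XPath", "status") then reports its progress.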

After running the import, we can query the Solr index to retrieve the indexed document, either from the Query tab in the Solr Admin UI or by opening this URL directly:

http://localhost:8983/solr/musicCatalog-DIH-XPath/select?q=*%3A*&wt=json&indent=true

The following result is expected if the data import is successful:

{
    "responseHeader":{
      "status":0,
      "QTime":0
    },
    "response":{
      "numFound":1,
      "start":0,
      "docs":[
         {
            "genre":"Pop, Electronic, Dance, Adult Contemporary, Teen Pop",
            "composer":"",
            "albumArtist":"Various",
            "tmpField":[
               "Various",
               "100000010",
               "Justin Timberlake"
            ],
            "albumName":"Justified",
            "songDuration":4.31,
            "year":2002,
            "songName":"(Oh No) What You Got",
            "rating":3.5,
            "songId":"100000010",
            "artistName":"Justin Timberlake"
         }
      ]
    }
}

The preceding result confirms that the data import handler has indexed the XML document into Solr.
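When this check is scripted rather than done in the browser, the JSON response can be inspected directly. The following Python sketch assumes the wt=json response layout shown above. summarize and fetch_summary are hypothetical helpers, and fetch_summary needs a running Solr instance with the musicCatalog-DIH-XPath core.

```python
import json
from urllib.request import urlopen

SELECT_URL = "http://localhost:8983/solr/musicCatalog-DIH-XPath/select?q=*%3A*&wt=json"


def summarize(response_text):
    """Return numFound and the songId of each returned document."""
    response = json.loads(response_text)["response"]
    return response["numFound"], [doc.get("songId") for doc in response["docs"]]


def fetch_summary(url=SELECT_URL):
    """Query Solr and summarize the result (requires a running instance)."""
    with urlopen(url) as resp:
        return summarize(resp.read().decode("utf-8"))
```

For the sample data, summarize applied to the response above returns (1, ['100000010']).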
