Time for action – defining the schema.xml file with only dynamic fields and tokenization

The simplest way to keep the schema.xml file small is to use only dynamic fields, so that we don't need to decide up front which fields will be used.

Our example will then have the following format:

<schema name='simple' version='1.1'>
  <types>
    <fieldtype name='string' class='solr.StrField' postingsFormat='SimpleText' />
  </types>
  <fields>
    <dynamicField name='*' type='string' multiValued='true' indexed='true' stored='true' />
    <copyField source='*' dest='fullText' />
    <field name='fullText' type='string' multiValued='true' />
  </fields>
  <defaultSearchField>fullText</defaultSearchField>
  <solrQueryParser defaultOperator='OR' />
</schema>

As you can see, the schema.xml file is almost identical to the first example, and even simpler; this is one of the easiest ways to get a working Solr instance quickly.

What just happened?

With only a few lines of configuration, we can index any field posted to the /update API (which we will see in a while) on our new Solr core, called pdfs. We could run some tests with the XML posting format, but in this case we will look ahead to an internal Solr component (SolrCell / Tika) that automatically extracts metadata and text from PDF files. It is therefore important to have a schema flexible enough to accept every field this component emits, without knowing in advance which fields those will be.
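
To make the idea concrete, here is a minimal sketch of a message in the XML posting format that this schema should accept. The field names (title, author, content) and their values are purely hypothetical and chosen only for illustration; none of them is declared in the schema, so each one would be matched by the catch-all dynamic field:

```xml
<!-- Hypothetical test document: field names and values are illustrative only. -->
<add>
  <doc>
    <!-- None of these fields is declared explicitly; the '*' dynamic field matches them all. -->
    <field name="title">A sample PDF title</field>
    <field name="author">First Author</field>
    <!-- The dynamic field is multiValued, so a field name may be repeated. -->
    <field name="author">Second Author</field>
    <field name="content">Some text extracted from the document body.</field>
  </doc>
</add>
```

Because every field is typed as a plain string, the values are indexed verbatim and not split into terms, so a query has to match a whole field value exactly.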
