Now let's see how the configuration files look after all the modifications.
If you look at the provided sources, you'll find the full configuration at the path /SolrStarterBook/solr-app/chp02/pdfs/conf, the index data saved at the path /SolrStarterBook/solr-app/chp02/pdfs/data/index, and some useful test scripts in the companion directory /test/chp02/pdfs. In case you miss something while reading, you will also find step-by-step versions of the updated configurations in the same /chp02/ folder.
Our configuration will now look as shown in the following piece of code:
<?xml version='1.0' encoding='UTF-8' ?>
<config>
  <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
  <directoryFactory name='DirectoryFactory' class='solr.MMapDirectoryFactory' />
  <codecFactory name='CodecFactory' class='solr.SchemaCodecFactory' />
  <lib dir='${solr.core.instanceDir}/../lib' />

  <requestHandler name='standard' class='solr.StandardRequestHandler' default='true' />

  <!-- plain updates pass through the deduplication chain -->
  <requestHandler name='/update' class='solr.UpdateRequestHandler'>
    <lst name='defaults'>
      <str name='update.chain'>deduplication</str>
    </lst>
  </requestHandler>

  <!-- rich documents (PDF) are parsed by Tika before being indexed -->
  <requestHandler name='/update/extract' class='solr.extraction.ExtractingRequestHandler'>
    <lst name='defaults'>
      <str name='captureAttr'>true</str>
      <str name='lowernames'>true</str>
      <str name='overwrite'>true</str>
      <str name='literalsOverride'>true</str>
      <str name='fmap.a'>link</str>
      <str name='update.chain'>deduplication</str>
    </lst>
  </requestHandler>

  <!-- computes a signature from the content field and stores it in uid -->
  <updateRequestProcessorChain name='deduplication'>
    <processor class='org.apache.solr.update.processor.SignatureUpdateProcessorFactory'>
      <bool name='overwriteDupes'>false</bool>
      <str name='signatureField'>uid</str>
      <bool name='enabled'>true</bool>
      <str name='fields'>content</str>
      <str name='minTokenLen'>10</str>
      <str name='quantRate'>.2</str>
      <str name='signatureClass'>solr.update.processor.TextProfileSignature</str>
    </processor>
    <processor class='solr.LogUpdateProcessorFactory' />
    <processor class='solr.RunUpdateProcessorFactory' />
  </updateRequestProcessorChain>

  <requestHandler name='/admin/' class='org.apache.solr.handler.admin.AdminHandlers' />
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
In the following chapters we will avoid transcribing full configuration files, to keep the examples readable. In this chapter we have had our first look at a complete file. It is still very simple, but it will grow more complex as we proceed, and seeing it in full now will help us understand the whole process more clearly.
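To make the role of the /update/extract handler more concrete, here is a minimal Python sketch of how a client might send a PDF file to it. The host, port, and core name (localhost, 8983, pdfs) are assumptions about a local installation, and the helper names build_extract_url and post_pdf are made up for this illustration; the companion scripts may do this differently.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed values: adjust host, port and core name to your own installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "pdfs"


def build_extract_url(base=SOLR_BASE, core=CORE, commit=True, extract_only=False):
    """Return the URL of the /update/extract handler with a few common parameters."""
    params = {"commit": str(commit).lower()}
    if extract_only:
        # extractOnly=true asks Solr to return the Tika output without indexing it
        params["extractOnly"] = "true"
    return f"{base}/{core}/update/extract?{urlencode(params)}"


def post_pdf(path, url=None):
    """Send the raw bytes of a PDF file to Solr and return the response body."""
    url = url or build_extract_url()
    with open(path, "rb") as handle:
        request = Request(url, data=handle.read(),
                          headers={"Content-Type": "application/pdf"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the target URL; call post_pdf("some.pdf") against a running Solr.
    print(build_extract_url())
```

Because the handler defaults already name the deduplication chain, the client does not need to pass a uid: the signature processor fills it in from the content field.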
The schema.xml file now contains all the types used in the examples, with their analysis (tokenization, case transformation):
<?xml version='1.0' encoding='UTF-8' ?>
<schema name='pdfs' version='1.1'>
  <types>
    <fieldtype name='string' class='solr.StrField' postingsFormat='SimpleText' />
    <fieldtype name='text' class='solr.TextField' postingsFormat='SimpleText'>
      <analyzer>
        <tokenizer class='solr.WhitespaceTokenizerFactory' />
        <filter class='solr.LowerCaseFilterFactory' />
      </analyzer>
    </fieldtype>
  </types>
  <fields>
    <field name='uid' type='string' indexed='true' stored='true' multiValued='false' />
    <!-- catch-all: any field not declared explicitly is indexed as a string -->
    <dynamicField name='*' type='string' multiValued='true' indexed='true' stored='true' />
    <copyField source='*' dest='fullText' />
    <field name='fullText' type='text' multiValued='true' />
  </fields>
  <defaultSearchField>fullText</defaultSearchField>
  <solrQueryParser defaultOperator='OR' />
  <uniqueKey>uid</uniqueKey>
</schema>
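The analysis chain declared for the text type (a whitespace tokenizer followed by a lowercase filter) can be approximated in a few lines of Python. This is only a rough sketch for building intuition: the function name analyze is made up for this example, and Python's notion of whitespace is not guaranteed to match Lucene's in every edge case.

```python
def analyze(text):
    """Approximate the 'text' field type: split on whitespace, then lowercase."""
    tokens = text.split()                      # stands in for WhitespaceTokenizerFactory
    return [token.lower() for token in tokens]  # stands in for LowerCaseFilterFactory


if __name__ == "__main__":
    print(analyze("Indexing PDF files with Apache Solr"))
```

Note that punctuation is left attached to the tokens, since the whitespace tokenizer splits only on whitespace: a term such as Full-Text stays a single token, full-text.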
Dynamic fields have been introduced to include some flexibility, directly indexing every field exposed by Tika as metadata.
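The way a catch-all dynamic field picks up metadata can be sketched as a simple lookup: explicitly declared fields win, and any other name falls through to the * pattern. The snippet below is an illustration under assumptions, not Solr's actual implementation: the function resolve_type and the sample metadata names are hypothetical, and fnmatchcase accepts more general patterns than Solr's dynamic fields do.

```python
from fnmatch import fnmatchcase

# Mirrors the schema above: two explicit fields and one catch-all dynamic field.
EXPLICIT_FIELDS = {"uid": "string", "fullText": "text"}
DYNAMIC_FIELDS = [("*", "string")]


def resolve_type(field_name):
    """Return the field type a name would map to, or None if nothing matches."""
    if field_name in EXPLICIT_FIELDS:
        # explicitly declared fields take priority over dynamic patterns
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    # Hypothetical metadata names, as Tika might expose them with lowernames=true.
    for name in ("author", "content_type", "fullText"):
        print(name, "->", resolve_type(name))
```

With this schema every metadata name is accepted and stored as a string, which is convenient for exploring what Tika extracts, at the cost of having no per-field analysis.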