Definitions and properties of field types

Before we get to the definitions and properties, let's look at what field analysis means.

How Solr interprets data when it is indexed matters. For example, a book's description can contain many words that carry little meaning: helping verbs such as is, was, and are; pronouns such as they and we; and other common words such as the, a, and this. A query on such words would match almost every document. Similarly, how should words with capital letters be treated, so that a search for solr also finds Solr?

Field analysis addresses these problems, for example by ignoring common words or normalizing case while indexing and querying. We will dive deep into field analysis in the next chapter.

Coming back to field types: a field's type determines all of the analysis performed on that field, both when documents are indexed and when the index is queried.
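As a minimal sketch of this idea, the fragment below declares a field type whose analysis handles the two problems described above (common words and capitalization), and a field that uses it. The names text_simple and description are hypothetical. A single analyzer element without a type attribute is applied at both index time and query time; the stop word file is assumed to be the lang/stopwords_en.txt shipped with Solr's example configurations.

```xml
<!-- Hypothetical field type: one <analyzer> with no type attribute
     is used both when indexing and when querying -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase every token so that "Solr" and "solr" match -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop common words such as "the", "a" and "is" -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field: all of its analysis comes from its type -->
<field name="description" type="text_simple" indexed="true" stored="true"/>
```

The field itself says nothing about tokenizing or filtering; changing the analysis means changing the field type.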

All field types are specified in schema.xml. A field type definition can contain the following:

  • The name attribute, which is mandatory.
  • The class attribute, which is also mandatory. It names the class that implements the field type.
  • For a TextField, a description of the field analysis, that is, the analyzers to apply.
  • Field type properties; depending on the implementation class, some of these may be mandatory.
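For comparison with the longer TextField definition that follows, here are two short field types in the style of Solr's default schema. They carry only the mandatory name and class attributes plus a few optional properties (sortMissingLast, docValues), and have no analyzer section because they are not TextFields.

```xml
<!-- name and class are mandatory; sortMissingLast is an optional property
     that sorts documents without a value after those with one -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- docValues is another optional field type property -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
```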

The field type is defined within the fieldType tags. Let's take a look at a field type definition for text_en:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    -->
    <!-- Case insensitive stop word removal -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

As you can see, the first line defines a fieldType named text_en, implemented by the solr.TextField class. It also sets the positionIncrementGap attribute, which inserts a gap of 100 token positions between successive values of a multivalued field.

For example, suppose a multivalued writer field holds the following two values:

writer: Sandeep Nair
writer: Dharmesh Vasoya

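Without a positionIncrementGap, a phrase search for Nair Dharmesh could match this document, because the last token of the first value and the first token of the second value sit at adjacent positions. With positionIncrementGap set to 100, those two tokens are pushed far apart, so the phrase no longer matches across the two values.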
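A sketch of how this example might look in practice, assuming a field named writer that uses the text_en type above and a schema whose unique key is id (both the field and the document ID book-1 are hypothetical). The first fragment goes in the schema; the second is a Solr XML update message that adds one document with two writer values.

```xml
<!-- schema.xml: hypothetical multivalued field analyzed by text_en -->
<field name="writer" type="text_en" indexed="true" stored="true" multiValued="true"/>

<!-- Update message (posted to the /update handler): one document, two values -->
<add>
  <doc>
    <field name="id">book-1</field>
    <field name="writer">Sandeep Nair</field>
    <field name="writer">Dharmesh Vasoya</field>
  </doc>
</add>
```

With the gap in place, a phrase query such as writer:"Nair Dharmesh" should not match this document, while writer:"Sandeep Nair" still should, because both of its tokens come from the same value.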

We will cover the remaining parts of this fieldType definition, such as its analyzers and filters, in detail in the next chapter.

You may have noticed that the class name solr.TextField begins with solr. This is shorthand: for field type classes, Solr resolves the solr. prefix to the package org.apache.solr.schema, so solr.TextField refers to org.apache.solr.schema.TextField.
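As a sketch, the two declarations below should refer to the same implementation class; the second spells out the fully qualified name. The field type name string is reused for illustration, and a real schema would contain only one of these lines, since field type names must be unique.

```xml
<!-- Short form using the solr. prefix -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Equivalent fully qualified form (use one or the other, not both) -->
<fieldType name="string" class="org.apache.solr.schema.StrField" sortMissingLast="true"/>
```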