We define the structure of a Solr document by declaring its fields in the schema.xml file. The same file also lets us define data manipulation strategies, such as tokenizing text so that searches match single words instead of full phrases. The steps for writing a simple schema.xml file are as follows:
The schema.xml file will involve the following elements:
<schema name='simple' version='1.1'>
  <types> … </types>
  <fields> … </fields>
  <uniqueKey> … </uniqueKey>
  …
</schema>
/SolrStarterbook/solr-app/chp02/conf/schema.xml:
<?xml version='1.0' ?>
<schema name='simple' version='1.1'>
  <types>
    <fieldType name='string' class='solr.StrField' />
    <fieldType name='long' class='solr.TrieLongField' />
  </types>
  <fields>
    <field name='id' type='long' required='true' />
    <field name='author' type='string' multiValued='true' />
    <field name='title' type='string' />
    <field name='text' type='string' />
    <dynamicField name='*_string' type='string' multiValued='true' indexed='true' stored='true' />
    <copyField source='*' dest='fullText' />
    <field name='fullText' type='string' multiValued='true' indexed='true' />
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>fullText</defaultSearchField>
  <solrQueryParser defaultOperator='OR' />
</schema>
Note how we have defined different fields for handling different types of data.
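To make the schema more concrete, here is a minimal sketch of a document that could be posted to a core using it, written in Solr's XML update message format. All the values are invented for illustration, and keyword_string is a hypothetical field name chosen only because it matches the *_string dynamic field pattern.

```xml
<!-- Hypothetical sample document; every value here is made up for illustration. -->
<add>
  <doc>
    <!-- 'id' is the uniqueKey and is declared with the long type -->
    <field name='id'>1</field>
    <!-- 'author' is multiValued, so it can be repeated -->
    <field name='author'>First Author</field>
    <field name='author'>Second Author</field>
    <field name='title'>A sample title</field>
    <field name='text'>Some sample text for the document.</field>
    <!-- not declared explicitly: accepted because it matches '*_string' -->
    <field name='keyword_string'>solr</field>
  </doc>
</add>
```

Because of the copyField declaration, the values of these fields would also be copied into fullText, which is the field searched by default.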
It is important to underline the difference between storing the actual data and building an index over the metadata manipulated and derived (extracted, projected, filtered, and so on) from it. With Solr we usually take care of the second case, although in special cases we may also be interested in storing the actual data, using Solr as a NoSQL database, as we will see later.
We may not necessarily be interested in describing all the data we have (for example, what we have in our databases), but only what can be relevant in the search and navigation context.
At the beginning it can look like a duplication of functionality between Solr and a relational database technology, but it is not. Solr is not designed to replace traditional relational databases. In most cases Solr is used in parallel with the relational database, to expose a simple and efficient full-text API over the DBMS data. As we will see later, the data can not only be indexed but also stored in Solr so that it's even possible to adopt it as a NoSQL store in certain cases.
You should easily recognize the essential parts of this file as follows:
If you are not a programmer, or you are not familiar with data types, I suggest you start by using the basic string type. When you have something working, you can move on to more advanced features that are specific to a certain data type. Dates are a good example: saving them with a dedicated date type allows Solr to optimize range queries over a certain period of time.
copyField: To search over the values of all the fields (matched by the wildcard *), the simplest way is to copy the values into a default field where we will perform the actual searches. This field will also have its own type and analysis defined.
dynamicField: With a definition such as <dynamicField name='*_s' type='string' /> we can post new documents containing string values such as firstName_s='Alfredo' and surname_s='Serafini'. This is an ideal case for prototypes, as we can work with the Solr API without defining a final schema for our data.
defaultOperator: A search that uses and between its words is narrowed to a small set of documents, because every word must match. So in most cases you will use the or operator instead, which is less restrictive and more natural for common queries. The and approach is generally useful, for example, when working with navigation filters or conducting an advanced search on large datasets.
Every field can define the following three important attributes:
multiValued: If set to true, a Solr document can contain more than one value for the field. The default value is false.
indexed: If set to true, the field is used in the index. Generally we will use only indexed fields, but it can be useful to leave a field unindexed in certain cases, for example, if we want to save a value without using it for searches.
stored: A field is processed as defined in the schema.xml file to update the index; however, its value is not explicitly saved unless we decide to store it.
Imagine indexing several different synonyms of the same word using a word_synonim multivalued field, but storing only the specific word in a word_original field. When the user searches for the word or one of its synonyms, all the documents produced as output will only contain the field word_original, which is the only one stored.
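The synonym scenario could be declared roughly as follows. This is only a sketch: the string type is reused for simplicity, and indexing word_original as well (so that the word itself is searchable) is an assumption rather than something the scenario requires.

```xml
<!-- Searchable but not returned: the synonyms are indexed, not stored. -->
<field name='word_synonim' type='string' multiValued='true' indexed='true' stored='false' />

<!-- Returned in results: the original word is stored.
     indexed='true' is an assumption, so the word itself can also be searched. -->
<field name='word_original' type='string' indexed='true' stored='true' />
```

With these declarations, a query can match values in word_synonim, while the documents returned expose only word_original.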