Time for action – looking inside an index with SimpleTextCodec

A good way to start looking inside a Solr index is SimpleTextCodec, which saves the internal structure of the index as structured text that you can read directly.

A codec is a component that implements a specific policy for handling binary data; you can think of it as the engine used to serialize an index internally.

To enable it, add a codecFactory element to the solrconfig.xml file:

<codecFactory name='CodecFactory' class='solr.SchemaCodecFactory' />

Once the codec factory is in place, we can decide which fields (in schema.xml, as usual) are saved using the codec. Every field type that we want the codec to serialize must be declared explicitly, as shown in the following piece of code:

<fieldtype name='string' class='solr.StrField' postingsFormat='SimpleText' />

If you update the configuration and restart your Solr instance, clean up the index before indexing the example data again; this simply avoids potential problems and confusion. Once this is done, we can easily look at the plain-text representation of what our index contains.
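One way to clean up the index is to send a delete-by-query request for all documents to the Solr update handler. The following Python sketch only builds such a request; the URL (host, port, and path) is an assumption and must be adapted to your installation, and the function name is hypothetical.

```python
import urllib.request

# Assumed update handler URL; adjust host, port, and core name to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_delete_all_request(update_url=SOLR_UPDATE_URL):
    """Build (but do not send) a request that deletes every document and commits."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_delete_all_request()
print(request.get_method(), request.full_url)

# To actually send it against a running instance, uncomment:
# urllib.request.urlopen(request)
```

The request is built separately from the call that sends it, so you can inspect the URL and body before touching a real index.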

What just happened?

SimpleTextCodec started as an experimental tool and has since been added to the official components available in the default installation. It saves terms in plain text files, so you can see directly how the data is stored in the index by simply opening one of the files ending in .pst.
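Because the file is plain text, it can also be read with a few lines of code. The following Python sketch assumes the postings are laid out as indented field, term, and doc lines terminated by an END line; the sample text is hand-written for illustration and is not copied from a real index, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative sample only: written by hand to resemble the assumed
# field / term / doc layout of a .pst file, not taken from a real index.
SAMPLE_PST = """\
field title
  term hello
    doc 0
      freq 1
      pos 0
    doc 2
      freq 1
      pos 3
  term world
    doc 0
      freq 1
      pos 1
END
"""


def parse_postings(text):
    """Return {field: {term: [doc ids]}} from SimpleText-like postings text."""
    index = defaultdict(lambda: defaultdict(list))
    field = term = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "END":
            break  # anything after END (for example a checksum) is ignored
        if line.startswith("field "):
            field = line[len("field "):]
            term = None
        elif line.startswith("term "):
            term = line[len("term "):]
        elif line.startswith("doc ") and field is not None and term is not None:
            index[field][term].append(int(line[len("doc "):]))
    return {f: dict(terms) for f, terms in index.items()}


postings = parse_postings(SAMPLE_PST)
print(postings)
```

To try it on a real file, read the contents of a .pst file from your index directory and pass the text to parse_postings; the freq and pos lines are skipped here to keep the sketch short.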


Be aware that this kind of codec is useful only for testing and learning purposes, because writing the index as plain text is very slow. Performance degrades quickly, so it is not a suitable choice for a production environment. However, if you save or update some documents in your index, it can be very interesting to observe how the index changes internally.

Observing the data and getting an idea of what an index looks like internally finally introduces us to one of the most important concepts: the inverted index.
