Creating a numeric field

We've learned how to deal with textual content using StringField and TextField in Lucene, so now let's look at how numbers are handled. Lucene provides four Field classes for storing numeric values: IntField, FloatField, LongField, and DoubleField, which are analogous to the corresponding Java numeric types. Because Lucene is a text search engine, it treats numbers as terms internally and indexes them in a trie structure (also called an ordered tree data structure), as illustrated in the following diagram:

[Figure: numeric terms indexed in a trie of lower-precision brackets]

Each term is logically assigned to larger and larger predefined lower-precision brackets. For example, assume that each bracket level is obtained by dividing the level below it by ten (that is, by dropping the last digit), as in the preceding diagram. So, under the 1 bracket (at the top level), we get the DocIds associated with values in the 100s range, under the 12 bracket we get those associated with values in the 120s range, and so on. Now, say you want to search for all documents with a numeric value between 230 and 239: Lucene can simply find the 23 bracket in the index and return all the DocIds underneath it. As you can see, this technique lets Lucene leverage its indexing power to handle numbers with ease.
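The bracket lookup described above can be sketched in plain Java. This is a simplified decimal model, assuming non-negative int values and brackets that each drop one decimal digit; Lucene's real implementation uses binary prefixes, and the class BracketCover and its cover method are hypothetical names used only for illustration. The sketch splits a range into the fewest brackets, falling back to individual values only at the ragged edges:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified decimal model of trie "brackets": each level up drops one decimal digit.
// Lucene itself works on binary prefixes (precisionStep bits per level); this only
// illustrates the idea described in the text. Names are hypothetical.
class BracketCover {

    // Covers the inclusive range [lo, hi] with as few brackets as possible.
    // A label such as "23*" means every value from 230 to 239; "1**" means 100..199.
    static List<String> cover(int lo, int hi) {
        if (lo < 0 || hi < lo) throw new IllegalArgumentException("need 0 <= lo <= hi");
        List<String> terms = new ArrayList<>();
        long cur = lo; // long arithmetic avoids overflow near Integer.MAX_VALUE
        while (cur <= hi) {
            long size = 1;
            // Grow to the next bracket size while cur is aligned to it and the whole bracket fits.
            while (cur % (size * 10) == 0 && cur + size * 10 - 1 <= hi) {
                size *= 10;
            }
            terms.add(label(cur, size));
            cur += size;
        }
        return terms;
    }

    // Builds a bracket label: the shared prefix followed by one '*' per dropped digit.
    private static String label(long start, long size) {
        StringBuilder sb = new StringBuilder(Long.toString(start / size));
        for (long s = size; s > 1; s /= 10) {
            sb.append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(cover(230, 239)); // [23*]
        System.out.println(cover(225, 263)); // [225, 226, 227, 228, 229, 23*, 24*, 25*, 260, 261, 262, 263]
        System.out.println(cover(100, 199)); // [1**]
    }
}
```

For the 230 to 239 query from the text, the whole range collapses into the single 23 bracket. A ragged range such as 225 to 263 needs a few individual values at each end plus the 23, 24, and 25 brackets in the middle, which is still far fewer lookups than visiting all 39 values one by one.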

The number of brackets can be tuned by changing a value called precisionStep. Lucene actually builds its brackets from binary prefixes, dropping precisionStep bits of the value at each level rather than one decimal digit as in the example above. A smaller precisionStep produces more brackets, which consumes more disk space but improves range search performance, because a range can then be covered by fewer terms. The value can only be changed by creating a custom FieldType. The default value is 4, chosen by the Lucene team as a reasonable tradeoff between disk space consumption and performance.
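To see why a smaller precisionStep costs disk space, here is a minimal model of how many terms a 32-bit int value produces. It assumes one term per shift of precisionStep bits, with the sign bit flipped so that values order correctly as unsigned bits. The class PrecisionStepModel and its methods are hypothetical, and the model ignores how Lucene actually encodes each term into bytes:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how precisionStep controls the number of trie terms per 32-bit value.
// It mirrors the idea (one term per shift of precisionStep bits), not Lucene's byte encoding.
class PrecisionStepModel {

    // Shifts at which a 32-bit value gets a term: 0, step, 2*step, ... while below 32.
    static List<Integer> shifts(int precisionStep) {
        if (precisionStep < 1) throw new IllegalArgumentException("precisionStep must be >= 1");
        List<Integer> result = new ArrayList<>();
        // long loop variable so a huge step such as Integer.MAX_VALUE cannot overflow
        for (long shift = 0; shift < 32; shift += precisionStep) {
            result.add((int) shift);
        }
        return result;
    }

    // One prefix per shift: the value with its lowest `shift` bits dropped.
    static List<Integer> prefixes(int value, int precisionStep) {
        // Flip the sign bit so that, read as unsigned bits, negatives order before positives.
        int sortable = value ^ 0x80000000;
        List<Integer> result = new ArrayList<>();
        for (int shift : shifts(precisionStep)) {
            result.add(sortable >>> shift);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int step : new int[] {2, 4, 8, Integer.MAX_VALUE}) {
            System.out.println("precisionStep " + step + " -> "
                    + shifts(step).size() + " terms per int value");
        }
        // 230 and 239 share the prefix at shift 4 (same "bracket"); 240 does not.
        System.out.println(prefixes(230, 4).get(1).equals(prefixes(239, 4).get(1))); // true
        System.out.println(prefixes(230, 4).get(1).equals(prefixes(240, 4).get(1))); // false
    }
}
```

In this model the default of 4 gives 8 terms per int value, a step of 8 gives 4, and Integer.MAX_VALUE gives only the single full-precision term, which is the setting used later for the sort-only field.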

Numeric values in Lucene can be sorted, searched by range, and matched exactly, much as you would do with a text field. Note that if you intend to sort by a numeric field, you should create a separate single-valued field for sorting purposes (by setting precisionStep to Integer.MAX_VALUE, so each value is indexed as one full-precision term), as this is more efficient than using the bracketed index.

How to do it...

Let's look at a code sample for creating numeric fields:

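import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.LongField;

// One field per numeric type; Field.Store.YES keeps the original value retrievable
IntField intField = new IntField("int_value", 100, Field.Store.YES);
LongField longField = new LongField("long_value", 100L, Field.Store.YES);
FloatField floatField = new FloatField("float_value", 100.0F, Field.Store.YES);
DoubleField doubleField = new DoubleField("double_value", 100.0D, Field.Store.YES);

// Custom FieldType for a sort-only int field: one full-precision term per value
FieldType sortedIntField = new FieldType();
sortedIntField.setNumericType(FieldType.NumericType.INT);
sortedIntField.setNumericPrecisionStep(Integer.MAX_VALUE);
sortedIntField.setStored(false); // the value is already stored in intField
sortedIntField.setIndexed(true);
IntField intFieldSorted = new IntField("int_value_sort", 100, sortedIntField);

// Add all the fields to a single document
Document document = new Document();
document.add(intField);
document.add(longField);
document.add(floatField);
document.add(doubleField);
document.add(intFieldSorted);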

How it works...

Instantiating the different numeric fields is pretty much the same, as you can see in the code. The first parameter is the name of the field, the second is the value, and the last is a Field.Store value that controls whether the original value is stored. In our example, we specified that we want the field value stored by passing in Field.Store.YES.

In the second portion, where we define our own FieldType, we demonstrate how to create a single-valued IntField for sorting purposes. We set the numeric type to FieldType.NumericType.INT (the IntField constructor rejects a FieldType whose numeric type is not INT) and precisionStep to Integer.MAX_VALUE, which ensures that the value is not bracketed. Then we set stored to false, because the same int value is already stored in intField, and set indexed to true so that the field goes into the index. Lastly, we create another IntField called intFieldSorted that uses this custom FieldType.

The fields are now ready to be added to a document, as shown in the last portion of the code.
