Time for action – adding a custom DocTransformer to hide empty fields in the results

Because a DocTransformer can manipulate the data returned for each document, it is a convenient place for custom normalizations or simple data manipulations. A brief look at how a new transformer is designed should make this clearer.

Imagine we have a transformer designed to omit empty values from our results (in our case, there will be several empty fields):

>> curl -X GET 'http://localhost:8983/solr/paintings_transformers/select?q=*:*&fl=*,[noempty]&wt=json&indent=true'

The previous request works if we have added two lines like these in our solrconfig.xml:

<lib dir="${solr.core.instanceDir}/lib/" regex="solr-plugins-java.jar" />
<transformer name="noempty" class="it.seralf.solrbook.doctransformers.RemoveEmptyFieldsTransformerFactory" />

Here, the factory class is used to create an instance of RemoveEmptyFieldsTransformer, which has an outline as shown in the following code:

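import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.response.transform.DocTransformer;

class RemoveEmptyFieldsTransformer extends DocTransformer {
  // the name used inside the fl parameter: fl=*,[noempty]
  final String name = "noempty";
  @Override
  public String getName() {
    return this.name;
  }
  private void removeEmpty(final List<?> list){
    // TODO: remove empty fields in a list
  }
  @Override
  public void transform(final SolrDocument doc, final int docid) {
    // use an explicit Iterator so that entries can be removed while looping
    final Iterator<Entry<String, Object>> it = doc.entrySet().iterator();
    while(it.hasNext()){
      final Entry<String, Object> entry = it.next();
      if(entry.getValue() == null) {
        it.remove(); // field with no value at all
      }else if(entry.getValue() instanceof List<?>){
        final List<?> list = (List<?>)entry.getValue();
        removeEmpty(list);
        if(list.isEmpty()) it.remove(); // multivalued field left with no values
      }
    }
  }
}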

The compiled jar containing the class must be placed in the /lib directory under the core folder. I've omitted the details here, but you can find the complete runnable example under the path /SolrStarterBook/solr-app/chp04/paintings_transformers/, and the source in the project /SolrStarterBook/projects/solr-maven/solr-plugins-java.

What just happened?

The Java class structure is really simple. Every new transformer must extend the abstract DocTransformer class. A single transform() method contains all the logic: it receives a SolrDocument object and modifies its values in place. We will see this pattern again when we introduce customizations inside the update chain.

The getName() method is used to identify this object, which allows us to invoke it from requests by the name noempty. The same name is used in the configuration file to bind it to the Java class. Also, notice that a very similar approach can be used to introduce custom functions.
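To see the traversal performed by transform() in isolation, here is a minimal sketch that runs without Solr. A plain LinkedHashMap stands in for the SolrDocument, and the removeEmpty() body is only one possible way to fill in the omitted method: treating null elements and blank strings as empty is an assumption of this sketch, not necessarily what the book's full source does. The class name EmptyFieldFilter and the sample field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: a plain Map plays the role of the SolrDocument.
class EmptyFieldFilter {

  // Assumption: a list element is "empty" when it is null or a blank String.
  static void removeEmpty(final List<?> list) {
    final Iterator<?> it = list.iterator();
    while (it.hasNext()) {
      final Object value = it.next();
      if (value == null
          || (value instanceof String && ((String) value).trim().isEmpty())) {
        it.remove();
      }
    }
  }

  // Same traversal as transform(): drop null values and lists that end up empty.
  static void removeEmptyFields(final Map<String, Object> doc) {
    final Iterator<Map.Entry<String, Object>> it = doc.entrySet().iterator();
    while (it.hasNext()) {
      final Object value = it.next().getValue();
      if (value == null) {
        it.remove();
      } else if (value instanceof List<?>) {
        final List<?> list = (List<?>) value;
        removeEmpty(list);
        if (list.isEmpty()) {
          it.remove();
        }
      }
    }
  }

  public static void main(final String[] args) {
    final Map<String, Object> doc = new LinkedHashMap<String, Object>();
    doc.put("title", "Mona Lisa");
    doc.put("subject", null);
    doc.put("tags", new ArrayList<Object>(Arrays.asList("portrait", "", null)));
    doc.put("notes", new ArrayList<Object>(Arrays.asList("", " ")));
    removeEmptyFields(doc);
    System.out.println(doc); // {title=Mona Lisa, tags=[portrait]}
  }
}
```

Removing entries through the Iterator, rather than through the map or list directly, is what keeps the loop from failing with a ConcurrentModificationException.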

Looking at core parameters for queries

At this point, you are probably curious about all the parameters that we can use within a Solr request. A complete list would be tedious and difficult to read, because there are many specialized parameters and sometimes even field-specific parameters. I suggest you start by reading carefully how to use the main, basic ones, which you can find in the following table:

| Parameter | Meaning | Default value |
|---|---|---|
| q | This is the query. | *:* |
| defType | This is the query parser that will be used. | Lucene query parser |
| q.op | The Boolean query operator used: AND/OR. | OR |
| df | This is the default field to search when a query term names no field. | defined in schema.xml |
| start | The ordinal number of the first document to be returned in the results. | 0 |
| rows | The number of documents to be returned in the results. | 10 |
| fl | The list of the fields to be exposed in the results. | all |
| fq | This is a filter query for filtering the results. | N/A |
| pageDoc and pageScore | These parameters are useful to request documents with a score greater than a certain value. Notice that they need the implicit score field. | N/A |
| omitHeader | This produces a response without the header part. | false |
| qt | This is the type of the query handler. | Lucene standard |
| wt | This is the writer for the response formatting. | XML |
| debug | This adds debugging info to the output. | false |
| timeAllowed | This is the maximum time allowed (in milliseconds) for executing a query. | N/A |
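To make the table a little more concrete, the following sketch assembles a select URL from some of these parameters using only the Java standard library. The class SelectUrlBuilder and its methods are hypothetical helpers written for this illustration, not part of Solr or SolrJ, and the zero-based page numbering is an assumption of the sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a /select URL from the core query parameters.
class SelectUrlBuilder {

  // Converts a zero-based page number into a value for the start parameter.
  static int startFor(final int page, final int rows) {
    return page * rows;
  }

  // URL-encodes a parameter value; UTF-8 is always available on the JVM.
  static String encode(final String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  // Joins the parameters as name=value pairs, in insertion order.
  static String buildUrl(final String handlerUrl, final Map<String, String> params) {
    final StringBuilder url = new StringBuilder(handlerUrl);
    char separator = '?';
    for (final Map.Entry<String, String> p : params.entrySet()) {
      url.append(separator).append(p.getKey()).append('=').append(encode(p.getValue()));
      separator = '&';
    }
    return url.toString();
  }

  public static void main(final String[] args) {
    final int rows = 10;
    final Map<String, String> params = new LinkedHashMap<String, String>();
    params.put("q", "*:*");                                  // match every document
    params.put("fq", "museum:louvre");                       // filter the results
    params.put("start", String.valueOf(startFor(2, rows)));  // third page
    params.put("rows", String.valueOf(rows));
    params.put("fl", "*,[noempty]");                         // all fields, plus our transformer
    params.put("wt", "json");
    System.out.println(buildUrl(
        "http://localhost:8983/solr/paintings_transformers/select", params));
  }
}
```

Encoding every value matters because characters such as the colon, comma, and square brackets are reserved in URLs; curl is often forgiving about them, but an HTTP client library may not be.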

As usual, treat this list as a starting point. If you want more details, you should check the wiki page for the complete list:

http://wiki.apache.org/solr/CommonQueryParameters

You can also look at the following wiki pages:

http://wiki.apache.org/solr/SearchHandler

http://wiki.apache.org/solr/SearchComponent

In these pages, you will find that several other search components can extend this list. We will cover the most interesting ones for our purposes in Chapter 5, Extending Search, and Chapter 6, Using Faceted Search – from Searching to Finding. Although they can be combined with the parameters shown in the table, for the moment we will stay focused on the most essential parts of a Solr query.

Using the Lucene query parser with defType

We can explicitly select the Lucene query parser, which is the default, or an alternative parser, either through the defType parameter or through the local parameters syntax. The following forms are all equivalent:

  • q={!query defType=lucene}museum:louvre
  • q={!lucene}museum:louvre
  • q=museum:louvre&defType=lucene
  • q=museum:louvre

For the moment, we will use the last form, which has the simplest syntax. The others become useful in more advanced searches, when you need to choose a specific alternative parser for a query or for handling subqueries.
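If you build these requests from code rather than typing them, the difference between the forms comes down to where the parser name goes. The sketch below, using only the Java standard library, produces the local parameters form and the defType form for the same query. The class ParserSelection and its method names are hypothetical and exist only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: two ways of naming the query parser for the same query.
class ParserSelection {

  // URL-encodes a value; UTF-8 is always available on the JVM.
  static String encode(final String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  // Local parameters form: the parser name is a {!name} prefix inside q itself.
  static String withLocalParams(final String parser, final String query) {
    return "q=" + encode("{!" + parser + "}" + query);
  }

  // defType form: q stays untouched and the parser is a separate parameter.
  static String withDefType(final String parser, final String query) {
    return "q=" + encode(query) + "&defType=" + encode(parser);
  }

  public static void main(final String[] args) {
    System.out.println(withLocalParams("lucene", "museum:louvre"));
    // q=%7B%21lucene%7Dmuseum%3Alouvre
    System.out.println(withDefType("lucene", "museum:louvre"));
    // q=museum%3Alouvre&defType=lucene
  }
}
```

The local parameters form keeps the parser choice attached to one specific query string, which is why it is the one you will need for subqueries; defType applies to the main q parameter as a whole.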

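Notice that the df parameter specifies the default field in which to search when a term names no field. What is deprecated is the defaultSearchField element in schema.xml, not the df parameter itself. Even so, I suggest naming the field explicitly in the query, as in museum:louvre, which keeps requests unambiguous.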
