Since DocTransformers can manipulate the data returned for a given document, they are very useful for adding custom normalizations or simple data manipulations. A brief look at how to design a new transformer will make this clearer.
Imagine we had a transformer designed to avoid returning the empty values in our results (in our case there will be several empty fields):
>> curl -X GET 'http://localhost:8983/solr/paintings_transformers/select?q=*:*&fl=*,[noempty]&wt=json&indent=true'
The previous request works if we have added two lines like these to our solrconfig.xml:
<lib dir="${solr.core.instanceDir}/lib/" regex="solr-plugins-java.jar" />
<transformer name="noempty" class="it.seralf.solrbook.doctransformers.RemoveEmptyFieldsTransformerFactory" />
Here, the factory class is used to create an instance of RemoveEmptyFieldsTransformer, which has an outline as shown in the following code:
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.response.transform.DocTransformer;

class RemoveEmptyFieldsTransformer extends DocTransformer {

    // The name under which this transformer is recognized
    final String name = "noempty";

    @Override
    public String getName() {
        return this.name;
    }

    private void removeEmpty(final List<?> list) {
        // TODO: remove empty fields in a list
    }

    @Override
    public void transform(final SolrDocument doc, final int docid) {
        final Iterator<Entry<String, Object>> it = doc.entrySet().iterator();
        while (it.hasNext()) {
            final Entry<String, Object> entry = it.next();
            if (entry.getValue() == null) {
                // drop fields that have no value at all
                it.remove();
            } else if (entry.getValue() instanceof List<?>) {
                final List<?> list = (List<?>) entry.getValue();
                removeEmpty(list);
                if (list.isEmpty()) it.remove(); // if the list is empty
            }
        }
    }
}
The compiled jar containing the class will be in the /lib directory under the core folder. I've omitted the details here; you can find the complete runnable example under the path /SolrStarterBook/solr-app/chp04/paintings_transformers/, and the source in the project /SolrStarterBook/projects/solr-maven/solr-plugins-java.
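To give an idea of what the omitted logic might look like, here is a minimal sketch that does not depend on Solr. It works on a plain java.util.Map as a stand-in for a SolrDocument, so it can be run on its own. The class name EmptyFieldRemover, the sample field names, and the choice to treat blank strings as empty are all assumptions made for illustration; the transformer in the book's project may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: operates on a plain Map instead of a SolrDocument,
// so it runs without Solr on the classpath.
class EmptyFieldRemover {

    // Assumption for this sketch: null and blank strings count as "empty".
    static boolean isEmpty(Object value) {
        return value == null
            || (value instanceof String && ((String) value).trim().isEmpty());
    }

    // Removes empty elements from a multivalued field, in place.
    static void removeEmpty(List<?> list) {
        Iterator<?> it = list.iterator();
        while (it.hasNext()) {
            if (isEmpty(it.next())) {
                it.remove();
            }
        }
    }

    // Drops every field whose value is empty, or whose list ends up empty.
    static void removeEmptyFields(Map<String, Object> doc) {
        Iterator<Map.Entry<String, Object>> it = doc.entrySet().iterator();
        while (it.hasNext()) {
            Object value = it.next().getValue();
            if (value instanceof List<?>) {
                List<?> list = (List<?>) value;
                removeEmpty(list);
                if (list.isEmpty()) {
                    it.remove();
                }
            } else if (isEmpty(value)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Sample document with hypothetical field names.
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("title", "Mona Lisa");
        doc.put("subject", "");
        doc.put("comment", null);
        doc.put("museum", new ArrayList<String>(Arrays.asList("louvre", "")));
        doc.put("tags", new ArrayList<String>(Arrays.asList("", null)));

        removeEmptyFields(doc);
        System.out.println(doc); // prints {title=Mona Lisa, museum=[louvre]}
    }
}
```

Removing entries through the iterator, as the outline above also does, avoids a ConcurrentModificationException while the map is being traversed.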
The Java class structure is really simple. Every new transformer must extend the general, abstract DocTransformer class. A single transform() method contains all the logic; it receives a SolrDocument object whose values it can transform. We will see this behavior again when we introduce customizations inside the update chain.
The getName() method is used to correctly identify this object. This allows us to invoke it from requests by the name noempty. The same name is used in the configuration file to bind the name to the Java class. Also, notice that a very similar approach can be used to introduce custom functions.
At this point, you are probably curious about all the parameters that we can use within a Solr request. A complete list would be tedious and difficult to read, because there are many specialized parameters and sometimes field-specific parameters. I suggest you start by reading carefully how to use the main, basic ones, which you can find in the following table:
| Parameter | Meaning | Default value |
|---|---|---|
| | This is the query. | |
| | This is the query parser that will be used. | Lucene query parser |
| | The Boolean query operator used: | |
| | This is the default field that will be deprecated soon. | defined in |
| | The ordinal number of the first document to be returned in the results. | 0 |
| | The number of documents to be returned in the results. | 10 |
| | The list of the fields to be exposed in the results. | all |
| | This is a filter query for filtering the results. | N/A |
| | These parameters are useful to request documents with a score greater than a certain value. Notice that they need the implicit | N/A |
| | This produces a response without the header part. | false |
| | This is the type of the query handler. | Lucene standard |
| | This is the writer for the response formatting. | XML |
| | This adds debugging info to the output. | false |
| | This is the maximum time allowed (in milliseconds) for executing a query. | N/A |
As usual, this should be considered as a list from which to start. If you want more details, you should check the wiki page for the list:
http://wiki.apache.org/solr/CommonQueryParameters
You can also look at the following wiki pages:
http://wiki.apache.org/solr/SearchHandler
http://wiki.apache.org/solr/SearchComponent
In these pages, you will find that there are actually several different search components that can extend this list. We will cover the most interesting ones for our purposes in Chapter 5, Extending Search, and Chapter 6, Using Faceted Search – from Searching to Finding. Although they can be used in combination with the ones shown in the table, at the moment we will remain focused on the most essential parts of a Solr query.
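To make the basic parameters more concrete, the following sketch assembles a select URL from a handful of them: q, fq, start, rows, fl, wt, and indent. It reuses the core URL from the earlier curl example and the museum:louvre query used in this chapter. The class name SelectUrlBuilder and the specific values chosen are assumptions for illustration only, and the code uses nothing beyond the Java standard library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr select URL from request parameters.
class SelectUrlBuilder {

    // URL-encodes a single parameter name or value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends the parameters to the base URL, in insertion order.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // A small set of common parameters; the values are illustrative.
    static Map<String, String> basicParams() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "*:*");            // match all documents
        params.put("fq", "museum:louvre"); // filter the results
        params.put("start", "0");          // first document to return
        params.put("rows", "10");          // number of documents to return
        params.put("fl", "*,score");       // all fields plus the score
        params.put("wt", "json");          // response writer
        params.put("indent", "true");      // pretty-print the response
        return params;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/paintings_transformers/select";
        System.out.println(buildUrl(base, basicParams()));
    }
}
```

Encoding each value separately keeps characters such as the colon and the comma from being misread as part of the URL syntax.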
We can explicitly choose the Lucene query parser, which is the default, or a custom parser. The parser can be selected through local parameters or through the defType parameter; the following forms are equivalent:
q={!query defType=lucene}museum:louvre
q={!lucene}museum:louvre
q=museum:louvre&defType=lucene
q=museum:louvre
For the moment, we will use the last form, which has the simplest syntax. The others are useful in more advanced searches, when you need to choose a specific alternative parser for queries or for handling sub-queries.
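As a small illustration of the forms listed above, the following sketch produces the local-parameter form and the defType form for a given parser name. The class and method names are hypothetical, and the strings are left unencoded for readability; in a real request they would need to be URL-encoded.

```java
// Hypothetical helper: shows two ways of naming the query parser.
class ParserSelection {

    // Local-parameter form, for example {!lucene}museum:louvre
    static String withLocalParams(String parser, String query) {
        return "{!" + parser + "}" + query;
    }

    // Request-parameter form, for example q=museum:louvre&defType=lucene
    static String withDefType(String parser, String query) {
        return "q=" + query + "&defType=" + parser;
    }

    public static void main(String[] args) {
        System.out.println("q=" + withLocalParams("lucene", "museum:louvre"));
        System.out.println(withDefType("lucene", "museum:louvre"));
    }
}
```

The local-parameter form travels with the query string itself, which is what makes it convenient when only one sub-query needs a different parser.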
Notice that the default field parameter df can be used to specify a default field in which to search, but it will be deprecated in future versions, so I suggest not using it.