The QueryElevation component

At times, you may want to make editorial (manual) modifications to the search results of particular user queries. One reason is a popular user query that doesn't score an expected document sufficiently high, if it matches at all. The query might have found nothing, perhaps due to a common misspelling. The opposite may also be true: the top result for a popular user query might be a document that technically matched according to your search configuration but certainly isn't what the user was looking for. Another usage scenario is implementing a system akin to paid keywords, in which certain documents are placed on top for certain user queries.

Tip

This feature isn't a general approach to fixing queries that don't yield effective search results; it is a Band-Aid for that problem. If a query isn't returning an expected document scored sufficiently high (if at all), then use Solr's query debugging to observe the score computation. If a search query doesn't match an expected document, you may end up troubleshooting text analysis issues too, perhaps resolving them by adding a synonym. The end result may be tuning the boosts or applying function queries to incorporate other relevant fields into the scoring. When you are satisfied with the scoring and just need to make an occasional editorial decision, this component is for you.

Configuration

This search component is not in the standard component list, so it must be registered with a handler in solrconfig.xml. Here, just for this example, we'll add it to the /mb_artists request handler definition:

<requestHandler name="/mb_artists" class="solr.SearchHandler">
  <lst name="defaults">
…
  </lst>
  <arr name="last-components">
    <str>elevateArtists</str>
  </arr>
</requestHandler>

<searchComponent name="elevateArtists" 
    class="solr.QueryElevationComponent">
  <str name="queryFieldType">text</str>
  <str name="config-file">elevateArtists.xml</str>
  <str name="forceElevation">false</str>
</searchComponent>

This excerpt also reveals the registration of the search component using the same name as that referenced in last-components. A name was chosen to reflect the fact that this elevation configuration is only for artists. There are three named configuration parameters for a query elevation component, and they are explained as follows:

  • config-file: This is a reference to the configuration file containing the editorial adjustments. It is resolved relative to Solr's conf directory first and, if that fails, relative to Solr's data directory.

    Note

    When it's in the data directory (usually a sibling to conf), it will be reloaded when Solr commits.

  • queryFieldType: This is a reference to a field type in schema.xml. It is used to normalize both a query (the q parameter) and the query text attribute found in the configuration file, for comparison purposes. A field type might be crafted just for this purpose, but it should suffice to simply choose one that at least performs lowercasing. By default, there is no normalization.
  • forceElevation: The query elevation component fools Solr into thinking the specified documents matched the user's query and scored the highest. However, by default, it will not violate the desired sort as specified by the sort parameter. To force the elevated documents to the top no matter what the sort is, set this parameter to true.

    Note

    A new option in Solr 4.7 is the ability for a request to specify which docs to elevate (or exclude) via the elevateIds and excludeIds request parameters (comma-delimited unique key IDs), which override the config file.
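As an illustration of the queryFieldType advice above, here is a minimal sketch of a field type crafted just for elevation matching. The name elevationKey is hypothetical and is not part of the example configuration; the analysis chain is only one reasonable choice, keeping the whole query as a single token and then lowercasing and trimming it:

```xml
<!-- schema.xml: hypothetical field type used only to normalize elevation query text -->
<fieldType name="elevationKey" class="solr.TextField">
  <analyzer>
    <!-- treat the entire query string as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- make matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ignore leading and trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

To use it, the search component's queryFieldType value would be set to elevationKey instead of text. A field type with a more aggressive analysis chain (stemming, for example) would make more user queries match the same elevation entry, which may or may not be what you want.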

Let's take a peek at elevateArtists.xml:

<elevate>
  <query text="corgan">
    <doc id="Artist:11650" /><!-- the Smashing Pumpkins -->
    <doc id="Artist:510" /><!-- Green Day -->
    <doc id="Artist:35656" exclude="true" /><!-- Starchildren -->
  </query>
  <!-- other queries... -->
</elevate>

In this elevation file, we've specified that when a user searches for corgan, the Smashing Pumpkins and then Green Day should appear in the top two positions in the search results, and that the artist Starchildren is to be excluded. Note that query elevation kicks in only when the configured query text matches the user's query exactly, after the configured text analysis has been applied to both. Thus, a search for billy corgan would not be affected by this configuration. It shouldn't be surprising that the documents are listed by ID in this file, but those IDs alone may not be meaningful to whoever reads this file, so we suggest using comments to clarify the intent of the changes, as seen here.
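Because matching is exact after analysis, a longer phrase needs its own entry. As a sketch, assuming you also wanted the Smashing Pumpkins on top for billy corgan (an editorial choice made up here for illustration, not part of the original configuration), the file could carry a second query element:

```xml
<elevate>
  <!-- the existing "corgan" entry stays as is -->
  <query text="corgan">
    <doc id="Artist:11650" /><!-- the Smashing Pumpkins -->
    <doc id="Artist:510" /><!-- Green Day -->
    <doc id="Artist:35656" exclude="true" /><!-- Starchildren -->
  </query>
  <!-- hypothetical extra entry: a different query text needs its own element -->
  <query text="billy corgan">
    <doc id="Artist:11650" /><!-- the Smashing Pumpkins -->
  </query>
</elevate>
```

Each query element is independent, so the exclusion of Starchildren in the first entry would not carry over to the second unless it is repeated there.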

This component is quite simple with unsurprising results, so an example of this in action is not given. The only thing notable about the results when searching for corgan with the preceding configuration is that the top two results, the Smashing Pumpkins and Green Day, have scores of 1.72 and 0.0, respectively, yet the maxScore value in the result element is 11.3. Normally, a default sort results in the first document having the same score as the maximum score, but in this case that happens at the third position, as the first two were inserted by the query elevation component. Moreover, normally a result document has a score greater than 0, but in this case one was inserted by this component that never matched the user's query.
