Joining

In most real-world applications, models share relationships of some kind, either directly through their attributes or through an association "table". Traditional database engines use foreign keys to describe relationships, and SQL joins merge the related record sets together.

Solr has limited support for joining via its join query parsers (join and block-join). These query parsers use the local-params syntax, described earlier in this chapter, to express relationships between documents.

Note

These parsers are not equivalent to SQL joins. The main difference is that Solr joins do not merge related documents together in the search results. Solr joins are analogous to an SQL inner query in a WHERE clause.

The join query parser

The join query parser takes two attributes, from and to, both of which accept field names as their values. The parser runs its query, collects the from field values of the matching documents, and then returns the documents whose to field contains one of those values. Not surprisingly, the join parser also requires a query. It also supports joining across cores through its fromIndex option. As an example, let's say we'd like to fetch a set of documents from the mbartists core, where the artist has a certain release from the mbreleases core:

http://localhost:8983/solr/mbartists/select?q={!join from=r_a_id to=a_id fromIndex=mbreleases}r_id:139850&fl=type,a_id

The query portion of that URL is:

{!join from=r_a_id to=a_id fromIndex=mbreleases}r_id:139850

The resulting documents would be something like:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="type">Artist</str>
    <long name="a_id">11650</long>
  </doc>
</result>
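
The request above can also be assembled programmatically, which takes care of URL-encoding the braces, spaces, and colons in the local-params string. The following is a minimal Python sketch; the join_query helper is a hypothetical name introduced here for illustration, not part of Solr or any client library, and the host and core names are simply those used in the example above.

```python
from urllib.parse import urlencode


def join_query(from_field, to_field, query, from_index=None):
    """Build a {!join ...} local-params query string (hypothetical helper)."""
    parts = ["from=" + from_field, "to=" + to_field]
    if from_index:
        # fromIndex is only needed when joining across cores
        parts.append("fromIndex=" + from_index)
    return "{!join " + " ".join(parts) + "}" + query


# Field names, core names, and IDs are taken from the example above.
q = join_query("r_a_id", "a_id", "r_id:139850", from_index="mbreleases")

# urlencode percent-encodes the characters that are unsafe in a URL.
url = "http://localhost:8983/solr/mbartists/select?" + urlencode(
    {"q": q, "fl": "type,a_id"}
)
print(url)
```

Building the string in one place keeps the from, to, and fromIndex attributes easy to audit when a join returns unexpected results.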

For completeness, here's the same query using SQL:

SELECT type, a_id FROM mbartists a WHERE a.a_id IN (SELECT r.r_a_id FROM mbreleases r WHERE r.r_id = 139850);

The field type of the from and to fields should be the same.
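
To make the semantics concrete, here is a small in-memory Python sketch of what the join does conceptually. It is a simplification under stated assumptions, not how Solr implements the join internally: documents are plain dictionaries, the inner query is a Python predicate, and the second release and second artist are made-up sample data added only so that the filter has something to exclude.

```python
def solr_style_join(from_docs, to_docs, from_field, to_field, matches):
    """Conceptual model of {!join}: filter the "to" side by keys found on the "from" side."""
    # Step 1: run the inner query on the "from" side and collect its key values.
    keys = {d[from_field] for d in from_docs if from_field in d and matches(d)}
    # Step 2: return only "to"-side documents; nothing from the "from" side is merged in.
    return [d for d in to_docs if d.get(to_field) in keys]


# The first release and first artist mirror the example; the others are invented.
releases = [
    {"r_id": 139850, "r_a_id": 11650},
    {"r_id": 2, "r_a_id": 99},
]
artists = [
    {"type": "Artist", "a_id": 11650},
    {"type": "Artist", "a_id": 99},
]

result = solr_style_join(
    releases, artists, "r_a_id", "a_id", lambda d: d.get("r_id") == 139850
)
print(result)
```

Note that the returned artist document carries no release fields, which is the behavior that distinguishes a Solr join from an SQL join.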

Here's another example showing a join between more than one core/index within the same query. This also makes use of the special local-params v attribute (the query):

fq={!join from=childId1 to=primaryCoreId fromIndex=childCore1 v=$childQ1} AND {!join from=childId2 to=primaryCoreId fromIndex=childCore2 v=$childQ2}&childQ1=(field1:abc AND field2:[0 TO 1234])&childQ2=(field3:xyz)
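
Because the child queries are passed as separate request parameters and referenced with $childQ1 and $childQ2, each one has to be URL-encoded on its own. The Python sketch below only shows one way to encode the parameters from the example; the q=*:* main query is an assumption added here, since the example shows only the fq portion of the request.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "q": "*:*",  # assumed main query; not part of the original example
    "fq": (
        "{!join from=childId1 to=primaryCoreId fromIndex=childCore1 v=$childQ1}"
        " AND "
        "{!join from=childId2 to=primaryCoreId fromIndex=childCore2 v=$childQ2}"
    ),
    # The dereferenced parameters hold the actual child queries.
    "childQ1": "(field1:abc AND field2:[0 TO 1234])",
    "childQ2": "(field3:xyz)",
}

query_string = urlencode(params)

# Round-trip check: decoding gives back exactly what was put in.
decoded = parse_qs(query_string)
print(query_string)
```

Keeping the child queries in their own parameters avoids having to escape them inside the local-params braces.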

Tip

If there's a fair chance the same join query will occur again, put it in a filter query (the fq parameter) if you can, so that it will be cached.

One use of the Solr join is to put your volatile data in one core and the more static data in another, using joins to associate records at query time.

Join queries have no influence on relevancy or document scores. If you're up for customizing Solr though, the Lucene join module contains a scoring join query, which could be used with little effort.

Join queries can be slow: the more matching IDs there are, the longer the response time will become. In many cases, a carefully designed schema can satisfy most requirements by making good use of denormalization instead of joining.

Block-join query parsers

Block-join gets its name from its requirement that subdocuments be indexed together with their parent in one block, a layout that carries down to the underlying index. You can even have a nested hierarchy. This index requirement is a big limitation, though: you can't update any one document without reindexing the entire tree from its parent, and you can't use atomic updates. In exchange for this trade-off, you get very fast joins.

Chapter 4, Indexing Data, covers the details on nested documents, but we'll provide a simple example here. Our sample nested-docs.json file contains the following JSON:

{
  "add": [{
    "id": "1",
    "title_t": "Node A",
    "relType_s": "parent",
    "_childDocuments_": [{
      "id": "2",
      "title_t": "Node A:A"
    }]
  }, {
    "id": "3",
    "title_t": "Node B",
    "relType_s": "parent",
    "_childDocuments_": [{
      "id": "4",
      "title_t": "Node B:B"
    }]
  }]
}

As you can see, the relationships are all self-contained within the special _childDocuments_ array field. To index, we can simply use curl:

curl -H 'content-type: application/json' -X POST "http://127.0.0.1:8983/solr/collection1/update?commit=true" --data-binary @nested-docs.json
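
If you would rather post the file from a script, the same request can be prepared with Python's standard library. This is a sketch under the same assumptions as the curl command (a local Solr with a collection1 core); it builds the request object but leaves the actual network call commented out so that it can be inspected without a running server.

```python
import json
import urllib.request

# Same payload as nested-docs.json above.
payload = {
    "add": [
        {
            "id": "1",
            "title_t": "Node A",
            "relType_s": "parent",
            "_childDocuments_": [{"id": "2", "title_t": "Node A:A"}],
        },
        {
            "id": "3",
            "title_t": "Node B",
            "relType_s": "parent",
            "_childDocuments_": [{"id": "4", "title_t": "Node B:B"}],
        },
    ]
}

url = "http://127.0.0.1:8983/solr/collection1/update?commit=true"
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request to a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```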

Now that we have our nested documents indexed, we can query them using the block-join query parsers. There are two: block-join-parent and block-join-children. These parsers are quite different from the aforementioned join parser. Instead of matching documents on a field-based foreign key, they use one query to identify the parent documents and a second query to select the specific documents whose related parents or children should be returned.

The block-join-children parser

The block-join-children parser finds child documents given a query for parent documents. Its syntax requires one attribute called of, whose value is a simple Solr query that identifies all valid parent documents. The primary query then selects specific parent documents within this set, and their child documents are returned in the result set. For example, to find the child documents of Node A, we use the block-join-children parser as follows:

http://localhost:8983/solr/collection1/select?q={!child of="relType_s:parent"}title_t:"Node A"&wt=json&omitHeader=true

The query portion of that URL is:

{!child of="relType_s:parent"}title_t:"Node A"

That query yields this response:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "2",
        "title_t": "Node A:A"
      }
    ]
  }
}

This returns exactly what you'd expect: one child of Node A, Node A:A.
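
The reason this lookup is fast can be illustrated with a small in-memory model. The Python sketch below assumes the block layout in which each parent is stored immediately after its own children, so a single pass over the index is enough to pair them up. It is a conceptual simplification, not Solr's implementation: the two predicates stand in for the of filter and the primary query, and titles are compared by exact equality rather than analyzed text.

```python
# Documents in block order: children first, then their parent.
index = [
    {"id": "2", "title_t": "Node A:A"},
    {"id": "1", "title_t": "Node A", "relType_s": "parent"},
    {"id": "4", "title_t": "Node B:B"},
    {"id": "3", "title_t": "Node B", "relType_s": "parent"},
]


def children_of(index, is_parent, matches):
    """Return the children of every parent that satisfies `matches`."""
    results, block = [], []
    for doc in index:
        if is_parent(doc):
            # A parent closes its block; keep the block if the parent matched.
            if matches(doc):
                results.extend(block)
            block = []
        else:
            block.append(doc)
    return results


children = children_of(
    index,
    is_parent=lambda d: d.get("relType_s") == "parent",  # the "of" filter
    matches=lambda d: d.get("title_t") == "Node A",      # the primary query
)
print(children)
```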

The block-join-parent parser

To query for parent documents given a query for child documents, use the block-join-parent parser. This parser's syntax requires one attribute called which, whose value is a Solr query that identifies all valid parent documents. The primary query selects specific child documents, and their parent documents are returned in the result set. Here's an example:

http://localhost:8983/solr/collection1/select?q={!parent which="relType_s:parent"}title_t:"Node A:A"&wt=json&omitHeader=true

The query portion of that URL is:

{!parent which="relType_s:parent"}title_t:"Node A:A"

That query yields this response:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "title_t": "Node A",
        "relType_s": "parent",
        "_version_": 1492571473528750000
      }
    ]
  }
}

This is what was expected: one parent document, Node A.
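
The child-to-parent direction can be modeled in the same way. The Python sketch below again assumes that each parent is stored directly after its children, and it is only a conceptual illustration rather than Solr's implementation: the two predicates stand in for the which filter and the primary query, and titles are compared by exact equality rather than analyzed text.

```python
# Documents in block order: children first, then their parent.
index = [
    {"id": "2", "title_t": "Node A:A"},
    {"id": "1", "title_t": "Node A", "relType_s": "parent"},
    {"id": "4", "title_t": "Node B:B"},
    {"id": "3", "title_t": "Node B", "relType_s": "parent"},
]


def parents_of(index, is_parent, matches):
    """Return every parent that has at least one child satisfying `matches`."""
    results, hit = [], False
    for doc in index:
        if is_parent(doc):
            # The parent ends the block; emit it if any child in the block matched.
            if hit:
                results.append(doc)
            hit = False
        elif matches(doc):
            hit = True
    return results


parents = parents_of(
    index,
    is_parent=lambda d: d.get("relType_s") == "parent",  # the "which" filter
    matches=lambda d: d.get("title_t") == "Node A:A",    # the primary query
)
print([d["id"] for d in parents])
```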

Note

There are other join implementations currently available as patch files on SOLR-4787: PostFilterJoin, to join records that match the main query, and ValueSourceJoin, to return values from the second core based on the join query.
