Avoiding eager fetching

While eager fetching of collections looks like a simple and quick solution to select N+1 problem, it does introduce an issue. Since eager fetching uses left outer joins, it results in Cartesian product of matching row in root table and collection table. Following unit test illustrates this behavior. We have three employees stored in-memory database. All three employees live in London. Two of the three employees are members of a community and the third employee is member of two communities. The query in the following unit test loads all employees living in London and also eagerly fetches their Communities collection. I expect to get three employee instance back.

[Test]
public void QueryOver()
{
  using (var transaction = Database.Session.BeginTransaction())
  {
    Address residentialAddress = null;
    var employees = Database.Session.QueryOver<Employee>()
            .JoinAlias(e => e.ResidentialAddress, 
            () => residentialAddress)
            .Where(() => residentialAddress.City == "London")
            .Fetch(e => e.Communities).Eager
            .List<Employee>();

    Assert.That(employees.Count, Is.EqualTo(3));
    transaction.Commit();
  }
}

If you run this test, you will note that we get four records from database and the test fails. Why did that happen? If we look at the SQL that is issued by the preceding query, which is shown next, we realize that there is a join between the Employee and Community tables:

SELECT          this_.id AS id0_2_, 
                //... employee columns 
                residentia1_.id AS id2_0_, 
                //... address columns 
                communitie4_.employee_id AS employee1_4_, 
                community5_.id           AS community2_4_, 
                //... community columns 
FROM            employee this_ 
INNER JOIN      address residentia1_ 
ON              this_.id=residentia1_.employee_id 
LEFT OUTER JOIN employee_community communitie4_ 
ON              this_.id=communitie4_.employee_id 
LEFT OUTER JOIN community community5_ 
ON              communitie4_.community_id=community5_.id 
WHERE           residentia1_.city = @p0;@p0 = 'London' [Type: String (0)]

To LC: Format the following part of the above in Code Highlight

LEFT OUTER JOIN employee_community communitie4_ 
ON              this_.id=communitie4_.employee_id 
LEFT OUTER JOIN community community5_ 
ON              communitie4_.community_id=community5_.id 

The left outer join between Employee and Community makes the employee who is member of two communities appear in the result twice. So we get a total of 4 records – 2 each for the employees who are member of one community and 2 for employee who is member of 2 communities. If we had 1000 qualifying employees and each a member of 2 communities, we would have got back 2000 results. The problem amplifies even more if we eagerly fetch more than one collection in the same query. In that case, the original Cartesian product would be multiplied by the number of the items in the new collection.

This is such a common problem that NHibernate has provided a neat fix for this. The fix is in the form of a result transformer. Result transformer operates on the results loaded from database and transforms them into a different form. A result transformer is represented by interface IResultTransformer. Various different implementations of this interface are available to satisfy different result transformation requirements. The one we are interested in here is called DistinctRootEntityResultTransformer. This implementation takes the query results which may contain duplicates due to Cartesian product, de-duplicates the root entity, and places the collection items under the correct instance of root entity. If we change the query from the previous unit test to something as follows, then the test would pass:

Address residentialAddress = null;
var employees = Database.Session.QueryOver<Employee>()
.JoinAlias(e => e.ResidentialAddress, () => residentialAddress)
.Where(() => residentialAddress.City == "London")
.Fetch(e => e.Communities).Eager
.TransformUsing(Transformers.DistinctRootEntity).List<Employee>();

We have registered a result transformer to be used with this query by placing a call to method TransformUsing just before we call the terminal method List<Employee>. The parameter passed to this method is an instance of IResultTransformer that we want to be used. We have used factory class Transformers which make easy to work with all in-built transformers. Transformers.DistinctRootEntity is what would give us the unique results.

Note

I have used QueryOver example as it is explicit about use of transformers. LINQ possesses the same problem with Cartesian products but you would never see it because it transparently applies DistinctRootEntityResultTransformer for you all the time.

So are we saying that DistinctRootEntityResultTransformer fixes the problem introduced by eager fetching? In a way yes, but the solution is not most elegant. Result transformers do their transformation in memory. That means we are loading all the results out of a Cartesian product into memory first and then processing them before they are returned to our code. Cartesian product has a tendency to grow too large too soon. There could be situation when a Cartesian product inadvertently returns hundreds of thousands of records which would kill your application, both memory-and-CPU wise. For these reasons it is best to avoid eager fetching if you can. What to do when we do not want collections lazily loaded because of select N+1 issue and we also do not want them to be eagerly fetched due to Cartesian product issue? Solution is to use future queries.

Future queries to the rescue

We have seen that future queries let us bundle multiple queries together and have them executed in one database roundtrip. We can use future queries to give us the effect of eager loading without side-effects from Cartesian products. Suppose we want to write the same query we saw previously, and in addition to the Communities collection also load the Benefits collection. Following code snippet shows how we can do it using future query:

Address residentialAddress = null;
var employees = Database.Session.QueryOver<Employee>()
  .JoinAlias(e => e.ResidentialAddress, () => residentialAddress)
  .Where(() => residentialAddress.City == "London")
  .Future<Employee>();

Database.Session.QueryOver<Employee>()
  .JoinAlias(e => e.ResidentialAddress, () => residentialAddress)
  .Where(() => residentialAddress.City == "London")
  .Fetch(e => e.Communities).Eager
  .Future<Employee>();

Database.Session.QueryOver<Employee>()
  .JoinAlias(e => e.ResidentialAddress, () => residentialAddress)
  .Where(() => residentialAddress.City == "London")
  .Fetch(e => e.Benefits).Eager
  .Future<Employee>();

The important parts are in bold in the preceding code. We have used the base query to retrieve employees living in London and turned that into a future query by calling Future<Employee> in the end. The next two queries may seem like repetition of the first query but if you look carefully, they are eagerly fetching the Communities and Benefits collections. These queries are also marked as future queries but their results are not stored in any local variable. The results are discarded. Internally, NHibernate still tracks and executes these queries when the first query is executed. This is what happens internally.

The first query is executed and the results from that query are kept in identity map of session. Second query is not any different from its non-future version. Following SQL is emitted for the second query:

SELECT this_.*, 
       community5_.*, 
FROM   employee this_ 
       INNER JOIN address residentia1_ 
               ON this_.id = residentia1_.employee_id 
       LEFT OUTER JOIN employee_community communitie4_ 
                    ON this_.id = communitie4_.employee_id 
       LEFT OUTER JOIN community community5_ 
                    ON communitie4_.community_id = community5_.id 
WHERE  residentia1_.city = 'London';

Note

This is an intentionally shortened version of actual SQL generated when the future query is run.

This is still a Cartesian product between Employee and Community. NHibernate uses only part of this result to populate the Communities collection on the employee instances, that it stored in the identity map populated using the results from the first query. Everything else from this query is discarded. The same is repeated for the results of the third query.

Two things to note here – we have not avoided the Cartesian products altogether but there is not significant processing of the results in memory. Also, in case of eager fetching multiple collections, we got a big Cartesian product involving all the collections. With future queries we get small Cartesian products involving only root entity and one collection at a time. This significantly reduces the data coming back from database and also the amount of processing required on the data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.181.57