Avoiding the select N+1 problem

We had briefly mentioned select N+1 problem in the previous chapter. Select N+1 is bad from both performance and memory consumption point of view. We are going to discuss how we can avoid this problem. Before we get our hands dirty, let's spend some time trying to understand what is select N+1 problem.

What is the select N+1 problem?

It is easier to understand select N+1 problem if we read it in reverse – 1+N select. That is right. Let's see how.

Suppose you want to load all employees living in London and then for every employee iterate through the benefits that they are getting. Following is one way of doing that using a LINQ query:

[Test]
public void WithSelectNPlusOneIssue()
{
  using (var transaction = Database.Session.BeginTransaction())
  {
    var employees = Database.Session.Query<Employee>()
    .Where(e => e.ResidentialAddress.City == "London");

    foreach (var employee in employees)
    {
      foreach (var benefit in employee.Benefits)
      {
        Assert.That(benefit.Employee, Is.Not.Null);
    }
  }
  transaction.Commit();
  }
}

Let me state one thing about the preceding code before we continue with our discussion. I have inserted a not so meaningful assert in there because I needed some work to be done on each benefit instance in the Benefits collection. In real life, this could be some business operation that needs to be performed on the loaded benefit object. With that clear, let's get back to our discussion.

If you notice, the Benefits collection is lazily loaded as we iterate through the loaded employee instances. When the previous code is run, we get the following SQL. This is a stripped down version only showing the parts relevant to this discussion.

SELECT     employee0_.id AS id0_, 
           //...other columns...// 
FROM       employee employee0_ 
INNER JOIN address address1_ 
ON         employee0_.id=address1_.employee_id 
WHERE      address1_.city=@p0;
@p0 = 'London' [Type: String (0)]

SELECT          benefits0_.employee_id AS employee4_1_, 
                benefits0_.id          AS id1_, 
                //...other columns...// 
FROM            benefit benefits0_ 
LEFT OUTER JOIN leave benefits0_1_ 
ON              benefits0_.id=benefits0_1_.id 
LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ 
ON              benefits0_.id=benefits0_2_.id 
LEFT OUTER JOIN seasonticketloan benefits0_3_ 
ON              benefits0_.id=benefits0_3_.id 
WHERE           benefits0_.employee_id=@p0;
@p0 = 11 [Type: Int32 (0)]

SELECT          benefits0_.employee_id AS employee4_1_, 
                benefits0_.id          AS id1_, 
                //...other columns...// 
FROM            benefit benefits0_ 
LEFT OUTER JOIN leave benefits0_1_ 
ON              benefits0_.id=benefits0_1_.id 
LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ 
ON              benefits0_.id=benefits0_2_.id 
LEFT OUTER JOIN seasonticketloan benefits0_3_ 
ON              benefits0_.id=benefits0_3_.id 
WHERE           benefits0_.employee_id=@p0;
@p0 = 12 [Type: Int32 (0)]

SELECT          benefits0_.employee_id AS employee4_1_, 
                benefits0_.id          AS id1_, 
                //...other columns...// 
FROM            benefit benefits0_ 
LEFT OUTER JOIN leave benefits0_1_ 
ON              benefits0_.id=benefits0_1_.id 
LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ 
ON              benefits0_.id=benefits0_2_.id 
LEFT OUTER JOIN seasonticketloan benefits0_3_ 
ON              benefits0_.id=benefits0_3_.id 
WHERE           benefits0_.employee_id=@p0;
@p0 = 13 [Type: Int32 (0)]

There are a total of four SELECT statements in the preceding SQL. First one is to retrieve employees living in London. We got three instances of such employees having ID 11, 12, and 13. The next three SELECT statements are to fetch the Benefits collection for each employee instance. Do you see the pattern here? We have got one SELECT that returns N records and then for each of those N records we get an additional SELECT. So we end up getting 1 + N SELECT statements. That is select N+1 problem for us. Or read in the reverse order 1 + N SELECTs problem.

Why select N+1 is a problem?

Following are the three reasons why select N+1 is not a good thing:

  • In terms of algorithmic complexity, we have got complexity of order O(N). For a small number of N, it is fine to send multiple SELECT statements to database but as N increases, then sending hundreds of SELECT statements to database could have significant impact on application's behavior.
  • While the N SELECT statements are bad, what is worse is that each of those SELECT statements are sent to database in a separate roundtrip. Imagine what would happen to network traffic for a large number of N.
  • Those N SELECT statements are not the best in class. They are querying on foreign key that is present on the table. If this key is not indexed then you are doing a full table scan to get your results. For a large value of N, doing full table scan hundreds of times in succession is not a good idea.

How do we fix the select N+1 problem?

Select N+1 has multiple problems and there are multiple solutions to it. Not every solution addresses every problem of select N+1 but you can choose the right solution depending on severity of issue at hand. For example, you can use the following batching solution to bundle up number of SELECT statements in a batch and have them sent to database in one roundtrip, thus reducing cost of remote calls. Let's see what each solution offers and in which situation it can be used:

  • Eager fetching: We have covered eager fetching in the previous chapter. With eager fetching you can fetch collections along with root entity at the same time. Using this option would disable lazy loading and you may end up fetching a large number of collection items even when they are not needed. Though eager fetching solves the select N+1 problem completely, I recommend using caution with eager fetching when lazily loading multiple collections or collections having large number of items in them. We will see the reason for it in the Avoid eager fetching section of this chapter.
  • Fetching strategies: Join fetching and subselect fetching both fix the select N+1 problem. Join fetching, being same as eager fetching, poses same issues around lazily loading multiple collections and collections having large number of items in them. So use join fetching wisely. Subselct offers pre-fetching which may result in unnecessarily loading the collections that we may never access in the code. If you want pre-fetching a small number of collections at a time then use batching of lazy collection we discussed in this chapter.
  • Batching: In the batching section at the beginning of this chapter, we saw how the SELECT statements generated out of lazy loading of collections can be batched together. This solution does not reduce the number of SELECT statements that are generated but lets you reduce the number of database roundtrips, thus saving the cost of remote calls.
  • Extra Lazy: We have briefly touched upon extra lazy option in the previous chapter. Extra lazy adds smartness to loading of indexed collections. If you have got an indexed collection and your code accesses a particular item of that collection using index, then NHibernate can fetch only that one item instead of the whole collection if the collection is set to use extra lazy behavior. From the previous chapter we know how to set extra lazy behavior on a collection. Beyond that, this is a very simple concept which we will not explore in any detail beyond this point.
  • Caching: This is easiest to understand. By turning on caching and making sure that collection items are cached as they are loaded, we can reduce the number of database roundtrips by having the collection items loaded from cache. We will discuss caching in Chapter 11, A Whirlwind Tour of Other NHibernate Features, where we talk about features of NHibernate that you do not use routinely.

As with fetching strategies, there is no right solution to the select N+1 problem. My best bet would be to use lazy loading with batching turned on. If I need a particular collection to be eagerly loaded then I will use future queries. This is what we discuss next.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.96.94