Eager loading with LINQ

Often, when we query for some set of entities, we also want to load some of the related entities. In this recipe, we'll show you how we can use LINQ extensions to eager load the child collections of our query results.

Getting ready

Complete the Getting Ready section at the beginning of this chapter.

How to do it…

  1. Create a new folder named EagerLoadingWithLinq in the project.
  2. Add a new class named Recipe to the folder:
    using System.Linq;
    using NH4CookbookHelpers.Queries;
    using NH4CookbookHelpers.Queries.Model;
    using NHibernate;
    using NHibernate.Linq;
    
    namespace QueryRecipes.EagerLoadingWithLinq
    {
        public class Recipe : QueryRecipe
        {
            protected override void Run(ISession session)
            {
                var book = session.Query<Book>()
                    .Fetch(x => x.Publisher)
                    .FirstOrDefault();
    
                Show("Book:", book);
    
                var movies = session.Query<Movie>()
                    .FetchMany(x => x.Actors)
                    .ToList();
    
                Show("Movies:", movies);
            }
        }
    }
  3. Run the application and start the EagerLoadingWithLinq recipe.
  4. Inspect the query log to see how the related entities have been included in the queries.

How it works…

In this recipe, we eagerly load the Publisher for our Book(s) and the ActorRoles for our Movies to avoid select N+1 problems. In a select N+1 situation, with one select statement, you can load a few entities from the database and begin to enumerate through them. As you enumerate through them, you access some lazy loaded property or collection. This triggers a separate database query for each entity in the original query. If we iterate through 1000 entities, we would have the original query plus 1000 nearly identical queries because we triggered lazy loading, hence the name select N+1. This creates N+1 round trips to the database, which will quickly kill performance, overwork the database, and even crash it.

Here, we iterate through each movie in the database. For each instance, we display the name of the actor in the starring role. This would normally trigger a separate database query for each movie, a potential select N+1 problem.

By specifying that we want to load the related ActorRoles in the same query, we can avoid this problem:

session.Query<Movie>()
  .FetchMany(x=>x.Actors)

This returns all movies and their actor roles, including the movies without actor roles. As can be seen in the query log, the ActorRoles are included using a LEFT JOIN.

The book and its publisher is fetched in a similar way, but since the Publisher property is a relation to one entity, as opposed to the Actors property which is a collection, we used Fetch instead of FetchMany. Why the difference? Well, we could actually have used Fetch for both types, but the eager loading doesn't have to stop at the first level. Consider if the ActorRole entity had an (admittedly weird) property called BookAboutTheRole. If we wanted to load that too, the query would look similar to this:

session.Query<Movie>()
    .FetchMany(x=>x.Actors)
   .ThenFetch(a=>a.BookAboutTheRole)

Had we used Fetch(x=>x.Actors), the ThenFetch method would not know that it should act on a single ActorRole, rather than the ISet<ActorRole>.

There's more…

The Then prefixed methods have a very important distinction. They act upon the preceding fetched type, allowing us to dig down into the object structure. Using the standard Fetch or FetchMany immediately brings us back to the root. This is useful, since we can add more fetch instructions, for example if ActorRole had yet another property that we wanted to include:

session.Query<Movie>()
       .FetchMany(x=>x.Actors)
                .ThenFetch(a=>a.BookAboutTheRole)
       .FetchMany(x=>x.Actors)
                .ThenFetch(a=>a.DocumentaryAboutTheActor)

We have to use the same FetchMany twice, but that's just a syntactic inconvenience. It doesn't actually include the actors twice.

It would seem then that eager loading is something that can be used frequently, and it can. However, there are a few drawbacks and pitfalls. Extending the query with several fetch requests increases the complexity of the queries and the amount of data retrieved. It may be worth considering different strategies, such as projecting only the data that's really needed.

Another important issue is that these eager fetches causes tables to be joined into the query. When the fetched property is a collection, it means two things:

Limiting the result set doesn't work

Consider we want to fetch only two movies, but want their Actors collection to be eager loaded. It seems we want this query:

session.Query<Movie>()
   .FetchMany(x => x.Actors)
   .OrderBy(x=>x.Name)
   .Take(2)

Unfortunately, that could go horribly wrong. The limit will be applied to the entire query, including the joins, meaning that we will end up with only one movie, unless the first movie has only one actor. It also means that the Actors collection won't be properly initialized with all the included entities.

Eager loading multiple collections should be avoided

If our Movie class also had a collection property for MakeUpArtists, we might be tempted and eager to load them in the same query. We certainly can:

session.Query<Movie>()
   .FetchMany(x => x.Actors)
   .FetchMany(x => x.MakeUpArtists)

While it may work, in limited scenarios, this causes two unrelated joins to be added to the query. The result becomes what's called a Cartesian product. If a movie has 100 actors and 10 make-up artists, 1000 rows would be returned for that single movie. Adding more collections will multiply the result even further. Not only does that impact the performance, it also means that collections that allow duplicates (bags) may be filled with more duplicates than intended.

Fortunately, there are solutions to both these issues, which will be discussed in Chapter 5, Improving Performance.

Use batching as an alternative solution

An alternative for greatly reducing the impact of a select N+1 problem is to use the batch-size property in the mapping. Suppose we had added batch-size to our movies mapping, as shown in the following code:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
    assembly="Eg.Core"
    namespace="Eg.Core">
  <subclass name="Movie" extends="Product">
    <property name="Director" />
    <list name="Actors" cascade="all-delete-orphan"
          batch-size="10">
      <key column="MovieId" />
      <index column="ActorIndex" />
      <one-to-many class="ActorRole"/>
    </list>
  </subclass>
</hibernate-mapping>

With a typical select N+1 bug, we would trigger a query on each movie. This behavior changes when we set the batch-size to 10. NHibernate needs to query for the contents of an Actors collection to initialize it, but it notices the batch-size setting. It finds nine other uninitialized Actors collections in the session and loads all of them at once with a single query.

If we have 10 movies, we only need two queries instead of 11, for 20 movies, we need three instead of 21, and so on. This cuts out about 90 percent of our queries.

See also

  • Eager loading multiple collections
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.125.205