Batching

The total execution time of an NHibernate query consists of three things:

  • The time it takes to send the query to the remote database server
  • The time it takes to execute the actual SQL on the database server
  • The time it takes to return the results to the client, process the results on the client, and hydrate the entities

Depending on what kind of queries you are writing, you may be spending a lot of time sending your queries to the remote database server. In such a situation, you can batch multiple SQL statements together and send them to the database server for execution in one roundtrip. Batching does not affect the time it takes to execute the actual SQL on the server, but it reduces the network time because you are making a single database interaction instead of the several you would have made in the absence of batching. NHibernate offers different ways of batching queries that can be used in different scenarios.

As we discussed in the beginning, NHibernate queries can be broadly divided into two buckets – read queries and write queries. For write queries, SQL commands are sent to the database either when the transaction is committed or when the session is flushed. Read queries are more immediate. First, a SQL command is sent to the database when a terminal method such as List<T> is called on the query. If lazy loading is enabled, further SQL commands are sent as our code interacts with the loaded entities. For read queries, there is no single point in time (such as the commit of a transaction, and so on) when all the accumulated operations are sent to the database at once. One thing that is common to both types of queries is that each SQL command results in a database roundtrip, making the whole data access chatty and slow. You can overcome this issue by batching multiple SQL statements together. NHibernate supports batching for both types of queries.

Batching the write queries using ADO.NET batching

NHibernate uses ADO.NET under the hood for its database interaction. ADO.NET inherently supports the concept of batching SQL queries. If you are inserting or updating a large number of entities in a single transaction, then you can utilize the underlying ADO.NET batching to bundle up the INSERT/UPDATE statements and reduce the number of roundtrips to the database. NHibernate lets you configure the ADO.NET batch size either at the session factory level or at the session level. Let's go over an example to see how this works.

In the following code, we are inserting 100 employee instances, each having its ResidentialAddress property set and one community instance added to its Communities collection:

[Test]
public void InsertLargeNumberOfEmployeeRecords()
{
  using (var transaction = Database.Session.BeginTransaction())
  {
    for (int i = 0; i < 100; i++)
    {
      var employee = new Employee
      {
        DateOfBirth = new DateTime(1972, 3, 5),
        DateOfJoining = new DateTime(2001, 5, 28),
        ResidentialAddress = new Address()
      };
      employee.AddCommunity(new Community());

      Database.Session.Save(employee);
    }

    transaction.Commit();
  }
}

Note that we have not set meaningful values for the different properties of the entities we are saving. That is not needed for the preceding test. The date fields are populated because MS SQL Server 2012 was used for this test and its datetime columns cannot store the default value of DateTime fields that are left unset.

When this code is run, a total of 424 SQL statements would be sent to the database. Of these, 24 statements are produced during the process of ID generation for the entities to be inserted. The remaining 400 are produced by the actual insert of 100 records into each of the Employee, Address, Employee_Community, and Community tables. We will now turn on ADO.NET batching in our configuration using the following line of code:

SetProperty(Environment.BatchSize, "1");

The preceding property works in two ways. If it is not set or is set to zero, batching is disabled. If it is set to any number greater than 0, batching is turned on and the batch size is equal to the number set in the configuration. This is the configuration of batch size at the session factory level. It applies to all session instances created from that session factory.

If you want a different batch size for each session, then you can change the batch size at the session level by calling the SetBatchSize method on ISession. A change to the batch size made by calling this method is local to that session, and any new sessions created would inherit their batch size from the session factory configuration.
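
For illustration, the following is a minimal sketch of overriding the batch size for a single session; the sessionFactory variable is an assumption standing in for however you build your session factory:

using (var session = sessionFactory.OpenSession())
{
  // Override the factory-level adonet.batch_size for this session only
  session.SetBatchSize(50);

  using (var transaction = session.BeginTransaction())
  {
    // ... save entities here; INSERT/UPDATE statements are sent in batches of 50
    transaction.Commit();
  }
}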

The following image illustrates what happens when we change the batch size from 0 to 50 and then to 100:


The first image on the left shows a summary of each session. The number in square brackets next to each session is the number of database roundtrips that the session made. The first 24 statements of each session come from the ID generation logic, so we are going to ignore those in this discussion. These cannot be batched as they run in their own transaction, and possibly more than one transaction. Ignoring these roundtrips, we can see that the first session made a total of 400 roundtrips while the next two made 8 and 4, respectively. The first session did not use any batching. The second session used a batch size of 50; NHibernate put 50 statements into each batch, thus making 8 database roundtrips. The last session used a batch size of 100, resulting in only 4 database roundtrips.

While this example is a bit contrived to prove the point, note the reduction in the number of roundtrips to the database. In scenarios where the database is on a different server or a different network, this can bring about huge savings in the time spent sending SQL to the remote server.

Limitations of batching

Write batching sounds like a quick win, but this feature is not supported on all databases. Currently, native batching is only supported on MS SQL Server, Oracle, and MySQL.

Besides that, write batching does not work in a couple of situations, described next:

  • When you are relying on the database to generate identifier values, for example, by using identity columns as primary keys or by using database sequences. Use of database identities means that NHibernate needs to go to the database after every insert to read the generated identifier value, which prevents batching. So if you are using database identities and you want to use batching, then the best option is to choose a different ID generation strategy if you can (a mapping sketch follows this list).
  • The second situation is more of an NHibernate limitation. Batching works only if the entity being saved is inserted into one table. This does not include the tables for associated entities. This limitation poses a problem when an entity is part of an inheritance hierarchy. If the inheritance hierarchy is mapped using the "table per subclass" or "table per concrete class" strategy, then two tables are involved in saving any derived entity. For instance, when the Leave class is saved, a record is inserted into the base table Benefit and the table Leave. Batching does not work in this case, and if you are saving 100 instances of the Leave entity then there would be 100 database roundtrips. This can obviously be fixed if you use the "table per hierarchy" strategy, which puts everything into a single table.
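
For illustration, here is a minimal mapping-by-code sketch of switching to the hilo generator, which generates identifiers in memory from a pre-allocated range and therefore keeps inserts batchable; the Id property name and the max_lo value are assumptions made for this sketch:

using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class EmployeeMapping : ClassMapping<Employee>
{
  public EmployeeMapping()
  {
    // hilo does not need a database-generated value after each insert,
    // so INSERT statements can still be bundled into ADO.NET batches
    Id(e => e.Id, m => m.Generator(Generators.HighLow,
      g => g.Params(new { max_lo = 100 })));

    // ... rest of the Employee mapping
  }
}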

Memory consumption and batching

Let me contrive the above example further to prove another point. What would happen if, instead of persisting 100 instances of the employee entity, you were asked to persist 100K instances? Most likely, you would be waiting for a long time to see your 100K records inserted into the database.

Note

I am using this example just to prove a point. NHibernate, or any ORM, is not a tool for carrying out bulk insert operations. There are tools available that are built for these kinds of operations and are more efficient at bulk inserts than ORMs. Readers are recommended to use such tools instead of NHibernate.

If you use code similar to the one we used previously, with batching enabled and set to an appropriate number, you may get the work done in a reasonable time, but it is highly likely that you would be greeted with an out-of-memory exception. That is because every entity that is persisted is also kept in memory by the session. Depending on the specification of the server, you would soon fill up the memory available to your application. One simple solution to this issue is to flush the in-memory entity state to the database at regular intervals and clear the session to reduce memory consumption. But I would not rely on this solution, as clearing the session at the right time is very important; flushing/clearing the session at the wrong time may result in issues. A better solution is to use stateless sessions. Stateless sessions are discussed in detail in Chapter 11, A Whirlwind Tour of Other NHibernate Features.
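
As a minimal sketch of the flush-and-clear approach, the following keeps the flush interval equal to an assumed ADO.NET batch size of 50; the test name and the interval are made up for illustration, and the entity setup mirrors the earlier test:

[Test]
public void InsertVeryLargeNumberOfEmployeeRecords()
{
  const int batchSize = 50; // assumed to match adonet.batch_size

  using (var transaction = Database.Session.BeginTransaction())
  {
    for (int i = 0; i < 100000; i++)
    {
      var employee = new Employee
      {
        DateOfBirth = new DateTime(1972, 3, 5),
        DateOfJoining = new DateTime(2001, 5, 28),
        ResidentialAddress = new Address()
      };

      Database.Session.Save(employee);

      if ((i + 1) % batchSize == 0)
      {
        // Send the accumulated INSERT batch to the database and
        // evict the persisted entities from the session cache
        Database.Session.Flush();
        Database.Session.Clear();
      }
    }

    transaction.Commit();
  }
}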

Since we are on the topic of persisting a large number of entities using NHibernate, let me bring another characteristic of this situation to your attention. When we call ISession.Save, NHibernate performs some checks on the entity being passed to the Save method. While doing these checks, NHibernate has to go through the graph of entities that it has stored in the session. If there is a large number of entities stored in the session, then these checks take a long time, resulting in the save operation becoming slow. The flushing solution we discussed just now may help here. But if there is no reliable way of flushing the session, then the gains you get out of batching are quickly outweighed by the losses from the slow execution of the Save method.
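
To give a taste of the stateless session alternative mentioned earlier, here is a minimal sketch; note that stateless sessions do not cascade saves to associated entities, so only the Employee rows are inserted here, and the sessionFactory variable is an assumption:

using (var statelessSession = sessionFactory.OpenStatelessSession())
using (var transaction = statelessSession.BeginTransaction())
{
  for (int i = 0; i < 100000; i++)
  {
    var employee = new Employee
    {
      DateOfBirth = new DateTime(1972, 3, 5),
      DateOfJoining = new DateTime(2001, 5, 28)
    };

    // Insert writes the row without tracking the entity in a
    // first-level cache, so memory consumption stays flat
    statelessSession.Insert(employee);
  }

  transaction.Commit();
}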

Read batching using future queries

The ADO.NET batching capability can be used for write queries very easily, but the same concept does not work for read queries. The design of read queries poses some challenges when it comes to batching them. Specifically, the following two characteristics of read queries highlight why batching read queries is difficult:

  • Lazy loading defers execution of some of the queries to a later time, thus reducing the opportunity for batching multiple related queries together
  • Read queries need to be executed instantaneously, the moment a terminal method such as ToList<T> is called on them

The deferred query execution offered by lazy loading is a feature we want to use more often than not, so there is no point in turning off lazy loading in favor of the ability to batch queries. The second issue is a bit different. If we have multiple related queries that are not dependent on each other, can we do something to have them sent to the database in one batch? By not being dependent on each other, I mean that you do not need to have a prior query executed in order to execute a later query. Future queries were introduced in NHibernate to handle situations exactly like this.

Future queries work the same way as lazy loading, by deferring the query execution to a future time. Let's use an example to understand this better. Suppose we want to build a summary section where we want to display the total number of employees, the total number of communities, and the top 5 communities by number of members. We can write something like the following using LINQ in order to build this summary section:

var employees = Database.Session.Query<Employee>();

var communities = Database.Session.Query<Community>();

var topCommunities = Database.Session.Query<Community>()
        .OrderByDescending(c => c.Members.Count())
        .Take(5);

var benefits = Database.Session.Query<Benefit>();

Note

I have used an example to show you how future queries can be used to batch together completely unrelated queries. Another practical example where you will find future queries very useful is when you want to show paginated search results. You would like to return the matching records on the current page along with the total number of records that match the search criteria. You would write two different queries for this but can batch those using future queries.

The preceding code would send three SELECT statements to the database at three different times, when each of the following methods is called:

employees.Count();
communities.Count();
topCommunities.ToList();

Since these methods are called at different times, each SELECT statement is sent to the database independently of the others, causing more network traffic. This can be significant for remote database servers. These statements are related to each other and satisfy a single UI requirement that displays the summary section. There is no need to execute these queries independently of each other, and there is no side effect if they are bundled together and sent to the database all at once. That is what future queries offer. Using future queries, you can tell NHibernate to bundle a set of queries and send them to the database in one batch. You can convert any LINQ query to a future query by calling the ToFuture() method at the end. The following is how our previous queries would look when converted to future queries:

var employees = Database.Session.Query<Employee>().ToFuture();

var communities = Database.Session.Query<Community>().ToFuture();

var topCommunities = Database.Session.Query<Community>()
        .OrderByDescending(c => c.Members.Count())
        .Take(5)
        .ToFuture();

var benefits = Database.Session.Query<Benefit>().ToFuture();

The call to the ToFuture() method must be placed at the end. ToFuture returns IEnumerable<T>. NHibernate has a special type, NHibernate.Impl.DelayedEnumerator<T>, which implements IEnumerable<T>; it is actually this type that NHibernate returns. Internally, NHibernate keeps track of all the DelayedEnumerator<T> instances so that if query execution is triggered on any one of them, the queries for the rest of the DelayedEnumerator<T> instances are also sent to the database for execution in the same roundtrip. The effect we get is similar to what we got with write batching. The only difference is that we cannot tell how many statements we want to bundle together; all the future queries associated with the session are executed at once when one of them is triggered.
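
To make this concrete, the following sketch continues from the future queries declared previously and shows where the single roundtrip happens:

// No SQL has been sent yet; the four future queries are only
// registered with the session at this point.
foreach (var community in topCommunities)
{
  // The first enumeration triggers one roundtrip that executes
  // all pending future queries for this session.
}

var allEmployees = employees.ToList(); // already materialized; no further SQL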

This looks good so far, but there is still a small problem with the preceding code. The first future query gets us the count of employees in the database, and the SELECT statement generated for it is as follows:

SELECT employee0_.id            AS Id0_, 
       employee0_.firstname     AS Firstname0_, 
       employee0_.lastname      AS Lastname0_, 
       employee0_.emailaddress  AS EmailAdd5_0_, 
       employee0_.dateofbirth   AS DateOfBi6_0_, 
       employee0_.dateofjoining AS DateOfJo7_0_, 
       employee0_.isadmin       AS IsAdmin0_, 
       employee0_.password      AS Password0_ 
FROM   employee employee0_;   

All we are after here is the count of employees but, as you can see, we are loading the whole Employee table into memory and then counting the rows. That is because we are calling the Count() method on a future query (after the call to the ToFuture() method). The part of the query up to ToFuture() is executed as a database query; everything after that is executed in memory on the loaded results.

Why not place the call to Count() before ToFuture() then? Well, we cannot. ToFuture() is an extension method on IQueryable<T>, so it can only be called on IQueryable<T>. Methods such as Count() and ToList() turn the IQueryable<T> into an int or an IEnumerable<T>, respectively, and hence it is not possible to call ToFuture() after these methods are called.

Ideally, we would want the following SQL to be generated:

SELECT Cast(Count(*) AS INT) AS col_0_0_
FROM employee employee0_;

Fortunately, you can do this with future queries. If your query returns a scalar value, then you can utilize the companion method ToFutureValue. This method takes a lambda expression as input and returns an instance of IFutureValue<T>. The following example demonstrates how we can use this method to get the count of employees:

var employees = Database.Session.Query<Employee>()
        .ToFutureValue(e => e.Count());

In the preceding code, we have passed e.Count() as an argument to the ToFutureValue method. NHibernate would use that to build the SQL. ToFutureValue returns an IFutureValue<T> instance. In our case, T is int, hence IFutureValue<int> is returned. When the query is executed, you can access the result of the query via the Value property present on the IFutureValue<T> instance, shown as follows:

Assert.That(employees.Value, Is.EqualTo(3));
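
Coming back to the paginated search example from the earlier note, the following sketch shows how the page of results and the total count can be batched into one roundtrip; the City filter, the ordering by Id, and the page size are assumptions made for illustration:

int pageNumber = 2;
const int pageSize = 20;

// Page of matching employees, deferred until enumeration
var page = Database.Session.Query<Employee>()
        .Where(e => e.ResidentialAddress.City == "London")
        .OrderBy(e => e.Id)
        .Skip((pageNumber - 1) * pageSize)
        .Take(pageSize)
        .ToFuture();

// Total number of matching records, batched with the query above
var total = Database.Session.Query<Employee>()
        .Where(e => e.ResidentialAddress.City == "London")
        .ToFutureValue(e => e.Count());

// Both SELECT statements are sent to the database in one roundtrip here
var results = page.ToList();
var totalCount = total.Value;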

Future queries are supported in QueryOver as well, but the method names are slightly different. Where LINQ has ToFuture and ToFutureValue, QueryOver has the Future and FutureValue methods (both with generic overloads), and they work in exactly the same way.
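
For illustration, here is a minimal QueryOver sketch of the deferred employee count, assuming the standard Future/FutureValue methods and the RowCount projection (Projections lives in NHibernate.Criterion):

// Deferred list of employees
var employees = Database.Session.QueryOver<Employee>()
        .Future();

// Deferred scalar count, batched with the query above
var employeeCount = Database.Session.QueryOver<Employee>()
        .Select(Projections.RowCount())
        .FutureValue<int>();

// Both statements are sent to the database in one roundtrip
Assert.That(employeeCount.Value, Is.EqualTo(3));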

Note

Future queries are just a wrapper over another feature called MultiQuery. MultiQuery lets you write queries that NHibernate should batch together and send to the database for execution in a single roundtrip. MultiQuery is only supported for HQL and criteria. Future queries are the only way to get this behavior with QueryOver and LINQ. Since I have been encouraging the use of LINQ more and more, I do not intend to cover MultiQuery separately. Avid readers may choose to explore this topic on their own.

Batching lazy collections

Lazy collections are loaded as you access them. Also, the SQL statements required to load the lazy collections are generated internally by NHibernate, so future queries do not help in this situation. But NHibernate has special batching support for lazy collections. If you are loading multiple lazy collections, then you can tell NHibernate to load a bunch of them together instead of one by one as they are accessed. Let's see how this works. The following code loads a bunch of employees who live in London. It then iterates through each of the loaded employee instances and accesses the Benefits collection on each one.

var employees = Database.Session.Query<Employee>()
        .Where(e => e.ResidentialAddress.City == "London");

foreach (var employee in employees)
{
  foreach (var benefit in employee.Benefits)
  {
    //do something with benefit here
  }
}

As we loop through the employee instances and access the Benefits collection on each instance, a SELECT statement similar to the following is issued:

SELECT *--all columns from Benefit table
FROM benefit benefits0_
--join with other concrete benefit tables
WHERE benefits0_.employee_id =@p0;
@p0 = 11 [Type: Int32 (0)]

The actual SELECT statement is quite big, hence a shortened version highlighting the more relevant parts is presented here.

In our example, three employee instances having IDs 11, 12, and 13 are loaded. Correspondingly, three SELECT statements are sent to the database in separate roundtrips as the Benefits collection on each instance is accessed. This is not very efficient, and we can tell NHibernate to be forward-looking and fetch more than one collection when it hits the database to load the first collection. This is called batching of lazy collections. The following code listing shows how to batch the loading of three Benefits collections together:

Set(e => e.Benefits,
  mapper =>
  {
    mapper.Key(k => k.Column("Employee_Id"));
    mapper.Cascade(Cascade.All.Include(Cascade.DeleteOrphans));
    mapper.Inverse(true);
    mapper.BatchSize(3);
  },
  relation => relation.OneToMany(mapping => mapping.Class(typeof(Benefit))));

The preceding code snippet is from the mapping of the Employee entity, where the Benefits collection on the Employee entity is mapped. As part of one-to-many and many-to-many collection mapping, we can call the BatchSize method to tell NHibernate to pre-fetch collections in a batch. The size of the batch is passed as a parameter to the BatchSize method. After declaring a batch size, we get the following single SELECT statement in place of the three we observed earlier:

SELECT *
--all columns from Benefit table
FROM benefit benefits0_
--join with other concrete benefit tables
WHERE benefits0_.employee_id IN ( @p0, @p1, @p2 );
@p0 = 11 [Type: Int32 (0)],
@p1 = 12 [Type: Int32 (0)],
@p2 = 13 [Type: Int32 (0)]

Note how NHibernate changed the WHERE clause from matching a single employee ID to matching three employee IDs in one go, thus saving us two database roundtrips. If we had 10 employee instances, then NHibernate would load their Benefits collections in batches of 3, 3, 3, and 1. These batches are loaded as they are accessed and not all at once. So if you had 10 employee instances, then the first batch would be loaded when the Benefits collection on the first instance is accessed, the second batch would be loaded when the Benefits collection on the fourth instance is accessed, and so on.

Note

The SELECT statement generated with batching turned on uses the IN clause. IN clauses may perform badly in some situations, so always compare performance with and without batching to confirm whether you are actually gaining anything out of batching lazy collections. The optimal batch size is also a factor that plays an important role here, so play with multiple settings before you come to a conclusion.
