Impact of different processing models on memory consumption

Another critical component of framework analysis is comparing memory usage. Thinking back to our discussion of the thread-per-connection model in Chapter 1, Why Reactive Spring?, we know that, instead of allocating memory for small event objects, we allocate a large dedicated Thread for each new connection. The first thing that we should bear in mind is that a Thread reserves space for its stack up front. The actual stack size depends on the OS and the JVM configuration. By default, for the most common 64-bit server VMs, the stack size is 1 MB.

Here, an event means a signal about a change in the system state, such as an opened connection or the availability of new data.

For high-load scenarios, this technique leads to high memory consumption. In most cases, keeping a whole 1 MB stack alongside the request and response bodies is an unreasonable overhead. If we limit the dedicated thread pool instead, throughput drops and average latency grows. So, in Web MVC, we have to balance memory usage against system throughput. In contrast, as we learned in the previous section, WebFlux can use a fixed number of Thread instances to process many more requests while using a predictable amount of memory. To fully understand how memory is used in the previous measurements, take a look at the following memory usage comparison:

Diagram 6.22. A comparison of the memory usage of WebFlux and Web MVC 

In the preceding diagram, the solid line is for Web MVC and the dashed line is for WebFlux. In this case, lower is better. Note that both applications were given additional JVM parameters, -Xms26GB and -Xmx26GB. This means that both applications have access to the same amount of dedicated memory. However, for Web MVC, memory usage grows with increased parallelization. As mentioned at the beginning of this section, the usual Thread stack size is 1 MB. In our case, the Thread stack size is set with -Xss512K, so each new thread takes an additional ~512 KB of memory. Hence, the thread-per-connection model uses memory inefficiently.
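To put that overhead in perspective, the following is a minimal back-of-the-envelope sketch (our own illustration; the connection count is an assumed number) of how much memory thread stacks alone consume under the thread-per-connection model:

public class StackMemoryEstimate {

   public static void main(String[] args) {
      long stackSizeBytes = 512 * 1024;        // -Xss512K per thread
      long concurrentConnections = 10_000;     // one thread per connection (assumed)
      long totalBytes = stackSizeBytes * concurrentConnections;
      // ~5,000 MB of stack space before any heap allocations are counted
      System.out.printf("Stack memory alone: ~%,d MB%n",
            totalBytes / (1024 * 1024));
   }
}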

In contrast, for WebFlux, memory usage is stable in spite of parallelization, which means that WebFlux consumes memory more optimally. In other words, with WebFlux, we may be able to use cheaper servers.

To verify this assumption, let's run a small experiment that again tests the predictability of memory usage and how it might help us in unpredictable situations. For this test, we will analyze how much money we would spend on cloud infrastructure with Web MVC and with WebFlux.

To measure the upper limit of the system, we will carry out stress testing and verify how many requests our system is able to handle. To run our web application, we will launch an Amazon EC2 t2.small instance, which has one virtual CPU and 2 GB of RAM. The operating system will be Amazon Linux with JDK 1.8.0_144 and VM 25.144-b01. For the first round of measurements, we will use Spring Boot 2.0.x and Web MVC with Tomcat. Also, to simulate the network calls and other I/O activity that are a usual part of a modern system, we will use the following naïve piece of code:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@SpringBootApplication
public class BlockingDemoApplication {

   public static void main(String[] args) {
      SpringApplication.run(BlockingDemoApplication.class, args);
   }

   @GetMapping("/endpoint")
   public String get() throws InterruptedException {
      // Naively simulate one second of blocking I/O,
      // holding the worker thread for the whole duration
      Thread.sleep(1000);
      return "Hello";
   }
}

To run our application, we will use the following command:

java -Xmx2g \
     -Xms1g \
     -Dserver.tomcat.max-threads=20000 \
     -Dserver.tomcat.max-connections=20000 \
     -Dserver.tomcat.accept-count=20000 \
     -jar blocking-demo-0.0.1-SNAPSHOT.jar

So, with the preceding configuration (up to 20,000 worker threads, 20,000 concurrent connections, and an accept queue of 20,000), we will check whether our system can handle up to 20,000 users without failures. If we run our load test, we will get the following results:

Number of simultaneous requests | Average latency (milliseconds)
--------------------------------|--------------------------------
100                             | 1,271
1,000                           | 1,429
10,000                          | OutOfMemoryError/Killed
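For reference, a load test at a given concurrency level could be driven with an HTTP benchmarking tool; the following wrk invocation is an illustrative assumption, not the exact harness used for these measurements:

wrk -t4 -c10000 -d60s http://localhost:8080/endpoint

Here, -t sets the number of client threads, -c the number of open connections, and -d the test duration.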

The results may vary from run to run, but on average they are consistent. As we can see, 2 GB of memory is not enough to handle 10,000 dedicated threads, one per connection. Of course, by tuning the specific configuration of the JVM and Tomcat, we might be able to improve our results slightly, but this does not solve the problem of unreasonable memory wastage. By keeping the same application server and just switching to WebFlux over Servlet 3.1, we may see significant improvements. The new web application looks as follows:

import java.time.Duration;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
@SpringBootApplication
public class TomcatNonBlockingDemoApplication {

   public static void main(String[] args) {
      SpringApplication.run(TomcatNonBlockingDemoApplication.class, args);
   }

   @GetMapping("/endpoint")
   public Mono<String> get() {
      // Simulate one second of I/O latency without blocking a thread
      return Mono.just("Hello")
                 .delaySubscription(Duration.ofSeconds(1));
   }
}

In this case, the simulated interaction with I/O is asynchronous and non-blocking, which is easily expressed with the fluent Reactor 3 API.
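To highlight the difference, here is a minimal standalone sketch (our own illustration, not part of the book's sample projects) showing that delaySubscription releases the calling thread instead of blocking it:

import java.time.Duration;
import java.time.Instant;
import reactor.core.publisher.Mono;

public class DelayDemo {

   public static void main(String[] args) throws InterruptedException {
      Mono<String> delayed = Mono.just("Hello")
                                 .delaySubscription(Duration.ofSeconds(1));

      System.out.println("Subscribed at " + Instant.now());
      // subscribe() returns immediately; the delay runs on Reactor's timer
      delayed.subscribe(v -> System.out.println("Received at   " + Instant.now()));

      // Keep the JVM alive long enough for the asynchronous signal to arrive
      Thread.sleep(1500);
   }
}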

Note that the default server engine for WebFlux is Reactor-Netty. So, in order to switch to the Tomcat web server, we have to exclude spring-boot-starter-reactor-netty from WebFlux and provide a dependency on the spring-boot-starter-tomcat module.

To run a new stack, we will use the following command:

java -Xmx2g \
     -Xms1g \
     -Dserver.tomcat.accept-count=20000 \
     -jar non-blocking-demo-tomcat-0.0.1-SNAPSHOT.jar

Similarly, we give our Java application all of the available RAM, but in this case we use the default Tomcat worker thread pool size, which is 200 threads. Running the same tests gives the following results:

Number of simultaneous requests | Average latency (milliseconds)
--------------------------------|--------------------------------
100                             | 1,203
1,000                           | 1,407
10,000                          | 9,661

As we can observe, our application shows much better results in this case. They are still not ideal, since under high load some users have to wait quite a long time for a response. To improve the outcome, let's check the throughput and latency of a genuinely reactive server: Reactor-Netty.

Since the code and command for running the new web application are identical, let's cover just the benchmark results:

Number of simultaneous requests | Average latency (milliseconds)
--------------------------------|--------------------------------
1,000                           | 1,370
10,000                          | 2,699
20,000                          | 6,310

As we can see, the results are much better. For Netty, we started measurements at 1,000 simultaneous connections and raised the upper limit to 20,000. This is enough to show that Netty as a server gives roughly twice the performance of Tomcat with the same configuration. This comparison alone shows that WebFlux-based solutions may reduce the cost of infrastructure, because our applications now fit on cheaper servers and consume resources much more efficiently.

Another bonus that comes with the WebFlux module is the ability to process the incoming request body faster and with less memory consumption. This feature comes into play when the incoming body is a collection of elements and our system can process each item separately:

Diagram 6.23. WebFlux processing a large array of data in small chunks

To learn more about reactive message encoding and decoding, please see this link: https://docs.spring.io/spring/docs/current/spring-framework-reference/web-reactive.html#webflux-codecs.

As we can see from the preceding diagram, the system requires just a small piece of the request body in order to start processing data. The same may be achieved when we send a response body to the client: we do not have to wait for the whole response body, and may instead start writing each element to the network as it becomes available. The following shows how we may achieve this with WebFlux:

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/api/json")
class BigJSONProcessorController {

   @GetMapping(
      value = "/process-json",
      produces = MediaType.APPLICATION_STREAM_JSON_VALUE
   )
   public Flux<ProcessedItem> processOneByOne(@RequestBody Flux<Item> bodyFlux) {
      // Each item is decoded, processed, and written to the network
      // as it arrives, so only a small chunk is held in memory at a time
      return bodyFlux
         .map(item -> processItem(item))
         .filter(processedItem -> filterItem(processedItem));
   }
}

As we can see from the preceding code, such a powerful feature is available without hacking the internals of the Spring WebFlux module; it may be achieved with the publicly available API alone. In addition, this processing model allows us to return the first response much faster, since the time between uploading the first item to the network and receiving the first element of the response is roughly equal to the following:

T(first response) ≈ T(send one item) + T(process one item) + T(receive one result)

Note that the technique of streaming data processing does not allow us to predict the content length of the response body, which may be considered a disadvantage.
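As a usage illustration, such a streaming endpoint could be consumed element by element with WebClient; this sketch is our own assumption (the host, port, and types mirror the controller above) and is not part of the book's sample code:

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

// Hypothetical client: elements are printed as soon as they arrive,
// without waiting for the whole response body
WebClient.create("http://localhost:8080")
         .get()
         .uri("/api/json/process-json")
         .accept(MediaType.APPLICATION_STREAM_JSON_VALUE)
         .retrieve()
         .bodyToFlux(ProcessedItem.class)
         .subscribe(item -> System.out.println("Received: " + item));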

By comparison, Web MVC needs to upload the whole request body into memory; only after that can it process the incoming body:

Diagram 6.24. Web MVC processing a large array of data at once

With Web MVC, it is impossible to process the data reactively, as in WebFlux, since the usual declaration of a @RestController looks as follows:

import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import static java.util.stream.Collectors.toList;

@RestController
@RequestMapping("/api/json")
class BigJSONProcessorController {

   @GetMapping("/process-json")
   public List<ProcessedItem> processOneByOne(@RequestBody List<Item> bodyList) {
      // The whole request body is deserialized into memory before
      // processing starts; the full result is aggregated before sending
      return bodyList
         .stream()
         .map(item -> processItem(item))
         .filter(processedItem -> filterItem(processedItem))
         .collect(toList());
   }
}

Here, the method declaration explicitly requires the full request body to be converted into a collection of particular items. From a mathematical perspective, the time to the first element of the response is therefore equal to the following:

T(first response) ≈ T(upload the whole body of N items) + N × T(process one item) + T(aggregate and deliver the response)

Again, returning the first result to the user requires processing the whole request body and aggregating the results into a collection; only after that is our system able to send a response to the client. This means that WebFlux uses much less memory than Web MVC, is able to return the first response much faster, and is capable of processing an infinite stream of data.
