To verify this statement, let's run a simple load test. For this purpose, we are going to use a simple Spring Boot 2.x application with Web MVC or WebFlux (let's call it the middleware). We are also going to simulate I/O activity from the middleware by making a few network calls to a third-party service, which returns an empty successful response with a guaranteed average latency of 200 milliseconds. The communication flow is depicted as follows:
To launch our middleware and simulate client activity, we are going to use a Microsoft Azure infrastructure with Ubuntu Server 16.04 installed on each machine. For the middleware, we are going to use a D12 v2 VM (4 virtual CPUs and 28 GB RAM). For the client, we are going to use an F4 v2 VM (4 virtual CPUs and 8 GB RAM). User activity will be increased sequentially in small steps. We are going to start our load test with four simultaneous users and finish with 20,000 simultaneous users. This will give us a smooth latency and throughput curve and allow us to create understandable graphs. To produce an appropriate load on the middleware and collect statistics and measurements correctly, we are going to use a modern HTTP benchmarking tool called wrk (https://github.com/wg/wrk).
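For reference, a wrk load step looks like the following invocation (the host name and the numbers here are illustrative assumptions, not the actual test parameters):

```shell
# 4 client threads, 1000 open connections, a 60-second run; --latency adds
# a latency-distribution section to the final report.
wrk -t4 -c1000 -d60s --latency "http://middleware.example:8080/"
```

wrk reports the achieved requests per second and the latency percentiles, which is exactly what the throughput and latency curves in this section are built from.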
The following is an example of the Web MVC middleware code used for measurements:
@RestController                                                    // (1)
@SpringBootApplication
public class BlockingDemoApplication implements InitializingBean {
   ...                                                             // (1.1)

   @GetMapping("/")                                                // (2)
   public void get() {
      restTemplate.getForObject(someUri, String.class);            // (2.1)
      restTemplate.getForObject(someUri, String.class);            // (2.2)
   }
   ...
}
The preceding code can be described as follows:
- This is the declaration of the class, annotated with @SpringBootApplication. At the same time, this class is a controller annotated with @RestController. To keep this example as simple as possible, we have skipped the initialization process and the declaration of fields in this class, as shown at point (1.1).
- Here, we have a get method with the @GetMapping declaration. In order to reduce redundant network traffic and focus only on framework performance, we do not return any content in the response body. According to the flow mentioned in the preceding diagram, we perform two HTTP requests to the remote server, as shown at points (2.1) and (2.2).
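The elided part at point (1.1) might look like the following sketch; the field names and the URI resolution are assumptions for illustration, not the actual code used in the measurements:

```java
// Hypothetical shape of the fields elided at point (1.1).
import org.springframework.beans.factory.InitializingBean;
import org.springframework.web.client.RestTemplate;

public class BlockingDemoApplication implements InitializingBean {
    // Plain RestTemplate; by default it opens a new HTTP connection per call.
    private final RestTemplate restTemplate = new RestTemplate();
    private String someUri;

    @Override
    public void afterPropertiesSet() {
        // Resolve the remote service URI once the application is configured.
        // The property name and default are illustrative assumptions.
        someUri = System.getProperty("remote.uri", "http://remote.example/");
    }
}
```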
As we can see from the preceding example and diagram, the middleware's average response time should be around 400 milliseconds, since the two 200 millisecond remote calls are executed sequentially.
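Assuming Web MVC dedicates one thread per request, the achievable throughput can be sketched with Little's Law; the pool size below is a hypothetical number, not a measured one:

```java
// Little's Law sketch: max throughput = workers / service time.
// Assumption: each request spends ~400 ms blocked in the two remote calls
// and the thread does no other significant work.
public class LittleLawSketch {
    public static void main(String[] args) {
        double serviceTimeSec = 2 * 0.200;              // two sequential 200 ms calls
        int threads = 1000;                             // hypothetical Tomcat pool size
        long maxThroughput = (long) (threads / serviceTimeSec);
        System.out.println(maxThroughput + " req/s");   // prints "2500 req/s"
    }
}
```

This back-of-the-envelope bound explains why the test scales the Tomcat thread pool together with the number of simultaneous users: with blocking I/O, throughput is capped by the thread count.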
Note that, for this test, we are going to use a Tomcat web server, which is the default for Web MVC. In addition, to see how the performance changes in Web MVC, we are going to set up as many Thread instances as simultaneous users. The following sh script shows a setup for Tomcat:
java -Xss512K -Xmx24G -Xms24G \
     -Dserver.tomcat.prestart-min-spare-threads=true \
     -Dserver.tomcat.max-threads=$1 \
     -Dserver.tomcat.min-spare-threads=$1 \
     -Dserver.tomcat.max-connections=100000 \
     -Dserver.tomcat.accept-count=100000 \
     -jar ...
As we can see from the preceding script, the values of the max-threads and min-spare-threads parameters are dynamic and are defined by the number of parallel users in the test.
By launching the test suite against our service, we will get the following result curve:
The preceding diagram shows that, at some point, we start losing throughput, which means that there is contention or incoherence in our application.
In order to compare the performance results of the Web MVC framework, we have to run an identical test for WebFlux as well. The following is the code that we use in order to measure WebFlux-based application performance:
@RestController
@SpringBootApplication
public class ReactiveDemoApplication implements InitializingBean {
   ...

   @GetMapping("/")
   public Mono<Void> get() {                                       // (1)
      return webClient
         .get()                                                    // (2)
         .uri(someUri)
         .retrieve()
         .bodyToMono(DataBuffer.class)
         .doOnNext(DataBufferUtils::release)
         .then(                                                    // (3)
            webClient
               .get()                                              // (4)
               .uri(someUri)
               .retrieve()
               .bodyToMono(DataBuffer.class)
               .doOnNext(DataBufferUtils::release)
               .then()
         )
         .then();                                                  // (5)
   }
   ...
}
The preceding code shows that we are now actively using Spring WebFlux and Project Reactor features in order to achieve asynchronous and non-blocking request and response processing. Just as in the Web MVC case, at point (1), we return a Void result, but it is now wrapped in the reactive type, Mono. At point (2), we execute a remote call using the WebClient API, and then at point (3), we perform, in the same sequential fashion, the second remote call, shown at point (4). Finally, at point (5), we skip the results of both calls and return a Mono<Void> that notifies the subscriber of the completion of both executions.
By launching the test suite against our WebFlux-based middleware, we will get the following result curve:
As we may see from the preceding chart, the WebFlux curve follows a tendency somewhat similar to the Web MVC curve.
In order to compare both curves, let's put them on the same plot:
In the preceding diagram, the line of + (plus) symbols is for Web MVC and the line of - (dash) symbols is for WebFlux. In this case, higher means better; as we can see, WebFlux has almost twice the throughput.
Also, it should be noted here that there are no measurements for Web MVC after 12,000 parallel users. The problem is that Tomcat's thread pool takes too much memory and does not fit in the given 28 GB. Therefore, each time Tomcat tries to dedicate more than 12,000 Thread instances, the Linux kernel kills that process. This point emphasizes that the thread-per-connection model does not fit in cases where we need to handle more than around 10,000 users.
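The memory ceiling can be sanity-checked from the launch parameters in the script above (-Xss512K and -Xmx24G); the arithmetic below ignores Metaspace and other native overhead, which only makes the real footprint larger:

```java
// Rough memory footprint of a thread-per-connection Tomcat at 12,000 threads.
public class ThreadMemorySketch {
    public static void main(String[] args) {
        double stackGb = 512 * 12_000 / (1024.0 * 1024.0); // -Xss512K per thread, in GB
        double heapGb = 24.0;                              // -Xmx24G from the launch script
        double stacksRounded = Math.round(stackGb * 10) / 10.0;
        System.out.println("stacks: ~" + stacksRounded + " GB");
        System.out.println("total: ~" + (stacksRounded + heapGb) + " GB");
    }
}
```

Thread stacks alone take roughly 5.9 GB, and together with the 24 GB heap the process asks for close to 30 GB, which is already above the 28 GB available on the VM.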
Nevertheless, both curves show a similar tendency and have critical points, after which they start degrading in throughput. This may be explained by the fact that many systems have their limitations in terms of open client connections. In addition, the comparison may be a bit unfair since we use different implementations of HTTP clients, with different configurations. For example, the default connection strategy for RestTemplate is to allocate a new HTTP connection on each new call. In contrast, the default Netty-based WebClient implementation uses a connection pool under the hood. In this case, a connection may be reused. Even though the system may be tuned to reuse opened connections, such a comparison may be misrepresentative.
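For reference, RestTemplate can be switched to a pooled connection strategy by backing it with the Apache HttpClient; the pool sizes below are illustrative assumptions:

```java
// Configuration sketch: a RestTemplate that reuses HTTP connections through
// a pooled Apache HttpClient, similar to what the Netty-based WebClient
// does by default.
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

public class PooledRestTemplate {
    public static RestTemplate create() {
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(1_000);            // total connections across all routes
        pool.setDefaultMaxPerRoute(1_000);  // connections to a single host
        CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(pool)
                .build();
        return new RestTemplate(new HttpComponentsClientHttpRequestFactory(client));
    }
}
```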
Therefore, to get a better comparison, we are going to simulate the network activity by providing a 400 millisecond delay. For both cases, the following code is used:
Mono.empty()
    .delaySubscription(Duration.ofMillis(200))
    .then(Mono.empty()
              .delaySubscription(Duration.ofMillis(200)))
    .then()
For WebFlux, the return type is Mono<Void>, and for Web MVC, the execution flow is ended by calling the .block() operation, so the Thread will be blocked for a specified delay. Here, we use the same code in order to get identical behavior for delay scheduling.
We are also going to use a similar cloud setup. For the middleware, we are going to use an E4s v3 VM (4 virtual CPUs and 32 GB RAM), and for the client, a B4ms VM (4 virtual CPUs and 16 GB RAM).
By running our test suite against the services, the following results can be observed:
In the preceding diagram, the line of + (plus) symbols is for Web MVC and the line of - (dash) symbols is for WebFlux. As we can see, the overall results are higher than with real external calls. That means that either a connection pool within the application or a connection policy within the operating system has a huge impact on system performance.
Nevertheless, WebFlux is still showing twice the throughput of Web MVC, which finally confirms our assumption about the inefficiency of the thread-per-connection model. WebFlux still behaves as proposed by Amdahl's Law. However, we should remember that, along with application limitations, there are system limitations, which may alter our interpretation of the final results.
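The saturation of both curves is consistent with Amdahl's Law, which caps speedup by the serial fraction of the work; the parallel fraction in the sketch below is purely illustrative:

```java
// Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
// parallelizable fraction of the work. p = 0.95 is an assumed value.
public class AmdahlSketch {
    public static void main(String[] args) {
        double p = 0.95;
        for (int n : new int[] {10, 100, 1000}) {
            double speedup = 1.0 / ((1 - p) + p / n);
            System.out.println("n=" + n + " speedup=" + Math.round(speedup * 10) / 10.0);
        }
    }
}
```

Even with 95 percent of the work parallelizable, speedup flattens out around 20x, no matter how many workers are added; this is the shape of the plateau we observe in the throughput charts.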
We can also compare both modules with regard to their latency and CPU usage, which are depicted in diagrams 6.18 and 6.19 respectively:
In the preceding diagram, the line of + (plus) symbols is for Web MVC and the line of - (dash) symbols is for WebFlux. In this case, the lower the result, the better. The preceding diagram depicts a huge degradation in latency for Web MVC. At a parallelization level of 12,000 simultaneous users, WebFlux shows a response time that is around 2.1 times better.
From the perspective of CPU usage, we have the following tendency:
In the preceding diagram, the solid line is for Web MVC and the dashed line is for WebFlux. Again, the lower the result, the better in this case. We can conclude that WebFlux is much more efficient with regard to throughput, latency, and CPU usage. The difference in CPU usage may be explained by the redundant work of context switching between different Thread instances.