Wire protocol for database access

There are types of databases called embedded databases. Such databases run inside the application process itself and do not require any communication over the network. For embedded databases, there is no hard requirement to have a wire protocol, even though some have or may run both as an embedded mode or as a separate service. Later in the chapter, we are going to use the H2 embedded database in a few examples.

However, most software uses databases that run in separate processes on separate servers (or in separate containers). An application uses a specialized client library called a database driver to communicate with an external database. Furthermore, the wire protocol defines how the database driver and the database itself communicates. It defines the format of the order of messages sent between a client and the database. In most cases, a wire protocol is language independent so that a Java application can query a database written in C++.

As wire protocols are usually designed to work over TCP/IP, there is no need for the wire protocol to be blocking. As with the synchronous HTTP communication, the protocol itself is not blocking, and it is a client that decides to be blocked while waiting for results. Moreover, TCP is an asynchronous protocol that supports backpressure via the flow control implemented by the sliding window. However, the sliding window approach is tuned for sending chunks of bytes over the network and may not reflect the needs of the application's backpressure in the best way. For example, when receiving rows from a database, it's more natural to request the next portion of data processing in the number of rows rather than relying on the system settings that define network buffer size. Of course, the wire protocol may intentionally use another mechanism or even a combination of mechanisms for backpressure implementation, but it is essential to remember that TCP mechanisms also work under the hood all the time.

It is also possible to use a more high-level protocol as a basis for the database wire protocol. For example, we may use HTTP2, WebSockets, gRPC, or RSocket. Chapter 8Scaling Up with Cloud Streams, makes a brief comparison of the RSocket and gRPC protocols.

Aside from the backpressure concerns, there are different approaches for communicating big datasets between the client and the database. For example, a client inserts tens of thousands of rows of data, or an analytical query result contains millions of rows. For the sake of simplicity, let's consider only the latter use case. In general, there are a few approaches for communicating such a result set:

  • Calculate the entire result set on the database side, putting the data into a container and sending the container entirely as soon as the query finishes. This approach does not imply any logical backpressure and requires huge buffers on the database side (and also potentially on the client side). Furthermore, the client receives its first results only after the whole query is executed. Such an approach is easy to implement. Furthermore, the query execution process does not last too long and may cause less contention with updated queries happening at the same time.
  • Send result set in chunks as the client requests them. The query may be executed entirely, and results may be stored in a buffer. Alternatively, the database may execute the query only to the point where it fills one or a few requested chunks and continues execution only after it has communicated the client's demand. This way of operating may require fewer memory buffers, returns the first rows when the query is still running, and makes it possible to propagate logical backpressure or query cancelation.
  • Send results as a stream as soon as such results are obtained during the query execution. On top of that, the client may also inform the database about the demand and propagate logical backpressure that may, in turn, impact the query execution process. Such an approach requires almost no additional buffers, and the client receives the first row of the result as soon as it is possible. However, this way of communicating may under-utilize the network and CPU due to a very chatty manner of communication and frequent system calls.

The following diagram shows the interaction flow for a chunked result flow:

Diagram 7.11 Iterating over the query results using chunks

In general, different databases implement one or more of the approaches in their wire protocols. For example, MySQL knows how to send data as a whole or as a stream, row by row. At the same time, PostgreSQL databases have a concept called portal, which allows the client to propagate as many rows of data as they are ready to receive. The preceding diagram depicts how a Java application may use such an approach.

At this level, a well-designed database wire protocol may already have all the characteristics required to be reactive. At the same time, even the most straightforward protocol may potentially be wrapped with a reactive driver that may use TCP control flow for backpressure propagation.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.191.86