Chapter 6. Serving

Fast Data applications are built to deliver continuous results and are consumed by other applications and microservices. Examples include real-time dashboards for monitoring business Key Performance Indicators (KPIs), applications that enrich the analytical capabilities of Business Intelligence (BI) software, and aggregations of messages to be queried through a RESTful API. Applications may also apply machine learning (ML) techniques to the data, such as scoring records against an ML model or even training a model on the fly.

Let’s explore some patterns we can use in a serving layer. We can use a Big Table–based technology, such as Cassandra or HBase (or, more traditionally, an RDBMS), that is continuously updated. Users then consume the data with client applications that read from these stores.
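
To make this pattern concrete, the following sketch continuously folds a Kafka topic into a Cassandra table that client applications can then query. It uses the plain Kafka consumer and the DataStax Java driver; the topic, keyspace, and table names are illustrative assumptions, not anything prescribed:

import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import com.datastax.oss.driver.api.core.CqlSession
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object CassandraServingWriter extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "serving-writer")
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(java.util.Collections.singletonList("page-views"))

  // Connects to a local Cassandra node by default; production code would
  // configure contact points, consistency levels, and error handling.
  val session = CqlSession.builder().build()
  val insert = session.prepare(
    "INSERT INTO serving.page_views (page, payload) VALUES (?, ?)")

  // Continuously fold the stream into the serving table; dashboards and BI
  // tools read from that table while it is being updated.
  while (true) {
    for (record <- consumer.poll(Duration.ofSeconds(1)).asScala)
      session.execute(insert.bind(record.key, record.value))
  }
}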

Importing batch data into highly indexed and aggregated data stores used with analytical data-mining BI tools is a common practice. A newer trend is to use Streaming SQL to apply analytical transformations to live data. Streaming SQL is supported by all major stream processors, including Apache Spark, Apache Flink, and Kafka Streams.
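
As a flavor of what Streaming SQL looks like, the following sketch uses Spark Structured Streaming (with the spark-sql-kafka connector on the classpath) to register a Kafka topic as a view and run a continuous SQL aggregation over it. The topic name and event schema are assumptions for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

object StreamingSqlExample extends App {
  val spark = SparkSession.builder.appName("streaming-sql").getOrCreate()
  import spark.implicits._

  // The expected shape of each purchase event (JSON in the Kafka message value).
  val schema = new StructType()
    .add("productId", StringType)
    .add("amount", DoubleType)

  // Read the live stream of purchases from Kafka and parse the JSON payload.
  val purchases = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "purchases")
    .load()
    .select(from_json($"value".cast("string"), schema).as("purchase"))
    .select("purchase.*")

  // Register the stream as a view and run ordinary SQL against it.
  purchases.createOrReplaceTempView("purchases")
  val revenuePerProduct = spark.sql(
    "SELECT productId, SUM(amount) AS revenue FROM purchases GROUP BY productId")

  // The aggregate is continuously updated as new events arrive.
  revenuePerProduct.writeStream
    .outputMode("update")
    .format("console")
    .start()
    .awaitTermination()
}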

Finally, it is possible to serve data directly from the message backbone. For example, we can consume messages from a Kafka topic into a dashboard to build a dynamic, low-latency web application.

Sharing Stateful Streaming State

When running stateful streaming applications, another possibility is to share a view of that state directly. This is a relatively new capability in stream processors. Some options available today are Flink Queryable State and Kafka Streams interactive queries, including Lightbend’s akka-http implementation of the latter.
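
The following sketch shows Kafka Streams interactive queries, assuming the Kafka 2.5+ store-query API: a topology counts events per key into a named state store, and the application then reads that store directly. This is exactly the state that an HTTP layer such as akka-http could expose to clients. The topic and store names are illustrative:

import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.kstream.{Consumed, Grouped, Materialized}
import org.apache.kafka.streams.state.{KeyValueStore, QueryableStoreTypes}
import org.apache.kafka.streams.{KafkaStreams, StoreQueryParameters, StreamsBuilder, StreamsConfig}

object InteractiveQueriesExample extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "serving-example")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  // A simple stateful topology: count events per key into a named state store.
  val builder = new StreamsBuilder
  builder
    .stream[String, String]("events", Consumed.`with`(Serdes.String(), Serdes.String()))
    .groupByKey(Grouped.`with`(Serdes.String(), Serdes.String()))
    .count(Materialized.as[String, java.lang.Long, KeyValueStore[Bytes, Array[Byte]]]("counts-store"))

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()

  // Interactive query: read the local state store directly (a real service would
  // wait until the store is ready and serve store.get(key) over HTTP).
  val store = streams.store(StoreQueryParameters.fromNameAndType(
    "counts-store", QueryableStoreTypes.keyValueStore[String, java.lang.Long]()))
  Option(store.get("some-key")).foreach(count => println(s"some-key -> $count"))
}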

With respect to machine learning, it’s possible to integrate state with machine learning models to facilitate scoring. For example, see the Flink Improvement Proposal (FLIP) for Model Serving.

Data-Driven Microservices

Just as our Fast Data applications are data-driven, so are microservices. In fact, implementing microservices in this way is not a new concept. If we drop the fashionable label of “microservices” and think of them as application services, then we’ve seen this pattern before with service-oriented architecture (SOA) and enterprise service bus (ESB).

We can link microservices in a similar fashion to using an ESB, but with Apache Kafka as the message backbone instead. As an example, Figure 6-1 illustrates a simplified architecture for an e-commerce website that relies on Kafka as the messaging infrastructure supporting its message exchange model. By using Kafka, we can scale our services to support very high volume as well as easily integrate with stream processors.

Figure 6-1. A simplified e-commerce example of a microservices architecture using Kafka as a message bus

In microservices, we promote nonblocking operations that reduce latency and overhead by asynchronously publishing and subscribing to event messages. A service subscribes to messages to learn about state changes to relevant domain entities, and publishes messages to inform other services of its own state changes. A service becomes more resilient by encapsulating its own state and acting as the gateway for accessing it. It can also stay online when dependent services go down; it continues to function, although its view of the world may not be up-to-date.
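
The publishing side of this pattern can be as simple as the following sketch, which uses the plain Kafka producer to emit an "order placed" event; the topic name and payload shape are illustrative:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object OrderEvents {
  private val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)
  private val producer = new KafkaProducer[String, String](props)

  // Publish an "order placed" event. Interested services (inventory, shipping,
  // analytics) subscribe to the topic instead of being called synchronously,
  // so this service never blocks on, or fails because of, its consumers.
  def orderPlaced(orderId: String, orderJson: String): Unit =
    producer.send(new ProducerRecord("orders", orderId, orderJson))
}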

Microservices share many of the same properties as Fast Data applications, to the point that it’s becoming more difficult to distinguish them. Both stream unbounded sets of data, whether by subscribing to messages or by listening for API calls. Both are always online. Both produce output, whether as API responses or as new messages to be consumed by yet other microservices or Fast Data applications.

The main distinction is that a microservice allows for general application development in which we’re free to implement any logic we want, whereas a Fast Data application is implemented using a stream processor that may constrain us in various ways. However, Fast Data apps are becoming more generic as well. Therefore, we conclude that microservices and Fast Data applications are converging and often have the same domain requirements, use the same design patterns, and have the same operational experiences.

State and Microservices

Stream processors have various means to maintain state, but historically it’s been challenging to provide a rich stateful experience. We’ve already mentioned that new libraries are available to share state in stream processors, but the technology is still in its early days, and developers are often forced to call out to more general-purpose storage.

Akka is a great way to model a complex domain and maintain state. Akka is a toolkit, not a framework, so you can bring in components as you need them to build highly customized, general-purpose applications that can stream from Kafka (reactive-kafka), expose HTTP APIs (akka-http), persist state to various databases (Akka Persistence, JDBC, etc.), and much more.
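
As a sketch of what this looks like, assuming Akka 2.6 and akka-http 10.2-style APIs, the following service consumes a Kafka topic with the Alpakka Kafka connector (the library formerly known as reactive-kafka), keeps a deliberately simple in-memory view of it, and serves that view over HTTP. The topic, port, and route are illustrative:

import java.util.concurrent.atomic.AtomicLong
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object ServingService extends App {
  implicit val system: ActorSystem = ActorSystem("serving")

  val settings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("serving-service")

  // Maintain a deliberately simple piece of state: how many events we have seen.
  val eventCount = new AtomicLong(0)

  // Stream the Kafka topic into the in-memory state.
  Consumer.plainSource(settings, Subscriptions.topics("events"))
    .runWith(Sink.foreach(_ => eventCount.incrementAndGet()))

  // Expose the state over HTTP so dashboards and other services can query it.
  val route = path("count") {
    get {
      complete(eventCount.get().toString)
    }
  }
  Http().newServerAt("localhost", 8080).bind(route)
}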

Akka Cluster provides the tools to distribute your state across more than one machine. It includes advanced cluster-forming algorithms and conflict-free replicated data types (CRDTs, similar to those used by various distributed databases), and can even straddle multiple data centers. Exploring Akka is a huge topic, but it’s essential reading when you have a complex stateful requirement for your Fast Data platform.
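
As a small taste, the following sketch (classic actor API, Akka 2.6-style) uses Akka Distributed Data to maintain a replicated, CRDT-backed counter: each node increments it locally and the replicas converge without coordination. The key and message protocol are illustrative:

import akka.actor.Actor
import akka.cluster.ddata.Replicator.{Update, UpdateResponse, WriteLocal}
import akka.cluster.ddata.{DistributedData, GCounter, GCounterKey, SelfUniqueAddress}

// Each node in the Akka Cluster runs this actor and increments the counter
// locally; the underlying CRDT merges the replicas so they converge without
// coordination, even across data centers.
class PageViewCounter extends Actor {
  private implicit val node: SelfUniqueAddress =
    DistributedData(context.system).selfUniqueAddress
  private val replicator = DistributedData(context.system).replicator
  private val CounterKey = GCounterKey("page-views")

  def receive: Receive = {
    case "page-view" =>
      replicator ! Update(CounterKey, GCounter.empty, WriteLocal)(_ :+ 1)
    case _: UpdateResponse[_] => // success/failure handling omitted in this sketch
  }
}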
