9. Message- and Event-Driven Architectures

A message-driven architecture is one in which sending and receiving messages plays a prominent role throughout the system. In general, message-driven architectures have been chosen less often than REpresentational State Transfer (REST) and remote procedure calls (RPC). This is because REST and RPC seem more similar to general-purpose programming language paradigms than messaging does; both approaches provide abstractions that give the impression of procedure calls and method invocations, with which many programmers are already familiar.

Yet, compared to the call mechanisms of general-purpose programming languages, REST and RPC are brittle. It’s highly unlikely that a local procedure call or method invocation will fail because of the invocation mechanism itself. With the REST-over-HTTP and RPC approaches, it is very likely that failures will occur due to network and remote service faults. When failure does occur, the temporal coupling between one remote service and another tends to cause a complete failure of the client service. The more remote services or subsystems that are involved in the given use case, the worse the problem can become. As Leslie Lamport, a distributed systems expert, described it:

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

That sort of cascading failure tends to be avoided when systems use asynchronous messaging, because the requests and responses are all temporally decoupled. Figure 9.1 highlights the relaxed temporal dependencies across subsystems involved in a choreographed event-driven process. To be clear, events capturing and communicating business interests are (generally) a form of message, and message-driven processes are a superset of event-driven processes.1

1 Some use a strict definition of a message, specifying that it must be directly sent peer-to-peer from sender to receiver. They may also constrain events to only those sent via pub-sub. The authors hold the opinion that this is an overly limiting perspective. The next section discusses using poll-based REST as a means to read event logs. Although many consumers might read such a feed of events, it is not the same as the push model of pub-sub. Of course, anyone can hold an opinion, so neither view is necessarily wrong.

The overall system works by means of the events that occur in a given subsystem being made available to other subsystems via a message bus. Message-driven architecture is also a Reactive architecture because the components, as seen in Figure 9.1, are passive until message stimuli occur, and then the appropriate components react to those stimuli. In contrast, imperative code drives responses by means of REST and RPC procedure calls or method invocations. Reactive is defined as having four primary characteristics: responsive, resilient, elastic, and message-driven [Reactive].

In Figure 9.1, the six steps executed across three subsystem contexts (Underwriting, Risk, and Rate) collectively provide the calculated rate needed to quote a policy premium for an applicant over the Web. The Underwriting subsystem context is blissfully unaware of the details involved in reaching the result. At some time in the future after the Application Submitted event occurs, Underwriting will be informed that a quotable premium rate is available.

The desired Underwriting outcome could require 2 seconds or 12 seconds to achieve. Underwriting does not fail because some piece of infrastructure was preconfigured to time out after 5 seconds, as would hold sway over the response to a REST request. It’s not that 12 seconds is an acceptable long-term Service Level Agreement (SLA)—but it is perfectly acceptable in the face of full failure of the Risk or Rate subsystem, followed by full recovery on a different cloud infrastructure, and possibly even in another region. Neither ordinary REST nor RPC would survive that pathway.

Note a detail of the process in Figure 9.1 that’s due to the nature of choreography: Events sent on the message bus must be translated by the receiver so that the internal stimulus is cast in the language of the consuming context. The event Application Submitted means nothing to Risk—but it does after it’s translated to Assess Risk. The same goes for Risk Assessed in the context of Rate: When translated to Calculate Rate, it makes perfect sense because a rate can be determined from an Assessment Result that’s available as part of the Risk Assessed event.
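In code, such a translation might sit at the edge of the consuming context, as in the following sketch; the record and service names here are illustrative assumptions, not taken from any framework or from the book’s codebase.

// Hypothetical types; the real Published Language would be schema-defined.
record ApplicationSubmitted(String applicationId, String applicantFacts) {}
record AssessRisk(String applicationId, String facts) {}

interface RiskAssessmentService {
  void assessRisk(AssessRisk command);
}

// Driver adapter at the edge of the Risk context: it translates the
// external event into the context's own language before any domain
// logic sees it.
final class ApplicationSubmittedTranslator {
  private final RiskAssessmentService riskAssessment;

  ApplicationSubmittedTranslator(RiskAssessmentService riskAssessment) {
    this.riskAssessment = riskAssessment;
  }

  // Invoked by the message bus adapter for each incoming event.
  void onEvent(ApplicationSubmitted event) {
    riskAssessment.assessRisk(
        new AssessRisk(event.applicationId(), event.applicantFacts()));
  }
}

The point of the adapter is that the Risk context’s Domain Service never handles Application Submitted directly; it sees only Assess Risk, in its own language.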

While message bus communication is typically used for collaborative computing across subsystem contexts, it’s not a practical fit for message-driven communication within a single subsystem context. As Figure 9.2 illustrates, the individual components within a subsystem context might be implemented as actors that function in an Actor Model runtime. Every actor component is message-driven, such that actors are Reactive. When one actor wants another actor to do something, it sends a message to the other actor that will be delivered asynchronously. Although each message delivery executes asynchronously, the flow shown in Figure 9.2 functions in sequence according to the step numbers.

Figure 9.2 Reactive architecture inside a subsystem context is achieved by using the Actor Model.

In Figure 9.2, every individual actor is represented by a circular element (as opposed to the rectangular ones that represent plain objects). Message sending between actors differs from that based on the message bus. For example, steps 1 and 2 indicate that the Message Bridge actor, a driver adapter, receives an Application Submitted event message from the message bus and adapts it to a message sent on to the Risk Assessor actor. What appears to be a normal method invocation is not the kind typically used in object-to-object communication: the invocation packs its intent into an object-based message that is enqueued in the Risk Assessor actor’s mailbox. The assessRisk(facts) message will be delivered as an actual method invocation on the Risk Assessor actor implementation as soon as possible.

The nature of the Actor Model keeps all computer processors busy, making for highly efficient compute solutions that drive down the monetary cost of operations on both on-premises and cloud infrastructures. This is accomplished by the Actor Model runtime using the compute node’s limited number of threads, with scheduling and dispatching performed cooperatively by all actors. The limited number of threads must be distributed across any number of actors. Each actor that has an available message in its mailbox is scheduled to run when a thread becomes available for it, and will (typically) handle only a single pending message while using that thread. Any remaining messages available to the actor are scheduled to be delivered after the current one completes.

This highlights another benefit of the Actor Model. Every actor handles only one message at a time, meaning that actors are individually single-threaded; yet, with many actors handling messages simultaneously over any short period of time, the overall runtime model is massively concurrent. Because each actor is single-threaded, it need not protect its internal state data from two or more threads entering simultaneously. State protection is further strengthened by the rule that actors must never share mutable internal state.
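The following is a minimal sketch of that mailbox discipline, assuming plain Java with a shared thread pool standing in for the Actor Model runtime’s scheduler; none of these names belong to VLINGO XOOM or any other toolkit, and each message is modeled simply as a Runnable so that only the scheduling rule is on display.

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal actor mailbox sketch: one thread at a time may drain a given
// actor's mailbox, one message per turn.
final class MiniActor {
  private final ConcurrentLinkedQueue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean scheduled = new AtomicBoolean(false);
  private final ExecutorService scheduler;

  MiniActor(ExecutorService scheduler) {
    this.scheduler = scheduler;
  }

  // Callers enqueue asynchronously; they never block on the receiver.
  void tell(Runnable message) {
    mailbox.offer(message);
    trySchedule();
  }

  private void trySchedule() {
    // The CAS guarantees at most one thread drains this mailbox at a time,
    // so the actor's internal state needs no locking.
    if (scheduled.compareAndSet(false, true)) {
      scheduler.execute(() -> {
        Runnable next = mailbox.poll();
        if (next != null) next.run();          // handle a single pending message
        scheduled.set(false);
        if (!mailbox.isEmpty()) trySchedule(); // remaining messages run later
      });
    }
  }
}

Real runtimes add supervision, typed protocols, and fair scheduling, but the single-message-per-turn rule shown here is what makes locking the actor’s state unnecessary.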

The types of use cases highlighted in Figure 9.2 and throughout this book are specifically supported by VLINGO XOOM, a free and open source (FOSS) Reactive toolset based on the Actor Model for Monolith and Microservices architectures [VLINGO-XOOM].

Message- and Event-Based REST

As discussed in Chapter 8, REST can be used for integration across Bounded Contexts, but how does REST support message- and event-driven architectures? Although most people don’t think of REST in terms of messaging, messaging is actually its specialty: the HTTP specification describes every request and response as a message. Thus, REST is by definition a message-driven architecture, and events are a distinct type of message. The trick is to turn the requests for event consumption into asynchronous operations. These techniques are not as well known as typical Web application patterns.

But why would anyone want to serve messages, and specifically events, to consumers by means of REST? A few reasons include that the Web is highly scalable, developers everywhere understand the ideas behind the Web and HTTP, and serving static content is quite fast (and cacheable on the server and client). Developers in general tend to be unfamiliar with message buses and brokers, or at least less familiar with them than with HTTP. Being uncomfortable with message buses and brokers should not prevent the use of message- and event-driven architectures.

Event Logs

As a basic rule of thumb, every event persistently stored must be immutable; that is, it must never be changed. A second and related rule is that if an event is in error, rather than patching its persistent state, there must be a compensating event that is persisted later and eventually “fixes” the data in error when consumers apply the event over top of the data that they produced before the error was known. In other words, if consumers have already consumed an event in error, changing it later will not help those consumers. They should never start over and apply all events again from the beginning—that would likely lead to disaster. Carry these rules forward when considering the following points.
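As a brief illustration of the second rule, the following sketch uses hypothetical event types to show a consumer applying a compensating event over data it produced before the error was known; nothing here comes from a specific event store API.

record RateCalculated(String quoteId, double rate) {}
record RateCorrected(String quoteId, double correctedRate) {}

final class QuoteRateView {
  private final java.util.Map<String, Double> rateByQuote = new java.util.HashMap<>();

  // The original event may later turn out to be in error, but it is
  // immutable and remains in the log exactly as first persisted.
  void on(RateCalculated event) {
    rateByQuote.put(event.quoteId(), event.rate());
  }

  // The fix arrives as a new, later event applied over top of the earlier
  // data; the consumer never replays the stream from the beginning.
  void on(RateCorrected event) {
    rateByQuote.put(event.quoteId(), event.correctedRate());
  }
}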

As events occur in a subsystem context, they should be collected persistently in the order in which they occurred. From the sequence of events, a series of event logs might be created, either virtually or physically. Figure 9.3 provides a depiction of these logs.

Figure 9.3 Event logs starting with 1–20 and continuing to nearly 1 million total events.

This can be done by using a formal database (as opposed to a flat file or directories of flat files) that supports a reliable sequence number that is assigned to each event and then incremented. Relational databases support this process through a feature known as sequences or with another feature called auto-increment columns. The event logs are created logically by determining a maximum number of entries in the individual logs and then creating a virtual moving window to serve each log dynamically.
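To make the virtual moving window concrete, here is a small sketch that maps a database-assigned sequence number to its fixed log range, assuming 20 events per log as in Figure 9.3; the class and method names are illustrative.

// The database assigns the sequence (e.g., via a SEQUENCE or an
// auto-increment column); this code only maps a sequence number to the
// fixed log range that contains it.
final class LogWindow {
  static final int EVENTS_PER_LOG = 20;

  // Events 1..20 belong to log "1-20", 21..40 to "21-40", and so on.
  static String logIdFor(long sequence) {
    long low = ((sequence - 1) / EVENTS_PER_LOG) * EVENTS_PER_LOG + 1;
    long high = low + EVENTS_PER_LOG - 1;
    return low + "-" + high;
  }

  public static void main(String[] args) {
    System.out.println(logIdFor(1));   // 1-20
    System.out.println(logIdFor(102)); // 101-120
  }
}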

Some negative trade-offs with a relational database include somewhat slow event log serving compared to purpose-built log databases. If a vast number of events is stored, the disk space used to maintain them over the long term could be problematic; however, if the system uses cloud-enabled relational databases, that concern will likely never become an actual problem. Even so, it might make sense to create a virtual array of tables, where each table will hold only a maximum number of events and excess events will then spill over into the next logical table. There is also the risk that users might be tempted to modify an existing event in the database, which must never be done. Because the logs are records of past occurrences, it would be incorrect to patch an event.

A good relational database and a developer with advanced skills in its use can not only support many millions of database rows in a single table, but also enable fast retrieval by means of correct key indexing. Although this statement is true, the viewpoint it expresses assumes that many millions of rows in a single database table are enough. In reality, some systems produce many millions or even billions of events every single day. If a relational database still makes sense under these conditions, a virtual array of tables can help. It might also seem obvious that such a system could use a highly scalable NoSQL database instead. That would solve one set of problems, but inserting new events using a monotonically increasing integer key would not work well, because such keys tend to hamper the sharding/hashing algorithms employed by those databases.

There are other ways to handle this situation. As pictured in Figure 9.3, maintaining event logs can be accomplished by writing a series of flat files to a normal disk that are servable REST resources. After each log file is written, the content would be available as static content. The static flat files can be replicated to a few or many servers, just as would be done when scaling typical website content.

A possible downside to this approach is the need for a flat-file structure that specifies not only how many events should be written into a single log flat file, but also how the files will be laid out on disk. Operating systems place limits on the number of files that can be held in a given directory, and even when a filesystem can store a very large number of files in a single directory, access slows as the count grows. An approach similar to the hierarchies used by email servers can make flat-file access very fast.
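The following sketch shows one such hierarchy, assuming each log file is named for its low sequence number and spread across two directory levels; the layout and constants are illustrative, not prescribed.

// Email-server-style layout: hash the low sequence number over two
// directory levels so that no single directory accumulates too many files.
final class LogPaths {
  static String pathFor(long lowSequence) {
    long bucket = lowSequence / 1_000;      // e.g., 101 -> 0, 980_001 -> 980
    return String.format("logs/%03d/%03d/%d.log",
        bucket / 1_000, bucket % 1_000, lowSequence);
  }

  public static void main(String[] args) {
    System.out.println(pathFor(101));      // logs/000/000/101.log
    System.out.println(pathFor(980_001));  // logs/000/980/980001.log
  }
}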

A positive trade-off compared with a relational database is that the sheer number of flat files and the directory layout offer little temptation to patch the contents. If that is not deterrent enough, secure access to the filesystem can be.

Whatever choice is made, there are several ways for the events to be consumed using REST, as described in the sections that follow.

Subscriber Polling

Subscribers can use simple polling of the log resources:

GET /streams/{name}/1-20
GET /streams/{name}/21-40
GET /streams/{name}/41-60
GET /streams/{name}/61-80
GET /streams/{name}/81-100
GET /streams/{name}/101-120

In this example, the {name} placeholder is replaced with the name of the stream being read, such as underwriting or, even more generally, policy-marketplace. The former would serve only Underwriting-specific events, while the latter would provide a full stream of all events over the various subsystem contexts, including Underwriting, Risk, and Rate.

The disadvantage is that if subscriber polling is not implemented correctly, clients will constantly request the next log, which is not yet available, and those requests could cause a lot of network traffic. Requests must also be limited to reasonably sized logs. This can be enforced by making the resource identities fixed ranges, with the next and previous logs referenced by hypermedia links in response headers. Caching techniques and timed read intervals can be established using response header metadata to smooth out request swamping. Additionally, even a partial log can be served by using a generic minted URI:
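As a sketch of those fixed ranges and hypermedia links, the following builds an RFC 8288 Link response header referencing the previous and next logs; the rel names "previous" and "next" are a reasonable choice rather than anything mandated here.

final class LogLinks {
  static String linkHeaderFor(String stream, long low, long high, long newest) {
    int size = (int) (high - low + 1);
    var links = new java.util.ArrayList<String>();
    if (low > 1) {
      links.add(String.format("</streams/%s/%d-%d>; rel=\"previous\"",
          stream, low - size, low - 1));
    }
    if (high < newest) {
      links.add(String.format("</streams/%s/%d-%d>; rel=\"next\"",
          stream, high + 1, high + size));
    }
    return String.join(", ", links);
  }

  public static void main(String[] args) {
    // </streams/policy-marketplace/81-100>; rel="previous",
    // </streams/policy-marketplace/121-140>; rel="next"
    System.out.println(linkHeaderFor("policy-marketplace", 101, 120, 1_000_000));
  }
}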

GET /streams/policy-marketplace/current

The current resource is the means to consume the most recent event log resource. If the current log—for example, 101-120—is beyond the previous event logs that have not yet been read by a given client, an HTTP response header will provide the link to navigate to the previous log, which would be read and applied before the current log. This backward navigation would continue until the client’s most recently applied event is read. From that point, all events not yet applied are applied, which would include navigating forward until the current log is reached. Once again, caching plays into this approach by preventing pre-read but not yet applied logs from being reread from the server, even when they are explicitly requested by means of a redundant GET operation. This is explained in more detail in Implementing Domain-Driven Design [IDDD], and in our own follow-on book, Implementing Strategic Monoliths and Microservices (Vernon & Jaskuła, Addison-Wesley, forthcoming).
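The navigation just described might look like the following client-side sketch, in which Log, Event, and fetch(...) are stand-ins for a real HTTP GET plus Link-header parsing; only the backward-then-forward catch-up logic is the point.

import java.util.List;

final class CatchUpReader {
  record Event(long id, String type, String data) {}
  record Log(List<Event> events, String previousUri) {}

  interface LogClient {
    Log fetch(String uri); // GET the log resource; re-reads are cacheable
  }

  private long lastAppliedId; // the client maintains its own stream position

  CatchUpReader(long lastAppliedId) {
    this.lastAppliedId = lastAppliedId;
  }

  void catchUp(LogClient client, String currentUri) {
    // Walk backward until a log containing the last applied event is found.
    var pending = new java.util.ArrayDeque<Log>();
    Log log = client.fetch(currentUri);
    while (log != null && !log.events().isEmpty()
        && log.events().get(0).id() > lastAppliedId + 1) {
      pending.push(log);
      log = log.previousUri() == null ? null : client.fetch(log.previousUri());
    }
    if (log != null) pending.push(log);
    // Now apply forward, skipping anything already applied.
    while (!pending.isEmpty()) {
      for (Event e : pending.pop().events()) {
        if (e.id() > lastAppliedId) {
          apply(e);
          lastAppliedId = e.id();
        }
      }
    }
  }

  private void apply(Event e) { /* consumer-specific handling */ }
}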

Server-Sent Events

Although Server-Sent Events (SSE) are well known as support for server-to-browser event feeds, that is not the intended usage here. The problem with browser usage is that not all browsers support the SSE specification. Even so, SSE is a worthy integration option between the events producer and its non-browser, services/applications clients that need to receive the events.

The specification for SSE states that a client should request a long-lived connection to the server for a subscription. Upon subscribing, the client may specify the identifier of the last event that it successfully applied. In such a case, the client would have previously been subscribed but disconnected at some point within the stream:

GET /streams/policy-marketplace
. . .
Last-Event-ID: 102470

As implied by providing its current starting position, the client is responsible for maintaining its current position in the stream.

As a result of subscribing, the available events will stream from the beginning or from the position of the Last-Event-ID and continue until the client unsubscribes or otherwise disconnects. The following format conforms to the SSE specification, though actual applications might contain more or fewer fields. Each event is followed by a blank line:

id: 102470
event: RiskAssessed
data: { "name" : "value", ... }

. . .
id: 102480
event: RateCalculated
data: { "name" : "value", ... }

. . .
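A producing service might emit that wire format with a helper along these lines; this formatter is a sketch, and the resulting string would be written to any long-lived HTTP response stream.

final class SseWriter {
  // Renders one event in the SSE wire format: id, event, and data fields,
  // terminated by a blank line.
  static String format(long id, String event, String dataJson) {
    return "id: " + id + "\n"
         + "event: " + event + "\n"
         + "data: " + dataJson + "\n"
         + "\n";
  }

  public static void main(String[] args) {
    System.out.print(format(102470, "RiskAssessed", "{ \"name\" : \"value\" }"));
  }
}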

To unsubscribe from the stream, the client sends the following message:

DELETE /streams/policy-marketplace

When this message is sent, the subscription is terminated, the server sends a 200 OK response, and the server closes its end of the channel. After receiving its 200 OK response, the client should also close the channel.

Event-Driven and Process Management

Earlier sections in this chapter clarified the ideas behind event-driven process management where the focus is on choreographed processes. Choreography requires the Bounded Contexts participating in a process to understand the events from one or more other contexts, and to interpret those events based on their local meaning. Here, the focus shifts to orchestration2 [Conductor], putting a central component in charge of driving the process from start to finish.3

2 Netflix found it was harder to scale choreography-based processes in the face of its growing business needs and complexities. A choreographed pub-sub model worked for the simplest of the flows, but quickly showed its limits. For that reason, Netflix has created its own orchestration framework called Conductor.

3 It is possible that a process might never end because ongoing streams of messages—whether events, commands, queries and their results, or something else—might never end. Here, start-to-finish processes are used for practical purposes, but be aware that there are no limitations imposed by this style.

In Figure 9.4, the process manager named Application Premium Process is responsible for driving the outcome of a fully constructed quote to an applicant who has submitted an application, the steps for which are described in the list that follows.

Figure 9.4 Orchestration: commands are sent on the bus to drive process results.

1. The event ApplicationSubmitted has occurred as a result of the Aggregate type Application being created from the applicant’s submitted application document. For the sake of brevity, the Application instance creation is not shown. The process begins when the process manager sees the ApplicationSubmitted event.

2. The ApplicationSubmitted event is translated to a command named AssessRisk and enqueued on the message bus.

3. The AssessRisk command is delivered to the Risk Context, where it is dispatched to the Domain Service named RiskAssessor. Below the RiskAssessor are processing details (not shown in Figure 9.4).

4. Once the risk has been assessed, the RiskAssessed event is emitted and enqueued on the message bus.

5. The RiskAssessed event is delivered to the process manager.

6. The RiskAssessed event is translated to the CalculateRate command and enqueued on the message bus (a sketch of this translation follows the list).

7. The CalculateRate command is delivered to the Rate Context, where it is dispatched to the Domain Service named RateCalculator. Below the RateCalculator are processing details (not shown in Figure 9.4).

8. Once the rate has been calculated, the RateCalculated event is emitted and enqueued on the message bus.

9. The RateCalculated event is delivered to the process manager.

10. The RateCalculated event is translated to the GenerateQuote command and dispatched locally and directly to the Domain Service named QuoteGenerator. The QuoteGenerator is responsible for interpreting the PremiumRates as QuoteLines, which are dispatched to the Aggregate named PremiumQuote (see Chapter 7, “Modeling Domain Concepts,” and Chapter 8, “Foundation Architecture,” for more details). When the final QuoteLine is recorded, the QuoteGenerated event is emitted and stored in the database.

11. Once an event is stored in the database, it can be enqueued on the message bus—and this is true for the QuoteGenerated event. In the case of the Application Premium Process, the receipt of the QuoteGenerated event marks the end of the process.
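Here is a minimal sketch of the translation performed by the process manager in step 6, using hypothetical types rather than any framework API; the process manager only turns an incoming event into the next command and hands it to the bus.

record RiskAssessed(String applicationId, String assessmentResult) {}
record CalculateRate(String applicationId, String assessmentResult) {}

interface MessageBus {
  void enqueue(Object message);
}

final class ApplicationPremiumProcess {
  private final MessageBus bus;

  ApplicationPremiumProcess(MessageBus bus) {
    this.bus = bus;
  }

  // Step 6: the RiskAssessed event is translated to CalculateRate and
  // enqueued; the Risk Context knows nothing about the process itself.
  void when(RiskAssessed event) {
    bus.enqueue(new CalculateRate(event.applicationId(), event.assessmentResult()));
  }
}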

Examining Figure 9.4, it might appear that an attempt to enqueue events and commands on the message bus could fail, causing the overall failure of the process. Consider, however, that all of the events and the commands translated from them are first persisted into a database, and then placed on the message bus, sometimes repeatedly until this effort succeeds. This establishes an at-least-once delivery contract. Steps 10 and 11 highlight the persistence-first, enqueuing-second sequence. However, going to that level of detail on all steps illustrated in Figure 9.4 would detract from the main flow and obscure the main points that should be gleaned from the example.
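The persistence-first, enqueuing-second sequence might be sketched as follows, with illustrative store and bus interfaces; a crash after the durable write but before the bus enqueue is recovered by re-dispatching, which is exactly why delivery is at-least-once and consumers must de-duplicate.

interface EventStore {
  void append(Object event);            // durable write; the source of truth
  java.util.List<Object> unpublished(); // events not yet placed on the bus
  void markPublished(Object event);
}

interface Bus {
  void enqueue(Object event) throws Exception; // may fail; will be retried
}

final class AtLeastOnceDispatcher {
  private final EventStore store;
  private final Bus bus;

  AtLeastOnceDispatcher(EventStore store, Bus bus) {
    this.store = store;
    this.bus = bus;
  }

  // Run repeatedly (e.g., on a timer). A crash between enqueue and
  // markPublished re-sends the event on a later run, hence at-least-once.
  void dispatchPending() {
    for (Object event : store.unpublished()) {
      try {
        bus.enqueue(event);
        store.markPublished(event);
      } catch (Exception e) {
        return; // leave the event unpublished; retry on the next run
      }
    }
  }
}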

With orchestrated processes, the process manager is responsible for driving the process. This generally places the process itself downstream, so that the collaborating contexts do not need to know anything about the process details, only how to perform their core responsibilities.

In the preceding example, the Application Premium Process is housed in the Underwriting Context. This need not be the case, because the process could be deployed separately. By default, though, it makes sense to deploy the process along with the Bounded Context components that require it to be accomplished, which is why the Application Premium Process is placed within the Underwriting Context. Such a design tends to reduce the complexity of the overall process.

The question remains: Are the Application Premium Process and the contexts involved deployed as a Monolith or as separate Microservices? The use of a message bus, as seen in Figure 9.4, might seem to imply a Microservices architecture. That’s possible, but not necessarily so:

▪ The message bus might be provided inside a Monolith using lightweight messaging, such as with ZeroMQ.

▪ The teams might decide that the Monolith should use more reliable messaging middleware or a cloud-based message bus (or message log) such as RabbitMQ, Kafka, IBM MQ, implementations of JMS, AWS SNS, AWS Kinesis, Google Cloud Pub/Sub, or Azure Service Bus.4 Choose whatever works best for your project requirements and SLAs.

4 The number of possible messaging mechanisms is too large to present an exhaustive list here. The options identified here are among some better known to the authors, and are generally used extensively.

▪ The solution might require using a Microservices architecture, or a blend of Monoliths and Microservices. Reliable messaging mechanisms, whether cloud-based or on premises, are the sound choices for these situations.

As discussed in Chapter 6, use of a schema registry reduces the complexity of cross-context dependencies and translations into and out of various Published Languages, which is required of the Application Premium Process. One such FOSS schema registry is provided with VLINGO XOOM—namely, Schemata [VLINGO-XOOM].

Event Sourcing

It’s become customary for software developers to store objects in a relational database. With a domain-driven approach, it’s generally a matter of persisting the state of whole Aggregates that way. Tools called object-relational mappers are available to help with this task. Of late, several relational databases have innovated around storing objects that are serialized as JSON, which is a good trade-off for addressing the common impedance5 of the competing forces of the object and relational models. For one thing, a serialized JSON object can be queried in much the same way as relational columns by means of specialized SQL extensions.

5 Many architects and developers are familiar with these impedances, so this chapter doesn’t offer exhaustive descriptions of them. They generally relate to the desire to structure an object for some modeling advantage, which then runs up against the limits of object-relational mapping tools and/or databases. In such cases, the object-relational mapping tools and databases win, and the object modeler loses.

Yet, there is an alternative, rather radically different approach to object persistence that emphasizes the opposite: Don’t store objects; store records of their changes instead. This practice, known as Event Sourcing,6 requires the records of changes to the Aggregate state to be captured in events.

6 There are more event-driven patterns than the ones described in this book. Extensive descriptions are provided in our follow-on book, Implementing Strategic Monoliths and Microservices (Vernon & Jaskuła, Addison-Wesley, forthcoming).

It’s helpful to reference Figure 9.5 for the discussion that follows.

Figure 9.5 Event Sourcing is used to persist Aggregate state and to reconstitute it.

The ideas behind Event Sourcing are fairly simple. When a command handled by an Aggregate causes its state to change, the change is represented by at least one event. The one or more events representing the change are fine-grained; that is, they represent the minimal state required to capture the essence of the change. These events are stored in a database that can maintain the order in which the events occurred for a specific Aggregate. The ordered collection of events is known as the Aggregate’s stream. Every time that a change to the Aggregate occurs and one or more events are emitted, it represents a different version of the stream and the stream length grows.

If the Aggregate instance under discussion goes out of scope and is garbage collected by the runtime, it must be reconstituted when it is subsequently needed. As expected, its state must once again reflect all changes from its first event to its most recent. This is accomplished by reading the Aggregate’s stream from the database in the order in which the events originally occurred, and then reapplying the events one by one to the Aggregate. In this way, the Aggregate state is gradually modified to reflect the change that each event represented.
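A minimal sketch of this emit-and-replay cycle follows, using hypothetical types; a production event store would also track the stream version for optimistic concurrency, which is omitted here.

import java.util.ArrayList;
import java.util.List;

record RiskAssessed(String applicationId, String result) {}

final class RiskAssessment {
  private final List<Object> uncommitted = new ArrayList<>();
  private String applicationId;
  private String result;

  // Command handling: decide, then emit the fine-grained event.
  void assess(String applicationId, String result) {
    apply(new RiskAssessed(applicationId, result));
  }

  // Reconstitution: replay the stream in its original order.
  static RiskAssessment fromStream(List<Object> stream) {
    RiskAssessment aggregate = new RiskAssessment();
    stream.forEach(aggregate::when);
    return aggregate;
  }

  private void apply(Object event) {
    when(event);            // mutate state from the event...
    uncommitted.add(event); // ...and stage it for appending to the stream
  }

  private void when(Object event) {
    if (event instanceof RiskAssessed e) {
      this.applicationId = e.applicationId();
      this.result = e.result();
    }
  }

  List<Object> uncommittedEvents() {
    return List.copyOf(uncommitted);
  }
}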

This approach sounds both powerful and simple, and so far it is. Yet, you must wield this sword carefully, because it can also inflict pain.

When the potential pain of using Event Sourcing is understood, it tends to cause a bit of wonderment as to why it would be employed in the first place. There’s nothing wrong with that—it’s generally good to ask why. The worst case arises when Event Sourcing is used without a clear understanding of why and how,7 because such use is generally followed by regret and by blame being fixed on Event Sourcing. All too often, architects and programmers make technology and design choices based on intrigue rather than business-driven purpose, and then get in way over their heads in accidental complexity. In addition, architects and programmers are often influenced by technology-driven frameworks and tools produced by vendors that try to convince their market that Event Sourcing is the best way to implement Microservices.8

7 Some of the loudest complaints about Event Sourcing that the authors have heard are the outcome of not understanding it, and thus not using it correctly.

8 The assumption that Microservices are good is a weak reason to use them. Microservices are good only when need informs purpose. Vendors who insist that Microservices are always good, and that Event Sourcing is the best way to implement Microservices, are misleading their customers, whether intentionally or not.

Noting the pain ahead of the gain is an important warning sign for the uninitiated, curious, and gullible.

The good news is that there are very definite reasons to use Event Sourcing. Next, we consider the possible gains while understanding that these might require some pain—but that’s all part of the software patterns and tools experience. They come with a package deal known as trade-offs and consequences, and there are always positives and negatives with each choice made. The point is, if the positives are necessary, then the negatives are necessary, too. Now, consider what is gained from Event Sourcing:

1. We can maintain an audit trail of every change that has occurred in the instances of every given Aggregate type that uses Event Sourcing. This might be required by or at least smart to use within specific industries.

2. Event Sourcing can be compared with a general ledger employed in accounting. Changes are never made to existing entries. That is, only new entries are added to the ledger, and no entries are ever changed for correction. Problems caused by one or more previous entries are corrected by adding one or more new entries that compensate. This was described in the sidebar “Straightforward Except When ….”

3. Event Sourcing is useful in addressing the complexities of specific business problems. For example, because events represent both what happens in the business domain and when they happen, event streams can be used for special time-based purposes.

4. Besides using the event streams for persistence, we can apply them in many different ways, such as decision analytics, machine learning, “what if” studies, and similar knowledge-based projections.

5. The audit trail as a “general ledger” doubles as a debugging tool. Developers can use the series of factual events as a means to consider every level of change, which can lead to insights into when and how bugs were introduced. This help is unavailable when object states are fully replaced by every new change.

It might be appropriate for some Aggregate types in a Bounded Context to use Event Sourcing, while others do not. Conversely, this mixed approach would not be practical if a totally ordered stream of all changes to all Aggregate types is important to the business.

Although it has motivations in technical solutions, Event Sourcing should be employed only for justifiable business reasons and avoided for others. Considering this, only points 1–3 in the previous list of benefits from Event Sourcing have business motivations. Points 4 and 5 do not, but are advantageous when any of 1–3 are satisfied.

CQRS

Users of systems tend to view data differently than they create and modify it. To make a decision, the system user must often view a larger, more diverse, coarse-grained dataset. Once this decision is made, the operation that the user carries out is fine-grained and targeted. Consider the following examples:

▪ Think of all the patient data that a doctor must view before prescribing a medication: the patient’s vital signs, past health conditions, treatments, and procedures; current and past medications; allergies (including those to medications); and even the patient’s behavior and sentiments. Following this review, the doctor will record a medication, dosage, administration, duration, and number of refills, probably across a single row on a form.

▪ An underwriter must view data from a submitted application; all research, such as a property inspection or the applicant’s health examination; past claims of losses or health conditions; assessed risks; and the use of all this data to calculate a recommended quotable premium. After considering the entire set of information, the underwriter can click a button to either offer a quote for a policy or deny a policy to the applicant.

This puts the viewable model and the operational model at odds. The available data structures are generally optimized around the operational model rather than the viewable model. Under such circumstances, it can be very complex and computationally expensive to assemble the viewable dataset.

The CQRS pattern can be used to address this challenge. As Figure 9.6 illustrates, this pattern calls for two models: one that is optimized for command operations, and one that is optimized for queries that aggregate the viewable dataset.

Figure 9.6 The Command and Query models form two possible pathways.

The pattern in Figure 9.6 works as follows:

1. The user is presented with a view dataset from the query model as a form.

2. The user makes a decision, fills in some data, and submits the form as a command.

3. The command is carried out on the command model, and the data is persisted.

4. The persisted command model changes are projected into the query model for as many viewable datasets as needed.

5. Go back to step 1.

Figure 9.6 shows storage for the Command model and the Query model as two separate databases. Although such a design makes sense for large-scale and high-throughput systems, it is not truly necessary. The model storage might be only a virtual/logical separation, actually using a single database or database schema. Given that the two models are physically one and that a single transaction can manage multiple writes, this design implies that both the Command model and the Query model could be transactionally consistent. Maintaining transactional consistency saves developers the headaches incurred when the two models are physically separated, but eventually consistent.

When using Event Sourcing, it is generally necessary to also use CQRS. Otherwise, there is no way to query the Event Sourced Command model other than by Aggregate unique identity, which makes any sophisticated queries implemented to render broad viewable datasets either impossible or prohibitively expensive. To overcome this limitation, events emitted from the Event Sourced Command model are projected into the Query model.
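Such a projection might look like the following sketch, with hypothetical event and view types; in a physically separated deployment this handler would subscribe to the event stream, and the view would be eventually consistent.

record QuoteGenerated(String quoteId, String applicantName, double premium) {}

final class QuoteSummaryProjection {
  // Flat, query-optimized rows; a real system would write to a view table.
  private final java.util.Map<String, String> summaries = new java.util.HashMap<>();

  // Project each Command-model event into the Query-model view.
  void when(QuoteGenerated event) {
    summaries.put(event.quoteId(),
        event.applicantName() + ": premium " + event.premium());
  }

  String summaryFor(String quoteId) {
    return summaries.get(quoteId);
  }
}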

Serverless and Function as a Service

Cloud-based serverless architectures are increasingly becoming a force in the software industry. This trend reflects the simplicity and savings afforded by serverless designs. The term “serverless” might seem a bit misleading because the solution obviously requires computers.9 Yes, but the term is actually being applied from the cloud subscriber-developer perspective. Cloud vendors provision the servers, rather than the subscriber taking on this responsibility. To cloud subscribers, then, there are no servers, only available uptime.

9 As software professionals are constantly reminded, naming is hard, and software names are too quickly chiseled in stone. As with HATEOAS, perhaps consider “serverless” to be a glyph that represents a concept.

To position serverless architectures more explicitly, some have used the term Backend as a Service (BaaS). Yet, thinking only in terms of hosting the application’s “backend” does not adequately describe the full benefits. Serverless architecture is also Infrastructure as a Service (IaaS), where infrastructure is not limited to computer and network resources. More specifically, infrastructure from this perspective includes all kinds of software that application developers do not need to create and (sometimes) subscribers do not need to even pay for. This is truly a leap forward in regard to the age-old mantra, “Focus on the business.” The following are some key benefits of using a serverless architecture:

▪ Users pay only for actual compute time, not for the well-known “always up” provisioned servers.

▪ The significant cost savings are generally hard to believe, but users can believe them anyway.

▪ Other than deciding on required cloud software components, planning is easier.

▪ The solutions use a plethora of hardware and cloud-native software infrastructure and mechanisms that are free or very low cost.

▪ Development is accelerated due to diminished infrastructure concerns.

▪ Businesses can deploy cloud-native modular Monoliths.

▪ Serverless architecture offers strong support for browser clients and mobile clients.

▪ Because they need to create less software, users can actually focus on the business solutions.

The Ports and Adapters architecture discussed in Chapter 8, “Foundation Architecture,” is still useful, and fits quite well with the serverless approach. In fact, if it is carefully architected and designed, there is a good chance that service and application packaging need not change. The greatly reduced overhead in regard to dealing with infrastructure software has great benefits, as previously described. The primary difference relates to how service and application software is architected, designed, and run. The term cloud-native refers to utilizing all purpose-built infrastructure, as well as mechanisms such as databases and messaging, that are themselves designed specifically for cloud computing.

Consider the example of a browser-based REST request from a user. The request arrives at an API gateway that is provided by the cloud vendor. The gateway is configured to understand how to dispatch requests onto the service or application. When the dispatch takes place, the cloud platform determines whether the REST request handler (i.e., endpoint or adapter) and its software dependencies are already running and currently available. If so, the request is immediately dispatched to that request handler. If not, the cloud platform spins up a server with the subscriber’s software running, and then dispatches the request to the handler. In either case, from that point onward the subscriber’s software operates normally.

When the request handler provides a response to the client, the subscriber’s use of the server ends. The cloud platform then determines whether the server should remain available for future requests, or should be shut down. The subscriber incurs only the cost of handling the request, and any applicable costs for the hardware and software infrastructure required to do so.

If the entire time to actually handle the request is 20, 50, 100, or 1,000 milliseconds, that’s what the subscriber pays for to run their software. If no requests arrive for a second or two, the subscriber doesn’t pay for that time. Compare that to cloud-provisioned servers, which incur costs just to stay alive and ready, every second of every day, whether or not they are in use.

Function as a Service (FaaS) is a kind of serverless mechanism that supports the software architecture and design for the aforementioned characteristics. Yet, FaaS is typically used to deploy very small components into their operational capacities. The functions are meant to carry out very narrowly focused operations that complete very quickly. Think of creating a single procedure or method within a very large system—that is roughly the scope addressed when implementing and deploying a FaaS.

One difference might be how a request is handled. Thinking in terms of functional programming, a function is implemented with side-effect-free behavior. As was stated in the section “Functional Core with Imperative Shell” in Chapter 8: “[A] pure function always returns the same result for the same input and never causes observable side effects.” Based on this definition, it could be that the entire state on which a given FaaS operates is provided as input parameters, requiring no database interaction. The result is determined by producing a new value that is then returned by the FaaS, just as a function would work. The input might be an event that was caused elsewhere in the system, or it might be an incoming REST request. Whatever the case, it’s possible that the FaaS will not itself interact with a database, either reading from it or writing to it; writing to the database would cause a side effect. That said, there is no restriction against a FaaS using a database for both reading and writing state, just as a procedure or method could.
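A side-effect-free handler of that kind might be sketched as follows; the input and result types, as well as the rate constants, are illustrative assumptions rather than any vendor’s FaaS API.

record RiskAssessedInput(String applicationId, int riskScore) {}
record RateResult(String applicationId, double rate) {}

final class CalculateRateFunction {
  // Pure: the same input always yields the same result, and there is no
  // database read or write here.
  static RateResult handle(RiskAssessedInput input) {
    double baseRate = 100.0;                      // assumed base premium rate
    double riskLoading = input.riskScore() * 1.5; // assumed loading per point
    return new RateResult(input.applicationId(), baseRate + riskLoading);
  }
}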

Applying the Tools

Several examples of applying message- and event-driven architectures have already been given in this and previous chapters. The remaining chapters detail the application of specific architectures and patterns explained in Part III. The companion book, Implementing Strategic Monoliths and Microservices (Vernon & Jaskuła, Addison-Wesley, forthcoming), provides exhaustive implementation examples.

Summary

This chapter considered the challenges of using distributed systems and synchronization between subsystems to complete a large use case by completing a number of smaller step-based cases. Message- and event-driven architectures were introduced as a means of carrying out complex, multi-step processes while avoiding cascading failure, thanks to asynchronous messaging and temporal decoupling. Process management using both choreography and orchestration was introduced, along with the differences between these approaches and how each can be used. REST’s role in process management was described as well. Event Sourcing and CQRS were introduced, including their use in message- and event-driven systems. Serverless and FaaS architectures show promise for future cloud computing. Here are action items from this chapter:

▪ Use choreography for decentralized processing of limited multi-step distributed use cases.

▪ Employ the centralized orchestration pattern when complex processes with numerous steps are needed, driving steps across relevant subsystems.

▪ Consider the use of REST-based clients and providers of event-driven notifications when familiar technology approaches are desired.

▪ Use Event Sourcing to persist the stream of event records that represent the state of an entity over time.

▪ With CQRS, separate the service/application command operations and state from those used by queries to display groupings of state.

This completes Part III. Part IV ties this and the two previous parts together to build business-centric Monoliths and Microservices.

References

[Conductor] https://netflix.github.io/conductor/

[IDDD] Vaughn Vernon. Implementing Domain-Driven Design. Boston, MA: Addison-Wesley, 2013.

[Reactive] https://www.reactivemanifesto.org/

[VLINGO-XOOM] https://vlingo.io
