Building clean and uncompromised Monoliths is not a pipe dream, and is the best choice for a large number of systems. It’s also not as easy to achieve as 1-2-3. It requires thought, skill, discipline, and determination—much the same mindset needed when creating Microservices.
Note
The running case study introduced in Chapter 1, and carried throughout this book, is continued in this chapter. If some context within the examples seems to be missing, please see previous chapters.
To lead off, we’ll provide a recap of why Monoliths are a good choice in many cases, at least early on, and how Monolithic architectures can be created effectively by using the previously defined strategic toolset. Organizations should learn to architect, design, and construct Monoliths that meet their business strategic goals and that are maintainable and extensible over a long lifetime.
There are three primary concerns and goals with Monoliths:
1. Getting a Monolith right from the start and keeping it that way
2. Getting a Monolith right after it was previously done wrong
3. Getting from Monolith to Microservices
Today, most organizations are pursuing the third concern and goal due to their need to deal with Monoliths done wrong. It’s not the authors’ place to say that the third concern is the wrong goal, but stakeholders should be willing to consider that it might not be necessary or even the best target to aim for. Even assuming that a Monolith-to-Microservices transition is the correct ultimate goal, it’s possible that focusing on the second concern first is the better initial strategy.
Understanding how to succeed when dealing with the second concern is best achieved by observing how to successfully accomplish the first goal. As an analogy, consider people who are trained to recognize counterfeit money. They don’t learn about every possible example of counterfeit money, partly because it’s impossible to know all of those and the unending attempts at new ways to counterfeit. Instead, they learn everything about the authentic currency by practicing: “touch, tilt, look at, look through.” Those trained in these ways are then capable of detecting all sorts of attempts at counterfeiting. They can explain everything that is wrong about the counterfeit and how it would appear if authentic.
Following this reasoning, the first and second listed goals are addressed in that order in this chapter. Chapter 11, “Monolith to Microservices Like a Boss,” delves into the third goal, by examining two distinct approaches to this transition. Before jumping in, it’s fair to provide a brief historical overview starting at around 20 years before this writing.
It’s only proper to provide a historical perspective of the industry in the early 21st century, because it’s unfair to judge past decisions without understanding the influences of the time the decisions were made. When NuCoverage first started on its journey in 2007, the “software is eating the world” declaration was not yet made. That was still four years in the future. Even before 2001, software had taken on a different role due to the dot-com boom. Although the fledgling global Web had not matured far past brochureware and e-commerce storefronts, it was starting to hit its stride.
By 2001, Software as a Service (SaaS) 1.0 had mostly failed due to overambitious business plans and a lack of experience in developing enterprise-grade systems, the expense of operating on premises (because the cloud was not available), and—most especially—desktop suite lock-in. At that time there was buzz around “intranets,” but software was still for the most part an augmentation to human decision making. It was rare to conduct business on the Web, other than to buy from an e-commerce store.
Make no mistake about it: Humans were still carrying a lot of the cognitive load in day-to-day business. Software was responsible for managing the large volume of data that was impervious to human use without augmentation around specific use cases. Business was still being conducted by phone and email. Such business workers had what was sold as software to help deal with their tasks. Commercial enterprise office products, such as SharePoint, were used by low-tech knowledge workers to tuck away a lot of indispensable email. These workers, who possessed business knowledge but far too few technical skills for that sort of responsibility, would hack document management repositories and feed their emails and attachments to these disorganized, quasi–“data warehouses.” After only a few months of archiving preexisting emails along with new ones, the warehouses became nearly unusable, and certainly far less organized than many paper filing systems. They were actually glorified network filesystems with no rules or constraints on hierarchies or naming, where finding things was attempted with poor-quality search tools, resulting in many false positives and few true ones. Businesses could forget about attempting to gain business intelligence from that data unless they undertook a major initiative to perform scrubbing, restructuring, and reorganizing, so as to mold the data into useful pools of potential knowledge rather than a usability crapshoot.
Trial-and-error approaches to “office automation” weren’t isolated to end-user–designed ticky-tacky document management systems. Considering the predominant software development and production platforms of the era is also eye-opening. From the late 1990s until perhaps 2004, the Java world was ruled by J2EE. Between 2003 and 2006, the Spring Framework began to disrupt J2EE, even becoming mixed together with the facilities of J2EE application servers by application architects.
On the other side of the divide was the .NET Framework, which arrived in 2002. By 2004, claims that .NET was overtaking J2EE in popularity abounded, yet the reality seemed to be somewhat less than that. One of the better .NET improvements over J2EE was to leave out anything resembling Enterprise JavaBeans (EJB). EJB Entity Beans were an utter disaster, and never truly delivered a convincing solution for database entity objects. Clued-in Java developers quickly realized that TOPLink and the later-arriving Hibernate provided a far superior object-persistence experience. Those who stuck with the Entity Bean bandwagon stumbled many times. On the .NET side, the Entity Framework arrived in 2008 but was met with great disappointment. Odd mapping rules forced impractical object designs, and it was several years before these difficulties were overcome without a lot of help from advanced application developers.
Given the state of the tech world around 2007, NuCoverage did well to bridle its ambitions by initially going no further than building software that assisted underwriters in issuing policies and claims adjusters in determining correct loss coverages. There was no Web-facing application submission or virtual handshake upon issuing a policy or covering a claim. This conservative approach got the company out of the gate, where it could make fast strides toward making a profit while enhancing its system as it learned more through experience.
The problem was that the NuCoverage software teams didn’t possess a good understanding of and appreciation for software architecture, or any clue about modularizing their different business capabilities. This gap would have been observable very early on to wiser, more experienced heads. As a result, the team gradually faced the reality of crippling debt and entropy as time advanced. It didn’t take long for the initial lumps of mud to take a toll. Unsurprisingly, the unrecognized and unpaid debt, piling up year after year, led to layer of mud upon layer of mud, until the Big Ball of Mud Monolith illustrated in Figure 10.1 had NuCoverage delivering fixes and new functionality on a creepingly slow conveyor belt.
With an understanding of software architecture that employs modularization of business capabilities, how could the NuCoverage enterprise have been shaped, even as a Monolith, to be welcoming to many years of change?
Look beyond technologies and frameworks. Throwing SharePoint, Enterprise Java, Entity Framework, JPA, database products, and message buses at the problem domain will never spare teams from the need to think. Even while the crazy technology failures were being thrust at every software IT department, CIO, CTO, and developer, there were reasonable approaches and guidelines that could have actually helped.
Domain-Driven Design (DDD) was introduced by 2003 and the Ports and Adapters architecture by 2004. Even earlier, the Layers architecture was available from the first volume of Patterns of Software Architecture [POSA1]. Extreme Programming [XP] existed well before DDD, as did the Agile Manifesto [Manifesto]. Several ideas found in DDD bear a striking resemblance to XP and the Agile Manifesto. There was a clear reference to organizational (business) capabilities by 2004 [BC-HBR]. Additionally, the ways of object-oriented design [OOD] and programming [OOP], domain modeling [Domain-Modeling], and class responsibilities and collaborations [CRC] were being used long before NuCoverage entered the fray. Several of the ideas, patterns, and tools in existence by then had the opportunity to influence architecture and development in 2007. Yet, that doesn’t mean that they were used, and observing the results of systems developed from the past and in the present, it’s clear that to a large degree they still aren’t.
Don’t underestimate the importance of hiring top software architects and developers. Do top businesses look for bargains in C-level executives? Then why shop for bottom-shelf software developers when serious practitioners can be recruited? Understand whom to recruit—that is, the right people with the right skills. Be aware that expertise in software engineering, although different, is as important as expertise among executive management.
Even though NuCoverage might have favored the idea that greatly minimizing software development costs would be to its advantage, it definitely wasn’t. Taking conservative first steps for an early release of a minimal system doesn’t justify poor-quality architecture and design. In fact, quite the opposite is justifiable. Early sound, tested architecture and design would have prepared the system’s codebase for continued development of progressively advanced functionality, and actually facilitated much more rapid deliveries over the long haul. First and foremost, attention must be given to commissioning architects and lead engineers to guard architecture and code quality with avid, hawklike awareness and enthusiasm, as their unending priority.
Consider what a 2007 reboot of NuCoverage could have yielded. The context of 2007 must be kept in mind when reading the following descriptions. To be even more explicit, the following discussion does not represent the contemporary context—that is, when WellBank approaches NuCoverage. Instead, these events take place well before then, when NuCoverage is in startup mode.
Recall that a business capability defines what a business does. This implies that, while a business might be reorganized into a completely new structure, the new structure doesn’t change its business capabilities. The business, for the most part, still does what it did before the new structure came along. Of course, the restructuring might have been undertaken for the purpose of enabling new business capabilities, but no business capabilities that were profitable the day before the restructuring have been jettisoned by the day after. This reinforces the point that it is generally best to define a software model’s communication boundaries according to the business capability that it implements.
In Figure 10.1, three business capabilities are apparent: Underwriting, Claims, and Billing. They’re obvious choices for an insurance company. Yet, there are other necessary capabilities that must also be discovered and identified by the NuCoverage startup team.
As shown in Table 10.1, the NuCoverage startup team collectively identified their business capabilities. This required walk-throughs of concrete usage scenarios, explored in unified conversations, to develop a shared understanding of the business goals. Each capability is described in Table 10.1 in terms of its initial purpose and implementation.
Figure 10.2 shows the Monolith container with eight Bounded Contexts modules, each representing a business capability. The business capability type, both in Table 10.1 and Figure 10.2, indicates the level of strategic value of each.
At the present time, the Risk business capability is definitely a Core Domain. Because of the urgency to ensure high business value and integrated interactive operations, there are initially four different business capabilities with core value. Underwriters perform their essential workflows and garner valuable decision-making information while utilizing these four core contexts: Underwriting, Risk, Rate, and Renewals. Not surprisingly, underwriters have a lot to say about their workflows and the guiding environment, treating the four core business capabilities as an underwriting product suite.
Over time, the core value will tend to shift. For example, as Underwriting becomes more automated through continuous improvements in the Risk and Rate functions, Underwriting will transition to a supporting role. The same will be true of Renewals. There will be new Core Domains added.
A few years after the successful launch of the minimum platform and subsequent continuous improvements, another business capability arises—namely, Rewards. As explained in Chapter 5, “Contextual Expertise,” the Safe Driver Reward was the initial policyholder reward introduced. Initially, it was considered a simple value in the Policyholder Accounts capability. Although this was not a favorable long-term decision, it would suffice until business priorities would drive out the additional Rewards capability.
After identifying business capabilities, NuCoverage now needs to make a few architectural decisions. For example, how will users of various kinds interact with the platform? It might seem obvious that a Web-based user interface will be used. Even so, there is a growing need for devices that support employees’ job duties, which require mobility and convenience when those employees are on the go. Is there a sound way to propose, track, and ultimately settle the architectural decisions that are needed—including how these user-driven mechanisms and others will be provided—by means of eyes-wide-open adoption?
The teams rely on Architecture Decision Records (ADRs) to define, propose, track, and implement prominent architectural choices. Listing 10.1 provides three examples. The ADR approach to decision making was described in Chapter 2, “Essential Strategic Learning Tools.” These examples illustrate pertinent decisions made by the Auto Insurance System teams.
Title: ADR 001: REST Request-Response for Desktop User Interfaces
Status: Accepted
Context: Support Web-based user interfaces with REST
Decision: Use Web standards for desktop clients
Consequences:
  Advantages: HTTP; Scale; Inexpensive for experiments
  Disadvantages: Unsuitable for most mobile devices

Title: ADR 002: Use Native Device Development for Mobile Apps UI
Status: Accepted
Context: Support iOS and Android toolkits for mobile apps
Decision: Use iOS and Android standard toolkits for mobile apps
Consequences:
  Advantages: Native look and feel
  Disadvantages: Multiple device types, form factors, languages, toolkits; slow development

Title: ADR 003: Platform Message Exchange
Status: Accepted
Context: Collaborating subsystems exchange commands, events, and queries
Decision: Use RabbitMQ for reliable message exchanges
Consequences:
  Advantages: Throughput; Scale; Polyglot; FOSS; Support available
  Disadvantages: Stability?; Latency vs. in-memory transport?; Support quality?; Operational complexity
Note
In the discussions that follow, there are several references to various architectures and related patterns, such as REST, messaging, and event-driven architectures. See Chapters 8 and 9 for discussions of these concepts.
The eight Bounded Contexts in Figure 10.3 correspond one-to-one with those shown in Figure 10.2. Highlighted in Figure 10.3 are the architectures in use—namely, Ports and Adapters; in Figure 10.2, the emphasis is on the modules used to separate each Bounded Context from the others. In Figure 10.3, it might look as if each Bounded Context has an almost identical architectural implementation, but this is drawn only symbolically. In actuality, the various Bounded Contexts could share some resemblance due to the nature of Ports and Adapters. However, the application part inside the architecture, possibly with a domain model, would be implemented differently depending on the need to deal with different levels of complexity. See Part II, “Driving Business Innovation,” and Part III, “Events-First Architecture,” for more detailed examples.
Note that the displays of the user interfaces are not shown in Figure 10.3 for the sake of simplicity, so it appears as if users are interacting directly with adapters. Also, the users would usually be shown strictly on the left of each subsystem. Here the architectures are “rotated” for the convenience of showing users surrounding the system and playing roles in multiple subsystems.
The results of the ADRs can be identified in Figure 10.3. ADR 001 and its REST request–response architecture, as well as ADR 002 for the user interfaces of mobile apps, are recognized in the user interactions with the Monolith. Furthermore, the outcome of ADR 003 is seen in the exchange of messages on the Message Bus (or broker). It is superimposed through the center of the Monolith to represent the means by which all Bounded Contexts collaborate and integrate inside the Monolith. The NuCoverage software development team was impressed by the early releases of RabbitMQ that occurred in the same year as the founding of NuCoverage.
Essential to understanding and properly implementing this Monolithic architecture is a perfectly clean separation between every Bounded Context and the others. In Figure 10.3, this is represented by all contexts using the Message Bus for inter-context communication. Yet, at the boundaries of each context there must be adapters, as explained in Chapter 8, that adapt all input to and output from each Bounded Context in a way that suits the integration situation. As explained in Chapter 6, it’s vital to recognize any upstream–downstream relationships. When one Bounded Context is downstream from another, the downstream context must translate its own language to that of the upstream context. This happens before a message (event, command, query) is placed on the Message Bus. Thus, the coupling between contexts is maintained with the proper directionality, which in most cases should be unidirectional.
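To make the boundary translation concrete, here is a minimal Java sketch that is not taken from the case study: a hypothetical outbound adapter in the downstream Underwriting Context translates its own ApplicantProfile concept into an assumed AssessRisk command in the upstream Risk Context’s language before handing it to an assumed MessageBus abstraction over the broker.

// All types here are illustrative assumptions, not the case study's actual model.
interface MessageBus {
    void send(String channel, Object message);
}

record ApplicationId(String value) {}
record ApplicantProfile(ApplicationId applicationId, int driverAge, int priorClaims) {}

// Assumed published language of the upstream Risk Context.
record AssessRisk(String applicationId, int driverAge, int priorClaims) {}

// Outbound adapter at the boundary of the downstream Underwriting Context.
final class RiskAssessmentRequestAdapter {
    private final MessageBus messageBus;

    RiskAssessmentRequestAdapter(MessageBus messageBus) {
        this.messageBus = messageBus;
    }

    // Translate the downstream context's own language into the upstream
    // context's language before the command is placed on the Message Bus.
    void requestRiskAssessment(ApplicantProfile profile) {
        var command = new AssessRisk(
                profile.applicationId().value(),
                profile.driverAge(),
                profile.priorClaims());
        messageBus.send("risk.commands", command);
    }
}

Keeping this translation in the adapter, rather than letting Risk Context types leak into the Underwriting model, is what preserves the unidirectional coupling described above.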
Don’t conclude that Figure 10.3 implies there is only one correct way to communicate between contexts—that is, through reliable (durable messages with at-least-once delivery semantics), asynchronous messaging. The same effect could be accomplished by using inter-context request–response APIs that honor upstream–downstream relationships. Such APIs can even be executed asynchronously, and be fully event-driven.
Even so, if a business chooses to use request–response APIs rather than messaging APIs via a message bus, it must understand that this might impact the smooth transition to future architectural decisions. At such a point, any collaboration and integration between Bounded Contexts will need to deal with failures that are common to distributed systems. As explained in Chapter 11, “Monolith to Microservices Like a Boss,” network, server, and additional infrastructure failures will have potentially catastrophic consequences in the face of naive implementations. Be prepared to implement failure recovery schemes.
Generally speaking, by implementing a message bus/broker, some of these problems can be alleviated “out of the box” because of the temporal decoupling. The benefit derives from designing in latency tolerance rather than relying on Service Level Agreements (SLAs) that are based on time-limited request–response communications. When many messages have queued in a message bus/broker and require delivery, receivers can be overwhelmed by large bursts of messages within a short time frame. Using Reactive Streams can help here because it supports what is known as backpressure, which provides a way for receivers to set limits on deliveries within a processing window. Of course, it’s important to select a resilient mechanism in the first place. Both message producers and consumers can suffer temporary loss of live connections to the messaging mechanism, as well as other failures such as broker leadership changes. Overcoming these challenges by means of retries with persisted data, however, is far simpler than trying to deal with the problems associated with accessing several temporally coupled services’ REST APIs. Because messaging tends to happen out-of-band, users are usually unaware of temporary failures, whereas using the APIs of temporally coupled services most often puts the problem “in the user’s face.”
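As a sketch of what backpressure looks like in code, the following subscriber uses the JDK’s java.util.concurrent.Flow API, one expression of the Reactive Streams specification; the message type and batch size are illustrative assumptions.

import java.util.concurrent.Flow;

// A receiver that bounds deliveries by requesting a fixed-size batch at a time,
// instead of accepting an unbounded burst of queued messages.
final class ClaimMessageSubscriber implements Flow.Subscriber<String> {
    private static final int BATCH_SIZE = 32; // illustrative processing window
    private Flow.Subscription subscription;
    private int receivedInBatch;

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        subscription.request(BATCH_SIZE); // backpressure: ask only for what can be handled
    }

    @Override
    public void onNext(String message) {
        // ... process the message within this Bounded Context ...
        if (++receivedInBatch == BATCH_SIZE) {
            receivedInBatch = 0;
            subscription.request(BATCH_SIZE); // request the next bounded batch
        }
    }

    @Override
    public void onError(Throwable failure) {
        // recover by retrying with persisted data, as described above
    }

    @Override
    public void onComplete() { }
}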
To be clear, each Bounded Context should own its own database(s), and those must not be directly shared or accessed from outside the Bounded Context. No well-designed context with purposeful architecture should directly share its database with any other system-level contexts or any legacy systems outside that context. This is true of any storage at all, as well as any queuing mechanisms meant only for internal use. The only means of integration with such a context is through its public-facing APIs, which may be by means of REST, RPC, and/or messaging. See Parts II and III of this book for more information.
Sometimes it will not be practical to allocate a completely separate database instance for each Bounded Context when creating a Monolith. Depending on the database product(s) in use, there might be ways to create a single database with multiple schemas (e.g., Postgres and Oracle). With other database products, using a single database instance but with context-specific tables might be a possible solution. In both of these database designs, a critical constraint is to provide unique user accounts and credentials to prevent easy access to context-owned database resources that must be protected from direct use by other contexts. The database resources that belong to a Bounded Context must be rendered virtually invisible to any resources located outside the context. Because several or many Bounded Contexts would share a single database, all of the scale and performance consequences of many connections and simultaneous operations may occur.
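One possible shape of that constraint, sketched with plain JDBC and PostgreSQL’s currentSchema connection parameter: the database name, schema, account, and environment variable below are illustrative assumptions, and the grants restricting the account to its own schema are presumed to be applied separately by the DBA.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// The Underwriting Context connects with its own account and default schema,
// so it cannot casually reach tables owned by other contexts.
final class UnderwritingConnections {
    private static final String URL =
            "jdbc:postgresql://localhost:5432/ais?currentSchema=underwriting";

    static Connection open() throws SQLException {
        return DriverManager.getConnection(
                URL,
                "underwriting_svc",                          // account granted only this schema
                System.getenv("UNDERWRITING_DB_PASSWORD"));  // assumed deployment secret
    }
}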
Inside the Bounded Context, in the application layer and in the domain model at the center, strictly employing the following tactical patterns will keep both object and temporal coupling to a minimum (a brief sketch follows the list):
▪ Modules
▪ Aggregate-Based Entities
▪ Domain Events
▪ Domain Services
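As a taste of how these patterns cooperate inside a context, consider the following minimal sketch; the Policy Aggregate, its renewThrough behavior, and the PolicyRenewed event are illustrative assumptions rather than the case study’s actual model.

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Domain Event: a record of something meaningful that happened in the model.
record PolicyRenewed(String policyId, LocalDate newExpiration) {}

// Aggregate-based Entity: a behavioral method enforces the invariant and
// records a Domain Event, rather than exposing bare setters.
final class Policy {
    private final String policyId;
    private LocalDate expiresOn;
    private final List<Object> domainEvents = new ArrayList<>();

    Policy(String policyId, LocalDate expiresOn) {
        this.policyId = policyId;
        this.expiresOn = expiresOn;
    }

    void renewThrough(LocalDate newExpiration) {
        if (!newExpiration.isAfter(expiresOn)) {
            throw new IllegalArgumentException("Renewal must extend the policy");
        }
        this.expiresOn = newExpiration;
        domainEvents.add(new PolicyRenewed(policyId, newExpiration));
    }

    List<Object> domainEvents() {
        return List.copyOf(domainEvents);
    }
}

A Domain Service would coordinate behavior that doesn’t belong to a single Aggregate, and Modules keep all of this inside the owning context.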
The focus of this book is purposely constrained to strategy; it is not meant to discuss full implementation of such a system. Our follow-up book, Implementing Strategic Monoliths and Microservices (Vernon & Jaskuła, Addison-Wesley, forthcoming), describes these and other implementation tools and techniques in full detail.
The timeline presented in this section is different from the one just described. This one began in 2007, but the effort resulted in the entire Auto Insurance System becoming a Big Ball of Mud of frightening size. Now, 14 years later, NuCoverage has to turn this enormous wrong into a right. In this case, the task will be accomplished as Monolith-to-Monolith refactoring rather than a Monolith-to-Microservices transition. Chapter 11 describes the leap from Monolith to Microservices in greater detail.
Chapter 1, “Business Goals and Digital Transformation,” and Chapter 2, “Essential Strategic Learning Tools,” discussed the reasons why software goes wrong. It’s not practical to try to identify every possible reason for poorly implemented software. Most times, the software implementation simply starts out that way. Less often, it starts out well and then drifts into chaos as time changes the team dynamics that instituted sound architecture and effective design. In whatever ways it went wrong, transitioning to right is the new first order of business.
Some of the common ways that component source code exhibits chaos are as follows:
▪ Technical rather than strategic business focus
▪ Undisciplined, haphazard structure (the unarchitecture)
▪ Lack of business-focused modularity; minimal modules that are technically motivated
▪ No unit tests; some large, slow, inter-layer integration tests
▪ Anemic model; CRUD-focused application
▪ Business logic lost in the user interface, and over multiple layers
▪ Large numbers of program source files within a single, technical module
▪ Deeply tangled coupling across many component source files (classes)
▪ Strong, bidirectional coupling across modules
▪ No separation of concerns; single concerns separated across multiple layers (both can exist together)
No doubt this list could go on, but we note that these are the common “big-ticket” items. In other words, these problems carry a high cost in terms of maintaining the existing code and making any attempts to correct the wrong.
First consider a few of the highest-level problems: technical motivation, poor structure, and lack of meaningful modularization. Listing 10.2 provides a typical modularity failure situation.1 This module structure and the module names might not be catastrophic if that structure hosted only a single, well-defined business communication context—that is, the realization of a single business capability. But it doesn’t, and even if it did, this modular structure is nearly useless.
1 Note the meaningless “ais” module nomenclature chosen, which stands for Auto Insurance System. Yet, that’s the least of the naming problems for this system’s modules. Further, architected and designed properly, a business-centric codebase should never require components known as “helpers” and “utils.”
nucoverage.ais.controller
nucoverage.ais.dao
nucoverage.ais.dto
nucoverage.ais.endpoint
nucoverage.ais.entity
nucoverage.ais.helper
nucoverage.ais.repository
nucoverage.ais.service
nucoverage.ais.util
This is, in fact, the kind of module structure that would likely be used to “organize” the entire Monolithic Auto Insurance System. Think about that: eight major business functions all tucked away in a set of modules that have zero business meaning. There would be quite literally hundreds of component source files (e.g., classes) in many, if not all, of the individual modules. Only tacit knowledge—employing an elephant-like memory—of the contents of each module could lead to survival of the mayhem within. Those not capable of leaning on their own memory will tax those who do with their constant barrage of questions. But really, mayhem? The inter-module coupling in multiple directions wasn’t even mentioned, but it plays a big role in any effort to scoop and reshape meaningful components from a Big Ball of Mud. Yes, mayhem is definitely the right word.
Still, correcting the problems is possible. Where does a team begin to take corrective action, and what steps should be followed throughout the journey?
One thing that likely happens on a daily basis is change; that is, change happens daily just to keep the system running by means of expedient bug fixes. That’s the norm. Although this seems like a relentless foe, it might actually be a friend when more than one kind of change is needed and the team decides to slow down just a bit. As patch fixes occur, the team can take a bit of extra time to initiate and continue the refactorings leading to cleanup.
One of the first means of corrective action is to add tests to the code that must change due to business drivers, such as for making patches and other bug fixes. Create a corresponding test every time a patch or any other change is made. Consider the following steps to be repeated continuously:
Step 1. Add tests to verify the ultimate correctness of code that is being changed, and then fix business-driven bugs.
a. First create tests that fail due to the bug being fixed.
b. Generally at the beginning of such efforts, the best things to test are not stand-alone units, but rather coarse-grained integrated layers. For example, given a Service Layer backed by an anemic model, creating tests against the Service Layer makes most sense because testing an anemic model is nearly useless. Tests against an anemic model would test only that attribute/property setters and getters work correctly, which are almost impossible to get wrong, especially when they are generated by an IDE as is typical. Testing that the Service Layer sets the expected attributes/properties is useful at the early stages of this effort.
c. Correct the code in error so that the test passes. Commit the changes to the primary codebase.
Step 2. After the tests are in place and the fixes are made, immediately experiment with modularity and attempt to move related business logic from the Service Layer code to the model.
a. Is it possible to relocate the code with current bug fixes into a new business-centric module structure? If so, take that step. Test that the refactoring doesn’t cause regression. It might be necessary to add one or a few more tests to be certain. Create new tests for the relocated code by refactoring relevant code out of the previous tests and into the new tests.
b. When a Service Layer fronts an anemic model, locate refactoring opportunities in the Service Layer methods. Typically the Service Layer uses a series of Entity attribute/property setters. These setters can be aggregated into a single, behavioral method on the Entity. To do so, migrate the setter invocations to the new method on the model Entity and have the Service Layer call that new Entity method. The name of the new method should reflect the Ubiquitous Language in context. Test before and after, using tests on both the Service Layer and the model, to ensure that the components are healing from anemia. (A sketch following these steps illustrates this refactoring.)
c. If other convenient, quick-win opportunities arise to refactor code that is nearby the code being fixed, jump in and make it right. These efforts could focus on additional modularizing of code and/or factoring the Service Layer business logic into the model. Remain somewhat conservative so that this work doesn’t result in regression. All such changes should always be tested. These quick-wins must not require hours of labor, only minutes here and there.
d. As each change is made and all tests pass, commit the test and main code changes to the primary codebase.
Step 3. When additional changes are required to fix bugs or add features in code that has already received some care, take the opportunity to do more refactoring.
a. Create preparatory tests for all of the following refactorings.
b. Modularize other source files (e.g., classes) related to the ones that were previously changed but were left in the old modules because they didn’t require change at the time.
c. Factor additional business logic from the Service Layer into the model; that is, when the model is anemic, the only chance of finding business logic is in the Service Layer.
d. As each change is made and all tests pass, commit the test and main code changes to the primary codebase.
Step 4. When a Service Layer takes several parameters that will be used for setting data on Entities, refactor all related parameters into their respective Value Object types. Pass the Value Object types into the Entity behavioral methods that were previously introduced upon aggregating Entity setters into a single behavioral method on an Entity.
Step 5. As the codebase becomes more stable and there are fewer bugs to fix expediently on a daily basis, use the time to refactor as defined in the previous steps. At this point the improvements will tend to accelerate because the team has grown in experience and confidence.
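To illustrate Steps 1, 2, and 4 together, here is a minimal before-and-after sketch; the Claim Entity, the LossCoverage Value Object, and the test are illustrative assumptions, with JUnit 5 assumed to be on the classpath.

import java.math.BigDecimal;
import java.time.LocalDate;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Before: the Service Layer drove a series of setters on an anemic Claim:
//   claim.setStatus("APPROVED");
//   claim.setApprovedAmount(amount);
//   claim.setApprovedOn(LocalDate.now());

// Step 4: related Service Layer parameters aggregated into a Value Object.
record LossCoverage(BigDecimal approvedAmount, LocalDate approvedOn) {}

// Step 2: the setter invocations migrated into one behavioral method whose
// name reflects the Ubiquitous Language.
final class Claim {
    private String status = "SUBMITTED";
    private LossCoverage coverage;

    void approveCoverage(LossCoverage coverage) {
        this.status = "APPROVED";
        this.coverage = coverage;
    }

    String status() { return status; }
    LossCoverage coverage() { return coverage; }
}

// Step 1: a coarse-grained test, first written against the Service Layer and
// later refactored to exercise the relocated Entity behavior directly.
class ClaimApprovalTest {
    @Test
    void approvingCoverageSetsExpectedState() {
        var claim = new Claim();
        claim.approveCoverage(new LossCoverage(new BigDecimal("2500.00"), LocalDate.now()));
        assertEquals("APPROVED", claim.status());
    }
}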
Consider a reasonable module structure for the eight Bounded Contexts of the Auto Insurance System built as a Monolith. As seen in Table 10.2, each of the Bounded Contexts has a corresponding module just under the NuCoverage company identifier. Every submodule under the context-identifying module addresses a specific architectural concern.
In Table 10.2, the submodules of every context module are not shown. Instead, the submodules of the Underwriting context module are shown; they are representative of the common submodules to be found in the other context modules. There are likely to be other submodules, especially within the model of each context. There might be other submodules inside the infrastructure module. For instance, if gRPC is used, there would be an infrastructure.rpc submodule.
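By way of contrast with Listing 10.2, the business-centric structure of Table 10.2 might look something like the following for the Underwriting Context; only the model, infrastructure, and infrastructure.rpc submodules are named in the text, so the remaining submodule names are illustrative assumptions.

nucoverage.underwriting
nucoverage.underwriting.application
nucoverage.underwriting.model
nucoverage.underwriting.infrastructure
nucoverage.underwriting.infrastructure.messaging
nucoverage.underwriting.infrastructure.persistence
nucoverage.underwriting.infrastructure.rpc

Each of the other context modules would mirror this shape under its own context-identifying name.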
Although it might be tempting to immediately break this Monolith into separate executable components, such as Java JAR files or Windows DLL files, it’s probably best to leave the entire system in a single source project and single executable component for a time. The reasons are explained in the next section, “Break the Coupling.”
Working in this way with a gradually but continuously improving codebase for a number of months, and possibly fewer months than anticipated, will take the code from a muddy mess to a much more readable, changeable, and stable version of itself.
A friend of one of the authors, experienced in the construction industry, asserts that it requires one-tenth or less of the amount of time to tear down a building structure in an orderly fashion as was required to build it. The purpose is not merely to destroy the building, but to reuse all of its significant parts. Software construction is not much like the building construction industry, but this assertion might well provide a clue to the orderly refactoring and restructuring of a large system codebase with the intention to significantly reuse its code. Although the codebase has been a liability up to this point, it is possible that existing code is simpler to reshape than it is to newly construct. If it required 10 years to reach a position of deep technical debt, it could require as much as one year for a team to work their way out of that debt by using the stepwise approach to improvement we just described. Assuming a team has the experience to skillfully address this difficult situation, it certainly won’t require 10 years to reach a vastly improved state. At a minimum, one-tenth of the original build time frame is a decent goal to set for this refactoring, and the team could potentially even outperform it by a considerable margin.
There’s still a major challenge to overcome. So far, the team has avoided breaking the strong coupling between most components. That challenge is the most difficult part of successfully chipping away at change to reach a fully refactored system codebase. Breaking the strongly coupled components into loosely coupled or completely decoupled components is hard work. In reality, tight coupling in the Big Ball of Mud is likely one of the biggest reasons for the mysterious bugs that have undermined system quality and plagued the development team. These have been the most difficult kinds of bugs to track down. What is more, some such bugs in the midst of a wildly confusing Big Ball of Mud might never be completely understood. The root cause of the relentless system failings must change.
Consider what’s happened to this point. All components of a specific type (Entities, for example) that were previously in one very large module are now spread across several contextual modules. Sure, housing components in their contextually correct modules is a desired effect. What isn’t desired is the coupling that still exists between the components. This has been purposely left unaddressed because it’s the most difficult of refactorings to deal with, and it may be very disruptive to a large team of developers. It’s easy to damage the stability of the system and interrupt previously good progress when working through decoupling efforts. That’s the bad kind of easy.
Because there is still coupling between contextual modules, it might be best to use a mono-repo for source code revision control for the time being. Some even prefer a mono-repo on a long-term basis. Initially, this might help reduce the complexity of maintaining dependencies across executable components, such as Java JAR files and Windows DLL files.
As Figure 10.4 shows, the legacy Policy, Claim, and Policyholder were previously under a single overly crowded module known as NuCoverage.ais.entity, along with all other Entities in the entire system. Now they are all relocated to a specific contextual module and submodule. Good. But examine the lines that indicate coupling between the Entities. Although the coupling was previously the result of major problems, it was nearly invisible to the developers unless they were specifically looking for it. Now it’s started to stick out like a sore thumb. Breaking these couplings apart is tedious work, but it can be readily accomplished.
The primary strategy for decoupling can be summed up in two rules:
1. For components that are coupled but have been relocated into separate contextual modules, make it a top priority to break the coupling during this step. Assuming that there are business rules across contextual modules that require consistency between components, employ eventual consistency (see the sketch following these rules).
2. For components that have landed in the same contextual module, prioritize these as least important to decouple for the time being. Assuming that there are business rules within the same contextual module that require consistency between components, employ immediate, transactional consistency. Address these as potential candidates for eventual consistency after the first-priority decouplings are achieved.
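As a sketch of the first rule, a component in one context can react to a Domain Event from another context and update its own state in its own transaction, achieving eventual rather than transactional consistency; every type name here is an illustrative assumption.

import java.util.HashSet;
import java.util.Set;

// Assumed event published by the Underwriting Context when a policy is issued.
record PolicyIssued(String policyId, String accountId) {}

final class Account {
    private final Set<String> policyIds = new HashSet<>();

    void recordIssuedPolicy(String policyId) {
        policyIds.add(policyId); // reference by identity only
    }
}

interface AccountRepository {
    Account accountOf(String accountId);
    void save(Account account);
}

// Handler inside the Policyholder Accounts Context: it catches up with the
// upstream fact some time after that transaction committed.
final class PolicyIssuedHandler {
    private final AccountRepository accounts;

    PolicyIssuedHandler(AccountRepository accounts) {
        this.accounts = accounts;
    }

    void when(PolicyIssued event) {
        Account account = accounts.accountOf(event.accountId());
        account.recordIssuedPolicy(event.policyId());
        accounts.save(account);
    }
}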
Addressing the first priority requires eliminating inter-context component direct coupling. To collaborate and integrate between contexts, there must be some coupling. Still, the kinds of coupling can be changed and the firmness of attachment greatly reduced.
As Figure 10.5 demonstrates, coupling is reduced in three ways:
▪ There is no longer transactional consistency of components in different contextual modules.
▪ Lack of transactional consistency means that temporal coupling is greatly reduced.
▪ There are no direct object references from components in one contextual module to those in other contextual modules. References are by identity only.
An example of reference by identity only is seen in Claim, which holds the policyId of the Policy against which the claim was filed. Also, an Account holds one or more policyId references for each of the policies issued to the policyholder. The same goes for each of the claims filed, with the Account holding a claimId for each. There might be other value data held along with the identity, such as a policy type being held along with its respective policyId.
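Expressed as a minimal sketch (the types are illustrative assumptions), the Claim holds only the Policy’s identity, possibly along with a small amount of value data such as the policy type, never a direct object reference into the Underwriting Context.

record PolicyId(String value) {}

final class Claim {
    private final String claimId;
    private final PolicyId policyId;  // identity only; no direct Policy reference
    private final String policyType;  // value data carried along with the identity

    Claim(String claimId, PolicyId policyId, String policyType) {
        this.claimId = claimId;
        this.policyId = policyId;
        this.policyType = policyType;
    }

    PolicyId policyId() { return policyId; }
    String policyType() { return policyType; }
}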
As discussed in Chapter 5, the Renewals Context requires a Policy type in its model. The Policy in the Underwriting Context is the original record of its issuance. Creating a Policy in the Renewals Context is not duplicating code or wrongly disregarding the Don’t Repeat Yourself (DRY) principle. The Renewals Context has a different definition for Policy, even though it is related to the original record of its issuance. That’s part of the purpose of Bounded Contexts. The explicit separation acknowledges and protects the differences from a confusing merger of two distinct languages into one language that is wrong for both uses. The best models reflect a team’s ability to sustain elevated communication in a context.
Another tedious challenge is to migrate existing database tables to support the new context boundaries for the new model Entity types. Breaking coupling apart on models will almost certainly require changes to the database.
When using object-relational mapping, one Entity usually references others using foreign keys, with both forward and reverse relationships. The foreign key constraints that cross contextual databases/schemas will be eliminated; those that remain will very likely exist only within the same database/schema. Depending on the kind of database in use, one or more columns might be needed to hold identities of associated Entities, or those might be embedded in a fully serialized Entity state. Reference-by-identity techniques are used both for Entities within the same context and for inter-context references.
Database migrations are common in enterprise applications, so they should not be viewed as perilous steps. Nevertheless, these migrations must be handled carefully, which further emphasizes the need to be conservative in carrying out these decoupling efforts as gradual, stepwise refinements. As explained in the section “Right from Wrong,” the team should create a separate database, database schema, or access-managed database tables for each newly defined context.
Next, the team will take on the second priority refactoring—that is, breaking up as much of the same-context coupling as possible. Using Aggregate (Entity transactional boundary) rules, decouple all Entities from the others unless they require transactional consistency to meet business rules.
All “helper” and “util” (or utilities) components can quite possibly be entirely eliminated from the legacy system. There should be no need, or very little need, for extraneous components. That lot is often created to house common code used across business components, such as to enforce rules, to validate and constrain state, and to run simple processes. These components should instead be primarily part of the domain model. If they cannot be placed entirely in Entity and Value Object types, then they would likely be amenable to handling by Domain Services. Otherwise, some “helper” and “util” components may remain or become part of the Service Layer (i.e., Application Services).
After all the organization’s determination and vigorous effort was focused on the goal of building a strategically significant modular Monolith, whether from the start or by means of laborious correction, it would be most disappointing for the good results to gradually slide into a Big Ball of Mud. This can happen when those who were committed to the original effort leave the project. Although some such departures are likely to happen, the organization should try to prevent a mass exodus.
On the one hand, system project departures might happen due to normal attrition. Competent and driven engineers want new challenges, and there might be no way to keep some of them on the team. When this occurs, the most experienced hands must be replaced with others who have similar strengths and skills.
On the other hand, oftentimes management will move the more experienced engineers to new projects, and backfill the hole they leave with less experienced engineers. This tends to happen when the well-designed system is considered to be “done” and enters “maintenance mode.” The problems with this approach are discussed in Chapter 2 in the section, “Getting Conway’s Law Right”—specifically at the bullet point, “Keep the team together.” Considering a system that solves complex problems to be “done” when it reaches version 1.0 or even 1.4 is generally the wrong mentality. Some of the most important insights into ways to differentiate value very likely lie ahead.
The very nature of new innovations in an existing large codebase means that significant architectural and model design changes will need to occur. A domain model concept could be split to form multiple new concepts, or two or more concepts could be joined into one, each with different transactional boundaries and new characteristics—and that means big changes will occur. These kinds of changes must not be left to less experienced developers without any architecture and design supervision. Getting these kinds of changes wrong could easily be followed by a slide down a slippery slope, ending in a new Big Ball of Mud.
Just as with any investment, watchful care is necessary to keep a strategic codebase in good order. A strategic software differentiator is worthy of continued investment and care to keep its dividends flowing for years to come.
This chapter considered the whys and hows of Monoliths as a choice of architecture. Monoliths should not be mistaken for a Big Ball of Mud. Monoliths are a means to achieve well-architected solutions from the start, or to mold them correctly after they were previously poorly architected. Business capabilities were considered again in this chapter to demonstrate their relationship to software models. Architecture Decision Records (ADRs) can help the team define, propose, track, and implement their major architectural choices. Guidance on how to avoid coupling between components was also presented. Strong coupling generally leads to a tangled Big Ball of Mud—something that should be avoided, and if not, eventually defeated. The chapter finished with advice on how to keep the strategic codebase in good order.
Here are the action items from this chapter:
▪ Not every system requires a Microservices architecture. Monoliths can be a viable alternative and architecture choice for many teams and enterprise situations.
▪ Business capabilities define what a business does, and typically do not change even if a business structure is reorganized.
▪ Understanding the importance of, and maintaining, a clean separation between every Bounded Context is key to properly implementing a Monolithic architecture.
▪ Transforming from a Big Ball of Mud to a Monolith requires a lot of business-minded strategic discipline, intelligent tactics, and patience.
▪ Event-driven architectures make coupling explicit, highlighting inter-context dependencies.
▪ Beware of maintenance mode, because it is often a trap that results in overlooking the strategic differentiating value that is still ahead.
Chapter 11 explores a few options for transitioning from Monoliths to Microservices. The first option explores how to get from a well-modularized Monolithic architecture to Microservices. The second option coerces a Big Ball of Mud Monolith into Microservices. The first option is relatively straightforward—but don’t plan on the second option being anything other than very difficult.
[BC-HBR] https://hbr.org/2004/06/capitalizing-on-capabilities
[CRC] https://en.wikipedia.org/wiki/Class-responsibility-collaboration_card
[Domain-Modeling] https://en.wikipedia.org/wiki/Domain_model
[Evolutionary] Neal Ford, Patrick Kua, and Rebecca Parsons. Building Evolutionary Architectures. Sebastopol, CA: O’Reilly Media, 2017.
[Manifesto] https://agilemanifesto.org/
[OOD] https://en.wikipedia.org/wiki/Object-oriented_design
[OOP] https://en.wikipedia.org/wiki/Object-oriented_programming
[POSA1] https://en.wikipedia.org/wiki/Pattern-Oriented_Software_Architecture
[XP] http://www.extremeprogramming.org/rules/customer.html