11
Intermission: An insurance policy for free

This chapter covers

  • A broken system where no parts were broken
  • Context mapping to understand what’s going on
  • Risk of myopic views of microservices

So far, we’ve covered lots of different ways to use design to make software more secure. We’ve collected designs from different areas, like cloud architecture, Domain-Driven Design (DDD), and reactive systems, where security wasn’t the original focus. The nice thing is that all these designs can be used in ways that increase security as a beneficial side effect. All in all, we’ve covered a lot of ground, and we’ll soon turn to applying these fundamentals to some different scenarios, such as legacy systems and microservices architectures. But before doing that, let’s take a quick break and look at how a system of systems can break without any specific part being broken. We’ll do that by examining a case study of a real-life system.

If you’re in a hurry, you can safely skip this chapter. Otherwise, hang on; it’s a fun story with some interesting details. This is a real-life story about how an insurance company came to give away policies without payment. It’s also about how that disaster could have been avoided.

Like many companies today, the company in question decided to split its monolithic system into several smaller parts, changing the architecture to more of a microservices style. Splitting the system was probably the right thing to do at the time, but some subtle yet important points were missed when the systems were developed separately. During this development, the meaning of the term payment shifted slightly, and separate teams came to interpret it in different ways. In the end, some systems thought the company had been paid the premium, when it had not.

One way to avoid this disaster would have been to model the different contexts more consciously and make the effort to draw out the context map in combination with active refactoring to more precise models. To make this happen, it’s important to bring together experts early on from all the adjacent domains and have them discover subtle issues together. This becomes even more important when developing a microservices architecture, where a fragmented understanding can have severe consequences.

We’ll take a brief look at a possible solution here and dive more deeply into these topics in later chapters (especially chapter 12 on legacy code and chapter 13 on the microservices architecture). This story serves as an introduction to those ideas, however.

Let’s start with the story of a good old-fashioned, brick-and-mortar insurance company and how it began its digital journey. The name and location of the company have been withheld for obvious reasons, and the details of the court case have been changed so that it can’t be traced.

11.1 Over-the-counter insurance policies

We’ll start at the beginning. The insurance company has been in business for quite some time. Historically, its main business has been providing insurance policies for real estate (housing as well as business) and cars. The company has always worked locally with a branch office in every city in the part of the state where it conducts business. All business is done over the counter. When a new customer signs up for a policy, they sign a document at the office and pay over the counter, and the company mails a policy letter proving a valid policy. Likewise, when a customer renews an insurance contract, they show up at the office and pay over the counter, and a policy renewal letter is mailed. In recent years, the back-office system has started using a print-and-mail service so the personnel at the branches don’t need to worry about printing and mailing. The system automatically mails the policy letter as soon as the payment is registered (figure 11.1).

figure11-01.png

Figure 11.1 When a payment is made, the insurance system mails a new policy letter.

Under the hood, the system couples a Payment with a new PolicyPeriod, which is what triggers a new mailing. This is all part of one codebase, developed and deployed as a monolith—but that’s soon to change. If you were to make a context map at this point, it’d look something like figure 11.2.

figure11-02.png

Figure 11.2 A context map showing the one single context of the monolith

Granted, this isn’t much of a map right now, but we’ll see how it evolves as our story unfolds.

11.2 Separating services

The team that develops the system grows over time. Many different policy plans need to be supported, and more are being added at regular intervals. At the same time, there’s a lot of functionality around finance: keeping track of payments from customers, reimbursements, payments to partners, such as car repair shops, and so on. Even though the team has grown, they feel overwhelmed by the amount of functionality to develop and support. Because of this, it’s decided to split the team and the system into two parts: one system for finance and one system for policies.

The finance team will keep track of payments, handle contracts with partners, and deal with reimbursements. The policy team will concentrate on the ever-increasing plenitude of policy variations. In this way, each smaller team will be better able to focus on its domain. If you draw the context map again now, you’ll see that it’s slightly more interesting than before the separation (figure 11.3).

figure11-03.eps

Figure 11.3 Payment for an insurance policy, across the two domains of finance and policy

The transition to separate the monolith into two systems goes pretty smoothly, perhaps because everybody is still familiar with the full domain and pretty clear about what’s a Payment and what’s a PolicyPeriod. But that’s also soon to change.

These systems depend on each other in many ways. One of the main connections is that when a Payment is made in the finance system, that event is reported to the policy system, which reacts by creating a new PolicyPeriod and, subsequently, automatically prints and mails a policy letter. It’s clear that the policy system reacts when the finance system registers a payment. The policy system is aware of Payment as a concept, but the team doesn’t need to understand the intricacies of how such a payment is made.
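The integration described above can be sketched in a few lines. This is only an illustration, not the company's code: the field names, the one-year period length, and the in-process subscription mechanism are all assumptions made for the example. What it shows is that the policy system reacts to a Payment event without knowing anything about how the payment was made.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Payment:
    """Event published by the finance system (field names are assumptions)."""
    policy_id: str
    amount_cents: int
    paid_on: date


@dataclass(frozen=True)
class PolicyPeriod:
    policy_id: str
    start: date
    end: date


class PolicySystem:
    """Reacts to Payment events; knows nothing about how the payment was made."""

    def __init__(self):
        self.periods = []
        self.mailed_letters = []

    def on_payment(self, event):
        # A reported payment starts a new period (one year is an assumption).
        period = PolicyPeriod(event.policy_id, event.paid_on,
                              event.paid_on + timedelta(days=365))
        self.periods.append(period)
        # Creating the period is what triggers the policy letter.
        self.mailed_letters.append(
            f"Policy {period.policy_id} valid until {period.end}")


class FinanceSystem:
    """Registers payments and reports each one to its subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def register_cash_payment(self, policy_id, amount_cents, paid_on):
        event = Payment(policy_id, amount_cents, paid_on)
        for handler in self.subscribers:
            handler(event)


finance = FinanceSystem()
policy = PolicySystem()
finance.subscribe(policy.on_payment)
finance.register_cash_payment("CAR-42", 120_00, date(2020, 1, 15))
print(policy.mailed_letters)
```

The only contract between the two systems is the Payment event itself. That keeps the teams independent, but it also means that any shift in what Payment means travels downstream unnoticed.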

As time goes by, the teams drift further and further apart. Keeping track of different financial flows seems to be work enough in and of itself, especially with the company signing new partner programs with both car repair shops and craftspeople, such as plumbers and carpenters. Keeping track of unusual policies is also a full-time occupation, with salespeople inventing combo deals and rebates to attract and keep customers. The two teams spend less and less time together and get less and less insight into each other’s domains. On a few occasions, they even get hostile when one of the teams changes some event that the other subscribes to and breaks the other team’s code. Still, things work out. But that’s also soon to change.

11.3 A new payment type

The organization continues to grow, and there are now two product managers: one for the policy system and the team working on it and one for the finance system and the team working on that. Each of the product managers governs their own backlog, and they communicate little. Management takes the lack of communication as a good sign—obviously, it has managed to cut the development organization into parts that can work independently of each other. But now a new payment type is introduced, and a fatal mistake is made.

At the top of the finance team’s backlog is a story to allow a new way of paying through bank giro instead of cash over the counter. Bank giro is a method of transferring money where the customer instructs their bank to transfer funds from one of their accounts to the insurance company without writing a check. The customer doesn’t even need to know the insurance company’s bank account number; instead, they use a special bank giro number. The insurance company can restructure its bank accounts or even change to a different bank without the customers needing to know or care.1 

The businesspeople at the insurance company settle the deal with the bank that provides the giro service. The finance system product manager adds a story called “implement bank giro payment” high on the backlog. Within a few sprints, it’s time for implementation, and the finance development team is provided with documentation on how to integrate with the giro payment system at the bank. They learn that with giro payments, there are three different messages that they can retrieve from the bank: Payment, Confirm, and Bounce. The documentation for Payment states that a giro payment has been registered. The finance team starts fetching these messages from the bank. When they receive a Payment message from the bank, they consider it to be a Payment in the finance system. Figure 11.4 shows what this looks like.

figure11-04.eps

Figure 11.4 A Payment message arrives from the bank and is interpreted as a Payment by the finance system.

It seems logical that a payment is a payment, but it’s crucial to pay attention to context, as the teams will find out. Doing system integration by matching strings to each other is a dangerous practice. Just as an order in the military isn’t the same as an order in a warehouse, the finance team will eventually realize that something that’s a payment in one context shouldn’t automatically be interpreted as a payment in another context. But to be honest, the documentation doesn’t help much (table 11.1), unless you already know the ins and outs of a bank giro transfer.

The finance team is careful not to disturb the integration point with the policy system. As we mentioned, there were some debacles when one team broke the other team’s system. The finance team takes care to send Payment messages to the policy team in the same way as before.

Table 11.1 Bank giro payment process
Message | Documentation | Spontaneous interpretation | What it means
Payment | Giro payment has been registered | OK, payment is ready. | No money has been transferred yet; we’ve registered that we’ll try to transfer it.
Confirm | Confirmation of payment processing | Oh? OK, whatever. | Money has been transferred.
Bounce | Confirmation still pending, will try again | No worries… | Failed to transfer money; if remaining attempts are zero, the failure is permanent.

In their system, the policy team continues to listen for Payment messages from finance. When they receive such a message, they create a corresponding PolicyPeriod, which triggers a new policy letter being mailed to the purchaser. They don’t know whether it’s a cash payment or a bank giro payment—and that’s the beauty of it, isn’t it? They can’t know and don’t need to know—separation of concerns in action. But there’s a catch. If you now draw the context map again, as shown in figure 11.5, you’ll see all three contexts and how they have been mapped to each other: the external bank context, the finance context, and the policy context.

figure11-05.eps

Figure 11.5 Mapping of a bank giro payment through the three domains: bank, finance, and policy

You might be familiar with both the bank giro domain and the insurance domain. In that case, you’ll see the subtle mistake that was made. A bank giro Payment from the external bank is mapped to a Payment of the internal finance system, which is then mapped to a PolicyPeriod. It seems natural enough—a payment is a payment. But due to two subtleties, one in the insurance domain and one in the bank giro domain, this approach isn’t sound.

The subtlety in the insurance domain is that insurance policies aren’t like most goods. If you buy a necklace, a seller might agree to get paid later. If you don’t pay on time, the seller can cancel the purchase and request to have the necklace back. This is customary trade practice. But for some goods, it doesn’t work (figure 11.6).

figure11-06.eps

Figure 11.6 For some goods, it makes sense to take them back if payment isn’t made; for others, not so much.

An insurance policy is a kind of good where it doesn’t work to cancel the purchase if the buyer doesn’t pay. The buyer will have already enjoyed the benefit of the policy in the meantime; you can’t take it back. Selling an insurance policy is like selling a lottery ticket; the seller must ensure the buyer pays for the ticket before the drawing, because few would afterward bother to pay for a ticket that turned out to be a loser. In the same way, who’d pay for a car insurance policy after a period when there were no accidents?

An insurance company (or a lottery ticket seller) could accept payment from a trustworthy, recurring customer in the form of an outstanding debt—a bond, debenture, or similar. In doing so, it would be trusting the customer to clear the debt later. But most insurance companies, including the one in our story, require customers to pay the money before the policy goes into effect. This leads to the second subtlety, the one in the bank giro domain, where a payment isn’t a payment (see figure 11.7).

figure11-07.eps

Figure 11.7 A bank giro payment isn’t what you expect it to be.

As you might recall, the Payment message from the bank means a giro payment has been registered, but this doesn’t mean that the money has been transferred. In bank giro lingo, it signals that a request for the payment has been made, and it’ll be processed at the appropriate time (for example, during the coming night’s large batch-job window). When the money is transferred, the bank giro payment is said to be complete and can be confirmed (hence the message Confirm). If the transfer can’t be completed, perhaps because of lack of funds in the paying account, the giro payment is said to have bounced and will typically be retried a couple of times (hence the message Bounce).
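One way to keep these meanings from getting lost is to write them down in code. The following is a minimal sketch: the three message names and the remaining-attempts counter come from the story, while the type names, the payment_ref field, and the two helper functions are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class GiroMessageType(Enum):
    PAYMENT = "Payment"  # request registered; no money has moved yet
    CONFIRM = "Confirm"  # transfer completed; the money has arrived
    BOUNCE = "Bounce"    # transfer failed; the bank may retry


@dataclass(frozen=True)
class GiroMessage:
    type: GiroMessageType
    payment_ref: str              # hypothetical reference to the giro payment
    remaining_attempts: int = 0   # only meaningful for BOUNCE


def money_received(message):
    """Only a Confirm means the company actually holds the money."""
    return message.type is GiroMessageType.CONFIRM


def permanently_failed(message):
    """A Bounce with no attempts left means the bank has given up."""
    return (message.type is GiroMessageType.BOUNCE
            and message.remaining_attempts == 0)


registered = GiroMessage(GiroMessageType.PAYMENT, "G-1")
retrying = GiroMessage(GiroMessageType.BOUNCE, "G-1", remaining_attempts=3)
gave_up = GiroMessage(GiroMessageType.BOUNCE, "G-1", remaining_attempts=0)
confirmed = GiroMessage(GiroMessageType.CONFIRM, "G-2")

print(money_received(registered), money_received(confirmed))
print(permanently_failed(retrying), permanently_failed(gave_up))
```

Note that a giro Payment message answers neither question with yes: the money hasn't arrived, and the transfer hasn't failed either. It's a pending state that has no counterpart in an over-the-counter cash payment.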

You’ll soon see how these two subtleties together can make for a disastrous combination—a customer who enjoys the protection of a car insurance policy before paying for it. But in our story, the finance and policy teams, the policy holders, and the company at large are still in ignorant bliss about future events. The company rolls out this new payment method.

11.4 A crashed car, a late payment, and a court case

Existing customers start using the hassle-free payment method, and new customers sign on. Things go fine until one day when a customer makes a claim on his car insurance policy. To provide evidence, the customer shows an insurance letter that was mailed to him (figure 11.8). That letter has an extended date to cover a new period. The problem is, he didn’t pay his fee for this period.

figure11-08.png

Figure 11.8 A customer with a valid policy letter

What had happened was that the bank giro payment was registered in due order. But on the payment date, the customer didn’t have sufficient funds in his account, so the bank giro withdrawal was rejected. During the following week, there were a few follow-up attempts, but as there were never enough funds, the payment was never completed. Still, the policy system sent him a policy renewal letter. Happy not to pay, the customer let this continue month after month. Well, that’s until he crashed his car. Then the customer hurried to pay the outstanding debt, after the fact. What happened wasn’t anything strange. The system worked as designed.

For customers who choose to pay by bank giro, when a policy comes up for renewal, the finance system issues a new payment request, which is sent to the customer’s bank. On receiving this request, the bank sends a Payment message back to the finance system. The finance system treats this in the same way as a cash payment over the counter because they are conceptualized as the same thing. When it receives the Payment message, it sends it on to the policy system, which reacts by prolonging the policy period and sending out a renewal letter to the policy holder.

The interesting thing is what didn’t happen. Because there weren’t sufficient funds in the policy holder’s account, there never was a transfer and never a Confirm message. But because the finance system only listened for Payment messages, the missing Confirm went unnoticed. What happened was that the bank system issued a Bounce{remaining_attempts: 3} message, saying that it wasn’t able to do the transfer but would try again later, three more times. The finance system could safely ignore those messages until there was a Bounce{remaining_attempts: 0} message, meaning that the bank had finally given up on its attempts to draw money from the customer’s account.
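The sequence of events can be replayed in a small sketch. The message names and the remaining_attempts field are taken from the story; the dictionary layout, the ref key, and the handler function are assumptions made to keep the example short.

```python
def flawed_finance_handler(message, forward_to_policy):
    """Finance logic as described in the story: only Payment is acted on."""
    if message["type"] == "Payment":
        forward_to_policy(message["ref"])
    # Confirm and Bounce messages fall through without any effect.


renewal_letters = []  # stands in for the policy system mailing a letter

# What the bank sent for the customer with insufficient funds.
bank_messages = [
    {"type": "Payment", "ref": "CAR-42"},
    {"type": "Bounce", "ref": "CAR-42", "remaining_attempts": 3},
    {"type": "Bounce", "ref": "CAR-42", "remaining_attempts": 2},
    {"type": "Bounce", "ref": "CAR-42", "remaining_attempts": 1},
    {"type": "Bounce", "ref": "CAR-42", "remaining_attempts": 0},
]

for message in bank_messages:
    flawed_finance_handler(message, renewal_letters.append)

# No Confirm ever arrived, so no money was ever transferred.
money_arrived = any(m["type"] == "Confirm" for m in bank_messages)
print(renewal_letters, money_arrived)
```

Every line of the handler does what it was written to do, and yet the run ends with a renewal letter in the mail and no money in the company's account.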

When the insurance company first started accepting giro payments, the bounce-and-give-up scenario was completely ignored. There were other (but cumbersome) manual processes in place that would catch these cases. Later, the finance system was developed to detect these situations and put the customers on a watch list. The company then contacted the defaulting customers, starting with mailing out a reminder. Unfortunately, the functionality to do this was seen as a pure financial issue. The policy system never learned about this final Bounce and continued to believe the customers had paid.

And here you see the glitch. Even though we use the word payment for both things, a bank giro payment isn’t the same as a cash over-the-counter payment. In the latter case, the insurance company gets the money in its possession immediately. But in the former case, the insurance company doesn’t get the money until the giro payment is processed by the bank. If there’s no money in the account after three retries, then no money is transferred, and the insurance company won’t get paid, but the policy system will nevertheless have sent out a new policy letter.

The company claimed that the owner of the crashed car wasn’t entitled to compensation—he hadn’t paid his bill on time. The policy letter that was sent was due to a bug in the system. And paying the fee after a crash didn’t count as a valid payment; it was clearly an attempt to try to cover himself. Although the customer had finally made a payment after the incident, this didn’t entitle him to backdated coverage for the period during which he had not paid for the policy. On the other hand, the car owner claimed that he had a valid policy letter on the day of the crash, and that he had fulfilled his monetary duties by paying. Neither party would budge, so the case finally ended up in court.

In the trial, the judge ruled in favor of the policy holder. He interpreted the policy renewal letter as proof that the company had accepted a continued agreement: if the payment hadn’t yet been made in full, then the company had clearly accepted payment in the form of an outstanding debt. Legally, the company was bound by the issued policy letter.

We can only speculate how the judge would have interpreted the situation otherwise, but it stands to reason that had there not been a renewal letter, then the company could’ve argued that it didn’t accept an outstanding debt as a payment. Most probably the ruling would have gone in favor of the company instead. The essence here is that the way the company did business de facto defined how its intentions were interpreted legally (figure 11.9).

figure11-09.png

Figure 11.9 What a system does can be interpreted as what it’s intended to do.

Even if the conceptualization that put bank giro and cash payments on par was a mistake, it was the way the company did business and, therefore, was interpreted as intentional. The court didn’t care whether issuing the policy letter was a bug, a mistake, or a bad business decision. The company had acted as if it treated a bank giro payment (order) as a cash payment, and it needed to stand by that.

11.5 Understanding what went wrong

This situation, where the company hands out policies when it shouldn’t, is clearly a bug. But where is that bug located? In fact, going back to the individual systems, none of them does anything that’s unreasonable according to its domain. The bank giro system does what it should—arguably, it has some strange namings, but what domain doesn’t? We might claim that the finance system contains a bug because it takes something that’s not a completed money transfer (a bank giro payment) and considers it to be a payment from a customer. From a strict financial domain perspective, this is perfectly reasonable. Payments can be made in many forms, and cash is only one of them; in many contexts, it’s normal to accept a payment in the form of an IOU (I owe you) note or some other outstanding debt. The policy system also does exactly what it should. It detects a policy payment and reacts by issuing a new policy.

It’s hard to claim that the bug is in any one of the system integrations (for example, the integration between the external bank and the finance system). A registered but uncompleted bank giro payment can certainly be interpreted as the customer having declared the intention to pay and the company having accepted this, with the customer being in debt to the company until the money is transferred. It takes gathering our collective understanding of the subtleties, looking at all three domains at the same time, to see that this situation isn’t sound.

11.6 Seeing the entire picture

Avoiding this situation would have required collaboration and communication between people with expertise in all three domains: bank giro transfer, finance, and insurance policies. Let’s take a closer look at what went wrong and what could have been done differently. How could we have ensured those people talked to each other, sooner (preferable) or later?

The focus on not breaking the technical dependencies was clearly a hampering factor. It encouraged the finance team to reuse the same technical construct, Payment, even though its meaning had diverged: the word payment had come to mean both immediate payment over the counter and a payment request sent to the bank for clearance. But in their unwillingness to disturb the policy team, the finance team continued to use the term payment for both cases. The sad part is that this is exactly what messed things up for the policy team, because to them, the two types of payment weren’t the same. To the policy team, immediate payment over the counter was a sound foundation for issuing a policy renewal letter, but a payment request sent to the bank for clearance wasn’t.

What should have been done instead? The obvious answer is that the finance team should have listened for the bank giro message Confirm instead of Payment. The Confirm message marks that a transaction has been completed, which is what the insurance company regards as a sound foundation for issuing a policy. But how would that happen? What would have caused the finance team to do a different mapping?

Let’s play out a different scenario. Suppose you’re the project manager or technical lead in the finance team at the time the new payment option is introduced. You decide to guide the team in implementing it using deliberate context mapping. You stop to think about what the team thus far has called a payment. The concept was specific enough when cash payments were the only payments that existed, but now that you’re about to add giro payments, this is no longer the case.

After some discussion with your team, you muster up the courage to refactor the domain and call the original type of Payment a CashPayment instead, because that’s what it is. In the interest of both teams, you make a note to talk about this with the policy team, which sits downstream and will need to handle the name change. But what if you forget, or it doesn’t occur to you to talk to the policy team, perhaps because you don’t know that they listen for that specific Payment message? After you refactor Payment to CashPayment and deploy the change to production, it’ll only be a matter of time until someone from the policy team approaches you and asks what has happened to their expected Payment messages. If nothing else, this will cause a discussion to happen; now you’ll have to rely on your diplomatic skills! Jokes aside, it’s not desirable to break a technical dependency. But if that’s what it takes to ensure that a crucial domain discussion happens, then it’s worth it.
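The downstream effect of the rename can be sketched as follows. The dispatch-by-type-name mechanism and the unhandled list are assumptions made for this illustration; the story doesn't say how the policy system subscribes to events. The point is only that a renamed event fails loudly instead of being silently reinterpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CashPayment:  # formerly named Payment in the finance system
    policy_id: str


class PolicySubscriber:
    """Policy-side receiver that still expects the old event name."""

    def __init__(self):
        self.handlers = {"Payment": self.on_payment}
        self.renewed = []
        self.unhandled = []

    def on_payment(self, event):
        self.renewed.append(event.policy_id)

    def receive(self, event):
        name = type(event).__name__
        handler = self.handlers.get(name)
        if handler is None:
            # Unknown event types are recorded so that someone notices.
            self.unhandled.append(name)
        else:
            handler(event)


policy = PolicySubscriber()
policy.receive(CashPayment("CAR-42"))
print(policy.renewed, policy.unhandled)
```

No policy is renewed, and the unknown CashPayment event shows up where the policy team can see it. That visible failure is what starts the conversation between the teams.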

Back to the story: now that your team has renamed Payment to CashPayment, you can draw the context map of the finance domain and the neighboring domains that it’s about to integrate with: bank giro and policy. The map will look like figure 11.10.

figure11-10.eps

Figure 11.10 A map of the three domains after renaming Payment to CashPayment

Looking at the map, you can consider what should be mapped to what. It’s pretty obvious that there’s nothing in the finance domain that a bank giro payment can be mapped to—a CashPayment certainly doesn’t suit. There’s the temptation to abstract CashPayment to something that could accommodate a bank giro payment as well. But you fight that urge because it’s much better to first gain a deeper insight and then do the abstractions than it is to make a premature abstraction that might be less insightful.

Preferring to be specific first, you add a new type to the finance domain, GiroPayment. But you’re still stuck on what it should map to. The Payment in the bank giro domain certainly looks like a good candidate, but you have limited insight into the subtleties of bank giro payments and risk jumping to conclusions.

You now feel that you have moved as far as you can on your own. Your context map looks like figure 11.11, but you can’t make the mapping between the bank giro concepts and your newly created GiroPayment.

figure11-11.eps

Figure 11.11 A map of the three domains with the newly created GiroPayment. What should it map to?

You decide it’s time to meet with experts in all the affected fields. You invite domain experts from the policy team and from the finance department (or the bank), one of whom is knowledgeable about bank giro payments, to a small workshop. What you’re looking for at this time is the deliberate discovery of how the external bank giro domain maps to your finance domain, taking the policy domain into account. This isn’t a trivial thing, and therefore you deliberately set the stage to support these discoveries.

The sound of a deliberate discovery is when an expert in one domain says “Oh, OK, I get it” to an expert in another domain when gaining insight into that person’s domain. On the other hand, the sound of a late discovery might be “Aaahhhh, no, that can’t be true!” uttered by someone who spotted a fundamental mistake close to a deadline and will need to work day and night to fix it (see figure 11.12).2 

figure11-12.eps

Figure 11.12 The difference between early deliberate discovery and late ignorant discovery

To see how deliberate discovery might play out, consider the following hypothetical conversation between the policy expert, Polly; Ann from finance; and Bahnk, who really knows banking systems.

“What happens when we get a new payment?” asks policy expert Polly.

“Well, I guess we get a payment through giro,” says Ann from finance.

“That’s when we get a Payment message from the bank?” Polly enquires.

“Yep, that’s when a payment is registered,” says Bahnk, who’s the expert on the bank systems.

“OK, so we have the money, and we can issue a new policy letter,” Polly concludes.

“Wait. I didn’t say we have the money,” Bahnk protests.

“Yes, you did,” Ann challenges, perplexed.

“No, I said the payment was registered, not that it’s confirmed,” Bahnk explains.

“Does that mean we haven’t got the money?” Polly wonders out loud.

“Right. The bank has just registered the request to make a payment; the money hasn’t been transferred yet. It’ll probably transfer at the next nightly batch run if there are sufficient funds in the sending account,” Bahnk clarifies.

“Oh, but that shouldn’t be a problem,” says Ann from finance. “It means that the customer will owe us money until the money is transferred. It’s still a transfer of assets. It will do.”

“No, it won’t!” Polly protests vividly, airing her policy expertise. “We can’t start a policy until we are really paid-paid, not just promise-to-pay-paid.”

“In that case, it’s the Confirm message you should wait for,” says Bahnk.

“Oh, OK, I get it,” concludes Ann, who has learned something subtle about the interaction between the domains of bank giro and insurance policies.

During this meeting, you manage to facilitate a deliberate discovery about how the domains should map to each other. The new domain mapping will look like figure 11.13.

figure11-13.eps

Figure 11.13 A map of the three domains with complete mappings

At this point, you’ve developed a deep knowledge about the business. You understand that for a policy to be renewed, there needs to be confirmation that the money has been transferred. You are now ready to define your abstractions—something you earlier decided to defer until you had a better understanding. One option might be to add a new abstraction, PolicyPayment, which is created when the finance system has received a payment for a new policy period. For a CashPayment, this happens immediately because you know the amount has been paid in full over the counter. For a GiroPayment, it occurs only when the finance system receives a Confirm message from the bank indicating that the payment has been completed. The policy system would then listen for this new message type instead of the old Payment messages, and create a new PolicyPeriod when it receives a PolicyPayment message.
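A minimal sketch of this option might look like the following. The type names CashPayment, GiroPayment, and PolicyPayment come from the discussion above; the fields, the method names, the publish callback, and the watch list handling are assumptions for illustration, not a definitive design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CashPayment:
    policy_id: str
    amount_cents: int


@dataclass(frozen=True)
class GiroPayment:
    policy_id: str
    amount_cents: int
    giro_ref: str  # hypothetical reference used to match bank messages


@dataclass(frozen=True)
class PolicyPayment:
    """Tells the policy system that the premium is actually in hand."""
    policy_id: str
    amount_cents: int


class FinanceSystem:
    def __init__(self, publish):
        self._publish = publish  # callback to downstream subscribers
        self._pending = {}       # giro_ref -> GiroPayment awaiting Confirm
        self.watch_list = []     # policies whose giro payment finally failed

    def on_cash_payment(self, payment):
        # Cash over the counter is paid in full immediately.
        self._publish(PolicyPayment(payment.policy_id, payment.amount_cents))

    def on_giro_payment_registered(self, payment):
        # The bank's Payment message is only a request: remember it and wait.
        self._pending[payment.giro_ref] = payment

    def on_giro_confirm(self, giro_ref):
        payment = self._pending.pop(giro_ref, None)
        if payment is not None:
            self._publish(
                PolicyPayment(payment.policy_id, payment.amount_cents))

    def on_giro_bounce(self, giro_ref, remaining_attempts):
        if remaining_attempts == 0:
            payment = self._pending.pop(giro_ref, None)
            if payment is not None:
                self.watch_list.append(payment.policy_id)


published = []
finance = FinanceSystem(published.append)
finance.on_cash_payment(CashPayment("HOME-7", 90_00))
finance.on_giro_payment_registered(GiroPayment("CAR-42", 120_00, "G-1"))
finance.on_giro_bounce("G-1", 3)
finance.on_giro_bounce("G-1", 0)
finance.on_giro_payment_registered(GiroPayment("CAR-99", 120_00, "G-2"))
finance.on_giro_confirm("G-2")
print([p.policy_id for p in published], finance.watch_list)
```

In this sketch, the customer whose giro payment bounced never produces a PolicyPayment, so the policy system has nothing to react to and no renewal letter goes out.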

The key takeaway from this story is that none of these systems were broken if you looked at each domain in isolation. But the holistic effect was that a subtle mistake was introduced—a mistake that had serious consequences and at a different scale could have become catastrophic. The remedy is to not rely on expertise about a single system or a single domain when you want to ensure security. Instead, bring together experts from all the adjacent domains to gain a rich understanding.

11.7 A note on microservices architecture

Finally, we’d like to set this story in the context of a microservices architecture. In this story, there were only three systems involved. In a microservices architecture, there might be several hundred systems, each a service and each a domain. Many times we’ve seen the microservices architecture sold with the promise that when you need to make changes, you can do so surgically within one single service. Often this is coupled with the promise that if the service is maintained by a team, that team doesn’t need to disturb (talk to) any other team. We think that this is a dangerous misrepresentation!

Don’t misunderstand us. We’re not opposed to using microservices; in fact, we think it’s a good idea. We’re opposed to the misconception that you can safely make myopic changes to a single service without taking the holistic picture into account. Thankfully, you seldom need to take all the services into account, but we definitely recommend that you always have a look at the neighboring services when you make a change.

Summary

  • Do deliberate discovery early to get deep insights into subtle aspects of the domain.
  • Start specific, then abstract later.
  • Collect expertise from all adjacent domains.
  • Refactor names if they change semantics, especially if they change semantics outside the bounded context.