Chapter 3

Mission Command

Decision Superiority—better decisions arrived at and implemented faster than an opponent can react, or in a non-combatant situation, at a tempo that allows the force to shape the situation or react to changes and accomplish its mission

US Joint Chiefs of Staff, Joint Vision 2020

Leading people is never easy. A good leader is expected to be both a guide that helps identify what needs to be done and a source of support to ensure that both the individual and team are successful.

There are many ways that managers take on this challenge. Some are very hands-off, letting their team self-organize and only getting involved to provide occasional feedback and guidance. Most managers, however, take a far more traditional command and control approach, directing what everyone does and monitoring how closely work is performed to the expected standard. Either approach works fairly well when the delivery ecosystem is predictable, as it is far easier for someone to eventually figure out and correct for any problems.

As delivery ecosystems become increasingly dynamic and unordered, these approaches start to break down. The right thing to work on one minute can abruptly become the wrong thing the next, leaving managers and staff used to traditional management approaches feeling out of control, unsupported, or, worse, set up for failure.

An effective management approach that is unhampered by such dynamics, and that can actually help teams learn and adapt far more quickly than traditional management structures, is not only possible but has been in active use for hundreds of years. It focuses on intent and the outcome desired rather than what work is done and how it is performed.

The approach is called Mission Command. It emerged from a realm that most associate with top-down rigidity, the military, and is an integral component to outperforming your enemy in John Boyd’s OODA loop, as discussed in Chapter 2, “How We Make Decisions.”

The Origins of Mission Command

Figure 3.1
“There is no way that Napoleon will defeat our superior army!”

Unpredictability on the battlefield has long been a challenge for army commanders. The chaos of war can easily interrupt, delay, or damage not only a commander’s ability to communicate but also his or her ability to understand what is going on well enough to direct troop action.

Technological improvements have done little to help. Even with the invention of wireless radios and drones, the time it takes to observe, process, and relay complex situational information, have it received, understood, and considered on the other side, and then for a decision to be made and codified in suitably detailed actions, relayed back, understood, and acted upon is just too long to be workable. There is simply too much nuanced information to communicate, and even if it all could be relayed, the sheer quantity would, much like the alerts of an overzealous monitoring system, overwhelm central command.

All of this friction creates a cloud of uncertainty that compromises the entire decision cycle, rendering commanders incapable of keeping up with events while preventing subordinates closer to the action from taking the initiative in exploiting any momentary opportunities exposed to them.

Some military leaders, such as Alexander the Great and the Mongols, found ways of not just overcoming such chaos but also using it to their advantage. As a result, they repeatedly dominated the battlefield by using rapid and unpredictable change as a weapon to defeat far superior adversaries.

Despite knowing of those successes, as well as the fact that the victor in a battle rarely wins by destroying the physical fighting capacity of the enemy, most military organizations hung onto rigid traditional top-down organizational techniques. It was not until the 19th century, in the wake of the brutal defeat of the Prussian Army by another such military leader, Napoleon, that serious efforts were made to codify the techniques that Alexander and others had so lethally wielded.

Learning How to Lead Effectively the Hard Way

During the Napoleonic Wars, the Prussian army was one of the most powerful in Europe. It was larger, more experienced, and better equipped than any rival. Despite all of that, it repeatedly found itself outmatched on the battlefield against Napoleon’s far smaller decentralized forces. At the twin battles of Jena and Auerstedt in 1806, for instance, the main Prussian force at Auerstedt outnumbered the French corps it faced by roughly 2:1 and yet was soundly defeated.

It became clear that the traditional centralized command structure and rigid planning of the Prussian forces were degrading their ability to effectively deploy and lead troops on the battlefield. Rather than take the initiative, troops were expected only to execute what they were ordered to do. With battlefield conditions changing rapidly, and no way to convey their subtleties to commanders quickly enough for orders to be adjusted in response, the Prussians found themselves continually outmaneuvered and unable to exploit their superior capabilities.

The top-down approach also completely disregarded the skills and experiences that the Prussian troops brought with them. This further constrained the speed and potential effectiveness of Prussian decision making, all at a time when such capabilities were needed most.

Over time, Napoleon came to believe that his success was due to his own tactical brilliance and not the freedom he gave his troops to decide and act in the field. As a result, he slowly succumbed to the lure of more traditional centralized command as he struggled in the freezing battlefields of Russia, where his fortunes turned. This did not go unnoticed by the Prussians. Noting that a significant factor in Napoleon’s early battlefield successes seemed to be the devolving of authority to officers in the field, who could adapt quickly to changing battlefield conditions, the Chief of the Prussian Army’s General Staff, General Gerhard von Scharnhorst, put together a reform commission to overhaul decision making throughout the chain of command. Carl von Clausewitz, who later wrote On War, participated on the commission and introduced the concept of the “fog of war” to describe how changing conditions and incomplete information can affect decision making.

Scharnhorst and the commission decided the best way forward was to train staff officers to act decisively and autonomously in the heat of battle. They soon developed the necessary techniques, along with military schools to teach them to a professional officer corps chosen on merit alone. Later, Helmuth von Moltke took over as Chief and expanded upon the commission’s work, building the core of the doctrine of Auftragstaktik, or what we call Mission Command.

Managing Through Unpredictability

The best way to understand the thinking behind Mission Command and the importance and interplay between its various components is to understand how its approach to unpredictability differs from other management methods. To do this, let’s start by introducing the potential sources of unpredictability in your ecosystem and how other management methods attempt to tackle them. These potential sources are flawed knowledge and awareness, misalignments, and misjudging ecosystem order and complexity.

Knowledge and Awareness Weaknesses

Figure 3.2
Little did Bill know, his remodeling project was about to take a turn.

Not everyone in the delivery ecosystem will always possess all the knowledge and awareness needed to make the best decision. Sometimes they lack the relevant skills and experience, or are unaware of some key aspect of the dynamics and interdependencies within the ecosystem that they need in order to identify and suitably resolve the situation they find themselves in.

Like most centralized command structures, most management approaches attempt to tackle this problem by separating the duty of decision making from the duty of performing most of the actions required to execute the decision. Because managers sit at the top of the team hierarchy and are peers with the managers of other functions, it is assumed that they are best placed, and have enough bandwidth, to understand what is going on across the ecosystem. This allows staff to be treated much like any tool or material, with each possessing known capabilities and capacities that can be applied by managers to complete defined tasks using approved methods.

The problem with such approaches is that they count heavily on managers maintaining sufficient situational awareness for everyone, all while separating them from directly witnessing many of the details uncovered during the work that could improve future decisions. As ecosystems become larger and more complex, capturing and staying sufficiently aware of all the aspects necessary to make effective decisions becomes increasingly difficult. Dividing work based on skills often exacerbates this, leading to the fragmentation of work across more specialized teams. As a result, the work, and often the reward and recognition structure for performing it, becomes separated from the target outcome. Not only does this make it more difficult for the outcome to be achieved, it also removes much of the incentive to share information that might not be directly relevant to the work of the function that discovers it but that provides critical insight necessary to achieve the target outcome.

Finally, such approaches also overlook the fact that users and customers are also important actors in the ecosystem. Failure is just as likely if users and customers do not know enough about how to use your services or how they will meet their expectations. To them, a solution that is impenetrable to use or seems irrelevant can be worse than no solution at all.

Misalignments

Figure 3.3
Misalignments.

The second source of unpredictability is caused by misalignments. People can misunderstand a task that needs to be done, misjudge what others might do, misprioritize, or simply forget all or part of the task. This can be caused by everything from lossy or slow communication to a lack of trust or commitment.

Misalignments are a common affliction, enough so that I spend a lot of time looking at how information moves through an organization, is understood, and is acted upon to help me hunt down and fix their sources. This is especially useful when misalignments are endemic or concentrated in a particular part of the ecosystem.

Traditional management recognizes the problem and expects managers to use detailed tasking and assessments to find misalignments when they occur. This approach suffers from the same limitations that create knowledge and awareness gaps. It also does little to protect against another type of misalignment: missing the outcome the customer is trying to achieve. Staff focus on completing tasks to the standard they are asked to meet, not on whether the work leads to the desired outcome. This leaves finding problems and course correcting to managers, many of whom will likely be so busy directing people that they can easily overlook or misjudge the details of what is needed. Also, being one or more steps away from the activity, managers are more likely to struggle to determine what course corrections are needed to deliver what is required.

Misjudgment of Ecosystem Complexity

The third source of unpredictability is caused by misjudging the dynamics of your operating ecosystem domain. As we explore in Chapter 5, “Risk,” not all ecosystems are ordered and predictable. This poses a problem for traditional management techniques, which are designed to expect that either managers or the experts they hire can determine from what is happening what actions need to be taken.

The more unordered an ecosystem is, the more likely that causality cannot be determined until, at best, after the fact. Expecting orderliness in your delivery ecosystem when there is none leads to misjudgment of the situation and ultimately poor decision making. What makes it worse is that this same unordering effect can also make it difficult to understand why a decision was poor and learn from it.

The Anatomy of Mission Command

Figure 3.4
“...and this is where you find our mission’s intent.”

Both Clausewitz and Moltke recognized and wrote extensively about each form of unpredictability. They knew that these forms were so endemic to warfare that, at best, only the very beginning of a military operation could be planned with any sort of precision. However, they also found that there were ways to minimize the effects of each. Some, such as staff rides and war-gaming (both covered later in this chapter), tackle some aspects head on. Others avoid problems altogether by fundamentally restructuring how orders are given.

Key to all of this is the realization that, regardless of whether the activity is a single mission or the overall strategy for victory over an enemy, the target outcome should remain stable throughout. In Moltke’s view, this means that the main objective of any leader is to share and be clear about the intent behind any action with subordinates. The leader should not instruct subordinates on what specific tasks to perform or specify what methods have to be used, but instead needs to ensure that they understand what the target outcomes are and how the mission or campaign needs to contribute to achieving them.

By focusing on the target outcome rather than specific actions and methods, subordinates are free to choose the best options and make the most sensible decisions for whatever situations they encounter on the way to the target outcome. This approach is fundamentally different from what most of us are used to. In traditional organizations, the outcome is less something subordinates pursue and more something that the manager is seen orchestrating subordinate tasks to reach.

There are a number of important components to Mission Command that must be present together to make it work. These are commander’s intent, backbriefing, einheit, and continual improvement.

As we examine each of these components to understand what it is and how it works, you might notice a number of parallels with concepts that are central to Lean and Agile. These parallels are not accidental. However, it is important to note that many of the failures that organizations experience when adopting Lean or Agile begin with implementing only some of these components rather than all of them.

Commander’s Intent

Commander’s Intent is where the journey for defining the target outcome begins. Along with backbriefing, it is the way that the commander or manager provides subordinates with the clearly defined objective, or intent, that forms the anchor point to guide decisions throughout any mission or scenario. This is necessarily more than simply telling subordinates of a target outcome. Leaders need to be confident that their subordinates understand enough of the nuances of what it means so that if any plans fail or conditions change, they can alter their approach and adapt to still reach the target outcome.

This is actually far more involved than it might sound. Even when trust levels are high between leaders and subordinates and both share many of the same values, there are a lot of places where subtle yet important details can be lost. This is enough of a challenge that I often encourage team members to, if at all possible, first check to see if the intended outcome behind any request, task, requirement, or job duty is clear, meaningful, and actionable before beginning work. I call this passing the “Pony Test.”

Figure 3.5
“Wanting a pony doesn’t mean you must have one in your bedroom!”

The “Pony Test” is a quick sanity check. It is designed to catch whether any important information about the intent of a work item is missing or unclear. The name originates from the actions of a particular product manager who, one manager later noted, acted like a child demanding a pony. This product manager was known for regularly failing to give any reasons beyond “because I want them” for why certain requirements were being asked for, and refused to take seriously any repercussions for delivering them. After one particular disaster, the organization decided that anyone who wanted to make a request needed to both have a meaningful purpose or reason for it and consider any costs or consequences to ensure it didn’t become an open-ended headache.

The test is simple, and asks the following questions:

  • What is the target outcome? How does this differ from the current situation? Is it a current problem or need that can be measured, or a future issue or opportunity?

  • Is there a timeframe within which the need must be met? If so, why and what happens if it is missed?

  • Are there any priorities that we need to be aware of, both within delivering the need and in the larger scope of our other duties? If so, what are they and how do we resolve conflicts?

  • Are there any constraints that we must abide by?
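One way to make the Pony Test routine is to treat its questions as a checklist that a request must satisfy before work begins. The following Python sketch is only an illustration: the `WorkRequest` record, its field names, and the `pony_test` function are hypothetical names chosen for this example and are not part of any tool described in this book.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkRequest:
    """Hypothetical request record; the field names are illustrative only."""
    title: str
    target_outcome: Optional[str] = None     # what should be true afterwards
    current_situation: Optional[str] = None  # how things stand today
    timeframe: Optional[str] = None          # deadline, or "none" if there is none
    if_missed: Optional[str] = None          # consequence of missing the timeframe
    priorities: Optional[str] = None         # relative priority, conflict resolution
    constraints: Optional[str] = None        # limits to abide by, or "none"


def pony_test(request: WorkRequest) -> List[str]:
    """Return the Pony Test questions that the request leaves unanswered."""
    gaps = []
    if not request.target_outcome:
        gaps.append("What is the target outcome?")
    if not request.current_situation:
        gaps.append("How does the outcome differ from the current situation?")
    if not request.timeframe:
        gaps.append("Is there a timeframe within which the need must be met?")
    elif request.timeframe.lower() != "none" and not request.if_missed:
        gaps.append("Why this timeframe, and what happens if it is missed?")
    if not request.priorities:
        gaps.append("What priorities apply, and how do we resolve conflicts?")
    if not request.constraints:
        gaps.append("Are there any constraints that we must abide by?")
    return gaps


# A "pony" request: a demand with no stated intent behind it.
vague = WorkRequest(title="We need high uptime")

# The same request once its intent has been spelled out.
clear = WorkRequest(
    title="Keep checkout responsive at peak",
    target_outcome="Checkout responds without timeouts or errors at peak load",
    current_situation="Performance at peak threatens to dip below expectations",
    timeframe="Before the next marketing campaign",
    if_missed="New customers hit errors and existing ones may leave",
    priorities="Highest priority after triage support",
    constraints="Must not impair production security",
)

for question in pony_test(vague):
    print("Unanswered:", question)
print("Clear request gaps:", pony_test(clear))
```

An empty list does not prove the intent is well understood; it only shows that each question has been given some answer, which is the starting point for the conversation.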

What is good about the Pony Test is that it works just as well for new requests as it does for guiding triage and incident activities. For instance, one commonly ill-defined requirement that hits DevOps teams is the meaning of “uptime.” Everyone wants high uptime, but few bother to articulate what uptime actually means, why they need it, when they need it, what the relative value of different uptime conditions is, or what happens when they do not have it.

Anyone who has run a service knows that when it comes to uptime, not all times, conditions, or services are equal. No one cares if a service is down when no one needs it, but everyone is irate when it is down at a critical point. Knowing and understanding these nuances can help staff make better decisions on how to approach problems. When faced with time and resourcing constraints, they might spend more time designing and testing the reliability or redundancy in one area over another, or start triaging a more important data integrity issue over restarting a service. As services become increasingly complex, managers need to spend more time ensuring that any nuances are shared and understood by all.
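To see why a single uptime percentage hides these nuances, consider the following Python sketch. It compares a plain uptime figure with one that weights each period by how much the business depends on the service during it. The `Window` record, the weights, and the hours used are all invented for illustration; real weights would have to come from the people who understand when and why the service matters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    """A slice of the day; every value here is an illustrative assumption."""
    label: str
    hours: float       # length of the window
    down_hours: float  # downtime that fell inside the window
    weight: float      # assumed business importance of one hour in this window


def raw_uptime(windows: List[Window]) -> float:
    """Fraction of all hours in which the service was up."""
    total = sum(w.hours for w in windows)
    down = sum(w.down_hours for w in windows)
    return 1 - down / total


def weighted_uptime(windows: List[Window]) -> float:
    """Fraction of importance-weighted hours in which the service was up."""
    total = sum(w.hours * w.weight for w in windows)
    down = sum(w.down_hours * w.weight for w in windows)
    return 1 - down / total


def day(overnight_down: float, peak_down: float) -> List[Window]:
    """Build one 24-hour day with the given downtime in two of its windows."""
    return [
        Window("overnight, nobody using it", 8, overnight_down, weight=0),
        Window("normal business hours", 12, 0, weight=1),
        Window("peak period", 4, peak_down, weight=5),
    ]


quiet_outage = day(overnight_down=1, peak_down=0)
peak_outage = day(overnight_down=0, peak_down=1)

for name, windows in [("quiet outage", quiet_outage), ("peak outage", peak_outage)]:
    print(f"{name}: raw={raw_uptime(windows):.4f} "
          f"weighted={weighted_uptime(windows):.4f}")
```

Both days show the same raw uptime, since each lost exactly one hour out of twenty-four, yet the weighted figure separates the outage nobody noticed from the one that hit at the worst possible time.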

To ensure that Commander’s Intent was understood and acted upon, Moltke and others developed a means to convey it that answers the same questions as our Pony Test. This is called the brief. As you will see, it can be heavily repurposed for our own needs.

Brief

In order to ensure that subordinates feel free to determine how they are going to reach the target outcome, orders need to look far less like the instructions that most people think of and more like a description of what that desired outcome means and why it needs to be achieved. This description needs to strike a balance that can direct subordinates toward the intended outcome but is not so detailed and prescriptive that it impinges on any creativity and initiative they might need to reach it.

Rather than calling these “orders,” which most associate with a laundry list of what to do, they are often referred to as directives. In order to effectively capture strategic intent, a commander’s directives typically follow a particular pattern of roughly four or five short elements, described in the following sections.

Situational Overview

A situational overview is a brief description of the current situation. Its goal is to both make clear what is going on and provide some relevant background behind what needs to be done.

In an IT setting, a situational overview could be something like “Our business push to increase the number of customers has steadily raised the load on our systems. We are now at the point where at peak periods service performance, defined by reasonable service responsiveness without timeouts or errors, threatens to dip below what customers expect.”

The situational overview is very much a starting point. Unless subordinates are already familiar with the situation, it is likely that they will need to obtain more information before attempting to deliver a solution in pursuit of the target outcome.

Statement of the Desired Outcome or Overall Mission Objective

In the next step the manager or commander covers the essential objective that needs to be achieved, and the “why” behind it. This is not some fluffy statement, but something with a meaningful objective that enables those receiving the statement to take short-term actions that can provide results toward achieving the conveyed intent. It often contains elements of who, what, when, where, and why, but avoids specifying how. This is the essence of the information necessary to answer the first question in the Pony Test.

Sticking with the current example, this could be “We need to ensure that production service performance stays in line with customer expectations. This needs to be done in order to maintain customer loyalty as well as to allow us to gain greater market share.”

Execution Priorities

Whether we are on the battlefield or in an office, it is rare that we are delivering every aspect of a service alone in a vacuum. We inevitably need to coordinate with and seek help from others. The leader needs to make the subordinate aware of the themes and priorities of such groups’ activities. The leader may suggest reaching out to certain teams or individuals in order to help the subordinate make key decisions to best achieve the envisioned end state.

In our example, a leader might state: “We need your team to work out the most expedient and effective way to allow us to seamlessly accomplish this. Marketing wants to put together a campaign that, if successful, may result in dramatically increasing our customer base. They have been planning to target mid-level companies who might benefit from certain aspects of our services. Jane from Marketing can work with you to provide details and can feed back to her team in case they need to make any adjustments based on your work. We are also in the middle of a contract renewal with our most important customer, who has brought up concerns about our ability to scale. Joe in Sales is available to help. Other than triage support this is the highest-priority item in the Technical division. The CEO and board are supportive of investing, but need to understand what might be entailed and what impact it might have to our capex and ability to deliver other projects.”

Anti-Goals and Constraints

Figure 3.6
“No, we cannot start a real disaster to test our DR processes!”

It is rare for a mission or activity to lack any boundaries or constraints. Sometimes these are caused by upcoming events that subordinates may not be fully aware of, or by limitations in time or resources that need to be accounted for in future decisions. Similarly, anti-goals are explicit outcomes that must not be allowed to happen. Stating both helps clarify the scope of action available.

To complete our example, let’s see what constraints our “commander” has laid out: “We know that some customers are at a critical point in their business cycle where, for the next month, they are particularly sensitive to any interruptions in the availability of our service. Severe interruptions not only could cause us to lose customers, but would damage our reputation severely. That could force us to exit that market and to scale back our growth plans. Your team needs to look for ways to minimize customer impact. We also know that the Business Intelligence team is working on improving real-time analytics, which could involve some of the same systems you might decide to look at. They need to be aware of any potential changes in their area. Finally, any improvements absolutely cannot impair our production security in any way.”

Throughout the directive briefing, the subordinate is expected to ask questions, challenge details, and point out any potentially erroneous information. The purpose of this is not to challenge the capabilities or authority of the leader, but to help clarify and ensure that the subordinate fully comprehends and has what is needed to stay oriented to the outcome. As we all know, anything that can be misunderstood often will be.
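The elements of a directive described in the preceding sections can be captured as a simple structure, which makes it easy to spot when one has been left out before the briefing takes place. The Python sketch below is a minimal illustration under that assumption: the `Directive` class, its field names, and its methods are hypothetical, and the text is condensed from the running example in this chapter.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Directive:
    """Hypothetical container for the elements of a directive."""
    situational_overview: str = ""
    desired_outcome: str = ""  # the what and the why, never the how
    execution_priorities: List[str] = field(default_factory=list)
    constraints_and_anti_goals: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Names of the elements that have been left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def render(self) -> str:
        """Plain-text version suitable for sharing ahead of the briefing."""
        lines = [
            "SITUATION: " + self.situational_overview,
            "OUTCOME: " + self.desired_outcome,
            "PRIORITIES:",
            *["  - " + item for item in self.execution_priorities],
            "CONSTRAINTS AND ANTI-GOALS:",
            *["  - " + item for item in self.constraints_and_anti_goals],
        ]
        return "\n".join(lines)


scaling = Directive(
    situational_overview=(
        "Customer growth has steadily raised system load; at peak periods "
        "service performance threatens to dip below what customers expect."
    ),
    desired_outcome=(
        "Keep production service performance in line with customer "
        "expectations, to maintain loyalty and gain market share."
    ),
    execution_priorities=[
        "Highest-priority item in the Technical division after triage support",
        "Jane from Marketing can provide campaign details",
        "Joe in Sales can help with the contract renewal concerns",
    ],
    constraints_and_anti_goals=[
        "Minimize customer impact during the next month",
        "Keep the Business Intelligence team aware of changes in their area",
        "Do not impair production security in any way",
    ],
)

print(scaling.render())
print("Missing from an empty directive:", Directive().missing_elements())
```

Note that the structure deliberately has no field for tasks or methods. A filled-in directive still needs the questions and challenges of the briefing itself before anyone can be confident the intent has been understood.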

Backbriefing

Figure 3.7
“With this approach we should be able to achieve that outcome.”

The brief alone is never enough for the manager or commander to be confident that subordinates fully understand the commander’s intent behind a directive and have what they need to carry it out. This is why subordinates are expected to follow up a directive briefing by providing what is called a backbriefing.

The purpose of a backbriefing is not to provide a typical detailed plan. It is very much an opportunity for subordinates both to get answers to any important questions they might have about the directive and to demonstrate that they understand its underlying intent, their role in achieving it, and the relationship between their mission and the missions of others who may be involved. In this way it acts as a quick check to eliminate any ambiguity or misunderstanding.

The backbriefing also serves as a check to help the leader gain additional clarity of the implications of their own directives. Being closer to the details, subordinates have a more granular view of potential risks and needs that may jeopardize the mission’s success. It is possible that these details may result in the directive being altered or eliminated altogether.

Backbriefings also allow subordinates in different work areas to compare notes and ensure alignments across the organization. This can further improve situational awareness and decision accuracy.

Backbriefings happen relatively soon after the initial directive. For small and well-understood directives, they might run back-to-back with the brief. However, I find that there is a lot of value in giving subordinates some time to think about the directive, do some short checks on some of its important aspects, and to put together a loose outline of an approach. For single-person tasks or team-sized tasks, this could be as short as a few hours or a few days. Such investigation is typically done as a spike within the weekly cycle. For larger initiatives, as well as when subordinates have their own teams and staff, more time will likely be needed to allow teams to be briefed and coordinate with each other to provide a meaningful backbriefing.

In our earlier example, the subordinate may come back with information regarding recent performance degradations that pinpoint potential bottlenecks in the database and data structures. The work in this area might be particularly risky. Data schema changes might also impact the work being done by the BI team, and might also have an unforeseen impact on potential integration efforts with an external tool that have been discussed. The subordinate might also come back with data showing that increased load has not been the result of additional customers, but is due to unexplained changes in the way that one particular customer uses the service.

Einheit: The Power of Mutual Trust

Figure 3.8
“Trust makes us stronger.”

To work well, Mission Command needs more than some back and forth about the outcomes and intent of any activities. There also needs to be some level of mutual trust between the leader and subordinates. The leader needs to be confident that the subordinates will understand and carry out what is desired, while subordinates need to trust that they will be supported when exercising their initiative. Without that trust, information flow and shared awareness will wane at the very moment when they are needed most: when conditions cause decisions and plans to go awry. Even the hint of losing support or incurring blame can cause people to become protective of important yet potentially embarrassing details, feel prevented from asking necessary probing questions, and feel unable or unwilling to reach out for help.

The Prussians observed that trust and a commitment to a shared cause were major factors in the cohesiveness and fighting ability of Napoleon’s army. Scharnhorst’s early reforms helped set in motion a number of the key factors that Moltke later made part of the fabric of Mission Command in the Prussian Army. This dynamic became known as einheit, which roughly means “unity” or “oneness” in German.

Despite being a foreign term to non-German speakers, einheit is a useful way of describing the dynamic that creates team cohesiveness. Not only does it have far less baggage than the often overused word “trust,” it also expresses two other important factors of trust that are so often overlooked. One is the idea of building a shared commitment to an objective. The other is personal insight, understanding, and rapport that members of a team have for each other.

Having shared commitment to an objective means that no one has to worry that some hidden agenda or ulterior motive might get in the way of achieving the stated outcome. Everyone can be confident that the stated target outcomes are the primary goal that matters and no manager or subordinate is going to act in a way that is going to undermine efforts to reach them.

Familiarity and shared experiences between team members is another important part of einheit that can also significantly aid alignment. The more you know the strengths and weaknesses of each other, the ways each other communicates, and the way each other will think about, approach, and solve problems, the more seamlessly you can work together toward the objective.

Strong team member familiarity can help fill in any situational awareness gaps needed to make a good decision. For instance, you might approach discussing a topic in a different way if you know that it will help your teammate or manager understand it better. You might pair with someone who is strong in an area you are weaker in for help or to strengthen your knowledge. You might even look to share your own knowledge in an area others are unfamiliar with. All of these can help you and your team perform more effectively.

People who have worked in close-knit teams know that there is no better way to build an additional level of shared context than to spend time building shared experiences, and with them trust, within and across teams. Moltke taught his troops to recognize the importance of shared context and trust for themselves. From there he showed them that this level of rapport gave their own intuition an extra level of accuracy that they could then rely upon. This meant that they could reduce the level of confusion and misalignment that might otherwise lead to inaccurate and conflicting conclusions. As everyone took part, this “going and seeing” provided a far superior level of situational awareness across the entire organization.

Regular interaction and mutual trust had the added benefit of encouraging implicit communication to develop among the team. Think about a family member or someone you have been close to for a long time. If you’ve known them long enough, you can often gain a clear sense of the state of a situation from little more than body language or the tone of their voice, even without being explicitly told its details. Implicit communication and mutual experience can provide a means for information to travel quickly and accurately. Subtle differences, from the tone of a voice to a slight change in the way that someone behaves, can convey large amounts of information very rapidly and accurately.

Prussian commanders spent a great deal of time in the field working closely with their subordinates, learning the nuances and capabilities of one another across the collective unit as they interacted within the environment. The intuitive awareness that staff gained gave them the ability to act quickly and in unison with little in the way of discussion.

This implicit knowledge greatly improves situational awareness. It also sharpens appropriate responsiveness to escalating situations. It was so effective that after German unification in 1871 it was integrated into the wider German military.

In German such awareness is referred to as Fingerspitzengefühl, which means “fingertip feel.” It is conceptually similar to English sayings like “keeping one’s finger on the pulse.”

As John Boyd had discovered in his interviews of former German officers, Fingerspitzengefühl was at the heart of the successes of the German military during World War II. It allowed troops and officers to quickly establish important and meaningful relationships between disparate pieces of information as they became available. Their shared experiences helped them intuitively know how to respond, even when the information received was incomplete. Officers were able to build a mental map of the battlefield, allowing them to continuously stay oriented and respond with speed and determination.

Many who have studied Toyota have likely encountered similar discussions of the importance of building up intuitive knowledge and trust within the organization. Some at Toyota compare it to the way that samurai once practiced until their long swords became extensions of their arms. In the same way, individuals in an organization can build the connections they need to work as one.

Creating Einheit in DevOps

As an IT leader, I spend a lot of time and effort fostering a sense of einheit across the organization. I find that it starts with spending time with teams as they go about their daily work. I watch how communication flows between team members and how they share new ideas and discoveries. I look at how information flows across teams and the larger organization to see where it gets lost, distorted, or simply slows down. These are all places where einheit is likely absent.

I encourage teams to have regular team lunches, “brown bag” discussions, and team outings. I also utilize the regular sync points that are discussed in Chapter 14, “Cycles and Sync Points.” I find that the amount of focus and value that team members and managers alike put into retrospectives and strategic reviews can determine how much einheit will develop. When retrospectives and strategic reviews are taken seriously and are seen as valuable, team commitment goes up. People take notice when there is dedicated investment in helping the team improve.

The Queue Master role, as discussed in Chapter 13, “Queue Master,” also can help build the feeling of “oneness,” especially when those holding the role are positioned to realize that they are there to look for patterns and help guide the team to success. The role itself is usually eye-opening, allowing individuals to take a step back and look at the dynamics across the team. When paired with retrospectives, it can be a powerful motivator toward encouraging collaboration.

Einheit can also be created across teams as well as between geographically distributed team members. When faced with coordination challenges in such situations, I look for opportunities to encourage team members to spend quality time face-to-face with their counterparts in their own surroundings. This helps people understand and build the rapport that creates einheit. When done well, I find that as people really get to know each other, any expense is more than paid for by the value from the reduction in organizational friction and mistakes.

The final thing I try to rid the organization of is the idea that there needs to be someone to blame for a mistake or negative event. This is always hard to accomplish. Most people have a natural tendency to look for fault in someone else’s actions while simultaneously deflecting blame from themselves. Negative events are typically caused by a loss of situational awareness, and if there is any “blame,” it belongs to the system that enabled the loss to occur and not to the people in it. Negative events rarely happen because someone intended for them to occur. Finding the cause of the loss, learning why it happened, and finding a way to prevent it from happening again, all while not blaming the person or people who were involved, will go a long way toward improving team trust.

Continual Improvement

Figure 3.9 Continual Improvement.

Having the autonomy to adjust to unpredictability only works if there is also a culture in place that is willing and able to discover and incorporate new knowledge and approaches to better attain the target outcome. This type of culture is surprisingly uncommon, and not because people are resistant to improving their abilities and knowledge. One reason, explored in Chapter 7, “Learning,” comes from the way that many of us approach the learning process: we assume that learning comes top-down and that there is one absolute, unchanging truth for everything. Another comes from how constrained people feel when they try to objectively challenge accepted norms and beliefs in pursuit of the target outcome.

Challenging the status quo, whether it is attempting to validate assumptions or seeking solutions that are better or more effective than existing favored approaches, is never easy. For one, the status quo is familiar and generally accepted as “good enough” to get the job done. Unlike new approaches, favored approaches need little justification for use. This holds true even when the usual assumption or method is known to have significant flaws that damage its suitability.

This need to justify any new investigation or idea is where enthusiasm for learning and improvement so often falters. Trying something new, innovative, or unconventional is risky, especially if there is low tolerance in the organization for mistakes. Failure is not fun at the best of times, but it can be devastating if it puts your job at risk. Besides dampening everyone’s willingness to experiment, intolerance for mistakes can also cause people to hide important information that is needed to stay situationally aware and to improve. This is true even when an organization touts continual improvement. Without explicit managerial cover and support, few feel safe enough in their positions to take the chance of being blamed, or worse, for a mishap.

There are numerous examples of this tendency to “play it safe” throughout IT: the slow and dysfunctional progress in improving project planning and management techniques, postmortems structured to lay blame on a person or team, and the way that new technologies and techniques struggle to be adopted and used effectively. One example that is regularly overlooked yet particularly relevant to DevOps is the journey to modernize software installation and configuration management.

Even though manually installing and configuring software is people intensive and error prone, IT teams have long found it to be completely normal. In the early days when there were only a handful of systems to install and configure, it hardly seemed worthwhile to spend time building a robust revision-controlled automated provisioning system that could authoritatively report a current node configuration state. But even as the number of installations and the complexity of configurations have grown enough to warrant deploying such tools and supporting processes, many IT teams remain reluctant to use them despite their real benefits. When adoption does occur, the initiative is usually management-led, typically with either a vendor who can take the blame or an internal sponsor who is willing to take the political risk.

John Boyd, Helmuth von Moltke, and others realized that the relentless challenging of assumptions and approaches that continual improvement requires is critical for success. In his 1976 paper “Destruction and Creation,” [1] Boyd stated that in order to shape and be shaped by our changing environment, we must continually destroy and re-create the mental patterns that we develop to comprehend and cope with it. It is only through continual learning and improvement that we can adjust to take advantage of these changes to “improve our capacity for independent action.” [2]

1. http://pogoarchives.org/m/dni/john_boyd_compendium/destruction_and_creation.pdf

2. Boyd, John, “Destruction and Creation,” p. 1.

Moltke also understood that punishing one case of misjudgment would kill off every attempt to encourage initiative. He was famously tolerant of mistakes. He felt that as long as the commander’s intent was recognized and understood, mistakes made in the pursuit of the objective should be accepted and viewed as occasions to learn and improve. He instructed superior officers to refrain from punishment or harsh criticism of mistakes, and instead praise the initiative and correct the troops in such a way that they learn.

Both Boyd and Moltke also knew that learning and improvement need fast and regular feedback in order to increase ecosystem understanding and build up the overall body of knowledge that is available for use in the future. Some of the most valuable feedback comes from uncovering what went wrong, and any assignment of fault or blame would only encourage people to withhold this valuable information.

Over time, two techniques to improve experimentation and reflection have emerged from the ideas of Boyd and Moltke. These have been widely adopted by a number of Western militaries and have analogs both in Lean Manufacturing and in Agile methodologies such as Scrum. In the military they are called staff rides and after action reviews. The purpose of each is important for IT organizations to understand as it is so often lost in their IT equivalents.

Staff Rides

Like most military commanders, Moltke had his troops regularly practice. Where Moltke was different was that he was less interested in critiquing how well troops marched in formation or how closely they followed orders. He was more interested in getting them to think about how to use their capabilities to successfully navigate a dynamic ecosystem. He did this through competitive war-gaming. The most innovative of these were his staff rides.

A staff ride would begin with Moltke taking select officers out to an area where a significant military event, such as a battle or major deployment, was likely to occur. Using the latest topographical maps and military intelligence, he would construct hypothetical scenarios and “what if” situations for his officers to work through. To make it real, at times the rides were combined with practice maneuvers incorporating troops and weaponry.

The rides tested staff on their ability to take in information from their environment and construct actions from that context. The unpredictability and fluidity of warfare meant the officers had to think on their feet and improvise. They learned that success belonged not to the team that followed an order exactly as prescribed, but to the one that achieved the desired outcome. This also happens to be the essence of the art of learning to think.

Participation did not end after the action. After a muddy day out in the field, Moltke would sit with his troops and insist they review the day, analyzing what happened and finding ways to incorporate what they learned. This combination of strategy, practice, and analysis in a real-world setting helped his officers be better prepared to think creatively on how best to achieve the outcome.

Chaos Engineering has become one IT equivalent of a staff ride. When technical staff must navigate real, deliberately created failure scenarios in their code and supporting systems, they start to think beyond the idealized “happy path.” I usually take this much further with my teams by “war-gaming” out scenarios to get the entire team to think together rationally about various problems. Such war-gaming can either be a paper exercise or, preferably, involve actual actions taken in a war-gaming environment. Other times it might simply involve walking through a past event within the context of the current or some future ecosystem configuration, or through some hypothetical scenario that business or market conditions might bring about. To minimize disruption and distractions, I use opportunities that naturally bring the team together, like new initiative planning, strategic reviews, and audits.

Most teams are skeptical of the value of such activities at first, missing that the focus is less on how they would act in a particular scenario than on why. Often people are surprised to find their assumptions are flawed, or that there might be a far simpler or more elegant solution. Everyone learns along the way, and it is a great way to encourage people to think about how to hone and improve.
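For readers who want a concrete feel for the Chaos Engineering idea mentioned above, here is a minimal sketch of a deliberately injected failure. It is an illustration only and not taken from any particular chaos tool: the names FlakyDependency, fetch_price, and price_with_fallback are hypothetical, and the price lookup stands in for whatever downstream service a team actually depends on.

```python
import random


class FlakyDependency:
    """Hypothetical wrapper that makes a call fail at a configurable rate."""

    def __init__(self, func, failure_rate, rng=None):
        self.func = func
        self.failure_rate = failure_rate
        # A seeded random.Random can be passed in for repeatable exercises.
        self.rng = rng or random.Random()

    def __call__(self, *args, **kwargs):
        # rng.random() returns a float in [0.0, 1.0), so a rate of 1.0
        # always fails and a rate of 0.0 never does.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.func(*args, **kwargs)


def fetch_price(item):
    """Stand-in for a real downstream service call (assumed for this sketch)."""
    return {"widget": 9.99}[item]


def price_with_fallback(fetch, item, retries=3, default=None):
    """Code under exercise: retry a few times, then degrade gracefully."""
    for _ in range(retries):
        try:
            return fetch(item)
        except ConnectionError:
            continue
    return default


if __name__ == "__main__":
    # Simulate a dependency that is completely down.
    always_down = FlakyDependency(fetch_price, failure_rate=1.0)
    print(price_with_fallback(always_down, "widget", default="unavailable"))
```

The value of an exercise like this lies less in the wrapper than in the conversation it prompts: what should the caller do when the dependency is down, and why is that the right behavior for the target outcome?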

After Action Reviews

Review and reflection were brought directly to the battlefield during World War II with the development of the after action review, or debrief. Immediately following action, survivors were brought together with technical information surrounding the event. The goal of the review was to help teams reflect upon and continuously learn from their experiences. Importantly, ranks were intentionally put aside to ensure candid discussion.

The US military discovered a number of benefits of the after action review. For one, it allowed teams to work together to better understand and learn from the action. Each person carried a piece of what happened from their personal experience, but rarely did they have the full picture. Being able to review and reflect helped teams make sense of what happened and make the experience meaningful. The military found that this improved unit readiness and built cohesion. It also contributed to the larger body of military knowledge, helping with future action.

In IT, retrospectives should play a very similar role to the after action review. They should encourage open and candid discussion in a way that helps the team reflect and learn.

Organizational Impacts of Mission Command

The Mission Command approach is vastly different from what people are normally used to. It requires both more independent and more holistic thinking by staff, which creates more shared ownership of the target outcomes, as well as higher awareness at the individual contributor level of the interdependencies and need for alignment across teams. If done well, this goes a long way toward helping teams minimize the sources of decision friction that make delivering effectively in an unpredictable ecosystem so difficult.

Moltke made gaining such awareness much more explicit. He required every subordinate to be trained to function effectively at two levels of command above his appointment. This doesn’t mean that they needed to have lots of extra skills and certifications in order to be able to do their job. That is impractical. The purpose was to build the skills for them to think about the bigger picture of the system that they are operating in.

Lean practitioners tend to call this “systems thinking.” Some IT types, especially those who have been in the industry a long time, remember when there was a need to deeply understand the dynamics and interactions across complex IT ecosystems. Back in those days such a skill was sometimes referred to as “systems engineering.”

Such systems thinking is increasingly important in a service delivery world, where service components and service consumers can be both numerous and diverse. By ensuring that the intent and any necessary constraints are clear and understood, all while ensuring that the flow of information about current conditions is free of blame, teams can make the necessary decisions and improvements to deliver the target outcomes.

Summary

Teams that are given a desired outcome are far more effective than those that are told what actions they must perform, because knowing the outcome enables teams to quickly and independently adapt to previously unknown or changing conditions. Briefings and backbriefings are mechanisms that arose from the Prussian military and were later adapted in the civilian world. They can help leaders and staff align on the intended outcomes, any execution constraints that must be followed, and what resources and support the team might need when pursuing them.

For this framework to work, there must be mutual trust within the team and between the team and management. Team members also must be encouraged to work closely together to reduce communication friction and improve shared situational awareness. When done well, einheit is established, allowing the team to deliver smoothly as one, with few misalignments and little conflict.
