Chapter 14

Cycles and Sync Points

If we do not hang together, we shall surely hang separately.

Benjamin Franklin

Those of us delivering and managing services in increasingly complex and dynamic ecosystems have likely discovered that conditions can change without warning, swiftly rendering our best-laid plans obsolete. This is bad enough when it happens to one person. It can be catastrophic when it affects the awareness and alignment across the entire team.

Having a visual workflow, along with the Queue Master and Service Engineering Lead, does help expose emerging changes and events. However, without some mechanisms in place to make sure that everyone becomes aware of a changed condition so they can adjust, reflect, and improve, inevitably someone will miss out and be left behind.

The best way to establish this shared awareness is to do so with a set of communication mechanisms that brings everyone together. In order to minimize disruptiveness, the mechanisms should align with the rhythms of life. Following these natural flows and cycles goes a long way toward turning them into natural habits that reduce the level of interruption and misalignment that more traditional mechanisms cause.

This chapter will walk through the mechanisms I’ve found work well in the service delivery space. They not only augment those mechanisms that we have already covered, but also provide the investigation and learning space necessary for implementing and improving the instrumentation, automation, and governance mechanisms covered in subsequent chapters.

Inform, Align, Reflect, and Improve

Unless your ecosystem is full of arsonists bent on seeking glory through disorder, chances are that you and your team want to know two things:

  • Are you making the right decisions to achieve the target outcomes?

  • How can you improve to make both your decision making and actions more effective?

Making effective decisions can be surprisingly difficult. It isn’t enough to know something. To guide action toward a desired result, you need to have just enough context about the dynamics of the situation at hand (your informed situational awareness) to match with relevant knowledge, either in the form of experience or knowledge resources that are easily accessible.

Improving your decision-making abilities requires a further step that involves taking the end result of any decision you made and comparing it to what was expected. If these do not align, you have to figure out (or reflect upon) what caused the misalignment.

Any number of events can cause misalignments that can degrade decision making, including:

  • Having material flaws in your understanding of your experience

  • Having outdated or incorrect situational awareness at the time of the decision

  • Not having timely access to appropriate knowledge resources

  • Lacking availability of sufficiently suitable execution resources

  • Using process or execution mechanisms that contain too much friction (in the form of speed, variability, or reliability)

Avoiding these problems becomes more complicated with other actors in the ecosystem. Whether you are depending upon their actions or merely affected by them, any shortfall in shared awareness or alignment can cause collisions or muddy any understanding of the actual effects and accuracy of your decisions.

To counter such tendencies and maintain alignment, organizations have tried a number of different strategies. These span from trying to control alignment through a top-down process to leaving it to the team to self-organize and figure it out themselves. Each has some limitations that are worthwhile to quickly explore.

Top-Down Alignment Control Approach

The most commonly used is the top-down control approach. It relies upon direct orchestration of the work that staff performs using a mix of scheduling methods, along with process and method controls that are managed by some sort of project, program, or staff manager.

People rely on the top-down control approach because it is simple, provides an alluring sense of control, and aligns with traditional management thinking. It can also work in ordered environments, as defined in Chapter 5, “Risk.” However, top-down control depends heavily on two things: a reliably predictable relationship between the cause and effect of actions, and the ability of the person managing the work to maintain clear and accurate situational awareness across the ecosystem. A slip in either can cause a cascading failure. If this were not bad enough, this approach also counts on the manager to find and correct any faults and drive improvements through the team, both of which are difficult when awareness degrades. Such failures also tend to degrade trust between management and the team.

Alignment Through Iterative Approaches

Moving across the spectrum are the more Agile-style iterative approaches, whether in the form of more cyclical approaches like Scrum or more flow-based approaches like Kanban. Rather than trying to control everything centrally, iterative approaches accept that not everything in the ecosystem is going to be clear and instead rely upon the fact that those performing the work likely will have the most up-to-date contextual information in the immediate area they are in. These approaches employ methods that try to optimize the flow of this contextual information across the delivery team to allow the team to self-organize, make informed decisions, and make improvements themselves to deliver more effectively.

The Scrum Sprint Model

The standard Scrum model leverages frequent cyclical mechanisms to align team work to customer objectives, while also allowing for work coordination to take place more tactically at daily standups. These cyclical mechanisms are then supported by both the show-and-tell at the end of the sprint, which gathers feedback from the customer on how well the work aligns to their expectations, and the retrospective, which allows the team to reflect on their own challenges, make adjustments, and learn.

All of this does a much better job of helping keep everyone informed and aligned with each other and the priorities of the stakeholders. It also does a reasonably good job of encouraging the team to reflect and improve.

Unfortunately, this model’s weakness is that it does not deal particularly well with the unplanned reactive operational work that is core to DevOps. In order to both build a reliable iterative rhythm and get regular useful feedback from the customer, this model counts on work being planned and prioritized up front and then remaining static for the duration of the sprint.

When there is too much unplanned work hitting the team, there is a risk of all of this breaking down. Where this seems to be felt most deeply is in the cyclical alignment mechanisms. Most alignment activity tends to occur at Sprint planning, where the product owner can work with the team to figure out dependencies and areas where team members need to coordinate with one another. While an excellent Scrum Master or product owner might be able to help somewhat to untangle messes caused by unplanned work during a sprint, they often are hindered by the lack of a sufficiently deep level of visibility and situational awareness across the ecosystem to really help.

Kanban

Unlike Scrum, kanban thrives when tasks are unpredictable. This is also why many elements of it form important parts of the workflow described in Chapter 12, “Workflow.” By focusing on the flow of tasks and the amount of work in progress, it allows tasks to be reordered and new ones to be inserted at any time. There is even a means to expedite urgent work.

However, one of the biggest problems with teams that use kanban is that so many tend to overlook the need for cross-team synchronization, alignment, and improvement. This is not because this need was ignored in the creation of kanban. Kanban, as described by David Anderson, has daily standups much like Scrum, where everyone “walks the board” with someone acting as a facilitator. These daily standups, along with after meetings, allow the team to find and remove blockers as well as stay in sync. There are also queue replenishment meetings, which are similar to Scrum’s sprint planning in that they provide an agreed understanding of priorities and objectives, along with release planning, and even review and improvement sessions.

Where the problems begin is that most teams miss the intent behind these mechanisms. Rather than thinking about the target outcomes, how to maintain alignment across the team, and continual learning and improvement, most instead concentrate on the board and how many tasks they are moving through it. Outcomes and priorities are often neglected or forgotten about unless an escalation occurs. Review and improvement tend to be ignored altogether.

Even when the cyclic mechanisms are performed, most fail to achieve their underlying intent. Daily standups tend to devolve into who is blocking whom rather than everyone looking at the whole board to understand what is going on.

Losing the value of these sync points or dropping them completely is so common that it does not take much effort for a skilled eye to scan a board to see that it is happening. It leaves a clear shadow of fragmentation that degrades the alignment and delivery effectiveness of the team using it.

Service Operations Synchronization and Improvement

Now that we know where common alignment methods tend to fall short, what can be done to overcome these problems?

Rather than starting from scratch, we instead build upon the good work that has come from the iterative approaches. This starts with the kanban-like workflow as described in Chapter 12. We then add elements based on the cycles and synchronization points of both Scrum and kanban, but with a couple of important twists.

The first is the introduction of the Queue Master and Service Engineering Lead. As you will see in this chapter, both of these play a major role in overcoming many of the challenges of keeping the entire team situationally aware and aligned.

The second is more interesting. Having to react to unplanned work all the time, teams start to become increasingly tactically focused. This tendency can bleed into the alignment and improvement mechanisms, causing teams to think in a much more short-term way that suboptimizes what improvements and learning they can achieve.

For that reason, I have found that it is more sensible to divide these cycles into two. The first is the shorter tactical cycle, much of which has elements familiar to the iterative Agile cycles. The second is a much longer strategic cycle that is focused on deeper problem solving and improvement to help the team deliver to meet the target outcomes more effectively.

Let’s walk through each to understand them better.

The Tactical Cycle

The tactical cycle is primarily focused on keeping the team informed and aligned on a day-to-day basis. Many aspects of it are similar to a Scrum sprint. It is centered on the workflow and is led by the Queue Master. As the name suggests, the cycle contains mechanisms designed to help with tactical prioritization, resource allocation, event scheduling, and conflict resolution. Reflection and improvement are also important elements, but tend to be tightly targeted to either immediate need or the target outcomes laid out as part of the strategic cycle.

Figure 14.1
Tactical cycle.

The length of the cycle is typically one week in order to improve the opportunities to adjust to findings coming from reactive work.

If development occurs in a separate team, it is helpful whenever possible to align the start of a tactical cycle to the start of the development sprint. This allows SE Leads to quickly assess and align resources and scheduling of activities across teams. If for some reason alignment is not possible, SE Leads will need to work closely with their delivery team before the cycle kickoff to try to determine what might be needed. Even a somewhat inaccurate view can help the Queue Master and team limit potentially damaging surprises.

The Queue Master usually rotates with each cycle. This is useful for two reasons. The first is that it introduces a regular fresh set of eyes into what is going on across the ecosystem. The second is that the work the new Queue Master needs to do in order to get up to speed gives both the new and the previous Queue Masters an opportunity to compare notes and get a fresh perspective about everything from the current state of the workflow to any outstanding activities from the previous cycle and any known blockers or known work coming in. This Queue Master brief in the hours before the new tactical cycle begins can help prevent both the Queue Master and the wider team from being lulled into dangerous complacency. Nothing encourages people to sharpen their situational awareness as quickly as the possibility of unknowingly being handed a raging dumpster fire.
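The rotation and the handoff pairing it implies can be sketched in a few lines of code. This is only an illustration under stated assumptions: the roster, the one-week cycle length, and the function names `queue_master_for` and `handoff_pair` are all hypothetical, and a real team may rotate on a different schedule or skip people who are away.

```python
from __future__ import annotations

from datetime import date, timedelta


def queue_master_for(cycle_start: date, roster: list[str],
                     first_cycle_start: date, cycle_days: int = 7) -> str:
    """Return who holds the Queue Master role for the cycle starting on cycle_start.

    Assumes a simple round-robin over the roster, one person per cycle.
    """
    if not roster:
        raise ValueError("roster must not be empty")
    cycles_elapsed = (cycle_start - first_cycle_start).days // cycle_days
    return roster[cycles_elapsed % len(roster)]


def handoff_pair(cycle_start: date, roster: list[str],
                 first_cycle_start: date, cycle_days: int = 7) -> tuple[str, str]:
    """Return (outgoing, incoming) Queue Masters who should meet for the brief."""
    previous_start = cycle_start - timedelta(days=cycle_days)
    outgoing = queue_master_for(previous_start, roster, first_cycle_start, cycle_days)
    incoming = queue_master_for(cycle_start, roster, first_cycle_start, cycle_days)
    return outgoing, incoming


# Hypothetical roster and start date, for illustration only.
roster = ["Ana", "Ben", "Chi"]
first_cycle = date(2024, 1, 1)
outgoing, incoming = handoff_pair(date(2024, 1, 8), roster, first_cycle)
print(f"Queue Master brief: {outgoing} hands over to {incoming}")
```

Knowing the pair ahead of time makes it easier to reserve time for the brief in the hours before the kickoff.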

Once the new Queue Master for the cycle has been briefed, this cycle begins with a cycle kickoff. There are also daily standups, and the cycle ends with a retrospective. While the resemblance with Agile counterparts is helpful, as noted earlier there are a number of important differences.

Queue Master Brief

Figure 14.2
The brief between current and subsequent QM is an important first step.

While the tactical cycle is a continuous loop, the person holding the Queue Master role is not always the same. For this reason the Queue Master needs to go through a series of steps to prepare for a smooth handoff.

The process begins in the hours before the cycle kickoff meeting and is generally only as long as it needs to be. It starts with the current Queue Master reviewing the workflow board with the new Queue Master. This is usually short, and most of the focus is to provide extra context behind activities across the board that may simply be too lengthy to cover or otherwise not suitable to cover with the rest of the team during the retrospective or kickoff meetings.

The new Queue Master usually follows this by getting a quick rundown from any Service Engineering Leads of any major upcoming events or scheduled work. The intent is two-fold. One is to make sure SE Leads uncover any upcoming work that may not have made its way to the board. The other is to catch any potential resource requirements and dependencies that cannot be resolved by the team alone and require management help to sort out. This helps keep resource conflicts from suddenly derailing the kickoff meeting.

From there, the new Queue Master should touch base with management and/or key business contacts. This is to find out about any changing priorities or impending development or business activities that might provide useful context or uncover potential operational risk or constraints during the upcoming cycle. Resource and scheduling conflicts should be escalated here, if required. Sometimes it might make sense to arrange to have someone from management at the kickoff to answer questions and give guidance.

By the time the Queue Master is done, they should have a decent outline to keep the kickoff meeting focused.

Cycle Kickoff

Figure 14.3
The Cycle Kickoff.

The cycle always begins with the kickoff. The purpose of the kickoff is to bring the team together to agree upon the priorities and theme of the cycle. This is done in order to help align the team as well as provide a forum to surface potential resource and skillset needs for the coming cycle. As the Queue Master has to ensure the flow of work during the cycle, they are best suited to run the kickoff meeting.

When the kickoff meeting happens, it should first provide a theme for the cycle, if there is one, along with ranked priorities for the team. This is followed by a quick rundown of impending development, business, and operational activities for everyone to be aware of. Then, each SE Lead goes through upcoming events in their projects, along with any details, context, and resources needed for upcoming work that needs to be scheduled and performed. From there, the Queue Master walks through the workflow, asking any questions and making sure that the team members know of any issues that might prevent scheduled or important work from occurring, or anything that might get in the way or slow down the pace of flowing work. Improvement items that have been agreed to are picked up and put in the Ready queue alongside any other ready known work. Once everything is in order, the meeting ends.

Important Differences Between Kickoffs and Sprint Planning

To some people, the kickoff might just look like a somewhat stripped-down sprint planning or kanban queue replenishment meeting. There are enough similarities that you could overlay the two for teams that have both development and operational duties. However, before doing so there are some very important differences that you need to be aware of.

The first is that the unpredictable nature of operational work means that it is folly to load up a cycle to capacity with preplanned work and expect that it will all be done. There is simply no way of knowing whether capacity will be severely constrained by an operational disaster, high-priority emergency work, or some other event. This makes planning and coordination difficult.

The best way to counter this unpredictability is to limit how much of the team has to be exposed to its interruptions. Establishing the Queue Master role can go a long way to help. Another helpful way to counter unpredictability is to limit the size and uneven distribution of work items. This includes minimizing the number of tasks that require poorly distributed specialized skills. Doing so increases team flexibility by making the damage caused by any unexpected interruptions that do slip through far less severe.

The workflow itself also is useful for giving you a reasonable idea of not just the likely slack in capacity the team might have for interrupts but also what impact certain types of interruptions might have. This is useful for expectations setting and risk mitigation.

Another important difference is that, unlike in Scrum, there is rarely a stable set of prioritized work items coming from one stakeholder. The unpredictable nature of customer, infrastructure, security, and even organization demands means that newly incoming work can easily displace other high-priority tasks mid-cycle. The Queue Master and SE Leads should help reduce this unpredictability quite a lot, though it is unlikely to go away entirely.

The accompanying sidebar provides an example of a Cycle Kickoff meeting to help you understand its typical dynamics.

Daily Standup

The second of the iteration mechanisms is the daily standup. The daily standup is conducted to reinforce awareness across the team. Like its development counterpart, it happens daily and is intentionally short. It is intended to bolster the team’s ability to uncover and sort through any rising problems, conflicts, or coordination opportunities that might otherwise be missed or happen later than is optimal.

Figure 14.4
The daily standup.

Just like its development counterpart, the daily standup is not a status meeting and must not ever become heavyweight. The key is to only mention things that others ought to be aware of.

There are a few minor differences in the structure of the service operations standup that are worth mentioning. The first is that the Scrum Master facilitation role is taken up by the Queue Master. The Queue Master’s objectives are much the same: keeping the standup short (preferably no more than 15 minutes) and focused on synchronizing team members, as well as helping people with blockers and conflict. The difference is that the Queue Master uses the workflow as a tool to inform and to help spot conflict.

Standups start with a brief report from whoever was on call during any noteworthy production incidents. These are really short heads-up mentions of problem areas, whether or not a problem is still ongoing, who if anyone is still engaged on it, and if there is any incident report that people can look at. The key is to stay brief. Deeper discussion can happen afterward if necessary.

The Queue Master then takes the lead. They mention highlights on the workflow, including any interesting, important, or high-priority work that people need to be aware of in the queue, whether they have spotted a potential problem in one or more tasks that need to be brought to people’s attention, and if there are any dependencies or blockers that people need to be aware of.

Following the Queue Master, the chance to speak goes around the team. For double-duty development and operational teams, this can be just like any normal standup. For dedicated operational teams, this often can be a lot quicker than a normal standup. With all the work on the board, each team member only needs to mention specific items that they think people should know about regarding either what has been done or what is coming up. They can also bring up questions or problems about a particular matter, which then should be addressed after the standup.

Retrospective

The retrospective is run at the very end of the iteration. It is an opportunity for the team to reflect upon the previous week, talk about what has happened, and look for improvements to be put in place:

  • Were the priorities incorrect, or was important information missing that was unexpectedly uncovered during the course of the week?

  • Were the goals and the amount of work that the team thought they would accomplish overambitious, or were tasks unexpectedly quick to complete, allowing the team to tackle additional work?

  • Were there tasks accidentally handled in the wrong order, or were there instances where tasks required significant avoidable rework?

  • Were there times when there was too much work in progress, and if so, why?

  • Are there upcoming developments that the team needs to know more about or needs to adjust to?

  • Are there discoveries or developments that might help the team?

Figure 14.5
The retrospective.

Taken together, these questions make the retrospective a formal mechanism for team members to learn both from events and from each other and improve for the next iteration. It can also act as a natural inspection point and potential firebreak to allow for problems and dysfunction to rise to the surface closer to when they are happening. This provides better context for the problems, as well as allows them to be dealt with in a way that reduces the potential damage that they might otherwise cause. It can also help the team better articulate situations where management help and support might be required. This could be to receive additional guidance, to help in removing an impediment, or to obtain resources for an improvement that requires investment. For investments, the retrospective can help collect evidence to build a case for management to review.

The retrospective also marks the end of the term for the current week’s Queue Master. The advantage of this is that it is an expected break point for the team. This allows the current Queue Master to make sure that any important Queue Master items are handled properly and not lost.

General Meeting Structure

The length of the retrospective is heavily determined by the number of items that the team feels they need to discuss and agree to next steps on. Generally, you should aim for it to take an hour, but pad the schedule so that it can spill into a second hour when necessary. Keeping it brief helps everyone stay focused and engaged. Anything longer tends to become less impactful, and most team members appreciate getting any unused time back in their schedules.

Everyone should be invited to attend the retrospective, though active participation is required by the following roles, which each have a commitment to fulfill:

  • The current week’s Queue Master

  • Next week’s Queue Master

  • Service Engineering Leads (if the role exists)

  • Anyone with incident details (or important on-call findings)

  • Key individuals who fielded significant items during the iteration that may require discussion. These people can be brought in as necessary and do not need to stay for the whole meeting.

It is important that notes are taken during the retrospective to ensure that what is discussed is captured and can be subsequently tracked. In order to help make sure that the next week’s Queue Master is up to speed, it is usually good practice to make that person responsible for taking notes for the meeting for publishing afterward. These notes should at the very least include the list of potential discussion items, details of the ones discussed, along with any decisions and next steps with assigned owners. These notes should be published via a wiki with links to any work items that are referenced. These notes provide a useful insight into problem patterns, history, and progression that can be used to support further discussion in the strategic review.

The current week’s Queue Master starts the meeting by providing a summary of key workflow details from the last week, as well as a review of whether the theme for the week held. This summary should be brief, with the primary focus on any anomalies that are worthy of further discussion or further investigation and follow-up, not a rehash of everything that happened over the entire week. The Queue Master should lay out each item on a board or a Post-it Note with a 30-second summary of why it is noteworthy (which could be anything from making people more aware of a situation, an area that should be targeted for improvement, a failure or conflict that needs to be further investigated, a Dark Matter item that appears to be part of a bigger problem, etc.). The team can add to the list. After that is complete, the team votes on the top three to five items to discuss.
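For teams that collect votes electronically rather than with dots on a board, the tally can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `top_discussion_items`, the ballot format, and the alphabetical tie-break are all assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter


def top_discussion_items(ballots: list[list[str]], limit: int = 5) -> list[str]:
    """Return the most-voted items, highest first.

    Each ballot is the list of items one team member voted for. A person's
    repeated votes for the same item count once. Ties are broken
    alphabetically so the result is deterministic (an assumption; a team
    might instead let the moderator break ties).
    """
    tally: Counter[str] = Counter()
    for ballot in ballots:
        tally.update(set(ballot))
    ranked = sorted(tally.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _votes in ranked[:limit]]


# Hypothetical ballots, for illustration only.
ballots = [
    ["scheduler failures", "deploy delays"],
    ["scheduler failures", "alert noise"],
    ["deploy delays", "scheduler failures"],
]
print(top_discussion_items(ballots, limit=3))
```

Setting `limit` anywhere from three to five matches the range suggested above; the right number depends on how much time the team has reserved.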

The next step is for the Service Engineering Leads to summarize details from their engagement with either a delivery or operational project. The focus is primarily on new developments, learning, or questions that may be of interest to the rest of the team. They might point to any new documentation, any demos or reviews, and any impending installation, configuration, or operational work that might be coming up to be entered into the workflow.

It is good to keep the SE Lead updates as brief as possible. Any need for a deeper dive into details can and should be done separately.

The SE Lead, along with the Queue Master and rest of the team, should look for and point out any opportunities for others to get exposure that can help the team come up to speed on the engagement. If the team feels that a single point of failure is developing in the team, it should be pointed out here so that remedies can be discussed.

Once the SE Leads are done, anyone who handled any production incidents or was part of an on-call rotation is given an opportunity to mention any items that are noteworthy for the team to think about. Again, as with the Queue Master summary, this shouldn’t be a rehash of the week. It should instead target such things as problem areas with the production services that might warrant additional awareness, discussion, or investigation, as well as potential areas to improve on-call itself. The target is to look to improve service “ilities,” while improving the effectiveness of incident management and reducing the difficulty of on-call rotations for everyone.

Tools & Automation Engineering follows the production incident and management section with any updates or feedback that they feel is worthwhile to give the rest of the team. Sometimes this will be a mention of new tools or capabilities available to the team, along with establishing a time to go through them with team members. Other times it might be to ask questions or provide feedback to the team on particular problem areas that might require further discussion.

The final bit that gets covered before going into the top three to five discussion items is a quick run through of the tallies for Dark Matter items by the Queue Master. The main goal here is to see if the numbers are increasing or decreasing, and if anything new has popped up. As Dark Matter is often a target-rich environment for self-service automation, the team can use this time to discuss whether workflow backlog items should be created for Tools & Automation Engineering to consider tackling, and whether they should take a higher or lower priority than other work.
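Comparing the tallies from one cycle to the next is simple enough to automate. The sketch below assumes the Queue Master keeps a count per Dark Matter category for each cycle; the category names and the function name `dark_matter_trend` are hypothetical.

```python
from __future__ import annotations


def dark_matter_trend(previous: dict[str, int], current: dict[str, int]) -> dict[str, str]:
    """Label each Dark Matter category as new, increasing, decreasing, or steady.

    Categories are compared between two cycles. One that appears only in the
    current cycle is "new"; one that dropped out entirely counts as zero.
    """
    trend: dict[str, str] = {}
    for category in sorted(set(previous) | set(current)):
        before = previous.get(category, 0)
        now = current.get(category, 0)
        if category not in previous:
            trend[category] = "new"
        elif now > before:
            trend[category] = "increasing"
        elif now < before:
            trend[category] = "decreasing"
        else:
            trend[category] = "steady"
    return trend


# Hypothetical tallies, for illustration only.
last_cycle = {"password resets": 4, "log requests": 6}
this_cycle = {"password resets": 7, "log requests": 2, "manual restarts": 3}
for category, label in dark_matter_trend(last_cycle, this_cycle).items():
    print(f"{category}: {label}")
```

Anything labeled "new" or "increasing" is a natural candidate for the backlog discussion with Tools & Automation Engineering.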

Once that is done, the team goes back to the top discussion items. If something that was brought up in the other parts of the meeting becomes more pertinent to discuss further, the team can vote to include it.

The Learning and Improvement Discussion

When the team gets to the top discussion items, there is often a tendency for the team to spend it complaining. While that can be therapeutic, it is not a great use of the team’s retrospective time. The team should instead dedicate this part of the retrospective to articulating the problem and assigning next steps to tackle it.

The structure of the discussion of each item should be as follows:

  • Initial statement of the problem.

  • How the problem measurably detracts from the team’s ability to progress toward outcomes.

  • Ways that the problem can be further investigated (in cases where the root cause is not clear); or, as in the case of the scheduler problem described in the upcoming sidebar, deeper explanation might come as part of the retrospective.

  • If the problem requires a tactical change, determine what countermeasures can be or have been put in place to minimize or eliminate the problem. This determination should include by whom, at what cost (time, money, resources), and in what timeframe. It should also include how the effect of the countermeasure will be measured, by whom, and when the measures would be reviewed.

  • If the problem requires a more strategic change, determine whether it is an item that should be brought to the next strategic review meeting. If so, what evidence needs to be collected, and by whom, to help?
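One way to keep the discussion honest is to capture each item in a small record that mirrors the structure above, so that the note taker can see at a glance what is still missing. The following is a rough sketch only; the class names, field names, and the `missing_fields` check are assumptions made for illustration, not a prescribed template.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Countermeasure:
    description: str
    owner: str            # by whom
    cost: str             # time, money, resources
    timeframe: str
    measure: str          # how the effect will be measured
    measure_owner: str    # who measures it
    review_date: str      # when the measures are reviewed


@dataclass
class DiscussionItem:
    problem: str
    impact_on_outcomes: str
    investigation_steps: list[str] = field(default_factory=list)
    countermeasure: Optional[Countermeasure] = None
    strategic: bool = False
    # Maps each piece of evidence to the person collecting it.
    evidence_needed: dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        """List the parts of the structure that have not been filled in yet."""
        missing = []
        if not self.problem.strip():
            missing.append("problem statement")
        if not self.impact_on_outcomes.strip():
            missing.append("impact on outcomes")
        if self.strategic:
            if not self.evidence_needed:
                missing.append("evidence for strategic review")
        elif self.countermeasure is None and not self.investigation_steps:
            missing.append("investigation steps or countermeasure")
        return missing


# Hypothetical item, for illustration only.
item = DiscussionItem(
    problem="Scheduler jobs fail silently",
    impact_on_outcomes="Two missed batch windows this cycle",
)
print(item.missing_fields())
```

A record like this also makes it easier to link the published retrospective notes to the work items that come out of the discussion.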

The discussion should be time boxed, generally to an agreed-upon time in the team, and moderated to keep everyone on topic. The moderator can be the next week’s Queue Master (assuming they are not the one actively pushing the topic), the manager of the team, or, in cases where it is a lively topic, a neutral third party.

Once the team has gone through the structure, a vote should be held to see whether everyone is satisfied with the result. If they are not, the topic can either be taken to a separate agreed-to meeting with the respective parties or escalated up the management chain for resolution.

The Strategic Cycle

Operationally oriented work by nature tends to be both constant and heavily tactically focused. In such a reactive atmosphere people can unknowingly become inured to being in perpetual firefighting mode. This can cause people to not only fail to take a step back to understand and eliminate the underlying problems, but also lose sight of the target outcomes that key stakeholders are trying to achieve.

Many of us have seen various manifestations of this shortsightedness in our professional and personal lives. It could have been an overworked clerk turning away a customer in order to complete some paperwork, services that randomly interrupt and close important customer sessions, causing them to lose work, or a team intentionally leaving an important server in production that has become an irreproducible snowflake because no one has time to figure out how to rebuild it.

While retrospectives do help teams reflect and improve, it is far too easy for a team to become so focused on fixing the immediate tactical problems (like making the filling out of paperwork faster) that they miss the larger patterns of what is happening around them (e.g., having to fill the paperwork out at all, or having its completion being so urgent that it affects customer engagement and sales).

The strategic cycle tries to break this pattern. It does this in two ways. One is by explicitly dedicating some portion of team bandwidth to allow the team to take a step back from their day-to-day tactical activities to look more critically at whether there are more effective ways to achieve the outcomes desired. This time allows for deeper exploration of systemic problems and experimentation with larger or more radical improvement efforts that can break the cycle of limited half measures that so often hobble needed change.

The other way the strategic cycle tries to break the pattern is by giving the team ownership of improving themselves and their own efficacy. This subtle yet important shift in perspective moves the onus of improving away from management and to those who are more likely to make effective and lasting change. It also helps individuals and teams feel more empowered to initiate change, as well as take pride in any improvements they enact.

Giving teams the bandwidth and responsibility to improve doesn’t mean that improvement is a disorganized free-for-all. The strategic cycle relies upon three mechanisms to help bring and maintain focus throughout.

The first of these mechanisms is the improvement and problem-solving kata, as explained in Chapter 7, “Learning.” The improvement kata is used by the members of the team to organize and explore improvements toward an agreed target condition. Work that gets generated as part of the kata gets integrated into the workflow much like any team project so that it can be tracked and team members avoid becoming unnecessarily overloaded.

Team members working on strategic cycle items can sometimes need help and guidance to stay on track and progress. This is where the second mechanism, the coaching practice, comes in. The coaching practice, also covered in Chapter 7, is a way for coaches, managers, and team leads to help team members shape and progress their improvement kata efforts. Sometimes help is in the form of problem analysis. Other times it might be helping the team shape investment cases, providing resources to help their efforts, or redirecting tactical work to give them the bandwidth to progress.

What pulls the whole strategic cycle together, and provides the target conditions that are fed into these improvement katas, is the strategic review. The strategic review, described in detail in the next section, is the main formal event of the strategic cycle and the one with which the cycle begins and ends. It involves the whole team, who use it to review and reset target outcomes, reflect on larger or more stubborn retrospective discussion topics, wargame or run a hack-a-thon to rough out potential new solutions to a common problem, and improve cross-team alignment.

Together, these activities create the atmosphere necessary to promote the sort of learning and professional growth that helps individuals and the team succeed.

The strategic cycle is intentionally longer than the tactical cycles it overlays, with the optimal length being monthly. This not only helps create a break from the day-to-day pressures that get in the way of looking objectively across the ecosystem, but also gives the team a chance to gain support to tackle bigger problems that hold it back.

For busy teams, using a double strategic cycle loop can be a workable option. In these situations there is a major cycle to tackle large problems and significant transformational efforts that runs quarterly, and minor cycles that tackle either smaller strategic items or, for distributed teams, local aspects of the larger cycle item. This model is far from ideal, and is only recommended if the regular approach simply isn’t working.

Let’s take a look at the mechanics of the strategic review and the different forms it can take to understand how it anchors the entire strategic cycle.

Strategic Review

Figure 14.6 The Strategic Review.

The intent of the strategic review is to establish the focus for the strategic cycle by defining one or more target conditions to achieve. The topic or theme behind these conditions is typically chosen by the team about one or two weeks before holding the session, either by having a dot vote (where each person on the team is allotted three to five votes they can put on one or more items, with the topic having the most votes being chosen) or by taking the most urgent or important topic on the list. Choosing the topic beforehand maximizes the amount of time in the review that can be dedicated to working through the problem. It also allows the team to prepare by gathering evidence, materials, and/or people who might prove valuable for the session.
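For teams that collect dot votes in a shared document or chat channel rather than on a wall, the tally can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name, the ballot format, the sample topics, and the limit of three dots per person are all assumptions made for the example. The text does not say how to break ties, so the sketch simply returns the full ranking and leaves ties for the team to resolve.

```python
from collections import Counter


def tally_dot_votes(ballots, votes_per_person=3):
    """Return (topic, votes) pairs ordered from most to fewest votes.

    `ballots` maps each person to the list of topics they placed dots on.
    A topic may appear more than once if someone stacks several dots on it.
    The ballot format and the default limit are assumptions for this sketch.
    """
    totals = Counter()
    for person, dots in ballots.items():
        if len(dots) > votes_per_person:
            raise ValueError(
                f"{person} placed {len(dots)} dots; the limit is {votes_per_person}"
            )
        totals.update(dots)
    return totals.most_common()


# Hypothetical ballots from a three-person team.
ballots = {
    "ana": ["slow environment setup", "slow environment setup", "noisy alerts"],
    "raj": ["noisy alerts", "slow environment setup", "manual deploys"],
    "kim": ["manual deploys", "slow environment setup"],
}

ranking = tally_dot_votes(ballots)
top_topic, top_votes = ranking[0]
print(f"Chosen topic: {top_topic} ({top_votes} votes)")
```

Keeping the whole ranking rather than only the winner also gives the team a ready-made shortlist for later strategic cycles.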

There are three typical sources of topics. The most common source is the tactical cycle retrospectives. Often there are larger or more involved items that need more focused time to solve than can be provided in a tactical cycle. Another possible source is a major shift in service offerings or organizational structures. Such a shift often has very real impacts on the team that need to be explored and understood so that appropriate adjustments can be made.

The third source is past strategic cycle topics that need review to determine whether they need to be explored further or if new target conditions need to be set. Typically, a topic should be sized with conditions that can be reached in one to two strategic cycles. A topic that needs more than one cycle should be reviewed at the next strategic review to see if it is on track or requires adjustments. If it cannot be completed by the end of the second cycle, either the target condition was mis-scoped or those working on it were not given enough assistance to complete it. In the case of the latter, the strategic review should be dedicated to coming up with new ways to ensure sufficient assistance can be secured in the future.

One very important point is that the review is not a mechanism for management to kick off some business-driven initiative that has little to do with the team learning and improving itself. The review is for the team. This matters all the more because the person leading or moderating the meeting is often the leader or manager of the team. In the early forming/storming days, there are usually lots of issues and little understanding or alignment on how best to use the meeting to solve them in a way that lets the team learn and improve. During that period it is a good idea to have an outside facilitator or neutral party well versed in such reviews help moderate.

The review is arguably one of the most important events for the team. It not only helps improve team cohesion and cross-team alignment, but also is an opportunity to bring everyone together out of their everyday roles to learn and break free of potentially flawed mental models through exposure to the insights that others have of the operating ecosystem. For this reason, it is important to invite everyone on the team.

Many teams find such regular strategic reviews extremely difficult to do, especially when teams first start on their DevOps journey. Meetings that can be long and take you out of your day-to-day activities always seem like a painful distraction no matter how helpful they end up being in the end. There is also the challenge of having everyone involved. It is not uncommon for teams to be big, busy, and geographically separated with poor telecommunications setups.

Those teams that are truly geographically split (such as US-India, US-Europe, US/EU-Asia) with sizeable numbers on both sides face a far bigger problem. Some can get by with cross-coordinated local strategic reviews where each location focuses on specific areas and then shares with the other. Even with those, having at least quarterly joint reviews has a lot of value. This can be accomplished by rotating the hosting between locations and supplementing with some travel of key staff from the remote location to help with cross-pollination. It is still not ideal, and can still fall foul of cultural disconnects, but it is far more effective than not doing it at all.

General Review Structure

The typical strategic review is broken into three parts. The first is a very quick review of progress on the measures that came out of the last review. Typically, updates are posted regularly for the whole team to see so that any significant problems can be flagged for discussion later in the review. The purpose of this first part is to give those acting on the measures a short, focused opportunity to add any needed color. These updates are usually timeboxed to no more than 5 minutes each.

The second part of the review is the main subject matter itself. This begins with an initial statement of the topic or theme, along with some explanation as to why it is important to cover for the session. This does not need to be particularly long, just enough to set the scene for the rest of the meeting.

One thing to keep in mind is that a review does not necessarily have to be purely problem-based. Sometimes it might be devoted to exploring a new technology, doing a deep dive in an environment or subsystem, or meeting with a customer. In each of these cases there needs to be a clear and measurable objective agreed to at the beginning that must be met by the conclusion of the cycle. These are typically one of the following:

  • Insights to improve team situational awareness with next steps that alter the team’s approach and ways of working

  • New technology, tool, or process approaches to be adopted or extended to help the team improve situational awareness, or decision and delivery efficacy to meet target outcomes

When a problem is the topic, the discussion should follow a somewhat similar structure to that of the retrospective, including an initial statement of the problem and how it measurably detracts from the team’s ability to progress toward target outcomes. This discussion should be capped to no more than 15 minutes.

Once the team agrees to a statement, the team moves to the third part of the review, which is to dig into the problem to come up with a path for resolving it. This should employ blame-free problem-solving tools to determine the problem’s root causes and explore potential workable solutions to reach the target condition. Sometimes a root cause will not be totally apparent, in which case experiments should be framed up to discover more. Like any improvement initiative, these should be set up with clear target conditions that can be fed into a learning kata.

Depending on the problem you are trying to solve, there are a number of problem-solving tools that can work well to help. For instance, value stream mapping works well for analyzing flow and handoff problems.

Perhaps one of the most versatile tools for structuring a general problem-solving discussion is the A3.

A3 Problem Solving for the Strategic Review

Often root cause analysis takes more than simply talking about a topic to get through its various elements objectively. Sometimes, to get to the root cause and put together countermeasures, you need a tool or a guide. This is where a number of Lean tools come in.

A3 problem solving is one such tool to help structure a review topic. The A3 is a simple template that traditionally lives on an A3-size sheet of paper (297 × 420 mm), or roughly the equivalent of a sheet of American ledger (tabloid) paper at 11 × 17 inches. The size is helpful in that it is both portable and keeps the team focused and brief, noting only what is important and relevant. Some teams resort to including diagrams and pictures on their A3s to convey richer information more compactly. The A3 can be employed during the strategic review for bigger structural and strategic items, but it can also be used with great effect in weekly retrospectives, normal day-to-day problem solving, and even incident postmortems. It is valuable for organizing your thinking around a problem-solving activity of any size.

Figure 14.7 shows an example of an A3. I have included a more readable version of the A3 in the appendix.

Figure 14.7 Example A3.

The title of the A3 is typically the theme being targeted. The theme must not be a solution, like “We need more automation” or “We need XYZ tool,” but should be more the problem you are trying to solve (like “Installing software is too time/people intensive” or “Troubleshooting XYZ problem is error prone”).

From there, the A3 is broken into the following sections:

  1. Background: Why is this topic important, and what is the business case? It needs to be relevant to the organization’s objectives, as well as concise in order to communicate the value in trying to address this problem. If there are other side benefits to working on this topic, such as learning, those can also be listed. This section basically needs to be clear enough not only for the team to care, but also to start to frame up the case for investing time and/or money in the problem.

  2. Current Conditions: What is currently going on, and what is the actual problem with the current situation? This should be fact based and clear, and quantified with baseline metrics if at all possible. Graphs and other visual indicators can help.

  3. Goal or Target Condition: What is the measurable or identifiable target outcome(s) you are aiming to achieve? Be specific, and make sure it ties back to the business case and indicators captured in your current conditions. It is also useful to indicate how the outcome will be measured or evaluated.

  4. Root Cause Analysis: This is where the team spends time to get to the bottom of the problem. Some people go crazy and employ various problem-solving tools to help. The important thing is not the technique, but that you uncover why the problem is happening, and what the root cause is. Oftentimes, especially in Operations, the problem is a symptom of an even deeper issue.

  5. Proposal or Countermeasure: If the team has done an effective job of getting down to the root cause, you can now work together to put together one or more proposed countermeasures for solving the problem or improving the current situation. The countermeasures should address one or more aspects of the root cause. They should be accompanied by measurable or observable criteria to verify their impact and determine whether the action will prevent the recurrence of the problem.

  6. Plan: Who is responsible for doing what, by when? The answers to these questions need to be clear and agreed upon. The implementation order should also be both unambiguous and reasonable.

  7. Results Confirmation: This details the results of the countermeasures to determine whether their effectiveness met expected targets and measures improved in line with the goal statement. If measures have not improved, the reasons why are detailed. This section is filled in at an agreed-upon future date, usually during a subsequent review.

  8. Follow Up: What did we learn about this situation, and in light of this new knowledge, what should we do? What is necessary to prevent a recurrence of the problem? What remains to be accomplished? Is any communication required, and if so, to whom and in what form(s)?
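Teams that keep their A3s alongside their other working documents sometimes find it handy to mirror the eight sections in a small data structure. The following Python sketch shows one way that might look. The class name, field names, and the `missing_sections` helper are hypothetical and are not part of any standard A3 tooling; the sketch assumes each section is captured as a list of short notes.

```python
from dataclasses import dataclass, field, fields


@dataclass
class A3:
    """One A3 sheet: a title (the theme) plus the eight sections listed above."""

    title: str
    background: list[str] = field(default_factory=list)
    current_conditions: list[str] = field(default_factory=list)
    goal: list[str] = field(default_factory=list)
    root_cause_analysis: list[str] = field(default_factory=list)
    countermeasures: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    results_confirmation: list[str] = field(default_factory=list)
    follow_up: list[str] = field(default_factory=list)

    def missing_sections(self):
        """Return the names of sections that have no notes yet, in sheet order."""
        return [
            f.name
            for f in fields(self)
            if f.name != "title" and not getattr(self, f.name)
        ]


# A hypothetical draft prepared before the review: only the theme,
# background, and current conditions have been gathered so far.
draft = A3(
    title="Environment setup is too time and people intensive",
    background=["Setup requests cause change delays without long notice"],
    current_conditions=["Value stream map of the current process"],
)

print(draft.missing_sections())
```

A check like this makes it easy to see what remains to be worked through in the session, and later which sheets are still waiting on results confirmation and follow-up.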

The theme, as well as some background and available supporting information regarding the current condition, is often agreed upon and gathered before the review. This allows the team to quickly review the information and agree upon a reasonable goal, then focus most of the time on performing root cause analysis, developing proposed countermeasures, and creating an agreed plan for moving forward.

During the root cause analysis, the team should take the opportunity to examine the relevant parts of the environment to help better understand the situation and give better context to any data. This may include gemba walks, engaging directly with a customer or other areas of the business that may help provide an external perspective of the situation.

Note

Gemba walks consist of walking through the relevant parts of the ecosystem. This can be anything from looking at the specific details of a process through a worked example, looking at the development process and underlying conditions within it, analyzing build or deployment environment construction/creation/hygiene to understand the situation and any potential issues it may have, examining logging or monitoring noise, investigating deployment issues in order to find and eliminate their causes, understanding how customers engage with services in order to improve their ability to achieve their target outcomes, and the like.

If at all possible, these discussions should be structured and time boxed to ensure that the team can agree upon the root cause(s), and then have sufficient time to develop and agree upon countermeasures and a plan forward before the end of the review.

The results from any countermeasures can be reviewed either at a subsequent strategic review or at some other agreed-upon point in time. I tend to prefer reviewing at the next strategic review whenever possible, because it both helps maintain momentum and allows for a convenient time to discuss and schedule any follow-up activities.

Here is an example of an A3:

Title: Environment setup is too time and people intensive

Background:

  • Setup requests require significant advance notice, or else cause change delays.

  • Team capacity is heavily impacted with each request, causing other work to build up and be delayed.

  • Development, customers, and the business are complaining about the long lead times required to turn around an environment.

Current Condition:

  • [Value stream map of current process]

Goal:

  • Environment setup time reduced to one day or less

  • 80% reduction in rework caused by setup mistakes

  • Operational team intensity reduced by 75%

Root Cause Analysis:

  • [This could be a mind map of some variety, a fishbone diagram, or something else that you find helpful.]

Countermeasures:

  • Operating system installation automation

    • Turnaround time reduction target to less than 2 hours

    • Reduction of operating system installation rework by 90%

  • Cloud virtualization with an image library

    • Turnaround time reduction target for frequently used preset configurations to less than 2 hours

  • Standardized packaging of all software

    • Reduction of installation rework by 90%

  • Software deployment automation

    • Reduction of people intensity

Plan:

  • [High-level action plan with next steps]
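When it comes time to fill in the Results Confirmation section for an A3 like this one, the arithmetic behind the percentage goals is simple enough to sketch in Python. The targets below (one day or less, 80% less rework, 75% lower team intensity) come from the example goal statement. Everything else is an assumption made for illustration: the baseline and measured figures are invented, and treating team intensity as staff hours per setup request is just one possible way to measure it.

```python
def percent_reduction(baseline, current):
    """Percentage drop from baseline to current (negative if things got worse)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100 / baseline


# Hypothetical figures; real values would come from the team's own metrics.
baseline = {"setup_days": 10.0, "rework_incidents": 20, "staff_hours": 40.0}
measured = {"setup_days": 1.0, "rework_incidents": 5, "staff_hours": 8.0}

rework_drop = percent_reduction(
    baseline["rework_incidents"], measured["rework_incidents"]
)
intensity_drop = percent_reduction(baseline["staff_hours"], measured["staff_hours"])

# Each entry records whether a goal from the example A3 has been met.
results = {
    "setup time of one day or less": measured["setup_days"] <= 1.0,
    "rework reduced by 80%": rework_drop >= 80,
    "team intensity reduced by 75%": intensity_drop >= 75,
}

for goal, met in results.items():
    print(f"{goal}: {'met' if met else 'not yet met'}")
```

With these invented numbers the rework goal falls short (a 75% drop against an 80% target), which is exactly the kind of gap whose reasons the Results Confirmation section asks the team to detail.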

As the typical problem levels begin to shrink, the team should start to find that they have a bit more capacity to tackle ever bigger or more complex items within the strategic review.

Summary

Creating a natural rhythm of synchronization and improvement events is a great way of turning alignment, reflection, and learning into a continuous habit, one that is shared by the entire team. The events along the shorter tactical cycle aid the team with planning, workload balancing, maintaining shared awareness, and making tactical improvements, while the longer strategic cycle helps build in time for steady learning and improvement. Together they combine all the elements needed to help the team succeed.
