Chapter 14. Cycles and Sync Points


If we do not hang together, we shall surely hang separately.

Thomas Paine

Those of us delivering and managing services in increasingly complex and dynamic ecosystems have likely discovered that conditions can change without warning, swiftly rendering our best-laid plans obsolete. This is bad enough when it happens to one person. It can be catastrophic when it affects the awareness and alignment across the entire team.

Having a visual workflow, along with the Queue Master and Service Engineering Lead, does help expose emerging changes and events. However, without some mechanisms in place to make sure that everyone becomes aware of a changed condition so they can adjust, reflect, and improve, inevitably someone will miss out and be left behind.

The best way to establish this shared awareness is to do so with a set of communication mechanisms that brings everyone together. In order to minimize disruptiveness, the mechanisms should align with the rhythms of life. Following these natural flows and cycles goes a long way toward turning them into natural habits that reduce the level of interruption and misalignment that more traditional mechanisms cause.

This chapter will walk through the mechanisms I’ve found work well in the service delivery space. They not only augment those mechanisms that we have already covered, but also provide the investigation and learning space necessary for implementing and improving the instrumentation, automation, and governance mechanisms covered in subsequent chapters.

Inform, Align, Reflect, and Improve

Unless your ecosystem is full of arsonists bent on seeking glory through disorder, chances are that you and your team want to know two things:

Are you making the right decisions to achieve the target outcomes?
How can you improve to make both your decision making and actions more effective?

Making effective decisions can be surprisingly difficult. It isn’t enough to know something. To guide action toward a desired result, you need to have just enough context about the dynamics of the situation at hand (your informed situational awareness) to match with relevant knowledge, either in the form of experience or knowledge resources that are easily accessible.

Improving your decision-making abilities requires a further step that involves taking the end result of any decision you made and comparing it to what was expected. If these do not align, you have to figure out (or reflect upon) what caused the misalignment.

Any number of events can cause misalignments that can degrade decision making, including:

Having material flaws in your understanding of your experience
Having outdated or incorrect situational awareness at the time of the decision
Not having timely access to appropriate knowledge resources
Lacking availability of sufficiently suitable execution resources
Using process or execution mechanisms that contain too much friction (in the form of speed, variability, or reliability)

Avoiding these problems becomes more complicated with other actors in the ecosystem. Whether you are depending upon their actions or merely affected by them, any shortfall in shared awareness or alignment can cause collisions or muddy any understanding of the actual effects and accuracy of your decisions.

To counter such tendencies and maintain alignment, organizations have tried a number of different strategies. These span from trying to control alignment through a top-down process to leaving it to the team to self-organize and figure it out themselves. Each has some limitations that are worthwhile to quickly explore.

Top-Down Alignment Control Approach

The most commonly used strategy is the top-down control approach. It relies upon direct orchestration of the work that staff perform, using a mix of scheduling methods along with process and method controls that are managed by some sort of project, program, or staff manager.

People rely on the top-down control approach because it is simple, provides a sense of control that is alluring, and aligns with traditional management thinking. It also can work in ordered environments, as defined in Chapter 5, “Risk.” However, top-down control depends heavily upon two things: a reliably predictable dynamic between the cause and effect of actions, and the ability of the person managing the work to maintain a clear and accurate level of situational awareness across the ecosystem. A slip in either can cause a cascading failure. If this were not bad enough, this approach also counts on the manager to find and correct any faults and to drive improvements through the team. Both are difficult when the manager’s awareness degrades. It does not help that such failures also tend to degrade trust between management and the team.

Alignment Through Iterative Approaches

Moving across the spectrum are the more Agile-style iterative approaches, whether in the form of more cyclical approaches like Scrum or more flow-based approaches like Kanban. Rather than trying to control everything centrally, iterative approaches accept that not everything in the ecosystem is going to be clear and instead rely upon the fact that those performing the work likely will have the most up-to-date contextual information in the immediate area they are in. These approaches employ methods that try to optimize the flow of this contextual information across the delivery team to allow the team to self-organize, make informed decisions, and make improvements themselves to deliver more effectively.

The Scrum Sprint Model

The standard Scrum model leverages frequent cyclical mechanisms to align team work to customer objectives, while also allowing for work coordination to take place more tactically at daily standups. These cyclical mechanisms are then supported by both the show-and-tell at the end of the sprint, in order to get feedback from the customer on how well the work aligns to their expectations, along with the retrospective, which allows the team to reflect on their own challenges, make adjustments, and learn.

All of this does a much better job of helping keep everyone informed and aligned with each other and the priorities of the stakeholders. It also does a reasonably good job of encouraging the team to reflect and improve.

Unfortunately, this model’s weakness is that it does not deal particularly well with the unplanned reactive operational work that is core to DevOps. In order to both build a reliable iterative rhythm and get regular useful feedback from the customer, this model counts on work being planned and prioritized up front and then remaining static for the duration of the sprint.

When there is too much unplanned work hitting the team, there is a risk of all of this breaking down. Where this seems to be felt most deeply is in the cyclical alignment mechanisms. Most alignment activity tends to occur at Sprint planning, where the product owner can work with the team to figure out dependencies and areas where team members need to coordinate with one another. While an excellent Scrum Master or product owner might be able to help somewhat to untangle messes caused by unplanned work during a sprint, they often are hindered by the lack of a sufficiently deep level of visibility and situational awareness across the ecosystem to really help.

Kanban

Unlike Scrum, kanban thrives when tasks are unpredictable. This is also why many elements of it form important parts of the workflow described in Chapter 12, “Workflow.” By focusing on the flow of tasks and the amount of work in progress, it allows tasks to be reordered and new ones to be inserted at any time. There is even a means to expedite urgent work.
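As a rough illustration of the mechanics described above, the following minimal Python sketch models a board where tasks can be inserted at any position, work in progress is capped, and urgent work can be expedited. The names `Task`, `Board`, `wip_limit`, and `expedite` are hypothetical, as is the choice to let expedited work bypass the limit; a real team may well reserve a dedicated expedite slot instead.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    expedite: bool = False  # hypothetical flag marking urgent work


@dataclass
class Board:
    wip_limit: int  # assumed cap on tasks in progress at once
    ready: list = field(default_factory=list)
    in_progress: list = field(default_factory=list)

    def add(self, task, position=None):
        """Insert a task into Ready at any time; expedited work goes first."""
        if task.expedite:
            self.ready.insert(0, task)
        elif position is None:
            self.ready.append(task)
        else:
            self.ready.insert(position, task)

    def pull(self):
        """Pull the next Ready task, respecting the WIP limit.

        Expedited tasks bypass the limit (an assumption made for
        this sketch, not a rule taken from the text).
        """
        if not self.ready:
            return None
        task = self.ready[0]
        if len(self.in_progress) >= self.wip_limit and not task.expedite:
            return None
        self.in_progress.append(self.ready.pop(0))
        return task


board = Board(wip_limit=2)
board.add(Task("rotate logs"))
board.add(Task("patch cache tier"))
board.add(Task("upgrade scheduler"))
board.pull()
board.pull()
blocked = board.pull()  # None: the WIP limit has been reached
board.add(Task("restore failed feed", expedite=True))
urgent = board.pull()   # the expedited task jumps the queue
```

The point of the sketch is that nothing about the ordering is fixed up front: the Ready list can change between any two pulls, which is the property that makes this style of workflow tolerant of unplanned work.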

However, one of the biggest problems with teams that use kanban is that so many tend to overlook the need for cross-team synchronization, alignment, and improvement. This is not because this need was ignored in the creation of kanban. Kanban, as described by David Anderson, has daily standups much like Scrum, where everyone “walks the board” with someone acting as a facilitator. These daily standups, along with after meetings, allow the team to find and remove blockers as well as stay in sync. There are also queue replenishment meetings, which are similar to Scrum’s sprint planning in that they provide an agreed understanding of priorities and objectives, along with release planning, and even review and improvement sessions.

Where the problems begin is that most teams miss the intent behind these mechanisms. Rather than thinking about the target outcomes, how to maintain alignment across the team, and continual learning and improvement, most instead concentrate on the board and how many tasks they are moving through it. Outcomes and priorities are often neglected or forgotten about unless an escalation occurs. Review and improvement tend to be ignored altogether.

Even when the cyclic mechanisms are performed, most fail to achieve their underlying intent. Daily standups tend to devolve into who is blocking whom rather than everyone looking at the whole board to understand what is going on.

Losing the value of these sync points or dropping them completely is so common that it does not take much effort for a skilled eye to scan a board to see that it is happening. It leaves a clear shadow of fragmentation that degrades the alignment and delivery effectiveness of the team using it.

Service Operations Synchronization and Improvement

Now that we know where common alignment methods tend to fall short, what can be done to overcome these problems?

Rather than starting from scratch, we instead build upon the good work that has come from the iterative approaches. This starts with the kanban-like workflow as described in Chapter 12. We then add elements based on the cycles and synchronization points of both Scrum and kanban, but with a couple of important twists.

The first is the introduction of the Queue Master and Service Engineering Lead. As you will see in this chapter, both of these play a major role in overcoming many of the challenges of keeping the entire team situationally aware and aligned.

The second is more interesting. Having to react to unplanned work all the time, teams start to become increasingly tactically focused. This tendency can bleed into the alignment and improvement mechanisms, causing teams to think in a much more short-term way that suboptimizes what improvements and learning they can achieve.

For that reason, I have found that it is more sensible to divide these cycles into two. The first is the shorter tactical cycle, much of which has elements familiar to the iterative Agile cycles. The second is a much longer strategic cycle that is focused on deeper problem solving and improvement to help the team deliver to meet the target outcomes more effectively.

Let’s walk through each to understand them better.

The Tactical Cycle

The tactical cycle is primarily focused on keeping the team informed and aligned on a day-to-day basis. Many aspects of it are similar to a Scrum sprint. It is centered on the workflow and is led by the Queue Master. As the name suggests, the cycle contains mechanisms designed to help with tactical prioritization, resource allocation, event scheduling, and conflict resolution. Reflection and improvement are also important elements, but tend to be tightly targeted to either immediate need or the target outcomes laid out as part of the strategic cycle.

Figure 14.1 Tactical cycle.

The length of the cycle is typically one week in order to improve the opportunities to adjust to findings coming from reactive work.

If development occurs in a separate team, it is helpful whenever possible to align the start of a tactical cycle to the start of the development sprint. This allows SE Leads to quickly assess and align resources and scheduling of activities across teams. If for some reason alignment is not possible, SE Leads will need to work closely with their delivery team before the cycle kickoff to try to determine what might be needed. Even a somewhat inaccurate view can help the Queue Master and team limit potentially damaging surprises.

The Queue Master usually rotates with each cycle. This is useful for two reasons. The first is that it introduces a regular fresh set of eyes into what is going on across the ecosystem. The second is that the work the new Queue Master needs to do in order to get up to speed gives both the new and the previous Queue Masters an opportunity to compare notes and get a fresh perspective about everything from the current state of the workflow to any outstanding activities from the previous cycle and any known blockers or known work coming in. This Queue Master brief in the hours before the new tactical cycle begins can help prevent both the Queue Master and the wider team from being lulled into dangerous complacency. Nothing encourages people to sharpen their situational awareness as quickly as the possibility of unknowingly being handed a raging dumpster fire.
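The rotation and handoff can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rotation_schedule`, the fixed round-robin order, the team list, and the start date are all hypothetical, and a real team would likely adjust the order for vacations and oncall duty.

```python
from datetime import date, timedelta


def rotation_schedule(team, first_start, cycles, cycle_days=7):
    """Return (cycle_start, outgoing, incoming) for each tactical cycle.

    The outgoing Queue Master is the one who briefs the incoming
    Queue Master before that cycle's kickoff.
    """
    schedule = []
    for i in range(cycles):
        start = first_start + timedelta(days=i * cycle_days)
        incoming = team[i % len(team)]
        # The very first cycle has nobody to hand over from.
        outgoing = team[(i - 1) % len(team)] if i > 0 else None
        schedule.append((start, outgoing, incoming))
    return schedule


# Hypothetical rotation order and an arbitrary start date.
team = ["Beth", "Ed", "Emily", "Simon"]
schedule = rotation_schedule(team, date(2024, 1, 5), cycles=5)
for start, outgoing, incoming in schedule:
    print(start.isoformat(), "brief:", outgoing, "->", incoming)
```

Publishing a schedule like this ahead of time makes it obvious who needs to brief whom, so the handoff conversation is planned rather than left to chance.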

Once the new Queue Master for the cycle has been briefed, this cycle begins with a cycle kickoff. There are also daily standups, and the cycle ends with a retrospective. While the resemblance with Agile counterparts is helpful, as noted earlier there are a number of important differences.

Queue Master Brief

While the tactical cycle is a continuous loop, the person holding the Queue Master role is not always the same. For this reason the Queue Master needs to go through a series of steps to prepare for a smooth handoff.

The process begins in the hours before the cycle kickoff meeting and is generally only as long as it needs to be. It starts with the current Queue Master reviewing the workflow board with the new Queue Master. This is usually short, and most of the focus is to provide extra context behind activities across the board that may simply be too lengthy to cover or otherwise not suitable to cover with the rest of the team during the retrospective or kickoff meetings.

The new Queue Master usually follows this by getting a quick rundown from any Service Engineering Leads of any major upcoming events or scheduled work. The intent is twofold. One is to make sure SE Leads uncover any upcoming work that may not have made its way to the board. The other is to catch any potential resource requirements and dependencies that cannot be resolved by the team alone and require management help to sort out. This helps prevent resource conflicts from suddenly derailing the kickoff meeting.

From there, the new Queue Master should touch base with management and/or key business contacts. This is to find out about any changing priorities or impending development or business activities that might provide useful context or uncover potential operational risk or constraints during the upcoming cycle. Resource and scheduling conflicts should be escalated here, if required. Sometimes it might make sense to arrange to have someone from management at the kickoff to answer questions and give guidance.

By the time the Queue Master is done, they should have a decent outline to keep the kickoff meeting focused.

Cycle Kickoff

The cycle always begins with the kickoff. The purpose of the kickoff is to bring the team together to agree upon the priorities and theme of the cycle. This is done in order to help align the team as well as provide a forum to surface potential resource and skillset needs for the coming cycle. As the Queue Master has to ensure the flow of work during the cycle, they are best suited to run the kickoff meeting.

When the kickoff meeting happens, it should first provide a theme for the cycle, if there is one, along with ranked priorities for the team. This is followed by a quick rundown of impending development, business, and operational activities for everyone to be aware of. Then, each SE Lead goes through upcoming events in their projects, along with any details, context, and resources needed for upcoming work that needs to be scheduled and performed. From there, the Queue Master walks through the workflow, asking any questions and making sure that the team members know of any issues that might prevent scheduled or important work from occurring, or anything that might get in the way or slow down the pace of flowing work. Improvement items that have been agreed to are picked up and put in the Ready queue alongside any other ready known work. Once everything is in order, the meeting ends.

Important Differences Between Kickoffs and Sprint Planning

To some people, the kickoff might just look like a somewhat stripped-down sprint planning or kanban queue replenishment meeting. There are enough similarities that you could overlay the two for teams that have both development and operational duties. However, before doing so there are some very important differences that you need to be aware of.

The first is that the unpredictable nature of operational work means that it is folly to load up a cycle to capacity with preplanned work and expect that it will all be done. There is simply no way of knowing whether capacity will be severely constrained by an operational disaster, high-priority emergency work, or some other event. This makes planning and coordination difficult.

The best way to counter this unpredictability is to limit how much of the team has to be exposed to its interruptions. Establishing the Queue Master role can go a long way to help. Another helpful way to counter unpredictability is to limit the size and uneven distribution of work items. This includes minimizing the number of tasks that require poorly distributed specialized skills. Doing so increases team flexibility by making the damage caused by any unexpected interruptions that do slip through far less severe.

The workflow itself also is useful for giving you a reasonable idea of not just the likely slack in capacity the team might have for interrupts but also what impact certain types of interruptions might have. This is useful for expectations setting and risk mitigation.
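One hedged way to turn workflow history into a capacity expectation is sketched below. It assumes the team can read off, for past cycles, roughly how many hours went to unplanned work; the function name `plannable_capacity`, the use of hours as the unit, and the sample figures are all hypothetical and only there to show the arithmetic.

```python
from statistics import mean


def plannable_capacity(total_hours, unplanned_history, cautious=False):
    """Estimate the hours that can reasonably go to preplanned work.

    unplanned_history holds the hours of unplanned work absorbed in
    past cycles. With cautious=True the worst observed cycle is
    reserved instead of the average.
    """
    if not unplanned_history:
        return total_hours  # no data yet, so nothing to reserve
    reserve = max(unplanned_history) if cautious else mean(unplanned_history)
    return max(0, total_hours - reserve)


history = [30, 45, 25, 60]  # made-up hours of unplanned work per cycle
typical = plannable_capacity(200, history)               # reserves the mean, 40
worst = plannable_capacity(200, history, cautious=True)  # reserves the max, 60
```

The gap between the two figures is a simple way to frame expectations with stakeholders: work planned beyond the cautious figure is work that a bad week would likely displace.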

Another important difference is that, unlike in Scrum, there is rarely a stable set of prioritized work items coming from one stakeholder. The unpredictable nature of customer, infrastructure, security, and even organization demands means that newly incoming work can easily displace other high-priority tasks mid-cycle. The Queue Master and SE Leads should help reduce this unpredictability quite a lot, though it is unlikely to go away entirely.

The accompanying sidebar provides an example of a Cycle Kickoff meeting to help you understand its typical dynamics.

The Kickoff

It was Ed’s turn to be Queue Master. Queue Master weeks were always a bit disruptive, but Ed had really grown to appreciate the bigger-picture view they provided. They also helped him feel like he was really contributing to the team.

He knew that he had to get prepared for the kickoff meeting later that day, so he grabbed a pen and a pad of paper and walked over to the workflow board to take a look. He was already very familiar with its current state, but knew that there was always the possibility of something he hadn’t noticed while handling his own work. Taking some quick notes, he could then ask about anything he felt he might need details on.

The board was as busy as ever. The team had managed to get through a lot of tasks over the last week, though there were some things that could have been in better shape. For instance, Kathy’s oncall shift had been far busier than normal, meaning that she had a significant backlog of work still left to do to investigate the latest cloud caching technologies. There were also some tasks from the group that Beth was the SE Lead for that she had handed off to Simon while she was Queue Master that hadn’t gone quite as planned. Ed was sure that this would be brought up in the retrospective. Everyone needed to know if there were communication or handoff problems, or even if that annoying “this isn’t our Lead” trust problem popped up again in the delivery teams when work got picked up by others.

Ed could see that Emily was already loading up the Ready column with a bunch of work that needed to happen in preparation for the Feeds team’s upcoming release. She was always the most proactive SE Lead on the team. Ed was always amazed at how thoroughly she was on top of everything. Having said that, he noticed that there were a couple of items that she had put up that looked like they might touch some services that were being worked on by the team Beth was Lead on. He made a note to ask whether they were aligned on it.

After reviewing the board, it was time to visit the current week’s Queue Master. Beth was busy collecting her notes from the week and wrapping up some of the remaining Dark Matter items.

“Hey, Beth, how’s the week shaping up?”

“Mostly okay,” replied Beth. “I had to defuse another ‘oh-my-God you have to do this thing I have forgotten to tell you about for two weeks right now’ issue from Product. They had completely forgotten that they needed a beta of the new reporting engine and enough data to populate it for the trade show next week. If we hadn’t already integrated Danela’s environment management tools into the build process, they would have been completely out of luck.”

“At least that was taken care of,” mused Ed.

“Yeah, definitely. Anyway, the workflow is in decent shape. I have some archiving Dark Matter tasks to finish up, and I probably ought to go talk to Simon before the retro.”

“Oh, I noticed that some of Emily’s work is touching on your project.”

“I noticed. Thanks for reminding me!” replied Beth. “I will bring it up in the kickoff if Simon doesn’t know about it.”

Ed swung by Billie’s office to see if there was anything important going on next week. He also went over to Janet to make sure that everything was more or less in place for the trade show. He then met up quickly with each of the Leads just before the retrospective kicked off.

The team had agreed that it made the most sense to hold the new week’s kickoff on Friday afternoons, immediately after the retrospective. Some team members found this a little awkward. Most were pretty tired by the end of the week, and even though the board and Queue Master usually caught everything by Monday, they might forget some of the context of what was discussed. There was also the problem that some of the bigger customer-affecting releases tended to happen on the weekend to minimize customer impact. This sometimes left fallout that needed to be cleaned up on the following Monday. On the other hand, Mondays were always busy, making it hard to set aside the time for a kickoff. Holding the kickoff next to the retrospective was also very useful as it allowed the learning of the previous week to be immediately incorporated into the following iteration. There were also times that starting right before a weekend was useful, if for nothing more than to let people get ready for the next week.

To deal with the “forgetting problem,” they agreed that Monday immediately before the standup the Queue Master would do a 5-minute recap.

When the kickoff started it was clear that the two main themes for the upcoming week were the buildup for the release Emily was on, and for people to help Kathy with her project. With Beth’s help, Ed quickly covered the scope of the trade show. Even though Janet was involved and the code was still beta, most of what was going to be demonstrated at the trade show was fairly static and therefore not likely to cause much in the way of problems.

Emily then started to go through the items she had coming up. “OK, guys, I have six tasks that I am teeing up for next week. The top two are things that I definitely want to handle. This third item gives a pretty good overview of what the release is about, so it should be done by someone other than me. This fourth one needs to be done before the last two, so I marked it as a dependency.”

Beth pointed at the last one and said, “That one looks like it touches the transfer service, which folks on my team are working on.”

“Yep, it does,” replied Emily. “You might want to tackle that one. Let’s meet after this and talk. I would love to make sure we aren’t causing problems.”

“Sure,” responded Beth.

After that, Kathy went through her caching project items. Sam offered to try to pick some of them up to help.

Beth then went through the three items that she had, and said that there were likely to be a few more coming in, but as the team hadn’t worked through the details for their next sprint yet, it didn’t make sense to put any more up.

Once everyone finished, it was Danela’s turn. Danela was the Tools & Automation Engineer for the team. She took on building anything that the others didn’t have the time or skills to do. Much of her work would arise from problems discussed in the retrospective that the team agreed needed automation help to resolve. These would then be prioritized by the team in the kickoff. It wasn’t very often that she had to react to anything during the iteration, so she was usually able to commit to a stable set of work items.

For this iteration, Danela was going to continue to work on some of the logging and auditing functionality for the automated deployment tooling. Much of this came out of the need to authoritatively report on the configurations of instances and services on particular dates, as well as to list who installed which changes present on them on which dates. As there were no other pressing matters, the team asked a few questions about her approach and plans for delivering it, but mostly left her to it.

Ed was feeling pretty good about the iteration once they wrapped up the meeting. Everything seemed to be far more under control and visible than before they started running the iterations. Even the predictable unplanned demand from Product was now far easier to deal with.

Daily Standup

The second of the iteration mechanisms is the daily standup. The daily standup is conducted to reinforce awareness across the team. Like its development counterpart, it happens daily and is intentionally short. It is intended to bolster the team’s ability to uncover and sort through any rising problems, conflicts, or coordination opportunities that might otherwise be missed or happen later than is optimal.

Just like its development counterpart, the daily standup is not a status meeting and must not ever become heavyweight. The key is to only mention things that others ought to be aware of.

There are a few minor differences in the structure of the service operations standup that are worth mentioning. The first is that the Scrum Master facilitation role is taken up by the Queue Master. The Queue Master shares many of the Scrum Master’s objectives: keeping the standup short (preferably no more than 15 minutes), keeping it focused on synchronizing team members, and helping people with blockers and conflict. The difference is that the Queue Master uses the workflow as a tool to inform and to help spot conflict.

Standups start with a brief report from whoever was oncall on any noteworthy production incidents. These are really short heads-up mentions of the problem areas, whether a problem is still ongoing, who, if anyone, is still engaged on it, and whether there is an incident report that people can look at. The key is to stay brief. Deeper discussion can happen afterward if necessary.

The Queue Master then takes the lead. They mention highlights on the workflow, including any interesting, important, or high-priority work that people need to be aware of in the queue, whether they have spotted a potential problem in one or more tasks that need to be brought to people’s attention, and if there are any dependencies or blockers that people need to be aware of.

Following the Queue Master, the chance to speak goes around the team. For double-duty development and operational teams, this can be just like any normal standup. For dedicated operational teams, this often can be a lot quicker than a normal standup. With all the work on the board, each team member only needs to mention specific items that they think people should know about regarding either what has been done or what is coming up. They can also bring up questions or problems about a particular matter, which then should be addressed after the standup.
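The Queue Master’s pass over the workflow described above can be pictured as a simple filter over the board. The sketch below is a minimal illustration, assuming each work item records its column, a priority flag, a blocked flag, and the names of its prerequisites; the `Item` fields, the column names, and the sample tasks are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    column: str  # assumed columns: "Ready", "In Progress", "Done"
    high_priority: bool = False
    blocked: bool = False
    depends_on: list = field(default_factory=list)  # prerequisite item names


def standup_highlights(items):
    """Collect only what the team needs to hear about at standup."""
    done = {i.name for i in items if i.column == "Done"}
    active = [i for i in items if i.column != "Done"]
    return {
        "high_priority": [i.name for i in active if i.high_priority],
        "blocked": [i.name for i in active if i.blocked],
        # Work that was started before its prerequisites were finished.
        "started_before_prerequisite": [
            i.name
            for i in active
            if i.column == "In Progress"
            and any(dep not in done for dep in i.depends_on)
        ],
    }


board = [
    Item("migrate feeds config", "In Progress", depends_on=["snapshot feeds db"]),
    Item("snapshot feeds db", "Ready"),
    Item("renew TLS certs", "Ready", high_priority=True),
    Item("tune cache TTLs", "In Progress", blocked=True),
    Item("archive old logs", "Done"),
]
highlights = standup_highlights(board)
```

Everything that does not land in one of these lists stays on the board and out of the meeting, which is what keeps the standup short.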

The Standup

It was now Tuesday, and Ed’s week as Queue Master was going fairly well so far. Sure, there were still the usual vague tasks, like “Can you do that thing with the server?”, that every QM had to throw back for clarification. But for the most part the flow of tasks through the workflow was going fairly smoothly.

Simon was oncall this week, so Ed had Simon start.

“There were a couple of failures with the scheduler last night,” grumbled a rather groggy Simon. “Nothing serious. I know that the release Beth is on touches it, so I will let her know if I find anything interesting.”

Next was Ed’s turn. Usually the Queue Master starts by bringing attention to any escalated or expedited work, then asks about any blocked or problematic work in the workflow. Today the only items to note were some important pieces of work that had been put in the queue for the deployments that Beth and Emily were leading. He mentioned them, but then let those two fill in any details.

Sam was next. “I might pull in the prerequisite task from Emily’s team today, if it is okay and nothing else major comes up. I’ll sync with Emily after this.”

“Ok, sure,” replied Emily.

Emily was next. She had cleared the first item she had to do, but a couple of other items came up that she still needed to load into the queue.

“We are looking to deploy next week,” stated Emily. “Unlike Beth’s, there aren’t all that many important changes. I have already updated the Arsenal wiki, and will point out some of the interesting bits at the retrospective.”

Thomas then spoke. He noted that he had two tasks in progress because he had inadvertently grabbed one that had the second as a prerequisite. He wanted to discuss how to prevent that in the retrospective at the end of the week.

Beth was next. “Well, the timing of the scheduler failure was pretty convenient. Mike has been deep in that code.” She then reached out and moved two tasks out of the Ready column. “We need to hold off on these two tasks until Simon, Mike, and I have a chance to dig into what happened. The others are still a go. I might also add a couple more later today. As always, let me know if you have any questions on any of these tasks.”

Danela was last. “I am getting ready to update Depot and Mister Forensics later today. The changes so far are minor. As always it would be good to get any feedback. I am planning on updating the staging side first. If everything is fine I will roll it out in production. I will be sure to swing by and talk to Emily and Beth before I do anything. Let me know if any of you are interested in knowing more.”

Ed then wrapped it up and everyone started their day.

Retrospective

The retrospective is run at the very end of the iteration. It is an opportunity for the team to reflect upon the previous week, talk about what has happened, and look for improvements to be put in place:

Were the priorities incorrect, or was important information missing that was unexpectedly uncovered during the course of the week?
Were the goals and the amount of work that the team thought they would accomplish overambitious, or were tasks unexpectedly quick to complete, allowing the team to tackle additional work?
Were there tasks accidentally handled in the wrong order, or were there instances where tasks required significant avoidable rework?
Were there times when there was too much work in progress, and if so, why?
Are there upcoming developments that the team needs to know more about or needs to adjust to?
Are there discoveries or developments that might help the team?

The retrospective creates a formal mechanism for team members to learn both from events and from each other, and to improve for the next iteration. It can also act as a natural inspection point and potential firebreak that allows problems and dysfunction to rise to the surface closer to when they are happening. This provides better context for the problems, and allows them to be dealt with in a way that reduces the damage they might otherwise cause. It can also help the team better articulate situations where management help and support might be required. This could be to receive additional guidance, to help remove an impediment, or to obtain resources for an improvement that requires investment. For investments, the retrospective can help collect evidence to build a case for management to review.

The retrospective also marks the end of the term for the current week’s Queue Master. The advantage of this is that it is an expected break point for the team. This allows the current Queue Master to make sure that any important Queue Master items are handled properly and not lost.

General Meeting Structure

The length of the retrospective is largely determined by the number of items that the team feels they need to discuss and agree on next steps for. Generally, you should aim for it to take an hour, with padding in the schedule so that it can spill into a second hour when necessary. Keeping it brief helps everyone stay focused and engaged. Anything longer tends to become less impactful, and most team members appreciate getting any unused time back in their schedules.

Everyone should be invited to attend the retrospective, though active participation is required by the following roles, which each have a commitment to fulfill:

The current week’s Queue Master
Next week’s Queue Master
Service Engineering Leads (if the role exists)
Anyone with incident details (or important on-call findings) to report
Key individuals who fielded significant items during the iteration that may require discussion. These people can be brought in as necessary and do not need to stay for the whole meeting.

It is important that notes are taken during the retrospective to ensure that what is discussed is captured and can be subsequently tracked. To help make sure that next week’s Queue Master is up to speed, it is usually good practice to make that person responsible for taking notes during the meeting and publishing them afterward. At the very least, the notes should include the list of potential discussion items and details of the ones discussed, along with any decisions and next steps with assigned owners. They should be published via a wiki with links to any work items that are referenced. Over time, they provide useful insight into problem patterns, history, and progression that can be used to support further discussion in the strategic review.

The current week’s Queue Master starts the meeting by providing a summary of key workflow details from the last week, as well as a review of whether the theme for the week held. This summary should be brief, with the primary focus on any anomalies that are worthy of further discussion or further investigation and follow-up, not a rehash of everything that happened over the entire week. The Queue Master should lay out each item on a board or a Post-it Note with a 30-second summary of why it is noteworthy (which could be anything from making people more aware of a situation, an area that should be targeted for improvement, a failure or conflict that needs to be further investigated, a Dark Matter item that appears to be part of a bigger problem, etc.). The team can add to the list. After that is complete, the team votes on the top three to five items to discuss.

The next step is for the Service Engineering Leads to summarize details from their engagement with either a delivery or operational project. The focus is primarily on new developments, learning, or questions that may be of interest to the rest of the team. They might point to any new documentation, any demos or reviews, and any impending installation, configuration, or operational work that might be coming up to be entered into the workflow.

It is good to keep the SE Lead updates as brief as possible. Any need for a deeper dive into details can and should be done separately.

The SE Lead, along with the Queue Master and rest of the team, should look for and point out any opportunities for others to get exposure that can help the team come up to speed on the engagement. If the team feels that a single point of failure is developing in the team, it should be pointed out here so that remedies can be discussed.

Once the SE Leads are done, anyone who handled a production incident or was part of an on-call rotation is given an opportunity to mention any items that are noteworthy for the team to think about. Again, as with the Queue Master’s summary, this shouldn’t be a rehash of the week. It should instead target such things as problem areas with the production services that might warrant additional awareness, discussion, or investigation, as well as potential areas to improve on-call itself. The aim is to improve service “ilities” while improving the effectiveness of incident management and reducing the difficulty of on-call rotations for everyone.

Tools & Automation Engineering follows the production incident and management section with any updates or feedback that they feel is worthwhile to give the rest of the team. Sometimes this will be a mention of new tools or capabilities available to the team, along with establishing a time to go through them with team members. Other times it might be to ask questions or provide feedback to the team on particular problem areas that might require further discussion.

The final bit that gets covered before going into the top three to five discussion items is a quick run-through of the tallies for Dark Matter items by the Queue Master. The main goal here is to see if the numbers are increasing or decreasing, and if anything new has popped up. As Dark Matter is often a target-rich environment for self-service automation, the team can use this time to discuss whether workflow backlog items should be created for Tools & Automation Engineering to consider tackling, and whether they should take a higher or lower priority than other work.
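The week-over-week comparison of Dark Matter tallies can be sketched in a few lines of Python. This is only an illustration: the function name, the category names, and the counts are all hypothetical, and a real team would pull the tallies from whatever tool tracks its workflow.

```python
def tally_trends(last_week, this_week):
    """Compare two weeks of Dark Matter tallies, category by category.

    Both arguments map a category name to a count (hypothetical data).
    Returns {category: (delta, label)}, where label is "new", "up",
    "down", or "flat".
    """
    trends = {}
    for category in sorted(set(last_week) | set(this_week)):
        before = last_week.get(category, 0)
        after = this_week.get(category, 0)
        delta = after - before
        if category not in last_week:
            label = "new"      # something that has just popped up
        elif delta > 0:
            label = "up"
        elif delta < 0:
            label = "down"
        else:
            label = "flat"
        trends[category] = (delta, label)
    return trends


# Made-up numbers, purely to show the shape of the output.
example_last = {"restarts": 9, "account unlocks": 3}
example_this = {"restarts": 6, "account unlocks": 14, "permission requests": 4}
example_trends = tally_trends(example_last, example_this)
```

Categories labeled "new" or "up" are the natural candidates to raise when the team discusses whether a backlog item for Tools & Automation Engineering is warranted.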

Once that is done, the team goes back to the top discussion items. If something that was brought up in the other parts of the meeting becomes more pertinent to discuss further, the team can vote to include it.

The Learning and Improvement Discussion

When the team gets to the top discussion items, there is often a tendency to spend the time complaining. While that can be therapeutic, it is not a great use of the team’s retrospective time. The team should instead dedicate this part of the retrospective to articulating each problem and assigning next steps to tackle it.

The structure of the discussion of each item should be as follows:

Initial statement of the problem.
How the problem measurably detracts from the team’s ability to progress toward outcomes.
Ways that the problem can be further investigated (in cases where the root cause is not clear); or, as in the case of the scheduler problem described in the upcoming sidebar, deeper explanation might come as part of the retrospective.
If the problem calls for a tactical change, determine what countermeasures can be or have been put in place to minimize or eliminate the problem. This determination should include by whom, at what cost (time, money, resources), and in what timeframe. It should also include how the effect of the countermeasure will be measured, by whom, and when the measures will be reviewed.
If the problem requires a more strategic change, determine whether it is an item that should be brought to the next strategic review meeting. If so, what evidence needs to be collected, and by whom, to help?
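For teams that capture retrospective notes in a structured form, the discussion structure above could be recorded along the following lines. This is a minimal sketch, not a prescribed format: the class names, field names, and the `open_questions` check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Countermeasure:
    description: str
    owner: str        # by whom
    cost: str         # time, money, resources
    timeframe: str
    measure: str      # how the effect will be measured
    measured_by: str
    review_date: str  # when the measures will be reviewed


@dataclass
class DiscussionItem:
    problem: str                 # initial statement of the problem
    impact: str                  # how it measurably detracts from outcomes
    investigation: str = ""      # how to dig further if the cause is unclear
    countermeasures: List[Countermeasure] = field(default_factory=list)
    escalate_to_strategic_review: bool = False
    evidence_needed: List[str] = field(default_factory=list)
    evidence_owner: str = ""

    def open_questions(self) -> List[str]:
        """List the parts of the structure that are still unanswered."""
        gaps = []
        if not self.problem:
            gaps.append("problem statement")
        if not self.impact:
            gaps.append("measurable impact")
        has_next_step = (self.investigation or self.countermeasures
                         or self.escalate_to_strategic_review)
        if not has_next_step:
            gaps.append("next step (investigation, countermeasure, or escalation)")
        if self.escalate_to_strategic_review and not (
                self.evidence_needed and self.evidence_owner):
            gaps.append("evidence to collect and who collects it")
        return gaps
```

A note taker could call `open_questions()` before the team votes on whether it is satisfied with the result, as a quick check that nothing in the structure was skipped.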

The discussion should be timeboxed, generally to a length agreed upon by the team, and moderated to keep everyone on topic. The moderator can be the next week’s Queue Master (assuming they are not the one actively pushing the topic), the manager of the team, or, in cases where it is a lively topic, a neutral third party.

Once the team has gone through the structure, a vote should be held to see whether everyone is satisfied with the result. If they are not, the topic can either be taken to a separate agreed-to meeting with the respective parties or escalated up the management chain for resolution.

Ed and the Retrospective

Finally it was nearing the end of the iteration for Ed. The week had been long and a bit bumpy. He was looking forward to handing over Queue Master to Sam, who was already starting to pull together the information he needed for the Kickoff meeting.

The team got together around the workflow board. Simon and Beth invited Mary to the retrospective to cover some of the issues that had been happening around the scheduler, as well as to answer any questions from the team around it and their upcoming release. Normally, Mary would be brought in when the scheduler topic came up, but Mary decided it would be interesting to sit in on the whole meeting to see what was going on in Operations in general.

Billie also decided to pop in for the meeting. The team was used to her showing up. She usually didn’t participate, though it was helpful having her there when big items came up that needed to be escalated up to her.

Once everyone was there from the team, Ed got started.

“This week was a bit less of a wild circus than last week. From a workflow perspective we appeared to make a lot of good progress on the two themes. Emily managed to get all of her work items done, and Kathy’s project made up most of the slippage from last week.

“I heard that there might be a new project kicking off soon that will need a Lead. I think Sam is next to get one. We haven’t received a request yet, but just be aware it is coming.

“I noticed a few odd things this week. Thomas had a big buildup of WIP this week. I know one of them was caused by a missed dependency, but there are still several items in the Doing column. It would be good to know if there is a bigger issue there we need to look into.

“There was also a problem where developers from the team Beth is Lead for kept trying to cram work into the workflow without going through Beth. I know those guys have been a little problematic. It would be good to discuss to understand if the problem is particular to them or needs to be more widely addressed.

“The scheduler problem we had this week is probably also another useful topic to put out there, especially as Mary is here.

“We had one expedited item off of the back of the scheduler outage. There are also a couple of items in the Ready column that have languished for the entire iteration. It would be good to know if we need to do anything specific to get those pushed through, as well as if we need to make any changes to how we handle work to prevent that from happening.

“Finally, there was a flood of tickets that came in about unlocking accounts. It would be good to know if it is a potential bug that development needs to take care of or the start of something we should create a self-service tool around.”

Ed put up Post-it Notes: one about WIP, one for Lead work problems, one for the scheduler, one for languishing tasks, and one for unlocks. “Does anyone have any others to put up before we vote?”

Beth raised her hand. “I have one, especially as Billie is here. What do we do when two teams are releasing changes on the same component immediately after each other? I am not sure if we can solve it ourselves, but it is an issue Emily and I have suddenly found ourselves facing.”

“OK, great! Let’s add that,” responded Ed. “Now everyone vote. Each gets to vote for three of them.”

After everyone voted, it was clear that the three winners were Lead work problems, the scheduler, and Beth’s overlap issue. Simon had put all of his votes on the scheduler item, as he was still concerned it might fail again, and wanted the team to know more about it.

Emily spoke up. “Let me go first. I might need to step out early to take care of some of the launch things that are coming together right now.”

“Sure,” replied Ed.

“Thanks. As you guys know we are launching this weekend. I think most of you made the run-through I did on Wednesday on the key changes. Fortunately, there aren’t many. I will be around all weekend in case anything goes wrong. Thomas and I agreed to sync up after the Kickoff meeting, and then at 9 p.m. on Saturday and Sunday. If any of you are dying for an excuse to go to a work meeting over the weekend, let me know.”

Beth muttered, “Actually, I might pop in on Saturday just to check on the status of the overlap items between our releases. Otherwise I will probably be thinking about it all weekend.”

“OK, I’ll send you the meeting details,” replied Emily.

Beth then spoke up: “I probably ought to go next. As you all know, my team is also getting ready to release. Originally we were hoping for next week. However, between the overlaps in Emily’s and my releases, and the problem with the scheduler we had this week, it looks like it might get pushed out. I will let all of you know when we have a better idea.”

Beth continued, “I’ll leave the scheduler topic for the discussion later. I know that we are also going to talk about the problems I have been facing with my team later, but I thought I’d say a couple of things now. My team is still clearly struggling with the SE Lead idea. I know that Kathy, who is probably the best Lead we have, was their Lead last time.”

“Yeah, they are a pain,” replied Kathy. “Are they still forgetting stuff and having planning problems?”

“Yes, that is part of it,” stated Beth. “Their work also touches a lot of services, which I don’t think everyone quite appreciates. Anyway, it would be really great if we can find some ways for us to help them help all of us.”

Kathy then thanked everyone for helping her gain back some time in her caching project. “We will be running some trials soon. Let me know if you have any questions or interest in knowing more about what we are up to. We are still about a month or so from any real action.”

It was now Simon’s turn as the on-call person. “We’re covering the scheduler later, which was this week’s albatross.”

When Simon was done, Danela updated the team about the Tools & Automation area. “OK, the Depot and Mister Forensics updates went well, as all of you know. We now have a much better track record of changes moving forward. I created a little widget that some of you might find useful for showing configuration churn over a block of time. I also updated the chatbot to better annotate anything we post up to Arsenal. It is also a little more user-friendly. Let me know if you guys have any feedback.

“On the account unlocking, it would be good to know a bit more about what is causing the problem. We shouldn’t create a tool that just deals with symptoms. I will talk with Ed after this, and maybe propose some investigative work for me for next week.”

“Great, thanks!” replied Ed. “OK, here are the Dark Matter tallies. As you can see, account blocks shot way up. Restarts are still going down, which is good. Now that Danela’s tools are out there to allow development to rebuild and restore most things on-the-fly, most of the requests we get are for more obscure things that need additional permissions. It is an area we might want to investigate more to see whether or not we can also automate that chain. Everything else was mostly fine.”

Ed continued: “OK, it is time to go through our discussion topics. As Lead issues got the most votes, let’s start there.

“The problem is about teams not working effectively with SE Leads. There is the issue of unknown or unqualified work hitting the Queue Master without the Lead’s knowledge. This, of course, usually creates a lot more work for the Queue Master. But the bigger problem is why the Lead is being left out of the loop. Not all work needs to go through the Lead. But in the case of the Zephyr team, they seem to habitually not engage with their Lead. As a result, we cannot help catch problems like the release overlap problem, or effectively help guide ‘ility’ targeting or even ensure that the release goes smoothly.”

Kathy then spoke. “Part of the challenge I noticed was that the Zephyr folks are a little militant. They seem to think they know everything, and won’t listen to any comments we might have. They seem to think that we are just nontechnical janitor types or something. It is very weird.”

“I know that I definitely have been trying,” responded Beth. “Another issue that I think is more pertinent to the problem of unqualified tasks going around the Lead is that some teams, and Zephyr in particular, like to approach sprint planning as an all-or-nothing commitment. I continue to get pressure to commit myself 100% to the sprint, and I’m told that only I can do the tasks and no one else. The other problem is that, contrary to my understanding of Scrum, they feel that they can decide what I do and how I do it. That is a big problem, especially when what they demand cannot be done.”

Beth continued, “Here are two examples from this week. One ticket was for production data to be put into the development environment, even though it included PII (personally identifiable information) that we simply cannot expose like that. The other was a demand for administrator access to the servers hosting the scheduler. The Zephyr team refused to believe that even we try to minimize our own shell access to production!”

Ed replied, “I saw those tickets come through. What countermeasures can we put in place to help?”

As Zephyr was the team that she was on, Mary decided to speak up. “I know there are some strong personalities on our team that can make us a bit of a handful at times. It might be helpful to ‘re-onboard’ us. That way everyone can be exposed to what is supposed to happen.”

“Another thing we could consider doing is putting in place some sort of ‘SE maturity’ rating for teams,” replied Emily. “We did have something sort of like that when we first started, where we gave specific benefits for teams hitting specific targets. Maybe we can use something like that to nudge Zephyr in the right direction. We just need backing from management.”

“I can probably help with that,” said Billie. “Come up with a proposal and we can go through it. I think having some sorts of maturity measures can be helpful in general if they are crafted well.”

Emily responded, “I can start crafting something. I might need help from Kathy, Beth, and anyone else who might want to participate. Then we can go through it with Billie for her approval and then roll it out.”

“Count me in as well,” said Simon.

Ed smiled. “That sounds like a good plan to me. Is everyone okay with that proposal?”

Everyone agreed.

Ed then moved on to the scheduler problem. With Simon’s help, Mary gave some background to the problem, how to detect when an incident was happening, and what to do to fix it. She then went through what they were doing to put in a more permanent solution. It meant that their upcoming release would be pushed out, but with some of the other challenges going on, that was probably going to happen anyway. Mary agreed that she would provide an update in the next Retrospective.

Finally, the team went through the overlap issue. The general agreement was that the problem came down to a general lack of visibility of the code being worked on, as well as a lack of information sharing across development teams. They agreed to put repository transparency into the maturity work that Emily was doing. Billie also agreed to speak with the senior engineers of each team.

With that the team adjourned the meeting.

The Strategic Cycle

Operationally oriented work by nature tends to be both constant and heavily tactically focused. In such a reactive atmosphere people can unknowingly become inured to being in perpetual firefighting mode. This can cause people to not only fail to take a step back to understand and eliminate the underlying problems, but also lose sight of the target outcomes that key stakeholders are trying to achieve.

Many of us have seen various manifestations of this shortsightedness in our professional and personal lives. It could have been an overworked clerk turning away a customer in order to complete some paperwork, services that randomly interrupt and close important customer sessions, causing them to lose work, or a team intentionally leaving an important server in production that has become an irreproducible snowflake because no one has time to figure out how to rebuild it.

While retrospectives do help teams reflect and improve, it is far too easy for a team to become so focused on fixing the immediate tactical problems (like making the filling out of paperwork faster) that they miss the larger patterns of what is happening around them (e.g., having to fill the paperwork out at all, or having its completion being so urgent that it affects customer engagement and sales).

The strategic cycle tries to break this pattern. It does this in two ways. One is by explicitly dedicating some portion of team bandwidth to allow the team to take a step back from their day-to-day tactical activities to look more critically at whether there are more effective ways to achieve the outcomes desired. This time allows for deeper exploration of systemic problems and experimentation with larger or more radical improvement efforts that can break the cycle of limited half measures that so often hobble needed change.

The other way the strategic cycle tries to break the pattern is by giving the team ownership of improving themselves and their own efficacy. This subtle yet important shift in perspective moves the onus of improving away from management and to those who are more likely to make effective and lasting change. It also helps individuals and teams feel more empowered to initiate change, as well as feel pride for any improvements they enact.

Giving teams the bandwidth and responsibility to improve doesn’t mean that improvement is a disorganized free-for-all. The strategic cycle relies upon three mechanisms to help bring and maintain focus throughout.

The first of these mechanisms is the improvement and problem-solving kata, as explained in Chapter 7, “Learning.” The improvement kata is used by the members of the team to organize and explore improvements toward an agreed target condition. Work that gets generated as part of the kata gets integrated into the workflow much like any team project so that it can be tracked and team members avoid becoming unnecessarily overloaded.

Team members working on strategic cycle items can sometimes need help and guidance to stay on track and progress. This is where the second mechanism, the coaching practice, comes in. The coaching practice, also covered in Chapter 7, is a way for coaches, managers, and team leads to help team members shape and progress their improvement kata efforts. Sometimes help is in the form of problem analysis. Other times it might be helping the team shape investment cases, providing resources to help their efforts, or redirecting tactical work to give them the bandwidth to progress.

What pulls the whole strategic cycle together, and provides the target conditions that are fed into these improvement katas, is the strategic review. The strategic review, described in detail in the next section, is the main formal event of the strategic cycle, and the cycle both begins and ends with it. It is the mechanism that involves the whole team, where they review and reset target outcomes, reflect on larger or more stubborn retrospective discussion topics, wargame or run a hack-a-thon to rough out potential new solutions to a common problem, and improve cross-team alignment.

Together, these activities create the atmosphere necessary to promote the sort of learning and professional growth that help individuals and the team succeed.

The strategic cycle is intentionally longer than the tactical cycles it overlays, with the optimal length being monthly. This not only helps create a break from the day-to-day pressures that get in the way of looking objectively across the ecosystem, but also gives the team a chance to gain support to tackle bigger problems that hold it back.

For busy teams, using a double strategic cycle loop can be a workable option. In these situations there is a major cycle to tackle large problems and significant transformational efforts that runs quarterly, and minor cycles that tackle either smaller strategic items or, for distributed teams, local aspects of the larger cycle item. This model is far from ideal, and is only recommended if the regular approach simply isn’t working.

Let’s take a look at the mechanics of the strategic review and the different forms it can take to understand how it anchors the entire strategic cycle.

Strategic Review

The intent of the strategic review is to establish the focus for the strategic cycle by defining one or more target conditions to achieve. The topic or theme behind these conditions is typically chosen by the team about one or two weeks before holding the session, either by having a dot vote (where each person on the team is allotted three to five votes they can put on one or more items, with the topic having the most votes being chosen) or by taking the most urgent or important topic on the list. Choosing the topic beforehand maximizes the amount of time in the review that can be dedicated to working through the problem. It also allows the team to prepare by gathering evidence, materials, and/or people who might prove valuable for the session.
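The dot vote described above is simple enough to tally by hand, but a small Python sketch makes the mechanics explicit. The function name, the voter labels, and the topic names are hypothetical; the default of three dots per person is just one value from the three-to-five range mentioned above.

```python
from collections import Counter


def dot_vote(ballots, votes_per_person=3, winners=1):
    """Tally a dot vote and return the top `winners` topics with counts.

    `ballots` maps each voter to the list of topics they placed dots on.
    A topic may appear more than once if a voter stacks dots on it.
    """
    tally = Counter()
    for voter, dots in ballots.items():
        if len(dots) > votes_per_person:
            raise ValueError(
                f"{voter} used {len(dots)} dots; the limit is {votes_per_person}"
            )
        tally.update(dots)
    # most_common sorts by count; ties keep the order in which topics
    # were first seen, so the team should still break ties by discussion.
    return tally.most_common(winners)


# Hypothetical ballots, purely for illustration.
example_ballots = {
    "voter_1": ["flaky deploys", "flaky deploys", "alert noise"],
    "voter_2": ["alert noise", "access requests", "flaky deploys"],
    "voter_3": ["flaky deploys", "access requests", "access requests"],
}
chosen_topic = dot_vote(example_ballots)
```

Passing a larger `winners` value returns a ranked shortlist, which is closer to how the weekly retrospective picks its top three to five discussion items.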

There are three typical sources of topics. The most common source is the tactical cycle retrospectives. Often there are larger or more involved items that need more focused time to solve than can be provided in a tactical cycle. Another possible source is a major shift in service offerings or organizational structures. Such a shift often has very real impacts on the team that need to be explored and understood so that appropriate adjustments can be made.

The third source is past strategic cycle topics that need review to determine whether they need to be explored further or if new target conditions need to be set. Typically, a topic should be sized with conditions that can be reached in one to two strategic cycles. A topic that needs more than one cycle should be reviewed at the next strategic review to see if it is on track or requires adjustments. If it cannot be completed by the end of the second cycle, either the target condition was mis-scoped or those working on it were not given enough assistance to complete it. In the case of the latter, the strategic review should be dedicated to coming up with new ways to ensure sufficient assistance can be secured in the future.

One very important point is that the review is not a mechanism for management to kick off some business-driven initiative that has little to do with the team learning and improving itself. The review is for the team. This is worth stressing, as the person leading or moderating the meeting is often the leader or manager of the team. For this reason, it is a good idea to have an outside facilitator or neutral party well versed in such reviews help moderate, especially in the early forming/storming days, when there are lots of issues and little understanding or alignment on how best to use the meeting to solve them in a way that lets the team learn and improve.

The review is arguably one of the most important events for the team. It not only helps improve team cohesion and cross-team alignment, but also is an opportunity to bring everyone together out of their everyday roles to learn and break free of potentially flawed mental models through exposure to the insights that others have of the operating ecosystem. For this reason, it is important to invite everyone on the team.

Many teams find such regular strategic reviews extremely difficult to do, especially when teams first start on their DevOps journey. Meetings that can be long and take you out of your day-to-day activities always seem like a painful distraction no matter how helpful they end up being in the end. There is also the challenge of having everyone involved. It is not uncommon for teams to be big, busy, and geographically separated with poor telecommunications setups.

Those teams that are truly geographically split (such as US-India, US-Europe, US/EU-Asia) with sizeable numbers on both sides face a far bigger problem. Some can get by with cross-coordinated local strategic reviews where each location focuses on specific areas and then shares with the other. Even with those, having at least quarterly joint reviews has a lot of value. This can be accomplished by rotating the hosting between locations and supplementing with some travel of key staff from the remote location to help with cross-pollination. It is still not ideal, and can still fall foul of cultural disconnects, but it is far more effective than not doing it at all.

General Review Structure

The typical strategic review is broken into three parts. The first is a very quick review of progress on the measures that came out of the last review. Typically, updates are posted regularly for the whole team to see so that any significant problems can be flagged for discussion later in the review. The purpose of the first part is to give those acting on each measure a short, focused slot to add any needed color. This is usually timeboxed to no more than 5 minutes each.

The second part of the review is the main subject matter itself. This begins with an initial statement of the topic or theme, along with some explanation as to why it is important to cover for the session. This does not need to be particularly long, just enough to set the scene for the rest of the meeting.

One thing to keep in mind is that a review does not necessarily have to be purely problem-based. Sometimes it might be devoted to exploring a new technology, doing a deep dive in an environment or subsystem, or meeting with a customer. In each of these cases there needs to be a clear and measurable objective agreed to at the beginning that must be met by the conclusion of the cycle. These are typically one of the following:

Insights to improve team situational awareness with next steps that alter the team’s approach and ways of working
New technology, tool, or process approaches to be adopted or extended to help the team improve situational awareness, or decision and delivery efficacy to meet target outcomes

When a problem is the topic, the discussion should follow a structure similar to that of the retrospective, including an initial statement of the problem and how it measurably detracts from the team’s ability to progress toward target outcomes. This discussion should be capped at 15 minutes.

Once the team agrees to a statement, the team moves to the third part of the review, which is to dig into the problem to come up with a path for resolving it. This should employ blame-free problem-solving tools to determine the problem’s root causes and explore potential workable solutions to reach the target condition. Sometimes a root cause will not be totally apparent, in which case experiments should be framed up to discover more. Like any improvement initiatives, these should be set up with clear target conditions that can be fed into a learning kata.

Depending on the problem you are trying to solve, there are a number of problem-solving tools that can work well to help. For instance, value stream mapping works well for analyzing flow and handoff problems.

Perhaps one of the most versatile tools for structuring a general problem-solving discussion is the A3.

A3 Problem Solving for the Strategic Review

Often root cause analysis takes more than simply talking about a topic to get through its various elements objectively. Sometimes to get to the root cause and help with putting together countermeasures to improve, you need a tool or a guide to help. This is where a number of Lean tools can help.

A3 problem solving is one such tool to help structure a review topic. The A3 is a simple template that traditionally lives on an A3-size sheet of paper, roughly the equivalent of a sheet of American tabloid (ledger) paper at 11 by 17 inches. The size is helpful in that it is portable and keeps the team focused and brief, noting only what is important and relevant. Some teams include diagrams and pictures on their A3s to convey richer information more compactly. The A3 is employed during the strategic review for bigger structural and strategic items, but it can also be used with great effect in weekly retrospectives, normal day-to-day problem solving, and even incident postmortems. It is valuable for organizing your thinking around a problem-solving activity of any size.

Figure 14.7 shows an example of an A3. I have included a more readable version of the A3 in the appendix.

The title of the A3 is typically the theme being targeted. The theme must not be a solution, like “We need more automation” or “We need XYZ tool.” Instead, it should state the problem you are trying to solve, like “Installing software is too time/people intensive” or “Troubleshooting XYZ problem is error prone.”

From there, the A3 is broken into the following sections:

Background: Why is this topic important, and what is the business case? It needs to be relevant to the organization’s objectives, as well as concise in order to communicate the value in trying to address this problem. If there are other side benefits to working on this topic, such as learning, those can also be listed. This section basically needs to be clear enough not only for the team to care, but also to start to frame up the case for investing time and/or money in the problem.
Current Conditions: What is currently going on, and what is the actual problem with the current situation? This should be fact based and clear, and quantified with baseline metrics if at all possible. Graphs and other visual indicators can help.
Goal or Target Condition: What are the measurable or identifiable target outcome(s) you are aiming to achieve? Be specific, and make sure they tie back to the business case and the indicators captured in your current conditions. It is also useful to indicate how each outcome will be measured or evaluated.
Root Cause Analysis: This is where the team spends time to get to the bottom of the problem. Some people go crazy and employ various problem-solving tools to help. The important thing is not the technique, but that you uncover why the problem is happening, and what the root cause is. Oftentimes, especially in Operations, the problem is a symptom of an even deeper issue.
Proposal or Countermeasure: If the team has done an effective job of getting down to the root cause, you can now work together to put together one or more proposed countermeasures for solving the problem or improving the current situation. The countermeasures should address one or more aspects of the root cause. They should be accompanied by measurable or observable criteria to verify their impact and determine whether the action will prevent the recurrence of the problem.
Plan: Who is responsible for doing what, by when? The answers to these questions need to be clear and agreed upon. The implementation order should also be both unambiguous and reasonable.
Results Confirmation: This details the results of the countermeasures to determine whether their effectiveness met expected targets and measures improved in line with the goal statement. If measures have not improved, the reasons why are detailed. This section is filled in at an agreed-upon future date, usually during a subsequent review.
Follow Up: What did we learn about this situation, and in light of this new knowledge, what should we do? What is necessary to prevent a recurrence of the problem? What remains to be accomplished? Is any communication required, and if so, to whom and in what form(s)?
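For teams that keep their A3s in a tool rather than on paper, the sections above can be sketched as a simple record. This is only an illustration, not a prescribed format: the class name, field names, and the `missing_sections` helper are all hypothetical, chosen to mirror the section names listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class A3:
    """Hypothetical record mirroring the A3 sections described above."""
    title: str                      # the problem theme, never a solution
    background: str = ""
    current_conditions: str = ""
    goal: str = ""
    root_cause_analysis: str = ""
    countermeasures: str = ""
    plan: str = ""
    results_confirmation: str = ""  # filled in at an agreed-upon future date
    follow_up: str = ""             # filled in once results are known

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# A draft prepared before the review: theme and background only.
draft = A3(
    title="Environment setup is too time and people intensive",
    background="Long lead times delay changes and tie up team capacity.",
)
print(draft.missing_sections())
```

Listing the empty sections at the start of a review gives a quick view of what was gathered beforehand and what remains for the session itself.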

The theme, as well as some background and available supporting information regarding the current condition, is often agreed upon and gathered before the review. This allows the team to quickly review the information and agree upon a reasonable goal, then focus most of the time on performing root cause analysis, developing proposed countermeasures, and creating an agreed plan for moving forward.

During the root cause analysis, the team should take the opportunity to examine the relevant parts of the environment to better understand the situation and give better context to any data. This may include gemba walks, or engaging directly with a customer or other areas of the business that can provide an external perspective on the situation.

Note

Gemba walks consist of walking through the relevant parts of the ecosystem. This can mean looking at the specific details of a process through a worked example; looking at the development process and the underlying conditions within it; analyzing how build or deployment environments are constructed, created, and kept clean in order to understand the situation and any potential issues; examining logging or monitoring noise; investigating deployment issues in order to find and eliminate their causes; understanding how customers engage with services in order to improve their ability to achieve their target outcomes; and the like.

If at all possible, these discussions should be structured and time boxed to ensure that the team can agree upon the root cause(s), and then have sufficient time to develop and agree upon countermeasures and a plan forward before the end of the review.

The results from any countermeasures can be reviewed either at a subsequent strategic review or at some other agreed-upon point in time. I tend to prefer reviewing at the next strategic review whenever possible, because it both helps maintain momentum and allows for a convenient time to discuss and schedule any follow-up activities.

Here is an example of an A3:

Title: Environment setup is too time and people intensive

Background:

Setup requests require significant advance notice, or else cause change delays.
Team capacity is heavily impacted with each request, causing other work to build up and be delayed.
Development, customers, and the business are complaining about the long lead times required to turn around an environment.

Current Condition:

[Value stream map of current process]

Goal:

Environment setup time reduced to one day or less
80% reduction in rework caused by setup mistakes
Operational team intensity reduced by 75%

Root Cause Analysis:

[This could be a mind map of some variety, a fishbone diagram, or something else that you find helpful.]

Countermeasures:

Operating system installation automation
- Turnaround time reduction target to less than 2 hours
- Reduction of operating system installation rework by 90%
Cloud virtualization with an image library
- Turnaround time reduction target for frequently used preset configurations to less than 2 hours
Standardized packaging of all software
- Reduction of installation rework by 90%
Software deployment automation
- Reduction of people intensity

Plan:

[High-level action plan with next steps]
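Because the goals in this example are stated as percentage reductions, the later Results Confirmation step comes down to simple arithmetic against the baseline captured under Current Condition. The following is a minimal sketch of that check. The function names are hypothetical, and the baseline and current figures are made-up illustrative numbers, not measurements from any real team.

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Percentage by which `current` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100 / baseline


def met_reduction_target(baseline: float, current: float,
                         target_percent: float) -> bool:
    """True when the measured reduction reaches the target percentage."""
    return percent_reduction(baseline, current) >= target_percent


# Illustrative numbers only (assumed for this sketch).
# Goal: 80% reduction in rework caused by setup mistakes.
rework_ok = met_reduction_target(baseline=20, current=3, target_percent=80)

# Goal: operational team intensity reduced by 75%.
intensity_ok = met_reduction_target(baseline=40, current=12, target_percent=75)

print(rework_ok, intensity_ok)
```

With these assumed figures the rework goal is met (an 85% reduction) while the team intensity goal is not (a 70% reduction), which is exactly the kind of gap the Results Confirmation section asks the team to explain.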

As the level of typical, everyday problems begins to shrink, the team should find that they have a bit more capacity to tackle ever bigger or more complex items within the strategic review.

Summary

Creating a natural rhythm of synchronization and improvement events is a great way of turning alignment, reflection, and learning into a continuous habit, one that is shared by the entire team. The events along the shorter tactical cycle aid the team with planning, workload balancing, maintaining shared awareness, and making tactical improvements, while the longer strategic cycle helps build in time for steady learning and improvement. Together they combine all the elements the team needs to succeed.
