Chapter 13

Queue Master

The only people who see the big picture are the ones who step outside the frame.

Salman Rushdie

Anyone who has worked in an active technical environment knows how difficult it can be to stay on top of everything that is going on. Work comes in from different directions, tasks are often fragmented or out of order, and objectives can be unknown or unclear. The reactive nature of the environment amplifies all of this confusion, obfuscating what is happening across the ecosystem.

Having a visual workflow can help. But if you have spent any significant time with a kanban board or other workflow mechanisms, you likely know that tasks require more than just being slotted into a task board or tracking tool. They need to be added in with sufficient context to help the team avoid errors, conflicts, and rework.

Some people might think that managers are a natural solution for this issue. However, they are usually not close enough to the actual day-to-day work to have enough context. If that is not challenging enough, routing all the work coming in through them can dangerously limit how much time managers are able to spend on important activities such as advocating for the team and bridging with the rest of the organization. Concerns about managerial work overload can lead to deeper management hierarchies to compensate, further increasing the sorts of handoffs that can damage or restrict information flow.

A far better approach is to have someone in the trenches manage the workflow entry point. Such an approach both overcomes the context, information flow, and learning problems that managers would face and limits the number of team members who are being interrupted. However, this approach must be implemented in a way that doesn’t just create a context and learning problem somewhere else. It must also keep work flowing in a way that handles unpredictable events, maintains priorities, and spots dependencies that can block progress.

Fortunately, there is a strategy to do all of this and still help everyone retain their hands-on situational awareness. If executed well, it can create a means of uncovering useful patterns and help the team learn. Implementing this strategy requires creating a duty (rotating) role called the Queue Master.

An Introduction to the Queue Master

Figure 13.1
The often misunderstood Queue Master.

The Queue Master is perhaps one of the most essential and misunderstood roles for any dynamic technical environment. It is not a first-tier helpdesk function or a people management role. When it is working well, it can actually reduce the cycle time between when a request comes in and when it is completed.

The Queue Master arose in various incarnations in the early service delivery organizations that were trying to balance the need to maintain high uptime with the need to respond rapidly to change. Ensuring all the work coming in was properly sorted and prioritized was a huge challenge, especially as an increasing number of tasks were reactive or unplanned. Most of the companies were also start-ups that could not afford to hire lots of staff. Instead, they needed some way of aligning priorities and identifying hidden blockers, bottlenecks, and dependencies, all while minimizing the number of interruptions hitting the larger team.

As described in Chapter 12, “Workflow,” many delivery teams saw how work randomly entering the team was creating huge numbers of problems. Work duplication and collisions were common. Often team members felt so overloaded that they burned out and were unable to share what they were doing, let alone keep sight of the big picture of what was going on. As time went on, this began to detrimentally affect the ability to attract and retain talent, and with it service quality started to suffer.

Even those organizations that could afford staff often found that traditional approaches like establishing helpdesks, putting in place a tiered support model, or having a manager coordinator/project manager all seemed insufficient. They always struggled to be sufficiently responsive while also minimizing information loss.

I have found that tight and timely coordination requires the sort of immediate deeper context that can only be achieved by someone who also regularly performs the work. However, at the same time the person must not be so bogged down with work themselves that they cannot maintain sufficient awareness across the service ecosystem. This combination is the only way that someone can effectively recalibrate priorities rapidly as necessary and readjust expectations smoothly to keep everything flowing.

Eventually, the team’s continued experimentation resulted in establishing the Queue Master.

Role Mechanics

Figure 13.2
Role mechanics.

There are a number of role mechanics that are key to the success of the Queue Master. You may find some important additions that you need for your organization’s circumstances. However, you need to be careful when making any material variations away from those covered in this section. It is easy for such changes to unintentionally damage the underlying intent of maintaining situational awareness and flow within the team.

Let’s walk through each of the role mechanics to understand them a bit better, as well as why they are important.

Rotation

Perhaps the rule that is most important to the success of the Queue Master is that the duties must rotate across members of the team performing the work. Optimally this is done with a weekly cadence.
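As a minimal sketch of what a weekly rotation could look like, the following Python assigns each working week to the next person in a fixed order. The roster names, the start date, and the function name `queue_master_for` are hypothetical and exist only for illustration. A real team would also need to account for holidays, swaps, and shadowing.

```python
from datetime import date, timedelta

# Hypothetical team roster; the order defines the rotation.
TEAM = ["Ana", "Bo", "Chen", "Dee"]

# Assumed start of the first rotation (a Monday).
ROTATION_START = date(2024, 1, 1)


def queue_master_for(day: date, team=TEAM, start=ROTATION_START) -> str:
    """Return who holds Queue Master duty during the week containing `day`."""
    if day < start:
        raise ValueError("day is before the rotation started")
    weeks_elapsed = (day - start).days // 7
    # Wrap around so the duty cycles through the whole team.
    return team[weeks_elapsed % len(team)]


if __name__ == "__main__":
    # Print the duty holder for the first five weeks.
    for offset in range(5):
        monday = ROTATION_START + timedelta(weeks=offset)
        print(monday.isoformat(), queue_master_for(monday))
```

Keeping the schedule this simple makes it easy for everyone to see when their turn is coming and to plan other work around it.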

While at first many team members might find taking on a duty role to be disruptive to their own work, rotating the role at the natural cadence of a working week has a number of distinct benefits. The first is that it ensures the Queue Master maintains sufficient context with the work and team performing it. Active team members are exposed to the inner workings of the infrastructure, the software and services, and the people who manage them.

Figure 13.3
Regularly rotating the Queue Master role is important.

This exposure provides a level of insight that managers or anyone else not directly immersed in it lacks.

Rotating at a natural cadence also gives everyone a chance to step out of their day-to-day work and properly see what is occurring across the ecosystem. I cannot state strongly enough how eye-opening this can be. Most of us forget how easy it is to lose sight of the big picture when we are buried in our work. Stepping back from it to see what is going on across a typical work cycle not only helps us to see what is happening, but helps the team understand and resolve problems that damage team effectiveness.

Every time I have put in place the Queue Master rotation, I have had at least one of the team who was most skeptical of the idea come to me on day three or four of their first rotation exclaiming how they now understood why the role was so important, detailing how much crazy stuff they never realized was going on that needed to be fixed.

Rotating the role also helps everyone on the team start to see and really appreciate the knowledge and effort other team members are contributing. They will often see team members struggling with some problem or dysfunction that they can help with. They will also remember and appreciate when they are helped by another team member who takes on Queue Master duty. Together, this builds team unity and fosters further collaboration.

Rotation also brings value to the Queue Master role itself. The differing perspectives that each person brings to the role can uncover hidden problems and solutions. Rotation also reduces the likelihood that those in the role will get inured to the sorts of dysfunction that can so often damage team effectiveness. If nothing else, the shift in exposure to another perspective of the ecosystem is likely to spur people to question the status quo.

Rotation greatly reduces the chance of Queue Master specialization taking root in the team. The last thing any technical team needs is yet another specialization that reduces the flexibility of the larger team.

Entry Management

Figure 13.4
Entry management.

To be effective, the Queue Master must sit at the point where all non-incident work officially enters the team. This includes all work, from requests coming in from other teams to work internally generated by the team. The dual intent of this entry management is to ensure the clarity and priority of the work and to capture cross-organizational nuances that might otherwise be missed.

Entry management achieves a number of objectives. For one, it makes all work trackable and reduces the chance that unstated assumptions slip through unnoticed along the way. It also prevents people from routing tasks to any random person in the team, thus reducing interrupts and misprioritization of work. Another benefit of routing everything through the Queue Master is that they can act as a quick filter that helps limit garbage from entering the workflow. This reduces the chances of team members getting partial or incorrect information, or a misprioritized task hitting the team directly.

This does not mean that routing should be a heavyweight process, or that it is the Queue Master’s job to write up all the tasks themselves. Routing should be extremely light touch whenever possible. In some cases, particularly for internal team work, routing might be as simple as someone mentioning that they are adding a task to help clarify or add some important missing piece of work. As for writing tasks, the one requesting the work should write and submit the task whenever possible. The Queue Master can, if necessary, then follow up to add any missing information. This reduces any information loss.

All of this helps prevent the clogging up of the workflow. Most people feel that their request is both important and obvious, even when it is not. This checking and sorting ensures that everything goes in the right place with enough information that it doesn’t get lost, get misprioritized, or lead to unnecessary confusion. Critically urgent tasks will still always go into the Expedite column. For anything else that is still important, or that is a dependency of other work, but is not worthy of being expedited, the Queue Master can and should provide background and some queue reordering.

The best way to prevent unnecessary task rejection is for the team to proactively create and tune templates and examples of what a requester minimally needs to include in a ticket for it to be acceptable. This will go a long way in reducing both frustration and rework. Training and outreach should also be available to educate others across the organization. This will help stop the inevitable unhelpful tickets such as “The service is broken. Please fix,” “Need some software installed,” or the all-time favorite “Is the thingy backed up?”
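To make the idea of a minimal ticket template concrete, here is a small Python sketch that checks a request against a list of required fields. The field names, the word-count threshold, and the function name `missing_details` are all assumptions for illustration. Each team would tune its own template from the requests it actually receives.

```python
# Hypothetical minimum fields; each team would tune its own template.
REQUIRED_FIELDS = ("summary", "requester", "affected_service", "desired_outcome")

# Assumed threshold for spotting one-line requests with no real detail.
MIN_SUMMARY_WORDS = 6


def missing_details(ticket: dict) -> list[str]:
    """Return a list of human-readable gaps; an empty list means acceptable."""
    gaps = [f"missing field: {name}" for name in REQUIRED_FIELDS
            if not str(ticket.get(name, "")).strip()]
    summary = str(ticket.get("summary", ""))
    # Only flag a short summary when one was provided at all.
    if summary.strip() and len(summary.split()) < MIN_SUMMARY_WORDS:
        gaps.append("summary is too short to act on")
    return gaps


if __name__ == "__main__":
    vague = {"summary": "Is the thingy backed up?", "requester": "pat"}
    for gap in missing_details(vague):
        print(gap)
```

Returning the list of gaps, rather than a bare yes or no, gives the requester something specific to fix when a ticket is sent back.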

Sorting and Dependency Discovery

As unplanned work tends to be a regular occurrence for service delivery teams, chances are good that work will come in that bypasses people like managers, project/program managers, and architects who can call out any hidden dependencies or ordering conflicts. To solve this, the Queue Master regularly goes through the queues in search of such problems to proactively help the team avoid any unnecessary blocks or rework that they might cause.

Having someone with intimate knowledge of the ecosystem do this has the added benefit of also catching any conflicts that might have been unknown or unclear to managers and architects. Sometimes this can be done in a lightweight way, such as simply flagging a conflict to the rest of the team to be aware of. In the case that the conflict is more serious, the Queue Master can send the task back to the requester for clarification or escalate to management for help.

To deal with all of this sorting, the default state of any work coming into the team is typically considered Unqualified, at least until the Queue Master has a chance to take a quick look at it to determine whether or not it is suitable for the board. Most of the time the Queue Master will determine the work item is fine and move it into the Ready column on the board. If, however, a task is just plain wrong or misdirected, the Queue Master can reject it and hand it back to the requester with an explanation.

Likewise, if a task is unclear, too big, or open-ended, the Queue Master has an opportunity to send the item back to the requester to clarify and improve the request before it has a chance to cause any problems with the team. One method for doing this is to alert the requester, then flag the task and place it in a Clarification queue in the workflow.
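The qualification flow described above can be sketched as a small state machine. This is only an illustration: the class and state names are hypothetical, and the step that returns a clarified item to Unqualified is my assumption, since a clarified request presumably needs another look before it reaches the board.

```python
from enum import Enum


class State(Enum):
    UNQUALIFIED = "Unqualified"      # default for all incoming work
    READY = "Ready"                  # accepted onto the board
    CLARIFICATION = "Clarification"  # sent back for more detail
    REJECTED = "Rejected"            # wrong or misdirected


# Allowed moves. The Clarification -> Unqualified step is an assumption.
# Later board columns after Ready are out of scope for this sketch.
ALLOWED = {
    State.UNQUALIFIED: {State.READY, State.CLARIFICATION, State.REJECTED},
    State.CLARIFICATION: {State.UNQUALIFIED, State.REJECTED},
    State.READY: set(),
    State.REJECTED: set(),
}


class WorkItem:
    def __init__(self, title: str):
        self.title = title
        self.state = State.UNQUALIFIED
        self.history = []  # (from_state, to_state, note) tuples

    def move(self, new_state: State, note: str = "") -> None:
        """Move the item, recording why, or raise if the move is not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot move {self.state.value} -> {new_state.value}")
        self.history.append((self.state, new_state, note))
        self.state = new_state


if __name__ == "__main__":
    item = WorkItem("Need some software installed")
    item.move(State.CLARIFICATION, "which software, on which hosts?")
    item.move(State.UNQUALIFIED, "requester added details")
    item.move(State.READY)
    print(item.state.value, len(item.history))
```

Recording a note with every move keeps the explanation attached to the item, so the requester and the next Queue Master can both see why it was sent back.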

I have found that tracking both clarifications and rejections can help the team remain aware of what is going on while also helping them spot and understand if there are patterns or sources that are causing such problems that need to be addressed. This helps the team to focus only on the sorts of template, education, and process improvements that will make a difference in reducing problem occurrence over time. This improves their cycle time, but also reduces the frustration and workloads on the requester and Queue Master. It helps minimize any problems or misunderstandings that somehow make their way down to those performing the work, and does so without throwing in excessive processes for no real reason.
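One lightweight way to track clarifications and rejections is to log each one with its source and reason, then count the repeats. The sketch below assumes a simple in-memory log. The log format, the sample entries, and the function name `top_patterns` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical log entries: (outcome, requesting_team, reason)
triage_log = [
    ("clarification", "billing", "no affected service named"),
    ("rejected", "billing", "belongs to another team"),
    ("clarification", "billing", "no affected service named"),
    ("clarification", "mobile", "request too large"),
]


def top_patterns(log, n=3):
    """Count (team, reason) pairs so repeat sources stand out."""
    counts = Counter((team, reason) for _outcome, team, reason in log)
    return counts.most_common(n)


if __name__ == "__main__":
    for (team, reason), count in top_patterns(triage_log):
        print(f"{count}x {team}: {reason}")
```

A pair that keeps recurring points at where a template change or some outreach is likely to pay off first.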

Improved entry management by the Queue Master also has the benefit of reducing the chances of work unevenly flowing across the team. Lean practitioners know that uneven flow creates waste. By ensuring that tasks enter in only through the Queue Master, team members avoid being put in a position where they have to choose between creating conflict with requesters and taking on too much work. This reduces team pressure.

Dark Matter Handling

Figure 13.5
Getting Dark Matter under control is a critical role of the QM.

One of the powerful benefits of insisting that all noncritical work must go through the Queue Master is that it becomes a single point to receive and handle all Dark Matter requests. This ensures that they are captured and tracked to aid in understanding their sources and occurrence.

Redirecting Dark Matter to one central place can be tough to do at first for both the organization and the team. However, doing so is worth all the grumbling and minor inconvenience. Not only is recording the tasks important for understanding the source of and sizing the demand for various activities, but offloading trivial tasks to one place reduces the number of interrupts that otherwise would pepper the team. This reduces errors and helps improve responsiveness and flow.

Maintaining Flow

The role of Queue Master does not end with work entering the workflow. In fact, many of the most valuable Queue Master contributions occur by keeping the focus on the bigger picture. One of these contributions is maintaining the flow of work.

For instance, not all tasks that reach the Ready column in the workflow board necessarily deserve the same treatment. While management, the product owner, or even individual engineers often do most of the prioritization work, there are some cases where the Queue Master may need to do some reordering. This is usually done either in response to sudden shifts in priorities or to ensure that one or more key prerequisites get completed to avoid workflow blockages. When a Queue Master does this, they will usually alert the team of the situation, either in the daily standup or via chat, to make sure that any priorities or dependencies are clear and understood.

If a task requires special handling because the work item requires either specific skills or specific context, the Queue Master can make sure that these handling requirements are clearly marked. This will aid in informing the team when the task is ready to be pulled in, and can help with tracking as it moves across the workflow. As a team matures and the Service Engineering Lead role (covered in Chapter 9, “Service Delivery Maturity and the Service Engineering Lead”) becomes better established, the team itself frequently becomes the biggest source of such tasks. They are usually related to delivery tasks created and marked by the SE Lead to help with tracking.

While everyone is responsible for the board, there are times when there are items that can languish in a column. The Queue Master actively monitors the board to spot them and remove any blocks or potential problems that might be preventing them from progressing. The Queue Master also looks for large build-ups of work in progress, especially those caused by unexpected problems that develop.
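As a rough sketch of the kind of board monitoring described above, the following Python flags items that have sat in a column too long and columns where work in progress is piling up. The thresholds, the column names other than Ready, and the item fields are assumptions for illustration. Most tracking tools expose similar data in their own way.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed thresholds; tune them to the team's own cycle.
STALE_AFTER = timedelta(days=3)
WIP_LIMIT = 4


def stale_items(items, now):
    """Return titles of unfinished items that have sat in their column too long."""
    return [i["title"] for i in items
            if i["column"] != "Done" and now - i["entered_column"] > STALE_AFTER]


def overloaded_columns(items):
    """Return in-progress columns holding more than WIP_LIMIT items."""
    counts = Counter(i["column"] for i in items
                     if i["column"] not in ("Ready", "Done"))
    return sorted(col for col, n in counts.items() if n > WIP_LIMIT)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 9, 0)
    board = [
        {"title": "Rotate certs", "column": "In Progress",
         "entered_column": datetime(2024, 1, 5, 9, 0)},
        {"title": "Patch hosts", "column": "In Progress",
         "entered_column": datetime(2024, 1, 9, 9, 0)},
    ]
    print(stale_items(board, now))
    print(overloaded_columns(board))
```

The output is a prompt for a conversation rather than a verdict. An item that looks stale may simply be waiting on someone outside the team.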

The Queue Master uses the daily standup, as well as the team’s retrospective, as a way to bring up issues the team members encounter. The Queue Master can ask team members for details, as well as help to remove the issues and restore flow. The Queue Master can also work with the team to understand the issues’ root causes and find ways to prevent them from happening again in the future. Together, these ensure that work does not languish unnecessarily and that team members do not become overwhelmed with work.

Pattern Recognition

Figure 13.6
Patterns are useful for quick recognition.

One of the major advantages of having a Queue Master tasked with actively surveying the workflow from the entry point through to completion is that they will start to see patterns emerge that can go a long way toward helping the team improve.

Such patterns begin with the demand coming in itself. There is always subtle yet important interplay between the types of tasks coming in, those asking for them to be performed, and larger developments taking place across the ecosystem. This context is valuable, especially in environments where a significant segment of demand is not otherwise easy to plan for in advance. If you can understand the source of the demand, its frequency, and what might drive it, you can begin to predict when it might occur again in the future. It also gives insight into similar requests, and similar requesters, that helps you establish larger patterns of demand. This can aid in resource planning, as well as sizing investment return from standardizing and automating the task, or eliminating its root cause.

Similarly, you can also start to understand and track dependency patterns between tasks. If a particular task enters the flow, you know that there is a good possibility that those same dependent tasks might follow. Understanding these dependency patterns can help preempt additional requests and improve team responsiveness and understanding. This reduces workflow noise and provides useful automation targets. It also aids coordination.

Being able to uncover demand patterns also provides useful clues about what the requester is trying to accomplish with the request. This is important because not everyone shares or is clear about the outcome they are trying to attain. It is not uncommon for people to make requests that, due to solution bias or some other misunderstanding, will not achieve what they are after. The Queue Master is uniquely positioned to notice such discrepancies and flag them for further investigation.

Patterns also exist in the workflow itself. There might be tasks that take far longer than they should, require rework, or have a large number of handoffs. There are also times when the Queue Master might notice that certain tasks can only be done by one person or a small subset of the team, creating a dangerous bottleneck that indicates knowledge and awareness might not be flowing evenly across the team.
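The single-person bottleneck is one workflow pattern that is easy to check for in completed work. The sketch below assumes a list of (task type, person) pairs taken from finished tasks. The data shape, the `min_tasks` cutoff, and the function name `single_person_task_types` are hypothetical.

```python
from collections import defaultdict


def single_person_task_types(completed, min_tasks=3):
    """Return {task_type: person} for types that only one person has done.

    `min_tasks` ignores task types seen too rarely to call a pattern.
    """
    people = defaultdict(set)
    counts = defaultdict(int)
    for task_type, person in completed:
        people[task_type].add(person)
        counts[task_type] += 1
    return {task_type: next(iter(who))
            for task_type, who in people.items()
            if len(who) == 1 and counts[task_type] >= min_tasks}


if __name__ == "__main__":
    history = [
        ("dns change", "Ana"), ("dns change", "Ana"), ("dns change", "Ana"),
        ("deploy", "Ana"), ("deploy", "Bo"), ("deploy", "Chen"),
    ]
    print(single_person_task_types(history))
```

Anything this turns up is a candidate for pairing or documentation, so that the knowledge spreads before it becomes a blocker.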

Each of these patterns should be noted by the Queue Master and discussed with the rest of the team through the various syncing and improvement mechanisms discussed in Chapter 14, “Cycles and Sync Points.” Queue Masters who are great at pattern matching can make a big difference for a team. As more people take on the role, people start to share and find patterns based on their experiences. This strengthens the collective mind, providing fertile ground for innovative solutions to challenging issues.

Office Hours

It is important to point out that, unlike a typical on-call rotation, the Queue Master generally is not a role that must be available 24 hours a day. Nothing the role does is so urgent that it must always be attended to. Even if there happen to be off-hours requests that are so critical that they need immediate attention, they can easily be routed to on-call.

Figure 13.7
The Queue Master role needs to be available for daily demand peaks.

For that reason, it is worthwhile to set the expectation that the Queue Master will be around and available at a set time period during the business day. Usually Queue Master work begins after the daily standup and ends at a reasonable set time each day that corresponds to the lull of task requests at the end of the workday.

Besides preventing burnout, establishing office hours also makes it obvious when work hits the team at unreasonable times. This doesn’t mean that there shouldn’t be a mechanism for extremely critical work that happens in the off-hours. In fact, if something so urgent comes in that addressing it cannot wait until the next day, it should be treated as a production incident. This does several things. It ensures that it will get immediate attention. It also ensures that the request is subsequently reviewed to see if there is a genuine need that has been missed that requires a change or even a new tool in order to address it.

Finally, and most importantly, raising an off-hours request as an incident can help requesters consider the urgency of their request. This allows for true problems to be accounted for and addressed in more sustainable ways.
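The office hours rule can be expressed as a small routing decision. This is a sketch under stated assumptions: the start and end times, the weekday-only schedule, and the names `route` and `cannot_wait` are hypothetical, and a real team would set its hours around its own standup and demand peaks.

```python
from datetime import datetime, time

# Assumed office hours; the text suggests starting after the daily standup.
OFFICE_START = time(10, 0)
OFFICE_END = time(17, 0)


def route(request_time: datetime, cannot_wait: bool) -> str:
    """Decide where a non-incident request goes when it arrives."""
    in_hours = (request_time.weekday() < 5  # Monday to Friday
                and OFFICE_START <= request_time.time() < OFFICE_END)
    if in_hours:
        return "queue_master"
    if cannot_wait:
        # Treated as a production incident: on-call handles it,
        # and the request is reviewed afterwards.
        return "raise_incident"
    return "hold_for_next_office_hours"


if __name__ == "__main__":
    print(route(datetime(2024, 1, 3, 11, 0), cannot_wait=False))
    print(route(datetime(2024, 1, 3, 22, 0), cannot_wait=True))
```

Making the requester choose the incident path explicitly is what prompts them to weigh how urgent the request really is.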

“Follow the Sun” Queue Mastering

Figure 13.8
For distributed orgs, the QM might need to rotate between offices during the day.

The one challenge to office hours is if your organization has technical teams six or more time zones away. Optimally, you should divide service responsibilities into self-contained parts that can all be handled locally. However, there are some cases where either this is unrealistic or cross-team coordination and awareness are so important that there is a daily need to ensure alignment.

In such cases you may have to consider having a “follow the sun” Queue Master setup. In this model, there would ideally be two, and no more than three, Queue Masters, each separated by at least six time zones, to handle work during the core hours of their geographically distant location.

While I have successfully used the “follow the sun” model at several organizations, it is far from easy. A lot of context needs to be shared between each regional Queue Master to maintain alignment. I have found that this can be done by having the larger team try to do the following:

  • Think carefully about how work is organized across the delivery team. Often you can keep certain closely coupled work types together over a given cycle. This reduces the number of handoffs, and thus context overhead, required to maintain alignment and minimize rework. I still recommend rotating these “type” clusters from time to time to avoid creating bottlenecks, but doing so at a natural cycle breakpoint.

  • Have as much communication as possible go over a globally shared chat channel. The transcript will add a lot more context than what can sensibly be put into a ticket.

  • Have one of the Queue Masters for the cycle take the “lead” or primary position to help with coordination. Try to rotate which location’s Queue Master takes the lead so that everyone feels included.

  • The Queue Masters need to synchronize at least daily over voice/video chat with each other. They should also share notes in a shared document. These are particularly important if you must have three Queue Masters. Even if time zones are far apart, you will likely need to get all of them together at least twice per cycle (usually toward the beginning and then at the very end) to synchronize and avoid too much regional drift.

Sync Point Management

Maintaining situational awareness, learning, and improvement all require there to be a certain amount of alignment of the team. There are lots of mechanisms that you can use to do this, from wikis and document repositories to chat rooms, standups, pairing, demos, meetings, and just working in the same location and chatting from time to time. Each has its advantages and disadvantages, which is why I have dedicated an entire chapter (Chapter 14, “Cycles and Sync Points”) to talk through patterns that have worked well in organizations where I have worked. The one common thread that I have found across them is that it is extremely helpful to have one person making sure that the sync points happen. The best-placed person for that is the Queue Master.

The Queue Master will be exposed to the chaos at a high level, meaning that they should be better placed to see problems that others are living with but might be too close to see. By being the one in charge of the sync points, the Queue Master can help draw out the deeper details from others. This helps people get a better perspective and understanding of what is going on, helps build situational awareness, helps improve team collaboration, and even helps those in the heat of a situation to step back and look more objectively.

Other Work When on Duty

One challenge of a busy Queue Master cycle is that it might not give the person holding the role enough time to handle their normal day-to-day and project responsibilities. Like being on-call for production support, Queue Master responsibilities take precedence over just about everything else. This means that any other activities that the person has must either be transferred to someone else during the cycle or wait until that person has available capacity to continue.

This approach might seem extremely disruptive at first. However, as the Queue Master starts to make things visible enough that they can be worked on and improved, most can find a good balance between their rotation and their existing work.

One important thing that everyone should keep in mind is that the ability of the Queue Master to reduce the noise that otherwise would interrupt the team is of immense value. It means that the team members can complete their work without interruption, improving overall team flow and throughput. Team members are also far more likely to share details about the work they are doing if they might need others to jump in to pick up the slack from time to time. This further builds team situational awareness and flexibility.

The cyclic rotation of the Queue Master role means the period is temporary and that activities can be planned to work around it. Over time as the workflow begins to settle, even this disruption will greatly lessen.

Queue Master Rollout Challenges

While the value of the Queue Master becomes increasingly obvious over time, it often seems like a struggle to get it established. The challenges always seem to follow a familiar pattern. This section describes some of the main challenges along with some strategies to work through them.

Team Members Don’t See the Value

The first pushback most teams encounter is that team members simply do not see the value of the Queue Master at first. Everyone is already extremely busy, so the idea of messing around with the way that work is handled sounds like another set of useless and cumbersome steps that will further strain the team.

The main intent of the Queue Master role is to build a better understanding of the patterns of work hitting the team so that the team can find faster, easier, and ultimately more effective ways of delivering the outcomes that matter. Team members often think that they already know what is going on and simply need more resources. Yet their awareness naturally narrows to their own work and the parts of the ecosystem that directly impact it.

This narrowing, along with not consistently having enough time to assemble enough of the right compelling evidence to convince others to invest and try out new things, can slow the speed of improvements. This can become increasingly frustrating and demoralizing.

Team members will often give the role a chance if they can be shown and convinced that the Queue Master is a faster and more effective way to capture and articulate the problems the team faces in a way that can help them get the support they need for improvements. Fortunately, it does not usually take too many cycles to open everyone’s eyes to all sorts of ways the role can facilitate huge improvement gains.

More Traditionally Minded Managers Thwarting Rollout

Sometimes the team is more than happy to try something different, but there is resistance to doing so from a more traditionally minded manager who feels that they are somehow losing control. They might feel uncomfortable with delegating many of the responsibilities they see as under their purview. These managers might be particularly concerned with some of their more junior team members handling the role.

For those managers, the positioning needs to be turned around to show that the Queue Master and workflow are there to make their lives easier. Unless the working ecosystem is in the Obvious realm (as discussed in detail in Chapter 5, “Risk”), it is highly likely that they already feel like their world is somewhat out of control. The workflow and Queue Master put the job of indicating what is going on in the hands of the people who are most likely to know. As the workflow is for the team, it and the Queue Master can help the manager see where there might be problems that need to be addressed, and help them build cases for a change in direction or more resources from the business. As the team becomes more successful, the manager can be more outward facing, further helping the team and their own careers.

One way to overcome this obstacle is by not trying to introduce the Queue Master role with everyone in the rotation all at once. Instead, begin with some of the more experienced team members, possibly with some experienced coaching help, to ease in the role. This not only helps everyone, including managers, get the most out of workflow and its learning and feedback mechanisms but also provides them with a better understanding of the role, why it is being implemented, and what everyone can do to improve it.

The case for project managers is similar. They are one of the key links to make sure that cross-team coordination is happening, and that work being performed is moving the organization toward the target outcomes of the project or program they are running. The Queue Master helps them to spot potential conflicts, problematic ambiguity, as well as resource constraints and general problems before they blow up the project.

Pushy Queue Masters

I have seen instances where Queue Masters see themselves as a sort of “boss for the week.” When this happens they try to push work onto team members and push it through the workflow. This goes explicitly against the intent of both the role and the workflow mechanism. Work needs to get pulled through the workflow. The Queue Master’s role is to help ensure that the priorities, ordering, and needs of the work items are clear. They can point out problems and help unblock work, but outside of keeping people within the rules of the workflow, they are not there to tell people what to do or how to do it.

Junior Team Members as Queue Masters

Most people are at least a little uncomfortable with the Queue Master role at first. It is a new role to them, and they still need to become familiar with its ins and outs. This is especially the case with less experienced junior team members. They are going to be put into a situation where they need to qualify incoming work, lead sync points, and ask more senior staff for guidance. The role can be tough for them, and they are far more prone to making mistakes in it. Sometimes junior team members might feel like they are facing a fire hose-like flow of work hitting the team that seems impossible to overcome.

For this reason, the more experienced people on the team should take on the Queue Master role first. As everyone becomes more familiar with the role, junior people can shadow a Queue Master rotation, then flip it and have the more experienced person shadow the junior person as they take a turn at it. Once there has been a successful rotation, the junior person is included in the rotation. A buddy system (regardless of seniority level) is also useful for sorting out questions. Eventually, everyone should be onboarded and helping everyone else.

Queue Masters Who Struggle to Lead Sync Points

Some people are shy, introverted, or just generally uncomfortable with bringing attention to themselves. Some simply do not want to be put in a position where they feel responsible for the success of the team. That is certainly understandable. Yet, it is important for everyone to have their turn at the Queue Master role, both to take part in helping the team and to get a chance to take a step back and see firsthand the dynamics across the service ecosystem. It is also critical that someone act as the lead to help bring the rest of the team together for the various sync points that occur during the Queue Master cycle, and the Queue Master is the best placed to do so.

As with junior team members, it is worthwhile for the team to help those who are uncomfortable with Queue Mastering to build confidence. There are a number of ways to help them run sync points. There can be some shadowing by a senior member, much like Rashmi did with Matt in the sidebar story. The team can construct some simple templates for running the various meetings. There are some details in Chapter 14, “Cycles and Sync Points,” that can help here. The team can also provide some constructive feedback and encouragement along the way. As the team should be participating in all the sync points, this should be fairly easy to do.

Above all, it should never be forgotten that the team is only successful if everyone is able to pitch in and give their best.

Summary

Like the Service Engineering Lead, the Queue Master is a critical role for helping improve situational awareness across the delivery organization, and everyone in the team should rotate through the role. It ensures that the workflow works well and helps the team learn and improve.
