Chapter 14. Agile Architecture Design

We shall be unable to turn natural advantage to account unless we make use of local guides.

—Sun Tzu

You may recall from Chapter 9, Managing Crises and Escalations, that one of our favorite technology companies is Etsy, the marketplace for handmade or vintage items. While this company has a stellar technology team, with John Allspaw at the helm of the technical operations team, it also experiences site outages occasionally. One such event occurred on July 30, 2012, while employees were performing a database upgrade to support languages that required multi-byte character sets.1 They upgraded one of the production databases and everything went as planned. A database upgrade, especially one that changes the encoding of the data at rest, runs the risk of corrupting or losing data. The team then put together a careful plan for slowly upgrading the remaining servers over a period of time. The upgrade had gone out to only one server, and the upgrades for the other servers were basically placed “on deck.”

1. One byte can represent 256 characters, which is enough for the combined languages of English, French, Italian, German, and Spanish. The character sets of other languages, such as Chinese, Japanese, and Korean, include a larger set of ideographic characters that require two or more bytes to represent such a great number of these complex characters. The term for mixing single-byte characters alongside two-or-more-byte characters is “multi-byte.”
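To make the footnote concrete, here is a minimal sketch of how character width differs between a single-byte and a multi-byte encoding. The source does not say which encodings Etsy's databases used; Latin-1 and UTF-8 are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
# Illustrative only: Latin-1 and UTF-8 are assumed encodings, not Etsy's.

def utf8_width(ch: str) -> int:
    """Return how many bytes UTF-8 needs to store one character."""
    return len(ch.encode("utf-8"))


def fits_in_one_byte(ch: str) -> bool:
    """Return True if the single-byte Latin-1 encoding can store the character."""
    try:
        return len(ch.encode("latin-1")) == 1
    except UnicodeEncodeError:
        # The character has no slot in the 256-entry single-byte table.
        return False


if __name__ == "__main__":
    for ch in ["A", "é", "中", "한"]:
        print(f"{ch!r}: utf-8 bytes={utf8_width(ch)}, "
              f"single-byte ok={fits_in_one_byte(ch)}")
```

Data written under one encoding and read back under another can be misinterpreted, which is one reason an encoding change on a live database carries the corruption risk described above.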

Etsy often has many changes queued up to be introduced to production. The company was having problems with site slowness during the nightly backups and had queued up an improvement to allow for faster backups. To release the backup fixes, Etsy’s engineers were using an automated tool whose job was to make sure all of the servers were consistent. After testing, they pushed the fix to the backups using the tool, and expected to confirm later that night that backups successfully ran without interfering with the site’s performance. What they didn’t know at the time was that deploying the improvement to the backups also deployed the database language upgrade.

Once the team realized that approximately 60% of Etsy’s database servers were upgrading their character sets while actively serving traffic, they quickly brought the site down to protect against corruption. The team that responded to this event was multidisciplinary in its composition. As Allspaw states, “No one team owns response to an incident.” At Etsy, many teams participate in a 24/7 on-call group. There are no problem managers necessary because the teams own the services they produce. In some cases, such as the search team, the multidisciplinary Agile development team acts as first responders. The search team receives the alerts first and escalates the problem to technical operations only when necessary.

Once the team was able to confirm that the databases were correct and behaving normally, they brought the site back up. All the while, they communicated to the community through the EtsyStatus site (http://etsystatus.com/).2 This example shows how Agile teams (5- to 12-person cross-functional teams of product managers, engineers, QA personnel, and DevOps staff) can own the service or product they produce from architectural design all the way through production support.

2. This case study was taken from John Allspaw’s posting “Demystifying Site Outages,” https://blog.etsy.com/news/2012/demystifying-site-outages/.

In the last chapter we discussed two processes, joint architecture design (JAD) and the Architecture Review Board (ARB). The purpose of using these processes was to ensure we had cross-functional teams designing our products (JAD) and multidisciplinary high-level engineers and executives ensuring the consistency of standards (ARB). In this chapter, we will approach the design of features and systems in a different manner. We’ll explain how the new Agile Organization is responsible for the design of its part of the system and how these independent Agile teams are capable of following standards.

Architecture in Agile Organizations

Let’s start by reviewing the Agile Organization that we described in detail in Chapter 3, Designing Organizations. In functionally aligned organizations, individuals are organized by their skill, discipline, or specialty. However, almost every project requires coordination across teams, which means coordination across functions. This is especially true of SaaS offerings, where the responsibility of not only developing and testing the software, but also hosting and supporting it, falls on the company’s technology team. This results in some amount of affective conflict. The amount obviously depends on many factors but we want to minimize all of it if possible. Recall that affective conflict is “bad” conflict centered on roles and ownership and often involves questions of “who” owns something or “how” a task should be done. This type of conflict is destructive and leads to physical and emotional stress on team members. Physically, it can leave us drained as our sympathetic nervous system (the same system involved in the fight-or-flight syndrome kicked off by the hypothalamus) releases the stress hormones cortisol, epinephrine, and norepinephrine. Organizationally, teams may fight over the ownership of products and approaches to problems, leading to closed minds and suboptimal results. Agile Organizations, in contrast, break down the organizational boundaries that functional organizations struggle with and empower the teams, eliminating the problem that matrix organizations face.

Agile Organizations are 5- to 12-person teams made up of personnel with the skill sets necessary to design, develop, deliver, and support a product or service for customers. These teams are cross-functional (multidisciplinary) and self-contained. They are empowered to make their own decisions without seeking approval from people outside their teams. They are capable of handling the full life cycle of their products or services. When teams are aligned by services, are autonomous, and are cross-functionally composed, there is a significant decrease in affective conflict. When team members are aligned by shared goals and no longer need to argue about who is responsible or who should perform certain tasks, the team wins or loses together. Everyone on the team is responsible for ensuring the service provided meets the business goals.

Recall the theory of innovation we presented in Chapter 3, which informed multidisciplinary team construction with the purpose of driving higher levels of innovation. In a SaaS product offering, this increased innovation is often measured in terms of faster time to market with features, better quality of the product, and higher availability. The drivers of this innovation are the decreased level of affective conflict and the increased levels of cognitive conflict, network diversity, and empowerment. The Agile Organization structure provides all of these drivers, and the result is often a significant increase in innovation.

Ownership of Architecture

All of this increase in innovation, greater autonomy, and less affective conflict sounds great. However, along with great power comes great responsibility. These Agile teams own the architecture of their services or products. In other words, instead of relying on a separate architecture team, this team owns the design and implementation of its products. How does this work in practice?

The Agile team is composed of personnel who have all the skill sets necessary for the team to remain autonomous and produce its product. The typical members of the team include a product manager, several software developers, a quality assurance engineer, and a DevOps engineer. If the organization has software architects, these individuals are placed on Agile teams as well. The composition should remind you of the JAD process composition from Chapter 13. If the Agile team is to design and develop highly available and scalable products and services, its members need the same skills that any other group would need to accomplish the task. JAD is a Band-Aid meant to create the multidisciplinary design within functionally oriented teams that is inherent to Agile teams.

We know that experiential, network, and skill set diversity allows individuals to develop better solutions to problems, and architecting a feature, story, solution, bug fix, or product offering is no different. The best solutions come from teams that include representatives from multiple disciplines who can draw on their different viewpoints and experiences to attack the problem. Ask a software engineer to design a door, and he or she immediately begins thinking about how a door works: How do the hinges work? How does the handle turn? Ask a product manager to design a door, and he or she might begin thinking about the benefits the door should achieve, how other competing doors function, and what the last round of user testing revealed. Ask a system administrator to design a door, and he or she will probably start thinking about how to secure it or how to recover from common failures within latches or locks. When these different viewpoints are collected together, they make a stronger, more scalable, and more available product.

Network diversity is a measure of how individuals on a team have different personal or professional networks. This becomes important with regard to innovation because almost all projects run into roadblocks. Teams with diverse networks are better able to identify potential roadblocks or likely problems early in the project because they can seek advice from a wide variety of people outside of the team. When the team actually does encounter roadblocks, those teams with the most diverse networks are better suited to finding resources outside of their teams to get around the obstacle.

Limited Resources

When an Agile team has a diverse skill set, a broad network of contacts, and a wide variety of experiences to draw upon, the team members are more able to design, develop, deploy, and support highly reliable and scalable products. But what happens when the company can’t afford to put an architect on every team, or when there are only two DevOps engineers and six Agile teams? This problem arises not only with cash-constrained startups but also with the flush-with-cash, hyper-growing companies. In the first case, the company cannot afford to hire as many architects, DevOps engineers, and other technical professionals as it would like. In the second case, the company might not be able to attract and retain as many architects or DevOps engineers as quickly and as long as it would like. It is very common to grow teams asymmetrically by hiring software developers first, then product managers, then perhaps a QA engineer, then a DevOps engineer, and finally an architect. This hiring process to fill out the team might take a year to 18 months to accomplish. No company will let the software developers sit idle for this amount of time while waiting for an architect to join their team. But what is a team to do in this situation?

When faced with limited personnel and resources, teams often try to make do with what the organization has at the time. This approach can result in considerable risk for the company. Consider the situation where there are not enough product owners. The outcome may be hastily written user stories and poor prioritization, which can in turn result in costly mistakes made by the Agile teams.

The first step for the Agile teams is to ensure they have the necessary resources to perform their function well. Anytime a key resource is missing, the team is in jeopardy. One way to make the business case for necessary resources is to use the Kanban board to visually display bottlenecks in the development process. Another approach is to share a limited resource, such as DevOps engineers, across teams. These individuals should be assigned to multiple teams but not all teams. Think multitenant but not all-tenant. For example, if you have two DevOps engineers and six Agile teams, assign DevOps engineer 1 to Agile teams A, B, and C. Assign DevOps engineer 2 to Agile teams D, E, and F. This way the teams feel some sense of connectedness with the DevOps engineers assigned to them. They start to think, “DevOps engineer 1 is my go-to person” rather than “We have a pool of two DevOps engineers to whom I make requests by submitting a ticket to a queue.” See the difference? In one scenario we’re breaking down walls; in the other we’re ensuring that they remain up between the teams.
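The "multitenant but not all-tenant" assignment can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and splitting teams into contiguous, roughly equal groups is one simple policy among many (a real assignment might weigh workload, product area, or time zones).

```python
from collections import defaultdict


def assign_shared_engineers(engineers, teams):
    """Give each team exactly one named engineer; each engineer serves a subset of teams."""
    if not engineers:
        raise ValueError("need at least one engineer")
    # Ceiling division: the largest number of teams any one engineer supports.
    per_engineer = -(-len(teams) // len(engineers))
    assignments = defaultdict(list)
    for i, team in enumerate(teams):
        assignments[engineers[i // per_engineer]].append(team)
    return dict(assignments)


def go_to_person(assignments, team):
    """Look up the single named engineer a team should turn to."""
    for engineer, owned in assignments.items():
        if team in owned:
            return engineer
    raise KeyError(team)


if __name__ == "__main__":
    plan = assign_shared_engineers(
        ["DevOps engineer 1", "DevOps engineer 2"],
        ["A", "B", "C", "D", "E", "F"],
    )
    print(plan)
    print(go_to_person(plan, "E"))
```

The point of the lookup is that every team resolves to one person by name, never to an anonymous pool behind a ticket queue.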

Standards

One topic of discussion that invariably comes up with multiple teams is maintaining standards across those teams. If we allow each Agile team to autonomously decide which patterns, libraries, frameworks, or even technologies it will rely upon, how does the company benefit from economies of scale? How will engineers who transfer between teams have any common or shared knowledge? The answer to these questions, as in a lot of cases, is “It depends.” It depends on the organization, the leaders, and the team members. Let’s look at a few different approaches.

Some organizations hold the beliefs that teams should be allowed to decide independently on all things and that the best ideas will permeate across teams. Other organizations take a different approach, bringing members of teams together to make decisions about standards that these members are expected to uphold when they return to their teams. We’ll start by looking at how Spotify, the digital music service, addresses the design of architecture by Agile teams. We’ll then contrast this approach with that of Wooga, a social gaming company that takes a slightly different approach.

We highlighted Spotify’s Agile team structure in Chapter 3, Designing Organizations, and want to return to that organization to take a deeper dive into how it coordinates across teams. If you recall, Spotify organizes around small teams called squads that are similar to a Scrum team (in our vernacular, an Agile team). Squads are designed to feel like a mini-startup, containing all the skills and tools necessary to design, develop, test, and release their services into production. As Kniberg and Ivarsson state in their October 2012 paper “Scaling Agile @ Spotify with Tribes, Squads, Chapters, and Guilds,” “There is a downside to everything, and the potential downside to full autonomy is a loss of economies of scale. The tester in squad A may be wrestling with a problem that the tester in squad B solved last week. If all testers could get together, across squads and tribes, they could share knowledge and create tools for the benefit of all squads.” They continue with the question, “If each squad was fully autonomous and had no communication with other squads, then what is the point of having a company?” For these reasons, Spotify has adopted chapters and guilds.

A chapter is a small group of people who have similar skills. Each chapter meets regularly to discuss its area of expertise and challenges. The chapter lead is a line manager as well as a member of a squad, involved in the day-to-day work. A guild is a more organic and wide-reaching “community of interest”: a group of people who want to share knowledge, tools, code, and practices. Chapters are always local to a tribe (a collection of squads working in a related area), while a guild usually cuts across the entire organization. By utilizing chapters and guilds, Spotify can ensure that architectural standards, development standards, libraries, and even technologies are shared across teams. The chapter and guild leaders facilitate the discussions, experiments, and ultimately the decisions with which all the teams will comply. Chapter and guild members who participate in the process are responsible for bringing the knowledge and decisions back to their teams and ensuring that their colleagues abide by the chapter’s or guild’s decisions. This approach offers a nice balance between an autocratic top-down approach and a democratic bottom-up approach. Even so, it isn’t the only way for small, independent Agile teams to design and architect their services and products within a larger organization.

In Jesper Richter-Reichhelm’s article “Using Independent Teams to Scale a Small Company: A Look at How Games Company Wooga Works”3 posted on The Next Web (http://thenextweb.com/) on September 8, 2013, the author outlines how he approaches fostering independent teams and challenges some of the Spotify ideas. Richter-Reichhelm is the head of engineering at Wooga, a social gaming company founded in 2009. Wooga has grown from around 20 employees in its first year to more than 250 employees in 2013. Richter-Reichhelm states, “In the early days everyone in the company worked closely together and were not slowed down having to wait for approvals. Normally as a company grows, this changes as management layers are added, and work simply becomes less efficient. How did we hold onto that culture? The answer: centering everything around independent game teams.”

3. http://thenextweb.com/entrepreneur/2013/09/08/using-independent-teams-to-scale-a-small-company-a-look-at-how-games-company-wooga-works/.

Similar to Spotify and our model of the Agile team, Richter-Reichhelm created small autonomous, cross-functional teams that were responsible for independent games. These teams write and operate the games themselves, not relying on a centralized technical operations team or a centralized framework. “Engineers are not forced to share or reuse code” is how Richter-Reichhelm describes the level of independence under which each team operates. With regard to input from individuals outside the team, including the company founders, “It’s completely up to them [the team] if they want to listen or ignore outside advice.”

However, to leverage knowledge and gain some economies of scale, the Wooga teams actively share knowledge. They accomplish this through weekly status updates, lightning talks, brown bag talks, and other interactions. This shared knowledge helps the teams not have to relearn the same lessons over and over again. As Richter-Reichhelm describes it, “This way we can try out new things in one game, and when they work, that knowledge is spread to other teams. This works quite organically.” It should be pointed out that teams have shared results such as key performance indicators (KPIs) but these are not used competitively across teams.

The approach at Wooga is very different from the approach at Spotify, where chapter and guild leaders are tasked with ensuring knowledge and standards are shared across teams. So which one is better? Both approaches have their place, and companies need to evaluate which standards need to be in place across their own teams and which standards can be established by each of these teams. For your own organization, it depends on the organization’s culture, maturity, and processes. What are you as a leader comfortable with? Do you have a culture mix that includes individuals experienced enough to act independently in the best interest of the company, or do you need stronger oversight? The answers to these questions will lead you down the path to the best answer for your organization at a particular time. As your organization matures and grows, the approach might need to change as well.

ARB in the Agile Organization

As we mentioned earlier, the Agile team provides the cross-functional design that the JAD process attempts to achieve; thus JAD is not necessary when employees are organized into true autonomous, cross-functional Agile teams. The next logical question is, “Do you need an ARB in an Agile Organization?” The answer, once again, is “It depends.” Many companies with which we consult that have multidisciplinary teams still have an ARB process. The primary benefit of the ARB is that it provides a third-party view of the team and as such is not subject to the groupthink phenomenon that sometimes plagues autonomous teams. However, if we put the ARB within the product life-cycle development process, we erode the autonomy and time-to-market benefits that Agile teams provide. One way to keep the benefits while limiting the costs is to perform the ARB review after the sprint as part of the retrospective. Another option is to convene the ARB daily (perhaps at lunch) for any projects that need to be reviewed. This way, teams are not stopped for any substantial period of time waiting for the ARB to convene.

When using an ARB in an Agile Organization, the primary goal is to ensure the architecture principles agreed to are being followed. This helps ensure consistency across teams. The board’s second goal is to teach the teams’ engineers and architects through interaction. This becomes increasingly important as team sizes grow quickly or upon acquisition of new companies where prior standards exist. Lastly, the board helps evaluate team members individually in terms of how they understand and enforce standards. They also evaluate the teams themselves—evaluating how they come together to create designs to help correct deficiencies.

Conclusion

Agile teams, which are designed to be small and autonomous, can be independently responsible for the architecture and design of their services and products. This autonomy in regard to their designs is critical if the teams are to be as nimble as possible. While the JAD process can be effective at ensuring cross-functional designs are taking place, a cross-functional Agile team ensures this outcome occurs by its very composition.

Sometimes, organizations face the prospect of limited resources, such as a smaller number of DevOps engineers than teams. Our advice is to assign individual DevOps personnel to a number of teams. Ideally, teams should know their go-to person by name and not have to submit tickets to a queue. This helps break down the walls that organizational structure can put up.

One of the questions we often get when discussing Agile teams with clients is how standards can be shared across teams if the teams are completely autonomous. One approach (used by Spotify) involves chapters and guilds that cross Agile teams (squads) to determine and enforce these standards. A different approach (used by Wooga) is for teams to be very independent but actively share knowledge through various forums. Finally, you can consider using an ARB to validate principle adoption and compliance, either by convening it frequently or by using its meetings as a retrospective on recently released designs.

Key Points

• Agile teams should act autonomously, which means they should own the design and architecture of their services and products.

• The ARB and JAD processes ensure a cross-functional design of services. The Agile team, by its constitution or makeup, ensures this outcome as well.

• When resources such as DevOps engineers or architects are limited, assign them to multiple teams as named individuals. Do not revert to tickets and queues fronting a pool of nameless DevOps resources.

• Sharing standards and knowledge across teams to achieve economies of scale can still be accomplished with Agile teams. There are various approaches that can be used in such cases. Choosing the right approach for your organization depends on multiple factors, including your teams’ maturity, your products’ complexity, and your comfort with distributed command and control.
