© Dave Harrison and Knox Lively 2019
Dave Harrison and Knox Lively, Achieving DevOps, https://doi.org/10.1007/978-1-4842-4388-6_1

1. The Abyss Stares Back

Dave Harrison (1) and Knox Lively (2)
(1) Madras, OR, USA
(2) Montclair, NJ, USA

This story isn’t true, but it could be.

The conversations in this book we’ve had in some form many times. The feeling persists at the end of the day, rubbing our weary eyes – we were busy today, crazy busy, and it feels like we got a lot done. So, why do we feel like we’re treading water? There’s all these wonderful things we’d like to accomplish, personally and in our careers – but it feels like they always take the back seat to survival, keeping the wolf away from the door.

At least, that’s the position Ben is in. He’s come from the development ranks, but his last line of code was written a decade ago. Now, his life consists of meetings and e-mails. And more and more, he feels caught up in a gathering catastrophe. The team is starting to fracture with the stress of long hours and unreasonable expectations; irreplaceable people are defecting to other companies. He’s stuck with an IT and Operations team that seems both incompetent and hostile, deliberately sabotaging his team’s releases and botching support, while shifting all the blame elsewhere. And management is starting to tune him out and freeze his requests. It seems like, one way or another, he’ll need to find another job, or perhaps another career.

In short, it’s a mess. He’s stuck and isn’t sure how to find the way out, or even where to begin.

Picking Up the Pieces

George sat back heavily in his chair, which groaned in protest. “That was…. Gawd-awful. Honestly, Ben, I don’t think I can take another meeting like that.”

I grimaced. “Well, you have to admit, it could have gone worse. Our team looks really good by comparison with what our IT partners have been delivering.” I drew out the word partners just enough to give it the right sarcastic flavor. “Or not delivering. I mean, if we can keep up this trajectory, we could get more developers and finally start making real progress.”

I’ve got to admit though – this latest meeting had not gone well. No team had covered themselves with glory in WonderTek’s most recent release, and my business partners have become increasingly frustrated. New features – features that our customer base had been requesting for months now – had to be rolled back; they were simply too buggy to be shown. Worse, unexpected integration issues that arose very late in the testing phase had caused noticeable performance degradation. Customers were now having to submit orders over the phone, while teams worked around the clock to try to get order processing off the ground. Credit card orders were running the risk of expiring.

My last meeting had been with the company CTO and the CEO and several vice presidents I’ve never met before. There wasn’t a smile in the room, except for some sardonic ones when I was laying out my remediation plan. Clearly, people were losing faith, and it was eroding my position.

It seems at times like we are slogging uphill. Since adopting Agile, my team has moved mountains, and we’ve met all our delivery goals. It simply is not our fault that the last release did not go well. That message evidently hasn’t sunk in, however; the clear chasm between my team and the IT/Operations group was baldly exposed during the meeting. I will need to do something to reverse the tide.

Thankfully, I had a good friend to watch my back. I’ve known George for the past 10 years; he’s my wartime consigliere, to borrow a phrase from The Godfather. At the moment, George’s thin frame slumped in the chair, head back and eyes closed; the picture of abject dejection. But I knew this was just a passing thing; George was indomitable. And the man had an uncanny sense for what was going on behind the scenes. I will need every bit of his political savvy and knack for pulling on the right strings.

I pinch my nose and try to clear my head. “George – I don’t get it. Three months ago, all our teams signed off on this company-wide Agile movement. We have the most expensive and experienced consultants available to help us out. We’re churning out more work than we’ve ever done, so our adoption of Agile is on track. Yet our last release has even more bugs and the business is unhappier than ever. I keep thinking I’m missing something.”

George’s head lifts slightly, and I catch a glimmer from his slitted eyes. “It’s true the developers are working together well and we’re producing code at a faster rate. But our testing is in a shambles right now, and operations is dumping bugs on us faster than we can keep up. We are drowning in technical debt, Ben.” He sighed. “It’s true that we may be winning as a unit, but what good does that do us if we are losing as a team?”

He straightens a little, and now his eyes are open a little. “If success means delivering new features as fast as possible to QA, we are successful. But what if that’s not the finish line?” A long pause. I reach over and pat his hand. “George, we’re a shared services org. That means our responsibility is to deliver code, period. After that, it’s just not our responsibility, and eventually people are going to realize that and put the blame where it belongs.”

My old friend is shaking his head. “I’m agreeing with you that Agile is working for us, but from what I’m seeing it’s actually making things worse down the line. We’re burning through our chips, Ben. The people in that room aren’t listening, because they view us as failing – our releases aren’t making it into production – or once they get there we are drowning in post-release cleanup like this.” The chair creaks again, and George stands up. “I’m going to go home, make myself an Old Fashioned, and try to forget about this meeting.” We say our goodbyes and part ways.

That’s the difference between George and me, I think as I head out to my car. Instead of feeling afraid or drained, I was oddly energized by the back-and-forth tug of war in the boardroom. I’d always loved courtroom dramas and enjoyed fighting for my team on the executive level. Still, George had a point; the partnership with Operations had become more strained of late. I resolved to put some feelers out.

A Bad Start to the Day

There was just a little blue softening the black of the early morning sky when I pull into the rain-slicked parking lot at WonderTek. I learned long ago that 90% of my work gets done before 9 a.m., and it’s easily the best part of my day – a chance to reflect a little without distraction.

Usually, that is. I was just stirring around the bubbling coffee grounds in the French press when Douglas pops his head over the cubicle wall. “Ben, can you come into my office for a bit?”

A pang of fear clenches my gut as I follow Douglas back. Douglas is not quite looking me in the eyes, and he is definitely not smiling. We chat uncomfortably for a few minutes about how things were going post release. Then, he begins. “I suppose you probably know why we are having this little chat.”

“Um, not quite.” I swallow a bit. “I realize things did not go well with the last release, but we are showing progress – ”

“Showing progress? Ben, I hired you to make me look good. You told me, don’t worry about the team, I’ve got it under control. From what I hear, this last release was just more of the same.” He looked out the window and slowly let out a puff of air. “For the past three years we have been telling people that things will be different, that we’re working as partners to get our code out the door faster and better.”

My stomach is clenching up, and my throat is dry. This is starting to sound like a pack-your-things discussion, and I’m not prepared. It’s a bad economy, my savings account is in tatters, and it would take easily half a year or more to find another job. “I don’t understand this. We ARE releasing faster, and I have the numbers to prove it. Agile is working for us, Douglas. The team is pulling together as one, we met all of our goals for this quarter early. My God, we just finished rolling out that footwear survey app two weeks ahead of time – and it’s a killer app, Douglas, it uses stuff we’ve never tried before. Really cutting edge, JavaScript-based, the UI is clean and so responsive.”

“Ben.” My manager’s tone is flat, emphatic, and final. “The footwear people were the first ones I heard from yesterday. They aren’t happy, at all. And our relationship with Operations seems to be getting worse as well; they’re calling you The Cowboys.”

“Cowboys!” I snort. “Do you realize that they’ve been sitting on our releases for 6 weeks now prepping environments and there’s no sign of traction…”

Douglas held out a hand; he’s looking at me now, and he’s angry. “Do me a favor and just listen to me. We are paid by the business. They want these features out the door. This footwear app that your team wrote, is it in production?” Another silence. “I already know it isn’t. Footwear tells me that they’ve been waiting on it for two months; they’re going to miss their window for this quarter to get this to customers for their marketing push.”

“Douglas, come on! The app is done, tested and ready to deploy; we are waiting on production environments to get built out. That’s not on me! Operations didn’t handle their procurement right, and they keep shoving us to the bottom of their priority queue. There is no escalation path or even a way to check status with them without walking down and hoping I can catch someone in their cubicle who’s halfway competent.” I roll my eyes. “Dear God, I’ve seen better organization on 16th century whaling ships!”

“Ben, just stop it. Your predecessor talked like that too – it was always the other guys. Well, he didn’t last. The business does not care one good goddamn if the issue was Operations or you. In their eyes, YOU are failing – as a group – to get them what they’ve been asking for.”

This is sounding exactly like what George was saying last night. I have to force myself not to lean forward or clench my fists under this barrage. Douglas continues, “All right. I’d be doing you a disservice if I didn’t say that you are trying your hardest. I wasn’t sure if Agile was going to work for us when you first started on, but now I’m convinced, and I think the business is happier with some things you’re trying. But I’d also be doing you a disservice if I pretended that things are going terrific. They’re not. We are sitting on a powder keg right now, and the perception is that your team is the place where good ideas go to die. You need to turn this around.”

He leans back and pinches the bridge of his nose. “Whatever is going on here, you need to make it right with your partners in Operations and IT. Right or wrong, you won’t be able to move an inch without their cooperation, and they’re pissed right now. And go to your customers and shore things up there as well. Get me a remediation plan, Ben – I want it in writing, and it needs to be soon.”

Back at my desk, I find myself shaking a little; that was a close call. Even the aroma of the coffee can’t dispel the sense of gloom from this morning’s ambush. I need to come up with some workable ideas, fast. When George comes in, I tell him about the morning’s developments. He’s back to his old unflappable self; his face never registers even a flicker or a trace of surprise. Instead, he’s philosophical. He muses, “I don’t think Douglas wants to get rid of us. Actually, I think he’s trying to protect us. But like I said yesterday, we’re burning out our support.”

I’m still a little flummoxed. “I can show that we’ve made progress. We sold the business on Agile a year ago, and it’s worked. The team loves the retrospectives and the planning sessions; we’re pulling together well and we’re being transparent to our customers. But there’s problems down the line. We’re releasing to QA every two weeks, right on schedule – but those releases just sit there because Operations can’t build out environments fast enough. Our test cycle is completely manual; we’ve thrown people at it and added an offshore team in Bangalore to push work through faster but it’s still a 4-day turnaround. Will somebody please explain to me why blame keeps falling on our shoulders when it’s obvious that our turnaround time is not the problem?”

George smiles. “Last night I did a little research and found out something strange. Do you know what our turnaround time was before we started doing Agile?” I shrug, and he continues: “It took about 2 months to get a Hello World type web app out the door and in production, remember? Well, guess how long that takes now, nine months later? … Ben, we haven’t shifted the needle at all. It still takes 2 months.”

“That’s impossible! It would only take our team a few days to crank that website out and get it in test.”

“Yes, well, that’s the problem, isn’t it? The finish line for our team is getting it out the door to QA. But, in the eyes of the business, at that point we’re just starting – the real finish line is working as designed in production. And our pain points with production deployments and building environments haven’t changed at all. We’re still stuck in quicksand.”

I smile grimly. “Urggh. A nice little epitaph for my career here at WonderTek. George, I’m in my mid-forties, and I don’t want to have to hit the job market as damaged goods like this. It feels very much like we’ve got a sword over our heads.” I drum my fingers on the desk. “That’s really true? Our turnaround time hasn’t budged?” George nods. “So, that’s a number we can start with, and it might explain why the team seems to be moving faster but our success isn’t registering. Let’s put together some kind of remediation plan to convince people we’ve got a handle on things. I’ll need to spend some time talking to my partners and see if we can salvage things with them.”

Operations Piles On

For whatever reason, my latest craze is root beer floats. Lately, and this is something I’m not very proud of, I’ve been starting the day with one. The mixture of the creamy smooth vanilla ice cream and the fizzy root beer hits my taste buds just right. This morning, I spooned some into my Yeti cup; I found the insulated stainless steel makes the ice cream form a little crunchy crust that is better than any morning coffee. I need to hit the ground running this morning; hopefully, the sugar high will carry me through.

I begin with the most unpleasant task, meeting with Operations. At WonderTek, this group seems to always be complaining about lack of resources. In my eyes, they had a nasty habit of sitting on work to build a case for more people. It was an old game that unfortunately seemed to be working; the department had tripled in size over the past 2 years. Emily, who ran the team, was tough and no-nonsense; she held her team accountable to a high standard but was also fiercely protective and was an implacable adversary when things went wrong. Despite our differences, I respect her, and I’ve worked hard to build up a good personal rapport with her. If fences needed mending, I was starting at the right place.

But perhaps not this morning. As soon as I walk into her office, she swivels her chair around and gives me a look of pure, white fury. “Is this about last weekend? Ben, I am not – repeat – NOT happy with your team.”

“I just got back from my boss, Emily. He tells me you’ve been complaining about us. He seems to be under the impression it was our guys that dropped the ball. Emily, I thought we were partners. I really don’t appreciate being stabbed in the back like this.” My jaw sets. “I thought we were past this.”

Emily laughs bitterly. “On a personal level, we’re fine. I think some of the things your team has done over the past year have been really positive. But this last weekend…” She shakes her head. “I spent most of Sunday, when I was hoping to go wine tasting with my husband, having to talk Kevin off a ledge. He wanted to quit, Ben.”

“Kevin? Good God, what are you talking about? My team told me they’ve been waiting on him to create a VM for the past four weeks. That’s a right-mouse click, Emily. And he was a complete dead weight when it came time for our deployment. We gave you guys documentation, we walked you through what needed to happen, and when we needed Operations to partner up with us, they left us holding the bag. We got those environments late, Emily, and we had to spend most of the weekend getting them functional.”

Emily’s smile widens into a smirk. “Those jokers on your team are selling you a line. Did they tell you when they came by with those setup instructions?” She jabs her index finger on the desk. “Wednesday. Freaking Wednesday, Ben. The rollout was on Friday night! And look at this crap!” She pulls a thick sheaf of papers from a folder on her desk and slaps it down triumphantly in front of me, like a prosecutor with a particularly incriminating piece of evidence. “I’m looking at five pages of itemized directions here. And it’s garbage. Look at this one – ‘5. Set up SQL Server’. What does that mean, exactly? What version, what size, what’s the backup/restore strategy – none of that’s here, we’re left to guess. Ben, you put my guys under the bullseye. This just can’t keep happening.”

I hadn’t been expecting this and find myself stammering a little as I leaf through the handover documentation. “You know, I looked through those directions myself when we were planning the rollout. To me, it looked complete. We even put up a wiki…”

“A wiki! I’d settle for a decent head start. It’s time you faced facts. Since you guys made this move to Agile, you’ve been pushing out releases every two weeks. And each of them requires a touchpoint with my team. You say that we’re partners, but you’ve never asked us if we’re ready to handle releases every two weeks versus every 6 months. And we’re just not, Ben. We don’t have the people – don’t you dare laugh, you don’t work over here and you don’t know what it’s like supporting something in production. You talk partnership, but you’re not walking the walk, plain and simple. A real partner would never drop something like this on us with no warning.”

I stand up. The whole point was to salvage our relationship; if I stay much longer, I’ll start throwing some facts of my own in her face. Remember what your Dad used to say: once you say those words, those angry words, you can’t unsay them. “Emily, I didn’t know that you guys felt this way. You know we can’t move anything without your help, and it seems like we are missing something as a team. I need to think this over.”

“You know me, Ben, I’ll always tell it to you straight. We’re really sick and tired of being downstream of you guys. I can’t lose Kevin, he’s my best person and frankly he’s done working with your team.” She started to put her headphones on. “I’ve got a meeting, and we need to do a postmortem of this last little misadventure. Let’s talk later in the week, OK?”

In Debt up to Our Eyeballs

Things got no better when I walk across the street to talk to the Footwear people. My main partner there, Tabrez, sees me as I come through the lobby doors. He just shakes his head slightly and turns back to his meeting; I will have to try again later.

Back at my office, I get caught up on e-mail. There were a few from my QA lead Rajesh, complaining about being behind on their test cycles as a result of the last push. That was par for the course; it seems like there are never enough resources or time for the QA team. More seriously, my ace BSA, Elaine, left me a short note: You need to see me, today. I sigh and say a silent prayer for some energy.

The daily standup gives me just the nice jolt I need. The last sprint ended on Friday – this was Day 1 of a new effort, and the team is visibly excited to be getting to work on some new features. I listen and make a mental note of some blockers looming up ahead. For the first time in days, I can feel my shoulders unclench. God help me, I love these guys. We’ve come so far.

A year ago, Agile was just an unproven theory. My team was one of the first to adopt it at WonderTek. Now, of course, I pretend that it was all part of my inscrutable Master Plan, but I remember damn well what led to us adopting Agile: sheer desperation. This was my first taste of management, and for months I had felt completely disconnected from what the team was actually doing. This left me vulnerable to credibility-burning surprises. Agile was a gamble that paid off handsomely for me; I knew exactly what the team was committing to and how we were doing. It also helped me gate work; in the past, people would swing by to “check on how things were going” and try to get their favorite developers to prioritize their latest emergency project. Best of all, it gave me something a manager can’t have enough of – insulation. If anything was stopping our progress, or if it looked like we were going to fall short, I knew well ahead of time and could get ahead of things.

What made the biggest impact was the smallest of things: a few hundred dollars sunk into widescreen monitors. Over a few weekends, I’d read some books on Toyota’s Lean movement, including the widespread use of “information radiators” to show progress in common areas. The following week, I had set up a few monitors facing the hallway displaying all the key metrics for the team: a nice burndown chart for the sprint, our progress against assigned tasks, and the improvement in velocity made as a group over the past year. This helped the team by keeping their commitments top of mind, but the real payoff came with other groups. People kept dropping by and asking me questions – “What’s a PBI? Why do you guys meet every day – isn’t that inefficient? Why do you estimate in points instead of hours?”

It was an odd and very human thing. If I had paraded around scrum and Agile concepts, it would have fallen flat as an attempt to make other teams look bad in comparison. Just displaying the work we were doing without pushing caused this kind of snowball effect, where it gathered momentum on its own. Other teams started working in sprints, quietly and without fanfare. They all did it a little differently, but now there wasn’t an engineering team at WonderTek that wasn’t producing work in short bursts instead of year-long milestones.

I remember very well though being terrified about exposure. WonderTek can be a very political place. At times, my job seems like a game of Donkey Kong with flaming barrels coming at me from all sides. Everything is based on a flawless reputation for competence; a certain level of paranoia can be a healthy survival trait here. Given this operating climate, sending out those first few retrospectives – including our mistakes and where we didn’t complete work – gave me pause. Mistakes would be seized upon and magnified by other team leads; I worried about losing face with our business partners.

Astonishingly, no hammers or flaming barrels descended from above. In fact, being up-front about where the team was falling short weirdly seemed to increase my credibility. And the team really enjoyed it; several had told me that complete, honest, hold-nothing-back retrospectives were the best side effect of implementing Agile. During sprint planning sessions, it’s common to see points being made off a retrospective done months earlier; it had saved us several times from going down the rabbit hole and repeating mistakes.

There were rough points, of course. Agile wasn’t free; it required a time investment. Meeting together every 2 weeks for a retrospective session, including a group demo and show-and-tell, took up almost 4 hours; combined with the sprint planning meeting, which took another half day, we were losing too much time keeping the wheels turning. But the business seemed to respond well to the bargain that I put to them during that first trial period with Agile – if you give us isolation, we will give you transparency and accountability. Business partners stopped dropping by “just checking up on things,” for example, once we gave them a public dashboard to check on their deliverables and explained that work would be gated and planned sprint by sprint.

I still don’t like all the time lost with the retrospective and planning sessions. It’s unavoidable, and in the end I’ve decided it’s a tax well worth paying. Just being able to plan and commit to a small set of tasks had a transformative effect on what had been a very chaotic and unhappy group. Suddenly, we were in the driver’s seat, instead of being a victim of events beyond our control.

This didn’t mean that life was bliss. There’s a tapping on my office door – Alex, my lead developer, pokes his head in. “Hey, can we talk a second?”

Trouble ahead; Alex didn’t do this often. “I think I’ve got a few minutes before the next beat-down. What’s up?”

“Listen, we just finished this big push. We said a few sprints ago that we would buy out some cycles to start paying down our technical debt. Our bug list is through the roof, but that’s not the worst of it – our code quality continues to degrade. To get the last few features out, we had to take some shortcuts. I don’t even want to show you some of the hardcoding and crappy spaghetti code in the last release, or the new libraries we threw in there completely untested. My suspicion is that’s the real cause of some of the performance issues we’ve been seeing in production. Most of my guys are heads down knocking down bugs, and we likely will be doing that for a few more sprints. We can throw our commitments for this sprint totally out the window – our main priority is keeping our heads above water.”

“Well, that’s obviously not good.” I sigh; this conversation had a déjà vu quality to it that grated. Every sprint I come in thinking we finally have a clean slate to really start knocking down new work and ramping up velocity. And it seems like on day 2 of every sprint, half my firepower bleeds away handling support and post-release fixes. As unpleasant as this news was, it really wasn’t unexpected. “Obviously I’ve got some more damage control to do then with our stakeholders because we’re not going to be able to deliver what we promised.”

Alex grimaces. “I know you’re focused on new work, and I get it – but at some point, we are going to have to say no. We threw architecture almost completely out the window months ago trying to make our dates, and look where that’s got us. Our codebase is just a big ball of mud – half the time when there’s a problem, it takes us days just to figure out where the problem could be because our releases are so gigantic. And then when we try to roll out a fix, we get all these wonky regression errors. No one really understands how things work end to end because it’s held together with baling wire and sealing wax – yes, me too unfortunately – and God help us if one of my people gets sick. Padma was out two days ago; no one else on the team even knew where her work was, let alone how to fix it.”

Now, I’m starting to get a little pissed off. This is nothing more than a process problem, and I’ve told Alex before to set a higher standard when it came to documentation. “Alex, we’ve been over this before. You need to have better documentation, even a wiki or a SharePoint site for Pete’s sake. Your guys should be totally interchangeable – if we’re doing what we should be and leaving a better documentation trail, it shouldn’t matter at all if someone’s sick. We’re still thinking like a bunch of skilled individual craftsmen, not a team.”

“Ok. Just something to think about, all right?” Alex gets up and says, “Listen, we committed to Agile, and it’s working. But our technical debt is rising, and at some point, the bill is going to come due.”

I pat him on the shoulder on the way out and then look at my phone. I’ll be late for the release recap meeting. I hate coming in late to any meeting, as it puts me on my back foot. This one especially looks like a bruiser, and now I was at a serious disadvantage.

Release Retrospective

Sure enough, there was a set of unfriendly stares as I come into the meeting room. There were about 20 people in the room, about a dozen more than I like for a productive working session. Of course, that wasn’t the point today – this was more like a show trial. Emily has a phalanx of IT peeps on her side of the table, whose main purpose seemed to be nodding in agreement whenever she drops a bombshell.

She launches a frag grenade now. “Thanks for joining us Ben – we were just talking with your team about what happened this last launch. It seems like every time there’s a new version of this software from you guys, we end up having to work round the clock getting it to work. And then there’s dealing with all the support calls afterwards because it’s broken. This last one was particularly bad – we’re still adding them up, but there’s over a dozen major issues we’ve identified so far in production, and about a hundred minor UI bugs. Ivan tells me that it’s going to take his team days just to triage and prioritize the issues. Don’t you guys test your code before you send it out the door?”

I fight the urge to roll my eyes at this expression of sympathy for poor Ivan having to actually lift a finger. Ivan was a particularly vocal and nasty thorn in my side. He was in charge of 24x7 support and triage here. So far, all his ideas seem to focus on shifting responsibilities elsewhere. His latest “process improvement” had consisted of buying an expensive trouble ticket software system, separate from the one the development team used. Besides the ongoing drain caused by trying to reconcile two different ticket queues, any bugs called in by customers were now passed along untouched to hit my team directly. Ivan views this as efficient; I view it as a naked attempt to shift responsibility and burden my team with production support.

It frustrates me, because easily two thirds of the bugs we face are non–code related – networking issues, trouble with authentication, or good old-fashioned user error. Now, my team had to sort through the bugs themselves and weed out environmental or dead-end problems – a big reason why we are tied down every sprint with unplanned support work. Most of the organization seemed to think that Ivan was the Operations version of Jack Welch, but I think he’s a buzzword-quoting career asshole.

Rajesh, on the far side of the table, had his hackles up at Emily’s attack on his team’s quality. “Emily, with all due respect, you don’t know what we’ve put into our testing layer. Just with this last release, we had a team of 18 people in Bangalore running smoke-testing against the UI layer round the clock. We’ve invested heavily in Selenium and our integration test coverage is rising every sprint. The issues with this last release have nothing to do with lack of testing – the dev teams changed the UI significantly, which meant we had to rewrite our functional test layer. Everything was broken – it was a miracle we were able to get our test coverage up to what we did, since we had to start from scratch a month ago.”

Elaine, my ace business analyst, is sitting two seats over from Rajesh. She says, “Not only that, but there were the same old problems with what was in the release itself. Our business stakeholders looked at the features we were releasing and there were several that just didn’t make the cut from a quality standpoint, and a few had to be rolled back completely because the developers either misunderstood the requirements or the feature itself just wasn’t needed anymore. Honestly, some of these requests have been sitting waiting on us for six months or more – I’m surprised any of them had any value at all by the time we got to them.” She gives me a glance and says, “Rolling back those bad features seemed to take up a lot of time that could have been spent knocking down some of the bugs our users have been reporting from the last release.”

The lone developer at the table, Alex, had been checking his phone, but now his head popped up. “Yes, about that – that code had been in a feature branch that was at least two months old. Keeping that code stable and up to date is killing us, it’s going to take me a few sprints just to integrate this with main and nail down issues with conflicting libraries and get the code merged. It didn’t help that last Thursday Elaine came over and told me to get rid of the new order submission screens. That late in the game – it’s like if a customer comes in asking for a cup of coffee with cream, and then after I make it just the way they wanted they change their mind and ask me to get rid of the cream. Of course, I can do that, but it’s going to take me a lot of work. I wish our business people would understand the impacts they’re causing when they change their minds – it’s directly leading to these stability problems.”

“This is exactly why we need to pay more attention to the basics,” Emily said, frowning. “We’ve been saying for months that our current pace isn’t sustainable. Our releases are breaking, they’re not just high-risk – it seems like every one is a guaranteed fail. One year ago, we started up CAB meetings and for whatever reason they dwindled away due to lack of support. I think we should start them up again – go over defects, run postmortems on more than an ad hoc basis. These breakages are making everyone look bad, not just my team. It’s high time we begin instituting a little discipline in these processes. In my last job at Phoenix Insurance, we had a great process – gated releases based on CAB approval, zero defect meetings where all the developers could go over root causes. There just wasn’t this Wild West type bad behavior we’re seeing with runaway releases...”

Enough – time to start focusing on some kind of outcome instead of this finger-pointing. I’ve learned a few tricks from a round of marriage counseling that Julie and I went through, and one of them was active listening. I stand up and walk over to the whiteboard and draw a table with three columns:

[Figure: the three-column table drawn on the whiteboard]

I say, “OK, it’s obvious the wheels came off with this last release. We’ve committed to Agile as a company, so let’s handle this like we would any other retrospective – which starts with listening. What I’m seeing here is a few things that are not working.”

I add the following to the middle column:
  • Branch integration issues

  • Integration code coverage dropping

  • Business changing its mind about features to deploy

  • Too many bugs making it out the door to production (strain on Operations)

  • Late or nonexistent communication with IT on infrastructure needs

  • Long list of features that are aging

  • Crappy release documentation and rollout instructions

  • Technical debt rising

I step back and look at the board. I don’t necessarily agree with several of the points or their impact, but I figure throwing in the comments about the strain on IT and Operations might help soothe some tempers. I ask, “That last one is something Alex brought to my attention just a few minutes ago – it seems like the quality of our application code is showing some strain as well. Does that cover things?”

Elaine says, “In my view, we need to start training your team more in understanding and eliciting requirements. We only have two business analysts on the team, Ben – a lot of teams have a 1:1 ratio of developers to BSAs, and they don’t see this kind of friction. If you aren’t going to get me help with people, I am going to need your developers to start putting more work into understanding what needs to be done. It can’t be just me.”

Emily is still folding her arms – the permafrost on her side of the table shows no sign of thawing. “Put under the ‘What We’ll Do Differently’ column the weekly CAB and zero-defect meetings. Attendance should be mandatory. I’m tired of just seeing people from my team in the room.”

“Fair enough.” The dry erase marker squeaks as I add it to the list. “Before we go any further, let’s talk about what is working. I think we can agree that we’ve got some serious quality issues here. Was there anything that went well for us?”

Complete silence around the table. Even Alex shrugs; he is back on his phone, checking e-mail. Emily, surprisingly, is the first to speak. “You know, I thought putting Kevin in touch directly with Alex Friday night – when it looked like we’d have to roll the entire thing back – really helped things immensely. Once we actually had some back-and-forth going, we were able to find the mismatch in the server configuration that had caused all those intermittent connection issues. We barely made the cutoff to avoid a rollback, which would have really been a nightmare – but at least we made it.”

I nod. Time for an olive branch. “I really appreciated having your team available and working with us in the war room, Emily. Kevin put in some heroic work.” I sigh and sit down. “OK, let’s put cross-team collaboration as a win on the left side, and leave it at that. Emily, on the CAB meetings, let me attend to represent our team, so you’ll have someone to work with on the dev side of things.”

We’re out of time for the meeting, which is good – I’d purposely capped it at 30 minutes. Meetings, like all gases, tend to expand to fill the space allocated to them. Emily closes her laptop and stands up. “I look forward to seeing a remediation plan. One thing I think we all agree on here – this can’t just keep on happening. It’s expensive and stressful.”

I nod my head grimly as the meeting breaks up. The feeling I’ve had since early this morning of a cloud hanging over me is stronger than ever. I’m bone tired and still have a long day stretching ahead working on postrelease cleanup. As I leave the room, I look back at the whiteboard. The empty column on the right mocks me. Without a clear problem definition, it’s hard to even think about a long-term solution.

What Isn’t Working

At 3 p.m. every Tuesday, I sit down with George for half an hour to go over how things are going with our Agile improvements. Of late, these meetings have lasted barely a few minutes as we ironed out some rough spots; given the events of the past week, I have a feeling this one might stretch a little longer.

George begins by drawing up some of the troubles we’d pinpointed during the release recap meeting, then stops. “How are you doing, Ben? You seem tired today.”

I lift my eyebrows. “Yeah, guess I am – for some reason.” We both laugh. “I keep coming back to that talk we had a few days ago and I’m frustrated. Maybe we’ve plateaued, and maybe our partners are dropping the ball, I don’t know. But this latest crapfest is really undermining my credibility, and George – that’s all I’ve got! Agile has really helped the team, but I can only control this much” – I hold my hands close together – “not the rest of the company. I don’t control project management, or IT, or Operations. Even our security and architecture people are siloed off.”

We’re both silent a few seconds, then I continue: “I don’t think Douglas is going to fire us. But it’s clear to me that some of the promises we’ve been making about higher quality with more frequent releases we just can’t keep. My gut tells me shifting to longer releases is a mistake, but if we can’t get IT to move at the speed we do, we may have to go back to a quarterly release cycle. Development is very much a balance between safety and speed, and right now our Agile velocity is hurting our safety.”

“So, it seems like now is a good time to start talking about that next leap forward.”

“Ah, you mean DevOps. The latest shiny object.” I shake my head. “We’re not Netflix or Amazon, George. I don’t have a billion dollars, or hundreds of expensive, highly competent engineers. I’ve got a few dozen people available to me, that’s it – and we’ve got a couple dozen mission critical apps to support. And you know well what a nightmare those apps are. We inherited a bunch of garbage kludged together by people who are no longer here and didn’t believe in writing things down. So most of the time, we’re afraid to touch them – they’re just too brittle, too badly put together.”

“And like I said – I only speak for part of the company. If it’s related to development and writing code, or testing, you know my people – we can try it and see if we get results. But anything other than that is above my pay grade.” I look at George, and my mouth tightens. “You can’t ask a supertanker to perform like a sailboat, George. Our turning radius just isn’t that small, and we’re having problems just getting this jump to Agile to work for us. The perception, as you saw in that meeting, is that we’re making things worse. Upper management sees us as being a cost center – WonderTek has always been and always will be a sportswear company, first, last, and always.”

George is relentless. “I think that attitude about ‘we’re just a clothing company’ is changing, Ben. I was talking to a salesman the other day. Did you know that our sales people are still driving around with vans of samples to show our retailers their spring displays? But our competition is driving around with tablets – they can actually show the store manager how their displays will look, with a view they can rotate and play with, of their actual store. Guess who is going to be selling more clothing next quarter to that store? Under Armour, North Face, and Marmot are all using technology as a differentiator – not just a cost of doing business. I’m seeing signs that we’re moving in that direction too.”

That’s interesting – I hadn’t heard that. “That’s definitely something to think about down the road. Let’s get back to what is in our control. We have this list of trouble areas; Douglas wants a remediation plan. What’s something we can promise in the short to medium term that can give people a better level of confidence in us?”

George is smiling at me. I know him well; he’s going to hang onto this DevOps thing like a bulldog. “Well, looking at this list, we’re seeing some issues like brittle test code, long wait times to get features out the door, integration hangovers, and quality issues. Oddly enough, these are all exactly the kind of problems DevOps was meant to address. We can start by focusing on the main pain point that’s hurting everyone, and – without even using that forbidden word ‘DevOps’ – begin that supertanker turn you talked about.”

He goes back to the board and crosses out the right column, “Things We Will Do Differently,” and puts in all capitals the word “QUALITY.” Then, he puts the following bullet points in the table:

What Didn’t Work

  • Branch integration issues

  • Code coverage dropping

  • Business changing its mind

  • Too many bugs

  • Bad communication prerelease with IT

  • Long list of aging features

  • Rising technical debt

QUALITY

  • Code review

  • Continuous integration

  • Continuous delivery

“Hang on a second, George. We do code reviews already.”

“We do – infrequently and very, very briefly as part of a pre-release show and tell. That’s great for public speaking practice, but it’s not really helping improve the overall quality of your code, is it? I was listening to a talk by Randy Shoup, who used to work for Google – he said that, if he could go back, that’d be one practice he’d really refactor – he’d have code reviews be more frequent. Like, daily.”

“George, you know our situation – the business is waiting on these features. They won’t stand for a lot of time lost with handholding.”

“Do you think the business is happy now?”

That, I have to admit, is another good point. George continues, “What I’m asking the team is to think about quality, first and foremost. It’s obvious that the business is not happy with the way features are being delivered right now. If we want a different result, we are going to have to think about this differently, more holistically.

“Where we are trying to get as a team is to get fast feedback and smaller release bits. The two work together – this last release represented three months of work, mushed into one feature branch, and it was delivered late. It was a tidal wave of crap – we’ll be picking up the wreckage from this for weeks. However, if we move the pain forward, and if we do more frequent releases with smaller amounts of changes in each one, our risk level will actually go down. We want small, frequent waves, not huge, catastrophic ones. Faster is safer, Ben. I think our goal should be to get to daily releases, and in six months if at all possible.”

My jaw drops a little. George might as well be talking about going to Mars. “Daily?! We’re having a hard time running out releases quarterly at the moment. It’s going to be hard to sell this, George. If the perception out there is that we’re a bunch of developer cowboys, this will look as if we’re trying to wriggle out of our commitments.”

George stares hard at me, that bulldog gleam in his eye. “Here’s what I don’t understand. Why are you asking permission from others to do your job? You’ve said to me several times, ‘the business has every right to tell me what to do, but not how to do it – that’s my job.’ So why are you asking for an OK from them now? Everything I heard in that meeting today had to do with quality – improving that is part of our team’s core function as professionals. For the next three months, everything we do needs to center around that one focus point, and there’s no one in the company that can tell you differently.”

I lay my hands flat on the desk. “I haven’t lost my nerve, if that’s what you’re asking. But I know this company and what’s possible. What you’re asking for – we simply don’t have the maturity level or the tools to make this happen. Now, I need to put together some kind of realistic plan. Get me something more realistic and we can talk about DevOps once we get our own house in order.”

The Brain Trust

Every few weeks, I like to get together with a few close friends early in the morning and shoot the breeze. I call them the “Brain Trust” – usually it’s just George, Elaine, Alex, and myself. Sometimes, we talk about work and the problems we’re facing; other times, we get caught up on the weekend adventures or how our families are doing.

This early in the morning, the only other people here are the earliest of the kitchen crew, arriving for lunch prep. Every table in the large cafeteria space is empty. The clink of dishes and silverware is oddly comforting; it reminds me of my college days. Most of my fellow students loved hanging out at the library. My favorite study area was the local McMenamins brewpub, where I’d study late into the night. I loved the hum and clatter in the background; it makes for the kind of relaxed, informal vibe that gets good ideas flowing.

Today, though, I invited Rajesh along and bought him a coffee. Rajesh is a gentle soul and has one of those calm, sensible voices that is easily drowned out. I’ve run into people like Rajesh in every place I’ve ever worked. They tend to be silent heroes, not caring overmuch about credit, but in a time of crisis they always seem to be in the right place with a thoughtful, perfect solution. Perhaps I need to invite him along more often. I like this group small and trim, but if quality really is our sticking point, I need to have Rajesh here and contributing.

George dumps three creamers into his coffee and stirs it contentedly. He seems to be able to eat whatever he wants. I feel a pang of jealousy – thin as he is, George never seems to struggle with his weight the way I do. Ah well. “Business today, guys, so sorry – we’re playing cleanup from the release last week. George, why don’t you show us the latest?”

George recaps the to-do items we’ve uncovered, in a now-familiar litany: branch integration issues, rising technical debt, and the abysmal communications ahead of the release. “Each of these has cost us time, but when we look at the overall picture, the consistent issue that seems to crop up is quality. We’re spending weeks testing our code but the features we’re delivering are still making it out the door with some pretty significant defects. Do you think that sounds about right?”

I can always count on Elaine to be our conscience and connect us with how the business thinks. She says, “The biggest issue isn’t even on this list. My father used to tell me – it takes a lifetime to build up credibility, and 30 seconds to lose it. We are losing credibility with the business by the day – our reputation couldn’t be worse right now. They’re calling us ‘The Black Hole’. Until we start improving our turnaround time on requests and making some headway on these requirements that are piling up on us, that rep is only going to get worse.”

Rajesh groans. “Elaine, we just finished talking about technical debt. We really need to buckle down and pay off that debt before we can expect to move forward. That means getting everyone on the team to commit to quality first. We need to take 3 or 4 sprints and focus on getting our integration testing caught up. This is do or die for us – unless we get our test coverage up, we’re never going to be able to produce work at any kind of decent pace. We’re just going to keep on churning out defects and hoping for the best.”

I straighten up in my chair; a four-sprint timeout – and it’ll be at least six by the time all is said and done – is one idea I hadn’t seen coming. Elaine counters, “Rajesh, it’s really naïve to think that we can take out 2 months of our time and sit on our hands working on buffing up testing when other requests are dying on the vine. The business simply won’t stand for it.”

George has been munching on a cream cheese danish, and now he brushes some crumbs off his chest. “It seems to me like we need to find a better balance between velocity and safety. Right now, Rajesh is saying that our safety is dropping, and our test coverage numbers show that. However, if we focus on testing, our velocity goes to zero. That can’t happen.”

I don’t know what the answer is, and it’s making me panicky. I do know, instinctively, that Elaine is right; we can’t take ourselves offline for a quarter. They’ll dismantle us. Elaine is telling us that the slow rate of progress we’re making is alienating the people I most need on our side. Rajesh is saying that we aren’t eating our vegetables and that we’re paying for it. We have to go faster to survive; but the person I most trust on the team is insisting that we need to put the brakes on. It’s an impossible situation.

George is smiling. He says, “I totally understand why we all feel a little overwhelmed right now. When I look at all the things we need to do to get better at once, it looks gigantic. There’s a lot that’s outside our control right now – which means stress. But last week we isolated a few things that are totally within our control to implement that we can do today. We need to implement better peer reviews, like on a daily basis, and look at improving our delivery rate so we have small, testable batches going out the door.”

Rajesh looks pained. “Sorry to say this, George, but increasing our release rate right now is the last thing we need. The code that the developers are writing just isn’t testable. As for full regression testing with our offshore team, every time we do a release – that’s three weeks. We can’t speed that up until we rethink our testing approach, and maybe change our framework. Implementing continuous delivery – I mean, that’s just not realistic given our situation. We don’t have a small army of people to throw at this thing, and it’s just a pipe dream to even think about continuous delivery as long as what we produce takes so long to smoke-test.”

The lunch crowd is starting to filter in; we’ve run out of time. I sigh, “OK, so let’s go through the things George mentioned. Thumbs up or down, your honest opinion.” I read off the items George had written on the board again. First is CI/CD, and to my surprise, it doesn’t get a single vote except George’s. Rajesh’s proposal to refactor our test framework in a lengthy reset gets shot down almost immediately. That leaves two items – daily code reviews and peer approval. Both get almost unanimous assent; these are, the Brain Trust agrees, low-risk items that could help improve quality without taking attention away from work that needs to be done.

I can tell Rajesh is still frustrated. I grab him by the shoulder before we leave the cafeteria and say – “Rajesh, I know this seems like a step back…”

“It is a step back, Ben. You’re signaling by your actions that quality just doesn’t matter. The developers will continue to follow your lead and throw stuff over the fence.” His mouth tightens a little. “I told you before, you have to go slow to go fast.”

“Yes, I remember. Look – no one at that table thinks that testing comes second place. I think what you’re hearing is a lot of fear and uncertainty. No one wants to be the one to tell the business that we need to stop work for several months to get things right. I definitely can’t sell that to Douglas, and I would hate to hear what our business partners would say.” I smile. “I’m really glad you came today, and your thoughts were dead on. Let’s put some thought into it. Take two weeks – and come to me with a workable proposal. A proposal on specifically how to overhaul our testing so we won’t continually have QA lagging behind. Pretend like you have no constraints, no sacred cows, and cost and resources aren’t a problem. If it’s workable, I promise you, I’ll do whatever I can to implement it.”

Rajesh has stopped seething. “So, in two weeks – you’re going to listen to what I have to say. And you’ll show it to the team?”

“I promise, Rajesh. Don’t give up on me yet, I won’t give up on you.”

Bleeding Out

Just as I begin navigating to my comfy office chair, my phone buzzes. It’s a text from Laure in HR: Call me. I have some bad news about your team.

Ten minutes later, my ears still ringing from the call, I stop by Erik’s cubicle and tap him on the shoulder. He looks up guiltily, then follows me back to the cafeteria. Erik was one of my prize hires. He had been hired 2 years ago after an exhaustive national search and had quickly proven his worth; he was a top performer on the team and had cutting-edge coding skills. Laure’s news that Erik was leaving was a real blow; even worse, Erik had apparently decided to burn down every bridge he could on his way out the door.

“Erik, I hear you’ve handed in your notice. Why didn’t you tell me first?”

“Ah, you know how it is man – it was late, you’d already gone home. I didn’t want to hurt your feelings. This offer is just too good to pass up.”

“Where are you planning to go?” Erik’s eyes drop, but after a slight pause he shrugs his shoulders and says, “Actually, I’m taking on a dev lead position at Netflix. They’re going to let me work remote, and I might get some more time with my son.”

Erik’s son has autism – I had arranged for him to have 2 days a week working from home with that in mind. I say as calmly as possible, “I understand you leaving – the gold watch days are long behind us. But the way you’re leaving – Laure tells me you gave us zeroes across the board. You said we had weak leadership, that we were stuck in the mud, that we waste most of our time with busy work. You even recommended that we get folded into other groups. Erik, this reflects directly on me. I can’t understand why you didn’t come to me first.”

Erik sighs and fidgets in his chair. “Look, Ben, on a personal level – you and I are fine, I like you. But honestly, can you really claim that we are cutting edge? I mean, look at this last release. I see us churning this garbage out, and when I try to point out ways of improving our apps so that they’re not constantly breaking, I just get a bunch of head-nods. But nothing really changes. I tried to put it as best as I could with Laure, but bottom line – I’m leaving because Netflix is a 1% company. They get it, and they treat development as a core line of business, a vital part of how they do work. We are retail, Ben. It’s all about dropping costs as low as possible, and they treat us like peons. It’s always ‘do more with less’, ‘we don’t have time for that’, ‘the business expects us to meet these dates.’ And it’s been one death march after another lately. So then all this talk about partnership is just talk – I’m not a partner, I’m not valued. Thanks, but no thanks – I’d rather go to a company that takes me seriously, that values my time, where I can really make a difference. Here, it’s like I’m trying to do open heart surgery in a stable. It’s a losing game.”

I reply hotly, “Erik, that’s great, and we’ve had conversations about this before. You know, I can’t control any of that. But we’ve created a little oasis of sanity for the team. Things are working better here, and I can’t believe you’re saying that you’re not valued. We made a lot of concessions to get you the working schedule you wanted, and you’re paid higher than anyone on my team, including some people that have been here twenty years. I feel absolutely stabbed in the back.”

“Well, that’s kind of why I ended up going to Laure, instead of talking to you. I knew you’d take it personally.” Erik stands up. “Ben, you made me all kinds of promises to get me to come out here from Tennessee. You told me that we were going to be changing, modernizing, that I could control my destiny and change the organization for the better.” He points to the WonderTek label on his shirt. “You know I love WonderTek, I’ve always worn this stuff and I’m loyal to what this company stands for. But the way we write software is thirty years old, we’re hopelessly stuck in the past. It’s the responsibility of the team lead to lead, Ben. In my view, and I’m sorry to say this, you smile and say we’ll get to that, just as soon as possible, but nothing ever really happens. You’re just part of the problem.”

We part in icy silence. It would take me a few days to look Erik in the eye. This one stung a little. I owe Laure quite a bit for letting me know, side-channel, that one of my employees was tarring my rep on his way out the door. The team commonly sheds a person or two a year, and it usually seems to be a few weeks after a hard push like their last release. Still, exit interviews are valued very highly by management; Erik’s remarks are bound to come up in my next performance eval with Douglas.

And now I have to find a replacement for Erik. It could take me months to find the right person and bring them up to speed. It looks like our team’s productivity is going to take yet another serious hit.

Dysfunction Thrives on Secrets (Making Technical Debt Visible)

At the next daily standup, there is a definite chill in the air. Everyone rattles through their daily to-do’s in a robotic, flat monotone. Even Padma and George, normally the happy bubbles of the group, seem subdued as they talk about their challenges to tackle for the day.

As the group wraps up and starts to disperse, I hold up a hand. “I’m not feeling a very good vibe today from you guys. What’s going on?” Silence, but a few people glance at Erik. I smile. “So, I think it’s time to acknowledge the elephant in the room. Does this have to do with Erik leaving us?”

Erik grimaces and rolls his eyes up at the ceiling; George starts to smirk. “How do you like your eggs, Erik? Oh wait, let me guess – Eggs Benedict.” Everyone groans, but at least now things are out in the open.

I’ve got to give him credit; Erik is a stand-up guy. He faces the half-circle of his teammates now without flinching. “It’s true, I’m leaving. I love you guys, you know that. This is a big step up for me. I’m hoping you can support me instead of acting like I just stabbed you in the back.”

George asks, half-jokingly, “Was it something I said?” But I can feel the air of unspoken hostility starting to dissipate. Smiling broadly, I say, “I think it’d be really unfair if we acted like Erik was a traitor or leaving us in the lurch. Sometimes we can grow within the company. And sometimes we end up looking for those challenges elsewhere. But that’s Erik’s right and privilege to map out his own career. Naturally, we’d prefer to keep you, Erik, but I’m excited you’re going to get a chance at leading a team of your own. That being said – you are going to leave a huge hole. Everyone here is really going to miss you.”

For a few minutes, we chat with Erik about his new gig and what he’s excited about. Erik purses his lips a little; he’s on the spot and doesn’t like it. “What I said is true – I’m not leaving WonderTek so much as I found something cool and new that happens to be elsewhere. It’s not that big of a deal, people come and go all the time. And the economy’s good enough where I’d be a fool not to think about options.”

He sighs. “In terms of what the team here can do for the next person, well, I would” – slight pause here as he gauges me a little – “follow through on some of the things I told Ben a few days back. We keep saying we want to be cutting edge, modernize, get out of firefighting mode, blah blah. But the decisions we make are in the exact opposite direction – we are a clothing company not an IT shop, we don’t have time for testing, forget about operationalizing, we need to deliver these features on these dates and get it done. Our words and our actions need to line up.”

He looks around the group, pointedly. “I also told Ben that I was tired of trying to do heart surgery in a stable; I wanted to join a company where IT and application development are part of the DNA and our skills are really valued. This new spot gives me a chance to really drive and have an input on architecture at a meaningful level, not just trying to patch up a sinking ship.”

I keep a calm smile on my face, but inside I’m seething; WonderTek is not the Titanic. I could squelch this, but that would give everyone the wrong message – and maybe lead to more defections. George is one step ahead of me as usual – he chips in, “Erik, you are telling us that the entire organizational culture here is working against the direction of the team, and that’s why we keep getting caught in unplanned maintenance hell. That’s great, and I actually agree with you, but this is a problem you’re raising without a solution. What’s something you think we need to improve on as a team that wouldn’t require a magical fairy wand of power?”

Erik replies, “It’s the exact same thing I’ve been bringing up in retrospectives for the past year. We need immutable infrastructure, a set of golden images and templates that we deploy as part of our releases. That’s the key to eliminating three quarters of the problems we see that are caused by things other than code. If we had more discipline in how we set up and deploy our infrastructure, we’d get meaningful release management, faster dev/test cycles. That’s the straw that stirs the drink – infrastructure as code.”

Harry, one of my silverback coders, grunts derisively and says flatly, “Erik, like I told you last retrospective, this isn’t Amazon. We are always being asked to do more with less, and moving mountains means we have to turn requests over fast. Like it or not we have multiple lines of business to support and hundreds of applications. The kinds of automation you’re advocating require a lot of setup and governance and – most of all – TIME. Time we just don’t have. What you’re proposing might work for a software company, but I have yet to see it working in the real world where we’re saddled with tons of legacy code and customers screaming their heads off over every delay.”

Rajesh nods in agreement and says, “Yes, and our current pace just isn’t sustainable. We need to move more slowly and shore up our testing, so what we push out the door doesn’t fall on its face. That kills our rep with the business that’s paying the bills. We have to move slow for our apps to have reliability and stability.”

Erik says resignedly, “Look, this is exactly the discussion I didn’t want to have. This is not a divorce, it’s a career decision. You should be able to hear this without getting all up in my face. And why are you selling yourselves short like this?” He looks around the group for a few seconds then continues: “We are a software company that happens to sell clothing; development and IT are a vital part of our business. Pretending otherwise and treating our work as a cost center or a program where risk has to be managed is hopelessly out of touch. I’ve worked for startups, I’ve worked for large enterprises, banking institutions with heavy compliance and security requirements, insurance companies, it doesn’t matter. We are all software companies, plain and simple, and it’s been that way for a long time. In the end our job is to produce high-quality software as fast as possible. I’ve just mentioned one proven way to do that, something everyone here should be 100% onboard with, and all I’m getting is static, like I have for the past 12 months.

“And is it really a tradeoff between speed and quality – one or the other?” Erik looks around the circle for a few moments. “I don’t buy that you have to go slow to go fast – in our industry, right now, we have competitors that are releasing multiple times a day and their quality and stability keeps going up. Don’t you guys think our management knows that? At some point, we are going to have to stop trying to find new and ingenious ways to fail, and start thinking at a higher level. You just can’t solve these problems we’re facing using the same old way, at the same level of intelligence that created them.”

There is a long pause following Erik’s announcement. I see some angry faces, but at least half of the team is nodding in agreement. And Erik, much as I hate to say it, is right on some points. If I care about honesty, this is the time to show it. I say calmly, “OK, so this is a daily standup that has kind of spiraled a little bit. I’m sorry Erik – I really didn’t mean to put you on the spot.” I’m lying a little – there’s a part of me that enjoys watching Erik squirm in payback for his poisonous exit interview. “I’m accepting that we need to think about new ways of doing things. I think we’ll start with putting on paper the values we have as a team.”

I draw this on the top of the whiteboard:

What We Value in Our Tribe

Open, honest communication – you will never get in trouble for saying the truth

I turn to the group and say, “That’s front and center, and I’m going to have that show up in every performance eval and 1x1 we have. As some of you know, I grew up in a pretty dysfunctional family. One thing I’ve learned is that addictive behavior – whether it’s centered on alcohol, drugs, or some form of abuse – thrives in the darkness. In my family, with my wife and kids, I try to be transparent, to be honest. If we have a problem as a team, I don’t want to waste time trying to sweep it under the rug or ignore it and pretend like we’re perfect. That’s a dead-end game. I don’t want to be perfect. I want to be better. Every one of you, I know, feels exactly the same.

“Erik is telling us that we have a problem. And it just happens to match with some of the things we’ve discovered in our last release, so I know it’s grounded in reality. Let’s put up here – right in the open, in our team area, the problems we’re discussing and some of the possible solutions, so it’s not a secret anymore.”

What Isn’t Working

Branch integration issues

Code coverage dropping

Business changing its mind

Too many bugs – stuck in firefighting

Bad communication prerelease with IT

Long list of aging features

Rising technical debt

Infrastructure creaky, inconsistent, difficult to manage

Alerting and monitoring nonexistent

I get general agreement from the group; this isn’t a complete list of all that ails us, but a good starting point at least. Now, I ask for some solutions to throw up there – and the board starts to look more complete. This fits pretty well with what the Brain Trust and I were talking about earlier:

Things We Could Try

Continuous integration & delivery

Smaller releases

Refining requirements gathering

Peer review

Version control

Code coverage and testing

Better teamwork with outside teams especially Ops

Paying down technical debt

Infrastructure as code / a set of “golden images”

We all step back and take this in. It’s an imposing list. Harry speaks for everybody when he laughs shortly and says, “This is like staring at Mount Everest. Any one of these is a monumental task. It’ll take us years even to scratch the surface.”

George steps in with the right words. “Yeah, I agree. Let’s not try to cure everything on this list at once. Let’s just think about the last release. Which of these items sticks out as being the low-hanging fruit?”

Padma says quietly, “I like that point about peer review. You know, where I used to work, we really took peer reviews seriously. I mean, you couldn’t check in code unless at least one other person had looked over what you’d done. It forced me to write neater, more well documented code.”

Hearing this, I start to smile and can see several others doing the same – it’s hard to imagine Padma ever writing a sloppy line of code, even with a gun to her head. She’s the ultimate professional; her approaches and techniques are always top-notch, and the thoroughness of her documentation is the stuff of legend. But George is nodding his head, musing: “I kind of like this, because it’s low cost, it’s something we can control, and it’s a way of holding each other accountable to a common standard. In the past, that’s worked really well for us.”

Alex groans loudly. “I’ve also worked on teams that tried peer review, and let me tell ya – that’s one idea that sounds great, in theory. But it just ends up being a gauntlet for junior programmers to run while they get beaten senseless. All it ever leads to is people being afraid to change the name of a module or else the Grammar Police come out of the walls and start whipping people for breaking their sacred naming conventions.”

“I’ve worked places in the past where things went that way too,” I mutter, frowning. But there has to be something here for Padma to bring it up. “It seems to me like your former outfit figured out how to do code and peer reviews right, Padma. Tell you what – let’s sidebar this one. Can we talk about this in a week?”

Padma looks relieved; now, she has a few days to prepare. “OK – now on to version control,” I nudge. “Now, that’s the first thing we addressed with our switch to Agile. Thankfully, this is one area I don’t think we need to talk about at length.”

Erik coughs, loudly. “Actually, in Continuous Delivery Jez Humble pointed out that most people think their version control strategy is rock solid, but that’s rarely the case. Using version control is foundational and it means that you can deploy and release any version of your software to any environment at will using a fully automated process. You know you’re doing things right if your process is repeatable and reliable; that’s clearly not the case with us right now. We’ve got our source code in, and most of our config files, that’s true. But does that capture everything needed to run things, including systems configuration? And, does that really match what is running on production?”

Uncomfortable silence follows this question. Erik continues: “Let’s say that we lost a production server. How long would it take us to rebuild it, and could it be done 100% with what’s in our version control?”

Despite myself, I start to laugh. “OK, point taken. Much as I hate it. What bothers me the most is that there’s a huge amount of work in what you’re talking about, capturing EVERYTHING needed to deploy, build, test, and release in version control. Up to now we’ve really just thought about version control as being for code. Let’s add this to our to-do list then and come back to this again in a week. Erik, could you take point on that one?” He nods, and I move on. “This item on breaking up large features into small, incremental changes, and having them merged to trunk daily. Is that even practicable for us?”

George says, “Well, I know that’s a key part of how the big companies we keep bringing up as examples do their work. Engineers at Microsoft, Netflix, Amazon, all have slightly different build strategies – but they all view long-lived feature branches as poison. So their teams check their work into mainline multiple times a day, and they don’t seem to have the integration issues we suffer through.”

Harry begins ticking off the now-familiar reasons why this could never work for us. “Once again, even if we had the resources to support a refactor – which we don’t – we have interfaces and hooks to our mainframe systems that change at different rates, which means a lot of releases stuck in queue waiting on a dependency to finish development. Our current branching strategy isn’t perfect but I don’t think there’s a lot of ROI in refactoring it just to try to make our brownfield applications look like a shiny web-native greenfield app.”

There is some back-and-forth discussion on this for a few minutes, but no clear consensus on whether this is a main pain point. It looks like the benefits of implementing continuous integration might not outweigh the costs for us. With some relief, I put it on the list as a potential future topic and mentally shelve it. We have enough at this point to get started on, even if the crystal ball still looks a little murky. Here’s what the right side of the wallboard looks like after our edits:

To Do

Peer Review (Padma)

Version Control (Erik)

Code coverage and testing (Rajesh)

Maybe Later

Better teamwork with outside teams especially Ops

Continuous integration & delivery

Smaller releases

Refining requirements gathering

Paying down technical debt

Infrastructure as code / a set of “golden images”

What We Value in our Tribe

Open, honest communication – you will never get in trouble for saying the truth

We’re now well over half an hour overtime; it’s time to wrap things up. I say, “I’ve heard from some of you that it seems like we write things on sand, never really learning or improving. And Alex tells me that we’ve never really listed our technical debt in one place; our shortcuts and deferred work end up getting buried in our retrospectives. So, we’re going to leave this up here – and I’ll put this on our wiki so we can all keep this single list of our problems and potential solutions up to date and prioritized. But I’m not putting anything on this list that we’re not serious about, meaning we don’t have someone that can drive it start to finish.

“Here’s the takeaway message. We won’t get anywhere as a team if we don’t base everything we do on honesty. Even if it’s uncomfortable and it’s not what we want to hear. So, the sweeping things under the rug days are over, starting today. I’m adding Erik’s immutable infrastructure to our list. And I am making our team values explicit and in writing – you will never, EVER get in trouble for saying the truth. It won’t show up on your performance eval, it won’t be passed upwards with your name attached. We want feedback, we want honesty. I’m going to leave this up here, and we’ll have a copy in every retrospective, so you can hold me accountable.”

There is still a residual air of grumpiness as the team starts to disperse, but oddly enough I feel more cheerful than I have in days. Somehow, making some of those unspoken, implicit team values explicit seems like something of a turning point.

Legacy’s Kiss of Death

It’s been a long day; I put on my coat and head for the door. It’s going to be mostly back roads for me today in getting home. All the roads out of Portland right now are bright red on Google Maps, like the circulatory system of an unhealthy heart patient.

George intercepts me a few steps from the door, and we chat a little as we head to our cars. I ask him what he thinks about the team values and to-do list on the board. Under the hood of his raincoat – no true Oregonian would use an umbrella, that’s for tourists – I get the glimmer of a smile. “Well, it’s a start… we’ve had these values for a while, you know. But I think it helps to make sure they’re explicit, spelled out, and up on a wall somewhere.”

“Yeah, I think it was past time.” I pause and decide to confide in George a little more. “Erik leaving really hurts us, and more than just losing his tech skills. He really threw the team – and me – under the bus; made it sound like we were treading water, complacent, atrophying. That’s the kind of ammo that our friends out there in Operations would love to get ahold of.” I sigh. “As I’m sure they eventually will.” I am speaking loudly, I notice, and I sound angry. I’m tired of playing defense, and the last few weeks have worn on me.

George pats me on the shoulder and starts to drift toward his own car; a few steps away, he stops. “As great as it looks to have our team values up on the wall, in the end it’s just a bunch of feel-good nonsense if we’re stuck in quicksand. We still haven’t even started to address the main struggle we’re having – our massive amount of technical debt, all the legacy applications out there that we know little or nothing about and break down every other day. Until we get our arms around that, we can’t make headway with the demands Footwear and others are putting on us for new work.”

I grimace. This is exactly what I was afraid of in terms of perception. If the team interprets our discussion today as window-dressing, nothing will change – we’ll sink back into complacency, a sure death sentence. And is it really true that our vast legacy support demands mean we have no viable options to move out of firefighting hell?

“I agree with you, George. And I need your help. We need to get ahead of Erik’s smears. Perceptions are important – I want people outside the team to see that the way we work doesn’t jibe with the mud he’s throwing around on the way out the door. That sinking ship thing – sorry, I don’t agree. We are NOT a lost cause.”

We part ways, and I make my way onto the hopelessly clogged Highway 26, heading home. As expected, the drive home is a hellish nightmare of stop-and-go traffic. It gives me time though to think about the team’s feedback and George’s comments. Are we hopelessly stuck in the mud?

Behind the Story

The first step Ben’s team takes toward mastering their spiraling technical debt is to make it visible. Why is that important?

Making Technical Debt Visible

“How did you guys at Microsoft do it?”

“What are the best practices you recommend for us to implement DevOps?”

“Which tools do you recommend for CI/CD?”

“Can you give us a roadmap or recipe so we can reach our goals?”

Some form of these questions gets raised almost every time we meet with a new customer. Perhaps this is the reason you bought this book; if so, you’re probably wondering why we are answering with a story.

There Is No Single Roadmap or “Best Practice”

It’s completely understandable that people want to ask about “best practices” or get a specific roadmap. Uncertainty means risk; actions mean exposure, mistakes, and vulnerability. To remove some of this risk, it’s tempting to take what another company has done and try to implement it like a prescription, a recipe. We’re going to explore some of the common elements that we found in successful enterprises that use DevOps to innovate at scale. But to quote Aaron Bjork from the Azure DevOps program team, “it would be foolish for me to say, you should exactly do it our way.”

Here’s the problem with roadmaps: they end up becoming prescriptions. But a roadmap is based on a starting point and an ending point. And it’s a certainty – your starting point, assets and liabilities, product portfolio, and end goals are going to vary dramatically from ours. Let’s say you went on a cross-country trip, from Seattle to Orlando. How helpful would a roadmap of an earlier trip from Tempe to LA be for you?

In other words, recipes can and do fail us… But experimentation and learning work, always. We believe you’ll learn more from how Ben’s team goes about solving their problems than we could ever teach by giving you a generic and ultimately misleading checklist.

Just for example, take the groundbreaking book The Goal by Eli Goldratt. This is a book that radically refined how we thought about manufacturing in the 1980s and was the inspiration behind “The Phoenix Project.” But we often forget that the author was a scientist, interested not so much in pat answers or a recipe as in a new way of teaching that used both Socratic questioning and the scientific method.

Asking questions instead of lecturing with “facts” or “best practices” forces us into an uncomfortable area where our preconceptions are challenged. That discomfort leads directly to an environment where failure is welcomed, where we’re encouraged to learn from mistakes instead of trying to sweep them under the rug or point fingers. The scientific method teaches us that everything is a hypothesis or a theory, an assumption; we can come up with thousands of trials, but if even one test disproves our theory, it must be abandoned, and we have to rethink our assumptions with a new theory.

One Team, One Year

In this book, we are writing a story that reflects the world that is, not as we would like it to be. In the real world, we aren’t blessed with limitless power, a blank slate, and armies of well-trained consultants and employees. No, we are saddled with legacy debt, constrained by limited resources, and often have very limited management buy-in.

In this book, we’ll tell the story of a team that overcomes their limitations by learning to experiment. As we go along, the team will gradually go up the mountain, making mistakes along the way – but always moving onward and upward, iteratively building on success.

The methods underlying the works of Eli Goldratt, Gene Kim, and W. Edwards Deming – the scientific method, and Socratic questioning – should always be top of mind, regardless of what you call your transformation or how you go about it. Applying this scientific method, however, requires a forward-thinking mindset and a group that isn’t just tolerant of failure but embraces it as an opportunity to learn. That doesn’t describe the vast majority of the organizations we’ve worked with. With some highly political companies we’ve engaged with, we used to joke that getting stuff done was like playing Snakes and Ladders – only with no ladders, only snakes!

We can forget about the scientific method at WonderTek the way things are now. In that environment, it’s not safe to make mistakes. Any errors or lack of knowledge will be exploited by other teams; the name of the game is avoiding blame and shifting responsibility elsewhere.

Sure enough, that’s what we’re finding here at WonderTek. There are a few antipatterns the team is saddled with that are causing some definite angst:
  • Multimonth milestones and long-lived feature branches, leading to painfully long and messy integration phases

  • Traditional command-and-control type organization, leading to siloization and a risk-averse climate where mistakes are punished politically

  • Lengthy feedback cycles and disengaged stakeholders

  • Manual, error-prone, and time-consuming release processes and a lack of attention to quality, especially in testing

  • Poorly understood and brittle legacy code that is resistant to change and a constant drain on resources with firefighting

  • A creaky, inconsistent infrastructure layer that is hard to scale and impossible to manage

It is just starting to dawn on Ben’s team how much this is cumulatively impacting them.

Game Attracts Game

And as we see in the section “Bleeding Out,” one of the most insidious effects, somewhat hidden from Ben, is on hiring: a gradual brain drain. As Mary and Tom Poppendieck have pointed out, up until a century ago, capital was a critical resource. Now, the largest constraint globally is the passion and energy of bright creative people. It’s becoming harder and harder for Ben to hire and keep the best and brightest, as we see with Erik’s defection. His best and sharpest minds are walking out the door, to more competitive, fun, capable companies. The people that remain in this environment are atrophying in their skills; Ben is being asked to perform like a thoroughbred, with less and less capable people to do the work.

One decision-maker we interviewed told us that tech and IT was now a supply problem – companies are finding technical resources and skilled people to be the real constraint. As he put it, “If we’re going to recruit and retain world class talent, we have to show we’re serious about learning. I absolutely believe that game attracts game, that people want to work with the best. Most of my focus and those of my people is around attracting and keeping the best – that’s our firepower.”

The enterprises that focus on learning and eliminating stress and meaningless work are outperforming others, because they’re able to attract and keep the best people. These are companies that aren’t focused on “cloud first” or “doing DevOps” – they are learning focused and invest in continuous training and practice consistently.

Making Debt Visible

In the section “Dysfunction Thrives on Secrets,” one of the first steps the team takes is to make their technical debt visible. Why begin with this, instead of better release management practices?

Perhaps Ben is trying to create a Plimsoll line.

Back in the 17th and 18th centuries, it was very common to lose freighter ships to overloading. In fact, unscrupulous shipowners would often send overloaded aging ships out to sea to collect on insurance money – which led to an appalling loss of life. In 1876, the British Parliament passed a bill that made it mandatory to have marks on both sides of a ship. If the ship was overloaded, the load line marks would disappear underwater – and the harbormaster would prevent the ship from leaving port.1

If we are putting a ship out to sea, it’s important to know how much weight we are carrying so we don’t risk being swamped. If we’re trying to pay our way out of debt, the first step is to track what we are spending and keep a full list of our debts up to date.

In any project, we are going to incur technical debt – improvements we have to defer, the cost of shortcuts taken to meet our goals. Most of these revolve around improvements in areas like security, performance, and the like. These may not be key technical functions that the business is asking for, but they end up being rocks that our application is wrecked on – the unspoken, implicit expectations that, when missed, cause our users to hate what we’re producing.

We like the neat way the cost of technical debt is defined by Kief Morris:

Technical debt is a metaphor for problems in a system that have been left unfixed. As with most financial debts, your system charges interest for technical debt. You might have to pay interest in the form of ongoing manual workarounds needed to keep things running. You may pay it as extra time taken to make changes that would be simpler with a cleaner architecture. Or charges may take the form of unreliable or hard-to-use services for your users. Software craftsmanship is largely about avoiding technical debt. Make a habit of fixing problems and flaws as you discover them, which is preferably as you make them, rather than falling into the bad habit of thinking it’s good enough for now.

This is a controversial view. Some people dislike technical debt as a metaphor for poorly implemented systems, because it implies a deliberate, responsible decision, like borrowing money to start a business. But it’s worth considering that there are different types of debt. Implementing something badly is like taking a payday loan to pay for a vacation: it runs a serious risk of bankrupting you. 2

In any project we’re involved in, we make sure as one of the first steps that there’s a full list of our technical debt exposed and visible to the stakeholders; paying down this debt should be what we choose to invest our time in after the very limited delivery goals for each sprint. Microsoft made paying down this debt one of their top priorities on the Azure DevOps product teams. This was accomplished by means of tracking their BTE ratio – Bugs-to-Engineers – and making sure that this number rarely exceeded a reasonable count. We discuss how the BTE ratio helped give Microsoft clarity in the interviews with Sam Guckenheimer and Aaron Bjork in the Appendix, and in Chapter 6 under metrics and monitoring. Suffice to say – for us, this was a vivid example of a Plimsoll line and the positive behaviors and results gained by keeping technical debt visible.
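As a minimal sketch of how a bugs-to-engineers check could work as a Plimsoll line, consider the Python below. The TeamSnapshot structure, the team names and counts, and the MAX_BTE cap are all hypothetical values chosen for illustration; none of them are figures from the Azure DevOps teams.

```python
from dataclasses import dataclass


@dataclass
class TeamSnapshot:
    """Open bug count and headcount for one team (hypothetical structure)."""
    name: str
    open_bugs: int
    engineers: int


def bte_ratio(team: TeamSnapshot) -> float:
    """Return the number of open bugs per engineer."""
    if team.engineers <= 0:
        raise ValueError("team must have at least one engineer")
    return team.open_bugs / team.engineers


def over_the_line(team: TeamSnapshot, max_ratio: float) -> bool:
    """True when the team's ratio exceeds the agreed cap."""
    return bte_ratio(team) > max_ratio


# Assumed cap for illustration only; each organization picks its own.
MAX_BTE = 5.0

# Made-up sample data.
teams = [
    TeamSnapshot("checkout", open_bugs=42, engineers=6),
    TeamSnapshot("catalog", open_bugs=12, engineers=4),
]

# Teams whose load line has gone "underwater".
flagged = [t.name for t in teams if over_the_line(t, MAX_BTE)]

for t in teams:
    status = "over the line" if t.name in flagged else "ok"
    print(f"{t.name}: {bte_ratio(t):.1f} bugs per engineer ({status})")
```

The value of a check like this is less in the arithmetic than in the agreement behind it: a team that is over the line stops taking on new feature work until the count comes back down.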

Another method used very successfully to expose technical debt by large enterprises such as Ticketmaster, Capital One, and Exxon is the much-maligned maturity matrix. We can’t agree with many of the books out there that flatly call a maturity or capability matrix an antipattern; there are several powerful examples of companies using cloud-enabling capabilities to create self-organizing teams that assess themselves, including recommendations for improvement that fit their specific use case.

The main point is in the section title: Dysfunction does thrive on secrets. Happy families value honest, open discussions – even when this causes some pain in the short term. It’s the dysfunctional families that tend to hide problems or minimize them. Left in the dark, these secrets often multiply into a crushing burden.

Exposing the weight that Ben’s team is carrying as a list – a Plimsoll line – is a powerful first step toward formulating a better way of lightening the team’s load.

We’ll talk more in Chapter 7 about how to recruit that all-important executive support we’ll need to start paying down this debt. Suffice to say that it’s critical to speak in the language of business, to link our list of debts to quantifiable results: dollars saved, waste eliminated, or opportunities for gain. Michael Stahnke of Puppet told us that it’s critical for engineers to learn how to put technical debt in practical terms that executive decision-makers can understand:

…Think about the way you are presenting this information to the people that matter in your company. You want to steer towards a measurable outcome that matters to people. For example, don’t try to sell MTTR by itself. If I say I need to have more worker roles added to this process, or more servers in the background – that’s great for techies but not for the people writing the checks. But if I say it costs me this much every time this problem happens, then I get strong feedback. What would it cost to fix this? Then we can talk about our options and have a discussion. But if something’s important, I do try to convert it to dollars – that’s the universal language of business.

That’s something I wish more developers understood. We need to think about the long-term maintenance costs of the things we build more. Over time, the cost of development gets rounded to zero in comparison with the cost of operating and maintaining the system.

Operations teams have a better understanding of business value and maintenance costs than developers do. If we really understood the language of business – we could get the backing we need to produce better quality software and keep our technical debt in check. 3
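The conversion to dollars described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a costing model from the interview: the function names, the incident counts, the hourly rate, the lost-revenue figure, and the cost of the fix are all hypothetical.

```python
def annual_incident_cost(
    incidents_per_year: int,
    hours_per_incident: float,
    people_involved: int,
    hourly_rate: float,
    revenue_lost_per_hour: float = 0.0,
) -> float:
    """Estimate what one recurring problem costs per year, in dollars."""
    # Labor: everyone pulled into the firefight, for the length of the outage.
    labor = incidents_per_year * hours_per_incident * people_involved * hourly_rate
    # Business impact: revenue lost while the problem is ongoing.
    lost_revenue = incidents_per_year * hours_per_incident * revenue_lost_per_hour
    return labor + lost_revenue


def payback_years(fix_cost: float, annual_cost: float) -> float:
    """Years until a one-time fix pays for itself."""
    if annual_cost <= 0:
        raise ValueError("annual cost must be positive")
    return fix_cost / annual_cost


# All figures below are made up for illustration.
yearly = annual_incident_cost(
    incidents_per_year=24,
    hours_per_incident=3,
    people_involved=4,
    hourly_rate=100,
    revenue_lost_per_hour=2000,
)
print(f"This problem costs about ${yearly:,.0f} per year")
print(f"A $86,400 fix pays for itself in {payback_years(86_400, yearly):.1f} years")
```

Framing the request as "this costs us this much each year, and the fix pays for itself in this long" gives decision-makers something to weigh against other investments, which a request for more servers does not.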
