Tidy home, tidy mind.

Anon.

Chapter 14
Good Housekeeping

Production issues again. You and your team are sitting together looking through the logs and scratching your heads.

“That just doesn’t make any sense, though,” says Ben. “Why would the quorum keep trying to connect to the instance that crashed?”

“No idea,” you reply. “Isn’t it meant to detect that it’s gone down, then another instance gets elected as leader?”

“Hang on, didn’t this happen before?” says Tara. “I seem to remember us debugging a similar issue a few months back. What did we do to fix it then?”

You can’t remember. You search back through old commits to the codebase to try and find the one that was meant to have fixed this problem last time, assuming it did actually happen last time, and also assuming that you did actually fix it. No dice. You can’t see anything.

“Can’t find a relevant commit, unless it’s named something silly,” you say.

You notice Emma walking past—she’s a principal engineer and no doubt would have some good insight into this problem.

“Emma, have you got a second?”

“Sure, what’s up?”

“We’re having production issues with our storage layer again. Some of the instances have died, and we’ve brought them back up, but the existing instances won’t stop trying to connect to the old one, and our apps are stuck as a result.”

Emma wheels a chair over and has a look at the logs with you.

“OK, can you show me the configuration that you’re launching this with?”

“Sure,” you say, bringing up the production configuration for the application. Emma leans in and studies the values.

“Where’s the connection timeout default defined?” she asks.

“There’s meant to be a default?” you reply.

“Yeah. Otherwise it’ll retry dead instances forever and not try to seek a new leader. I think that’s your problem. Put that in and restart it and see what happens.”

You do that, and the cluster elects a new leader and springs back to life. Your applications go green again.

“I’m surprised you’re not using the defaults we’re all using, since the timeout is inherited from there,” says Emma as she stands up from her chair.

“There’s defaults?” replies Tara.

“Yeah, there’s a parent config for applications talking to that storage. I’ll send you the link to it.”

“Do you reckon this is what caused all of the outages that we had last month?” you ask.

“Probably,” says Emma, walking back to her desk. “I’ll send you the location of the parent config now.”

The members of your team look at each other, clearly feeling a bit stupid.

“Did we really lose all of that time last month because we didn’t have some config values set?” asks Ben.

“Seems like it. We spent days thinking it was our code,” you reply. “Why didn’t we know that config existed, even after the third time it happened?”


Does this situation sound familiar? We’ve all been there. Missed communication, bad documentation, or crossed wires can cause all manner of problems. This happens at all levels: between individuals, within teams, and across whole departments and companies. Without a disciplined approach to improvement, communication, and information sharing, departments can begin to fragment. Teams can feel like they are speaking different languages, even though they’re sitting next to each other in the office.

The good news is that there are a number of techniques that you can use as a manager to keep your house tidy. You don’t have to wait for your CTO to make declarations on how to solve these problems from the top down. Instead, you can begin to implement them in your team for an immediate benefit. You can then help spread this bottom-up approach by sharing what you’re doing with others in your department and bringing them on board. Your spotless house can make the neighborhood better too.

This chapter is about forming groups and instilling simple processes so that you embrace collective learning, make fewer repeated mistakes, and interact with others that you may have never met before. This chapter is about good housekeeping in your team and in your department.

Here’s what we’re going to look at:

  • Why communication is tricky, especially in large departments, and how that has a measurable effect on the software that you build.

  • How you can cross team boundaries to unite people of similar skill sets and interests with guilds.

  • How encouraging a culture of technical talks can unite your team and your department.

  • How problems can be opportunities for department-wide improvement, both in your software and in your processes.

  • Some tools for solving common problems such as understanding the context around why software is designed in a particular way, whether a team is improving over time, and working out who is responsible for what.

If you’re looking for ways to increase your influence in your department, then this chapter might just be the one for you. You’ll find many practices you can start in your team that can pick up momentum elsewhere, and you can be the driving force. Are you ready? Let’s go.
