Chapter 2. Interviewing Site Reliability Engineers

There are two phrases you continually hear about why projects are successful: “My engineers made it possible” or “My engineers did the impossible.”

Much of this book is about technology and how to apply it to problems. That technology is completely useless without the right people. This chapter is dedicated to the softer side of our jobs: hiring those right people. Every one of these hires will operate and build the technology that is used to solve problems, disrupt industries, and achieve feats that no one thought were possible.

It all begins with the engineer.

Interviewing 101

Before we get into the specifics of SRE hiring, it is useful to build a common understanding of how interviewing typically works. This is a general overview based on experiences working at companies such as AOL, Google, YouTube, and Dropbox.

Who Is Involved

Generally, there is a minimum of three parties involved: the candidate, the recruiter, and the hiring manager.

Industry Versus University

Usually, there are two types of candidate profiles that hiring managers recruit: industry and university (which includes master’s and PhD students). Industry candidates are engineers who have worked in similar positions at other companies. University candidates are exactly what they sound like: students. Compared to recruiting software engineers, recruiting SREs from universities is significantly harder and more specialized. The major reason for this is that you will first need to educate university students on what SRE is before they will even apply for the role. In this book, we focus exclusively on industry recruiting.

Biases

Before we go too deep into the interviewing content, it is important to recognize that we are all humans and thus have biases. We do not cover how to avoid all biases within this chapter. However, we do call out two specific ways to combat biases that organizations starting to build out SRE teams should seek to implement. The first is to do a blind résumé review, and the second is to standardize your hiring processes. Both are either simple tweaks to an existing process or best practices to adopt when building a new interview process.

Blind résumé review means that you are attempting to remove biases from the résumé review by obfuscating identifying information within the résumé. You will need to understand what matters to your organization and calibrate the blind review process accordingly.

Helping you to standardize the hiring process is a goal of this chapter. By the end of this chapter you will have learned a systematic way of evaluating candidates, both at scale and for a limited number of positions.

At the end of this chapter, there are links to further reading on the topic of conscious and unconscious biases.

The Funnel

Most companies think of hiring as a funnel that has stages through which candidates pass until they either fall out at a given stage or receive an offer. Figure 2-1 illustrates what a typical funnel looks like.

Figure 2-1. The hiring funnel

Let’s take a look at the individual steps involved:

  1. Pre-interview chats

  2. Recruiter screen

  3. Phone screen

  4. Onsite interview

  5. Evaluation

  6. Take-home questions

  7. Additional evaluation

  8. Reference checks

  9. Selling candidates

  10. Offer out

  11. Offer accepted

Each of these stages serves a unique purpose. Measuring the stages of the funnel will help you to refine the process and hire world-class candidates. What and how you measure is a topic for another book. What’s important here is that you know the funnel exists and some of its basic stages.

SRE Funnels

Now that you know what a funnel is, we can look at the three major areas that are crafted specifically for hiring SREs: the phone screen, onsite interviews, and take-home questions. This section lays out the purpose of each area and what technical and cultural areas it covers. The content of the interviews and stages will differ between companies, and what follows is not meant to be a prescriptive formula for recruiting. As with all areas related to talent within your organization, you should apply the lens that your organization requires.

Phone Screens

Before you bring a candidate on site, it is important to get some basic signal with respect to motivation, technical aptitude, and general experience. The phone screen is done at the top of the funnel because it can help filter out candidates who will not make it through the interview. This prevents incurring a high expense at later stages, in the candidate’s time as well as your engineering team’s time. Also, you can use the phone screen to touch on a variety of topics to help form a hypothesis of where this engineer would fit or would not fit into your organization. If you are unable to formulate this hypothesis, you might want to walk away after the phone screen.

Conducting a phone screen

A good format that has been tested at multiple companies is to have phone-screen questions that are specific and concrete enough that the candidate is able to get to a solution within 20 minutes. For example, having a coding and a process or troubleshooting question allows you to determine someone’s technical breadth and experience.

A coding question that is pragmatic and simple enough to complete in a shared online coding tool works well and usually yields reasonable signal regarding the candidate’s technical depth.

Troubleshooting or process questions should aim at understanding how the candidate reasons through problems that do not involve code. A simple rule of thumb can be aiming for two 20-minute questions along with a 10- to 15-minute discussion about the role, company, and expectations; this combination typically fills a one-hour phone screen.

You will want to have a few phone-screen questions in your tool chest and be willing to pivot depending on the candidate’s experience. Having an easy question that allows you to keep the conversation going is perfectly fine at this stage.

It’s important that the candidate walks away from the conversation with a positive view of your organization, even if you are not moving on to the next stage.

The Onsite Interview

Before you bring a candidate into your office to interview, you should formulate an explicit hypothesis around the candidate. This allows you to do the following:

  • Set expectations for the interview panel

  • Fully understand what your organization is looking to hire

  • Craft an interview loop that will fairly evaluate the candidate

A good hypothesis will include the expected seniority, growth path, and leadership skill set, which you can validate via the interview process. For example:

The systems engineering team, which works on Linux automation, is looking to hire a senior engineer with strong distributed systems knowledge who will grow into a technical leader, either formal or informal, of a team of mostly junior engineers.

Teasing this apart, you can identify what the team is looking for in a candidate:

  • Senior engineer

  • Systems knowledge

  • Distributed systems

  • Technical leadership

  • Mentorship

Taking this data and combining it with other baseline requirements that you might have, such as coding, you can create an interview loop that accurately evaluates the candidate for the role. In this case, you want to understand the candidate’s working knowledge of Linux, how they operate distributed systems, how well they can mentor and set technical direction on a team, and whether they meet the baseline coding requirement.

A good loop for a candidate that is being evaluated for that hypothesis would include the following:

  • Two medium-to-difficult coding questions that involve understanding of system designs

  • Deep dive or architecture

  • A focused interview on leadership, mentorship, and working with others

Coding and system questions

Good questions that test both coding and systems tend to be real-world problems bounded by your organization’s current size multiplied by a scaling factor of 10. For example, consider the following question:

Design and code a system that can distribute a package in parallel to N servers.

You want to select an N that is realistic for the scenario that you expect the candidate to face. If your organization has 100 servers, it’s realistic to expect it will grow to 1,000 but not 100,000. You will expect the SRE to solve the problem at hand when on the job, so ask the question in a way that is realistic for the working environment.
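To make the shape of an answer concrete, here is a minimal sketch of what a candidate might write in Python for an N of around 1,000. It is one possible starting point, not a reference solution. The function names, the `max_parallel` limit, and the `fake_push` transport are all hypothetical, introduced only for illustration. A real answer would replace the transport with an actual copy mechanism and discuss retries, timeouts, and verification.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def distribute(package: str,
               servers: Iterable[str],
               push: Callable[[str, str], None],
               max_parallel: int = 50) -> Dict[str, str]:
    """Push `package` to every server, at most `max_parallel` at a time.

    Returns a mapping of server name to "ok" or a failure message, so the
    caller can see which hosts need a retry.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Submit one task per server and remember which future maps to which host.
        futures = {pool.submit(push, server, package): server
                   for server in servers}
        for future in as_completed(futures):
            server = futures[future]
            try:
                future.result()  # re-raises any exception from push()
                results[server] = "ok"
            except Exception as exc:
                results[server] = f"failed: {exc}"
    return results


def fake_push(server: str, package: str) -> None:
    """Hypothetical stand-in transport; a real one would copy the file."""
    if server.endswith("-bad"):
        raise ConnectionError("unreachable")


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(1000)] + ["host-bad"]
    outcome = distribute("pkg-1.0.tar.gz", hosts, fake_push)
    failed = [h for h, status in outcome.items() if status != "ok"]
    print(f"{len(outcome) - len(failed)} succeeded, {len(failed)} failed")
```

A thread pool with a fixed concurrency limit is a reasonable fit at this scale. Follow-up discussion can then probe what changes if N grows by another factor of 10, such as fan-out trees or peer-to-peer distribution.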

Deep dives and architecture questions

Deep dives and architecture questions test similar attributes of a candidate, namely their ability to reason about technical trade-offs. However, you will want to set up the interviews in a slightly different way. Deep dives tend to work better when you ask the candidate to select the space ahead of time. This allows you to pair the candidate with an engineer who has thought about that space in advance. Architecture questions do not require this advance preparation. However, similar to the coding question, you want to set bounding parameters. An example of a bounding parameter can be to specify an initial scale and then increase the scale at various stages. By starting small, you can assess attributes of the candidate, such as their bias toward simple versus complex solutions as well as how they think about trade-offs at each stage of the problem.

Cultural interviews

All cultural fit interviews are a way of ascertaining how well a candidate will fit into your organization’s cultural value system. This is not a “would I like to hang out with this person?” interview. Rather, it is a properly crafted set of questions that dig into the way an engineer works with others, whether the person focuses on the right areas, and whether they are able to articulate problems.

Cultural fit can even offset technical weaknesses depending on the size and structure of your organization. For example, an extremely talented engineer who cannot operate in a large company might be someone you pass on, whereas someone who understands organizational leverage but has small technical gaps might be much more valuable at scale. You can determine this fit by posing questions that dig into the candidate’s working style, such as asking for examples of how the person approached a certain project and what reflections and learnings they took from the experience. Good candidates will typically be able to articulate the “whys” behind their thought process and what they would do differently given your environment.

Take-Home Questions

Occasionally, you will have a scenario in which you want to get additional technical signal after you have looked at the interview feedback. You can choose to have the candidate return for another technical interview or use a take-home question. Take-home questions have positives and negatives. Let’s explore both before examining what makes a good take-home question.

On the positive side, take-home questions allow a candidate to fully showcase their ability to write software in a more real-world scenario. The candidate has access to their own development environment, access to their favorite search engine, and the ability to pace themselves. All of this results in much higher signal, including aspects around collaboration via clarifying questions, if needed, with the proctor.

However, take-home questions are not without drawbacks. The major disadvantages of take-home questions involve the time commitment required of both the candidate and an engineer from your company. A typical scenario requires a three-day window for both parties to be available and will delay the potential offer. You should be very judicious in the use of a take-home question.

Good take-home questions should be built in a way that can provide signal for the area or areas for which you lacked signal from the initial interviews. At this stage of the funnel, you will have very few candidates, and it will also be hard to calibrate your organization on the questions. To counteract this, having another hypothesis on what the take-home will validate is very important. It is equally important to not discount the onsite interview if parts of the take-home are not on par with the onsite. Remember, you are looking to validate something you did not get signal on during the initial onsite interview.

Advice for Hiring Managers

The hiring manager, or the person responsible for staffing the role, has three major responsibilities during the interview process:

  • Building the hypothesis (covered in “The Onsite Interview”)

  • Selling the candidate on joining your organization

  • Knowing when to walk away from a candidate

Selling candidates

From the first minute you meet and talk to a candidate, you should be selling them on your organization. How you do this will vary from person to person. You want to know the answer to three questions in order to effectively sell a candidate:

  • What motivates the candidate?

  • What is their value system?

  • What type of environment brings out their strengths?

By understanding these dimensions, you can craft a compelling narrative for a candidate to join your organization.

Walking away

Walking away from a candidate is one of the hardest things to do during the process. You, and potentially your team and organization, will have made a time and emotional investment into a candidate. Walking away is also something that can occur at any stage, not just after you decide to pass because of interview performance.

The core reason for walking away from a candidate comes down to organizational fit. Each time you learn something new about a candidate is an opportunity to reevaluate. Just as in engineering, the hiring process is about trade-offs. A perfect candidate might demand a salary that you cannot afford, or they might have the perfect technical skills but none of the leadership skills you are looking for. Walking away is hard, but know that making the wrong hire is almost always worse in the long run. Optimize your pipeline to avoid false positives. If you are willing to take risks, be explicit and have a strong performance management program in place.

Final Thoughts on Interviewing SREs

People form the backbone of your organization, and you should use your interviewing process to ensure that you have found the very best people for your organization. Be prepared to take risks and place bets on people just as you would if they were internal to your organization, because you will never know everything you want to know in such a short time with a candidate. This is an imperfect process attempting to get a perfect output, and thus you will need to iterate and be open-minded as well as take risks.

At the end of the day, even if you decide to not make an offer based on the interview performance, make sure that the candidate walks away feeling the process was fair and they had a good experience. Wouldn’t that be what you would want if you were in their shoes?

Contributor Bio

Andrew Fong is an engineering director at Dropbox. He is also one of the first members of the SRECon steering committee and helped cochair the inaugural conferences. He has spent his career in infrastructure at companies such as AOL, YouTube/Google, and Dropbox.
