Chapter 15. The Intersection of Reliability and Privacy

Note

Privacy engineering is a young field, and industry players remain cautious in their public discussions of it. We hope that the relatively abstract concepts and approaches discussed in this chapter will spark ideas about concrete privacy opportunities within your own organization and help build an environment in which conversations around privacy innovation can thrive.

With the recent publication of Google’s SRE book, as well as a fair number of other publications and conferences about SRE, DevOps, and related movements, there’s a fairly active conversation about reliability engineering across the industry. As an inherently more sensitive topic, privacy engineering is less openly discussed and, as a result, less well understood. Although many companies and organizations are beginning to think about the important aspects of privacy and to treat privacy as an engineering discipline, this field is much less mature than SRE. Yet privacy is a critically important concern for virtually every company or organization that handles private data, given that mishandling private data typically can’t be undone.

After scanning the contents of this volume, you might be asking yourself, “Why does a book about SRE dedicate an entire chapter to privacy?” Sure, any organization that actually cares about its users should invest energy in both reliability and privacy. But beyond that, why is privacy so relevant to SRE?

As anyone working on reliability knows, SRE doesn’t exist in a vacuum. Many concerns arise when performing SRE work (e.g., reliability, cost, efficiency, scalability, and security); privacy is one of these related concerns, and arguably one of the most important. Although privacy engineering, security engineering, and SRE are related disciplines, privacy engineering is distinct because it requires distinct cultural knowledge. Privacy engineers bridge social and technical work by maintaining the real-world context and perspective needed to understand what data is sensitive, to whom, and under what conditions. This chapter will help you build a robust privacy engineering posture in your organization based on SRE-style principles, regardless of your background.

The Intersection of Reliability and Privacy

Why, as someone in the field of reliability engineering, do you also need to think about privacy engineering, and why are you well positioned to do so?

The starting point for both reliability and privacy is the same: you must deeply understand your systems before you can begin to reason about what reliability and privacy should look like in a specific environment. From the opposite end, privacy and reliability resemble each other because they share an end goal: satisfying user expectations. Both disciplines examine their problem space through the lens of what users expect. Users expect a company or organization’s products to work, and to work most of the time; they also expect you to respect their privacy and to handle their data appropriately. Both reliability engineering and privacy engineering boil down to the ultimate goal of ensuring user trust. Both ask, “Is the system working in a way that makes sense to the user, or will the user be surprised because the system doesn’t behave as expected?” One way to look at user expectations is to consider surprise a failure mode of a system: if a user can’t trust a system with their data, that system might as well be down.

Reliability also intersects with privacy on a structural level: privacy is protected by actual technical and administrative operations, and those systems need to work reliably to fulfill their mission of protecting user privacy. Privacy and reliability concerns are also analogous in terms of the software development life cycle: operations teams long ago realized that the earlier they are involved in the pipeline, the better the end result. The same principle holds true for privacy engineering.

As experts in reliability, SREs are already concerned with meeting user expectations when it comes to providing a reliable system or service. But when it comes to the people using your product, some of their strongest expectations (whether they realize it or not) involve privacy. Privacy issues have come to the forefront in current events, and public awareness has risen in recent years. Users now expect more than ever from service providers when it comes to privacy. Because privacy is so fundamentally tied up with user expectations and trust, there’s a high demand for providers to do what they say they’ll do with user data. The people tasked with safeguarding user expectations—whether that be an engineer, technical project manager, or program manager explicitly working on privacy engineering, or a counterpart in SRE who’s positioned to do so—will be the guardians of this realm. These teams are responsible for meeting users’ implicit and explicit expectations, fundamentally creating a reliable user experience (UX).

The good news is that there are people already doing work in the area of privacy engineering, and (as discussed in the section “Privacy and SRE: Common Approaches” later in this chapter) what they’ve learned can help you, as an SRE, to begin thinking about and approaching privacy in a productive way. Even better: if you’re just beginning to think about how to (better) engineer privacy into your system or service, you don’t need to start from scratch. SREs are already well equipped to create privacy value because SRE techniques are useful in the privacy space.

The General Landscape of Privacy Engineering

A privacy engineer’s goal is to go above and beyond compliance to try to make good products. Privacy engineering is not solely about checking boxes to achieve legal compliance. Rather, it is about developing creative solutions to achieve products that people trust, often according to extremely challenging technical, administrative, and legal requirements.

There is no single checklist that answers, “Is this privacy, or not?” As a complex discipline, privacy engineering is characterized by a certain amount of subjectivity: different people (users from all different walks of life, product visionaries, governments) might have quite different desires about privacy-related matters. This is one of the reasons we tend to think in terms of “user respect”—although user respect isn’t a precise mental model, it gets people asking the right questions: “Having seen and read the product’s notices and policies, would I, as a user, feel that this product works correctly? What about another user who’s not like me?” In its ideal form, privacy engineering should pursue intentional diversity, incorporating a range of life experiences, demographics, and personal philosophies. To create really good products, you must incorporate as many perspectives as possible and make sure you don’t miss something that would be obvious from a different frame of reference (see the comments by Amber Yust in the BuzzFeed article, “You’ve Never Heard of This Team at Google—But They’re Thinking of You”).

Google engineer Lea Kissner argues in her G+ post “Privacy, Security, and Paranoia” that privacy engineers “stand between our users and the dark places of the internet.” Privacy work tends to fall into three main categories (we’ll leave security engineering out of scope for this chapter—this is a substantial topic that deserves its own thorough treatment):

Guard

Find and solve potential privacy problems for products. Much of this work involves cultural mindset-related steps in addition to consulting and teaching good privacy practices to other teams, from product teams to business analysts to support teams.

Strengthen

Make it easy for all teams developing products to “do the right thing.” These are the technological infrastructure steps that follow the cultural mindset-related steps that we looked at earlier. To this end, privacy engineers design and build infrastructure, work with teams to improve existing systems, build privacy-related product features, and develop and provide shared libraries for easy implementation of privacy concepts.

Extinguish

When a fire does arise, privacy engineers put it out. They learn from these events by finding ways to generalize solutions or avoid problems in the future—not just for one team, but for many. Note that postmortem culture is just as useful for privacy as for SRE, although it’s often subject to wariness from legal departments due to the sometimes-sensitive nature of the fires.

Privacy engineers tend to think about the products and services they protect in a very specific way. When you ask a privacy engineer to evaluate a product or service, that engineer will be thinking about the following questions:

  • What data is involved?

  • Where is the data stored?

  • How is the data used?

  • What are the potential implications of having this data available?

  • What are the user’s expectations?

  • Who has access to the data, and how?

Note that this list is somewhat aspirational—these are nuanced questions that you might not be able to answer straightaway. At the same time, this list is incomplete: a good privacy engineer considers the nuances of how a product works and its real-world implications, not just the raw flow of data.

Privacy engineers drill into aspects of a system or user behavior that aren’t obvious to the untrained eye. Even though they necessarily must hold a great deal of system complexity in their heads, there’s also a skill set around practicing empathy and factoring that practice into engineering work. For example, when debugging or remediating a bug or incident, a privacy engineer thinks not only about user impact, but also about the specific intentions of users—their human motivations, desires, and goals. Furthermore, they don’t just think about “typical” users, but consider the many different user audiences and their diverse assumptions and expectations around product behavior and privacy.

For example, when a system bug has resulted in unintended behavior, a fix generally consists of two parts: ensuring the specific cause/bug is fixed (“stopping the bleeding”), and then attempting to restore affected users to the intended/happy state (“cleaning up the mess”). Privacy engineers are especially useful for that second step because what the “intended state” is for a particular user is often a complex question to answer (consider: does a user mashing a button really want to perform the action repeatedly?). Having a clear record of explicit user interactions makes it easier to work backward to what results the user expected versus what they actually got, which can make a huge difference in how long it takes to restore expected system behavior. With this information, an engineer can analyze and replay inputs to return to a known-good state, even if the outputs were corrupted. This ability is particularly crucial when the state the bug affected is privacy-critical, such as privacy preference data or access control lists (ACLs). Privacy engineers try to envision these scenarios ahead of time and ensure that this safety-net infrastructure is built in from the ground up.
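The replay idea is easier to see with a small example. The following sketch (in Python; the event model, field names, and function are hypothetical, not any particular product’s API) shows how an append-only log of explicit user choices can be replayed to recover each user’s intended privacy settings while skipping actions produced by a code path later found to be buggy.

```python
# A minimal sketch, under assumed names: replay explicit user actions to
# rebuild intended state, ignoring actions written by a known-buggy code path.
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class UserAction:
    user_id: str
    setting: str       # e.g., "location_history"
    value: bool        # what the user explicitly chose
    timestamp: float   # when they chose it
    source: str        # which code path recorded it, e.g., "settings_ui"


def rebuild_intended_state(
    actions: Iterable[UserAction],
    buggy_sources: FrozenSet[str] = frozenset(),
) -> Dict[str, Dict[str, bool]]:
    """Replay explicit user actions in timestamp order, ignoring any action
    recorded by a known-buggy code path, to recover intended settings."""
    intended: Dict[str, Dict[str, bool]] = {}
    for action in sorted(actions, key=lambda a: a.timestamp):
        if action.source in buggy_sources:
            continue  # the user never actually asked for this change
        intended.setdefault(action.user_id, {})[action.setting] = action.value
    return intended
```

Even this toy version shows why the record of explicit interactions matters: the corrupted derived state never has to be trusted, because the intended state can be recomputed from what users actually asked for.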

Privacy and SRE: Common Approaches

Given that reliability engineering and privacy engineering share the goal of ensuring user trust—a goal that requires big-picture thinking about worst-case scenarios—it’s no surprise that both tend to attract people with similar mindsets and outlooks. In both disciplines, the ability to “see broken things” is a key aspect of a good engineer. Although they have different foci (availability versus respect), both good privacy engineers and reliability engineers look at a system and see how it breaks, not how it succeeds. Many of the lessons SREs have learned over time also apply to privacy engineering.

Reducing Toil

One key element that elevates SRE from straightforward operational work to a proper engineering discipline is its focus on reducing human time spent on toil. The same goal can be applied to privacy engineering: frameworks and careful selection of defaults are two opportunities to reduce human toil.

Automation

You might not think that automation—a tried-and-true core concept of SRE—applies to privacy engineering in an immediate and obvious way. Privacy-related matters are judgment calls and human decisions, which means that they can’t just be automated away, right? Actually, automation can be helpful in privacy engineering.

Automation often entails writing a script, program, or service that programmatically eliminates some aspect of human toil. To apply this model to privacy, you might write a script or simple program that checks to make sure auditing settings match up rather than requiring a human to perform manual verification. A simple example: checking that only a specifically designated set of storage buckets are world-readable, and all others are not. A more complex example: enforcing mutual exclusion between access to two datasets, if policy has determined they should never be cross-joined.
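To make the simple example concrete, here is a minimal sketch in Python. The allowlist, bucket names, and the shape of the inventory data are assumptions for illustration; in practice you would feed the check from whatever storage inventory or cloud provider API your organization already uses, and run it on a schedule with alerting.

```python
# Hypothetical allowlist of buckets that are intentionally world-readable.
ALLOWED_PUBLIC_BUCKETS = {"public-downloads", "press-kit"}


def find_acl_violations(buckets):
    """Return the names of buckets that are world-readable but not allowlisted.

    `buckets` is an iterable of (name, is_world_readable) pairs produced by
    whatever inventory tooling you already have; the shape is illustrative.
    """
    return [
        name
        for name, is_world_readable in buckets
        if is_world_readable and name not in ALLOWED_PUBLIC_BUCKETS
    ]


if __name__ == "__main__":
    inventory = [("public-downloads", True), ("user-uploads", True), ("logs", False)]
    violations = find_acl_violations(inventory)
    if violations:
        print("ALERT: unexpected world-readable buckets:", violations)
```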

Default behavior for shared architectures

Automation makes things simpler, reducing the effort required of humans and freeing up their time for other tasks. We can effectively “automate” many privacy improvements by building systems that “do the right thing” by default. In other words, “Make correct easy.” Specifically, we can implement system defaults that handle a great deal of decision making, drastically reducing how often the engineers building a product need to consciously make a decision for which an improper choice could lead to an undesirable privacy outcome.

Instead of requiring the developers building your products to repeatedly face the same questions and make the same decisions, sound privacy engineering enumerates and contemplates these decision points in advance. If there’s a correct or safe choice that applies to 80% of situations, make that choice your system default; doing so relieves many people of that decision-making burden 80% of the time. A shared library, a schema, or a data access layer might all be good places to consider implementing defaults. As a result, you no longer need to spend human time on decisions that could have been made in advance and can instead spend it on the decisions that are actually difficult. By focusing your time, you can dig deeper into the hard problems to find better (and more repeatable) solutions.

Note that one of the creative challenges privacy engineers face is when the common choice and the safest choice are not the same. It might not always be possible to adopt the safest choice as the default, which means downstream developers will want to carefully ensure that their usage matches their intent. Make sure these cases are well documented so that other developers know they exist.
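As a concrete illustration of privacy-preserving defaults, here is a minimal sketch of a data access layer in Python. The class and parameter names are hypothetical; the point is that the safe choice (private) requires no decision at all, while anything broader must be chosen explicitly and justified, which also gives the documented exceptions described above a natural home.

```python
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"
    DOMAIN = "domain"
    PUBLIC = "public"


class RecordStore:
    """Toy data access layer with a privacy-preserving default baked in."""

    def __init__(self):
        self._records = {}

    def put(self, key, value, *, visibility=Visibility.PRIVATE, override_reason=None):
        # The default is the safe choice; broader visibility must be requested
        # explicitly and justified so reviewers can audit the exceptions.
        if visibility is not Visibility.PRIVATE and not override_reason:
            raise ValueError("Non-private visibility requires an override_reason")
        self._records[key] = (value, visibility, override_reason)


store = RecordStore()
store.put("doc-1", "draft")  # private by default, no decision needed
store.put("doc-2", "launch post", visibility=Visibility.PUBLIC,
          override_reason="Approved public announcement")
```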

Frameworks

Reliability and privacy concerns span multiple products and services. As they encounter new systems, engineers tasked with reliability and/or privacy must either find ways to understand a large variety of systems, or find ways to standardize systems so that they don’t need to relearn each one from scratch. Frameworks bake in reliability and privacy best practices in an efficient and scalable way. Factoring both aspects into system design also means that you don’t need to invest the energy and resources to retrofit a product to meet reliability and privacy standards.

So how might you practically apply the concept of frameworks to privacy in your organization? The following examples can get you started thinking about potential approaches:

  • How you handle access control is one of the most important properties of your system. Establishing a framework for handling ACLs ensures that all (new) systems can easily and consistently apply your recommended best practices.

  • Deletion of user data is another canonical concern of privacy engineering. Having a consistent, organized system to propagate deletions throughout your system (including caches, syndication to third-party sites, etc.) helps ensure that you don’t leave data orphaned.
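To illustrate the deletion point above, here is one possible shape for a centralized deletion-propagation framework, sketched in Python with hypothetical handler names. The benefit of a framework like this is that every storage location (primary store, caches, syndication feeds) registers itself once, so a user deletion fans out consistently rather than being reimplemented, or forgotten, per product.

```python
class DeletionPropagator:
    """Fan a user-data deletion out to every registered storage location."""

    def __init__(self):
        self._handlers = []  # each handler is a callable taking a user_id

    def register(self, handler):
        self._handlers.append(handler)
        return handler  # allows use as a decorator

    def delete_user_data(self, user_id):
        failures = []
        for handler in self._handlers:
            try:
                handler(user_id)
            except Exception as exc:  # keep going; report stragglers for retry
                failures.append((handler.__name__, exc))
        return failures  # an empty list means the deletion fully propagated


propagator = DeletionPropagator()


@propagator.register
def delete_from_primary_store(user_id):
    pass  # delete rows keyed by user_id in the primary database


@propagator.register
def purge_caches_and_syndication(user_id):
    pass  # evict cached copies and notify third-party syndication endpoints
```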

The industry still has some work to do when it comes to frameworks. To provide just one specific example, it seems that few startups currently use Role-Based Access Control (RBAC), which is a basic and widely accepted tenet of good privacy engineering.1 Surveying the industry for any kind of standard frameworks (for example, baking in the principle of least privilege for free when turning up new products or services) also turns up few results.
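For readers who haven’t worked with RBAC, the following minimal Python sketch (with made-up roles, users, and permissions) shows the core idea: permissions attach to roles, principals are granted roles, and every access decision flows through that single mapping, which makes least privilege something you can review and narrow in one place.

```python
# Hypothetical role and grant tables; a real system would back these with
# storage, change review, and periodic access audits.
ROLE_PERMISSIONS = {
    "support-readonly": {"tickets:read"},
    "billing-admin": {"invoices:read", "invoices:write"},
}

USER_ROLES = {
    "alice": {"support-readonly"},
    "bob": {"billing-admin"},
}


def is_allowed(user, permission):
    """A principal may act only if one of its roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


assert is_allowed("alice", "tickets:read")
assert not is_allowed("alice", "invoices:write")  # least privilege by default
```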

Efficient and Deliberate Problem Solving

As it has evolved, SRE has worked out many of the kinks that lead to inefficient, disjointed troubleshooting and problem solving. Privacy engineering can embrace these aspects of SRE culture without having to experience the same (sometimes painful) journey. Here are just a couple of examples of how SRE best practices in this area can also directly apply to privacy engineering.

Solve challenges once

When you solve a problem, prevent other people from needing to reinvent the wheel by publicizing your solution. Widely communicate what you did to investigate and solve the problem, the decisions you made, why you made these decisions, the results of your decisions, and when and why others should also adopt this solution. Be sure to document the scope of what you’ve solved in terms of constraints and context. For example, when you address a privacy concern for the United States, that solution might not apply when you expand to the EU.

For example, in the privacy space, you might do the following:

  • Build a system to create the necessary audit trails for user consent screens and then reuse it.

  • Build a differentially private experiment system and then reuse it.2

  • Perform UX studies to determine a clear and concise way of describing the privacy implications of a feature. Then, push to have that language used across all products with that type of feature. (Note that accessibility features intersect in interesting ways here.)3

Find and address root causes

Merely fixing symptoms means that the same issue is likely to recur in the future. Step back, take a look at the bigger picture, and invest in the extra levels of investigation to determine the actual cause of the problem and fix it at its source. As discussed in relation to postmortems,4 an investigation that assigns blame to people is counterproductive. Instead, fix the technical or process factor underlying the issue.

In the privacy space, you might apply this principle in the following ways:

  • If a bug results in a data leak, don’t just fix the bug. Sometimes you might end up revising your documentation, safeguards, or tests; sometimes you might determine that something in a library or framework makes it difficult to do the right thing and therefore needs to be revised.

  • If you find the ACL on a storage directory to be overly broad, don’t just fix that particular ACL. Find the tool that sets up the directories and change its default ACLs to be narrower.

  • Create memorable ways to emphasize privacy concerns to other job functions early on in the design phase of a project. For example, you might have designs for sharing flows that consistently enumerate which user interface elements indicate each of “who-what-where” (who is sharing, what are they sharing, where are they sharing it). Eventually “who-what-where” will become a mantra.

Relationship Management

Although concepts like automation and root-cause analysis might be obvious wins in both the reliability and privacy environments, (as previously mentioned) neither privacy engineering nor SRE exists in a vacuum: both operate in a larger engineering and product ecosystem with multiple other players, each with its own priorities and goals. Note that privacy engineering is cross-functional in ways that differ from reliability engineering: a lot of privacy work is driven not by engineering mandates, but by legal, policy, and compliance needs and business risks. Here, we focus on product team relationships and the ways in which privacy can leverage SRE wisdom.

A key aspect of relationship management when it comes to privacy is making sure that you focus on what has the biggest pragmatic impact for the user, not on features that are flashy or high profile. Privacy is unique in its potential impact and the high stakes involved. Unlike deciding on the perfect color scheme or menu bar for a product, or even making sure that a service doesn’t violate an agreed-upon Service-Level Agreement (SLA), most privacy-related pitfalls tend to be one-way ratchets: mishandling private data typically can’t be undone.

Because of the high stakes, it’s important to foster strong collaborative relationships by providing your partners with actionable and constructive feedback. When giving guidance to product teams, avoid merely pointing out why their products or processes are flawed. Instead, focus on building a shared vision. Express feedback about how to meet your goals in the context of their goals and your larger shared goals. For example, align the privacy-centric goal of transparency and control with the product team’s goal of building trust in its product by describing how doing the right thing will delight users. Your feedback loop is a two-way street, as is your relationship: people on both sides of the equation can save each other time and energy by making their value propositions clear, explicitly acknowledging the other people’s goals, and working together toward a shared goal.

Early Intervention and Education Through Evangelism

After your colleagues are aware that they need to factor reliability and privacy into product decisions, figure out where your talents are best applied and how to scale your expertise effectively by educating others. Spread knowledge about your goals—not just what your goals are, but why you have these particular goals. Instead of simply telling developers, “Your product needs to do x,” tell them why their product needs to do x (“If your product doesn’t do x, the fallout is y and z”). Even better, also point them to other products that do x, with proven benefits a and b.

In the reliability space, this conversation might look something like the following:

Not so great: “We need you to move your service onto this RPC framework.”

Better: “We need you to move your service onto this RPC framework because it will allow us to better monitor requests. That way, we can understand where slowdowns are, and then work to improve product performance.”

In the privacy space, this conversation might look something like the following:

Not so great: “We need your product to integrate with the new privacy settings account dashboard.”

Better: “We need your product to integrate with the new privacy settings account dashboard. Products x, y, and z are already using this new dashboard, so integrating will help users find controls where they expect to find them. Our end goal here is to minimize user frustration by providing a consistent experience across products.”

When it comes to both reliability- and privacy-related matters, when people understand why you’re supplying them with a specific piece of guidance, both that immediate project and future projects will benefit. If teams understand your areas of concern up front, next time they can proactively approach your team early in the project life cycle rather than shortly before they’re hoping to push to production. Again, the key to good communication here is to focus on your shared mission and assume good intent, rather than assigning blame.

Early engagement is always best. Beyond providing frameworks that engage products from the design phase (see “Frameworks”), proactive education is your best (and sometimes only) hope of getting your partners to talk about privacy and reliability at an appropriate time. Otherwise, people don’t even realize that they should engage with privacy engineering until they’re forced to talk to you, which tends to happen last when developing a new product or feature (if it happens at all). Failing to engage with a product in its early stages means the product can veer in directions you don’t want it to go. Having a broad network of people who understand what you care about and why also helps your partners detect outages and other potential issues earlier.

Proactively educating others about privacy also allows you to distribute load. The goal isn’t to avoid work your team should rightfully be handling, but to engage in knowledge sharing that enables you to spend your time on the hard problems that only you can solve. For issues that are clear-cut, after your partners understand what you care about and how to avoid obvious and predictable problems, they won’t need to come to you for these straightforward cases. The product team both saves time and avoids having to potentially redo work in light of privacy matters that they could have considered from square one.

For example, access control is a topic that every product team needs to approach strategically. Instead of starting this discussion from the basics with each and every product team, educate your developers on the benefits of having well-structured access control groups. From a reliability standpoint, this means engineers are less likely to cause an outage when making changes (for example, because some critical workflow is gated by an access path). From a privacy standpoint, it’s important to have good visibility into who has access to your systems so that you can prevent unauthorized access to user data. In a similar vein, you should also make sure that developers design their products to clearly track who’s talking to your service. If you can’t differentiate between the clients accessing your service, you won’t know who to work with to resolve a problem.

You can make better decisions about the actually important and hard questions facing your team when you don’t need to waste time answering basic questions or providing standard design advice repeatedly. Your partners also benefit because they have a quicker turnaround time on their questions.

Nuances, Differences, and Trade-Offs

Despite their similarities, reliability engineering and privacy engineering have some fundamental differences.

Although neither reliability nor privacy is strictly black and white, when it comes to user expectations around reliability, you have more latitude in defining an acceptable threshold that constitutes a “reliability outage.” Privacy “outages” are subject to many external factors, such as how users react to particular events and even legal and regulatory requirements. Even though users might be perfectly happy if a service is available for 99% of a year, they might not be so happy if you guarantee to handle only 99% of their data in the right way. Reliability issues are inherently also more “fixable”: if your service is down, you can fix the problem by getting that service back up and running, but there’s no way to “fix” a compromised database: you can’t unring a bell.

Some design decisions might end up trading one of these aspects for the sake of the other; however, creating technical reliability at the cost of user surprise isn’t necessarily productive. This equation is far more likely to be weighted in favor of privacy. Sometimes, it makes sense to ship a product that’s not “perfectly reliable” for the sake of shipping something usable—mandating that a launch or service is 100% risk-free means that you’d never ship anything. But because of the consequences of an “outage,” privacy doesn’t have the same degree of flexibility. A service that goes down can be restored without lasting ill effects (customers understand an occasional outage), but a privacy incident can have permanent effects. These long-term effects should factor into operational decisions. At the end of the day, you’re creating a product that actual people use, not a hypothetical technical service with abstract “users.”

Conclusion

Reliability engineering and privacy engineering are fundamentally similar in many ways: both disciplines work from the same foundation and toward the same ultimate goal. Both can leverage many of the same best practices and approaches. Both are sufficiently important to users, and hard enough to get right, that they should be treated as proper engineering disciplines, not as afterthoughts. And both should be ingrained in your company or organization’s culture. Although their states of maturity may differ, SRE and privacy engineering are living, breathing, and quickly evolving fields; as their core tenets gain wider adoption across the industry, both must evolve alongside user expectations.

Google teams frequently make use of the techniques described in this chapter to build world-class products that respect user privacy. SREs are in an ideal position to advocate for user privacy, even if they don’t explicitly work in the privacy space (and particularly if your organization can’t dedicate specific resources to privacy engineering). Working from the base of effective problem-solving skills, privacy engineers combine those skills with empathy and societal context to tackle a different realm of user-centric challenges. As any well-seasoned SRE knows, metrics are just a means to an end; the user’s experience is what really matters.

Contributor Bios

Betsy Beyer is a technical writer for Google in New York City specializing in site reliability engineering. In addition to editing Site Reliability Engineering (O’Reilly, 2016), she has previously written documentation for Google’s Data Center and Hardware Operations teams in Mountain View and across its globally distributed data centers. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.

Amber Yust worked in Google SRE before joining Google’s privacy effort in 2014. As a staff privacy engineer, she now leads a team working to engineer reliable privacy into Google’s products at a fundamental level.

1 Also referred to as scoped access: narrowing the scope of access granted (both to humans and to production roles) not only helps protect privacy, it also reduces the potential impact of a security breach or production accident. This concept is sometimes alluded to in the general sysadmin advice of “don’t run everything as root,” but here is taken further and structured.

2 For an example of a system that’s similar in spirit, see the code for Rappor, a privacy reporting system.

3 Good examples of this standardization include app permission prompts on Android and iOS.

4 See the SRE book’s Chapter 15, “Postmortem Culture: Learning from Failure,” as well as “Postmortem Action Items: Plan the Work and Work the Plan.”
