Chapter 5. eBPF in Cloud Native Environments

The cloud native approach to computing has taken off exponentially in recent years. In this chapter, I’ll discuss why eBPF is so well-suited to tooling for cloud native environments. To keep things concrete, I’ll refer to Kubernetes, but the same concepts apply to any platform that uses containers.

One Kernel per Host

To understand why eBPF is so powerful in the cloud native world, you’ll need to be very clear on one concept: there is only one kernel per machine (or virtual machine), and all the containers running on that machine share the same kernel,1 as shown in Figure 5-1. The same kernel is involved with and aware of all the application code running on any given host machine.

By instrumenting the kernel, as we do when using eBPF, we can simultaneously instrument all the application code running on that machine. When we load an eBPF program into the kernel and attach it to an event, it gets triggered irrespective of which process is involved with the event.

Figure 5-1. All the containers on the same host share a single kernel
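To make this concrete, here is a minimal sketch of a kernel-side eBPF program, written against the libbpf helper headers (an illustrative example, not a listing from this book). It attaches to the execve tracepoint, so it fires for every new program executed anywhere on the host, in any container or in none, and it uses the bpf_get_current_cgroup_id() helper as one way of attributing each event back to a container:

/* A sketch of host-wide visibility: this single program, loaded once per
 * node, observes every execve on the machine, regardless of which pod or
 * container the process belongs to. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    char comm[16];
    /* The cgroup ID is one way to map an event back to a container. */
    __u64 cgroup_id = bpf_get_current_cgroup_id();

    bpf_get_current_comm(comm, sizeof(comm));
    bpf_printk("exec: %s in cgroup %llu", comm, cgroup_id);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

Loaded once, a program like this covers every pod on the node; there is nothing to inject into individual pods to get that coverage.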

eBPF Versus the Sidecar Model

Prior to eBPF, most observability and security tooling for Kubernetes used the sidecar model. This model allows you to deploy the instrumentation in a separate container but within the same pod as the application. When this approach was invented, it was a step forward because it meant you no longer had to write instrumentation code directly in the app. Simply by being deployed in the pod, the sidecar gets visibility over the other containers in that pod. The process of injecting sidecars is usually automated, so this provides a mechanism for ensuring all your apps are instrumented.

Each sidecar container consumes resources, and this cost is multiplied by the number of pods with the sidecar injected. It can be very significant: if each sidecar needs its own copy of routing information or policy rules, for example, that duplication is wasteful. (For more on this, Thomas Graf wrote a comparison of sidecars with eBPF for service mesh.)

Another issue with sidecars is that you can’t guarantee that every application on the machine has been instrumented correctly. Imagine that an attacker manages to compromise one of your hosts and starts a separate pod to run, say, a cryptocurrency miner. They are unlikely to do you the courtesy of instrumenting their mining pod with your sidecar observability or security tools. You’ll need a separate system to be aware of this activity.

But that same cryptocurrency miner shares the kernel with the legitimate pods running on that host. If you’re using eBPF-based instrumentation, as illustrated in Figure 5-2, the miner is automatically subject to it.

Figure 5-2. Sidecars can only observe activity in their own pods, but eBPF programs can observe all activity

eBPF and Process Isolation

Instead of per-pod sidecars, I’m advocating the consolidation of functionality into a single per-node, eBPF-based agent. If that agent has access to all the pods running on the machine, isn’t that a security risk? Haven’t we lost the isolation between applications that might prevent them from interfering with each other?

As someone who has spent a lot of time working in container security, I can relate to these concerns, but it’s important to dig into the underlying mechanisms to really understand why it’s not the flaw that it might appear to be at first.

The important thing to remember is that those pods all share one kernel, and the kernel does not have an innate understanding of pods or containers. Instead, the kernel operates on processes and uses cgroups and namespaces to isolate processes from each other. Those structures stop processes in user space from seeing or interfering with each other, as policed by the kernel. As soon as data is being processed within the kernel (for example, being read from disk or sent to a network), you are relying on the kernel behaving correctly. It is only the kernel's own code that says it should respect, for example, file permissions. There's no extra magic authoritative layer that would stop the kernel from ignoring file permissions and reading data out of any file it wanted to access; it's simply that the kernel itself is coded not to do so.

The security controls that exist on a Linux system assume that the kernel itself can be trusted. They are there to protect against bad behavior from code running in user space.

We saw in Chapter 2 that the eBPF verifier ensures that any eBPF program only attempts to access memory that it should have access to. The verifier checks that the program can’t possibly go beyond its remit, including ensuring that the memory is owned by the current process or is part of the current network packet. This means that eBPF code is subject to much stricter controls than the surrounding kernel code, which doesn’t have to pass any kind of verifier step.
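As an illustration (again a sketch, not a listing from this chapter), here is the kind of explicit bounds check the verifier insists on before an XDP program is allowed to read packet contents; without the check, the program is rejected at load time:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_peek_ethertype(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Prove to the verifier that the Ethernet header lies inside the
     * packet; without this check, the read of eth->h_proto is rejected. */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    bpf_printk("ethertype: 0x%x", bpf_ntohs(eth->h_proto));
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";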

If an attacker escapes a containerized application onto the node and is able to escalate privileges, that attacker can compromise other applications on the same node. Since such escapes are not unknown, as a container security expert, I would not recommend running sensitive applications on a shared machine alongside untrusted applications or users without some level of additional security tooling. For highly sensitive data, you might not even want to run within a virtual machine on the same bare metal as untrusted users. But if you are prepared to run applications side by side on the same virtual machine (which is completely reasonable for many applications that are not particularly sensitive), then eBPF does not add risk beyond what already exists by sharing a kernel.

Of course, a malicious eBPF program could wreak all kinds of havoc, and it would certainly be easy to write eBPF code that behaves badly, for example, by taking a copy of every network packet and sending it to an eavesdropper. By default, nonroot users don't have permission to load eBPF programs,2 and you should only grant users or software systems this permission if you really trust them, much as you would with root permissions. So you do have to be careful about the provenance of the eBPF code you run (and there is an initiative underway to support signature checking of eBPF programs to help with this). You can also use eBPF programs to keep a watchful eye on other eBPF programs!
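If you want to see this restriction in action, the following user space sketch (a hypothetical example, not from this book) asks the kernel to load a trivial tracing program through the raw bpf() syscall. Run without CAP_BPF (and, for this program type, CAP_PERFMON), the load is expected to fail with EPERM; run as root or with those capabilities granted, it succeeds:

/* check_bpf_load.c: try to load a minimal tracepoint-type BPF program.
 * Without sufficient privileges, the kernel refuses with EPERM. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
    /* Smallest valid BPF program: r0 = 0; exit */
    struct bpf_insn insns[] = {
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
        { .code = BPF_JMP | BPF_EXIT },
    };

    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
    attr.insns     = (__u64)(unsigned long)insns;
    attr.insn_cnt  = sizeof(insns) / sizeof(insns[0]);
    attr.license   = (__u64)(unsigned long)"GPL";

    int fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd < 0)
        printf("BPF_PROG_LOAD failed: %s\n", strerror(errno));
    else
        printf("program loaded, fd=%d\n", fd);
    return 0;
}

The same restriction applies whatever loader you use, since it is the bpf() syscall itself that is gated on these privileges.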

Now that you have an overview of why eBPF is a powerful basis for cloud native instrumentation, the next chapter gives you some concrete examples of eBPF tooling from the cloud native ecosystem.

1 This is nearly always true, unless you are using a virtualization approach like Kata Containers, Firecracker, or unikernels, where each “container” runs in its own virtual machine.

2 The Linux capability CAP_BPF grants permission to load BPF programs.
