Chapter 5. Networking

In this chapter we will focus on networking aspects of your workloads. We will first review the defaults that Kubernetes proper comes equipped with and what else is readily available due to integrations. We cover networking topics including East-West and North-South traffic (intra-pod and inter-pod communication, communication with the worker nodes, and cluster-external communication), workload identity, and encryption on the wire.

In the second part of this chapter we have a look at two more recent additions to the Kubernetes networking toolbox: service meshes and the Linux kernel extension mechanism eBPF. We try to give you a rough idea of if, how, and where you can benefit from both going forward.

As you can see in Figure 5-1 there are many moving parts in the networking space.

Network layer model
Figure 5-1. Network layer model

The good news is that most if not all of the protocols should be familiar to you, since Kubernetes uses the standard Internet Engineering Task Force (IETF) suite of networking protocols, from the Internet Protocol to the Domain Name System (DNS). What changes, really, is the scope and generally the assumptions about how the protocols are used. For example, when deployed on a world-wide scale, it makes sense to make the time-to-live (TTL) of a DNS record months or longer.

In the context of a container that may run for hours or days at best, this assumption doesn’t hold anymore. Clever adversaries can exploit such assumptions, and as you should know by now, that’s exactly what the Captain would do.

In this chapter we will focus on the protocols most often used in Kubernetes and their weak points with respect to workloads.

As Captain Hashjack likes to say, “loose lips sink ships”, so we’ll first explore the permissive networking defaults, then show how to attack them, and finally discuss the controls you can implement to detect and mitigate these attacks.

Defaults

By defaults we mean the default configuration values of the components that you get when you use Kubernetes from source, in an unmodified manner.

From a networking perspective, workloads in Kubernetes find the following setup:

  • Flat topology: every pod can see and talk to every other pod in the cluster.

  • No security context: workloads can escalate to the host’s network interface controller (NIC).

  • No environmental restrictions: workloads can query their host and cloud metadata.

  • No identity for workloads.

  • No encryption on the wire (between pods and cluster-externally).

While the above list might look scary, a different way to look at it can make it easier to assess the risks present; have a look at Figure 5-2.

Kubernetes networking overview
Figure 5-2. Kubernetes networking overview

As depicted in Figure 5-2, the main communication paths in Kubernetes are intra-pod, inter-pod, pod-to-worker node, and cluster-external traffic.

Let’s now have a closer look at these communication paths and other networking-relevant defaults in Kubernetes, including “The state of the ARP”, “No security context”, “No workload identity”, and “No encryption on the wire”.

Note

There are some aspects of the networking space that depend heavily on the environment Kubernetes is used in. For example, when using hosted Kubernetes from one of the cloud providers, the control plane and/or data plane may or may not be publicly available. If you are interested in learning more about how the big three handle this, have a look at their respective documentation.

Since this is not an intrinsic property of Kubernetes and there are many possible combinations, we decided to exclude this topic from our discussion in this chapter.

So, are you ready to learn about the Kubernetes networking defaults?

Intra-pod networking

The way intra-pod networking in Kubernetes works is as follows: an implicit, so-called pause container in a pod (cp in Figure 5-3) sets up a Linux network namespace.

Other containers in the pod, such as init containers (like ci1 and ci2) and the main application container and sidecars, such as proxies or logging containers (for example, c1 to c3), then join the pause container’s network and IPC namespaces.

Internals of a Kubernetes pod
Figure 5-3. Internals of a Kubernetes pod

The pause container has network bridge mode enabled, and all the other containers in the pod share its namespaces via container mode.
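To see this shared namespace in action, here is a minimal sketch (the pod, container names, and images are placeholders of ours, not part of the setup used later in this chapter): a pod with two containers, where the sidecar reaches the nginx container via localhost.

apiVersion: v1
kind: Pod
metadata:
  name: shared-netns
spec:
  containers:
  - name: app
    image: nginx:alpine         # listens on port 80 inside the shared network namespace
  - name: sidecar
    image: busybox:1.33
    command: ["sleep", "3600"]  # idle container we can exec into

$ kubectl exec shared-netns -c sidecar -- wget -qO- http://127.0.0.1:80

If everything works, the sidecar receives the nginx welcome page, since both containers share the pause container’s network namespace.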

As discussed in Chapter 2, pods were designed to make it easy to lift and shift existing applications into Kubernetes, but the security implications are sobering. Ideally, you rewrite the application so that the tight coupling of containers in a pod is not necessary, or you deploy traditional tooling in the context of a pod.

While the latter seems like a good idea, initially, do remember that this is a stopgap measure at best. Once the boundaries are clear and effectively every microservice is deployed in its own pod, you can go ahead and use the techniques discussed in the next sections.

In addition, no matter if you’re looking at defense in depth in the context of a pod or cluster-wide, you can employ a range of dedicated container security open source and commercial offerings, see also the Appendix of the book.

Inter-pod traffic

In a Kubernetes cluster, by default, every pod can see and talk to every other pod. From a security perspective this default is a nightmare (or a free ride, depending on which side you are on), and we cannot emphasize enough how dangerous this fact is.

No matter what your threat model is, this “all traffic is allowed” policy for both inter-pod and external traffic represents one giant attack vector. In other words: you should never rely on the Kubernetes defaults in the networking space. That is, you should never ever run a Kubernetes cluster without restricting network traffic in some form or shape. For a practical example of how you can go about this, have a look at “Traffic flow control”.

Pod-to-worker node traffic

If not disabled, workloads can query the worker node (host) they are running on as well as the (cloud) environments they are deployed into.

There is no default protection for worker nodes, which are routable from the pod network (CNI). Further, the worker nodes may be able to access cloud resources, data stores, and API servers. Some cloud providers, notably Google, offer solutions for this issue; see for example Shielded GKE Nodes.

For cloud environments in general, good practices exist. For example, Amazon EKS recommends restricting access to instance metadata, and equally GKE documents how to protect cluster metadata.
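As a minimal sketch of what such a restriction can look like, the following network policy allows all egress traffic except to the usual link-local metadata address 169.254.169.254 (this assumes your CNI plugin enforces network policies; some environments need additional, provider-specific controls):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-metadata
spec:
  podSelector: {}             # applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32  # block the instance metadata endpoint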

Further, commercial offerings like Nirmata’s Virtual Clusters and Workload Policies can be used in this context.

Cluster-external traffic

To allow pods to communicate with cluster-external endpoints, Kubernetes has over time added a number of mechanisms; the most recent and widely used one is called an Ingress. An Ingress allows for Layer 7 (HTTP) routing, whereas for other use cases, such as Layer 3/4 routing, you would need to use older, less convenient methods; see also Publishing Services (ServiceTypes) in the docs.

In order to use the Ingress resource, you will need to pick an ingress controller from the many, oftentimes open source, options available.

In addition, cloud providers usually provide their own solutions, integrated with their managed load balancing services.

Encryption on the wire (TLS) is nowadays almost the default, and most Ingress solutions support it out of the box. Alternatively, you can use a service mesh for securing your North-South traffic (“Service Meshes”).
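For illustration, here is a hedged sketch of TLS termination at the Ingress; the hostname, secret name, and backend service are hypothetical, and the annotation depends on the ingress controller you picked:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: tls-example
  annotations:
    kubernetes.io/ingress.class: ambassador
spec:
  tls:
  - hosts:
    - shop.example.com
    secretName: shop-example-com-tls   # certificate and key stored as a TLS secret
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: shop
          servicePort: 80

The referenced secret is a standard kubernetes.io/tls secret; tooling such as cert-manager can create and rotate it for you.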

Last but not least, on the application level you might want to consider using a Web Application Firewall (WAF), as offered by most cloud providers or standalone, such as Wallarm’s offering.

More and more practitioners are sharing their experiences in this space, so keep an eye out for blog posts and CNCF webinars covering this topic, for example, Shaping Chick-fil-A One Traffic in a Multi-Region Active-Active Architecture.

The state of the ARP

Address Resolution Protocol (ARP) is a link layer protocol used by the Internet Protocol (IP) to map IP network addresses to hardware (MAC) addresses. In her KubeCon NA 2019 talk CAP_NET_RAW and ARP Spoofing in Your Cluster: It’s Going Downhill From Here, Liz Rice showed how the defaults allow us to open raw network sockets and how this can lead to issues.

This involves the following steps:

  • Using ARP and DNS to fool a victim pod into visiting a fake URL.

  • This is possible due to the way Kubernetes handles local FQDNs.

  • It requires that CAP_NET_RAW is available to a pod.

For more details, see the Aqua Security blog post DNS Spoofing on Kubernetes Clusters.

The good news is, there are defenses available to mitigate ARP-based attacks and spoil the Captain’s mood: drop the CAP_NET_RAW capability from your workloads and restrict traffic with network policies.
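As a sketch, dropping the capability on the pod level could look like this (pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: no-raw-sockets
spec:
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      capabilities:
        drop:
        - NET_RAW    # no raw sockets, so no ARP or DNS spoofing from this pod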

How can you tell if you’re affected? Use kube-hunter, for example.

No security context

By default, workloads can escalate to the NIC of the worker node they are running on. For example, when running privileged containers, one can escape from the container using kernel modules. Further, as the Microsoft Azure team pointed out in their Threat matrix for Kubernetes:

Attackers with network access to the host (for example, via running code on a compromised container) can send API requests to the Kubelet API. Specifically querying https://[NODE IP]:10255/pods/ retrieves the running pods on the node. https://[NODE IP]:10255/spec/ retrieves information about the node itself, such as CPU and memory consumption.

Naturally, one wants to avoid the above scenarios, and one way to go about this is to apply pod security policies as discussed in the runtime policies section.

For example, the Baseline/Default policy has the following defined:

  • Sharing the host namespaces must be disallowed: spec.hostNetwork, spec.hostPID, spec.hostIPC

  • Privileged pods disable most security mechanisms and must be disallowed: spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged

  • HostPorts should be disallowed or at minimum restricted to a known list: spec.containers[*].ports[*].hostPort and spec.initContainers[*].ports[*].hostPort
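Putting these three requirements together, a minimal pod spec that would pass the Baseline policy might look like the following sketch (name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: baseline-compliant
spec:
  hostNetwork: false        # do not share the host network namespace
  hostPID: false            # do not share the host PID namespace
  hostIPC: false            # do not share the host IPC namespace
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      privileged: false     # keep kernel-level security mechanisms in place
    ports:
    - containerPort: 80     # note: no hostPort mapping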

In addition, there are a number of commercial offerings, such as Palo Alto Networks Prisma Cloud (formerly Twistlock) that you can use to harden your worker nodes, in this context.

No workload identity

By default, Kubernetes does not assign an identity to services. SPIFFE/SPIRE can be used to manage workload identities and enable mTLS.

SPIFFE (Secure Production Identity Framework for Everyone) is a collection of specifications for securely identifying workloads.

It provides a framework enabling you to dynamically issue an identity to a service across environments by defining short-lived cryptographic identity documents, called SPIFFE Verifiable Identity Documents (SVIDs), via an API. Your workloads in turn can use these SVIDs when authenticating to other workloads. For example, an SVID can be used to establish a TLS connection or to verify a JWT token.
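For illustration, a SPIFFE ID is a URI made up of a trust domain and a path identifying the workload within it; in an X.509 SVID it is carried in the certificate’s Subject Alternative Name. A hypothetical ID following the namespace/service account convention commonly used with SPIRE on Kubernetes could look like this:

spiffe://example.org/ns/npdemo/sa/default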

No encryption on the wire

For workloads in regulated industries, that is, any kind of app that is required to conform to a (government issued) regulation, encryption on the wire—or encryption in transit, as it’s sometimes called—is typically one of the requirements. For example, if you have a Payment Card Industry Data Security Standard (PCI DSS) compliant app as a bank, or a Health Insurance Portability and Accountability Act (HIPAA) compliant app as a health care provider, you will want to make sure that the communication between your containerized microservices is protected against sniffing and person-in-the-middle attacks.

These days, the Transport Layer Security (TLS) protocol as defined in RFC 8446 and older IETF paperwork is usually used to encrypt traffic on the wire. It uses asymmetric encryption to agree on a shared secret negotiated at the beginning of the session (the handshake) and in turn symmetric encryption to encrypt the workload data. This setup is a nice performance versus security tradeoff.

While control plane components such as the API server, etcd, or a kubelet can rely on a PKI infrastructure out of the box, providing APIs and good practices for certificates, the same is sadly not true for your workloads.

Tip

You can see the API Server’s hostname, and any IPs encoded into its TLS certificate, with openssl.
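As a sketch, assuming the API server endpoint from your kubeconfig is 127.0.0.1:6443 (substitute your own), you could inspect the certificate like this:

$ echo | openssl s_client -connect 127.0.0.1:6443 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'

The Subject Alternative Name entries list the DNS names and IP addresses the certificate is valid for.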

By default, the traffic between pods and to the outside world is not encrypted.

To mitigate this, enable workload encryption on the wire, for example with Calico using the WireGuard VPN, or with Cilium, which supports both WireGuard and IPsec.
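As a sketch, at the time of writing the Calico documentation describes enabling WireGuard roughly as follows (double-check the current docs, since the resource and field names may change, and WireGuard must be available on your nodes):

$ calicoctl patch felixconfiguration default --type='merge' \
  -p '{"spec": {"wireguardEnabled": true}}'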

Another option to provide not only this sort of encryption but also workload identity (see “No workload identity”) is a service mesh; more on this in “Service Meshes”.

With the defaults out of the way, let’s move on to the threat modelling for the networking space.

Threat model

The threat model in the networking space (cf. “Starting to threat model”), that is, the collection of identified networking vulnerabilities ranked according to the risk they pose, is what we’re focusing on in the following.

So, what is the threat model we consider in the networking space, with respect to workloads? What are our assumptions about what attackers could do to our precious workloads and beyond to the infrastructure?

The following observations should give you an idea about potential threat models. We illustrate these scenarios with some examples of past attacks, covering the 2018 to 2020 time frame:

  • Using the front door, for example via an ingress controller or a load balancer, and then either pivoting or performing a denial-of-service attack, as observed in CVE-2020-15127.

  • Using developer access paths like kubectl cp (CVE-2019-11249) or developer environments such as Minikube, as witnessed in CVE-2018-1002103.

  • Launching a pod with access to host networking or unnecessary capabilities, as we discuss further in “The state of the ARP”.

  • Leveraging a compromised workload to connect to another workload.

  • Port-scanning CNI plugins and using this information to identify vulnerabilities, for example CVE-2019-9946.

  • Attacking a control plane component such as the API server and etcd, or a kubelet or kube-proxy on the worker node, for example CVE-2020-8558, CVE-2019-11248, CVE-2019-11247, and CVE-2018-1002105.

  • Performing server-side request forgery (SSRF), for example concerning the hosting environment, like a cloud provider’s VMs.

  • Mounting man-in-the-middle attacks, such as seen in the context of IPv6 routing; see also CVE-2020-10749.

Now that we have a basic idea of the potential threat model, let’s see how the defaults can be exploited and defended against, in turn.

Traffic flow control

We’ve seen the networking defaults and what kind of communication paths are present in Kubernetes. In the following, we walk you through an end-to-end setup and show you how to secure external traffic using network policies.

The setup

To demonstrate the networking defaults in action, let’s use kind, a tool for running local Kubernetes clusters using Docker containers.

So let’s create a kind cluster with networking prepared for Calico as well as Ingress enabled, see also the docs. We are using the following config:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true" 1
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
networking:
  disableDefaultCNI: true 2
  podSubnet: 192.168.0.0/16 3
1

Enable Ingress for cluster.

2

Disable the native kindnet.

3

In preparation to install Calico, set to its default subnet.

Assuming the above YAML snippet is stored in a file called cluster-config.yaml, you can now create the kind cluster as follows:

$ kind create cluster --name cnnp \
                      --config cluster-config.yaml
Creating cluster "cnnp" ...

Note that if you are doing this for the first time, the above output might look different, and it can take several minutes to pull the respective container images.

Next we install and patch Calico to make it work with kind. Kudos to Alex Brand for putting together the necessary patch instructions:

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
...
serviceaccount/calico-kube-controllers created

$ kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
daemonset.apps/calico-node env updated

And to verify if everything is up and running as expected:

$ kubectl -n kube-system get pods | grep calico-node
calico-node-2j2wd                            0/1     Running             0          18s
calico-node-4hx46                            0/1     Running             0          18s
calico-node-qnvs6                            0/1     Running             0          18s

Before we can deploy our app, we need one last bit of infrastructure in place, a load balancer, making the pods available to the outside world (your machine).

For this we use Ambassador as an ingress controller:

$ kubectl apply -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-crds.yaml && 
  kubectl apply -n ambassador -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-kind.yaml && 
  kubectl wait --timeout=180s -n ambassador --for=condition=deployed ambassadorinstallations/ambassador
customresourcedefinition.apiextensions.k8s.io/ambassadorinstallations.getambassador.io created
namespace/ambassador created
configmap/static-helm-values created
serviceaccount/ambassador-operator created
clusterrole.rbac.authorization.k8s.io/ambassador-operator-cluster created
clusterrolebinding.rbac.authorization.k8s.io/ambassador-operator-cluster created
role.rbac.authorization.k8s.io/ambassador-operator created
rolebinding.rbac.authorization.k8s.io/ambassador-operator created
deployment.apps/ambassador-operator created
ambassadorinstallation.getambassador.io/ambassador created
ambassadorinstallation.getambassador.io/ambassador condition met

Now we can launch the application, a webserver. First off, we want to do all of the following in a dedicated namespace called npdemo, so let’s create one:

$ kubectl create ns npdemo
namespace/npdemo created

Next, create a YAML file called workload.yaml that defines a deployment, a service, and an ingress resource, in total representing our workload application:

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: main
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
---
kind: Ingress 1
apiVersion: extensions/v1beta1
metadata:
  name: mainig
  annotations:
    kubernetes.io/ingress.class: ambassador
spec:
  rules:
  - http:
      paths:
      - path: /api
        backend:
          serviceName: nginx
          servicePort: 80
1

We configure the ingress in a way that if we hit the /api URL path we expect it to route traffic to our nginx service.

Next, you want to create the resources defined in workload.yaml by using:

$ kubectl -n npdemo apply -f workload.yaml
deployment.apps/nginx created
service/nginx created
ingress.extensions/mainig created

When you now try to access the app as exposed via the ingress resource above, you should be able to do the following (note that we’re only counting the lines returned to verify we get something back):

$ curl -s 127.0.0.1/api | wc -l
  25

Wait. What just happened? We put an ingress in front of the NGINX service and it happily receives traffic from outside? That can’t be good.

Network policies to the rescue!

So, how can we keep the Captain and his crew from getting their dirty paws on our cluster? Network policies are coming to our rescue. While we will cover policies in a dedicated chapter (see Chapter 8), we point out network policies and their usage here since they are so useful and, given the “by default all traffic is allowed” attitude of Kubernetes, one can argue almost necessary.

While Kubernetes allows you to define and apply network policies out-of-the-box, you need something that enforces the policies you define and that’s the job of a provider.

For example, in the following we will be using Calico, however there are many more options available, such as the eBPF-based solutions discussed in “eBPF”.

We shut down all traffic with the following Kubernetes network policy in a file called, fittingly, np-deny-all.yaml:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all
spec:
  podSelector: {} 1
  policyTypes:
  - Ingress 2
1

Selects the pods in the same namespace, in our case all.

2

Disallow any ingress traffic.

Tip

Network policies are notoriously difficult to get right, so in this context, you may want to check out one of the interactive policy editors and recipe collections available online.

So let’s apply the above network policy and see if we can still access the app from outside of the cluster:

$ kubectl -n npdemo apply -f np-deny-all.yaml
networkpolicy.networking.k8s.io/deny-all created

$ kubectl -n npdemo describe netpol deny-all
Name:         deny-all
Namespace:    npdemo
Created on:   2020-09-22 10:39:27 +0100 IST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Not affecting egress traffic
  Policy Types: Ingress

And this should fail now, based on our network policy (giving it a 3 second time out, just to be sure):

$ curl --max-time 3 127.0.0.1/api
curl: (28) Operation timed out after 3005 milliseconds with 0 bytes received

Tip

If you only have kubectl available, you can still make raw network requests. Of course, a tool like curl shouldn’t be in your container image in the first place!
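For example, one way to test in-cluster connectivity with nothing but kubectl is to launch a throwaway pod; the curlimages/curl image here is an arbitrary choice, not something from our setup:

$ kubectl -n npdemo run tmp --rm -it --restart=Never \
  --image=curlimages/curl -- curl --max-time 3 http://nginx

With the deny-all policy in place, this request times out as well, since the policy isolates the nginx pods from in-cluster traffic, too.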

We hope by now you get an idea of how dangerous the defaults are (all network traffic to and from pods is allowed) and how you can defend against them.
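Of course, denying all traffic is rarely the end state. Typically you follow up with narrowly scoped policies that re-allow only the paths you need. As a sketch, the following would admit traffic to the nginx pods solely from the ingress controller’s namespace; the namespace label we match on is hypothetical, so check the labels in your cluster first:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-ingress
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/name: ambassador   # hypothetical label on the ambassador namespace
    ports:
    - protocol: TCP
      port: 80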

Learn more about network policies, including recipes as well as tips and tricks via the resources we put together in the Appendix of the book.

Tip

In addition to network policies some cloud providers offer other native mechanisms to restrict traffic from/to pods, for example, see AWS security groups for pods.

Finally, don’t forget to clean up your Kubernetes cluster using kind delete cluster --name cnnp, once you’re done exploring the topic of network policies.

Now that we’ve seen a concrete networking setup in action, let’s move on to a different topic: service meshes. This relatively recent technology can help you address some of the not-so-secure defaults pointed out earlier, including workload identity and encryption on the wire.

Service Meshes

A somewhat advanced topic, a service mesh is in a sense complementary to Kubernetes and can be beneficial in a number of use cases. Let’s have a look at how the most important workload-level networking issues can be addressed using a service mesh.

Concept

A service mesh, as conceptually shown in Figure 5-5, is, as per its creators, a collection of user-space proxies in front of your apps along with a management process to configure said proxies.

The proxies are referred to as the service mesh’s data plane, and the management process as its control plane. The proxies intercept calls between services and do something interesting with or to these calls, for example, disallowing a certain communication path or collecting metrics from the call. The control plane, on the other hand, coordinates the behavior of the proxies and provides the administrator with an API.

Service mesh concept
Figure 5-5. Service mesh concept

Options and uptake

At time of writing, a number of service meshes exist as well as proposed quasi standards for interoperability, such as the CNCF project Service Mesh Interface or work of the Envoy-based Universal Data Plane API Working Group (UDPA-WG).

While it is early days, we witness a certain uptake, especially out of security considerations (cf. Figure 5-6). For example, The New Stack (TNS) reports in its 2020 Service Mesh survey:

A third of respondents’ organizations are using service meshes to control communications traffic between microservices in production Kubernetes environments. Another 34% use service mesh technology in a test environment, or are piloting or actively evaluating solutions.

TNS 2020 service mesh survey excerpt
Figure 5-6. TNS 2020 service mesh survey excerpt

Going forward, many exciting application areas and nifty defense mechanisms based on service meshes are possible, for example Identity Federation for Multi-Cluster Kubernetes and Service Mesh or using OPA in Istio. That said, many end users are not yet ready to go all in and/or are in a holding pattern, waiting for cloud and platform providers to make the data plane of the service mesh part of the underlying infrastructure. Alternatively, the data plane may be implemented on the operating system level, for example, using eBPF.

Case study: mTLS with Linkerd

Linkerd is a graduated CNCF project, originally created by Buoyant.

Linkerd automatically enables mutual Transport Layer Security (mTLS) for most HTTP-based communication between meshed pods. Let’s see that in action.

In order to follow along, install Linkerd in a test cluster. We’re using kind in the following and assume you have the Kubernetes cluster set up and configured as well as the Linkerd CLI installed:

$ linkerd check --pre
kubernetes-api
...
Status check results are √

Now that we know that we’re in a position to install Linkerd, let’s go ahead and do it:

$ linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
...
deployment.apps/linkerd-grafana created

And finally verify the install:

$ linkerd check
kubernetes-api
...
Status check results are √

Great! All up and running. You could have a quick look at the Linkerd dashboard using linkerd dashboard &, which should show something like what is depicted in Figure 5-7.

Linkerd dashboard showing example traffic stats
Figure 5-7. Linkerd dashboard showing example traffic stats

OK, back to mTLS: once we have enabled the mesh in the respective namespaces, it should be impossible for us, even from within the cluster, to directly talk to a service using, say, curl and doing an HTTP query. Let’s see how that works.

In the following we’re reusing the setup from “Inter-pod traffic”, but you can really use any workload that exposes an HTTP service within the cluster.

First, we need to enable the mesh, or meshify, as the good folks from Buoyant call it:

$ kubectl get -n npdemo deploy -o yaml | 
          linkerd inject - | kubectl apply -f -


$ kubectl get -n ambassador deploy -o yaml | 
          linkerd inject - | kubectl apply -f -

Now we can validate our mTLS setup using tshark. First, let’s install the emojivoto sample application with a debug sidecar:

$ curl -sL https://run.linkerd.io/emojivoto.yml \
  | linkerd inject --enable-debug-sidecar - \
  | kubectl apply -f -
namespace "emojivoto" injected
...
deployment.apps/web created

Once the sample app is up and running, we can open a remote shell into the attached debug container that Linkerd kindly put there for us:

$ kubectl -n emojivoto exec -it \  1
  $(kubectl -n emojivoto get po -o name | grep voting) \  2
  -c linkerd-debug -- /bin/bash 3
1

Connect to pod for interactive (terminal) use.

2

Provide pod name for the exec command.

3

Target the linkerd-debug container in the pod.

Now, from within the debug container we use tshark to inspect the packets on the NIC and expect to see TLS traffic:

root@voting-57bc56-s4l:/# tshark -i any \  1
                                 -d tcp.port==8080,ssl \  2
                          | grep -v 127.0.0.1 3

Running as user "root" and group "root". This could be dangerous.
Capturing on 'any'

 1 0.000000000 192.168.49.192  192.168.49.231 TCP 76 41704  4191 [SYN] Seq=0 Win=28000 Len=0 MSS=1400 SACK_PERM=1 TSval=42965802 TSecr=0 WS=128
 2 0.000023419 192.168.49.231  192.168.49.192 TCP 76 4191  41704 [SYN, ACK] Seq=0 Ack=1 Win=27760 Len=0 MSS=1400 SACK_PERM=1 TSval=42965802 TSecr=42965802 WS=128
 3 0.000041904 192.168.49.192  192.168.49.231 TCP 68 41704  4191 [ACK] Seq=1 Ack=1 Win=28032 Len=0 TSval=42965802 TSecr=42965802
 4 0.000356637 192.168.49.192  192.168.49.231 HTTP 189 GET /ready HTTP/1.1
 5 0.000397207 192.168.49.231  192.168.49.192 TCP 68 4191  41704 [ACK] Seq=1 Ack=122 Win=27776 Len=0 TSval=42965802 TSecr=42965802
 6 0.000483689 192.168.49.231  192.168.49.192 HTTP 149 HTTP/1.1 200 OK
 ...
1

Listen on all available network interfaces for live packet capture.

2

Decode any traffic running over port 8080 as TLS.

3

Ignoring 127.0.0.1 (localhost) as this traffic will always be unencrypted.

Yay, it works, encryption on the wire for free! And with this we’ve completed the mTLS case study.

If you want to learn more about how to use service meshes to secure your East-West communication, we have put together some suggested further reading in the Appendix.

While service meshes can certainly help you with networking-related security challenges, fending off the Captain and his crew, you should be aware of their weaknesses. For example, in Envoy-based systems, if you run a container with UID 1337 it bypasses the Istio/Envoy sidecar, and by default the Envoy admin dashboard is accessible from within the container because it shares the network namespace. For more background on this topic, check out the in-depth Istio Security Assessment.

Now it’s time to move on to the last part of the workload networking topic: what happens on a single worker node.

eBPF

After the service mesh adventure, we now focus our attention on a topic that is on the one hand of an entirely different character and on the other hand can also be used to implement a service mesh data plane. We have a look at eBPF, a modern and powerful way to extend the Linux kernel, with which you can address a number of networking-related security challenges.

Concept

Originally, this piece of Linux kernel technology was known under the name Berkeley Packet Filter (BPF). It then experienced a number of enhancements, mainly driven by Google, Facebook, and Netflix, and to distinguish it from the original implementation it was called eBPF. Nowadays, the kernel project and technology is commonly known as eBPF, which is a term in itself and does not stand for anything per se; that is to say, it’s not considered an acronym any longer.

Technically, eBPF is a feature of the Linux kernel, and you’ll need Linux kernel version 3.18 or above to benefit from it. It enables you to safely and efficiently extend Linux kernel functionality by using the bpf(2) syscall (see the man pages for details). eBPF is implemented as an in-kernel virtual machine using a custom 64-bit RISC instruction set.

In Figure 5-8 you see a high-level overview taken from Brendan Gregg’s Linux Extended BPF (eBPF) Tracing Tools:

eBPF overview in the Linux kernel
Figure 5-8. eBPF overview in the Linux kernel

This sounds promising, but is eBPF already used in the wild, and which options are available to you? Let’s take a look.

Options and uptake

In 2021, eBPF is already used in a number of places and for use cases such as:

  • In Kubernetes, as a CNI plugin to enable pod networking, for example in Cilium and Project Calico, as well as for service scalability (in the context of kube-proxy).

  • For observability, such as Linux kernel tracing with iovisor/bpftrace, as well as in a clustered setup with Hubble.

  • As a security control, for example to perform container runtime scanning with projects such as CNCF Falco, but also for enforcing network policies (see “Traffic flow control”) in Kubernetes, via Cilium, Calico, etc.

  • For network load balancing, like Facebook’s L4 load balancer library katran.

  • In Chapter 9, we’re looking into another exciting use case: low-level intrusion detection systems (IDS) for Kubernetes.

We see an increasing number of players entering the eBPF field; leading the charge is Isovalent. While it’s still early days from an adoption perspective, eBPF has huge potential. Coming back to the service mesh data plane: it is perfectly conceivable to implement the Envoy APIs as a set of eBPF programs and push the handling from the user-space sidecar proxy into the kernel.

Extending the kernel with user space programs sounds interesting, but how does that look, in practice?

Case study: attaching a probe to a Go program

Let’s have a look at an example from the Cilium project. The following is a Go program, available in main.go, that demonstrates how you can attach an eBPF program (written in C) to a kernel symbol. The overall result of the exercise is that whenever the sys_execve syscall is invoked, a kernel-side counter is increased; the Go program then reads this counter and prints out the number of times the probed symbol has been called per second.

The following line in main.go (edited to fit the page, should all be on the same line) instructs the Go toolchain to include the compiled C program that contains our eBPF code:

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go
  -cc clang-11 KProbeExample ./bpf/kprobe_example.c -- -I../headers

In kprobe_example.c we find the eBPF program itself:

#include "common.h"
#include "bpf_helpers.h"

char __license[] SEC("license") = "Dual MIT/GPL"; 1

struct bpf_map_def SEC("maps") kprobe_map = { 2
    .type = BPF_MAP_TYPE_ARRAY,
    .key_size = sizeof(u32),
    .value_size = sizeof(u64),
    .max_entries = 1,
};

SEC("kprobe/sys_execve")
int kprobe_execve() { 3
    u32 key = 0;
    u64 initval = 1, *valp;

    valp = bpf_map_lookup_elem(&kprobe_map, &key);
    if (!valp) {
        bpf_map_update_elem(&kprobe_map, &key, &initval, BPF_ANY);
        return 0;
    }
    __sync_fetch_and_add(valp, 1);

    return 0;
}
1

You must define a license.

2

Enables exchange of data between kernel and user space.

3

The entry point of our eBPF probe (program).

As you can guess, writing eBPF by hand is not fun. Luckily there are a number of great tools and environments available that take care of the low-level stuff for you.
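For instance, assuming bpftrace is installed on the node and the kernel exposes the corresponding tracepoint, counting execve calls, roughly what the example above does, is a one-liner:

$ bpftrace -e 'tracepoint:syscalls:sys_enter_execve { @execs[comm] = count(); }'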

Note

Just as we were wrapping up the book writing, the Linux Foundation announced that Facebook, Google, Isovalent, Microsoft and Netflix joined together to create the eBPF Foundation, and with it giving the eBPF project a vendor-neutral home. Stay tuned!

To dive deeper into the eBPF topic we suggest you read Linux Observability with BPF: Advanced Programming for Performance Analysis and Networking by David Calavera and Lorenzo Fontana. If you’re looking for a quick overview, Matt Oswalt has a nice Introduction to eBPF.

To stay on top of things, have a look at ebpf.io and check out what the community publishes on the YouTube channel for this topic.

Further, have a look at Pixie, an open source, eBPF-based observability tool with an active community and broad industry support; in Figure 5-9 we show an example screenshot.

Pixie in action
Figure 5-9. Pixie in action

With this short eBPF overview we’ve reached the end of the networking chapter.

Conclusion

Summing up, there are a number of defaults in the Kubernetes networking space you want to be aware of. As a baseline, you can apply the good practices you know from a non-containerized environment in combination with intrusion detection tooling as shown in Chapter 9. In addition, you want to use native resources such as network policies, potentially in combination with other CNCF projects such as SPIFFE for workload identity, to strengthen your security posture.

Service meshes, while still in their early days, are another promising option to enforce policies and gain insights into what is going on. Last but not least, eBPF is the up-and-coming star in the networking arena, enabling a number of security-related use cases.

Now that we have the networking secured, we are ready for the Captain to move on to more “solid” grounds: storage.
