14 Nodes and Kubernetes security

This chapter covers

  • Node hardening and Pod manifests
  • API server security, including RBAC
  • User authentication and authorization
  • The Open Policy Agent (OPA)
  • Multi-tenancy in Kubernetes

We just wrapped up securing the Pod in the previous chapter; now we’ll cover securing the Kubernetes node. In this chapter, we’ll include more information about node security as it relates to possible attacks on nodes and Pods, and we’ll provide full examples with a number of configurations.

14.1 Node security

Securing a node in Kubernetes is analogous to securing any other VM or data center server. We’ll cover Transport Layer Security (TLS) certificates to start. These certificates allow for securing nodes, but we’ll also look at issues related to image immutability, workloads, network policies, and so on. Treat this chapter as an à la carte menu of important security topics that you should at least consider for running Kubernetes in production.

14.1.1 TLS certificates

All external communications in Kubernetes generally occur over TLS, although this is configurable. However, there are many flavors of TLS. For this reason, you can select a cipher suite for the Kubernetes API server to use. Most installers or self-hosted versions of Kubernetes will handle the creation of the TLS certificates for you. Cipher suites are collections of algorithms that, in aggregate, allow TLS to happen securely. A cipher suite consists of

  • Key exchange—Sets up an agreed-upon way to exchange keys for encryption/decryption

  • Authentication—Confirms the identity of the sender of a message

  • Encryption—Disguises messages so outsiders can’t read them

  • Message authentication—Confirms that messages are coming from a valid source

In Kubernetes, you might find the following cipher suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384. Let’s break that down. Each underscore (_) in this string separates one algorithm from the next. For example, if we set --tls-cipher-suites in the API server to be something like

TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

we can look this specific protocol up at http://mng.bz/nYZ8 and then determine how the communication protocol works. For example:

  • Protocol—Transport Layer Security (TLS)

  • Key exchange—Elliptic Curve Diffie-Hellman Ephemeral (ECDHE)

  • Authentication—Elliptic Curve Digital Signature Algorithm (ECDSA)

  • Encryption—Advanced Encryption Standard with a 128-bit key in Galois/Counter Mode (AES 128 GCM)

  • Message authentication—Secure Hash Algorithm 256 (SHA256)

The specifics of these protocols are beyond the scope of this book, but it is important to monitor your TLS security posture, especially if it is set by a larger standards body in your organization, so you can confirm that your security model in Kubernetes aligns with your organization’s TLS standards. For example, to update the cipher suites used by any Kubernetes service, pass it the --tls-cipher-suites argument on startup:

--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256

Adding this to your API server ensures that it only accepts TLS connections that use these cipher suites. As shown, you can support multiple cipher suites by separating the values with commas. A comprehensive list of suites is available in the help page for any service (for example, http://mng.bz/voZq shows the help for the Kubernetes scheduler, kube-scheduler). It’s also important to note that

  • If a TLS cipher is exposed as having a vulnerability, you’ll want to update the cipher suites in the Kubernetes API server, scheduler, controller manager, and kubelet. Each of these components serves content over TLS in one way or another.

  • If your organization doesn’t allow certain cipher suites, you should explicitly exclude them.
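
For example, on a kubeadm-based cluster, these flags live in the API server’s static Pod manifest. The following is a minimal sketch, assuming the default kubeadm manifest path of /etc/kubernetes/manifests/kube-apiserver.yaml; your installer may manage these flags differently, and the suites shown are only placeholders for whatever your organization mandates:

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ... other flags omitted ...
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

The kubelet watches this directory and restarts the API server Pod when the manifest changes.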

Note If you oversimplify the cipher suites that you allow into your API server, you might risk certain types of clients not being able to connect to it. As an example, Amazon ELBs sometimes use HTTPS health checks to ensure that an endpoint is up before forwarding traffic to it, but they don’t support some common TLS ciphers used in the Kubernetes API server. Version 1 of the AWS load balancer API only supports non-elliptic cipher algorithms, so it cannot negotiate a suite such as TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256. The result here can be crippling; your entire cluster will not work at all! Because it’s common to put several API servers behind a single ELB, keep in mind that a TCP health check (rather than HTTPS) might be easier to manage over time, especially if you require special security ciphers on your API servers.

14.1.2 Immutable OSs vs. patching nodes

Something immutable cannot be changed. An immutable OS consists of components and binaries that are read-only and that you cannot patch. Instead of patching and updating software, you replace the entire OS: reimage the server in the data center, or delete the VM in the cloud and create a new one. Kubernetes has simplified running immutable OSs because moving workloads off a node is a built-in feature that an administrator can use at any time.

Instead of having a system that automatically applies patches with your distribution’s package manager, use an immutable OS. A Kubernetes cluster already removes the notion of customized snowflake servers, the hand-built servers that run specific applications, and replaces them with standardized nodes; running an immutable OS is the next logical step. One of the easiest ways to inject a vulnerability into a mutable system is to replace a common Linux library.

Immutable OSs are read-only, and because you cannot write changes to disk, whole classes of modifications become impossible, which reduces your exposure. Using a distribution that is immutable removes a multitude of these opportunities. In general, a Kubernetes control plane node (running the API server, controller manager, and scheduler) will have

  • A kubelet binary

  • The kube-proxy binary

  • A containerd or other container runtime executable

  • An image for etcd

All of these are baked into the image for rapid bootstrapping. Meanwhile, a Kubernetes worker node will have the same components, with the exception of etcd. This is an important distinction because etcd doesn’t run in a supported manner on Windows, but some users will want to run Windows worker nodes to run Windows Pods.

Building custom images for Windows workers is extremely important because the Windows OS is not redistributable, so end users must build their own Windows kubelet images if they want to use an immutable deployment model. To learn more about immutable images, you can evaluate the Tanzu Community Edition project (https://tanzu.vmware.com/tanzu/community). It aims to provide the broader community with a “batteries-included” approach to using immutable images along with the Cluster API to bring up usable, production-grade clusters. Many other hosted Kubernetes services, including Google’s GKE, also use immutable operating systems.

14.1.3 Isolated container runtimes

Containers are amazing, but they do fall short of completely isolating a process from the OS. Docker Engine (and other container engines) does not fully sandbox a running container from the Linux kernel. There is not a strong security boundary between the container and the host, so if the host’s kernel has an exploitable vulnerability, a process inside the container can probably reach it and take advantage of it. Docker Engine utilizes Linux namespaces to separate processes from things like direct access to the Linux networking stack, but there are still holes. For instance, the host’s /sys and /proc filesystems are still readable by a process running inside a container.

Projects like gVisor, IBM Nabla, Amazon Firecracker, and Kata provide a virtual Linux kernel that isolates a container’s process from the host’s kernel, thus providing a truer sandbox. These projects are still relatively new, at least in an open source sense, and are not yet predominantly used in Kubernetes environments, although some are quite mature in production: gVisor is used as part of Google Cloud Platform, and Firecracker is used as part of the Amazon Web Services platform. Perhaps by the time you read this, more Kubernetes containers will run on top of a virtual kernel! We can even think about spinning up micro VMs as Pods. These are fun times that we live in!
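
If you do run one of these runtimes on your nodes, Kubernetes can target it through the RuntimeClass API. The following is a minimal sketch, assuming gVisor’s runsc handler has already been configured in containerd on the node; the handler name and image are placeholders for your environment:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc               # must match the handler name configured in the container runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  runtimeClassName: gvisor   # run this Pod's containers under the sandboxed runtime
  containers:
  - name: app
    image: nginx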

14.1.4 Resource attacks

A Kubernetes node has a finite amount of resources, including CPU, memory, and disk. We have a number of Pods, kube-proxy, kubelets, and other Linux processes running on the cluster. The node typically also has a CNI provider, a logging daemon, and other processes supporting the cluster. You need to ensure that the containers in your Pods do not overwhelm the node’s resources. If you do not provide constraints, a container can overwhelm a node and impact all of the other systems. In essence, a runaway container process can perform a Denial of Service (DoS) attack on a node. Enter the resource limits. . . .

Resource limits are controlled through API-level objects and configurations such as per-container settings in the Pod spec, LimitRange objects, and ResourceQuota objects. Pod API objects can have settings that control each of the limits. For instance, the following YAML stanza sets CPU, memory, and disk space requests and limits:

apiVersion: v1
kind: Pod
metadata:
  name: core-kube-limited
spec:
  containers:
  - name: app
    image: nginx            # placeholder image
    resources:
      requests:
        memory: "42Mi"
        cpu: "42m"
        ephemeral-storage: "21Gi"
      limits:
        memory: "128Mi"
        cpu: "84m"
        ephemeral-storage: "42Gi"

Provisions the initial amount of CPU, memory, or storage

Sets the maximum amount of CPU, memory, and storage allowed

In terms of security, these limits are enforced differently per resource: a container that exceeds its memory limit is terminated and restarted (and repeated failures are backed off), CPU usage above the limit is throttled rather than killed, and a Pod that exceeds its ephemeral-storage limit is evicted from the node.

Another interesting thing is that resource requests and limits also impact the scheduling of a Pod. The scheduler only picks a node that has enough unreserved capacity to satisfy the Pod’s initial requests. Notice that we use units to express the request and limit values for memory, CPU, and ephemeral storage.
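
Per-Pod requests and limits can also be backed by a namespace-level guard. The following is a minimal ResourceQuota sketch; the namespace name and the numbers are placeholders. Once a quota covers a compute resource, new Pods in that namespace must set requests and limits for it (or inherit them from a LimitRange default), and any Pod that would push the namespace past these totals is rejected:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"                  # total CPU all Pods in the namespace may request
    requests.memory: 8Gi
    requests.ephemeral-storage: 50Gi
    limits.cpu: "8"                    # total CPU limit across the namespace
    limits.memory: 16Gi
    pods: "20"                         # cap on the number of Pods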

14.1.5 CPU units

To measure CPU, the base unit that Kubernetes uses is 1, which equates to one hyperthread on bare metal or one core/vCPU in the cloud. You are also allowed to express a CPU unit as a decimal; for example, you can request 0.25 CPU units. Moreover, the API lets you write the same 0.25 CPU as 250m (millicores). All of these stanzas are allowed for the CPU:

resources:
  requests:
    cpu: "42"    
resources:
  requests:
    cpu: "0.42"
resources:
  requests:
    cpu: "420m"  

Sets 42 CPUs (it’s a big server!)

0.42 of a CPU that is measured as a unit of 1

This is the same as 0.42 in the previous code block.

14.1.6 Memory units

Memory is measured in bytes, expressed as a plain integer or as a fixed-point number using the suffixes E, P, T, G, M, or k. Or you can use Ei, Pi, Ti, Gi, Mi, or Ki, which represent the power-of-two equivalents. The following stanzas all represent roughly the same value:

resources:
  requests:
    memory: "128974848"    
resources:
  requests:
    memory: "129e6"        

Plain-number byte representation (128,974,848 bytes)

129e6 is sometimes written as 129e+6 in scientific notation: 129e+6 == 129000000. This stanza represents 129,000,000 bytes.

The next stanza uses the decimal megabyte suffix:

resources:
  requests:
    memory: "129M"      

129M == 129 megabytes == 129,000,000 bytes, exactly the 129e+6 value.

Next, the power-of-two mebibyte suffix:

resources:
  requests:
    memory: "123Mi"      

123Mi == 123 mebibytes == 128,974,848 bytes, which is close to the 129e+6 value.

14.1.7 Storage units

The newest API configuration is ephemeral storage requests and limits. Ephemeral storage limits apply to three storage components:

  • emptyDir volumes, except tmpfs

  • Directories holding node-level logs

  • Writeable container layers

When a limit is surpassed, the kubelet evicts the Pod. Each node is configured with a maximum amount of ephemeral storage, which again impacts the scheduling of Pods to that node. There is yet another mechanism, called extended resources, that lets you advertise and request node-specific resources; you can find more details about extended resources in the Kubernetes documentation.
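
You can also cap an individual emptyDir volume directly with its sizeLimit field; if the volume grows past the limit, the kubelet evicts the Pod. A minimal sketch follows (the image, mount path, and size are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: scratch-space
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi    # the kubelet evicts the Pod if the volume exceeds 1Gi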

14.1.8 Host networks vs. Pod networks

In section 14.4, we will cover NetworkPolicies. These give you the ability to lock down Pod communication using a CNI provider that, typically, implements these policies for you. There’s a much more fundamental type of network security you should consider, however: not running a Pod on the same network as your hosts. This instantly

  • Limits the access of the outside world to your Pod

  • Limits the access of your Pod to the host’s network ports

Having a Pod join the host network allows the Pod to have easier access to the node and, thus, increases the blast radius upon an attack. If a Pod does not have to run on the host network, do not run the Pod on the host network! Should a Pod need to run on the host network, then do not expose that Pod to the internet. The following code snippet is a partial YAML definition of a Pod that runs on the host network. You will often see Pods running on the host network if the Pod is performing administrative tasks, such as logging or networking (a CNI provider):

apiVersion: v1
kind: Pod
metadata:
  name: host-pod
spec:
  hostNetwork: true

14.1.9 Pod example

We have covered different Pod API configurations: service account tokens, CPU and other resource settings, security contexts, and so forth. Here is an example that contains all of these configurations:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  automountServiceAccountToken: false   
  securityContext:                       
    runAsUser: 3042
    runAsGroup: 4042
    fsGroup: 5042
  hostNetwork: true                     
  volumes:
  - name: sc-vol
    emptyDir: {}
  containers:
  - name: sc-container
    image: my-container
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
    resources:                          
      requests:
        memory: "42Mi"
        cpu: "42m"
        ephemeral-storage: "1Gi"
      limits:
        memory: "128Mi"
        cpu: "84m"
        ephemeral-storage: "2Gi"
    volumeMounts:
    - name: sc-vol
      mountPath: /data/foo
  serviceAccountName: network-sa        

Disables automount for the service account token

Sets the Pod-level security context; the NET_ADMIN capability is added in the container’s security context because capabilities are a container-level setting

Runs on the host network

Sets resource limits

Gives the Pod a specific service account

14.2 API server security

Components like binary authorization use webhooks provided by admission controllers. Various controllers are part of the Kubernetes API server and expose a webhook as an entry point for events. For instance, ImagePolicyWebhook is one of the plugins that allows the system to respond to the webhook and make admission decisions about containers. If a Pod does not pass the admission standards, it is held in a pending state and is not deployed to the cluster. Admission controllers can validate the API objects being created in the cluster, mutate or change those objects, or do both. From a security standpoint, this provides an immense amount of control and auditing capability for a cluster.

14.2.1 Role-based access control (RBAC)

First and foremost, role-based access control (RBAC) needs to be enabled on your cluster. Currently, RBAC is enabled by most installers and cloud-hosted providers. The Kubernetes API server uses the --authorization-mode=RBAC flag to enable RBAC. If you are using a hosted version of Kubernetes, such as GKE, RBAC is already enabled. The authors are certain there is an edge case where running RBAC does not meet business needs. However, the other 99% of the time, you need to enable RBAC.

RBAC is a role-based security mechanism that controls user and system access to resources. It restricts access to resources to only authorized users and service accounts via roles and privileges. How does that apply to Kubernetes? One of the most critical components that you want to secure with Kubernetes is the API server. When a system user has administrator access to the cluster via the API server, that user can drain nodes, delete objects, and otherwise cause a great level of disruption. Administrators in Kubernetes are root users in the context of the cluster.

RBAC is a powerful security component that provides great flexibility in how you restrict API access within a cluster. Because it is a powerful mechanism, it also has the usual side effect of being quite complex and challenging to debug at times.

Note An average Pod running in Kubernetes should not have access to the API server, so you should disable the mounting of the service account token.
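
You can turn off automounting either on the service account itself or per Pod. Here is a minimal sketch of both; the names and image are placeholders:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-api-access
automountServiceAccountToken: false    # Pods using this account get no token by default
---
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod
spec:
  serviceAccountName: no-api-access
  automountServiceAccountToken: false  # the Pod-level setting takes precedence over the account
  containers:
  - name: app
    image: nginx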

14.2.2 RBAC API definition

The RBAC API defines the following types:

  • Role—Contains a set of permissions, limited to a namespace

  • ClusterRole—Contains a set of permissions that are cluster-wide

  • RoleBinding—Grants a role to a user or a group

  • ClusterRoleBinding—Grants a ClusterRole to a user or a group

Within the Role and ClusterRole definitions, there are several defined components. The first that we will cover is verbs, which include API and HTTP verbs. Objects within the API server can receive a get request; hence, the get verb. We often think about this in terms of the create, read, update, and delete (CRUD) verbs defined when creating REST services. Verbs you can use include

  • API request verbs for resource requests—get, list, create, update, patch, watch, proxy, redirect, delete, and deletecollection

  • HTTP request verbs for non-resource requests—get, post, put, and delete

For instance, if you want an operator to be able to list and patch Pods, you can

  1. Define the resource (in this case, a Pod)

  2. Define the verbs that the Role has access to (most likely list and patch)

  3. Define the API groups (using an empty string denotes the core API group)

You are already familiar with API groups because they are the apiVersion and kind that appear in Kubernetes manifests. API groups follow the REST path in the API server itself (/apis/$GROUP_NAME/$VERSION) and use apiVersion $GROUP_NAME/$VERSION (for instance, batch/v1). Let’s keep it simple, though, and not deal with API groups just yet. We’ll start with the core API group instead. Here is an example of a role for a specific namespace. Because roles are limited to namespaces, this provides access to perform list and patch verbs on the Pod resource:

# Create a custom role in the rbac-example namespace that grants access to
# list and patch Pods
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-labeler
  namespace: rbac-example
rules:
- apiGroups: [""] # "" refers to the core API group
  resources: ["pods"] # the resource type this rule applies to
  verbs: ["list", "patch"] # authorization to use list and patch verbs

For the previous example, we can define a service account to use the role in that snippet like so:

# Create a ServiceAccount that will be bound to the role above
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-labeler
  namespace: rbac-example

The previous YAML creates a service account that can be used by a Pod. Next, we’ll create a role binding to join the previous service account with the role that was also defined previously:

# Binds the pod-labeler ServiceAccount to the pod-labeler Role.
# Any Pod using the pod-labeler ServiceAccount will be granted
# API permissions based on the pod-labeler role.
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-labeler
  namespace: rbac-example
subjects:
  # List of service accounts to bind
- kind: ServiceAccount
  name: pod-labeler
  namespace: rbac-example
roleRef:
  # The role to bind
  kind: Role
  name: pod-labeler
  apiGroup: rbac.authorization.k8s.io

Now you can launch the Pod within a deployment that has the service account assigned to that Pod:

# Deploys a single Pod to run the Pod-labeler code
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-labeler
  namespace: rbac-example
spec:
  replicas: 1

  # Control any Pod labeled with app=pod-labeler
  selector:
    matchLabels:
      app: pod-labeler

  template:
    # Ensure created Pods are labeled with app=pod-labeler
    # to match the deployment selector
    metadata:
      labels:
        app: pod-labeler

    spec:
      # Define the service account the Pod uses
      serviceAccountName: pod-labeler

      # Another security improvement: set the UID and GID the Pod runs with.
      # The Pod-level security context defines the default UID and GIDs
      # under which to run all container processes. We use 9999 for
      # all IDs because it is unprivileged and known to be unallocated
      # on the node instances.
      securityContext:
        runAsUser: 9999
        runAsGroup: 9999
        fsGroup: 9999

      containers:
      - image: gcr.io/pso-examples/pod-labeler:0.1.5
        name: pod-labeler

Let’s recap. We have created a role with the permissions to patch and list Pods. Then, we created a service account so we can create a Pod and have that Pod use the defined user. Next, we defined a role binding to add the service account to the role. Lastly, we launched a deployment that has a Pod defined, which uses the service account that was previously defined.

RBAC is nontrivial, but vital to the security of a Kubernetes cluster. The previous YAML was taken from the Helmsman RBAC demo located at http://mng.bz/ZzMa.
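
To check that the binding behaves the way you expect, you can impersonate the service account with kubectl auth can-i (run these as a cluster administrator, because impersonation itself requires permissions). Assuming the objects above were created in the rbac-example namespace:

$ kubectl auth can-i patch pods -n rbac-example \
    --as=system:serviceaccount:rbac-example:pod-labeler
yes
$ kubectl auth can-i delete pods -n rbac-example \
    --as=system:serviceaccount:rbac-example:pod-labeler
no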

14.2.3 Resources and subresources

Most RBAC resources use a single name, like Pod or Deployment. Some resources have subresources, such as in the following code snippet:

GET /api/v1/namespaces/rbac-example/pods/pod-labeler/log

This API endpoint denotes the path to the log subresource in the rbac-example namespace for the Pod named pod-labeler. The general definition follows:

GET /api/v1/namespaces/{namespace}/pods/{name}/log

In order to use the subresource of logs, you would define a role. The following shows an example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac-example
  name: pod-and-pod-logs-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]

You can also further restrict access to the logs of a Pod by naming the Pod. For example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac-example
  name: pod-labeler-logs
rules:
- apiGroups: [""]
  resourceNames: ["pod-labeler"]
  resources: ["pods/log"]
  verbs: ["get"]

Notice that the rules element in the previous YAML is an array. The following code snippet shows how you can add multiple permissions to the YAML. The resources, resourceNames, and verbs can be of any combination:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac-example
  name: pod-labeler-logs
rules:
- apiGroups: [""]
  resourceNames: ["pod-labeler"]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: [""]
  resourceNames: ["another-pod"]
  resources: ["pods/log"]
  verbs: ["get"]

Resources are things like Pods and nodes, but the API server also exposes endpoints that are not resources. These are defined by the actual URI component in the API REST endpoint; for example, you can give a role access to the /healthz API endpoint. Because non-resource URLs are not namespaced, they can only be granted in a ClusterRole, not in a namespaced Role, as the following snippet shows:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-labeler-logs
rules:
- apiGroups: [""]
  resourceNames: ["pod-labeler"]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: [""]
  resourceNames: ["another-pod"]
  resources: ["pods/log"]
  verbs: ["get"]
- nonResourceURLs: ["/healthz", "/healthz/*"]    
  verbs: ["get", "post"]

The asterisk (*) in a nonresource URL is a suffix glob match.

14.2.4 Subjects and RBAC

Role bindings can encompass Users, ServiceAccounts, and Groups in Kubernetes. In the following example, we will create another service account called log-reader and add the service account to the role-binding definition in the previous section. In the example, we also have a user named james-bond and a group named MI-6:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-labeler
  namespace: rbac-example
subjects:
  # List of service accounts to bind
- kind: ServiceAccount
  name: pod-labeler
  namespace: rbac-example
- kind: ServiceAccount
  name: log-reader
  namespace: rbac-example
- kind: User
  name: james-bond
- kind: Group
  name: MI-6
roleRef:
  # The role to bind
  kind: Role
  name: pod-labeler
  apiGroup: rbac.authorization.k8s.io

Note Users and groups are created by the authentication strategy that is set up for the cluster.

14.2.5 Debugging RBAC

Granted, RBAC is complex, and it’s a pain, but we always have the audit log. When audit logging is enabled, Kubernetes logs a trail of the security-relevant events that affect the cluster. These events include user actions, administrator actions, and actions by other components inside the cluster. Basically, you get a “who,” “what,” “from where,” and “how” if RBAC and other security components are utilized. Audit logging is configured via an audit policy file that is passed to the API server with kube-apiserver --audit-policy-file.
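
A minimal audit policy sketch follows; the rule set is only an example of what you might start with, and you will also need flags such as --audit-log-path to tell the API server where to write the log:

# audit-policy.yaml, passed to the API server with --audit-policy-file
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Record full request/response bodies for changes to RBAC objects
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
# Never log Secret contents, only who touched them
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Everything else at the metadata level
- level: Metadata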

So we have a log of all events—awesome! But wait . . . you now have a cluster that has hundreds of roles, a bunch of users, and a plethora of role bindings, and you have to join all of that data together. For that, there are a couple of tools to assist us. The common theme between these tools is joining the different objects used to define RBAC access: to introspect RBAC-based permissions, roles (or cluster roles), RoleBindings, and subjects need to be joined. A subject can be a user, a group, or a service account:

  • ReactiveOps has created a tool that allows a user to find the current roles that a user, group, or service account is a member of. rbac-lookup is available at https://github.com/reactiveops/rbac-lookup.

  • Another tool that finds which permissions a user or service account has within the cluster is kubectl-rbac. This tool is located at https://github.com/octarinesec/kubectl-rbac.

  • Jordan Liggitt has an open source tool called audit2rbac. This tool takes audit logs and a username and creates a role and role binding that match the requested access. Calls are made to the API server, and you capture the audit logs; from there, you run audit2rbac to generate the needed RBAC (in other words, RBAC reverse engineered).

14.3 Authn, Authz, and Secrets

Authn (authentication) establishes who a user is, while Authz (authorization) covers the groups and permissions that an authenticated user has. It may seem odd that we are talking about Secrets here as well, but some of the same tools used for authentication and authorization are used for Secrets, and you often need authentication and authorization to access Secrets.

First and foremost, do not use the default cluster administrative certificates that are generated when installing a cluster. You will need an IAM (Identity and Access Management) service provider for authenticating and authorizing users. Also, do not enable username and password authentication on the API server; use the built-in capability for user authentication utilizing TLS certificates.
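
For example, to add the james-bond user and MI-6 group that appear in the RBAC examples in this chapter, you can have the cluster sign a client certificate through the CertificateSigningRequest API. This is a minimal sketch; the file names are placeholders, and the commands assume a Linux shell:

$ openssl genrsa -out james-bond.key 2048
$ openssl req -new -key james-bond.key -out james-bond.csr \
    -subj "/CN=james-bond/O=MI-6"       # CN becomes the username, O becomes a group
$ cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: james-bond
spec:
  request: $(base64 < james-bond.csr | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  usages: ["client auth"]
EOF
$ kubectl certificate approve james-bond

Once approved, the signed certificate can be pulled from the CSR object’s status and placed in the user’s kubeconfig.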

14.3.1 IAM service accounts: Securing your cloud APIs

Kubernetes containers have identities that are cloud native (these identities are aware of the cloud). This is beautiful and terrifying at the same time. Without a threat model for your cloud, you cannot have a threat model for your Kubernetes cluster.

Cloud IAM service accounts comprise the basics of security, including authorization and authentication for people and systems. Within the data center, Kubernetes security configuration is limited to the Linux system, Kubernetes, the network, and the containers deployed. When running Kubernetes in the cloud, a new wrinkle emerges—the IAM roles of nodes and Pods:

  • IAM is the role for a specific user or service account, and that role is then a member of a group.

  • Every node in a Kubernetes cluster has an IAM role.

  • Pods typically inherit that role.

The nodes in your cluster, specifically those that run the control plane, need IAM roles to run inside the cloud. The reason for this is that a lot of cloud-native functionality in Kubernetes comes from the fact that Kubernetes itself knows how to talk to its own cloud provider. As an example, let’s take a cue from GKE’s official documentation: Google Cloud Platform automatically creates a service account named the Compute Engine default service account, and GKE associates it with the nodes it creates. Depending on how your project is configured, the default service account may or may not have permissions to use other cloud-platform APIs. GKE also assigns some limited access scopes to compute instances. Updating the default service account’s permissions or assigning more access scopes to compute instances is not the recommended way to authenticate to other cloud-platform services from Pods running on GKE.

Your containers, therefore, often have privileges that are on par with the nodes themselves. As clouds evolve toward more granular permission models for containers, this default will likely improve. It remains the case, however, that you need to ensure that the IAM role or roles have the least amount of permissions applicable, and there will always be ways to change these IAM roles. For instance, when using GKE in Google Cloud Platform, you should create a new IAM service account in the project for each cluster. If you do not, the cluster usually uses the Compute Engine default service account, which has editor permissions.

Editor in Google Cloud allows a given account (in this case, a node in your cluster, which translates to potentially any Pod) to edit any resource within that project. For example, an attacker could delete an entire fleet of databases, TCP load balancers, or cloud DNS entries by simply compromising a given Pod in the cluster. Moreover, you should remove the default service account from any project that you create in GCP. The same problems exist in AWS, Azure, and others. The bottom line is that each cluster should be created with its own unique service account, and that service account should have the least possible permissions. With tools like kops (Kubernetes Operations), we can go through every permission a Kubernetes cluster requires, and kops then creates an IAM role specific to the control plane, as well as another for the nodes.

14.3.2 Access to cloud resources

Now that you have configured your Kubernetes nodes with the lowest level of permissions needed, you may feel safe. In fact, if you are running a solution like AKS (Azure Kubernetes Service), you do not have to worry about configuring the control plane and only have to be concerned with the node-level IAM. But that’s not all. Say a developer creates a service that needs to talk to a hosted cloud service—say, a file store. The Pod that is running now needs a service account with the correct roles. There are various approaches to this.

Note Relying on the node-level IAM role (as with AKS) is probably the easiest solution, but it does create some challenges. You need to limit the Pods on those nodes to the ones that actually need access to the cloud resources, or accept the risk that all Pods will now have file-store access.

tip Use a tool like kube2iam (https://github.com/jtblin/kube2iam) or kiam (https://github.com/uswitch/kiam) for this approach.

Some newer operators can assign specific service accounts to specific Pods. A component on each node intercepts the calls to the cloud API and, instead of using the node’s IAM role, assigns a role to the Pod, which is denoted via annotations. Some of the hosted cloud providers have similar solutions. Some cloud providers, like Google, have sidecars that can run and connect to a cloud SQL service. The sidecar is assigned a role and then proxies the application’s connections to the database.
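
To make the annotation-based approach concrete, here is a hedged sketch of what a Pod looks like with kube2iam; the annotation key is the one kube2iam documents, and the role ARN and image are placeholders for your environment:

apiVersion: v1
kind: Pod
metadata:
  name: s3-reader
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/pod-s3-read   # role kube2iam assumes for this Pod
spec:
  containers:
  - name: app
    image: my-app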

Probably the most complicated but most robust solution is to use a centralized vault server. With this, applications retrieve short-lived IAM tokens that allow cloud system access, and a sidecar is often used to automatically refresh those tokens. We can also use HashiCorp Vault to secure Secrets that are not IAM credentials. If your use case requires robust Secrets and IAM management, Vault is an excellent solution, but as with all mission-critical solutions, you will need to maintain and support it.

tip Use HashiCorp Vault (https://www.vaultproject.io/) to store Secrets.

14.3.3 Private API servers

The last thing that we are going to cover in this section is the API server’s network access. You can make your API server inaccessible from the internet by placing it on a private network. If you place the API server load balancer on a private network, you will need a bastion host, VPN, or other form of connectivity to reach the API server, so this solution is not as convenient.

The API server is an extremely sensitive security point and must be guarded as such. DoS attacks or general intrusion can cripple a cluster. Moreover, when the Kubernetes community finds security problems, they occasionally exist in the API server. If you can, put your API server on a private network, or at least whitelist the IP addresses that are able to connect to the load balancer that fronts your API server.

14.4 Network security

Again, this is an area of security that is rarely addressed properly. By default, a Pod on the Pod network can access any Pod anywhere on the cluster, which also includes the API server. This capability exists to allow a Pod to access systems like DNS for service lookups. A Pod running on the host network can access just about everything: all Pods, all nodes, and the API server. A Pod on the host network can even access the kubelet API port if the port is enabled.

Network policies are objects that you can define to control network traffic between Pods. NetworkPolicy objects allow you to configure access around Pod ingress and egress. Ingress is traffic that is coming into a Pod, while egress is network traffic leaving the Pod.

14.4.1 Network policies

You can create a NetworkPolicy object on any Kubernetes cluster, but you need a running security provider such as Calico. Calico is a CNI provider that also provides a separate application to implement network policies. If you create a network policy without a provider, the policy does nothing. Network policies have the following constraints and features. They are

  • Applied to Pods

  • Matched to specific Pods via label selectors

  • In control of both ingress and egress network traffic

  • In control of network traffic defined by a CIDR range, a specific namespace, or a matched Pod or Pods

  • Designed to specifically handle TCP, UDP, and SCTP traffic

  • Capable of handling named ports or specific port numbers

Let’s try this. To set up a kind cluster and install Calico on it, first run the following command to create the kind cluster, and do not start the default CNI. Calico will be installed next:

$ cat <<EOF | kind create cluster --config -
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # the default CNI will not be installed
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"
EOF

Next, we install the Calico Operator and its custom resources. Use the following commands to do this:

$ kubectl create -f \
    https://docs.projectcalico.org/manifests/tigera-operator.yaml
$ kubectl create -f \
    https://docs.projectcalico.org/manifests/custom-resources.yaml

Now we can observe the Pod startup. Use the following command:

$ kubectl get Pods --watch -n calico-system

Next, set up a couple of namespaces, an NGINX server to serve a test web page, and a BusyBox container where we can run wget. To do this, use the following commands:

$ kubectl create ns web
$ kubectl create ns test-bed
$ kubectl create deployment -n web nginx --image=nginx
$ kubectl expose -n web deployment nginx --port=80
$ kubectl run --namespace=test-bed testing --rm -ti --image=busybox -- /bin/sh

From the command prompt on the BusyBox container, access the NGINX server installed in the web namespace. Here’s the command for this:

$ wget -q nginx.web -O -

Now, install a network policy that denies all inbound traffic to the NGINX Pod. Use the following command:

$ kubectl create -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-ingress
  namespace: web
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
EOF

This command creates a policy so that you can no longer access the NGINX web page from the testing Pod. Run the following command on the command line for the testing Pod. The command will time out and fail:

$ wget -q --timeout=5 nginx.web.svc.cluster.local -O -

Next, open the Pod ingress from the test-bed namespace to the web namespace. Use the following code snippet:

$ kubectl create -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-web
  namespace: web
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
    - Ingress
  ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            name: test-bed
EOF

On the command line for the testing Pod, enter

$ wget -q --timeout=5 nginx.web.svc.cluster.local -O -

You will notice that the command fails. The reason is that network policies match labels, and the test-bed namespace is not labeled. The following command adds the label:

$ kubectl label namespaces test-bed name=test-bed

On the command line for the testing Pod, check that the network policy now works. Here’s the command:

$ wget -q --timeout=5 nginx.web.svc.cluster.local -O -

The first recommendation for all firewall configurations is to create a deny-all rule. This policy denies all traffic flow within a namespace. Run the following command to disable all ingress and egress traffic for Pods in the test-bed namespace:

$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all
  namespace: test-bed
spec:
  podSelector:
    matchLabels: {}    
  policyTypes:          
  - Ingress
  - Egress
EOF

Matches all Pods in a namespace

Defines the two policy types: ingress and egress

Now, implementing this policy causes some fun side effects. Not only can the Pods in test-bed not talk to anything else, they also cannot talk to the DNS provider in the kube-system namespace. If a Pod does not need DNS, do not enable it! Otherwise, apply the following network policy to enable egress for DNS:

$ kubectl label namespaces kube-system name=kube-system
$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: dns-egress
  namespace: test-bed
spec:
  podSelector:
    matchLabels: {}          
  policyTypes:                
  - Egress
  egress:
  - to:
    - namespaceSelector:     
        matchLabels:
          name: kube-system
    ports:                   
    - protocol: UDP
      port: 53
EOF

Matches all Pods in the core Kubernetes namespace

Only allows egress

Egress rule that matches a labeled kube-system

Only allows UDP over port 53, which is the protocol and port for DNS

If you run the wget command, you will notice that the command still fails. We have ingress on the web namespace allowed but do not have egress from the test-bed namespace to the web namespace enabled. Run the following command to turn on egress to the web namespace from the test-bed Pod:

$ kubectl label namespaces web name=web
$ kubectl create -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-bed-egress
  namespace: test-bed
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
    - Egress
  egress:
    - to:
      - namespaceSelector:
          matchLabels:
            name: web
EOF

You probably noticed that NetworkPolicy rules can get complex. If you are running a cluster where the trust model is high trust, implementing network policies may not benefit your security posture. Using the 80/20 rule, do not start with NetworkPolicies if your organization is not updating its images. Yes, network policies are complex, and that is partially why using a service mesh in your organization may assist your security.

Service mesh

A service mesh is an application that runs on top of a Kubernetes cluster and provides various capabilities that often improve observability, monitoring, and reliability. Common service meshes include Istio, Linkerd, Consul, and others. We mention service meshes in the security chapter because they can assist your organization with two key things: mutual TLS and advanced network traffic-flow policies. We cover this topic very briefly since there are entire books on this subject.

A service mesh adds a complex layer on top of virtually every application that runs in a cluster, but it also provides many good security components. Again, use your judgment as to whether adding a service mesh is warranted, but do not start with a service mesh on day one. If you want to know whether your cluster conforms to the CNCF specification for the NetworkPolicy API, you can run the NetworkPolicy test suites using Sonobuoy (which we’ve covered in previous chapters):

$ sonobuoy run --e2e-focus=NetworkPolicy
# wait about 30 minutes
$ sonobuoy status

This outputs a series of table tests that show you exactly how network policies are working on your cluster. To learn more about the concepts of NetworkPolicy API conformance for CNI providers, check out http://mng.bz/XW7M. We highly recommend running the NetworkPolicy conformance tests when evaluating your CNI provider for compatibility to the Kubernetes network security specifications.

14.4.2 Load balancers

One thing to keep in mind is that Kubernetes can create external load balancers that expose your applications to the world, and it does so automatically. This may seem like common knowledge, but putting the wrong service into a production environment can expose something sensitive (such as the administrative user interface of a database) to the internet. Use tooling during CI (continuous integration) or a tool like the Open Policy Agent (OPA) to ensure that external load balancers are not created accidentally. Also, use internal load balancers when you can.
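
The exact mechanism is cloud specific, but it is usually a Service annotation. The following sketch shows the commonly documented annotations for AWS and GKE together only for illustration; in practice you would use the one for your provider and confirm the right key and value for your provider and version:

apiVersion: v1
kind: Service
metadata:
  name: admin-ui
  annotations:
    # AWS: keep the load balancer off the public internet
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    # GKE: request an internal load balancer
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: admin-ui
  ports:
  - port: 443
    targetPort: 8443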

14.4.3 Open Policy Agent (OPA)

We previously mentioned that operators can help an organization further secure a cluster. The OPA, a CNCF project, strives to allow declarative policies that are run via the admission controller.

OPA is a lightweight general-purpose policy engine that can be co-located with your service. You can integrate OPA as a sidecar, host-level daemon, or library.

Services offload policy decisions to OPA by executing queries. OPA evaluates policies and data to produce query results (which are sent back to the client). Policies are written in a high-level declarative language and can be loaded into OPA via the filesystem or well-defined APIs.

—Open Policy Agent (http://mng.bz/RE6O)

There are two different components that OPA maintains: the OPA admission controller and the OPA Gatekeeper. The Gatekeeper does not use a sidecar, utilizes CRDs (custom resource definitions), is extensible, and performs audit functionality. The next section walks through installing Gatekeeper on a kind Kubernetes cluster.

Installing OPA

First, clean up your cluster running Calico. Then, let’s start another cluster:

$ kind delete cluster
$ kind create cluster

Next, install OPA Gatekeeper using the following command:

$ kubectl apply -f \
    https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.7.0/deploy/gatekeeper.yaml

The following command prints the names of the Pods installed:

$ kubectl -n gatekeeper-system get po
NAME                                           READY  STATUS   RESTARTS  AGE
gatekeeper-audit-7d99d9d87d-rb4qh              1/1    Running  0         40s
gatekeeper-controller-manager-f94cc7dfc-j6zjv  1/1    Running  0         39s
gatekeeper-controller-manager-f94cc7dfc-mxz6d  1/1    Running  0         39s
gatekeeper-controller-manager-f94cc7dfc-rvqvj  1/1    Running  0         39s

Note You can also use Helm to install OPA Gatekeeper.

Gatekeeper CRDs

One of the complexities of OPA is learning a new language (called Rego) to write policies. See http://mng.bz/2jdm for more information about Rego. With Gatekeeper, you will put policies written in Rego into the supported CRDs. You need to create two different CRDs to add a policy:

  • A constraint template to define the policy and its targets

  • A constraint to enable a constraint template and define how the policy is enabled

An example of a constraint template and the associated constraint follows. The source block contains both objects defined in YAML: the constraint template and the constraint built from it. In this example, the match stanza supports

  • kinds—Defines Kubernetes API objects

  • namespaces—Specifies a list of namespaces

  • excludedNamespaces—Specifies a list of excluded namespaces

  • scope—Accepts *, Cluster, or Namespaced

  • labelSelector—Sets the standard Kubernetes label selector

  • namespaceSelector—Sets the standard Kubernetes namespace selector

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: enforcespecificcontainerregistry
spec:
  crd:
    spec:
      names:
        kind: EnforceSpecificContainerRegistry        
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |                                         
        package enforcespecificcontainerregistry

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          satisfied := [good | repo = input.parameters.repos[_];
                        good = startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("container '%v' has an invalid image repo '%v', allowed repos are %v",
                         [container.name, container.image, input.parameters.repos])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          satisfied := [good | repo = input.parameters.repos[_];
                        good = startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("container '%v' has an invalid image repo '%v', allowed repos are %v",
                         [container.name, container.image, input.parameters.repos])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: EnforceSpecificContainerRegistry
metadata:
  name: enforcespecificcontainerregistrytestns
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - "test-ns"
  parameters:
    repos:
      - "myrepo.com"

Defines EnforceSpecificContainerRegistry, a CRD that’s used for the constraint

Now, let’s take the previous YAML and save it to two files: one with the template and the second with the constraint. On the cluster, install the template file first and the constraint second.
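
Assuming you saved the two documents as constraint-template.yaml and constraint.yaml (the file names are arbitrary), the installation looks like this:

$ kubectl apply -f constraint-template.yaml
$ kubectl apply -f constraint.yaml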

$ kubectl create ns test-ns
$ kubectl create deployment -n test-ns nginx --image=nginx

You can check the status of the deployment by running the following command (we are expecting that the Pod will not start):

$ kubectl get -n test-ns deployments.apps
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           37s

If you execute a kubectl -n test-ns get Pods, you will notice that no Pods are running. The event logs contain the message that shows Pod creation failing. You can view the logs with the following command:

$ kubectl -n test-ns get events
7s          Warning   FailedCreate        replicaset/nginx-6799fc88d8
 Error creating: admission webhook "validation.gatekeeper.sh"
 denied the request: [denied by
 enforcespecificcontainerregistrytestns] container <nginx>
 has an invalid image repo <nginx>, allowed repos are
 ["myrepo.com"]

14.4.4 Multi-tenancy

To categorize multi-tenancy, look at the level of trust that the tenants have with one another, and then develop your model from that. There are three basic buckets, or security models, for multi-tenancy:

  • High trust (same company)—Different departments in the same company are running workloads on the same cluster.

  • Medium to low trust (different companies)—External customers are running applications on your cluster in different namespaces.

  • Zero trust (data governed by law)—Different applications are running data that is governed by laws, where allowing access between different data stores can cause legal action against your company.

The Kubernetes community has worked on solving these use cases for many years. Jessie Frazelle sums it up nicely in her blog post entitled “Hard Multi-Tenancy in Kubernetes”:

The models for multi-tenancy have been discussed at length in the community’s multi-tenancy working group. . . . There have also been some proposals offered to solve each model. The current model of tenancy in Kubernetes assumes the cluster is the security boundary. You can build a SaaS on top of Kubernetes but you will need to bring your own trusted API and not just use the Kubernetes API. Of course, with that comes a lot of considerations you must also think about when building your cluster securely for a SaaS even.

—Jessie Frazelle, http://mng.bz/1jdn

The Kubernetes API was not built with the concept of having multiple isolated customers within the same cluster. Docker Engine and other container runtimes also have problems with running malicious or untrusted workloads. Software components like gVisor have made headway in properly sandboxing containers, but at the time of writing this book, we are not at a place where you can run a completely untrusted container.

So where are we? A security person would say it depends on your trust and security model. We previously listed three security models: high trust (same company), medium to low trust (different companies), and zero trust (data governed by law). Kubernetes can support high-trust multi-tenancy and, depending on the model, trust levels between high and low. When you have zero or low trust between tenants, you need to use separate clusters for each client. Some companies run hundreds of clusters so that each small group of applications gets its own cluster, but, admittedly, that is a lot of clusters to manage.

Even if the clients are part of the same company, it may be necessary to isolate Pods on specific nodes due to data sensitivity. Through RBAC, namespaces, network policies, and node isolation, it’s possible to gain a decent level of isolation. Admittedly, there is risk in hosting workloads from different companies on the same Kubernetes cluster. The support for multi-tenancy will grow over time.

Note Multi-tenancy also applies to running other environments, like development or test, in a production cluster. Keep in mind that intermingling environments can introduce bad actor code into the production cluster.

There are two main challenges in using a single Kubernetes cluster to host multiple customers: the API server and node security. After establishing authentication, authorization, and RBAC, why is there still a problem with the API server and multiple tenants? One of the problems is the URI layout of the API server. A common pattern for having multiple tenants share the same API is to start the URI with a user ID, project ID, or some other unique ID.

Having a URI that starts with a unique ID would allow a tenant to call “get all namespaces” and see only its own. Because Kubernetes does not have this isolation, kubectl get namespaces returns every namespace in the cluster, and you need an API layer on top of the Kubernetes API to provide this form of isolation.

Another pattern for supporting multiple tenants is the capability of nesting resources, and the basic resource boundary in Kubernetes is the namespace; Kubernetes namespaces do not allow for nesting. Many resources also cross namespaces, including the default service account token. Often, tenants want fine-grained RBAC capabilities themselves, and giving a user permission to create RBAC objects within Kubernetes can give that user capabilities beyond their shared tenancy.

Regarding node security, the challenge lies within the nodes themselves. If you have multiple tenants on the same Kubernetes cluster, remember that they all share the following items (this is just a short list):

  • The control plane and the API server

  • Add-ons like the DNS server, logging, or TLS certificate generation

  • Custom resource definitions (CRDs)

  • Networks

  • Host resources

Overview of trusted multi-tenancy

Many companies want multi-tenancy to reduce costs and management overhead. There is value in not running three clusters, one each for development, test, and production environments, and simply running a single Kubernetes cluster for all three. Also, some companies do not want separate clusters for different products and/or software departments. Again, this is a business and security decision, and the organizations we work with usually have budget and staffing constraints.

We are not going to give you step-by-step instructions on how to do multi-tenancy. We are simply providing guidelines on what steps you will need to implement. These steps will change over time and vary between organizations with different security models:

  1. Write down and design a security model. It may seem obvious, but we have seen organizations that do not use a security model. A security model needs to include different user roles, including cluster administrators and namespace administrators, and one or more tenant roles. A standardized naming convention for all of the API objects, users, and other components your organization creates is also critical.

  2. Utilize various API objects:

    • Namespaces
    • NetworkPolicies
    • ResourceQuotas
    • ServiceAccounts and RBAC rules
  3. Using tools like a service mesh with features like mutual TLS and network policy management can provide another level of security. Using a service mesh does add a significant layer of complexity, so only use it when needed.

  4. Consider using an OPA to assist with applying policy-based controls to the Kubernetes cluster.

tip If you are going to combine multiple environments in a single cluster, there are not only security concerns, but also challenges with testing Kubernetes upgrades. It is best to test upgrades first on another cluster.

14.5 Kubernetes tips

Here is a short list of various configurations and setup requirements:

  • Have a private API server endpoint, and if you can, do not expose your API server to the internet.

  • Use RBAC.

  • Use network policies.

  • Do not enable username and password authentication on the API server.

  • Use specific users when creating Pods, and don’t use the default admin accounts.

  • Rarely allow Pods to run on the host network.

  • Use serviceAccountName if the Pod needs to access the API server; otherwise, set automountServiceAccountToken to false.

  • Use resource quotas on namespaces and define limits in all Pods.

Summary

  • Node security relies on TLS certificates to secure communication between nodes and the control plane.

  • Using immutable OSs can further harden nodes.

  • Resource limits can prevent resource-level attacks.

  • Use the Pod network, unless you have to use the host network. The host network allows a Pod to talk to the node OS.

  • RBAC is key to securing an API server. It is non-trivial, but necessary.

  • The IAM service accounts allow for the proper isolation of Pod permissions.

  • Network policies are key to isolating network traffic. Otherwise, everything can talk to everything else.

  • An Open Policy Agent (OPA) allows a user to write security policies and enforces those policies on a Kubernetes cluster.

  • Kubernetes was not built initially with zero trust multi-tenancy in mind. You’ll find forms of multi-tenancy, but they come with tradeoffs.
