Have you ever wondered how you can ensure that all containers running on a cluster come only from an approved container registry? Or maybe you’ve been asked to ensure that services are never exposed to the internet. These are precisely the questions that policy and governance for your cluster set out to answer. As Kubernetes matures and is adopted by more and more enterprises, the question of policy and governance comes up increasingly often. Although this area is still relatively new and evolving, in this chapter we share what you can do to make sure that your cluster is in compliance with the defined policies of your enterprise.
Whether you operate in a highly regulated environment—for example, health care or financial services—or you simply want to maintain a level of control over what’s running on your clusters, you’re going to need a way to implement the stated policies of the enterprise. After these policies are defined, you will need to determine how to implement them and how to maintain clusters that are compliant with them. These policies might be in place to meet regulatory requirements or simply to enforce best practices. Whatever the reason, you must be sure that you do not sacrifice developer agility and self-service when implementing them.
In Kubernetes, policy is everywhere. Whether it be network policy or pod security policy, we’ve all come to understand what policy is and when to use it. We trust that whatever is declared in a Kubernetes resource specification is implemented as per the policy definition. Both network policy and pod security policy are implemented at runtime. However, who governs the content that is actually defined in these Kubernetes resource specifications? That’s the job of policy and governance. Rather than implementing policy at runtime, when we talk about policy in the context of governance, what we mean is defining policy that controls the fields and values in the Kubernetes resource specifications themselves. Only Kubernetes resource specifications that are compliant with these policies are allowed and committed to the cluster state.
To be able to make decisions about what resources are compliant, we need a policy engine that is flexible enough to meet a variety of needs. The Open Policy Agent (OPA) is an open source, flexible, lightweight policy engine that has become increasingly popular in the cloud-native ecosystem. Having OPA in the ecosystem has allowed many implementations of different Kubernetes governance tools to appear. One such Kubernetes policy and governance project the community is rallying around is called Gatekeeper. For the rest of this chapter, we use Gatekeeper as the canonical example to illustrate how you might achieve policy and governance for your cluster. Although there are other implementations of policy and governance tools in the ecosystem, they all seek to provide the same user experience (UX) by allowing only compliant Kubernetes resource specifications to be committed to the cluster.
Gatekeeper is an open source, customizable Kubernetes admission webhook for cluster policy and governance. Gatekeeper takes advantage of the OPA constraint framework to enforce custom resource definition (CRD)-based policies. Using CRDs allows for an integrated Kubernetes experience that decouples policy authoring from implementation. Policy templates are referred to as constraint templates, which can be shared and reused across clusters. Gatekeeper enables resource validation and audit functionality. One of the great things about Gatekeeper is that it’s portable, which means that you can implement it on any Kubernetes cluster, and if you are already using OPA, you might be able to port that policy over to Gatekeeper.
Gatekeeper is still under active development and is subject to change. For the most recent updates on the project, visit the official upstream repository.
It’s important not to get stuck in the weeds; instead, let’s consider the actual problems that we are trying to solve. For context, here are some policies that address some of the most common compliance issues:
Services must not be exposed publicly on the internet.
Allow containers only from trusted container registries.
All containers must have resource limits.
Ingress hostnames must not overlap.
Ingresses must use only HTTPS.
Gatekeeper has adopted much of the same terminology as OPA. It’s important that we cover what that terminology is so that you can understand how Gatekeeper operates. Gatekeeper uses the OPA constraint framework. Here, we introduce three new terms:
Constraint
Rego
Constraint template
The best way to think about constraints is as restrictions that you apply to specific fields and values of Kubernetes resource specifications. This is really just a long way of saying policy. When you define a constraint, you are effectively stating that you do NOT want to allow something. The implication of this approach is that resources are implicitly allowed unless a constraint issues a deny: instead of enumerating the Kubernetes resource specification fields and values you want to allow, you deny only the ones you do not want. This architectural decision suits Kubernetes resource specifications nicely because they are ever changing.
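To make the deny-based semantics concrete, here is a minimal Rego rule sketch. The package name, field, and message are illustrative only, not taken from a real Gatekeeper template; anything that does not match a deny rule is implicitly allowed:

```rego
package sketch

# Deny only what we explicitly disallow; every resource that does
# not match a deny rule is implicitly allowed.
deny[{"msg": msg}] {
    # Illustrative check: reject pods that request host networking.
    input.review.object.spec.hostNetwork == true
    msg := "hostNetwork is not allowed"
}
```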
A constraint template is a custom resource definition (CRD) that provides a means of templating policy so that it can be shared and reused. In addition, parameters for the policy can be validated. Let’s take a look at a constraint template in the context of the earlier examples. In the following example, we share a constraint template that provides the policy “Allow containers only from trusted container registries”:
```yaml
apiVersion: templates.gatekeeper.sh/v1alpha1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
        listKind: K8sAllowedReposList
        plural: k8sallowedrepos
        singular: k8sallowedrepos
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        deny[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          satisfied := [good | repo = input.constraint.spec.parameters.repos[_] ; good = startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.constraint.spec.parameters.repos])
        }
```
The constraint template consists of three main components:

Name
The name is the most important part; we reference it later when we create constraints.

Schema
Indicated by the validation field, this section defines the input parameters and their associated types. In this example, we have a single parameter called repos that is an array of strings.

Rego
Indicated by the targets field, this section contains templated Rego (the language used to define policy in OPA). Using a constraint template allows the templated Rego to be reused, meaning that generic policy can be shared. If the rule matches, the constraint is violated.
To use the previous constraint template, we must create a constraint resource. The purpose of the constraint resource is to provide the necessary parameters to the constraint template that we created earlier. You can see that the kind of the resource defined in the following example is K8sAllowedRepos, which maps to the constraint template defined in the previous section:
```yaml
apiVersion: constraints.gatekeeper.sh/v1alpha1
kind: K8sAllowedRepos
metadata:
  name: prod-repo-is-openpolicyagent
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - "production"
  parameters:
    repos:
      - "openpolicyagent"
```
Notice that this constraint is of kind K8sAllowedRepos, which matches the kind defined in the constraint template. The constraint itself consists of two main sections:

Match criteria
The match field defines the scope of intent for the policy. In this example, we are matching pods only in the production namespace.

Parameters
The parameters define the intent for the policy. Notice that they match the type from the constraint template schema in the previous section. In this case, we allow only container images that start with openpolicyagent.
Constraints have the following operational characteristics:

Logically AND-ed together
When multiple constraints validate the same field, a violation of any one of them causes the whole request to be rejected.

Schema validation
Parameter schemas in constraint templates allow for early error detection.

Selection criteria
Constraints can use label selectors, constrain only certain kinds, and constrain only certain namespaces.
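As an illustration of the selection criteria, the following sketch scopes a constraint with a label selector instead of a namespace. The constraint name and the team label are hypothetical, and the example assumes the K8sAllowedRepos template defined earlier:

```yaml
apiVersion: constraints.gatekeeper.sh/v1alpha1
kind: K8sAllowedRepos
metadata:
  name: payments-repo-is-openpolicyagent
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    # Only pods carrying this label are evaluated.
    labelSelector:
      matchLabels:
        team: payments
  parameters:
    repos:
      - "openpolicyagent"
```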
In some cases, you might want to compare the current resource against
other resources that are in the cluster, for example, in the case of
“Ingress hostnames must not overlap.” OPA needs to have all of the other
Ingress resources in its cache in order to evaluate the rule.
Gatekeeper uses a config
resource to manage which data is cached in OPA
in order to perform evaluations such as the one previously mentioned. In
addition, config
resources are also used in the audit functionality, which we explore a bit later on.
The following example config resource caches v1 services, pods, and namespaces:
```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - kind: Service
        version: v1
      - kind: Pod
        version: v1
      - kind: Namespace
        version: v1
```
Gatekeeper enables real-time feedback to cluster users for resources that violate defined policy. If we consider the example from the previous sections, we allow containers only from repositories that start with openpolicyagent. Let’s try to create the following resource, which is not compliant with the current policy:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: opa
  namespace: production
spec:
  containers:
    - name: opa
      image: quay.io/opa:0.9.2
```
This gives you the violation message that’s defined in the constraint template:
$
kubectl create -f bad_resources/opa_wrong_repo.yaml Error from server(
container <opa> has an invalid image repo <quay.io/opa:0.9.2>, allowed repos are[
"openpolicyagent"
])
: error when creating"bad_resources/opa_wrong_repo.yaml"
: admission webhook"validation.gatekeeper.sh"
denied the request: container <opa> has an invalid image repo <quay.io/opa:0.9.2>, allowed repos are[
"openpolicyagent"
]
Thus far, we have discussed only how to define policy and have it enforced as part of the request admission process. How do you handle a cluster that already has resources deployed, when you want to know which of them comply with the defined policy? That is exactly what audit sets out to achieve. When using audit, Gatekeeper periodically evaluates resources against the defined constraints. This helps with the detection of resources that are misconfigured according to policy and allows for remediation. The audit results are stored in the status field of the constraint, making them easy to find by simply using kubectl. To use audit, the resources to be audited must be replicated. For more details, refer to “Data Replication”.
Let’s take a look at the constraint called prod-repo-is-openpolicyagent that you defined in the previous section:

```
$ kubectl get k8sallowedrepos prod-repo-is-openpolicyagent -o yaml
apiVersion: constraints.gatekeeper.sh/v1alpha1
kind: K8sAllowedRepos
metadata:
  creationTimestamp: "2019-06-04T06:05:05Z"
  finalizers:
  - finalizers.gatekeeper.sh/constraint
  generation: 2820
  name: prod-repo-is-openpolicyagent
  resourceVersion: "4075433"
  selfLink: /apis/constraints.gatekeeper.sh/v1alpha1/k8sallowedrepos/prod-repo-is-openpolicyagent
  uid: b291e054-868e-11e9-868d-000d3afdb27e
spec:
  match:
    kinds:
    - apiGroups:
      - ""
      kinds:
      - Pod
    namespaces:
    - production
  parameters:
    repos:
    - openpolicyagent
status:
  auditTimestamp: "2019-06-05T05:51:16Z"
  enforced: true
  violations:
  - kind: Pod
    message: container <nginx> has an invalid image repo <nginx>, allowed repos
      are ["openpolicyagent"]
    name: nginx
    namespace: production
```
Upon inspection, you can see the last time the audit ran in the
auditTimestamp
field. We also see all of the resources that violate this
constraint under the violations
field.
The Gatekeeper repository ships with fantastic demonstration content that walks you through a detailed example of building policies to meet compliance for a bank. We would strongly recommend walking through the demonstration for a hands-on approach to how Gatekeeper operates. You can find the demonstration in this Git repository.
The Gatekeeper project is continuing to grow and is looking to solve other problems in the areas of policy and governance, which includes features like these:
Mutation (modifying resources based on policy; for example, add these labels)
External data sources (integration with Lightweight Directory Access Protocol [LDAP] or Active Directory for policy lookup)
Authorization (using Gatekeeper as a Kubernetes authorization module)
Dry run (allow users to test policy before making it active in a cluster)
If these sound like interesting problems that you might be willing to help solve, the Gatekeeper community is always looking for new users and contributors to help shape the future of the project. If you would like to learn more, head over to the upstream repository on GitHub.
You should consider the following best practices when implementing policy and governance on your clusters:
If you want to enforce a specific field in a pod, you need to determine which Kubernetes resource specification you want to inspect and enforce. Consider Deployments, for example. Deployments manage ReplicaSets, which manage pods. We could enforce at all three levels, but the best choice is the lowest handoff point before the runtime, which in this case is the pod. This decision, however, has implications. The user-friendly error message shown when we try to deploy a noncompliant pod, as seen in “UX”, is not going to be displayed, because the user is not creating the noncompliant resource; the ReplicaSet is. This means that the user would need to determine that the resource is not compliant by running kubectl describe on the current ReplicaSet associated with the Deployment. Although this might seem cumbersome, it is consistent with the behavior of other Kubernetes features, such as pod security policy.
Constraints can be applied to Kubernetes resources on the following criteria: kinds, namespaces, and label selectors. We would strongly recommend scoping the constraint to the resources to which you want it to be applied as tightly as possible. This ensures consistent policy behavior as the resources on the cluster grow, and means that resources that don’t need to be evaluated aren’t being passed to OPA, which can result in other inefficiencies.
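As a sketch, a tightly scoped match section might combine all three criteria: kinds, namespaces, and a label selector. The label key and value here are hypothetical:

```yaml
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - "production"
    # Further narrow to explicitly labeled resources.
    labelSelector:
      matchLabels:
        gatekeeper-enforced: "true"
```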
Synchronizing and enforcing on potentially sensitive data, such as Kubernetes Secrets, is not recommended. Because OPA holds this data in its cache (if it is configured to replicate it) and resources are passed to Gatekeeper, doing so leaves surface area for a potential attack.
If you have many constraints defined, a deny from any single constraint means that the entire request is denied. There is no way to make constraints function as a logical OR.
In this chapter, we covered why policy and governance are important and walked through a project that’s built upon OPA, a cloud-native ecosystem policy engine, to provide a Kubernetes-native approach to policy and governance. You should now be prepared and confident the next time the security team asks, “Are our clusters in compliance with our defined policy?”