8

YAML and Kubernetes Manifests

We can’t talk about GitOps and Argo CD without having a chapter dedicated to YAML Ain’t Markup Language (YAML). We wrote a lot of YAML in all the chapters so far, and I expect you will write a lot more once you start using Argo CD, so we are going to look at some ways to statically analyze it. First, we will take a close look at the most common templating engines, Helm and Kustomize, and how we can use them to generate the final manifests our GitOps engine is going to apply. Then, we will look at a tool that can validate the manifests we create against the Kubernetes schema. After this, we will check the most common best practices to enforce on our manifests, which help us introduce stability and predictability into the system. And we will finish the chapter by introducing one of the most interesting tools to use in pipelines for performing extended checks over YAML: conftest, which allows you to write your own rules in a language called Rego.

The main topics we are going to cover in this chapter are listed here:

  • Working with templating options
  • Exploring types of validation
  • Validating a Kubernetes schema
  • Enforcing best practices for your manifests
  • Performing extended checks with conftest

Technical requirements

In this chapter, we will concentrate on the steps we can take before the YAML is merged to the GitOps repository. Once that code reaches the main branch, it will be applied to the cluster by Argo CD, so we will take a look at the possible validations we can perform prior to the merge. While it is not mandatory to have a running installation of Argo CD, it will still be good to have one so that you can check the installation of the applications we will create toward the end. We will need to have Helm (https://helm.sh/docs/intro/install/) and Kustomize (https://kubectl.docs.kubernetes.io/installation/kustomize/) installed. We will also use Docker (https://docs.docker.com/get-docker/) to run containers for all the tools we will use in the demos, so we will not install those tools; instead, we will be using their container images. All the code we will be writing can be found in our official repository at https://github.com/PacktPublishing/ArgoCD-in-Practice, in the ch08 folder.

Working with templating options

We want to take a look at the main YAML templating options, Helm and Kustomize, and how you can get the best out of them when used with Argo CD. We are not going to introduce how these work as we expect that you have some knowledge of these tools. If you are not familiar with them, please follow their official guides—for Helm, we have https://helm.sh/docs/intro/quickstart/, and for Kustomize, there is https://kubectl.docs.kubernetes.io/guides/. Instead, we will be focusing on how you can generate manifests from templates in the same way as done by Argo CD.

Helm

Helm is probably the most used templating option for Kubernetes manifests, and because it is so widely adopted, you will likely deploy most of your applications using Helm charts. The easiest way to start installing Helm charts into a cluster is to use the native declarative support of Argo CD applications. We will see how to deploy the Traefik chart (https://github.com/traefik/traefik-helm-chart) with this approach. Traefik plays the role of an ingress controller, allowing us to handle incoming traffic in our Kubernetes clusters: it makes the connection between an external endpoint and an internal Service and also enables us to define all sorts of middleware components.

So, we want to deploy the Traefik chart, version 9.14.3, and for starters, we will override some parameters from their default values: we want three replicas instead of one; we want to enable a PodDisruptionBudget (PDB), which makes it possible to define how many Pods can be unavailable or available in case of an unexpected event (a cluster upgrade is one example of such an event); and we want the log level set to INFO and the access logs enabled.

You can find the Application in the ch08/Helm/traefik-application.yaml file in the official repository of this book, and you can apply it to an existing installation like this:

kubectl apply -f ch08/Helm/traefik-application.yaml -n argocd

This is what the Application should look like:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: traefik
  namespace: argocd
spec:
  project: default
  source:
    chart: traefik
    repoURL: https://helm.traefik.io/traefik
    targetRevision: 9.14.3
    helm:
      parameters:
        - name: image.tag
          value: "2.4.2"
        - name: deployment.replicas
          value: "3"
        - name: podDisruptionBudget.enabled
          value: "true"
        - name: logs.general.level
          value: "INFO"
        - name: logs.access.enabled
          value: "true"
  destination:
    server: https://kubernetes.default.svc
    namespace: traefik
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true

These are just a few examples of parameters that we can override. In a real installation, we will have many more that differ from the default ones. It’s likely we will need the Service to be of type LoadBalancer and then control different settings with the help of annotations. Then, we will need to set resources for the containers, maybe some additional Pod annotations if we are using a Service mesh, and so on. The problem I am trying to highlight is that, for one, the Application file can become really big because of the many parameters to overwrite, and also that we are creating tight coupling between the Argo CD Application details and the Helm chart details. It would be much better if we could separate these two parts: the Application definition and the Helm chart.

And that is possible if we use the Helm chart we want to deploy as a subchart. You might also sometimes come across the term umbrella chart: we create a chart that defines a dependency on the needed chart, so the needed chart becomes a subchart and the main chart becomes the umbrella. In this case, we will have a folder called traefik-umbrella, and inside the folder, we need at least two files to define the subchart. The first file, Chart.yaml, should look like this:

name: traefik-umbrella
apiVersion: v2
version: 0.1.0
dependencies:
- name: traefik
  version: 9.14.3
  repository: "https://helm.traefik.io/traefik"
  alias: traefik

And then, the values.yaml file will only contain our overwrites, as illustrated in the following code snippet:

traefik:
  image:
    tag: "2.4.2"
  deployment:
    replicas: 3
  podDisruptionBudget:
    enabled: true
  logs:
    general:
      level: "INFO"
    access:
      enabled: "true"

Next, we create an Application that points to the folder where these two files are located (a minimal sketch follows). Argo CD notices the Chart.yaml file and realizes it is dealing with a Helm chart. First, it will download all the dependencies using the helm dependency update command, after which it will call helm template to generate the manifests that will get applied to the cluster. This process can sometimes end in an error (for example, a timeout while downloading the dependencies or a dependency version that doesn’t exist), and this will only be visible in the Argo CD Application state. So, it would be good if we could catch such errors before the Helm chart is processed by Argo CD, such as in a pipeline that runs before merging the YAML changes to the main branch.
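Such an Application could look like the following minimal sketch; the repoURL and path values here are placeholders, and you should point them at your own GitOps repository and the folder holding the umbrella chart:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: traefik
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-repo.git # placeholder
    targetRevision: main
    path: ch08/Helm/traefik-umbrella # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: traefik
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true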

Following the subchart approach, when we do an upgrade, we only need to modify the version in Chart.yaml; for example, in our case, we can upgrade from 9.14.3 to 9.15.1. The update looks simple, but the new chart version could contain many changes: resources added, default values modified, and other things that could affect the application behavior. So, it would be good if we could get an indication of what types of changes the new version introduces.

Note – Argo CD updates CRDs automatically

If you are familiar with how Helm 3 handles Custom Resource Definitions (CRDs), you would know that they are not updated automatically (only the initial creation is done automatically). Instead, you need to manually handle the update: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/. But Argo CD will not treat them in a special way, and they will be updated as any other resource. That’s because Helm is only used to generate manifests, after which they are applied with kubectl.

We can do all this, catching any errors in the Helm YAML templating and checking for the introduced changes, with a simple script that can run in our pipeline (the script can be found in the official repository, in the ch08/Helm/validate-template.sh file). We fetch the Traefik subchart and then generate all the manifests to an output directory. Then, assuming we are doing this from a feature branch and not the main one (it would be a good idea to run the script as part of a pull request (PR)), we check out main and generate the templates again, this time into a different output folder. In the end, we call diff to get the differences between the two folders where we generated the Kubernetes manifests, and we do this with a || true suffix because diff returns a non-zero exit code when differences are found, while for us differences are expected and not an error. The following snippet shows all the code involved:

helm dependency update traefik-umbrella
helm template traefik-umbrella --include-crds --values traefik-umbrella/values.yaml --namespace traefik --output-dir out
git checkout main
helm dependency update traefik-umbrella
helm template traefik-umbrella --include-crds --values traefik-umbrella/values.yaml --namespace traefik --output-dir out-default-branch
diff -r out-default-branch out || true

If you only want to verify that your changes will not give an error when applied by Argo CD, you will need to run just the first two commands. And if the templates give an error that is not easy to understand, there is always the --debug flag that can be used in the helm template command (more on this at https://helm.sh/docs/chart_template_guide/debugging/).

Because of Helm’s popularity, you will end up installing a lot of Helm charts with Argo CD. The main applications out there already have a Helm chart created that you can easily import and use in your clusters. So, it will be worth investing the extra time into creating a pipeline for dealing with Helm charts—the sooner you do it, the more you can benefit from it, and you will throw fewer issues at Argo CD because you will catch them beforehand with the validations that run in your pipeline.

Kustomize

Kustomize, while not as popular as Helm, is still an option that application developers like to use. Argo CD itself initially offered only plain manifests and Kustomize for its installation, and only after a while was a Helm chart created. I know many people who were early adopters of Argo CD and still deploy it with Kustomize; it had quite significant advantages compared to Helm version 2, though this is no longer the case since Helm version 3 was released.

When Argo CD looks into the Git repository, it recognizes a Kustomize application if it finds a kustomization.yaml file. If the file exists in the repository, it will generate the manifests using the kustomize build command (https://github.com/kubernetes-sigs/kustomize#1-make-a-kustomization-file), and, as usual, when the manifests are ready, they will be applied with kubectl apply to the destination cluster. So, we can apply the same logic as we did with Helm templating: generate the original manifests from the main branch and the updated manifests from the current working branch, then compare them with the diff command and display the result for the user to see. As fewer applications are deployed these days with Kustomize, and because the process is pretty similar to what we did with Helm templates, we will not walk through a full demo, but a minimal sketch follows.
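For reference, a minimal sketch of this flow, assuming a hypothetical kustomize/traefik folder that contains the kustomization.yaml file, could look like this:

kustomize build kustomize/traefik > out.yaml
git checkout main
kustomize build kustomize/traefik > out-default-branch.yaml
diff out-default-branch.yaml out.yaml || true

As with Helm, diff exits with a non-zero code when differences exist, so we append || true to keep the pipeline going.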

Generating the manifests that Argo CD will apply is a good starting point for validating the Kubernetes resources before handing them over to our GitOps operator. If there is an issue, we can find it in advance, avoiding the need to check the application status after the sync to see whether everything went well on the Argo CD side. This way, we can reduce the number of errors thrown after we merge our changes by shifting left and running these checks on every commit. Next, we will go through the types of validations we can perform on our manifests, from the ones that check the YAML structure to the ones that understand Kubernetes resources.

Exploring types of validation

There is a lot of YAML in the Kubernetes world, and we have seen how we can use templating options around it in order to improve it. We usually start with small manifests, and then we keep adding resources to our Helm chart and then more environment variables, an init container, sidecars, and so on. In the end, we will have a big Helm chart or Kustomize manifests that we will apply to the cluster.

So, it would be a good idea to validate them before applying them, in order to catch as many issues as possible before they turn into bigger issues in our clusters that can only be fixed with manual intervention.

We can start with the simplest linting of all, that of the YAML files’ structure, without looking at the semantics, so ignoring the fact that the files contain Kubernetes manifests. A good tool to verify that your YAML content doesn’t have any cosmetic issues, such as overly long lines, trailing spaces, or the more problematic indentation issues, is yamllint: https://github.com/adrienverge/yamllint. It embeds a set of rules that check different things in your YAML files, and it also gives you the possibility to extend it by creating your own rules.

It is easy to start using it in a pipeline because a Docker image is provided, and it can be found in both Docker Hub (https://hub.docker.com/r/pipelinecomponents/yamllint) and GitLab registry (https://gitlab.com/pipeline-components/yamllint). It has extensive documentation (https://yamllint.readthedocs.io/en/stable/index.html) and it provides many configuration options, such as disabling rules or modifying default values (for example, the max length of a YAML line can be configured).
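As a quick sketch, and assuming the image expects the sources to be mounted at /code (its default working directory), we could lint the umbrella chart folder and relax the line-length rule inline like this:

docker run --rm -v "$(pwd):/code" pipelinecomponents/yamllint yamllint -d "{extends: default, rules: {line-length: {max: 120}}}" traefik-umbrella

The -d flag passes configuration data inline; the same settings could also live in a .yamllint file at the repository root.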

We should not expect yamllint to surface big issues in our manifests; in the end, it only checks that the structure is in order. But if we follow its rules, we will end up with YAML manifests that are easy to read and understand.

Once our YAML files look neat, we can go on and introduce more tools we can use for validating our manifests. Because Helm is the most used option for templating, there are also tools created especially for testing Helm charts. The most important one is the official chart-testing project or ct (https://github.com/helm/chart-testing). Besides validating a Helm chart, it can also install it and perform live tests on it (which under the hood will run the helm install and helm test commands). We will look closer at how we can do the linting part as that can give us fast feedback without actually performing an installation, so there is no need for a real Kubernetes cluster.

The command for static validation is ct lint (https://github.com/helm/chart-testing/blob/main/doc/ct_lint.md). Under the hood, it runs helm lint along with many other checks, such as validating the chart version (confirming that you bumped it in your feature branch compared to the default branch) and even checking that the chart maintainers are part of your GitLab top-level group or GitHub organization.

It also supports linting with multiple values.yaml files that you can define in a ci folder directly under your chart, making it easy to test different logic based on the different inputs provided in each YAML file; see the layout sketch after this paragraph.
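Assuming hypothetical file names, such a layout could look like this:

traefik-umbrella/
  Chart.yaml
  values.yaml
  ci/
    ha-values.yaml
    minimal-values.yaml

ct lint will then run the linting once for each *-values.yaml file found under the ci folder.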

Another advantage is that the ct project embeds the yamllint tool we described before and even another one called yamale (https://github.com/23andMe/Yamale), also used for YAML structure linting, so there is no need to use them separately—they will be run once with the ct lint command.

There is a container image provided, which makes it easy to run it as part of continuous integration (CI) pipelines or locally—for example, as part of a Git pre-commit hook (https://quay.io/repository/helmpack/chart-testing).
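For example, a sketch of linting our umbrella chart with the ct container image could look like this (the image tag is an assumption, so check the registry for a current one):

docker run --rm -v "$(pwd):/charts" -w /charts quay.io/helmpack/chart-testing:v3.7.1 ct lint --charts traefik-umbrella --check-version-increment=false --validate-maintainers=false

The two disabled checks, the version bump and the maintainers validation, only make sense when running against a real feature branch in a PR, so we turn them off when experimenting locally.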

Besides these tools used to validate YAML and Helm charts, we have the ones that understand Kubernetes resources, including all the relations and constraints that can be defined for them. So far, there are three types of such tools, as outlined here:

  • Those that can validate the manifests according to a Kubernetes schema version so that we can run checks before upgrading to a newer version.
  • Those that verify the manifests follow a (fixed) list of best practices.
  • Those that allow you to define your own rules for validation. These are harder to use, as we need to learn how to build the new rules, but they give us great flexibility in return.

Next, we will explore a tool that looks the most promising in each of these three categories. We will check out kubeconform to perform application programming interface (API) schema validations, then kube-score for making sure we follow best practices, and in the end, we will take a close look at conftest, a powerful tool that allows us to build our own rules in the Rego language. We will start by looking at how we can prepare for the schema updates introduced by new Kubernetes versions.

Validating a Kubernetes schema

New Kubernetes versions can come with API version deprecations and removals. For example, in release 1.16, the apiextensions.k8s.io/v1 version of CustomResourceDefinition was introduced and apiextensions.k8s.io/v1beta1 was deprecated; later, in version 1.22, v1beta1 was completely removed. So, if you used apiextensions.k8s.io/v1beta1 for CustomResourceDefinitions starting with Kubernetes version 1.16, you would get a deprecation warning, while if you used it with version 1.22, you would get an error because the version doesn’t exist anymore.

Usually, the problem is not only about applying older and unsupported API versions to a Kubernetes cluster; it is more likely that you already have a deprecated version installed with some applications, after which you upgrade the cluster to a version where the API is completely removed. Normally, you would catch this while upgrading the development or testing clusters, but there is always a slight chance of missing the error and ending up with it in your production cluster.

So, it would be better if we could verify the manifests against specific versions of Kubernetes before we actually apply them, and we can achieve this in a few ways. First, we can run kubectl apply with the --dry-run=server option, which takes the manifests and sends them to the server for validation without persisting the changes. There is also a corresponding --dry-run flag for the helm install command. This approach works very well, but the problem is that you need a Kubernetes cluster with that specific version up and running. For example, if you currently run Kubernetes 1.20 in production and want to validate your manifests for versions 1.21 and 1.22, you would need one cluster for each version. If you start them on demand, it takes time (at least a couple of minutes nowadays), so the feedback is not fast enough to wait for on every commit. If you keep all three versions up and running all the time, you will pay a higher cost to the cloud provider for the infrastructure, and supporting them means additional effort.
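As a quick illustration, the two dry-run variants could look like this (the release name and namespace are assumptions):

kubectl apply --recursive -f out/ --dry-run=server
helm install traefik traefik-umbrella --namespace traefik --dry-run

Neither command persists any change, and the server-side dry run gives the most accurate validation because the real API server checks the resources.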

Another way to validate your manifests would be to use the schema of the Kubernetes API. This is described in the OpenAPI format (https://kubernetes.io/docs/concepts/overview/kubernetes-api/), with the schema file being in the Kubernetes Git repository, which can be transformed into a JavaScript Object Notation (JSON) schema that can then be used to validate our YAML manifests (as YAML is a superset of JSON). And we have tools that do all these transformations and validations automatically, so you don’t need to deal with them, just pass via parameters the Kubernetes version to use for validation.

The first tool that made this possible was kubeval (https://github.com/instrumenta/kubeval), but lately, it doesn’t look as though the project is under active development anymore. Its latest version, v0.16.1, is around a year old, and no PRs have been accepted for many months now. This opened the way for other tools to accomplish schema validation, and among them, kubeconform (https://github.com/yannh/kubeconform) seems to be the most advanced one.

So, let’s see how we can validate the manifests we generated for Traefik against different schema versions. We will not need to install kubeconform, as we will use its container image to run the tool. The latest version right now is v0.4.12, and I will be using the amd64 version of the container image, as that’s the most common (and also my) central processing unit (CPU) architecture. There are also Advanced RISC Machine (ARM) container versions; you can find them all here: https://github.com/yannh/kubeconform/pkgs/container/kubeconform. Going back to the Traefik Helm chart, we will first need to generate the manifests, and then we will run kubeconform validations for versions 1.21 and 1.22.

CRDs are a little different from other types in Kubernetes; their schema is based on OpenAPI v3, not v2 (https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions). That’s why kubeconform (and kubeval, too) can’t handle validations for them very well: these tools fail to import the OpenAPI v3 schema. So, if you run kubeconform directly to validate CRDs, you will get an error that the schema was not found. We will see this in action for our Traefik Helm chart; the commands can be found in the ch08/kubeconform/validate-schema.sh file. We will run kubeconform with its container image, passing the path where the manifests were generated as a volume to the container (this will work only for Linux distributions and macOS; for Windows, please replace $(pwd)/out with the full path to the out folder). The code is illustrated here:

helm dependency update traefik-umbrella
helm template traefik-umbrella --include-crds --values traefik-umbrella/values.yaml --namespace traefik --output-dir out
docker run -v $(pwd)/out:/templates ghcr.io/yannh/kubeconform:v0.4.12-amd64 -kubernetes-version 1.21.0 /templates

The command will give us several warnings, such as these:

/templates/traefik-umbrella/charts/traefik/crds/traefikservices.yaml - CustomResourceDefinition traefikservices.traefik.containo.us failed validation: could not find schema for CustomResourceDefinition
/templates/traefik-umbrella/charts/traefik/templates/dashboard-hook-ingressroute.yaml - IngressRoute RELEASE-NAME-traefik-dashboard failed validation: could not find schema for IngressRoute

This means we are missing schema definitions for these types, and we have two options to choose from: either we skip validating them or we provide the missing definitions. In our case, we would still like to validate CRDs, but we can skip IngressRoute, as that is something introduced by the Traefik chart and not related to a Kubernetes version. To skip it, we have the -skip flag. For the CRD definition, there are already generated schemas under https://jenkins-x.github.io/jenkins-x-schemas/. For validating version 1.21, we will use apiextensions.k8s.io/v1beta1, as it is still available, while for version 1.22, we will use apiextensions.k8s.io/v1, as the previous one was removed. For version 1.21, this will be the command:

docker run -v $(pwd)/out:/templates ghcr.io/yannh/kubeconform:v0.4.12-amd64 -kubernetes-version 1.21.0 -skip IngressRoute -schema-location default -schema-location 'https://jenkins-x.github.io/jenkins-x-schemas/apiextensions.k8s.io/v1beta1/customresourcedefinition.json' /templates

And we will not get any error, which is the expected behavior. Then, the command for version 1.22 will look like this:

docker run -v $(pwd)/out:/templates ghcr.io/yannh/kubeconform:v0.4.12-amd64 -kubernetes-version 1.22.0 -skip IngressRoute -schema-location default -schema-location 'https://jenkins-x.github.io/jenkins-x-schemas/apiextensions.k8s.io/v1/customresourcedefinition.json' /templates

And here, we get a list of errors, one for every CRD, similar to the following item, which is what we were expecting because of the differences between v1beta1 and v1 of apiextensions.k8s.io:

/templates/traefik-umbrella/charts/traefik/crds/ingressroute.yaml - CustomResourceDefinition ingressroutes.traefik.containo.us is invalid: For field spec: Additional property version is not allowed

Note – Kubernetes is moving to OpenAPI v3

Starting with version 1.23, Kubernetes is migrating to OpenAPI v3, for now only in an alpha state. The main advantage for Kubernetes schema validation tools is that the schema is now published in Git in v3 format, so there is no need to transform CRDs from v3 to v2 to validate them: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2896-openapi-v3.

There is an active discussion in kubeconform GitHub issues on how to make the validations of CRDs easier. You can follow it here: https://github.com/yannh/kubeconform/issues/51.

Validating the manifests against each version’s schema in CI, before applying the changes, allows us to shift left and catch possible issues as early as possible. This is one type of validation we can do by statically analyzing the YAML manifests, but there are others we can perform. So, next, we are going to see a tool that checks Kubernetes manifests against a series of best practices.

Enforcing best practices for your manifests

When you go live with your applications on Kubernetes, you want things to go as smoothly as possible. In order to achieve that, you need to prepare in advance and stay up to date with all the important items when deploying on Kubernetes, from making sure that all Pods have defined readiness and liveness probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and that the containers have memory and CPU limits and requests (https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/), up to checking that the latest tag is not used for container images (https://kubernetes.io/docs/concepts/containers/images/). Then, you need to make sure that the product teams follow these practices when building their Helm charts. Most likely, your organization plans to add more and more microservices to production, and you will keep finding new and more complex practices that developers need to be aware of. So, ideally, it would be great if you could automate such validations and add them as a gate in your CI pipelines.

The good thing is that there are tools that can accomplish such checks, and two of the best known are kube-score (https://github.com/zegl/kube-score) and kube-linter (https://github.com/stackrox/kube-linter). They are pretty similar in terms of the end result, and both are easy to add to CI pipelines as they provide container images. kube-score is the older project, so it might be used by more people, but kube-linter has more GitHub stars (2k for kube-linter compared to 1.8k for kube-score, as of August 2022). In the end, I will go with kube-score for our demo, as I like that it provides container images that also bundle Helm or Kustomize, which will make our task of generating the manifests to scan a little easier; besides, kube-linter is, right now, declared to be in an alpha stage. That may change in the future, so if you have to pick one or the other, you should make your own thorough analysis.

The checks that kube-score performs are categorized as critical or warning, and the tool automatically returns exit code 1 if it finds any critical issues. It has a flag, --exit-one-on-warning, to also exit with code 1 if only warnings have been found. There is also a list of optional validations that you need to enable with the --enable-optional-test flag followed by the name of the test, as well as the possibility to ignore a check that is performed by default with the --ignore-test flag followed by its name, so if you know that you are not ready for some of them yet, you can stop checking them for a period. An example using the warning flag follows.
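For instance, a stricter run that also fails on warnings could look like this, using the same image and generated manifests as in the demo that follows:

docker run -v $(pwd)/out:/out -w / zegl/kube-score:v1.14.0 score --exit-one-on-warning out/traefik-umbrella/charts/traefik/**/*.yaml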

Next, let’s see how we can perform some checks for our Traefik installation and how we can address some of the findings, skip some of them, or enable some of the optional ones. We will do this using the container image of the 1.14.0 version: zegl/kube-score:v1.14.0. kube-score also provides container images with Helm 3 or Kustomize bundled, so if your continuous integration/continuous deployment (CI/CD) engine supports running containers directly (such as GitLab CI/CD), you can use those and perform the manifest generation inside the container. In our case, we will create a script that uses the container image as we would run it in a virtual machine (VM). We will perform a check without any parameters, based on the default values. For a complete list of which tests are enabled by default and which are optional, you can check out the following link: https://github.com/zegl/kube-score/blob/master/README_CHECKS.md.

It is a little harder to get the docker run command running this time because kube-score expects the exact YAML filename as input, so the part with globs from the path (/**/*.yaml) is handled by the shell. To make it work in our case, the relative paths inside the container and on the machine have to be the same. For this, I set the working directory in the container to / (with the -w flag), while on the machine we need to be in the directory where the out folder is. The script can be found in the official repository, in the file located at ch08/kube-score/enforcing-best-practices.sh. The code is illustrated here:

helm dependency update traefik-umbrella
helm template traefik-umbrella --include-crds --values traefik-umbrella/values.yaml --namespace traefik --output-dir out
docker run -v $(pwd)/out:/out -w / zegl/kube-score:v1.14.0 score out/traefik-umbrella/charts/traefik/**/*.yaml

The command results in a series of critical issues and warnings, most of them being on the Deployment manifest. We will post only a few of them here:

apps/v1/Deployment RELEASE-NAME-traefik                                       
    [CRITICAL] Container Image Pull Policy
        · RELEASE-NAME-traefik -> ImagePullPolicy is not set to Always
            It's recommended to always set the ImagePullPolicy to Always, to
            make sure that the imagePullSecrets are always correct, and to
            always get the image you want.
........................................
    [CRITICAL] Container Resources
        · RELEASE-NAME-traefik -> CPU limit is not set
            Resource limits are recommended to avoid resource DDOS. Set
            resources.limits.cpu
        · RELEASE-NAME-traefik -> Memory limit is not set
            Resource limits are recommended to avoid resource DDOS. Set
            resources.limits.memory
        · RELEASE-NAME-traefik -> CPU request is not set
            Resource requests are recommended to make sure that the application
            can start and run without crashing. Set resources.requests.cpu
        · RELEASE-NAME-traefik -> Memory request is not set
            Resource requests are recommended to make sure that the application
            can start and run without crashing. Set resources.requests.memory
    [WARNING] Deployment has host PodAntiAffinity
        · Deployment does not have a host podAntiAffinity set
            It's recommended to set a podAntiAffinity that stops multiple pods
            from a deployment from being scheduled on the same node. This
            increases availability in case the node becomes unavailable.

Here is what the command would look like if we wanted to skip the check for the container image pull policy being Always and enable the optional test that validates that all your containers have a seccomp policy configured, which is used to restrict a container’s system calls (syscalls):

docker run -v $(pwd)/out:/out -w / zegl/kube-score:v1.14.0 score --ignore-test container-image-pull-policy --enable-optional-test container-seccomp-profile out/traefik-umbrella/charts/traefik/**/*.yaml

kube-score performs more than 30 checks today if we count both the default and the optional ones, but even with this number, it is not a comprehensive check of your manifests. It does try to give you some insights into things that can possibly go wrong, and because of this, it can also be considered a good learning resource. When you start checking out the details of the tests it performs, you can find a lot of relevant information on how to make your cluster stable and reliable.

Performing extended checks with conftest

Open Policy Agent (OPA) (https://www.openpolicyagent.org) is an engine that can validate objects prior to performing a change on them. Its main advantage lies in the fact that it doesn’t come with a predefined list of checks; instead, its policies are extensible because they are based on rules created in the Rego language (https://www.openpolicyagent.org/docs/latest/policy-language/). You might have heard of OPA in conjunction with Kubernetes: it can be used as an admission controller (a part usually handled by the Gatekeeper project: https://github.com/open-policy-agent/gatekeeper) to pre-validate the objects you want to apply in a cluster. OPA is really successful at adding policy-as-code checks for Kubernetes, but it is more than that: it is an engine that can run almost anywhere we have a runtime, including in our CI/CD pipelines.

For Kubernetes, you can create your own custom rules to be enforced by OPA. For example, you can have a policy that says every namespace needs to have a label that specifies the team that owns it, or a policy that states that every container you deploy on the cluster needs to come from a pre-approved list of registries, or even of registry users or organizations. But it would be even better if we could run such policies in the pipeline without actually applying anything to the cluster, so that we get feedback from our CI/CD system much faster, before we merge the changes and hand them over to Argo CD. There is an open source tool that can accomplish that with an embedded OPA engine: conftest (https://www.conftest.dev).

conftest allows us to create policies in Rego and run them in our pipelines, having the OPA engine validate the manifests. The rules that we create for conftest are not exactly the same as the ones we use in the Kubernetes admission controller, but they are similar, and it is easy to adapt them to one side or the other.

We will go through an example to check if our container image either comes from our private registry or if it can be from Docker Hub but needs to be an official image (https://docs.docker.com/docker-hub/official_images/), which are much safer to use. Let’s say that our private registry is in Google Cloud (https://cloud.google.com/container-registry) and that all images should start with eu.gcr.io/_my_company_/, while the Docker Hub official images are the ones that don’t have any user, so their format should be without any additional / sign, such as traefik:2.4.2 and not zegl/kube-score:v1.14.0.

Rego is not an easy language that you can pick up in a few hours, but it is also not too complicated once you get past the basics. Here are a few things to consider when you start writing or reading Rego. This is not a comprehensive list, but something to get you started:

  • Every statement returns true or false (statements such as assignment always return true).
  • input is a special keyword and it is the root of the JSON (or YAML) object it analyzes.
  • Rules are a set of instructions that allow us to take a decision, while functions are similar to functions from other programming languages—they take an input and return a value.
  • AND is applied across all the expressions inside a function or rule, which means they all need to be true for the end result to be true.
  • A function or rule can be defined multiple times, and OR is applied across the results of all the definitions.
  • With conftest, we are only allowed to use special rules named deny, violation, and warn. If a deny or violation rule matches (that is, all the expressions in its body are true), conftest will exit with an error, and this way, we can stop the pipeline. The short snippet after this list illustrates these semantics.
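Here is a minimal, hypothetical snippet illustrating the AND and OR semantics; the rule names are invented for this example and are not among conftest’s special rules:

package main

# AND: every expression in the body must be true for the rule to match
deployment_has_replicas {
    input.kind == "Deployment"
    input.spec.replicas > 0
}

# OR: defining the same rule twice means either body can make it true
is_workload {
    input.kind == "Deployment"
}

is_workload {
    input.kind == "StatefulSet"
}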

You will find a lot of resources about Rego online and how you can use it with OPA or conftest, and I recommend you also check out this video, which is an OPA deep dive from 2018 (a little older, but still relevant to how Rego works):

https://www.youtube.com/watch?v=4mBJSIhs2xQ

In our case, we define two functions, both named valid_container_registry. The first checks whether the registry used is Docker Hub with an official image, meaning there is no / sign in the image name, while the second verifies that the first and second values of the image split by / are eu.gcr.io and _my_company_. The code can also be found in the official repository in the ch08/conftest folder. The policy that we defined, which you can see next, is in the policy/deployment.rego file, because conftest expects by default all policies to be under the policy folder, while the resources we analyze are under the manifests folder. This is what the functions and the deny rule look like:

package main

deny[msg] {
    input.kind == "Deployment"
    container_registry_image := input.spec.template.spec.containers[_].image
    output = split(container_registry_image, "/")
    not valid_container_registry(output)
    msg = "invalid container registry in the deployment"
}

valid_container_registry(imageArr) = true {
    count(imageArr) == 1
}

valid_container_registry(imageArr) = true {
    count(imageArr) > 2
    imageArr[0] == "eu.gcr.io"
    imageArr[1] == "_my_company_"
}

And you can run conftest with this command, using its container image and attaching a volume to the container that has the policy and the manifests to analyze:

docker run -v $(pwd):/project openpolicyagent/conftest:v0.30.0 test manifests/

The result should be similar to the following:

2 tests, 2 passed, 0 warnings, 0 failures, 0 exceptions

The image that we are using in the analyzed Deployment, traefik:2.4.2, is an official Docker Hub one, which is why our checks are passing. Feel free to modify the manifests for test purposes and run conftest again to see how it fails; an example follows.
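For instance, changing the image in the analyzed Deployment to a Docker Hub image that has a user segment (the name below is hypothetical) makes the split produce two parts, so neither valid_container_registry definition matches and the deny rule fires:

spec:
  template:
    spec:
      containers:
        - name: traefik
          image: someuser/traefik:2.4.2

Running the same docker run command again should then report one failure with the invalid container registry in the deployment message.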

Writing policies is not an easy task, and the community started gathering them in repositories in order to share them. You have projects such as https://github.com/open-policy-agent/library or https://github.com/redhat-cop/rego-policies, and I also want to share one repository that tries to gather all the OPA/Rego/conftest resources together, such as documentation, articles, books, additional tools, or package libraries: https://github.com/anderseknert/awesome-opa.

conftest is about much more than Kubernetes manifests’ validation. Besides YAML/JSON, it can run checks for many other declarative formats. It currently supports HashiCorp Configuration Language (HCL) and HCL2, so we can write policies for Terraform infrastructure provisioning, for Dockerfiles to check container creation, and for others such as initialization (INI), Tom’s Obvious Minimal Language (TOML), Extensible Markup Language (XML), Jsonnet, and so on, which means it is worth checking out and trying, as it has a lot of potential for defining gates in many types of pipelines. A small sketch follows.
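For example, assuming we keep some Dockerfile policies under the same policy folder (the rules themselves are not shown here), pointing conftest at a Dockerfile could look like this; the parser is normally detected from the file name, but it can also be forced:

docker run -v $(pwd):/project openpolicyagent/conftest:v0.30.0 test --parser dockerfile Dockerfile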

Summary

In this chapter, we went through some of the options we have to statically analyze Kubernetes YAML manifests. We saw how we can generate manifests from templating engines such as Helm or Kustomize, and then we checked some tools that can perform several types of jobs: kubeconform will validate your manifests against the OpenAPI Kubernetes schema, kube-score will check that you follow a predefined list of best practices, while conftest can do everything because it allows you to define your own rules and policies for the manifests to follow. All these validations can be easily added to your CI pipeline, and we have seen examples of how to use them directly with their container images.

In the next chapter, we will take a close look at what the future might bring for Argo CD and how it can be used to democratize and standardize GitOps with GitOps Engine, an innovative project built with the help of other organizations from the community that is already seeing some good adoption rates in the industry.
