In the last chapter, we learned how to deploy applications in an automated way via pipelines and GitOps into different environments. The application is now installed and runs in our Kubernetes cluster. We can now sit back, relax, and observe our application running while our users enjoy our software.
But wait, let us anticipate what could happen after Day 1, when our application attracts more and more users, more instances of the application get deployed to the cluster, and we add new features. Sooner or later, we have to conduct planned (e.g., application updates) and unplanned (e.g., something goes wrong) operational tasks. Although we have Kubernetes in place, there are still challenging operational tasks left that are not yet automated because they are application specific and cannot be addressed by existing Kubernetes capabilities, such as migrating an application’s database schema. But who could take care of these tasks? To find the answer to this question, the last part of our journey leads us to the rendezvous between developers and operations: Kubernetes Operators.
Kubernetes Operators – How to Automate Operations
- Monitor the stability and the performance of the application.
- When the monitoring reveals issues, depending on the type of issue:
  - Reconfigure/repair the application.
  - Scale up the application.
  - Resize storage volumes.
  - Notify the developers to change their code.
  - Conduct failover activities.
- When a new version has been released:
  - Decide when to deploy the new version.
  - Update the application.
  - Migrate data if necessary.
- Proactively create backups and restore from a previous state if necessary.
In Chapter 2, we mentioned different types of platform services that can help us address Day 2 operations, for example, monitoring and tracing. Another example is the Kubernetes health checks that can monitor our application and restart it when it is unhealthy. However, this is quite a limited feature. Maybe it is not necessary to restart the application; just changing its configuration might be enough to fully recover. Or maybe it is a permanent fault caused by a bug in the code; then restarting won’t suffice. The limitations we are experiencing with platform services originate in their generality. They aim at modularizing cross-cutting concerns for all types of applications. For Day 2 operations, we will often face application-specific challenges.
Consider a backup process, for example. How do I store my backup, and what kind of preparations need to be made before I can, for instance, create a dump of my database? Or what do I need to do to migrate my data to a new schema? We require someone who has a deep understanding of the application, its requirements, and its specifics. What if we could automate all this and put it into code that we could package and deliver with our application? Kubernetes Operators – in contrast to Helm and others that provide repeatable deployments – cover the full lifecycle of an application.
What Is an Operator?
The Operator Design Pattern
The Operator design pattern describes the high-level concepts of an Operator. It consists of three components: the code that captures the operational knowledge, the application to be managed by the Operator, and a domain-specific language to express the desired state in a declarative way. Figure 6-2 illustrates the interaction of the components. The Operator code watches the desired state and applies changes whenever it deviates from the actual state. Furthermore, it reports the current status of the application.
Wait a minute! The terms actual and desired state should be familiar from the last chapter. Are we talking about GitOps? Actually, no. In Chapter 1, the Kubernetes-Controller-Manager was briefly introduced, and it was pointed out that there are many “built-in” controllers for the core Kubernetes Resources. One example is the Replication Controller that ensures the declared number of Pods is running. If you intentionally or accidentally delete a Pod, the Replication Controller takes care of bringing it up again. An Operator can do the same for you – but not just for a ReplicaSet or a Deployment but for your entire application with all its Kubernetes Resources. Moreover, we will soon discover that Operators love YAML. And that’s when GitOps can play a role. But this is something that will be discussed in the last part of this chapter.
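The core of any such controller is a reconciliation loop that compares desired and actual state and derives corrective actions. A toy sketch in plain Go can illustrate the idea (this is our own simplified illustration, not real Kubernetes controller code; all names are ours):

```go
package main

import "fmt"

// appState is a toy stand-in for an application's state:
// here, just a replica count.
type appState struct {
	replicas int
}

// reconcile compares the desired state with the actual state and
// returns the actions needed to converge them.
func reconcile(desired, actual appState) []string {
	var actions []string
	switch {
	case actual.replicas < desired.replicas:
		actions = append(actions, fmt.Sprintf("scale up by %d", desired.replicas-actual.replicas))
	case actual.replicas > desired.replicas:
		actions = append(actions, fmt.Sprintf("scale down by %d", actual.replicas-desired.replicas))
	default:
		actions = append(actions, "nothing to do")
	}
	return actions
}

func main() {
	desired := appState{replicas: 2}
	actual := appState{replicas: 1}
	fmt.Println(reconcile(desired, actual)) // [scale up by 1]
}
```

A real controller runs this comparison continuously, triggered by changes to either side, which is exactly what we will see Kubernetes controllers do below.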
The pattern we have described so far is a high-level concept and hence platform independent. It could be implemented in various ways on different platforms. Let us now look at how this is implemented in Kubernetes.
Kubernetes Operators
In Kubernetes, we have the perfect means to implement the Operator pattern. Firstly, we can extend the Kubernetes API with application-specific endpoints so that both human operators (e.g., via kubectl) and code can interact with the application. Custom Resource Definitions can be used to capture the desired state. Their data structure can be customized to serve as an abstraction layer for application-specific configuration. We could, for instance, express in the spec section that we want to have two instances of our application. The same applies to the status of the application, which we can report in the corresponding status section. Depending on the status of its components/resources, we can report that the application is, for instance, either healthy or unhealthy.
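Such a Custom Resource might look as follows (a sketch for illustration; the API group and field names here are our own assumptions, not the ones generated later in this chapter):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsApp
metadata:
  name: mynewsapp
spec:
  # the desired state, e.g., two instances of the application
  replicas: 2
status:
  # reported by the Operator based on the state of its components
  condition: healthy
```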
Secondly, we need some means to encapsulate the code that watches this CRD and runs a control (reconciliation) loop that constantly compares the actual state with the desired one. This is exactly what a Kubernetes controller does, and, fortunately, we have already described how to write our own custom Kubernetes controller in Chapter 4.
Operator Capabilities
Let us now – after shedding some more light on what an Operator is and which components it consists of – look closer at the Day 2 operations that we have already mentioned. How should an Operator support us with these?
Installation
An Operator should automate the installation process of all necessary resources for an application just like Helm does. Thereby, it should check and verify that the installation process went as expected and report the health status as well as the installed version of the application. But the Operator goes much further than Helm, and we will soon discover that Helm can even serve as the perfect starting point to build an Operator.
Upgrade
An Operator should be able to upgrade an application to a newer version. It should know the dependencies and the required steps to fulfill the upgrade. Furthermore, the Operator should be able to roll back if necessary.
Backup and Restore
An Operator should be able to create backups for the data it is managing, especially if it is a stateful application. The backup can either be triggered automatically or manually. The time and location of the last backup should be reported. Furthermore, it should be possible to restore the backup in such a way that the application is up and running after a successful restore. The manual triggering of backups and restores could be implemented based on Custom Resources.
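A manually triggered backup or restore could, for instance, be modeled as a Custom Resource of its own (a hypothetical sketch; the kind, group, and field names are ours):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsBackup
metadata:
  name: nightly-backup
spec:
  database: localnews
  target: s3://backups/localnews          # object store location, illustrative
status:
  # reported by the Operator after the backup completes
  lastBackupTime: "2021-11-30T02:00:00Z"
  location: s3://backups/localnews/2021-11-30.dump
```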
Reconfigure/Repair
The reconfigure and repair capability is also often referred to as auto-remediation and should ensure that the Operator can restore the application from a failed state. The failed state could, for instance, be determined via health checks or application metrics. The Operator that has deep knowledge of the application knows which remediating action it should take. It could, for example, roll back to a previous well-behaving configuration.
Collect Metrics
The Operator should provide metrics about the operational tasks it has handled, potential errors, and the (high-level) application state. It may also take over the task of providing metrics for its operands, that is, the applications it is managing.
(Auto)scale
Think of a complex application with several components. How do you know for which components a scale-up actually leads to increased performance? The Operator should know the bottlenecks and how to scale the application and its resources horizontally and/or vertically. It should finally check the status of all scaled resources. For autoscaling, it should collect the metrics the scaling should rely on. When a certain threshold is reached, it should scale up or down.
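The threshold logic behind such an autoscaling decision can be sketched in plain Go (a simplified illustration; the metric and thresholds are made up for this example):

```go
package main

import "fmt"

// desiredReplicas derives a replica count from an observed metric,
// e.g., requests per second per replica. The thresholds are illustrative.
func desiredReplicas(current int, reqPerSecPerReplica float64) int {
	const scaleUpAt, scaleDownAt = 100.0, 20.0
	switch {
	case reqPerSecPerReplica > scaleUpAt:
		return current + 1
	case reqPerSecPerReplica < scaleDownAt && current > 1:
		return current - 1
	default:
		return current
	}
}

func main() {
	fmt.Println(desiredReplicas(2, 150)) // load above threshold: scale up to 3
	fmt.Println(desiredReplicas(2, 10))  // load below threshold: scale down to 1
}
```

An Operator with application knowledge would go further, for example, scaling only the components it knows to be the bottleneck.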
What Is a Capability Level?
Level 1 – Basic Install
The Operator is able to provision an application through a Custom Resource. The specification of the Custom Resource is used to configure the application. For example, our Local News application can be installed by creating a Custom Resource. The Operator creates all the Deployments, Services, and Ingress for our components. It also deploys the database and initializes an empty database schema. When all services are ready, it reports this in the status field of the Custom Resource.
Level 2 – Seamless Upgrades
At the seamless upgrade level, the Operator can upgrade its own version as well as the version of its managed application. These can, but do not necessarily need to, coincide. One common approach is to upgrade the application during the upgrade of its Operator. Another way to implement this is that the upgrade of the application is triggered by a change in a Custom Resource. For example, the upgrade of our Local News application might be managed by a level 2 Operator that updates the container image versions of the managed Deployment resources and migrates the database schema if necessary. It should also know the sequence in which to upgrade the components. Thereby, it would try to minimize the downtime of the application. Each version of the Operator updates the application to its respective desired Local News application version.
Level 3 – Full Lifecycle
A level 3 Operator is able to create backups, restore from those backups, handle complex reconfiguration flows if necessary, and implement failover and failback. For example, the database of the Local News application could be dumped once per day and stored in an object store. We could then create a Custom Resource to express that the backup should be restored by the Operator.
Level 4 – Deep Insights
An Operator that is on level 4 provides deep insights about the managed application and itself. More specifically, it collects metrics, yields dashboards, and sets up alerts to expose the health status and performance metrics. For example, all components that constitute the Local News application could provide metrics about the number of requests per second (rate), number of errors (errors), and the amount of time a request takes (duration). These would be the key metrics defined by the RED method.1 Furthermore, the Operator could create alerts with thresholds for some of its metrics such as the free disk space for the volume the database is using.
Level 5 – Autopilot
Operators at the highest level minimize any remaining manual intervention. They should be able to repair, tune, and autoscale (up and down) their operands. For example, the Operator for the Local News application could monitor the throughput of the Feed-Scraper component and discover that there are more incoming feeds to be analyzed than can be processed. Hence, it would move the analysis component to another Kubernetes compute node with less utilization.
To reach the highest capability level, you need to put a lot of effort into the development of the Operator. But this may pay off in the end since you gain a lot of operational automation. Furthermore, the Operator that you deliver with your application could well differentiate you from your competitors. Let us say you are looking for a certain application to run on Kubernetes, for example, a database, and you find various products with similar features and properties. However, the vendor of one of the products offers a level 5 Operator. You might decide in favor of the product that is backed by an Operator.
Develop Your Own Operator
The Helm-based Operator takes existing Helm charts (or creates new ones) and turns them into an Operator. It creates a simple CRD that accepts your well-known Helm parameters as part of the spec of your Custom Resource. In addition, it generates a controller that takes care of syncing the state of the Kubernetes resources with that of your Helm chart. Every time you change your Custom Resource, it will create a new Helm release, and vice versa, if you change resources in the cluster, it will change them back to the state described by the Custom Resource. This is the easiest but most limited type of Operator since it is only able to implement capability levels 1 and 2.
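In other words, keys that you would normally set in values.yaml or via helm install --set appear under spec. A sketch of such a Custom Resource might look like this (the API group and the exact parameter structure are illustrative; they depend on your chart’s values.yaml):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsApp
metadata:
  name: mynewsapp
spec:
  # everything below spec is passed to the Helm chart as values
  feedscraper:
    feedsUrl: http://rss.cnn.com/rss/edition.rss   # example Helm parameter
```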
The Ansible-based Operator turns an Ansible4 playbook into an Operator. It creates a simple CRD, and whenever a Custom Resource of this kind is created, updated, or deleted, an Ansible role is run by the generated controller. The contents of the spec key can be accessed as parameters in the Ansible role. We won’t elaborate on this any further as it is out of the scope of this book. If you would like to find out more about this approach, please refer to the docs about Ansible Operators.5
The Go-based Operator generates a Go code skeleton with a reconciliation loop that can be extended with your custom logic. Furthermore, it allows you to express your CRD as a Go struct. This is the most flexible approach of the three but also the most complex one.
In the following, we will describe how to build a Helm-based and a Go-based Operator. We will demonstrate this with the Local News application. You can either create a new project folder and follow our approach step by step, or you can have a look at the resulting projects that can be found in this book’s Git repository6 in the folder k8s/operator. You can copy all commands that we will use in the following from the file snippets/chapter6/commands-chap6.md.
Helm-Based Operator with the Operator SDK
First of all, since both approaches we are demonstrating in this chapter rely on the Operator SDK, we need to install it as described by the installation guide.7 In the following, we use version v1.15.0, which can be found in the corresponding GitHub release.8 Install it, and then you will be able to use the CLI with the command operator-sdk.
1. Use the Operator SDK to generate a project for us that uses our existing Helm chart for the Local News application. Thereby, it generates a LocalNewsApp CRD that can be used to parameterize our Helm chart.
2. Run the Operator locally from our project structure. It communicates with the cluster via the Kubernetes API.
3. Deploy a simple instance of the Local News application by creating a LocalNewsApp Custom Resource.
4. Modify the LocalNewsApp Custom Resource and add a custom feed URL to demonstrate how to define Helm parameters via the spec section of the CRD.
5. Demonstrate what happens when we change resources managed by the Operator.
6. Delete the LocalNewsApp to show how it cleans up its managed resources.
7. Generate a second CRD to create Feed-Scraper Deployments similar to what we did in Chapter 4.
8. Finally, deploy our Operator to the Kubernetes cluster instead of running it locally. This requires building and pushing a container image.
Initializing the Project
Initialize the Helm-Based Operator from the Existing Helm Chart
1. A folder helm-charts that contains a copy of your Helm chart or a new basic Helm chart that you can extend. Since we pointed to our own Helm chart, you will find a copy of it here, but later on, we will create an additional one from scratch.
2. A folder config/crd which contains a generated CRD YAML. The group, names, and versions are derived from our operator-sdk init parameters such as domain, group, version, and kind. The CRD for the Local News application looks similar to the one introduced in Chapter 4. However, the schema of the spec is not further defined. It is declared as the type object with x-kubernetes-preserve-unknown-fields: true. This allows us to define all parameters available from our Helm chart as part of the spec. However, we could also define parameters that do not exist in the Helm chart; those parameters will be ignored. Please note that this is a generated YAML skeleton that you can and should extend. That is, you could define a stricter schema that restricts the contents of the spec to only those parameters that are supported by your Helm chart.
3. A folder config/manager which contains a Namespace, a Deployment, and a ConfigMap for the controller manager. It manages the localnewsapp-controller that implements the bridge between the LocalNewsApp CRD and our Helm chart. It watches our Custom Resources that describe the desired state as well as the actual state, which is represented, for example, by the Deployments and Services of the Local News application. If we change the Custom Resource, for instance, this triggers a new Helm release to match the desired and actual state. But it also works in the other direction: when we change the actual state by modifying or deleting, for example, a Deployment of the Local News application, it gets immediately rolled back or replaced.
4. A folder config/default containing the default “kustomization” as the starting point for Kustomize. The following info box provides more information about Kustomize.
5. A folder config/prometheus containing a ServiceMonitor CR to tell a Prometheus monitoring server that the metrics provided by the controller should be scraped. If you want to use this, you need to enable it in the config/default/kustomization.yaml by uncommenting all sections with ‘PROMETHEUS’. The Operator will provide a set of standard metrics about the Go runtime performance (yes, the Helm Operator controller manager is written in Go), reconciliation stats, resources created, REST client requests, and many more.
6. A config/samples folder that will contain an example CR with all available default values from the Helm chart. This is copied from the values.yaml of your chart into the spec section of the sample CR.
7. A config/rbac folder with various Roles, RoleBindings, and a ServiceAccount for the controller manager. The set of roles and bindings allows the controller manager to manage its CRDs. In addition, you will find an editor and a viewer role for general use as well as a set of other more advanced resources that we will not use in our example.
8. A config/scorecard folder to run tests against our Operator. We will not dive into this topic further. If you are interested, you can read the scorecard docs.9
9. A config/manifests folder to generate a manifests directory in an Operator bundle. We will explain the details in the section “Advanced: The Operator Lifecycle Manager – Who Manages the Operators?”.
10. A watches.yaml file that defines which kinds of resources the Operator should watch. You will find your CRD here, but you could also add additional watches using the operator-sdk create api command.
11. A Dockerfile to build a container image for the controller manager based on a standard Helm Operator image. It adds the helm-charts folder and watches.yaml to the image.
12. A Makefile to build and push the container image for your Operator, that is, the controller manager that will enforce the desired state as soon as we deploy Custom Resources, which will trigger the deployment of our Local News application. It additionally provides various deployment commands such as (un)installing the CRDs as well as (un)deploying the controller manager. We will get to know many of them in the following.
13. A PROJECT file containing metadata about the generated project structure for the Operator SDK.
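The permissive spec schema of the generated CRD might look roughly as follows (an excerpt sketch; group and names depend on your operator-sdk init parameters, so the values shown here are illustrative):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: localnewsapps.kubdev.example.com   # illustrative
spec:
  group: kubdev.example.com                # illustrative
  names:
    kind: LocalNewsApp
    plural: localnewsapps
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              # accept arbitrary keys so that any Helm value can be set
              type: object
              x-kubernetes-preserve-unknown-fields: true
```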
Kustomize10 is a template-free way to customize application configuration and is directly built into the kubectl CLI. You can use it by running “kubectl apply -k”. The kustomization.yaml lists the resources that should be customized. Furthermore, you can configure it to generate new resources and transform existing resources, for example, by patching them. The Operator SDK makes use of Kustomize, and it can also be used in the context of ArgoCD.
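A minimal kustomization.yaml might look like this (a sketch; the referenced file names are examples, not files from the generated project):

```yaml
# kustomization.yaml - applied with: kubectl apply -k <directory>
resources:
  - deployment.yaml            # plain manifests to include
  - service.yaml
namePrefix: localnews-         # example transformer: prefix all resource names
patchesStrategicMerge:
  - patch-replicas.yaml        # example patch overriding, e.g., spec.replicas
```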
As you can see, the Operator SDK generates quite a bit of configuration and code that gives us a jumpstart for writing our Operator. The generated layout is just a starting point though. Since it is only generated once (as long as you don’t repeat the init command, of course), you should extend it as required. Nevertheless, the code that we have generated so far is fully functional. So let us have a look at how we can run our Operator.
Running Our Operator
Run the Operator from Outside the Cluster
Create an Instance of the Application
Basic LocalNewsApp YAML
Operator Output After Creating the LocalNewsApp Custom Resource
Listing Helm Releases Triggered by the Operator
Installation Details Reported in the Status Section
List Resources Created by the Operator
This worked quite well! But we must admit that the Custom Resource we have used was quite simple. Let us add some configuration parameters in the spec part.
Modifying the CRD by Adding Helm Parameters
Configure the Resources Managed by the Operator via Custom Resource Spec
Operator Logs After Modifying the Custom Resource Spec
Helm Revision After Changing the Custom Resource
To check if the application is now actually using CNN as the new RSS feed, you could head over to the News-Frontend in your browser. If you click on the news rendered on the map, you should find news items from CNN. And while in the previous chapters of the book we pointed you to the Kubernetes Service of the News-Frontend, which had been exposed via NodePort, this is usually not something you would do in the runtime phase. The Helm chart is already configured to expose the application correctly: via a Kubernetes Ingress. The Ingress can be activated by further configuring the mynewsapp instance of the LocalNewsApp CR. All you have to do is update the file snippets/chapter6/news-sample3.yaml with the IP of your Minikube virtual machine. You can get it by running “minikube ip” on the command line. Afterward, run “kubectl apply -f snippets/chapter6/news-sample3.yaml” to update the CR. By running “kubectl get ingress”, you get the URL that the UI has been exposed with. You should find the old BBC news entries that are already in the database but also new ones from CNN! Caution: if you work with Minikube, make sure the ingress addon is enabled (“minikube addons enable ingress”).
Let us reflect on what has actually happened so far. We changed our CR, and the Operator synchronized its operands, that is, the resources created by Helm, with the new state. But how does that work? The controller runs a reconciliation loop that is triggered when one of the watched CRDs is changed. In this case, a Helm upgrade command is executed that will upgrade the current Helm release. This is a great feature, but what about the other direction: if someone changes one of its managed resources, is our Operator capable of preventing so-called configuration drift?
Modifying Resources Owned by the Operator
Helm Operator Recreating the Deleted Deployment
Helm Operator Patching Drifted Replicas Configuration
Helm Operator Ignoring Unmanaged Parts of Resource Specifications
Deleting the CRD
What is left to look at? Right, we should do some cleanup work by removing the CR: run “kubectl delete -f snippets/chapter6/news-sample3.yaml”. The Operator will now trigger Helm to uninstall the chart and thus remove all resources that have been previously created. You can follow this process by again looking at the logs of the running Operator, which will output “Uninstalled release”. The command “helm list” will then also return an empty result. Afterward, you can safely stop the running Operator with Ctrl+C.
One Chart to Rule Them All?
The Operator SDK gave us a jumpstart for generating an Operator out of an existing Helm chart. Nevertheless, an Operator can do things beyond installing, modifying, uninstalling, and reconciling between the desired and the actual state of the Local News application. Let us recall what we discussed in Chapter 4. We showed how to create an abstraction of the Feed-Scraper component of the Local News application via a CRD. With this, we were able to deploy multiple instances of the Feed-Scraper using a FeedAnalysis CRD.
To add another CRD to an existing Operator, we use the create api command. To create this new endpoint for a FeedAnalysis CRD, run “operator-sdk create api --group kubdev --version v1alpha1 --kind FeedAnalysis” from the folder k8s/operator/localnews-operator-helm.
This generates a new Helm chart into k8s/operator/localnews-operator-helm/helm-charts/feedanalysis and adds our CRD to the watches.yaml. The Helm Operator controller manager will then run an additional controller called feedanalysis-controller that is responsible for the new CRD.
The generated resources and code are a nice starting point for us, but we don’t need everything. Therefore, delete all unnecessary resources from the templates folder: hpa.yaml, ingress.yaml, service.yaml, deployment.yaml, NOTES.txt, and the tests folder. Then copy the Feed-Scraper Deployment YAML from the Helm chart that we have already generated, located at helm-charts/localnews-helm/templates/feed-scraper-deployment.yaml, into the folder helm-charts/feedanalysis/templates. In addition, remove everything from values.yaml except for the serviceAccount section, and add the feedscraper as well as the localnews section from helm-charts/localnews-helm/values.yaml. For reference, you can also have a look at our complete Helm Operator, which, as mentioned earlier, can be found in the repo in the folder k8s/operator/news-operator-helm.
In the Feed-Scraper Deployment YAML, we must now make sure that the naming of the deployed resources is unique for every Helm release. This is necessary because we want to be able to run multiple Feed-Scraper deployments side by side in the same namespace. For this task, we can make use of the named templates generated into the _helpers.tpl file. In Listing 6-14, we use the named templates feedanalysis.fullname and feedanalysis.selectorLabels to set the metadata.name and the matchLabels.
Helm named templates, sometimes also referred to as a partial or a subtemplate, provide the ability to define functionalities that can be reused inside different templates using the include function. By convention, the named templates usually reside in a file called _helpers.tpl.
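Applied to the Feed-Scraper Deployment, the relevant lines might look as follows (a sketch based on the helpers the SDK generates into _helpers.tpl; the exact template names and indentation may differ in your chart):

```yaml
# feed-scraper-deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  # resolves to a name that includes the Helm release name,
  # e.g., "<release-name>-feedanalysis"
  name: {{ include "feedanalysis.fullname" . }}
spec:
  selector:
    matchLabels:
      # release-specific labels so multiple instances can coexist
      {{- include "feedanalysis.selectorLabels" . | nindent 6 }}
```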
Making name and matchLabels Unique per Helm Release
Making Template Section Labels and Container Name Unique per Helm Release
Create the LocalNewsApp Without Feed-Scraper
Create Two FeedAnalysis Resources (Only One of Them Is Shown Here)
Deploy two of them by running “kubectl apply -f snippets/chapter6/feeds-sample1.yaml” and “kubectl apply -f snippets/chapter6/feeds-sample2.yaml”. One is for CNN and the other one for BBC. You could look up other RSS Feed URLs and create more Custom Resources. However, mind that some RSS Feeds might have a proprietary format and therefore won’t work.
Pods Running After Deploying the App and Two Feed-Scrapers
Cleaning Up
Afterward, stop the Operator process on your machine with Ctrl+C.
Deploying Our Operator to Kubernetes
Until now, we ran our Operator directly from our project folder as a local process that communicated with our cluster via the Kubernetes API. But as we learned in Chapters 3 and 5, the closer we are to the target environment, the more meaningful our tests will be. So let us deploy the Operator to our Kubernetes cluster and repeat what we have done so far.
First of all, we need to build and push the container image of the Operator so that it is accessible inside Kubernetes. Run “make -C k8s/operator/localnews-operator-helm docker-build docker-push IMG=<registry>/<repo>/news-operator:0.0.1”. This triggers a container image build and a push to your Container Registry. Both, by default, make use of Docker. In case you haven’t installed it yet, refer to the official guide.14 We will use quay.io/k8snativedev/news-operator:0.0.1 as the IMG variable in the following. If you want to build your own Operator images, replace this with your own target Container Registry that you might have already set up as required in Chapter 5 to run the pipeline.
After pushing the image, we are ready to deploy the Operator. Run the deploy target with “make -C k8s/operator/localnews-operator-helm deploy IMG=quay.io/k8snativedev/news-operator:0.0.1”. This will create a new Namespace localnews-operator-helm-system and deploy all the necessary resources such as the CRDs and other resources required by your Operator into that Namespace.
Controller Manager Running As a Pod in Kubernetes
If you create the Custom Resource again, for example, by running “kubectl apply -f snippets/chapter6/news-sample1.yaml”, you will see the Operator Pod doing its work either by looking at the logs with “kubectl -n localnews-operator-helm-system logs deployments/localnews-operator-helm-controller-manager manager -f” or watching the Pods of the Local News application getting created again.
With this, we conclude the Helm-based Operator excursion. We demonstrated how we can quickly build an Operator based on an existing Helm chart. We showed how to use CRDs to control Helm-based releases and how to associate different CRDs with different resources, making the management of the Local News application much more convenient and expressive. Moreover, we have the controller manager that keeps syncing desired and actual states. Hence, even though we use Helm, we gained the ability to constantly reconcile the managed resources that have been deployed whenever there is either a change of the CRD or a change in the managed resources.
However, as we have already mentioned when talking about the different languages supported by the Operator SDK, a Helm-based Operator can at most reach level 2 (basic installs + seamless upgrades). It can hardly implement level 3 since this would involve writing custom logic that is not feasible with pure Helm. Helm is a package manager with a template engine that is less flexible compared to a general-purpose language. The latter is better suited to implement complex and individualized controller logic. Hence, let us look at Go-based Operators in the following.
Cleaning Up
To clean up, you can just delete the Custom Resource via “kubectl delete -f snippets/chapter6/news-sample1.yaml” and run “make -C k8s/operator/localnews-operator-helm undeploy” which removes the namespace localnews-operator-helm-system with all its resources.
Advanced: Go-Based Operator with Operator SDK
First of all, be aware that this section requires basic skills in the Go language. If you are not familiar with it, you might skip this section and jump directly to the section “Choosing the Right Type of Operator.” Alternatively, you could also take a detour to https://developers.redhat.com/topics/go to learn more about Go.
1. Map the YAML/JSON of our LocalNewsApp CRD to a Go data structure so that we can read and write its state from our controller.
2. Write the controller reconciliation logic to create resources from our CRD and update managed resources if they deviate from the desired state. We exemplify this with a single managed resource: the Kubernetes Service of the News-Backend.
3. Ensure that the managed resources are cleaned up when the Custom Resource is deleted by using owner references.
4. Report the status of the CR deployment via the LocalNewsApp CRD. Again, we will do this just for the Kubernetes Service of the News-Backend.
5. Sketch how we could raise the capability level of the Operator.
Creating a Go Module
Initializing the Go Operator with Operator SDK
1. A config folder with the same subfolders as in the project structure we generated with the Helm Operator
2. A Dockerfile to build the Operator image
3. A Makefile defining various useful build, run, and push commands that we will use in the following
4. A PROJECT file containing metadata about the generated project structure for the Operator SDK
5. A main.go file containing the main function to initialize the controller manager and its components such as metrics, health, and readiness checks
6. A folder named hack containing a file boilerplate.go.txt used to add headers to the generated source files
In the following, we will take this generated project structure to build a basic Operator that is able to create, update, and delete resources based on our CRD as well as report the deployment status. For the sake of brevity, we will do this only for one resource: the Kubernetes Service for the News-Backend. In our Git repository16 in the folder k8s/operator/news-operator-go, you can find the complete version of the Operator for your reference.
The generation of the project structure and resources is implemented with the Kubebuilder19 framework, which aims at reducing the complexity of building and publishing Kubernetes APIs in Go. This is why you will find several so-called marker comments starting with //+kubebuilder. They provide metadata that configures Kubebuilder's code and YAML generators.
Implementing Our CRD with Go Structs
Generating a Go-Based Controller and CRD with Operator SDK
The LocalNewsApp Struct Type
The data structure defined by the nested struct types can be mapped to a JSON structure that can also be represented as YAML. This is why we find string literals to the right of the different attributes, for example, `json:"feedsUrl,omitempty"`, which defines a tag specifying that this attribute is mapped to the key feedsUrl in the JSON. The omitempty option specifies that empty values are omitted from the JSON.
Excerpt of the Generated LocalNewsApp CRD YAML
In the same folder as the localnewsapp_types.go, there is a corresponding file called zz_generated.deepcopy.go that contains logic for creating a deep copy (copy all attributes, and if they are of a complex type, copy also their attributes and so on) of your types. We will use this in the next part of our implementation. This deep copy logic can be regenerated by running the command “make generate” which we should do since we changed the localnewsapp_types.go file.
Writing the Controller Reconciliation Logic
The Generated LocalNewsApp Controller
Inserting Our Own Logic into the Reconcile Method from Listing 6-26
Reconciling the News-Backend Service
The model/backend_service.go Providing Functions for the News-Backend Service
Then, let us switch back to the body of the reconcileBackendService function from Listing 6-28. We try to get the Service resource via the Kubernetes client by its name news-backend. If we cannot find it, we set a controller reference, that is, the newly created Service resource will be owned by our CRD. We will talk about what this means in a second. Finally, we instruct the Kubernetes client to create the new Service resource.
Otherwise, if the resource already exists, we make sure to set the desired state, for example, if someone changed the Service port, we will ensure that it will get overridden by the desired one. This is implemented in the ReconcileBackendService function shown in Listing 6-29. It just makes a deep copy of the current Service that has been found and replaces the spec part with the desired one.
Extending the Body of the SetupWithManager Function in the Controller
Let us now see the new Operator in action by running it on our Kubernetes cluster (locally from outside the cluster as we did with our Helm Operator) with the command: “make install run”. If Go complains about missing dependencies, for example, the Kubernetes API package, you can install them via “go get -d k8s.io/[email protected]”. You can easily test the behavior of the Operator when you create a new LocalNewsApp with “kubectl apply -f snippets/chapter6/news-sample1.yaml” in another terminal while the Operator is running.
We will see that a new Service resource is created by our Operator with the command “kubectl get svc news-backend -o yaml”. We can run the same configuration drift tests as with the Helm Operator: delete the Service or change the NodePort and you will see that our Operator will instantly recreate or update it.
Deleting Managed Resources Using Owner References
News-Backend Service YAML Metadata Excerpt
The owner reference points to the CR named mynewsapp of type LocalNewsApp that we have previously created. When our CR is deleted, this Service will also be deleted. Hence, if you missed the code responsible for deleting the resources created by our Operator, this is the reason. We do not need to write code because Kubernetes will do automatic garbage collection when the owner resource is deleted, that is, if we delete our LocalNewsApp resource with the name mynewsapp, the News-Backend Service will also be deleted. We must just ensure that we set the owner references accordingly.
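Reconstructed for illustration, such an owner reference in the Service's metadata looks roughly like this; the uid is illustrative, and the apiVersion group/version is assumed from the CRD name localnewsapps.kubdev.apress.com:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: news-backend
  ownerReferences:
  - apiVersion: kubdev.apress.com/v1alpha1      # assumed group/version
    kind: LocalNewsApp
    name: mynewsapp
    uid: d9607e19-f88f-11e6-a518-42010a800195   # illustrative
    controller: true
    blockOwnerDeletion: true
```

With controller set to true, Kubernetes treats the CR as the managing controller of the Service and garbage-collects the Service when mynewsapp is deleted.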
Setting the Status Field
When we revisit the Operator Pattern shown in Figure 6-2, we can see that we have addressed two aspects so far with our Go-based Operator: declaratively defining the desired state via our CRD and managing resources (more specifically one resource, the Service for the News-Backend) necessary to deploy and run our application. The third aspect of the pattern is reporting the current status. So let us demonstrate this with a simple example: we report the type and name of each of our managed resources when it is reconciled.
The LocalNewsApp Struct Type Continued
Setting the Status in the LocalNewsAppReconciler
Initializing and Updating the Status in the Reconcile Function
Printing the Status of the LocalNewsApp resource
By managing the resources, using the CRD as a source for defining the desired state, and setting the status field, we have demonstrated the foundations for implementing the Operator Pattern based on Go. We still have a lot of work to do because everything we did for the News-Backend Service would need to be done for the other resources as well. This is, however, routine work. Let us rather discuss how we could extend this code skeleton to raise the capability level of the operator.
Raising the Capability Level
Level 1, basic install – To reach level 1, we need to create the reconciliation logic for all resource types, for example, Deployments, Services, and the Ingress, for all components of the Local News application that our Operator should own, including the database. For feature parity with the Helm-based Operator, we should also generate the API for the FeedAnalysis resource and write an additional controller with the respective reconciliation logic to manage the Feed-Scraper components. We should also use the status field of the FeedAnalysis to report the progress of the feed scraping process.
Level 2, seamless upgrades – To simplify the updating process, we could tie the version of the managed application to the version of our Operator. The advantage is that our Operator then only needs to manage one version of its resources. To address specifics during the upgrade, we could add additional logic, for example, if we would switch to another version of PostgreSQL and needed to migrate data.
Level 3, full lifecycle – This level requires backup, restore, and failover capabilities. We could add additional CRDs to control the desired behavior, for example, how often we should run the backup process. For implementing the actual tasks, we could spawn new Pods, Jobs, or CronJobs, for example, a Job that runs a PostgreSQL dump and stores it in a volume or an object store.
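As a sketch of how such a backup task could look, the following CronJob runs a nightly pg_dump and writes it to a volume. All names, the schedule, and the image are assumptions for illustration; a real Operator would generate a resource like this from the spec of a backup CRD:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: localnews-db-backup        # hypothetical name
spec:
  schedule: "0 3 * * *"            # nightly at 3 a.m., e.g., taken from a Backup CR
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pg-dump
            image: postgres:14     # assumed PostgreSQL client image
            command:
            - /bin/sh
            - -c
            - pg_dump -h news-postgres -U postgres localnews > /backup/dump-$(date +%F).sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: localnews-backup   # hypothetical PVC
```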
Level 4, deep insights, and level 5, autopilot – These levels can also be implemented with a combination of new CRDs, new managed resources (e.g., dashboards), and new Go code that implements the metrics collection or other, more advanced operational tasks. Since we are using Go as a general-purpose language, there are in principle no limits.
Choosing the Right Type of Operator
We have learned a lot so far about the Helm and the Go Operator. For both, we can generate similar project structures with various artifacts such as configuration files and code via the Operator SDK. These artifacts serve as a starting point for our customizations. However, there is still a big difference in terms of what is generated.
On the one hand, the Helm-based Operator produces a ready-to-use level 2 Operator. As long as you have a mature Helm Chart, you only need a few additional customizations, for example, extending the schema for your CRD. On the other hand, you must rely on Helm to define the logic of your Operator: you thus have less flexibility, and it is difficult to write an Operator with a capability level greater than 2.
The Go-based Operator is on the other side of the spectrum. It generates only a basic code skeleton, and hence you need to start almost from scratch: you only have a Go data structure representing your CRD as well as a controller skeleton to add your code to. This code, however, can implement various operational tasks potentially allowing us to write an Operator that is finally able to run in autopilot mode (level 5).
Now you might say: "I like the idea of writing my Operator logic in a general-purpose language, but I have little experience with Go. Can't I use Java or Python instead, or any other language of choice?" Sure you can. If you prefer Java, for example, you can use the Java Operator SDK,20 which is another open source project inspired by the Operator SDK. It allows you to generate Java code in different flavors, for example, pure Java, Quarkus, or Spring Boot.
A further approach is to write your Operator from scratch without any SDK. This is often referred to as a Bare Language Operator. In Chapter 4, we started out writing our controller using the Fabric8 Kubernetes client. We could use this as the basis for an Operator.
Finally, it is also worth mentioning that automation tools such as Ansible are well suited to be used for implementing the Operator logic. Ansible allows you to write automation code in a declarative manner, and there are various Ansible collections for Kubernetes that help to get productive very quickly.
Advanced: The Operator Lifecycle Manager – Who Manages the Operators?
We have learned so far that Operators can automate operational tasks such as installation and upgrades. Furthermore, we have implemented a simple Helm-based and a Go-based Operator. But who installs, manages, and upgrades the Operators themselves?
Making Operators available to your Kubernetes cluster via catalogs
Keeping your Operators up to date via auto-update
Checking Operators’ compatibility with their environment and dependencies with other Operators
Preventing conflicts with Operators owning the same CRDs
Providing a user interface to control the Operators
The OLM Packaging Format
Descriptive metadata – Contains information about name, description, keywords, maintainer, provider, maturity, version, and minKubeVersion, among others.
Installation metadata – Describes the runtime components of your Operator and their requirements. You define, for instance, the deployments for your Operator Pods as well as the required (cluster) permissions. Another key property is the supported installation mode: OwnNamespace, SingleNamespace, MultiNamespace, and AllNamespaces (for details, see the following info box).
Owned APIs – Lists all CRDs owned by your Operator. This is, for example, important to prevent conflict with other Operators managing the same CRDs.
Required APIs – Lists all CRDs required by your Operator. This allows the OLM to resolve dependencies to other operators that can then be installed automatically.
Native APIs – Lists all (Custom) Resource Definitions required by our Operator outside the scope of the OLM. This could, for instance, be native platform resource definitions such as Deployments or Pods.
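Putting these sections together, a heavily abbreviated ClusterServiceVersion for our Operator could look like the following sketch (field values are assumptions based on the examples in this chapter):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: localnews-operator-helm.v0.0.1
spec:
  displayName: Local News Operator        # descriptive metadata
  version: 0.0.1
  maturity: alpha
  minKubeVersion: 1.22.0
  installModes:                           # installation metadata
  - type: OwnNamespace
    supported: false
  - type: AllNamespaces
    supported: true
  install:
    strategy: deployment
    spec:
      deployments:
      - name: localnews-operator-helm-controller-manager
        spec: {}                          # Deployment spec of the controller manager (omitted)
  customresourcedefinitions:
    owned:                                # owned APIs
    - name: localnewsapps.kubdev.apress.com
      kind: LocalNewsApp
      version: v1alpha1
```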
Besides this file, you can package additional manifests in your Operator bundles such as ConfigMaps, Secrets, Services, (Cluster)Roles, (Cluster)RoleBindings, ServiceAccounts, PrometheusRules, ServiceMonitors, PodDisruptionBudgets, PriorityClasses, and VerticalPodAutoscalers. Please note that we did not talk about all of these resource types in this book because they are out of scope. Most are Kubernetes core resources, except PrometheusRules and ServiceMonitors which are Prometheus-specific resources. We have just listed them here for the sake of completeness.
In addition to the manifests described earlier, there is a metadata folder. In this folder, there is another YAML file called annotations.yaml. The annotations listed in this file help the OLM to determine how your bundle should be added to a catalog of bundles which is a set of different bundle versions of your Operator. We will learn more about this in a second. An example is an annotation called operators.operatorframework.io.bundle.channels.v1 defining in which catalog channel the bundle should appear.
Channels allow defining different upgrade paths for different users. Each channel may contain different versions of your operator. For example, you could provide an alpha, beta, and stable channel.
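For illustration, the annotations.yaml of our bundle could look like this; the annotation keys are the standard OLM bundle annotations, while the package name and channel values are assumptions based on this chapter's examples:

```yaml
annotations:
  operators.operatorframework.io.bundle.mediatype.v1: registry+v1
  operators.operatorframework.io.bundle.manifests.v1: manifests/
  operators.operatorframework.io.bundle.metadata.v1: metadata/
  operators.operatorframework.io.bundle.package.v1: localnews-operator-helm
  operators.operatorframework.io.bundle.channels.v1: alpha
  operators.operatorframework.io.bundle.channel.default.v1: alpha
```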
Finally, everything must be packaged into a container image containing the manifests, metadata, and an optional test folder. This container image will be labeled with the annotations from the annotations.yaml and pushed into a Container Registry of choice. Another way to publish and share your Operator is the so-called OperatorHub. It hosts a searchable collection of Operators. The Kubernetes OperatorHub is a great place to publish your own Operator so others can use it.21
There are four possible installation modes for Operators:
OwnNamespace – The Operator can manage resources in the same Namespace it is installed in.
SingleNamespace – The Operator can manage resources in a single Namespace different from the one it is installed in.
MultiNamespace – The Operator can manage resources in multiple Namespaces of the cluster.
AllNamespaces – The Operator can manage resources in all Namespaces of the cluster. This is the supported mode for our Helm-based Operator.
Deploying Our Operator via OLM
Running our Operator via make install run is a great way to develop and test our Operator. With make deploy, we can easily deploy the Operator to a Kubernetes cluster. However, in a production environment, we would rather rely on the OLM to install and manage our Operator because it simplifies the installation, updates, and management of Operators. Hence, it is important to release our Operator in a format that is processable by the OLM. This is what we will do in the following with our Helm-based operator. The same can be similarly applied to the Go-based operator.
First, to ensure you start with a clean state, run a “minikube delete” followed by a “minikube start --addons=ingress --vm=true --kubernetes-version='v1.22.3' --memory='8g' --cpus='4' --disk-size='25000mb'”.
Then we can install the OLM itself to our cluster. This is not our Operator but just the OLM tooling that we will use to manage our Operator. Do it by running the command “operator-sdk olm install --version v0.19.1”. This generates several CustomResourceDefinitions such as the ClusterServiceVersion as well as several Pods: the olm-operator, the catalog-operator, a catalog for the OperatorHub, and the package server, all deployed to the olm Namespace.
Generating the OLM Bundle for Our Own Operator
Creating an OLM Bundle
config/manifests/bases/news-operator-helm.clusterserviceversion.yaml – The generated ClusterServiceVersion base manifest
bundle/manifests – A folder containing our CRD, an extended version of the generated ClusterServiceVersion resource, a Service for exposing metrics, a ConfigMap to configure the controller manager, and a ClusterRole to get the access rights for reading the metrics
bundle/metadata – A generated annotation YAML file containing core-bundle annotations as well as annotations for testing
tests/scorecard – A scorecard test config YAML
config/manifests – Adds the ClusterServiceVersion to this folder
bundle.Dockerfile – A Dockerfile for building the bundle container image containing the metadata, manifests, and test folder
Building and Pushing the OLM Bundle Image
Next, we build our bundle container image based on the new Dockerfile with the command “make -C k8s/operator/localnews-operator-helm bundle-build BUNDLE_IMG=quay.io/k8snativedev/news-operator-bundle:v0.0.1”. Again, replace the URL with your own Container Registry accordingly. To be able to pull the image from Kubernetes, we must also push it to a Container Registry that is accessible from your cluster. Make sure it is a public Container Registry. Run “make -C k8s/operator/localnews-operator-helm bundle-push BUNDLE_IMG=quay.io/k8snativedev/news-operator-bundle:v0.0.1” to push the container image into your Container Registry.
The variables IMG and BUNDLE_IMG can also be defined in the Makefile or exported as environment variables. Then, you do not need to specify them in each command. IMG points to your Operator controller manager image. This is required because the ClusterServiceVersion contains a Deployment for the Operator controller manager. BUNDLE_IMG points to the bundle image.
Installing the OLM Bundle
Now we are ready to run the OLM bundle on our Kubernetes cluster. Run “operator-sdk run bundle quay.io/k8snativedev/news-operator-bundle:v0.0.1” to do it. Again, replace the Container Registry URL with your own Container Registry URL. Otherwise, if something went wrong earlier or you do not want to build the image yourself, you can leave it as is and stick to the prepared bundle image of this book.
Local News Operator Pods Running in Kubernetes
The first one is our Operator controller manager. It has been deployed by the OLM. We can follow its logs by running “kubectl logs deployments/localnews-operator-helm-controller-manager manager -f”. The result is similar to the one when we ran make deploy, but the way the Pod has been deployed is completely different as we will see in the following. Before we delve into the details, let us create a new LocalNewsApp resource first to verify everything is working as expected. Run the command “kubectl apply -f snippets/chapter6/news-sample2.yaml” to create it. You should see the Local News application spin up.
What Happened to Our Bundle?
Let us now inspect the second Pod. The bundle Pod is a registry serving a database of pointers to Operator manifest content. But how is this possible when the bundle image we have previously built and pushed was just a set of manifests? When we look at the container image that is used for the Pod, we will see that it is not our BUNDLE_IMG but instead quay.io/operator-framework/opm:latest. We will talk in more depth about OPM in the section “Deploying Operators via OLM Without Operator SDK.” For the moment, it is enough to know that the container has been started with a command that added our BUNDLE_IMG to that registry.
Configure the Resources Managed by the Operator via CRD Spec
How Does the OLM Install Our Operator?
In addition to the two Pods, several Custom Resources have been created that are used to instruct the OLM on what to do. We will explore them in the following.
CatalogSource
Subscription
InstallPlan
ClusterServiceVersion
Note that there are two further CRDs that we mention here for the sake of completeness. An OperatorCondition is a resource that is owned by the CSV and can be used by our Operator to communicate conditions such as that it is upgradeable to the OLM. An OperatorGroup can be used to provide a multitenant configuration by selecting particular namespaces for the Operator deployments. A CSV is a member of an OperatorGroup if the CSV is in the same Namespace as the OperatorGroup and the install modes of the CSV support the targeted Namespaces of the group. Both resources, OperatorCondition and OperatorGroup, have been created for our example and can be discovered using the corresponding “kubectl get …” command.
Cleaning Up
- 1.
Delete our Custom Resources to uninstall the application: "kubectl delete LocalNewsApp mynewsapp”.
- 2.
Unsubscribe from the Operator to prevent the OLM from reinstalling it: “kubectl delete subscription localnews-operator-helm-v0-0-1-sub”.
- 3.
The ClusterServiceVersion resource represents the fact that the Operator is currently installed. By deleting it, we uninstall the Operator: “kubectl delete clusterserviceversion localnews-operator-helm.v0.0.1”.
- 4.
Delete the CRDs that are owned by your Operator. In our case, we need to delete one CRD: “kubectl delete customresourcedefinition localnewsapps.kubdev.apress.com”.
Deploying Operators via OLM Without Operator SDK
We described how to deploy our Operator via OLM using the Operator SDK. This is a great approach while developing the Operator. However, how will our users – who usually do not use the Operator SDK – deploy our Operator into their Kubernetes environment? The easiest way is to use the Operator Hub.23
How OLM Is Connected to OperatorHub
The CatalogSource Resource Pointing to Your Catalog Image
The Subscription Resource
The catalog image build (that we have done via “make catalog-build catalog-push”) depends on opm,24 which is used to generate registry databases. In the Makefile of the Operator project, you will find how it is installed into your bin folder. The Operator SDK version v1.15.0 that we are using relies on version v1.15.1 of opm. macOS users should replace their opm binary with version v1.15.4 or higher since there is a bug prohibiting its execution.
Now the Operator is installed again, all its CRDs are available, and we could easily deploy our application by creating a CR of kind LocalNewsApp. But wait a moment, what are we actually doing here? In the GitOps section of Chapter 5, we learned that there is an even better way to deploy resources because GitOps makes deployments reproducible. This brings us to another interesting question: How can we combine the GitOps approach with Operators?
Operators Love GitOps
The good news is that Operators and GitOps are a perfect match. Why? Because Operators allow us to manage our software including operational tasks in a declarative way by describing the desired state via YAML manifests. This is the ideal input for a GitOps approach that takes these YAML files and deploys them to our Kubernetes cluster while synchronizing their state with the state defined in our Git repository.
In other words, the desired state in GitOps resides in the Git repository and is defined by the YAML manifests/Helm Charts. GitOps ensures that the manifests and the resources in the cluster are in sync. Whenever we change manifests in the Git repository, the changes will be applied to our resources in Kubernetes. What a GitOps tool such as ArgoCD cannot do, however, is understand the semantics of what is inside the YAML files. This is where Kubernetes and Operator controllers come into play. For them, the content of the YAML defines the desired state, and the logic in these controllers tries to reach this state by orchestrating and running certain operations along the lines of the five Operator capability levels. To sum up: we declaratively define what the desired state should be, the controllers interpret this, compare it to the actual state in the cluster, and finally invoke imperative actions to change the state to the desired one.
Local News Loves GitOps and Operators
- 1.
Make sure to clean up the environment, especially if you have followed the examples from the OLM section. You can run once again “minikube delete” and “minikube start --addons=ingress --vm=true --kubernetes-version='v1.22.3' --memory='8g' --cpus='4' --disk-size='25000mb'”.
- 2.
Install the OLM with “operator-sdk olm install --version v0.19.1” or without operator-sdk using the steps described in Listing 6-46.
- 3.
Install ArgoCD as described in Listing 5-26 in Chapter 5 with “kubectl create namespace argocd” and “kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.1.7/manifests/install.yaml”.
Install the OLM Without Operator SDK
Since ArgoCD and the OLM are now installed, we are ready to deploy and run our application. The only two things left to do are to install the Operator and trigger it to deploy the Local News application via a CR. First, we need the CatalogSource, which contains the information about where our Operator resources reside, and the Subscription, which subscribes to a specific version and channel of our localnews Operator. But this time, we will not create them directly but tell ArgoCD where to find both manifests in the Git repository and let ArgoCD do the deployment. Run “kubectl -n argocd apply -f snippets/chapter6/gitops/olm-application.yaml”, which creates an ArgoCD application that watches the Git repository and deploys what is inside the folder k8s/operator/gitops/olm. As expected, there are only two resources in this folder, the Subscription and the CatalogSource. Both are installed into the Namespace operators. This instructs the OLM to install the related metadata resources as well as the controller manager Pod and the catalog Pod into the same Namespace (compare to Figure 6-13).
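For reference, an ArgoCD Application resource for this setup could look roughly like the following sketch; repoURL and targetRevision are placeholders for your own repository, while the path and destination Namespace follow the description above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: localnews-olm               # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git   # placeholder
    targetRevision: main                                 # placeholder
    path: k8s/operator/gitops/olm
  destination:
    server: https://kubernetes.default.svc
    namespace: operators
  syncPolicy:
    automated: {}    # keep the cluster state in sync with Git
```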
Listing 5-27 from the last chapter shows how to access it. And as soon as the two ArgoCD applications are done syncing, the Local News application should also be up and running and accessible via “minikube -n localnews-gitops-operators service news-frontend”.
Wrapping It Up
At the end of this chapter, a few simple YAML files were enough to put the configuration of our application under full control in Git, let ArgoCD handle their rollout, and move operational tasks into the Operator that manages the Local News application with a dedicated custom Kubernetes controller. However, this came at a price: a lot of preparation and even software development, while the Local News application itself remained untouched. But the benefit of having operational knowledge in code, beyond everything we discussed in line with the five Operator capability levels, is that it is documented and reproducible at scale. A well-written Operator can be put into the hands of anyone to quickly get up and running with your software, whether in the cloud or in private data centers.
We did not want to add to the confusion during the first five chapters of this book, but what has just been said is exactly the reason why ArgoCD, Tekton, and Eclipse Che do not only provide an installation of their software via Kubernetes manifests or Helm Charts but also as Kubernetes Operators. And you can find all of them at the OperatorHub.25
Closing Words and Outlook
At this point, we are at the end of our journey through the world of Kubernetes-native development. This journey led us through the different phases of the typical development lifecycle.
In the planning and design phase, we had to make fundamental decisions on the architecture and the technologies of our application. At the software architecture level, which defines the structure of our source code, we identified Hexagonal Architecture as a great fit. It defines a flexible structure with clear boundaries between the core – implementing the domain logic – and its surrounding technical details. At the system architecture level, the Domain-Driven Design approach helps us to identify bounded contexts and aggregates which are good candidates to separate our application into independently deployable components. These components could be of different sizes which brought us to a discussion about microservices.
In addition to the architecture, the technology that our application is based on plays an important role. Hence, we discussed the impact of different languages, runtimes, and frameworks on applications that will run on Kubernetes. Furthermore, we discussed various packaging approaches.
In the development phase, developers write the application code based on the architecture and technologies chosen before. Thereby, they can adopt Kubernetes at various levels. They can either ignore it and develop just as in a precontainer world, integrate containers into their development, or go all-in on Kubernetes for development. One of the biggest challenges in modern development is to bridge the gap between development and production; hence, the first two approaches pose the highest risk of running into problems that appear for the first time when deploying into production. Adopting Kubernetes for development, however, does not necessarily imply that we must develop on Kubernetes (which is possible) but at least that we deploy and test in Kubernetes. We can either code locally or in Kubernetes using a containerized IDE such as Eclipse Che.
The same decision on how to develop applies also to what needs to be developed. Should we develop Kubernetes agnostic or should we make use of the fact that our application will run on Kubernetes? There are different levels of integration into Kubernetes as we discussed. If we go all-in, we can access the Kubernetes API to inspect and manage the environment our application is running in. This helps us to automate certain tasks and to raise the abstraction level for interacting with Kubernetes. With Custom Resource Definitions, we can use the predefined extension mechanism of Kubernetes to write our own types of resources.
The code has been written, built, validated, and finally pushed into a code repository. How can we deliver this code into production in a continuous way? We can create Continuous Integration Pipelines that build, test, and deploy the code in an independent and integrated environment. This can be achieved in a Kubernetes-native way by using a tool such as Tekton. Tekton allowed us to run the various pipeline steps in separate containers orchestrated by Kubernetes. For the deployment part, Helm is the de facto standard for packaging and deploying applications to Kubernetes. This enables us to deploy the application into a test stage to run integrated tests.
Helm, however, is not the end of the road. The deployment itself is still rather imperative and not easily repeatable since it is triggered by CLI commands. Hence, we introduced the GitOps approach that enables us to use our Git repository as the single source of truth and synchronizes the Kubernetes resources with the manifests in our Git repository. Once deployed, our application finally runs on Kubernetes. Day 1 is over and the next day begins. With Operators, we described how to turn operational Day 2 knowledge into code. Operators start their work when pipelines and package managers finish theirs. Depending on their capability level, they even allow your application to run in autopilot mode. The operational code lives side by side with the application it manages and can be bundled with it. A good Operator can be a differentiator against competitors providing similar software.
So, finally, after reading this book, you should be well prepared to write great Kubernetes-native applications leveraging the benefits and capabilities of Kubernetes. This has great potential to increase the overall quality of your software as well as your productivity when writing new containerized applications. However, especially the last chapter showed that Kubernetes, combined with the toolset on top of it, can bring tremendous value but is not without complexity. Therefore, this book should provide you with a good sense of judgment on how fast and how far you can follow the journey to Kubernetes-native development.
However, with this book, the journey is far from over. There are many more things to discover, and the ecosystem is evolving fast. Just have a look at the Cloud Native Computing Foundation (CNCF) landscape that gives an impression of the velocity.26 Good starting points for further exploration are Service Meshes to better manage distributed applications and serverless for flexible scaling and further abstraction from the platform which both help to make your applications even more focused on business logic. Both topics are covered by other books.27 And while in this book we focused on writing new Kubernetes-native applications, there are still, however, many existing applications out there that should be modernized and are good candidates to run on Kubernetes, and there are also great resources to start with.28