In the last chapter, we learned how to deploy applications in an automated way via pipelines and GitOps into different environments. The application is now installed and runs in our Kubernetes cluster. We can now sit back, relax, and observe our application running while our users enjoy our software.
But wait, let us anticipate what could happen after Day 1, when our application attracts more and more users, more instances of the application get deployed to the cluster, and we add new features. Sooner or later, we have to conduct planned (e.g., application updates) and unplanned (e.g., something goes wrong) operational tasks. Although we have Kubernetes in place, there are still challenging operational tasks left that are not yet automated because they are application specific and cannot be addressed by existing Kubernetes capabilities, such as migrating an application’s database schema. But who could take care of these tasks? To find the answer to this question, the last part of our journey leads us to the rendezvous between developers and operations: Kubernetes Operators.
Kubernetes Operators – How to Automate Operations
- Monitor the stability and the performance of the application.
- When the monitoring reveals issues, depending on the type of issue:
  - Reconfigure/repair the application.
  - Scale up the application.
  - Resize storage volumes.
  - Notify the developers to change their code.
  - Conduct failover activities.
- When a new version has been released:
  - Decide when to deploy the new version.
  - Update the application.
  - Migrate data if necessary.
- Proactively create backups and restore from a previous state if necessary.
In Chapter 2, we mentioned different types of platform services that can help us address Day 2 operations, for example, monitoring and tracing. Another example is the Kubernetes health checks that can monitor our application and restart it when it is unhealthy. However, this is quite a limited feature. Maybe it is not necessary to restart the application; just changing its configuration might be enough to fully recover. Or maybe it is a permanent fault caused by a bug in the code; then restarting won’t suffice. The limitations we are experiencing with platform services originate in their generality. They aim at modularizing cross-cutting concerns for all types of applications. For Day 2 operations, we will often face application-specific challenges.
Consider a backup process, for example. How do I store my backup, and what kind of preparations need to be made before I can, for instance, create a dump of my database? Or what do I need to do to migrate my data to a new schema? We require someone who has a deep understanding of the application, its requirements, and its specifics. What if we could automate all this and put it into code that we could package and deliver with our application? Kubernetes Operators – in contrast to Helm and others that provide repeatable deployments – cover the full lifecycle of an application.
What Is an Operator?
The Operator Design Pattern
The Operator design pattern describes the high-level concepts of an Operator. It consists of three components: the code that captures the operational knowledge, the application to be managed by the Operator, and a domain-specific language to express the desired state in a declarative way. Figure 6-2 illustrates the interaction of the components. The Operator code watches the desired state and applies changes whenever it deviates from the actual state. Furthermore, it reports the current status of the application.
Wait a minute! The terms actual and desired state should be familiar from the last chapter. Are we talking about GitOps? Actually, no. In Chapter 1, the Kubernetes-Controller-Manager was briefly introduced, and it was pointed out that there are many “built-in” controllers for the core Kubernetes Resources. One example is the Replication Controller that ensures the declared number of Pods is running. If you intentionally or accidentally delete a Pod, the Replication Controller takes care of bringing it up again. An Operator can do the same for you – but not just for a ReplicaSet or a Deployment but for your entire application with all its Kubernetes Resources. Moreover, we will soon discover that Operators love YAML. And that’s when GitOps can play a role. But this is something that will be discussed in the last part of this chapter.
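The core of any such controller is a reconciliation loop that compares desired and actual state and derives corrective actions. A toy sketch in plain Go can illustrate the idea (this is our own simplified illustration, not real Kubernetes controller code; all names are ours):

```go
package main

import "fmt"

// appState is a toy stand-in for an application's state:
// here, just a replica count.
type appState struct {
	replicas int
}

// reconcile compares the desired state with the actual state and
// returns the actions needed to converge them.
func reconcile(desired, actual appState) []string {
	var actions []string
	switch {
	case actual.replicas < desired.replicas:
		actions = append(actions, fmt.Sprintf("scale up by %d", desired.replicas-actual.replicas))
	case actual.replicas > desired.replicas:
		actions = append(actions, fmt.Sprintf("scale down by %d", actual.replicas-desired.replicas))
	default:
		actions = append(actions, "nothing to do")
	}
	return actions
}

func main() {
	desired := appState{replicas: 2}
	actual := appState{replicas: 1}
	fmt.Println(reconcile(desired, actual)) // [scale up by 1]
}
```

A real controller runs this comparison continuously, triggered by changes to either side, which is exactly what we will see Kubernetes controllers do below.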
The pattern we have described so far is a high-level concept and hence platform independent. It could be implemented in various ways on different platforms. Let us now look at how this is implemented in Kubernetes.
Kubernetes Operators
In Kubernetes, we have the perfect means to implement the Operator pattern. Firstly, we can extend the Kubernetes API with application-specific endpoints so that both human operators (e.g., via kubectl) and code can interact with the application. Custom Resource Definitions can be used to capture the desired state. Their data structure can be customized to serve as an abstraction layer for application-specific configuration. We could, for instance, express in the spec section that we want to have two instances of our application. The same applies to the status of the application, which we can report in the corresponding status section. Depending on the status of its components/resources, we can report that the application is, for instance, either healthy or unhealthy.
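Such a Custom Resource might look as follows (a sketch for illustration; the API group and field names here are our own assumptions, not the ones generated later in this chapter):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsApp
metadata:
  name: mynewsapp
spec:
  # the desired state, e.g., two instances of the application
  replicas: 2
status:
  # reported by the Operator based on the state of its components
  condition: healthy
```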
Secondly, we need some means to encapsulate the code that watches this CRD and runs a control (reconciliation) loop that constantly compares the actual state with the desired one. This is exactly what a Kubernetes controller does, and, fortunately, we have already described how to write our own custom Kubernetes controller in Chapter 4.
Operator Capabilities
Let us now – after shedding some more light on what an Operator is and which components it consists of – look closer at the Day 2 operations that we have already mentioned. How should an Operator support us with these?
Installation
An Operator should automate the installation process of all necessary resources for an application just like Helm does. Thereby, it should check and verify that the installation process went as expected and report the health status as well as the installed version of the application. But the Operator goes much further than Helm, and we will soon discover that Helm can even serve as the perfect starting point to build an Operator.
Upgrade
An Operator should be able to upgrade an application to a newer version. It should know the dependencies and the required steps to fulfill the upgrade. Furthermore, the Operator should be able to roll back if necessary.
Backup and Restore
An Operator should be able to create backups for the data it is managing, especially if it is a stateful application. The backup can either be triggered automatically or manually. The time and location of the last backup should be reported. Furthermore, it should be possible to restore the backup in such a way that the application is up and running after a successful restore. The manual triggering of backups and restores could be implemented based on Custom Resources.
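A manually triggered backup or restore could, for instance, be modeled as a Custom Resource of its own (a hypothetical sketch; the kind, group, and field names are ours):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsBackup
metadata:
  name: nightly-backup
spec:
  database: localnews
  target: s3://backups/localnews          # object store location, illustrative
status:
  # reported by the Operator after the backup completes
  lastBackupTime: "2021-11-30T02:00:00Z"
  location: s3://backups/localnews/2021-11-30.dump
```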
Reconfigure/Repair
The reconfigure and repair capability is also often referred to as auto-remediation and should ensure that the Operator can restore the application from a failed state. The failed state could, for instance, be determined via health checks or application metrics. The Operator that has deep knowledge of the application knows which remediating action it should take. It could, for example, roll back to a previous well-behaving configuration.
Collect Metrics
The Operator should provide metrics about the operational tasks it has handled, potential errors, and the (high-level) application state. It may also take over the task of providing metrics for its operands, that is, the applications it is managing.
(Auto)scale
Think of a complex application with several components. How do you know for which components a scale-up actually leads to increased performance? The Operator should know the bottlenecks and how to scale the application and its resources horizontally and/or vertically. It should finally check the status of all scaled resources. For autoscaling, it should collect the metrics the scaling should rely on. When a certain threshold is reached, it should scale up or down.
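The threshold logic behind such an autoscaling decision can be sketched in plain Go (a simplified illustration; the metric and thresholds are made up for this example):

```go
package main

import "fmt"

// desiredReplicas derives a replica count from an observed metric,
// e.g., requests per second per replica. The thresholds are illustrative.
func desiredReplicas(current int, reqPerSecPerReplica float64) int {
	const scaleUpAt, scaleDownAt = 100.0, 20.0
	switch {
	case reqPerSecPerReplica > scaleUpAt:
		return current + 1
	case reqPerSecPerReplica < scaleDownAt && current > 1:
		return current - 1
	default:
		return current
	}
}

func main() {
	fmt.Println(desiredReplicas(2, 150)) // load above threshold: scale up to 3
	fmt.Println(desiredReplicas(2, 10))  // load below threshold: scale down to 1
}
```

An Operator with application knowledge would go further, for example, scaling only the components it knows to be the bottleneck.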
What Is a Capability Level?
Level 1 – Basic Install
The Operator is able to provision an application through a Custom Resource. The specification of the Custom Resource is used to configure the application. For example, our Local News application can be installed by creating a Custom Resource. The Operator creates all the Deployments, Services, and Ingress for our components. It also deploys the database and initializes an empty database schema. When all services are ready, it reports this in the status field of the Custom Resource.
Level 2 – Seamless Upgrades
At the seamless upgrade level, the Operator can upgrade its own version as well as the version of its managed application. These can, but do not necessarily need to, coincide. One common approach is to upgrade the application during the upgrade of its Operator. Another way to implement this is that the upgrade of the application is triggered by a change in a Custom Resource. For example, the upgrade of our Local News application might be managed by a level 2 Operator that updates the container image versions of the managed Deployment resources and migrates the database schema if necessary. It should also know the sequence in which to upgrade the components. Thereby, it would try to minimize the downtime of the application. Each version of the Operator updates the application to its respective desired Local News application version.
Level 3 – Full Lifecycle
A level 3 Operator is able to create backups, restore from those backups, handle complex reconfiguration flows if necessary, and implement failover and failback. For example, the database of the Local News application could be dumped once per day and stored in an object store. We could then create a Custom Resource to express that the backup should be restored by the Operator.
Level 4 – Deep Insights
An Operator that is on level 4 provides deep insights about the managed application and itself. More specifically, it collects metrics, yields dashboards, and sets up alerts to expose the health status and performance metrics. For example, all components that constitute the Local News application could provide metrics about the number of requests per second (rate), number of errors (errors), and the amount of time a request takes (duration). These would be the key metrics defined by the RED method.1 Furthermore, the Operator could create alerts with thresholds for some of its metrics such as the free disk space for the volume the database is using.
Level 5 – Autopilot
Operators at the highest level minimize any remaining manual intervention. They should be able to repair, tune, and autoscale (up and down) their operands. For example, the Operator for the Local News application could monitor the throughput of the Feed-Scraper component and discover that there are more incoming feeds to be analyzed than can be processed. Hence, it would move the analysis component to another Kubernetes compute node with less utilization.
To reach the highest capability level, you need to put a lot of effort into the development of the Operator. But this may pay off in the end since you gain a lot of operational automation. Furthermore, the Operator that you deliver with your application could well differentiate you from your competitors. Let us say you are looking for a certain application to run on Kubernetes, for example, a database, and you find various products with similar features and properties. However, the vendor of one of the products offers a level 5 Operator. You might decide in favor of the product that is backed by an Operator.
Develop Your Own Operator
The Helm-based Operator takes existing Helm charts (or creates new ones) and turns them into an Operator. It creates a simple CRD that accepts your well-known Helm parameters as part of the spec of your Custom Resource. In addition, it generates a controller that takes care of syncing the state of the Kubernetes resources with that of your Helm chart. Every time you change your Custom Resource, it will create a new Helm release, and vice versa, if you change resources in the cluster, it will change them back to the state described by the Custom Resource. This is the easiest but most limited type of Operator since it is only able to implement capability levels 1 and 2.
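In other words, keys that you would normally set in values.yaml or via helm install --set appear under spec. A sketch of such a Custom Resource might look like this (the API group and the exact parameter structure are illustrative; they depend on your chart’s values.yaml):

```yaml
apiVersion: kubdev.example.com/v1alpha1   # group/version are illustrative
kind: LocalNewsApp
metadata:
  name: mynewsapp
spec:
  # everything below spec is passed to the Helm chart as values
  feedscraper:
    feedsUrl: http://rss.cnn.com/rss/edition.rss   # example Helm parameter
```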
The Ansible-based Operator turns an Ansible4 playbook into an Operator. It creates a simple CRD, and whenever a Custom Resource of this kind is created, updated, or deleted, an Ansible role is run by the generated controller. The contents of the spec key can be accessed as parameters in the Ansible role. We won’t elaborate on this any further as it is out of the scope of this book. If you would like to find out more about this approach, please refer to the docs about Ansible Operators.5
The Go-based Operator generates a Go code skeleton with a reconciliation loop that can be extended with your custom logic. Furthermore, it allows you to express your CRD as a Go struct. This is the most flexible approach of the three but also the most complex one.
In the following, we will describe how to build a Helm-based and a Go-based Operator. We will demonstrate this with the Local News application. You can either create a new project folder and follow our approach step by step, or you can have a look at the resulting projects that can be found in this book’s Git repository6 in the folder k8s/operator. You can copy all commands that we will use in the following from the file snippets/chapter6/commands-chap6.md.
Helm-Based Operator with the Operator SDK
First of all, since both approaches we are demonstrating in this chapter rely on the Operator SDK, we need to install it as described by the installation guide.7 In the following, we use version v1.15.0, which can be found in the corresponding GitHub release.8 Install it, and then you will be able to use the CLI with the command operator-sdk.
1. Use the Operator SDK to generate a project for us that uses our existing Helm chart for the Local News application. Thereby, it generates a LocalNewsApp CRD that can be used to parameterize our Helm chart.
2. Run the Operator locally from our project structure. It communicates with the cluster via the Kubernetes API.
3. Deploy a simple instance of the Local News application by creating a LocalNewsApp Custom Resource.
4. Modify the LocalNewsApp Custom Resource and add a custom feed URL to demonstrate how to define Helm parameters via the spec section of the CRD.
5. Demonstrate what happens when we change resources managed by the Operator.
6. Delete the LocalNewsApp to show how it cleans up its managed resources.
7. Generate a second CRD to create Feed-Scraper Deployments similar to what we did in Chapter 4.
8. Finally, deploy our Operator to the Kubernetes cluster instead of running it locally. This requires building and pushing a container image.
Initializing the Project
Initialize the Helm-Based Operator from the Existing Helm Chart
1. A folder helm-charts that contains a copy of your Helm chart or a new basic Helm chart that you can extend. Since we pointed to our own Helm chart, you will find a copy of it here, but later on, we will create an additional one from scratch.
2. A folder config/crd which contains a generated CRD YAML. The group, names, and versions are derived from our operator-sdk init parameters such as domain, group, version, and kind. The CRD for the Local News application looks similar to the one introduced in Chapter 4. However, the schema of the spec is not further defined. It is declared as the type object with x-kubernetes-preserve-unknown-fields: true. This allows us to define all parameters available from our Helm chart as part of the spec. However, we could also define parameters that do not exist in the Helm chart; those parameters will be ignored. Please note that this is a generated YAML skeleton that you can and should extend. That is, you could define a stricter schema that restricts the contents of the spec to only those parameters that are supported by your Helm chart.
3. A folder config/manager which contains a Namespace, a Deployment, and a ConfigMap for the controller manager. It manages the localnewsapp-controller that implements the bridge between the LocalNewsApp CRD and our Helm chart. It watches our Custom Resources that describe the desired state as well as the actual state, which is represented, for example, by the Deployments and Services of the Local News application. If we change the Custom Resource, for instance, this triggers a new Helm release to match the desired and actual state. But it also works in the other direction: when we change the actual state by modifying or deleting, for example, a Deployment of the Local News application, it gets immediately rolled back or replaced.
4. A folder config/default containing the default “kustomization” as the starting point for Kustomize. The following info box provides more information about Kustomize.
5. A folder config/prometheus containing a ServiceMonitor CR to tell a Prometheus monitoring server that the metrics provided by the controller should be scraped. If you want to use this, you need to enable it in the config/default/kustomization.yaml by uncommenting all sections with ‘PROMETHEUS’. The Operator will provide a set of standard metrics about the Go runtime performance (yes, the Helm Operator controller manager is written in Go), reconciliation stats, resources created, REST client requests, and many more.
6. A config/samples folder that will contain an example CR with all available default values from the Helm chart. This is copied from the values.yaml of your chart into the spec section of the sample CR.
7. A config/rbac folder with various Roles, RoleBindings, and a ServiceAccount for the controller manager. The set of roles and bindings allows the controller manager to manage its CRDs. In addition, you will find an editor and a viewer role for general use as well as a set of other more advanced resources that we will not use in our example.
8. A config/scorecard folder to run tests against our Operator. We will not dive into this topic further. If you are interested, you can read the scorecard docs.9
9. A config/manifests folder to generate a manifests directory in an Operator bundle. We will explain the details in the section “Advanced: The Operator Lifecycle Manager – Who Manages the Operators?”.
10. A watches.yaml file that defines which kinds of resources the Operator should watch. You will find your CRD here, but you could also add additional watches using the operator-sdk create api command.
11. A Dockerfile to build a container image for the controller manager based on a standard Helm Operator image. It adds the helm-charts folder and watches.yaml to the image.
12. A Makefile to build and push the container image for your Operator, that is, the controller manager that will enforce the desired state as soon as we deploy Custom Resources, which will trigger the deployment of our Local News application. It additionally provides various deployment commands such as (un)installing the CRDs as well as (un)deploying the controller manager. We will get to know many of them in the following.
13. A PROJECT file containing metadata about the generated project structure for the Operator SDK.
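The permissive spec schema of the generated CRD might look roughly as follows (an excerpt sketch; group and names depend on your operator-sdk init parameters, so the values shown here are illustrative):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: localnewsapps.kubdev.example.com   # illustrative
spec:
  group: kubdev.example.com                # illustrative
  names:
    kind: LocalNewsApp
    plural: localnewsapps
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              # accept arbitrary keys so that any Helm value can be set
              type: object
              x-kubernetes-preserve-unknown-fields: true
```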
Kustomize10 is a template-free way to customize application configuration and is directly built into the kubectl CLI. You can use it by running “kubectl apply -k”. The kustomization.yaml lists the resources that should be customized. Furthermore, you can configure it to generate new resources and transform existing resources, for example, by patching them. The Operator SDK makes use of Kustomize, and it can also be used in the context of ArgoCD.
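A minimal kustomization.yaml might look like this (a sketch; the referenced file names are examples, not files from the generated project):

```yaml
# kustomization.yaml - applied with: kubectl apply -k <directory>
resources:
  - deployment.yaml            # plain manifests to include
  - service.yaml
namePrefix: localnews-         # example transformer: prefix all resource names
patchesStrategicMerge:
  - patch-replicas.yaml        # example patch overriding, e.g., spec.replicas
```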
As you can see, the Operator SDK generates quite a bit of configuration and code that gives us a jumpstart for writing our Operator. The generated layout is just a starting point though. Since it is only generated once (as long as you don’t repeat the init command, of course), you should extend it as required. Nevertheless, the code that we have generated so far is fully functional. So let us have a look at how we can run our Operator.
Running Our Operator
Run the Operator from Outside the Cluster
Create an Instance of the Application
Basic LocalNewsApp YAML
Operator Output After Creating the LocalNewsApp Custom Resource
Listing Helm Releases Triggered by the Operator
Installation Details Reported in the Status Section
List Resources Created by the Operator
This worked quite well! But we must admit that the Custom Resource we have used was quite simple. Let us add some configuration parameters in the spec part.
Modifying the CRD by Adding Helm Parameters
Configure the Resources Managed by the Operator via Custom Resource Spec
Operator Logs After Modifying the Custom Resource Spec
Helm Revision After Changing the Custom Resource
To check if the application is now actually using CNN as the new RSS feed, you could head over to the News-Frontend in your browser. If you click on the news rendered on the map, you should find news items from CNN. And while in the previous chapters of the book we pointed you to the Kubernetes Service of the News-Frontend, which had been exposed via NodePort, this is usually not something you would do in the runtime phase. The Helm chart is already configured to expose the application correctly: via a Kubernetes Ingress. The Ingress can be activated by further configuring the mynewsapp instance of the LocalNewsApp CR. All you have to do is update the file snippets/chapter6/news-sample3.yaml with the IP of your Minikube virtual machine. You can get it by running “minikube ip” on the command line. Afterward, run “kubectl apply -f snippets/chapter6/news-sample3.yaml” to update the CR. By running “kubectl get ingress”, you get the URL that the UI has been exposed with. You should find the old BBC news entries that are already in the database but also new ones from CNN! Caution: if you work with Minikube, make sure the ingress addon is enabled (“minikube addons enable ingress”).
Let us reflect on what has actually happened so far. We changed our CR, and the Operator synchronized its operands, that is, the resources created by Helm, with the new state. But how does that work? The controller runs a reconciliation loop that is triggered when one of the watched CRDs is changed. In this case, a Helm upgrade command is executed that will upgrade the current Helm release. This is a great feature, but what about the other direction: if someone changes one of its managed resources, is our Operator capable of preventing so-called configuration drift?
Modifying Resources Owned by the Operator
Helm Operator Recreating the Deleted Deployment
Helm Operator Patching Drifted Replicas Configuration
Helm Operator Ignoring Unmanaged Parts of Resource Specifications
Deleting the CRD
What is left to look at? Right, we should do some cleanup work by removing the CR: run “kubectl delete -f snippets/chapter6/news-sample3.yaml”. The Operator will now trigger Helm to uninstall the chart and thus remove all resources that have been previously created. You can follow this process by again looking at the logs of the running Operator, which will output “Uninstalled release”. The command “helm list” will then also return an empty result. Afterward, you can safely stop the running Operator with Ctrl+C.
One Chart to Rule Them All?
The Operator SDK gave us a jumpstart for generating an Operator out of an existing Helm chart. Nevertheless, an Operator can do things beyond installing, modifying, uninstalling, and reconciling between the desired and the actual state of the Local News application. Let us recall what we discussed in Chapter 4. We showed how to create an abstraction of the Feed-Scraper component of the Local News application via a CRD. With this, we were able to deploy multiple instances of the Feed-Scraper using a FeedAnalysis CRD.
To add another CRD to an existing Operator, we use the create api command. To create this new endpoint for a FeedAnalysis CRD, run “operator-sdk create api --group kubdev --version v1alpha1 --kind FeedAnalysis” from the folder k8s/operator/localnews-operator-helm.
This generates a new Helm chart into k8s/operator/localnews-operator-helm/helm-charts/feedanalysis and adds our CRD to the watches.yaml. The Helm Operator controller manager will then run an additional controller called feedanalysis-controller that is responsible for the new CRD.
The generated resources and code are a nice starting point for us, but we don’t need everything. Therefore, delete all unnecessary resources from the templates folder: hpa.yaml, ingress.yaml, service.yaml, deployment.yaml, NOTES.txt, and the tests folder. Then copy the Feed-Scraper Deployment YAML from the Helm chart that we have already generated, located at helm-charts/localnews-helm/templates/feed-scraper-deployment.yaml, into the folder helm-charts/feedanalysis/templates. In addition, remove everything from values.yaml except for the serviceAccount section, and add the feedscraper as well as the localnews section from helm-charts/localnews-helm/values.yaml. For reference, you can also have a look at our complete Helm Operator, which, as mentioned earlier, can be found in the repo in the folder k8s/operator/news-operator-helm.
In the Feed-Scraper Deployment YAML, we must now make sure that the naming of the deployed resources is unique for every Helm release. This is necessary because we want to be able to run multiple Feed-Scraper deployments side by side in the same namespace. For this task, we can make use of the named templates generated into the _helpers.tpl file. In Listing 6-14, we use the named templates feedanalysis.fullname and feedanalysis.selectorLabels to set the metadata.name and the matchLabels.
Helm named templates, sometimes also referred to as a partial or a subtemplate, provide the ability to define functionalities that can be reused inside different templates using the include function. By convention, the named templates usually reside in a file called _helpers.tpl.
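Applied to the Feed-Scraper Deployment, the relevant lines might look as follows (a sketch based on the helpers the SDK generates into _helpers.tpl; the exact template names and indentation may differ in your chart):

```yaml
# feed-scraper-deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  # resolves to a name that includes the Helm release name,
  # e.g., "<release-name>-feedanalysis"
  name: {{ include "feedanalysis.fullname" . }}
spec:
  selector:
    matchLabels:
      # release-specific labels so multiple instances can coexist
      {{- include "feedanalysis.selectorLabels" . | nindent 6 }}
```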
Making name and matchLabels Unique per Helm Release
Making Template Section Labels and Container Name Unique per Helm Release
Create the LocalNewsApp Without Feed-Scraper
Create Two FeedAnalysis Resources (Only One of Them Is Shown Here)
Deploy two of them by running “kubectl apply -f snippets/chapter6/feeds-sample1.yaml” and “kubectl apply -f snippets/chapter6/feeds-sample2.yaml”. One is for CNN and the other one for BBC. You could look up other RSS Feed URLs and create more Custom Resources. However, mind that some RSS Feeds might have a proprietary format and therefore won’t work.
Pods Running After Deploying the App and Two Feed-Scrapers
Cleaning Up
Afterward, stop the Operator process on your machine with Ctrl+C.
Deploying Our Operator to Kubernetes
Until now, we ran our Operator directly from our project folder as a local process that communicated with our cluster via the Kubernetes API. But as we learned in Chapters 3 and 5, the closer we are to the target environment, the more meaningful our tests will be. So let us deploy the Operator to our Kubernetes cluster and repeat what we have done so far.
First of all, we need to build and push the container image of the Operator so that it is accessible inside Kubernetes. Run “make -C k8s/operator/localnews-operator-helm docker-build docker-push IMG=<registry>/<repo>/news-operator:0.0.1”. This triggers a container image build and a push to your Container Registry. Both, by default, make use of Docker. In case you haven’t installed it yet, refer to the official guide.14 We will use quay.io/k8snativedev/news-operator:0.0.1 as the IMG variable in the following. If you want to build your own Operator images, replace this with your own target Container Registry that you might have already set up as required in Chapter 5 to run the pipeline.
After pushing the image, we are ready to deploy the Operator. Run the deploy target with “make -C k8s/operator/localnews-operator-helm deploy IMG=quay.io/k8snativedev/news-operator:0.0.1”. This will create a new Namespace localnews-operator-helm-system and deploy all the necessary resources such as the CRDs and other resources required by your Operator into that Namespace.
Controller Manager Running As a Pod in Kubernetes
If you create the Custom Resource again, for example, by running “kubectl apply -f snippets/chapter6/news-sample1.yaml”, you will see the Operator Pod doing its work either by looking at the logs with “kubectl -n localnews-operator-helm-system logs deployments/localnews-operator-helm-controller-manager manager -f” or watching the Pods of the Local News application getting created again.
With this, we conclude the Helm-based Operator excursion. We demonstrated how we can quickly build an Operator based on an existing Helm chart. We showed how to use CRDs to control Helm-based releases and how to associate different CRDs with different resources, making the management of the Local News application much more convenient and expressive. Moreover, we have the controller manager that keeps syncing desired and actual states. Hence, even though we use Helm, we gained the ability to constantly reconcile the managed resources that have been deployed whenever there is either a change of the CRD or a change in the managed resources.
However, as we have already mentioned when talking about the different languages supported by the Operator SDK, a Helm-based Operator can at most reach level 2 (basic installs + seamless upgrades). It can hardly implement level 3 since this would involve writing custom logic that is not feasible with pure Helm. Helm is a package manager with a template engine that is less flexible compared to a general-purpose language. The latter is better suited to implement complex and individualized controller logic. Hence, let us look at Go-based Operators in the following.
Cleaning Up
To clean up, you can just delete the Custom Resource via “kubectl delete -f snippets/chapter6/news-sample1.yaml” and run “make -C k8s/operator/localnews-operator-helm undeploy” which removes the namespace localnews-operator-helm-system with all its resources.
Advanced: Go-Based Operator with Operator SDK
First of all, be aware that this section requires basic skills in the Go language. If you are not familiar with it, you might skip this section and jump directly to the section “Choosing the Right Type of Operator.” Alternatively, you could also take a detour to https://developers.redhat.com/topics/go to learn more about Go.
1. Map the YAML/JSON of our LocalNewsApp CRD to a Go data structure so that we can read and write its state from our controller.
2. Write the controller reconciliation logic to create resources from our CRD and update managed resources if they deviate from the desired state. We exemplify this with a single managed resource: the Kubernetes Service of the News-Backend.
3. Ensure that the managed resources are cleaned up when the Custom Resource is deleted by using owner references.
4. Report the status of the CR deployment via the LocalNewsApp CRD. Again, we will do this just for the Kubernetes Service of the News-Backend.
5. Sketch how we could raise the capability level of the Operator.
Creating a Go Module
Initializing the Go Operator with Operator SDK
1. A config folder with the same subfolders as in the project structure we generated with the Helm Operator
2. A Dockerfile to build the Operator image
3. A Makefile defining various useful build, run, and push commands that we will use in the following
4. A PROJECT file containing metadata about the generated project structure for the Operator SDK
5. A main.go file containing the main function to initialize the controller manager and its components such as metrics, health, and readiness checks
6. A folder named hack containing a file boilerplate.go.txt used to add headers to the generated source files
In the following, we will take this generated project structure to build a basic Operator that is able to create, update, and delete resources based on our CRD as well as report the deployment status. For the sake of brevity, we will do this only for one resource: the Kubernetes Service for the News-Backend. In our Git repository16 in the folder k8s/operator/news-operator-go, you can find the complete version of the Operator for your reference.
The generation of the project structure and resources is implemented with the Kubebuilder19 framework, which aims at reducing the complexity of building and publishing Kubernetes APIs in Go. This is why you will find several so-called marker comments starting with //+kubebuilder. They provide metadata that configures Kubebuilder's code and YAML generators.
Implementing Our CRD with Go Structs
Generating a Go-Based Controller and CRD with Operator SDK
The LocalNewsApp Struct Type
The data structure defined by the nested struct types can be mapped to a JSON structure that can also be represented as YAML. This is why we find string literals to the right of the different attributes, for example, `json:"feedsUrl,omitempty"`, which defines a tag specifying that this attribute is mapped to the key feedsUrl in the JSON. The omitempty option specifies that empty values are omitted from the JSON.
Excerpt of the Generated LocalNewsApp CRD YAML
In the same folder as the localnewsapp_types.go, there is a corresponding file called zz_generated.deepcopy.go that contains logic for creating a deep copy (copy all attributes, and if they are of a complex type, copy also their attributes and so on) of your types. We will use this in the next part of our implementation. This deep copy logic can be regenerated by running the command “make generate” which we should do since we changed the localnewsapp_types.go file.
Writing the Controller Reconciliation Logic
The Generated LocalNewsApp Controller
Inserting Our Own Logic into the Reconcile Method from Listing 6-26
Reconciling the News-Backend Service
The model/backend_service.go Providing Functions for the News-Backend Service
Then, let us switch back to the body of the reconcileBackendService function from Listing 6-28. We try to get the Service resource via the Kubernetes client by its name news-backend. If we cannot find it, we set a controller reference, that is, the newly created Service resource will be owned by our CRD. We will talk about what this means in a second. Finally, we instruct the Kubernetes client to create the new Service resource.
Otherwise, if the resource already exists, we make sure to set the desired state, for example, if someone changed the Service port, we will ensure that it will get overridden by the desired one. This is implemented in the ReconcileBackendService function shown in Listing 6-29. It just makes a deep copy of the current Service that has been found and replaces the spec part with the desired one.
Extending the Body of the SetupWithManager Function in the Controller
Let us now see the new Operator in action by running it on our Kubernetes cluster (locally from outside the cluster as we did with our Helm Operator) with the command: “make install run”. If Go complains about missing dependencies, for example, the Kubernetes API package, you can install them via “go get -d k8s.io/[email protected]”. You can easily test the behavior of the Operator when you create a new LocalNewsApp with “kubectl apply -f snippets/chapter6/news-sample1.yaml” in another terminal while the Operator is running.
We will see that a new Service resource is created by our Operator with the command “kubectl get svc news-backend -o yaml”. We can run the same configuration drift tests as with the Helm Operator: delete the Service or change the NodePort and you will see that our Operator will instantly recreate or update it.
Deleting Managed Resources Using Owner References
News-Backend Service YAML Metadata Excerpt
The owner reference points to the CR named mynewsapp of type LocalNewsApp that we have previously created. When our CR is deleted, this Service will also be deleted. Hence, if you missed the code responsible for deleting the resources created by our Operator, this is the reason. We do not need to write code because Kubernetes will do automatic garbage collection when the owner resource is deleted, that is, if we delete our LocalNewsApp resource with the name mynewsapp, the News-Backend Service will also be deleted. We must just ensure that we set the owner references accordingly.
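Reconstructed for illustration, such an owner reference in the Service's metadata looks roughly like this; the uid is illustrative, and the apiVersion group/version is assumed from the CRD name localnewsapps.kubdev.apress.com:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: news-backend
  ownerReferences:
  - apiVersion: kubdev.apress.com/v1alpha1      # assumed group/version
    kind: LocalNewsApp
    name: mynewsapp
    uid: d9607e19-f88f-11e6-a518-42010a800195   # illustrative
    controller: true
    blockOwnerDeletion: true
```

With controller set to true, Kubernetes treats the CR as the managing controller of the Service and garbage-collects the Service when mynewsapp is deleted.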
Setting the Status Field
When we revisit the Operator Pattern shown in Figure 6-2, we can see that we have addressed two aspects so far with our Go-based Operator: declaratively defining the desired state via our CRD and managing resources (more specifically one resource, the Service for the News-Backend) necessary to deploy and run our application. The third aspect of the pattern is reporting the current status. So let us demonstrate this with a simple example: we report the type and name of each of our managed resources when it is reconciled.
The LocalNewsApp Struct Type Continued
Setting the Status in the LocalNewsAppReconciler
Initializing and Updating the Status in the Reconcile Function
Printing the Status of the LocalNewsApp resource
By managing the resources, using the CRD as a source for defining the desired state, and setting the status field, we have demonstrated the foundations for implementing the Operator Pattern based on Go. We still have a lot of work to do because everything we did for the News-Backend Service would need to be done for the other resources as well. This is, however, routine work. Let us rather discuss how we could extend this code skeleton to raise the capability level of the operator.
Raising the Capability Level
Level 1, basic install – To reach level 1, we need to create the reconciliation logic for all resource types, for example, Deployments, Services, and the Ingress, for all components of the Local News application that our Operator should own, including the database. For feature parity with the Helm-based Operator, we should also generate the API for the FeedAnalysis resource and write an additional controller with the respective reconciliation logic to manage the Feed-Scraper components. We should also use the status field of the FeedAnalysis to report the progress of the feed scraping process.
Level 2, seamless upgrades – To simplify the updating process, we could tie the version of the managed application to the version of our Operator. The advantage is that our Operator then only needs to manage one version of its resources. To address specifics during the upgrade, we could add additional logic, for example, if we would switch to another version of PostgreSQL and needed to migrate data.
Level 3, full lifecycle – This level requires backup, restore, and failover capabilities. We could add additional CRDs to control the desired behavior, for example, how often we should run the backup process. For implementing the actual tasks, we could spawn new Pods, Jobs, or CronJobs, for example, a Job that runs a PostgreSQL dump and stores it in a volume or an object store.
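As a sketch of how such a backup task could look, the following CronJob runs a nightly pg_dump and writes it to a volume. All names, the schedule, and the image are assumptions for illustration; a real Operator would generate a resource like this from the spec of a backup CRD:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: localnews-db-backup        # hypothetical name
spec:
  schedule: "0 3 * * *"            # nightly at 3 a.m., e.g., taken from a Backup CR
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pg-dump
            image: postgres:14     # assumed PostgreSQL client image
            command:
            - /bin/sh
            - -c
            - pg_dump -h news-postgres -U postgres localnews > /backup/dump-$(date +%F).sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: localnews-backup   # hypothetical PVC
```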
Level 4, deep insights, and level 5, autopilot – These levels can also be implemented with a combination of new CRDs, new managed resources (e.g., dashboards), and new Go code that implements the metrics collection or other, more advanced operational tasks. Since we are using Go as a general-purpose language, there are in principle no limits.
Choosing the Right Type of Operator
We have learned a lot so far about the Helm and the Go Operator. For both, we can generate similar project structures with various artifacts such as configuration files and code via the Operator SDK. These artifacts serve as a starting point for our customizations. However, there is still a big difference in terms of what is generated.
On the one hand, the Helm-based Operator produces a ready-to-use level 2 Operator. As long as you have a mature Helm Chart, you only need a few additional customizations, for example, extending the schema for your CRD. On the other hand, you must rely on Helm to define the logic of your Operator: you thus have less flexibility, and it is difficult to write an Operator with a capability level greater than 2.
The Go-based Operator is on the other side of the spectrum. It generates only a basic code skeleton, and hence you need to start almost from scratch: you only have a Go data structure representing your CRD as well as a controller skeleton to add your code to. This code, however, can implement various operational tasks potentially allowing us to write an Operator that is finally able to run in autopilot mode (level 5).
Now you might say: "I like the idea of writing my Operator logic in a general-purpose language, but I have little experience with Go. Can't I use Java or Python instead, or any other language of choice?" Sure you can. If you prefer Java, for example, you can use the Java Operator SDK,20 which is another open source project inspired by the Operator SDK. It allows you to generate Java code in different flavors, for example, pure Java, Quarkus, or Spring Boot.
A further approach is to write your Operator from scratch without any SDK. This is often referred to as a Bare Language Operator. In Chapter 4, we started out writing our controller using the Fabric8 Kubernetes client. We could use this as the basis for an Operator.
Finally, it is also worth mentioning that automation tools such as Ansible are well suited to be used for implementing the Operator logic. Ansible allows you to write automation code in a declarative manner, and there are various Ansible collections for Kubernetes that help to get productive very quickly.
Advanced: The Operator Lifecycle Manager – Who Manages the Operators?
We have learned so far that Operators can automate operational tasks such as installation and upgrades. Furthermore, we have implemented a simple Helm-based and a Go-based Operator. But who installs, manages, and upgrades the Operators themselves?
Making Operators available to your Kubernetes cluster via catalogs
Keeping your Operators up to date via auto-update
Checking Operators’ compatibility with their environment and dependencies with other Operators
Preventing conflicts with Operators owning the same CRDs
Providing a user interface to control the Operators
The OLM Packaging Format
Descriptive metadata – Contains information about name, description, keywords, maintainer, provider, maturity, version, and minKubeVersion, among others.
Installation metadata – Describes the runtime components of your Operator and their requirements. You define, for instance, the deployments for your Operator Pods as well as the required (cluster) permissions. Another key property is the supported installation mode: OwnNamespace, SingleNamespace, MultiNamespace, and AllNamespaces (for details, see the following info box).
Owned APIs – Lists all CRDs owned by your Operator. This is, for example, important to prevent conflict with other Operators managing the same CRDs.
Required APIs – Lists all CRDs required by your Operator. This allows the OLM to resolve dependencies to other operators that can then be installed automatically.
Native APIs – Lists all (Custom) Resource Definitions required by our Operator outside the scope of the OLM. This could, for instance, be native platform resource definitions such as Deployments or Pods.
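Putting these sections together, a heavily abbreviated ClusterServiceVersion for our Operator could look like the following sketch (field values are assumptions based on the examples in this chapter):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: localnews-operator-helm.v0.0.1
spec:
  displayName: Local News Operator        # descriptive metadata
  version: 0.0.1
  maturity: alpha
  minKubeVersion: 1.22.0
  installModes:                           # installation metadata
  - type: OwnNamespace
    supported: false
  - type: AllNamespaces
    supported: true
  install:
    strategy: deployment
    spec:
      deployments:
      - name: localnews-operator-helm-controller-manager
        spec: {}                          # Deployment spec of the controller manager (omitted)
  customresourcedefinitions:
    owned:                                # owned APIs
    - name: localnewsapps.kubdev.apress.com
      kind: LocalNewsApp
      version: v1alpha1
```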
Besides this file, you can package additional manifests in your Operator bundles such as ConfigMaps, Secrets, Services, (Cluster)Roles, (Cluster)RoleBindings, ServiceAccounts, PrometheusRules, ServiceMonitors, PodDisruptionBudgets, PriorityClasses, and VerticalPodAutoscalers. Please note that we did not talk about all of these resource types in this book because they are out of scope. Most are Kubernetes core resources, except PrometheusRules and ServiceMonitors which are Prometheus-specific resources. We have just listed them here for the sake of completeness.
In addition to the manifests described earlier, there is a metadata folder. In this folder, there is another YAML file called annotations.yaml. The annotations listed in this file help the OLM to determine how your bundle should be added to a catalog of bundles which is a set of different bundle versions of your Operator. We will learn more about this in a second. An example is an annotation called operators.operatorframework.io.bundle.channels.v1 defining in which catalog channel the bundle should appear.
Channels allow defining different upgrade paths for different users. Each channel may contain different versions of your operator. For example, you could provide an alpha, beta, and stable channel.
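For illustration, the annotations.yaml of our bundle could look like this; the annotation keys are the standard OLM bundle annotations, while the package name and channel values are assumptions based on this chapter's examples:

```yaml
annotations:
  operators.operatorframework.io.bundle.mediatype.v1: registry+v1
  operators.operatorframework.io.bundle.manifests.v1: manifests/
  operators.operatorframework.io.bundle.metadata.v1: metadata/
  operators.operatorframework.io.bundle.package.v1: localnews-operator-helm
  operators.operatorframework.io.bundle.channels.v1: alpha
  operators.operatorframework.io.bundle.channel.default.v1: alpha
```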
Finally, everything must be packaged into a container image containing the manifests, metadata, and an optional test folder. This container image will be labeled with the annotations from the annotations.yaml and pushed into a Container Registry of choice. Another way to publish and share your Operator is the so-called OperatorHub. It hosts a searchable collection of Operators. The Kubernetes OperatorHub is a great place to publish your own Operator so others can use it.21
There are four possible installation modes for Operators:
OwnNamespace – The Operator can manage resources in the same Namespace it is installed in.
SingleNamespace – The Operator can manage resources in a single Namespace different from the one it is installed in.
MultiNamespace – The Operator can manage resources in multiple Namespaces of the cluster.
AllNamespaces – The Operator can manage resources in all Namespaces of the cluster. This is the supported mode for our Helm-based Operator.
Deploying Our Operator via OLM
Running our Operator via make install run is a great way to develop and test our Operator. With make deploy, we can easily deploy the Operator to a Kubernetes cluster. However, in a production environment, we would rather rely on the OLM to install and manage our Operator because it simplifies the installation, updates, and management of Operators. Hence, it is important to release our Operator in a format that is processable by the OLM. This is what we will do in the following with our Helm-based operator. The same can be similarly applied to the Go-based operator.
First, to ensure you start with a clean state, run a “minikube delete” followed by a “minikube start --addons=ingress --vm=true --kubernetes-version='v1.22.3' --memory='8g' --cpus='4' --disk-size='25000mb'”.
Then we can install the OLM itself to our cluster. This is not our Operator but just the OLM tooling that we will use to manage our Operator. Do it by running the command “operator-sdk olm install --version v0.19.1”. This generates several CustomResourceDefinitions such as the ClusterServiceVersion as well as several Pods: the olm-operator, the catalog-operator, a catalog for the OperatorHub, and the package server, all deployed to the olm Namespace.
Generating the OLM Bundle for Our Own Operator
Creating an OLM Bundle
config/manifests/bases/news-operator-helm.clusterserviceversion.yaml – The generated ClusterServiceVersion base manifest
bundle/manifests – A folder containing our CRD, an extended version of the generated ClusterServiceVersion resource, a Service for exposing metrics, a ConfigMap to configure the controller manager, and a ClusterRole to get the access rights for reading the metrics
bundle/metadata – A generated annotation YAML file containing core-bundle annotations as well as annotations for testing
tests/scorecard – A scorecard test config YAML
config/manifests – Adds the ClusterServiceVersion to this folder
bundle.Dockerfile – A Dockerfile for building the bundle container image containing the metadata, manifests, and test folder
Building and Pushing the OLM Bundle Image
Next, we build our bundle container image based on the new Dockerfile with the command “make -C k8s/operator/localnews-operator-helm bundle-build BUNDLE_IMG=quay.io/k8snativedev/news-operator-bundle:v0.0.1”. Again, replace the URL with your own Container Registry accordingly. To be able to pull the image from Kubernetes, we must also push it to a Container Registry that is accessible from your cluster. Make sure it is a public Container Registry. Run “make -C k8s/operator/localnews-operator-helm bundle-push BUNDLE_IMG=quay.io/k8snativedev/news-operator-bundle:v0.0.1” to push the container image into your Container Registry.
The variables IMG and BUNDLE_IMG can also be defined in the Makefile or exported as environment variables. Then, you do not need to specify them in each command. IMG points to your Operator controller manager image. This is required because the ClusterServiceVersion contains a Deployment for the Operator controller manager. BUNDLE_IMG points to the bundle image.
Installing the OLM Bundle
Now we are ready to run the OLM bundle on our Kubernetes cluster. Run “operator-sdk run bundle quay.io/k8snativedev/news-operator-bundle:v0.0.1” to do it. Again, replace the Container Registry URL with your own Container Registry URL. Otherwise, if something went wrong earlier or you do not want to build the image yourself, you can leave it as is and stick to the prepared bundle image of this book.
Local News Operator Pods Running in Kubernetes
The first one is our Operator controller manager. It has been deployed by the OLM. We can follow its logs by running “kubectl logs deployments/localnews-operator-helm-controller-manager manager -f”. The result is similar to the one when we ran make deploy, but the way the Pod has been deployed is completely different as we will see in the following. Before we delve into the details, let us create a new LocalNewsApp resource first to verify everything is working as expected. Run the command “kubectl apply -f snippets/chapter6/news-sample2.yaml” to create it. You should see the Local News application spin up.
What Happened to Our Bundle?
Let us now inspect the second Pod. The bundle Pod is a registry serving a database of pointers to Operator manifest content. But how is this possible when the bundle image we have previously built and pushed was just a set of manifests? When we look at the container image that is used for the Pod, we will see that it is not our BUNDLE_IMG but instead quay.io/operator-framework/opm:latest. We will talk in more depth about OPM in the section “Deploying Operators via OLM Without Operator SDK.” For the moment, it is enough to know that the container has been started with a command that added our BUNDLE_IMG to that registry.
Configure the Resources Managed by the Operator via CRD Spec
How Does the OLM Install Our Operator?
In addition to the two Pods, several Custom Resources have been created that are used to instruct the OLM on what to do. We will explore them in the following.
CatalogSource
Subscription
InstallPlan
ClusterServiceVersion
Note that there are two further CRDs that we mention here for the sake of completeness. An OperatorCondition is a resource that is owned by the CSV and can be used by our Operator to communicate conditions such as that it is upgradeable to the OLM. An OperatorGroup can be used to provide a multitenant configuration by selecting particular namespaces for the Operator deployments. A CSV is a member of an OperatorGroup if the CSV is in the same Namespace as the OperatorGroup and the install modes of the CSV support the targeted Namespaces of the group. Both resources, OperatorCondition and OperatorGroup, have been created for our example and can be discovered using the corresponding “kubectl get …” command.
Cleaning Up
- 1.
Delete our Custom Resources to uninstall the application: "kubectl delete LocalNewsApp mynewsapp”.
- 2.
Unsubscribe from the Operator to prevent the OLM from reinstalling it: “kubectl delete subscription localnews-operator-helm-v0-0-1-sub”.
- 3.
The ClusterServiceVersion resource represents the fact that the Operator is currently installed. By deleting it, we uninstall the Operator: “kubectl delete clusterserviceversion localnews-operator-helm.v0.0.1”.
- 4.
Delete the CRDs that are owned by your Operator. In our case, we need to delete one CRD: “kubectl delete customresourcedefinition localnewsapps.kubdev.apress.com”.
Deploying Operators via OLM Without Operator SDK
We described how to deploy our Operator via OLM using the Operator SDK. This is a great approach while developing the Operator. However, how will our users – who usually do not use the Operator SDK – deploy our Operator into their Kubernetes environment? The easiest way is to use the Operator Hub.23
How OLM Is Connected to OperatorHub
The CatalogSource Resource Pointing to Your Catalog Image
The Subscription Resource
The catalog image build (that we have done via “make catalog-build catalog-push”) depends on opm,24 which is used to generate registry databases. In the Makefile of the Operator project, you will find how it is installed into your bin folder. The Operator SDK version v1.15.0 that we are using relies on version v1.15.1 of opm. macOS users should replace their opm binary with version v1.15.4 or higher since there is a bug prohibiting its execution.
Now the Operator is installed again, all its CRDs are available, and we could easily deploy our application by creating a CR of kind LocalNewsApp. But wait a moment, what are we actually doing here? In the GitOps section of Chapter 5, we learned that there is an even better way to deploy resources because GitOps makes deployments reproducible. This brings us to another interesting question: How can we combine the GitOps approach with Operators?
Operators Love GitOps
The good news is that Operators and GitOps are a perfect match. Why? Because Operators allow us to manage our software including operational tasks in a declarative way by describing the desired state via YAML manifests. This is the ideal input for a GitOps approach that takes these YAML files and deploys them to our Kubernetes cluster while synchronizing their state with the state defined in our Git repository.
In other words, the desired state in GitOps resides in the Git repository and is defined by the YAML manifests/Helm Charts. GitOps ensures that the manifests and the resources in the cluster are in sync. Whenever we change manifests in the Git repository, the changes will be applied to our resources in Kubernetes. What a GitOps tool such as ArgoCD cannot do, however, is understand the semantics of what is inside the YAML files. This is where Kubernetes and Operator controllers come into play. For them, the content of the YAML defines the desired state, and the logic in these controllers tries to reach this state by orchestrating and running certain operations along the lines of the five Operator capability levels. To sum up: we declaratively define what the desired state should be, the controllers interpret this, compare it to the actual state in the cluster, and finally invoke imperative actions to change the state to the desired one.
Local News Loves GitOps and Operators
- 1.
Make sure to clean up the environment, especially if you have followed the examples from the OLM section. You can run once again “minikube delete” and “minikube start --addons=ingress --vm=true --kubernetes-version='v1.22.3' --memory='8g' --cpus='4' --disk-size='25000mb'”.
- 2.
Install the OLM with “operator-sdk olm install --version v0.19.1” or without operator-sdk using the steps described in Listing 6-46.
- 3.
Install ArgoCD as described in Listing 5-26 in Chapter 5 with “kubectl create namespace argocd” and “kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.1.7/manifests/install.yaml”.
Install the OLM Without Operator SDK
Since ArgoCD and the OLM are now installed, we are ready to deploy and run our application. The only two things left to do are to install the Operator and trigger it to deploy the Local News application via a CR. First, we need the CatalogSource, which contains the information about where our Operator resources reside, and the Subscription, which subscribes to a specific version and channel of our localnews Operator. But this time, we will not create them directly but tell ArgoCD where to find both manifests in the Git repository and let ArgoCD do the deployment. Run “kubectl -n argocd apply -f snippets/chapter6/gitops/olm-application.yaml”, which creates an ArgoCD application that watches the Git repository and deploys what is inside the folder k8s/operator/gitops/olm. As expected, there are only two resources in this folder, the Subscription and the CatalogSource. Both are installed into the Namespace operators. This instructs the OLM to install the related metadata resources as well as the controller manager Pod and the catalog Pod into the same Namespace (compare to Figure 6-13).
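For reference, an ArgoCD Application resource for this setup could look roughly like the following sketch; repoURL and targetRevision are placeholders for your own repository, while the path and destination Namespace follow the description above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: localnews-olm               # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git   # placeholder
    targetRevision: main                                 # placeholder
    path: k8s/operator/gitops/olm
  destination:
    server: https://kubernetes.default.svc
    namespace: operators
  syncPolicy:
    automated: {}    # keep the cluster state in sync with Git
```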
Listing 5-27 from the last chapter shows how to access it. And as soon as the two ArgoCD applications are done syncing, the Local News application should also be up and running and accessible via “minikube -n localnews-gitops-operators service news-frontend”.
Wrapping It Up
At the end of this chapter, a few simple YAML files were enough to put the configuration of our application under full control in Git, let ArgoCD handle their rollout, and move operational tasks into the Operator that manages the Local News application with a dedicated custom Kubernetes controller. However, this came at a price: a lot of preparation and even software development, while the Local News application itself remained untouched. But the benefit of having operational knowledge in code, beyond everything we discussed in line with the five Operator capability levels, is that it is documented and reproducible at scale. A well-written Operator can be put into the hands of anyone to quickly get up and running with your software, whether in the cloud or in private data centers.
We did not want to add to the confusion during the first five chapters of this book, but what has just been said is exactly the reason why ArgoCD, Tekton, and Eclipse Che do not only provide an installation of their software via Kubernetes manifests or Helm Charts but also as Kubernetes Operators. And you can find all of them at the OperatorHub.25
Closing Words and Outlook
At this point, we are at the end of our journey through the world of Kubernetes-native development. This journey led us through the different phases of the typical development lifecycle.
In the planning and design phase, we had to make fundamental decisions on the architecture and the technologies of our application. At the software architecture level, which defines the structure of our source code, we identified Hexagonal Architecture as a great fit. It defines a flexible structure with clear boundaries between the core – implementing the domain logic – and its surrounding technical details. At the system architecture level, the Domain-Driven Design approach helps us to identify bounded contexts and aggregates which are good candidates to separate our application into independently deployable components. These components could be of different sizes which brought us to a discussion about microservices.
In addition to the architecture, the technology that our application is based on plays an important role. Hence, we discussed the impact of different languages, runtimes, and frameworks on applications that will run on Kubernetes. Furthermore, we discussed various packaging approaches.
In the development phase, developers write the application code based on the architecture and technologies chosen before. Thereby, they can adopt Kubernetes at various levels. They can either ignore it and develop just as in a precontainer world, integrate containers into their development, or go all-in on Kubernetes for development. One of the biggest challenges in modern development is to bridge the gap between development and production; hence, the first two approaches pose the highest risk of running into problems that appear for the first time when deploying into production. Adopting Kubernetes for development, however, does not necessarily imply that we must develop on Kubernetes (which is possible) but at least that we deploy and test in Kubernetes. We can either code locally or in Kubernetes using a containerized IDE such as Eclipse Che.
The same decision on how to develop applies also to what needs to be developed. Should we develop Kubernetes agnostic or should we make use of the fact that our application will run on Kubernetes? There are different levels of integration into Kubernetes as we discussed. If we go all-in, we can access the Kubernetes API to inspect and manage the environment our application is running in. This helps us to automate certain tasks and to raise the abstraction level for interacting with Kubernetes. With Custom Resource Definitions, we can use the predefined extension mechanism of Kubernetes to write our own types of resources.
The code has been written, built, validated, and finally pushed into a code repository. How can we deliver this code into production in a continuous way? We can create Continuous Integration Pipelines that build, test, and deploy the code in an independent and integrated environment. This can be achieved in a Kubernetes-native way by using a tool such as Tekton. Tekton allowed us to run the various pipeline steps in separate containers orchestrated by Kubernetes. For the deployment part, Helm is the de facto standard for packaging and deploying applications to Kubernetes. This enables us to deploy the application into a test stage to run integrated tests.
Helm, however, is not the end of the road. The deployment itself is still rather imperative and not easily repeatable since it is triggered by CLI commands. Hence, we introduced the GitOps approach that enables us to use our Git repository as the single source of truth and synchronizes the Kubernetes resources with the manifests in our Git repository. Once deployed, our application finally runs on Kubernetes. Day 1 is over and the next day begins. With Operators, we described how to turn operational Day 2 knowledge into code. Operators start their work when pipelines and package managers finish theirs. Depending on their capability level, they even allow your application to run in autopilot mode. The operational code lives side by side with the application it manages and can be bundled with it. A good Operator can be a differentiator against competitors providing similar software.
So, finally, after reading this book, you should be well prepared to write great Kubernetes-native applications leveraging the benefits and capabilities of Kubernetes. This has great potential to increase the overall quality of your software as well as your productivity when writing new containerized applications. However, especially the last chapter showed that Kubernetes, combined with the toolset on top of it, can bring tremendous value but is not without complexity. Therefore, this book should provide you with a good sense of judgment on how fast and how far you can follow the journey to Kubernetes-native development.
However, with this book, the journey is far from over. There are many more things to discover, and the ecosystem is evolving fast. Just have a look at the Cloud Native Computing Foundation (CNCF) landscape that gives an impression of the velocity.26 Good starting points for further exploration are Service Meshes to better manage distributed applications and serverless for flexible scaling and further abstraction from the platform which both help to make your applications even more focused on business logic. Both topics are covered by other books.27 And while in this book we focused on writing new Kubernetes-native applications, there are still, however, many existing applications out there that should be modernized and are good candidates to run on Kubernetes, and there are also great resources to start with.28