Chapter 6: Verified GitOps – Continuous Delivery Declaratively Defined

The last GitOps concept that we will cover is a relatively recent practice known as verified GitOps. The essential difference between verified GitOps and the other GitOps practices is the desire to leverage GitOps for the entirety of continuous delivery, rather than only for continuous deployment.

One operating requirement across all GitOps practices is the need for declarative language files to be stored in a source code manager and referenced at runtime. And, depending on which practice is being adopted, the number of required files is heavily influenced by how much manual intervention is acceptable. This results in a balancing act between manual intervention steps and the number of declarative files.

In this chapter, we're going to cover the following main topics:  

  • Verified GitOps basics
  • Test, governance, deploy, verify, and restore as code
  • One file, many files, or somewhere in-between
  • Benefits and drawbacks of verified GitOps
  • Common verified GitOps tools

Verified GitOps basics

The past few months of developing and implementing GitOps tools and processes have been illuminating for the DevOps team. The changes in scope from the business and engineering leadership were somewhat discouraging. However, as the team had to restart their efforts when implementing GitOps, they realized that the new requirements helped with futureproofing. Each process or tool that the team had looked at seemed like a great solution for the requirements at that time. But as they tested the solutions, especially with each scope change, they quickly realized that the inevitable future requirements would not have been met. Requirements such as cross-platform support, cloud-native and traditional support, and a desire for repeatable, reliable, and scalable processes were difficult to achieve with the other tools.

These requirements led them to Ansible, which seemed to offer the best option for leveraging GitOps across all platform types. One thing that made Ansible so helpful was its ability to leverage declarative files in a native way. This is because Ansible operates in more of a scaffolding design. The DevOps team would need to fill in the pipelines with integrations and self-defined steps. If the team needed to execute a process on a server, they would still have to build the process out themselves. Then, once the process was built out, they could bring it into Ansible for execution.

A major benefit of Ansible is that it allows imperative commands to become declarative through templating with variables. The DevOps team could build out the commands they wished to run, add the variables, and then reference those command files wherever they needed them. This templating capability meant that the team only had to build a step once and could then reuse it wherever it was needed, preventing repetitive development. Because of this capability, the team had to figure out how best to leverage this reusability without causing too much confusion, balancing the functionality against the potential file count.
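As a minimal sketch of that templating pattern (the service name, command, and variable names below are illustrative, not taken from the team's actual files), a reusable Ansible task file and a playbook that consumes it might look like this:

```yaml
# restart_service.yml - a reusable task file; the imperative command is written once
- name: Restart the application service
  ansible.builtin.command: "systemctl restart {{ app_service_name }}"
  become: true

# site.yml - any playbook can reuse that task by supplying its own variable values
- name: Restart the payments service on its hosts
  hosts: payments_servers
  vars:
    app_service_name: payments
  tasks:
    - ansible.builtin.include_tasks: restart_service.yml
```

Because the command exists in only one place, a change to the restart behavior propagates to every pipeline that includes the file.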

By representing each deployment or integration function in code, the team would be able to have an execution engine that considers each function as a building block. Each execution could be somewhat unique because of the dynamic nature of templating and user-provided variables. But this would require an execution process to be file-reference heavy, resulting in difficult troubleshooting processes. The alternative is to build fewer execution processes, allowing for a significant number of input variables with less file referencing. The main concerns were the troubleshooting steps and total number of files required to fulfill their needs.

Since the team was considering how they would structure the execution process and files, they found that they would also need to solve secrets usage and user access. Audit requirements had to be included in the process, especially around who has access to the different environments and who deployed to each environment. And, in an effort to futureproof their process, they wanted to see if they could extend the typical deployment-only limitation of GitOps to some of the delivery requirements as well. But adding testing, change management, approvals, validation, and the rest of the delivery process requirements would not be easy.

One area that the team had to consider, especially if they wanted to leverage a GitOps process, was the use of declarative files for tool integrations. Each integration would require its own file, which includes the configuration and connection requirements. And for every execution that includes the desired tool, the team would have to use another declarative file to define the desired interaction with the tool.
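As an illustration only (the schema below is hypothetical, not any specific tool's format), an integration file and its companion interaction file might look something like this:

```yaml
# integrations/change_management.yml - hypothetical connection definition
integration:
  name: change-management
  type: ticketing
  url: "https://change.example.internal"
  credentials_secret: change_mgmt_token   # resolved from a secrets manager, never stored in Git

# interactions/create_ticket.yml - hypothetical definition of one interaction with that tool
interaction:
  uses: change-management
  action: create_ticket
  inputs:
    environment: "{{ target_environment }}"
    artifact: "{{ artifact_version }}"
```

The connection file is written once per tool; each execution then references it through an interaction file rather than repeating the configuration.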

By representing every integration, interaction, execution, and configuration in code, the team would be able to achieve a more verified GitOps practice that would ensure a repeatable, reliable, and scalable approach to the delivery.

The most basic requirements for automating any process are the execution engine and a set of configuration files. The execution engine is the core appliance that enables a process to be automated, and the configuration files define the execution engine's behavior.

Every execution engine is purpose-built to solve a problem, which is reflected in its functionality. The purpose and functionality of the execution engine will lead to a set of standards and best practices for optimal performance and reliability. But a common trend in the engineering world is that execution engines become overextended. Users will often develop workarounds or hacks that allow them to break with the best practices, causing significant unreliability and often resulting in catastrophic system failures. Some execution engines become so overburdened that support teams have to work late evenings and weekends to fix or rebuild broken platforms.

An example of overextending would be leveraging an execution engine that is purpose-built for deploying monolithic applications to servers. A common capability of these execution engines is the ability to run scripts from their servers. If a user decides to leverage the scripting capability to run deployment commands against non-server endpoints, then the process becomes error-prone. And since the native functionality is overextended, the tool is not equipped for proper troubleshooting or remediation processes. Overextended processes also result in significant tribal-knowledge and scripting requirements. And eventually, because of the required support burden, team members will churn, causing the entire process to stall and burn out. By avoiding overextension, the reliability of the tool and process can be better guaranteed.

If the execution engine enforces reliability and repeatability in a process, then the configuration files allow for scalability across supported platforms and teams. The developers of the execution engine can build out a configuration process that is both recommended and required. The configuration files should allow users to declare what the desired outcome of the execution should be, and any supported customizations are limited to what the configuration file can supply.

As the process that the group wants to automate grows in terms of support, requirements, and scope, the execution engine and configuration files will also increase. The execution engine will increase in terms of its size, maintenance requirements, administration requirements, and functionalities. However, what often is not addressed in this support scaling is redesigning the execution engine so that it fits a broader or more accurate purpose. Rather, the execution engine's initial purpose is extended to meet the new platforms and use cases, assuming that the new requirements can be addressed in the same way as the old requirements. Additionally, the configuration files will also increase, not just in the number of files, but also in their size and utility.

For example, in regard to a basic server deployment process, a script might be run by the execution engine and is intended to stop a process, copy application files and artifacts to the server, set up a required folder hierarchy, and then start the process again. The configuration files that are supplied to the execution engine would declare what to deploy, where to deploy it, and so on. As the number of servers increases, the execution engine can remain the same, but the configuration file count might grow with one new file for each server. If the required platform support grows to data centers and the cloud, the execution engine must now scale to support the nuances between the different network, platform, and security requirements. And if the required platform support adds non-server-based endpoints to deploy to, the execution engine and configuration files must now scale again. However, because the execution engine was originally purpose-built for server-based support, it will either need to be refactored entirely, or it will have to treat non-server deployments the same as server deployments.
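A rough sketch of that basic flow, with the execution logic in one playbook and the per-server details pushed into small configuration files (all names, paths, and versions here are illustrative):

```yaml
# deploy_app.yml - the execution logic, written once
- name: Deploy the application to its servers
  hosts: app_servers
  become: true
  tasks:
    - name: Stop the running service
      ansible.builtin.systemd:
        name: "{{ app_service }}"
        state: stopped

    - name: Ensure the release directory exists
      ansible.builtin.file:
        path: "{{ app_dir }}/releases/{{ app_version }}"
        state: directory

    - name: Copy the application artifact to the server
      ansible.builtin.copy:
        src: "artifacts/{{ app_name }}-{{ app_version }}.tar.gz"
        dest: "{{ app_dir }}/releases/{{ app_version }}/"

    - name: Start the service again
      ansible.builtin.systemd:
        name: "{{ app_service }}"
        state: started

# host_vars/server01.example.internal.yml - one small configuration file per server
app_name: billing
app_service: billing
app_dir: /opt/billing
app_version: "1.4.2"
```

Adding another server means adding one more host_vars file; adding a new platform type is what forces changes to the execution logic itself.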

Another thing to understand about automation is that the complexity of the execution engine and configuration files is tightly coupled with the complexity of the process that needs to be automated. Regarding the example of the server deployment, the process of stopping, copying, moving, and starting a process on a computer can be only a few lines of code. The moment that the process expands to include testing the artifact that was deployed on the server, the execution engine and related configuration files become significantly more complex. As such, detailing the different behavior and execution requirements is essential to the success of the automation outcome.

The continuous deployment process can be as simple as deploying to a single server. It can also be complex, such as deploying to a set of Kubernetes clusters across multiple regions. But as we learned in a previous chapter, the deployment is only a small portion of the delivery process. If a team wanted to automate their delivery process while also adhering to GitOps practices, they would need to leverage verified GitOps. Verified GitOps is a GitOps practice that focuses on providing repeatable, reliable, and scalable continuous delivery through GitOps.

Test, governance, deploy, verify, and restore as code

When the DevOps team was documenting the different stages of their desired delivery pipeline design, they wanted to include significantly more than just the deployment process. There was a desire to include QA testing, change management, verification of the production environment, and failure remediation. But shortly after those original design meetings, the team experienced an accelerated timeline and scope change. If they wanted to revisit the idea of automating the delivery process, especially with verified GitOps, they would need internal support.

The team found that it was better to start with a specific platform, build out the delivery pipeline for that platform, and then expand to the other platform support afterward. And because of the native execution capabilities of Kubernetes, it would be less of a lift for the team to build out the delivery pipeline for the containerized applications.

Initially, the team would have to solve the issue of Kubernetes manifest sprawl. If they could turn the main set of resource requirements for each application into a core template set, this would reduce the manifest maintenance. To accomplish this outcome, the team needed to consolidate Helm Charts and let teams leverage override files. At runtime, the execution engine would be prompted to fetch and deploy the manifest files from Git. The main requirement that had to be built was the fetch mechanism of the execution engine, since everything else was natively built into Helm's templating engine.
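A minimal sketch of that pattern, assuming a shared chart, a per-application override file, and an Ansible task that performs the fetch and deploy (repository URLs, chart names, and values are placeholders):

```yaml
# overrides/payments-prod.yaml - a team's override file applied to the shared chart
replicaCount: 3
image:
  repository: registry.example.internal/payments
  tag: "2.7.1"

# deploy_chart.yml - the fetch-and-deploy step the team would need to build
- name: Fetch manifests from Git and deploy with Helm
  hosts: localhost
  tasks:
    - name: Pull the shared chart and override files from the Git repository
      ansible.builtin.git:
        repo: "https://git.example.internal/platform/charts.git"
        dest: /tmp/charts
        version: main

    - name: Deploy the release using the team's override file
      ansible.builtin.command: >
        helm upgrade --install payments /tmp/charts/app-chart
        -f /tmp/charts/overrides/payments-prod.yaml
        --namespace payments --create-namespace
```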

To use a verified GitOps practice, the solution would need a set of declarative files. These files would give information about the Chart to fetch and deploy, access permissions, and environment configurations. And since integration and configuration files are not tightly coupled to a specific pipeline, they could be referenced and reused almost infinitely. Once that process is defined in code, the team would need to build out the pre-deployment and post-deployment requirements.

The pre-deployment requirements that the team needed to add were the initial change management process and some preliminary security processes. To accomplish the change management requirement, the delivery pipeline would need to create a ticket with the required governance information. Each execution would have to add information about the artifact, environment, testing outcome, execution trigger, and any approvals. As for the security requirements, every execution would have to run a security scan on the artifact and the infrastructure, with the results being added to the change management ticket.
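A hedged sketch of that pre-deployment governance step, assuming a hypothetical ticketing API (the URL, fields, and variables are placeholders, not a specific vendor's interface):

```yaml
# create_change_ticket.yml - illustrative change management step
- name: Open a change management ticket for this execution
  ansible.builtin.uri:
    url: "https://change.example.internal/api/tickets"   # hypothetical endpoint
    method: POST
    body_format: json
    body:
      artifact: "{{ artifact_version }}"
      environment: "{{ target_environment }}"
      test_results: "{{ test_summary | default('pending') }}"
      triggered_by: "{{ pipeline_trigger }}"
      security_scan: "{{ scan_report_url | default('not yet run') }}"
      approvals: "{{ approvers | default([]) }}"
    status_code: 201
  register: change_ticket   # the ticket reference can be reused by post-deployment steps
```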

In order to define these steps in code, the team would have to build out many files related to integrations, configurations, and executions. Similar to the declarative files for the Git integration, the required supporting files for the change management and security processes could be built once and reused.

The next part of the delivery pipeline would include any post-deployment steps, such as ticket updates, deployment validation, and so on. But since the change management process was already built out for the pre-deployment steps, the team didn't have to build out that requirement again.

For the other parts of the post-deployment requirements, the team would have to figure out the best way to use a healthcheck to test for success, and what to do if the deployment fails. The easiest of these steps would be executing the restore process because Helm has a rollback feature already. The only part that the team would have to design is how to reference the previous deployment and automate the execution of the desired Helm commands.

In the case of testing the success of the deployment, Helm and Kubernetes can provide verbose logging to watch for any errors and will give the deployment status of the release. If the deployment status is successful, the pipeline is complete. In the case of a failed deployment, the pipeline should include an automated rollback process. The pipeline could reuse the release name from the deployment in the Helm rollback command. Being notified of the deployment status will also be a requirement, especially if there is a failure. At a minimum, the notification message should include the status and a link to the execution. Ideally, the message would include some information from the logs and the error. Health checks and deployment validation would be the last steps to build out for the post-deployment.
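One way to wire that failure path together in Ansible is a block/rescue structure, sketched below with placeholder variables for the release, namespace, and notification webhook:

```yaml
# deploy_with_rollback.yml - illustrative deploy, rollback, and notification flow
- name: Deploy and roll back on failure
  hosts: localhost
  tasks:
    - block:
        - name: Upgrade the release and wait for it to become healthy
          ansible.builtin.command: >
            helm upgrade --install {{ release_name }} {{ chart_path }}
            --namespace {{ namespace }} --wait --timeout 5m
      rescue:
        - name: Roll back to the previous revision
          ansible.builtin.command: >
            helm rollback {{ release_name }} --namespace {{ namespace }}

        - name: Notify the team of the failure
          ansible.builtin.uri:
            url: "{{ notification_webhook }}"   # placeholder webhook URL
            method: POST
            body_format: json
            body:
              status: "deployment failed and was rolled back"
              release: "{{ release_name }}"
              run_link: "{{ pipeline_run_url | default('n/a') }}"
```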

The health check would not be too difficult if the application had a frontend or API. The pipeline would need to run the command on the desired endpoint and be able to validate the response.
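A minimal health check sketch, assuming the application exposes an HTTP endpoint such as /healthz (the path, retry count, and delay are assumptions to be tuned per application):

```yaml
- name: Verify that the deployed endpoint responds successfully
  ansible.builtin.uri:
    url: "https://{{ app_hostname }}/healthz"   # assumed health endpoint
    method: GET
    status_code: 200
  register: health
  retries: 10      # allow time for new pods to become ready
  delay: 15
  until: health.status == 200
```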

The most difficult part of the post-deployment requirements would be the ability to verify the functionality against actual usage. The DevOps team wanted to leverage the monitoring tools that the engineers were the most accustomed to. One way to verify the deployment would be for the verification tool to send an alert when an issue occurred. Most of the engineers were used to getting these alerts, so it would not be confusing for them. The desired point of integration is being able to tell the monitoring tool which deployment went out and getting information back. But even if they got information back, it would be difficult to discern what was good and what was not. They would need to look for tools that could offer this capability and be added into their platform.

At the end of the requirements gathering process for the verified GitOps delivery pipeline, the team had a good understanding of what needed to be executed. They now had the difficult task of building out each execution step and declarative file that Ansible could leverage for the verified GitOps process.

Most companies are heterogeneous in their technology stack, meaning that they will have a wide range of different technologies to use and support. To automate a delivery process, every engineering organization will have a cross-team, cross-platform, or cross-cloud support requirement. The initial pain associated with this type of support is being able to adequately integrate and implement the individual steps and stages across each discipline. And each engineering discipline is its own market for a tool of some kind.

At some point, the desire for a general orchestration engine, a tool that orchestrates a set of tasks or processes across the individual execution engines, becomes very enticing. This is especially true when the orchestration engine can abstract away the operational knowledge requirements of the underlying tools. One benefit of using an orchestration engine is centralized reporting and transparency across the tools it coordinates. Another advantage is that the administration requirements of the underlying execution engines can be managed by a set of subject matter experts, or SMEs. For many DevOps teams, Ansible is the orchestration engine of choice. Not only can Ansible integrate with other tools as an orchestrator, it can also be the execution engine when needed.

One main characteristic of an orchestration engine is its ability to behave as an abstraction layer that separates users from operation requirements. For example, Kubernetes has a built-in orchestration layer that abstracts the underlying operating procedures away from the Kubernetes users. Although a Kubernetes administrator can alter the operating procedures, the vast majority of users interacting with a Kubernetes cluster do not have to concern themselves with how Kubernetes accomplishes a task.

Representing the interaction between the underlying execution engines and the orchestration engine in code is a good way of understanding the verified GitOps concept. However, contrary to how originalist GitOps looked to the Git repository as the source of truth, verified GitOps leverages the orchestration engine as the source of truth. The assumption is that the different execution engines and platforms should continue to operate within their scope of purpose. The orchestration engine takes the user's desired state and passes it to the underlying layers, reports execution output, and then moves along the rest of the delivery flow. A core principle of verified GitOps is that execution engines and underlying platforms should be the source of truth. The declarative files in Git operate only as the source of the desired state.

As teams adopt verified GitOps, their orchestration layer will grow to incorporate many different platforms and execution engines. With this growth, they will need to figure out how best to balance their declarative file requirements.

One file, many files, or somewhere in-between

The DevOps team was finishing the design work on a verified GitOps pipeline and began documenting the requirements for the different integrations. They figured that they would need a set of declarative files for every integration, which included access and execution requirements. Then, they would be able to reuse those files in any pipelines that needed them. Because every provider, platform, and tool required these different sets of files, the team would need to figure out the best folder hierarchy to store the files.

The team would then need to build out a set of declarative files that defined the required variables for integration interactions. Each interaction definition file would have to reference the appropriate integration files for access and permission requirements.

Next, the team would need to build some environment-specific execution requirements to enforce security and compliance standards for every deployment. The DevOps team would need to have a set of policy enforcement files to validate the compliance standards.
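As an illustration only (the schema is hypothetical, standing in for whatever format the chosen policy tooling supports), an environment-specific policy file might declare rules such as:

```yaml
# policies/production.yml - hypothetical environment policy definition
environment: production
rules:
  require_change_ticket: true
  require_security_scan: true
  block_on_critical_vulnerabilities: true
  allowed_deploy_windows:
    - "Mon-Thu 09:00-16:00"
  required_approver_group: release-managers
```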

With all of the integration interactions codified, the team would need to build out the pipelines and associated triggers. The architecture of a pipeline was a point of contention within the DevOps team. Some of the team wanted to have a pipeline be one large file, which would reduce the overall number of files. Others on the team were worried about readability and preferred a higher number of small files that referenced each other.

No matter how the team designed the pipelines, they would inevitably have a pipeline of pipelines. If groups of related tasks were bundled into their own files, then a pipeline would be made up of building blocks. But this would involve a significant amount of file referencing, which can be very difficult to follow in code. The team needed to figure out how best to structure their pipelines to be scalable, repeatable, and reliable.
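In Ansible terms, the two approaches roughly translate into one long playbook versus a top-level playbook that imports smaller building blocks, as in this illustrative sketch (the file names are placeholders):

```yaml
# delivery_pipeline.yml - a "pipeline of pipelines" assembled from smaller files
- ansible.builtin.import_playbook: pre_deploy/create_change_ticket.yml
- ansible.builtin.import_playbook: pre_deploy/security_scan.yml
- ansible.builtin.import_playbook: deploy/deploy_chart.yml
- ansible.builtin.import_playbook: post_deploy/health_check.yml
- ansible.builtin.import_playbook: post_deploy/update_change_ticket.yml
```

Each imported file is easy to reuse across pipelines, but tracing a single execution now means following references across five or more files, which is exactly the trade-off the team was debating.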

The biggest issue with any type of declarative file structure is the amount of work that a team has to contribute to get the maximum effectiveness out of the file. This is most commonly seen in Kubernetes manifests. Every Kubernetes resource can be declared statically in one extremely long YAML file. As templatization became a standard for Kubernetes administrators, these resource files would often be broken up into smaller sets of files for reusability. However, when the files became smaller, the number of files increased exponentially. An increase in the number of files leads to files referencing other files. When any file structure has significant inter-file referencing, the required knowledge to administer and troubleshoot issues increases.

The world of GitOps, regardless of it being original, purist, or verified, falls prey to this balance of manifest size and number of manifests. The end goal should be that the practice does not require repetitively building steps. But in a desire to limit the number of repetitive files, a verified GitOps administrator will be tempted to increase the number of small and unique files that reference each other. This is a delicate balance to get right since the outcome of the file structure will directly contribute to the readability and usability of the system.

As a team looks to adopt verified GitOps, they will need to decide who will build, maintain, use, and improve upon the system over time. By leveraging the large number of small files approach, the future maintenance, usage, and improvement of the system will be restricted to a small set of advanced users. Alternatively, by pursuing a small number of large files approach, the maintenance, usage, and improvement of the system will be easier.

Benefits and drawbacks of verified GitOps

By working through the nuances of verified GitOps and how to support all of the potential platforms and tools that the delivery pipeline requires, the DevOps team has, yet again, found themselves in the position of questioning their tool choice.

Even with all of the testing and research that they conducted, the team found that the building and maintaining requirement was too heavy. Ansible is highly customizable, but requires its users to build out every step in every file. And since there is no auto-generation capability, the overall file building requirement was massive. If the company only needed to support one or two platforms, the workload would be bearable. But because they had to support many different platforms, tools, and use cases, the build requirements were daunting. They needed to make a quick decision as to whether to search for another potential solution or start building out the Ansible process immediately.

But a quick search for a verified GitOps solution gave them zero results. Although there are a significant number of tools that describe themselves as GitOps tools, there was no time to analyze every single tool in that list. Instead, the team figured that if a tool advertised itself as a GitOps tool, showed that it had support for the platforms and tools they needed, and had some form of pipeline code, they would consider it.

These requirements whittled down the list to a small set of tools, which made the analysis easier for the DevOps team. Then, what they needed to understand was how the tools supported the platforms and underlying tool integrations, and how the pipeline code was leveraged. If they were going to recommend buying a solution, then it needed to have native and intuitive support for their requirements.

Tool documentation would give more insight into how different integrations or tasks were supported. Some of the tools advertised support for a platform or integration, but really meant that a user could build and run scripts. Since that was an issue that they had with their Ansible setup, they decided to move those solutions to the bottom of the list of potentials.

Diving deeper into the solutions that were left, they found a tool with a significant number of native integrations, GitOps support, and declarative configuration files. The documentation seemed to indicate that the tool auto-generates the configuration files and can push them to Git. One drawback with the tool was that the testing and security tools that their company was using did not have native support. However, the platforms were all natively supported, as were most of the verified GitOps requirements. The team would need to download the trial, configure the solution, and see if it would work for them.

Understanding the benefits and drawbacks of any practice is paramount to success. The intent should be to capitalize on the benefits and be fully aware of the drawbacks before making any decision about adopting the practice. And as has been shown throughout this book, the drawbacks of GitOps can range from potential non-starters to mere annoyances. For example, if a team needs to support any platform outside of Kubernetes, then originalist GitOps is a non-starter for them. However, with purist GitOps, there may not be a way to directly support the platform, but the user can work around that issue.

Verified GitOps falls prey to a common GitOps drawback, which is the need to support and build out some set of declarative files. The balance between a few large files and many small files is an annoyance that should be understood when approaching verified GitOps as a potential solution.

Another drawback is that there are no tools marketed toward verified GitOps or purist GitOps. But even though there are no tools purpose-built for these GitOps practices, this does not mean that the practices should be abandoned. There are some tools that exist that allow the execution engine to be extended to support verified GitOps, without it being overextended or overburdened.

Verified GitOps has some major benefits, when properly instrumented. If every integration point is defined declaratively, then the usage of that integration can be more reliable and repeatable. If a team wants to implement a change management process that requires significant data input, an automated approach is the best way to ensure uniformity. By declaratively defining the change management automation with the desired set of data inputs, the outcome will be a reliable process that is automatically repeated and can be scaled across every execution as needed.

The biggest benefit of verified GitOps is the confidence across the entire delivery process that comes from a repeatable, reliable, and scalable practice. Every security, compliance, testing, verification, and deployment requirement can be enforced without manual intervention by users. And as every engineering team knows, fewer manual requirements in any process immediately reduces mistakes, bugs, and bottlenecks.

Common verified GitOps tools

The process of designing the requirements for verified GitOps gave the team a better understanding of where they should spend their time. The DevOps team had experienced building an in-house solution before, mainly to avoid the costs of buying a tool. But they had also experienced the significant administration and maintenance effort associated with building a tool. Although Ansible had offered a wide range of customization capabilities, it was essentially a tool that they would have to build out and maintain. The other problem with Ansible was that although the files could be stored in a Git repository and pulled at runtime, the tool required the Git repository to be pulled down before every execution.

After performing market research for a verified GitOps tool, the team found Harness, which had a very promising solution. The tool allowed the team to support Kubernetes, serverless, and server-based platforms. It had a native code conversion process that turned the pipelines in the UI into code, which could be stored in Git. Lastly, the team found that the SaaS style of the tool allowed them to maintain significantly less hardware and software in their environment.

Another massive benefit is that Harness allowed them to quickly adopt a verified GitOps approach for the delivery requirements. They could easily define their integrations, tie those integrations to deployment steps, and enforce all of the executions based on their security and compliance restrictions. These benefits were all significantly enhanced by the tool's ability to leverage built-in variables in a host of different areas, similar to Ansible.

Even with all of that, there were a few drawbacks to the tool, one of which was the lack of native integrations with their testing tools. The tool did offer the ability to run scripts as a workaround. The same problem existed with the required security tools, such as artifact scanning. Although the scripting piece would allow for a pseudo-integration, the team would have to build it out themselves.

One last drawback for the team was the fact that the tool was not free. They knew that trying to get a purchase request through the engineering leadership and company was not going to be easy. Setting up the solution was simple, but finding a justifiable reason to purchase a tool over building out Ansible was a different story. What they needed to do was have one group from their team start to build out the Ansible requirements while the other group worked on a way to justify purchasing Harness. Regardless of the outcome, they needed something in order to meet the required deadline for support, whether that was deployment automation or delivery automation.

Verified GitOps is a practice with a broad scope of purpose. Because of that, any tooling that is available will need to support a broad scope of purpose. An example would be something such as Jenkins, whose original scope of purpose is continuous integration but which can be extended to do more. Using Jenkins in this way is not uncommon in the industry, and the same can be said for any integration or script-running tool. But what needs to be considered is the engineering work required to achieve the desired scope of purpose.

An alternative to overextending a tool such as Jenkins would be using a generic execution engine, such as Ansible. Ansible is an open source tool that is commonly compared to Jenkins, Puppet, Terraform, and others. Considering the wide range of tools mentioned, it is clear that Ansible is much more than a simple provisioning or configuration management tool. With the use of YAML files, plugins, and a central execution engine, Ansible meets many of the requirements of verified GitOps, even though it is not marketed as a verified GitOps tool.

Some companies forgo the tool route entirely, choosing instead to build a solution in-house. In most cases, companies do not associate the time and effort to build a tool with a hard cost, even though building and maintaining a tool can be a heavy requirement. Scaling an in-house built tool can not only cause higher support costs but can even lead to attrition of employees. A good way to understand the build cost is to consider the time associated with building, maintaining, documenting, improving, administrating, and hosting the tool. If four engineers, paid $100,000 a year, spend 6 months building the tool, that is $200,000 spent just on building the tool. Maintaining and administering the tool might take 2 to 3 days a week, or 1 to 2 weeks a month, resulting in an additional $23,000 to $47,000 a year. The documentation, training, and iterating on the tool will cost about the same as the maintenance, which is another $23,000 to $47,000 a year. Finally, the hosting costs of the tool, especially if it is hosted in the cloud, might run an additional $5,000 a year or more, depending on scaling needs. This means that the first year will cost between $230,000 and $250,000, with an ongoing run rate of roughly $51,000 to $100,000 a year after that.
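Summarized as a quick calculation (one reading of the figures above, assuming maintenance and documentation only run for roughly half of the first year):

```latex
\begin{align*}
\text{Build} &= 4 \times \$100{,}000 \times 0.5\,\text{yr} = \$200{,}000\\
\text{Run rate} &\approx \$23\text{--}47\text{k (maintenance)} + \$23\text{--}47\text{k (docs, training)} + \$5\text{k (hosting)} = \$51\text{--}99\text{k per year}\\
\text{First year} &\approx \$200{,}000 + \tfrac{1}{2}\times\text{run rate} \approx \$230{,}000\text{--}\$250{,}000
\end{align*}
```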

The alternative to an in-house built tool is to buy a tool, but that comes with its own issues. Any purchasing decision has to consider the hosting and maintenance requirements, the growth of the tool and relative licensing expansions, and break-glass scenarios. Harness is a tool with a scope of purpose that covers the delivery process. But, because it is a Software as a Service (SaaS) solution, it must be licensed. Although it has a lower hosting and administration cost, it has a higher hard cost, which is any cost that a company sees on a purchase order. Soft costs, by contrast, are any costs that can be hidden in employee hours or hosting requirements.

Although Harness does not advertise itself as a verified GitOps tool, its scope of purpose covers the same concepts. The ability of Harness to define every integration, permission, execution, and use case in declarative files, and then store those files in a Git repository, makes for an easy setup and configuration process. The drawback of Harness is that it doesn't have a native integration with every possible solution or process on the market. Therefore, anyone looking into a tool such as Harness needs to understand the work required to build out support for missing integration points.

Every time there is a requirement for a new tool or process, it is of extreme importance to understand the soft and hard costs involved. The cost to build or adopt, the cost of maintenance, the cost of administration, the cost of scaling, and licensing costs are all things that need to be considered before making any decision.

Summary

Verified GitOps is a practice that leverages declarative language files to solve the automation around the delivery process. By codifying the integrations, permissions, and executions, the team that adopts verified GitOps can quickly ensure it is repeatable, reliable, and scalable for every delivery. But since there are no tools that are purpose-built for verified GitOps, the options are limited to either a delivery-based tool, such as Harness, or a general script-running tool, such as Ansible.

In the next chapter, we will cover the best practices for deployment, delivery, and GitOps. The adoption of best practices will not only result in meeting industry standards, but also allow for a team and company to avoid tool lock-in.
