Alongside DevOps, which drove the need for the dev-first security approach just discussed, we’ve seen the evolution of the cloud and the era of cloud native applications. As I mentioned in Chapter 1, cloud native apps have a broader scope than their predecessors, growing to include more elements of the underlying stack.
This change in application scope requires a change in the scope of application security too. This chapter discusses a new and expanded scope for AppSec called Cloud Native Application Security (CNAS).
Before we dig into the details, let’s take a moment to understand this transition and what the new scope holds.
Before the cloud, applications were typically made up of code and libraries and ran on a large central stack managed by the IT department. If a developer wanted a server to run the app or needed a port opened, they opened a ticket with their justification, and IT processed the request. Even after the resource was supplied, the responsibility for ongoing patching of this server or monitoring the opened ports sat with IT, who would reach back to dev only if necessary.
Most of the IT/ops and security industry focused on this reality. IT teams are, by and large, quite security minded and would balance an incoming functional request with their responsibility to keep the data center secure. To serve these teams’ needs, a rich set of solutions came to be, helping them manage and secure fleets of servers and infrastructure. These included config management databases (CMDB), patch management, vulnerability management tools, and more. These solutions were designed for IT teams, tuned, as discussed in Chapter 2, to their needs, context, and surrounding tools.
The cloud introduced a very different reality. Instead of needing to ask IT for a server, a developer can just spin one up through a simple API or a few web console clicks. Instead of asking for access to an IT-managed database, the same developer can pick an open source container or use a cloud service. Opening ports, changing permissions, and many other aspects of infrastructure administration suddenly became available to developers, on demand, without delay. Removing the delay unlocked a burst of productivity and innovation and was a key force in the digital transformation tidal wave, but brought along some scary security implications.
Although developers could easily spin up infrastructure, they weren’t proficient or well equipped to manage and secure it over time. For many enterprises, the result was a sprawling, ill-managed, and often vulnerable cloud surface; an alarming number of unpatched containers; and a wave of breaches due to poor security hygiene. To help regain control, a new class of container security and cloud security posture management (CSPM) tools came to be, using cloud APIs and other techniques to track what’s deployed and flag potential holes, as Figure 3-1 shows.
Alongside CSPM tools helping to get IT/ops back on their feet, developers also evolved their use of the cloud. Instead of ad hoc actions to set up a cloud environment, tools such as Docker, Terraform, and others were created to unlock IaC, a means to describe infrastructure as though it’s just part of the app. These powerful tools up-leveled the previous IT/ops processes, introducing far better repeatability, local testing, version control, and other collaboration capabilities previously reserved for custom code development alone.
From a security perspective, these tools introduce a new reality. CSPM is useful, but those solutions find the problem after the fact, meaning the flaw was already deployed, limiting them to visibility and fast response. IaC, on the other hand, provides an opportunity to find security mistakes before they are deployed, by testing for flaws as part of the CI/CD pipeline, driving far better efficiency.
At first glance, security controls for containers or IaC may seem similar to their ticket-based IT security predecessors. They deal with the same risks, such as an unpatched server or an open port, and offer an opportunity to weigh a request for functionality against the security risks involved. However, in practice, the users of the two solutions are entirely different.
Containers, IaC, and their brethren are managed by developer tools. They reside as files in a source code repository, are edited through Git-based collaboration, and get applied by running a build. Furthermore, they are often managed in the same repository as the rest of the app, and the logic of the app’s custom code is often intertwined with the underlying containers and infrastructure (infra). Last, they are increasingly managed by the same developers building the rest of the app.
Securing containers and IaC requires an application security solution, not an IT security solution. Just like the rest of their AppSec peers, these solutions must be dev-first and fit elegantly in the developer’s workflow and ecosystem of tools. Over time, as the lines blur between the app’s custom code and underlying cloud infra, we must similarly combine securing those parts in one holistic view.
This combined approach of securing all parts of a cloud native app is what we call CNAS, an expanded scope of the historic application security space.
Securing operating systems and infrastructure as AppSec (instead of IT security) requires some rethinking. The next sections shed light on some key differences in the two biggest new players in the AppSec landscape: containers and IaC.
Container images look an awful lot like VM images, but they are lighter in file and memory footprint and thus faster and cheaper to scale. Just like VMs, they hold within them a filesystem and, notably, an operating system, on top of which applications are installed and run. They offer similar operational advantages, including running multiple software machines on a single hardware device, just with a different performance profile and interface. Last, from a security perspective, they need to be secured in a similar fashion, ranging from keeping that operating system patched to monitoring a running container to ensure that it wasn’t compromised, just like you would a VM.
As a result, security teams asked to secure containers, typically after business units already started adopting them, default to applying their VM security practices to the task. More specifically, since containers are used mostly in cloud environments, security teams tend to apply the same security controls used to secure cloud VMs to containers.
This default motion is applied both for runtime container security and for securing container images, but it’s only effective for the former.
For the purpose of securing containers in runtime and identifying and reacting to attacker activity, treating containers like cloud VMs is a pretty good starting point. The two entities are similar in most of the areas that matter.
Containers and cloud VMs are both ephemeral (short-lived) and scale elastically, requiring monitoring software that can identify machines wisely and cluster them well.
Attacks on cloud VMs and containerized systems mostly target the operating system and apps they hold, with very few targeting specific VM or container vulnerabilities, meaning attack or compromise indicators are similar for both.
Both containers and cloud VMs are API driven and designed to be ephemeral, allowing automated recovery and resetting of machines.
That said, containers do require cloud VM security systems to evolve to cope with a new level of scale and speed. Most container clusters run an order of magnitude more instances than cloud VM setups, are updated on a far more frequent basis, and run more versions in parallel. Containers tend to be more immutable than cloud VMs, scale up and down much faster, and are more likely to be deployed across multiple clouds.
Thus, runtime container security solutions need to be better than cloud VM ones, but along the same path. A great runtime container security solution can easily handle securing cloud VMs too; it just needs to add support for a different virtualization and orchestration platform. In other words, runtime container security solutions are an evolution of their cloud VM predecessors, whereas securing container images requires a revolution.
While their operations may seem similar, building containers couldn’t be more different from building cloud VMs.
Cloud VMs are built via Secure Shell (SSH) and IT tools. The most common pattern is for IT to create golden images, either manually or using tools like Puppet and Chef, and store them as reusable VM images (e.g., Amazon Machine Images [AMIs]). Apps are installed on top of those images, either by building app-specific images ahead of launch or by installing the app at boot time. Over time, IT maintains those golden images, patching them and asking downstream users to update too.
This process has pros and cons, but one fundamental flaw stands out: it’s separate from the workflows for building the app. This results in myriad common problems, such as the following:
Golden images are modified without testing the apps that run on them, but these underlying changes can cause those apps to break unexpectedly.
Golden images change on a schedule independent of the app, introducing risk and noise at potentially sensitive times in the app’s own rollout schedule.
App-specific images have no inherent traceability to the golden image they were built from, relying on error-prone manual tracking to know when they need to be patched due to golden image security updates.
Golden images are often updated inline, with little or no versioning or storage of past versions, making it hard to track patching or rollback.
Containers, especially after the introduction of Docker, are designed to address this gap. Containers are declared as source code (usually using a Dockerfile), benefiting from Git versioning and collaboration. Their layers and, notably, base image are clearly declared, as Figure 3-2 shows, offering traceability to external or centrally managed golden images. They are built using a standard CI/CD process and stored in a versioned registry, providing the tracking and rollback support needed for proper continuous deployment.
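To make this concrete, here is a minimal Dockerfile sketch (the registry, image names, and tags are hypothetical) showing how the base image and subsequent layers are declared right in source:

```dockerfile
# The base image is declared explicitly, giving traceability back to a
# centrally managed "golden" image (registry and tag are hypothetical).
FROM registry.example.com/base/node:18-slim

# Each instruction below adds a layer on top of that base image.
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# Run as an unprivileged user rather than root.
USER node
CMD ["node", "server.js"]
```

Because the base image is named in source, both reviewers and automated tools can tell at a glance which golden image a container was built from and when it needs to pick up a patched version.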
Overall, containers are built as applications, not infrastructure, and securing them requires an application security solution. This means that instead of evolving your VM patching flows to include containers, you need to evolve your AppSec practices to do so. You need to tackle the same risk—having an unpatched or misconfigured server—through an entirely different lens.
As a step further, it’s important to realize that containers aren’t just built the same way apps are built; they’re built as part of an app. The Dockerfile defining a container usually sits in the same repository as the app that runs on it and is edited by the same developer changing the code in the file next door. The same build compiles both the app’s code and the container it will be stored in, and the combined result is stored together in the registry. Containers are a core part of the application being built and shipped.
This creates great alignment between apps and their underlying OS. When updating a container, the ensuing build process will ensure that the app running on it will keep functioning as it should. When a problem does occur, the application can be rolled back to a stable state, including its underlying virtual servers. But it also creates a coupling; containers have to be built and secured by the same people building and securing the application’s code.
This is a change in ownership that shouldn’t be taken lightly. We’re asking developers to take on a new responsibility, one they weren’t expected to do a decade ago. We need developers to see it as part of their job to pick a secure operating system, minimize its content and user permissions, and patch it on a regular basis. We need to educate dev teams on how they can take on these new risks and equip them with the tools and mandate to do so successfully.
Last, moving container security into AppSec requires rethinking our priorities. Should your dev team prioritize avoiding and fixing vulnerabilities in their own custom code, or patching the container they use to address known vulnerabilities? Historically, developers were heavily focused on the code they wrote, but an unpatched server is a far more common way to be breached. In the past, these two risks were part of different backlogs; now they’re on the same list.
Rethinking container security through a dev-first AppSec lens is a big endeavor, but there are a few key areas to focus on to get the change started. These represent necessary changes, but also opportunities to use this transition to be more secure with less disruption.
First, test containers early, even before CI/CD kicks in. The fact that Dockerfile is a source code file means it can be inspected when edited in the IDE to flag security mistakes. It can be reviewed automatically in pull-requests before it’s even built. Such early detection can flag known vulnerabilities or bad base images before they are merged, saving time and effort to fix them and reducing the chance of a disruptive broken build.
Second, focus on fixing flaws, not just finding them. As mentioned earlier, although an auditor’s job may be to find issues, a developer’s job is to fix them. Said differently, it’s easier for developers to fix problems than for auditors to, and we should make the most of this. Make sure your security tools invest in making remediation easy, whether through clear guidance, automated-fix pull-requests that modify Dockerfiles, or automated build triggers to drive patching. This will clearly help you fix more holes but will also improve developer adoption of the tools.
Third, invest in base image management and relating vulnerabilities to them. When you audit an image as a filesystem and find a vulnerable artifact, you don’t care which layer it arrived from. If you’re auditing the same image as source code, the origin of each vulnerability becomes key. A vulnerability in a base image requires updating the base image, or perhaps simply rebuilding to get the new patch. A vulnerable line in the Dockerfile requires editing your source code, an entirely different action. Make sure you clearly separate the two and invest in managing those base images, because most vulnerabilities arrive through them.
Last, introduce container security gates into your CI/CD. Although detecting and fixing locally or in Git is ideal, security flaws can still slip past those opt-in steps. Take advantage of the automation that container platforms offer to introduce guardrails into your pipeline, preventing artifacts that introduce severe security issues from being deployed to production. Just keep in mind that breaking the build is a disruptive action, so be careful when picking the security thresholds that require it.
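At its core, such a gate is a threshold check over scan results. The sketch below uses a hypothetical report format (a list of issues with a severity field); a real pipeline would parse its scanner’s actual output:

```python
# Hypothetical severity scale; real scanners define their own levels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def gate_passes(issues: list, threshold: str = "high") -> bool:
    """Return False (i.e., break the build) if any issue is at or
    above the chosen severity threshold."""
    bar = SEVERITY_ORDER[threshold]
    return all(SEVERITY_ORDER[issue["severity"]] < bar for issue in issues)
```

A pipeline step would then exit nonzero when the gate fails, failing the build. Setting the threshold high (e.g., only break on high or critical findings) keeps the guardrail from becoming a constant source of disruption.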
Hot on the heels of container adoption is the growing use of IaC.
IaC came to be in two waves. The first wave was led by tools such as Puppet, Chef, and Ansible and introduced much-needed automation to the world of VMs. It replaced sysadmins SSH-ing into a remote machine and setting it up each time with consistent and automated scripts, and unlocked the ability to patch machines regularly and apply security configuration policies at scale. I’ll refer to this wave of solutions as IT automation tools.
The second wave came with the cloud and was led primarily by Terraform, which excels in configuring cloud services. Although its predecessors were used primarily by IT to automate their work, Terraform became the tool of choice for developers and DevOps teams to tune the infrastructure to the application that ran on it.
Puppet and others have since adapted to this new world, and new IaC players like Pulumi came to be. In addition, alongside Terraform, platform-specific IaC solutions were created, such as Kubernetes Helm charts, AWS CloudFormation, and Azure’s ARM. Today, most IaC solutions are cloud and application oriented, and IaC is the clear best practice for defining application infrastructure in the cloud.
As with containers, this new IaC wave introduced a bigger change than new declarative languages or cloud-related features. IaC is now managed as part of a continuous software pipeline, with source files in Git repos, build pipelines applying changes, and even increasingly using standard programming languages. In other words, IaC files are managed as applications, not as IT—and need to be secured accordingly.
Securing infrastructure before IaC involved a lot of manual work, ad hoc scripts, and spreadsheets to track what was installed where and keep the perimeter secure. With the advent of IT automation tools, discovery of the state of your infrastructure became feasible, as did applying security policies at scale.
However, though rules could be applied more easily, defining them remained difficult. Applications require many network access paths and system permissions to operate, and those needs change frequently. Communicating those needs through tickets led to slow execution, complex tracking, and a lot of frustration. At the end of the day, infrastructure was often overly permissive, which increased risk, or was too strict, which caused applications to break—a bad outcome either way.
IaC offers an opportunity for much better alignment, improving security while reducing risk of breakage. By defining infrastructure needs as a natural part of the application, we can ensure that they are “just right”: not too open, not too closed. To achieve this, we need to change our perspective and shift from securing the infrastructure to securing the code that creates it.
This leads us again to application security. We need to assess IaC files as source code and find security vulnerabilities in them. These flaws should be surfaced to the developers writing them, helping them understand and fix them as they code. When a weakness is found in production, it should be solved by going back to the code and fixing it, not by manually modifying the infrastructure previously deployed (though that’s a legitimate temporary patching action).
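For example, in this hedged Terraform sketch (resource names are hypothetical), an overly permissive rule is visible, and fixable, right in the source:

```hcl
# Flagged: SSH exposed to the whole internet.
resource "aws_security_group_rule" "ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # fix: restrict to a known CIDR range
  security_group_id = aws_security_group.app.id
}
```

Remediating this means editing the `cidr_blocks` line and letting the pipeline redeploy, rather than reaching into the cloud console to patch the live security group.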
As with containers, IaC security isn’t an evolution of IT automation; it’s an entirely new perspective. To secure infrastructure as code, we need to roll the application forward, anticipating how it would manifest in production, as opposed to rolling infrastructure backward. Over time, the source of truth should be your source code, not the assets deployed, and your security will depend on how well you secure that code.
Securing infrastructure is a complicated domain. Although IaC technology resembles software development, most developers don’t know which configurations are secure and which ones aren’t. Few developers will be familiar with Kubernetes’s securityContext attribute or know that CloudTrail logging needs to be enabled, and the growing complexity of cloud platforms implies that this gap will never truly go away. To address this challenge, you need to invest in education and automated inspection. See Figure 3-3 for some IaC rules that Snyk applies.
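For reference, the Kubernetes securityContext mentioned above amounts to just a few lines in a pod spec (a minimal sketch; the pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
      securityContext:
        runAsNonRoot: true              # refuse to start as root
        allowPrivilegeEscalation: false # block setuid-style escalation
        readOnlyRootFilesystem: true    # no writes to the container filesystem
```

Knowing that these three fields exist, and that they should usually be set, is exactly the kind of platform-specific detail that is better enforced by tooling than memorized by every developer.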
For education, focus more on logical configuration than on technical implementation. For example, your dev teams should understand the importance of minimizing the permissions a service is given, reducing the scope of who can access it, and securing the data coming into it and going out of it. They should learn to appreciate how observability helps security and how to balance data retention for forensic purposes with a privacy-driven desire not to retain personal information. These are complicated topics that will take a while for teams to truly master, but they are an important investment in upskilling your team.
For the technical implementation, however, tools should be your key solution. IaC scanners such as Snyk IaC, tfsec, and HashiCorp Sentinel can map your security requirements to every platform’s implementation details and inspect your source code as it changes. Although they can’t replace your policy decisions, such tools also embed industry best practices that can make it easy to get started and kick-start developer education.
Beyond this high-level advice, here are a few quick tips on how to embrace an AppSec approach to securing infrastructure, which at this point in the book should start sounding quite familiar:
Find issues early, before the infrastructure exists. Analyze source code files in IDEs and Git to find problems and guide developers to fix them.
Invest in fixing IaC source code. When flagging problems, help developers fix them by raising clear potential solutions, automatically creating fix pull-requests and more.
Automate IaC testing. Software needs to be tested to avoid regressions,1 and IaC is no different. Invest in unit tests, which in turn will give dev teams confidence to reduce permissions with less risk of breakage.
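As a sketch of what such a unit test can look like (assuming the plan has been rendered to JSON; the structure shown here is hypothetical), a test can assert that no storage bucket is publicly readable before anything is deployed:

```python
import json

def public_buckets(plan_json: str) -> list:
    """Return names of buckets whose ACL grants public read access.

    Assumes a simplified, hypothetical plan structure:
    {"resources": [{"type": "bucket", "name": ..., "acl": ...}, ...]}
    """
    plan = json.loads(plan_json)
    return [
        r["name"]
        for r in plan.get("resources", [])
        if r.get("type") == "bucket" and r.get("acl") == "public-read"
    ]

# A unit test over the rendered plan, run in CI before deployment.
def test_no_public_buckets():
    plan = json.dumps({
        "resources": [
            {"type": "bucket", "name": "logs", "acl": "private"},
        ]
    })
    assert public_buckets(plan) == []
```

Tests like this give dev teams the confidence to tighten permissions: if a change breaks an assumption, the test fails in CI rather than the app failing in production.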
Embracing the cloud and cloud native development is about far more than technology. It’s an enabler for independent teams to run faster, adapting to market demands faster and driving superior business outcomes. To achieve this, application teams are given ownership and control over more layers in the stack, and we need to make sure they can also keep those layers secure.
CNAS is the perspective we need to make this happen. It requires us to understand that the same people previously tasked solely with securing their code now make decisions around so much more. We discussed containers and infrastructure, but dev teams also need to manage data, services, API gateways, and much more, all areas where decisions can have significant security implications.
Each of these layers requires us to rethink our security controls in a dev-first fashion. As we’ve seen with containers and IaC, the resulting solution could look very different from its predecessors.
1 A regression occurs when a code addition unintentionally breaks previously working functionality.