As a continuation of Chapter 13, Working with Ephemeral Hosts, we will continue to fine-tune our Amazon ECS or EKS cluster by implementing a container image cache strategy. First, we will dive deep into why a container image cache strategy is a must-have for Windows containers; then, we will learn how to leverage the existing EC2 Image Builder pipeline covered in the previous chapter to implement the container image cache.
The chapter will cover the following topics:
The short answer is that we don’t want to wait six or more minutes for a Windows container to start and begin receiving traffic. In practice, the wait is usually longer than that because a lot of concurrent I/O happens under the hood, and even containers that are ready to serve traffic may experience poor performance.
As we learned in Chapter 6, Deploying a Fargate Windows-Based Task, a Windows container image is usually composed of multiple intermediate layers, which are also large. The following figure shows a scenario that is widespread across Amazon ECS and EKS clusters hosting Windows containers: a mix of application frameworks plus sidecar containers. For this, I will use applications ABC and XYZ as samples and Fluent Bit as a log aggregator running as a sidecar container:
By default, the windows/servercore:ltsc2019 base image is already present on the ECS and EKS-optimized AMIs; however, as Figure 14.1 shows, the container host still needs to pull and extract every remaining layer that composes these three containers:
Figure 14.1 – Typical container image chain downloaded during Windows container instance initialization
It does not seem like a big deal, right? But let’s dive deep into the three biggest problems this scenario brings for Windows container startup time and container performance during the first 10-15 minutes of an EC2 Windows container host’s life.
By default, Docker and containerd pull three layers of an image simultaneously. Some of these layers are small, for instance, the ASP.NET diff layers on top of the .NET Framework runtime, whereas the .NET Framework runtime intermediate layers themselves are larger. So, if you look at the preceding diagram, the layer chain that composes applications ABC and XYZ is a mix of small and large layers.
In the following figure, the two larger layers held up the download queue: all the remaining intermediate layers were downloaded serially because the other two threads were still busy downloading the larger layers:
Figure 14.2 – Docker pull behavior during the download phase
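The queueing effect described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how Docker or containerd schedule downloads internally: the layer sizes, the fixed per-slot download rate, and the function name simulate_downloads are all hypothetical. Only the three concurrent slots come from the text.

```python
import heapq

def simulate_downloads(layer_sizes_mb, slots=3, mb_per_sec=50.0):
    """Return (start, finish, slot) per layer, assuming a fixed rate per slot."""
    # Each heap entry is (time at which the slot becomes free, slot id).
    free_at = [(0.0, slot) for slot in range(slots)]
    heapq.heapify(free_at)
    schedule = []
    for size in layer_sizes_mb:
        # The next layer in the chain goes to whichever slot frees up first.
        start, slot = heapq.heappop(free_at)
        finish = start + size / mb_per_sec
        schedule.append((start, finish, slot))
        heapq.heappush(free_at, (finish, slot))
    return schedule

# Hypothetical sizes in MB: two large runtime layers, then small diff layers.
layers = [1700, 1200, 10, 10, 10, 10, 10]
schedule = simulate_downloads(layers)
for size, (start, finish, slot) in zip(layers, schedule):
    print(f"{size:>5} MB  slot {slot}  {start:5.1f}s -> {finish:5.1f}s")
```

In this model, the two large layers occupy two slots for the whole run, so every small layer passes through the single remaining slot one after another, which matches the serial pattern in Figure 14.2.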
Let’s move on to the next problem.
The pull operation consists of downloading each layer and extracting it to disk; unlike the download, which is a multi-threaded operation, layer extraction is a serial operation. Therefore, large layers directly increase the time it takes for a container image to be fully extracted and available for use.
In the following figure, the 8185ee4ed646 layer is 1.66 GB, which takes time to extract entirely; as a result, all other layers that compose the container image stay in a pending status. A single large layer can therefore impact the overall extraction time, holding back all other layers:
Figure 14.3 – Container image extracting phase
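A rough way to see the cost of serial extraction is to compute how long each layer waits behind the ones before it. The following is a minimal sketch: the extraction rate, the small layer sizes, and the function name extraction_timeline are assumptions for illustration, and only the 1.66 GB layer (taken here as roughly 1,660 MB) comes from the figure.

```python
def extraction_timeline(layer_sizes_mb, extract_mb_per_sec=40.0):
    """Return (wait, done) per layer when layers are extracted one at a time."""
    clock = 0.0
    timeline = []
    for size in layer_sizes_mb:
        wait = clock  # time this layer spends pending behind earlier layers
        clock += size / extract_mb_per_sec
        timeline.append((wait, clock))
    return timeline

# The 1.66 GB layer followed by hypothetical small layers, all in MB.
layers = [1660, 20, 20, 20]
timeline = extraction_timeline(layers)
for size, (wait, done) in zip(layers, timeline):
    print(f"{size:>5} MB  pending {wait:5.1f}s  extracted at {done:5.1f}s")
```

Under these assumed numbers, each small layer needs only half a second of extraction work but stays pending for more than 40 seconds, because it cannot start until the large layer is done.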
Let us move on to the final problem.
Windows Server 2019 and Windows Server 2022 have Windows Defender antivirus enabled by default, providing the customer with a secure mechanism out of the box. To do so, the Defender antivirus scans all files written to the disk, including container image layers extracted during the pull operation, increasing the amount of sequential I/O on the disk subsystem and CPU consumption.
In the following figure, the Windows Defender antivirus consumes 27% of CPU during Docker extraction, thereby slowing down any extraction process:
Figure 14.4 – Windows Defender antivirus high CPU consumption during layer extraction
All three highlighted issues happen simultaneously, which explains why a Windows container takes longer to launch. We want to avoid having all these operations occur during the Windows container host initialization.
So now, let’s jump into the next section and learn how a container image cache strategy can solve these problems.
What if we could have a solution for all the problems mentioned previously so that the Windows container startup could avoid all these operations happening under the hood during the EC2 Windows container host initialization?
Here is where a container image cache strategy comes into play. All we want is to create a custom AMI that already has all these container layers downloaded and extracted before the host joins the Amazon ECS or EKS cluster. When the EC2 Windows container host joins the cluster and the container orchestrator schedules a container on it, none of the previously mentioned problems occur, since the container runtime checks whether the image is already present locally. If so, it just checks the container image metadata and runs the container.
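One way to confirm that a host launched from such an AMI really has the expected images in its local store is to ask the container runtime directly. The following is a minimal sketch, assuming the Docker CLI is available on the host; the helper name missing_images and its run parameter are hypothetical, and the script is not part of the pipeline described in this chapter.

```python
import subprocess

def missing_images(images, run=subprocess.run):
    """Return the images that `docker image inspect` cannot find locally."""
    missing = []
    for image in images:
        # `docker image inspect` exits with a non-zero status when the
        # image is not present in the local image store.
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(image)
    return missing
```

Calling missing_images(["mcr.microsoft.com/dotnet/framework/aspnet:4.8"]) on a host built from the cached AMI would be expected to return an empty list; any image it returns would still have to be pulled at container start. The run parameter exists only so that the check can be exercised without a Docker daemon.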
In Chapter 13, Working with Ephemeral Hosts, we dove deep into how EC2 Image Builder allows us to build custom AMIs, and one of the steps was to download and extract the necessary container images during the AMI creation.
The following figure is an EC2 Image Builder pipeline that builds a custom ECS/EKS Windows-optimized AMI:
Figure 14.5 – Custom AMI pipeline for ECS or EKS-optimized AMIs
Let’s focus on steps 4, 5, and 6 in the preceding architecture, where the container runtime pulls the image from Docker Hub or a private container image repository.
As mentioned in the previous chapter, we will leverage a custom EC2 Image Builder component that does the work:
name: Images-Auto-cache
description: Pull the necessary container images
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: Dockerpull
        action: ExecutePowerShell
        inputs:
          commands:
            - (Get-ECRLoginCommand).Password | docker login --username AWS --password-stdin <ACCOUNT-ID>.dkr.ecr.us-east-1.amazonaws.com
            - docker pull <ACCOUNT-ID>.dkr.ecr.us-east-1.amazonaws.com/my-dotnet-application
            - docker pull mcr.microsoft.com/dotnet/framework/aspnet:4.8
Let’s dive deep into each command line:
EC2 Image Builder components open up a lot of possibilities, and you are not limited to the component we just built. For instance, in January 2023, AWS launched a custom-managed component that applies Center for Internet Security (CIS) benchmarks for the security hardening of your AMIs.
You can check out more about hardening AMIs with EC2 Image Builder at the following link: https://aws.amazon.com/about-aws/whats-new/2023/01/ec2-image-builder-cis-benchmarks-security-hardening-amis/.
In this section, we dove deep into each command that builds this custom component, and with that, we were able to implement a container image cache strategy and output a custom Amazon ECS or EKS AMI ready to be used. As you can see, this is a simple solution, but it has a lot of benefits.
In this chapter, we learned why it is crucial to implement a container image cache strategy for Windows containers and the most common problems you will face without one; then, we dove deep into the EC2 Image Builder custom component that helps you cache container images directly into the AMI.
In the next and final chapter, we will change focus and learn what AWS tools help you containerize existing applications and run them on Amazon ECS, EKS, and AWS Fargate. See you there!