In this chapter, we’ll learn how a self-managed Windows node and its components work inside and out. We will first learn about the available EKS-optimized Amazon Machine Images (AMIs) for Windows, then we will dive deep into Amazon EKS Windows nodes, followed by available container runtimes and best practices. Finally, we will deploy a heterogeneous Amazon EKS cluster.
The chapter will cover the following topics:
In the Deploying an Amazon EKS cluster with a Windows node group using Terraform section, you will need to have the following expertise as well as technologies installed:
To have access to the source code used in this chapter, access the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/eks-windows.
Important note
It is strongly recommended that you use an AWS test account to perform the activities described in this book and never run them against your production environment. This chapter uses an administrator-managed policy due to the simple fact that Amazon EKS permissions are very granular and we want to keep things simple for the exercise.
Let’s first start with understanding the different nomenclatures between the Kubernetes project and Amazon EKS. In the Kubernetes world, a cluster consists of worker Nodes, which are responsible for running containerized applications in the form of a Pod.
In Amazon EKS, worker nodes are called Amazon EC2 nodes, and one or more Amazon EC2 nodes that are deployed into the same Amazon EC2 Auto Scaling group are called a node group.
Amazon EKS offers three node group options:
AWS provides customers with Amazon EKS-optimized Windows AMIs, which are preconfigured with the necessary components, such as the Docker Engine, kubelet, and containerd, to run Windows containers as pods successfully.
There are four Amazon EKS-optimized Windows AMI variants:
In Chapter 4, Deploying a Windows Container Instance, we dove deep into the differences between the Full and Core variants.
The Amazon EKS-optimized AMI has the following components included:
When working with Terraform, one of the easiest ways to use the latest EKS-optimized Windows AMI is through data sources. Data sources allow Terraform to query data outside of Terraform and then output it as a value elsewhere in the Terraform code. The following is an example data source that retrieves the latest EKS-optimized Windows AMI:
data "aws_ami" "eks_optimized_ami" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Core-EKS_Optimized-*"]
  }
}
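Alternatively, AWS publishes the ID of the latest EKS-optimized Windows AMI as a public SSM parameter, which can be queried with the aws_ssm_parameter data source. The following is a minimal sketch; the Kubernetes version in the parameter path (1.23 here) is an example and should match your cluster version:

```hcl
# Query the public SSM parameter that AWS maintains for each
# EKS-optimized Windows AMI variant; adjust the Kubernetes version
# (1.23 here) in the path to match your cluster
data "aws_ssm_parameter" "eks_windows_ami" {
  name = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-EKS_Optimized-1.23/image_id"
}
```

The AMI ID is then available elsewhere in the configuration as data.aws_ssm_parameter.eks_windows_ami.value.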
When you launch a Windows node, there is a Start-EKSBootstrap.ps1 script on the AMI that is responsible for bootstrapping the Windows node with the appropriate kubelet configuration.
There are some required and optional parameters that need to be set when bootstrapping using Terraform. First, let’s understand what the available parameters are:
In Terraform, we can bootstrap these Windows nodes using a launch template by specifying the following user data inside the aws_launch_template resource:
user_data = base64encode(<<EOF
<powershell>
[string]$EKSBinDir = "$env:ProgramFiles\Amazon\EKS"
[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1"
& $EKSBootstrapScriptFile -EKSClusterName "${aws_eks_cluster.eks_windows.name}" -APIServerEndpoint "${aws_eks_cluster.eks_windows.endpoint}" -Base64ClusterCA "${data.aws_eks_cluster.eks_windows_cluster_data.certificate_authority[0].data}" 3>&1 4>&1 5>&1 6>&1
</powershell>
EOF
)
As you can see, -EKSClusterName, -APIServerEndpoint, and -Base64ClusterCA are dynamically filled in with the values generated by Terraform during cluster deployment, which we will dive deep into at the end of this chapter.
In Chapter 5, Deploying an EC2 Windows-Based Task, in the Setting up persistent storage section, we discussed the persistent storage use cases for Windows containers using Amazon FSx for Windows File Server and Amazon EBS. The use case remains the same; the only difference is how each container orchestrator manages the storage life cycle.
Kubernetes uses the Container Storage Interface (CSI), a standard for exposing arbitrary block and file storage systems to containerized workloads. There are two open source CSI drivers with Windows supportability:
In this section, I will solely focus on the most common use case, a combination of SMB CSI driver for Kubernetes and Amazon FSx for Windows File Server. First, let’s understand how the CSI driver works on Windows nodes, then we will dive deep into StatefulSets and Deployments as a way to consume the persistent storage.
One of the prerequisites to using CSI drivers on Windows is to have CSI Proxy installed on the Windows node, and as mentioned previously, the EKS-optimized Windows AMI already has this component installed.
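Installing the SMB CSI driver itself can also be automated. The following is a hedged sketch using the Terraform Helm provider; the chart repository URL and the windows.enabled value key are taken from the upstream csi-driver-smb project and should be verified against the driver version you deploy:

```hcl
# Hypothetical install of the SMB CSI driver via the Helm provider;
# verify the chart repository and values against the driver's docs
resource "helm_release" "csi_driver_smb" {
  name       = "csi-driver-smb"
  repository = "https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts"
  chart      = "csi-driver-smb"
  namespace  = "kube-system"

  # Enable the Windows node DaemonSet so volumes can be mounted on
  # Windows nodes through CSI Proxy
  set {
    name  = "windows.enabled"
    value = "true"
  }
}
```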
Once the SMB CSI driver is installed, a new controller pod is scheduled in the kube-system namespace, and then the following steps are performed to mount a directory within the SMB share into a Pod using the CSI driver:
Figure 9.1 – SMB CSI driver workflow
The workflow is illustrated in the preceding diagram.
In addition to the CSI driver, there are three API resources available on Kubernetes to handle persistent volumes: PersistentVolume (PV), PersistentVolumeClaim (PVC), and StorageClass:
The following figure shows how these three components are related to each other to provide persistent storage to a pod; each component is decoupled, which makes it easier to scale and add new storage types to your cluster:
Figure 9.2 – Relationship between StorageClass, PV, and PVC
So far, we have covered the bottom component, the CSI driver, and the Kubernetes APIs that control persistent storage on Kubernetes; now, let’s talk about the most common options for consuming PVCs:
Both options are valid and will address your use case. Keep in mind that a StatefulSet creates a new PVC per Pod within the StatefulSet, whereas with a Deployment, the same PVC is shared among all Pods within the Deployment.
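To make this concrete, the following is a hedged sketch of a StorageClass and PVC for an SMB share, expressed with the Terraform Kubernetes provider used elsewhere in this chapter. The FSx DNS name, share path, and credentials secret name are hypothetical placeholders for your Amazon FSx for Windows File Server details:

```hcl
# Hypothetical StorageClass backed by the SMB CSI driver; the FSx DNS
# name, share, and credentials secret below are placeholders
resource "kubernetes_storage_class" "smb" {
  metadata {
    name = "smb-fsx"
  }
  storage_provisioner = "smb.csi.k8s.io"
  parameters = {
    source                                            = "//fsx-dns-name.example.com/share"
    "csi.storage.k8s.io/provisioner-secret-name"      = "smb-creds"
    "csi.storage.k8s.io/provisioner-secret-namespace" = "default"
  }
}

# PVC that dynamically provisions a volume from the StorageClass above;
# ReadWriteMany fits the shared-SMB use case described in this section
resource "kubernetes_persistent_volume_claim" "smb_pvc" {
  metadata {
    name = "smb-pvc"
  }
  spec {
    access_modes       = ["ReadWriteMany"]
    storage_class_name = kubernetes_storage_class.smb.metadata[0].name
    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}
```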
In this section, we dove deep into how persistent storage works on Kubernetes through an SMB CSI driver, and how it can be consumed by stateful applications that still require local directories to save data.
In Chapter 7, Amazon EKS – Overview, we learned about the fundamentals of an Amazon EKS cluster, then in Chapter 8, Preparing the Cluster for OS Interoperability, we learned how to prepare a heterogeneous Amazon EKS cluster with Windows nodes, and we explored core infrastructure concepts of Windows nodes in this chapter.
Now, we will be diving deep into how to deploy a heterogeneous Amazon EKS cluster with a Windows node group using Terraform.
Important note
You will see code snippets for the remaining part of this chapter. The full Terraform code for this chapter can be found at https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/eks-windows.
We will first create the security groups that allow inbound/outbound traffic between the Amazon EKS cluster control plane and the Windows nodes:
Name         Source       Protocol   Port
cluster_sg   windows_sg   TCP        Any
windows_sg   cluster_sg   TCP        Any
Table 9.1 – Security group for external and internal traffic
In main.tf, between lines 118 and 134, you will find the rules that are being created on the security groups:
resource "aws_security_group_rule" "rule_worker_windows" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  source_security_group_id = aws_security_group.cluster_sg.id
  security_group_id        = aws_security_group.windows_sg.id
}

resource "aws_security_group_rule" "rule_control_plane" {
  type      = "ingress"
  from_port = 0
  to_port   = 0
  ...
The next step is to create an OpenID Connect (OIDC) endpoint within the cluster to support IAM roles for service accounts (IRSA):
resource "aws_iam_openid_connect_provider" "eks_iam_openid" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks_windows_cluster_tls.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks_windows.identity[0].oidc[0].issuer
}
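The thumbprint used above comes from a tls_certificate data source that reads the certificate chain of the cluster's OIDC issuer. A minimal sketch of that data source, matching the reference in the resource:

```hcl
# Read the TLS certificate chain of the cluster's OIDC issuer so its
# SHA-1 fingerprint can be used as the OIDC provider thumbprint
data "tls_certificate" "eks_windows_cluster_tls" {
  url = aws_eks_cluster.eks_windows.identity[0].oidc[0].issuer
}
```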
With the OIDC configuration in place, we will now create an IAM role for the VPC CNI plugin, with an assume role policy and an attached IAM policy. We will use AmazonEKS_CNI_Policy, a managed policy that is specifically used by the Amazon VPC CNI plugin:
### EKS VPC CNI Role
resource "aws_iam_role" "eks_vpc_cni_role" {
  assume_role_policy = data.aws_iam_policy_document.eks_windows_assume_role_policy.json
  name               = "eks-vpc-cni-role"
}

resource "aws_iam_role_policy_attachment" "eks_iam_role_attach_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_vpc_cni_role.name
}
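The assume role policy document referenced by the role binds it to a specific Kubernetes service account through the OIDC provider. The following is a hedged sketch, assuming the VPC CNI runs as the aws-node service account in the kube-system namespace:

```hcl
# IRSA trust policy sketch: only the aws-node service account in
# kube-system (via the cluster's OIDC provider) may assume the CNI role
data "aws_iam_policy_document" "eks_windows_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks_iam_openid.arn]
    }

    # Scope the trust to the service account's "sub" claim on the
    # cluster's OIDC issuer (stripped of the https:// prefix)
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_eks_cluster.eks_windows.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:aws-node"]
    }
  }
}
```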
Finally, we will create an EKS cluster service role and attach two managed policies to the role:
resource "aws_iam_role" "eks_iam_role_cluster_service" {
  name = "eks-cluster-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        ...

resource "aws_iam_role_policy_attachment" "eks_iam_role_cluster_service_attach" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  ])
  ...
As you can see, Amazon EKS uses two different sets of roles: first, the roles necessary for the control plane; second, the instance roles, which we will create next. The instance roles are used by Windows and Linux Amazon EC2 nodes to join the cluster and perform activities on it.
AmazonEKSWorkerNodePolicy grants Amazon EC2 nodes permission to connect to Amazon EKS clusters.
AmazonEC2ContainerRegistryReadOnly grants read-only permissions to list images and repositories as well as pull images from Amazon ECR.
AmazonEKS_CNI_Policy is specifically used by the VPC CNI plugin, which has permission to assign, create, attach, delete, describe, and detach ENIs on Amazon EC2 nodes.
AmazonSSMManagedInstanceCore is completely optional and grants instances the permissions needed for core AWS Systems Manager functionality, such as Session Manager, Run Command, and so on:
resource "aws_iam_role" "eks_node_group_role_windows" {
  name = "eks-node-group-windows-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        ...

resource "aws_iam_role_policy_attachment" "eks_windows_node_group_role_attach" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    ...
In order to support Windows with the VPC CNI, the vpc-resource-controller and vpc-admission-webhook components need to run in the control plane, so we need to enable the feature by setting enable-windows-ipam: "true" in the amazon-vpc-cni ConfigMap on Amazon EKS:
resource "kubernetes_config_map" "amazon_vpc_cni_windows" {
  depends_on = [
    aws_eks_cluster.eks_windows
  ]

  metadata {
    name      = "amazon-vpc-cni"
    namespace = "kube-system"
  }

  data = {
    "enable-windows-ipam" = "true"
  }
}
Components such as kubelet and kube-proxy need to communicate with kube-api. Here, we define which permissions Windows and Linux Amazon EC2 nodes have against the Amazon EKS cluster:
resource "kubernetes_config_map" "configmap" {
  data = {
    "mapRoles" = <<EOT
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::${data.aws_caller_identity.account_id.account_id}:role/eks-node-group-linux-role
  username: system:node:{{EC2PrivateDNSName}}
- groups:
  - eks:kube-proxy-windows
  - system:bootstrappers
  - system:nodes
  ...
Here is where we add the EKS bootstrap covered in the Understanding the EKS Windows bootstrap section. We encapsulate the bootstrap script as a base64-encoded string in order to be able to manipulate the values with Terraform expressions:
resource "aws_launch_template" "eks_windows_nodegroup_lt" {
  name                   = "eks_windows_nodegroup_lt"
  vpc_security_group_ids = [aws_security_group.windows_sg.id, aws_security_group.cluster_sg.id]
  image_id               = data.aws_ami.eks_optimized_ami.id
  instance_type          = "t3.large"

  user_data = base64encode(<<EOF
<powershell>
[string]$EKSBinDir = "$env:ProgramFiles\Amazon\EKS"
[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1"
& $EKSBootstrapScriptFile -EKSClusterName "${aws_eks_cluster.eks_windows.name}" -APIServerEndpoint "${aws_eks_cluster.eks_windows.endpoint}" -Base64ClusterCA "${data.aws_eks_cluster.eks_windows_cluster_data.certificate_authority[0].data}" 3>&1 4>&1 5>&1 6>&1
</powershell>
EOF
  ...
Last but not least, we will need to create an Auto Scaling group and hook up the launch template we created earlier:
resource "aws_autoscaling_group" "eks-windows-nodegroup-asg" {
  name             = "Windows_worker_nodes_asg"
  desired_capacity = 1
  max_size         = 5
  min_size         = 1
  #target_group_arns = [var.external_alb_target_group_arn]

  launch_template {
    id      = aws_launch_template.eks_windows_nodegroup_lt.id
    version = "$Latest"
  ...
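One detail worth calling out: self-managed node groups must tag their Auto Scaling group, and the instances it launches, with the cluster ownership tag so the nodes are recognized as part of the cluster. A sketch of the tag block inside the aws_autoscaling_group resource:

```hcl
  # Required ownership tag for self-managed EKS nodes; propagate it to
  # every instance the Auto Scaling group launches
  tag {
    key                 = "kubernetes.io/cluster/${aws_eks_cluster.eks_windows.name}"
    value               = "owned"
    propagate_at_launch = true
  }
```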
At this point, we have a heterogeneous EKS cluster with a Windows and Linux node group and the proper IAM roles, instance roles, and permissions. The deployment we just did looks like the following figure:
Figure 9.3 – Heterogeneous Amazon EKS cluster with Windows nodes
Congrats! You just deployed an Amazon EKS cluster with a Windows node group using Terraform. I know that it may sound daunting at first, but once you understand the moving parts needed to deploy the EKS cluster, it will become easier.
In this chapter, we started by learning about Amazon EKS-optimized Windows AMIs and their components; then, we learned about the EKS Windows bootstrap and its parameters. Next, we dove deep into how the SMB CSI driver works on Amazon EC2 Windows nodes with Amazon FSx for Windows File Server. Finally, we reviewed the code snippets for deploying a heterogeneous Amazon EKS cluster with a Windows node group.
The next chapter will discuss how to schedule a Windows pod, its best practices, and how to integrate the Windows pods with Active Directory using group managed service accounts (gMSA).