9

Deploying a Windows Node Group

In this chapter, we’ll learn how a self-managed Windows node and its components work inside and out. We will first learn about the available EKS-optimized Amazon Machine Images (AMIs) for Windows, then we will dive deep into Amazon EKS Windows nodes, followed by the available container runtimes and best practices. Finally, we will deploy a heterogeneous Amazon EKS cluster.

The chapter will cover the following topics:

  • Amazon EKS node groups
  • Amazon EKS-optimized Windows AMIs
  • Understanding the EKS Windows bootstrap
  • Working with persistent storage using CSI drivers
  • Deploying an Amazon EKS cluster with a Windows node group using Terraform

Technical requirements

In the Deploying an Amazon EKS cluster with a Windows node group using Terraform section, you will need the following expertise and technologies in place:

  • AWS CLI
  • Terraform CLI
  • IAM user with an administrator-managed policy
  • Terraform development expertise

To have access to the source code used in this chapter, access the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/eks-windows.

Important note

It is strongly recommended that you use an AWS test account to perform the activities described in this book and never run them against your production environment. This chapter uses an administrator-managed policy because Amazon EKS permissions are very granular and we want to keep things simple for the exercise.

Amazon EKS node groups

Let’s first start by understanding the different nomenclature between the Kubernetes project and Amazon EKS. In the Kubernetes world, a cluster consists of worker nodes, which are responsible for running containerized applications in the form of Pods.

In Amazon EKS, worker nodes are called Amazon EC2 nodes, and one or more Amazon EC2 nodes that are deployed into the same Amazon EC2 Auto Scaling group are called a node group.

Amazon EKS offers three node group options:

  • Managed node groups automate the provisioning and life cycle management of Amazon EC2 nodes. One of the benefits of managed node groups is that you don’t need to worry about draining nodes during node replacement, as it is handled by the AWS-managed data plane. At the time of writing, managed node groups weren’t supported for the Windows OS; however, remember that an Amazon EKS cluster with a Windows node group is a heterogeneous cluster, since at least one Linux node group is necessary to run core functionalities, such as CoreDNS.
  • Self-managed nodes mean that you are responsible for the entire Amazon EC2 node life cycle: patching, hardening, and node draining. They offer greater flexibility compared with managed node groups, at the expense of additional operational overhead. Self-managed nodes are fully compatible with the Windows OS; this is the method we will explore in this chapter.
  • AWS Fargate is not available for Windows OS on Amazon EKS. AWS Fargate is thoroughly explained in Chapter 6, Deploying a Fargate Windows-Based Task.

Amazon EKS-optimized Windows AMIs

AWS provides customers with Amazon EKS-optimized Windows AMIs, which are preconfigured with the necessary components, such as the Docker Engine, kubelet, and containerd, to run Windows containers as pods successfully.

There are four Amazon EKS-optimized Windows AMI variants:

  • Amazon EKS-optimized Windows Server 2022 Full AMI
  • Amazon EKS-optimized Windows Server 2022 Core AMI
  • Amazon EKS-optimized Windows Server 2019 Full AMI
  • Amazon EKS-optimized Windows Server 2019 Core AMI

In Chapter 4, Deploying a Windows Container Instance, we dove deep into the differences between the Full and Core variants.

The Amazon EKS-optimized AMI has the following components included:

  • kubelet is a node agent that runs on each node. Its main responsibility is to register the node with the cluster’s API server and ensure that the Pods specified in the PodSpec are running and in a healthy state.
  • kube-proxy is a network proxy that runs on each node in the cluster. On Windows, it uses an operating system packet filter called the Virtual Filtering Platform (VFP), which is managed by the Windows Host Network Service (HNS).
  • AWS IAM Authenticator provides cluster authentication through IAM credentials and/or tokens.
  • CSI proxy is a binary that exposes a set of gRPC APIs through named pipes to manage the storage life cycle through a CSI driver on Windows Pods. Unlike Linux, Windows containers don’t have the privileged permissions required to directly call the APIs exposed by the Windows host OS to mount, unmount, list, and resize volumes, which is a requirement when running CSI drivers such as the Amazon EBS CSI driver or the SMB CSI driver. The need for this workaround will eventually go away with the HostProcess container mode on Windows Server.
  • Docker is the most well-known container platform, used by millions of people. The Kubernetes community decided to deprecate the Docker runtime (dockershim) in favor of runtimes that implement the Container Runtime Interface (CRI) created for Kubernetes. EKS-optimized Windows AMIs still have the Docker runtime installed, but containerd becomes the default container runtime starting with Amazon EKS 1.24.
  • containerd originated as Docker’s container runtime and has since graduated as a Cloud Native Computing Foundation (CNCF) project. In simple terms, you don’t need the entire Docker ecosystem just to manage the container life cycle, and this is where containerd becomes a lightweight option.

Consuming an EKS-optimized Windows AMI using Terraform

When working with Terraform, one of the easiest ways to use the latest EKS-optimized Windows AMI is through data sources. Data sources allow Terraform to query data outside of Terraform and then use it as a value elsewhere in the Terraform code. The following aws_ami data source call retrieves the latest EKS-optimized Windows AMI ID:

data "aws_ami" "eks_optimized_ami" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
  }
}
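
Alternatively, AWS publishes the latest EKS-optimized Windows AMI IDs in the AWS Systems Manager (SSM) Parameter Store. A minimal sketch using the aws_ssm_parameter data source, assuming the documented parameter path for the Kubernetes 1.23, Windows Server 2019 Core variant, could look like this:

data "aws_ssm_parameter" "eks_optimized_ami_id" {
  # The parameter path assumes Kubernetes 1.23 and the Windows Server 2019 Core
  # variant; adjust both to match your cluster before using it
  name = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-EKS_Optimized-1.23/image_id"
}

The resulting AMI ID is then available as data.aws_ssm_parameter.eks_optimized_ami_id.value wherever Terraform expects an AMI ID.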

Understanding the EKS Windows bootstrap

When you launch a Windows node, the AMI includes a Start-EKSBootstrap.ps1 script that is responsible for bootstrapping the Windows node with the appropriate kubelet configuration.

There are some required and optional parameters that need to be set when bootstrapping using Terraform. First, let’s understand what the available parameters are:

  • -EKSClusterName: Specifies the Amazon EKS cluster name for this worker node to join.
  • -KubeletExtraArgs: Specifies extra arguments for kubelet (optional).
  • -KubeProxyExtraArgs: Specifies extra arguments for kube-proxy (optional).
  • -APIServerEndpoint: Specifies the Amazon EKS cluster API server endpoint (optional). Only valid when used with -Base64ClusterCA.
  • -Base64ClusterCA: Specifies the base64-encoded cluster Certificate Authority (CA) content (optional). Only valid when used with -APIServerEndpoint.
  • -DNSClusterIP: Overrides the IP address to use for Domain Name System (DNS) queries within the cluster (optional). Defaults to 10.100.0.10 or 172.20.0.10, based on the IP address of the primary interface.
  • -ContainerRuntime: Specifies the container runtime to be used on the node (optional). Starting with Amazon EKS 1.24, containerd is the default container runtime.

In Terraform, we can bootstrap these Windows nodes using a launch template by specifying the following user data inside the aws_launch_template resource:

user_data = "${base64encode(<<EOF
<powershell>
[string]$EKSBinDir = "$env:ProgramFiles\Amazon\EKS"
[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1"
& $EKSBootstrapScriptFile -EKSClusterName "${aws_eks_cluster.eks_windows.name}" -APIServerEndpoint "${aws_eks_cluster.eks_windows.endpoint}" -Base64ClusterCA "${data.aws_eks_cluster.eks_windows_cluster_data.certificate_authority[0].data}" 3>&1 4>&1 5>&1 6>&1
</powershell>
EOF
  )}"

As you can see, -EKSClusterName, -APIServerEndpoint, and -Base64ClusterCA are dynamically populated with the values generated by Terraform during cluster deployment, which we will dive deep into at the end of this chapter.
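
The optional parameters are passed in the same way. For example, a hypothetical variation of the bootstrap line inside the same user data could pin the container runtime and taint the node so that only Windows workloads are scheduled on it (the taint key and value here are illustrative choices, not values required by the script):

& $EKSBootstrapScriptFile -EKSClusterName "${aws_eks_cluster.eks_windows.name}" -ContainerRuntime "containerd" -KubeletExtraArgs '--register-with-taints=os=windows:NoSchedule' 3>&1 4>&1 5>&1 6>&1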

Working with persistent storage using CSI drivers

In Chapter 5, Deploying an EC2 Windows-Based Task, in the Setting up persistent storage section, we discussed the persistent storage use cases for Windows containers using Amazon FSx for Windows File Server and Amazon EBS. The use case remains the same; the only difference is how each container orchestrator manages the storage life cycle.

Kubernetes uses the Container Storage Interface (CSI), a standard for exposing arbitrary block and file storage systems to containerized workloads. There are two open source CSI drivers with Windows support:

  • Amazon EBS CSI driver: An open source project developed by Amazon Web Services (AWS) that allows Amazon EKS clusters to manage the life cycle of Amazon EBS volumes for persistent volumes
  • SMB CSI driver for Kubernetes: Developed by the open source community, it allows Kubernetes to manage the life cycle of Server Message Block (SMB) shares for persistent volumes

In this section, I will solely focus on the most common use case, a combination of SMB CSI driver for Kubernetes and Amazon FSx for Windows File Server. First, let’s understand how the CSI driver works on Windows nodes, then we will dive deep into StatefulSets and Deployments as a way to consume the persistent storage.

SMB CSI driver high-level overview

One of the prerequisites to using CSI drivers on Windows is to have the CSI proxy installed on the Windows node, and as mentioned previously, the EKS-optimized Windows AMI already has this component installed.
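
For reference, the SMB CSI driver itself can be installed from its upstream Helm chart. The following is a minimal sketch using the Terraform helm_release resource; the repository URL comes from the upstream csi-driver-smb project, and the windows.enabled value is an assumption that should be verified against the chart’s documentation:

resource "helm_release" "csi_driver_smb" {
  name       = "csi-driver-smb"
  repository = "https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts"
  chart      = "csi-driver-smb"
  namespace  = "kube-system"
  # Assumed chart value that enables the csi-smb-node-win DaemonSet on Windows nodes
  set {
    name  = "windows.enabled"
    value = "true"
  }
}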

Once the SMB CSI driver is installed, a new controller pod is scheduled in the kube-system namespace, and then the following steps are performed to mount a directory within the SMB share into a Pod using the CSI driver:

  1. The controller pod is scheduled on a Linux node and starts listening for connections on unix:///csi/csi.sock.
  2. The csi-smb-node-win pod is scheduled on the Windows node and is responsible for installing the smb.csi.k8s.io driver and its mount points, retrieving credentials, and fetching updates from the csi-smb-controller pod.
  3. Once a StatefulSet or Deployment schedules a Windows pod with a mount point, the csi-smb-node-win pod receives this request from the controller pod and retrieves the credentials from the Kubernetes Secret.
  4. The csi-smb-node-win pod asks the CSI proxy to map and mount an SMB share using the SMBGlobalMapping feature, with the credentials retrieved from the Kubernetes Secret.
  5. The share is made available through a PersistentVolumeClaim (PVC).
  6. The PVC is attached to the pod.

Figure 9.1 – SMB CSI driver workflow

The workflow is illustrated in the preceding diagram.

Managing persistent volumes on Kubernetes

In addition to the CSI driver, there are three API resources available on Kubernetes to handle persistent volumes: PersistentVolume (PV), PersistentVolumeClaim (PVC), and StorageClass:

  • StorageClass defines storage in your cluster – type of storage, permissions, and so on.
  • PV reserves a specific amount of storage from a StorageClass. It defines which CSI driver will be used and options such as the reclaim policy, as well as which Secret is used when accessing the storage defined in the StorageClass.
  • PVC allocates a portion of the storage available in the PV, which is then mounted into the pod (see the sketch after this list).
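
As a minimal sketch with the Terraform Kubernetes provider, the StorageClass and PVC can be declared as follows; the SMB share path, Secret name, and storage size are illustrative assumptions, not values from this chapter’s repository, and the PV is created dynamically from the StorageClass when the PVC is submitted:

resource "kubernetes_storage_class" "smb" {
  metadata {
    name = "smb-fsx"
  }
  storage_provisioner = "smb.csi.k8s.io"
  reclaim_policy      = "Retain"
  parameters = {
    # UNC path of the Amazon FSx for Windows File Server share (illustrative)
    source = "//amznfsxexample.corp.example.com/share"
    # Secret holding the username/password used to mount the share
    "csi.storage.k8s.io/provisioner-secret-name"      = "smb-creds"
    "csi.storage.k8s.io/provisioner-secret-namespace" = "default"
    "csi.storage.k8s.io/node-stage-secret-name"       = "smb-creds"
    "csi.storage.k8s.io/node-stage-secret-namespace"  = "default"
  }
}
resource "kubernetes_persistent_volume_claim" "smb" {
  metadata {
    name = "smb-pvc"
  }
  spec {
    access_modes       = ["ReadWriteMany"]
    storage_class_name = kubernetes_storage_class.smb.metadata[0].name
    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}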

The following figure shows how these three components are related to each other to provide persistent storage to a pod; each component is decoupled, which makes it easier to scale and add new storage types to your cluster:

Figure 9.2 – Relationship between StorageClass, PV, and PVC

So far, we have covered the lowest-level component, the CSI driver, and the Kubernetes APIs that control persistent storage; now, let’s talk about the most common options for consuming PVCs:

  • StatefulSets are good for applications that require persistent storage but aren’t meant to share this storage with other Pods or Deployments.
  • Drawing a parallel with the VM and storage worlds, think of a Storage Area Network (SAN) Logical Unit Number (LUN) as the PV, which is mapped to a specific host bus adapter World Wide Name (HBA WWN), which in this case would be the PVC.
  • Deployments are the most common way to deploy an application on Kubernetes clusters. You can use Deployments to scale to thousands of Pods but keep the same PVC between them, resulting in shared storage. This scenario is common when you have multiple Pods that need to read/write from the same location, and data must be persistent and shareable between them.

Both options are valid and will address your use case; keep in mind that StatefulSets create a new PVC per Pod within the StatefulSet, while with Deployments, the same PVC is shared among the Pods within the Deployment, as the sketch below illustrates.
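
To illustrate the shared-storage case, here is a hedged sketch of a Deployment that mounts the PVC from the previous sketch into two Windows pods; the container image, labels, and mount path are illustrative assumptions:

resource "kubernetes_deployment" "smb_app" {
  metadata {
    name = "smb-app"
  }
  spec {
    replicas = 2
    selector {
      match_labels = {
        app = "smb-app"
      }
    }
    template {
      metadata {
        labels = {
          app = "smb-app"
        }
      }
      spec {
        # Ensure the pods land on Windows nodes
        node_selector = {
          "kubernetes.io/os" = "windows"
        }
        container {
          name  = "web"
          image = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
          volume_mount {
            name       = "smb-volume"
            mount_path = "C:\\data"
          }
        }
        # Both replicas reference the same PVC, so the storage is shared
        volume {
          name = "smb-volume"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.smb.metadata[0].name
          }
        }
      }
    }
  }
}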

In this section, we dove deep into how persistent storage works on Kubernetes through an SMB CSI driver, and how it can be consumed by stateful applications that still require local directories to save data.

Deploying an Amazon EKS cluster with a Windows node group using Terraform

In Chapter 7, Amazon EKS – Overview, we learned about the fundamentals of an Amazon EKS cluster, then in Chapter 8, Preparing the Cluster for OS Interoperability, we learned how to prepare a heterogeneous Amazon EKS cluster with Windows nodes, and we explored core infrastructure concepts of Windows nodes in this chapter.

Now, we will dive deep into how to deploy a heterogeneous Amazon EKS cluster with a Windows node group using Terraform.

Important note

You will see code snippets for the remaining part of this chapter. The full Terraform code for this chapter can be found at https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/eks-windows.

Creating security groups

We will first create the security groups and rules that allow inbound/outbound traffic between the Amazon EKS cluster control plane and the Windows nodes:

Name          Source        Protocol    Port
cluster_sg    windows_sg    TCP         Any
windows_sg    cluster_sg    TCP         Any

Table 9.1 – Security group for external and internal traffic

In main.tf, between lines 118 and 134, you will find the rules that are being created on the security groups:

resource "aws_security_group_rule" "rule_worker_windows" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
  protocol                 = -1
  source_security_group_id = aws_security_group.cluster_sg.id
  security_group_id        = aws_security_group.windows_sg.id
}
resource "aws_security_group_rule" "rule_control_plane" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
...

Creating an OpenID Connect endpoint and IAM roles for the cluster

The next step is to create an OpenID Connect (OIDC) endpoint within the cluster to support IAM roles for service accounts (IRSA):

resource "aws_iam_openid_connect_provider" "eks_iam_openid" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks_windows_cluster_tls.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks_windows.identity[0].oidc[0].issuer
}

With the OIDC configuration in place, we will now ensure that the VPC CNI has the necessary IAM role, with an assume role policy and an IAM policy attached. We will use AmazonEKS_CNI_Policy, a managed policy that is specifically used by the Amazon VPC CNI plugin:

### EKS VPC CNI Role
resource "aws_iam_role" "eks_vpc_cni_role" {
  assume_role_policy = data.aws_iam_policy_document.eks_windows_assume_role_policy.json
  name               = "eks-vpc-cni-role"
}
resource "aws_iam_role_policy_attachment" "eks_iam_role_attach_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_vpc_cni_role.name
}
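
The assume role policy document referenced by the role above is not shown in the snippet. A minimal IRSA-style sketch, assuming the VPC CNI runs under the aws-node service account in the kube-system namespace, could look like this:

data "aws_iam_policy_document" "eks_windows_assume_role_policy" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks_iam_openid.arn]
    }
    condition {
      test     = "StringEquals"
      # Scope the trust to the service account used by the VPC CNI plugin
      variable = "${replace(aws_eks_cluster.eks_windows.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:aws-node"]
    }
  }
}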

Finally, we will create an EKS cluster service role and attach two managed policies to the role:

  • AmazonEKSClusterPolicy grants permission for an Amazon EKS cluster to call other AWS services, such as autoscaling, ec2, elasticloadbalancing, iam, and kms
  • AmazonEKSVPCResourceController grants the cluster role permissions to manage Elastic Network Interfaces (ENIs) and IP addresses for nodes

This is how we go about it:

resource "aws_iam_role" "eks_iam_role_cluster_service" {
  name = "eks-cluster-service-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
...
resource "aws_iam_role_policy_attachment" "eks_iam_role_cluster_service_attach" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  ])
...

Creating instance roles for Windows and Linux node groups

As you can see, Amazon EKS uses two different sets of roles: first, the roles that are necessary for the control plane; second, the instance roles, which we will create next. The instance roles are used by the Windows and Linux Amazon EC2 nodes to join the cluster and perform activities on it. The following managed policies are attached to them:

  • AmazonEKSWorkerNodePolicy grants Amazon EC2 nodes permission to connect to Amazon EKS clusters.
  • AmazonEC2ContainerRegistryReadOnly grants read-only permissions to list images and repositories, as well as pull images from Amazon ECR.
  • AmazonEKS_CNI_Policy is specifically used by the VPC CNI plugin, which has permission to assign, create, attach, delete, describe, and detach ENIs on Amazon EC2 nodes.
  • AmazonSSMManagedInstanceCore is completely optional and grants instances the permissions needed for core AWS Systems Manager functionality, such as Session Manager, Run Command, and so on:

resource "aws_iam_role" "eks_node_group_role_windows" {
  name = "eks-node-group-windows-role"
  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
...
resource "aws_iam_role_policy_attachment" "eks_windows_node_group_role_attach" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
...

Enabling VPC CNI Windows support

To support Windows pods with the VPC CNI, the vpc-resource-controller and vpc-admission-webhook components run on the Amazon EKS control plane; we need to enable them by setting enable-windows-ipam to true in the amazon-vpc-cni ConfigMap:

resource "kubernetes_config_map" "amazon_vpc_cni_windows" {
  depends_on = [
    aws_eks_cluster.eks_windows
  ]
  metadata {
    name      = "amazon-vpc-cni"
    namespace = "kube-system"
  }
  data = {
    enable-windows-ipam : "true"
  }
}

Using a ConfigMap to add Kubernetes permissions (RBAC) at the node level

Components such as kubelet and kube-proxy need to communicate with the Kubernetes API server. The following ConfigMap (aws-auth) sets which permissions a Windows or Linux Amazon EC2 node has against the Amazon EKS cluster:

resource "kubernetes_config_map" "configmap" {
  data = {
    "mapRoles" = <<EOT
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::${data.aws_caller_identity.account_id.account_id}:role/eks-node-group-linux-role
  username: system:node:{{EC2PrivateDNSName}}
- groups:
  - eks:kube-proxy-windows
  - system:bootstrappers
  - system:nodes
...

Creating a launch template to launch and bootstrap Windows and Linux Amazon EC2 nodes

Here is where we add the EKS bootstrap covered in the Understanding the EKS Windows bootstrap section. We encapsulate the bootstrap script as a base64-encoded string so that we can manipulate the values with Terraform expressions:

resource "aws_launch_template" "eks_windows_nodegroup_lt" {
  name                   = "eks_windows_nodegroup_lt"
  vpc_security_group_ids = [aws_security_group.windows_sg.id, aws_security_group.cluster_sg.id]
  image_id               = data.aws_ami.eks_optimized_ami.id
  instance_type          = "t3.large"
  user_data = "${base64encode(<<EOF
<powershell>
[string]$EKSBinDir = "$env:ProgramFiles\Amazon\EKS"
[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1"
& $EKSBootstrapScriptFile -EKSClusterName "${aws_eks_cluster.eks_windows.name}" -APIServerEndpoint "${aws_eks_cluster.eks_windows.endpoint}" -Base64ClusterCA "${data.aws_eks_cluster.eks_windows_cluster_data.certificate_authority[0].data}" 3>&1 4>&1 5>&1 6>&1
</powershell>
EOF
...

Creating an Auto Scaling group

Last but not least, we will need to create an Auto Scaling group and hook up the launch template we created earlier:

resource "aws_autoscaling_group" "eks-windows-nodegroup-asg" {
  name             = "Windows_worker_nodes_asg"
  desired_capacity = 1
  max_size         = 5
  min_size         = 1
  #target_group_arns = [var.external_alb_target_group_arn]
  launch_template {
    id      = aws_launch_template.eks_windows_nodegroup_lt.id
    version = "$Latest"
...

At this point, we have a heterogeneous EKS cluster with a Windows and Linux node group and the proper IAM roles, instance roles, and permissions. The deployment we just did looks like the following figure:

Figure 9.3 – Heterogeneous Amazon EKS cluster with Windows nodes

Congrats! You just deployed an Amazon EKS cluster with a Windows node group using Terraform. I know that it may sound daunting at first, but once you understand the moving parts needed to deploy the EKS cluster, it will become easier.
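
If you want to confirm that both the Linux and Windows nodes joined the cluster, assuming the AWS CLI and kubectl are configured for the account and Region used by the Terraform code, you can run the following commands (the cluster name and Region are placeholders):

aws eks update-kubeconfig --name <cluster-name> --region <region>
kubectl get nodes -o wide --label-columns=kubernetes.io/os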

Summary

In this chapter, we started by learning about Amazon EKS node groups, the EKS-optimized Windows AMIs, and their components; then, we learned about the EKS Windows bootstrap and its parameters. Next, we dove deep into how the SMB CSI driver works on Amazon EC2 Windows nodes with Amazon FSx for Windows File Server. Finally, we reviewed the code snippets for deploying a heterogeneous Amazon EKS cluster with a Windows node group.

The next chapter will discuss how to schedule a Windows pod, its best practices, and how to integrate the Windows pods with Active Directory using group managed service accounts (gMSA).
