Previously, we covered a swath of networking fundamentals and how traffic in Kubernetes gets from A to B. In this chapter, we will discuss networking abstractions in Kubernetes, primarily service discovery and load balancing. Most notably, this is the chapter on Services and Ingresses. Both resources are notoriously complex, due to the large number of options they offer in attempting to solve numerous use cases. They are the most visible part of the Kubernetes network stack, as they define the basic network characteristics of workloads on Kubernetes. This is where developers interact with the networking stack for their applications deployed on Kubernetes.
This chapter will cover fundamental examples of Kubernetes networking abstractions and details on how they work. To follow along, you will need the following tools:
Docker
Kind
Linkerd
You will need to be familiar with the kubectl exec and docker exec commands. If you are not, our code repo will have any and all the commands we discuss, so don't worry too much. We will also make use of ip and netns from Chapters 2 and 3. Note that most of these tools are for debugging and showing implementation details; you would not necessarily need them during normal operations.
kubectl will be a key tool in this chapter's examples, and it is the standard for operators to interact with clusters and their networks. You should be familiar with the kubectl create, apply, get, delete, and exec commands. You can read more at kubernetes.io/docs/reference/generated/kubectl/kubectl-commands, or by running kubectl [command] --help.
Docker, Kind, and Linkerd installs are available on their respective sites, and we’ve provided more information in the book’s code repository as well.
This chapter will explore these Kubernetes Networking Abstractions:
StatefulSets
Endpoints
Endpoint Slices
Services
NodePort
ClusterIP
Headless
ExternalName
LoadBalancer
Ingress
Ingress Controller
Ingress rules
Service Meshes
Linkerd
In order to explore these abstractions, we will deploy them to our Kubernetes cluster with the following steps:
Deploy Kind Cluster with ingress enabled
Explore StatefulSets
Deploy Kubernetes Services
Deploy an Ingress Controller
Deploy Linkerd Service Mesh
These abstractions are at the heart of what the Kubernetes API provides to developers and administrators to programmatically control the flow of communications into and out of the cluster. Mastering and deploying these abstractions is crucial for the success of any workload inside a cluster. After working through these examples, you will understand which abstractions to use in which situations for your applications.
With the kind cluster configuration YAML, we can use kind to create that cluster with the command below. If this is the first time running it, it will take some time to download all the worker and control plane Docker images.
The following examples assume that you still have the local kind cluster running from the previous chapter, along with the golang web server and the dnsutils images for testing.
StatefulSets are a workload abstraction in Kubernetes for managing pods much like a deployment. Unlike a deployment, StatefulSets add the following features for applications that require them:
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
The Deployment resource is better suited for applications that do not have these requirements (for example, a service that stores data in an external database).
Our database for the Golang minimal web server uses a StatefulSet. The database has a Service, a ConfigMap for the Postgres username, password, and test database name, and a StatefulSet for the containers running Postgres.
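We do not reproduce database.yaml here; a minimal sketch of what it could contain is below. The object names match what kubectl reports in this section, but the Postgres image tag, port, and ConfigMap wiring are assumptions for illustration.

```yaml
# Hypothetical sketch of database.yaml; image and ConfigMap contents are assumed.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None          # headless: DNS returns the pod IPs instead of a virtual IP
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres    # ties the pods' DNS identity to the headless service
  replicas: 2
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13           # assumed image tag
        ports:
        - containerPort: 5432
        envFrom:
        - configMapRef:
            name: postgres-config    # username, password, and test database name
```

The headless Service (clusterIP: None) is what gives each pod the stable per-pod DNS name we query later in this section.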
Let us deploy it now.
kubectl apply -f database.yaml
service/postgres created
configmap/postgres-config created
statefulset.apps/postgres created
Let us examine the DNS and network ramifications of using a StatefulSet.
To test DNS inside the cluster we can use the dnsutils image; this image is gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 and is used for Kubernetes testing.
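The dnsutils.yaml manifest is not shown in the text; a plausible minimal version, assuming only the image named above, might be:

```yaml
# Hypothetical sketch of dnsutils.yaml; only the image name comes from the text.
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command: ["sleep", "infinity"]   # keep the pod running so we can kubectl exec into it
```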
kubectl apply -f dnsutils.yaml
pod/dnsutils created
kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
dnsutils   1/1     Running   0          9s
With replicas configured to two pods, we see the StatefulSet deploy postgres-0 and then postgres-1, in that order, a feature of StatefulSets, with IP addresses 10.244.1.3 and 10.244.2.3 respectively.
kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP           NODE
dnsutils     1/1     Running   0          15m   10.244.3.2   kind-worker3
postgres-0   1/1     Running   0          15m   10.244.1.3   kind-worker2
postgres-1   1/1     Running   0          14m   10.244.2.3   kind-worker
Here is the name of our headless service, postgres, that the client can use for queries to return the endpoint IP addresses.
kubectl get svc postgres
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
postgres   ClusterIP   None         <none>        5432/TCP   23m
Using our dnsutils image, we can see that the DNS names for the StatefulSet pods will return those IP addresses, along with the cluster IP of the postgres service.
kubectl exec dnsutils -- host postgres-0.postgres.default.svc.cluster.local.
postgres-0.postgres.default.svc.cluster.local has address 10.244.1.3
kubectl exec dnsutils -- host postgres-1.postgres.default.svc.cluster.local.
postgres-1.postgres.default.svc.cluster.local has address 10.244.2.3
kubectl exec dnsutils -- host postgres
postgres.default.svc.cluster.local has address 10.105.214.153
StatefulSets attempt to mimic a fixed group of persistent machines. As a generic solution for stateful workloads, their specific behavior may be frustrating in specific use cases.
A common problem that users encounter is an update requiring manual intervention to fix, when using .spec.updateStrategy.type: RollingUpdate and .spec.podManagementPolicy: OrderedReady, both of which are default settings. With these settings, a user must manually intervene if an updated pod never becomes ready.
Also, StatefulSets require a Service, preferably headless, to be responsible for the network identity of the pods, and end users are responsible for creating this Service.
StatefulSets have many configuration options, and many third-party alternatives exist (both generic stateful workload controllers and software-specific workload controllers).
StatefulSets offer functionality for a specific use case in Kubernetes. They should not be used for everyday application deployments. Later in this section we discuss more appropriate networking abstractions for run-of-the-mill deployments.
In our next section we will explore Endpoints and Endpoints Slices, the backbone of Kubernetes services.
Endpoints help identify which pods are running for the Service that they power. Endpoints are created and managed by Services. We will discuss Services on their own later, to avoid covering too many new things at once. For now, let us just say that a Service contains a standard label selector (introduced in Chapter 4), which defines which pods are in the Endpoints.
In Figure 5-1 we can see traffic being directed to an endpoint on node 2, pod 5.
Let us discuss how this Endpoint is created and maintained in the cluster.
Each Endpoints object contains a list of ports (which apply to all pods) and two lists of addresses: ready and unready.
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    name: demo-endpoints
subsets:
- addresses:
  - ip: 10.0.0.1
  notReadyAddresses:
  - ip: 10.0.0.2
  ports:
  - port: 8080
    protocol: TCP
Addresses are listed in .addresses if they are passing pod readiness checks. Addresses are listed in .notReadyAddresses if they are not. This makes Endpoints a service discovery tool, where you can watch an Endpoints object to see the health and addresses of all pods.
kubectl get endpoints clusterip-service
NAME                ENDPOINTS                                                      AGE
clusterip-service   10.244.1.5:8080,10.244.2.7:8080,10.244.2.8:8080 + 1 more...   135m
We can get a better view of all the addresses with kubectl describe.
kubectl describe endpoints clusterip-service
Name:         clusterip-service
Namespace:    default
Labels:       app=app
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-30T18:51:36Z
Subsets:
  Addresses:          10.244.1.5,10.244.2.7,10.244.2.8,10.244.3.9
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP
Events:
  Type  Reason  Age  From  Message
  ----  ------  ---  ----  -------
Let us remove the app label and see how Kubernetes responds. In a separate terminal, run this command. This will allow us to see changes to the pods in real time.
kubectl get pods -w
In another separate terminal let us do the same thing with endpoints.
kubectl get endpoints -w
We now need to get a pod name to remove from the endpoint object.
kubectl get pods -l app=app -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE
app-5586fc9d77-7frts   1/1     Running   0          19m   10.244.1.5   kind-worker2
app-5586fc9d77-mxhgw   1/1     Running   0          19m   10.244.3.9   kind-worker3
app-5586fc9d77-qpxwk   1/1     Running   0          20m   10.244.2.7   kind-worker
app-5586fc9d77-tpz8q   1/1     Running   0          19m   10.244.2.8   kind-worker
With kubectl label we can alter the pod app-5586fc9d77-7frts's app=app label.
kubectl label pod app-5586fc9d77-7frts app=nope --overwrite
pod/app-5586fc9d77-7frts labeled
Both watch commands, on Endpoints and pods, will see changes for the same reason: the removal of the label on the pod. The Endpoints controller will notice a change to the pods with the label app=app, and so will the Deployment controller. So Kubernetes did what Kubernetes does: it made the real state reflect the desired state.
kubectl get pods -w
NAME                   READY   STATUS              RESTARTS   AGE
app-5586fc9d77-7frts   1/1     Running             0          21m
app-5586fc9d77-mxhgw   1/1     Running             0          21m
app-5586fc9d77-qpxwk   1/1     Running             0          22m
app-5586fc9d77-tpz8q   1/1     Running             0          21m
dnsutils               1/1     Running             3          3h1m
postgres-0             1/1     Running             0          3h
postgres-1             1/1     Running             0          3h
app-5586fc9d77-7frts   1/1     Running             0          22m
app-5586fc9d77-7frts   1/1     Running             0          22m
app-5586fc9d77-6dcg2   0/1     Pending             0          0s
app-5586fc9d77-6dcg2   0/1     Pending             0          0s
app-5586fc9d77-6dcg2   0/1     ContainerCreating   0          0s
app-5586fc9d77-6dcg2   0/1     Running             0          2s
app-5586fc9d77-6dcg2   1/1     Running             0          7s
The deployment again has four pods, but our relabeled pod app-5586fc9d77-7frts still exists.
kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
app-5586fc9d77-6dcg2   1/1     Running   0          4m51s
app-5586fc9d77-7frts   1/1     Running   0          27m
app-5586fc9d77-mxhgw   1/1     Running   0          27m
app-5586fc9d77-qpxwk   1/1     Running   0          28m
app-5586fc9d77-tpz8q   1/1     Running   0          27m
dnsutils               1/1     Running   3          3h6m
postgres-0             1/1     Running   0          3h6m
postgres-1             1/1     Running   0          3h6m
The pod app-5586fc9d77-6dcg2 is now part of the deployment and Endpoints object, with IP address 10.244.1.6.
kubectl get pods app-5586fc9d77-6dcg2 -o wide
NAME                   READY   STATUS    RESTARTS   AGE    IP           NODE
app-5586fc9d77-6dcg2   1/1     Running   0          3m6s   10.244.1.6   kind-worker2
As always, we can see the full picture of details with kubectl describe.
kubectl describe endpoints clusterip-service
Name:         clusterip-service
Namespace:    default
Labels:       app=app
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-30T19:14:23Z
Subsets:
  Addresses:          10.244.1.6,10.244.2.7,10.244.2.8,10.244.3.9
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP
Events:
  Type  Reason  Age  From  Message
  ----  ------  ---  ----  -------
For large deployments, the Endpoints object can become very large, so much so that it can actually slow down changes in the cluster. To solve that issue, the Kubernetes maintainers came up with Endpoint Slices.
You may be asking how they are different from Endpoints. This is where we really start to get into the weeds of Kubernetes networking.
In a typical cluster, Kubernetes runs kube-proxy on every node. kube-proxy is responsible for the per-node portions of making Services work, by handling routing and outbound load balancing to all the pods in a Service. To do that, kube-proxy watches all endpoints in the cluster, so that it knows all applicable pods that all services should route to.
Now, imagine we have a big cluster, with thousands of nodes, and tens of thousands of pods. That means thousands of kube-proxies are watching endpoints. When an address changes in an endpoints object (say, from a rolling update, scale up, eviction, healthcheck failure, or any number of reasons), the updated endpoints object is pushed to all listening kube-proxies. It is made worse by the number of pods, since more pods means larger endpoints objects, and more frequent changes. This eventually becomes a strain on etcd, the Kubernetes apiserver, and the network itself. Kubernetes scaling limits are complex and depend on specific criteria, but endpoints watching is a common problem in clusters that have thousands of nodes. Anecdotally, many Kubernetes users consider endpoints watches to be the ultimate bottleneck of cluster size.
This problem is a function of kube-proxy's design and the expectation that any pod should immediately be able to route to any service with no notice. EndpointSlices are an approach that allows kube-proxy's fundamental design to continue, while drastically reducing the watch bottleneck in large clusters where large services are used.
EndpointSlices have contents similar to Endpoints objects, but the addresses live in an endpoints array, with each entry carrying its own conditions.
apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: demo-slice-1
  labels:
    kubernetes.io/service-name: demo
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 80
endpoints:
  - addresses:
      - "10.0.0.1"
    conditions:
      ready: true
The meaningful difference between Endpoints and EndpointSlices is not the schema, but how Kubernetes treats them. With "regular" Endpoints, a Kubernetes Service creates one Endpoints object for all pods in the Service. A Service creates multiple EndpointSlices, each containing a subset of pods; Figure 5-2 depicts this. The union of all EndpointSlices for a Service contains all pods in the Service. This way, an IP address change (due to a new pod, a deleted pod, or a pod's health changing) will result in a much smaller data transfer to watchers. Because Kubernetes doesn't have a transactional API, the same address may appear temporarily in multiple slices. Any code consuming EndpointSlices (such as kube-proxy) must be able to account for this.
The maximum number of addresses in an EndpointSlice is set using the --max-endpoints-per-slice kube-controller-manager flag. The current default is 100, and the maximum is 1000. The endpointslice controller attempts to fill existing EndpointSlices before creating new ones, but does not rebalance EndpointSlices.
The endpointslice controller mirrors Endpoints to EndpointSlices, to allow systems to continue writing Endpoints while treating EndpointSlices as the source of truth. The exact future of this behavior, and Endpoints in general, has not been finalized (however, as a v1 resource, Endpoints would be sunset with substantial notice). There are four exceptions that will prevent mirroring:
There is no corresponding Service.
The corresponding Service resource selects pods.
The Endpoints object has the label endpointslice.kubernetes.io/skip-mirror: true.
The Endpoints object has the annotation control-plane.alpha.kubernetes.io/leader.
You can fetch all EndpointSlices for a specific Service by fetching EndpointSlices filtered to the desired name in .metadata.labels."kubernetes.io/service-name".
EndpointSlices have been in beta since Kubernetes 1.17. This is still the case in Kubernetes 1.20, the current version at the time of writing. Beta resources typically don't see major changes and eventually graduate to stable APIs, but that is not guaranteed. If you directly use EndpointSlices, be aware that a future Kubernetes release may make a breaking change without much warning, or the behaviors described here may change.
Let's see some EndpointSlices running in the cluster now with kubectl get endpointslice.
kubectl get endpointslice
NAME                      ADDRESSTYPE   PORTS   ENDPOINTS
clusterip-service-l2n9q   IPv4          8080    10.244.2.7,10.244.2.8,10.244.1.5 + 1 more...
If we want more detail about the EndpointSlice clusterip-service-l2n9q, we can use kubectl describe on it.
kubectl describe endpointslice clusterip-service-l2n9q
Name:         clusterip-service-l2n9q
Namespace:    default
Labels:       endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/service-name=clusterip-service
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-30T18:51:36Z
AddressType:  IPv4
Ports:
  Name     Port  Protocol
  ----     ----  --------
  <unset>  8080  TCP
Endpoints:
  - Addresses:  10.244.2.7
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/app-5586fc9d77-qpxwk
    Topology:   kubernetes.io/hostname=kind-worker
  - Addresses:  10.244.2.8
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/app-5586fc9d77-tpz8q
    Topology:   kubernetes.io/hostname=kind-worker
  - Addresses:  10.244.1.5
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/app-5586fc9d77-7frts
    Topology:   kubernetes.io/hostname=kind-worker2
  - Addresses:  10.244.3.9
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/app-5586fc9d77-mxhgw
    Topology:   kubernetes.io/hostname=kind-worker3
Events:       <none>
In the output, we see the pod powering the EndpointSlice in TargetRef. The Topology information gives us the hostname of the worker node that the pod is deployed to. Most importantly, Addresses returns the IP address of the endpoint object.
Endpoints and EndpointSlices are important to understand because they identify the pods responsible for the Services, no matter the type deployed. Later in the chapter we review how to use endpoints and labels for troubleshooting. Next we will investigate all the Kubernetes Service types.
A Service in Kubernetes is a load balancing abstraction within a cluster. There are four types of Services, specified by the .spec.type field. Each type offers a different form of load balancing or discovery, which we will cover individually. The four types are: ClusterIP, NodePort, LoadBalancer, and ExternalName.
Services use a standard pod selector to match pods. The Service includes all matching pods. Services create an Endpoints (or EndpointSlice) object to handle pod discovery.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: demo
We will use the Golang minimal web server for all the Service examples. We have added additional functionality to the application to display the host and pod IP address in the REST request.
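web.yaml itself lives in the book's code repository; a rough sketch of such a Deployment, assuming the app=app label and container port 8080 seen in the outputs in this chapter (the image name is a placeholder):

```yaml
# Hypothetical sketch of web.yaml; the image name is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app                  # the label our Services select on
    spec:
      containers:
      - name: go-web
        image: go-web:v0.0.1      # placeholder for the Golang minimal web server
        ports:
        - containerPort: 8080     # serves the /host endpoint used in the examples
```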
Figure 5-3 outlines our pod networking status as a single pod in a cluster. The networking objects we are about to explore will, in some cases, expose our app pods outside the cluster, and in others allow us to scale our application to meet demand. Recall from Chapters 3 and 4 that containers running inside pods share a network namespace, among others. There is also a pause container created for each pod; the pause container manages the namespaces for the pod.
The pause container is the parent container for all running containers in the pod. It holds and shares all the namespaces for the pod. You can read more about the pause container in Ian Lewis' blog post: https://www.ianlewis.org/en/almighty-pause-container.
Before we deploy the Services, we must first deploy the web server that the Services will be routing traffic to, if we have not already.
kubectl apply -f web.yaml
deployment.apps/app created
kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP           NODE
app-9cc7d9df8-ffsm6   1/1     Running   0          49s   10.244.1.4   kind-worker2
dnsutils              1/1     Running   0          49m   10.244.3.2   kind-worker3
postgres-0            1/1     Running   0          48m   10.244.1.3   kind-worker2
postgres-1            1/1     Running   0          48m   10.244.2.3   kind-worker
Let’s look at each type of service starting with NodePort.
A NodePort Service provides a simple way for external software, such as a load balancer, to route traffic to the pods. The software only needs to be aware of node IPs and the Service's port(s). A NodePort Service exposes a fixed port on all nodes, which routes to applicable pods. A NodePort Service uses the .spec.ports.[].nodePort field to specify the port to open on all nodes, for the corresponding port on pods.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  type: NodePort
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30000
The nodePort field can be left blank, in which case Kubernetes will automatically select a unique port. The --service-node-port-range flag in kube-apiserver sets the valid range for ports, 30000-32767 by default. Manually specified ports must be within this range.
Using a NodePort Service, external users can connect to the node port on any node and be routed to a pod on a node that has a pod backing that Service; Figure 5-4 demonstrates this. The service directs traffic to node 3, and iptables rules forward the traffic to node 2, which hosts the pod. This is a bit inefficient, as a typical connection will be routed to a pod on another node.
Figure 5-4 requires us to discuss an attribute of Services: externalTrafficPolicy. externalTrafficPolicy indicates how a Service will route external traffic, to either node-local or cluster-wide endpoints. "Local" preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading. "Cluster" obscures the client source IP and may cause a second hop to another node, but should have good overall load spreading. A "Cluster" value means that for each worker node, the kube-proxy iptables rules are set up to route the traffic to the pods backing the Service anywhere in the cluster, just as we have shown in Figure 5-4.
A "Local" value means the kube-proxy iptables rules are set up only on the worker nodes with relevant pods running, to route the traffic local to the worker node. Using "Local" also allows application developers to preserve the source IP of the user request. If you set externalTrafficPolicy to Local, kube-proxy only proxies requests to node-local endpoints and does not forward traffic to other nodes. If there are no local endpoints, packets sent to the node are dropped.
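externalTrafficPolicy is set directly on the Service spec; a sketch of a NodePort Service using Local follows, with illustrative names and ports:

```yaml
# Sketch only: the service name, selector, and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  type: NodePort
  externalTrafficPolicy: Local   # node-local endpoints only; preserves client source IP
  selector:
    app: demo
  ports:
  - port: 80
    targetPort: 80
```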
Let us scale up the Deployment of our web app for some more testing.
kubectl scale deployment app --replicas 4
deployment.apps/app scaled
kubectl get pods -l app=app -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP           NODE
app-9cc7d9df8-9d5t8   1/1     Running   0          43s   10.244.2.4   kind-worker
app-9cc7d9df8-ffsm6   1/1     Running   0          75m   10.244.1.4   kind-worker2
app-9cc7d9df8-srxk5   1/1     Running   0          45s   10.244.3.4   kind-worker3
app-9cc7d9df8-zrnvb   1/1     Running   0          43s   10.244.3.5   kind-worker3
With four pods running, we will have pods on every worker node in the cluster.
kubectl get pods -o wide -l app=app
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE
app-5586fc9d77-7frts   1/1     Running   0          31s   10.244.1.5   kind-worker2
app-5586fc9d77-mxhgw   1/1     Running   0          31s   10.244.3.9   kind-worker3
app-5586fc9d77-qpxwk   1/1     Running   0          84s   10.244.2.7   kind-worker
app-5586fc9d77-tpz8q   1/1     Running   0          31s   10.244.2.8   kind-worker
Now let's deploy our NodePort Service.
kubectl apply -f services-nodeport.yaml
service/nodeport-service created
kubectl describe svc nodeport-service
Name:                     nodeport-service
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=app
Type:                     NodePort
IP:                       10.101.85.57
Port:                     echo  8080/TCP
TargetPort:               8080/TCP
NodePort:                 echo  30040/TCP
Endpoints:                10.244.1.5:8080,10.244.2.7:8080,10.244.2.8:8080 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
In order to test the NodePort Service, we must retrieve the IP address of a worker node.
kubectl get nodes -o wide
NAME                 STATUS   ROLES    INTERNAL-IP   OS-IMAGE       KERNEL-VERSION
kind-control-plane   Ready    master   172.18.0.5    Ubuntu 19.10   4.19.121-linuxkit
kind-worker          Ready    <none>   172.18.0.3    Ubuntu 19.10   4.19.121-linuxkit
kind-worker2         Ready    <none>   172.18.0.4    Ubuntu 19.10   4.19.121-linuxkit
kind-worker3         Ready    <none>   172.18.0.2    Ubuntu 19.10   4.19.121-linuxkit
Communication external to the cluster will use the NodePort of 30040, opened on each worker, and the worker node's IP address. We can see that our pods are reachable via each host in the cluster.
kubectl exec -it dnsutils -- wget -q -O- 172.18.0.5:30040/host
NODE: kind-worker2, POD IP: 10.244.1.5
kubectl exec -it dnsutils -- wget -q -O- 172.18.0.3:30040/host
NODE: kind-worker, POD IP: 10.244.2.8
kubectl exec -it dnsutils -- wget -q -O- 172.18.0.4:30040/host
NODE: kind-worker2, POD IP: 10.244.1.5
It's important to consider the limitations as well. A NodePort deployment will fail if it cannot allocate the requested port. Also, ports must be tracked across all applications using a NodePort Service. Using manually selected ports raises the issue of port collisions (especially when applying a workload to multiple clusters, which may not have the exact same node ports free).
Another downside of the NodePort Service type is that the load balancer or client software must be aware of node IP addresses. A static configuration (e.g., an operator manually copying node IP addresses) may become outdated over time (especially on a cloud provider) as IP addresses change or nodes are replaced. A reliable system automatically populates node IP addresses, either by watching which machines have been allocated to the cluster or by listing nodes from the Kubernetes API itself.
NodePorts are the earliest form of Services. We will see that other Service types use node ports as a base structure in their architecture. NodePorts should not be used by themselves, as clients would need to know the IP addresses of hosts and nodes for connection requests. We will see how node ports are used to enable load balancers later in the chapter, when we discuss cloud networking.
Next up is the default Service type: ClusterIP.
The IP addresses of pods share the lifecycle of the pod and thus are not reliable for clients to use for requests. Services help overcome this pod networking design. A ClusterIP Service provides an internal load balancer with a single IP address that maps to all matching (and ready) pods.
The Service's IP address must be within the CIDR set in service-cluster-ip-range, in the apiserver. You can specify a valid IP address manually, or leave .spec.clusterIP unset to have one assigned automatically.
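Putting those fields together, a minimal ClusterIP Service might look like the following sketch (names and ports are illustrative; type: ClusterIP is the default and can be omitted):

```yaml
# Sketch of a ClusterIP Service; names, ports, and the sample IP are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: clusterip-service
spec:
  type: ClusterIP            # the default type; may be omitted
  # clusterIP: 10.96.10.10   # optional: manually pick an IP within service-cluster-ip-range
  selector:
    app: app
  ports:
  - port: 80
    targetPort: 8080
```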
The ClusterIP address is a virtual IP address that is only routable internally.
kube-proxy is responsible for making the ClusterIP address route to all applicable pods. See the section on kube-proxy for more. In “normal” configurations, kube-proxy performs L4 load balancing, which may not be sufficient. For example, older pods may see more load, due to accumulating more long-lived connections from clients. Or, a few clients making many requests may cause load to be distributed unevenly.
A particular use case for ClusterIP is when a workload requires a load balancer within the same cluster.
In Figure 5-5, we can see a ClusterIP Service deployed. The Service name is app, with a selector of app=app1. There are two pods powering this Service: Pod 1 and Pod 5 match the selector for the Service.
Let us dig into an example on the command line with our kind cluster.
We will deploy a ClusterIP service for use with our Golang webserver.
kubectl apply -f service-clusterip.yaml
service/clusterip-service created
kubectl describe svc clusterip-service
Name:              clusterip-service
Namespace:         default
Labels:            app=app
Annotations:       <none>
Selector:          app=app
Type:              ClusterIP
IP:                10.98.252.195
Port:              <unset>  80/TCP
TargetPort:        8080/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
The clusterip service name is resolvable in the network.
kubectl exec dnsutils -- host clusterip-service
clusterip-service.default.svc.cluster.local has address 10.98.252.195
Now we can reach the host API endpoint with the cluster IP (10.98.252.195), the Service name (clusterip-service), or directly with the pod IP 10.244.1.4 and port 8080.
kubectl exec dnsutils -- wget -q -O- clusterip-service/host
NODE: kind-worker2, POD IP: 10.244.1.4
kubectl exec dnsutils -- wget -q -O- 10.98.252.195/host
NODE: kind-worker2, POD IP: 10.244.1.4
kubectl exec dnsutils -- wget -q -O- 10.244.1.4:8080/host
NODE: kind-worker2, POD IP: 10.244.1.4
The ClusterIP Service is the default type for Services. Given that default status, it is worth exploring what the ClusterIP Service abstracts away for us. If you recall from Chapters 2 and 3, this list is similar to what is set up with a Docker network, but we now also have iptables rules for the Service across all nodes:
View veth pair and match with pod
View network namespace and match with pod
Verify pids on node match pods
Match services with iptables rules
To explore this, we need to know which worker node the pod is deployed to; that is kind-worker2.
kubectl get pods -o wide --field-selector spec.nodeName=kind-worker2 -l app=app
NAME                  READY   STATUS    RESTARTS   AGE     IP           NODE
app-9cc7d9df8-ffsm6   1/1     Running   0          7m23s   10.244.1.4   kind-worker2
Since we are using kind, we can use docker ps and docker exec to get information out of the running worker node kind-worker2.
docker ps
CONTAINER ID   COMMAND                  PORTS                                                                 NAMES
df6df0736958   "/usr/local/bin/entr…"                                                                         kind-worker2
e242f11d2d00   "/usr/local/bin/entr…"                                                                         kind-worker
a76b32f37c0e   "/usr/local/bin/entr…"                                                                         kind-worker3
07ccb63d870f   "/usr/local/bin/entr…"   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 127.0.0.1:52321->6443/tcp   kind-control-plane
The kind-worker2 container ID is df6df0736958; kind was kind enough to label each container with its name, so we can reference each worker node by the name kind-worker2. Let's see the IP address and route table information of our pod, app-9cc7d9df8-ffsm6.
kubectl exec app-9cc7d9df8-ffsm6 ip r
default via 10.244.1.1 dev eth0
10.244.1.0/24 via 10.244.1.1 dev eth0 src 10.244.1.4
10.244.1.1 dev eth0 scope link src 10.244.1.4
Our pod's IP address is 10.244.1.4, running on interface eth0@if5, with 10.244.1.1 as its default route. That matches interface 5 on the pod, eth0@if5.
kubectl exec app-9cc7d9df8-ffsm6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd ::
5: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 3e:57:42:6e:cd:45 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.4/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3c57:42ff:fe6e:cd45/64 scope link
       valid_lft forever preferred_lft forever
Let's check the network namespace as well, from the node's ip a output.
docker exec -it kind-worker2 ip a
<trimmed>
5: veth45d1f3e8@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 3e:39:16:38:3f:23 brd ff:ff:ff:ff:ff:ff link-netns cni-ec37f6e4-a1b5-9bc9-b324-59d612edb4d4
    inet 10.244.1.1/32 brd 10.244.1.1 scope global veth45d1f3e8
       valid_lft forever preferred_lft forever
ip netns list confirms that the network namespace matches our pod's interface to the host interface, cni-ec37f6e4-a1b5-9bc9-b324-59d612edb4d4.
docker exec -it kind-worker2 /usr/sbin/ip netns list
cni-ec37f6e4-a1b5-9bc9-b324-59d612edb4d4 (id: 2)
cni-c18c44cb-6c3e-c48d-b783-e7850d40e01c (id: 1)
Let's see what processes run inside that network namespace. For that we will use docker exec to run commands inside the node kind-worker2, which hosts the pod and its network namespace.
docker exec -it kind-worker2 /usr/sbin/ip netns pid cni-ec37f6e4-a1b5-9bc9-b324-59d612edb4d4
4687
4737
Now we can grep for each process ID and inspect what those processes are doing.
docker exec -it kind-worker2 ps aux | grep 4687
root  4687  0.0  0.0  968     4     ?  Ss   17:00  0:00  /pause
docker exec -it kind-worker2 ps aux | grep 4737
root  4737  0.0  0.0  708376  6368  ?  Ssl  17:00  0:00  /opt/web-server
4737 is the process ID of our web server container running on kind-worker2, and 4687 is our pause container holding onto all our namespaces.
Now let’s see what the iptables rules on the worker node look like.
docker exec -it kind-worker2 iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             mark match 0x8000/0x8000 /* kubernetes firewall for dropping marked packets */

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             mark match 0x4000/0x4000 /* kubernetes forwarding rules */
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED /* kubernetes forwarding conntrack pod source rule */
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED /* kubernetes forwarding conntrack pod destination rule */

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-SERVICES (3 references)
target     prot opt source               destination
That is a lot of chains for Kubernetes to manage.
We can dive a little deeper and examine the iptables rules responsible for the services we deployed. Let’s retrieve the IP address of the deployed clusterip-service; we need it to find the matching iptables rules.
kubectl get svc clusterip-service
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
clusterip-service   ClusterIP   10.98.252.195   <none>        80/TCP    57m
Now use the cluster IP of the service, 10.98.252.195, to find our iptables rules.
docker exec -it kind-worker2 iptables -L -t nat | grep 10.98.252.195
KUBE-MARK-MASQ  tcp  -- !10.244.0.0/16        10.98.252.195        /* default/clusterip-service: cluster IP */ tcp dpt:80
KUBE-SVC-V7R3EVKW3DT43QQM  tcp  --  anywhere  10.98.252.195        /* default/clusterip-service: cluster IP */ tcp dpt:80
List out all the rules on the chain KUBE-SVC-V7R3EVKW3DT43QQM:
docker exec -it kind-worker2 iptables -t nat -L KUBE-SVC-V7R3EVKW3DT43QQM
Chain KUBE-SVC-V7R3EVKW3DT43QQM (1 references)
target     prot opt source               destination
KUBE-SEP-THJR2P3Q4C2QAEPT  all  --  anywhere  anywhere             /* default/clusterip-service: */
The KUBE-SEP- chains contain the endpoints for the service; here, it is KUBE-SEP-THJR2P3Q4C2QAEPT. Now we can see what the rules for this chain are in iptables.
docker exec -it kind-worker2 iptables -L KUBE-SEP-THJR2P3Q4C2QAEPT -t nat
Chain KUBE-SEP-THJR2P3Q4C2QAEPT (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  10.244.1.4           anywhere             /* default/clusterip-service: */
DNAT       tcp  --  anywhere             anywhere             /* default/clusterip-service: */ tcp to:10.244.1.4:8080
10.244.1.4:8080 is one of the service’s endpoints, i.e., a pod backing the service, which is confirmed by the output of kubectl get ep clusterip-service.
kubectl get ep clusterip-service
NAME                ENDPOINTS         AGE
clusterip-service   10.244.1.4:8080   62m

kubectl describe ep clusterip-service
Name:         clusterip-service
Namespace:    default
Labels:       app=app
Annotations:  <none>
Subsets:
  Addresses:          10.244.1.4
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP

Events:  <none>
Now, let’s explore the limitations of ClusterIP. The ClusterIP is for traffic internal to the cluster, and it suffers the same issues as Endpoints does: as the service size grows, updates to it will slow. In Chapter 2 we discussed how to mitigate that by using IPVS instead of iptables as the proxy mode for kube-proxy. Later in this chapter we will discuss how to get traffic into the cluster using Ingress and the other service type, LoadBalancer.
ClusterIP is the default type of service, but there are several other specific types: Headless and ExternalName. ExternalName is a specific type of service that helps with reaching services outside the cluster. We briefly touched on Headless services with StatefulSets, but let’s review them in depth now.
A Headless Service isn’t a formal type of service (i.e., there is no .spec.type: Headless).
A Headless service is a service with .spec.clusterIP: "None"
.
This is distinct from merely not setting a ClusterIP,
which makes Kubernetes automatically assign a ClusterIP.
When ClusterIP is set to “None”, the Service does not support any load balancing functionality. Instead, it only provisions an Endpoints object, and points the service DNS record at all pods that are selected and ready.
A Headless service provides a generic way to watch Endpoints, without needing to interact with the Kubernetes API. Fetching DNS records is much simpler than integrating with the Kubernetes API, and it may not be possible with third party software.
Headless services allow developers to deploy multiple copies of a pod in a deployment. Instead of a single IP address being returned, as with ClusterIP, all the IP addresses of the endpoints are returned in the query. It is then up to the client to pick which one to use. To see this in action, let us scale up the Deployment of our web app.
kubectl scale deployment app --replicas 4
deployment.apps/app scaled

kubectl get pods -l app=app -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP           NODE
app-9cc7d9df8-9d5t8   1/1     Running   0          43s   10.244.2.4   kind-worker
app-9cc7d9df8-ffsm6   1/1     Running   0          75m   10.244.1.4   kind-worker2
app-9cc7d9df8-srxk5   1/1     Running   0          45s   10.244.3.4   kind-worker3
app-9cc7d9df8-zrnvb   1/1     Running   0          43s   10.244.3.5   kind-worker3
Now let us deploy the headless service.
kubectl apply -f service-headless.yml
service/headless-service created
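The service-headless.yml manifest we applied might look like the following sketch; the selector and ports are assumptions based on our earlier examples, and the key line is clusterIP: "None".

```yaml
apiVersion: v1
kind: Service
metadata:
  name: headless-service
spec:
  clusterIP: "None"   # makes the Service headless; no virtual IP is allocated
  selector:
    app: app          # assumed label from our web app Deployment
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
```

With clusterIP set to "None", the cluster DNS returns the pod IPs directly instead of a single virtual IP.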
The DNS query will return all four of the pod IP addresses. Using our dnsutils image, we can verify that this is the case.
kubectl exec dnsutils -- host -v -t a headless-service
Trying "headless-service.default.svc.cluster.local"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45294
;; flags: qr aa rd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;headless-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
headless-service.default.svc.cluster.local. 30 IN A 10.244.2.4
headless-service.default.svc.cluster.local. 30 IN A 10.244.3.5
headless-service.default.svc.cluster.local. 30 IN A 10.244.1.4
headless-service.default.svc.cluster.local. 30 IN A 10.244.3.4

Received 292 bytes from 10.96.0.10#53 in 0 ms
The IP addresses returned from the query also match the endpoints for the service; kubectl describe for the Endpoints object confirms that.
kubectl describe endpoints headless-service
Name:         headless-service
Namespace:    default
Labels:       service.kubernetes.io/headless
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-30T18:16:09Z
Subsets:
  Addresses:          10.244.1.4,10.244.2.4,10.244.3.4,10.244.3.5
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP

Events:  <none>
Headless has a very specific use case and is not typically used for deployments. As we mentioned in the StatefulSet section, if developers need to let the client decide which endpoint to use, Headless is the appropriate type of service to deploy. Two examples of headless services are clustered databases and applications that have client-side load-balancing logic built into their code.
Our next example is ExternalName, which aids in migrating services external to the cluster. It also offers other DNS advantages inside cluster DNS.
ExternalName is a special case of Service that does not have selectors and uses DNS names instead.
When looking up the host ext-service.default.svc.cluster.local, the cluster DNS service returns a CNAME record of database.mycompany.com:
apiVersion: v1
kind: Service
metadata:
  name: ext-service
spec:
  type: ExternalName
  externalName: database.mycompany.com
If developers are migrating an application into Kubernetes but its dependencies are staying external to the cluster, an ExternalName service allows you to define a DNS record internal to the cluster, no matter where the service actually runs.
DNS will try all the search domains, as seen in the example below.
kubectl exec -it dnsutils -- host -v -t a github.com
Trying "github.com.default.svc.cluster.local"
Trying "github.com.svc.cluster.local"
Trying "github.com.cluster.local"
Trying "github.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55908
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;github.com. IN A

;; ANSWER SECTION:
github.com. 30 IN A 140.82.112.3

Received 54 bytes from 10.96.0.10#53 in 18 ms
In other words, an ExternalName service allows developers to map a Service to an external DNS name.
Now, if we deploy the ExternalName service:

kubectl apply -f service-external.yml
service/external-service created
The A record for github.com is returned from the external-service query.
kubectl exec -it dnsutils -- host -v -t a external-service
Trying "external-service.default.svc.cluster.local"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11252
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;external-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
external-service.default.svc.cluster.local. 24 IN CNAME github.com.
github.com. 24 IN A 140.82.112.3

Received 152 bytes from 10.96.0.10#53 in 0 ms
The CNAME query for external-service returns github.com.
kubectl exec -it dnsutils -- host -v -t cname external-service
Trying "external-service.default.svc.cluster.local"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36874
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;external-service.default.svc.cluster.local. IN CNAME

;; ANSWER SECTION:
external-service.default.svc.cluster.local. 30 IN CNAME github.com.

Received 126 bytes from 10.96.0.10#53 in 0 ms
Sending traffic to a Headless Service via its DNS record is possible, but inadvisable. DNS is a notoriously poor way to load balance, as software takes very different (and often simplistic or unintuitive) approaches to A or AAAA DNS records that return multiple IP addresses. For example, it is common for software to always choose the first IP address in the response, and/or to cache and reuse the same IP address indefinitely. If you need to be able to send traffic to the service’s DNS address, consider a (standard) ClusterIP or LoadBalancer service.
The “correct” way to use a Headless service is to query the service’s A/AAAA DNS record and use that data in a server-side or client-side load balancer.
Most of the services we have been discussing manage internal traffic for the cluster network. In the next sections, we will review how to route requests into the cluster with the LoadBalancer service type and Ingresses.
LoadBalancer Services expose services external to the cluster network. They combine NodePort Service behavior with an external integration, such as a cloud provider’s load balancer. Notably, LoadBalancer services handle L4 traffic (unlike Ingress, which handles L7 traffic), so they will work for any TCP or UDP service, provided the load balancer selected supports L4 traffic.
Configuration and load balancer options are extremely dependent on the cloud provider.
For example, some will support .spec.loadBalancerIP
(with varying setup required), and some will ignore it.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: demo
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  clusterIP: 10.0.5.1
  type: LoadBalancer
Once the load balancer has been provisioned, its IP address will be written to
.status.loadBalancer.ingress.ip
.
LoadBalancer services are useful for exposing TCP or UDP services to the outside world. Traffic will come into the load balancer on its public IP address and TCP port 80, defined by spec.ports[*].port, be routed to the clusterIP address, 10.0.5.1, and then on to the container target port 8080, spec.ports[*].targetPort. Not shown in the example is .spec.ports[*].nodePort; if not specified, Kubernetes will pick one for the service.
The Service’s spec.ports[*].targetPort must match your pod’s container spec.containers[*].ports.containerPort, along with the protocol; otherwise it’s like missing a semicolon in Kubernetes networking.
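To make the port matching concrete, here is a sketch of a pod and Service pair; the names, labels, and image are hypothetical, but the fields that must agree are commented.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  labels:
    app: demo
spec:
  containers:
    - name: web
      image: example/web:latest     # hypothetical image
      ports:
        - containerPort: 8080       # must match the Service's targetPort
          protocol: TCP             # protocol must match as well
---
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  type: LoadBalancer
  selector:
    app: demo                       # must match the pod's labels
  ports:
    - port: 80                      # port clients use to reach the Service
      targetPort: 8080              # must equal containerPort above
      protocol: TCP
```

If targetPort and containerPort disagree, the Service and Endpoints objects are created without error, but connections fail at runtime, which is why this is such a common debugging dead end.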
In Figure 5-6 we can see how the LoadBalancer service builds on the other service types. The cloud load balancer will determine how to distribute traffic; we will discuss that in depth in the next chapter.
Let us continue to extend our golang web server example with a LoadBalancer Service.
Since we are running on our local machine and not in a service provider like AWS, GCP, or Azure, we can use MetalLB as an example for our LoadBalancer service. The MetalLB project aims to allow users to deploy bare-metal load balancers for their clusters.
This example has been modified from the Kind example deployment at https://kind.sigs.k8s.io/docs/user/loadbalancer.
Our first step is to deploy a separate namespace for MetalLB.
kubectl apply -f mlb-ns.yaml
namespace/metallb-system created
MetalLB members also require a secret for joining the load balancer cluster; let us deploy one now for them to use in our cluster.
kubectl create secret generic -n metallb-system memberlist \
  --from-literal=secretkey="$(openssl rand -base64 128)"
secret/memberlist created
Now we can deploy MetalLB!
kubectl apply -f ./metallb.yaml
podsecuritypolicy.policy/controller created
podsecuritypolicy.policy/speaker created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/config-watcher created
role.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/config-watcher created
rolebinding.rbac.authorization.k8s.io/pod-lister created
daemonset.apps/speaker created
deployment.apps/controller created
As you can see, it deploys many objects, and now we wait for the deployment to finish. We can monitor the resources with the --watch option in the metallb-system namespace.
kubectl get pods -n metallb-system --watch
NAME                          READY   STATUS              RESTARTS   AGE
controller-5df88bd85d-mvgqn   0/1     ContainerCreating   0          10s
speaker-5knqb                 1/1     Running             0          10s
speaker-k79c9                 1/1     Running             0          10s
speaker-pfs2p                 1/1     Running             0          10s
speaker-sl7fd                 1/1     Running             0          10s
controller-5df88bd85d-mvgqn   1/1     Running             0          12s
To complete the configuration, we need to provide MetalLB with a range of IP addresses it controls. This range has to be on the Docker kind network.
docker network inspect -f '{{.IPAM.Config}}' kind
[{172.18.0.0/16 172.18.0.1 map[]} {fc00:f853:ccd:e793::/64 fc00:f853:ccd:e793::1 map[]}]
172.18.0.0/16 is our Docker network running locally. We want our load balancer IP range to come from this subnet. We can configure MetalLB, for instance, to use 172.18.255.200 through 172.18.255.250 by creating a ConfigMap.
The config map would look like this:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.18.255.200-172.18.255.250
Let us deploy it so we can use MetalLB.
kubectl apply -f ./metallb-configmap.yaml
Now that MetalLB is deployed, we can deploy a load balancer for our web app.
kubectl apply -f services-loadbalancer.yaml
service/loadbalancer-service created
For fun let us scale the web app deployment to 10, if you have the resources for it!
kubectl scale deployment app --replicas 10

kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
app-7bdb9ffd6c-b5x7m    2/2     Running   0          26s   10.244.3.15   kind-worker    <none>           <none>
app-7bdb9ffd6c-bqtf8    2/2     Running   0          26s   10.244.2.13   kind-worker2   <none>           <none>
app-7bdb9ffd6c-fb9sf    2/2     Running   0          26s   10.244.3.14   kind-worker    <none>           <none>
app-7bdb9ffd6c-hrt7b    2/2     Running   0          26s   10.244.2.7    kind-worker2   <none>           <none>
app-7bdb9ffd6c-l2794    2/2     Running   0          26s   10.244.2.9    kind-worker2   <none>           <none>
app-7bdb9ffd6c-l4cfx    2/2     Running   0          26s   10.244.3.11   kind-worker2   <none>           <none>
app-7bdb9ffd6c-rr4kn    2/2     Running   0          23m   10.244.3.10   kind-worker    <none>           <none>
app-7bdb9ffd6c-s4k92    2/2     Running   0          26s   10.244.3.13   kind-worker    <none>           <none>
app-7bdb9ffd6c-shmdt    2/2     Running   0          26s   10.244.1.12   kind-worker3   <none>           <none>
app-7bdb9ffd6c-v87f9    2/2     Running   0          26s   10.244.1.11   kind-worker3   <none>           <none>
app2-658bcd97bd-4n888   1/1     Running   0          35m   10.244.2.6    kind-worker3   <none>           <none>
app2-658bcd97bd-mnpkp   1/1     Running   0          35m   10.244.3.7    kind-worker    <none>           <none>
app2-658bcd97bd-w2qkl   1/1     Running   0          35m   10.244.3.8    kind-worker    <none>           <none>
dnsutils                1/1     Running   1          75m   10.244.1.2    kind-worker3   <none>           <none>
postgres-0              1/1     Running   0          75m   10.244.1.4    kind-worker3   <none>           <none>
postgres-1              1/1     Running   0          75m   10.244.3.4    kind-worker    <none>           <none>
Now we can test the provisioned load balancer. With more replicas deployed for our app behind the load balancer, we need the load balancer’s external IP, 172.18.255.200.
kubectl get svc loadbalancer-service
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
loadbalancer-service   LoadBalancer   10.99.24.220   172.18.255.200   80:31276/TCP   52s

kubectl get svc/loadbalancer-service -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'
172.18.255.200
Since Docker for Mac or Windows does not expose the kind network to the host, we cannot directly reach the 172.18.255.200 load balancer IP on the Docker private network.
We can simulate it by attaching a Docker container to the kind network and curling the load balancer as a workaround. If you would like to read more about this issue, there is a great blog post at https://www.thehumblelab.com/kind-and-metallb-on-mac/.
We will use another great networking Docker image, nicolaka/netshoot, to run locally, attach to the kind Docker network, and send requests to our MetalLB load balancer. If we run it several times, we can see the load balancer doing its job of routing traffic to different pods.
docker run --network kind -a stdin -a stdout -i -t nicolaka/netshoot curl 172.18.255.200/host
NODE: kind-worker, POD IP:10.244.2.7

docker run --network kind -a stdin -a stdout -i -t nicolaka/netshoot curl 172.18.255.200/host
NODE: kind-worker, POD IP:10.244.2.9

docker run --network kind -a stdin -a stdout -i -t nicolaka/netshoot curl 172.18.255.200/host
NODE: kind-worker3, POD IP:10.244.3.11

docker run --network kind -a stdin -a stdout -i -t nicolaka/netshoot curl 172.18.255.200/host
NODE: kind-worker2, POD IP:10.244.1.6

docker run --network kind -a stdin -a stdout -i -t nicolaka/netshoot curl 172.18.255.200/host
NODE: kind-worker, POD IP:10.244.2.9
With each new request, the MetalLB service sends requests to different pods. LoadBalancer, like other service types, uses selectors and labels for the pods, as we can see in the kubectl describe endpoints loadbalancer-service output. The pod IP addresses match our results from the curl commands.
kubectl describe endpoints loadbalancer-service
Name:         loadbalancer-service
Namespace:    default
Labels:       app=app
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-30T19:59:57Z
Subsets:
  Addresses:          10.244.1.6,10.244.1.7,10.244.1.8,10.244.2.10,10.244.2.7,10.244.2.8,10.244.2.9,10.244.3.11,10.244.3.12,10.244.3.9
  NotReadyAddresses:  <none>
  Ports:
    Name          Port  Protocol
    ----          ----  --------
    service-port  8080  TCP

Events:  <none>
It is important to remember that LoadBalancer Services require specific integrations, and will not work without cloud provider support, or manually-installed software such as MetalLB.
They are not (normally) L7 load balancers, and therefore cannot intelligently handle HTTP(S) requests. There is a one-to-one mapping of load balancer to workload, which means that all requests sent to that load balancer must be handled by the same workload.
While not a network service, it is important to mention the Horizontal Pod Autoscaler, which will scale the pods in a replication controller, deployment, replica set, or stateful set based on CPU utilization.
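As a sketch, a Horizontal Pod Autoscaler targeting our app Deployment might look like the following; the replica bounds and CPU threshold are illustrative, and depending on your cluster version the API may be autoscaling/v2beta2 rather than autoscaling/v2.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                      # the Deployment from our earlier examples
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # add replicas when average CPU exceeds 80%
```

Because Services route to whatever ready pods match their selector, the autoscaler can add and remove replicas without any change to the Service or load balancer configuration.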
We can scale our application to the demands of its users, with no need for configuration changes on anyone’s part. Kubernetes and the LoadBalancer service take care of all of that for developers, system administrators, and network administrators.
We will see in the next chapter how we can take that even further using Cloud Services for autoscaling.
Here are some troubleshooting tips if issues arise with the endpoints or services.
Removing the label on a pod allows it to continue to run while also updating the endpoints and service. The endpoint controller will remove the unlabeled pod from the Endpoints object, and the deployment will deploy another pod; this allows you to troubleshoot issues with that specific unlabeled pod without adversely affecting the service for end customers. We have used this one countless times during development, and we did so in the previous sections’ examples.
There are two probes that communicate the pod’s health to the kubelet and the rest of the Kubernetes environment: liveness and readiness probes. The readiness probe in particular controls whether the pod’s address is included in the Endpoints object.
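A minimal sketch of both probes on a pod spec fragment; the image, HTTP paths, and timings are assumptions, not the repo's actual configuration.

```yaml
containers:
  - name: web
    image: example/web:latest   # hypothetical image
    ports:
      - containerPort: 8080
    livenessProbe:              # restarts the container if this check fails
      httpGet:
        path: /healthz          # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:             # removes the pod from Endpoints while failing
      httpGet:
        path: /ready            # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
```

A pod failing its readiness probe stays running but shows up under NotReadyAddresses in the Endpoints object, so no Service traffic reaches it.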
It is also very easy to make a mistake in a YAML manifest, so make sure to compare the ports on the service and the pods, and verify that they match.
We discussed network policies in Chapter 3; those can also stop pods from communicating with each other and with services. If your cluster network is using network policies, ensure that they are set up appropriately for application traffic flow.
Also remember that diagnostic tools like the dnsutils and netshoot pods on the cluster network are helpful for debugging.
If endpoints are taking too long to come up in the cluster, there are several options that can be configured on the kubelet and kube-proxy to control how fast they respond to changes in the Kubernetes environment:
--kube-api-qps
The queries-per-second rate at which the kubelet communicates with the Kubernetes API server; the default is 5.
--kube-api-burst
Temporarily allows API queries to burst to this number; the default is 10.
--iptables-sync-period
The maximum interval at which iptables rules are refreshed (e.g., 5s, 1m, 2h22m). Must be greater than 0; the default is 30s.
--ipvs-sync-period
The maximum interval at which IPVS rules are refreshed. Must be greater than 0; the default is 30s.
Increasing these options is recommended for larger clusters, but remember that doing so also increases resource consumption on both the kubelet and the API server, so keep that in mind. These options can help alleviate issues and are good to be aware of as the number of services and pods grows in the cluster.
The various types of services exemplify how powerful the network abstractions in Kubernetes are. We have dug deep into how these work at each layer of the toolchain. Developers looking to deploy applications to Kubernetes now have the knowledge to pick and choose the services that are right for their use cases. No longer will a network administrator have to manually update load balancers with IP addresses; Kubernetes manages that for them.
We have just scratched the surface of what is possible with Services. With each new version of Kubernetes, there are options to tune and configurations for running services. Test each service type for your use cases, and ensure you are using the appropriate one to optimize your applications on the Kubernetes network.
The LoadBalancer service type gets traffic into the cluster by exposing services behind a load balancer for external users to connect to. Ingresses offer an alternative: they support path-based routing, which allows different HTTP paths to be served by different services. In the next section, we will discuss Ingress and how it manages connectivity into cluster resources.
An Ingress is a Kubernetes-specific, L7 (HTTP) load balancer, which is accessible externally, in contrast to the L4 ClusterIP service, which is internal to the cluster. It is the typical choice for exposing an HTTP(S) workload to external users. An ingress can be a single entry point into an API or a microservice-based architecture, and traffic can be routed to Services based on HTTP information in the request. Ingress is a configuration spec (with multiple implementations) for routing HTTP traffic to Kubernetes Services. Figure 5-7 outlines the Ingress components.
In order to manage traffic in a cluster with Ingress, two components are required: the controller and the rules. The controller manages ingress pods, while the deployed rules define how the traffic is routed.
We call ingress implementations ingress controllers. In Kubernetes, a controller is software that is responsible for managing a resource type and making reality match the desired state.
There are two general kinds of controllers: external load balancer controllers, and internal load balancer controllers. External load balancer controllers create a load balancer that exists “outside” the cluster, such as a cloud provider product. Internal load balancer controllers deploy a load balancer that runs within the cluster, and do not directly solve the problem of routing consumers to the load balancer. There are a myriad of ways that cluster administrators run internal load balancers, such as running the load balancer on a subset of special nodes, and routing traffic somehow to those nodes. The primary motivation for choosing an internal load balancer is cost reduction. An internal load balancer for ingress can route traffic for multiple ingress objects, whereas an external load balancer controller typically needs one load balancer per ingress. As most cloud providers charge by load balancer, it is cheaper to support a single cloud load balancer that does fan-out within the cluster, rather than many cloud load balancers. Note that this incurs operational overhead, and increased latency and compute costs, so be sure the money you’re saving is worth it. Many companies have a bad habit of optimizing on inconsequential cloud spend line items.
Let’s look at the spec for an Ingress object. Like LoadBalancer services, most of the spec is universal, but various ingress controllers have different features and accept different unique configuration. We’ll start with the basics.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: basic-ingress
spec:
  rules:
    - http:
        paths:
          # Send all /demo requests to demo-service.
          - path: /demo
            pathType: Prefix
            backend:
              service:
                name: demo-service
                port:
                  number: 80
  # Send all other requests to main-service.
  defaultBackend:
    service:
      name: main-service
      port:
        number: 80
The above example is representative of a typical ingress. It sends traffic to /demo to one service, and all other traffic to another. Ingresses have a “default backend,” where requests are routed if no rule matches. This can be configured in many ingress controllers in the controller configuration itself (e.g., a generic 404 page), and many support the .spec.defaultBackend field. Ingresses support multiple ways to specify a path; there are currently three pathTypes.
Exact
Matches the specific path and only the given path (including trailing / or lack thereof).
Prefix
Matches all paths that start with the given path.
ImplementationSpecific
Allows for custom semantics from the current ingress controller.
When a request matches multiple paths, the most specific match is chosen.
For example, if there are rules for /first
and /first/second
,
any request starting with /first/second
will go to the backend for /first/second
.
If a path matches an exact path and a prefix path, the request will go to the backend for the exact rule.
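These precedence rules can be sketched in a single Ingress; the service names here are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: precedence-ingress
spec:
  rules:
    - http:
        paths:
          # Exact match wins for requests to exactly /first.
          - path: /first
            pathType: Exact
            backend:
              service:
                name: first-service
                port:
                  number: 80
          # Longest matching prefix wins for /first/second and below.
          - path: /first/second
            pathType: Prefix
            backend:
              service:
                name: second-service
                port:
                  number: 80
```

A request for /first goes to first-service, while /first/second/page goes to second-service, even though both rules could plausibly apply.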
Ingresses can also use hostnames in rules.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-host-ingress
spec:
  rules:
    - host: a.example.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: service-a
                port:
                  number: 80
    - host: b.example.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: service-b
                port:
                  number: 80
In this example, we serve traffic to a.example.com from one service, and traffic to b.example.com from another. This is comparable to virtual hosts in web servers. You may want to use host rules to serve multiple unique domains from a single load balancer and IP.
Ingresses have basic TLS support.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress-secure
spec:
  tls:
    - hosts:
        - https-example.com
      secretName: demo-tls
  rules:
    - host: https-example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-service
                port:
                  number: 80
The TLS config references a Kubernetes Secret by name, in .spec.tls[*].secretName. Ingress controllers expect the TLS certificate and key to be provided in .data."tls.crt" and .data."tls.key" respectively, as shown below.
apiVersion: v1
kind: Secret
metadata:
  name: demo-tls
type: kubernetes.io/tls
data:
  tls.crt: cert, encoded in base64
  tls.key: key, encoded in base64
If you don’t want to manage traditionally issued certificates by hand, you can use cert-manager to automatically fetch and update certs. Read more at https://cert-manager.io.
We mentioned earlier that ingress is simply a spec, and drastically different implementations exist. It’s possible to use multiple ingress controllers in a single cluster, using IngressClass settings. An ingress class represents an ingress controller, and therefore a specific ingress implementation.
Annotations in Kubernetes must be strings. Because true and false have distinct non-string meanings, you cannot set an annotation to true or false without quotes. “true” and “false” are both valid. This is a long-running bug, which is often encountered when setting a default priority class. https://github.com/kubernetes/kubernetes/issues/59113
IngressClass was introduced in Kubernetes 1.18.
Prior to 1.18, annotating ingresses with kubernetes.io/ingress.class
was a common convention,
but relied on all installed ingress controllers to support it.
Ingresses can pick an ingress class by setting the class’s name in .spec.ingressClassName
.
If more than one IngressClass is set as default, Kubernetes will not allow you to create an ingress with no ingressclass, or remove the ingressclass from an existing ingress. You can use admission control to prevent multiple IngressClasses from being marked as default.
Ingress only supports HTTP(S) requests, which is insufficient if your service uses a different protocol (e.g., most databases use their own protocols). Some ingress controllers, such as the NGINX ingress controller, do support TCP and UDP, but this is not the norm.
Now on to deploying an ingress controller, so we can add ingress rules to our golang web server example.
When we deployed our kind cluster, we had to add several options to allow us to deploy an ingress controller.
extraPortMappings allow the local host to make requests to the Ingress controller over ports 80/443
node-labels only allow the ingress controller to run on a specific node(s) matching the label selector
There are many ingress controllers to choose from. Kubernetes does not start or ship a default controller as it does with other components. The Kubernetes community does support AWS, GCE, and Nginx ingress controllers. Table 5-1 outlines several options for ingress.
Name | Commercial Support | Engine | Protocol Support | SSL termination
---|---|---|---|---
Ambassador Ingress Controller | Yes | Envoy | gRPC, HTTP/2, WebSockets | Yes
Community Ingress Nginx | No | NGINX | gRPC, HTTP/2, WebSockets | Yes
NGINX Inc. Ingress | Yes | NGINX | HTTP, WebSocket, gRPC | Yes
HAProxy Ingress | Yes | HAProxy | gRPC, HTTP/2, WebSockets | Yes
Istio Ingress | No | Envoy | HTTP, HTTPS, gRPC, HTTP/2 | Yes
Kong Ingress Controller for Kubernetes | Yes | Lua on top of nginx | gRPC, HTTP/2 | Yes
Traefik Kubernetes Ingress | Yes | Traefik | HTTP/2, gRPC, and WebSockets | Yes
Some things to consider when deciding on the ingress for your clusters:
Protocol support - Do you need more than HTTP(S), for example gRPC integration or WebSockets?
Commercial support - Do you need commercial support?
Advanced features - Are JWT/OAuth2 authentication or circuit breakers requirements for your applications?
API gateway features - Do you need API gateway functionality such as rate limiting?
Traffic distribution - Does your application require support for specialized traffic distribution like canary releases, A/B testing, or mirroring?
For our example, we have chosen to use the community version of the NGINX ingress controller.
For a longer list of ingress controllers to choose from, kubernetes.io maintains one at https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/.
Let’s deploy the NGINX ingress controller into our kind cluster.
kubectl apply -f ingress.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
As with all deployments, we must wait for the controller to be ready before we can use it. With the command below, we can verify that our ingress controller is ready for use.
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s
pod/ingress-nginx-controller-76b5f89575-zps4k condition met
The Controller is deployed to the cluster, and now we’re ready to write Ingress rules for our application.
Our YAML manifest defines several ingress rules to use with our Golang web server example.
kubectl apply -f ingress-rule.yaml
ingress.extensions/ingress-resource created
kubectl get ingress
NAME               CLASS    HOSTS   ADDRESS   PORTS   AGE
ingress-resource   <none>   *                 80      4s
With kubectl describe, we can see all the backends that map to the ClusterIP service and the pods.
kubectl describe ingress
Name:             ingress-resource
Namespace:        default
Address:
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host        Path   Backends
  ----        ----   --------
  *           /host  clusterip-service:8080 (10.244.1.6:8080,10.244.1.7:8080,10.244.1.8:8080)
Annotations:  kubernetes.io/ingress.class: nginx
Events:
  Type    Reason  Age  From                      Message
  ----    ------  ---- ----                      -------
  Normal  Sync    17s  nginx-ingress-controller  Scheduled for sync
Our ingress rule matches only the /host route and will route requests to the clusterip-service:8080 service. We can test that with a curl to http://localhost/host.
curl localhost/host
NODE: kind-worker2, POD IP: 10.244.1.6
curl localhost/healthz
Now that we can see how powerful ingresses are, let’s deploy a second deployment and ClusterIP service. Our new deployment and service will be used to answer requests for /data.
kubectl apply -f ingress-example-2.yaml
deployment.apps/app2 created
service/clusterip-service-2 configured
ingress.extensions/ingress-resource-2 configured
Now both /host and /data work, but they are routed to separate services.
curl localhost/host
NODE: kind-worker2, POD IP: 10.244.1.6
curl localhost/data
Database Connected
Since Ingresses work at Layer 7, there are many more options for routing traffic, such as the host header and URI path.
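A minimal sketch of host-header routing (the hostnames are illustrative; the services match our earlier exercises), which routes by the HTTP Host header rather than by path:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: host-routing
spec:
  rules:
  - host: app.example.com       # matched against the HTTP Host header
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: clusterip-service
            port:
              number: 8080
  - host: data.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: clusterip-service-2
            port:
              number: 8080
```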
For more advanced traffic routing and release patterns, a service mesh must be deployed in the cluster network. Let’s dig into that next.
A new cluster with default options has some limitations, so let’s get an understanding of what those limitations are and how a service mesh can resolve some of them. A service mesh is an API-driven infrastructure layer for handling service-to-service communication.
From a security point of view, all traffic inside the cluster is unencrypted between pods, and each application team that runs a service must configure monitoring separately for each service. We have discussed the Service types, but we have not discussed how to update deployments of pods behind them. Service meshes support more deployment types than the basic rolling update and recreate, such as canary deployments. From a developer’s perspective, injecting faults into the network is useful, but also not directly supported in default Kubernetes network deployments. With service meshes, developers can add fault testing; instead of just killing pods, you can use a service mesh to inject delays. Otherwise, each application would have to build in its own fault testing or circuit breaking.
There are several pieces of functionality that a Service Mesh enhances or provides in a default Kubernetes cluster network.
Instead of relying on DNS for service discovery, the service mesh manages service discovery and removes the need for it to be implemented in each individual application
The service mesh adds more advanced load balancing algorithms, such as least request, consistent hashing, and zone-aware balancing
The service mesh can increase communication resilience for applications by handling retries, timeouts, circuit breaking, and rate limiting, so they do not have to be implemented in application code
A service mesh can provide end-to-end encryption with mTLS between services, and authorization policies that authorize which services can communicate with each other, not just at Layers 3 and 4 as with Kubernetes network policies
Service meshes add observability by enriching Layer 7 metrics, and by adding tracing and alerting
Traffic shifting and mirroring in the cluster
All of this can be controlled via an API provided by the service mesh implementation
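As a sketch of how a mesh takes retries out of application code, Linkerd models per-route retries with a ServiceProfile resource. This example assumes the clusterip-service from our earlier exercises; the route and budget values are illustrative, not recommendations:

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # Named after the service's fully qualified DNS name
  name: clusterip-service.default.svc.cluster.local
  namespace: default
spec:
  routes:
  - name: GET /host
    condition:
      method: GET
      pathRegex: /host
    isRetryable: true          # the proxy retries failed requests on this route
  retryBudget:
    retryRatio: 0.2            # at most 20% additional load from retries
    minRetriesPerSecond: 10
    ttl: 10s
```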
Let’s walk through several components of a service mesh in Figure 5-8.
Traffic is handled differently depending on the component or destination of the traffic. Traffic into and out of the cluster is managed by the gateways. Traffic between the Frontend, Backend, and User services is handled by the service mesh and encrypted with mutual TLS. All the traffic to the Frontend, Backend, and User pods in the service mesh is proxied by the sidecar proxy deployed within each pod. Even if the control plane is down and updates cannot be made to the mesh, the service and application traffic is not affected.
There are several options to use when deploying a Service Mesh, here are highlights of just a few:
Istio
Uses a Go control plane with an Envoy Proxy
This is a Kubernetes-native solution that was initially released by Google, IBM, and Lyft
Consul
Uses Hashicorp Consul as the control plane
Consul Connect uses an agent installed on every node as a DaemonSet, which communicates with the Envoy sidecar proxies that handle routing and forwarding of traffic.
AWS App Mesh
Is an AWS-managed solution that implements its own control plane
Does not have mTLS or traffic policy
Uses the Envoy proxy for the Data plane
Linkerd
Also uses a Go control plane, with the Linkerd proxy for the data plane
No traffic shifting and no distributed tracing
Is a Kubernetes-only solution, which results in fewer moving pieces, and means that Linkerd has less complexity overall
It is our opinion that the best use case for a service mesh is mutual TLS between services. Other higher-level use cases for developers include circuit breaking and fault testing for APIs. For network administrators, advanced routing policies and algorithms can be deployed with service meshes.
Let’s look at a service mesh example. The first thing you need to do, if you haven’t already, is install the Linkerd CLI. Directions are at https://linkerd.io/2/getting-started/.
Your choices are a curl-to-bash script, or brew if you’re on a Mac.
curl -sL https://run.linkerd.io/install | sh

OR

brew install linkerd

linkerd version
Client version: stable-2.9.2
Server version: unavailable
The pre-flight checklist will verify that our cluster can run Linkerd.
linkerd check --pre
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

pre-kubernetes-capability
-------------------------
√ has NET_ADMIN capability
√ has NET_RAW capability

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √
The linkerd CLI tool can install Linkerd onto our kind cluster for us.
linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-identity created
serviceaccount/linkerd-identity created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-controller created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-controller created
serviceaccount/linkerd-controller created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-destination created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-destination created
serviceaccount/linkerd-destination created
role.rbac.authorization.k8s.io/linkerd-heartbeat created
rolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
serviceaccount/linkerd-heartbeat created
role.rbac.authorization.k8s.io/linkerd-web created
rolebinding.rbac.authorization.k8s.io/linkerd-web created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-web-admin created
serviceaccount/linkerd-web created
customresourcedefinition.apiextensions.k8s.io/serviceprofiles.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/trafficsplits.split.smi-spec.io created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
serviceaccount/linkerd-proxy-injector created
secret/linkerd-proxy-injector-k8s-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-proxy-injector-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-sp-validator created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-sp-validator created
serviceaccount/linkerd-sp-validator created
secret/linkerd-sp-validator-k8s-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-sp-validator-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-tap created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-tap-admin created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap-auth-delegator created
serviceaccount/linkerd-tap created
rolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap-auth-reader created
secret/linkerd-tap-k8s-tls created
apiservice.apiregistration.k8s.io/v1alpha1.tap.linkerd.io created
podsecuritypolicy.policy/linkerd-linkerd-control-plane created
role.rbac.authorization.k8s.io/linkerd-psp created
rolebinding.rbac.authorization.k8s.io/linkerd-psp created
configmap/linkerd-config created
secret/linkerd-identity-issuer created
service/linkerd-identity created
service/linkerd-identity-headless created
deployment.apps/linkerd-identity created
service/linkerd-controller-api created
deployment.apps/linkerd-controller created
service/linkerd-dst created
service/linkerd-dst-headless created
deployment.apps/linkerd-destination created
cronjob.batch/linkerd-heartbeat created
service/linkerd-web created
deployment.apps/linkerd-web created
deployment.apps/linkerd-proxy-injector created
service/linkerd-proxy-injector created
service/linkerd-sp-validator created
deployment.apps/linkerd-sp-validator created
service/linkerd-tap created
deployment.apps/linkerd-tap created
serviceaccount/linkerd-grafana created
configmap/linkerd-grafana-config created
service/linkerd-grafana created
deployment.apps/linkerd-grafana created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-prometheus created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-prometheus created
serviceaccount/linkerd-prometheus created
configmap/linkerd-prometheus-config created
service/linkerd-prometheus created
deployment.apps/linkerd-prometheus created
secret/linkerd-config-overrides created
As with the Ingress controller and MetalLB, we can see that a lot of components are installed in our cluster.
Linkerd can validate the installation with the linkerd check CLI command.
It will run a plethora of checks for the Linkerd install, including but not limited to the Kubernetes API version; the controllers, pods, and configs needed to run Linkerd; and all the services, versions, and APIs Linkerd requires.
linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-prometheus
------------------
√ prometheus add-on service account exists
√ prometheus add-on config map exists
√ prometheus pod is running

linkerd-grafana
---------------
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running

Status check results are √
Now that everything looks good with our Linkerd install, we can add our application to the service mesh.
kubectl -n linkerd get deploy
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
linkerd-controller       1/1     1            1           3m17s
linkerd-destination      1/1     1            1           3m17s
linkerd-grafana          1/1     1            1           3m16s
linkerd-identity         1/1     1            1           3m17s
linkerd-prometheus       1/1     1            1           3m16s
linkerd-proxy-injector   1/1     1            1           3m17s
linkerd-sp-validator     1/1     1            1           3m17s
linkerd-tap              1/1     1            1           3m17s
linkerd-web              1/1     1            1           3m17s
Let’s pull up the Linkerd console to investigate what we have just deployed. We can start the console with linkerd dashboard &. This will proxy the console to our local machine, available at http://localhost:50750.
linkerd viz install | kubectl apply -f -
linkerd viz dashboard
Linkerd dashboard available at:
http://localhost:50750
Grafana dashboard available at:
http://localhost:50750/grafana
Opening Linkerd dashboard in the default browser
If you’re having issues reaching the dashboard, you can run linkerd viz check and find more help at https://linkerd.io/2.10/tasks/troubleshooting/index.html.
We can see all our deployed objects from the previous exercises in Figure 5-9.
Our clusterip-service is not part of the Linkerd service mesh. We will need to use the proxy injector to add our service to the mesh. It accomplishes this by watching for a specific annotation, which can be added either with linkerd inject or by hand to the pod’s spec.
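If we were to add the annotation by hand instead of using linkerd inject, it would go in the pod template of the deployment. A minimal sketch against our app deployment (trimmed to the relevant fields; the full manifest is in the book’s code repository):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
      annotations:
        linkerd.io/inject: enabled   # the proxy injector watches for this
    spec:
      containers:
      - name: go-web
        image: strongjz/go-web:v0.0.6
```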
Let’s remove some of the older exercises’ resources for clarity.
kubectl delete -f ingress-example-2.yaml
deployment.apps "app2" deleted
service "clusterip-service-2" deleted
ingress.extensions "ingress-resource-2" deleted
kubectl delete pods app-5586fc9d77-7frts
pod "app-5586fc9d77-7frts" deleted
kubectl delete -f ingress-rule.yaml
ingress.extensions "ingress-resource" deleted
We can use the linkerd CLI to inject the proper annotations into our deployment spec so that it becomes part of the mesh. We first fetch our application manifest with cat web.yaml, use linkerd inject - to inject the annotations, then apply it back to the Kubernetes API with kubectl apply -f -.
cat web.yaml | linkerd inject - | kubectl apply -f -
deployment "app" injected
deployment.apps/app configured
If we describe our app deployment, we can see that Linkerd has injected a new annotation for us: Annotations: linkerd.io/inject: enabled.
kubectl describe deployment app
Name:                   app
Namespace:              default
CreationTimestamp:      Sat, 30 Jan 2021 13:48:47 -0500
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               app=app
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=app
  Annotations:  linkerd.io/inject: enabled
  Containers:
   go-web:
    Image:      strongjz/go-web:v0.0.6
    Port:       8080/TCP
    Host Port:  0/TCP
    Liveness:   http-get http://:8080/healthz delay=5s timeout=1s period=5s
    Readiness:  http-get http://:8080/ delay=5s timeout=1s period=5s
    Environment:
      MY_NODE_NAME:            (v1:spec.nodeName)
      MY_POD_NAME:             (v1:metadata.name)
      MY_POD_NAMESPACE:        (v1:metadata.namespace)
      MY_POD_IP:               (v1:status.podIP)
      MY_POD_SERVICE_ACCOUNT:  (v1:spec.serviceAccountName)
      DB_HOST:                 postgres
      DB_USER:                 postgres
      DB_PASSWORD:             mysecretpassword
      DB_PORT:                 5432
    Mounts:     <none>
  Volumes:      <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   app-78dfbb4854 (1/1 replicas created)
Events:
  Type    Reason             Age    From                    Message
  ----    ------             ----   ----                    -------
  Normal  ScalingReplicaSet  4m4s   deployment-controller   Scaled down app-5586fc9d77
  Normal  ScalingReplicaSet  4m4s   deployment-controller   Scaled up app-78dfbb4854
  Normal  Injected           4m4s   linkerd-proxy-injector  Linkerd sidecar injected
  Normal  ScalingReplicaSet  3m54s  deployment-controller   Scaled app-5586fc9d77
If we navigate to the app in the Dashboard we can see that our Deployment is part of the Linkerd Service Mesh now as shown in Figure 5-10.
http://localhost:50750/namespaces/default/deployments/app
The CLI can also display our stats for us.
linkerd stat deployments -n default NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN app 1/1 100.00% 0.4rps 1ms 1ms 1ms 1
Again let us scale up our deployment!
kubectl scale deploy app --replicas 10
deployment.apps/app scaled
In Figure 5-11, we navigate to the web browser and open this link so we can watch the stats in real time. Select the default namespace and, under Resources, our deployment/app. Then click Start for the web UI to begin displaying the metrics.
http://localhost:50750/top?namespace=default&resource=deployment%2Fapp
In a separate terminal let us use the netshoot image, but this time running inside our kind cluster.
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.0#
Let us send a few hundred queries and see the stats.
bash-5.0# for i in `seq 1 100`; do curl http://clusterip-service/host && sleep 2; done
In our terminal we can see all the liveness and readiness probes as well as our /host requests. tmp-shell is our netshoot bash terminal with our for loop running. 10.244.2.1, 10.244.3.1, and 10.244.2.1 are the kubelets of the hosts running our probes for us.
linkerd viz stat deploy NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN app 1/1 100.00% 0.7rps 1ms 1ms 1ms 3
Our example only showed the observability functionality of a service mesh. Linkerd, Istio, and the like have many more options available for developers and network administrators to control, monitor, and troubleshoot services running inside their cluster network. As with ingress controllers, there are many options and features available. It is up to you and your teams to decide which functionality and features are important for your networks.
The Kubernetes networking world is feature-rich, with many options for teams to deploy, test, and manage with their Kubernetes clusters. Each new addition adds complexity and overhead to cluster operations. We have given developers, network, and system administrators a view into the abstractions that Kubernetes offers.
From internal to the cluster to external, teams must choose which abstractions work best for their workloads. This is no small task, and now you are armed with the knowledge to begin those discussions.
In our next chapter, we take our Kubernetes Services and networking learnings to the cloud! We will explore the network services offered by each cloud provider and how they are integrated into their Kubernetes managed service offerings.