Chapter 1. Networking Introduction

“Guilty until proven innocent” is the mantra of networks and the engineers who supervise them. In this opening chapter, we will wade through the development of networking technologies and standards, give a brief overview of the dominant theory of networking, and introduce our Golang web server, which will be the common thread for networking examples in Kubernetes and the cloud throughout the book.

Let’s begin…at the beginning…

Networking History

The Internet we know today is vast, with cables spanning the oceans and mountains moved to connect cities with lower latency. Barrett Lyon’s Mapping the Internet, shown in Figure 1-1, shows just how vast it truly is. That image illustrates all the connections between the networks of networks that make up the Internet. The purpose of a network is to exchange information from one system to another. That is an enormous ask of a distributed global system, but the Internet was not always global; it started as a conceptual model and was slowly built up over time into the behemoth in Lyon’s visually stunning artwork. There are many factors to consider when learning about networking, from the last mile, the connectivity between a customer’s home and their internet service provider’s network, all the way up to the geopolitical landscape of the Internet. The Internet is integrated into the fabric of society. In this book, we will discuss how networks operate and how Kubernetes abstracts them for us.

Internet Art
Figure 1-1. Barrett Lyon, The Opte Project Mapping the Internet 2003
Note

Capital-I Internet refers to the network of networks that make up what we describe as the Internet. Lowercase-i internet refers to the connectivity of internal, private networks.

Table 1-1 gives you a brief outline of the history of networking before we dive into a few of the important details.

Table 1-1. A brief history of networking
Year  Event
1969  ARPANET first connection test
1969  Telnet RFC (Request for Comments) 15 drafted
1971  FTP RFC 114 drafted
1973  FTP RFC 354 drafted
1974  TCP RFC 675 drafted by Vint Cerf, Yogen Dalal, and Carl Sunshine
1980  Development of the Open Systems Interconnection model begins
1981  IP RFC 791 drafted
1982  NORSAR and University College London leave the ARPANET and begin using TCP/IP over SATNET
1984  ISO 7498 Open Systems Interconnection Reference Model (OSI model) published
1991  Al Gore helps pass the National Information Infrastructure (NII) bill
1991  First version of Linux released
2015  First version of Kubernetes released

In its earliest forms, networking was government run or sponsored; in the United States, the Department of Defense (DOD) sponsored the Advanced Research Projects Agency Network, ARPANET (well before Al Gore’s time in politics, which will be relevant in a moment). In 1969, ARPANET was deployed at UCLA, the Augmentation Research Center at Stanford Research Institute, the University of California, Santa Barbara, and the University of Utah School of Computing. Communication between nodes was not completed until 1970, when they began using the Network Control Protocol, NCP. NCP led to the development and use of the first computer-to-computer protocols like Telnet and File Transfer Protocol, FTP.

The success of ARPANET and NCP, the first protocol to power it, led to NCP’s downfall. It could not keep up with the demands of the network and the variety of networks connected to it. In 1974, Vint Cerf, Yogen Dalal, and Carl Sunshine began drafting RFC 675 for Transmission Control Protocol, TCP (more info on RFCs to come shortly). TCP would go on to be the standard for network connectivity. TCP allowed for exchanging packets across different types of networks. In 1981, Internet Protocol, IP, defined in RFC 791, helped break out responsibilities of TCP into a separate protocol, increasing the modularity of the network. In the following years many organizations, including the DOD, adopted TCP as the standard. By January 1983, TCP/IP had become the only approved protocol on the ARPANET, replacing the earlier NCP because of its versatility and modularity.

A competing standards organization, the International Organization for Standardization (ISO), developed and published ISO 7498, the Open Systems Interconnection Reference Model, or OSI model. With its publication also came the protocols to support it. Unfortunately, the OSI model protocols never gained traction and lost out to the popularity of TCP/IP. The OSI model is still an excellent learning tool for understanding the layered approach to networking.

In 1991, Al Gore invented the Internet (well, really he helped pass the National Information Infrastructure (NII) bill), which helped lead to the creation of the Internet Engineering Task Force, IETF. Nowadays standards for the Internet are under the management of the IETF, an open consortium of leading experts and companies in the field of networking, like Cisco and Juniper. Requests for Comments, RFCs, are published by the Internet Society and the IETF. RFCs are predominantly authored by individuals or groups of engineers and computer scientists, and they detail the processes, operations, and applications that underpin the Internet’s functioning.

An IETF standards-track RFC has two states:

Proposed Standard

A protocol specification that has gathered enough community support to be considered a standard. The design is stable and well understood. A proposed standard can be deployed, implemented, and tested. It may be withdrawn from further consideration, however.

Internet Standard

Per RFC 2026: “In general, an Internet Standard is a specification that is stable and well-understood, is technically competent, has multiple, independent, and interoperable implementations with substantial operational experience, enjoys significant public support, and is recognizably useful in some or all parts of the Internet.”

Note

Draft Standard is a third classification that was discontinued in 2011.

There are now thousands of RFCs defining how to implement protocols for all facets of networking, including wireless, encryption, and data formats, among others. Each one is implemented by contributors to open source projects and privately by large organizations like Cisco.

A lot has happened in the nearly 50 years since those first connectivity tests. Networks have grown in complexity and abstractions, so let’s start with the first thing I learned on the first day of my networking journey, the OSI model.

OSI model

The Open Systems Interconnection model, OSI model, is a conceptual framework for describing how two systems communicate over a network. The OSI model breaks down the responsibility of sending data across networks into layers. This works well for educational purposes to describe the breakdowns of relations between each layer’s responsibility and how data gets sent over networks. Interestingly enough, it was meant to be a protocol suite to power networks but lost to TCP/IP.

Here are the ISO standards that outline the OSI model and protocols.

  • ISO/IEC 7498-1 The Basic Model

  • ISO/IEC 7498-2 Security Architecture

  • ISO/IEC 7498-3 Naming and addressing

  • ISO/IEC 7498-4 Management framework

From ISO/IEC 7498-1, we have a description of what the OSI model is attempting to convey.

The OSI model description is a complex and exact way of saying that networks have layers, like cakes or onions. The OSI model breaks the responsibility of the network into seven distinct layers, each with different functions to aid in transmitting information from one system to another, as we can see in Figure 1-2. Each layer encapsulates the information from the layer above it. These layers are Application, Presentation, Session, Transport, Network, Data Link, and Physical. Over the next few pages, we will go over each layer’s role in sending data between two systems.

OSI Model
Figure 1-2. OSI Model Layers

Each layer takes data from the previous layer and encapsulates it to make its Protocol Data Unit, PDU. The PDU describes the data at each layer. PDUs are also a part of TCP/IP, and the terms appear throughout networking (and this book). The Application through Session layers treat the PDU simply as “Data,” preparing the application information for communication. The Transport layer PDU is the segment (or datagram); Transport uses ports to distinguish which process on the local system is responsible for the data. The Network layer PDU is the packet. Packets are distinct pieces of data routed between networks. The Data Link layer PDU is the frame. Each packet is placed into frames, checked for errors, and sent out on the local network. The Physical layer transmits the frame as bits over the medium. Next we will outline each layer in detail. Table 1-2 also highlights the OSI layers.
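The encapsulation described above can be sketched in a few lines of Go. This is an illustration only: the bracketed labels such as [TCP] stand in for real binary headers, and the helper names encapsulate and buildFrame are hypothetical, not part of any library.

```go
package main

import "fmt"

// encapsulate prepends a toy header label to the payload handed down from
// the layer above. Real headers are binary structures; plain-text labels
// are an assumption made here so the nesting is visible when printed.
func encapsulate(header string, payload []byte) []byte {
	return append([]byte(header), payload...)
}

// buildFrame walks application data down the stack, one PDU per layer.
func buildFrame(data []byte) []byte {
	segment := encapsulate("[TCP]", data)  // layer 4 PDU: segment
	packet := encapsulate("[IP]", segment) // layer 3 PDU: packet
	frame := encapsulate("[ETH]", packet)  // layer 2 PDU: frame
	return frame
}

func main() {
	fmt.Printf("%s\n", buildFrame([]byte("Hello")))
}
```

Running the sketch prints [ETH][IP][TCP]Hello: the outermost header belongs to the lowest layer, and the receiving host strips the headers in the opposite order.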

Note

There are many mnemonics for remembering the layers of the OSI model; my favorite is All People Seem To Need Data Processing.

Application

The Application layer is the top layer of the OSI model and is the one end users interact with every day. This layer is not where actual applications live, but it provides the interface for applications that use it, like a web browser or Office 365. The single biggest protocol here is HTTP; you are probably reading this book on a web page hosted by an O’Reilly web server. Other examples of the Application layer that we use daily are DNS, SSH, and SMTP. Those applications are responsible for displaying and arranging data requested and sent over the network.

Presentation

This layer provides independence from data representation by translating between application and network formats. It can be referred to as the syntax layer. This layer allows two systems to use different encodings for data and still pass data between them. Encryption is also done at this layer, but that is a more complicated story that we save for this chapter’s TLS section.

Session

The Session layer is responsible for the duplex of the connection, in other words, whether data is sent and received at the same time. It also establishes procedures for checkpointing, suspending, restarting, and terminating a session. It builds, manages, and terminates the connections between the local and remote applications.

Transport

The Transport layer provides transfer of data between applications, providing reliable data transfer services to the upper layers. The Transport layer controls a given connection’s reliability through flow control, segmentation and desegmentation, and error control. Some protocols are state- and connection-oriented. This layer tracks the segments and retransmits those that fail. It also provides the acknowledgment of successful data transmission and sends the next data if no errors occurred. TCP/IP has two protocols at this layer: Transmission Control Protocol, TCP, and User Datagram Protocol, UDP.

Network

The Network layer implements a means of transferring variable-length data flows from a host on one network to a host on a different network while sustaining service quality. The Network layer performs routing functions and might also perform fragmentation and reassembly while reporting delivery errors. Routers operate at this layer, sending data throughout the neighboring networks. Several management protocols belong to the Network layer, including routing protocols, multicast group management, network-layer information, error handling, and network-layer address assignment, which we will discuss further in the TCP/IP section later in this chapter.

Data Link

This layer is responsible for host-to-host transfers on the same network. It defines the protocols to create and terminate the connections between two devices. The Data Link layer transfers data between network hosts and provides the means to detect and possibly correct errors from the Physical layer. Data Link frames, the PDU for layer two, do not cross the boundaries of a local network.

Physical

The Physical layer is best represented by an Ethernet cable plugged into a switch. This layer converts data in the form of digital bits into electrical, radio, or optical signals. Think of physical devices, like cables, switches, and wireless access points. The wire signaling protocols are defined at this layer as well.

Table 1-2. OSI Layers Details
Layer number  Layer name    Protocol data unit  Function
7             Application   Data                High-level APIs and application protocols like HTTP, DNS, and SSH.
6             Presentation  Data                Character encoding, data compression, and encryption/decryption.
5             Session       Data                Continuous data exchanges between nodes are managed here: how much data to send and when to send more.
4             Transport     Segment, datagram   Transmission of data segments between endpoints on a network, including segmentation, acknowledgment, and multiplexing.
3             Network       Packet              Structuring and managing addressing, routing, and traffic control for all endpoints on the network.
2             Data link     Frame               Transmission of data frames between two nodes connected by a physical layer.
1             Physical      Bit                 Sending and receiving of bitstreams over the medium.

The OSI model breaks out all the necessary functions to send a data packet over a network between two hosts. In the late eighties and early nineties it lost out to TCP/IP as the standard adopted by the DOD and all other major players in networking. The standard defined in ISO 7498 gives a brief glimpse into the implementation details that were considered by most at the time to be complicated, inefficient, and to an extent unimplementable. At a high level, the OSI model still allows those learning networking to comprehend the basic concepts and challenges in networking. These terms and functions are used in TCP/IP in the next section, and ultimately in Kubernetes abstractions. Kubernetes Services are broken out by type depending on the layer they operate at, IP address or port, layer 3 or layer 4; more on that in the Kubernetes chapter. Next, we will do a deep dive into the TCP/IP suite with an example walk-through.

TCP/IP

TCP/IP creates a heterogeneous network with open protocols that are independent of operating system and architectural differences. Whether the hosts are running Windows, Linux, or another OS, TCP/IP allows them to communicate; TCP/IP does not care if you’re running Apache or Nginx for your web server at the Application layer. The breakdown in responsibilities, similar to the OSI model, makes that possible. In Figure 1-3 we compare the OSI model to TCP/IP, and in this section we will expand on those differences.

OSI Model
Figure 1-3. OSI model compared to TCP/IP

Application

In TCP/IP, the application layer comprises the communications protocols used in process-to-process communications across an IP network. The application layer standardizes communication and depends upon the underlying transport layer protocols to establish host-to-host data transfer. The lower transport layer also manages the data exchange in network communications. Applications at this layer have their own RFCs; in this book, we will continue to use HTTP, RFC 7231, as our example for the application layer.

Transport

TCP and UDP are the primary protocols of the transport layer that provide host-to-host communication services for applications. Transport protocols are responsible for connection-oriented communication, reliability, flow control, and multiplexing. In TCP, the window size manages flow control, while UDP does not manage congestion or flow and is considered unreliable; more on that in the UDP section later. Each port identifies the host process responsible for processing the information from the network communication. HTTP uses the well-known port 80 for nonsecure communication and 443 for secure communication. The port on the server identifies its traffic, and the sender generates a random port locally to identify itself. The governing body that manages port number assignments is the Internet Assigned Numbers Authority (IANA); port numbers are 16 bits wide, which gives 65,535 ports above the reserved port 0.
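To make the port number space concrete, here is a small sketch that sorts a port into the three ranges IANA uses in RFC 6335: system ports (0 to 1023), user ports (1024 to 49151), and dynamic or private ports (49152 to 65535). The function name classifyPort and the label strings are hypothetical, chosen for this illustration.

```go
package main

import "fmt"

// classifyPort returns the IANA range (per RFC 6335) that a port falls in.
// A uint16 can hold exactly the values 0 through 65535, which is why the
// last range needs no upper bound check.
func classifyPort(port uint16) string {
	switch {
	case port <= 1023:
		return "system (well-known)"
	case port <= 49151:
		return "user (registered)"
	default:
		return "dynamic/private"
	}
}

func main() {
	for _, p := range []uint16{80, 443, 8080, 53606} {
		fmt.Printf("%5d  %s\n", p, classifyPort(p))
	}
}
```

Ports 80 and 443 land in the system range, while 8080, the port our web server listens on, is a user port. Operating systems often pick client source ports from their own configured ephemeral range, which may differ from the IANA dynamic range.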

Internet

The internet, or network, layer is responsible for transmitting data between networks. For outgoing packets, it selects the next-hop host and transmits the packet to that host by passing it to the appropriate link-layer implementation. Once the packet is received by the destination, the internet layer passes the packet payload up to the appropriate transport layer protocol.

Internet Protocol, IP, provides fragmentation and reassembly of packets based on the maximum transmission unit, MTU, the maximum size of an IP packet on a given link. IP makes no guarantees about packets’ proper arrival. Since packet delivery across diverse networks is inherently unreliable and failure-prone, that burden lies with the endpoints of a communication path rather than with the network. The function of providing service reliability belongs to the transport layer. A header checksum lets the receiver detect a corrupted IP header, but this layer does not validate the integrity of the data it carries. The IP address identifies the source and destination of packets on the network.
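The checksum used by IPv4 headers (and by TCP and UDP) is the 16-bit one’s-complement sum described in RFC 1071. The following is a minimal sketch of that algorithm, not the kernel’s implementation; the function name internetChecksum is hypothetical, and the sample bytes are the worked example from RFC 1071.

```go
package main

import "fmt"

// internetChecksum computes the 16-bit one's-complement checksum described
// in RFC 1071 over b. An odd trailing byte is padded with a zero byte.
func internetChecksum(b []byte) uint16 {
	var sum uint32
	// Add the data as a sequence of big-endian 16-bit words.
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(b[i])<<8 | uint32(b[i+1])
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	// Fold any carries above 16 bits back into the low 16 bits.
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	// The checksum is the one's complement of the folded sum.
	return ^uint16(sum)
}

func main() {
	sample := []byte{0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7}
	fmt.Printf("0x%04x\n", internetChecksum(sample))
}
```

For the sample bytes this prints 0x220d. A receiver runs the same sum over the data with the checksum included; a result of zero means no corruption was detected.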

Link Layer

The link layer in the TCP/IP model comprises networking protocols that operate only on the local network that a host connects to. Packets are not routed to nonlocal networks; that is the internet layer’s role. Ethernet is the dominant protocol at this layer, and hosts are identified by their link-layer address, commonly the Media Access Control (MAC) address of their network interface card. A host determines the link-layer address of a local destination using the Address Resolution Protocol, ARP; data sent off the local network is processed by the internet layer. This layer also includes protocols for moving packets between two internet layer hosts.

Physical Layer

The physical layer defines the components of the hardware to use for the network. For example, the physical network layer stipulates the physical characteristics of the communications media. The physical layer of TCP/IP details hardware standards such as IEEE 802.3, the specification for Ethernet network media. Several interpretations of RFC 1122 fold the physical layer into the other layers; we have included it here for completeness.

Throughout this book, we will use the minimal Golang web server from Example 1-1 to show various levels of networking components, from tcpdump and Linux syscalls to how Kubernetes abstracts them. This section will use it to demonstrate what is happening at the Application, Transport, Network, and Data Link layers.

Application

The Application layer is the highest layer in the TCP/IP stack; it is where the user interacts with data before it gets sent over the network. In our example walk-through, we are going to use Hypertext Transfer Protocol, HTTP, and a simple HTTP transaction to demonstrate what happens at each layer in the TCP/IP stack.

HTTP

HTTP is responsible for sending and receiving Hypertext Markup Language documents; you know, web pages. A vast majority of what we see and do on the Internet is over HTTP: Amazon purchases, Reddit posts, and tweets all use HTTP. A client will make an HTTP request to our minimal Go web server from Example 1-1, and it will send an HTTP response with text containing “Hello”. The web server runs locally inside an Ubuntu virtual machine to test the full TCP/IP stack.

Note

See the example code repo for full instructions.

Example 1-1. Minimal web server in Go
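package main

import (
	"fmt"
	"log"
	"net/http"
)

// hello writes the plain-text body "Hello" for every request.
func hello(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, "Hello")
}

func main() {
	// Route every path to the hello handler.
	http.HandleFunc("/", hello)
	// ListenAndServe blocks until the server fails; log the reason and exit.
	log.Fatal(http.ListenAndServe("0.0.0.0:8080", nil))
}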

In our Ubuntu VM we need to start our minimal web server; if you have Golang installed locally, you can just run:

go run web-server.go

Let us break down the request for each layer of the TCP/IP stack.

cURL is the requesting client for our HTTP request example. Generally, for a web page the client would be a web browser, but we’re using cURL to simplify things and stay on the command line.

Note

cURL is meant for uploading and downloading data specified with a URL. It is a client-side program (the c stands for client) that requests data from a URL and returns the response. https://curl.haxx.se/

In Example 1-2 we can see each part of the HTTP request that the cURL client is making and the response. Let’s review what all those options and outputs are below.

Example 1-2. Client Request
  curl localhost:8080 -vvv 1
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 2
> GET / HTTP/1.1 3
> Host: localhost:8080 4
> User-Agent: curl/7.64.1 5
> Accept: */* 6
>
< HTTP/1.1 200 OK 7
< Date: Sat, 25 Jul 2020 14:57:46 GMT 8
< Content-Length: 5 9
< Content-Type: text/plain; charset=utf-8 10
<
* Connection #0 to host localhost left intact
Hello* Closing connection 0 11
1

curl localhost:8080 -vvv This is the curl command opening a connection to the locally running web server, localhost on TCP port 8080. -vvv sets the verbosity of the output, so we can see everything happening with the request. Also, TCP_NODELAY is set, which instructs the TCP connection to send the data without delay; it is one of many options available for the client to set.

2

Connected to localhost (::1) port 8080 It worked! curl connected to the web server, on localhost, and port 8080.

3

GET / HTTP/1.1 HTTP has several methods for retrieving or updating information. In our request, we are performing an HTTP GET to retrieve our “Hello” response. The forward slash is the next part, the path of the uniform resource locator (URL), which indicates where on the server we are sending the client request. The last section of this request line is the version of HTTP the client is using, 1.1.

4

Host: localhost:8080 HTTP has several options for sending information about the request. In our request, the cURL process has set and sent the HTTP Host header. The client and server can transmit additional information with an HTTP request or response through headers. An HTTP header contains its name followed by a colon (:), then its value.

5

User-Agent: curl/7.64.1 The user agent is a string that indicates the computer program making the HTTP request on behalf of the end user; it is cURL in our context. This string often identifies the browser, its version number, and its host operating system.

6

Accept: */* This header tells the web server what content types the client understands. Table 1-3 shows examples of common content types that can be sent.

7

HTTP/1.1 200 OK This is the server response to our request. The server responds with the HTTP version and the response status code. There are several possible responses from the server. A status code of 200 indicates the response was successful. 1XX responses are informational, 2XX are successful, 3XX are redirects, 4XX indicate there are issues with the request, and 5XX generally refer to issues on the server.

8

Date: Sat, 25 Jul 2020 14:57:46 GMT The “Date” header field represents the date and time at which the message originated. The sender generates the value as the approximate date and time of message generation.

9

Content-Length: 5 The Content-Length header indicates the size of the message body, in bytes, sent to the recipient; in our case the message is 5 bytes.

10

Content-Type: text/plain; charset=utf-8 The Content-Type entity header is used to indicate the resource’s media type. Our response indicates that it is returning plain text that is UTF-8 encoded.

11

Hello* Closing connection 0 This prints out the response from our web server and closes out the HTTP connection.
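The same request can be made from Go instead of cURL. The sketch below is an illustration under a few assumptions: it starts the hello handler on a throwaway test server from net/http/httptest (bound to a random loopback port rather than localhost:8080) so that it is self-contained, and the names result, fetch, and demo are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// result holds the parts of the HTTP response discussed in the callouts.
type result struct {
	Status      int
	ContentType string
	Length      int64
	Body        string
}

// hello is the same handler used by the minimal web server.
func hello(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, "Hello")
}

// fetch performs an HTTP GET and collects the status code, the
// Content-Type and Content-Length headers, and the body.
func fetch(url string) (result, error) {
	resp, err := http.Get(url)
	if err != nil {
		return result{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return result{}, err
	}
	return result{
		Status:      resp.StatusCode,
		ContentType: resp.Header.Get("Content-Type"),
		Length:      resp.ContentLength,
		Body:        string(body),
	}, nil
}

// demo serves hello on a temporary test server and fetches it once.
func demo() (result, error) {
	srv := httptest.NewServer(http.HandlerFunc(hello))
	defer srv.Close()
	return fetch(srv.URL)
}

func main() {
	r, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d %s (%d bytes): %s\n", r.Status, r.ContentType, r.Length, r.Body)
}
```

The handler never sets Content-Type or Content-Length itself; Go’s net/http server fills them in for a short response like this one, which is why they match the cURL output.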

Table 1-3. Common Content Types for HTTP Data

Type         Description
application  Any kind of binary data that doesn’t fall explicitly into one of the other types. Common examples include application/json, application/pdf, application/pkcs8, and application/zip.
audio        Audio or music data. Examples include audio/mpeg, audio/vorbis.
font         Font/typeface data. Common examples include font/woff, font/ttf, and font/otf.
image        Image or graphical data including both bitmap and vector such as animated GIF or APNG. Common examples are image/jpeg, image/png, and image/svg+xml.
model        Model data for a 3D object or scene. Examples include model/3mf and model/vrml.
text         Text-only data including human-readable content, source code, or text data. Examples include text/plain, text/csv, and text/html.
video        Video data or files, such as video/mp4.
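A Content-Type value such as text/plain; charset=utf-8 combines a top-level type from the table above, a subtype, and optional parameters. As a rough sketch, Go’s standard mime package can take one apart; the helper name splitContentType is hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"mime"
	"strings"
)

// splitContentType breaks a Content-Type header value into its top-level
// type, its subtype, and its charset parameter (empty when absent).
func splitContentType(value string) (topLevel, subtype, charset string, err error) {
	mediaType, params, err := mime.ParseMediaType(value)
	if err != nil {
		return "", "", "", err
	}
	parts := strings.SplitN(mediaType, "/", 2)
	if len(parts) != 2 {
		return "", "", "", fmt.Errorf("media type %q has no subtype", mediaType)
	}
	return parts[0], parts[1], params["charset"], nil
}

func main() {
	top, sub, charset, err := splitContentType("text/plain; charset=utf-8")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(top, sub, charset)
}
```

For the header returned by our web server this prints text plain utf-8.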

This is a simplistic view of what happens with every HTTP request. With today’s websites, a single web page makes an exorbitant number of requests in one page load, and in seconds! This is a brief example for cluster administrators of how HTTP, and for that matter other layer seven applications, operate. We will continue to build our knowledge of how this request is completed at each layer of the TCP/IP stack, and then how Kubernetes completes those same requests. All of this data is formatted and its options set at layer seven, but the real heavy lifting is done at the lower layers of the TCP/IP stack, which we will go over in the next sections.

Transport

Transport protocols are responsible for connection-oriented communication, reliability, flow control, and multiplexing; this is mostly true of TCP. We’ll describe the differences in the following sections. Our Golang web server is a layer seven application using HTTP; the transport layer protocol that HTTP relies on is TCP.

TCP

As we already mentioned, TCP is a connection-oriented, reliable protocol that provides flow control and multiplexing. TCP is considered connection-oriented because it manages the connection state through the lifecycle of the connection. In TCP, the window size manages flow control, unlike UDP, which does not manage congestion or flow. In addition, UDP is unreliable, and data may arrive out of sequence. Each port identifies the host process responsible for processing the information from the network communication. TCP is known as a host-to-host layer protocol. To identify the process on the host responsible for the connection, TCP identifies the segments with a 16-bit port number. HTTP servers use the well-known port 80 for nonsecure communication and 443 for secure communication using TLS. Clients requesting a new connection create a local source port in the range 0-65535.

To understand how TCP performs multiplexing, let us review a simple HTML page retrieval again.

  1. In a web browser type in a web page address.

  2. The browser opens a connection to transfer the page.

  3. The browser opens connections for each image on the page.

  4. The browser opens another connection for the external CSS.

  5. Each of these connections uses a different set of virtual ports.

  6. All the page’s assets download simultaneously.

  7. The browser reconstructs the page.
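The list above hinges on each connection getting its own source port. A minimal sketch of that behavior, assuming a loopback listener on an OS-chosen port stands in for the web server and with the hypothetical helper name openConnections:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
)

// openConnections starts a TCP listener on a loopback port chosen by the
// operating system, dials it n times, and returns the client-side source
// port of each connection together with the server port.
func openConnections(n int) (sourcePorts []int, serverPort int, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	defer ln.Close()
	serverPort = ln.Addr().(*net.TCPAddr).Port

	// Accept connections in the background and drain each one until the
	// client hangs up.
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer c.Close()
				io.Copy(io.Discard, c)
			}()
		}
	}()

	// Keep every client connection open until we return, so the source
	// ports are all in use at the same time.
	var conns []net.Conn
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return nil, 0, err
		}
		conns = append(conns, c)
		sourcePorts = append(sourcePorts, c.LocalAddr().(*net.TCPAddr).Port)
	}
	return sourcePorts, serverPort, nil
}

func main() {
	ports, serverPort, err := openConnections(3)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("server port:", serverPort, "client source ports:", ports)
}
```

All three connections share the same destination address and port; the differing source ports are what let both hosts tell the streams apart. The actual port numbers vary from run to run.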

In Figure 1-4, we can see all the TCP segment header fields that provide metadata about the TCP streams. Let us walk through how TCP manages multiplexing with the information provided in the TCP segment headers.

TCP Segment Header
Figure 1-4. TCP Segment Header

  • Source port (16 bits) Identifies the sending port.

  • Destination port (16 bits) Identifies the receiving port.

  • Sequence number (32 bits) If the SYN flag is set, this is the initial sequence number. The sequence number of the first data byte, and the acknowledged number in the corresponding ACK, are then this sequence number plus 1. It is also used to reassemble data if it arrives out of order.

  • Acknowledgment number (32 bits) If the ACK flag is set, this field’s value is the next sequence number that the sender of the ACK is expecting. This acknowledges receipt of all preceding bytes (if any). Each end’s first ACK acknowledges the other end’s initial sequence number itself, but no data has been sent.

  • Data offset (4 bits) Specifies the size of the TCP header in 32-bit words.

  • Reserved (3 bits) For future use and should be set to zero.

  • Flags (9 bits) There are nine 1-bit fields defined for the TCP header.

    1. NS: ECN-nonce, concealment protection.

    2. CWR: Congestion Window Reduced, the sender reduced its sending rate.

    3. ECE: ECN Echo, the sender received an earlier congestion notification.

    4. URG: Urgent, the Urgent Pointer field is valid (rarely used).

    5. ACK: Acknowledgment, the Acknowledgment Number field is valid (always on after a connection is established).

    6. PSH: Push, the receiver should pass this data to the application as soon as possible.

    7. RST: Reset the connection or connection abort, usually because of an error.

    8. SYN: Synchronize sequence numbers to initiate a connection.

    9. FIN: The sender of the segment is finished sending data to its peer.

Note

The NS bit field is further explained in “RFC 3540 Robust Explicit Congestion Notification (ECN) Signaling with Nonces”. This specification describes an optional addition to Explicit Congestion Notification improving robustness against malicious or accidental concealment of marked packets.

  • Window size (16 bits) The size of the receive window.

  • Checksum (16 bits) Used for error-checking of the TCP header, the payload, and an IP pseudo-header.

  • Urgent pointer (16 bits) An offset from the sequence number indicating the last urgent data byte.

  • Options (variable, 0–320 bits, in units of 32 bits).

  • Padding The TCP header padding is used to ensure that the TCP header ends, and data begins, on a 32-bit boundary.

  • Data The piece of application data being sent in this segment.
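The field layout above maps directly onto byte offsets in the fixed 20-byte part of the header. The following sketch decodes those fields from raw bytes; the type tcpHeader, the flag constants, and parseTCPHeader are hypothetical names for this illustration, and the sample bytes are a hand-built SYN segment rather than captured traffic.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"log"
)

// Flag bits, counted from the least significant bit of the 9-bit field.
const (
	flagFIN = 1 << iota
	flagSYN
	flagRST
	flagPSH
	flagACK
	flagURG
	flagECE
	flagCWR
	flagNS
)

// tcpHeader holds the fixed fields of a TCP segment header.
type tcpHeader struct {
	SrcPort, DstPort uint16
	Seq, Ack         uint32
	DataOffset       uint8  // header length in 32-bit words
	Flags            uint16 // low 9 bits hold the flags
	Window           uint16
	Checksum         uint16
	Urgent           uint16
}

// parseTCPHeader decodes the first 20 bytes of b. All multi-byte fields
// are big-endian (network byte order). Options are not decoded.
func parseTCPHeader(b []byte) (tcpHeader, error) {
	if len(b) < 20 {
		return tcpHeader{}, errors.New("TCP header needs at least 20 bytes")
	}
	offsetAndFlags := binary.BigEndian.Uint16(b[12:14])
	return tcpHeader{
		SrcPort:    binary.BigEndian.Uint16(b[0:2]),
		DstPort:    binary.BigEndian.Uint16(b[2:4]),
		Seq:        binary.BigEndian.Uint32(b[4:8]),
		Ack:        binary.BigEndian.Uint32(b[8:12]),
		DataOffset: uint8(offsetAndFlags >> 12),
		Flags:      offsetAndFlags & 0x01ff,
		Window:     binary.BigEndian.Uint16(b[14:16]),
		Checksum:   binary.BigEndian.Uint16(b[16:18]),
		Urgent:     binary.BigEndian.Uint16(b[18:20]),
	}, nil
}

func main() {
	// Hand-built SYN: source port 53606, destination port 8080, sequence
	// number 1, data offset 5 words, window 65535, checksum left at zero.
	syn := []byte{
		0xd1, 0x66, 0x1f, 0x90,
		0x00, 0x00, 0x00, 0x01,
		0x00, 0x00, 0x00, 0x00,
		0x50, 0x02, 0xff, 0xff,
		0x00, 0x00, 0x00, 0x00,
	}
	h, err := parseTCPHeader(syn)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d -> %d seq=%d header=%d bytes SYN=%t ACK=%t\n",
		h.SrcPort, h.DstPort, h.Seq, int(h.DataOffset)*4,
		h.Flags&flagSYN != 0, h.Flags&flagACK != 0)
}
```

This prints 53606 -> 8080 seq=1 header=20 bytes SYN=true ACK=false. A data offset of 5 means there are no options, since five 32-bit words is exactly the 20-byte fixed header.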

The fields reviewed above help manage the flow of data between two systems. Figure 1-5 shows how each step of the TCP/IP stack sends data from one Application on one host, through a network communicating at layer one and two, to get data to the destination host. In our next section we will show how TCP uses these fields to initiate a connection through the three-way handshake.

OSI Model
Figure 1-5. TCP/IP Data Flow

TCP Handshake

TCP uses a three-way handshake, pictured in Figure 1-6, to create a connection, exchanging information along the way with various options and flags.

  1. The requesting node sends a connection request via a SYN packet (SYNchronize, the computer equivalent of “what’s up?”) to get the transmission started.

  2. If the receiving node is listening on the port the sender requests, the receiving node replies with a SYN-ACK, ACKnowledging that it has heard the requesting node.

  3. The requesting node returns an ACK packet, letting both nodes know they are good to send each other information.

OSI Model
Figure 1-6. TCP three-way handshake
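Applications rarely see the handshake itself, because the operating system performs it inside the connect call. As a rough sketch, assuming a loopback listener on an OS-chosen port and with the hypothetical helper names canConnect and handshakeDemo, we can observe its outcome: a dial succeeds while something is listening on the port and fails once nothing is.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// canConnect reports whether a TCP connection to addr can be established,
// which requires the three-way handshake to complete.
func canConnect(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// handshakeDemo tries to connect to a loopback port while a listener is
// open on it, and again after the listener has been closed.
func handshakeDemo() (whileListening, afterClose bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	addr := ln.Addr().String()

	// The kernel answers the SYN with a SYN-ACK on behalf of the listening
	// socket, so the dial succeeds even though Accept is never called.
	whileListening = canConnect(addr)

	// With the listener gone, the connection attempt is refused.
	ln.Close()
	afterClose = canConnect(addr)
	return whileListening, afterClose, nil
}

func main() {
	listening, closed, err := handshakeDemo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected while listening:", listening)
	fmt.Println("connected after close:", closed)
}
```

On a typical host the first line reports true and the second false. To watch the SYN, SYN-ACK, and ACK segments themselves, a packet capture tool such as tcpdump is needed.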

Now the connection is established. Data can be transmitted over the physical medium, routed between networks, and find its way to the local destination, but how does the endpoint know how to handle the information? On the local and remote hosts, a socket gets created to track this connection. A socket is just a logical endpoint for communication. In Chapter 2, we will discuss how a Linux client and server handle sockets.

TCP is a stateful protocol, tracking the connection’s state throughout its lifecycle. The state of the connection depends on both the sender and receiver agreeing where they are in the connection flow. The connection state is concerned with who is sending and receiving data in the TCP stream. TCP has a complex set of state transitions for explaining when and where the connection is, driven by the 9-bit TCP flags in the TCP segment header, as you can see in Figure 1-7.

TCP State Diagram
Figure 1-7. TCP State Transition Diagram

The TCP Connection States are:

  • LISTEN (server) represents waiting for a connection request from any remote TCP and port.

  • SYN-SENT (client) represents waiting for a matching connection request after sending a connection request.

  • SYN-RECEIVED (server) represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

  • ESTABLISHED (both server and client) represents an open connection; data received can be delivered to the user. This is the normal state for the data transfer phase of the connection.

  • FIN-WAIT-1 (both server and client) represents waiting for a connection termination request from the remote host, or an acknowledgment of the connection termination request previously sent.

  • FIN-WAIT-2 (both server and client) represents waiting for a connection termination request from the remote TCP.

  • CLOSE-WAIT (both server and client) represents waiting for a local user’s connection termination request.

  • CLOSING (both server and client) represents waiting for a connection termination request acknowledgment from the remote TCP.

  • LAST-ACK (both server and client) represents waiting for an acknowledgment of the connection termination request previously sent to the remote host.

  • TIME-WAIT (either server or client) represents waiting for enough time to pass to ensure the remote host received the acknowledgment of its connection termination request.

  • CLOSED (both server and client) represents no connection state at all.
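
The states above can be read as a small state machine. The sketch below is an illustration only: the event names are hypothetical labels, and the transition table covers just the common client (active open, active close) and server (passive open, passive close) paths, leaving out CLOSING and the other simultaneous-open and simultaneous-close cases.

```go
package main

import "fmt"

// State is one of the TCP connection states listed above.
type State string

const (
	Closed      State = "CLOSED"
	Listen      State = "LISTEN"
	SynSent     State = "SYN-SENT"
	SynReceived State = "SYN-RECEIVED"
	Established State = "ESTABLISHED"
	FinWait1    State = "FIN-WAIT-1"
	FinWait2    State = "FIN-WAIT-2"
	CloseWait   State = "CLOSE-WAIT"
	LastAck     State = "LAST-ACK"
	TimeWait    State = "TIME-WAIT"
)

// Event names are hypothetical labels chosen for this sketch.
type Event string

const (
	ActiveOpen  Event = "active open (send SYN)"
	PassiveOpen Event = "passive open"
	RecvSyn     Event = "receive SYN (send SYN-ACK)"
	RecvSynAck  Event = "receive SYN-ACK (send ACK)"
	RecvAck     Event = "receive ACK"
	Close       Event = "close (send FIN)"
	RecvFin     Event = "receive FIN (send ACK)"
	Timeout     Event = "wait timer expires"
)

type step struct {
	from State
	on   Event
}

// transitions holds only the common open and close paths.
var transitions = map[step]State{
	{Closed, ActiveOpen}:   SynSent,
	{SynSent, RecvSynAck}:  Established,
	{Closed, PassiveOpen}:  Listen,
	{Listen, RecvSyn}:      SynReceived,
	{SynReceived, RecvAck}: Established,
	{Established, Close}:   FinWait1,
	{FinWait1, RecvAck}:    FinWait2,
	{FinWait2, RecvFin}:    TimeWait,
	{TimeWait, Timeout}:    Closed,
	{Established, RecvFin}: CloseWait,
	{CloseWait, Close}:     LastAck,
	{LastAck, RecvAck}:     Closed,
}

// walk applies events in order and returns every state visited,
// starting with the initial one.
func walk(start State, events ...Event) ([]State, error) {
	path := []State{start}
	cur := start
	for _, e := range events {
		next, ok := transitions[step{cur, e}]
		if !ok {
			return path, fmt.Errorf("no transition from %s on %q", cur, e)
		}
		cur = next
		path = append(path, cur)
	}
	return path, nil
}

func main() {
	client, _ := walk(Closed, ActiveOpen, RecvSynAck, Close, RecvAck, RecvFin, Timeout)
	fmt.Println("client:", client)
	server, _ := walk(Closed, PassiveOpen, RecvSyn, RecvAck, RecvFin, Close, RecvAck)
	fmt.Println("server:", server)
}
```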

Example 1-3 is a sample of the TCP connections on James’s Mac, their state, and the addresses for both ends of each connection.

Example 1-3. TCP Connection States
○ → netstat -ap TCP
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp6       0      0  2607:fcc8:a205:c.53606 g2600-1407-2800-.https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53603 g2600-1408-5c00-.https ESTABLISHED
tcp4       0      0  192.168.0.17.53602     ec2-3-22-64-157..https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53600 g2600-1408-5c00-.https ESTABLISHED
tcp4       0      0  192.168.0.17.53598     164.196.102.34.b.https ESTABLISHED
tcp4       0      0  192.168.0.17.53597     server-99-84-217.https ESTABLISHED
tcp4       0      0  192.168.0.17.53596     151.101.194.137.https  ESTABLISHED
tcp4       0      0  192.168.0.17.53587     ec2-52-27-83-248.https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53586 iad23s61-in-x04..https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53542 iad23s61-in-x04..https ESTABLISHED
tcp4       0      0  192.168.0.17.53536     ec2-52-10-162-14.https ESTABLISHED
tcp4       0      0  192.168.0.17.53530     server-99-84-178.https ESTABLISHED
tcp4       0      0  192.168.0.17.53525     ec2-52-70-63-25..https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53480 upload-lb.eqiad..https ESTABLISHED
tcp6       0      0  2607:fcc8:a205:c.53477 text-lb.eqiad.wi.https ESTABLISHED
tcp4       0      0  192.168.0.17.53466     151.101.1.132.https    ESTABLISHED
tcp4       0      0  192.168.0.17.53420     ec2-52-0-84-183..https ESTABLISHED
tcp4       0      0  192.168.0.17.53410     192.168.0.18.8060      CLOSE_WAIT
tcp6       0      0  2607:fcc8:a205:c.53408 2600:1901:1:c36:.https ESTABLISHED
tcp4       0      0  192.168.0.17.53067     ec2-52-40-198-7..https ESTABLISHED
tcp4       0      0  192.168.0.17.53066     ec2-52-40-198-7..https ESTABLISHED
tcp4       0      0  192.168.0.17.53055     ec2-54-186-46-24.https ESTABLISHED
tcp4       0      0  localhost.16587        localhost.53029        ESTABLISHED
tcp4       0      0  localhost.53029        localhost.16587        ESTABLISHED
tcp46      0      0  *.16587                *.*                    LISTEN
tcp6      56      0  2607:fcc8:a205:c.56210 ord38s08-in-x0a..https CLOSE_WAIT
tcp6       0      0  2607:fcc8:a205:c.51699 2606:4700::6810:.https ESTABLISHED
tcp4       0      0  192.168.0.17.64407     do-77.lastpass.c.https ESTABLISHED
tcp4       0      0  192.168.0.17.64396     ec2-54-70-97-159.https ESTABLISHED
tcp4       0      0  192.168.0.17.60612     ac88393aca5853df.https ESTABLISHED
tcp4       0      0  192.168.0.17.58193     47.224.186.35.bc.https ESTABLISHED
tcp4       0      0  localhost.63342        *.*                    LISTEN
tcp4       0      0  localhost.6942         *.*                    LISTEN
tcp4       0      0  192.168.0.17.55273     ec2-50-16-251-20.https ESTABLISHED

Now that we know more about how TCP constructs and tracks connections, let us review the HTTP request for our web server at the transport layer using TCP. In order to accomplish this, we use a command-line tool called tcpdump.

TCP dump

Tcpdump prints out a description of the contents of packets on a network interface that matches the boolean expression.

tcpdump manpage

Tcpdump allows administrators and users to display all the packets processed on the system and filter them based on many TCP segment header details. In the request, we filter all TCP packets to or from port 8080 on the network interface labeled lo0; this is the local loopback interface on the Mac. Our web server is running on 0.0.0.0:8080. Figure 1-8 shows where tcpdump is collecting data in reference to the full TCP/IP stack, between the NIC driver and layer two.

Note

A loopback interface is a logical, virtual interface on a device. A loopback interface is not a physical interface like an Ethernet interface. Loopback interfaces are always up and running and always available, even if other interfaces are down on the host.

TCP DUMP Packet Capture
Figure 1-8. TCP Dump Packet Capture

The general format of a tcpdump output line contains the following fields: tos, TTL, id, offset, flags, proto, length, and options.

  • tos is the type of service field.

  • TTL is the time-to-live; it is not reported if it is zero.

  • id is the IP identification field.

  • offset is the fragment offset field; it is printed whether this is part of a fragmented datagram or not.

  • flags are the IP flags. The DF, Don’t Fragment, flag indicates that the packet cannot be fragmented for transmission; when unset, it indicates that the packet can be fragmented. The MF, More Fragments, flag indicates that more fragments of the packet follow; when unset, it indicates that no more fragments remain.

  • proto is the protocol ID field.

  • length is the total length field.

  • options are the IP options.

On systems that support checksum offloading, IP, TCP, and UDP checksums are calculated on the NIC before being transmitted on the wire. Since we are running the tcpdump packet capture before the NIC, errors like cksum 0xfe34 (incorrect -> 0xb4c1) appear in the output of Example 1-4.

To produce the output for Example 1-4, open another terminal and start a tcpdump trace on the loopback for only TCP and port 8080; otherwise, you will see a lot of other packets not relevant to our example. You’ll need to use escalated privileges to trace packets, so sudo in this case.

Example 1-4. TCP Dump
  sudo tcpdump -i lo0 tcp port 8080 -vvv  1

tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes  2

08:13:55.009899 localhost.50399 > localhost.http-alt: Flags [S], cksum 0x0034 (incorrect -> 0x1bd9), seq 2784345138,
win 65535, options [mss 16324,nop,wscale 6,nop,nop,TS val 587364215 ecr 0,sackOK,eol], length 0 3

08:13:55.009997 localhost.http-alt > localhost.50399: Flags [S.], cksum 0x0034 (incorrect -> 0xbe5a), seq 195606347,
ack 2784345139, win 65535, options [mss 16324,nop,wscale 6,nop,nop,TS val 587364215 ecr 587364215,sackOK,eol], length 0  4

08:13:55.010012 localhost.50399 > localhost.http-alt: Flags [.], cksum 0x0028 (incorrect -> 0x1f58), seq 1, ack 1,
win 6371, options [nop,nop,TS val 587364215 ecr 587364215], length 0  5

08:13:55.010021 localhost.http-alt > localhost.50399: Flags [.], cksum 0x0028 (incorrect -> 0x1f58), seq 1, ack
1, win 6371, options [nop,nop,TS val 587364215 ecr 587364215], length 0  6

08:13:55.010079 localhost.50399 > localhost.http-alt: Flags [P.], cksum 0x0076 (incorrect -> 0x78b2), seq 1:79,
ack 1, win 6371, options [nop,nop,TS val 587364215 ecr 587364215], length 78: HTTP, length: 78  7
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.64.1
Accept: */*
08:13:55.010102 localhost.http-alt > localhost.50399: Flags [.], cksum 0x0028 (incorrect -> 0x1f0b), seq 1,
ack 79, win 6370, options [nop,nop,TS val 587364215 ecr 587364215], length 0  8

08:13:55.010198 localhost.http-alt > localhost.50399: Flags [P.], cksum 0x00a1 (incorrect -> 0x05d7), seq 1:122,
ack 79, win 6370, options [nop,nop,TS val 587364215 ecr 587364215], length 121: HTTP, length: 121  9
HTTP/1.1 200 OK
Date: Wed, 19 Aug 2020 12:13:55 GMT
Content-Length: 5
Content-Type: text/plain; charset=utf-8
Hello[!http]

08:13:55.010219 localhost.50399 > localhost.http-alt: Flags [.], cksum 0x0028 (incorrect -> 0x1e93), seq 79,
ack 122, win 6369, options [nop,nop,TS val 587364215 ecr 587364215], length 0  10

08:13:55.010324 localhost.50399 > localhost.http-alt: Flags [F.], cksum 0x0028 (incorrect -> 0x1e92), seq 79,
ack 122, win 6369, options [nop,nop,TS val 587364215 ecr 587364215], length 0  11

08:13:55.010343 localhost.http-alt > localhost.50399: Flags [.], cksum 0x0028 (incorrect -> 0x1e91), seq 122,
ack 80, win 6370, options [nop,nop,TS val 587364215 ecr 587364215], length 0  12

08:13:55.010379 localhost.http-alt > localhost.50399: Flags [F.], cksum 0x0028 (incorrect -> 0x1e90), seq 122,
ack 80, win 6370, options [nop,nop,TS val 587364215 ecr 587364215], length 0  13

08:13:55.010403 localhost.50399 > localhost.http-alt: Flags [.], cksum 0x0028 (incorrect -> 0x1e91), seq 80, ack
123, win 6369, options [nop,nop,TS val 587364215 ecr 587364215], length 0  14

 12 packets captured, 12062 packets received by filter 0 packets dropped by kernel.  15
1

This starts the tcpdump collection with the command and its options. sudo - packet captures require escalated privileges. tcpdump is the tcpdump binary. -i lo0 - -i is the interface from which we want to capture packets. tcp port 8080 - this is the matching expression that the man page discussed; here we are matching all TCP packets to or from port 8080, the port on which the web service is listening for requests. -vvv - the verbose option, which allows us to see more details from the tcpdump capture.

2

Feedback from tcpdump letting us know about the tcpdump filter running.

3

The first packet in the TCP handshake is the SYN packet, which we can tell because the flags field is set to [S]. The sequence number is set to 2784345138 by cURL, which is using local port 50399.

4

The SYN-ACK packet is the next one filtered by tcpdump, sent from localhost.http-alt, the Golang web server. The flags are set to [S.], SYN-ACK. The packet sends 195606347 as the server’s initial sequence number, and ack 2784345139 is set to acknowledge the previous packet.

5

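The acknowledgment packet from cURL is now sent back to the server with the ACK flag set, [.], with the ack and seq numbers set to 1, indicating it is ready to send data. From this point on, tcpdump prints sequence and acknowledgment numbers relative to the initial values.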

6

The server sends an ACK of its own, with the acknowledgment number set to 1 to indicate receipt of the client’s SYN before the opening data push.

7

With the TCP connection established, both the client and server are ready for data transmission. The next packets are our data transmissions of the HTTP request, with the flags set to data push and ACK, [P.]. The previous packets had a length of zero, but the HTTP request is 78 bytes long, with a sequence range of 1:79.

8

The server acknowledges the receipt of the data transmission, with the ACK flag set, [.], by sending the acknowledgment number of 79.

9

This packet is the HTTP server’s response to the cURL request, with the data push flag set, [P.], and it acknowledges the previous packet with an ack number of 79. The data transmission covers the sequence range 1:122, and the data length is 121 bytes.

10

The cURL client acknowledges the receipt of the packet with ACK flag set, and the acknowledgment number to 122, and sets the sequence number to 79.

11

This is the start of closing the TCP connection, with the client sending the FIN-ACK packet, [F.], acknowledging the receipt of the previous packet with ack number 122. The packet carries sequence number 79; the FIN itself consumes one sequence number, so the next expected value is 80.

12

The server increments the acknowledgment number to 80 and sets the ACK flag.

13

TCP requires that both the sender and receiver send a FIN packet to close the connection. This is the server’s FIN packet, with the FIN and ACK flags set.

14

The final ack from the client here, with acknowledgment number 123. The connection is closed now.

15

Tcpdump on exit lets us know the number of packets in this capture, the total number of packets received by the filter during the tcpdump, and how many packets were dropped by the operating system.
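
To make the flag and number bookkeeping in the capture easier to follow, here is a minimal Go sketch. It assumes the single-character flag notation that tcpdump prints, and the function names decodeFlags and nextAck are hypothetical helpers for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// flagNames maps the single characters tcpdump prints to TCP flag names.
var flagNames = map[rune]string{
	'S': "SYN",
	'F': "FIN",
	'P': "PSH",
	'R': "RST",
	'U': "URG",
	'.': "ACK",
}

// decodeFlags turns a tcpdump flag string such as "[S.]" into "SYN,ACK".
// Characters this sketch does not know are reported with a leading "?".
func decodeFlags(s string) string {
	var names []string
	for _, r := range strings.Trim(s, "[]") {
		if name, ok := flagNames[r]; ok {
			names = append(names, name)
		} else {
			names = append(names, "?"+string(r))
		}
	}
	return strings.Join(names, ",")
}

// nextAck returns the acknowledgment number the peer should send back:
// the sequence number plus the payload length, plus one for a SYN and one
// for a FIN, because each of those flags consumes a sequence number.
func nextAck(seq, payloadLen uint32, flags string) uint32 {
	ack := seq + payloadLen
	if strings.ContainsRune(flags, 'S') {
		ack++
	}
	if strings.ContainsRune(flags, 'F') {
		ack++
	}
	return ack
}

func main() {
	fmt.Println(decodeFlags("[S.]"))             // SYN,ACK
	fmt.Println(nextAck(2784345138, 0, "[S]"))   // 2784345139, the ack in the SYN-ACK
	fmt.Println(nextAck(1, 78, "[P.]"))          // 79, the ack for the HTTP request
	fmt.Println(nextAck(1, 121, "[P.]"))         // 122, the ack for the HTTP response
	fmt.Println(nextAck(79, 0, "[F.]"))          // 80, the ack for the client's FIN
}
```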

Tcpdump is an excellent troubleshooting application for network engineers as well as Cluster Administrators. Being able to verify connectivity at many levels in the cluster and the network is a valuable skill to have. We will see later in the Kubernetes and Cloud Network chapters how useful tcpdump can be.

Our example was a simple HTTP application using TCP. All of this data was sent over the network in plain text. While this example was a simple hello world, other requests, like our bank logins, need to have some security. The Transport layer does not offer any security protection for data transiting the network. To provide that protection, Transport Layer Security, TLS, adds additional security on top of TCP. Let’s dive into that in our next section on TLS.

TLS

Transport Layer Security, TLS, adds encryption to the TCP transport protocol. TLS is an add-on to the TCP/IP suite and is not considered to be part of the base operation for TCP. HTTP transactions can be completed without TLS but are not secure from eavesdroppers on the wire. TLS is a combination of protocols used to ensure traffic is seen only by the sender and the intended recipient. TLS, much like TCP, uses a handshake to establish encryption capabilities and exchange keys for encryption. Figure 1-9 details the TLS handshake between client and server.

TLS Handshake
Figure 1-9. TLS Handshake
  1. ClientHello - Contains the cipher suites supported by the client and a random number.

  2. ServerHello - This message contains the cipher suite the server selected and a random number.

  3. ServerCertificate - Contains the server’s certificate along with its server public key.

  4. ServerHelloDone - This is the end of the ServerHello. If the client receives a request for its certificate, it sends a ClientCertificate message.

  5. ClientKeyExchange - Our client generates a random Pre-Master Secret, encrypts it with the public key from the server’s certificate, and sends it to the server.

  6. Key Generation - The client and server generate a master secret from the Pre-Master Secret and exchanged random values.

  7. ChangeCipherSpec - Now the client and server swap their ChangeCipherSpec to begin using the new keys for encryption.

  8. Finished Client - The client sends the Finished message to confirm that the key exchange and authentication were successful.

  9. Finished Server - Now, the server Sends the Finished message to the client to end the handshake.
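
As a small illustration of the first step, the sketch below lists the cipher suites that Go's crypto/tls package implements and considers secure, which is the kind of list a Go client would advertise in its ClientHello. This is a sketch under assumptions: supportedSuites is a hypothetical helper name, the exact list depends on the Go version, and the configuration shown (a minimum of TLS 1.2) is an example choice rather than a recommendation from this chapter.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// supportedSuites (hypothetical helper) returns the names of the cipher
// suites implemented by crypto/tls, excluding those it marks insecure.
func supportedSuites() []string {
	var names []string
	for _, suite := range tls.CipherSuites() {
		names = append(names, suite.Name)
	}
	return names
}

func main() {
	// An example client configuration that refuses anything older than TLS 1.2.
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	fmt.Printf("minimum protocol version: 0x%04x\n", cfg.MinVersion)

	for _, name := range supportedSuites() {
		fmt.Println(name)
	}
}
```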

Kubernetes applications and components will manage TLS for developers, so a basic introduction is required; more about TLS and Kubernetes will be reviewed in the Services and Ingress sections in Chapter 5.

As demonstrated with our web server, cURL, and tcpdump, Transmission Control Protocol is a stateful and reliable protocol for sending data between hosts. Its use of flags, combined with the sequence and acknowledgment number dance it performs, delivers thousands of messages over unreliable networks across the globe. That reliability comes at a cost, however. Of the 12 packets we sent, only two were real data transfers. For applications that do not need reliability, such as voice, or the overhead that comes with it, User Datagram Protocol, UDP, offers an alternative. Now that we understand how TCP works as a reliable connection-oriented protocol, let us review how UDP differs from TCP.

UDP

UDP offers an alternative to applications that do not need the reliability that TCP provides. UDP is an excellent choice for applications that can withstand packet loss such as voice and DNS. UDP offers little overhead from a network perspective, only having four fields and no data acknowledgment, unlike its verbose brother TCP.

It is transaction-oriented, suitable for simple query and response protocols like the Domain Name System, DNS, or Simple Network Management Protocol, SNMP. UDP slices requests into datagrams, making it capable of use with other protocols for tunneling, like a VPN. It is lightweight and straightforward, making it great for bootstrapping application data in the case of DHCP. The stateless nature of data transfer makes UDP perfect for applications, such as voice, that can withstand packet loss (did you hear that?). UDP’s lack of retransmission also makes it an apt choice for streaming video.

Figure 1-10 lays out the small amount of headers required in a UDP datagram.

udp header
Figure 1-10. UDP Header
  • Source port number - 2 bytes - Identifies the sender’s port. The source host is the client; the port number is ephemeral. UDP ports have well-known numbers like DNS on 53 or DHCP 67/68.

  • Destination port number - 2 bytes - Identifies the receiver’s port and is required.

  • Length - 2 bytes - Specifies the length in bytes of the UDP header and UDP data. The minimum length is 8 bytes, the length of the header.

  • Checksum - 2 bytes - The checksum field is used for error-checking of the header and data. It is optional in IPv4, where it is all zeros if unused, but mandatory in IPv6.
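
The four fields above fit in eight bytes, which the following Go sketch encodes and decodes. The type and function names are hypothetical, the checksum is left at zero (the "unused" value permitted in IPv4), and the port numbers are example values.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const udpHeaderLen = 8 // four 2-byte fields

// UDPHeader mirrors the four fields described above.
type UDPHeader struct {
	SrcPort  uint16
	DstPort  uint16
	Length   uint16 // header plus data, in bytes
	Checksum uint16 // left at zero (unused) in this sketch
}

// newUDPHeader fills in the length from the payload size.
func newUDPHeader(src, dst uint16, payload []byte) UDPHeader {
	return UDPHeader{
		SrcPort: src,
		DstPort: dst,
		Length:  uint16(udpHeaderLen + len(payload)),
	}
}

// Marshal writes the header in network byte order (big-endian).
func (h UDPHeader) Marshal() []byte {
	b := make([]byte, udpHeaderLen)
	binary.BigEndian.PutUint16(b[0:2], h.SrcPort)
	binary.BigEndian.PutUint16(b[2:4], h.DstPort)
	binary.BigEndian.PutUint16(b[4:6], h.Length)
	binary.BigEndian.PutUint16(b[6:8], h.Checksum)
	return b
}

// parseUDPHeader reads the first eight bytes of a datagram.
func parseUDPHeader(b []byte) (UDPHeader, error) {
	if len(b) < udpHeaderLen {
		return UDPHeader{}, errors.New("UDP header needs at least 8 bytes")
	}
	return UDPHeader{
		SrcPort:  binary.BigEndian.Uint16(b[0:2]),
		DstPort:  binary.BigEndian.Uint16(b[2:4]),
		Length:   binary.BigEndian.Uint16(b[4:6]),
		Checksum: binary.BigEndian.Uint16(b[6:8]),
	}, nil
}

func main() {
	// An ephemeral source port sending a 4-byte payload to the DNS port.
	h := newUDPHeader(50000, 53, []byte("ping"))
	fmt.Printf("% x\n", h.Marshal())
	fmt.Println("length field:", h.Length)
}
```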

UDP and TCP are general transport protocols that help ship and receive data between hosts. Kubernetes supports both protocols on the network, and Services allow users to load balance across many pods. It is also important to note that each Service must define its transport protocol; if none is specified, TCP is the default.

The next layer in the TCP/IP stack is the Internetworking layer—there our packets can get sent across the globe on the vast networks that make up the Internet. Let’s review how that gets completed.

Network

All TCP and UDP data gets transmitted as IP packets in the Network layer of TCP/IP. The Internet or network layer is responsible for transferring data between networks. For outgoing packets, this layer selects the next-hop host and sends the packet to that host by passing it to the appropriate link layer; incoming packets are received by a host, de-encapsulated, and sent up to the proper transport layer protocol. In IPv4, on both transmit and receive, IP provides fragmentation and reassembly of packets based on the maximum transmission unit, MTU; this is the maximum size of the IP packet.

IP makes no guarantees about packets’ proper arrival; since packet delivery across diverse networks is inherently unreliable and failure-prone, that burden is with the endpoints of a communication path, rather than on the network. As discussed in the previous section, providing service reliability is a function of the transport layer. Each packet has a header checksum to ensure that the received packet’s header is accurate, but this layer does not validate data integrity. Source and destination IP addresses identify packets on the network, which we’ll address next.

Internet Protocol

The almighty packet is defined in RFC 791 and is used for sending data across networks. It’s time to dissect the IP packet, beginning with the header detailed in Figure 1-11.

IPv4 Header Format
Figure 1-11. IPv4 Header Format
  • Version The first header field in the IP packet is the four-bit version field. For IPv4, this is always equal to four.

  • Internet Header Length (IHL) The IPv4 header has a variable size due to the optional 14th field, options; the IHL field gives the header length in 32-bit words.

  • TOS - Originally defined as the type of service (ToS), now Differentiated Services Code Point (DSCP), this field specifies differentiated services. DSCP allows for routers and networks to make decisions on packet priority during times of congestion. Technologies such as Voice over IP use DSCP to ensure calls take precedence over other traffic.

  • Total Length - The entire packet size in bytes.

  • Identification - Used for uniquely identifying the group of fragments of a single IP datagram.

  • Flags - Used to control or identify fragments. In order from most significant to least:

    • bit 0: Reserved, set to zero

    • bit 1: Do not Fragment, DF

    • bit 2: More Fragments, MF

  • Fragment Offset - Specifies the offset of a particular fragment relative to the beginning of the original unfragmented IP packet, measured in units of eight bytes. The first fragment always has an offset of zero.

  • Time To Live (TTL) An eight-bit time to live field helps prevent datagrams from going in circles on a network.

  • Protocol - Protocol used in the data section of the IP packet. IANA has a list of IP protocol numbers in RFC 790; some well-known protocols are also detailed in Table 1-4.

Table 1-4. IP Protocol Numbers
Protocol Number   Protocol Name                          Abbreviation
1                 Internet Control Message Protocol      ICMP
2                 Internet Group Management Protocol     IGMP
6                 Transmission Control Protocol          TCP
17                User Datagram Protocol                 UDP
41                IPv6 encapsulation                     ENCAP
89                Open Shortest Path First               OSPF
132               Stream Control Transmission Protocol   SCTP

  • Header Checksum - 16-bit - The IPv4 header checksum field is used for error-checking of the header. When a packet arrives, a router computes the header’s checksum and compares it with the value in this field; the router drops the packet if the two values do not match. The encapsulated protocol must handle errors in the data field. Both UDP and TCP have checksum fields.

Note

When the router receives a packet, it lowers the TTL field by one. As a consequence, the router must compute a new checksum.

  • Source address - IPv4 address of the sender of the packet.

Note

The source address may be changed in transit by a network address translation device; NAT will be discussed later in this chapter and extensively in Container networking.

  • Destination address - IPv4 address of the receiver of the packet. As with the source address, a NAT device can change the destination IP address.

  • Options - The possible options in the header are Copied, Option Class, Option Number, Option Length, Option Data.
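
The header fields above can be decoded from raw bytes with a few shifts and masks. The Go sketch below parses the fixed 20-byte part of an IPv4 header, verifies the header checksum, and then lowers the TTL and recomputes the checksum the way a router would. The function names and the sample header bytes are assumptions made for this illustration, and options are not decoded.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// IPv4Header holds the fixed fields described above (options are not decoded).
type IPv4Header struct {
	Version, IHL  uint8
	TOS           uint8
	TotalLength   uint16
	ID            uint16
	DF, MF        bool
	FragOffset    uint16 // in units of eight bytes
	TTL, Protocol uint8
	Checksum      uint16
	Src, Dst      net.IP
}

// headerChecksum is the ones' complement of the ones' complement sum of the
// header's 16-bit words. Run over a header whose checksum field is already
// filled in, it returns 0 when the header is intact.
func headerChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i : i+2]))
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// parseIPv4 decodes the first 20 bytes of an IPv4 packet.
func parseIPv4(b []byte) (IPv4Header, error) {
	if len(b) < 20 {
		return IPv4Header{}, errors.New("IPv4 header needs at least 20 bytes")
	}
	flagsAndOffset := binary.BigEndian.Uint16(b[6:8])
	return IPv4Header{
		Version:     b[0] >> 4,
		IHL:         b[0] & 0x0f,
		TOS:         b[1],
		TotalLength: binary.BigEndian.Uint16(b[2:4]),
		ID:          binary.BigEndian.Uint16(b[4:6]),
		DF:          flagsAndOffset&0x4000 != 0,
		MF:          flagsAndOffset&0x2000 != 0,
		FragOffset:  flagsAndOffset & 0x1fff,
		TTL:         b[8],
		Protocol:    b[9],
		Checksum:    binary.BigEndian.Uint16(b[10:12]),
		Src:         net.IP(append([]byte(nil), b[12:16]...)),
		Dst:         net.IP(append([]byte(nil), b[16:20]...)),
	}, nil
}

// decrementTTL lowers the TTL by one in place and rewrites the checksum.
func decrementTTL(b []byte) error {
	if len(b) < 20 {
		return errors.New("IPv4 header needs at least 20 bytes")
	}
	hlen := int(b[0]&0x0f) * 4 // IHL counts 32-bit words
	if hlen < 20 || len(b) < hlen {
		return errors.New("invalid header length")
	}
	if b[8] == 0 {
		return errors.New("TTL is already zero")
	}
	b[8]--
	b[10], b[11] = 0, 0 // the checksum field is zero while summing
	binary.BigEndian.PutUint16(b[10:12], headerChecksum(b[:hlen]))
	return nil
}

func main() {
	// Sample header: a UDP packet from 192.168.0.1 to 192.168.0.199.
	hdr := []byte{
		0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
		0x40, 0x11, 0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01,
		0xc0, 0xa8, 0x00, 0xc7,
	}
	h, err := parseIPv4(hdr)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", h)
	fmt.Println("checksum valid:", headerChecksum(hdr) == 0)
	if err := decrementTTL(hdr); err == nil {
		fmt.Printf("after one hop: TTL=%d checksum=0x%02x%02x\n", hdr[8], hdr[10], hdr[11])
	}
}
```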

The crucial component here is the address; it’s how networks are identified. Addresses simultaneously identify the host on the network and the whole network itself; more on that in the routing section. Understanding how to identify an IP address is critical for an engineer. First, we will review IPv4, then understand the drastic changes in IPv6.

IPv4 Address
Figure 1-12. IPv4 Address

IPv4 addresses are in the dotted-decimal notation for us humans; computers read them out as binary strings. Figure 1-12 details the dotted-decimal notation and binary. Each section is 8 bits in length, with four sections making the complete length 32 bits. IPv4 addresses have two sections: the first part is the network, and the second is the host’s unique identifier on the network. In Example 1-5, we have the output of a computer’s IP address for its network interface card, and we can see its IPv4 address is 192.168.1.2. The IP address also has a subnet mask or netmask associated with it to make out what network it is assigned to. The example’s netmask is 0xffffff00 in hexadecimal, which is 255.255.255.0 in dotted-decimal.

Example 1-5. IP address
○ → ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=400<CHANNEL_IO>
	ether 38:f9:d3:bc:8a:51
	inet6 fe80::8f4:bb53:e500:9557%en0 prefixlen 64 secured scopeid 0x6
	inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
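
To see how the netmask splits the address, the following Go sketch applies the mask from Example 1-5 to the address and derives the network and broadcast addresses. The helper names subnetOf and dottedBinary are hypothetical, and the sketch handles IPv4 only.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// toIP converts a 32-bit value into a 4-byte IPv4 address.
func toIP(v uint32) net.IP {
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, v)
	return ip
}

// subnetOf applies a netmask, written as ifconfig prints it (for example
// 0xffffff00), and returns the network and broadcast addresses.
func subnetOf(addr string, mask uint32) (network, broadcast string, err error) {
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return "", "", fmt.Errorf("not an IPv4 address: %q", addr)
	}
	host := binary.BigEndian.Uint32(ip)
	net4 := host & mask  // keep only the network bits
	bcast := net4 | ^mask // set every host bit to one
	return toIP(net4).String(), toIP(bcast).String(), nil
}

// dottedBinary prints each of the four octets as eight bits.
func dottedBinary(addr string) string {
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return ""
	}
	return fmt.Sprintf("%08b.%08b.%08b.%08b", ip[0], ip[1], ip[2], ip[3])
}

func main() {
	network, broadcast, err := subnetOf("192.168.1.2", 0xffffff00)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("binary:   ", dottedBinary("192.168.1.2"))
	fmt.Println("netmask:  ", toIP(0xffffff00))
	fmt.Println("network:  ", network)
	fmt.Println("broadcast:", broadcast)
}
```

The broadcast address it derives, 192.168.1.255, matches the one ifconfig reports in Example 1-5.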

The subnet brings up the idea of classful IP addressing. Initially, when an IP address range was assigned, a range was considered to be the combination of an 8-, 16-, or 24-bit network prefix along with a 24-, 16-, or 8-bit host identifier, respectively. Class A had 8 bits for the network prefix, Class B 16, and Class C 24. Following that, a Class A network had 2 to the power of 24 addresses available, 16,777,216; a Class B network had 65,536; and a Class C network had 256. Each network had a network address, the first one in its boundary, and a broadcast address, the last one in its boundary; Figure 1-13 demonstrates this for us.

Note

There are two other classes, but they are not generally used in IP addressing. Class D addresses are used for IP multicasting and Class E addresses are reserved for experimental use.

IPv4 Class Address
Figure 1-13. IP Class
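
The class of an address follows from the leading bits of its first octet. The Go sketch below shows that lookup along with the number of addresses in one network of each class; classOf is a hypothetical helper, and it returns zero addresses for classes D and E because they are not divided into host networks.

```go
package main

import (
	"fmt"
	"net"
)

// classOf returns the historical class of an IPv4 address and the number of
// addresses in a single network of that class.
func classOf(addr string) (class string, addresses int, err error) {
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return "", 0, fmt.Errorf("not an IPv4 address: %q", addr)
	}
	switch first := ip[0]; {
	case first < 128: // leading bit 0, 8-bit network prefix
		return "A", 1 << 24, nil
	case first < 192: // leading bits 10, 16-bit network prefix
		return "B", 1 << 16, nil
	case first < 224: // leading bits 110, 24-bit network prefix
		return "C", 1 << 8, nil
	case first < 240: // leading bits 1110, multicast
		return "D", 0, nil
	default: // leading bits 1111, experimental
		return "E", 0, nil
	}
}

func main() {
	for _, addr := range []string{"10.1.2.3", "172.16.0.1", "192.168.1.2", "224.0.0.1"} {
		class, n, err := classOf(addr)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-12s class %s, %d addresses per network\n", addr, class, n)
	}
}
```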

Classful addressing was not scalable on the Internet, so to help alleviate that scale issue, we began breaking up the class boundaries using Classless Interdomain Routing, CIDR, ranges. Instead of having the full 16 million-plus addresses in a class address range, an Internet entity was only given a subsection of that range. This effectively allows network engineers to move the subnet boundary to anywhere inside the class range, giving them more flexibility with CIDR ranges and helping scale IP address ranges.

CIDR example
Figure 1-14. CIDR Example

In Figure 1-14, we can see the breakdown of the 208.130.29.33 IPv4 address and the hierarchy that it creates. The 208.128.0.0/11 CIDR range is assigned to MCI from IANA. MCI further breaks down the subnet into smaller and smaller subnets for its purposes, leading to the single host on the network, 208.130.29.33/32.
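
The containment relationship in this hierarchy can be checked with Go's net package. This is a minimal sketch: inPrefix is a hypothetical helper, it handles IPv4 prefixes only, and it uses the addresses from the text as inputs.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// inPrefix reports whether addr falls inside the CIDR range and how many
// addresses that range contains.
func inPrefix(addr, cidr string) (bool, uint64, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false, 0, fmt.Errorf("invalid address: %q", addr)
	}
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, 0, err
	}
	ones, bits := ipnet.Mask.Size()
	if bits != 32 {
		return false, 0, errors.New("this sketch handles IPv4 prefixes only")
	}
	// A /11 leaves 32-11 = 21 host bits, so 2^21 addresses.
	size := uint64(1) << uint(bits-ones)
	return ipnet.Contains(ip), size, nil
}

func main() {
	for _, cidr := range []string{"208.128.0.0/11", "208.130.29.33/32"} {
		ok, size, err := inPrefix("208.130.29.33", cidr)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-18s contains 208.130.29.33: %v (%d addresses)\n", cidr, ok, size)
	}
}
```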

Note

The global coordination of the DNS Root, IP addressing, and other Internet protocol resources are performed by the Internet Assigned Numbers Authority (IANA).

Eventually, though, even this practice of using CIDR to extend the range of IPv4 addresses led to an exhaustion of the address space that could be doled out, leading network engineers and the IETF to develop the IPv6 standard.

IPv6 addresses, unlike IPv4 addresses, are written in hexadecimal to shorten them. IPv6 has similar characteristics to IPv4 in that it has a host and network prefix.

IPv4 Address
Figure 1-15. IPv6 Address

The most significant difference between IPv4 and IPv6 is the size of the address space. IPv4 has 32 bits, while IPv6 has 128 bits to produce its addresses. To put that size differential in perspective, here are those numbers:

IPv4 has 4,294,967,296 addresses

IPv6 has 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses
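
Both figures are simply two raised to the number of address bits, which a short Go sketch using arbitrary-precision integers can confirm. The helper name addressSpace is hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// addressSpace returns 2^bits as a decimal string; math/big is needed
// because 2^128 does not fit in any built-in integer type.
func addressSpace(bits uint) string {
	return new(big.Int).Lsh(big.NewInt(1), bits).String()
}

func main() {
	fmt.Println("IPv4:", addressSpace(32))
	fmt.Println("IPv6:", addressSpace(128))
}
```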

Now that we understand how an individual host on the network is identified and what network it belongs to, we will explore how those networks exchange information between themselves using routing protocols.

Getting Round the Network

Packets are addressed and data is ready to be sent, but how do our packets get from our host on our network to the intended host on another network halfway around the world? That is the job of routing. There are several routing protocols, but the Internet relies on BGP. BGP stands for Border Gateway Protocol, a dynamic routing protocol used to manage how packets get routed between edge routers on the Internet. It is relevant for us because some Kubernetes network implementations use BGP to route cluster network traffic between nodes. Between each node on separate networks is a series of routers.

If we refer to the Map of the Internet in Figure 1-1, each network on the Internet is assigned a BGP AS, autonomous system number, to designate a single administrative entity or corporation that presents a common and clearly defined routing policy on the Internet. BGP and AS numbers allow Network Administrators to maintain control of their internal network routing while announcing and summarizing their routes on the Internet. Table 1-5 lists the available AS numbers managed by IANA and other regional entities.

Table 1-5. Complete table of ASN available

Number                    Bits   Description                                         Reference
0                         16     Reserved                                            RFC1930, RFC7607
1 - 23455                 16     Public ASNs
23456                     16     Reserved for AS Pool Transition                     RFC6793
23457 - 64495             16     Public ASNs
64496 - 64511             16     Reserved for use in documentation/sample code       RFC5398
64512 - 65534             16     Reserved for private use                            RFC1930, RFC6996
65535                     16     Reserved                                            RFC7300
65536 - 65551             32     Reserved for use in documentation and sample code   RFC4893, RFC5398
65552 - 131071            32     Reserved
131072 - 4199999999       32     Public 32-bit ASNs
4200000000 - 4294967294   32     Reserved for private use                            RFC6996
4294967295                32     Reserved                                            RFC7300

Table 1-5 source data: “Autonomous System (AS) Numbers.” IANA.org. 2018-12-07. Retrieved 2018-12-31. https://www.iana.org/assignments/as-numbers/as-numbers.xhtml
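
The ranges in Table 1-5 translate directly into a lookup. The Go sketch below is an illustration only; classifyASN is a hypothetical helper, and the category strings are shortened versions of the table's descriptions.

```go
package main

import "fmt"

// classifyASN maps an AS number onto the ranges listed in Table 1-5.
func classifyASN(n uint32) string {
	switch {
	case n == 0, n == 65535, n == 4294967295:
		return "reserved"
	case n == 23456:
		return "reserved for AS pool transition"
	case n >= 64496 && n <= 64511, n >= 65536 && n <= 65551:
		return "documentation and sample code"
	case n >= 64512 && n <= 65534, n >= 4200000000 && n <= 4294967294:
		return "private use"
	case n >= 65552 && n <= 131071:
		return "reserved"
	default:
		// 1-23455, 23457-64495, and 131072-4199999999
		return "public"
	}
}

func main() {
	for _, asn := range []uint32{100, 23456, 64500, 64512, 131072, 4200000000} {
		fmt.Printf("AS%-10d %s\n", asn, classifyASN(asn))
	}
}
```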

BGP Routing
Figure 1-16. BGP Routing Example

In Figure 1-16, we have five ASs, 100-500. A host on 130.10.1.200 wants to reach a host on 150.10.2.300. Once the local router or default gateway for the host 130.10.1.200 receives the packet, it will look for the interface and path for 150.10.2.300 that BGP has determined for that route. Based on the routing table in Figure 1-17, the router for AS 100 determines the packet belongs to AS 300, and the preferred path is out interface 140.10.1.1. Rinse and repeat on AS 200 until the local router for 150.10.2.300 on AS 300 receives that packet. The flow here is described in Figure 1-6 TCP/IP data flow between networks. A basic understanding of BGP is needed because some Container Networking projects use it for routing between nodes.

Route Table
Figure 1-17. Local Routing Table

Figure 1-17 displays a local route table, which shows the interface that a packet would be sent out of based on the destination IP address. For example, a packet destined for 192.168.1.153 will be sent out the link#11 gateway, which is local to the network, and no routing is needed. 192.168.1.254 is the router on the network attached to my Internet connection. If the destination network is unknown, the packet is sent out the default route.
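
A route lookup of this kind picks the most specific matching entry, often called longest-prefix match. The Go sketch below is an illustration under assumptions: the two-entry table is a hypothetical reduction modeled on the description above (a local 192.168.1.0/24 network on link#11 and a default route through 192.168.1.254), not the full contents of Figure 1-17.

```go
package main

import (
	"fmt"
	"net"
)

// route pairs a destination prefix with the gateway packets are sent to.
type route struct {
	prefix  *net.IPNet
	gateway string
}

// mustRoute parses a CIDR string and panics on a malformed table entry.
func mustRoute(cidr, gateway string) route {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		panic(err)
	}
	return route{prefix: ipnet, gateway: gateway}
}

// table is a hypothetical, simplified route table for this sketch.
var table = []route{
	mustRoute("0.0.0.0/0", "192.168.1.254"), // default route
	mustRoute("192.168.1.0/24", "link#11"),  // directly attached network
}

// lookup returns the gateway of the matching route with the longest prefix.
func lookup(dst string) (string, error) {
	ip := net.ParseIP(dst)
	if ip == nil {
		return "", fmt.Errorf("invalid address: %q", dst)
	}
	best, gateway := -1, ""
	for _, r := range table {
		if !r.prefix.Contains(ip) {
			continue
		}
		if ones, _ := r.prefix.Mask.Size(); ones > best {
			best, gateway = ones, r.gateway
		}
	}
	if best < 0 {
		return "", fmt.Errorf("no route to %s", dst)
	}
	return gateway, nil
}

func main() {
	for _, dst := range []string{"192.168.1.153", "8.8.8.8"} {
		gw, err := lookup(dst)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-15s -> %s\n", dst, gw)
	}
}
```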

Note

As on all Linux and BSD operating systems, you can find netstat’s man page with man netstat. Apple’s netstat is derived from the BSD version. More information can be found here: https://docs.freebsd.org/en/books/handbook/advanced-networking/#network-routing

Routers continuously communicate on the Internet, exchange route information, and inform each other of changes on their respective networks. BGP takes care of a lot of that data exchange, but Network Engineers and System Administrators can use the ICMP protocol and the ping CLI tool to test connectivity between hosts and routers.

ICMP

PING is a network utility that uses ICMP for testing connectivity between hosts on the network. In Example 1-6, we see a successful ping test to 192.168.1.2, with five packets all returning an ICMP echo reply.

Example 1-6. ICMP Echo Request
○ → ping 192.168.1.2 -c 5
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=0.052 ms
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.089 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.142 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.050 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.050 ms
--- 192.168.1.2 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.050/0.077/0.142/0.036 ms

Example 1-7 shows a failed ping attempt that times out trying to reach host 1.2.3.4. Routers and administrators will use pings for testing connectivity, and it is useful in testing container connectivity as well. More on that in Chapters 2 and 3, as we deploy our minimal Golang web server into a container and a pod in those respective chapters.

Example 1-7. ICMP Echo Request Failed
○ → ping 1.2.3.4 -c 4
PING 1.2.3.4 (1.2.3.4): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
--- 1.2.3.4 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss

As with TCP and UDP, there are headers, data, and options in an ICMP packet; those are reviewed below and shown in Figure 1-18.

  • Type - ICMP type

  • Code - ICMP subtype

  • Checksum - Internet checksum for error checking, calculated from the ICMP header and data, with the value 0 substituted for this field during the calculation.

  • Rest of Header - Four-bytes field, contents vary based on the ICMP type and code.

  • Data - ICMP error messages contain a data section that includes a copy of the entire IPv4 header, plus the first eight bytes of data from the IPv4 packet that caused the error.
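
To show how these fields fit together, the Go sketch below builds an ICMP Echo message (type 8, code 0) and fills in the checksum as described above. For an Echo message, the four-byte Rest of Header carries an identifier and a sequence number. The function names and the identifier value are assumptions for this illustration, and the message is only built in memory, not sent.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// internetChecksum is the ones' complement of the ones' complement sum of
// the message's 16-bit words; an odd trailing byte is padded with zero.
func internetChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i : i+2]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// echoRequest builds an ICMP Echo message with the given identifier,
// sequence number, and payload.
func echoRequest(id, seq uint16, payload []byte) []byte {
	msg := make([]byte, 8+len(payload))
	msg[0] = 8 // Type: Echo
	msg[1] = 0 // Code
	// msg[2:4] is the checksum; it stays zero while the sum is computed.
	binary.BigEndian.PutUint16(msg[4:6], id)
	binary.BigEndian.PutUint16(msg[6:8], seq)
	copy(msg[8:], payload)
	binary.BigEndian.PutUint16(msg[2:4], internetChecksum(msg))
	return msg
}

func main() {
	// 56 data bytes, as in the ping examples, gives a 64-byte ICMP message.
	msg := echoRequest(1, 0, make([]byte, 56))
	fmt.Println("message length:", len(msg))
	fmt.Printf("header: % x\n", msg[:8])
	// Summing a message that includes a correct checksum yields zero.
	fmt.Println("verifies:", internetChecksum(msg) == 0)
}
```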

icmp header
Figure 1-18. ICMP Header
Note

Some consider ICMP a transport layer protocol since it does not use TCP or UDP, per RFC-792, it defines ICMP, which provides routing, diagnostic and error functionality for IP. Although ICMP messages are encapsulated within IP datagrams, ICMP processing is considered and is typically implemented as part of the IP layer. ICMP is IP protocol 1, while TCP is 6 and UDP is 17.

The value in the Type field identifies the control message. The Code field gives additional context for the message. Some standard ICMP type numbers are listed in Table 1-6.

Table 1-6. Common ICMP Type Numbers

Number  Name                     Reference
0       Echo Reply               [RFC792]
3       Destination Unreachable  [RFC792]
5       Redirect                 [RFC792]
8       Echo                     [RFC792]

Now that our packets know which networks they are sourced from and destined for, it is time to start physically sending this data request across the network; this is the responsibility of the Link layer.

Link Layer

The HTTP request has been broken up into segments and addressed for routing across the Internet, and now all that is left is to send the data across the wire. The link layer of the TCP/IP stack comprises two sublayers: the Media Access Control (MAC) sublayer and the Logical Link Control (LLC) sublayer. Together, they perform the functions of OSI layers 1 and 2, the physical and data link layers. The link layer is responsible for connectivity to the local network. The first sublayer, MAC, is responsible for access to the physical medium. The LLC sublayer manages flow control and multiplexes protocols over the MAC layer when transmitting, and demultiplexes them when receiving, as shown in Figure 1-19. IEEE standard 802.3, Ethernet, defines the protocols for sending and receiving frames that encapsulate IP packets. IEEE 802 is the overarching standard for LLC (802.2), Wireless (802.11), and Ethernet/MAC (802.3).

ethernet-demux
Figure 1-19. Ethernet Demultiplexing Example

As with the other protocol data units, Ethernet has a header and a footer; let us review them in detail, as seen in Figure 1-20.

ethernet header
Figure 1-20. Ethernet Header and Footer
  • Preamble - eight bytes - An alternating string of ones and zeros that indicates to the receiving host that a frame is incoming.

  • Destination MAC Address - six bytes - Media access control destination address, the Ethernet frame recipient.

  • Source MAC Address - six bytes - Media access control source address, the Ethernet frame source.

  • VLAN tag - four bytes - Optional 802.1Q tag to differentiate traffic on the network segments.

  • EtherType - two bytes - Indicates which protocol is encapsulated in the payload of the frame.

  • Payload - variable length - The encapsulated IP packet.

  • Frame Check Sequence (FCS) or Cyclic Redundancy Check (CRC) - four bytes - The FCS is a four-octet cyclic redundancy check that allows the receiver to detect corrupted data anywhere in the frame. The CRC is part of the Ethernet frame footer.
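
To make the field layout concrete, here is a minimal Go sketch that walks the header fields listed above. It assumes the input slice starts at the destination MAC address, with the preamble and FCS already stripped, which is typically how software sees a frame after the NIC has handled it. The type frameHeader, the functions parseFrame and sampleFrame, and the sample MAC address and VLAN ID are hypothetical names and values chosen for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// etherTypeVLAN is the 802.1Q tag protocol identifier that signals
// an optional four-byte VLAN tag.
const etherTypeVLAN = 0x8100

// frameHeader holds the fields recovered from an Ethernet frame.
type frameHeader struct {
	Dst, Src  net.HardwareAddr
	Tagged    bool
	VLANID    uint16 // only meaningful when Tagged is true
	EtherType uint16
	Payload   []byte
}

// parseFrame reads the MAC addresses, an optional 802.1Q tag,
// and the EtherType, and returns the remaining bytes as the payload.
func parseFrame(b []byte) (frameHeader, error) {
	if len(b) < 14 {
		return frameHeader{}, errors.New("frame shorter than an Ethernet header")
	}
	h := frameHeader{
		Dst: net.HardwareAddr(b[0:6]),
		Src: net.HardwareAddr(b[6:12]),
	}
	off := 12
	et := binary.BigEndian.Uint16(b[off:])
	if et == etherTypeVLAN {
		if len(b) < 18 {
			return frameHeader{}, errors.New("truncated 802.1Q tag")
		}
		h.Tagged = true
		// The VLAN ID is the low 12 bits of the two bytes after the TPID.
		h.VLANID = binary.BigEndian.Uint16(b[off+2:]) & 0x0fff
		off += 4
		et = binary.BigEndian.Uint16(b[off:])
	}
	h.EtherType = et
	h.Payload = b[off+2:]
	return h, nil
}

// sampleFrame returns a hand-built, VLAN-tagged frame for illustration.
func sampleFrame() []byte {
	return []byte{
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, // destination MAC (broadcast)
		0x02, 0x00, 0x00, 0x00, 0x00, 0x0c, // source MAC (hypothetical)
		0x81, 0x00, // 802.1Q tag protocol identifier
		0x00, 0x0a, // tag control information: VLAN ID 10
		0x08, 0x06, // EtherType: ARP
		0xde, 0xad, // placeholder payload bytes
	}
}

func main() {
	h, err := parseFrame(sampleFrame())
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("dst=%s src=%s tagged=%v vlan=%d ethertype=0x%04x payload=%d bytes\n",
		h.Dst, h.Src, h.Tagged, h.VLANID, h.EtherType, len(h.Payload))
}
```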

In Figure 1-21 we can see that MAC addresses are assigned to network interface hardware at the time of manufacture. MAC addresses have two parts: the Organizationally Unique Identifier (OUI) and the Network Interface Card (NIC) specific part.

Mac Address
Figure 1-21. MAC address
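
As a small illustration of the two parts, the following Go sketch splits a 48-bit MAC address into its first three bytes (the OUI) and its last three bytes (the NIC-specific part). The helper name splitMAC is an assumption for this example; net.ParseMAC and net.HardwareAddr come from the Go standard library, and the sample address is taken from the ARP table shown later in this chapter.

```go
package main

import (
	"fmt"
	"net"
)

// splitMAC returns the OUI and the NIC-specific part of a 48-bit MAC address.
func splitMAC(s string) (oui, nic string, err error) {
	mac, err := net.ParseMAC(s)
	if err != nil {
		return "", "", err
	}
	if len(mac) != 6 {
		return "", "", fmt.Errorf("expected a 48-bit MAC address, got %d bytes", len(mac))
	}
	// The first three bytes identify the vendor; the last three are
	// assigned by that vendor to the individual interface.
	return net.HardwareAddr(mac[:3]).String(), net.HardwareAddr(mac[3:]).String(), nil
}

func main() {
	oui, nic, err := splitMAC("38:f9:d3:bc:8a:51")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("OUI:", oui)
	fmt.Println("NIC-specific:", nic)
}
```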

The EtherType field of the frame tells the recipient which network layer protocol the payload carries. Table 1-7 details the common protocols handled. In Kubernetes, we are mostly interested in IPv4 and ARP packets. IPv6 was recently introduced to Kubernetes in the 1.19 release.

Table 1-7. Common Ethertype Protocols

EtherType  Protocol
0x0800     Internet Protocol version 4 (IPv4)
0x0806     Address Resolution Protocol (ARP)
0x8035     Reverse Address Resolution Protocol (RARP)
0x86DD     Internet Protocol Version 6 (IPv6)
0x88E5     MAC security (IEEE 802.1AE)
0x9100     VLAN-tagged (IEEE 802.1Q) frame with double tagging

When an IP packet reaches its destination network, the destination IP address is resolved into the destination host’s MAC address with the Address Resolution Protocol (ARP) for IPv4, or the Neighbor Discovery Protocol in the case of IPv6. ARP manages address translation from Internet addresses to link-layer addresses on Ethernet networks. The ARP table provides fast lookups for known hosts, so a host does not have to send an ARP request for every frame it wants to send out. All devices on the network keep a cache of ARP entries for this purpose. Example 1-8 shows the output of a local ARP table.

Example 1-8. ARP table
○ → arp -a
? (192.168.0.1) at bc:a5:11:f1:5d:be on en0 ifscope [ethernet]
? (192.168.0.17) at 38:f9:d3:bc:8a:51 on en0 ifscope permanent [ethernet]
? (192.168.0.255) at ff:ff:ff:ff:ff:ff on en0 ifscope [ethernet]
? (224.0.0.251) at 1:0:5e:0:0:fb on en0 ifscope permanent [ethernet]
? (239.255.255.250) at 1:0:5e:7f:ff:fa on en0 ifscope permanent [ethernet]

Figure 1-22 shows the exchange between hosts on the local network. The browser makes an HTTP request for a website hosted by the target server. Through DNS, it determines that the server has the IP address 10.0.0.1. To continue to send the HTTP request, it also requires the server’s MAC address. First, the requesting computer consults a cached ARP table to look up 10.0.0.1 for any existing records of the server’s MAC address. If the MAC address is found, it sends an Ethernet frame with a destination address of the server’s MAC address, containing the IP packet addressed to 10.0.0.1, onto the link. If the cache did not produce a hit for 10.0.0.1, the requesting computer must send a broadcast ARP request message with a destination MAC address of FF:FF:FF:FF:FF:FF, which is accepted by all hosts on the local network, requesting an answer for 10.0.0.1. The server responds with an ARP response message containing its MAC and IP address. As part of answering the request, the server may insert an entry for the requesting computer’s MAC into its ARP table for future use. The requesting computer receives and caches the response information in its ARP table and can now send the HTTP packets.

ARP Request
Figure 1-22. ARP request
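
The lookup-then-broadcast flow can be sketched as a small simulation in Go. No packets are sent here: the lan map is a hypothetical stand-in for the hosts that would answer a broadcast on the local network, and the names arpCache, lan, and resolve, along with the MAC address used for the server, are assumptions made for this example.

```go
package main

import "fmt"

// broadcastMAC is the destination address of an ARP request.
const broadcastMAC = "ff:ff:ff:ff:ff:ff"

// arpCache maps IPv4 addresses to the MAC addresses learned for them.
type arpCache map[string]string

// lan simulates the broadcast domain: the IP and MAC of each host
// that would reply to an ARP request for its own address.
type lan map[string]string

// resolve consults the cache first. On a miss it simulates a broadcast
// ARP request and caches the reply, if any host owns the address.
// It reports the MAC, whether a broadcast was needed, and whether
// the address was resolved.
func (c arpCache) resolve(ip string, network lan) (string, bool, bool) {
	if mac, hit := c[ip]; hit {
		return mac, false, true
	}
	// Cache miss: the request goes to every host on the local network,
	// but only the owner of the IP address answers.
	mac, ok := network[ip]
	if ok {
		c[ip] = mac // remember the answer for future frames
	}
	return mac, true, ok
}

func main() {
	cache := arpCache{}
	network := lan{"10.0.0.1": "bc:a5:11:f1:5d:be"} // hypothetical server entry

	for i := 0; i < 2; i++ {
		mac, broadcast, ok := cache.resolve("10.0.0.1", network)
		if broadcast {
			fmt.Println("cache miss, ARP request sent to", broadcastMAC)
		}
		fmt.Printf("resolved=%v mac=%s\n", ok, mac)
	}
}
```

The second pass through the loop is served from the cache, which is the reason hosts keep an ARP table at all.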

The ARP exchange also introduces a crucial concept on the local network: broadcast domains. All hosts in the broadcast domain receive all the ARP messages from other hosts. In addition, all frames are sent to all nodes in the broadcast domain; each host compares the destination MAC address to its own and discards frames not destined for itself. As the number of hosts on the network grows, so too does the broadcast traffic. We can use tcpdump to view all the ARP requests happening on the local network, as in Example 1-9. The packet capture details the ARP packets: the hardware type used, Ethernet (len 6); the higher-level protocol, IPv4 (len 4); and who is requesting the MAC of the IP address, Request who-has 192.168.0.1 tell 192.168.0.12.

Example 1-9. ARP tcpdump
○ → sudo tcpdump -i en0 arp -vvv
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:26:25.906401 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:27.954867 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:29.797714 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:31.845838 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:33.897299 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:35.942221 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:37.785585 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:39.628958 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.13, length 28
17:26:39.833697 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:41.881322 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:43.929320 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:45.977691 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
17:26:47.820597 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.12, length 46
^C
13 packets captured
233 packets received by filter
0 packets dropped by kernel

To further segment the layer two network, network engineers can use Virtual Local Area Network (VLAN) tagging. Inside the Ethernet frame header is an optional VLAN tag that differentiates traffic on the LAN. VLANs are useful for breaking up LANs and managing networks on the same switch or on different switches across the network campus. Routers between VLANs filter broadcast traffic, enable network security, and alleviate network congestion. VLANs are useful to the network administrator for those purposes, but Kubernetes network administrators can use the extended version of VLAN technology known as Virtual Extensible LAN (VXLAN).

In Figure 1-23 we can see how VXLAN, an extension of VLAN, allows network engineers to encapsulate layer two frames into layer four UDP packets. It increases scalability up to 16 million logical networks and allows for layer two adjacency across IP networks. This technology is used in Kubernetes networks to produce overlay networks; more on that in later chapters.

VXLAN
Figure 1-23. VXLAN packet
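
The "16 million" figure comes from the 24-bit VXLAN Network Identifier (VNI) in the VXLAN header: 2^24 = 16,777,216 possible values. The Go sketch below encodes and decodes the eight-byte VXLAN header as laid out in RFC 7348, assuming only the VNI-valid flag is set. The names encodeVXLAN, decodeVNI, and maxVNI, and the VNI value 5000, are hypothetical choices for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxVNI is the largest value that fits in the 24-bit VNI field.
const maxVNI = 1<<24 - 1

// encodeVXLAN builds the eight-byte VXLAN header for a given VNI.
// Layout: flags (1 byte), reserved (3 bytes), VNI (3 bytes), reserved (1 byte).
func encodeVXLAN(vni uint32) ([8]byte, error) {
	var h [8]byte
	if vni > maxVNI {
		return h, errors.New("VNI does not fit in 24 bits")
	}
	h[0] = 0x08 // I flag: the VNI field is valid
	h[4] = byte(vni >> 16)
	h[5] = byte(vni >> 8)
	h[6] = byte(vni)
	return h, nil
}

// decodeVNI extracts the VNI, reporting false if the I flag is not set.
func decodeVNI(h [8]byte) (uint32, bool) {
	if h[0]&0x08 == 0 {
		return 0, false
	}
	return uint32(h[4])<<16 | uint32(h[5])<<8 | uint32(h[6]), true
}

func main() {
	h, err := encodeVXLAN(5000) // hypothetical VNI
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	vni, ok := decodeVNI(h)
	fmt.Printf("header=% x vni=%d valid=%v\n", h, vni, ok)
	fmt.Println("possible logical networks:", maxVNI+1)
}
```

On the wire, this header sits inside a UDP datagram and is followed by the original layer two frame.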

Ethernet also details the specifications for the medium used to transmit frames, such as twisted pair, coaxial cable, optical fiber, wireless, or other transmission media yet to be invented (such as a gamma-ray network that powers the Philotic Parallax Instantaneous Communicator 1). Ethernet even defines the encoding and signaling protocols used on the wire; this is out of scope for our purposes.

The Link layer involves multiple other protocols from a network perspective. As with the layers above, we have only scratched the surface of the link layer. We have constrained this book to the details needed for a base understanding of the Link layer for the Kubernetes networking model.

Revisiting our Web Server

Our journey through all the layers of TCP/IP is complete. Figure 1-24 outlines all the headers and footers each layer of TCP/IP produces to send data across the Internet.

Full view
Figure 1-24. TCP/IP PDU full view

Let us recap the journey and remind ourselves again what is going on now that we understand each layer in detail. Here is our web server again, and the cURL request for it from earlier in the chapter.

Example 1-10. Minimal web server in Go
package main

import (
	"fmt"
	"net/http"
)

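// hello writes a plain-text greeting for every request it handles.
func hello(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprintf(w, "Hello")
}

func main() {
	// Route all paths to the hello handler.
	http.HandleFunc("/", hello)
	// Listen on TCP port 8080 on all interfaces; this call blocks while serving.
	http.ListenAndServe("0.0.0.0:8080", nil)
}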
Example 1-11. Client Request
○ → curl localhost:8080 -vvv
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 25 Jul 2020 14:57:46 GMT
< Content-Length: 5
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
Hello* Closing connection 0

We begin with the web server waiting for a connection in Example 1-10. cURL requests the HTTP server at localhost on port 8080. cURL determines the IP address and port number from the URL and proceeds to establish a TCP connection to the server. Once the connection is set up through the TCP handshake, cURL sends the HTTP request. When the web server starts up, a socket bound to TCP port 8080 is created on the HTTP server; the same is done on the cURL client side with a random port number. Next, this information is sent to the Network Protocol Layer, where the source and destination IP addresses are attached to the packet’s IP header. At the client’s data link layer, the source MAC address of the NIC is added to the Ethernet frame. If the destination MAC address is unknown, an ARP request is made to find it. Next, the NIC is used to transmit the Ethernet frames to the web server.
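
The pairing of a fixed server port with a random client port can be observed directly in Go. This is a minimal sketch rather than part of the chapter's web server: it listens on a loopback port chosen by the operating system (an assumption made to avoid clashing with anything already bound to 8080), dials that listener, and reports both ends of the connection. The function name connectPorts is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// connectPorts opens a TCP listener on the loopback interface, connects
// to it, and returns the server port and the client's ephemeral port.
func connectPorts() (serverPort, clientPort int, err error) {
	// Port 0 asks the operating system to pick a free port for the server.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	// Dial performs the TCP handshake; the OS assigns the client its own
	// source port for this connection.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, 0, err
	}
	defer conn.Close()

	serverPort = conn.RemoteAddr().(*net.TCPAddr).Port
	clientPort = conn.LocalAddr().(*net.TCPAddr).Port
	return serverPort, clientPort, nil
}

func main() {
	server, client, err := connectPorts()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server port:", server)
	fmt.Println("client port:", client)
}
```

Running it several times should show the client port changing between runs, which is the "random port number" the walkthrough refers to.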

When the web server receives the request, it creates packets of data that contain the HTTP response. The packets are sent back to the cURL process by routing them through the Internet using the source IP address on the request packet. Once received by the client machine, the packet is sent from the device to the drivers. At the data link layer, the MAC address is removed. At the Network Protocol Layer, the IP address is verified and then removed from the packet. For this reason, if an application requires access to the client IP, it needs to be stored at the Application Layer; the best example here is in HTTP requests and the X-Forwarded-For header. Now the socket is determined from the TCP data, and the TCP header is removed. The packet is then forwarded to the client application that created that socket. The client reads it and processes the response data. In this case, the socket ID was random, corresponding to the cURL process. All packets are sent to cURL and pieced together into one HTTP response. If we were to use the -o output option, the response would have been saved to a file; otherwise, cURL outputs the response to the terminal’s standard out.
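
As a hedged illustration of carrying the client IP at the application layer, the Go sketch below reads the first address from the X-Forwarded-For header and falls back to the address of the TCP peer when the header is absent. The helper names clientIP and newRequest and all addresses are hypothetical. Since a client can set this header to any value, a real deployment would normally trust it only when the request arrives from a known proxy.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// clientIP returns the first address in X-Forwarded-For when the header
// is present, and otherwise the host part of the connection's RemoteAddr.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		// The header is a comma-separated list; the first entry is the
		// address reported for the original client.
		first := strings.Split(xff, ",")[0]
		return strings.TrimSpace(first)
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}

// newRequest builds a test request with a made-up peer address and an
// optional X-Forwarded-For value.
func newRequest(xff string) *http.Request {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.RemoteAddr = "198.51.100.4:51234" // hypothetical proxy or peer address
	if xff != "" {
		req.Header.Set("X-Forwarded-For", xff)
	}
	return req
}

func main() {
	fmt.Println("with header:   ", clientIP(newRequest("203.0.113.7, 10.0.0.2")))
	fmt.Println("without header:", clientIP(newRequest("")))
}
```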

Whew, that is a mouthful, fifty pages and fifty years of networking condensed into two paragraphs!

The basics of networking we have reviewed are just the beginning but are required knowledge if you want to run Kubernetes Clusters and Networks at scale.

Conclusion

The HTTP transactions modeled in this chapter happen every millisecond, globally, all day on the Internet and datacenter networks. This is the type of scale that the Kubernetes network APIs help developers abstract away into simple YAML. Understanding the scale of the problem is our first step in mastering the management of the Kubernetes network. By taking our simple example of the Go web server and learning the first principles of networking, we can begin to wrangle the packets flowing into and out of our clusters.

So far, we have covered:

  • History of Networking

  • OSI Model

  • TCP/IP

Throughout this chapter, we have discussed many things related to networking, but only those needed to learn about the abstractions Kubernetes puts in place. There are several O’Reilly books on TCP/IP; TCP/IP Network Administration (3rd Edition; O’Reilly Networking) by Craig Hunt is a great in-depth read on all aspects of TCP/IP.

We have discussed how networking evolved, walked through the OSI model, translated it to the TCP/IP stack, and with that stack completed an example HTTP request. In the next chapter, we walk through how this is implemented for the client and server with Linux Networking.

1 In Ender’s Game, they use the Ansible network to communicate across the galaxy instantly. Philotic Parallax Instantaneous Communicator is the official name of the Ansible network.
