Chapter 12: Implementing Container Networking Concepts

Container network isolation leverages network namespaces to provide a separate network stack for each container. Without a container runtime, managing network interfaces across multiple namespaces would be complex. Podman provides flexible network management that allows users to customize how containers communicate with the external world and with other containers on the same host.

In this chapter, we will learn about the common configuration practices for managing container networking, along with the differences between rootless and rootfull containers.

In this chapter, we're going to cover the following main topics:

  • Container networking and Podman setup
  • Interconnecting two or more containers
  • Exposing containers outside our underlying host
  • Rootless container network behavior

Technical requirements

To complete this chapter, you will need a machine with a working Podman installation. As we mentioned in Chapter 3, Running the First Container, all the examples in this book can be executed on a Fedora 34 system or later but can be reproduced on your operating system (OS) of choice. The examples in this chapter will be related to both Podman v3.4.z and Podman v4.0.0 since they provide different network implementations.

A good understanding of the topics that were covered in Chapter 4, Managing Running Containers, Chapter 5, Implementing Storage for the Container's Data, and Chapter 9, Pushing Images to a Container Registry, will help you grasp the container networking topics we'll be covering.

You must also have a good understanding of basic networking concepts to understand topics such as routing, the IP protocol, DNS, and firewalling.

Container networking and Podman setup

In this section, we'll cover Podman's networking implementation and how to configure networks. Podman 4.0.0 introduced an important change to the network stack. However, Podman 3 is still widely used in the community. For this reason, we will cover both implementations.

Podman 3 leverages the Container Network Interface (CNI) to manage local networks that are created on the host. The CNI provides a standard set of specifications and libraries to create and configure plugin-based network interfaces in a container environment.

CNI specifications were created for Kubernetes to provide a network configuration format that's used by the container runtime to set up the defined plugins, as well as an execution protocol between plugin binaries and runtimes. The great advantage of this plugin-based approach is that vendors and communities can develop third-party plugins that satisfy the CNI's specifications.

The Podman 4 network stack is based on a brand new project called Netavark, a container-native networking implementation completely written in Rust and designed to work with Podman. Rust is a great programming language for developing system and network components thanks to its efficient memory management and high performance, similar to the C programming language. Netavark provides better support for dual-stack networking (IPv4/IPv6) and inter-container DNS resolution, along with a tighter bond with the Podman project development roadmap.

Important Note

Users upgrading from Podman 3 to Podman 4 will continue to use CNI by default and preserve their previous configuration. New Podman 4 installations will use Netavark by default. Users can revert to the CNI network backend by updating the network_backend field in the /usr/share/containers/containers.conf file.
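
As a minimal sketch (assuming a Fedora-like layout), the backend can be selected by adding an override to /etc/containers/containers.conf, which takes precedence over the defaults shipped in /usr/share/containers/containers.conf:

[network]
network_backend = "cni"

The active backend should then be reported by podman info:

# podman info --format '{{.Host.NetworkBackend}}'
cni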

In the next subsection, we'll focus on the CNI configuration that's used by Podman 3 to orchestrate container networking.

CNI configuration quick start

A typical CNI configuration file defines a list of plugins and their related configuration. The following example shows the default CNI configuration of a fresh Podman installation on Fedora:

Chapter12/podman_cni_conf.json

  "cniVersion": "0.4.0",

  "name": "podman",

  "plugins": [

    {

      "type": "bridge",

      "bridge": "cni-podman0",

      "isGateway": true,

      "ipMasq": true,

      "hairpinMode": true,

      "ipam": {

        "type": "host-local",

        "routes": [{ "dst": "0.0.0.0/0" }],

        "ranges": [

          [

            {

              "subnet": "10.88.0.0/16",

              "gateway": "10.88.0.1"

            }

          ]

        ]

      }

    },

    {

      "type": "portmap",

      "capabilities": {

        "portMappings": true

      }

    },

    {

      "type": "firewall"

    },

    {

      "type": "tuning"

    }

  ]

}

As we can see, the plugins list in this file contains a set of plugins that are used by the runtime to orchestrate container networking.

The CNI community curates a repository of reference plugins that can be used by container runtimes. CNI reference plugins are organized into interface-creating, IP address management (IPAM), and Meta plugins. Interface-creating plugins can make use of IPAM and Meta plugins.

The following non-exhaustive list describes the most commonly used interface-creating plugins:

  • bridge: This plugin creates a dedicated Linux bridge on the host for the network. Container interfaces are attached to the managed bridge to communicate between each other and with the external systems. This plugin is currently supported by Podman and by the podman network CLI tools and is the default interface-creating plugin that's configured when Podman is installed or a new network is created.
  • ipvlan: This plugin allows you to attach an IPVLAN interface to the container. The IPVLAN solution is an alternative to the traditional Linux bridge networking solution for containers, where a single parent interface is shared across multiple sub-interfaces, each with its own IP address. This plugin is currently not supported by the podman network CLI tools, but you can still manually create and edit the CNI configuration file if necessary.
  • macvlan: This plugin allows a MACVLAN configuration, which is an approach similar to IPVLAN with one main difference: in this configuration, each container sub-interface also gets its own MAC address. This plugin is currently supported by Podman and by the podman network CLI tools (see the sketch after this list).
  • host-device: This plugin allows you to directly pass an existing interface into a container. This is currently not supported by Podman.
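
As a hedged sketch of the macvlan plugin mentioned above (assuming eth0 is the host's parent interface and 192.168.121.0/24 is the LAN subnet; adjust both to your environment), such a network could be created directly from the Podman CLI:

# podman network create -d macvlan \
  -o parent=eth0 \
  --subnet 192.168.121.0/24 \
  --gateway 192.168.121.1 \
  macvlan-example

Containers attached to this network receive a sub-interface with its own MAC address directly on the physical LAN.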

CNI IPAM plugins are related to the IP address management inside containers. There are only three reference IPAM plugins:

  • dhcp: This plugin lets you execute a daemon on the host that manages the DHCP leases on behalf of the running containers. It also implies that a DHCP server is already running on the host network.
  • host-local: This plugin is used to allocate IP addresses to containers using a defined address range. The allocation data is stored in the host filesystem. It is optimal for local container execution and is the default IPAM plugin that's used by Podman in the network bridge.
  • static: This is a basic plugin that manages a discrete list of static addresses that are assigned to containers.

CNI Meta plugins are used to configure specific behaviors in the host, such as tuning, firewall rules, and port mapping, and are executed as chained plugins along with the interface-creating plugins. The current Meta plugins that are maintained in the reference plugins repository are as follows:

  • portmap: This plugin is used to manage port mapping between the container and the host. It applies its configuration using the host firewall (iptables) and is responsible for creating Source NAT (SNAT) and Destination NAT (DNAT) rules. This plugin is enabled by default in Podman.
  • firewall: This plugin configures firewall rules to allow container ingress and egress traffic. It's enabled by default in Podman.
  • tuning: This plugin customizes system tuning (using sysctl parameters) and interface attributes in the network namespace. It's enabled by default in Podman.
  • bandwidth: This plugin can be used to configure traffic rate limiting on containers using the Linux traffic control subsystem (see the sketch after this list).
  • sbr: This plugin is used to configure source-based routing (SBR) on interfaces.
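
As an illustrative sketch only (the values are arbitrary), the bandwidth Meta plugin described in this list could be chained into the plugins array of a CNI network configuration to cap container traffic at roughly 1 Mbit/s:

{
  "type": "bandwidth",
  "ingressRate": 1000000,
  "ingressBurst": 1000000,
  "egressRate": 1000000,
  "egressBurst": 1000000
}

Rates are expressed in bits per second and bursts in bits, following the CNI bandwidth plugin conventions.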

    Important Note

    On a Fedora system, all the CNI plugin binaries are located in the /usr/libexec/cni folder and are provided by the containernetworking-plugins package, installed as a Podman dependency.

Going back to the CNI configuration example, we can see that the default Podman configuration uses a bridge plugin with host-local IP address management and that the portmap, tuning, and firewall plugins are chained together with it.

In the default network that's created for Podman, the subnet allocated for container networking is 10.88.0.0/16, and the bridge, called cni-podman0, acts as the default gateway for containers at 10.88.0.1, implying that all outbound traffic from a container is directed to the bridge interface.

Important Note

This configuration is applied to rootfull containers only. Later in this chapter, we'll learn that Podman uses a different networking approach for rootless containers to overcome the user's limited privileges. We will see that this approach has many limitations on host interfaces and IP address management.

Now, let's see what happens on the host when a new rootfull container is created.

Podman CNI walkthrough

In this subsection, we will investigate the most peculiar network events that occur when a new container is created when CNI is used as a network backend.

Important Note

All the examples in this subsection are executed as the root user. Ensure that you clean up the existing running containers to have a clearer view of the network interfaces and firewall rules.

We will try to run an example using the Nginx container and map its default internal port, 80/tcp, to the host port, 8080/tcp.

Before we begin, we want to verify the current host's IP configuration:

# ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000

    link/ether 52:54:00:a9:ce:df brd ff:ff:ff:ff:ff:ff

    altname enp0s5

    altname ens5

    inet 192.168.121.189/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0

       valid_lft 3054sec preferred_lft 3054sec

    inet6 fe80::2fb:9732:a0d9:ac70/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

3: cni-podman0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000

    link/ether de:52:45:ae:1a:7f brd ff:ff:ff:ff:ff:ff

    inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0

       valid_lft forever preferred_lft forever

    inet6 fe80::dc52:45ff:feae:1a7f/64 scope link

       valid_lft forever preferred_lft forever

Along with the host's main interface, eth0, we can see a cni-podman0 bridge interface with an address of 10.88.0.1/16. Also, notice that the bridge's state is set to DOWN.

Important

If the host that's being used for the test is a fresh install and Podman has never been executed before, the cni-podman0 bridge interface will not be listed. This is not a problem – it will be created when a rootfull container is created for the first time.

If no other container is running on the host, we should see no interface attached to the virtual bridge. To verify this, we are going to use the bridge link show command, whose output is expected to be empty:

# bridge link show cni-podman0

Looking at the firewall rules, we do not expect to see rules related to containers in the filter and nat tables:

# iptables -L

# iptables -L -t nat

Important Note

The output of the preceding commands has been omitted for the sake of brevity, but it is worth noting that the filter table should already contain two CNI-related chains named CNI-ADMIN and CNI-FORWARD.

Finally, we want to inspect the routing rules for the cni-podman0 interface:

# ip route show dev cni-podman0

10.88.0.0/16 proto kernel scope link src 10.88.0.1 linkdown

This command says that all traffic going to the 10.88.0.0/16 network goes through the cni-podman0 interface.

Let's run our Nginx container and see what happens to the network interfaces, routing, and firewall configuration:

# podman run -d -p 8080:80 \
  --name net_example docker.io/library/nginx

The first and most interesting event is a new network interface being created, as shown in the output of the ip addr show command:

# ip addr show

[...omitted output...]

3: cni-podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

    link/ether de:52:45:ae:1a:7f brd ff:ff:ff:ff:ff:ff

    inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0

       valid_lft forever preferred_lft forever

    inet6 fe80::dc52:45ff:feae:1a7f/64 scope link

       valid_lft forever preferred_lft forever

5: vethcf8b2132@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman0 state UP group default

    link/ether b6:4c:1d:06:39:5a brd ff:ff:ff:ff:ff:ff link-netns cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d

    inet6 fe80::90e3:98ff:fe6a:acff/64 scope link

       valid_lft forever preferred_lft forever

This new interface is part of a veth pair (see man 4 veth), a couple of virtual Ethernet devices that act like a local tunnel. Veth pairs are native Linux kernel virtual interfaces that don't depend on a container runtime and can be applied to use cases that go beyond container execution.

The interesting part of veth pairs is that they can be spawned across multiple network namespaces and that a packet that's sent to one side of the pair is immediately received on the other side.
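
To see that veth pairs are plain kernel objects that don't need Podman at all, the following hedged sketch (the demo-ns, veth-host, and veth-cont names are arbitrary) builds a pair by hand, moves one end into a scratch namespace, and pings across it:

# ip netns add demo-ns
# ip link add veth-host type veth peer name veth-cont
# ip link set veth-cont netns demo-ns
# ip addr add 192.168.200.1/24 dev veth-host
# ip link set veth-host up
# ip netns exec demo-ns ip addr add 192.168.200.2/24 dev veth-cont
# ip netns exec demo-ns ip link set veth-cont up
# ping -c1 192.168.200.2
# ip netns del demo-ns

Deleting the namespace at the end also destroys both ends of the pair.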

The vethcf8b2132@if2 interface is linked to a device that resides in a network namespace named cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d. Since Linux offers us the option to inspect network namespaces using the ip netns command, we can check if the namespace exists and inspect its network stack:

# ip netns

cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d (id: 0)

Hint

When a new network namespace is created, a file with the same name is created under /var/run/netns/. This file has the same inode number as the one pointed to by the /proc/<PID>/ns/net symlink of any process running in that namespace. When the file is opened, the returned file descriptor gives access to the namespace.
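
A quick way to verify this on our example container (a hedged sketch; the namespace name will differ on your host) is to compare the namespace file's inode with the one referenced by the container's main process:

# stat -Lc %i /var/run/netns/cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d
# readlink /proc/$(podman inspect net_example --format '{{.State.Pid}}')/ns/net

The readlink output has the net:[<inode>] form, and the inode should match the one printed by stat.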

The preceding command confirms that the network namespace exists. Now, we want to inspect the network interfaces that have been defined inside it:

# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

    link/ether fa:c9:6e:5c:db:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0

    inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0

       valid_lft forever preferred_lft forever

    inet6 fe80::f8c9:6eff:fe5c:dbad/64 scope link

       valid_lft forever preferred_lft forever

Here, we executed an ip addr show command that's nested inside the ip netns exec command. The output shows us an interface that is on the other side of our veth pair. This also tells us something valuable: the container's IPv4 address, set to 10.88.0.3.

Hint

If you're curious, the container IP configuration, when using Podman's default network with the host-local IPAM plugin, is persisted to the /var/lib/cni/networks/podman folder. Here, a file named after the assigned IP address is created, and the container ID is written inside it.

If a new network is created and used by a container, its configuration will be persisted in the /var/lib/cni/networks/<NETWORK_NAME> folder.
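
For instance, with the net_example container running on the default network, a listing along these lines is expected (a sketch; the file names follow the addresses that were actually assigned):

# ls /var/lib/cni/networks/podman/
10.88.0.3  last_reserved_ip.0  lock

The file named 10.88.0.3 contains the ID of the container that currently holds that address.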

We can also inspect the container's routing tables:

# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d ip route

default via 10.88.0.1 dev eth0

10.88.0.0/16 dev eth0 proto kernel scope link src 10.88.0.3

All the outbound traffic that's directed to the external networks will go through the 10.88.0.1 address, which has been assigned to the cni-podman0 bridge.

When a new container is created, the firewall and portmapper CNI plugins apply the necessary rules in the host filter and NAT tables. In the following code, we can see the rules that have been applied to the container IP address in the nat table, where SNAT, DNAT, and masquerading rules have been applied:

# iptables -L -t nat -n | grep -B4 10.88.0.3

Chain POSTROUTING (policy ACCEPT)

target     prot opt source               destination        

CNI-HOSTPORT-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            /* CNI portfwd requiring masquerade */

CNI-fb51a7bfa5365a8a89e764fd  all  --  10.88.0.3            0.0.0.0/0            /* name: "podman" id: "a5054cca3436a7bc4dbf78fe4b901ceef0569ced24181d2e7b118232123a5fe3" */

--

Chain CNI-DN-fb51a7bfa5365a8a89e76 (1 references)

target     prot opt source               destination        

CNI-HOSTPORT-SETMARK  tcp  --  10.88.0.0/16         0.0.0.0/0            tcp dpt:8080

CNI-HOSTPORT-SETMARK  tcp  --  127.0.0.1            0.0.0.0/0            tcp dpt:8080

DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:10.88.0.3:80

The last line shows a DNAT rule in a custom chain named CNI-DN-fb51a7bfa5365a8a89e76. This rule says that all the TCP packets whose destination is the 8080/tcp port on the host should be redirected to 10.88.0.3:80, which is the network socket that's exposed by the container. This rule matches the -p 8080:80 option that we passed during container creation.
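
A quick functional check of this mapping (a hedged example; the exact response depends on the Nginx version) is to query the published port from the host:

# curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080
200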

But how does the container communicate with the external world? Let's inspect the cni-podman0 bridge again while looking for notable changes:

# bridge link show cni-podman0

5: vethcf8b2132@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman0 state forwarding priority 32 cost 2

The aforementioned interface is connected to the virtual bridge, which also happens to have an IP address assigned to it (10.88.0.1) that acts as the default gateway for all the containers.

Let's try to trace the path of an ICMP packet from the container to a well-known host, 1.1.1.1 (Cloudflare public DNS). To do so, we must run the traceroute utility from the container network's namespace using the ip netns exec command:

# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d traceroute -I 1.1.1.1

traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets

1  _gateway (10.88.0.1)  0.071 ms  0.025 ms  0.003 ms

2  192.168.121.1 (192.168.121.1)  0.206 ms  0.195 ms  0.189 ms

3  192.168.1.1 (192.168.1.1)  5.326 ms  5.323 ms  5.319 ms

4  192.168.50.6 (192.168.50.6)  17.598 ms  17.595 ms  17.825 ms

5  192.168.50.5 (192.168.50.5)  17.821 ms  17.888 ms  17.882 ms

6  10.177.21.173 (10.177.21.173)  17.998 ms  17.772 ms  24.777 ms

7  185.210.48.42 (185.210.48.42)  25.963 ms  7.604 ms  7.702 ms

8  185.210.48.43 (185.210.48.43)  7.906 ms  10.344 ms  10.984 ms

9  185.210.48.77 (185.210.48.77)  12.212 ms  12.030 ms  12.983 ms

10  1.1.1.1 (1.1.1.1)  12.524 ms  12.160 ms  12.649 ms

Important Note

The traceroute program may not be installed on the host by default. To install it on Fedora, run the sudo dnf install traceroute command.

The preceding output shows a series of hops, which are a way to count the number of routers that a packet must pass through to reach a destination. In this example, a total of 10 hops are needed to reach the target node. The first hop goes through the container's default gateway (10.88.0.1), moving to the host's network stack.

The second hop is the host's default gateway (192.168.121.1), which is assigned to a virtual bridge in a hypervisor and connected to our lab's host VM.

The third hop is a private network default gateway (192.168.1.1) that's assigned to a physical router that's connected to the lab's hypervisor network.

This demonstrates that all the traffic goes through the cni-podman0 bridge interface.

We can create more than one network, either using Podman native commands or our favorite editor to manage JSON files directly.

Now that we've explored CNI's implementation and configuration details, let's look at the new Netavark implementation in Podman 4.

Netavark configuration quick start

Podman's 4.0.0 release introduced Netavark as the default network backend. The advantages of Netavark are as follows:

  • Support for dual IPv4/IPv6 stacks
  • Support for DNS native resolution using the aardvark-dns companion project
  • Support for rootless containers
  • Support for different firewall implementations, including iptables, firewalld, and nftables

The configuration files that are used by Netavark are not very different from the ones that were shown for CNI. Netavark still uses JSON format to configure networks; files are stored under the /etc/containers/networks path for rootfull containers and the ~/.local/share/containers/storage/networks path for rootless containers.

The following configuration file shows an example network that's been created and managed under Netavark:

[
     {
          "name": "netavark-example",
          "id": "d98700453f78ea2fdfe4a1f77eae9e121f3cbf4b6160dab89edf9ce23cb924d7",
          "driver": "bridge",
          "network_interface": "podman1",
          "created": "2022-02-17T21:37:59.873639361Z",
          "subnets": [
               {
                    "subnet": "10.89.4.0/24",
                    "gateway": "10.89.4.1"
               }
          ],
          "ipv6_enabled": false,
          "internal": false,
          "dns_enabled": true,
          "ipam_options": {
               "driver": "host-local"
          }
     }
]

The first noticeable element is the more compact size of the configuration file compared to a CNI configuration. The following fields are defined:

  • name: The name of the network.
  • id: The unique network ID.
  • driver: This specifies the kind of network driver that's being used. The default is bridge. Netavark also supports MACVLAN drivers.
  • network_interface: This is the name of the network interface associated with the network. If bridge is the configured driver, this will be the name of the Linux bridge. In the preceding example, a bridge is created called podman1.
  • created: The network creation timestamp.
  • subnets: This provides a list of subnet and gateway objects. Subnets are assigned automatically; however, when creating a new network with Podman, users can provide a custom CIDR. Netavark allows you to manage multiple subnets and gateways on a network.
  • ipv6_enabled: Native support for IPv6 in Netavark can be enabled or disabled with this boolean.
  • internal: This boolean is used to configure a network for internal use only and to block external routing.
  • dns_enabled: This boolean enables DNS resolution for the network and is served by the aardvark-dns daemon.
  • ipam_options: This object defines a series of ipam parameters. In the preceding example, the only option is the kind of IPAM driver, host-local, which behaves in a way similar to the CNI host-local plugin.

The default Podman 4 network, named podman, implements a bridge driver (the bridge's name is podman0). Here, DNS support is disabled, similar to what happens with the default CNI configuration.
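
This can be double-checked with a hedged one-liner (the exact JSON layout may vary slightly between Podman versions) against the default network:

# podman network inspect podman | grep dns_enabled
          "dns_enabled": false,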

Netavark is also an executable binary that's installed by default in the /usr/libexec/podman/netavark path. It has a simple command-line interface (CLI) that implements the setup and teardown commands, applying the network configuration to a given network namespace (see man netavark).

Now, let's look at the effects of creating a new container with Netavark.

Podman Netavark walkthrough

Like CNI, Netavark manages the creation of network configurations in the container network namespace and the host network namespace, including the creation of veth pairs and the Linux bridge that's defined in the config file.

Before the first container is created in the default Podman network, no bridges are created and the host interfaces are the only ones available, along with the loopback interface:

# ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000

    link/ether 52:54:00:9a:ea:f4 brd ff:ff:ff:ff:ff:ff

    altname enp0s5

    altname ens5

    inet 192.168.121.15/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0

       valid_lft 3293sec preferred_lft 3293sec

    inet6 fe80::d0fb:c0d1:159e:2d54/64 scope link noprefixroute

       valid_lft forever preferred_lft forever

Let's run a new Nginx container and see what happens:

# podman run -d -p 8080:80 \
  --name nginx-netavark \
  docker.io/library/nginx

When the container is started, the podman0 bridge and a veth interface appear:

# ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000

    link/ether 52:54:00:9a:ea:f4 brd ff:ff:ff:ff:ff:ff

    altname enp0s5

    altname ens5

    inet 192.168.121.15/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0

       valid_lft 3140sec preferred_lft 3140sec

    inet6 fe80::d0fb:c0d1:159e:2d54/64 scope link noprefixroute

       valid_lft forever preferred_lft forever

3: veth2772d0ea@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman0 state UP group default qlen 1000

    link/ether fa:a3:31:63:21:60 brd ff:ff:ff:ff:ff:ff link-netns netns-61a5f9f9-9dff-7488-3922-165cdc6cd320

    inet6 fe80::f8a3:31ff:fe63:2160/64 scope link

       valid_lft forever preferred_lft forever

8: podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

    link/ether ea:b4:9d:dd:2c:d1 brd ff:ff:ff:ff:ff:ff

    inet 10.88.0.1/16 brd 10.88.255.255 scope global podman0

       valid_lft forever preferred_lft forever

    inet6 fe80::24ec:30ff:fe1a:2ca8/64 scope link

       valid_lft forever preferred_lft forever

Compared to the CNI walkthrough provided previously, there are no particular changes for the end user in terms of network namespace management, firewall rules, or routing.

Again, a network namespace in the host is created for the nginx-netavark container. Let's inspect the contents of the network namespace:

# ip netns exec netns-61a5f9f9-9dff-7488-3922-165cdc6cd320 ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:9b:7f:07:3f:16 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    inet 10.88.0.4/16 brd 10.88.255.255 scope global eth0

       valid_lft forever preferred_lft forever

    inet6 fe80::ac9b:7fff:fe07:3f16/64 scope link

       valid_lft forever preferred_lft forever

Once again, it is possible to find the internal IP address that's been assigned to the container.

If the container is executed in rootless mode, the bridge and veth pairs will be created in a rootless network namespace.

Important Note

The rootless network namespace can be inspected in Podman 4 with the podman unshare --rootless-netns command.

Users running Podman 3 and CNI can use the --rootless-cni option to obtain the same results.
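
For example, a rootless user could list the interfaces inside that namespace (typically a tap device created by slirp4netns, plus any rootless bridges) with the following sketch, which assumes at least one rootless container is running:

$ podman unshare --rootless-netns ip addr show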

In the next subsection, we will learn how to manage and customize container networks with the CLI tools that are offered by Podman.

Managing networks with Podman

The podman network command provides the necessary tools for managing container networks. The following subcommands are available:

  • create: Creates a new network
  • connect: Connects to a given network
  • disconnect: Disconnects from a network
  • exists: Checks if a network exists
  • inspect: Dumps the CNI configuration of a network
  • prune: Removes unused networks
  • reload: Reloads container firewall rules
  • rm: Removes a given network

In this section, you will learn how to create a new network and connect a container to it. For Podman 3, all the generated CNI config files are written to the /etc/cni/net.d folder in the host.

For Podman 4, all the generated Netavark config files for rootfull networks are written to /etc/containers/networks, while the config files for rootless networks are written to ~/.local/share/containers/storage/networks.

The following command creates a new network called example1:

# podman network create \
  --driver bridge \
  --gateway "10.89.0.1" \
  --subnet "10.89.0.0/16" example1

Here, we provided subnet and gateway information, along with the driver type that corresponds to the CNI interface-creating plugin. The resulting network configuration is written in the aforementioned paths according to the kind of network backend and can be inspected with the podman network inspect command.

The following output shows the configuration for a CNI network backend:

# podman network inspect example1

[
    {
        "cniVersion": "0.4.0",
        "name": "example1",
        "plugins": [
            {
                "bridge": "cni-podman1",
                "hairpinMode": true,
                "ipMasq": true,
                "ipam": {
                    "ranges": [
                        [
                            {
                                "gateway": "10.89.0.1",
                                "subnet": "10.89.0.0/16"
                            }
                        ]
                    ],
                    "routes": [
                        {
                            "dst": "0.0.0.0/0"
                        }
                    ],
                    "type": "host-local"
                },
                "isGateway": true,
                "type": "bridge"
            },
            {
                "capabilities": {
                    "portMappings": true
                },
                "type": "portmap"
            },
            {
                "backend": "",
                "type": "firewall"
            },
            {
                "type": "tuning"
            },
            {
                "capabilities": {
                    "aliases": true
                },
                "domainName": "dns.podman",
                "type": "dnsname"
            }
        ]
    }
]

The new network CNI configuration shows that a bridge called cni-podman1 will be created for this network and that containers will allocate IPs from the 10.89.0.0/16 subnet.

The other fields of the configuration are pretty similar to the default one, except for the dnsname plugin (project's repository: https://github.com/containers/dnsname), which is used to enable internal container name resolution. This feature provides an advantage in cross-container communication that we will look at in the next subsection.

The following output shows the generated configuration for a Netavark network backend:

# podman network inspect example1

[
     {
          "name": "example1",
          "id": "a8ca04a41ef303e3247097b86d9048750e5f1aa819ec573b0e5f78e3cc8a971b",
          "driver": "bridge",
          "network_interface": "podman1",
          "created": "2022-02-18T17:56:28.451701452Z",
          "subnets": [
               {
                    "subnet": "10.89.0.0/16",
                    "gateway": "10.89.0.1"
               }
          ],
          "ipv6_enabled": false,
          "internal": false,
          "dns_enabled": true,
          "ipam_options": {
               "driver": "host-local"
          }
     }
]

Notice that the bridge naming convention with Netavark is slightly different since it uses the podmanN pattern, with N >= 0.

To list all the existing networks, we can use the podman network ls command:

# podman network ls

NETWORK ID   NAME     VERSION  PLUGINS

2f259bab93aa podman   0.4.0    bridge,portmap,firewall,tuning

228b48a56dbc example1 0.4.0    bridge,portmap,firewall,tuning,dnsname

The preceding output shows the name, ID, CNI version, and active plugins of each active network.

On Podman 4, the output is slightly more compact since there are no CNI plugins to be shown:

# podman network ls

NETWORK ID    NAME        DRIVER

a8ca04a41ef3  example1    bridge

2f259bab93aa  podman      bridge

Now, it's time to spin up a container that's attached to the new network. The following code creates a PostgreSQL database that's attached to the example1 network:

# podman run -d -p 5432:5432 \
  --network example1 \
  -e POSTGRES_PASSWORD=password \
  --name postgres \
  docker.io/library/postgres
533792e9522fc65371fa6d694526400a3a01f29e6de9b2024e84895f354ed2bb

The new container receives an address from the 10.89.0.0/16 subnet, as shown by the podman inspect command:

# podman inspect postgres --format '{{.NetworkSettings.Networks.example1.IPAddress}}'

10.89.0.3

When we're using the CNI network backend, we can double-check this information by looking at the contents of the new /var/lib/cni/networks/example1 folder:

# ls -al /var/lib/cni/networks/example1/

total 20

drwxr-xr-x. 2 root root 4096 Jan 23 17:26 .

drwxr-xr-x. 5 root root 4096 Jan 23 16:22 ..

-rw-r--r--. 1 root root   70 Jan 23 16:26 10.89.0.3

-rw-r--r--. 1 root root    9 Jan 23 16:57 last_reserved_ip.0

-rwxr-x---. 1 root root    0 Jan 23 16:22 lock

Looking at the content of the 10.89.0.3 file, we find the following:

# cat /var/lib/cni/networks/example1/10.89.0.3

533792e9522fc65371fa6d694526400a3a01f29e6de9b2024e84895f354ed2bb

The file holds the container ID of our postgres container, which is used to track the mapping with the assigned IP address. As we mentioned previously, this behavior is managed by the host-local plugin, the default IPAM choice for Podman networks.

Important Note

The Netavark network backend tracks IPAM configuration in the /run/containers/networks/ipam.db file for rootfull containers.

We can also see that a new Linux bridge has been created (notice the cni- prefix that is used for CNI network backends):

# ip addr show cni-podman1

8: cni-podman1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

    link/ether 56:ed:1d:a9:53:54 brd ff:ff:ff:ff:ff:ff

    inet 10.89.0.1/16 brd 10.89.255.255 scope global cni-podman1

       valid_lft forever preferred_lft forever

    inet6 fe80::54ed:1dff:fea9:5354/64 scope link

       valid_lft forever preferred_lft forever

The new device is connected to one peer of the PostgreSQL container's veth pair:

# bridge link show

10: vethf03ed735@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman1 state forwarding priority 32 cost 2

20: veth23ee4990@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman0 state forwarding priority 32 cost 2

Here, we can see that vethf03ed735@eth0 is attached to the cni-podman1 bridge. The interface has the following configuration:

# ip addr show vethf03ed735

10: vethf03ed735@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman1 state UP group default

    link/ether 86:d1:8c:c9:8c:2b brd ff:ff:ff:ff:ff:ff link-netns cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac

    inet6 fe80::f889:17ff:fe83:4da2/64 scope link

       valid_lft forever preferred_lft forever

The preceding output also shows that the other side of the veth pair is located in the container's network namespace – that is, cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac. We can inspect the container's network configuration and confirm the IP address that's been allocated from the new subnet:

# ip netns exec cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

    link/ether ba:91:9e:77:30:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    inet 10.89.0.3/16 brd 10.89.255.255 scope global eth0

       valid_lft forever preferred_lft forever

    inet6 fe80::b891:9eff:fe77:30a1/64 scope link

       valid_lft forever preferred_lft forever

Important Note

The network namespace naming pattern for the Netavark backend in Podman 4 is netns-<UID>.

It is possible to connect a running container to another network without stopping and restarting it. In this way, the container will keep an interface attached to the original network and a second interface, attached to the new network, will be created. This feature, which is useful for use cases such as reverse proxies, can be achieved with the podman network connect command. Let's try to run a new net_example container:

# podman run -d -p 8080:80 --name net_example docker.io/library/nginx

# podman network connect example1 net_example

To verify that the container has been attached to the new network, we can run the podman inspect command and look at the networks:

# podman inspect net_example

[...omitted output...]

            "Networks": {

                "example1": {

                    "EndpointID": "",

                    "Gateway": "10.89.0.1",

                    "IPAddress": "10.89.0.10",

                    "IPPrefixLen": 16,

                    "IPv6Gateway": "",

                    "GlobalIPv6Address": "",

                    "GlobalIPv6PrefixLen": 0,

                    "MacAddress": "fa:41:66:0a:25:45",

                    "NetworkID": "example1",

                    "DriverOpts": null,

                    "IPAMConfig": null,

                    "Links": null

                },

                "podman": {

                    "EndpointID": "",

                    "Gateway": "10.88.0.1",

                    "IPAddress": "10.88.0.7",

                    "IPPrefixLen": 16,

                    "IPv6Gateway": "",

                    "GlobalIPv6Address": "",

                    "GlobalIPv6PrefixLen": 0,

                    "MacAddress": "ba:cd:eb:8d:19:b5",

                    "NetworkID": "podman",

                    "DriverOpts": null,

                    "IPAMConfig": null,

                    "Links": null

                }

            }

[...omitted output...]

Here, we can see that the container now has two interfaces attached to the podman and example1 networks, with IP addresses allocated from each network's subnet.

To disconnect a container from a network, we can use the podman network disconnect command:

# podman network disconnect example1 net_example

When a network is not necessary anymore and is disconnected from running containers, we can delete it with the podman network rm command:

# podman network rm example1

example1

The command's output shows the list of removed networks. Here, the network's CNI configuration is removed from the host's /etc/cni/net.d directory.

Important Note

If the network has associated containers that are either running or stopped, the previous command will fail with Error: "example1" has associated containers with it. To work around this issue, remove or disconnect the associated containers before running the command.
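
Alternatively (a hedged shortcut), the --force flag removes the network together with any containers that are still attached to it:

# podman network rm --force example1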

The podman network rm command is useful when we need to remove a specific network. To remove all unused networks, the podman network prune command is a better choice:

# podman network prune

WARNING! This will remove all networks not used by at least one container.

Are you sure you want to continue? [y/N] y

example2

db_network

In this section, we learned about the CNI specification and how Podman leverages its interface to simplify container networking. In a multi-tier or microservices scenario, we need to let containers communicate with each other. In the next section, we will learn how to manage container-to-container communication.

Interconnecting two or more containers

Using our knowledge from the previous section, we should be aware that two or more containers that have been created inside the same network can reach each other on the same subnet without the need for external routing.

At the same time, two or more containers that belong to different networks will be able to reach each other on different subnets by routing packets through their networks.

To demonstrate this, let's create a couple of busybox containers in the same default network:

# podman run -d --name endpoint1 \
  --cap-add=net_admin,net_raw busybox /bin/sleep 10000
# podman run -d --name endpoint2 \
  --cap-add=net_admin,net_raw busybox /bin/sleep 10000

In our lab, the two containers have 10.88.0.14 (endpoint1) and 10.88.0.15 (endpoint2) as their addresses. These two addresses are subject to change and can be collected using the methods illustrated previously with the podman inspect or the nsenter commands.
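
For reference, a hedged way to collect both addresses in one go is the following; in our lab, it prints the values mentioned previously:

# podman inspect endpoint1 endpoint2 --format '{{.Name}} {{.NetworkSettings.IPAddress}}'
endpoint1 10.88.0.14
endpoint2 10.88.0.15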

Regarding capabilities customization, we added the CAP_NET_ADMIN and CAP_NET_RAW capabilities to let the containers run commands such as ping or traceroute seamlessly.

Let's try to run a traceroute command from endpoint1 to endpoint2 to see the path of a packet:

# podman exec -it endpoint1 traceroute 10.88.0.15

traceroute to 10.88.0.15 (10.88.0.15), 30 hops max, 46 byte packets

1  10.88.0.15 (10.88.0.15)  0.013 ms  0.004 ms  0.002 ms

As we can see, the packet stays on the internal network and reaches the node without additional hops.

Now, let's create a new network, net1, and connect a container called endpoint3 to it:

# podman network create --driver bridge --gateway "10.90.0.1" --subnet "10.90.0.0/16" net1

# podman run -d --name endpoint3 --network=net1 --cap-add=net_admin,net_raw busybox /bin/sleep 10000

The container in our lab gets an IP address of 10.90.0.2. Let's see the network path from endpoint1 to endpoint3:

# podman exec -it endpoint1 traceroute 10.90.0.2

traceroute to 10.90.0.2 (10.90.0.2), 30 hops max, 46 byte packets

1  host.containers.internal (10.88.0.1)  0.003 ms  0.001 ms  0.006 ms

2  10.90.0.2 (10.90.0.2)  0.001 ms  0.002 ms  0.002 ms

This time, the packet traversed the endpoint1 container's default gateway (10.88.0.1) and was then routed by the host to the net1 Linux bridge, finally reaching the endpoint3 container.

Connectivity across containers in the same host is very easy to manage and understand. However, we are still missing an important aspect for container-to-container communication: DNS resolution.

Let's learn how to leverage this feature with Podman networks.

Container DNS resolution

Despite its many configuration caveats, DNS resolution is a very simple concept: a service is queried to provide the IP address associated with a given hostname. The amount of information that can be provided by a DNS server is far richer than this, but we want to focus on simple IP resolution in this example.

For example, let's imagine a scenario where a web application running on a container named webapp needs read/write access to a database running on a second container named db. DNS resolution enables webapp to query for the db container's IP address before contacting it.

Previously, we learned that Podman's default network does not provide DNS resolution, while new user-created networks have DNS resolution enabled by default. On a CNI network backend, the dnsname plugin automatically configures a dnsmasq service, which is started when containers are connected to the network, to provide DNS resolution. On a Netavark network backend, DNS resolution is delivered by aardvark-dns.

To test this feature, we are going to reuse the students web application that we illustrated in Chapter 10, Troubleshooting and Monitoring Containers, since it provides an adequate client-server example with a minimal REST service and a database backend based on PostgreSQL.

Info

The source code is available in this book's GitHub repository at https://github.com/PacktPublishing/Podman-for-DevOps/tree/main/Chapter10/students.

In this example, the web application simply prints some output in JSON as the result of an HTTP GET that triggers a query to a PostgreSQL database. For our demonstration, we will run both the database and the web application on the same network.

First, we must create the PostgreSQL database container while providing a generic username and password:

# podman run -d \
   --network net1 --name db \
   -e POSTGRES_USER=admin \
   -e POSTGRES_PASSWORD=password \
   -e POSTGRES_DB=students \
   postgres

Next, we must restore the data from the SQL dump in the students folder to the database:

# cd Chapter10/students

# cat db.sql | podman exec -i db psql -U admin students

If you haven't already built it in the previous chapters, then you need to build the students container image and run it on the host:

# buildah build -t students .

# podman run -d \
   --network net1 \
   -p 8080:8080 \
   --name webapp \
   students \
   students -host db -port 5432 \
   -username admin -password password

Notice the arguments after the image name: the students application accepts the -host, -port, -username, and -password options to customize the database's endpoint and credentials.

We did not provide any IP address in the host field. Instead, the Postgres container name, db, along with the default 5432 port, were used to identify the database.

Also, notice that the db container was created without any kind of port mapping: we expect to directly reach the database over the net1 container network, where both containers were created.

Let's try to call the students application API and see what happens:

# curl localhost:8080/students
{"Id":10149,"FirstName":"Frank","MiddleName":"Vincent","LastName":"Zappa","Class":"3A","Course":"Composition"}

The query worked fine, meaning that the application successfully queried the database. But how did this happen? How did it resolve the container IP address by only knowing its name? In the next section, we'll look at the different behaviors on CNI and Netavark network backends.

DNS resolution on a CNI network backend

On Podman 3 or Podman 4 with a CNI backend, the dnsname plugin is enabled in the net1 network and a dedicated dnsmasq service is spawned that is in charge of resolving container names to their assigned IP addresses. Let's start by finding the container's IP addresses first:

# podman inspect db --format '{{.NetworkSettings.Networks.net1.IPAddress}}'

10.90.0.2

# podman inspect webapp --format '{{.NetworkSettings.Networks.net1.IPAddress}}'

10.90.0.3

We want to look for dnsmasq processes running on the system:

# ps aux | grep dnsmasq

root        2703  0.0  0.0  26436  2384 ?        S    16:16   0:00 /usr/sbin/dnsmasq -u root --conf-file=/run/containers/cni/dnsname/net1/dnsmasq.conf

root        5577  0.0  0.0   6140   832 pts/0    S+   22:00   0:00 grep --color=auto dnsmasq

The preceding output shows an instance of the dnsmasq process running with a config file that's been created under the /run/containers/cni/dnsname/net1/ directory. Let's inspect its contents:

# ls -al /run/containers/cni/dnsname/net1/

total 12

drwx------. 2 root root 120 Jan 25 16:16 .

drwx------. 3 root root  60 Jan 25 16:16 ..

-rw-r--r--. 1 root root  30 Jan 25 16:28 addnhosts

-rwx------. 1 root root 356 Jan 25 16:16 dnsmasq.conf

-rwxr-x---. 1 root root   0 Jan 25 16:16 lock

-rw-r--r--. 1 root root   5 Jan 25 16:16 pidfile

/run/containers/cni/dnsname/net1/dnsmasq.conf defines the dnsmasq configuration:

# cat /run/containers/cni/dnsname/net1/dnsmasq.conf

## WARNING: THIS IS AN AUTOGENERATED FILE

## AND SHOULD NOT BE EDITED MANUALLY AS IT

## LIKELY TO AUTOMATICALLY BE REPLACED.

strict-order

local=/dns.podman/

domain=dns.podman

expand-hosts

pid-file=/run/containers/cni/dnsname/net1/pidfile

except-interface=lo

bind-dynamic

no-hosts

interface=cni-podman1

addn-hosts=/run/containers/cni/dnsname/net1/addnhosts

The process listens on the cni-podman1 interface (the net1 network bridge, which has an IP address of 10.90.0.1) and is authoritative for the dns.podman domain. The host's records are kept in the /run/containers/cni/dnsname/net1/addnhosts file, which contains the following:

# cat /run/containers/cni/dnsname/net1/addnhosts

10.90.0.2  db

10.90.0.3  webapp

When a container in the net1 network attempts DNS resolution, it uses its /etc/resolv.conf file to find out the DNS server to direct the query to. The file's content in the webapp container is as follows:

# podman exec -it webapp cat /etc/resolv.conf

search dns.podman

nameserver 10.90.0.1

This shows that the container contacts the 10.90.0.1 address (which is also the container default gateway and the cni-podman1 bridge) to query hostname resolution.
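
We can also query this DNS service directly from the host (a hedged check that assumes the dig utility, provided by the bind-utils package on Fedora, is installed):

# dig +short @10.90.0.1 db.dns.podman
10.90.0.2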

The search domain allows processes to resolve short names by expanding them into a Fully Qualified Domain Name (FQDN). In the preceding example, db.dns.podman would be resolved correctly by the DNS service. The search domain for a CNI network configuration can be customized by editing the related config file under /etc/cni/net.d/. The default configuration for the dnsname plugin in the net1 config is as follows:

{
   "type": "dnsname",
   "domainName": "dns.podman",
   "capabilities": {
      "aliases": true
   }
}

When you update the domainName field to a new value, the changes are not effective immediately. To regenerate the updated dnsmasq.conf, all the containers in the network must be stopped to let the dnsname plugin clean up the current network configuration. When containers are restarted, the dnsmasq configuration is regenerated accordingly.
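
In practice, this is a minimal sketch of the cycle: stop every container attached to net1, start them again, and re-read the generated file:

# podman stop webapp db
# podman start db webapp
# cat /run/containers/cni/dnsname/net1/dnsmasq.conf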

DNS resolution on a Netavark network backend

If the preceding example was executed on Podman 4 with a Netavark network backend, the aardvark-dns daemon would be responsible for container resolution in a similar way to dnsmasq.

The aardvark-dns project is a companion project of Netavark written in Rust. It is a lightweight authoritative DNS service that can work on both IPv4 A records and IPv6 AAAA records.

When a new network with DNS resolution enabled is created, a new aardvark-dns process is created, as shown in the following code:

# ps aux | grep aardvark-dns

root        9115  0.0  0.0 344732  2584 pts/0    Sl   20:15   0:00 /usr/libexec/podman/aardvark-dns --config /run/containers/networks/aardvark-dns -p 53 run

root       10831  0.0  0.0   6400  2044 pts/0    S+   23:36   0:00 grep --color=auto aardvark-dns

The process listens on port 53/udp of the host network namespace for rootfull containers and on port 53/udp of the rootless network namespace for rootless containers.

The output of the ps command also shows the default configuration path – the /run/containers/networks/aardvark-dns directory – where the aardvark-dns process stores the resolution configurations under different files, named after the associated network. For example, for the net1 network, we will find content similar to the following:

# cat /run/containers/networks/aardvark-dns/net1

10.90.0.1

dc7fff2ef78e99a2a1a3ea6e29bfb961fc07cd6cf71200d50761e25df3011636 10.90.0.2  db,dc7fff2ef78e

10c7bbb7006c9b253f9ebe1103234a9af41dced8f12a6d94b7fc46a9a975d8cc 10.90.0.3  webapp,10c7bbb7006c

The file stores IPv4 addresses (and IPv6 addresses, if present) for every container. Here, we can see the containers' names and short IDs resolved to the IPv4 addresses.

The first line tells us the address where aardvark-dns is listening for incoming requests. Once again, it corresponds to the default gateway address for the network.

Connecting containers across the same network allows for fast and simple communication across different services running in separate network namespaces. However, there are use cases where containers must share the same network namespace. Podman offers a solution to achieve this goal easily: Pods.

Running containers inside a Pod

The concept of a Pod comes from the Kubernetes architecture. According to the official upstream documentation, "A Pod ... is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers."

A Pod is also the smallest deployable unit in Kubernetes scheduling. All the containers inside a Pod share the same network, UTS, IPC, and (optionally) PID namespaces. This means that all the services running in the different containers can refer to each other as localhost, while external containers will continue to contact the Pod's IP address. A Pod receives one IP address that is shared across all of its containers.

There are many adoption use cases. A very common one is sidecar containers: in this case, a reverse proxy or an OAuth proxy runs alongside the main container to provide authentication or service mesh functionalities.

Podman provides the basic tooling for manipulating Pods with the podman pod command. The following example shows how to create a basic Pod with two containers and demonstrates network namespace sharing across the two containers in the Pod.

Important Note

To understand the following example, stop and remove all the running containers and Pods and start with a clean environment.

podman pod create initializes a new, empty Pod from scratch:

# podman pod create --name example_pod

Important Note

When a new, empty Pod is created, Podman also creates an infra container, which is used to initialize the namespaces when the Pod is started. This container is based on the k8s.gcr.io/pause image for Podman 3 and a locally-built podman-pause image for Podman 4.

Now, we can create two basic busybox containers inside the Pod:

# podman create --name c1 --pod example_pod busybox sh -c 'sleep 10000'

# podman create --name c2 --pod example_pod busybox sh -c 'sleep 10000'

Finally, we can start the Pod (and its associated containers) with the podman pod start command:

# podman pod start example_pod

Here, we have a running Pod with two containers (plus an infra one) running. To verify its status, we can use the podman pod ps command:

# podman pod ps

POD ID        NAME         STATUS      CREATED        INFRA ID      # OF CONTAINERS

8f89f37b8f3b  example_pod  Degraded    8 minutes ago  95589171284a  4

With the podman pod top command, we can see the resources that are being consumed by each container in the Pod:

# podman pod top example_pod

USER        PID         PPID        %CPU        ELAPSED          TTY         TIME        COMMAND

root        1           0           0.000       10.576973703s  ?           0s          sleep 1000

0           1           0           0.000       10.577293395s  ?           0s          /catatonit -P

root        1           0           0.000       9.577587032s   ?           0s          sleep 1000

After creating the Pod, we can inspect the network's behavior. First, we will see that only one network namespace has been created in the system:

# ip netns

netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd (id: 0)

Let's check the IP configuration for this namespace and its related network stack:

# ip netns exec netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

    link/ether a6:1b:bc:8e:65:1e brd ff:ff:ff:ff:ff:ff link-netnsid 0

    inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0

       valid_lft forever preferred_lft forever

    inet6 fe80::a41b:bcff:fe8e:651e/64 scope link

       valid_lft forever preferred_lft forever

To verify that the c1 and c2 containers share the same network namespace and are running with an IP address of 10.88.0.3, we can run the same ip addr show command inside the containers using the podman exec command:

# podman exec -it c1 ip addr show

# podman exec -it c2 ip addr show

These two containers are expected to return the same output as the netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd network namespace.
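
To see the shared loopback interface in action, the following minimal sketch starts a tiny web server in c1 and queries it from c2 over localhost. This assumes the busybox image ships the httpd and wget applets (the official image does, but it is worth verifying on your build); the file path and port are arbitrary choices for this illustration:

# podman exec c1 sh -c 'echo hello-from-c1 > /tmp/index.html'

# podman exec -d c1 httpd -f -p 8080 -h /tmp

# podman exec c2 wget -qO- http://127.0.0.1:8080

The last command should print hello-from-c1, confirming that c2 reaches the server running in c1 through the loopback interface the two containers share.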

The example Pod can be stopped and removed with the podman pod stop and podman pod rm commands, respectively:

# podman pod stop example_pod

# podman pod rm example_pod
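
As a side note, Podman can also create a Pod implicitly while running a container. The following sketch uses the --pod new:<name> shorthand, which creates the Pod (and its infra container) on the fly if it does not exist yet; the Pod name is arbitrary:

# podman run -dt --pod new:example_pod2 busybox sleep 10000

# podman pod rm -f example_pod2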

We will cover pods in more detail in Chapter 14, Interacting with systemd and Kubernetes, where we will also discuss name resolution and multi-pod orchestration.

In this section, we focused on communication across two or more containers inside the same host or Pod, regardless of the number and type of networks involved. However, containers generally run services that need to be reached from the external world. For this reason, in the next section, we will investigate the best practices for exposing containers outside their hosts and making their services accessible to other clients and consumers.

Exposing containers outside our underlying host

Container adoption in an enterprise company or a community project can be difficult and take time. For this reason, we may not have all the required services running as containers during our adoption journey. This is why exposing containers outside the underlying host can be a convenient way to interconnect services that live in containers with services that still run in the legacy world.

As we briefly saw earlier in this chapter, Podman uses two different networking stacks, depending on the container: rootless or rootfull.

Even though the underlying mechanism is slightly different, depending on whether you are using a rootless or a rootfull container, Podman's command-line options for exposing network ports are the same for both container types.

Good to Know

Note that the examples in this section will be executed as the root user. This choice is necessary because one of the main objectives of this section is to show some of the firewall configurations that may be required to expose a container service to the outside world.

Exposing a container starts with Port Publishing activities. We'll learn what this is in the next section.

Port Publishing

Port Publishing consists of instructing Podman to create a temporary mapping between the container's ports and some random or custom host ports.

The option to instruct Podman to publish a port is really simple – it consists of adding the -p or --publish option to the run command. Let's see how it works:

-p=ip:hostPort:containerPort

The previous option publishes a container's port, or a range of ports, to the host. When specifying ranges for hostPort and containerPort, both ranges must contain the same number of ports.

We can also omit ip; in that case, the port will be bound on all the IP addresses of the underlying host. If we do not set the host port, the container's port will be mapped to a randomly assigned port on the host.

Let's look at an example of the port publishing option:

# podman run -dt -p 80:80/tcp docker.io/library/httpd

Trying to pull docker.io/library/httpd:latest...

Getting image source signatures

Copying blob 41c22baa66ec done  

Copying blob dcc4698797c8 done  

Copying blob d982c879c57e done  

Copying blob a2abf6c4d29d done  

Copying blob 67283bbdd4a0 done  

Copying config dabbfbe0c5 done  

Writing manifest to image destination

Storing signatures

ea23dbbeac2ea4cb6d215796e225c0e7c7cf2a979862838ef4299d410c90ad44

As you can see, we told Podman to run a container from the httpd base image in detached mode (-d) with a pseudo-tty allocated (-t), and we set the port mapping to bind the underlying host port 80 to port 80 of the container.

Now, we can use the podman port command to see the actual mapping:

# podman ps

CONTAINER ID  IMAGE                           COMMAND           CREATED        STATUS            PORTS               NAMES

ea23dbbeac2e  docker.io/library/httpd:latest  httpd-foreground  3 minutes ago  Up 3 minutes ago  0.0.0.0:80->80/tcp  ecstatic_chaplygin

# podman port ea23dbbeac2e

80/tcp -> 0.0.0.0:80

First, we requested the list of running containers and then passed the correct container ID to the podman port command. We can check if the mapping is working properly like so:

# curl localhost:80

<html><body><h1>It works!</h1></body></html>

Here, we executed a curl command from the host system and it worked – the httpd process running in the container just replied to us.
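
Before moving on, here are a few hedged variants of the publish option; the port numbers and image are arbitrary examples:

# podman run -dt -p 127.0.0.1:8080:80/tcp docker.io/library/httpd

# podman run -dt -p 80 docker.io/library/httpd

The first command binds the mapping only on the loopback address, so the service is not reachable from other hosts; the second publishes container port 80 to a random host port, which the podman port command will reveal. Ranges such as -p 8085-8086:80-81 are also accepted, provided both ranges contain the same number of ports.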

If we have multiple ports and we do not care about their assignment on the underlying host system, we can easily leverage the -P or --publish-all option to publish all the ports that are exposed by the container image to random ports on the host interfaces. Podman will run through the container image's metadata looking for the exposed ports. These ports are usually defined in a Dockerfile or Containerfile with the EXPOSE instruction, as shown here:

EXPOSE 80/tcp

EXPOSE 80/udp

With this instruction, we tell the container engine that will run the final container which network ports the containerized application exposes and uses.
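
As a quick sketch of the publish-all behavior (the assigned host port differs on every run, so the exact mapping is not predictable in advance):

# podman run -dt -P docker.io/library/httpd

# podman port -l

Since the httpd image declares EXPOSE 80, the podman port command should show 80/tcp mapped to a random high port on 0.0.0.0.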

Alternatively, we can leverage an easier but less secure approach, as shown in the next section.

Attaching a host network

To expose a container service to the outside world, we can attach the whole host network to the running container. As you can imagine, this method can lead to the unauthorized use of host resources, so it is not recommended and should be used with care.

As we anticipated, attaching the host network to a running container is quite simple. Using the right Podman option, we can easily get rid of any network isolation:

# podman run --network=host -dt docker.io/library/httpd

2cb80369e53761601a41a4c004a485139de280c3738d1b7131c241f4001f78a6

Here, we used the --network option while specifying the host value. This informs Podman that we want to let the container attach to the host network.

After running the previous command, we can check that the running container is bound to the host system's network interfaces since it can access all of them:

# netstat -nap|grep ::80

tcp6       0      0 :::80                   :::*                    LISTEN      37304/httpd

# curl localhost:80

<html><body><h1>It works!</h1></body></html>

Once again, the curl command from the host system worked. This time, the httpd process in the container is listening directly on the host's network interfaces.

The process of exposing containers outside the underlying host does not stop here. In the next section, we'll learn how to complete this job.

Host firewall configuration

Whether we choose to leverage Port Publishing or attach the host network to the container, reaching the base OS of our host machine is only part of the job. In most cases, we also need to allow incoming connections to reach the host, which means interacting with the system firewall.

The following example shows a non-comprehensive way to interact with the base OS firewall. If we're using a Fedora operating system or any other Linux distribution that's leveraging Firewalld as its firewall daemon manager, we can allow incoming connections on port 80 by running the following commands:

# firewall-cmd --add-port=80/tcp

success

# firewall-cmd --runtime-to-permanent

success

The first command edits the live system rules, while the second command stores the runtime rules in a permanent way that will survive system reboot or service restart.
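
To double-check the result, we can list the ports that are currently allowed in the active zone; the output should now include 80/tcp:

# firewall-cmd --list-ports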

Good to Know

Firewalld is a firewall service daemon that provides us with an easy and fast way to customize the system firewall. Firewalld is dynamic, which means that it can create, change, and delete the firewall rules without restarting the firewall daemon each time a change is applied.

As we have seen, the process of exposing a container's services is quite simple, but it should be performed with care: opening a network port to the outside world should always be a deliberate, well-considered choice.

Rootless container network behavior

As we saw in the previous sections, Podman relies on CNI plugins or Netavark for containers running as root, which have the privileges required to alter the network configuration in the host network namespace. For rootless containers, Podman uses the slirp4netns project, which allows container network configurations to be created without root privileges; the network interfaces are created inside a rootless network namespace, where the standard user has sufficient privileges. This approach allows rootless container networking to be managed transparently and flexibly.

In the previous sections, we saw how container network namespaces can be connected to a bridge using a veth pair. Being able to create a veth pair in the host network namespace requires root privileges that are not allowed for standard users.

In the simplest scenario, slirp4netns overcomes these privilege limitations by creating a tap device inside the rootless network namespace and attaching it to a user-mode networking stack, so no privileged operation is needed in the host network namespace.

For every new rootless container, a new slirp4netns process is executed on the host. The process creates a network namespace for the container, in which a tap0 device is created and configured with the 10.0.2.100/24 address (from the default slirp4netns 10.0.2.0/24 subnet). Since every container receives the same default address, two rootless containers cannot communicate with each other directly in this configuration; their IP addresses would overlap.

The following example demonstrates the network behavior of a rootless busybox container:

$ podman run -i busybox sh -c 'ip addr show tap0'

2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000

    link/ether 2a:c7:86:66:e9:20 brd ff:ff:ff:ff:ff:ff

    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0

       valid_lft forever preferred_lft forever

    inet6 fd00::28c7:86ff:fe66:e920/64 scope global dynamic mngtmpaddr

       valid_lft 86117sec preferred_lft 14117sec

    inet6 fe80::28c7:86ff:fe66:e920/64 scope link

       valid_lft forever preferred_lft forever

It is possible to inspect the rootless network namespace and find the corresponding tap0 device:

$ podman unshare --rootless-netns ip addr show tap0

2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000

    link/ether 1a:eb:82:6a:82:8d brd ff:ff:ff:ff:ff:ff

    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0

       valid_lft forever preferred_lft forever

    inet6 fd00::18eb:82ff:fe6a:828d/64 scope global dynamic mngtmpaddr

       valid_lft 86311sec preferred_lft 14311sec

    inet6 fe80::18eb:82ff:fe6a:828d/64 scope link

       valid_lft forever preferred_lft forever

Since rootless containers do not own independent IP addresses by default, we have three ways to let two or more containers communicate with each other:

  • The easiest way could be to put all the containers in a single Pod so that the containers can communicate using the localhost interface, without the need to open any ports.
  • The second way is to attach the container to a custom network and have its interfaces managed in the rootless network namespace.
  • If we want to keep all the containers independent, we could use the port mapping technique to publish all the necessary ports and then use those ports to let the containers communicate with each other.

Using the Podman 4 network backend, let's quickly focus on the second scenario, where two containers are attached to a rootless custom network. First, we need to create the network and attach a couple of test containers:

$ podman network create rootless-net

$ podman run -d --net rootless-net --name endpoint1 --cap-add=net_admin,net_raw busybox /bin/sleep 10000

$ podman run -d --net rootless-net --name endpoint2 --cap-add=net_admin,net_raw busybox /bin/sleep 10000

Let's try to ping the endpoint2 container from endpoint1:

$ podman exec -it endpoint1 ping -c1 endpoint2

PING endpoint2 (10.89.1.3): 56 data bytes

64 bytes from 10.89.1.3: seq=0 ttl=64 time=0.023 ms

--- endpoint2 ping statistics ---

1 packets transmitted, 1 packets received, 0% packet loss

round-trip min/avg/max = 0.023/0.023/0.023 ms

These two containers can communicate on the common network and have different IPv4 addresses. To prove this, we can inspect the contents of the aardvark-dns configuration for the rootless containers:

$ cat /run/user/1000/containers/networks/aardvark-dns/rootless-net

10.89.1.1

fe27f8d653384fc191d5c580d18d874d480a7e8ef74c2626ae21b118eedbf1e6 10.89.1.2 endpoint1,fe27f8d65338

19a4307516ce1ece32ce58753e70da5e5abf9cf70feea7b981917ae399ef934d 10.89.1.3 endpoint2,19a4307516ce
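
To confirm that this name resolution also works from inside a container, the following quick check relies on busybox's nslookup applet; the answer should come from the aardvark-dns resolver at 10.89.1.1 and return the 10.89.1.3 address listed above:

$ podman exec -it endpoint1 nslookup endpoint2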

Finally, let's demonstrate that the custom network bypasses the tap0 interface and allows dedicated veth pairs and bridges to be created in the rootless network namespace. The following command will show a Linux bridge for the rootless-net network and two attached veth pairs:

$ podman unshare --rootless-netns ip link | grep 'podman'

3: podman2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

4: vethdca7cdc6@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman2 state UP mode DEFAULT group default qlen 1000

5: veth912bd229@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman2 state UP mode DEFAULT group default qlen 1000

Important Note

If you're running this example on the CNI network backend, use the podman unshare --rootless-cni command instead.

Another limitation of rootless containers concerns the ping command. On most Linux distributions, standard non-root users lack the CAP_NET_RAW capability, which prevents ping from sending and receiving the ICMP packets it relies on. If we want to use the ping command in a rootless container, we can allow unprivileged ICMP echo requests for a range of group IDs through the sysctl command:

# sysctl -w "net.ipv4.ping_group_range=0 2000000"

Note that this allows any process run by a user whose group ID falls within this range to send ping (ICMP echo) packets.
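
To confirm that the new value is active, we can simply read the parameter back:

# sysctl net.ipv4.ping_group_range

net.ipv4.ping_group_range = 0 2000000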

Finally, while using rootless containers, we also need to consider that the Port Publishing technique can only be used for ports above 1024. This is because, on Linux operating systems, all the ports below 1024 are privileged and cannot be used by standard non-root users.
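
As a minimal sketch of this limitation (the port and image are arbitrary), publishing to a host port above 1024 works for a standard user:

$ podman run -dt -p 8080:80 docker.io/library/httpd

$ curl localhost:8080

The same command with -p 80:80 would be rejected with a permission error for a non-root user, unless the administrator lowers the privileged port threshold (for example, through the net.ipv4.ip_unprivileged_port_start kernel parameter).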

Summary

In this chapter, we learned how container network isolation can be leveraged to provide network segregation for each container through network namespaces. These activities may seem complex but, thankfully, with the help of a container runtime, the steps are almost entirely automated. We learned how to manage container networking with Podman and how to interconnect two or more containers. Finally, we learned how to expose a container's network ports outside of the underlying host and what limitations to expect when working with rootless container networking.

In the next chapter, we will discover the main differences between Docker and Podman. This will be useful for advanced users, but also for novice ones, to understand what we can expect by comparing these two container engines.

Further reading

To learn more about the topics that were covered in this chapter, take a look at the following resources:
