Container network isolation leverages network namespaces to provide a separate network stack for each container. Without a container runtime, managing network interfaces across multiple namespaces would be complex. Podman provides flexible network management that lets users customize how containers communicate with the external world and with other containers on the same host.
In this chapter, we will learn about the common configuration practices for managing container networking, along with the differences between rootless and rootfull containers.
In this chapter, we're going to cover the following main topics:
To complete this chapter, you will need a machine with a working Podman installation. As we mentioned in Chapter 3, Running the First Container, all the examples in this book can be executed on a Fedora 34 system or later but can be reproduced on your operating system (OS) of choice. The examples in this chapter will be related to both Podman v3.4.z and Podman v4.0.0 since they provide different network implementations.
A good understanding of the topics that were covered in Chapter 4, Managing Running Containers, Chapter 5, Implementing Storage for the Container's Data, and Chapter 9, Pushing Images to a Container Registry, will help you grasp the container networking topics we'll be covering.
You must also have a good understanding of basic networking concepts to understand topics such as routing, the IP protocol, DNS, and firewalling.
In this section, we'll cover Podman's networking implementation and how to configure networks. Podman 4.0.0 introduced an important change to the network stack. However, Podman 3 is still widely used in the community. For this reason, we will cover both implementations.
Podman 3 leverages the Container Network Interface (CNI) to manage local networks that are created on the host. The CNI provides a standard set of specifications and libraries to create and configure plugin-based network interfaces in a container environment.
CNI specifications were created for Kubernetes to provide a network configuration format that's used by the container runtime to set up the defined plugins, as well as an execution protocol between plugin binaries and runtimes. The great advantage of this plugin-based approach is that vendors and communities can develop third-party plugins that satisfy the CNI's specifications.
The Podman 4 network stack is based on a brand new project called Netavark, a container-native networking implementation completely written in Rust and designed to work with Podman. Rust is a great programming language for developing system and network components thanks to its efficient memory management and high performance, similar to the C programming language. Netavark provides better support for dual-stack networking (IPv4/IPv6) and inter-container DNS resolution, along with a tighter bond with the Podman project development roadmap.
Important Note
Users upgrading from Podman 3 to Podman 4 will continue to use CNI by default and preserve their previous configuration. New Podman 4 installations will use Netavark by default. Users can revert to the CNI network backend by setting the network_backend field in the /usr/share/containers/containers.conf file.
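For reference, the setting is a single field in the [network] table of containers.conf; a minimal illustrative fragment looks as follows:

```toml
[network]
# Revert to the CNI backend (Netavark is the default for new Podman 4 installations)
network_backend = "cni"
```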
In the next subsection, we'll focus on the CNI configuration that's used by Podman 3 to orchestrate container networking.
A typical CNI configuration file defines a list of plugins and their related configuration. The following example shows the default CNI configuration of a fresh Podman installation on Fedora:
Chapter12/podman_cni_conf.json
{
"cniVersion": "0.4.0",
"name": "podman",
"plugins": [
{
"type": "bridge",
"bridge": "cni-podman0",
"isGateway": true,
"ipMasq": true,
"hairpinMode": true,
"ipam": {
"type": "host-local",
"routes": [{ "dst": "0.0.0.0/0" }],
"ranges": [
[
{
"subnet": "10.88.0.0/16",
"gateway": "10.88.0.1"
}
]
]
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
},
{
"type": "firewall"
},
{
"type": "tuning"
}
]
}
As we can see, the plugins list in this file contains a set of plugins that are used by the runtime to orchestrate container networking.
The CNI community curates a repository of reference plugins that can be used by container runtimes. CNI reference plugins are organized into interface-creating, IP address management (IPAM), and Meta plugins. Interface-creating plugins can make use of IPAM and Meta plugins.
The following non-exhaustive list describes the most commonly used interface-creating plugins:
CNI IPAM plugins are related to the IP address management inside containers. There are only three reference IPAM plugins:
CNI Meta plugins are used to configure specific behaviors in the host, such as tuning, firewall rules, and port mapping, and are executed as chained plugins along with the interface-creating plugins. The current Meta plugins that are maintained in the reference plugins repository are as follows:
Important Note
On a Fedora system, all the CNI plugin binaries are located in the /usr/libexec/cni folder and are provided by the containernetworking-plugins package, installed as a Podman dependency.
Going back to the CNI configuration example, we can see that the default Podman configuration uses a bridge plugin with host-local IP address management and that the portmap, tuning, and firewall plugins are chained together with it.
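The plugin chain is ordinary JSON and can be read programmatically. The following illustrative Python sketch (not part of Podman; the configuration above is embedded as a string so that the example is self-contained) extracts the execution order of the plugins and the nested IPAM settings:

```python
import json

# The default Podman CNI configuration (abridged from the example above).
conf = json.loads("""
{
  "cniVersion": "0.4.0",
  "name": "podman",
  "plugins": [
    {"type": "bridge", "bridge": "cni-podman0", "isGateway": true,
     "ipam": {"type": "host-local",
              "routes": [{"dst": "0.0.0.0/0"}],
              "ranges": [[{"subnet": "10.88.0.0/16",
                           "gateway": "10.88.0.1"}]]}},
    {"type": "portmap", "capabilities": {"portMappings": true}},
    {"type": "firewall"},
    {"type": "tuning"}
  ]
}
""")

# The runtime executes the plugins in list order: the interface-creating
# bridge plugin first, then the chained Meta plugins.
chain = [p["type"] for p in conf["plugins"]]
print(chain)  # ['bridge', 'portmap', 'firewall', 'tuning']

# The IPAM block is nested inside the interface-creating plugin.
ipam = conf["plugins"][0]["ipam"]
print(ipam["type"], ipam["ranges"][0][0]["subnet"])  # host-local 10.88.0.0/16
```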
In the default network that was created for Podman, the subnet that's been allocated for container networking is 10.88.0.0/16 and the bridge, called cni-podman0, acts as the default gateway to containers on 10.88.0.1, implying that all outbound traffic from a container is directed to the bridge's interface.
Important Note
This configuration is applied to rootfull containers only. Later in this chapter, we'll learn that Podman uses a different networking approach for rootless containers to overcome the user's limited privileges. We will see that this approach has many limitations on host interfaces and IP address management.
Now, let's see what happens on the host when a new rootfull container is created.
In this subsection, we will investigate the most peculiar network events that occur when a new container is created when CNI is used as a network backend.
Important Note
All the examples in this subsection are executed as the root user. Ensure that you clean up the existing running containers to have a clearer view of the network interfaces and firewall rules.
We will try to run an example using the Nginx container and map its default internal port, 80/tcp, to the host port, 8080/tcp.
Before we begin, we want to verify the current host's IP configuration:
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:a9:ce:df brd ff:ff:ff:ff:ff:ff
altname enp0s5
altname ens5
inet 192.168.121.189/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0
valid_lft 3054sec preferred_lft 3054sec
inet6 fe80::2fb:9732:a0d9:ac70/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: cni-podman0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether de:52:45:ae:1a:7f brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0
valid_lft forever preferred_lft forever
inet6 fe80::dc52:45ff:feae:1a7f/64 scope link
valid_lft forever preferred_lft forever
Along with the host's main interface, eth0, we can see a cni-podman0 bridge interface with an address of 10.88.0.1/16. Also, notice that the bridge's state is set to DOWN.
Important
If the host that's being used for the test is a fresh install and Podman has never been executed before, the cni-podman0 bridge interface will not be listed. This is not a problem – it will be created when a rootfull container is created for the first time.
If no other container is running on the host, we should see no interface attached to the virtual bridge. To verify this, we are going to use the bridge link show command, whose output is expected to be empty:
# bridge link show cni-podman0
Looking at the firewall rules, we do not expect to see rules related to containers in the filter and nat tables:
# iptables -L
# iptables -L -t nat
Important Note
The output of the preceding commands has been omitted for the sake of brevity, but it is worth noting that the filter table should already contain two CNI-related chains named CNI-ADMIN and CNI-FORWARD.
Finally, we want to inspect the routing rules for the cni-podman0 interface:
# ip route show dev cni-podman0
10.88.0.0/16 proto kernel scope link src 10.88.0.1 linkdown
This command says that all traffic going to the 10.88.0.0/16 network goes through the cni-podman0 interface.
Let's run our Nginx container and see what happens to the network interfaces, routing, and firewall configuration:
# podman run -d -p 8080:80 \
--name net_example docker.io/library/nginx
The first and most interesting event is a new network interface being created, as shown in the output of the ip addr show command:
# ip addr show
[...omitted output...]
3: cni-podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether de:52:45:ae:1a:7f brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0
valid_lft forever preferred_lft forever
inet6 fe80::dc52:45ff:feae:1a7f/64 scope link
valid_lft forever preferred_lft forever
5: vethcf8b2132@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman0 state UP group default
link/ether b6:4c:1d:06:39:5a brd ff:ff:ff:ff:ff:ff link-netns cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d
inet6 fe80::90e3:98ff:fe6a:acff/64 scope link
valid_lft forever preferred_lft forever
This new interface is part of a veth pair (see man 4 veth), a pair of connected virtual Ethernet devices that act like a local tunnel. Veth pairs are native Linux kernel virtual interfaces that don't depend on a container runtime and can be applied to use cases that go beyond container execution.
The interesting part of veth pairs is that they can be spawned across multiple network namespaces and that a packet that's sent to one side of the pair is immediately received on the other side.
The vethcf8b2132@if2 interface is linked to a device that resides in a network namespace named cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d. Since Linux offers us the option to inspect network namespaces using the ip netns command, we can check if the namespace exists and inspect its network stack:
# ip netns
cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d (id: 0)
Hint
When a new network namespace is created, a file with the same name is created under /var/run/netns/. This file also has the same inode number as the one pointed to by the symlink under /proc/<PID>/ns/net. When the file is opened, the returned file descriptor gives access to the namespace.
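This inode relationship can be verified for any process. The following small Python sketch (Linux-only, illustrative) compares the inode encoded in the /proc/self/ns/net symlink target with the inode reported by stat() on the same path:

```python
import os
import re

# The target of the namespace symlink encodes the inode, e.g. 'net:[4026531840]'.
link_target = os.readlink("/proc/self/ns/net")
inode_in_link = int(re.fullmatch(r"net:\[(\d+)\]", link_target).group(1))

# stat() on the same path reports the identical inode number; two processes
# (or a file bind-mounted under /var/run/netns/) share a network namespace
# exactly when these inode numbers match.
same_namespace = inode_in_link == os.stat("/proc/self/ns/net").st_ino
print(link_target, same_namespace)
```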
The preceding command confirms that the network namespace exists. Now, we want to inspect the network interfaces that have been defined inside it:
# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether fa:c9:6e:5c:db:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::f8c9:6eff:fe5c:dbad/64 scope link
valid_lft forever preferred_lft forever
Here, we executed an ip addr show command that's nested inside the ip netns exec command. The output shows us an interface that is on the other side of our veth pair. This also tells us something valuable: the container's IPv4 address, set to 10.88.0.3.
Hint
If you're curious, the container IP configuration, when using Podman's default network with the host-local IPAM plugin, is persisted to the /var/lib/cni/networks/podman folder. Here, a file named after the assigned IP address is created and written with the container-generated ID.
If a new network is created and used by a container, its configuration will be persisted in the /var/lib/cni/networks/<NETWORK_NAME> folder.
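The host-local allocation logic itself is simple and can be sketched as follows. This is an illustrative approximation, not the actual plugin code: hand out the first host address in the subnet that is neither the gateway nor already leased to another container.

```python
import ipaddress

def next_free_ip(subnet, gateway, leased):
    """Sketch of host-local-style allocation: return the first host address
    in the subnet that is neither the gateway nor already leased."""
    reserved = {gateway} | set(leased)
    for host in ipaddress.ip_network(subnet).hosts():
        if str(host) not in reserved:
            return str(host)
    raise RuntimeError(f"subnet {subnet} exhausted")

# With no leases, the first address after the 10.88.0.1 gateway is handed out.
print(next_free_ip("10.88.0.0/16", "10.88.0.1", set()))          # 10.88.0.2
# Once 10.88.0.2 has a lease file, the next container gets 10.88.0.3.
print(next_free_ip("10.88.0.0/16", "10.88.0.1", {"10.88.0.2"}))  # 10.88.0.3
```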
We can also inspect the container's routing tables:
# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d ip route
default via 10.88.0.1 dev eth0
10.88.0.0/16 dev eth0 proto kernel scope link src 10.88.0.3
All the outbound traffic that's directed to the external networks will go through the 10.88.0.1 address, which has been assigned to the cni-podman0 bridge.
When a new container is created, the firewall and portmapper CNI plugins apply the necessary rules in the host filter and NAT tables. In the following code, we can see the rules that have been applied to the container IP address in the nat table, where SNAT, DNAT, and masquerading rules have been applied:
# iptables -L -t nat -n | grep -B4 10.88.0.3
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
CNI-HOSTPORT-MASQ all -- 0.0.0.0/0 0.0.0.0/0 /* CNI portfwd requiring masquerade */
CNI-fb51a7bfa5365a8a89e764fd all -- 10.88.0.3 0.0.0.0/0 /* name: "podman" id: "a5054cca3436a7bc4dbf78fe4b901ceef0569ced24181d2e7b118232123a5fe3" */
--
Chain CNI-DN-fb51a7bfa5365a8a89e76 (1 references)
target prot opt source destination
CNI-HOSTPORT-SETMARK tcp -- 10.88.0.0/16 0.0.0.0/0 tcp dpt:8080
CNI-HOSTPORT-SETMARK tcp -- 127.0.0.1 0.0.0.0/0 tcp dpt:8080
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:10.88.0.3:80
The highlighted line shows a DNAT rule in a custom chain named CNI-DN-fb51a7bfa5365a8a89e76. This rule states that all the TCP packets whose destination is the 8080/tcp port on the host should be redirected to 10.88.0.3:80, the network socket that's exposed by the container. This rule matches the -p 8080:80 option that we passed during container creation.
But how does the container communicate with the external world? Let's inspect the cni-podman0 bridge again while looking for notable changes:
# bridge link show cni-podman0
5: vethcf8b2132@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman0 state forwarding priority 32 cost 2
The aforementioned interface is connected to the virtual bridge, which also happens to have an IP address assigned to it (10.88.0.1) that acts as the default gateway for all the containers.
Let's try to trace the path of an ICMP packet from the container to a well-known host, 1.1.1.1 (Cloudflare public DNS). To do so, we must run the traceroute utility from the container network's namespace using the ip netns exec command:
# ip netns exec cni-df380fb0-b8a6-4f39-0d19-99a0535c2f2d traceroute -I 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
1 _gateway (10.88.0.1) 0.071 ms 0.025 ms 0.003 ms
2 192.168.121.1 (192.168.121.1) 0.206 ms 0.195 ms 0.189 ms
3 192.168.1.1 (192.168.1.1) 5.326 ms 5.323 ms 5.319 ms
4 192.168.50.6 (192.168.50.6) 17.598 ms 17.595 ms 17.825 ms
5 192.168.50.5 (192.168.50.5) 17.821 ms 17.888 ms 17.882 ms
6 10.177.21.173 (10.177.21.173) 17.998 ms 17.772 ms 24.777 ms
7 185.210.48.42 (185.210.48.42) 25.963 ms 7.604 ms 7.702 ms
8 185.210.48.43 (185.210.48.43) 7.906 ms 10.344 ms 10.984 ms
9 185.210.48.77 (185.210.48.77) 12.212 ms 12.030 ms 12.983 ms
10 1.1.1.1 (1.1.1.1) 12.524 ms 12.160 ms 12.649 ms
Important Note
The traceroute program may not be installed on the host by default. To install it on Fedora, run the sudo dnf install traceroute command.
The preceding output shows a series of hops, which count the routers a packet must pass through to reach its destination. In this example, a total of 10 hops are necessary to reach the target node. The first hop goes through the container's default gateway (10.88.0.1), moving into the host's network stack.
The second hop is the host's default gateway (192.168.121.1), which is assigned to a virtual bridge in a hypervisor and connected to our lab's host VM.
The third hop is a private network default gateway (192.168.1.1) that's assigned to a physical router that's connected to the lab's hypervisor network.
This demonstrates that all the traffic goes through the cni-podman0 bridge interface.
We can create more than one network, either using Podman native commands or our favorite editor to manage JSON files directly.
Now that we've explored CNI's implementation and configuration details, let's look at the new Netavark implementation in Podman 4.
Podman's 4.0.0 release introduced Netavark as the default network backend. The advantages of Netavark are as follows:
The configuration files that are used by Netavark are not very different from the ones that were shown for CNI. Netavark still uses JSON format to configure networks; files are stored under the /etc/containers/networks path for rootfull containers and the ~/.local/share/containers/storage/networks path for rootless containers.
The following configuration file shows an example network that's been created and managed under Netavark:
[
{
"name": "netavark-example",
"id": "d98700453f78ea2fdfe4a1f77eae9e121f3cbf4b6160dab89edf9ce23cb924d7",
"driver": "bridge",
"network_interface": "podman1",
"created": "2022-02-17T21:37:59.873639361Z",
"subnets": [
{
"subnet": "10.89.4.0/24",
"gateway": "10.89.4.1"
}
],
"ipv6_enabled": false,
"internal": false,
"dns_enabled": true,
"ipam_options": {
"driver": "host-local"
}
}
]
The first noticeable element is the more compact size of the configuration file compared to a CNI configuration. The following fields are defined:
The default Podman 4 network, named podman, implements a bridge driver (the bridge's name is podman0). Here, DNS support is disabled, similar to what happens with the default CNI configuration.
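Since the fields are plain data, they are easy to validate programmatically. The following illustrative Python sketch (embedding an abridged copy of the netavark-example network above) checks that the configured gateway actually belongs to the declared subnet:

```python
import ipaddress
import json

# The netavark-example network definition (abridged from above).
net = json.loads("""
{
  "name": "netavark-example",
  "driver": "bridge",
  "network_interface": "podman1",
  "subnets": [
    {"subnet": "10.89.4.0/24", "gateway": "10.89.4.1"}
  ],
  "dns_enabled": true,
  "ipam_options": {"driver": "host-local"}
}
""")

for s in net["subnets"]:
    subnet = ipaddress.ip_network(s["subnet"])
    gateway = ipaddress.ip_address(s["gateway"])
    # A sane bridge network keeps its gateway address inside its own subnet.
    print(net["network_interface"], subnet, gateway in subnet)  # podman1 10.89.4.0/24 True
```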
Netavark is also an executable binary that's installed by default in the /usr/libexec/podman/netavark path. It has a simple command-line interface (CLI) that implements the setup and teardown commands, applying the network configuration to a given network namespace (see man netavark).
Now, let's look at the effects of creating a new container with Netavark.
Like CNI, Netavark manages the creation of network configurations in the container network namespace and the host network namespace, including the creation of veth pairs and the Linux bridge that's defined in the config file.
Before the first container is created in the default Podman network, no bridges are created and the host interfaces are the only ones available, along with the loopback interface:
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:9a:ea:f4 brd ff:ff:ff:ff:ff:ff
altname enp0s5
altname ens5
inet 192.168.121.15/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0
valid_lft 3293sec preferred_lft 3293sec
inet6 fe80::d0fb:c0d1:159e:2d54/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Let's run a new Nginx container and see what happens:
# podman run -d -p 8080:80 \
--name nginx-netavark \
docker.io/library/nginx
When the container is started, the podman0 bridge and a veth interface appear:
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:9a:ea:f4 brd ff:ff:ff:ff:ff:ff
altname enp0s5
altname ens5
inet 192.168.121.15/24 brd 192.168.121.255 scope global dynamic noprefixroute eth0
valid_lft 3140sec preferred_lft 3140sec
inet6 fe80::d0fb:c0d1:159e:2d54/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: veth2772d0ea@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman0 state UP group default qlen 1000
link/ether fa:a3:31:63:21:60 brd ff:ff:ff:ff:ff:ff link-netns netns-61a5f9f9-9dff-7488-3922-165cdc6cd320
inet6 fe80::f8a3:31ff:fe63:2160/64 scope link
valid_lft forever preferred_lft forever
8: podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ea:b4:9d:dd:2c:d1 brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global podman0
valid_lft forever preferred_lft forever
inet6 fe80::24ec:30ff:fe1a:2ca8/64 scope link
valid_lft forever preferred_lft forever
From the end user's perspective, there are no particular changes in terms of network namespaces, firewall rules, or routing compared to the CNI walkthrough provided previously.
Again, a network namespace in the host is created for the nginx-netavark container. Let's inspect the contents of the network namespace:
# ip netns exec netns-61a5f9f9-9dff-7488-3922-165cdc6cd320 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ae:9b:7f:07:3f:16 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.88.0.4/16 brd 10.88.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ac9b:7fff:fe07:3f16/64 scope link
valid_lft forever preferred_lft forever
Once again, it is possible to find the internal IP address that's been assigned to the container.
If the container is executed in rootless mode, the bridge and veth pairs will be created in a rootless network namespace.
Important Note
The rootless network namespace can be inspected in Podman 4 with the podman unshare --rootless-netns command.
Users running Podman 3 and CNI can use the --rootless-cni option to obtain the same results.
In the next subsection, we will learn how to manage and customize container networks with the CLI tools that are offered by Podman.
The podman network command provides the necessary tools for managing container networks. The following subcommands are available:
In this section, you will learn how to create a new network and connect a container to it. For Podman 3, all the generated CNI config files are written to the /etc/cni/net.d folder in the host.
For Podman 4, all the generated Netavark config files for rootfull networks are written to /etc/containers/networks, while the config files for rootless networks are written to ~/.local/share/containers/storage/networks.
The following command creates a new network called example1:
# podman network create \
--driver bridge \
--gateway "10.89.0.1" \
--subnet "10.89.0.0/16" example1
Here, we provided subnet and gateway information, along with the driver type that corresponds to the CNI interface-creating plugin. The resulting network configuration is written in the aforementioned paths according to the kind of network backend and can be inspected with the podman network inspect command.
The following output shows the configuration for a CNI network backend:
# podman network inspect example1
[
{
"cniVersion": "0.4.0",
"name": "example1",
"plugins": [
{
"bridge": "cni-podman1",
"hairpinMode": true,
"ipMasq": true,
"ipam": {
"ranges": [
[
{
"gateway": "10.89.0.1",
"subnet": "10.89.0.0/16"
}
]
],
"routes": [
{
"dst": "0.0.0.0/0"
}
],
"type": "host-local"
},
"isGateway": true,
"type": "bridge"
},
{
"capabilities": {
"portMappings": true
},
"type": "portmap"
},
{
"backend": "",
"type": "firewall"
},
{
"type": "tuning"
},
{
"capabilities": {
"aliases": true
},
"domainName": "dns.podman",
"type": "dnsname"
}
]
}
]
The new network CNI configuration shows that a bridge called cni-podman1 will be created for this network and that containers will allocate IPs from the 10.89.0.0/16 subnet.
The other fields of the configuration are pretty similar to the default one, except for the dnsname plugin (project's repository: https://github.com/containers/dnsname), which is used to enable internal container name resolution. This feature provides an advantage in cross-container communication that we will look at in the next subsection.
The following output shows the generated configuration for a Netavark network backend:
# podman network inspect example1
[
{
"name": "example1",
"id": "a8ca04a41ef303e3247097b86d9048750e5f1aa819ec573b0e5f78e3cc8a971b",
"driver": "bridge",
"network_interface": "podman1",
"created": "2022-02-18T17:56:28.451701452Z",
"subnets": [
{
"subnet": "10.89.0.0/16",
"gateway": "10.89.0.1"
}
],
"ipv6_enabled": false,
"internal": false,
"dns_enabled": true,
"ipam_options": {
"driver": "host-local"
}
}
]
Notice that the bridge naming convention with Netavark is slightly different since it uses the podmanN pattern, with N >= 0.
To list all the existing networks, we can use the podman network ls command:
# podman network ls
NETWORK ID NAME VERSION PLUGINS
2f259bab93aa podman 0.4.0 bridge,portmap,firewall,tuning
228b48a56dbc example1 0.4.0 bridge,portmap,firewall,tuning,dnsname
The preceding output shows the name, ID, CNI version, and active plugins of each active network.
On Podman 4, the output is slightly more compact since there are no CNI plugins to be shown:
# podman network ls
NETWORK ID NAME DRIVER
a8ca04a41ef3 example1 bridge
2f259bab93aa podman bridge
Now, it's time to spin up a container that's attached to the new network. The following code creates a PostgreSQL database that's attached to the example1 network:
# podman run -d -p 5432:5432 \
--network example1 \
-e POSTGRES_PASSWORD=password \
--name postgres \
docker.io/library/postgres
533792e9522fc65371fa6d694526400a3a01f29e6de9b2024e84895f354ed2bb
The new container receives an address from the 10.89.0.0/16 subnet, as shown by the podman inspect command:
# podman inspect postgres --format '{{.NetworkSettings.Networks.example1.IPAddress}}'
10.89.0.3
When we're using the CNI network backend, we can double-check this information by looking at the contents of the new /var/lib/cni/networks/example1 folder:
# ls -al /var/lib/cni/networks/example1/
total 20
drwxr-xr-x. 2 root root 4096 Jan 23 17:26 .
drwxr-xr-x. 5 root root 4096 Jan 23 16:22 ..
-rw-r--r--. 1 root root 70 Jan 23 16:26 10.89.0.3
-rw-r--r--. 1 root root 9 Jan 23 16:57 last_reserved_ip.0
-rwxr-x---. 1 root root 0 Jan 23 16:22 lock
Looking at the content of the 10.89.0.3 file, we find the following:
# cat /var/lib/cni/networks/example1/10.89.0.3
533792e9522fc65371fa6d694526400a3a01f29e6de9b2024e84895f354ed2bb
The file holds the container ID of our postgres container, which is used to track the mapping with the assigned IP address. As we mentioned previously, this behavior is managed by the host-local plugin, the default IPAM choice for Podman networks.
Important Note
The Netavark network backend tracks IPAM configuration in the /run/containers/networks/ipam.db file for rootfull containers.
We can also see that a new Linux bridge has been created (notice the cni- prefix that is used for CNI network backends):
# ip addr show cni-podman1
8: cni-podman1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 56:ed:1d:a9:53:54 brd ff:ff:ff:ff:ff:ff
inet 10.89.0.1/16 brd 10.89.255.255 scope global cni-podman1
valid_lft forever preferred_lft forever
inet6 fe80::54ed:1dff:fea9:5354/64 scope link
valid_lft forever preferred_lft forever
The new device is connected to one peer of the PostgreSQL container's veth pair:
# bridge link show
10: vethf03ed735@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman1 state forwarding priority 32 cost 2
20: veth23ee4990@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cni-podman0 state forwarding priority 32 cost 2
Here, we can see that vethf03ed735@eth0 is attached to the cni-podman1 bridge. The interface has the following configuration:
# ip addr show vethf03ed735
10: vethf03ed735@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman1 state UP group default
link/ether 86:d1:8c:c9:8c:2b brd ff:ff:ff:ff:ff:ff link-netns cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac
inet6 fe80::f889:17ff:fe83:4da2/64 scope link
valid_lft forever preferred_lft forever
The preceding output also shows that the other side of the veth pair is located in the container's network namespace – that is, cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac. We can inspect the container's network configuration and confirm the IP address that's been allocated from the new subnet:
# ip netns exec cni-77bfb1c0-af07-1170-4cc8-eb56d15511ac ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ba:91:9e:77:30:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.89.0.3/16 brd 10.89.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::b891:9eff:fe77:30a1/64 scope link
valid_lft forever preferred_lft forever
Important Note
The network namespace naming pattern for the Netavark backend in Podman 4 is netns-<UID>.
It is possible to connect a running container to another network without stopping and restarting it. In this case, the container keeps an interface attached to the original network, and a second interface, attached to the new network, is created. This feature, which is useful for use cases such as reverse proxies, can be achieved with the podman network connect command. Let's try it on a new net_example container:
# podman run -d -p 8080:80 --name net_example docker.io/library/nginx
# podman network connect example1 net_example
To verify that the container has been attached to the new network, we can run the podman inspect command and look at the networks:
# podman inspect net_example
[...omitted output...]
"Networks": {
"example1": {
"EndpointID": "",
"Gateway": "10.89.0.1",
"IPAddress": "10.89.0.10",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "fa:41:66:0a:25:45",
"NetworkID": "example1",
"DriverOpts": null,
"IPAMConfig": null,
"Links": null
},
"podman": {
"EndpointID": "",
"Gateway": "10.88.0.1",
"IPAddress": "10.88.0.7",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "ba:cd:eb:8d:19:b5",
"NetworkID": "podman",
"DriverOpts": null,
"IPAMConfig": null,
"Links": null
}
}
[…omitted output...]
Here, we can see that the container now has two interfaces attached to the podman and example1 networks, with IP addresses allocated from each network's subnet.
To disconnect a container from a network, we can use the podman network disconnect command:
# podman network disconnect example1 net_example
When a network is not necessary anymore and is disconnected from running containers, we can delete it with the podman network rm command:
# podman network rm example1
example1
The command's output shows the list of removed networks. Here, the network's CNI configuration is removed from the host's /etc/cni/net.d directory.
Important Note
If the network has associated containers that are either running or have been stopped, the previous command will fail with Error: "example1" has associated containers with it. To work around this issue, remove or disconnect the associated containers before running the command.
The podman network rm command is useful when we need to remove a specific network. To remove all unused networks, the podman network prune command is a better choice:
# podman network prune
WARNING! This will remove all networks not used by at least one container.
Are you sure you want to continue? [y/N] y
example2
db_network
In this section, we learned about the CNI specification and how Podman leverages its interface to simplify container networking. In a multi-tier or microservices scenario, we need to let containers communicate with each other. In the next section, we will learn how to manage container-to-container communication.
Using our knowledge from the previous section, we should be aware that two or more containers that have been created inside the same network can reach each other on the same subnet without the need for external routing.
At the same time, two or more containers that belong to different networks will be able to reach each other across subnets by routing packets through their respective default gateways on the host.
To demonstrate this, let's create a couple of busybox containers in the same default network:
# podman run -d --name endpoint1 \
  --cap-add=net_admin,net_raw busybox /bin/sleep 10000
# podman run -d --name endpoint2 \
  --cap-add=net_admin,net_raw busybox /bin/sleep 10000
In our lab, the two containers have 10.88.0.14 (endpoint1) and 10.88.0.15 (endpoint2) as their addresses. These two addresses are subject to change and can be collected using the methods illustrated previously with the podman inspect or the nsenter commands.
Regarding capabilities customization, we added the CAP_NET_ADMIN and CAP_NET_RAW capabilities to let the containers run commands such as ping or traceroute seamlessly.
Let's try to run a traceroute command from endpoint1 to endpoint2 to see the path of a packet:
# podman exec -it endpoint1 traceroute 10.88.0.15
traceroute to 10.88.0.15 (10.88.0.15), 30 hops max, 46 byte packets
1 10.88.0.15 (10.88.0.15) 0.013 ms 0.004 ms 0.002 ms
As we can see, the packet stays on the internal network and reaches the destination container in a single hop, without traversing any gateway.
Now, let's create a new network, net1, and connect a container called endpoint3 to it:
# podman network create --driver bridge --gateway "10.90.0.1" --subnet "10.90.0.0/16" net1
# podman run -d --name endpoint3 --network=net1 --cap-add=net_admin,net_raw busybox /bin/sleep 10000
The container in our lab gets an IP address of 10.90.0.2. Let's see the network path from endpoint1 to endpoint3:
# podman exec -it endpoint1 traceroute 10.90.0.2
traceroute to 10.90.0.2 (10.90.0.2), 30 hops max, 46 byte packets
1 host.containers.internal (10.88.0.1) 0.003 ms 0.001 ms 0.006 ms
2 10.90.0.2 (10.90.0.2) 0.001 ms 0.002 ms 0.002 ms
This time, the packet traversed the endpoint1 container's default gateway (10.88.0.1) and was then routed by the host onto the net1 Linux bridge to reach the endpoint3 container.
Connectivity across containers in the same host is very easy to manage and understand. However, we are still missing an important aspect for container-to-container communication: DNS resolution.
Let's learn how to leverage this feature with Podman networks.
Despite its many configuration caveats, DNS resolution is a very simple concept: a service is queried to provide the IP address associated with a given hostname. The amount of information that can be provided by a DNS server is far richer than this, but we want to focus on simple IP resolution in this example.
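For instance, we can observe name resolution on any Linux host with getent, which queries the system resolver; inside a container attached to a DNS-enabled Podman network, the same kind of query for a peer container name (such as db) would return that container's address:

```shell
# Query the system resolver (glibc NSS) for a name. On the host, this is
# answered by /etc/hosts; inside a container attached to a DNS-enabled Podman
# network, the same query for a peer container name (e.g. db) would return
# that container's IP address.
getent hosts localhost
```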
For example, let's imagine a scenario where a web application running on a container named webapp needs read/write access to a database running on a second container named db. DNS resolution enables webapp to query for the db container's IP address before contacting it.
Previously, we learned that Podman's default network does not provide DNS resolution, while new user-created networks have DNS resolution enabled by default. On a CNI network backend, the dnsname plugin automatically configures a dnsmasq service, which is started when containers are connected to the network, to provide DNS resolution. On a Netavark network backend, DNS resolution is delivered by aardvark-dns.
To test this feature, we are going to reuse the students web application that we illustrated in Chapter 10, Troubleshooting and Monitoring Containers, since it provides an adequate client-server example with a minimal REST service and a database backend based on PostgreSQL.
Info
The source code is available in this book's GitHub repository at https://github.com/PacktPublishing/Podman-for-DevOps/tree/main/Chapter10/students.
In this example, the web application simply prints some output in JSON as the result of an HTTP GET that triggers a query to a PostgreSQL database. For our demonstration, we will run both the database and the web application on the same network.
First, we must create the PostgreSQL database container while providing a generic username and password:
# podman run -d \
  --network net1 --name db \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=students \
  postgres
Next, we must restore the data from the SQL dump in the students folder to the database:
# cd Chapter10/students
# cat db.sql | podman exec -i db psql -U admin students
If you haven't already built it in the previous chapters, then you need to build the students container image and run it on the host:
# buildah build -t students .
# podman run -d \
  --network net1 \
  -p 8080:8080 \
  --name webapp \
  students \
  students -host db -port 5432 \
  -username admin -password password
Notice the arguments that follow the image name: the students application accepts the -host, -port, -username, and -password options to customize the database endpoint and credentials.
We did not provide any IP address in the host field. Instead, the Postgres container name, db, along with the default 5432 port, was used to identify the database.
Also, notice that the db container was created without any kind of port mapping: we expect to directly reach the database over the net1 container network, where both containers were created.
Let's try to call the students application API and see what happens:
# curl localhost:8080/students
{"Id":10149,"FirstName":"Frank","MiddleName":"Vincent","LastName":"Zappa","Class":"3A","Course":"Composition"}
The query worked fine, meaning that the application successfully queried the database. But how did this happen? How did it resolve the container IP address by only knowing its name? In the next section, we'll look at the different behaviors on CNI and Netavark network backends.
On Podman 3 or Podman 4 with a CNI backend, the dnsname plugin is enabled in the net1 network and a dedicated dnsmasq service is spawned that is in charge of resolving container names to their assigned IP addresses. Let's start by finding the container's IP addresses first:
# podman inspect db --format '{{.NetworkSettings.Networks.net1.IPAddress}}'
10.90.0.2
# podman inspect webapp --format '{{.NetworkSettings.Networks.net1.IPAddress}}'
10.90.0.3
We want to look for dnsmasq processes running on the system:
# ps aux | grep dnsmasq
root 2703 0.0 0.0 26436 2384 ? S 16:16 0:00 /usr/sbin/dnsmasq -u root --conf-file=/run/containers/cni/dnsname/net1/dnsmasq.conf
root 5577 0.0 0.0 6140 832 pts/0 S+ 22:00 0:00 grep --color=auto dnsmasq
The preceding output shows an instance of the dnsmasq process running with a config file that's been created under the /run/containers/cni/dnsname/net1/ directory. Let's inspect its contents:
# ls -al /run/containers/cni/dnsname/net1/
total 12
drwx------. 2 root root 120 Jan 25 16:16 .
drwx------. 3 root root 60 Jan 25 16:16 ..
-rw-r--r--. 1 root root 30 Jan 25 16:28 addnhosts
-rwx------. 1 root root 356 Jan 25 16:16 dnsmasq.conf
-rwxr-x---. 1 root root 0 Jan 25 16:16 lock
-rw-r--r--. 1 root root 5 Jan 25 16:16 pidfile
/run/containers/cni/dnsname/net1/dnsmasq.conf defines the dnsmasq configuration:
# cat /run/containers/cni/dnsname/net1/dnsmasq.conf
## WARNING: THIS IS AN AUTOGENERATED FILE
## AND SHOULD NOT BE EDITED MANUALLY AS IT
## LIKELY TO AUTOMATICALLY BE REPLACED.
strict-order
local=/dns.podman/
domain=dns.podman
expand-hosts
pid-file=/run/containers/cni/dnsname/net1/pidfile
except-interface=lo
bind-dynamic
no-hosts
interface=cni-podman1
addn-hosts=/run/containers/cni/dnsname/net1/addnhosts
The process listens on the cni-podman1 interface (the net1 network bridge, which has an IP address of 10.90.0.1) and is authoritative for the dns.podman domain. The host's records are kept in the /run/containers/cni/dnsname/net1/addnhosts file, which contains the following:
# cat /run/containers/cni/dnsname/net1/addnhosts
10.90.0.2 db
10.90.0.3 webapp
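Since the addnhosts format is just whitespace-separated IP/name pairs, resolution can be simulated with a few lines of shell. The following sketch copies the data shown above into a scratch file; the file path and the resolve_local helper name are made up for illustration:

```shell
# The addnhosts file is just "IP name" pairs, one per line; resolution can be
# simulated with awk. File path and sample values are illustrative copies of
# the lab output above.
cat > /tmp/addnhosts <<'EOF'
10.90.0.2 db
10.90.0.3 webapp
EOF

# Hypothetical lookup helper: print the IP whose name column matches.
resolve_local() {
  awk -v name="$1" '$2 == name { print $1 }' /tmp/addnhosts
}

resolve_local db        # prints 10.90.0.2
resolve_local webapp    # prints 10.90.0.3
```

This is exactly what dnsmasq does for us, plus caching and forwarding of non-local queries to the host's upstream resolvers.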
When a container in the net1 network attempts DNS resolution, it uses its /etc/resolv.conf file to find out the DNS server to direct the query to. The file's content in the webapp container is as follows:
# podman exec -it webapp cat /etc/resolv.conf
search dns.podman
nameserver 10.90.0.1
This shows that the container contacts the 10.90.0.1 address (which is also the container default gateway and the cni-podman1 bridge) to query hostname resolution.
The search domain allows processes to search for a Fully Qualified Domain Name (FQDN). In the preceding example, db.dns.podman would be resolved correctly by the DNS service. The search domain for a CNI network configuration can be customized by editing the related config file under /etc/cni/net.d/. The default configuration for the dnsname plugin in the net1 config is as follows:
{
"type": "dnsname",
"domainName": "dns.podman",
"capabilities": {
"aliases": true
}
}
When you update the domainName field to a new value, the changes are not effective immediately. To regenerate the updated dnsmasq.conf, all the containers in the network must be stopped to let the dnsname plugin clean up the current network configuration. When containers are restarted, the dnsmasq configuration is regenerated accordingly.
If the preceding example was executed on Podman 4 with a Netavark network backend, the aardvark-dns daemon would be responsible for container resolution in a similar way to dnsmasq.
The aardvark-dns project is a companion project of Netavark written in Rust. It is a lightweight authoritative DNS service that can work on both IPv4 A records and IPv6 AAAA records.
When a new network with DNS resolution enabled is created, a new aardvark-dns process is created, as shown in the following code:
# ps aux | grep aardvark-dns
root 9115 0.0 0.0 344732 2584 pts/0 Sl 20:15 0:00 /usr/libexec/podman/aardvark-dns --config /run/containers/networks/aardvark-dns -p 53 run
root 10831 0.0 0.0 6400 2044 pts/0 S+ 23:36 0:00 grep --color=auto aardvark-dns
The process listens on port 53/udp of the host network namespace for rootfull containers and on port 53/udp of the rootless network namespace for rootless containers.
The output of the ps command also shows the default configuration path – the /run/containers/networks/aardvark-dns directory – where the aardvark-dns process stores the resolution configurations under different files, named after the associated network. For example, for the net1 network, we will find content similar to the following:
# cat /run/containers/networks/aardvark-dns/net1
10.90.0.1
dc7fff2ef78e99a2a1a3ea6e29bfb961fc07cd6cf71200d50761e25df3011636 10.90.0.2 db,dc7fff2ef78e
10c7bbb7006c9b253f9ebe1103234a9af41dced8f12a6d94b7fc46a9a975d8cc 10.90.0.3 webapp,10c7bbb7006c
The file stores IPv4 addresses (and IPv6 addresses, if present) for every container. Here, we can see the containers' names and short IDs resolved to the IPv4 addresses.
The first line tells us the address where aardvark-dns is listening for incoming requests. Once again, it corresponds to the default gateway address for the network.
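Based on the layout shown above (first line: listening address; following lines: container ID, IP address, comma-separated names), the file can be summarized with a short awk script. The sample data below mirrors our lab values and is only illustrative:

```shell
# Summarize an aardvark-dns network file: line 1 is the listening address,
# each further line is "<container id> <ip> <comma-separated names>".
# The sample below mirrors our lab values (illustrative only).
cat > /tmp/net1 <<'EOF'
10.90.0.1
dc7fff2ef78e99a2a1a3ea6e29bfb961fc07cd6cf71200d50761e25df3011636 10.90.0.2 db,dc7fff2ef78e
10c7bbb7006c9b253f9ebe1103234a9af41dced8f12a6d94b7fc46a9a975d8cc 10.90.0.3 webapp,10c7bbb7006c
EOF

awk 'NR == 1 { print "listener:", $1; next }
     { split($3, names, ","); print names[1], "->", $2 }' /tmp/net1
# listener: 10.90.0.1
# db -> 10.90.0.2
# webapp -> 10.90.0.3
```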
Connecting containers across the same network allows for fast and simple communication across different services running in separate network namespaces. However, there are use cases where containers must share the same network namespace. Podman offers a solution to achieve this goal easily: Pods.
The concept of a Pod comes from the Kubernetes architecture. According to the official upstream documentation, "A Pod ... is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers."
A Pod is also the smallest deployable unit in Kubernetes scheduling. All the containers inside a Pod share the same network, UTS, IPC, and (optionally) PID namespaces. This means that all the services running in the different containers can refer to each other as localhost, while external containers will continue to contact the Pod's IP address. A Pod receives one IP address that is shared across all its containers.
There are many adoption use cases. A very common one is sidecar containers: in this case, a reverse proxy or an OAuth proxy runs alongside the main container to provide authentication or service mesh functionalities.
Podman provides the basic tooling for manipulating Pods with the podman pod command. The following example shows how to create a basic Pod with two containers and demonstrates network namespace sharing across the two containers in the Pod.
Important Note
To understand the following example, stop and remove all the running containers and Pods and start with a clean environment.
podman pod create initializes a new, empty Pod from scratch:
# podman pod create --name example_pod
Important Note
When a new, empty Pod is created, Podman also creates an infra container, which is used to initialize the namespaces when the Pod is started. This container is based on the k8s.gcr.io/pause image for Podman 3 and a locally-built podman-pause image for Podman 4.
Now, we can create two basic busybox containers inside the Pod:
# podman create --name c1 --pod example_pod busybox sh -c 'sleep 10000'
# podman create --name c2 --pod example_pod busybox sh -c 'sleep 10000'
Finally, we can start the Pod (and its associated containers) with the podman pod start command:
# podman pod start example_pod
Here, we have a running Pod with two containers (plus an infra one) running. To verify its status, we can use the podman pod ps command:
# podman pod ps
POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS
8f89f37b8f3b example_pod Running 8 minutes ago 95589171284a 3
With the podman pod top command, we can see the resources that are being consumed by each container in the Pod:
# podman pod top example_pod
USER PID PPID %CPU ELAPSED TTY TIME COMMAND
root 1 0 0.000 10.576973703s ? 0s sleep 10000
0 1 0 0.000 10.577293395s ? 0s /catatonit -P
root 1 0 0.000 9.577587032s ? 0s sleep 10000
After creating the Pod, we can inspect the network's behavior. First, we will see that only one network namespace has been created in the system:
# ip netns
netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd (id: 0)
Let's check the IP configuration for this namespace and its related network stack:
# ip netns exec netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether a6:1b:bc:8e:65:1e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a41b:bcff:fe8e:651e/64 scope link
valid_lft forever preferred_lft forever
To verify that the c1 and c2 containers share the same network namespace and are running with an IP address of 10.88.0.3, we can run the same ip addr show command inside the containers using the podman exec command:
# podman exec -it c1 ip addr show
# podman exec -it c2 ip addr show
These two containers are expected to return the same output as the netns-17b9bb67-5ce6-d533-ecf0-9d7f339e6ebd network namespace.
The example pod can be stopped and removed with the podman pod stop and podman pod rm commands, respectively:
# podman pod stop example_pod
# podman pod rm example_pod
We will cover pods in more detail in Chapter 14, Interacting with systemd and Kubernetes, where we will also discuss name resolution and multi-pod orchestration.
In this section, we focused on communication across two or more containers inside the same host or Pod, regardless of the number and type of networks involved. However, containers are a platform where you can run services that are generally accessed by the external world. For this reason, in the next section, we will investigate the best practices that can be applied to expose containers outside their hosts and make their services accessible to other clients/consumers.
Container adoption in an enterprise company or a community project can be hard and time-consuming. For this reason, we may not have all the required services running as containers during our adoption journey. This is why exposing containers outside their underlying host is a nice solution for interconnecting services that live in containers with services that still run in the legacy world.
As we briefly saw earlier in this chapter, Podman uses two different networking stacks, depending on the container: rootless or rootfull.
Even though the underlying mechanism is slightly different, depending on if you are using a rootless or a rootfull container, Podman's command-line options for exposing network ports are the same for both container types.
Good to Know
Note that the example we are going to see in this section will be executed as a root user. This choice was necessary because the main objective of this section is to show you some of the firewall configurations that could be mandatory for exposing a container service to the outside world.
Exposing a container starts with Port Publishing activities. We'll learn what this is in the next section.
Port Publishing consists of instructing Podman to create a mapping, which lasts for the container's lifetime, between the container's ports and random or custom ports on the host.
The option to instruct Podman to publish a port is really simple – it consists of adding the -p or --publish option to the run command. Let's see how it works:
-p=ip:hostPort:containerPort
The previous option publishes a container's port, or range of ports, to the host. When specifying ranges for hostPort and containerPort, both ranges must contain the same number of ports.
We can even omit ip. In that case, the port will be bound on all the IPs of the underlying host. If we do not set the host port, the container's port will be randomly assigned a port on the host.
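The -p specification therefore accepts one, two, or three colon-separated fields. The following shell helper is purely hypothetical (it is not part of Podman) and only illustrates how the fields are interpreted, including the defaults applied when ip or hostPort are omitted:

```shell
# Hypothetical helper (not part of Podman) that mimics how the -p publish
# specification ip:hostPort:containerPort is interpreted.
parse_publish() {
  local spec="$1" ip="" host="" ctr="" rest=""
  case "$spec" in
    *:*:*)                      # all three fields present
      ip=${spec%%:*}; rest=${spec#*:}
      host=${rest%%:*}; ctr=${rest#*:} ;;
    *:*)                        # hostPort:containerPort
      host=${spec%%:*}; ctr=${spec#*:} ;;
    *)                          # containerPort only
      ctr=$spec ;;
  esac
  # Omitted ip -> bind on all host IPs; omitted hostPort -> random host port
  echo "ip=${ip:-all} host=${host:-random} container=$ctr"
}

parse_publish 127.0.0.1:8080:80   # ip=127.0.0.1 host=8080 container=80
parse_publish 80:80               # ip=all host=80 container=80
parse_publish 80                  # ip=all host=random container=80
```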
Let's look at an example of the port publishing option:
# podman run -dt -p 80:80/tcp docker.io/library/httpd
Trying to pull docker.io/library/httpd:latest...
Getting image source signatures
Copying blob 41c22baa66ec done
Copying blob dcc4698797c8 done
Copying blob d982c879c57e done
Copying blob a2abf6c4d29d done
Copying blob 67283bbdd4a0 done
Copying config dabbfbe0c5 done
Writing manifest to image destination
Storing signatures
ea23dbbeac2ea4cb6d215796e225c0e7c7cf2a979862838ef4299d410c90ad44
As you can see, we have told Podman to run a container starting from the httpd base image. Then, we allocated a pseudo-tty (-t) in detached mode (-d) before setting the port mapping to bind the underlying host port, 80, to port 80 of the container.
Now, we can list the running containers and use the podman port command to see the actual mapping:
# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ea23dbbeac2e docker.io/library/httpd:latest httpd-foreground 3 minutes ago Up 3 minutes ago 0.0.0.0:80->80/tcp ecstatic_chaplygin
# podman port ea23dbbeac2e
80/tcp -> 0.0.0.0:80
First, we requested the list of running containers and then passed the correct container ID to the podman port command. We can check if the mapping is working properly like so:
# curl localhost:80
<html><body><h1>It works!</h1></body></html>
Here, we executed a curl command from the host system and it worked – the httpd process running in the container just replied to us.
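The podman port output follows a stable containerPort/proto -> hostAddr:hostPort pattern, so the host port can be recovered in a script with plain parameter expansion. The sample mapping below is a copy of the output above, not live command output:

```shell
# The mapping follows "containerPort/proto -> hostAddr:hostPort"; the host
# port can be recovered with parameter expansion (sample data, not live output).
mapping='80/tcp -> 0.0.0.0:80'
host_port=${mapping##*:}    # strip everything up to the last colon
echo "$host_port"           # prints 80
```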
If we have multiple ports and we do not care about their assignment on the underlying host system, we can leverage the -P or --publish-all option to publish all the ports that are exposed by the container image to random ports on the host interfaces. Podman will scan the container image's metadata looking for the exposed ports. These ports are usually defined in a Dockerfile or Containerfile with the EXPOSE instruction, as shown here:
EXPOSE 80/tcp
EXPOSE 80/udp
With this instruction, we inform the container engine which network ports the final container will expose and use.
However, we can leverage an easy but insecure alternative, as shown in the next section.
To expose a container service to the outside world, we can attach the whole host network to the running container. As you can imagine, this method could lead to the unauthorized use of host resources; for this reason, it is not recommended and should be used carefully.
As we anticipated, attaching the host network to a running container is quite simple. Using the right Podman option, we can easily get rid of any network isolation:
# podman run --network=host -dt docker.io/library/httpd
2cb80369e53761601a41a4c004a485139de280c3738d1b7131c241f4001f78a6
Here, we used the --network option while specifying the host value. This informs Podman that we want to let the container attach to the host network.
After running the previous command, we can check that the running container is bound to the host system's network interfaces since it can access all of them:
# netstat -nap|grep ::80
tcp6 0 0 :::80 :::* LISTEN 37304/httpd
# curl localhost:80
<html><body><h1>It works!</h1></body></html>
Here, we executed a curl command from the host system and it worked – the httpd process running in the container just replied to us.
The process of exposing containers outside the underlying host does not stop here. In the next section, we'll learn how to complete this job.
Whether we choose to leverage Port Publishing or attach the host network to the container, the process of exposing containers outside the underlying host does not stop here – we have reached the base OS of our host machine. In most cases, we will also need to allow the incoming connections to flow in the host's underlying machine, which will be interacting with the system firewall.
The following example shows a non-comprehensive way to interact with the base OS firewall. If we're using a Fedora operating system or any other Linux distribution that's leveraging Firewalld as its firewall daemon manager, we can allow incoming connections on port 80 by running the following commands:
# firewall-cmd --add-port=80/tcp
success
# firewall-cmd --runtime-to-permanent
success
The first command edits the live system rules, while the second command stores the runtime rules in a permanent way that will survive system reboot or service restart.
Good to Know
Firewalld is a firewall service daemon that provides us with an easy and fast way to customize the system firewall. Firewalld is dynamic, which means that it can create, change, and delete the firewall rules without restarting the firewall daemon each time a change is applied.
As we have seen, the process of exposing a container's services is quite simple, but it should be performed consciously: opening a network port to the outside world should always be done with care.
As we saw in the previous sections, Podman relies on CNI plugins or Netavark for rootfull containers, which have the privileges to alter network configurations in the host network namespace. For rootless containers, Podman uses the slirp4netns project, which allows container network configurations to be created without root privileges: the network interfaces are created inside a rootless network namespace, where the standard user has sufficient privileges. This approach allows rootless container networking to be managed transparently and flexibly.
In the previous sections, we saw how container network namespaces can be connected to a bridge using a veth pair. Being able to create a veth pair in the host network namespace requires root privileges that are not allowed for standard users.
In the simplest scenario, slirp4netns overcomes these privilege limitations by creating a tap device inside the rootless network namespace and attaching it to a user-mode TCP/IP stack running in the slirp4netns process.
For every new rootless container, a new slirp4netns process is executed on the host. The process creates a network namespace for the container, in which a tap0 device is created and configured with the 10.0.2.100/24 address (from the default slirp4netns 10.0.2.0/24 subnet). Since every rootless container receives the same address, two containers cannot communicate with each other directly on this network: their IP addresses would overlap.
The following example demonstrates the network behavior of a rootless busybox container:
$ podman run -i busybox sh -c 'ip addr show tap0'
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether 2a:c7:86:66:e9:20 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
valid_lft forever preferred_lft forever
inet6 fd00::28c7:86ff:fe66:e920/64 scope global dynamic mngtmpaddr
valid_lft 86117sec preferred_lft 14117sec
inet6 fe80::28c7:86ff:fe66:e920/64 scope link
valid_lft forever preferred_lft forever
It is possible to inspect the rootless network namespace and find the corresponding tap0 device:
$ podman unshare --rootless-netns ip addr show tap0
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether 1a:eb:82:6a:82:8d brd ff:ff:ff:ff:ff:ff
inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
valid_lft forever preferred_lft forever
inet6 fd00::18eb:82ff:fe6a:828d/64 scope global dynamic mngtmpaddr
valid_lft 86311sec preferred_lft 14311sec
inet6 fe80::18eb:82ff:fe6a:828d/64 scope link
valid_lft forever preferred_lft forever
Since rootless containers do not own independent IP addresses, we have two ways to let two or more containers communicate with each other:
Running the containers inside the same Pod, so that they share a single network namespace and can reach each other over localhost
Creating a custom rootless network and connecting the containers to it
Using a Podman 4 network backend, let's quickly focus on the second scenario, where two containers are attached to a rootless network. First, we need to create the network and attach a couple of test containers:
$ podman network create rootless-net
$ podman run -d --net rootless-net --name endpoint1 --cap-add=net_admin,net_raw busybox /bin/sleep 10000
$ podman run -d --net rootless-net --name endpoint2 --cap-add=net_admin,net_raw busybox /bin/sleep 10000
Let's try to ping the endpoint2 container from endpoint1:
$ podman exec -it endpoint1 ping -c1 endpoint2
PING endpoint2 (10.89.1.3): 56 data bytes
64 bytes from 10.89.1.3: seq=0 ttl=64 time=0.023 ms
--- endpoint2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.023/0.023/0.023 ms
These two containers can communicate on the common network and have different IPv4 addresses. To prove this, we can inspect the contents of the aardvark-dns configuration for the rootless containers:
$ cat /run/user/1000/containers/networks/aardvark-dns/rootless-net
10.89.1.1
fe27f8d653384fc191d5c580d18d874d480a7e8ef74c2626ae21b118eedbf1e6 10.89.1.2 endpoint1,fe27f8d65338
19a4307516ce1ece32ce58753e70da5e5abf9cf70feea7b981917ae399ef934d 10.89.1.3 endpoint2,19a4307516ce
Finally, let's demonstrate that the custom network bypasses the tap0 interface and allows dedicated veth pairs and bridges to be created in the rootless network namespace. The following command will show a Linux bridge for the rootless-net network and two attached veth pairs:
$ podman unshare --rootless-netns ip link | grep 'podman'
3: podman2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
4: vethdca7cdc6@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman2 state UP mode DEFAULT group default qlen 1000
5: veth912bd229@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master podman2 state UP mode DEFAULT group default qlen 1000
Important Note
If you're running this code on a CNI network backend, use the podman unshare --rootless-cni command.
Another limitation of rootless containers regards the ping command. Usually, on Linux distributions, standard non-root users lack the CAP_NET_RAW security capability, which inhibits the execution of the ping command since it sends and receives ICMP packets. If we want to use the ping command in a rootless container, we can allow unprivileged ICMP echo requests for a range of group IDs through the sysctl command:
# sysctl -w "net.ipv4.ping_group_range=0 2000000"
Note that this allows any process executed by users whose group IDs fall in this range to send ping packets.
Finally, while using rootless containers, we also need to consider that the Port Publishing technique can only be used for ports above 1024. This is because, on Linux operating systems, all the ports below 1024 are privileged and cannot be used by standard non-root users.
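A commonly used workaround, if a rootless container really must publish a privileged port, is to lower the first unprivileged port via the net.ipv4.ip_unprivileged_port_start kernel parameter. A sketch of a persistent configuration follows; note that the file name is arbitrary and that this relaxes security for all users on the host:

```
# /etc/sysctl.d/99-rootless-ports.conf (hypothetical file name)
# Allow unprivileged users to bind ports from 80 upward, so that rootless
# containers can publish low ports such as 80/tcp. This relaxes security
# for every user on the host, so use it consciously.
net.ipv4.ip_unprivileged_port_start = 80
```

The setting can be applied with sysctl --system (root required).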
In this chapter, we learned how container network isolation can be leveraged to allow network segregation for each container that's running through network namespaces. These activities seem complex but thankfully, with the help of a container runtime, the steps are almost automated. We learned how to manage container networking with Podman and how to interconnect two or more containers. Finally, we learned how to expose a container's network ports outside of the underlying host and what kind of limitations we can expect while networking for rootless containers.
In the next chapter, we will discover the main differences between Docker and Podman. This will be useful for advanced users, but also for novice ones, to understand what we can expect by comparing these two container engines.
To learn more about the topics that were covered in this chapter, take a look at the following resources: