In simple terms, capabilities let us break the power of the root user down into smaller pieces. Note the following from the capabilities man page:
Starting with kernel 2.2, Linux divides the privileges traditionally associated with superusers into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.
Some examples of capabilities are as follows:
- CAP_SYSLOG: This modifies kernel printk behavior
- CAP_NET_ADMIN: This configures the network
- CAP_SYS_ADMIN: This is the catch-all capability that covers a wide range of system administration operations
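The kernel originally had only 32 slots available for capabilities (the capability sets have been 64 bits wide since Linux 2.6.25), so new capabilities have been added sparingly. One capability, CAP_SYS_ADMIN, acts as the catch-all; kernel code has tended to use it whenever in doubt about which capability should guard an operation.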
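Because each capability occupies one bit slot, the capability sets of a thread can be read as bitmasks. The following is a minimal Python sketch, assuming a Linux system whose `/proc/self/status` exposes fields such as `CapPrm` and `CapEff` as hexadecimal masks. The helper names `parse_cap_sets` and `set_bits` are illustrative and not part of any library.

```python
import re

# Matches lines such as "CapEff:\t00000000a80425fb" in /proc/<pid>/status.
CAP_LINE = re.compile(r"^(Cap[A-Za-z]+):\s*([0-9a-fA-F]+)$", re.MULTILINE)


def parse_cap_sets(status_text):
    """Return a dict mapping each Cap* field name to its integer bitmask."""
    return {name: int(value, 16) for name, value in CAP_LINE.findall(status_text)}


def set_bits(mask):
    """Return the bit positions (capability numbers) that are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            text = handle.read()
    except OSError:
        text = ""  # Assumption: not on Linux, or /proc is unavailable.
    for field, mask in parse_cap_sets(text).items():
        print(field, set_bits(mask))
```

Each printed number is the slot of one capability in that set; an empty list means the set holds no capabilities.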
Docker can add or remove capabilities for a container. By default, it grants a container the chown, dac_override, fsetid, fowner, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, setfcap, and audit_write capabilities, and removes others, including the following:
- CAP_SYS_MODULE: This inserts/removes kernel modules
- CAP_SYS_RAWIO: This modifies kernel memory
- CAP_SYS_PACCT: This configures process accounting
- CAP_SYS_NICE: This modifies the priority of processes
- CAP_SYS_RESOURCE: This overrides resource limits
- CAP_SYS_TIME: This modifies the system clock
- CAP_SYS_TTY_CONFIG: This configures TTY devices
- CAP_AUDIT_CONTROL: This configures the audit subsystem
- CAP_MAC_OVERRIDE: This ignores the kernel MAC policy
- CAP_MAC_ADMIN: This configures the kernel MAC policy
- CAP_SYSLOG: This modifies kernel printk behavior
- CAP_NET_ADMIN: This configures the network
- CAP_SYS_ADMIN: This is the catch-all capability for system administration operations
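To relate these names to what a container actually holds, the default set can be converted to the bitmask the kernel reports. The sketch below is illustrative only: the bit order in `CAP_NAMES` follows the numbering in `<linux/capability.h>` as of recent kernels and should be verified against your own headers, `DOCKER_DEFAULTS` reflects the default set granted by recent Docker releases (which includes fsetid), and the helpers `to_mask` and `from_mask` are hypothetical names.

```python
# Capability names in kernel bit order (bit 0 first), as numbered in
# <linux/capability.h>; verify against your own kernel headers.
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid",
    "kill", "setgid", "setuid", "setpcap", "linux_immutable",
    "net_bind_service", "net_broadcast", "net_admin", "net_raw", "ipc_lock",
    "ipc_owner", "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace",
    "sys_pacct", "sys_admin", "sys_boot", "sys_nice", "sys_resource",
    "sys_time", "sys_tty_config", "mknod", "lease", "audit_write",
    "audit_control", "setfcap", "mac_override", "mac_admin", "syslog",
    "wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf",
    "checkpoint_restore",
]

# Assumption: the default capability set of recent Docker releases.
DOCKER_DEFAULTS = {
    "chown", "dac_override", "fsetid", "fowner", "kill", "setgid", "setuid",
    "setpcap", "net_bind_service", "net_raw", "sys_chroot", "mknod",
    "setfcap", "audit_write",
}


def to_mask(names):
    """Combine capability names into a single integer bitmask."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return mask


def from_mask(mask):
    """List the capability names whose bits are set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(hex(to_mask(DOCKER_DEFAULTS)))  # 0xa80425fb
    # Everything not in the default set is dropped by default.
    print(sorted(set(CAP_NAMES) - DOCKER_DEFAULTS))
```

Under these assumptions, the `CapEff` value seen inside a default container should decode to the same names through `from_mask`, which gives a quick way to check what a running container can do.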
We need to be very careful about which capabilities we remove, as an application can break if it lacks a capability it needs to run. To add or remove capabilities for a container, use the --cap-add and --cap-drop options of docker run, respectively.
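As a hedged illustration of these options, the following Python sketch only assembles a `docker run` command line and prints it; it does not start a container. The function name `docker_run_args`, the `alpine` image, and the `chown` command are assumptions chosen for the example, while `--cap-add`, `--cap-drop`, and `--rm` are existing `docker run` options.

```python
import shlex


def docker_run_args(image, command=(), cap_add=(), cap_drop=()):
    """Build an argument list for `docker run` with capability options."""
    args = ["docker", "run", "--rm"]
    for cap in cap_drop:
        args += ["--cap-drop", cap]  # Remove a capability from the container.
    for cap in cap_add:
        args += ["--cap-add", cap]   # Grant a capability to the container.
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # Hypothetical example: drop everything, then add back only CHOWN.
    args = docker_run_args(
        "alpine",
        ["chown", "nobody", "/tmp"],
        cap_drop=["ALL"],
        cap_add=["CHOWN"],
    )
    print(shlex.join(args))
    # docker run --rm --cap-drop ALL --cap-add CHOWN alpine chown nobody /tmp
```

Dropping `ALL` and adding back only what the application needs keeps the granted set explicit, at the cost of having to discover each required capability through testing.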