CoreOS ships its own version of cloud-init, extended with support for the CoreOS environment and stripped of directives that don't apply to it, so we can boot a fully configured system and cluster.
We'll focus on the CoreOS specifics here, and refer to earlier tips for managing users, files, authorized SSH keys, and other standard cloud-init directives. By the end of this part, you'll know how to configure the etcd key value store, the fleet cluster manager, and the flannel overlay network, control the update mechanism, and ensure systemd units are started as early as possible.
CoreOS provides a very useful cloud-config file validator at https://coreos.com/validate/. It comes in handy when we're not sure whether a directive is supported by the distribution.
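Before submitting a file to the online validator, we can catch the most common mistake locally: a cloud-config file is only honored if its very first line is the exact #cloud-config header. A minimal local check (a sketch; the online validator remains the authoritative tool, and the file content here is only an example):

```shell
# Write a tiny example cloud-config file (illustrative content only).
cat > cloud-config.yml <<'EOF'
#cloud-config
coreos:
  update:
    reboot-strategy: "etcd-lock"
EOF

# CoreOS ignores the file entirely if the first line isn't "#cloud-config".
if head -n 1 cloud-config.yml | grep -qx '#cloud-config'; then
  echo "header OK"
else
  echo "missing #cloud-config header" >&2
  exit 1
fi
```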
To step through this recipe, you will need:
We'll go through the most important configuration options for CoreOS: the etcd distributed key value store, the fleet scheduler, the flannel network, the update strategy, and some systemd unit configuration.
The etcd key value store is used in CoreOS to share configuration data between members of the same cluster. To begin with, we need a discovery token, which can be obtained from https://discovery.etcd.io/new:
$ curl -w "\n" 'https://discovery.etcd.io/new'
https://discovery.etcd.io/638d980c4edf94d6ddff8d6e862bc7d9
We can specify the minimum required size of the CoreOS cluster by adding the size= argument to the URL: https://discovery.etcd.io/new?size=3.
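The URL with its size argument can also be built programmatically, which is handy in provisioning scripts. A trivial sketch (the size value is just an example; the actual network call is commented out):

```shell
# Build a discovery URL requesting a minimum cluster size of 3.
SIZE=3
DISCOVERY_URL="https://discovery.etcd.io/new?size=${SIZE}"
echo "${DISCOVERY_URL}"

# The real token request would then be:
# curl -w "\n" "${DISCOVERY_URL}"
```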
Now that we have a valid discovery token, let's add it to our cloud-config.yml file under the etcd2 directive:

#cloud-config
coreos:
  etcd2:
    discovery: "https://discovery.etcd.io/638d980c4edf94d6ddff8d6e862bc7d9"
The next step is to configure etcd:

- The URLs to listen on for peer traffic (listen-peer-urls). We want the local interface on the default port (TCP/2380).
- The URLs to listen on for client traffic (listen-client-urls). We want all available interfaces on the default port (TCP/2379).
- The initial peer URLs to advertise to the cluster (initial-advertise-peer-urls). We want the local interface, using the same peer traffic port (TCP/2380).
- The client URLs to advertise (advertise-client-urls). We want the local interface, using the same client traffic port (TCP/2379).

To make it more dynamic, we can use variables compatible with most IaaS providers: $private_ipv4 and $public_ipv4.
This is how our cloud-config.yml file looks with all the etcd configuration:

#cloud-config
coreos:
  etcd2:
    discovery: "https://discovery.etcd.io/b8724b9a1456573f4d527452cba8ebdb"
    advertise-client-urls: "http://$private_ipv4:2379"
    listen-client-urls: "http://0.0.0.0:2379"
    initial-advertise-peer-urls: "http://$private_ipv4:2380"
    listen-peer-urls: "http://$private_ipv4:2380"
This will generate the right variables in the systemd unit file found at /run/systemd/system/etcd2.service.d/20-cloudinit.conf:

$ cat /run/systemd/system/etcd2.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://172.31.15.59:2379"
Environment="ETCD_DISCOVERY=https://discovery.etcd.io/b8724b9a1456573f4d527452cba8ebdb"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.31.15.59:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=http://172.31.15.59:2380"
When we have our cluster ready, we'll be able to request information as a client on the specified port:
$ etcdctl cluster-health
member 7466dcc2053a98a4 is healthy: got healthy result from http://172.31.15.59:2379
member 8f9bd8a78e0cca38 is healthy: got healthy result from http://172.31.8.96:2379
member e0f77aacba6888fc is healthy: got healthy result from http://172.31.1.27:2379
cluster is healthy
We can also navigate the etcd key value store to confirm we can access it:
$ etcdctl ls /coreos.com
Fleet is a distributed init manager based on systemd that we use to schedule services on our CoreOS cluster.
The most important configuration parameters are the following:

- public-ip: This specifies which IP address to use to communicate with other hosts. We want the public IP of the host so we can interact with fleet right from our workstation.
- metadata: This is any set of key value pairs relevant to our needs, so we can schedule units accordingly. We want to store the provider (aws), the region (eu-west-1), and the name of the cluster (mycluster). This is totally arbitrary; adapt keys and values to your own needs.

This is how it looks in the cloud-config.yml file:
coreos:
  fleet:
    public-ip: "$public_ipv4"
    metadata: "region=eu-west-1,provider=aws,cluster=mycluster"
This will generate the right variables in the systemd unit at /run/systemd/system/fleet.service.d/20-cloudinit.conf:

$ cat /run/systemd/system/fleet.service.d/20-cloudinit.conf
[Service]
Environment="FLEET_METADATA=region=eu-west-1,provider=aws,cluster=mycluster"
Environment="FLEET_PUBLIC_IP=52.209.159.4"
Using fleet is outside of the scope of this book, but we can at least verify the connection to the fleet cluster manager is working from the instance:
$ fleetctl list-machines
MACHINE     IP             METADATA
441bf02a... 52.31.10.18    cluster=mycluster,provider=aws,region=eu-west-1
b95a5262... 52.209.159.4   cluster=mycluster,provider=aws,region=eu-west-1
d9fa1d18... 52.31.109.156  cluster=mycluster,provider=aws,region=eu-west-1
We can now submit and start services on our working fleet cluster!
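For illustration, this is the kind of unit we could submit: a hypothetical hello.service whose [X-Fleet] section uses the metadata we defined above to restrict scheduling to machines of our cluster (the unit name and command are invented for this sketch):

```ini
[Unit]
Description=Hypothetical example service
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm alpine echo "hello from fleet"

[X-Fleet]
# Only schedule on machines advertising cluster=mycluster in their fleet metadata.
MachineMetadata=cluster=mycluster
```

We would submit and start it with fleetctl start hello.service, and fleet would place it only on machines whose metadata matches.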
CoreOS can handle updates in various ways: rebooting immediately after a new CoreOS version is made available, scheduling through etcd for an ideal time so the cluster never breaks, a mix of both (the default), or never rebooting at all. We can also explicitly specify which CoreOS channel to use (stable, beta, or alpha). We want to ensure the cluster never breaks, using the etcd-lock strategy, and make sure the stable release is used:
coreos:
  update:
    reboot-strategy: "etcd-lock"
    group: "stable"
This section generates the /etc/coreos/update.conf file:

$ cat /etc/coreos/update.conf
GROUP=stable
REBOOT_STRATEGY=etcd-lock
We can force an update check to verify it's working (sample taken from a system with an update available):
$ sudo update_engine_client -update
[0924/131749:INFO:update_engine_client.cc(243)] Initiating update check and install.
[0924/131750:INFO:update_engine_client.cc(248)] Waiting for update to complete.
CURRENT_OP=UPDATE_STATUS_UPDATE_AVAILABLE
[...]
Now that we're sure the update system is correctly triggered, we face a new problem: nodes from our cluster can reboot at any time when an update is available. That's probably less than desirable in a high-load environment, so we can configure locksmith to allow reboots only during a specific timeframe, such as "every Saturday morning, between 4 am and 6 am". We're not limited to a single day, so we could also allow reboots any day at 4 am:
coreos:
  locksmith:
    window-start: Sat 04:00
    window-length: 2h
This generates the following content in /run/systemd/system/locksmithd.service.d/20-cloudinit.conf:

$ cat /run/systemd/system/locksmithd.service.d/20-cloudinit.conf
[Service]
Environment="REBOOT_WINDOW_START=04:00"
Environment="REBOOT_WINDOW_LENGTH=2h"
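For the "any day at 4 am" variant mentioned earlier, locksmith accepts a window start without a day of the week, in which case the window applies daily. A sketch, keeping the same two-hour length:

```yaml
coreos:
  locksmith:
    # No day prefix: the reboot window opens every day at 04:00.
    window-start: "04:00"
    window-length: "2h"
```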
At any time, we can check for a reboot slot availability using the locksmithctl command:

$ locksmithctl status
Available: 1
Max: 1
If another machine is currently rebooting, its ID is displayed so we know who's rebooting.
We can manage units easily from cloud-init, so critical parts of the system are started right when we need them. For example, we know we want the etcd2 and fleet services to start at every boot:
coreos:
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
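Beyond starting services that already exist on the system, a units entry can embed a full unit file through the content key. This hypothetical oneshot unit (name and command invented for the sketch) would run once at boot:

```yaml
coreos:
  units:
    - name: bootstrap-marker.service
      command: start
      # The unit file itself is written by cloud-init before being started.
      content: |
        [Unit]
        Description=Hypothetical oneshot marker unit

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/touch /var/tmp/bootstrapped
```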
Flannel is used to create an overlay network across all hosts in the cluster, so containers can talk to each other over the network, whatever node they run on. To configure flannel before starting it, we can add more configuration to the cloud-config file. We want our flannel network to use the 10.1.0.0/16 range, so we create a drop-in systemd configuration file whose content is executed before the flanneld service starts. In this case, setting the flannel network is done by writing the key/value combination to etcd under /coreos.com/network/config:
coreos:
  units:
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
This will simply create the file /etc/systemd/system/flanneld.service.d/50-network-config.conf:

$ cat /etc/systemd/system/flanneld.service.d/50-network-config.conf
[Service]
ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
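The JSON document stored under /coreos.com/network/config can carry more than the Network key; flannel also understands options such as SubnetLen (the size of the per-host subnet) and a Backend selection. A sketch with illustrative values:

```json
{
  "Network": "10.1.0.0/16",
  "SubnetLen": 24,
  "Backend": { "Type": "vxlan" }
}
```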
Verify we have a correct flannel0 interface in the correct IP network range:

$ ifconfig flannel0
flannel0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 8973
        inet 10.1.19.0  netmask 255.255.0.0  destination 10.1.19.0
[...]
Launch a container to verify it's also running in the 10.1.0.0/16 network:
$ docker run -it --rm alpine ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:0A:01:13:02
          inet addr:10.1.19.2  Bcast:0.0.0.0  Mask:255.255.255.0
[...]
It's all working great!
We now know the most useful configuration options for automatically bootstrapping a CoreOS cluster using cloud-init.