Configuring CoreOS using cloud-init

CoreOS ships its own version of cloud-init, extended with directives specific to the CoreOS environment and stripped of everything incompatible with it, so we can boot a fully configured system and cluster.

We'll focus on what is specific to CoreOS; refer to the earlier tips for managing users, files, authorized SSH keys, and other standard cloud-init directives. By the end of this part, you'll know how to configure the etcd key value store, the fleet cluster manager, and the flannel overlay network, control the update mechanism, and ensure systemd units are started as early as possible.

Note

CoreOS provides a handy cloud-config file validator at https://coreos.com/validate/. It's very useful when we're not sure whether a directive is supported by the distribution.
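If we already have shell access to a CoreOS machine, the bundled coreos-cloudinit binary can also check a file locally; a minimal sketch, assuming the installed version supports the -validate and -from-file flags:

$ coreos-cloudinit -validate -from-file=cloud-config.yml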

Getting ready

To step through this recipe, you will need:

  • Access to a cloud-config enabled infrastructure

How to do it…

We'll go through the most important configuration options that can be set for CoreOS. This includes the etcd distributed key value store, the fleet scheduler, the flannel overlay network, the update strategy, and some systemd unit configuration.

Configuring etcd using cloud-init

The etcd key value store is used in CoreOS to share configuration data between the members of a cluster. To begin with, we need a discovery token, which can be obtained from https://discovery.etcd.io/new:

$ curl -w "\n" 'https://discovery.etcd.io/new'
https://discovery.etcd.io/638d980c4edf94d6ddff8d6e862bc7d9

Note

We can specify the minimum required size of the CoreOS cluster by adding the size= argument to the URL, for example https://discovery.etcd.io/new?size=3.
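For example, to request a discovery token for a cluster of at least three members:

$ curl -w "\n" 'https://discovery.etcd.io/new?size=3'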

Now that we have a valid discovery token, let's add it to our cloud-config.yml file under the etcd2 directive:

#cloud-config
coreos:
  etcd2:
    discovery: "https://discovery.etcd.io/638d980c4edf94d6ddff8d6e862bc7d9"

The next step is to configure etcd:

  • How should etcd listen for peer traffic? (listen-peer-urls). We want the local interface on the default port (TCP/2380).
  • How should etcd listen for client traffic? (listen-client-urls). We want all available interfaces on the default port (TCP/2379).
  • How should etcd initially advertise to the rest of the cluster? (initial-advertise-peer-urls). We want the local interface, using the same peer traffic port (TCP/2380).
  • How should etcd advertise the client URLs to the rest of the cluster? (advertise-client-urls). We want the local interface, using the same client traffic port (TCP/2379).

To keep the configuration dynamic, we can use the substitution variables supported by most IaaS providers: $private_ipv4 and $public_ipv4.

This is how our cloud-config.yml file looks with all the etcd configuration:

#cloud-config
coreos:
  etcd2:
    discovery: "https://discovery.etcd.io/b8724b9a1456573f4d527452cba8ebdb"
    advertise-client-urls: "http://$private_ipv4:2379"
    listen-client-urls: "http://0.0.0.0:2379"
    initial-advertise-peer-urls: "http://$private_ipv4:2380"
    listen-peer-urls: "http://$private_ipv4:2380"

This will generate the right variables in the systemd unit file found at /run/systemd/system/etcd2.service.d/20-cloudinit.conf.

$ cat /run/systemd/system/etcd2.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://172.31.15.59:2379"
Environment="ETCD_DISCOVERY=https://discovery.etcd.io/b8724b9a1456573f4d527452cba8ebdb"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.31.15.59:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=http://172.31.15.59:2380"

When we have our cluster ready, we'll be able to request information as a client on the specified port:

$ etcdctl cluster-health
member 7466dcc2053a98a4 is healthy: got healthy result from http://172.31.15.59:2379
member 8f9bd8a78e0cca38 is healthy: got healthy result from http://172.31.8.96:2379
member e0f77aacba6888fc is healthy: got healthy result from http://172.31.1.27:2379
cluster is healthy

We can also navigate the etcd key value store to confirm we can access it:

$ etcdctl ls
/coreos.com
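As a quick smoke test, we can also write and read back an arbitrary key (the /test key is just an example):

$ etcdctl set /test "hello"
hello
$ etcdctl get /test
hello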

Configuring fleet using cloud-init

Fleet is a distributed init manager based on systemd that we use to schedule services on our CoreOS cluster.

The most important configuration parameters are the following:

  • public-ip: This specifies the IP address to advertise for communicating with other hosts. We want the public IP of the host so we can interact with fleet right from our workstation.
  • metadata: This is any set of key/value pairs relevant to our needs, so we can schedule units accordingly (see the unit sketch at the end of this subsection). We want to store the provider (aws), the region (eu-west-1), and the name of the cluster (mycluster). This is totally arbitrary; adapt keys and values to your own needs.

This is how it looks in the cloud-config.yml file:

coreos:
  fleet:
    public-ip: "$public_ipv4"
    metadata: "region=eu-west-1,provider=aws,cluster=mycluster"

This will generate the right variables in the systemd unit at /run/systemd/system/fleet.service.d/20-cloudinit.conf:

$ cat /run/systemd/system/fleet.service.d/20-cloudinit.conf
[Service]
Environment="FLEET_METADATA=region=eu-west-1,provider=aws,cluster=mycluster"
Environment="FLEET_PUBLIC_IP=52.209.159.4"

Using fleet in depth is outside the scope of this book, but we can at least verify from the instance that the connection to the fleet cluster manager is working:

$ fleetctl list-machines
MACHINE         IP              METADATA
441bf02a...     52.31.10.18     cluster=mycluster,provider=aws,region=eu-west-1
b95a5262...     52.209.159.4    cluster=mycluster,provider=aws,region=eu-west-1
d9fa1d18...     52.31.109.156   cluster=mycluster,provider=aws,region=eu-west-1

We can now submit and start services on our working fleet cluster!
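To give an idea of what the metadata enables, here is a minimal, hypothetical hello.service unit pinned to our cluster through an [X-Fleet] section; the MachineMetadata requirement matches the values we set earlier:

[Unit]
Description=Hello World
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name hello alpine /bin/sh -c "while true; do echo hello; sleep 60; done"

[X-Fleet]
MachineMetadata=cluster=mycluster

It would then be scheduled on a matching machine with fleetctl start hello.service.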

Configuring the update strategy using cloud-init

CoreOS can handle updates in various ways: rebooting immediately after a new CoreOS version becomes available, coordinating through a lock in etcd so that nodes reboot one at a time and the cluster never breaks, a mix of both (the default), or never rebooting at all. We can also explicitly specify which CoreOS channel to use (stable, beta, or alpha). We want to ensure the cluster never breaks, using the etcd-lock strategy, and be sure the stable release channel is used:

coreos:
  update:
    reboot-strategy: "etcd-lock"
    group: "stable"

This section generates the /etc/coreos/update.conf file:

$ cat /etc/coreos/update.conf
GROUP=stable
REBOOT_STRATEGY=etcd-lock
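For reference, the other reboot strategies supported by CoreOS are reboot, best-effort (the default), and off. As a minimal sketch, a node we never want to reboot automatically would be configured like this:

coreos:
  update:
    reboot-strategy: "off"
    group: "stable"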

We can force an update check to verify it's working (sample taken from a system with an update available):

$ sudo update_engine_client -update
[0924/131749:INFO:update_engine_client.cc(243)] Initiating update check and install.
[0924/131750:INFO:update_engine_client.cc(248)] Waiting for update to complete.
CURRENT_OP=UPDATE_STATUS_UPDATE_AVAILABLE
[...]

Configuring locksmith using cloud-init

Now that we're sure the update system is correctly triggered, we face a new problem: nodes in our cluster can reboot at any time when an update is available, which is probably undesirable in a high-load environment. So we can configure locksmith to allow reboots only during a specific time frame, such as the night from Friday to Saturday, between 4 a.m. and 6 a.m. We're not limited to a single day; we could also allow reboots any day at 4 a.m. (see the sketch after the generated configuration below):

coreos:
  locksmith:
    window-start: Sat 04:00
    window-length: 2h 

This generates the following content in /run/systemd/system/locksmithd.service.d/20-cloudinit.conf:

$ cat /run/systemd/system/locksmithd.service.d/20-cloudinit.conf
[Service]
Environment="REBOOT_WINDOW_START=04:00"
Environment="REBOOT_WINDOW_LENGTH=2h"

At any time, we can check whether a reboot slot is available using the locksmithctl command:

$ locksmithctl status
Available: 1
Max: 1

If another machine is currently holding the reboot lock, its ID is displayed so we know which machine is rebooting.

Configuring systemd units using cloud-init

We can manage units easily from cloud-init, so critical parts of the system are started right when we need them. For example, we know we want the etcd2 and fleet services to start at every boot:

coreos:
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
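The units directive can also ship complete unit files, not just start existing ones. As a minimal sketch (the settimezone.service name and its content are purely illustrative), a one-shot unit started at boot could look like this:

coreos:
  units:
    - name: settimezone.service
      command: start
      content: |
        [Unit]
        Description=Set the time zone

        [Service]
        ExecStart=/usr/bin/timedatectl set-timezone UTC
        RemainAfterExit=yes
        Type=oneshot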

Configuring flannel using cloud-init

Flannel is used to create an overlay network across all the hosts in the cluster, so containers can talk to each other over the network, whichever node they run on. To configure flannel before it starts, we can add more configuration to the cloud-config file. We want our flannel network to use the 10.1.0.0/16 range, so we create a drop-in systemd configuration file whose content is executed before the flanneld service starts. In this case, setting the flannel network is done by writing the key/value configuration to etcd under /coreos.com/network/config:

coreos:
  units:
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'

This will simply create the file /etc/systemd/system/flanneld.service.d/50-network-config.conf:

$ cat /etc/systemd/system/flanneld.service.d/50-network-config.conf
[Service]
ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
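Once flanneld has started, we can confirm that the ExecStartPre command wrote the network configuration to etcd:

$ etcdctl get /coreos.com/network/config
{ "Network": "10.1.0.0/16" }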

Verify that the flannel0 interface exists and is in the expected IP network range:

$ ifconfig flannel0
flannel0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 8973
        inet 10.1.19.0  netmask 255.255.0.0  destination 10.1.19.0
[...]

Launch a container to verify it's also running in the 10.1.0.0/16 network:

$ docker run -it --rm alpine ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:0A:01:13:02
          inet addr:10.1.19.2  Bcast:0.0.0.0  Mask:255.255.255.0
[...]

It's all working great!

Note

Note that it may take a while to get the interface up, depending on the host Internet connection speed, as flannel is running from a container that needs to be downloaded first (51 MB to date).

We now know the most useful configuration options to automatically bootstrap a CoreOS cluster using cloud-init.
