This chapter covers
- Setting up a local CoreOS cluster with Vagrant and VirtualBox
- Interacting with your cluster using fleetctl and etcdctl
- Debugging from within the cluster with the Toolbox
- How CoreOS administration differs from traditional Linux distributions
Much like setting up a development environment for writing software, it’s common practice to run a CoreOS cluster on your local machine. You’ll be able to use this environment to try out various configuration settings, clustering options, and, of course, your unit files before starting them in a real compute cluster. This gives you the ability to work on CoreOS without many dependencies, as well as the ability to completely blow up your systems without impacting anyone else.
You’ll use this virtualized local cluster on your machine as a workspace throughout the book and build all the example application stacks using it until the discussion gets to production deployments of CoreOS. This will let you dive into CoreOS in a well-supported way without having to deal with any of the details of normal infrastructure.
We’ll begin this chapter by looking at how to set up Vagrant, a common virtualization tool, and deploy a CoreOS cluster to it. We’ll then explore some of the basic tooling to interact with this workspace. Finally, we’ll go through the chapter 1 example of deploying a simple NGINX service to your new cluster and see how to interact with it in the context of CoreOS. By the end of this chapter, you should be set up with a three-node cluster and have a basic understanding of how to administer CoreOS, which will be essential once we dive into more-complex examples later in the book.
Vagrant (www.vagrantup.com) is an open source tool from HashiCorp for setting up and managing virtual machines for development. It's great for consistent development-environment bootstrapping, acting as a configuration wrapper for the VM hypervisor of your choice. It officially supports VMware and VirtualBox; we'll use VirtualBox (www.virtualbox.org) for all the examples in this book, because it's also open source and freely available.
This chapter is the only place I’ll provide instructions for Windows, OS X, and Linux. After this, for the sake of simplicity, I’ll assume you have a UNIX-like OS on your workstation. There are a few more hoops you have to jump through on Windows that I’ll address later in this chapter.
It’s not absolutely required that you run your development environment on your workstation. Some people prefer to keep a development environment in the cloud for mobility reasons, or share a development environment among coworkers, or have a development environment that more closely resembles production. Although those approaches aren’t as easy or convenient as using a local cluster, CoreOS offers guides and resources for setting things up on public cloud providers with the least friction possible.
The list of officially supported platforms and how to get started with them is available at https://coreos.com/os/docs/latest/#running-coreos. Keep in mind that we’ll explore a complete AWS production deployment later in this book.
Command-line examples throughout this chapter have two possible locations from which they’re run. If the example command starts with host$, it’s a command you’re running from your workstation; if it starts with core@core-01 ~ $ (where 01 can be any number), it’s meant to be run from the CoreOS machine. In section 2.2, you’ll see how to use fleetctl with an SSH tunnel; command-line examples later in the book that begin with $ assume you’re using this tunnel, in which case it doesn’t matter whether you’re running the command from your host or on a CoreOS node.
Ideally, you're running Windows, Linux, or OS X on 64-bit x86. It's probably not impossible to do this on ARM or in 32-bit, but CoreOS supports only 64-bit x86, and I don't want to cover the performance and usability impact of using an alternative architecture on the hypervisor host machine. This book's examples will also be a lot easier to work through if you're on anything but Windows, because you can run some of the tools from your local workstation. I haven't tried this with the new Ubuntu for Windows 10 runtime; it may offer an easier environment for Windows users.
You’ll also want at least 3 GB of memory available to run your VMs (1 GB for each VM). You can get by with less, but this will be my assumption for the examples. You can either tune the VMs to use less, or accept the performance impact of over-allocating VM memory (meaning your host will start swapping). I also recommend having a four-core CPU, but that’s a little less important for this setup. You’ll allocate one CPU per VM, but over-allocating here shouldn’t have a huge impact. The biggest performance bottleneck will, of course, be I/O; if you can use a solid-state drive for this, it will greatly enhance your experience.
Your first step in getting up and running is to install VirtualBox. You can get the appropriate 64-bit version of VirtualBox from www.virtualbox.org; you may also choose to install the Oracle VM VirtualBox Extension Pack if you meet the requirements of its license, but it isn’t required. Alternatively, you can install VirtualBox from whatever package manager you use (APT, Homebrew, and so on). The installation should be straightforward on any OS. You may need to reboot.
Next, you need to install Vagrant. The same procedure applies: grab the installer (64-bit) from www.vagrantup.com, or install it with your OS’s package manager. At the time of writing, the latest versions of the VirtualBox and Vagrant packages (VirtualBox 5.0 and Vagrant 1.8) are well beyond the minimum required versions for CoreOS.
You also need Git installed to clone the coreos/coreos-vagrant repository. This should be available (or already installed, in some cases) through your OS's package manager. For Windows, the easiest option—if you're not already conversant with Git and you use some other client—is to install GitHub's desktop client from https://desktop.github.com. You can use it on OS X as well, although OS X ships with command-line Git. You don't need much Git experience; a single command gets you up and running.
You’ll also want to grab the code repository for this book. Although most of the code listings (as with most technical books) are best committed to memory by typing them out rather than copying and pasting, there are some very long listings in later chapters that you should use from the repo. It’s available at www.manning.com/books/coreos-in-action.
Now that you've got everything installed, let's look at how things will fit together once you're done with this section (see figure 2.1). Your development cluster will consist of three CoreOS machines (core-01, core-02, and core-03) running within VirtualBox.
Follow these steps on your workstation:
host$ git clone https://github.com/coreos/coreos-vagrant.git
Now that everything is downloaded, we can look at how to configure Vagrant for your CoreOS development environment:
# Size of the CoreOS cluster created by Vagrant
$num_instances=1

To tell Vagrant (via the Vagrantfile configuration file) to start up three CoreOS instances, change the variable to read as follows:
# Size of the CoreOS cluster created by Vagrant
$num_instances=3
All the examples will show the benefits of CoreOS in a cluster configuration, and three machines is the minimum for etcd clustering. If you’re resource-constrained on your desktop, you can choose to do only one instance, but understand that you probably won’t get a good sense of how CoreOS manages things at scale.
A single instance is fine for development once you’re comfortable with the platform, but I highly recommend a cluster configuration to learn all of CoreOS’s features.
# Customize VMs
#$vm_gui = false
#$vm_memory = 1024
#$vm_cpus = 1
You can also share some filesystems across the VM from your host machine. I won’t go into this, but it may be useful for Windows users who aren’t comfortable using command-line editors to build unit files.
Next, you open a shell session to interact with Vagrant. If you’re using the GitHub Desktop client, right-click the coreos-vagrant repository and click Open in Git Shell (see figure 2.6) so you can interact inside the Git repository. Doing so opens the shell shown in figure 2.7.
At this point, I’m finished with screenshots until I start talking about Amazon Web Services in chapter 8. All the commands are the same across all platforms—Vagrant is great for standardizing these kinds of development environments.
You’re now ready to start up your cluster. If you had to opt for a single-instance deployment, note that the output will look slightly different, but the commands are the same.
Let’s start Vagrant! With the coreos-vagrant repository as your current working directory in your shell, issue this command:
host$ vagrant up
You’ll see a bunch of things happen, which will look something like this:
Bringing machine 'core-01' up with 'virtualbox' provider...
Bringing machine 'core-02' up with 'virtualbox' provider...
Bringing machine 'core-03' up with 'virtualbox' provider...
==> core-01: Importing base box 'coreos-alpha'...
...etc
Once the operation has completed, you can verify that everything is up and running properly by logging in to one of the machines and using fleetctl to check the cluster:
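A session might look something like the following sketch; the machine IDs here are made up, but the IPs match the defaults the coreos-vagrant repository assigns:

```
host$ vagrant ssh core-01
core@core-01 ~ $ fleetctl list-machines
MACHINE         IP              METADATA
45b08438...     172.17.8.102    -
7d3436a5...     172.17.8.103    -
cac39fc1...     172.17.8.101    -
```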
If you see three machines, you're finished! You now have a local cluster of CoreOS machines. If something didn't work right or was interrupted unexpectedly, you can always run vagrant destroy and start over.
It’s important to remember that you must remain in the directory where your Vagrantfile is, to interact with your Vagrant machines. Once you change directories in your shell, things like vagrant ssh won’t work.
Your Vagrant cluster of CoreOS machines is up and running, and it’s time to learn about the tooling that’s essential to interact with CoreOS. CoreOS uses the Bash shell, and I’ll assume you have some familiarity with it as well as SSH.
This section covers the essential tools to use CoreOS: fleetctl and etcdctl. We’ll also visit the Toolbox, which is useful for debugging anything you might run into in a more familiar Linux administration environment; and we’ll go over how CoreOS may appear different than what you’re used to if you’re an experienced Linux admin.
You should understand that Vim is the only installed editor on CoreOS. Ultimately, your workflow won’t involve editing files directly on CoreOS, but for the sake of learning how things work on CoreOS, you’ll need some way to get systemd unit files on your cluster.
If you absolutely don't want to use Vim, here are a few options: install another editor (such as Emacs) in the Toolbox and edit files through the /media/root mount, set up a Vagrant shared folder so you can edit files with whatever you like on your host, or write unit files on your workstation and push them to the cluster with a remote fleetctl.
While you’re learning about the basics of CoreOS, this book will assume you’re editing some files directly on the box (with Vim), because that’s the most universal option. Obviously, you’re going to want to set up a more formal workflow for using CoreOS in production and across a team; we’ll go into that later in the book.
fleetctl and etcdctl will be your most commonly used tools in CoreOS. They aren't especially complicated to use, but you'll want to be well acquainted with how they function to do anything in your CoreOS cluster. A bit of a refresher: fleet is CoreOS's distributed scheduler; it decides when, where, and how your containers run within your cluster. It acts as an orchestrator for systemd and represents service state within your cluster. Together, for example, fleet and systemd decide how many and which machines run an NGINX service. etcd is CoreOS's distributed configuration store; it gives you a consistent place to manage and inspect the configuration state of your cluster. These two systems make CoreOS work, and they're the foundation on which you can take advantage of what CoreOS offers.
Before getting started with the tools, note that the handiest way to use fleet and etcd is from your host machine, rather than sshing to a CoreOS node every time. This works on any OS except Windows (although I haven't tried it on the new Ubuntu runtime for Windows 10). You can install these tools with your package manager of choice, but I recommend Homebrew for OS X or Linuxbrew for Linux, specifically so you're sure to have the latest versions; some package managers don't keep up with these tools' release cycles. To be clear: you're installing this software so you can run the fleetctl and etcdctl clients from your workstation, not so you can run the fleetd and etcd daemons there.
As the client application for fleet, fleetctl gives you control over your cluster's services' states and manages the distribution of your systemd unit files. As mentioned earlier, you'll mostly use fleetctl on a CoreOS machine, but you can also use it remotely with SSH tunneling, which requires a little SSH preconfiguration.
You can choose one of two options to use remote fleetctl with your Vagrant cluster. The best option is if you’re already running ssh-agent:
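With ssh-agent running, add Vagrant's key to the agent so fleetctl can authenticate to your nodes. The path shown here is Vagrant's default insecure key; if your Vagrant version generates per-machine keys instead, add those:

```
host$ ssh-add ~/.vagrant.d/insecure_private_key
```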
Additionally, if you’re using ssh-agent, make sure you’re forwarding your agent socket to remote hosts. In your ~/.ssh/config file, it should look something like this:
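A minimal example follows; you can scope the Host pattern more narrowly than * if you don't want agent forwarding everywhere:

```
Host *
    ForwardAgent yes
```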
This ensures that your agent will be available within a CoreOS machine, once you’ve sshed to it, so it can use the same agent to talk to another CoreOS machine. If you aren’t using ssh-agent, you can add Vagrant’s SSH config to your local SSH config:
host$ vagrant ssh-config core-01 >> ~/.ssh/config
You also need to discover which port Vagrant has assigned to SSH on your host (it almost always starts with 2222):
host$ vagrant port core-01
The forwarded ports for the machine are listed below. Please note that
these values may differ from values configured in the Vagrantfile if the
provider supports automatic port collision detection and resolution.

    22 (guest) => 2222 (host)
You should now be able to ssh manually into your CoreOS node:
host$ ssh -p2222 core@127.0.0.1
CoreOS alpha (928.0.0)
core@core-01 ~ $
You should also be able to use fleetctl with a tunnel:
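For example, assuming Vagrant mapped SSH to port 2222 as shown earlier (the output is abbreviated; your machine list will differ):

```
host$ fleetctl --tunnel 127.0.0.1:2222 list-machines
MACHINE         IP              METADATA
...
```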
You can also export an environment variable for the tunnel, if you want to type less:
host$ export FLEETCTL_TUNNEL=127.0.0.1:2222
You’ve already used list-machines in a few examples to verify that the cluster is operating normally. You’ll see in the output of list-machines a unique hash representing a particular node in the cluster; if you want to see the full ID, you can append --full to list-machines. You can also do machine-specific operations on the short hash, such as fleetctl ssh cac39fc1, which will ssh you into that particular machine.
Let’s look at how fleetctl interacts with unit files. We’ll start with the simple example we started in chapter 1: an NGINX server. The following listing changes the example a bit to have one instance.
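As a reference, a minimal single-instance unit along those lines looks something like the following sketch. The container name and image here are illustrative; save the file as code/ch2/nginx.service to match the commands that follow:

```
[Unit]
Description=nginx

[Service]
ExecStartPre=-/usr/bin/docker rm -f nginx
ExecStart=/usr/bin/docker run --rm --name nginx -p 80:80 nginx
ExecStop=/usr/bin/docker stop nginx
```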
Once you’ve saved that, you have a few options. fleetctl has some commands that are effectively aliases for a few related commands.
To start a service in some way, you can use the following:
Most of the time, you’ll want start. But load can be useful if you want to see where units will start without actually starting them; and submit can be handy if you just want to update your unit file and then restart the service at a later time.
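In other words, the three commands form a hierarchy, each one implying the steps before it:

```
core@core-01 ~ $ fleetctl submit nginx.service   # register the unit file with the cluster
core@core-01 ~ $ fleetctl load nginx.service     # submit, then schedule it onto a machine
core@core-01 ~ $ fleetctl start nginx.service    # submit, load, then start the service
```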
fleetctl maintains its own SSH known_hosts file in $HOME/.fleetctl/known_hosts. If you've ever destroyed and re-created your Vagrant cluster, new hosts may now be running on the same IPs, which will trigger a known-hosts error; clear that file to resolve it.
For simplicity, you can start your service with start, although you’re welcome to use the other two commands:
core@core-01 ~ $ fleetctl start code/ch2/nginx.service
Unit nginx.service inactive
Unit nginx.service launched on 45b08438.../172.17.8.102
Next, let’s look at how to inspect some things about the current state. The first thing you can check is the status of all the units in the cluster:
core@core-01 ~ $ fleetctl list-units
UNIT            MACHINE                     ACTIVE   SUB
nginx.service   45b08438.../172.17.8.102    active   running
This shows you that NGINX has successfully started on machine 45b08438. You can inspect the status of the service as well:
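fleetctl status effectively proxies a systemctl status call for the unit on whichever machine is running it. Abbreviated, and with hashes, dates, and PIDs that will differ on your cluster, the output looks something like this:

```
core@core-01 ~ $ fleetctl status nginx
● nginx.service
   Loaded: loaded (/run/fleet/units/nginx.service; linked-runtime)
   Active: active (running) since Mon 2016-01-18 04:51:20 UTC
 Main PID: 1482 (docker)
   ...
```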
Although it's great that fleetctl status shows you a lot of information, manipulating files in /run/fleet/ and in /sys/fs/cgroup/ is well outside the scope of this book, and also outside the scope of administering CoreOS in general. If you find yourself needing to do things with these files for any reason other than your own edification and exploration, you're probably going down a road that's difficult to maintain.
Let’s look at how you can use this information. First, let’s get into core-02, where the service is running. fleetctl ssh has a handy feature that lets you ssh into a host by passing the service name, so you don’t have to think too much about your cluster’s IPs or machine IDs:
core@core-01 ~ $ fleetctl ssh nginx
Last login: Mon Jan 18 04:58:52 2016 from 172.17.8.101
CoreOS alpha (928.0.0)
core@core-02 ~ $
Now, you can curl localhost to see your NGINX server:
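Something like the following should return the default NGINX welcome page (trimmed here to the first few lines):

```
core@core-02 ~ $ curl -s localhost | head -n 4
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
```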
If the fleetctl status nginx command fails with something about SSH_AUTH_SOCK, you probably didn’t add ForwardAgent yes to your SSH config.
Another great informational feature is access to the journal. As you may know, systemd uses journaled logging, which has the benefit of not filling up your filesystem with logs. I’m sure, as a professional, you’ve never had a server go down from having a filesystem full of logs (haha!). fleet has full access to this journal from any node, as well as the ability to follow the log as you would have done in the past with tail -f:
core@core-01 ~ $ fleetctl journal -f nginx
-- Logs begin at Sun 2016-01-17 20:48:02 UTC. --
Jan 18 04:51:20 core-02 docker[1482]: 38267e0e16c7: Pull complete
Jan 18 04:51:20 core-02 docker[1482]: 407195ab8b07: Pull complete
... etc
Now, you can remove your service. Much like starting it, there's a matching set of encompassing commands: stop, unload, and destroy. stop halts the service but leaves it loaded on its machine; unload stops the service and unschedules it, but leaves the unit file submitted; and destroy does all of that and removes the unit file from the cluster as well. Let's look at these in sequence to better understand the states.
Here, the NGINX service is loaded but not running:
core@core-01 ~ $ fleetctl stop nginx
Unit nginx.service loaded on 45b08438.../172.17.8.102
core@core-01 ~ $ fleetctl list-units
UNIT            MACHINE                     ACTIVE   SUB
nginx.service   45b08438.../172.17.8.102    failed   failed
Next, the NGINX service is removed from fleet’s registry, but the unit file is still available:
core@core-01 ~ $ fleetctl unload nginx
Unit nginx.service inactive
core@core-01 ~ $ fleetctl list-units
UNIT   MACHINE   ACTIVE   SUB
core@core-01 ~ $ fleetctl list-unit-files
UNIT            HASH      DSTATE     STATE      TARGET
nginx.service   fbf621b   inactive   inactive   -
And finally, the NGINX service is completely destroyed:
core@core-01 ~ $ fleetctl destroy nginx
Destroyed nginx.service
core@core-01 ~ $ fleetctl list-unit-files
UNIT   HASH   DSTATE   STATE   TARGET
core@core-01 ~ $
You should now be pretty comfortable with how fleetctl functions and have an understanding of how to access the information you need to use and administer services in CoreOS. To recap, you've done the following:
- Started a service with fleetctl start
- Inspected cluster and service state with list-machines, list-units, list-unit-files, and status
- Used fleetctl ssh to reach the machine running a service
- Followed a service's logs with fleetctl journal
- Walked a service through stop, unload, and destroy
Next, we can move on to the other crucial bit of cluster state: etcd!
etcdctl is the user-space tool for manipulating etcd, which, as the name implies, is a daemon that stores cluster-wide configuration. Everything you can do with etcdctl, you can also do with curl; etcdctl just provides a friendly wrapper around accessing and changing information.
The etcd cluster is available to any machine in the CoreOS cluster. You can make it available within a running container, but you should understand the security implications of doing so. The latest version of etcd has basic role-based access control (RBAC) to grant and restrict certain subcommands; we’ll get deeper into configuring etcd later in the book. For now, we’ll look at the basics of using etcdctl for service registration and discovery, which are the most common usage scenarios.
You can begin by exploring your etcd directory recursively:
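For example (the keys shown here are ones CoreOS itself maintains; your tree will differ, and the output is trimmed):

```
core@core-01 ~ $ etcdctl ls --recursive /
/coreos.com
/coreos.com/updateengine
/coreos.com/updateengine/rebootlock
/coreos.com/updateengine/rebootlock/semaphore
...
```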
You can get any of these endpoints, and they will return some JSON:
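For example, here's one of those keys fetched first with etcdctl and then with the roughly equivalent curl call against etcd's HTTP API. I'm assuming the client API is listening on the default port 2379; the exact JSON values will vary on your cluster:

```
core@core-01 ~ $ etcdctl get /coreos.com/updateengine/rebootlock/semaphore
{"version":1,"max":1,"holders":[]}
core@core-01 ~ $ curl -s http://127.0.0.1:2379/v2/keys/coreos.com/updateengine/rebootlock/semaphore
{"action":"get","node":{"key":"/coreos.com/updateengine/rebootlock/semaphore","value":"...","modifiedIndex":4,"createdIndex":4}}
```

Note that curl returns etcd's full node envelope, whereas etcdctl prints just the value.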
You may already see how some of this information can be useful for things like load balancers and networking configuration outside of the cluster.
Just as easily as getting information with etcdctl, you can set information, as well:
core@core-01 ~ $ etcdctl set /foo/bar '{ "baz": "quux" }'
{ "baz": "quux" }
You can also set a time-to-live (TTL) for any value:
core@core-01 ~ $ etcdctl set --ttl 3 /foo/bar '{ "baz": "quux" }';
> sleep 1;
> etcdctl get /foo/bar;
> sleep 3;
> etcdctl get /foo/bar
{ "baz": "quux" }
{ "baz": "quux" }
Error: 100: Key not found (/foo/bar) [24861]
You’ll remember from chapter 1 that the sidekick examples used a TTL of 60 seconds so that you could retain the value slightly longer than the loop sleep time to set it again. Tuning this value is important for configuring when things like load-balancer health checks run, or how long you want some kinds of failure to remain in a particular state.
etcdctl watch and exec-watch can also be used in creative ways to monitor and set configurations for live services. We'll go into more detail on how to use these features later in the book. We'll also go deeper into configuring etcd later; for now, knowing these basic commands is enough to get started. As you can see, etcd has a simple interface to a distributed configuration with a lot of potential. By default, any query run against the cluster will ensure that the data is in sync before it returns, so it guarantees consistency and accuracy above all else.
etcdctl and fleetctl are the tools specific to CoreOS that you’ll use all the time. But as I’m sure you know, a whole world of Linux tools and commands are available to do various things in an operating system. This is where the Toolbox comes into play.
CoreOS has a strict philosophy of being a very static system. There’s no package manager installed, and you should never rely on the local filesystem to maintain anything; etcd and fleet are the only places you store any kind of state. But sometimes you need to debug something from within the cluster—say, you need to run nmap to try to figure out why you can’t reach another host on your network from CoreOS.
This is where the Toolbox comes in. Essentially, the Toolbox is a basic Fedora Linux Docker container where you can install and use all the tools you’re used to for administration. You install and use the Toolbox as follows:
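For example, to pull in nmap for the kind of network debugging mentioned earlier (dnf is Fedora's package manager, available inside the container; the target IP here is just one of the cluster nodes from the earlier examples):

```
core@core-01 ~ $ toolbox
[root@core-01 ~]# dnf -y install nmap
[root@core-01 ~]# nmap -p 80 172.17.8.102
```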
Further, your entire filesystem is mounted within the Toolbox container. So, if you want to install and use Emacs to edit files in the core home directory, you can find it mounted in /media/root:
core@core-01 ~ $ toolbox
Spawning container core-fedora-latest on /var/lib/toolbox/core-fedora-latest.
Press ^] three times within 1s to kill container.
[root@core-01 ~]# touch /media/root/home/core/fromtoolbox
[root@core-01 ~]# logout
Container core-fedora-latest exited successfully.
core@core-01 ~ $ ls
fromtoolbox
Remember, though, that although your Toolbox will persist for the life of the machine, an update will clobber anything you save there. It’s meant only for debugging. Resist the temptation to use the Toolbox to serve anything or perform tasks that require its persistence.
Don’t forget that the Toolbox image will take up about 200 MB of disk, which is a lot considering how small CoreOS is to begin with. You can always use docker rmi fedora to clean it up completely.
Remember, though, that the goal with CoreOS is that you only ever need to ssh into a machine for development or for serious debugging needs. If you find yourself using the Toolbox frequently or for some repeated tasks, you may want to consider how you can automate your task with etcd and fleet.
Some conceptual changes Linux admins face are probably obvious, given the circumstances in which you’d use the Toolbox (for example, just as a utility, not a workstation environment). There’s no package manager in CoreOS by design, and poking around in the OS from a terminal session on the host isn’t something you should do or have to do on a regular basis. You should consider all the data on any particular filesystem of any given machine to be ephemeral and unimportant on its own. If you’re already used to working with public cloud systems, this shouldn’t be too much of a hurdle.
Dealing with that ephemeral state can be a little daunting, and I’m sure your first thought is, “Then how do I do databases?” The answer is a bit complex and depends on the technologies you’re using. Some data systems handle this architecture within their own design (Elasticsearch, Riak, Mongo, and so on), and others will probably need some help (such as PostgreSQL). As a general rule, software that’s designed to scale horizontally will be easier to implement here than software that isn’t. For the latter, there are a few solutions that we’ll get into later in the book.
Because you almost never ssh into a machine to do anything for administration, you'll also find that you won't need to be too concerned with managing users and permissions in CoreOS. If you find you really, really need to do that kind of thing, it's possible, but expect your cloud-config to become more complex.
You’ll also notice the lack of configuration management in general. I touched on this in chapter 1, but the initial state is always defined by cloud-config. Beyond this initial state, there isn’t much to do unless you’re debugging or testing things in your local test cluster, and therefore there’s no need for traditional configuration-management suites (Puppet, Chef, and so on). It’s entirely possible for you to set up cloud-config to bootstrap Chef, but the point of CoreOS isn’t to alter the state of a machine once it’s up, and doing so would serve little purpose.
Another aspect of normal system administration you may be wondering about is updates. Configuration management, or some system you've built yourself, has probably been your go-to for keeping machines up-to-date at scale; what happens with CoreOS?
If you’ve spun up your development cluster following the instructions in this chapter, and it’s been running a few days on your workstation, and if you’re a very observant person, you may have noticed the login message change when you sshed into a machine: for example, from CoreOS alpha (928.0.0) to CoreOS alpha (933.0.0). Or, you may see that your machine’s uptime doesn’t match how long you know you’ve been running this cluster. CoreOS updates itself. It does so by installing the new version on a “B” partition and rebooting one machine at a time in the cluster. This method solves a number of problems with update management, and it’s also a tunable process that we’ll go into in much more depth later.