IaaS clouds were popularized through heavy use of virtual machines. Recent initiatives target bare metal servers exposed through an API, giving us the best of both worlds: on-demand servers through an API, and excellent performance through direct access to the hardware. https://www.packet.net/ is a bare metal IaaS provider (https://www.scaleway.com/ is another), very well supported by Terraform, with an excellent global network. Within minutes, we have new hardware ready and connected to the network.
We'll build a fully automated and scalable Docker Swarm cluster, so we can run highly scalable and performant workloads on bare metal: this setup can scale to thousands of containers in just a few minutes. The cluster is composed of Type 0 machines (4 cores and 8 GB of RAM each): one manager and two nodes, totaling 12 cores and 24 GB of RAM. We can use more powerful machines if we want: the same cluster built with Type 2 machines has 72 cores and 768 GB of RAM (the price adapts accordingly).
To step through this recipe, you will need the following:
- A working Terraform installation
- A Packet account and its API authentication token
- An SSH key pair
Let's start by creating the packet provider, using the API key (an authentication token). Create the variable in variables.tf:

variable "auth_token" {
  default     = "1234"
  description = "API Key Auth Token"
}
Also, be sure to override the value in terraform.tfvars with the real token:
auth_token = "JnN7e6tPMpWNtGcyPGT93AkLuguKw2eN"
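The packet provider itself also needs to be declared so Terraform can authenticate with this token. The recipe doesn't show that block; here's a minimal sketch (the provider.tf file name is an assumption):

```hcl
# provider.tf (hypothetical file name): authenticate the
# Packet provider with the token stored in var.auth_token
provider "packet" {
  auth_token = "${var.auth_token}"
}
```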
Packet, like some other IaaS providers, uses the notion of projects to group machines. Let's create a project named Docker Swarm Bare Metal Infrastructure, since that's what we want to do, in a projects.tf file:

resource "packet_project" "swarm" {
  name = "Docker Swarm Bare Metal Infrastructure"
}
This way, if you happen to manage multiple projects or customers, you can split them all into their own projects.
To connect to the machines using SSH, we need at least one public key uploaded to our Packet account. Let's create a variable to store its path in variables.tf:

variable "ssh_key" {
  default     = "keys/admin_key"
  description = "Path to SSH key"
}
Don't forget to override the value in terraform.tfvars if you use another name for the key.
Let's use the packet_ssh_key resource to create the SSH key on our Packet account:

resource "packet_ssh_key" "admin" {
  name       = "admin_key"
  public_key = "${file("${var.ssh_key}.pub")}"
}
We'll create two types of servers for this Docker Swarm cluster: managers and nodes. Managers control what's executed on the nodes. We'll start by bootstrapping the Docker Swarm manager server, using the Packet service (more alternatives are available from the Packet API):
- Plan: baremetal_0 (Type 0)
- Facility: ams1
- Operating system: ubuntu_16_04_image
- Username: root
- Billing: hourly, but that can be monthly as well

Let's put generic information in variables.tf so we can manipulate them:

variable "facility" {
  default     = "ewr1"
  description = "Packet facility (us-east=ewr1, us-west=sjc1, eu-west=ams1)"
}

variable "plan" {
  default     = "baremetal_0"
  description = "Packet machine type"
}

variable "operating_system" {
  default     = "coreos_stable"
  description = "Packet operating system"
}

variable "ssh_username" {
  default     = "root"
  description = "Default host username"
}
Also, override them in terraform.tfvars to match our values:

facility         = "ams1"
operating_system = "ubuntu_16_04_image"
To create a server with Packet, let's use the packet_device resource, specifying the chosen plan, facility, operating system, billing cycle, and the project in which it will run:

resource "packet_device" "swarm_master" {
  hostname         = "swarm-master"
  plan             = "${var.plan}"
  facility         = "${var.facility}"
  operating_system = "${var.operating_system}"
  billing_cycle    = "hourly"
  project_id       = "${packet_project.swarm.id}"
}
Now, let's create two scripts that will execute when the server is ready. The first one updates Ubuntu (update_os.sh), while the second installs Docker (install_docker.sh).

#!/usr/bin/env bash
# file: ./scripts/update_os.sh
sudo apt update -yqq
sudo apt upgrade -yqq
This script will install and start Docker:
#!/usr/bin/env bash
# file: ./scripts/install_docker.sh
curl -sSL https://get.docker.com/ | sh
sudo systemctl enable docker
sudo systemctl start docker
We can now call those scripts from a remote-exec provisioner inside the packet_device resource:

provisioner "remote-exec" {
  connection {
    user        = "${var.ssh_username}"
    private_key = "${file("${var.ssh_key}")}"
  }
  scripts = [
    "scripts/update_os.sh",
    "scripts/install_docker.sh",
  ]
}
At this point, the system is fully provisioned and functional, with Docker running.
To initialize a Docker Swarm cluster, starting with Docker 1.12, we can just issue the following command:
$ docker swarm init --advertise-addr docker.manager.local.ip
A server at Packet has one network interface sharing both public and private IP addresses. The private IPv4 address is the third entry in the exported network list (after the public IPv4 and IPv6 addresses), and is available through the following exported attribute: ${packet_device.swarm_master.network.2.address}. Let's create another remote-exec provisioner, so the Swarm manager is initialized automatically, right after bootstrap:

provisioner "remote-exec" {
  connection {
    user        = "${var.ssh_username}"
    private_key = "${file("${var.ssh_key}")}"
  }
  inline = [
    "docker swarm init --advertise-addr ${packet_device.swarm_master.network.2.address}",
  ]
}
At this point, we have a Docker cluster running, with only one node—the manager itself.
The last step is to store the Swarm token, so the nodes can join. The token can be obtained with the following command:
$ docker swarm join-token worker -q
We'll store this token in a simple file in our infrastructure repository (worker.token), so we can access it and version it. Let's create a variable for the token filename in variables.tf:

variable "worker_token_file" {
  default     = "worker.token"
  description = "Worker token file"
}
We will execute the previous docker swarm command through SSH when everything else is done, using a local-exec provisioner. As we can't interact with the process, let's skip the host key checking and other initial SSH checks (note the single quotes around the remote command, so they don't conflict with the HCL string delimiters):

provisioner "local-exec" {
  command = "ssh -t -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ${var.ssh_key} ${var.ssh_username}@${packet_device.swarm_master.network.0.address} 'docker swarm join-token worker -q' > ${var.worker_token_file}"
}
We're now done with the Docker Swarm manager!
We need nodes to join the swarm, so the workload can be spread. For convenience, the machine specs for the nodes will be the same as those of the master. Here's what will happen:
- The nodes are created with the same packet_device resource as the master
- The worker token file is copied to each node
- The operating system is updated and Docker is installed
- Each node joins the swarm, using the token and the manager's private IP
Let's start by creating a variable for the number of nodes we want, in variables.tf:

variable "num_nodes" {
  default     = "1"
  description = "Number of Docker Swarm nodes"
}
Override that value in terraform.tfvars as the cluster grows:

num_nodes = "2"
Create the nodes using the same packet_device resource we used for the master:

resource "packet_device" "swarm_node" {
  count            = "${var.num_nodes}"
  hostname         = "swarm-node-${count.index + 1}"
  plan             = "${var.plan}"
  facility         = "${var.facility}"
  operating_system = "${var.operating_system}"
  billing_cycle    = "hourly"
  project_id       = "${packet_project.swarm.id}"
}
Add a file provisioner to copy the token file:

provisioner "file" {
  source      = "${var.worker_token_file}"
  destination = "${var.worker_token_file}"
}
Using the same update and Docker installation scripts as for the master, create the same remote-exec provisioner:

provisioner "remote-exec" {
  connection {
    user        = "${var.ssh_username}"
    private_key = "${file("${var.ssh_key}")}"
  }
  scripts = [
    "scripts/update_os.sh",
    "scripts/install_docker.sh",
  ]
}
The operating system is now fully updated and Docker is running.
Now we want to join the Docker Swarm cluster. To do this, we need two pieces of information: the token and the private IP of the master. We already have the token in a file locally, and Terraform knows the private IP of the swarm manager. So a trick is to create a simple script (I suggest you write a more robust one!) that reads the local token and takes the manager's private IP address as an argument. In a file named scripts/join_swarm.sh, enter the following lines:

#!/usr/bin/env bash
# file: scripts/join_swarm.sh
MASTER=$1
SWARM_TOKEN=$(cat worker.token)
docker swarm join --token ${SWARM_TOKEN} ${MASTER}:2377
Now we just have to send this file to the nodes using the file provisioner:

provisioner "file" {
  source      = "scripts/join_swarm.sh"
  destination = "join_swarm.sh"
}
Use it as a last step through a remote-exec provisioner, sending the Docker master's private IP (${packet_device.swarm_master.network.2.address}) as an argument to the script:

provisioner "remote-exec" {
  connection {
    user        = "${var.ssh_username}"
    private_key = "${file("${var.ssh_key}")}"
  }
  inline = [
    "chmod +x join_swarm.sh",
    "./join_swarm.sh ${packet_device.swarm_master.network.2.address}",
  ]
}
Launch the whole infrastructure:
$ terraform apply
Outputs:

Swarm Master Private IP = 10.80.86.129
Swarm Master Public IP = 147.75.100.19
Swarm Nodes = Public: 147.75.100.23,147.75.100.3, Private: 10.80.86.135,10.80.86.133
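The recipe never shows where these outputs are declared. A sketch of an outputs.tf that would produce them (the output names and join() formatting are assumptions reconstructed from the displayed result; output names containing spaces only work in pre-0.12 Terraform, the era of this recipe):

```hcl
# outputs.tf (hypothetical): expose the cluster IPs after terraform apply
output "Swarm Master Public IP" {
  value = "${packet_device.swarm_master.network.0.address}"
}

output "Swarm Master Private IP" {
  value = "${packet_device.swarm_master.network.2.address}"
}

output "Swarm Nodes" {
  value = "Public: ${join(",", packet_device.swarm_node.*.network.0.address)}, Private: ${join(",", packet_device.swarm_node.*.network.2.address)}"
}
```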
Our cluster is running.
Using our Docker Swarm cluster is beyond the scope of this book, but now that we have it, let's take a quick look at scaling a container to the thousands!
Verify we have our 3 nodes:

# docker node ls
ID                           HOSTNAME                STATUS  AVAILABILITY  MANAGER STATUS
9sxqi2f1pywmofgf63l84n7ps *  swarm-master.local.lan  Ready   Active        Leader
ag07nh1wzsbsvnef98sqf5agy    swarm-node-1.local.lan  Ready   Active
cppk5ja4spysu6opdov9f3x8h    swarm-node-2.local.lan  Ready   Active
We want a common network for our containers, and we want to scale to the thousands, so a typical /24 network won't be enough (that's the docker network default). Let's create a /16 overlay network, so we have room to scale!
# docker network create -d overlay --subnet 172.16.0.0/16 nginx-network
Create a Docker service that will simply launch an nginx container on this new overlay network, with 3 replicas (3 instances of the container running at the same time):
# docker service create --name nginx --network nginx-network --replicas 3 -p 80:80/tcp nginx
Verify that it's working:

# docker service ls
ID            NAME   REPLICAS  IMAGE  COMMAND
aeq9lspl0mpg  nginx  3/3       nginx
Now, when we access any of the cluster's public IPs over HTTP, a container on any node can answer: we can make an HTTP request to node-1 and have a container on node-2 respond. Nice!
Let's scale our service now, from 3 replicas to 100:

# docker service scale nginx=100
nginx scaled to 100
# docker service ls
ID            NAME   REPLICAS  IMAGE  COMMAND
aeq9lspl0mpg  nginx  100/100   nginx
We just scaled to a hundred containers in a few seconds, spread across all 3 bare metal machines.
Now you know you can scale, and with such a configuration you can push the nginx service to 500, 1,000, or maybe more!