Chapter 3. Porting an Existing Application to Mesos

It’s time to learn how to build an application on top of Mesos. Rather than building everything from first principles, however, let’s take a look at how to utilize existing frameworks to port our legacy applications. Most applications fall into two categories: applications that respond to requests and applications that perform actions at particular times. In the LAMP stack, these two components are PHP and cron jobs.

First, we’re going to look at how to move an existing HTTP-based application from your current infrastructure onto Mesos. In doing so, we’re going to begin to be able to take advantage of Mesos’s scalability and resiliency, ending up with a system that can automatically heal and recover from common failure classes. Besides improving the resiliency of the application, we’re also going to improve the isolation of the application’s components. This will help us to achieve a better quality of service without struggling and suffering to build this directly on virtual machines. We’ll use Marathon, a popular Mesos framework, to host our HTTP-based application.

Then, we’ll look at Chronos as a case study in using Marathon to add high availability and reliability to other frameworks. Chronos will enable us to run programs at a specified interval—it can be used to schedule nightly data generation jobs or reports every 15 minutes. We’ll also go over some recommendations for utilizing Chronos effectively and maintainably.

Finally, we’ll briefly touch on some alternatives to Marathon, Singularity and Aurora, which were developed by HubSpot and Twitter, respectively.

Moving a Web Application to Mesos

Nearly every company has a web application. However, no matter whether it’s written in Ruby on Rails, Scala Play!, or Python’s Flask, deploying it in a reliable, scalable, highly available manner is always a challenge. In this section, we’re going to learn about using Marathon—an easy-to-use Platform as a Service (PaaS) developed by Mesosphere—and how to integrate with other tools, such as HAProxy. Through this process, it is our goal to realize a few benefits over previous architectures:

  1. All of our backend processes will be able to run on any machine, and our Mesos framework will handle automatically spinning up new instances of the backend when existing instances fail.

  2. We will host our static assets, fast-responding endpoints, dangerous endpoints,1 and API in entirely different containers to improve our isolation.

  3. We will make it easy to deploy a new version or to roll back, and to do so in a matter of seconds.

  4. We will put the pieces in place to allow our applications to autoscale based on demand.

At its core, Marathon allows you to specify a command line or Docker image, number of CPUs, amount of memory, and number of instances, and it will start that many instances of that command line or image in containers with the requested amount of CPU and memory. We will explore this basic configuration as well as a few ways to integrate Marathon with load balancers, in order to expose our services internally and to the outside world.

Setting Up Marathon

Obviously, in order to use Marathon, you’ll need to download it. Marathon comes prepackaged for Ubuntu, Debian, Red Hat, and CentOS on the Mesosphere website. Of course, you can also build it yourself, by downloading the latest tarball from Mesosphere’s website, or by cloning it from GitHub. Marathon is written in Scala, so you simply need to run sbt assembly to build it. From here on out, we’ll assume that you’ve got a copy of Marathon installed.

Marathon is trivial to run as a highly available application. In fact, all you need to do to run Marathon in a highly available configuration is to start two or three instances of Marathon and ensure that they all share the same --zk command-line argument. You should also make the other options the same, such as --master, --framework-name, and the ports on which it listens for HTTP requests, or you could see confusing behavior. When running Marathon in this mode, all the instances of Marathon will correctly respond to requests, and they’ll automatically synchronize their state so that you don’t need to worry about it. Typically, you’ll finish the setup by assigning the instances of Marathon to a round-robin DNS name, or by putting them behind a load balancer. Please see Table 3-1 for further information on selected command-line options.

Table 3-1. Selected Marathon command-line options
Flag Functionality

--master <instance>

The URL of the Mesos master. Usually of the form zk://host1:port,host2:port,host3:port/mesos.

--framework-name <name>

Allows you to customize the name of this instance of Marathon in Mesos; useful if you have multiple instances of Marathon running to stay organized.

--zk <instance>

Sets the instance of ZooKeeper in which Marathon’s state is stored. Usually of the form zk://host1:port,host2:port,host3:port/marathon.

--https-port <port>

Enables HTTPS access to Marathon. See “Securing Marathon and Chronos” for details.
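As a concrete sketch of the highly available setup described above, each of your two or three Marathon instances could be started with an identical command line along these lines (the ZooKeeper hosts and framework name are placeholders, and the launcher may be ./bin/start rather than marathon, depending on how you installed it):

marathon --master zk://zk1:2181,zk2:2181,zk3:2181/mesos \
         --zk zk://zk1:2181,zk2:2181,zk3:2181/marathon \
         --framework-name marathon

As long as every instance shares these values, the instances will elect a leader and keep their state in sync, and you can put a round-robin DNS name or a load balancer in front of all of them.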

Using Marathon

Marathon is controlled entirely through its HTTP API. With a working instance of Marathon, let’s try launching an application.

We’ll run a simple application: the famous Python SimpleHTTPServer, which serves the files in the directory in which it was started. We’re going to base our example on it because nearly everyone has Python installed by default.

If you run this command:

python -m SimpleHTTPServer 8000

on your computer, then you should be able to navigate to localhost:8000 in your browser and see a directory listing of the folder in which you started the server.

Let’s see how we can run that on a Marathon server running on marathon.example.com:8080. We’ll first make a file containing the JSON descriptor of the Marathon application shown in Example 3-1.

Example 3-1. SimpleHTTPServer JSON descriptor
{
    "cmd": "python -m SimpleHTTPServer 31500", 1
    "cpus": 0.5, 2
    "mem": 50, 3
    "instances": 1, 4
    "id": "my-first-app", 5
    "ports": [31500], 6
    "requirePorts": true 7
}
1

The "cmd" specifies the command line that will launch your application.

2

"cpus" specifies the number of CPUs that your application’s container should have. This number can be fractional.

3

"mem" specifies the number of MB of memory that your application’s container should have.

4

"instances" specifies how many copies of this application should be started on the cluster. For this demo, we’ll only make one instance, but typical applications will have anywhere from 2 to 2,000 instances.

5

"id" is how we’ll refer to this application in the future via the APIs. This should be unique for each application.

6

"ports" is an array of the ports that the application requires. In this case, we only need a single port. Later, we’ll see how to dynamically allocate and bind to ports.

7

Since we’re explicitly specifying which ports we want to allocate, we must set "requirePorts" to true to tell Marathon; otherwise, it will default to automatic port assignment.

Now let’s suppose that this JSON descriptor has been saved to a file called my-first-app.json. In order to start this application, we’ll use an HTTP POST to send the data to a Marathon server running at marathon.example.com:8080:

curl -X POST -H 'Content-Type: application/json' \
    marathon.example.com:8080/v2/apps --data @my-first-app.json

This should give us a response that looks something like:

{
   // This section contains parameters we specified
   "id" : "/my-first-app",
   "instances" : 1,
   "cmd" : "python -m SimpleHTTPServer 31500",
   "ports" : [ 31500 ],
   "requirePorts": true,
   "mem" : 50,
   "cpus" : 0.5,

   // This section contains parameters with their automatically assigned values
   "backoffFactor" : 1.15,
   "maxLaunchDelaySeconds" : 3600,
   "upgradeStrategy" : {
      "minimumHealthCapacity" : 1,
      "maximumOverCapacity" : 1
   },
   "version" : "2015-06-04T18:26:18.834Z",
   "deployments" : [
      {
         "id" : "54e5fdf8-8a81-4f95-805f-b9ecc9293095"
      }
   ],
   "backoffSeconds" : 1,
   "disk" : 0,
   "tasksRunning" : 0,
   "tasksHealthy" : 0,
   "tasks" : [],
   "tasksStaged" : 0,

   // This section contains unspecified parameters
   "executor" : "",
   "storeUrls" : [],
   "dependencies" : [],
   "args" : null,
   "healthChecks" : [],
   "uris" : [],
   "env" : {},
   "tasksUnhealthy" : 0,
   "user" : null,
   "requirePorts" : true,
   "container" : null,
   "constraints" : [],
   "labels" : {},
}

which indicates that the application was started correctly.

We can find all the information about the running application by querying for information about my-first-app (what we decided to name this application):

curl marathon.example.com:8080/v2/apps/my-first-app | python -m json.tool

This query returns almost the exact same data as we got when we created the application. The difference is that by now our application has had time to start running, so we can see information about the running tasks:

// ... snip (same as before) ...
    "tasks": [
        {
            "appId": "/my-first-app",
            "host": "10.141.141.10", 1
            "id": "my-first-app.7345b7b5-0ae7-11e5-b3a7-56847afe9799", 2
            "ports": [ 31500 ], 3
            "stagedAt": "2015-06-04T18:28:09.235Z", 4
            "startedAt": "2015-06-04T18:28:09.404Z", 5
            "version": "2015-06-04T18:28:05.214Z" 6
        }
    ],
    "tasksHealthy" : 1, 7
// ... snip (same as before) ...
1

This tells us the DNS name or IP address of the host that is running the task.

2

The id of the task is a Mesos construct. We can use the Mesos master’s UI to inspect this task; for instance, we can examine its stdout and stderr.

3

Whether we chose the port that the application runs on or allowed Marathon to choose for us (see Example 3-2 for details), this is where we can find the ports actually reserved for the task.

4

The staging time of the task is when Marathon submitted the task to Mesos on a particular offer.

5

The start time of the task is when Mesos actually got around to launching the task.

6

Every time we update the Marathon descriptor, it gets stamped with a new version (see the response to the initial POST given earlier). Every task is annotated with the version from which it was launched.

7

This gives a summary of the number of currently healthy tasks in the application.

Let’s look at some of the rich information available about the application. First, we can see all of the configuration parameters that we specified when we created the application. We also see tons and tons of other settings, many of which we’ll go over soon. The next very interesting field is "tasks", which is an array of all of the actual running instances of the application. Each task is represented as a JSON object, which has several useful fields:

  • "host" and "ports" tell us what the actual host and ports that were assigned to this task are. These fields are the key to discovering where an application is actually running so that we can connect to it.

  • "id" is the mesos TaskID. This is constructed by adding a unique per-task UUID to the application’s ID, which allows for convenient discovery with the Mesos CLI tools or in the Mesos UI.

  • "stagedAt" and "startedAt" give us information about when lifecycle events happened to this task. A task is staged once Marathon actually requests Mesos to start the task on a particular offer; a task is started once it begins running.

Upcoming Change in Marathon REST API

The REST API examples in this book are valid as of Marathon 0.9. In the future, Marathon will filter the results so requests don’t need to return megabytes of JSON data. When this happens, you’ll need to specify an additional, repeatable URL parameter: embed. That is, instead of /v2/apps/my-first-app, you’ll need to request /v2/apps/my-first-app?embed=app.counts&embed=app.tasks to retrieve the data for the preceding examples. You can read about all of the options for embed in the Marathon REST API documentation.

Usually, you won’t want to hardcode the port for your application: after all, if you did this, then you’d only ever be able to run one instance of the application per slave. Marathon can assign the ports for you instead: simply specify 0 for each port your application requires, and the assigned ports will be available in the environment variables $PORT0, $PORT1, etc. If we changed our earlier configuration to take advantage of this, we’d end up with Example 3-2.

Example 3-2. SimpleHTTPServer JSON descriptor with dynamically chosen port
{
    "cmd": "./python -m SimpleHTTPServer $PORT0", 1
    "cpus": 0.5,
    "mem": 50,
    "instances": 1,
    "id": "my-first-app",
    "ports": [0], 2
    "uris": [
        "http://fetchable/uri/with/python.tar.gz" 3
    ]
}
1

Note that we use $PORT0 to access the environment variable that contains the port assigned to the first index of the "ports" array.

2

We provided 0 as the desired port, meaning that we’re allowing Marathon to choose for us.

3

We can also provide a list of URIs to be fetched and unzipped into the container before running the command. See “Configuring the process’s environment” for details on the URI schemes supported.

Some applications you’d like to run on Marathon don’t accept their ports via the command line. Don’t worry! There’s an easy way to propagate our $PORT0 environment variables into configuration files. I’ll assume that the application’s configuration file template is fetched as one of the application’s URIs. We can then use sed to replace a special string in the configuration file with the desired environment variable, be it $PORT0 or $MESOS_DIRECTORY:

sed -e 's#@MESOSDIR@#'"$MESOS_DIRECTORY"'#' config.templ > config 1
sed -e 's#@PORT@#'"$PORT0"'#' config2.templ > config2 2
1

In our first example, we are replacing the string @MESOSDIR@ with the task’s sandbox directory. We surround the string in the configuration file with @ symbols, because they so rarely show up in configuration languages (of course, you could use any unique string). Also, we use single quotes around the “constants” in our sed command and double quotes around the environment variable to ensure that our shell doesn’t corrupt our command. Finally, we use # instead of / for the sed command so that we can successfully template an environment variable that itself contains forward slashes.

2

Here, we template the port at index 0 into the configuration file. In fact, we could pass multiple -e arguments to a single invocation of sed if we wanted to template several variables in one configuration file.
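For example, suppose config.templ (a file format invented purely for illustration) contains both placeholders:

# config.templ -- placeholders filled in at task launch time
listen_port = @PORT@
scratch_dir = @MESOSDIR@/scratch

A single sed invocation with two -e expressions then produces the final config:

sed -e 's#@MESOSDIR@#'"$MESOS_DIRECTORY"'#' \
    -e 's#@PORT@#'"$PORT0"'#' config.templ > config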

We’ll look at other fields as we cover the features that utilize them.

Scaling Your Application

Let’s see what it would take to scale our HTTP server to have five instances. Ready?

curl -X PUT -H 'Content-Type: application/json' \
    marathon.example.com:8080/v2/apps/my-first-app --data '{"instances": 5}'

Of course, no one would ever actually want to have five instances of such a useless application, so why don’t we scale it down?

curl -X PUT -H 'Content-Type: application/json' \
    marathon.example.com:8080/v2/apps/my-first-app --data '{"instances": 1}'

Luckily, scaling is extremely easy!

Using Placement Constraints

Marathon supports constraints on where the application is launched. These constraints can be driven either by the hostname of the slave or any slave attribute. Constraints are provided in an array to the application; each constraint is itself an array of two or three elements, depending on whether there’s an argument. Let’s take a look at what constraints are available and how to use them:

GROUP_BY

This operator ensures that instances of the application are evenly spread across nodes with a particular attribute. You can use this to spread the application equitably across hosts (Example 3-3) or racks (assuming that the rack is encoded by the rack attribute on the slave; see Example 3-4).

Example 3-3. Spread evenly by host
{
    // ... snip ...
    "constraints": [["hostname", "GROUP_BY"]]
}
Example 3-4. Spread evenly by rack
{
    // ... snip ...
    "constraints": [["rack", "GROUP_BY"]]
}
UNIQUE

This operator ensures that every instance of the application has a different value for the UNIQUE constraint. It’s similar to GROUP_BY, except if there are not enough different values for the attribute, the application will fail to fully deploy, rather than running multiple instances on some slaves. Usually, GROUP_BY is the best choice for availability, since it’s generally preferable to have multiple tasks running on a single slave rather than fewer than the desired number. Example 3-5 shows how you could use UNIQUE to ensure that there’s no more than one instance of some application per slave in the cluster.

Example 3-5. Run at most one instance per host
{
    // ... snip ...
    "constraints": [["hostname", "UNIQUE"]]
}
CLUSTER

This operator allows you to run an application only on slaves with a specific value for an attribute. If certain slaves have special hardware configurations, or if you want to restrict placement to a specific rack, this operator can be useful; however, LIKE is more powerful and often a better fit. For example, suppose that we’ve got a mixture of ARM and x86 processors in our data center, and we’ve made a cpu_arch attribute for our slaves to help us distinguish their architectures. Example 3-6 shows how to ensure that our application only runs on x86 processors.

Example 3-6. Only run on x86
{
    // ... snip ...
    "constraints": [["cpu_arch", "CLUSTER", "x86"]] 1
}
1

Note that the CLUSTER operator takes an argument.

LIKE

This operator ensures that the application runs only on slaves that have a certain attribute, and where the value of that attribute matches the provided regular expression. This can be used like CLUSTER, but it has more flexibility, since many values can be matched. For example, suppose that we’re running on Amazon, and we set the instance_type attribute on all of our slaves. Example 3-7 shows how we could restrict our application to run only on the biggest C4 compute-optimized machines.

Example 3-7. Only run on c4.4xlarge and c4.8xlarge machines
{
    // ... snip ...
    "constraints": [["cpu_arch", "LIKE", "c4.[48]xlarge"]] 1
}
1

LIKE’s argument is a regular expression.

UNLIKE

UNLIKE is the complement of LIKE: it allows us to avoid running on certain slaves. Maybe you don’t want some daemons (such as ones processing user billing information) to run on machines that are in the DMZ, which is an unsecured network area. By setting the attribute dmz to true on machines in the DMZ, we can have our application avoid those machines, as in Example 3-8.

Example 3-8. Don’t run in the DMZ
{
    // ... snip ...
    "constraints": [["dmz", "UNLIKE", "true"]] 1
}
1

UNLIKE’s argument can be a regular expression.

Placement constraints can also be combined. For example, we could ensure that our application isn’t running in the DMZ and is evenly distributed amongst the remaining machines as shown in Example 3-9.

Example 3-9. Combining constraints
{
    // ... snip ...
    "constraints": [["dmz", "UNLIKE", "true"],
                    ["hostname", "GROUP_BY"]]
}

Combining constraints will allow you to precisely control how your applications are placed across the cluster.

Running Dockerized Applications

Marathon has first-class support for Docker containers. If your app is already Dockerized, then it’s very easy to run it on Marathon.

In order to get started, you’ll first need to make sure that your Mesos cluster has Docker support enabled. To do this, make sure that you’ve enabled the Docker containerizer and increased the executor timeout (see “Using Docker” for details).
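For reference, enabling this on each slave typically amounts to two settings, shown here as command-line flags (your installation may set them through configuration files instead, as described in “Using Docker”):

mesos-slave --containerizers=docker,mesos \
            --executor_registration_timeout=5mins

The longer registration timeout matters because pulling a large Docker image can easily exceed the default executor startup deadline.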

At this point, we can simply add the Docker configuration to the JSON that we use to configure our application. Let’s take a look at the options we have when using Docker, as illustrated in Example 3-10. Note that this JSON configuration is partial; it must also include additional options such as those in Example 3-1.

Example 3-10. Marathon Docker JSON configuration
{
    // ... snip ...
    "container": {
        "type": "DOCKER", 1
        "docker": {
            "image": "group/image", 2
            "network": "HOST" 3
        }
    }
}
1

Since Mesos will continue to add new container types, such as Rocket or KVM containers, much of the container configuration isn’t Docker-specific. Here, we must specify that we’ll provide Docker-specific configuration, because this is a DOCKER container.

2

This is the most important part: where you specify your container image.2

3

We’re going to run Docker in host networking mode; this means that Docker is using the slave’s networking resources directly.

Once the Docker container launches, it’ll also have access to the Mesos sandbox directory, which will be available in the environment variable $MESOS_SANDBOX.

Mounting host volumes

Often, when using Docker, you may want to mount some volume available on the host. This may be some globally distributed configuration files, data files, or maybe you’d just like a data directory that won’t be automatically garbage collected. Example 3-11 shows how to add this to the JSON configuration.

Example 3-11. Mounting host volumes
{
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "group/image",
            "network": "HOST"
        },
        "volumes": [ 1
            { 2
                "containerPath": "/var/hostlib",
                "hostPath": "/usr/lib",
                "mode": "RO" 3
            },
            { 4
                "containerPath": "/var/scratch",
                "hostPath": "/mount/ssd",
                "mode": "RW"
            }
        ]
    }
}
1

Note that the "volumes" aren’t considered Docker-specific, and you can specify an array of them.

2

This volume allows us to access the /usr/lib directory of the host machine from within our Docker container. This could be useful if we’d like to access libmesos.so from the host, rather than needing to keep updating our Docker image.

3

We can mount the volume in read-only (RO) or read-write (RW) mode.

4

This volume gives us some SSD-mounted scratch space, assuming that an SSD is mounted at /mount/ssd on the host.

Mounting host volumes into the Docker container might seem like a great way to host databases on Marathon; after all, you can provide persistent host storage to containers. Unfortunately, even though you can mount volumes with Docker, there are a few reasons why this is very hard (at least, until Mesos natively supports volumes). First of all, it’s tedious to communicate which machines have disks available. In order to accomplish this, you’ll need to use placement constraints to ensure all of the following:

  1. Each slave with a spare volume has a special attribute identifying it as having that volume, so that you can use the CLUSTER operator to ensure placement on those slaves.

  2. The application has the UNIQUE operator set on the hostname, so that only one instance of the application launches on each slave (since there’s only one special mount point per machine).

  3. Every one of the slaves has its host volume mounted on the same path (luckily, this is the easiest part).

If you feel up to the task, this can help you to get certain types of databases running on Marathon. Sadly, you won’t have scaling or automatic recovery handled for you, since there’s so much special configuration necessary on each slave. Happily, as Mesos adds support for persistent disks, more and more specialized frameworks that handle launching and configuring databases will pop up, and this mess will no longer be needed.
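Putting those three requirements together, the relevant parts of such a database’s descriptor might look like the following sketch (the db_volume attribute name, image, and paths are placeholders you’d define for your own cluster):

{
    // ... snip ...
    "constraints": [["db_volume", "CLUSTER", "ssd"],
                    ["hostname", "UNIQUE"]],
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "group/database",
            "network": "HOST"
        },
        "volumes": [
            {
                "containerPath": "/var/lib/data",
                "hostPath": "/mount/ssd/data",
                "mode": "RW"
            }
        ]
    }
}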

Health Checks

Marathon keeps track of whether it believes an application is healthy or not. This information is exported via the REST API, making it easy to use Marathon to centralize management of its application health checks. When you query the REST API for information about a particular application, all of its tasks will include their latest health check results. By default, Marathon assumes that a task is healthy if it is in the RUNNING state. Of course, just because an application has started executing doesn’t mean that it’s actually healthy. To improve the usefulness of the application’s health data, you can specify three additional types of health checks that the tasks should undergo: command-based, HTTP, and TCP.

Example 3-12 shows the common fields of the JSON health check specification.

Example 3-12. Common health check fields
{
    "gracePeriodSeconds": 300, 1
    "intervalSeconds": 60, 2
    "maxConsecutiveFailures": 3, 3
    "timeoutSeconds": 30 4
}
1

The grace period is the amount of time that a task is given to start itself up. Health checks will not begin to run against a particular task until this many seconds have elapsed. This parameter is optional, with a default value of a 5-minute grace period.

2

The interval is the amount of time between runs of this health check. This parameter is optional, with a default value of 1 minute between each check.

3

The maximum number of consecutive failures determines how many failed health checks in a row should cause Marathon to forcibly kill the task, so that it can be restarted elsewhere. If this is set to 0, then this health check failing will never result in the task getting killed. This parameter is optional, with tasks by default being killed after three failures of this health check.

4

Sometimes, a health check just hangs. Rather than simply ignoring the check, Marathon will treat the slow or hung health check as a failure if it doesn’t complete in this many seconds. This parameter is optional, and by default health checks have 20 seconds to run.

We’ll look at two of the three types of health checks here: HTTP and TCP. Command health checks are still relatively new, and due to some of their limitations, I recommend waiting a bit longer for them to stabilize. Bear in mind, though, that the HTTP and TCP health checks described here are currently performed by the Marathon master, which means that they’re not scalable to thousands of tasks.3

HTTP checks

HTTP health checks validate whether a GET on a particular route returns a successful HTTP status code. Successful status codes can either be true successes or redirects; thus, the response code must be in the range 200–399, inclusive. To configure an HTTP health check, there are two required fields and one optional field, as shown in Example 3-13.

Example 3-13. HTTP health check fields
{
    "protocol": "HTTP", 1
    "path": "/healthcheck", 2
    "portIndex": 0 3
}
1

You must specify that this is an HTTP health check in the "protocol" field.

2

You must specify what the route for the health check should be. In this case, we’re going to attempt a GET on http://myserver.example.com:8888/healthcheck, where myserver.example.com:8888 is the host and port that Marathon started the task on.

3

You can optionally specify a "portIndex" for the health check. By default, the port index is 0.

Why Do Health Checks Use “portIndex” Instead of “port”?

You might be confused as to what the "portIndex" is in the health check specification, and why we don’t simply specify the actual port. Remember that Marathon can (and by best practices does) choose the ports that your application will bind to. As a result, we don’t actually know when we’re configuring the health check what port the application is running on. Instead, however, remember that all the ports our application used were specified in the "ports" array of the application descriptor. Therefore, we can simply specify which index of the "ports" array corresponds to the port that we want to run the health check on!

TCP checks

TCP health checks validate whether it’s possible to successfully open a TCP connection to the task. They do not send or receive any data—they simply attempt to open the socket. To configure a TCP health check, there is one required and one optional field, shown in Example 3-14.

Example 3-14. TCP health check fields
{
    "protocol": "TCP", 1
    "portIndex": 0 2
}
1

You must specify that this is a TCP health check in the "protocol" field.

2

You can optionally specify a "portIndex" for the health check. By default, the port index is 0.
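Health checks are attached to an application through its "healthChecks" array (the same field that appeared, empty, in the earlier POST response). As a sketch, combining the common fields with an HTTP check might look like this (the /healthcheck path is whatever route your application actually exposes):

{
    // ... snip ...
    "healthChecks": [
        {
            "protocol": "HTTP",
            "path": "/healthcheck",
            "portIndex": 0,
            "gracePeriodSeconds": 300,
            "intervalSeconds": 60,
            "maxConsecutiveFailures": 3,
            "timeoutSeconds": 30
        }
    ]
}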

Application Versioning and Rolling Upgrades

What use would a distributed, scalable PaaS have if it couldn’t do rolling upgrades? An upgrade is any time that you modify the application, which is typically done by making a PUT to the /v2/apps/$appid route on Marathon. Every upgrade creates a new version, which is named with the timestamp of when the upgrade was made. When you query the REST API for information about the application, every task will say which version it is running. You can query which versions are available from the /v2/apps/$appid/versions route on Marathon, and you can find details of the versions from the /v2/apps/$appid/versions/$version route. Example 3-15 demonstrates how to configure the upgrade behavior of an application.

Example 3-15. Application upgrades
{
    "upgradeStrategy": {
        "minimumHealthCapacity": 0.5,
        "maximumOverCapacity": 0.1
    }
}

With Marathon, you can easily specify how much of the application you want to be available during an upgrade via the "minimumHealthCapacity" field of the application descriptor. The "minimumHealthCapacity" can be set to any value between 0 and 1.

If you set this to 0, when you deploy a new version Marathon will first kill all the old tasks, then start up new ones.

If you set this to 0.6, when you deploy a new version Marathon will first kill 40% of the old tasks, then start up 60% of the new tasks, then kill the rest of the old tasks and start the rest of the new tasks.

If you set this to 1, when you deploy a new version Marathon will first start up all the new tasks (at which point your application could use twice as many resources as during regular operations), and then shut down the old tasks.

The "maximumOverCapacity" setting provides an additional level of safety, so that an application doesn’t consume too many resources temporarily during an upgrade. The "maximumOverCapacity" can be set to any value between 0 and 1.

If you set this to 0, when you deploy a new version Marathon will not start more tasks than the target number of instances.

If you set this to 0.6, when you deploy a new version Marathon will never exceed a combined total of 160% of the target number of tasks.

If you set this to 1, when you deploy a new version Marathon will be allowed to start every new task before it kills any of the old tasks, if it so chooses.
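To actually trigger an upgrade, you modify the application through the REST API just as we did when scaling. For example, this sketch deploys a new configuration of my-first-app (reusing the placeholder hostname from earlier); Marathon stamps a new version and replaces tasks according to the "upgradeStrategy":

curl -X PUT -H 'Content-Type: application/json' \
    marathon.example.com:8080/v2/apps/my-first-app \
    --data '{"cmd": "python -m SimpleHTTPServer 31500", "mem": 100}'

Rolling back is just another upgrade: PUT the previous settings (or, on recent versions of Marathon, cancel the in-flight deployment through the /v2/deployments route) and Marathon will converge on that version in the same incremental fashion.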

The Event Bus

Often, when building applications with Marathon, you’ll want to receive notifications when various events happen, such as when an application gets a new deployment, or when it scales up or down. These notifications could then be used to reconfigure proxies and routers, log statistics, and generate reports. Marathon has a built-in feature for automatically POSTing all of its internal events to a provided URI. To use this feature, simply provide two additional command-line arguments to Marathon:

--event_subscriber

This argument should be passed the value http_callback in order to enable this subsystem. At this time, no other options are supported.

--http_endpoints

This argument is a comma-separated list of destination URIs to send the JSON-formatted events to. For example, a valid argument would be http://host1/foo,http://host2/bar.
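Putting the two flags together, a Marathon instance that publishes its events to a single listener could be started as follows (the endpoint URL is a placeholder, modeled on the kind of callback a tool such as Bamboo exposes):

marathon --master zk://zk1:2181,zk2:2181,zk3:2181/mesos \
         --zk zk://zk1:2181,zk2:2181,zk3:2181/marathon \
         --event_subscriber http_callback \
         --http_endpoints http://bamboo.example.com/api/marathon/event

Deployments, scaling operations, health check changes, and task status updates will then be POSTed to that URL as JSON documents.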

Setting Up HAProxy with Marathon

What a great day! We’ve learned all about Marathon and decided that it would make an excellent platform for our applications. Now that our applications are running on Marathon, however, we’ve reached an impasse: how can the outside world actually connect to them? Some types of applications, such as Celery or Resque workers, need no additional configuration: they communicate via a shared database. Other types of applications, such as HTTP servers and Redis instances, need to be made easily discoverable. The most popular way to do this on Mesos clusters is to run proxies on static, non-Mesos managed hosts, and automatically update those proxies to point at running instances of the application. This way, the application is always available on a known host and port (i.e., the proxy), but we can still dynamically scale its capacity on the backend. Also, each proxy host can typically serve many applications, each on a different port, so that a small number of proxy hosts are sufficient for hundreds of backends. Typically, proxies are constrained either by the total bandwidth used by all active connections, or by the total CPU usage if they’re providing SSL termination.4

There are two proxies that are far and away the most popular choices for use in modern application stacks: HAProxy and Nginx. HAProxy is a proxy with very few features: it can proxy HTTP and TCP connections, perform basic health checks, and terminate SSL. It is built for stability and performance: there have been no reported crashes or deadlock bugs in HAProxy for 13 years. This is why HAProxy is popular amongst Mesos users today.

Nginx, by comparison, is far more full-featured. Besides acting as a proxy, it is able to run custom user code written in Lua, JavaScript, or JVM-based languages. This code can affect the proxy’s behavior, or even serve responses directly. Many systems actually use both proxies: clients connect to HAProxy instances, which themselves forward the requests to Nginx instances. The Nginx instances then either respond directly, or proxy the requests to backend servers to be processed.

In this section, we’re only going to discuss HAProxy, but be aware that Nginx can be used in exactly the same way, if you require the additional functionality it offers.

haproxy-marathon-bridge

In the Marathon repository, there’s a script called bin/haproxy-marathon-bridge. This script is intended to help generate an HAProxy configuration file for Marathon. Unfortunately, it doesn’t allow you to specify which port the services on Marathon will be bound to on the HAProxy host. As a result, this script is best avoided.

Let’s take a look at the most popular configurations for setting up proxies with Mesos.

Bamboo

Bamboo is an HAProxy configuration daemon with a web interface. It provides a UI for viewing and managing the current state of HAProxy rules. At its core, Bamboo monitors Marathon for changes to applications, and keeps its HAProxy instance’s backends up-to-date. Bamboo also provides several useful features, all of which are exposed via the web interface and an HTTP REST API.

Most importantly, Bamboo supports using HAProxy ACL rules for each Marathon application. An ACL rule is a fine-grained way to choose which server backend should handle a request. ACL rules can route requests based on their URL, cookies, headers, and many other factors. For instance, you can use ACL rules to route requests whose paths start with /static to the static content-serving backends, but all other requests to your application servers. ACL rules are extremely powerful, since they allow you to separate your application into isolated components or services, scale those components separately, but still expose a single URL to clients. With ACL rules, you could choose to route anonymous visitors to one pool of servers, logged-in users to another pool of servers, and requests for static data to a third pool of servers, thus enabling you to scale each of their capacities separately.
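To give a flavor of what such a rule looks like in the HAProxy configuration that Bamboo ultimately generates, the /static example above corresponds to a fragment along these lines (this is plain HAProxy ACL syntax, with backend names invented for illustration; Bamboo produces it from its templates rather than you writing it by hand):

frontend http-in
    bind *:80
    acl is_static path_beg /static
    use_backend static_assets if is_static
    default_backend app_servers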

Bamboo also has a powerful templating system, based on the Go language’s text/template package. This templating system is flexible, but does have a steep learning curve compared to simpler, more popular systems like mustache templates.

All proxy configuration information is stored in ZooKeeper, which simplifies configuration management by keeping all instances of Bamboo in sync.

Bamboo uses the Marathon event bus to discover changes in backend configurations, which means that your HAProxy configurations typically lag changes in Marathon by mere hundreds of milliseconds.

Additionally, Bamboo has basic integration with StatsD to report on when reconfiguration events occurred.

Bamboo provides a complete solution for sophisticated HAProxy configurations for Marathon applications.

HAProxy for microservices

Nowadays, microservices are a popular architectural pattern. Mesos and Marathon form a great substrate for building microservices-based applications: after all, you can host all your services on Marathon. Unfortunately, service discovery is typically a substantial problem for microservice deployments. Let’s look at how we can use Mesos and Marathon to create a SmartStack-like system.

We need to consider two issues: how can we do service discovery, and what are the problems with the standard options? These options include:

DNS

We could choose to use round-robin DNS to give all our backend servers the same DNS name. We’ll face two problems, however. First, DNS changes typically take at least several seconds to propagate through our infrastructure, and some applications cache DNS resolutions forever (meaning they’ll never see DNS changes). Second, unless you write a custom client library that can use SRV records, DNS doesn’t provide a simple solution to running multiple applications on the same server with different or randomly assigned ports.

Centralized load balancing

This is akin to what we discussed in “Bamboo”. For some applications, this is actually a great design; however, it does require a globally accessible pool of load balancers. This centralization can make it tedious to securely isolate applications from one another: in order to isolate the applications, you must configure the Nginx or HAProxy rules for the specific isolation requirements.5

In-app discovery

Companies like Twitter and Google use specialized service request and discovery layers, such as Finagle and Stubby, respectively. These RPC layers integrate service discovery directly into the application, querying ZooKeeper or Chubby (Google’s internal ZooKeeper-like system) when they need to find a server to which they can send a request. If this is possible in your environment, using a specialized service discovery layer offers the most flexibility in terms of routing, load balancing, and reliability due to the fact that you can control every aspect of the system. Unfortunately, if you need to integrate applications written in several languages, it can be onerous and time-consuming to develop parallel implementations of the discovery system for every language. Furthermore, this approach can be a nonstarter if you need to integrate with existing applications that weren’t developed with this service discovery mechanism built in.

The shortcomings of these approaches led many engineers to come up with a fourth pattern: one that combines the ease of integration of centralized load balancing with the fault tolerance of the decentralized options. With this approach, we will run a copy of HAProxy on every single machine in our data center. Then, we’ll assign each application a port (but not a hostname). Finally, we’ll keep all of our HAProxy instances pointing each application’s port at its backends, wherever they’re running on the cluster.
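To make the pattern concrete, every machine’s HAProxy configuration ends up with one small listen section per service port. A minimal sketch for a hypothetical service that was assigned port 9001, currently backed by two Marathon tasks, might look like this (the backend lines are exactly what Bamboo or your cron job rewrites whenever Marathon’s state changes):

listen my-service
    bind 127.0.0.1:9001
    mode http
    balance roundrobin
    server task1 10.141.141.10:31500 check
    server task2 10.141.141.11:31501 check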

This approach has two main benefits:

Ease of setup

You can use Bamboo or a simple cron job on every machine in your cluster to keep the local instances of HAProxy in sync with Marathon. Even if some part of your infrastructure is temporarily unavailable, the local HAProxy will continue to route traffic directly to the backends. This means that even a disruption of Marathon won’t impact the application’s operations.

Ease of use

Every application can communicate with any other microservice simply by connecting to localhost:$PORT, where $PORT is what you assigned to that microservice. There’s no need to write any specialized or custom code for your applications, as all you need to do is remember which port corresponds to which application.

There are also a few downsides to this approach. One is that you need to carefully maintain the mapping from service to port. If you try to store that mapping globally and have applications fetch it on startup, you’ll have to write custom code for every language again. On the other hand, if you hardcode the ports themselves, you run the risk of typos, which can be very difficult to debug.

The other downside is the lack of coordination between proxies. Our system continues to function even when partial failures happen, because each machine operates independently. Due to the independent operation of each machine, it’s possible for several HAProxies to all route to the same backend, thus spiking the load on that backend, which can result in cascading failures. It is normal to see the variance of request times increase as the number of nodes in a cluster increases.

Today, this approach of tying services to ports and having every client connect to HAProxy at localhost seems to be the most popular way to manage service discovery on Mesos clusters.

Running Mesos Frameworks on Marathon

A very common choice in Mesos clusters is to run the cluster’s Mesos frameworks on Marathon. But Marathon is a Mesos framework itself! So what does it mean to run a Mesos framework on Marathon? Rather than worrying about deploying each framework’s scheduler to specific hosts and dealing with those hosts’ failures, Marathon will ensure that the framework’s scheduler is always running somewhere on the cluster. This greatly simplifies deploying new frameworks in a highly reliable configuration.

What About Resource Allocation for Frameworks Run by Marathon?

New Mesos users often wonder, “If I run a framework on Marathon, how does that affect the allocation of resources?” In fact, although the framework’s scheduler is run by Marathon, the scheduler still must connect directly to Mesos, and it will receive resources in an identical way as if you ran it on specifically chosen machines. The benefit of running it on Marathon, of course, is that any Mesos slave can run the scheduler, and Marathon can restart it automatically if that slave fails.

What Is Chronos?

Chronos is a Mesos framework that provides a highly available, distributed time-based job scheduler, like cron. With Chronos, you can launch jobs that are either command-line programs (optionally downloaded from URIs) or Docker containers. Like Marathon, Chronos has a web UI and an easily programmable REST API. Although the Chronos endpoints have a slightly different interface than Marathon’s, we won’t go over their details, as the online documentation is sufficient. Chronos has four key features for scheduling jobs:

Interval scheduling

Chronos uses the standard ISO 8601 notation to specify repeating intervals. Through the web UI or REST API, you can specify the frequency at which you want the job to rerun. This makes Chronos suitable for running quick jobs every few minutes, or a nightly data generation job.

Dependency tracking

In theory, Chronos supports dependent jobs. Dependent jobs run only if their prerequisites ran successfully. Unfortunately, Chronos’s implementation of dependent jobs misses the mark. In Chronos, a job’s dependencies are considered satisfied if the last invocation of that job was successful. A dependent job is run as soon as all of its dependencies have been run at least once. This means that unless every parent has the exact same interval, the dependent jobs can run at potentially unexpected times. In “Chronos Operational Concerns”, we’ll look at a popular mitigation strategy for this.

Deferred start

Chronos can also be used to run a job once or on a repeating interval at a point in the future. When you specify the ISO 8601 interval on which to repeat the job, you also specify when you’d like the first run of the job to start. This means that you can use Chronos to schedule jobs to run days, weeks, or months in the future, which can be useful for planning purposes.

Worker scalability

The key differentiator between Chronos and most other job schedulers is that Chronos runs its jobs inside of containers on Mesos. This means that jobs are isolated and workers are scalable, so no matter what load Chronos generates, you’re able to add new Mesos slaves to handle the additional work and ensure that the jobs receive their necessary resources.

Like all production-ready Mesos frameworks, Chronos has high-availability support with hot spares—most Chronos deployments will run at least two instances of the scheduler, to ensure that Chronos never has more than a couple seconds of downtime if a scheduler or its host fails. Chronos also comes with a synchronization tool, chronos-sync.rb. This tool synchronizes the Chronos state to disk in JSON format. Using chronos-sync.rb, you can version the jobs you run on your cluster by committing the file to version control. This is the most popular approach to ensure repeatability, since it allows you to keep your Chronos configuration in a repository.
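The job definitions that chronos-sync.rb and the REST API work with are small JSON documents. As a rough sketch (field names abbreviated; see the Chronos documentation for the full schema), a job that first runs at midnight UTC on September 1 and then repeats every 15 minutes forever would look something like this:

{
    "name": "generate-report",
    "command": "python generate_report.py",
    "schedule": "R/2015-09-01T00:00:00Z/PT15M",
    "owner": "ops@example.com",
    "cpus": 0.5,
    "mem": 256
}

The "schedule" field is the ISO 8601 repeating interval mentioned earlier: R for “repeat forever” (or R5 for five repetitions), followed by the start time and the period between runs.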

The combination of its REST API and synchronize-to-disk tool make Chronos easy to integrate. Since its features are somewhat limited, it’s easy to use, but it’s not a complete solution for dependent workflow management.

Running Chronos on Marathon

One of the most common techniques seen in Mesos framework deployment is to run framework schedulers on Marathon. The reason for this is that those schedulers can rely on Marathon to ensure they’re always running somewhere. This removes a huge burden from the operator, who would otherwise need to ensure that every scheduler had a well-monitored, multimachine deployment.

In order to be deployable on Marathon, a scheduler must do some sort of leader election. As we’ll learn in Chapter 4, a scheduler connects to Mesos from a single host. This means that if we’d like to run our scheduler in two locations (e.g., to have a hot spare), we’ll need some sort of signaling to ensure that only one scheduler runs at a time. The most common way to solve this is to use ZooKeeper, since it has an easily integrated leader election component (see “Adding High Availability” for details). Marathon and Chronos both take this approach.

Let’s look at an example JSON expression for launching Chronos on Marathon (Example 3-16). In order to maximize the efficiency of our port allocations, we’ll allow Chronos to bind to any port. To connect to Chronos, we’ll either click through the Marathon UI or use one of the HAProxy service discovery mechanisms described earlier.

Example 3-16. Chronos JSON descriptor
{
    "cmd": "java -jar chronos.jar -Xmx350m -Xms350m ---port $PORT0", 1
    "uris": [
        "http://fetchable/uri/with/chronos.jar" 2
    ],
    "cpus": 1, 3
    "mem": 384, 3
    "instances": 2, 4
    "id": "chronos",
    "ports": [0], 5
    "constraints": [["hostname", "UNIQUE"]] 6
}
1

We assume that java is already installed on all the Mesos slaves. Thus, we only need to pass the dynamically assigned port to Chronos so that it can start up successfully.

2

Rather than installing Chronos on every Mesos slave, we’ll store its .jar file in some known location. This way, it’ll be downloaded on the fly to any slave that is assigned to run the Chronos scheduler, simplifying deployment.

3

We’ll choose a container size to match the amount of memory we allow the JVM to allocate (we leave 34 MB of headroom for the JVM itself).

4

Running two instances of Chronos gives us near-instantaneous failover, since the standby Chronos scheduler will already be running.

5

Chronos needs only one port to be assigned.

6

We will use a unique host constraint to ensure that the standby instance of Chronos is running on a different machine. If there are slave attributes that indicate the rack of each slave, it would be better to rack-isolate the scheduler and its standby.

The simplest way to verify that Chronos is running is to click on the assigned port number from the Marathon app view. Each task’s port assignment will be a link to that host and port combination, which will load the Chronos UI. The Chronos UI is somewhat buggy, so it’s recommended to use curl or chronos-sync.rb to configure your instance of Chronos.

Chronos Operational Concerns

Chronos is a useful part of the Mesos ecosystem; however, care must be taken not to overly rely on it. There are a few things that one must be aware of when running Chronos in production:

  1. The dependent jobs feature isn’t useful—you must use a workflow manager.

  2. Chronos is still a complex distributed system—it’s only necessary if you need jobs to run scalably in containers.

Almost no one uses the Chronos dependent jobs feature. Instead, they use Chronos to schedule invocations of popular data workflow tools. For simple workflows (such as running a few processes in order), Bash scripts are a popular option; however, beyond 20 lines, Bash scripts become unwieldy, and their syntax for complex loops and conditionals leaves much to be desired. make is another popular way to express dependencies, since it has superior syntax for workflows and supports basic parallelism. Luigi is a much richer and more powerful workflow manager. It supports checkpoints in HDFS and databases, which can improve the efficiency of retries of jobs that failed partway through, since it won’t need to redo earlier checkpointed work. Regardless of your particular needs and use case, you should use some other tool to order the jobs that Chronos will be scheduling.
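For instance, a nightly Chronos job’s command might be nothing more than an invocation of make against a Makefile that encodes the step ordering (the path and target name here are hypothetical):

make -j2 -C /opt/pipelines nightly_report

Chronos then only decides when the pipeline starts and reports whether the whole run succeeded, while make decides which steps run, in what order, and which of them can run in parallel.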

If you’ve made it this far, you’re almost ready to use Chronos in production! The final consideration is whether you truly need the scalable isolation that Chronos gets from Mesos. Even though you’re running a Mesos cluster, you don’t need to use Chronos for your task management. There are many other job-scheduling servers that could easily be run on Marathon; only use Chronos if you need each job to run in its own container.

Chronos on Marathon: Summary

Chronos is a powerful tool for Mesos clusters that need reliable, scalable, interval-based job scheduling. It powers the data generation processes at many companies, including Airbnb. Just remember the several quirks, workarounds, and advice discussed here!

Chronos on Marathon is just one example of how Marathon can be used to provide high availability and ease of deployment to other Mesos schedulers. DCOS, a commercial Mesos product, takes this further and requires all frameworks on DCOS to be hosted by Marathon. Whether you’re deploying Chronos, Kafka, or Cotton, Marathon is an excellent choice for hosting other Mesos frameworks’ schedulers.

Alternatives to Marathon + Chronos

Of course, Marathon (and Chronos) won’t fit every cluster’s needs. Let’s briefly look at two other popular Mesos PaaS frameworks: Aurora and Singularity. Both frameworks have a much steeper learning curve, but they offer many additional features.

Singularity

Singularity is a Mesos framework developed by HubSpot. It offers the ability to deploy services just like Marathon (including Dockerized applications); beyond this, if a service begins to fail its health checks after a deployment, Singularity will automatically roll back the deployment. Singularity also supports repeating jobs, like Chronos, as well as one-shot jobs, which can be triggered via a REST API. This simplifies adding containerized, asynchronous background processing to any service or repeating job. For managing service discovery and proxies, Singularity is integrated with another tool called Baragon, which manages and integrates HAProxy, Nginx, and Amazon’s Elastic Load Balancer.

Aurora

Aurora is the Mesos framework that powers Twitter. Aurora has a highly advanced job description API written in Python. This API, called Thermos, allows users to specify how to build and install applications, as well as the workflow and sequence of processes to run within the Mesos executor. Aurora comes with a high-performance distributed service discovery API with bindings for C++, Java, and Python. It’s able to run repeating jobs with cron-style syntax. Also, just like Singularity, Aurora can automatically detect when a service’s health checks start failing and roll it back to the previous working version. Aurora’s most powerful features are related to its multitenancy controls: Aurora can allow for some tasks to use spare cluster capacity, and then preempt those tasks when higher-priority production tasks need to be scheduled. It also can enforce quotas between users. These features were born out of the needs of a company running huge numbers of applications written by hundreds of developers on massive Mesos clusters; if that sounds like the kind of organization you’re building a Mesos cluster for, Aurora’s worth a closer look.

Summary

In this chapter, we learned how to build standard stateless server-based applications with Mesos. What we achieved was self-healing, self-restarting, and simplified service discovery—we basically built a PaaS on Mesos! This was accomplished by using Marathon, a popular Mesos framework.

Marathon launches, monitors, and restarts processes across your Mesos cluster. It simplifies rollout, allows for push-button scale up and scale down, and provides a simple JSON API that’s easy to integrate with.

First, we learned how to start up Marathon, how to secure it, and how to ensure that it’s highly available. Then, we learned how to write a Marathon service descriptor, which allowed us to launch the omnipresent SimpleHTTPServer. This was just an example application; any other application could be launched simply by changing the command line. We also learned how to programmatically query for information about the status of applications on Marathon.

Next, we learned about scaling applications on Marathon. Most importantly, we discussed placement constraints, and how to easily ensure that applications run on the correct class of slave machines and are spread evenly amongst racks.

Docker is the most popular application containerization technology. We learned how to launch Dockerized applications on Marathon with minimal fuss, and how to configure various Docker-specific options, such as mounting host disks.

Our applications wouldn’t be fully robust without including health checks and automatic incremental rollouts and rollbacks. We discussed how to configure TCP, HTTP, and command-based health checks, as well as how to specify the application’s behavior during upgrades. We also discussed how to subscribe to an event stream from Marathon to build custom integrations.

Then, we learned about integrating Marathon with a service discovery proxy. Bamboo, an open-source HAProxy configuration manager, is a powerful tool for synchronizing HAProxy with Marathon tasks. We also learned about popular approaches for building scalable, multilanguage microservices architectures with HAProxy and Marathon.

Finally, we looked at how to use Marathon to host other Mesos frameworks’ schedulers, using Chronos, a distributed cron-like framework, as an example. We also briefly surveyed alternative PaaS frameworks to Marathon, since they offer many additional features (at the cost of a steeper learning curve).

Perhaps Marathon and its alternatives don’t satisfy the needs of your application, and you need more control over how processes are launched or how failures are handled. In that case, you’ll need to write your own framework. In the next chapter, we’ll learn about the techniques used to build Marathon, so that you can build your own custom Mesos framework.

1 By this, I mean endpoints that may run a database query or make an external service call whose duration you can’t predict, and that may consume resources indefinitely.

2 If you’d like to use a private Docker repository, you can add the .dockercfg to the "uris" field of the application descriptor, as described in the Advanced Docker Configuration note in the section “Using Docker”.

3 These will be made horizontally scalable in a future version of Mesos. You can see whether these features have been added yet by checking MESOS-2533 and MESOS-3567 on https://issues.apache.org/jira.

4 SSL termination is when clients communicate with the proxy over a secure SSL connection, and the proxy handles the SSL so that the backends need not waste CPU resources on cryptography, or even support it.

5 The other options, DNS and in-app discovery, delegate communication directly to each application, so there’s no central system to configure for isolation. Instead, each application can handle its authorization in complete isolation from other applications.
