Appendix A. Roll Your Own Serverless Infrastructure

Here we will discuss a simple proof of concept (POC) for a serverless computing implementation using containers.

Note that the following POC is of an educational nature. It serves to demonstrate how one could go about implementing a serverless infrastructure and what logic is typically required; the discussion of its limitations at the end of this appendix will likely be of the most value for you, should you decide to roll your own infrastructure.

Flock of Birds Architecture

So, what is necessary to implement a serverless infrastructure? Astonishingly little, as it turns out: I created a POC called Flock of Birds (FoB), using DC/OS as the underlying platform, in a matter of days.

The underlying design considerations for the FoB proof of concept were:

  • The service should be easy to use and straightforward to integrate with other systems.

  • Executing different functions must not result in side effects; each function must run in its own sandbox.

  • Invoking a function should be as fast as possible; that is, long ramp-up times should be avoided when invoking a function.

Taken together, the requirements suggest a container-based implementation. Now let’s have a look at how we can address them one by one.

FoB exposes an HTTP API with three public and two internal endpoints:

  • POST /api/gen with a code fragment as its payload generates a new function; it sets up a language-specific sandbox, stores the user-provided code fragment, and returns a function ID, $fun_id.

  • GET /api/call/$fun_id invokes the function with ID $fun_id.

  • GET /api/stats lists all registered functions.

  • GET /api/meta/$fun_id is an internal endpoint that provides for service runtime introspection, effectively disclosing the host and port the container with the respective function is running on.

  • GET /api/cs/$fun_id is an internal endpoint that serves the code fragment that is used by the driver to inject the user-provided code fragment.

The HTTP API makes FoB easy to interact with and easy to integrate, for example by invoking functions programmatically.

Isolation in FoB is achieved through drivers. A driver is language-specific code that calls the user-provided code fragment. For an example, see the Python driver. The drivers are deployed through sandboxes, which are templated Marathon application specifications using language-specific Docker images. See Example A-1 for an example of the Python sandbox.

Example A-1. Python sandbox in FoB
{
    "id": "fob-aviary/$FUN_ID",
    "cpus": 0.1,
    "mem": 100,
    "cmd": "curl $FUN_CODE > fobfun.py && python fob_driver.py",
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "mhausenblas/fob:pydriver",
            "forcePullImage": true,
            "network": "BRIDGE",
            "portMappings": [
                {
                    "containerPort": 8080,
                    "hostPort": 0
                }
            ]
        }
    },
    "acceptedResourceRoles": [
        "slave_public"
    ]
}

At registration time, the id of the Marathon app is replaced with the actual UUID of the function, so fob-aviary/$FUN_ID turns into something like fob-aviary/5c2e7f5f-5e57-43b0-ba48-bacf40f666ba. Similarly, $FUN_CODE is replaced with the storage location of the user-provided code, something like fob.marathon.mesos/api/cs/5c2e7f5f-5e57-43b0-ba48-bacf40f666ba. When the container is deployed, the cmd is executed: it downloads the user-provided code into fobfun.py and then starts the driver, which calls that code.
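
The placeholder substitution described above could be sketched as follows. This is a minimal illustration and not FoB's actual code: the function name render_sandbox and the use of Python's string.Template are assumptions, and the template is a trimmed-down version of Example A-1.

```python
import json
from string import Template

# Trimmed-down version of the sandbox template from Example A-1.
SANDBOX_TEMPLATE = Template("""{
    "id": "fob-aviary/$FUN_ID",
    "cpus": 0.1,
    "mem": 100,
    "cmd": "curl $FUN_CODE > fobfun.py && python fob_driver.py"
}""")


def render_sandbox(fun_id, dispatcher="fob.marathon.mesos"):
    """Fill in both placeholders and return the app spec as a dict.

    render_sandbox is a hypothetical helper; the dispatcher host is the
    example value used in the text.
    """
    code_url = "{}/api/cs/{}".format(dispatcher, fun_id)
    rendered = SANDBOX_TEMPLATE.substitute(FUN_ID=fun_id, FUN_CODE=code_url)
    # Parsing the result also confirms that the rendered spec is valid JSON.
    return json.loads(rendered)
```

The resulting dict is what would then be submitted to Marathon as the app specification for the new function.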

Execution speed in FoB is improved by decoupling the registration and execution phases. The registration phase (that is, when the client invokes /api/gen) can take anywhere from several seconds to minutes, mainly determined by how fast the sandbox Docker image is pulled from a registry. When the function is invoked, the app server embedded in the driver container is already listening on its port, so it simply receives the request and immediately returns the result. In other words, the execution time is almost entirely determined by the properties of the function itself.
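
To make the driver's job more concrete, here is a rough sketch of its core: load the injected fobfun.py once at startup, then call callme for every request. The actual Python driver is not reproduced in this appendix, so the helper names load_callme and invoke are hypothetical, and the embedded app server is left out.

```python
import importlib.util


def load_callme(path="fobfun.py"):
    """Import the injected code fragment and return its callme function."""
    spec = importlib.util.spec_from_file_location("fobfun", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.callme


def invoke(callme, params):
    """Call the function with keyword parameters and wrap the result
    in a JSON-ready dict, mirroring the response shape of /api/call."""
    return {"result": callme(**params)}
```

Because the code fragment is loaded only once, each invocation costs little more than the function call itself, which is the point of separating registration from execution.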

Figure A-1 shows the FoB architecture, including its two main components: the dispatcher and the drivers.

Figure A-1. Flock of Birds architecture

A typical flow would be as follows:

  1. A client posts a code snippet to /api/gen.

  2. The dispatcher launches the matching driver along with the code snippet in a sandbox.

  3. The dispatcher returns $fun_id, the ID under which the function is registered, to the client.

  4. The client calls the function registered above using /api/call/$fun_id.

  5. The dispatcher routes the function call to the respective driver.

  6. The result of the function call is returned to the client.

Both the dispatcher and the drivers are stateless. State is managed through Marathon, using the function ID and a group where all functions live (by default called fob-aviary).
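
The six steps above can be mimicked with a toy, in-process dispatcher. This is only a sketch of the control flow under strong simplifications: there is no HTTP, no container, and no sandboxing, and a plain dict stands in for the state that FoB keeps in Marathon. The class name ToyDispatcher is made up for this illustration.

```python
import uuid


class ToyDispatcher:
    """In-process stand-in for the FoB dispatcher (no containers, no HTTP)."""

    def __init__(self):
        # fun_id -> callable; FoB itself manages this state through Marathon.
        self._functions = {}

    def gen(self, code):
        """Steps 1-3: register a code snippet and return its function ID."""
        namespace = {}
        # FoB launches a sandboxed driver here; exec() offers no isolation.
        exec(code, namespace)
        fun_id = str(uuid.uuid4())
        self._functions[fun_id] = namespace["callme"]
        return fun_id

    def call(self, fun_id, **params):
        """Steps 4-6: route the call to the function and return the result."""
        return {"result": self._functions[fun_id](**params)}

    def stats(self):
        """List the IDs of all registered functions."""
        return {"functions": list(self._functions)}
```

Running untrusted code with exec() in the dispatcher's own process is exactly what the sandbox design avoids, so treat this purely as a way to follow the flow.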

Interacting with Flock of Birds

With an understanding of the architecture and the inner workings of FoB, as outlined in the previous section, let’s now have a look at the concrete interactions with it from an end user’s perspective. The goal is to register two functions and invoke them.

First we need to provide the functions, according to the required signature in the driver. The first function, shown in Example A-2, prints Hello serverless world! to standard out and returns 42 as a value. This code fragment is stored in a file called helloworld.py, which we will use shortly to register the function with FoB.

Example A-2. Code fragment for the “hello world” function
def callme():
    print("Hello serverless world!")
    return 42

The second function, stored in add.py, is shown in Example A-3. It takes two numbers as parameters and returns their sum.

Example A-3. Code fragment for the add function
def callme(param1, param2):
    if param1 and param2:
        return int(param1) + int(param2)
    else:
        return None

For the next steps, we need to figure out where the FoB service is available. The result (IP address and port) is captured in the shell variable $FOB.

Now we want to register helloworld.py using the /api/gen endpoint. Example A-4 shows the outcome of this interaction: the endpoint returns the function ID we will subsequently use to invoke the function.

Example A-4. Registering the “hello world” function
$ http POST $FOB/api/gen < helloworld.py
HTTP/1.1 200 OK
Content-Length: 46
Content-Type: application/json; charset=UTF-8
Date: Sat, 02 Apr 2016 23:09:47 GMT
Server: TornadoServer/4.3

{
    "id": "5c2e7f5f-5e57-43b0-ba48-bacf40f666ba"
}

We do the same with the second function, stored in add.py, and then list the registered functions as shown in Example A-5.

Example A-5. Listing all registered functions
$ http $FOB/api/stats
{
    "functions": [
        "5c2e7f5f-5e57-43b0-ba48-bacf40f666ba",
        "fda0c536-2996-41a8-a6eb-693762e4d65b"
    ]
}

At this point, the functions are available and ready to be used. Let’s now invoke the add function with the ID fda0c536-2996-41a8-a6eb-693762e4d65b, which takes two numbers as parameters. Example A-6 shows the interaction with /api/call, including the result of the function execution, which is, as expected, 2 (since both parameters we provided were 1).

Example A-6. Invoking the add function
$ http $FOB/api/call/fda0c536-2996-41a8-a6eb-693762e4d65b?param1:1,param2:1
{
    "result": 2
}

As you can see in Example A-6, you can also pass parameters when invoking the function. If the number or type of the parameters is incorrect, you’ll receive an HTTP 404 status code with the appropriate error message as the JSON payload; otherwise, you’ll receive the result of the function invocation.
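
A check of the parameter count could look roughly like the following sketch. It assumes that the param1:1,param2:1 string from Example A-6 reaches the server unchanged. How FoB actually parses and validates parameters is not shown here, so parse_params and check_cardinality are hypothetical helpers.

```python
import inspect


def parse_params(raw):
    """Turn 'param1:1,param2:1' into {'param1': '1', 'param2': '1'}."""
    params = {}
    for pair in filter(None, raw.split(",")):
        name, _, value = pair.partition(":")
        params[name.strip()] = value.strip()
    return params


def check_cardinality(func, params):
    """Return None if params fit func's signature, else an error message."""
    try:
        inspect.signature(func).bind(**params)
    except TypeError as err:
        return str(err)
    return None
```

A server could run check_cardinality before invoking the function and turn a non-empty message into the error payload. Type errors, such as a non-numeric value passed to the add function, would still surface only when the function runs.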

Limitations of Flock of Birds

Naturally, FoB has a number of limitations, which I’ll highlight in this section. If you end up implementing your own solution, you should be aware of these challenges. Ordered from most trivial to most crucial for production-grade operations, the things you’d likely want to address are:

  • The only programming language FoB supports is Python. Depending on the requirements of your organization, you’ll likely need to support a number of programming languages. Supporting other interpreted languages, such as Ruby or JavaScript, is straightforward; however, for compiled languages you’ll need to figure out a way to inject the user-provided code fragment into the driver.

  • If exactly-once execution semantics are required, it’s up to the function author to guarantee that the function is idempotent.

  • Fault tolerance is limited. While Marathon takes care of container failover, there is one component that needs to be extended to survive machine failures. This component is the dispatcher, which stores the code fragment in local storage, serving it when required via the /api/cs/$fun_id endpoint. In order to address this, you could use an NFS or CIFS mount on the host or a solution like Flocker or REX-Ray to make sure that when the dispatcher container fails over to another host, the functions are not lost.

  • A rather essential limitation of FoB is that it doesn’t support autoscaling of the functions. In serverless computing, this is certainly a feature supported by most commercial offerings. You can add autoscaling to the respective driver container to enable this behavior.

  • There are no integration points or explicit triggers. As FoB is currently implemented, the only way to execute a registered function is through knowing the function ID and invoking the HTTP API. In order for it to be useful in a realistic setup, you’d need to implement triggers as well as integrations with external services such as storage.
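
To illustrate the point about idempotency in the list above: one common way to make a function safe to retry is to deduplicate on a request ID. The sketch below assumes that the caller supplies such an ID, which FoB as described does not do. The decorator name idempotent and the in-memory result store are likewise illustrative only.

```python
import functools


def idempotent(func):
    """Run func at most once per request ID and replay the stored result."""
    results = {}  # request ID -> result; in-memory only, lost on restart

    @functools.wraps(func)
    def wrapper(request_id, *args, **kwargs):
        if request_id not in results:
            results[request_id] = func(*args, **kwargs)
        return results[request_id]

    return wrapper


counter = {"value": 0}


@idempotent
def callme(amount):
    """Side-effecting example: add amount to a running total."""
    counter["value"] += int(amount)
    return counter["value"]
```

Calling callme twice with the same request ID applies the side effect only once. In a real deployment the result store would have to live outside the container, since the container can be restarted or moved at any time.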

By now you should have a good idea of what it takes to build your own serverless computing infrastructure.

For a selection of pointers to in-use examples and other useful references, see Appendix B.
