Chapter 7. Collaborative Chaos

In Chapter 1 you learned that collaboration is key to successful chaos engineering. When you are running a chaos experiment, whether it be a Game Day or an automated experiment, everyone should be aware that chaos is happening (more on this in Chapter 10).

This collaboration also extends to your experiments and findings themselves. While you’ve seen how useful chaos engineering can be for you and your team as you surface, explore, and overcome weaknesses in your systems, it is potentially useful to others outside of your team as well.

The good news is that, because you put so much effort into defining hypotheses and experiments in those JSON or YAML documents with the Chaos Toolkit in earlier chapters, those experiments are almost ready to become useful candidates for cross-team sharing and even potential reuse. In this chapter you’ll learn how to make a few final tweaks so that you can share and reuse your experiments, and then how to share the findings they produce.

Sharing Experiment Definitions

A few particular aspects of an experiment may stop it from being shareable: embedded configuration values and, even worse, embedded secrets. You probably already know that it’s a bad idea to share secrets in plain text in any of your development artifacts, and the same caution applies to your experiments. In addition, there are likely configuration values that vary from system to system; externalizing those values is also good practice, so that anybody looking to reuse your experiment directly knows exactly which configuration values they need to supply or amend to do so.

Let’s take a Chaos Toolkit chaos experiment candidate as an example (see Example 7-1).

Example 7-1. An experiment that contains embedded configuration and secrets
{
    "version": "1.0.0",
    "title": "Simply retrieves all events from Instana",
    "description": "Demonstrates retrieving all the events for a time window and
    adding them into the experiment's journal",
    "tags": [
        "instana"
    ],
    "secrets": {
        "instana": {
            "instana_api_token": "1234567789"
        }
    },
    "configuration": {
        "instana_host" : "http://myhost.somewhere"
    },
    "steady-state-hypothesis": {
        "title": "Services are all available and healthy",
        "probes": [{
                "type": "probe",
                "name": "app-must-respond",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "url": "http://192.168.39.7:31546/invokeConsumedService"
                }
            }
        ]
    },
    "method": [
		{
            "type": "probe",
            "name": "get-all-events-in-window-from-instana",
            "provider": {
                "secrets": ["instana"],
                "type": "python",
                "module": "chaosinstana.probes",
                    "func": "get_all_events_in_window",
                    "arguments": {
                        "from_time": "5 minutes ago"
                    }
            }
        }
    ],
    "rollbacks": []
}

This experiment helps to illustrate the two aspects we’re looking to externalize: configuration and secrets. The experiment does nothing more than attach to the third-party Instana service and retrieve a collection of system events, which it then dutifully places in the experiment’s output, the journal.

About This Experiment

This sample experiment actually comes from the Instana Chaos Toolkit driver’s source code. Instana is an application performance management platform, and it’s common to use a driver1 that integrates with other tooling like this to inspect the state of an application as well as to provide observability around chaos experiments as they are executed (see Chapter 10).

There are some obvious configuration and secrets blocks that are needed to talk to Instana in the experiment method’s single probe, and—dangerously in the case of secrets—the values are also specified in the experiment. In addition, the URL is specified for the app-must-respond probe in the experiment’s steady-state hypothesis. The URL could vary from one environment to the next and thus would also be a worthy candidate for being made a configurable item.

To make this experiment more reusable, and much more shareable, you will:

  • Move the URL for the app-must-respond probe into a configuration item.

  • Move all of the experiment’s configuration so that it is then sourced from the experiment’s runtime environment.

  • Move all of the experiment’s secrets so that they are sourced from the experiment’s runtime environment.

Moving Values into Configuration

The app-must-respond probe’s URL should really be declared as a configurable value so that your experiment can be pointed at different target systems. To do this, you create a new item in the configuration block to contain the application_endpoint configurable property:

{
    ...

    "secrets": {
        "instana": {
            "instana_api_token": "1234567789"
        }
    },
    "configuration": {
        "instana_host" : "http://myhost.somewhere",
        "application_endpoint": "http://192.168.39.7:31546/
        invokeConsumedService"
    },
    "steady-state-hypothesis": {
        "title": "Services are all available and healthy",
        "probes": [{
                "type": "probe",
                "name": "app-must-respond",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "url": "http://192.168.39.7:31546/invokeConsumedService"
                }
            }
        ]
    },

    ...
}

You can now refer to your new application_endpoint configuration property using its name wrapped in ${} when it is needed in the app-must-respond probe:

{
    ...

    "steady-state-hypothesis": {
        "title": "Services are all available and healthy",
        "probes": [{
                "type": "probe",
                "name": "app-must-respond",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "url": "${application_endpoint}"
                }
            }
        ]
    },

    ...
}

Introducing the application_endpoint configuration property is a good start, but it is still a hardcoded value in the experiment. The instana_host configuration property also suffers from this limitation. Adjusting the Instana host and your application endpoint for different environments would require the creation of a whole new experiment just to change those two values. We can do better than that by shifting those configuration properties into environment variables.

Specifying Configuration Properties as Environment Variables

A benefit of environment variables is that they can be specified and changed without touching the source of your experiments, making your experiments even more reusable. To have a configuration property populated from an environment variable, you replace its hardcoded value with a mapping that declares the type of source and the key, which names the environment variable that supplies the value. This example shows the change for the application_endpoint property:

{
    ...

    "configuration": {
        "instana_host" : "http://myhost.somewhere",
        "application_endpoint": {
            "type" : "env",
            "key"  : "APPLICATION_ENDPOINT"
        }
    },

    ...
}

type specifies env, which tells the Chaos Toolkit to grab the value for this configuration property from the runtime environment. key then specifies the name of the environment variable that will supply that value.

Now you can do the same with the instana_host configuration property, leaving your experiment in a much better state—these two properties can then be configured at runtime depending on what Instana service is being used and where your application’s endpoint is in that context:

{
    ...

    "configuration": {
        "instana_host": {
            "type": "env",
            "key": "INSTANA_HOST"
        },
        "application_endpoint": {
            "type": "env",
            "key": "APPLICATION_ENDPOINT"
        }
    },

    ...
}
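
With both properties externalized, you supply their values at runtime through the environment. The following is a minimal sketch of how that might look from a shell before running the experiment; the Instana host value and the experiment.json filename are placeholders for your own setup:

$ export INSTANA_HOST=https://my-instana.example.com
$ export APPLICATION_ENDPOINT=http://192.168.39.7:31546/invokeConsumedService
$ chaos run experiment.json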

Externalizing Secrets

While externalizing configuration properties from experiments is good practice, externalizing secrets is great practice, if not absolutely essential. You should never have secrets embedded in your source code, and as automated chaos experiments are also code, it stands to reason that the same rule applies there.

At the moment, you do have a secret: the instana_api_token, which is hanging around in your experiment’s code, just waiting to be committed and shared with everyone by accident:

{
    ...

    "secrets": {
        "instana": {
            "instana_api_token": "1234567789"
        }
    },

    ...

The embedding of a secret like that is a major no-no, so let’s rectify that now. You have a couple of options as to where secrets can be sourced from when using the Chaos Toolkit, but for simplicity’s sake, let’s change this embedded secret so that it is sourced from an environment variable, as you did with the experiment’s configuration properties:

{
    ...

    "secrets": {
        "instana": {
            "instana_api_token": {
                "type": "env",
                "key": "INSTANA_API_TOKEN"
            }
        }
    },

Now the instana_api_token secret is being sourced from the INSTANA_API_TOKEN environment variable, and your experiment is ready to be shared safely.
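
As a quick, illustrative sketch, supplying the secret at runtime looks just like supplying the configuration properties; the token shown is obviously a placeholder for your own value:

$ export INSTANA_API_TOKEN=<your-real-instana-api-token>
$ chaos run experiment.json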

Scoping Secrets

You might have noticed in the previous example that the secrets block has an additional level of nesting when compared with the configuration block:

{
    ...

    "secrets": {
        "instana": {
            "instana_api_token": {
                "type": "env",
                "key": "INSTANA_API_TOKEN"
            }
        }
    },
    "configuration": {
        "instana_host" :
        {
            "type" : "env",
            "key"  : "INSTANA_HOST"
        }
    },

    ...

The configuration block directly contains the instana_host configuration property, but the instana_api_token secret is captured in an additional instana block. This is because secrets in Chaos Toolkit experiments are scoped to a specific named container. When a secret is used in an experiment, you must specify which secrets container is to be made available to the activity. You can’t specify the names of the secrets explicitly, but you can specify the named container that has the needed secrets:

{
    "type": "probe",
    "name": "get-all-events-in-window-from-instana",
    "provider": {
        "secrets": ["instana"],
        "type": "python",
        "module": "chaosinstana.probes",
        "func": "get_all_events_in_window",
        "arguments": {
            "from_time": "5 minutes ago"
        }
    }
}

This added level of control for secrets is purposeful. What you don’t see is that configuration properties are not only available to be passed explicitly as values to your probes and actions; they are also passed to every underlying driver and control that the Chaos Toolkit uses.

That works well for configuration, but it’s the last thing you’d want for secrets. Suppose someone fooled you into adding a custom extension to your Chaos Toolkit that disseminated secrets anywhere it chose; if secrets were passed around as liberally as configuration, that attack would be trivial and dangerous!2

This is why secrets are scoped in their own named containers that must be explicitly passed to the activities in which they are to be used, as shown here:

{
    ...
    "method": [
		{
            "type": "probe",
            "name": "get-all-events-in-window-from-instana",
            "provider": {
                "secrets": ["instana"],
                "type": "python",
                "module": "chaosinstana.probes",
                    "func": "get_all_events_in_window",
                    "arguments": {
                        "from_time": "5 minutes ago"
                    }
            }
        }
    ],
    ...
}

In this experiment snippet, the get-all-events-in-window-from-instana probe needs access to the secrets scoped in the instana secrets container, and so this is explicitly specified using the secrets property on the probe’s provider.
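
To see why that extra level of naming matters, imagine an experiment that talks to more than one third-party service. The following sketch is purely illustrative (the slack container and its SLACK_TOKEN are hypothetical additions, not part of the Instana experiment), but it shows how each activity is granted only the container it asks for:

{
    ...

    "secrets": {
        "instana": {
            "instana_api_token": {
                "type": "env",
                "key": "INSTANA_API_TOKEN"
            }
        },
        "slack": {
            "slack_token": {
                "type": "env",
                "key": "SLACK_TOKEN"
            }
        }
    },

    ...
}

A probe whose provider declares "secrets": ["instana"] sees only the Instana token; the hypothetical Slack token is never handed to it.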

Specifying a Contribution Model

Anyone who encounters a shared experiment will be able to tell the hypothesis being explored, the method being employed to inject turbulent conditions, and even any suggested rollbacks. Those things alone will be pretty valuable to them, even if they are just looking for inspiration for their own experiments against their own system.

One piece of the puzzle is still not being taken advantage of, though, and it’s something that you had available to you in your Hypothesis Backlog (see Chapter 2) and that you added into your Game Day preparations (see Chapter 3). Your backlog and your Game Day plans both mentioned the contributions to trust and confidence that an experiment might make if it were employed as a Game Day or automated chaos experiment.

The Chaos Toolkit experiment format does allow you to specify this information as a “contribution model” for a given experiment:

{
    ...

    "contributions": {
        "availability": "high",
        "reliability": "high",
        "safety": "medium",
        "security": "none",
        "performability": "none"
    },

    ...
}

The experiment can declare a set of contributions, with each contribution rated as none, low, medium, or high depending on how much the experiment author thinks the experiment contributes trust and confidence to a particular system quality. The low, medium, and high values likely need little explanation, but none is usually a bit more surprising.

The none setting is available for when you want to be explicit that an experiment does not contribute any trust or confidence to a particular system quality. Not specifying the contribution to a system quality leaves the degree of trust and confidence that the experiment might bring to that quality up to the reader’s interpretation. By specifying none, the experiment author is communicating clearly that there was no intention to add anything to trust and confidence in the specified system quality.

While specifying a contribution model adds even more information to help a reader’s comprehension when the experiment is shared, that’s not all a contribution model adds. It also has a part to play when the results of an experiment’s execution are being shared, and to facilitate that, it’s time to look at how to create an experimental findings report.

Creating and Sharing Human-Readable Chaos Experiment Reports

When you execute a Chaos Toolkit experiment, it produces a journal of everything found when the experiment ran (see Chapter 4). This journal is a complete record of everything about the experiment and its context as the experiment ran, but although it’s captured in a JSON file, named journal.json by default,3 it’s not the easiest thing for us humans to parse.
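
To give a feel for why the raw journal is hard on human eyes, here is an abridged, purely illustrative sketch of the kind of content it holds; the exact fields and values will vary with your Chaos Toolkit version and experiment:

{
    "status": "completed",
    "deviated": false,
    "start": "2019-05-20T10:15:00.000000",
    "end": "2019-05-20T10:15:42.000000",
    "duration": 42.0,
    "experiment": { ... the full experiment definition ... },
    "steady_states": { ... probe results before and after the method ... },
    "run": [ ... every activity executed, with its output ... ],
    "rollbacks": []
}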

It would be much nicer if you could produce a human-readable report from the journal of an experiment’s findings—and with the chaostoolkit-reporting plug-in, you can.

Creating a Single-Experiment Execution Report

The chaostoolkit-reporting plug-in extends the toolkit’s chaos command with the report subcommand so that you can generate reports in PDF, HTML, and a number of other formats. Installing the plug-in takes some effort because it requires native dependencies to work, so here we’re going to use the prepackaged Docker container of the Chaos Toolkit instead, as it comes with the reporting plug-in already installed.
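
If you would rather install the plug-in natively, the rough shape of it is sketched below; be aware that the plug-in also relies on native tooling such as pandoc (and a LaTeX distribution for PDF output), which is exactly the effort the Docker image saves you, so treat this as an outline and check the plug-in’s documentation for your platform:

$ pip install -U chaostoolkit-reporting
$ chaos report --export-format=pdf journal.json report.pdf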

The first thing you need to do is to make sure that you have Docker installed for your platform. Once you have Docker installed, you should see something like the following when you enter docker -v from your command line:

$ docker -v
Docker version 18.09.2, build 6247962

You might get a different version and build number, but as long as the command runs, you should be all set. Now you can download the prepackaged Chaos Toolkit Docker image:4

$ docker pull chaostoolkit/reporting
Using default tag: latest
latest: Pulling from chaostoolkit/reporting
bf295113f40d: Pull complete
62fe5b9a5ae4: Pull complete
6cc848917b0a: Pull complete
053381643ee3: Pull complete
6f73bfabf9cf: Pull complete
ed97632415bb: Pull complete
132b4f713e93: Pull complete
718eca2312ee: Pull complete
e3d07de070a7: Pull complete
17efee31eaf2: Pull complete
921227dc4c21: Pull complete
Digest: sha256:624032823c21d6626d494f7e1fe7d5fcf791da6834275f962d5f99fb7eb5d43d
Status: Downloaded newer image for chaostoolkit/reporting:latest

Once you’ve downloaded and extracted the image you can locate a journal.json file from an experiment’s execution and, from within the directory containing that file, run the following command to generate your own human-readable and shareable PDF-format report:

$ docker run \
    --user `id -u` \
    -v `pwd`:/tmp/result \
    -it \
    chaostoolkit/reporting

Running this docker command will produce, by default, a PDF report based on the single journal.json file in your current directory. It’s that simple, but it gets even better. If you have access to a number of journals (maybe you’ve been collecting them over time based on a single experiment, or maybe you’ve collected them during the execution of many different experiments that target the same system), then you can use the chaos report command to consume all of those experimental-finding journals to give you an even more powerful report.
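
One simple way to build up such a collection, assuming a recent Chaos Toolkit where the chaos run command accepts a --journal-path option (see Appendix A), is to give each run its own journal file name:

$ chaos run --journal-path=journal-1.json experiment.json
$ chaos run --journal-path=journal-2.json experiment.json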

Creating and Sharing a Multiple Experiment Execution Report

To create a report based on multiple experimental-findings journals, you just need to feed references to all those files into the chaos report command—or in our case, into the following docker command (if you had a collection of journal files that each began with journal-, such as journal-1.json and journal-2.json):

$ docker run \
    --user `id -u` \
    -v `pwd`:/tmp/result \
    -it \
    chaostoolkit/reporting -- report --export-format=html5 journal-*.json report.html

This command will take all of the specified journals and produce one über-report ready for sharing. Here the --export-format=html5 option and the final report.html argument produce an HTML report; leave the format unspecified and you get a PDF instead, as in the single-journal example. And an additional piece of value is added to the report if you specified a contribution model in each of the contributing experiments. If you had a contributions block in each of the experiments that produced your journals, then you will see a number of overview charts in your newly created report that show a picture of how the collective experiments contributed to trust and confidence across a range of system qualities!5

Summary

In this chapter you’ve looked at how it’s crucial to enable collaboration around experiments, how experimental findings are recorded through journal.json files, and even how those files can be turned into valuable reports in various formats. In Chapter 8 you’ll turn up the collaboration dial one notch further and see how and why you can customize your own automated chaos experiment platform with features that can be shared across your team and organization.

It’s time to customize your Chaos Toolkit.

1 Or you can create your own driver—see Chapter 8.

2 Of course, you would have had to install the malicious Chaos Toolkit extension, but it would be even better if you could avoid the danger altogether.

3 You can amend this—see Appendix A.

4 This may take a little while, as the image is on the large side!

5 These are important summaries for anybody responsible for the reliability of a system!
