2 A basic pipeline

In this chapter

  • working with the basic building blocks: pipelines and tasks
  • learning the elements of a basic CD pipeline: linting, testing, building, publishing, and deploying
  • understanding the role of automation in the execution of pipelines: webhooks, events, and triggering
  • exploring the varied terminology in the CD space

Before we get into the nitty-gritty of how to create great continuous delivery (CD) pipelines, let’s zoom out and take a look at pipelines as a whole. In this chapter, we’ll look at some pipelines at a high level and identify the basic elements you should expect to see in most CD pipelines.

Cat Picture Website

To understand what goes into basic CD pipelines, we’ll take a look at the pipelines used for Cat Picture Website. Cat Picture Website is the best website around for finding and sharing cat pictures! The way it’s built is relatively simple, but since it’s a popular website, the company that runs it (Cat Picture, Inc.) has architected it into several services.

What’s CD again?

We use CD in this book to refer to continuous delivery. See chapter 1 for more.

The company runs Cat Picture Website in the cloud (its cloud provider is called Big Cloud, Inc.) and it uses some of Big Cloud’s services, such as Big Cloud Blob Storage service.

What’s a pipeline?

Don’t worry, we’ll get into that in a couple of pages!

  

Cat Picture Website source code

The architecture diagram tells us how Cat Picture Website is architected, but to understand the CD pipeline, there’s another important thing to consider: where does the code live?

In chapter 1, we looked at the elements of CD, half of which is about using continuous integration (CI) to ensure that our software is always in a releasable state. Let’s look at the definition again:

CI is the process of combining code changes frequently, with each change verified on check-in.

When we look at what we’re actually doing when we do CD, we can see that the core is code changes. This means that the input to our CD pipelines is the source code. In fact, this is what sets CD pipelines apart from other kinds of workflow automation: CD pipelines almost always take source code as an input.

Version control

Using a version control system such as Git is a prerequisite for CD. Without having your code stored with history and conflict detection, it is practically impossible to have CD. More on this in chapter 3.

Before we look at Cat Picture Website CD pipelines, we need to understand how its source code is organized and stored. The folks working on Cat Picture Website store their code in several code repositories (repos):

  • The Frontend repo holds the code for the frontend.

  • The Picture service, User service, and the database schemas are all stored in the Service repo.

  • Lastly, Cat Picture Website uses a config-as-code approach to configuration management (more on this in chapter 3), storing its configuration in the Config repo.

The Cat Picture Website developers could have organized their code in lots of other ways, all with their own pros and cons.

Cat Picture Website pipelines

Since Cat Picture Website is made up of several services, and all the code and configuration needed for it is spread across several repos, the website is managed by several CD pipelines. We’ll go over all of these pipelines in detail in future chapters as we examine more advanced pipelines, but for now we’re going to stick to the basic pipeline that is used for the User service and the Picture service.

Since these two services are so similar, the same pipeline is used for both, and that pipeline will show us all of the basic elements we’d expect to see in a pipeline.

Vocab time

Container images are executable software packages that contain everything needed to run that software.

This pipeline is not only used for Cat Picture Website, but also has the basic elements that you’ll see in all the pipelines in this book!

When does this actually get run? We’ll get to that in a few pages, and go in depth in chapter 10.

What’s a pipeline? What’s a task?

We just spent a few pages looking at the Cat Picture Website pipeline, but what is a pipeline anyway? A lot of different terminology exists in the CD space. While we’re using the term pipeline, some CD systems use other terms like workflow. We’ll have an overview of this terminology at the end of the chapter, but for now let’s take a look at pipelines and tasks.

Tasks are individual things you can do; you can think of them a lot like functions. And pipelines are like the entry point to code, which calls all the functions at the right time, in the right order.

The following is a pipeline, represented as Python code, with three tasks: Task A runs first, then Task B, and the pipeline ends with Task C.
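A minimal sketch of such a pipeline, assuming each task does nothing more than print a greeting; the function names task_a, task_b, and task_c are illustrative:

```python
def task_a():
    print("Hello from task A!")


def task_b():
    print("Hello from task B!")


def task_c():
    print("Hello from task C!")


def pipeline():
    # The pipeline is the entry point: it calls each task in order.
    task_a()
    task_b()
    task_c()


if __name__ == "__main__":
    pipeline()
```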

CD pipelines will get run again and again; we’ll talk more about when in a few pages. If we were to run the pipeline() function (representing the preceding pipeline), we’d get this output:

Hello from task A!
Hello from task B!
Hello from task C!

The basic tasks in a CD pipeline

The Cat Picture Website pipeline shows us all the basic tasks that you will see in most pipelines. We’ll be looking at these basic tasks in detail in the next chapters. Let’s review what each task in the Cat Picture Website pipeline is for:

  • Linting catches common programming and style errors in the Picture service and User service code.

  • Unit and integration tests verify that the Picture service and User service code does what the authors intended.

  • After the code has been linted and tested, the build image task builds container images for each of the services.

  • Next we upload the container images to an image registry.

  • Last, the running version of the software is updated to use the new images.

Each task in the Cat Picture Website pipeline is representative of a basic pipeline element:

  • Linting is the most common form of static analysis in CD pipelines.

  • Unit and integration tests are forms of tests.

  • These services are built into images; to use most software, you need to build it into another form before it can be used.

  • Container images are stored in and retrieved from registries; as you saw in chapter 1, some kinds of software will need to be published in order to be used.

  • Cat Picture Website needs to be up and running so users can interact with it. Updating the running service to use the new image is how the website is deployed.

These are the basic types of tasks you’ll see in a CI/CD pipeline: linting, testing, building, publishing, and deploying.

Gates and transformations

Some tasks are about verifying your code. They are quality gates that your code has to pass through.

Other tasks are about changing your code from one form to another. They are transformations of your code: your code goes in as input and comes out in another form.

Looking at the tasks in a CD pipeline as gates and transformations goes hand in hand with the elements of CD. In chapter 1, you learned that you’re doing CD when

  • you can safely deliver changes to your software at any time.

  • delivering that software is as simple as pushing a button.

If you squint at those, they map 1:1 to gates and transformations:

  • Gates verify the quality of your code changes, ensuring it is safe to deliver them.

  • Transformations build, publish, and, depending on the kind of software, deploy your changes.

And in fact, the gates usually make up the CI part of your pipeline!

CI is all about verifying your code! You’ll often hear people talk about “running CI” or “CI failing,” and usually they’re referring to gates.

CD: Gates and transformations

Let’s take a look at our basic CD tasks again and see how they map to gates and transformations:

  • Code goes into gating tasks, and they either pass or fail. If they fail, the code should not continue through the pipeline.

  • Code goes into transformation tasks, and it changes into something completely different, or changes are made to some part of the world using it.

Basic CD tasks map to gates and transformations like this:

  • Linting is all about looking at the code and flagging common mistakes and bugs, but without actually running the code. Sounds like a gate to me!

  • Testing activities verify that the code does what we intended it to do. Since this is another example of code verification, this sounds like a gate too.

  • Building code is about taking code from one form and transforming it into another form so that it can be used. Sometimes this activity will catch issues with the code, so it has aspects of CI. However, in order to test our code, we probably need to build it, so the main purpose here is to transform (build) the code.

  • Publishing code is about putting the built software somewhere so that it can be used. This is part of releasing that software. (For some code, such as libraries, this is all you need to do in order to release it!) This sounds like a kind of transformation too.

  • Lastly, deploying the code (for kinds of software that need to be up and running) is a kind of transformation of the state of the built software.
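One way to picture the difference is as two shapes of function. The sketch below is illustrative only: the GateFailed exception, the toy lint rule, and the pretend “build” are assumptions made for the example, not part of any real CD system.

```python
class GateFailed(Exception):
    """Raised when a gate rejects the code."""


def lint_gate(source: str) -> None:
    # A gate inspects the code and either passes silently or fails.
    # Toy rule (an assumption for illustration): no tab characters.
    if "\t" in source:
        raise GateFailed("linting: tab character found")


def build_transformation(source: str) -> bytes:
    # A transformation takes the code in one form and returns another.
    # Here "building" is just encoding the text to bytes.
    return source.encode("utf-8")


# How the basic CD tasks map to the two kinds.
TASK_KINDS = {
    "linting": "gate",
    "testing": "gate",
    "building": "transformation",
    "publishing": "transformation",
    "deploying": "transformation",
}
```

A gate has no useful return value; all that matters is whether it raises. A transformation is only worth calling for what it returns or changes.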

Okay, you said the gates are the CI tasks. Are you saying CI is just about tests and linting? I remember before CD, CI included building, too.

I hear you! CI does often include building, and sometimes folks throw publishing in there also. What really matters is having a conceptual framework for these activities, so in this book I choose to treat CI as being about verification, and not building/publishing/deploying/releasing.

Cat Picture Website service pipeline

What does the Cat Picture Website service pipeline look like if we view it as a pipeline of gates and transformations?

The first gate the code must pass through is linting. If there are linting problems in the code, we shouldn’t start transforming the code and delivering it; these problems should be fixed first.

The other gate the code must pass through is unit and integration tests. Just as with linting, if these tests reveal that the code doesn’t do what the authors intended, we shouldn’t start transforming the code and delivering it; these problems should be fixed first.

Once the code has passed through all the gates, we know it’s in good shape and we can start transforming it.

The first transformation is to build the image from the source code. The code is compiled and packaged up into a container image that can be executed.

The next transformation takes that built image and uploads it to the image registry, changing it from an image on disk to an image in a registry that can be downloaded and used.

The last transformation will update the running service to use the image.

And we’re done!

Running the pipeline

You might be starting to wonder how and when this pipeline gets run. That’s a great question! The process evolved over time for the folks at Cat Picture, Inc.

When Cat Picture, Inc., started, it had only a few engineers: Topher, Angela, and Sato. Angela wrote the Cat Picture Website service pipeline in Python and it looked like this:

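def pipeline(source_repo, config_repo):
  # Gates: verify the source code before doing anything else.
  linting(source_repo)
  unit_and_integration_tests(source_repo)
  # Transformations: each output feeds the next task.
  image = build_image(source_repo)
  image_url = upload_image_to_registry(image)
  update_running_service(image_url, config_repo)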

This is a simplification of the code Angela wrote, but it’s enough info for us to use for now.

The pipeline() function in this code executes each of the tasks in the Cat Picture Website pipeline as a function.

Both linting and testing happen on the source code, and building an image will perform the build from the source code. The output of each transformation is passed to the next as it is created: the built image goes to the upload task, and the resulting image URL goes to the task that updates the running service.
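To see that data flow concretely, here is a sketch with stand-in tasks. Every task body is a hypothetical placeholder (real tasks would call linters, test runners, and an image builder), and the registry hostname is invented for illustration; only the pipeline() function mirrors Angela’s code.

```python
def linting(source_repo):
    print(f"linting {source_repo}")


def unit_and_integration_tests(source_repo):
    print(f"testing {source_repo}")


def build_image(source_repo):
    # Pretend to build; return a name standing in for the built image.
    return "service-image"


def upload_image_to_registry(image):
    # Hypothetical registry address, for illustration only.
    return f"registry.example.com/{image}:latest"


def update_running_service(image_url, config_repo):
    print(f"deploying {image_url} using config from {config_repo}")


def pipeline(source_repo, config_repo):
    linting(source_repo)
    unit_and_integration_tests(source_repo)
    image = build_image(source_repo)
    image_url = upload_image_to_registry(image)
    update_running_service(image_url, config_repo)
```

If either gate raised an exception, none of the later lines would run, which is how a failing gate keeps code from being transformed and delivered.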

This is great, but how do you run it? Someone (or as we’ll see later, some thing) needs to execute the pipeline() function.

Topher volunteered to be in charge of running the pipeline, so he wrote an executable Python file that looks like this:

if __name__ == "__main__":
  pipeline("https://10.10.10.10/catpicturewebsite/service.git",
           "https://10.10.10.10/catpicturewebsite/config.git")

This executable file calls the pipeline() function, passing in the addresses of the Service repo and Config repo Git repositories as arguments. All Topher has to do is run the executable, and he’ll run the pipeline and all of its tasks.

Should I write my pipelines and tasks in Python like Angela and Topher?

Probably not! Instead of reinventing a CD system yourself, you can choose from lots of existing tools. Appendix A provides a brief overview of some of the current options. We’ll be using Python to demonstrate the ideas behind these CD systems without suggesting any particular system to you, and we’ll use GitHub Actions in later chapters as well. All CD systems have their pros and cons; choose the ones that work best for your needs.

Running once a day

Topher is in charge of running the pipeline, by running the executable Python file:

def pipeline(source_repo, config_repo):
  linting(source_repo)
  unit_and_integration_tests(source_repo)
  image = build_image(source_repo)
  image_url = upload_image_to_registry(image)
  update_running_service(image_url, config_repo)
  
if __name__ == "__main__":
  pipeline("https://10.10.10.10/catpicturewebsite/service.git",
           "https://10.10.10.10/catpicturewebsite/config.git")

When does he run it? He decides that he’s going to run it every morning before he starts his day. Let’s see what that looks like:

Vocab time

Saying a pipeline breaks means that a task in the pipeline encountered an error and pipeline execution stopped.

Tuesday 10 a.m.

Topher runs the pipeline.

The pipeline breaks.

Topher sees that Sato made the most recent change.

That worked okay, but look what happened the next day:

Wednesday 10 a.m.

Topher runs the pipeline.

The pipeline breaks.

Both Sato and Angela made changes the day before.

This isn’t working out as Topher had hoped: because he’s running the pipeline once a day, he’s picking up all of the changes that were made the day before. When something goes wrong, he can’t tell which change caused the problem.

Trying continuous integration

Because Topher is running the pipeline once a day, he’s picking up all of the changes from the day before. If we look back at the definition of CI, we can see what’s going wrong:

Continuous integration is the process of combining code changes frequently, with each change verified on check-in.

Topher needs to run the pipeline on every change. This way, every time the code is changed, the team will get a signal about whether that change introduced problems.
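The difference can be sketched in a few lines. Everything here is hypothetical: the change descriptions are invented, and fake_pipeline_passes stands in for a real pipeline run by pretending that one particular change contains a bug.

```python
def first_breaking_change(changes, pipeline_passes):
    """Run the pipeline after each change; return the first one that breaks it."""
    applied = []
    for change in changes:
        applied.append(change)
        if not pipeline_passes(applied):
            return change
    return None


# Hypothetical history: each change is (author, description).
wednesday = [("Sato", "resize thumbnails"), ("Angela", "rename user field")]


def fake_pipeline_passes(applied_changes):
    # Stand-in for a real pipeline run: pretend Angela's change has a bug.
    return all(author != "Angela" for author, _ in applied_changes)
```

Running the pipeline once on the whole batch, as in fake_pipeline_passes(wednesday), only reports that something is broken. Running it after each change, as first_breaking_change(wednesday, fake_pipeline_passes) does, points at the change that introduced the problem.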

Topher asks his team members to tell him each time they push a change, so that he can run the pipeline right away. Now the pipeline is being run on every change, and the team is getting feedback immediately after making changes.

Thursday 11:15 a.m.

Vocab time

Saying a pipeline passes means everything succeeded, i.e., nothing broke.

  

Continuous deployment

By running the entire pipeline, including the transformation tasks, Topher is doing continuous deployment as well. Many people will run their CI tasks and their transformation tasks as different pipelines. We’ll explore the tradeoffs between these approaches in chapter 13.

Using notifications

A few weeks have passed, and his team members have been telling Topher every time they make a change. Let’s see how it’s going:

Friday 3:15 p.m.

Once again, it didn’t work quite as well as Topher hoped. Angela made a change and forgot to tell him, and now the team has to backtrack. How can Topher make sure he doesn’t miss any changes?

Topher looks into the problem and realizes that he can get notifications from his version control system every time someone makes a change. Instead of having the team tell him when they make changes, he uses these email notifications.

Monday

  

Vocab time

Version control management is the term for systems like GitHub that combine version control with extra features such as code-review tools. Other examples are GitLab and Bitbucket. See appendix B.

Scaling manual effort

Things have been going so well for the team that two more team members have joined. What does this look like for Topher now?

Friday

Topher is now spending his entire day running the pipeline and has no time to do any other work. He has lots of ideas for things he wants to improve in the pipeline, and some features he wants to implement, but he can’t find any time!

He decides to step back and think about what’s happening so he can find a way to save his own time:

  1. An email arrives in Topher’s inbox.

  2. Topher’s email application notifies Topher he has a new email.

  3. Topher sees the notification.

  4. Topher runs the pipeline script.

  5. Topher tells people when the pipeline fails.

Topher looks at his own role in this process. Which parts require Topher’s human intervention?

  1. Topher has to see the email notification.

  2. Topher has to type the command to run the script.

  3. Topher tells people what happened.

Is there some way Topher could take himself out of the process? He’d need something that could do the following:

  1. See the notification

  2. Run the pipeline script

  3. Tell people what happened

Topher needs to find something that can receive a notification and run his script for him.

Automation with webhooks

Time is precious! Topher has realized his whole day is being taken up running the pipeline, but he can take himself out of the process if he can find tools to do the following:

  1. See the notification

  2. Run the pipeline script

  3. Tell people what happened

Topher looks into the problem and realizes that his version control system supports webhooks. By writing a simple web server, he can do everything he needs:

  1. The version control system will make a request to his web server every time someone pushes a change. (Topher doesn’t need to see the notification!)

  2. When the web server gets the request, it can run the pipeline script. (Topher doesn’t need to do it!)

  3. The request the system makes to the web server contains the email of the person who made the change, so if the pipeline script fails, the web server can send an email to the person who caused the problem.

Vocab time

Use webhooks to get a system outside of your control to run your code when events happen. Usually, you do this by giving the system the URL of an HTTP endpoint that you control.

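from http.server import BaseHTTPRequestHandler, HTTPServer

# respond, get_email_from_request, run_pipeline, and send_email are
# helper functions defined elsewhere.

class Webhook(BaseHTTPRequestHandler):
  def do_POST(self):
    # Reply to the version control system before starting the pipeline.
    respond(self)
    email = get_email_from_request(self)
    success, logs = run_pipeline()
    if not success:
      # Tell the author of the change what went wrong.
      send_email(email, logs)

if __name__ == '__main__':
  # Listen on port 8080 on all interfaces.
  httpd = HTTPServer(('', 8080), Webhook)
  httpd.serve_forever()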

Topher starts the web server running on his workstation, and voilà: he has automated pipeline execution!
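The helper functions the web server calls are not shown in this chapter. The following is one possible sketch of three of them, under stated assumptions: the request body is JSON shaped like {"pusher": {"email": "..."}} (real payloads vary by version control system), and the pipeline lives in a script named pipeline.py (a hypothetical filename). send_email is left out because it depends entirely on your mail setup.

```python
import json
import subprocess
import sys


def respond(handler):
    # Acknowledge the request before the potentially slow pipeline run.
    handler.send_response(200)
    handler.end_headers()


def extract_email(body):
    # Assumed payload shape: {"pusher": {"email": "..."}}
    return json.loads(body)["pusher"]["email"]


def get_email_from_request(handler):
    # Read exactly as many bytes as the sender said it was sending.
    length = int(handler.headers.get("Content-Length", 0))
    return extract_email(handler.rfile.read(length))


def run_pipeline(command=(sys.executable, "pipeline.py")):
    # Run the pipeline script as a child process and collect its output.
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Running the pipeline as a separate process means a crash in the pipeline script shows up as a nonzero exit code rather than taking the web server down with it.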

How do I get notifications and events from my version control system?

You’ll have to look at the documentation for your version control system to see how to set this up, but getting notifications for changes and webhook triggering is a core feature of most version control systems. If yours doesn’t have that, consider changing to a different system that does! (See appendix B for some options.)

Scaling with webhooks

Let’s look at what happens now that Topher has automated execution with his webhook:

Monday

The events from the version control system and the webhooks are taking care of all that manual work Topher was doing before. Now he can move on to the work he actually wants to get done!

Vocab time

Having your version control system call your webhook when an event happens is often referred to as triggering your pipeline.

Should I write these webhooks myself like Topher did?

Again, probably not! We’re using Python here to demonstrate how CD systems work in general, but instead of creating one yourself, look at the appendices at the end of this book to see existing CD systems you could use. Supporting webhooks is a key feature to look for!

Don’t push changes when broken

Topher will run into a few more problems. Let’s look at a couple of them here and we’ll leave the rest for chapter 7. What if Angela pushed a change that broke the pipeline and wasn’t able to fix it before another change was made?

Monday

While Angela is fixing the problem she introduced, Sato pushes one of his changes. The system thinks that Sato caused the pipeline to break, but in reality it was Angela, and poor Sato is confused.

Plus, every change that is added on top of an existing problem has the potential to make it harder and harder to fix the original problem. The way to combat this is to enforce a simple rule:

When the pipeline breaks, stop pushing changes.

This can be enforced by the CD system itself, and also by notifying all the other engineers working on the project that the pipeline is broken. Stay tuned for chapter 7 to learn more!
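As a rough illustration of what such enforcement could look like, the sketch below tracks who first broke the pipeline and refuses pushes from anyone else until it passes again. The class and its policy (only the person who broke the pipeline may push the fix) are assumptions made for the example, not a description of how any particular CD system works.

```python
class PipelineStatus:
    """Tracks whether the pipeline is broken and who broke it."""

    def __init__(self):
        self.broken_by = None  # author of the change that first broke it

    def record_run(self, author, passed):
        if passed:
            self.broken_by = None
        elif self.broken_by is None:
            # Only the first failure after a passing run assigns blame.
            self.broken_by = author

    def may_push(self, author):
        # While broken, only the person fixing the break may push.
        return self.broken_by is None or author == self.broken_by
```

With this in place, a failing run triggered by Sato’s change while the pipeline is already broken would leave the blame with Angela, and Sato would have been asked to wait before pushing in the first place.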

Why break the pipeline at all?

Wouldn’t it be better if Angela found out before she pushed that a problem existed? That way, she could fix it before pushing, and it wouldn’t interfere with Sato’s work. Yes, that’s definitely better! We’ll talk a bit more about this in the next chapter and get into more detail in chapter 7.

Cat Picture Website CD

Whew! Now we know all about Cat Picture Website’s CD: the pipeline that the developers use for their services, as well as how it is automated and triggered.

Should I run webhooks directly on my workstation?

No. In chapter 9, we’ll see some good reasons not to, and besides, running webhooks is another feature most CD systems will handle for you.

What’s in a name?

Once you start using a CD system, you might encounter terminology different from what we’ve been using in this chapter and will be using in the rest of this book. So here’s an overview of the terminology used across the space and how it relates to the terms we’ll be using:

Conclusion

The pipeline used by Cat Picture Website for its services shows us the same basic building blocks that you should expect to see in most CD pipelines. By looking at how the folks at Cat Picture Website run their pipeline, we’ve learned how important automation is in making CD scale, especially as a company grows. In the rest of this book, we’ll be looking at the details of each element of the pipeline and how to stitch them together.

Summary

  • This book uses the terms pipelines and tasks to refer to basic CD building blocks, which can go by many other names.

  • Tasks are like functions. Tasks can also be called stages, jobs, builds, and steps.

  • Pipelines are the orchestration that combines tasks together. Pipelines can also be called workflows.

  • The basic components of a CD pipeline are linting (static analysis), testing, building, delivering, and deploying.

  • Linting and testing are gates (aka continuous integration tasks), while building, delivering, and deploying are transformations.

  • Version control systems provide mechanisms such as events and webhooks to make it possible to automate pipeline execution.

  • When a pipeline breaks, stop pushing changes!

Up next . . .

In the next chapter, we’ll examine why version control is essential to CD and why all the plain-text data that makes up your software should be stored in version control, a practice often called config as code.
