Appendix A. Running code samples

This book comes with an accompanying code repository on GitHub (https://github.com/BasPH/data-pipelines-with-apache-airflow). The repository holds the same code as demonstrated in this book, together with easily executable Docker environments so that you can run all examples yourself. This appendix explains how the code is organized and how to run the examples.

A.1 Code structure

The code is organized per chapter, and every chapter follows the same structure. The top level of the repository consists of several chapter directories (numbered 01–18), each holding self-contained code examples for the corresponding chapter. Each chapter directory contains at least the following files/directories:

  • dags—Directory containing the DAG files demonstrated in the chapter

  • docker-compose.yml—File describing the Airflow setup needed for running the DAGs

  • README.md—Readme introducing the chapter examples and explaining any chapter-specific details on how to run the examples
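For example, listing the contents of one of the chapter directories should show something along these lines (an illustrative sketch; the exact contents vary per chapter):

$ ls chapter02
dags  docker-compose.yml  README.md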

Where possible, code listings in the book refer to the corresponding file in the chapter directory. For some chapters, each code listing shown in the chapter corresponds to an individual DAG. In other cases (particularly for more complex examples), several code listings are combined into a single DAG file.

Besides DAG files and Python code, some examples later in the book (especially the cloud chapters 16, 17, and 18) require extra supporting resources or configuration. The extra steps needed to run these examples are described in the corresponding chapter and in the chapter’s README file.

A.2 Running the examples

Each chapter comes with a Docker environment that can be used for running the corresponding code examples.

A.2.1 Starting the Docker environment

To get started with the chapter examples, run the following command inside the chapter directory:

$ docker-compose up --build

This command starts a Docker environment containing several containers required for running Airflow, including the following:

  • Airflow webserver

  • Airflow scheduler

  • Postgres database for the Airflow metastore
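Once these containers are up, the Airflow web interface should be reachable in your browser at http://localhost:8080, assuming the chapter’s docker-compose file maps the webserver to port 8080 (you can verify the port mapping in the file itself). As a quick check from the command line, you can also query the webserver’s health endpoint, which returns a small JSON document describing the status of the metastore and the scheduler:

$ curl http://localhost:8080/health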

To avoid seeing the output of all three containers in your terminal, you can also start the Docker environment in the background by using

 $ docker-compose up --build -d
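When running in the background, you can still follow the combined output of the containers at any time using docker-compose’s logs command (run from the chapter directory):

$ docker-compose logs -f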

Some chapters create additional containers, which provide other services or APIs needed for the examples. For example, chapter 12 demonstrates the following monitoring services, which are also run in Docker to make the examples as realistic as possible:

  • Grafana

  • Prometheus

  • Flower

  • Redis

Fortunately, the docker-compose file takes care of running all these services for you. Of course, don’t hesitate to dive into the details of this file if you’re interested.
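If you’re curious which services a given chapter defines without reading through the whole file, you can also ask docker-compose to list them (run from the chapter directory; the exact list varies per chapter):

$ docker-compose config --services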

A.2.2 Inspecting running services

Once an example is running, you can check out which containers are running using the docker ps command:

$ docker ps
CONTAINER ID       IMAGE                            ... NAMES
d7c68a1b9937       apache/airflow:2.0.0-python3.8   ... chapter02_scheduler_1
557e97741309       apache/airflow:2.0.0-python3.8   ... chapter02_webserver_1
742194dd2ef5       postgres:12-alpine               ... chapter02_postgres_1

By default, docker-compose prefixes the names of the containers it creates with the name of the containing folder, meaning that the containers belonging to each chapter should be recognizable by their names.
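If you have containers from multiple chapters (or other projects) running at the same time, you can use this naming convention to narrow the listing down to a single chapter, for example (assuming the chapter02 naming shown above):

$ docker ps --filter name=chapter02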

You can also inspect the logs of the individual containers using docker logs:

$ docker logs -f chapter02_scheduler_1
 [2020-11-30 20:17:36,532] {scheduler_job.py:1249} INFO - Starting the scheduler
 [2020-11-30 20:17:36,533] {scheduler_job.py:1254} INFO - Processing each file at most -1 times
 [2020-11-30 20:17:36,984] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 131

These logs should provide valuable feedback if things go awry.

A.2.3 Tearing down the environment

Once you’re done running an example, you can exit docker-compose using CTRL+C. (Note that this isn’t needed if you’re running docker-compose in the background.) To fully tear down the Docker environment, you can run the following command from the chapter directory:

  $ docker-compose down -v

In addition to stopping the various containers, this should also take care of removing any Docker networks and volumes used in the example.

To check whether all containers have indeed been removed, you can use the following command, which lists any containers that have been stopped but not yet deleted:

  $ docker ps -a

If you’re anything like us, this might still show a list of containers that you’ll want to remove. You can remove containers one by one using the following command:

  $ docker rm <container_id>

where <container_id> is obtained from the list of containers shown by the docker ps command. Alternatively, you can use the following shorthand to remove all containers:

  $ docker rm $(docker ps -aq)

Finally, you can also remove any volumes that are no longer used by any container using

  $ docker volume prune

However, we urge you to be careful with this command, as it may result in inadvertent data loss if you discard the wrong Docker volumes.
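If you’d like to review which volumes exist before pruning, you can list them first using

$ docker volume ls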
