This book comes with an accompanying code repository on GitHub (https://github.com/BasPH/data-pipelines-with-apache-airflow). The repository holds the same code as demonstrated in this book, together with easily executable Docker environments so that you can run all examples yourself. This appendix explains how the code is organized and how to run the examples.
The code is organized per chapter, and each chapter is structured the same. The top level of the repository consists of several chapter directories (numbered 01–18), which contain self-contained code examples for the corresponding chapters. Each chapter directory contains at least the following files/directories:
dags—Directory containing the DAG files demonstrated in the chapter
docker-compose.yml—File describing the Airflow setup needed for running the DAGs
README.md—Readme introducing the chapter examples and explaining any chapter-specific details on how to run the examples
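Concretely, this means a chapter directory (using chapter 2 as a hypothetical example) looks roughly like this:

chapter02/
├── dags/
├── docker-compose.yml
└── README.md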
Where possible, code listings in the book refer to the corresponding file in the chapter directory. In some chapters, each code listing corresponds to an individual DAG; in other cases (particularly the more complex examples), several code listings are combined into a single DAG file.
Besides DAG files and Python code, some examples later in the book (especially the cloud chapters 16, 17, and 18) require extra supporting resources or configuration. Any extra steps needed to run these examples are described in the corresponding chapter and in the chapter's README file.
Each chapter comes with a Docker environment that can be used for running the corresponding code examples.
To get started with running the chapter examples, run inside the chapter directory:
$ docker-compose up --build
This command starts a Docker environment with several containers required for running Airflow, including the following:

scheduler—The Airflow scheduler, which monitors and triggers your DAGs
webserver—The Airflow webserver, which serves the Airflow web UI
postgres—A Postgres database, which Airflow uses as its metastore
To avoid seeing the output of all three containers in your terminal, you can also start the Docker environment in the background by using
$ docker-compose up --build -d
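When running in the background, you can still follow the combined output of the containers using the docker-compose logs command:

$ docker-compose logs -f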
Some chapters create additional containers that provide other services or APIs needed for the examples. For example, chapter 12 demonstrates several monitoring services, which are also run in Docker containers to make the examples as realistic as possible.
Fortunately, the docker-compose file takes care of running all these services for you. Of course, don't hesitate to dive into the details of this file if you're interested.
Once an example is running, you can check which containers are running using the docker ps command:
$ docker ps
CONTAINER ID   IMAGE                            ...   NAMES
d7c68a1b9937   apache/airflow:2.0.0-python3.8   ...   chapter02_scheduler_1
557e97741309   apache/airflow:2.0.0-python3.8   ...   chapter02_webserver_1
742194dd2ef5   postgres:12-alpine               ...   chapter02_postgres_1
By default, docker-compose prefixes running containers with the name of the containing folder, meaning that containers belonging to each chapter should be recognizable by their container names.
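For example, you can list only the containers belonging to a given chapter by filtering on this name prefix (chapter02 here being a placeholder for whichever chapter you are running):

$ docker ps --filter name=chapter02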
You can also inspect the logs of the individual containers using docker logs:
$ docker logs -f chapter02_scheduler_1
[2020-11-30 20:17:36,532] {scheduler_job.py:1249} INFO - Starting the scheduler
[2020-11-30 20:17:36,533] {scheduler_job.py:1254} INFO - Processing each file at most -1 times
[2020-11-30 20:17:36,984] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 131
These logs should provide you with valuable feedback if things go awry.
Once you’re done running an example, you can exit docker-compose using Ctrl-C. (Note that this isn’t needed if you’re running docker-compose in the background.) To fully tear down the Docker environment, run the following command from the chapter directory:
$ docker-compose down -v
In addition to stopping the various containers, this should also take care of removing any Docker networks and volumes used in the example.
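You can verify that the corresponding volumes were indeed removed by listing the remaining Docker volumes:

$ docker volume ls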
To check if all containers have indeed been fully removed, you can use the following command to see any containers that have been stopped but not yet deleted:
$ docker ps -a
If you’re anything like us, this might still show a list of containers that you’ll want to remove. You can remove containers one by one using the following command:
$ docker rm <container_id>
where the container_id is obtained from the list of containers shown by the docker ps command. Alternatively, you can use the following shorthand to remove all containers:
$ docker rm $(docker ps -aq)
Finally, you can also remove any unused volumes previously used by these containers using
$ docker volume prune
However, we urge you to use this command with caution, as it may result in inadvertent data loss if you end up discarding the wrong Docker volumes.