
How to Run Airflow Locally With Docker

A step-by-step guide for running Airflow with Docker on your local machine

Photo by Joshua Reddekopp on Unsplash

Introduction

Apache Airflow is one of the most popular tools in data engineering. It lets users build, orchestrate and monitor data pipelines at scale.

You may have already tried to run Airflow locally by installing it through pip, only to run into problems or, even worse, mess up your local environment.

If you would like to test Airflow on your local machine, the simplest way to do so is with Docker images. In today’s short tutorial we will go through a step-by-step guide that will help you get Airflow up and running via Docker in a few minutes.


Prerequisites

First of all, make sure you have installed Docker and Docker Compose, since the rest of this guide relies on the docker and docker-compose commands.
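As a quick sanity check before starting, the sketch below looks for the two command-line tools used in this guide, docker and docker-compose, and reports whether each one is on your PATH. The helper name check_tool is made up for this example and is not part of Docker or Airflow.

```shell
# Report whether a command-line tool is available on PATH.
# check_tool is a hypothetical helper name used only in this sketch.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
    return 1
  fi
}

# The two tools used by the commands in this guide.
# "|| true" keeps the loop going so both tools are always reported.
for tool in docker docker-compose; do
  check_tool "$tool" || true
done
```

If either tool is reported as missing, install it before moving on to Step 1.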


Step 1: Fetch docker-compose.yaml

The first thing we’ll need is the docker-compose.yaml file. Create a new directory in your home directory (let’s call it airflow-local):

$ mkdir airflow-local
$ cd airflow-local

Then fetch the docker-compose.yaml file (note that we will be using Airflow v2.3.0):

$ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.3.0/docker-compose.yaml'

Feel free to inspect the compose file and the services defined in it, namely airflow-scheduler, airflow-webserver, airflow-worker, airflow-init, flower, postgres and redis.
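If you would rather list the service names than scroll through the file, one possible approach is a small awk filter like the sketch below. It assumes the usual compose layout, with a top-level services: key and each service name indented by exactly two spaces. list_services is a hypothetical helper name.

```shell
# Print the names of the services defined in a compose file.
# list_services is a hypothetical helper; it assumes service names are
# indented by exactly two spaces under a top-level "services:" key.
list_services() {
  awk '
    /^services:/       { in_services = 1; next }   # entering the services section
    /^[^[:space:]#]/   { in_services = 0 }         # any other top-level key ends it
    in_services && /^  [A-Za-z0-9_-]+:/ {
      name = $1
      sub(/:.*$/, "", name)                        # drop the trailing colon
      print name
    }
  ' "$1"
}

# Only run against the real file when it is present in the current directory.
if [ -f docker-compose.yaml ]; then
  list_services docker-compose.yaml
fi
```

With Docker Compose installed, running docker-compose config --services from the same directory should give an equivalent list without any parsing on your side.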


Step 2: Create directories

Now, while you are in the airflow-local directory, create three additional directories:

  • dags
  • logs
  • plugins

$ mkdir ./dags ./logs ./plugins
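If you expect to repeat this step, for example after cleaning up and starting over, a variant with mkdir -p may be handy, since it does not fail when a directory already exists. This is only a small sketch of the same step:

```shell
# Create the three directories; -p makes the command safe to re-run
# because existing directories are left untouched.
for dir in dags logs plugins; do
  mkdir -p "./$dir"
done

# List the directories to confirm they exist.
ls -d dags logs plugins
```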

Step 3: Setting the Airflow user

Next, we need to set two environment variables so that files created in the mounted folders are owned by your host user rather than by root. We will simply write these variables into a file called .env.

$ echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env

Inspect the content of .env using cat and ensure that it contains the two aforementioned variables.

$ cat .env
AIRFLOW_UID=501
AIRFLOW_GID=0
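Note that echo -e is not handled the same way by every shell (some print the -e literally), so if your .env file does not look like the output above, a printf-based variant is one alternative worth trying. The function name write_airflow_env below is hypothetical and only used for this sketch.

```shell
# Write AIRFLOW_UID and AIRFLOW_GID to the given file, one per line.
# printf interprets \n consistently across POSIX shells, unlike echo -e.
# write_airflow_env is a hypothetical helper name.
write_airflow_env() {
  printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > "$1"
}

write_airflow_env .env
cat .env
```

The value of AIRFLOW_UID will be whatever id -u returns for your user, so it may differ from the 501 shown above.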

Step 4: Initialise the Airflow Database

Now we are ready to initialise the Airflow database by first starting the airflow-init service:

$ docker-compose up airflow-init

This service essentially runs airflow db init and creates an admin user account. By default, the account created has the login airflow and the password airflow.


Step 5: Start Airflow services

The final thing we need to do to get Airflow up and running is to start the Airflow services we saw in Step 1:

$ docker-compose up

Note that the above command may take a while since multiple services need to be started. Once done, you can verify that the containers are up and running using the following command in a new command-line tab:

$ docker ps
Airflow images up and running – Source: Author
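Besides checking docker ps by eye, you could also poll the webserver until it answers. The sketch below defines a small retry loop around curl. The helper name wait_for_url and its three arguments (URL, number of attempts, delay in seconds) are made up for this example.

```shell
# Poll a URL until it responds successfully or the attempts run out.
# wait_for_url is a hypothetical helper: URL, max attempts, delay in seconds.
wait_for_url() {
  url=$1
  attempts=$2
  delay=$3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail: non-zero exit on HTTP errors; --silent: no progress output;
    # --max-time: give up on a single request after 2 seconds.
    if curl --fail --silent --max-time 2 --output /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempt(s)"
  return 1
}
```

For instance, wait_for_url http://localhost:8080/health 30 5 would check the webserver's health endpoint every five seconds for up to 30 attempts. The attempt count and delay here are arbitrary; adjust them to how long the services take to start on your machine.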

Step 6: Access Airflow UI

To access the Airflow user interface, simply open localhost:8080 in your preferred browser.

Airflow admin login – Source: Author

Type in your credentials (as already noted, by default both the login and the password are set to airflow) and hit ‘Sign in’. You should now have access to the Airflow Dashboard, where you can see some of the example DAGs packaged with Airflow.

Example DAGs on Airflow UI – Source: Author

Step 7: Enter the Airflow Worker container

You can also enter the worker container and run airflow commands from inside it. You can find <container-id> for the Airflow worker service by running docker ps:

$ docker exec -it <container-id> bash

For example,

$ docker exec -it d2697f8e7aeb bash
default@d2697f8e7aeb:/opt/airflow$ airflow version
2.3.0
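If you only need to run a single airflow command, one alternative that avoids looking up the container ID is to address the worker by its service name through Docker Compose. The sketch below wraps that in a function. Both the function name airflow_in_worker and the COMPOSE variable are hypothetical names chosen for this example.

```shell
# Run an airflow CLI command inside the airflow-worker service.
# airflow_in_worker and COMPOSE are hypothetical names for this sketch.
# COMPOSE defaults to docker-compose but can be overridden,
# e.g. COMPOSE="docker compose" for the Compose v2 plugin.
airflow_in_worker() {
  # -T disables pseudo-TTY allocation so the call also works from scripts.
  ${COMPOSE:-docker-compose} exec -T airflow-worker airflow "$@"
}
```

With the services running, calling airflow_in_worker version from the airflow-local directory should print the same version as the interactive session above.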

Step 8: Cleaning up the mess

Once you are done with your experimentation, you can clean up the mess we’ve just created by simply running:

$ docker-compose down --volumes --rmi all

This command stops and removes all running containers, deletes the volumes holding the database data and removes the downloaded images.

If you run docker ps once again, you can verify that no container is up and running:

$ docker ps
CONTAINER ID  IMAGE   COMMAND   CREATED   STATUS    PORTS     NAMES

Final Thoughts

In today’s short tutorial we explored a step-by-step guide that can help you get Apache Airflow v2.3.0 up and running on your local machine via Docker.

Note that this Docker Compose setup should only be used for testing purposes. If you are planning to deploy Airflow in production environments, I’d recommend running it on Kubernetes with the official Helm chart.

