How to Install Apache Airflow on Ubuntu 24.04

Updated on January 31, 2025

Apache Airflow is an open-source workflow management platform that manages data pipelines and automates workflows such as Extract, Transform, and Load (ETL) processes. Airflow uses Python-based Directed Acyclic Graphs (DAGs) to schedule and execute tasks while streamlining the management of all required dependencies for error-free execution.

This article explains how to install Apache Airflow on Ubuntu 24.04, configure a secure environment, and test your deployment with a sample Directed Acyclic Graph (DAG).

Prerequisites

Before you begin, you need to:

  • Deploy an Ubuntu 24.04 server.
  • Access the server over SSH as a non-root user with sudo privileges.
  • Create a DNS A record, such as airflow.example.com, that points to your server's public IP address.

Install Apache Airflow on Ubuntu 24.04

Apache Airflow is available as a Python package you can install using a package manager such as Pip. Follow the steps below to install Python if it's not available on your system, create a new virtual environment, and install Apache Airflow.

  1. Update the server's package index.

    console
    $ sudo apt update
    
  2. Check the Python version installed on your server.

    console
    $ python3 --version
    

    Your output should be similar to the one below.

    Python 3.12.3

    Install Python if it's not available on your server.

    console
    $ sudo apt install python3
    
  3. Install the python3-venv Python virtual environment module, the Python development headers, and the PostgreSQL development library required to build psycopg2.

    console
    $ sudo apt install python3-venv python3-dev libpq-dev -y
    
  4. Create a new virtual environment, such as airflow_env, in your home directory.

    console
    $ python3 -m venv ~/airflow_env
    
  5. Activate the airflow_env virtual environment.

    console
    $ source ~/airflow_env/bin/activate
    

    Verify that your shell prompt changes to the airflow_env virtual environment.

    console
    (airflow_env) linuxuser@example:~$
    
  6. Use Pip to install Apache Airflow with support for PostgreSQL. Quote the package name so your shell does not interpret the square brackets.

    console
    $ pip install "apache-airflow[postgres]" psycopg2
    
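
    Optionally, verify the installation by checking the installed Airflow version.

    console
    $ airflow version
    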
  7. Install PostgreSQL.

    console
    $ sudo apt install postgresql postgresql-contrib
    
  8. Start the PostgreSQL service.

    console
    $ sudo systemctl start postgresql
    
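
    Optionally, enable PostgreSQL to start automatically at boot so the Airflow metadata database is available after a server restart.

    console
    $ sudo systemctl enable postgresql
    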
  9. Access the PostgreSQL console using the postgres user.

    console
    $ sudo -u postgres psql
    

    Your output should be similar to the one below:

    psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
    Type "help" for help.
    
    postgres=# 
  10. Create a new airflow PostgreSQL user with a strong password. Replace YourStrongPassword with your desired password.

    psql
    postgres=# CREATE USER airflow PASSWORD 'YourStrongPassword';
    
  11. Create a new database, for example, airflowdb.

    psql
    postgres=# CREATE DATABASE airflowdb;
    
  12. Connect to the airflowdb database, then grant the airflow user full privileges on all tables in the public schema.

    psql
    postgres=# \c airflowdb
    airflowdb=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
    
  13. Grant the airflow user ownership privileges to the airflowdb database.

    psql
    airflowdb=# ALTER DATABASE airflowdb OWNER TO airflow;
    
  14. Grant the airflow user all privileges on the public schema.

    psql
    airflowdb=# GRANT ALL ON SCHEMA public TO airflow;
    
  15. Exit the PostgreSQL console.

    psql
    airflowdb=# \q
    
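
    Optionally, verify that the airflow user can connect to the database. This assumes the Ubuntu default of password authentication for local TCP connections; enter YourStrongPassword when prompted.

    console
    $ psql -h localhost -U airflow -d airflowdb -c '\conninfo'
    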
  16. Open the airflow.cfg file in your Airflow installation directory.

    console
    $ nano ~/airflow/airflow.cfg
    

    If the ~/airflow directory is missing, temporarily initialize the database and start the Airflow scheduler to create the necessary files.

    console
    $ airflow db init; airflow scheduler
    

    Press Ctrl+C to stop the scheduler, then open the airflow.cfg file again.

  17. Replace the default executor value in the [core] section and the sql_alchemy_conn value in the [database] section with the following configuration to enable parallel task execution and set PostgreSQL as the metadata database.

    ini
    executor = LocalExecutor
    sql_alchemy_conn = postgresql+psycopg2://airflow:YourStrongPassword@localhost/airflowdb
    

    Save and close the file.
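
    You can optionally confirm that Airflow picked up the new values using the airflow config command.

    console
    $ airflow config get-value core executor
    $ airflow config get-value database sql_alchemy_conn
    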

  18. Initialize the Airflow metadata database to apply the changes.

    console
    $ airflow db init
    

    Your output should be similar to the one below:

    DB: postgresql+psycopg2://airflow:***@localhost/airflowdb
    [2025-01-05T23:58:36.808+0000] {migration.py:207} INFO - Context impl PostgresqlImpl.
    [2025-01-05T23:58:36.809+0000] {migration.py:210} INFO - Will assume transactional DDL.
    INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
    INFO  [alembic.runtime.migration] Will assume transactional DDL.
    INFO  [alembic.runtime.migration] Running stamp_revision  -> 5f2621c13b39
    WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
    Initialization done
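
    You can also run a connectivity check to confirm that Airflow reaches the PostgreSQL metadata database.

    console
    $ airflow db check
    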
  19. Create a new administrative user to use with Apache Airflow. Replace admin with your desired username.

    console
    $ airflow users create \
       --username admin \
       --password yourSuperSecretPassword \
       --firstname Admin \
       --lastname User \
       --role Admin \
       --email admin@example.com
    
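
    Optionally, list the Airflow users to verify that the administrative account exists.

    console
    $ airflow users list
    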
  20. Start the Airflow web server on port 8080 as a background process and redirect all logs to the webserver.log file.

    console
    $ nohup airflow webserver -p 8080 > webserver.log 2>&1 &
    
  21. Start the Airflow scheduler and redirect all logs to the scheduler.log file.

    console
    $ nohup airflow scheduler > scheduler.log 2>&1 &
    
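
    Optionally, query the web server's health endpoint to verify that both the web server and the scheduler are running.

    console
    $ curl http://127.0.0.1:8080/health
    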

Configure Nginx as a Reverse Proxy to Expose Apache Airflow

Apache Airflow listens for connections on port 8080 by default. Follow the steps below to configure Nginx as a reverse proxy in front of Apache Airflow and serve requests over HTTP and HTTPS.

  1. Install Nginx.

    console
    $ sudo apt install -y nginx
    
  2. Create a new airflow Nginx virtual host configuration file.

    console
    $ sudo nano /etc/nginx/sites-available/airflow
    
  3. Add the following configurations to the file. Replace airflow.example.com with your actual domain.

    nginx
    server {
        listen 80;
        server_name airflow.example.com;
    
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
    

    Save and close the file.

    The above Nginx configuration listens for incoming connections using your airflow.example.com domain and forwards all connections to the Apache Airflow port 8080.

  4. Link the Airflow configuration to the Nginx sites-enabled directory to enable it.

    console
    $ sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
    
  5. Test the Nginx configuration for errors.

    console
    $ sudo nginx -t
    

    Your output should be similar to the one below.

    nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    nginx: configuration file /etc/nginx/nginx.conf test is successful
  6. Reload Nginx to apply the configuration changes.

    console
    $ sudo systemctl reload nginx
    
  7. Allow connections to the HTTP port 80 through the firewall.

    console
    $ sudo ufw allow 80/tcp
    
  8. Reload UFW to apply the firewall configuration changes.

    console
    $ sudo ufw reload
    
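
    Optionally, view the UFW status to verify that the new rule is active.

    console
    $ sudo ufw status
    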
  9. Access your airflow.example.com domain using a web browser such as Chrome and verify that the Airflow login page displays.

    http://airflow.example.com

    Airflow Login Page

Generate Trusted SSL Certificates to Secure Apache Airflow

SSL certificates encrypt the connection between a client and the Apache Airflow server. Follow the steps below to generate Let's Encrypt SSL certificates using Certbot to secure connections to Apache Airflow.

  1. Install the Certbot Let's Encrypt Client.

    console
    $ sudo snap install --classic certbot
    

    Install Snap if it's not available on your server.

    console
    $ sudo apt install snapd -y
    
  2. Link the Certbot binary to the /usr/bin path to enable it as a system-wide command.

    console
    $ sudo ln -s /snap/bin/certbot /usr/bin/certbot
    
  3. Request a new Let's Encrypt SSL certificate using the Nginx plugin and your domain. Replace airflow.example.com with your actual domain and admin@example.com with your active email address.

    console
    $ sudo certbot --nginx --redirect -d airflow.example.com -m admin@example.com --agree-tos
    

    Your output should be similar to the one below when the certificate request is successful.

    ...
    Account registered.
    Requesting a certificate for airflow.example.com
    
    Successfully received certificate.
    Certificate is saved at: /etc/letsencrypt/live/airflow.example.com/fullchain.pem
    Key is saved at:         /etc/letsencrypt/live/airflow.example.com/privkey.pem
    This certificate expires on 2025-04-21.
    These files will be updated when the certificate renews.
    Certbot has set up a scheduled task to automatically renew this certificate in the background.
    
    Deploying certificate
    Successfully deployed certificate for airflow.example.com to /etc/nginx/sites-enabled/airflow
    Congratulations! You have successfully enabled HTTPS on https://airflow.example.com
    ...
  4. Test the renewal process to verify that Certbot can auto-renew the SSL certificate before it expires.

    console
    $ sudo certbot renew --dry-run
    
  5. Restart Nginx to apply the SSL configuration changes.

    console
    $ sudo systemctl restart nginx
    
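  6. Allow connections to the HTTPS port 443 through the firewall and reload UFW so clients can reach Airflow over HTTPS.

    console
    $ sudo ufw allow 443/tcp
    $ sudo ufw reload
    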

Access Apache Airflow

Follow the steps below to access the Apache Airflow interface and run DAGs on your server.

  1. Access the Apache Airflow web interface using your domain.

    https://airflow.example.com

    Enter the following credentials you set earlier to log in to Apache Airflow.

    • Username: admin
    • Password: yourSuperSecretPassword

    Dashboard

Create and Run DAGs Using Apache Airflow

Follow the steps below to create and run a sample DAG using Apache Airflow.

  1. Create the dags directory in the Airflow installation directory.

    console
    $ mkdir ~/airflow/dags
    
  2. Create a new my_first_dag.py Python application file in the dags directory.

    console
    $ nano ~/airflow/dags/my_first_dag.py
    
  3. Add the following code to the my_first_dag.py file to define a new DAG.

    python
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from datetime import datetime, timedelta
    
    with DAG(
        'my_first_dag',
        start_date=datetime(2024, 1, 1),
        schedule=timedelta(days=1),
        catchup=False
    ) as dag:
    
        def print_hello():
            print('Greetings from Vultr')
    
        hello_task = PythonOperator(
            task_id='hello_task',
            python_callable=print_hello
        )
    

    Save and close the file.

    The above application code creates a my_first_dag sample DAG that runs daily and prints a Greetings from Vultr message.
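
    Optionally, test the DAG from the command line before using the web interface. The airflow dags list command verifies that Airflow parsed the file, and airflow tasks test runs a single task without recording state in the metadata database.

    console
    $ airflow dags list
    $ airflow tasks test my_first_dag hello_task 2024-01-01
    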

  4. Navigate to the DAGs page within the Apache Airflow interface. Find the my_first_dag DAG, enable it, and trigger it manually.

    Graph View

  5. Use the Graph View and Event Log to monitor the DAG.

    Airflow Views

Conclusion

You have installed Apache Airflow on Ubuntu 24.04 and secured access to the application using Nginx as a reverse proxy. You can use Apache Airflow to create multiple workflows and DAGs to match your project needs. For more information, visit the Airflow documentation.