Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. It is one of the most widely used open source orchestrators, valued for its simplicity, scalability and extensibility.
The main goal of this course is to build a distributed Airflow setup using the Celery Executor that can run more than 100 jobs or DAGs in parallel at any given time. I cover the Sequential, Local and Celery Executors in this course. We provision a few EC2 instances on AWS and configure these executors on them.
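To give a rough sense of what the 100-task target implies, here is a minimal sketch and not the course's actual configuration. It builds Airflow environment-variable overrides for the Celery Executor and estimates how many workers would be needed. The helper names `celery_env` and `workers_needed` are hypothetical, the value 16 is Airflow's default `worker_concurrency`, and the broker and result backend settings that a real Celery setup also needs are deliberately left out.

```python
import math


def celery_env(target_parallel_tasks: int, worker_concurrency: int = 16) -> dict:
    """Return Airflow env-var overrides for a Celery-based setup.

    Hypothetical helper for illustration. The keys follow Airflow's
    AIRFLOW__{SECTION}__{KEY} convention; the values are assumptions.
    """
    if target_parallel_tasks <= 0 or worker_concurrency <= 0:
        raise ValueError("both values must be positive")
    return {
        # Hand task execution to Celery workers.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Upper bound on task instances running at once across the installation.
        "AIRFLOW__CORE__PARALLELISM": str(target_parallel_tasks),
        # Number of task slots offered by each Celery worker process.
        "AIRFLOW__CELERY__WORKER_CONCURRENCY": str(worker_concurrency),
    }


def workers_needed(target_parallel_tasks: int, worker_concurrency: int = 16) -> int:
    """Smallest number of workers whose combined slots cover the target."""
    return math.ceil(target_parallel_tasks / worker_concurrency)


if __name__ == "__main__":
    for key, value in celery_env(100).items():
        print(f"export {key}={value}")
    print("workers needed:", workers_needed(100))
```

With these assumed values, 100 parallel tasks at 16 slots per worker means at least 7 workers. Per-DAG and pool limits can still cap concurrency below this figure.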
The Airflow community recently released Airflow 2.0. It contains many new features, such as the HA Scheduler and major performance improvements in the scheduler and the Celery workers. I was impressed by Airflow 2.0's performance and migrated all Airflow 1.x deployments to 2.x in my organisation.
I am adding a new module on Apache Airflow 2.0. We begin the module by learning about the new enhancements and the HA architecture of Airflow 2.0. Next, we install the Webserver, Scheduler, Celery worker and Flower components. At the end, we configure multiple schedulers and observe their performance.
In addition, we explore salient features such as login, email alerting and log management.
By the end of this course, you will have a distributed Airflow setup that can be shared by multiple teams in your organisation.