Apache Airflow is what Airbnb uses to manage its data pipelines. Airbnb built it and then donated it to the Apache Software Foundation, and it is now an Apache top-level project. Read a little bit about Apache if you don’t know the significance of that. TL;DR: it’s a solid piece of tech.
What does it do? Basically, large-scale data processing generally requires:
This on its own isn’t too hard. But let’s say you are working for a big company, like a bank. What else might a data pipeline need?
This is why Airflow exists.
Another cool thing about Airflow is that the concepts at its core can help you understand many other tools.
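The most central of those concepts is the DAG (directed acyclic graph): a pipeline is a set of tasks plus the dependencies between them, and a task only runs once everything it depends on has finished. Here is a rough sketch of that idea in plain Python. It is not Airflow's API; it uses the standard library's `graphlib` (Python 3.9+), and the task names and the `run_order` helper are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def run_order(dag):
    """Return one valid execution order: every task comes after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in run_order(pipeline):
        # A real scheduler would execute the task here; this sketch only prints it.
        print("running", task)
```

`clean` and `validate` don’t depend on each other, so either may come first. A scheduler like Airflow can use that same information to run independent tasks in parallel.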
Airflow’s docs are great. Use the official tutorial, and make sure you understand the core concepts.
And this guide has some good pictures.