Airflow & Spark talk
Airflow and Spark Streaming at Astronomer
June 5, 2017
By Taylor Edmiston
- Data Engineering
- Apache Airflow
- Apache Spark
- Software Engineer at Astronomer (core platform)
- Experience working at and mentoring several Cincy startups
- 9 years with Python
2. Data Engineering
What is data engineering?
- “The data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering.” —Maxime
- Data engineers exist because companies have troves of data… but they need to be able to extract and manipulate it to glean value
- Data engineering tools are how we make sense of it all quickly (and at scale)
- (Which tools? Data from where? At what scale?)
What is Astronomer?
- Astronomer’s platform connects and centralizes data, making it super simple for anyone from business users to data scientists to quickly create and monitor data pipelines across the entire organization.
3. Apache Airflow
- “Airflow is a platform to programmatically author, schedule and monitor workflows.”
- Airflow vs other frameworks - ex. Luigi (Spotify), Azkaban (LinkedIn)
- Components are extensible, and there are community-contributed ones
- Sensors - ex.
- Operators - ex. PythonOperator
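To make "programmatically author workflows" concrete, here is a rough sketch in plain Python of tasks wired together as a dependency graph and run in order. This is not Airflow's API: `run_workflow`, `extract`, `transform`, `load`, and `warehouse` are hypothetical names made up for illustration, and the standard library's `graphlib` (Python 3.9+) stands in for a real scheduler. In Airflow itself, each function would typically be wrapped in an operator such as PythonOperator, and the scheduler would decide when each task runs.

```python
from graphlib import TopologicalSorter

# Stand-in for a destination table (hypothetical, for illustration only).
warehouse = []


def extract():
    """Pretend to pull raw rows from a source system."""
    return [1, 2, 3]


def transform(rows):
    """Apply a simple transformation to every row."""
    return [r * 10 for r in rows]


def load(rows):
    """Append the rows to the in-memory 'warehouse' and report the count."""
    warehouse.extend(rows)
    return len(rows)


def run_workflow(tasks, deps):
    """Run each task once, after all of its upstream tasks have finished.

    tasks: maps task name -> callable
    deps:  maps task name -> list of upstream task names
    Each task receives the results of its upstream tasks as arguments.
    """
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        upstream = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*upstream)
    return order, results


tasks = {"extract": extract, "transform": transform, "load": load}
# extract -> transform -> load
deps = {"transform": ["extract"], "load": ["transform"]}

order, results = run_workflow(tasks, deps)
print(order)
print(warehouse)
```

The point of the sketch is that the pipeline's shape lives in ordinary code (`deps`), so it can be versioned, reviewed, and generated like any other program. Scheduling, retries, and monitoring are what a platform like Airflow adds on top.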
Airflow at Astronomer