Airflow & Spark talk v2.0
Airflow and Spark Streaming at Astronomer
October 12, 2017
by Taylor Edmiston
This post is adapted from a talk I gave.
- Introduction
- Data Engineering
- Apache Airflow
- Apache Spark
- Wrap up
1/5 - Introduction
- Now - Software Engineer at Astronomer
- BS in CS, Wright State ’12
- Experience working at and mentoring several Cincy startups
- 9 years with Python, 5 years as a professional programmer
2/5 - Data Engineering
What is data engineering?
- “The data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering.”
- —Maxime Beauchemin, Creator of Airflow
- Data engineers exist because companies have troves of data… but they need to be able to extract and manipulate it to glean value
- Data engineering tools are how we make sense of it all quickly at scale
- (Which tools? Data from where? At what scale?)
What is Astronomer?
- Astronomer is a data engineering platform that connects and centralizes data, making it simple for anyone from business users to data scientists to aggregate streaming data and create data pipelines.
3/5 - Apache Airflow
- “Apache Airflow is a platform to programmatically author, schedule and monitor workflows.”
- Airflow vs. other frameworks, e.g. Luigi (Spotify), Azkaban (LinkedIn)
- Components are extensible, and there are community-contributed ones
- Very widely used - Airbnb, Astronomer, Carbonite, FreshBooks, HBO, IFTTT, Lyft, New Relic, Postmates, Quora, Robinhood, Stripe, Uber (hard fork), Zapier, etc.
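To make "programmatically author" a bit more concrete, here is a minimal sketch of the underlying idea in plain Python: a workflow declared in code as a graph of tasks, then run in dependency order. This is not Airflow's API. It uses only the standard library (`graphlib`, Python 3.9+), and the task names `extract`, `transform`, and `load`, along with the `run_workflow` helper, are hypothetical and chosen only for illustration. Scheduling and monitoring, which Airflow also provides, are left out.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks (not from the talk). Each one receives the results
# of the tasks that have already run and returns its own result.
def extract(results):
    return [3, 1, 2]


def transform(results):
    return sorted(results["extract"])


def load(results):
    return len(results["transform"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task name maps to the set of tasks that must finish before it.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have run."""
    results = {}
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


order, results = run_workflow(TASKS, DEPENDENCIES)
```

Because the workflow is ordinary code, it can be versioned, reviewed, and tested like any other software. A cycle in `DEPENDENCIES` would make `TopologicalSorter` raise `graphlib.CycleError`, which is why workflows of this kind are modeled as directed acyclic graphs.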