Data Pipelines: ETL, ELT, and DAGs
Data pipelines move data from source systems to a destination such as a warehouse or data lake, reshaping raw records into a form that analysts and applications can use. ETL and ELT describe when the data is transformed: before it is loaded or after. DAGs, or directed acyclic graphs, describe the steps in a pipeline and the order in which they must run. Understanding these ideas helps you pick the right pattern for your team and your data.
What ETL means
ETL stands for extract, transform, load. In this pattern, you clean and shape data before it enters the target warehouse or data lake. This upfront work improves the quality of what lands in the target, but it delays loading and needs its own compute for the transformation step. ETL works well when data sources are messy or when the destination needs strict governance.
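As a rough illustration of the ETL order, the sketch below cleans records in Python before anything reaches the target. The sample rows, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for a messy source system.
RAW_ROWS = [
    {"order_id": "1", "amount": " 19.99 ", "country": "us"},
    {"order_id": "2", "amount": "n/a", "country": "DE"},
    {"order_id": "3", "amount": "5.00", "country": " fr "},
]


def extract():
    """Extract: read rows from the source (here, a hard-coded list)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: parse and normalize values, dropping rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((int(row["order_id"]), amount, row["country"].strip().upper()))
    return clean


def load(conn, rows):
    """Load: write only the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    run_etl(conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    # prints [(1, 19.99, 'US'), (3, 5.0, 'FR')]
```

The unparseable row never reaches the target, which is the governance benefit of ETL. The cost is that the original value is not kept anywhere downstream.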
What ELT means
ELT flips the order: extract, load, transform. Raw data lands in the warehouse first, and the transformation work runs there. This fits modern, scalable platforms that can run many calculations in place. ELT keeps raw data available for future use and can simplify pipelines, but transformation speed and cost now depend on warehouse performance, and the stored raw data needs careful governance.
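A minimal sketch of the ELT order might look like the following: raw values are loaded untouched, and the cleaning runs as SQL inside the destination. The table names and sample rows are hypothetical, and in-memory SQLite again stands in for a warehouse.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as received (all text).
RAW_ROWS = [
    ("1", " 19.99 ", "us"),
    ("2", "n/a", "DE"),
    ("3", "5.00", " fr "),
]


def load_raw(conn, rows):
    """Load: store the raw values without any cleaning."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform: build a cleaned table with SQL that runs in the destination."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER)  AS order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(TRIM(country))       AS country
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ROWS)
    transform_in_warehouse(conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])
```

The cleaned table holds two rows while `raw_orders` still holds all three, so a later transformation can revisit the rejected record.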
Understanding DAGs
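DAGs define tasks and how they depend on one another. They are the backbone of orchestration. Nodes are tasks, and edges are dependencies: an edge from one task to another means the first must finish before the second starts. "Acyclic" means no task can depend on itself, directly or indirectly, so a valid run order always exists. A scheduler runs the graph at set times or in response to events. A simple DAG might pull data, clean it, run a daily aggregation, and publish a report.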
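The four-step DAG described above can be sketched with the standard library's `graphlib` module (Python 3.9+). The task names and the `run_dag` helper are illustrative assumptions, not the API of any particular orchestrator.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
DAG = {
    "pull": set(),
    "clean": {"pull"},
    "aggregate": {"clean"},
    "publish": {"aggregate"},
}


def run_dag(dag, tasks):
    """Run every task after its dependencies and return the order used.

    TopologicalSorter raises graphlib.CycleError if the graph contains a cycle.
    """
    order = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()  # call the function registered for this task
        order.append(name)
    return order


if __name__ == "__main__":
    # Placeholder task bodies that only announce themselves.
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in DAG}
    print(run_dag(DAG, tasks))
    # prints ['pull', 'clean', 'aggregate', 'publish'] after the four "running ..." lines
```

A real scheduler adds timing, parallelism, and state tracking on top of this ordering step.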
Choosing a pattern for your needs
Think about latency, data quality, and cost. If data must be clean before it reaches the destination, ETL is helpful. If you have a powerful warehouse and large data volumes, ELT can be more flexible. Whichever pattern you choose, use DAGs to coordinate pipeline steps, retries, and alerts.
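Retries and alerts are usually provided by the orchestrator, but the idea can be sketched in a few lines. The `run_with_retries` helper and its parameters below are hypothetical, and `alert` is any callable that accepts a message (here it defaults to `print`).

```python
import time


def run_with_retries(task, retries=3, delay_seconds=0.0, alert=print):
    """Call task(); retry on failure, and alert if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(delay_seconds)  # wait before the next attempt
    alert(f"task failed after {retries} attempts: {last_error}")
    raise last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_step():
        # Fails twice, then succeeds, to simulate a transient error.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("temporary failure")
        return "done"

    print(run_with_retries(flaky_step))  # prints done
```

Re-raising after the alert lets the caller mark the run as failed instead of silently continuing.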
A simple example
Suppose your store sends daily sales data. You extract it from the API, load the raw file into a staging area, then transform it into daily totals and store them in a dashboard table. The pipeline can run every night and trigger an alert if a step fails.
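One way to sketch this nightly job is shown below. Everything here is assumed for illustration: `fetch_sales` stands in for the store's API call and returns made-up rows, the `staging_sales` and `daily_totals` tables are hypothetical, and SQLite stands in for the real destination.

```python
import sqlite3


def fetch_sales():
    """Stand-in for the API call; returns hypothetical (date, sku, amount) rows."""
    return [
        ("2024-01-01", "A1", 10.0),
        ("2024-01-01", "B2", 5.5),
        ("2024-01-02", "A1", 7.25),
    ]


def run_nightly(conn, alert=print):
    """Extract, stage the raw rows, then rebuild the daily totals table."""
    try:
        rows = fetch_sales()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS staging_sales "
            "(sale_date TEXT, sku TEXT, amount REAL)"
        )
        conn.execute("DELETE FROM staging_sales")  # replace the previous load
        conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", rows)
        conn.execute("DROP TABLE IF EXISTS daily_totals")
        conn.execute(
            "CREATE TABLE daily_totals AS "
            "SELECT sale_date, SUM(amount) AS total "
            "FROM staging_sales GROUP BY sale_date"
        )
        conn.commit()
    except Exception as exc:
        alert(f"nightly sales pipeline failed: {exc}")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_nightly(conn)
    print(conn.execute("SELECT * FROM daily_totals ORDER BY sale_date").fetchall())
    # prints [('2024-01-01', 15.5), ('2024-01-02', 7.25)]
```

In practice a scheduler would call `run_nightly` each night, and `alert` would send a message to whatever channel the team monitors.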
Key Takeaways
- ETL transforms data before loading it; ELT loads raw data first and transforms it in the warehouse
- DAGs help organize tasks and dependencies
- Choose the pattern based on latency, data quality needs, cost, and how much compute your warehouse can provide