From ETL to ELT: A Practical Playbook for Building Modern Data Pipelines with Airbyte and dbt

December 17, 2025 at 12:09 PM | Est. read time: 14 min

By Valentina Vianna

Community manager and producer of specialized marketing content

Modern data teams are moving fast from traditional ETL to ELT. Why? Because today’s cloud data warehouses and lakehouses are powerful enough to handle heavy transformations right where your data lives. That shift reshapes the pipeline: ingestion tools bring raw data in quickly and reliably, then transformation tools shape it into trusted, analytics-ready datasets.

In this guide, you’ll learn what’s really different between ETL and ELT, why ELT wins in most modern scenarios, and how to design a clean, maintainable pipeline using Airbyte (for ingestion) and dbt (for transformation). You’ll also get implementation steps, performance tips, governance best practices, and a practical FAQ.

ETL vs. ELT: What’s the Real Difference?

  • ETL (Extract, Transform, Load)
    ◦ Where it excels: strict schemas, limited cloud compute, heavy on-prem workloads.
    ◦ Limitations: long development cycles, inflexible when source schemas change, harder to scale as sources multiply.
  • ELT (Extract, Load, Transform)
    ◦ Where it shines: cloud-first stacks, agile changes, fast iteration, lower maintenance.
    ◦ Why it’s winning: cheap storage, elastic compute, SQL-native transformation tools, and strong lineage/testing frameworks.

In short: ELT flips the order so you can land raw data first (quickly and safely), then transform downstream using the warehouse’s horsepower. This enables faster delivery, better governance, and repeatable engineering patterns.

Why the Modern Data Stack Favors ELT

  • Elastic compute: Warehouses like Snowflake, BigQuery, and Databricks scale transformations on demand.
  • Agility: Iterating in SQL with version control is faster than rebuilding brittle ETL flows.
  • Observability and testing: Modern tooling bakes in lineage, tests, and documentation.
  • Cost control: Separate ingestion and transformation lets you optimize each layer independently.

The Airbyte + dbt Combo: Who Does What?

  • Airbyte handles EL (Extract + Load)
    ◦ Connects to hundreds of sources (SaaS apps, APIs, databases).
    ◦ Supports CDC (Change Data Capture) to replicate only deltas.
    ◦ Loads into destinations like Snowflake, BigQuery, Databricks, Postgres, and S3.
  • dbt handles T (Transform)
    ◦ SQL-first transformations, models, tests, and lineage in your warehouse.
    ◦ Modular, version-controlled projects with CI/CD.
    ◦ Built-in data quality tests and documentation.

For a deeper look at building reliable ELT integrations, this practical guide is worth bookmarking: Airbyte made practical: how to build reliable data integrations and ELT pipelines.

If you’re new to dbt or want to level up your transformation practice, see: dbt (Data Build Tool): what you need to know.

Reference Architecture: From Sources to “Gold” Analytics

A proven pattern that pairs perfectly with ELT is the Medallion Architecture:

  • Bronze (raw): Land unmodified source data fast and faithfully.
  • Silver (refined): Clean, standardize, and conform across sources.
  • Gold (business-ready): Curated marts that power dashboards, ML features, and self-service analytics.

Explore the pattern in detail here: Medallion Architecture explained: how Bronze/Silver/Gold layers supercharge your data quality.

How this maps to Airbyte + dbt:

  • Airbyte lands Bronze (raw) tables in your warehouse or data lake.
  • dbt builds Silver (cleaned/conformed) and Gold (consumption) layers on top.

Step-by-Step: Implement Your First Airbyte + dbt ELT Pipeline

1) Pick a destination and structure schemas

  • Choose Snowflake, BigQuery, or Databricks as your warehouse.
  • Create separate schemas (or databases) for bronze_raw, silver_refined, and gold_marts.
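
As a rough sketch, the layout could look like this in Snowflake-style SQL (database and schema names are illustrative, matching the layer names above):

  -- Illustrative warehouse layout: one database, one schema per medallion layer
  CREATE DATABASE IF NOT EXISTS analytics;
  CREATE SCHEMA IF NOT EXISTS analytics.bronze_raw;      -- raw tables landed by Airbyte
  CREATE SCHEMA IF NOT EXISTS analytics.silver_refined;  -- cleaned/conformed dbt models
  CREATE SCHEMA IF NOT EXISTS analytics.gold_marts;      -- business-ready dbt marts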

2) Configure Airbyte connections

  • Select sources (e.g., Postgres, Salesforce, HubSpot, GA4).
  • Enable CDC if available (logical replication for Postgres/MySQL; vendor CDC for SaaS where supported); a Postgres sketch follows this list.
  • Set replication frequency (real-time, hourly, or daily) and retention rules.
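
For Postgres sources specifically, CDC relies on logical replication being enabled before Airbyte can read changes. A minimal sketch of the usual prerequisites (user, publication, and slot names are illustrative; exact steps depend on your Postgres version and hosting):

  -- Enable logical decoding (requires a restart)
  ALTER SYSTEM SET wal_level = logical;

  -- Dedicated, least-privilege user for Airbyte
  CREATE USER airbyte_user WITH REPLICATION PASSWORD '********';
  GRANT USAGE ON SCHEMA public TO airbyte_user;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO airbyte_user;

  -- Publication and replication slot covering the tables to replicate
  CREATE PUBLICATION airbyte_publication FOR ALL TABLES;
  SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');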

3) Decide on initial loading strategy

  • Full refresh for first import; incremental thereafter (CDC or incremental cursor).
  • Keep raw data immutable (Bronze) to simplify auditing and reprocessing.

4) Stand up a dbt project

  • Initialize dbt; configure target profiles for dev/staging/prod.
  • Define “sources” pointing to your Bronze schemas.
  • Add a staging folder with lightweight models mirroring your raw tables.
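
A minimal sketch of that setup (schema, table, and column names are illustrative):

  # models/staging/sources.yml
  version: 2
  sources:
    - name: bronze_raw              # the schema Airbyte loads into
      schema: bronze_raw
      tables:
        - name: orders
        - name: customers

  -- models/staging/stg_orders.sql: light renaming and casting only, no business logic
  select
      id as order_id,
      customer_id,
      status,
      amount,
      cast(created_at as timestamp) as created_at,
      cast(updated_at as timestamp) as updated_at
  from {{ source('bronze_raw', 'orders') }}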

5) Build incremental models for performance

  • Use incremental models for event/log tables or large fact tables.
  • Leverage a unique_key and updated_at for merges and upserts.

Example dbt incremental pattern (simplified):

  • Configure incremental materialization with unique_key (e.g., id).
  • Select from the source table.
  • If is_incremental, filter on rows updated after the latest record in the target.
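
A minimal sketch of that pattern as a dbt model (table and column names are illustrative; incremental strategy support varies by warehouse):

  -- models/silver/fct_orders.sql
  {{ config(
      materialized = 'incremental',
      unique_key = 'order_id',
      incremental_strategy = 'merge'
  ) }}

  select
      order_id,
      customer_id,
      status,
      amount,
      created_at,
      updated_at
  from {{ ref('stg_orders') }}

  {% if is_incremental() %}
    -- only pick up rows changed since the last run of this model
    where updated_at > (select max(updated_at) from {{ this }})
  {% endif %}

On the first run the model is built in full; on subsequent runs only rows newer than the latest updated_at already in the target are merged in on order_id.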

6) Add tests for data quality

  • not_null and unique on primary keys.
  • accepted_values for enums and statuses.
  • relationships to ensure referential integrity between dimensions and facts.
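
These tests live in YAML next to the models; a minimal sketch (model, column, and status values are illustrative):

  # models/silver/schema.yml
  version: 2
  models:
    - name: fct_orders
      columns:
        - name: order_id
          tests: [not_null, unique]
        - name: status
          tests:
            - accepted_values:
                values: ['pending', 'paid', 'shipped', 'cancelled']
        - name: customer_id
          tests:
            - relationships:
                to: ref('dim_customers')
                field: customer_id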

7) Document and generate lineage

  • Add descriptions at the source and model level.
  • Run dbt docs to visualize lineage and help analysts discover trusted datasets.

8) Orchestrate end-to-end

  • Kick off Airbyte first (ingestion), then dbt (transformation).
  • Use Airflow, Dagster, Prefect, or simple CI (e.g., GitHub Actions) for small teams.
  • Build alerts for failures in either step.
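
For a small team, a simplified GitHub Actions sketch might look like the following. The Airbyte trigger step assumes a self-hosted instance reachable from CI and is illustrative only (URL, connection ID, adapter, and secrets are placeholders); in practice you would also poll the sync job for completion before starting dbt, which is what Airflow, Dagster, and Prefect integrations handle for you:

  # .github/workflows/elt.yml (simplified sketch)
  name: nightly-elt
  on:
    schedule:
      - cron: '0 3 * * *'          # nightly, off-peak
  jobs:
    elt:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Trigger Airbyte sync (illustrative API call)
          env:
            AIRBYTE_URL: ${{ secrets.AIRBYTE_URL }}
            CONNECTION_ID: ${{ secrets.AIRBYTE_CONNECTION_ID }}
          run: |
            curl -sf -X POST "$AIRBYTE_URL/api/v1/connections/sync" \
              -H "Content-Type: application/json" \
              -d "{\"connectionId\": \"$CONNECTION_ID\"}"
        - name: Run dbt models and tests
          run: |
            pip install dbt-snowflake
            dbt deps
            dbt build --target prod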

9) Promote with environments and CI/CD

  • Dev: ad-hoc runs, schema changes, and tests.
  • Staging: production-like tests and data checks.
  • Prod: scheduled jobs with notifications and SLAs.

10) Monitor cost and performance

  • Partition and cluster large tables (a dbt config sketch follows this list).
  • Use incremental materializations, ephemeral models for CTE inlining, and warehouse caching.
  • Periodically backfill historical data with dedicated, off-peak runs.
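
On BigQuery, for example, partitioning and clustering can be declared in the model config; Snowflake and Databricks have their own clustering options (column names here are illustrative):

  -- models/silver/fct_events.sql (BigQuery example)
  {{ config(
      materialized = 'incremental',
      partition_by = {'field': 'event_date', 'data_type': 'date'},
      cluster_by = ['account_id']
  ) }}

  select * from {{ ref('stg_events') }}
  {% if is_incremental() %}
    -- rebuild only the most recent partitions
    where event_date >= date_sub(current_date(), interval 3 day)
  {% endif %}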

Data Quality and Observability: Bake It In

  • Column-level tests in dbt (unique, not_null, relationships, accepted_values).
  • Threshold alerts for freshness and row-count anomalies (a source freshness sketch follows this list).
  • Track lineage to identify upstream causes of broken dashboards.
  • Consider complementary tools (e.g., Great Expectations or metadata-based anomaly detection) if you need policy-driven checks outside dbt.
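
dbt source freshness checks are a lightweight starting point. A minimal sketch, assuming the load-timestamp column your Airbyte version writes (column names and thresholds are illustrative):

  # models/staging/sources.yml (freshness section)
  sources:
    - name: bronze_raw
      schema: bronze_raw
      loaded_at_field: _airbyte_extracted_at   # adjust to your Airbyte metadata column
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      tables:
        - name: orders

Run dbt source freshness on a schedule and alert on warn/error results.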

Handling CDC, Late-Arriving Data, and Schema Drift

  • CDC and deduplication
    ◦ Prefer idempotent merges; always model a unique business key.
    ◦ Use updated_at or a high-water mark to pick up late changes.
  • Late-arriving facts
    ◦ Design your Silver and Gold models to re-aggregate affected windows.
    ◦ Keep a convenient “recompute window” parameter for backfills (e.g., last N days); see the sketch after this list.
  • Schema drift
    ◦ Airbyte can add new columns; dbt can adopt a “select star” pattern with explicit field exclusions to reduce breakage.
    ◦ Add tests to catch unexpected null spikes or new values in constrained columns.
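
One way to sketch that recompute window in dbt is a variable-driven lookback in the incremental filter (model, column, and variable names are illustrative):

  -- models/gold/agg_daily_sales.sql (simplified)
  {{ config(materialized = 'incremental', unique_key = 'sales_date') }}

  select
      cast(created_at as date) as sales_date,
      sum(amount) as total_sales
  from {{ ref('fct_orders') }}
  {% if is_incremental() %}
    -- re-aggregate the last N days so late-arriving facts are picked up
    where cast(created_at as date) >= dateadd('day', -{{ var('recompute_days', 7) }}, current_date)
  {% endif %}
  group by 1

For a deeper backfill, override the window at run time, e.g. dbt run --vars '{recompute_days: 90}' --select agg_daily_sales.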

Performance and Cost Optimization in ELT

  • Use incremental models wherever possible; avoid full table rebuilds on large facts.
  • Partition and cluster by date/time or high-cardinality keys used in filters.
  • Leverage ephemeral models for CTE-heavy logic that doesn’t need persistence.
  • Schedule high-cost transformations off-peak.
  • Right-size your warehouse/compute and enforce query timeouts.

Security and Governance from Day One

  • Secrets management via environment variables or vaults.
  • Service accounts with least-privilege roles per tool (Airbyte, dbt, orchestration).
  • Separate environments and schemas to enforce blast-radius boundaries.
  • PII protection: column-level masking, tokenization, or dynamic data masking where supported (a masking sketch follows this list).
  • Logging and audit trails for compliance and incident response.
  • Track lineage and ownership (e.g., in dbt exposures) so users know which Gold tables are reliable.
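
As one example, Snowflake implements dynamic data masking through masking policies; BigQuery and Databricks have their own column-level security mechanisms (role, table, and column names below are illustrative):

  -- Mask email addresses for everyone except privileged roles
  CREATE MASKING POLICY mask_email AS (val string) RETURNS string ->
    CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '***MASKED***' END;

  ALTER TABLE gold_marts.dim_customers
    MODIFY COLUMN email SET MASKING POLICY mask_email;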

Real-World Patterns and Use Cases

  • Marketing analytics hub
    ◦ Sources: HubSpot/Salesforce/GA4
    ◦ Silver: standardized contact, account, campaign, and session models
    ◦ Gold: multi-touch attribution, pipeline conversion, ROAS by channel
  • E-commerce 360
    ◦ Sources: Shopify, Postgres (orders), payment gateway
    ◦ CDC: order and transaction tables
    ◦ Gold: daily sales, inventory turns, LTV cohorts, and funnel metrics
  • FinOps/Cost analytics
    ◦ Sources: cloud billing exports, HR, CMDB
    ◦ Gold: cost per product/tenant/environment with showback/chargeback models

Common Pitfalls (and How to Avoid Them)

  • Mixing ingestion and transformation in one tool
    ◦ Keep Airbyte for EL and dbt for T. Separation of concerns pays off.
  • Skipping tests
    ◦ Add basic tests on Day 1; they prevent bad data from rolling forward.
  • Overusing full refresh
    ◦ Move to incremental early to control costs and runtime.
  • Not planning for schema changes
    ◦ Expect it. Use macros, select-star patterns with exclusions, and robust CDC merges.
  • No environments or CI/CD
    ◦ Version-control everything; promote changes through dev→staging→prod with automated checks.

A Pragmatic Roadmap to Scale

  • Weeks 1–2: Land 1–2 sources in Bronze; build a minimal Silver model with dbt and basic tests.
  • Weeks 3–6: Add a second domain, introduce CDC, and build Gold marts that power a dashboard.
  • Weeks 7–12: Add CI/CD, orchestration, data SLAs, alerts, and defined ownership. Expand tests and governance.

Conclusion

Moving from ETL to ELT isn’t just a tooling change—it’s an operating model shift. Airbyte helps you ingest data fast and reliably, while dbt gives you clean, testable, and documented transformations. Together, they create a modern, scalable pipeline that teams can trust.

Below are answers to common questions that come up when planning or migrating to ELT with Airbyte and dbt.


FAQ: ETL vs. ELT, Airbyte, and dbt

1) What’s the main benefit of switching from ETL to ELT?

ELT lands raw data quickly, then uses your warehouse’s compute for transformations. That translates to faster iteration, better scalability, stronger testing/lineage, and lower maintenance. It also simplifies schema evolution and backfills.

2) When does traditional ETL still make sense?

ETL can make sense when:

  • You must transform before landing data due to strict regulatory constraints.
  • You’re loading into systems that can’t handle heavy transformations.
  • You’re on-prem with limited warehouse compute and rigid batch windows.

3) How does Airbyte compare to Fivetran or Meltano?

Airbyte is open-source with a large connector ecosystem and flexible deployment (self-hosted or managed). Fivetran focuses on fully managed connectors and simplicity. Meltano is a framework built around Singer taps/targets. Choose based on connector coverage, deployment model, cost, and operational preferences.

For a practical take on building robust ELT with Airbyte, see: Airbyte made practical: how to build reliable data integrations and ELT pipelines.

4) Do I need dbt if I already use SQL in the warehouse?

Using raw SQL works for quick one-offs, but dbt adds:

  • Project structure and modularity
  • Reusable macros and packages
  • Built-in tests and documentation
  • Lineage graphs and exposures
  • Easy CI/CD and environment management

That framework is what turns SQL scripts into a maintainable transformation platform. More here: dbt (Data Build Tool): what you need to know.

5) How do I handle schema changes from sources?

  • Let Airbyte add new columns into Bronze.
  • In dbt, use macros to manage “select star” with exclusions, or explicitly include new columns as needed.
  • Add tests to detect unexpected null spikes, value changes, or duplicates.
  • Version models or use backward-compatible patterns when deprecating fields.
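
If you use the dbt_utils package, its star macro is a common way to express “select star with exclusions,” so new columns flow through automatically while noisy or sensitive ones stay out (package usage, table, and column names are illustrative):

  -- models/staging/stg_contacts.sql (requires dbt_utils in packages.yml)
  select
      {{ dbt_utils.star(from=source('bronze_raw', 'contacts'),
                        except=['_airbyte_raw_id', 'internal_notes']) }}
  from {{ source('bronze_raw', 'contacts') }}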

6) What’s the best way to orchestrate Airbyte and dbt?

  • For small teams: GitHub Actions or simple cron-based schedules often suffice.
  • For growing teams: use Airflow, Dagster, or Prefect to sequence runs, manage dependencies, and trigger alerts.
  • Always run Airbyte jobs first (ingestion), then dbt (transformation).

7) How do I optimize cost and performance in ELT?

  • Use incremental models for large tables and event streams.
  • Partition/cluster on commonly filtered columns.
  • Schedule heavy jobs off-peak and right-size compute.
  • Use ephemeral models for CTE-heavy logic and leverage warehouse caching.
  • Periodically vacuum/optimize as your platform recommends.

8) How does the Medallion Architecture fit with Airbyte + dbt?

It’s a natural fit:

  • Bronze: raw from Airbyte
  • Silver: cleaned and conformed with dbt
  • Gold: business-ready marts in dbt

This layering isolates concerns, supports reprocessing, and improves trust. Reference: Medallion Architecture explained.

9) How do I ensure data quality and trust?

  • Start with dbt tests (unique, not_null, relationships, accepted_values).
  • Add freshness checks and outlier detection for critical tables.
  • Implement SLAs/SLOs and alerting on failures.
  • Use lineage to quickly identify upstream issues and owners.

10) How should I structure environments and releases?

  • Separate dev, staging, and prod with distinct schemas and credentials.
  • Use feature branches and pull requests for all changes.
  • Automate tests and docs generation in CI.
  • Promote through environments with controlled approvals and automated deployments.

By combining Airbyte for ingestion and dbt for transformation—and layering your data using a Bronze/Silver/Gold approach—you’ll get an ELT pipeline that’s fast to build, easy to govern, and resilient as your data footprint grows.
