From ETL to ELT: A Practical Playbook for Building Modern Data Pipelines with Airbyte and dbt

December 17, 2025 at 12:09 PM | Est. read time: 14 min

By Valentina Vianna

Community manager and producer of specialized marketing content

Modern data teams are moving fast from traditional ETL to ELT. Why? Because today’s cloud data warehouses and lakehouses are powerful enough to handle heavy transformations right where your data lives. That shift reshapes the pipeline: ingestion tools bring raw data in quickly and reliably, then transformation tools shape it into trusted, analytics-ready datasets.

In this guide, you’ll learn what’s really different between ETL and ELT, why ELT wins in most modern scenarios, and how to design a clean, maintainable pipeline using Airbyte (for ingestion) and dbt (for transformation). You’ll also get implementation steps, performance tips, governance best practices, and a practical FAQ.

ETL vs. ELT: What’s the Real Difference?

  • ETL (Extract, Transform, Load)
    ◦ Where it excels: strict schemas, limited cloud compute, heavy on-prem workloads.
    ◦ Limitations: long development cycles, inflexible when source schemas change, harder to scale as sources multiply.
  • ELT (Extract, Load, Transform)
    ◦ Where it shines: cloud-first stacks, agile changes, fast iteration, lower maintenance.
    ◦ Why it’s winning: cheap storage, elastic compute, SQL-native transformation tools, and strong lineage/testing frameworks.

In short: ELT flips the order so you can land raw data first (quickly and safely), then transform downstream using the warehouse’s horsepower. This enables faster delivery, better governance, and repeatable engineering patterns.

Why the Modern Data Stack Favors ELT

  • Elastic compute: Warehouses like Snowflake, BigQuery, and Databricks scale transformations on demand.
  • Agility: Iterating in SQL with version control is faster than rebuilding brittle ETL flows.
  • Observability and testing: Modern tooling bakes in lineage, tests, and documentation.
  • Cost control: Separate ingestion and transformation lets you optimize each layer independently.

The Airbyte + dbt Combo: Who Does What?

  • Airbyte handles EL (Extract + Load)
    ◦ Connects to hundreds of sources (SaaS apps, APIs, databases).
    ◦ Supports CDC (Change Data Capture) to replicate only deltas.
    ◦ Loads into destinations like Snowflake, BigQuery, Databricks, Postgres, and S3.
  • dbt handles T (Transform)
    ◦ SQL-first transformations, models, tests, and lineage in your warehouse.
    ◦ Modular, version-controlled projects with CI/CD.
    ◦ Built-in data quality tests and documentation.

For a deeper look at building reliable ELT integrations, this practical guide is worth bookmarking: Airbyte made practical: how to build reliable data integrations and ELT pipelines.

If you’re new to dbt or want to level up your transformation practice, see: dbt (Data Build Tool): what you need to know.

Reference Architecture: From Sources to “Gold” Analytics

A proven pattern that pairs perfectly with ELT is the Medallion Architecture:

  • Bronze (raw): Land unmodified source data fast and faithfully.
  • Silver (refined): Clean, standardize, and conform across sources.
  • Gold (business-ready): Curated marts that power dashboards, ML features, and self-service analytics.

Explore the pattern in detail here: Medallion Architecture explained: how Bronze/Silver/Gold layers supercharge your data quality.

How this maps to Airbyte + dbt:

  • Airbyte lands Bronze (raw) tables in your warehouse or data lake.
  • dbt builds Silver (cleaned/conformed) and Gold (consumption) layers on top.

Step-by-Step: Implement Your First Airbyte + dbt ELT Pipeline

1) Pick a destination and structure schemas

  • Choose Snowflake, BigQuery, or Databricks as your warehouse.
  • Create separate schemas (or databases) for bronze_raw, silver_refined, and gold_marts.
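
As a rough sketch, the layout could look like this in Snowflake-style SQL (database and schema names are illustrative, matching the layer names above):

  -- Illustrative warehouse layout: one database, one schema per medallion layer
  CREATE DATABASE IF NOT EXISTS analytics;
  CREATE SCHEMA IF NOT EXISTS analytics.bronze_raw;      -- raw tables landed by Airbyte
  CREATE SCHEMA IF NOT EXISTS analytics.silver_refined;  -- cleaned/conformed dbt models
  CREATE SCHEMA IF NOT EXISTS analytics.gold_marts;      -- business-ready dbt marts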

2) Configure Airbyte connections

  • Select sources (e.g., Postgres, Salesforce, HubSpot, GA4).
  • Enable CDC if available (logical replication for Postgres/MySQL; vendor CDC for SaaS where supported); a Postgres sketch follows this list.
  • Set replication frequency (real-time, hourly, or daily) and retention rules.
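
For Postgres sources specifically, CDC relies on logical replication being enabled before Airbyte can read changes. A minimal sketch of the usual prerequisites (user, publication, and slot names are illustrative; exact steps depend on your Postgres version and hosting):

  -- Enable logical decoding (requires a restart)
  ALTER SYSTEM SET wal_level = logical;

  -- Dedicated, least-privilege user for Airbyte
  CREATE USER airbyte_user WITH REPLICATION PASSWORD '********';
  GRANT USAGE ON SCHEMA public TO airbyte_user;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO airbyte_user;

  -- Publication and replication slot covering the tables to replicate
  CREATE PUBLICATION airbyte_publication FOR ALL TABLES;
  SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');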

3) Decide on initial loading strategy

  • Full refresh for first import; incremental thereafter (CDC or incremental cursor).
  • Keep raw data immutable (Bronze) to simplify auditing and reprocessing.

4) Stand up a dbt project

  • Initialize dbt; configure target profiles for dev/staging/prod.
  • Define “sources” pointing to your Bronze schemas.
  • Add a staging folder with lightweight models mirroring your raw tables.
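
A minimal sketch of that setup (schema, table, and column names are illustrative):

  # models/staging/sources.yml
  version: 2
  sources:
    - name: bronze_raw              # the schema Airbyte loads into
      schema: bronze_raw
      tables:
        - name: orders
        - name: customers

  -- models/staging/stg_orders.sql: light renaming and casting only, no business logic
  select
      id as order_id,
      customer_id,
      status,
      amount,
      cast(created_at as timestamp) as created_at,
      cast(updated_at as timestamp) as updated_at
  from {{ source('bronze_raw', 'orders') }}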

5) Build incremental models for performance

  • Use incremental models for event/log tables or large fact tables.
  • Leverage a unique_key and updated_at for merges and upserts.

Example dbt incremental pattern (simplified):

  • Configure incremental materialization with unique_key (e.g., id).
  • Select from the source table.
  • If is_incremental, filter on rows updated after the latest record in the target.
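
A minimal sketch of that pattern as a dbt model (table and column names are illustrative; incremental strategy support varies by warehouse):

  -- models/silver/fct_orders.sql
  {{ config(
      materialized = 'incremental',
      unique_key = 'order_id',
      incremental_strategy = 'merge'
  ) }}

  select
      order_id,
      customer_id,
      status,
      amount,
      created_at,
      updated_at
  from {{ ref('stg_orders') }}

  {% if is_incremental() %}
    -- only pick up rows changed since the last run of this model
    where updated_at > (select max(updated_at) from {{ this }})
  {% endif %}

On the first run the model is built in full; on subsequent runs only rows newer than the latest updated_at already in the target are merged in on order_id.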

6) Add tests for data quality

  • not_null and unique on primary keys.
  • accepted_values for enums and statuses.
  • relationships to ensure referential integrity between dimensions and facts.
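
These tests live in YAML next to the models; a minimal sketch (model, column, and status values are illustrative):

  # models/silver/schema.yml
  version: 2
  models:
    - name: fct_orders
      columns:
        - name: order_id
          tests: [not_null, unique]
        - name: status
          tests:
            - accepted_values:
                values: ['pending', 'paid', 'shipped', 'cancelled']
        - name: customer_id
          tests:
            - relationships:
                to: ref('dim_customers')
                field: customer_id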

7) Document and generate lineage

  • Add descriptions at the source and model level.
  • Run dbt docs to visualize lineage and help analysts discover trusted datasets.

8) Orchestrate end-to-end

  • Kick off Airbyte first (ingestion), then dbt (transformation).
  • Use Airflow, Dagster, Prefect, or simple CI (e.g., GitHub Actions) for small teams.
  • Build alerts for failures in either step.
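
For a small team, a simplified GitHub Actions sketch might look like the following. The Airbyte trigger step assumes a self-hosted instance reachable from CI and is illustrative only (URL, connection ID, adapter, and secrets are placeholders); in practice you would also poll the sync job for completion before starting dbt, which is what Airflow, Dagster, and Prefect integrations handle for you:

  # .github/workflows/elt.yml (simplified sketch)
  name: nightly-elt
  on:
    schedule:
      - cron: '0 3 * * *'          # nightly, off-peak
  jobs:
    elt:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Trigger Airbyte sync (illustrative API call)
          env:
            AIRBYTE_URL: ${{ secrets.AIRBYTE_URL }}
            CONNECTION_ID: ${{ secrets.AIRBYTE_CONNECTION_ID }}
          run: |
            curl -sf -X POST "$AIRBYTE_URL/api/v1/connections/sync" \
              -H "Content-Type: application/json" \
              -d "{\"connectionId\": \"$CONNECTION_ID\"}"
        - name: Run dbt models and tests
          run: |
            pip install dbt-snowflake
            dbt deps
            dbt build --target prod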

9) Promote with environments and CI/CD

  • Dev: ad-hoc runs, schema changes, and tests.
  • Staging: production-like tests and data checks.
  • Prod: scheduled jobs with notifications and SLAs.

10) Monitor cost and performance

  • Partition and cluster large tables (a dbt config sketch follows this list).
  • Use incremental materializations, ephemeral models for CTE inlining, and warehouse caching.
  • Periodically backfill historical data with dedicated, off-peak runs.
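
On BigQuery, for example, partitioning and clustering can be declared in the model config; Snowflake and Databricks have their own clustering options (column names here are illustrative):

  -- models/silver/fct_events.sql (BigQuery example)
  {{ config(
      materialized = 'incremental',
      partition_by = {'field': 'event_date', 'data_type': 'date'},
      cluster_by = ['account_id']
  ) }}

  select * from {{ ref('stg_events') }}
  {% if is_incremental() %}
    -- rebuild only the most recent partitions
    where event_date >= date_sub(current_date(), interval 3 day)
  {% endif %}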

Data Quality and Observability: Bake It In

  • Column-level tests in dbt (unique, not_null, relationships, accepted_values).
  • Threshold alerts for freshness and row-count anomalies (a source freshness sketch follows this list).
  • Track lineage to identify upstream causes of broken dashboards.
  • Consider complementary tools (e.g., Great Expectations or metadata-based anomaly detection) if you need policy-driven checks outside dbt.
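
dbt source freshness checks are a lightweight starting point. A minimal sketch, assuming the load-timestamp column your Airbyte version writes (column names and thresholds are illustrative):

  # models/staging/sources.yml (freshness section)
  sources:
    - name: bronze_raw
      schema: bronze_raw
      loaded_at_field: _airbyte_extracted_at   # adjust to your Airbyte metadata column
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      tables:
        - name: orders

Run dbt source freshness on a schedule and alert on warn/error results.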

Handling CDC, Late-Arriving Data, and Schema Drift

  • CDC and deduplication
    ◦ Prefer idempotent merges; always model a unique business key.
    ◦ Use updated_at or a high-water mark to pick up late changes.
  • Late-arriving facts
    ◦ Design your Silver and Gold models to re-aggregate affected windows.
    ◦ Keep a convenient “recompute window” parameter for backfills (e.g., last N days); see the sketch after this list.
  • Schema drift
    ◦ Airbyte can add new columns; dbt can adopt a “select star” pattern with explicit field exclusions to reduce breakage.
    ◦ Add tests to catch unexpected null spikes or new values in constrained columns.
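
One way to sketch that recompute window in dbt is a variable-driven lookback in the incremental filter (model, column, and variable names are illustrative):

  -- models/gold/agg_daily_sales.sql (simplified)
  {{ config(materialized = 'incremental', unique_key = 'sales_date') }}

  select
      cast(created_at as date) as sales_date,
      sum(amount) as total_sales
  from {{ ref('fct_orders') }}
  {% if is_incremental() %}
    -- re-aggregate the last N days so late-arriving facts are picked up
    where cast(created_at as date) >= dateadd('day', -{{ var('recompute_days', 7) }}, current_date)
  {% endif %}
  group by 1

For a deeper backfill, override the window at run time, e.g. dbt run --vars '{recompute_days: 90}' --select agg_daily_sales.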

Performance and Cost Optimization in ELT

  • Use incremental models wherever possible; avoid full table rebuilds on large facts.
  • Partition and cluster by date/time or high-cardinality keys used in filters.
  • Leverage ephemeral models for CTE-heavy logic that doesn’t need persistence.
  • Schedule high-cost transformations off-peak.
  • Right-size your warehouse/compute and enforce query timeouts.

Security and Governance from Day One

  • Secrets management via environment variables or vaults.
  • Service accounts with least-privilege roles per tool (Airbyte, dbt, orchestration).
  • Separate environments and schemas to enforce blast-radius boundaries.
  • PII protection: column-level masking, tokenization, or dynamic data masking where supported (a masking sketch follows this list).
  • Logging and audit trails for compliance and incident response.
  • Track lineage and ownership (e.g., in dbt exposures) so users know which Gold tables are reliable.
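
As one example, Snowflake implements dynamic data masking through masking policies; BigQuery and Databricks have their own column-level security mechanisms (role, table, and column names below are illustrative):

  -- Mask email addresses for everyone except privileged roles
  CREATE MASKING POLICY mask_email AS (val string) RETURNS string ->
    CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '***MASKED***' END;

  ALTER TABLE gold_marts.dim_customers
    MODIFY COLUMN email SET MASKING POLICY mask_email;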

Real-World Patterns and Use Cases

  • Marketing analytics hub
    ◦ Sources: HubSpot/Salesforce/GA4
    ◦ Silver: standardized contact, account, campaign, and session models
    ◦ Gold: multi-touch attribution, pipeline conversion, ROAS by channel
  • E-commerce 360
    ◦ Sources: Shopify, Postgres (orders), payment gateway
    ◦ CDC: order and transaction tables
    ◦ Gold: daily sales, inventory turns, LTV cohorts, and funnel metrics
  • FinOps/Cost analytics
    ◦ Sources: cloud billing exports, HR, CMDB
    ◦ Gold: cost per product/tenant/environment with showback/chargeback models

Common Pitfalls (and How to Avoid Them)

  • Mixing ingestion and transformation in one tool
    ◦ Keep Airbyte for EL and dbt for T. Separation of concerns pays off.
  • Skipping tests
    ◦ Add basic tests on Day 1; they prevent bad data from rolling forward.
  • Overusing full refresh
    ◦ Move to incremental early to control costs and runtime.
  • Not planning for schema changes
    ◦ Expect it. Use macros, select-star patterns with exclusions, and robust CDC merges.
  • No environments or CI/CD
    ◦ Version-control everything; promote changes through dev→staging→prod with automated checks.

A Pragmatic Roadmap to Scale

  • Weeks 1–2: Land 1–2 sources in Bronze; build a minimal Silver model with dbt and basic tests.
  • Weeks 3–6: Add a second domain, introduce CDC, and build Gold marts that power a dashboard.
  • Weeks 7–12: Add CI/CD, orchestration, data SLAs, alerts, and defined ownership. Expand tests and governance.

Conclusion

Moving from ETL to ELT isn’t just a tooling change—it’s an operating model shift. Airbyte helps you ingest data fast and reliably, while dbt gives you clean, testable, and documented transformations. Together, they create a modern, scalable pipeline that teams can trust.

Below are answers to common questions that come up when planning or migrating to ELT with Airbyte and dbt.


FAQ: ETL vs. ELT, Airbyte, and dbt

1) What’s the main benefit of switching from ETL to ELT?

ELT lands raw data quickly, then uses your warehouse’s compute for transformations. That translates to faster iteration, better scalability, stronger testing/lineage, and lower maintenance. It also simplifies schema evolution and backfills.

2) When does traditional ETL still make sense?

ETL can make sense when:

  • You must transform before landing data due to strict regulatory constraints.
  • You’re loading into systems that can’t handle heavy transformations.
  • You’re on-prem with limited warehouse compute and rigid batch windows.

3) How does Airbyte compare to Fivetran or Meltano?

Airbyte is open-source with a large connector ecosystem and flexible deployment (self-hosted or managed). Fivetran focuses on fully managed connectors and simplicity. Meltano is a framework built around Singer taps/targets. Choose based on connector coverage, deployment model, cost, and operational preferences.

For a practical take on building robust ELT with Airbyte, see: Airbyte made practical: how to build reliable data integrations and ELT pipelines.

4) Do I need dbt if I already use SQL in the warehouse?

Using raw SQL works for quick one-offs, but dbt adds:

  • Project structure and modularity
  • Reusable macros and packages
  • Built-in tests and documentation
  • Lineage graphs and exposures
  • Easy CI/CD and environment management

That framework is what turns SQL scripts into a maintainable transformation platform. More here: dbt (Data Build Tool): what you need to know.

5) How do I handle schema changes from sources?

  • Let Airbyte add new columns into Bronze.
  • In dbt, use macros to manage “select star” with exclusions, or explicitly include new columns as needed.
  • Add tests to detect unexpected null spikes, value changes, or duplicates.
  • Version models or use backward-compatible patterns when deprecating fields.
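
If you use the dbt_utils package, its star macro is a common way to express “select star with exclusions,” so new columns flow through automatically while noisy or sensitive ones stay out (package usage, table, and column names are illustrative):

  -- models/staging/stg_contacts.sql (requires dbt_utils in packages.yml)
  select
      {{ dbt_utils.star(from=source('bronze_raw', 'contacts'),
                        except=['_airbyte_raw_id', 'internal_notes']) }}
  from {{ source('bronze_raw', 'contacts') }}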

6) What’s the best way to orchestrate Airbyte and dbt?

  • For small teams: GitHub Actions or simple cron-based schedules often suffice.
  • For growing teams: use Airflow, Dagster, or Prefect to sequence runs, manage dependencies, and trigger alerts.
  • Always run Airbyte jobs first (ingestion), then dbt (transformation).

7) How do I optimize cost and performance in ELT?

  • Use incremental models for large tables and event streams.
  • Partition/cluster on commonly filtered columns.
  • Schedule heavy jobs off-peak and right-size compute.
  • Use ephemeral models for CTE-heavy logic and leverage warehouse caching.
  • Periodically vacuum/optimize as your platform recommends.

8) How does the Medallion Architecture fit with Airbyte + dbt?

It’s a natural fit:

  • Bronze: raw from Airbyte
  • Silver: cleaned and conformed with dbt
  • Gold: business-ready marts in dbt

This layering isolates concerns, supports reprocessing, and improves trust. Reference: Medallion Architecture explained.

9) How do I ensure data quality and trust?

  • Start with dbt tests (unique, not_null, relationships, accepted_values).
  • Add freshness checks and outlier detection for critical tables.
  • Implement SLAs/SLOs and alerting on failures.
  • Use lineage to quickly identify upstream issues and owners.

10) How should I structure environments and releases?

  • Separate dev, staging, and prod with distinct schemas and credentials.
  • Use feature branches and pull requests for all changes.
  • Automate tests and docs generation in CI.
  • Promote through environments with controlled approvals and automated deployments.

By combining Airbyte for ingestion and dbt for transformation—and layering your data using a Bronze/Silver/Gold approach—you’ll get an ELT pipeline that’s fast to build, easy to govern, and resilient as your data footprint grows.
