Automated Data Testing with Apache Airflow and Great Expectations: A Practical End-to-End Playbook

December 16, 2025 at 12:09 PM | Est. read time: 12 min

By Valentina Vianna

Community manager and producer of specialized marketing content

Bad data is expensive. It breaks dashboards, corrupts models, and erodes trust. The fix isn’t more manual checks—it’s automated data testing embedded directly into your pipelines. In this guide, you’ll learn how to combine Apache Airflow and Great Expectations (GX) to continuously validate your data, fail fast when something’s wrong, and ship reliable analytics at scale.

What You’ll Learn

  • How Airflow and Great Expectations fit together in a modern data stack
  • A reference architecture for validation across Bronze/Silver/Gold layers
  • A step-by-step quickstart with sample code
  • Patterns for scalable, cost-effective validations in warehouses and lakehouses
  • Alerting, observability, and CI/CD best practices
  • Common pitfalls and how to avoid them

Why Airflow + Great Expectations?

  • Apache Airflow schedules and coordinates ETL/ELT tasks, ensuring dependencies and retries are handled.
  • Great Expectations turns your data quality rules into automated tests (expectations), with human-friendly documentation and machine-checkable results.
  • Together, they act like circuit breakers in your pipelines—promotion happens only when data meets your quality bar.

Where Data Tests Belong in the Pipeline

A reliable pattern is to validate data at key transitions:

  • Bronze (raw) → Silver (cleaned): Validate schema, types, nulls, uniqueness, and basic volume.
  • Silver (cleaned) → Gold (business-ready): Validate referential integrity, business rules, distribution drift, aggregations.
  • Before serving: Validate freshness, completeness, and contract-level rules for downstream apps and BI.

This “gating” approach prevents bad data from propagating downstream—and keeps your users from seeing broken dashboards.

Reference Architecture

  • Ingestion: Load raw data into object storage or warehouse (Bronze).
  • Transformation: Clean and conform in Silver using SQL or Spark.
  • Validation: Run GX checkpoints after each transformation step.
  • Publish: Promote to Gold or quarantine failed datasets.
  • Observe: Log, alert, and document results (Airflow logs + GX Data Docs + alerts).

Tip: Adopt the Medallion architecture (Bronze/Silver/Gold) and attach expectation suites to each layer with increasing strictness.

Quickstart: Run Great Expectations from an Airflow DAG

1) Install dependencies

  • pip install apache-airflow
  • pip install great_expectations
  • Optional (provider): pip install airflow-provider-great-expectations

Note: Provider names can vary by version; confirm the operator import path in your environment.

2) Initialize GX and configure a datasource

From your GX project root (often alongside your DAGs):

  • great_expectations init
  • Add a datasource (e.g., Snowflake, BigQuery, Postgres, S3 + Pandas, or Spark)
  • Create an expectation suite (CLI, notebook, or programmatic)
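
If you prefer to configure the datasource and suite in code rather than through the CLI, here is a minimal sketch assuming the fluent-style API of recent pre-1.0 GX releases (roughly 0.16–0.18); the datasource name, connection string, and table names are illustrative placeholders:

import great_expectations as gx

# Sketch only: exact calls differ across GX versions.
context = gx.get_context()  # picks up the project created by `great_expectations init`

# Register a SQL datasource (the connection string is a placeholder, not a real credential).
warehouse = context.sources.add_sql(
    name="analytics_warehouse",
    connection_string="postgresql+psycopg2://user:password@host:5432/analytics",
)

# Expose the Silver "orders" table as a data asset.
orders_asset = warehouse.add_table_asset(
    name="orders",
    table_name="orders",
    schema_name="silver",
)

# Create an (initially empty) expectation suite to populate later.
context.add_or_update_expectation_suite("orders_daily_suite")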

3) Create a checkpoint

A checkpoint defines which data asset(s) to validate with which suite(s), plus run-time options (batch parameters, result stores, Data Docs updates, etc.). Store it in your GX project under checkpoints/.
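
Checkpoints can also be registered programmatically. The sketch below continues the datasource example above and assumes the same pre-1.0 GX API; all names are illustrative:

import great_expectations as gx

# Sketch only: ties the "orders" asset to the "orders_daily_suite" suite.
context = gx.get_context()
orders_asset = context.get_datasource("analytics_warehouse").get_asset("orders")

context.add_or_update_checkpoint(
    name="orders_daily_checkpoint",
    validations=[
        {
            "batch_request": orders_asset.build_batch_request(),
            "expectation_suite_name": "orders_daily_suite",
        }
    ],
)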

4) Call GX from Airflow

Option A: Use the provider operator (simple and declarative)

from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

ge_validate = GreatExpectationsOperator(
    task_id="ge_validate_orders",
    data_context_root_dir="/usr/local/airflow/great_expectations",
    checkpoint_name="orders_daily_checkpoint",
    fail_task_on_validation_failure=True,
)

Option B: Use PythonOperator (maximum control)

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.exceptions import AirflowFailException
from datetime import datetime


def run_gx_checkpoint(checkpoint_name: str, data_context_root_dir: str):
    import great_expectations as gx

    # Point GX at the project directory deployed alongside your DAGs.
    context = gx.get_context(context_root_dir=data_context_root_dir)
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    if not result.success:
        raise AirflowFailException("Great Expectations validation failed.")


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",  # daily at 06:00
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_orders",
        python_callable=lambda: print("Extract step..."),
    )
    transform = PythonOperator(
        task_id="transform_orders",
        python_callable=lambda: print("Transform step..."),
    )
    validate = PythonOperator(
        task_id="validate_orders_with_gx",
        python_callable=run_gx_checkpoint,
        op_kwargs={
            "checkpoint_name": "orders_daily_checkpoint",
            "data_context_root_dir": "/usr/local/airflow/great_expectations",
        },
    )
    publish = PythonOperator(
        task_id="publish_to_gold",
        python_callable=lambda: print("Publish step..."),
    )

    extract >> transform >> validate >> publish

5) Fail fast or warn-only, by design

  • Critical rules (e.g., broken schema, missing primary keys) should fail the DAG.
  • Non-critical drift (e.g., slight distribution change) may log warnings and still publish.
  • Use “mostly” thresholds and severities to tune sensitivity and reduce false positives.
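
As a concrete illustration, a warn-only check can reuse the provider operator from Option A with failure propagation turned off, while "mostly" is tuned inside the suite itself (a sketch; the drift checkpoint name is hypothetical):

from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

# Non-blocking drift check: results still land in logs and Data Docs,
# but the task succeeds even if some expectations fail.
drift_check = GreatExpectationsOperator(
    task_id="ge_drift_check_orders",
    data_context_root_dir="/usr/local/airflow/great_expectations",
    checkpoint_name="orders_drift_checkpoint",  # hypothetical warn-only checkpoint
    fail_task_on_validation_failure=False,
)

# Inside the suite, "mostly" tolerates a small violation rate, for example:
# validator.expect_column_values_to_be_between("order_total", min_value=0, mostly=0.99)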

What to Test: A Practical Coverage Plan

Start with a small, high-value set and expand over time:

  • Schema and types: Columns present, types correct (int, date, decimal).
  • Nullability and uniqueness: Primary keys non-null and unique; critical fields non-null.
  • Referential integrity: Keys match between tables (e.g., orders.customer_id in customers).
  • Freshness and volume: Data within expected time windows; record counts within bounds.
  • Distribution and drift: Numeric ranges, value sets, quantiles, and category proportions.
  • Business rules: Domain logic like “order_total = sum(line_items)” or “status in ('open', 'closed')”.

Pro tip: Document each expectation in human language (why it matters, owner, severity). GX “Data Docs” does this automatically and helps drive adoption.
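
To make this concrete, the sketch below builds a small starter suite against a pandas sample of the data; it assumes the pandas shortcut available in recent pre-1.0 GX releases, and the column names and bounds are illustrative:

import great_expectations as gx
import pandas as pd

# Illustrative sample standing in for the Silver "orders" table.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "status": ["open", "closed", "open"],
    "order_total": [10.0, 25.5, 7.25],
})

context = gx.get_context()
# Shortcut pandas datasource (GX ~0.16-0.18); other versions obtain a validator differently.
validator = context.sources.pandas_default.read_dataframe(df)

# Schema and types
validator.expect_column_to_exist("order_id")
validator.expect_column_values_to_be_of_type("order_total", "float64")

# Nullability and uniqueness
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")

# Value sets and ranges, with "mostly" to tolerate rare outliers
validator.expect_column_values_to_be_in_set("status", ["open", "closed"])
validator.expect_column_values_to_be_between("order_total", min_value=0, mostly=0.99)

# Volume
validator.expect_table_row_count_to_be_between(min_value=1, max_value=1_000_000)

validator.save_expectation_suite(discard_failed_expectations=False)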

Warehouse-First Validations for Scale

Validating huge datasets with Pandas can be slow and expensive. Use pushdown where possible:

  • Point GX to your warehouse (Snowflake, BigQuery, Redshift, Databricks, Postgres).
  • Let GX compute expectations via SQL where it’s faster and cheaper.
  • Validate deltas (today’s partition) for routine checks; run full-suite “deep” checks off-hours.
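
For example, routine delta checks can point a query asset at only the current partition so the computation stays in the warehouse; this sketch reuses the datasource from the quickstart, assumes the same pre-1.0 fluent API, and uses placeholder table names and date logic:

import great_expectations as gx

# Sketch only: validate just today's partition via SQL pushdown.
context = gx.get_context()
warehouse = context.get_datasource("analytics_warehouse")  # defined earlier in the quickstart sketch

orders_today = warehouse.add_query_asset(
    name="orders_today",
    query="SELECT * FROM silver.orders WHERE order_date = CURRENT_DATE",
)

context.add_or_update_checkpoint(
    name="orders_delta_checkpoint",
    validations=[
        {
            "batch_request": orders_today.build_batch_request(),
            "expectation_suite_name": "orders_daily_suite",
        }
    ],
)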

Operational Patterns That Work

  • Data Docs as artifacts: Publish GX Data Docs to S3/GCS and link from Airflow logs or XCom.
  • Alerts: Use Airflow failure callbacks (Slack, EmailOperator, PagerDuty, etc.) on task failure and include the Data Docs URL (see the callback sketch after this list).
  • Quarantine pattern: On failure, write data to a quarantine bucket/table for investigation.
  • Backfills: Reduce test scope (sample or partitioned checks) to keep backfills fast.
  • Environments: Separate GX configs for dev/stage/prod. Promote suites via CI/CD.
  • Ownership: Tag expectation suites with owners. Drive accountability and speedier fixes.
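
For the alerting pattern mentioned above, a plain on_failure_callback is often enough. The sketch below posts the failing task and a link to published Data Docs to a Slack incoming webhook; both URLs are placeholders:

import json
import urllib.request

DATA_DOCS_URL = "https://my-bucket.s3.amazonaws.com/data_docs/index.html"  # placeholder
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_slack_on_failure(context):
    """Airflow failure callback: post the failing task and a Data Docs link to Slack."""
    ti = context["task_instance"]
    text = (
        f":red_circle: Data validation failed: {ti.dag_id}.{ti.task_id} "
        f"(run {context['run_id']})\nData Docs: {DATA_DOCS_URL}"
    )
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request)

# Attach it to the validation task, e.g.:
# validate = PythonOperator(..., on_failure_callback=notify_slack_on_failure)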

Common Pitfalls (and Fixes)

  • Too many tests on day one: Start with 8–12 high-signal expectations; expand based on incidents.
  • Overfitting thresholds: Use historical runs to set realistic bounds; prefer “mostly” parameters.
  • Validating everything in Pandas: Push heavy checks into the warehouse or Spark.
  • Silent failures: Always alert on failures and expose Data Docs to engineers and analysts.
  • Mixing concerns: Separate transformation logic from validation logic; keep tests clean and explainable.

A Brief Real-World Scenario

An online retailer promoted daily “orders” to the analytics layer before validating referential integrity. During a vendor API change, “customer_id” values arrived null, breaking downstream revenue dashboards. After adding a GX checkpoint in Airflow between Silver and Gold, null foreign keys triggered a DAG failure and a Slack alert within minutes. The data was quarantined, business dashboards stayed clean, and the root cause was fixed the same day. That’s the compounding value of automation and gating.

Conclusion

Automated data testing isn’t a luxury—it’s the foundation of trustworthy analytics. With Airflow orchestrating workflows and Great Expectations validating data at critical handoffs, you get fast feedback, clear ownership, and fewer surprises. Start small, gate the most impactful transitions, and evolve your coverage as your data platform matures.


FAQ: Airflow + Great Expectations

1) Should I use the GreatExpectationsOperator or a PythonOperator?

  • Use the GreatExpectationsOperator for simplicity: it runs a checkpoint with minimal code and can fail the task on validation failure.
  • Use PythonOperator when you need advanced control (dynamic batch requests, conditional suites, custom logging, or integration with other systems).

2) Where should the Great Expectations project live?

  • Co-locate it with your DAGs (for small teams) or package it as a separate repository and deploy via CI/CD (for larger teams).
  • In containerized environments (Kubernetes), mount the GX directory as a volume or bake it into the image to ensure consistent versions.

3) How do I handle large datasets without exploding costs?

  • Push validations to your data warehouse (SQL-based datasources) instead of pulling data into memory.
  • Validate only changed partitions (e.g., “yesterday’s data”).
  • Run lightweight checks daily; schedule deeper, full-table checks off-hours or weekly.

4) How do I decide which tests should fail the pipeline?

  • Critical correctness issues (schema changes, missing keys, null primary keys, wrong types) should fail.
  • Soft issues (minor drift) can warn-only. Use “mostly” parameters and tag expectations with severities to standardize behavior.

5) Can I version-control expectation suites?

  • Yes. Keep expectation suites, checkpoints, and datasources in Git.
  • Use CI to validate that suites load correctly and optionally run sample validations against test data to catch config regressions.

6) How do I alert the right people when validations fail?

  • Use Airflow task callbacks or failure handlers (Slack, EmailOperator, PagerDuty, etc.).
  • Include a link to Data Docs and the run ID. Tag suites with owners and Slack channels to route alerts quickly.

7) How do I run tests during backfills without slowing everything down?

  • Scope validations to the partition being backfilled, reduce sample sizes, or switch certain expectations to warn-only.
  • Optionally disable heavy drift checks during bulk backfills and re-enable after the catch-up finishes.

8) What’s the best way to validate referential integrity?

  • Use expectations like “values in column A must exist in column B of table T” via warehouse pushdown or Spark.
  • For large joins, validate at the warehouse to avoid moving data out and to keep performance predictable.
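
One pragmatic way to express this (a sketch, continuing the fluent-API assumptions from earlier; table and column names are illustrative) is to point a query asset at an anti-join and expect it to return zero rows:

import great_expectations as gx

# Sketch only: orphaned orders.customer_id values surface as rows in this anti-join.
context = gx.get_context()
warehouse = context.get_datasource("analytics_warehouse")

orphan_orders = warehouse.add_query_asset(
    name="orders_missing_customers",
    query="""
        SELECT o.order_id, o.customer_id
        FROM silver.orders o
        LEFT JOIN silver.customers c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL
    """,
)

context.add_or_update_expectation_suite("orders_referential_integrity")
validator = context.get_validator(
    batch_request=orphan_orders.build_batch_request(),
    expectation_suite_name="orders_referential_integrity",
)
validator.expect_table_row_count_to_equal(0)
validator.save_expectation_suite(discard_failed_expectations=False)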

9) How do I keep tests maintainable as schemas evolve?

  • Group expectations by layer (Bronze/Silver/Gold) and data domain.
  • Use tags, naming conventions, and documentation in GX to track intent. Review and adjust thresholds after major upstream changes.

10) Can I generate expectations automatically?

  • Yes. GX can help bootstrap suites (profiling), but treat the output as a starting point. Curate rules to focus on business-critical checks and reduce noise.

If you implement the patterns above—gating transitions, pushdown validations, clear severities, and proactive alerting—you’ll turn data quality from a reactive firefight into a predictable, automated practice.
