
Modern analytics and machine learning live or die by data quality. A single upstream schema change, a silent null explosion, or a “helpful” ETL tweak can break dashboards, degrade model performance, and trigger costly incidents, often without an obvious error.
That’s where Great Expectations (GE) comes in. It’s a popular open-source framework for data validation, testing, and documentation that helps teams define what “good data” looks like and enforce those standards consistently, especially in production data pipelines.
Why Data Validation Matters in Production Pipelines
In development, it’s easy to assume the data is “mostly fine.” In production, “mostly fine” becomes:
- Broken BI dashboards after a column is renamed
- Downstream failures when a partition is missing
- Model drift because distributions quietly shift
- Compliance issues due to unexpected PII appearing in datasets
Production pipelines need data observability, but also something more actionable: automated validation gates that stop bad data before it spreads.
Great Expectations provides that gate by allowing you to define and run validations such as:
- Schema checks (columns exist, types match)
- Completeness checks (no unexpected nulls)
- Range checks (values within expected bounds)
- Uniqueness checks (primary keys are unique)
- Distribution checks (numeric stats don’t drift too far)
What Is Great Expectations (and Why Teams Like It)
Great Expectations is designed around a simple idea:
> Turn data quality assumptions into executable tests, then run them automatically.
The core building blocks (in plain English)
- Expectations: The rules you define (e.g., “column `email` should never be null”).
- Expectation Suites: A collection of expectations for a dataset.
- Checkpoints: Configured runs that execute suites against data and produce results.
- Data Docs: Auto-generated documentation that summarizes validations and outcomes.
This structure makes GE useful not only for validation, but also for collaboration: business stakeholders can read Data Docs, engineers can version expectation suites, and pipeline owners can monitor checkpoint results.
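As a minimal illustration of the first two building blocks, here is a sketch using the legacy pandas-dataset API (`ge.from_pandas`), which ships in pre-1.0 releases of Great Expectations; newer versions express the same rules through a Data Context and Validator, but the expectation method names are the same. The column names and sample data are illustrative.
```python
# Minimal sketch: expectations on an in-memory DataFrame
# (legacy pre-1.0 pandas API; adapt to your installed GE version).
import great_expectations as ge
import pandas as pd

df = pd.DataFrame(
    {
        "order_id": ["A-1", "A-2", "A-3"],
        "email": ["a@example.com", "b@example.com", None],
        "order_total": [19.99, 250.0, 42.5],
    }
)

ge_df = ge.from_pandas(df)

# Expectations: executable data-quality rules.
ge_df.expect_column_values_to_be_unique("order_id")
ge_df.expect_column_values_to_not_be_null("email")  # fails on the sample data above
ge_df.expect_column_values_to_be_between("order_total", min_value=0, max_value=500)

# Expectation Suite: the accumulated rules, exportable and version-controllable.
suite = ge_df.get_expectation_suite(discard_failed_expectations=False)
print([e.expectation_type for e in suite.expectations])
```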
Where Great Expectations Fits in a Production Architecture
A common production flow looks like this:
- Ingest data (APIs, CDC, files)
- Land data (raw/bronze)
- Transform (silver/gold, dimensional models)
- Serve (warehouse marts, BI, feature stores)
Great Expectations can be used at multiple stages, but the best ROI typically comes from validating:
1) Raw ingestion (Bronze): catch upstream chaos early
Useful validations:
- Files arrived on time
- Row counts not zero
- Required columns present
- No corrupted records
2) Post-transform datasets (Silver/Gold): protect business logic
Useful validations:
- Unique keys after deduplication
- Referential integrity across tables
- Aggregations within expected ranges
- Type enforcement after casting
3) Pre-serving layer: prevent downstream consumer incidents
Useful validations:
- Null checks on metrics used in dashboards
- Freshness checks before BI refresh
- Distribution checks to detect drift
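As one concrete pre-serving example, a freshness check can be expressed as an expectation on the maximum timestamp. The sketch below assumes an `order_timestamp` column, a 24-hour load SLA, and the legacy pandas API; the table path is a placeholder.
```python
# Freshness sketch: the newest record should be no older than 24 hours.
# Legacy pre-1.0 pandas API; path and SLA are illustrative assumptions.
from datetime import datetime, timedelta

import great_expectations as ge
import pandas as pd

orders = ge.from_pandas(pd.read_parquet("analytics/fct_orders.parquet"))

result = orders.expect_column_max_to_be_between(
    "order_timestamp",
    min_value=datetime.utcnow() - timedelta(hours=24),
)
print(result.success)
```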
Practical tip: Don’t over-validate every layer. Pick critical datasets and add checks where failures are most costly.
Designing Expectations That Actually Work in Production
A common failure mode is writing expectations that are too strict, too vague, or too brittle. Production validation needs balance: strong enough to catch real issues, flexible enough to tolerate normal variation.
Use the “contract + tolerance” approach
Instead of hardcoding rigid rules, define:
- Contracts: schema, required fields, uniqueness
- Tolerances: acceptable drift, threshold-based null rates
Examples:
- “`order_id` is always unique” (contract)
- “`discount_amount` is between 0 and 500” (contract)
- “Null rate for `customer_phone` < 5%” (tolerance)
- “Daily row count should not drop by more than 30% vs the trailing 7-day average” (tolerance)
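In code, tolerances are often expressed with the `mostly` keyword, which turns a hard rule into a threshold. A sketch using the same legacy pandas API; the parquet path and the trailing-average baseline are placeholders you would supply from your own pipeline:
```python
# Contract + tolerance sketch (legacy pre-1.0 pandas API).
import great_expectations as ge
import pandas as pd

orders = ge.from_pandas(pd.read_parquet("fct_orders.parquet"))  # placeholder path

# Contracts: must always hold.
orders.expect_column_values_to_be_unique("order_id")
orders.expect_column_values_to_be_between("discount_amount", min_value=0, max_value=500)

# Tolerance: at least 95% of rows must have customer_phone (null rate < 5%).
orders.expect_column_values_to_not_be_null("customer_phone", mostly=0.95)

# Tolerance: row count should not drop more than 30% below a baseline you
# compute yourself (e.g., the trailing 7-day average queried from the warehouse).
trailing_7d_avg = 120_000  # placeholder baseline
orders.expect_table_row_count_to_be_between(min_value=int(trailing_7d_avg * 0.7))
```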
Prioritize expectations by risk
Not all fields are equal. Rank checks by:
- Downstream impact (BI? ML features? revenue reporting?)
- Frequency of change (volatile sources need guardrails)
- Operational risk (pipelines that fail silently)
Great Expectations Best Practices for Production Pipelines
1) Store expectation suites in version control
Treat data quality rules like code:
- Review changes via pull requests
- Tie expectation updates to pipeline changes
- Roll back easily if needed
2) Run validations as pipeline steps, not ad hoc
If validations aren’t part of the orchestration flow, they’ll be ignored. In production, checkpoints should run automatically:
- After ingestion
- After transformations
- Before publish/serve
3) Decide what happens on failure (and be consistent)
A validation is only useful if it leads to action. Common failure policies:
- Fail-fast: stop the pipeline immediately (good for critical tables)
- Quarantine: route data to a “bad records” area but keep pipeline running
- Warn-only: log and alert without stopping (good during rollout)
A mature setup often starts with warn-only and graduates to fail-fast for critical assets.
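One lightweight way to keep the policy consistent is a small dispatch on the checkpoint result inside the orchestrator task. This is a sketch, not a GE feature: the severity/policy label and the quarantine and alert helpers are placeholders for your own logic.
```python
# Sketch: applying a consistent failure policy to a checkpoint result.
def quarantine_partition(dataset: str) -> None:
    print(f"Routing bad records for {dataset} to the quarantine area")  # placeholder


def send_warning_alert(dataset: str) -> None:
    print(f"Warn-only: validation failed for {dataset}")  # placeholder


def handle_validation_result(result, dataset: str, policy: str) -> None:
    """policy is one of: 'fail-fast', 'quarantine', 'warn-only'."""
    if result.success:
        return
    if policy == "fail-fast":
        raise RuntimeError(f"Validation failed for {dataset}; stopping the pipeline")
    if policy == "quarantine":
        quarantine_partition(dataset)
    else:
        send_warning_alert(dataset)
```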
4) Keep expectations maintainable
Avoid checks that no one understands six months later. Prefer:
- Clear naming conventions
- Short suites per dataset (focused rules)
- Reusable patterns (e.g., standardized checks for timestamps, IDs)
5) Generate Data Docs and publish them internally
GE’s Data Docs help teams:
- See which expectations exist (and why)
- Track failures and trends
- Align stakeholders on what “quality” means
Integrating Great Expectations with Orchestration (Airflow, Dagster, dbt, etc.)
Great Expectations is flexible and can run wherever your pipeline runs:
Airflow
- Run a checkpoint in a PythonOperator
- Fail the task on validation failure
- Send alerts to Slack/email
Dagster
- Embed validations as asset checks (or inside ops)
- Track results as part of the asset graph
dbt
- Use dbt tests for basic constraints and GE for deeper validation (distribution, freshness, multi-table logic)
- Trigger GE checkpoints after dbt runs
Spark / Databricks
- Validate large datasets where Spark executes
- Use partition-aware expectations
Key idea: Keep validations close to the compute engine to avoid unnecessary data movement.
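For example, an Airflow 2.x task that runs a checkpoint and fails fast on validation errors might look like the sketch below. It assumes a GE project is available on the worker with a checkpoint named `fct_orders_daily`, and uses the `context.run_checkpoint` call from the 0.15–0.18 API; newer GE releases change the exact call.
```python
# Sketch: a fail-fast Great Expectations gate as an Airflow task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_fct_orders_checkpoint(**_):
    import great_expectations as gx

    context = gx.get_context()  # loads the project's great_expectations.yml
    result = context.run_checkpoint(checkpoint_name="fct_orders_daily")
    if not result.success:
        # Raising fails the task, which is what makes this gate "fail-fast":
        # Airflow marks the run failed and normal pipeline alerting takes over.
        raise RuntimeError("Validation failed for analytics.fct_orders")


with DAG(
    dag_id="fct_orders_quality_gate",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",  # daily, after the transform job
    catchup=False,
) as dag:
    PythonOperator(
        task_id="validate_fct_orders",
        python_callable=run_fct_orders_checkpoint,
    )
```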
A Concrete Example: One “Orders” Pipeline, One Checkpoint, Real Failure Output, Real Alerts
To make this less abstract, here’s a compact production-style example you can adapt.
Scenario
- A daily job builds `analytics.fct_orders` (gold table) from raw `orders` + `customers`.
- The BI layer depends on:
  - `order_id` being unique
  - `order_total` being non-negative
  - `order_timestamp` being populated
  - `customer_id` being present (and ideally valid)
Expectation suite (what you’d version-control)
You can define these in code or via the CLI. Conceptually, the suite contains checks like:
- `expect_table_columns_to_match_ordered_list` (schema contract)
- `expect_column_values_to_not_be_null` on `order_id`, `order_timestamp`, `customer_id`
- `expect_column_values_to_be_unique` on `order_id`
- `expect_column_values_to_be_between` on `order_total` (min = 0)
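A sketch of defining this suite in code from a sample batch, using the GE 0.16+ “fluent” pandas datasource; the sample file path is a placeholder, and suite naming/persistence details vary a little between versions:
```python
# Sketch: building the fct_orders expectations interactively from a sample file.
import great_expectations as gx

context = gx.get_context()
validator = context.sources.pandas_default.read_csv("fct_orders_sample.csv")  # placeholder

validator.expect_table_columns_to_match_ordered_list(
    ["order_id", "customer_id", "order_timestamp", "order_total"]
)
for column in ["order_id", "order_timestamp", "customer_id"]:
    validator.expect_column_values_to_not_be_null(column)
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_between("order_total", min_value=0)

# Persist the suite so it can be committed to Git next to the checkpoint config.
validator.save_expectation_suite(discard_failed_expectations=False)
```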
Sample Checkpoint config (YAML)
This is the kind of file teams commit to Git and run from Airflow/Dagster/CI:
```yaml
# great_expectations/checkpoints/fct_orders_daily.yml
name: fct_orders_daily
config_version: 1.0
class_name: Checkpoint
run_name_template: "%Y%m%d-fct_orders"
validations:
  - batch_request:
      datasource_name: warehouse
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: analytics.fct_orders
    expectation_suite_name: fct_orders_suite
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
  # Optional: push a lightweight "pass/fail" signal into your incident pipeline.
  # Note: WebhookAction is not a built-in Great Expectations action; treat it as
  # a custom action class you implement, or swap in a built-in notification
  # action such as SlackNotificationAction.
  - name: send_webhook_on_failure
    action:
      class_name: WebhookAction
      # Example endpoint: a small internal service that forwards to PagerDuty/Slack
      url: "https://alerts.mycompany.com/ge"
      method: "POST"
      headers:
        Content-Type: "application/json"
      # Some teams include run metadata to make triage faster
      payload:
        dataset: "analytics.fct_orders"
        checkpoint: "fct_orders_daily"
        severity: "high"
```
What a failure looks like (expected output)
GE results are structured, but the “what failed” is usually what on-call needs. A realistic failure might look like:
- `expect_column_values_to_be_unique` on `order_id`: FAILED
  - Unexpected duplicate count: 128
  - Example unexpected values: `["A-10291", "A-10444", "A-10444", "A-10902"]`
- `expect_column_values_to_not_be_null` on `customer_id`: FAILED
  - Null percentage: 2.7% (threshold: 0%)
In practice, this points to common root causes:
- a late-arriving incremental load reprocessed yesterday’s partition (duplicates)
- a join key changed upstream or a dimension table is missing keys (null foreign keys)
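Under the hood, GE emits these failures as a structured validation result; an abridged sketch of the shape an alert handler might parse (field names follow the standard result schema, values are illustrative):
```python
# Abridged sketch of a validation result for the duplicate-key failure above.
validation_result = {
    "success": False,
    "statistics": {"evaluated_expectations": 4, "unsuccessful_expectations": 2},
    "results": [
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_unique",
                "kwargs": {"column": "order_id"},
            },
            "result": {
                "unexpected_count": 128,
                "partial_unexpected_list": ["A-10444", "A-10902"],
            },
        },
        # ... one entry per expectation, including the customer_id null failure
    ],
}
```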
How the alert gets wired (one clean approach)
A simple, production-friendly pattern:
- Orchestrator runs checkpoint (Airflow/Dagster task).
- Task fails (fail-fast) for critical datasets, or continues (warn-only) for non-critical datasets.
- Alert handler receives a webhook (from the custom `WebhookAction`) and forwards:
  - Slack message to `#data-oncall`
  - PagerDuty incident if severity is high or repeated
- Data Docs are updated so the alert links to a human-readable report.
If you prefer keeping all alert logic in the orchestrator, the checkpoint task can raise on failure and the orchestrator can own notifications. The key is consistency: one clear owner for paging, one clear place to view details.
CI/CD for Data: How to Shift Left with Great Expectations
Teams often wait until production to detect data issues. Great Expectations can help you “shift left”:
What to validate in CI
- Run expectation suites against sample or staged datasets
- Validate schema changes before merge
- Confirm transformations still produce expected output shapes
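A minimal CI gate can be as simple as re-running a few key expectations against a small committed sample and failing the build on any miss. A sketch using the legacy pre-1.0 pandas API; the fixture path and specific checks are illustrative, and a larger setup would run the full suite via a checkpoint instead.
```python
# Sketch: a CI data-contract check against a sample extract (legacy pre-1.0 API).
import sys

import great_expectations as ge

sample = ge.read_csv("ci/fixtures/fct_orders_sample.csv")  # placeholder fixture

checks = [
    sample.expect_column_values_to_be_unique("order_id"),
    sample.expect_column_values_to_not_be_null("customer_id"),
    sample.expect_column_values_to_be_between("order_total", min_value=0),
]

failed = [check for check in checks if not check.success]
if failed:
    for check in failed:
        print(check)
    sys.exit(1)  # block the merge until the data contract is satisfied
```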
Why this matters
- Faster feedback loops
- Fewer production incidents
- More confidence in data releases
Pro move: Add a “data contract” step in PR checks, especially when upstream APIs or source tables change.
Observability and Alerting: Don’t Just Validate, Respond
Production-grade validation needs monitoring:
- Store validation results somewhere queryable (warehouse table, logs, monitoring system)
- Alert on failures (Slack, PagerDuty, email)
- Track recurring issues and assign owners
Common alert patterns
- Alert immediately for fail-fast datasets
- Daily digest for warn-only datasets
- Escalate if the same expectation fails repeatedly
Validation without operational response becomes noise. The goal is actionable signals.
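To make results queryable, one approach is to flatten each checkpoint run into rows for a small monitoring table in the warehouse. A sketch against the standard checkpoint result object; the actual insert/load step is omitted and left to your stack.
```python
# Sketch: flattening a CheckpointResult into rows for a monitoring table.
def checkpoint_result_to_rows(checkpoint_result) -> list[dict]:
    rows = []
    for validation_result in checkpoint_result.list_validation_results():
        for res in validation_result.results:
            config = res.expectation_config
            rows.append(
                {
                    "expectation_type": config.expectation_type,
                    "column": config.kwargs.get("column"),
                    "success": res.success,
                    "unexpected_count": res.result.get("unexpected_count"),
                }
            )
    return rows
```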
Common Pitfalls (and How to Avoid Them)
Pitfall 1: Too many expectations too soon
Start small:
- 5–15 high-value expectations per critical dataset
- Expand after you’ve proven the workflow
Pitfall 2: Hard-coded thresholds that become outdated
Use rolling baselines or percent-based thresholds where possible.
Pitfall 3: Treating all failures as equal
Classify expectations by severity:
- Critical (must pass)
- High (alert + investigate)
- Informational (trend tracking)
Pitfall 4: No ownership
Every dataset should have an owner who understands:
- What “good” looks like
- Who gets alerted
- What the remediation path is
A Practical Rollout Plan (That Won’t Overwhelm Your Team)
Phase 1: Foundation (1–2 weeks)
- Pick 1–3 critical datasets
- Define key expectations (schema, nulls, uniqueness, freshness)
- Run in warn-only mode
- Publish Data Docs
Phase 2: Operationalize (2–4 weeks)
- Add validations to orchestration
- Configure alerting and routing
- Add basic CI checks for schema changes
Phase 3: Expand and mature (ongoing)
- Add drift/distribution checks
- Add multi-table integrity rules
- Create reusable expectation templates
- Promote critical suites to fail-fast
Build Pipelines People Can Trust (Without Slowing Delivery)
Reliable data is a product: it needs tests, clear contracts, and a response plan when something breaks. Great Expectations gives you a practical way to formalize assumptions, run them automatically at the right points in the pipeline, and make failures visible to the people who can fix them.
If you’re already operating Airflow/Dagster/dbt jobs in production, the fastest path to value is straightforward: pick one high-impact table, add a small expectation suite, run it as a checkpoint after transformations, and wire failures into the same alerting flow you use for pipeline uptime.
FAQ: Great Expectations for Production Data Pipelines
1) What is Great Expectations used for?
Great Expectations is used for data quality validation: defining rules (expectations) and automatically testing datasets in pipelines to ensure they meet requirements like schema, completeness, ranges, uniqueness, and distribution stability.
2) Is Great Expectations only for data warehouses?
No. Great Expectations can validate data in many environments, including data lakes, warehouses, Spark-based processing, and batch/ELT workflows. The best fit is anywhere you can reliably access data to run checks close to where transformations happen.
3) Should validations run before or after transformations?
Ideally both, but prioritize based on risk:
- Before transforms: catch upstream schema/format issues early
- After transforms: verify business logic outputs (keys, metrics, integrity)
Many teams get the most value by validating curated “gold” datasets that power BI and ML.
4) How many expectations should I create per dataset?
Start with a small, high-impact set, often 5 to 15 expectations, focused on critical columns and invariants. Add more once the workflow is stable and alerts are actionable (not noisy).
5) What should happen when an expectation fails in production?
It depends on severity:
- Fail-fast for critical reporting or regulatory datasets
- Quarantine for pipelines that can continue but should isolate bad partitions
- Warn-only during rollout or for non-critical checks
The key is to have a consistent policy and a clear remediation path.
6) Can Great Expectations replace dbt tests?
Not exactly. dbt tests are excellent for common constraints (unique, not null, relationships). Great Expectations often complements dbt by covering:
- Distribution and drift checks
- More flexible thresholds
- Complex validation patterns
- Rich documentation output (Data Docs)
7) How do I avoid flaky validations caused by natural data variation?
Use tolerance-based expectations:
- Percent thresholds instead of absolute counts
- Rolling baselines (e.g., compare to trailing averages)
- Severity levels (critical vs informational)
This reduces false positives while still catching meaningful anomalies.
8) Is Great Expectations suitable for real-time streaming pipelines?
Great Expectations is most commonly used in batch and micro-batch workflows. For true streaming, teams often validate at ingestion boundaries (micro-batches) or validate sampled data, complemented by streaming-native monitoring (metrics, logs, and alerts) on the pipeline itself.
9) What’s the best way to operationalize Great Expectations?
Operationalizing usually includes:
- Running checkpoints inside your orchestrator (Airflow/Dagster/etc.)
- Publishing Data Docs
- Centralizing results and building alerting
- Adding CI checks for schema-breaking changes
This turns validation into a reliable, repeatable production control. For a deeper operational blueprint, see automated data testing with Apache Airflow and Great Expectations.
10) How long does it take to implement Great Expectations in an existing pipeline?
A basic production rollout for a few key datasets can often be done in 1–2 weeks, with additional time to operationalize alerting, CI/CD integration, and drift detection as you scale across more domains. If you also need traceability for compliance and incident response, pair validation with data pipeline auditing and lineage.








