
If you’re running Apache Airflow on Kubernetes, rolling out an MLflow tracking server, or maintaining a fleet of model-serving deployments, you’ve probably felt the pain: one “small” change to a Helm value or secret reference can break schedulers, stall workers, or put production out of sync with staging.
That’s where GitOps with Argo CD pays off. Instead of “deploying by hand,” you operate your platform the same way strong Kubernetes teams run web services: versioned, reviewed, continuously reconciled, and easy to roll back.
Below you’ll find practical patterns: Argo CD Application YAML, a GitOps repo tree (Helm/Kustomize), and a multi-cluster app-of-apps setup, plus concrete setup steps and caveats teams usually discover the hard way.
Why Data & AI Pipelines Need Better Deployment Practices
Data and AI pipelines aren’t like traditional web apps. They typically involve:
- Multiple components (orchestrators, workers, feature stores, model servers, monitoring)
- Event-driven or scheduled workloads (batch and streaming)
- Environment complexity (dev/stage/prod, multiple clusters, GPUs, different node pools)
- Frequent updates (model versions, preprocessing logic, parameters, dependencies)
- Strong compliance and traceability needs (who changed what and when)
If deployments are done manually (or by ad-hoc scripts), teams often face:
- Configuration drift between environments
- Hard-to-reproduce bugs due to unclear versioning
- Slow rollbacks when a pipeline breaks
- Limited auditability, which is risky for regulated industries
- Scaling pain as the number of services and pipelines grows
This is exactly the problem space where GitOps becomes a major advantage.
GitOps in Plain English (and Why It Works)
GitOps is an operational model where:
- Git is the single source of truth for desired system state.
- Changes happen via pull requests (PRs).
- An automated agent continuously reconciles the real environment with what’s in Git.
Instead of “deploying” by pushing commands to a cluster, you declare desired state (Kubernetes YAML, Helm values, Kustomize overlays, etc.) and a controller makes the cluster match it.
Core GitOps Principles
- Declarative configuration: define infrastructure and apps as code
- Version control: every change is tracked in Git
- Automated reconciliation: drift is detected and corrected
- Review + approval workflow: PRs enforce standards and reduce risk
Applied to Airflow, MLflow, and model serving, this creates a consistent path from dev to prod, without mystery changes in the cluster.
What Is Argo CD?
Argo CD is a GitOps continuous delivery tool for Kubernetes. It watches a Git repository and ensures that the Kubernetes cluster matches what’s defined there.
What Argo CD Does Well
- Syncs Kubernetes manifests from Git to cluster (manual or automated)
- Detects drift and shows what differs between Git and runtime
- Supports Helm, Kustomize, and plain YAML
- Offers a clear UI and robust RBAC
- Enables multi-cluster deployments
- Makes rollbacks auditable (revert Git; Argo CD reconciles)
If you’re deploying Airflow, Spark, Ray, MLflow, KServe/Seldon, or supporting services on Kubernetes, Argo CD becomes the control plane for “what should be running” and “who changed it.”
Where Argo CD Fits in DataOps and MLOps
Argo CD is not a pipeline orchestrator (that’s typically Airflow, Dagster, Prefect, Argo Workflows, etc.). Instead, it manages the platform and application components that pipelines rely on.
Common workloads where Argo CD makes a difference:
1) Orchestrators and Schedulers
- Apache Airflow (Helm-based deployments are common)
- Dagster deployments
- Prefect agents and workers
- Argo Workflows controllers
Argo CD keeps scheduler/webserver/worker configuration consistent across environments and makes upgrades predictable.
2) Data Processing Services
- Spark on Kubernetes (operator + SparkApplications)
- Flink clusters
- Kafka operators/connectors
- dbt runners packaged as jobs
Argo CD helps standardize resource policies and environment-specific settings that otherwise diverge quickly.
3) MLOps Components
- MLflow tracking server (backend store + artifact store wiring)
- Feature stores (e.g., Feast)
- Model serving (KServe, Seldon, custom inference APIs)
- GPU-enabled inference deployments
Here GitOps shines: reproducible infra and safer changes when you’re iterating on models quickly.
4) Observability & Monitoring
- Prometheus + Grafana
- Loki, Tempo, OpenTelemetry collectors
- Model monitoring integrations (custom metrics, drift dashboards)
Managing observability with the same workflow as the rest of the platform prevents “it only exists in prod” surprises, especially when paired with modern observability stacks such as Sentry, Grafana, and OpenTelemetry.
A Practical GitOps Architecture for Data & AI Teams
A clean approach is to split responsibilities across repositories and environments.
Recommended Repository Structure
Option A: Single repo (simple to start)
```text
/apps/airflow
/apps/mlflow
/apps/model-serving
/infra/monitoring
/environments/dev
/environments/prod
```
Option B: Two-repo approach (common at scale)
- App repo(s): code, Dockerfiles, tests, CI pipelines
- GitOps repo: Kubernetes manifests, Helm values, Kustomize overlays, env configs
The two-repo approach is often preferred because:
- Access control is cleaner
- Promotion to prod is explicit
- Platform teams can manage infra safely
A more concrete GitOps repo tree (Helm + Kustomize)
Here’s a practical structure many teams use:
```text
gitops/
  clusters/
    dev-us-east-1/
      apps/                          # "app-of-apps" entry point
        kustomization.yaml
        airflow-app.yaml
        mlflow-app.yaml
        monitoring-app.yaml
    prod-us-east-1/
      apps/
        kustomization.yaml
        airflow-app.yaml
        mlflow-app.yaml
        monitoring-app.yaml
  apps/
    airflow/
      values/
        dev.yaml
        prod.yaml
    mlflow/
      base/
        deployment.yaml
        service.yaml
        ingress.yaml
      overlays/
        dev/
          kustomization.yaml
          configmap-patch.yaml
        prod/
          kustomization.yaml
          configmap-patch.yaml
  projects/
    data-platform-project.yaml       # Argo CD Project (RBAC boundaries)
```
Environment Promotion Flow (Typical)
- Merge a PR updating image tag or Helm/Kustomize config in dev
- Argo CD syncs dev automatically
- After validation, a PR promotes the same change to staging
- A controlled PR promotes to prod
To keep promotion clean, many teams avoid “hand-editing prod” and instead promote via:
- a PR that updates the relevant `values/prod.yaml` under `apps/`, or
- a PR that bumps a shared version variable, or
- a PR that updates the `targetRevision` for a prod Application (when using release tags)
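In practice, a promotion PR is often a one-line change. A hypothetical fragment of `apps/airflow/values/prod.yaml` (field names follow the official Airflow chart's `images` block, but verify against your chart version):

```yaml
# apps/airflow/values/prod.yaml (fragment)
images:
  airflow:
    repository: your-registry/airflow-custom   # hypothetical registry/image
    tag: 2.9.3-build.142                       # the promotion PR bumps only this line
```

Because the diff is one line, reviews are fast and the audit trail stays readable.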
How Argo CD Handles Drift, Rollbacks, and Reliability
Drift Detection
Argo CD constantly compares:
- Desired state (Git)
vs
- Live state (cluster)
If someone changes a resource manually (kubectl edit, hotfix patches), Argo CD flags the app as OutOfSync, and can optionally auto-reconcile it back to Git.
Rollbacks
Rollbacks are usually: revert the Git commit (or restore the previous Helm values) and let Argo CD reconcile the cluster back to a known-good state.
For platforms where a small config change can break scheduled jobs or model endpoints, this is the difference between a clean recovery and a long outage.
Best Practices: GitOps + Argo CD for Data and AI Pipelines
1) Treat Config as a Product
Define clear ownership and standards:
- naming conventions for apps and namespaces
- resource limits/requests (especially for Spark/ML jobs)
- secrets strategy (more below)
- logging and metrics requirements
2) Use Kustomize or Helm for Multi-Environment Variations
Common environment differences:
- replica counts
- node selectors (GPU vs CPU)
- S3 buckets / data lake endpoints
- feature flags
- job schedules
Kustomize overlays or Helm values per environment both work well; pick one per component to reduce cognitive load.
Practical tip: if you deploy a vendor chart (like Airflow), Helm values are often simplest. If you own the manifests (like a custom MLflow deployment), Kustomize overlays tend to stay readable.
3) Avoid “Tag Drift”: Pin Image Versions
Use immutable tags (or digests):

- `my-model-server:1.4.2` (good)
- `my-model-server:latest` (risky)

This improves reproducibility and makes rollbacks reliable, especially for model servers tied to a specific artifact/schema.
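As a sketch, pinning can happen either by immutable tag or, stricter still, by digest. Image and digest below are illustrative, not real:

```yaml
# Deployment fragment: pin the model server image explicitly
spec:
  template:
    spec:
      containers:
        - name: model-server
          # Immutable version tag (good):
          image: my-model-server:1.4.2
          # Or pin by digest for full immutability (illustrative digest):
          # image: my-model-server@sha256:4f5c...
```

Digest pinning also defeats tag mutation (someone re-pushing `1.4.2` with different contents), at the cost of less readable diffs.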
4) Plan for Secrets: Don’t Store Them in Git
GitOps does not mean “put secrets in Git.” Common patterns include:
- External Secrets Operator + AWS Secrets Manager / GCP Secret Manager / Azure Key Vault
- Sealed Secrets (encrypted secrets committed to Git)
- Vault integrations (Vault Agent Injector / CSI driver)
Implementation detail that matters: Argo CD should deploy references to secrets (ExternalSecret/SealedSecret), not raw credentials. Also decide early whether secrets live per-namespace or per-environment; inconsistency here causes most “works in dev, fails in prod” incidents.
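As a minimal sketch of the “references, not credentials” pattern, here is an ExternalSecret that pulls an MLflow database password from a cloud secrets manager. Names, the store reference, and the remote path are hypothetical, and the API version depends on your External Secrets Operator release:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mlflow-db-credentials
  namespace: mlflow
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager        # a (Cluster)SecretStore you define separately
    kind: ClusterSecretStore
  target:
    name: mlflow-db-credentials      # the Kubernetes Secret the operator creates
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/mlflow/db          # path in the secrets manager (hypothetical)
        property: password
```

Only this reference lives in Git; the actual password never does.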
5) Introduce Progressive Delivery for Model Serving
For inference services, consider:
- canary releases (5% → 50% → 100%)
- blue/green deployments
- automated rollback based on metrics
Argo CD reconciles desired state; Argo Rollouts implements safe rollout strategies.
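A minimal Argo Rollouts canary sketch for an inference service, assuming the image name and pause durations are placeholders you would tune to your traffic and metrics:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: model-server
spec:
  replicas: 4
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model-server
          image: my-model-server:1.4.2   # hypothetical image
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 10m}         # watch latency/error metrics before widening
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
```

Argo CD syncs this Rollout from Git like any other resource; the Rollouts controller then executes the staged traffic shift.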
6) Lock down access with Argo CD Projects, RBAC, and SSO (often missed)
“Who can sync what” matters as much as “what is deployed.”
- Use Argo CD Projects to define allowed destinations (clusters/namespaces) and allowed source repos.
- Configure RBAC so data engineers can manage application namespaces, while platform admins own cluster-level operators.
- Integrate SSO (OIDC via Okta/Azure AD/Google, or Dex) to avoid shared admin access.
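A sketch of the RBAC mapping in `argocd-rbac-cm`, assuming hypothetical role and SSO group names (and that your SSO integration exposes a groups claim):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # p, <subject>, <resource>, <action>, <object>, <effect>
    p, role:data-eng, applications, get,  data-platform/*, allow
    p, role:data-eng, applications, sync, data-platform/*, allow
    # map an SSO group to the role (group name is hypothetical)
    g, your-org:data-engineering, role:data-eng
```

With `policy.default: role:readonly`, everyone can see state, but only mapped groups can sync their project's apps.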
Real-World Examples of GitOps for Data & AI Workloads
Example 1: Updating an Airflow DAG Runtime Image
Instead of patching a cluster manually:
- Update the Airflow Helm values in Git (image tag/config)
- Merge PR after review
- Argo CD syncs and redeploys
- If something breaks, revert the commit
Result: consistent environments and a clear audit trail.
Example 2: Promoting a New Model Version to Production
A model inference service might use:
MODEL_VERSION=2026-01-10IMAGE_TAG=2.3.0- CPU/MEM tuned for the new model
With GitOps:
- dev → staging → prod promotions are explicit PRs
- rollbacks are predictable
- changes are reviewable and auditable
Example 3: Managing Spark Job Configurations
Spark job parameters (executors, memory, shuffle configs) frequently change.
GitOps ensures:
- config changes are tracked
- performance tuning is reproducible
- teams can compare what changed between runs
Common Challenges (and How to Solve Them)
“Our pipeline code is in Git-why do we need GitOps too?”
Because GitOps governs deployment state, not just source code. You can have perfect pipeline code and still break production with drift, unreviewed Helm changes, or an operator upgrade that wasn’t recorded.
“What about frequent experiments in ML?”
Use namespaces and environment patterns:
- ephemeral namespaces per experiment branch
- automated cleanup policies
- separate Argo CD Applications for experiment stacks
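One way to automate ephemeral experiment stacks is an Argo CD ApplicationSet with the pull-request generator, which creates an Application per open PR and prunes it when the PR closes. Repo names and paths below are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: ml-experiments
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: your-org
          repo: ml-experiments          # hypothetical experiment repo
        requeueAfterSeconds: 300
  template:
    metadata:
      name: "exp-{{number}}"            # one Application per open PR
    spec:
      project: data-platform
      source:
        repoURL: https://github.com/your-org/gitops.git
        targetRevision: main
        path: apps/experiment-stack     # hypothetical shared overlay
      destination:
        server: https://kubernetes.default.svc
        namespace: "exp-{{number}}"     # ephemeral namespace per experiment
      syncPolicy:
        automated:
          prune: true
        syncOptions:
          - CreateNamespace=true
```

When the PR merges or closes, the generator drops the entry and Argo CD cleans up the experiment stack.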
“We have multiple clusters-does Argo CD scale?”
Yes. Multi-cluster management is a common Argo CD use case. You can manage different clusters/regions with separate Applications or the app-of-apps pattern.
Implementation Details: Step-by-Step Setup (Argo CD + Projects/RBAC + Promotion)
This section fills in the “missing middle” between architecture and YAML.
Step 1: Install Argo CD
Most teams install via Helm:
- Docs: https://argo-cd.readthedocs.io/en/stable/getting_started/
Example (Helm-based) flow:
- Create the
argocdnamespace - Install the Argo CD chart
- Expose the API server (Ingress/LoadBalancer)
- Enable SSO early if you have it (OIDC/Dex)
Caveat: lock down the initial admin credentials immediately; don’t leave a long-lived admin password in a shared channel or wiki.
Step 2: Register clusters (for multi-cluster)
Argo CD can manage:
- the same cluster it runs in (in-cluster), and/or
- external clusters (staging/prod in separate accounts)
Docs: https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#clusters
In practice:
- register each target cluster,
- confirm it appears under Settings → Clusters,
- then use
destination.namein Applications (as shown later).
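For declarative registration, an external cluster is just a Secret with a special label, following the format in the Argo CD declarative-setup docs. Server URL and credentials below are placeholders; in practice, register with `argocd cluster add` or inject the token from a secrets manager rather than committing it:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # tells Argo CD this is a cluster
type: Opaque
stringData:
  name: prod-us-east-1                        # used as destination.name
  server: https://prod-api.example.com        # hypothetical API server URL
  config: |
    {
      "bearerToken": "<injected-from-secrets-manager>",
      "tlsClientConfig": { "insecure": false, "caData": "<base64-ca-cert>" }
    }
```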
Step 3: Define an Argo CD Project (boundaries + guardrails)
Projects are where you restrict:
- which repos can be used
- which clusters/namespaces are allowed
- which resource kinds can be created (optional but powerful)
Reference: https://argo-cd.readthedocs.io/en/stable/user-guide/projects/
Your repo already includes `projects/data-platform-project.yaml`; ensure it defines:
- `sourceRepos` (your GitOps repo(s) and any approved chart repos)
- `destinations` (dev/stage/prod clusters + namespaces)
- (optional) `clusterResourceWhitelist` / `namespaceResourceWhitelist`
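A minimal AppProject sketch covering those three fields, with repo URLs and cluster names carried over from the examples in this article (adapt to your own):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: data-platform
  namespace: argocd
spec:
  description: Data & AI platform components
  sourceRepos:
    - https://github.com/your-org/gitops.git
    - https://airflow.apache.org              # approved chart repo
  destinations:
    - server: https://kubernetes.default.svc
      namespace: airflow
    - server: https://kubernetes.default.svc
      namespace: mlflow
    - name: prod-us-east-1                    # registered cluster by name
      namespace: "*"
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace                          # allow namespace creation, nothing else cluster-wide
```

Any Application in this project that points at a different repo or destination is rejected at sync time.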
Step 4: RBAC and least privilege
Docs: https://argo-cd.readthedocs.io/en/stable/operator-manual/rbac/
Typical approach:
- map SSO groups to Argo CD roles (e.g.,
data-engineering,ml-engineering,platform-admin) - grant “sync” rights only for the namespaces/apps they own
- keep cluster-wide operators gated to platform admins
Step 5: CI promotion mechanics (a practical default)
A simple, reliable pattern:
- CI builds/pushes an image on merge to
main - CI opens a PR to the GitOps repo bumping the image tag in
dev - after validation, CI (or a human) promotes by PR to
staging, thenprod
This keeps “promotion” a Git change, not an imperative deploy step. If you want stronger traceability, combine GitOps promotion with data pipeline auditing and lineage so every change is provable end-to-end.
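The CI half of this loop can be sketched as a GitHub Actions workflow. The action names (`actions/checkout`, `peter-evans/create-pull-request`) are real, but the repo, file path, and values key are hypothetical, and this assumes `yq` (v4) is available on the runner:

```yaml
# .github/workflows/promote-dev.yaml (sketch)
name: promote-to-dev
on:
  push:
    branches: [main]
jobs:
  bump-dev-tag:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: your-org/gitops          # the GitOps repo, not the app repo
          token: ${{ secrets.GITOPS_PAT }}     # PAT with write access (hypothetical secret)
      - name: Bump image tag in dev values
        run: |
          yq -i '.image.tag = "${{ github.sha }}"' apps/model-serving/values/dev.yaml
      - uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: promote model-server ${{ github.sha }} to dev"
          branch: promote/dev-${{ github.sha }}
```

The key property: CI never talks to the cluster; it only proposes a Git change that Argo CD will reconcile.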
Implementation Details: Sample Argo CD Applications (including app-of-apps + multi-cluster)
Below are minimal examples you can adapt.
1) Argo CD “app-of-apps” for a cluster (recommended)
One root Application per cluster/environment, pulling in platform apps (Airflow, MLflow, monitoring, etc.):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: data-platform-dev
  namespace: argocd
spec:
  project: data-platform
  source:
    repoURL: https://github.com/your-org/gitops.git
    targetRevision: main
    path: clusters/dev-us-east-1/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
2) Argo CD Application for Airflow (Helm)
A common setup using Helm values per environment:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-dev
  namespace: argocd
spec:
  project: data-platform
  source:
    repoURL: https://airflow.apache.org
    chart: airflow
    targetRevision: 1.13.0
    helm:
      valueFiles:
        - https://raw.githubusercontent.com/your-org/gitops/main/apps/airflow/values/dev.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Important caveat (Helm values sourcing): referencing values via a raw GitHub URL can work, but many teams prefer keeping the values alongside the Application source (same repo) to avoid availability/permissions issues and to keep everything auditable in one place. Another common approach is to use a chart from a Helm repo while keeping values in your GitOps repo and referencing them by path (Argo CD supports “Helm from Git” patterns as well). If you’re standardizing pipeline deployments, it also helps to align the GitOps repo layout with modern ELT practices, such as moving from ETL to ELT with Airbyte and dbt.
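The “chart from a Helm repo, values from Git” pattern can be expressed with Argo CD’s multiple-sources feature (available in Argo CD 2.6+); the `ref: values` source is exposed to the chart source as `$values`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-dev
  namespace: argocd
spec:
  project: data-platform
  sources:
    - repoURL: https://airflow.apache.org
      chart: airflow
      targetRevision: 1.13.0
      helm:
        valueFiles:
          - $values/apps/airflow/values/dev.yaml   # resolved from the Git source below
    - repoURL: https://github.com/your-org/gitops.git
      targetRevision: main
      ref: values                                  # referenced as $values above
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
```

This keeps the vendor chart pinned while the values stay reviewable in your own repo.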
3) Argo CD Application for MLflow (Kustomize overlay)
Dev/prod often differ by backend store, ingress hostnames, and artifact store config:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow-prod
  namespace: argocd
spec:
  project: data-platform
  source:
    repoURL: https://github.com/your-org/gitops.git
    targetRevision: main
    path: apps/mlflow/overlays/prod
  destination:
    name: prod-us-east-1   # Argo CD cluster secret name
    namespace: mlflow
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
> Tip: For multi-cluster setups, prefer `destination.name` (registered cluster) over `destination.server`, and keep cluster-specific roots under `clusters/`.
Wrap-up: A More Reliable Delivery Path for Data & AI on Kubernetes
When you manage Airflow, MLflow, model serving, and observability components across environments, the hard part isn’t “deploying once”; it’s keeping everything consistent, reviewable, and recoverable over time.
GitOps with Argo CD helps by turning deployments into a repeatable workflow:
- desired state is versioned in Git,
- changes are promoted via PRs,
- drift is visible (and correctable),
- rollbacks are a revert, not a firefight.
If you adopt only one habit first, make it this: treat configuration changes (values, overlays, secrets references) with the same rigor as code: PRs, reviews, and clear promotion between environments.
FAQ: Argo CD and GitOps for Data & AI Pipelines
1) What is GitOps, and how is it different from traditional CI/CD?
Traditional CI/CD often “pushes” deployments to environments (a pipeline runs commands against a cluster). GitOps “pulls” desired state from Git using a controller like Argo CD. Git becomes the source of truth, and reconciliation ensures environments match Git continuously.
2) Is Argo CD only for Kubernetes?
Argo CD is designed specifically for Kubernetes deployments. It manages Kubernetes resources defined as YAML, Helm charts, or Kustomize overlays. If your data or ML stack isn’t on Kubernetes, you’d use different deployment tooling.
3) Should data pipelines themselves be deployed with Argo CD?
Argo CD typically deploys the platform components (Airflow, workers, model servers, monitoring, operators). The pipeline definitions (e.g., DAG files, code) are usually delivered via container images or mounted artifacts, still managed through GitOps by updating image tags or chart values.
4) How do you handle secrets in GitOps without exposing credentials?
Use one of these secure patterns:
- External Secrets Operator connected to a cloud secrets manager
- Sealed Secrets (encrypted secrets stored in Git)
- Vault integrations
These approaches keep sensitive values out of plain text in repositories.
5) What’s the best way to structure Git repositories for Argo CD?
A common pattern is:
- Application repo: source code, tests, Docker build, CI
- GitOps repo: Kubernetes manifests/Helm/Kustomize per environment
This separation improves security and keeps deployments clean and auditable.
6) Can Argo CD manage multiple environments like dev, staging, and prod?
Yes. You can model environments with:
- separate folders and overlays in one repo
- separate branches (less common for GitOps)
- separate repos
Argo CD applications can target different namespaces or clusters for each environment.
7) How do rollbacks work with Argo CD?
Rollbacks are typically done by reverting a Git commit (or rolling back Helm values). After the revert, Argo CD syncs the cluster back to the previous known-good state. This makes rollbacks more consistent than manual patching.
8) Does GitOps slow teams down when they need quick changes?
In practice, GitOps speeds teams up after adoption because:
- changes are standardized and repeatable
- fewer incidents happen due to drift
- troubleshooting is easier (everything is versioned)
For emergencies, you can still use controlled procedures, but GitOps reduces how often you need them.
9) What’s the difference between Argo CD and Argo Workflows?
- Argo CD: continuous delivery and GitOps reconciliation for Kubernetes resources
- Argo Workflows: workflow execution engine for Kubernetes (run jobs/DAGs)
Many teams use both: Workflows for running data/ML tasks, and Argo CD for managing the infrastructure and workflow definitions.
10) What are the biggest wins of using Argo CD for ML model serving?
The biggest benefits are:
- consistent deployments across environments
- easier version tracking (model version + infra config)
- safer rollouts and faster rollbacks
- improved auditability for compliance and governance
References
- Argo CD docs: https://argo-cd.readthedocs.io/
- Argo CD Projects: https://argo-cd.readthedocs.io/en/stable/user-guide/projects/
- Argo CD RBAC: https://argo-cd.readthedocs.io/en/stable/operator-manual/rbac/
- Declarative cluster management: https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#clusters







