Introduction
When I started using Terraform and Kubernetes together, one question kept coming up: where does Terraform stop and something else begin?
The easy answer is to let Terraform manage everything. It can install Helm charts, create Kubernetes resources, manage namespaces. But Terraform tracks state as a snapshot. It knows what it deployed, but has no idea what is actually running. If a deployment crashes, if pods are stuck pending, if someone deletes something manually, Terraform does not know and does not care. Its state file says everything is fine, so everything is fine.
ArgoCD was built for exactly that problem. It continuously compares your cluster against a desired state in Git and keeps the two in sync. If something drifts, ArgoCD corrects it. If a pod crashes, Kubernetes restarts it and ArgoCD reports the application as degraded until it recovers. Terraform will never do that.
So the split I landed on is this: Terraform provisions the infrastructure, ArgoCD manages everything inside the cluster. The interesting part is how you connect the two cleanly, and that is what this post is about.
What Terraform Owns
Terraform handles everything that needs to exist before the cluster is useful:
- VPC – subnets, NAT gateway, internet gateway, route tables
- EKS cluster – control plane, node groups, OIDC provider
- IAM roles – one per controller, all using IRSA so no static credentials ever touch the cluster
- KMS key and Secrets Manager entries – encrypted at rest, path-namespaced per environment
- ECR repositories – with lifecycle policies so old images do not accumulate
- ArgoCD – installed via a single helm_release, and this is where Terraform stops
That last point matters. Terraform installs ArgoCD, and that is the last thing it does inside the cluster. Everything that comes after, controllers, drivers, application workloads, is ArgoCD’s job.
Why Terraform Stops at ArgoCD
When Terraform runs a helm_release, it judges the result once, at apply time. By default the Helm provider waits for the chart's resources to report ready before marking the release complete, but that check is bounded by a timeout and is never repeated. Once the apply finishes, Terraform cannot tell the difference between installed and still running.
There is also no reconciliation loop. Terraform applies once and moves on. If a controller crashes an hour later, Terraform will not notice. If someone edits a ConfigMap manually, Terraform will not revert it. Drift is invisible unless you run a plan again.
ArgoCD runs a continuous loop comparing what is in Git to what is in the cluster, and corrects any difference automatically. That is the behaviour you want for everything running inside Kubernetes.
So: Terraform for the infrastructure that needs to exist, ArgoCD for everything that runs on top of it.
Where Ansible Fits In
Before getting into the code, it is worth understanding why Ansible is involved at all.
Terraform finishes provisioning the EKS cluster and hands off to the next step. But at that point, the cluster exists and ArgoCD is installed via Helm, and that is all Terraform knows. It cannot tell you whether the ArgoCD pods are actually running, and it cannot apply Kubernetes manifests cleanly without reaching for raw kubectl in a local-exec block.
Raw kubectl in a shell script works, but it is fragile. You end up writing sleep commands, polling loops, and process substitution hacks just to pass a CA cert. It gets messy fast.
Ansible sits right in that gap. It has native Kubernetes modules that know how to wait for a deployment to become available without any of that. It speaks Terraform’s language on one side – it gets called from a local-exec and receives the cluster credentials as variables – and it speaks Kubernetes on the other side, applying the root app manifest once the cluster is actually ready.
It is not doing anything magical. It is just the cleanest way to cross that boundary without turning the local-exec into a bash nightmare.
The Handoff
The EKS Addons module is where the handoff happens. It installs ArgoCD via Helm, then once ArgoCD is ready, applies the root application that kicks everything else off.
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}

resource "null_resource" "argocd_root_app" {
  depends_on = [helm_release.argocd]

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      # Fail the apply if any step fails, including the playbook
      set -euo pipefail

      CA_CERT_FILE=$(mktemp /tmp/eks-ca-XXXXXX.crt)
      # Remove the temp file on exit, even when the playbook fails
      trap 'rm -f "$CA_CERT_FILE"' EXIT
      echo "${var.cluster_ca_certificate}" | base64 -d > "$CA_CERT_FILE"

      # repo_root is passed through because the playbook uses it to find the root app manifest
      ansible-playbook "${var.repo_root}/ansible/argocd-bootstrap.yml" \
        -e "repo_root=${var.repo_root}" \
        -e "cluster_endpoint=${var.cluster_endpoint}" \
        -e "cluster_ca_cert_path=$CA_CERT_FILE" \
        -e "cluster_token=${data.aws_eks_cluster_auth.main.token}" \
        -e "env=${var.environment}"
    EOT
  }
}
The depends_on ensures the local-exec only runs after the Helm release completes. I initially reached for raw kubectl commands here, but switched to an Ansible playbook. Ansible has proper Kubernetes modules that handle waiting for readiness natively, which is much cleaner than writing sleep loops in bash.
The CA cert gets written to a temp file because Ansible’s Kubernetes modules expect a file path rather than a raw string. It gets cleaned up right after the playbook finishes.
The Ansible Playbook
The playbook lives in ansible/argocd-bootstrap.yml and does two things: waits for the ArgoCD server to be healthy, then applies the root app manifest.
- name: Bootstrap ArgoCD root application
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Wait for ArgoCD server deployment to be available
      kubernetes.core.k8s_info:
        api_version: apps/v1
        kind: Deployment
        name: argocd-server
        namespace: argocd
        host: "{{ cluster_endpoint }}"
        ca_cert: "{{ cluster_ca_cert_path }}"
        api_key: "{{ cluster_token }}"
        wait: true
        wait_condition:
          type: Available
          status: "True"
        wait_timeout: 180

    - name: Apply ArgoCD root application
      kubernetes.core.k8s:
        state: present
        # "env" is the extra var name passed by the local-exec command
        src: "{{ repo_root }}/argocd/bootstrap/root-app-{{ env }}.yaml"
        host: "{{ cluster_endpoint }}"
        ca_cert: "{{ cluster_ca_cert_path }}"
        api_key: "{{ cluster_token }}"
The k8s_info module blocks until the Available condition is true or the timeout is hit. Only then does the k8s module apply the root app. No polling loops, no sleep commands, no guessing.
Before running, install the required collection:
ansible-galaxy collection install -r ansible/requirements.yml
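The contents of ansible/requirements.yml are not shown in this post. A minimal sketch, assuming the playbook needs nothing beyond the kubernetes.core collection used above (the real file may pin versions or list more collections):

```yaml
# ansible/requirements.yml (sketch, not copied from the repo)
collections:
  - name: kubernetes.core
```

The kubernetes.core modules also need the Python kubernetes client installed on the machine that runs the playbook.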
How the Playbook Gets Cluster Permissions
The playbook runs on the same machine as Terraform, your laptop or the GitHub Actions runner. It does not use a kubeconfig file at all. The cluster endpoint, CA cert, and token are passed directly as extra vars.
The token comes from data.aws_eks_cluster_auth, which exchanges your current IAM identity for a short-lived Kubernetes bearer token. Whoever runs Terraform needs eks:DescribeCluster permission and needs to be authorized in the cluster’s access entries. The GitHub Actions role created by the CloudFormation bootstrap already has both.
The Root App
The root application is a single ArgoCD Application manifest that points at the argocd/apps directory in the repo:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k8s-gitops-root-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ajaydhungel7/k8s-gitops
    targetRevision: dev
    path: argocd/apps
    helm:
      valueFiles:
        - values.yaml
        - values-dev.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
When this gets applied, ArgoCD reads the argocd/apps Helm chart, renders it with values-dev.yaml into a set of Application resources, and starts syncing all of them. This is the App of Apps pattern – one root app that generates all the other apps.
App of Apps: values-dev.yaml as the Source of Truth
The argocd/apps directory is a Helm chart whose single template generates ArgoCD Application resources from a values file. Everything ArgoCD deploys in the dev environment is declared in values-dev.yaml.
I split it into two lists. Platform applications are system-level components the cluster needs to function:
platformApplications:
  - name: aws-ebs-csi-driver
    namespace: kube-system
    syncWave: "-20"
    source:
      repoURL: https://kubernetes-sigs.github.io/aws-ebs-csi-driver
      chart: aws-ebs-csi-driver
      targetRevision: 2.x.x
  - name: aws-load-balancer-controller
    namespace: kube-system
    syncWave: "-20"
    source:
      repoURL: https://aws.github.io/eks-charts
      chart: aws-load-balancer-controller
      targetRevision: 1.11.0
  - name: cluster-autoscaler
    namespace: kube-system
    syncWave: "-20"
  - name: external-secrets
    namespace: external-secrets
    syncWave: "-10"
  - name: configs
    syncWave: "-5"
  - name: jenkins
    syncWave: "0"
Workload applications are the actual services:
workloadApplications:
  - name: mongodb
    syncWave: "5"
  - name: redis
    syncWave: "5"
  - name: backend
    syncWave: "10"
  - name: frontend
    syncWave: "10"
To add something new to the cluster, you add an entry here and push. ArgoCD picks it up and deploys it. To remove something, you delete the entry and ArgoCD prunes it. This file is the complete description of what should be running.
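As an illustration of what such an entry could look like, here is a sketch of a new workload added to values-dev.yaml. The service name, namespace, and chart path are hypothetical. The field names are the ones read by the chart's template, and the repoURL is the one the root app already uses.

```yaml
workloadApplications:
  # ...existing entries stay as they are...
  - name: notifications            # hypothetical service
    namespace: notifications       # hypothetical namespace
    syncWave: "10"                 # same wave as backend and frontend
    createNamespace: true          # rendered as CreateNamespace=true
    source:
      repoURL: https://github.com/ajaydhungel7/k8s-gitops
      path: charts/notifications   # hypothetical chart path in the repo
```

Because no source.targetRevision is set, the template falls back to the chart-wide targetRevision value.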
How the Helm Template Generates Applications
The template in argocd/apps iterates over both lists and renders one ArgoCD Application resource per entry:
{{- /* Shared defaults that every app entry is merged onto */ -}}
{{- $defaults := dict "project" .Values.project "destinationServer" .Values.destinationServer -}}
{{- $applications := concat .Values.platformApplications .Values.workloadApplications -}}
{{- range $app := $applications }}
{{- /* No trailing trim on the next action, so each document starts on its own line after "---" */ -}}
{{- $spec := mergeOverwrite (deepCopy $defaults) $app }}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: {{ $spec.name }}
  namespace: argocd
  {{- if $spec.syncWave }}
  annotations:
    argocd.argoproj.io/sync-wave: {{ $spec.syncWave | quote }}
  {{- end }}
spec:
  project: {{ $spec.project }}
  source:
    {{- if $spec.source.repoURL }}
    repoURL: {{ $spec.source.repoURL }}
    {{- end }}
    {{- if $spec.source.chart }}
    chart: {{ $spec.source.chart }}
    {{- end }}
    {{- if $spec.source.path }}
    path: {{ $spec.source.path }}
    {{- end }}
    targetRevision: {{ default $.Values.targetRevision $spec.source.targetRevision }}
    {{- with $spec.source.helm }}
    helm:
      {{- with .releaseName }}
      releaseName: {{ . }}
      {{- end }}
      {{- with .valueFiles }}
      valueFiles:
        {{- range . }}
        - {{ . }}
        {{- end }}
      {{- end }}
      {{- with .values }}
      values: |
{{ . | indent 8 }}
      {{- end }}
    {{- end }}
  destination:
    server: {{ $spec.destinationServer }}
    namespace: {{ $spec.namespace }}
  syncPolicy:
    {{- if $spec.syncPolicy }}
{{ toYaml $spec.syncPolicy | indent 4 }}
    {{- end }}
    syncOptions:
      - CreateNamespace={{ ternary "true" "false" (default false $spec.createNamespace) }}
---
{{- end }}
The mergeOverwrite call merges each app entry on top of the shared defaults, so you only specify what is different per app. The syncWave annotation only renders if the field is set. The ternary on CreateNamespace lets you control namespace creation per app with a simple boolean rather than repeating the sync option everywhere.
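The shared defaults come from the chart's base values.yaml, which is not shown in this post. A sketch of what it might contain, based only on the keys the template reads (.Values.project, .Values.destinationServer, .Values.targetRevision); the actual values in the repo may differ:

```yaml
# argocd/apps/values.yaml (sketch; values are assumptions)
project: default
destinationServer: https://kubernetes.default.svc
targetRevision: dev        # fallback when an app sets no source.targetRevision

# Overridden per environment, e.g. by values-dev.yaml
platformApplications: []
workloadApplications: []
```

Helm replaces lists from a later values file rather than merging them, so the lists in values-dev.yaml fully define what gets rendered.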
When ArgoCD syncs the root app, it runs this template against values-dev.yaml and gets back a list of fully formed Application resources. Adding a new service to the cluster is just an entry in the values file and a push.
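To make that concrete, this is roughly what the template should render for the aws-load-balancer-controller entry, assuming the chart defaults are project: default and destinationServer: https://kubernetes.default.svc (both assumptions, since the base values file is not shown):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: aws-load-balancer-controller
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-20"
spec:
  project: default                            # assumed default
  source:
    repoURL: https://aws.github.io/eks-charts
    chart: aws-load-balancer-controller
    targetRevision: 1.11.0
  destination:
    server: https://kubernetes.default.svc    # assumed default
    namespace: kube-system
  syncPolicy:
    syncOptions:
      - CreateNamespace=false                 # createNamespace was not set
```

Running helm template against the chart with both values files is a quick way to check the real output before pushing.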
Sync Waves: Why Order Matters
ArgoCD does not deploy everything at once. Sync waves control the order, and getting it wrong causes real failures. Here is how I have mine set up and why:
| Wave | What deploys | Why |
|---|---|---|
| -20 | EBS CSI, ALB Controller, Cluster Autoscaler, Metrics Server | Storage and networking primitives. Everything else depends on these. |
| -10 | External Secrets Operator | Must be running before any app tries to pull a secret. |
| -5 | Configs (StorageClass, ClusterSecretStore, namespaces) | ClusterSecretStore must exist before ExternalSecret resources are created. |
| 0 | Jenkins | CI tooling, independent of workloads. |
| 5 | MongoDB, Redis | Databases must be up before the apps that use them. |
| 10 | Backend, Frontend | Last to deploy, all dependencies are ready. |
If you skip the waves and deploy everything at once, ExternalSecret resources fail because ESO is not running yet, apps fail because the database is not ready, and the ALB controller does not exist to reconcile Ingress resources. Waves are not optional.
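One caveat worth checking in an App of Apps setup: for one wave to wait on the previous one, ArgoCD has to assess the health of the child Application resources themselves. Recent ArgoCD versions do not do that by default, and the ArgoCD documentation describes restoring it with a custom health check in argocd-cm. A sketch of that as values for the argo-cd Helm chart, assuming the chart's configs.cm key; this is not taken from this repo:

```yaml
# Values for the argo-cd chart (sketch, assumes the chart's configs.cm key)
configs:
  cm:
    # Report a child Application's own health so sync waves can wait on it
    resource.customizations.health.argoproj.io_Application: |
      hs = {}
      hs.status = "Progressing"
      hs.message = ""
      if obj.status ~= nil then
        if obj.status.health ~= nil then
          hs.status = obj.status.health.status
          if obj.status.health.message ~= nil then
            hs.message = obj.status.health.message
          end
        end
      end
      return hs
```

If the waves already behave as described on your ArgoCD version, nothing needs to change.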
IRSA: How Controllers Get AWS Permissions
Every controller that needs to talk to AWS needs an IAM role. The EBS CSI driver needs to provision volumes, the ALB Controller needs to manage load balancers, External Secrets needs to read from Secrets Manager. Those roles are created by Terraform, not ArgoCD.
The connection is the service account annotation. Terraform creates an IAM role with a trust policy scoped to a specific Kubernetes service account:
condition {
  test     = "StringEquals"
  variable = "${local.oidc_host}:sub"
  values   = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}
ArgoCD then deploys the controller with that annotation on the service account:
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/k8s-gitops-dev-ebs-csi-role
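When the pod is created, the EKS pod identity webhook sees the annotation on its service account and injects the role ARN and a projected service account token into the pod. The AWS SDK in the container exchanges that token with STS, which validates it against the cluster's OIDC provider and the role's trust policy, then returns temporary AWS credentials. No static keys, nothing to rotate, no credentials stored anywhere.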
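With IRSA, the injection is done by a mutating admission webhook on the EKS control plane. The fragment below sketches roughly what it adds to a pod that uses the annotated service account. The container name is illustrative, and the webhook sets a few more region-related variables not shown here.

```yaml
# Fragment of a pod spec after the webhook has mutated it (illustrative)
spec:
  serviceAccountName: ebs-csi-controller-sa
  containers:
    - name: ebs-plugin               # illustrative container name
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::123456789012:role/k8s-gitops-dev-ebs-csi-role
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
        - name: aws-iam-token
          mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          readOnly: true
  volumes:
    - name: aws-iam-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
```

The AWS SDKs pick up these two environment variables automatically, so the controller needs no AWS-specific configuration of its own.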
Terraform creates the roles. ArgoCD annotates the service accounts. Neither needs to know the details of the other, which is exactly the separation I was after.
Secrets: From AWS to the Pod
The secrets flow follows the same idea. Terraform creates the KMS key and the Secrets Manager entries and never puts credentials directly in Kubernetes.
External Secrets Operator, deployed by ArgoCD, holds an IRSA role that lets it read from Secrets Manager. A ClusterSecretStore points ESO at the right AWS region and account. Then ExternalSecret resources in each namespace declare which secrets to pull and what to name them as Kubernetes Secrets.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: backend-secrets
spec:
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  data:
    - secretKey: MONGODB_PASSWORD
      remoteRef:
        key: /k8s-gitops/dev/mongodb/credentials
        property: password
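The ExternalSecret above refers to a ClusterSecretStore named aws-secrets-manager, which is not shown in this post. A sketch of what it might look like; the region and the service account name and namespace are assumptions:

```yaml
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                 # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets      # assumed IRSA-annotated service account
            namespace: external-secrets
```

The serviceAccountRef is what ties the store to the IRSA role, so ESO reads from Secrets Manager with the same temporary credentials mechanism as the other controllers.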
The pod reads the resulting Kubernetes Secret as an environment variable. At no point does a credential appear in Git, in a values file, or in Terraform state.
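On the workload side, that could look like the fragment below. It is a sketch of a container spec, not the actual backend manifest, and it assumes the ExternalSecret sets no spec.target.name, in which case the generated Secret takes the ExternalSecret's name.

```yaml
# Fragment of the backend Deployment's pod template (sketch)
containers:
  - name: backend
    env:
      - name: MONGODB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: backend-secrets     # Secret created by the ExternalSecret
            key: MONGODB_PASSWORD     # matches secretKey in the ExternalSecret
```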
What Happens After One Command
Once terragrunt run --all apply finishes, here is what has happened without any further manual steps:
- VPC, subnets, and networking are provisioned
- EKS cluster and node group are running
- OIDC provider is attached to the cluster
- IAM roles for all controllers are created
- KMS key and Secrets Manager entries exist
- ECR repositories are ready
- ArgoCD is installed and the server is healthy
- The root app is applied
- ArgoCD reads values-dev.yaml and generates all Application resources
- Platform controllers install in wave -20: EBS CSI, ALB Controller, Cluster Autoscaler, Metrics Server
- External Secrets Operator installs in wave -10
- ClusterSecretStore and namespaces are created in wave -5
- Jenkins installs in wave 0
- MongoDB and Redis start in wave 5, pulling credentials from Secrets Manager via ESO
- Backend and Frontend deploy in wave 10, connecting to the databases
And just like that, the entire platform assembles itself. One command, walk away, come back to a running cluster.
That’s all for now! Thank you for making it to the end.