
hallllow29/gitops-devsecops-aws


GitOps DevSecOps AWS

IaC: Terraform | Orchestration: Terragrunt | Cloud: AWS | Runtime: Kubernetes | CI/CD: GitHub Actions | Methodology: GitOps, DevSecOps | License: MIT

A production-style GitOps + DevSecOps reference architecture for AWS that demonstrates how a small/medium team can ship infrastructure-as-code changes safely through automated security gates, ephemeral dev environments, and human-gated promotion to production.

What problem does this solve? "Someone wants to add a new service to our AWS platform. How do we make sure their change doesn't break production, doesn't introduce vulnerabilities, follows our policies, and gets reviewed by the right people, all automatically and with a clear audit trail?"

This repo is the answer.




What's inside

A multi-region, multi-cluster AWS Kubernetes platform with:

  • 3 EKS clusters across 3 regions, one per concern: security tooling, ephemeral dev, and production
  • Self-hosted GitHub Actions runners inside the security cluster (no third-party CI access to AWS)
  • GitOps pipeline driven by GitHub PRs: no manual terragrunt apply once set up
  • Security gates at every step: secret scanning, IaC scanning, container scanning, policy enforcement
  • Ephemeral dev environments β€” created on merge to dev, destroyed on merge to main
  • Centralised vulnerability management β€” every scan feeds into DefectDojo
  • IRSA everywhere β€” pods get AWS credentials through OIDC, no static keys
  • Cost-aware NAT with fck-nat instead of NAT Gateways (~$32/month each)

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   Paris (eu-west-3)              Ireland (eu-west-1)     Frankfurt      │
│   ─────────────────              ─────────────────       (eu-central-1) │
│   security-eks (PERMANENT)       dev-eks (EPHEMERAL)     prod-eks       │
│                                                          (PERMANENT)    │
│   ┌─────────────────────┐        ┌─────────────────┐                    │
│   │ SonarQube           │        │ Test workloads  │     ┌──────────┐   │
│   │ DefectDojo          │  scan  │ created by PR   │ →   │ Real     │   │
│   │ Dependency Track    │ ────►  │ destroyed on    │ ──► │ services │   │
│   │ Vault               │        │ merge to main   │     │          │   │
│   │ Harbor              │        └─────────────────┘     └──────────┘   │
│   │ Prometheus+Grafana  │                                               │
│   │ Atlantis            │                                               │
│   │ GitHub Runner (ARC) │                                               │
│   └─────────────────────┘                                               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

         ▲                                ▲                         ▲
         │                                │                         │
         └──── runs CI workflows ─────────┴──── deployed by ────────┘
                                                  GitOps

Why 3 regions?

| Region | Purpose | Why this region |
|---|---|---|
| eu-west-3 (Paris) | Security tooling (permanent) | Central scanning, separate blast radius |
| eu-west-1 (Ireland) | Ephemeral dev (short-lived) | Lots of capacity, fast spin-up |
| eu-central-1 (Frankfurt) | Production workloads (permanent) | Low latency for EU users |
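Scripts that act on a single environment need to know its region. A minimal sketch that encodes the region table as a lookup; `region_for_env` is a hypothetical helper, not something the repo ships:

```shell
# Hypothetical helper (not part of the repo): map an environment name to the
# AWS region it lives in, mirroring the region table.
region_for_env() {
  case "$1" in
    security) echo "eu-west-3" ;;    # Paris: permanent security tooling
    dev)      echo "eu-west-1" ;;    # Ireland: ephemeral dev
    prod)     echo "eu-central-1" ;; # Frankfurt: permanent production
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Usage: AWS_REGION="$(region_for_env dev)" aws eks list-clusters
```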

The GitOps flow

┌──────────────┐
│   Developer  │
│ feature/xyz  │
└──────┬───────┘
       │ git push
       ▼
┌──────────────────────────────────────────┐
│ PR  feature/xyz  →  dev                  │
├──────────────────────────────────────────┤
│  • Security Scan workflow runs           │
│    - TruffleHog (secrets)                │
│    - KICS (IaC)                          │
│    - Checkov (IaC)                       │
│    - findings ➜ DefectDojo               │
│  • PR Checks workflow runs               │
│    - terragrunt fmt/validate/plan        │
│    - OPA/Conftest policies               │
│    - SonarQube SAST                      │
│    - Trivy scan                          │
│    - findings ➜ DefectDojo               │
└──────────────┬───────────────────────────┘
               │ ✅ reviewer approves
               │ merge
               ▼
┌──────────────────────────────────────────┐
│ push  →  dev                             │
├──────────────────────────────────────────┤
│  Deploy Dev workflow                     │
│    1. assume AWS role via OIDC           │
│    2. terragrunt apply environments/dev  │
│    3. smoke tests                        │
│    → Cluster dev (Ireland) is up         │
└──────────────┬───────────────────────────┘
               │ Manual testing in dev
               │ (kubectl, integration tests, ...)
               ▼
┌──────────────────────────────────────────┐
│ PR  dev  →  main                         │
├──────────────────────────────────────────┤
│  • Same scans rerun against final state  │
│  • Reviewer approves                     │
│  • Merge                                 │
└──────────────┬───────────────────────────┘
               │ pull_request closed + merged
               ▼
┌──────────────────────────────────────────┐
│ Deploy Prod workflow                     │
├──────────────────────────────────────────┤
│  Job 1: destroy-dev                      │
│    terragrunt destroy environments/dev   │
│    (dev was just for validation)         │
│                                          │
│  Job 2: deploy-prod                      │
│    ⚠ environment: production             │
│    (manual approval gate)                │
│    terragrunt apply environments/prod    │
│    → Prod (Frankfurt) updated            │
└──────────────────────────────────────────┘
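The flow boils down to a mapping from GitHub events to workflow files. The sketch below restates that mapping as one shell function so it can be read (and tested) in a single place. `workflows_for_event` is hypothetical, `pull_request_closed` is a made-up label for a `pull_request` event with the `closed` activity type, and in the repo the real triggers live in the `on:` blocks of the files under `.github/workflows/`.

```shell
# Hypothetical helper: print which workflow files the GitOps flow runs for an event.
# Args: $1 = event label: pull_request | push | pull_request_closed
#            (pull_request_closed is an invented label for a closed PR)
#       $2 = target branch (PR base branch, or the branch that was pushed)
#       $3 = "true" if a closed PR was merged (ignored for other events)
workflows_for_event() {
  local event="$1" branch="$2" merged="${3:-false}"
  case "$event:$branch" in
    pull_request:dev | pull_request:main)
      # Every PR into dev or main gets the full set of scans and checks.
      echo "security-scan.yaml pr-checks.yaml" ;;
    push:dev)
      # Merging a PR into dev pushes to dev, which creates or updates the dev cluster.
      echo "deploy-dev.yaml" ;;
    pull_request_closed:main)
      # Only a merged PR into main promotes to prod (and tears dev down).
      if [ "$merged" = "true" ]; then echo "deploy-prod.yaml"; fi ;;
  esac
}
```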

Why dev is ephemeral

Two reasons:

  1. Cost: a permanent dev cluster would add another control plane and node group to the bill around the clock. Spin it up only when needed.
  2. State hygiene: every release cycle starts from a freshly created cluster. No "works on my dev" because the dev cluster was tweaked by hand three releases ago.

The trade-off: if multiple PRs are in flight, they queue (only one dev at a time in this repo). For multi-team scale, see the Roadmap section about preview environments per PR.


Tech stack

| Layer | Tool | Why |
|---|---|---|
| IaC | Terraform 1.9 + Terragrunt 1.0 | Module reuse, remote state, dependencies |
| State | S3 + DynamoDB lock | Standard AWS pattern |
| Orchestration | EKS 1.31 | Managed Kubernetes |
| NAT | fck-nat | $3/mo instead of $32/mo for NAT Gateway |
| Ingress | AWS Load Balancer Controller | Native ALB/NLB integration |
| Storage | EBS CSI driver | Persistent volumes for stateful tools |
| Cluster auth | EKS Access Entries (API mode) | Replaces the deprecated aws-auth ConfigMap |
| Workload auth | IRSA (OIDC) | No static keys in pods |
| SAST | SonarQube | Code quality + security |
| IaC scan | KICS + Checkov | Two scanners cover different rule sets |
| Secret scan | TruffleHog | Verified-only mode, low false positives |
| Container scan | Trivy | Filesystem + image scanning |
| Policy | OPA + Conftest | Reusable policies tested in CI |
| Vuln mgmt | DefectDojo | Aggregates findings from all scanners |
| SCA | Dependency Track | Component analysis |
| Secrets | Vault | Dynamic credentials, encryption-as-a-service |
| Registry | Harbor | Private container registry with Trivy built-in |
| Observability | kube-prometheus-stack | Prometheus + Grafana + Alertmanager |
| PR automation | Atlantis | Terraform apply via PR comments |
| Runners | Actions Runner Controller | Self-hosted runners as K8s pods |

Repository layout

.
├── bootstrap/                        # One-time setup: S3 backends, DynamoDB, GitHub OIDC
│   ├── main.tf
│   ├── variables.tf
│   ├── terraform.tfvars.example      # Copy to terraform.tfvars and set github_repo
│   └── outputs.tf
├── environments/
│   ├── root.hcl                      # Shared terragrunt config (backend, provider)
│   ├── dev/                          # Ireland (ephemeral)
│   │   ├── networking/terragrunt.hcl
│   │   ├── eks/terragrunt.hcl
│   │   └── eks-addons/terragrunt.hcl
│   ├── prod/                         # Frankfurt (permanent)
│   │   ├── networking/
│   │   ├── eks/
│   │   ├── eks-addons/
│   │   └── prod-services/
│   └── security/                     # Paris (permanent)
│       ├── networking/
│       ├── eks/
│       ├── eks-addons/
│       └── security-tools/
├── modules/
│   ├── networking/                   # VPC, subnets, IGW, fck-nat, flow logs
│   ├── eks/                          # Cluster, node group, OIDC provider, access entries
│   ├── eks-addons/                   # EBS CSI, CoreDNS, kube-proxy, LBC + IRSA
│   ├── security-tools/               # IRSA roles + Harbor S3 + KMS
│   └── prod-services/                # Generic IRSA role for prod apps
├── kubernetes/
│   ├── namespaces/                   # security-tools, dev-services, prod-services
│   ├── helm/                         # values.yaml for each tool
│   └── manifests/                    # github-runner, ingresses, network policies
├── policies/                         # OPA/Rego policies (run by Conftest)
│   ├── *.rego
│   └── tests/
├── scripts/
│   └── smoke-tests.sh                # Post-deploy validation
├── .github/
│   ├── workflows/
│   │   ├── security-scan.yaml        # Runs on PR open
│   │   ├── pr-checks.yaml            # Runs on PR open (terraform validate/plan)
│   │   ├── deploy-dev.yaml           # Runs on push to dev (after PR merge)
│   │   └── deploy-prod.yaml          # Runs on PR merge to main
│   └── CODEOWNERS
├── README.md
├── SETUP.md                          # Step-by-step setup guide
├── Makefile
└── LICENSE

Quick start

Full instructions with copy-paste commands, troubleshooting, and screenshots are in SETUP.md. Below is a high-level summary.

  1. Fork this repo to your GitHub account.
  2. Configure AWS (a paid account is required: the AWS Free Plan's restrictions don't allow t3.large).
  3. Bootstrap:
    cd bootstrap
    cp terraform.tfvars.example terraform.tfvars   # set github_repo = "you/your-fork"
    terraform init && terraform apply
  4. Deploy permanent clusters (security + prod, ~25 min each):
    terragrunt run --all apply --working-dir environments/security
    terragrunt run --all apply --working-dir environments/prod
  5. Install the security tools with Helm (see SETUP.md §6).
  6. Configure GitHub: secrets, environments, and branch protection.
  7. Try the flow: branch off main, open a PR to dev, and watch the checks and deployments run.
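Before running the bootstrap it can save time to confirm the required CLIs are on your PATH. A minimal sketch; `check_tools` is a hypothetical helper, and its default tool list is an assumption drawn from the commands this README uses (terraform, terragrunt, aws, kubectl, helm, gh, git):

```shell
# Hypothetical pre-flight check: report every missing CLI, then fail if any
# were missing. Pass your own list as arguments to override the default.
check_tools() {
  local tool missing=0
  [ "$#" -gt 0 ] || set -- terraform terragrunt aws kubectl helm gh git
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools && echo "all tools found"
```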

How a developer uses this

Day-to-day, a developer only ever does this:

# 1. Branch off an up-to-date main
git checkout main && git pull
git checkout -b feature/add-service-x

# 2. Make changes (e.g. add a new IAM role in modules/prod-services/main.tf), then commit
$EDITOR modules/prod-services/main.tf
git add modules/prod-services/main.tf
git commit -m "Add IAM role for service X"

# 3. Push and open a PR to dev
git push -u origin feature/add-service-x
gh pr create --base dev --head feature/add-service-x --fill

# 4. Wait for green checks (scans + plan + policies)
# 5. Once a reviewer approves, merge → the dev cluster is created automatically with the change
# 6. Test manually in dev (kubectl, curl, etc.)
# 7. If happy, open a PR dev → main, get it approved, merge
# 8. Dev is destroyed automatically; prod is updated once the production approval gate is passed

That's it. Once the platform is bootstrapped, nobody runs terraform apply from their laptop. Ever.
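The flow allows exactly two kinds of PR: feature branches into dev, and dev into main. Branch protection and CODEOWNERS enforce the reviews; a guard like the hypothetical `check_pr_route` below (not part of the repo) is one way to also reject PRs that skip dev, assuming the `feature/` branch prefix used in the examples:

```shell
# Hypothetical guard: enforce the promotion path feature/* -> dev -> main.
# Args: $1 = base (target) branch, $2 = head (source) branch.
check_pr_route() {
  local base="$1" head="$2"
  case "$base:$head" in
    dev:feature/*) return 0 ;;  # feature work always lands in dev first
    main:dev)      return 0 ;;  # only dev is promoted to main
  esac
  echo "PR $head -> $base skips the dev -> main promotion path" >&2
  return 1
}
```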


Security controls

At-rest

  • S3 state buckets: KMS-encrypted, versioning, public access blocked
  • DynamoDB locks: server-side encryption
  • EKS secrets: optional KMS envelope encryption (opt-in to avoid recreating existing clusters)
  • EBS volumes: encrypted by default via launch template

In-transit

  • EKS API endpoint: private + public with configurable CIDR allowlist (default open for PoC; restrict per env)
  • All workloads: traffic stays in VPC unless explicitly routed out via fck-nat

Access

  • IRSA for every pod that needs AWS access: no instance-profile shortcuts, no static keys
  • OIDC trust for GitHub Actions scoped to a specific GitHub repo (repo:owner/repo:*)
  • EKS Access Entries in API mode (no aws-auth ConfigMap drift)
  • CODEOWNERS + branch protection force human review of every change
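For reference, a GitHub Actions OIDC trust policy scoped this way usually checks the token's `aud` and `sub` claims. The sketch wraps such a policy in a hypothetical `github_oidc_trust_policy` shell function; the account ID and repo are placeholders, and bootstrap/main.tf may express the same conditions differently:

```shell
# Hypothetical helper: print an IAM trust policy that lets GitHub Actions
# workflows from one repository assume a role via OIDC.
# Args: $1 = AWS account ID, $2 = GitHub repo as owner/name.
github_oidc_trust_policy() {
  local account_id="$1" repo="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:${repo}:*"
        }
      }
    }
  ]
}
EOF
}

# Usage (placeholders): github_oidc_trust_policy 123456789012 you/your-fork
```

The `StringLike` wildcard admits any branch, tag, or environment of that one repo; tightening it (for example to `repo:owner/repo:ref:refs/heads/main`) narrows which workflows can assume the role.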

Audit & observability

  • VPC flow logs to CloudWatch (7-day retention by default)
  • EKS control plane logs (api, audit, authenticator, controllerManager, scheduler)
  • Prometheus scraping all clusters
  • DefectDojo as single pane of glass for findings across scanners

CI/CD security

  • Self-hosted runners in private subnets inside the security cluster
  • No third-party CI service holds AWS credentials
  • GitHub OIDC → AWS STS → assume role (short-lived credentials)
  • PRs to main require approving review from @hallllow29 (or configured CODEOWNER)
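The OIDC → STS hop is usually handled by the aws-actions/configure-aws-credentials action; this README does not say whether its workflows use it. A shell sketch of what happens under the hood, assuming the job has `permissions: id-token: write` and that curl, jq, and the AWS CLI are on the runner; `assume_role_via_github_oidc` is hypothetical:

```shell
# Hypothetical sketch of the OIDC -> STS exchange a workflow step performs.
# Assumes "permissions: id-token: write" (GitHub then sets
# ACTIONS_ID_TOKEN_REQUEST_URL / ACTIONS_ID_TOKEN_REQUEST_TOKEN) and that
# curl, jq and the AWS CLI are installed.
# Args: $1 = ARN of the IAM role whose trust policy allows this repo.
assume_role_via_github_oidc() {
  local role_arn="$1" token
  if [ -z "${ACTIONS_ID_TOKEN_REQUEST_URL:-}" ]; then
    echo "no OIDC token endpoint; is id-token: write set?" >&2
    return 1
  fi
  # 1. Ask GitHub for a short-lived OIDC token with the STS audience.
  token="$(curl -sSf \
    -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
    "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=sts.amazonaws.com" | jq -r '.value')"
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "could not obtain an OIDC token" >&2
    return 1
  fi
  # 2. Exchange it with STS for temporary credentials (900 s, the minimum).
  aws sts assume-role-with-web-identity \
    --role-arn "$role_arn" \
    --role-session-name "github-actions-${GITHUB_RUN_ID:-local}" \
    --web-identity-token "$token" \
    --duration-seconds 900
}
```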

OPA policies enforced

Run by Conftest in pr-checks.yaml:

| Policy | What it blocks |
|---|---|
| no_public_s3 | S3 buckets without PublicAccessBlock |
| enforce_imdsv2 | EC2 instances allowing IMDSv1 |
| require_encryption | EBS volumes without encryption |
| eks_private_endpoint | EKS clusters without endpoint_private_access |
| no_privileged_containers | Privileged pods |
| require_tags | Resources without Name and Environment tags |
| no_wide_ingress | Security groups with SSH open to 0.0.0.0/0 |
| eks_secrets_encryption | EKS clusters without KMS encryption |

Each policy has a corresponding test in policies/tests/.
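To make the require_tags rule concrete, here is the same check restated in plain shell. This is only an illustration of the logic, not the repo's Rego policy: the one-line-per-resource input format is invented for the example, whereas the real policy is evaluated by Conftest against Terraform input.

```shell
# Illustration only: flag resources missing the Name or Environment tag,
# mirroring what the require_tags policy blocks.
# Input (stdin, invented format): "<resource address> <comma-separated tag keys>"
check_required_tags() {
  local status=0 address tags required
  while read -r address tags; do
    [ -n "$address" ] || continue
    for required in Name Environment; do
      case ",$tags," in
        *",$required,"*) ;;  # tag key present
        *)
          echo "deny: $address is missing tag $required"
          status=1
          ;;
      esac
    done
  done
  return "$status"
}
```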


Cost

Running everything 24/7 (all 3 clusters up):

| Resource | $/month |
|---|---|
| 3× EKS control plane ($0.10/h × 730 h) | $219 |
| 6× t3.large Spot nodes | ~$130 |
| ALBs (1 per exposed workload) | $20–60 |
| EBS volumes (PVCs for stateful tools) | ~$20 |
| 3× fck-nat (t4g.nano) | ~$5 |
| S3 + KMS + DynamoDB + CloudWatch logs | ~$10 |
| Total | ~$400–500 |
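The control-plane line is plain count × hourly rate × hours arithmetic. The sketch reproduces it with awk so other lines can be sanity-checked the same way; `monthly_cost` is a hypothetical helper, and the only rate it uses is the $0.10 per cluster-hour quoted in the table (the 160-hour dev window is an invented example):

```shell
# Hypothetical helper: monthly cost = count x hourly rate x hours, printed to
# two decimal places. Args: $1 = count, $2 = USD per hour, $3 = hours (default 730).
monthly_cost() {
  awk -v n="$1" -v rate="$2" -v hours="${3:-730}" \
    'BEGIN { printf "%.2f\n", n * rate * hours }'
}

monthly_cost 3 0.10        # 3 EKS control planes for a full month -> 219.00
monthly_cost 1 0.10 160    # one dev control plane up for 160 hours -> 16.00
```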

Cost-saving notes:

  • Dev is ephemeral, so it only costs money during testing windows. Realistic monthly total: ~$350.
  • All node groups use Spot instances (~70% cheaper than on-demand).
  • Replace fck-nat with NAT Gateway only if you need 99.99% NAT uptime.
  • For learning/portfolio: spin up, capture screenshots, destroy. ~$15 one-off.

See docs/COST.md for a detailed optimisation matrix (Fargate, Karpenter, instance type choices, etc.).


Customisation

Adding a new application to production

  1. Add the IAM role and its dependencies in Terraform under modules/prod-services/.
  2. Add Helm chart values in kubernetes/helm/<your-app>/.
  3. Add Kubernetes manifests in kubernetes/manifests/prod-services/.
  4. Open PR to dev β€” scans run, dev gets the change, test, then PR to main.
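Steps 2 and 3 amount to new files in known locations. A minimal scaffolding sketch, run from the repo root; `scaffold_app` is hypothetical, and the manifest file name `<app>.yaml` is an assumption (the README only fixes the directories and the values.yaml convention):

```shell
# Hypothetical scaffolding helper for a new prod application.
# Creates the Helm values and manifest locations from steps 2-3 without
# overwriting anything that already exists. Run from the repo root.
scaffold_app() {
  local app="$1" f
  case "$app" in
    "" | */* | .*) echo "usage: scaffold_app <app-name>" >&2; return 1 ;;
  esac
  mkdir -p "kubernetes/helm/$app" "kubernetes/manifests/prod-services" || return 1
  for f in "kubernetes/helm/$app/values.yaml" "kubernetes/manifests/prod-services/$app.yaml"; do
    [ -e "$f" ] || : > "$f"   # create an empty placeholder file
  done
  echo "scaffolded $app; the IAM role still goes in modules/prod-services/"
}
```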

Tuning node sizing per environment

Override in the env's eks/terragrunt.hcl:

inputs = {
  environment    = "prod"
  instance_types = ["t3.xlarge"]   # default is t3.large
  desired_size   = 4
  max_size       = 8
  min_size       = 2
}

Restricting the EKS endpoint CIDR

inputs = {
  endpoint_public_access_cidrs = ["<YOUR_OFFICE_IP>/32"]
}
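To restrict the endpoint to a single workstation, the value is that machine's public IPv4 address with a /32 suffix. A sketch assuming IPv4 and AWS's checkip.amazonaws.com echo service; `to_host_cidr` is a hypothetical helper:

```shell
# Hypothetical helper: validate a dotted-quad IPv4 address and print it as a
# single-host CIDR (/32), suitable for endpoint_public_access_cidrs.
to_host_cidr() {
  local ip="$1" rest="$1." octet i
  for i in 1 2 3 4; do
    octet="${rest%%.*}"   # text before the next dot
    rest="${rest#*.}"     # text after it
    case "$octet" in
      "" | *[!0-9]* | ????*) octet=999 ;;  # empty, non-numeric or too long
    esac
    if [ "$octet" -gt 255 ]; then
      echo "not an IPv4 address: $ip" >&2
      return 1
    fi
  done
  if [ -n "$rest" ]; then   # more than four dot-separated fields
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  echo "$ip/32"
}

# Usage: to_host_cidr "$(curl -s https://checkip.amazonaws.com)"
```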

Enabling EKS secrets encryption (new clusters only; opt-in to avoid recreating existing clusters)

inputs = {
  enable_secrets_encryption = true
}

Roadmap

  • Preview environments per PR β€” dev-pr-<number> instead of single shared dev
  • Karpenter for node autoscaling (replace fixed ASG)
  • ArgoCD as the source of truth for Kubernetes manifests
  • External DNS for Route53 automation
  • cert-manager with Let's Encrypt for public TLS
  • Slack alert routing in Alertmanager
  • Custom runner image with awscli, kubectl, helm, jq pre-installed
  • Terratest module tests
  • Renovate / Dependabot for Helm chart and module updates
  • Cost dashboard with AWS Cost Explorer integration
  • Disaster recovery runbook in docs/DR.md

Troubleshooting

See docs/TROUBLESHOOTING.md for solutions to common issues:

  • terragrunt run --all apply failing with state-checksum mismatch
  • IAM roles surviving after partial destroy ("EntityAlreadyExists")
  • EKS Access Entry rejecting assumed-role ARNs
  • Self-hosted runner stuck in Pending
  • Helm release cannot re-use a name that is still in use
  • DefectDojo returning HTML on /api/v2/import-scan/ (ALLOWED_HOSTS)

Contributing

PRs welcome. The flow this repo demonstrates is also the flow used to develop it:

  1. Branch off main
  2. PR to dev β€” scans must pass, reviewer approves
  3. Merge β†’ dev cluster validates the change
  4. PR dev β†’ main β€” final review
  5. Merge β†’ prod updated, dev torn down

See CONTRIBUTING.md for code style and commit message conventions.


License

MIT: use, modify, distribute. See LICENSE.


Acknowledgements

About

🚀 Production-ready GitOps template for AWS with built-in security scanning, policy enforcement, and multi-region deployment automation. Deploy secure infrastructure across 3 AWS regions with a zero-touch GitOps workflow.
