Production Deployment Guide

This guide covers security, reliability, and performance considerations when deploying Aileron to production.

warning

The platform is currently in alpha (v0.1.0-alpha). The items below are recommendations; adjust per your organization's security and compliance requirements.

Pre-Deployment Checklist

Required

  • All default passwords changed (PostgreSQL, Redis, Keycloak, JWT Secret)
  • TLS certificates configured (Ingress or reverse proxy)
  • DNS records created (at minimum for static service hosts, with workspace hosts covered by wildcard or automated records)
  • Keycloak Redirect URIs updated for the production domain
  • VITE_ variables contain no secrets
  • Docker socket mount removed (use Kubernetes mode)
  • Database connection uses encryption (sslmode=require)
  • Container images pinned to specific tags (not latest or dev)
  • Resource limits/requests configured
  • Persistent storage configured (PVC with appropriate StorageClass)
  • Monitoring and alerting in place
  • Backup strategy established
  • Log collection configured
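
The VITE_ item above matters because Vite embeds VITE_-prefixed variables into the client bundle, where any visitor can read them. A minimal sketch of a pre-deployment check follows. The helper name find_suspicious_vite_vars and the list of secret-like name fragments are assumptions for illustration, not part of Aileron; treat matches as prompts for review rather than proof of a leak.

```shell
# List VITE_-prefixed entries in an env file whose names look secret-like.
# Hypothetical helper; the name fragments below are a heuristic, extend as needed.
find_suspicious_vite_vars() {
  local env_file="$1"
  grep -E '^[[:space:]]*(export[[:space:]]+)?VITE_[A-Za-z0-9_]*(SECRET|PASSWORD|TOKEN|PRIVATE|API_KEY)[A-Za-z0-9_]*=' "$env_file"
}

# Usage (the file name is an assumption):
#   find_suspicious_vite_vars .env.production && echo "Review the entries above"
```

The function exits with grep's status, so it returns non-zero when nothing suspicious is found.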

Security Hardening

Password and Secret Management

Never use default passwords. The following must be changed:

# values.yaml - production example
postgres:
  auth:
    password: "<strong-random-password>"

keycloak:
  auth:
    adminUser: admin
    adminPassword: "<strong-random-password>"

workspaceManager:
  env:
    SECRET_KEY: "<random-256-bit-key>"
    ACCESS_TOKEN_EXPIRE_MINUTES: "60" # Shorten token lifetime
    REFRESH_TOKEN_EXPIRE_DAYS: "1"
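
One possible way to generate values for the placeholders above, using only standard Unix tools. The helper name gen_secret_hex is illustrative and not an Aileron command; `openssl rand -hex 32` is an equivalent alternative where OpenSSL is installed.

```shell
# Print N random bytes as lowercase hex (default 32 bytes = 256 bits).
# gen_secret_hex is a hypothetical helper name, not an Aileron tool.
gen_secret_hex() {
  local bytes="${1:-32}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d '[:space:]'
}

# Usage:
#   SECRET_KEY="$(gen_secret_hex)"      # 64 hex characters
#   DB_PASSWORD="$(gen_secret_hex 24)"  # 48 hex characters
```

Hex output avoids characters that need quoting in YAML or escaping in connection URLs.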

Use Kubernetes Secrets rather than plaintext values in values.yaml:

# Create a secret
kubectl create secret generic aileron-secrets \
  --from-literal=DATABASE_PASSWORD='<password>' \
  --from-literal=SECRET_KEY='<key>' \
  --from-literal=KEYCLOAK_ADMIN_PASSWORD='<password>' \
  -n aileron

External Secret Management

Combine with External Secrets Operator or Sealed Secrets to sync from AWS Secrets Manager, HashiCorp Vault, etc.

TLS / HTTPS

All public-facing services must use HTTPS:

publicRouting:
  scheme: https

ingress:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  tls:
    - secretName: aileron-tls
      hosts:
        - example.com
        - "*.example.com" # Let's Encrypt issues wildcard certificates only via the DNS-01 challenge

Keycloak should be updated accordingly:

keycloak:
  env:
    KC_HOSTNAME_STRICT: "true"
    KC_HOSTNAME_STRICT_HTTPS: "true"
    KC_PROXY_HEADERS: xforwarded

Container Image Security

  • Use a private registry; configure global.imagePullSecrets
  • Pin image tags to commit SHAs or semantic versions — avoid latest
  • Scan images regularly (Trivy, Snyk, etc.)

global:
  imagePullSecrets:
    - name: registry-credentials

frontend:
  image:
    repository: your-registry.com/workspace-ui
    tag: v0.1.0
    pullPolicy: IfNotPresent
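
The pinning rule above can be checked mechanically. The sketch below is one way to flag image references that have no tag or that use latest or dev. The function name is_pinned_image is hypothetical, and the check only inspects the reference string; it does not verify that the tag exists in the registry.

```shell
# Return 0 if the image reference is pinned by digest or by a tag other
# than "latest" or "dev"; return 1 otherwise. Hypothetical helper.
is_pinned_image() {
  local ref="$1" last tag
  case "$ref" in
    *@sha256:*) return 0 ;;   # digest-pinned
  esac
  last="${ref##*/}"           # final path segment, e.g. "workspace-ui:v0.1.0"
  case "$last" in
    *:*) ;;                   # an explicit tag is present
    *) return 1 ;;            # no tag means an implicit "latest"
  esac
  tag="${last##*:}"
  case "$tag" in
    latest|dev|"") return 1 ;;
    *) return 0 ;;
  esac
}

# Usage against the images running in the namespace:
#   kubectl get pods -n aileron -o jsonpath='{.items[*].spec.containers[*].image}' \
#     | tr -s '[:space:]' '\n' \
#     | while read -r img; do is_pinned_image "$img" || echo "unpinned: $img"; done
```

Only the last path segment is searched for a tag, so a registry port such as registry.local:5000 is not mistaken for one.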

Network Security

  • Enable Cilium for network isolation between workspaces
  • Restrict access to the Keycloak Admin Console
  • Keep workspace domain allowlists as precise as possible

cilium:
  enabled: true

firewall:
  defaults:
    workspace:
      allowedDomains:
        - github.com
        - api.github.com
        - registry.npmjs.org
        - pypi.org
        - api.anthropic.com
    browser:
      allowedDomains:
        - github.com

Resource Planning

Current live cluster settings (April 13, 2026, aileron namespace)

These are the actual resources.requests / resources.limits currently observed in the cluster:

Workload | Container | Requests | Limits
aileron-aileron-workspace-manager | workspace-manager | CPU 500m / Memory 1Gi | CPU 2 / Memory 2Gi
aileron-aileron-frontend | frontend | Not set | Not set
aileron-aileron-keycloak | keycloak | Not set | Not set
aileron-aileron-workspace-operator | workspace-operator | Not set | Not set
aileron-aileron-coturn | coturn | Not set | Not set
aileron-aileron-postgres | postgres | Not set | Not set
aileron-aileron-redis | redis | Not set | Not set
workspace-runtime-default-workspace | runtime | Not set | Not set
workspace-browser-default-workspace | browser | Not set | Not set
workspace-canvas-default-workspace | canvas | Not set | Not set

note

At the moment, only workspace-manager has explicit requests / limits in the live cluster. The other platform services and the current default workspace deployments do not yet set container resources in their Pod specs.

Platform Service Recommendations

Service | CPU Request | CPU Limit | Memory Request | Memory Limit
Frontend | 100m | 500m | 128Mi | 256Mi
Workspace Manager | 250m | 1000m | 256Mi | 512Mi
Workspace Operator | 100m | 500m | 128Mi | 256Mi
Keycloak | 500m | 1000m | 512Mi | 1Gi
PostgreSQL | 250m | 1000m | 256Mi | 1Gi
Redis | 100m | 500m | 128Mi | 256Mi
CoTURN | 100m | 500m | 64Mi | 128Mi

Workspace Pod Recommendations

Component | CPU Request | CPU Limit | Memory Request | Memory Limit
Runtime | 500m | 2000m | 512Mi | 2Gi
Browser (neko) | 1000m | 2000m | 1Gi | 2Gi
Canvas Runtime | 250m | 1000m | 256Mi | 512Mi

Resource configuration entry points in the chart

The Helm chart already exposes the following values:

frontend:
  resources: {}

workspaceManager:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 2Gi

workspaceOperator:
  resources: {}

postgres:
  resources: {}

redis:
  resources: {}

keycloak:
  resources: {}

coturn:
  resources: {}

kubernetes:
  workspaceDefaults:
    runtime:
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          cpu: 2000m
          memory: 4Gi
    browser:
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 2Gi
    canvas:
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 2Gi

Chart defaults vs live cluster

Although kubernetes.workspaceDefaults.*.resources already has defaults in the chart, and the current aileron-aileron-platform-config ConfigMap contains the corresponding JSON values, the existing workspace-runtime-default-workspace, workspace-browser-default-workspace, and workspace-canvas-default-workspace Deployments still show resources: {}. Treat the actual Deployment / Pod spec as the source of truth when verifying resource enforcement.

# Set resource limits in values.yaml
workspaceManager:
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi

How to inspect the current K8s resource settings

Inspect platform services and StatefulSets:

kubectl get deploy,statefulset -n aileron \
-o jsonpath='{range .items[*]}{.kind}{"\t"}{.metadata.name}{"\t"}{range .spec.template.spec.containers[*]}{.name}{": requests="}{.resources.requests.cpu}{"/"}{.resources.requests.memory}{", limits="}{.resources.limits.cpu}{"/"}{.resources.limits.memory}{"; "}{end}{"\n"}{end}'
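
The command above prints one row per workload, with empty fields where a value is unset (for example requests=/, limits=/;). As a sketch, the output can be piped through a small filter to list only the rows that still lack a request or limit. The function name filter_unset_resources is illustrative, and the pattern assumes the exact output format of the jsonpath expression above.

```shell
# Keep only rows where at least one CPU or memory request/limit is empty.
# Matches "requests=/" or "limits=/" (empty CPU) and "/," or "/;" (empty memory).
# Hypothetical helper tied to the jsonpath output format shown above.
filter_unset_resources() {
  grep -E '(requests|limits)=/|/[,;]'
}

# Usage: append "| filter_unset_resources" to the kubectl command above.
```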

Inspect the current workspace Deployments:

kubectl get deploy workspace-runtime-default-workspace \
workspace-browser-default-workspace \
workspace-canvas-default-workspace \
-n aileron -o yaml

Inspect whether the workspace default resources are present in platform config:

kubectl get configmap aileron-aileron-platform-config -n aileron \
-o jsonpath='{.data.RUNTIME_K8S_RUNTIME_RESOURCES}{"\n"}{.data.RUNTIME_K8S_BROWSER_RESOURCES}{"\n"}{.data.RUNTIME_K8S_CANVAS_RESOURCES}{"\n"}'

Storage Planning

Purpose | Recommended Size | Access Mode | Notes
PostgreSQL | 20–50Gi | ReadWriteOnce | Scale with workspace count and history
Redis | 5–10Gi | ReadWriteOnce | Task queue and cache
Workspace data | 10–50Gi/workspace | ReadWriteOnce | Code and Claude data
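
For capacity planning, the per-workspace figure usually dominates the total. The sketch below adds up the table's sizes for a given workspace count. The function name estimate_storage_gib and the count of 20 workspaces are assumptions for illustration; the sizes used are the upper bounds from the table.

```shell
# Total persistent storage in Gi: workspaces * per-workspace + PostgreSQL + Redis.
# Hypothetical helper; all arguments are whole numbers of Gi.
estimate_storage_gib() {
  local workspaces="$1" per_workspace="$2" postgres="$3" redis="$4"
  echo $(( workspaces * per_workspace + postgres + redis ))
}

# 20 workspaces at 50Gi each, plus 50Gi for PostgreSQL and 10Gi for Redis:
estimate_storage_gib 20 50 50 10   # → 1060
```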

External Service Integration

In production, consider using managed/external services rather than Helm-managed ones:

External PostgreSQL

postgres:
  enabled: false # Disable bundled PostgreSQL

workspaceManager:
  env:
    DATABASE_URL: "postgresql://user:pass@rds-instance.region.rds.amazonaws.com:5432/aileron?sslmode=require"

External Redis

redis:
  enabled: false # Disable bundled Redis

workspaceManager:
  env:
    REDIS_URL: "rediss://user:pass@redis-cluster.region.cache.amazonaws.com:6379"
    CELERY_BROKER_URL: "rediss://user:pass@redis-cluster.region.cache.amazonaws.com:6379/0"
    CELERY_RESULT_BACKEND: "rediss://user:pass@redis-cluster.region.cache.amazonaws.com:6379/1"

External Keycloak

If you already have an enterprise Keycloak or another OIDC provider:

keycloak:
  enabled: false # Disable bundled Keycloak

workspaceManager:
  env:
    KEYCLOAK_SERVER_URL: "https://sso.company.com"
    KEYCLOAK_REALM: "aileron"
    KEYCLOAK_CLIENT_ID: "your-client-id"

Backup Strategy

Database Backup

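# Back up PostgreSQL
# The workload name depends on the Helm release; confirm it with
# `kubectl get statefulset -n aileron` before running.
kubectl exec -n aileron statefulset/aileron-postgres -- \
  pg_dump -U postgres aileron > backup-$(date +%Y%m%d).sql

# To run this on a schedule, wrap the command in a Kubernetes CronJob.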
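
A dump streamed through kubectl exec can be truncated if the connection drops, so it is worth checking the file before relying on it. The sketch below assumes the plain-text format produced by the pg_dump command above, which ends with a "PostgreSQL database dump complete" comment. The function name verify_pg_dump is illustrative.

```shell
# Return 0 if FILE is non-empty and its last lines contain pg_dump's
# completion marker (plain-text format only). Hypothetical helper.
verify_pg_dump() {
  local file="$1"
  [ -s "$file" ] || return 1
  tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete'
}

# Usage:
#   verify_pg_dump "backup-$(date +%Y%m%d).sql" || echo "backup looks incomplete" >&2
```

This only detects truncation; a periodic test restore into a scratch database remains the stronger check.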

Keycloak Realm Backup

# Export realm settings (a partial export omits users and masks client secrets)
ADMIN_TOKEN=$(curl -s -X POST "https://keycloak.example.com/realms/master/protocol/openid-connect/token" \
  -d "client_id=admin-cli" \
  -d "username=admin" \
  --data-urlencode "password=$ADMIN_PASSWORD" \
  -d "grant_type=password" | jq -r '.access_token')

curl -X POST "https://keycloak.example.com/admin/realms/aileron/partial-export?exportClients=true&exportGroupsAndRoles=true" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -o realm-backup-$(date +%Y%m%d).json

Workspace Data Backup

  • Use VolumeSnapshot on PVCs (if supported by the CSI driver)
  • Or use Velero for cluster-wide backups

Monitoring

Health Check Endpoints

Service | Endpoint
Workspace Manager | GET /health
Workspace Runtime | GET /health
Keycloak | GET /health/ready (requires KC_HEALTH_ENABLED=true)
PostgreSQL | pg_isready
Redis | redis-cli ping
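
The HTTP endpoints in the table can be probed from a script or an external monitor. A minimal sketch follows; the function name check_http, the 5-second timeout, and the example hostnames are assumptions, not Aileron defaults.

```shell
# Print "ok" if URL answers with an HTTP 2xx status within 5 seconds,
# otherwise print "fail". Hypothetical helper built on curl.
check_http() {
  local url="$1" code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")" || code="000"
  case "$code" in
    2??) echo "ok" ;;
    *)   echo "fail" ;;
  esac
}

# Usage (hostnames are placeholders):
#   check_http "https://manager.example.com/health"
#   check_http "https://keycloak.example.com/health/ready"
```

PostgreSQL and Redis are not HTTP services, so they are better checked with pg_isready and redis-cli ping as listed in the table.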

Metrics

  • Keycloak: GET /metrics (Prometheus format, requires KC_METRICS_ENABLED=true)
  • Celery Flower: http://<manager>:5555/api/tasks (task monitoring)
  • Workspace Manager: /docs endpoint can be used for API availability checks

Suggested alert conditions:

Condition | Severity | Notes
Pod CrashLoopBackOff | Critical | Service failure
Keycloak unhealthy | Critical | Affects all logins
PostgreSQL unhealthy | Critical | Database outage
Redis unhealthy | High | Task queue outage
PVC usage > 80% | Warning | Storage almost full
Rising Celery task failures | Warning | Automation failures

Upgrade Workflow

Helm Chart Upgrade

# 1. Review changes
helm diff upgrade aileron helm/aileron \
--namespace aileron \
-f production-values.yaml

# 2. Back up
kubectl exec -n aileron statefulset/aileron-postgres -- \
pg_dump -U postgres aileron > pre-upgrade-backup.sql

# 3. Execute the upgrade
helm upgrade aileron helm/aileron \
--namespace aileron \
-f production-values.yaml

# 4. Verify
kubectl get pods -n aileron
kubectl logs -n aileron deployment/aileron-workspace-manager --tail=50

Database Migrations

# Run migrations (if provided)
./scripts/db/run-migrations.sh

caution

Always back up the database before upgrading. CRD updates require extra care: helm upgrade does not automatically update CRDs. If CRD schemas have changed, apply them manually:

kubectl apply -f helm/aileron/crds/