Troubleshooting
This page collects common issues and fixes encountered during deployment and runtime.
Startup Issues
Frontend Can't Log In After docker compose
Symptom: Opening http://localhost:8082 and clicking login redirects to Keycloak but shows an error or fails to load.
Possible causes:
- Keycloak hasn't finished initializing
- Realm import failed
- Redirect URI mismatch
Debugging:
# 1. Check Keycloak status
docker compose ps keycloak
# 2. Verify Keycloak is up
docker compose logs keycloak | grep -i "started"
# 3. Check realm was imported
curl http://localhost:8080/realms/aileron/.well-known/openid-configuration
# 4. If realm is missing, restart Keycloak
docker compose restart keycloak
Keycloak needs about 60 seconds to initialize on first startup. Wait for docker compose ps to show healthy before trying to log in.
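Rather than waiting a fixed 60 seconds, the realm endpoint from step 3 can be polled until it answers. The following is a minimal sketch, not part of the project's tooling: `wait_for_url` is a hypothetical helper name, and the default timeout and interval are arbitrary assumptions.

```shell
# Hypothetical helper: poll a URL until it returns a successful HTTP status
# or the timeout expires.
# Usage: wait_for_url URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_url() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -f makes curl exit non-zero on HTTP errors (e.g. 404 while the realm is missing)
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not ready after ${timeout}s" >&2
  return 1
}

# Example call (URL taken from the debugging steps above):
#   wait_for_url http://localhost:8080/realms/aileron/.well-known/openid-configuration 120 5
```

If the function times out while `docker compose ps keycloak` already reports the container as running, a failed realm import is the more likely cause than slow startup.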
workspace-manager Fails to Start: Database Connection Error
Symptom: docker compose logs workspace-manager shows connection refused or password authentication failed.
Debugging:
# 1. Check postgres status
docker compose ps postgres
docker compose logs postgres
# 2. Test connection manually
docker compose exec postgres psql -U postgres -d aileron -c "SELECT 1;"
# 3. Verify database is initialized
docker compose exec postgres psql -U postgres -d aileron -c "\dt"
Fixes:
- If the database isn't initialized, check the scripts under init-sql/
- If the password is wrong, confirm DATABASE_URL matches POSTGRES_PASSWORD
- Full cleanup and restart:
python scripts/dev/docker/ops.py cleanup && python scripts/dev/docker/ops.py up --build
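To compare the two values without reading them by eye, the password can be pulled out of the connection URL. This sketch assumes `DATABASE_URL` has the usual `scheme://user:password@host:port/db` shape, and it does not decode URL-encoded characters. `db_url_password` is a hypothetical helper, not something the project ships.

```shell
# Hypothetical helper: print the password component of a connection URL.
# Assumes the form scheme://user:password@host:port/db with no '@' after the host.
db_url_password() {
  local rest="${1#*://}"            # drop "scheme://"
  case "$rest" in
    *@*) ;;                         # credentials present
    *) return 1 ;;                  # no user info at all
  esac
  local userinfo="${rest%@*}"       # keep everything before the last '@'
  case "$userinfo" in
    *:*) printf '%s\n' "${userinfo#*:}" ;;   # text after the first ':'
    *) return 1 ;;                           # user without password
  esac
}

# Example comparison, run where both variables are set:
#   [ "$(db_url_password "$DATABASE_URL")" = "$POSTGRES_PASSWORD" ] && echo "passwords match"
```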
Kubernetes Pod Stuck in Pending
Possible causes:
- PVC cannot bind (StorageClass missing)
- Insufficient resources (CPU / Memory)
- Image pull failure
- Node selector / taint mismatch
Debugging:
# 1. Inspect pod events
kubectl describe pod <pod-name> -n aileron
# 2. Check PVC status
kubectl get pvc -n aileron
# 3. Check StorageClass
kubectl get storageclass
# 4. Check node resources
kubectl top nodes
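The events printed by `kubectl describe pod` usually name one of the causes listed above. As a rough sketch, the filter below maps some common scheduler and kubelet messages onto those causes. The matched phrases are assumptions based on typical Kubernetes wording, which varies between versions, and `pending_causes` is a hypothetical name.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print the likely causes. Returns non-zero when nothing is recognized.
pending_causes() {
  local text status=1
  text=$(cat)
  if printf '%s\n' "$text" | grep -qi 'unbound immediate PersistentVolumeClaims'; then
    echo "PVC cannot bind"; status=0
  fi
  if printf '%s\n' "$text" | grep -Eqi 'Insufficient (cpu|memory)'; then
    echo "insufficient resources"; status=0
  fi
  if printf '%s\n' "$text" | grep -Eqi 'ErrImagePull|ImagePullBackOff'; then
    echo "image pull failure"; status=0
  fi
  if printf '%s\n' "$text" | grep -Eqi 'node affinity|node selector|taint'; then
    echo "node selector / taint mismatch"; status=0
  fi
  return "$status"
}

# Example:
#   kubectl describe pod <pod-name> -n aileron | pending_causes
```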
Keycloak OIDC Redirect Fails
Symptom: After login, redirect back to the frontend shows an Invalid redirect uri error.
Cause: The Keycloak client's Valid Redirect URIs don't include the current domain.
Fix:
- Open Keycloak Admin Console
- Go to aileron realm → Clients → aileron-frontend
- Add Valid Redirect URIs:
  - Docker: http://localhost:8082/*
  - Kubernetes: https://example.com/*
- Update Web Origins for CORS
- Save
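To see why a given callback URL is rejected, it helps to check it against the configured patterns by hand. The sketch below is a simplified model of trailing-wildcard matching, not Keycloak's actual implementation: a pattern ending in `*` is treated as a prefix, anything else must match exactly. `redirect_uri_matches` is a hypothetical helper.

```shell
# Hypothetical helper: succeed when URI is covered by PATTERN.
# Simplified model: a trailing '*' means "prefix match", otherwise exact match.
redirect_uri_matches() {
  local pattern="$1" uri="$2"
  case "$pattern" in
    *'*')
      local prefix="${pattern%?}"   # strip the trailing '*'
      case "$uri" in
        "$prefix"*) return 0 ;;     # quoted prefix is literal, then anything
      esac
      return 1
      ;;
    *)
      [ "$uri" = "$pattern" ]
      ;;
  esac
}

# Example:
#   redirect_uri_matches 'http://localhost:8082/*' 'http://localhost:8082/callback' && echo allowed
```

Under this model, a pattern of `http://localhost:8082/*` does not cover `https://example.com/...`, which is the mismatch behind the Invalid redirect uri error when the deployment moves from Docker to Kubernetes.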
If you edit realm.json in the Helm chart instead, redeploy to apply the change:
helm upgrade aileron helm/aileron \
--namespace aileron
Workspace Issues
Workspace Creation Hangs
Docker mode debugging:
# 1. Check workspace-manager log
docker compose logs -f workspace-manager
# 2. List all workspace containers
docker ps --filter "name=workspace-"
# 3. Verify Docker socket mount
docker compose exec workspace-manager ls -la /var/run/docker.sock
Kubernetes mode debugging:
# 1. Check Workspace CR status
kubectl get workspaces -A
kubectl describe workspace <name> -n <namespace>
# 2. Check Operator logs
kubectl logs -n aileron deployment/aileron-workspace-operator
# 3. Check target namespace pods
kubectl get pods -n workspace-system
kubectl describe pod workspace-runtime-<id> -n workspace-system
Canvas Stuck Loading
Symptom: Canvas Runtime or Runtime screen shows a perpetual loading spinner.
Possible causes:
- WebSocket connection failure (timeout too short)
- Cross-origin issue (CORS)
- Ingress does not handle wildcard subdomains correctly
- Database connection pool exhausted (after long runs)
Debugging:
# Check browser DevTools Network tab
# Look for failed WebSocket or XHR requests
# Docker mode: check runtime log
docker compose logs -f workspace-runtime
# Kubernetes mode: check corresponding workspace pod
kubectl logs workspace-runtime-<id> -n workspace-system
Ingress WebSocket settings:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
Workspace Browser (WebRTC) Can't Connect
Symptom: Clicking the browser tab shows a black screen or fails to connect.
Possible causes:
- CoTURN not enabled or IP misconfigured
- UDP port 52000 blocked by firewall
- NAT 1:1 mapping misconfigured
Debugging:
# Docker mode
docker compose logs workspace-browser
# Verify UDP port is reachable
nc -u -z localhost 52000
# Kubernetes mode
kubectl get svc -l app=coturn -A
kubectl logs -n aileron deployment/aileron-coturn
Kubernetes CoTURN host:
coturn:
  # Docker Desktop K8s uses node IP
  host: "192.168.65.3"
  # Production: use actual public IP
  # host: "203.0.113.10"
Claude Code Doesn't Respond
Symptom: Chat Panel shows no reply after sending a message, or shows an authentication error.
Debugging:
# Verify Anthropic API token is set
docker compose exec workspace-runtime env | grep ANTHROPIC
# Check Claude-related errors in runtime log
docker compose logs workspace-runtime | grep -i claude
Common errors:
| Message | Cause | Fix |
|---|---|---|
| Unauthorized / 401 | Invalid or expired token | Update ANTHROPIC_AUTH_TOKEN |
| Model not found | Wrong model name | Use a supported Claude model ID |
| Rate limit exceeded | API quota exceeded | Wait or upgrade API quota |
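The table above can be applied to a log line mechanically. This sketch assumes the runtime log contains the message text exactly as listed, which may not hold for every error format. `claude_error_hint` is a hypothetical helper.

```shell
# Hypothetical helper: map a log line to the fix from the table above.
# Returns non-zero when the line matches none of the known messages.
claude_error_hint() {
  case "$1" in
    *Unauthorized*|*401*)     echo "Update ANTHROPIC_AUTH_TOKEN" ;;
    *"Model not found"*)      echo "Use a supported Claude model ID" ;;
    *"Rate limit exceeded"*)  echo "Wait or upgrade API quota" ;;
    *)                        return 1 ;;
  esac
}

# Example: annotate Claude-related log lines with a suggested fix
#   docker compose logs workspace-runtime | grep -i claude | while IFS= read -r line; do
#     claude_error_hint "$line"
#   done
```

The `*401*` pattern is deliberately loose and will also match unrelated numbers containing 401, so treat a hit as a hint and read the original line.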
Database Issues
PostgreSQL Directory Permission Error
Symptom: ./data/postgres permission issues prevent the container from starting.
Fix:
# macOS / Linux
sudo chown -R 999:999 ./data/postgres
# Or full cleanup and restart
python scripts/dev/docker/ops.py cleanup
python scripts/dev/docker/ops.py up --build
Connection Pool Exhausted After Long Runs
Symptom: After running for a long time, Manager/Runtime shows QueuePool limit or httpx ENOENT errors.
Cause: Zombie connections in the database connection pool are not reclaimed.
Fix:
- Restart the affected services:
  docker compose restart workspace-manager workspace-runtime
- Check connection pool settings (pool_recycle)
- This is a known issue that is being improved.
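Before restarting, it can be useful to confirm that idle connections are actually piling up on the database side. The sketch below counts them through `pg_stat_activity`, reusing the service, user, and database names from the commands earlier on this page. `idle_connection_count` is a hypothetical helper.

```shell
# Hypothetical helper: print the number of idle connections to the current database.
# -T disables TTY allocation so the output can be captured; -At prints the bare value.
idle_connection_count() {
  docker compose exec -T postgres psql -U postgres -d aileron -At -c \
    "SELECT count(*) FROM pg_stat_activity WHERE datname = current_database() AND state LIKE 'idle%';"
}

# Example: sample the count every 30 seconds and watch whether it keeps growing
#   while true; do date; idle_connection_count; sleep 30; done
```

A count that keeps climbing toward PostgreSQL's `max_connections` while the services are otherwise quiet points to connections that are not being returned to the pool.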
Network Issues
Services Can't Communicate (Docker mode)
Symptom: workspace-manager can't reach workspace-runtime, or vice versa.
Debugging:
# 1. Verify both containers are on the same network
docker network inspect aileron-network-dev
# 2. Ping from manager to runtime
docker compose exec workspace-manager ping -c 3 workspace-runtime
# 3. Check DNS resolution
docker compose exec workspace-manager nslookup workspace-runtime
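Step 1 can be automated by listing the containers attached to the network and checking that each expected service appears. This is a sketch under the assumption that container names contain the Compose service name (for example `<project>-workspace-manager-1`). `containers_on_network` and `on_network` are hypothetical helpers.

```shell
# Hypothetical helper: print the names of containers attached to a Docker network.
containers_on_network() {
  docker network inspect -f '{{range .Containers}}{{println .Name}}{{end}}' "$1"
}

# Hypothetical helper: succeed only if every given name fragment appears
# among the containers on the network.
on_network() {
  local network="$1" members name
  shift
  members=$(containers_on_network "$network") || return 1
  for name in "$@"; do
    if ! printf '%s\n' "$members" | grep -q -- "$name"; then
      echo "missing from $network: $name" >&2
      return 1
    fi
  done
}

# Example:
#   on_network aileron-network-dev workspace-manager workspace-runtime && echo "both attached"
```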
CORS Errors
Symptom: Browser console shows Access-Control-Allow-Origin errors.
Possible causes:
- Keycloak Web Origins not set
- Manager API CORS allowlist does not include the frontend domain
- In Kubernetes mode, PUBLIC_ALLOWED_ORIGINS is misconfigured
Fix:
# Kubernetes: inspect platform-config
kubectl get configmap -n aileron \
aileron-platform-config -o yaml
# Confirm PUBLIC_ALLOWED_ORIGINS includes the frontend domain
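The allowlist can also be checked from the outside by sending a request with an `Origin` header and reading the `Access-Control-Allow-Origin` response header. This is a minimal sketch: `cors_allowed_origin` is a hypothetical helper, and the API URL in the example is an assumption based on the port used in the port-forward command on this page.

```shell
# Hypothetical helper: print the Access-Control-Allow-Origin value returned
# for a request carrying the given Origin (prints nothing if the header is absent).
cors_allowed_origin() {
  local origin="$1" url="$2"
  curl -s -o /dev/null -D - -H "Origin: $origin" "$url" \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "access-control-allow-origin" { print $2 }'
}

# Example (the API URL is an assumption; substitute your Manager API address):
#   cors_allowed_origin http://localhost:8082 http://localhost:3001/
```

Empty output means the server did not return the header for that origin, which matches the browser-side error. Some servers only add CORS headers to specific routes or to preflight `OPTIONS` requests, so try an endpoint the frontend actually calls.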
Kubernetes Ingress 502 Bad Gateway
Possible causes:
- Target Service is not ready
- Service selector mismatch
- Incorrect Ingress path
Debugging:
# Verify Service endpoints exist
kubectl get endpoints -n aileron
# Confirm pods are Ready
kubectl get pods -n aileron
# Check ingress-controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
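A 502 most often means the Service has no ready endpoints behind it. The sketch below narrows the endpoint check to one Service and fails when no ready pod IPs are listed. The helper names are hypothetical, and the default namespace `aileron` is taken from the commands above.

```shell
# Hypothetical helper: print the ready pod IPs behind a Service.
# Usage: service_endpoint_ips SERVICE [NAMESPACE]
service_endpoint_ips() {
  kubectl get endpoints "$1" -n "${2:-aileron}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Hypothetical helper: succeed only if the Service has at least one ready endpoint.
service_has_endpoints() {
  local ips
  ips=$(service_endpoint_ips "$@") || return 1
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "no ready endpoints" >&2
    return 1
  fi
}

# Example:
#   service_has_endpoints aileron-workspace-manager aileron
```

If the pods are Ready but the list is still empty, compare the Service's selector with the pod labels, since a selector mismatch produces exactly this result.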
Performance Issues
Container OOMKilled
Symptom: kubectl get pods or docker compose ps shows the container repeatedly restarting with OOMKilled in events.
Fix:
Docker mode — edit docker-compose.yml:
workspace-browser:
  deploy:
    resources:
      limits:
        memory: 4G # Increase
Kubernetes mode — adjust values.yaml:
workspaceManager:
  resources:
    limits:
      memory: 1Gi # Increase
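Before raising a limit, it is worth confirming that the restarts really come from the OOM killer and not from an application crash. The sketch below reads the termination state from Kubernetes and from Docker. The helper names are hypothetical, and the default namespace is an assumption based on the commands on this page.

```shell
# Hypothetical helper: print the last termination reason of each container in a pod.
# Usage: last_termination_reason POD [NAMESPACE]
last_termination_reason() {
  kubectl get pod "$1" -n "${2:-aileron}" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Hypothetical helper: succeed if Docker recorded an OOM kill for the container.
# Usage: container_was_oom_killed CONTAINER_NAME_OR_ID
container_was_oom_killed() {
  [ "$(docker inspect -f '{{.State.OOMKilled}}' "$1")" = "true" ]
}

# Examples:
#   last_termination_reason workspace-runtime-<id> workspace-system
#   container_was_oom_killed <container-name> && echo "killed by the OOM killer"
```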
PostgreSQL Queries Getting Slower
Debugging:
# Connect to DB
docker compose exec postgres psql -U postgres -d aileron
-- Inside psql: check slow queries (requires the pg_stat_statements extension)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Inside psql: check VACUUM status
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables;
Useful Debugging Commands
Docker Mode
# Enter container shell
docker compose exec workspace-manager bash
docker compose exec workspace-runtime bash
# Tail logs for all services
docker compose logs -f
# Rebuild and start a specific service
docker compose up -d --build workspace-manager
# Inspect network
docker network inspect aileron-network-dev
# List all related containers (including dynamic workspaces)
docker ps -a --filter "name=aileron" --filter "name=workspace-"
# Prune unused volumes
docker volume prune
Kubernetes Mode
# Enter pod shell
kubectl exec -it -n aileron deployment/aileron-workspace-manager -- bash
# View all resources
kubectl get all -n aileron
# Watch pod events
kubectl get events -n aileron --sort-by='.lastTimestamp'
# Tail logs for multiple pods
kubectl logs -f -l app.kubernetes.io/name=workspace-manager -n aileron
# Port-forward for local access
kubectl port-forward -n aileron svc/aileron-workspace-manager 3001:3001
# Check CRDs
kubectl get workspaces -A
kubectl describe workspace <name> -n workspace-system
# Restart a deployment
kubectl rollout restart deployment/aileron-workspace-manager -n aileron
# View all ConfigMaps
kubectl get configmap -n aileron
kubectl describe configmap aileron-platform-config -n aileron
Getting Help
If none of the above resolves your issue:
- Gather relevant logs (docker compose logs or kubectl logs)
- Include docker compose version or Helm values
- Describe reproduction steps
- File an issue on GitHub Issues or the relevant community channel
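For Docker mode, the first two items can be gathered in one step. This is a minimal sketch, not an official support script: `collect_docker_diagnostics` is a hypothetical helper, and the output directory name and the 500-line log tail are arbitrary choices.

```shell
# Hypothetical helper: write Compose version, service status, and recent logs
# into a directory that can be attached to an issue.
# Usage: collect_docker_diagnostics [OUTPUT_DIR]
collect_docker_diagnostics() {
  local out="${1:-aileron-diagnostics}"
  mkdir -p "$out" || return 1
  docker compose version > "$out/compose-version.txt" 2>&1
  docker compose ps > "$out/compose-ps.txt" 2>&1
  # --no-color keeps the log file free of terminal escape codes
  docker compose logs --no-color --tail 500 > "$out/compose-logs.txt" 2>&1
  echo "$out"
}

# Example:
#   collect_docker_diagnostics ./aileron-diagnostics
```

Review the collected files before sharing them, since logs and environment output can contain tokens or passwords.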