
Multi-instance & auto-scaling

resources.instances and [resources.autoscale]

Run multiple containers behind a single Caddy load balancer with active health checks. There are two ways to configure this in percher.toml:

Static fan-out

[resources]
instances = 2   # runs 2 containers, load-balanced

Plan caps: free=1, starter=1, maker=2, pro=4. Values above the cap are clamped to the plan maximum, and the clamp is noted in the build log.
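The clamping rule above can be sketched as a small function. This is an illustrative sketch, not Percher's actual code: the function name and return shape are assumptions, while the per-plan caps come from the docs.

```python
# Per-plan instance caps, as documented: free=1, starter=1, maker=2, pro=4.
PLAN_CAPS = {"free": 1, "starter": 1, "maker": 2, "pro": 4}

def effective_instances(requested, plan):
    """Return the instance count actually deployed, plus a build-log
    note when the requested count exceeded the plan cap (else None).
    Hypothetical helper; names are illustrative."""
    cap = PLAN_CAPS[plan]
    if requested > cap:
        return cap, f"instances={requested} exceeds {plan} cap; clamped to {cap}"
    return requested, None
```

For example, requesting `instances = 5` on the maker plan would deploy 2 containers and record a note in the build log.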

CPU-based autoscaling

[resources.autoscale]
min = 1
max = 4

Percher samples CPU every 30s, evaluates every 50s, and scales by ±1 when the sustained average crosses a threshold (default: scale up at >80% for 2 min, scale down at <20% for 10 min). Scaling is conservative: each action adds or removes a single instance, and a cooldown between actions prevents thrashing.
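The decision rule above can be sketched roughly as follows. This is a minimal illustration assuming the documented defaults (up at >80% average over 2 min, down at <20% average over 10 min, one instance per action); the function and parameter names are hypothetical, and the cooldown is omitted for brevity.

```python
def decide_scale(samples, now, current, min_n, max_n,
                 up_pct=80.0, up_window=120, down_pct=20.0, down_window=600):
    """samples: list of (unix_ts, cpu_percent) taken every ~30s.
    Returns the new instance count, moving by at most one instance."""
    def avg_over(window):
        recent = [cpu for ts, cpu in samples if now - ts <= window]
        return sum(recent) / len(recent) if recent else None

    up_avg = avg_over(up_window)      # 2-minute average
    down_avg = avg_over(down_window)  # 10-minute average
    if up_avg is not None and up_avg > up_pct and current < max_n:
        return current + 1            # sustained high CPU: add one instance
    if down_avg is not None and down_avg < down_pct and current > min_n:
        return current - 1            # sustained low CPU: remove one instance
    return current                    # otherwise hold steady
```

With 90% CPU sustained over the last two minutes and `max = 4`, a two-instance deploy would step up to three; it would not jump straight to four.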

Active Caddy health checks route traffic around any unhealthy instance. A single-instance crash in a multi-instance deploy stays at severity warning, since the app keeps serving from the remaining instances; the app is only marked crashed when all instances are down simultaneously. Per-deploy scale history appears on the deploy detail card in the dashboard.
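The severity rule above reduces to a simple mapping from per-instance health to a deploy-level status. A minimal sketch, with hypothetical names and status strings taken from the prose above:

```python
def deploy_severity(instance_healthy):
    """Map a list of per-instance health booleans to a deploy severity.
    Illustrative only; not Percher's actual API."""
    if all(instance_healthy):
        return None        # everything up: no alert
    if any(instance_healthy):
        return "warning"   # app still serving via the healthy instances
    return "crashed"       # all instances down simultaneously
```

So a deploy with instances `[up, down]` reports a warning, while `[down, down]` is a crash.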
