Predictive Autoscaling: Scale Your Database Before Demand Spikes

7 min read
FoundryDB Team
Engineering @ FoundryDB

Reactive autoscaling has a fundamental problem: it waits for something to go wrong. Your database hits 95% CPU, the autoscaler wakes up, requests a resize, and for the next few minutes your application eats latency while the new resources come online. If your traffic is predictable (and most production traffic is), this delay is avoidable.

FoundryDB's predictive autoscaling engine learns your workload's seasonal patterns and scales your database before demand spikes arrive. It combines real-time metric thresholds with historical baselines, anomaly detection, and configurable cost limits so you stay fast without overspending.

How It Works

The autoscaling system operates at two levels: a reactive policy that you configure per service via the API, and a predictive engine that runs continuously across all services.

Reactive Autoscale Policies

Every FoundryDB service supports a metric-based autoscale policy. You define which metrics to watch, what thresholds trigger a scale-up or scale-down, and the boundaries the autoscaler must stay within.

The four supported metrics are:

Metric                 What it tracks
cpu_percent            CPU utilization across the primary node
memory_percent         Memory pressure from buffers, caches, and active queries
connections_percent    Percentage of max connections in use
disk_percent           Storage utilization on the data volume

Each metric has an independent threshold_up (trigger scale-up), threshold_down (trigger scale-down), and duration_seconds (how long the metric must sustain that level before acting). This prevents transient spikes from causing unnecessary tier changes.
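
To make the duration check concrete, here is a minimal sketch of how a sustained-threshold check can work. The one-sample-per-minute cadence is an assumption made for the example; FoundryDB's actual sampling interval may differ.

from typing import Sequence

def sustained_breach(samples: Sequence[float], threshold: float,
                     duration_seconds: int, sample_interval_seconds: int = 60) -> bool:
    # Number of consecutive recent samples that must breach the threshold
    # before the autoscaler is allowed to act.
    needed = max(1, duration_seconds // sample_interval_seconds)
    recent = samples[-needed:]
    return len(recent) >= needed and all(s > threshold for s in recent)

# CPU sampled once a minute; threshold_up=80, duration_seconds=300 (5 samples).
cpu = [62, 71, 83, 85, 88, 90, 91]
print(sustained_breach(cpu, threshold=80, duration_seconds=300))  # True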

Predictive Engine

The predictive layer builds a seasonal baseline from your workload history. It looks at the same hour of the same day of the week over the past 7 days to compute a mean and standard deviation for CPU utilization. When current CPU deviates significantly from this baseline (measured by z-score), the engine acts preemptively rather than waiting for a hard threshold breach.
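
To make that concrete, here is a rough sketch of the z-score computation. The baseline_samples list stands in for CPU readings taken from matching historical windows; how many samples the engine actually keeps, and exactly how it buckets them, are internal details not covered here.

import statistics

def cpu_z_score(current_cpu: float, baseline_samples: list[float]) -> float:
    # Mean and standard deviation of the seasonal baseline.
    mean = statistics.mean(baseline_samples)
    stdev = statistics.pstdev(baseline_samples) or 1e-9  # avoid divide-by-zero
    # How many standard deviations the current reading sits from the baseline.
    return (current_cpu - mean) / stdev

baseline = [41.0, 44.5, 39.8, 43.2, 40.6]     # hypothetical historical readings
print(round(cpu_z_score(78.0, baseline), 2))  # large positive z-score -> anomaly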

Three triggers drive predictive decisions:

  • Anomaly spike. A z-score above 2.5 combined with CPU already above 75% triggers an aggressive scale-up of 2 tiers. This handles sudden, unexpected load that the seasonal model didn't predict.
  • Sustained growth. Three consecutive metric windows (each 15 minutes), all above the scale-up threshold, trigger a 1-tier increase. This catches gradual load increases like organic traffic growth.
  • Sustained low. Seven consecutive windows, all below 30% CPU, trigger a 1-tier decrease. The higher bar for scale-down (7 windows vs. 3) prevents premature downsizing during temporary lulls.

Every decision (including no-ops) is recorded to an audit table for debugging, reporting, and cooldown enforcement.
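
Putting the three triggers together, the decision logic looks roughly like the sketch below. The tier deltas, window counts, and thresholds come from the description above; the function name and structure are illustrative, not the engine's actual code.

def predictive_decision(z_score: float, current_cpu: float,
                        window_avgs: list[float],
                        threshold_up: float = 80.0,
                        threshold_low: float = 30.0) -> int:
    """Return a tier delta (+2, +1, -1, or 0). window_avgs are the
    per-15-minute CPU averages, most recent last."""
    # Anomaly spike: far outside the seasonal baseline and already hot.
    if z_score > 2.5 and current_cpu > 75.0:
        return +2
    # Sustained growth: three consecutive high windows.
    if len(window_avgs) >= 3 and all(w > threshold_up for w in window_avgs[-3:]):
        return +1
    # Sustained low: seven consecutive quiet windows.
    if len(window_avgs) >= 7 and all(w < threshold_low for w in window_avgs[-7:]):
        return -1
    return 0  # no-op (still recorded in the audit table)
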

Configuring an Autoscale Policy

Set up a CPU-based autoscale policy using the REST API:

curl -u user:password -X PUT \
  https://api.foundrydb.com/managed-services/{id}/autoscale-policy \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "metrics": [
      {
        "metric": "cpu_percent",
        "threshold_up": 80,
        "threshold_down": 30,
        "duration_seconds": 300
      },
      {
        "metric": "memory_percent",
        "threshold_up": 85,
        "threshold_down": 40,
        "duration_seconds": 300
      }
    ],
    "min_plan": "tier-2",
    "max_plan": "tier-8",
    "cooldown_seconds": 300
  }'

This tells the autoscaler: scale up when CPU exceeds 80% for 5 minutes, scale down when it stays below 30% for 5 minutes, and never go below tier-2 (2 vCPU, 4 GB) or above tier-8 (8 vCPU, 16 GB). After any scaling action, wait at least 5 minutes before making another decision.

The min_plan and max_plan boundaries are your cost controls. Setting max_plan to tier-8 means your bill will never exceed the hourly rate for that tier, regardless of what the autoscaler recommends.
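
To illustrate how the boundaries and cooldown interact, the sketch below clamps a recommended tier into the configured range and skips any action still inside the cooldown window. The simple tier-1 through tier-12 ladder is an assumption made for the example; the real set of plans may differ.

import time

# Assumed tier ladder, for illustration only.
TIERS = [f"tier-{n}" for n in range(1, 13)]

def apply_policy(recommended: str, min_plan: str, max_plan: str,
                 last_scale_at: float, cooldown_seconds: int = 300):
    """Clamp the recommended tier into [min_plan, max_plan] and honor cooldown.
    Returns the tier to move to, or None if no action should be taken yet."""
    if time.time() - last_scale_at < cooldown_seconds:
        return None  # still inside the cooldown from the previous action
    idx = min(max(TIERS.index(recommended), TIERS.index(min_plan)),
              TIERS.index(max_plan))
    return TIERS[idx]

# The engine recommends tier-9, but max_plan caps the move at tier-8.
print(apply_policy("tier-9", "tier-2", "tier-8", last_scale_at=0))  # tier-8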

Storage Autoscaling

Storage autoscaling works independently from compute autoscaling. Unlike compute, storage can only grow (you cannot shrink a disk without risking data loss). Configure it alongside your compute policy:

curl -u user:password -X PUT \
  https://api.foundrydb.com/managed-services/{id}/autoscale-policy \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "storage_auto_scale": {
      "enabled": true,
      "threshold_percent": 85,
      "increment_gb": 50,
      "max_size_gb": 1000
    }
  }'

When disk usage crosses 85%, the autoscaler adds 50 GB. It repeats as needed until the disk reaches the 1000 GB cap. The storage autoscaler checks every 5 minutes, with a 60-minute cooldown between expansions.
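
Here is a rough sketch of that growth-only expansion logic, using the parameters from the example above; the function and argument names are illustrative, not part of the API.

def next_disk_size(current_gb: int, used_gb: float,
                   threshold_percent: int = 85,
                   increment_gb: int = 50,
                   max_size_gb: int = 1000) -> int:
    """Return the new disk size, or the current size if no expansion is needed.
    Storage only grows; there is no shrink path."""
    if current_gb >= max_size_gb:
        return current_gb  # already at the cap
    if (used_gb / current_gb) * 100 < threshold_percent:
        return current_gb  # below the usage threshold
    return min(current_gb + increment_gb, max_size_gb)

print(next_disk_size(200, used_gb=175))  # 87.5% used -> grows to 250
print(next_disk_size(980, used_gb=900))  # would exceed the cap -> clamped to 1000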

Default values if you enable storage autoscaling without specifying parameters:

Parameter            Default
threshold_percent    80%
increment_gb         10 GB
max_size_gb          500 GB
cooldown_minutes     60

Scaling History and Audit Trail

Every scaling operation records who triggered it: user (manual), auto_scale (reactive policy), or system (predictive engine). You can query your service's scaling history to see what happened, when, and why:

curl -u user:password \
https://api.foundrydb.com/managed-services/{id}/autoscale-policy

The response includes last_scale_at so you can see when the most recent action occurred. The predictive engine also maintains a separate audit table that logs the z-score, seasonal baseline, current CPU average, and confidence level for every decision, including decisions where it evaluated your service and chose not to scale.
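
As a minimal Python equivalent of the call above, the sketch below fetches the policy and returns last_scale_at. It assumes the response is JSON with a top-level last_scale_at field, as described above.

import requests

def last_scale_time(service_id: str, user: str, password: str):
    """Fetch the autoscale policy and return last_scale_at, if present."""
    resp = requests.get(
        f"https://api.foundrydb.com/managed-services/{service_id}/autoscale-policy",
        auth=(user, password),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("last_scale_at")

# print(last_scale_time("my-service-id", "user", "password"))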

Reactive vs. Predictive: When Each Fires

The two systems complement each other. Here is how they divide responsibility:

Scenario                             Reactive                                    Predictive
Monday morning traffic ramp-up       Fires after CPU crosses 80%                 Fires before, based on last Monday's pattern
Unexpected viral traffic spike       Fires after 5 min sustained threshold       Fires immediately via anomaly detection (z-score > 2.5)
Gradual organic growth over weeks    Fires when thresholds are breached          Fires on sustained growth (3 consecutive high windows)
Weekend low traffic                  Scales down after 5 min below threshold     Scales down after 7 consecutive low windows (~105 min)
One-off batch job spike              May fire if sustained > duration_seconds    Ignores it if within seasonal norms

The predictive engine is deliberately conservative. It requires a z-score above 2.5 for anomaly spikes (roughly 99th percentile deviation), 3 sustained high windows for growth, and 7 sustained low windows for scale-down. A 6-hour cooldown prevents thrashing from rapid successive decisions.

Dry-Run Mode

Before trusting the autoscaler with production changes, enable dry-run mode. In this mode, the predictive engine evaluates your workload and records all decisions to the audit log, but does not execute any tier changes. Review the decisions over a few days to verify the engine's judgment before going live.
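
As a sketch of what enabling dry-run mode might look like through the policy endpoint, the example below uses a hypothetical dry_run field; the actual parameter name may differ, so check the API reference before relying on it.

import requests

# Hypothetical example: the "dry_run" field name is an assumption, not a
# documented parameter. Consult the API reference for the real field.
policy = {
    "enabled": True,
    "dry_run": True,  # evaluate and audit decisions without executing them
    "metrics": [
        {"metric": "cpu_percent", "threshold_up": 80,
         "threshold_down": 30, "duration_seconds": 300}
    ],
    "min_plan": "tier-2",
    "max_plan": "tier-8",
}
requests.put(
    "https://api.foundrydb.com/managed-services/{id}/autoscale-policy",
    auth=("user", "password"),
    json=policy,
    timeout=10,
)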

Disabling Autoscaling

Remove the autoscale policy entirely:

curl -u user:password -X DELETE \
https://api.foundrydb.com/managed-services/{id}/autoscale-policy

The response confirms the timestamp when autoscaling was disabled. All existing resources remain at their current tier and size.

Best Practices

Start with reactive, then layer predictive. Set conservative thresholds (80% up, 30% down) and monitor scaling events for a week. Once you trust the behavior, the predictive engine adds the look-ahead advantage.

Set meaningful plan boundaries. Your min_plan should be the smallest tier that handles your baseline traffic. Your max_plan should be the largest tier your budget allows. The gap between them is the autoscaler's operating range.

Use duration to filter noise. A duration_seconds of 300 (5 minutes) prevents one-off query spikes from triggering a resize. Reduce it to 60 only if your application is genuinely latency-sensitive to brief spikes.

Monitor storage thresholds proactively. Unlike compute, storage cannot scale down. Set max_size_gb to a value you are comfortable paying for indefinitely, and monitor disk growth trends in your metrics dashboard.

Review the audit log. The predictive scaling decisions table records every evaluation. If the engine is making decisions you disagree with, adjust the min_plan/max_plan boundaries or the reactive thresholds.

What's Next