High Availability
Single Node vs Multi-Node
By default, services start as a single-node primary. For production, add at least one replica.
| Setup | Nodes | Availability | Use case |
|---|---|---|---|
| Single node | 1 | No automatic failover | Development |
| Primary + 1 replica | 2 | Automatic failover | Production minimum |
| Primary + 2 replicas | 3 | Automatic failover, read scaling | High-traffic production |
Adding a Replica
```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/nodes \
  -H "Content-Type: application/json" \
  -d '{"role": "replica"}'
```
The new replica is provisioned, performs an initial sync from the primary, and then begins streaming replication. This takes a few minutes for small services and longer for large databases.
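If you call the API from Python rather than curl, a minimal sketch using only the standard library might look like this. It builds the same POST request without sending it, and it assumes the base URL, path, and `admin:password` placeholder credentials from the curl example. The helper name `build_add_replica_request` and the service ID `svc-123` are hypothetical.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.foundrydb.com"  # base URL taken from the curl example


def build_add_replica_request(service_id: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST that adds a replica node.

    Hypothetical helper mirroring the curl call above.
    """
    url = f"{API_BASE}/managed-services/{service_id}/nodes"
    body = json.dumps({"role": "replica"}).encode("utf-8")
    # HTTP Basic auth, equivalent to curl's -u user:password
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_add_replica_request("svc-123", "admin", "password")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the request; it is not called here.
```

Keeping request construction separate from sending makes the call easy to inspect or test before it touches a live service.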
Replication Status
```bash
curl -u admin:password \
  https://api.foundrydb.com/managed-services/{id}/nodes/{node_id}/replication-status
```

Example response:

```json
{
  "role": "replica",
  "lag_bytes": 0,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}
```
lag_bytes and lag_seconds show how far the replica is behind the primary, in data volume and in time. For most workloads, lag_seconds should stay under 1 second.
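A client-side health check over this response could be sketched as follows. It parses the example response shown above and treats a replica as healthy when it is connected, in the streaming state, and under a lag budget. Treating "streaming" as the only healthy sync_state and defaulting the budget to 1 second are assumptions based on the guidance here; the function name `replica_is_healthy` is hypothetical.

```python
import json

# Example response body from the replication-status endpoint (copied from above)
SAMPLE = """{
  "role": "replica",
  "lag_bytes": 0,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}"""


def replica_is_healthy(status: dict, max_lag_seconds: float = 1.0) -> bool:
    """Return True if the replica is connected, streaming, and within the lag budget.

    Assumption: only sync_state == "streaming" counts as healthy.
    A missing lag_seconds field is treated as infinitely behind.
    """
    return (
        status.get("connected") is True
        and status.get("sync_state") == "streaming"
        and status.get("lag_seconds", float("inf")) < max_lag_seconds
    )


status = json.loads(SAMPLE)
print(replica_is_healthy(status))  # True
```

Tune max_lag_seconds per workload; a reporting replica may tolerate far more lag than one that serves read-after-write traffic.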
Automatic Failover
When a primary becomes unreachable, the system:
- Detects the failure (within ~30 seconds)
- Promotes the replica with the lowest replication lag
- Updates DNS to point to the new primary
- Marks the old primary as unavailable
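The promotion rule in the second step can be illustrated with a small sketch. The platform's actual selection logic is not documented here; this only models "promote the replica with the lowest replication lag", and it assumes that unreachable replicas are excluded and that lag is compared in bytes. The `Replica` type, node IDs, and function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    node_id: str
    reachable: bool
    lag_bytes: int


def choose_promotion_candidate(replicas: list[Replica]) -> Optional[Replica]:
    """Pick the reachable replica with the lowest replication lag, or None if none qualify."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None
    # min() returns the first replica with the smallest lag, so ties keep list order
    return min(candidates, key=lambda r: r.lag_bytes)


replicas = [
    Replica("node-b", reachable=True, lag_bytes=4096),
    Replica("node-c", reachable=True, lag_bytes=512),
    Replica("node-d", reachable=False, lag_bytes=0),
]
print(choose_promotion_candidate(replicas).node_id)  # node-c
```

Preferring the least-lagged replica minimizes how much recently committed data an asynchronous setup can lose at promotion time.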
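Because your application connects through the service's DNS name, it reaches the new primary when it reconnects, so no code changes are needed. Connections open during the failover are dropped, however, so design your application to retry on connection errors.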
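One common way to ride out the brief interruption is retrying with exponential backoff and jitter. The sketch below is a generic wrapper, not part of any FoundryDB client. It retries on Python's built-in ConnectionError by default; with a real database driver you would pass that driver's connection exception types via `retry_on`. The function name and defaults are assumptions, and you should retry only idempotent operations or transactions that are safe to replay.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Run operation, retrying on connection errors with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # sleep a random time up to the backoff cap
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Simulated query that fails twice (as if the primary were being replaced), then succeeds.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary unavailable during failover")
    return "ok"


print(with_retries(flaky_query, base_delay=0.0))  # ok
```

With the defaults, the total worst-case wait is bounded by roughly 0.5 + 1 + 2 + 4 seconds, which comfortably covers a detection window on the order of 30 seconds only if you raise `attempts` or `max_delay`; size both to your failover timing.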
Manual Failover
Trigger a controlled failover (e.g. before maintenance):
```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/nodes/{replica_id}/failover
```
The replica is promoted, the DNS record updates, and the old primary becomes a replica.
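The role changes in a manual failover differ from the automatic case, where the old primary is marked unavailable. This sketch models only the role bookkeeping described above, using a plain dictionary and hypothetical node IDs; it does not call the API.

```python
from __future__ import annotations


def manual_failover(roles: dict[str, str], replica_id: str) -> dict[str, str]:
    """Return new role assignments after promoting replica_id.

    The chosen replica becomes primary and the old primary is demoted to replica.
    The input dictionary is not modified.
    """
    if roles.get(replica_id) != "replica":
        raise ValueError(f"{replica_id} is not a replica")
    new_roles = dict(roles)
    for node_id, role in roles.items():
        if role == "primary":
            new_roles[node_id] = "replica"
    new_roles[replica_id] = "primary"
    return new_roles


before = {"node-a": "primary", "node-b": "replica", "node-c": "replica"}
after = manual_failover(before, "node-b")
print(after)  # {'node-a': 'replica', 'node-b': 'primary', 'node-c': 'replica'}
```

Because the old primary stays in the cluster as a replica, the node count is unchanged after a controlled failover.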
Replication Lag Alerts
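Create an alert rule that notifies you when replication lag exceeds a threshold. This example sends a warning to the notification channel channel_abc when lag exceeds 30 seconds: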
```bash
curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "replication_lag_seconds",
    "condition": "gt",
    "threshold": 30,
    "severity": "warning",
    "notification_channel_id": "channel_abc"
  }'
```
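To see what "condition": "gt" means for a given lag sample, here is a minimal sketch that evaluates the rule locally. It assumes "gt" is a strict greater-than comparison, so a lag of exactly 30 seconds does not trigger the alert. Only "gt" appears in the example, so no other condition values are modeled, and any server-side details (such as how long lag must stay high before notifying) are not represented. The function name `rule_fires` is hypothetical.

```python
import operator

# Only "gt" is shown in the documented example; other condition values are not assumed here.
CONDITIONS = {"gt": operator.gt}


def rule_fires(rule: dict, value: float) -> bool:
    """Return True if a metric sample violates the rule's threshold."""
    compare = CONDITIONS[rule["condition"]]
    return compare(value, rule["threshold"])


rule = {
    "metric": "replication_lag_seconds",
    "condition": "gt",
    "threshold": 30,
    "severity": "warning",
    "notification_channel_id": "channel_abc",
}

for lag in (0.002, 30, 45.5):
    print(lag, rule_fires(rule, lag))
# 0.002 False
# 30 False
# 45.5 True
```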
Engine-Specific Notes
PostgreSQL
- Streaming replication, either synchronous or asynchronous (default: asynchronous)
- Optionally, use PgBouncer for connection pooling across failovers
MySQL
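- GTID-based replication, so a promoted replica does not depend on binary log file and position coordinates, which makes failover more reliable
- ProxySQL (if enabled) automatically reroutes queries to the new primary after failover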
MongoDB
- Built-in replica set consensus, so a new primary is elected automatically
- Drivers with replica set awareness reconnect transparently
Valkey
- Sentinel monitors primary/replica and triggers promotion
- Configure your client to retry on connection errors so it recovers after a replica is promoted
Kafka
- KRaft consensus, with no ZooKeeper dependency
- Partition leadership rebalances automatically when a broker fails