High Availability

Single Node vs Multi-Node

By default, services start as a single-node primary. For production, add at least one replica.

Setup                 Nodes  Uptime                            Use case
Single node           1      No automatic failover             Development
Primary + 1 replica   2      Automatic failover                Production minimum
Primary + 2 replicas  3      Automatic failover, read scaling  High-traffic production

Adding a Replica

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/nodes \
-H "Content-Type: application/json" \
-d '{"role": "replica"}'

The replica is provisioned, synced from the primary, and begins streaming replication. This takes a few minutes for small services, longer for large databases.
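For scripted setups, the same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function name and the service ID "svc_123" are made up for illustration, and the endpoint, credentials, and body are copied from the curl example above. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.foundrydb.com"  # base URL taken from the curl example


def build_add_replica_request(service_id, user, password):
    """Build (but do not send) the POST request that adds a replica node."""
    # Equivalent of curl's -u user:password (HTTP Basic auth).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"role": "replica"}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/managed-services/{service_id}/nodes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# "svc_123" is a hypothetical service ID used only for illustration.
req = build_add_replica_request("svc_123", "admin", "password")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it makes the call easy to inspect or log before it changes your service.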

Replication Status

curl -u admin:password \
https://api.foundrydb.com/managed-services/{id}/nodes/{node_id}/replication-status

Example response:

{
  "role": "replica",
  "lag_bytes": 0,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}

lag_bytes and lag_seconds report how far the replica is behind the primary, in bytes and in seconds respectively. For most workloads, lag_seconds should stay under 1 second.
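A monitoring script might turn that response into a single yes/no answer. The following is a minimal sketch: the function name and the default 1-second limit are assumptions based on the guidance above, and the field names come from the example response.

```python
import json


def replica_is_healthy(status, max_lag_seconds=1.0):
    """Return True if the replica is connected, streaming, and within the lag limit."""
    return (
        status.get("role") == "replica"
        and status.get("connected") is True
        and status.get("sync_state") == "streaming"
        # A missing lag value is treated as unhealthy rather than as zero lag.
        and status.get("lag_seconds", float("inf")) < max_lag_seconds
    )


# Sample payload copied from the example response.
sample = json.loads("""
{
  "role": "replica",
  "lag_bytes": 0,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}
""")

print(replica_is_healthy(sample))  # True
```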

Automatic Failover

When a primary becomes unreachable, the system:

  1. Detects the failure (within ~30 seconds)
  2. Promotes the replica with the lowest replication lag
  3. Updates DNS to point to the new primary
  4. Marks the old primary as unavailable
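To make step 2 concrete, here is an illustrative sketch of choosing the replica with the lowest lag. It is not the service's actual implementation: the "id" field, the use of lag_bytes as the comparison key, and the rule of skipping disconnected replicas are all assumptions made for this example.

```python
def pick_promotion_candidate(replicas):
    """Return the connected replica with the lowest lag, or None if there is none."""
    connected = [r for r in replicas if r.get("connected")]
    if not connected:
        return None
    # Assumption: lag is compared in bytes; lag_seconds could be used instead.
    return min(connected, key=lambda r: r["lag_bytes"])


# Hypothetical node records; only lag_bytes and connected appear in the API example.
replicas = [
    {"id": "node_a", "lag_bytes": 4096, "connected": True},
    {"id": "node_b", "lag_bytes": 0, "connected": True},
    {"id": "node_c", "lag_bytes": 0, "connected": False},
]

print(pick_promotion_candidate(replicas)["id"])  # node_b
```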

Your application reconnects through DNS, so no code changes are needed. Connections to the old primary drop during failover, so design your app to retry on connection errors.
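One common way to handle these interruptions is to retry with exponential backoff. The sketch below is an assumption-laden example rather than a recommended configuration: `connect` stands in for whatever function opens your database connection, and real drivers usually raise their own exception types instead of the built-in ConnectionError. With the defaults shown, the total wait is 31.5 seconds, chosen to roughly cover the ~30 second detection window.

```python
import time


def connect_with_retry(connect, attempts=8, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the wait between attempts


# Usage with a stand-in connection function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary unreachable")
    return "connected"


waits = []
result = connect_with_retry(flaky_connect, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)  # connected [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the function testable without real delays.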

Manual Failover

Trigger a controlled failover (e.g. before maintenance):

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/nodes/{replica_id}/failover

The replica is promoted, the DNS record updates, and the old primary becomes a replica.

Replication Lag Alerts

Set an alert to notify you if replication lag exceeds a threshold:

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/alerts/rules \
-H "Content-Type: application/json" \
-d '{
"metric": "replication_lag_seconds",
"condition": "gt",
"threshold": 30,
"severity": "warning",
"notification_channel_id": "channel_abc"
}'
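Read literally, the rule above fires when replication lag is greater than 30 seconds. The sketch below shows that interpretation in code. It runs locally and does not call the API; the function name is made up, and only the "gt" condition from the example is handled because other condition values are not documented here.

```python
def rule_fires(rule, value):
    """Return True if the measured value violates the alert rule."""
    if rule["condition"] != "gt":
        # Only "gt" appears in the example; other conditions are not assumed.
        raise ValueError(f"unsupported condition: {rule['condition']}")
    return value > rule["threshold"]


# Rule payload copied from the curl example.
rule = {
    "metric": "replication_lag_seconds",
    "condition": "gt",
    "threshold": 30,
    "severity": "warning",
    "notification_channel_id": "channel_abc",
}

print(rule_fires(rule, 0.002))  # False
print(rule_fires(rule, 45.0))   # True
```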

Engine-Specific Notes

PostgreSQL

  • Streaming replication — synchronous or asynchronous (default: asynchronous)
  • Optionally use PgBouncer for connection pooling across failover

MySQL

  • GTID-based replication — position-free, reliable failover
  • ProxySQL (if enabled) automatically reroutes after failover

MongoDB

  • Built-in replica set consensus — primary election happens automatically
  • Drivers with replica set awareness reconnect transparently

Valkey

  • Sentinel monitors primary/replica and triggers promotion
  • Configure your client with retry logic for seamless failover

Kafka

  • KRaft consensus — no ZooKeeper dependency
  • Partition leadership rebalances automatically when a broker fails