High Availability

Single Node vs Multi-Node

By default, services start as a single-node primary. For production, add at least one replica.

Setup	Nodes	Failover	Use case
Single node	1	No automatic failover	Development
Primary + 1 replica	2	Automatic failover	Production minimum
Primary + 2 replicas	3	Automatic failover, read scaling	High-traffic production

Adding a Replica

curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/nodes \
  -H "Content-Type: application/json" \
  -d '{"role": "replica"}'

The replica is provisioned, syncs from the primary, and then begins streaming replication. This takes a few minutes for small services and longer for large databases.
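If you script replica creation, you may want to wait until the new node is actually streaming before relying on it. The following is a minimal Python sketch, assuming you poll the replication-status endpoint described in the Replication Status section. The function name, the timeout and interval defaults, and the injected fetch_status callable are illustrative assumptions, not part of the API.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_streaming(
    fetch_status: Callable[[], Mapping[str, Any]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the replica reports connected and streaming, or time out.

    fetch_status is a hypothetical callable that returns the parsed JSON body
    of GET .../nodes/{node_id}/replication-status.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("connected") and status.get("sync_state") == "streaming":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Passing the HTTP call in as fetch_status keeps the polling logic independent of any particular HTTP client and makes it easy to exercise without a live service.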

Replication Status

curl -u admin:password \
  https://api.foundrydb.com/managed-services/{id}/nodes/{node_id}/replication-status

{
  "role": "replica",
  "lag_bytes": 0,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}

lag_bytes and lag_seconds tell you how far behind the primary the replica is, measured in bytes and in seconds respectively. For most workloads, lag_seconds should stay under 1 second.
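As a rough illustration, a monitoring script could turn that response into a single health verdict. This Python sketch assumes the response fields shown above. The function name and the 1-second default are illustrative choices based on the guidance in this section, not an official threshold.

```python
from typing import Any, Mapping


def replica_is_healthy(status: Mapping[str, Any], max_lag_seconds: float = 1.0) -> bool:
    """Return True if the replica is connected, streaming, and within the lag budget."""
    # A missing lag_seconds field is treated as unbounded lag, i.e. unhealthy.
    lag_seconds = float(status.get("lag_seconds", float("inf")))
    return (
        status.get("connected") is True
        and status.get("sync_state") == "streaming"
        and lag_seconds < max_lag_seconds
    )
```

With the sample response above (lag_seconds of 0.002, connected, streaming), the function returns True.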

Automatic Failover

When a primary becomes unreachable, the system:

1. Detects the failure (within ~30 seconds)
2. Promotes the replica with the lowest replication lag
3. Updates DNS to point to the new primary
4. Marks the old primary as unavailable
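To make the promotion step in the list above concrete, here is a small Python sketch of choosing the replica with the lowest lag. It is an illustration only and not the platform's actual implementation. The node_id field, the use of last-known lag values, and the tie-breaking order (bytes first, then seconds) are all assumptions.

```python
from typing import Any, Mapping, Optional, Sequence


def pick_promotion_candidate(replicas: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Return the node_id of the replica with the least last-known lag, or None."""
    if not replicas:
        return None
    # Illustrative ordering: compare lag in bytes first, then lag in seconds.
    best = min(replicas, key=lambda r: (r["lag_bytes"], r["lag_seconds"]))
    return best["node_id"]
```

Promoting the least-lagged replica minimizes the amount of recently written data that could be missing on the new primary under asynchronous replication.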

Your application reconnects via DNS — no code changes needed. Design your app to handle brief connection interruptions with retries.
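One common way to handle these interruptions is to retry with exponential backoff and jitter. The Python sketch below is a generic pattern rather than anything specific to this service. The function name, the default limits, and the choice of exceptions to retry on are assumptions that you should adapt to your database driver.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 6,
    base_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on connection errors with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay cap doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay_s.
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Full jitter: sleep a random time in [0, cap] to avoid synchronized retries.
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Wrap the call that opens a connection or runs a query, for example call_with_retries(lambda: run_query(conn)), where run_query is your own function. Tune max_attempts and the delays so that the total retry window comfortably exceeds the expected failover time, including the ~30 second detection period.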

Manual Failover

Trigger a controlled failover (e.g. before maintenance):

curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/nodes/{replica_id}/failover

The replica is promoted, the DNS record updates, and the old primary becomes a replica.

Replication Lag Alerts

Set an alert to notify you if replication lag exceeds a threshold:

curl -u admin:password -X POST \
  https://api.foundrydb.com/managed-services/{id}/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "replication_lag_seconds",
    "condition": "gt",
    "threshold": 30,
    "severity": "warning",
    "notification_channel_id": "channel_abc"
  }'
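If you create alert rules from a script, building the request body in code helps avoid malformed JSON. This Python sketch reproduces the payload from the example above. The function name and the validation rules are assumptions, and only field values that appear in this document are used as defaults.

```python
import json


def build_lag_alert_rule(
    threshold_seconds: float,
    notification_channel_id: str,
    severity: str = "warning",
) -> str:
    """Return the JSON body for a replication-lag alert rule."""
    if threshold_seconds <= 0:
        raise ValueError("threshold_seconds must be positive")
    if not notification_channel_id:
        raise ValueError("notification_channel_id is required")
    rule = {
        "metric": "replication_lag_seconds",
        "condition": "gt",  # Fire when lag is greater than the threshold.
        "threshold": threshold_seconds,
        "severity": severity,
        "notification_channel_id": notification_channel_id,
    }
    return json.dumps(rule)
```

The returned string can be sent as the body of the POST request to the alerts/rules endpoint shown above.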

Engine-Specific Notes

PostgreSQL

Streaming replication — synchronous or asynchronous (default: asynchronous)
Optionally use PgBouncer for connection pooling across failover

MySQL

GTID-based replication — position-free, reliable failover
ProxySQL (if enabled) automatically reroutes after failover

MongoDB

Built-in replica set consensus — primary election happens automatically
Drivers with replica set awareness reconnect transparently

Valkey

Sentinel monitors primary/replica and triggers promotion
Configure your client with retry logic for seamless failover

Kafka

KRaft consensus — no ZooKeeper dependency
Partition leadership rebalances automatically when a broker fails
