
Scaling Reads with PostgreSQL Read Replicas on FoundryDB

· 6 min read
FoundryDB Team
Engineering @ FoundryDB

Most PostgreSQL workloads are read-heavy. Dashboards, reporting queries, search, and API endpoints that serve cached-but-still-queried data all put pressure on a single primary. Vertical scaling helps, but at some point you are buying ever-larger machines just so reads and writes can keep competing for the same CPU, connections, and buffer cache.

Read replicas solve this by giving you additional PostgreSQL nodes that serve read queries from a continuously updated copy of your data. On FoundryDB, adding a replica takes one API call and a few minutes of provisioning time.

When You Need Read Replicas

A single-node PostgreSQL service handles a surprising amount of traffic. But three patterns signal that it is time to add replicas:

  • Read-heavy APIs. Your application runs 10x more SELECTs than INSERTs/UPDATEs. The primary spends most of its CPU serving reads instead of processing writes.
  • Analytics and reporting. Long-running aggregation queries compete with transactional queries for connections and buffer cache. Running analytics on a replica keeps the primary fast.
  • Geographic distribution. Replicas in different zones reduce read latency for users closer to those regions.

A good rule of thumb: if your primary CPU is consistently above 70% and most of the load comes from SELECT queries, read replicas will help more than scaling up compute.
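To check where you stand, you can approximate the read/write mix from PostgreSQL's own statistics views. A minimal sketch: the SQL sums tuple activity from pg_stat_user_tables, and the helper that turns the two counters into a ratio is our own illustration, not part of any API.

```python
# Approximate the read share of a workload from pg_stat_user_tables counters.
# Run the SQL against your database, then feed the two sums to read_fraction().
READ_WRITE_SQL = """
SELECT COALESCE(sum(seq_tup_read + idx_tup_fetch), 0) AS reads,
       COALESCE(sum(n_tup_ins + n_tup_upd + n_tup_del), 0) AS writes
FROM pg_stat_user_tables;
"""

def read_fraction(reads: int, writes: int) -> float:
    """Fraction of tuple activity that is reads; 0.0 for an idle database."""
    total = reads + writes
    return reads / total if total else 0.0

# Example: 900k tuples read vs 90k written is roughly the 10:1 ratio
# mentioned above, i.e. a read fraction above 0.9.
```

If the fraction stays above ~0.9 while the primary's CPU is pegged, replicas are likely the cheaper fix.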

Adding a Replica

Add a replica to an existing PostgreSQL service with a single request:

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/nodes \
-H "Content-Type: application/json" \
-d '{"role": "replica"}'

Or use the CLI:

fdb services add-replica --service my-postgres --zone eu-helsinki

FoundryDB provisions a new VM, streams the current data from the primary using pg_basebackup, and starts streaming replication. For a 50 GB database on maxiops storage, this typically completes in under 5 minutes.

Once the replica is ready, list your nodes to get the replica's hostname:

curl -u admin:password \
https://api.foundrydb.com/managed-services/{id}/nodes
{
  "nodes": [
    {
      "id": "node_abc",
      "role": "primary",
      "hostname": "my-postgres-primary.db.foundrydb.com",
      "status": "running"
    },
    {
      "id": "node_def",
      "role": "replica",
      "hostname": "my-postgres-replica-1.db.foundrydb.com",
      "status": "running"
    }
  ]
}

Each replica gets its own DNS hostname. You can connect to it directly for read queries using the same credentials and TLS settings as the primary.

How Streaming Replication Works

PostgreSQL streaming replication ships WAL (Write-Ahead Log) records from the primary to each replica in near-real-time. Every change (INSERT, UPDATE, DELETE, DDL) is first written to WAL on the primary, then streamed over a persistent TCP connection to the replica, which replays the changes against its own data files.
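You can observe this mechanism directly from SQL: SELECT pg_current_wal_lsn() on the primary and SELECT pg_last_wal_replay_lsn() on a replica each return a WAL position (LSN), and the difference between them is the lag in bytes. A small helper for that arithmetic; LSNs are formatted as two hex words:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a WAL position like '16/B374D848' to an absolute byte offset.
    The high word counts 4 GiB spans of WAL; the low word is the offset within."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has yet to replay.
    primary_lsn: SELECT pg_current_wal_lsn();      -- run on the primary
    replica_lsn: SELECT pg_last_wal_replay_lsn();  -- run on the replica"""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)
```

A lag of 1024 here means the replica is one kibibyte of WAL behind, which at typical write rates is a few milliseconds of wall-clock time.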

By default, FoundryDB uses asynchronous replication. The primary does not wait for replicas to confirm receipt before committing a transaction. This means writes are never slowed down by replica count, but there is a small window (typically under 10 milliseconds) where a replica may be slightly behind.

For workloads that require stronger guarantees (financial reporting, audit trails), you can tighten synchronous_commit via the configuration API. With remote_write, a commit does not return until at least one replica has received the WAL and written it to its operating system (though not yet flushed it to disk); with remote_apply, the primary waits until the replica has actually applied the transaction, which is what makes it visible to reads on that replica. Either setting raises write latency.

curl -u admin:password -X PATCH \
https://api.foundrydb.com/managed-services/{id}/configuration \
-H "Content-Type: application/json" \
-d '{"parameters": {"synchronous_commit": "remote_write"}}'

Most applications should stay with asynchronous replication. The sub-second lag is invisible to end users.

Connection Routing with PgBouncer

Adding replicas only helps if your application actually sends reads to them. There are two approaches: application-level routing and connection pooling with PgBouncer.

Application-level routing is the simplest. Use the primary hostname for writes and the replica hostname for reads in your application code:

import psycopg

# Write connection (primary)
write_conn = psycopg.connect(
    "host=my-postgres-primary.db.foundrydb.com dbname=defaultdb sslmode=verify-full",
    user="app_user", password="secret"
)

# Read connection (replica)
read_conn = psycopg.connect(
    "host=my-postgres-replica-1.db.foundrydb.com dbname=defaultdb sslmode=verify-full",
    user="app_user", password="secret"
)
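If you would rather make the routing decision in one place than at every call site, a naive statement-based router can pick the host for you. This is our own sketch, not a FoundryDB feature, and it is deliberately simplistic: it misclassifies writable CTEs (WITH ... INSERT) and SELECTs that call writing functions, so treat it as a starting point:

```python
import re

# Statements that must go to the primary; everything else is routed to a replica.
_WRITE_VERBS = re.compile(
    r"^\s*(insert|update|delete|merge|create|alter|drop|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)

PRIMARY = "my-postgres-primary.db.foundrydb.com"
REPLICA = "my-postgres-replica-1.db.foundrydb.com"

def pick_host(sql: str) -> str:
    """Route writes and DDL to the primary, reads to the replica."""
    return PRIMARY if _WRITE_VERBS.match(sql) else REPLICA
```

With this, pick_host("SELECT * FROM orders") returns the replica hostname, while anything starting with INSERT, UPDATE, or DDL goes to the primary.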

PgBouncer adds connection pooling on top. Enable it on your service to reduce connection overhead, especially when you have many short-lived application instances:

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/pooler \
-H "Content-Type: application/json" \
-d '{"pool_size": 25, "pool_mode": "transaction"}'

PgBouncer runs on port 5433 on each node (primary and replicas). In transaction mode, connections are returned to the pool after each transaction completes, allowing 25 server-side connections to serve hundreds of application connections.
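Connecting through the pooler is then just a port change. A small DSN helper as a sketch; the 5433 pooler port comes from above, while 5432 for direct connections is an assumption based on the PostgreSQL default. Keep in mind that transaction mode discards session state between transactions, so avoid session-level SET, LISTEN/NOTIFY, and advisory locks on pooled connections:

```python
def build_dsn(host: str, dbname: str = "defaultdb", pooled: bool = True) -> str:
    """Build a libpq-style connection string.
    PgBouncer listens on 5433 on each node; direct PostgreSQL is assumed on 5432."""
    port = 5433 if pooled else 5432
    return f"host={host} port={port} dbname={dbname} sslmode=verify-full"

# Pooled reads through the replica's PgBouncer:
# psycopg.connect(build_dsn("my-postgres-replica-1.db.foundrydb.com"))
```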

Monitoring Replication Lag

Replication lag is the most important metric for a replicated setup. It tells you how far behind a replica is relative to the primary.

Check lag for a specific node via the API:

curl -u admin:password \
https://api.foundrydb.com/managed-services/{id}/nodes/{node_id}/replication-status
{
  "role": "replica",
  "lag_bytes": 1024,
  "lag_seconds": 0.002,
  "connected": true,
  "sync_state": "streaming"
}

For continuous monitoring, query the pg_replication_lag_seconds metric:

curl -u admin:password \
"https://api.foundrydb.com/managed-services/{id}/metrics?metric=pg_replication_lag_seconds&period=1h"

Set an alert to catch lag spikes before they affect your application:

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/alerts/rules \
-H "Content-Type: application/json" \
-d '{
  "metric": "replication_lag_seconds",
  "condition": "gt",
  "threshold": 30,
  "severity": "warning",
  "notification_channel_id": "channel_abc"
}'

A threshold of 30 seconds is reasonable for most workloads. If your application requires stronger consistency guarantees on reads, tighten this to 5 or 10 seconds.
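If you also poll the metric yourself, the same thresholds can drive in-process decisions, such as falling back to the primary for reads when a replica drifts too far behind. A small helper mirroring the alert rule above; the 120-second critical level is our assumption, not a FoundryDB default:

```python
def lag_severity(lag_seconds: float, warn: float = 30.0, crit: float = 120.0) -> str:
    """Map replication lag to the alert severity it would trigger."""
    if lag_seconds >= crit:
        return "critical"
    if lag_seconds >= warn:
        return "warning"
    return "ok"

def route_reads_to_primary(lag_seconds: float, max_lag: float = 30.0) -> bool:
    """True when reads should temporarily go to the primary instead of a replica."""
    return lag_seconds >= max_lag
```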

Failover: What Happens When the Primary Goes Down

With at least one replica, automatic failover is enabled. If the primary becomes unreachable, the system:

  1. Detects the failure within approximately 30 seconds.
  2. Promotes the replica with the lowest replication lag to primary.
  3. Updates DNS so the primary hostname points to the new primary.
  4. Demotes the old primary to replica status once it recovers.

Your application reconnects through DNS with no code changes needed. Make sure your connection logic retries failed connections (many drivers and connection pools can be configured to reconnect automatically) and you will see only a brief interruption during the switchover.
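If your driver or pool does not reconnect for you, a few lines of backoff around connect cover the failover window. A generic sketch: connect is whatever your driver exposes (e.g. psycopg.connect), and the attempt count and delays are our own choices, sized so total backoff roughly spans the ~30-second detection window:

```python
import time

def connect_with_retry(connect, dsn: str, attempts: int = 7, base_delay: float = 0.5):
    """Call `connect(dsn)`, retrying with exponential backoff.
    Seven attempts at a 0.5s base give ~31s of total backoff before giving up."""
    for attempt in range(attempts):
        try:
            return connect(dsn)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

During a switchover the first attempts fail while DNS catches up, then the same hostname resolves to the new primary and the connection succeeds.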

You can also trigger a manual failover for planned maintenance:

curl -u admin:password -X POST \
https://api.foundrydb.com/managed-services/{id}/nodes/{replica_id}/failover

This performs a clean promotion with zero data loss, since the replica is fully caught up before the switch happens.

Removing Replicas

Scale down when the extra capacity is no longer needed:

curl -u admin:password -X DELETE \
https://api.foundrydb.com/managed-services/{id}/nodes/{node_id}

The replica VM is deprovisioned and the node is removed from the service. If you remove your last replica, automatic failover is disabled, so keep at least one replica for any production service.

Summary

Setup                  Nodes   Failover                  Best for
Single node            1       None                      Development, staging
Primary + 1 replica    2       Automatic                 Production minimum
Primary + 2 replicas   3       Automatic, read scaling   High-traffic production

Read replicas give you horizontal read scaling, automatic failover, and analytics isolation without changing your application's write path. Start with one replica for failover protection, add more as your read traffic grows.

Read the full PostgreSQL documentation for extensions, configuration tuning, and point-in-time recovery. The High Availability guide covers failover behavior across all supported database engines.