Data Pipelines Are Live: Stream PostgreSQL Into Kafka With One API Call

June 7, 2026 · 6 min read

Engineering @ FoundryDB

Most managed database products rent you a box. You provision a Postgres, you provision a Kafka, and then the interesting part, the part where data actually flows from one to the other, is handed back to you with a shrug. You stand up Kafka Connect, you hunt for the right Debezium build, you hand-edit pg_hba.conf, you grant REPLICATION, you author a publication, you guess at a slot name, and you pray the two services can even reach each other on the network. That is a runbook, not a product. And until today, it was yours to own.

Not anymore. FoundryDB data pipelines are live. The connection between two services you already own is now a first-class resource you can create, watch, and tear down with a single call. The first pipeline type ships today: change data capture from a PostgreSQL source straight into a Kafka sink. One POST, and the platform does all the plumbing, end to end, idempotently, with nothing exposed to the public internet to make it happen.

[Figure: PostgreSQL → Kafka CDC (Debezium). The PostgreSQL source (:5432) exposes changes through a publication and the WAL. Over private SDN peering, Debezium on Kafka Connect reads them and writes change-event envelopes to the Kafka topics shop.public.orders and shop.public.customers, whose partitions feed a consumer group. Status shown: STEADY, streaming, source_lag = 0 B.]

What you can build now

Insert a row into orders on your Postgres service, and it lands on a Kafka topic seconds later. That is the whole promise, and it unlocks the architectures developers reach for constantly: event sourcing off your operational tables, real-time materialized views, search indexes that never drift, audit logs that write themselves, fan-out to any downstream consumer that speaks Kafka. The pattern everyone hand-builds is now a line in your infrastructure, not a weekend.

You describe the flow, not the machinery. Source service, sink service, the tables you care about, a topic prefix. The platform owns everything underneath:

curl -u "$USER:$PASS" -X POST \
  "https://api.foundrydb.com/organizations/$ORG/pipelines" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "orders-to-kafka",
    "pipeline_type": "cdc_pg_to_kafka",
    "source_service_id": "'"$PG_ID"'",
    "sink_service_id": "'"$KAFKA_ID"'",
    "config": { "tables": ["public.orders"], "topic_prefix": "shop" }
  }'

No Connect cluster to babysit. No connector JSON to author. No network rules to open by hand. The events show up on shop.public.orders, and you move on to building the thing that consumes them.
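
On the consuming side, a minimal sketch of unpacking one change event from shop.public.orders. It assumes Debezium's default JSON envelope (before, after, source, op), optionally wrapped in a schema/payload pair by the JSON converter. The function name and the sample event are illustrative, not part of the FoundryDB API.

```python
import json

# Debezium operation codes: c = insert, u = update, d = delete, r = snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def parse_change_event(raw):
    """Return (operation, row) for one change-event value, or None for a tombstone."""
    if raw is None:  # Kafka tombstone that Debezium emits after a delete
        return None
    doc = json.loads(raw)
    # With schemas.enable=true the JSON converter wraps the envelope in "payload".
    envelope = doc.get("payload", doc)
    code = envelope["op"]
    # Deletes carry the old row in "before"; other operations carry "after".
    row = envelope.get("before") if code == "d" else envelope.get("after")
    return OPS.get(code, code), row


# Illustrative event, shaped like an insert into public.orders.
sample = json.dumps({
    "before": None,
    "after": {"id": 7, "total": 42},
    "source": {"schema": "public", "table": "orders"},
    "op": "c",
})
print(parse_change_event(sample))
```

For deletes, "before" holds only the primary key columns unless the table uses REPLICA IDENTITY FULL, so a consumer that needs the full old row should plan for that.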

Wired by the platform, not by you

The reason this is one call and not a checklist is that the controller owns every step and drives it to completion on its own. When you create a pipeline, a reconciler walks it through a small state machine and keeps going until it is Running:

The network path is built for you. The two services live on separate private subnets, so the platform creates a bidirectional peering between them and the connector reaches the source over its private IP. Pipeline data never touches the shared internal network, and nothing is exposed publicly to make the connection work.
The connector plugin is installed for you. The Debezium PostgreSQL connector is placed on the sink's Kafka Connect worker, which restarts to pick it up. Already present? It is a no-op.
The source is prepared for you. The platform grants the REPLICATION attribute to the pipeline's database user and admits the sink's private subnet in pg_hba.conf as a single tagged entry. FoundryDB PostgreSQL services ship already configured with wal_level=logical (stock PostgreSQL defaults to replica), so logical decoding just works.
The publication and connector are created for you. The platform owns the publication, scoped to your table filter, in the right database. Debezium starts against the source's private IP, takes its initial snapshot, and switches to streaming.
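
For a sense of what those steps add up to, here is a sketch of the kind of Debezium PostgreSQL connector configuration they produce. The property names are Debezium's documented ones (topic.prefix is the Debezium 2.x name). The slot name, publication name, host, and credentials are placeholders, and the configuration FoundryDB actually generates is not shown in this post.

```python
def debezium_pg_config(host, dbname, user, password, tables, topic_prefix,
                       slot_name="pipeline_slot", publication="pipeline_pub"):
    """Build a Debezium PostgreSQL connector config (placeholder names throughout)."""
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",               # built-in logical decoding plugin
        "database.hostname": host,               # the source's private IP
        "database.port": "5432",
        "database.user": user,
        "database.password": password,
        "database.dbname": dbname,
        "topic.prefix": topic_prefix,
        "table.include.list": ",".join(tables),
        "slot.name": slot_name,
        "publication.name": publication,
        # The publication already exists, so Debezium must not create its own.
        "publication.autocreate.mode": "disabled",
    }


def topic_for(topic_prefix, table):
    """Debezium names topics <topic.prefix>.<schema>.<table>."""
    return f"{topic_prefix}.{table}"


config = debezium_pg_config("10.0.0.5", "shop", "pipeline_user", "secret",
                            ["public.orders"], "shop")
print(config["table.include.list"], "->", topic_for("shop", "public.orders"))
```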

You watch it happen through one endpoint, and the status tells you exactly where things stand: connector state, source lag in bytes, the topic prefix in play. No guesswork, no SSH, no log spelunking.
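
Source lag in bytes is conventionally the distance between the primary's current WAL position and the position the replication slot has confirmed, which is what pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) returns in PostgreSQL. The sketch below reproduces that arithmetic on PostgreSQL's textual LSN format. How FoundryDB derives its own source lag figure is not stated here, so treat this as an illustration of the metric rather than the platform's implementation.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_wal_lsn, confirmed_flush_lsn):
    """Bytes of WAL the slot's consumer has not yet confirmed (never negative)."""
    return max(0, lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn))


print(lag_bytes("16/B374D848", "16/B374D848"))  # caught up
print(lag_bytes("0/2000000", "0/1000000"))      # 16 MiB behind
```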

Teardown that leaves zero residue

Deleting a CDC setup by hand is where the ghosts pile up: a connector still clutching a replication slot, a slot quietly pinning WAL forever, a publication nobody remembers creating. Delete a FoundryDB pipeline and it reverses the build in order. It removes the connector so Debezium releases the slot, drops the replication slot (terminating any lingering backend first), drops the publication in the database that owns it, and removes the pg_hba entry it added. The source goes back to exactly the state it was in before, with nothing left behind. Cleanup is a feature here, not an afterthought.
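
The order matters, so here it is as a sketch: a function that lists the manual equivalent of each teardown step. The Kafka Connect REST call and the PostgreSQL functions are real. The connector, slot, and publication names are placeholders, and this is not a description of FoundryDB's internal code.

```python
import re

# Conservative check: lowercase letters, digits, and underscores only.
_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def teardown_steps(connector, slot, publication):
    """Return the manual teardown steps, in the order they must run."""
    for name in (slot, publication):
        if not _NAME.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # 1. Remove the connector first so Debezium releases the slot.
        ("connect", f"DELETE /connectors/{connector}"),
        # 2. Terminate any backend still attached to the slot.
        ("sql", "SELECT pg_terminate_backend(active_pid) FROM pg_replication_slots "
                f"WHERE slot_name = '{slot}' AND active_pid IS NOT NULL"),
        # 3. Drop the slot so it stops pinning WAL.
        ("sql", f"SELECT pg_drop_replication_slot('{slot}')"),
        # 4. Drop the publication, in the database that owns it.
        ("sql", f"DROP PUBLICATION IF EXISTS {publication}"),
        # 5. Remove the pg_hba.conf entry and reload the configuration.
        ("host", "remove the pipeline's pg_hba.conf entry, then SELECT pg_reload_conf()"),
    ]


for kind, action in teardown_steps("orders-to-kafka", "pipeline_slot", "pipeline_pub"):
    print(f"{kind}: {action}")
```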

Honest about failover

We would rather write down the hard parts than pretend they do not exist. A logical replication slot lives on the primary that created it, and a promoted replica does not inherit it. FoundryDB handles that: when the source fails over and the primary moves out from under the connector, the reconciler re-points the connector at the new primary's private IP, refreshes credentials, and restarts it. Debezium recreates the slot on the new primary and resumes from its committed Kafka offsets.

The delivery guarantee across a failover is at-least-once. One caveat worth stating plainly: rows committed on the old primary but not yet replicated when it died can be lost in a lossy failover, exactly as they would be for any consumer of that database. Events already in Kafka are never lost, and recent events may be re-delivered after recovery, so your downstream consumers should be idempotent. That is standard CDC practice, and it is the truth, so we put it in the open.
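
One common way to make a consumer idempotent is to apply each event as an upsert or delete keyed by primary key, so replaying a recent run of events converges to the same state. The sketch below assumes events for a given key arrive in order, which holds when messages are keyed by primary key and therefore land on one partition. The in-memory dict stands in for whatever store the consumer writes to, and the event tuples are illustrative.

```python
def apply_event(state, key, op, row):
    """Apply one change as an upsert or delete. Safe to repeat."""
    if op == "delete":
        state.pop(key, None)  # deleting a missing key is a no-op
    else:
        state[key] = row      # insert, update, and snapshot all overwrite


def replay(events):
    """Build the final state from a sequence of (key, op, row) events."""
    state = {}
    for key, op, row in events:
        apply_event(state, key, op, row)
    return state


events = [
    (1, "insert", {"id": 1, "status": "new"}),
    (1, "update", {"id": 1, "status": "paid"}),
    (2, "insert", {"id": 2, "status": "new"}),
    (2, "delete", None),
]

once = replay(events)
# Simulate at-least-once delivery: the last three events arrive a second time.
redelivered = replay(events + events[-3:])
print(once == redelivered)
```

Side effects that cannot be expressed as an overwrite, such as sending an email per order, need their own deduplication and are outside what this sketch covers.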

This is the first pipeline, not the last

CDC from Postgres to Kafka is where we started because it is the one everyone hand-builds. The shape generalizes, and that is the whole point of a data fabric: Kafka into OpenSearch for search indexing, Postgres into pgvector for embeddings, analytics sinks as the engine roster grows. The connections between your services become a product surface instead of your problem. You stop renting databases one at a time and start composing architectures.

To be clear and honest, those additional pipeline types are the direction we are heading, not buttons in your dashboard today. What is in your hands right now is real, live, and tested: CDC from PostgreSQL to Kafka, fully managed from network to connector.

Drive it from the dashboard's Data Flows page, the REST API, the Go SDK, or your AI assistant through the MCP server's pipeline tools. Pick your altitude, point a source at a sink, and watch your data start moving. Go build something with it.
