Data Pipelines

A data pipeline is a managed, first-class data flow between two services you already own. You point a source at a sink, and the platform builds the network path, installs any required connectors, prepares the source database, and drives the pipeline to Running. You observe and delete it via the API.

What a data pipeline is

Data pipelines connect existing managed services. The connection itself — the private network peering, the connector plugin, the replication configuration, the connector process — is a resource the platform owns and reconciles. If the source primary fails over, the pipeline re-points itself and restarts. If the controller restarts mid-provisioning, it resumes from where it left off.

This is distinct from pipeline templates, which deploy pre-configured groups of new services (such as a RAG pipeline with Kafka + PostgreSQL + Valkey). Data pipelines connect services that already exist in your organization.

Available pipeline types

Type	Source	Sink	Mechanism
`cdc_pg_to_kafka`	PostgreSQL	Kafka + Kafka Connect addon	Debezium PostgreSQL connector via logical replication

Additional pipeline types (Kafka to OpenSearch, PostgreSQL to pgvector) are planned. Today, the only available type is PostgreSQL CDC into Kafka.

How it works: PostgreSQL CDC to Kafka

When you create a `cdc_pg_to_kafka` pipeline, a reconciler walks it through five provisioning steps:

Pending
  └── validate source (PostgreSQL, Running) and sink (Kafka + Connect, Running)
Provisioning
  ├── ensure_network    — bidirectional SDN peering between source and sink subnets
  ├── install_plugin    — Debezium PostgreSQL connector installed on the Kafka Connect worker
  ├── prepare_source    — REPLICATION granted to the connector user; sink subnet admitted in pg_hba.conf
  ├── create_publication — platform-owned logical replication publication on the source
  └── create_connector  — Debezium connector created on the Connect cluster; waits for RUNNING
Running
  └── health checked every 60 seconds; connector re-pointed after source failover

Every step is idempotent. A controller restart or transient network failure resumes from the last completed step without creating duplicate resources.
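As a rough illustration of how step-level resumption can work, here is a minimal Python sketch. It is not the platform's reconciler. The step names come from the list above, while the dict-based pipeline record, the `last_completed_step` field, and the `handlers` mapping are assumptions made for the example.

```python
from typing import Callable

# Step names are taken from the provisioning list above. The pipeline record
# (a plain dict) and the "last_completed_step" field are hypothetical.
STEPS = [
    "ensure_network",
    "install_plugin",
    "prepare_source",
    "create_publication",
    "create_connector",
]


def reconcile(pipeline: dict, handlers: dict[str, Callable[[dict], None]]) -> dict:
    """Run the remaining steps in order, recording progress after each one."""
    done = pipeline.get("last_completed_step")  # None if nothing has run yet
    start = STEPS.index(done) + 1 if done else 0
    pipeline["status"] = "Provisioning"
    for step in STEPS[start:]:
        pipeline["provision_step"] = step
        handlers[step](pipeline)  # each handler must be safe to re-run
        pipeline["last_completed_step"] = step  # persisted progress marker
    pipeline["status"] = "Running"
    pipeline["provision_step"] = None
    return pipeline
```

If a handler raises, the progress marker still points at the previous step, so calling `reconcile` again retries only the failed step and those after it. This relies on each handler tolerating a partial earlier run, which is what idempotency provides.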

Network path

Pipeline data traffic travels over the services' private SDN subnets, never the shared utility network. The platform creates a bidirectional peering between the source and sink SDN networks. Nothing is opened on the public internet to make the pipeline work.

Source and sink must be in the same UpCloud peering region. Cross-region pipelines are not supported in Phase 1.

Lifecycle states

Status	Meaning
`Pending`	Pipeline accepted; reconciler validating source and sink
`Provisioning`	Reconciler working through provisioning steps; see `provision_step`
`Running`	Connector is up and streaming
`Paused`	Connector paused (e.g., manual pause via Kafka Connect API)
`Failed`	A provisioning step or connector reached a terminal error; see `error_message`
`Deleting`	Deletion in progress

The `provision_step` field narrows `Provisioning` to one of: `ensure_network`, `install_plugin`, `prepare_source`, `create_publication`, `create_connector`.
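A client that watches a pipeline might turn these fields into a progress message along the following lines. This is a sketch only: the field names `status`, `provision_step`, and `error_message` come from the table above, but the assumption that the API returns them as a flat JSON object (here a Python dict) and the choice of which states count as transitional are illustrative.

```python
# Provisioning steps in the order listed above.
PROVISION_STEPS = (
    "ensure_network",
    "install_plugin",
    "prepare_source",
    "create_publication",
    "create_connector",
)

# Assumption: these are the states a client would keep polling through.
TRANSITIONAL = {"Pending", "Provisioning", "Deleting"}


def describe(pipeline: dict) -> str:
    """Return a one-line, human-readable summary of a pipeline record."""
    status = pipeline["status"]
    if status == "Provisioning":
        step = pipeline.get("provision_step")
        if step in PROVISION_STEPS:
            n = PROVISION_STEPS.index(step) + 1
            return f"Provisioning: step {n}/{len(PROVISION_STEPS)} ({step})"
        return "Provisioning"
    if status == "Failed":
        reason = pipeline.get("error_message") or "no error_message reported"
        return f"Failed: {reason}"
    return status


def should_keep_polling(pipeline: dict) -> bool:
    """True while the pipeline is still moving toward a settled state."""
    return pipeline["status"] in TRANSITIONAL
```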

Teardown

Deleting a pipeline undoes provisioning in reverse order: the connector is deleted first (which releases the replication slot), then the replication slot is dropped, then the publication, then the pg_hba entry added by the pipeline. The source returns to exactly the state it was in before.

The cross-service SDN peering is intentionally kept. Other pipelines between the same two services may share it, and an idle peering carries no traffic. The peering is removed when the underlying service is deleted.

Delivery guarantee

CDC pipelines use logical replication. The delivery guarantee across normal operation is at-least-once per Kafka offset: Debezium commits offsets only after Kafka acknowledges receipt.

Across a source failover, the guarantee is at-least-once relative to what the new primary retained. Logical replication slots are not transferred to promoted replicas (a PostgreSQL limitation before failover slots were introduced in version 17). After failover, the reconciler re-points the connector at the new primary's private IP, refreshes credentials, and restarts it. Debezium recreates the slot and resumes from its committed Kafka offsets. Events already written to Kafka are not lost. Events committed on the old primary after the slot position but before the failover may be re-delivered or, in rare lossy failovers, missed.

Design downstream consumers to be idempotent.
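One common way to make a consumer idempotent is to apply each change event as an upsert or delete keyed on the row's primary key, so that a re-delivered event leaves the target unchanged. The sketch below assumes the consumer receives the Debezium event payload with its `op`, `before`, and `after` fields, that the table has a single primary-key column named `id` (a hypothetical name), and that an in-memory dict stands in for the real target store.

```python
def apply_event(table: dict, event: dict, key_field: str = "id") -> None:
    """Apply one change event to `table` so that replays are harmless."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        table[row[key_field]] = row  # upsert: a replay writes the same row again
    elif op == "d":  # delete
        table.pop(event["before"][key_field], None)  # deleting twice is a no-op


# Example: the last two events are delivered twice, as can happen after a restart.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "a"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "x"}},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "b"}},
    {"op": "d", "before": {"id": 2, "name": "x"}, "after": None},
]
target: dict = {}
for ev in events + events[2:]:
    apply_event(target, ev)
```

Because re-delivery replays events in their original order from an earlier offset, the target converges to the same final state as a single clean pass. This handles duplicates only; it cannot recover events that were never delivered.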

Difference from pipeline templates

	Data pipelines	Pipeline templates
Purpose	Wire existing services together	Deploy a new multi-service topology
Input	Source service ID + sink service ID	Template name + zone + plan
Output	A managed connector between your services	Several new managed services
API	`POST /organizations/{orgId}/pipelines`	`POST /pipelines`
Docs	This section	Pipeline Templates

Next steps

Set up a PostgreSQL to Kafka CDC pipeline — step-by-step how-to
Data Pipelines API reference — full schema and endpoint list
Pipeline Templates — deploy a pre-configured multi-service topology
