Embedding Pipelines

An embedding pipeline monitors a source table on your PostgreSQL service, calls an embedding model for new or updated rows, and writes the resulting vectors into a target table. The target table is created and maintained by the agent; you do not need to manage pgvector schema setup by hand.

Embedding pipelines are PostgreSQL-only. The service must be in Running status.

Pipeline Modes

| Mode | How it works |
| --- | --- |
| continuous | The agent runs a background poller that checks for new rows every poll_interval_seconds seconds (default 30). New rows are processed as they arrive. |
| scheduled | Runs on a 5-field cron expression. Pending rows are processed in a single batch job at each scheduled time. |
| manual | No automatic runs. Trigger each run explicitly via the API. |
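
Because scheduled mode expects a 5-field cron expression, it can help to check the field count before sending a request. The following is a minimal sketch; the function name is_five_field_cron is hypothetical, and it only counts fields rather than validating their values, so the API remains the authority on what it accepts.

```shell
# Hypothetical helper: succeed only if the expression has exactly five
# whitespace-separated fields. Field values themselves are not validated.
# The body runs in a subshell so the "set -f" below does not leak out.
is_five_field_cron() (
    # Disable globbing so "*" is not expanded while the string is split.
    set -f
    # Intentionally unquoted: split the expression on whitespace.
    set -- $1
    [ "$#" -eq 5 ]
)
```

For example, is_five_field_cron "0 3 * * *" succeeds, while a 4-field or 6-field expression makes it return a non-zero status.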

Endpoints

GET /managed-services/{id}/embedding-pipelines
POST /managed-services/{id}/embedding-pipelines
GET /managed-services/{id}/embedding-pipelines/{pid}
PATCH /managed-services/{id}/embedding-pipelines/{pid}
DELETE /managed-services/{id}/embedding-pipelines/{pid}
POST /managed-services/{id}/embedding-pipelines/{pid}/pause
POST /managed-services/{id}/embedding-pipelines/{pid}/resume
POST /managed-services/{id}/embedding-pipelines/{pid}/runs
GET /managed-services/{id}/embedding-pipelines/{pid}/runs
GET /managed-services/{id}/embedding-pipelines/{pid}/runs/{rid}

Create a Pipeline

curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines \
  -H "Content-Type: application/json" \
  -d '{
    "source_table": "articles",
    "text_columns": ["title", "body"],
    "model_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "model_dimensions": 1536,
    "provider_api_key": "sk-...",
    "database_name": "defaultdb",
    "target_table": "articles_embeddings",
    "batch_size": 100,
    "mode": "continuous",
    "poll_interval_seconds": 30
  }'

Required Fields

| Field | Description |
| --- | --- |
| source_table | Table to monitor for new or updated rows. |
| text_columns | One or more columns concatenated and sent to the model. |
| model_provider | openai, cohere, or custom. |
| embedding_model | Model identifier, e.g. text-embedding-3-small. |
| model_dimensions | Output vector dimensionality, e.g. 1536. |
| provider_api_key | Provider API key stored encrypted. |

Optional Fields

| Field | Default | Description |
| --- | --- | --- |
| database_name | defaultdb | Database containing the source table. |
| target_table | {source_table}_embeddings | Where embeddings are written. Created by the agent if it does not exist. |
| batch_size | 100 | Rows per embedding API call. |
| poll_interval_seconds | 30 | Seconds between polls (continuous mode only). |
| mode | continuous | continuous, scheduled, or manual. |
| schedule_cron | | Required when mode is scheduled. Standard 5-field cron, e.g. "0 3 * * *". |
| source_filter | | Restricted SQL WHERE fragment, appended to the pending-row scan as AND (<filter>). Max 500 characters. Semicolons, SQL comments, and DML keywords are rejected. Example: "published = true". |
| max_row_retries | 3 | Per-row retry attempts when a batch fails. Range 0-10. |
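
The source_filter restrictions can be approximated on the client before a request is sent. This is a rough sketch rather than the server's actual validation: the function name check_source_filter is hypothetical, and the keyword list (insert, update, delete, merge) is an assumption, since this page does not enumerate the rejected DML keywords.

```shell
# Hypothetical client-side pre-check mirroring the documented source_filter
# limits. Prints "ok" or a short reason; the server has the final say.
check_source_filter() {
    _sf=$1
    # Documented limit: at most 500 characters.
    if [ "${#_sf}" -gt 500 ]; then
        echo "too long"
        return 1
    fi
    case $_sf in
        *';'*)
            echo "semicolon"
            return 1 ;;
        *--*|*'/*'*)
            # "--" line comments and "/*" block comments.
            echo "sql comment"
            return 1 ;;
    esac
    # ASSUMPTION: this keyword list is illustrative, not the server's list.
    # -w matches whole words only, so a column named updated_at still passes.
    if printf '%s\n' "$_sf" | grep -Eiqw 'insert|update|delete|merge'; then
        echo "dml keyword"
        return 1
    fi
    echo "ok"
}
```

Called as check_source_filter "published = true", the sketch prints ok. A filter containing a semicolon, a comment marker, or one of the assumed keywords yields a short reason and a non-zero exit status.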

Response

The pipeline is created in configuring status while the agent sets up the target table schema and any required triggers. It transitions to active once configuration is complete.

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "service_id": "...",
  "source_table": "articles",
  "text_columns": ["title", "body"],
  "model_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "model_dimensions": 1536,
  "target_table": "articles_embeddings",
  "mode": "continuous",
  "poll_interval_seconds": 30,
  "batch_size": 100,
  "status": "configuring",
  "rows_processed": 0,
  "rows_pending": 0,
  "tokens_used": 0,
  "created_at": "2026-06-22T10:00:00Z",
  "updated_at": "2026-06-22T10:00:00Z"
}
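
Since a new pipeline starts in configuring status and only later becomes active, a script that creates one may want to wait for that transition. The sketch below polls the single-pipeline endpoint. The function names, the attempt count, and the delay are all hypothetical choices, and the status field is extracted with sed purely to avoid extra dependencies; a JSON parser would be more robust.

```shell
# Hypothetical helper: print the "status" value from a pipeline JSON
# document read on stdin. Works for this flat field only.
pipeline_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll until the pipeline reports active (return 0) or failed (return 1).
# Returns 2 when the attempts run out. Assumes USER, PASS, SERVICE_ID and
# PIPELINE_ID are set as in the curl examples on this page.
wait_until_active() {
    attempts=${1:-30}   # polls before giving up (arbitrary default)
    delay=${2:-2}       # seconds between polls (arbitrary default)
    while [ "$attempts" -gt 0 ]; do
        status=$(curl -s -u "$USER:$PASS" \
            "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID" \
            | pipeline_status)
        case $status in
            active) return 0 ;;
            failed) echo "pipeline failed" >&2; return 1 ;;
        esac
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "timed out, last status: $status" >&2
    return 2
}
```

A call such as wait_until_active 60 5 would poll up to 60 times, 5 seconds apart, before giving up.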

Monitor a Pipeline

curl -u $USER:$PASS \
https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID

Key fields to watch:

| Field | Meaning |
| --- | --- |
| status | pending, configuring, active, paused, failed, or deleting. |
| rows_processed | Total rows embedded since creation. |
| rows_pending | Rows waiting to be embedded. |
| tokens_used | Cumulative tokens consumed from the model provider. |
| last_processed_at | Timestamp of the last successfully processed batch. |
| error_message | Most recent error, null when healthy. |
| next_run_at | Next scheduled run time (scheduled mode only). |

Pause and Resume

# Pause
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/pause

# Resume
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/resume

Resuming a continuous pipeline restarts the agent worker and processes any rows added while the pipeline was paused. Resuming a scheduled pipeline recomputes next_run_at from the current time.

Trigger a Manual Run

For scheduled and manual pipelines, trigger a run via the API:

curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs

At most one run can be queued or running per pipeline at a time. If a run is already active, the call returns 409 Conflict.
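
A script that triggers runs will usually want to tell "a run is already active" apart from other failures. The wrapper below is a minimal sketch with a hypothetical function name. Only the meaning of 409 is documented on this page; treating any 2xx code as success is an assumption.

```shell
# Hypothetical wrapper around POST .../runs that maps the HTTP status code
# to a short message. Assumes USER, PASS, SERVICE_ID and PIPELINE_ID are
# set as in the curl examples on this page.
trigger_run() {
    # -o /dev/null discards the body; -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' -u "$USER:$PASS" -X POST \
        "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs")
    case $code in
        2??) echo "run accepted"; return 0 ;;                       # assumed success range
        409) echo "a run is already queued or running"; return 1 ;; # documented conflict
        *)   echo "unexpected HTTP status: $code"; return 2 ;;
    esac
}
```

Returning a distinct exit status for the conflict case lets a caller decide whether to wait and retry or simply skip the run.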

Inspect Run History

curl -u $USER:$PASS \
https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs

The response includes the 50 most recent runs, newest first. Each run record shows:

| Field | Description |
| --- | --- |
| status | queued, running, succeeded, partial, failed, or canceled. |
| trigger | schedule, manual, or api. |
| rows_scanned | Source rows examined. |
| rows_embedded | Rows successfully embedded. |
| rows_failed | Rows that failed after all retries. |
| tokens_used | Tokens consumed in this run. |
| error_sample | Up to 20 per-row failures with source row ID and error message. |

A partial status means some rows succeeded and some failed. The error_sample array identifies which rows failed.

Update a Pipeline

curl -u $USER:$PASS \
  -X PATCH https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID \
  -H "Content-Type: application/json" \
  -d '{
    "batch_size": 50,
    "source_filter": "published = true"
  }'

The fields database_name, source_table, text_columns, model_provider, target_schema, and target_table cannot be changed after creation. Changing mode or schedule_cron re-bootstraps scheduling.
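
When PATCH bodies are assembled by a script, the immutable fields listed above can be screened out before the request is sent. This is a small sketch with a hypothetical function name; it encodes only the field list given on this page, and the API may reject other changes as well.

```shell
# Hypothetical guard: succeed if the named field cannot be changed after
# creation, according to the list on this page.
is_immutable_field() {
    case $1 in
        database_name|source_table|text_columns|model_provider|target_schema|target_table)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For instance, is_immutable_field target_table succeeds, while is_immutable_field batch_size returns a non-zero status, so batch_size is safe to include in an update.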

Delete a Pipeline

# Preserve the target table (default)
curl -u $USER:$PASS \
-X DELETE https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID

# Also drop the target embeddings table
curl -u $USER:$PASS \
-X DELETE "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID?remove_data=true"

Deletion is asynchronous. The pipeline transitions to deleting status while the agent cleans up triggers and the background worker. The target table is preserved by default.

What's Next

  • Vector Search: query the embeddings your pipeline produces, using pipeline_id for server-side text embedding at query time.
  • Inference Proxy: route embedding API calls through the FoundryDB proxy to avoid hard-coding provider keys.