Embedding Pipelines
An embedding pipeline monitors a source table on your PostgreSQL service, calls an embedding model for new or updated rows, and writes the resulting vectors into a target table. The target table is created and maintained by the agent; you do not need to manage pgvector schema setup by hand.
Embedding pipelines are PostgreSQL-only. The service must be in Running status.
Pipeline Modes
| Mode | How it works |
|---|---|
| continuous | The agent runs a background poller that checks for new rows every poll_interval_seconds seconds (default 30). New rows are processed as they arrive. |
| scheduled | Runs on a 5-field cron expression. Pending rows are processed in a single batch job at each scheduled time. |
| manual | No automatic runs. Trigger each run explicitly via the API. |
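
Because scheduled mode expects a 5-field cron expression, a quick client-side check can catch a 6-field (seconds-first) or truncated expression before the request is sent. The following is a minimal sketch; `is_five_field_cron` is a hypothetical helper, not part of the API, and it only counts fields. It does not validate the value of any field.

```shell
# Return 0 when the argument has exactly five whitespace-separated fields.
# Only the field count is checked; the values themselves are not validated.
is_five_field_cron() {
  [ "$(printf '%s\n' "$1" | awk '{ print NF }')" = "5" ]
}

# Example usage (hypothetical): guard a scheduled-mode request.
# if is_five_field_cron "0 3 * * *"; then echo "looks like 5-field cron"; fi
```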
Endpoints
GET /managed-services/{id}/embedding-pipelines
POST /managed-services/{id}/embedding-pipelines
GET /managed-services/{id}/embedding-pipelines/{pid}
PATCH /managed-services/{id}/embedding-pipelines/{pid}
DELETE /managed-services/{id}/embedding-pipelines/{pid}
POST /managed-services/{id}/embedding-pipelines/{pid}/pause
POST /managed-services/{id}/embedding-pipelines/{pid}/resume
POST /managed-services/{id}/embedding-pipelines/{pid}/runs
GET /managed-services/{id}/embedding-pipelines/{pid}/runs
GET /managed-services/{id}/embedding-pipelines/{pid}/runs/{rid}
Create a Pipeline
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines \
-H "Content-Type: application/json" \
-d '{
  "source_table": "articles",
  "text_columns": ["title", "body"],
  "model_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "model_dimensions": 1536,
  "provider_api_key": "sk-...",
  "database_name": "defaultdb",
  "target_table": "articles_embeddings",
  "batch_size": 100,
  "mode": "continuous",
  "poll_interval_seconds": 30
}'
Required Fields
| Field | Description |
|---|---|
| source_table | Table to monitor for new or updated rows. |
| text_columns | One or more columns concatenated and sent to the model. |
| model_provider | openai, cohere, or custom. |
| embedding_model | Model identifier, e.g. text-embedding-3-small. |
| model_dimensions | Output vector dimensionality, e.g. 1536. |
| provider_api_key | Provider API key stored encrypted. |
Optional Fields
| Field | Default | Description |
|---|---|---|
| database_name | defaultdb | Database containing the source table. |
| target_table | {source_table}_embeddings | Where embeddings are written. Created by the agent if it does not exist. |
| batch_size | 100 | Rows per embedding API call. |
| poll_interval_seconds | 30 | Seconds between polls (continuous mode only). |
| mode | continuous | continuous, scheduled, or manual. |
| schedule_cron | (none) | Required when mode is scheduled. Standard 5-field cron, e.g. "0 3 * * *". |
| source_filter | (none) | Restricted SQL WHERE fragment applied to the pending-row scan: AND (<filter>). Max 500 characters. Semicolons, SQL comments, and DML keywords are rejected. Example: "published = true". |
| max_row_retries | 3 | Per-row retry attempts when a batch fails. Range 0-10. |
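
The source_filter restrictions above can be approximated locally before sending a request. This is a rough sketch, not the server's actual validator: `check_source_filter` is a hypothetical helper, and the keyword list (INSERT, UPDATE, DELETE) is an assumption, since the exact set of rejected DML keywords is not listed here. The API remains the authority on what is accepted.

```shell
# Pre-check a source_filter string against the documented restrictions.
# Prints "ok" and returns 0, or prints a reason and returns 1.
check_source_filter() {
  filter=$1
  # Documented limit: at most 500 characters.
  if [ "${#filter}" -gt 500 ]; then
    echo "too long"
    return 1
  fi
  case "$filter" in
    *";"*) echo "contains semicolon"; return 1 ;;
    *"--"*|*"/*"*|*"*/"*) echo "contains SQL comment"; return 1 ;;
  esac
  # Assumed keyword list; matched as whole words, case-insensitively,
  # so a column such as updated_at is not flagged.
  if printf '%s\n' "$filter" |
     grep -Eiq '(^|[^[:alnum:]_])(insert|update|delete)([^[:alnum:]_]|$)'; then
    echo "contains DML keyword"
    return 1
  fi
  echo "ok"
}
```

For example, `check_source_filter "published = true"` passes, while a filter containing a semicolon or a `--` comment is rejected before any API call is made.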
Response
The pipeline is created in configuring status while the agent sets up the target table schema and any required triggers. It transitions to active once configuration is complete.
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "service_id": "...",
  "source_table": "articles",
  "text_columns": ["title", "body"],
  "model_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "model_dimensions": 1536,
  "target_table": "articles_embeddings",
  "mode": "continuous",
  "poll_interval_seconds": 30,
  "batch_size": 100,
  "status": "configuring",
  "rows_processed": 0,
  "rows_pending": 0,
  "tokens_used": 0,
  "created_at": "2026-06-22T10:00:00Z",
  "updated_at": "2026-06-22T10:00:00Z"
}
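
Since a new pipeline starts in configuring and only later becomes active, a script that depends on the pipeline may want to wait for that transition. The sketch below polls the pipeline's GET endpoint until status is active or failed. The function names, the attempt count, and the delay are illustrative assumptions, and the sed-based extraction is a simplification that assumes a single "status" key in the response; a real JSON parser would be more robust.

```shell
# Extract the "status" value from a pipeline JSON document on stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Fetch the pipeline record. Requires network access and credentials.
fetch_pipeline() {
  curl -sS -u "$USER:$PASS" \
    "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID"
}

# Poll until the pipeline is active (return 0), failed (return 1),
# or the attempts run out (return 2).
# $1 = max attempts (assumed default 30), $2 = seconds between attempts (assumed default 2).
wait_for_active() {
  attempts=${1:-30}
  delay=${2:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(fetch_pipeline | extract_status)
    case "$status" in
      active) echo "active"; return 0 ;;
      failed) echo "pipeline failed" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out (last status: $status)" >&2
  return 2
}
```

Calling `wait_for_active 60 5` would poll for up to roughly five minutes before giving up.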
Monitor a Pipeline
curl -u $USER:$PASS \
https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID
Key fields to watch:
| Field | Meaning |
|---|---|
| status | pending, configuring, active, paused, failed, or deleting. |
| rows_processed | Total rows embedded since creation. |
| rows_pending | Rows waiting to be embedded. |
| tokens_used | Cumulative tokens consumed from the model provider. |
| last_processed_at | Timestamp of the last successfully processed batch. |
| error_message | Most recent error, null when healthy. |
| next_run_at | Next scheduled run time (scheduled mode only). |
Pause and Resume
# Pause
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/pause
# Resume
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/resume
Resuming a continuous pipeline restarts the agent worker and processes any rows added while the pipeline was paused. Resuming a scheduled pipeline recomputes next_run_at from the current time.
Trigger a Manual Run
For scheduled and manual pipelines, trigger a run via the API:
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs
At most one run can be queued or running per pipeline at a time. If a run is already active, the call returns 409 Conflict.
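
A script that triggers runs should distinguish the 409 Conflict case from other failures, because a conflict only means a run is already in progress. The following is a minimal sketch: the function names and outcome labels are hypothetical, and treating any 2xx code as success is an assumption, since the exact success code is not stated here.

```shell
# Map the HTTP status code of the trigger call to a short outcome label.
describe_trigger_result() {
  case "$1" in
    2??) echo "accepted" ;;            # assumed: any 2xx means the run was queued
    409) echo "run-already-active" ;;  # documented: a run is already queued or running
    *)   echo "error-$1" ;;
  esac
}

# Trigger a run and print the outcome label. Requires network access.
trigger_run() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' -u "$USER:$PASS" \
    -X POST "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs")
  describe_trigger_result "$code"
}
```

A caller could treat run-already-active as a benign outcome and retry later, while surfacing any error-NNN label to the operator.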
Inspect Run History
curl -u $USER:$PASS \
https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs
The response includes the 50 most recent runs, newest first. Each run record shows:
| Field | Description |
|---|---|
| status | queued, running, succeeded, partial, failed, or canceled. |
| trigger | schedule, manual, or api. |
| rows_scanned | Source rows examined. |
| rows_embedded | Rows successfully embedded. |
| rows_failed | Rows that failed after all retries. |
| tokens_used | Tokens consumed in this run. |
| error_sample | Up to 20 per-row failures with source row ID and error message. |
A partial status means some rows succeeded and some failed. The error_sample array identifies which rows failed.
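
To spot recurring partial or failed runs, it can help to tally run statuses from the history response. The sketch below counts textual occurrences of a given status value in JSON read from stdin. `count_runs_with_status` is a hypothetical helper, and it assumes that "status" keys appear only on run records (not inside error_sample entries); the exact response shape is not specified here, so a JSON parser would be the safer choice in production scripts.

```shell
# Count occurrences of "status": "<value>" in a run-history JSON document on stdin.
# $1 = status value to count, e.g. partial or failed.
count_runs_with_status() {
  grep -o "\"status\"[[:space:]]*:[[:space:]]*\"$1\"" | wc -l | tr -d '[:space:]'
}

# Example usage (requires network access):
# curl -sS -u "$USER:$PASS" \
#   "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs" \
#   | count_runs_with_status partial
```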
Update a Pipeline
curl -u $USER:$PASS \
-X PATCH https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID \
-H "Content-Type: application/json" \
-d '{
  "batch_size": 50,
  "source_filter": "published = true"
}'
The fields database_name, source_table, text_columns, model_provider, target_schema, and target_table cannot be changed after creation. Changing mode or schedule_cron re-bootstraps scheduling.
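
Because several fields are immutable after creation, a PATCH body can be screened locally before it is sent. This is a sketch only: `find_immutable_fields` is a hypothetical helper, and it uses a textual match on key names, so a string value that happens to contain one of these keys followed by a colon would be flagged as well.

```shell
# Fields that cannot be changed after creation, as listed above.
IMMUTABLE_FIELDS="database_name source_table text_columns model_provider target_schema target_table"

# Print each immutable field that appears as a key in the JSON body ($1).
# Prints nothing when the body is safe to send.
find_immutable_fields() {
  for field in $IMMUTABLE_FIELDS; do
    if printf '%s\n' "$1" | grep -q "\"$field\"[[:space:]]*:"; then
      echo "$field"
    fi
  done
}
```

An empty result for a body such as `{"batch_size": 50}` means none of the immutable fields are present; any printed name should be removed from the request.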
Delete a Pipeline
# Preserve the target table (default)
curl -u $USER:$PASS \
-X DELETE https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID
# Also drop the target embeddings table
curl -u $USER:$PASS \
-X DELETE "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID?remove_data=true"
Deletion is asynchronous. The pipeline transitions to deleting status while the agent cleans up triggers and the background worker. The target table is preserved by default.
What's Next
- Vector Search: query the embeddings your pipeline produces, using pipeline_id for server-side text embedding at query time.
- Inference Proxy: route embedding API calls through the FoundryDB proxy to avoid hard-coding provider keys.