Embedding Pipelines

An embedding pipeline monitors a source table on your PostgreSQL service, calls an embedding model for new or updated rows, and writes the resulting vectors into a target table. The target table is created and maintained by the agent; you do not need to manage pgvector schema setup by hand.

Embedding pipelines are PostgreSQL-only. The service must be in Running status.

Pipeline Modes

| Mode | How it works |
| --- | --- |
| `continuous` | The agent runs a background poller that checks for pending (new or updated) rows every `poll_interval_seconds` seconds (default 30) and processes them as they arrive. |
| `scheduled` | Runs on the 5-field cron expression set in `schedule_cron`. Pending rows are processed in a single batch job at each scheduled time. |
| `manual` | No automatic runs. Trigger each run explicitly through the API (see Trigger a Manual Run). |
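To make the mode rules concrete, here is a minimal Python sketch that builds the mode-related part of a request body and applies the documented rules on the client side. The function name `mode_settings` is hypothetical (it is not part of any FoundryDB SDK), and the cron check only counts fields; it does not parse cron syntax.

```python
from typing import Optional

VALID_MODES = ("continuous", "scheduled", "manual")


def mode_settings(mode: str,
                  poll_interval_seconds: Optional[int] = None,
                  schedule_cron: Optional[str] = None) -> dict:
    """Return the mode-related fields for a create or update request body.

    Client-side sketch only (hypothetical helper): it mirrors the documented
    rules and does not replace the server's own validation.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    body = {"mode": mode}
    if mode == "continuous":
        # poll_interval_seconds only applies to continuous mode; default 30.
        body["poll_interval_seconds"] = (
            30 if poll_interval_seconds is None else poll_interval_seconds)
    elif mode == "scheduled":
        # Standard 5-field cron: minute hour day-of-month month day-of-week.
        if schedule_cron is None or len(schedule_cron.split()) != 5:
            raise ValueError("scheduled mode needs a 5-field schedule_cron")
        body["schedule_cron"] = schedule_cron
    # manual mode needs no extra fields: runs are triggered via the API.
    return body
```

For example, `mode_settings("scheduled", schedule_cron="0 3 * * *")` yields the fields for a nightly run at 03:00.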

Endpoints

```text
GET    /managed-services/{id}/embedding-pipelines
POST   /managed-services/{id}/embedding-pipelines
GET    /managed-services/{id}/embedding-pipelines/{pid}
PATCH  /managed-services/{id}/embedding-pipelines/{pid}
DELETE /managed-services/{id}/embedding-pipelines/{pid}
POST   /managed-services/{id}/embedding-pipelines/{pid}/pause
POST   /managed-services/{id}/embedding-pipelines/{pid}/resume
POST   /managed-services/{id}/embedding-pipelines/{pid}/runs
GET    /managed-services/{id}/embedding-pipelines/{pid}/runs
GET    /managed-services/{id}/embedding-pipelines/{pid}/runs/{rid}
```
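All of these endpoints share one path layout. A hypothetical helper, `pipeline_url`, could assemble them as sketched below; it assumes `https://api.foundrydb.com` as the base URL (taken from the examples on this page) and percent-encodes each ID. The HTTP method (GET, POST, PATCH, DELETE) still comes from the list above.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.foundrydb.com"
ACTIONS = (None, "pause", "resume", "runs")


def pipeline_url(service_id: str,
                 pipeline_id: Optional[str] = None,
                 action: Optional[str] = None,
                 run_id: Optional[str] = None,
                 base: str = BASE_URL) -> str:
    """Build the URL for one of the embedding-pipeline endpoints.

    pipeline_id=None targets the collection (list/create); action is
    "pause", "resume" or "runs"; run_id selects a single run under "runs".
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if pipeline_id is None and (action is not None or run_id is not None):
        raise ValueError("action and run_id require a pipeline_id")
    if run_id is not None and action != "runs":
        raise ValueError("run_id is only valid with action='runs'")
    segments = [base.rstrip("/"), "managed-services",
                quote(service_id, safe=""), "embedding-pipelines"]
    # Append whichever optional path segments were supplied, in order.
    for segment in (pipeline_id, action, run_id):
        if segment is not None:
            segments.append(quote(segment, safe=""))
    return "/".join(segments)
```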

Create a Pipeline

```bash
curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines \
  -H "Content-Type: application/json" \
  -d '{
    "source_table": "articles",
    "text_columns": ["title", "body"],
    "model_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "model_dimensions": 1536,
    "provider_api_key": "sk-...",
    "database_name": "defaultdb",
    "target_table": "articles_embeddings",
    "batch_size": 100,
    "mode": "continuous",
    "poll_interval_seconds": 30
  }'
```
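The same request can be built with Python's standard library. This sketch only constructs the `urllib.request.Request`; it relies on the fact that `curl -u user:password` sends HTTP Basic authentication. The function name `create_pipeline_request` is illustrative, not an SDK function.

```python
import base64
import json
import urllib.request


def create_pipeline_request(service_id: str, user: str, password: str,
                            body: dict,
                            base: str = "https://api.foundrydb.com"
                            ) -> urllib.request.Request:
    """Build (but do not send) the POST request made by the curl example."""
    url = f"{base}/managed-services/{service_id}/embedding-pipelines"
    # curl -u user:pass sends "Authorization: Basic base64(user:pass)".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )

# Usage (sends the request; needs real credentials):
#   req = create_pipeline_request(service_id, user, password, body)
#   with urllib.request.urlopen(req) as resp:
#       pipeline = json.load(resp)
```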

Required Fields

| Field | Description |
| --- | --- |
| `source_table` | Table to monitor for new or updated rows. |
| `text_columns` | One or more columns whose values are concatenated and sent to the model. |
| `model_provider` | `openai`, `cohere`, or `custom`. |
| `embedding_model` | Model identifier, e.g. `text-embedding-3-small`. |
| `model_dimensions` | Output vector dimensionality, e.g. `1536`. |
| `provider_api_key` | API key for the model provider. Stored encrypted. |
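Checking the required fields before calling the API gives quicker feedback than a failed request. A minimal sketch follows; the checks mirror the table above and are an assumption about what the server validates, not its actual logic.

```python
from typing import List

REQUIRED_FIELDS = ("source_table", "text_columns", "model_provider",
                   "embedding_model", "model_dimensions", "provider_api_key")
PROVIDERS = {"openai", "cohere", "custom"}


def validate_required(body: dict) -> List[str]:
    """Return human-readable problems with the required fields (empty if OK)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in body]
    cols = body.get("text_columns")
    if cols is not None and (not isinstance(cols, list) or not cols):
        problems.append("text_columns must be a non-empty list")
    provider = body.get("model_provider")
    if provider is not None and provider not in PROVIDERS:
        problems.append(f"unsupported model_provider {provider!r}")
    dims = body.get("model_dimensions")
    if dims is not None and (not isinstance(dims, int) or dims <= 0):
        problems.append("model_dimensions must be a positive integer")
    return problems
```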

Optional Fields

| Field | Default | Description |
| --- | --- | --- |
| `database_name` | `defaultdb` | Database containing the source table. |
| `target_table` | `{source_table}_embeddings` | Table the embeddings are written to. Created by the agent if it does not exist. |
| `batch_size` | `100` | Rows per embedding API call. |
| `poll_interval_seconds` | `30` | Seconds between polls (continuous mode only). |
| `mode` | `continuous` | `continuous`, `scheduled`, or `manual`. |
| `schedule_cron` | none | Required when `mode` is `scheduled`. Standard 5-field cron expression, e.g. `"0 3 * * *"`. |
| `source_filter` | none | Restricted SQL `WHERE` fragment, appended to the pending-row scan as `AND (<filter>)`. Max 500 characters. Semicolons, SQL comments, and DML keywords are rejected. Example: `"published = true"`. |
| `max_row_retries` | `3` | Per-row retry attempts when a batch fails. Range: 0 to 10. |
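The sketch below resolves the documented defaults for a create body and runs a rough client-side pre-check of `source_filter`. The DML keyword list is an assumption: the rules above say DML keywords are rejected but do not enumerate them, so the server may reject more (or different) words. Treat this as a convenience check, not a guarantee that the server will accept the filter.

```python
import re

# Assumed keyword list (hypothetical): the docs do not enumerate the
# DML keywords the server rejects.
DML_KEYWORDS = frozenset({"insert", "update", "delete", "merge"})


def apply_defaults(body: dict) -> dict:
    """Return a copy of a create body with the documented defaults filled in."""
    resolved = {
        "database_name": "defaultdb",
        "target_table": f"{body['source_table']}_embeddings",
        "batch_size": 100,
        "mode": "continuous",
        "max_row_retries": 3,
    }
    if body.get("mode", "continuous") == "continuous":
        resolved["poll_interval_seconds"] = 30  # continuous mode only
    resolved.update(body)  # explicit values win over defaults
    if not 0 <= resolved["max_row_retries"] <= 10:
        raise ValueError("max_row_retries must be between 0 and 10")
    return resolved


def precheck_source_filter(fragment: str) -> None:
    """Raise ValueError if a source_filter would obviously be rejected."""
    if len(fragment) > 500:
        raise ValueError("source_filter is longer than 500 characters")
    if ";" in fragment:
        raise ValueError("semicolons are not allowed")
    # Conservative: also flags "--" or "/*" inside string literals.
    if "--" in fragment or "/*" in fragment:
        raise ValueError("SQL comments are not allowed")
    # Compare whole identifiers so a column such as updated_at is not flagged.
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", fragment):
        if word.lower() in DML_KEYWORDS:
            raise ValueError(f"keyword {word!r} is not allowed")
```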

Response

The pipeline is created in `configuring` status while the agent sets up the target table schema and any required triggers. It transitions to `active` once configuration is complete.

```json
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "service_id": "...",
  "source_table": "articles",
  "text_columns": ["title", "body"],
  "model_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "model_dimensions": 1536,
  "target_table": "articles_embeddings",
  "mode": "continuous",
  "poll_interval_seconds": 30,
  "batch_size": 100,
  "status": "configuring",
  "rows_processed": 0,
  "rows_pending": 0,
  "tokens_used": 0,
  "created_at": "2026-06-22T10:00:00Z",
  "updated_at": "2026-06-22T10:00:00Z"
}
```
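Because a new pipeline starts in `configuring`, a script usually waits for `active` before relying on it. This sketch polls through an injected `fetch` callable (for example, a wrapper around the GET request shown in Monitor a Pipeline); the 300-second timeout and 5-second interval are arbitrary example values, not product limits. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_active(fetch: Callable[[], dict],
                      timeout: float = 300.0,
                      interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll a pipeline until its status is "active".

    fetch() must return the pipeline JSON as a dict. Raises RuntimeError
    if the pipeline reports "failed", TimeoutError if the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        pipeline = fetch()
        status = pipeline.get("status")
        if status == "active":
            return pipeline
        if status == "failed":
            raise RuntimeError(f"pipeline failed: {pipeline.get('error_message')}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        sleep(interval)
```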

Monitor a Pipeline

```bash
curl -u $USER:$PASS \
  https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID
```

Key fields to watch:

| Field | Meaning |
| --- | --- |
| `status` | `pending`, `configuring`, `active`, `paused`, `failed`, or `deleting`. |
| `rows_processed` | Total rows embedded since creation. |
| `rows_pending` | Rows waiting to be embedded. |
| `tokens_used` | Cumulative tokens consumed from the model provider. |
| `last_processed_at` | Timestamp of the last successfully processed batch. |
| `error_message` | Most recent error; `null` when healthy. |
| `next_run_at` | Next scheduled run time (scheduled mode only). |
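One way to act on these fields is to turn a GET response into a list of warnings for an alerting script. In this sketch the `max_pending` threshold is an arbitrary example, not a product limit, and `health_warnings` is a hypothetical helper.

```python
from typing import List


def health_warnings(pipeline: dict, max_pending: int = 10_000) -> List[str]:
    """Turn the monitoring fields into warnings (an empty list means healthy)."""
    warnings = []
    status = pipeline.get("status")
    if status == "failed":
        warnings.append("pipeline status is failed")
    elif status == "paused":
        warnings.append("pipeline is paused; rows are not being embedded")
    # error_message is null (None) when the pipeline is healthy.
    if pipeline.get("error_message"):
        warnings.append(f"last error: {pipeline['error_message']}")
    pending = pipeline.get("rows_pending", 0)
    if pending > max_pending:
        warnings.append(f"{pending} rows pending (threshold {max_pending})")
    return warnings
```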

Pause and Resume

```bash
# Pause
curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/pause

# Resume
curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/resume
```

Resuming a continuous pipeline restarts the agent worker and processes any rows added while the pipeline was paused. Resuming a scheduled pipeline recomputes `next_run_at` from the current time.
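When a script pauses a pipeline around some other work, it should resume the pipeline even if that work fails. A context manager expresses this. In the sketch below, `post` is a hypothetical callable that sends an authenticated POST to a URL (for example, a small urllib wrapper); it is not a FoundryDB API.

```python
import contextlib
from typing import Callable, Iterator


@contextlib.contextmanager
def paused(post: Callable[[str], None], pipeline_url: str) -> Iterator[None]:
    """Pause a pipeline for the duration of a with-block, then resume it."""
    post(f"{pipeline_url}/pause")
    try:
        yield
    finally:
        # Always resume, even if the block raised; a continuous pipeline
        # then processes the rows added while it was paused.
        post(f"{pipeline_url}/resume")
```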

Trigger a Manual Run

For scheduled and manual pipelines, trigger a run via the API:

```bash
curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs
```

At most one run can be queued or running per pipeline at a time. If a run is already active, the call returns 409 Conflict.
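A script that triggers runs can treat `409 Conflict` as "a run is already in progress" rather than as a failure. This sketch assumes the caller passes a prepared POST `Request` for the runs endpoint (with auth headers) and that a successful response body is the JSON run record; the page does not show that response, so the return shape is an assumption. The `opener` parameter is injectable for testing.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def trigger_run(request: urllib.request.Request,
                opener: Callable = urllib.request.urlopen) -> Optional[dict]:
    """POST to .../runs; return the new run, or None if one is already active."""
    try:
        with opener(request) as resp:
            return json.load(resp)  # assumed: body is the run record as JSON
    except urllib.error.HTTPError as err:
        if err.code == 409:
            # A run is already queued or running for this pipeline.
            return None
        raise
```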

Inspect Run History

```bash
curl -u $USER:$PASS \
  https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID/runs
```

The response includes the 50 most recent runs, newest first. Each run record shows:

| Field | Description |
| --- | --- |
| `status` | `queued`, `running`, `succeeded`, `partial`, `failed`, or `canceled`. |
| `trigger` | `schedule`, `manual`, or `api`. |
| `rows_scanned` | Source rows examined. |
| `rows_embedded` | Rows successfully embedded. |
| `rows_failed` | Rows that failed after all retries. |
| `tokens_used` | Tokens consumed in this run. |
| `error_sample` | Up to 20 per-row failures, each with the source row ID and error message. |

A `partial` status means some rows succeeded and some failed. The `error_sample` array identifies the failed rows (up to 20 per run).
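To get an overview across the returned runs, a script can aggregate the per-run counters and list the runs that need attention. This sketch assumes the endpoint returns a JSON array of run objects and that each run carries an `id` field; neither detail is specified above, so both are assumptions.

```python
from typing import Dict, List


def summarize_runs(runs: List[dict]) -> Dict[str, object]:
    """Aggregate run records (newest first) into totals plus problem runs."""
    totals = {"rows_embedded": 0, "rows_failed": 0, "tokens_used": 0}
    needs_attention = []
    for run in runs:
        for key in totals:
            totals[key] += run.get(key) or 0
        if run.get("status") in ("partial", "failed"):
            needs_attention.append({
                "id": run.get("id"),  # assumed field name for the run ID
                "status": run["status"],
                "rows_failed": run.get("rows_failed", 0),
                # error_sample holds at most 20 entries, so it can be
                # shorter than rows_failed.
                "sampled_errors": len(run.get("error_sample") or []),
            })
    return {**totals, "needs_attention": needs_attention}
```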

Update a Pipeline

```bash
curl -u $USER:$PASS \
  -X PATCH https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID \
  -H "Content-Type: application/json" \
  -d '{
    "batch_size": 50,
    "source_filter": "published = true"
  }'
```

The fields `database_name`, `source_table`, `text_columns`, `model_provider`, `target_schema`, and `target_table` cannot be changed after creation. Changing `mode` or `schedule_cron` re-bootstraps scheduling.
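A client can refuse immutable fields before sending the PATCH and flag updates that will re-bootstrap scheduling. A minimal sketch; `patch_body` is a hypothetical helper, and the field lists are copied from the paragraph above.

```python
import json
from typing import Tuple

# Fields that cannot change after creation, per the docs.
IMMUTABLE_FIELDS = frozenset({
    "database_name", "source_table", "text_columns",
    "model_provider", "target_schema", "target_table",
})
# Changing either of these re-bootstraps the pipeline's scheduling.
SCHEDULING_FIELDS = frozenset({"mode", "schedule_cron"})


def patch_body(changes: dict) -> Tuple[bytes, bool]:
    """Serialize a PATCH body.

    Returns (json_bytes, reschedules); reschedules is True when the update
    touches mode or schedule_cron.
    """
    blocked = sorted(IMMUTABLE_FIELDS.intersection(changes))
    if blocked:
        raise ValueError("cannot be changed after creation: " + ", ".join(blocked))
    reschedules = bool(SCHEDULING_FIELDS.intersection(changes))
    return json.dumps(changes).encode("utf-8"), reschedules
```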

Delete a Pipeline

```bash
# Preserve the target table (default)
curl -u $USER:$PASS \
  -X DELETE https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID

# Also drop the target embeddings table
curl -u $USER:$PASS \
  -X DELETE "https://api.foundrydb.com/managed-services/$SERVICE_ID/embedding-pipelines/$PIPELINE_ID?remove_data=true"
```

Deletion is asynchronous. The pipeline transitions to `deleting` status while the agent removes its triggers and stops the background worker. The target table is preserved unless you pass `remove_data=true`.
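The two delete variants differ only in the `remove_data` query parameter. This sketch builds the request with the standard library; `delete_request` is an illustrative name, and the caller still needs to add the Basic `Authorization` header (as in the create sketch) before sending it.

```python
import urllib.parse
import urllib.request


def delete_request(service_id: str, pipeline_id: str, remove_data: bool = False,
                   base: str = "https://api.foundrydb.com"
                   ) -> urllib.request.Request:
    """Build the DELETE request; the target table is kept unless remove_data."""
    url = f"{base}/managed-services/{service_id}/embedding-pipelines/{pipeline_id}"
    if remove_data:
        # Also drop the target embeddings table.
        url += "?" + urllib.parse.urlencode({"remove_data": "true"})
    return urllib.request.Request(url, method="DELETE")
```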

What's Next

Vector Search: query the embeddings your pipeline produces, using `pipeline_id` for server-side text embedding at query time.
Inference Proxy: route embedding API calls through the FoundryDB proxy to avoid hard-coding provider keys.
