
Automatic Embedding Generation: Build RAG Without the Plumbing

FoundryDB Team · Engineering @ FoundryDB · 8 min read

Every RAG system needs the same boring middle layer: watch a table for changes, call an embedding API, write vectors back, handle retries, manage batches, build indexes, schedule cron jobs, and pray nothing drifts out of sync at 3 AM. FoundryDB's managed embedding pipelines eliminate that entire layer. You configure a pipeline, and your PostgreSQL data gets auto-vectorized with an HNSW index, ready for similarity search.

No ETL scripts. No cron jobs. No model orchestration code.

The Problem with DIY Embedding Pipelines

If you have built a RAG application, you have written some version of this pipeline:

  1. A script queries your source table for new or changed rows.
  2. It concatenates text columns, calls OpenAI or Cohere, and receives vectors.
  3. It writes those vectors to a separate table (or the same table, if you enjoy schema headaches).
  4. A cron job or queue worker runs this every few minutes.
  5. You add retry logic for API rate limits and transient failures.
  6. You create and maintain a pgvector HNSW index.
  7. You monitor for drift: rows that were inserted but never embedded, embeddings from deleted rows, schema changes that silently break the pipeline.

This works. It also means you are maintaining a custom data pipeline for every table you want to embed. Each one needs monitoring, error handling, and operational attention. For a team that just wants to ship a search feature or a chatbot, this is overhead that adds no user value.
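Step 5 alone is a surprising amount of code. Here is a minimal sketch of the retry-with-backoff boilerplate a DIY pipeline ends up carrying (the TransientError class and delay values are illustrative, not from any particular provider SDK):

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a provider rate-limit or timeout error."""

def call_with_retries(fn, max_retries=5, base_delay=0.01):
    """Retry a provider call with exponential backoff plus jitter --
    the kind of plumbing every DIY embedding pipeline re-implements."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # exhausted retries; surface the failure
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
```

Multiply this by dead-letter handling, drift checks, and index maintenance, and the "boring middle layer" adds up fast.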

How Managed Embedding Pipelines Work

FoundryDB handles the entire flow on the database VM itself. When you create a pipeline via the API, the system:

  1. Verifies that pgvector is installed on your PostgreSQL service.
  2. Discovers the primary key of your source table.
  3. Creates a companion table (e.g., articles_embeddings) with a vector column sized to your model's dimensions.
  4. Builds an HNSW index on that column using cosine distance (vector_cosine_ops) with m = 16 and ef_construction = 64.
  5. Starts a background worker on the primary node that polls the source table at a configurable interval.
  6. On each poll, the worker detects new or changed rows using text hashes, batches them, calls the embedding provider, and writes the results to the companion table.

The pipeline transitions through well-defined states: pending, configuring, active, paused, failed, and deleting. You can pause a pipeline without losing state, resume it to process rows that accumulated while paused, or delete it and optionally drop the companion table.
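The hash-based change detection in step 6 can be sketched in a few lines of Python. This is an illustrative model, not FoundryDB's actual implementation; the row shape and helper names are assumptions:

```python
import hashlib

def text_hash(row: dict, text_columns: list) -> str:
    """Hash the concatenated text columns; a changed hash marks the
    row as needing re-embedding."""
    joined = "\x1f".join(str(row[c]) for c in text_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def rows_to_embed(source_rows, stored_hashes, text_columns):
    """Return rows whose text is new or changed since the last poll.
    stored_hashes maps primary key -> hash recorded at last embedding."""
    return [r for r in source_rows
            if stored_hashes.get(r["id"]) != text_hash(r, text_columns)]
```

Because only the hash is compared, unchanged rows cost nothing per poll, and an UPDATE that touches a non-text column never triggers a re-embedding.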

Creating a Pipeline

Prerequisites: a PostgreSQL service in Running state with the pgvector extension installed. The easiest way is to include "extensions": ["vector"] at service creation time, or use the agent-postgresql-rag preset which includes pgvector automatically.

Once your service is running, create a pipeline with a single API call:

curl -u user:pass -X POST \
  https://api.foundrydb.com/managed-services/{service-id}/embedding-pipelines \
  -H "Content-Type: application/json" \
  -d '{
    "source_table": "articles",
    "text_columns": ["title", "body"],
    "model_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "model_dimensions": 1536,
    "provider_api_key": "sk-...",
    "database_name": "defaultdb",
    "batch_size": 100,
    "poll_interval_seconds": 30
  }'

That is the entire setup. FoundryDB creates the companion table articles_embeddings, builds the HNSW index, and starts processing rows. The provider_api_key is encrypted at rest in the controller database and only transmitted to the agent when it needs to call the embedding API.

The response includes the pipeline ID and status. Within seconds, the status transitions from pending to configuring to active.

Supported Model Providers

FoundryDB supports three provider types:

Provider   Example Models                                   Default Dimensions
OpenAI     text-embedding-3-small, text-embedding-3-large   1536 / 3072
Cohere     embed-english-v3.0                               1024
Custom     Any OpenAI-compatible endpoint                   Configurable (1 to 8192)

The custom provider accepts a provider_base_url field pointing to any endpoint that implements the OpenAI /v1/embeddings format. This works with self-hosted models, Azure OpenAI, or any compatible proxy.

For models that support variable dimensions (such as text-embedding-3-small), you can override the default by setting model_dimensions to a lower value such as 512. Smaller vectors reduce storage and speed up queries, with a modest accuracy tradeoff.
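To see what dimension reduction amounts to, here is a rough sketch: cut the vector short and re-normalize it to unit length. With FoundryDB you never do this yourself; setting model_dimensions has the provider return shorter vectors directly:

```python
import math

def shorten(embedding, dims):
    """Truncate an embedding to `dims` dimensions and re-normalize to
    unit length -- roughly what requesting a smaller model_dimensions
    achieves provider-side (a sketch, not the provider's exact math)."""
    cut = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in cut)) or 1.0
    return [x / norm for x in cut]
```

Re-normalizing matters because cosine distance assumes unit-length vectors; a truncated but un-normalized vector would distort similarity scores.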

Querying Your Embeddings

Once the pipeline is active and rows have been processed, query the companion table using standard pgvector operators. The HNSW index makes approximate nearest-neighbor search fast without any additional configuration.

-- Find the 5 most similar articles to a query embedding
SELECT a.id, a.title, e.embedding <=> '[0.1, 0.2, ...]' AS distance
FROM articles a
JOIN articles_embeddings e ON e.source_row_id::int = a.id
ORDER BY e.embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;

The <=> operator computes cosine distance, which matches the HNSW index's vector_cosine_ops configuration. You can also use <-> for L2 (Euclidean) distance or <#> for inner product.
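If you want to sanity-check query results client-side, the math behind <=> is easy to reproduce; a plain-Python sketch (in production the HNSW index computes this for you):

```python
import math

def cosine_distance(a, b):
    """What pgvector's <=> operator computes: 1 - cosine similarity.
    Identical vectors -> 0, orthogonal -> 1, opposite -> 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)
```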

Combine vector search with standard SQL filters for hybrid queries:

SELECT a.id, a.title, e.embedding <=> $1 AS distance
FROM articles a
JOIN articles_embeddings e ON e.source_row_id::int = a.id
WHERE a.published = true
AND a.category = 'technology'
ORDER BY e.embedding <=> $1
LIMIT 10;

This is a key advantage of keeping embeddings in PostgreSQL rather than a separate vector database: your vectors and your relational data live in the same database, queryable in a single SQL statement.

Monitoring Pipeline Status

Check how your pipeline is performing at any time:

curl -u user:pass \
  https://api.foundrydb.com/managed-services/{service-id}/embedding-pipelines/{pipeline-id} \
  | jq '{status, rows_processed, rows_pending, tokens_used, last_processed_at, error_message}'

The response includes:

  • rows_processed: Total rows embedded since the pipeline was created.
  • rows_pending: Rows waiting to be embedded (new inserts or updated text).
  • tokens_used: Total tokens consumed by the model provider, useful for cost tracking.
  • last_processed_at: Timestamp of the last successfully processed batch.
  • error_message: Details when the pipeline enters failed state (e.g., invalid API key, provider rate limits, source table schema changes).
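The tokens_used counter makes cost tracking a one-liner. For example (the per-million-token rate is whatever your provider currently charges, not a FoundryDB value):

```python
def embedding_cost_usd(tokens_used, usd_per_million_tokens):
    """Estimate provider spend from the pipeline's tokens_used counter.
    Pass in your provider's current embedding rate per million tokens."""
    return tokens_used * usd_per_million_tokens / 1_000_000
```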

Pausing and Resuming

Need to rotate your API key, or want to stop embedding during a bulk data migration? Pause the pipeline:

curl -u user:pass -X POST \
https://api.foundrydb.com/managed-services/{service-id}/embedding-pipelines/{pipeline-id}/pause

The worker completes any in-flight batch and stops. The companion table and its HNSW index are preserved. When you resume, the worker picks up where it left off and processes any rows that were inserted or updated while paused:

curl -u user:pass -X POST \
https://api.foundrydb.com/managed-services/{service-id}/embedding-pipelines/{pipeline-id}/resume

Tuning for Your Workload

Two configuration knobs let you balance throughput, latency, and cost:

  • batch_size (1 to 1000, default 100): Larger batches reduce API call overhead but increase per-batch latency. Use smaller batches (10 to 50) for near-real-time applications, larger batches (200 to 1000) for bulk ingestion.
  • poll_interval_seconds (5 to 3600, default 30): How frequently the worker checks for new or changed rows. A 5-second interval gives near-real-time embedding at the cost of higher CPU usage on the agent. A 300-second interval works well when freshness is less critical.

Both can be updated on a running pipeline via the PATCH endpoint without pausing.
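A quick way to reason about these two knobs together is to estimate how long a backlog takes to drain under a given configuration. This back-of-envelope model assumes the worker embeds one batch per polling cycle, which is a deliberately conservative assumption about the polling behavior:

```python
import math

def drain_time_seconds(rows_pending, batch_size, poll_interval_seconds):
    """Worst-case time to clear a backlog, assuming one batch per poll.
    Useful for checking whether a configuration keeps up with insert rate."""
    polls = math.ceil(rows_pending / batch_size)
    return polls * poll_interval_seconds
```

For example, 10,000 pending rows at the defaults (batch_size 100, 30-second polls) drain in at most 50 minutes; raising batch_size to 1000 for a bulk backfill cuts that to 5.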

Limitations Worth Knowing

  • PostgreSQL only. Embedding pipelines require pgvector, which is a PostgreSQL extension.
  • Primary node only. The embedding worker runs on the primary node. Read replicas do not run their own workers.
  • One pipeline per source table. Each source table supports a single active pipeline. If you need different models for different use cases, use separate tables.
  • Source table must have a primary key. The worker uses the primary key for row tracking and change detection.

What This Replaces

A typical DIY embedding pipeline involves a queue worker (or cron job), an embedding service wrapper, a database migration for the vector column and index, monitoring for the worker process, and retry/dead-letter logic for API failures. That is 200 to 500 lines of infrastructure code per table, plus ongoing operational burden.

With FoundryDB's managed pipelines, you replace all of that with a single POST request and a few configuration fields. The worker, the companion table, the HNSW index, the change detection, the batching, and the error handling are all managed for you.

Get Started

  1. Create a PostgreSQL service with pgvector enabled.
  2. Create an embedding pipeline via the API.
  3. Insert data into your source table and watch the companion table populate automatically.
  4. Query with pgvector operators and build your RAG application.

If you are already running PostgreSQL on FoundryDB, you can add a pipeline to an existing service today. If you are new to the platform, the agent-postgresql-rag preset gives you a pgvector-ready database in under five minutes.

Try FoundryDB | Read the docs | Embedding Pipelines API reference