Skip to main content

Launch a RAG Chatbot

The rag-chatbot stack gives you a complete, private retrieval-augmented chat application in one launch. You get a polished chat interface (Open WebUI) backed by your own vector database, an object-storage bucket for the documents you want it to reason over, and an inference key minted against your organization's own model provider and pinned to EU routing. Everything is provisioned, wired, and metered before it reaches the Running state.

This is the right starting point if you want a real assistant over your own data, running on infrastructure you own, without assembling a database, a bucket, an inference endpoint, and an app by hand and then connecting them yourself.

What it composes

The stack composes four FoundryDB primitives in the correct dependency order. It is the only first-party stack that includes both a Files bucket and an inference resource.

ResourceKindWhat it isWhy it is here
dbdatabasePostgreSQL 17 with the pgvector extension (tier-1, 25 GB standard storage)The chatbot's persistence and vector store. Embeddings and chat history live in a database that is yours.
docsfilesAn object-storage bucket (5 GB soft quota)Holds the documents you ingest for the assistant to retrieve over.
inferenceinferenceAn org-scoped inference key, eu_only, with a monthly budget ceilingAn OpenAI-compatible endpoint that routes every token of generation through your own configured provider, inside the EU.
chatappOpen WebUI (ghcr.io/open-webui/open-webui:main), attached to db and docs, fronted by auth and an edge domainThe chat application. It talks to your database, your bucket, and your inference endpoint.

The chat app depends on db, docs, and inference, so the platform brings those up first, then provisions and wires the app last.

One-click stack launch fan-out
RUNNING Stack wired · endpoint live
Stack Templaterag-chatbotlaunch ⇉PostgreSQLpgvectorAppOpen WebUIFilesbucketInferenceEU key
Template · AppPostgreSQL (pgvector)Files bucketInference (EU)wiring (env injected)

Prerequisites

  • An API token. See Authentication.
  • An enabled inference provider configured for your organization. The rag-chatbot stack mints a key against your provider; it never creates one, and there is no shared platform-default model. If no provider is enabled, the launch fails at the cost preview step with 400 Bad Request. See the Inference Proxy docs for how to add a provider.

Set your token for the API examples below:

export FOUNDRYDB_TOKEN="your-api-token"

Launch it

You can launch from the console with one click, or drive the same flow over the API.

Option A: the console

  1. Open Stacks in the FoundryDB console.
  2. Pick Launch a RAG chatbot.
  3. Review the cost preview. The inference line item is a monthly budget ceiling, not a guaranteed charge.
  4. Click Launch. The stack provisions asynchronously and the console shows each resource coming up.

Option B: the API

Launching over the API is a two-step flow: preview the cost first, then send the launch request with the cost you accepted. This guarantees you always see an up-to-date estimate before any billable resource is created.

Step 1: Preview the cost

curl -X POST https://api.foundrydb.com/stacks/preview \
-H "Authorization: Bearer $FOUNDRYDB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"template_name": "rag-chatbot"}'

The response breaks the cost down per resource:

{
"template_name": "rag-chatbot",
"currency": "USD",
"monthly_total": 67.00,
"line_items": [
{
"symbolic_name": "db",
"kind": "database",
"description": "PostgreSQL tier-1 (1 vCPU, 2 GB) + 25 GB standard storage, pgvector",
"monthly_cost": 24.00,
"is_ceiling": false
},
{
"symbolic_name": "docs",
"kind": "files",
"description": "Files bucket, 5 GB soft quota",
"monthly_cost": 3.00,
"is_ceiling": false
},
{
"symbolic_name": "inference",
"kind": "inference",
"description": "EU-routed inference key, monthly budget ceiling",
"monthly_cost": 5.00,
"is_ceiling": true
},
{
"symbolic_name": "chat",
"kind": "app",
"description": "Open WebUI, tier-1 + 10 GB storage, edge domain",
"monthly_cost": 35.00,
"is_ceiling": false
}
],
"warnings": [
"The inference line item is a monthly budget ceiling, not a guaranteed charge."
]
}

If your organization has no enabled inference provider, this call returns 400 Bad Request instead. Add a provider first, then preview again.

Line items marked is_ceiling: true are maximum charges, not fixed recurring costs. monthly_total is the number you pass to the launch request.

Step 2: Launch

Pass the monthly_total from the preview as accepted_monthly_cost:

curl -X POST https://api.foundrydb.com/stacks \
-H "Authorization: Bearer $FOUNDRYDB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-rag-chatbot",
"template_name": "rag-chatbot",
"accepted_monthly_cost": 67.00
}'

The response is 201 Created with the stack in Pending status. Capture its ID:

export STACK_ID="the-id-from-the-response"

If the accepted cost has drifted from the current estimate by more than $0.01, the launch is rejected with 409 Conflict and you re-preview. If you omit accepted_monthly_cost, you get 400 Bad Request.

Step 3: Poll for status

The stack provisions asynchronously. Poll GET /stacks/{stack-id} until status reaches Running:

curl https://api.foundrydb.com/stacks/$STACK_ID \
-H "Authorization: Bearer $FOUNDRYDB_TOKEN"

The response includes a resources array with each child's own status, and once the stack is Running, an endpoint_url with the live Open WebUI address:

{
"id": "...",
"name": "my-rag-chatbot",
"status": "Running",
"endpoint_url": "https://my-rag-chatbot.foundrydb.com",
"resources": [
{ "symbolic_name": "db", "kind": "database", "status": "Running" },
{ "symbolic_name": "docs", "kind": "files", "status": "Running" },
{ "symbolic_name": "inference", "kind": "inference", "status": "Running" },
{ "symbolic_name": "chat", "kind": "app", "status": "Running" }
]
}

Typical progression is Pending, Provisioning, Wiring, Running. Most stacks complete within 5 minutes.

First run

Once the stack is Running:

  1. Open the endpoint. Visit the endpoint_url. It is a real foundrydb.com hostname fronted by an edge domain with a valid certificate.
  2. Create your account. Open WebUI launches with authentication enabled (WEBUI_AUTH). The first account you create becomes the administrator. User accounts and settings persist into your PostgreSQL database, not into ephemeral container state.
  3. Upload documents. Add the files you want the assistant to retrieve over. They are stored in the docs Files bucket that the app is attached to.
  4. Chat. Ask questions about your documents. Generation is routed through the EU-pinned inference key against your own model provider. Embeddings for retrieval are stored using pgvector in your PostgreSQL database.

You are not talking to a demo. You are talking to your own data, on services you own.

How the wiring works

The stack engine does not introduce a new provisioning path. It calls the same APIs you would call yourself, in dependency order, and injects the credentials each resource needs to reach the others.

Attachment credential injection. The chat app is attached to exactly one database and one files bucket. Because of that one-to-one attachment, the platform injects DATABASE_URL and the S3_* bucket credentials into the app's container environment automatically. You do not copy a connection string or paste an access key. Open WebUI boots already knowing how to reach Postgres and the bucket.

The minted inference key. The inference resource mints an org-scoped key against your configured provider, pinned to EU routing, with a monthly budget ceiling. The stack wires that key into the app explicitly through three environment variables: OPENAI_API_BASE_URL points at the OpenAI-compatible inference proxy, OPENAI_API_KEY carries the minted key, and WEBUI_AUTH enforces sign-in. Open WebUI speaks the OpenAI API, so it talks to your provider through the proxy with no extra adapter.

pgvector for embeddings. PostgreSQL is provisioned with the pgvector extension enabled, so document embeddings live alongside your application data in a database you control, queryable with standard SQL.

Cost and EU residency

The cost preview is mandatory and is re-checked at launch. The inference line item is a budget ceiling: the inference resource carries a monthly budget cap, so it bounds your generation spend rather than charging a flat fee. Every other line item is an ordinary recurring service cost on your plan.

Every resource the stack creates is EU-resident. The database, the bucket, the app, and the inference routing all stay within the platform's European footprint, and the inference key is pinned eu_only. Residency is not a toggle you remember to flip; it is where these resources run.

Teardown

Deleting the stack removes every child resource atomically:

curl -X DELETE https://api.foundrydb.com/stacks/$STACK_ID \
-H "Authorization: Bearer $FOUNDRYDB_TOKEN"

Returns 202 Accepted. The reconciler deletes the app, the inference key, the bucket, and the database in reverse dependency order. No child resource is left orphaned. Teardown is irreversible.

Next steps