
Inference Proxy

The inference proxy exposes an OpenAI-compatible /inference/v1 endpoint that forwards requests to the model provider of your choice. Your provider API keys are stored encrypted at rest and never returned by the management API. The proxy adds per-key rate limiting, monthly token ceilings, a cost circuit breaker, and optional EU residency enforcement.

Architecture

Each organization configures one or more provider configs (OpenAI, Anthropic, Mistral, Azure OpenAI, or Groq). Applications authenticate with a dedicated inference key (fdb-inf-...) rather than a platform API token. The proxy maps the provider prefix in the model name to the correct upstream and translates the request format as needed (Anthropic uses the Messages API; the proxy handles the conversion transparently).

Your app  →  POST /inference/v1/chat/completions  →  FoundryDB proxy  →  OpenAI / Anthropic / Mistral / Azure / Groq
             Bearer fdb-inf-...                      (EU routing, rate limit, cost check)
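To illustrate the prefix mapping, the sketch below splits a model string on its first slash and checks the prefix against the five supported providers. It is a hypothetical client-side helper (split_model and SUPPORTED_PREFIXES are illustrative names, not part of any FoundryDB SDK). It assumes everything after the first slash is passed upstream as the model name, which this page does not state explicitly.

```python
# Hypothetical client-side helper; the proxy's actual parsing rules may differ.
SUPPORTED_PREFIXES = {"openai", "anthropic", "mistral", "azure_openai", "groq"}


def split_model(model: str) -> tuple[str, str]:
    """Split "provider/model-name" into (provider, model-name).

    Assumes the provider is everything before the first "/" and the
    upstream model name is everything after it. Raises ValueError for a
    missing or unrecognized prefix (a missing prefix is one of the
    documented causes of 400 invalid_model).
    """
    provider, sep, upstream = model.partition("/")
    if not sep or not upstream:
        raise ValueError(f"missing provider prefix in {model!r}")
    if provider not in SUPPORTED_PREFIXES:
        raise ValueError(f"unknown provider prefix {provider!r}")
    return provider, upstream


print(split_model("anthropic/claude-3-5-haiku-20241022"))
# → ('anthropic', 'claude-3-5-haiku-20241022')
```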

Supported Providers

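| Prefix | Provider | Chat | Embeddings | EU Route Available |
|---|---|---|---|---|
| openai/ | OpenAI | Yes | Yes | No (US-hosted) |
| anthropic/ | Anthropic | Yes | No | No (US-hosted) |
| mistral/ | Mistral AI | Yes | Yes | Yes |
| azure_openai/ | Azure OpenAI | Yes | Yes | Depends on region |
| groq/ | Groq | Yes | Yes | No |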

When your organization's settings have eu_only set to true, requests to providers without an EU-resident route return 403 eu_residency_unavailable.
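To fail fast before a request leaves your application, the providers table can be encoded as a pre-flight check. This is a hypothetical sketch: eu_call_allowed and EU_ROUTE are illustrative names, and it assumes that for Azure OpenAI the deciding factor is the eu_endpoint attestation you set when configuring the provider (Step 1). The proxy remains the authority on what it accepts.

```python
# Hypothetical pre-flight check mirroring the Supported Providers table.
# Azure OpenAI is resolved from your eu_endpoint attestation (assumption).
EU_ROUTE = {"openai": False, "anthropic": False, "mistral": True, "groq": False}


def eu_call_allowed(provider: str, eu_only: bool, azure_eu_endpoint: bool = False) -> bool:
    """Return False when the proxy would be expected to answer 403 eu_residency_unavailable."""
    if not eu_only:
        return True  # no residency restriction for this organization
    if provider == "azure_openai":
        return azure_eu_endpoint
    return EU_ROUTE.get(provider, False)  # unknown providers: assume no EU route


print(eu_call_allowed("mistral", eu_only=True))  # → True
print(eu_call_allowed("openai", eu_only=True))  # → False
```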

Endpoints

POST /inference/v1/chat/completions
POST /inference/v1/embeddings

Both endpoints authenticate with a Bearer inference key, not a platform token.

Management Endpoints

GET    /organizations/{orgId}/inference/providers
PUT    /organizations/{orgId}/inference/providers
DELETE /organizations/{orgId}/inference/providers/{provider}
GET    /organizations/{orgId}/inference/keys
POST   /organizations/{orgId}/inference/keys
DELETE /organizations/{orgId}/inference/keys/{keyId}
GET    /organizations/{orgId}/inference/settings
PUT    /organizations/{orgId}/inference/settings
GET    /organizations/{orgId}/inference/usage

Management endpoints use HTTP Basic authentication and require owner or admin membership in the organization.
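curl's -u flag sends standard HTTP Basic credentials. For clients that need the header spelled out, here is a minimal sketch with the Python standard library (basic_auth_header is an illustrative helper name, not a FoundryDB API):

```python
import base64


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Build the Authorization header that `curl -u user:pass` sends (RFC 7617)."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# The example credentials from RFC 7617 itself:
print(basic_auth_header("Aladdin", "open sesame"))
# → {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```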

Step 1: Configure a Provider

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "enabled": true
  }'

For Azure OpenAI, base_url (your Azure resource endpoint) is required:

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "azure_openai",
    "api_key": "...",
    "base_url": "https://my-resource.openai.azure.com",
    "eu_endpoint": true
  }'

eu_endpoint: true is an operator attestation that this configuration points at an EU-resident endpoint. It is never inferred from the base_url.
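The two curl calls above can also be built programmatically. This sketch uses only the Python standard library and constructs, without sending, the PUT request. provider_request is a hypothetical helper, and always including enabled follows the first example rather than a documented default. Pass the result to urllib.request.urlopen to actually send it.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.foundrydb.com"


def provider_request(
    org_id: str,
    auth_header: dict[str, str],
    provider: str,
    api_key: str,
    base_url: Optional[str] = None,
    eu_endpoint: bool = False,
    enabled: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) the PUT that configures one provider."""
    if provider == "azure_openai" and not base_url:
        raise ValueError("azure_openai requires base_url (your Azure resource endpoint)")
    body = {"provider": provider, "api_key": api_key, "enabled": enabled}
    if base_url:
        body["base_url"] = base_url
    if eu_endpoint:
        # Your own attestation; the proxy never infers it from base_url.
        body["eu_endpoint"] = True
    return urllib.request.Request(
        f"{API_BASE}/organizations/{org_id}/inference/providers",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **auth_header},
        method="PUT",
    )
```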

Step 2: Configure Org-Wide Settings

Org-wide settings must be configured before the proxy accepts any inference calls, and monthly_cost_limit_cents is required the first time you configure them:

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/settings \
  -H "Content-Type: application/json" \
  -d '{
    "monthly_cost_limit_cents": 5000,
    "eu_only": false
  }'

| Field | Description |
|---|---|
| monthly_cost_limit_cents | Monthly spending ceiling, in cents. When it is reached, the cost circuit opens and data plane calls return 402 cost_circuit_open until the limit is raised and the circuit is reset. |
| eu_only | When true, reject calls to any provider without an EU-resident route. |
| reset_circuit | Pass true to close an open cost circuit after raising the limit. |
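The interplay between the limit and reset_circuit can be pictured as a small state machine. The sketch below is a toy model written only to illustrate the behavior described in the table; CostCircuit is a hypothetical class, not FoundryDB's implementation, and the rule that a reset only succeeds once spend is below the new limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CostCircuit:
    """Toy model of the cost circuit described above (not FoundryDB's implementation)."""

    limit_cents: int
    spent_cents: float = 0.0
    is_open: bool = False

    def record(self, cost_cents: float) -> None:
        """Add the cost of a completed call; open the circuit once the limit is reached."""
        self.spent_cents += cost_cents
        if self.spent_cents >= self.limit_cents:
            self.is_open = True

    def status_for_next_call(self) -> int:
        """HTTP status a data plane call would get: 402 while open, else 200."""
        return 402 if self.is_open else 200

    def update_settings(self, limit_cents: int, reset_circuit: bool = False) -> None:
        """Mirror PUT .../inference/settings. Raising the limit alone keeps the circuit open."""
        self.limit_cents = limit_cents
        # Assumption: a reset only closes the circuit if spend is below the new limit.
        if reset_circuit and self.spent_cents < self.limit_cents:
            self.is_open = False


circuit = CostCircuit(limit_cents=5000)
circuit.record(5000.0)
print(circuit.status_for_next_call())  # → 402
circuit.update_settings(10000)
print(circuit.status_for_next_call())  # → 402 (limit raised, circuit not reset)
circuit.update_settings(10000, reset_circuit=True)
print(circuit.status_for_next_call())  # → 200
```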

Step 3: Mint an Inference Key

Inference keys authenticate data plane calls. The secret is returned exactly once at creation; only its SHA-256 hash is stored.

curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/organizations/$ORG_ID/inference/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "checkout-service",
    "monthly_token_limit": 1000000,
    "rate_limit_rpm": 60
  }'

Response:

{
  "key": {
    "id": "3fa85f64-...",
    "name": "checkout-service",
    "key_prefix": "fdb-inf-3f4a",
    "monthly_token_limit": 1000000,
    "rate_limit_rpm": 60,
    "status": "active",
    "tokens_used_cycle": 0
  },
  "secret": "fdb-inf-3f4a..."
}

Store the secret immediately. It is not retrievable after this response.

monthly_token_limit is required and must be positive. There is no unlimited key.
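Storing only a hash means the server can verify a presented key without being able to reproduce it. The following sketch illustrates that general hash-then-compare pattern with Python's standard library. The random part after fdb-inf- and the helper names mint_secret and verify are hypothetical; this is not FoundryDB's actual key generator.

```python
import hashlib
import hmac
import secrets


def mint_secret() -> tuple[str, str]:
    """Return (secret, sha256_hex). Only the hex digest would be persisted.

    The random suffix after "fdb-inf-" is illustrative; the real key format is not documented here.
    """
    secret = "fdb-inf-" + secrets.token_hex(24)
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return secret, digest


def verify(presented: str, stored_digest: str) -> bool:
    """Hash the presented bearer token and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


secret, digest = mint_secret()
print(verify(secret, digest))  # → True
print(verify(secret + "x", digest))  # → False
```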

Step 4: Call the Proxy

Apart from the base URL and the inference key, the only difference from a standard OpenAI request is the model field: prefix the model name with the provider, as in openai/gpt-4o-mini.

Chat Completions

curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Summarise the main types of PostgreSQL indexes." }
    ]
  }'

Streaming is supported: pass "stream": true to receive a server-sent events (SSE) stream. Token usage is always included in the final chunk.

curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-haiku-20241022",
    "messages": [
      { "role": "user", "content": "What is an HNSW index?" }
    ],
    "stream": true
  }'
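If you consume the stream without an SDK, you need to join the content deltas and pick up usage from the final chunk. The sketch below assumes the proxy emits OpenAI-style chat.completion.chunk events (data: lines, ending with data: [DONE]) for every provider, including translated Anthropic responses. This page does not spell out the exact event shape, so treat collect_stream as a hypothetical helper and the sample lines as made-up input.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Join content deltas from SSE lines and return (text, usage)."""
    parts: list[str] = []
    usage: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected in the terminal chunk
    return "".join(parts), usage


sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "An HNSW index "}}]}',
    'data: {"choices": [{"delta": {"content": "is a graph."}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 6, "total_tokens": 18}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample_stream)
print(text)  # → An HNSW index is a graph.
print(usage["total_tokens"])  # → 18
```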

Embeddings

curl -X POST https://api.foundrydb.com/inference/v1/embeddings \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "PostgreSQL supports TLS for all client connections."
  }'

Anthropic does not offer an embeddings API. Requests with an anthropic/ prefix to /inference/v1/embeddings return 400 invalid_model.
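A client can mirror the Embeddings column of the providers table to reject unsupported models before making a call. A minimal sketch with a hypothetical embeddings_body helper; it accepts only a single string for input, matching the example above.

```python
# Hypothetical guard mirroring the Embeddings column of the providers table.
EMBEDDING_PROVIDERS = {"openai", "mistral", "azure_openai", "groq"}


def embeddings_body(model: str, text: str) -> dict[str, str]:
    """Build an /inference/v1/embeddings body, rejecting providers without embeddings."""
    provider, sep, _ = model.partition("/")
    if not sep:
        raise ValueError(f"missing provider prefix in {model!r}")
    if provider not in EMBEDDING_PROVIDERS:
        # For anthropic/ the proxy itself would answer 400 invalid_model.
        raise ValueError(f"{provider!r} is not listed as supporting embeddings")
    return {"model": model, "input": text}
```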

Using an OpenAI-Compatible Client

Any OpenAI-compatible client library that lets you set a base URL works without modification. With the official Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.foundrydb.com/inference/v1",
    api_key="fdb-inf-3f4a...",
)

response = client.chat.completions.create(
    model="mistral/mistral-large-latest",
    messages=[{"role": "user", "content": "Explain pgvector cosine distance."}],
)
print(response.choices[0].message.content)
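
An equivalent call with the official Node.js SDK (top-level await requires running as an ES module):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.foundrydb.com/inference/v1",
  apiKey: "fdb-inf-3f4a...",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "What is a WAL file?" }],
});
console.log(response.choices[0].message.content);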

Monitoring Usage

curl -u $USER:$PASS \
  "https://api.foundrydb.com/organizations/$ORG_ID/inference/usage?group_by=model"

Response:

{
  "from": "2026-06-01T00:00:00Z",
  "to": "2026-06-22T12:00:00Z",
  "group_by": "model",
  "rows": [
    {
      "group_key": "openai/gpt-4o-mini",
      "provider": "openai",
      "calls": 1420,
      "input_tokens": 89340,
      "output_tokens": 14200,
      "total_tokens": 103540,
      "cost_microcents": 51770000
    }
  ]
}

Aggregate by key instead of model to see per-application consumption.
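cost_microcents is easier to read next to monthly_cost_limit_cents once both are in cents. Assuming a microcent is one millionth of a cent, the sample row works out to 51,770,000 / 1,000,000 = 51.77 cents, or about 1% of the 5000-cent limit from Step 2. A sketch that does this for every row (cost_report is a hypothetical helper):

```python
MICROCENTS_PER_CENT = 1_000_000  # assumption: one microcent is a millionth of a cent


def cost_report(usage: dict, monthly_cost_limit_cents: int) -> list[dict]:
    """Convert each usage row's cost to cents and to a share of the monthly limit."""
    report = []
    for row in usage["rows"]:
        cost_cents = row["cost_microcents"] / MICROCENTS_PER_CENT
        report.append({
            "group_key": row["group_key"],
            "total_tokens": row["total_tokens"],
            "cost_cents": cost_cents,
            "pct_of_limit": 100 * cost_cents / monthly_cost_limit_cents,
        })
    return report


# The sample row from the response above, trimmed to the fields used here.
sample = {"rows": [{"group_key": "openai/gpt-4o-mini", "total_tokens": 103540, "cost_microcents": 51770000}]}
for line in cost_report(sample, monthly_cost_limit_cents=5000):
    print(line)
```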

Error Reference

| HTTP Status | Code | Meaning |
|---|---|---|
| 400 | invalid_model | Malformed request body, missing provider prefix, or an anthropic/ model sent to the embeddings endpoint. |
| 401 | invalid_key | Missing, unknown, revoked, or suspended inference key. |
| 402 | cost_circuit_open | The organization's monthly cost limit has been reached. Raise the limit and reset the circuit. |
| 403 | eu_residency_unavailable | The organization has eu_only set and this provider has no EU-resident route. |
| 412 | provider_not_configured | The provider is not configured, or is disabled, for this organization. |
| 429 | rate_limited | The key's per-minute request limit (rate_limit_rpm) was hit. Check the Retry-After header. |
| 429 | token_ceiling_exceeded | The key's monthly token limit (monthly_token_limit) has been reached. |
| 502 | upstream_error | The provider returned an error. The provider's status code and body are passed through. |
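The table splits into errors worth retrying and errors that need a change first. The sketch below encodes one possible client-side policy. retry_delay is a hypothetical helper; the 1-second fallback and the backoff for 502 are assumptions (a passed-through provider error may not be transient), and it only parses the delta-seconds form of Retry-After.

```python
from typing import Mapping, Optional


def retry_delay(status: int, code: str, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry cannot succeed as-is."""
    if status == 429 and code == "rate_limited":
        value = headers.get("Retry-After", "").strip()
        # Retry-After may also be an HTTP date; this sketch only parses
        # delta-seconds and falls back to 1 second otherwise (assumption).
        return float(value) if value.isdigit() else 1.0
    if status == 502 and code == "upstream_error":
        # Assumption: treat upstream failures as transient; capped exponential backoff.
        return min(2.0 ** attempt, 30.0)
    # 400, 401, 402, 403, 412 and 429 token_ceiling_exceeded all need a change
    # to the request, key, or settings, so retrying unchanged would fail again.
    return None
```

A plain dict is case-sensitive, so if your HTTP client exposes headers differently, normalize the Retry-After lookup accordingly.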

What's Next

  • Embedding Pipelines — use the inference proxy as the embedding provider for automated pipelines.
  • Vector Search — query the vectors your embedding pipeline produces.