Inference Proxy
The inference proxy exposes an OpenAI-compatible /inference/v1 endpoint that forwards requests to the model provider of your choice. Your provider API keys are stored encrypted at rest and never returned by the management API. The proxy adds per-key rate limiting, monthly token ceilings, a cost circuit breaker, and optional EU residency enforcement.
Architecture
Each organization sets up one or more provider configs (OpenAI, Anthropic, Mistral, Azure OpenAI, or Groq). Applications authenticate with a dedicated inference key (fdb-inf-...) rather than a platform API token. The proxy uses the provider prefix in the model name to select the upstream and translates the request format where needed (Anthropic uses the Messages API; the proxy handles the conversion transparently).
Your app → POST /inference/v1/chat/completions (Bearer fdb-inf-...) → FoundryDB proxy (EU routing, rate limit, cost check) → OpenAI / Anthropic / Mistral / Azure / Groq
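The prefix-based routing can be pictured with a small client-side sketch. It assumes the provider prefix is everything before the first `/` in the model name. The helper name `split_model` is hypothetical, and the proxy's actual parsing rules are not documented here.

```python
# Prefixes listed in the Supported Providers table.
KNOWN_PREFIXES = {"openai", "anthropic", "mistral", "azure_openai", "groq"}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' into (provider, upstream_model).

    Assumption: the prefix ends at the first '/', so anything after it
    (including further slashes) belongs to the upstream model name.
    """
    provider, sep, upstream = model.partition("/")
    if not sep or not upstream:
        raise ValueError(f"model {model!r} is missing a provider prefix")
    if provider not in KNOWN_PREFIXES:
        raise ValueError(f"unknown provider prefix {provider!r}")
    return provider, upstream


print(split_model("openai/gpt-4o-mini"))
```

Running a check like this before sending a request catches a missing prefix locally instead of waiting for a 400 invalid_model response.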
Supported Providers
| Prefix | Provider | Chat | Embeddings | EU Route Available |
|---|---|---|---|---|
| openai/ | OpenAI | Yes | Yes | No (US-hosted) |
| anthropic/ | Anthropic | Yes | No | No (US-hosted) |
| mistral/ | Mistral AI | Yes | Yes | Yes |
| azure_openai/ | Azure OpenAI | Yes | Yes | Depends on region |
| groq/ | Groq | Yes | Yes | No |
When your organization has eu_only: true set, requests to providers without an EU-resident route return 403 eu_residency_unavailable.
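As a rough illustration of this rule, the sketch below mirrors the table in a client-side pre-check. The function name `expect_eu_rejection` is hypothetical. Treating the "Depends on region" case as governed by the eu_endpoint attestation from Step 1 is an assumption about how the proxy decides, not documented behavior.

```python
# EU route availability per prefix, copied from the Supported Providers table.
EU_ROUTE = {
    "openai": "no",
    "anthropic": "no",
    "mistral": "yes",
    "azure_openai": "depends",
    "groq": "no",
}


def expect_eu_rejection(provider: str, eu_only: bool, eu_attested: bool = False) -> bool:
    """Return True if a 403 eu_residency_unavailable looks likely.

    Assumption: for a "depends" provider the outcome follows whether the
    operator attested an EU-resident endpoint (eu_endpoint: true).
    """
    if not eu_only:
        return False  # residency is not enforced for this organization
    route = EU_ROUTE[provider]
    if route == "yes":
        return False
    if route == "depends":
        return not eu_attested
    return True


print(expect_eu_rejection("openai", eu_only=True))
print(expect_eu_rejection("mistral", eu_only=True))
```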
Endpoints
POST /inference/v1/chat/completions
POST /inference/v1/embeddings
Both endpoints authenticate with a Bearer inference key, not a platform token.
Management Endpoints
GET /organizations/{orgId}/inference/providers
PUT /organizations/{orgId}/inference/providers
DELETE /organizations/{orgId}/inference/providers/{provider}
GET /organizations/{orgId}/inference/keys
POST /organizations/{orgId}/inference/keys
DELETE /organizations/{orgId}/inference/keys/{keyId}
GET /organizations/{orgId}/inference/settings
PUT /organizations/{orgId}/inference/settings
GET /organizations/{orgId}/inference/usage
Management endpoints use basic auth (owner or admin membership required).
Step 1: Configure a Provider
curl -u $USER:$PASS \
-X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"api_key": "sk-...",
"enabled": true
}'
For Azure OpenAI, base_url (your Azure resource endpoint) is required:
curl -u $USER:$PASS \
-X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "azure_openai",
"api_key": "...",
"base_url": "https://my-resource.openai.azure.com",
"eu_endpoint": true
}'
eu_endpoint: true is an operator attestation that this configuration points at an EU-resident endpoint. It is never inferred from the base_url.
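For teams that script provider setup, the same call can be built in Python with only the standard library. This is a minimal sketch: `provider_request` is a hypothetical helper, the organization ID and credentials are placeholders, and the request is constructed but not sent.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.foundrydb.com"


def provider_request(org_id, user, password, config):
    """Build (but do not send) the PUT request that stores a provider config."""
    if config.get("provider") == "azure_openai" and not config.get("base_url"):
        # Azure OpenAI requires base_url, as noted above.
        raise ValueError("azure_openai requires base_url")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{API_BASE}/organizations/{org_id}/inference/providers",
        data=json.dumps(config).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",  # management endpoints use basic auth
            "Content-Type": "application/json",
        },
    )


req = provider_request(
    "my-org-id", "alice", "placeholder-password",  # placeholder values
    {
        "provider": "azure_openai",
        "api_key": "...",
        "base_url": "https://my-resource.openai.azure.com",
        "eu_endpoint": True,
    },
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```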
Step 2: Configure Org-Wide Settings
Settings must be configured before any inference calls go through. monthly_cost_limit_cents is required on first configuration:
curl -u $USER:$PASS \
-X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/settings \
-H "Content-Type: application/json" \
-d '{
"monthly_cost_limit_cents": 5000,
"eu_only": false
}'
| Field | Description |
|---|---|
| monthly_cost_limit_cents | Monthly spending ceiling, in cents. When it is reached, the cost circuit opens and data plane calls return 402 cost_circuit_open until the limit is raised and the circuit is reset. |
| eu_only | When true, reject calls to any provider without an EU-resident route. |
| reset_circuit | Pass true to close an open cost circuit after raising the limit. |
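Closing an open circuit therefore takes a higher limit plus reset_circuit. The sketch below builds such a body. It assumes both fields can travel in the same PUT, which the table suggests but does not state, and `settings_body` is a hypothetical helper.

```python
import json


def settings_body(monthly_cost_limit_cents: int, eu_only: bool = False,
                  reset_circuit: bool = False) -> str:
    """Serialize the JSON body for PUT .../inference/settings."""
    body = {
        "monthly_cost_limit_cents": monthly_cost_limit_cents,
        "eu_only": eu_only,
    }
    if reset_circuit:
        # Only included when closing an open cost circuit.
        body["reset_circuit"] = True
    return json.dumps(body)


# Raise the ceiling from 5000 to 10000 cents and close the circuit.
print(settings_body(10000, reset_circuit=True))
```

The resulting string can be passed to curl with -d, in place of the inline JSON shown earlier.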
Step 3: Mint an Inference Key
Inference keys authenticate data plane calls. The secret is returned exactly once at creation; only its SHA-256 hash is stored.
curl -u $USER:$PASS \
-X POST https://api.foundrydb.com/organizations/$ORG_ID/inference/keys \
-H "Content-Type: application/json" \
-d '{
"name": "checkout-service",
"monthly_token_limit": 1000000,
"rate_limit_rpm": 60
}'
Response:
{
"key": {
"id": "3fa85f64-...",
"name": "checkout-service",
"key_prefix": "fdb-inf-3f4a",
"monthly_token_limit": 1000000,
"rate_limit_rpm": 60,
"status": "active",
"tokens_used_cycle": 0
},
"secret": "fdb-inf-3f4a..."
}
Store the secret immediately. It is not retrievable after this response.
monthly_token_limit is required and must be positive. There is no unlimited key.
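Because the secret appears only in this one response, it helps to pull it out and hand it to a secret store in the same step. The following sketch parses the response shape shown above. `extract_secret` is a hypothetical helper, and the check that the secret begins with key_prefix is an assumption drawn from the sample response.

```python
import json


def extract_secret(response_text: str) -> tuple[str, str]:
    """Return (key_id, secret) from a create-key response body."""
    payload = json.loads(response_text)
    key, secret = payload["key"], payload["secret"]
    # Assumption: the secret starts with the stored key_prefix, as in the sample.
    if not secret.startswith(key["key_prefix"]):
        raise ValueError("secret does not match key_prefix")
    return key["id"], secret


# Sample body copied from the response above (values are truncated there too).
sample = """{
  "key": {"id": "3fa85f64-...", "name": "checkout-service",
          "key_prefix": "fdb-inf-3f4a", "monthly_token_limit": 1000000,
          "rate_limit_rpm": 60, "status": "active", "tokens_used_cycle": 0},
  "secret": "fdb-inf-3f4a..."
}"""

key_id, secret = extract_secret(sample)
# Log only the key ID; write the secret to your secret manager, never to logs.
print("created key", key_id)
```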
Step 4: Call the Proxy
Apart from the base URL and the inference key, the only difference from a standard OpenAI request is the model field: prefix it with the provider name.
Chat Completions
curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
-H "Authorization: Bearer fdb-inf-3f4a..." \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{ "role": "user", "content": "Summarise the main types of PostgreSQL indexes." }
]
}'
Streaming is supported. Pass "stream": true to receive a server-sent event stream. Usage is always included in the terminal chunk.
curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
-H "Authorization: Bearer fdb-inf-3f4a..." \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-haiku-20241022",
"messages": [
{ "role": "user", "content": "What is an HNSW index?" }
],
"stream": true
}'
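A client that reads the stream has to reassemble the text and pick the usage object out of the final chunk. The sketch below does that for an iterable of lines. It assumes the stream follows the OpenAI chunk format (`data: {...}` lines with `choices[].delta.content`, ending with `data: [DONE]`); the sample lines are invented for illustration and are not captured proxy output.

```python
import json


def read_stream(lines):
    """Collect the generated text and the usage object from SSE lines."""
    text_parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected on the terminal chunk
    return "".join(text_parts), usage


# Invented sample stream in the assumed format.
sample = [
    'data: {"choices": [{"delta": {"content": "HNSW is "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "a graph index."}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 12, '
    '"completion_tokens": 6, "total_tokens": 18}}',
    "",
    "data: [DONE]",
]
text, usage = read_stream(sample)
print(text)
print(usage)
```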
Embeddings
curl -X POST https://api.foundrydb.com/inference/v1/embeddings \
-H "Authorization: Bearer fdb-inf-3f4a..." \
-H "Content-Type: application/json" \
-d '{
"model": "openai/text-embedding-3-small",
"input": "PostgreSQL supports TLS for all client connections."
}'
Anthropic does not offer an embeddings API. Requests with an anthropic/ prefix to /inference/v1/embeddings return 400 invalid_model.
Using an OpenAI-Compatible Client
Any library that lets you set a base URL works without modification:
from openai import OpenAI
client = OpenAI(
base_url="https://api.foundrydb.com/inference/v1",
api_key="fdb-inf-3f4a...",
)
response = client.chat.completions.create(
model="mistral/mistral-large-latest",
messages=[{"role": "user", "content": "Explain pgvector cosine distance."}],
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.foundrydb.com/inference/v1",
apiKey: "fdb-inf-3f4a...",
});
const response = await client.chat.completions.create({
model: "openai/gpt-4o",
messages: [{ role: "user", content: "What is a WAL file?" }],
});
Monitoring Usage
curl -u $USER:$PASS \
"https://api.foundrydb.com/organizations/$ORG_ID/inference/usage?group_by=model"
{
"from": "2026-06-01T00:00:00Z",
"to": "2026-06-22T12:00:00Z",
"group_by": "model",
"rows": [
{
"group_key": "openai/gpt-4o-mini",
"provider": "openai",
"calls": 1420,
"input_tokens": 89340,
"output_tokens": 14200,
"total_tokens": 103540,
"cost_microcents": 51770000
}
]
}
Aggregate by key instead of model to see per-application consumption.
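The usage report is easy to fold into a single summary. This sketch totals the rows from the response above and converts the cost to cents. It assumes a microcent is one millionth of a cent, as the field name implies, and `summarize` is a hypothetical helper.

```python
MICROCENTS_PER_CENT = 1_000_000  # assumption based on the field name


def summarize(report: dict) -> dict:
    """Total calls, tokens, and cost (in cents) across all rows."""
    rows = report["rows"]
    return {
        "calls": sum(r["calls"] for r in rows),
        "total_tokens": sum(r["total_tokens"] for r in rows),
        "cost_cents": sum(r["cost_microcents"] for r in rows) / MICROCENTS_PER_CENT,
    }


# Row copied from the sample response above.
report = {
    "group_by": "model",
    "rows": [
        {
            "group_key": "openai/gpt-4o-mini",
            "provider": "openai",
            "calls": 1420,
            "input_tokens": 89340,
            "output_tokens": 14200,
            "total_tokens": 103540,
            "cost_microcents": 51770000,
        }
    ],
}
summary = summarize(report)
print(summary)
```

Comparing cost_cents against monthly_cost_limit_cents gives an early warning before the cost circuit opens.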
Error Reference
| HTTP Status | Code | Meaning |
|---|---|---|
| 400 | invalid_model | Malformed body, missing provider prefix, or an anthropic/ model sent to the embeddings endpoint. |
| 401 | invalid_key | Missing, unknown, revoked, or suspended key. |
| 402 | cost_circuit_open | Organization monthly cost limit reached. Raise the limit and reset the circuit. |
| 403 | eu_residency_unavailable | Org is eu_only and this provider has no EU-resident route. |
| 412 | provider_not_configured | Provider not configured or disabled for this organization. |
| 429 | rate_limited | Per-key RPM limit hit. Check the Retry-After header. |
| 429 | token_ceiling_exceeded | Monthly token limit for this key reached. |
| 502 | upstream_error | Provider returned an error. The provider's status code and body are passed through. |
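One way to act on these codes in a client is to separate errors worth retrying from those that need an operator. The sketch below is one possible policy, not a documented recommendation: `next_action` and its return values are hypothetical, and it only understands a Retry-After value given in whole seconds, falling back to a default delay otherwise.

```python
# Codes that need a configuration change (or a new billing cycle) before
# another attempt can succeed; grouping them this way is an assumption.
NEEDS_OPERATOR = {
    "cost_circuit_open",
    "token_ceiling_exceeded",
    "provider_not_configured",
    "eu_residency_unavailable",
}


def next_action(status, code, headers=None, default_delay=1.0):
    """Map a proxy error to ('retry', seconds), ('escalate', None), or ('fail', None)."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    if status == 429 and code == "rate_limited":
        value = headers.get("retry-after", "").strip()
        delay = float(value) if value.isdigit() else default_delay
        return "retry", delay
    if code in NEEDS_OPERATOR:
        return "escalate", None
    return "fail", None


print(next_action(429, "rate_limited", {"Retry-After": "7"}))
print(next_action(402, "cost_circuit_open"))
```

Note that both 429 codes share a status, so the decision has to key on the error code rather than the HTTP status alone.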
What's Next
- Embedding Pipelines: use the inference proxy as the embedding provider for automated pipelines.
- Vector Search: query the vectors your embedding pipeline produces.