Inference Proxy

The inference proxy exposes an OpenAI-compatible /inference/v1 endpoint that forwards requests to the model provider of your choice. Your provider API keys are stored encrypted at rest and never returned by the management API. The proxy adds per-key rate limiting, monthly token ceilings, a cost circuit breaker, and optional EU residency enforcement.

Architecture

Each organization configures one or more provider configs (OpenAI, Anthropic, Mistral, Azure OpenAI, or Groq). Applications authenticate with a dedicated inference key (fdb-inf-...) rather than a platform API token. The proxy maps the provider prefix in the model name to the correct upstream and translates the request format as needed (Anthropic uses the Messages API; the proxy handles the conversion transparently).

Your app  →  POST /inference/v1/chat/completions  →  FoundryDB proxy  →  OpenAI / Anthropic / Mistral / Azure / Groq
              Bearer fdb-inf-...                      (EU routing, rate limit, cost check)
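
The prefix routing described above amounts to splitting the model string on its first slash. The following is a minimal sketch of that idea, using the prefixes listed under Supported Providers. `split_model` and `KNOWN_PROVIDERS` are hypothetical names for illustration and are not part of the proxy's API.

```python
# Provider prefixes taken from the Supported Providers table.
KNOWN_PROVIDERS = {"openai", "anthropic", "mistral", "azure_openai", "groq"}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' into (provider, upstream model name)."""
    # partition() splits on the first slash only, so any later slashes
    # stay in the upstream model name.
    provider, sep, upstream_model = model.partition("/")
    if not sep or not upstream_model:
        raise ValueError(f"model {model!r} is missing a provider prefix")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider prefix {provider!r}")
    return provider, upstream_model


print(split_model("openai/gpt-4o-mini"))
```

A model string without a prefix, such as `gpt-4o-mini`, raises `ValueError` in this sketch. On the proxy, the corresponding case is reported as `400 invalid_model`.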

Supported Providers

Prefix	Provider	Chat	Embeddings	EU Route Available
`openai/`	OpenAI	Yes	Yes	No (US-hosted)
`anthropic/`	Anthropic	Yes	No	No (US-hosted)
`mistral/`	Mistral AI	Yes	Yes	Yes
`azure_openai/`	Azure OpenAI	Yes	Yes	Depends on region
`groq/`	Groq	Yes	Yes	No

When your organization has eu_only: true set, requests to providers without an EU-resident route return 403 eu_residency_unavailable.
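
The residency rule can be read as a small decision function. This sketch follows the table above and is not the proxy's actual code. `EU_ROUTE` and `residency_status` are hypothetical names, and it is an assumption here that the `eu_endpoint` attestation on the provider config is what decides the "Depends on region" case.

```python
# EU route availability per provider prefix, copied from the table above.
# None stands for "depends on configuration".
EU_ROUTE = {
    "openai": False,
    "anthropic": False,
    "mistral": True,
    "azure_openai": None,
    "groq": False,
}


def residency_status(provider: str, eu_only: bool, eu_endpoint: bool = False) -> int:
    """Return 200 if the call may proceed, 403 for eu_residency_unavailable."""
    if not eu_only:
        return 200
    has_route = EU_ROUTE[provider]
    if has_route is None:
        # Assumption: for region-dependent providers the operator's
        # eu_endpoint attestation decides.
        has_route = eu_endpoint
    return 200 if has_route else 403


print(residency_status("mistral", eu_only=True))
print(residency_status("openai", eu_only=True))
```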

Endpoints

POST /inference/v1/chat/completions
POST /inference/v1/embeddings

Both endpoints authenticate with a Bearer inference key, not a platform token.

Management Endpoints

GET    /organizations/{orgId}/inference/providers
PUT    /organizations/{orgId}/inference/providers
DELETE /organizations/{orgId}/inference/providers/{provider}
GET    /organizations/{orgId}/inference/keys
POST   /organizations/{orgId}/inference/keys
DELETE /organizations/{orgId}/inference/keys/{keyId}
GET    /organizations/{orgId}/inference/settings
PUT    /organizations/{orgId}/inference/settings
GET    /organizations/{orgId}/inference/usage

Management endpoints use HTTP Basic authentication. The caller must be an owner or admin of the organization.

Step 1: Configure a Provider

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "enabled": true
  }'

For Azure OpenAI, base_url (your Azure resource endpoint) is required:

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "azure_openai",
    "api_key": "...",
    "base_url": "https://my-resource.openai.azure.com",
    "eu_endpoint": true
  }'

eu_endpoint: true is an operator attestation that this configuration points at an EU-resident endpoint. It is never inferred from the base_url.

Step 2: Configure Org-Wide Settings

Configure settings before sending any inference calls. monthly_cost_limit_cents is required the first time you configure settings:

curl -u $USER:$PASS \
  -X PUT https://api.foundrydb.com/organizations/$ORG_ID/inference/settings \
  -H "Content-Type: application/json" \
  -d '{
    "monthly_cost_limit_cents": 5000,
    "eu_only": false
  }'

Field	Description
`monthly_cost_limit_cents`	Monthly spending ceiling. When reached, the cost circuit opens and data plane calls return `402 cost_circuit_open` until the limit is raised and the circuit is reset.
`eu_only`	When true, reject calls to any provider without an EU-resident route.
`reset_circuit`	Pass `true` to close an open cost circuit after raising the limit.
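
Recovering from an open cost circuit takes a higher limit plus `reset_circuit`. The sketch below builds that request with the Python standard library without sending it. The organization ID, credentials, and new limit are placeholders, and it is an assumption that `reset_circuit` may be sent in the same PUT as the raised limit.

```python
import base64
import json
import urllib.request


def build_reset_request(org_id: str, user: str, password: str,
                        new_limit_cents: int, eu_only: bool) -> urllib.request.Request:
    """Build (but do not send) a settings PUT that raises the limit and resets the circuit."""
    body = {
        "monthly_cost_limit_cents": new_limit_cents,
        "eu_only": eu_only,  # pass your organization's current value
        # Assumption: reset_circuit can accompany the new limit in one request.
        "reset_circuit": True,
    }
    # HTTP Basic auth header, equivalent to curl -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://api.foundrydb.com/organizations/{org_id}/inference/settings",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


req = build_reset_request("my-org-id", "user", "pass", 10000, eu_only=False)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it once the placeholder values are replaced with real ones.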

Step 3: Mint an Inference Key

Inference keys authenticate data plane calls. The secret is returned exactly once at creation; only its SHA-256 hash is stored.

curl -u $USER:$PASS \
  -X POST https://api.foundrydb.com/organizations/$ORG_ID/inference/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "checkout-service",
    "monthly_token_limit": 1000000,
    "rate_limit_rpm": 60
  }'

Response:

{
  "key": {
    "id": "3fa85f64-...",
    "name": "checkout-service",
    "key_prefix": "fdb-inf-3f4a",
    "monthly_token_limit": 1000000,
    "rate_limit_rpm": 60,
    "status": "active",
    "tokens_used_cycle": 0
  },
  "secret": "fdb-inf-3f4a..."
}

Store the secret immediately. It is not retrievable after this response.

monthly_token_limit is required and must be positive. There is no unlimited key.
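
Storing only a hash means the server can verify a presented key but can never reveal it, which is why the secret is shown only once. Below is a minimal sketch of that pattern using the standard library. The function names, the hex encoding, and the example secret are assumptions for illustration, not FoundryDB's implementation.

```python
import hashlib
import hmac


def hash_key(secret: str) -> str:
    """Return the hex SHA-256 digest of an inference key secret."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Check a presented secret against the stored hash in constant time."""
    return hmac.compare_digest(hash_key(presented), stored_hash)


# Only the hash is kept at creation time; the secret itself is discarded.
stored = hash_key("fdb-inf-example-secret")
print(verify_key("fdb-inf-example-secret", stored))
print(verify_key("fdb-inf-wrong-secret", stored))
```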

Step 4: Call the Proxy

The request body matches a standard OpenAI request except for the model field, which must carry the provider prefix.

Chat Completions

curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Summarise the main types of PostgreSQL indexes." }
    ]
  }'

Streaming is supported: pass "stream": true to receive the response as server-sent events. The final chunk always includes token usage.

curl -X POST https://api.foundrydb.com/inference/v1/chat/completions \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-haiku-20241022",
    "messages": [
      { "role": "user", "content": "What is an HNSW index?" }
    ],
    "stream": true
  }'
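
A client consuming the stream reads `data:` lines until the terminator. The sketch below assumes OpenAI-style chunks: a `choices[0].delta.content` field, a `usage` object on the last chunk, and a closing `data: [DONE]` line. `read_stream` is a hypothetical helper and the sample lines are invented for illustration.

```python
import json
from typing import Iterable, Optional


def read_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Collect streamed text and the usage object from SSE lines."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


sample = [
    'data: {"choices": [{"delta": {"content": "HNSW is "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "a graph index."}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 6, "total_tokens": 18}}',
    "data: [DONE]",
]
text, usage = read_stream(sample)
print(text)
print(usage)
```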

Embeddings

curl -X POST https://api.foundrydb.com/inference/v1/embeddings \
  -H "Authorization: Bearer fdb-inf-3f4a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "PostgreSQL supports TLS for all client connections."
  }'

Anthropic does not offer an embeddings API. Requests with an anthropic/ prefix to /inference/v1/embeddings return 400 invalid_model.

Using an OpenAI-Compatible Client

Any library that lets you set a base URL works without modification. The first example below uses the OpenAI Python SDK and the second uses the OpenAI Node.js SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.foundrydb.com/inference/v1",
    api_key="fdb-inf-3f4a...",
)

response = client.chat.completions.create(
    model="mistral/mistral-large-latest",
    messages=[{"role": "user", "content": "Explain pgvector cosine distance."}],
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.foundrydb.com/inference/v1",
  apiKey: "fdb-inf-3f4a...",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "What is a WAL file?" }],
});

Monitoring Usage

curl -u $USER:$PASS \
  "https://api.foundrydb.com/organizations/$ORG_ID/inference/usage?group_by=model"

{
  "from": "2026-06-01T00:00:00Z",
  "to": "2026-06-22T12:00:00Z",
  "group_by": "model",
  "rows": [
    {
      "group_key": "openai/gpt-4o-mini",
      "provider": "openai",
      "calls": 1420,
      "input_tokens": 89340,
      "output_tokens": 14200,
      "total_tokens": 103540,
      "cost_microcents": 51770000
    }
  ]
}

Aggregate by key instead of model to see per-application consumption.
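
To compare reported spend with `monthly_cost_limit_cents`, the units have to match. The sketch below assumes a microcent is one millionth of a cent; the field name suggests this, but the unit is not defined on this page. It reuses the sample row above, and `spend_cents` is a hypothetical helper.

```python
# Assumption: 1 cent = 1,000,000 microcents.
MICROCENTS_PER_CENT = 1_000_000


def spend_cents(rows: list[dict]) -> float:
    """Sum cost_microcents across usage rows and convert to cents."""
    return sum(row["cost_microcents"] for row in rows) / MICROCENTS_PER_CENT


rows = [
    {"group_key": "openai/gpt-4o-mini", "total_tokens": 103540, "cost_microcents": 51_770_000},
]
limit_cents = 5000  # monthly_cost_limit_cents from the settings example

used = spend_cents(rows)
print(f"{used:.2f} cents used, {used / limit_cents:.2%} of the monthly limit")
```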

Error Reference

HTTP Status	Code	Meaning
400	`invalid_model`	Malformed body, missing provider prefix, or Anthropic on embeddings.
401	`invalid_key`	Missing, unknown, revoked, or suspended key.
402	`cost_circuit_open`	Organization monthly cost limit reached. Raise the limit and reset the circuit.
403	`eu_residency_unavailable`	Org is `eu_only` and this provider has no EU-resident route.
412	`provider_not_configured`	Provider not configured or disabled for this organization.
429	`rate_limited`	Per-key RPM limit hit. Check the `Retry-After` header.
429	`token_ceiling_exceeded`	Monthly token limit for this key reached.
502	`upstream_error`	Provider returned an error. The provider's status code and body are passed through.
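
Clients usually need to treat these codes differently, since only some are worth retrying. The following is a sketch of one possible policy. `next_action`, the action names, the one-second fallback, and the assumption that `Retry-After` carries a number of seconds are illustrative choices, not behavior documented above.

```python
from typing import Mapping, Optional

# Assumption: fallback delay when Retry-After is absent or unparsable.
DEFAULT_DELAY_SECONDS = 1.0


def next_action(status: int, code: str, headers: Mapping[str, str]) -> tuple[str, Optional[float]]:
    """Map a proxy error to (action, delay_seconds)."""
    if status == 429 and code == "rate_limited":
        # Assumption: Retry-After is a number of seconds, not an HTTP date.
        try:
            delay = float(headers.get("Retry-After", DEFAULT_DELAY_SECONDS))
        except ValueError:
            delay = DEFAULT_DELAY_SECONDS
        return "retry", delay
    if code in ("token_ceiling_exceeded", "cost_circuit_open"):
        # Retrying will not help until a limit is raised.
        return "raise_limit", None
    if status in (400, 401, 403, 412):
        # The request, key, or organization configuration needs fixing.
        return "fix_request", None
    if status == 502:
        # The provider's own status and body decide what to do next.
        return "inspect_upstream", None
    return "fail", None


print(next_action(429, "rate_limited", {"Retry-After": "7"}))
print(next_action(429, "token_ceiling_exceeded", {}))
```

Both limits share status 429, so this policy branches on the error code rather than the status alone.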

What's Next

Embedding Pipelines — use the inference proxy as the embedding provider for automated pipelines.
Vector Search — query the vectors your embedding pipeline produces.
