Infery’s gateway is a strict superset of the OpenAI API: you keep the OpenAI SDK and change only the base URL and the API key.

The change

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["INFERY_API_KEY"],   # was OPENAI_API_KEY
    base_url="https://api.infery.ai/v1",     # added
)
That’s it. Every existing call site keeps working.

What stays identical

  • Endpoint paths: /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/audio/*, /v1/files
  • Request bodies (messages, tools, response_format, streaming)
  • Response shapes (id, choices, usage, system_fingerprint)
  • SSE streaming format including the final data: [DONE]
  • Tool calling, JSON mode, structured outputs, vision, PDF
  • Idempotency keys
  • Error envelope ({ "error": { "type", "code", "message" } })
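Because the SSE wire format, including the final `data: [DONE]` sentinel, is unchanged, any hand-rolled stream parser keeps working against the gateway. A minimal sketch of that format; the chunk payloads below are illustrative, not captured traffic:

```python
import json

def parse_sse_stream(lines):
    """Yield parsed chunk objects from OpenAI-style SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel, identical on Infery
        yield json.loads(payload)

# Illustrative wire data in the OpenAI chat-chunk shape:
wire = [
    'data: {"id": "c1", "choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"id": "c1", "choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_stream(wire))
# text == "Hello"
```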

What’s added

Feature | How
Cost per request | Header x-credits-used, plus a credits_used SSE chunk before [DONE]
Multi-provider models | Use any model slug from GET /v1/models — Anthropic, Google, xAI, OSS — with the OpenAI SDK
Fallback routing | Configure in the dashboard; headers x-model-used / x-fallback-from tell you what served the request
Usage analytics | Per-key, per-model, per-member breakdowns

What changes

  • Model slugs. OpenAI models keep their names (gpt-4o, gpt-4o-mini, text-embedding-3-large). For Anthropic/Google/xAI, use the slug from GET /v1/models — for example claude-sonnet-4-5, gemini-2-5-flash, grok-4.
  • Auth. Use an Infery API key (inf_...) — your OpenAI key is not valid here. Create one in Settings → API Keys.
  • Rate limits. Per-workspace, not per-OpenAI-org. See Rate limits.
  • Billing. A single Infery invoice covers every provider; your OpenAI billing relationship ends.
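Since every provider's slug comes back from the same GET /v1/models listing, you can choose a model at runtime instead of hard-coding one. A sketch assuming the SDK's standard models endpoint; the preference list and the hard-coded availability set are illustrative:

```python
def pick_model(available: set[str], preferred: list[str]) -> str:
    """Return the first preferred slug that the gateway actually serves."""
    for slug in preferred:
        if slug in available:
            return slug
    raise ValueError("none of the preferred models are available")

# Live, `available` would come from the listing:
#   available = {m.id for m in client.models.list()}
available = {"gpt-4o", "claude-sonnet-4-5", "gemini-2-5-flash"}
model = pick_model(available, ["claude-sonnet-4-5", "gpt-4o"])
# model == "claude-sonnet-4-5"
```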

Checklist

  • Create an Infery API key
  • Replace OPENAI_API_KEY with INFERY_API_KEY in env config
  • Set base_url / baseURL to https://api.infery.ai/v1
  • Run your test suite — nothing else should change
  • (Optional) Set up a fallback chain for production resilience
  • (Optional) Add x-credits-used to your request logging
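The first three checklist items amount to one small change in client construction. A sketch that isolates the Infery-specific kwargs, using the environment variable name from the checklist (the helper itself is hypothetical):

```python
import os

def infery_client_kwargs(env=os.environ) -> dict:
    """Build the OpenAI(...) constructor kwargs for the Infery gateway."""
    return {
        "api_key": env["INFERY_API_KEY"],          # was OPENAI_API_KEY
        "base_url": "https://api.infery.ai/v1",    # the only other change
    }

# client = OpenAI(**infery_client_kwargs())
```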

Things to watch

  • Org-level OpenAI features (project keys, fine-tunes, batch API) aren’t 1:1 yet — batch is on the roadmap.
  • System fingerprints are passed through from upstream when present, so determinism guarantees match the underlying provider.
  • If your code parses error messages by string, switch to error.code — it’s stable; messages are not.
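The last point in practice: branch on the stable error.code field of the documented envelope, never on message text. A sketch; the specific code values in the retry set are assumptions for illustration:

```python
RETRYABLE = {"rate_limit_exceeded", "overloaded"}  # illustrative code values

def should_retry(error_body: dict) -> bool:
    """Decide retry from the stable error code, not the human-readable message."""
    err = error_body.get("error", {})
    return err.get("code") in RETRYABLE

body = {"error": {"type": "rate_limit_error",
                  "code": "rate_limit_exceeded",
                  "message": "Slow down (this wording may change)"}}
# should_retry(body) is True; the same envelope with another code is False
```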