Infery’s gateway is a strict superset of the OpenAI API: you keep the OpenAI SDK and change only the base URL and the API key.

The change

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["INFERY_API_KEY"],   # was OPENAI_API_KEY
    base_url="https://api.infery.ai/v1",     # added
)
That’s it. Every existing call site keeps working.

What stays identical

  • Endpoint paths: /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/audio/*, /v1/files
  • Request bodies (messages, tools, response_format, streaming)
  • Response shapes (id, choices, usage, system_fingerprint)
  • SSE streaming format including the final data: [DONE]
  • Tool calling, JSON mode, structured outputs, vision, PDF
  • Idempotency keys
  • Error envelope ({ "error": { "type", "code", "message" } })
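Because the SSE wire format, including the final `data: [DONE]` sentinel, is unchanged, any hand-rolled stream parser keeps working against the gateway. A minimal sketch of that format; the chunk payloads below are illustrative, not captured traffic:

```python
import json

def parse_sse_stream(lines):
    """Yield parsed chunk objects from OpenAI-style SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel, identical on Infery
        yield json.loads(payload)

# Illustrative wire data in the OpenAI chat-chunk shape:
wire = [
    'data: {"id": "c1", "choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"id": "c1", "choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_stream(wire))
# text == "Hello"
```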

What’s added

Feature | How
Cost per request | Header x-credits-used, plus a credits_used SSE chunk before [DONE]
Multi-provider models | Use any model slug from GET /v1/models — Anthropic, Google, xAI, OSS — with the OpenAI SDK
Fallback routing | Configure in the dashboard; headers x-model-used / x-fallback-from tell you what served the request
Usage analytics | Per-key, per-model, per-member breakdowns

What changes

  • Model slugs. OpenAI models keep their names (gpt-4o, gpt-4o-mini, text-embedding-3-large). For Anthropic/Google/xAI, use the slug from GET /v1/models — for example claude-sonnet-4-5, gemini-2-5-flash, grok-4.
  • Auth. Use an Infery API key (inf_...) — your OpenAI key is not valid here. Create one in Settings → API Keys.
  • Rate limits. Per-workspace, not per-OpenAI-org. See Rate limits.
  • Billing. A single Infery invoice covers every provider; your OpenAI billing relationship ends.
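Since every provider's slug comes back from the same GET /v1/models listing, you can choose a model at runtime instead of hard-coding one. A sketch assuming the SDK's standard models endpoint; the preference list and the hard-coded availability set are illustrative:

```python
def pick_model(available: set[str], preferred: list[str]) -> str:
    """Return the first preferred slug that the gateway actually serves."""
    for slug in preferred:
        if slug in available:
            return slug
    raise ValueError("none of the preferred models are available")

# Live, `available` would come from the listing:
#   available = {m.id for m in client.models.list()}
available = {"gpt-4o", "claude-sonnet-4-5", "gemini-2-5-flash"}
model = pick_model(available, ["claude-sonnet-4-5", "gpt-4o"])
# model == "claude-sonnet-4-5"
```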

Checklist

  • Create an Infery API key
  • Replace OPENAI_API_KEY with INFERY_API_KEY in env config
  • Set base_url / baseURL to https://api.infery.ai/v1
  • Run your test suite — nothing else should change
  • (Optional) Set up a fallback chain for production resilience
  • (Optional) Add x-credits-used to your request logging
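The first three checklist items amount to one small change in client construction. A sketch that isolates the Infery-specific kwargs, using the environment variable name from the checklist (the helper itself is hypothetical):

```python
import os

def infery_client_kwargs(env=os.environ) -> dict:
    """Build the OpenAI(...) constructor kwargs for the Infery gateway."""
    return {
        "api_key": env["INFERY_API_KEY"],          # was OPENAI_API_KEY
        "base_url": "https://api.infery.ai/v1",    # the only other change
    }

# client = OpenAI(**infery_client_kwargs())
```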

Things to watch

  • Org-level OpenAI features (project keys, fine-tunes, batch API) aren’t 1:1 yet — batch is on the roadmap.
  • System fingerprints are passed through from upstream when present, so determinism guarantees match the underlying provider.
  • If your code parses error messages by string, switch to error.code — it’s stable; messages are not.
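The last point in practice: branch on the stable error.code field of the documented envelope, never on message text. A sketch; the specific code values in the retry set are assumptions for illustration:

```python
RETRYABLE = {"rate_limit_exceeded", "overloaded"}  # illustrative code values

def should_retry(error_body: dict) -> bool:
    """Decide retry from the stable error code, not the human-readable message."""
    err = error_body.get("error", {})
    return err.get("code") in RETRYABLE

body = {"error": {"type": "rate_limit_error",
                  "code": "rate_limit_exceeded",
                  "message": "Slow down (this wording may change)"}}
# should_retry(body) is True; the same envelope with another code is False
```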