Create chat completion
Chat Completions
Chat completions
POST /v1/chat/completions — OpenAI-compatible text and multimodal chat.
POST
Create chat completion
Drop-in OpenAI-compatible chat endpoint. Supports streaming, tool calls, JSON mode, vision, PDF, audio input and Files API
OpenAI SDKs ignore chunks with empty
Supported block types:
The gateway resolves
Or with a schema (Structured Outputs):
file_id references.
Minimal example
Streaming
Add"stream": true. Response is Server-Sent Events (text/event-stream). Each chunk is data: {...}\n\n; the stream ends with data: [DONE]\n\n.
The final chunk before [DONE] carries usage info and Infery-specific credits_used:
choices, so credits_used is a non-breaking extension.
Multimodal content
Pass arrays incontent:
textimage_url— HTTP URL or base64data:URIinput_audio— inline base64 audio withformat(wav/mp3/pcm16/webm)file— inlinedata+mime_typeorfile_idreference (Files API)
Using a Files API reference
file_id to bytes on the server, injects them into the provider call, and returns a clear 400 if the id doesn’t exist or is out of your workspace.
Tool calls
Works exactly like OpenAI’s spec —tools, tool_choice, function schema and tool role messages. Every chat-capable model on Infery that supports tools honours the same format.
JSON mode
Vision and PDFs
Models withsupportsVision: true accept images directly. For PDFs, models with supportsPdf: true read them natively. Others get an automatic PDF-to-image conversion on the gateway (plus text extraction) — you pay a small extra fee per page (see billing), no code changes required.
Parameters
Full OpenAI parameter set:temperature, top_p, presence_penalty, frequency_penalty, max_tokens, stop, seed, stream, tools, tool_choice, response_format. Plus model-specific:
top_k— Gemini and some OSS models
Response headers
x-request-idx-credits-usedx-model-used(when fallback fires)x-fallback-from,x-fallback-depth
Authorizations
API key in format: Bearer inf_***
Headers
Optional request ID for tracking
Body
application/json
Model ID
Example:
"gpt-4o"
Required range:
0 <= x <= 2Required range:
0 <= x <= 1Top-K sampling (Google Gemini)
Presence penalty (OpenAI, Google Gemini)
Required range:
-2 <= x <= 2Frequency penalty (OpenAI, Google Gemini)
Required range:
-2 <= x <= 2Seed for deterministic output (OpenAI, Google Gemini)
Response
Chat completion result. When stream=true, returns SSE (text/event-stream) where each data chunk is a chat.completion.chunk; the final chunk before [DONE] carries usage and credits_used.

