POST /v1/chat/completions

Create chat completion
curl --request POST \
  --url https://api.infery.ai/v1/chat/completions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 1,
  "max_tokens": 123,
  "top_p": 0.5,
  "top_k": 123,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "seed": 123,
  "stream": false,
  "stop": "<string>",
  "tools": [
    {}
  ],
  "tool_choice": "<unknown>",
  "response_format": {}
}
'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713204900,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?",
        "tool_calls": [
          {}
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  },
  "credits_used": 2
}
Drop-in OpenAI-compatible chat endpoint. Supports streaming, tool calls, JSON mode, vision, PDF and audio input, and Files API file_id references.

Minimal example

curl https://api.infery.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Summarise Node.js streams in 2 sentences"}
    ]
  }'

Streaming

Add "stream": true. Response is Server-Sent Events (text/event-stream). Each chunk is data: {...}\n\n; the stream ends with data: [DONE]\n\n. The final chunk before [DONE] carries usage info and Infery-specific credits_used:
data: {"choices":[{"delta":{"content":"..."}}]}
data: {"choices":[],"usage":{"prompt_tokens":42,"completion_tokens":128}}
data: {"choices":[],"usage":{...},"credits_used":2}
data: [DONE]
OpenAI SDKs ignore chunks with empty choices, so credits_used is a non-breaking extension.
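
A complete streaming request is just the minimal example plus the flag (a sketch; -N only disables curl's output buffering so chunks print as they arrive):
curl -N https://api.infery.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about streams"}
    ]
  }'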

Multimodal content

Pass arrays in content:
{
  "model": "gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }]
}
Supported block types:
  • text
  • image_url — HTTP URL or base64 data: URI
  • input_audio — inline base64 audio with format (wav/mp3/pcm16/webm)
  • file — inline data + mime_type or file_id reference (Files API)
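
An inline audio block, for example, might look like this (a sketch: the nesting of the input_audio object follows the OpenAI shape this endpoint mirrors, and the base64 payload is a placeholder):
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Transcribe this clip"},
    {"type": "input_audio", "input_audio": {"data": "<base64 audio>", "format": "wav"}}
  ]
}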

Using a Files API reference

{
  "content": [
    {"type": "text", "text": "Review this PDF"},
    {"type": "file", "file_id": "file_abc123..."}
  ]
}
The gateway resolves file_id to bytes on the server, injects them into the provider call, and returns a clear 400 if the id doesn’t exist or doesn’t belong to your workspace.

Tool calls

Works exactly like OpenAI’s spec — tools, tool_choice, function schema and tool role messages. Every chat-capable model on Infery that supports tools honours the same format.
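
For illustration, a request declaring a single hypothetical get_weather function (the schema shape is OpenAI's standard function-calling format):
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}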

JSON mode

{"response_format": {"type": "json_object"}}
Or with a schema (Structured Outputs):
{
  "response_format": {
    "type": "json_schema",
    "json_schema": { "schema": { ... }, "strict": true }
  }
}
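
Filled in, a Structured Outputs request might look like this (the person schema is purely illustrative):
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Extract the person mentioned in: Ada Lovelace, born 1815"}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "birth_year": {"type": "integer"}
        },
        "required": ["name", "birth_year"],
        "additionalProperties": false
      }
    }
  }
}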

Vision and PDFs

Models with supportsVision: true accept images directly. For PDFs, models with supportsPdf: true read them natively; other models get an automatic PDF-to-image conversion (plus text extraction) on the gateway for a small per-page fee (see billing), with no code changes required.
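
For example, a PDF rides along as a file content block; this sketch assumes the inline variant takes data and mime_type fields as listed under Multimodal content (a file_id reference works the same way):
{
  "model": "gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Summarise the key findings in this report"},
      {"type": "file", "data": "<base64 PDF>", "mime_type": "application/pdf"}
    ]
  }]
}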

Parameters

Full OpenAI parameter set: temperature, top_p, presence_penalty, frequency_penalty, max_tokens, stop, seed, stream, tools, tool_choice, response_format. Plus model-specific:
  • top_k — Gemini and some OSS models

Response headers

  • x-request-id
  • x-credits-used
  • x-model-used (when fallback fires)
  • x-fallback-from, x-fallback-depth
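
To see them, dump the response headers with curl (any request works; -D - prints headers to stdout and -o /dev/null discards the body):
curl -sS -D - -o /dev/null https://api.infery.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'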

Authorizations

  • Authorization (string, header, required): API key in the format Bearer inf_***

Headers

  • x-request-id (string): optional request ID for tracking

Body (application/json)

  • model (string, required): model ID. Example: "gpt-4o"
  • messages (object[], required)
  • temperature (number): required range 0 <= x <= 2
  • max_tokens (integer)
  • top_p (number): required range 0 <= x <= 1
  • top_k (integer): Top-K sampling (Google Gemini)
  • presence_penalty (number): presence penalty (OpenAI, Google Gemini); required range -2 <= x <= 2
  • frequency_penalty (number): frequency penalty (OpenAI, Google Gemini); required range -2 <= x <= 2
  • seed (integer): seed for deterministic output (OpenAI, Google Gemini)
  • stream (boolean, default false)
  • stop
  • tools (object[])
  • tool_choice (any)
  • response_format (object)

Response

Chat completion result. When stream=true, returns SSE (text/event-stream) where each data chunk is a chat.completion.chunk; the final chunk before [DONE] carries usage and credits_used.

  • id (string). Example: "chatcmpl-abc123"
  • object (string). Example: "chat.completion"
  • created (integer). Example: 1713204900
  • model (string). Example: "gpt-4o"
  • choices (object[])
  • usage (object)
  • credits_used (integer): credits deducted from the workspace balance for this request. Example: 2