Chat completions

Create chat completion

curl --request POST \
  --url https://api.infery.ai/v1/chat/completions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 1,
  "max_tokens": 123,
  "top_p": 0.5,
  "top_k": 123,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "seed": 123,
  "stream": false,
  "stop": "<string>",
  "tools": [
    {}
  ],
  "tool_choice": "auto",
  "response_format": {}
}
'

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713204900,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?",
        "tool_calls": [
          {}
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  },
  "credits_used": 2
}


Drop-in OpenAI-compatible chat endpoint. Supports streaming, tool calls, JSON mode, vision, PDF, audio input and Files API file_id references.

Minimal example

curl https://api.infery.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Summarise Node.js streams in 2 sentences"}
    ]
  }'
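
The same minimal request can also be built with Python's standard library instead of curl. This is a sketch, not an official client: build_chat_request and send_chat_request are hypothetical helper names, and the API key is assumed to come from the INFERY_API_KEY environment variable, as in the curl example.

```python
import json
import urllib.request

API_URL = "https://api.infery.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) the POST request from the minimal curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The Authorization header uses the Bearer scheme with an inf_*** key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        completion = json.load(resp)
    return completion["choices"][0]["message"]["content"]
```

Usage: send_chat_request(build_chat_request(os.environ["INFERY_API_KEY"], "gpt-4o", "Summarise Node.js streams in 2 sentences")). Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, such as the 400 returned for an unknown file_id.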

Streaming

Add "stream": true to the request body. The response is then Server-Sent Events (text/event-stream): each chunk arrives as data: {...}\n\n, and the stream ends with data: [DONE]\n\n. The final chunk before [DONE] carries usage information and the Infery-specific credits_used field:

data: {"choices":[{"delta":{"content":"..."}}]}
data: {"choices":[],"usage":{"prompt_tokens":42,"completion_tokens":128}}
data: {"choices":[],"usage":{...},"credits_used":2}
data: [DONE]

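OpenAI SDKs parse chunks with an empty choices array without error (OpenAI itself emits one to report usage), so credits_used is a non-breaking extension. Client code that reads choices[0] from every chunk should check that choices is non-empty first.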
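
A client that does not use an OpenAI SDK has to parse the stream itself. The sketch below (collect_stream is a hypothetical helper, standard library only) takes the stream's text lines, concatenates the delta content, keeps the last usage and credits_used values it sees, and stops at [DONE]. It guards against the empty choices array in usage-only chunks.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate a chat completion SSE stream into text, usage and credits."""
    text_parts = []
    usage = None
    credits_used = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Usage-only chunks carry "choices": [], so iterate instead of indexing [0].
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        if "credits_used" in chunk:
            credits_used = chunk["credits_used"]
    return {"text": "".join(text_parts), "usage": usage, "credits_used": credits_used}
```

With urllib, the lines can be fed straight from the open response, for example collect_stream(line.decode("utf-8") for line in resp).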

Multimodal content

To send non-text input, pass an array of content blocks as the message's content:

{
  "model": "gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }]
}

Supported block types:

text
image_url — HTTP URL or base64 data: URI
input_audio — inline base64 audio with format (wav/mp3/pcm16/webm)
file — inline data + mime_type or file_id reference (Files API)
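
The block types above can be assembled with small helpers. This is a sketch under assumptions: all helper names are hypothetical, the nested input_audio object ({"data": ..., "format": ...}) follows OpenAI's audio-input format and is assumed to apply here, and the file block uses the file_id form shown in the Files API example. Inline file data is left out because its exact field layout is not spelled out on this page.

```python
import base64

AUDIO_FORMATS = {"wav", "mp3", "pcm16", "webm"}


def text_block(text: str) -> dict:
    return {"type": "text", "text": text}


def image_block_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Inline an image as a base64 data: URI inside an image_url block."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def audio_block(data: bytes, fmt: str = "wav") -> dict:
    """Inline base64 audio; the nested shape is assumed to match OpenAI's input_audio."""
    if fmt not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt}")
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}


def file_id_block(file_id: str) -> dict:
    """Reference a file previously uploaded through the Files API."""
    return {"type": "file", "file_id": file_id}


def user_message(*blocks: dict) -> dict:
    """Wrap content blocks in a single user message."""
    return {"role": "user", "content": list(blocks)}
```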

Using a Files API reference

{
  "content": [
    {"type": "text", "text": "Review this PDF"},
    {"type": "file", "file_id": "file_abc123..."}
  ]
}

The gateway resolves the file_id to the file's bytes on the server and passes them to the provider. If the ID does not exist or belongs to a different workspace, the request fails with a 400 error that says so.

Tool calls

Tool calling works exactly as in OpenAI's specification: tools, tool_choice, function schemas and tool role messages. Every Infery chat model that supports tools accepts the same format.
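
A sketch of one tool-call round trip in OpenAI's format. The get_weather function and its stub data are hypothetical stand-ins for real tool code. Send TOOLS as the tools field; when the response's finish_reason is "tool_calls", append the assistant message plus the tool messages returned by run_tool_calls to messages, then call the endpoint again.

```python
import json
from typing import Callable, Dict, List


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns stub data for illustration."""
    return {"city": city, "forecast": "sunny"}


TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may call to local Python functions.
REGISTRY: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> List[dict]:
    """Execute each requested tool call and return the tool role messages to append."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not as an object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```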

JSON mode

{"response_format": {"type": "json_object"}}

Or with a schema (Structured Outputs):

{
  "response_format": {
    "type": "json_schema",
    "json_schema": { "schema": { ... }, "strict": true }
  }
}
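
A sketch of building a Structured Outputs request and parsing the reply; the schema and helper names are hypothetical. It also sets a name inside json_schema, which OpenAI's API requires; whether Infery requires it for every provider is an assumption not stated on this page. In OpenAI's strict mode, the schema must also set additionalProperties: false and list every property in required.

```python
import json

# Hypothetical schema for a structured summary.
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "bullets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "bullets"],
    "additionalProperties": False,
}


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build the response_format value for Structured Outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }


def parse_json_reply(completion: dict) -> dict:
    """Decode the assistant's JSON content from a non-streaming completion."""
    return json.loads(completion["choices"][0]["message"]["content"])
```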

Vision and PDFs

Models with supportsVision: true accept images directly, and models with supportsPdf: true read PDFs natively. For other models, the gateway automatically converts PDF pages to images (and extracts their text), so no code changes are required; a small extra fee is charged per page (see billing).

Parameters

The standard OpenAI parameters are supported: temperature, top_p, presence_penalty, frequency_penalty, max_tokens, stop, seed, stream, tools, tool_choice, response_format. One model-specific parameter is also accepted:

top_k — Gemini and some OSS models

Response headers

x-request-id
x-credits-used
x-model-used (when fallback fires)
x-fallback-from, x-fallback-depth
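
These headers can be read on the client, for example to log which model actually answered. A minimal sketch, assuming x-model-used is present only when a fallback fired (as the list above says) and that x-credits-used holds an integer; describe_response_headers is a hypothetical helper.

```python
from typing import Mapping


def describe_response_headers(headers: Mapping[str, str]) -> dict:
    """Extract tracking, billing and fallback details from Infery response headers."""
    # HTTP header names are case-insensitive, so normalise them to lower case first.
    lowered = {name.lower(): value for name, value in headers.items()}
    credits = lowered.get("x-credits-used")
    info = {
        "request_id": lowered.get("x-request-id"),
        "credits_used": int(credits) if credits is not None else None,
        "fallback": None,
    }
    # x-model-used is only sent when a fallback fired.
    if "x-model-used" in lowered:
        info["fallback"] = {
            "model_used": lowered["x-model-used"],
            "from": lowered.get("x-fallback-from"),
            "depth": lowered.get("x-fallback-depth"),
        }
    return info
```

With urllib, pass resp.headers directly: it is an http.client.HTTPMessage, which provides items().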

Authorizations

Authorization (string, header, required): API key in the format Bearer inf_***

Headers

x-request-id (string): Optional request ID for tracking

Body (application/json)

model (string, required): Model ID. Example: "gpt-4o"
messages (object[], required)
temperature (number): Required range: 0 <= x <= 2
max_tokens (integer)
top_p (number): Required range: 0 <= x <= 1
top_k (integer): Top-K sampling (Google Gemini)
presence_penalty (number): Presence penalty (OpenAI, Google Gemini). Required range: -2 <= x <= 2
frequency_penalty (number): Frequency penalty (OpenAI, Google Gemini). Required range: -2 <= x <= 2
seed (integer): Seed for deterministic output (OpenAI, Google Gemini)
stream (boolean): Default: false
stop
tools (object[])
tool_choice (any)
response_format (object)
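
Catching out-of-range values before sending saves a round trip. A sketch that checks only the required fields and numeric ranges listed in the Body reference (the server remains the authority); check_body is a hypothetical helper.

```python
# Ranges taken from the Body reference for this endpoint.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_body(body: dict) -> list:
    """Return human-readable problems with a request body (empty list if none)."""
    problems = []
    for required in ("model", "messages"):
        if required not in body:
            problems.append(f"missing required field: {required}")
    for name, (low, high) in RANGES.items():
        if name in body and not (low <= body[name] <= high):
            problems.append(f"{name}={body[name]} outside [{low}, {high}]")
    return problems
```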

Response

Chat completion result. When stream=true, the response is SSE (text/event-stream) in which each data chunk is a chat.completion.chunk; the final chunk before [DONE] carries usage and credits_used.

id (string): Example: "chatcmpl-abc123"
object (string): Example: "chat.completion"
created (integer): Creation time as a Unix timestamp in seconds. Example: 1713204900
model (string): Example: "gpt-4o"
choices (object[])
usage (object)
credits_used (integer): Credits deducted from the workspace balance for this request. Example: 2
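
A sketch of reading these fields from a non-streaming response, using the sample response at the top of this page as input; summarize_completion is a hypothetical helper.

```python
def summarize_completion(completion: dict) -> dict:
    """Pull the reply text, finish reason, token usage and credits from a response."""
    choice = completion["choices"][0]
    usage = completion.get("usage") or {}
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # Infery-specific field; plain OpenAI responses do not include it.
        "credits_used": completion.get("credits_used"),
    }
```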
