Each model declares capabilities as boolean flags. You can read them via GET /v1/models/{slug} to branch feature-by-feature in your app.
  • supportsChat — Accepts the chat completions format
  • supportsStreaming — Can stream tokens via SSE
  • supportsVision — Accepts image inputs
  • supportsPdf — Accepts PDF natively (otherwise the gateway auto-converts)
  • supportsTools — Honours the tools array + tool_choice
  • supportsJsonMode — Honours response_format: {type: "json_object"} or a JSON schema
  • supportsImages — Generates images as output (distinct from vision, which is image input)
  • isReasoning — Chain-of-thought reasoning model (slower but more accurate on complex tasks)
  • isFlagship — Provider’s top-of-the-line model (highest quality + cost)
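A minimal sketch of feature-by-feature branching on these flags. It assumes GET /v1/models/{slug} returns a JSON object whose boolean fields match the capability names above; only the decision logic is shown (no network calls), and plan_request is a hypothetical helper name, not part of the gateway API.

```python
def plan_request(caps: dict, *, want_stream: bool = False,
                 has_images: bool = False, want_json: bool = False) -> dict:
    """Decide per-request options from a model's capability flags."""
    if has_images and not caps.get("supportsVision", False):
        raise ValueError("model does not accept image inputs")
    return {
        # Reasoning models (o3 / o1) return only the final result,
        # so don't request token streaming from them.
        "stream": (want_stream
                   and caps.get("supportsStreaming", False)
                   and not caps.get("isReasoning", False)),
        "response_format": (
            {"type": "json_object"}
            if want_json and caps.get("supportsJsonMode", False) else None
        ),
    }

# Example: a vision-capable streaming model with JSON mode
caps = {"supportsStreaming": True, "supportsVision": True,
        "supportsJsonMode": True, "isReasoning": False}
opts = plan_request(caps, want_stream=True, want_json=True)
```

In practice you would fetch `caps` once per model slug and cache it, then build each request from the returned flags.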

Provider-by-provider highlights

OpenAI

  • gpt-5-4 — flagship, all capabilities
  • gpt-4o — vision ✓, tools ✓, JSON ✓, streaming ✓
  • o3 / o1 — reasoning ✓ (no streaming tokens — final result only)
  • gpt-4o-audio — audio input natively in chat

Anthropic

  • claude-opus-4-7 — flagship, vision ✓, PDF ✓, tools ✓
  • claude-opus-4-6 — vision ✓, PDF ✓, tools ✓
  • claude-sonnet-4-6 — same with better cost/perf
  • Audio input: not supported; use /v1/audio/transcriptions first
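A sketch of that two-step audio workflow as routing logic. The supportsAudioInput flag and the route_audio helper are assumptions for illustration (the capability table above does not list an audio flag); endpoint paths are the ones named on this page.

```python
def route_audio(caps: dict, has_audio: bool) -> list[str]:
    """Return the ordered endpoints to call before/for the chat request."""
    steps = []
    if has_audio and not caps.get("supportsAudioInput", False):
        # Models without native audio input (e.g. Anthropic):
        # transcribe first, then send the transcript as text.
        steps.append("POST /v1/audio/transcriptions")
    steps.append("POST /v1/chat/completions")
    return steps
```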

Google

  • gemini-2-5-pro — vision ✓, PDF ✓, 2M context window
  • gemini-2-5-flash — fast, huge context, multimodal
  • gemini-2-0-flash — budget option

xAI

  • grok-3 — vision ✓, JSON ✓

Alibaba

  • qwen3-max — vision + PDF + long context (262k)
  • qwen3-5-plus — 1M context
  • qwen-vl-max — vision-specialised

PDF fallback

If a model without supportsPdf receives a PDF, our gateway:
  1. Extracts text via pdftotext
  2. Renders pages to PNG (for vision models)
  3. Injects text + images into the message
  4. Bills a small pdf_extraction_per_page fee
You don’t lift a finger — the fallback works on Grok, DeepSeek, and older models, and the response behaves as if the model had read the PDF natively.
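The fallback steps above can be sketched as a decision function. This mirrors the described gateway behaviour from the model's capability flags; pdf_fallback_plan is a hypothetical name, and the step strings are paraphrases of the list above, not real gateway identifiers.

```python
def pdf_fallback_plan(caps: dict) -> list[str]:
    """Return the gateway's PDF-handling steps for a given model."""
    if caps.get("supportsPdf", False):
        # Native support: no conversion, no extraction fee.
        return ["pass PDF through natively"]
    steps = ["extract text via pdftotext"]
    if caps.get("supportsVision", False):
        # Vision models also get rendered page images.
        steps.append("render pages to PNG")
    steps.append("inject text (and images) into the message")
    steps.append("bill pdf_extraction_per_page")
    return steps
```

A text-only model thus gets extracted text alone, while a vision model without supportsPdf gets both the text and the rendered pages.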