Per-API-key limits

Primary rate-limiting is per API key, implemented as a sliding window in Redis. Limit varies by plan and any attached quota preset:
Plan         Default RPM   Configurable max (preset)
Free         10            10
Starter      30            60
Growth       60            120
Pro          120           240
Business     200           400
Scale        400           800
Enterprise   Custom        Custom
You can see the effective value for each key in Settings → API Keys.

Exceeded response

HTTP 429 Too Many Requests
Retry-After: 12
{
  "error": {
    "message": "Rate limit: 30 req/min on this API key",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
The Retry-After header tells you how many seconds until your window frees up. Honour it.
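As a minimal sketch (helper name and defaults are hypothetical, not part of the gateway SDK), a client can honour Retry-After when the header is present and fall back to exponential backoff with jitter otherwise:

```python
import random

def backoff_delay(attempt, headers=None, base=1.0, cap=30.0, max_jitter=0.5):
    """Seconds to wait before retrying a rate-limited request.

    Prefer the server's Retry-After header when present; otherwise
    fall back to exponential backoff (1s, 2s, 4s, ... capped) plus
    0-500 ms of random jitter to avoid synchronised retries.
    """
    headers = headers or {}
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # malformed header: fall through to backoff
    return min(base * (2 ** attempt), cap) + random.uniform(0, max_jitter)
```

Call it after each 429, with attempt counting from 0, and sleep for the returned number of seconds before retrying.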

Daily token budgets

In addition to RPM, some plans also cap total tokens per day (rateLimitTpd in your quota preset). Exceeding the daily budget returns the same 429 status with a different error message.

Global safety net

We enforce a ceiling of 5 000 requests per 10 minutes per IP at the edge to stop scraping. It almost never trips for legitimate users; in practice it only catches poorly configured crawlers.

Best practices

  1. Always respect Retry-After. Don’t hammer.
  2. Use exponential backoff with jitter. 1s → 2s → 4s → 8s (+random 0–500 ms) is fine.
  3. Consider fallback chains. If 429s on gpt-4o matter, set a fallback to gpt-4o-mini or gemini-flash — the gateway handles retry for you.
  4. Use separate keys per environment. Dev traffic with a 30 rpm preset, prod on 400. Never mix.
  5. Queue on your side too. For batch workloads, implement a local rate limiter that stays below your key’s RPM — that way short bursts don’t get rejected.
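The local rate limiter in point 5 can be sketched as a client-side token bucket (class name and the injectable clock/sleep hooks are illustrative, not a gateway API): acquire() blocks until a request slot is free, so short bursts queue locally instead of being rejected with 429s.

```python
import time

class LocalRateLimiter:
    """Client-side token bucket that stays below an API key's RPM.

    Tokens refill continuously at rpm/60 per second; a full bucket
    allows a burst of up to `rpm` requests before acquire() blocks.
    """

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.capacity = rpm
        self.tokens = float(rpm)
        self.rate = rpm / 60.0  # tokens refilled per second
        self.clock = clock      # injectable for testing
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one request slot is available, then consume it."""
        self._refill()
        while self.tokens < 1:
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

Wrap every outbound call in limiter.acquire() and set the RPM slightly below your key's preset to leave headroom for retries.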

Rate limits on specific endpoints

  • /contact/sales: 5 req / hour per IP (spam protection)
  • /public/plans and /public/models: 30 req / min per IP
  • /v1/files upload — per-key RPM applies; additionally serialised by workspace (one upload at a time)