Speech-to-text
POST /v1/audio/transcriptions: transcribe audio to text.
We accept two request encodings for STT: multipart/form-data and JSON with base64-encoded audio. Multipart is what the OpenAI SDK sends by default, so the SDK works out of the box.
- Multipart (OpenAI SDK default)
- JSON base64 (for lightweight HTTP clients)
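A minimal sketch of what a multipart request looks like on the wire, built with the Python standard library but not sent. Assumptions: the host in BASE_URL is a placeholder (only the path is documented here), the audio goes in a form field named "file" as in the OpenAI API, and any other form fields (response_format, or a model field if the service requires one) are passed through as given. build_multipart_request is a hypothetical helper, not part of any SDK.

```python
import urllib.request
import uuid

# Placeholder host: only the path /v1/audio/transcriptions is documented.
BASE_URL = "https://api.example.com"


def build_multipart_request(api_key: str, filename: str, audio: bytes,
                            fields: dict) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data transcription request.

    Hypothetical helper. The "file" field name follows the OpenAI API
    convention and is an assumption for this service.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # One part per plain-text form field (e.g. response_format).
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The audio part carries the filename so the server can see the extension.
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    chunks.append(audio)
    chunks.append(b"\r\n")
    # The closing boundary ends the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions",
        data=b"".join(chunks),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage with dummy bytes; pass req to urllib.request.urlopen to actually send it.
req = build_multipart_request("inf_...", "clip.wav", b"RIFF....",
                              {"response_format": "json"})
```

In practice the OpenAI SDK builds this body for you; the sketch only shows the shape of what it sends.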
Response formats
json (default), text, srt, verbose_json (includes segments and word-level timestamps), vtt.
Limits
- Max 25 MB audio per request
- Formats: MP3, MP4, M4A, WAV, WebM, OGG, FLAC
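A client-side pre-flight check can catch requests that would break these limits before uploading. This is a sketch under two assumptions: "25 MB" is read conservatively as 25,000,000 bytes (stricter than 25 MiB), and the format is judged by file extension only, which is a simplification; the server may inspect the actual container. audio_problems is a hypothetical helper.

```python
import os

# Conservative reading of "25 MB": 25,000,000 bytes. If the server means
# 25 MiB, this check is slightly stricter than necessary.
MAX_AUDIO_BYTES = 25 * 1000 * 1000
# Extensions matching the documented formats; checking the extension
# instead of the real container format is a simplification.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".ogg", ".flac"}


def audio_problems(filename: str, size_bytes: int) -> list:
    """Return reasons the upload would likely be rejected (empty list if none)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_AUDIO_BYTES}")
    return problems
```

For a file on disk, pass os.path.getsize(path) as size_bytes.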
Authorizations
Pass your API key in the Authorization header: Bearer inf_***
Body
application/json
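For the JSON variant, a sketch of base64-encoding the audio and building the request with the standard library (not sent). The JSON keys used here ("file" for the base64 audio, plus response_format) are assumptions, since the body schema is not reproduced on this page; check the actual schema before relying on them. The host is a placeholder and build_json_request is hypothetical.

```python
import base64
import json
import urllib.request

# Placeholder host: only the path /v1/audio/transcriptions is documented.
BASE_URL = "https://api.example.com"


def build_json_request(api_key: str, audio: bytes,
                       response_format: str = "json") -> urllib.request.Request:
    """Build, but do not send, a JSON + base64 transcription request.

    Hypothetical helper; the "file" key for the base64 audio is an assumption.
    """
    payload = {
        # Standard base64 (RFC 4648), decoded to an ASCII str so it fits in JSON.
        "file": base64.b64encode(audio).decode("ascii"),
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_json_request("inf_...", b"\x00\x01fake-audio",
                         response_format="verbose_json")
```

Base64 makes the payload roughly a third larger than the raw audio, so a file close to the 25 MB limit produces a noticeably bigger JSON body than the equivalent multipart upload.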
Response
Transcription result. Shape depends on response_format: JSON (json, verbose_json) or plain text (text, srt, vtt).
The JSON response schema applies when response_format is json (default) or verbose_json.
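A sketch of client-side response handling that follows the rule above: decode JSON for json and verbose_json, and return the body as a string for text, srt and vtt. It assumes plain-text bodies are UTF-8 and makes no assumption about the keys inside the JSON; parse_transcription is a hypothetical helper.

```python
import json
from typing import Any

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(body: bytes, response_format: str = "json") -> Any:
    """Decode a transcription response according to the requested format."""
    if response_format in JSON_FORMATS:
        # json.loads accepts bytes directly (UTF-8/16/32 are auto-detected).
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        # Assumption: plain-text formats are UTF-8 encoded.
        return body.decode("utf-8")
    raise ValueError(f"unknown response_format: {response_format!r}")
```

Passing the same response_format you sent in the request keeps the parser and the request in sync.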

