POST /v1/audio/transcriptions

Speech-to-text
curl --request POST \
  --url https://api.infery.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "audio": "<string>",
  "language": "<string>",
  "response_format": "json"
}
'
{
  "text": "Hello, this is a test transcription.",
  "language": "en",
  "duration": 12.5,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 12.5,
      "text": "Hello, this is a test transcription.",
      "avg_logprob": -0.21,
      "compression_ratio": 1.3,
      "no_speech_prob": 0.01
    }
  ],
  "credits_used": 3
}
We accept both multipart form data and JSON with base64-encoded audio for STT, so the OpenAI SDK works out of the box.

Multipart (OpenAI SDK default)

from openai import OpenAI

# Point the OpenAI SDK at the Infery base URL; auth and wire format are compatible.
client = OpenAI(api_key=API_KEY, base_url="https://api.infery.ai/v1")

with open("meeting.mp3", "rb") as f:
    tr = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        response_format="verbose_json",
    )
print(tr.text)

JSON base64 (light HTTP clients)

curl https://api.infery.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "file_base64": "<base64 audio>",
    "filename": "meeting.mp3",
    "response_format": "json"
  }'
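If you prefer to build the JSON body in Python rather than shell, the payload is just the base64-encoded file bytes. A minimal sketch — field names are taken from the curl example above; `build_transcription_payload` is our illustrative helper, and the final POST is left to whatever HTTP client you use:

```python
import base64


def build_transcription_payload(path: str, model: str = "whisper-1") -> dict:
    """Build the JSON body for POST /v1/audio/transcriptions from a local file."""
    with open(path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "file_base64": audio_b64,
        "filename": path.rsplit("/", 1)[-1],
        "response_format": "json",
    }


# Send with any HTTP client, e.g.:
#   requests.post("https://api.infery.ai/v1/audio/transcriptions",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=build_transcription_payload("meeting.mp3"))
```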

Response formats

json (default), text, srt, verbose_json (with segments + word timestamps), vtt.
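If you already have a `verbose_json` response and want SRT (or custom subtitle formatting) client-side, the segment timestamps convert directly. A hypothetical helper, assuming each segment carries `start` and `end` in seconds as documented below:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render verbose_json segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

This is only worthwhile when you need formatting the built-in `srt` response format doesn't give you; otherwise just request `response_format: "srt"`.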

Limits

  • Max 25 MB audio per request
  • Formats: MP3, MP4, M4A, WAV, WebM, OGG, FLAC
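Both limits can be checked client-side before uploading, which saves a round trip on oversized or unsupported files. A small sketch — the 25 MB cap and format list come from above; the helper name is ours, and we assume the cap is binary (25 MiB):

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # assuming a binary 25 MiB per-request cap
ALLOWED_EXTS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".ogg", ".flac"}


def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the API limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; max is {MAX_BYTES}")
```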

Authorizations

Authorization
string
header
required

API key in format: Bearer inf_***

Body

application/json
model
string
required

Model ID to use for STT

audio
string
required

Base64-encoded audio data

language
string

Language of the audio (ISO-639-1)

response_format
enum<string>
default:json
Available options:
json,
text,
srt,
verbose_json,
vtt

Response

Transcription result. Shape depends on response_format: JSON (json, verbose_json) or plain text (text, srt, vtt).

response_format: json (default) or verbose_json

text
string
Example:

"Hello, this is a test transcription."

language
string

Detected language (verbose_json only)

Example:

"en"

duration
number

Audio duration in seconds (verbose_json only)

Example:

12.5

segments
object[]

Time-stamped segments (verbose_json only)

credits_used
integer
Example:

3