Google

Google’s AI models available through Infery, spanning text, embeddings, image, audio, video, and music generation.

Text models

Model	Slug	Context	Max output	Stream	Tools	JSON	Vision	Files
Gemini 2.5 Computer Use	`gemini-2-5-computer-use`	1M	64K	✓	—	—	✓	PDF, images, audio, video, text, JSON
Gemini 2.5 Flash	`gemini-2-5-flash`	1M	8K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini 2.5 Flash Native Audio	`gemini-2-5-flash-native-audio`	1M	16K	✓	—	—	✓	PDF, images, audio, video, text, JSON
Gemini 2.5 Flash-Lite	`gemini-2-5-flash-lite`	1M	8K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini 2.5 Pro	`gemini-2-5-pro`	1M	64K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini 3 Flash Preview	`gemini-3-flash-preview`	1M	64K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini 3.1 Flash Live	`gemini-3.1-flash-live-preview`	128K	64K	—	—	—	—	PDF, images, audio, video, text, JSON
Gemini 3.1 Flash-Lite Preview	`gemini-3.1-flash-lite-preview`	1M	64K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini 3.1 Pro Preview	`gemini-3.1-pro-preview`	1M	64K	✓	✓	✓	✓	PDF, images, audio, video, text, JSON
Gemini Robotics-ER 1.5	`gemini-robotics-er`	1M	8K	✓	—	—	✓	PDF, images, audio, video, text, JSON

Google models support up to 50 files and 1000 PDF pages per request. Accepted formats: images (JPEG, PNG, GIF, WebP, HEIC), audio (WAV, MP3, OGG, FLAC, WebM, AAC, AIFF, MP4), video (MP4, WebM, MOV), text (plain, HTML, CSV, Markdown), and JSON.
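The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Infery SDK: `check_attachments` is a hypothetical helper, the mapping from format names to file extensions is an assumption, and the 1000-page limit is read here as a total across all PDFs in the request. Page counts are supplied by the caller, since counting pages requires a PDF library.

```python
from pathlib import Path

MAX_FILES = 50          # per-request file limit from the text above
MAX_PDF_PAGES = 1000    # assumed to be a total across all PDFs in one request

# Assumed extension mapping for the accepted formats listed above.
ACCEPTED_EXTENSIONS = {
    ".pdf",
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic",                  # images
    ".wav", ".mp3", ".ogg", ".flac", ".webm", ".aac", ".aiff", ".mp4",  # audio
    ".mov",                                                             # video (MP4, WebM already listed)
    ".txt", ".html", ".csv", ".md",                                     # text
    ".json",
}


def check_attachments(filenames, pdf_page_counts=None):
    """Return a list of problems; an empty list means the request fits the limits."""
    pdf_page_counts = pdf_page_counts or {}
    problems = []

    if len(filenames) > MAX_FILES:
        problems.append(f"{len(filenames)} files exceeds the limit of {MAX_FILES}")

    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext not in ACCEPTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")

    total_pages = sum(pdf_page_counts.values())
    if total_pages > MAX_PDF_PAGES:
        problems.append(f"{total_pages} PDF pages exceeds the limit of {MAX_PDF_PAGES}")

    return problems
```

For example, `check_attachments(["report.pdf", "chart.png"], {"report.pdf": 120})` returns an empty list, while a request with an `.exe` file or 51 attachments returns one message per violation.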

Embedding models

Model	Slug	Max input	Multimodal
Gemini Embedding	`gemini-embedding-001`	8K	—
Gemini Embedding 2	`gemini-embedding-2`	8K	✓ (images, audio, video, PDF)

Image models

Model	Slug	Sizes	Max N	Aspect ratios	Formats	Person gen	Edits
Imagen 4	`imagen-4`	1K, 2K	4	1:1, 3:4, 4:3, 9:16, 16:9	png, jpeg	configurable	✓
Imagen 4 Fast	`imagen-4-fast`	—	4	1:1, 3:4, 4:3, 9:16, 16:9	png, jpeg	configurable	✓
Imagen 4 Ultra	`imagen-4-ultra`	1K, 2K	4	1:1, 3:4, 4:3, 9:16, 16:9	png, jpeg	configurable	✓
Nano Banana	`gemini-2-5-flash-image`	—	—	—	png, jpeg, webp	—	✓
Nano Banana 2	`gemini-3-1-flash-image`	1K, 2K, 4K	4	1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2	png, jpeg, webp	configurable	✓
Nano Banana Pro	`gemini-3-pro-image`	1K, 2K, 4K	4	1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2	png, jpeg, webp	configurable	✓

Audio models

Text-to-speech

Model	Slug	Output formats
Gemini 2.5 Flash TTS	`gemini-2-5-flash-tts`	mp3, opus, wav, flac
Gemini 2.5 Pro TTS	`gemini-2-5-pro-tts`	mp3, opus, wav, flac
Google Cloud TTS	`google-cloud-tts`	mp3, opus, wav, flac

Speech-to-text

Model	Slug	Response formats	Max file	Inputs
Gemini 2.5 Flash STT	`gemini-2-5-flash-stt`	json, text, srt, verbose_json, vtt	25 MB	mp3, wav, ogg, flac, webm, mp4, aac, aiff
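The speech-to-text constraints in the table can be expressed as a small pre-flight check. This is an illustrative sketch only: `stt_precheck` is a hypothetical helper, and treating 25 MB as 25 × 1024 × 1024 bytes is an assumption, since the table does not say whether the limit is decimal or binary.

```python
from pathlib import Path

MAX_STT_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB
STT_INPUTS = {"mp3", "wav", "ogg", "flac", "webm", "mp4", "aac", "aiff"}
STT_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def stt_precheck(filename, size_bytes, response_format="json"):
    """Return a list of errors for a gemini-2-5-flash-stt request; empty if it looks valid."""
    errors = []
    ext = Path(filename).suffix.lower().lstrip(".")

    if ext not in STT_INPUTS:
        errors.append(f"unsupported input format: {ext or '(none)'}")
    if size_bytes > MAX_STT_BYTES:
        errors.append(f"file is {size_bytes} bytes, above the {MAX_STT_BYTES}-byte limit")
    if response_format not in STT_RESPONSE_FORMATS:
        errors.append(f"unsupported response format: {response_format}")

    return errors
```

In practice the size would come from something like `Path(filename).stat().st_size`; it is passed in explicitly here so the check stays independent of the filesystem.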

Video models

Model	Slug	Durations	Resolutions	Aspect ratios	Person gen	Image-to-video
Veo 2	`veo-2`	5, 6, 8s	720p	16:9, 9:16	configurable	✓
Veo 3	`veo-3`	4, 6, 8s	720p, 1080p	16:9, 9:16	configurable	✓
Veo 3 Fast	`veo-3-fast`	4, 6, 8s	720p, 1080p	16:9, 9:16	configurable	✓
Veo 3.1	`veo-3-1`	4, 6, 8s	720p, 1080p, 4K	16:9, 9:16	configurable	✓
Veo 3.1 Fast	`veo-3-1-fast`	4, 6, 8s	720p, 1080p, 4K	16:9, 9:16	configurable	✓
Veo 3.1 Lite	`veo-3-1-lite`	4, 6, 8s	720p, 1080p	16:9, 9:16	configurable	✓ (max 2)
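Because supported durations and resolutions differ between Veo models, it can help to validate parameters against the table before submitting a job. The sketch below transcribes the table into a dictionary; `validate_video_request` and its parameter names are hypothetical and do not reflect a documented Infery request schema.

```python
# Per-model options transcribed from the video models table above.
VEO_OPTIONS = {
    "veo-2":        {"durations": {5, 6, 8}, "resolutions": {"720p"}},
    "veo-3":        {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "veo-3-fast":   {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "veo-3-1":      {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p", "4K"}},
    "veo-3-1-fast": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p", "4K"}},
    "veo-3-1-lite": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
}
ASPECT_RATIOS = {"16:9", "9:16"}  # identical for every Veo model in the table


def validate_video_request(slug, duration_s, resolution, aspect_ratio):
    """Return a list of errors; an empty list means the combination appears in the table."""
    options = VEO_OPTIONS.get(slug)
    if options is None:
        return [f"unknown model slug: {slug}"]

    errors = []
    if duration_s not in options["durations"]:
        errors.append(f"{slug} does not support a {duration_s}s duration")
    if resolution not in options["resolutions"]:
        errors.append(f"{slug} does not support {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {aspect_ratio}")
    return errors
```

For instance, `validate_video_request("veo-3-1", 8, "4K", "9:16")` returns an empty list, whereas requesting 1080p from `veo-2` yields an error because that model is listed with 720p only.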

Music models

Model	Slug	Max duration	Formats	Image input
Lyria 3 Clip	`lyria-3-clip`	30s	mp3	✓ (up to 10 images)
Lyria 3 Pro	`lyria-3-pro`	240s	mp3, wav	✓ (up to 10 images)
