Google’s AI models available through Infery, spanning text, image, audio, video and music generation.

Text models

| Model | Slug | Context | Max output | Stream | Tools | JSON | Vision | Files |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Computer Use | gemini-2-5-computer-use | 1M | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash | gemini-2-5-flash | 1M | 8K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash Native Audio | gemini-2-5-flash-native-audio | 1M | 16K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash-Lite | gemini-2-5-flash-lite | 1M | 8K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Pro | gemini-2-5-pro | 1M | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1M | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Flash Live | gemini-3.1-flash-live-preview | 128K | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Flash-Lite Preview | gemini-3.1-flash-lite-preview | 1M | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Pro Preview | gemini-3.1-pro-preview | 1M | 64K | | | | | PDF, images, audio, video, text, JSON |
| Gemini Robotics-ER 1.5 | gemini-robotics-er | 1M | 8K | | | | | PDF, images, audio, video, text, JSON |

Google models support up to 50 files and 1000 PDF pages per request. Accepted formats: images (JPEG, PNG, GIF, WebP, HEIC), audio (WAV, MP3, OGG, FLAC, WebM, AAC, AIFF, MP4), video (MP4, WebM, MOV), text (plain, HTML, CSV, Markdown), and JSON.
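
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Infery SDK: the function name `check_attachments`, the mapping from format names to file extensions, and the choice to count PDF pages as a total across the request are all assumptions made for illustration.

```python
from pathlib import Path

MAX_FILES = 50         # per request, from the limits above
MAX_PDF_PAGES = 1000   # per request, from the limits above

# Assumed extension mapping for the accepted formats listed above.
ACCEPTED_EXTENSIONS = {
    ".pdf",
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic",                # images
    ".wav", ".mp3", ".ogg", ".flac", ".webm", ".aac", ".aiff", ".mp4",  # audio
    ".mov",                                                           # video (with .mp4, .webm)
    ".txt", ".html", ".csv", ".md",                                   # text
    ".json",
}


def check_attachments(filenames, pdf_page_counts=None):
    """Return a list of problems; an empty list means the request fits the limits.

    pdf_page_counts is a hypothetical {filename: page_count} mapping that the
    caller supplies, since counting PDF pages needs a PDF library.
    """
    pdf_page_counts = pdf_page_counts or {}
    problems = []
    if len(filenames) > MAX_FILES:
        problems.append(f"{len(filenames)} files exceeds the limit of {MAX_FILES}")
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext not in ACCEPTED_EXTENSIONS:
            problems.append(f"{name}: unsupported extension {ext!r}")
    total_pages = sum(pdf_page_counts.values())
    if total_pages > MAX_PDF_PAGES:
        problems.append(f"{total_pages} PDF pages exceeds the limit of {MAX_PDF_PAGES}")
    return problems
```

Calling `check_attachments(["report.pdf", "chart.png"], {"report.pdf": 12})` returns an empty list, while an unsupported extension or a 51st file adds one entry per problem.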

Embedding models

| Model | Slug | Max input | Multimodal |
| --- | --- | --- | --- |
| Gemini Embedding | gemini-embedding-001 | 8K | |
| Gemini Embedding 2 | gemini-embedding-2 | 8K | ✓ (images, audio, video, PDF) |

Image models

| Model | Slug | Sizes | Max N | Aspect ratios | Formats | Person gen | Edits |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Imagen 4 | imagen-4 | 1K, 2K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | |
| Imagen 4 Fast | imagen-4-fast | | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | |
| Imagen 4 Ultra | imagen-4-ultra | 1K, 2K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | |
| Nano Banana | gemini-2-5-flash-image | | | | png, jpeg, webp | | |
| Nano Banana 2 | gemini-3-1-flash-image | 1K, 2K, 4K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2 | png, jpeg, webp | configurable | |
| Nano Banana Pro | gemini-3-pro-image | 1K, 2K, 4K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2 | png, jpeg, webp | configurable | |

Audio models

Text-to-speech

| Model | Slug | Output formats |
| --- | --- | --- |
| Gemini 2.5 Flash TTS | gemini-2-5-flash-tts | mp3, opus, wav, flac |
| Gemini 2.5 Pro TTS | gemini-2-5-pro-tts | mp3, opus, wav, flac |
| Google Cloud TTS | google-cloud-tts | mp3, opus, wav, flac |

Speech-to-text

| Model | Slug | Response formats | Max file | Inputs |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash STT | gemini-2-5-flash-stt | json, text, srt, verbose_json, vtt | 25 MB | mp3, wav, ogg, flac, webm, mp4, aac, aiff |
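
The speech-to-text constraints above (input type, 25 MB file limit, response format) lend themselves to a pre-upload check. This is a sketch under stated assumptions: `validate_stt_request` is a hypothetical helper, not an Infery API, and "25 MB" is read as 25,000,000 bytes, the stricter of the two common interpretations.

```python
# Values taken from the Gemini 2.5 Flash STT row above.
STT_INPUTS = {"mp3", "wav", "ogg", "flac", "webm", "mp4", "aac", "aiff"}
STT_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

# Assumption: "25 MB" means decimal megabytes (the conservative reading).
STT_MAX_BYTES = 25 * 1000 * 1000


def validate_stt_request(filename, size_bytes, response_format="json"):
    """Raise ValueError if the upload falls outside the listed constraints."""
    # Take the text after the last dot as the extension; no dot means no extension.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in STT_INPUTS:
        raise ValueError(f"unsupported input type: {ext or '(none)'}")
    if size_bytes > STT_MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {STT_MAX_BYTES}")
    if response_format not in STT_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response format: {response_format}")
```

For a file on disk, `os.path.getsize(path)` supplies `size_bytes`. Longer recordings would need to be split or re-encoded by the caller before upload.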

Video models

| Model | Slug | Durations | Resolutions | Aspect ratios | Person gen | Image-to-video |
| --- | --- | --- | --- | --- | --- | --- |
| Veo 2 | veo-2 | 5, 6, 8s | 720p | 16:9, 9:16 | configurable | |
| Veo 3 | veo-3 | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | |
| Veo 3 Fast | veo-3-fast | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | |
| Veo 3.1 | veo-3-1 | 4, 6, 8s | 720p, 1080p, 4K | 16:9, 9:16 | configurable | |
| Veo 3.1 Fast | veo-3-1-fast | 4, 6, 8s | 720p, 1080p, 4K | 16:9, 9:16 | configurable | |
| Veo 3.1 Lite | veo-3-1-lite | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | ✓ (max 2) |
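
Because supported durations and resolutions differ between Veo models, a lookup table can catch invalid combinations before a request is made. The sketch below encodes only the values listed above; `VEO_OPTIONS` and `check_veo_request` are hypothetical names for illustration, and nothing here reflects how Infery itself validates requests.

```python
# Durations (seconds) and resolutions per model slug, from the table above.
VEO_OPTIONS = {
    "veo-2": {"durations": {5, 6, 8}, "resolutions": {"720p"}},
    "veo-3": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "veo-3-fast": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
    "veo-3-1": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p", "4K"}},
    "veo-3-1-fast": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p", "4K"}},
    "veo-3-1-lite": {"durations": {4, 6, 8}, "resolutions": {"720p", "1080p"}},
}

# Every listed Veo model accepts the same two aspect ratios.
ASPECT_RATIOS = {"16:9", "9:16"}


def check_veo_request(slug, duration_s, resolution, aspect_ratio):
    """Return a list of problems; an empty list means the combination is listed."""
    options = VEO_OPTIONS.get(slug)
    if options is None:
        return [f"unknown model slug {slug!r}"]
    problems = []
    if duration_s not in options["durations"]:
        problems.append(f"{slug} does not list a {duration_s}s duration")
    if resolution not in options["resolutions"]:
        problems.append(f"{slug} does not list {resolution} output")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio {aspect_ratio} is not listed")
    return problems
```

For example, `check_veo_request("veo-3-1", 8, "4K", "16:9")` returns an empty list, whereas asking `veo-2` for a 4-second 1080p clip reports both the duration and the resolution.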

Music models

| Model | Slug | Max duration | Formats | Image input |
| --- | --- | --- | --- | --- |
| Lyria 3 Clip | lyria-3-clip | 30s | mp3 | ✓ (up to 10 images) |
| Lyria 3 Pro | lyria-3-pro | 240s | mp3, wav | ✓ (up to 10 images) |