Google: Gemini 3.1 Flash Lite
google/gemini-3.1-flash-lite
About
Gemini 3.1 Flash Lite is Google's most cost-efficient Gemini model, generally available and optimized for low-latency, high-volume workloads. It accepts text, image, video, audio, and PDF inputs with text output, supports a roughly 1M-token context window with up to 64K output tokens, and targets lightweight agentic workflows, simple data extraction, and applications where responsiveness and API cost are the primary constraints. It supports configurable thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs and includes targeted improvements for instruction following and audio-input quality.
Capabilities
- Context Length: 1.0M tokens
- Max Output: 65.5K tokens
- Reasoning: Yes
- In: text, image, video, file, audio
- Out: text
Benchmarks
[Benchmark charts: Reasoning & Knowledge; Coding & Agentic]
Source: Artificial Analysis
Pricing
| Type | Price (per 1M tokens unless noted) |
|---|---|
| Input | $0.25 |
| Output | $1.50 |
| Cache Read | $0.025 |
| Cache Write | $0.083333 |
| Audio Input | $0.50 |
| Audio Cache | $0.05 |
| Reasoning | $1.50 |
| Image Input | $0.25 |
| Web Search | $0.01 / call |
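
To make the rates in the pricing table concrete, here is a minimal Python sketch that estimates the cost of a single request. The function name `estimate_cost` and its parameters are hypothetical. It also rests on two billing assumptions that the table does not state: cached input tokens are charged at the Cache Read rate instead of the Input rate, and reasoning tokens are counted separately from output tokens. Check an actual invoice before relying on either.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "input": 0.25,
    "output": 1.50,
    "cache_read": 0.025,
    "reasoning": 1.50,
}
WEB_SEARCH_PER_CALL = 0.01  # USD per web search call


def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  reasoning_tokens=0, web_searches=0):
    """Return an estimated request cost in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is billed at the cache-read rate instead of the input rate.
    Assumption: reasoning_tokens are not already included in output_tokens.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    token_cost = (
        uncached_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
        + reasoning_tokens * PRICE_PER_MILLION["reasoning"]
    ) / 1_000_000
    return token_cost + web_searches * WEB_SEARCH_PER_CALL
```

Under these assumptions, a request with 10,000 uncached input tokens and 1,000 output tokens comes to roughly $0.004: $0.0025 for input plus $0.0015 for output.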
OpenAI-compatible · Model ID google/gemini-3.1-flash-lite
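For callers working in Python rather than the shell, the request sent by the curl example on this page can be sketched with the standard library alone. The endpoint URL, model ID, and placeholder key format are taken from that example. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the sketch assumes the endpoint returns JSON, as OpenAI-compatible chat completion endpoints normally do.

```python
import json
import urllib.request

# Endpoint and model ID as they appear in the curl example on this page.
API_URL = "https://api.elliotgate.com/v1/chat/completions"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_chat_request(prompt, api_key):
    """Build (but do not send) a POST request carrying one user message."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping construction separate from sending lets the payload be inspected or logged before any network call. If the response follows the usual OpenAI chat completion shape, the reply text would sit under `choices[0].message.content`, but that shape is an assumption here, not something this page documents.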
curl https://api.elliotgate.com/v1/chat/completions \
  -H "Authorization: Bearer sk-omg-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
OFTEN COMPARED
Gemini 3.1 Flash Lite comparisons
Decide which model wins on the dimensions that matter for your workload — context, benchmarks, pricing, or serving latency.
- Gemini 3.1 Flash Lite vs GPT-5.4 Nano
- Gemini 3.1 Flash Lite vs Claude Haiku 4.5
- Gemini 3.1 Flash Lite vs GLM 5 Turbo
- Gemini 3.1 Flash Lite vs Gemini 3.1 Flash Lite Preview
- Gemini 3.1 Flash Lite vs Gemini 3.5 Flash: Both models come from Google's Gemini Flash family, share the same multimodal input stack (text, image, video, audio, and PDF), and offer identical 1M-token context windows with 64K output.