# gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
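The configurable thinking budget mentioned above can be set per request. A minimal sketch, assuming the public Gemini REST API's `generationConfig.thinkingConfig.thinkingBudget` field (the field names are an assumption based on the Gemini API, not taken from this page); the snippet only builds the request payload rather than calling the API:

```python
import json

# Hypothetical sketch: build a generateContent request for gemini-2.5-flash-lite
# with an explicit thinking budget. Field names assume the public Gemini REST
# API; verify them against the current API reference before use.
MODEL = "gemini-2.5-flash-lite"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(prompt: str, thinking_budget: int = 0) -> str:
    """Return a JSON request body; thinking_budget=0 disables thinking for lowest latency."""
    payload = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
    return json.dumps(payload)

body = build_request("Summarize this page.", thinking_budget=1024)
print(body)
```

Setting the budget to 0 keeps the model in its fastest mode; raising it trades latency and cost for more reasoning.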
| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| vercel | vercel | $0.10 | $0.40 | Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window. | |
| poe | poe | $0.07 | $0.28 | A lightweight Gemini 2.5 Flash reasoning model optimized for cost efficiency and low latency. Supports web search and 1 million tokens of input context. Serves the latest `gemini-2.5-flash-lite-preview-09-2025` snapshot. For more complex queries, use https://poe.com/Gemini-2.5-Pro or https://poe.com/Gemini-2.5-Flash. To increase thinking effort, append `--thinking_budget` followed by a number from 0 to 24,576 to your message. To control web search and real-time information access, append `--web_search true` to enable it or `--web_search false` to disable it (the default). | |
| helicone | models-dev | $0.10 | $0.40 | Provider: Helicone, Context: 1048576, Output Limit: 65535 | |
| google | models-dev | $0.10 | $0.40 | Provider: Google, Context: 1048576, Output Limit: 65536 | |
| googlevertex | models-dev | $0.10 | $0.40 | Provider: Vertex, Context: 1048576, Output Limit: 65536 | |
| zenmux | models-dev | $0.10 | $0.40 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 | |
| vertex | litellm | $0.10 | $0.40 | Source: vertex, Context: 1048576 | |
| gemini | litellm | $0.10 | $0.40 | Source: gemini, Context: 1048576 | |
| openrouter | openrouter | $0.10 | $0.40 | Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e., multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576 | |
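The OpenRouter entry notes that thinking is off by default and can be enabled via the reasoning request parameter. A minimal sketch, assuming OpenRouter's OpenAI-compatible chat-completions body and a `reasoning` object with a `max_tokens` field (both assumptions based on OpenRouter's reasoning-tokens docs, and the `google/gemini-2.5-flash-lite` slug is likewise an assumption); the snippet builds the request body without sending it:

```python
import json
from typing import Optional

# Hypothetical sketch: an OpenRouter chat-completions body for Gemini 2.5
# Flash-Lite. The `reasoning` field shape and the model slug are assumptions;
# check OpenRouter's reasoning-tokens documentation before relying on them.
def build_openrouter_body(prompt: str, reasoning_tokens: Optional[int] = None) -> dict:
    body = {
        "model": "google/gemini-2.5-flash-lite",
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_tokens is not None:
        # Omitting `reasoning` keeps the default: thinking disabled for speed.
        body["reasoning"] = {"max_tokens": reasoning_tokens}
    return body

fast = build_openrouter_body("Quick answer, please.")
smart = build_openrouter_body("Hard problem, think it through.", reasoning_tokens=2048)
print(json.dumps(smart))
```

Leaving the `reasoning` key out preserves the speed-first default described in the table; including it opts that one request into paid reasoning tokens.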