
Vercel Models

195 Models
Name Model ID Input Price ($/1M) Output Price ($/1M) Description Free
Grok Code Fast 1 grok-code-fast-1 0.20 1.50 xAI's latest coding model that offers fast agentic coding with a 256K context window.
Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.
Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
Gemini 3 Flash gemini-3-flash 0.50 3.00 Google's most intelligent model built for speed, combining frontier intelligence with superior search and grounding.
Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Claude Haiku 4.5 matches Sonnet 4's performance on coding, computer use, and agent tasks at substantially lower cost and faster speeds. It delivers near-frontier performance and Claude’s unique character at a price point that works for scaled sub-agent deployments, free tier products, and intelligence-sensitive applications with budget constraints.
MiniMax M2 minimax-m2 0.27 1.15 MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence.
Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
DeepSeek V3.2 deepseek-v3.2 0.27 0.40 DeepSeek-V3.2: Official successor to V3.2-Exp.
Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Claude Opus 4.5 is Anthropic’s latest model in the Opus series, meant for demanding reasoning tasks and complex problem solving. This model has improvements in general intelligence and vision compared to previous iterations. In addition, it is suited for difficult coding tasks and agentic workflows, especially those with computer use and tool use, and can effectively handle context usage and external memory files.
Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
GPT-5.2 gpt-5.2 1.75 14.00 GPT-5.2 is OpenAI's best general-purpose model, part of the GPT-5 flagship model family. It's their most intelligent model yet for both general and agentic tasks.
Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Claude Sonnet 4 significantly improves on Sonnet 3.7's industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.
Grok 4.1 Fast Non-Reasoning grok-4.1-fast-non-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for speed, use this variant; otherwise, use the reasoning version.
Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 This model improves upon Gemini 2.5 Pro and is catered towards challenging tasks, especially those involving complex reasoning or agentic workflows. Improvements highlighted include use cases for coding, multi-step function calling, planning, reasoning, deep knowledge tasks, and instruction following.
GPT-5 mini gpt-5-mini 0.25 2.00 GPT-5 mini is a cost optimized model that excels at reasoning/chat tasks. It offers an optimal balance between speed, cost, and capability.
GPT-5 gpt-5 1.25 10.00 GPT-5 is OpenAI's flagship language model that excels at complex reasoning, broad real-world knowledge, code-intensive, and multi-step agentic tasks.
GPT-5 Chat gpt-5-chat 1.25 10.00 GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT.
GPT-5 nano gpt-5-nano 0.05 0.40 GPT-5 nano is a high throughput model that excels at simple instruction or classification tasks.
GPT-4.1 mini gpt-4.1-mini 0.40 1.60 GPT 4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
GPT-5-Codex gpt-5-codex 1.25 10.00 GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Gemini 2.5 Pro is our most advanced reasoning Gemini model, capable of solving complex problems. Gemini 2.5 Pro can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories.
GLM 4.6 glm-4.6 0.45 1.80 As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
Grok 4 Fast Non-Reasoning grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
gpt-oss-120b gpt-oss-120b 0.10 0.50 Extremely capable general-purpose LLM with strong, controllable reasoning capabilities.
gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.08 0.30 OpenAI's first open weight reasoning model specifically trained for safety classification tasks. Fine-tuned from GPT-OSS, this model helps classify text content based on customizable policies, enabling bring-your-own-policy Trust & Safety AI where your own taxonomy, definitions, and thresholds guide classification decisions.
GPT-5.1 Instant gpt-5.1-instant 1.25 10.00 GPT-5.1 Instant (or GPT-5.1 chat) is a warmer and more conversational version of GPT-5-chat, with improved instruction following and adaptive reasoning for deciding when to think before responding.
GPT-4o mini gpt-4o-mini 0.15 0.60 GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
MiniMax M2.1 minimax-m2.1 0.30 1.20 MiniMax M2.1 is MiniMax's latest model, optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning.
Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
Devstral 2 devstral-2 0.00 0.00 An enterprise-grade text model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
GPT-5.1 Thinking gpt-5.1-thinking 1.25 10.00 An upgraded version of GPT-5 that adapts thinking time more precisely to the question, spending more time on complex questions and responding more quickly to simpler tasks.
text-embedding-3-small text-embedding-3-small 0.02 0.00 OpenAI's improved, more performant version of their ada embedding model.
Grok 4.1 Fast Reasoning grok-4.1-fast-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for maximal intelligence, use this variant; otherwise, use the non-reasoning version.
DeepSeek V3.2 Thinking deepseek-v3.2-thinking 0.28 0.42 Thinking mode of DeepSeek V3.2
GLM 4.7 glm-4.7 0.43 1.75 GLM-4.7 is Z.ai’s latest flagship model, with major upgrades focused on two key areas: stronger coding capabilities and more stable multi-step reasoning and execution.
Ministral 3B ministral-3b 0.04 0.04 A compact, efficient model for on-device tasks like smart assistants and local analytics, offering low-latency performance.
Devstral Small 2 devstral-small-2 0.00 0.00 Our open source model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
Mistral Embed mistral-embed 0.10 0.00 General-purpose text embedding model for semantic search, similarity, clustering, and RAG workflows.
Nova Lite nova-lite 0.06 0.24 A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.
Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks. Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, and handles complex, multi-step problems with more rigor and attention to detail.
Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.09 1.10 A new generation of open-source, non-thinking mode model powered by Qwen3. This version demonstrates superior Chinese text understanding, augmented logical reasoning, and enhanced capabilities in text generation tasks over the previous iteration (Qwen3-235B-A22B-Instruct-2507).
GPT-4.1 gpt-4.1 2.00 8.00 GPT 4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
GPT-4o gpt-4o 2.50 10.00 GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
GPT-4.1 nano gpt-4.1-nano 0.10 0.40 GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model.
GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 GPT-5.1-Codex-Max is purpose-built for agentic coding.
Grok 4 Fast Reasoning grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
Grok 4 grok-4 3.00 15.00 xAI's latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
Nano Banana (Gemini 2.5 Flash Image) gemini-2.5-flash-image 0.30 2.50 Nano Banana (Gemini 2.5 Flash Image) is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
Nano Banana Pro (Gemini 3 Pro Image) gemini-3-pro-image 2.00 120.00 Nano Banana Pro (Gemini 3 Pro Image) extends Nano Banana's generation capabilities into a new era of studio-quality, functional design, helping you create and edit high-fidelity, production-ready visuals with unparalleled precision and control. Improvements include enhanced world knowledge and reasoning, dynamic text and translation, and studio-level controls.
gpt-oss-20b gpt-oss-20b 0.07 0.30 A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments.
Gemini Embedding 001 gemini-embedding-001 0.15 0.00 State-of-the-art embedding model with excellent performance across English, multilingual and code tasks.
o4-mini o4-mini 1.10 4.40 OpenAI's o4-mini delivers fast, cost-efficient reasoning with exceptional performance for its size, particularly excelling in math (best-performing on AIME benchmarks), coding, and visual tasks.
Sonar sonar 1.00 1.00 Perplexity's lightweight offering with search grounding, quicker and cheaper than Sonar Pro.
Kimi K2 0905 kimi-k2-0905 0.60 2.50 Kimi K2 0905 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.
Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
text-embedding-3-large text-embedding-3-large 0.13 0.00 OpenAI's most capable embedding model for both English and non-English tasks.
Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Gemini 2.0 Flash Lite delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
Claude Opus 4 claude-opus-4 15.00 75.00 Claude Opus 4 is Anthropic's most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.
Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 The model powering ChatGPT is gpt-5.2-chat-latest: this is OpenAI's best general-purpose model, part of the GPT-5 flagship model family.
Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
GPT-5.1 Codex mini gpt-5.1-codex-mini 0.25 2.00 GPT-5.1 Codex mini is a smaller, faster, and cheaper version of GPT-5.1 Codex.
DeepSeek V3.2 Exp deepseek-v3.2-exp 0.27 0.40 DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
MiMo V2 Flash mimo-v2-flash 0.10 0.29 Xiaomi MiMo-V2-Flash is a proprietary MoE model developed by Xiaomi, designed for extreme inference efficiency with 309B total parameters (15B active). By incorporating an innovative Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks among the top 2 global open-source models across multiple Agent benchmarks.
DeepSeek V3 0324 deepseek-v3 0.77 0.77 Fast general-purpose LLM with enhanced reasoning capabilities.
Mistral Small mistral-small 0.10 0.30 Mistral Small is the ideal choice for simple tasks that one can do in bulk - like Classification, Customer Support, or Text Generation. It offers excellent performance at an affordable price point.
o3 o3 2.00 8.00 OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
Qwen3 Max qwen3-max 1.20 6.00 Compared to the preview version, the Qwen3 series Max model has undergone specialized upgrades in agent programming and tool invocation. This official release achieves state-of-the-art (SOTA) performance in its field and is better suited to agents operating in more complex scenarios.
Llama 3.3 70B llama-3.3-70b 0.72 0.72 The upgraded Llama 3.1 70B model features enhanced reasoning, tool use, and multilingual abilities, along with a significantly expanded 128K context window. These improvements make it well-suited for demanding tasks such as long-form summarization, multilingual conversations, and coding assistance.
Llama 3.1 8B llama-3.1-8b 0.03 0.05 Llama 3.1 8B brings powerful performance in a smaller, more efficient package. With improved multilingual support, tool use, and a 128K context length, it enables sophisticated use cases like interactive agents and compact coding assistants while remaining lightweight and accessible.
GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 GPT-5.1-Codex is a version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
Kimi K2 Thinking kimi-k2-thinking 0.47 2.00 Kimi K2 Thinking is an advanced open-source thinking model by Moonshot AI. It can execute up to 200 – 300 sequential tool calls without human interference, reasoning coherently across hundreds of steps to solve complex problems. Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
KAT-Coder-Pro V1 kat-coder-pro-v1 0.00 0.00 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KwaiKAT series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a remarkable 73.4% solve rate on the SWE-Bench Verified benchmark. KAT-Coder-Pro V1 delivers top-tier coding performance and has been rigorously tested by thousands of in-house engineers. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
Qwen3 235B A22B Instruct 2507 qwen-3-235b 0.13 0.60 Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
MiniMax M2.1 Lightning minimax-m2.1-lightning 0.30 2.40 MiniMax-M2.1-lightning is a faster version of MiniMax-M2.1, offering the same performance but with significantly higher throughput (output speed ~100 TPS, MiniMax-M2 output speed ~60 TPS).
Kimi K2 kimi-k2 0.50 2.00 Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
DeepSeek R1 0528 deepseek-r1 0.50 2.15 The latest revision of DeepSeek's first-generation reasoning model.
text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 OpenAI's legacy text embedding model.
Llama 4 Scout 17B 16E Instruct llama-4-scout 0.08 0.30 The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts. Served by DeepInfra.
o3-mini o3-mini 1.10 4.40 o3-mini is OpenAI's most recent small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini.
DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.27 1.00 DeepSeek-V3.1-Terminus delivers more stable & reliable outputs across benchmarks compared to the previous version and addresses user feedback (i.e. language consistency and agent upgrades).
Mistral Large 3 mistral-large-3 0.50 1.50 Mistral Large 3 2512 is Mistral’s most capable model to date. It has a sparse mixture-of-experts architecture with 41B active parameters (675B total).
Pixtral 12B 2409 pixtral-12b 0.15 0.15 A 12B model with image understanding capabilities in addition to text.
Sonar Pro sonar-pro 3.00 15.00 Perplexity's premier offering with search grounding, supporting advanced queries and follow-ups.
GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 For local deployment and low-latency applications. The GLM-4.6V series is Z.ai's line of multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SoTA performance in visual understanding among models of similar parameter scales.
Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 High-speed version of kimi-k2-thinking, suitable for scenarios requiring both deep reasoning and extremely fast responses.
Llama 4 Maverick 17B 128E Instruct llama-4-maverick 0.15 0.60 Llama 4 Maverick 17B-128E is Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities.
DeepSeek V3.1 deepseek-v3.1 0.30 1.00 DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
Kimi K2 Turbo kimi-k2-turbo 2.40 10.00 Kimi K2 Turbo is the high-speed version of kimi-k2. It has the same model parameters as kimi-k2, but output speed is increased to 60 tokens per second (up to 100 tokens per second), with a 256K context length.
Grok 3 Mini Beta grok-3-mini 0.30 0.50 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.
Claude 3.5 Sonnet claude-3.5-sonnet 3.00 15.00 The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
LongCat Flash Chat longcat-flash-chat 0.00 0.00 LongCat-Flash-Chat is a high-throughput MoE chat model (128k context) designed for agentic tasks.
Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.50 A new generation of Qwen3-based open-source thinking mode models. This version offers improved instruction following and streamlined summary responses over the previous iteration (Qwen3-235B-A22B-Thinking-2507).
Qwen3 32B qwen-3-32b 0.10 0.30 Qwen3-32B is a world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. It excels in code-gen, tool-calling, and advanced reasoning, making it an exceptional model for a wide range of production use cases.
Claude 3 Haiku claude-3-haiku 0.25 1.25 Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. Use Haiku to quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
Qwen3 VL 235B A22B Instruct qwen3-vl-instruct 0.70 2.80 The Qwen3 series VL models have been comprehensively upgraded in areas such as visual coding and spatial perception. Their visual perception and recognition capabilities have significantly improved, supporting the understanding of ultra-long videos, and OCR functionality has undergone a major enhancement.
Text Embedding 005 text-embedding-005 0.03 0.00 English-focused text embedding model optimized for code and English language tasks.
Nano Banana Preview (Gemini 2.5 Flash Image Preview) gemini-2.5-flash-image-preview 0.30 2.50 Gemini 2.5 Flash Image Preview is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
GPT-5.2 pro gpt-5.2-pro 21.00 168.00 Version of GPT-5.2 that produces smarter and more precise responses.
Qwen 3 Coder 30B A3B Instruct qwen3-coder-30b-a3b 0.07 0.27 Efficient coding specialist balancing performance with cost-effectiveness for daily development tasks while maintaining strong tool integration capabilities.
Qwen3 Coder 480B A35B Instruct qwen3-coder 0.38 1.53 Mixture-of-experts LLM with advanced coding and reasoning capabilities.
Grok 2 Vision grok-2-vision 2.00 10.00 Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
Morph V3 Fast morph-v3-fast 0.80 1.20 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast (4,500+ tokens/second). It acts as the final step in the AI coding workflow. Supports 16K input tokens and 16K output tokens.
Grok 3 Beta grok-3 3.00 15.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
Nova Micro nova-micro 0.04 0.14 A text-only model that delivers the lowest latency responses at very low cost.
Ministral 14B ministral-14b 0.20 0.20 Ministral 3 14B is the largest model in the Ministral 3 family, offering state-of-the-art capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. Optimized for local deployment, it delivers high performance across diverse hardware.
Ministral 8B ministral-8b 0.10 0.10 A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.
Mistral Codestral codestral 0.30 0.90 Released at the end of July 2025, Codestral is Mistral's cutting-edge language model for coding, specializing in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation.
Claude 3 Opus claude-3-opus 15.00 75.00 Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
Pixtral Large pixtral-large 2.00 6.00 Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
GPT-4 Turbo gpt-4-turbo 10.00 30.00 gpt-4-turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
voyage-3.5 voyage-3.5 0.06 0.00 Voyage AI's embedding model optimized for general-purpose and multilingual retrieval quality.
Llama 3.1 70B Instruct llama-3.1-70b 0.40 0.40 An update to Meta Llama 3 70B Instruct that includes an expanded 128K context length, multilinguality and improved reasoning capabilities.
Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.06 0.24 NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.
Qwen3 VL 235B A22B Thinking qwen3-vl-thinking 0.70 8.40 Qwen3 series VL models feature significantly enhanced multimodal reasoning capabilities, with a particular focus on optimizing the model for STEM and mathematical reasoning. Visual perception and recognition abilities have been comprehensively improved, and OCR capabilities have undergone a major upgrade.
Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 A premium reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing comprehensive explanations with enhanced search capabilities and multiple search queries per request.
GPT-3.5 Turbo gpt-3.5-turbo 0.50 1.50 OpenAI's most capable and cost effective model in the GPT-3.5 family optimized for chat purposes, but also works well for traditional completions tasks.
Qwen3 Embedding 8B qwen3-embedding-8b 0.05 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
Mistral Medium 3.1 mistral-medium 0.40 2.00 Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost.
INTELLECT 3 intellect-3 0.20 1.10 Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack. Achieving state-of-the-art performance for its size across math, code and reasoning.
Nvidia Nemotron Nano 12B V2 VL nemotron-nano-12b-v2-vl 0.20 0.60 An auto-regressive vision-language model that uses an optimized transformer architecture. It enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities.
Qwen3-14B qwen-3-14b 0.06 0.24 Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Embed v4.0 embed-v4.0 0.12 0.00 A model that allows for text, images, or mixed content to be classified or turned into embeddings.
GLM 4.5 Air glm-4.5-air 0.20 1.10 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
GPT-5 pro gpt-5-pro 15.00 120.00 GPT-5 pro uses more compute to think harder and provide consistently better answers. Since GPT-5 pro is designed to tackle tough problems, some requests may take several minutes to finish.
Llama 3.2 3B Instruct llama-3.2-3b 0.15 0.15 Text-only model, fine-tuned for supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
voyage-3-large voyage-3-large 0.18 0.00 Voyage AI's embedding model with the best general-purpose and multilingual retrieval quality.
Titan Text Embeddings V2 titan-embed-text-v2 0.02 0.00 Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.
Grok 3 Fast Beta grok-3-fast 5.00 25.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking 0.30 2.90 Qwen3-235B-A22B-Thinking-2507 scales the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.
v0-1.5-md v0-1.5-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Powered by Qwen3, this is a powerful coding agent that excels in tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
Qwen3 Embedding 4B qwen3-embedding-4b 0.02 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B).
Grok 3 Mini Fast Beta grok-3-mini-fast 0.60 4.00 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
v0-1.0-md v0-1.0-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
Qwen3-30B-A3B qwen-3-30b 0.08 0.29 Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
o3 Pro o3-pro 20.00 80.00 The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
GLM-4.6V glm-4.6v 0.30 0.90 The GLM-4.6V series is Z.ai’s latest iteration of its multimodal large language models. GLM-4.6V scales its context window to 128k tokens in training and achieves SoTA performance in visual understanding among models of similar parameter scale.
Grok 2 grok-2 2.00 10.00 Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
Claude 3.5 Sonnet (2024-06-20) claude-3.5-sonnet-20240620 3.00 15.00 Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
Nova Pro nova-pro 0.80 3.20 A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.
Command A command-a 2.50 10.00 Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
Nova 2 Lite nova-2-lite 0.30 2.50 Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text.
Sonoma Sky Alpha sonoma-sky-alpha 0.20 0.50 This model is no longer in stealth; it is served by Grok 4 Fast Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M-token context window.
Sonoma Dusk Alpha sonoma-dusk-alpha 0.20 0.50 This model is no longer in stealth; it is served by Grok 4 Fast Non-Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M-token context window.
Llama 3.2 1B Instruct llama-3.2-1b 0.10 0.10 Text-only model, supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
o1 o1 15.00 60.00 o1 is OpenAI's flagship reasoning model, designed for complex problems that require deep thinking. It provides strong reasoning capabilities with improved accuracy for complex multi-step tasks.
GLM 4.5V glm-4.5v 0.60 1.80 Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
GLM 4.5 glm-4.5 0.60 2.20 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
Qwen3 Max Preview qwen3-max-preview 1.20 6.00 Qwen3-Max-Preview shows substantial gains over the 2.5 series in overall capability, with significant enhancements in Chinese-English text understanding, complex instruction following, handling of subjective open-ended tasks, multilingual ability, and tool invocation; model knowledge hallucinations are reduced.
Devstral Small 1.1 devstral-small 0.10 0.30 Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents.
voyage-3.5-lite voyage-3.5-lite 0.02 0.00 Voyage AI's embedding model optimized for latency and cost.
FLUX.1 Kontext Max flux-kontext-max 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
Imagen 4 Fast imagen-4.0-fast-generate-001 0.00 0.00 Imagen 4 Fast is Google’s speed-optimized variant of the Imagen 4 text-to-image model, designed for rapid, high-volume image generation. It’s ideal for workflows like quick drafts, mockups, and iterative creative exploration. Despite emphasizing speed, it still benefits from the broader Imagen 4 family’s improvements in clarity, text rendering, and stylistic flexibility, and supports high-resolution outputs up to 2K.
o3-deep-research o3-deep-research 10.00 40.00 o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data—brought in through MCP connectors.
FLUX1.1 [pro] flux-pro-1.1 0.00 0.00 FLUX1.1 [pro] is the standard for text-to-image generation with fast, reliable and consistently stunning results. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
Imagen 4 imagen-4.0-generate-001 0.00 0.00 Imagen 4: Google's flagship text-to-image model that serves as the go-to choice for a wide variety of high-quality image generation tasks, featuring significant improvements in text rendering over previous models. It now supports up to 2K resolution generation for creating detailed and crisp visuals, making it suitable for everything from marketing assets to artistic compositions.
FLUX.2 [flex] flux-2-flex 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [flex] supports customizable image generation and editing with adjustable steps and guidance. It's better at typography and text rendering. It supports up to 10 reference images (up to 14 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
FLUX.2 [pro] flux-2-pro 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [pro] supports generation, editing, and multiple reference images (up to 9 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
Imagen 4 Ultra imagen-4.0-ultra-generate-001 0.00 0.00 Imagen 4 Ultra: Highest quality image generation model for detailed and photorealistic outputs.
Sonar Reasoning sonar-reasoning 1.00 5.00 A reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing detailed explanations with search grounding.
FLUX1.1 [pro] Ultra flux-pro-1.1-ultra 0.00 0.00 FLUX1.1 [pro] Ultra delivers ultra-fast, ultra high-resolution image creation - with more pixels in every picture. Generate varying aspect ratios from text, at 4MP resolution fast. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
FLUX.1 Kontext Pro flux-kontext-pro 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Similar capabilities to GPT-3-era models. Compatible with the legacy Completions endpoint, not Chat Completions.
Llama 3.2 90B Vision Instruct llama-3.2-90b 0.72 0.72 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
Qwen3 Embedding 0.6B qwen3-embedding-0.6b 0.01 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B).
Trinity Mini trinity-mini 0.05 0.15 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows.
FLUX.1 Fill [pro] flux-pro-1.0-fill 0.00 0.00 A state-of-the-art inpainting model, enabling editing and expansion of real and generated images given a text description and a binary mask. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
FLUX.2 [max] flux-2-max 0.00 0.00 FLUX.2 [max] offers image generation and image editing with the highest quality available. It delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. Built for professional use, FLUX.2 [max] produces production-ready outputs for marketing teams, creatives, filmmakers, and creators around the world.
Text Multilingual Embedding 002 text-multilingual-embedding-002 0.03 0.00 Multilingual text embedding model optimized for cross-lingual tasks across many languages.
Mercury Coder Small Beta mercury-coder-small 0.25 1.00 Mercury Coder Small is ideal for code generation, debugging, and refactoring tasks with minimal latency.
LongCat Flash Thinking longcat-flash-thinking 0.15 1.50 LongCat-Flash-Thinking is a high-throughput MoE reasoning model (128k context) optimized for agentic tasks.
Llama 3.2 11B Vision Instruct llama-3.2-11b 0.16 0.16 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
Codestral Embed codestral-embed 0.15 0.00 Code embedding model that can embed code databases and repositories to power coding assistants.
Magistral Medium 2509 magistral-medium 2.00 5.00 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
Magistral Small 2509 magistral-small 0.50 1.50 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
Mistral Nemo mistral-nemo 0.04 0.17 A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.
Mixtral MoE 8x22B Instruct mixtral-8x22b-instruct 1.20 1.20 8x22B Instruct model. Mixtral 8x22B is an open-source mixture-of-experts model by Mistral, served by Fireworks.
Morph V3 Large morph-v3-large 0.90 1.90 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast, at 2500+ tokens/second. It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
Nvidia Nemotron Nano 9B V2 nemotron-nano-9b-v2 0.04 0.16 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.
Codex Mini codex-mini 1.50 6.00 Codex Mini is a fine-tuned version of o4-mini specifically for use in Codex CLI.
voyage-code-2 voyage-code-2 0.12 0.00 Voyage AI's embedding model optimized for code retrieval (17% better than alternatives). This is the previous generation of code embeddings models.
voyage-code-3 voyage-code-3 0.18 0.00 Voyage AI's embedding model optimized for code retrieval.
voyage-finance-2 voyage-finance-2 0.12 0.00 Voyage AI's embedding model optimized for finance retrieval and RAG.
voyage-law-2 voyage-law-2 0.12 0.00 Voyage AI's embedding model optimized for legal retrieval and RAG.
claude-4-opus claude-4-opus 15.00 75.00 Source: vercel, Context: 200000
claude-4-sonnet claude-4-sonnet 3.00 15.00 Source: vercel, Context: 200000
command-r command-r 0.15 0.60 Source: vercel, Context: 128000
command-r-plus command-r-plus 2.50 10.00 Source: vercel, Context: 128000
deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.75 0.99 Source: vercel, Context: 131072
gemma-2-9b gemma-2-9b 0.20 0.20 Source: vercel, Context: 8192
llama-3-70b llama-3-70b 0.59 0.79 Source: vercel, Context: 8192
llama-3-8b llama-3-8b 0.05 0.08 Source: vercel, Context: 8192
mistral-large mistral-large 2.00 6.00 Source: vercel, Context: 32000
mistral-saba-24b mistral-saba-24b 0.79 0.79 Source: vercel, Context: 32768
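The input and output prices in the table are quoted in dollars per million tokens, so the cost of a single request is each token count scaled by its per-million rate. A minimal sketch of that arithmetic (the helper name `estimate_cost` is illustrative; the example rates are Mistral Nemo's $0.04/$0.17 figures from the table above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimate request cost in USD, given prices in $ per 1M tokens."""
    return ((input_tokens / 1_000_000) * input_price
            + (output_tokens / 1_000_000) * output_price)

# Mistral Nemo: $0.04 input / $0.17 output per 1M tokens
cost = estimate_cost(250_000, 50_000, 0.04, 0.17)  # ≈ $0.0185
```

Note that embedding models (output price 0.00) are billed on input tokens only, and the image models listed at 0.00/0.00 are priced per image rather than per token.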
Sources
vercel: 185 models
litellm: 10 models