AI Models List

4226 Models Found
Provider Name Model ID Input Price ($/1M) Output Price ($/1M) Description Free
vercel Grok Code Fast 1 grok-code-fast-1 0.20 1.50 xAI's latest coding model that offers fast agentic coding with a 256K context window.
vercel Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.
vercel Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
vercel Gemini 3 Flash gemini-3-flash 0.50 3.00 Google's most intelligent model built for speed, combining frontier intelligence with superior search and grounding.
vercel Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Claude Haiku 4.5 matches Sonnet 4's performance on coding, computer use, and agent tasks at substantially lower cost and faster speeds. It delivers near-frontier performance and Claude’s unique character at a price point that works for scaled sub-agent deployments, free tier products, and intelligence-sensitive applications with budget constraints.
vercel MiniMax M2 minimax-m2 0.27 1.15 MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence.
vercel Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
vercel DeepSeek V3.2 deepseek-v3.2 0.27 0.40 DeepSeek-V3.2: Official successor to V3.2-Exp.
vercel Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Claude Opus 4.5 is Anthropic’s latest model in the Opus series, meant for demanding reasoning tasks and complex problem solving. This model has improvements in general intelligence and vision compared to previous iterations. In addition, it is suited for difficult coding tasks and agentic workflows, especially those with computer use and tool use, and can effectively handle context usage and external memory files.
vercel Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
vercel GPT-5.2 gpt-5.2 1.75 14.00 GPT-5.2 is OpenAI's best general-purpose model, part of the GPT-5 flagship model family. It's their most intelligent model yet for both general and agentic tasks.
vercel Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Claude Sonnet 4 significantly improves on Sonnet 3.7's industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.
vercel Grok 4.1 Fast Non-Reasoning grok-4.1-fast-non-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for speed, use this variant; otherwise, use the reasoning version.
vercel Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 This model improves upon Gemini 2.5 Pro and is catered towards challenging tasks, especially those involving complex reasoning or agentic workflows. Improvements highlighted include use cases for coding, multi-step function calling, planning, reasoning, deep knowledge tasks, and instruction following.
vercel GPT-5 mini gpt-5-mini 0.25 2.00 GPT-5 mini is a cost optimized model that excels at reasoning/chat tasks. It offers an optimal balance between speed, cost, and capability.
vercel GPT-5 gpt-5 1.25 10.00 GPT-5 is OpenAI's flagship language model that excels at complex reasoning, broad real-world knowledge, code-intensive, and multi-step agentic tasks.
vercel GPT-5 Chat gpt-5-chat 1.25 10.00 GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT.
vercel GPT-5 nano gpt-5-nano 0.05 0.40 GPT-5 nano is a high throughput model that excels at simple instruction or classification tasks.
vercel GPT-4.1 mini gpt-4.1-mini 0.40 1.60 GPT-4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
vercel GPT-5-Codex gpt-5-codex 1.25 10.00 GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
vercel Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Gemini 2.5 Pro is our most advanced reasoning Gemini model, capable of solving complex problems. Gemini 2.5 Pro can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories.
vercel GLM 4.6 glm-4.6 0.45 1.80 As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
vercel Grok 4 Fast Non-Reasoning grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
vercel gpt-oss-120b gpt-oss-120b 0.10 0.50 Extremely capable general-purpose LLM with strong, controllable reasoning capabilities
vercel gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.08 0.30 OpenAI's first open weight reasoning model specifically trained for safety classification tasks. Fine-tuned from GPT-OSS, this model helps classify text content based on customizable policies, enabling bring-your-own-policy Trust & Safety AI where your own taxonomy, definitions, and thresholds guide classification decisions.
vercel GPT-5.1 Instant gpt-5.1-instant 1.25 10.00 GPT-5.1 Instant (or GPT-5.1 chat) is a warmer and more conversational version of GPT-5-chat, with improved instruction following and adaptive reasoning for deciding when to think before responding.
vercel GPT-4o mini gpt-4o-mini 0.15 0.60 GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
vercel MiniMax M2.1 minimax-m2.1 0.30 1.20 MiniMax 2.1 is MiniMax's latest model, optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning.
vercel Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
vercel Devstral 2 devstral-2 0.00 0.00 An enterprise-grade text model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel GPT-5.1 Thinking gpt-5.1-thinking 1.25 10.00 An upgraded version of GPT-5 that adapts thinking time more precisely to the question, spending more time on complex questions and responding more quickly to simpler tasks.
vercel text-embedding-3-small text-embedding-3-small 0.02 0.00 OpenAI's improved, more performant version of their ada embedding model.
vercel Grok 4.1 Fast Reasoning grok-4.1-fast-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for maximal intelligence, use this variant; otherwise, use the non-reasoning version.
vercel DeepSeek V3.2 Thinking deepseek-v3.2-thinking 0.28 0.42 Thinking mode of DeepSeek V3.2
vercel GLM 4.7 glm-4.7 0.43 1.75 GLM-4.7 is Z.ai’s latest flagship model, with major upgrades focused on two key areas: stronger coding capabilities and more stable multi-step reasoning and execution.
vercel Ministral 3B ministral-3b 0.04 0.04 A compact, efficient model for on-device tasks like smart assistants and local analytics, offering low-latency performance.
vercel Devstral Small 2 devstral-small-2 0.00 0.00 Our open source model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel Mistral Embed mistral-embed 0.10 0.00 General-purpose text embedding model for semantic search, similarity, clustering, and RAG workflows.
vercel Nova Lite nova-lite 0.06 0.24 A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.
vercel Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks. Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, and handles complex, multi-step problems with more rigor and attention to detail.
vercel Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.09 1.10 A new generation of open-source, non-thinking mode model powered by Qwen3. This version demonstrates superior Chinese text understanding, augmented logical reasoning, and enhanced capabilities in text generation tasks over the previous iteration (Qwen3-235B-A22B-Instruct-2507).
vercel GPT-4.1 gpt-4.1 2.00 8.00 GPT-4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
vercel GPT-4o gpt-4o 2.50 10.00 GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
vercel GPT-4.1 nano gpt-4.1-nano 0.10 0.40 GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model.
vercel GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 GPT-5.1-Codex-Max is purpose-built for agentic coding.
vercel Grok 4 Fast Reasoning grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
vercel Grok 4 grok-4 3.00 15.00 xAI's latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
vercel Nano Banana (Gemini 2.5 Flash Image) gemini-2.5-flash-image 0.30 2.50 Nano Banana (Gemini 2.5 Flash Image) is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
vercel Nano Banana Pro (Gemini 3 Pro Image) gemini-3-pro-image 2.00 120.00 Nano Banana Pro (Gemini 3 Pro Image) builds on Nano Banana's generation capabilities, ushering in a new era of studio-quality, functional design to help you create and edit high-fidelity, production-ready visuals with unparalleled precision and control. Improvements include enhanced world knowledge and reasoning, dynamic text and translation, and studio-level controls.
vercel gpt-oss-20b gpt-oss-20b 0.07 0.30 A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments
vercel Gemini Embedding 001 gemini-embedding-001 0.15 0.00 State-of-the-art embedding model with excellent performance across English, multilingual and code tasks.
vercel o4-mini o4-mini 1.10 4.40 OpenAI's o4-mini delivers fast, cost-efficient reasoning with exceptional performance for its size, particularly excelling in math (best-performing on AIME benchmarks), coding, and visual tasks.
vercel Sonar sonar 1.00 1.00 Perplexity's lightweight offering with search grounding, quicker and cheaper than Sonar Pro.
vercel Kimi K2 0905 kimi-k2-0905 0.60 2.50 Kimi K2 0905 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.
vercel Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
vercel text-embedding-3-large text-embedding-3-large 0.13 0.00 OpenAI's most capable embedding model for both English and non-English tasks.
vercel Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Gemini 2.0 Flash Lite delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
vercel Claude Opus 4 claude-opus-4 15.00 75.00 Claude Opus 4 is Anthropic's most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.
vercel Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
vercel GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 GPT-5.2 Chat points to gpt-5.2-chat-latest, the snapshot currently powering ChatGPT. It is OpenAI's best general-purpose model, part of the GPT-5 flagship model family.
vercel Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
vercel GPT-5.1 Codex mini gpt-5.1-codex-mini 0.25 2.00 GPT-5.1 Codex mini is a smaller, faster, and cheaper version of GPT-5.1 Codex.
vercel DeepSeek V3.2 Exp deepseek-v3.2-exp 0.27 0.40 DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
vercel MiMo V2 Flash mimo-v2-flash 0.10 0.29 Xiaomi MiMo-V2-Flash is a proprietary MoE model developed by Xiaomi, designed for extreme inference efficiency with 309B total parameters (15B active). By incorporating an innovative Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks among the top 2 global open-source models across multiple Agent benchmarks.
vercel DeepSeek V3 0324 deepseek-v3 0.77 0.77 Fast general-purpose LLM with enhanced reasoning capabilities
vercel Mistral Small mistral-small 0.10 0.30 Mistral Small is the ideal choice for simple tasks that one can do in bulk - like Classification, Customer Support, or Text Generation. It offers excellent performance at an affordable price point.
vercel o3 o3 2.00 8.00 OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
vercel Qwen3 Max qwen3-max 1.20 6.00 Compared to the preview version, the Qwen3 Max model has received specialized upgrades in agentic programming and tool invocation. The official release achieves state-of-the-art (SOTA) performance in its field and is better suited to the demands of agents operating in more complex scenarios.
vercel Llama 3.3 70B llama-3.3-70b 0.72 0.72 An upgrade to Llama 3.1 70B featuring enhanced reasoning, tool use, and multilingual abilities, along with an expanded 128K context window. These improvements make it well-suited for demanding tasks such as long-form summarization, multilingual conversations, and coding assistance.
vercel Llama 3.1 8B llama-3.1-8b 0.03 0.05 Llama 3.1 8B brings powerful performance in a smaller, more efficient package. With improved multilingual support, tool use, and a 128K context length, it enables sophisticated use cases like interactive agents and compact coding assistants while remaining lightweight and accessible.
vercel GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 GPT-5.1-Codex is a version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
vercel Kimi K2 Thinking kimi-k2-thinking 0.47 2.00 Kimi K2 Thinking is an advanced open-source thinking model by Moonshot AI. It can execute up to 200–300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
vercel KAT-Coder-Pro V1 kat-coder-pro-v1 0.00 0.00 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KwaiKAT series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a remarkable 73.4% solve rate on the SWE-Bench Verified benchmark. KAT-Coder-Pro V1 delivers top-tier coding performance and has been rigorously tested by thousands of in-house engineers. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
vercel Qwen3 235B A22B Instruct 2507 qwen-3-235b 0.13 0.60 Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
vercel MiniMax M2.1 Lightning minimax-m2.1-lightning 0.30 2.40 MiniMax-M2.1-Lightning is a faster version of MiniMax-M2.1, offering the same performance with significantly higher throughput (~100 TPS output speed vs. ~60 TPS for MiniMax-M2).
vercel Kimi K2 kimi-k2 0.50 2.00 Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
vercel DeepSeek R1 0528 deepseek-r1 0.50 2.15 The latest revision of DeepSeek's first-generation reasoning model
vercel text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 OpenAI's legacy text embedding model.
vercel Llama 4 Scout 17B 16E Instruct llama-4-scout 0.08 0.30 The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts. Served by DeepInfra.
vercel o3-mini o3-mini 1.10 4.40 o3-mini is OpenAI's most recent small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini.
vercel DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.27 1.00 DeepSeek-V3.1-Terminus delivers more stable and reliable outputs across benchmarks compared to the previous version and addresses user feedback (e.g., language consistency and agent upgrades).
vercel Mistral Large 3 mistral-large-3 0.50 1.50 Mistral Large 3 2512 is Mistral’s most capable model to date. It has a sparse mixture-of-experts architecture with 41B active parameters (675B total).
vercel Pixtral 12B 2409 pixtral-12b 0.15 0.15 A 12B model with image understanding capabilities in addition to text.
vercel Sonar Pro sonar-pro 3.00 15.00 Perplexity's premier offering with search grounding, supporting advanced queries and follow-ups.
vercel GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 For local deployment and low-latency applications. The GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128k tokens in training and achieves SOTA performance in visual understanding among models of similar parameter scale.
vercel Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 High-speed version of kimi-k2-thinking, suitable for scenarios requiring both deep reasoning and extremely fast responses
vercel Llama 4 Maverick 17B 128E Instruct llama-4-maverick 0.15 0.60 Llama 4 Maverick 17B-128E is Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities.
vercel DeepSeek V3.1 deepseek-v3.1 0.30 1.00 DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long-context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
vercel Kimi K2 Turbo kimi-k2-turbo 2.40 10.00 Kimi K2 Turbo is the high-speed version of kimi-k2. It has the same model parameters as kimi-k2, but output speed is increased to 60 tokens per second (up to a maximum of 100 tokens per second), with a 256K context length.
vercel Grok 3 Mini Beta grok-3-mini 0.30 0.50 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.
vercel Claude 3.5 Sonnet claude-3.5-sonnet 3.00 15.00 The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
vercel LongCat Flash Chat longcat-flash-chat 0.00 0.00 LongCat-Flash-Chat is a high-throughput MoE chat model (128k context) designed for agentic tasks.
vercel Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.50 A new generation of Qwen3-based open-source thinking mode models. This version offers improved instruction following and streamlined summary responses over the previous iteration (Qwen3-235B-A22B-Thinking-2507).
vercel Qwen3 32B qwen-3-32b 0.10 0.30 Qwen3-32B is a world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. It excels in code-gen, tool-calling, and advanced reasoning, making it an exceptional model for a wide range of production use cases.
vercel Claude 3 Haiku claude-3-haiku 0.25 1.25 Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. Haiku can quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
vercel Qwen3 VL 235B A22B Instruct qwen3-vl-instruct 0.70 2.80 The Qwen3 VL series has been comprehensively upgraded in areas such as visual coding and spatial perception. Its visual perception and recognition capabilities have significantly improved, it supports understanding of ultra-long videos, and its OCR functionality has undergone a major enhancement.
vercel Text Embedding 005 text-embedding-005 0.03 0.00 English-focused text embedding model optimized for code and English language tasks.
vercel Nano Banana Preview (Gemini 2.5 Flash Image Preview) gemini-2.5-flash-image-preview 0.30 2.50 Gemini 2.5 Flash Image Preview is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
vercel GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 A version of GPT-5.2 that produces smarter and more precise responses.
vercel Qwen 3 Coder 30B A3B Instruct qwen3-coder-30b-a3b 0.07 0.27 Efficient coding specialist balancing performance with cost-effectiveness for daily development tasks while maintaining strong tool integration capabilities.
vercel Qwen3 Coder 480B A35B Instruct qwen3-coder 0.38 1.53 Mixture-of-experts LLM with advanced coding and reasoning capabilities
vercel Grok 2 Vision grok-2-vision 2.00 10.00 Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
vercel Morph V3 Fast morph-v3-fast 0.80 1.20 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast (4,500+ tokens/second). It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
vercel Grok 3 Beta grok-3 3.00 15.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
vercel Nova Micro nova-micro 0.04 0.14 A text-only model that delivers the lowest latency responses at very low cost.
vercel Ministral 14B ministral-14b 0.20 0.20 Ministral 3 14B is the largest model in the Ministral 3 family, offering state-of-the-art capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. Optimized for local deployment, it delivers high performance across diverse hardware.
vercel Ministral 8B ministral-8b 0.10 0.10 A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.
vercel Mistral Codestral codestral 0.30 0.90 Mistral's cutting-edge language model for coding released end of July 2025, Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.
vercel Claude 3 Opus claude-3-opus 15.00 75.00 Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
vercel Pixtral Large pixtral-large 2.00 6.00 Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
vercel GPT-4 Turbo gpt-4-turbo 10.00 30.00 gpt-4-turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
vercel voyage-3.5 voyage-3.5 0.06 0.00 Voyage AI's embedding model optimized for general-purpose and multilingual retrieval quality.
vercel Llama 3.1 70B Instruct llama-3.1-70b 0.40 0.40 An update to Meta Llama 3 70B Instruct that includes an expanded 128K context length, multilinguality and improved reasoning capabilities.
vercel Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.06 0.24 NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.
vercel Qwen3 VL 235B A22B Thinking qwen3-vl-thinking 0.70 8.40 Qwen3 series VL models feature significantly enhanced multimodal reasoning capabilities, with a particular focus on optimizing the model for STEM and mathematical reasoning. Visual perception and recognition abilities have been comprehensively improved, and OCR capabilities have undergone a major upgrade.
vercel Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 A premium reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing comprehensive explanations with enhanced search capabilities and multiple search queries per request.
vercel GPT-3.5 Turbo gpt-3.5-turbo 0.50 1.50 OpenAI's most capable and cost effective model in the GPT-3.5 family optimized for chat purposes, but also works well for traditional completions tasks.
vercel Qwen3 Embedding 8B qwen3-embedding-8b 0.05 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Mistral Medium 3.1 mistral-medium 0.40 2.00 Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost.
vercel INTELLECT 3 intellect-3 0.20 1.10 INTELLECT-3 scales RL to a 100B+ MoE model on an end-to-end stack, achieving state-of-the-art performance for its size across math, code, and reasoning.
vercel Nvidia Nemotron Nano 12B V2 VL nemotron-nano-12b-v2-vl 0.20 0.60 An auto-regressive vision-language model built on an optimized transformer architecture. It enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities.
vercel Qwen3-14B qwen-3-14b 0.06 0.24 Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support
vercel Embed v4.0 embed-v4.0 0.12 0.00 A model that allows for text, images, or mixed content to be classified or turned into embeddings.
vercel GLM 4.5 Air glm-4.5-air 0.20 1.10 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
vercel GPT-5 pro gpt-5-pro 15.00 120.00 GPT-5 pro uses more compute to think harder and provide consistently better answers. Since GPT-5 pro is designed to tackle tough problems, some requests may take several minutes to finish.
vercel Llama 3.2 3B Instruct llama-3.2-3b 0.15 0.15 Text-only model, fine-tuned for supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
vercel voyage-3-large voyage-3-large 0.18 0.00 Voyage AI's embedding model with the best general-purpose and multilingual retrieval quality.
vercel Titan Text Embeddings V2 titan-embed-text-v2 0.02 0.00 Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.
vercel Grok 3 Fast Beta grok-3-fast 5.00 25.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
vercel Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking 0.30 2.90 Qwen3-235B-A22B-Thinking-2507 scales the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.
vercel v0-1.5-md v0-1.5-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
vercel Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Powered by Qwen3, this is a powerful coding agent that excels in tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
vercel Qwen3 Embedding 4B qwen3-embedding-4b 0.02 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Grok 3 Mini Fast Beta grok-3-mini-fast 0.60 4.00 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
vercel v0-1.0-md v0-1.0-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
vercel Qwen3-30B-A3B qwen-3-30b 0.08 0.29 Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support
vercel o3 Pro o3-pro 20.00 80.00 The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
vercel GLM-4.6V glm-4.6v 0.30 0.90 The GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128k tokens in training and achieves SOTA performance in visual understanding among models of similar parameter scale.
vercel Grok 2 grok-2 2.00 10.00 Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
vercel Claude 3.5 Sonnet (2024-06-20) claude-3.5-sonnet-20240620 3.00 15.00 Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
vercel Nova Pro nova-pro 0.80 3.20 A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.
vercel Command A command-a 2.50 10.00 Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
vercel Nova 2 Lite nova-2-lite 0.30 2.50 Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text.
vercel Sonoma Sky Alpha sonoma-sky-alpha 0.20 0.50 This model is no longer in stealth: it gets responses from Grok 4 Fast Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
vercel Sonoma Dusk Alpha sonoma-dusk-alpha 0.20 0.50 This model is no longer in stealth: it gets responses from Grok 4 Fast Non-Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
vercel Llama 3.2 1B Instruct llama-3.2-1b 0.10 0.10 Text-only model, supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
vercel o1 o1 15.00 60.00 o1 is OpenAI's flagship reasoning model, designed for complex problems that require deep thinking. It provides strong reasoning capabilities with improved accuracy for complex multi-step tasks.
vercel GLM 4.5V glm-4.5v 0.60 1.80 Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
vercel GLM 4.5 glm-4.5 0.60 2.20 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
vercel Qwen3 Max Preview qwen3-max-preview 1.20 6.00 Qwen3-Max-Preview shows substantial gains over the 2.5 series in overall capability, with significant enhancements in Chinese-English text understanding, complex instruction following, handling of subjective open-ended tasks, multilingual ability, and tool invocation; model knowledge hallucinations are reduced.
vercel Devstral Small 1.1 devstral-small 0.10 0.30 Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel voyage-3.5-lite voyage-3.5-lite 0.02 0.00 Voyage AI's embedding model optimized for latency and cost.
vercel FLUX.1 Kontext Max flux-kontext-max 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 Fast imagen-4.0-fast-generate-001 0.00 0.00 Imagen 4 Fast is Google’s speed-optimized variant of the Imagen 4 text-to-image model, designed for rapid, high-volume image generation. It’s ideal for workflows like quick drafts, mockups, and iterative creative exploration. Despite emphasizing speed, it still benefits from the broader Imagen 4 family’s improvements in clarity, text rendering, and stylistic flexibility, and supports high-resolution outputs up to 2K.
vercel o3-deep-research o3-deep-research 10.00 40.00 o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data—brought in through MCP connectors.
vercel FLUX1.1 [pro] flux-pro-1.1 0.00 0.00 FLUX1.1 [pro] is the standard for text-to-image generation with fast, reliable and consistently stunning results. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 imagen-4.0-generate-001 0.00 0.00 Imagen 4: Google's flagship text-to-image model that serves as the go-to choice for a wide variety of high-quality image generation tasks, featuring significant improvements in text rendering over previous models. It now supports up to 2K resolution generation for creating detailed and crisp visuals, making it suitable for everything from marketing assets to artistic compositions.
vercel FLUX.2 [flex] flux-2-flex 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [flex] supports customizable image generation and editing with adjustable steps and guidance. It's better at typography and text rendering. It supports up to 10 reference images (up to 14 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.2 [pro] flux-2-pro 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [pro] supports generation, editing, and multiple reference images (up to 9 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 Ultra imagen-4.0-ultra-generate-001 0.00 0.00 Imagen 4 Ultra: Highest quality image generation model for detailed and photorealistic outputs.
vercel Sonar Reasoning sonar-reasoning 1.00 5.00 A reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing detailed explanations with search grounding.
vercel FLUX1.1 [pro] Ultra flux-pro-1.1-ultra 0.00 0.00 FLUX1.1 [pro] Ultra delivers ultra-fast, ultra high-resolution image creation - with more pixels in every picture. Generate varying aspect ratios from text, at 4MP resolution fast. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.1 Kontext Pro flux-kontext-pro 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Similar capabilities to GPT-3 era models. Compatible with the legacy Completions endpoint, not Chat Completions.
vercel Llama 3.2 90B Vision Instruct llama-3.2-90b 0.72 0.72 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
vercel Qwen3 Embedding 0.6B qwen3-embedding-0.6b 0.01 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Trinity Mini trinity-mini 0.05 0.15 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows.
vercel FLUX.1 Fill [pro] flux-pro-1.0-fill 0.00 0.00 A state-of-the-art inpainting model, enabling editing and expansion of real and generated images given a text description and a binary mask. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.2 [max] flux-2-max 0.00 0.00 FLUX.2 [max] offers image generation and image editing with the highest quality available. It delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. Built for professional use, FLUX.2 [max] produces production-ready outputs for marketing teams, creatives, filmmakers, and creators around the world.
vercel Text Multilingual Embedding 002 text-multilingual-embedding-002 0.03 0.00 Multilingual text embedding model optimized for cross-lingual tasks across many languages.
vercel Mercury Coder Small Beta mercury-coder-small 0.25 1.00 Mercury Coder Small is ideal for code generation, debugging, and refactoring tasks with minimal latency.
vercel LongCat Flash Thinking longcat-flash-thinking 0.15 1.50 LongCat-Flash-Thinking is a high-throughput MoE reasoning model (128k context) optimized for agentic tasks.
vercel Llama 3.2 11B Vision Instruct llama-3.2-11b 0.16 0.16 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
vercel Codestral Embed codestral-embed 0.15 0.00 Code embedding model that can embed code databases and repositories to power coding assistants.
vercel Magistral Medium 2509 magistral-medium 2.00 5.00 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
vercel Magistral Small 2509 magistral-small 0.50 1.50 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
vercel Mistral Nemo mistral-nemo 0.04 0.17 A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.
vercel Mixtral MoE 8x22B Instruct mixtral-8x22b-instruct 1.20 1.20 Mixtral 8x22B Instruct is an open-source mixture-of-experts model by Mistral, served by Fireworks.
vercel Morph V3 Large morph-v3-large 0.90 1.90 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast (2,500+ tokens/second). It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
vercel Nvidia Nemotron Nano 9B V2 nemotron-nano-9b-v2 0.04 0.16 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.
vercel Codex Mini codex-mini 1.50 6.00 Codex Mini is a fine-tuned version of o4-mini specifically for use in Codex CLI.
vercel voyage-code-2 voyage-code-2 0.12 0.00 Voyage AI's embedding model optimized for code retrieval (17% better than alternatives). This is the previous generation of code embeddings models.
vercel voyage-code-3 voyage-code-3 0.18 0.00 Voyage AI's embedding model optimized for code retrieval.
vercel voyage-finance-2 voyage-finance-2 0.12 0.00 Voyage AI's embedding model optimized for finance retrieval and RAG.
vercel voyage-law-2 voyage-law-2 0.12 0.00 Voyage AI's embedding model optimized for legal retrieval and RAG.
together Llama 4 Maverick llama-4-maverick 0.27 0.85 -
together Llama 4 Scout llama-4-scout 0.18 0.59 -
together Llama 3.3 70B Instruct-Turbo llama-3-3-70b-instruct-turbo 0.88 0.88 -
together Llama 3.2 3B Instruct Turbo llama-3-2-3b-instruct-turbo 0.06 0.06 -
together Llama 3.1 405B Instruct Turbo llama-3-1-405b-instruct-turbo 3.50 3.50 -
together Llama 3.1 70B Instruct Turbo llama-3-1-70b-instruct-turbo 0.88 0.88 -
together Llama 3.1 8B Instruct Turbo llama-3-1-8b-instruct-turbo 0.18 0.18 -
together Llama 3 8B Instruct Lite llama-3-8b-instruct-lite 0.10 0.10 -
together Llama 3 70B Instruct Reference llama-3-70b-instruct-reference 0.88 0.88 -
together Llama 3 70B Instruct Turbo llama-3-70b-instruct-turbo 0.88 0.88 -
together LLaMA-2 llama-2 0.90 0.90 -
together DeepSeek-R1 deepseek-r1 3.00 7.00 -
together DeepSeek R1 Distilled Qwen 14B deepseek-r1-distilled-qwen-14b 0.18 0.18 -
together DeepSeek R1 Distilled Llama 70B deepseek-r1-distilled-llama-70b 2.00 2.00 -
together DeepSeek R1-0528-tput deepseek-r1-0528-tput 0.55 2.19 -
together DeepSeek-V3-1 deepseek-v3-1 0.60 1.70 -
together DeepSeek-V3 deepseek-v3 1.25 1.25 -
together gpt-oss-120B gpt-oss-120b 0.15 0.60 -
together gpt-oss-20B gpt-oss-20b 0.05 0.20 -
together Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.15 1.50 -
together Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.50 -
together Qwen3-VL 32B Instruct qwen3-vl-32b-instruct 0.50 1.50 -
together Qwen3-Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 2.00 2.00 -
together Qwen3 235B A22B Instruct 2507 FP8 qwen3-235b-a22b-instruct-2507-fp8 0.20 0.60 -
together Qwen3 235B A22B Thinking 2507 FP8 qwen3-235b-a22b-thinking-2507-fp8 0.65 3.00 -
together Qwen3 235B A22B FP8 Throughput qwen3-235b-a22b-fp8-throughput 0.20 0.60 -
together Qwen 2.5 72B qwen-2-5-72b 1.20 1.20 -
together Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 1.95 8.00 -
together Qwen2.5 Coder 32B Instruct qwen2-5-coder-32b-instruct 0.80 0.80 -
together Qwen2.5 7B Instruct Turbo qwen2-5-7b-instruct-turbo 0.30 0.30 -
together Qwen QwQ-32B qwen-qwq-32b 1.20 1.20 -
together GLM-4.6 glm-4-6 0.60 2.20 -
together GLM-4.5-Air glm-4-5-air 0.20 1.10 -
together Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 -
together Kimi K2 Thinking kimi-k2-thinking 1.20 4.00 -
together Kimi K2 0905 kimi-k2-0905 1.00 3.00 -
together Mistral (7B) Instruct v0.2 mistral-7b-instruct-v0-2 0.20 0.20 -
together Mistral Instruct mistral-instruct 0.20 0.20 -
together Mistral Small 3 mistral-small-3 0.80 0.80 -
together Mixtral 8x7B Instruct v0.1 mixtral-8x7b-instruct-v0-1 0.60 0.60 -
together Marin 8B Instruct marin-8b-instruct 0.18 0.18 -
together Arcee AI AFM-4.5B arcee-ai-afm-4-5b 0.10 0.40 -
together Arcee AI Coder-Large arcee-ai-coder-large 0.50 0.80 -
together Arcee AI Maestro arcee-ai-maestro 0.90 3.30 -
together Arcee AI Virtuoso-Large arcee-ai-virtuoso-large 0.75 1.20 -
together Cogito v2 preview - 109B MoE cogito-v2-preview-109b-moe 0.18 0.59 -
together Cogito v2 preview - 405B cogito-v2-preview-405b 3.50 3.50 -
together Cogito v2 preview - 671B MoE cogito-v2-preview-671b-moe 1.25 1.25 -
together Cogito v2 preview - 70B cogito-v2-preview-70b 0.88 0.88 -
together Refuel LLM-2 refuel-llm-2 0.60 0.60 -
together Refuel LLM-2 Small refuel-llm-2-small 0.20 0.20 -
together Typhoon 2 70B Instruct typhoon-2-70b-instruct 0.88 0.88 -
together gemma-3n-E4B-it gemma-3n-e4b-it 0.02 0.04 -
poe - assistant - - General-purpose assistant. Write, code, ask for real-time information, create images, and more. Queries are automatically routed based on the task and subscription status. For subscribers: general queries go to @GPT-5.2-Instant; web searches to @Web-Search; image generation to @Nano-Banana; video-input tasks to @Gemini-2.5-Pro. For non-subscribers: general queries go to @GPT-4o-Mini; web searches to @Web-Search; image generation to @FLUX-schnell; video-input tasks to @Gemini-2.5-Flash.
poe - gpt-5.2-instant 1.60 13.00 A fast, steady conversational model built for day-to-day use. It handles long threads without drifting, keeps context clean, and answers in a straightforward way. Good for planning, rewriting, summarizing, and quick technical help. Supports 400k tokens of context and native vision. Optional parameters: Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - claude-opus-4.5 4.30 21.00 Claude Opus 4.5 from Anthropic supports a customizable thinking budget (up to 64k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63999 to the end of your message.
poe - gemini-3-flash 0.40 2.40 Building on the reasoning capabilities of Gemini 3 Pro, Gemini 3 Flash is a powerful but affordable and performant model. It has exceptional world knowledge, multimodal understanding, and reasoning capabilities at a fraction of the cost of equivalent models (as of December 2025). Optional parameters: To set the thinking level, add --thinking_level and set it to `minimal`, `low`, or `high` (default: `low`). To use web search and real-time information access, add `--web_search true` to enable or `--web_search false` to disable (the default).
poe - gemini-3-pro 1.60 9.60 Gemini 3 Pro is a state-of-the-art model for math, coding, computer use, and long-horizon agent tasks, delivering top benchmark results including 23.4% on MathArena Apex (up from 1.6%), SOTA on tau-bench, an Elo of 2,439 on LiveCodeBench Pro (vs. 2,234), 72.7% on ScreenSpot-Pro (~2× the previous best), and a higher mean net worth on Vending-Bench 2 ($5,478 vs. $3,838). It has a 1M input context window and a maximum output of 64k tokens. Optional parameters: To instruct the bot to use more thinking effort, select "Low" or "High". To enable web search and real-time information updates, toggle "enable web search"; this is disabled by default.
poe - gpt-5.2-pro 19.00 150.00 A powerful reasoning model that is ideal for your most complex, highest-difficulty tasks. On x-high reasoning effort, it scores 90.5% on ARC-AGI-1, an incredibly difficult problem-solving benchmark where humans score 100%. Note: the model can take up to 30 minutes to think through a problem and is quite expensive. Supports 400k tokens of context and native vision. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "medium", "high", or "Xhigh" (default: "medium"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message with one of "low", "medium", or "high" (default: "medium") to control response detail.
poe - gpt-5.2 1.60 13.00 GPT-5.2 is a state-of-the-art AI model from OpenAI designed for real work across writing, analysis, coding, and problem solving. It handles long contexts and multi-step tasks better than earlier versions, and it's tuned to give accurate responses with fewer errors. Supports 400k tokens of context and native vision. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", "high", or "Xhigh" (default: "None"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message with one of "low", "medium", or "high" (default: "medium") to control response detail.
poe - claude-sonnet-4.5 2.60 13.00 Claude Sonnet 4.5 represents a major leap forward in AI capability and alignment. It is the most advanced model released by Anthropic to date, distinguished by dramatic improvements in reasoning, mathematics, and real-world coding. Supports 1M tokens of context. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - grok-4 3.00 15.00 Grok 4 is xAI's latest and most intelligent language model. It features state-of-the-art capabilities in coding, reasoning, and answering questions. It excels at handling complex and multi-step tasks. Reasoning traces are not available via the xAI API.
poe - claude-haiku-4.5 0.85 4.30 Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, and scores >73% on SWE-bench Verified, ranking among the world's best coding models. Supports 200k tokens of context. Optional parameters: To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 63,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - claude-opus-4.1 13.00 64.00 Claude Opus 4.1 from Anthropic; supports a customizable thinking budget (up to 32k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message.
poe - glm-4.7 - - GLM-4.7 is Z.AI's latest flagship model, with major upgrades focused on advanced coding capabilities and more reliable multi-step reasoning and execution. It shows clear gains in complex agent workflows, while delivering a more natural conversational experience and stronger front-end design sensibility. File Support: Text, Markdown and PDF files. Context window: 205k tokens. Optional parameters: Use `--enable_thinking true` to have the model think about the response before giving a final answer; this is disabled by default. Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic (default: 0.7). Use `max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response (default: 131072).
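Illustrative example (invented prompt; flags as documented above): `Refactor this React component to use hooks and explain the trade-offs --enable_thinking true --temperature 0.3`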
poe - minimax-m2.1 - - MiniMax M2.1 is a cutting-edge AI model designed to revolutionize how developers build software. With enhanced multi-language programming support, it excels in generating high-quality code across popular languages like Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript. Key improvements include: 22% faster response times and 30% lower token consumption for efficient workflows; seamless integration with leading development frameworks (Claude Code, Droid Factory AI, BlackBox, etc.); full-stack development capabilities, from mobile (Android/iOS) to web and 3D interactive prototyping; and an optimized performance-to-cost ratio, making AI-assisted development more accessible. Whether you're a software engineer, app developer, or tech innovator, M2.1 empowers smarter coding with industry-leading AI. File Support: Text, Markdown and PDF files. Context window: 205k tokens. Optional parameters: Use `--enable_thinking true` to have the model think about the response before giving a final answer; this is disabled by default. Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic (default: 0.7). Use `max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response (default: 131072).
poe - gemini-2.5-flash 0.21 1.80 Gemini 2.5 Flash builds upon the popular foundation of Google's 2.0 Flash; this new version delivers a major upgrade in reasoning capabilities, search capabilities, and image/video understanding while still prioritizing speed and cost. Supports 1M tokens of input context. Serves the latest `gemini-2.5-flash-preview-09-2025` snapshot. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gemini-2.5-pro 0.87 7.00 Gemini 2.5 Pro is Google's advanced model with frontier performance on various key benchmarks; supports web search and 1 million tokens of input context. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 32,768 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - kling-omni - - Bot for Kling Omni Image-to-Video inference. Send one image for image-to-video generation, or two images for first-to-last-frame video generation. Set the duration to either 5 or 10 seconds with `--duration`. Accepted file types: jpeg, png, webp, heic, heif. This bot does not accept video files. Note: a prompt is required after attaching images to generate a video.
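Illustrative example (invented prompt, sent with one attached image; flag as documented above): `The paper lantern drifts slowly upward into a starry sky --duration 10`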
poe - deepseek-r1 18,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide to this bot will not be used in training and is sent only to Together AI, a US-based company. Supports 164k tokens of input context and 33k tokens of output context. Uses the latest May 28th snapshot (DeepSeek-R1-0528).
poe - manus - - Manus is an autonomous AI agent that executes tasks. It can take a high-level prompt, break it into subtasks, interact with tools/APIs, and deliver end-to-end results (like reports, code, websites, images, and more) without you managing each step. Notes: In Agent mode, responses may take several minutes to complete. Sometimes, files that Manus has created are incorrectly uploaded to the Poe message; in such cases, please check the Manus chat for the file. Parameter controls available: 1. Task Mode - Default: `--task_mode adaptive` (smart routing: may choose Chat or Agent); conversational single turn: `--task_mode chat` (fixed price); autonomous multi-step: `--task_mode agent`. 2. Agent Profile - Default: `--agent_profile manus-1.6` (standard tasks); lower usage: `--agent_profile manus-1.6-lite` (speed/savings); maximum capability: `--agent_profile manus-1.6-max` (complex reasoning).
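Illustrative example (invented prompt; flags as documented above): `Compare the top five open-source vector databases and deliver a report with citations --task_mode agent --agent_profile manus-1.6-lite`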
poe - glm-4.6 6,600.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications. Use `--enable_thinking false` to disable thinking about the response before giving a final answer; this is enabled by default. The bot does not support media (video and audio) attachments. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 200k tokens
poe - gpt-5.1-instant 1.10 9.00 OpenAI’s flagship model optimized for conversational intelligence. It excels at natural dialogue, contextual memory, and adaptive tone, making it perfect for interactive agents, tutoring, and customer support. It balances speed, reliability, and empathy for seamless real‑time communication. Supports 128k tokens of input context.
poe - gpt-5.1 1.10 9.00 OpenAI’s flagship general‑purpose model, built for advanced reasoning, comprehension, and creativity. It delivers robust performance across text and code, with significant improvements in factual accuracy, long‑context understanding, and multilingual fluency. Ideal for research, content creation, analysis, and problem‑solving in any domain. Supports a 400k-token input context window. Optional parameters: To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high" (default: "None"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message to control response detail, with one of "low", "medium", or "high" (default: "medium").
poe - gpt-image-1.5 - - OpenAI's frontier image generation model in ChatGPT as of December 2025, offering exceptional prompt adherence, world knowledge, precise edits, facial preservation, level of detail, and overall quality with improved latency/generation times. It supports editing, restyling, and combining images attached to the latest user query. For a conversational image generation and editing experience, use https://poe.com/GPT-5.2. Optional Parameters: Set the aspect ratio, with options 3:2, 1:1, and 2:3. Set quality to low, medium, or high (default: high). Enable masking by toggling it on or by typing 'use_mask' in the prompt; this option is turned off by default. Disable high fidelity by toggling it off or by typing 'use_high_fidelity'; this option is turned on by default.
poe - kimi-k2-thinking 6,700.00 - Built as a thinking agent, it performs step-by-step reasoning while utilizing tools, achieving state-of-the-art performance on benchmarks such as Humanity's Last Exam (HLE), BrowseComp, and others. The model demonstrates substantial advancements in reasoning, agentic search, coding, writing, and general problem-solving capabilities. Kimi K2 Thinking is capable of executing 200–300 sequential tool calls autonomously, maintaining coherent reasoning across hundreds of steps to solve complex tasks. File Support: Text, Markdown and PDF files. Context window: 256k tokens
poe - deepseek-v3.2 - - We introduce DeepSeek-V3.2, a next-generation foundation model designed to unify high computational efficiency with state-of-the-art reasoning and agentic performance. DeepSeek-V3.2 is built upon three core technical breakthroughs: • DeepSeek Sparse Attention (DSA): A new highly efficient attention mechanism that significantly reduces computational overhead while preserving model quality, purpose-built for long-context reasoning and high-throughput workloads. • Scalable Reinforcement Learning Framework: DeepSeek-V3.2 leverages a robust RL training protocol and expanded post-training compute to reach GPT-5-level performance. Its high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and demonstrates reasoning capabilities comparable to Gemini-3.0-Pro. • Large-Scale Agentic Task Synthesis Pipeline: To enable reliable tool-use and multi-step decision-making, we develop a novel agentic data synthesis pipeline that generates high-quality interactive reasoning tasks at scale, greatly enhancing the model’s agentic capabilities. File Support: Text, Markdown and PDF files. Context window: 164k tokens
poe - glm-4.6v - - GLM-4.6V represents a significant multimodal advancement in the GLM series, achieving state-of-the-art visual understanding accuracy for models of its parameter scale. Notably, it's the first visual model to natively integrate Function Call capabilities directly into its architecture, creating a seamless pathway from visual perception to executable actions. This breakthrough establishes a unified technical foundation for deploying multimodal agents in real-world business applications. File Support: Text, Markdown, Image and PDF files. Context window: 131k tokens. Optional parameters: Enable Thinking - toggle this on for the model to think before providing a response; this is disabled by default. Temperature - controls randomness in the response; lower values make the output more focused and deterministic; select from the 0 to 2 range (default: 0.7). Max Output Tokens - maximum number of tokens to generate in the response; can be set from 1 to 32768 (default: 32768).
poe - gpt-5.1-codex 1.10 9.00 GPT‑5.1‑Codex extends GPT‑5.1’s capabilities for software development. It understands complex codebases, provides accurate completions, explains algorithms, and assists with debugging across modern programming languages. Designed for developers, it elevates productivity and supports full‑stack coding workflows with precision. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
poe - gpt-5-pro 14.00 110.00 OpenAI’s latest flagship model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Use `--web_search true` to enable web search and real-time information access; this is disabled by default. GPT-5-Pro thinks long and hard; when using this bot through the API, consider increasing your request timeouts.
poe - gpt-5-chat 1.10 9.00 ChatGPT-5 points to the non-reasoning model GPT-5 snapshot (gpt-5-chat-latest) currently used in ChatGPT. Supports native vision, 400k tokens of context, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount.
poe - claude-code - - A powerful assistant that can read, write, and analyze files across many formats. It can also delegate to other Poe bots to handle complex, multi-step tasks. Built on the Claude Agent SDK from Anthropic.
poe - grok-4.1-fast-reasoning - - Grok-4.1-Fast-Reasoning is a high-performance version of xAI’s Grok 4.1 Fast, the company’s best agentic tool‑calling model. It works great in real-world use cases like customer support, deep research, and advanced analytical reasoning. Equipped with a 2M‑token context window, this model processes vast information seamlessly, delivering coherent, context‑aware, and deeply reasoned insights at exceptional speed.
poe - zai-glm-4.6-cs 19,000.00 - World’s fastest inference for ZAI GLM 4.6 with Cerebras. ZAI GLM 4.6 is a high‑performance AI model designed for advanced reasoning, superior coding, and effective tool use. It supports structured outputs, parallel tool calling, and real‑time streaming responses. Optimized for agentic coding and automation tasks, the model delivers strong real‑world performance with a context window of up to 131K tokens and output up to 40K tokens. For more information see: https://inference-docs.cerebras.ai/models/zai-glm-46 Context Limit: 131k
poe - gpt-5.1-codex-max 1.10 9.00 OpenAI's most capable agentic coding model; recommended for use in agentic harnesses or similar environments (e.g. Cursor, Claude Code, Codex). The default reasoning effort is set to `Xhigh`, so the model will reason extensively on problems given to it (i.e., expect long generation times and high point usage). Accepts image attachments.
poe - gpt-5.1-codex-mini 0.22 1.80 GPT‑5.1‑Codex‑Mini is a lightweight, fast, and efficient code‑generation model derived from GPT‑5.1‑Codex. It’s optimized for quick iterations, smaller environments, and edge applications—offering strong coding assistance with lower computational cost while maintaining accuracy and utility. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
poe - gpt-4o - - OpenAI's GPT-4o answers user prompts with natural, engaging & tailored writing and strong overall world knowledge. Uses GPT-Image-1 to create and edit images conversationally. For fine-grained image generation control (e.g. image quality), use https://poe.com/GPT-Image-1. Supports a context window of 128k tokens. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - nano-banana-pro 1.70 10.00 Nano Banana Pro (Gemini 3 Pro Image Preview) can make detailed, context-rich visuals, precisely edit or restyle input images with exceptional fidelity, and even generate legible text in images in multiple languages. Optional parameters: `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9) sets the aspect ratio of the output image. `--web_search true` enables web search and real-time information access; this is disabled by default. `--image_only` (default: false) determines whether to only generate image output. `--image_size` (options: 1K, 2K, 4K) sets the resolution of the image. Note: simply enabling `--image_only` will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
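Illustrative example (invented prompt; flags as documented above): `A hand-lettered café chalkboard menu in French, warm morning light --aspect_ratio 4:5 --image_size 2K`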
poe - nano-banana 0.21 1.80 Google DeepMind's Nano Banana (i.e. the Gemini 2.5 Flash Image model) offers image generation and editing capabilities, with state-of-the-art performance in photo-realistic multi-turn edits at exceptional speed. Supports a maximum input context of 32k tokens. Optional parameters: `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9) sets the aspect ratio of the output image. `--image_only` (default: false) determines whether to only generate image output. Note: simply enabling `--image_only` will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
poe - grok-4.1-fast-non-reasoning - - Grok-4.1-Fast-Non-Reasoning is a streamlined companion to Grok 4.1 Fast, xAI’s best agentic tool‑calling model. It has a 2M context window and high responsiveness but is optimized for non‑reasoning tasks, excelling at text generation, summarization, and automated workflows that demand speed and efficiency over deep logic. Ideal for high-throughput use cases like customer support automation, bulk content creation, and fast conversational responses.
poe - gpt-5 1.10 9.00 OpenAI’s most advanced general model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - gpt-5-nano 0.04 0.36 GPT-5 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 400k input tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - gpt-5-mini 0.22 1.80 GPT-5 mini is a small, fast & affordable model that matches or beats GPT-4.1 in many intelligence and vision-related tasks. Supports 400k tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - o3-pro 18.00 72.00 o3-pro is a well-rounded and powerful model across domains, with more capability than https://poe.com/o3 at the cost of higher price and lower speed. It is especially capable at math, science, coding, visual reasoning tasks, technical writing, and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - gemini-2.5-flash-lite 0.07 0.28 A lightweight Gemini 2.5 Flash reasoning model optimized for cost efficiency and low latency. Supports web search. Supports 1 million tokens of input context. Serves the latest `gemini-2.5-flash-lite-preview-09-2025` snapshot. For more complex queries, use https://poe.com/Gemini-2.5-Pro or https://poe.com/Gemini-2.5-Flash. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gpt-5-codex 1.10 9.00 GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. It supports multimodal inputs such as images or screenshots for UI development and a 400k token context window. We recommend using GPT-5-Codex only for agentic and interactive coding use cases. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high"
poe - grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast Non-Reasoning is designed for fast, efficient tasks like content generation with a 2M token context window. Combining cutting-edge performance with cost-efficiency, it ensures high-quality results for simpler, everyday applications.
poe - qwen-3-next-80b-think 3,000.00 - The Qwen3-Next-80B-Think (with thinking mode enabled by default) is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B-Thinking." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. This is the thinking version of https://poe.com/Qwen3-Next-80B; supports 65k tokens of context. Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters: 1. Enable/Disable Thinking - this will cause the model to think about the response before giving a final answer. Technical Specifications: File Support: PDF, DOC and XLSX files. File Attachment Limitation: audio, video and image files are not accepted. Context Window: 65k tokens
poe - qwen3-next-80b 2,400.00 - The Qwen3-Next-80B is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks - while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. Use `--enable_thinking false` to disable thinking mode before giving an answer. This is the non-thinking version of https://poe.com/Qwen3-Next-80B-Think; supports 65k tokens of context.
poe - deepseek-v3.2-exp 3,900.00 - DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality. This delivers substantial computational efficiency improvements without compromising accuracy. Comprehensive benchmarks confirm V3.2-Exp matches V3.1-Terminus performance, proving efficiency gains don't sacrifice capability. As both a powerful tool and research platform, it establishes new paradigms for efficient long-context AI processing. Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters: 1. Enable/Disable Thinking - this will cause the model to think about the response before giving a final answer. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 160k tokens
poe - nova-pro-1.0 - - Amazon Nova Pro 1.0 is a highly capable multimodal foundation model from Amazon Nova, offering a strong balance of accuracy, speed, and cost for processing text, images, and video. Its context window is 300,000 tokens, which enables handling very large inputs (including up to ~30 minutes of video input) in a single request. Use `--enable_latency_optimized [true/false]` (default: false) to enable or disable latency-optimized inference. Note that if enabled, costs may increase; check the rate card for more information.
poe - nova-premier-1.0 - - The Amazon Nova Premier 1.0 model is Amazon’s most capable foundation model, able to handle extremely long contexts (≈ 1 million tokens) and multimodal inputs like text, images, and video while excelling at complex, multi‑step tasks across tools and data sources. It supports chain‑of‑thought style reasoning and breaks down problems into intermediate steps before arriving at an answer, improving coherence and accuracy. Use `--enable_thinking [true/false]` (default: true) to enable/disable thinking accordingly.
poe - grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast Reasoning delivers exceptional performance for tasks requiring logical thinking and problem-solving. With a 2M token context window and state-of-the-art cost-efficiency, it handles complex reasoning tasks with accuracy and speed, making advanced AI capabilities accessible to more users.
poe - nova-micro-1.0 - - Amazon Nova Micro is a text-only foundation model in the Amazon Nova family, designed for ultra‑low latency and very low cost, optimized for tasks like summarization, translation, and interactive chat. It supports a context window of 128,000 tokens, enabling handling of large text inputs in a single request.
poe - nova-lite-1.0 - - Amazon Nova Lite is a low‑cost multimodal foundation model from Amazon that can process text, images, and video and is optimized for speed and affordability. It offers a context window of 300,000 tokens, allowing handling of very large inputs in a single request (including up to ~30 minutes of video).
poe - minimax-m2 3,300.00 - MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 200k tokens
poe - hunyuan-image-3 - - Hunyuan Image 3.0 is Tencent’s next‑generation open‑source text-to-image model that uses a large multimodal Mixture-of-Experts architecture to unify image understanding and generation in one system. It produces high-fidelity, often photorealistic images with strong prompt adherence, multilingual text rendering, and intelligent world-knowledge reasoning that can enrich sparse prompts with appropriate visual details. Note: uploading attachments is not supported. Parameter controls available: 1. Image Settings - Size / Aspect Ratio - default: `--size 1024x1024` (Square 1:1); `--size 768x1024` (Portrait 3:4); `--size 1024x768` (Landscape 4:3); `--size 1024x1536` (Tall Portrait 2:3); `--size 1536x1024` (Wide Landscape 3:2); `--size 512x512` (Small Square 1:1). 2. Quantity - `--num_images [1-4]` number of images to generate (default: 1). 3. Quality & Generation - `--num_inference_steps [10-50]` denoising steps for quality (default: 28; higher = better quality but slower); `--guidance_scale [1.0-20.0]` how closely to follow the prompt (default: 7.5). 4. Customization - `--negative_prompt "text"` things to avoid in generated images; `--seed [integer]` reproducible generation with a fixed seed (e.g., 42).
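Illustrative example (invented prompt; flags as documented above): `A misty mountain monastery at dawn in ink-wash style --size 1536x1024 --num_inference_steps 40 --negative_prompt "watermark, text" --seed 42`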
poe - kling-image-o1 - - Kling Image O1 image generation and image editing bot. Send up to 10 images to use as references, and refer to each image with $image1, $image2, etc. in the prompt to specify interactions. Set resolution with `--resolution` and aspect ratio with `--aspect`. Note: `auto` aspect ratio is the default and can be used only for editing; text-to-image generation has a default of `1:1`. Supports jpeg, png, heic, webp images.
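Illustrative example (invented prompt, sent with two attached images; syntax as documented above): `Place the ceramic mug from $image1 onto the desk in $image2, matching the lighting --aspect 3:2`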
poe - kling-2.6-pro - - Generate high-quality videos with native audio from text and images using Kling 2.6 Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`; only works for text-to-video). Use `--duration` to set either a 5 or 10 second video. Use `--silent` to generate a silent video.
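Illustrative example (invented prompt; flags as documented above): `A koi pond rippling in gentle rain with soft ambient sound --aspect 16:9 --duration 5 --cfg_scale 0.5`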
poe - flux-2-pro - - Flux.2 [Pro] is Black Forest Labs' state-of-the-art model with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows the use of hex color codes within the prompt for precise coloring. Send up to 8 images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 9 megapixels. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - flux-2-flex - - Flux.2 [Flex] is Black Forest Labs' latest model, with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows the use of hex color codes within the prompt for precise coloring. Send images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 14 megapixels. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - flux-2-dev - - A 32B open-weight image-generation model derived from the FLUX.2 base model. The most powerful open-weight image generation and editing model available today, combining text-to-image synthesis and image editing with multiple input images in a single checkpoint. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - mistral-medium-3.1 - - Mistral Medium 3.1 is a high-performance, enterprise-grade language model that delivers strong reasoning, coding, and STEM capabilities. It supports hybrid, on-prem, and in-VPC deployments, offering competitive accuracy and easy integration across cloud environments. Context Length: 131k
poe - exa-answer - - Get a quick LLM-style answer to a question informed by Exa search results. For more in-depth results, consider using the following endpoint: https://poe.com/Exa-Research Supported file types for upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported. Parameter controls available: `--text false/true` shows text snippets under each source citation (default: false)
poe - exa-search - - Utilize Exa's technology for searching web pages, finding similar web pages, crawling, and more. Note: This endpoint does not return an LLM-style response (visit the following if you want an LLM-style response: https://poe.com/Exa-Answer or https://poe.com/Exa-Research). File upload is not supported. Parameter Controls Available: 1. Operation Mode - Default: `--operation search` (Web Search) - For finding similar pages: `--operation similar` - For getting page contents: `--operation contents` - For code search: `--operation code` 2. Search Settings (search operation) - `--search_type [auto|neural|deep|fast]` search algorithm (default: auto) - `--show_content` display full page content in results - `--include_domains` comma-separated domains to include - `--include_text` text that must appear (up to 5 words) - `--exclude_text` text that must NOT appear (up to 5 words) 3. Common Search Settings (search & similar operations) - `--num_results [1-100]` number of results to return (default: 10) - `--category [company|research paper|news|pdf|github|tweet|personal site|linkedin profile|financial report]` - `--exclude_domains` comma-separated domains to exclude 4. Date Filters (search operation) - `--start_crawl_date` results crawled after this date (ISO 8601) - `--end_crawl_date` results crawled before this date (ISO 8601) - `--start_published_date` content published after this date (ISO 8601) - `--end_published_date` content published before this date (ISO 8601) 5. Content Options (search, similar, & contents operations) - `--return_text` fetch page text content (default: true) - `--text_max_chars` limit text length (empty = unlimited) - `--include_html_tags` preserve HTML structure - `--return_highlights` get AI-selected key snippets - `--highlights_sentences [1-10]` sentences per highlight (default: 3) - `--highlights_per_url [1-10]` highlights per result (default: 3) - `--highlights_query` guide highlight selection - `--return_summary` get AI-generated summaries - `--summary_query` guide summary generation 6. Advanced Options (search, similar, & contents operations) - `--livecrawl [fallback|never|always|preferred]` when to fetch fresh content (default: fallback) - `--subpages [0-10]` number of linked subpages to crawl (default: 0) - `--subpage_target` find specific subpages matching keyword 7. Code Search Controls (code operation) - `--code_tokens [dynamic|5000|10000|20000]` response length (default: dynamic)
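Illustrative example (invented query; flags as documented above): `rust async runtimes --operation search --search_type neural --num_results 5 --category github --return_summary`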
poe - exa-research - - Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations. Note: Responses may take several minutes to complete depending on complexity. Supported file types for upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported. Parameter Controls Available: Model Selection - `--model exa-research` (Standard, default) - `--model exa-research-pro` (Deepest, highest quality) - `--model exa-research-fast` (Fastest, lightest)
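Illustrative example (invented prompt; flag as documented above): `Survey the current state of solid-state battery manufacturing and the key players --model exa-research-pro`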
poe - kat-coder-pro - - KAT-Coder-Pro V1 by KwaiKAT is a non-reasoning model optimized for agentic coding. It delivers strong performance on reasoning-style tasks while requiring significantly fewer output tokens than peer models. With the 1210 release, it achieved a score of 64 on the Artificial Analysis Intelligence Index, placing it in the global Top 10 and ranking first among all non-reasoning models. File Support: Text, Markdown and PDF files. Context window: 256k tokens
poe - deepseek-v3.2-fw 5,300.00 - Model from DeepSeek that harmonizes high computational efficiency with superior reasoning and agent performance. File Support: Image (JPG, JPEG, PNG, HEIC), Other File Types (PDF, PYTHON, XLSX)
poe - nova-lite-2 - - Amazon Nova 2 Lite is a fast, cost-effective multimodal reasoning model from Amazon that can process text, images, documents, and video, designed for everyday workloads like chatbots, document processing, and business automation. It offers a 1 million token context window, enabling very large, complex inputs in a single request, including long documents and extended video clips (~90 minutes). Note: Video file uploads are limited to ~1GB. Also note that reasoning traces are not exposed from AWS. Supported file types: JPEG, PNG, GIF, WEBP, PDF, DOCX, TXT, MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP Parameter controls available: `--enable_reasoning true/false` enables step-by-step reasoning (default: true). `--reasoning_effort low/medium/high` specifies the reasoning effort level (default: medium).
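Illustrative example (invented prompt, sent with a PDF attached; flags as documented above): `Summarize the attached contract and flag any unusual clauses --enable_reasoning true --reasoning_effort high`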
poe - gpt-oss-120b-t 1,500.00 - OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. Built with community feedback and released under Apache 2.0, this 120B parameter model provides transparency, customization, and deployment flexibility for organizations requiring complete data security & privacy control.
poe - gpt-oss-20b-t 450.00 - OpenAI's GPT-OSS-20B provides powerful chain-of-thought reasoning in an efficient 20B parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0 licensed model offers the perfect balance of performance and resource efficiency for diverse applications.
poe - amazon-nova-reel-1.1 - - Amazon Nova Reel 1.1 is an advanced AI video generation model that creates up to 2-minute multi-shot videos from text and optional image prompts, offering improved video quality, latency, and visual consistency compared to its predecessor.
poe - kimi-k2-think-t 13,000.00 - Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, K2 Thinking dramatically scales multi-step reasoning depth while maintaining stable tool-use across 200–300 sequential calls — a breakthrough in long-horizon agency with native INT4 quantization for 2x inference speed. Supported File Types: JPEG, PNG, PDF
poe - amazon-nova-canvas - - Amazon Nova Canvas is a high-quality image‐generation model that creates and edits images from text or image inputs—offering features like inpainting/outpainting, virtual try‑on, style controls, and background removal—all with built‑in customization.
poe - kimi-k2 6,300.00 - Kimi K2-Instruct-0905 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Key Features: - Large-Scale Training: pre-trained as a 1T-parameter MoE model on 15.5T tokens with zero training instability. - MuonClip Optimizer: the Muon optimizer is applied at an unprecedented scale, with novel optimization techniques developed to resolve instabilities while scaling up. - Agentic Intelligence: specifically designed for tool use, reasoning, and autonomous problem-solving. Technical Specifications: File Support: attachments not supported. Context window: 256k tokens
poe - kimi-k2-0905-t 11,000.00 - The new Kimi K2-0905 model from Moonshot AI features a massive 256,000-token context window, double the length of its predecessor (Kimi K2), along with greatly improved coding abilities and front-end generation accuracy. It boasts 1 trillion total parameters (with 32 billion activated at a time) and claims 100% tool-call success in real-world tests, setting a new bar for open-source AI performance in complex, multi-step tasks
poe - kimi-k2-t 11,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
poe - kimi-k2-instruct 6,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Uses the latest September 5th, 2025 snapshot. The updated version has improved coding abilities, agentic tool use, and a longer (256K) context window.
poe - deepseek-v3.1 7,800.00 - Latest Update: Terminus Enhancement. This model has been updated with the Terminus release, addressing key user-reported issues while maintaining all original capabilities: - Language consistency: reduced instances of mixed Chinese-English text and abnormal characters. - Enhanced agent capabilities: optimized performance of the Code Agent and Search Agent. Core Capabilities: DeepSeek-V3.1 is a hybrid model supporting both thinking mode and non-thinking mode, built upon the original V3 base checkpoint through a two-phase long-context extension approach. Technical Specifications: Context Window: 128k tokens. File Support: PDF, DOC, and XLSX files. File Restrictions: does not accept audio and video files
poe - glm-4.6-fw 6,000.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
poe - deepseek-v3.1-t 6,000.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects: - Hybrid thinking mode: one model supports both thinking mode and non-thinking mode by changing the chat template. - Smarter tool calling: through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
poe - glm-4.5 5,700.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. Technical Specifications: File Support: PDF and Markdown files. Context window: 128k tokens
poe - deepseek-v3.1-n 5,700.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects: - Hybrid thinking mode: one model supports both thinking mode and non-thinking mode by changing the chat template. - Smarter tool calling: through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly. Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - qwen3-coder 9,000.00 - Qwen3 Coder 480B A35B Instruct is a state-of-the-art 480B-parameter Mixture-of-Experts model (35B active) that achieves top-tier performance across multiple agentic coding benchmarks. Supports 256K native context length and scales to 1M tokens with extrapolation. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company.
poe - claude-sonnet-4 2.60 13.00 Claude Sonnet 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 1M context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-opus-4 13.00 64.00 Claude Opus 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-opus-4-reasoning 13.00 64.00 Claude Opus 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-sonnet-4-reasoning 2.60 13.00 Claude Sonnet 4 from Anthropic; supports a customizable thinking budget (up to 60k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 61,440 to the end of your message.
poe - o4-mini 0.99 4.00 o4-mini provides high intelligence on a variety of tasks and domains, including science, math, and coding at an affordable price point. This bot uses medium reasoning effort by default, but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high".
poe - gemini-deep-research 1.60 9.60 Gemini Deep Research plans, executes, and synthesizes complex, multi-step investigations by querying the web and other data to produce detailed, structured reports. Offers best-in-the-world performance on Google's newly released DeepSearchQA benchmark as of December 2025. Be sure to give your entire research request in the initial prompt and include as much detail as you can! Use the `--interaction_id` flag if you want to continue the discussion in a previous research task.
poe - o4-mini-deep-research 1.80 7.20 Deep Research from OpenAI powered by the o4-mini model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
poe - glm-4.5-air-t 2,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
poe - glm-4.5-fw 5,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters. It unifies reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
poe - grok-3 - - xAI's February 2025 flagship release representing nearly state-of-the-art performance in several reasoning/problem solving domains. The API doesn't yet support reasoning mode for Grok 3, but does for https://poe.com/Grok-3-Mini; this bot also doesn't have access to the X data feed. Supports 131k tokens of context, uses Grok 2 for native vision.
poe - grok-3-mini - - xAI's February 2025 release with strong performance across many domains but at a more affordable price point. Supports reasoning with a configurable reasoning effort level, and 131k tokens of context; doesn't have access to the X data feed. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low" or "high".
poe - o3 1.80 7.20 o3 provides state-of-the-art intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - o3-deep-research 9.00 36.00 Deep Research from OpenAI powered by the o3 model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
poe - elevenlabs-v3 - - ElevenLabs v3 is a cutting-edge text-to-speech model that brings scripts to life with remarkable realism and performance-level control. Unlike traditional TTS systems, it allows creators to shape the emotional tone, pacing, and soundscape of their audio through the use of inline audio tags. These tags are enclosed in square brackets and act as stage directions—guiding how a line is spoken or what sound effects are inserted—without being spoken aloud. This enables rich, expressive narration and dialogue for applications like audiobooks, games, podcasts, and interactive media. Whether you’re aiming for a tense whisper, a sarcastic remark, or a dramatic soundscape full of explosions and ambient effects, v3 gives you granular control directly in the text prompt. This bot will also run text-to-speech on PDF attachments / URL links. Examples of voice delivery tags include: * [whispers] I have to tell you a secret. * [angry] That was *never* the plan. * [sarcastic] Oh, sure. That’ll totally work. * and [laughs] You're hilarious. Examples of sound effect tags are: * [gunshot] Get down! * [applause] Thank you, everyone. * and [explosion] What was that?! These can also be combined. Multiple speakers can be supported via the parameter control. Dialogue for multiple speakers must follow the format, e.g. for 3 speakers: Speaker 1: [dialogue] Speaker 2: [dialogue] Speaker 3: [dialogue] Speaker 1: [dialogue] Speaker 2: [dialogue] --speaker_count 3 --voice_1 [voice_1] --voice_2 [voice_2] --voice_3 [voice_3] The following voices are supported: Alexandra - Conversational & Real Amy - Young & Natural Arabella - Mature Female Narrator Austin - Good Ol' Texas Boy Blondie - Warm & Conversational Bradford - British Male Storyteller Callum - Gravelly Yet Unsettling Charlotte - Raspy & Sensual Chris - Down-to-Earth Coco Li - Shanghainese Female Gaming - Unreal Tonemanagement 2003 Harry - Animated Warrior Hayato - Soothing Zen Male Hope - Upbeat & Clear James - Husky & Engaging James Gao - Calm Chinese Voice Jane - Professional Audiobook Reader Jessica - Playful American Female Juniper - Grounded Female Professional Karo Yang - Youthful Asian Male Kuon - Acute Fantastic Female Laura - Quirky Female Voice Liam - Warm, Energetic Youth Monika Sogam - Indian-English Accent Nichalia Schwartz - Engaging Female American Priyanka Sogam - Late-Night Radio Reginald - Brooding, Intense Villain ShanShan - Young, Energetic Female Xiao Bai - Shrill & Annoying Prompt input cannot exceed 5,000 characters.
poe - deepseek-v3 12,000.00 - DeepSeek-V3 – the new top open-source LLM. Updated to the March 24, 2025 checkpoint. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to Together, a US-based company. Supports 131k context window and max output of 12k tokens.
poe - deepseek-v3-fw 9,000.00 - DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model; able to perform well on competitive benchmarks with cost-effective training & inference. All data submitted to this bot is governed by the Poe privacy policy and is sent to Fireworks, a US-based company. Supports 131k context window and max output of 131k tokens. Updated to serve the latest March 24th, 2025 snapshot.
poe - deepseek-v3.1-tm 5,700.00 - DeepSeek-V3.1-Terminus preserves all original model capabilities while resolving key user-reported issues, including: - Language consistency: significantly reducing mixed Chinese-English output and eliminating abnormal character occurrences. - Agent performance: enhanced optimization of both Code Agent and Search Agent functionality. Use `--enable_thinking false` to disable thinking about the response before giving a final answer. The bot does not accept attachments; it also does not support billing logic. Context window: 128k tokens.
poe - gpt-4.1 1.80 7.20 OpenAI’s GPT-4.1 significantly improves on past models in terms of its coding skills, long context (1M tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4o. Provides a 75% chat history cache discount. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-4.1-mini 0.36 1.40 GPT-4.1 mini is a small, fast & affordable model that matches or beats GPT-4o in many intelligence and vision-related tasks. Supports 1M tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
poe - gpt-4.1-nano 0.09 0.36 GPT-4.1 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 1M input tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-nano.
poe - llama-4-scout-t 1,000.00 - Llama 4 Scout, a fast long-context multimodal model from Meta. A 16-expert MoE model that excels at multi-document analysis, codebase reasoning, and personalized tasks. Smaller than Maverick but state-of-the-art for its size, with text + image input support. Supports 300k context.
poe - claude-opus-4-search 13.00 64.00 Claude Opus 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-sonnet-4-search 2.60 13.00 Claude Sonnet 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-sonnet-3.7 2.60 13.00 Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. For maximum extended thinking, please use https://poe.com/Claude-Sonnet-Reasoning-3.7. Supports a 200k token context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 16,384 to the end of your message.
poe - claude-sonnet-3.5 2.60 13.00 Anthropic's Claude Sonnet 3.5 using the October 22, 2024 model snapshot. Excels in complex tasks like coding, writing, analysis and visual processing. Has a context window of 200k tokens (approximately 150k English words).
poe - claude-haiku-3.5 0.68 3.40 The latest generation of Anthropic's fastest model. Claude Haiku 3.5 has fast speeds and improved instruction following.
poe - gemini-2.0-flash 0.10 0.42 Gemini 2.0 Flash is Google's most popular model yet with enhanced performance and blazingly fast response times; supports web search grounding, so it can intelligently answer questions related to recent events. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. Supports 1 million tokens of input context. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gemini-2.0-flash-lite 0.05 0.21 Gemini 2.0 Flash Lite is a new model variant that is Google's most cost-efficient yet, often considered a spiritual successor to Gemini 1.5 Flash in terms of capability, context window size and cost. Does not support web search (if you need search, we recommend using https://poe.com/Gemini-2.0-Flash); supports 1 million tokens of input context.
poe - claude-sonnet-3.7-search 2.60 13.00 Claude Sonnet 3.7 with access to real-time information from the web. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-haiku-3.5-search 0.68 3.40 Claude Haiku 3.5 with access to real-time information from the web.
poe - qwen3-max - - Qwen3-Max is a major update to the Qwen3 series, delivering significant improvements in reasoning, instruction following, and multilingual support. It provides higher accuracy in complex tasks like coding and math, along with reduced hallucinations and better performance on open-ended questions. This model is served by Alibaba Cloud Int. from Singapore.
poe - gpt-oss-120b 1,200.00 - OpenAI introduces the GPT-OSS-120B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - gpt-oss-20b 450.00 - OpenAI introduces the GPT-OSS-20B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-20B model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - gpt-oss-120b-cs 3,200.00 - World’s fastest inference for GPT OSS 120B with Cerebras. OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. The bot does not accept video, PPT, DOCX, or Excel files.
poe - openai-gpt-oss-120b 1,500.00 - GPT-OSS-120b is a high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs.
poe - openai-gpt-oss-20b 750.00 - GPT-OSS-20B is a compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as 120B, with faster inference and easier deployment that is ideal for specialized or offline use cases, fast responsive performance, chain-of-thought output, and agentic workflows.
poe - qwen3-next-instruct-t 2,400.00 - Qwen3-Next Instruct features a highly sparse MoE structure that activates only 3B of its 80B parameters during inference. Supports only instruct mode without thinking blocks, delivering performance on par with Qwen3-235B-A22B-Instruct-2507 on certain benchmarks while using less than 10% training cost and providing 10x+ higher throughput on contexts over 32K tokens.
poe - qwen3-next-think-t 3,000.00 - Qwen3-Next Thinking features the same highly sparse MoE architecture but specialized for complex reasoning tasks. Supports only thinking mode with automatic tag inclusion, delivering exceptional analytical performance while maintaining extreme efficiency with 10x+ higher throughput on long contexts and may generate longer thinking content than predecessors.
poe - qwen3-max-n 22,000.00 - Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. File Support: Text, Markdown and PDF files Context window: 256k tokens
poe - qwen3-vl-235b-a22b-t 4,800.00 - Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment. Key Features: - Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks. - Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos. - Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis. - Long Context: Processes up to 1M tokens for books or long videos. - Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers. - Visual Recognition: Recognizes a wide range of objects, landmarks, and more. - OCR: Supports 32 languages with improved performance in challenging conditions. - Text-Vision Fusion: Achieves seamless, unified comprehension. Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
poe - qwen3-vl-235b-a22b-i 3,600.00 - This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment. Key Enhancements: Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos. Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers. Upgraded Visual Recognition: Broader, higher-quality pretraining is able to "recognize everything"—celebrities, anime, products, landmarks, flora/fauna, etc. Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
poe - qwen-3-235b-2507-t 1,900.00 - Qwen3 235B A22B 2507 is currently among the strongest instruct (non-reasoning) models, closed or open source. It excels in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It is also great at multilingual tasks and supports a long context window (262k).
poe - qwen3-235b-2507-fw 2,700.00 - State-of-the-art language model with exceptional math, coding, and problem-solving performance. Operates in non-thinking mode, and does not generate <think></think> blocks in its output. Supports 256k tokens of native context length. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company. Uses the latest July 21st, 2025 snapshot (Qwen3-235B-A22B-Instruct-2507).
poe - qwen3-235b-2507-cs 6,000.00 - World's fastest inference with Qwen3 235B Instruct (2507) model with Cerebras. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage.
poe - qwen3-coder-480b-t 17,000.00 - Qwen3‑Coder‑480B is a state of the art mixture‑of‑experts (MoE) code‑specialized language model with 480 billion total parameters and 35 billion activated parameters. Qwen3‑Coder delivers exceptional performance across code generation, function calling, tool use, and long‑context reasoning. It natively supports up to 262,144‑token context windows, making it ideal for large repository and multi‑file coding tasks.
poe - qwen3-coder-480b-n 7,200.00 - Qwen3-Coder-480B-A35B-Instruct delivers Claude Sonnet-comparable performance on agentic coding and browser tasks while supporting 256K-1M token long-context processing and multi-platform agentic coding capabilities. Technical Specifications File Support: Attachments not supported Context window: 256k tokens
poe - qwen3-235b-a22b-di 1,900.00 - Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP8.
poe - qwen3-235b-a22b-n 1,800.00 - It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). The bot does not currently support attachments. It features the following key enhancements: - Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. - Substantial gains in long-tail knowledge coverage across multiple languages. - Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. - Enhanced capabilities in 256K long-context understanding. Technical Specifications File Support: Attachments not supported Context window: 128k tokens
poe - magistral-medium-2509-thinking - - Magistral Medium 2509 (thinking) by EmpirioLabs. Magistral is Mistral's first reasoning model. It is ideal for general-purpose use requiring longer thought processing and better accuracy than non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling, this model solves multi-step challenges where transparency and precision are critical. Context Window: 40k Supported file type uploads: PDF, XLSX, TXT, PNG, JPG, JPEG
poe - o1 14.00 54.00 OpenAI's o1 is designed to reason before it responds and provides world-class capabilities on complex tasks (e.g. science, coding, and math). Improving upon o1-preview and with higher reasoning effort, it is also capable of reasoning through images and supports 200k tokens of input context. By default, uses reasoning_effort of medium, but low, medium & high are also selectable.
poe - o1-pro 140.00 540.00 OpenAI’s o1-pro is a highly capable reasoning model, tailored for complex, compute- or context-heavy tasks, dedicating additional thinking time to deliver more accurate, reliable answers. For less complex tasks at lower cost, https://poe.com/o3-mini is recommended. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
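For example (hypothetical prompt; the flag and values are as documented above): `Design a migration plan for this database schema and list the risks --reasoning_effort high`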
poe - cartesia-ink-whisper - - Transcribe audio files using Speech-to-Text with the Cartesia Ink Whisper model. Select the Language (`--language`) of your audio file in Settings. Default is English (en). Supported Languages: English (en) Chinese (zh) German (de) Spanish (es) Russian (ru) Korean (ko) French (fr) Japanese (ja) Portuguese (pt) Turkish (tr) Polish (pl) Catalan (ca) Dutch (nl) Arabic (ar) Swedish (sv) Italian (it) Indonesian (id) Hindi (hi) Finnish (fi) Vietnamese (vi) Hebrew (he) Ukrainian (uk) Greek (el) Malay (ms) Czech (cs) Romanian (ro) Danish (da) Hungarian (hu) Tamil (ta) Norwegian (no) Thai (th) Urdu (ur) Croatian (hr) Bulgarian (bg) Lithuanian (lt) Latin (la) Maori (mi) Malayalam (ml) Welsh (cy) Slovak (sk) Telugu (te) Persian (fa) Latvian (lv) Bengali (bn) Serbian (sr) Azerbaijani (az) Slovenian (sl) Kannada (kn) Estonian (et) Macedonian (mk) Breton (br) Basque (eu) Icelandic (is) Armenian (hy) Nepali (ne) Mongolian (mn) Bosnian (bs) Kazakh (kk) Albanian (sq) Swahili (sw) Galician (gl) Marathi (mr) Punjabi (pa) Sinhala (si) Khmer (km) Shona (sn) Yoruba (yo) Somali (so) Afrikaans (af) Occitan (oc) Georgian (ka) Belarusian (be) Tajik (tg) Sindhi (sd) Gujarati (gu) Amharic (am) Yiddish (yi) Lao (lo) Uzbek (uz) Faroese (fo) Haitian Creole (ht) Pashto (ps) Turkmen (tk) Nynorsk (nn) Maltese (mt) Sanskrit (sa) Luxembourgish (lb) Myanmar (my) Tibetan (bo) Tagalog (tl) Malagasy (mg) Assamese (as) Tatar (tt) Hawaiian (haw) Lingala (ln) Hausa (ha) Bashkir (ba) Javanese (jw) Sundanese (su) Cantonese (yue)
poe - chatgpt-4o-latest 4.50 14.00 Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Supports a context window of 128k tokens; cannot generate images.
poe - gpt-4o-mini 0.14 0.54 This intelligent small model from OpenAI is significantly smarter, cheaper, and just as fast as GPT-3.5 Turbo. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
poe - glm-4.6-t 6,600.00 - GLM-4.6 is the latest flagship model from Z.ai's GLM series, delivering state-of-the-art agentic and coding capabilities that rival Claude Sonnet 4. With 357B parameters in a Mixture-of-Experts architecture, an expanded 200K context window, and 30% improved token efficiency, GLM-4.6 represents the top-performing model developed in China.
poe - qwen3-max-preview - - A preview version of the Max model in the Tongyi Qianwen 3 series, achieving an effective integration of thinking and non-thinking modes. In thinking mode, there is a significant enhancement in capabilities such as intelligent agent programming, common-sense reasoning, and reasoning across mathematics, science, and general domains. This model is served by Alibaba Cloud Int. from Singapore. Notes: - Audio/Video files are not supported. - Max Context Window: 252k Use `--enable_thinking true/false` to enable/disable Deep Thinking accordingly.
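For example (hypothetical prompt; the flag is as documented above): `How many ways can 8 rooks be placed on a chessboard so that none attack each other? --enable_thinking true`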
poe - o3-mini 0.99 4.00 o3-mini is OpenAI's reasoning model, providing high intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high can be selected; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - o3-mini-high 0.99 4.00 o3-mini-high is OpenAI's most recent reasoning model with reasoning_effort set to high, providing frontier intelligence on most tasks. Like other models in the o-series, it is designed to excel at science, math, and coding tasks. Supports 200k tokens of input context and 100k tokens of output context.
poe - llama-3.1-8b-di 300.00 - The smallest and fastest model from Meta's Llama 3.1 family. This open-source language model excels in multilingual dialogue, outperforming numerous industry benchmarks for both closed and open-source conversational AI systems. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Input token limit 128k, output token limit 8k. Quantization: FP16 (official).
poe - claude-sonnet-3.7-reasoning 2.60 13.00 Reasoning capabilities on by default. Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. Recommended for complex math or coding problems. Supports a 200k token context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - inception-mercury - - Mercury is the first diffusion large language model (dLLM). On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. A new generation of LLMs that push the frontier of fast, high-quality text generation.
poe - inception-mercury-coder - - Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder Small's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the blog post here: https://www.inceptionlabs.ai/introducing-mercury.
poe - mistral-medium-3 - - Mistral Medium 3 is a powerful, cost-efficient language model offering top-tier reasoning and multimodal performance. Context Window: 130k
poe - mistral-medium 2.70 8.10 Mistral AI's medium-sized model. Supports a context window of 32k tokens (around 24,000 words) and is stronger than Mixtral-8x7b and Mistral-7b on benchmarks across the board.
poe - llama-4-maverick-t 1,600.00 - Llama 4 Maverick, state of the art long-context multimodal model from Meta. A 128-expert MoE powerhouse for multilingual image/text understanding (12 languages), creative writing, and enterprise-scale applications—outperforming Llama 3.3 70B. Supports 500k tokens context.
poe - llama-3.3-70b-fw 4,200.00 - Meta's Llama 3.3 70B Instruct, hosted by Fireworks AI. Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
poe - llama-3.3-70b 3,900.00 - Llama 3.3 70B – with similar performance as Llama 3.1 405B while being faster and much smaller! Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
poe - deepseek-prover-v2 - - DeepSeek-Prover-V2 is an open-source large language model specifically designed for formal theorem proving in Lean 4. The model builds on a recursive theorem proving pipeline powered by the company's DeepSeek-V3 foundation model.
poe - deepseek-r1-fw 18,000.00 - State-of-the-art large reasoning model with problem-solving, math, and coding performance at a fraction of the cost; explains its chain of thought. All data you provide this bot will not be used in training, and is sent only to Fireworks AI, a US-based company. Supports 164k tokens of input context and 164k tokens of output context. Uses the latest May 28th, 2025 snapshot.
poe - deepseek-r1-di 6,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
poe - deepseek-r1-n 6,000.00 - The DeepSeek-R1 (latest Snapshot model DeepSeek-R1-0528) model features enhanced reasoning and inference capabilities through optimized algorithms and increased computational resources. It excels in mathematics, programming, and logic, with performance nearing top-tier models like o3 and Gemini 2.5 Pro. This bot does not accept attachments. Technical Specifications File Support: Attachments not supported Context window: 160k tokens
poe - llama-3.3-70b-n 1,400.00 - The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Technical Specifications File Support: Attachments not supported Context window: 128k tokens
poe - llama-3.3-70b-cs 7,800.00 - World’s fastest inference for Llama 3.3 70B with Cerebras. The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
poe - llama-3.1-70b-t 14,000.00 - Llama 3.1 70B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
poe - llama-3.1-8b-cs 900.00 - World’s fastest inference for Llama 3.1 8B with Cerebras. This Llama 8B instruct-tuned version is fast and efficient. The Llama 3.1 8B is an instruction tuned text only model, optimized for multilingual dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.
poe - gpt-researcher - - GPT Researcher is an agent that conducts deep research on any topic and generates a comprehensive report with citations. GPT Researcher is powered by Tavily's search engine. GPTR is based on the popular open source project: https://github.com/assafelovic/gpt-researcher -- by integrating Tavily search, it is optimized for curation and ranking of trusted research sources. Learn more at https://gptr.dev or https://tavily.com
poe - web-search - - Web-enabled assistant bot that searches the internet to inform its responses. Particularly good for queries regarding up-to-date information or specific facts. Powered by Gemini 2.0 Flash.
poe - gpt-4o-search 2.20 9.00 OpenAI's fine-tuned model for searching the web for real-time information. For less expensive messages, consider https://poe.com/GPT-4o-mini-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
poe - gpt-4o-mini-search 0.14 0.54 OpenAI's fine-tuned model for searching the web for real-time information. For higher-performance, consider https://poe.com/GPT-4o-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
poe - reka-research - - Reka Research is a state-of-the-art agentic AI that answers complex questions by browsing the web. It excels at synthesizing information from multiple sources, performing work in minutes that usually takes hours.
poe - perplexity-sonar - - Sonar by Perplexity is a cutting-edge AI model that delivers real-time, web-connected search results with accurate citations. It's designed to provide up-to-date information and customizable search sources, making it a powerful tool for integrating AI search into various applications. Context Length: 127k
poe - linkup-deep-search - - Linkup Deep Search is an AI-powered search bot that continues to search iteratively if it hasn't found sufficient information on the first attempt. Results are slower compared to its Standard search counterpart, but often more comprehensive. Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Context Window: 100k Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: To search only within specific domains use --include_domains, to exclude domains from the search results use --exclude_domains, to give domains higher priority in search use --prioritize_domains. 2. Date range: Use --from_date and --to_date to restrict the search date range, in YYYY-MM-DD format. 3. Content options: Use --include_image true to include relevant images in results and --image_count (up to 45) to set how many images to display. Learn more: https://www.linkup.so/
poe - linkup-standard - - Linkup Standard is an AI-powered search bot that provides detailed overviews and answers sourced from the web, helping you find high-quality information quickly and accurately. Results are faster compared to its Deep search counterpart. Context Window: 100k Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: To search only within specific domains use --include_domains, to exclude domains from the search results use --exclude_domains, to give domains higher priority in search use --prioritize_domains. 2. Date range: Use --from_date and --to_date to restrict the search date range, in YYYY-MM-DD format. 3. Content options: Use --include_image true to include relevant images in results and --image_count (up to 45) to set how many images to display. Learn more: https://www.linkup.so/
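As a combined illustration for both Linkup bots (the query and domain are hypothetical; the flags are as documented above): `Recent guidance on screen time for children --include_domains who.int --from_date 2024-01-01 --to_date 2024-12-31 --include_image true --image_count 3`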
poe - perplexity-sonar-pro - - Sonar Pro by Perplexity is an advanced AI model that enhances real-time, web-connected search capabilities with double the citations and a larger context window. It's designed for complex queries, providing in-depth, nuanced answers and extended extensibility, making it ideal for enterprises and developers needing robust search solutions. Context Length: 200k (max output token limit of 8k)
poe - perplexity-sonar-rsn-pro - - This model operates on the open-sourced uncensored R1-1776 model from Perplexity with web search capabilities. Perplexity's Sonar Reasoning Pro model takes AI-powered answers to the next level, offering unmatched quality and precision. Outperforming leading search engines and LLMs, it has demonstrated superior performance on the SimpleQA benchmark, making it a gold standard for high-quality answer generation. Context Length: 128k (max output token limit of 8k)
poe - perplexity-deep-research - - Perplexity Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Context Length: 128k
poe - flux-pro-1.1-ultra - - State-of-the-art image generation with four times the resolution of standard FLUX-1.1-pro. Best-in-class prompt adherence and pixel-perfect image detail. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Add "--raw" (no other arguments needed) for raw photographic detail and an overall less processed, everyday aesthetic. Valid aspect ratios are 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21. Send an image to have this model reimagine/regenerate it via FLUX Redux, and use "--strength" (e.g --strength 0.7) to control the impact of the text prompt (1 gives greater influence, 0 means very little).
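For example (hypothetical prompt; flags as documented above): `A misty harbor at dawn, fishing boats at anchor --aspect 16:9 --raw`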
poe - mistral-small-3.1 - - Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
poe - claude-opus-3 13.00 64.00 Anthropic's Claude Opus 3 can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks. Supports 200k tokens of context (approximately 150k English words).
poe - sonic-3.0 6,000.00 - Generates audio based on your prompt using Cartesia's latest Sonic 3.0 text-to-speech model in your voice of choice. Supports 10k characters. You can select a voice and language in the options menu in the input bar. The following voices are supported covering 42 languages (English, Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese): -- English -- Ariana Kiefer Tessa Brandon Linda - Conversational Guide Ronald - Thinker Brooke - Big Sister Katie - Friendly Fixer Jacqueline - Reassuring Agent Caroline - Southern Guide -- Arabic -- Amira - Dreamy Whisperer Omar - High-Energy Presenter -- Bengali -- Pooja - Everyday Assistant Rubel - City Guide -- Bulgarian -- Ivana - Instruction Provider Georgi - Conversationalist -- Chinese -- Hua - Sunny Support Yue - Gentle Woman Tao - Lecturer Lan - Instructor -- Croatian -- Petra - Strict Lecturer Ivan - Bar Companion -- Czech -- Jana - Crisp Conversationalist Petr - Pastor -- Danish -- Katrine - Calm Caregiver -- Dutch -- Bram - Instructional Daan - Business Baritone Sanne - Clear Companion Lucas - Storyteller -- Finnish -- Helmi - Warm Friend Mikko - Narration Expert -- French -- Helpful French Lady French Narrator Man Calm French Woman Antoine - Stern Man -- Georgian -- Levan - Support Guide Tamara - Support Specialist -- German -- Thomas - Anchor Viktoria - Phone Conversationalist Lukas - Professional Lena - Muse -- Greek -- Despina - Motherly Woman Nikos - Radio Storyteller -- Gujarati -- Isha - Learner Amit - Sports Student -- Hebrew -- Noam - Broadcaster -- Hindi -- Arushi - Hinglish Speaker Sunil - Official Announcer Riya - College Roommate Aadhya - Soother -- Hungarian -- Gabor - Reassuring Eszter - Customer Companion -- Indonesian -- Siti - Ad Narrator Andi - Dynamic Presenter -- Italian -- Liv - Casual Friend Alessandra - Melodic Guide Francesca - Elegant Partner Giancarlo - Support Leader -- Japanese -- Yumiko - Friendly Agent Emi - Soft-Spoken Friend Yuki - Calm Woman Daisuke - Businessman -- Kannada -- Prakash - Instructor Divya - Joyful Narrator -- Korean -- Jihyun - Anchorwoman Mimi - Show Stopper Byungtae - Enforcer Jiwoo - Service Specialist -- Malay -- Aisyah - Chat Partner Faiz - Family Guide -- Malayalam -- Latha - Friendly Host -- Marathi -- Suresh - Instruction Anika - Enthusiastic Seller -- Norwegian -- Lars - Casual Conversationalist -- Polish -- Tomek - Casual Companion Wojciech - Documentarian Piotr - Corporate Lead Katarzyna - Melodic Storyteller -- Portuguese -- Luana - Public Speaker Felipe - Casual Talker Ana Paula - Marketer Beatriz - Support Guide -- Punjabi -- Gurpreet - Companion Jaspreet - Commercial Woman -- Romanian -- Andrada - Steady Speaker Andrei - Conversationalist Guy -- Russian -- Tatiana - Friendly Storyteller Natalya - Soothing Guide Irina - Poetic Sergei - Expressive Narrator -- Slovak -- Katarina - Friendly Sales Peter - Narrator Man -- Spanish -- Pedro - Formal Speaker Daniela - Relaxed Woman Fran - Confident Young Professional Isabel - Teacher -- Swedish -- Freja - Nordic Reader Ingrid - Peaceful Guide Anders - Nordic Baritone Cees - Nordic Narrator -- Tagalog -- Luz - Casual Speaker Angelo - Calm Narrator -- Tamil -- Arun - Lively Lakshmi - Everyday -- Telugu -- Sindhu - Conversational Partner Vikram - Folk Narrator -- Thai -- Somchai - Star Suda - Fortune Teller -- Turkish -- Emre - Calming Speaker Leyla - Story Companion Azra - Service Specialist Taylan - Expressive -- Ukrainian -- Oleh - Professional Guy -- Vietnamese -- Minh - Conversational Partner Xia - Calm Companion
poe - hailuo-music-v1.5 - - Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. Send the lyrics of the music over as your prompt. Use `--style` to set the style of the generated music - for example, rock and roll, hip-hop, etc. Both prompt/lyrics and style must be sent over for best quality. The prompt supports [intro][verse][chorus][bridge][outro] sections.
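For example (the lyrics are hypothetical; the flag, section tags, and "hip-hop" style example are as documented above): `[verse] Neon rain on empty streets [chorus] We run until the morning light --style hip-hop`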
poe - elevenlabs-music - - The ElevenLabs music model is a generative AI system designed to compose original music from text prompts. It allows creators to specify genres, moods, instruments, and structure, producing royalty-free tracks tailored to their needs. The model emphasizes speed, creative flexibility, and high-quality audio output, making it suitable for use in videos, podcasts, games, and other multimedia projects. This bot can produce songs with suggested lyrics based on general descriptions, exact lyrics if specified as such, or instrumental ones, all via prompting. Use `--music_length_ms` to set the length of the song in milliseconds (10,000 to 300,000 ms). Prompt input cannot exceed 2,000 characters.
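For example (hypothetical prompt; the flag and range are as documented above): `A warm acoustic folk track with soft vocals about leaving home --music_length_ms 60000`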
poe - whisper-v3-large-t 3,000.00 - Whisper v3 Large is a state-of-the-art automatic speech recognition and translation model developed by OpenAI, offering 10–20% lower error rates than its predecessor, Whisper large-v2. It supports transcription and translation across numerous languages, with improvements in handling diverse audio inputs, including noisy conditions and long-form audio files.
poe - stable-audio-2.5 - - Stable Audio 2.5 generates high-quality audio up to 3 minutes long from text prompts, supporting text-to-audio, audio-to-audio transformations, and inpainting with customizable settings like duration, steps, CFG scale, and more. It is ideal for music production, cinematic sound design, and remixing. Note: Audio-to-audio and inpaint modes require a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - If replacing specific parts: `--mode audio-inpaint` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - `--random_seed false --seed [0-4294967294]` disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [4-8]`: Higher = better quality (recommended 6-8) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical) 5. Inpainting control (only for audio-inpaint) - `--mask_start_time [seconds]` start time of the uploaded audio to modify - `--mask_end_time [seconds]` end time of the uploaded audio to modify
poe - stable-audio-2.0 - - Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. It is ideal for creative professionals seeking detailed and extended outputs from simple prompts. Note: Audio-to-audio mode requires a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - `--random_seed false --seed [0-4294967294]` disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [30-100]`: Higher = better quality (recommended 50-80) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
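As a text-to-audio illustration for Stable Audio 2.0 (the prompt is hypothetical; flags and ranges are as documented above; note that Stable Audio 2.5 uses a 4-8 `--steps` range instead): `Slow cinematic orchestral swell with soft strings and a distant choir --duration 45 --cfg_scale 10 --steps 60 --output_format wav`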
poe - hailuo-speech-02 - - Generate speech from text prompts using the MiniMax Speech-02 model. Include `--hd` at the end of your prompt for higher quality output at a higher price. You may set language with `--language`, voice with `--voice`, pitch with `--pitch`, speed with `--speed`, and volume with `--volume`. Please check the UI for allowed values for each parameter.
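For example (hypothetical text; only the documented `--hd` flag is shown, since allowed values for the other parameters are listed in the UI): `Welcome back to the weekly roundup! --hd`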
poe - elevenlabs-v2.5-turbo - - ElevenLabs' leading text-to-speech technology converts your text into natural-sounding speech, using the Turbo v2.5 model. Simply send a text prompt, and the bot will generate audio using your choice of available voices. If you link a URL or a PDF, it will do its best to read it aloud to you. The overall default voice is Jessica, an American-English female. Add --voice "Voice Name" to the end of a message (e.g. "Hello world --voice Eric") to customize the voice used. Add --language and the two-letter ISO-639-1 language code to your message if you notice pronunciation errors; a table of ISO-639-1 codes is here: https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes (e.g. zh for Chinese, es for Spanish, hi for Hindi). The following voices are supported and recommended for each language: English -- Sarah, George, River, Matilda, Will, Jessica, Brian, Lily, Monika Sogam Chinese -- James Gao, Martin Li, Will, River Spanish -- David Martin, Will, Efrayn, Alejandro, Sara Martin, Regina Martin Hindi -- Ranga, Niraj, Liam, Raju, Leo, Manu, Vihana Huja, Kanika, River, Monika Sogam, Muskaan, Saanu, Riya, Devi Arabic -- Bill, Mo Wiseman, Haytham, George, Mona, Sarah, Sana, Laura German -- Bill, Otto, Leon Stern, Mila, Emilia, Lea, Leonie Indonesian -- Jessica, Putra, Mahaputra Portuguese -- Will, Muhammad, Onildo, Lily, Jessica, Alice Vietnamese -- Bill, Liam, Trung Caha, Van Phuc, Ca Dao, Trang, Jessica, Alice, Matilda Filipino -- Roger, Brian, Alice, Matilda French -- Roger, Louis, Emilie Swedish -- Will, Chris, Jessica, Charlotte Turkish -- Cavit Pancar, Sohbet Adami, Belma, Sultan, Mahidevran Romanian -- Eric, Bill, Brian, Charlotte, Lily Italian -- Carmelo, Luca, Alice, Lily Polish -- Robert, Rob, Eric, Pawel, Lily, Alice Norwegian -- Chris, Charlotte Czech -- Pawel Finnish -- Callum, River Hungarian -- Brian, Sarah Japanese -- Alice Prompt input cannot exceed 40,000 characters.
poe - sonic-2.0 - - Generates audio based on your prompt using Cartesia's latest Sonic 2.0 text-to-speech model in your voice of choice (see below). Add --voice [Voice Name] to the end of a message to customize the voice used or to handle different language inputs (e.g. 你好 --voice Chinese Commercial Woman). All of Cartesia's voices are supported on Poe. The following voices are supported covering 15 languages (English, French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish): Here's the alphabetical list of all the top voice names: "1920's Radioman" Aadhya Adele Alabama Man Alina American Voiceover Man Ananya Anna Announcer Man Apoorva ASMR Lady Australian Customer Support Man Australian Man Australian Narrator Lady Australian Salesman Australian Woman Barbershop Man Brenda British Customer Support Lady British Lady British Reading Lady Brooke California Girl Calm French Woman Calm Lady Camille Carson Casper Cathy Chongz Classy British Man Commercial Lady Commercial Man Confident British Man Connie Corinne Customer Support Lady Customer Support Man Dallas Dave David Devansh Elena Ellen Ethan Female Nurse Florence Francesca French Conversational Lady French Narrator Lady French Narrator Man Friendly Australian Man Friendly French Man Friendly Reading Man Friendly Sidekick German Conversational Woman German Conversation Man German Reporter Man German Woman Grace Griffin Happy Carson Helpful French Lady Helpful Woman Hindi Calm Man Hinglish Speaking Woman Indian Lady Indian Man Isabel Ishan Jacqueline Janvi Japanese Male Conversational Joan of Ark John Jordan Katie Keith Kenneth Kentucky Man Korean Support Woman Laidback Woman Lena Lily Whisper Little Gaming Girl Little Narrator Girl Liv Lukas Luke Madame Mischief Madison Maria Mateo Mexican Man Mexican Woman Mia Middle Eastern Woman Midwestern Man Midwestern Woman Movieman Nathan Newslady Newsman New York Man Nico Nonfiction Man Olivia Orion Peninsular Spanish Narrator Lady Pleasant Brazilian Lady Pleasant Man Polite Man Princess Professional Woman Rebecca Reflective Woman Ronald Russian Storyteller Man Salesman Samantha Angry Samantha Happy Sarah Sarah Curious Savannah Silas Sophie Southern Man Southern Woman Spanish Narrator Woman Spanish Reporter Woman Spanish-speaking Reporter Man Sportsman Stacy Stern French Man Steve Storyteller Lady Sweet Lady Tatiana Taylor Teacher Lady The Merchant Tutorial Man Wise Guide Man Wise Lady Wise Man Wizardman Yogaman Young Shy Japanese Woman Zia
poe - gemini-2.5-flash-tts - - Gemini‑2.5‑Flash‑TTS is Google’s low‐latency text‑to‑speech model that converts text input into audio output, supporting both single‑ and multi‑speaker voices with controllable style, accent, and expressive tone — ideal for applications like podcasts, audiobooks, and conversational voice systems. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
poe - gemini-2.5-pro-tts - - Gemini‑2.5‑Pro‑TTS is Google’s highest‑quality text‑to‑speech model preview, designed for complex workflows like podcasts, audiobooks, and customer support; it delivers expressive, accent‑ and style‑controllable single‑ or multi‑speaker speech, supporting over 23 languages, and built for state‑of‑the‑art output with the most powerful model architecture. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
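As a multi-speaker illustration that applies to both Gemini TTS bots (the dialogue and speaker names are hypothetical; the flags, default voices, style prompt example, and `SpeakerName: text` format are as documented above): `Host: Welcome to the show. Guest: Thanks, glad to be here. --mode multi --speaker1_name Host --speaker2_name Guest --voice Charon --voice2 Kore --style_prompt Cheerful tone`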
poe - orpheus-tts - - Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. Send a text prompt to voice it. Use --voice to choose from one of the available voices (`tara`, `leah`, `jess`, `leo`, `dan`, `mia`, `zac`, `zoe`). Officially supported sound effects are: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>, and <giggle>.
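For example (hypothetical text; the voice name and effect tag are as documented above): `It has been such a long day <sigh> but we finally made it. --voice tara`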
poe - deepgram-nova-3 - - Transcribe audio files using Speech-to-Text technology with the Deepgram Nova-3 model, featuring multi-language support and advanced customizable settings. [1] Basic Features: Use `--generate_pdf true` to generate a PDF file of the transcription. Use `--diarize true` to identify different speakers in the audio; this will automatically enable utterances. Use `--smart_format false` to disable automatic text formatting for improved readability, including punctuation and paragraphs; this feature is enabled by default. [2] Advanced Features: Use `--dictation true` to convert spoken commands for punctuation into their respective marks (e.g., 'period' becomes '.'); this will automatically enable punctuation. Use `--measurements true` to format spoken measurement units into abbreviations. Use `--profanity_filter true` to replace profanity with asterisks. Use `--redact_pci true` to redact payment card information. Use `--redact_pii true` to redact personally identifiable information. Use `--utterances true` to segment speech into meaningful semantic units. Use `--paragraphs false` to disable the paragraphs feature, which splits audio into paragraphs to improve transcript readability and automatically enables punctuation; enabled by default. Use `--punctuate false` to disable the punctuate feature, which adds punctuation and capitalization to your transcript; enabled by default. Use `--numerals false` to disable the numerals feature, which converts numbers from written format to numerical format. [3] Languages Supported: Auto-detect (Default) English Spanish French German Italian Portuguese Japanese Chinese Hindi Russian Dutch [4] Key Terms: `--keyterm` to enter important terms to improve recognition accuracy, separated by commas. English only, limited to 500 tokens total.
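For example (hypothetical usage; flags as documented above), attach an audio file and send: `--diarize true --generate_pdf true`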
poe - playai-tts - - Generates audio based on your prompt using PlayHT's text-to-speech model, in the voice of your choice. Use --voice [voice_name] to pass in the voice of your choice, choosing one from below. Voice defaults to `Jennifer_(English_(US)/American)`. Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli)
poe - unreal-speech-tts - - Convert chats, URLs, and documents into natural speech. 8 Languages: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese. Use `--voice <VOICE_NAME>`. Defaults to `--voice Sierra`. Full list below: American English - Male: Noah, Jasper, Caleb, Ronan, Ethan, Daniel, Zane, Rowan - Female: Autumn, Melody, Hannah, Emily, Ivy, Kaitlyn, Luna, Willow, Lauren, Sierra British English - Male: Benjamin, Arthur, Edward, Oliver - Female: Eleanor, Chloe, Amelia, Charlotte Japanese - Male: Haruto - Female: Sakura, Hana, Yuki, Rina Chinese - Male: Wei, Jian, Hao, Sheng - Female: Mei, Lian, Ting, Jing Spanish - Male: Mateo, Javier - Female: Lucía French - Female: Élodie Hindi - Male: Arjun, Rohan - Female: Ananya, Priya Italian - Male: Luca - Female: Giulia Portuguese - Male: Thiago, Rafael - Female: Camila
poe - imagen-4-ultra 42,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-exp-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
poe - imagen-4-fast 14,000.00 - DeepMind's June 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-fast-generate-preview-06-06` model from Google Vertex, and has a maximum input of 480 tokens.
poe - imagen-4 28,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
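For example (hypothetical prompt; the flag applies to all three Imagen 4 bots as documented above): `A sunlit reading nook filled with plants, soft film grain --aspect_ratio 16:9`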
poe - phoenix-1.0 17,000.00 - High-fidelity image generation with strong prompt adherence, especially for long and detailed instructions. Phoenix is capable of rendering coherent text in a wide variety of contexts. Prompt enhance is on by default to show the full power of a long, detailed prompt, but it can be turned off for full control. Uses the Phoenix 1.0 Fast model for performant, high-quality generations. Parameters: - Aspect Ratio (1:1, 3:2, 2:3, 9:16, 16:9) - Prompt Enhance (enhances the prompt for better image generation) - Style (please see parameter control to identify available styles) Image generation prompts can be a maximum of 1500 characters.
poe - dreamina-3.1 - - ByteDance's Dreamina 3.1 Text-to-Image showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details. This model excels with long, detailed prompts; use longer prompts if you run into Content Checker issues. The model does not accept attachments. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, & 9:16.
poe - qwen-image 20,000.00 - Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Experiments show strong general capabilities in image generation, with exceptional performance in text rendering, especially for Chinese. Prompt input cannot exceed 2,000 characters.
poe - qwen-image-20b - - Qwen-Image (20B) is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt.
poe - hunyuan-image-2.1 - - Hunyuan Image 2.1 is a high-quality, highly efficient text-to-image model. Send a prompt to generate an image. Use `--aspect` (one of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`) to set the aspect ratio of the generated image. Use `--negative_prompt` (examples: blur, low resolution, poor quality) to set a negative prompt on the generated image. This bot does not accept attachments.
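For example (hypothetical prompt; the flags apply to both Qwen-Image (20B) and Hunyuan Image 2.1 as documented above): `A hand-painted storefront sign reading "Fresh Tea", morning light --aspect 4:3 --negative_prompt blur, low resolution`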
poe - flux-kontext-max - - FLUX.1 Kontext [max] is a new premium model from Black Forest Labs that brings maximum performance across all aspects. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image-generation. Available aspect ratio (21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21)
poe - flux-kontext-pro - - The FLUX.1 Kontext [pro] model delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, flawless typography, and image editing capabilities. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image-generation. Available aspect ratio (21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21)
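For example (hypothetical prompts; the `--aspect` flag applies to both FLUX.1 Kontext bots as documented above): `A vintage travel poster of a coastal town at sunset --aspect 3:4`, or send an image with an editing instruction such as `Replace the sky with a starry night`.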
poe - flux-krea - - FLUX-Krea is a version of FLUX Dev tuned for superior aesthetics. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Krea Redux.
poe - imagen-3 28,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For simpler prompts, faster results, & lower cost, use @Imagen3-Fast. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
poe - wan-animate - - Wan Animate takes in an image and a video to generate another video where a character in the image replaces a character in the video (default), or the video character's motion is used to animate the character in the image; pass --animate for the second mode. The bot supports only four file types: JPEG, PNG, WebP, and MP4.
poe - imagen-3-fast 14,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts, optimized for short, simple prompts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For more complex prompts, use @Imagen3. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
poe - seedream-3.0 - - Seedream 3.0 by ByteDance is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation.
poe - seedance-1.0-pro - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`). Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution. Use `--duration` (3 to 12) to set the video duration. The number of video tokens calculated for pricing is approximately `height * width * fps * duration / 1024`.
poe - seedance-1.0-lite - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, and `9:16`). Use `--resolution` (one of `480p`, `720p`, and `1080p`) to set the video resolution. Use `--duration` (3 to 12) to set the video duration. The number of video tokens calculated for pricing is approximately `height * width * fps * duration / 1024`.
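For example (hypothetical prompt; the flags apply to both Seedance bots as documented above): `A paper boat drifting down a rain-filled gutter, macro shot --aspect 16:9 --resolution 720p --duration 5`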
poe - ideogram-v3 - - Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. Use `--aspect` to set the aspect ratio (valid aspect ratios are 5:4, 4:3, 4:5, 1:1, 1:2, 1:3, 3:4, 3:1, 3:2, 2:1, 2:3, 16:9, 16:10, 10:16, 9:16), and use `--style` to specify a style (one of `AUTO`, `GENERAL`, `REALISTIC`, and `DESIGN`; default: `AUTO`). Send one image with a prompt for image remixing/restyling. Send two images (one an image and the other a black-and-white mask image denoting an area) for image editing.
poe - ideogram-v2 57,000.00 - Latest image model from Ideogram, with industry-leading capabilities in generating realistic images, graphic design, typography, and more. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, 1:1. The "--style" parameter can be set to specify the style of the generated image (GENERAL, REALISTIC, DESIGN, RENDER_3D, ANIME). Powered by Ideogram.
poe - flux-dev-di 5,000.00 - High quality image generator using the FLUX dev model. Top-of-the-line prompt following, visual quality and output diversity. This model is text-to-image only and does not accept attachments. Optional parameters: "--width" (128 to 1920 pixels; default: 1024), "--height" (128 to 1920 pixels; default: 1024), "--seed" for reproducible results (1 up to 2**32; default: random), and "--num_inference_steps" (1 to 50; default: 25). Example message: "A misty forest at dawn --width 1280 --height 720 --seed 42"
poe - flux-schnell-di 990.00 - This is the fastest version of FLUX, featuring highly optimized abstract models that excel at creative and unconventional renders. Optional parameters: "--width" (128 to 1920 pixels; default: 1024), "--height" (128 to 1920 pixels; default: 1024), "--seed" for reproducible results (1 up to 2**32; default: random), and "--num_inference_steps" (1 to 50; default: 1).
poe - flux-pro-1.1 - - State-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is the most powerful version of FLUX 1.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - luma-photon-flash - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards, not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
poe - hidream-i1-full - - HiDream-I1 is a state-of-the-art text-to-image model by HiDream. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Use `--negative_prompt` to set a negative prompt. Hosted by fal.ai.
poe - retro-diffusion-core - - Generate true game-ready pixel art in seconds at any resolution between 16x16 and 512x512, across a variety of styles. Create 48x48 walking animations of sprites using the "animation__four_angle_walking" style! First 50 basic image requests' worth of points free! Check out more settings below 👇 Example message: "A cute corgi wearing sunglasses and a party hat --ar 128:128 --style rd_fast__portrait" Settings: --ar <width>:<height> (image size in pixels; larger images cost more. Also accepts an aspect ratio like 16:9) --style <style_name> (the name of the style you want to use. Available styles: rd_fast__anime, rd_fast__retro, rd_fast__simple, rd_fast__detailed, rd_fast__game_asset, rd_fast__portrait, rd_fast__texture, rd_fast__ui, rd_fast__item_sheet, rd_fast__mc_texture, rd_fast__mc_item, rd_fast__character_turnaround, rd_fast__1_bit, animation__four_angle_walking, rd_plus__default, rd_plus__retro, rd_plus__watercolor, rd_plus__textured, rd_plus__cartoon, rd_plus__ui_element, rd_plus__item_sheet, rd_plus__character_turnaround, rd_plus__isometric, rd_plus__isometric_asset, rd_plus__topdown_map, rd_plus__top_down_asset) --seed (random number; keep the same for consistent generations) --tile (creates seamless edges on applicable images) --tilex (seamless horizontally only) --tiley (seamless vertically only) --native (returns pixel art at native resolution, without upscaling) --removebg (automatically removes the background) --iw <decimal between 0.0 and 1.0> (controls how strong the image guidance is: 0.0 for small changes, 1.0 for big changes) Additional notes: all styles have a size range of 48x48 to 512x512, except for the "mc" styles, which have a size range of 16x16 to 128x128, and the "animation__four_angle_walking" style, which only creates 48x48 animations.
poe - stablediffusion3.5-l - - Stability.ai's StableDiffusion3.5 Large, hosted by @fal, is the Stable Diffusion family's most powerful image generation model in terms of both image quality and prompt adherence. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16.
poe - flux-schnell - - Turbo speed image generation with strengths in prompt following, visual quality, image detail and output diversity. This is the fastest version of FLUX.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - gpt-image-1 - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. For a conversational editing experience, use https://poe.com/GPT-4o (all users) or https://poe.com/Assistant (subscribers) instead. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): aspect ratio of the output image. `--quality` (options: high, medium, low): image resolution. `--use_mask`: indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas marking the parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
poe - gpt-image-1-mini - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): aspect ratio of the output image. `--quality` (options: high, medium, low): image resolution. `--use_mask`: indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas marking the parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
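For the `--use_mask` option in the two GPT-Image entries above, the mask is an image the same size as the base with zero-alpha (transparent) pixels over the region to edit. A minimal Pillow sketch of building one; the file names, attachment order, and edit rectangle are illustrative assumptions:

```
from PIL import Image, ImageDraw

base = Image.open("base.png").convert("RGBA")
mask = Image.new("RGBA", base.size, (255, 255, 255, 255))  # fully opaque = keep as-is

# Zero-alpha pixels mark the region the model may repaint.
ImageDraw.Draw(mask).rectangle([100, 100, 300, 300], fill=(0, 0, 0, 0))
mask.save("mask.png")  # attach base.png, then mask.png, and add --use_mask
```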
poe - veo-3.1 - - Google’s Veo 3.1 is an updated version of the Veo family of models that features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`; defaults to `16:9`); `--no` to set a negative prompt on elements to avoid (e.g. `--no blurry`, `--no cloudy`); `--duration` to set the duration (one of `4s`, `6s`, or `8s`; defaults to `8s`); `--seed` to set the seed (a number value); `--reference-mode` to use input images (3 max) as references for video generation. For first & last frame video generation and reference support, please use www.poe.com/Veo-v3.1
poe - veo-3.1-fast - - Google’s Veo 3.1 Fast is an updated version of the Veo family of models that's optimized for speed and cost, but still features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`; defaults to `16:9`); `--no` to set a negative prompt on elements to avoid (e.g. `--no blurry`, `--no cloudy`); `--duration` to set the duration (one of `4s`, `6s`, or `8s`; defaults to `8s`); `--seed` to set the seed (a number value). For first & last frame video generation support, please use www.poe.com/Veo-v3.1-Fast
poe - sora-2-pro - - Sora 2 Pro is OpenAI’s state-of-the-art video and audio generation model, capable of creating richly detailed, dynamic clips with synchronized audio from natural language prompts or images. It builds on Sora 2’s capabilities with enhanced physical accuracy, intricate world-state persistence, and higher fidelity in cinematic styles. The model excels at generating synchronized dialogue, sound effects, and realistic simulations, all while adhering to real-world physics. Sora 2 Pro also supports seamless editing, complex multi-shot prompt execution, and the integration of real-world elements like people, animals, and objects with unparalleled detail and accuracy. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): video output duration in seconds. `--size` (options: [Landscape] 1280x720, 1792x1024; [Portrait] 720x1280, 1024x1792): resolution of the output video.
poe - sora-2 - - Sora 2 is OpenAI’s latest video and audio generation model, delivering exceptional realism, physical accuracy, and controllability. It excels at creating cinematic scenes, synchronized dialogue, sound effects, and dynamic simulations while faithfully adhering to the laws of physics. The model supports editing, multi-shot prompt adherence, and the integration of real-world elements, such as people, animals, and objects. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): video output duration in seconds. `--size` (options: [landscape] 1280x720; [portrait] 720x1280): resolution of the output video.
poe - kling-2.5-turbo-std - - Generate high-quality videos from images using Kling 2.5 Turbo Standard. Optional parameters: Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--duration` to set either a 5 or 10 second video. Note: only image-to-video is supported; the aspect ratio is inferred automatically from the input image and cannot be set. Supported image file formats: JPEG, PNG, WebP.
poe - wan-2.6 - - WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling: it creates high-fidelity videos from text and/or images while keeping characters and style consistent across scenes. It also supports native audio-visual sync (including lip-sync) and can generate or align dialogue/music/SFX with the visuals, enabling “prompt-to-video” results that feel production-ready without heavy post work. Notes: - This model is served from the Singapore area. - Upload an image to enable image-to-video generation or video(s) for video-to-video generation. - Responses may take upwards of 5 minutes (or more) to finish generating. Parameter controls available: 1. Video Settings - `--resolution 1080p` (default) or `--resolution 720p` - `--aspect_ratio 16:9` (default), `9:16`, `1:1`, `4:3`, or `3:4` (ignored for image-to-video, which uses the input image's aspect ratio) - `--duration [5, 10, or 15]` seconds (default: 5) (video-to-video limited to 10s max) 2. Advanced Settings - `--prompt_extend true` (default) or `--prompt_extend false`: AI prompt enhancement - `--audio true` (default) or `--audio false`: enable/disable audio generation - `--shot_type multi` (default) or `--shot_type single`: multi-shot narrative vs. single continuous shot - `--seed [0-2147483646]`: random seed for reproducibility - `--negative_prompt "text"`: describe what you don't want in the video 3. Attachments - For i2v: attach an image as the first frame - For r2v: attach 1-3 reference videos (2-30 seconds each, MP4/MOV) (use `character1`, `character2`, `character3` in the prompt to reference subjects; e.g. character1 references the subject in the first uploaded video) - For t2v/i2v: optionally attach an audio file (3-30 seconds, max 15 MB, .mp3/.wav) for custom audio 4. Multi-Shot Prompting - For multi-shot mode, use timeline syntax: `[Shot #] [Timestamp] [Action]`. Example: `[Shot 1] [0-5s] Wide shot of city skyline. [Shot 2] [5-10s] Close-up of character walking.` - Ensure timestamps match your selected duration and use transition keywords like "Hard cut" or "Fade in" between shots.
poe - seedream-4.0 - - Seedream 4.0 is ByteDance's latest and best text-to-image model, capable of impressive high fidelity image generation, with great text-rendering ability. Seedream 4.0 can also take in multiple images as references and combine them together or edit them to return an output. Pass `--aspect` to set the aspect ratio for the model (One of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`).
poe - kling-2.5-turbo-pro - - Generate high-quality videos from text and images using Kling 2.5 Turbo Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`, only works for text-to-video). Use `--duration` to set either 5 or 10 second video.
poe - kling-2.1-master - - Kling 2.1 Master: the premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either a 5 or 10 second video.
poe - hailuo-02 - - Hailuo-02 is MiniMax's latest video generation model. It generates 6-second, 768p videos: just submit a text prompt, or an image plus a prompt describing the desired video behavior, and it will create it; generation typically takes ~5 minutes. Strong motion effects and ultra-clear quality.
poe - hailuo-02-standard - - MiniMax Hailuo-02 Video Generation model: Advanced image-to-video generation model with 768p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Use `--duration` to set the video duration (6 or 10 seconds).
poe - hailuo-02-pro - - MiniMax Hailuo-02 Pro Video Generation model: Advanced image-to-video generation model with 1080p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Generates a 5-second video.
poe - deepseek-r1-turbo-di 15,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. Turbo model is quantized to achieve higher speeds. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
poe - hailuo-director-01 - - Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control. Both text-to-video and image-to-video are supported. Camera movement instructions can be added using square brackets (e.g. [Pan left] or [Zoom in]). You can use up to 3 combined movements per prompt. Duration is fixed to 5 seconds. Supported movements: Truck left/right, Pan left/right, Push in/Pull out, Pedestal up/down, Tilt up/down, Zoom in/out, Shake, Tracking shot, Static shot. For example: [Truck left, Pan right, Zoom in]. For a more detailed guide, refer to https://sixth-switch-2ac.notion.site/T2V-01-Director-Model-Tutorial-with-camera-movement-1886c20a98eb80f395b8e05291ad8645
poe - pixverse-v5 - - Pixverse v5 offers advanced creative tools with three main features: Text-to-Video, which transforms written prompts into cinematic, high-detail video clips with fluid motion and accurate visual interpretation; Image-to-Video, which animates static images into dynamic short videos with lifelike motion and smooth transitions; and Transition, which generates seamless morphs between frames or scenes to create unified, professional-quality visual flow. Parameter Controls and Usage: 1. Video Generation (Main Control Section) - `--resolution [360p|540p|720p|1080p]` - Description: Video resolution. - Default: 720p - `--duration [5|8]` - Description: Video length in seconds. - Default: 5 - `--aspect_ratio [16:9|4:3|1:1|3:4|9:16]` - Description: Video aspect ratio. - Default: 16:9 - `--style [none|anime|3d_animation|clay|comic|cyberpunk]` - Description: Video style (optional). - Default: none - `--negative_prompt "[text]"` - Description: Elements to avoid (optional). - Default: "" (empty) - `--seed [integer]` - Description: Optional seed for reproducibility (e.g., 12345). - Default: "" (empty/random) 2. Generation Modes (Determined by attachments) - Text-to-Video: Provide a prompt with 0 image attachments. - Image-to-Video: Provide 1 image attachment. - Transition: Provide 2 image attachments (first is start frame, second is end frame). 3. Limitations - The combination of `--resolution 1080p` and `--duration 8` is not supported. - Only 0, 1, or 2 image attachments are supported. - Attachments must be images (PNG/JPEG/WEBP/TIFF/BMP/HEIC/GIF).
poe - wan-2.5 - - Wan-2.5 Video Generation bot. Has text-to-video and image-to-video capabilities. Optionally, send an audio file (MP3) to guide the video generation. Optional parameters: control the output resolution with `--resolution` (480p, 720p, or 1080p; defaults to 720p). Pricing varies by resolution. Set the aspect ratio with `--aspect` (16:9, 1:1, or 9:16; defaults to 16:9) and the duration with `--duration` (5s or 10s; defaults to 5s).
poe - pixverse-v4.5 - - Pixverse v4.5 is a video generation model capable of generating high quality videos in under a minute. Use `--negative_prompt` to set the negative prompt. Use `--duration` to set the video duration (5 or 8 seconds). Set the resolution (360p, 540p, 720p, or 1080p) using `--resolution`. Send 1 image to perform an image-to-video task or a video effect generation task, and 2 images to perform a video transition task, using the first image as the first frame and the second image as the last frame. Use `--effect` to set the video generation effect, provided 1 image is given (options: `Kiss_Me_AI`, `Kiss`, `Muscle_Surge`, `Warmth_of_Jesus`, `Anything,_Robot`, `The_Tiger_Touch`, `Hug`, `Holy_Wings`, `Hulk`, `Venom`, `Microwave`). Use `--style` to set the video generation style (for text-to-video, image-to-video, and transition only; options: `anime`, `3d_animation`, `clay`, `comic`, `cyberpunk`). Use `--seed` to set the seed and `--aspect` to set the aspect ratio.
poe - flux-dev - - High-performance image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is a more efficient version of FLUX-pro, balancing quality and speed. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - lyria - - Google DeepMind's Lyria 2 delivers high-quality audio generation, capable of creating diverse soundscapes and musical pieces from text prompts. Allows users to specify elements to exclude in the audio using the "--no" parameter at the end of the prompt. Also supports "--seed" for deterministic generation. e.g. "An energetic electronic dance track --no vocals, slow tempo --seed 123". Lyria blocks prompts that name specific artists or songs (artist-intent and recitation checks). This bot does not support attachments. This bot accepts input prompts of up to 480 tokens.
poe - kling-1.6-pro - - Kling v1.6 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
poe - clarity-upscaler - - Upscales images with high fidelity to the original image. Use "--upscale_factor" (a number between 1 and 4) to set the upscaled image's size (2 means the output image is 2x the size, etc.). "--creativity" and "--clarity" can be set between 0 and 1 to alter the faithfulness to the original image and the sharpness, respectively. This bot supports .jpg and .png images.
poe - topazlabs 30.00 - Topaz Labs’ image upscaler is a best-in-class generative AI model that increases the overall clarity and pixel count of input photos, whether generated by AI image models or captured in the real world, while preserving the original photo’s contents. It can produce images as small as ~10MB and as large as 512MB, depending on the size of the input photo. Specify --upscale with a number up to 16 to control the upscaling factor, --output_height and/or --output_width to specify the number of pixels for each dimension, and add --generated if the input photo is AI-generated. With no parameters specified, it will increase both the input photo's height and width by 2x; it is especially effective on images of human faces.
poe - veo-v3.1 - - Google's Veo 3.1 is an improved version of Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`); T2V only. Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task. Pass up to 3 images with `--reference` for a reference-to-video task; reference images will be used directly in the video generation.
poe - veo-v3.1-fast - - Google's Veo 3.1 Fast is a fast version of Veo 3.1. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`); T2V only. Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task.
poe - wan-2.2 - - Wan-2.2 is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Send one image for image-to-video tasks, and send two images for first-frame-to-last-frame generation. Use `--aspect` to set the aspect ratio (One of `16:9`, `1:1`, `9:16`) for text-to-video requests. Duration is limited to 5 seconds, with up to 720p resolution.
poe - ltx-2-fast - - LTX-2 Fast is a video model by Lightricks that delivers exceptional quality and speed. It can generate videos at up to 50 FPS in high resolutions and supports both text-to-video and image-to-video generation. Optional parameters: Use `--generate-audio` to generate audio with the video (disabled by default). Set the resolution with `--resolution` (one of `1080p`, `1440p`, `2160p`; default: 1080p). Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`; default: 6s). Duration and resolution values change the price. Set the fps of the generated video with `--fps` (25 or 50; default: 25). File attachments accepted: JPEG, PNG, WebP.
poe - ltx-2-pro - - LTX-2 Pro is an advanced video generation model by Lightricks designed for professional-grade results. It offers high-quality, realistic video generation at exceptional speed and supports outputs up to 2K resolution. Perfect for both text-to-video and image-to-video creation, it delivers cinematic detail and smooth performance. Optional parameters: Use `--generate_audio` to generate audio with the video (disabled by default). Set the resolution with `--resolution` (one of `1080p`, `1440p`, `2160p`; default: 1080p). Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`; default: 6s). Duration and resolution values change the price. Set the fps of the generated video with `--fps` (25 or 50; default: 25). File attachments accepted: JPEG, PNG, WebP.
poe - veo-3 - - Veo 3 produces incredibly high-quality videos across a diverse range of subjects and styles. It incorporates an enhanced understanding of real-world physics and the subtleties of human movement and expression, resulting in greater detail and overall realism. Veo 3 is fluent in the unique language of cinematography: you can request a specific genre, specify a lens, or suggest cinematic effects, and Veo 3 will deliver stunning 8-second video clips. It supports both text-to-video and image-to-video generation and also features native audio generation based on text prompts. Please note that Veo 3 does not accept audio attachments. To exclude specific elements, use --no followed by a negative prompt (e.g., blurry, cloudy, or other attributes). To set a specific seed value, use `--seed` followed by the desired number (e.g., --seed 2). To set the aspect ratio, use `--aspect_ratio` followed by either 16:9 or 9:16. To set the duration, use `--duration` followed by one of 4s, 6s, or 8s.
poe - veo-3-vfast - - Veo-3 Fast is a faster and more cost-effective version of Google's Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `1:1`, `9:16`). Use `--generate_audio` to generate audio with your video at a higher cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`). Duration is limited to 7 seconds. This is a text-to-video generation model only.
poe - vidu - - The Vidu Video Generation Bot creates videos using images and text prompts. You can generate videos in four modes: (1) Image-to-Video: send 1 image with a prompt, (2) Start-to-End Frame: send 2 images with a prompt for transition videos, (3) Reference-to-Video: send up to 3 images with the `--reference` flag for guidance, and (4) Template-to-Video: use `--template` to apply pre-designed templates (1-3 images required; pricing varies by template). The number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, and all other templates accept only 1 image. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude, and accepts PNG, JPEG, and WEBP formats. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video). Duration is limited to 5 seconds.
poe - vidu-q1 - - The Vidu Q1 Video Generation Bot creates videos using text prompts and images. You can generate videos in three modes: (1) Text-to-Video: send a text prompt, (2) Image-to-Video: send 1 image with a prompt, and (3) Reference-to-Video: send up to 7 images with the `--reference` flag. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude, which can be customized for text-to-video and reference-to-video tasks. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video generation). The bot accepts PNG, JPEG, and WEBP formats. Duration is limited to 5 seconds.
poe - veo-3-fast - - Veo 3 Fast is a speed-optimized variant of Google’s Veo 3 AI video generation engine, designed for rapid, cost-efficient production of short clips with synchronized audio (dialogue, ambient sound, effects). It prioritizes faster generation times while still delivering solid visual and audio quality, supports text-to-video and image-to-video workflows (allowing creators to animate still images into motion sequences), and operates under defined constraints: video lengths of 4, 6, or 8 seconds, specified via the --duration parameter (e.g. "A cat dances --duration 6" will produce a 6-second video). Use `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`. Please only upload photos that you own or have the right to use; otherwise the bot will throw an error.
poe - seedance-1.0-pro-fast - - Seedance Pro Fast is a faster version of Seedance 1.0 Pro that balances speed, quality and cost. Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`; default: `16:9`). Use `--resolution` (one of `480p`, `720p`, `1080p`; default: `1080p`) to set the video resolution. Use `--duration` (3 to 12; default: 5) to set the video duration. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`. File attachments accepted: JPEG, PNG, WebP.
poe - sora - - Sora is OpenAI's video generation model. Use `--duration` to set the duration of the generated video, and `--resolution` to set the video's resolution (480p, 720p, or 1080p). Set the aspect ratio of the generated video with `--aspect` (Valid aspect ratios are 16:9, 1:1, 9:16). This is a text-to-video model only. Switch to the newest models for improved video and audio creation: https://poe.com/Sora-2-Pro for cinematic excellence or https://poe.com/Sora-2 for unmatched realism and precision.
poe - omnihuman - - OmniHuman, by ByteDance, generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. Send an image including a human figure with a visible face, plus an audio file, and the bot will return a video. The maximum audio length accepted is 30 seconds.
poe - grok-code-fast-1 - - Grok-Code-Fast-1 from xAI is a high-performance, cost-efficient model designed for agentic coding. It offers visible reasoning traces, strong steerability, and supports a 256k context window.
poe - bagoodex-web-search - - Bagoodex delivers real-time AI-powered web search offering instant access to videos, images, weather, and more. Audio and video uploads are not supported at this time.
poe - deep-ai-search - - Deep search engine integrating Brave AI with real-time web search. This chatbot executes commands and scrapes websites at scale while preserving its hallmark intelligence advantage. The bot doesn't accept file attachments. Examples: https://poe.com/s/P0BQmsvbE7zusdY0n49l https://poe.com/s/QgQSPsLD9efQrIwbmwuO
poe - kling-avatar-pro - - Create lifelike avatar videos featuring realistic humans, animals, cartoons, or stylized characters. Simply upload an image and an audio file to generate a video of your character speaking. Supported file formats: Images: JPEG, PNG, WEBP; Audio: MP3, WAV.
poe - playai-dialog - - Generates dialogues based on your script using PlayHT's text-to-speech model, in the voices of your choice. Use --speaker_1 [voice_name] and --speaker_2 [voice_name] to pass in the voices of your choice, choosing from below. Voice defaults to `Jennifer_(English_(US)/American)`. Follow the below format while prompting (case sensitive): FORMAT: ``` Speaker 1: ...... Speaker 2: ...... Speaker 1: ...... Speaker 2: ...... --speaker_1 [voice_1] --speaker_2 [voice_2] ``` VOICES AVAILABLE: Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli) Prompt input cannot exceed 10,000 characters.
poe - luma-photon - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards, not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
poe - ideogram 45,000.00 - Excels at creating high-quality images from text prompts. For most prompts, https://poe.com/Ideogram-v2 will produce better results. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, & 1:1.
poe - seededit-3.0 - - SeedEdit 3.0 is an image editing model independently developed by ByteDance. It accurately follows editing instructions and effectively preserves image content, and is especially strong on real images. Please send an image with a prompt to edit the image.
poe - kling-2.1-pro - - Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`. Requires an image attachment.
poe - kling-2.1-std - - Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`.
poe - runway-gen-4-turbo - - Runway's Gen-4 Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 1:1, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds. Full prompting guide here: https://help.runwayml.com/hc/en-us/articles/39789879462419-Gen-4-Video-Prompting-Guide
poe - runway - - Runway's Gen-3 Alpha Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds.
poe - veo-2 - - Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver in 8-second clips. Use `--aspect_ratio` (16:9 or 9:16) to customize the video aspect ratio. Supports text-to-video as well as image-to-video. Non-English input will be translated first. Note: currently has a low rate limit, so you may need to retry your request at times of peak usage.
poe - dream-machine 360,000.00 - Luma AI's Dream Machine is an AI model that makes high-quality, realistic videos fast from text and images. Iterate at the speed of thought, create action-packed shots, and dream worlds with consistent characters on Poe today! To specify the aspect ratio of your video add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21). To loop your video add --loop True.
poe - kling-2.0-master - - Generate high-quality videos from text or images using Kling 2.0 Master. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either 5 or 10 second video.
poe - qwen-edit - - Image editing model based on Qwen-Image, with superior text editing capabilities.
poe - gptzero - - GPTZero is a deep-learning-driven platform designed to analyze and flag portions of text that are likely generated by AI vs. human authors. It distinguishes between “entirely human,” “entirely AI,” or “mixed” content and highlights the specific sentences involved. *The max number of files that can be submitted simultaneously is 50, and the max file size for all files combined is 15 MB. Each file's document will be truncated to 50,000 characters. Supported file types: PDF, DOC/DOCX, TXT, ODT. Parameter controls available: 1. Detection Options - Multilingual (FR/ES): - `--multilingual true` (enables the GPTZero multilingual model) - `--multilingual false` (default/disabled) - Model Version: - `--modelVersion [version_string]` (selects a specific GPTZero model version, e.g., '2025-10-30-base') - `--modelVersion __latest__` (default: automatically uses the latest model version)
poe - kling-pro-effects - - Generate videos with effects like squishing an object, two people hugging, making heart gestures, etc. using Kling-Pro-Effects. Requires an image input. Send a single image for the `squish` and `expansion` effects and two images (of people) for the `hug`, `kiss`, and `heart_gesture` effects. Set the effect with `--effect` (default: `squish`). Set the duration with `--duration` (5s or 10s; default: 5s).
poe - hailuo-live - - Hailuo Live, the latest model from MiniMax, sets a new standard for bringing still images to life. From breathtakingly vivid motion to finely tuned expressions, this state-of-the-art model enables your characters to captivate, move, and shine like never before. It excels at bringing art and drawings to life, with exceptional realism without morphing, a wide emotional range, and unparalleled character consistency. Generates a 5-second video.
poe - hailuo-ai - - Best-in-class text and image to video model by MiniMax.
poe - ray2 - - Ray2 is a large-scale generative video model capable of creating realistic visuals with natural, coherent motion. It has a strong understanding of text instructions and can also take image input. It can produce videos from 540p to 4K resolution, with durations of either 5 or 9 seconds.
poe - veo-2-video - - Veo2 is Google's cutting-edge video generation model. Veo creates videos with realistic motion and high quality output.
poe - wan-2.1 - - Wan-2.1 is a text-to-video and image-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Generates a 5-second video.
poe - ideogram-v2a-turbo 24,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For higher quality, use https://poe.com/Ideogram-v2A. Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
poe - ideogram-v2a 39,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For faster and more cost-effective generations, use https://poe.com/Ideogram-v2A-Turbo. Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
poe - trellis-3d - - Generate 3D models from your images using Trellis, a native 3D generative model enabling versatile and high-quality 3D asset creation. Send an image to convert it into a 3D model.
poe - flux-dev-finetuner - - Fine-tune the FLUX dev model with your own pictures! Upload 8-12 of them (same subject, only one subject in the picture, ideally from different poses and backgrounds) and wait ~2-5 minutes to create your own finetuned bot that will generate pictures of this subject in whatever setting you want.
poe - flux-inpaint - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
poe - flux-fill - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
poe - bria-eraser - - Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Send an image and a black-and-white mask image denoting the objects to be cleared out from the image. The input prompt is only used to create the filename of the output image.
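The mask-based bots above (FLUX-Inpaint, FLUX-Fill, Bria Eraser) all take a black-and-white mask as the second image, with white marking the region to repaint or remove. A minimal Pillow sketch of producing one; the file names and the elliptical region are illustrative assumptions:

```
from PIL import Image, ImageDraw

base = Image.open("photo.png")
mask = Image.new("L", base.size, 0)  # 8-bit grayscale, all black = leave untouched
ImageDraw.Draw(mask).ellipse([200, 150, 400, 350], fill=255)  # white = repaint/remove
mask.save("mask.png")  # send photo.png first, then mask.png, plus the prompt
```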
poe - aya-vision 30.00 - Aya Vision is a 32B open-weights multimodal model with advanced capabilities optimized for a variety of vision-language use cases. It is trained to excel in 23 languages in both vision and text: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
poe - kling-1.5-pro - - Kling v1.5 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
poe - deepreasoning - - DeepReasoning (previously DeepClaude) is a high-performance LLM inference that combines DeepSeek R1's Chain of Thought (CoT) reasoning capabilities with Anthropic Claude's creative and code generation prowess. It provides a unified interface for leveraging the strengths of both models while maintaining complete control over your data. Learn more: https://deepclaude.com/
poe - gemma-3-27b - - Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, successor to Gemma 2.
poe - qwen3-32b-cs 3,600.00 - World’s fastest inference for Qwen 3 32B with Cerebras. Append /no_think to your prompt to disable the model's default reasoning behavior.
poe - qwen-2.5-vl-32b 6,600.00 - Qwen2.5-VL-32B's mathematical and problem-solving capabilities have been strengthened through reinforcement learning, leading to a significantly improved user experience. The model's response styles have been refined to better align with human preferences, particularly for objective queries involving mathematics, logical reasoning, and knowledge-based Q&A. As a result, responses now feature greater detail, improved clarity, and enhanced formatting.
poe - qwen2.5-vl-72b-t 8,700.00 - Qwen 2.5 VL 72B, a cutting-edge multimodal model from the Qwen Team, excels in visual and video understanding, multilingual text/image processing (including Japanese, Arabic, and Korean), and dynamic agentic reasoning for automation. It supports long-context comprehension (32K tokens).
poe - mistral-small-3 0.10 0.30 Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks: those that require robust language and instruction-following performance with very low latency. Released under an Apache 2.0 license and comparable to Llama-3.3-70B and Qwen2.5-32B-Instruct.
poe - deepseek-v3-di 4,300.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
poe - deepseek-v3-turbo-di 5,900.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. Turbo variant is quantized to achieve higher speeds. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
poe - phi-4-di 300.00 - Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 16k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
poe - mistral-7b-v0.3-di 150.00 - Mistral Instruct 7B v0.3 from Mistral AI. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
poe - aya-expanse-32b 5,100.00 - Aya Expanse is a 32B open-weight research release of a model with highly advanced multilingual capabilities. Aya supports state-of-art generative capabilities in 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
poe - liveportrait - - Animates a given portrait with the motions in the provided video. Powered by fal.ai.
poe - llama-3.1-8b-t-128k 3,000.00 - Llama 3.1 8B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
poe - stablediffusion3-2b - - Stable Diffusion v3 Medium - by fal.ai
poe - mixtral8x22b-inst-fw 3,600.00 - Mixtral 8x22B Mixture-of-Experts instruct model from Mistral hosted by Fireworks.
poe - command-r 5,100.00 - I can search the web for up-to-date information and respond in over 10 languages!
poe - mistral-large-2 3.00 9.00 Mistral's latest text generation model (Mistral-Large-2407) with top-tier reasoning capabilities. It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. This bot has the full 128k context window supported by the model.
poe - dall-e-3 45,000.00 - OpenAI's most powerful image generation model. Generates high quality images with intricate details based on the user's most recent prompt. For most prompts, https://poe.com/FLUX-pro-1.1-ultra or https://poe.com/FLUX-dev or https://poe.com/Imagen3 will produce better results. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 1:1, 7:4, & 4:7.
poe - reka-core - - Reka's largest and most capable multimodal language model. Works with text, images, and video inputs. 8k context length.
poe - reka-flash - - Reka's efficient and capable 21B multimodal model optimized for fast workloads and amazing quality. Works with text, images and video inputs.
poe - command-r-plus 5,100.00 - A supercharged version of Command R. I can search the web for up-to-date information and respond in over 10 languages!
poe - claude-sonnet-3.5-june 2.60 13.00 Anthropic's legacy Sonnet 3.5 model, specifically the June 2024 snapshot (for the latest, please use https://poe.com/Claude-Sonnet-3.5). Excels in complex tasks like coding, writing, analysis and visual processing; generally, more verbose than the more concise October 2024 snapshot.
poe - gpt-3.5-turbo 0.45 1.40 OpenAI’s GPT 3.5 Turbo model is a powerful language generation system designed to provide highly coherent, contextually relevant, and detailed responses. Supports 16,384 tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - sketch-to-image - - Takes in sketches and converts them to colored images.
poe - qwen2.5-coder-32b 1,500.00 - Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen), developed by Alibaba.
poe - stablediffusion3.5-t - - Faster version of Stable Diffusion 3.5 Large, hosted by @fal. Excels at fast image generation. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1).
poe - flux-pro-1.1-t 30,000.00 - The best state of the art image model from BFL. FLUX 1.1 Pro generates images six times faster than its predecessor, FLUX 1 Pro, while also improving image quality, prompt adherence, and output diversity. The bot does not support any attachments.
poe - flux-schnell-t 2,100.00 - Lightning-fast AI image generation model that excels in producing high-quality visuals in just seconds. Great for quick prototyping or real-time use cases. This is the fastest version of FLUX.1. The bot does not support any attachments.
poe - recraft-v3 - - Recraft V3, state of the art image generation. Prompt input cannot exceed 1,000 characters. Use --style for styles, and --aspect for aspect ratio configuration (16:9, 4:3, 1:1, 3:4, 9:16). Available styles: realistic_image, digital_illustration, vector_illustration, realistic_image/b_and_w, realistic_image/hard_flash, realistic_image/hdr, realistic_image/natural_light, realistic_image/studio_portrait, realistic_image/enterprise, realistic_image/motion_blur, digital_illustration/pixel_art, digital_illustration/hand_drawn, digital_illustration/grain, digital_illustration/infantile_sketch, digital_illustration/2d_art_poster, digital_illustration/handmade_3d, digital_illustration/hand_drawn_outline, digital_illustration/engraving_color, digital_illustration/2d_art_poster_2, vector_illustration/engraving, vector_illustration/line_art, vector_illustration/line_circuit, vector_illustration/linocut
poe - llama-3-70b-t 2,300.00 - Llama 3 70B Instruct from Meta. For most use cases, https://poe.com/Llama-3.3-70B will perform better.
poe - gpt-4o-aug 2.20 9.00 OpenAI's most powerful model, GPT-4o, using the August 2024 model snapshot. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-4-classic-0314 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0314 (non-Turbo) for text input and gpt-4o for image input. For most use cases, https://poe.com/GPT-4o will perform significantly better.
poe - gpt-4-classic 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0613 (non-Turbo) for text input and gpt-4o for image input. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - solar-pro-2 2,100.00 - Solar Pro 2 is Upstage's latest frontier-scale LLM. With just 31B parameters, it delivers top-tier performance through world-class multilingual support, advanced reasoning, and real-world tool use. Especially in Korean, it outperforms much larger models across critical benchmarks. Built for the next generation of practical LLMs, Solar Pro 2 proves that smaller models can still lead. Supports a context length of 64k tokens.
poe - remove-background - - Remove background from your images
poe - sana-t2i - - SANA can synthesize high-resolution, high-quality images at a remarkably fast rate, with the ability to generate 4K images in less than a second. Optional parameters: set the aspect ratio with options 16:9, 4:3, 1:1, 3:4, and 9:16 (default: 4:3).
poe - mistral-7b-v0.3-t 1,400.00 - Mistral Instruct 7B v0.3 from Mistral AI. The points price is subject to change.
poe - tako 30,000.00 - Tako is a bot that transforms your questions about stocks, sports, economics or politics into interactive, shareable knowledge cards from trusted sources. Tako's knowledge graph is built exclusively from authoritative, real-time data providers, and is embeddable in your apps, research and storytelling. You can adjust the specificity threshold by typing `--specificity 30` (or a value between 0 - 100) at the end of your query/question; the default is 60.
poe - llama-3.1-405b-fp16 62,000.00 - The biggest and best open-source AI model trained by Meta, beating GPT-4o across most benchmarks. This bot runs in BF16 and has a 128K context length.
poe - llama-3.1-8b-fp16 1,500.00 - The smallest and fastest member of the Llama 3.1 family, offering exceptional efficiency and rapid response times with 128K context length.
poe - llama-3.1-70b-fp16 6,000.00 - The best LLM at its size, offering faster response times than the 405B model and a 128K context length.
poe - llama-3-70b-fp16 6,000.00 - A highly efficient and powerful model designed for a variety of tasks with 128K context length.
poe - restyler - - This bot enables rapid transformation of existing images, delivering high-quality style transfers and image modifications. Takes in a text input and an image attachment. Use --strength to control the guidance given by the initial image, with higher values adhering to the image more strongly.
poe - stablediffusionxl 3,600.00 - Generates high quality images based on the user's most recent prompt. Allows users to specify elements to avoid in the image using the "--no" parameter at the end of the prompt. Select an aspect ratio with "--aspect". (e.g. "Tall trees, daylight --no rain --aspect 7:4"). Valid aspect ratios are 1:1, 7:4, 4:7, 9:7, 7:9, 19:13, 13:19, 12:5, & 5:12. Powered by Stable Diffusion XL.
poe - qwen-2.5-7b-t 2,300.00 - Qwen 2.5 7B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, with strong multilingual support covering more than 29 languages.
poe - qwen-2.5-72b-t 9,000.00 - Qwen 2.5 72B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, with strong multilingual support covering more than 29 languages. It delivers results on par with Llama-3-405B despite using only one-fifth of the parameters.
poe - python 30.00 - Executes Python code (version 3.11) from the user message and outputs the results. If there are code blocks in the user message (surrounded by triple backticks), then only the code blocks will be executed. These libraries are imported into the bot's runtime automatically (numpy, pandas, requests, matplotlib, scikit-learn, torch, PyYAML, tensorflow, scipy, pytest), along with ~150 of the most widely used Python libraries.
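A minimal sketch of a message this bot could execute: only the fenced block runs, and the imports shown are among the libraries the description says are preloaded (explicit imports keep the snippet runnable anywhere).

```python
import numpy as np
import pandas as pd

# Build a small table and print summary statistics; the bot returns stdout.
df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})
print(df.describe())
```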
poe - markitdown - - Convert anything to Markdown: URLs, PDFs, Word, Excel, PowerPoint, images (EXIF metadata), audio (EXIF metadata and transcription), and more. This bot wraps Microsoft’s MarkItDown MCP server (https://github.com/microsoft/markitdown).
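The bot itself is driven through chat, but the open-source library it wraps can also be scripted directly; a minimal sketch, assuming the markitdown Python package (`pip install markitdown`) and a placeholder file name:

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("quarterly-report.xlsx")  # placeholder path; URLs and PDFs also work
print(result.text_content)                    # the document rendered as Markdown
```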
poe - gpt-4-turbo 9.00 27.00 Powered by OpenAI's GPT-4 Turbo with Vision. For most tasks, https://poe.com/GPT-4o will perform better. Supports 128k tokens of context. Requests with images will be routed to @GPT-4o. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - flux-1-schnell-fw 1,000.00 - FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: (1) cutting-edge output quality and competitive prompt following, matching the performance of closed-source alternatives; (2) trained using latent adversarial diffusion distillation, it can generate high-quality images in only 1 to 4 steps; (3) released under the Apache 2.0 license, so the model can be used for personal, scientific, and commercial purposes.
poe - flux-1-dev-fw 11,000.00 - FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: (1) cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro]; (2) competitive prompt following, matching the performance of closed-source alternatives; (3) trained using guidance distillation, making it more efficient; (4) open weights to drive new scientific research and empower artists to develop innovative workflows; (5) generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
poe - mochi-preview - - Open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. Supports both text-to-video and image-to-video. Generates 5-second videos.
poe - gpt-3.5-turbo-instruct 1.40 1.80 Powered by gpt-3.5-turbo-instruct. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-3.5-turbo-raw 0.45 1.40 Powered by gpt-3.5-turbo without a system prompt. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - interpreter - - Interpreter for Poe Python
poe - claude-haiku-3 0.21 1.10 Anthropic's Claude Haiku 3 outperforms models in its intelligence category on performance, speed and cost without the need for specialized fine-tuning. The compute points value is subject to change. For most use cases, https://poe.com/Claude-Haiku-3.5 will be better.
poe - code-saver - - A system bot that handles Poe scripts in chat.
poe - code-editor - - Official code editor for Poe Scripting using Python, used to connect multiple Poe bots and create AI workflows. Guide and tips: https://creator.poe.com/docs/script-bots/poe-python-reference
moonshotaicn Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 0905 kimi-k2-0905-preview 0.60 2.50 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 0711 kimi-k2-0711-preview 0.60 2.50 Provider: Moonshot AI (China), Context: 131072, Output Limit: 16384
moonshotaicn Kimi K2 Turbo kimi-k2-turbo-preview 2.40 10.00 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
lucidquery LucidQuery Nexus Coder lucidquery-nexus-coder 2.00 5.00 Provider: LucidQuery AI, Context: 250000, Output Limit: 60000
lucidquery LucidNova RF1 100B lucidnova-rf1-100b 2.00 5.00 Provider: LucidQuery AI, Context: 120000, Output Limit: 8000
moonshotai Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 Turbo kimi-k2-turbo-preview 2.40 10.00 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 0711 kimi-k2-0711-preview 0.60 2.50 Provider: Moonshot AI, Context: 131072, Output Limit: 16384
moonshotai Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 0905 kimi-k2-0905-preview 0.60 2.50 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
ollamacloud Kimi K2 Thinking kimi-k2-thinking:cloud - - Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
ollamacloud Qwen3-VL 235B Instruct qwen3-vl-235b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Qwen3 Coder 480B qwen3-coder:480b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud GPT-OSS 120B gpt-oss:120b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud DeepSeek-V3.1 671B deepseek-v3.1:671b-cloud - - Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
ollamacloud GLM-4.6 glm-4.6:cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Cogito 2.1 671B cogito-2.1:671b-cloud - - Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
ollamacloud GPT-OSS 20B gpt-oss:20b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Qwen3-VL 235B Instruct qwen3-vl-235b-instruct-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Kimi K2 kimi-k2:1t-cloud - - Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
ollamacloud MiniMax M2 minimax-m2:cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Gemini 3 Pro Preview gemini-3-pro-preview:latest - - Provider: Ollama Cloud, Context: 1000000, Output Limit: 64000
xiaomi MiMo-V2-Flash mimo-v2-flash 0.07 0.21 Provider: Xiaomi, Context: 256000, Output Limit: 32000
alibaba Qwen3-LiveTranslate Flash Realtime qwen3-livetranslate-flash-realtime 10.00 10.00 Provider: Alibaba, Context: 53248, Output Limit: 4096
alibaba Qwen3-ASR Flash qwen3-asr-flash 0.04 0.04 Provider: Alibaba, Context: 53248, Output Limit: 4096
alibaba Qwen-Omni Turbo qwen-omni-turbo 0.07 0.27 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen-VL Max qwen-vl-max 0.80 3.20 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Next 80B-A3B Instruct qwen3-next-80b-a3b-instruct 0.50 2.00 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen Turbo qwen-turbo 0.05 0.20 Provider: Alibaba, Context: 1000000, Output Limit: 16384
alibaba Qwen3-VL 235B-A22B qwen3-vl-235b-a22b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 Coder Flash qwen3-coder-flash 0.30 1.50 Provider: Alibaba, Context: 1000000, Output Limit: 65536
alibaba Qwen3-VL 30B-A3B qwen3-vl-30b-a3b 0.20 0.80 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 14B qwen3-14b 0.35 1.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba QVQ Max qvq-max 1.20 4.80 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen Plus Character (Japanese) qwen-plus-character-ja 0.50 1.40 Provider: Alibaba, Context: 8192, Output Limit: 512
alibaba Qwen2.5 14B Instruct qwen2-5-14b-instruct 0.35 1.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba QwQ Plus qwq-plus 0.80 2.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.45 2.25 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen-VL OCR qwen-vl-ocr 0.72 0.72 Provider: Alibaba, Context: 34096, Output Limit: 4096
alibaba Qwen2.5 72B Instruct qwen2-5-72b-instruct 1.40 5.60 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Omni Flash qwen3-omni-flash 0.43 1.66 Provider: Alibaba, Context: 65536, Output Limit: 16384
alibaba Qwen Flash qwen-flash 0.05 0.40 Provider: Alibaba, Context: 1000000, Output Limit: 32768
alibaba Qwen3 8B qwen3-8b 0.18 0.70 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Omni Flash Realtime qwen3-omni-flash-realtime 0.52 1.99 Provider: Alibaba, Context: 65536, Output Limit: 16384
alibaba Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 2.80 8.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-VL Plus qwen3-vl-plus 0.20 1.60 Provider: Alibaba, Context: 262144, Output Limit: 32768
alibaba Qwen Plus qwen-plus 0.40 1.20 Provider: Alibaba, Context: 1000000, Output Limit: 32768
alibaba Qwen2.5 32B Instruct qwen2-5-32b-instruct 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen2.5-Omni 7B qwen2-5-omni-7b 0.10 0.40 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen Max qwen-max 1.60 6.40 Provider: Alibaba, Context: 32768, Output Limit: 8192
alibaba Qwen2.5 7B Instruct qwen2-5-7b-instruct 0.18 0.70 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen2.5-VL 7B Instruct qwen2-5-vl-7b-instruct 0.35 1.05 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3 235B-A22B qwen3-235b-a22b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 16384
alibaba Qwen-Omni Turbo Realtime qwen-omni-turbo-realtime 0.27 1.07 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen-MT Turbo qwen-mt-turbo 0.16 0.49 Provider: Alibaba, Context: 16384, Output Limit: 8192
alibaba Qwen3-Coder 480B-A35B Instruct qwen3-coder-480b-a35b-instruct 1.50 7.50 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen-MT Plus qwen-mt-plus 2.46 7.37 Provider: Alibaba, Context: 16384, Output Limit: 8192
alibaba Qwen3 Max qwen3-max 1.20 6.00 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Provider: Alibaba, Context: 1048576, Output Limit: 65536
alibaba Qwen3-Next 80B-A3B (Thinking) qwen3-next-80b-a3b-thinking 0.50 6.00 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 32B qwen3-32b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 16384
alibaba Qwen-VL Plus qwen-vl-plus 0.21 0.63 Provider: Alibaba, Context: 131072, Output Limit: 8192
xai Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 3 Fast grok-3-fast 5.00 25.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4 grok-4 3.00 15.00 Provider: xAI, Context: 256000, Output Limit: 64000
xai Grok 2 Vision grok-2-vision 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: xAI, Context: 256000, Output Limit: 10000
xai Grok 2 grok-2 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Mini Fast Latest grok-3-mini-fast-latest 0.60 4.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 2 Vision (1212) grok-2-vision-1212 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok 3 grok-3 3.00 15.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4 Fast grok-4-fast 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 2 Latest grok-2-latest 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4.1 Fast grok-4-1-fast 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 2 (1212) grok-2-1212 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Fast Latest grok-3-fast-latest 5.00 25.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Latest grok-3-latest 3.00 15.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 2 Vision Latest grok-2-vision-latest 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok Vision Beta grok-vision-beta 5.00 15.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok 3 Mini grok-3-mini 0.30 0.50 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok Beta grok-beta 5.00 15.00 Provider: xAI, Context: 131072, Output Limit: 4096
xai Grok 3 Mini Latest grok-3-mini-latest 0.30 0.50 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4.1 Fast (Non-Reasoning) grok-4-1-fast-non-reasoning 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 3 Mini Fast grok-3-mini-fast 0.60 4.00 Provider: xAI, Context: 131072, Output Limit: 8192
vultr DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
vultr Qwen2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.20 0.20 Provider: Vultr, Context: 12952, Output Limit: 2048
vultr Kimi K2 Instruct kimi-k2-instruct 0.20 0.20 Provider: Vultr, Context: 58904, Output Limit: 4096
vultr DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
vultr GPT OSS 120B gpt-oss-120b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
nvidia Kimi K2 0905 kimi-k2-instruct-0905 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Kimi K2 Thinking kimi-k2-thinking 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Kimi K2 Instruct kimi-k2-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Nemotron Nano 9B v2 nvidia-nemotron-nano-9b-v2 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 131072
nvidia Cosmos Nemotron 34B cosmos-nemotron-34b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Llama Embed Nemotron 8B llama-embed-nemotron-8b 0.00 0.00 Provider: Nvidia, Context: 32768, Output Limit: 2048
nvidia Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 131072
nvidia Parakeet TDT 0.6B v2 parakeet-tdt-0.6b-v2 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia NeMo Retriever OCR v1 nemoretriever-ocr-v1 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia Llama 3.3 Nemotron Super 49b V1 llama-3.3-nemotron-super-49b-v1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 Nemotron 51b Instruct llama-3.1-nemotron-51b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 Chatqa 1.5 70b llama3-chatqa-1.5-70b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama-3.1-Nemotron-Ultra-253B-v1 llama-3.1-nemotron-ultra-253b-v1 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Llama 3.1 Nemotron 70b Instruct llama-3.1-nemotron-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Nemotron 4 340b Instruct nemotron-4-340b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.3 Nemotron Super 49b V1.5 llama-3.3-nemotron-super-49b-v1.5 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia MiniMax-M2 minimax-m2 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 16384
nvidia Gemma 3n E2b It gemma-3n-e2b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codegemma 1.1 7b codegemma-1.1-7b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3n E4b It gemma-3n-e4b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 2 2b It gemma-2-2b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3 12b It gemma-3-12b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codegemma 7b codegemma-7b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3 1b It gemma-3-1b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 2 27b It gemma-2-27b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma-3-27B-IT gemma-3-27b-it 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Phi 3 Medium 128k Instruct phi-3-medium-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3 Small 128k Instruct phi-3-small-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3.5 Vision Instruct phi-3.5-vision-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3 Small 8k Instruct phi-3-small-8k-instruct 0.00 0.00 Provider: Nvidia, Context: 8000, Output Limit: 4096
nvidia Phi 3.5 Moe Instruct phi-3.5-moe-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi-4-Mini phi-4-mini-instruct 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Phi 3 Medium 4k Instruct phi-3-medium-4k-instruct 0.00 0.00 Provider: Nvidia, Context: 4000, Output Limit: 4096
nvidia Phi 3 Vision 128k Instruct phi-3-vision-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Whisper Large v3 whisper-large-v3 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia GPT-OSS-120B gpt-oss-120b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 16384
nvidia Qwen2.5 Coder 32b Instruct qwen2.5-coder-32b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen2.5 Coder 7b Instruct qwen2.5-coder-7b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen3-235B-A22B qwen3-235b-a22b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 65536
nvidia QwQ 32B qwq-32b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 16384
nvidia Devstral-2-123B-Instruct-2512 devstral-2-123b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Mistral Large 3 675B Instruct 2512 mistral-large-3-675b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Ministral 3 14B Instruct 2512 ministral-14b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Mamba Codestral 7b V0.1 mamba-codestral-7b-v0.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Mistral Large 2 Instruct mistral-large-2-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codestral 22b Instruct V0.1 codestral-22b-instruct-v0.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Mistral Small 3.1 24b Instruct 2503 mistral-small-3.1-24b-instruct-2503 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.2 11b Vision Instruct llama-3.2-11b-vision-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 70b Instruct llama3-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.3 70b Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.2 1b Instruct llama-3.2-1b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 4 Scout 17b 16e Instruct llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 4 Maverick 17b 128e Instruct llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codellama 70b codellama-70b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 405b Instruct llama-3.1-405b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 8b Instruct llama3-8b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 70b Instruct llama-3.1-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Deepseek R1 0528 deepseek-r1-0528 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Deepseek R1 deepseek-r1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia DeepSeek V3.1 deepseek-v3.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Deepseek Coder 6.7b Instruct deepseek-coder-6.7b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia FLUX.1-dev flux.1-dev 0.00 0.00 Provider: Nvidia, Context: 4096, Output Limit: N/A
cohere Command A Translate command-a-translate-08-2025 2.50 10.00 Provider: Cohere, Context: 8000, Output Limit: 8000
cohere Command A command-a-03-2025 2.50 10.00 Provider: Cohere, Context: 256000, Output Limit: 8000
cohere Command R command-r-08-2024 0.15 0.60 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command R+ command-r-plus-08-2024 2.50 10.00 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command R7B command-r7b-12-2024 0.04 0.15 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command A Reasoning command-a-reasoning-08-2025 2.50 10.00 Provider: Cohere, Context: 256000, Output Limit: 32000
cohere Command A Vision command-a-vision-07-2025 2.50 10.00 Provider: Cohere, Context: 128000, Output Limit: 8000
upstage Solar Mini solar-mini 0.15 0.15 Provider: Upstage, Context: 32768, Output Limit: 4096
upstage Solar Pro 2 solar-pro2 0.25 0.25 Provider: Upstage, Context: 65536, Output Limit: 8192
groq Llama 3.1 8B Instant llama-3.1-8b-instant 0.05 0.08 Provider: Groq, Context: 131072, Output Limit: 131072
groq Mistral Saba 24B mistral-saba-24b 0.79 0.79 Provider: Groq, Context: 32768, Output Limit: 32768
groq Llama 3 8B llama3-8b-8192 0.05 0.08 Provider: Groq, Context: 8192, Output Limit: 8192
groq Qwen QwQ 32B qwen-qwq-32b 0.29 0.39 Provider: Groq, Context: 131072, Output Limit: 16384
groq Llama 3 70B llama3-70b-8192 0.59 0.79 Provider: Groq, Context: 8192, Output Limit: 8192
groq DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.75 0.99 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama Guard 3 8B llama-guard-3-8b 0.20 0.20 Provider: Groq, Context: 8192, Output Limit: 8192
groq Gemma 2 9B gemma2-9b-it 0.20 0.20 Provider: Groq, Context: 8192, Output Limit: 8192
groq Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Groq, Context: 131072, Output Limit: 32768
groq Kimi K2 Instruct 0905 kimi-k2-instruct-0905 1.00 3.00 Provider: Groq, Context: 262144, Output Limit: 16384
groq Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Groq, Context: 131072, Output Limit: 16384
groq GPT OSS 20B gpt-oss-20b 0.08 0.30 Provider: Groq, Context: 131072, Output Limit: 65536
groq GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Groq, Context: 131072, Output Limit: 65536
groq Qwen3 32B qwen3-32b 0.29 0.59 Provider: Groq, Context: 131072, Output Limit: 16384
groq Llama 4 Scout 17B llama-4-scout-17b-16e-instruct 0.11 0.34 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama 4 Maverick 17B llama-4-maverick-17b-128e-instruct 0.20 0.60 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama Guard 4 12B llama-guard-4-12b 0.20 0.20 Provider: Groq, Context: 131072, Output Limit: 1024
bailing Ling-1T ling-1t 0.57 2.29 Provider: Bailing, Context: 128000, Output Limit: 32000
bailing Ring-1T ring-1t 0.57 2.29 Provider: Bailing, Context: 128000, Output Limit: 32000
githubcopilot Gemini 2.0 Flash gemini-2.0-flash-001 0.00 0.00 Provider: GitHub Copilot, Context: 1000000, Output Limit: 8192
githubcopilot Claude Opus 4 claude-opus-4 0.00 0.00 Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
githubcopilot Gemini 3 Flash gemini-3-flash-preview 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Grok Code Fast 1 grok-code-fast-1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot GPT-5.1-Codex gpt-5.1-codex 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot Claude Haiku 4.5 claude-haiku-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot Gemini 3 Pro Preview gemini-3-pro-preview 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Raptor Mini (Preview) oswe-vscode-prime 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 64000
githubcopilot Claude Sonnet 3.5 claude-3.5-sonnet 0.00 0.00 Provider: GitHub Copilot, Context: 90000, Output Limit: 8192
githubcopilot GPT-5.1-Codex-mini gpt-5.1-codex-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 100000
githubcopilot o3-mini o3-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
githubcopilot GPT-5.1 gpt-5.1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot GPT-5-Codex gpt-5-codex 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot GPT-4o gpt-4o 0.00 0.00 Provider: GitHub Copilot, Context: 64000, Output Limit: 16384
githubcopilot GPT-4.1 gpt-4.1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
githubcopilot o4-mini (Preview) o4-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
githubcopilot Claude Opus 4.1 claude-opus-41 0.00 0.00 Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
githubcopilot GPT-5-mini gpt-5-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Claude Sonnet 3.7 claude-3.7-sonnet 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
githubcopilot Gemini 2.5 Pro gemini-2.5-pro 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot GPT-5.1-Codex-max gpt-5.1-codex-max 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot o3 (Preview) o3 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
githubcopilot Claude Sonnet 4 claude-sonnet-4 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot GPT-5 gpt-5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot Claude Sonnet 3.7 Thinking claude-3.7-sonnet-thought 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
githubcopilot Claude Opus 4.5 claude-opus-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot GPT-5.2 gpt-5.2 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Claude Sonnet 4.5 claude-sonnet-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
mistral Devstral Medium devstral-medium-2507 0.40 2.00 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Large 3 mistral-large-2512 0.50 1.50 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mixtral 8x22B open-mixtral-8x22b 2.00 6.00 Provider: Mistral, Context: 64000, Output Limit: 64000
mistral Ministral 8B ministral-8b-latest 0.10 0.10 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Pixtral Large pixtral-large-latest 2.00 6.00 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Small 3.2 mistral-small-2506 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Devstral 2512 devstral-2512 0.40 2.00 Provider: Mistral, Context: 256000
mistral Ministral 3B ministral-3b-latest 0.04 0.04 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Pixtral 12B pixtral-12b 0.15 0.15 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Mistral, Context: 131072, Output Limit: 131072
mistral Labs Devstral Small 2512 labs-devstral-small-2512 0.10 0.30 Provider: Mistral, Context: 256000
mistral Devstral 2 devstral-medium-latest 0.40 2.00 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Devstral Small 2505 devstral-small-2505 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Medium 3.1 mistral-medium-2508 0.40 2.00 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mistral Embed mistral-embed 0.10 0.00 Provider: Mistral, Context: 8000, Output Limit: 3072
mistral Mistral Small mistral-small-latest 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Magistral Small magistral-small 0.50 1.50 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Devstral Small devstral-small-2507 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Codestral codestral-latest 0.30 0.90 Provider: Mistral, Context: 256000, Output Limit: 4096
mistral Mixtral 8x7B open-mixtral-8x7b 0.70 0.70 Provider: Mistral, Context: 32000, Output Limit: 32000
mistral Mistral Nemo mistral-nemo 0.15 0.15 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral 7B open-mistral-7b 0.25 0.25 Provider: Mistral, Context: 8000, Output Limit: 8000
mistral Mistral Large mistral-large-latest 0.50 1.50 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mistral Medium mistral-medium-latest 0.40 2.00 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Mistral Large 2.1 mistral-large-2411 2.00 6.00 Provider: Mistral, Context: 131072, Output Limit: 16384
mistral Magistral Medium magistral-medium-latest 2.00 5.00 Provider: Mistral, Context: 128000, Output Limit: 16384
abacus GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Abacus, Context: 2000000, Output Limit: 16384
abacus Gemini 2.0 Flash gemini-2.0-flash-001 0.10 0.40 Provider: Abacus, Context: 1000000, Output Limit: 8192
abacus DeepSeek V3.2 deepseek-ai-deepseek-v3.2 0.27 0.40 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Llama 3.1 405B Instruct Turbo meta-llama-meta-llama-3.1-405b-instruct-turbo 3.50 3.50 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus Qwen3 235B A22B Instruct qwen-qwen3-235b-a22b-instruct-2507 0.13 0.60 Provider: Abacus, Context: 262144, Output Limit: 8192
abacus Llama 3.1 8B Instruct meta-llama-meta-llama-3.1-8b-instruct 0.02 0.05 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Abacus, Context: 256000, Output Limit: 16384
abacus DeepSeek R1 deepseek-ai-deepseek-r1 3.00 7.00 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Kimi K2 Turbo Preview kimi-k2-turbo-preview 0.15 8.00 Provider: Abacus, Context: 256000, Output Limit: 8192
abacus Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Abacus, Context: 1000000, Output Limit: 65000
abacus Qwen3 Coder 480B A35B Instruct qwen-qwen3-coder-480b-a35b-instruct 0.29 1.20 Provider: Abacus, Context: 262144, Output Limit: 65536
abacus Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus Claude Opus 4.5 claude-opus-4-5-20251101 5.00 25.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus Qwen 2.5 Coder 32B qwen-2.5-coder-32b 0.79 0.79 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Claude Sonnet 4.5 claude-sonnet-4-5-20250929 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus GPT-OSS 120B openai-gpt-oss-120b 0.08 0.44 Provider: Abacus, Context: 128000, Output Limit: 32768
abacus Qwen3 Max qwen-qwen3-max 1.20 6.00 Provider: Abacus, Context: 131072, Output Limit: 16384
abacus Grok 4 grok-4-0709 3.00 15.00 Provider: Abacus, Context: 256000, Output Limit: 16384
abacus Llama 3.1 70B Instruct meta-llama-meta-llama-3.1-70b-instruct 0.40 0.40 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus o3-mini o3-mini 1.10 4.40 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus GLM-4.5 zai-org-glm-4.5 0.60 2.20 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Gemini 2.0 Pro Exp gemini-2.0-pro-exp-02-05 - - Provider: Abacus, Context: 2000000, Output Limit: 8192
abacus GPT-5.1 gpt-5.1 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Claude Sonnet 4 claude-sonnet-4-20250514 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus GPT-4.1 gpt-4.1 2.00 8.00 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus o4-mini o4-mini 1.10 4.40 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Qwen3 32B qwen-qwen3-32b 0.09 0.29 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Claude Opus 4 claude-opus-4-20250514 15.00 75.00 Provider: Abacus, Context: 200000, Output Limit: 32000
abacus GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Llama 4 Maverick 17B 128E Instruct FP8 meta-llama-llama-4-maverick-17b-128e-instruct-fp8 0.14 0.59 Provider: Abacus, Context: 1000000, Output Limit: 32768
abacus o3-pro o3-pro 20.00 80.00 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Claude Sonnet 3.7 claude-3-7-sonnet-20250219 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus DeepSeek V3.1 Terminus deepseek-ai-deepseek-v3.1-terminus 0.27 1.00 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: Abacus, Context: 128000, Output Limit: 16384
abacus o3 o3 2.00 8.00 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Qwen 2.5 72B Instruct qwen-qwen2.5-72b-instruct 0.11 0.38 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus GLM-4.6 zai-org-glm-4.6 0.60 2.20 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus DeepSeek V3.1 deepseek-deepseek-v3.1 0.55 1.66 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus QwQ 32B qwen-qwq-32b 0.40 0.40 Provider: Abacus, Context: 32768, Output Limit: 32768
abacus GPT-4o Mini gpt-4o-mini 0.15 0.60 Provider: Abacus, Context: 128000, Output Limit: 16384
abacus GPT-5 gpt-5 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Grok 4.1 Fast (Non-Reasoning) grok-4-1-fast-non-reasoning 0.20 0.50 Provider: Abacus, Context: 2000000, Output Limit: 16384
abacus Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Abacus, Context: 128000, Output Limit: 32768
abacus Claude Opus 4.1 claude-opus-4-1-20250805 15.00 75.00 Provider: Abacus, Context: 200000, Output Limit: 32000
abacus GPT-5.2 gpt-5.2 1.75 14.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus GPT-5.1 Chat Latest gpt-5.1-chat-latest 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Claude Haiku 4.5 claude-haiku-4-5-20251001 1.00 5.00 Provider: Abacus, Context: 200000, Output Limit: 64000
nebius Hermes 4 70B hermes-4-70b 0.13 0.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Hermes 4 405B hermes-4-405b 1.00 3.00 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Kimi K2 Instruct kimi-k2-instruct 0.50 2.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama 3.1 Nemotron Ultra 253B v1 llama-3_1-nemotron-ultra-253b-v1 0.60 1.80 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.60 Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
nebius Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.20 0.80 Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
nebius Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.40 1.80 Provider: Nebius Token Factory, Context: 262144, Output Limit: 65536
nebius Llama 3.1 405B Instruct llama-3_1-405b-instruct 1.00 3.00 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama-3.3-70B-Instruct (Fast) llama-3.3-70b-instruct-fast 0.25 0.75 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama-3.3-70B-Instruct (Base) llama-3.3-70b-instruct-base 0.13 0.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GLM 4.5 glm-4.5 0.60 2.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GLM 4.5 Air glm-4.5-air 0.20 1.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius DeepSeek V3 deepseek-v3 0.50 1.50 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
deepseek DeepSeek Chat deepseek-chat 0.28 0.42 Provider: DeepSeek, Context: 128000, Output Limit: 8192
deepseek DeepSeek Reasoner deepseek-reasoner 0.28 0.42 Provider: DeepSeek, Context: 128000, Output Limit: 128000
alibabacn DeepSeek R1 Distill Qwen 7B deepseek-r1-distill-qwen-7b 0.07 0.14 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3-ASR Flash qwen3-asr-flash 0.03 0.03 Provider: Alibaba (China), Context: 53248, Output Limit: 4096
alibabacn DeepSeek R1 0528 deepseek-r1-0528 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn DeepSeek V3 deepseek-v3 0.29 1.15 Provider: Alibaba (China), Context: 65536, Output Limit: 8192
alibabacn Qwen-Omni Turbo qwen-omni-turbo 0.06 0.23 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen-VL Max qwen-vl-max 0.23 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek V3.2 Exp deepseek-v3-2-exp 0.29 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn Qwen3-Next 80B-A3B Instruct qwen3-next-80b-a3b-instruct 0.14 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn DeepSeek R1 deepseek-r1 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen Turbo qwen-turbo 0.04 0.09 Provider: Alibaba (China), Context: 1000000, Output Limit: 16384
alibabacn Qwen3-VL 235B-A22B qwen3-vl-235b-a22b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn Qwen3 Coder Flash qwen3-coder-flash 0.14 0.57 Provider: Alibaba (China), Context: 1000000, Output Limit: 65536
alibabacn Qwen3-VL 30B-A3B qwen3-vl-30b-a3b 0.11 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn Qwen3 14B qwen3-14b 0.14 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn QVQ Max qvq-max 1.15 4.59 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.29 0.86 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen Plus Character qwen-plus-character 0.12 0.29 Provider: Alibaba (China), Context: 32768, Output Limit: 4096
alibabacn Qwen2.5 14B Instruct qwen2-5-14b-instruct 0.14 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn QwQ Plus qwq-plus 0.23 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Coder 32B Instruct qwen2-5-coder-32b-instruct 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.22 0.86 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn Qwen Math Plus qwen-math-plus 0.57 1.72 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen-VL OCR qwen-vl-ocr 0.72 0.72 Provider: Alibaba (China), Context: 34096, Output Limit: 4096
alibabacn Qwen Doc Turbo qwen-doc-turbo 0.09 0.14 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen Deep Research qwen-deep-research 7.74 23.37 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen2.5 72B Instruct qwen2-5-72b-instruct 0.57 1.72 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Omni Flash qwen3-omni-flash 0.06 0.23 Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn Qwen Flash qwen-flash 0.02 0.22 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen3 8B qwen3-8b 0.07 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Omni Flash Realtime qwen3-omni-flash-realtime 0.23 0.92 Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 2.29 6.88 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-VL Plus qwen3-vl-plus 0.14 1.43 Provider: Alibaba (China), Context: 262144, Output Limit: 32768
alibabacn Qwen Plus qwen-plus 0.12 0.29 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen2.5 32B Instruct qwen2-5-32b-instruct 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Omni 7B qwen2-5-omni-7b 0.09 0.35 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen Max qwen-max 0.35 1.38 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen Long qwen-long 0.07 0.29 Provider: Alibaba (China), Context: 10000000, Output Limit: 8192
alibabacn Qwen2.5-Math 72B Instruct qwen2-5-math-72b-instruct 0.57 1.72 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Moonshot Kimi K2 Instruct moonshot-kimi-k2-instruct 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 131072
alibabacn Tongyi Intent Detect V3 tongyi-intent-detect-v3 0.06 0.14 Provider: Alibaba (China), Context: 8192, Output Limit: 1024
alibabacn Qwen2.5 7B Instruct qwen2-5-7b-instruct 0.07 0.14 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-VL 7B Instruct qwen2-5-vl-7b-instruct 0.29 0.72 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek V3.1 deepseek-v3-1 0.57 1.72 Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.29 0.86 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3 235B-A22B qwen3-235b-a22b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen2.5-Coder 7B Instruct qwen2-5-coder-7b-instruct 0.14 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek R1 Distill Qwen 14B deepseek-r1-distill-qwen-14b 0.14 0.43 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen-Omni Turbo Realtime qwen-omni-turbo-realtime 0.23 0.92 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen Math Turbo qwen-math-turbo 0.29 0.86 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen-MT Turbo qwen-mt-turbo 0.10 0.28 Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn DeepSeek R1 Distill Llama 8B deepseek-r1-distill-llama-8b 0.00 0.00 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3-Coder 480B-A35B Instruct qwen3-coder-480b-a35b-instruct 0.86 3.44 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn Qwen-MT Plus qwen-mt-plus 0.26 0.78 Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn Qwen3 Max qwen3-max 0.86 3.44 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn QwQ 32B qwq-32b 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Math 7B Instruct qwen2-5-math-7b-instruct 0.14 0.29 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen3-Next 80B-A3B (Thinking) qwen3-next-80b-a3b-thinking 0.14 1.43 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn DeepSeek R1 Distill Qwen 1.5B deepseek-r1-distill-qwen-1-5b 0.00 0.00 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3 32B qwen3-32b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen-VL Plus qwen-vl-plus 0.12 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Provider: Alibaba (China), Context: 1048576, Output Limit: 65536
googlevertexanthropic Claude Opus 4.5 claude-opus-4-5@20251101 5.00 25.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 3.5 v2 claude-3-5-sonnet@20241022 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic Claude Haiku 3.5 claude-3-5-haiku@20241022 0.80 4.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic Claude Sonnet 4 claude-sonnet-4@20250514 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 4.5 claude-sonnet-4-5@20250929 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Opus 4.1 claude-opus-4-1@20250805 15.00 75.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
googlevertexanthropic Claude Haiku 4.5 claude-haiku-4-5@20251001 1.00 5.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 3.7 claude-3-7-sonnet@20250219 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Opus 4 claude-opus-4@20250514 15.00 75.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
venice Grok 4.1 Fast grok-41-fast 0.50 1.25 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Qwen 3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.15 0.75 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Gemini 3 Flash Preview gemini-3-flash-preview 0.70 3.75 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Claude Opus 4.5 claude-opus-45 6.00 30.00 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Venice Medium mistral-31-24b 0.50 2.00 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Grok Code Fast 1 grok-code-fast-1 0.25 1.87 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice GLM 4.7 zai-org-glm-4.7 0.85 2.75 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Venice Uncensored 1.1 venice-uncensored 0.20 0.90 Provider: Venice AI, Context: 32768, Output Limit: 8192
venice Gemini 3 Pro Preview gemini-3-pro-preview 2.50 15.00 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice GPT-5.2 openai-gpt-52 2.19 17.50 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Venice Small qwen3-4b 0.05 0.15 Provider: Venice AI, Context: 32768, Output Limit: 8192
venice Llama 3.3 70B llama-3.3-70b 0.70 2.80 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice OpenAI GPT OSS 120B openai-gpt-oss-120b 0.07 0.30 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Kimi K2 Thinking kimi-k2-thinking 0.75 3.20 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Qwen 3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.45 3.50 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Llama 3.2 3B llama-3.2-3b 0.15 0.60 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Google Gemma 3 27B Instruct google-gemma-3-27b-it 0.12 0.20 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Hermes 3 Llama 3.1 405b hermes-3-llama-3.1-405b 1.10 3.00 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice GLM 4.6V zai-org-glm-4.6v 0.39 1.13 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice MiniMax M2.1 minimax-m21 0.40 1.60 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Qwen 3 Next 80b qwen3-next-80b 0.35 1.90 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice GLM 4.6 zai-org-glm-4.6 0.85 2.75 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Qwen 3 Coder 480b qwen3-coder-480b-a35b-instruct 0.75 3.00 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice DeepSeek V3.2 deepseek-v3.2 0.40 1.00 Provider: Venice AI, Context: 163840, Output Limit: 40960
siliconflowcn inclusionAI/Ring-flash-2.0 ring-flash-2.0 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn inclusionAI/Ling-flash-2.0 ling-flash-2.0 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn inclusionAI/Ling-mini-2.0 ling-mini-2.0 0.07 0.28 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn moonshotai/Kimi-K2-Thinking kimi-k2-thinking 0.55 2.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn moonshotai/Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.40 2.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn moonshotai/Kimi-Dev-72B kimi-dev-72b 0.29 1.15 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn moonshotai/Kimi-K2-Instruct kimi-k2-instruct 0.58 2.29 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn tencent/Hunyuan-A13B-Instruct hunyuan-a13b-instruct 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn tencent/Hunyuan-MT-7B hunyuan-mt-7b 0.00 0.00 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn MiniMaxAI/MiniMax-M1-80k minimax-m1-80k 0.55 2.20 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn MiniMaxAI/MiniMax-M2 minimax-m2 0.30 1.20 Provider: SiliconFlow (China), Context: 197000, Output Limit: 131000
siliconflowcn THUDM/GLM-Z1-32B-0414 glm-z1-32b-0414 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn THUDM/GLM-4-9B-0414 glm-4-9b-0414 0.09 0.09 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn THUDM/GLM-Z1-9B-0414 glm-z1-9b-0414 0.09 0.09 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn THUDM/GLM-4.1V-9B-Thinking glm-4.1v-9b-thinking 0.04 0.14 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn THUDM/GLM-4-32B-0414 glm-4-32b-0414 0.27 0.27 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn openai/gpt-oss-120b gpt-oss-120b 0.05 0.45 Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn openai/gpt-oss-20b gpt-oss-20b 0.04 0.18 Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn stepfun-ai/step3 step3 0.57 1.42 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn nex-agi/DeepSeek-V3.1-Nex-N1 deepseek-v3.1-nex-n1 0.50 2.00 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn baidu/ERNIE-4.5-300B-A47B ernie-4.5-300b-a47b 0.28 1.10 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn z-ai/GLM-4.5-Air glm-4.5-air 0.14 0.86 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn z-ai/GLM-4.5 glm-4.5 0.40 2.00 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn ByteDance-Seed/Seed-OSS-36B-Instruct seed-oss-36b-instruct 0.21 0.57 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn meta-llama/Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.06 0.06 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 0.57 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen2.5-14B-Instruct qwen2.5-14b-instruct 0.10 0.10 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-32B-Instruct qwen3-vl-32b-instruct 0.20 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Thinking qwen3-omni-30b-a3b-thinking 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.13 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-32B-Thinking qwen3-vl-32b-thinking 0.20 1.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-30B-A3B-Thinking qwen3-vl-30b-a3b-thinking 0.29 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-30B-A3B-Instruct-2507 qwen3-30b-a3b-instruct-2507 0.09 0.30 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-235B-A22B-Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 0.25 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-235B-A22B-Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct 0.18 0.68 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-32B qwen3-32b 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-VL-7B-Instruct qwen2.5-vl-7b-instruct 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/QwQ-32B qwq-32b 0.15 0.58 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 0.59 0.59 Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-235B-A22B qwen3-235b-a22b 0.35 1.42 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.28 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.59 0.59 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-72B-Instruct-128K qwen2.5-72b-instruct-128k 0.59 0.59 Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-32B-Instruct qwen2.5-32b-instruct 0.18 0.18 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.18 0.18 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-8B-Thinking qwen3-vl-8b-thinking 0.18 2.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Instruct qwen3-omni-30b-a3b-instruct 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen3-8B qwen3-8b 0.06 0.06 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Captioner qwen3-omni-30b-a3b-captioner 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.27 0.27 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-14B qwen3-14b 0.07 0.28 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-VL-30B-A3B-Instruct qwen3-vl-30b-a3b-instruct 0.29 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-30B-A3B-Thinking-2507 qwen3-30b-a3b-thinking-2507 0.09 0.30 Provider: SiliconFlow (China), Context: 262000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-30B-A3B qwen3-30b-a3b 0.09 0.45 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn zai-org/GLM-4.5V glm-4.5v 0.14 0.86 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn zai-org/GLM-4.6 glm-4.6 0.50 1.90 Provider: SiliconFlow (China), Context: 205000, Output Limit: 205000
siliconflowcn deepseek-ai/DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-V3 deepseek-v3 0.25 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 16000
siliconflowcn deepseek-ai/DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.27 0.41 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.10 0.10 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn deepseek-ai/deepseek-vl2 deepseek-vl2 0.15 0.15 Provider: SiliconFlow (China), Context: 4000, Output Limit: 4000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.18 0.18 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn deepseek-ai/DeepSeek-R1 deepseek-r1 0.50 2.18 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
chutes Hermes 4.3 36B hermes-4.3-36b 0.10 0.39 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 70B hermes-4-70b 0.11 0.38 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Hermes 4 14B hermes-4-14b 0.01 0.05 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 405B FP8 TEE hermes-4-405b-fp8-tee 0.30 1.20 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 405B FP8 hermes-4-405b-fp8 0.30 1.20 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepHermes 3 Mistral 24B Preview deephermes-3-mistral-24b-preview 0.02 0.10 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes dots.ocr dots.ocr 0.01 0.01 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Kimi K2 Instruct 0905 kimi-k2-instruct-0905 0.39 1.90 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Kimi K2 Thinking TEE kimi-k2-thinking-tee 0.40 1.75 Provider: Chutes, Context: 262144, Output Limit: 65535
chutes MiniMax M2 minimax-m2 0.26 1.02 Provider: Chutes, Context: 196608, Output Limit: 196608
chutes MiniMax M2.1 TEE minimax-m2.1-tee 0.30 1.20 Provider: Chutes, Context: 196608, Output Limit: 65536
chutes NVIDIA Nemotron 3 Nano 30B A3B BF16 nvidia-nemotron-3-nano-30b-a3b-bf16 0.06 0.24 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes QwQ 32B ArliAI RpR v1 qwq-32b-arliai-rpr-v1 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes DeepSeek R1T Chimera deepseek-r1t-chimera 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek TNG R1T2 Chimera deepseek-tng-r1t2-chimera 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes TNG R1T Chimera TEE tng-r1t-chimera-tee 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes MiMo V2 Flash mimo-v2-flash 0.17 0.65 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes InternVL3 78B internvl3-78b 0.10 0.39 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes gpt oss 120b TEE gpt-oss-120b-tee 0.04 0.25 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes gpt oss 20b gpt-oss-20b 0.02 0.10 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Mistral Small 3.1 24B Instruct 2503 mistral-small-3.1-24b-instruct-2503 0.03 0.11 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Mistral Small 3.2 24B Instruct 2506 mistral-small-3.2-24b-instruct-2506 0.06 0.18 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Tongyi DeepResearch 30B A3B tongyi-deepresearch-30b-a3b 0.10 0.39 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Devstral 2 123B Instruct 2512 devstral-2-123b-instruct-2512 0.05 0.22 Provider: Chutes, Context: 262144, Output Limit: 65536
chutes Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.02 0.04 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes gemma 3 4b it gemma-3-4b-it 0.01 0.03 Provider: Chutes, Context: 96000, Output Limit: 96000
chutes Mistral Small 24B Instruct 2501 mistral-small-24b-instruct-2501 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes gemma 3 12b it gemma-3-12b-it 0.03 0.10 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes gemma 3 27b it gemma-3-27b-it 0.04 0.15 Provider: Chutes, Context: 96000, Output Limit: 96000
chutes Qwen3 30B A3B qwen3-30b-a3b 0.06 0.22 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 14B qwen3-14b 0.05 0.22 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Provider: Chutes, Context: 16384, Output Limit: 16384
chutes Qwen3Guard Gen 0.6B qwen3guard-gen-0.6b 0.01 0.01 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.08 0.55 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen2.5 72B Instruct qwen2.5-72b-instruct 0.13 0.52 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen2.5 VL 72B Instruct TEE qwen2.5-vl-72b-instruct-tee 0.15 0.60 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 235B A22B qwen3-235b-a22b 0.30 1.20 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen2.5 VL 72B Instruct qwen2.5-vl-72b-instruct 0.07 0.26 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen3 235B A22B Instruct 2507 TEE qwen3-235b-a22b-instruct-2507-tee 0.08 0.55 Provider: Chutes, Context: 262144, Output Limit: 65536
chutes Qwen3 32B qwen3-32b 0.08 0.24 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.30 1.20 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 VL 235B A22B Thinking qwen3-vl-235b-a22b-thinking 0.30 1.20 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.08 0.33 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 Coder 480B A35B Instruct FP8 TEE qwen3-coder-480b-a35b-instruct-fp8-tee 0.22 0.95 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.11 0.60 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.10 0.80 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes GLM 4.6 TEE glm-4.6-tee 0.40 1.75 Provider: Chutes, Context: 202752, Output Limit: 65536
chutes GLM 4.5 TEE glm-4.5-tee 0.35 1.55 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes GLM 4.6V glm-4.6v 0.30 0.90 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes GLM 4.7 TEE glm-4.7-tee 0.40 1.50 Provider: Chutes, Context: 202752, Output Limit: 65535
chutes GLM 4.5 Air glm-4.5-air 0.05 0.22 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepSeek V3 0324 TEE deepseek-v3-0324-tee 0.24 0.84 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3.2 Speciale TEE deepseek-v3.2-speciale-tee 0.27 0.41 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3.1 Terminus TEE deepseek-v3.1-terminus-tee 0.23 0.90 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3 deepseek-v3 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek R1 TEE deepseek-r1-tee 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.11 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepSeek V3.1 deepseek-v3.1 0.20 0.80 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek R1 0528 TEE deepseek-r1-0528-tee 0.40 1.75 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek V3.2 TEE deepseek-v3.2-tee 0.27 0.41 Provider: Chutes, Context: 163840, Output Limit: 16384
chutes DeepSeek V3.1 TEE deepseek-v3.1-tee 0.20 0.80 Provider: Chutes, Context: 163840, Output Limit: 65536
kimiforcoding Kimi K2 Thinking kimi-k2-thinking 0.00 0.00 Provider: Kimi For Coding, Context: 262144, Output Limit: 32768
cortecs Nova Pro 1.0 nova-pro-v1 1.02 4.06 Provider: Cortecs, Context: 300000, Output Limit: 5000
cortecs Devstral 2 2512 devstral-2512 0.00 0.00 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs INTELLECT 3 intellect-3 0.22 1.20 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Claude 4.5 Sonnet claude-4-5-sonnet 3.26 16.30 Provider: Cortecs, Context: 200000, Output Limit: 200000
cortecs DeepSeek V3 0324 deepseek-v3-0324 0.55 1.65 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Kimi K2 Thinking kimi-k2-thinking 0.66 2.73 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Kimi K2 Instruct kimi-k2-instruct 0.55 2.65 Provider: Cortecs, Context: 131000, Output Limit: 131000
cortecs GPT 4.1 gpt-4.1 2.35 9.42 Provider: Cortecs, Context: 1047576, Output Limit: 32768
cortecs Gemini 2.5 Pro gemini-2.5-pro 1.65 11.02 Provider: Cortecs, Context: 1048576, Output Limit: 65535
cortecs GPT Oss 120b gpt-oss-120b 0.00 0.00 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Devstral Small 2 2512 devstral-small-2512 0.00 0.00 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.44 1.98 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Claude Sonnet 4 claude-sonnet-4 3.31 16.54 Provider: Cortecs, Context: 200000, Output Limit: 64000
cortecs Llama 3.1 405B Instruct llama-3.1-405b-instruct 0.00 0.00 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.16 1.31 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Qwen3 32B qwen3-32b 0.10 0.33 Provider: Cortecs, Context: 16384, Output Limit: 16384
githubmodels JAIS 30b Chat jais-30b-chat 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Grok 3 grok-3 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Grok 3 Mini grok-3-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Cohere Command R 08-2024 cohere-command-r-08-2024 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command A cohere-command-a 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R+ 08-2024 cohere-command-r-plus-08-2024 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R cohere-command-r 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R+ cohere-command-r-plus 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels DeepSeek-R1-0528 deepseek-r1-0528 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels DeepSeek-R1 deepseek-r1 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels DeepSeek-V3-0324 deepseek-v3-0324 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Medium 3 (25.05) mistral-medium-2505 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Ministral 3B ministral-3b 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Nemo mistral-nemo 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Large 24.11 mistral-large-2411 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Codestral 25.01 codestral-2501 0.00 0.00 Provider: GitHub Models, Context: 32000, Output Limit: 8192
githubmodels Mistral Small 3.1 mistral-small-2503 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Phi-3-medium instruct (128k) phi-3-medium-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-mini instruct (4k) phi-3-mini-4k-instruct 0.00 0.00 Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels Phi-3-small instruct (128k) phi-3-small-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3.5-vision instruct (128k) phi-3.5-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4 phi-4 0.00 0.00 Provider: GitHub Models, Context: 16000, Output Limit: 4096
githubmodels Phi-4-mini-reasoning phi-4-mini-reasoning 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-small instruct (8k) phi-3-small-8k-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Phi-3.5-mini instruct (128k) phi-3.5-mini-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4-multimodal-instruct phi-4-multimodal-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-mini instruct (128k) phi-3-mini-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3.5-MoE instruct (128k) phi-3.5-moe-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4-mini-instruct phi-4-mini-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-medium instruct (4k) phi-3-medium-4k-instruct 0.00 0.00 Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels Phi-4-Reasoning phi-4-reasoning 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels MAI-DS-R1 mai-ds-r1 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels GPT-4.1-nano gpt-4.1-nano 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels GPT-4.1-mini gpt-4.1-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels OpenAI o1-preview o1-preview 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels OpenAI o3-mini o3-mini 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels GPT-4o gpt-4o 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels GPT-4.1 gpt-4.1 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels OpenAI o4-mini o4-mini 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels OpenAI o1 o1 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels OpenAI o1-mini o1-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 65536
githubmodels OpenAI o3 o3 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels GPT-4o mini gpt-4o-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels AI21 Jamba 1.5 Large ai21-jamba-1.5-large 0.00 0.00 Provider: GitHub Models, Context: 256000, Output Limit: 4096
githubmodels AI21 Jamba 1.5 Mini ai21-jamba-1.5-mini 0.00 0.00 Provider: GitHub Models, Context: 256000, Output Limit: 4096
togetherai Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Together AI, Context: 131072, Output Limit: 32768
togetherai Kimi K2 Thinking kimi-k2-thinking 1.20 4.00 Provider: Together AI, Context: 262144, Output Limit: 32768
togetherai Rnj-1 Instruct rnj-1-instruct 0.15 0.15 Provider: Together AI, Context: 32768, Output Limit: 32768
togetherai GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Together AI, Context: 131072, Output Limit: 131072
togetherai Llama 3.3 70B llama-3.3-70b-instruct-turbo 0.88 0.88 Provider: Together AI, Context: 131072, Output Limit: 65536
togetherai Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct-fp8 2.00 2.00 Provider: Together AI, Context: 262144, Output Limit: 65536
togetherai GLM 4.6 glm-4.6 0.60 2.20 Provider: Together AI, Context: 200000, Output Limit: 32768
togetherai DeepSeek R1 deepseek-r1 3.00 7.00 Provider: Together AI, Context: 163839, Output Limit: 12288
togetherai DeepSeek V3 deepseek-v3 1.25 1.25 Provider: Together AI, Context: 131072, Output Limit: 12288
togetherai DeepSeek V3.1 deepseek-v3-1 0.60 1.70 Provider: Together AI, Context: 131072, Output Limit: 12288
azure GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: Azure, Context: 1047576, Output Limit: 32768
azure text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: Azure, Context: 8191, Output Limit: 1536
azure Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Azure, Context: 2000000, Output Limit: 30000
azure DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Azure, Context: 163840, Output Limit: 163840
azure Grok 4 Fast (Reasoning) grok-4-fast-reasoning 0.20 0.50 Provider: Azure, Context: 2000000, Output Limit: 30000
azure Phi-3-medium-instruct (128k) phi-3-medium-128k-instruct 0.17 0.68 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4 gpt-4 60.00 120.00 Provider: Azure, Context: 8192, Output Limit: 8192
azure Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Azure, Context: 200000, Output Limit: 32000
azure GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Provider: Azure, Context: 128000, Output Limit: 8192
azure Embed v4 cohere-embed-v-4-0 0.12 0.00 Provider: Azure, Context: 128000, Output Limit: 1536
azure Command R cohere-command-r-08-2024 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 4000
azure Grok 4 grok-4 3.00 15.00 Provider: Azure, Context: 256000, Output Limit: 64000
azure Embed v3 Multilingual cohere-embed-v3-multilingual 0.10 0.00 Provider: Azure, Context: 512, Output Limit: 1024
azure Phi-4-mini phi-4-mini 0.08 0.30 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4 32K gpt-4-32k 60.00 120.00 Provider: Azure, Context: 32768, Output Limit: 32768
azure Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure DeepSeek-R1 deepseek-r1 1.35 5.40 Provider: Azure, Context: 163840, Output Limit: 163840
azure Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Azure, Context: 256000, Output Limit: 10000
azure GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Phi-3-mini-instruct (4k) phi-3-mini-4k-instruct 0.13 0.52 Provider: Azure, Context: 4096, Output Limit: 1024
azure Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure DeepSeek-V3.2-Speciale deepseek-v3.2-speciale 0.28 0.42 Provider: Azure, Context: 128000, Output Limit: 128000
azure Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Azure, Context: 128000, Output Limit: 128000
azure Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure Phi-3-small-instruct (128k) phi-3-small-128k-instruct 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 4096
azure Command A cohere-command-a 2.50 10.00 Provider: Azure, Context: 256000, Output Limit: 8000
azure Command R+ cohere-command-r-plus-08-2024 2.50 10.00 Provider: Azure, Context: 128000, Output Limit: 4000
azure Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.25 1.00 Provider: Azure, Context: 128000, Output Limit: 8192
azure GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: Azure, Context: 1047576, Output Limit: 32768
azure GPT-5 Chat gpt-5-chat 1.25 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure DeepSeek-V3.1 deepseek-v3.1 0.56 1.68 Provider: Azure, Context: 131072, Output Limit: 131072
azure Phi-4 phi-4 0.13 0.50 Provider: Azure, Context: 128000, Output Limit: 4096
azure Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.30 Provider: Azure, Context: 128000, Output Limit: 4096
azure Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure GPT-3.5 Turbo 0125 gpt-3.5-turbo-0125 0.50 1.50 Provider: Azure, Context: 16384, Output Limit: 16384
azure Grok 3 grok-3 3.00 15.00 Provider: Azure, Context: 131072, Output Limit: 8192
azure text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: Azure, Context: 8191, Output Limit: 3072
azure Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 2.68 3.54 Provider: Azure, Context: 8192, Output Limit: 2048
azure DeepSeek-V3-0324 deepseek-v3-0324 1.14 4.56 Provider: Azure, Context: 131072, Output Limit: 131072
azure Phi-3-small-instruct (8k) phi-3-small-8k-instruct 0.15 0.60 Provider: Azure, Context: 8192, Output Limit: 2048
azure Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-3.5 Turbo 0613 gpt-3.5-turbo-0613 3.00 4.00 Provider: Azure, Context: 16384, Output Limit: 16384
azure Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Provider: Azure, Context: 128000, Output Limit: 4096
azure o1-preview o1-preview 16.50 66.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Azure, Context: 262144, Output Limit: 262144
azure Model Router model-router 0.14 0.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure o3-mini o3-mini 1.10 4.40 Provider: Azure, Context: 200000, Output Limit: 100000
azure GPT-5.1 gpt-5.1 1.25 10.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Azure, Context: 272000, Output Limit: 128000
azure GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Provider: Azure, Context: 128000, Output Limit: 8192
azure Phi-3-mini-instruct (128k) phi-3-mini-128k-instruct 0.13 0.52 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4o gpt-4o 2.50 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure GPT-3.5 Turbo 0301 gpt-3.5-turbo-0301 1.50 2.00 Provider: Azure, Context: 4096, Output Limit: 4096
azure Ministral 3B ministral-3b 0.04 0.04 Provider: Azure, Context: 128000, Output Limit: 8192
azure GPT-4.1 gpt-4.1 2.00 8.00 Provider: Azure, Context: 1047576, Output Limit: 32768
azure o4-mini o4-mini 1.10 4.40 Provider: Azure, Context: 200000, Output Limit: 100000
azure Phi-4-multimodal phi-4-multimodal 0.08 0.32 Provider: Azure, Context: 128000, Output Limit: 4096
azure Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.30 0.61 Provider: Azure, Context: 8192, Output Limit: 2048
azure o1 o1 15.00 60.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Azure, Context: 131072, Output Limit: 8192
azure GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure o1-mini o1-mini 1.10 4.40 Provider: Azure, Context: 128000, Output Limit: 65536
azure Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Provider: Azure, Context: 128000, Output Limit: 8192
azure Embed v3 English cohere-embed-v3-english 0.10 0.00 Provider: Azure, Context: 512, Output Limit: 1024
azure text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: Azure, Context: 8192, Output Limit: 1536
azure Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Provider: Azure, Context: 4096, Output Limit: 4096
azure Mistral Nemo mistral-nemo 0.15 0.15 Provider: Azure, Context: 128000, Output Limit: 128000
azure o3 o3 2.00 8.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Codex Mini codex-mini 1.50 6.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Phi-3-medium-instruct (4k) phi-3-medium-4k-instruct 0.17 0.68 Provider: Azure, Context: 4096, Output Limit: 1024
azure Phi-4-reasoning phi-4-reasoning 0.13 0.50 Provider: Azure, Context: 32000, Output Limit: 4096
azure GPT-4 Turbo Vision gpt-4-turbo-vision 10.00 30.00 Provider: Azure, Context: 128000, Output Limit: 4096
azure Phi-4-reasoning-plus phi-4-reasoning-plus 0.13 0.50 Provider: Azure, Context: 32000, Output Limit: 4096
azure GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 16384
azure GPT-5 gpt-5 1.25 10.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure MAI-DS-R1 mai-ds-r1 1.35 5.40 Provider: Azure, Context: 128000, Output Limit: 8192
azure DeepSeek-V3.2 deepseek-v3.2 0.28 0.42 Provider: Azure, Context: 128000, Output Limit: 128000
azure GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Azure, Context: 400000, Output Limit: 272000
azure Mistral Large 24.11 mistral-large-2411 2.00 6.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.2 gpt-5.2 1.75 14.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Codestral 25.01 codestral-2501 0.30 0.90 Provider: Azure, Context: 256000, Output Limit: 256000
azure Mistral Small 3.1 mistral-small-2503 0.10 0.30 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-3.5 Turbo 1106 gpt-3.5-turbo-1106 1.00 2.00 Provider: Azure, Context: 16384, Output Limit: 16384
baseten Kimi K2 Instruct 0905 kimi-k2-instruct-0905 0.60 2.50 Provider: Baseten, Context: 262144, Output Limit: 262144
baseten Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Baseten, Context: 262144, Output Limit: 262144
baseten Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.38 1.53 Provider: Baseten, Context: 262144, Output Limit: 65536
baseten GLM-4.7 glm-4.7 0.60 2.20 Provider: Baseten, Context: 204800, Output Limit: 131072
baseten GLM 4.6 glm-4.6 0.60 2.20 Provider: Baseten, Context: 200000, Output Limit: 200000
baseten DeepSeek V3.2 deepseek-v3.2 0.30 0.45 Provider: Baseten, Context: 163800, Output Limit: 131100
siliconflow inclusionAI/Ling-mini-2.0 ling-mini-2.0 0.07 0.28 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow inclusionAI/Ling-flash-2.0 ling-flash-2.0 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow inclusionAI/Ring-flash-2.0 ring-flash-2.0 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-K2-Instruct kimi-k2-instruct 0.58 2.29 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-Dev-72B kimi-dev-72b 0.29 1.15 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.40 2.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow moonshotai/Kimi-K2-Thinking kimi-k2-thinking 0.55 2.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow tencent/Hunyuan-MT-7B hunyuan-mt-7b 0.00 0.00 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow tencent/Hunyuan-A13B-Instruct hunyuan-a13b-instruct 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow MiniMaxAI/MiniMax-M2 minimax-m2 0.30 1.20 Provider: SiliconFlow, Context: 197000, Output Limit: 131000
siliconflow MiniMaxAI/MiniMax-M1-80k minimax-m1-80k 0.55 2.20 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow THUDM/GLM-4-32B-0414 glm-4-32b-0414 0.27 0.27 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow THUDM/GLM-4.1V-9B-Thinking glm-4.1v-9b-thinking 0.04 0.14 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow THUDM/GLM-Z1-9B-0414 glm-z1-9b-0414 0.09 0.09 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow THUDM/GLM-4-9B-0414 glm-4-9b-0414 0.09 0.09 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow THUDM/GLM-Z1-32B-0414 glm-z1-32b-0414 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow openai/gpt-oss-20b gpt-oss-20b 0.04 0.18 Provider: SiliconFlow, Context: 131000, Output Limit: 8000
siliconflow openai/gpt-oss-120b gpt-oss-120b 0.05 0.45 Provider: SiliconFlow, Context: 131000, Output Limit: 8000
siliconflow stepfun-ai/step3 step3 0.57 1.42 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow nex-agi/DeepSeek-V3.1-Nex-N1 deepseek-v3.1-nex-n1 0.50 2.00 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow baidu/ERNIE-4.5-300B-A47B ernie-4.5-300b-a47b 0.28 1.10 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow z-ai/GLM-4.5 glm-4.5 0.40 2.00 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow z-ai/GLM-4.5-Air glm-4.5-air 0.14 0.86 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow ByteDance-Seed/Seed-OSS-36B-Instruct seed-oss-36b-instruct 0.21 0.57 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow meta-llama/Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.06 0.06 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-30B-A3B qwen3-30b-a3b 0.09 0.45 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-30B-A3B-Thinking-2507 qwen3-30b-a3b-thinking-2507 0.09 0.30 Provider: SiliconFlow, Context: 262000, Output Limit: 131000
siliconflow Qwen/Qwen3-VL-30B-A3B-Instruct qwen3-vl-30b-a3b-instruct 0.29 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-14B qwen3-14b 0.07 0.28 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.27 0.27 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Captioner qwen3-omni-30b-a3b-captioner 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-8B qwen3-8b 0.06 0.06 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Instruct qwen3-omni-30b-a3b-instruct 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-VL-8B-Thinking qwen3-vl-8b-thinking 0.18 2.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.18 0.18 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-32B-Instruct qwen2.5-32b-instruct 0.18 0.18 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-72B-Instruct-128K qwen2.5-72b-instruct-128k 0.59 0.59 Provider: SiliconFlow, Context: 131000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.59 0.59 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.28 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-235B-A22B qwen3-235b-a22b 0.35 1.42 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 0.59 0.59 Provider: SiliconFlow, Context: 131000, Output Limit: 4000
siliconflow Qwen/QwQ-32B qwq-32b 0.15 0.58 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-7B-Instruct qwen2.5-vl-7b-instruct 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-32B qwen3-32b 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct 0.18 0.68 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-235B-A22B-Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 0.25 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-235B-A22B-Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-30B-A3B-Instruct-2507 qwen3-30b-a3b-instruct-2507 0.09 0.30 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-30B-A3B-Thinking qwen3-vl-30b-a3b-thinking 0.29 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-32B-Thinking qwen3-vl-32b-thinking 0.20 1.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.13 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Thinking qwen3-omni-30b-a3b-thinking 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-VL-32B-Instruct qwen3-vl-32b-instruct 0.20 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-14B-Instruct qwen2.5-14b-instruct 0.10 0.10 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 0.57 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow zai-org/GLM-4.6 glm-4.6 0.50 1.90 Provider: SiliconFlow, Context: 205000, Output Limit: 205000
siliconflow zai-org/GLM-4.5V glm-4.5v 0.14 0.86 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow deepseek-ai/DeepSeek-R1 deepseek-r1 0.50 2.18 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.18 0.18 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow deepseek-ai/deepseek-vl2 deepseek-vl2 0.15 0.15 Provider: SiliconFlow, Context: 4000, Output Limit: 4000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.10 0.10 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow deepseek-ai/DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.27 0.41 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 16000
siliconflow deepseek-ai/DeepSeek-V3 deepseek-v3 0.25 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
helicone OpenAI GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone xAI Grok 4 Fast Non-Reasoning grok-4-fast-non-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone Qwen3 Coder 480B A35B Instruct Turbo qwen3-coder 0.22 0.95 Provider: Helicone, Context: 262144, Output Limit: 16384
helicone DeepSeek V3 deepseek-v3 0.56 1.68 Provider: Helicone, Context: 128000, Output Limit: 8192
helicone Anthropic: Claude Opus 4 claude-opus-4 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone xAI: Grok 4 Fast Reasoning grok-4-fast-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone Meta Llama 3.1 8B Instant llama-3.1-8b-instant 0.05 0.08 Provider: Helicone, Context: 131072, Output Limit: 32768
helicone Anthropic: Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone xAI Grok 4 grok-4 3.00 15.00 Provider: Helicone, Context: 256000, Output Limit: 256000
helicone Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: Helicone, Context: 262000, Output Limit: 16384
helicone Meta Llama 4 Maverick 17B 128E llama-4-maverick 0.15 0.60 Provider: Helicone, Context: 131072, Output Limit: 8192
helicone Meta Llama Prompt Guard 2 86M llama-prompt-guard-2-86m 0.01 0.01 Provider: Helicone, Context: 512, Output Limit: 2
helicone xAI Grok 4.1 Fast Reasoning grok-4-1-fast-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone xAI Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Helicone, Context: 256000, Output Limit: 10000
helicone Anthropic: Claude 4.5 Haiku claude-4.5-haiku 1.00 5.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Meta Llama 3.1 8B Instruct Turbo llama-3.1-8b-instruct-turbo 0.02 0.03 Provider: Helicone, Context: 128000, Output Limit: 128000
helicone OpenAI: GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI GPT-4.1 Mini gpt-4.1-mini-2025-04-14 0.40 1.60 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone Meta Llama Guard 4 12B llama-guard-4 0.21 0.21 Provider: Helicone, Context: 131072, Output Limit: 1024
helicone Meta Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.02 0.05 Provider: Helicone, Context: 16384, Output Limit: 16384
helicone Google Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Helicone, Context: 1048576, Output Limit: 65536
helicone Google Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Helicone, Context: 1048576, Output Limit: 65535
helicone OpenAI GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Meta Llama Prompt Guard 2 22M llama-prompt-guard-2-22m 0.01 0.01 Provider: Helicone, Context: 512, Output Limit: 2
helicone Anthropic: Claude 3.5 Sonnet v2 claude-3.5-sonnet-v2 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Perplexity Sonar Deep Research sonar-deep-research 2.00 8.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Google Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Helicone, Context: 1048576, Output Limit: 65535
helicone Anthropic: Claude Sonnet 4.5 (20250929) claude-sonnet-4-5-20250929 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone xAI Grok 3 grok-3 3.00 15.00 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Mistral Small mistral-small 75.00 200.00 Provider: Helicone, Context: 128000, Output Limit: 128000
helicone Kimi K2 (07/11) kimi-k2-0711 0.57 2.30 Provider: Helicone, Context: 131072, Output Limit: 16384
helicone OpenAI ChatGPT-4o chatgpt-4o-latest 5.00 20.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.10 0.30 Provider: Helicone, Context: 262144, Output Limit: 262144
helicone Kimi K2 (09/05) kimi-k2-0905 0.50 2.00 Provider: Helicone, Context: 262144, Output Limit: 16384
helicone Perplexity Sonar Reasoning sonar-reasoning 1.00 5.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Meta Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.13 0.39 Provider: Helicone, Context: 128000, Output Limit: 16400
helicone OpenAI: GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone Kimi K2 Thinking kimi-k2-thinking 0.48 2.00 Provider: Helicone, Context: 256000, Output Limit: 262144
helicone OpenAI o3 Mini o3-mini 1.10 4.40 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Anthropic: Claude Sonnet 4.5 claude-4.5-sonnet 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone OpenAI GPT-5.1 gpt-5.1 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI Codex Mini Latest codex-mini-latest 1.50 6.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone OpenAI GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI: GPT-5 Codex gpt-5-codex 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI GPT-4o gpt-4o 2.50 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone DeepSeek TNG R1T2 Chimera deepseek-tng-r1t2-chimera 0.30 1.20 Provider: Helicone, Context: 130000, Output Limit: 163840
helicone Anthropic: Claude Opus 4.5 claude-4.5-opus 5.00 25.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone OpenAI GPT-4.1 gpt-4.1 2.00 8.00 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone Perplexity Sonar sonar 1.00 1.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Zai GLM-4.6 glm-4.6 0.45 1.50 Provider: Helicone, Context: 204800, Output Limit: 131072
helicone OpenAI o4 Mini o4-mini 1.10 4.40 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Qwen3 235B A22B Thinking qwen3-235b-a22b-thinking 0.30 2.90 Provider: Helicone, Context: 262144, Output Limit: 81920
helicone Hermes 2 Pro Llama 3 8B hermes-2-pro-llama-3-8b 0.14 0.14 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone OpenAI: o1 o1 15.00 60.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone xAI Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Perplexity Sonar Pro sonar-pro 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 4096
helicone OpenAI GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.13 Provider: Helicone, Context: 128000, Output Limit: 4096
helicone OpenAI: o1-mini o1-mini 1.10 4.40 Provider: Helicone, Context: 128000, Output Limit: 65536
helicone Anthropic: Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone Anthropic: Claude 3 Haiku claude-3-haiku-20240307 0.25 1.25 Provider: Helicone, Context: 200000, Output Limit: 4096
helicone OpenAI o3 Pro o3-pro 20.00 80.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Qwen2.5 Coder 7B fast qwen2.5-coder-7b-fast 0.03 0.09 Provider: Helicone, Context: 32000, Output Limit: 8192
helicone DeepSeek Reasoner deepseek-reasoner 0.56 1.68 Provider: Helicone, Context: 128000, Output Limit: 64000
helicone Google Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Helicone, Context: 1048576, Output Limit: 65536
helicone Google Gemma 3 12B gemma-3-12b-it 0.05 0.10 Provider: Helicone, Context: 131072, Output Limit: 8192
helicone Mistral Nemo mistral-nemo 20.00 40.00 Provider: Helicone, Context: 128000, Output Limit: 16400
helicone OpenAI o3 o3 2.00 8.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone OpenAI GPT-OSS 20b gpt-oss-20b 0.05 0.20 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone OpenAI GPT-OSS 120b gpt-oss-120b 0.04 0.16 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Anthropic: Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone OpenAI GPT-5 Chat Latest gpt-5-chat-latest 1.25 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone OpenAI GPT-4o-mini gpt-4o-mini 0.15 0.60 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Google Gemma 2 gemma2-9b-it 0.01 0.03 Provider: Helicone, Context: 8192, Output Limit: 8192
helicone Anthropic: Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone Perplexity Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone OpenAI GPT-5 gpt-5 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: Helicone, Context: 256000, Output Limit: 16384
helicone Qwen3 30B A3B qwen3-30b-a3b 0.08 0.29 Provider: Helicone, Context: 41000, Output Limit: 41000
helicone DeepSeek V3.2 deepseek-v3.2 0.27 0.41 Provider: Helicone, Context: 163840, Output Limit: 65536
helicone xAI Grok 4.1 Fast Non-Reasoning grok-4-1-fast-non-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 30000
helicone OpenAI: GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Helicone, Context: 128000, Output Limit: 32768
helicone Meta Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Helicone, Context: 131072, Output Limit: 32768
helicone Mistral-Large mistral-large-2411 2.00 6.00 Provider: Helicone, Context: 128000, Output Limit: 32768
helicone Anthropic: Claude Opus 4.1 (20250805) claude-opus-4-1-20250805 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone Baidu Ernie 4.5 21B A3B Thinking ernie-4.5-21b-a3b-thinking 0.07 0.28 Provider: Helicone, Context: 128000, Output Limit: 8000
helicone OpenAI GPT-5.1 Chat gpt-5.1-chat-latest 1.25 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Qwen3 32B qwen3-32b 0.29 0.59 Provider: Helicone, Context: 131072, Output Limit: 40960
helicone Anthropic: Claude 4.5 Haiku (20251001) claude-haiku-4-5-20251001 1.00 5.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Meta Llama 4 Scout 17B 16E llama-4-scout 0.08 0.30 Provider: Helicone, Context: 131072, Output Limit: 8192
huggingface Kimi-K2-Instruct kimi-k2-instruct 1.00 3.00 Provider: Hugging Face, Context: 131072, Output Limit: 16384
huggingface Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 1.00 3.00 Provider: Hugging Face, Context: 262144, Output Limit: 16384
huggingface MiniMax-M2 minimax-m2 0.30 1.20 Provider: Hugging Face, Context: 204800, Output Limit: 204800
huggingface Qwen 3 Embedding 8B qwen3-embedding-8b 0.01 0.00 Provider: Hugging Face, Context: 32000, Output Limit: 4096
huggingface Qwen 3 Embedding 4B qwen3-embedding-4b 0.01 0.00 Provider: Hugging Face, Context: 32000, Output Limit: 2048
huggingface Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 2.00 2.00 Provider: Hugging Face, Context: 262144, Output Limit: 65536
huggingface Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.30 3.00 Provider: Hugging Face, Context: 262144, Output Limit: 131072
huggingface Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.25 1.00 Provider: Hugging Face, Context: 262144, Output Limit: 65536
huggingface Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.30 2.00 Provider: Hugging Face, Context: 262144, Output Limit: 131072
huggingface GLM-4.5 glm-4.5 0.60 2.20 Provider: Hugging Face, Context: 131072, Output Limit: 98304
huggingface GLM-4.6 glm-4.6 0.60 2.20 Provider: Hugging Face, Context: 200000, Output Limit: 128000
huggingface GLM-4.5-Air glm-4.5-air 0.20 1.10 Provider: Hugging Face, Context: 128000, Output Limit: 96000
huggingface DeepSeek-V3-0324 deepseek-v3-0324 1.25 1.25 Provider: Hugging Face, Context: 16384, Output Limit: 8192
huggingface DeepSeek-R1-0528 deepseek-r1-0528 3.00 5.00 Provider: Hugging Face, Context: 163840, Output Limit: 163840
opencode Qwen3 Coder qwen3-coder 0.45 1.80 Provider: OpenCode Zen, Context: 262144, Output Limit: 65536
opencode Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 32000
opencode Kimi K2 kimi-k2 0.40 2.50 Provider: OpenCode Zen, Context: 262144, Output Limit: 262144
opencode GPT-5.1 Codex gpt-5.1-codex 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 64000
opencode Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 64000
opencode Gemini 3 Pro gemini-3-pro 2.00 12.00 Provider: OpenCode Zen, Context: 1048576, Output Limit: 65536
opencode Alpha GLM-4.7 alpha-glm-4.7 0.60 2.20 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: OpenCode Zen, Context: 1000000, Output Limit: 64000
opencode GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Alpha GD4 alpha-gd4 0.50 2.00 Provider: OpenCode Zen, Context: 262144, Output Limit: 32768
opencode Kimi K2 Thinking kimi-k2-thinking 0.40 2.50 Provider: OpenCode Zen, Context: 262144, Output Limit: 262144
opencode GPT-5.1 gpt-5.1 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5 Nano gpt-5-nano 0.00 0.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5 Codex gpt-5-codex 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Big Pickle big-pickle 0.00 0.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 128000
opencode Claude Haiku 3.5 claude-3-5-haiku 0.80 4.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 8192
opencode GLM-4.6 glm-4.6 0.60 2.20 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode GLM-4.7 glm-4.7-free 0.00 0.00 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Grok Code Fast 1 grok-code 0.00 0.00 Provider: OpenCode Zen, Context: 256000, Output Limit: 256000
opencode Gemini 3 Flash gemini-3-flash 0.50 3.00 Provider: OpenCode Zen, Context: 1048576, Output Limit: 65536
opencode GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode MiniMax M2.1 minimax-m2.1-free 0.00 0.00 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: OpenCode Zen, Context: 1000000, Output Limit: 64000
opencode GPT-5 gpt-5 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5.2 gpt-5.2 1.75 14.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
fastrouter Kimi K2 kimi-k2 0.55 2.20 Provider: FastRouter, Context: 131072, Output Limit: 32768
fastrouter Grok 4 grok-4 3.00 15.00 Provider: FastRouter, Context: 256000, Output Limit: 64000
fastrouter Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: FastRouter, Context: 1048576, Output Limit: 65536
fastrouter Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: FastRouter, Context: 1048576, Output Limit: 65536
fastrouter GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter GPT-4.1 gpt-4.1 2.00 8.00 Provider: FastRouter, Context: 1047576, Output Limit: 32768
fastrouter GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: FastRouter, Context: 131072, Output Limit: 65536
fastrouter GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: FastRouter, Context: 131072, Output Limit: 32768
fastrouter GPT-5 gpt-5 1.25 10.00 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter Qwen3 Coder qwen3-coder 0.30 1.20 Provider: FastRouter, Context: 262144, Output Limit: 65536
fastrouter Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Provider: FastRouter, Context: 200000, Output Limit: 32000
fastrouter Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: FastRouter, Context: 200000, Output Limit: 64000
fastrouter DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.14 Provider: FastRouter, Context: 131072, Output Limit: 131072
minimax MiniMax-M2 minimax-m2 0.30 1.20 Provider: MiniMax, Context: 196608, Output Limit: 128000
minimax MiniMax-M2.1 minimax-m2.1 0.30 1.20 Provider: MiniMax, Context: 204800, Output Limit: 131072
google Gemini Embedding 001 gemini-embedding-001 0.15 0.00 Provider: Google, Context: 2048, Output Limit: 3072
google Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Image gemini-2.5-flash-image 0.30 30.00 Provider: Google, Context: 32768, Output Limit: 32768
google Gemini 2.5 Flash Preview 05-20 gemini-2.5-flash-preview-05-20 0.15 0.60 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Flash-Lite Latest gemini-flash-lite-latest 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Google, Context: 1000000, Output Limit: 64000
google Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Flash Latest gemini-flash-latest 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts 0.50 10.00 Provider: Google, Context: 8000, Output Limit: 16000
google Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Provider: Google, Context: 1048576, Output Limit: 8192
google Gemini Live 2.5 Flash Preview Native Audio gemini-live-2.5-flash-preview-native-audio 0.50 2.00 Provider: Google, Context: 131072, Output Limit: 65536
google Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 8192
google Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Live 2.5 Flash gemini-live-2.5-flash 0.50 2.00 Provider: Google, Context: 128000, Output Limit: 8000
google Gemini 2.5 Flash Lite Preview 06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Image (Preview) gemini-2.5-flash-image-preview 0.30 30.00 Provider: Google, Context: 32768, Output Limit: 32768
google Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Preview 04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts 1.00 20.00 Provider: Google, Context: 8000, Output Limit: 16000
google Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 1.5 Flash gemini-1.5-flash 0.08 0.30 Provider: Google, Context: 1000000, Output Limit: 8192
google Gemini 1.5 Flash-8B gemini-1.5-flash-8b 0.04 0.15 Provider: Google, Context: 1000000, Output Limit: 8192
google Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 1.5 Pro gemini-1.5-pro 1.25 5.00 Provider: Google, Context: 1000000, Output Limit: 8192
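The Google rows above are served through the Gemini API. A minimal sketch of calling one of them with the google-genai Python SDK, assuming a GEMINI_API_KEY environment variable (the prompt is illustrative); the usage_metadata token counts are what you would price against the $0.30 / $2.50 per-1M columns of the Gemini 2.5 Flash row:

    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # 1,048,576-token context per the row above
        contents="Summarize the tradeoffs of MoE models in two sentences.",
    )
    print(response.text)
    # Token counts for pricing against the per-1M price columns:
    print(response.usage_metadata.prompt_token_count,
          response.usage_metadata.candidates_token_count)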
googlevertex Gemini Embedding 001 gemini-embedding-001 0.15 0.00 Provider: Vertex, Context: 2048, Output Limit: 3072
googlevertex Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 05-20 gemini-2.5-flash-preview-05-20 0.15 0.60 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini Flash-Lite Latest gemini-flash-lite-latest 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini Flash Latest gemini-flash-latest 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Provider: Vertex, Context: 1048576, Output Limit: 8192
googlevertex Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 8192
googlevertex Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Lite Preview 06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex GPT OSS 120B gpt-oss-120b-maas 0.09 0.36 Provider: Vertex, Context: 131072, Output Limit: 32768
googlevertex GPT OSS 20B gpt-oss-20b-maas 0.07 0.25 Provider: Vertex, Context: 131072, Output Limit: 32768
cloudflareworkersai @hf/thebloke/mistral-7b-instruct-v0.1-awq mistral-7b-instruct-v0.1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/deepgram/aura-1 aura-1 0.02 0.02 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @hf/mistral/mistral-7b-instruct-v0.2 mistral-7b-instruct-v0.2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 3072, Output Limit: 4096
cloudflareworkersai @cf/tinyllama/tinyllama-1.1b-chat-v1.0 tinyllama-1.1b-chat-v1.0 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: 2048
cloudflareworkersai @cf/qwen/qwen1.5-0.5b-chat qwen1.5-0.5b-chat 0.00 0.00 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/meta/llama-3.2-11b-vision-instruct llama-3.2-11b-vision-instruct 0.05 0.68 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @hf/thebloke/llama-2-13b-chat-awq llama-2-13b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-fp8 llama-3.1-8b-instruct-fp8 0.15 0.29 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/openai/whisper whisper 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/stabilityai/stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-2-7b-chat-fp16 llama-2-7b-chat-fp16 0.56 6.67 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/microsoft/resnet-50 resnet-50 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/runwayml/stable-diffusion-v1-5-inpainting stable-diffusion-v1-5-inpainting 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/defog/sqlcoder-7b-2 sqlcoder-7b-2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 10000, Output Limit: 10000
cloudflareworkersai @cf/meta/llama-3-8b-instruct llama-3-8b-instruct 0.28 0.83 Provider: Cloudflare Workers AI, Context: 7968, Output Limit: 7968
cloudflareworkersai @cf/meta-llama/llama-2-7b-chat-hf-lora llama-2-7b-chat-hf-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct llama-3.1-8b-instruct 0.28 0.83 Provider: Cloudflare Workers AI, Context: 7968, Output Limit: 7968
cloudflareworkersai @cf/openchat/openchat-3.5-0106 openchat-3.5-0106 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @hf/thebloke/openhermes-2.5-mistral-7b-awq openhermes-2.5-mistral-7b-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/leonardo/lucid-origin lucid-origin 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/facebook/bart-large-cnn bart-large-cnn 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/black-forest-labs/flux-1-schnell flux-1-schnell 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: N/A
cloudflareworkersai @cf/deepseek-ai/deepseek-r1-distill-qwen-32b deepseek-r1-distill-qwen-32b 0.50 4.88 Provider: Cloudflare Workers AI, Context: 80000, Output Limit: 80000
cloudflareworkersai @cf/google/gemma-2b-it-lora gemma-2b-it-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/fblgit/una-cybertron-7b-v2-bf16 una-cybertron-7b-v2-bf16 0.00 0.00 Provider: Cloudflare Workers AI, Context: 15000, Output Limit: 15000
cloudflareworkersai @cf/aisingapore/gemma-sea-lion-v4-27b-it gemma-sea-lion-v4-27b-it 0.35 0.56 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: N/A
cloudflareworkersai @cf/meta/m2m100-1.2b m2m100-1.2b 0.34 0.34 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-3.2-3b-instruct llama-3.2-3b-instruct 0.05 0.34 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/qwen/qwen2.5-coder-32b-instruct qwen2.5-coder-32b-instruct 0.66 1.00 Provider: Cloudflare Workers AI, Context: 32768, Output Limit: 32768
cloudflareworkersai @cf/runwayml/stable-diffusion-v1-5-img2img stable-diffusion-v1-5-img2img 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/google/gemma-7b-it-lora gemma-7b-it-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 3500, Output Limit: 3500
cloudflareworkersai @cf/qwen/qwen1.5-14b-chat-awq qwen1.5-14b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 7500, Output Limit: 7500
cloudflareworkersai @cf/qwen/qwen1.5-1.8b-chat qwen1.5-1.8b-chat 0.00 0.00 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/mistralai/mistral-small-3.1-24b-instruct mistral-small-3.1-24b-instruct 0.35 0.56 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @hf/google/gemma-7b-it gemma-7b-it 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/qwen/qwen3-30b-a3b-fp8 qwen3-30b-a3b-fp8 0.05 0.34 Provider: Cloudflare Workers AI, Context: 32768, Output Limit: N/A
cloudflareworkersai @hf/thebloke/llamaguard-7b-awq llamaguard-7b-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/nousresearch/hermes-2-pro-mistral-7b hermes-2-pro-mistral-7b 0.00 0.00 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/ibm-granite/granite-4.0-h-micro granite-4.0-h-micro 0.02 0.11 Provider: Cloudflare Workers AI, Context: 131000, Output Limit: N/A
cloudflareworkersai @cf/tiiuae/falcon-7b-instruct falcon-7b-instruct 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.3-70b-instruct-fp8-fast llama-3.3-70b-instruct-fp8-fast 0.29 2.25 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/meta/llama-3-8b-instruct-awq llama-3-8b-instruct-awq 0.12 0.27 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/leonardo/phoenix-1.0 phoenix-1.0 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/microsoft/phi-2 phi-2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: 2048
cloudflareworkersai @cf/lykon/dreamshaper-8-lcm dreamshaper-8-lcm 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/thebloke/discolm-german-7b-v1-awq discolm-german-7b-v1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-2-7b-chat-int8 llama-2-7b-chat-int8 0.56 6.67 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/meta/llama-3.2-1b-instruct llama-3.2-1b-instruct 0.03 0.20 Provider: Cloudflare Workers AI, Context: 60000, Output Limit: 60000
cloudflareworkersai @cf/openai/whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-4-scout-17b-16e-instruct llama-4-scout-17b-16e-instruct 0.27 0.85 Provider: Cloudflare Workers AI, Context: 131000, Output Limit: 131000
cloudflareworkersai @hf/nexusflow/starling-lm-7b-beta starling-lm-7b-beta 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/thebloke/deepseek-coder-6.7b-base-awq deepseek-coder-6.7b-base-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/google/gemma-3-12b-it gemma-3-12b-it 0.35 0.56 Provider: Cloudflare Workers AI, Context: 80000, Output Limit: 80000
cloudflareworkersai @cf/meta/llama-guard-3-8b llama-guard-3-8b 0.48 0.03 Provider: Cloudflare Workers AI, Context: 131072, Output Limit: N/A
cloudflareworkersai @hf/thebloke/neural-chat-7b-v3-1-awq neural-chat-7b-v3-1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/openai/whisper-tiny-en whisper-tiny-en 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/bytedance/stable-diffusion-xl-lightning stable-diffusion-xl-lightning 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/mistral/mistral-7b-instruct-v0.1 mistral-7b-instruct-v0.1 0.11 0.19 Provider: Cloudflare Workers AI, Context: 2824, Output Limit: 2824
cloudflareworkersai @cf/llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/openai/gpt-oss-20b gpt-oss-20b 0.20 0.30 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/deepseek-ai/deepseek-math-7b-instruct deepseek-math-7b-instruct 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/openai/gpt-oss-120b gpt-oss-120b 0.35 0.75 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/myshell-ai/melotts melotts 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/qwen/qwen1.5-7b-chat-awq qwen1.5-7b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 20000, Output Limit: 20000
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-fast llama-3.1-8b-instruct-fast 0.05 0.38 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/deepgram/nova-3 nova-3 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-3.1-70b-instruct llama-3.1-70b-instruct 0.29 2.25 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/qwen/qwq-32b qwq-32b 0.66 1.00 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @hf/thebloke/zephyr-7b-beta-awq zephyr-7b-beta-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/thebloke/deepseek-coder-6.7b-instruct-awq deepseek-coder-6.7b-instruct-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-awq llama-3.1-8b-instruct-awq 0.12 0.27 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/mistral/mistral-7b-instruct-v0.2-lora mistral-7b-instruct-v0.2-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 15000, Output Limit: 15000
cloudflareworkersai @cf/unum/uform-gen2-qwen-500m uform-gen2-qwen-500m 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
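Workers AI models are addressed by the full "@cf/..." or "@hf/..." path shown in the name column. A minimal REST sketch with the requests library, assuming CF_ACCOUNT_ID and CF_API_TOKEN hold your own account ID and API token:

    import os
    import requests

    ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
    API_TOKEN = os.environ["CF_API_TOKEN"]

    # Model path exactly as listed above (Llama 3.1 8B Instruct here).
    url = (f"https://api.cloudflare.com/client/v4/accounts/"
           f"{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct")
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"messages": [{"role": "user", "content": "Hello!"}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["result"]["response"])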
inception Mercury Coder mercury-coder 0.25 1.00 Provider: Inception, Context: 128000, Output Limit: 16384
inception Mercury mercury 0.25 1.00 Provider: Inception, Context: 128000, Output Limit: 16384
wandb Kimi-K2-Instruct kimi-k2-instruct 1.35 4.00 Provider: Weights & Biases, Context: 128000, Output Limit: 16384
wandb Phi-4-mini-instruct phi-4-mini-instruct 0.08 0.35 Provider: Weights & Biases, Context: 128000, Output Limit: 4096
wandb Meta-Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.22 0.22 Provider: Weights & Biases, Context: 128000, Output Limit: 32768
wandb Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Weights & Biases, Context: 128000, Output Limit: 32768
wandb Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.17 0.66 Provider: Weights & Biases, Context: 64000, Output Limit: 8192
wandb Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.10 0.10 Provider: Weights & Biases, Context: 262144, Output Limit: 131072
wandb Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 1.00 1.50 Provider: Weights & Biases, Context: 262144, Output Limit: 65536
wandb Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.10 0.10 Provider: Weights & Biases, Context: 262144, Output Limit: 131072
wandb DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Weights & Biases, Context: 161000, Output Limit: 163840
wandb DeepSeek-V3-0324 deepseek-v3-0324 1.14 2.75 Provider: Weights & Biases, Context: 161000, Output Limit: 8192
cloudflareaigateway IBM Granite 4.0 H Micro granite-4.0-h-micro 0.02 0.11 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BART Large CNN bart-large-cnn 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Mistral 7B Instruct v0.1 mistral-7b-instruct-v0.1 0.11 0.19 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway DistilBERT SST-2 INT8 distilbert-sst-2-int8 0.03 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway MyShell MeloTTS melotts 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Gemma 3 12B IT gemma-3-12b-it 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway PLaMo Embedding 1B plamo-embedding-1b 0.02 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT OSS 20B gpt-oss-20b 0.20 0.30 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT OSS 120B gpt-oss-120b 0.35 0.75 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway IndicTrans2 EN-Indic 1B indictrans2-en-indic-1b 0.34 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Pipecat Smart Turn v2 smart-turn-v2 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen 2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.66 1.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen3 30B A3B FP8 qwen3-30b-a3b-fp8 0.05 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen3 Embedding 0.6B qwen3-embedding-0.6b 0.01 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway QwQ 32B qwq-32b 0.66 1.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Mistral Small 3.1 24B Instruct mistral-small-3.1-24b-instruct 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Aura 2 (ES) aura-2-es 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Aura 2 (EN) aura-2-en 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Nova 3 nova-3 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Gemma SEA-LION v4 27B IT gemma-sea-lion-v4-27b-it 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.05 0.68 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct FP8 llama-3.1-8b-instruct-fp8 0.15 0.29 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 2 7B Chat FP16 llama-2-7b-chat-fp16 0.56 6.67 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3 8B Instruct llama-3-8b-instruct 0.28 0.83 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.28 0.83 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway M2M100 1.2B m2m100-1.2b 0.34 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.05 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.3 70B Instruct FP8 Fast llama-3.3-70b-instruct-fp8-fast 0.29 2.25 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3 8B Instruct AWQ llama-3-8b-instruct-awq 0.12 0.27 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.03 0.20 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.27 0.85 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama Guard 3 8B llama-guard-3-8b 0.48 0.03 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct AWQ llama-3.1-8b-instruct-awq 0.12 0.27 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE M3 bge-m3 0.01 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Base EN v1.5 bge-base-en-v1.5 0.07 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Large EN v1.5 bge-large-en-v1.5 0.20 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Reranker Base bge-reranker-base 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Small EN v1.5 bge-small-en-v1.5 0.02 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.50 4.88 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT-4 gpt-4 30.00 60.00 Provider: Cloudflare AI Gateway, Context: 8192, Output Limit: 8192
cloudflareaigateway GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway GPT-3.5-turbo gpt-3.5-turbo 0.50 1.50 Provider: Cloudflare AI Gateway, Context: 16385, Output Limit: 4096
cloudflareaigateway GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 4096
cloudflareaigateway o3-mini o3-mini 1.10 4.40 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway GPT-5.1 gpt-5.1 1.25 10.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway GPT-4o gpt-4o 2.50 10.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway o4-mini o4-mini 1.10 4.40 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o1 o1 15.00 60.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o3-pro o3-pro 20.00 80.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o3 o3 2.00 8.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT-5.2 gpt-5.2 1.75 14.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway Claude Opus 4 (latest) claude-opus-4 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 32000
cloudflareaigateway Claude Opus 4.1 (latest) claude-opus-4-1 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 32000
cloudflareaigateway Claude Haiku 4.5 (latest) claude-haiku-4-5 1.00 5.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Haiku 3 claude-3-haiku 0.25 1.25 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Opus 4.5 (latest) claude-opus-4-5 5.00 25.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Opus 3 claude-3-opus 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Sonnet 4.5 (latest) claude-sonnet-4-5 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Sonnet 3.5 v2 claude-3.5-sonnet 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Sonnet 3 claude-3-sonnet 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Haiku 3.5 (latest) claude-3-5-haiku 0.80 4.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Haiku 3.5 (latest) claude-3.5-haiku 0.80 4.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Sonnet 4 (latest) claude-sonnet-4 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
openai GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: OpenAI, Context: 8191, Output Limit: 1536
openai GPT-4 gpt-4 30.00 60.00 Provider: OpenAI, Context: 8192, Output Limit: 8192
openai o1-pro o1-pro 150.00 600.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-4o (2024-05-13) gpt-4o-2024-05-13 5.00 15.00 Provider: OpenAI, Context: 128000, Output Limit: 4096
openai GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o (2024-08-06) gpt-4o-2024-08-06 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai o3-deep-research o3-deep-research 10.00 40.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-3.5-turbo gpt-3.5-turbo 0.50 1.50 Provider: OpenAI, Context: 16385, Output Limit: 4096
openai GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: OpenAI, Context: 8191, Output Limit: 3072
openai GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: OpenAI, Context: 128000, Output Limit: 4096
openai o1-preview o1-preview 15.00 60.00 Provider: OpenAI, Context: 128000, Output Limit: 32768
openai GPT-5.1 Codex mini gpt-5.1-codex-mini 0.25 2.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o3-mini o3-mini 1.10 4.40 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5.2 Chat gpt-5.2-chat-latest 1.75 14.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5.1 gpt-5.1 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai Codex Mini codex-mini-latest 1.50 6.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o gpt-4o 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-4.1 gpt-4.1 2.00 8.00 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai o4-mini o4-mini 1.10 4.40 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai o1 o1 15.00 60.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o1-mini o1-mini 1.10 4.40 Provider: OpenAI, Context: 128000, Output Limit: 65536
openai text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: OpenAI, Context: 8192, Output Limit: 1536
openai o3-pro o3-pro 20.00 80.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o3 o3 2.00 8.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai o4-mini-deep-research o4-mini-deep-research 2.00 8.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Chat (latest) gpt-5-chat-latest 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5 gpt-5 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: OpenAI, Context: 400000, Output Limit: 272000
openai GPT-5.2 gpt-5.2 1.75 14.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5.1 Chat gpt-5.1-chat-latest 1.25 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
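A minimal sketch of exercising one of the OpenAI rows and pricing the reply from the returned usage object, assuming an OPENAI_API_KEY environment variable; the $0.15 / $0.60 figures are the GPT-4o mini row above:

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name three uses of embeddings."}],
    )
    print(resp.choices[0].message.content)

    # Price the call against the table: $0.15 in / $0.60 out per 1M tokens.
    usage = resp.usage
    cost = (usage.prompt_tokens * 0.15 + usage.completion_tokens * 0.60) / 1_000_000
    print(f"~${cost:.6f}")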
minimaxcn MiniMax-M2.1 minimax-m2.1 0.30 1.20 Provider: MiniMax (China), Context: 204800, Output Limit: 131072
minimaxcn MiniMax-M2 minimax-m2 0.30 1.20 Provider: MiniMax (China), Context: 196608, Output Limit: 128000
perplexity Sonar sonar 1.00 1.00 Provider: Perplexity, Context: 128000, Output Limit: 4096
perplexity Sonar Pro sonar-pro 3.00 15.00 Provider: Perplexity, Context: 200000, Output Limit: 8192
perplexity Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Provider: Perplexity, Context: 128000, Output Limit: 4096
zenmux Step-3 step-3 0.21 0.57 Provider: ZenMux, Context: 65536, Output Limit: 64000
zenmux Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Kimi K2 0905 kimi-k2-0905 0.60 2.50 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux MiMo-V2-Flash Free mimo-v2-flash-free 0.00 0.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux MiMo-V2-Flash mimo-v2-flash 0.00 0.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Grok 4 grok-4 3.00 15.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Grok 4.1 Fast Non Reasoning grok-4.1-fast-non-reasoning 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux Grok 4 Fast grok-4-fast 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux Grok 4.1 Fast grok-4.1-fast 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux DeepSeek-V3.2 (Non-thinking Mode) deepseek-chat 0.28 0.42 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.22 0.33 Provider: ZenMux, Context: 163840, Output Limit: 64000
zenmux DeepSeek-V3.2 (Thinking Mode) deepseek-reasoner 0.28 0.42 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux DeepSeek V3.2 deepseek-v3.2 0.28 0.43 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux MiniMax M2 minimax-m2 0.30 1.20 Provider: ZenMux, Context: 204800, Output Limit: 64000
zenmux MiniMax M2.1 minimax-m2.1 0.30 1.20 Provider: ZenMux, Context: 204800, Output Limit: 64000
zenmux Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 3 Flash Preview Free gemini-3-flash-preview-free 0.00 0.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: ZenMux, Context: 1048576, Output Limit: 65536
zenmux Doubao-Seed-Code doubao-seed-code 0.17 1.12 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Doubao-Seed-1.8 doubao-seed-1.8 0.11 0.28 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1-Codex-Mini gpt-5.1-codex-mini 0.25 2.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1 gpt-5.1 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5 Codex gpt-5-codex 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GPT-5 gpt-5 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.2 gpt-5.2 1.75 14.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux ERNIE-5.0-Thinking-Preview ernie-5.0-thinking-preview 0.84 3.37 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux Ring-1T ring-1t 0.56 2.24 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux Ling-1T ling-1t 0.56 2.24 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.7 glm-4.7 0.28 1.14 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.6V Flash (Free) glm-4.6v-flash-free 0.00 0.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.6V FlashX glm-4.6v-flash 0.00 0.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.5 glm-4.5 0.35 1.54 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.5 Air glm-4.5-air 0.11 0.56 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.6 glm-4.6 0.35 1.54 Provider: ZenMux, Context: 200000, Output Limit: 128000
zenmux GLM 4.6V glm-4.6v 0.14 0.42 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Qwen3-Coder-Plus qwen3-coder-plus 1.00 5.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
zenmux KAT-Coder-Pro-V1 Free kat-coder-pro-v1-free 0.00 0.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux KAT-Coder-Pro-V1 kat-coder-pro-v1 0.00 0.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Claude Opus 4 claude-opus-4 15.00 75.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Provider: ZenMux, Context: 200000, Output Limit: 32000
zenmux Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
zenmux Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
ovhcloud Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.70 0.70 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Mistral-7B-Instruct-v0.3 mistral-7b-instruct-v0.3 0.11 0.11 Provider: OVHcloud AI Endpoints, Context: 127000, Output Limit: 127000
ovhcloud Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.11 0.11 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 1.01 1.01 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Mistral-Nemo-Instruct-2407 mistral-nemo-instruct-2407 0.14 0.14 Provider: OVHcloud AI Endpoints, Context: 118000, Output Limit: 118000
ovhcloud Mistral-Small-3.2-24B-Instruct-2506 mistral-small-3.2-24b-instruct-2506 0.10 0.31 Provider: OVHcloud AI Endpoints, Context: 128000, Output Limit: 128000
ovhcloud Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.96 0.96 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.26 Provider: OVHcloud AI Endpoints, Context: 256000, Output Limit: 256000
ovhcloud llava-next-mistral-7b llava-next-mistral-7b 0.32 0.32 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Meta-Llama-3_1-70B-Instruct meta-llama-3_1-70b-instruct 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud gpt-oss-20b gpt-oss-20b 0.05 0.18 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud gpt-oss-120b gpt-oss-120b 0.09 0.47 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Meta-Llama-3_3-70B-Instruct meta-llama-3_3-70b-instruct 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Qwen3-32B qwen3-32b 0.09 0.25 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
v0 v0-1.5-lg v0-1.5-lg 15.00 75.00 Provider: v0, Context: 512000, Output Limit: 32000
v0 v0-1.5-md v0-1.5-md 3.00 15.00 Provider: v0, Context: 128000, Output Limit: 32000
v0 v0-1.0-md v0-1.0-md 3.00 15.00 Provider: v0, Context: 128000, Output Limit: 32000
iflowcn Qwen3-Coder-480B-A35B qwen3-coder 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn DeepSeek-V3 deepseek-v3 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn Kimi-K2 kimi-k2 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn DeepSeek-R1 deepseek-r1 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn DeepSeek-V3.1-Terminus deepseek-v3.1 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn MiniMax-M2 minimax-m2 0.00 0.00 Provider: iFlow, Context: 204800, Output Limit: 131072
iflowcn Qwen3-235B-A22B qwen3-235b 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn DeepSeek-V3.2 deepseek-v3.2-chat 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Kimi-K2-0905 kimi-k2-0905 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Kimi-K2-Thinking kimi-k2-thinking 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-235B-A22B-Thinking qwen3-235b-a22b-thinking-2507 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-VL-Plus qwen3-vl-plus 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn GLM-4.6 glm-4.6 0.00 0.00 Provider: iFlow, Context: 200000, Output Limit: 128000
iflowcn TStars-2.0 tstars2.0 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-235B-A22B-Instruct qwen3-235b-a22b-instruct 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-Max qwen3-max 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn DeepSeek-V3.2-Exp deepseek-v3.2 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-Max-Preview qwen3-max-preview 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn Qwen3-Coder-Plus qwen3-coder-plus 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-32B qwen3-32b 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
synthetic Qwen 3 235B Instruct qwen3-235b-a22b-instruct-2507 0.20 0.60 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.80 0.80 Provider: Synthetic, Context: 32768, Output Limit: 32768
synthetic Qwen 3 Coder 480B qwen3-coder-480b-a35b-instruct 2.00 2.00 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.65 3.00 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic MiniMax-M2 minimax-m2 0.55 2.19 Provider: Synthetic, Context: 196608, Output Limit: 131000
synthetic MiniMax-M2.1 minimax-m2.1 0.55 2.19 Provider: Synthetic, Context: 204800, Output Limit: 131072
synthetic Llama-3.1-70B-Instruct llama-3.1-70b-instruct 0.90 0.90 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.20 0.20 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.90 0.90 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.15 0.60 Provider: Synthetic, Context: 328000, Output Limit: 4096
synthetic Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.22 0.88 Provider: Synthetic, Context: 524000, Output Limit: 4096
synthetic Llama-3.1-405B-Instruct llama-3.1-405b-instruct 3.00 3.00 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Kimi K2 0905 kimi-k2-instruct-0905 1.20 1.20 Provider: Synthetic, Context: 262144, Output Limit: 32768
synthetic Kimi K2 Thinking kimi-k2-thinking 0.55 2.19 Provider: Synthetic, Context: 262144, Output Limit: 262144
synthetic GLM 4.5 glm-4.5 0.55 2.19 Provider: Synthetic, Context: 128000, Output Limit: 96000
synthetic GLM 4.7 glm-4.7 0.55 2.19 Provider: Synthetic, Context: 200000, Output Limit: 64000
synthetic GLM 4.6 glm-4.6 0.55 2.19 Provider: Synthetic, Context: 200000, Output Limit: 64000
synthetic DeepSeek R1 deepseek-r1 0.55 2.19 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek R1 (0528) deepseek-r1-0528 3.00 8.00 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.1 Terminus deepseek-v3.1-terminus 1.20 1.20 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.2 deepseek-v3.2 0.27 0.40 Provider: Synthetic, Context: 162816, Output Limit: 8000
synthetic DeepSeek V3 deepseek-v3 1.25 1.25 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.1 deepseek-v3.1 0.56 1.68 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3 (0324) deepseek-v3-0324 1.20 1.20 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic GPT OSS 120B gpt-oss-120b 0.10 0.10 Provider: Synthetic, Context: 128000, Output Limit: 32768
deepinfra Kimi K2 kimi-k2-instruct 0.50 2.00 Provider: Deep Infra, Context: 131072, Output Limit: 32768
deepinfra Kimi K2 Thinking kimi-k2-thinking 0.47 2.00 Provider: Deep Infra, Context: 131072, Output Limit: 32768
deepinfra MiniMax M2 minimax-m2 0.25 1.02 Provider: Deep Infra, Context: 262144, Output Limit: 32768
deepinfra GPT OSS 20B gpt-oss-20b 0.03 0.14 Provider: Deep Infra, Context: 131072, Output Limit: 16384
deepinfra GPT OSS 120B gpt-oss-120b 0.05 0.24 Provider: Deep Infra, Context: 131072, Output Limit: 16384
deepinfra Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.40 1.60 Provider: Deep Infra, Context: 262144, Output Limit: 65536
deepinfra Qwen3 Coder 480B A35B Instruct Turbo qwen3-coder-480b-a35b-instruct-turbo 0.30 1.20 Provider: Deep Infra, Context: 262144, Output Limit: 65536
deepinfra GLM-4.5 glm-4.5 0.60 2.20 Provider: Deep Infra, Context: 131072, Output Limit: 98304
deepinfra GLM-4.7 glm-4.7 0.43 1.75 Provider: Deep Infra, Context: 202752, Output Limit: 16384
zhipuai GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 Provider: Zhipu AI, Context: 128000, Output Limit: 32768
zhipuai GLM-4.6V glm-4.6v 0.30 0.90 Provider: Zhipu AI, Context: 128000, Output Limit: 32768
zhipuai GLM-4.6 glm-4.6 0.60 2.20 Provider: Zhipu AI, Context: 204800, Output Limit: 131072
zhipuai GLM-4.5V glm-4.5v 0.60 1.80 Provider: Zhipu AI, Context: 64000, Output Limit: 16384
zhipuai GLM-4.5-Air glm-4.5-air 0.20 1.10 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.5 glm-4.5 0.60 2.20 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.5-Flash glm-4.5-flash 0.00 0.00 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.7 glm-4.7 0.60 2.20 Provider: Zhipu AI, Context: 204800, Output Limit: 131072
submodel GPT OSS 120B gpt-oss-120b 0.10 0.50 Provider: submodel, Context: 131072, Output Limit: 32768
submodel Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.30 Provider: submodel, Context: 262144, Output Limit: 131072
submodel Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct-fp8 0.20 0.80 Provider: submodel, Context: 262144, Output Limit: 262144
submodel Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.20 0.60 Provider: submodel, Context: 262144, Output Limit: 131072
submodel GLM 4.5 FP8 glm-4.5-fp8 0.20 0.80 Provider: submodel, Context: 131072, Output Limit: 131072
submodel GLM 4.5 Air glm-4.5-air 0.10 0.50 Provider: submodel, Context: 131072, Output Limit: 131072
submodel DeepSeek R1 0528 deepseek-r1-0528 0.50 2.15 Provider: submodel, Context: 75000, Output Limit: 163840
submodel DeepSeek V3.1 deepseek-v3.1 0.20 0.80 Provider: submodel, Context: 75000, Output Limit: 163840
submodel DeepSeek V3 0324 deepseek-v3-0324 0.20 0.80 Provider: submodel, Context: 75000, Output Limit: 163840
nanogpt Kimi K2 Thinking kimi-k2-thinking 1.00 2.00 Provider: NanoGPT, Context: 32768, Output Limit: 8192
nanogpt Kimi K2 Instruct kimi-k2-instruct 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Hermes 4 405B Thinking hermes-4-405b:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt Llama 3.3 Nemotron Super 49B V1.5 llama-3_3-nemotron-super-49b-v1_5 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt DeepSeek V3.2 Thinking deepseek-v3.2:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt DeepSeek R1 deepseek-r1 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt MiniMax M2.1 minimax-m2.1 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GPT OSS 120B gpt-oss-120b 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.6 Thinking glm-4.6:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.6 glm-4.6 1.00 2.00 Provider: NanoGPT, Context: 200000, Output Limit: 8192
nanogpt Qwen3 Coder qwen3-coder 1.00 2.00 Provider: NanoGPT, Context: 106000, Output Limit: 8192
nanogpt Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 1.00 2.00 Provider: NanoGPT, Context: 262144, Output Limit: 8192
nanogpt Devstral 2 123B Instruct 2512 devstral-2-123b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Mistral Large 3 675B Instruct 2512 mistral-large-3-675b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Ministral 14B Instruct 2512 ministral-14b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Llama 4 Maverick llama-4-maverick 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt Llama 3.3 70B Instruct llama-3.3-70b-instruct 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.7 glm-4.7 1.00 2.00 Provider: NanoGPT, Context: 204800, Output Limit: 8192
nanogpt GLM 4.5 Air glm-4.5-air 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.7 Thinking glm-4.7:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.5 Air Thinking glm-4.5-air:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
inference Mistral Nemo 12B Instruct mistral-nemo-12b-instruct 0.04 0.10 Provider: Inference, Context: 16000, Output Limit: 4096
inference Google Gemma 3 gemma-3 0.15 0.30 Provider: Inference, Context: 125000, Output Limit: 4096
inference Osmosis Structure 0.6B osmosis-structure-0.6b 0.10 0.50 Provider: Inference, Context: 4000, Output Limit: 2048
inference Qwen 3 Embedding 4B qwen3-embedding-4b 0.01 0.00 Provider: Inference, Context: 32000, Output Limit: 2048
inference Qwen 2.5 7B Vision Instruct qwen-2.5-7b-vision-instruct 0.20 0.20 Provider: Inference, Context: 125000, Output Limit: 4096
inference Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.06 0.06 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.03 0.03 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.02 0.02 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.01 0.01 Provider: Inference, Context: 16000, Output Limit: 4096
requesty Grok 4 grok-4 3.00 15.00 Provider: Requesty, Context: 256000, Output Limit: 64000
requesty Grok 4 Fast grok-4-fast 0.20 0.50 Provider: Requesty, Context: 2000000, Output Limit: 64000
requesty Gemini 3 Flash gemini-3-flash-preview 0.50 3.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 3 Pro gemini-3-pro-preview 2.00 12.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Requesty, Context: 1047576, Output Limit: 32768
requesty GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Requesty, Context: 16000, Output Limit: 4000
requesty GPT-4.1 gpt-4.1 2.00 8.00 Provider: Requesty, Context: 1047576, Output Limit: 32768
requesty o4 Mini o4-mini 1.10 4.40 Provider: Requesty, Context: 200000, Output Limit: 100000
requesty GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Requesty, Context: 128000, Output Limit: 32000
requesty GPT-4o Mini gpt-4o-mini 0.15 0.60 Provider: Requesty, Context: 128000, Output Limit: 16384
requesty GPT-5 gpt-5 1.25 10.00 Provider: Requesty, Context: 400000, Output Limit: 128000
requesty Claude Opus 4 claude-opus-4 15.00 75.00 Provider: Requesty, Context: 200000, Output Limit: 32000
requesty Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Requesty, Context: 200000, Output Limit: 32000
requesty Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Requesty, Context: 200000, Output Limit: 62000
requesty Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Requesty, Context: 200000, Output Limit: 64000
requesty Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Requesty, Context: 1000000, Output Limit: 64000
requesty Claude Sonnet 3.7 claude-3-7-sonnet 3.00 15.00 Provider: Requesty, Context: 200000, Output Limit: 64000
requesty Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: Requesty, Context: 200000, Output Limit: 64000
morph Morph v3 Large morph-v3-large 0.90 1.90 Provider: Morph, Context: 32000, Output Limit: 32000
morph Auto auto 0.85 1.55 Provider: Morph, Context: 32000, Output Limit: 32000
morph Morph v3 Fast morph-v3-fast 0.80 1.20 Provider: Morph, Context: 16000, Output Limit: 16000
lmstudio GPT OSS 20B gpt-oss-20b 0.00 0.00 Provider: LMStudio, Context: 131072, Output Limit: 32768
lmstudio Qwen3 30B A3B 2507 qwen3-30b-a3b-2507 0.00 0.00 Provider: LMStudio, Context: 262144, Output Limit: 16384
lmstudio Qwen3 Coder 30B qwen3-coder-30b 0.00 0.00 Provider: LMStudio, Context: 262144, Output Limit: 65536
friendli Llama 3.3 70B Instruct meta-llama-3.3-70b-instruct 0.60 0.60 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Llama 3.1 8B Instruct meta-llama-3.1-8b-instruct 0.10 0.10 Provider: Friendli, Context: 131072, Output Limit: 8000
friendli EXAONE 4.0.1 32B exaone-4.0.1-32b 0.60 1.00 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Llama 4 Maverick 17B 128E Instruct llama-4-maverick-17b-128e-instruct - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 30B A3B qwen3-30b-a3b - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.80 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Qwen3 32B qwen3-32b - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 - - Provider: Friendli, Context: 131072, Output Limit: 131072
friendli GLM 4.6 glm-4.6 - - Provider: Friendli, Context: 131072, Output Limit: 131072
friendli DeepSeek R1 0528 deepseek-r1-0528 - - Provider: Friendli, Context: 163840, Output Limit: 163840
sapaicore anthropic--claude-3.5-sonnet anthropic--claude-3.5-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 8192
sapaicore anthropic--claude-4.5-haiku anthropic--claude-4.5-haiku 1.00 5.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore anthropic--claude-4-opus anthropic--claude-4-opus 15.00 75.00 Provider: SAP AI Core, Context: 200000, Output Limit: 32000
sapaicore gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Provider: SAP AI Core, Context: 1048576, Output Limit: 65536
sapaicore anthropic--claude-3-haiku anthropic--claude-3-haiku 0.25 1.25 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore anthropic--claude-3-sonnet anthropic--claude-3-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore gpt-5-nano gpt-5-nano 0.05 0.40 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
sapaicore anthropic--claude-3.7-sonnet anthropic--claude-3.7-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gpt-5-mini gpt-5-mini 0.25 2.00 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
sapaicore anthropic--claude-4.5-sonnet anthropic--claude-4.5-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Provider: SAP AI Core, Context: 1048576, Output Limit: 65536
sapaicore anthropic--claude-3-opus anthropic--claude-3-opus 15.00 75.00 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore anthropic--claude-4-sonnet anthropic--claude-4-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gpt-5 gpt-5 1.25 10.00 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
anthropic Claude Opus 4 (latest) claude-opus-4-0 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Sonnet 3.5 v2 claude-3-5-sonnet-20241022 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Opus 4.1 (latest) claude-opus-4-1 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Haiku 4.5 (latest) claude-haiku-4-5 1.00 5.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 3.5 claude-3-5-sonnet-20240620 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Haiku 3.5 (latest) claude-3-5-haiku-latest 0.80 4.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Opus 4.5 (latest) claude-opus-4-5 5.00 25.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 3 claude-3-opus-20240229 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Opus 4.5 claude-opus-4-5-20251101 5.00 25.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4.5 (latest) claude-sonnet-4-5 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4.5 claude-sonnet-4-5-20250929 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4 claude-sonnet-4-20250514 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 4 claude-opus-4-20250514 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Haiku 3.5 claude-3-5-haiku-20241022 0.80 4.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Haiku 3 claude-3-haiku-20240307 0.25 1.25 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Sonnet 3.7 claude-3-7-sonnet-20250219 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 3.7 (latest) claude-3-7-sonnet-latest 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4 (latest) claude-sonnet-4-0 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 4.1 claude-opus-4-1-20250805 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Sonnet 3 claude-3-sonnet-20240229 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Haiku 4.5 claude-haiku-4-5-20251001 1.00 5.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
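For the Anthropic rows, the Output Limit column matters operationally: max_tokens is a required parameter and must not exceed that limit. A minimal sketch, assuming an ANTHROPIC_API_KEY environment variable; the prices cited in the comments come from the Claude Haiku 3.5 row above:

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1024,  # must stay within the 8192 Output Limit listed above
        messages=[{"role": "user", "content": "Explain AWQ quantization briefly."}],
    )
    print(msg.content[0].text)
    # Token counts for the $0.80 / $4.00 per-1M price columns:
    print(msg.usage.input_tokens, msg.usage.output_tokens)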
aihubmix GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix GLM-4.7 glm-4.7 0.27 1.10 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.28 1.12 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix Claude Opus 4.1 claude-opus-4-1 16.50 82.50 Provider: AIHubMix, Context: 200000, Output Limit: 32000
aihubmix GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix Claude Haiku 4.5 claude-haiku-4-5 1.10 5.50 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: AIHubMix, Context: 200000, Output Limit: 32000
aihubmix Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: AIHubMix, Context: 1000000, Output Limit: 65000
aihubmix Gemini 2.5 Flash gemini-2.5-flash 0.08 0.30 Provider: AIHubMix, Context: 1000000, Output Limit: 65000
aihubmix GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix Claude Sonnet 4.5 claude-sonnet-4-5 3.30 16.50 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Coding GLM-4.7 Free coding-glm-4.7-free 0.00 0.00 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.28 2.80 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix GPT-5.1 gpt-5.1 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-5-Nano gpt-5-nano 0.50 2.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-4o gpt-4o 2.50 10.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-4.1 gpt-4.1 2.00 8.00 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix o4-mini o4-mini 1.50 6.00 Provider: AIHubMix, Context: 200000, Output Limit: 65536
aihubmix GPT-5-Mini gpt-5-mini 1.50 6.00 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Gemini 2.5 Pro gemini-2.5-pro 1.25 5.00 Provider: AIHubMix, Context: 2000000, Output Limit: 65000
aihubmix GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-5.1-Codex-Max gpt-5.1-codex-max 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix MiniMax M2.1 Free minimax-m2.1-free 0.00 0.00 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.82 3.29 Provider: AIHubMix, Context: 262144, Output Limit: 131000
aihubmix DeepSeek-V3.2-Think deepseek-v3.2-think 0.30 0.45 Provider: AIHubMix, Context: 131000, Output Limit: 64000
aihubmix GPT-5 gpt-5 5.00 20.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix MiniMax M2.1 minimax-m2.1 0.29 1.15 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix DeepSeek-V3.2 deepseek-v3.2 0.30 0.45 Provider: AIHubMix, Context: 131000, Output Limit: 64000
aihubmix Kimi K2 0905 kimi-k2-0905 0.55 2.19 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix GPT-5-Pro gpt-5-pro 7.00 28.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-5.2 gpt-5.2 1.75 14.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
fireworksai DeepSeek R1 05/28 deepseek-r1-0528 3.00 8.00 Provider: Fireworks AI, Context: 160000, Output Limit: 16384
fireworksai DeepSeek V3.1 deepseek-v3p1 0.56 1.68 Provider: Fireworks AI, Context: 163840, Output Limit: 163840
fireworksai DeepSeek V3.2 deepseek-v3p2 0.56 1.68 Provider: Fireworks AI, Context: 160000, Output Limit: 160000
fireworksai MiniMax-M2 minimax-m2 0.30 1.20 Provider: Fireworks AI, Context: 192000, Output Limit: 192000
fireworksai MiniMax-M2.1 minimax-m2p1 0.30 1.20 Provider: Fireworks AI, Context: 200000, Output Limit: 200000
fireworksai GLM 4.7 glm-4p7 0.60 2.20 Provider: Fireworks AI, Context: 198000, Output Limit: 198000
fireworksai DeepSeek V3 03-24 deepseek-v3-0324 0.90 0.90 Provider: Fireworks AI, Context: 160000, Output Limit: 16384
fireworksai GLM 4.6 glm-4p6 0.55 2.19 Provider: Fireworks AI, Context: 198000, Output Limit: 198000
fireworksai Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Fireworks AI, Context: 256000, Output Limit: 256000
fireworksai Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Fireworks AI, Context: 128000, Output Limit: 16384
fireworksai Qwen3 235B-A22B qwen3-235b-a22b 0.22 0.88 Provider: Fireworks AI, Context: 128000, Output Limit: 16384
fireworksai GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: Fireworks AI, Context: 131072, Output Limit: 32768
fireworksai GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Fireworks AI, Context: 131072, Output Limit: 32768
fireworksai GLM 4.5 Air glm-4p5-air 0.22 0.88 Provider: Fireworks AI, Context: 131072, Output Limit: 131072
fireworksai Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.45 1.80 Provider: Fireworks AI, Context: 256000, Output Limit: 32768
fireworksai GLM 4.5 glm-4p5 0.55 2.19 Provider: Fireworks AI, Context: 131072, Output Limit: 131072
ionet Kimi K2 Instruct kimi-k2-instruct-0905 0.39 1.90 Provider: IO.NET, Context: 32768, Output Limit: 4096
ionet Kimi K2 Thinking kimi-k2-thinking 0.55 2.25 Provider: IO.NET, Context: 32768, Output Limit: 4096
ionet GPT-OSS 20B gpt-oss-20b 0.03 0.14 Provider: IO.NET, Context: 64000, Output Limit: 4096
ionet GPT-OSS 120B gpt-oss-120b 0.04 0.40 Provider: IO.NET, Context: 131072, Output Limit: 4096
ionet Devstral Small 2505 devstral-small-2505 0.05 0.22 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.02 0.04 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Magistral Small 2506 magistral-small-2506 0.50 1.50 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Mistral Large Instruct 2411 mistral-large-instruct-2411 2.00 6.00 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.13 0.38 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Llama 4 Maverick 17B 128E Instruct llama-4-maverick-17b-128e-instruct-fp8 0.15 0.60 Provider: IO.NET, Context: 430000, Output Limit: 4096
ionet Llama 3.2 90B Vision Instruct llama-3.2-90b-vision-instruct 0.35 0.40 Provider: IO.NET, Context: 16000, Output Limit: 4096
ionet Qwen 3 Coder 480B qwen3-coder-480b-a35b-instruct-int4-mixed-ar 0.22 0.95 Provider: IO.NET, Context: 106000, Output Limit: 4096
ionet Qwen 2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Provider: IO.NET, Context: 32000, Output Limit: 4096
ionet Qwen 3 235B Thinking qwen3-235b-a22b-thinking-2507 0.11 0.60 Provider: IO.NET, Context: 262144, Output Limit: 4096
ionet Qwen 3 Next 80B Instruct qwen3-next-80b-a3b-instruct 0.10 0.80 Provider: IO.NET, Context: 262144, Output Limit: 4096
ionet GLM 4.6 glm-4.6 0.40 1.75 Provider: IO.NET, Context: 200000, Output Limit: 4096
ionet DeepSeek R1 deepseek-r1-0528 2.00 8.75 Provider: IO.NET, Context: 128000, Output Limit: 4096
modelscope GLM-4.5 glm-4.5 0.00 0.00 Provider: ModelScope, Context: 131072, Output Limit: 98304
modelscope GLM-4.6 glm-4.6 0.00 0.00 Provider: ModelScope, Context: 202752, Output Limit: 98304
modelscope Qwen3 30B A3B Thinking 2507 qwen3-30b-a3b-thinking-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 32768
modelscope Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 131072
modelscope Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 65536
modelscope Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 16384
modelscope Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 131072
azurecognitiveservices GPT-3.5 Turbo 1106 gpt-3.5-turbo-1106 1.00 2.00 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices Mistral Small 3.1 mistral-small-2503 0.10 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Codestral 25.01 codestral-2501 0.30 0.90 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 256000
azurecognitiveservices Mistral Large 24.11 mistral-large-2411 2.00 6.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 272000
azurecognitiveservices DeepSeek-V3.2 deepseek-v3.2 0.28 0.42 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices MAI-DS-R1 mai-ds-r1 1.35 5.40 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5 gpt-5 1.25 10.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Phi-4-reasoning-plus phi-4-reasoning-plus 0.13 0.50 Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096
azurecognitiveservices GPT-4 Turbo Vision gpt-4-turbo-vision 10.00 30.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Phi-4-reasoning phi-4-reasoning 0.13 0.50 Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096
azurecognitiveservices Phi-3-medium-instruct (4k) phi-3-medium-4k-instruct 0.17 0.68 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024
azurecognitiveservices Codex Mini codex-mini 1.50 6.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices o3 o3 2.00 8.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Mistral Nemo mistral-nemo 0.15 0.15 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096
azurecognitiveservices Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 1536
azurecognitiveservices Embed v3 English cohere-embed-v3-english 0.10 0.00 Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024
azurecognitiveservices Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices o1-mini o1-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 65536
azurecognitiveservices GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192
azurecognitiveservices o1 o1 15.00 60.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.30 0.61 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices Phi-4-multimodal phi-4-multimodal 0.08 0.32 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices o4-mini o4-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices GPT-4.1 gpt-4.1 2.00 8.00 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
azurecognitiveservices Ministral 3B ministral-3b 0.04 0.04 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-3.5 Turbo 0301 gpt-3.5-turbo-0301 1.50 2.00 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096
azurecognitiveservices GPT-4o gpt-4o 2.50 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Phi-3-mini-instruct (128k) phi-3-mini-128k-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices GPT-5.1 gpt-5.1 1.25 10.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices o3-mini o3-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Model Router model-router 0.14 0.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Azure Cognitive Services, Context: 262144, Output Limit: 262144
azurecognitiveservices GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices o1-preview o1-preview 16.50 66.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices GPT-3.5 Turbo 0613 gpt-3.5-turbo-0613 3.00 4.00 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Phi-3-small-instruct (8k) phi-3-small-8k-instruct 0.15 0.60 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices DeepSeek-V3-0324 deepseek-v3-0324 1.14 4.56 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072
azurecognitiveservices Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 2.68 3.54 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: Azure Cognitive Services, Context: 8191, Output Limit: 3072
azurecognitiveservices Grok 3 grok-3 3.00 15.00 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192
azurecognitiveservices GPT-3.5 Turbo 0125 gpt-3.5-turbo-0125 0.50 1.50 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Phi-4 phi-4 0.13 0.50 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices DeepSeek-V3.1 deepseek-v3.1 0.56 1.68 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072
azurecognitiveservices GPT-5 Chat gpt-5-chat 1.25 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
azurecognitiveservices Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.25 1.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices Command R+ cohere-command-r-plus-08-2024 2.50 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000
azurecognitiveservices Command A cohere-command-a 2.50 10.00 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 8000
azurecognitiveservices Phi-3-small-instruct (128k) phi-3-small-128k-instruct 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices DeepSeek-V3.2-Speciale deepseek-v3.2-speciale 0.28 0.42 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Phi-3-mini-instruct (4k) phi-3-mini-4k-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024
azurecognitiveservices GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 10000
azurecognitiveservices DeepSeek-R1 deepseek-r1 1.35 5.40 Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840
azurecognitiveservices Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices GPT-4 32K gpt-4-32k 60.00 120.00 Provider: Azure Cognitive Services, Context: 32768, Output Limit: 32768
azurecognitiveservices Phi-4-mini phi-4-mini 0.08 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Embed v3 Multilingual cohere-embed-v3-multilingual 0.10 0.00 Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024
azurecognitiveservices Grok 4 grok-4 3.00 15.00 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 64000
azurecognitiveservices Command R cohere-command-r-08-2024 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000
azurecognitiveservices Embed v4 cohere-embed-v-4-0 0.12 0.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 1536
azurecognitiveservices Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 32000
azurecognitiveservices GPT-4 gpt-4 60.00 120.00 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 8192
azurecognitiveservices Phi-3-medium-instruct (128k) phi-3-medium-128k-instruct 0.17 0.68 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Grok 4 Fast (Reasoning) grok-4-fast-reasoning 0.20 0.50 Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000
azurecognitiveservices DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840
azurecognitiveservices Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000
azurecognitiveservices text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: Azure Cognitive Services, Context: 8191, Output Limit: 1536
azurecognitiveservices GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
llama Llama-3.3-8B-Instruct llama-3.3-8b-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-4-Scout-17B-16E-Instruct-FP8 llama-4-scout-17b-16e-instruct-fp8 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Groq-Llama-4-Maverick-17B-128E-Instruct groq-llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Cerebras-Llama-4-Scout-17B-16E-Instruct cerebras-llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Cerebras-Llama-4-Maverick-17B-128E-Instruct cerebras-llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
scaleway Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.75 2.25 Provider: Scaleway, Context: 260000, Output Limit: 8192
scaleway Pixtral 12B 2409 pixtral-12b-2409 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 4096
scaleway Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 16384
scaleway Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Mistral Small 3.2 24B Instruct (2506) mistral-small-3.2-24b-instruct-2506 0.15 0.35 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.20 0.80 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.90 0.90 Provider: Scaleway, Context: 100000, Output Limit: 4096
scaleway Whisper Large v3 whisper-large-v3 0.00 0.00 Provider: Scaleway, Context: N/A, Output Limit: 4096
scaleway DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.90 0.90 Provider: Scaleway, Context: 32000, Output Limit: 4096
scaleway Voxtral Small 24B 2507 voxtral-small-24b-2507 0.15 0.35 Provider: Scaleway, Context: 32000, Output Limit: 8192
scaleway GPT-OSS 120B gpt-oss-120b 0.15 0.60 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway BGE Multilingual Gemma2 bge-multilingual-gemma2 0.13 0.00 Provider: Scaleway, Context: 8191, Output Limit: 3072
scaleway Gemma-3-27B-IT gemma-3-27b-it 0.25 0.50 Provider: Scaleway, Context: 40000, Output Limit: 8192
amazonbedrock Command R+ cohere.command-r-plus-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude 2 anthropic.claude-v2 8.00 24.00 Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
amazonbedrock Claude Sonnet 3.7 anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Claude Sonnet 4 anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Qwen3 Coder 30B A3B Instruct qwen.qwen3-coder-30b-a3b-v1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
amazonbedrock Gemma 3 4B IT google.gemma-3-4b-it 0.04 0.08 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock MiniMax M2 minimax.minimax-m2 0.30 1.20 Provider: Amazon Bedrock, Context: 204608, Output Limit: 128000
amazonbedrock Llama 3.2 11B Instruct meta.llama3-2-11b-instruct-v1:0 0.16 0.16 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 Next 80B A3B Instruct qwen.qwen3-next-80b-a3b 0.14 1.40 Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
amazonbedrock Claude Haiku 3 anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock Llama 3.2 90B Instruct meta.llama3-2-90b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 VL 235B A22B Instruct qwen.qwen3-vl-235b-a22b 0.30 1.50 Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
amazonbedrock Llama 3.2 1B Instruct meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
amazonbedrock Claude 2.1 anthropic.claude-v2:1 8.00 24.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock DeepSeek-V3.1 deepseek.v3-v1:0 0.58 1.68 Provider: Amazon Bedrock, Context: 163840, Output Limit: 81920
amazonbedrock Claude Opus 4.5 anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Command Light cohere.command-light-text-v14 0.30 0.60 Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
amazonbedrock Mistral Large (24.02) mistral.mistral-large-2402-v1:0 0.50 1.50 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Google Gemma 3 27B Instruct google.gemma-3-27b-it 0.12 0.20 Provider: Amazon Bedrock, Context: 202752, Output Limit: 8192
amazonbedrock NVIDIA Nemotron Nano 12B v2 VL BF16 nvidia.nemotron-nano-12b-v2 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Google Gemma 3 12B google.gemma-3-12b-it 0.05 0.10 Provider: Amazon Bedrock, Context: 131072, Output Limit: 8192
amazonbedrock Jamba 1.5 Large ai21.jamba-1-5-large-v1:0 2.00 8.00 Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
amazonbedrock Llama 3.3 70B Instruct meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Opus 3 anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock Nova Pro amazon.nova-pro-v1:0 0.80 3.20 Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
amazonbedrock Llama 3.1 8B Instruct meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock gpt-oss-120b openai.gpt-oss-120b-1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 32B (dense) qwen.qwen3-32b-v1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 16384, Output Limit: 16384
amazonbedrock Claude Sonnet 3.5 anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Claude Haiku 4.5 anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Command R cohere.command-r-v1:0 0.50 1.50 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Voxtral Small 24B 2507 mistral.voxtral-small-24b-2507 0.15 0.35 Provider: Amazon Bedrock, Context: 32000, Output Limit: 8192
amazonbedrock Nova Micro amazon.nova-micro-v1:0 0.04 0.14 Provider: Amazon Bedrock, Context: 128000, Output Limit: 8192
amazonbedrock Llama 3.1 70B Instruct meta.llama3-1-70b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 3 70B Instruct meta.llama3-70b-instruct-v1:0 2.65 3.50 Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
amazonbedrock DeepSeek-R1 deepseek.r1-v1:0 1.35 5.40 Provider: Amazon Bedrock, Context: 128000, Output Limit: 32768
amazonbedrock Claude Sonnet 3.5 v2 anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Ministral 3 8B mistral.ministral-3-8b-instruct 0.15 0.15 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Command cohere.command-text-v14 1.50 2.00 Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
amazonbedrock Claude Opus 4 anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
amazonbedrock Voxtral Mini 3B 2507 mistral.voxtral-mini-3b-2507 0.04 0.04 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Opus 4.5 (Global) global.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Nova 2 Lite amazon.nova-2-lite-v1:0 0.33 2.75 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 Coder 480B A35B Instruct qwen.qwen3-coder-480b-a35b-v1:0 0.22 1.80 Provider: Amazon Bedrock, Context: 131072, Output Limit: 65536
amazonbedrock Claude Sonnet 4.5 anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock GPT OSS Safeguard 20B openai.gpt-oss-safeguard-20b 0.07 0.20 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock gpt-oss-20b openai.gpt-oss-20b-1:0 0.07 0.30 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 3.2 3B Instruct meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
amazonbedrock Claude Instant anthropic.claude-instant-v1 0.80 2.40 Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
amazonbedrock Nova Premier amazon.nova-premier-v1:0 2.50 12.50 Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
amazonbedrock Mistral-7B-Instruct-v0.2 mistral.mistral-7b-instruct-v0:2 0.11 0.11 Provider: Amazon Bedrock, Context: 127000, Output Limit: 127000
amazonbedrock Mixtral-8x7B-Instruct-v0.1 mistral.mixtral-8x7b-instruct-v0:1 0.70 0.70 Provider: Amazon Bedrock, Context: 32000, Output Limit: 32000
amazonbedrock Claude Opus 4.1 anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
amazonbedrock Llama 4 Scout 17B Instruct meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Provider: Amazon Bedrock, Context: 3500000, Output Limit: 16384
amazonbedrock Jamba 1.5 Mini ai21.jamba-1-5-mini-v1:0 0.20 0.40 Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
amazonbedrock Llama 3 8B Instruct meta.llama3-8b-instruct-v1:0 0.30 0.60 Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
amazonbedrock Titan Text G1 - Express amazon.titan-text-express-v1:0:8k 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Sonnet 3 anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock NVIDIA Nemotron Nano 9B v2 nvidia.nemotron-nano-9b-v2 0.06 0.23 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Titan Text G1 - Express amazon.titan-text-express-v1 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 4 Maverick 17B Instruct meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
amazonbedrock Ministral 3 14B mistral.ministral-3-14b-instruct 0.20 0.20 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock GPT OSS Safeguard 120B openai.gpt-oss-safeguard-120b 0.15 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 235B A22B 2507 qwen.qwen3-235b-a22b-2507-v1:0 0.22 0.88 Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
amazonbedrock Nova Lite amazon.nova-lite-v1:0 0.06 0.24 Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
amazonbedrock Claude Haiku 3.5 anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Kimi K2 Thinking moonshot.kimi-k2-thinking 0.60 2.50 Provider: Amazon Bedrock, Context: 256000, Output Limit: 256000
cerebras Qwen 3 235B Instruct qwen-3-235b-a22b-instruct-2507 0.60 1.20 Provider: Cerebras, Context: 131000, Output Limit: 32000
cerebras zai-glm-4.6 zai-glm-4.6 2.25 2.75 Source: cerebras, Context: 128000
cerebras GPT OSS 120B gpt-oss-120b 0.25 0.69 Provider: Cerebras, Context: 131072, Output Limit: 32768
bedrock amazon.nova-canvas-v1:0 amazon.nova-canvas-v1:0 0.00 0.00 Source: bedrock, Context: 2600
bedrock stability.stable-diffusion-xl-v1 stability.stable-diffusion-xl-v1 0.00 0.00 Source: bedrock, Context: 77
openai dall-e-2 dall-e-2 0.00 0.00 Source: openai, Context: N/A
bedrock stability.stable-diffusion-xl-v0 stability.stable-diffusion-xl-v0 0.00 0.00 Source: bedrock, Context: 77
bedrock ai21.j2-mid-v1 ai21.j2-mid-v1 12.50 12.50 Source: bedrock, Context: 8191
bedrock ai21.j2-ultra-v1 ai21.j2-ultra-v1 18.80 18.80 Source: bedrock, Context: 8191
bedrock ai21.jamba-1-5-large-v1:0 ai21.jamba-1-5-large-v1:0 2.00 8.00 Source: bedrock, Context: 256000
bedrock ai21.jamba-1-5-mini-v1:0 ai21.jamba-1-5-mini-v1:0 0.20 0.40 Source: bedrock, Context: 256000
bedrock ai21.jamba-instruct-v1:0 ai21.jamba-instruct-v1:0 0.50 0.70 Source: bedrock, Context: 70000
aiml dall-e-2 dall-e-2 0.00 0.00 Source: aiml, Context: N/A
aiml dall-e-3 dall-e-3 0.00 0.00 Source: aiml, Context: N/A
aiml flux-pro flux-pro 0.00 0.00 Source: aiml, Context: N/A
aiml v1.1 v1.1 0.00 0.00 Source: aiml, Context: N/A
aiml v1.1-ultra v1.1-ultra 0.00 0.00 Source: aiml, Context: N/A
aiml flux-realism flux-realism 0.00 0.00 Source: aiml, Context: N/A
aiml dev dev 0.00 0.00 Source: aiml, Context: N/A
aiml text-to-image text-to-image 0.00 0.00 Source: aiml, Context: N/A
aiml schnell schnell 0.00 0.00 Source: aiml, Context: N/A
aiml imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: aiml, Context: N/A
aiml nano-banana-pro nano-banana-pro 0.00 0.00 Source: aiml, Context: N/A
bedrockconverse us.writer.palmyra-x4-v1:0 us.writer.palmyra-x4-v1:0 2.50 10.00 Source: bedrock_converse, Context: 128000
bedrockconverse us.writer.palmyra-x5-v1:0 us.writer.palmyra-x5-v1:0 0.60 6.00 Source: bedrock_converse, Context: 1000000
bedrockconverse writer.palmyra-x4-v1:0 writer.palmyra-x4-v1:0 2.50 10.00 Source: bedrock_converse, Context: 128000
bedrockconverse writer.palmyra-x5-v1:0 writer.palmyra-x5-v1:0 0.60 6.00 Source: bedrock_converse, Context: 1000000
bedrockconverse amazon.nova-lite-v1:0 amazon.nova-lite-v1:0 0.06 0.24 Source: bedrock_converse, Context: 300000
bedrockconverse amazon.nova-2-lite-v1:0 amazon.nova-2-lite-v1:0 0.30 2.50 Source: bedrock_converse, Context: 1000000
bedrockconverse apac.amazon.nova-2-lite-v1:0 apac.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse eu.amazon.nova-2-lite-v1:0 eu.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse us.amazon.nova-2-lite-v1:0 us.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse amazon.nova-micro-v1:0 amazon.nova-micro-v1:0 0.04 0.14 Source: bedrock_converse, Context: 128000
bedrockconverse amazon.nova-pro-v1:0 amazon.nova-pro-v1:0 0.80 3.20 Source: bedrock_converse, Context: 300000
bedrock amazon.rerank-v1:0 amazon.rerank-v1:0 0.00 0.00 Source: bedrock, Context: 32000
bedrock amazon.titan-embed-image-v1 amazon.titan-embed-image-v1 0.80 0.00 Source: bedrock, Context: 128
bedrock amazon.titan-embed-text-v1 amazon.titan-embed-text-v1 0.10 0.00 Source: bedrock, Context: 8192
bedrock amazon.titan-embed-text-v2:0 amazon.titan-embed-text-v2:0 0.20 0.00 Source: bedrock, Context: 8192
bedrock amazon.titan-image-generator-v1 amazon.titan-image-generator-v1 0.00 0.00 Source: bedrock, Context: N/A
bedrock amazon.titan-image-generator-v2 amazon.titan-image-generator-v2 0.00 0.00 Source: bedrock, Context: N/A
bedrock amazon.titan-image-generator-v2:0 amazon.titan-image-generator-v2:0 0.00 0.00 Source: bedrock, Context: N/A
bedrock twelvelabs.marengo-embed-2-7-v1:0 twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock us.twelvelabs.marengo-embed-2-7-v1:0 us.twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock eu.twelvelabs.marengo-embed-2-7-v1:0 eu.twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock twelvelabs.pegasus-1-2-v1:0 twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock us.twelvelabs.pegasus-1-2-v1:0 us.twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock eu.twelvelabs.pegasus-1-2-v1:0 eu.twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock amazon.titan-text-express-v1 amazon.titan-text-express-v1 1.30 1.70 Source: bedrock, Context: 42000
bedrock amazon.titan-text-lite-v1 amazon.titan-text-lite-v1 0.30 0.40 Source: bedrock, Context: 42000
bedrock amazon.titan-text-premier-v1:0 amazon.titan-text-premier-v1:0 0.50 1.50 Source: bedrock, Context: 42000
bedrock anthropic.claude-3-5-haiku-20241022-v1:0 anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Source: bedrock, Context: 200000
bedrockconverse anthropic.claude-haiku-4-5-20251001-v1:0 anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-haiku-4-5@20251001 anthropic.claude-haiku-4-5@20251001 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-3-5-sonnet-20240620-v1:0 anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-5-sonnet-20241022-v2:0 anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-7-sonnet-20240620-v1:0 anthropic.claude-3-7-sonnet-20240620-v1:0 3.60 18.00 Source: bedrock, Context: 200000
bedrockconverse anthropic.claude-3-7-sonnet-20250219-v1:0 anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-3-haiku-20240307-v1:0 anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-opus-20240229-v1:0 anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-sonnet-20240229-v1:0 anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-instant-v1 anthropic.claude-instant-v1 0.80 2.40 Source: bedrock, Context: 100000
bedrockconverse anthropic.claude-opus-4-1-20250805-v1:0 anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-opus-4-20250514-v1:0 anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-opus-4-5-20251101-v1:0 anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-sonnet-4-20250514-v1:0 anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse anthropic.claude-sonnet-4-5-20250929-v1:0 anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-v1 anthropic.claude-v1 8.00 24.00 Source: bedrock, Context: 100000
bedrock anthropic.claude-v2:1 anthropic.claude-v2:1 8.00 24.00 Source: bedrock, Context: 100000
anyscale zephyr-7b-beta zephyr-7b-beta 0.15 0.15 Source: anyscale, Context: 16384
anyscale CodeLlama-34b-Instruct-hf codellama-34b-instruct-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale CodeLlama-70b-Instruct-hf codellama-70b-instruct-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale gemma-7b-it gemma-7b-it 0.15 0.15 Source: anyscale, Context: 8192
anyscale Llama-2-13b-chat-hf llama-2-13b-chat-hf 0.25 0.25 Source: anyscale, Context: 4096
anyscale Llama-2-70b-chat-hf llama-2-70b-chat-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale Llama-2-7b-chat-hf llama-2-7b-chat-hf 0.15 0.15 Source: anyscale, Context: 4096
anyscale Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 1.00 1.00 Source: anyscale, Context: 8192
anyscale Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.15 0.15 Source: anyscale, Context: 8192
anyscale Mistral-7B-Instruct-v0.1 mistral-7b-instruct-v0.1 0.15 0.15 Source: anyscale, Context: 16384
anyscale Mixtral-8x22B-Instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.90 0.90 Source: anyscale, Context: 65536
anyscale Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.15 0.15 Source: anyscale, Context: 16384
bedrockconverse apac.amazon.nova-lite-v1:0 apac.amazon.nova-lite-v1:0 0.06 0.25 Source: bedrock_converse, Context: 300000
bedrockconverse apac.amazon.nova-micro-v1:0 apac.amazon.nova-micro-v1:0 0.04 0.15 Source: bedrock_converse, Context: 128000
bedrockconverse apac.amazon.nova-pro-v1:0 apac.amazon.nova-pro-v1:0 0.84 3.36 Source: bedrock_converse, Context: 300000
bedrock apac.anthropic.claude-3-5-sonnet-20240620-v1:0 apac.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock apac.anthropic.claude-3-5-sonnet-20241022-v2:0 apac.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock apac.anthropic.claude-3-haiku-20240307-v1:0 apac.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrockconverse apac.anthropic.claude-haiku-4-5-20251001-v1:0 apac.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock apac.anthropic.claude-3-sonnet-20240229-v1:0 apac.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse apac.anthropic.claude-sonnet-4-20250514-v1:0 apac.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
assemblyai best best 0.00 0.00 Source: assemblyai, Context: N/A
assemblyai nano nano 0.00 0.00 Source: assemblyai, Context: N/A
bedrockconverse au.anthropic.claude-sonnet-4-5-20250929-v1:0 au.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
azure ada ada 0.10 0.00 Source: azure, Context: 8191
azure command-r-plus command-r-plus 3.00 15.00 Source: azure, Context: 128000
azureai claude-haiku-4-5 claude-haiku-4-5 1.00 5.00 Source: azure_ai, Context: 200000
azureai claude-opus-4-1 claude-opus-4-1 15.00 75.00 Source: azure_ai, Context: 200000
azureai claude-sonnet-4-5 claude-sonnet-4-5 3.00 15.00 Source: azure_ai, Context: 200000
azure computer-use-preview computer-use-preview 3.00 12.00 Source: azure, Context: 8192
azure container container 0.00 0.00 Source: azure, Context: N/A
azureai gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: azure_ai, Context: 131072
azure gpt-4o-2024-08-06 gpt-4o-2024-08-06 2.75 11.00 Source: azure, Context: 128000
azure gpt-4o-2024-11-20 gpt-4o-2024-11-20 2.75 11.00 Source: azure, Context: 128000
azure gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.17 0.66 Source: azure, Context: 128000
azure gpt-4o-mini-realtime-preview-2024-12-17 gpt-4o-mini-realtime-preview-2024-12-17 0.66 2.64 Source: azure, Context: 128000
azure gpt-4o-realtime-preview-2024-10-01 gpt-4o-realtime-preview-2024-10-01 5.50 22.00 Source: azure, Context: 128000
azure gpt-4o-realtime-preview-2024-12-17 gpt-4o-realtime-preview-2024-12-17 5.50 22.00 Source: azure, Context: 128000
azure gpt-5-2025-08-07 gpt-5-2025-08-07 1.38 11.00 Source: azure, Context: 272000
azure gpt-5-mini-2025-08-07 gpt-5-mini-2025-08-07 0.28 2.20 Source: azure, Context: 272000
azure gpt-5-nano-2025-08-07 gpt-5-nano-2025-08-07 0.06 0.44 Source: azure, Context: 272000
azure o1-2024-12-17 o1-2024-12-17 16.50 66.00 Source: azure, Context: 200000
azure o1-mini-2024-09-12 o1-mini-2024-09-12 1.21 4.84 Source: azure, Context: 128000
azure o1-preview-2024-09-12 o1-preview-2024-09-12 16.50 66.00 Source: azure, Context: 128000
azure o3-mini-2025-01-31 o3-mini-2025-01-31 1.21 4.84 Source: azure, Context: 200000
azure gpt-3.5-turbo gpt-3.5-turbo 0.50 1.50 Source: azure, Context: 4097
azuretext gpt-3.5-turbo-instruct-0914 gpt-3.5-turbo-instruct-0914 1.50 2.00 Source: azure_text, Context: 4097
azure gpt-35-turbo gpt-35-turbo 0.50 1.50 Source: azure, Context: 4097
azure gpt-35-turbo-0125 gpt-35-turbo-0125 0.50 1.50 Source: azure, Context: 16384
azure gpt-35-turbo-0301 gpt-35-turbo-0301 0.20 2.00 Source: azure, Context: 4097
azure gpt-35-turbo-0613 gpt-35-turbo-0613 1.50 2.00 Source: azure, Context: 4097
azure gpt-35-turbo-1106 gpt-35-turbo-1106 1.00 2.00 Source: azure, Context: 16384
azure gpt-35-turbo-16k gpt-35-turbo-16k 3.00 4.00 Source: azure, Context: 16385
azure gpt-35-turbo-16k-0613 gpt-35-turbo-16k-0613 3.00 4.00 Source: azure, Context: 16385
azuretext gpt-35-turbo-instruct gpt-35-turbo-instruct 1.50 2.00 Source: azure_text, Context: 4097
azuretext gpt-35-turbo-instruct-0914 gpt-35-turbo-instruct-0914 1.50 2.00 Source: azure_text, Context: 4097
azure gpt-4-0125-preview gpt-4-0125-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-0613 gpt-4-0613 30.00 60.00 Source: azure, Context: 8192
azure gpt-4-1106-preview gpt-4-1106-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-32k-0613 gpt-4-32k-0613 60.00 120.00 Source: azure, Context: 32768
azure gpt-4-turbo-2024-04-09 gpt-4-turbo-2024-04-09 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-turbo-vision-preview gpt-4-turbo-vision-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 2.00 8.00 Source: azure, Context: 1047576
azure gpt-4.1-mini-2025-04-14 gpt-4.1-mini-2025-04-14 0.40 1.60 Source: azure, Context: 1047576
azure gpt-4.1-nano-2025-04-14 gpt-4.1-nano-2025-04-14 0.10 0.40 Source: azure, Context: 1047576
azure gpt-4.5-preview gpt-4.5-preview 75.00 150.00 Source: azure, Context: 128000
azure gpt-4o-2024-05-13 gpt-4o-2024-05-13 5.00 15.00 Source: azure, Context: 128000
azure gpt-audio-2025-08-28 gpt-audio-2025-08-28 2.50 10.00 Source: azure, Context: 128000
azure gpt-audio-mini-2025-10-06 gpt-audio-mini-2025-10-06 0.60 2.40 Source: azure, Context: 128000
azure gpt-4o-audio-preview-2024-12-17 gpt-4o-audio-preview-2024-12-17 2.50 10.00 Source: azure, Context: 128000
azure gpt-4o-mini-audio-preview-2024-12-17 gpt-4o-mini-audio-preview-2024-12-17 2.50 10.00 Source: azure, Context: 128000
azure gpt-realtime-2025-08-28 gpt-realtime-2025-08-28 4.00 16.00 Source: azure, Context: 32000
azure gpt-realtime-mini-2025-10-06 gpt-realtime-mini-2025-10-06 0.60 2.40 Source: azure, Context: 32000
azure gpt-4o-mini-transcribe gpt-4o-mini-transcribe 1.25 5.00 Source: azure, Context: 16000
azure gpt-4o-mini-tts gpt-4o-mini-tts 2.50 10.00 Source: azure, Context: N/A
azure gpt-4o-transcribe gpt-4o-transcribe 2.50 10.00 Source: azure, Context: 16000
azure gpt-4o-transcribe-diarize gpt-4o-transcribe-diarize 2.50 10.00 Source: azure, Context: 16000
azure gpt-5.1-2025-11-13 gpt-5.1-2025-11-13 1.25 10.00 Source: azure, Context: 272000
azure gpt-5.1-chat-2025-11-13 gpt-5.1-chat-2025-11-13 1.25 10.00 Source: azure, Context: 128000
azure gpt-5.1-codex-2025-11-13 gpt-5.1-codex-2025-11-13 1.25 10.00 Source: azure, Context: 272000
azure gpt-5.1-codex-mini-2025-11-13 gpt-5.1-codex-mini-2025-11-13 0.25 2.00 Source: azure, Context: 272000
azure gpt-5-chat-latest gpt-5-chat-latest 1.25 10.00 Source: azure, Context: 128000
azure gpt-5.2-2025-12-11 gpt-5.2-2025-12-11 1.75 14.00 Source: azure, Context: 400000
azure gpt-5.2-chat-2025-12-11 gpt-5.2-chat-2025-12-11 1.75 14.00 Source: azure, Context: 128000
azure gpt-5.2-pro gpt-5.2-pro 21.00 168.00 Source: azure, Context: 400000
azure gpt-5.2-pro-2025-12-11 gpt-5.2-pro-2025-12-11 21.00 168.00 Source: azure, Context: 400000
azure gpt-image-1 gpt-image-1 5.00 0.00 Source: azure, Context: N/A
azure dall-e-3 dall-e-3 0.00 0.00 Source: azure, Context: N/A
azure gpt-image-1-mini gpt-image-1-mini 2.00 0.00 Source: azure, Context: N/A
azure gpt-image-1.5 gpt-image-1.5 5.00 0.00 Source: azure, Context: N/A
azure gpt-image-1.5-2025-12-16 gpt-image-1.5-2025-12-16 5.00 0.00 Source: azure, Context: N/A
azure mistral-large-2402 mistral-large-2402 8.00 24.00 Source: azure, Context: 32000
azure mistral-large-latest mistral-large-latest 8.00 24.00 Source: azure, Context: 32000
azure o3-2025-04-16 o3-2025-04-16 2.00 8.00 Source: azure, Context: 200000
azure o3-deep-research o3-deep-research 10.00 40.00 Source: azure, Context: 200000
azure o3-pro o3-pro 20.00 80.00 Source: azure, Context: 200000
azure o3-pro-2025-06-10 o3-pro-2025-06-10 20.00 80.00 Source: azure, Context: 200000
azure o4-mini-2025-04-16 o4-mini-2025-04-16 1.10 4.40 Source: azure, Context: 200000
azure dall-e-2 dall-e-2 0.00 0.00 Source: azure, Context: N/A
azure azure-tts azure-tts 0.00 0.00 Source: azure, Context: N/A
azure azure-tts-hd azure-tts-hd 0.00 0.00 Source: azure, Context: N/A
azure tts-1 tts-1 0.00 0.00 Source: azure, Context: N/A
azure tts-1-hd tts-1-hd 0.00 0.00 Source: azure, Context: N/A
azure whisper-1 whisper-1 0.00 0.00 Source: azure, Context: N/A
azureai Cohere-embed-v3-english cohere-embed-v3-english 0.10 0.00 Source: azure_ai, Context: 512
azureai Cohere-embed-v3-multilingual cohere-embed-v3-multilingual 0.10 0.00 Source: azure_ai, Context: 512
azureai FLUX-1.1-pro flux-1.1-pro 0.00 0.00 Source: azure_ai, Context: N/A
azureai FLUX.1-Kontext-pro flux.1-kontext-pro 0.00 0.00 Source: azure_ai, Context: N/A
azureai Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Source: azure_ai, Context: 128000
azureai Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Source: azure_ai, Context: 128000
azureai Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Source: azure_ai, Context: 128000
azureai Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 1.41 0.35 Source: azure_ai, Context: 1000000
azureai Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Source: azure_ai, Context: 10000000
azureai Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 1.10 0.37 Source: azure_ai, Context: 8192
azureai Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Source: azure_ai, Context: 128000
azureai Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Source: azure_ai, Context: 128000
azureai Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Source: azure_ai, Context: 128000
azureai Phi-3-medium-128k-instruct phi-3-medium-128k-instruct 0.17 0.68 Source: azure_ai, Context: 128000
azureai Phi-3-medium-4k-instruct phi-3-medium-4k-instruct 0.17 0.68 Source: azure_ai, Context: 4096
azureai Phi-3-mini-128k-instruct phi-3-mini-128k-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-3-mini-4k-instruct phi-3-mini-4k-instruct 0.13 0.52 Source: azure_ai, Context: 4096
azureai Phi-3-small-128k-instruct phi-3-small-128k-instruct 0.15 0.60 Source: azure_ai, Context: 128000
azureai Phi-3-small-8k-instruct phi-3-small-8k-instruct 0.15 0.60 Source: azure_ai, Context: 8192
azureai Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Source: azure_ai, Context: 128000
azureai Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-3.5-vision-instruct phi-3.5-vision-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-4 phi-4 0.13 0.50 Source: azure_ai, Context: 16384
azureai Phi-4-mini-instruct phi-4-mini-instruct 0.08 0.30 Source: azure_ai, Context: 131072
azureai Phi-4-multimodal-instruct phi-4-multimodal-instruct 0.08 0.32 Source: azure_ai, Context: 131072
azureai Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.32 Source: azure_ai, Context: 131072
azureai Phi-4-reasoning phi-4-reasoning 0.13 0.50 Source: azure_ai, Context: 32768
azureai mistral-document-ai-2505 mistral-document-ai-2505 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-read prebuilt-read 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-layout prebuilt-layout 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-document prebuilt-document 0.00 0.00 Source: azure_ai, Context: N/A
azureai MAI-DS-R1 mai-ds-r1 1.35 5.40 Source: azure_ai, Context: 128000
azureai cohere-rerank-v3-english cohere-rerank-v3-english 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v3-multilingual cohere-rerank-v3-multilingual 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v3.5 cohere-rerank-v3.5 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v4.0-pro cohere-rerank-v4.0-pro 0.00 0.00 Source: azure_ai, Context: 32768
azureai cohere-rerank-v4.0-fast cohere-rerank-v4.0-fast 0.00 0.00 Source: azure_ai, Context: 32768
azureai deepseek-v3.2 deepseek-v3.2 0.58 1.68 Source: azure_ai, Context: 163840
azureai deepseek-v3.2-speciale deepseek-v3.2-speciale 0.58 1.68 Source: azure_ai, Context: 163840
azureai deepseek-r1 deepseek-r1 1.35 5.40 Source: azure_ai, Context: 128000
azureai deepseek-v3 deepseek-v3 1.14 4.56 Source: azure_ai, Context: 128000
azureai deepseek-v3-0324 deepseek-v3-0324 1.14 4.56 Source: azure_ai, Context: 128000
azureai embed-v-4-0 embed-v-4-0 0.12 0.00 Source: azure_ai, Context: 128000
azureai grok-3 grok-3 3.00 15.00 Source: azure_ai, Context: 131072
azureai grok-3-mini grok-3-mini 0.25 1.27 Source: azure_ai, Context: 131072
azureai grok-4 grok-4 5.50 27.50 Source: azure_ai, Context: 131072
azureai grok-4-fast-non-reasoning grok-4-fast-non-reasoning 0.43 1.73 Source: azure_ai, Context: 131072
azureai grok-4-fast-reasoning grok-4-fast-reasoning 0.43 1.73 Source: azure_ai, Context: 131072
azureai grok-code-fast-1 grok-code-fast-1 3.50 17.50 Source: azure_ai, Context: 131072
azureai jais-30b-chat jais-30b-chat 3200.00 9710.00 Source: azure_ai, Context: 8192
azureai jamba-instruct jamba-instruct 0.50 0.70 Source: azure_ai, Context: 70000
azureai ministral-3b ministral-3b 0.04 0.04 Source: azure_ai, Context: 128000
azureai mistral-large mistral-large 4.00 12.00 Source: azure_ai, Context: 32000
azureai mistral-large-2407 mistral-large-2407 2.00 6.00 Source: azure_ai, Context: 128000
azureai mistral-large-latest mistral-large-latest 2.00 6.00 Source: azure_ai, Context: 128000
azureai mistral-large-3 mistral-large-3 0.50 1.50 Source: azure_ai, Context: 256000
azureai mistral-medium-2505 mistral-medium-2505 0.40 2.00 Source: azure_ai, Context: 131072
azureai mistral-nemo mistral-nemo 0.15 0.15 Source: azure_ai, Context: 131072
azureai mistral-small mistral-small 1.00 3.00 Source: azure_ai, Context: 32000
azureai mistral-small-2503 mistral-small-2503 1.00 3.00 Source: azure_ai, Context: 128000
textcompletionopenai babbage-002 babbage-002 0.40 0.40 Source: text-completion-openai, Context: 16384
bedrock cohere.command-light-text-v14 cohere.command-light-text-v14 0.30 0.60 Source: bedrock, Context: 4096
bedrock cohere.command-text-v14 cohere.command-text-v14 1.50 2.00 Source: bedrock, Context: 4096
bedrock meta.llama3-70b-instruct-v1:0 meta.llama3-70b-instruct-v1:0 3.18 4.20 Source: bedrock, Context: 8192
bedrock meta.llama3-8b-instruct-v1:0 meta.llama3-8b-instruct-v1:0 0.36 0.72 Source: bedrock, Context: 8192
bedrock mistral.mistral-7b-instruct-v0:2 mistral.mistral-7b-instruct-v0:2 0.20 0.26 Source: bedrock, Context: 32000
bedrock mistral.mistral-large-2402-v1:0 mistral.mistral-large-2402-v1:0 10.40 31.20 Source: bedrock, Context: 32000
bedrock mistral.mixtral-8x7b-instruct-v0:1 mistral.mixtral-8x7b-instruct-v0:1 0.59 0.91 Source: bedrock, Context: 32000
bedrock amazon.nova-pro-v1:0 amazon.nova-pro-v1:0 0.96 3.84 Source: bedrock, Context: 300000
bedrock claude-sonnet-4-5-20250929-v1:0 claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-7-sonnet-20250219-v1:0 anthropic.claude-3-7-sonnet-20250219-v1:0 3.60 18.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-5-haiku-20241022-v1:0 us.anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Source: bedrock, Context: 200000
cerebras llama-3.3-70b llama-3.3-70b 0.85 1.20 Source: cerebras, Context: 128000
cerebras llama3.1-70b llama3.1-70b 0.60 0.60 Source: cerebras, Context: 128000
cerebras llama3.1-8b llama3.1-8b 0.10 0.10 Source: cerebras, Context: 128000
cerebras qwen-3-32b qwen-3-32b 0.40 0.80 Source: cerebras, Context: 128000
vertex chat-bison chat-bison 0.13 0.13 Source: vertex, Context: 8192
vertex chat-bison-32k chat-bison-32k 0.13 0.13 Source: vertex, Context: 32000
vertex chat-bison-32k@002 chat-bison-32k@002 0.13 0.13 Source: vertex, Context: 32000
vertex chat-bison@001 chat-bison@001 0.13 0.13 Source: vertex, Context: 8192
vertex chat-bison@002 chat-bison@002 0.13 0.13 Source: vertex, Context: 8192
nlpcloud chatdolphin chatdolphin 0.50 0.50 Source: nlp_cloud, Context: 16384
openai chatgpt-4o-latest chatgpt-4o-latest 5.00 15.00 Source: openai, Context: 128000
openai gpt-4o-transcribe-diarize gpt-4o-transcribe-diarize 2.50 10.00 Source: openai, Context: 16000
anthropic claude-3-5-sonnet-latest claude-3-5-sonnet-latest 3.00 15.00 Source: anthropic, Context: 200000
anthropic claude-3-opus-latest claude-3-opus-latest 15.00 75.00 Source: anthropic, Context: 200000
anthropic claude-4-opus-20250514 claude-4-opus-20250514 15.00 75.00 Source: anthropic, Context: 200000
anthropic claude-4-sonnet-20250514 claude-4-sonnet-20250514 3.00 15.00 Source: anthropic, Context: 1000000
cloudflare llama-2-7b-chat-fp16 llama-2-7b-chat-fp16 1.92 1.92 Source: cloudflare, Context: 3072
cloudflare llama-2-7b-chat-int8 llama-2-7b-chat-int8 1.92 1.92 Source: cloudflare, Context: 2048
cloudflare mistral-7b-instruct-v0.1 mistral-7b-instruct-v0.1 1.92 1.92 Source: cloudflare, Context: 8192
cloudflare codellama-7b-instruct-awq codellama-7b-instruct-awq 1.92 1.92 Source: cloudflare, Context: 4096
vertex code-bison code-bison 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison-32k@002 code-bison-32k@002 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison32k code-bison32k 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison@001 code-bison@001 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison@002 code-bison@002 0.13 0.13 Source: vertex, Context: 6144
vertex code-gecko code-gecko 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko-latest code-gecko-latest 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko@001 code-gecko@001 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko@002 code-gecko@002 0.13 0.13 Source: vertex, Context: 2048
vertex codechat-bison codechat-bison 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison-32k codechat-bison-32k 0.13 0.13 Source: vertex, Context: 32000
vertex codechat-bison-32k@002 codechat-bison-32k@002 0.13 0.13 Source: vertex, Context: 32000
vertex codechat-bison@001 codechat-bison@001 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison@002 codechat-bison@002 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison@latest codechat-bison@latest 0.13 0.13 Source: vertex, Context: 6144
codestral codestral-2405 codestral-2405 0.00 0.00 Source: codestral, Context: 32000
codestral codestral-latest codestral-latest 0.00 0.00 Source: codestral, Context: 32000
bedrock cohere.command-r-plus-v1:0 cohere.command-r-plus-v1:0 3.00 15.00 Source: bedrock, Context: 128000
bedrock cohere.command-r-v1:0 cohere.command-r-v1:0 0.50 1.50 Source: bedrock, Context: 128000
bedrock cohere.embed-english-v3 cohere.embed-english-v3 0.10 0.00 Source: bedrock, Context: 512
bedrock cohere.embed-multilingual-v3 cohere.embed-multilingual-v3 0.10 0.00 Source: bedrock, Context: 512
bedrock cohere.embed-v4:0 cohere.embed-v4:0 0.12 0.00 Source: bedrock, Context: 128000
cohere embed-v4.0 embed-v4.0 0.12 0.00 Source: cohere, Context: 128000
bedrock cohere.rerank-v3-5:0 cohere.rerank-v3-5:0 0.00 0.00 Source: bedrock, Context: 32000
cohere command command 1.00 2.00 Source: cohere, Context: 4096
coherechat command-a-03-2025 command-a-03-2025 2.50 10.00 Source: cohere_chat, Context: 256000
coherechat command-light command-light 0.30 0.60 Source: cohere_chat, Context: 4096
cohere command-nightly command-nightly 1.00 2.00 Source: cohere, Context: 4096
coherechat command-r command-r 0.15 0.60 Source: cohere_chat, Context: 128000
coherechat command-r-08-2024 command-r-08-2024 0.15 0.60 Source: cohere_chat, Context: 128000
coherechat command-r-plus command-r-plus 2.50 10.00 Source: cohere_chat, Context: 128000
coherechat command-r-plus-08-2024 command-r-plus-08-2024 2.50 10.00 Source: cohere_chat, Context: 128000
coherechat command-r7b-12-2024 command-r7b-12-2024 0.04 0.15 Source: cohere_chat, Context: 128000
dashscope qwen-coder qwen-coder 0.30 1.50 Source: dashscope, Context: 1000000
dashscope qwen-flash qwen-flash 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-flash-2025-07-28 qwen-flash-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-max qwen-max 1.60 6.40 Source: dashscope, Context: 30720
dashscope qwen-plus qwen-plus 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-01-25 qwen-plus-2025-01-25 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-04-28 qwen-plus-2025-04-28 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-07-14 qwen-plus-2025-07-14 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-07-28 qwen-plus-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-plus-2025-09-11 qwen-plus-2025-09-11 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-plus-latest qwen-plus-latest 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-turbo qwen-turbo 0.05 0.20 Source: dashscope, Context: 129024
dashscope qwen-turbo-2024-11-01 qwen-turbo-2024-11-01 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen-turbo-2025-04-28 qwen-turbo-2025-04-28 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen-turbo-latest qwen-turbo-latest 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen3-30b-a3b qwen3-30b-a3b 0.00 0.00 Source: dashscope, Context: 129024
dashscope qwen3-coder-flash qwen3-coder-flash 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-flash-2025-07-28 qwen3-coder-flash-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-plus qwen3-coder-plus 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-plus-2025-07-22 qwen3-coder-plus-2025-07-22 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-max-preview qwen3-max-preview 0.00 0.00 Source: dashscope, Context: 258048
dashscope qwq-plus qwq-plus 0.80 2.40 Source: dashscope, Context: 98304
databricks databricks-bge-large-en databricks-bge-large-en 0.10 0.00 Source: databricks, Context: 512
databricks databricks-claude-3-7-sonnet databricks-claude-3-7-sonnet 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-haiku-4-5 databricks-claude-haiku-4-5 1.00 5.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4 databricks-claude-opus-4 15.00 75.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4-1 databricks-claude-opus-4-1 15.00 75.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4-5 databricks-claude-opus-4-5 5.00 25.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4 databricks-claude-sonnet-4 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4-1 databricks-claude-sonnet-4-1 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4-5 databricks-claude-sonnet-4-5 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-gemini-2-5-flash databricks-gemini-2-5-flash 0.30 2.50 Source: databricks, Context: 1048576
databricks databricks-gemini-2-5-pro databricks-gemini-2-5-pro 1.25 10.00 Source: databricks, Context: 1048576
databricks databricks-gemma-3-12b databricks-gemma-3-12b 0.15 0.50 Source: databricks, Context: 128000
databricks databricks-gpt-5 databricks-gpt-5 1.25 10.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-1 databricks-gpt-5-1 1.25 10.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-mini databricks-gpt-5-mini 0.25 2.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-nano databricks-gpt-5-nano 0.05 0.40 Source: databricks, Context: 400000
databricks databricks-gpt-oss-120b databricks-gpt-oss-120b 0.15 0.60 Source: databricks, Context: 131072
databricks databricks-gpt-oss-20b databricks-gpt-oss-20b 0.07 0.30 Source: databricks, Context: 131072
databricks databricks-gte-large-en databricks-gte-large-en 0.13 0.00 Source: databricks, Context: 8192
databricks databricks-llama-2-70b-chat databricks-llama-2-70b-chat 0.50 1.50 Source: databricks, Context: 4096
databricks databricks-llama-4-maverick databricks-llama-4-maverick 0.50 1.50 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-1-405b-instruct databricks-meta-llama-3-1-405b-instruct 5.00 15.00 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-1-8b-instruct databricks-meta-llama-3-1-8b-instruct 0.15 0.45 Source: databricks, Context: 200000
databricks databricks-meta-llama-3-3-70b-instruct databricks-meta-llama-3-3-70b-instruct 0.50 1.50 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-70b-instruct databricks-meta-llama-3-70b-instruct 1.00 3.00 Source: databricks, Context: 128000
databricks databricks-mixtral-8x7b-instruct databricks-mixtral-8x7b-instruct 0.50 1.00 Source: databricks, Context: 4096
databricks databricks-mpt-30b-instruct databricks-mpt-30b-instruct 1.00 1.00 Source: databricks, Context: 8192
databricks databricks-mpt-7b-instruct databricks-mpt-7b-instruct 0.50 0.00 Source: databricks, Context: 8192
dataforseo search search 0.00 0.00 Source: dataforseo, Context: N/A
textcompletionopenai davinci-002 davinci-002 2.00 2.00 Source: text-completion-openai, Context: 16384
deepgram base base 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-conversationalai base-conversationalai 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-finance base-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-general base-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-meeting base-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-phonecall base-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-video base-video 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-voicemail base-voicemail 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced enhanced 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-finance enhanced-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-general enhanced-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-meeting enhanced-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-phonecall enhanced-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova nova 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2 nova-2 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-atc nova-2-atc 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-automotive nova-2-automotive 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-conversationalai nova-2-conversationalai 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-drivethru nova-2-drivethru 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-finance nova-2-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-general nova-2-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-meeting nova-2-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-phonecall nova-2-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-video nova-2-video 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-voicemail nova-2-voicemail 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3 nova-3 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3-general nova-3-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3-medical nova-3-medical 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-general nova-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-phonecall nova-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper whisper 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-base whisper-base 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-large whisper-large 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-medium whisper-medium 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-small whisper-small 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-tiny whisper-tiny 0.00 0.00 Source: deepgram, Context: N/A
deepinfra MythoMax-L2-13b mythomax-l2-13b 0.08 0.09 Source: deepinfra, Context: 4096
deepinfra Hermes-3-Llama-3.1-405B hermes-3-llama-3.1-405b 1.00 1.00 Source: deepinfra, Context: 131072
deepinfra Hermes-3-Llama-3.1-70B hermes-3-llama-3.1-70b 0.30 0.30 Source: deepinfra, Context: 131072
deepinfra QwQ-32B qwq-32b 0.15 0.40 Source: deepinfra, Context: 131072
deepinfra Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.12 0.39 Source: deepinfra, Context: 32768
deepinfra Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.04 0.10 Source: deepinfra, Context: 32768
deepinfra Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.20 0.60 Source: deepinfra, Context: 128000
deepinfra Qwen3-14B qwen3-14b 0.06 0.24 Source: deepinfra, Context: 40960
deepinfra Qwen3-235B-A22B qwen3-235b-a22b 0.18 0.54 Source: deepinfra, Context: 40960
deepinfra Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Source: deepinfra, Context: 262144
deepinfra Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.30 2.90 Source: deepinfra, Context: 262144
deepinfra Qwen3-30B-A3B qwen3-30b-a3b 0.08 0.29 Source: deepinfra, Context: 40960
deepinfra Qwen3-32B qwen3-32b 0.10 0.28 Source: deepinfra, Context: 40960
deepinfra Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Source: deepinfra, Context: 262144
deepinfra Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 1.40 Source: deepinfra, Context: 262144
deepinfra L3-8B-Lunaris-v1-Turbo l3-8b-lunaris-v1-turbo 0.04 0.05 Source: deepinfra, Context: 8192
deepinfra L3.1-70B-Euryale-v2.2 l3.1-70b-euryale-v2.2 0.65 0.75 Source: deepinfra, Context: 131072
deepinfra L3.3-70B-Euryale-v2.3 l3.3-70b-euryale-v2.3 0.65 0.75 Source: deepinfra, Context: 131072
deepinfra olmOCR-7B-0725-FP8 olmocr-7b-0725-fp8 0.27 1.50 Source: deepinfra, Context: 16384
deepinfra claude-3-7-sonnet-latest claude-3-7-sonnet-latest 3.30 16.50 Source: deepinfra, Context: 200000
deepinfra claude-4-opus claude-4-opus 16.50 82.50 Source: deepinfra, Context: 200000
deepinfra claude-4-sonnet claude-4-sonnet 3.30 16.50 Source: deepinfra, Context: 200000
deepinfra DeepSeek-R1 deepseek-r1 0.70 2.40 Source: deepinfra, Context: 163840
deepinfra DeepSeek-R1-0528 deepseek-r1-0528 0.50 2.15 Source: deepinfra, Context: 163840
deepinfra DeepSeek-R1-0528-Turbo deepseek-r1-0528-turbo 1.00 3.00 Source: deepinfra, Context: 32768
deepinfra DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.20 0.60 Source: deepinfra, Context: 131072
deepinfra DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.27 0.27 Source: deepinfra, Context: 131072
deepinfra DeepSeek-R1-Turbo deepseek-r1-turbo 1.00 3.00 Source: deepinfra, Context: 40960
deepinfra DeepSeek-V3 deepseek-v3 0.38 0.89 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3-0324 deepseek-v3-0324 0.25 0.88 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Source: deepinfra, Context: 163840
deepinfra gemini-2.0-flash-001 gemini-2.0-flash-001 0.10 0.40 Source: deepinfra, Context: 1000000
deepinfra gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: deepinfra, Context: 1000000
deepinfra gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: deepinfra, Context: 1000000
deepinfra gemma-3-12b-it gemma-3-12b-it 0.05 0.10 Source: deepinfra, Context: 131072
deepinfra gemma-3-27b-it gemma-3-27b-it 0.09 0.16 Source: deepinfra, Context: 131072
deepinfra gemma-3-4b-it gemma-3-4b-it 0.04 0.08 Source: deepinfra, Context: 131072
deepinfra Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.05 0.05 Source: deepinfra, Context: 131072
deepinfra Llama-3.2-3B-Instruct llama-3.2-3b-instruct 0.02 0.02 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.23 0.40 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-70B-Instruct-Turbo llama-3.3-70b-instruct-turbo 0.13 0.39 Source: deepinfra, Context: 131072
deepinfra Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.15 0.60 Source: deepinfra, Context: 1048576
deepinfra Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.08 0.30 Source: deepinfra, Context: 327680
deepinfra Llama-Guard-3-8B llama-guard-3-8b 0.06 0.06 Source: deepinfra, Context: 131072
deepinfra Llama-Guard-4-12B llama-guard-4-12b 0.18 0.18 Source: deepinfra, Context: 163840
deepinfra Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.03 0.06 Source: deepinfra, Context: 8192
deepinfra Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.40 0.40 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-70B-Instruct-Turbo meta-llama-3.1-70b-instruct-turbo 0.10 0.28 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.03 0.05 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-8B-Instruct-Turbo meta-llama-3.1-8b-instruct-turbo 0.02 0.03 Source: deepinfra, Context: 131072
deepinfra WizardLM-2-8x22B wizardlm-2-8x22b 0.48 0.48 Source: deepinfra, Context: 65536
deepinfra phi-4 phi-4 0.07 0.14 Source: deepinfra, Context: 16384
deepinfra Mistral-Nemo-Instruct-2407 mistral-nemo-instruct-2407 0.02 0.04 Source: deepinfra, Context: 131072
deepinfra Mistral-Small-24B-Instruct-2501 mistral-small-24b-instruct-2501 0.05 0.08 Source: deepinfra, Context: 32768
deepinfra Mistral-Small-3.2-24B-Instruct-2506 mistral-small-3.2-24b-instruct-2506 0.08 0.20 Source: deepinfra, Context: 128000
deepinfra Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.40 0.40 Source: deepinfra, Context: 32768
deepinfra Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.50 2.00 Source: deepinfra, Context: 262144
deepinfra Llama-3.1-Nemotron-70B-Instruct llama-3.1-nemotron-70b-instruct 0.60 0.60 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-Nemotron-Super-49B-v1.5 llama-3.3-nemotron-super-49b-v1.5 0.10 0.40 Source: deepinfra, Context: 131072
deepinfra NVIDIA-Nemotron-Nano-9B-v2 nvidia-nemotron-nano-9b-v2 0.04 0.16 Source: deepinfra, Context: 131072
deepseek deepseek-coder deepseek-coder 0.14 0.28 Source: deepseek, Context: 128000
deepseek deepseek-r1 deepseek-r1 0.55 2.19 Source: deepseek, Context: 65536
deepseek deepseek-v3 deepseek-v3 0.27 1.10 Source: deepseek, Context: 65536
deepseek deepseek-v3.2 deepseek-v3.2 0.28 0.40 Source: deepseek, Context: 163840
bedrockconverse deepseek.v3-v1:0 deepseek.v3-v1:0 0.58 1.68 Source: bedrock_converse, Context: 163840
nlpcloud dolphin dolphin 0.50 0.50 Source: nlp_cloud, Context: 16384
volcengine doubao-embedding doubao-embedding 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large doubao-embedding-large 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large-text-240915 doubao-embedding-large-text-240915 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large-text-250515 doubao-embedding-large-text-250515 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-text-240715 doubao-embedding-text-240715 0.00 0.00 Source: volcengine, Context: 4096
exaai search search 0.00 0.00 Source: exa_ai, Context: N/A
firecrawl search search 0.00 0.00 Source: firecrawl, Context: N/A
perplexity search search 0.00 0.00 Source: perplexity, Context: N/A
searxng search search 0.00 0.00 Source: searxng, Context: N/A
elevenlabs scribe_v1 scribe_v1 0.00 0.00 Source: elevenlabs, Context: N/A
elevenlabs scribe_v1_experimental scribe_v1_experimental 0.00 0.00 Source: elevenlabs, Context: N/A
cohere embed-english-light-v2.0 embed-english-light-v2.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-english-light-v3.0 embed-english-light-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-english-v2.0 embed-english-v2.0 0.10 0.00 Source: cohere, Context: 4096
cohere embed-english-v3.0 embed-english-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-multilingual-v2.0 embed-multilingual-v2.0 0.10 0.00 Source: cohere, Context: 768
cohere embed-multilingual-v3.0 embed-multilingual-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-multilingual-light-v3.0 embed-multilingual-light-v3.0 0.10 0.00 Source: cohere, Context: 1024
bedrockconverse eu.amazon.nova-lite-v1:0 eu.amazon.nova-lite-v1:0 0.08 0.31 Source: bedrock_converse, Context: 300000
bedrockconverse eu.amazon.nova-micro-v1:0 eu.amazon.nova-micro-v1:0 0.05 0.18 Source: bedrock_converse, Context: 128000
bedrockconverse eu.amazon.nova-pro-v1:0 eu.amazon.nova-pro-v1:0 1.05 4.20 Source: bedrock_converse, Context: 300000
bedrock eu.anthropic.claude-3-5-haiku-20241022-v1:0 eu.anthropic.claude-3-5-haiku-20241022-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrockconverse eu.anthropic.claude-haiku-4-5-20251001-v1:0 eu.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock eu.anthropic.claude-3-5-sonnet-20240620-v1:0 eu.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-5-sonnet-20241022-v2:0 eu.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-7-sonnet-20250219-v1:0 eu.anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-haiku-20240307-v1:0 eu.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-opus-20240229-v1:0 eu.anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-sonnet-20240229-v1:0 eu.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-1-20250805-v1:0 eu.anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-20250514-v1:0 eu.anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-sonnet-4-20250514-v1:0 eu.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse eu.anthropic.claude-sonnet-4-5-20250929-v1:0 eu.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrock eu.meta.llama3-2-1b-instruct-v1:0 eu.meta.llama3-2-1b-instruct-v1:0 0.13 0.13 Source: bedrock, Context: 128000
bedrock eu.meta.llama3-2-3b-instruct-v1:0 eu.meta.llama3-2-3b-instruct-v1:0 0.19 0.19 Source: bedrock, Context: 128000
bedrockconverse eu.mistral.pixtral-large-2502-v1:0 eu.mistral.pixtral-large-2502-v1:0 2.00 6.00 Source: bedrock_converse, Context: 128000
falai 3.2 3.2 0.00 0.00 Source: fal_ai, Context: N/A
falai v1.1 v1.1 0.00 0.00 Source: fal_ai, Context: N/A
falai v1.1-ultra v1.1-ultra 0.00 0.00 Source: fal_ai, Context: N/A
falai schnell schnell 0.00 0.00 Source: fal_ai, Context: N/A
falai text-to-image text-to-image 0.00 0.00 Source: fal_ai, Context: N/A
falai v3 v3 0.00 0.00 Source: fal_ai, Context: N/A
falai preview preview 0.00 0.00 Source: fal_ai, Context: N/A
falai fast fast 0.00 0.00 Source: fal_ai, Context: N/A
falai ultra ultra 0.00 0.00 Source: fal_ai, Context: N/A
falai stable-diffusion-v35-medium stable-diffusion-v35-medium 0.00 0.00 Source: fal_ai, Context: N/A
featherlessai Qwerky-72B qwerky-72b 0.00 0.00 Source: featherless_ai, Context: 32768
featherlessai Qwerky-QwQ-32B qwerky-qwq-32b 0.00 0.00 Source: featherless_ai, Context: 32768
fireworksai fireworks-ai-4.1b-to-16b fireworks-ai-4.1b-to-16b 0.20 0.20 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-56b-to-176b fireworks-ai-56b-to-176b 1.20 1.20 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-above-16b fireworks-ai-above-16b 0.90 0.90 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-default fireworks-ai-default 0.00 0.00 Source: fireworks_ai, Context: N/A
fireworksaiembeddingmodels fireworks-ai-embedding-150m-to-350m fireworks-ai-embedding-150m-to-350m 0.02 0.00 Source: fireworks_ai-embedding-models, Context: N/A
fireworksaiembeddingmodels fireworks-ai-embedding-up-to-150m fireworks-ai-embedding-up-to-150m 0.01 0.00 Source: fireworks_ai-embedding-models, Context: N/A
fireworksai fireworks-ai-moe-up-to-56b fireworks-ai-moe-up-to-56b 0.50 0.50 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-up-to-4b fireworks-ai-up-to-4b 0.20 0.20 Source: fireworks_ai, Context: N/A
fireworksaiembeddingmodels UAE-Large-V1 uae-large-v1 0.02 0.00 Source: fireworks_ai-embedding-models, Context: 512
fireworksai deepseek-coder-v2-instruct deepseek-coder-v2-instruct 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai deepseek-r1 deepseek-r1 3.00 8.00 Source: fireworks_ai, Context: 128000
fireworksai deepseek-r1-basic deepseek-r1-basic 0.55 2.19 Source: fireworks_ai, Context: 128000
fireworksai deepseek-v3 deepseek-v3 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai deepseek-v3p1-terminus deepseek-v3p1-terminus 0.56 1.68 Source: fireworks_ai, Context: 128000
fireworksai firefunction-v2 firefunction-v2 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai kimi-k2-instruct-0905 kimi-k2-instruct-0905 0.60 2.50 Source: fireworks_ai, Context: 262144
fireworksai llama-v3p1-405b-instruct llama-v3p1-405b-instruct 3.00 3.00 Source: fireworks_ai, Context: 128000
fireworksai llama-v3p1-8b-instruct llama-v3p1-8b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-11b-vision-instruct llama-v3p2-11b-vision-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-1b-instruct llama-v3p2-1b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-3b-instruct llama-v3p2-3b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-90b-vision-instruct llama-v3p2-90b-vision-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai llama4-maverick-instruct-basic llama4-maverick-instruct-basic 0.22 0.88 Source: fireworks_ai, Context: 131072
fireworksai llama4-scout-instruct-basic llama4-scout-instruct-basic 0.15 0.60 Source: fireworks_ai, Context: 131072
fireworksai mixtral-8x22b-instruct-hf mixtral-8x22b-instruct-hf 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai qwen2-72b-instruct qwen2-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct qwen2p5-coder-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-large yi-large 3.00 3.00 Source: fireworks_ai, Context: 32768
fireworksaiembeddingmodels nomic-embed-text-v1 nomic-embed-text-v1 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 8192
fireworksaiembeddingmodels nomic-embed-text-v1.5 nomic-embed-text-v1.5 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 8192
fireworksaiembeddingmodels gte-base gte-base 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 512
fireworksaiembeddingmodels gte-large gte-large 0.02 0.00 Source: fireworks_ai-embedding-models, Context: 512
friendliai meta-llama-3.1-70b-instruct meta-llama-3.1-70b-instruct 0.60 0.60 Source: friendliai, Context: 8192
friendliai meta-llama-3.1-8b-instruct meta-llama-3.1-8b-instruct 0.10 0.10 Source: friendliai, Context: 8192
textcompletionopenai ft:babbage-002 ft:babbage-002 1.60 1.60 Source: text-completion-openai, Context: 16384
textcompletionopenai ft:davinci-002 ft:davinci-002 12.00 12.00 Source: text-completion-openai, Context: 16384
openai ft:gpt-3.5-turbo ft:gpt-3.5-turbo 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-3.5-turbo-0125 ft:gpt-3.5-turbo-0125 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-3.5-turbo-0613 ft:gpt-3.5-turbo-0613 3.00 6.00 Source: openai, Context: 4096
openai ft:gpt-3.5-turbo-1106 ft:gpt-3.5-turbo-1106 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-4-0613 ft:gpt-4-0613 30.00 60.00 Source: openai, Context: 8192
openai ft:gpt-4o-2024-08-06 ft:gpt-4o-2024-08-06 3.75 15.00 Source: openai, Context: 128000
openai ft:gpt-4o-2024-11-20 ft:gpt-4o-2024-11-20 3.75 15.00 Source: openai, Context: 128000
openai ft:gpt-4o-mini-2024-07-18 ft:gpt-4o-mini-2024-07-18 0.30 1.20 Source: openai, Context: 128000
openai ft:gpt-4.1-2025-04-14 ft:gpt-4.1-2025-04-14 3.00 12.00 Source: openai, Context: 1047576
openai ft:gpt-4.1-mini-2025-04-14 ft:gpt-4.1-mini-2025-04-14 0.80 3.20 Source: openai, Context: 1047576
openai ft:gpt-4.1-nano-2025-04-14 ft:gpt-4.1-nano-2025-04-14 0.20 0.80 Source: openai, Context: 1047576
openai ft:o4-mini-2025-04-16 ft:o4-mini-2025-04-16 4.00 16.00 Source: openai, Context: 200000
vertex gemini-1.0-pro gemini-1.0-pro 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-001 gemini-1.0-pro-001 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-002 gemini-1.0-pro-002 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-vision gemini-1.0-pro-vision 0.50 1.50 Source: vertex, Context: 16384
vertex gemini-1.0-pro-vision-001 gemini-1.0-pro-vision-001 0.50 1.50 Source: vertex, Context: 16384
vertex gemini-1.0-ultra gemini-1.0-ultra 0.50 1.50 Source: vertex, Context: 8192
vertex gemini-1.0-ultra-001 gemini-1.0-ultra-001 0.50 1.50 Source: vertex, Context: 8192
vertex gemini-1.5-flash gemini-1.5-flash 0.08 0.30 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-001 gemini-1.5-flash-001 0.08 0.30 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-002 gemini-1.5-flash-002 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-1.5-flash-exp-0827 gemini-1.5-flash-exp-0827 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-preview-0514 gemini-1.5-flash-preview-0514 0.08 0.00 Source: vertex, Context: 1000000
vertex gemini-1.5-pro gemini-1.5-pro 1.25 5.00 Source: vertex, Context: 2097152
vertex gemini-1.5-pro-001 gemini-1.5-pro-001 1.25 5.00 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-002 gemini-1.5-pro-002 1.25 5.00 Source: vertex, Context: 2097152
vertex gemini-1.5-pro-preview-0215 gemini-1.5-pro-preview-0215 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-preview-0409 gemini-1.5-pro-preview-0409 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-preview-0514 gemini-1.5-pro-preview-0514 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-2.0-flash gemini-2.0-flash 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-001 gemini-2.0-flash-001 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-exp gemini-2.0-flash-exp 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-lite gemini-2.0-flash-lite 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-lite-001 gemini-2.0-flash-lite-001 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-live-preview-04-09 gemini-2.0-flash-live-preview-04-09 0.50 2.00 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-thinking-exp gemini-2.0-flash-thinking-exp 0.00 0.00 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-thinking-exp-01-21 0.00 0.00 Source: vertex, Context: 1048576
vertex gemini-2.0-pro-exp-02-05 gemini-2.0-pro-exp-02-05 1.25 10.00 Source: vertex, Context: 2097152
vertex gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-image gemini-2.5-flash-image 0.30 2.50 Source: vertex, Context: 32768
vertex gemini-2.5-flash-image-preview gemini-2.5-flash-image-preview 0.30 30.00 Source: vertex, Context: 1048576
vertex gemini-3-pro-image-preview gemini-3-pro-image-preview 2.00 12.00 Source: vertex, Context: 65536
vertex gemini-2.5-flash-lite gemini-2.5-flash-lite 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-lite-preview-09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-live-2.5-flash-preview-native-audio-09-2025 gemini-live-2.5-flash-preview-native-audio-09-2025 0.30 2.00 Source: vertex, Context: 1048576
gemini gemini-live-2.5-flash-preview-native-audio-09-2025 gemini-live-2.5-flash-preview-native-audio-09-2025 0.30 2.00 Source: gemini, Context: 1048576
vertex gemini-2.5-flash-lite-preview-06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-05-20 gemini-2.5-flash-preview-05-20 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-3-pro-preview gemini-3-pro-preview 2.00 12.00 Source: vertex, Context: 1048576
vertex gemini-3-flash-preview gemini-3-flash-preview 0.50 3.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-exp-03-25 gemini-2.5-pro-exp-03-25 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-03-25 gemini-2.5-pro-preview-03-25 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-tts gemini-2.5-pro-preview-tts 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-embedding-001 gemini-embedding-001 0.15 0.00 Source: vertex, Context: 2048
vertex gemini-flash-experimental gemini-flash-experimental 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-pro gemini-pro 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-pro-experimental gemini-pro-experimental 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-pro-vision gemini-pro-vision 0.50 1.50 Source: vertex, Context: 16384
gemini gemini-embedding-001 gemini-embedding-001 0.15 0.00 Source: gemini, Context: 2048
gemini gemini-1.5-flash gemini-1.5-flash 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-001 gemini-1.5-flash-001 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-002 gemini-1.5-flash-002 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-8b gemini-1.5-flash-8b 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-8b-exp-0827 gemini-1.5-flash-8b-exp-0827 0.00 0.00 Source: gemini, Context: 1000000
gemini gemini-1.5-flash-8b-exp-0924 gemini-1.5-flash-8b-exp-0924 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-exp-0827 gemini-1.5-flash-exp-0827 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-latest gemini-1.5-flash-latest 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-pro gemini-1.5-pro 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-001 gemini-1.5-pro-001 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-002 gemini-1.5-pro-002 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-exp-0801 gemini-1.5-pro-exp-0801 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-exp-0827 gemini-1.5-pro-exp-0827 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-latest gemini-1.5-pro-latest 3.50 10.50 Source: gemini, Context: 1048576
gemini gemini-2.0-flash gemini-2.0-flash 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-001 gemini-2.0-flash-001 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-exp gemini-2.0-flash-exp 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-lite gemini-2.0-flash-lite 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-lite-preview-02-05 gemini-2.0-flash-lite-preview-02-05 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-live-001 gemini-2.0-flash-live-001 0.35 1.50 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-thinking-exp gemini-2.0-flash-thinking-exp 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-thinking-exp-01-21 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-pro-exp-02-05 gemini-2.0-pro-exp-02-05 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-image-preview gemini-2.5-flash-image-preview 0.30 30.00 Source: gemini, Context: 1048576
gemini gemini-3-pro-image-preview gemini-3-pro-image-preview 2.00 12.00 Source: gemini, Context: 65536
gemini gemini-2.5-flash-lite gemini-2.5-flash-lite 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-lite-preview-09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-flash-latest gemini-flash-latest 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-flash-lite-latest gemini-flash-lite-latest 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-lite-preview-06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-05-20 gemini-2.5-flash-preview-05-20 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-tts gemini-2.5-flash-preview-tts 0.15 0.60 Source: gemini, Context: 1048576
gemini gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-computer-use-preview-10-2025 gemini-2.5-computer-use-preview-10-2025 1.25 10.00 Source: gemini, Context: 128000
gemini gemini-3-pro-preview gemini-3-pro-preview 2.00 12.00 Source: gemini, Context: 1048576
gemini gemini-3-flash-preview gemini-3-flash-preview 0.50 3.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-exp-03-25 gemini-2.5-pro-exp-03-25 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-03-25 gemini-2.5-pro-preview-03-25 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-tts gemini-2.5-pro-preview-tts 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-exp-1114 gemini-exp-1114 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-exp-1206 gemini-exp-1206 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-gemma-2-27b-it gemini-gemma-2-27b-it 0.35 1.05 Source: gemini, Context: 8192
gemini gemini-gemma-2-9b-it gemini-gemma-2-9b-it 0.35 1.05 Source: gemini, Context: 8192
gemini gemini-pro gemini-pro 0.35 1.05 Source: gemini, Context: 32760
gemini gemini-pro-vision gemini-pro-vision 0.35 1.05 Source: gemini, Context: 30720
gemini gemma-3-27b-it gemma-3-27b-it 0.00 0.00 Source: gemini, Context: 131072
gemini imagen-3.0-fast-generate-001 imagen-3.0-fast-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-3.0-generate-001 imagen-3.0-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-3.0-generate-002 imagen-3.0-generate-002 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-fast-generate-001 imagen-4.0-fast-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-generate-001 imagen-4.0-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini learnlm-1.5-pro-experimental learnlm-1.5-pro-experimental 0.00 0.00 Source: gemini, Context: 32767
gemini veo-2.0-generate-001 veo-2.0-generate-001 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.0-fast-generate-preview veo-3.0-fast-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.0-generate-preview veo-3.0-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-fast-generate-preview veo-3.1-fast-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-generate-preview veo-3.1-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-fast-generate-001 veo-3.1-fast-generate-001 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-generate-001 veo-3.1-generate-001 0.00 0.00 Source: gemini, Context: 1024
githubcopilot gpt-3.5-turbo gpt-3.5-turbo 0.00 0.00 Source: github_copilot, Context: 16384
githubcopilot gpt-3.5-turbo-0613 gpt-3.5-turbo-0613 0.00 0.00 Source: github_copilot, Context: 16384
githubcopilot gpt-4 gpt-4 0.00 0.00 Source: github_copilot, Context: 32768
githubcopilot gpt-4-0613 gpt-4-0613 0.00 0.00 Source: github_copilot, Context: 32768
githubcopilot gpt-4-o-preview gpt-4-o-preview 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 0.00 0.00 Source: github_copilot, Context: 128000
githubcopilot gpt-41-copilot gpt-41-copilot 0.00 0.00 Source: github_copilot, Context: N/A
githubcopilot gpt-4o-2024-05-13 gpt-4o-2024-05-13 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-2024-08-06 gpt-4o-2024-08-06 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-2024-11-20 gpt-4o-2024-11-20 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-mini gpt-4o-mini 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot text-embedding-3-small text-embedding-3-small 0.00 0.00 Source: github_copilot, Context: 8191
githubcopilot text-embedding-3-small-inference text-embedding-3-small-inference 0.00 0.00 Source: github_copilot, Context: 8191
githubcopilot text-embedding-ada-002 text-embedding-ada-002 0.00 0.00 Source: github_copilot, Context: 8191
bedrockconverse google.gemma-3-12b-it google.gemma-3-12b-it 0.09 0.29 Source: bedrock_converse, Context: 128000
bedrockconverse google.gemma-3-27b-it google.gemma-3-27b-it 0.23 0.38 Source: bedrock_converse, Context: 128000
bedrockconverse google.gemma-3-4b-it google.gemma-3-4b-it 0.04 0.08 Source: bedrock_converse, Context: 128000
googlepse search search 0.00 0.00 Source: google_pse, Context: N/A
bedrockconverse global.anthropic.claude-sonnet-4-5-20250929-v1:0 global.anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.anthropic.claude-sonnet-4-20250514-v1:0 global.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse global.anthropic.claude-haiku-4-5-20251001-v1:0 global.anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.amazon.nova-2-lite-v1:0 global.amazon.nova-2-lite-v1:0 0.30 2.50 Source: bedrock_converse, Context: 1000000
openai gpt-3.5-turbo-0125 gpt-3.5-turbo-0125 0.50 1.50 Source: openai, Context: 16385
openai gpt-3.5-turbo-0301 gpt-3.5-turbo-0301 1.50 2.00 Source: openai, Context: 4097
openai gpt-3.5-turbo-0613 gpt-3.5-turbo-0613 1.50 2.00 Source: openai, Context: 4097
openai gpt-3.5-turbo-1106 gpt-3.5-turbo-1106 1.00 2.00 Source: openai, Context: 16385
openai gpt-3.5-turbo-16k gpt-3.5-turbo-16k 3.00 4.00 Source: openai, Context: 16385
openai gpt-3.5-turbo-16k-0613 gpt-3.5-turbo-16k-0613 3.00 4.00 Source: openai, Context: 16385
textcompletionopenai gpt-3.5-turbo-instruct gpt-3.5-turbo-instruct 1.50 2.00 Source: text-completion-openai, Context: 8192
textcompletionopenai gpt-3.5-turbo-instruct-0914 gpt-3.5-turbo-instruct-0914 1.50 2.00 Source: text-completion-openai, Context: 8192
openai gpt-4-0125-preview gpt-4-0125-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-0314 gpt-4-0314 30.00 60.00 Source: openai, Context: 8192
openai gpt-4-0613 gpt-4-0613 30.00 60.00 Source: openai, Context: 8192
openai gpt-4-1106-preview gpt-4-1106-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-1106-vision-preview gpt-4-1106-vision-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-32k gpt-4-32k 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-32k-0314 gpt-4-32k-0314 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-32k-0613 gpt-4-32k-0613 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-turbo-2024-04-09 gpt-4-turbo-2024-04-09 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-turbo-preview gpt-4-turbo-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-vision-preview gpt-4-vision-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 2.00 8.00 Source: openai, Context: 1047576
openai gpt-4.1-mini-2025-04-14 gpt-4.1-mini-2025-04-14 0.40 1.60 Source: openai, Context: 1047576
openai gpt-4.1-nano-2025-04-14 gpt-4.1-nano-2025-04-14 0.10 0.40 Source: openai, Context: 1047576
openai gpt-4.5-preview gpt-4.5-preview 75.00 150.00 Source: openai, Context: 128000
openai gpt-4.5-preview-2025-02-27 gpt-4.5-preview-2025-02-27 75.00 150.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview gpt-4o-audio-preview 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2024-10-01 gpt-4o-audio-preview-2024-10-01 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2024-12-17 gpt-4o-audio-preview-2024-12-17 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2025-06-03 gpt-4o-audio-preview-2025-06-03 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-audio-preview gpt-4o-mini-audio-preview 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-audio-preview-2024-12-17 gpt-4o-mini-audio-preview-2024-12-17 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-realtime-preview gpt-4o-mini-realtime-preview 0.60 2.40 Source: openai, Context: 128000
openai gpt-4o-mini-realtime-preview-2024-12-17 gpt-4o-mini-realtime-preview-2024-12-17 0.60 2.40 Source: openai, Context: 128000
openai gpt-4o-mini-search-preview gpt-4o-mini-search-preview 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-search-preview-2025-03-11 gpt-4o-mini-search-preview-2025-03-11 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-transcribe gpt-4o-mini-transcribe 1.25 5.00 Source: openai, Context: 16000
openai gpt-4o-mini-tts gpt-4o-mini-tts 2.50 10.00 Source: openai, Context: N/A
openai gpt-4o-realtime-preview gpt-4o-realtime-preview 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2024-10-01 gpt-4o-realtime-preview-2024-10-01 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2024-12-17 gpt-4o-realtime-preview-2024-12-17 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2025-06-03 gpt-4o-realtime-preview-2025-06-03 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-search-preview gpt-4o-search-preview 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-search-preview-2025-03-11 gpt-4o-search-preview-2025-03-11 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-transcribe gpt-4o-transcribe 2.50 10.00 Source: openai, Context: 16000
openai gpt-image-1.5 gpt-image-1.5 5.00 10.00 Source: openai, Context: N/A
openai gpt-image-1.5-2025-12-16 gpt-image-1.5-2025-12-16 5.00 10.00 Source: openai, Context: N/A
openai gpt-5.1-2025-11-13 gpt-5.1-2025-11-13 1.25 10.00 Source: openai, Context: 272000
openai gpt-5.2-2025-12-11 gpt-5.2-2025-12-11 1.75 14.00 Source: openai, Context: 400000
openai gpt-5.2-pro-2025-12-11 gpt-5.2-pro-2025-12-11 21.00 168.00 Source: openai, Context: 400000
openai gpt-5-pro-2025-10-06 gpt-5-pro-2025-10-06 15.00 120.00 Source: openai, Context: 400000
openai gpt-5-2025-08-07 gpt-5-2025-08-07 1.25 10.00 Source: openai, Context: 272000
openai gpt-5-chat gpt-5-chat 1.25 10.00 Source: openai, Context: 272000
openai gpt-5-mini-2025-08-07 gpt-5-mini-2025-08-07 0.25 2.00 Source: openai, Context: 272000
openai gpt-5-nano-2025-08-07 gpt-5-nano-2025-08-07 0.05 0.40 Source: openai, Context: 272000
openai gpt-image-1 gpt-image-1 5.00 0.00 Source: openai, Context: N/A
openai gpt-image-1-mini gpt-image-1-mini 2.00 0.00 Source: openai, Context: N/A
openai gpt-realtime gpt-realtime 4.00 16.00 Source: openai, Context: 32000
openai gpt-realtime-mini gpt-realtime-mini 0.60 2.40 Source: openai, Context: 128000
openai gpt-realtime-2025-08-28 gpt-realtime-2025-08-28 4.00 16.00 Source: openai, Context: 32000
gradientai alibaba-qwen3-32b alibaba-qwen3-32b 0.00 0.00 Source: gradient_ai, Context: 2048
gradientai anthropic-claude-3-opus anthropic-claude-3-opus 15.00 75.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.5-haiku anthropic-claude-3.5-haiku 0.80 4.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.5-sonnet anthropic-claude-3.5-sonnet 3.00 15.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.7-sonnet anthropic-claude-3.7-sonnet 3.00 15.00 Source: gradient_ai, Context: 1024
gradientai deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.99 0.99 Source: gradient_ai, Context: 8000
gradientai llama3-8b-instruct llama3-8b-instruct 0.20 0.20 Source: gradient_ai, Context: 512
gradientai llama3.3-70b-instruct llama3.3-70b-instruct 0.65 0.65 Source: gradient_ai, Context: 2048
gradientai mistral-nemo-instruct-2407 mistral-nemo-instruct-2407 0.30 0.30 Source: gradient_ai, Context: 512
gradientai openai-gpt-4o openai-gpt-4o 0.00 0.00 Source: gradient_ai, Context: 16384
gradientai openai-gpt-4o-mini openai-gpt-4o-mini 0.00 0.00 Source: gradient_ai, Context: 16384
gradientai openai-o3 openai-o3 2.00 8.00 Source: gradient_ai, Context: 100000
gradientai openai-o3-mini openai-o3-mini 1.10 4.40 Source: gradient_ai, Context: 100000
lemonade Qwen3-Coder-30B-A3B-Instruct-GGUF qwen3-coder-30b-a3b-instruct-gguf 0.00 0.00 Source: lemonade, Context: 262144
lemonade gpt-oss-20b-mxfp4-GGUF gpt-oss-20b-mxfp4-gguf 0.00 0.00 Source: lemonade, Context: 131072
lemonade gpt-oss-120b-mxfp4-GGUF gpt-oss-120b-mxfp4-gguf 0.00 0.00 Source: lemonade, Context: 131072
lemonade Gemma-3-4b-it-GGUF gemma-3-4b-it-gguf 0.00 0.00 Source: lemonade, Context: 128000
lemonade Qwen3-4B-Instruct-2507-GGUF qwen3-4b-instruct-2507-gguf 0.00 0.00 Source: lemonade, Context: 262144
amazonnova nova-micro-v1 nova-micro-v1 0.04 0.14 Source: amazon_nova, Context: 128000
amazonnova nova-lite-v1 nova-lite-v1 0.06 0.24 Source: amazon_nova, Context: 300000
amazonnova nova-premier-v1 nova-premier-v1 2.50 12.50 Source: amazon_nova, Context: 1000000
amazonnova nova-pro-v1 nova-pro-v1 0.80 3.20 Source: amazon_nova, Context: 300000
groq gemma-7b-it gemma-7b-it 0.05 0.08 Source: groq, Context: 8192
groq playai-tts playai-tts 0.00 0.00 Source: groq, Context: 10000
groq whisper-large-v3 whisper-large-v3 0.00 0.00 Source: groq, Context: N/A
groq whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Source: groq, Context: N/A
openai dall-e-3 dall-e-3 0.00 0.00 Source: openai, Context: N/A
heroku claude-3-5-haiku claude-3-5-haiku 0.00 0.00 Source: heroku, Context: 4096
heroku claude-3-5-sonnet-latest claude-3-5-sonnet-latest 0.00 0.00 Source: heroku, Context: 8192
heroku claude-3-7-sonnet claude-3-7-sonnet 0.00 0.00 Source: heroku, Context: 8192
heroku claude-4-sonnet claude-4-sonnet 0.00 0.00 Source: heroku, Context: 8192
hyperbolic Hermes-3-Llama-3.1-70B hermes-3-llama-3.1-70b 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic QwQ-32B qwq-32b 0.20 0.20 Source: hyperbolic, Context: 131072
hyperbolic Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Qwen3-235B-A22B qwen3-235b-a22b 2.00 2.00 Source: hyperbolic, Context: 131072
hyperbolic DeepSeek-R1 deepseek-r1 0.40 0.40 Source: hyperbolic, Context: 32768
hyperbolic DeepSeek-R1-0528 deepseek-r1-0528 0.25 0.25 Source: hyperbolic, Context: 131072
hyperbolic DeepSeek-V3 deepseek-v3 0.20 0.20 Source: hyperbolic, Context: 32768
hyperbolic DeepSeek-V3-0324 deepseek-v3-0324 0.40 0.40 Source: hyperbolic, Context: 32768
hyperbolic Llama-3.2-3B-Instruct llama-3.2-3b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Kimi-K2-Instruct kimi-k2-instruct 2.00 2.00 Source: hyperbolic, Context: 131072
ai21 j2-light j2-light 3.00 3.00 Source: ai21, Context: 8192
ai21 j2-mid j2-mid 10.00 10.00 Source: ai21, Context: 8192
ai21 j2-ultra j2-ultra 15.00 15.00 Source: ai21, Context: 8192
ai21 jamba-1.5 jamba-1.5 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-1.5-large jamba-1.5-large 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-1.5-large@001 jamba-1.5-large@001 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-1.5-mini jamba-1.5-mini 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-1.5-mini@001 jamba-1.5-mini@001 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-large-1.6 jamba-large-1.6 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-large-1.7 jamba-large-1.7 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-mini-1.6 jamba-mini-1.6 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-mini-1.7 jamba-mini-1.7 0.20 0.40 Source: ai21, Context: 256000
jinaai jina-reranker-v2-base-multilingual jina-reranker-v2-base-multilingual 0.02 0.02 Source: jina_ai, Context: 1024
bedrockconverse jp.anthropic.claude-sonnet-4-5-20250929-v1:0 jp.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrockconverse jp.anthropic.claude-haiku-4-5-20251001-v1:0 jp.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
lambdaai deepseek-llama3.3-70b deepseek-llama3.3-70b 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai deepseek-r1-0528 deepseek-r1-0528 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai deepseek-r1-671b deepseek-r1-671b 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai deepseek-v3-0324 deepseek-v3-0324 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai hermes3-405b hermes3-405b 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai hermes3-70b hermes3-70b 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai hermes3-8b hermes3-8b 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai lfm-40b lfm-40b 0.10 0.20 Source: lambda_ai, Context: 131072
lambdaai lfm-7b lfm-7b 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai llama-4-maverick-17b-128e-instruct-fp8 llama-4-maverick-17b-128e-instruct-fp8 0.05 0.10 Source: lambda_ai, Context: 131072
lambdaai llama-4-scout-17b-16e-instruct llama-4-scout-17b-16e-instruct 0.05 0.10 Source: lambda_ai, Context: 16384
lambdaai llama3.1-405b-instruct-fp8 llama3.1-405b-instruct-fp8 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai llama3.1-70b-instruct-fp8 llama3.1-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai llama3.1-8b-instruct llama3.1-8b-instruct 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai llama3.1-nemotron-70b-instruct-fp8 llama3.1-nemotron-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai llama3.2-11b-vision-instruct llama3.2-11b-vision-instruct 0.02 0.03 Source: lambda_ai, Context: 131072
lambdaai llama3.2-3b-instruct llama3.2-3b-instruct 0.02 0.03 Source: lambda_ai, Context: 131072
lambdaai llama3.3-70b-instruct-fp8 llama3.3-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai qwen25-coder-32b-instruct qwen25-coder-32b-instruct 0.05 0.10 Source: lambda_ai, Context: 131072
lambdaai qwen3-32b-fp8 qwen3-32b-fp8 0.05 0.10 Source: lambda_ai, Context: 131072
alephalpha luminous-base luminous-base 30.00 33.00 Source: aleph_alpha, Context: 2048
alephalpha luminous-base-control luminous-base-control 37.50 41.25 Source: aleph_alpha, Context: 2048
alephalpha luminous-extended luminous-extended 45.00 49.50 Source: aleph_alpha, Context: 2048
alephalpha luminous-extended-control luminous-extended-control 56.25 61.88 Source: aleph_alpha, Context: 2048
alephalpha luminous-supreme luminous-supreme 175.00 192.50 Source: aleph_alpha, Context: 2048
alephalpha luminous-supreme-control luminous-supreme-control 218.75 240.63 Source: aleph_alpha, Context: 2048
vertex medlm-large medlm-large 0.00 0.00 Source: vertex, Context: 8192
vertex medlm-medium medlm-medium 0.00 0.00 Source: vertex, Context: 32768
bedrock meta.llama2-13b-chat-v1 meta.llama2-13b-chat-v1 0.75 1.00 Source: bedrock, Context: 4096
bedrock meta.llama2-70b-chat-v1 meta.llama2-70b-chat-v1 1.95 2.56 Source: bedrock, Context: 4096
bedrock meta.llama3-1-405b-instruct-v1:0 meta.llama3-1-405b-instruct-v1:0 5.32 16.00 Source: bedrock, Context: 128000
bedrock meta.llama3-1-70b-instruct-v1:0 meta.llama3-1-70b-instruct-v1:0 0.99 0.99 Source: bedrock, Context: 128000
bedrock meta.llama3-1-8b-instruct-v1:0 meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Source: bedrock, Context: 128000
bedrock meta.llama3-2-11b-instruct-v1:0 meta.llama3-2-11b-instruct-v1:0 0.35 0.35 Source: bedrock, Context: 128000
bedrock meta.llama3-2-1b-instruct-v1:0 meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Source: bedrock, Context: 128000
bedrock meta.llama3-2-3b-instruct-v1:0 meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Source: bedrock, Context: 128000
bedrock meta.llama3-2-90b-instruct-v1:0 meta.llama3-2-90b-instruct-v1:0 2.00 2.00 Source: bedrock, Context: 128000
bedrockconverse meta.llama3-3-70b-instruct-v1:0 meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Source: bedrock_converse, Context: 128000
bedrockconverse meta.llama4-maverick-17b-instruct-v1:0 meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Source: bedrock_converse, Context: 128000
bedrockconverse meta.llama4-scout-17b-instruct-v1:0 meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Source: bedrock_converse, Context: 128000
metallama Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Source: meta_llama, Context: 128000
metallama Llama-3.3-8B-Instruct llama-3.3-8b-instruct 0.00 0.00 Source: meta_llama, Context: 128000
metallama Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Source: meta_llama, Context: 1000000
metallama Llama-4-Scout-17B-16E-Instruct-FP8 llama-4-scout-17b-16e-instruct-fp8 0.00 0.00 Source: meta_llama, Context: 10000000
bedrockconverse minimax.minimax-m2 minimax.minimax-m2 0.30 1.20 Source: bedrock_converse, Context: 128000
minimax speech-02-hd speech-02-hd 0.00 0.00 Source: minimax, Context: N/A
minimax speech-02-turbo speech-02-turbo 0.00 0.00 Source: minimax, Context: N/A
minimax speech-2.6-hd speech-2.6-hd 0.00 0.00 Source: minimax, Context: N/A
minimax speech-2.6-turbo speech-2.6-turbo 0.00 0.00 Source: minimax, Context: N/A
minimax MiniMax-M2.1-lightning minimax-m2.1-lightning 0.30 2.40 Source: minimax, Context: 1000000
bedrockconverse mistral.magistral-small-2509 mistral.magistral-small-2509 0.50 1.50 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-14b-instruct mistral.ministral-3-14b-instruct 0.20 0.20 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-3b-instruct mistral.ministral-3-3b-instruct 0.10 0.10 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-8b-instruct mistral.ministral-3-8b-instruct 0.15 0.15 Source: bedrock_converse, Context: 128000
bedrock mistral.mistral-large-2407-v1:0 mistral.mistral-large-2407-v1:0 3.00 9.00 Source: bedrock, Context: 128000
bedrockconverse mistral.mistral-large-3-675b-instruct mistral.mistral-large-3-675b-instruct 0.50 1.50 Source: bedrock_converse, Context: 128000
bedrock mistral.mistral-small-2402-v1:0 mistral.mistral-small-2402-v1:0 1.00 3.00 Source: bedrock, Context: 32000
bedrockconverse mistral.voxtral-mini-3b-2507 mistral.voxtral-mini-3b-2507 0.04 0.04 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.voxtral-small-24b-2507 mistral.voxtral-small-24b-2507 0.10 0.30 Source: bedrock_converse, Context: 128000
mistral codestral-2405 codestral-2405 1.00 3.00 Source: mistral, Context: 32000
mistral codestral-2508 codestral-2508 0.30 0.90 Source: mistral, Context: 256000
mistral codestral-mamba-latest codestral-mamba-latest 0.25 0.25 Source: mistral, Context: 256000
mistral magistral-medium-2506 magistral-medium-2506 2.00 5.00 Source: mistral, Context: 40000
mistral magistral-medium-2509 magistral-medium-2509 2.00 5.00 Source: mistral, Context: 40000
mistral mistral-ocr-latest mistral-ocr-latest 0.00 0.00 Source: mistral, Context: N/A
mistral mistral-ocr-2505-completion mistral-ocr-2505-completion 0.00 0.00 Source: mistral, Context: N/A
mistral magistral-small-2506 magistral-small-2506 0.50 1.50 Source: mistral, Context: 40000
mistral magistral-small-latest magistral-small-latest 0.50 1.50 Source: mistral, Context: 40000
mistral codestral-embed codestral-embed 0.15 0.00 Source: mistral, Context: 8192
mistral codestral-embed-2505 codestral-embed-2505 0.15 0.00 Source: mistral, Context: 8192
mistral mistral-large-2402 mistral-large-2402 4.00 12.00 Source: mistral, Context: 32000
mistral mistral-large-2407 mistral-large-2407 3.00 9.00 Source: mistral, Context: 128000
mistral mistral-large-3 mistral-large-3 0.50 1.50 Source: mistral, Context: 256000
mistral mistral-medium mistral-medium 2.70 8.10 Source: mistral, Context: 32000
mistral mistral-medium-2312 mistral-medium-2312 2.70 8.10 Source: mistral, Context: 32000
mistral mistral-small mistral-small 0.10 0.30 Source: mistral, Context: 32000
mistral mistral-tiny mistral-tiny 0.25 0.25 Source: mistral, Context: 32000
mistral open-codestral-mamba open-codestral-mamba 0.25 0.25 Source: mistral, Context: 256000
mistral open-mistral-nemo open-mistral-nemo 0.30 0.30 Source: mistral, Context: 128000
mistral open-mistral-nemo-2407 open-mistral-nemo-2407 0.30 0.30 Source: mistral, Context: 128000
mistral pixtral-12b-2409 pixtral-12b-2409 0.15 0.15 Source: mistral, Context: 128000
mistral pixtral-large-2411 pixtral-large-2411 2.00 6.00 Source: mistral, Context: 128000
bedrockconverse moonshot.kimi-k2-thinking moonshot.kimi-k2-thinking 0.60 2.50 Source: bedrock_converse, Context: 128000
moonshot kimi-k2-0711-preview kimi-k2-0711-preview 0.60 2.50 Source: moonshot, Context: 131072
moonshot kimi-k2-0905-preview kimi-k2-0905-preview 0.60 2.50 Source: moonshot, Context: 262144
moonshot kimi-k2-turbo-preview kimi-k2-turbo-preview 1.15 8.00 Source: moonshot, Context: 262144
moonshot kimi-latest kimi-latest 2.00 5.00 Source: moonshot, Context: 131072
moonshot kimi-latest-128k kimi-latest-128k 2.00 5.00 Source: moonshot, Context: 131072
moonshot kimi-latest-32k kimi-latest-32k 1.00 3.00 Source: moonshot, Context: 32768
moonshot kimi-latest-8k kimi-latest-8k 0.20 2.00 Source: moonshot, Context: 8192
moonshot kimi-thinking-preview kimi-thinking-preview 0.60 2.50 Source: moonshot, Context: 131072
moonshot kimi-k2-thinking kimi-k2-thinking 0.60 2.50 Source: moonshot, Context: 262144
moonshot kimi-k2-thinking-turbo kimi-k2-thinking-turbo 1.15 8.00 Source: moonshot, Context: 262144
moonshot moonshot-v1-128k moonshot-v1-128k 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-128k-0430 moonshot-v1-128k-0430 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-128k-vision-preview moonshot-v1-128k-vision-preview 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-32k moonshot-v1-32k 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-32k-0430 moonshot-v1-32k-0430 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-32k-vision-preview moonshot-v1-32k-vision-preview 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-8k moonshot-v1-8k 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-8k-0430 moonshot-v1-8k-0430 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-8k-vision-preview moonshot-v1-8k-vision-preview 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-auto moonshot-v1-auto 2.00 5.00 Source: moonshot, Context: 131072
vertex multimodalembedding multimodalembedding 0.80 0.00 Source: vertex, Context: 2048
vertex multimodalembedding@001 multimodalembedding@001 0.80 0.00 Source: vertex, Context: 2048
nscale QwQ-32B qwq-32b 0.18 0.20 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.06 0.20 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-3B-Instruct qwen2.5-coder-3b-instruct 0.01 0.03 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-7B-Instruct qwen2.5-coder-7b-instruct 0.01 0.03 Source: nscale, Context: N/A
nscale FLUX.1-schnell flux.1-schnell 0.00 0.00 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.38 0.38 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Llama-8B deepseek-r1-distill-llama-8b 0.03 0.03 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-1.5B deepseek-r1-distill-qwen-1.5b 0.09 0.09 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.07 0.07 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.15 0.15 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.20 0.20 Source: nscale, Context: N/A
nscale Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.03 0.03 Source: nscale, Context: N/A
nscale Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.20 0.20 Source: nscale, Context: N/A
nscale Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.09 0.29 Source: nscale, Context: N/A
nscale mixtral-8x22b-instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.60 0.60 Source: nscale, Context: N/A
nscale stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0 0.00 0.00 Source: nscale, Context: N/A
bedrockconverse nvidia.nemotron-nano-12b-v2 nvidia.nemotron-nano-12b-v2 0.20 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse nvidia.nemotron-nano-9b-v2 nvidia.nemotron-nano-9b-v2 0.06 0.23 Source: bedrock_converse, Context: 128000
openai o1-2024-12-17 o1-2024-12-17 15.00 60.00 Source: openai, Context: 200000
openai o1-mini-2024-09-12 o1-mini-2024-09-12 3.00 12.00 Source: openai, Context: 128000
openai o1-preview-2024-09-12 o1-preview-2024-09-12 15.00 60.00 Source: openai, Context: 128000
openai o1-pro-2025-03-19 o1-pro-2025-03-19 150.00 600.00 Source: openai, Context: 200000
openai o3-2025-04-16 o3-2025-04-16 2.00 8.00 Source: openai, Context: 200000
openai o3-deep-research-2025-06-26 o3-deep-research-2025-06-26 10.00 40.00 Source: openai, Context: 200000
openai o3-mini-2025-01-31 o3-mini-2025-01-31 1.10 4.40 Source: openai, Context: 200000
openai o3-pro-2025-06-10 o3-pro-2025-06-10 20.00 80.00 Source: openai, Context: 200000
openai o4-mini-2025-04-16 o4-mini-2025-04-16 1.10 4.40 Source: openai, Context: 200000
openai o4-mini-deep-research-2025-06-26 o4-mini-deep-research-2025-06-26 2.00 8.00 Source: openai, Context: 200000
oci meta.llama-3.1-405b-instruct meta.llama-3.1-405b-instruct 10.68 10.68 Source: oci, Context: 128000
oci meta.llama-3.2-90b-vision-instruct meta.llama-3.2-90b-vision-instruct 2.00 2.00 Source: oci, Context: 128000
oci meta.llama-3.3-70b-instruct meta.llama-3.3-70b-instruct 0.72 0.72 Source: oci, Context: 128000
oci meta.llama-4-maverick-17b-128e-instruct-fp8 meta.llama-4-maverick-17b-128e-instruct-fp8 0.72 0.72 Source: oci, Context: 512000
oci meta.llama-4-scout-17b-16e-instruct meta.llama-4-scout-17b-16e-instruct 0.72 0.72 Source: oci, Context: 192000
oci xai.grok-3 xai.grok-3 3.00 15.00 Source: oci, Context: 131072
oci xai.grok-3-fast xai.grok-3-fast 5.00 25.00 Source: oci, Context: 131072
oci xai.grok-3-mini xai.grok-3-mini 0.30 0.50 Source: oci, Context: 131072
oci xai.grok-3-mini-fast xai.grok-3-mini-fast 0.60 4.00 Source: oci, Context: 131072
oci xai.grok-4 xai.grok-4 3.00 15.00 Source: oci, Context: 128000
oci cohere.command-latest cohere.command-latest 1.56 1.56 Source: oci, Context: 128000
oci cohere.command-a-03-2025 cohere.command-a-03-2025 1.56 1.56 Source: oci, Context: 256000
oci cohere.command-plus-latest cohere.command-plus-latest 1.56 1.56 Source: oci, Context: 128000
ollama codegeex4 codegeex4 0.00 0.00 Source: ollama, Context: 32768
ollama codegemma codegemma 0.00 0.00 Source: ollama, Context: 8192
ollama codellama codellama 0.00 0.00 Source: ollama, Context: 4096
ollama deepseek-coder-v2-base deepseek-coder-v2-base 0.00 0.00 Source: ollama, Context: 8192
ollama deepseek-coder-v2-instruct deepseek-coder-v2-instruct 0.00 0.00 Source: ollama, Context: 32768
ollama deepseek-coder-v2-lite-base deepseek-coder-v2-lite-base 0.00 0.00 Source: ollama, Context: 8192
ollama deepseek-coder-v2-lite-instruct deepseek-coder-v2-lite-instruct 0.00 0.00 Source: ollama, Context: 32768
ollama deepseek-v3.1:671b-cloud deepseek-v3.1:671b-cloud 0.00 0.00 Source: ollama, Context: 163840
ollama gpt-oss:120b-cloud gpt-oss:120b-cloud 0.00 0.00 Source: ollama, Context: 131072
ollama gpt-oss:20b-cloud gpt-oss:20b-cloud 0.00 0.00 Source: ollama, Context: 131072
ollama internlm2_5-20b-chat internlm2_5-20b-chat 0.00 0.00 Source: ollama, Context: 32768
ollama llama2 llama2 0.00 0.00 Source: ollama, Context: 4096
ollama llama2-uncensored llama2-uncensored 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:13b llama2:13b 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:70b llama2:70b 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:7b llama2:7b 0.00 0.00 Source: ollama, Context: 4096
ollama llama3 llama3 0.00 0.00 Source: ollama, Context: 8192
ollama llama3.1 llama3.1 0.00 0.00 Source: ollama, Context: 8192
ollama llama3:70b llama3:70b 0.00 0.00 Source: ollama, Context: 8192
ollama llama3:8b llama3:8b 0.00 0.00 Source: ollama, Context: 8192
ollama mistral mistral 0.00 0.00 Source: ollama, Context: 8192
ollama mistral-7B-Instruct-v0.1 mistral-7b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 8192
ollama mistral-7B-Instruct-v0.2 mistral-7b-instruct-v0.2 0.00 0.00 Source: ollama, Context: 32768
ollama mistral-large-instruct-2407 mistral-large-instruct-2407 0.00 0.00 Source: ollama, Context: 65536
ollama mixtral-8x22B-Instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 65536
ollama mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 32768
ollama orca-mini orca-mini 0.00 0.00 Source: ollama, Context: 4096
ollama qwen3-coder:480b-cloud qwen3-coder:480b-cloud 0.00 0.00 Source: ollama, Context: 262144
ollama vicuna vicuna 0.00 0.00 Source: ollama, Context: 2048
openai omni-moderation-2024-09-26 omni-moderation-2024-09-26 0.00 0.00 Source: openai, Context: 32768
openai omni-moderation-latest omni-moderation-latest 0.00 0.00 Source: openai, Context: 32768
openai omni-moderation-latest-intents omni-moderation-latest-intents 0.00 0.00 Source: openai, Context: 32768
bedrockconverse openai.gpt-oss-120b-1:0 openai.gpt-oss-120b-1:0 0.15 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-20b-1:0 openai.gpt-oss-20b-1:0 0.07 0.30 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-safeguard-120b openai.gpt-oss-safeguard-120b 0.15 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-safeguard-20b openai.gpt-oss-safeguard-20b 0.07 0.20 Source: bedrock_converse, Context: 128000
ovhcloud llava-v1.6-mistral-7b-hf llava-v1.6-mistral-7b-hf 0.29 0.29 Source: ovhcloud, Context: 32000
ovhcloud mamba-codestral-7B-v0.1 mamba-codestral-7b-v0.1 0.19 0.19 Source: ovhcloud, Context: 256000
palm chat-bison chat-bison 0.13 0.13 Source: palm, Context: 8192
palm chat-bison-001 chat-bison-001 0.13 0.13 Source: palm, Context: 8192
palm text-bison text-bison 0.13 0.13 Source: palm, Context: 8192
palm text-bison-001 text-bison-001 0.13 0.13 Source: palm, Context: 8192
palm text-bison-safety-off text-bison-safety-off 0.13 0.13 Source: palm, Context: 8192
palm text-bison-safety-recitation-off text-bison-safety-recitation-off 0.13 0.13 Source: palm, Context: 8192
parallelai search search 0.00 0.00 Source: parallel_ai, Context: N/A
parallelai search-pro search-pro 0.00 0.00 Source: parallel_ai, Context: N/A
perplexity codellama-34b-instruct codellama-34b-instruct 0.35 1.40 Source: perplexity, Context: 16384
perplexity codellama-70b-instruct codellama-70b-instruct 0.70 2.80 Source: perplexity, Context: 16384
perplexity llama-2-70b-chat llama-2-70b-chat 0.70 2.80 Source: perplexity, Context: 4096
perplexity llama-3.1-70b-instruct llama-3.1-70b-instruct 1.00 1.00 Source: perplexity, Context: 131072
perplexity llama-3.1-8b-instruct llama-3.1-8b-instruct 0.20 0.20 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-huge-128k-online llama-3.1-sonar-huge-128k-online 5.00 5.00 Source: perplexity, Context: 127072
perplexity llama-3.1-sonar-large-128k-chat llama-3.1-sonar-large-128k-chat 1.00 1.00 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-large-128k-online llama-3.1-sonar-large-128k-online 1.00 1.00 Source: perplexity, Context: 127072
perplexity llama-3.1-sonar-small-128k-chat llama-3.1-sonar-small-128k-chat 0.20 0.20 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-small-128k-online llama-3.1-sonar-small-128k-online 0.20 0.20 Source: perplexity, Context: 127072
perplexity mistral-7b-instruct mistral-7b-instruct 0.07 0.28 Source: perplexity, Context: 4096
perplexity mixtral-8x7b-instruct mixtral-8x7b-instruct 0.07 0.28 Source: perplexity, Context: 4096
perplexity pplx-70b-chat pplx-70b-chat 0.70 2.80 Source: perplexity, Context: 4096
perplexity pplx-70b-online pplx-70b-online 0.00 2.80 Source: perplexity, Context: 4096
perplexity pplx-7b-chat pplx-7b-chat 0.07 0.28 Source: perplexity, Context: 8192
perplexity pplx-7b-online pplx-7b-online 0.00 0.28 Source: perplexity, Context: 4096
perplexity sonar-deep-research sonar-deep-research 2.00 8.00 Source: perplexity, Context: 128000
perplexity sonar-medium-chat sonar-medium-chat 0.60 1.80 Source: perplexity, Context: 16384
perplexity sonar-medium-online sonar-medium-online 0.00 1.80 Source: perplexity, Context: 12000
perplexity sonar-reasoning sonar-reasoning 1.00 5.00 Source: perplexity, Context: 128000
perplexity sonar-small-chat sonar-small-chat 0.07 0.28 Source: perplexity, Context: 16384
perplexity sonar-small-online sonar-small-online 0.00 0.28 Source: perplexity, Context: 12000
publicai apertus-8b-instruct apertus-8b-instruct 0.00 0.00 Source: publicai, Context: 8192
publicai apertus-70b-instruct apertus-70b-instruct 0.00 0.00 Source: publicai, Context: 8192
publicai Gemma-SEA-LION-v4-27B-IT gemma-sea-lion-v4-27b-it 0.00 0.00 Source: publicai, Context: 8192
publicai salamandra-7b-instruct-tools-16k salamandra-7b-instruct-tools-16k 0.00 0.00 Source: publicai, Context: 16384
publicai ALIA-40b-instruct_Q8_0 alia-40b-instruct_q8_0 0.00 0.00 Source: publicai, Context: 8192
publicai Olmo-3-7B-Instruct olmo-3-7b-instruct 0.00 0.00 Source: publicai, Context: 32768
publicai Qwen-SEA-LION-v4-32B-IT qwen-sea-lion-v4-32b-it 0.00 0.00 Source: publicai, Context: 32768
publicai Olmo-3-7B-Think olmo-3-7b-think 0.00 0.00 Source: publicai, Context: 32768
publicai Olmo-3-32B-Think olmo-3-32b-think 0.00 0.00 Source: publicai, Context: 32768
bedrockconverse qwen.qwen3-coder-480b-a35b-v1:0 qwen.qwen3-coder-480b-a35b-v1:0 0.22 1.80 Source: bedrock_converse, Context: 262000
bedrockconverse qwen.qwen3-235b-a22b-2507-v1:0 qwen.qwen3-235b-a22b-2507-v1:0 0.22 0.88 Source: bedrock_converse, Context: 262144
bedrockconverse qwen.qwen3-coder-30b-a3b-v1:0 qwen.qwen3-coder-30b-a3b-v1:0 0.15 0.60 Source: bedrock_converse, Context: 262144
bedrockconverse qwen.qwen3-32b-v1:0 qwen.qwen3-32b-v1:0 0.15 0.60 Source: bedrock_converse, Context: 131072
bedrockconverse qwen.qwen3-next-80b-a3b qwen.qwen3-next-80b-a3b 0.15 1.20 Source: bedrock_converse, Context: 128000
bedrockconverse qwen.qwen3-vl-235b-a22b qwen.qwen3-vl-235b-a22b 0.53 2.66 Source: bedrock_converse, Context: 128000
recraft recraftv2 recraftv2 0.00 0.00 Source: recraft, Context: N/A
recraft recraftv3 recraftv3 0.00 0.00 Source: recraft, Context: N/A
replicate llama-2-13b llama-2-13b 0.10 0.50 Source: replicate, Context: 4096
replicate llama-2-13b-chat llama-2-13b-chat 0.10 0.50 Source: replicate, Context: 4096
replicate llama-2-70b llama-2-70b 0.65 2.75 Source: replicate, Context: 4096
replicate llama-2-70b-chat llama-2-70b-chat 0.65 2.75 Source: replicate, Context: 4096
replicate llama-2-7b llama-2-7b 0.05 0.25 Source: replicate, Context: 4096
replicate llama-2-7b-chat llama-2-7b-chat 0.05 0.25 Source: replicate, Context: 4096
replicate llama-3-70b llama-3-70b 0.65 2.75 Source: replicate, Context: 8192
replicate llama-3-70b-instruct llama-3-70b-instruct 0.65 2.75 Source: replicate, Context: 8192
replicate llama-3-8b llama-3-8b 0.05 0.25 Source: replicate, Context: 8086
replicate llama-3-8b-instruct llama-3-8b-instruct 0.05 0.25 Source: replicate, Context: 8086
replicate mistral-7b-instruct-v0.2 mistral-7b-instruct-v0.2 0.05 0.25 Source: replicate, Context: 4096
replicate mistral-7b-v0.1 mistral-7b-v0.1 0.05 0.25 Source: replicate, Context: 4096
replicate mixtral-8x7b-instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.30 1.00 Source: replicate, Context: 4096
cohere rerank-english-v2.0 rerank-english-v2.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-english-v3.0 rerank-english-v3.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-multilingual-v2.0 rerank-multilingual-v2.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-multilingual-v3.0 rerank-multilingual-v3.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-v3.5 rerank-v3.5 0.00 0.00 Source: cohere, Context: 4096
nvidianim nv-rerankqa-mistral-4b-v3 nv-rerankqa-mistral-4b-v3 0.00 0.00 Source: nvidia_nim, Context: N/A
nvidianim llama-3_2-nv-rerankqa-1b-v2 llama-3_2-nv-rerankqa-1b-v2 0.00 0.00 Source: nvidia_nim, Context: N/A
nvidianim llama-3.2-nv-rerankqa-1b-v2 llama-3.2-nv-rerankqa-1b-v2 0.00 0.00 Source: nvidia_nim, Context: N/A
sagemaker meta-textgeneration-llama-2-13b meta-textgeneration-llama-2-13b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-13b-f meta-textgeneration-llama-2-13b-f 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-70b meta-textgeneration-llama-2-70b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-70b-b-f meta-textgeneration-llama-2-70b-b-f 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-7b meta-textgeneration-llama-2-7b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-7b-f meta-textgeneration-llama-2-7b-f 0.00 0.00 Source: sagemaker, Context: 4096
sambanova DeepSeek-R1 deepseek-r1 5.00 7.00 Source: sambanova, Context: 32768
sambanova DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.70 1.40 Source: sambanova, Context: 131072
sambanova DeepSeek-V3-0324 deepseek-v3-0324 3.00 4.50 Source: sambanova, Context: 32768
sambanova Llama-4-Maverick-17B-128E-Instruct llama-4-maverick-17b-128e-instruct 0.63 1.80 Source: sambanova, Context: 131072
sambanova Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.40 0.70 Source: sambanova, Context: 8192
sambanova Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.00 10.00 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.10 0.20 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.2-1B-Instruct meta-llama-3.2-1b-instruct 0.04 0.08 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.2-3B-Instruct meta-llama-3.2-3b-instruct 0.08 0.16 Source: sambanova, Context: 4096
sambanova Meta-Llama-3.3-70B-Instruct meta-llama-3.3-70b-instruct 0.60 1.20 Source: sambanova, Context: 131072
sambanova Meta-Llama-Guard-3-8B meta-llama-guard-3-8b 0.30 0.30 Source: sambanova, Context: 16384
sambanova QwQ-32B qwq-32b 0.50 1.00 Source: sambanova, Context: 16384
sambanova Qwen2-Audio-7B-Instruct qwen2-audio-7b-instruct 0.50 100.00 Source: sambanova, Context: 4096
sambanova Qwen3-32B qwen3-32b 0.40 0.80 Source: sambanova, Context: 8192
sambanova DeepSeek-V3.1 deepseek-v3.1 3.00 4.50 Source: sambanova, Context: 32768
sambanova gpt-oss-120b gpt-oss-120b 3.00 4.50 Source: sambanova, Context: 131072
snowflake claude-3-5-sonnet claude-3-5-sonnet 0.00 0.00 Source: snowflake, Context: 18000
snowflake deepseek-r1 deepseek-r1 0.00 0.00 Source: snowflake, Context: 32768
snowflake gemma-7b gemma-7b 0.00 0.00 Source: snowflake, Context: 8000
snowflake jamba-1.5-large jamba-1.5-large 0.00 0.00 Source: snowflake, Context: 256000
snowflake jamba-1.5-mini jamba-1.5-mini 0.00 0.00 Source: snowflake, Context: 256000
snowflake jamba-instruct jamba-instruct 0.00 0.00 Source: snowflake, Context: 256000
snowflake llama2-70b-chat llama2-70b-chat 0.00 0.00 Source: snowflake, Context: 4096
snowflake llama3-70b llama3-70b 0.00 0.00 Source: snowflake, Context: 8000
snowflake llama3-8b llama3-8b 0.00 0.00 Source: snowflake, Context: 8000
snowflake llama3.1-405b llama3.1-405b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.1-70b llama3.1-70b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.1-8b llama3.1-8b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.2-1b llama3.2-1b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.2-3b llama3.2-3b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.3-70b llama3.3-70b 0.00 0.00 Source: snowflake, Context: 128000
snowflake mistral-7b mistral-7b 0.00 0.00 Source: snowflake, Context: 32000
snowflake mistral-large mistral-large 0.00 0.00 Source: snowflake, Context: 32000
snowflake mistral-large2 mistral-large2 0.00 0.00 Source: snowflake, Context: 128000
snowflake mixtral-8x7b mixtral-8x7b 0.00 0.00 Source: snowflake, Context: 32000
snowflake reka-core reka-core 0.00 0.00 Source: snowflake, Context: 32000
snowflake reka-flash reka-flash 0.00 0.00 Source: snowflake, Context: 100000
snowflake snowflake-arctic snowflake-arctic 0.00 0.00 Source: snowflake, Context: 4096
snowflake snowflake-llama-3.1-405b snowflake-llama-3.1-405b 0.00 0.00 Source: snowflake, Context: 8000
snowflake snowflake-llama-3.3-70b snowflake-llama-3.3-70b 0.00 0.00 Source: snowflake, Context: 8000
stability sd3 sd3 0.00 0.00 Source: stability, Context: N/A
stability sd3-large sd3-large 0.00 0.00 Source: stability, Context: N/A
stability sd3-large-turbo sd3-large-turbo 0.00 0.00 Source: stability, Context: N/A
stability sd3-medium sd3-medium 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-large sd3.5-large 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-large-turbo sd3.5-large-turbo 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-medium sd3.5-medium 0.00 0.00 Source: stability, Context: N/A
stability stable-image-ultra stable-image-ultra 0.00 0.00 Source: stability, Context: N/A
stability inpaint inpaint 0.00 0.00 Source: stability, Context: N/A
stability outpaint outpaint 0.00 0.00 Source: stability, Context: N/A
stability erase erase 0.00 0.00 Source: stability, Context: N/A
stability search-and-replace search-and-replace 0.00 0.00 Source: stability, Context: N/A
stability search-and-recolor search-and-recolor 0.00 0.00 Source: stability, Context: N/A
stability remove-background remove-background 0.00 0.00 Source: stability, Context: N/A
stability replace-background-and-relight replace-background-and-relight 0.00 0.00 Source: stability, Context: N/A
stability sketch sketch 0.00 0.00 Source: stability, Context: N/A
stability structure structure 0.00 0.00 Source: stability, Context: N/A
stability style style 0.00 0.00 Source: stability, Context: N/A
stability style-transfer style-transfer 0.00 0.00 Source: stability, Context: N/A
stability fast fast 0.00 0.00 Source: stability, Context: N/A
stability conservative conservative 0.00 0.00 Source: stability, Context: N/A
stability creative creative 0.00 0.00 Source: stability, Context: N/A
stability stable-image-core stable-image-core 0.00 0.00 Source: stability, Context: N/A
bedrock stability.sd3-5-large-v1:0 stability.sd3-5-large-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.sd3-large-v1:0 stability.sd3-large-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-core-v1:0 stability.stable-image-core-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-conservative-upscale-v1:0 stability.stable-conservative-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-creative-upscale-v1:0 stability.stable-creative-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-fast-upscale-v1:0 stability.stable-fast-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-outpaint-v1:0 stability.stable-outpaint-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-control-sketch-v1:0 stability.stable-image-control-sketch-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-control-structure-v1:0 stability.stable-image-control-structure-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-erase-object-v1:0 stability.stable-image-erase-object-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-inpaint-v1:0 stability.stable-image-inpaint-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-remove-background-v1:0 stability.stable-image-remove-background-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-search-recolor-v1:0 stability.stable-image-search-recolor-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-search-replace-v1:0 stability.stable-image-search-replace-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-style-guide-v1:0 stability.stable-image-style-guide-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-style-transfer-v1:0 stability.stable-style-transfer-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-core-v1:1 stability.stable-image-core-v1:1 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-ultra-v1:0 stability.stable-image-ultra-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-ultra-v1:1 stability.stable-image-ultra-v1:1 0.00 0.00 Source: bedrock, Context: 77
linkup search search 0.00 0.00 Source: linkup, Context: N/A
linkup search-deep search-deep 0.00 0.00 Source: linkup, Context: N/A
tavily search search 0.00 0.00 Source: tavily, Context: N/A
tavily search-advanced search-advanced 0.00 0.00 Source: tavily, Context: N/A
vertex text-bison text-bison 0.00 0.00 Source: vertex, Context: 8192
vertex text-bison32k text-bison32k 0.13 0.13 Source: vertex, Context: 8192
vertex text-bison32k@002 text-bison32k@002 0.13 0.13 Source: vertex, Context: 8192
vertex text-bison@001 text-bison@001 0.00 0.00 Source: vertex, Context: 8192
vertex text-bison@002 text-bison@002 0.00 0.00 Source: vertex, Context: 8192
textcompletioncodestral codestral-2405 codestral-2405 0.00 0.00 Source: text-completion-codestral, Context: 32000
textcompletioncodestral codestral-latest codestral-latest 0.00 0.00 Source: text-completion-codestral, Context: 32000
vertex text-embedding-004 text-embedding-004 0.10 0.00 Source: vertex, Context: 2048
vertex text-embedding-005 text-embedding-005 0.10 0.00 Source: vertex, Context: 2048
openai text-embedding-ada-002-v2 text-embedding-ada-002-v2 0.10 0.00 Source: openai, Context: 8191
vertex text-embedding-large-exp-03-07 text-embedding-large-exp-03-07 0.10 0.00 Source: vertex, Context: 8192
vertex text-embedding-preview-0409 text-embedding-preview-0409 0.01 0.00 Source: vertex, Context: 3072
openai text-moderation-007 text-moderation-007 0.00 0.00 Source: openai, Context: 32768
openai text-moderation-latest text-moderation-latest 0.00 0.00 Source: openai, Context: 32768
openai text-moderation-stable text-moderation-stable 0.00 0.00 Source: openai, Context: 32768
vertex text-multilingual-embedding-002 text-multilingual-embedding-002 0.10 0.00 Source: vertex, Context: 2048
vertex text-multilingual-embedding-preview-0409 text-multilingual-embedding-preview-0409 0.01 0.00 Source: vertex, Context: 3072
vertex text-unicorn text-unicorn 10.00 28.00 Source: vertex, Context: 8192
vertex text-unicorn@001 text-unicorn@001 10.00 28.00 Source: vertex, Context: 8192
vertex textembedding-gecko textembedding-gecko 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko-multilingual textembedding-gecko-multilingual 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko-multilingual@001 textembedding-gecko-multilingual@001 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko@001 textembedding-gecko@001 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko@003 textembedding-gecko@003 0.10 0.00 Source: vertex, Context: 3072
openai tts-1 tts-1 0.00 0.00 Source: openai, Context: N/A
openai tts-1-hd tts-1-hd 0.00 0.00 Source: openai, Context: N/A
awspolly standard standard 0.00 0.00 Source: aws_polly, Context: N/A
awspolly neural neural 0.00 0.00 Source: aws_polly, Context: N/A
awspolly long-form long-form 0.00 0.00 Source: aws_polly, Context: N/A
awspolly generative generative 0.00 0.00 Source: aws_polly, Context: N/A
bedrockconverse us.amazon.nova-lite-v1:0 us.amazon.nova-lite-v1:0 0.06 0.24 Source: bedrock_converse, Context: 300000
bedrockconverse us.amazon.nova-micro-v1:0 us.amazon.nova-micro-v1:0 0.04 0.14 Source: bedrock_converse, Context: 128000
bedrockconverse us.amazon.nova-premier-v1:0 us.amazon.nova-premier-v1:0 2.50 12.50 Source: bedrock_converse, Context: 1000000
bedrockconverse us.amazon.nova-pro-v1:0 us.amazon.nova-pro-v1:0 0.80 3.20 Source: bedrock_converse, Context: 300000
bedrockconverse us.anthropic.claude-haiku-4-5-20251001-v1:0 us.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock us.anthropic.claude-3-5-sonnet-20240620-v1:0 us.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-5-sonnet-20241022-v2:0 us.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse us.anthropic.claude-3-7-sonnet-20250219-v1:0 us.anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock us.anthropic.claude-3-haiku-20240307-v1:0 us.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-opus-20240229-v1:0 us.anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-sonnet-20240229-v1:0 us.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-1-20250805-v1:0 us.anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-sonnet-4-5-20250929-v1:0 us.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrockconverse au.anthropic.claude-haiku-4-5-20251001-v1:0 au.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-20250514-v1:0 us.anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-5-20251101-v1:0 us.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.anthropic.claude-opus-4-5-20251101-v1:0 global.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-5-20251101-v1:0 eu.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-sonnet-4-20250514-v1:0 us.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse us.deepseek.r1-v1:0 us.deepseek.r1-v1:0 1.35 5.40 Source: bedrock_converse, Context: 128000
bedrock us.meta.llama3-1-405b-instruct-v1:0 us.meta.llama3-1-405b-instruct-v1:0 5.32 16.00 Source: bedrock, Context: 128000
bedrock us.meta.llama3-1-70b-instruct-v1:0 us.meta.llama3-1-70b-instruct-v1:0 0.99 0.99 Source: bedrock, Context: 128000
bedrock us.meta.llama3-1-8b-instruct-v1:0 us.meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-11b-instruct-v1:0 us.meta.llama3-2-11b-instruct-v1:0 0.35 0.35 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-1b-instruct-v1:0 us.meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-3b-instruct-v1:0 us.meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-90b-instruct-v1:0 us.meta.llama3-2-90b-instruct-v1:0 2.00 2.00 Source: bedrock, Context: 128000
bedrockconverse us.meta.llama3-3-70b-instruct-v1:0 us.meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Source: bedrock_converse, Context: 128000
bedrockconverse us.meta.llama4-maverick-17b-instruct-v1:0 us.meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Source: bedrock_converse, Context: 128000
bedrockconverse us.meta.llama4-scout-17b-instruct-v1:0 us.meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Source: bedrock_converse, Context: 128000
bedrockconverse us.mistral.pixtral-large-2502-v1:0 us.mistral.pixtral-large-2502-v1:0 2.00 6.00 Source: bedrock_converse, Context: 128000
vercel claude-4-opus claude-4-opus 15.00 75.00 Source: vercel, Context: 200000
vercel claude-4-sonnet claude-4-sonnet 3.00 15.00 Source: vercel, Context: 200000
vercel command-r command-r 0.15 0.60 Source: vercel, Context: 128000
vercel command-r-plus command-r-plus 2.50 10.00 Source: vercel, Context: 128000
vercel deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.75 0.99 Source: vercel, Context: 131072
vercel gemma-2-9b gemma-2-9b 0.20 0.20 Source: vercel, Context: 8192
vercel llama-3-70b llama-3-70b 0.59 0.79 Source: vercel, Context: 8192
vercel llama-3-8b llama-3-8b 0.05 0.08 Source: vercel, Context: 8192
vercel mistral-large mistral-large 2.00 6.00 Source: vercel, Context: 32000
vercel mistral-saba-24b mistral-saba-24b 0.79 0.79 Source: vercel, Context: 32768
vertex chirp chirp 0.00 0.00 Source: vertex, Context: N/A
vertex claude-3-5-haiku claude-3-5-haiku 1.00 5.00 Source: vertex, Context: 200000
vertex claude-3-5-haiku@20241022 claude-3-5-haiku@20241022 1.00 5.00 Source: vertex, Context: 200000
vertex claude-haiku-4-5@20251001 claude-haiku-4-5@20251001 1.00 5.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet claude-3-5-sonnet 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet-v2 claude-3-5-sonnet-v2 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet-v2@20241022 claude-3-5-sonnet-v2@20241022 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet@20240620 claude-3-5-sonnet@20240620 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-7-sonnet@20250219 claude-3-7-sonnet@20250219 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-haiku claude-3-haiku 0.25 1.25 Source: vertex, Context: 200000
vertex claude-3-haiku@20240307 claude-3-haiku@20240307 0.25 1.25 Source: vertex, Context: 200000
vertex claude-3-opus claude-3-opus 15.00 75.00 Source: vertex, Context: 200000
vertex claude-3-opus@20240229 claude-3-opus@20240229 15.00 75.00 Source: vertex, Context: 200000
vertex claude-3-sonnet claude-3-sonnet 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-sonnet@20240229 claude-3-sonnet@20240229 3.00 15.00 Source: vertex, Context: 200000
vertex claude-opus-4 claude-opus-4 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-1 claude-opus-4-1 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-1@20250805 claude-opus-4-1@20250805 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-5 claude-opus-4-5 5.00 25.00 Source: vertex, Context: 200000
vertex claude-opus-4-5@20251101 claude-opus-4-5@20251101 5.00 25.00 Source: vertex, Context: 200000
vertex claude-sonnet-4-5 claude-sonnet-4-5 3.00 15.00 Source: vertex, Context: 200000
vertex claude-sonnet-4-5@20250929 claude-sonnet-4-5@20250929 3.00 15.00 Source: vertex, Context: 200000
vertex claude-opus-4@20250514 claude-opus-4@20250514 15.00 75.00 Source: vertex, Context: 200000
vertex claude-sonnet-4 claude-sonnet-4 3.00 15.00 Source: vertex, Context: 1000000
vertex claude-sonnet-4@20250514 claude-sonnet-4@20250514 3.00 15.00 Source: vertex, Context: 1000000
vertex codestral-2@001 codestral-2@001 0.30 0.90 Source: vertex, Context: 128000
vertex codestral-2 codestral-2 0.30 0.90 Source: vertex, Context: 128000
vertex codestral-2501 codestral-2501 0.20 0.60 Source: vertex, Context: 128000
vertex codestral@2405 codestral@2405 0.20 0.60 Source: vertex, Context: 128000
vertex codestral@latest codestral@latest 0.20 0.60 Source: vertex, Context: 128000
vertex deepseek-v3.1-maas deepseek-v3.1-maas 1.35 5.40 Source: vertex, Context: 163840
vertex deepseek-v3.2-maas deepseek-v3.2-maas 0.56 1.68 Source: vertex, Context: 163840
vertex deepseek-r1-0528-maas deepseek-r1-0528-maas 1.35 5.40 Source: vertex, Context: 65536
vertex imagegeneration@006 imagegeneration@006 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-fast-generate-001 imagen-3.0-fast-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-generate-001 imagen-3.0-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-generate-002 imagen-3.0-generate-002 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-capability-001 imagen-3.0-capability-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-fast-generate-001 imagen-4.0-fast-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-generate-001 imagen-4.0-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex jamba-1.5 jamba-1.5 0.20 0.40 Source: vertex, Context: 256000
vertex jamba-1.5-large jamba-1.5-large 2.00 8.00 Source: vertex, Context: 256000
vertex jamba-1.5-large@001 jamba-1.5-large@001 2.00 8.00 Source: vertex, Context: 256000
vertex jamba-1.5-mini jamba-1.5-mini 0.20 0.40 Source: vertex, Context: 256000
vertex jamba-1.5-mini@001 jamba-1.5-mini@001 0.20 0.40 Source: vertex, Context: 256000
vertex llama-3.1-405b-instruct-maas llama-3.1-405b-instruct-maas 5.00 16.00 Source: vertex, Context: 128000
vertex llama-3.1-70b-instruct-maas llama-3.1-70b-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-3.1-8b-instruct-maas llama-3.1-8b-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-3.2-90b-vision-instruct-maas llama-3.2-90b-vision-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-4-maverick-17b-128e-instruct-maas llama-4-maverick-17b-128e-instruct-maas 0.35 1.15 Source: vertex, Context: 1000000
vertex llama-4-maverick-17b-16e-instruct-maas llama-4-maverick-17b-16e-instruct-maas 0.35 1.15 Source: vertex, Context: 1000000
vertex llama-4-scout-17b-128e-instruct-maas llama-4-scout-17b-128e-instruct-maas 0.25 0.70 Source: vertex, Context: 10000000
vertex llama-4-scout-17b-16e-instruct-maas llama-4-scout-17b-16e-instruct-maas 0.25 0.70 Source: vertex, Context: 10000000
vertex llama3-405b-instruct-maas llama3-405b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex llama3-70b-instruct-maas llama3-70b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex llama3-8b-instruct-maas llama3-8b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex minimax-m2-maas minimax-m2-maas 0.30 1.20 Source: vertex, Context: 196608
vertex kimi-k2-thinking-maas kimi-k2-thinking-maas 0.60 2.50 Source: vertex, Context: 256000
vertex mistral-medium-3 mistral-medium-3 0.40 2.00 Source: vertex, Context: 128000
vertex mistral-medium-3@001 mistral-medium-3@001 0.40 2.00 Source: vertex, Context: 128000
vertex mistral-large-2411 mistral-large-2411 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@2407 mistral-large@2407 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@2411-001 mistral-large@2411-001 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@latest mistral-large@latest 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-nemo@2407 mistral-nemo@2407 3.00 3.00 Source: vertex, Context: 128000
vertex mistral-nemo@latest mistral-nemo@latest 0.15 0.15 Source: vertex, Context: 128000
vertex mistral-small-2503 mistral-small-2503 1.00 3.00 Source: vertex, Context: 128000
vertex mistral-small-2503@001 mistral-small-2503@001 1.00 3.00 Source: vertex, Context: 32000
vertex mistral-ocr-2505 mistral-ocr-2505 0.00 0.00 Source: vertex, Context: N/A
vertex deepseek-ocr-maas deepseek-ocr-maas 0.30 1.20 Source: vertex, Context: N/A
vertex gpt-oss-120b-maas gpt-oss-120b-maas 0.15 0.60 Source: vertex, Context: 131072
vertex gpt-oss-20b-maas gpt-oss-20b-maas 0.08 0.30 Source: vertex, Context: 131072
vertex qwen3-235b-a22b-instruct-2507-maas qwen3-235b-a22b-instruct-2507-maas 0.25 1.00 Source: vertex, Context: 262144
vertex qwen3-coder-480b-a35b-instruct-maas qwen3-coder-480b-a35b-instruct-maas 1.00 4.00 Source: vertex, Context: 262144
vertex qwen3-next-80b-a3b-instruct-maas qwen3-next-80b-a3b-instruct-maas 0.15 1.20 Source: vertex, Context: 262144
vertex qwen3-next-80b-a3b-thinking-maas qwen3-next-80b-a3b-thinking-maas 0.15 1.20 Source: vertex, Context: 262144
vertex veo-2.0-generate-001 veo-2.0-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-fast-generate-preview veo-3.0-fast-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-generate-preview veo-3.0-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-fast-generate-001 veo-3.0-fast-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-generate-001 veo-3.0-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-generate-preview veo-3.1-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-fast-generate-preview veo-3.1-fast-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-generate-001 veo-3.1-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-fast-generate-001 veo-3.1-fast-generate-001 0.00 0.00 Source: vertex, Context: 1024
voyage rerank-2 rerank-2 0.05 0.00 Source: voyage, Context: 16000
voyage rerank-2-lite rerank-2-lite 0.02 0.00 Source: voyage, Context: 8000
voyage rerank-2.5 rerank-2.5 0.05 0.00 Source: voyage, Context: 32000
voyage rerank-2.5-lite rerank-2.5-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-2 voyage-2 0.10 0.00 Source: voyage, Context: 4000
voyage voyage-3 voyage-3 0.06 0.00 Source: voyage, Context: 32000
voyage voyage-3-large voyage-3-large 0.18 0.00 Source: voyage, Context: 32000
voyage voyage-3-lite voyage-3-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-3.5 voyage-3.5 0.06 0.00 Source: voyage, Context: 32000
voyage voyage-3.5-lite voyage-3.5-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-code-2 voyage-code-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-code-3 voyage-code-3 0.18 0.00 Source: voyage, Context: 32000
voyage voyage-context-3 voyage-context-3 0.18 0.00 Source: voyage, Context: 120000
voyage voyage-finance-2 voyage-finance-2 0.12 0.00 Source: voyage, Context: 32000
voyage voyage-large-2 voyage-large-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-law-2 voyage-law-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-lite-01 voyage-lite-01 0.10 0.00 Source: voyage, Context: 4096
voyage voyage-lite-02-instruct voyage-lite-02-instruct 0.10 0.00 Source: voyage, Context: 4000
voyage voyage-multimodal-3 voyage-multimodal-3 0.12 0.00 Source: voyage, Context: 32000
wandb gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: wandb, Context: 131072
wandb gpt-oss-20b gpt-oss-20b 0.05 0.20 Source: wandb, Context: 131072
wandb GLM-4.5 glm-4.5 0.55 2.00 Source: wandb, Context: 131072
wandb DeepSeek-V3.1 deepseek-v3.1 0.55 1.65 Source: wandb, Context: 128000
watsonx granite-3-8b-instruct granite-3-8b-instruct 0.20 0.20 Source: watsonx, Context: 8192
watsonx mistral-large mistral-large 3.00 10.00 Source: watsonx, Context: 131072
watsonx mt0-xxl-13b mt0-xxl-13b 500.00 2000.00 Source: watsonx, Context: 8192
watsonx jais-13b-chat jais-13b-chat 500.00 2000.00 Source: watsonx, Context: 8192
watsonx flan-t5-xl-3b flan-t5-xl-3b 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-13b-chat-v2 granite-13b-chat-v2 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-13b-instruct-v2 granite-13b-instruct-v2 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-3-3-8b-instruct granite-3-3-8b-instruct 0.20 0.20 Source: watsonx, Context: 8192
watsonx granite-4-h-small granite-4-h-small 0.06 0.25 Source: watsonx, Context: 20480
watsonx granite-guardian-3-2-2b granite-guardian-3-2-2b 0.10 0.10 Source: watsonx, Context: 8192
watsonx granite-guardian-3-3-8b granite-guardian-3-3-8b 0.20 0.20 Source: watsonx, Context: 8192
watsonx granite-ttm-1024-96-r2 granite-ttm-1024-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-ttm-1536-96-r2 granite-ttm-1536-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-ttm-512-96-r2 granite-ttm-512-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-vision-3-2-2b granite-vision-3-2-2b 0.10 0.10 Source: watsonx, Context: 8192
watsonx llama-3-2-11b-vision-instruct llama-3-2-11b-vision-instruct 0.35 0.35 Source: watsonx, Context: 128000
watsonx llama-3-2-1b-instruct llama-3-2-1b-instruct 0.10 0.10 Source: watsonx, Context: 128000
watsonx llama-3-2-3b-instruct llama-3-2-3b-instruct 0.15 0.15 Source: watsonx, Context: 128000
watsonx llama-3-2-90b-vision-instruct llama-3-2-90b-vision-instruct 2.00 2.00 Source: watsonx, Context: 128000
watsonx llama-3-3-70b-instruct llama-3-3-70b-instruct 0.71 0.71 Source: watsonx, Context: 128000
watsonx llama-4-maverick-17b llama-4-maverick-17b 0.35 1.40 Source: watsonx, Context: 128000
watsonx llama-guard-3-11b-vision llama-guard-3-11b-vision 0.35 0.35 Source: watsonx, Context: 128000
watsonx mistral-medium-2505 mistral-medium-2505 3.00 10.00 Source: watsonx, Context: 128000
watsonx mistral-small-2503 mistral-small-2503 0.10 0.30 Source: watsonx, Context: 32000
watsonx mistral-small-3-1-24b-instruct-2503 mistral-small-3-1-24b-instruct-2503 0.10 0.30 Source: watsonx, Context: 32000
watsonx pixtral-12b-2409 pixtral-12b-2409 0.35 0.35 Source: watsonx, Context: 128000
watsonx gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: watsonx, Context: 8192
watsonx allam-1-13b-instruct allam-1-13b-instruct 1.80 1.80 Source: watsonx, Context: 8192
watsonx whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Source: watsonx, Context: N/A
openai whisper-1 whisper-1 0.00 0.00 Source: openai, Context: N/A
xai grok-3-beta grok-3-beta 3.00 15.00 Source: xai, Context: 131072
xai grok-3-fast-beta grok-3-fast-beta 5.00 25.00 Source: xai, Context: 131072
xai grok-3-mini-beta grok-3-mini-beta 0.30 0.50 Source: xai, Context: 131072
xai grok-3-mini-fast-beta grok-3-mini-fast-beta 0.60 4.00 Source: xai, Context: 131072
xai grok-4-fast-reasoning grok-4-fast-reasoning 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-0709 grok-4-0709 3.00 15.00 Source: xai, Context: 256000
xai grok-4-latest grok-4-latest 3.00 15.00 Source: xai, Context: 256000
xai grok-4-1-fast-reasoning grok-4-1-fast-reasoning 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-1-fast-reasoning-latest grok-4-1-fast-reasoning-latest 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-1-fast-non-reasoning-latest grok-4-1-fast-non-reasoning-latest 0.20 0.50 Source: xai, Context: 2000000
xai grok-code-fast grok-code-fast 0.20 1.50 Source: xai, Context: 256000
xai grok-code-fast-1-0825 grok-code-fast-1-0825 0.20 1.50 Source: xai, Context: 256000
vertex search_api search_api 0.00 0.00 Source: vertex, Context: N/A
openai container container 0.00 0.00 Source: openai, Context: N/A
openai sora-2 sora-2 0.00 0.00 Source: openai, Context: N/A
openai sora-2-pro sora-2-pro 0.00 0.00 Source: openai, Context: N/A
azure sora-2 sora-2 0.00 0.00 Source: azure, Context: N/A
azure sora-2-pro sora-2-pro 0.00 0.00 Source: azure, Context: N/A
azure sora-2-pro-high-res sora-2-pro-high-res 0.00 0.00 Source: azure, Context: N/A
runwayml gen4_turbo gen4_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_aleph gen4_aleph 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen3a_turbo gen3a_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_image gen4_image 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_image_turbo gen4_image_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml eleven_multilingual_v2 eleven_multilingual_v2 0.00 0.00 Source: runwayml, Context: N/A
fireworksai flux-kontext-pro flux-kontext-pro 0.04 0.04 Source: fireworks_ai, Context: 4096
fireworksai SSD-1B ssd-1b 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai chronos-hermes-13b-v2 chronos-hermes-13b-v2 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai code-llama-13b code-llama-13b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-13b-instruct code-llama-13b-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-13b-python code-llama-13b-python 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b code-llama-34b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b-instruct code-llama-34b-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b-python code-llama-34b-python 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-70b code-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-70b-instruct code-llama-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-70b-python code-llama-70b-python 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-7b code-llama-7b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-7b-instruct code-llama-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-7b-python code-llama-7b-python 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-qwen-1p5-7b code-qwen-1p5-7b 0.20 0.20 Source: fireworks_ai, Context: 65536
fireworksai codegemma-2b codegemma-2b 0.10 0.10 Source: fireworks_ai, Context: 8192
fireworksai codegemma-7b codegemma-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai cogito-671b-v2-p1 cogito-671b-v2-p1 1.20 1.20 Source: fireworks_ai, Context: 163840
fireworksai cogito-v1-preview-llama-3b cogito-v1-preview-llama-3b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-llama-70b cogito-v1-preview-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-llama-8b cogito-v1-preview-llama-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-qwen-14b cogito-v1-preview-qwen-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-qwen-32b cogito-v1-preview-qwen-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai flux-kontext-max flux-kontext-max 0.08 0.08 Source: fireworks_ai, Context: 4096
fireworksai dbrx-instruct dbrx-instruct 1.20 1.20 Source: fireworks_ai, Context: 32768
fireworksai deepseek-coder-1b-base deepseek-coder-1b-base 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai deepseek-coder-33b-instruct deepseek-coder-33b-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai deepseek-coder-7b-base deepseek-coder-7b-base 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-7b-base-v1p5 deepseek-coder-7b-base-v1p5 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-7b-instruct-v1p5 deepseek-coder-7b-instruct-v1p5 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-v2-lite-base deepseek-coder-v2-lite-base 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-coder-v2-lite-instruct deepseek-coder-v2-lite-instruct 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-prover-v2 deepseek-prover-v2 1.20 1.20 Source: fireworks_ai, Context: 163840
fireworksai deepseek-r1-0528-distill-qwen3-8b deepseek-r1-0528-distill-qwen3-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-llama-8b deepseek-r1-distill-llama-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-14b deepseek-r1-distill-qwen-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-1p5b deepseek-r1-distill-qwen-1p5b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-32b deepseek-r1-distill-qwen-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-7b deepseek-r1-distill-qwen-7b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-v2-lite-chat deepseek-v2-lite-chat 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-v2p5 deepseek-v2p5 1.20 1.20 Source: fireworks_ai, Context: 32768
fireworksai devstral-small-2505 devstral-small-2505 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dobby-mini-unhinged-plus-llama-3-1-8b dobby-mini-unhinged-plus-llama-3-1-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai dobby-unhinged-llama-3-3-70b-new dobby-unhinged-llama-3-3-70b-new 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dolphin-2-9-2-qwen2-72b dolphin-2-9-2-qwen2-72b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dolphin-2p6-mixtral-8x7b dolphin-2p6-mixtral-8x7b 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai ernie-4p5-21b-a3b-pt ernie-4p5-21b-a3b-pt 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai ernie-4p5-300b-a47b-pt ernie-4p5-300b-a47b-pt 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai fare-20b fare-20b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai firefunction-v1 firefunction-v1 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai firellava-13b firellava-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai firesearch-ocr-v6 firesearch-ocr-v6 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai fireworks-asr-large fireworks-asr-large 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai fireworks-asr-v2 fireworks-asr-v2 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev flux-1-dev 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev-controlnet-union flux-1-dev-controlnet-union 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev-fp8 flux-1-dev-fp8 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-schnell flux-1-schnell 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai flux-1-schnell-fp8 flux-1-schnell-fp8 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai gemma-2b-it gemma-2b-it 0.10 0.10 Source: fireworks_ai, Context: 8192
fireworksai gemma-3-27b-it gemma-3-27b-it 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai gemma-7b gemma-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai gemma-7b-it gemma-7b-it 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai gemma2-9b-it gemma2-9b-it 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai glm-4p5v glm-4p5v 1.20 1.20 Source: fireworks_ai, Context: 131072
fireworksai gpt-oss-safeguard-120b gpt-oss-safeguard-120b 1.20 1.20 Source: fireworks_ai, Context: 131072
fireworksai gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.50 0.50 Source: fireworks_ai, Context: 131072
fireworksai hermes-2-pro-mistral-7b hermes-2-pro-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai internvl3-38b internvl3-38b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai internvl3-78b internvl3-78b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai internvl3-8b internvl3-8b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai japanese-stable-diffusion-xl japanese-stable-diffusion-xl 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai kat-coder kat-coder 0.90 0.90 Source: fireworks_ai, Context: 262144
fireworksai kat-dev-32b kat-dev-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai kat-dev-72b-exp kat-dev-72b-exp 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-guard-2-8b llama-guard-2-8b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-guard-3-1b llama-guard-3-1b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-guard-3-8b llama-guard-3-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai llama-v2-13b llama-v2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-13b-chat llama-v2-13b-chat 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-70b llama-v2-70b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-70b-chat llama-v2-70b-chat 0.90 0.90 Source: fireworks_ai, Context: 2048
fireworksai llama-v2-7b llama-v2-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-7b-chat llama-v2-7b-chat 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v3-70b-instruct llama-v3-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-70b-instruct-hf llama-v3-70b-instruct-hf 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-8b llama-v3-8b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-8b-instruct-hf llama-v3-8b-instruct-hf 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-v3p1-405b-instruct-long llama-v3p1-405b-instruct-long 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v3p1-70b-instruct llama-v3p1-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p1-70b-instruct-1b llama-v3p1-70b-instruct-1b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v3p1-nemotron-70b-instruct llama-v3p1-nemotron-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p2-1b llama-v3p2-1b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p2-3b llama-v3p2-3b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p3-70b-instruct llama-v3p3-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llamaguard-7b llamaguard-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llava-yi-34b llava-yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai minimax-m1-80k minimax-m1-80k 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai ministral-3-14b-instruct-2512 ministral-3-14b-instruct-2512 0.20 0.20 Source: fireworks_ai, Context: 256000
fireworksai ministral-3-3b-instruct-2512 ministral-3-3b-instruct-2512 0.10 0.10 Source: fireworks_ai, Context: 256000
fireworksai ministral-3-8b-instruct-2512 ministral-3-8b-instruct-2512 0.20 0.20 Source: fireworks_ai, Context: 256000
fireworksai mistral-7b mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-4k mistral-7b-instruct-4k 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-v0p2 mistral-7b-instruct-v0p2 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-v3 mistral-7b-instruct-v3 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-v0p2 mistral-7b-v0p2 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-large-3-fp8 mistral-large-3-fp8 1.20 1.20 Source: fireworks_ai, Context: 256000
fireworksai mistral-nemo-base-2407 mistral-nemo-base-2407 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai mistral-nemo-instruct-2407 mistral-nemo-instruct-2407 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai mistral-small-24b-instruct-2501 mistral-small-24b-instruct-2501 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x22b mixtral-8x22b 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai mixtral-8x22b-instruct mixtral-8x22b-instruct 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai mixtral-8x7b mixtral-8x7b 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x7b-instruct mixtral-8x7b-instruct 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x7b-instruct-hf mixtral-8x7b-instruct-hf 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mythomax-l2-13b mythomax-l2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nemotron-nano-v2-12b-vl nemotron-nano-v2-12b-vl 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai nous-capybara-7b-v1p9 nous-capybara-7b-v1p9 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai nous-hermes-2-mixtral-8x7b-dpo nous-hermes-2-mixtral-8x7b-dpo 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai nous-hermes-2-yi-34b nous-hermes-2-yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-13b nous-hermes-llama2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-70b nous-hermes-llama2-70b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-7b nous-hermes-llama2-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nvidia-nemotron-nano-12b-v2 nvidia-nemotron-nano-12b-v2 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai nvidia-nemotron-nano-9b-v2 nvidia-nemotron-nano-9b-v2 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai openchat-3p5-0106-7b openchat-3p5-0106-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai openhermes-2-mistral-7b openhermes-2-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai openhermes-2p5-mistral-7b openhermes-2p5-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai openorca-7b openorca-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai phi-2-3b phi-2-3b 0.10 0.10 Source: fireworks_ai, Context: 2048
fireworksai phi-3-mini-128k-instruct phi-3-mini-128k-instruct 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai phi-3-vision-128k-instruct phi-3-vision-128k-instruct 0.20 0.20 Source: fireworks_ai, Context: 32064
fireworksai phind-code-llama-34b-python-v1 phind-code-llama-34b-python-v1 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai phind-code-llama-34b-v1 phind-code-llama-34b-v1 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai phind-code-llama-34b-v2 phind-code-llama-34b-v2 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai playground-v2-1024px-aesthetic playground-v2-1024px-aesthetic 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai playground-v2-5-1024px-aesthetic playground-v2-5-1024px-aesthetic 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai pythia-12b pythia-12b 0.20 0.20 Source: fireworks_ai, Context: 2048
fireworksai qwen-qwq-32b-preview qwen-qwq-32b-preview 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen-v2p5-14b-instruct qwen-v2p5-14b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen-v2p5-7b qwen-v2p5-7b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai qwen1p5-72b-chat qwen1p5-72b-chat 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2-7b-instruct qwen2-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-2b-instruct qwen2-vl-2b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-72b-instruct qwen2-vl-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-7b-instruct qwen2-vl-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-0p5b-instruct qwen2p5-0p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-14b qwen2p5-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-1p5b-instruct qwen2p5-1p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-32b qwen2p5-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-32b-instruct qwen2p5-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-72b qwen2p5-72b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-72b-instruct qwen2p5-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-7b-instruct qwen2p5-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-0p5b qwen2p5-coder-0p5b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-0p5b-instruct qwen2p5-coder-0p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-14b qwen2p5-coder-14b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-14b-instruct qwen2p5-coder-14b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-1p5b qwen2p5-coder-1p5b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-1p5b-instruct qwen2p5-coder-1p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b qwen2p5-coder-32b 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct-128k qwen2p5-coder-32b-instruct-128k 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-coder-32b-instruct-32k-rope qwen2p5-coder-32b-instruct-32k-rope 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct-64k qwen2p5-coder-32b-instruct-64k 0.90 0.90 Source: fireworks_ai, Context: 65536
fireworksai qwen2p5-coder-3b qwen2p5-coder-3b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-3b-instruct qwen2p5-coder-3b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-7b qwen2p5-coder-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-7b-instruct qwen2p5-coder-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-math-72b-instruct qwen2p5-math-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen2p5-vl-32b-instruct qwen2p5-vl-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-3b-instruct qwen2p5-vl-3b-instruct 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-72b-instruct qwen2p5-vl-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-7b-instruct qwen2p5-vl-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai qwen3-0p6b qwen3-0p6b 0.10 0.10 Source: fireworks_ai, Context: 40960
fireworksai qwen3-14b qwen3-14b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-1p7b qwen3-1p7b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai qwen3-1p7b-fp8-draft qwen3-1p7b-fp8-draft 0.10 0.10 Source: fireworks_ai, Context: 262144
fireworksai qwen3-1p7b-fp8-draft-131072 qwen3-1p7b-fp8-draft-131072 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai qwen3-1p7b-fp8-draft-40960 qwen3-1p7b-fp8-draft-40960 0.10 0.10 Source: fireworks_ai, Context: 40960
fireworksai qwen3-235b-a22b-instruct-2507 qwen3-235b-a22b-instruct-2507 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-235b-a22b-thinking-2507 qwen3-235b-a22b-thinking-2507 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-30b-a3b qwen3-30b-a3b 0.15 0.60 Source: fireworks_ai, Context: 131072
fireworksai qwen3-30b-a3b-instruct-2507 qwen3-30b-a3b-instruct-2507 0.50 0.50 Source: fireworks_ai, Context: 262144
fireworksai qwen3-30b-a3b-thinking-2507 qwen3-30b-a3b-thinking-2507 0.90 0.90 Source: fireworks_ai, Context: 262144
fireworksai qwen3-32b qwen3-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen3-4b qwen3-4b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-4b-instruct-2507 qwen3-4b-instruct-2507 0.20 0.20 Source: fireworks_ai, Context: 262144
fireworksai qwen3-8b qwen3-8b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-coder-30b-a3b-instruct qwen3-coder-30b-a3b-instruct 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-coder-480b-instruct-bf16 qwen3-coder-480b-instruct-bf16 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-embedding-0p6b qwen3-embedding-0p6b 0.00 0.00 Source: fireworks_ai, Context: 32768
fireworksai qwen3-embedding-4b qwen3-embedding-4b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-embedding-8b qwen3-embedding-8b 0.10 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-next-80b-a3b-instruct qwen3-next-80b-a3b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-next-80b-a3b-thinking qwen3-next-80b-a3b-thinking 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-reranker-0p6b qwen3-reranker-0p6b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-reranker-4b qwen3-reranker-4b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-reranker-8b qwen3-reranker-8b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-vl-235b-a22b-instruct qwen3-vl-235b-a22b-instruct 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-235b-a22b-thinking qwen3-vl-235b-a22b-thinking 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-30b-a3b-instruct qwen3-vl-30b-a3b-instruct 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-30b-a3b-thinking qwen3-vl-30b-a3b-thinking 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-32b-instruct qwen3-vl-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-vl-8b-instruct qwen3-vl-8b-instruct 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai qwq-32b qwq-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai rolm-ocr rolm-ocr 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai snorkel-mistral-7b-pairrm-dpo snorkel-mistral-7b-pairrm-dpo 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai stable-diffusion-xl-1024-v1-0 stable-diffusion-xl-1024-v1-0 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai stablecode-3b stablecode-3b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai starcoder-16b starcoder-16b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai starcoder-7b starcoder-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai starcoder2-15b starcoder2-15b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai starcoder2-3b starcoder2-3b 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai starcoder2-7b starcoder2-7b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai toppy-m-7b toppy-m-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai whisper-v3 whisper-v3 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai whisper-v3-turbo whisper-v3-turbo 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai yi-34b yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-34b-200k-capybara yi-34b-200k-capybara 0.90 0.90 Source: fireworks_ai, Context: 200000
fireworksai yi-34b-chat yi-34b-chat 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-6b yi-6b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai zephyr-7b-beta zephyr-7b-beta 0.20 0.20 Source: fireworks_ai, Context: 32768
openrouter ByteDance Seed: Seed 1.6 Flash seed-1.6-flash 0.08 0.30 Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens. Context: 262144
openrouter ByteDance Seed: Seed 1.6 seed-1.6 0.25 2.00 Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window. Context: 262144
openrouter MiniMax: MiniMax M2.1 minimax-m2.1 0.12 0.48 MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency. Compared to its predecessor, M2.1 delivers cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
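A minimal sketch of the reasoning-preservation pattern recommended above, against the OpenRouter chat-completions endpoint. The `minimax/minimax-m2.1` slug, the prompts, and the response handling are illustrative assumptions, not a verified integration:

```python
# Sketch: pass reasoning_details back verbatim on the next turn (assumed slug/prompts).
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

messages = [{"role": "user", "content": "Refactor the retry loop in utils.py."}]
first = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",  # assumed OpenRouter slug
    "messages": messages,
}).json()

reply = first["choices"][0]["message"]
messages.append({
    "role": "assistant",
    "content": reply["content"],
    # Return the model's reasoning untouched so multi-turn quality holds up.
    "reasoning_details": reply.get("reasoning_details", []),
})
messages.append({"role": "user", "content": "Now add a unit test for it."})
second = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",
    "messages": messages,
}).json()
print(second["choices"][0]["message"]["content"])
```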
openrouter Z.AI: GLM 4.7 glm-4.7 0.16 0.80 GLM-4.7 is Z.AI’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics. Context: 202752
openrouter Google: Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models. Context: 1048576
openrouter Mistral: Mistral Small Creative mistral-small-creative 0.10 0.30 Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents. Context: 32768
openrouter AllenAI: Olmo 3.1 32B Think (free) olmo-3.1-32b-think:free 0.00 0.00 Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology. Context: 65536
openrouter Xiaomi: MiMo-V2-Flash (free) mimo-v2-flash:free 0.00 0.00 MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much. Note: when integrating with agentic tools such as Claude Code, Cline, or Roo Code, **turn off reasoning mode** for the best and fastest performance; this model is deeply optimized for this scenario. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean, as sketched below. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config). Context: 262144
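A minimal sketch of that toggle, assuming the standard OpenRouter chat-completions endpoint; the vendor-prefixed slug is an assumption:

```python
# Sketch: disable reasoning mode for coding-agent use, per the note above.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "xiaomi/mimo-v2-flash:free",  # assumed vendor prefix
        "messages": [{"role": "user", "content": "Summarize this stack trace."}],
        "reasoning": {"enabled": False},  # fastest path for agentic tooling
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```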
openrouter NVIDIA: Nemotron 3 Nano 30B A3B (free) nemotron-3-nano-30b-a3b:free 0.00 0.00 NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security. Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems. Context: 256000
openrouter NVIDIA: Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.06 0.24 NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security. Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems. Context: 262144
openrouter OpenAI: GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
openrouter OpenAI: GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter OpenAI: GPT-5.2 gpt-5.2 1.75 14.00 GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability. Context: 400000
openrouter Mistral: Devstral 2 2512 (free) devstral-2512:free 0.00 0.00 Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
openrouter Mistral: Devstral 2 2512 devstral-2512 0.05 0.22 Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
openrouter Relace: Relace Search relace-search 1.00 3.00 The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It's designed to serve as a subagent that passes its findings to an "oracle" coding agent, who orchestrates/performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the [Relace documentation](https://docs.relace.ai/docs/fast-agentic-search/agent). Context: 256000
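A minimal sketch of the subagent-to-oracle handoff described above, assuming both models are called through OpenRouter. The slugs, the prompt wiring, and treating the raw reply text as the findings are all simplifying assumptions; the real harness contract is in the Relace documentation:

```python
# Sketch: relace-search locates relevant code, an "oracle" model does the work.
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def chat(model: str, prompt: str) -> str:
    """One-shot chat-completion call; returns the assistant's text."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return requests.post(URL, headers=HEADERS, json=body).json()["choices"][0]["message"]["content"]

task = "Where is the retry logic for HTTP uploads implemented, and is it tested?"
findings = chat("relace/relace-search", task)  # assumed slug; subagent search step
answer = chat(
    "anthropic/claude-opus-4.5",               # any strong coding model as the oracle
    f"Code-search findings:\n{findings}\n\nTask: {task}",
)
print(answer)
```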
openrouter Z.AI: GLM 4.6V glm-4.6v 0.30 0.90 GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing. Context: 131072
openrouter Nex AGI: DeepSeek V3.1 Nex N1 (free) deepseek-v3.1-nex-n1:free 0.00 0.00 DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity. Nex-N1 demonstrates competitive performance across all evaluation scenarios, showing particularly strong results in practical coding and HTML generation tasks. Context: 131072
openrouter EssentialAI: Rnj 1 Instruct rnj-1-instruct 0.15 0.15 Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent). Context: 32768
openrouter Body Builder (beta) bodybuilder -1,000,000.00 -1,000,000.00 Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example: "count to 10 using gemini and opus." This is useful for creating multi-model requests, custom model routers, or programmatic generation of API calls from human descriptions. **BETA NOTICE**: Body Builder is in beta, and currently free. Pricing and functionality may change in the future. Context: 128000
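A minimal sketch using the example prompt from the description; the `openrouter/bodybuilder` slug and the assumption that the request objects come back in the message content are unverified:

```python
# Sketch: turn a natural-language request into OpenRouter API request objects.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openrouter/bodybuilder",  # assumed slug for this beta model
        "messages": [{"role": "user", "content": "count to 10 using gemini and opus"}],
    },
)
# Expect structured request objects describing the two model calls; parse per the docs.
print(resp.json()["choices"][0]["message"]["content"])
```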
openrouter OpenAI: GPT-5.1-Codex-Max gpt-5.1-codex-max 1.25 10.00 GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research. GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle. Context: 400000
openrouter Amazon: Nova 2 Lite nova-2-lite-v1 0.30 2.50 Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows. Context: 1000000
openrouter Mistral: Ministral 3 14B 2512 ministral-14b-2512 0.20 0.20 The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. Context: 262144
openrouter Mistral: Ministral 3 8B 2512 ministral-8b-2512 0.15 0.15 A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. Context: 262144
openrouter Mistral: Ministral 3 3B 2512 ministral-3b-2512 0.10 0.10 The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. Context: 131072
openrouter Mistral: Mistral Large 3 2512 mistral-large-2512 0.50 1.50 Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license. Context: 262144
openrouter Arcee AI: Trinity Mini (free) trinity-mini:free 0.00 0.00 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
openrouter Arcee AI: Trinity Mini trinity-mini 0.05 0.15 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
openrouter DeepSeek: DeepSeek V3.2 Speciale deepseek-v3.2-speciale 0.27 0.41 DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context processing, then scales post-training reinforcement learning to push capability beyond the base model. Reported evaluations place Speciale ahead of GPT-5 on difficult reasoning workloads, with proficiency comparable to Gemini-3.0-Pro, while retaining strong coding and tool-use reliability. Like V3.2, it benefits from a large-scale agentic task synthesis pipeline that improves compliance and generalization in interactive environments. Context: 163840
openrouter DeepSeek: DeepSeek V3.2 deepseek-v3.2 0.25 0.38 DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 163840
openrouter Prime Intellect: INTELLECT-3 intellect-3 0.20 1.10 INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL). It offers state-of-the-art performance for its size across math, code, science, and general reasoning, consistently outperforming many larger frontier models. Designed for strong multi-step problem solving, it maintains high accuracy on structured tasks while remaining efficient at inference thanks to its MoE architecture. Context: 131072
openrouter TNG: R1T Chimera tng-r1t-chimera 0.25 0.85 TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter. Characteristics and improvements include: We think that it has a creative and pleasant personality. It has a preliminary EQ-Bench3 value of about 1305. It is quite a bit more intelligent than the original, albeit slightly slower. It is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated. Tool calling is much improved. TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1). Context: 163840
openrouter Anthropic: Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks. Context: 200000
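A minimal sketch of the verbosity control named above, assuming the standard OpenRouter chat-completions endpoint and a vendor-prefixed slug:

```python
# Sketch: trade output depth for token efficiency via verbosity (low|medium|high).
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.5",  # assumed vendor-prefixed slug
        "messages": [{"role": "user", "content": "Triage the failures in this CI log."}],
        "verbosity": "low",  # favor terse answers and lower token spend
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```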
openrouter AllenAI: Olmo 3 32B Think (free) olmo-3-32b-think:free 0.00 0.00 Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology. Context: 65536
openrouter AllenAI: Olmo 3 7B Instruct olmo-3-7b-instruct 0.10 0.20 Olmo 3 7B Instruct is a supervised instruction-fine-tuned variant of the Olmo 3 7B base model, optimized for instruction-following, question-answering, and natural conversational dialogue. By leveraging high-quality instruction data and an open training pipeline, it delivers strong performance across everyday NLP tasks while remaining accessible and easy to integrate. Developed by Ai2 under the Apache 2.0 license, the model offers a transparent, community-friendly option for instruction-driven applications. Context: 65536
openrouter AllenAI: Olmo 3 7B Think olmo-3-7b-think 0.12 0.20 Olmo 3 7B Think is a research-oriented language model in the Olmo family designed for advanced reasoning and instruction-driven tasks. It excels at multi-step problem solving, logical inference, and maintaining coherent conversational context. Developed by Ai2 under the Apache 2.0 license, Olmo 3 7B Think supports transparent, fully open experimentation and provides a lightweight yet capable foundation for academic research and practical NLP workflows. Context: 65536
openrouter Google: Nano Banana Pro (Gemini 3 Pro Image Preview) gemini-3-pro-image-preview 2.00 12.00 Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding. It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows. Context: 65536
openrouter xAI: Grok 4.1 Fast grok-4.1-fast 0.20 0.50 Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
openrouter Google: Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing. Context: 1048576
openrouter Deep Cogito: Cogito v2.1 671B cogito-v2.1-671b 1.25 1.25 Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning to reach state-of-the-art performance on multiple categories (instruction following, coding, longer queries and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement. Context: 128000
openrouter OpenAI: GPT-5.1 gpt-5.1 1.25 10.00 GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5. Context: 400000
openrouter OpenAI: GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
openrouter OpenAI: GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
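A minimal sketch of the `reasoning.effort` knob mentioned above; the slug and prompt are placeholders:

```python
# Sketch: raise reasoning effort for a large, multi-step engineering task.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-5.1-codex",  # assumed vendor-prefixed slug
        "messages": [{"role": "user", "content": "Plan and implement the schema migration."}],
        "reasoning": {"effort": "high"},  # spend more depth; use "low" for quick edits
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```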
openrouter OpenAI: GPT-5.1-Codex-Mini gpt-5.1-codex-mini 0.25 2.00 GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex. Context: 400000
openrouter Kwaipilot: KAT-Coder-Pro V1 (free) kat-coder-pro:free 0.00 0.00 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
openrouter Kwaipilot: KAT-Coder-Pro V1 kat-coder-pro 0.21 0.83 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
openrouter MoonshotAI: Kimi K2 Thinking kimi-k2-thinking 0.32 0.48 Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports a 256K-token context window. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift. It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. With MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks. Context: 262144
openrouter Amazon: Nova Premier 1.0 nova-premier-v1 2.50 12.50 Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models. Context: 1000000
openrouter Perplexity: Sonar Pro Search sonar-pro-search 3.00 15.00 Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform. Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro: instead of a single query plus synthesis, it plans and executes entire research workflows using tools. Context: 200000
openrouter Mistral: Voxtral Small 24B 2507 voxtral-small-24b-2507 0.10 0.30 Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio is priced at $100 per million seconds. Context: 32000
openrouter OpenAI: gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.08 0.30 gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling. Learn more about this model in OpenAI's gpt-oss-safeguard [user guide](https://cookbook.openai.com/articles/gpt-oss-safeguard-guide). Context: 131072
openrouter NVIDIA: Nemotron Nano 12B 2 VL (free) nemotron-nano-12b-v2-vl:free 0.00 0.00 NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 128000
openrouter NVIDIA: Nemotron Nano 12B 2 VL nemotron-nano-12b-v2-vl 0.20 0.60 NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 131072
openrouter MiniMax: MiniMax M2 minimax-m2 0.20 1.00 MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency. The model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors. Benchmarked by [Artificial Analysis](https://artificialanalysis.ai/models/minimax-m2), MiniMax-M2 ranks among the top open-source models for composite intelligence, spanning mathematics, science, and instruction-following. Its small activation footprint enables fast inference, high concurrency, and improved unit economics, making it well-suited for large-scale agents, developer assistants, and reasoning-driven applications that require responsiveness and cost efficiency. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
openrouter Qwen: Qwen3 VL 32B Instruct qwen3-vl-32b-instruct 0.50 1.50 Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks. Context: 262144
openrouter LiquidAI/LFM2-8B-A1B lfm2-8b-a1b 0.05 0.10 Model created via inbox interface Context: 32768
openrouter LiquidAI/LFM2-2.6B lfm-2.2-6b 0.05 0.10 LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Context: 32768
openrouter IBM: Granite 4.0 Micro granite-4.0-h-micro 0.02 0.11 Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family, the latest series of models released by IBM, fine-tuned for long-context tool calling. Context: 131000
openrouter Deep Cogito: Cogito V2 Preview Llama 405B cogito-v2-preview-llama-405b 3.50 3.50 Cogito v2 405B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. It represents a significant step toward frontier intelligence with dense architecture delivering performance competitive with leading closed models. This advanced reasoning system combines policy improvement with massive scale for exceptional capabilities. Context: 32768
openrouter OpenAI: GPT-5 Image Mini gpt-5-image-mini 2.50 2.00 GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text rendering, and detailed image editing with reduced latency and cost. It excels at high-quality visual creation while maintaining strong text understanding, making it ideal for applications that require both efficient image generation and text processing at scale. Context: 400000
openrouter Anthropic: Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment. Context: 200000
openrouter Qwen: Qwen3 VL 8B Thinking qwen3-vl-8b-thinking 0.18 2.10 Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs. Context: 256000
openrouter Qwen: Qwen3 VL 8B Instruct qwen3-vl-8b-instruct 0.08 0.50 Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions. Context: 131072
openrouter OpenAI: GPT-5 Image gpt-5-image 10.00 10.00 [GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following, text rendering, and detailed image editing. Context: 400000
openrouter OpenAI: o3 Deep Research o3-deep-research 10.00 40.00 o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: this model always uses the 'web_search' tool, which incurs additional cost. Context: 200000
openrouter OpenAI: o4 Mini Deep Research o4-mini-deep-research 2.00 8.00 o4-mini-deep-research is OpenAI's faster, more affordable deep research model, ideal for tackling complex, multi-step research tasks. Note: this model always uses the 'web_search' tool, which incurs additional cost. Context: 200000
openrouter NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 llama-3.3-nemotron-super-49b-v1.5 0.10 0.40 Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter. Context: 131072
openrouter Baidu: ERNIE 4.5 21B A3B Thinking ernie-4.5-21b-a3b-thinking 0.07 0.28 ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks. Context: 131072
openrouter Google: Gemini 2.5 Flash Image (Nano Banana) gemini-2.5-flash-image 0.30 2.50 Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration) Context: 32768
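A minimal sketch of the aspect-ratio control mentioned in the entry above, against OpenRouter's OpenAI-compatible chat completions endpoint. The `image_config` shape follows the linked docs page; the `modalities` field, the exact field names, and the vendor-prefixed model slug should be treated as assumptions to verify against that page.

```python
import requests

# Sketch: generating a 16:9 image with Gemini 2.5 Flash Image via OpenRouter.
# image_config/aspect_ratio follows the docs linked above; the "google/" slug
# prefix and the response field names are assumptions.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-2.5-flash-image",
        "messages": [{"role": "user", "content": "A watercolor banana on a space station"}],
        "modalities": ["image", "text"],           # request image output
        "image_config": {"aspect_ratio": "16:9"},  # aspect-ratio control
    },
)
# Generated images are returned on the assistant message, base64-encoded.
print(resp.json()["choices"][0]["message"].get("images"))
```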
openrouter Qwen: Qwen3 VL 30B A3B Thinking qwen3-vl-30b-a3b-thinking 0.20 1.00 Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 131072
openrouter Qwen: Qwen3 VL 30B A3B Instruct qwen3-vl-30b-a3b-instruct 0.15 0.60 Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 262144
openrouter OpenAI: GPT-5 Pro gpt-5-pro 15.00 120.00 GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter Z.AI: GLM 4.6 glm-4.6 0.35 1.50 Compared with GLM-4.5, this generation brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: the model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks. Refined writing: better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 202752
openrouter Z.AI: GLM 4.6 (exacto) glm-4.6:exacto 0.44 1.76 Compared with GLM-4.5, this generation brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: the model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks. Refined writing: better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 204800
openrouter Anthropic: Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use. Context: 1000000
openrouter DeepSeek: DeepSeek V3.2 Exp deepseek-v3.2-exp 0.21 0.32 DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs. Context: 163840
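Several entries in this list (this one, plus GLM-4.5, Hermes 4, Grok 4 Fast, and others below) expose the same `reasoning` `enabled` toggle. A minimal sketch against OpenRouter's OpenAI-compatible endpoint; the field shape follows the docs linked above, and the vendor-prefixed model slug is an assumption.

```python
import requests

# Sketch: disabling the reasoning trace for a hybrid model via the
# `reasoning.enabled` boolean described above (shape per the linked docs).
# The "deepseek/" slug prefix is an assumption; check the model page.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-v3.2-exp",
        "messages": [{"role": "user", "content": "Summarize sparse attention in two sentences."}],
        "reasoning": {"enabled": False},  # set True to get thinking-mode output
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```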
openrouter TheDrummer: Cydonia 24B V4.1 cydonia-24b-v4.1 0.30 0.50 Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence. Context: 131072
openrouter Relace: Relace Apply 3 relace-apply-3 0.85 1.25 Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at 10,000 tokens/sec on average. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Relace. Learn more about this model in their [documentation](https://docs.relace.ai/api-reference/instant-apply/apply) Context: 256000
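The required prompt format above is mechanical to assemble. A minimal sketch follows; the instruction, code, and edit snippet are illustrative only, and the request plumbing around the prompt is omitted.

```python
# Sketch: building a Relace Apply 3 prompt in the format the entry above
# requires. The instruction/code/update values here are illustrative only.
instruction = "Add a docstring to add()."
initial_code = "def add(a, b):\n    return a + b\n"
edit_snippet = (
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
    "    return a + b\n"
)

prompt = (
    f"<instruction>{instruction}</instruction>\n"
    f"<code>{initial_code}</code>\n"
    f"<update>{edit_snippet}</update>"
)
print(prompt)
```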
openrouter Google: Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
openrouter Google: Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
openrouter Qwen: Qwen3 VL 235B A22B Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
openrouter Qwen: Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.12 0.56 Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
openrouter Qwen: Qwen3 Max qwen3-max 1.20 6.00 Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. Context: 256000
openrouter Qwen: Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
openrouter OpenAI: GPT-5 Codex gpt-5-codex 1.25 10.00 GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
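A minimal sketch of the `reasoning.effort` knob mentioned above, as it would sit in an OpenRouter request body. The effort levels follow the linked docs; which levels a given provider accepts is an assumption to verify there.

```python
# Sketch: raising reasoning effort for a long refactoring task.
payload = {
    "model": "openai/gpt-5-codex",  # vendor-prefixed slug assumed
    "messages": [{"role": "user", "content": "Refactor this module to remove global state: ..."}],
    "reasoning": {"effort": "high"},  # e.g. "low" | "medium" | "high" per the linked docs
}
```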
openrouter DeepSeek: DeepSeek V3.1 Terminus (exacto) deepseek-v3.1-terminus:exacto 0.21 0.79 DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
openrouter DeepSeek: DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.21 0.79 DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
openrouter xAI: Grok 4 Fast grok-4-fast 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model on xAI's [news post](http://x.ai/news/grok-4-fast). Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
openrouter Tongyi DeepResearch 30B A3B tongyi-deepresearch-30b-a3b 0.09 0.40 Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks and delivers state-of-the-art performance on benchmarks like Humanity's Last Exam, BrowserComp, BrowserComp-ZH, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES. This makes it superior for complex agentic search, reasoning, and multi-step problem-solving compared to prior models. The model includes a fully automated synthetic data pipeline for scalable pre-training, fine-tuning, and reinforcement learning. It uses large-scale continual pre-training on diverse agentic data to boost reasoning and keep its knowledge current. It also features end-to-end on-policy RL with a customized Group Relative Policy Optimization, including token-level gradients and negative-sample filtering for stable training. The model supports ReAct for core ability checks and an IterResearch-based 'Heavy' mode for maximum performance through test-time scaling. It's ideal for advanced research agents, tool use, and heavy inference workflows. Context: 131072
openrouter Qwen: Qwen3 Coder Flash qwen3-coder-flash 0.30 1.50 Qwen3 Coder Flash is Alibaba's fast, cost-efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
openrouter OpenGVLab: InternVL3 78B internvl3-78b 0.10 0.39 The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities. In addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance. Context: 32768
openrouter Qwen: Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.20 Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It is designed for hard multi-step problems such as math proofs, code synthesis/debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode. Context: 262144
openrouter Qwen: Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.06 0.60 Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred. Context: 262144
openrouter Meituan: LongCat Flash Chat longcat-flash-chat 0.20 0.80 LongCat-Flash-Chat is a large-scale Mixture-of-Experts (MoE) model with 560B total parameters, of which 18.6B–31.3B (≈27B on average) are dynamically activated per input. It introduces a shortcut-connected MoE design to reduce communication overhead and achieve high throughput while maintaining training stability through advanced scaling strategies such as hyperparameter transfer, deterministic computation, and multi-stage optimization. This release, LongCat-Flash-Chat, is a non-thinking foundation model optimized for conversational and agentic tasks. It supports long context windows up to 128K tokens and shows competitive performance across reasoning, coding, instruction following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions. Context: 131072
openrouter Qwen: Qwen Plus 0728 qwen-plus-2025-07-28 0.40 1.20 Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context window that balances performance, speed, and cost. Context: 1000000
openrouter Qwen: Qwen Plus 0728 (thinking) qwen-plus-2025-07-28:thinking 0.40 4.00 Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context window that balances performance, speed, and cost. Context: 1000000
openrouter NVIDIA: Nemotron Nano 9B V2 (free) nemotron-nano-9b-v2:free 0.00 0.00 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 128000
openrouter NVIDIA: Nemotron Nano 9B V2 nemotron-nano-9b-v2 0.04 0.16 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 131072
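A hedged sketch of the system-prompt control described in the two Nemotron entries above. The exact control string is an assumption (NVIDIA's model card documents /think and /no_think) and should be verified before use.

```python
# Sketch: asking Nemotron Nano 9B V2 to skip its reasoning trace via the
# system prompt, as described above. The "/no_think" control string is an
# assumption taken from NVIDIA's model card; verify against current docs.
payload = {
    "model": "nvidia/nemotron-nano-9b-v2",  # vendor-prefixed slug assumed
    "messages": [
        {"role": "system", "content": "/no_think"},  # assumed: final answer only
        {"role": "user", "content": "What is the capital of Australia?"},
    ],
}
```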
openrouter MoonshotAI: Kimi K2 0905 kimi-k2-0905 0.39 1.90 Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
openrouter MoonshotAI: Kimi K2 0905 (exacto) kimi-k2-0905:exacto 0.60 2.50 Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
openrouter Deep Cogito: Cogito V2 Preview Llama 70B cogito-v2-preview-llama-70b 0.88 0.88 Cogito v2 70B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. Built with iterative policy improvement, it delivers strong performance across reasoning tasks while maintaining efficiency through shorter reasoning chains and improved intuition. Context: 32768
openrouter Cogito V2 Preview Llama 109B cogito-v2-preview-llama-109b-moe 0.18 0.59 An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage in an extended “thinking” phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens) and standard Transformers workflows. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 32767
openrouter StepFun: Step3 step3 0.57 1.42 Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators. Context: 65536
openrouter Qwen: Qwen3 30B A3B Thinking 2507 qwen3-30b-a3b-thinking-2507 0.05 0.34 Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated from final answers. Compared to earlier Qwen3-30B releases, this version improves performance across logical reasoning, mathematics, science, coding, and multilingual benchmarks. It also demonstrates stronger instruction following, tool use, and alignment with human preferences. With higher reasoning efficiency and extended output budgets, it is best suited for advanced research, competitive problem solving, and agentic applications requiring structured long-context reasoning. Context: 32768
openrouter xAI: Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code toward high-quality workflows. Context: 256000
openrouter Nous: Hermes 4 70B hermes-4-70b 0.11 0.38 Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>...</think> reasoning traces before answering. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates. Context: 131072
openrouter Nous: Hermes 4 405B hermes-4-405b 1.00 3.00 Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>...</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior. Context: 131072
openrouter Google: Gemini 2.5 Flash Image Preview (Nano Banana) gemini-2.5-flash-image-preview 0.30 2.50 Gemini 2.5 Flash Image Preview, a.k.a. "Nano Banana," is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Context: 32768
openrouter DeepSeek: DeepSeek V3.1 deepseek-chat-v3.1 0.15 0.75 DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the [DeepSeek V3-0324](/deepseek/deepseek-chat-v3-0324) model and performs well on a variety of tasks. Context: 32768
openrouter OpenAI: GPT-4o Audio gpt-4o-audio-preview 2.50 10.00 The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million input audio tokens. Context: 128000
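Given the rates in this entry ($2.50/M text input, $40/M audio input, $10/M output), per-request cost is simple arithmetic; the token counts below are illustrative only.

```python
# Back-of-the-envelope cost for a gpt-4o-audio-preview request, using the
# per-million-token rates listed in this entry.
def request_cost(text_in: int, audio_in: int, text_out: int) -> float:
    return text_in * 2.50e-6 + audio_in * 40.00e-6 + text_out * 10.00e-6

# e.g. 1,200 text input tokens + 8,000 audio tokens + 500 output tokens:
print(f"${request_cost(1_200, 8_000, 500):.4f}")  # $0.3280
```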
openrouter Mistral: Mistral Medium 3.1 mistral-medium-3.1 0.40 2.00 Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
openrouter Baidu: ERNIE 4.5 21B A3B ernie-4.5-21b-a3b 0.07 0.28 A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering strong understanding and generation through heterogeneous MoE structures and modality-isolated routing. Supporting an extensive 131K token context length, the model achieves efficient inference via multi-expert parallel collaboration and quantization, while advanced post-training techniques including SFT, DPO, and UPO ensure optimized performance across diverse applications, with specialized routing and balancing losses for superior task handling. Context: 120000
openrouter Baidu: ERNIE 4.5 VL 28B A3B ernie-4.5-vl-28b-a3b 0.14 0.56 A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing. Built with scaling-efficient infrastructure for high-throughput training and inference, the model leverages advanced post-training techniques including SFT, DPO, and UPO for optimized performance, while supporting an impressive 131K context length and RLVR alignment for superior cross-modal reasoning and generation capabilities. Context: 30000
openrouter Z.AI: GLM 4.5V glm-4.5v 0.60 1.80 GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 65536
openrouter AI21: Jamba Mini 1.7 jamba-mini-1.7 0.20 0.40 Jamba Mini 1.7 is a compact and efficient member of the Jamba open model family, incorporating key improvements in grounding and instruction-following while maintaining the benefits of the SSM-Transformer hybrid architecture and 256K context window. Despite its compact size, it delivers accurate, contextually grounded responses and improved steerability. Context: 256000
openrouter AI21: Jamba Large 1.7 jamba-large-1.7 2.00 8.00 Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions. Context: 256000
openrouter OpenAI: GPT-5 Chat gpt-5-chat 1.25 10.00 GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications. Context: 128000
openrouter OpenAI: GPT-5 gpt-5 1.25 10.00 GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter OpenAI: GPT-5 Mini gpt-5-mini 0.25 2.00 GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model. Context: 400000
openrouter OpenAI: GPT-5 Nano gpt-5-nano 0.05 0.40 GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications. Context: 400000
openrouter OpenAI: gpt-oss-120b (free) gpt-oss-120b:free 0.00 0.00 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-120b gpt-oss-120b 0.02 0.10 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-120b (exacto) gpt-oss-120b:exacto 0.04 0.19 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-20b (free) gpt-oss-20b:free 0.00 0.00 gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
openrouter OpenAI: gpt-oss-20b gpt-oss-20b 0.02 0.06 gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
openrouter Anthropic: Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning. Context: 200000
openrouter Mistral: Codestral 2508 codestral-2508 0.30 0.90 Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08) Context: 256000
openrouter Qwen: Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.07 0.27 Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with Yarn) and performs strongly in tasks involving function calls, browser use, and structured code completion. This model is optimized for instruction-following without “thinking mode”, and integrates well with OpenAI-compatible tool-use formats. Context: 160000
openrouter Qwen: Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.08 0.33 Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance. Context: 262144
openrouter Z.AI: GLM 4.5 glm-4.5 0.35 1.55 GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Z.AI: GLM 4.5 Air (free) glm-4.5-air:free 0.00 0.00 GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Z.AI: GLM 4.5 Air glm-4.5-air 0.05 0.22 GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Qwen: Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.11 0.60 Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases. Context: 262144
openrouter Z.AI: GLM 4 32B glm-4-32b 0.10 0.10 GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models. Context: 128000
openrouter Qwen: Qwen3 Coder 480B A35B (free) qwen3-coder:free 0.00 0.00 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262000
openrouter Qwen: Qwen3 Coder 480B A35B qwen3-coder 0.22 0.95 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262144
openrouter Qwen: Qwen3 Coder 480B A35B (exacto) qwen3-coder:exacto 0.22 1.80 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262144
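The tiered-pricing note repeated in the three Qwen3 Coder entries above reduces to a threshold check. In this sketch the base-tier rates come from the qwen3-coder row; the >128k rates are placeholders, since the list does not state them.

```python
# Sketch of the context-length pricing tiers described above. Requests with
# more than 128k input tokens bill at a higher rate; the LONG_RATES below
# are hypothetical placeholders, not published prices.
BASE_RATES = (0.22, 0.95)  # ($/M input, $/M output) from the qwen3-coder row
LONG_RATES = (0.44, 1.90)  # hypothetical >128k tier, for illustration only

def qwen3_coder_cost(input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = LONG_RATES if input_tokens > 128_000 else BASE_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

print(f"${qwen3_coder_cost(200_000, 4_000):.4f}")  # long-context request
```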
openrouter ByteDance: UI-TARS 7B ui-tars-1.5-7b 0.10 0.20 UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds upon the UI-TARS framework with reinforcement-learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints. Context: 128000
openrouter Google: Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
openrouter Qwen: Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-2507 0.07 0.46 Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench. Context: 262144
openrouter Switchpoint Router router 0.85 3.40 Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you always benefit from the industry's newest models without changing your workflow. This model is configured for a simple, flat rate per response here on OpenRouter. It's powered by the full routing engine from [Switchpoint AI](https://www.switchpoint.dev). Context: 131072
openrouter MoonshotAI: Kimi K2 0711 (free) kimi-k2:free 0.00 0.00 Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 32768
openrouter MoonshotAI: Kimi K2 0711 kimi-k2 0.50 2.40 Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 131072
openrouter THUDM: GLM 4.1V 9B Thinking glm-4.1v-9b-thinking 0.04 0.14 GLM-4.1V-9B-Thinking is a 9B parameter vision-language model developed by THUDM, based on the GLM-4-9B foundation. It introduces a reasoning-centric "thinking paradigm" enhanced with reinforcement learning to improve multimodal reasoning, long-context understanding (up to 64K tokens), and complex problem solving. It achieves state-of-the-art performance among models in its class, outperforming even larger models like Qwen-2.5-VL-72B on a majority of benchmark tasks. Context: 65536
openrouter Mistral: Devstral Medium devstral-medium 0.40 2.00 Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost. It is designed for generalization across prompt styles and tool use in code agents and frameworks. Devstral Medium is available via API only (not open-weight), and supports enterprise deployment on private infrastructure, with optional fine-tuning capabilities. Context: 131072
openrouter Mistral: Devstral Small 1.1 devstral-small 0.07 0.28 Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats. Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes. Context: 128000
openrouter Venice: Uncensored (free) dolphin-mistral-24b-venice-edition:free 0.00 0.00 Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models. Context: 32768
openrouter xAI: Grok 4 grok-4 3.00 15.00 Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total token count in a given request exceeds 128k tokens. See more details in the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709). Context: 256000
openrouter Google: Gemma 3n 2B (free) gemma-3n-e2b-it:free 0.00 0.00 Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data. Context: 8192
openrouter Tencent: Hunyuan A13B Instruct hunyuan-a13b-instruct 0.14 0.57 Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.). Context: 131072
openrouter TNG: DeepSeek R1T2 Chimera (free) deepseek-r1t2-chimera:free 0.00 0.00 DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks. Context: 163840
openrouter TNG: DeepSeek R1T2 Chimera deepseek-r1t2-chimera 0.25 0.85 DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks. Context: 163840
openrouter Morph: Morph V3 Large morph-v3-large 0.90 1.90 Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 262144
openrouter Morph: Morph V3 Fast morph-v3-fast 0.80 1.20 Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 81920
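Both Morph apply models above consume the same tagged prompt. Below is a minimal sketch of what a request might look like through OpenRouter's OpenAI-compatible chat endpoint; the provider-prefixed slug, the instruction, and the code/edit snippets are all illustrative assumptions, with only the `<instruction>/<code>/<update>` layout taken from the documented format.

```python
import requests

# Hypothetical edit task: only the <instruction>/<code>/<update> tag layout
# comes from the documented prompt format; the contents are placeholders.
prompt = (
    "<instruction>Rename the function add to add_numbers</instruction> "
    "<code>def add(a, b):\n    return a + b</code> "
    "<update>def add_numbers(a, b):\n    return a + b</update>"
)

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "morph/morph-v3-fast",  # provider-prefixed slug assumed
        "messages": [{"role": "user", "content": prompt}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # the applied edit
```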
openrouter Baidu: ERNIE 4.5 VL 424B A47B ernie-4.5-vl-424b-a47b 0.42 1.25 ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization. Context: 123000
openrouter Baidu: ERNIE 4.5 300B A47B ernie-4.5-300b-a47b 0.28 1.10 ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands. Context: 123000
openrouter Inception: Mercury mercury 0.25 1.00 Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
openrouter Mistral: Mistral Small 3.2 24B mistral-small-3.2-24b-instruct 0.06 0.18 Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks. It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA). Context: 131072
openrouter MiniMax: MiniMax M1 minimax-m1 0.40 2.20 MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks. Trained via a custom reinforcement learning pipeline (CISPO), M1 excels in long-context understanding, software engineering, agentic tool use, and mathematical reasoning. Benchmarks show strong performance across FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, often outperforming other open models like DeepSeek R1 and Qwen3-235B. Context: 1000000
openrouter Google: Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
openrouter Google: Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter MoonshotAI: Kimi Dev 72B kimi-dev-72b 0.29 1.15 Kimi-Dev-72B is an open-source large language model fine-tuned for software engineering and issue resolution tasks. Based on Qwen2.5-72B, it is optimized using large-scale reinforcement learning that applies code patches in real repositories and validates them via full test suite execution—rewarding only correct, robust completions. The model achieves 60.4% on SWE-bench Verified, setting a new benchmark among open-source models for software bug fixing and code reasoning. Context: 131072
openrouter OpenAI: o3 Pro o3-pro 20.00 80.00 The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers. Note that BYOK is required for this model. Set up here: https://openrouter.ai/settings/integrations Context: 200000
openrouter xAI: Grok 3 Mini grok-3-mini 0.30 0.50 A lightweight model that thinks before responding. Fast, smart, and great for logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. Context: 131072
openrouter xAI: Grok 3 grok-3 3.00 15.00 Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Context: 131072
openrouter Google: Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter DeepSeek: DeepSeek R1 0528 Qwen3 8B deepseek-r1-0528-qwen3-8b 0.06 0.09 DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and improved post-training techniques, pushing its reasoning and inference close to flagship models like o3 and Gemini 2.5 Pro. It now tops math, programming, and logic leaderboards, showcasing a step-change in depth of thought. The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235B “thinking” giant on AIME 2024. Context: 128000
openrouter DeepSeek: R1 0528 (free) deepseek-r1-0528:free 0.00 0.00 May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is fully open-source, with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass. Context: 163840
openrouter DeepSeek: R1 0528 deepseek-r1-0528 0.40 1.75 May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is fully open-source, with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass. Context: 163840
openrouter Anthropic: Claude Opus 4 claude-opus-4 15.00 75.00 Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation. Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 200000
openrouter Anthropic: Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios. Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 1000000
openrouter Mistral: Devstral Small 2505 devstral-small-2505 0.06 0.12 Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%). Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license. Context: 128000
openrouter Google: Gemma 3n 4B (free) gemma-3n-e4b-it:free 0.00 0.00 Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 8192
openrouter Google: Gemma 3n 4B gemma-3n-e4b-it 0.02 0.04 Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 32768
openrouter OpenAI: Codex Mini codex-mini 1.50 6.00 codex-mini-latest is a fine-tuned version of o4-mini specifically for use in Codex CLI. For direct use in the API, we recommend starting with gpt-4.1. Context: 200000
openrouter Nous: DeepHermes 3 Mistral 24B Preview deephermes-3-mistral-24b-preview 0.02 0.10 DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications. DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode—generating extended chains of thought wrapped in `<think></think>` tags before delivering a final answer. System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem. Context: 32768
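Because the deep-thinking mode is driven entirely by that system prompt, it can be toggled per request. Here is a minimal sketch against OpenRouter's chat-completions endpoint; the provider-prefixed slug and the user question are illustrative assumptions, while the system prompt is quoted verbatim from the model card above.

```python
import requests

# System prompt quoted verbatim from the model card; everything else
# (slug, question) is an illustrative assumption.
DEEP_THINKING = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "nousresearch/deephermes-3-mistral-24b-preview",  # slug assumed
        "messages": [
            {"role": "system", "content": DEEP_THINKING},
            {"role": "user", "content": "Which is larger, 9.11 or 9.9?"},
        ],
    },
)
# The reply should open with a <think>...</think> trace, then the final answer;
# omitting the system prompt should yield the fast intuitive mode instead.
print(resp.json()["choices"][0]["message"]["content"])
```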
openrouter Mistral: Mistral Medium 3 mistral-medium-3 0.40 2.00 Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
openrouter Google: Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter Arcee AI: Spotlight spotlight 0.18 0.18 Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual-question-answering, and diagram-analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or out-scoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests. Context: 131072
openrouter Arcee AI: Maestro Reasoning maestro-reasoning 0.90 3.30 Maestro Reasoning is Arcee's flagship analysis model: a 32B-parameter derivative of Qwen 2.5-32B tuned with DPO and chain-of-thought RL for step-by-step logic. Compared to the earlier 7B preview, the production 32B release widens the context window to 128k tokens and doubles the pass rate on MATH and GSM-8K, while also lifting code-completion accuracy. Its instruction style encourages structured "thought → answer" traces that can be parsed or hidden according to user preference. That transparency pairs well with audit-focused industries like finance or healthcare, where seeing the reasoning path matters. In Arcee Conductor, Maestro is automatically selected for complex, multi-constraint queries that smaller SLMs bounce. Context: 131072
openrouter Arcee AI: Virtuoso Large virtuoso-large 0.75 1.20 Virtuoso-Large is Arcee's top-tier general-purpose LLM at 72B parameters, tuned to tackle cross-domain reasoning, creative writing, and enterprise QA. Unlike many 70B peers, it retains the 128k context inherited from Qwen 2.5, letting it ingest books, codebases, or financial filings wholesale. Training blended DeepSeek R1 distillation, multi-epoch supervised fine-tuning, and a final DPO/RLHF alignment stage, yielding strong performance on BIG-Bench-Hard, GSM-8K, and long-context Needle-In-Haystack tests. Enterprises use Virtuoso-Large as the "fallback" brain in Conductor pipelines when other SLMs flag low confidence. Despite its size, aggressive KV-cache optimizations keep first-token latency in the low-second range on 8× H100 nodes, making it a practical production-grade powerhouse. Context: 131072
openrouter Arcee AI: Coder Large coder-large 0.50 0.80 Coder-Large is a 32B-parameter offspring of Qwen 2.5-Instruct that has been further trained on permissively licensed GitHub, CodeSearchNet, and synthetic bug-fix corpora. It supports a 32k context window, enabling multi-file refactoring or long diff review in a single call, and understands 30-plus programming languages with special attention to TypeScript, Go, and Terraform. Internal benchmarks show 5–8 pt gains over CodeLlama-34B-Python on HumanEval and competitive BugFix scores, thanks to a reinforcement pass that rewards compilable output. The model emits structured explanations alongside code blocks by default, making it suitable for educational tooling as well as production copilot scenarios. Cost-wise, Together AI prices it well below proprietary incumbents, so teams can scale interactive coding without runaway spend. Context: 32768
openrouter Microsoft: Phi 4 Reasoning Plus phi-4-reasoning-plus 0.07 0.35 Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer. While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed. Context: 32768
openrouter Inception: Mercury Coder mercury-coder 0.25 1.00 Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
openrouter Qwen: Qwen3 4B (free) qwen3-4b:free 0.00 0.00 Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows. Context: 40960
openrouter DeepSeek: DeepSeek Prover V2 deepseek-prover-v2 0.50 2.18 DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. It is likely an upgrade from [DeepSeek-Prover-V1.5](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL). Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description. Context: 163840
openrouter Meta: Llama Guard 4 12B llama-guard-4-12b 0.18 0.18 Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM—generating text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images. Context: 163840
openrouter Qwen: Qwen3 30B A3B qwen3-30b-a3b 0.06 0.22 Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models. Context: 40960
openrouter Qwen: Qwen3 8B qwen3-8b 0.04 0.14 Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling. Context: 128000
openrouter Qwen: Qwen3 14B qwen3-14b 0.05 0.22 Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
openrouter Qwen: Qwen3 32B qwen3-32b 0.08 0.24 Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
openrouter Qwen: Qwen3 235B A22B qwen3-235b-a22b 0.18 0.54 Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling. Context: 40960
openrouter TNG: DeepSeek R1T Chimera (free) deepseek-r1t-chimera:free 0.00 0.00 DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
openrouter TNG: DeepSeek R1T Chimera deepseek-r1t-chimera 0.30 1.20 DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
openrouter OpenAI: o4 Mini High o4-mini-high 1.10 4.40 OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
openrouter OpenAI: o3 o3 2.00 8.00 o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. Context: 200000
openrouter OpenAI: o4 Mini o4-mini 1.10 4.40 OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
openrouter Qwen: Qwen2.5 Coder 7B Instruct qwen2.5-coder-7b-instruct 0.03 0.09 Qwen2.5-Coder-7B-Instruct is a 7B parameter instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE, SwiGLU, RMSNorm, and GQA attention with support for up to 128K tokens using YaRN-based extrapolation. It is trained on a large corpus of source code, synthetic data, and text-code grounding, providing robust performance across programming languages and agentic coding workflows. This model is part of the Qwen2.5-Coder family and offers strong compatibility with tools like vLLM for efficient deployment. Released under the Apache 2.0 license. Context: 32768
openrouter OpenAI: GPT-4.1 gpt-4.1 2.00 8.00 GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval. Context: 1047576
openrouter OpenAI: GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints. Context: 1047576
openrouter OpenAI: GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion. Context: 1047576
openrouter EleutherAI: Llemma 7b llemma_7b 0.80 1.20 Llemma 7B is a language model for mathematics. It was initialized with Code Llama 7B weights, and trained on the Proof-Pile-2 for 200B tokens. Llemma models are particularly strong at chain-of-thought mathematical reasoning and using computational tools for mathematics, such as Python and formal theorem provers. Context: 4096
openrouter AlfredPros: CodeLLaMa 7B Instruct Solidity codellama-7b-instruct-solidity 0.80 1.20 A fine-tuned 7-billion-parameter Code LLaMA Instruct model for generating Solidity smart contracts, trained with 4-bit QLoRA fine-tuning via the PEFT library. Context: 4096
openrouter xAI: Grok 3 Mini Beta grok-3-mini-beta 0.30 0.50 Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems. Transparent "thinking" traces are accessible. It defaults to low reasoning effort, which can be boosted by setting `reasoning: { effort: "high" }`. Note that there are two xAI endpoints for this model. By default, when using this model, we will always route you to the base endpoint. If you want the fast endpoint, add `provider: { sort: "throughput" }` to sort by throughput instead. Context: 131072
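A sketch combining the two request options mentioned above; the `reasoning` and `provider` fields are taken from the description, while the provider-prefixed slug and the prompt are illustrative assumptions.

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "x-ai/grok-3-mini-beta",  # provider-prefixed slug assumed
        "messages": [{"role": "user", "content": "How many primes are below 50?"}],
        "reasoning": {"effort": "high"},     # boost from the default low effort
        "provider": {"sort": "throughput"},  # route to the fast endpoint
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```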
openrouter xAI: Grok 3 Beta grok-3-beta 3.00 15.00 Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. It possesses deep domain knowledge in finance, healthcare, law, and science, and excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro, where it outperforms Grok 3 Mini even at high reasoning effort. Note that there are two xAI endpoints for this model. By default, when using this model, we will always route you to the base endpoint. If you want the fast endpoint, add `provider: { sort: "throughput" }` to sort by throughput instead. Context: 131072
openrouter NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 llama-3.1-nemotron-ultra-253b-v1 0.60 1.80 Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more. Context: 131072
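Since Nemotron Ultra's reasoning toggle is just a fixed system-prompt string, enabling it is a one-line change. A minimal sketch, with the provider-prefixed slug and the user task assumed for illustration:

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",  # slug assumed
        "messages": [
            # Exact toggle string from NVIDIA's usage recommendations above.
            {"role": "system", "content": "detailed thinking on"},
            {"role": "user", "content": "Outline a test plan for a CSV parser."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```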
openrouter Meta: Llama 4 Maverick llama-4-maverick 0.15 0.60 Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput. Context: 1048576
openrouter Meta: Llama 4 Scout llama-4-scout 0.08 0.30 Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025. Context: 327680
openrouter Qwen: Qwen2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation. Context: 16384
openrouter DeepSeek: DeepSeek V3 0324 deepseek-chat-v3-0324 0.19 0.87 DeepSeek V3 0324, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs well across a variety of tasks. Context: 163840
openrouter OpenAI: o1-pro o1-pro 150.00 600.00 The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. Context: 200000
openrouter Mistral: Mistral Small 3.1 24B (free) mistral-small-3.1-24b-instruct:free 0.00 0.00 Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 128000
openrouter Mistral: Mistral Small 3.1 24B mistral-small-3.1-24b-instruct 0.03 0.11 Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 131072
openrouter AllenAI: Olmo 2 32B Instruct olmo-2-0325-32b-instruct 0.05 0.20 OLMo-2 32B Instruct is a supervised instruction-finetuned variant of the OLMo-2 32B March 2025 base model. It excels in complex reasoning and instruction-following tasks across diverse benchmarks such as GSM8K, MATH, IFEval, and general NLP evaluation. Developed by AI2, OLMo-2 32B is part of an open, research-oriented initiative, trained primarily on English-language datasets to advance the understanding and development of open-source language models. Context: 128000
openrouter Google: Gemma 3 4B (free) gemma-3-4b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 32768
openrouter Google: Gemma 3 4B gemma-3-4b-it 0.02 0.07 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 96000
openrouter Google: Gemma 3 12B (free) gemma-3-12b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 32768
openrouter Google: Gemma 3 12B gemma-3-12b-it 0.03 0.10 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 131072
openrouter Cohere: Command A command-a 2.50 10.00 Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary and open-weights models, Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks. Context: 256000
openrouter OpenAI: GPT-4o-mini Search Preview gpt-4o-mini-search-preview 0.15 0.60 GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
openrouter OpenAI: GPT-4o Search Preview gpt-4o-search-preview 2.50 10.00 GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
openrouter Google: Gemma 3 27B (free) gemma-3-27b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
openrouter Google: Gemma 3 27B gemma-3-27b-it 0.04 0.06 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
openrouter TheDrummer: Skyfall 36B V2 skyfall-36b-v2 0.55 0.80 Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling. Context: 32768
openrouter Microsoft: Phi 4 Multimodal Instruct phi-4-multimodal-instruct 0.05 0.10 Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the [Phi-4 Multimodal blog post](https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/). Context: 131072
openrouter Perplexity: Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for advanced use cases, it supports in-depth, multi-step queries with a larger context window and can surface more citations per search, enabling more comprehensive and extensible responses. Context: 128000
openrouter Perplexity: Sonar Pro sonar-pro 3.00 15.00 Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like double the number of citations per search as Sonar on average. Plus, with a larger context window, it can handle longer and more nuanced searches and follow-up questions. Context: 200000
openrouter Perplexity: Sonar Deep Research sonar-deep-research 2.00 8.00 Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Notes on pricing ([source](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-deep-research)): Input tokens comprise prompt tokens (the user prompt) plus citation tokens (processed tokens from running searches). Deep Research runs multiple searches to conduct exhaustive research; searches are priced at $5/1,000 searches, so a request that performs 30 searches costs $0.15 in this step. Reasoning is a distinct step in Deep Research, since the model reasons extensively through all the material it gathers during its research phase; these reasoning tokens differ from the CoTs in the answer, as they are used to reason through the research material prior to generating the outputs, and are priced at $3/1M tokens. Context: 128000
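Putting those per-unit prices together, here is a worked cost estimate for a single hypothetical Deep Research request. Every search and token count below is made up for illustration; only the unit prices come from the row and the pricing notes above.

```python
# Unit prices taken from the row and pricing notes above; counts are hypothetical.
searches = 30                # hypothetical
input_tokens = 20_000        # prompt + citation tokens (hypothetical)
reasoning_tokens = 100_000   # hypothetical
output_tokens = 5_000        # hypothetical

cost = (
    searches * 5.00 / 1_000          # $5 per 1,000 searches -> $0.15 here
    + input_tokens * 2.00 / 1e6      # $2.00 per 1M input tokens
    + reasoning_tokens * 3.00 / 1e6  # $3 per 1M reasoning tokens
    + output_tokens * 8.00 / 1e6     # $8.00 per 1M output tokens
)
print(f"${cost:.4f}")  # prints $0.5300
```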
openrouter Qwen: QwQ 32B qwq-32b 0.15 0.40 QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. Context: 32768
openrouter Google: Gemini 2.0 Flash Lite gemini-2.0-flash-lite-001 0.08 0.30 Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices. Context: 1048576
openrouter Anthropic: Claude 3.7 Sonnet (thinking) claude-3.7-sonnet:thinking 3.00 15.00 Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
openrouter Anthropic: Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
openrouter Mistral: Saba mistral-saba 0.20 0.60 Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba) Context: 32768
openrouter Llama Guard 3 8B llama-guard-3-8b 0.02 0.06 Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls. Context: 131072
openrouter OpenAI: o3 Mini High o3-mini-high 1.10 4.40 OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
openrouter Google: Gemini 2.0 Flash gemini-2.0-flash-001 0.10 0.40 Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
openrouter Qwen: Qwen VL Plus qwen-vl-plus 0.21 0.63 Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers strong performance across a broad range of visual tasks. Context: 7500
openrouter AionLabs: Aion-1.0 aion-1.0 4.00 8.00 Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model. Context: 131072
openrouter AionLabs: Aion-1.0-Mini aion-1.0-mini 0.70 1.40 Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification. Context: 131072
openrouter AionLabs: Aion-RP 1.0 (8B) aion-rp-llama-3.1-8b 0.80 1.60 Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing. Context: 32768
openrouter Qwen: Qwen VL Max qwen-vl-max 0.80 3.20 Qwen VL Max is a visual understanding model that excels at delivering optimal performance for a broad spectrum of complex tasks. Context: 131072
openrouter Qwen: Qwen-Turbo qwen-turbo 0.05 0.20 Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks. Context: 1000000
openrouter Qwen: Qwen2.5 VL 72B Instruct qwen2.5-vl-72b-instruct 0.15 0.60 Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. Context: 32768
openrouter Qwen: Qwen-Plus qwen-plus 0.40 1.20 Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination. Context: 131072
openrouter Qwen: Qwen-Max qwen-max 1.60 6.40 Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. Its exact parameter count has not been disclosed. Context: 32768
openrouter OpenAI: o3 Mini o3-mini 1.10 4.40 OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high". The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
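The entry above states that o3-mini accepts a `reasoning_effort` parameter with values "high", "medium" (default), or "low". A minimal sketch follows, assuming the OpenAI-style parameter is passed through unchanged by the endpoint.

```python
# Minimal sketch: setting reasoning_effort on o3-mini. Assumes an
# OpenAI-compatible endpoint that forwards the parameter to the model.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/o3-mini",
        "reasoning_effort": "high",  # "low" | "medium" (default) | "high"
        "messages": [{"role": "user", "content": "Prove sqrt(2) is irrational."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Per the description, requesting the `openai/o3-mini-high` slug instead should be equivalent to defaulting this parameter to "high".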
openrouter Mistral: Mistral Small 3 mistral-small-24b-instruct-2501 0.03 0.11 Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/) Context: 32768
openrouter DeepSeek: R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.27 0.27 DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 72.6 - MATH-500 pass@1: 94.3 - CodeForces Rating: 1691 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
openrouter DeepSeek: R1 Distill Qwen 14B deepseek-r1-distill-qwen-14b 0.15 0.15 DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 69.7 - MATH-500 pass@1: 93.9 - CodeForces Rating: 1481 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 32768
openrouter Perplexity: Sonar Reasoning sonar-reasoning 1.00 5.00 Sonar Reasoning is a reasoning model provided by Perplexity based on [DeepSeek R1](/deepseek/deepseek-r1). It allows developers to utilize long chain of thought with built-in web search. Sonar Reasoning is uncensored and hosted in US datacenters. Context: 127000
openrouter Perplexity: Sonar sonar 1.00 1.00 Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features optimized for speed. Context: 127072
openrouter DeepSeek: R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.11 DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
openrouter DeepSeek: R1 deepseek-r1 0.70 2.40 DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120). MIT licensed: Distill & commercialize freely! Context: 163840
openrouter MiniMax: MiniMax-01 minimax-01 0.20 1.10 MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens. The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model. To read more about the release, see [the announcement](https://www.minimaxi.com/en/news/minimax-01-series-2). Context: 1000192
openrouter Microsoft: Phi 4 phi-4 0.06 0.14 [Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905) Context: 16384
openrouter Sao10K: Llama 3.1 70B Hanami x1 l3.1-70b-hanami-x1 3.00 3.00 This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b). Context: 16000
openrouter DeepSeek: DeepSeek V3 deepseek-chat 0.30 1.20 DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models. For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226). Context: 163840
openrouter Sao10K: Llama 3.3 Euryale 70B l3.3-euryale-70b 0.65 0.75 Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b). Context: 131072
openrouter OpenAI: o1 o1 15.00 60.00 The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Context: 200000
openrouter Cohere: Command R7B (12-2024) command-r7b-12-2024 0.04 0.15 Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps. Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Google: Gemini 2.0 Flash Experimental (free) gemini-2.0-flash-exp:free 0.00 0.00 Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
openrouter Meta: Llama 3.3 70B Instruct (free) llama-3.3-70b-instruct:free 0.00 0.00 The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
openrouter Meta: Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.10 0.32 The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
openrouter Amazon: Nova Lite 1.0 nova-lite-v1 0.06 0.24 Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input. Context: 300000
openrouter Amazon: Nova Micro 1.0 nova-micro-v1 0.04 0.14 Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities. Context: 128000
openrouter Amazon: Nova Pro 1.0 nova-pro-v1 0.80 3.20 Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and at analyzing financial documents. **NOTE**: Video input is not supported at this time. Context: 300000
openrouter OpenAI: GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. Context: 128000
openrouter Mistral Large 2411 mistral-large-2411 2.00 6.00 Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411). It provides a significant upgrade on the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable improvements in long context understanding, a new system prompt, and more accurate function calling. Context: 131072
openrouter Mistral Large 2407 mistral-large-2407 2.00 6.00 This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents. Context: 131072
openrouter Mistral: Pixtral Large 2411 pixtral-large-2411 2.00 6.00 Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes. Context: 131072
openrouter Qwen2.5 Coder 32B Instruct qwen-2.5-coder-32b-instruct 0.03 0.11 Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning**, and **code fixing**. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. To read more about its evaluation results, check out [Qwen 2.5 Coder's blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/). Context: 32768
openrouter SorcererLM 8x22B sorcererlm-8x22b 4.50 4.50 SorcererLM is an advanced RP and storytelling model, built as a low-rank 16-bit LoRA fine-tune of [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b). - Advanced reasoning and emotional intelligence for engaging and immersive interactions - Vivid writing capabilities enriched with spatial and contextual awareness - Enhanced narrative depth, promoting creative and dynamic storytelling Context: 16000
openrouter TheDrummer: UnslopNemo 12B unslopnemo-12b 0.40 0.40 UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios. Context: 32768
openrouter Anthropic: Claude 3.5 Haiku (2024-10-22) claude-3.5-haiku-20241022 0.80 4.00 Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries. It does not support image inputs. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use) Context: 200000
openrouter Anthropic: Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions. This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems. This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022). Context: 200000
openrouter Anthropic: Claude 3.5 Sonnet claude-3.5-sonnet 6.00 30.00 New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at: - Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems) #multimodal Context: 200000
openrouter Magnum v4 72B magnum-v4-72b 3.00 5.00 This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct). Context: 16384
openrouter Mistral: Ministral 8B ministral-8b 0.10 0.10 Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications. Context: 131072
openrouter Mistral: Ministral 3B ministral-3b 0.04 0.04 Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference. Context: 131072
openrouter Qwen: Qwen2.5 7B Instruct qwen-2.5-7b-instruct 0.04 0.10 Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON; more resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context support up to 128K tokens, with generation of up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter NVIDIA: Llama 3.1 Nemotron 70B Instruct llama-3.1-nemotron-70b-instruct 1.20 1.20 NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Inflection: Inflection 3 Productivity inflection-3-productivity 2.50 10.00 Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional intelligence similar to Pi, see [Inflection 3 Pi](/inflection/inflection-3-pi). See [Inflection's announcement](https://inflection.ai/blog/enterprise) for more details. Context: 8000
openrouter Inflection: Inflection 3 Pi inflection-3-pi 2.50 10.00 Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi has been trained to mirror your tone and style: if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles. Context: 8000
openrouter TheDrummer: Rocinante 12B rocinante-12b 0.17 0.43 Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives - Adventure-filled and captivating stories Context: 32768
openrouter Meta: Llama 3.2 90B Vision Instruct llama-3.2-90b-vision-instruct 0.35 0.40 The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 32768
openrouter Meta: Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.05 0.05 Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.03 0.20 Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 60000
openrouter Meta: Llama 3.2 3B Instruct (free) llama-3.2-3b-instruct:free 0.00 0.00 Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.02 0.02 Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Qwen2.5 72B Instruct qwen-2.5-72b-instruct 0.12 0.39 Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON; more resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context support up to 128K tokens, with generation of up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter NeverSleep: Lumimaid v0.2 8B llama-3.1-lumimaid-8b 0.09 0.60 Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/models/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1; sloppy chat outputs were purged. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
openrouter Mistral: Pixtral 12B pixtral-12b 0.10 0.10 The first multimodal, text+image-to-text model from Mistral AI. Its weights were launched [via torrent](https://x.com/mistralai/status/1833758285167722836). Context: 32768
openrouter Cohere: Command R (08-2024) command-r-08-2024 0.15 0.60 command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model. Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Cohere: Command R+ (08-2024) command-r-plus-08-2024 2.50 10.00 command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same. Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Qwen: Qwen2.5-VL 7B Instruct (free) qwen-2.5-vl-7b-instruct:free 0.00 0.00 Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolutions & ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agentic device operation: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions. - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter Qwen: Qwen2.5-VL 7B Instruct qwen-2.5-vl-7b-instruct 0.20 0.20 Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolutions & ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agentic device operation: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions. - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
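The two Qwen2.5-VL entries above describe multimodal input; a minimal sketch of an image-plus-text request using OpenAI-style content parts follows. The full slug `qwen/qwen-2.5-vl-7b-instruct` and the image URL are assumptions for illustration.

```python
# Minimal sketch: one image URL plus a text question sent to Qwen2.5-VL,
# using OpenAI-style content parts. Model slug and URL are placeholders.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen-2.5-vl-7b-instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```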
openrouter Sao10K: Llama 3.1 Euryale 70B v2.2 l3.1-euryale-70b 0.65 0.75 Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b). Context: 32768
openrouter Microsoft: Phi-3.5 Mini 128K Instruct phi-3.5-mini-128k-instruct 0.10 0.10 Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/models/microsoft/phi-3-mini-128k-instruct). The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters. Context: 128000
openrouter Nous: Hermes 3 70B Instruct hermes-3-llama-3.1-70b 0.30 0.30 Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior finetune of the [Llama-3.1 70B foundation model](/models/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Context: 65536
openrouter Nous: Hermes 3 405B Instruct (free) hermes-3-llama-3.1-405b:free 0.00 0.00 Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
openrouter Nous: Hermes 3 405B Instruct hermes-3-llama-3.1-405b 1.00 1.00 Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
openrouter OpenAI: ChatGPT-4o chatgpt-4o-latest 5.00 15.00 OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation. OpenAI notes that this model is not suited for production use-cases as it may be removed or redirected to another model in the future. Context: 128000
openrouter Sao10K: Llama 3 8B Lunaris l3-lunaris-8b 0.04 0.05 Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning. For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1. Context: 8192
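The Lunaris entry above recommends specific sampling settings (temperature 1.4, min_p 0.1). A minimal sketch applying them follows; min_p support varies by provider, and the full slug `sao10k/l3-lunaris-8b` is assumed from the listed ID.

```python
# Minimal sketch: the author-recommended sampling settings for Lunaris.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "sao10k/l3-lunaris-8b",
        "temperature": 1.4,  # recommended by the model author
        "min_p": 0.1,        # prune tokens below 10% of the top token's probability
        "messages": [{"role": "user", "content": "Narrate a storm at sea."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```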
openrouter OpenAI: GPT-4o (2024-08-06) gpt-4o-2024-08-06 2.50 10.00 The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) Context: 128000
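Since this entry highlights supplying a JSON schema via `response_format`, here is a minimal sketch using the OpenAI-style strict structured-outputs shape; the schema itself is a made-up example.

```python
# Minimal sketch: structured outputs with gpt-4o-2024-08-06 via a JSON
# schema in response_format. The schema below is illustrative only.
import os
import requests

schema = {
    "name": "city_info",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "population": {"type": "integer"},
        },
        "required": ["city", "population"],
        "additionalProperties": False,
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o-2024-08-06",
        "response_format": {"type": "json_schema", "json_schema": schema},
        "messages": [{"role": "user", "content": "Largest city in Japan?"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # JSON matching the schema
```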
openrouter Meta: Llama 3.1 405B (base) llama-3.1-405b 4.00 4.00 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
openrouter Meta: Llama 3.1 405B Instruct (free) llama-3.1-405b-instruct:free 0.00 0.00 The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs. Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.1 405B Instruct llama-3.1-405b-instruct 3.50 3.50 The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs. Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 10000
openrouter Meta: Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.02 0.03 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.1 70B Instruct llama-3.1-70b-instruct 0.40 0.40 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Mistral: Mistral Nemo mistral-nemo 0.02 0.04 A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license. Context: 131072
openrouter OpenAI: GPT-4o-mini gpt-4o-mini 0.15 0.60 GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 for chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal Context: 128000
openrouter OpenAI: GPT-4o-mini (2024-07-18) gpt-4o-mini-2024-07-18 0.15 0.60 GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 for chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal Context: 128000
openrouter Google: Gemma 2 27B gemma-2-27b-it 0.65 0.65 Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
openrouter Google: Gemma 2 9B gemma-2-9b-it 0.03 0.09 Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
openrouter Sao10k: Llama 3 Euryale 70B v2.1 l3-euryale-70b 1.48 1.48 Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom formatting / reply formats. - Very creative, lots of unique swipes. - Is not restrictive during roleplays. Context: 8192
openrouter Mistral: Mistral 7B Instruct (free) mistral-7b-instruct:free 0.00 0.00 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
openrouter Mistral: Mistral 7B Instruct mistral-7b-instruct 0.03 0.05 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
openrouter Mistral: Mistral 7B Instruct v0.3 mistral-7b-instruct-v0.3 0.20 0.20 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct v0.2](/models/mistralai/mistral-7b-instruct-v0.2), with the following changes: - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling NOTE: Support for function calling depends on the provider. Context: 32768
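The v0.3 entry above notes function-calling support (provider-dependent). A minimal sketch with an OpenAI-style tool definition follows; the `get_weather` tool is hypothetical, and the full slug `mistralai/mistral-7b-instruct-v0.3` is assumed from the listed ID.

```python
# Minimal sketch: function calling with Mistral 7B Instruct v0.3. As the
# entry notes, tool support depends on the provider serving the model.
import os
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mistral-7b-instruct-v0.3",
        "tools": tools,
        "messages": [{"role": "user", "content": "Weather in Paris?"}],
    },
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("tool_calls") or message["content"])
```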
openrouter NousResearch: Hermes 2 Pro - Llama-3 8B hermes-2-pro-llama-3-8b 0.03 0.08 Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. Context: 8192
openrouter Microsoft: Phi-3 Mini 128K Instruct phi-3-mini-128k-instruct 0.10 0.10 Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Mini demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date. Context: 128000
openrouter Microsoft: Phi-3 Medium 128K Instruct phi-3-medium-128k-instruct 1.00 1.00 Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance. For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct). Context: 128000
openrouter OpenAI: GPT-4o (2024-05-13) gpt-4o-2024-05-13 5.00 15.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter OpenAI: GPT-4o gpt-4o 2.50 10.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter OpenAI: GPT-4o (extended) gpt-4o:extended 6.00 18.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter Meta: LlamaGuard 2 8B llama-guard-2-8b 0.20 0.20 This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
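The entry above recommends raw prompt input via the `/completions` endpoint rather than the chat API; a minimal sketch of that usage follows. The exact Llama Guard 2 prompt template is model-specific (see its model card), so the prompt here is left as a fill-in, and the full slug `meta-llama/llama-guard-2-8b` is assumed from the listed ID.

```python
# Minimal sketch: raw-prompt classification with LlamaGuard 2 through the
# text /completions endpoint instead of the chat API.
import os
import requests

raw_prompt = "..."  # fill in: the Llama Guard 2 template wrapping the content to classify

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={"model": "meta-llama/llama-guard-2-8b", "prompt": raw_prompt},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])  # "safe", or "unsafe" + violated categories
```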
openrouter Meta: Llama 3 70B Instruct llama-3-70b-instruct 0.30 0.40 Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
openrouter Meta: Llama 3 8B Instruct llama-3-8b-instruct 0.03 0.06 Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
openrouter Mistral: Mixtral 8x22B Instruct mixtral-8x22b-instruct 2.00 6.00 Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe Context: 65536
openrouter WizardLM-2 8x22B wizardlm-2-8x22b 0.48 0.48 WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe Context: 65536
openrouter OpenAI: GPT-4 Turbo gpt-4-turbo 10.00 30.00 The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023. Context: 128000
openrouter Anthropic: Claude 3 Haiku claude-3-haiku 0.25 1.25 Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal Context: 200000
openrouter Anthropic: Claude 3 Opus claude-3-opus 15.00 75.00 Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal Context: 200000
openrouter Mistral Large mistral-large 2.00 6.00 This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents. Context: 128000
openrouter OpenAI: GPT-3.5 Turbo (older v0613) gpt-3.5-turbo-0613 1.00 2.00 GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021. Context: 4095
openrouter OpenAI: GPT-4 Turbo Preview gpt-4-turbo-preview 10.00 30.00 The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023. **Note:** heavily rate limited by OpenAI while in preview. Context: 128000
openrouter Mistral Tiny mistral-tiny 0.25 0.25 Note: This model is being deprecated. The recommended replacement is the newer [Ministral 8B](/mistralai/ministral-8b). This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial. Context: 32768
openrouter Mistral: Mistral 7B Instruct v0.2 mistral-7b-instruct-v0.2 0.20 0.20 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct-v0.1), with the following changes: a 32k context window (vs. 8k in v0.1), rope-theta raised to 1e6, and no sliding-window attention. Context: 32768
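The v0.2 changes listed above are all visible in the model's published configuration. A small sketch that reads them from the Hugging Face Hub, assuming the `transformers` library and the public `mistralai/Mistral-7B-Instruct-v0.2` repository:

```python
# Sketch: confirming the v0.1 -> v0.2 config changes directly from
# the model's config.json on the Hub.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print(cfg.max_position_embeddings)  # 32768 -> the 32k context window
print(cfg.rope_theta)               # 1e6, up from 1e4 in v0.1
print(cfg.sliding_window)           # None -> sliding-window attention off
```

Raising rope-theta stretches the rotary position embeddings so attention stays usable at longer distances, which is what lets v0.2 drop sliding-window attention while quadrupling the context.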
openrouter Mistral: Mixtral 8x7B Instruct mixtral-8x7b-instruct 0.54 0.54 Mixtral 8x7B Instruct is a pretrained generative sparse mixture-of-experts model by Mistral AI, for chat and instruction use. It incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters, and was instruct fine-tuned by Mistral. #moe Context: 32768
openrouter Noromaid 20B noromaid-20b 1.00 1.75 A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge. #merge #uncensored Context: 4096
openrouter Goliath 120B goliath-120b 6.00 8.00 A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to [@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit), the framework used to merge the model, and to [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios. #merge Context: 6144
openrouter Auto Router auto - - Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model. The meta-model is powered by [Not Diamond](https://docs.notdiamond.ai/docs/how-not-diamond-works). Learn more in our [docs](/docs/model-routing). Requests will be routed to the following models: - [openai/gpt-5.1](/openai/gpt-5.1) - [openai/gpt-5](/openai/gpt-5) - [openai/gpt-5-mini](/openai/gpt-5-mini) - [openai/gpt-5-nano](/openai/gpt-5-nano) - [openai/gpt-4.1](/openai/gpt-4.1) - [openai/gpt-4.1-mini](/openai/gpt-4.1-mini) - [openai/gpt-4.1-nano](/openai/gpt-4.1-nano) - [openai/gpt-4o](/openai/gpt-4o) - [openai/gpt-4o-2024-05-13](/openai/gpt-4o-2024-05-13) - [openai/gpt-4o-2024-08-06](/openai/gpt-4o-2024-08-06) - [openai/gpt-4o-2024-11-20](/openai/gpt-4o-2024-11-20) - [openai/gpt-4o-mini](/openai/gpt-4o-mini) - [openai/gpt-4o-mini-2024-07-18](/openai/gpt-4o-mini-2024-07-18) - [openai/gpt-4-turbo](/openai/gpt-4-turbo) - [openai/gpt-4-turbo-preview](/openai/gpt-4-turbo-preview) - [openai/gpt-4-1106-preview](/openai/gpt-4-1106-preview) - [openai/gpt-4](/openai/gpt-4) - [openai/gpt-3.5-turbo](/openai/gpt-3.5-turbo) - [openai/gpt-oss-120b](/openai/gpt-oss-120b) - [anthropic/claude-opus-4.5](/anthropic/claude-opus-4.5) - [anthropic/claude-opus-4.1](/anthropic/claude-opus-4.1) - [anthropic/claude-opus-4](/anthropic/claude-opus-4) - [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5) - [anthropic/claude-sonnet-4](/anthropic/claude-sonnet-4) - [anthropic/claude-3.7-sonnet](/anthropic/claude-3.7-sonnet) - [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5) - [anthropic/claude-3.5-haiku](/anthropic/claude-3.5-haiku) - [anthropic/claude-3-haiku](/anthropic/claude-3-haiku) - [google/gemini-3-pro-preview](/google/gemini-3-pro-preview) - [google/gemini-2.5-pro](/google/gemini-2.5-pro) - [google/gemini-2.0-flash-001](/google/gemini-2.0-flash-001) - [google/gemini-2.5-flash](/google/gemini-2.5-flash) - [mistralai/mistral-large](/mistralai/mistral-large) - [mistralai/mistral-large-2407](/mistralai/mistral-large-2407) - [mistralai/mistral-large-2411](/mistralai/mistral-large-2411) - [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1) - [mistralai/mistral-nemo](/mistralai/mistral-nemo) - [mistralai/mistral-7b-instruct](/mistralai/mistral-7b-instruct) - [mistralai/mixtral-8x7b-instruct](/mistralai/mixtral-8x7b-instruct) - [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct) - [mistralai/codestral-2508](/mistralai/codestral-2508) - [x-ai/grok-4](/x-ai/grok-4) - [x-ai/grok-3](/x-ai/grok-3) - [x-ai/grok-3-mini](/x-ai/grok-3-mini) - [deepseek/deepseek-r1](/deepseek/deepseek-r1) - [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct) - [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct) - [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct) - [meta-llama/llama-3.1-8b-instruct](/meta-llama/llama-3.1-8b-instruct) - [meta-llama/llama-3-70b-instruct](/meta-llama/llama-3-70b-instruct) - [meta-llama/llama-3-8b-instruct](/meta-llama/llama-3-8b-instruct) - [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b) - [qwen/qwen3-32b](/qwen/qwen3-32b) - [qwen/qwen3-14b](/qwen/qwen3-14b) - [cohere/command-r-plus-08-2024](/cohere/command-r-plus-08-2024) - [cohere/command-r-08-2024](/cohere/command-r-08-2024) - [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking) - [perplexity/sonar](/perplexity/sonar) Context: 2000000
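Because the Auto Router bills at the routed model's rate, reading the `model` attribute back from the response is the programmatic way to see what you were charged for. A hedged sketch, assuming the OpenAI-compatible endpoint and the `openrouter/auto` slug that follows this list's provider/model convention:

```python
# Sketch: send a prompt to the meta-model router and inspect which
# underlying model actually served the request.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="YOUR_OPENROUTER_KEY")  # placeholder key

resp = client.chat.completions.create(
    model="openrouter/auto",  # assumed slug; the router picks the target
    messages=[{"role": "user",
               "content": "Summarize RFC 2119 in one line."}],
)
# The routed model's id, which also determines the billed rate:
print(resp.model)
print(resp.choices[0].message.content)
```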
openrouter OpenAI: GPT-4 Turbo (older v1106) gpt-4-1106-preview 10.00 30.00 An earlier GPT-4 Turbo preview (v1106) with improved instruction following, JSON mode, reproducible outputs, and parallel function calling. Training data: up to April 2023. Context: 128000
openrouter Mistral: Mistral 7B Instruct v0.1 mistral-7b-instruct-v0.1 0.11 0.19 A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. Context: 2824
openrouter OpenAI: GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 This model is a variant of GPT-3.5 Turbo tuned for instructional prompts, omitting chat-related optimizations. Training data: up to Sep 2021. Context: 4095
openrouter OpenAI: GPT-3.5 Turbo 16k gpt-3.5-turbo-16k 3.00 4.00 This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021. Context: 16385
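The "approximately 20 pages" claim above is easy to sanity-check with common rules of thumb; the words-per-token and words-per-page constants below are generic assumptions, not figures from this entry:

```python
# Back-of-envelope: how many pages of text fit in a 16k context?
context_tokens = 16_385
words = context_tokens * 0.75  # ~0.75 English words per token (rule of thumb)
pages = words / 600            # ~600 words per dense page (assumption)
print(round(pages))            # ~20 pages
```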
openrouter Mancer: Weaver (alpha) weaver 0.75 1.00 An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations. Context: 8000
openrouter ReMM SLERP 13B remm-slerp-l2-13b 0.45 0.65 A recreation trial of the original MythoMax-L2-13B but with updated models. #merge Context: 6144
openrouter MythoMax 13B mythomax-l2-13b 0.06 0.06 One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge Context: 4096
openrouter OpenAI: GPT-4 (older v0314) gpt-4-0314 30.00 60.00 GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021. Context: 8191
openrouter OpenAI: GPT-4 gpt-4 30.00 60.00 OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021. Context: 8191
openrouter OpenAI: GPT-3.5 Turbo gpt-3.5-turbo 0.50 1.50 GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021. Context: 16385
factoryai glm-4.6 glm-4.6 - - -
factoryai claude-haiku-4-5-20251001 claude-haiku-4-5-20251001 - - -
factoryai gpt-5.1 gpt-5.1 - - -
factoryai gpt-5.1-codex gpt-5.1-codex - - -
factoryai gpt-5.1-codex-max gpt-5.1-codex-max - - -
factoryai gpt-5.2 gpt-5.2 - - -
factoryai gemini-3-pro-preview gemini-3-pro-preview - - -
factoryai gemini-3-flash-preview gemini-3-flash-preview - - -
factoryai claude-sonnet-4-5-20250929 claude-sonnet-4-5-20250929 - - -
factoryai claude-opus-4-5-20251101 claude-opus-4-5-20251101 - - -
zai GLM-4.7 glm-4.7 0.60 0.11 -
zai GLM-4.6 glm-4.6 0.60 0.11 -
zai GLM-4.6V glm-4.6v 0.30 0.05 -
zai GLM-4.6V-FlashX glm-4.6v-flashx 0.04 0.00 -
zai GLM-4.5 glm-4.5 0.60 0.11 -
zai GLM-4.5V glm-4.5v 0.60 0.11 -
zai GLM-4.5-X glm-4.5-x 2.20 0.45 -
zai GLM-4.5-Air glm-4.5-air 0.20 0.03 -
zai GLM-4.5-AirX glm-4.5-airx 1.10 0.22 -
zai GLM-4-32B-0414-128K glm-4-32b-0414-128k 0.10 - -
zai GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 -
zai GLM-4.5-Flash glm-4.5-flash 0.00 0.00 -