|
vercel
|
Grok Code Fast 1 |
grok-code-fast-1
|
0.20 |
1.50 |
xAI's latest coding model that offers fast agentic coding with a 256K context window.
|
|
|
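Each row above follows the same pattern: provider, display name, model slug, input price, output price, description. As a hedged illustration of how such a slug is typically consumed, here is a minimal TypeScript sketch in the Vercel AI SDK style, where plain model strings resolve through the AI Gateway; the `xai/` creator prefix is an assumption, since this listing shows only the bare slug.

```ts
import { generateText } from "ai";

// Minimal sketch, assuming AI SDK v5's gateway-string convention
// ("creator/slug"). The "xai/" prefix is a guess; the listing above
// shows only "grok-code-fast-1".
const { text, usage } = await generateText({
  model: "xai/grok-code-fast-1",
  prompt: "Write a function that deduplicates an array of strings.",
});

console.log(text);
console.log(usage); // token counts, handy for the cost math shown later
```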
vercel
|
Claude Sonnet 4.5 |
claude-sonnet-4.5
|
3.00 |
15.00 |
Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.
|
|
|
vercel
|
Gemini 2.5 Flash Lite |
gemini-2.5-flash-lite
|
0.10 |
0.40 |
Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
|
|
|
vercel
|
Gemini 3 Flash |
gemini-3-flash
|
0.50 |
3.00 |
Google's most intelligent model built for speed, combining frontier intelligence with superior search and grounding.
|
|
|
vercel
|
Claude Haiku 4.5 |
claude-haiku-4.5
|
1.00 |
5.00 |
Claude Haiku 4.5 matches Sonnet 4's performance on coding, computer use, and agent tasks at substantially lower cost and faster speeds. It delivers near-frontier performance and Claude’s unique character at a price point that works for scaled sub-agent deployments, free tier products, and intelligence-sensitive applications with budget constraints.
|
|
|
vercel
|
MiniMax M2 |
minimax-m2
|
0.27 |
1.15 |
MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence.
|
|
|
vercel
|
Gemini 2.5 Flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
|
|
|
vercel
|
DeepSeek V3.2 |
deepseek-v3.2
|
0.27 |
0.40 |
DeepSeek-V3.2: Official successor to V3.2-Exp.
|
|
|
vercel
|
Claude Opus 4.5 |
claude-opus-4.5
|
5.00 |
25.00 |
Claude Opus 4.5 is Anthropic’s latest model in the Opus series, meant for demanding reasoning tasks and complex problem solving. This model has improvements in general intelligence and vision compared to previous iterations. In addition, it is suited for difficult coding tasks and agentic workflows, especially those with computer use and tool use, and can effectively handle context usage and external memory files.
|
|
|
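If the two prices in each row are USD per million input and output tokens respectively (an assumption; this listing does not state units, though that is the common gateway convention), per-request cost is straightforward arithmetic. A sketch using the Claude Opus 4.5 rates from the entry above:

```ts
// Assumed units: USD per 1M tokens (not stated in the listing).
const INPUT_PER_M = 5.0;   // claude-opus-4.5 input price above
const OUTPUT_PER_M = 25.0; // claude-opus-4.5 output price above

function requestCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PER_M
  );
}

// 1,200 prompt tokens + 800 completion tokens:
// 0.0012 * 5.00 + 0.0008 * 25.00 = 0.006 + 0.020 = $0.026
console.log(requestCostUSD(1_200, 800)); // 0.026
```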
vercel
|
Claude 3.7 Sonnet |
claude-3.7-sonnet
|
3.00 |
15.00 |
Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
|
|
|
vercel
|
GPT-5.2 |
gpt-5.2
|
1.75 |
14.00 |
GPT-5.2 is OpenAI's best general-purpose model, part of the GPT-5 flagship model family. It's their most intelligent model yet for both general and agentic tasks.
|
|
|
vercel
|
Claude Sonnet 4 |
claude-sonnet-4
|
3.00 |
15.00 |
Claude Sonnet 4 significantly improves on Sonnet 3.7's industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.
|
|
|
vercel
|
Grok 4.1 Fast Non-Reasoning |
grok-4.1-fast-non-reasoning
|
0.20 |
0.50 |
Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for speed, use this variant; otherwise, use the reasoning version. (A tool-calling sketch follows this entry.)
|
|
|
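Since both Grok 4.1 Fast variants are pitched on tool calling, here is a hedged sketch of what a tool-calling request looks like in AI SDK style. The tool name, its schema, the stubbed result, and the `xai/` model prefix are all illustrative assumptions; `inputSchema` is the AI SDK v5 key (v4 called it `parameters`).

```ts
import { generateText, tool } from "ai";
import { z } from "zod";

// Illustrative tool; the model decides when to call it.
const getQuote = tool({
  description: "Look up the latest price for a stock ticker",
  inputSchema: z.object({ ticker: z.string() }), // "parameters" in AI SDK v4
  execute: async ({ ticker }) => ({ ticker, price: 123.45 }), // stubbed value
});

const { text } = await generateText({
  model: "xai/grok-4.1-fast-non-reasoning", // prefix assumed; listing shows the bare slug
  tools: { getQuote },
  prompt: "What is AAPL trading at right now?",
});

console.log(text);
```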
vercel
|
Gemini 3 Pro Preview |
gemini-3-pro-preview
|
2.00 |
12.00 |
This model improves upon Gemini 2.5 Pro and is catered towards challenging tasks, especially those involving complex reasoning or agentic workflows. Improvements highlighted include use cases for coding, multi-step function calling, planning, reasoning, deep knowledge tasks, and instruction following.
|
|
|
vercel
|
GPT-5 mini |
gpt-5-mini
|
0.25 |
2.00 |
GPT-5 mini is a cost optimized model that excels at reasoning/chat tasks. It offers an optimal balance between speed, cost, and capability.
|
|
|
vercel
|
GPT-5 |
gpt-5
|
1.25 |
10.00 |
GPT-5 is OpenAI's flagship language model that excels at complex reasoning, broad real-world knowledge, code-intensive, and multi-step agentic tasks.
|
|
|
vercel
|
GPT-5 Chat |
gpt-5-chat
|
1.25 |
10.00 |
GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT.
|
|
|
vercel
|
GPT-5 nano |
gpt-5-nano
|
0.05 |
0.40 |
GPT-5 nano is a high throughput model that excels at simple instruction or classification tasks.
|
|
|
vercel
|
GPT-4.1 mini |
gpt-4.1-mini
|
0.40 |
1.60 |
GPT-4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
|
|
|
vercel
|
GPT-5-Codex |
gpt-5-codex
|
1.25 |
10.00 |
GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
|
|
|
vercel
|
Gemini 2.5 Pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Gemini 2.5 Pro is our most advanced reasoning Gemini model, capable of solving complex problems. Gemini 2.5 Pro can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories.
|
|
|
vercel
|
GLM 4.6 |
glm-4.6
|
0.45 |
1.80 |
As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
|
|
|
vercel
|
Grok 4 Fast Non-Reasoning |
grok-4-fast-non-reasoning
|
0.20 |
0.50 |
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
|
|
|
vercel
|
gpt-oss-120b |
gpt-oss-120b
|
0.10 |
0.50 |
Extremely capable general-purpose LLM with strong, controllable reasoning capabilities
|
|
|
vercel
|
gpt-oss-safeguard-20b |
gpt-oss-safeguard-20b
|
0.08 |
0.30 |
OpenAI's first open weight reasoning model specifically trained for safety classification tasks. Fine-tuned from GPT-OSS, this model helps classify text content based on customizable policies, enabling bring-your-own-policy Trust & Safety AI where your own taxonomy, definitions, and thresholds guide classification decisions.
|
|
|
vercel
|
GPT-5.1 Instant |
gpt-5.1-instant
|
1.25 |
10.00 |
GPT-5.1 Instant (or GPT-5.1 chat) is a warmer and more conversational version of GPT-5-chat, with improved instruction following and adaptive reasoning for deciding when to think before responding.
|
|
|
vercel
|
GPT-4o mini |
gpt-4o-mini
|
0.15 |
0.60 |
GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
|
|
|
vercel
|
MiniMax M2.1 |
minimax-m2.1
|
0.30 |
1.20 |
MiniMax M2.1 is MiniMax's latest model, optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning.
|
|
|
vercel
|
Gemini 2.0 Flash |
gemini-2.0-flash
|
0.10 |
0.40 |
Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
|
|
|
vercel
|
Devstral 2 |
devstral-2
|
0.00 |
0.00 |
An enterprise-grade text model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
|
|
|
vercel
|
GPT 5.1 Thinking |
gpt-5.1-thinking
|
1.25 |
10.00 |
An upgraded version of GPT-5 that adapts thinking time more precisely to the question to spend more time on complex questions and respond more quickly to simpler tasks.
|
|
|
vercel
|
text-embedding-3-small |
text-embedding-3-small
|
0.02 |
0.00 |
OpenAI's improved, more performant version of their ada embedding model.
|
|
|
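Embedding rows in this listing show 0.00 for the output price, presumably because embedding calls bill input tokens only. A minimal sketch of using the entry above through the same assumed gateway-string convention (the `openai/` prefix is a guess):

```ts
import { embed, embedMany } from "ai";

// Single string -> one vector (text-embedding-3-small returns 1536 dimensions).
const { embedding } = await embed({
  model: "openai/text-embedding-3-small", // prefix assumed
  value: "semantic search over product docs",
});

// Batch variant for indexing jobs.
const { embeddings } = await embedMany({
  model: "openai/text-embedding-3-small",
  values: ["first document", "second document"],
});

console.log(embedding.length, embeddings.length);
```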
vercel
|
Grok 4.1 Fast Reasoning |
grok-4.1-fast-reasoning
|
0.20 |
0.50 |
Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for maximal intelligence, use this variant; otherwise, use the non-reasoning version.
|
|
|
vercel
|
DeepSeek V3.2 Thinking |
deepseek-v3.2-thinking
|
0.28 |
0.42 |
Thinking mode of DeepSeek V3.2
|
|
|
vercel
|
GLM 4.7 |
glm-4.7
|
0.43 |
1.75 |
GLM-4.7 is Z.ai’s latest flagship model, with major upgrades focused on two key areas: stronger coding capabilities and more stable multi-step reasoning and execution.
|
|
|
vercel
|
Ministral 3B |
ministral-3b
|
0.04 |
0.04 |
A compact, efficient model for on-device tasks like smart assistants and local analytics, offering low-latency performance.
|
|
|
vercel
|
Devstral Small 2 |
devstral-small-2
|
0.00 |
0.00 |
Our open source model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
|
|
|
vercel
|
Mistral Embed |
mistral-embed
|
0.10 |
0.00 |
General-purpose text embedding model for semantic search, similarity, clustering, and RAG workflows.
|
|
|
vercel
|
Nova Lite |
nova-lite
|
0.06 |
0.24 |
A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.
|
|
|
vercel
|
Claude Opus 4.1 |
claude-opus-4.1
|
15.00 |
75.00 |
Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks. Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, and handles complex, multi-step problems with more rigor and attention to detail.
|
|
|
vercel
|
Qwen3 Next 80B A3B Instruct |
qwen3-next-80b-a3b-instruct
|
0.09 |
1.10 |
A new generation of open-source, non-thinking mode model powered by Qwen3. This version demonstrates superior Chinese text understanding, augmented logical reasoning, and enhanced capabilities in text generation tasks over the previous iteration (Qwen3-235B-A22B-Instruct-2507).
|
|
|
vercel
|
GPT-4.1 |
gpt-4.1
|
2.00 |
8.00 |
GPT-4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
|
|
|
vercel
|
GPT-4o |
gpt-4o
|
2.50 |
10.00 |
GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
|
|
|
vercel
|
GPT-4.1 nano |
gpt-4.1-nano
|
0.10 |
0.40 |
GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model.
|
|
|
vercel
|
GPT 5.1 Codex Max |
gpt-5.1-codex-max
|
1.25 |
10.00 |
GPT‑5.1-Codex-Max is purpose-built for agentic coding.
|
|
|
vercel
|
Grok 4 Fast Reasoning |
grok-4-fast-reasoning
|
0.20 |
0.50 |
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
|
|
|
vercel
|
Grok 4 |
grok-4
|
3.00 |
15.00 |
xAI's latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
|
|
|
vercel
|
Nano Banana (Gemini 2.5 Flash Image) |
gemini-2.5-flash-image
|
0.30 |
2.50 |
Nano Banana (Gemini 2.5 Flash Image) is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
|
|
|
vercel
|
Nano Banana Pro (Gemini 3 Pro Image) |
gemini-3-pro-image
|
2.00 |
120.00 |
Nano Banana Pro (Gemini 3 Pro Image) builds on Nano Banana's generation capabilities into a new era of studio-quality, functional design to help you create and edit high-fidelity, production-ready visuals with unparalleled precision and control. Improvements include enhanced world knowledge and reasoning, dynamic text and translation, and studio level controls.
|
|
|
vercel
|
gpt-oss-20b |
gpt-oss-20b
|
0.07 |
0.30 |
A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments
|
|
|
vercel
|
Gemini Embedding 001 |
gemini-embedding-001
|
0.15 |
0.00 |
State-of-the-art embedding model with excellent performance across English, multilingual and code tasks.
|
|
|
vercel
|
o4-mini |
o4-mini
|
1.10 |
4.40 |
OpenAI's o4-mini delivers fast, cost-efficient reasoning with exceptional performance for its size, particularly excelling in math (best-performing on AIME benchmarks), coding, and visual tasks.
|
|
|
vercel
|
Sonar |
sonar
|
1.00 |
1.00 |
Perplexity's lightweight offering with search grounding, quicker and cheaper than Sonar Pro.
|
|
|
vercel
|
Kimi K2 0905 |
kimi-k2-0905
|
0.60 |
2.50 |
Kimi K2 0905 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.
|
|
|
vercel
|
Gemini 2.5 Flash Lite Preview 09-2025 |
gemini-2.5-flash-lite-preview-09-2025
|
0.10 |
0.40 |
Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
|
|
|
vercel
|
text-embedding-3-large |
text-embedding-3-large
|
0.13 |
0.00 |
OpenAI's most capable embedding model for both English and non-English tasks.
|
|
|
vercel
|
Gemini 2.0 Flash Lite |
gemini-2.0-flash-lite
|
0.08 |
0.30 |
Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
|
|
|
vercel
|
Claude Opus 4 |
claude-opus-4
|
15.00 |
75.00 |
Claude Opus 4 is Anthropic's most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.
|
|
|
vercel
|
Claude 3.5 Haiku |
claude-3.5-haiku
|
0.80 |
4.00 |
Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
|
|
|
vercel
|
GPT-5.2 Chat |
gpt-5.2-chat
|
1.75 |
14.00 |
GPT-5.2 Chat points to gpt-5.2-chat-latest, the snapshot currently powering ChatGPT. It is OpenAI's best general-purpose model, part of the GPT-5 flagship model family.
|
|
|
vercel
|
Gemini 2.5 Flash Preview 09-2025 |
gemini-2.5-flash-preview-09-2025
|
0.30 |
2.50 |
Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
|
|
|
vercel
|
GPT-5.1 Codex mini |
gpt-5.1-codex-mini
|
0.25 |
2.00 |
GPT-5.1 Codex mini is a smaller, faster, and cheaper version of GPT-5.1 Codex.
|
|
|
vercel
|
DeepSeek V3.2 Exp |
deepseek-v3.2-exp
|
0.27 |
0.40 |
DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
|
|
|
vercel
|
MiMo V2 Flash |
mimo-v2-flash
|
0.10 |
0.29 |
Xiaomi MiMo-V2-Flash is a proprietary MoE model developed by Xiaomi, designed for extreme inference efficiency with 309B total parameters (15B active). By incorporating an innovative Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks among the top 2 global open-source models across multiple Agent benchmarks.
|
|
|
vercel
|
DeepSeek V3 0324 |
deepseek-v3
|
0.77 |
0.77 |
Fast general-purpose LLM with enhanced reasoning capabilities
|
|
|
vercel
|
Mistral Small |
mistral-small
|
0.10 |
0.30 |
Mistral Small is the ideal choice for simple tasks that one can do in bulk - like Classification, Customer Support, or Text Generation. It offers excellent performance at an affordable price point.
|
|
|
vercel
|
o3 |
o3
|
2.00 |
8.00 |
OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
|
|
|
vercel
|
Qwen3 Max |
qwen3-max
|
1.20 |
6.00 |
Compared to the preview version, Qwen3 Max has undergone specialized upgrades in agent programming and tool invocation. The officially released model achieves state-of-the-art (SOTA) performance in its field and is better suited to agents operating in more complex scenarios.
|
|
|
vercel
|
Llama 3.3 70B |
llama-3.3-70b
|
0.72 |
0.72 |
An upgrade to Llama 3.1 70B featuring enhanced reasoning, tool use, and multilingual abilities, along with a significantly expanded 128K context window. These improvements make it well-suited for demanding tasks such as long-form summarization, multilingual conversations, and coding assistance.
|
|
|
vercel
|
Llama 3.1 8B |
llama-3.1-8b
|
0.03 |
0.05 |
Llama 3.1 8B brings powerful performance in a smaller, more efficient package. With improved multilingual support, tool use, and a 128K context length, it enables sophisticated use cases like interactive agents and compact coding assistants while remaining lightweight and accessible.
|
|
|
vercel
|
GPT-5.1-Codex |
gpt-5.1-codex
|
1.25 |
10.00 |
GPT-5.1-Codex is a version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
|
|
|
vercel
|
Kimi K2 Thinking |
kimi-k2-thinking
|
0.47 |
2.00 |
Kimi K2 Thinking is an advanced open-source thinking model by Moonshot AI. It can execute up to 200-300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
|
|
|
vercel
|
KAT-Coder-Pro V1 |
kat-coder-pro-v1
|
0.00 |
0.00 |
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KwaiKAT series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a remarkable 73.4% solve rate on the SWE-Bench Verified benchmark. KAT-Coder-Pro V1 delivers top-tier coding performance and has been rigorously tested by thousands of in-house engineers. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
|
|
|
vercel
|
Qwen3 235B A22B Instruct 2507 |
qwen-3-235b
|
0.13 |
0.60 |
Qwen3-235B-A22B-Instruct-2507 is the updated version of Qwen3-235B-A22B in non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
|
|
|
vercel
|
MiniMax M2.1 Lightning |
minimax-m2.1-lightning
|
0.30 |
2.40 |
MiniMax-M2.1-lightning is a faster version of MiniMax-M2.1, offering the same performance but with significantly higher throughput (output speed ~100 TPS, MiniMax-M2 output speed ~60 TPS).
|
|
|
vercel
|
Kimi K2 |
kimi-k2
|
0.50 |
2.00 |
Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
|
|
|
vercel
|
DeepSeek R1 0528 |
deepseek-r1
|
0.50 |
2.15 |
The latest revision of DeepSeek's first-generation reasoning model
|
|
|
vercel
|
text-embedding-ada-002 |
text-embedding-ada-002
|
0.10 |
0.00 |
OpenAI's legacy text embedding model.
|
|
|
vercel
|
Llama 4 Scout 17B 16E Instruct |
llama-4-scout
|
0.08 |
0.30 |
The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts. Served by DeepInfra.
|
|
|
vercel
|
o3-mini |
o3-mini
|
1.10 |
4.40 |
o3-mini is OpenAI's most recent small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini.
|
|
|
vercel
|
DeepSeek V3.1 Terminus |
deepseek-v3.1-terminus
|
0.27 |
1.00 |
DeepSeek-V3.1-Terminus delivers more stable & reliable outputs across benchmarks compared to the previous version and addresses user feedback (i.e. language consistency and agent upgrades).
|
|
|
vercel
|
Mistral Large 3 |
mistral-large-3
|
0.50 |
1.50 |
Mistral Large 3 2512 is Mistral’s most capable model to date. It has a sparse mixture-of-experts architecture with 41B active parameters (675B total).
|
|
|
vercel
|
Pixtral 12B 2409 |
pixtral-12b
|
0.15 |
0.15 |
A 12B model with image understanding capabilities in addition to text.
|
|
|
vercel
|
Sonar Pro |
sonar-pro
|
3.00 |
15.00 |
Perplexity's premier offering with search grounding, supporting advanced queries and follow-ups.
|
|
|
vercel
|
GLM-4.6V-Flash |
glm-4.6v-flash
|
0.00 |
0.00 |
For local deployment and low-latency applications. The GLM-4.6V series is Z.ai's latest iteration of its multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SoTA performance in visual understanding among models of similar parameter scales.
|
|
|
vercel
|
Kimi K2 Thinking Turbo |
kimi-k2-thinking-turbo
|
1.15 |
8.00 |
High-speed version of kimi-k2-thinking, suitable for scenarios requiring both deep reasoning and extremely fast responses
|
|
|
vercel
|
Llama 4 Maverick 17B 128E Instruct |
llama-4-maverick
|
0.15 |
0.60 |
Llama 4 Maverick 17B-128E is Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities.
|
|
|
vercel
|
DeepSeek V3.1 |
deepseek-v3.1
|
0.30 |
1.00 |
DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long-context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
|
|
|
vercel
|
Kimi K2 Turbo |
kimi-k2-turbo
|
2.40 |
10.00 |
Kimi K2 Turbo is the high-speed version of kimi-k2. It has the same model parameters as kimi-k2, but output speed is increased to 60 tokens per second (up to a maximum of 100), and the context length is 256K.
|
|
|
vercel
|
Grok 3 Mini Beta |
grok-3-mini
|
0.30 |
0.50 |
xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.
|
|
|
vercel
|
Claude 3.5 Sonnet |
claude-3.5-sonnet
|
3.00 |
15.00 |
The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
|
|
|
vercel
|
LongCat Flash Chat |
longcat-flash-chat
|
0.00 |
0.00 |
LongCat-Flash-Chat is a high-throughput MoE chat model (128k context) designed for agentic tasks.
|
|
|
vercel
|
Qwen3 Next 80B A3B Thinking |
qwen3-next-80b-a3b-thinking
|
0.15 |
1.50 |
A new generation of Qwen3-based open-source thinking mode models. This version offers improved instruction following and streamlined summary responses over the previous iteration (Qwen3-235B-A22B-Thinking-2507).
|
|
|
vercel
|
Qwen3 32B |
qwen-3-32b
|
0.10 |
0.30 |
Qwen3-32B is a world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. It excels in code-gen, tool-calling, and advanced reasoning, making it an exceptional model for a wide range of production use cases.
|
|
|
vercel
|
Claude 3 Haiku |
claude-3-haiku
|
0.25 |
1.25 |
Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. It can quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
|
|
|
vercel
|
Qwen3 VL 235B A22B Instruct |
qwen3-vl-instruct
|
0.70 |
2.80 |
The Qwen3 series VL models have been comprehensively upgraded in areas such as visual coding and spatial perception. Their visual perception and recognition capabilities have significantly improved, they support the understanding of ultra-long videos, and their OCR functionality has undergone a major enhancement.
|
|
|
vercel
|
Text Embedding 005 |
text-embedding-005
|
0.03 |
0.00 |
English-focused text embedding model optimized for code and English language tasks.
|
|
|
vercel
|
Nano Banana Preview (Gemini 2.5 Flash Image Preview) |
gemini-2.5-flash-image-preview
|
0.30 |
2.50 |
Gemini 2.5 Flash Image Preview is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
|
|
|
vercel
|
GPT-5.2 pro |
gpt-5.2-pro
|
21.00 |
168.00 |
Version of GPT-5.2 that produces smarter and more precise responses.
|
|
|
vercel
|
Qwen 3 Coder 30B A3B Instruct |
qwen3-coder-30b-a3b
|
0.07 |
0.27 |
Efficient coding specialist balancing performance with cost-effectiveness for daily development tasks while maintaining strong tool integration capabilities.
|
|
|
vercel
|
Qwen3 Coder 480B A35B Instruct |
qwen3-coder
|
0.38 |
1.53 |
Mixture-of-experts LLM with advanced coding and reasoning capabilities
|
|
|
vercel
|
Grok 2 Vision |
grok-2-vision
|
2.00 |
10.00 |
Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
|
|
|
vercel
|
Morph V3 Fast |
morph-v3-fast
|
0.80 |
1.20 |
Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast, at 4,500+ tokens/second. It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
|
|
|
vercel
|
Grok 3 Beta |
grok-3
|
3.00 |
15.00 |
xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
|
|
|
vercel
|
Nova Micro |
nova-micro
|
0.04 |
0.14 |
A text-only model that delivers the lowest latency responses at very low cost.
|
|
|
vercel
|
Ministral 14B |
ministral-14b
|
0.20 |
0.20 |
Ministral 3 14B is the largest model in the Ministral 3 family, offering state-of-the-art capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. Optimized for local deployment, it delivers high performance across diverse hardware.
|
|
|
vercel
|
Ministral 8B |
ministral-8b
|
0.10 |
0.10 |
A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.
|
|
|
vercel
|
Mistral Codestral |
codestral
|
0.30 |
0.90 |
Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation.
|
|
|
vercel
|
Claude 3 Opus |
claude-3-opus
|
15.00 |
75.00 |
Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
|
|
|
vercel
|
Pixtral Large |
pixtral-large
|
2.00 |
6.00 |
Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
|
|
|
vercel
|
GPT-4 Turbo |
gpt-4-turbo
|
10.00 |
30.00 |
gpt-4-turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
|
|
|
vercel
|
voyage-3.5 |
voyage-3.5
|
0.06 |
0.00 |
Voyage AI's embedding model optimized for general-purpose and multilingual retrieval quality.
|
|
|
vercel
|
Llama 3.1 70B Instruct |
llama-3.1-70b
|
0.40 |
0.40 |
An update to Meta Llama 3 70B Instruct that includes an expanded 128K context length, multilinguality and improved reasoning capabilities.
|
|
|
vercel
|
Nemotron 3 Nano 30B A3B |
nemotron-3-nano-30b-a3b
|
0.06 |
0.24 |
NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.
|
|
|
vercel
|
Qwen3 VL 235B A22B Thinking |
qwen3-vl-thinking
|
0.70 |
8.40 |
Qwen3 series VL models feature significantly enhanced multimodal reasoning capabilities, with a particular focus on optimizing the model for STEM and mathematical reasoning. Visual perception and recognition abilities have been comprehensively improved, and OCR capabilities have undergone a major upgrade.
|
|
|
vercel
|
Sonar Reasoning Pro |
sonar-reasoning-pro
|
2.00 |
8.00 |
A premium reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing comprehensive explanations with enhanced search capabilities and multiple search queries per request.
|
|
|
vercel
|
GPT-3.5 Turbo |
gpt-3.5-turbo
|
0.50 |
1.50 |
OpenAI's most capable and cost effective model in the GPT-3.5 family optimized for chat purposes, but also works well for traditional completions tasks.
|
|
|
vercel
|
Qwen3 Embedding 8B |
qwen3-embedding-8b
|
0.05 |
0.00 |
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
|
|
|
vercel
|
Mistral Medium 3.1 |
mistral-medium
|
0.40 |
2.00 |
Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost.
|
|
|
vercel
|
INTELLECT 3 |
intellect-3
|
0.20 |
1.10 |
INTELLECT-3 is a 100B+ parameter MoE model with RL scaled on our end-to-end stack, achieving state-of-the-art performance for its size across math, code, and reasoning.
|
|
|
vercel
|
Nvidia Nemotron Nano 12B V2 VL |
nemotron-nano-12b-v2-vl
|
0.20 |
0.60 |
An auto-regressive vision-language model built on an optimized transformer architecture. It enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities.
|
|
|
vercel
|
Qwen3-14B |
qwen-3-14b
|
0.06 |
0.24 |
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
|
|
|
vercel
|
Embed v4.0 |
embed-v4.0
|
0.12 |
0.00 |
A model that allows for text, images, or mixed content to be classified or turned into embeddings.
|
|
|
vercel
|
GLM 4.5 Air |
glm-4.5-air
|
0.20 |
1.10 |
GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
|
|
|
vercel
|
GPT-5 pro |
gpt-5-pro
|
15.00 |
120.00 |
GPT-5 pro uses more compute to think harder and provide consistently better answers. Since GPT-5 pro is designed to tackle tough problems, some requests may take several minutes to finish.
|
|
|
vercel
|
Llama 3.2 3B Instruct |
llama-3.2-3b
|
0.15 |
0.15 |
Text-only model, fine-tuned for supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
|
|
|
vercel
|
voyage-3-large |
voyage-3-large
|
0.18 |
0.00 |
Voyage AI's embedding model with the best general-purpose and multilingual retrieval quality.
|
|
|
vercel
|
Titan Text Embeddings V2 |
titan-embed-text-v2
|
0.02 |
0.00 |
Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.
|
|
|
vercel
|
Grok 3 Fast Beta |
grok-3-fast
|
5.00 |
25.00 |
xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
|
|
|
vercel
|
Qwen3 235B A22B Thinking 2507 |
qwen3-235b-a22b-thinking
|
0.30 |
2.90 |
Qwen3-235B-A22B-Thinking-2507 is the new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.
|
|
|
vercel
|
v0-1.5-md |
v0-1.5-md
|
3.00 |
15.00 |
Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
|
|
|
vercel
|
Qwen3 Coder Plus |
qwen3-coder-plus
|
1.00 |
5.00 |
Powered by Qwen3, this is a powerful coding agent that excels in tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
|
|
|
vercel
|
Qwen3 Embedding 4B |
qwen3-embedding-4b
|
0.02 |
0.00 |
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
|
|
|
vercel
|
Grok 3 Mini Fast Beta |
grok-3-mini-fast
|
0.60 |
4.00 |
xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
|
|
|
vercel
|
v0-1.0-md |
v0-1.0-md
|
3.00 |
15.00 |
Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
|
|
|
vercel
|
Qwen3-30B-A3B |
qwen-3-30b
|
0.08 |
0.29 |
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
|
|
|
vercel
|
o3 Pro |
o3-pro
|
20.00 |
80.00 |
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
|
|
|
vercel
|
GLM-4.6V |
glm-4.6v
|
0.30 |
0.90 |
The GLM-4.6V series is Z.ai's latest iteration of its multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SoTA performance in visual understanding among models of similar parameter scales.
|
|
|
vercel
|
Grok 2 |
grok-2
|
2.00 |
10.00 |
Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
|
|
|
vercel
|
Claude 3.5 Sonnet (2024-06-20) |
claude-3.5-sonnet-20240620
|
3.00 |
15.00 |
Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
|
|
|
vercel
|
Nova Pro |
nova-pro
|
0.80 |
3.20 |
A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.
|
|
|
vercel
|
Command A |
command-a
|
2.50 |
10.00 |
Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
|
|
|
vercel
|
Nova 2 Lite |
nova-2-lite
|
0.30 |
2.50 |
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text.
|
|
|
vercel
|
Sonoma Sky Alpha |
sonoma-sky-alpha
|
0.20 |
0.50 |
This model is no longer in stealth and gets responses from Grok 4 Fast Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
|
|
|
vercel
|
Sonoma Dusk Alpha |
sonoma-dusk-alpha
|
0.20 |
0.50 |
This model is no longer in stealth and gets responses from Grok 4 Fast Non-Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
|
|
|
vercel
|
Llama 3.2 1B Instruct |
llama-3.2-1b
|
0.10 |
0.10 |
Text-only model, supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
|
|
|
vercel
|
o1 |
o1
|
15.00 |
60.00 |
o1 is OpenAI's flagship reasoning model, designed for complex problems that require deep thinking. It provides strong reasoning capabilities with improved accuracy for complex multi-step tasks.
|
|
|
vercel
|
GLM 4.5V |
glm-4.5v
|
0.60 |
1.80 |
Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
|
|
|
vercel
|
GLM 4.5 |
glm-4.5
|
0.60 |
2.20 |
GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
|
|
|
vercel
|
Qwen3 Max Preview |
qwen3-max-preview
|
1.20 |
6.00 |
Qwen3-Max-Preview shows substantial gains over the 2.5 series in overall capability, with significant enhancements in Chinese-English text understanding, complex instruction following, handling of subjective open-ended tasks, multilingual ability, and tool invocation; model knowledge hallucinations are reduced.
|
|
|
vercel
|
Devstral Small 1.1 |
devstral-small
|
0.10 |
0.30 |
Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents.
|
|
|
vercel
|
voyage-3.5-lite |
voyage-3.5-lite
|
0.02 |
0.00 |
Voyage AI's embedding model optimized for latency and cost.
|
|
|
vercel
|
FLUX.1 Kontext Max |
flux-kontext-max
|
0.00 |
0.00 |
FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed.
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
Imagen 4 Fast |
imagen-4.0-fast-generate-001
|
0.00 |
0.00 |
Imagen 4 Fast is Google’s speed-optimized variant of the Imagen 4 text-to-image model, designed for rapid, high-volume image generation. It’s ideal for workflows like quick drafts, mockups, and iterative creative exploration. Despite emphasizing speed, it still benefits from the broader Imagen 4 family’s improvements in clarity, text rendering, and stylistic flexibility, and supports high-resolution outputs up to 2K.
|
|
|
vercel
|
o3-deep-research |
o3-deep-research
|
10.00 |
40.00 |
o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data—brought in through MCP connectors.
|
|
|
vercel
|
FLUX1.1 [pro] |
flux-pro-1.1
|
0.00 |
0.00 |
FLUX1.1 [pro] is the standard for text-to-image generation with fast, reliable and consistently stunning results.
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
Imagen 4 |
imagen-4.0-generate-001
|
0.00 |
0.00 |
Imagen 4: Google's flagship text-to-image model that serves as the go-to choice for a wide variety of high-quality image generation tasks, featuring significant improvements in text rendering over previous models. It now supports up to 2K resolution generation for creating detailed and crisp visuals, making it suitable for everything from marketing assets to artistic compositions.
|
|
|
vercel
|
FLUX.2 [flex] |
flux-2-flex
|
0.00 |
0.00 |
FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [flex] supports customizable image generation and editing with adjustable steps and guidance. It's better at typography and text rendering. It supports up to 10 reference images (up to 14 MP total input).
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
FLUX.2 [pro] |
flux-2-pro
|
0.00 |
0.00 |
FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [pro] supports generation, editing, and multiple reference images (up to 9 MP total input).
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
Imagen 4 Ultra |
imagen-4.0-ultra-generate-001
|
0.00 |
0.00 |
Imagen 4 Ultra: Highest quality image generation model for detailed and photorealistic outputs.
|
|
|
vercel
|
Sonar Reasoning |
sonar-reasoning
|
1.00 |
5.00 |
A reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing detailed explanations with search grounding.
|
|
|
vercel
|
FLUX1.1 [pro] Ultra |
flux-pro-1.1-ultra
|
0.00 |
0.00 |
FLUX1.1 [pro] Ultra delivers ultra-fast, ultra-high-resolution image creation, with more pixels in every picture. Generate varying aspect ratios from text at 4MP resolution, fast.
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
FLUX.1 Kontext Pro |
flux-kontext-pro
|
0.00 |
0.00 |
FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed.
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
GPT-3.5 Turbo Instruct |
gpt-3.5-turbo-instruct
|
1.50 |
2.00 |
Similar capabilities to GPT-3-era models. Compatible with the legacy Completions endpoint, not Chat Completions.
|
|
|
vercel
|
Llama 3.2 90B Vision Instruct |
llama-3.2-90b
|
0.72 |
0.72 |
Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
|
|
|
vercel
|
Qwen3 Embedding 0.6B |
qwen3-embedding-0.6b
|
0.01 |
0.00 |
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
|
|
|
vercel
|
Trinity Mini |
trinity-mini
|
0.05 |
0.15 |
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows.
|
|
|
vercel
|
FLUX.1 Fill [pro] |
flux-pro-1.0-fill
|
0.00 |
0.00 |
A state-of-the-art inpainting model, enabling editing and expansion of real and generated images given a text description and a binary mask.
This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
|
|
|
vercel
|
FLUX.2 [max] |
flux-2-max
|
0.00 |
0.00 |
FLUX.2 [max] offers image generation and image editing with the highest quality available. It delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. Built for professional use, FLUX.2 [max] produces production-ready outputs for marketing teams, creatives, filmmakers, and creators around the world.
|
|
|
vercel
|
Text Multilingual Embedding 002 |
text-multilingual-embedding-002
|
0.03 |
0.00 |
Multilingual text embedding model optimized for cross-lingual tasks across many languages.
|
|
|
vercel
|
Mercury Coder Small Beta |
mercury-coder-small
|
0.25 |
1.00 |
Mercury Coder Small is ideal for code generation, debugging, and refactoring tasks with minimal latency.
|
|
|
vercel
|
LongCat Flash Thinking |
longcat-flash-thinking
|
0.15 |
1.50 |
LongCat-Flash-Thinking is a high-throughput MoE reasoning model (128k context) optimized for agentic tasks.
|
|
|
vercel
|
Llama 3.2 11B Vision Instruct |
llama-3.2-11b
|
0.16 |
0.16 |
Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
|
|
|
vercel
|
Codestral Embed |
codestral-embed
|
0.15 |
0.00 |
Code embedding model that can embed code databases and repositories to power coding assistants.
|
|
|
vercel
|
Magistral Medium 2509 |
magistral-medium
|
2.00 |
5.00 |
Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
|
|
|
vercel
|
Magistral Small 2509 |
magistral-small
|
0.50 |
1.50 |
Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
|
|
|
vercel
|
Mistral Nemo |
mistral-nemo
|
0.04 |
0.17 |
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.
|
|
|
vercel
|
Mixtral MoE 8x22B Instruct |
mixtral-8x22b-instruct
|
1.20 |
1.20 |
Mixtral 8x22B Instruct: an open-source mixture-of-experts model by Mistral, served by Fireworks.
|
|
|
vercel
|
Morph V3 Large |
morph-v3-large
|
0.90 |
1.90 |
Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files FAST - 2500+ tokens/second. It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
|
|
|
vercel
|
Nvidia Nemotron Nano 9B V2 |
nemotron-nano-9b-v2
|
0.04 |
0.16 |
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the final answer without intermediate reasoning traces, the model can be configured to do so.
|
|
|
vercel
|
Codex Mini |
codex-mini
|
1.50 |
6.00 |
Codex Mini is a fine-tuned version of o4-mini specifically for use in Codex CLI.
|
|
|
vercel
|
voyage-code-2 |
voyage-code-2
|
0.12 |
0.00 |
Voyage AI's embedding model optimized for code retrieval (17% better than alternatives). This is the previous generation of code embeddings models.
|
|
|
vercel
|
voyage-code-3 |
voyage-code-3
|
0.18 |
0.00 |
Voyage AI's embedding model optimized for code retrieval.
|
|
|
vercel
|
voyage-finance-2 |
voyage-finance-2
|
0.12 |
0.00 |
Voyage AI's embedding model optimized for finance retrieval and RAG.
|
|
|
vercel
|
voyage-law-2 |
voyage-law-2
|
0.12 |
0.00 |
Voyage AI's embedding model optimized for legal retrieval and RAG.
|
|
|
together
|
Llama 4 Maverick |
llama-4-maverick
|
0.27 |
0.85 |
-
|
|
|
together
|
Llama 4 Scout |
llama-4-scout
|
0.18 |
0.59 |
-
|
|
|
together
|
Llama 3.3 70B Instruct-Turbo |
llama-3-3-70b-instruct-turbo
|
0.88 |
0.88 |
-
|
|
|
together
|
Llama 3.2 3B Instruct Turbo |
llama-3-2-3b-instruct-turbo
|
0.06 |
0.06 |
-
|
|
|
together
|
Llama 3.1 405B Instruct Turbo |
llama-3-1-405b-instruct-turbo
|
3.50 |
3.50 |
-
|
|
|
together
|
Llama 3.1 70B Instruct Turbo |
llama-3-1-70b-instruct-turbo
|
0.88 |
0.88 |
-
|
|
|
together
|
Llama 3.1 8B Instruct Turbo |
llama-3-1-8b-instruct-turbo
|
0.18 |
0.18 |
-
|
|
|
together
|
Llama 3 8B Instruct Lite |
llama-3-8b-instruct-lite
|
0.10 |
0.10 |
-
|
|
|
together
|
Llama 3 70B Instruct Reference |
llama-3-70b-instruct-reference
|
0.88 |
0.88 |
-
|
|
|
together
|
Llama 3 70B Instruct Turbo |
llama-3-70b-instruct-turbo
|
0.88 |
0.88 |
-
|
|
|
together
|
LLaMA-2 |
llama-2
|
0.90 |
0.90 |
-
|
|
|
together
|
DeepSeek-R1 |
deepseek-r1
|
3.00 |
7.00 |
-
|
|
|
together
|
DeepSeek R1 Distilled Qwen 14B |
deepseek-r1-distilled-qwen-14b
|
0.18 |
0.18 |
-
|
|
|
together
|
DeepSeek R1 Distilled Llama 70B |
deepseek-r1-distilled-llama-70b
|
2.00 |
2.00 |
-
|
|
|
together
|
DeepSeek R1-0528-tput |
deepseek-r1-0528-tput
|
0.55 |
2.19 |
-
|
|
|
together
|
DeepSeek-V3-1 |
deepseek-v3-1
|
0.60 |
1.70 |
-
|
|
|
together
|
DeepSeek-V3 |
deepseek-v3
|
1.25 |
1.25 |
-
|
|
|
together
|
gpt-oss-120B |
gpt-oss-120b
|
0.15 |
0.60 |
-
|
|
|
together
|
gpt-oss-20B |
gpt-oss-20b
|
0.05 |
0.20 |
-
|
|
|
together
|
Qwen3 Next 80B A3B Instruct |
qwen3-next-80b-a3b-instruct
|
0.15 |
1.50 |
-
|
|
|
together
|
Qwen3 Next 80B A3B Thinking |
qwen3-next-80b-a3b-thinking
|
0.15 |
1.50 |
-
|
|
|
together
|
Qwen3-VL 32B Instruct |
qwen3-vl-32b-instruct
|
0.50 |
1.50 |
-
|
|
|
together
|
Qwen3-Coder 480B A35B Instruct |
qwen3-coder-480b-a35b-instruct
|
2.00 |
2.00 |
-
|
|
|
together
|
Qwen3 235B A22B Instruct 2507 FP8 |
qwen3-235b-a22b-instruct-2507-fp8
|
0.20 |
0.60 |
-
|
|
|
together
|
Qwen3 235B A22B Thinking 2507 FP8 |
qwen3-235b-a22b-thinking-2507-fp8
|
0.65 |
3.00 |
-
|
|
|
together
|
Qwen3 235B A22B FP8 Throughput |
qwen3-235b-a22b-fp8-throughput
|
0.20 |
0.60 |
-
|
|
|
together
|
Qwen 2.5 72B |
qwen-2-5-72b
|
1.20 |
1.20 |
-
|
|
|
together
|
Qwen2.5-VL 72B Instruct |
qwen2-5-vl-72b-instruct
|
1.95 |
8.00 |
-
|
|
|
together
|
Qwen2.5 Coder 32B Instruct |
qwen2-5-coder-32b-instruct
|
0.80 |
0.80 |
-
|
|
|
together
|
Qwen2.5 7B Instruct Turbo |
qwen2-5-7b-instruct-turbo
|
0.30 |
0.30 |
-
|
|
|
together
|
Qwen QwQ-32B |
qwen-qwq-32b
|
1.20 |
1.20 |
-
|
|
|
together
|
GLM-4.6 |
glm-4-6
|
0.60 |
2.20 |
-
|
|
|
together
|
GLM-4.5-Air |
glm-4-5-air
|
0.20 |
1.10 |
-
|
|
|
together
|
Kimi K2 Instruct |
kimi-k2-instruct
|
1.00 |
3.00 |
-
|
|
|
together
|
Kimi K2 Thinking |
kimi-k2-thinking
|
1.20 |
4.00 |
-
|
|
|
together
|
Kimi K2 0905 |
kimi-k2-0905
|
1.00 |
3.00 |
-
|
|
|
together
|
Mistral (7B) Instruct v0.2 |
mistral-7b-instruct-v0-2
|
0.20 |
0.20 |
-
|
|
|
together
|
Mistral Instruct |
mistral-instruct
|
0.20 |
0.20 |
-
|
|
|
together
|
Mistral Small 3 |
mistral-small-3
|
0.80 |
0.80 |
-
|
|
|
together
|
Mixtral 8x7B Instruct v0.1 |
mixtral-8x7b-instruct-v0-1
|
0.60 |
0.60 |
-
|
|
|
together
|
Marin 8B Instruct |
marin-8b-instruct
|
0.18 |
0.18 |
-
|
|
|
together
|
Arcee AI AFM-4.5B |
arcee-ai-afm-4-5b
|
0.10 |
0.40 |
-
|
|
|
together
|
Arcee AI Coder-Large |
arcee-ai-coder-large
|
0.50 |
0.80 |
-
|
|
|
together
|
Arcee AI Maestro |
arcee-ai-maestro
|
0.90 |
3.30 |
-
|
|
|
together
|
Arcee AI Virtuoso-Large |
arcee-ai-virtuoso-large
|
0.75 |
1.20 |
-
|
|
|
together
|
Cogito v2 preview - 109B MoE |
cogito-v2-preview-109b-moe
|
0.18 |
0.59 |
-
|
|
|
together
|
Cogito v2 preview - 405B |
cogito-v2-preview-405b
|
3.50 |
3.50 |
-
|
|
|
together
|
Cogito v2 preview - 671B MoE |
cogito-v2-preview-671b-moe
|
1.25 |
1.25 |
-
|
|
|
together
|
Cogito v2 preview - 70B |
cogito-v2-preview-70b
|
0.88 |
0.88 |
-
|
|
|
together
|
Refuel LLM-2 |
refuel-llm-2
|
0.60 |
0.60 |
-
|
|
|
together
|
Refuel LLM-2 Small |
refuel-llm-2-small
|
0.20 |
0.20 |
-
|
|
|
together
|
Typhoon 2 70B Instruct |
typhoon-2-70b-instruct
|
0.88 |
0.88 |
-
|
|
|
together
|
gemma-3n-E4B-it |
gemma-3n-e4b-it
|
0.02 |
0.04 |
-
|
|
|
poe
|
- |
assistant
|
- |
- |
General-purpose assistant. Write, code, ask for real-time information, create images, and more.
Queries are automatically routed based on the task and subscription status.
For subscribers:
- General queries: @GPT-5.2-Instant
- Web searches: @Web-Search
- Image generation: @Nano-Banana
- Video-input tasks: @Gemini-2.5-Pro
For non-subscribers:
- General queries: @GPT-4o-Mini
- Web searches: @Web-Search
- Image generation: @FLUX-schnell
- Video-input tasks: @Gemini-2.5-Flash
|
|
|
poe
|
- |
gpt-5.2-instant
|
1.60 |
13.00 |
A fast, steady conversational model built for day-to-day use. It handles long threads without drifting, keeps context clean, and answers in a straightforward way. Good for planning, rewriting, summarizing, and quick technical help. Supports 400k tokens of context and native vision.
Optional parameters:
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
claude-opus-4.5
|
4.30 |
21.00 |
Claude Opus 4.5 from Anthropic, supports customizable thinking budget (up to 64k tokens) and 200k context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63,999 to the end of your message.
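For example, a message ending in `--thinking_budget 32000` (a hypothetical mid-range value) asks the model to spend up to roughly 32k tokens thinking before answering.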
|
|
|
poe
|
- |
gemini-3-flash
|
0.40 |
2.40 |
Building on the reasoning capabilities of Gemini 3 Pro, Gemini 3 Flash is a powerful but affordable and performant model. It has exceptional world knowledge, multimodal understanding and reasoning capabilities at a fraction of the cost of equivalent models (as of December 2025).
Optional parameters:
To set the thinking level, add --thinking_level with one of `minimal`, `low`, or `high`. This is set to `low` by default.
To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
|
|
|
poe
|
- |
gemini-3-pro
|
1.60 |
9.60 |
Gemini 3 Pro is a state-of-the-art model for math, coding, computer use, and long‑horizon agent tasks, delivering top benchmark results including 23.4% on MathArena Apex (up from 1.6%), SOTA on tau-bench, an Elo of 2,439 on LiveCodeBench Pro (vs. 2,234), 72.7% on ScreenSpot‑Pro (~2× the previous best), and a higher mean net worth on Vending‑Bench 2 ($5,478 vs. $3,838). It has a 1M input context window and a max output tokens of 64k.
Optional Parameters:
To instruct the bot to use more thinking effort, select from "Low" or "High".
To enable web search and real-time information access, toggle "enable web search". This is disabled by default.
|
|
|
poe
|
- |
gpt-5.2-pro
|
19.00 |
150.00 |
A powerful reasoning model that is ideal for your most complex, highest-difficulty tasks. On x-high reasoning effort, it scores 90.5% on the ARC-AGI-1 benchmark, an incredibly difficult problem-solving benchmark where humans score 100%. Note: the model can take up to 30 minutes to think through a problem and is quite expensive. Supports 400k tokens of context and native vision.
Optional parameters:
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "medium", "high" or "Xhigh" (default: "medium")
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
Use `--verbosity` to control response details at the end of your message with one of "low", "medium", or "high" (default: medium)
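For example (a hypothetical prompt): `Audit this contract for risk --reasoning_effort high --web_search true --verbosity low`.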
|
|
|
poe
|
- |
gpt-5.2
|
1.60 |
13.00 |
GPT-5.2 is a state-of-the-art AI model from OpenAI designed for real work across writing, analysis, coding, and problem solving. It handles long contexts and multi-step tasks better than earlier versions, and it’s tuned to give accurate responses with fewer errors. Supports 400k tokens of context, and native vision.
Optional parameters:
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", "high", or "Xhigh" (default: "None")
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
Use `--verbosity` to control response details at the end of your message with one of "low", "medium", or "high" (default: medium)
|
|
|
poe
|
- |
claude-sonnet-4.5
|
2.60 |
13.00 |
Claude Sonnet 4.5 represents a major leap forward in AI capability and alignment. It is the most advanced model released by Anthropic to date, distinguished by dramatic improvements in reasoning, mathematics, and real-world coding. Supports 1M tokens of context.
To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message.
Use `--web_search true` to enable web search and real-time information access. This is disabled by default.
|
|
|
poe
|
- |
grok-4
|
3.00 |
15.00 |
Grok 4 is xAI's latest and most intelligent language model. It features state-of-the-art capabilities in coding, reasoning, and answering questions. It excels at handling complex and multi-step tasks. Reasoning traces are not available via the xAI API.
|
|
|
poe
|
- |
claude-haiku-4.5
|
0.85 |
4.30 |
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, and scores >73% on SWE-bench verified, ranking among the world's best coding models. Supports 200k tokens of context.
Optional parameters:
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63,999 to the end of your message.
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
claude-opus-4.1
|
13.00 |
64.00 |
Claude Opus 4.1 from Anthropic, supports customizable thinking budget (up to 32k tokens) and 200k context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 31,999 to the end of your message.
|
|
|
poe
|
- |
glm-4.7
|
- |
- |
GLM-4.7 is Z.AI's latest flagship model, with major upgrades focused on advanced coding capabilities and more reliable multi-step reasoning and execution. It shows clear gains in complex agent workflows, while delivering a more natural conversational experience and stronger front-end design sensibility.
File Support: Text, Markdown and PDF files
Context window: 205k tokens
Optional parameters:
Use `--enable_thinking true` to enable thinking about the response before giving a final answer. This is disabled by default.
Use `--temperature` and set a number from 0 to 2 to control randomness in the response. Lower values make the output more focused and deterministic. This is set to 0.7 by default.
Use `--max_output_token` and set a number from 1 to 131,072 to control the number of tokens generated in the response. This is set to 131,072 by default.
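For example (hypothetical values): `Refactor this module for readability --enable_thinking true --temperature 0.4`.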
|
|
|
poe
|
- |
minimax-m2.1
|
- |
- |
MiniMax M2.1 is a cutting-edge AI model designed to revolutionize how developers build software. With enhanced multi-language programming support, it excels in generating high-quality code across popular languages like Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript.
Key improvements include:
22% faster response times and 30% lower token consumption for efficient workflows.
Seamless integration with leading development frameworks (Claude Code, Droid Factory AI, BlackBox, etc.).
Full-stack development capabilities, from mobile (Android/iOS) to web and 3D interactive prototyping.
Optimized performance-to-cost ratio, making AI-assisted development more accessible.
Whether you're a software engineer, app developer, or tech innovator, M2.1 empowers smarter coding with industry-leading AI.
File Support: Text, Markdown and PDF files
Context window: 205k tokens
Optional parameters:
Use `--enable_thinking true` to enable thinking about the response before giving a final answer. This is disabled by default.
Use `--temperature` and set a number from 0 to 2 to control randomness in the response. Lower values make the output more focused and deterministic. This is set to 0.7 by default.
Use `--max_output_token` and set a number from 1 to 131,072 to control the number of tokens generated in the response. This is set to 131,072 by default.
|
|
|
poe
|
- |
gemini-2.5-flash
|
0.21 |
1.80 |
Gemini 2.5 Flash builds upon the popular foundation of Google's 2.0 Flash; this new version delivers a major upgrade in reasoning, search, and image/video understanding while still prioritizing speed and cost. Supports 1M tokens of input context. Serves the latest `gemini-2.5-flash-preview-09-2025` snapshot.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 24,576 to the end of your message.
To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
|
|
|
poe
|
- |
gemini-2.5-pro
|
0.87 |
7.00 |
Gemini 2.5 Pro is Google's advanced model with frontier performance on various key benchmarks; supports web search and 1 million tokens of input context.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 32,768 to the end of your message.
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
kling-omni
|
- |
- |
Bot for Kling Omni Image-to-Video inference. Send one image for image-to-video generation and two images for first-to-last-frame video generation. Set the duration with `--duration` to either 5 or 10 seconds.
Accepted file types: jpeg, png, webp, heic, heif.
This bot does not accept video files.
Note: Prompt is required after attaching images to generate video.
|
|
|
poe
|
- |
deepseek-r1
|
18,000.00 |
- |
Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to Together AI, a US-based company. Supports 164k tokens of input context and 33k tokens of output context. Uses the latest May 28th snapshot (DeepSeek-R1-0528).
|
|
|
poe
|
- |
manus
|
- |
- |
Manus is an autonomous AI agent that executes tasks. It can take a high-level prompt, break it into subtasks, interact with tools/APIs, and deliver end-to-end results (like reports, code, websites, images, and more) without you managing each step.
Notes:
- In Agent mode, responses may take several minutes to complete.
- Sometimes, files that Manus has created are incorrectly uploaded to the Poe message. In such cases, please check the Manus chat for the file.
Parameter controls available:
1. Task Mode
- Default: '--task_mode adaptive' (smart routing: may choose Chat or Agent)
- Conversational single turn: '--task_mode chat' (fixed price)
- Autonomous multi-step: '--task_mode agent'
2. Agent Profile
- Default: '--agent_profile manus-1.6' (standard tasks)
- Lower usage: '--agent_profile manus-1.6-lite' (speed/savings)
- Maximum capability: '--agent_profile manus-1.6-max' (complex reasoning)
|
|
|
poe
|
- |
glm-4.6
|
6,600.00 |
- |
As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
Use `--enable_thinking false` to disable thinking about the response before giving a final answer. This is enabled by default.
The bot does not support media attachments (video and audio files).
Technical Specifications
File Support: Text, Markdown and PDF files
Context window: 200k tokens
|
|
|
poe
|
- |
gpt-5.1-instant
|
1.10 |
9.00 |
OpenAI’s flagship model optimized for conversational intelligence. It excels at natural dialogue, contextual memory, and adaptive tone, making it perfect for interactive agents, tutoring, and customer support. It balances speed, reliability, and empathy for seamless real‑time communication. Supports 128k tokens of input context.
|
|
|
poe
|
- |
gpt-5.1
|
1.10 |
9.00 |
OpenAI’s flagship general‑purpose model, built for advanced reasoning, comprehension, and creativity. It delivers robust performance across text and code, with significant improvements in factual accuracy, long‑context understanding, and multilingual fluency. Ideal for research, content creation, analysis, and problem‑solving in any domain. Supports a 400k-token input context window.
Optional parameters:
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high" (default: "None")
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
Use `--verbosity` to control response details at the end of your message with one of "low", "medium", or "high" (default: medium)
|
|
|
poe
|
- |
gpt-image-1.5
|
- |
- |
OpenAI's frontier image generation model in ChatGPT as of December 2025, offering exceptional prompt adherence, world knowledge, precise edits, facial preservation, level of detail, and overall quality with improved latency/generation times. It supports editing, restyling, and combining images attached to the latest user query. For a conversational image generation and editing experience use: https://poe.com/GPT-5.2
Optional Parameters:
Set aspect ratio with options 3:2, 1:1, and 2:3.
Set quality to low, medium, or high. Default is high.
Enable use mask by toggling it on or by typing 'use_mask' in the prompt. This option is turned off by default.
Disable high fidelity by toggling it off or by typing 'use_high_fidelity'. This option is turned on by default.
|
|
|
poe
|
- |
kimi-k2-thinking
|
6,700.00 |
- |
Built as a thinking agent, it performs step-by-step reasoning while utilizing tools, achieving state-of-the-art performance on benchmarks such as Humanity's Last Exam (HLE), BrowseComp, and others. The model demonstrates substantial advancements in reasoning, agentic search, coding, writing, and general problem-solving capabilities.
Kimi K2 Thinking is capable of executing 200–300 sequential tool calls autonomously, maintaining coherent reasoning across hundreds of steps to solve complex tasks.
File Support: Text, Markdown and PDF files
Context window: 256k tokens
|
|
|
poe
|
- |
deepseek-v3.2
|
- |
- |
We introduce DeepSeek-V3.2, a next-generation foundation model designed to unify high computational efficiency with state-of-the-art reasoning and agentic performance. DeepSeek-V3.2 is built upon three core technical breakthroughs:
• DeepSeek Sparse Attention (DSA):
A new highly efficient attention mechanism that significantly reduces computational overhead while preserving model quality, purpose-built for long-context reasoning and high-throughput workloads.
• Scalable Reinforcement Learning Framework:
DeepSeek-V3.2 leverages a robust RL training protocol and expanded post-training compute to reach GPT-5-level performance. Its high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and demonstrates reasoning capabilities comparable to Gemini-3.0-Pro.
• Large-Scale Agentic Task Synthesis Pipeline:
To enable reliable tool-use and multi-step decision-making, we develop a novel agentic data synthesis pipeline that generates high-quality interactive reasoning tasks at scale, greatly enhancing the model’s agentic capabilities.
File Support: Text, Markdown and PDF files
Context window: 164k tokens
|
|
|
poe
|
- |
glm-4.6v
|
- |
- |
GLM-4.6V represents a significant multimodal advancement in the GLM series, achieving state-of-the-art visual understanding accuracy for models of its parameter scale. Notably, it's the first visual model to natively integrate Function Call capabilities directly into its architecture, creating a seamless pathway from visual perception to executable actions. This breakthrough establishes a unified technical foundation for deploying multimodal agents in real-world business applications.
File Support: Text, Markdown, Image and PDF files
Context window: 131k tokens
Optional parameters:
Enable Thinking - Toggle this on for the model to think before providing a response. This is disabled by default
Temperature - Controls randomness in the response. Lower values make the output more focused and deterministic. Select from 0 to 2 range. This is set to 0.7 by default.
Max Output Tokens - Maximum number of tokens to generate in the response, settable from 1 to 32,768. Set to the maximum (32,768) by default.
|
|
|
poe
|
- |
gpt-5.1-codex
|
1.10 |
9.00 |
GPT‑5.1‑Codex extends GPT‑5.1’s capabilities for software development. It understands complex codebases, provides accurate completions, explains algorithms, and assists with debugging across modern programming languages. Designed for developers, it elevates productivity and supports full‑stack coding workflows with precision. Supports 400k tokens of input context.
Optional parameters:
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
|
|
|
poe
|
- |
gpt-5-pro
|
14.00 |
110.00 |
OpenAI’s latest flagship model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1.
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
GPT-5-Pro thinks long and hard. When using this bot through the API, consider increasing your request timeouts.
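A minimal sketch of that advice, assuming an OpenAI-compatible chat-completions client; the base URL, model identifier, and timeout below are illustrative assumptions, not confirmed values:
```python
# Hypothetical setup: base_url and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.poe.com/v1",  # assumed OpenAI-compatible endpoint
    timeout=1800,  # seconds; generous, since GPT-5-Pro may think for up to ~30 minutes
)

response = client.chat.completions.create(
    model="GPT-5-Pro",  # illustrative model identifier
    messages=[{"role": "user", "content": "Work through this proof step by step."}],
)
print(response.choices[0].message.content)
```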
|
|
|
poe
|
- |
gpt-5-chat
|
1.10 |
9.00 |
ChatGPT-5 points to the non-reasoning GPT-5 snapshot (gpt-5-chat-latest) currently used in ChatGPT. Supports native vision, 400k tokens of context, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount.
|
|
|
poe
|
- |
claude-code
|
- |
- |
A powerful assistant that can read, write, and analyze files across many formats. It can also delegate to other Poe bots to handle complex, multi-step tasks.
Built on the Claude Agent SDK from Anthropic.
|
|
|
poe
|
- |
grok-4.1-fast-reasoning
|
- |
- |
Grok-4.1-Fast-Reasoning is a high-performance version of xAI’s Grok 4.1 Fast, the company’s best agentic tool‑calling model. It works great in real-world use cases like customer support, deep research, and advanced analytical reasoning. Equipped with a 2M‑token context window, this model processes vast amounts of information seamlessly, delivering coherent, context‑aware, and deeply reasoned insights at exceptional speed.
|
|
|
poe
|
- |
zai-glm-4.6-cs
|
19,000.00 |
- |
World’s fastest inference for ZAI GLM 4.6 with Cerebras. ZAI GLM 4.6 is a high‑performance AI model designed for advanced reasoning, superior coding, and effective tool use. It supports structured outputs, parallel tool calling, and real‑time streaming responses. Optimized for agentic coding and automation tasks, the model delivers strong real‑world performance with a context window of up to 131K tokens and output up to 40K tokens.
For more information see: https://inference-docs.cerebras.ai/models/zai-glm-46
Context Limit: 131k
|
|
|
poe
|
- |
gpt-5.1-codex-max
|
1.10 |
9.00 |
OpenAI's most capable agentic coding model; recommended for use in agentic harnesses or similar environments (e.g. Cursor, Claude Code, Codex). The default reasoning effort is set to `Xhigh`, so the model will reason extensively on problems given to it (i.e. expect long generation times and heavy point usage). Accepts image attachments.
|
|
|
poe
|
- |
gpt-5.1-codex-mini
|
0.22 |
1.80 |
GPT‑5.1‑Codex‑Mini is a lightweight, fast, and efficient code‑generation model derived from GPT‑5.1‑Codex. It’s optimized for quick iterations, smaller environments, and edge applications—offering strong coding assistance with lower computational cost while maintaining accuracy and utility. Supports 400k tokens of input context.
Optional parameters:
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
|
|
|
poe
|
- |
gpt-4o
|
- |
- |
OpenAI's GPT-4o answers user prompts with natural, engaging & tailored writing and strong overall world knowledge. Uses GPT-Image-1 to create and edit images conversationally. For fine-grained image generation control (e.g. image quality), use https://poe.com/GPT-Image-1. Supports a context window of 128k tokens.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
nano-banana-pro
|
1.70 |
10.00 |
Nano Banana Pro (Gemini 3 Pro Image Preview) can make detailed, context-rich visuals, precisely edit or restyle input images with exceptional fidelity, and even generate legible text in images in multiple languages.
Optional parameters:
`--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9): Aspect ratio of the output image
`--web_search true` to enable web search and real-time information access; this is disabled by default.
`--image_only` (default: False): Determines whether to only generate image output
`--image_size` (options: 1K, 2K, 4K): Resolution of image
Note: Simply enabling --image_only will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
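For example (hypothetical values): `A storefront poster that reads "Grand Opening" --aspect_ratio 2:3 --image_size 2K`.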
|
|
|
poe
|
- |
nano-banana
|
0.21 |
1.80 |
Google DeepMind's Nano Banana (i.e. the Gemini 2.5 Flash Image model) offers image generation and editing capabilities, with state-of-the-art performance in photo-realistic multi-turn edits at exceptional speed. Supports a maximum input context of 32k tokens.
Optional parameters:
--aspect_ratio (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9): Aspect ratio of the output image
--image_only (default: False): Determines whether to only generate image output
Note: Simply enabling --image_only will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
|
|
|
poe
|
- |
grok-4.1-fast-non-reasoning
|
- |
- |
Grok-4.1-Fast-Non-Reasoning is a streamlined companion to Grok 4.1 Fast, xAI’s best agentic tool‑calling model. It has 2M context window and high responsiveness but is optimized for non‑reasoning tasks — excelling at text generation, summarization, and automated workflows that demand speed and efficiency over deep logic. Ideal for high-throughput use cases like customer support automation, bulk content creation, and fast conversational responses.
|
|
|
poe
|
- |
gpt-5
|
1.10 |
9.00 |
OpenAI’s most advanced general model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "minimal", "low", "medium", or "high"
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
gpt-5-nano
|
0.04 |
0.36 |
GPT-5 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 400k input tokens of context. Provides a 90% chat history cache discount.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "minimal", "low", "medium", or "high"
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
gpt-5-mini
|
0.22 |
1.80 |
GPT-5 mini is a small, fast & affordable model that matches or beats GPT-4.1 in many intelligence and vision-related tasks. Supports 400k tokens of context. Provides a 90% chat history cache discount.
To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high".
Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
|
|
|
poe
|
- |
o3-pro
|
18.00 |
72.00 |
o3-pro is a well-rounded and powerful model across domains, with more capability than https://poe.com/o3 at the cost of higher price and lower speed. It is especially capable at math, science, coding, visual reasoning tasks, technical writing, and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
|
|
|
poe
|
- |
gemini-2.5-flash-lite
|
0.07 |
0.28 |
A lightweight Gemini 2.5 Flash reasoning model optimized for cost efficiency and low latency. Supports web search. Supports 1 million tokens of input context. Serves the latest `gemini-2.5-flash-lite-preview-09-2025` snapshot. For more complex queries, use https://poe.com/Gemini-2.5-Pro or https://poe.com/Gemini-2.5-Flash
To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message.
To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
|
|
|
poe
|
- |
gpt-5-codex
|
1.10 |
9.00 |
GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. It supports multimodal inputs such as images or screenshots for UI development and a 400k token context window.
We recommend using GPT-5-Codex only for agentic and interactive coding use cases.
To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high"
|
|
|
poe
|
- |
grok-4-fast-non-reasoning
|
0.20 |
0.50 |
Grok 4 Fast Non-Reasoning is designed for fast, efficient tasks like content generation with a 2M token context window. Combining cutting-edge performance with cost-efficiency, it ensures high-quality results for simpler, everyday applications.
|
|
|
poe
|
- |
qwen-3-next-80b-think
|
3,000.00 |
- |
The Qwen3-Next-80B-Think (with thinking mode enabled by default) is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B-Thinking." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks - while requiring less than 1/10 of the inference cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. This is the thinking version of https://poe.com/Qwen3-Next-80B and supports 65k tokens of context.
Optional Parameters:
Use additional input beside attachment button to manage the optional parameters:
1. Enable/Disable Thinking - This will cause the model to think about the response before giving a final answer.
Technical Specifications:
File Support: PDF, DOC and XLSX files
File Attachment Limitation: Audio, video, and image files are not supported
Context Window: 65k tokens
|
|
|
poe
|
- |
qwen3-next-80b
|
2,400.00 |
- |
The Qwen3-Next-80B is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks - while requiring less than 1/10 of the training cost.
Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens.
Use `--enable_thinking false` to disable thinking mode before giving an answer.
This is the non-thinking version of https://poe.com/Qwen3-Next-80B-Think; supports 65k tokens of context.
|
|
|
poe
|
- |
deepseek-v3.2-exp
|
3,900.00 |
- |
DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency.
Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality. This delivers substantial computational efficiency improvements without compromising accuracy.
Comprehensive benchmarks confirm V3.2-Exp matches V3.1-Terminus performance, proving efficiency gains don't sacrifice capability. As both a powerful tool and research platform, it establishes new paradigms for efficient long-context AI processing.
Optional Parameters:
Use additional input beside attachment button to manage the optional parameters:
1. Enable/Disable Thinking - This will cause the model to think about the response before giving a final answer.
Technical Specifications:
File Support: Text, Markdown and PDF files
Context window: 160k tokens
|
|
|
poe
|
- |
nova-pro-1.0
|
- |
- |
Amazon Nova Pro 1.0 is a highly capable multimodal foundation model from Amazon Nova, offering a strong balance of accuracy, speed, and cost for processing text, images, and video. Its context window is 300,000 tokens, which enables handling very large inputs (including up to ~30 minutes of video input) in a single request.
Use '--enable_latency_optimized [false/true]' (default: false) to disable/enable latency-optimized inference. Note that if enabled, costs may increase; check the rate card for more information.
|
|
|
poe
|
- |
nova-premier-1.0
|
- |
- |
The Amazon Nova Premier 1.0 model is Amazon’s most capable foundation model, able to handle extremely long contexts (≈ 1 million tokens) and multimodal inputs like text, images, and video while excelling at complex, multi‑step tasks across tools and data sources.
It supports chain‑of‑thought style reasoning and breaks down problems into intermediate steps before arriving at an answer, improving coherence and accuracy.
Use '--enable_thinking [true/false]' (default true) to enable/disable thinking accordingly.
|
|
|
poe
|
- |
grok-4-fast-reasoning
|
0.20 |
0.50 |
Grok 4 Fast Reasoning delivers exceptional performance for tasks requiring logical thinking and problem-solving. With a 2M token context window and state-of-the-art cost-efficiency, it handles complex reasoning tasks with accuracy and speed, making advanced AI capabilities accessible to more users.
|
|
|
poe
|
- |
nova-micro-1.0
|
- |
- |
Amazon Nova Micro is a text-only foundation model in the Amazon Nova family, designed for ultra‑low latency and very low cost, optimized for tasks like summarization, translation, and interactive chat. It supports a context window of 128,000 tokens, enabling handling of large text inputs in a single request.
|
|
|
poe
|
- |
nova-lite-1.0
|
- |
- |
Amazon Nova Lite is a low‑cost multimodal foundation model from Amazon that can process text, images, and video and is optimized for speed and affordability. It offers a context window of 300,000 tokens, allowing handling of very large inputs in a single request (including up to ~30 minutes of video).
|
|
|
poe
|
- |
minimax-m2
|
3,300.00 |
- |
MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.
Technical Specifications
File Support: Text, Markdown and PDF files
Context window: 200k tokens
|
|
|
poe
|
- |
hunyuan-image-3
|
- |
- |
Hunyuan Image 3.0 is Tencent’s next‑generation open‑source text-to-image model that uses a large multimodal Mixture-of-Experts architecture to unify image understanding and generation in one system. It produces high-fidelity, often photorealistic images with strong prompt adherence, multilingual text rendering, and intelligent world-knowledge reasoning that can enrich sparse prompts with appropriate visual details.
Note: Uploading attachments is not supported.
Parameter controls available:
1. Image Settings
Size / Aspect Ratio
- Default: `--size 1024x1024` (Square 1:1)
- `--size 768x1024` (Portrait 3:4)
- `--size 1024x768` (Landscape 4:3)
- `--size 1024x1536` (Tall Portrait 2:3)
- `--size 1536x1024` (Wide Landscape 3:2)
- `--size 512x512` (Small Square 1:1)
Quantity
- `--num_images [1-4]` number of images to generate (default: 1)
Quality & Generation
- `--num_inference_steps [10-50]` denoising steps for quality (default: 28, higher = better quality but slower)
- `--guidance_scale [1.0-20.0]` how closely to follow prompt (default: 7.5)
Customization
- `--negative_prompt "text"` things to avoid in generated images
- `--seed [integer]` reproducible generation with fixed seed (e.g., 42)
|
|
|
poe
|
- |
kling-image-o1
|
- |
- |
Kling Image O1 image generation and image editing bot. Send up to 10 images to use as a reference, and refer to each image with $image1, $image2, etc. in the prompt to specify interactions. Set resolution with `--resolution` and aspect ratio with `--aspect`. Note: `auto` aspect ratio is default and can be used only for editing, text-to-image generation has a default of `1:1`. Supports jpeg, png, heic, webp images.
|
|
|
poe
|
- |
kling-2.6-pro
|
- |
- |
Generate high-quality videos with native audio from text and images using Kling 2.6 Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`, only works for text-to-video). Use `--duration` to set either 5 or 10 second video. Use `--silent` to generate a silent video.
|
|
|
poe
|
- |
flux-2-pro
|
- |
- |
Flux.2 [Pro] is Black Forest Labs' state-of-the-art model with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows use of hex colour codes within the prompt for precise colouring. Send up to 8 images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 9 megapixels.
Optional parameters:
`--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
|
|
|
poe
|
- |
flux-2-flex
|
- |
- |
Flux.2 [Flex] is Black Forest Labs' latest model, with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows use of hex color codes within the prompt for precise coloring.
Send images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 14 megapixels.
Optional parameters:
`--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
|
|
|
poe
|
- |
flux-2-dev
|
- |
- |
A 32B open-weight image generation model derived from the FLUX.2 base model. It is the most powerful open-weight image generation and editing model available today, combining text-to-image synthesis and image editing with multiple input images in a single checkpoint.
Optional parameters:
`--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
|
|
|
poe
|
- |
mistral-medium-3.1
|
- |
- |
Mistral Medium 3.1 is a high-performance, enterprise-grade language model that delivers strong reasoning, coding, and STEM capabilities. It supports hybrid, on-prem, and in-VPC deployments, offering competitive accuracy and easy integration across cloud environments. Context Length: 131k
|
|
|
poe
|
- |
exa-answer
|
- |
- |
Get a quick LLM-style answer to a question informed by Exa search results.
For more in-depth results, consider using the following endpoint: https://poe.com/Exa-Research
Supported file type upload: PDF, TXT, PNG, JPG, JPEG
Audio and video file upload is not supported.
Parameter Controls Available:
- `--text false/true` Show text snippets under each source citation (default: false)
|
|
|
poe
|
- |
exa-search
|
- |
- |
Utilize Exa's technology for searching web pages, finding similar web pages, crawling, and more.
Note: This endpoint does not return an LLM-style response (visit the following if you want an LLM-style response: https://poe.com/Exa-Answer or https://poe.com/Exa-Research). File upload is not supported.
Parameter Controls Available:
1. Operation Mode
- Default: `--operation search` (Web Search)
- For finding similar pages: `--operation similar`
- For getting page contents: `--operation contents`
- For code search: `--operation code`
2. Search Settings (search operation)
- `--search_type [auto|neural|deep|fast]` search algorithm (default: auto)
- `--show_content` display full page content in results
- `--include_domains` comma-separated domains to include
- `--include_text` text that must appear (up to 5 words)
- `--exclude_text` text that must NOT appear (up to 5 words)
3. Common Search Settings (search & similar operations)
- `--num_results [1-100]` number of results to return (default: 10)
- `--category [company|research paper|news|pdf|github|tweet|personal site|linkedin profile|financial report]`
- `--exclude_domains` comma-separated domains to exclude
4. Date Filters (search operation)
- `--start_crawl_date` results crawled after this date (ISO 8601)
- `--end_crawl_date` results crawled before this date (ISO 8601)
- `--start_published_date` content published after this date (ISO 8601)
- `--end_published_date` content published before this date (ISO 8601)
5. Content Options (search, similar, & contents operations)
- `--return_text` fetch page text content (default: true)
- `--text_max_chars` limit text length (empty = unlimited)
- `--include_html_tags` preserve HTML structure
- `--return_highlights` get AI-selected key snippets
- `--highlights_sentences [1-10]` sentences per highlight (default: 3)
- `--highlights_per_url [1-10]` highlights per result (default: 3)
- `--highlights_query` guide highlight selection
- `--return_summary` get AI-generated summaries
- `--summary_query` guide summary generation
6. Advanced Options (search, similar, & contents operations)
- `--livecrawl [fallback|never|always|preferred]` when to fetch fresh content (default: fallback)
- `--subpages [0-10]` number of linked subpages to crawl (default: 0)
- `--subpage_target` find specific subpages matching keyword
7. Code Search Controls (code operation)
- `--code_tokens [dynamic|5000|10000|20000]` response length (default: dynamic)
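Combining several of the controls above (hypothetical values): `recent developments in GPU supply chains --operation search --num_results 5 --category news --return_summary true`.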
|
|
|
poe
|
- |
exa-research
|
- |
- |
Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations.
Note: Responses may take several minutes to complete depending on complexity.
Supported file type upload: PDF, TXT, PNG, JPG, JPEG
Audio and video file upload is not supported.
Parameter Controls Available:
Model Selection
- `--model exa-research` (Standard, default)
- `--model exa-research-pro` (Deepest, highest quality)
- `--model exa-research-fast` (Fastest, lightest)
|
|
|
poe
|
- |
kat-coder-pro
|
- |
- |
KAT-Coder-Pro V1 by KwaiKAT is a non-reasoning model optimized for agentic coding. It delivers strong performance on reasoning-style tasks while requiring significantly fewer output tokens than peer models. With the 1210 release, it achieved a score of 64 on the Artificial Analysis Intelligence Index, placing it in the global Top 10 and ranking first among all non-reasoning models.
File Support: Text, Markdown and PDF files
Context window: 256k tokens
|
|
|
poe
|
- |
deepseek-v3.2-fw
|
5,300.00 |
- |
Model from DeepSeek that harmonizes high computational efficiency with superior reasoning and agent performance.
File Support: Image (JPG, JPEG, PNG, HEIC), Other File Types (PDF, PYTHON, XLSX)
|
|
|
poe
|
- |
nova-lite-2
|
- |
- |
Amazon Nova 2 Lite is a fast, cost-effective multimodal reasoning model from Amazon that can process text, images, documents, and video, designed for everyday workloads like chatbots, document processing, and business automation. It offers a 1 million token context window, enabling very large, complex inputs in a single request, including long documents and extended video clips (~90 minutes).
Note: Video file uploads are limited to ~1GB. Also note that reasoning traces are not exposed from AWS.
Supported file types: JPEG, PNG, GIF, WEBP, PDF, DOCX, TXT, MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP
Parameter controls available:
'--enable_reasoning true/false' - Enable step-by-step reasoning (default: true).
'--reasoning_effort low/medium/high' - Specify the reasoning effort level (default: medium).
|
|
|
poe
|
- |
gpt-oss-120b-t
|
1,500.00 |
- |
OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. Built with community feedback and released under Apache 2.0, this 120B parameter model provides transparency, customization, and deployment flexibility for organizations requiring complete data security & privacy control.
|
|
|
poe
|
- |
gpt-oss-20b-t
|
450.00 |
- |
OpenAI's GPT-OSS-20B provides powerful chain-of-thought reasoning in an efficient 20B parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0 licensed model offers the perfect balance of performance and resource efficiency for diverse applications.
|
|
|
poe
|
- |
amazon-nova-reel-1.1
|
- |
- |
Amazon Nova Reel 1.1 is an advanced AI video generation model that creates up to 2-minute multi-shot videos from text and optional image prompts, offering improved video quality, latency, and visual consistency compared to its predecessor.
|
|
|
poe
|
- |
kimi-k2-think-t
|
13,000.00 |
- |
Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, K2 Thinking dramatically scales multi-step reasoning depth while maintaining stable tool-use across 200–300 sequential calls — a breakthrough in long-horizon agency with native INT4 quantization for 2x inference speed.
Supported File Types: JPEG, PNG, PDF
|
|
|
poe
|
- |
amazon-nova-canvas
|
- |
- |
Amazon Nova Canvas is a high-quality image‐generation model that creates and edits images from text or image inputs—offering features like inpainting/outpainting, virtual try‑on, style controls, and background removal—all with built‑in customization.
|
|
|
poe
|
- |
kimi-k2
|
6,300.00 |
- |
Kimi K2-Instruct-0905 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
Key Features:
- Large-Scale Training: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.
- MuonClip Optimizer: We apply the Muon optimizer to an unprecedented scale, and develop novel optimization techniques to resolve instabilities while scaling up.
- Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
Technical Specifications
File Support: Attachments not supported
Context window: 256k tokens
|
|
|
poe
|
- |
kimi-k2-0905-t
|
11,000.00 |
- |
The new Kimi K2-0905 model from Moonshot AI features a massive 256,000-token context window, double the length of its predecessor (Kimi K2), along with greatly improved coding abilities and front-end generation accuracy. It boasts 1 trillion total parameters (with 32 billion activated at a time) and claims 100% tool-call success in real-world tests, setting a new bar for open-source AI performance in complex, multi-step tasks.
|
|
|
poe
|
- |
kimi-k2-t
|
11,000.00 |
- |
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
|
|
|
poe
|
- |
kimi-k2-instruct
|
6,000.00 |
- |
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Uses the latest September 5th, 2025 snapshot. The updated version has improved coding abilities, agentic tool use, and a longer (256K) context window.
|
|
|
poe
|
- |
deepseek-v3.1
|
7,800.00 |
- |
Latest Update: Terminus Enhancement
This model has been updated with the Terminus release, addressing key user-reported issues while maintaining all original capabilities:
- Language consistency: Reduced instances of mixed Chinese-English text and abnormal characters
- Enhanced agent capabilities: Optimized performance of the Code Agent and Search Agent
Core Capabilities
DeepSeek-V3.1 is a hybrid model supporting both thinking mode and non-thinking mode, built upon the original V3 base checkpoint through a two-phase long context extension approach.
Technical Specifications
Context Window: 128k tokens
File Support: PDF, DOC, and XLSX files
File Restrictions: Does not accept audio and video files
|
|
|
poe
|
- |
glm-4.6-fw
|
6,000.00 |
- |
As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
|
|
|
poe
|
- |
deepseek-v3.1-t
|
6,000.00 |
- |
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
- Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
|
|
|
poe
|
- |
glm-4.5
|
5,700.00 |
- |
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
Technical Specifications
File Support: PDF and Markdown files
Context window: 128k tokens
|
|
|
poe
|
- |
deepseek-v3.1-n
|
5,700.00 |
- |
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
- Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
Technical Specifications
File Support: Attachments not supported
Context window: 128k tokens
|
|
|
poe
|
- |
qwen3-coder
|
9,000.00 |
- |
Qwen3 Coder 480B A35B Instruct is a state-of-the-art 480B-parameter Mixture-of-Experts model (35B active) that achieves top-tier performance across multiple agentic coding benchmarks. Supports 256K native context length and scales to 1M tokens with extrapolation. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company.
|
|
|
poe
|
- |
claude-sonnet-4
|
2.60 |
13.00 |
Claude Sonnet 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 1M context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
|
|
|
poe
|
- |
claude-opus-4
|
13.00 |
64.00 |
Claude Opus 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 200k context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
|
|
|
poe
|
- |
claude-opus-4-reasoning
|
13.00 |
64.00 |
Claude Opus 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 200k context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
|
|
|
poe
|
- |
claude-sonnet-4-reasoning
|
2.60 |
13.00 |
Claude Sonnet 4 from Anthropic, supports customizable thinking budget (up to 60k tokens) and 200k context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 61,440 to the end of your message.
|
|
|
poe
|
- |
o4-mini
|
0.99 |
4.00 |
o4-mini provides high intelligence on a variety of tasks and domains, including science, math, and coding at an affordable price point. This bot uses medium reasoning effort by default, but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
|
|
|
poe
|
- |
gemini-deep-research
|
1.60 |
9.60 |
Gemini Deep Research plans, executes, and synthesizes complex, multi-step investigations by querying the web and other data to produce detailed, structured reports. Offers best-in-the-world performance on Google's newly released DeepSearchQA benchmark as of December 2025. Be sure to give your entire research request in the initial prompt and include as much detail as you can!
Use the `--interaction_id` flag if you want to continue the discussion from a previous research task.
|
|
|
poe
|
- |
o4-mini-deep-research
|
1.80 |
7.20 |
Deep Research from OpenAI powered by the o4-mini model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
|
|
|
poe
|
- |
glm-4.5-air-t
|
2,400.00 |
- |
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
|
|
|
poe
|
- |
glm-4.5-fw
|
5,400.00 |
- |
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters. It unifies reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
|
|
|
poe
|
- |
grok-3
|
- |
- |
xAI's February 2025 flagship release representing nearly state-of-the-art performance in several reasoning/problem solving domains. The API doesn't yet support reasoning mode for Grok 3, but does for https://poe.com/Grok-3-Mini; this bot also doesn't have access to the X data feed. Supports 131k tokens of context, uses Grok 2 for native vision.
|
|
|
poe
|
- |
grok-3-mini
|
- |
- |
xAI's February 2025 release with strong performance across many domains but at a more affordable price point. Supports reasoning with a configurable reasoning effort level, and 131k tokens of context; doesn't have access to the X data feed.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low" or "high".
|
|
|
poe
|
- |
o3
|
1.80 |
7.20 |
o3 provides state-of-the-art intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
|
|
|
poe
|
- |
o3-deep-research
|
9.00 |
36.00 |
Deep Research from OpenAI powered by the o3 model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
|
|
|
poe
|
- |
elevenlabs-v3
|
- |
- |
ElevenLabs v3 is a cutting-edge text-to-speech model that brings scripts to life with remarkable realism and performance-level control. Unlike traditional TTS systems, it allows creators to shape the emotional tone, pacing, and soundscape of their audio through the use of inline audio tags. These tags are enclosed in square brackets and act as stage directions—guiding how a line is spoken or what sound effects are inserted—without being spoken aloud. This enables rich, expressive narration and dialogue for applications like audiobooks, games, podcasts, and interactive media. Whether you’re aiming for a tense whisper, a sarcastic remark, or a dramatic soundscape full of explosions and ambient effects, v3 gives you granular control directly in the text prompt. This bot will also run text-to-speech on PDF attachments / URL links.
Examples of voice delivery tags include:
* [whispers] I have to tell you a secret.
* [angry] That was *never* the plan.
* [sarcastic] Oh, sure. That’ll totally work.
* and [laughs] You're hilarious.
Examples of sound effect tags are:
* [gunshot] Get down!
* [applause] Thank you, everyone.
* and [explosion] What was that?!
These can also be combined.
Multiple speakers can be supported via the parameter control. Dialogue for multiple speakers must follow the format, e.g. for 3 speakers:
Speaker 1: [dialogue]
Speaker 2: [dialogue]
Speaker 3: [dialogue]
Speaker 1: [dialogue]
Speaker 2: [dialogue]
--speaker_count 3 --voice_1 [voice_1] --voice_2 [voice_2] --voice_3 [voice_3]
The following voices are supported:
Alexandra - Conversational & Real
Amy - Young & Natural
Arabella - Mature Female Narrator
Austin - Good Ol' Texas Boy
Blondie - Warm & Conversational
Bradford - British Male Storyteller
Callum - Gravelly Yet Unsettling
Charlotte - Raspy & Sensual
Chris - Down-to-Earth
Coco Li - Shanghainese Female
Gaming - Unreal Tonemanagement 2003
Harry - Animated Warrior
Hayato - Soothing Zen Male
Hope - Upbeat & Clear
James - Husky & Engaging
James Gao - Calm Chinese Voice
Jane - Professional Audiobook Reader
Jessica - Playful American Female
Juniper - Grounded Female Professional
Karo Yang - Youthful Asian Male
Kuon - Acute Fantastic Female
Laura - Quirky Female Voice
Liam - Warm, Energetic Youth
Monika Sogam - Indian-English Accent
Nichalia Schwartz - Engaging Female American
Priyanka Sogam - Late-Night Radio
Reginald - Brooding, Intense Villain
ShanShan - Young, Energetic Female
Xiao Bai - Shrill & Annoying
Prompt input cannot exceed 5,000 characters.
|
|
|
poe
|
- |
deepseek-v3
|
12,000.00 |
- |
DeepSeek-V3 – the new top open-source LLM. Updated to the March 24, 2025 checkpoint. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to Together, a US-based company. Supports 131k context window and max output of 12k tokens.
|
|
|
poe
|
- |
deepseek-v3-fw
|
9,000.00 |
- |
DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model; able to perform well on competitive benchmarks with cost-effective training & inference. All data submitted to this bot is governed by the Poe privacy policy and is sent to Fireworks, a US-based company. Supports 131k context window and max output of 131k tokens. Updated to serve the latest March 24th, 2025 snapshot.
|
|
|
poe
|
- |
deepseek-v3.1-tm
|
5,700.00 |
- |
DeepSeek-V3.1-Terminus preserves all original model capabilities while resolving key user-reported issues, including:
- Language consistency: Significantly reducing mixed Chinese-English output and eliminating abnormal character occurrences
- Agent performance: Enhanced optimization of both Code Agent and Search Agent functionality
- Use `--enable_thinking false` to disable thinking about the response before giving a final answer.
- The bot does not accept attachments. It also does not support billing logic.
Context window: 128k tokens.
|
|
|
poe
|
- |
gpt-4.1
|
1.80 |
7.20 |
OpenAI’s GPT-4.1 significantly improves on past models in terms of its coding skills, long context (1M tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4o. Provides a 75% chat history cache discount.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
gpt-4.1-mini
|
0.36 |
1.40 |
GPT-4.1 mini is a small, fast & affordable model that matches or beats GPT-4o in many intelligence and vision-related tasks. Supports 1M tokens of context.
Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
|
|
|
poe
|
- |
gpt-4.1-nano
|
0.09 |
0.36 |
GPT-4.1 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 1M input tokens of context.
Check out the newest version of this bot here: https://poe.com/GPT-5-nano.
|
|
|
poe
|
- |
llama-4-scout-t
|
1,000.00 |
- |
Llama 4 Scout, fast long-context multimodal model from Meta. A 16-expert MoE model that excels at multi-document analysis, codebase reasoning, and personalized tasks. A smaller model than Maverick but state of the art in its size & with text + image input support. Supports 300k context.
|
|
|
poe
|
- |
claude-opus-4-search
|
13.00 |
64.00 |
Claude Opus 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
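For example (the budget value is illustrative):
Compare these two research papers in depth --thinking_budget 32000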
|
|
|
poe
|
- |
claude-sonnet-4-search
|
2.60 |
13.00 |
Claude Sonnet 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
|
|
|
poe
|
- |
claude-sonnet-3.7
|
2.60 |
13.00 |
Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. For maximum extended thinking, please use https://poe.com/Claude-Sonnet-Reasoning-3.7. Supports a 200k token context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 16,384 to the end of your message.
|
|
|
poe
|
- |
claude-sonnet-3.5
|
2.60 |
13.00 |
Anthropic's Claude Sonnet 3.5 using the October 22, 2024 model snapshot. Excels in complex tasks like coding, writing, analysis and visual processing. Has a context window of 200k tokens (approximately 150k English words).
|
|
|
poe
|
- |
claude-haiku-3.5
|
0.68 |
3.40 |
The latest generation of Anthropic's fastest model. Claude Haiku 3.5 has fast speeds and improved instruction following.
|
|
|
poe
|
- |
gemini-2.0-flash
|
0.10 |
0.42 |
Gemini 2.0 Flash is Google's most popular model yet with enhanced performance and blazingly fast response times; it supports web search grounding, so it can intelligently answer questions about recent events. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. Supports 1 million tokens of input context.
To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
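For example:
Who won yesterday's championship final? --web_search true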
|
|
|
poe
|
- |
gemini-2.0-flash-lite
|
0.05 |
0.21 |
Gemini 2.0 Flash Lite is a new model variant from Google, its most cost-efficient model yet, and often considered a spiritual successor to Gemini 1.5 Flash in terms of capability, context window size, and cost. Does not support web search (if you need search, we recommend https://poe.com/Gemini-2.0-Flash); supports 1 million tokens of input context.
|
|
|
poe
|
- |
claude-sonnet-3.7-search
|
2.60 |
13.00 |
Claude Sonnet 3.7 with access to real-time information from the web.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
|
|
|
poe
|
- |
claude-haiku-3.5-search
|
0.68 |
3.40 |
Claude Haiku 3.5 with access to real-time information from the web.
|
|
|
poe
|
- |
qwen3-max
|
- |
- |
Qwen3-Max is a major update to the Qwen3 series, delivering significant improvements in reasoning, instruction following, and multilingual support. It provides higher accuracy in complex tasks like coding and math, along with reduced hallucinations and better performance on open-ended questions.
This model is served by Alibaba Cloud Int. from Singapore.
|
|
|
poe
|
- |
gpt-oss-120b
|
1,200.00 |
- |
OpenAI introduces the GPT-OSS-120B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities.
The GPT-OSS-120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o).
Technical Specifications
File Support: Attachments not supported
Context window: 128k tokens
|
|
|
poe
|
- |
gpt-oss-20b
|
450.00 |
- |
OpenAI introduces the GPT-OSS-20B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities.
The GPT-OSS-20B model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o).
Technical Specifications
File Support: Attachments not supported
Context window: 128k tokens
|
|
|
poe
|
- |
gpt-oss-120b-cs
|
3,200.00 |
- |
World’s fastest inference for GPT OSS 120B with Cerebras. OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. The bot does not accept video, PPT, DOCX, or Excel files.
|
|
|
poe
|
- |
openai-gpt-oss-120b
|
1,500.00 |
- |
GPT-OSS-120b is a high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs.
|
|
|
poe
|
- |
openai-gpt-oss-20b
|
750.00 |
- |
GPT-OSS-20B is a compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as 120B, with faster inference and easier deployment that is ideal for specialized or offline use cases, fast responsive performance, chain-of-thought output, and agentic workflows.
|
|
|
poe
|
- |
qwen3-next-instruct-t
|
2,400.00 |
- |
Qwen3-Next Instruct features a highly sparse MoE structure that activates only 3B of its 80B parameters during inference. Supports only instruct mode without thinking blocks, delivering performance on par with Qwen3-235B-A22B-Instruct-2507 on certain benchmarks while using less than 10% training cost and providing 10x+ higher throughput on contexts over 32K tokens.
|
|
|
poe
|
- |
qwen3-next-think-t
|
3,000.00 |
- |
Qwen3-Next Thinking features the same highly sparse MoE architecture, specialized for complex reasoning tasks. It supports only thinking mode with automatic tag inclusion, delivering exceptional analytical performance while maintaining extreme efficiency with 10x+ higher throughput on long contexts. It may generate longer thinking content than its predecessors.
|
|
|
poe
|
- |
qwen3-max-n
|
22,000.00 |
- |
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.
File Support: Text, Markdown and PDF files
Context window: 256k tokens
|
|
|
poe
|
- |
qwen3-vl-235b-a22b-t
|
4,800.00 |
- |
Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment.
Key Features:
- Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks.
- Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos.
- Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis.
- Long Context: Processes up to 1M tokens for books or long videos.
- Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers.
- Visual Recognition: Recognizes a wide range of objects, landmarks, and more.
- OCR: Supports 32 languages with improved performance in challenging conditions.
- Text-Vision Fusion: Achieves seamless, unified comprehension.
Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks.
Technical Specifications
File Support: Image, Video, PDF and Markdown files
Context window: 128k tokens
|
|
|
poe
|
- |
qwen3-vl-235b-a22b-i
|
3,600.00 |
- |
This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment.
Key Enhancements:
Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos.
Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
Upgraded Visual Recognition: Broader, higher-quality pretraining lets it "recognize everything"—celebrities, anime, products, landmarks, flora/fauna, etc.
Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension.
Technical Specifications
File Support: Image, Video, PDF and Markdown files
Context window: 128k tokens
|
|
|
poe
|
- |
qwen-3-235b-2507-t
|
1,900.00 |
- |
Qwen3 235B A22B 2507, currently the best instruct model (non-reasoning) among both closed and open source models. It excels in instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. It is also great at multilingual tasks and supports a long context window (262k).
|
|
|
poe
|
- |
qwen3-235b-2507-fw
|
2,700.00 |
- |
State-of-the-art language model with exceptional math, coding, and problem-solving performance. Operates in non-thinking mode, and does not generate <think></think> blocks in its output. Supports 256k tokens of native context length. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company. Uses the latest July 21st, 2025 snapshot (Qwen3-235B-A22B-Instruct-2507).
|
|
|
poe
|
- |
qwen3-235b-2507-cs
|
6,000.00 |
- |
World's fastest inference with Qwen3 235B Instruct (2507) model with Cerebras. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage.
|
|
|
poe
|
- |
qwen3-coder-480b-t
|
17,000.00 |
- |
Qwen3‑Coder‑480B is a state of the art mixture‑of‑experts (MoE) code‑specialized language model with 480 billion total parameters and 35 billion activated parameters. Qwen3‑Coder delivers exceptional performance across code generation, function calling, tool use, and long‑context reasoning. It natively supports up to 262,144‑token context windows, making it ideal for large repository and multi‑file coding tasks.
|
|
|
poe
|
- |
qwen3-coder-480b-n
|
7,200.00 |
- |
Qwen3-Coder-480B-A35B-Instruct delivers Claude Sonnet-comparable performance on agentic coding and browser tasks while supporting 256K-1M token long-context processing and multi-platform agentic coding capabilities.
Technical Specifications
File Support: Attachments not supported
Context window: 256k tokens
|
|
|
poe
|
- |
qwen3-235b-a22b-di
|
1,900.00 |
- |
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP8.
|
|
|
poe
|
- |
qwen3-235b-a22b-n
|
1,800.00 |
- |
It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). The bot does not currently support attachments.
It features the following key enhancements:
- Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.
- Substantial gains in long-tail knowledge coverage across multiple languages.
- Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation.
- Enhanced capabilities in 256K long-context understanding.
Technical Specifications
File Support: Attachments not supported
Context window: 128k tokens
|
|
|
poe
|
- |
magistral-medium-2509-thinking
|
- |
- |
Magistral Medium 2509 (thinking) by EmpirioLabs.
Magistral is Mistral's first reasoning model. It is ideal for general-purpose use requiring longer thought processing and better accuracy than non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling — this model solves multi-step challenges where transparency and precision are critical. Context Window: 40k
Supported file type uploads: PDF, XLSX, TXT, PNG, JPG, JPEG
|
|
|
poe
|
- |
o1
|
14.00 |
54.00 |
OpenAI's o1 is designed to reason before it responds and provides world-class capabilities on complex tasks (e.g. science, coding, and math). Improving upon o1-preview and with higher reasoning effort, it is also capable of reasoning through images and supports 200k tokens of input context. By default, uses reasoning_effort of medium, but low, medium & high are also selectable.
|
|
|
poe
|
- |
o1-pro
|
140.00 |
540.00 |
OpenAI’s o1-pro is a highly capable reasoning model, tailored for complex, compute- or context-heavy tasks, dedicating additional thinking time to deliver more accurate, reliable answers. For complex tasks at lower cost, https://poe.com/o3-mini is recommended.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
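For example:
Walk through a formal proof that the sum of two even integers is even --reasoning_effort high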
|
|
|
poe
|
- |
cartesia-ink-whisper
|
- |
- |
Transcribe audio files using Speech-to-Text with the Cartesia Ink Whisper model.
Select the Language (`--language`) of your audio file in Settings. Default is English (en).
Supported Languages:
English (en)
Chinese (zh)
German (de)
Spanish (es)
Russian (ru)
Korean (ko)
French (fr)
Japanese (ja)
Portuguese (pt)
Turkish (tr)
Polish (pl)
Catalan (ca)
Dutch (nl)
Arabic (ar)
Swedish (sv)
Italian (it)
Indonesian (id)
Hindi (hi)
Finnish (fi)
Vietnamese (vi)
Hebrew (he)
Ukrainian (uk)
Greek (el)
Malay (ms)
Czech (cs)
Romanian (ro)
Danish (da)
Hungarian (hu)
Tamil (ta)
Norwegian (no)
Thai (th)
Urdu (ur)
Croatian (hr)
Bulgarian (bg)
Lithuanian (lt)
Latin (la)
Maori (mi)
Malayalam (ml)
Welsh (cy)
Slovak (sk)
Telugu (te)
Persian (fa)
Latvian (lv)
Bengali (bn)
Serbian (sr)
Azerbaijani (az)
Slovenian (sl)
Kannada (kn)
Estonian (et)
Macedonian (mk)
Breton (br)
Basque (eu)
Icelandic (is)
Armenian (hy)
Nepali (ne)
Mongolian (mn)
Bosnian (bs)
Kazakh (kk)
Albanian (sq)
Swahili (sw)
Galician (gl)
Marathi (mr)
Punjabi (pa)
Sinhala (si)
Khmer (km)
Shona (sn)
Yoruba (yo)
Somali (so)
Afrikaans (af)
Occitan (oc)
Georgian (ka)
Belarusian (be)
Tajik (tg)
Sindhi (sd)
Gujarati (gu)
Amharic (am)
Yiddish (yi)
Lao (lo)
Uzbek (uz)
Faroese (fo)
Haitian Creole (ht)
Pashto (ps)
Turkmen (tk)
Nynorsk (nn)
Maltese (mt)
Sanskrit (sa)
Luxembourgish (lb)
Myanmar (my)
Tibetan (bo)
Tagalog (tl)
Malagasy (mg)
Assamese (as)
Tatar (tt)
Hawaiian (haw)
Lingala (ln)
Hausa (ha)
Bashkir (ba)
Javanese (jw)
Sundanese (su)
Cantonese (yue)
|
|
|
poe
|
- |
chatgpt-4o-latest
|
4.50 |
14.00 |
Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Supports context window of 128k tokens, cannot generate images.
|
|
|
poe
|
- |
gpt-4o-mini
|
0.14 |
0.54 |
This intelligent small model from OpenAI is significantly smarter, cheaper, and just as fast as GPT-3.5 Turbo.
Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
|
|
|
poe
|
- |
glm-4.6-t
|
6,600.00 |
- |
GLM-4.6 is the latest flagship model from Z.ai's GLM series, delivering state-of-the-art agentic and coding capabilities that rival Claude Sonnet 4. With 357B parameters in a Mixture-of-Experts architecture, an expanded 200K context window, and 30% improved token efficiency, GLM-4.6 represents the top-performing model developed in China.
|
|
|
poe
|
- |
qwen3-max-preview
|
- |
- |
A preview version of the Max model in the Tongyi Qianwen 3 series, achieving an effective integration of thinking and non-thinking modes. In thinking mode, there is a significant enhancement in capabilities such as intelligent agent programming, common-sense reasoning, and reasoning across mathematics, science, and general domains.
This model is served by Alibaba Cloud Int. from Singapore.
Notes:
- Audio/Video files are not supported.
- Max Context Window: 252k
Use `--enable_thinking true/false` to enable/disable Deep Thinking accordingly.
|
|
|
poe
|
- |
o3-mini
|
0.99 |
4.00 |
o3-mini is OpenAI's reasoning model, providing high intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high can be selected; supports 200k tokens of input context and 100k tokens of output context.
To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
|
|
|
poe
|
- |
o3-mini-high
|
0.99 |
4.00 |
o3-mini-high is OpenAI's most recent reasoning model with reasoning_effort set to high, providing frontier intelligence on most tasks. Like other models in the o-series, it is designed to excel at science, math, and coding tasks. Supports 200k tokens of input context and 100k tokens of output context.
|
|
|
poe
|
- |
llama-3.1-8b-di
|
300.00 |
- |
The smallest and fastest model from Meta's Llama 3.1 family. This open-source language model excels in multilingual dialogue, outperforming numerous industry benchmarks for both closed and open-source conversational AI systems. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company.
Input token limit 128k, output token limit 8k. Quantization: FP16 (official).
|
|
|
poe
|
- |
claude-sonnet-3.7-reasoning
|
2.60 |
13.00 |
Reasoning capabilities on by default. Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. Recommended for complex math or coding problems. Supports a 200k token context window.
To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
|
|
|
poe
|
- |
inception-mercury
|
- |
- |
Mercury is the first diffusion large language model (dLLM). On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. A new generation of LLMs that push the frontier of fast, high-quality text generation.
|
|
|
poe
|
- |
inception-mercury-coder
|
- |
- |
Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder Small's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the blog post here: https://www.inceptionlabs.ai/introducing-mercury.
|
|
|
poe
|
- |
mistral-medium-3
|
- |
- |
Mistral Medium 3 is a powerful, cost-efficient language model offering top-tier reasoning and multimodal performance. Context Window: 130k
|
|
|
poe
|
- |
mistral-medium
|
2.70 |
8.10 |
Mistral AI's medium-sized model. Supports a context window of 32k tokens (around 24,000 words) and is stronger than Mixtral-8x7b and Mistral-7b on benchmarks across the board.
|
|
|
poe
|
- |
llama-4-maverick-t
|
1,600.00 |
- |
Llama 4 Maverick, state of the art long-context multimodal model from Meta. A 128-expert MoE powerhouse for multilingual image/text understanding (12 languages), creative writing, and enterprise-scale applications—outperforming Llama 3.3 70B. Supports 500k tokens context.
|
|
|
poe
|
- |
llama-3.3-70b-fw
|
4,200.00 |
- |
Meta's Llama 3.3 70B Instruct, hosted by Fireworks AI. Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
|
|
|
poe
|
- |
llama-3.3-70b
|
3,900.00 |
- |
Llama 3.3 70B – similar performance to Llama 3.1 405B while being faster and much smaller! Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
|
|
|
poe
|
- |
deepseek-prover-v2
|
- |
- |
DeepSeek-Prover-V2 is an open-source large language model specifically designed for formal theorem proving in Lean 4. The model builds on a recursive theorem proving pipeline powered by the company's DeepSeek-V3 foundation model.
|
|
|
poe
|
- |
deepseek-r1-fw
|
18,000.00 |
- |
State-of-the-art large reasoning model with problem-solving, math, and coding performance at a fraction of the cost; explains its chain of thought. All data you provide this bot will not be used in training, and is sent only to Fireworks AI, a US-based company. Supports 164k tokens of input context and 164k tokens of output context. Uses the latest May 28th, 2025 snapshot.
|
|
|
poe
|
- |
deepseek-r1-di
|
6,000.00 |
- |
Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company.
Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
|
|
|
poe
|
- |
deepseek-r1-n
|
6,000.00 |
- |
The DeepSeek-R1 model (latest snapshot: DeepSeek-R1-0528) features enhanced reasoning and inference capabilities through optimized algorithms and increased computational resources. It excels in mathematics, programming, and logic, with performance nearing top-tier models like o3 and Gemini 2.5 Pro. This bot does not accept attachments.
Technical Specifications
File Support: Attachments not supported
Context window: 160k tokens
|
|
|
poe
|
- |
llama-3.3-70b-n
|
1,400.00 |
- |
The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Technical Specifications
File Support: Attachments not supported
Context window: 128k tokens
|
|
|
poe
|
- |
llama-3.3-70b-cs
|
7,800.00 |
- |
World’s fastest inference for Llama 3.3 70B with Cerebras. The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
|
|
|
poe
|
- |
llama-3.1-70b-t
|
14,000.00 |
- |
Llama 3.1 70B Instruct from Meta. Supports 128k tokens of context.
The points price is subject to change.
|
|
|
poe
|
- |
llama-3.1-8b-cs
|
900.00 |
- |
World’s fastest inference for Llama 3.1 8B with Cerebras. This Llama 8B instruct-tuned version is fast and efficient. The Llama 3.1 8B is an instruction tuned text only model, optimized for multilingual dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.
|
|
|
poe
|
- |
gpt-researcher
|
- |
- |
GPT Researcher is an agent that conducts deep research on any topic and generates a comprehensive report with citations. GPT Researcher is powered by Tavily's search engine.
GPTR is based on the popular open source project: https://github.com/assafelovic/gpt-researcher -- by integrating Tavily search, it is optimized for curation and ranking of trusted research sources. Learn more at https://gptr.dev or https://tavily.com
|
|
|
poe
|
- |
web-search
|
- |
- |
Web-enabled assistant bot that searches the internet to inform its responses. Particularly good for queries regarding up-to-date information or specific facts. Powered by Gemini 2.0 Flash.
|
|
|
poe
|
- |
gpt-4o-search
|
2.20 |
9.00 |
OpenAI's fine-tuned model for searching the web for real-time information. For less expensive messages, consider https://poe.com/GPT-4o-mini-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
|
|
|
poe
|
- |
gpt-4o-mini-search
|
0.14 |
0.54 |
OpenAI's fine-tuned model for searching the web for real-time information. For higher-performance, consider https://poe.com/GPT-4o-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
|
|
|
poe
|
- |
reka-research
|
- |
- |
Reka Research is a state-of-the-art agentic AI that answers complex questions by browsing the web. It excels at synthesizing information from multiple sources, performing work that usually takes hours in minutes.
|
|
|
poe
|
- |
perplexity-sonar
|
- |
- |
Sonar by Perplexity is a cutting-edge AI model that delivers real-time, web-connected search results with accurate citations. It's designed to provide up-to-date information and customizable search sources, making it a powerful tool for integrating AI search into various applications. Context Length: 127k
|
|
|
poe
|
- |
linkup-deep-search
|
- |
- |
Linkup Deep Search is an AI-powered search bot that continues to search iteratively if it hasn't found sufficient information on the first attempt. Results are slower compared to its Standard search counterpart, but often more comprehensive.
Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Context Window: 100k
Audio/video files are not supported at this time.
Parameter controls available:
1. Domain control: use --include_domains to search only within specific domains, --exclude_domains to exclude domains from the results, and --prioritize_domains to give certain domains higher priority in the search.
2. Date range: use --from_date and --to_date to set a date range for the search, in YYYY-MM-DD format.
3. Content options: use --include_image true to include relevant images in the results and --image_count (up to 45) to set how many images to display.
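An illustrative query combining these controls (flag values are examples; the exact domain-list syntax may vary):
recent advances in solid-state battery chemistry --include_domains nature.com --from_date 2025-01-01 --to_date 2025-06-30 --include_image true --image_count 3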
Learn more: https://www.linkup.so/
|
|
|
poe
|
- |
linkup-standard
|
- |
- |
Linkup Standard is an AI-powered search bot that provides detailed overviews and answers sourced from the web, helping you find high-quality information quickly and accurately. Results are faster compared to its Deep search counterpart. Context Window: 100k
Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Audio/video files are not supported at this time.
Parameter controls available:
1. Domain control: use --include_domains to search only within specific domains, --exclude_domains to exclude domains from the results, and --prioritize_domains to give certain domains higher priority in the search.
2. Date range: use --from_date and --to_date to set a date range for the search, in YYYY-MM-DD format.
3. Content options: use --include_image true to include relevant images in the results and --image_count (up to 45) to set how many images to display.
Learn more: https://www.linkup.so/
|
|
|
poe
|
- |
perplexity-sonar-pro
|
- |
- |
Sonar Pro by Perplexity is an advanced AI model that enhances real-time, web-connected search capabilities with double the citations and a larger context window. It's designed for complex queries, providing in-depth, nuanced answers and extended extensibility, making it ideal for enterprises and developers needing robust search solutions. Context Length: 200k (max output token limit of 8k)
|
|
|
poe
|
- |
perplexity-sonar-rsn-pro
|
- |
- |
This model operates on the open-sourced, uncensored R1-1776 model from Perplexity with web search capabilities. The Perplexity Sonar Rsn Pro reasoning model takes AI-powered answers to the next level, offering unmatched quality and precision. Outperforming leading search engines and LLMs, this model has demonstrated superior performance on the SimpleQA benchmark, making it the gold standard for high-quality answer generation. Context Length: 128k (max output token limit of 8k)
|
|
|
poe
|
- |
perplexity-deep-research
|
- |
- |
Perplexity Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Context Length: 128k
|
|
|
poe
|
- |
flux-pro-1.1-ultra
|
- |
- |
State-of-the-art image generation with four times the resolution of standard FLUX-1.1-pro. Best-in-class prompt adherence and pixel-perfect image detail. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1); valid aspect ratios are 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21. Add "--raw" (no other arguments needed) for an overall less processed, everyday aesthetic with raw photographic detail. Send an image to have this model reimagine/regenerate it via FLUX Redux, and use "--strength" (e.g. --strength 0.7) to control the impact of the text prompt (1 gives greater influence, 0 means very little).
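An illustrative prompt (the scene is an example):
a misty harbor at dawn, fishing boats in soft golden light --aspect 16:9 --raw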
|
|
|
poe
|
- |
mistral-small-3.1
|
- |
- |
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
|
|
|
poe
|
- |
claude-opus-3
|
13.00 |
64.00 |
Anthropic's Claude Opus 3 can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks. Supports 200k tokens of context (approximately 150k English words).
|
|
|
poe
|
- |
sonic-3.0
|
6,000.00 |
- |
Generates audio based on your prompt using Cartesia's latest Sonic 3.0 text-to-speech model in your voice of choice.
Supports 10k characters.
You can select a voice and language in the options menu in the input bar.
The following voices are supported covering 42 languages (English, Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese):
-- English --
Ariana
Kiefer
Tessa
Brandon
Linda - Conversational Guide
Ronald - Thinker
Brooke - Big Sister
Katie - Friendly Fixer
Jacqueline - Reassuring Agent
Caroline - Southern Guide
-- Arabic --
Amira - Dreamy Whisperer
Omar - High-Energy Presenter
-- Bengali --
Pooja - Everyday Assistant
Rubel - City Guide
-- Bulgarian --
Ivana - Instruction Provider
Georgi - Conversationalist
-- Chinese --
Hua - Sunny Support
Yue - Gentle Woman
Tao - Lecturer
Lan - Instructor
-- Croatian --
Petra - Strict Lecturer
Ivan - Bar Companion
-- Czech --
Jana - Crisp Conversationalist
Petr - Pastor
-- Danish --
Katrine - Calm Caregiver
-- Dutch --
Bram - Instructional
Daan - Business Baritone
Sanne - Clear Companion
Lucas - Storyteller
-- Finnish --
Helmi - Warm Friend
Mikko - Narration Expert
-- French --
Helpful French Lady
French Narrator Man
Calm French Woman
Antoine - Stern Man
-- Georgian --
Levan - Support Guide
Tamara - Support Specialist
-- German --
Thomas - Anchor
Viktoria - Phone Conversationalist
Lukas - Professional
Lena - Muse
-- Greek --
Despina - Motherly Woman
Nikos - Radio Storyteller
-- Gujarati --
Isha - Learner
Amit - Sports Student
-- Hebrew --
Noam - Broadcaster
-- Hindi --
Arushi - Hinglish Speaker
Sunil - Official Announcer
Riya - College Roommate
Aadhya - Soother
-- Hungarian --
Gabor - Reassuring
Eszter - Customer Companion
-- Indonesian --
Siti - Ad Narrator
Andi - Dynamic Presenter
-- Italian --
Liv - Casual Friend
Alessandra - Melodic Guide
Francesca - Elegant Partner
Giancarlo - Support Leader
-- Japanese --
Yumiko - Friendly Agent
Emi - Soft-Spoken Friend
Yuki - Calm Woman
Daisuke - Businessman
-- Kannada --
Prakash - Instructor
Divya - Joyful Narrator
-- Korean --
Jihyun - Anchorwoman
Mimi - Show Stopper
Byungtae - Enforcer
Jiwoo - Service Specialist
-- Malay --
Aisyah - Chat Partner
Faiz - Family Guide
-- Malayalam --
Latha - Friendly Host
-- Marathi --
Suresh - Instruction
Anika - Enthusiastic Seller
-- Norwegian --
Lars - Casual Conversationalist
-- Polish --
Tomek - Casual Companion
Wojciech - Documentarian
Piotr - Corporate Lead
Katarzyna - Melodic Storyteller
-- Portuguese --
Luana - Public Speaker
Felipe - Casual Talker
Ana Paula - Marketer
Beatriz - Support Guide
-- Punjabi --
Gurpreet - Companion
Jaspreet - Commercial Woman
-- Romanian --
Andrada - Steady Speaker
Andrei - Conversationalist Guy
-- Russian --
Tatiana - Friendly Storyteller
Natalya - Soothing Guide
Irina - Poetic
Sergei - Expressive Narrator
-- Slovak --
Katarina - Friendly Sales
Peter - Narrator Man
-- Spanish --
Pedro - Formal Speaker
Daniela - Relaxed Woman
Fran - Confident Young Professional
Isabel - Teacher
-- Swedish --
Freja - Nordic Reader
Ingrid - Peaceful Guide
Anders - Nordic Baritone
Cees - Nordic Narrator
-- Tagalog --
Luz - Casual Speaker
Angelo - Calm Narrator
-- Tamil --
Arun - Lively
Lakshmi - Everyday
-- Telugu --
Sindhu - Conversational Partner
Vikram - Folk Narrator
-- Thai --
Somchai - Star
Suda - Fortune Teller
-- Turkish --
Emre - Calming Speaker
Leyla - Story Companion
Azra - Service Specialist
Taylan - Expressive
-- Ukrainian --
Oleh - Professional Guy
-- Vietnamese --
Minh - Conversational Partner
Xia - Calm Companion
|
|
|
poe
|
- |
hailuo-music-v1.5
|
- |
- |
Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. Send the lyrics as your prompt.
Use `--style` to set the style of the generated music - for example, rock and roll, hip-hop, etc.
Send both lyrics and a style for best quality.
The prompt supports [intro][verse][chorus][bridge][outro] sections.
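An illustrative request (the lyrics and style are examples):
[verse] Neon lights on empty streets, I'm walking home alone
[chorus] But the city sings my name tonight
--style hip-hop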
|
|
|
poe
|
- |
elevenlabs-music
|
- |
- |
The ElevenLabs music model is a generative AI system designed to compose original music from text prompts. It allows creators to specify genres, moods, instruments, and structure, producing royalty-free tracks tailored to their needs. The model emphasizes speed, creative flexibility, and high-quality audio output, making it suitable for use in videos, podcasts, games, and other multimedia projects. This bot can produce songs with suggested lyrics based on general descriptions, exact lyrics if specified as such, or instrumental ones, all via prompting.
Use `--music_length_ms` to set the length of the song in milliseconds (10,000 to 300,000 ms).
Prompt input cannot exceed 2,000 characters.
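For example (the length value is illustrative):
An upbeat acoustic folk track about a summer road trip, light vocals and handclaps --music_length_ms 120000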
|
|
|
poe
|
- |
whisper-v3-large-t
|
3,000.00 |
- |
Whisper v3 Large is a state-of-the-art automatic speech recognition and translation model developed by OpenAI, offering 10–20% lower error rates than its predecessor, Whisper large-v2. It supports transcription and translation across numerous languages, with improvements in handling diverse audio inputs, including noisy conditions and long-form audio files.
|
|
|
poe
|
- |
stable-audio-2.5
|
- |
- |
Stable Audio 2.5 generates high-quality audio up to 3 minutes long from text prompts, supporting text-to-audio, audio-to-audio transformations, and inpainting with customizable settings like duration, steps, CFG scale, and more. It is ideal for music production, cinematic sound design, and remixing.
Note: Audio-to-audio and inpaint modes require a prompt alongside an uploaded audio file for generation.
Parameter controls available:
1. Basic
- Default: text-to-audio (no `--mode` needed)
- If transforming uploaded audio: `--mode audio-to-audio`
- If replacing specific parts: `--mode audio-inpaint`
- `--output_format wav` (for high quality, otherwise omit for mp3)
2. Timing and Randomness
- `--duration [1-190 seconds]` controls how long generated audio is
- `--random_seed false --seed [0-4294967294]` disables random seed generation
3. Advanced
- `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15)
- `--steps [4-8]`: Higher = better quality (recommended 6-8)
4. Transformation control (only for audio-to-audio)
- `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
5. Inpainting control (only for audio-inpaint)
- `--mask_start_time [seconds]` start time of the uploaded audio to modify
- `--mask_end_time [seconds]` end time of the uploaded audio to modify
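An illustrative inpainting request, sent alongside an uploaded audio file (the times and values are examples):
replace this section with a softer piano passage --mode audio-inpaint --mask_start_time 60 --mask_end_time 90 --steps 8 --cfg_scale 10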
|
|
|
poe
|
- |
stable-audio-2.0
|
- |
- |
Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. It is ideal for creative professionals seeking detailed and extended outputs from simple prompts.
Note: Audio-to-audio mode requires a prompt alongside an uploaded audio file for generation.
Parameter controls available:
1. Basic
- Default: text-to-audio (no `--mode` needed)
- If transforming uploaded audio: `--mode audio-to-audio`
- `--output_format wav` (for high quality, otherwise omit for mp3)
2. Timing and Randomness
- `--duration [1-190 seconds]` controls how long generated audio is
- `--random_seed false --seed [0-4294967294]` disables random seed generation
3. Advanced
- `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15)
- `--steps [30-100]`: Higher = better quality (recommended 50-80)
4. Transformation control (only for audio-to-audio)
- `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
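An illustrative transformation request, sent alongside an uploaded audio file (the values are examples):
rework this sketch into a slow cinematic orchestral piece --mode audio-to-audio --strength 0.5 --duration 120 --steps 60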
|
|
|
poe
|
- |
hailuo-speech-02
|
- |
- |
Generate speech from text prompts using the MiniMax Speech-02 model. Include `--hd` at the end of your prompt for higher quality output at a higher price. You may set language with `--language`, voice with `--voice`, pitch with `--pitch`, speed with `--speed`, and volume with `--volume`. Please check the UI for allowed values for each parameter.
|
|
|
poe
|
- |
elevenlabs-v2.5-turbo
|
- |
- |
ElevenLabs' leading text-to-speech technology converts your text into natural-sounding speech, using the Turbo v2.5 model. Simply send a text prompt, and the bot will generate audio using your choice of available voices. If you link a URL or a PDF, it will do its best to read it aloud to you. The overall default voice is Jessica, an American-English female.
Add --voice "Voice Name" to the end of a message (e.g. "Hello world --voice Eric") to customize the voice used. Add --language and the two-letter ISO-639-1 language code to your message if you notice pronunciation errors; a table of ISO-639-1 codes is here: https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes (e.g. zh for Chinese, es for Spanish, hi for Hindi)
The following voices are supported and recommended for each language:
English -- Sarah, George, River, Matilda, Will, Jessica, Brian, Lily, Monika Sogam
Chinese -- James Gao, Martin Li, Will, River
Spanish -- David Martin, Will, Efrayn, Alejandro, Sara Martin, Regina Martin
Hindi -- Ranga, Niraj, Liam, Raju, Leo, Manu, Vihana Huja, Kanika, River, Monika Sogam, Muskaan, Saanu, Riya, Devi
Arabic -- Bill, Mo Wiseman, Haytham, George, Mona, Sarah, Sana, Laura
German -- Bill, Otto, Leon Stern, Mila, Emilia, Lea, Leonie
Indonesian -- Jessica, Putra, Mahaputra
Portuguese -- Will, Muhammad, Onildo, Lily, Jessica, Alice
Vietnamese -- Bill, Liam, Trung Caha, Van Phuc, Ca Dao, Trang, Jessica, Alice, Matilda
Filipino -- Roger, Brian, Alice, Matilda
French -- Roger, Louis, Emilie
Swedish -- Will, Chris, Jessica, Charlotte
Turkish -- Cavit Pancar, Sohbet Adami, Belma, Sultan, Mahidevran
Romanian -- Eric, Bill, Brian, Charlotte, Lily
Italian -- Carmelo, Luca, Alice, Lily
Polish -- Robert, Rob, Eric, Pawel, Lily, Alice
Norwegian -- Chris, Charlotte
Czech -- Pawel
Finnish -- Callum, River
Hungarian -- Brian, Sarah
Japanese -- Alice
Prompt input cannot exceed 40,000 characters.
|
|
|
poe
|
- |
sonic-2.0
|
- |
- |
Generates audio based on your prompt using Cartesia's Sonic 2.0 text-to-speech model in your voice of choice (see below).
Add --voice [Voice Name] to the end of a message to customize the voice used or to handle different language inputs (e.g. 你好 --voice Chinese Commercial Woman). All of Cartesia's voices are supported on Poe.
The following voices are supported covering 15 languages (English, French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish):
Here's the alphabetical list of all the top voice names:
"1920's Radioman"
Aadhya
Adele
Alabama Man
Alina
American Voiceover Man
Ananya
Anna
Announcer Man
Apoorva
ASMR Lady
Australian Customer Support Man
Australian Man
Australian Narrator Lady
Australian Salesman
Australian Woman
Barbershop Man
Brenda
British Customer Support Lady
British Lady
British Reading Lady
Brooke
California Girl
Calm French Woman
Calm Lady
Camille
Carson
Casper
Cathy
Chongz
Classy British Man
Commercial Lady
Commercial Man
Confident British Man
Connie
Corinne
Customer Support Lady
Customer Support Man
Dallas
Dave
David
Devansh
Elena
Ellen
Ethan
Female Nurse
Florence
Francesca
French Conversational Lady
French Narrator Lady
French Narrator Man
Friendly Australian Man
Friendly French Man
Friendly Reading Man
Friendly Sidekick
German Conversational Woman
German Conversation Man
German Reporter Man
German Woman
Grace
Griffin
Happy Carson
Helpful French Lady
Helpful Woman
Hindi Calm Man
Hinglish Speaking Woman
Indian Lady
Indian Man
Isabel
Ishan
Jacqueline
Janvi
Japanese Male Conversational
Joan of Ark
John
Jordan
Katie
Keith
Kenneth
Kentucky Man
Korean Support Woman
Laidback Woman
Lena
Lily Whisper
Little Gaming Girl
Little Narrator Girl
Liv
Lukas
Luke
Madame Mischief
Madison
Maria
Mateo
Mexican Man
Mexican Woman
Mia
Middle Eastern Woman
Midwestern Man
Midwestern Woman
Movieman
Nathan
Newslady
Newsman
New York Man
Nico
Nonfiction Man
Olivia
Orion
Peninsular Spanish Narrator Lady
Pleasant Brazilian Lady
Pleasant Man
Polite Man
Princess
Professional Woman
Rebecca
Reflective Woman
Ronald
Russian Storyteller Man
Salesman
Samantha Angry
Samantha Happy
Sarah
Sarah Curious
Savannah
Silas
Sophie
Southern Man
Southern Woman
Spanish Narrator Woman
Spanish Reporter Woman
Spanish-speaking Reporter Man
Sportsman
Stacy
Stern French Man
Steve
Storyteller Lady
Sweet Lady
Tatiana
Taylor
Teacher Lady
The Merchant
Tutorial Man
Wise Guide Man
Wise Lady
Wise Man
Wizardman
Yogaman
Young Shy Japanese Woman
Zia
|
|
|
poe
|
- |
gemini-2.5-flash-tts
|
- |
- |
Gemini‑2.5‑Flash‑TTS is Google’s low‐latency text‑to‑speech model that converts text input into audio output, supporting both single‑ and multi‑speaker voices with controllable style, accent, and expressive tone — ideal for applications like podcasts, audiobooks, and conversational voice systems.
This bot does not accept attachments.
Parameter controls available:
1. Voice & Style Configuration
- Basic Settings
- `--mode single` (default) for single speaker or `--mode multi` for conversation
- `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US)
- `--output_format [MP3|WAV|OGG]` (default: MP3)
- Single speaker: `--voice [voice_name]` (default: Charon)
- Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore)
- Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2)
- Style Instructions
- `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent")
2. Limitations
- Text and style prompt limited to 4000 bytes each
- Multi-speaker requires `SpeakerName: text` format
Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm)
Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
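An illustrative multi-speaker request (the speaker names and lines are examples; flags follow the message, matching this catalog's convention):
Host: Welcome back to the show!
Guest: Thanks, it's great to be here.
--mode multi --voice Puck --voice2 Leda --speaker1_name Host --speaker2_name Guest --style_prompt "Cheerful tone"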
|
|
|
poe
|
- |
gemini-2.5-pro-tts
|
- |
- |
Gemini‑2.5‑Pro‑TTS is Google’s highest‑quality text‑to‑speech model preview, designed for complex workflows like podcasts, audiobooks, and customer support; it delivers expressive, accent‑ and style‑controllable single‑ or multi‑speaker speech, supporting over 23 languages, and built for state‑of‑the‑art output with the most powerful model architecture.
This bot does not accept attachments.
Parameter controls available:
1. Voice & Style Configuration
- Basic Settings
- `--mode single` (default) for single speaker or `--mode multi` for conversation
- `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US)
- `--output_format [MP3|WAV|OGG]` (default: MP3)
- Single speaker: `--voice [voice_name]` (default: Charon)
- Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore)
- Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2)
- Style Instructions
- `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent")
2. Limitations
- Text and style prompt limited to 4000 bytes each
- Multi-speaker requires `SpeakerName: text` format
Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm)
Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
|
|
|
poe
|
- |
orpheus-tts
|
- |
- |
Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. Send a text prompt to voice it. Use --voice to choose from one of the available voices (`tara`, `leah`, `jess`, `leo`, `dan`, `mia`, `zac`, `zoe`). Officially supported sound effects are: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>, and <giggle>.
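For example:
<sigh> It has been a very long day. <yawn> Time to get some sleep. --voice leo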
|
|
|
poe
|
- |
deepgram-nova-3
|
- |
- |
Transcribe audio files using Speech-to-Text technology with the Deepgram Nova-3 model, featuring multi-language support and advanced customizable settings.
[1] Basic Features:
Use `--generate_pdf true` to generate a PDF file of the transcription,
Use `--diarize true` to identify different speakers in the audio. This will automatically enable utterances.
Use `--smart_format false` to disable automatic text formatting for improved readability, including punctuation and paragraphs. This feature is enabled by default.
[2] Advanced Features:
Use `--dictation true` to convert spoken commands for punctuation into their respective marks (e.g., 'period' becomes '.'). This will automatically enable punctuation.
Use `--measurements true` to format spoken measurement units into abbreviations
Use `--profanity_filter true` to replace profanity with asterisks
Use `--redact_pci true` to redact payment card information
Use `--redact_pii true` to redact personally identifiable information
Use `--utterances true` to segment speech into meaningful semantic units
Use `--paragraphs false` to disable the paragraphs feature, which splits audio into paragraphs to improve transcript readability. This will automatically enable punctuation. Enabled by default.
Use `--punctuate false` to disable the punctuate feature, which adds punctuation and capitalization to your transcript. Enabled by default.
Use `--numerals false` to disable the numerals feature, which converts numbers from written format to numerical format.
[3] Languages Supported:
Auto-detect (Default)
English
Spanish
French
German
Italian
Portuguese
Japanese
Chinese
Hindi
Russian
Dutch
[4] Key Terms: use `--keyterm` to enter important terms to improve recognition accuracy, separated by commas. English only; limited to 500 tokens total.
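An illustrative request, sent with an audio file attached (the key terms are examples):
--diarize true --redact_pii true --keyterm Kubernetes, Terraform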
|
|
|
poe
|
- |
playai-tts
|
- |
- |
Generates audio based on your prompt using PlayHT's text-to-speech model, in the voice of your choice. Use --voice [voice_name] to pass in the voice of your choice, choosing one from below. Voice defaults to `Jennifer_(English_(US)/American)`.
Jennifer_(English_(US)/American)
Dexter_(English_(US)/American)
Ava_(English_(AU)/Australian)
Tilly_(English_(AU)/Australian)
Charlotte_(Advertising)_(English_(CA)/Canadian)
Charlotte_(Meditation)_(English_(CA)/Canadian)
Cecil_(English_(GB)/British)
Sterling_(English_(GB)/British)
Cillian_(English_(IE)/Irish)
Madison_(English_(IE)/Irish)
Ada_(English_(ZA)/South_African)
Furio_(English_(IT)/Italian)
Alessandro_(English_(IT)/Italian)
Carmen_(English_(MX)/Mexican)
Sumita_(English_(IN)/Indian)
Navya_(English_(IN)/Indian)
Baptiste_(English_(FR)/French)
Lumi_(English_(FI)/Finnish)
Ronel_Conversational_(Afrikaans/South_African)
Ronel_Narrative_(Afrikaans/South_African)
Abdo_Conversational_(Arabic/Arabic)
Abdo_Narrative_(Arabic/Arabic)
Mousmi_Conversational_(Bengali/Bengali)
Mousmi_Narrative_(Bengali/Bengali)
Caroline_Conversational_(Portuguese_(BR)/Brazilian)
Caroline_Narrative_(Portuguese_(BR)/Brazilian)
Ange_Conversational_(French/French)
Ange_Narrative_(French/French)
Anke_Conversational_(German/German)
Anke_Narrative_(German/German)
Bora_Conversational_(Greek/Greek)
Bora_Narrative_(Greek/Greek)
Anuj_Conversational_(Hindi/Indian)
Anuj_Narrative_(Hindi/Indian)
Alessandro_Conversational_(Italian/Italian)
Alessandro_Narrative_(Italian/Italian)
Kiriko_Conversational_(Japanese/Japanese)
Kiriko_Narrative_(Japanese/Japanese)
Dohee_Conversational_(Korean/Korean)
Dohee_Narrative_(Korean/Korean)
Ignatius_Conversational_(Malay/Malay)
Ignatius_Narrative_(Malay/Malay)
Adam_Conversational_(Polish/Polish)
Adam_Narrative_(Polish/Polish)
Andrei_Conversational_(Russian/Russian)
Andrei_Narrative_(Russian/Russian)
Aleksa_Conversational_(Serbian/Serbian)
Aleksa_Narrative_(Serbian/Serbian)
Carmen_Conversational_(Spanish/Spanish)
Patricia_Conversational_(Spanish/Spanish)
Aiken_Conversational_(Tagalog/Filipino)
Aiken_Narrative_(Tagalog/Filipino)
Katbundit_Conversational_(Thai/Thai)
Katbundit_Narrative_(Thai/Thai)
Ali_Conversational_(Turkish/Turkish)
Ali_Narrative_(Turkish/Turkish)
Sahil_Conversational_(Urdu/Pakistani)
Sahil_Narrative_(Urdu/Pakistani)
Mary_Conversational_(Hebrew/Israeli)
Mary_Narrative_(Hebrew/Israeli)
|
|
|
poe
|
- |
unreal-speech-tts
|
- |
- |
Convert chats, URLs, and documents into natural speech. 8 Languages: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese. Use `--voice <VOICE_NAME>`. Defaults to `--voice Sierra`. Full list below:
American English
- Male: Noah, Jasper, Caleb, Ronan, Ethan, Daniel, Zane, Rowan
- Female: Autumn, Melody, Hannah, Emily, Ivy, Kaitlyn, Luna, Willow, Lauren, Sierra
British English
- Male: Benjamin, Arthur, Edward, Oliver
- Female: Eleanor, Chloe, Amelia, Charlotte
Japanese
- Male: Haruto
- Female: Sakura, Hana, Yuki, Rina
Chinese
- Male: Wei, Jian, Hao, Sheng
- Female: Mei, Lian, Ting, Jing
Spanish
- Male: Mateo, Javier
- Female: Lucía
French
- Female: Élodie
Hindi
- Male: Arjun, Rohan
- Female: Ananya, Priya
Italian
- Male: Luca
- Female: Giulia
Portuguese
- Male: Thiago, Rafael
- Female: Camila
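For instance, a prompt like this (text illustrative) selects a British English voice from the list above:
```
Good evening, and welcome to the program. --voice Amelia
```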
|
|
|
poe
|
- |
imagen-4-ultra
|
42,000.00 |
- |
DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-exp-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
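An illustrative prompt using the aspect-ratio flag (subject arbitrary):
```
A lighthouse on a rocky coast at golden hour, crashing waves --aspect_ratio 16:9
```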
|
|
|
poe
|
- |
imagen-4-fast
|
14,000.00 |
- |
DeepMind's June 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-fast-generate-preview-06-06` model from Google Vertex, and has a maximum input of 480 tokens.
|
|
|
poe
|
- |
imagen-4
|
28,000.00 |
- |
DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
|
|
|
poe
|
- |
phoenix-1.0
|
17,000.00 |
- |
High-fidelity image generation with strong prompt adherence, especially for long and detailed instructions. Phoenix is capable of rendering coherent text in a wide variety of contexts. Prompt Enhance is on by default so you can see the full power of a long, detailed prompt, but it can be turned off for full control. Uses the Phoenix 1.0 Fast model for performant, high-quality generations.
Parameters:
- Aspect Ratio (1:1, 3:2, 2:3, 9:16, 16:9)
- Prompt Enhance (enhances your prompt for better image generation)
- Style (Please see parameter control to identify available styles)
Image generation prompts can be a maximum of 1500 characters.
|
|
|
poe
|
- |
dreamina-3.1
|
- |
- |
ByteDance's Dreamina 3.1 Text-to-Image showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details. This model excels with large prompts; please use detailed prompts if you face Content Checker issues.
The model does not accept attachments.
Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16.
|
|
|
poe
|
- |
qwen-image
|
20,000.00 |
- |
Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Experiments show strong general capabilities in image generation, with exceptional performance in text rendering, especially for Chinese. Prompt input cannot exceed 2,000 characters.
|
|
|
poe
|
- |
qwen-image-20b
|
- |
- |
Qwen-Image (20B) is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt.
|
|
|
poe
|
- |
hunyuan-image-2.1
|
- |
- |
Hunyuan Image 2.1 is a high quality, highly efficient text-to-image model. Send a prompt to generate an image.
Use `--aspect` (one of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`) to set the aspect ratio of the generated image.
Use `--negative_prompt` (examples: blur, low resolution, poor quality) to set negative prompt on the image generated.
This bot does not accept attachments.
|
|
|
poe
|
- |
flux-kontext-max
|
- |
- |
FLUX.1 Kontext [max] is a new premium model from Black Forest Labs that brings maximum performance across all aspects. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image generation. Available aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, and 9:21.
|
|
|
poe
|
- |
flux-kontext-pro
|
- |
- |
The FLUX.1 Kontext [pro] model delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, flawless typography, and image editing capabilities. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image generation. Available aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, and 9:21.
|
|
|
poe
|
- |
flux-krea
|
- |
- |
FLUX-Krea is a version of FLUX Dev tuned for superior aesthetics. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Krea Redux.
|
|
|
poe
|
- |
imagen-3
|
28,000.00 |
- |
Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For simpler prompts, faster results, & lower cost, use @Imagen3-Fast. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
|
|
|
poe
|
- |
wan-animate
|
- |
- |
Wan Animate takes in an image and a video to generate another video where a character in the image replaces a character in the video (default), or where the video character's motion is used to animate the character in the image. Pass --animate for the second mode.
The bot supports only four file types: JPEG, PNG, WebP, and MP4.
|
|
|
poe
|
- |
imagen-3-fast
|
14,000.00 |
- |
Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts — optimized for short, simple prompts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For more complex prompts, use @Imagen3. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
|
|
|
poe
|
- |
seedream-3.0
|
- |
- |
Seedream 3.0 by ByteDance is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation.
|
|
|
poe
|
- |
seedance-1.0-pro
|
- |
- |
Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`). Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution. `--duration` (3 to 12) sets the video duration.
The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`.
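As a rough worked example (assuming a 24 fps output; the actual frame rate may differ):
```
1080p at 16:9 → 1920 × 1080 pixels
1920 * 1080 * 24 * 5 / 1024 ≈ 243,000 video tokens for a 5-second clip
```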
|
|
|
poe
|
- |
seedance-1.0-lite
|
- |
- |
Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following.
Optional parameters:
Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, and `9:16`).
Use `--resolution` (one of `480p`, `720p`, and `1080p`) to set the video resolution.
Use `--duration` (3 to 12) to set the video duration. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`.
|
|
|
poe
|
- |
ideogram-v3
|
- |
- |
Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. Use `--aspect` to set the aspect ratio (valid aspect ratios are 5:4, 4:3, 4:5, 1:1, 1:2, 1:3, 3:4, 3:1, 3:2, 2:1, 2:3, 16:9, 16:10, 10:16, 9:16), and use `--style` to specify a style (one of `AUTO`, `GENERAL`, `REALISTIC`, and `DESIGN`; default: `AUTO`). Send one image with a prompt for image remixing/restyling. Send two images (one an image and the other a black-and-white mask image denoting an area) for image editing.
|
|
|
poe
|
- |
ideogram-v2
|
57,000.00 |
- |
Latest image model from Ideogram, with industry leading capabilities in generating realistic images, graphic design, typography, and more. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, 1:1. The "--style" parameter can be defined to specify the style of image generated (GENERAL, REALISTIC, DESIGN, RENDER_3D, ANIME). Powered by Ideogram.
|
|
|
poe
|
- |
flux-dev-di
|
5,000.00 |
- |
High quality image generator using the FLUX dev model. Top-of-the-line prompt following, visual quality, and output diversity. This is a text-to-image generation model only and does not accept attachments. To further customize generation, the following parameters are available:
To set width, use "--width". Valid pixel options from 128 up to 1920. Default value: 1024
To set height, use "--height". Valid pixel options from 128 up to 1920. Default value: 1024
To set the seed, use "--seed" for reproducible results. Options from 1 up to 2**32. Default value: random
To set the number of inference steps, use "--num_inference_steps". Options from 1 up to 50. Default: 25
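An illustrative prompt combining these parameters (subject and values arbitrary):
```
A misty pine forest at dawn, volumetric light --width 1280 --height 720 --seed 42 --num_inference_steps 30
```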
|
|
|
poe
|
- |
flux-schnell-di
|
990.00 |
- |
This is the fastest version of FLUX, featuring highly optimized abstract models that excel at creative and unconventional renders. To further customize generation, the following parameters are available:
To set width, use "--width". Valid pixel options from 128 up to 1920. Default value: 1024
To set height, use "--height". Valid pixel options from 128 up to 1920. Default value: 1024
To set the seed, use "--seed" for reproducible results. Options from 1 up to 2**32. Default value: random
To set the number of inference steps, use "--num_inference_steps". Options from 1 up to 50. Default: 1
|
|
|
poe
|
- |
flux-pro-1.1
|
- |
- |
State-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is the most powerful version of FLUX 1.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
|
|
|
poe
|
- |
luma-photon-flash
|
- |
- |
Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards - not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
|
|
|
poe
|
- |
hidream-i1-full
|
- |
- |
Hidream-I1 is a state-of-the-art text to image model by Hidream. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt. Hosted by fal.ai.
|
|
|
poe
|
- |
retro-diffusion-core
|
- |
- |
Generate true game-ready pixel art in seconds at any resolution between 16x16 and 512x512 across various styles. Create 48x48 walking animations of sprites using the "animation__four_angle_walking" style! First 50 basic image requests worth of points free! Check out more settings below 👇
Example message: "A cute corgi wearing sunglasses and a party hat --ar 128:128 --style rd_fast__portrait"
Settings:
--ar <width>:<height> (Image size in pixels, larger images cost more. Or aspect ratio like 16:9)
--style <style_name> (The name of the style you want to use. Available styles: rd_fast__anime, rd_fast__retro, rd_fast__simple, rd_fast__detailed, rd_fast__game_asset, rd_fast__portrait, rd_fast__texture, rd_fast__ui, rd_fast__item_sheet, rd_fast__mc_texture, rd_fast__mc_item, rd_fast__character_turnaround, rd_fast__1_bit, animation__four_angle_walking, rd_plus__default, rd_plus__retro, rd_plus__watercolor, rd_plus__textured, rd_plus__cartoon, rd_plus__ui_element, rd_plus__item_sheet, rd_plus__character_turnaround, rd_plus__isometric, rd_plus__isometric_asset, rd_plus__topdown_map, rd_plus__top_down_asset)
--seed (Random number, keep the same for consistent generations)
--tile (Creates seamless edges on applicable images)
--tilex (Seamless horizontally only)
--tiley (Seamless vertically only)
--native (Returns pixel art at native resolution, without upscaling)
--removebg (Automatically remove the background)
--iw <decimal between 0.0 and 1.0> (Controls how strong the image generation is. 0.0 for small changes, 1.0 for big changes)
Additional notes: All styles have a size range of 48x48 -> 512x512, except for the "mc" styles, which have a size range of 16x16 -> 128x128, and the "animation__four_angle_walking" style, which will only create 48x48 animations.
|
|
|
poe
|
- |
stablediffusion3.5-l
|
- |
- |
Stability.ai's StableDiffusion3.5 Large, hosted by @fal, is the Stable Diffusion family's most powerful image generation model both in terms of image quality and prompt adherence. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16.
|
|
|
poe
|
- |
flux-schnell
|
- |
- |
Turbo speed image generation with strengths in prompt following, visual quality, image detail and output diversity. This is the fastest version of FLUX.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
|
|
|
poe
|
- |
gpt-image-1
|
- |
- |
OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. For a conversational editing experience, use https://poe.com/GPT-4o (all users) or https://poe.com/Assistant (subscribers) instead.
Optional parameters:
`--aspect` (options: 1:1, 3:2, 2:3): Aspect ratio of the output image
`--quality` (options: high, medium, low): Image resolution
`--use_mask`: Indicates that the last attached image is a mask for in-painting (editing specific regions). The mask must match the dimensions of the base image, with transparent (zero-alpha) areas showing which parts to edit.
Set `--use_high_fidelity false` to disable high input fidelity. This option is enabled by default.
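An illustrative prompt using these parameters (subject arbitrary):
```
A watercolor fox resting in a wildflower meadow --aspect 3:2 --quality high
```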
|
|
|
poe
|
- |
gpt-image-1-mini
|
- |
- |
OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query.
Optional parameters:
`--aspect` (options: 1:1, 3:2, 2:3): Aspect ratio of the output image
`--quality` (options: high, medium, low): Image resolution
`--use_mask`: Indicates that the last attached image is a mask for in-painting (editing specific regions). The mask must match the dimensions of the base image, with transparent (zero-alpha) areas showing which parts to edit.
Set `--use_high_fidelity false` to disable high input fidelity. This option is enabled by default.
|
|
|
poe
|
- |
veo-3.1
|
- |
- |
Google’s Veo 3.1 is an updated version of the Veo family of models that features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes.
Optional parameters:
`--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`
A negative prompt can be set by adding `--no` for elements to avoid, e.g. `--no blurry`, `--no cloudy`
`--duration` to set the duration (one of `4s`, `6s`, or `8s`), which defaults to `8s`
`--seed` to set the seed (a number value)
`--reference-mode` to use input images (3 max) as references for video generation
For first and last frame video generation and references support, please use www.poe.com/Veo-v3.1
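An illustrative text-to-video prompt (subject and values arbitrary):
```
A chef plating a dessert in a sunlit kitchen, slow dolly-in --aspect_ratio 16:9 --duration 8s --no blurry --seed 7
```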
|
|
|
poe
|
- |
veo-3.1-fast
|
- |
- |
Google’s Veo 3.1 Fast is an updated version of the Veo family of models that's optimized for speed and cost, but still features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes.
Optional parameters:
`--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`
A negative prompt can be set by adding `--no` for elements to avoid, e.g. `--no blurry`, `--no cloudy`
`--duration` to set the duration (one of `4s`, `6s`, or `8s`), which defaults to `8s`
`--seed` to set the seed (a number value)
For first and last frame video generation support, please use www.poe.com/Veo-v3.1-Fast
|
|
|
poe
|
- |
sora-2-pro
|
- |
- |
Sora 2 Pro is OpenAI’s state-of-the-art video and audio generation model, capable of creating richly detailed, dynamic clips with synchronized audio from natural language prompts or images. It builds on Sora 2’s capabilities with enhanced physical accuracy, intricate world-state persistence, and higher fidelity in cinematic styles. The model excels at generating synchronized dialogue, sound effects, and realistic simulations, all while adhering to real-world physics. Sora 2 Pro also supports seamless editing, complex multi-shot prompt execution, and the integration of real-world elements like people, animals, and objects with unparalleled detail and accuracy.
This bot supports text-to-video and image-to-video generation.
Optional parameters:
`--duration` (options: 4, 8, 12): Video output duration in seconds
`--size` (options: [Landscape] - 1280x720, 1792x1024, [Portrait] - 720x1280, 1024x1792): Resolution of the output video
|
|
|
poe
|
- |
sora-2
|
- |
- |
Sora 2 is OpenAI’s latest video and audio generation model, delivering exceptional realism, physical accuracy, and controllability. It excels at creating cinematic scenes, synchronized dialogue, sound effects, and dynamic simulations while faithfully adhering to the laws of physics. The model supports editing, multi-shot prompt adherence, and the integration of real-world elements, such as people, animals, and objects.
This bot supports text-to-video and image-to-video generation.
Optional parameters:
`--duration` (options: 4, 8, 12): Video output duration in seconds
`--size` (options: [landscape] - 1280x720, [portrait] - 720x1280): Resolution of the output video
|
|
|
poe
|
- |
kling-2.5-turbo-std
|
- |
- |
Generate high-quality videos from images using Kling 2.5 Turbo Standard.
Optional prompts:
Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive).
Use `--duration` to set either a 5 or 10 second video. Note: only image-to-video is supported; the aspect ratio is inferred automatically from the image and cannot be set.
Supported image file formats: JPEG, PNG, WebP
|
|
|
poe
|
- |
wan-2.6
|
- |
- |
WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling—creating high-fidelity videos from text and/or images while keeping characters and style consistent across scenes. It also supports native audio-visual sync (including lip-sync) and can generate or align dialogue/music/SFX with the visuals, enabling “prompt-to-video” results that feel production-ready without heavy post work.
Notes:
- This model is served from the Singapore area.
- Upload an image to enable image-to-video generations or video(s) for video-to-video generations.
- Responses may take upwards of 5 minutes (or more) to finish generating.
Parameter controls available:
1. Video Settings
- `--resolution 1080p` (default) or `--resolution 720p`
- `--aspect_ratio 16:9` (default), `9:16`, `1:1`, `4:3`, or `3:4` (ignored for image-to-video as it uses the input image's aspect ratio)
- `--duration [5, 10, or 15]` seconds (default: 5) (video-to-video limited to 10s max)
2. Advanced Settings
- `--prompt_extend true` (default) or `--prompt_extend false`: AI prompt enhancement
- `--audio true` (default) or `--audio false`: Enable/disable audio generation
- `--shot_type multi` (default) or `--shot_type single`: Multi-shot narrative vs single continuous shot
- `--seed [0-2147483646]`: Random seed for reproducibility
- `--negative_prompt "text"`: Describe what you don't want in the video
3. Attachments
- For i2v: Attach an image as the first frame
- For r2v: Attach 1-3 reference videos (2-30 seconds each, MP4/MOV) (Use `character1`, `character2`, `character3` in prompt to reference subjects, ex. character1 references the subject in the first uploaded video)
- For t2v/i2v: Optionally attach an audio file (3-30 seconds, max 15mb, .mp3/.wav) for custom audio
4. Multi-Shot Prompting
- For multi-shot mode, use timeline syntax: `[Shot #] [Timestamp] [Action]`. Example: `[Shot 1] [0-5s] Wide shot of city skyline. [Shot 2] [5-10s] Close-up of character walking.`
- Ensure timestamps match your selected duration and use transition keywords like "Hard cut" or "Fade in" between shots.
|
|
|
poe
|
- |
seedream-4.0
|
- |
- |
Seedream 4.0 is ByteDance's latest and best text-to-image model, capable of impressive high fidelity image generation, with great text-rendering ability. Seedream 4.0 can also take in multiple images as references and combine them together or edit them to return an output. Pass `--aspect` to set the aspect ratio for the model (One of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`).
|
|
|
poe
|
- |
kling-2.5-turbo-pro
|
- |
- |
Generate high-quality videos from text and images using Kling 2.5 Turbo Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`, only works for text-to-video). Use `--duration` to set either 5 or 10 second video.
|
|
|
poe
|
- |
kling-2.1-master
|
- |
- |
Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`). Use `--duration` to set either a 5 or 10 second video.
|
|
|
poe
|
- |
hailuo-02
|
- |
- |
Hailuo-02 is MiniMax's latest video generation model. It generates 6-second, 768p videos: just submit a text prompt, or an image with a prompt describing the desired video behavior, and it will create it. Generation typically takes ~5 minutes. Strong motion effects and ultra-clear quality.
|
|
|
poe
|
- |
hailuo-02-standard
|
- |
- |
MiniMax Hailuo-02 Video Generation model: Advanced image-to-video generation model with 768p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Use `--duration` to set the video duration (6 or 10 seconds).
|
|
|
poe
|
- |
hailuo-02-pro
|
- |
- |
MiniMax Hailuo-02 Pro Video Generation model: Advanced image-to-video generation model with 1080p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Generates 5 second video.
|
|
|
poe
|
- |
deepseek-r1-turbo-di
|
15,000.00 |
- |
Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. Turbo model is quantized to achieve higher speeds. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company.
Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
|
|
|
poe
|
- |
hailuo-director-01
|
- |
- |
Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control. Both text-to-video and image-to-video are supported.
Camera movement instructions can be added using square brackets (e.g. [Pan left] or [Zoom in]).
You can use up to 3 combined movements per prompt. Duration is fixed to 5 seconds.
Supported movements: Truck left/right, Pan left/right, Push in/Pull out, Pedestal up/down, Tilt up/down, Zoom in/out, Shake, Tracking shot, Static shot. For example: [Truck left, Pan right, Zoom in].
For a more detailed guide, refer to https://sixth-switch-2ac.notion.site/T2V-01-Director-Model-Tutorial-with-camera-movement-1886c20a98eb80f395b8e05291ad8645
|
|
|
poe
|
- |
pixverse-v5
|
- |
- |
Pixverse v5 offers advanced creative tools with three main features: Text-to-Video, which transforms written prompts into cinematic, high-detail video clips with fluid motion and accurate visual interpretation; Image-to-Video, which animates static images into dynamic short videos with lifelike motion and smooth transitions; and Transition, which generates seamless morphs between frames or scenes to create unified, professional-quality visual flow.
Parameter Controls and Usage:
1. Video Generation (Main Control Section)
- `--resolution [360p|540p|720p|1080p]`
- Description: Video resolution.
- Default: 720p
- `--duration [5|8]`
- Description: Video length in seconds.
- Default: 5
- `--aspect_ratio [16:9|4:3|1:1|3:4|9:16]`
- Description: Video aspect ratio.
- Default: 16:9
- `--style [none|anime|3d_animation|clay|comic|cyberpunk]`
- Description: Video style (optional).
- Default: none
- `--negative_prompt "[text]"`
- Description: Elements to avoid (optional).
- Default: "" (empty)
- `--seed [integer]`
- Description: Optional seed for reproducibility (e.g., 12345).
- Default: "" (empty/random)
2. Generation Modes (Determined by attachments)
- Text-to-Video: Provide a prompt with 0 image attachments.
- Image-to-Video: Provide 1 image attachment.
- Transition: Provide 2 image attachments (first is start frame, second is end frame).
3. Limitations
- The combination of `--resolution 1080p` and `--duration 8` is not supported.
- Only 0, 1, or 2 image attachments are supported.
- Attachments must be images (PNG/JPEG/WEBP/TIFF/BMP/HEIC/GIF).
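An illustrative message combining several of these controls (subject and values arbitrary):
```
A neon-lit cat leaping across rainy rooftops --resolution 720p --duration 5 --aspect_ratio 9:16 --style cyberpunk
```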
|
|
|
poe
|
- |
wan-2.5
|
- |
- |
Wan-2.5 Video Generation bot. Has text-to-video and image-to-video capabilities. Optionally, send an audio file (mp3) to guide the video generation.
Optional Parameters:
Control the output's resolution with `--resolution` (480p, 720p, or 1080p); defaults to 720p. Pricing varies on the basis of resolution.
Set the aspect ratio with `--aspect` (16:9, 1:1, 9:16); defaults to 16:9.
Set the duration with `--duration` (5s or 10s); defaults to 5s.
|
|
|
poe
|
- |
pixverse-v4.5
|
- |
- |
Pixverse v4.5 is a video generation model capable of generating high quality videos in under a minute.
Use `--negative_prompt` to set the negative prompt.
Use `--duration` to set the video duration (5 or 8 seconds).
Set the resolution (360p, 540p, 720p, or 1080p) using `--resolution`.
Send 1 image to perform an image-to-video task or a video effect generation task, and 2 images to perform a video transition task, using the first image as the first frame and the second image as the last frame.
Use `--effect` to set the video generation effect, provided 1 image is given (Options: `Kiss_Me_AI`, `Kiss`, `Muscle_Surge`, `Warmth_of_Jesus`, `Anything,_Robot`, `The_Tiger_Touch`, `Hug`, `Holy_Wings`, `Hulk`, `Venom`, `Microwave`). Use `--style` to set the video generation style (for text-to-video, image-to-video, and transition only; options: `anime`, `3d_animation`, `clay`, `comic`, `cyberpunk`).
Use `--seed` to set the seed and `--aspect` to set the aspect ratio.
|
|
|
poe
|
- |
flux-dev
|
- |
- |
High-performance image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is a more efficient version of FLUX-pro, balancing quality and speed. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
|
|
|
poe
|
- |
lyria
|
- |
- |
Google DeepMind's Lyria 2 delivers high-quality audio generation, capable of creating diverse soundscapes and musical pieces from text prompts.
Allows users to specify elements to exclude in the audio using the "--no" parameter at the end of the prompt. Also supports "--seed" for deterministic generation. e.g. "An energetic electronic dance track --no vocals, slow tempo --seed 123". Lyria blocks prompts that name specific artists or songs (artist-intent and recitation checks). This bot does not support attachments. This bot accepts input prompts of up to 480 tokens.
|
|
|
poe
|
- |
kling-1.6-pro
|
- |
- |
Kling v1.6 video generation bot, hosted by fal.ai. For best results, upload an image attachment.
Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
|
|
|
poe
|
- |
clarity-upscaler
|
- |
- |
Upscales images with high fidelity to the original image. Use "--upscale_factor" (value is a number between 1 and 4) to set the upscaled images' size (2 means the output image is 2x in size, etc.). "--creativity" and "--clarity" can be set between 0 and 1 to alter the faithfulness to the original image and the sharpness, respectively.
This bot supports .jpg and .png images.
|
|
|
poe
|
- |
topazlabs
|
30.00 |
- |
Topaz Labs’ image upscaler is a best-in-class generative AI model that increases the overall clarity and pixel count of input photos, whether they were generated by AI image models or captured in the real world, while preserving the original photo’s contents. It can produce images as small as ~10MB and as large as 512MB, depending on the size of the input photo. Specify --upscale and a number up to 16 to control the upscaling factor, --output_height and/or --output_width to specify the number of pixels for each dimension, and add --generated if the input photo is AI-generated. With no parameters specified, it will double both the input photo’s height and width; it is especially effective on images of human faces.
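For instance, attaching a photo with the following parameters (values illustrative) would upscale it 4x and flag it as AI-generated:
```
--upscale 4 --generated
```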
|
|
|
poe
|
- |
veo-v3.1
|
- |
- |
Google's Veo 3.1 is an improved version of Veo 3.
Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`).
Use `--silent` to generate a silent video at a lower cost.
Use `--negative_prompt` to set a negative prompt, e.g. `blur`, `low resolution`, `poor quality` (text-to-video only).
Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`). `4s` and `6s` are only supported for text-to-video generation.
Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task. Pass up to 3 images with `--reference` for a reference-to-video task. Reference images will be directly used in the video generation.
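An illustrative text-to-video prompt (subject and values arbitrary):
```
A paper boat drifting down a rain-swollen gutter, macro shot --aspect 16:9 --duration 8s --negative_prompt blur
```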
|
|
|
poe
|
- |
veo-v3.1-fast
|
- |
- |
Google's Veo 3.1 Fast is a fast version of Veo 3.1.
Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`).
Use `--silent` to generate a silent video at a lower cost.
Use `--negative_prompt` to set a negative prompt, e.g. `blur`, `low resolution`, `poor quality` (text-to-video only).
Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`). `4s` and `6s` are only supported for text-to-video generation.
Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task.
|
|
|
poe
|
- |
wan-2.2
|
- |
- |
Wan-2.2 is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Send one image for image-to-video tasks, and send two images for first-frame-to-last-frame generation. Use `--aspect` to set the aspect ratio (one of `16:9`, `1:1`, `9:16`) for text-to-video requests. Duration is limited to 5 seconds only with up to 720p resolution.
|
|
|
poe
|
- |
ltx-2-fast
|
- |
- |
LTX-2 Fast is a video model by Lightricks that delivers exceptional quality and speed. It can generate videos at up to 50 FPS in high resolutions and supports both text-to-video and image-to-video generation.
Optional Prompts:
Use `--generate-audio` to generate audio with the video. This is disabled by default.
Pass the resolution as `--resolution` with one of `1080p`, `1440p`, `2160p`. This is set to 1080p by default.
Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`). This is set to 6s by default. Duration and resolution values will change the price.
Set the fps of the generated video with `--fps` to one of 25 or 50. This is set to 25 by default.
File attachment accepted: jpeg, png, webp
|
|
|
poe
|
- |
ltx-2-pro
|
- |
- |
LTX-2 Pro is an advanced video generation model by Lightricks designed for professional‑grade results. It offers high‑quality, realistic video generation at exceptional speed and supports outputs up to 2K resolution. Perfect for both text‑to‑video and image‑to‑video creation, it delivers cinematic detail and smooth performance.
Optional Prompts:
Use `--generate_audio` to generate audio with the video. This is disabled by default.
Pass the resolution as `--resolution` with one of `1080p`, `1440p`, `2160p`. This is set to 1080p by default.
Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`). This is set to 6s by default. Duration and resolution values will change the price.
Set the fps of the generated video with `--fps` to one of 25 or 50. This is set to 25 by default.
File attachment accepted: jpeg, png, webp
|
|
|
poe
|
- |
veo-3
|
- |
- |
Veo 3 produces incredibly high-quality videos across a diverse range of subjects and styles. It incorporates an enhanced understanding of real-world physics and the subtleties of human movement and expression, resulting in greater detail and overall realism.
Veo 3 is fluent in the unique language of cinematography: you can request a specific genre, specify a lens, or suggest cinematic effects, and Veo 3 will deliver stunning 8-second video clips. It supports both text-to-video and image-to-video generation and also features native audio generation based on text prompts.
Please note that Veo 3 does not accept audio attachments.
To exclude specific elements, use --no followed by a negative prompt (e.g., blurry, cloudy, or other attributes).
To set a specific seed value, use `--seed` followed by the desired number (e.g., --seed 2).
To set the aspect ratio, use `--aspect_ratio` followed by either 16:9 or 9:16.
To set the duration, use `--duration` followed by one of 4s, 6s, or 8s.
|
|
|
poe
|
- |
veo-3-vfast
|
- |
- |
Veo-3 Fast is a faster and more cost-effective version of Google's Veo 3.
Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `1:1`, `9:16`).
Use `--generate_audio` to generate audio with your video at a higher cost.
Use `--negative_prompt` to set a negative prompt, e.g. `blur`, `low resolution`, `poor quality`.
Duration is limited to 7 seconds. This is a text-to-video generation model only.
|
|
|
poe
|
- |
vidu
|
- |
- |
The Vidu Video Generation Bot creates videos using images and text prompts. You can generate videos in four modes:
(1) Image-to-Video: send 1 image with a prompt,
(2) Start-to-End Frame: send 2 images with a prompt for transition videos,
(3) Reference-to-Video: send up to 3 images with the `--reference` flag for guidance, and
(4) Template-to-Video: use `--template` to apply pre-designed templates (1-3 images required, pricing varies by template).
Number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, all other templates accept only 1 image.
The bot supports setting the aspect ratio with `--aspect` (16:9, 1:1, 9:16) and the movement amplitude with `--movement-amplitude`, and accepts PNG, JPEG, and WEBP formats.
Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video).
Duration is limited to 5 seconds.
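An illustrative start-to-end frame request (two images attached; prompt text arbitrary):
```
The scene transitions smoothly from day to night --aspect 16:9
```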
|
|
|
poe
|
- |
vidu-q1
|
- |
- |
The Vidu Q1 Video Generation Bot creates videos using text prompts and images. You can generate videos in three modes:
(1) Text-to-Video: send a text prompt,
(2) Image-to-Video: send 1 image with a prompt, and
(3) Reference-to-Video: send up to 7 images with the `--reference` flag.
Number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, all other templates accept only 1 image.
The bot supports aspect ratios via `--aspect` (16:9, 1:1, 9:16) and a movement amplitude via `--movement-amplitude` that can be customized for text-to-video and reference-to-video tasks.
Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video generation).
The bot accepts PNG, JPEG, and WEBP formats. Duration is limited to 5 seconds.
|
|
|
poe
|
- |
veo-3-fast
|
- |
- |
Veo 3 Fast is a speed-optimized variant of Google’s Veo 3 AI video generation engine, designed for rapid, cost-efficient production of short clips with synchronized audio (dialogue, ambient sound, effects). It prioritizes faster generation times while still delivering solid visual and audio quality, and it supports text-to-video and image-to-video workflows, allowing creators to animate still images into motion sequences. It operates under defined constraints: video lengths of 4, 6, or 8 seconds, specified via the --duration parameter (e.g. "A cat dances --duration 6" will produce a 6-second video). Use `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`.
Please only upload photos that you own or have the right to use, otherwise the bot will throw an error.
|
|
|
poe
|
- |
seedance-1.0-pro-fast
|
- |
- |
Seedance Pro Fast is a faster version of Seedance 1.0 Pro that balances speed, quality and cost. Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following.
Optional prompts:
Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`). Defaults to `16:9`.
Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution. Defaults to `1080p`.
`--duration` (3 to 12) sets the video duration. Defaults to `5s`.
The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`.
File attachment accepted: jpeg, png, webp
|
|
|
poe
|
- |
sora
|
- |
- |
Sora is OpenAI's video generation model. Use `--duration` to set the duration of the generated video, and `--resolution` to set the video's resolution (480p, 720p, or 1080p). Set the aspect ratio of the generated video with `--aspect` (Valid aspect ratios are 16:9, 1:1, 9:16). This is a text-to-video model only.
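An illustrative text-to-video prompt using the documented flags (subject arbitrary):
```
A time-lapse of a harbor waking up at dawn --resolution 720p --aspect 16:9
```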
Switch to the newest models for improved video and audio creation: https://poe.com/Sora-2-Pro for cinematic excellence or https://poe.com/Sora-2 for unmatched realism and precision.
|
|
|
poe
|
- |
omnihuman
|
- |
- |
OmniHuman, by Bytedance, generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. Send an image including a human figure with a visible face, along with an audio file, and the bot will return a video. The maximum audio length accepted is 30 seconds.
|
|
|
poe
|
- |
grok-code-fast-1
|
- |
- |
Grok-Code-Fast-1 from xAI is a high-performance, cost-efficient model designed for agentic coding. It offers visible reasoning traces, strong steerability, and supports a 256k context window.
|
|
|
poe
|
- |
bagoodex-web-search
|
- |
- |
Bagoodex delivers real-time AI-powered web search offering instant access to videos, images, weather, and more. Audio and video uploads are not supported at this time.
|
|
|
poe
|
- |
deep-ai-search
|
- |
- |
Deep search engine integrating Brave AI with real-time web search. This chatbot executes commands and scrapes websites at scale while preserving its hallmark intelligence advantage. The bot doesn't accept file attachments.
Examples:
https://poe.com/s/P0BQmsvbE7zusdY0n49l
https://poe.com/s/QgQSPsLD9efQrIwbmwuO
|
|
|
poe
|
- |
kling-avatar-pro
|
- |
- |
Create lifelike avatar videos featuring realistic humans, animals, cartoons, or stylized characters. Simply upload an image and an audio file to generate a video of your character speaking.
Supported file formats:
Images: JPEG, PNG, WEBP
Audio: MP3, WAV
|
|
|
poe
|
- |
playai-dialog
|
- |
- |
Generates dialogues based on your script using PlayHT's text-to-speech model, in the voices of your choice. Use --speaker_1 [voice_name] and --speaker_2 [voice_name] to pass in the voices of your choice, choosing from below. Voices default to `Jennifer_(English_(US)/American)`. Follow the format below while prompting (case sensitive):
FORMAT:
```
Speaker 1: ......
Speaker 2: ......
Speaker 1: ......
Speaker 2: ......
--speaker_1 [voice_1] --speaker_2 [voice_2]
```
VOICES AVAILABLE:
Jennifer_(English_(US)/American)
Dexter_(English_(US)/American)
Ava_(English_(AU)/Australian)
Tilly_(English_(AU)/Australian)
Charlotte_(Advertising)_(English_(CA)/Canadian)
Charlotte_(Meditation)_(English_(CA)/Canadian)
Cecil_(English_(GB)/British)
Sterling_(English_(GB)/British)
Cillian_(English_(IE)/Irish)
Madison_(English_(IE)/Irish)
Ada_(English_(ZA)/South_African)
Furio_(English_(IT)/Italian)
Alessandro_(English_(IT)/Italian)
Carmen_(English_(MX)/Mexican)
Sumita_(English_(IN)/Indian)
Navya_(English_(IN)/Indian)
Baptiste_(English_(FR)/French)
Lumi_(English_(FI)/Finnish)
Ronel_Conversational_(Afrikaans/South_African)
Ronel_Narrative_(Afrikaans/South_African)
Abdo_Conversational_(Arabic/Arabic)
Abdo_Narrative_(Arabic/Arabic)
Mousmi_Conversational_(Bengali/Bengali)
Mousmi_Narrative_(Bengali/Bengali)
Caroline_Conversational_(Portuguese_(BR)/Brazilian)
Caroline_Narrative_(Portuguese_(BR)/Brazilian)
Ange_Conversational_(French/French)
Ange_Narrative_(French/French)
Anke_Conversational_(German/German)
Anke_Narrative_(German/German)
Bora_Conversational_(Greek/Greek)
Bora_Narrative_(Greek/Greek)
Anuj_Conversational_(Hindi/Indian)
Anuj_Narrative_(Hindi/Indian)
Alessandro_Conversational_(Italian/Italian)
Alessandro_Narrative_(Italian/Italian)
Kiriko_Conversational_(Japanese/Japanese)
Kiriko_Narrative_(Japanese/Japanese)
Dohee_Conversational_(Korean/Korean)
Dohee_Narrative_(Korean/Korean)
Ignatius_Conversational_(Malay/Malay)
Ignatius_Narrative_(Malay/Malay)
Adam_Conversational_(Polish/Polish)
Adam_Narrative_(Polish/Polish)
Andrei_Conversational_(Russian/Russian)
Andrei_Narrative_(Russian/Russian)
Aleksa_Conversational_(Serbian/Serbian)
Aleksa_Narrative_(Serbian/Serbian)
Carmen_Conversational_(Spanish/Spanish)
Patricia_Conversational_(Spanish/Spanish)
Aiken_Conversational_(Tagalog/Filipino)
Aiken_Narrative_(Tagalog/Filipino)
Katbundit_Conversational_(Thai/Thai)
Katbundit_Narrative_(Thai/Thai)
Ali_Conversational_(Turkish/Turkish)
Ali_Narrative_(Turkish/Turkish)
Sahil_Conversational_(Urdu/Pakistani)
Sahil_Narrative_(Urdu/Pakistani)
Mary_Conversational_(Hebrew/Israeli)
Mary_Narrative_(Hebrew/Israeli)
Prompt input cannot exceed 10,000 characters.
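A filled-in example following the format above (dialogue content illustrative, voices from the list):
```
Speaker 1: Did you catch the launch this morning?
Speaker 2: I did! The landing was flawless.
--speaker_1 Dexter_(English_(US)/American) --speaker_2 Ava_(English_(AU)/Australian)
```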
|
|
|
poe
|
- |
luma-photon
|
- |
- |
Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards - not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
|
|
|
poe
|
- |
ideogram
|
45,000.00 |
- |
Excels at creating high-quality images from text prompts. For most prompts, https://poe.com/Ideogram-v2 will produce better results. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, & 1:1.
|
|
|
poe
|
- |
seededit-3.0
|
- |
- |
SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels at accurately following editing instructions and effectively preserving image content, especially when handling real images. Please send an image with a prompt to edit the image.
|
|
|
poe
|
- |
kling-2.1-pro
|
- |
- |
Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`. Requires an image attachment.
|
|
|
poe
|
- |
kling-2.1-std
|
- |
- |
Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`.
|
|
|
poe
|
- |
runway-gen-4-turbo
|
- |
- |
Runway's Gen-4 Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 1:1, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds. Full prompting guide here: https://help.runwayml.com/hc/en-us/articles/39789879462419-Gen-4-Video-Prompting-Guide
|
|
|
poe
|
- |
runway
|
- |
- |
Runway's Gen-3 Alpha Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds.
|
|
|
poe
|
- |
veo-2
|
- |
- |
Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver in 8-second clips. Use `--aspect_ratio` (16:9 or 9:16) to customize video aspect ratio. Supports text-to-video as well as image-to-video. Non-English input will be translated first. Note: currently has a low rate limit, so you may need to retry your request at times of peak usage.
|
|
|
poe
|
- |
dream-machine
|
360,000.00 |
- |
Luma AI's Dream Machine is an AI model that makes high-quality, realistic videos fast from text and images. Iterate at the speed of thought, create action-packed shots, and dream worlds with consistent characters on Poe today!
To specify the aspect ratio of your video add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21). To loop your video add --loop True.
|
|
|
poe
|
- |
kling-2.0-master
|
- |
- |
Generate high-quality videos from text or images using Kling 2.0 Master. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either 5 or 10 second video.
|
|
|
poe
|
- |
qwen-edit
|
- |
- |
Image editing model based on Qwen-Image, with superior text editing capabilities.
|
|
|
poe
|
- |
gptzero
|
- |
- |
GPTZero is a deep-learning-driven platform designed to analyze and flag portions of text that are likely generated by AI vs. human authors. It distinguishes between “entirely human,” “entirely AI,” or “mixed” content and highlights the specific sentences involved.
Max number of files that can be submitted simultaneously is 50, and the max file size for all files combined is 15 MB. Each file's document will be truncated to 50,000 characters.
Supported file types: PDF, DOC/DOCX, TXT, ODT
Parameter controls available:
1. Detection Options
- Multilingual (FR/ES):
- `--multilingual true` (Enables the GPTZero multilingual model)
- `--multilingual false` (Default/Disabled)
- Model Version:
- `--modelVersion [version_string]` (Selects a specific GPTZero model version, e.g., '2025-10-30-base')
- `--modelVersion __latest__` (Default: Automatically uses the latest model version)
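For example, attaching documents with the following flags (values illustrative) enables the multilingual model on the latest version:
```
--multilingual true --modelVersion __latest__
```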
|
|
|
poe
|
- |
kling-pro-effects
|
- |
- |
Generate videos with effects like squishing an object, two people hugging, making heart gestures, etc. using Kling-Pro-Effects. Requires an image input. Send a single image for the `squish` and `expansion` effects and two images (of people) for the `hug`, `kiss`, and `heart_gesture` effects. Set the effect with --effect (default effect: `squish`). Set the duration with `--duration` (either 5s or 10s; defaults to 5s).
|
|
|
poe
|
- |
hailuo-live
|
- |
- |
Hailuo Live, the latest model from Minimax, sets a new standard for bringing still images to life. From breathtakingly vivid motion to finely tuned expressions, this state-of-the-art model enables your characters to captivate, move, and shine like never before. It excels in bringing art and drawings to life, exceptional realism without morphing, emotional range, and unparalleled character consistency. Generates a 5 second video.
|
|
|
poe
|
- |
hailuo-ai
|
- |
- |
Best-in-class text and image to video model by MiniMax.
|
|
|
poe
|
- |
ray2
|
- |
- |
Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion. It has a strong understanding of text instructions and can also take image input. It can produce videos from 540p to 4K resolution with durations of either 5 or 9 seconds.
|
|
|
poe
|
- |
veo-2-video
|
- |
- |
Veo 2 is Google's cutting-edge video generation model. It creates videos with realistic motion and high-quality output.
|
|
|
poe
|
- |
wan-2.1
|
- |
- |
Wan-2.1 is a text-to-video and image-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Generates 5 second video.
|
|
|
poe
|
- |
ideogram-v2a-turbo
|
24,000.00 |
- |
Fast, affordable text-to-image model, optimized for graphic design and photography. For higher quality, use https://poe.com/Ideogram-v2A
Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
|
|
|
poe
|
- |
ideogram-v2a
|
39,000.00 |
- |
Fast, affordable text-to-image model, optimized for graphic design and photography. For faster and more cost-effective generations, use https://poe.com/Ideogram-v2A-Turbo
Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
|
|
|
poe
|
- |
trellis-3d
|
- |
- |
Generate 3D models from your images using Trellis, a native 3D generative model enabling versatile and high-quality 3D asset creation. Send an image to convert it into a 3D model.
|
|
|
poe
|
- |
flux-dev-finetuner
|
- |
- |
Fine-tune the FLUX dev model with your own pictures! Upload 8-12 of them (same subject, only one subject in the picture, ideally from different poses and backgrounds) and wait ~2-5 minutes to create your own finetuned bot that will generate pictures of this subject in whatever setting you want.
|
|
|
poe
|
- |
flux-inpaint
|
- |
- |
Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
|
|
|
poe
|
- |
flux-fill
|
- |
- |
Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
|
|
|
poe
|
- |
bria-eraser
|
- |
- |
Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Send an image and a black-and-white mask image denoting the objects to be cleared out from the image. The input prompt is only used to create the filename of the output image.
|
|
|
poe
|
- |
aya-vision
|
30.00 |
- |
Aya Vision is a 32B open-weights multimodal model with advanced capabilities optimized for a variety of vision-language use cases. It is trained to excel in 23 languages in both vision and text: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
|
|
|
poe
|
- |
kling-1.5-pro
|
- |
- |
Kling v1.5 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
|
|
|
poe
|
- |
deepreasoning
|
- |
- |
DeepReasoning (previously DeepClaude) is a high-performance LLM inference service that combines DeepSeek R1's Chain of Thought (CoT) reasoning capabilities with Anthropic Claude's creative and code generation prowess. It provides a unified interface for leveraging the strengths of both models while maintaining complete control over your data. Learn more: https://deepclaude.com/
|
|
|
poe
|
- |
gemma-3-27b
|
- |
- |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, the successor to Gemma 2.
|
|
|
poe
|
- |
qwen3-32b-cs
|
3,600.00 |
- |
World’s fastest inference for Qwen 3 32B with Cerebras.
Append /no_think to your prompt to disable the model's default reasoning behavior.
|
|
|
poe
|
- |
qwen-2.5-vl-32b
|
6,600.00 |
- |
Qwen2.5-VL-32B's mathematical and problem-solving capabilities have been strengthened through reinforcement learning, leading to a significantly improved user experience. The model's response styles have been refined to better align with human preferences, particularly for objective queries involving mathematics, logical reasoning, and knowledge-based Q&A. As a result, responses now feature greater detail, improved clarity, and enhanced formatting.
|
|
|
poe
|
- |
qwen2.5-vl-72b-t
|
8,700.00 |
- |
Qwen 2.5 VL 72B, a cutting-edge multimodal model from the Qwen Team, excels in visual and video understanding, multilingual text/image processing (including Japanese, Arabic, and Korean), and dynamic agentic reasoning for automation. It supports long-context comprehension (32K tokens).
|
|
|
poe
|
- |
mistral-small-3
|
0.10 |
0.30 |
Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks: those that require robust language and instruction-following performance with very low latency. Released under an Apache 2.0 license and comparable to Llama-3.3-70B and Qwen2.5-32B-Instruct.
|
|
|
poe
|
- |
deepseek-v3-di
|
4,300.00 |
- |
Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company.
Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
|
|
|
poe
|
- |
deepseek-v3-turbo-di
|
5,900.00 |
- |
Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. Turbo variant is quantized to achieve higher speeds. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company.
Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
|
|
|
poe
|
- |
phi-4-di
|
300.00 |
- |
Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.
At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.
All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company.
Supports 16k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
|
|
|
poe
|
- |
mistral-7b-v0.3-di
|
150.00 |
- |
Mistral Instruct 7B v0.3 from Mistral AI.
All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company.
Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
|
|
|
poe
|
- |
aya-expanse-32b
|
5,100.00 |
- |
Aya Expanse is a 32B open-weight research release of a model with highly advanced multilingual capabilities. Aya supports state-of-the-art generative capabilities in 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
|
|
|
poe
|
- |
liveportrait
|
- |
- |
Animates a given portrait using the motions in a reference video. Powered by fal.ai
|
|
|
poe
|
- |
llama-3.1-8b-t-128k
|
3,000.00 |
- |
Llama 3.1 8B Instruct from Meta. Supports 128k tokens of context.
The points price is subject to change.
|
|
|
poe
|
- |
stablediffusion3-2b
|
- |
- |
Stable Diffusion v3 Medium - by fal.ai
|
|
|
poe
|
- |
mixtral8x22b-inst-fw
|
3,600.00 |
- |
Mixtral 8x22B Mixture-of-Experts instruct model from Mistral hosted by Fireworks.
|
|
|
poe
|
- |
command-r
|
5,100.00 |
- |
I can search the web for up-to-date information and respond in over 10 languages!
|
|
|
poe
|
- |
mistral-large-2
|
3.00 |
9.00 |
Mistral's latest text generation model (Mistral-Large-2407) with top-tier reasoning capabilities. It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. This bot has the full 128k context window supported by the model.
|
|
|
poe
|
- |
dall-e-3
|
45,000.00 |
- |
OpenAI's most powerful image generation model. Generates high quality images with intricate details based on the user's most recent prompt. For most prompts, https://poe.com/FLUX-pro-1.1-ultra or https://poe.com/FLUX-dev or https://poe.com/Imagen3 will produce better results. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 1:1, 7:4, & 4:7.
|
|
|
poe
|
- |
reka-core
|
- |
- |
Reka's largest and most capable multimodal language model. Works with text, images, and video inputs. 8k context length.
|
|
|
poe
|
- |
reka-flash
|
- |
- |
Reka's efficient and capable 21B multimodal model optimized for fast workloads and amazing quality. Works with text, images and video inputs.
|
|
|
poe
|
- |
command-r-plus
|
5,100.00 |
- |
A supercharged version of Command R. I can search the web for up-to-date information and respond in over 10 languages!
|
|
|
poe
|
- |
claude-sonnet-3.5-june
|
2.60 |
13.00 |
Anthropic's legacy Sonnet 3.5 model, specifically the June 2024 snapshot (for the latest, please use https://poe.com/Claude-Sonnet-3.5). Excels in complex tasks like coding, writing, analysis and visual processing; it is generally more verbose than the more concise October 2024 snapshot.
|
|
|
poe
|
- |
gpt-3.5-turbo
|
0.45 |
1.40 |
OpenAI’s GPT 3.5 Turbo model is a powerful language generation system designed to provide highly coherent, contextually relevant, and detailed responses. Supports 16,384 tokens of context.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
sketch-to-image
|
- |
- |
Takes in sketches and converts them to colored images.
|
|
|
poe
|
- |
qwen2.5-coder-32b
|
1,500.00 |
- |
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen), developed by Alibaba.
|
|
|
poe
|
- |
stablediffusion3.5-t
|
- |
- |
Faster version of Stable Diffusion 3 Large, hosted by @fal. Excels at fast image generation. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1).
|
|
|
poe
|
- |
flux-pro-1.1-t
|
30,000.00 |
- |
The state-of-the-art image model from BFL. FLUX 1.1 Pro generates images six times faster than its predecessor, FLUX 1 Pro, while also improving image quality, prompt adherence, and output diversity. The bot does not support any attachments.
|
|
|
poe
|
- |
flux-schnell-t
|
2,100.00 |
- |
Lightning-fast AI image generation model that excels in producing high-quality visuals in just seconds. Great for quick prototyping or real-time use cases. This is the fastest version of FLUX.1.
The bot does not support any attachments.
|
|
|
poe
|
- |
recraft-v3
|
- |
- |
Recraft V3, state of the art image generation. Prompt input cannot exceed 1,000 characters.
Use --style to select a style and --aspect to set the aspect ratio (16:9, 4:3, 1:1, 3:4, 9:16); e.g. "a fox in the forest --style digital_illustration/pixel_art --aspect 16:9".
Available styles: realistic_image, digital_illustration, vector_illustration, realistic_image/b_and_w, realistic_image/hard_flash, realistic_image/hdr, realistic_image/natural_light, realistic_image/studio_portrait, realistic_image/enterprise, realistic_image/motion_blur, digital_illustration/pixel_art, digital_illustration/hand_drawn, digital_illustration/grain, digital_illustration/infantile_sketch, digital_illustration/2d_art_poster, digital_illustration/handmade_3d, digital_illustration/hand_drawn_outline, digital_illustration/engraving_color, digital_illustration/2d_art_poster_2, vector_illustration/engraving, vector_illustration/line_art, vector_illustration/line_circuit, vector_illustration/linocut
|
|
|
poe
|
- |
llama-3-70b-t
|
2,300.00 |
- |
Llama 3 70B Instruct from Meta. For most use cases, https://poe.com/Llama-3.3-70B will perform better.
|
|
|
poe
|
- |
gpt-4o-aug
|
2.20 |
9.00 |
OpenAI's most powerful model, GPT-4o, using the August 2024 model snapshot. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
gpt-4-classic-0314
|
27.00 |
54.00 |
OpenAI's GPT-4 model. Powered by gpt-4-0314 (non-Turbo) for text input and gpt-4o for image input. For most use cases, https://poe.com/GPT-4o will perform significantly better.
|
|
|
poe
|
- |
gpt-4-classic
|
27.00 |
54.00 |
OpenAI's GPT-4 model. Powered by gpt-4-0613 (non-Turbo) for text input and gpt-4o for image input.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
solar-pro-2
|
2,100.00 |
- |
Solar Pro 2 is Upstage's latest frontier-scale LLM. With just 31B parameters, it delivers top-tier performance through world-class multilingual support, advanced reasoning, and real-world tool use. Especially in Korean, it outperforms much larger models across critical benchmarks. Built for the next generation of practical LLMs, Solar Pro 2 proves that smaller models can still lead. Supports a context length of 64k tokens.
|
|
|
poe
|
- |
remove-background
|
- |
- |
Remove background from your images
|
|
|
poe
|
- |
sana-t2i
|
- |
- |
SANA can synthesize high-resolution, high-quality images at a remarkably fast rate, with the ability to generate 4K images in less than a second.
Optional parameters: aspect ratio, with options 16:9, 4:3, 1:1, 3:4, and 9:16 (default: 4:3).
|
|
|
poe
|
- |
mistral-7b-v0.3-t
|
1,400.00 |
- |
Mistral Instruct 7B v0.3 from Mistral AI.
The points price is subject to change.
|
|
|
poe
|
- |
tako
|
30,000.00 |
- |
Tako is a bot that transforms your questions about stocks, sports, economics or politics into interactive, shareable knowledge cards from trusted sources. Tako's knowledge graph is built exclusively from authoritative, real-time data providers, and is embeddable in your apps, research and storytelling. You can adjust the specificity threshold by typing `--specificity 30` (or any value between 0 and 100) at the end of your query/question; the default is 60.
|
|
|
poe
|
- |
llama-3.1-405b-fp16
|
62,000.00 |
- |
The biggest and best open-source AI model trained by Meta, beating GPT-4o across most benchmarks. This bot runs in BF16 with a 128K context length.
|
|
|
poe
|
- |
llama-3.1-8b-fp16
|
1,500.00 |
- |
The smallest and fastest member of the Llama 3.1 family, offering exceptional efficiency and rapid response times with 128K context length.
|
|
|
poe
|
- |
llama-3.1-70b-fp16
|
6,000.00 |
- |
The best LLM at its size, offering faster response times than the 405B model and a 128K context length.
|
|
|
poe
|
- |
llama-3-70b-fp16
|
6,000.00 |
- |
A highly efficient and powerful model designed for a variety of tasks, with a 128K context length.
|
|
|
poe
|
- |
restyler
|
- |
- |
This bot enables rapid transformation of existing images, delivering high-quality style transfers and image modifications. Takes in a text input and an image attachment. Use --strength to control the guidance given by the initial image, with higher values adhering to the image more strongly.
|
|
|
poe
|
- |
stablediffusionxl
|
3,600.00 |
- |
Generates high quality images based on the user's most recent prompt.
Allows users to specify elements to avoid in the image using the "--no" parameter at the end of the prompt. Select an aspect ratio with "--aspect". (e.g. "Tall trees, daylight --no rain --aspect 7:4"). Valid aspect ratios are 1:1, 7:4, 4:7, 9:7, 7:9, 19:13, 13:19, 12:5, & 5:12.
Powered by Stable Diffusion XL.
|
|
|
poe
|
- |
qwen-2.5-7b-t
|
2,300.00 |
- |
Qwen 2.5 7B from Alibaba. Excels in coding, math, instruction following, natural language understanding, and offers great multilingual support spanning more than 29 languages.
|
|
|
poe
|
- |
qwen-2.5-72b-t
|
9,000.00 |
- |
Qwen 2.5 72B from Alibaba. Excels in coding, math, instruction following, natural language understanding, and offers great multilingual support spanning more than 29 languages.
It delivers results on par with Llama-3-405B despite using only one-fifth of the parameters.
|
|
|
poe
|
- |
python
|
30.00 |
- |
Executes Python code (version 3.11) from the user message and outputs the results. If there are code blocks in the user message (surrounded by triple backticks), then only the code blocks will be executed. These libraries are imported into this bot's run-time automatically -- numpy, pandas, requests, matplotlib, scikit-learn, torch, PyYAML, tensorflow, scipy, pytest -- along with ~150 of the most widely used Python libraries.
|
|
|
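Because only fenced code blocks are executed when present, a message to this bot can mix prose and code. A short example of the kind of block it would run, using two of the preloaded libraries:

```python
# Example of a fenced block this bot would execute; only the code runs,
# any prose around it is ignored. numpy and pandas are preloaded.
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})
print(df.describe())
```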
poe
|
- |
markitdown
|
- |
- |
Convert anything to Markdown: URLs, PDFs, Word, Excel, PowerPoint, images (EXIF metadata), audio (EXIF metadata and transcription), and more. This bot wraps Microsoft’s MarkItDown MCP server (https://github.com/microsoft/markitdown).
|
|
|
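For offline use, the same conversion is available through Microsoft's open-source `markitdown` Python package, which the bot wraps. A minimal sketch; the input filename is illustrative:

```python
# Minimal sketch with Microsoft's open-source markitdown package
# (pip install markitdown); "report.pdf" is an illustrative path.
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")  # URLs and Office documents also work
print(result.text_content)         # the extracted Markdown
```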
poe
|
- |
gpt-4-turbo
|
9.00 |
27.00 |
Powered by OpenAI's GPT-4 Turbo with Vision. For most tasks, https://poe.com/GPT-4o will perform better. Supports 128k tokens of context. Requests with images will be routed to @GPT-4o.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
flux-1-schnell-fw
|
1,000.00 |
- |
FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
Key Features
1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.
2. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps.
3. Released under the apache-2.0 licence, the model can be used for personal, scientific, and commercial purposes.
|
|
|
poe
|
- |
flux-1-dev-fw
|
11,000.00 |
- |
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
Key Features
1. Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro].
2. Competitive prompt following, matching the performance of closed source alternatives.
3. Trained using guidance distillation, making FLUX.1 [dev] more efficient.
4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.
5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
|
|
|
poe
|
- |
mochi-preview
|
- |
- |
Open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. Supports both text-to-video and image-to-video. Generates a 5-second video.
|
|
|
poe
|
- |
gpt-3.5-turbo-instruct
|
1.40 |
1.80 |
Powered by gpt-3.5-turbo-instruct.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
gpt-3.5-turbo-raw
|
0.45 |
1.40 |
Powered by gpt-3.5-turbo without a system prompt.
Check out the newest version of this bot here: https://poe.com/GPT-5.
|
|
|
poe
|
- |
interpreter
|
- |
- |
Interpreter for Poe scripting in Python.
|
|
|
poe
|
- |
claude-haiku-3
|
0.21 |
1.10 |
Anthropic's Claude Haiku 3 outperforms models in its intelligence category on performance, speed and cost without the need for specialized fine-tuning. The compute points value is subject to change. For most use cases, https://poe.com/Claude-Haiku-3.5 will be better.
|
|
|
poe
|
- |
code-saver
|
- |
- |
A system bot that handles Poe scripts in chat.
|
|
|
poe
|
- |
code-editor
|
- |
- |
Official code editor for Poe Scripting using Python, used to connect multiple Poe bots and create AI workflows. Guide and tips: https://creator.poe.com/docs/script-bots/poe-python-reference
|
|
|
moonshotaicn
|
Kimi K2 Thinking Turbo |
kimi-k2-thinking-turbo
|
1.15 |
8.00 |
Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
|
|
|
moonshotaicn
|
Kimi K2 Thinking |
kimi-k2-thinking
|
0.60 |
2.50 |
Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
|
|
|
moonshotaicn
|
Kimi K2 0905 |
kimi-k2-0905-preview
|
0.60 |
2.50 |
Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
|
|
|
moonshotaicn
|
Kimi K2 0711 |
kimi-k2-0711-preview
|
0.60 |
2.50 |
Provider: Moonshot AI (China), Context: 131072, Output Limit: 16384
|
|
|
moonshotaicn
|
Kimi K2 Turbo |
kimi-k2-turbo-preview
|
2.40 |
10.00 |
Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
|
|
|
lucidquery
|
LucidQuery Nexus Coder |
lucidquery-nexus-coder
|
2.00 |
5.00 |
Provider: LucidQuery AI, Context: 250000, Output Limit: 60000
|
|
|
lucidquery
|
LucidNova RF1 100B |
lucidnova-rf1-100b
|
2.00 |
5.00 |
Provider: LucidQuery AI, Context: 120000, Output Limit: 8000
|
|
|
moonshotai
|
Kimi K2 Thinking Turbo |
kimi-k2-thinking-turbo
|
1.15 |
8.00 |
Provider: Moonshot AI, Context: 262144, Output Limit: 262144
|
|
|
moonshotai
|
Kimi K2 Turbo |
kimi-k2-turbo-preview
|
2.40 |
10.00 |
Provider: Moonshot AI, Context: 262144, Output Limit: 262144
|
|
|
moonshotai
|
Kimi K2 0711 |
kimi-k2-0711-preview
|
0.60 |
2.50 |
Provider: Moonshot AI, Context: 131072, Output Limit: 16384
|
|
|
moonshotai
|
Kimi K2 Thinking |
kimi-k2-thinking
|
0.60 |
2.50 |
Provider: Moonshot AI, Context: 262144, Output Limit: 262144
|
|
|
moonshotai
|
Kimi K2 0905 |
kimi-k2-0905-preview
|
0.60 |
2.50 |
Provider: Moonshot AI, Context: 262144, Output Limit: 262144
|
|
|
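The Moonshot AI rows above give the ids to pass to the platform's chat API, which follows the OpenAI chat-completions protocol. A minimal sketch; the base URL is an assumption drawn from Moonshot's public docs, so verify it before use:

```python
# Minimal sketch against Moonshot AI's OpenAI-compatible API.
# The base URL is an assumption; confirm it in Moonshot's documentation.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed international endpoint
    api_key=os.environ["MOONSHOT_API_KEY"],
)
resp = client.chat.completions.create(
    model="kimi-k2-0905-preview",  # id taken from the table above
    messages=[{"role": "user", "content": "Hello, Kimi."}],
)
print(resp.choices[0].message.content)
```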
ollamacloud
|
Kimi K2 Thinking |
kimi-k2-thinking:cloud
|
- |
- |
Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
|
|
|
ollamacloud
|
Qwen3-VL 235B Instruct |
qwen3-vl-235b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
Qwen3 Coder 480B |
qwen3-coder:480b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
GPT-OSS 120B |
gpt-oss:120b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
DeepSeek-V3.1 671B |
deepseek-v3.1:671b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
|
|
|
ollamacloud
|
GLM-4.6 |
glm-4.6:cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
Cogito 2.1 671B |
cogito-2.1:671b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
|
|
|
ollamacloud
|
GPT-OSS 20B |
gpt-oss:20b-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
Qwen3-VL 235B Instruct |
qwen3-vl-235b-instruct-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
Kimi K2 |
kimi-k2:1t-cloud
|
- |
- |
Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
|
|
|
ollamacloud
|
MiniMax M2 |
minimax-m2:cloud
|
- |
- |
Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
|
|
|
ollamacloud
|
Gemini 3 Pro Preview |
gemini-3-pro-preview:latest
|
- |
- |
Provider: Ollama Cloud, Context: 1000000, Output Limit: 64000
|
|
|
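The `:cloud`/`-cloud` tags on the ids above mark models served from Ollama's hosted backend rather than a local runtime. A minimal sketch with the official `ollama` Python client, assuming the local Ollama install is signed in to Ollama Cloud:

```python
# Minimal sketch with the official ollama Python client (pip install ollama).
# The ":cloud" tag routes the call to Ollama's hosted backend; this assumes
# the local ollama CLI is installed and signed in to Ollama Cloud.
import ollama

resp = ollama.chat(
    model="minimax-m2:cloud",  # id taken from the table above
    messages=[{"role": "user", "content": "One-line summary of MoE routing?"}],
)
print(resp["message"]["content"])
```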
xiaomi
|
MiMo-V2-Flash |
mimo-v2-flash
|
0.07 |
0.21 |
Provider: Xiaomi, Context: 256000, Output Limit: 32000
|
|
|
alibaba
|
Qwen3-LiveTranslate Flash Realtime |
qwen3-livetranslate-flash-realtime
|
10.00 |
10.00 |
Provider: Alibaba, Context: 53248, Output Limit: 4096
|
|
|
alibaba
|
Qwen3-ASR Flash |
qwen3-asr-flash
|
0.04 |
0.04 |
Provider: Alibaba, Context: 53248, Output Limit: 4096
|
|
|
alibaba
|
Qwen-Omni Turbo |
qwen-omni-turbo
|
0.07 |
0.27 |
Provider: Alibaba, Context: 32768, Output Limit: 2048
|
|
|
alibaba
|
Qwen-VL Max |
qwen-vl-max
|
0.80 |
3.20 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-Next 80B-A3B Instruct |
qwen3-next-80b-a3b-instruct
|
0.50 |
2.00 |
Provider: Alibaba, Context: 131072, Output Limit: 32768
|
|
|
alibaba
|
Qwen Turbo |
qwen-turbo
|
0.05 |
0.20 |
Provider: Alibaba, Context: 1000000, Output Limit: 16384
|
|
|
alibaba
|
Qwen3-VL 235B-A22B |
qwen3-vl-235b-a22b
|
0.70 |
2.80 |
Provider: Alibaba, Context: 131072, Output Limit: 32768
|
|
|
alibaba
|
Qwen3 Coder Flash |
qwen3-coder-flash
|
0.30 |
1.50 |
Provider: Alibaba, Context: 1000000, Output Limit: 65536
|
|
|
alibaba
|
Qwen3-VL 30B-A3B |
qwen3-vl-30b-a3b
|
0.20 |
0.80 |
Provider: Alibaba, Context: 131072, Output Limit: 32768
|
|
|
alibaba
|
Qwen3 14B |
qwen3-14b
|
0.35 |
1.40 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
QVQ Max |
qvq-max
|
1.20 |
4.80 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen Plus Character (Japanese) |
qwen-plus-character-ja
|
0.50 |
1.40 |
Provider: Alibaba, Context: 8192, Output Limit: 512
|
|
|
alibaba
|
Qwen2.5 14B Instruct |
qwen2-5-14b-instruct
|
0.35 |
1.40 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
QwQ Plus |
qwq-plus
|
0.80 |
2.40 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-Coder 30B-A3B Instruct |
qwen3-coder-30b-a3b-instruct
|
0.45 |
2.25 |
Provider: Alibaba, Context: 262144, Output Limit: 65536
|
|
|
alibaba
|
Qwen-VL OCR |
qwen-vl-ocr
|
0.72 |
0.72 |
Provider: Alibaba, Context: 34096, Output Limit: 4096
|
|
|
alibaba
|
Qwen2.5 72B Instruct |
qwen2-5-72b-instruct
|
1.40 |
5.60 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-Omni Flash |
qwen3-omni-flash
|
0.43 |
1.66 |
Provider: Alibaba, Context: 65536, Output Limit: 16384
|
|
|
alibaba
|
Qwen Flash |
qwen-flash
|
0.05 |
0.40 |
Provider: Alibaba, Context: 1000000, Output Limit: 32768
|
|
|
alibaba
|
Qwen3 8B |
qwen3-8b
|
0.18 |
0.70 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-Omni Flash Realtime |
qwen3-omni-flash-realtime
|
0.52 |
1.99 |
Provider: Alibaba, Context: 65536, Output Limit: 16384
|
|
|
alibaba
|
Qwen2.5-VL 72B Instruct |
qwen2-5-vl-72b-instruct
|
2.80 |
8.40 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-VL Plus |
qwen3-vl-plus
|
0.20 |
1.60 |
Provider: Alibaba, Context: 262144, Output Limit: 32768
|
|
|
alibaba
|
Qwen Plus |
qwen-plus
|
0.40 |
1.20 |
Provider: Alibaba, Context: 1000000, Output Limit: 32768
|
|
|
alibaba
|
Qwen2.5 32B Instruct |
qwen2-5-32b-instruct
|
0.70 |
2.80 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen2.5-Omni 7B |
qwen2-5-omni-7b
|
0.10 |
0.40 |
Provider: Alibaba, Context: 32768, Output Limit: 2048
|
|
|
alibaba
|
Qwen Max |
qwen-max
|
1.60 |
6.40 |
Provider: Alibaba, Context: 32768, Output Limit: 8192
|
|
|
alibaba
|
Qwen2.5 7B Instruct |
qwen2-5-7b-instruct
|
0.18 |
0.70 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen2.5-VL 7B Instruct |
qwen2-5-vl-7b-instruct
|
0.35 |
1.05 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
alibaba
|
Qwen3 235B-A22B |
qwen3-235b-a22b
|
0.70 |
2.80 |
Provider: Alibaba, Context: 131072, Output Limit: 16384
|
|
|
alibaba
|
Qwen-Omni Turbo Realtime |
qwen-omni-turbo-realtime
|
0.27 |
1.07 |
Provider: Alibaba, Context: 32768, Output Limit: 2048
|
|
|
alibaba
|
Qwen-MT Turbo |
qwen-mt-turbo
|
0.16 |
0.49 |
Provider: Alibaba, Context: 16384, Output Limit: 8192
|
|
|
alibaba
|
Qwen3-Coder 480B-A35B Instruct |
qwen3-coder-480b-a35b-instruct
|
1.50 |
7.50 |
Provider: Alibaba, Context: 262144, Output Limit: 65536
|
|
|
alibaba
|
Qwen-MT Plus |
qwen-mt-plus
|
2.46 |
7.37 |
Provider: Alibaba, Context: 16384, Output Limit: 8192
|
|
|
alibaba
|
Qwen3 Max |
qwen3-max
|
1.20 |
6.00 |
Provider: Alibaba, Context: 262144, Output Limit: 65536
|
|
|
alibaba
|
Qwen3 Coder Plus |
qwen3-coder-plus
|
1.00 |
5.00 |
Provider: Alibaba, Context: 1048576, Output Limit: 65536
|
|
|
alibaba
|
Qwen3-Next 80B-A3B (Thinking) |
qwen3-next-80b-a3b-thinking
|
0.50 |
6.00 |
Provider: Alibaba, Context: 131072, Output Limit: 32768
|
|
|
alibaba
|
Qwen3 32B |
qwen3-32b
|
0.70 |
2.80 |
Provider: Alibaba, Context: 131072, Output Limit: 16384
|
|
|
alibaba
|
Qwen-VL Plus |
qwen-vl-plus
|
0.21 |
0.63 |
Provider: Alibaba, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 4 Fast (Non-Reasoning) |
grok-4-fast-non-reasoning
|
0.20 |
0.50 |
Provider: xAI, Context: 2000000, Output Limit: 30000
|
|
|
xai
|
Grok 3 Fast |
grok-3-fast
|
5.00 |
25.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 4 |
grok-4
|
3.00 |
15.00 |
Provider: xAI, Context: 256000, Output Limit: 64000
|
|
|
xai
|
Grok 2 Vision |
grok-2-vision
|
2.00 |
10.00 |
Provider: xAI, Context: 8192, Output Limit: 4096
|
|
|
xai
|
Grok Code Fast 1 |
grok-code-fast-1
|
0.20 |
1.50 |
Provider: xAI, Context: 256000, Output Limit: 10000
|
|
|
xai
|
Grok 2 |
grok-2
|
2.00 |
10.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 3 Mini Fast Latest |
grok-3-mini-fast-latest
|
0.60 |
4.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 2 Vision (1212) |
grok-2-vision-1212
|
2.00 |
10.00 |
Provider: xAI, Context: 8192, Output Limit: 4096
|
|
|
xai
|
Grok 3 |
grok-3
|
3.00 |
15.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 4 Fast |
grok-4-fast
|
0.20 |
0.50 |
Provider: xAI, Context: 2000000, Output Limit: 30000
|
|
|
xai
|
Grok 2 Latest |
grok-2-latest
|
2.00 |
10.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 4.1 Fast |
grok-4-1-fast
|
0.20 |
0.50 |
Provider: xAI, Context: 2000000, Output Limit: 30000
|
|
|
xai
|
Grok 2 (1212) |
grok-2-1212
|
2.00 |
10.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 3 Fast Latest |
grok-3-fast-latest
|
5.00 |
25.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 3 Latest |
grok-3-latest
|
3.00 |
15.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 2 Vision Latest |
grok-2-vision-latest
|
2.00 |
10.00 |
Provider: xAI, Context: 8192, Output Limit: 4096
|
|
|
xai
|
Grok Vision Beta |
grok-vision-beta
|
5.00 |
15.00 |
Provider: xAI, Context: 8192, Output Limit: 4096
|
|
|
xai
|
Grok 3 Mini |
grok-3-mini
|
0.30 |
0.50 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok Beta |
grok-beta
|
5.00 |
15.00 |
Provider: xAI, Context: 131072, Output Limit: 4096
|
|
|
xai
|
Grok 3 Mini Latest |
grok-3-mini-latest
|
0.30 |
0.50 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
xai
|
Grok 4.1 Fast (Non-Reasoning) |
grok-4-1-fast-non-reasoning
|
0.20 |
0.50 |
Provider: xAI, Context: 2000000, Output Limit: 30000
|
|
|
xai
|
Grok 3 Mini Fast |
grok-3-mini-fast
|
0.60 |
4.00 |
Provider: xAI, Context: 131072, Output Limit: 8192
|
|
|
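All of the xAI ids above are usable through xAI's OpenAI-compatible endpoint; a minimal sketch, assuming an `XAI_API_KEY` in the environment:

```python
# Minimal sketch against xAI's OpenAI-compatible endpoint.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.environ["XAI_API_KEY"],
)
resp = client.chat.completions.create(
    model="grok-4-fast",  # id taken from the table above
    messages=[{"role": "user", "content": "Ping?"}],
)
print(resp.choices[0].message.content)
```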
vultr
|
DeepSeek R1 Distill Qwen 32B |
deepseek-r1-distill-qwen-32b
|
0.20 |
0.20 |
Provider: Vultr, Context: 121808, Output Limit: 8192
|
|
|
vultr
|
Qwen2.5 Coder 32B Instruct |
qwen2.5-coder-32b-instruct
|
0.20 |
0.20 |
Provider: Vultr, Context: 12952, Output Limit: 2048
|
|
|
vultr
|
Kimi K2 Instruct |
kimi-k2-instruct
|
0.20 |
0.20 |
Provider: Vultr, Context: 58904, Output Limit: 4096
|
|
|
vultr
|
DeepSeek R1 Distill Llama 70B |
deepseek-r1-distill-llama-70b
|
0.20 |
0.20 |
Provider: Vultr, Context: 121808, Output Limit: 8192
|
|
|
vultr
|
GPT OSS 120B |
gpt-oss-120b
|
0.20 |
0.20 |
Provider: Vultr, Context: 121808, Output Limit: 8192
|
|
|
nvidia
|
Kimi K2 0905 |
kimi-k2-instruct-0905
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 262144
|
|
|
nvidia
|
Kimi K2 Thinking |
kimi-k2-thinking
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 262144
|
|
|
nvidia
|
Kimi K2 Instruct |
kimi-k2-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 8192
|
|
|
nvidia
|
nvidia-nemotron-nano-9b-v2 |
nvidia-nemotron-nano-9b-v2
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 131072
|
|
|
nvidia
|
Cosmos Nemotron 34B |
cosmos-nemotron-34b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 8192
|
|
|
nvidia
|
Llama Embed Nemotron 8B |
llama-embed-nemotron-8b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 32768, Output Limit: 2048
|
|
|
nvidia
|
nemotron-3-nano-30b-a3b |
nemotron-3-nano-30b-a3b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 131072
|
|
|
nvidia
|
Parakeet TDT 0.6B v2 |
parakeet-tdt-0.6b-v2
|
0.00 |
0.00 |
Provider: Nvidia, Context: N/A, Output Limit: 4096
|
|
|
nvidia
|
NeMo Retriever OCR v1 |
nemoretriever-ocr-v1
|
0.00 |
0.00 |
Provider: Nvidia, Context: N/A, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.3 Nemotron Super 49b V1 |
llama-3.3-nemotron-super-49b-v1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.1 Nemotron 51b Instruct |
llama-3.1-nemotron-51b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama3 Chatqa 1.5 70b |
llama3-chatqa-1.5-70b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama-3.1-Nemotron-Ultra-253B-v1 |
llama-3.1-nemotron-ultra-253b-v1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 8192
|
|
|
nvidia
|
Llama 3.1 Nemotron 70b Instruct |
llama-3.1-nemotron-70b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Nemotron 4 340b Instruct |
nemotron-4-340b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.3 Nemotron Super 49b V1.5 |
llama-3.3-nemotron-super-49b-v1.5
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
MiniMax-M2 |
minimax-m2
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 16384
|
|
|
nvidia
|
Gemma 3n E2b It |
gemma-3n-e2b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Codegemma 1.1 7b |
codegemma-1.1-7b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma 3n E4b It |
gemma-3n-e4b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma 2 2b It |
gemma-2-2b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma 3 12b It |
gemma-3-12b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Codegemma 7b |
codegemma-7b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma 3 1b It |
gemma-3-1b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma 2 27b It |
gemma-2-27b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Gemma-3-27B-IT |
gemma-3-27b-it
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 8192
|
|
|
nvidia
|
Phi 3 Medium 128k Instruct |
phi-3-medium-128k-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Phi 3 Small 128k Instruct |
phi-3-small-128k-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Phi 3.5 Vision Instruct |
phi-3.5-vision-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Phi 3 Small 8k Instruct |
phi-3-small-8k-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 8000, Output Limit: 4096
|
|
|
nvidia
|
Phi 3.5 Moe Instruct |
phi-3.5-moe-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Phi-4-Mini |
phi-4-mini-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 8192
|
|
|
nvidia
|
Phi 3 Medium 4k Instruct |
phi-3-medium-4k-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 4000, Output Limit: 4096
|
|
|
nvidia
|
Phi 3 Vision 128k Instruct |
phi-3-vision-128k-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Whisper Large v3 |
whisper-large-v3
|
0.00 |
0.00 |
Provider: Nvidia, Context: N/A, Output Limit: 4096
|
|
|
nvidia
|
GPT-OSS-120B |
gpt-oss-120b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 8192
|
|
|
nvidia
|
Qwen3-Next-80B-A3B-Instruct |
qwen3-next-80b-a3b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 16384
|
|
|
nvidia
|
Qwen2.5 Coder 32b Instruct |
qwen2.5-coder-32b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Qwen2.5 Coder 7b Instruct |
qwen2.5-coder-7b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Qwen3-235B-A22B |
qwen3-235b-a22b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 131072, Output Limit: 8192
|
|
|
nvidia
|
Qwen3 Coder 480B A35B Instruct |
qwen3-coder-480b-a35b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 65536
|
|
|
nvidia
|
Qwq 32b |
qwq-32b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Qwen3-Next-80B-A3B-Thinking |
qwen3-next-80b-a3b-thinking
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 16384
|
|
|
nvidia
|
Devstral-2-123B-Instruct-2512 |
devstral-2-123b-instruct-2512
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 262144
|
|
|
nvidia
|
Mistral Large 3 675B Instruct 2512 |
mistral-large-3-675b-instruct-2512
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 262144
|
|
|
nvidia
|
Ministral 3 14B Instruct 2512 |
ministral-14b-instruct-2512
|
0.00 |
0.00 |
Provider: Nvidia, Context: 262144, Output Limit: 262144
|
|
|
nvidia
|
Mamba Codestral 7b V0.1 |
mamba-codestral-7b-v0.1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Mistral Large 2 Instruct |
mistral-large-2-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Codestral 22b Instruct V0.1 |
codestral-22b-instruct-v0.1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Mistral Small 3.1 24b Instruct 2503 |
mistral-small-3.1-24b-instruct-2503
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.2 11b Vision Instruct |
llama-3.2-11b-vision-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama3 70b Instruct |
llama3-70b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.3 70b Instruct |
llama-3.3-70b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.2 1b Instruct |
llama-3.2-1b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 4 Scout 17b 16e Instruct |
llama-4-scout-17b-16e-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 4 Maverick 17b 128e Instruct |
llama-4-maverick-17b-128e-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Codellama 70b |
codellama-70b
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.1 405b Instruct |
llama-3.1-405b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama3 8b Instruct |
llama3-8b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Llama 3.1 70b Instruct |
llama-3.1-70b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Deepseek R1 0528 |
deepseek-r1-0528
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
Deepseek R1 |
deepseek-r1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
DeepSeek V3.1 Terminus |
deepseek-v3.1-terminus
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 8192
|
|
|
nvidia
|
DeepSeek V3.1 |
deepseek-v3.1
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 8192
|
|
|
nvidia
|
Deepseek Coder 6.7b Instruct |
deepseek-coder-6.7b-instruct
|
0.00 |
0.00 |
Provider: Nvidia, Context: 128000, Output Limit: 4096
|
|
|
nvidia
|
FLUX.1-dev |
flux.1-dev
|
0.00 |
0.00 |
Provider: Nvidia, Context: 4096, Output Limit: N/A
|
|
|
cohere
|
Command A Translate |
command-a-translate-08-2025
|
2.50 |
10.00 |
Provider: Cohere, Context: 8000, Output Limit: 8000
|
|
|
cohere
|
Command A |
command-a-03-2025
|
2.50 |
10.00 |
Provider: Cohere, Context: 256000, Output Limit: 8000
|
|
|
cohere
|
Command R |
command-r-08-2024
|
0.15 |
0.60 |
Provider: Cohere, Context: 128000, Output Limit: 4000
|
|
|
cohere
|
Command R+ |
command-r-plus-08-2024
|
2.50 |
10.00 |
Provider: Cohere, Context: 128000, Output Limit: 4000
|
|
|
cohere
|
Command R7B |
command-r7b-12-2024
|
0.04 |
0.15 |
Provider: Cohere, Context: 128000, Output Limit: 4000
|
|
|
cohere
|
Command A Reasoning |
command-a-reasoning-08-2025
|
2.50 |
10.00 |
Provider: Cohere, Context: 256000, Output Limit: 32000
|
|
|
cohere
|
Command A Vision |
command-a-vision-07-2025
|
2.50 |
10.00 |
Provider: Cohere, Context: 128000, Output Limit: 8000
|
|
|
upstage
|
solar-mini |
solar-mini
|
0.15 |
0.15 |
Provider: Upstage, Context: 32768, Output Limit: 4096
|
|
|
upstage
|
solar-pro2 |
solar-pro2
|
0.25 |
0.25 |
Provider: Upstage, Context: 65536, Output Limit: 8192
|
|
|
groq
|
Llama 3.1 8B Instant |
llama-3.1-8b-instant
|
0.05 |
0.08 |
Provider: Groq, Context: 131072, Output Limit: 131072
|
|
|
groq
|
Mistral Saba 24B |
mistral-saba-24b
|
0.79 |
0.79 |
Provider: Groq, Context: 32768, Output Limit: 32768
|
|
|
groq
|
Llama 3 8B |
llama3-8b-8192
|
0.05 |
0.08 |
Provider: Groq, Context: 8192, Output Limit: 8192
|
|
|
groq
|
Qwen QwQ 32B |
qwen-qwq-32b
|
0.29 |
0.39 |
Provider: Groq, Context: 131072, Output Limit: 16384
|
|
|
groq
|
Llama 3 70B |
llama3-70b-8192
|
0.59 |
0.79 |
Provider: Groq, Context: 8192, Output Limit: 8192
|
|
|
groq
|
DeepSeek R1 Distill Llama 70B |
deepseek-r1-distill-llama-70b
|
0.75 |
0.99 |
Provider: Groq, Context: 131072, Output Limit: 8192
|
|
|
groq
|
Llama Guard 3 8B |
llama-guard-3-8b
|
0.20 |
0.20 |
Provider: Groq, Context: 8192, Output Limit: 8192
|
|
|
groq
|
Gemma 2 9B |
gemma2-9b-it
|
0.20 |
0.20 |
Provider: Groq, Context: 8192, Output Limit: 8192
|
|
|
groq
|
Llama 3.3 70B Versatile |
llama-3.3-70b-versatile
|
0.59 |
0.79 |
Provider: Groq, Context: 131072, Output Limit: 32768
|
|
|
groq
|
Kimi K2 Instruct 0905 |
kimi-k2-instruct-0905
|
1.00 |
3.00 |
Provider: Groq, Context: 262144, Output Limit: 16384
|
|
|
groq
|
Kimi K2 Instruct |
kimi-k2-instruct
|
1.00 |
3.00 |
Provider: Groq, Context: 131072, Output Limit: 16384
|
|
|
groq
|
GPT OSS 20B |
gpt-oss-20b
|
0.08 |
0.30 |
Provider: Groq, Context: 131072, Output Limit: 65536
|
|
|
groq
|
GPT OSS 120B |
gpt-oss-120b
|
0.15 |
0.60 |
Provider: Groq, Context: 131072, Output Limit: 65536
|
|
|
groq
|
Qwen3 32B |
qwen3-32b
|
0.29 |
0.59 |
Provider: Groq, Context: 131072, Output Limit: 16384
|
|
|
groq
|
Llama 4 Scout 17B |
llama-4-scout-17b-16e-instruct
|
0.11 |
0.34 |
Provider: Groq, Context: 131072, Output Limit: 8192
|
|
|
groq
|
Llama 4 Maverick 17B |
llama-4-maverick-17b-128e-instruct
|
0.20 |
0.60 |
Provider: Groq, Context: 131072, Output Limit: 8192
|
|
|
groq
|
Llama Guard 4 12B |
llama-guard-4-12b
|
0.20 |
0.20 |
Provider: Groq, Context: 131072, Output Limit: 1024
|
|
|
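The Groq ids above map directly onto the official `groq` SDK, which mirrors the OpenAI client shape and reads `GROQ_API_KEY` from the environment by default; a minimal sketch:

```python
# Minimal sketch with the official groq SDK (pip install groq);
# the client reads GROQ_API_KEY from the environment by default.
from groq import Groq

client = Groq()
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # id taken from the table above
    messages=[{"role": "user", "content": "Why do LPUs give low latency?"}],
)
print(resp.choices[0].message.content)
```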
bailing
|
Ling-1T |
ling-1t
|
0.57 |
2.29 |
Provider: Bailing, Context: 128000, Output Limit: 32000
|
|
|
bailing
|
Ring-1T |
ring-1t
|
0.57 |
2.29 |
Provider: Bailing, Context: 128000, Output Limit: 32000
|
|
|
githubcopilot
|
Gemini 2.0 Flash |
gemini-2.0-flash-001
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 1000000, Output Limit: 8192
|
|
|
githubcopilot
|
Claude Opus 4 |
claude-opus-4
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
|
|
|
githubcopilot
|
Gemini 3 Flash |
gemini-3-flash-preview
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
Grok Code Fast 1 |
grok-code-fast-1
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
GPT-5.1-Codex |
gpt-5.1-codex
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
|
|
|
githubcopilot
|
Claude Haiku 4.5 |
claude-haiku-4.5
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
|
|
|
githubcopilot
|
Gemini 3 Pro Preview |
gemini-3-pro-preview
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
Raptor Mini (Preview) |
oswe-vscode-prime
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 200000, Output Limit: 64000
|
|
|
githubcopilot
|
Claude Sonnet 3.5 |
claude-3.5-sonnet
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 90000, Output Limit: 8192
|
|
|
githubcopilot
|
GPT-5.1-Codex-mini |
gpt-5.1-codex-mini
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 100000
|
|
|
githubcopilot
|
o3-mini |
o3-mini
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
|
|
|
githubcopilot
|
GPT-5.1 |
gpt-5.1
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
|
|
|
githubcopilot
|
GPT-5-Codex |
gpt-5-codex
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
|
|
|
githubcopilot
|
GPT-4o |
gpt-4o
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 64000, Output Limit: 16384
|
|
|
githubcopilot
|
GPT-4.1 |
gpt-4.1
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
|
|
|
githubcopilot
|
o4-mini (Preview) |
o4-mini
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
|
|
|
githubcopilot
|
Claude Opus 4.1 |
claude-opus-41
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
|
|
|
githubcopilot
|
GPT-5-mini |
gpt-5-mini
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
Claude Sonnet 3.7 |
claude-3.7-sonnet
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
|
|
|
githubcopilot
|
Gemini 2.5 Pro |
gemini-2.5-pro
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
GPT-5.1-Codex-max |
gpt-5.1-codex-max
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
|
|
|
githubcopilot
|
o3 (Preview) |
o3
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
|
|
|
githubcopilot
|
Claude Sonnet 4 |
claude-sonnet-4
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
|
|
|
githubcopilot
|
GPT-5 |
gpt-5
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
|
|
|
githubcopilot
|
Claude Sonnet 3.7 Thinking |
claude-3.7-sonnet-thought
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
|
|
|
githubcopilot
|
Claude Opus 4.5 |
claude-opus-4.5
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
|
|
|
githubcopilot
|
GPT-5.2 |
gpt-5.2
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
|
|
|
githubcopilot
|
Claude Sonnet 4.5 |
claude-sonnet-4.5
|
0.00 |
0.00 |
Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
|
|
|
mistral
|
Devstral Medium |
devstral-medium-2507
|
0.40 |
2.00 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Mistral Large 3 |
mistral-large-2512
|
0.50 |
1.50 |
Provider: Mistral, Context: 262144, Output Limit: 262144
|
|
|
mistral
|
Mixtral 8x22B |
open-mixtral-8x22b
|
2.00 |
6.00 |
Provider: Mistral, Context: 64000, Output Limit: 64000
|
|
|
mistral
|
Ministral 8B |
ministral-8b-latest
|
0.10 |
0.10 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Pixtral Large |
pixtral-large-latest
|
2.00 |
6.00 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Mistral Small 3.2 |
mistral-small-2506
|
0.10 |
0.30 |
Provider: Mistral, Context: 128000, Output Limit: 16384
|
|
|
mistral
|
devstral-2512 |
devstral-2512
|
0.40 |
2.00 |
Provider: Mistral, Context: 256000
|
|
|
mistral
|
Ministral 3B |
ministral-3b-latest
|
0.04 |
0.04 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Pixtral 12B |
pixtral-12b
|
0.15 |
0.15 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Mistral Medium 3 |
mistral-medium-2505
|
0.40 |
2.00 |
Provider: Mistral, Context: 131072, Output Limit: 131072
|
|
|
mistral
|
labs-devstral-small-2512 |
labs-devstral-small-2512
|
0.10 |
0.30 |
Provider: Mistral, Context: 256000
|
|
|
mistral
|
Devstral 2 |
devstral-medium-latest
|
0.40 |
2.00 |
Provider: Mistral, Context: 262144, Output Limit: 262144
|
|
|
mistral
|
Devstral Small 2505 |
devstral-small-2505
|
0.10 |
0.30 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Mistral Medium 3.1 |
mistral-medium-2508
|
0.40 |
2.00 |
Provider: Mistral, Context: 262144, Output Limit: 262144
|
|
|
mistral
|
Mistral Embed |
mistral-embed
|
0.10 |
0.00 |
Provider: Mistral, Context: 8000, Output Limit: 3072
|
|
|
mistral
|
Mistral Small |
mistral-small-latest
|
0.10 |
0.30 |
Provider: Mistral, Context: 128000, Output Limit: 16384
|
|
|
mistral
|
Magistral Small |
magistral-small
|
0.50 |
1.50 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Devstral Small |
devstral-small-2507
|
0.10 |
0.30 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Codestral |
codestral-latest
|
0.30 |
0.90 |
Provider: Mistral, Context: 256000, Output Limit: 4096
|
|
|
mistral
|
Mixtral 8x7B |
open-mixtral-8x7b
|
0.70 |
0.70 |
Provider: Mistral, Context: 32000, Output Limit: 32000
|
|
|
mistral
|
Mistral Nemo |
mistral-nemo
|
0.15 |
0.15 |
Provider: Mistral, Context: 128000, Output Limit: 128000
|
|
|
mistral
|
Mistral 7B |
open-mistral-7b
|
0.25 |
0.25 |
Provider: Mistral, Context: 8000, Output Limit: 8000
|
|
|
mistral
|
Mistral Large |
mistral-large-latest
|
0.50 |
1.50 |
Provider: Mistral, Context: 262144, Output Limit: 262144
|
|
|
mistral
|
Mistral Medium |
mistral-medium-latest
|
0.40 |
2.00 |
Provider: Mistral, Context: 128000, Output Limit: 16384
|
|
|
mistral
|
Mistral Large 2.1 |
mistral-large-2411
|
2.00 |
6.00 |
Provider: Mistral, Context: 131072, Output Limit: 16384
|
|
|
mistral
|
Magistral Medium |
magistral-medium-latest
|
2.00 |
5.00 |
Provider: Mistral, Context: 128000, Output Limit: 16384
|
|
|
abacus
|
GPT-4.1 Nano |
gpt-4.1-nano
|
0.10 |
0.40 |
Provider: Abacus, Context: 1047576, Output Limit: 32768
|
|
|
abacus
|
Grok 4 Fast (Non-Reasoning) |
grok-4-fast-non-reasoning
|
0.20 |
0.50 |
Provider: Abacus, Context: 2000000, Output Limit: 16384
|
|
|
abacus
|
Gemini 2.0 Flash |
gemini-2.0-flash-001
|
0.10 |
0.40 |
Provider: Abacus, Context: 1000000, Output Limit: 8192
|
|
|
abacus
|
DeepSeek V3.2 |
deepseek-ai-deepseek-v3.2
|
0.27 |
0.40 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Llama 3.1 405B Instruct Turbo |
meta-llama-meta-llama-3.1-405b-instruct-turbo
|
3.50 |
3.50 |
Provider: Abacus, Context: 128000, Output Limit: 4096
|
|
|
abacus
|
Gemini 3 Flash Preview |
gemini-3-flash-preview
|
0.50 |
3.00 |
Provider: Abacus, Context: 1048576, Output Limit: 65536
|
|
|
abacus
|
Qwen3 235B A22B Instruct |
qwen-qwen3-235b-a22b-instruct-2507
|
0.13 |
0.60 |
Provider: Abacus, Context: 262144, Output Limit: 8192
|
|
|
abacus
|
Llama 3.1 8B Instruct |
meta-llama-meta-llama-3.1-8b-instruct
|
0.02 |
0.05 |
Provider: Abacus, Context: 128000, Output Limit: 4096
|
|
|
abacus
|
Grok Code Fast 1 |
grok-code-fast-1
|
0.20 |
1.50 |
Provider: Abacus, Context: 256000, Output Limit: 16384
|
|
|
abacus
|
DeepSeek R1 |
deepseek-ai-deepseek-r1
|
3.00 |
7.00 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Kimi K2 Turbo Preview |
kimi-k2-turbo-preview
|
0.15 |
8.00 |
Provider: Abacus, Context: 256000, Output Limit: 8192
|
|
|
abacus
|
Gemini 3 Pro Preview |
gemini-3-pro-preview
|
2.00 |
12.00 |
Provider: Abacus, Context: 1000000, Output Limit: 65000
|
|
|
abacus
|
Qwen3 Coder 480B A35B Instruct |
qwen-qwen3-coder-480b-a35b-instruct
|
0.29 |
1.20 |
Provider: Abacus, Context: 262144, Output Limit: 65536
|
|
|
abacus
|
Gemini 2.5 Flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Provider: Abacus, Context: 1048576, Output Limit: 65536
|
|
|
abacus
|
GPT-4.1 Mini |
gpt-4.1-mini
|
0.40 |
1.60 |
Provider: Abacus, Context: 1047576, Output Limit: 32768
|
|
|
abacus
|
Claude Opus 4.5 |
claude-opus-4-5-20251101
|
5.00 |
25.00 |
Provider: Abacus, Context: 200000, Output Limit: 64000
|
|
|
abacus
|
Qwen 2.5 Coder 32B |
qwen-2.5-coder-32b
|
0.79 |
0.79 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Claude Sonnet 4.5 |
claude-sonnet-4-5-20250929
|
3.00 |
15.00 |
Provider: Abacus, Context: 200000, Output Limit: 64000
|
|
|
abacus
|
GPT-OSS 120B |
openai-gpt-oss-120b
|
0.08 |
0.44 |
Provider: Abacus, Context: 128000, Output Limit: 32768
|
|
|
abacus
|
Qwen3 Max |
qwen-qwen3-max
|
1.20 |
6.00 |
Provider: Abacus, Context: 131072, Output Limit: 16384
|
|
|
abacus
|
Grok 4 |
grok-4-0709
|
3.00 |
15.00 |
Provider: Abacus, Context: 256000, Output Limit: 16384
|
|
|
abacus
|
Llama 3.1 70B Instruct |
meta-llama-meta-llama-3.1-70b-instruct
|
0.40 |
0.40 |
Provider: Abacus, Context: 128000, Output Limit: 4096
|
|
|
abacus
|
o3-mini |
o3-mini
|
1.10 |
4.40 |
Provider: Abacus, Context: 200000, Output Limit: 100000
|
|
|
abacus
|
GLM-4.5 |
zai-org-glm-4.5
|
0.60 |
2.20 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Gemini 2.0 Pro Exp |
gemini-2.0-pro-exp-02-05
|
- |
- |
Provider: Abacus, Context: 2000000, Output Limit: 8192
|
|
|
abacus
|
GPT-5.1 |
gpt-5.1
|
1.25 |
10.00 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
GPT-5 Nano |
gpt-5-nano
|
0.05 |
0.40 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
Claude Sonnet 4 |
claude-sonnet-4-20250514
|
3.00 |
15.00 |
Provider: Abacus, Context: 200000, Output Limit: 64000
|
|
|
abacus
|
GPT-4.1 |
gpt-4.1
|
2.00 |
8.00 |
Provider: Abacus, Context: 1047576, Output Limit: 32768
|
|
|
abacus
|
o4-mini |
o4-mini
|
1.10 |
4.40 |
Provider: Abacus, Context: 200000, Output Limit: 100000
|
|
|
abacus
|
Qwen3 32B |
qwen-qwen3-32b
|
0.09 |
0.29 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Claude Opus 4 |
claude-opus-4-20250514
|
15.00 |
75.00 |
Provider: Abacus, Context: 200000, Output Limit: 32000
|
|
|
abacus
|
GPT-5 Mini |
gpt-5-mini
|
0.25 |
2.00 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
Llama 4 Maverick 17B 128E Instruct FP8 |
meta-llama-llama-4-maverick-17b-128e-instruct-fp8
|
0.14 |
0.59 |
Provider: Abacus, Context: 1000000, Output Limit: 32768
|
|
|
abacus
|
o3-pro |
o3-pro
|
20.00 |
80.00 |
Provider: Abacus, Context: 200000, Output Limit: 100000
|
|
|
abacus
|
Claude Sonnet 3.7 |
claude-3-7-sonnet-20250219
|
3.00 |
15.00 |
Provider: Abacus, Context: 200000, Output Limit: 64000
|
|
|
abacus
|
DeepSeek V3.1 Terminus |
deepseek-ai-deepseek-v3.1-terminus
|
0.27 |
1.00 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
Gemini 2.5 Pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Provider: Abacus, Context: 1048576, Output Limit: 65536
|
|
|
abacus
|
GPT-4o (2024-11-20) |
gpt-4o-2024-11-20
|
2.50 |
10.00 |
Provider: Abacus, Context: 128000, Output Limit: 16384
|
|
|
abacus
|
o3 |
o3
|
2.00 |
8.00 |
Provider: Abacus, Context: 200000, Output Limit: 100000
|
|
|
abacus
|
Qwen 2.5 72B Instruct |
qwen-qwen2.5-72b-instruct
|
0.11 |
0.38 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
GLM-4.6 |
zai-org-glm-4.6
|
0.60 |
2.20 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
DeepSeek V3.1 |
deepseek-deepseek-v3.1
|
0.55 |
1.66 |
Provider: Abacus, Context: 128000, Output Limit: 8192
|
|
|
abacus
|
QwQ 32B |
qwen-qwq-32b
|
0.40 |
0.40 |
Provider: Abacus, Context: 32768, Output Limit: 32768
|
|
|
abacus
|
GPT-4o Mini |
gpt-4o-mini
|
0.15 |
0.60 |
Provider: Abacus, Context: 128000, Output Limit: 16384
|
|
|
abacus
|
GPT-5 |
gpt-5
|
1.25 |
10.00 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
Grok 4.1 Fast (Non-Reasoning) |
grok-4-1-fast-non-reasoning
|
0.20 |
0.50 |
Provider: Abacus, Context: 2000000, Output Limit: 16384
|
|
|
abacus
|
Llama 3.3 70B Versatile |
llama-3.3-70b-versatile
|
0.59 |
0.79 |
Provider: Abacus, Context: 128000, Output Limit: 32768
|
|
|
abacus
|
Claude Opus 4.1 |
claude-opus-4-1-20250805
|
15.00 |
75.00 |
Provider: Abacus, Context: 200000, Output Limit: 32000
|
|
|
abacus
|
GPT-5.2 |
gpt-5.2
|
1.75 |
14.00 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
GPT-5.1 Chat Latest |
gpt-5.1-chat-latest
|
1.25 |
10.00 |
Provider: Abacus, Context: 400000, Output Limit: 128000
|
|
|
abacus
|
Claude Haiku 4.5 |
claude-haiku-4-5-20251001
|
1.00 |
5.00 |
Provider: Abacus, Context: 200000, Output Limit: 64000
|
|
|
nebius
|
Hermes 4 70B |
hermes-4-70b
|
0.13 |
0.40 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Hermes-4 405B |
hermes-4-405b
|
1.00 |
3.00 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Kimi K2 Instruct |
kimi-k2-instruct
|
0.50 |
2.40 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Llama 3.1 Nemotron Ultra 253B v1 |
llama-3_1-nemotron-ultra-253b-v1
|
0.60 |
1.80 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
GPT OSS 20B |
gpt-oss-20b
|
0.05 |
0.20 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
GPT OSS 120B |
gpt-oss-120b
|
0.15 |
0.60 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Qwen3 235B A22B Instruct 2507 |
qwen3-235b-a22b-instruct-2507
|
0.20 |
0.60 |
Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
|
|
|
nebius
|
Qwen3 235B A22B Thinking 2507 |
qwen3-235b-a22b-thinking-2507
|
0.20 |
0.80 |
Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
|
|
|
nebius
|
Qwen3 Coder 480B A35B Instruct |
qwen3-coder-480b-a35b-instruct
|
0.40 |
1.80 |
Provider: Nebius Token Factory, Context: 262144, Output Limit: 65536
|
|
|
nebius
|
Llama 3.1 405B Instruct |
llama-3_1-405b-instruct
|
1.00 |
3.00 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Llama-3.3-70B-Instruct (Fast) |
llama-3.3-70b-instruct-fast
|
0.25 |
0.75 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
Llama-3.3-70B-Instruct (Base) |
llama-3.3-70b-instruct-base
|
0.13 |
0.40 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
GLM 4.5 |
glm-4.5
|
0.60 |
2.20 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
GLM 4.5 Air |
glm-4.5-air
|
0.20 |
1.20 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
nebius
|
DeepSeek V3 |
deepseek-v3
|
0.50 |
1.50 |
Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
|
|
|
deepseek
|
DeepSeek Chat |
deepseek-chat
|
0.28 |
0.42 |
Provider: DeepSeek, Context: 128000, Output Limit: 8192
|
|
|
deepseek
|
DeepSeek Reasoner |
deepseek-reasoner
|
0.28 |
0.42 |
Provider: DeepSeek, Context: 128000, Output Limit: 128000
|
|
|
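The two DeepSeek rows correspond to the `deepseek-chat` and `deepseek-reasoner` ids on DeepSeek's OpenAI-compatible API; a minimal sketch, assuming a `DEEPSEEK_API_KEY` in the environment:

```python
# Minimal sketch against DeepSeek's OpenAI-compatible API.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
resp = client.chat.completions.create(
    model="deepseek-reasoner",  # or "deepseek-chat" for the non-reasoning row
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(resp.choices[0].message.content)
```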
alibabacn | DeepSeek R1 Distill Qwen 7B | deepseek-r1-distill-qwen-7b | 0.07 | 0.14 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen3-ASR Flash | qwen3-asr-flash | 0.03 | 0.03 | Provider: Alibaba (China), Context: 53248, Output Limit: 4096
alibabacn | DeepSeek R1 0528 | deepseek-r1-0528 | 0.57 | 2.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn | DeepSeek V3 | deepseek-v3 | 0.29 | 1.15 | Provider: Alibaba (China), Context: 65536, Output Limit: 8192
alibabacn | Qwen-Omni Turbo | qwen-omni-turbo | 0.06 | 0.23 | Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn | Qwen-VL Max | qwen-vl-max | 0.23 | 0.57 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | DeepSeek V3.2 Exp | deepseek-v3-2-exp | 0.29 | 0.43 | Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn | Qwen3-Next 80B-A3B Instruct | qwen3-next-80b-a3b-instruct | 0.14 | 0.57 | Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn | DeepSeek R1 | deepseek-r1 | 0.57 | 2.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn | Qwen Turbo | qwen-turbo | 0.04 | 0.09 | Provider: Alibaba (China), Context: 1000000, Output Limit: 16384
alibabacn | Qwen3-VL 235B-A22B | qwen3-vl-235b-a22b | 0.29 | 1.15 | Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn | Qwen3 Coder Flash | qwen3-coder-flash | 0.14 | 0.57 | Provider: Alibaba (China), Context: 1000000, Output Limit: 65536
alibabacn | Qwen3-VL 30B-A3B | qwen3-vl-30b-a3b | 0.11 | 0.43 | Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn | Qwen3 14B | qwen3-14b | 0.14 | 0.57 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | QVQ Max | qvq-max | 1.15 | 4.59 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | DeepSeek R1 Distill Qwen 32B | deepseek-r1-distill-qwen-32b | 0.29 | 0.86 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen Plus Character | qwen-plus-character | 0.12 | 0.29 | Provider: Alibaba (China), Context: 32768, Output Limit: 4096
alibabacn | Qwen2.5 14B Instruct | qwen2-5-14b-instruct | 0.14 | 0.43 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | QwQ Plus | qwq-plus | 0.23 | 0.57 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen2.5-Coder 32B Instruct | qwen2-5-coder-32b-instruct | 0.29 | 0.86 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen3-Coder 30B-A3B Instruct | qwen3-coder-30b-a3b-instruct | 0.22 | 0.86 | Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn | Qwen Math Plus | qwen-math-plus | 0.57 | 1.72 | Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn | Qwen-VL OCR | qwen-vl-ocr | 0.72 | 0.72 | Provider: Alibaba (China), Context: 34096, Output Limit: 4096
alibabacn | Qwen Doc Turbo | qwen-doc-turbo | 0.09 | 0.14 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen Deep Research | qwen-deep-research | 7.74 | 23.37 | Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn | Qwen2.5 72B Instruct | qwen2-5-72b-instruct | 0.57 | 1.72 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen3-Omni Flash | qwen3-omni-flash | 0.06 | 0.23 | Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn | Qwen Flash | qwen-flash | 0.02 | 0.22 | Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn | Qwen3 8B | qwen3-8b | 0.07 | 0.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen3-Omni Flash Realtime | qwen3-omni-flash-realtime | 0.23 | 0.92 | Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn | Qwen2.5-VL 72B Instruct | qwen2-5-vl-72b-instruct | 2.29 | 6.88 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen3-VL Plus | qwen3-vl-plus | 0.14 | 1.43 | Provider: Alibaba (China), Context: 262144, Output Limit: 32768
alibabacn | Qwen Plus | qwen-plus | 0.12 | 0.29 | Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn | Qwen2.5 32B Instruct | qwen2-5-32b-instruct | 0.29 | 0.86 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen2.5-Omni 7B | qwen2-5-omni-7b | 0.09 | 0.35 | Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn | Qwen Max | qwen-max | 0.35 | 1.38 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen Long | qwen-long | 0.07 | 0.29 | Provider: Alibaba (China), Context: 10000000, Output Limit: 8192
alibabacn | Qwen2.5-Math 72B Instruct | qwen2-5-math-72b-instruct | 0.57 | 1.72 | Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn | Moonshot Kimi K2 Instruct | moonshot-kimi-k2-instruct | 0.57 | 2.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 131072
alibabacn | Tongyi Intent Detect V3 | tongyi-intent-detect-v3 | 0.06 | 0.14 | Provider: Alibaba (China), Context: 8192, Output Limit: 1024
alibabacn | Qwen2.5 7B Instruct | qwen2-5-7b-instruct | 0.07 | 0.14 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen2.5-VL 7B Instruct | qwen2-5-vl-7b-instruct | 0.29 | 0.72 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | DeepSeek V3.1 | deepseek-v3-1 | 0.57 | 1.72 | Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn | DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | 0.29 | 0.86 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen3 235B-A22B | qwen3-235b-a22b | 0.29 | 1.15 | Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn | Qwen2.5-Coder 7B Instruct | qwen2-5-coder-7b-instruct | 0.14 | 0.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | DeepSeek R1 Distill Qwen 14B | deepseek-r1-distill-qwen-14b | 0.14 | 0.43 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen-Omni Turbo Realtime | qwen-omni-turbo-realtime | 0.23 | 0.92 | Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn | Qwen Math Turbo | qwen-math-turbo | 0.29 | 0.86 | Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn | Qwen-MT Turbo | qwen-mt-turbo | 0.10 | 0.28 | Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn | DeepSeek R1 Distill Llama 8B | deepseek-r1-distill-llama-8b | 0.00 | 0.00 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen3-Coder 480B-A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.86 | 3.44 | Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn | Qwen-MT Plus | qwen-mt-plus | 0.26 | 0.78 | Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn | Qwen3 Max | qwen3-max | 0.86 | 3.44 | Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn | QwQ 32B | qwq-32b | 0.29 | 0.86 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen2.5-Math 7B Instruct | qwen2-5-math-7b-instruct | 0.14 | 0.29 | Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn | Qwen3-Next 80B-A3B (Thinking) | qwen3-next-80b-a3b-thinking | 0.14 | 1.43 | Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn | DeepSeek R1 Distill Qwen 1.5B | deepseek-r1-distill-qwen-1-5b | 0.00 | 0.00 | Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn | Qwen3 32B | qwen3-32b | 0.29 | 1.15 | Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn | Qwen-VL Plus | qwen-vl-plus | 0.12 | 0.29 | Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn | Qwen3 Coder Plus | qwen3-coder-plus | 1.00 | 5.00 | Provider: Alibaba (China), Context: 1048576, Output Limit: 65536
googlevertexanthropic | Claude Opus 4.5 | claude-opus-4-5@20251101 | 5.00 | 25.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic | Claude Sonnet 3.5 v2 | claude-3-5-sonnet@20241022 | 3.00 | 15.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic | Claude Haiku 3.5 | claude-3-5-haiku@20241022 | 0.80 | 4.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic | Claude Sonnet 4 | claude-sonnet-4@20250514 | 3.00 | 15.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic | Claude Sonnet 4.5 | claude-sonnet-4-5@20250929 | 3.00 | 15.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic | Claude Opus 4.1 | claude-opus-4-1@20250805 | 15.00 | 75.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
googlevertexanthropic | Claude Haiku 4.5 | claude-haiku-4-5@20251001 | 1.00 | 5.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic | Claude Sonnet 3.7 | claude-3-7-sonnet@20250219 | 3.00 | 15.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic | Claude Opus 4 | claude-opus-4@20250514 | 15.00 | 75.00 | Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
venice | Grok 4.1 Fast | grok-41-fast | 0.50 | 1.25 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | Qwen 3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.15 | 0.75 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Gemini 3 Flash Preview | gemini-3-flash-preview | 0.70 | 3.75 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | Claude Opus 4.5 | claude-opus-45 | 6.00 | 30.00 | Provider: Venice AI, Context: 202752, Output Limit: 50688
venice | Venice Medium | mistral-31-24b | 0.50 | 2.00 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Grok Code Fast 1 | grok-code-fast-1 | 0.25 | 1.87 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | GLM 4.7 | zai-org-glm-4.7 | 0.85 | 2.75 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Venice Uncensored 1.1 | venice-uncensored | 0.20 | 0.90 | Provider: Venice AI, Context: 32768, Output Limit: 8192
venice | Gemini 3 Pro Preview | gemini-3-pro-preview | 2.50 | 15.00 | Provider: Venice AI, Context: 202752, Output Limit: 50688
venice | GPT-5.2 | openai-gpt-52 | 2.19 | 17.50 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | Venice Small | qwen3-4b | 0.05 | 0.15 | Provider: Venice AI, Context: 32768, Output Limit: 8192
venice | Llama 3.3 70B | llama-3.3-70b | 0.70 | 2.80 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | OpenAI GPT OSS 120B | openai-gpt-oss-120b | 0.07 | 0.30 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Kimi K2 Thinking | kimi-k2-thinking | 0.75 | 3.20 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | Qwen 3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 0.45 | 3.50 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Llama 3.2 3B | llama-3.2-3b | 0.15 | 0.60 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | Google Gemma 3 27B Instruct | google-gemma-3-27b-it | 0.12 | 0.20 | Provider: Venice AI, Context: 202752, Output Limit: 50688
venice | Hermes 3 Llama 3.1 405b | hermes-3-llama-3.1-405b | 1.10 | 3.00 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | GLM 4.6V | zai-org-glm-4.6v | 0.39 | 1.13 | Provider: Venice AI, Context: 131072, Output Limit: 32768
venice | MiniMax M2.1 | minimax-m21 | 0.40 | 1.60 | Provider: Venice AI, Context: 202752, Output Limit: 50688
venice | Qwen 3 Next 80b | qwen3-next-80b | 0.35 | 1.90 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | GLM 4.6 | zai-org-glm-4.6 | 0.85 | 2.75 | Provider: Venice AI, Context: 202752, Output Limit: 50688
venice | Qwen 3 Coder 480b | qwen3-coder-480b-a35b-instruct | 0.75 | 3.00 | Provider: Venice AI, Context: 262144, Output Limit: 65536
venice | DeepSeek V3.2 | deepseek-v3.2 | 0.40 | 1.00 | Provider: Venice AI, Context: 163840, Output Limit: 40960
siliconflowcn | inclusionAI/Ring-flash-2.0 | ring-flash-2.0 | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | inclusionAI/Ling-flash-2.0 | ling-flash-2.0 | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | inclusionAI/Ling-mini-2.0 | ling-mini-2.0 | 0.07 | 0.28 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | moonshotai/Kimi-K2-Thinking | kimi-k2-thinking | 0.55 | 2.50 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | moonshotai/Kimi-K2-Instruct-0905 | kimi-k2-instruct-0905 | 0.40 | 2.00 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | moonshotai/Kimi-Dev-72B | kimi-dev-72b | 0.29 | 1.15 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | moonshotai/Kimi-K2-Instruct | kimi-k2-instruct | 0.58 | 2.29 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | tencent/Hunyuan-A13B-Instruct | hunyuan-a13b-instruct | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | tencent/Hunyuan-MT-7B | hunyuan-mt-7b | 0.00 | 0.00 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn | MiniMaxAI/MiniMax-M1-80k | minimax-m1-80k | 0.55 | 2.20 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | MiniMaxAI/MiniMax-M2 | minimax-m2 | 0.30 | 1.20 | Provider: SiliconFlow (China), Context: 197000, Output Limit: 131000
siliconflowcn | THUDM/GLM-Z1-32B-0414 | glm-z1-32b-0414 | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | THUDM/GLM-4-9B-0414 | glm-4-9b-0414 | 0.09 | 0.09 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn | THUDM/GLM-Z1-9B-0414 | glm-z1-9b-0414 | 0.09 | 0.09 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | THUDM/GLM-4.1V-9B-Thinking | glm-4.1v-9b-thinking | 0.04 | 0.14 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | THUDM/GLM-4-32B-0414 | glm-4-32b-0414 | 0.27 | 0.27 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn | openai/gpt-oss-120b | gpt-oss-120b | 0.05 | 0.45 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn | openai/gpt-oss-20b | gpt-oss-20b | 0.04 | 0.18 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn | stepfun-ai/step3 | step3 | 0.57 | 1.42 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | nex-agi/DeepSeek-V3.1-Nex-N1 | deepseek-v3.1-nex-n1 | 0.50 | 2.00 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | baidu/ERNIE-4.5-300B-A47B | ernie-4.5-300b-a47b | 0.28 | 1.10 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | z-ai/GLM-4.5-Air | glm-4.5-air | 0.14 | 0.86 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | z-ai/GLM-4.5 | glm-4.5 | 0.40 | 2.00 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | ByteDance-Seed/Seed-OSS-36B-Instruct | seed-oss-36b-instruct | 0.21 | 0.57 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | meta-llama/Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.06 | 0.06 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | 0.10 | 0.10 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | 0.14 | 1.40 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-32B-Instruct | qwen3-vl-32b-instruct | 0.20 | 0.60 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-Omni-30B-A3B-Thinking | qwen3-omni-30b-a3b-thinking | 0.10 | 0.40 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | Qwen/Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | 0.13 | 0.60 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-32B-Thinking | qwen3-vl-32b-thinking | 0.20 | 1.50 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-30B-A3B-Thinking | qwen3-vl-30b-a3b-thinking | 0.29 | 1.00 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | 0.09 | 0.30 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-235B-A22B-Thinking | qwen3-vl-235b-a22b-thinking | 0.45 | 3.50 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-Coder-480B-A35B-Instruct | qwen3-coder-480b-a35b-instruct | 0.25 | 1.00 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-235B-A22B-Instruct | qwen3-vl-235b-a22b-instruct | 0.30 | 1.50 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | 0.18 | 0.68 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-32B | qwen3-32b | 0.14 | 0.57 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | 0.05 | 0.05 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/QwQ-32B | qwq-32b | 0.15 | 0.58 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | 0.59 | 0.59 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn | Qwen/Qwen3-235B-A22B | qwen3-235b-a22b | 0.35 | 1.42 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | 0.05 | 0.05 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen3-Coder-30B-A3B-Instruct | qwen3-coder-30b-a3b-instruct | 0.07 | 0.28 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | 0.59 | 0.59 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen2.5-72B-Instruct-128K | qwen2.5-72b-instruct-128k | 0.59 | 0.59 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn | Qwen/Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | 0.18 | 0.18 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b-instruct | 0.18 | 0.18 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn | Qwen/Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | 0.09 | 0.60 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | 0.18 | 2.00 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-Omni-30B-A3B-Instruct | qwen3-omni-30b-a3b-instruct | 0.10 | 0.40 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | Qwen/Qwen3-8B | qwen3-8b | 0.06 | 0.06 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen3-Omni-30B-A3B-Captioner | qwen3-omni-30b-a3b-captioner | 0.10 | 0.40 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | Qwen/Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | 0.27 | 0.27 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen3-14B | qwen3-14b | 0.07 | 0.28 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | Qwen/Qwen3-VL-30B-A3B-Instruct | qwen3-vl-30b-a3b-instruct | 0.29 | 1.00 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn | Qwen/Qwen3-30B-A3B-Thinking-2507 | qwen3-30b-a3b-thinking-2507 | 0.09 | 0.30 | Provider: SiliconFlow (China), Context: 262000, Output Limit: 131000
siliconflowcn | Qwen/Qwen3-30B-A3B | qwen3-30b-a3b | 0.09 | 0.45 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | zai-org/GLM-4.5V | glm-4.5v | 0.14 | 0.86 | Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn | zai-org/GLM-4.6 | glm-4.6 | 0.50 | 1.90 | Provider: SiliconFlow (China), Context: 205000, Output Limit: 205000
siliconflowcn | deepseek-ai/DeepSeek-V3.1 | deepseek-v3.1 | 0.27 | 1.00 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn | deepseek-ai/DeepSeek-V3 | deepseek-v3 | 0.25 | 1.00 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | deepseek-r1-distill-qwen-7b | 0.05 | 0.05 | Provider: SiliconFlow (China), Context: 33000, Output Limit: 16000
siliconflowcn | deepseek-ai/DeepSeek-V3.1-Terminus | deepseek-v3.1-terminus | 0.27 | 1.00 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn | deepseek-ai/DeepSeek-V3.2-Exp | deepseek-v3.2-exp | 0.27 | 0.41 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | deepseek-r1-distill-qwen-14b | 0.10 | 0.10 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | deepseek-ai/deepseek-vl2 | deepseek-vl2 | 0.15 | 0.15 | Provider: SiliconFlow (China), Context: 4000, Output Limit: 4000
siliconflowcn | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | deepseek-r1-distill-qwen-32b | 0.18 | 0.18 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn | deepseek-ai/DeepSeek-R1 | deepseek-r1 | 0.50 | 2.18 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
chutes | Hermes 4.3 36B | hermes-4.3-36b | 0.10 | 0.39 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Hermes 4 70B | hermes-4-70b | 0.11 | 0.38 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Hermes 4 14B | hermes-4-14b | 0.01 | 0.05 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Hermes 4 405B FP8 TEE | hermes-4-405b-fp8-tee | 0.30 | 1.20 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Hermes 4 405B FP8 | hermes-4-405b-fp8 | 0.30 | 1.20 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | DeepHermes 3 Mistral 24B Preview | deephermes-3-mistral-24b-preview | 0.02 | 0.10 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | dots.ocr | dots.ocr | 0.01 | 0.01 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Kimi K2 Instruct 0905 | kimi-k2-instruct-0905 | 0.39 | 1.90 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Kimi K2 Thinking TEE | kimi-k2-thinking-tee | 0.40 | 1.75 | Provider: Chutes, Context: 262144, Output Limit: 65535
chutes | MiniMax M2 | minimax-m2 | 0.26 | 1.02 | Provider: Chutes, Context: 196608, Output Limit: 196608
chutes | MiniMax M2.1 TEE | minimax-m2.1-tee | 0.30 | 1.20 | Provider: Chutes, Context: 196608, Output Limit: 65536
chutes | NVIDIA Nemotron 3 Nano 30B A3B BF16 | nvidia-nemotron-3-nano-30b-a3b-bf16 | 0.06 | 0.24 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | QwQ 32B ArliAI RpR v1 | qwq-32b-arliai-rpr-v1 | 0.03 | 0.11 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | DeepSeek R1T Chimera | deepseek-r1t-chimera | 0.30 | 1.20 | Provider: Chutes, Context: 163840, Output Limit: 163840
chutes | DeepSeek TNG R1T2 Chimera | deepseek-tng-r1t2-chimera | 0.30 | 1.20 | Provider: Chutes, Context: 163840, Output Limit: 163840
chutes | TNG R1T Chimera TEE | tng-r1t-chimera-tee | 0.30 | 1.20 | Provider: Chutes, Context: 163840, Output Limit: 65536
chutes | MiMo V2 Flash | mimo-v2-flash | 0.17 | 0.65 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | InternVL3 78B | internvl3-78b | 0.10 | 0.39 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | gpt oss 120b TEE | gpt-oss-120b-tee | 0.04 | 0.25 | Provider: Chutes, Context: 131072, Output Limit: 65536
chutes | gpt oss 20b | gpt-oss-20b | 0.02 | 0.10 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Mistral Small 3.1 24B Instruct 2503 | mistral-small-3.1-24b-instruct-2503 | 0.03 | 0.11 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Mistral Small 3.2 24B Instruct 2506 | mistral-small-3.2-24b-instruct-2506 | 0.06 | 0.18 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Tongyi DeepResearch 30B A3B | tongyi-deepresearch-30b-a3b | 0.10 | 0.39 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | Devstral 2 123B Instruct 2512 | devstral-2-123b-instruct-2512 | 0.05 | 0.22 | Provider: Chutes, Context: 262144, Output Limit: 65536
chutes | Mistral Nemo Instruct 2407 | mistral-nemo-instruct-2407 | 0.02 | 0.04 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | gemma 3 4b it | gemma-3-4b-it | 0.01 | 0.03 | Provider: Chutes, Context: 96000, Output Limit: 96000
chutes | Mistral Small 24B Instruct 2501 | mistral-small-24b-instruct-2501 | 0.03 | 0.11 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | gemma 3 12b it | gemma-3-12b-it | 0.03 | 0.10 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | gemma 3 27b it | gemma-3-27b-it | 0.04 | 0.15 | Provider: Chutes, Context: 96000, Output Limit: 96000
chutes | Qwen3 30B A3B | qwen3-30b-a3b | 0.06 | 0.22 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen3 14B | qwen3-14b | 0.05 | 0.22 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen2.5 VL 32B Instruct | qwen2.5-vl-32b-instruct | 0.05 | 0.22 | Provider: Chutes, Context: 16384, Output Limit: 16384
chutes | Qwen3Guard Gen 0.6B | qwen3guard-gen-0.6b | 0.01 | 0.01 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.08 | 0.55 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen2.5 Coder 32B Instruct | qwen2.5-coder-32b-instruct | 0.03 | 0.11 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | Qwen2.5 72B Instruct | qwen2.5-72b-instruct | 0.13 | 0.52 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | Qwen2.5 VL 72B Instruct TEE | qwen2.5-vl-72b-instruct-tee | 0.15 | 0.60 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen3 235B A22B | qwen3-235b-a22b | 0.30 | 1.20 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen2.5 VL 72B Instruct | qwen2.5-vl-72b-instruct | 0.07 | 0.26 | Provider: Chutes, Context: 32768, Output Limit: 32768
chutes | Qwen3 235B A22B Instruct 2507 TEE | qwen3-235b-a22b-instruct-2507-tee | 0.08 | 0.55 | Provider: Chutes, Context: 262144, Output Limit: 65536
chutes | Qwen3 32B | qwen3-32b | 0.08 | 0.24 | Provider: Chutes, Context: 40960, Output Limit: 40960
chutes | Qwen3 VL 235B A22B Instruct | qwen3-vl-235b-a22b-instruct | 0.30 | 1.20 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | 0.30 | 1.20 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen3 30B A3B Instruct 2507 | qwen3-30b-a3b-instruct-2507 | 0.08 | 0.33 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen3 Coder 480B A35B Instruct FP8 TEE | qwen3-coder-480b-a35b-instruct-fp8-tee | 0.22 | 0.95 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 0.11 | 0.60 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | Qwen3 Next 80B A3B Instruct | qwen3-next-80b-a3b-instruct | 0.10 | 0.80 | Provider: Chutes, Context: 262144, Output Limit: 262144
chutes | GLM 4.6 TEE | glm-4.6-tee | 0.40 | 1.75 | Provider: Chutes, Context: 202752, Output Limit: 65536
chutes | GLM 4.5 TEE | glm-4.5-tee | 0.35 | 1.55 | Provider: Chutes, Context: 131072, Output Limit: 65536
chutes | GLM 4.6V | glm-4.6v | 0.30 | 0.90 | Provider: Chutes, Context: 131072, Output Limit: 65536
chutes | GLM 4.7 TEE | glm-4.7-tee | 0.40 | 1.50 | Provider: Chutes, Context: 202752, Output Limit: 65535
chutes | GLM 4.5 Air | glm-4.5-air | 0.05 | 0.22 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | DeepSeek V3 0324 TEE | deepseek-v3-0324-tee | 0.24 | 0.84 | Provider: Chutes, Context: 163840, Output Limit: 65536
chutes | DeepSeek V3.2 Speciale TEE | deepseek-v3.2-speciale-tee | 0.27 | 0.41 | Provider: Chutes, Context: 163840, Output Limit: 65536
chutes | DeepSeek V3.1 Terminus TEE | deepseek-v3.1-terminus-tee | 0.23 | 0.90 | Provider: Chutes, Context: 163840, Output Limit: 65536
chutes | DeepSeek V3 | deepseek-v3 | 0.30 | 1.20 | Provider: Chutes, Context: 163840, Output Limit: 163840
chutes | DeepSeek R1 TEE | deepseek-r1-tee | 0.30 | 1.20 | Provider: Chutes, Context: 163840, Output Limit: 163840
chutes | DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | 0.03 | 0.11 | Provider: Chutes, Context: 131072, Output Limit: 131072
chutes | DeepSeek V3.1 | deepseek-v3.1 | 0.20 | 0.80 | Provider: Chutes, Context: 163840, Output Limit: 65536
chutes | DeepSeek R1 0528 TEE | deepseek-r1-0528-tee | 0.40 | 1.75 | Provider: Chutes, Context: 163840, Output Limit: 163840
chutes | DeepSeek V3.2 TEE | deepseek-v3.2-tee | 0.27 | 0.41 | Provider: Chutes, Context: 163840, Output Limit: 16384
chutes | DeepSeek V3.1 TEE | deepseek-v3.1-tee | 0.20 | 0.80 | Provider: Chutes, Context: 163840, Output Limit: 65536
kimiforcoding | Kimi K2 Thinking | kimi-k2-thinking | 0.00 | 0.00 | Provider: Kimi For Coding, Context: 262144, Output Limit: 32768
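The same model id often appears under several providers at very different rates: kimi-k2-thinking, for instance, is listed above by Venice AI at 0.75/3.20, SiliconFlow (China) at 0.55/2.50, and Kimi For Coding at 0.00/0.00. A minimal sketch for ranking listings of one id by workload cost; the `Row` tuple and the hard-coded sample data are illustrative, copied from the rows above, and again assume prices are USD per million tokens:

```python
from typing import NamedTuple

class Row(NamedTuple):      # illustrative shape for one catalog row
    provider: str
    model_id: str
    input_price: float      # assumed USD per million input tokens
    output_price: float     # assumed USD per million output tokens

# A few kimi-k2-thinking listings copied from the catalog above.
rows = [
    Row("venice", "kimi-k2-thinking", 0.75, 3.20),
    Row("siliconflowcn", "kimi-k2-thinking", 0.55, 2.50),
    Row("cortecs", "kimi-k2-thinking", 0.66, 2.73),
    Row("kimiforcoding", "kimi-k2-thinking", 0.00, 0.00),
]

def cheapest(rows: list[Row], model_id: str, in_tok: int, out_tok: int) -> Row:
    """Pick the listing of model_id with the lowest estimated cost."""
    candidates = [r for r in rows if r.model_id == model_id]
    return min(candidates,
               key=lambda r: in_tok * r.input_price + out_tok * r.output_price)

print(cheapest(rows, "kimi-k2-thinking", 10_000, 2_000).provider)  # kimiforcoding
```

Note that price alone ignores the other columns: the free Kimi For Coding listing caps output at 32768 tokens, while the paid listings allow more, so a real selector would also filter on context and output limits.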
cortecs | Nova Pro 1.0 | nova-pro-v1 | 1.02 | 4.06 | Provider: Cortecs, Context: 300000, Output Limit: 5000
cortecs | Devstral 2 2512 | devstral-2512 | 0.00 | 0.00 | Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs | INTELLECT 3 | intellect-3 | 0.22 | 1.20 | Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs | Claude 4.5 Sonnet | claude-4-5-sonnet | 3.26 | 16.30 | Provider: Cortecs, Context: 200000, Output Limit: 200000
cortecs | DeepSeek V3 0324 | deepseek-v3-0324 | 0.55 | 1.65 | Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs | Kimi K2 Thinking | kimi-k2-thinking | 0.66 | 2.73 | Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs | Kimi K2 Instruct | kimi-k2-instruct | 0.55 | 2.65 | Provider: Cortecs, Context: 131000, Output Limit: 131000
cortecs | GPT 4.1 | gpt-4.1 | 2.35 | 9.42 | Provider: Cortecs, Context: 1047576, Output Limit: 32768
cortecs | Gemini 2.5 Pro | gemini-2.5-pro | 1.65 | 11.02 | Provider: Cortecs, Context: 1048576, Output Limit: 65535
cortecs | GPT Oss 120b | gpt-oss-120b | 0.00 | 0.00 | Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs | Devstral Small 2 2512 | devstral-small-2512 | 0.00 | 0.00 | Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.44 | 1.98 | Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs | Claude Sonnet 4 | claude-sonnet-4 | 3.31 | 16.54 | Provider: Cortecs, Context: 200000, Output Limit: 64000
cortecs | Llama 3.1 405B Instruct | llama-3.1-405b-instruct | 0.00 | 0.00 | Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs | Qwen3 Next 80B A3B Thinking | qwen3-next-80b-a3b-thinking | 0.16 | 1.31 | Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs | Qwen3 32B | qwen3-32b | 0.10 | 0.33 | Provider: Cortecs, Context: 16384, Output Limit: 16384
githubmodels | JAIS 30b Chat | jais-30b-chat | 0.00 | 0.00 | Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels | Grok 3 | grok-3 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Grok 3 Mini | grok-3-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Cohere Command R 08-2024 | cohere-command-r-08-2024 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Cohere Command A | cohere-command-a | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Cohere Command R+ 08-2024 | cohere-command-r-plus-08-2024 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Cohere Command R | cohere-command-r | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Cohere Command R+ | cohere-command-r-plus | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | DeepSeek-R1-0528 | deepseek-r1-0528 | 0.00 | 0.00 | Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels | DeepSeek-R1 | deepseek-r1 | 0.00 | 0.00 | Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels | DeepSeek-V3-0324 | deepseek-v3-0324 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Mistral Medium 3 (25.05) | mistral-medium-2505 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Ministral 3B | ministral-3b | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Mistral Nemo | mistral-nemo | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Mistral Large 24.11 | mistral-large-2411 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Codestral 25.01 | codestral-2501 | 0.00 | 0.00 | Provider: GitHub Models, Context: 32000, Output Limit: 8192
githubmodels | Mistral Small 3.1 | mistral-small-2503 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Phi-3-medium instruct (128k) | phi-3-medium-128k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3-mini instruct (4k) | phi-3-mini-4k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels | Phi-3-small instruct (128k) | phi-3-small-128k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3.5-vision instruct (128k) | phi-3.5-vision-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-4 | phi-4 | 0.00 | 0.00 | Provider: GitHub Models, Context: 16000, Output Limit: 4096
githubmodels | Phi-4-mini-reasoning | phi-4-mini-reasoning | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3-small instruct (8k) | phi-3-small-8k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels | Phi-3.5-mini instruct (128k) | phi-3.5-mini-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-4-multimodal-instruct | phi-4-multimodal-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3-mini instruct (128k) | phi-3-mini-128k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3.5-MoE instruct (128k) | phi-3.5-moe-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-4-mini-instruct | phi-4-mini-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | Phi-3-medium instruct (4k) | phi-3-medium-4k-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels | Phi-4-Reasoning | phi-4-reasoning | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels | MAI-DS-R1 | mai-ds-r1 | 0.00 | 0.00 | Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels | GPT-4.1-nano | gpt-4.1-nano | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels | GPT-4.1-mini | gpt-4.1-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels | OpenAI o1-preview | o1-preview | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | OpenAI o3-mini | o3-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels | GPT-4o | gpt-4o | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels | GPT-4.1 | gpt-4.1 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels | OpenAI o4-mini | o4-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels | OpenAI o1 | o1 | 0.00 | 0.00 | Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels | OpenAI o1-mini | o1-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 65536
githubmodels | OpenAI o3 | o3 | 0.00 | 0.00 | Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels | GPT-4o mini | gpt-4o-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels | Llama-3.2-11B-Vision-Instruct | llama-3.2-11b-vision-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Meta-Llama-3.1-405B-Instruct | meta-llama-3.1-405b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Llama 4 Maverick 17B 128E Instruct FP8 | llama-4-maverick-17b-128e-instruct-fp8 | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Meta-Llama-3-70B-Instruct | meta-llama-3-70b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels | Meta-Llama-3.1-70B-Instruct | meta-llama-3.1-70b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | Llama-3.2-90B-Vision-Instruct | llama-3.2-90b-vision-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Meta-Llama-3-8B-Instruct | meta-llama-3-8b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels | Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels | Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.00 | 0.00 | Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels | AI21 Jamba 1.5 Large | ai21-jamba-1.5-large | 0.00 | 0.00 | Provider: GitHub Models, Context: 256000, Output Limit: 4096
githubmodels | AI21 Jamba 1.5 Mini | ai21-jamba-1.5-mini | 0.00 | 0.00 | Provider: GitHub Models, Context: 256000, Output Limit: 4096
togetherai | Kimi K2 Instruct | kimi-k2-instruct | 1.00 | 3.00 | Provider: Together AI, Context: 131072, Output Limit: 32768
togetherai | Kimi K2 Thinking | kimi-k2-thinking | 1.20 | 4.00 | Provider: Together AI, Context: 262144, Output Limit: 32768
togetherai | Rnj-1 Instruct | rnj-1-instruct | 0.15 | 0.15 | Provider: Together AI, Context: 32768, Output Limit: 32768
togetherai | GPT OSS 120B | gpt-oss-120b | 0.15 | 0.60 | Provider: Together AI, Context: 131072, Output Limit: 131072
togetherai | Llama 3.3 70B | llama-3.3-70b-instruct-turbo | 0.88 | 0.88 | Provider: Together AI, Context: 131072, Output Limit: 66536
togetherai | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct-fp8 | 2.00 | 2.00 | Provider: Together AI, Context: 262144, Output Limit: 66536
togetherai | GLM 4.6 | glm-4.6 | 0.60 | 2.20 | Provider: Together AI, Context: 200000, Output Limit: 32768
togetherai | DeepSeek R1 | deepseek-r1 | 3.00 | 7.00 | Provider: Together AI, Context: 163839, Output Limit: 12288
togetherai | DeepSeek V3 | deepseek-v3 | 1.25 | 1.25 | Provider: Together AI, Context: 131072, Output Limit: 12288
togetherai | DeepSeek V3.1 | deepseek-v3-1 | 0.60 | 1.70 | Provider: Together AI, Context: 131072, Output Limit: 12288
azure | GPT-4.1 nano | gpt-4.1-nano | 0.10 | 0.40 | Provider: Azure, Context: 1047576, Output Limit: 32768
azure | text-embedding-3-small | text-embedding-3-small | 0.02 | 0.00 | Provider: Azure, Context: 8191, Output Limit: 1536
azure | Grok 4 Fast (Non-Reasoning) | grok-4-fast-non-reasoning | 0.20 | 0.50 | Provider: Azure, Context: 2000000, Output Limit: 30000
azure | DeepSeek-R1-0528 | deepseek-r1-0528 | 1.35 | 5.40 | Provider: Azure, Context: 163840, Output Limit: 163840
azure | Grok 4 Fast (Reasoning) | grok-4-fast-reasoning | 0.20 | 0.50 | Provider: Azure, Context: 2000000, Output Limit: 30000
azure | Phi-3-medium-instruct (128k) | phi-3-medium-128k-instruct | 0.17 | 0.68 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | GPT-4 | gpt-4 | 60.00 | 120.00 | Provider: Azure, Context: 8192, Output Limit: 8192
azure | Claude Opus 4.1 | claude-opus-4-1 | 15.00 | 75.00 | Provider: Azure, Context: 200000, Output Limit: 32000
azure | GPT-5.2 Chat | gpt-5.2-chat | 1.75 | 14.00 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | Llama-3.2-11B-Vision-Instruct | llama-3.2-11b-vision-instruct | 0.37 | 0.37 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | Embed v4 | cohere-embed-v-4-0 | 0.12 | 0.00 | Provider: Azure, Context: 128000, Output Limit: 1536
azure | Command R | cohere-command-r-08-2024 | 0.15 | 0.60 | Provider: Azure, Context: 128000, Output Limit: 4000
azure | Grok 4 | grok-4 | 3.00 | 15.00 | Provider: Azure, Context: 256000, Output Limit: 64000
azure | Embed v3 Multilingual | cohere-embed-v3-multilingual | 0.10 | 0.00 | Provider: Azure, Context: 512, Output Limit: 1024
azure | Phi-4-mini | phi-4-mini | 0.08 | 0.30 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | GPT-4 32K | gpt-4-32k | 60.00 | 120.00 | Provider: Azure, Context: 32768, Output Limit: 32768
azure | Meta-Llama-3.1-405B-Instruct | meta-llama-3.1-405b-instruct | 5.33 | 16.00 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | DeepSeek-R1 | deepseek-r1 | 1.35 | 5.40 | Provider: Azure, Context: 163840, Output Limit: 163840
azure | Grok Code Fast 1 | grok-code-fast-1 | 0.20 | 1.50 | Provider: Azure, Context: 256000, Output Limit: 10000
azure | GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | Provider: Azure, Context: 400000, Output Limit: 128000
azure | Phi-3-mini-instruct (4k) | phi-3-mini-4k-instruct | 0.13 | 0.52 | Provider: Azure, Context: 4096, Output Limit: 1024
azure | Claude Haiku 4.5 | claude-haiku-4-5 | 1.00 | 5.00 | Provider: Azure, Context: 200000, Output Limit: 64000
azure | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | 0.28 | 0.42 | Provider: Azure, Context: 128000, Output Limit: 128000
azure | Mistral Medium 3 | mistral-medium-2505 | 0.40 | 2.00 | Provider: Azure, Context: 128000, Output Limit: 128000
azure | Claude Opus 4.5 | claude-opus-4-5 | 5.00 | 25.00 | Provider: Azure, Context: 200000, Output Limit: 64000
azure | Phi-3-small-instruct (128k) | phi-3-small-128k-instruct | 0.15 | 0.60 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | Command A | cohere-command-a | 2.50 | 10.00 | Provider: Azure, Context: 256000, Output Limit: 8000
azure | Command R+ | cohere-command-r-plus-08-2024 | 2.50 | 10.00 | Provider: Azure, Context: 128000, Output Limit: 4000
azure | Llama 4 Maverick 17B 128E Instruct FP8 | llama-4-maverick-17b-128e-instruct-fp8 | 0.25 | 1.00 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | GPT-4.1 mini | gpt-4.1-mini | 0.40 | 1.60 | Provider: Azure, Context: 1047576, Output Limit: 32768
azure | GPT-5 Chat | gpt-5-chat | 1.25 | 10.00 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | DeepSeek-V3.1 | deepseek-v3.1 | 0.56 | 1.68 | Provider: Azure, Context: 131072, Output Limit: 131072
azure | Phi-4 | phi-4 | 0.13 | 0.50 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | Phi-4-mini-reasoning | phi-4-mini-reasoning | 0.08 | 0.30 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | Claude Sonnet 4.5 | claude-sonnet-4-5 | 3.00 | 15.00 | Provider: Azure, Context: 200000, Output Limit: 64000
azure | GPT-3.5 Turbo 0125 | gpt-3.5-turbo-0125 | 0.50 | 1.50 | Provider: Azure, Context: 16384, Output Limit: 16384
azure | Grok 3 | grok-3 | 3.00 | 15.00 | Provider: Azure, Context: 131072, Output Limit: 8192
azure | text-embedding-3-large | text-embedding-3-large | 0.13 | 0.00 | Provider: Azure, Context: 8191, Output Limit: 3072
azure | Meta-Llama-3-70B-Instruct | meta-llama-3-70b-instruct | 2.68 | 3.54 | Provider: Azure, Context: 8192, Output Limit: 2048
azure | DeepSeek-V3-0324 | deepseek-v3-0324 | 1.14 | 4.56 | Provider: Azure, Context: 131072, Output Limit: 131072
azure | Phi-3-small-instruct (8k) | phi-3-small-8k-instruct | 0.15 | 0.60 | Provider: Azure, Context: 8192, Output Limit: 2048
azure | Meta-Llama-3.1-70B-Instruct | meta-llama-3.1-70b-instruct | 2.68 | 3.54 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | GPT-4 Turbo | gpt-4-turbo | 10.00 | 30.00 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | GPT-3.5 Turbo 0613 | gpt-3.5-turbo-0613 | 3.00 | 4.00 | Provider: Azure, Context: 16384, Output Limit: 16384
azure | Phi-3.5-mini-instruct | phi-3.5-mini-instruct | 0.13 | 0.52 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | o1-preview | o1-preview | 16.50 | 66.00 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.71 | 0.71 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | GPT-5.1 Codex Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | Provider: Azure, Context: 400000, Output Limit: 128000
azure | Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | Provider: Azure, Context: 262144, Output Limit: 262144
azure | Model Router | model-router | 0.14 | 0.00 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | o3-mini | o3-mini | 1.10 | 4.40 | Provider: Azure, Context: 200000, Output Limit: 100000
azure | GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | Provider: Azure, Context: 272000, Output Limit: 128000
azure | GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | Provider: Azure, Context: 272000, Output Limit: 128000
azure | GPT-5-Codex | gpt-5-codex | 1.25 | 10.00 | Provider: Azure, Context: 400000, Output Limit: 128000
azure | Llama-3.2-90B-Vision-Instruct | llama-3.2-90b-vision-instruct | 2.04 | 2.04 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | Phi-3-mini-instruct (128k) | phi-3-mini-128k-instruct | 0.13 | 0.52 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | GPT-4o | gpt-4o | 2.50 | 10.00 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | GPT-3.5 Turbo 0301 | gpt-3.5-turbo-0301 | 1.50 | 2.00 | Provider: Azure, Context: 4096, Output Limit: 4096
azure | Ministral 3B | ministral-3b | 0.04 | 0.04 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | Provider: Azure, Context: 1047576, Output Limit: 32768
azure | o4-mini | o4-mini | 1.10 | 4.40 | Provider: Azure, Context: 200000, Output Limit: 100000
azure | Phi-4-multimodal | phi-4-multimodal | 0.08 | 0.32 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | Meta-Llama-3-8B-Instruct | meta-llama-3-8b-instruct | 0.30 | 0.61 | Provider: Azure, Context: 8192, Output Limit: 2048
azure | o1 | o1 | 15.00 | 60.00 | Provider: Azure, Context: 200000, Output Limit: 100000
azure | Grok 3 Mini | grok-3-mini | 0.30 | 0.50 | Provider: Azure, Context: 131072, Output Limit: 8192
azure | GPT-5.1 Chat | gpt-5.1-chat | 1.25 | 10.00 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | Phi-3.5-MoE-instruct | phi-3.5-moe-instruct | 0.16 | 0.64 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | Provider: Azure, Context: 272000, Output Limit: 128000
azure | o1-mini | o1-mini | 1.10 | 4.40 | Provider: Azure, Context: 128000, Output Limit: 65536
azure | Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | 0.20 | 0.78 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | Embed v3 English | cohere-embed-v3-english | 0.10 | 0.00 | Provider: Azure, Context: 512, Output Limit: 1024
azure | text-embedding-ada-002 | text-embedding-ada-002 | 0.10 | 0.00 | Provider: Azure, Context: 8192, Output Limit: 1536
azure | Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.30 | 0.61 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | GPT-5.1 Codex Max | gpt-5.1-codex-max | 1.25 | 10.00 | Provider: Azure, Context: 400000, Output Limit: 128000
azure | GPT-3.5 Turbo Instruct | gpt-3.5-turbo-instruct | 1.50 | 2.00 | Provider: Azure, Context: 4096, Output Limit: 4096
azure | Mistral Nemo | mistral-nemo | 0.15 | 0.15 | Provider: Azure, Context: 128000, Output Limit: 128000
azure | o3 | o3 | 2.00 | 8.00 | Provider: Azure, Context: 200000, Output Limit: 100000
azure | Codex Mini | codex-mini | 1.50 | 6.00 | Provider: Azure, Context: 200000, Output Limit: 100000
azure | Phi-3-medium-instruct (4k) | phi-3-medium-4k-instruct | 0.17 | 0.68 | Provider: Azure, Context: 4096, Output Limit: 1024
azure | Phi-4-reasoning | phi-4-reasoning | 0.13 | 0.50 | Provider: Azure, Context: 32000, Output Limit: 4096
azure | GPT-4 Turbo Vision | gpt-4-turbo-vision | 10.00 | 30.00 | Provider: Azure, Context: 128000, Output Limit: 4096
azure | Phi-4-reasoning-plus | phi-4-reasoning-plus | 0.13 | 0.50 | Provider: Azure, Context: 32000, Output Limit: 4096
azure | GPT-4o mini | gpt-4o-mini | 0.15 | 0.60 | Provider: Azure, Context: 128000, Output Limit: 16384
azure | GPT-5 | gpt-5 | 1.25 | 10.00 | Provider: Azure, Context: 272000, Output Limit: 128000
azure | MAI-DS-R1 | mai-ds-r1 | 1.35 | 5.40 | Provider: Azure, Context: 128000, Output Limit: 8192
azure | DeepSeek-V3.2 | deepseek-v3.2 | 0.28 | 0.42 | Provider: Azure, Context: 128000, Output Limit: 128000
azure | GPT-5 Pro | gpt-5-pro | 15.00 | 120.00 | Provider: Azure, Context: 400000, Output Limit: 272000
azure | Mistral Large 24.11 | mistral-large-2411 | 2.00 | 6.00 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | Provider: Azure, Context: 400000, Output Limit: 128000
azure | Codestral 25.01 | codestral-2501 | 0.30 | 0.90 | Provider: Azure, Context: 256000, Output Limit: 256000
azure | Mistral Small 3.1 | mistral-small-2503 | 0.10 | 0.30 | Provider: Azure, Context: 128000, Output Limit: 32768
azure | GPT-3.5 Turbo 1106 | gpt-3.5-turbo-1106 | 1.00 | 2.00 | Provider: Azure, Context: 16384, Output Limit: 16384
baseten | Kimi K2 Instruct 0905 | kimi-k2-instruct-0905 | 0.60 | 2.50 | Provider: Baseten, Context: 262144, Output Limit: 262144
baseten | Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | Provider: Baseten, Context: 262144, Output Limit: 262144
baseten | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.38 | 1.53 | Provider: Baseten, Context: 262144, Output Limit: 66536
baseten | GLM-4.7 | glm-4.7 | 0.60 | 2.20 | Provider: Baseten, Context: 204800, Output Limit: 131072
baseten | GLM 4.6 | glm-4.6 | 0.60 | 2.20 | Provider: Baseten, Context: 200000, Output Limit: 200000
baseten | DeepSeek V3.2 | deepseek-v3.2 | 0.30 | 0.45 | Provider: Baseten, Context: 163800, Output Limit: 131100
siliconflow
|
inclusionAI/Ling-mini-2.0 |
ling-mini-2.0
|
0.07 |
0.28 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
inclusionAI/Ling-flash-2.0 |
ling-flash-2.0
|
0.14 |
0.57 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
inclusionAI/Ring-flash-2.0 |
ring-flash-2.0
|
0.14 |
0.57 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
moonshotai/Kimi-K2-Instruct |
kimi-k2-instruct
|
0.58 |
2.29 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
moonshotai/Kimi-Dev-72B |
kimi-dev-72b
|
0.29 |
1.15 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
moonshotai/Kimi-K2-Instruct-0905 |
kimi-k2-instruct-0905
|
0.40 |
2.00 |
Provider: SiliconFlow, Context: 262000, Output Limit: 262000
|
|
|
siliconflow
|
moonshotai/Kimi-K2-Thinking |
kimi-k2-thinking
|
0.55 |
2.50 |
Provider: SiliconFlow, Context: 262000, Output Limit: 262000
|
|
|
siliconflow
|
tencent/Hunyuan-MT-7B |
hunyuan-mt-7b
|
0.00 |
0.00 |
Provider: SiliconFlow, Context: 33000, Output Limit: 33000
|
|
|
siliconflow
|
tencent/Hunyuan-A13B-Instruct |
hunyuan-a13b-instruct
|
0.14 |
0.57 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
MiniMaxAI/MiniMax-M2 |
minimax-m2
|
0.30 |
1.20 |
Provider: SiliconFlow, Context: 197000, Output Limit: 131000
|
|
|
siliconflow
|
MiniMaxAI/MiniMax-M1-80k |
minimax-m1-80k
|
0.55 |
2.20 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
THUDM/GLM-4-32B-0414 |
glm-4-32b-0414
|
0.27 |
0.27 |
Provider: SiliconFlow, Context: 33000, Output Limit: 33000
|
|
|
siliconflow
|
THUDM/GLM-4.1V-9B-Thinking |
glm-4.1v-9b-thinking
|
0.04 |
0.14 |
Provider: SiliconFlow, Context: 66000, Output Limit: 66000
|
|
|
siliconflow
|
THUDM/GLM-Z1-9B-0414 |
glm-z1-9b-0414
|
0.09 |
0.09 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000
|
|
|
siliconflow
|
THUDM/GLM-4-9B-0414 |
glm-4-9b-0414
|
0.09 |
0.09 |
Provider: SiliconFlow, Context: 33000, Output Limit: 33000
|
|
|
siliconflow
|
THUDM/GLM-Z1-32B-0414 |
glm-z1-32b-0414
|
0.14 |
0.57 |
Provider: SiliconFlow, Context: 131000, Output Limit: 131000

**SiliconFlow** (siliconflow)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| openai/gpt-oss-20b | gpt-oss-20b | 0.04 | 0.18 | 131000 | 8000 |
| openai/gpt-oss-120b | gpt-oss-120b | 0.05 | 0.45 | 131000 | 8000 |
| stepfun-ai/step3 | step3 | 0.57 | 1.42 | 66000 | 66000 |
| nex-agi/DeepSeek-V3.1-Nex-N1 | deepseek-v3.1-nex-n1 | 0.50 | 2.00 | 131000 | 131000 |
| baidu/ERNIE-4.5-300B-A47B | ernie-4.5-300b-a47b | 0.28 | 1.10 | 131000 | 131000 |
| z-ai/GLM-4.5 | glm-4.5 | 0.40 | 2.00 | 131000 | 131000 |
| z-ai/GLM-4.5-Air | glm-4.5-air | 0.14 | 0.86 | 131000 | 131000 |
| ByteDance-Seed/Seed-OSS-36B-Instruct | seed-oss-36b-instruct | 0.21 | 0.57 | 262000 | 262000 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.06 | 0.06 | 33000 | 4000 |
| Qwen/Qwen3-30B-A3B | qwen3-30b-a3b | 0.09 | 0.45 | 131000 | 131000 |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | qwen3-30b-a3b-thinking-2507 | 0.09 | 0.30 | 262000 | 131000 |
| Qwen/Qwen3-VL-30B-A3B-Instruct | qwen3-vl-30b-a3b-instruct | 0.29 | 1.00 | 262000 | 262000 |
| Qwen/Qwen3-14B | qwen3-14b | 0.07 | 0.28 | 131000 | 131000 |
| Qwen/Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | 0.27 | 0.27 | 131000 | 131000 |
| Qwen/Qwen3-Omni-30B-A3B-Captioner | qwen3-omni-30b-a3b-captioner | 0.10 | 0.40 | 66000 | 66000 |
| Qwen/Qwen3-8B | qwen3-8b | 0.06 | 0.06 | 131000 | 131000 |
| Qwen/Qwen3-Omni-30B-A3B-Instruct | qwen3-omni-30b-a3b-instruct | 0.10 | 0.40 | 66000 | 66000 |
| Qwen/Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | 0.18 | 2.00 | 262000 | 262000 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | 0.09 | 0.60 | 262000 | 262000 |
| Qwen/Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b-instruct | 0.18 | 0.18 | 33000 | 4000 |
| Qwen/Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | 0.18 | 0.18 | 33000 | 4000 |
| Qwen/Qwen2.5-72B-Instruct-128K | qwen2.5-72b-instruct-128k | 0.59 | 0.59 | 131000 | 4000 |
| Qwen/Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | 0.59 | 0.59 | 33000 | 4000 |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | qwen3-coder-30b-a3b-instruct | 0.07 | 0.28 | 262000 | 262000 |
| Qwen/Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | 0.05 | 0.05 | 33000 | 4000 |
| Qwen/Qwen3-235B-A22B | qwen3-235b-a22b | 0.35 | 1.42 | 131000 | 131000 |
| Qwen/Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | 0.59 | 0.59 | 131000 | 4000 |
| Qwen/QwQ-32B | qwq-32b | 0.15 | 0.58 | 131000 | 131000 |
| Qwen/Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | 0.05 | 0.05 | 33000 | 4000 |
| Qwen/Qwen3-32B | qwen3-32b | 0.14 | 0.57 | 131000 | 131000 |
| Qwen/Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | 0.18 | 0.68 | 262000 | 262000 |
| Qwen/Qwen3-VL-235B-A22B-Instruct | qwen3-vl-235b-a22b-instruct | 0.30 | 1.50 | 262000 | 262000 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | qwen3-coder-480b-a35b-instruct | 0.25 | 1.00 | 262000 | 262000 |
| Qwen/Qwen3-VL-235B-A22B-Thinking | qwen3-vl-235b-a22b-thinking | 0.45 | 3.50 | 262000 | 262000 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | 0.09 | 0.30 | 262000 | 262000 |
| Qwen/Qwen3-VL-30B-A3B-Thinking | qwen3-vl-30b-a3b-thinking | 0.29 | 1.00 | 262000 | 262000 |
| Qwen/Qwen3-VL-32B-Thinking | qwen3-vl-32b-thinking | 0.20 | 1.50 | 262000 | 262000 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | 0.13 | 0.60 | 262000 | 262000 |
| Qwen/Qwen3-Omni-30B-A3B-Thinking | qwen3-omni-30b-a3b-thinking | 0.10 | 0.40 | 66000 | 66000 |
| Qwen/Qwen3-VL-32B-Instruct | qwen3-vl-32b-instruct | 0.20 | 0.60 | 262000 | 262000 |
| Qwen/Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | 0.14 | 1.40 | 262000 | 262000 |
| Qwen/Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | 0.10 | 0.10 | 33000 | 4000 |
| Qwen/Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | 0.14 | 0.57 | 262000 | 262000 |
| zai-org/GLM-4.6 | glm-4.6 | 0.50 | 1.90 | 205000 | 205000 |
| zai-org/GLM-4.5V | glm-4.5v | 0.14 | 0.86 | 66000 | 66000 |
| deepseek-ai/DeepSeek-R1 | deepseek-r1 | 0.50 | 2.18 | 164000 | 164000 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | deepseek-r1-distill-qwen-32b | 0.18 | 0.18 | 131000 | 131000 |
| deepseek-ai/deepseek-vl2 | deepseek-vl2 | 0.15 | 0.15 | 4000 | 4000 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | deepseek-r1-distill-qwen-14b | 0.10 | 0.10 | 131000 | 131000 |
| deepseek-ai/DeepSeek-V3.2-Exp | deepseek-v3.2-exp | 0.27 | 0.41 | 164000 | 164000 |
| deepseek-ai/DeepSeek-V3.1-Terminus | deepseek-v3.1-terminus | 0.27 | 1.00 | 164000 | 164000 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | deepseek-r1-distill-qwen-7b | 0.05 | 0.05 | 33000 | 16000 |
| deepseek-ai/DeepSeek-V3 | deepseek-v3 | 0.25 | 1.00 | 164000 | 164000 |
| deepseek-ai/DeepSeek-V3.1 | deepseek-v3.1 | 0.27 | 1.00 | 164000 | 164000 |
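
Since every row above quotes separate input and output prices, the cost of one request follows from simple arithmetic. A minimal sketch, assuming the prices are USD per million tokens (the well-known entries match their public per-1M-token rates, e.g. grok-code-fast-1 at 0.20/1.50); the function name and the example token counts are illustrative, not part of the catalog:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one request, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example row: Qwen/Qwen3-Coder-480B-A35B-Instruct on SiliconFlow (0.25 in, 1.00 out).
# A 40k-token prompt with a 2k-token completion costs
# (40_000 * 0.25 + 2_000 * 1.00) / 1e6 = $0.012.
print(f"${request_cost(40_000, 2_000, 0.25, 1.00):.4f}")  # -> $0.0120
```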

**Helicone** (helicone)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| OpenAI GPT-4.1 Nano | gpt-4.1-nano | 0.10 | 0.40 | 1047576 | 32768 |
| xAI Grok 4 Fast Non-Reasoning | grok-4-fast-non-reasoning | 0.20 | 0.50 | 2000000 | 2000000 |
| Qwen3 Coder 480B A35B Instruct Turbo | qwen3-coder | 0.22 | 0.95 | 262144 | 16384 |
| DeepSeek V3 | deepseek-v3 | 0.56 | 1.68 | 128000 | 8192 |
| Anthropic: Claude Opus 4 | claude-opus-4 | 15.00 | 75.00 | 200000 | 32000 |
| xAI: Grok 4 Fast Reasoning | grok-4-fast-reasoning | 0.20 | 0.50 | 2000000 | 2000000 |
| Meta Llama 3.1 8B Instant | llama-3.1-8b-instant | 0.05 | 0.08 | 131072 | 32678 |
| Anthropic: Claude Opus 4.1 | claude-opus-4-1 | 15.00 | 75.00 | 200000 | 32000 |
| xAI Grok 4 | grok-4 | 3.00 | 15.00 | 256000 | 256000 |
| Qwen3 Next 80B A3B Instruct | qwen3-next-80b-a3b-instruct | 0.14 | 1.40 | 262000 | 16384 |
| Meta Llama 4 Maverick 17B 128E | llama-4-maverick | 0.15 | 0.60 | 131072 | 8192 |
| Meta Llama Prompt Guard 2 86M | llama-prompt-guard-2-86m | 0.01 | 0.01 | 512 | 2 |
| xAI Grok 4.1 Fast Reasoning | grok-4-1-fast-reasoning | 0.20 | 0.50 | 2000000 | 2000000 |
| xAI Grok Code Fast 1 | grok-code-fast-1 | 0.20 | 1.50 | 256000 | 10000 |
| Anthropic: Claude 4.5 Haiku | claude-4.5-haiku | 1.00 | 5.00 | 200000 | 8192 |
| Meta Llama 3.1 8B Instruct Turbo | llama-3.1-8b-instruct-turbo | 0.02 | 0.03 | 128000 | 128000 |
| OpenAI: GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | 400000 | 128000 |
| OpenAI GPT-4.1 Mini | gpt-4.1-mini-2025-04-14 | 0.40 | 1.60 | 1047576 | 32768 |
| Meta Llama Guard 4 12B | llama-guard-4 | 0.21 | 0.21 | 131072 | 1024 |
| Meta Llama 3.1 8B Instruct | llama-3.1-8b-instruct | 0.02 | 0.05 | 16384 | 16384 |
| Google Gemini 3 Pro Preview | gemini-3-pro-preview | 2.00 | 12.00 | 1048576 | 65536 |
| Google Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | 1048576 | 65535 |
| OpenAI GPT-4.1 Mini | gpt-4.1-mini | 0.40 | 1.60 | 1047576 | 32768 |
| DeepSeek V3.1 Terminus | deepseek-v3.1-terminus | 0.27 | 1.00 | 128000 | 16384 |
| Meta Llama Prompt Guard 2 22M | llama-prompt-guard-2-22m | 0.01 | 0.01 | 512 | 2 |
| Anthropic: Claude 3.5 Sonnet v2 | claude-3.5-sonnet-v2 | 3.00 | 15.00 | 200000 | 8192 |
| Perplexity Sonar Deep Research | sonar-deep-research | 2.00 | 8.00 | 127000 | 4096 |
| Google Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | 0.10 | 0.40 | 1048576 | 65535 |
| Anthropic: Claude Sonnet 4.5 (20250929) | claude-sonnet-4-5-20250929 | 3.00 | 15.00 | 200000 | 64000 |
| xAI Grok 3 | grok-3 | 3.00 | 15.00 | 131072 | 131072 |
| Mistral Small | mistral-small | 75.00 | 200.00 | 128000 | 128000 |
| Kimi K2 (07/11) | kimi-k2-0711 | 0.57 | 2.30 | 131072 | 16384 |
| OpenAI ChatGPT-4o | chatgpt-4o-latest | 5.00 | 20.00 | 128000 | 16384 |
| Qwen3 Coder 30B A3B Instruct | qwen3-coder-30b-a3b-instruct | 0.10 | 0.30 | 262144 | 262144 |
| Kimi K2 (09/05) | kimi-k2-0905 | 0.50 | 2.00 | 262144 | 16384 |
| Perplexity Sonar Reasoning | sonar-reasoning | 1.00 | 5.00 | 127000 | 4096 |
| Meta Llama 3.3 70B Instruct | llama-3.3-70b-instruct | 0.13 | 0.39 | 128000 | 16400 |
| OpenAI: GPT-5.1 Codex Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | 400000 | 128000 |
| Kimi K2 Thinking | kimi-k2-thinking | 0.48 | 2.00 | 256000 | 262144 |
| OpenAI o3 Mini | o3-mini | 1.10 | 4.40 | 200000 | 100000 |
| Anthropic: Claude Sonnet 4.5 | claude-4.5-sonnet | 3.00 | 15.00 | 200000 | 64000 |
| OpenAI GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | 400000 | 128000 |
| OpenAI Codex Mini Latest | codex-mini-latest | 1.50 | 6.00 | 200000 | 100000 |
| OpenAI GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | 400000 | 128000 |
| OpenAI: GPT-5 Codex | gpt-5-codex | 1.25 | 10.00 | 400000 | 128000 |
| OpenAI GPT-4o | gpt-4o | 2.50 | 10.00 | 128000 | 16384 |
| DeepSeek TNG R1T2 Chimera | deepseek-tng-r1t2-chimera | 0.30 | 1.20 | 130000 | 163840 |
| Anthropic: Claude Opus 4.5 | claude-4.5-opus | 5.00 | 25.00 | 200000 | 64000 |
| OpenAI GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | 1047576 | 32768 |
| Perplexity Sonar | sonar | 1.00 | 1.00 | 127000 | 4096 |
| Zai GLM-4.6 | glm-4.6 | 0.45 | 1.50 | 204800 | 131072 |
| OpenAI o4 Mini | o4-mini | 1.10 | 4.40 | 200000 | 100000 |
| Qwen3 235B A22B Thinking | qwen3-235b-a22b-thinking | 0.30 | 2.90 | 262144 | 81920 |
| Hermes 2 Pro Llama 3 8B | hermes-2-pro-llama-3-8b | 0.14 | 0.14 | 131072 | 131072 |
| OpenAI: o1 | o1 | 15.00 | 60.00 | 200000 | 100000 |
| xAI Grok 3 Mini | grok-3-mini | 0.30 | 0.50 | 131072 | 131072 |
| Perplexity Sonar Pro | sonar-pro | 3.00 | 15.00 | 200000 | 4096 |
| OpenAI GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | 400000 | 128000 |
| DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | 0.03 | 0.13 | 128000 | 4096 |
| OpenAI: o1-mini | o1-mini | 1.10 | 4.40 | 128000 | 65536 |
| Anthropic: Claude 3.7 Sonnet | claude-3.7-sonnet | 3.00 | 15.00 | 200000 | 64000 |
| Anthropic: Claude 3 Haiku | claude-3-haiku-20240307 | 0.25 | 1.25 | 200000 | 4096 |
| OpenAI o3 Pro | o3-pro | 20.00 | 80.00 | 200000 | 100000 |
| Qwen2.5 Coder 7B fast | qwen2.5-coder-7b-fast | 0.03 | 0.09 | 32000 | 8192 |
| DeepSeek Reasoner | deepseek-reasoner | 0.56 | 1.68 | 128000 | 64000 |
| Google Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | 1048576 | 65536 |
| Google Gemma 3 12B | gemma-3-12b-it | 0.05 | 0.10 | 131072 | 8192 |
| Mistral Nemo | mistral-nemo | 20.00 | 40.00 | 128000 | 16400 |
| OpenAI o3 | o3 | 2.00 | 8.00 | 200000 | 100000 |
| OpenAI GPT-OSS 20b | gpt-oss-20b | 0.05 | 0.20 | 131072 | 131072 |
| OpenAI GPT-OSS 120b | gpt-oss-120b | 0.04 | 0.16 | 131072 | 131072 |
| Anthropic: Claude 3.5 Haiku | claude-3.5-haiku | 0.80 | 4.00 | 200000 | 8192 |
| OpenAI GPT-5 Chat Latest | gpt-5-chat-latest | 1.25 | 10.00 | 128000 | 16384 |
| OpenAI GPT-4o-mini | gpt-4o-mini | 0.15 | 0.60 | 128000 | 16384 |
| Google Gemma 2 | gemma2-9b-it | 0.01 | 0.03 | 8192 | 8192 |
| Anthropic: Claude Sonnet 4 | claude-sonnet-4 | 3.00 | 15.00 | 200000 | 64000 |
| Perplexity Sonar Reasoning Pro | sonar-reasoning-pro | 2.00 | 8.00 | 127000 | 4096 |
| OpenAI GPT-5 | gpt-5 | 1.25 | 10.00 | 400000 | 128000 |
| Qwen3 VL 235B A22B Instruct | qwen3-vl-235b-a22b-instruct | 0.30 | 1.50 | 256000 | 16384 |
| Qwen3 30B A3B | qwen3-30b-a3b | 0.08 | 0.29 | 41000 | 41000 |
| DeepSeek V3.2 | deepseek-v3.2 | 0.27 | 0.41 | 163840 | 65536 |
| xAI Grok 4.1 Fast Non-Reasoning | grok-4-1-fast-non-reasoning | 0.20 | 0.50 | 2000000 | 30000 |
| OpenAI: GPT-5 Pro | gpt-5-pro | 15.00 | 120.00 | 128000 | 32768 |
| Meta Llama 3.3 70B Versatile | llama-3.3-70b-versatile | 0.59 | 0.79 | 131072 | 32678 |
| Mistral-Large | mistral-large-2411 | 2.00 | 6.00 | 128000 | 32768 |
| Anthropic: Claude Opus 4.1 (20250805) | claude-opus-4-1-20250805 | 15.00 | 75.00 | 200000 | 32000 |
| Baidu Ernie 4.5 21B A3B Thinking | ernie-4.5-21b-a3b-thinking | 0.07 | 0.28 | 128000 | 8000 |
| OpenAI GPT-5.1 Chat | gpt-5.1-chat-latest | 1.25 | 10.00 | 128000 | 16384 |
| Qwen3 32B | qwen3-32b | 0.29 | 0.59 | 131072 | 40960 |
| Anthropic: Claude 4.5 Haiku (20251001) | claude-haiku-4-5-20251001 | 1.00 | 5.00 | 200000 | 8192 |
| Meta Llama 4 Scout 17B 16E | llama-4-scout | 0.08 | 0.30 | 131072 | 8192 |

**Hugging Face** (huggingface)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Kimi-K2-Instruct | kimi-k2-instruct | 1.00 | 3.00 | 131072 | 16384 |
| Kimi-K2-Instruct-0905 | kimi-k2-instruct-0905 | 1.00 | 3.00 | 262144 | 16384 |
| MiniMax-M2 | minimax-m2 | 0.30 | 1.20 | 204800 | 204800 |
| Qwen 3 Embedding 8B | qwen3-embedding-8b | 0.01 | 0.00 | 32000 | 4096 |
| Qwen 3 Embedding 4B | qwen3-embedding-4b | 0.01 | 0.00 | 32000 | 2048 |
| Qwen3-Coder-480B-A35B-Instruct | qwen3-coder-480b-a35b-instruct | 2.00 | 2.00 | 262144 | 66536 |
| Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | 0.30 | 3.00 | 262144 | 131072 |
| Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | 0.25 | 1.00 | 262144 | 66536 |
| Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | 0.30 | 2.00 | 262144 | 131072 |
| GLM-4.5 | glm-4.5 | 0.60 | 2.20 | 131072 | 98304 |
| GLM-4.6 | glm-4.6 | 0.60 | 2.20 | 200000 | 128000 |
| GLM-4.5-Air | glm-4.5-air | 0.20 | 1.10 | 128000 | 96000 |
| DeepSeek-V3-0324 | deepseek-v3-0324 | 1.25 | 1.25 | 16384 | 8192 |
| DeepSeek-R1-0528 | deepseek-r1-0528 | 3.00 | 5.00 | 163840 | 163840 |

**OpenCode Zen** (opencode)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Qwen3 Coder | qwen3-coder | 0.45 | 1.80 | 262144 | 65536 |
| Claude Opus 4.1 | claude-opus-4-1 | 15.00 | 75.00 | 200000 | 32000 |
| Kimi K2 | kimi-k2 | 0.40 | 2.50 | 262144 | 262144 |
| GPT-5.1 Codex | gpt-5.1-codex | 1.07 | 8.50 | 400000 | 128000 |
| Claude Haiku 4.5 | claude-haiku-4-5 | 1.00 | 5.00 | 200000 | 64000 |
| Claude Opus 4.5 | claude-opus-4-5 | 5.00 | 25.00 | 200000 | 64000 |
| Gemini 3 Pro | gemini-3-pro | 2.00 | 12.00 | 1048576 | 65536 |
| Alpha GLM-4.7 | alpha-glm-4.7 | 0.60 | 2.20 | 204800 | 131072 |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | 3.00 | 15.00 | 1000000 | 64000 |
| GPT-5.1 Codex Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | 400000 | 128000 |
| Alpha GD4 | alpha-gd4 | 0.50 | 2.00 | 262144 | 32768 |
| Kimi K2 Thinking | kimi-k2-thinking | 0.40 | 2.50 | 262144 | 262144 |
| GPT-5.1 | gpt-5.1 | 1.07 | 8.50 | 400000 | 128000 |
| GPT-5 Nano | gpt-5-nano | 0.00 | 0.00 | 400000 | 128000 |
| GPT-5 Codex | gpt-5-codex | 1.07 | 8.50 | 400000 | 128000 |
| Big Pickle | big-pickle | 0.00 | 0.00 | 200000 | 128000 |
| Claude Haiku 3.5 | claude-3-5-haiku | 0.80 | 4.00 | 200000 | 8192 |
| GLM-4.6 | glm-4.6 | 0.60 | 2.20 | 204800 | 131072 |
| GLM-4.7 | glm-4.7-free | 0.00 | 0.00 | 204800 | 131072 |
| Grok Code Fast 1 | grok-code | 0.00 | 0.00 | 256000 | 256000 |
| Gemini 3 Flash | gemini-3-flash | 0.50 | 3.00 | 1048576 | 65536 |
| GPT-5.1 Codex Max | gpt-5.1-codex-max | 1.25 | 10.00 | 400000 | 128000 |
| MiniMax M2.1 | minimax-m2.1-free | 0.00 | 0.00 | 204800 | 131072 |
| Claude Sonnet 4 | claude-sonnet-4 | 3.00 | 15.00 | 1000000 | 64000 |
| GPT-5 | gpt-5 | 1.07 | 8.50 | 400000 | 128000 |
| GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | 400000 | 128000 |

**FastRouter** (fastrouter)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Kimi K2 | kimi-k2 | 0.55 | 2.20 | 131072 | 32768 |
| Grok 4 | grok-4 | 3.00 | 15.00 | 256000 | 64000 |
| Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | 1048576 | 65536 |
| GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | 400000 | 128000 |
| GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | 1047576 | 32768 |
| GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | 400000 | 128000 |
| GPT OSS 20B | gpt-oss-20b | 0.05 | 0.20 | 131072 | 65536 |
| GPT OSS 120B | gpt-oss-120b | 0.15 | 0.60 | 131072 | 32768 |
| GPT-5 | gpt-5 | 1.25 | 10.00 | 400000 | 128000 |
| Qwen3 Coder | qwen3-coder | 0.30 | 1.20 | 262144 | 66536 |
| Claude Opus 4.1 | claude-opus-4.1 | 15.00 | 75.00 | 200000 | 32000 |
| Claude Sonnet 4 | claude-sonnet-4 | 3.00 | 15.00 | 200000 | 64000 |
| DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | 0.03 | 0.14 | 131072 | 131072 |

**MiniMax** (minimax)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| MiniMax-M2 | minimax-m2 | 0.30 | 1.20 | 196608 | 128000 |
| MiniMax-M2.1 | minimax-m2.1 | 0.30 | 1.20 | 204800 | 131072 |

**Google** (google)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Gemini Embedding 001 | gemini-embedding-001 | 0.15 | 0.00 | 2048 | 3072 |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 0.50 | 3.00 | 1048576 | 65536 |
| Gemini 2.5 Flash Image | gemini-2.5-flash-image | 0.30 | 30.00 | 32768 | 32768 |
| Gemini 2.5 Flash Preview 05-20 | gemini-2.5-flash-preview-05-20 | 0.15 | 0.60 | 1048576 | 65536 |
| Gemini Flash-Lite Latest | gemini-flash-lite-latest | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 3 Pro Preview | gemini-3-pro-preview | 2.00 | 12.00 | 1000000 | 64000 |
| Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini Flash Latest | gemini-flash-latest | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini 2.5 Pro Preview 05-06 | gemini-2.5-pro-preview-05-06 | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini 2.5 Flash Preview TTS | gemini-2.5-flash-preview-tts | 0.50 | 10.00 | 8000 | 16000 |
| Gemini 2.0 Flash Lite | gemini-2.0-flash-lite | 0.08 | 0.30 | 1048576 | 8192 |
| Gemini Live 2.5 Flash Preview Native Audio | gemini-live-2.5-flash-preview-native-audio | 0.50 | 2.00 | 131072 | 65536 |
| Gemini 2.0 Flash | gemini-2.0-flash | 0.10 | 0.40 | 1048576 | 8192 |
| Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini Live 2.5 Flash | gemini-live-2.5-flash | 0.50 | 2.00 | 128000 | 8000 |
| Gemini 2.5 Flash Lite Preview 06-17 | gemini-2.5-flash-lite-preview-06-17 | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 2.5 Flash Image (Preview) | gemini-2.5-flash-image-preview | 0.30 | 30.00 | 32768 | 32768 |
| Gemini 2.5 Flash Preview 09-25 | gemini-2.5-flash-preview-09-2025 | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini 2.5 Flash Preview 04-17 | gemini-2.5-flash-preview-04-17 | 0.15 | 0.60 | 1048576 | 65536 |
| Gemini 2.5 Pro Preview TTS | gemini-2.5-pro-preview-tts | 1.00 | 20.00 | 8000 | 16000 |
| Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini 1.5 Flash | gemini-1.5-flash | 0.08 | 0.30 | 1000000 | 8192 |
| Gemini 1.5 Flash-8B | gemini-1.5-flash-8b | 0.04 | 0.15 | 1000000 | 8192 |
| Gemini 2.5 Flash Lite Preview 09-25 | gemini-2.5-flash-lite-preview-09-2025 | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 1.5 Pro | gemini-1.5-pro | 1.25 | 5.00 | 1000000 | 8192 |

**Vertex** (googlevertex)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Gemini Embedding 001 | gemini-embedding-001 | 0.15 | 0.00 | 2048 | 3072 |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 0.50 | 3.00 | 1048576 | 65536 |
| Gemini 2.5 Flash Preview 05-20 | gemini-2.5-flash-preview-05-20 | 0.15 | 0.60 | 1048576 | 65536 |
| Gemini Flash-Lite Latest | gemini-flash-lite-latest | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 3 Pro Preview | gemini-3-pro-preview | 2.00 | 12.00 | 1048576 | 65536 |
| Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini Flash Latest | gemini-flash-latest | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini 2.5 Pro Preview 05-06 | gemini-2.5-pro-preview-05-06 | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini 2.0 Flash Lite | gemini-2.0-flash-lite | 0.08 | 0.30 | 1048576 | 8192 |
| Gemini 2.0 Flash | gemini-2.0-flash | 0.10 | 0.40 | 1048576 | 8192 |
| Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | 0.10 | 0.40 | 1048576 | 65536 |
| Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini 2.5 Flash Lite Preview 06-17 | gemini-2.5-flash-lite-preview-06-17 | 0.10 | 0.40 | 65536 | 65536 |
| Gemini 2.5 Flash Preview 09-25 | gemini-2.5-flash-preview-09-2025 | 0.30 | 2.50 | 1048576 | 65536 |
| Gemini 2.5 Flash Preview 04-17 | gemini-2.5-flash-preview-04-17 | 0.15 | 0.60 | 1048576 | 65536 |
| Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | 1048576 | 65536 |
| Gemini 2.5 Flash Lite Preview 09-25 | gemini-2.5-flash-lite-preview-09-2025 | 0.10 | 0.40 | 1048576 | 65536 |
| GPT OSS 120B | gpt-oss-120b-maas | 0.09 | 0.36 | 131072 | 32768 |
| GPT OSS 20B | gpt-oss-20b-maas | 0.07 | 0.25 | 131072 | 32768 |

**Cloudflare Workers AI** (cloudflareworkersai)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| @hf/thebloke/mistral-7b-instruct-v0.1-awq | mistral-7b-instruct-v0.1-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/deepgram/aura-1 | aura-1 | 0.02 | 0.02 | N/A | N/A |
| @hf/mistral/mistral-7b-instruct-v0.2 | mistral-7b-instruct-v0.2 | 0.00 | 0.00 | 3072 | 4096 |
| @cf/tinyllama/tinyllama-1.1b-chat-v1.0 | tinyllama-1.1b-chat-v1.0 | 0.00 | 0.00 | 2048 | 2048 |
| @cf/qwen/qwen1.5-0.5b-chat | qwen1.5-0.5b-chat | 0.00 | 0.00 | 32000 | 32000 |
| @cf/meta/llama-3.2-11b-vision-instruct | llama-3.2-11b-vision-instruct | 0.05 | 0.68 | 128000 | 128000 |
| @hf/thebloke/llama-2-13b-chat-awq | llama-2-13b-chat-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/meta/llama-3.1-8b-instruct-fp8 | llama-3.1-8b-instruct-fp8 | 0.15 | 0.29 | 32000 | 32000 |
| @cf/openai/whisper | whisper | 0.00 | 0.00 | N/A | N/A |
| @cf/stabilityai/stable-diffusion-xl-base-1.0 | stable-diffusion-xl-base-1.0 | 0.00 | 0.00 | N/A | N/A |
| @cf/meta/llama-2-7b-chat-fp16 | llama-2-7b-chat-fp16 | 0.56 | 6.67 | 4096 | 4096 |
| @cf/microsoft/resnet-50 | resnet-50 | 0.00 | 0.00 | N/A | N/A |
| @cf/runwayml/stable-diffusion-v1-5-inpainting | stable-diffusion-v1-5-inpainting | 0.00 | 0.00 | N/A | N/A |
| @cf/defog/sqlcoder-7b-2 | sqlcoder-7b-2 | 0.00 | 0.00 | 10000 | 10000 |
| @cf/meta/llama-3-8b-instruct | llama-3-8b-instruct | 0.28 | 0.83 | 7968 | 7968 |
| @cf/meta-llama/llama-2-7b-chat-hf-lora | llama-2-7b-chat-hf-lora | 0.00 | 0.00 | 8192 | 8192 |
| @cf/meta/llama-3.1-8b-instruct | llama-3.1-8b-instruct | 0.28 | 0.83 | 7968 | 7968 |
| @cf/openchat/openchat-3.5-0106 | openchat-3.5-0106 | 0.00 | 0.00 | 8192 | 8192 |
| @hf/thebloke/openhermes-2.5-mistral-7b-awq | openhermes-2.5-mistral-7b-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/leonardo/lucid-origin | lucid-origin | 0.01 | 0.01 | N/A | N/A |
| @cf/facebook/bart-large-cnn | bart-large-cnn | 0.00 | 0.00 | N/A | N/A |
| @cf/black-forest-labs/flux-1-schnell | flux-1-schnell | 0.00 | 0.00 | 2048 | N/A |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | deepseek-r1-distill-qwen-32b | 0.50 | 4.88 | 80000 | 80000 |
| @cf/google/gemma-2b-it-lora | gemma-2b-it-lora | 0.00 | 0.00 | 8192 | 8192 |
| @cf/fblgit/una-cybertron-7b-v2-bf16 | una-cybertron-7b-v2-bf16 | 0.00 | 0.00 | 15000 | 15000 |
| @cf/aisingapore/gemma-sea-lion-v4-27b-it | gemma-sea-lion-v4-27b-it | 0.35 | 0.56 | 128000 | N/A |
| @cf/meta/m2m100-1.2b | m2m100-1.2b | 0.34 | 0.34 | N/A | N/A |
| @cf/meta/llama-3.2-3b-instruct | llama-3.2-3b-instruct | 0.05 | 0.34 | 128000 | 128000 |
| @cf/qwen/qwen2.5-coder-32b-instruct | qwen2.5-coder-32b-instruct | 0.66 | 1.00 | 32768 | 32768 |
| @cf/runwayml/stable-diffusion-v1-5-img2img | stable-diffusion-v1-5-img2img | 0.00 | 0.00 | N/A | N/A |
| @cf/google/gemma-7b-it-lora | gemma-7b-it-lora | 0.00 | 0.00 | 3500 | 3500 |
| @cf/qwen/qwen1.5-14b-chat-awq | qwen1.5-14b-chat-awq | 0.00 | 0.00 | 7500 | 7500 |
| @cf/qwen/qwen1.5-1.8b-chat | qwen1.5-1.8b-chat | 0.00 | 0.00 | 32000 | 32000 |
| @cf/mistralai/mistral-small-3.1-24b-instruct | mistral-small-3.1-24b-instruct | 0.35 | 0.56 | 128000 | 128000 |
| @hf/google/gemma-7b-it | gemma-7b-it | 0.00 | 0.00 | 8192 | 8192 |
| @cf/qwen/qwen3-30b-a3b-fp8 | qwen3-30b-a3b-fp8 | 0.05 | 0.34 | 32768 | N/A |
| @hf/thebloke/llamaguard-7b-awq | llamaguard-7b-awq | 0.00 | 0.00 | 4096 | 4096 |
| @hf/nousresearch/hermes-2-pro-mistral-7b | hermes-2-pro-mistral-7b | 0.00 | 0.00 | 24000 | 24000 |
| @cf/ibm-granite/granite-4.0-h-micro | granite-4.0-h-micro | 0.02 | 0.11 | 131000 | N/A |
| @cf/tiiuae/falcon-7b-instruct | falcon-7b-instruct | 0.00 | 0.00 | 4096 | 4096 |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | llama-3.3-70b-instruct-fp8-fast | 0.29 | 2.25 | 24000 | 24000 |
| @cf/meta/llama-3-8b-instruct-awq | llama-3-8b-instruct-awq | 0.12 | 0.27 | 8192 | 8192 |
| @cf/leonardo/phoenix-1.0 | phoenix-1.0 | 0.01 | 0.01 | N/A | N/A |
| @cf/microsoft/phi-2 | phi-2 | 0.00 | 0.00 | 2048 | 2048 |
| @cf/lykon/dreamshaper-8-lcm | dreamshaper-8-lcm | 0.00 | 0.00 | N/A | N/A |
| @cf/thebloke/discolm-german-7b-v1-awq | discolm-german-7b-v1-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/meta/llama-2-7b-chat-int8 | llama-2-7b-chat-int8 | 0.56 | 6.67 | 8192 | 8192 |
| @cf/meta/llama-3.2-1b-instruct | llama-3.2-1b-instruct | 0.03 | 0.20 | 60000 | 60000 |
| @cf/openai/whisper-large-v3-turbo | whisper-large-v3-turbo | 0.00 | 0.00 | N/A | N/A |
| @cf/meta/llama-4-scout-17b-16e-instruct | llama-4-scout-17b-16e-instruct | 0.27 | 0.85 | 131000 | 131000 |
| @hf/nexusflow/starling-lm-7b-beta | starling-lm-7b-beta | 0.00 | 0.00 | 4096 | 4096 |
| @hf/thebloke/deepseek-coder-6.7b-base-awq | deepseek-coder-6.7b-base-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/google/gemma-3-12b-it | gemma-3-12b-it | 0.35 | 0.56 | 80000 | 80000 |
| @cf/meta/llama-guard-3-8b | llama-guard-3-8b | 0.48 | 0.03 | 131072 | N/A |
| @hf/thebloke/neural-chat-7b-v3-1-awq | neural-chat-7b-v3-1-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/openai/whisper-tiny-en | whisper-tiny-en | 0.00 | 0.00 | N/A | N/A |
| @cf/bytedance/stable-diffusion-xl-lightning | stable-diffusion-xl-lightning | 0.00 | 0.00 | N/A | N/A |
| @cf/mistral/mistral-7b-instruct-v0.1 | mistral-7b-instruct-v0.1 | 0.11 | 0.19 | 2824 | 2824 |
| @cf/llava-hf/llava-1.5-7b-hf | llava-1.5-7b-hf | 0.00 | 0.00 | N/A | N/A |
| @cf/openai/gpt-oss-20b | gpt-oss-20b | 0.20 | 0.30 | 128000 | 128000 |
| @cf/deepseek-ai/deepseek-math-7b-instruct | deepseek-math-7b-instruct | 0.00 | 0.00 | 4096 | 4096 |
| @cf/openai/gpt-oss-120b | gpt-oss-120b | 0.35 | 0.75 | 128000 | 128000 |
| @cf/myshell-ai/melotts | melotts | 0.00 | 0.00 | N/A | N/A |
| @cf/qwen/qwen1.5-7b-chat-awq | qwen1.5-7b-chat-awq | 0.00 | 0.00 | 20000 | 20000 |
| @cf/meta/llama-3.1-8b-instruct-fast | llama-3.1-8b-instruct-fast | 0.05 | 0.38 | 128000 | 128000 |
| @cf/deepgram/nova-3 | nova-3 | 0.01 | 0.01 | N/A | N/A |
| @cf/meta/llama-3.1-70b-instruct | llama-3.1-70b-instruct | 0.29 | 2.25 | 24000 | 24000 |
| @cf/qwen/qwq-32b | qwq-32b | 0.66 | 1.00 | 24000 | 24000 |
| @hf/thebloke/zephyr-7b-beta-awq | zephyr-7b-beta-awq | 0.00 | 0.00 | 4096 | 4096 |
| @hf/thebloke/deepseek-coder-6.7b-instruct-awq | deepseek-coder-6.7b-instruct-awq | 0.00 | 0.00 | 4096 | 4096 |
| @cf/meta/llama-3.1-8b-instruct-awq | llama-3.1-8b-instruct-awq | 0.12 | 0.27 | 8192 | 8192 |
| @cf/mistral/mistral-7b-instruct-v0.2-lora | mistral-7b-instruct-v0.2-lora | 0.00 | 0.00 | 15000 | 15000 |
| @cf/unum/uform-gen2-qwen-500m | uform-gen2-qwen-500m | 0.00 | 0.00 | N/A | N/A |
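
The Cloudflare Workers AI rows use "N/A" for context and output limits on non-text models (speech, image, embedding, and vision endpoints), so anything that loads this table programmatically has to treat those two columns as optional. A minimal sketch, assuming rows are read in the same column order as the table above; the helper name is illustrative:

```python
from typing import Optional

def parse_limit(cell: str) -> Optional[int]:
    """Map a Context / Output Limit cell to an int, or None for 'N/A'."""
    cell = cell.strip()
    return None if cell == "N/A" else int(cell)

# Row copied from the table above: a speech model with no token limits.
row = ["@cf/openai/whisper", "whisper", "0.00", "0.00", "N/A", "N/A"]
context, output_limit = parse_limit(row[4]), parse_limit(row[5])
print(context, output_limit)  # -> None None
```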

**Inception** (inception)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Mercury Coder | mercury-coder | 0.25 | 1.00 | 128000 | 16384 |
| Mercury | mercury | 0.25 | 1.00 | 128000 | 16384 |

**Weights & Biases** (wandb)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| Kimi-K2-Instruct | kimi-k2-instruct | 1.35 | 4.00 | 128000 | 16384 |
| Phi-4-mini-instruct | phi-4-mini-instruct | 0.08 | 0.35 | 128000 | 4096 |
| Meta-Llama-3.1-8B-Instruct | llama-3.1-8b-instruct | 0.22 | 0.22 | 128000 | 32768 |
| Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.71 | 0.71 | 128000 | 32768 |
| Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | 0.17 | 0.66 | 64000 | 8192 |
| Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.10 | 0.10 | 262144 | 131072 |
| Qwen3-Coder-480B-A35B-Instruct | qwen3-coder-480b-a35b-instruct | 1.00 | 1.50 | 262144 | 66536 |
| Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | 0.10 | 0.10 | 262144 | 131072 |
| DeepSeek-R1-0528 | deepseek-r1-0528 | 1.35 | 5.40 | 161000 | 163840 |
| DeepSeek-V3-0324 | deepseek-v3-0324 | 1.14 | 2.75 | 161000 | 8192 |

**Cloudflare AI Gateway** (cloudflareaigateway)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| IBM Granite 4.0 H Micro | granite-4.0-h-micro | 0.02 | 0.11 | 128000 | 16384 |
| BART Large CNN | bart-large-cnn | 0.00 | 0.00 | 128000 | 16384 |
| Mistral 7B Instruct v0.1 | mistral-7b-instruct-v0.1 | 0.11 | 0.19 | 128000 | 16384 |
| DistilBERT SST-2 INT8 | distilbert-sst-2-int8 | 0.03 | 0.00 | 128000 | 16384 |
| MyShell MeloTTS | melotts | 0.00 | 0.00 | 128000 | 16384 |
| Gemma 3 12B IT | gemma-3-12b-it | 0.35 | 0.56 | 128000 | 16384 |
| PLaMo Embedding 1B | plamo-embedding-1b | 0.02 | 0.00 | 128000 | 16384 |
| GPT OSS 20B | gpt-oss-20b | 0.20 | 0.30 | 128000 | 16384 |
| GPT OSS 120B | gpt-oss-120b | 0.35 | 0.75 | 128000 | 16384 |
| IndicTrans2 EN-Indic 1B | indictrans2-en-indic-1b | 0.34 | 0.34 | 128000 | 16384 |
| Pipecat Smart Turn v2 | smart-turn-v2 | 0.00 | 0.00 | 128000 | 16384 |
| Qwen 2.5 Coder 32B Instruct | qwen2.5-coder-32b-instruct | 0.66 | 1.00 | 128000 | 16384 |
| Qwen3 30B A3B FP8 | qwen3-30b-a3b-fp8 | 0.05 | 0.34 | 128000 | 16384 |
| Qwen3 Embedding 0.6B | qwen3-embedding-0.6b | 0.01 | 0.00 | 128000 | 16384 |
| QwQ 32B | qwq-32b | 0.66 | 1.00 | 128000 | 16384 |
| Mistral Small 3.1 24B Instruct | mistral-small-3.1-24b-instruct | 0.35 | 0.56 | 128000 | 16384 |
| Deepgram Aura 2 (ES) | aura-2-es | 0.00 | 0.00 | 128000 | 16384 |
| Deepgram Aura 2 (EN) | aura-2-en | 0.00 | 0.00 | 128000 | 16384 |
| Deepgram Nova 3 | nova-3 | 0.00 | 0.00 | 128000 | 16384 |
| Gemma SEA-LION v4 27B IT | gemma-sea-lion-v4-27b-it | 0.35 | 0.56 | 128000 | 16384 |
| Llama 3.2 11B Vision Instruct | llama-3.2-11b-vision-instruct | 0.05 | 0.68 | 128000 | 16384 |
| Llama 3.1 8B Instruct FP8 | llama-3.1-8b-instruct-fp8 | 0.15 | 0.29 | 128000 | 16384 |
| Llama 2 7B Chat FP16 | llama-2-7b-chat-fp16 | 0.56 | 6.67 | 128000 | 16384 |
| Llama 3 8B Instruct | llama-3-8b-instruct | 0.28 | 0.83 | 128000 | 16384 |
| Llama 3.1 8B Instruct | llama-3.1-8b-instruct | 0.28 | 0.83 | 128000 | 16384 |
| M2M100 1.2B | m2m100-1.2b | 0.34 | 0.34 | 128000 | 16384 |
| Llama 3.2 3B Instruct | llama-3.2-3b-instruct | 0.05 | 0.34 | 128000 | 16384 |
| Llama 3.3 70B Instruct FP8 Fast | llama-3.3-70b-instruct-fp8-fast | 0.29 | 2.25 | 128000 | 16384 |
| Llama 3 8B Instruct AWQ | llama-3-8b-instruct-awq | 0.12 | 0.27 | 128000 | 16384 |
| Llama 3.2 1B Instruct | llama-3.2-1b-instruct | 0.03 | 0.20 | 128000 | 16384 |
| Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | 0.27 | 0.85 | 128000 | 16384 |
| Llama Guard 3 8B | llama-guard-3-8b | 0.48 | 0.03 | 128000 | 16384 |
| Llama 3.1 8B Instruct AWQ | llama-3.1-8b-instruct-awq | 0.12 | 0.27 | 128000 | 16384 |
| BGE M3 | bge-m3 | 0.01 | 0.00 | 128000 | 16384 |
| BGE Base EN v1.5 | bge-base-en-v1.5 | 0.07 | 0.00 | 128000 | 16384 |
| BGE Large EN v1.5 | bge-large-en-v1.5 | 0.20 | 0.00 | 128000 | 16384 |
| BGE Reranker Base | bge-reranker-base | 0.00 | 0.00 | 128000 | 16384 |
| BGE Small EN v1.5 | bge-small-en-v1.5 | 0.02 | 0.00 | 128000 | 16384 |
| DeepSeek R1 Distill Qwen 32B | deepseek-r1-distill-qwen-32b | 0.50 | 4.88 | 128000 | 16384 |
| GPT-4 | gpt-4 | 30.00 | 60.00 | 8192 | 8192 |
| GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | 400000 | 128000 |
| GPT-3.5-turbo | gpt-3.5-turbo | 0.50 | 1.50 | 16385 | 4096 |
| GPT-4 Turbo | gpt-4-turbo | 10.00 | 30.00 | 128000 | 4096 |
| o3-mini | o3-mini | 1.10 | 4.40 | 200000 | 100000 |
| GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | 400000 | 128000 |
| GPT-4o | gpt-4o | 2.50 | 10.00 | 128000 | 16384 |
| o4-mini | o4-mini | 1.10 | 4.40 | 200000 | 100000 |
| o1 | o1 | 15.00 | 60.00 | 200000 | 100000 |
| o3-pro | o3-pro | 20.00 | 80.00 | 200000 | 100000 |
| o3 | o3 | 2.00 | 8.00 | 200000 | 100000 |
| GPT-4o mini | gpt-4o-mini | 0.15 | 0.60 | 128000 | 16384 |
| GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | 400000 | 128000 |
| Claude Opus 4 (latest) | claude-opus-4 | 15.00 | 75.00 | 200000 | 32000 |
| Claude Opus 4.1 (latest) | claude-opus-4-1 | 15.00 | 75.00 | 200000 | 32000 |
| Claude Haiku 4.5 (latest) | claude-haiku-4-5 | 1.00 | 5.00 | 200000 | 64000 |
| Claude Haiku 3 | claude-3-haiku | 0.25 | 1.25 | 200000 | 4096 |
| Claude Opus 4.5 (latest) | claude-opus-4-5 | 5.00 | 25.00 | 200000 | 64000 |
| Claude Opus 3 | claude-3-opus | 15.00 | 75.00 | 200000 | 4096 |
| Claude Sonnet 4.5 (latest) | claude-sonnet-4-5 | 3.00 | 15.00 | 200000 | 64000 |
| Claude Sonnet 3.5 v2 | claude-3.5-sonnet | 3.00 | 15.00 | 200000 | 8192 |
| Claude Sonnet 3 | claude-3-sonnet | 3.00 | 15.00 | 200000 | 4096 |
| Claude Haiku 3.5 (latest) | claude-3-5-haiku | 0.80 | 4.00 | 200000 | 8192 |
| Claude Haiku 3.5 (latest) | claude-3.5-haiku | 0.80 | 4.00 | 200000 | 8192 |
| Claude Sonnet 4 (latest) | claude-sonnet-4 | 3.00 | 15.00 | 200000 | 64000 |

**OpenAI** (openai)

| Model | Model ID | Input ($/M tokens) | Output ($/M tokens) | Context | Output Limit |
| --- | --- | --- | --- | --- | --- |
| GPT-4.1 nano | gpt-4.1-nano | 0.10 | 0.40 | 1047576 | 32768 |
| text-embedding-3-small | text-embedding-3-small | 0.02 | 0.00 | 8191 | 1536 |
| GPT-4 | gpt-4 | 30.00 | 60.00 | 8192 | 8192 |
| o1-pro | o1-pro | 150.00 | 600.00 | 200000 | 100000 |
| GPT-4o (2024-05-13) | gpt-4o-2024-05-13 | 5.00 | 15.00 | 128000 | 4096 |
| GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | 400000 | 128000 |
| GPT-4o (2024-08-06) | gpt-4o-2024-08-06 | 2.50 | 10.00 | 128000 | 16384 |
| GPT-4.1 mini | gpt-4.1-mini | 0.40 | 1.60 | 1047576 | 32768 |
| o3-deep-research | o3-deep-research | 10.00 | 40.00 | 200000 | 100000 |
| GPT-3.5-turbo | gpt-3.5-turbo | 0.50 | 1.50 | 16385 | 4096 |
| GPT-5.2 Pro | gpt-5.2-pro | 21.00 | 168.00 | 400000 | 128000 |
| text-embedding-3-large | text-embedding-3-large | 0.13 | 0.00 | 8191 | 3072 |
| GPT-4 Turbo | gpt-4-turbo | 10.00 | 30.00 | 128000 | 4096 |
| o1-preview | o1-preview | 15.00 | 60.00 | 128000 | 32768 |
| GPT-5.1 Codex mini | gpt-5.1-codex-mini | 0.25 | 2.00 | 400000 | 128000 |
| o3-mini | o3-mini | 1.10 | 4.40 | 200000 | 100000 |
| GPT-5.2 Chat | gpt-5.2-chat-latest | 1.75 | 14.00 | 128000 | 16384 |
| GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | 400000 | 128000 |
| Codex Mini | codex-mini-latest | 1.50 | 6.00 | 200000 | 100000 |
| GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | 400000 | 128000 |
| GPT-5-Codex | gpt-5-codex | 1.25 | 10.00 | 400000 | 128000 |
| GPT-4o | gpt-4o | 2.50 | 10.00 | 128000 | 16384 |
| GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | 1047576 | 32768 |
| o4-mini | o4-mini | 1.10 | 4.40 | 200000 | 100000 |
| o1 | o1 | 15.00 | 60.00 | 200000 | 100000 |
| GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | 400000 | 128000 |
| o1-mini | o1-mini | 1.10 | 4.40 | 128000 | 65536 |
| text-embedding-ada-002 | text-embedding-ada-002 | 0.10 | 0.00 | 8192 | 1536 |
| o3-pro | o3-pro | 20.00 | 80.00 | 200000 | 100000 |
| GPT-4o (2024-11-20) | gpt-4o-2024-11-20 | 2.50 | 10.00 | 128000 | 16384 |
| GPT-5.1 Codex Max | gpt-5.1-codex-max | 1.25 | 10.00 | 400000 | 128000 |
| o3 | o3 | 2.00 | 8.00 | 200000 | 100000 |
| o4-mini-deep-research | o4-mini-deep-research | 2.00 | 8.00 | 200000 | 100000 |
| GPT-5 Chat (latest) | gpt-5-chat-latest | 1.25 | 10.00 | 400000 | 128000 |
| GPT-4o mini | gpt-4o-mini | 0.15 | 0.60 | 128000 | 16384 |
| GPT-5 | gpt-5 | 1.25 | 10.00 | 400000 | 128000 |
| GPT-5 Pro | gpt-5-pro | 15.00 | 120.00 | 400000 | 272000 |
| GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | 400000 | 128000 |
| GPT-5.1 Chat | gpt-5.1-chat-latest | 1.25 | 10.00 | 128000 | 16384 |
| minimaxcn | MiniMax-M2.1 | minimax-m2.1 | 0.30 | 1.20 | Provider: MiniMax (China), Context: 204800, Output Limit: 131072 |
| minimaxcn | MiniMax-M2 | minimax-m2 | 0.30 | 1.20 | Provider: MiniMax (China), Context: 196608, Output Limit: 128000 |
| perplexity | Sonar | sonar | 1.00 | 1.00 | Provider: Perplexity, Context: 128000, Output Limit: 4096 |
| perplexity | Sonar Pro | sonar-pro | 3.00 | 15.00 | Provider: Perplexity, Context: 200000, Output Limit: 8192 |
| perplexity | Sonar Reasoning Pro | sonar-reasoning-pro | 2.00 | 8.00 | Provider: Perplexity, Context: 128000, Output Limit: 4096 |
| zenmux | Step-3 | step-3 | 0.21 | 0.57 | Provider: ZenMux, Context: 65536, Output Limit: 64000 |
| zenmux | Kimi K2 Thinking Turbo | kimi-k2-thinking-turbo | 1.15 | 8.00 | Provider: ZenMux, Context: 262144, Output Limit: 64000 |
| zenmux | Kimi K2 0905 | kimi-k2-0905 | 0.60 | 2.50 | Provider: ZenMux, Context: 262100, Output Limit: 64000 |
| zenmux | Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | Provider: ZenMux, Context: 262144, Output Limit: 64000 |
| zenmux | MiMo-V2-Flash Free | mimo-v2-flash-free | 0.00 | 0.00 | Provider: ZenMux, Context: 262144, Output Limit: 64000 |
| zenmux | MiMo-V2-Flash | mimo-v2-flash | 0.00 | 0.00 | Provider: ZenMux, Context: 262144, Output Limit: 64000 |
| zenmux | Grok 4 | grok-4 | 3.00 | 15.00 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | Grok Code Fast 1 | grok-code-fast-1 | 0.20 | 1.50 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | Grok 4.1 Fast Non Reasoning | grok-4.1-fast-non-reasoning | 0.20 | 0.50 | Provider: ZenMux, Context: 2000000, Output Limit: 64000 |
| zenmux | Grok 4 Fast | grok-4-fast | 0.20 | 0.50 | Provider: ZenMux, Context: 2000000, Output Limit: 64000 |
| zenmux | Grok 4.1 Fast | grok-4.1-fast | 0.20 | 0.50 | Provider: ZenMux, Context: 2000000, Output Limit: 64000 |
| zenmux | DeepSeek-V3.2 (Non-thinking Mode) | deepseek-chat | 0.28 | 0.42 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | 0.22 | 0.33 | Provider: ZenMux, Context: 163840, Output Limit: 64000 |
| zenmux | DeepSeek-V3.2 (Thinking Mode) | deepseek-reasoner | 0.28 | 0.42 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | DeepSeek V3.2 | deepseek-v3.2 | 0.28 | 0.43 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | MiniMax M2 | minimax-m2 | 0.30 | 1.20 | Provider: ZenMux, Context: 204800, Output Limit: 64000 |
| zenmux | MiniMax M2.1 | minimax-m2.1 | 0.30 | 1.20 | Provider: ZenMux, Context: 204800, Output Limit: 64000 |
| zenmux | Gemini 3 Flash Preview | gemini-3-flash-preview | 0.50 | 3.00 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 |
| zenmux | Gemini 3 Flash Preview Free | gemini-3-flash-preview-free | 0.00 | 0.00 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 |
| zenmux | Gemini 3 Pro Preview | gemini-3-pro-preview | 2.00 | 12.00 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 |
| zenmux | Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 |
| zenmux | Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | 0.10 | 0.40 | Provider: ZenMux, Context: 1048576, Output Limit: 64000 |
| zenmux | Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | Provider: ZenMux, Context: 1048576, Output Limit: 65536 |
| zenmux | Doubao-Seed-Code | doubao-seed-code | 0.17 | 1.12 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | Doubao-Seed-1.8 | doubao-seed-1.8 | 0.11 | 0.28 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | GPT-5.1-Codex | gpt-5.1-codex | 1.25 | 10.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | GPT-5.1-Codex-Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | GPT-5 Codex | gpt-5-codex | 1.25 | 10.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | GPT-5.1 Chat | gpt-5.1-chat | 1.25 | 10.00 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | GPT-5 | gpt-5 | 1.25 | 10.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | Provider: ZenMux, Context: 400000, Output Limit: 64000 |
| zenmux | ERNIE-5.0-Thinking-Preview | ernie-5.0-thinking-preview | 0.84 | 3.37 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | Ring-1T | ring-1t | 0.56 | 2.24 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | Ling-1T | ling-1t | 0.56 | 2.24 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | GLM 4.7 | glm-4.7 | 0.28 | 1.14 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | GLM 4.6V Flash (Free) | glm-4.6v-flash-free | 0.00 | 0.00 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | GLM 4.6V FlashX | glm-4.6v-flash | 0.00 | 0.00 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | GLM 4.5 | glm-4.5 | 0.35 | 1.54 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | GLM 4.5 Air | glm-4.5-air | 0.11 | 0.56 | Provider: ZenMux, Context: 128000, Output Limit: 64000 |
| zenmux | GLM 4.6 | glm-4.6 | 0.35 | 1.54 | Provider: ZenMux, Context: 200000, Output Limit: 128000 |
| zenmux | GLM 4.6V | glm-4.6v | 0.14 | 0.42 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | Qwen3-Coder-Plus | qwen3-coder-plus | 1.00 | 5.00 | Provider: ZenMux, Context: 1000000, Output Limit: 64000 |
| zenmux | KAT-Coder-Pro-V1 Free | kat-coder-pro-v1-free | 0.00 | 0.00 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | KAT-Coder-Pro-V1 | kat-coder-pro-v1 | 0.00 | 0.00 | Provider: ZenMux, Context: 256000, Output Limit: 64000 |
| zenmux | Claude Opus 4 | claude-opus-4 | 15.00 | 75.00 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | Claude Haiku 4.5 | claude-haiku-4.5 | 1.00 | 5.00 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | Claude Opus 4.1 | claude-opus-4.1 | 15.00 | 75.00 | Provider: ZenMux, Context: 200000, Output Limit: 32000 |
| zenmux | Claude Sonnet 4 | claude-sonnet-4 | 3.00 | 15.00 | Provider: ZenMux, Context: 1000000, Output Limit: 64000 |
| zenmux | Claude Opus 4.5 | claude-opus-4.5 | 5.00 | 25.00 | Provider: ZenMux, Context: 200000, Output Limit: 64000 |
| zenmux | Claude Sonnet 4.5 | claude-sonnet-4.5 | 3.00 | 15.00 | Provider: ZenMux, Context: 1000000, Output Limit: 64000 |
| ovhcloud | Mixtral-8x7B-Instruct-v0.1 | mixtral-8x7b-instruct-v0.1 | 0.70 | 0.70 | Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000 |
| ovhcloud | Mistral-7B-Instruct-v0.3 | mistral-7b-instruct-v0.3 | 0.11 | 0.11 | Provider: OVHcloud AI Endpoints, Context: 127000, Output Limit: 127000 |
| ovhcloud | Llama-3.1-8B-Instruct | llama-3.1-8b-instruct | 0.11 | 0.11 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | 1.01 | 1.01 | Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000 |
| ovhcloud | Mistral-Nemo-Instruct-2407 | mistral-nemo-instruct-2407 | 0.14 | 0.14 | Provider: OVHcloud AI Endpoints, Context: 118000, Output Limit: 118000 |
| ovhcloud | Mistral-Small-3.2-24B-Instruct-2506 | mistral-small-3.2-24b-instruct-2506 | 0.10 | 0.31 | Provider: OVHcloud AI Endpoints, Context: 128000, Output Limit: 128000 |
| ovhcloud | Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b-instruct | 0.96 | 0.96 | Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000 |
| ovhcloud | Qwen3-Coder-30B-A3B-Instruct | qwen3-coder-30b-a3b-instruct | 0.07 | 0.26 | Provider: OVHcloud AI Endpoints, Context: 256000, Output Limit: 256000 |
| ovhcloud | llava-next-mistral-7b | llava-next-mistral-7b | 0.32 | 0.32 | Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000 |
| ovhcloud | DeepSeek-R1-Distill-Llama-70B | deepseek-r1-distill-llama-70b | 0.74 | 0.74 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | Meta-Llama-3_1-70B-Instruct | meta-llama-3_1-70b-instruct | 0.74 | 0.74 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | gpt-oss-20b | gpt-oss-20b | 0.05 | 0.18 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | gpt-oss-120b | gpt-oss-120b | 0.09 | 0.47 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | Meta-Llama-3_3-70B-Instruct | meta-llama-3_3-70b-instruct | 0.74 | 0.74 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 |
| ovhcloud | Qwen3-32B | qwen3-32b | 0.09 | 0.25 | Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000 |
| v0 | v0-1.5-lg | v0-1.5-lg | 15.00 | 75.00 | Provider: v0, Context: 512000, Output Limit: 32000 |
| v0 | v0-1.5-md | v0-1.5-md | 3.00 | 15.00 | Provider: v0, Context: 128000, Output Limit: 32000 |
| v0 | v0-1.0-md | v0-1.0-md | 3.00 | 15.00 | Provider: v0, Context: 128000, Output Limit: 32000 |
| iflowcn | Qwen3-Coder-480B-A35B | qwen3-coder | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 64000 |
| iflowcn | DeepSeek-V3 | deepseek-v3 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 32000 |
| iflowcn | Kimi-K2 | kimi-k2 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | DeepSeek-R1 | deepseek-r1 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 32000 |
| iflowcn | DeepSeek-V3.1-Terminus | deepseek-v3.1 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | MiniMax-M2 | minimax-m2 | 0.00 | 0.00 | Provider: iFlow, Context: 204800, Output Limit: 131100 |
| iflowcn | Qwen3-235B-A22B | qwen3-235b | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 32000 |
| iflowcn | DeepSeek-V3.2 | deepseek-v3.2-chat | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | Kimi-K2-0905 | kimi-k2-0905 | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 64000 |
| iflowcn | Kimi-K2-Thinking | kimi-k2-thinking | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | Qwen3-235B-A22B-Thinking | qwen3-235b-a22b-thinking-2507 | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 64000 |
| iflowcn | Qwen3-VL-Plus | qwen3-vl-plus | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 32000 |
| iflowcn | GLM-4.6 | glm-4.6 | 0.00 | 0.00 | Provider: iFlow, Context: 200000, Output Limit: 128000 |
| iflowcn | TStars-2.0 | tstars2.0 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | Qwen3-235B-A22B-Instruct | qwen3-235b-a22b-instruct | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 64000 |
| iflowcn | Qwen3-Max | qwen3-max | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 32000 |
| iflowcn | DeepSeek-V3.2-Exp | deepseek-v3.2 | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 |
| iflowcn | Qwen3-Max-Preview | qwen3-max-preview | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 32000 |
| iflowcn | Qwen3-Coder-Plus | qwen3-coder-plus | 0.00 | 0.00 | Provider: iFlow, Context: 256000, Output Limit: 64000 |
| iflowcn | Qwen3-32B | qwen3-32b | 0.00 | 0.00 | Provider: iFlow, Context: 128000, Output Limit: 32000 |
| synthetic | Qwen 3 235B Instruct | qwen3-235b-a22b-instruct-2507 | 0.20 | 0.60 | Provider: Synthetic, Context: 256000, Output Limit: 32000 |
| synthetic | Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b-instruct | 0.80 | 0.80 | Provider: Synthetic, Context: 32768, Output Limit: 32768 |
| synthetic | Qwen 3 Coder 480B | qwen3-coder-480b-a35b-instruct | 2.00 | 2.00 | Provider: Synthetic, Context: 256000, Output Limit: 32000 |
| synthetic | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 0.65 | 3.00 | Provider: Synthetic, Context: 256000, Output Limit: 32000 |
| synthetic | MiniMax-M2 | minimax-m2 | 0.55 | 2.19 | Provider: Synthetic, Context: 196608, Output Limit: 131000 |
| synthetic | MiniMax-M2.1 | minimax-m2.1 | 0.55 | 2.19 | Provider: Synthetic, Context: 204800, Output Limit: 131072 |
| synthetic | Llama-3.1-70B-Instruct | llama-3.1-70b-instruct | 0.90 | 0.90 | Provider: Synthetic, Context: 128000, Output Limit: 32768 |
| synthetic | Llama-3.1-8B-Instruct | llama-3.1-8b-instruct | 0.20 | 0.20 | Provider: Synthetic, Context: 128000, Output Limit: 32768 |
| synthetic | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.90 | 0.90 | Provider: Synthetic, Context: 128000, Output Limit: 32768 |
| synthetic | Llama-4-Scout-17B-16E-Instruct | llama-4-scout-17b-16e-instruct | 0.15 | 0.60 | Provider: Synthetic, Context: 328000, Output Limit: 4096 |
| synthetic | Llama-4-Maverick-17B-128E-Instruct-FP8 | llama-4-maverick-17b-128e-instruct-fp8 | 0.22 | 0.88 | Provider: Synthetic, Context: 524000, Output Limit: 4096 |
| synthetic | Llama-3.1-405B-Instruct | llama-3.1-405b-instruct | 3.00 | 3.00 | Provider: Synthetic, Context: 128000, Output Limit: 32768 |
| synthetic | Kimi K2 0905 | kimi-k2-instruct-0905 | 1.20 | 1.20 | Provider: Synthetic, Context: 262144, Output Limit: 32768 |
| synthetic | Kimi K2 Thinking | kimi-k2-thinking | 0.55 | 2.19 | Provider: Synthetic, Context: 262144, Output Limit: 262144 |
| synthetic | GLM 4.5 | glm-4.5 | 0.55 | 2.19 | Provider: Synthetic, Context: 128000, Output Limit: 96000 |
| synthetic | GLM 4.7 | glm-4.7 | 0.55 | 2.19 | Provider: Synthetic, Context: 200000, Output Limit: 64000 |
| synthetic | GLM 4.6 | glm-4.6 | 0.55 | 2.19 | Provider: Synthetic, Context: 200000, Output Limit: 64000 |
| synthetic | DeepSeek R1 | deepseek-r1 | 0.55 | 2.19 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | DeepSeek R1 (0528) | deepseek-r1-0528 | 3.00 | 8.00 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | DeepSeek V3.1 Terminus | deepseek-v3.1-terminus | 1.20 | 1.20 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | DeepSeek V3.2 | deepseek-v3.2 | 0.27 | 0.40 | Provider: Synthetic, Context: 162816, Output Limit: 8000 |
| synthetic | DeepSeek V3 | deepseek-v3 | 1.25 | 1.25 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | DeepSeek V3.1 | deepseek-v3.1 | 0.56 | 1.68 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | DeepSeek V3 (0324) | deepseek-v3-0324 | 1.20 | 1.20 | Provider: Synthetic, Context: 128000, Output Limit: 128000 |
| synthetic | GPT OSS 120B | gpt-oss-120b | 0.10 | 0.10 | Provider: Synthetic, Context: 128000, Output Limit: 32768 |
| deepinfra | Kimi K2 | kimi-k2-instruct | 0.50 | 2.00 | Provider: Deep Infra, Context: 131072, Output Limit: 32768 |
| deepinfra | Kimi K2 Thinking | kimi-k2-thinking | 0.47 | 2.00 | Provider: Deep Infra, Context: 131072, Output Limit: 32768 |
| deepinfra | MiniMax M2 | minimax-m2 | 0.25 | 1.02 | Provider: Deep Infra, Context: 262144, Output Limit: 32768 |
| deepinfra | GPT OSS 20B | gpt-oss-20b | 0.03 | 0.14 | Provider: Deep Infra, Context: 131072, Output Limit: 16384 |
| deepinfra | GPT OSS 120B | gpt-oss-120b | 0.05 | 0.24 | Provider: Deep Infra, Context: 131072, Output Limit: 16384 |
| deepinfra | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.40 | 1.60 | Provider: Deep Infra, Context: 262144, Output Limit: 66536 |
| deepinfra | Qwen3 Coder 480B A35B Instruct Turbo | qwen3-coder-480b-a35b-instruct-turbo | 0.30 | 1.20 | Provider: Deep Infra, Context: 262144, Output Limit: 66536 |
| deepinfra | GLM-4.5 | glm-4.5 | 0.60 | 2.20 | Provider: Deep Infra, Context: 131072, Output Limit: 98304 |
| deepinfra | GLM-4.7 | glm-4.7 | 0.43 | 1.75 | Provider: Deep Infra, Context: 202752, Output Limit: 16384 |
| zhipuai | GLM-4.6V-Flash | glm-4.6v-flash | 0.00 | 0.00 | Provider: Zhipu AI, Context: 128000, Output Limit: 32768 |
| zhipuai | GLM-4.6V | glm-4.6v | 0.30 | 0.90 | Provider: Zhipu AI, Context: 128000, Output Limit: 32768 |
| zhipuai | GLM-4.6 | glm-4.6 | 0.60 | 2.20 | Provider: Zhipu AI, Context: 204800, Output Limit: 131072 |
| zhipuai | GLM-4.5V | glm-4.5v | 0.60 | 1.80 | Provider: Zhipu AI, Context: 64000, Output Limit: 16384 |
| zhipuai | GLM-4.5-Air | glm-4.5-air | 0.20 | 1.10 | Provider: Zhipu AI, Context: 131072, Output Limit: 98304 |
| zhipuai | GLM-4.5 | glm-4.5 | 0.60 | 2.20 | Provider: Zhipu AI, Context: 131072, Output Limit: 98304 |
| zhipuai | GLM-4.5-Flash | glm-4.5-flash | 0.00 | 0.00 | Provider: Zhipu AI, Context: 131072, Output Limit: 98304 |
| zhipuai | GLM-4.7 | glm-4.7 | 0.60 | 2.20 | Provider: Zhipu AI, Context: 204800, Output Limit: 131072 |
| submodel | GPT OSS 120B | gpt-oss-120b | 0.10 | 0.50 | Provider: submodel, Context: 131072, Output Limit: 32768 |
| submodel | Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.20 | 0.30 | Provider: submodel, Context: 262144, Output Limit: 131072 |
| submodel | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct-fp8 | 0.20 | 0.80 | Provider: submodel, Context: 262144, Output Limit: 262144 |
| submodel | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 0.20 | 0.60 | Provider: submodel, Context: 262144, Output Limit: 131072 |
| submodel | GLM 4.5 FP8 | glm-4.5-fp8 | 0.20 | 0.80 | Provider: submodel, Context: 131072, Output Limit: 131072 |
| submodel | GLM 4.5 Air | glm-4.5-air | 0.10 | 0.50 | Provider: submodel, Context: 131072, Output Limit: 131072 |
| submodel | DeepSeek R1 0528 | deepseek-r1-0528 | 0.50 | 2.15 | Provider: submodel, Context: 75000, Output Limit: 163840 |
| submodel | DeepSeek V3.1 | deepseek-v3.1 | 0.20 | 0.80 | Provider: submodel, Context: 75000, Output Limit: 163840 |
| submodel | DeepSeek V3 0324 | deepseek-v3-0324 | 0.20 | 0.80 | Provider: submodel, Context: 75000, Output Limit: 163840 |
| nanogpt | Kimi K2 Thinking | kimi-k2-thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 32768, Output Limit: 8192 |
| nanogpt | Kimi K2 Instruct | kimi-k2-instruct | 1.00 | 2.00 | Provider: NanoGPT, Context: 131072, Output Limit: 8192 |
| nanogpt | Hermes 4 405b Thinking | hermes-4-405b:thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | Llama 3.3 Nemotron Super 49B v1.5 | llama-3_3-nemotron-super-49b-v1_5 | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | Deepseek V3.2 Thinking | deepseek-v3.2:thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | Deepseek R1 | deepseek-r1 | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | Minimax M2.1 | minimax-m2.1 | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GPT Oss 120b | gpt-oss-120b | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GLM 4.6 Thinking | glm-4.6:thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GLM 4.6 | glm-4.6 | 1.00 | 2.00 | Provider: NanoGPT, Context: 200000, Output Limit: 8192 |
| nanogpt | Qwen3 Coder | qwen3-coder | 1.00 | 2.00 | Provider: NanoGPT, Context: 106000, Output Limit: 8192 |
| nanogpt | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 1.00 | 2.00 | Provider: NanoGPT, Context: 262144, Output Limit: 8192 |
| nanogpt | Devstral 2 123b Instruct 2512 | devstral-2-123b-instruct-2512 | 1.00 | 2.00 | Provider: NanoGPT, Context: 131072, Output Limit: 8192 |
| nanogpt | Mistral Large 3 675b Instruct 2512 | mistral-large-3-675b-instruct-2512 | 1.00 | 2.00 | Provider: NanoGPT, Context: 131072, Output Limit: 8192 |
| nanogpt | Ministral 14b Instruct 2512 | ministral-14b-instruct-2512 | 1.00 | 2.00 | Provider: NanoGPT, Context: 131072, Output Limit: 8192 |
| nanogpt | Llama 4 Maverick | llama-4-maverick | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | Llama 3.3 70b Instruct | llama-3.3-70b-instruct | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GLM 4.7 | glm-4.7 | 1.00 | 2.00 | Provider: NanoGPT, Context: 204800, Output Limit: 8192 |
| nanogpt | GLM 4.5 Air | glm-4.5-air | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GLM 4.7 Thinking | glm-4.7:thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| nanogpt | GLM 4.5 Air Thinking | glm-4.5-air:thinking | 1.00 | 2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 |
| inference | Mistral Nemo 12B Instruct | mistral-nemo-12b-instruct | 0.04 | 0.10 | Provider: Inference, Context: 16000, Output Limit: 4096 |
| inference | Google Gemma 3 | gemma-3 | 0.15 | 0.30 | Provider: Inference, Context: 125000, Output Limit: 4096 |
| inference | Osmosis Structure 0.6B | osmosis-structure-0.6b | 0.10 | 0.50 | Provider: Inference, Context: 4000, Output Limit: 2048 |
| inference | Qwen 3 Embedding 4B | qwen3-embedding-4b | 0.01 | 0.00 | Provider: Inference, Context: 32000, Output Limit: 2048 |
| inference | Qwen 2.5 7B Vision Instruct | qwen-2.5-7b-vision-instruct | 0.20 | 0.20 | Provider: Inference, Context: 125000, Output Limit: 4096 |
| inference | Llama 3.2 11B Vision Instruct | llama-3.2-11b-vision-instruct | 0.06 | 0.06 | Provider: Inference, Context: 16000, Output Limit: 4096 |
| inference | Llama 3.1 8B Instruct | llama-3.1-8b-instruct | 0.03 | 0.03 | Provider: Inference, Context: 16000, Output Limit: 4096 |
| inference | Llama 3.2 3B Instruct | llama-3.2-3b-instruct | 0.02 | 0.02 | Provider: Inference, Context: 16000, Output Limit: 4096 |
| inference | Llama 3.2 1B Instruct | llama-3.2-1b-instruct | 0.01 | 0.01 | Provider: Inference, Context: 16000, Output Limit: 4096 |
| requesty | Grok 4 | grok-4 | 3.00 | 15.00 | Provider: Requesty, Context: 256000, Output Limit: 64000 |
| requesty | Grok 4 Fast | grok-4-fast | 0.20 | 0.50 | Provider: Requesty, Context: 2000000, Output Limit: 64000 |
| requesty | Gemini 3 Flash | gemini-3-flash-preview | 0.50 | 3.00 | Provider: Requesty, Context: 1048576, Output Limit: 65536 |
| requesty | Gemini 3 Pro | gemini-3-pro-preview | 2.00 | 12.00 | Provider: Requesty, Context: 1048576, Output Limit: 65536 |
| requesty | Gemini 2.5 Flash | gemini-2.5-flash | 0.30 | 2.50 | Provider: Requesty, Context: 1048576, Output Limit: 65536 |
| requesty | Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 10.00 | Provider: Requesty, Context: 1048576, Output Limit: 65536 |
| requesty | GPT-4.1 Mini | gpt-4.1-mini | 0.40 | 1.60 | Provider: Requesty, Context: 1047576, Output Limit: 32768 |
| requesty | GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | Provider: Requesty, Context: 16000, Output Limit: 4000 |
| requesty | GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | Provider: Requesty, Context: 1047576, Output Limit: 32768 |
| requesty | o4 Mini | o4-mini | 1.10 | 4.40 | Provider: Requesty, Context: 200000, Output Limit: 100000 |
| requesty | GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | Provider: Requesty, Context: 128000, Output Limit: 32000 |
| requesty | GPT-4o Mini | gpt-4o-mini | 0.15 | 0.60 | Provider: Requesty, Context: 128000, Output Limit: 16384 |
| requesty | GPT-5 | gpt-5 | 1.25 | 10.00 | Provider: Requesty, Context: 400000, Output Limit: 128000 |
| requesty | Claude Opus 4 | claude-opus-4 | 15.00 | 75.00 | Provider: Requesty, Context: 200000, Output Limit: 32000 |
| requesty | Claude Opus 4.1 | claude-opus-4-1 | 15.00 | 75.00 | Provider: Requesty, Context: 200000, Output Limit: 32000 |
| requesty | Claude Haiku 4.5 | claude-haiku-4-5 | 1.00 | 5.00 | Provider: Requesty, Context: 200000, Output Limit: 62000 |
| requesty | Claude Opus 4.5 | claude-opus-4-5 | 5.00 | 25.00 | Provider: Requesty, Context: 200000, Output Limit: 64000 |
| requesty | Claude Sonnet 4.5 | claude-sonnet-4-5 | 3.00 | 15.00 | Provider: Requesty, Context: 1000000, Output Limit: 64000 |
| requesty | Claude Sonnet 3.7 | claude-3-7-sonnet | 3.00 | 15.00 | Provider: Requesty, Context: 200000, Output Limit: 64000 |
| requesty | Claude Sonnet 4 | claude-sonnet-4 | 3.00 | 15.00 | Provider: Requesty, Context: 200000, Output Limit: 64000 |
| morph | Morph v3 Large | morph-v3-large | 0.90 | 1.90 | Provider: Morph, Context: 32000, Output Limit: 32000 |
| morph | Auto | auto | 0.85 | 1.55 | Provider: Morph, Context: 32000, Output Limit: 32000 |
| morph | Morph v3 Fast | morph-v3-fast | 0.80 | 1.20 | Provider: Morph, Context: 16000, Output Limit: 16000 |
| lmstudio | GPT OSS 20B | gpt-oss-20b | 0.00 | 0.00 | Provider: LMStudio, Context: 131072, Output Limit: 32768 |
| lmstudio | Qwen3 30B A3B 2507 | qwen3-30b-a3b-2507 | 0.00 | 0.00 | Provider: LMStudio, Context: 262144, Output Limit: 16384 |
| lmstudio | Qwen3 Coder 30B | qwen3-coder-30b | 0.00 | 0.00 | Provider: LMStudio, Context: 262144, Output Limit: 65536 |
| friendli | Llama 3.3 70B Instruct | meta-llama-3.3-70b-instruct | 0.60 | 0.60 | Provider: Friendli, Context: 131072, Output Limit: 131072 |
| friendli | Llama 3.1 8B Instruct | meta-llama-3.1-8b-instruct | 0.10 | 0.10 | Provider: Friendli, Context: 131072, Output Limit: 8000 |
| friendli | EXAONE 4.0.1 32B | exaone-4.0.1-32b | 0.60 | 1.00 | Provider: Friendli, Context: 131072, Output Limit: 131072 |
| friendli | Llama 4 Maverick 17B 128E Instruct | llama-4-maverick-17b-128e-instruct | - | - | Provider: Friendli, Context: 131072, Output Limit: 8000 |
| friendli | Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | - | - | Provider: Friendli, Context: 131072, Output Limit: 8000 |
| friendli | Qwen3 30B A3B | qwen3-30b-a3b | - | - | Provider: Friendli, Context: 131072, Output Limit: 8000 |
| friendli | Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.20 | 0.80 | Provider: Friendli, Context: 131072, Output Limit: 131072 |
| friendli | Qwen3 32B | qwen3-32b | - | - | Provider: Friendli, Context: 131072, Output Limit: 8000 |
| friendli | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | - | - | Provider: Friendli, Context: 131072, Output Limit: 131072 |
| friendli | GLM 4.6 | glm-4.6 | - | - | Provider: Friendli, Context: 131072, Output Limit: 131072 |
| friendli | DeepSeek R1 0528 | deepseek-r1-0528 | - | - | Provider: Friendli, Context: 163840, Output Limit: 163840 |
| sapaicore | anthropic--claude-3.5-sonnet | anthropic--claude-3.5-sonnet | 3.00 | 15.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 8192 |
| sapaicore | anthropic--claude-4.5-haiku | anthropic--claude-4.5-haiku | 1.00 | 5.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 64000 |
| sapaicore | anthropic--claude-4-opus | anthropic--claude-4-opus | 15.00 | 75.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 32000 |
| sapaicore | gemini-2.5-flash | gemini-2.5-flash | 0.30 | 2.50 | Provider: SAP AI Core, Context: 1048576, Output Limit: 65536 |
| sapaicore | anthropic--claude-3-haiku | anthropic--claude-3-haiku | 0.25 | 1.25 | Provider: SAP AI Core, Context: 200000, Output Limit: 4096 |
| sapaicore | anthropic--claude-3-sonnet | anthropic--claude-3-sonnet | 3.00 | 15.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 4096 |
| sapaicore | gpt-5-nano | gpt-5-nano | 0.05 | 0.40 | Provider: SAP AI Core, Context: 400000, Output Limit: 128000 |
| sapaicore | anthropic--claude-3.7-sonnet | anthropic--claude-3.7-sonnet | 3.00 | 15.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 64000 |
| sapaicore | gpt-5-mini | gpt-5-mini | 0.25 | 2.00 | Provider: SAP AI Core, Context: 400000, Output Limit: 128000 |
| sapaicore | anthropic--claude-4.5-sonnet | anthropic--claude-4.5-sonnet | 3.00 | 15.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 64000 |
| sapaicore | gemini-2.5-pro | gemini-2.5-pro | 1.25 | 10.00 | Provider: SAP AI Core, Context: 1048576, Output Limit: 65536 |
| sapaicore | anthropic--claude-3-opus | anthropic--claude-3-opus | 15.00 | 75.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 4096 |
| sapaicore | anthropic--claude-4-sonnet | anthropic--claude-4-sonnet | 3.00 | 15.00 | Provider: SAP AI Core, Context: 200000, Output Limit: 64000 |
| sapaicore | gpt-5 | gpt-5 | 1.25 | 10.00 | Provider: SAP AI Core, Context: 400000, Output Limit: 128000 |
| anthropic | Claude Opus 4 (latest) | claude-opus-4-0 | 15.00 | 75.00 | Provider: Anthropic, Context: 200000, Output Limit: 32000 |
| anthropic | Claude Sonnet 3.5 v2 | claude-3-5-sonnet-20241022 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 8192 |
| anthropic | Claude Opus 4.1 (latest) | claude-opus-4-1 | 15.00 | 75.00 | Provider: Anthropic, Context: 200000, Output Limit: 32000 |
| anthropic | Claude Haiku 4.5 (latest) | claude-haiku-4-5 | 1.00 | 5.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 3.5 | claude-3-5-sonnet-20240620 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 8192 |
| anthropic | Claude Haiku 3.5 (latest) | claude-3-5-haiku-latest | 0.80 | 4.00 | Provider: Anthropic, Context: 200000, Output Limit: 8192 |
| anthropic | Claude Opus 4.5 (latest) | claude-opus-4-5 | 5.00 | 25.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Opus 3 | claude-3-opus-20240229 | 15.00 | 75.00 | Provider: Anthropic, Context: 200000, Output Limit: 4096 |
| anthropic | Claude Opus 4.5 | claude-opus-4-5-20251101 | 5.00 | 25.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 4.5 (latest) | claude-sonnet-4-5 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 4 | claude-sonnet-4-20250514 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Opus 4 | claude-opus-4-20250514 | 15.00 | 75.00 | Provider: Anthropic, Context: 200000, Output Limit: 32000 |
| anthropic | Claude Haiku 3.5 | claude-3-5-haiku-20241022 | 0.80 | 4.00 | Provider: Anthropic, Context: 200000, Output Limit: 8192 |
| anthropic | Claude Haiku 3 | claude-3-haiku-20240307 | 0.25 | 1.25 | Provider: Anthropic, Context: 200000, Output Limit: 4096 |
| anthropic | Claude Sonnet 3.7 | claude-3-7-sonnet-20250219 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 3.7 (latest) | claude-3-7-sonnet-latest | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Sonnet 4 (latest) | claude-sonnet-4-0 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| anthropic | Claude Opus 4.1 | claude-opus-4-1-20250805 | 15.00 | 75.00 | Provider: Anthropic, Context: 200000, Output Limit: 32000 |
| anthropic | Claude Sonnet 3 | claude-3-sonnet-20240229 | 3.00 | 15.00 | Provider: Anthropic, Context: 200000, Output Limit: 4096 |
| anthropic | Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 1.00 | 5.00 | Provider: Anthropic, Context: 200000, Output Limit: 64000 |
| aihubmix | GPT-4.1 nano | gpt-4.1-nano | 0.10 | 0.40 | Provider: AIHubMix, Context: 1047576, Output Limit: 32768 |
| aihubmix | GLM-4.7 | glm-4.7 | 0.27 | 1.10 | Provider: AIHubMix, Context: 204800, Output Limit: 131072 |
| aihubmix | Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.28 | 1.12 | Provider: AIHubMix, Context: 262144, Output Limit: 262144 |
| aihubmix | Claude Opus 4.1 | claude-opus-4-1 | 16.50 | 82.50 | Provider: AIHubMix, Context: 200000, Output Limit: 32000 |
| aihubmix | GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | Claude Haiku 4.5 | claude-haiku-4-5 | 1.10 | 5.50 | Provider: AIHubMix, Context: 200000, Output Limit: 64000 |
| aihubmix | Claude Opus 4.5 | claude-opus-4-5 | 5.00 | 25.00 | Provider: AIHubMix, Context: 200000, Output Limit: 32000 |
| aihubmix | Gemini 3 Pro Preview | gemini-3-pro-preview | 2.00 | 12.00 | Provider: AIHubMix, Context: 1000000, Output Limit: 65000 |
| aihubmix | Gemini 2.5 Flash | gemini-2.5-flash | 0.08 | 0.30 | Provider: AIHubMix, Context: 1000000, Output Limit: 65000 |
| aihubmix | GPT-4.1 mini | gpt-4.1-mini | 0.40 | 1.60 | Provider: AIHubMix, Context: 1047576, Output Limit: 32768 |
| aihubmix | Claude Sonnet 4.5 | claude-sonnet-4-5 | 3.30 | 16.50 | Provider: AIHubMix, Context: 200000, Output Limit: 64000 |
| aihubmix | Coding GLM-4.7 Free | coding-glm-4.7-free | 0.00 | 0.00 | Provider: AIHubMix, Context: 204800, Output Limit: 131072 |
| aihubmix | GPT-5.1 Codex Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | Qwen3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 | 0.28 | 2.80 | Provider: AIHubMix, Context: 262144, Output Limit: 262144 |
| aihubmix | GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | GPT-5-Nano | gpt-5-nano | 0.50 | 2.00 | Provider: AIHubMix, Context: 128000, Output Limit: 16384 |
| aihubmix | GPT-5-Codex | gpt-5-codex | 1.25 | 10.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | GPT-4o | gpt-4o | 2.50 | 10.00 | Provider: AIHubMix, Context: 128000, Output Limit: 16384 |
| aihubmix | GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | Provider: AIHubMix, Context: 1047576, Output Limit: 32768 |
| aihubmix | o4-mini | o4-mini | 1.50 | 6.00 | Provider: AIHubMix, Context: 200000, Output Limit: 65536 |
| aihubmix | GPT-5-Mini | gpt-5-mini | 1.50 | 6.00 | Provider: AIHubMix, Context: 200000, Output Limit: 64000 |
| aihubmix | Gemini 2.5 Pro | gemini-2.5-pro | 1.25 | 5.00 | Provider: AIHubMix, Context: 2000000, Output Limit: 65000 |
| aihubmix | GPT-4o (2024-11-20) | gpt-4o-2024-11-20 | 2.50 | 10.00 | Provider: AIHubMix, Context: 128000, Output Limit: 16384 |
| aihubmix | GPT-5.1-Codex-Max | gpt-5.1-codex-max | 1.25 | 10.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | MiniMax M2.1 Free | minimax-m2.1-free | 0.00 | 0.00 | Provider: AIHubMix, Context: 204800, Output Limit: 131072 |
| aihubmix | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.82 | 3.29 | Provider: AIHubMix, Context: 262144, Output Limit: 131000 |
| aihubmix | DeepSeek-V3.2-Think | deepseek-v3.2-think | 0.30 | 0.45 | Provider: AIHubMix, Context: 131000, Output Limit: 64000 |
| aihubmix | GPT-5 | gpt-5 | 5.00 | 20.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | MiniMax M2.1 | minimax-m2.1 | 0.29 | 1.15 | Provider: AIHubMix, Context: 204800, Output Limit: 131072 |
| aihubmix | DeepSeek-V3.2 | deepseek-v3.2 | 0.30 | 0.45 | Provider: AIHubMix, Context: 131000, Output Limit: 64000 |
| aihubmix | Kimi K2 0905 | kimi-k2-0905 | 0.55 | 2.19 | Provider: AIHubMix, Context: 262144, Output Limit: 262144 |
| aihubmix | GPT-5-Pro | gpt-5-pro | 7.00 | 28.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| aihubmix | GPT-5.2 | gpt-5.2 | 1.75 | 14.00 | Provider: AIHubMix, Context: 400000, Output Limit: 128000 |
| fireworksai | Deepseek R1 05/28 | deepseek-r1-0528 | 3.00 | 8.00 | Provider: Fireworks AI, Context: 160000, Output Limit: 16384 |
| fireworksai | DeepSeek V3.1 | deepseek-v3p1 | 0.56 | 1.68 | Provider: Fireworks AI, Context: 163840, Output Limit: 163840 |
| fireworksai | DeepSeek V3.2 | deepseek-v3p2 | 0.56 | 1.68 | Provider: Fireworks AI, Context: 160000, Output Limit: 160000 |
| fireworksai | MiniMax-M2 | minimax-m2 | 0.30 | 1.20 | Provider: Fireworks AI, Context: 192000, Output Limit: 192000 |
| fireworksai | MiniMax-M2.1 | minimax-m2p1 | 0.30 | 1.20 | Provider: Fireworks AI, Context: 200000, Output Limit: 200000 |
| fireworksai | GLM 4.7 | glm-4p7 | 0.60 | 2.20 | Provider: Fireworks AI, Context: 198000, Output Limit: 198000 |
| fireworksai | Deepseek V3 03-24 | deepseek-v3-0324 | 0.90 | 0.90 | Provider: Fireworks AI, Context: 160000, Output Limit: 16384 |
| fireworksai | GLM 4.6 | glm-4p6 | 0.55 | 2.19 | Provider: Fireworks AI, Context: 198000, Output Limit: 198000 |
| fireworksai | Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | Provider: Fireworks AI, Context: 256000, Output Limit: 256000 |
| fireworksai | Kimi K2 Instruct | kimi-k2-instruct | 1.00 | 3.00 | Provider: Fireworks AI, Context: 128000, Output Limit: 16384 |
| fireworksai | Qwen3 235B-A22B | qwen3-235b-a22b | 0.22 | 0.88 | Provider: Fireworks AI, Context: 128000, Output Limit: 16384 |
| fireworksai | GPT OSS 20B | gpt-oss-20b | 0.05 | 0.20 | Provider: Fireworks AI, Context: 131072, Output Limit: 32768 |
| fireworksai | GPT OSS 120B | gpt-oss-120b | 0.15 | 0.60 | Provider: Fireworks AI, Context: 131072, Output Limit: 32768 |
| fireworksai | GLM 4.5 Air | glm-4p5-air | 0.22 | 0.88 | Provider: Fireworks AI, Context: 131072, Output Limit: 131072 |
| fireworksai | Qwen3 Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | 0.45 | 1.80 | Provider: Fireworks AI, Context: 256000, Output Limit: 32768 |
| fireworksai | GLM 4.5 | glm-4p5 | 0.55 | 2.19 | Provider: Fireworks AI, Context: 131072, Output Limit: 131072 |
| ionet | Kimi K2 Instruct | kimi-k2-instruct-0905 | 0.39 | 1.90 | Provider: IO.NET, Context: 32768, Output Limit: 4096 |
| ionet | Kimi K2 Thinking | kimi-k2-thinking | 0.55 | 2.25 | Provider: IO.NET, Context: 32768, Output Limit: 4096 |
| ionet | GPT-OSS 20B | gpt-oss-20b | 0.03 | 0.14 | Provider: IO.NET, Context: 64000, Output Limit: 4096 |
| ionet | GPT-OSS 120B | gpt-oss-120b | 0.04 | 0.40 | Provider: IO.NET, Context: 131072, Output Limit: 4096 |
| ionet | Devstral Small 2505 | devstral-small-2505 | 0.05 | 0.22 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| ionet | Mistral Nemo Instruct 2407 | mistral-nemo-instruct-2407 | 0.02 | 0.04 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| ionet | Magistral Small 2506 | magistral-small-2506 | 0.50 | 1.50 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| ionet | Mistral Large Instruct 2411 | mistral-large-instruct-2411 | 2.00 | 6.00 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| ionet | Llama 3.3 70B Instruct | llama-3.3-70b-instruct | 0.13 | 0.38 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| ionet | Llama 4 Maverick 17B 128E Instruct | llama-4-maverick-17b-128e-instruct-fp8 | 0.15 | 0.60 | Provider: IO.NET, Context: 430000, Output Limit: 4096 |
| ionet | Llama 3.2 90B Vision Instruct | llama-3.2-90b-vision-instruct | 0.35 | 0.40 | Provider: IO.NET, Context: 16000, Output Limit: 4096 |
| ionet | Qwen 3 Coder 480B | qwen3-coder-480b-a35b-instruct-int4-mixed-ar | 0.22 | 0.95 | Provider: IO.NET, Context: 106000, Output Limit: 4096 |
| ionet | Qwen 2.5 VL 32B Instruct | qwen2.5-vl-32b-instruct | 0.05 | 0.22 | Provider: IO.NET, Context: 32000, Output Limit: 4096 |
| ionet | Qwen 3 235B Thinking | qwen3-235b-a22b-thinking-2507 | 0.11 | 0.60 | Provider: IO.NET, Context: 262144, Output Limit: 4096 |
| ionet | Qwen 3 Next 80B Instruct | qwen3-next-80b-a3b-instruct | 0.10 | 0.80 | Provider: IO.NET, Context: 262144, Output Limit: 4096 |
| ionet | GLM 4.6 | glm-4.6 | 0.40 | 1.75 | Provider: IO.NET, Context: 200000, Output Limit: 4096 |
| ionet | DeepSeek R1 | deepseek-r1-0528 | 2.00 | 8.75 | Provider: IO.NET, Context: 128000, Output Limit: 4096 |
| modelscope | GLM-4.5 | glm-4.5 | 0.00 | 0.00 | Provider: ModelScope, Context: 131072, Output Limit: 98304 |
| modelscope | GLM-4.6 | glm-4.6 | 0.00 | 0.00 | Provider: ModelScope, Context: 202752, Output Limit: 98304 |
| modelscope | Qwen3 30B A3B Thinking 2507 | qwen3-30b-a3b-thinking-2507 | 0.00 | 0.00 | Provider: ModelScope, Context: 262144, Output Limit: 32768 |
| modelscope | Qwen3 235B A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | 0.00 | 0.00 | Provider: ModelScope, Context: 262144, Output Limit: 131072 |
| modelscope | Qwen3 Coder 30B A3B Instruct | qwen3-coder-30b-a3b-instruct | 0.00 | 0.00 | Provider: ModelScope, Context: 262144, Output Limit: 65536 |
| modelscope | Qwen3 30B A3B Instruct 2507 | qwen3-30b-a3b-instruct-2507 | 0.00 | 0.00 | Provider: ModelScope, Context: 262144, Output Limit: 16384 |
| modelscope | Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | 0.00 | 0.00 | Provider: ModelScope, Context: 262144, Output Limit: 131072 |
| azurecognitiveservices | GPT-3.5 Turbo 1106 | gpt-3.5-turbo-1106 | 1.00 | 2.00 | Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384 |
| azurecognitiveservices | Mistral Small 3.1 | mistral-small-2503 | 0.10 | 0.30 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | Codestral 25.01 | codestral-2501 | 0.30 | 0.90 | Provider: Azure Cognitive Services, Context: 256000, Output Limit: 256000 |
| azurecognitiveservices | Mistral Large 24.11 | mistral-large-2411 | 2.00 | 6.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | GPT-5 Pro | gpt-5-pro | 15.00 | 120.00 | Provider: Azure Cognitive Services, Context: 400000, Output Limit: 272000 |
| azurecognitiveservices | DeepSeek-V3.2 | deepseek-v3.2 | 0.28 | 0.42 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000 |
| azurecognitiveservices | MAI-DS-R1 | mai-ds-r1 | 1.35 | 5.40 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | GPT-5 | gpt-5 | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000 |
| azurecognitiveservices | GPT-4o mini | gpt-4o-mini | 0.15 | 0.60 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | Phi-4-reasoning-plus | phi-4-reasoning-plus | 0.13 | 0.50 | Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096 |
| azurecognitiveservices | GPT-4 Turbo Vision | gpt-4-turbo-vision | 10.00 | 30.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Phi-4-reasoning | phi-4-reasoning | 0.13 | 0.50 | Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096 |
| azurecognitiveservices | Phi-3-medium-instruct (4k) | phi-3-medium-4k-instruct | 0.17 | 0.68 | Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024 |
| azurecognitiveservices | Codex Mini | codex-mini | 1.50 | 6.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000 |
| azurecognitiveservices | o3 | o3 | 2.00 | 8.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000 |
| azurecognitiveservices | Mistral Nemo | mistral-nemo | 0.15 | 0.15 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000 |
| azurecognitiveservices | GPT-3.5 Turbo Instruct | gpt-3.5-turbo-instruct | 1.50 | 2.00 | Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096 |
| azurecognitiveservices | Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.30 | 0.61 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | text-embedding-ada-002 | text-embedding-ada-002 | 0.10 | 0.00 | Provider: Azure Cognitive Services, Context: 8192, Output Limit: 1536 |
| azurecognitiveservices | Embed v3 English | cohere-embed-v3-english | 0.10 | 0.00 | Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024 |
| azurecognitiveservices | Llama 4 Scout 17B 16E Instruct | llama-4-scout-17b-16e-instruct | 0.20 | 0.78 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | o1-mini | o1-mini | 1.10 | 4.40 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 65536 |
| azurecognitiveservices | GPT-5 Mini | gpt-5-mini | 0.25 | 2.00 | Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000 |
| azurecognitiveservices | Phi-3.5-MoE-instruct | phi-3.5-moe-instruct | 0.16 | 0.64 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | GPT-5.1 Chat | gpt-5.1-chat | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | Grok 3 Mini | grok-3-mini | 0.30 | 0.50 | Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192 |
| azurecognitiveservices | o1 | o1 | 15.00 | 60.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000 |
| azurecognitiveservices | Meta-Llama-3-8B-Instruct | meta-llama-3-8b-instruct | 0.30 | 0.61 | Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048 |
| azurecognitiveservices | Phi-4-multimodal | phi-4-multimodal | 0.08 | 0.32 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | o4-mini | o4-mini | 1.10 | 4.40 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000 |
| azurecognitiveservices | GPT-4.1 | gpt-4.1 | 2.00 | 8.00 | Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768 |
| azurecognitiveservices | Ministral 3B | ministral-3b | 0.04 | 0.04 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | GPT-3.5 Turbo 0301 | gpt-3.5-turbo-0301 | 1.50 | 2.00 | Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096 |
| azurecognitiveservices | GPT-4o | gpt-4o | 2.50 | 10.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | Phi-3-mini-instruct (128k) | phi-3-mini-128k-instruct | 0.13 | 0.52 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Llama-3.2-90B-Vision-Instruct | llama-3.2-90b-vision-instruct | 2.04 | 2.04 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | GPT-5-Codex | gpt-5-codex | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000 |
| azurecognitiveservices | GPT-5 Nano | gpt-5-nano | 0.05 | 0.40 | Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000 |
| azurecognitiveservices | GPT-5.1 | gpt-5.1 | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000 |
| azurecognitiveservices | o3-mini | o3-mini | 1.10 | 4.40 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000 |
| azurecognitiveservices | Model Router | model-router | 0.14 | 0.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | Provider: Azure Cognitive Services, Context: 262144, Output Limit: 262144 |
| azurecognitiveservices | GPT-5.1 Codex Mini | gpt-5.1-codex-mini | 0.25 | 2.00 | Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000 |
| azurecognitiveservices | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.71 | 0.71 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | o1-preview | o1-preview | 16.50 | 66.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | Phi-3.5-mini-instruct | phi-3.5-mini-instruct | 0.13 | 0.52 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | GPT-3.5 Turbo 0613 | gpt-3.5-turbo-0613 | 3.00 | 4.00 | Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384 |
| azurecognitiveservices | GPT-4 Turbo | gpt-4-turbo | 10.00 | 30.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Meta-Llama-3.1-70B-Instruct | meta-llama-3.1-70b-instruct | 2.68 | 3.54 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | Phi-3-small-instruct (8k) | phi-3-small-8k-instruct | 0.15 | 0.60 | Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048 |
| azurecognitiveservices | DeepSeek-V3-0324 | deepseek-v3-0324 | 1.14 | 4.56 | Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072 |
| azurecognitiveservices | Meta-Llama-3-70B-Instruct | meta-llama-3-70b-instruct | 2.68 | 3.54 | Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048 |
| azurecognitiveservices | text-embedding-3-large | text-embedding-3-large | 0.13 | 0.00 | Provider: Azure Cognitive Services, Context: 8191, Output Limit: 3072 |
| azurecognitiveservices | Grok 3 | grok-3 | 3.00 | 15.00 | Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192 |
| azurecognitiveservices | GPT-3.5 Turbo 0125 | gpt-3.5-turbo-0125 | 0.50 | 1.50 | Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384 |
| azurecognitiveservices | Claude Sonnet 4.5 | claude-sonnet-4-5 | 3.00 | 15.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000 |
| azurecognitiveservices | Phi-4-mini-reasoning | phi-4-mini-reasoning | 0.08 | 0.30 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Phi-4 | phi-4 | 0.13 | 0.50 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | DeepSeek-V3.1 | deepseek-v3.1 | 0.56 | 1.68 | Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072 |
| azurecognitiveservices | GPT-5 Chat | gpt-5-chat | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | GPT-4.1 mini | gpt-4.1-mini | 0.40 | 1.60 | Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768 |
| azurecognitiveservices | Llama 4 Maverick 17B 128E Instruct FP8 | llama-4-maverick-17b-128e-instruct-fp8 | 0.25 | 1.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | Command R+ | cohere-command-r-plus-08-2024 | 2.50 | 10.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000 |
| azurecognitiveservices | Command A | cohere-command-a | 2.50 | 10.00 | Provider: Azure Cognitive Services, Context: 256000, Output Limit: 8000 |
| azurecognitiveservices | Phi-3-small-instruct (128k) | phi-3-small-128k-instruct | 0.15 | 0.60 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Claude Opus 4.5 | claude-opus-4-5 | 5.00 | 25.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000 |
| azurecognitiveservices | Mistral Medium 3 | mistral-medium-2505 | 0.40 | 2.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000 |
| azurecognitiveservices | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | 0.28 | 0.42 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000 |
| azurecognitiveservices | Claude Haiku 4.5 | claude-haiku-4-5 | 1.00 | 5.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000 |
| azurecognitiveservices | Phi-3-mini-instruct (4k) | phi-3-mini-4k-instruct | 0.13 | 0.52 | Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024 |
| azurecognitiveservices | GPT-5.1 Codex | gpt-5.1-codex | 1.25 | 10.00 | Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000 |
| azurecognitiveservices | Grok Code Fast 1 | grok-code-fast-1 | 0.20 | 1.50 | Provider: Azure Cognitive Services, Context: 256000, Output Limit: 10000 |
| azurecognitiveservices | DeepSeek-R1 | deepseek-r1 | 1.35 | 5.40 | Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840 |
| azurecognitiveservices | Meta-Llama-3.1-405B-Instruct | meta-llama-3.1-405b-instruct | 5.33 | 16.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768 |
| azurecognitiveservices | GPT-4 32K | gpt-4-32k | 60.00 | 120.00 | Provider: Azure Cognitive Services, Context: 32768, Output Limit: 32768 |
| azurecognitiveservices | Phi-4-mini | phi-4-mini | 0.08 | 0.30 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Embed v3 Multilingual | cohere-embed-v3-multilingual | 0.10 | 0.00 | Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024 |
| azurecognitiveservices | Grok 4 | grok-4 | 3.00 | 15.00 | Provider: Azure Cognitive Services, Context: 256000, Output Limit: 64000 |
| azurecognitiveservices | Command R | cohere-command-r-08-2024 | 0.15 | 0.60 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000 |
| azurecognitiveservices | Embed v4 | cohere-embed-v-4-0 | 0.12 | 0.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 1536 |
| azurecognitiveservices | Llama-3.2-11B-Vision-Instruct | llama-3.2-11b-vision-instruct | 0.37 | 0.37 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192 |
| azurecognitiveservices | GPT-5.2 Chat | gpt-5.2-chat | 1.75 | 14.00 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384 |
| azurecognitiveservices | Claude Opus 4.1 | claude-opus-4-1 | 15.00 | 75.00 | Provider: Azure Cognitive Services, Context: 200000, Output Limit: 32000 |
| azurecognitiveservices | GPT-4 | gpt-4 | 60.00 | 120.00 | Provider: Azure Cognitive Services, Context: 8192, Output Limit: 8192 |
| azurecognitiveservices | Phi-3-medium-instruct (128k) | phi-3-medium-128k-instruct | 0.17 | 0.68 | Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096 |
| azurecognitiveservices | Grok 4 Fast (Reasoning) | grok-4-fast-reasoning | 0.20 | 0.50 | Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000 |
| azurecognitiveservices | DeepSeek-R1-0528 | deepseek-r1-0528 | 1.35 | 5.40 | Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840 |
| azurecognitiveservices | Grok 4 Fast (Non-Reasoning) | grok-4-fast-non-reasoning | 0.20 | 0.50 | Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000 |
| azurecognitiveservices | text-embedding-3-small | text-embedding-3-small | 0.02 | 0.00 | Provider: Azure Cognitive Services, Context: 8191, Output Limit: 1536 |
| azurecognitiveservices | GPT-4.1 nano | gpt-4.1-nano | 0.10 | 0.40 | Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768 |
| llama | Llama-3.3-8B-Instruct | llama-3.3-8b-instruct | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Llama-4-Maverick-17B-128E-Instruct-FP8 | llama-4-maverick-17b-128e-instruct-fp8 | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Llama-4-Scout-17B-16E-Instruct-FP8 | llama-4-scout-17b-16e-instruct-fp8 | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Groq-Llama-4-Maverick-17B-128E-Instruct | groq-llama-4-maverick-17b-128e-instruct | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Cerebras-Llama-4-Scout-17B-16E-Instruct | cerebras-llama-4-scout-17b-16e-instruct | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
| llama | Cerebras-Llama-4-Maverick-17B-128E-Instruct | cerebras-llama-4-maverick-17b-128e-instruct | 0.00 | 0.00 | Provider: Llama, Context: 128000, Output Limit: 4096 |
scaleway
|
Qwen3 235B A22B Instruct 2507 |
qwen3-235b-a22b-instruct-2507
|
0.75 |
2.25 |
Provider: Scaleway, Context: 260000, Output Limit: 8192
|
|
|
scaleway
|
Pixtral 12B 2409 |
pixtral-12b-2409
|
0.20 |
0.20 |
Provider: Scaleway, Context: 128000, Output Limit: 4096
|
|
|
scaleway
|
Llama 3.1 8B Instruct |
llama-3.1-8b-instruct
|
0.20 |
0.20 |
Provider: Scaleway, Context: 128000, Output Limit: 16384
|
|
|
scaleway
|
Mistral Nemo Instruct 2407 |
mistral-nemo-instruct-2407
|
0.20 |
0.20 |
Provider: Scaleway, Context: 128000, Output Limit: 8192
|
|
|
scaleway
|
Mistral Small 3.2 24B Instruct (2506) |
mistral-small-3.2-24b-instruct-2506
|
0.15 |
0.35 |
Provider: Scaleway, Context: 128000, Output Limit: 8192
|
|
|
scaleway
|
Qwen3-Coder 30B-A3B Instruct |
qwen3-coder-30b-a3b-instruct
|
0.20 |
0.80 |
Provider: Scaleway, Context: 128000, Output Limit: 8192
|
|
|
scaleway
|
Llama-3.3-70B-Instruct |
llama-3.3-70b-instruct
|
0.90 |
0.90 |
Provider: Scaleway, Context: 100000, Output Limit: 4096
|
|
|
scaleway
|
Whisper Large v3 |
whisper-large-v3
|
0.00 |
0.00 |
Provider: Scaleway, Context: N/A, Output Limit: 4096
|
|
|
scaleway
|
DeepSeek R1 Distill Llama 70B |
deepseek-r1-distill-llama-70b
|
0.90 |
0.90 |
Provider: Scaleway, Context: 32000, Output Limit: 4096
|
|
|
scaleway
|
Voxtral Small 24B 2507 |
voxtral-small-24b-2507
|
0.15 |
0.35 |
Provider: Scaleway, Context: 32000, Output Limit: 8192
|
|
|
scaleway
|
GPT-OSS 120B |
gpt-oss-120b
|
0.15 |
0.60 |
Provider: Scaleway, Context: 128000, Output Limit: 8192
|
|
|
scaleway
|
BGE Multilingual Gemma2 |
bge-multilingual-gemma2
|
0.13 |
0.00 |
Provider: Scaleway, Context: 8191, Output Limit: 3072
|
|
|
scaleway
|
Gemma-3-27B-IT |
gemma-3-27b-it
|
0.25 |
0.50 |
Provider: Scaleway, Context: 40000, Output Limit: 8192
|
|
|
amazonbedrock
|
Command R+ |
cohere.command-r-plus-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude 2 |
anthropic.claude-v2
|
8.00 |
24.00 |
Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Sonnet 3.7 |
anthropic.claude-3-7-sonnet-20250219-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
|
|
|
amazonbedrock
|
Claude Sonnet 4 |
anthropic.claude-sonnet-4-20250514-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
|
|
|
amazonbedrock
|
Qwen3 Coder 30B A3B Instruct |
qwen.qwen3-coder-30b-a3b-v1:0
|
0.15 |
0.60 |
Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
|
|
|
amazonbedrock
|
Gemma 3 4B IT |
google.gemma-3-4b-it
|
0.04 |
0.08 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
MiniMax M2 |
minimax.minimax-m2
|
0.30 |
1.20 |
Provider: Amazon Bedrock, Context: 204608, Output Limit: 128000
|
|
|
amazonbedrock
|
Llama 3.2 11B Instruct |
meta.llama3-2-11b-instruct-v1:0
|
0.16 |
0.16 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Qwen/Qwen3-Next-80B-A3B-Instruct |
qwen.qwen3-next-80b-a3b
|
0.14 |
1.40 |
Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
|
|
|
amazonbedrock
|
Claude Haiku 3 |
anthropic.claude-3-haiku-20240307-v1:0
|
0.25 |
1.25 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 3.2 90B Instruct |
meta.llama3-2-90b-instruct-v1:0
|
0.72 |
0.72 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Qwen/Qwen3-VL-235B-A22B-Instruct |
qwen.qwen3-vl-235b-a22b
|
0.30 |
1.50 |
Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
|
|
|
amazonbedrock
|
Llama 3.2 1B Instruct |
meta.llama3-2-1b-instruct-v1:0
|
0.10 |
0.10 |
Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude 2.1 |
anthropic.claude-v2:1
|
8.00 |
24.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
|
|
|
amazonbedrock
|
DeepSeek-V3.1 |
deepseek.v3-v1:0
|
0.58 |
1.68 |
Provider: Amazon Bedrock, Context: 163840, Output Limit: 81920
|
|
|
amazonbedrock
|
Claude Opus 4.5 |
anthropic.claude-opus-4-5-20251101-v1:0
|
5.00 |
25.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
|
|
|
amazonbedrock
|
Command Light |
cohere.command-light-text-v14
|
0.30 |
0.60 |
Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
|
|
|
amazonbedrock
|
Mistral Large (24.02) |
mistral.mistral-large-2402-v1:0
|
0.50 |
1.50 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Google Gemma 3 27B Instruct |
google.gemma-3-27b-it
|
0.12 |
0.20 |
Provider: Amazon Bedrock, Context: 202752, Output Limit: 8192
|
|
|
amazonbedrock
|
NVIDIA Nemotron Nano 12B v2 VL BF16 |
nvidia.nemotron-nano-12b-v2
|
0.20 |
0.60 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Google Gemma 3 12B |
google.gemma-3-12b-it
|
0.05 |
0.10 |
Provider: Amazon Bedrock, Context: 131072, Output Limit: 8192
|
|
|
amazonbedrock
|
Jamba 1.5 Large |
ai21.jamba-1-5-large-v1:0
|
2.00 |
8.00 |
Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 3.3 70B Instruct |
meta.llama3-3-70b-instruct-v1:0
|
0.72 |
0.72 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Opus 3 |
anthropic.claude-3-opus-20240229-v1:0
|
15.00 |
75.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
|
|
|
amazonbedrock
|
Nova Pro |
amazon.nova-pro-v1:0
|
0.80 |
3.20 |
Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
|
|
|
amazonbedrock
|
Llama 3.1 8B Instruct |
meta.llama3-1-8b-instruct-v1:0
|
0.22 |
0.22 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
gpt-oss-120b |
openai.gpt-oss-120b-1:0
|
0.15 |
0.60 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Qwen3 32B (dense) |
qwen.qwen3-32b-v1:0
|
0.15 |
0.60 |
Provider: Amazon Bedrock, Context: 16384, Output Limit: 16384
|
|
|
amazonbedrock
|
Claude Sonnet 3.5 |
anthropic.claude-3-5-sonnet-20240620-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
|
|
|
amazonbedrock
|
Claude Haiku 4.5 |
anthropic.claude-haiku-4-5-20251001-v1:0
|
1.00 |
5.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
|
|
|
amazonbedrock
|
Command R |
cohere.command-r-v1:0
|
0.50 |
1.50 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Voxtral Small 24B 2507 |
mistral.voxtral-small-24b-2507
|
0.15 |
0.35 |
Provider: Amazon Bedrock, Context: 32000, Output Limit: 8192
|
|
|
amazonbedrock
|
Nova Micro |
amazon.nova-micro-v1:0
|
0.04 |
0.14 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 8192
|
|
|
amazonbedrock
|
Llama 3.1 70B Instruct |
meta.llama3-1-70b-instruct-v1:0
|
0.72 |
0.72 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 3 70B Instruct |
meta.llama3-70b-instruct-v1:0
|
2.65 |
3.50 |
Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
|
|
|
amazonbedrock
|
DeepSeek-R1 |
deepseek.r1-v1:0
|
1.35 |
5.40 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 32768
|
|
|
amazonbedrock
|
Claude Sonnet 3.5 v2 |
anthropic.claude-3-5-sonnet-20241022-v2:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
|
|
|
amazonbedrock
|
Ministral 3 8B |
mistral.ministral-3-8b-instruct
|
0.15 |
0.15 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Command |
cohere.command-text-v14
|
1.50 |
2.00 |
Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Opus 4 |
anthropic.claude-opus-4-20250514-v1:0
|
15.00 |
75.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
|
|
|
amazonbedrock
|
Voxtral Mini 3B 2507 |
mistral.voxtral-mini-3b-2507
|
0.04 |
0.04 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Opus 4.5 (Global) |
global.anthropic.claude-opus-4-5-20251101-v1:0
|
5.00 |
25.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
|
|
|
amazonbedrock
|
Nova 2 Lite |
amazon.nova-2-lite-v1:0
|
0.33 |
2.75 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Qwen3 Coder 480B A35B Instruct |
qwen.qwen3-coder-480b-a35b-v1:0
|
0.22 |
1.80 |
Provider: Amazon Bedrock, Context: 131072, Output Limit: 65536
|
|
|
amazonbedrock
|
Claude Sonnet 4.5 |
anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
|
|
|
amazonbedrock
|
GPT OSS Safeguard 20B |
openai.gpt-oss-safeguard-20b
|
0.07 |
0.20 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
gpt-oss-20b |
openai.gpt-oss-20b-1:0
|
0.07 |
0.30 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 3.2 3B Instruct |
meta.llama3-2-3b-instruct-v1:0
|
0.15 |
0.15 |
Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Instant |
anthropic.claude-instant-v1
|
0.80 |
2.40 |
Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
|
|
|
amazonbedrock
|
Nova Premier |
amazon.nova-premier-v1:0
|
2.50 |
12.50 |
Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
|
|
|
amazonbedrock
|
Mistral-7B-Instruct-v0.3 |
mistral.mistral-7b-instruct-v0:2
|
0.11 |
0.11 |
Provider: Amazon Bedrock, Context: 127000, Output Limit: 127000
|
|
|
amazonbedrock
|
Mixtral-8x7B-Instruct-v0.1 |
mistral.mixtral-8x7b-instruct-v0:1
|
0.70 |
0.70 |
Provider: Amazon Bedrock, Context: 32000, Output Limit: 32000
|
|
|
amazonbedrock
|
Claude Opus 4.1 |
anthropic.claude-opus-4-1-20250805-v1:0
|
15.00 |
75.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
|
|
|
amazonbedrock
|
Llama 4 Scout 17B Instruct |
meta.llama4-scout-17b-instruct-v1:0
|
0.17 |
0.66 |
Provider: Amazon Bedrock, Context: 3500000, Output Limit: 16384
|
|
|
amazonbedrock
|
Jamba 1.5 Mini |
ai21.jamba-1-5-mini-v1:0
|
0.20 |
0.40 |
Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 3 8B Instruct |
meta.llama3-8b-instruct-v1:0
|
0.30 |
0.60 |
Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
|
|
|
amazonbedrock
|
Titan Text G1 - Express |
amazon.titan-text-express-v1:0:8k
|
0.20 |
0.60 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Claude Sonnet 3 |
anthropic.claude-3-sonnet-20240229-v1:0
|
3.00 |
15.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
|
|
|
amazonbedrock
|
NVIDIA Nemotron Nano 9B v2 |
nvidia.nemotron-nano-9b-v2
|
0.06 |
0.23 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Titan Text G1 - Express |
amazon.titan-text-express-v1
|
0.20 |
0.60 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Llama 4 Maverick 17B Instruct |
meta.llama4-maverick-17b-instruct-v1:0
|
0.24 |
0.97 |
Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
|
|
|
amazonbedrock
|
Ministral 3 14B |
mistral.ministral-3-14b-instruct
|
0.20 |
0.20 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
GPT OSS Safeguard 120B |
openai.gpt-oss-safeguard-120b
|
0.15 |
0.60 |
Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
|
|
|
amazonbedrock
|
Qwen3 235B A22B 2507 |
qwen.qwen3-235b-a22b-2507-v1:0
|
0.22 |
0.88 |
Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
|
|
|
amazonbedrock
|
Nova Lite |
amazon.nova-lite-v1:0
|
0.06 |
0.24 |
Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
|
|
|
amazonbedrock
|
Claude Haiku 3.5 |
anthropic.claude-3-5-haiku-20241022-v1:0
|
0.80 |
4.00 |
Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
|
|
|
amazonbedrock
|
Kimi K2 Thinking |
moonshot.kimi-k2-thinking
|
0.60 |
2.50 |
Provider: Amazon Bedrock, Context: 256000, Output Limit: 256000
|
|
|
cerebras
|
Qwen 3 235B Instruct |
qwen-3-235b-a22b-instruct-2507
|
0.60 |
1.20 |
Provider: Cerebras, Context: 131000, Output Limit: 32000
|
|
|
cerebras
|
zai-glm-4.6 |
zai-glm-4.6
|
2.25 |
2.75 |
Source: cerebras, Context: 128000
|
|
|
cerebras
|
GPT OSS 120B |
gpt-oss-120b
|
0.25 |
0.69 |
Provider: Cerebras, Context: 131072, Output Limit: 32768
|
|
|
bedrock
|
amazon.nova-canvas-v1:0 |
amazon.nova-canvas-v1:0
|
0.00 |
0.00 |
Source: bedrock, Context: 2600
|
|
|
bedrock
|
stability.stable-diffusion-xl-v1 |
stability.stable-diffusion-xl-v1
|
0.00 |
0.00 |
Source: bedrock, Context: 77
|
|
|
openai
|
dall-e-2 |
dall-e-2
|
0.00 |
0.00 |
Source: openai, Context: N/A
|
|
|
bedrock
|
stability.stable-diffusion-xl-v0 |
stability.stable-diffusion-xl-v0
|
0.00 |
0.00 |
Source: bedrock, Context: 77
|
|
|
bedrock
|
ai21.j2-mid-v1 |
ai21.j2-mid-v1
|
12.50 |
12.50 |
Source: bedrock, Context: 8191
|
|
|
bedrock
|
ai21.j2-ultra-v1 |
ai21.j2-ultra-v1
|
18.80 |
18.80 |
Source: bedrock, Context: 8191
|
|
|
bedrock
|
ai21.jamba-1-5-large-v1:0 |
ai21.jamba-1-5-large-v1:0
|
2.00 |
8.00 |
Source: bedrock, Context: 256000
|
|
|
bedrock
|
ai21.jamba-1-5-mini-v1:0 |
ai21.jamba-1-5-mini-v1:0
|
0.20 |
0.40 |
Source: bedrock, Context: 256000
|
|
|
bedrock
|
ai21.jamba-instruct-v1:0 |
ai21.jamba-instruct-v1:0
|
0.50 |
0.70 |
Source: bedrock, Context: 70000
|
|
|
aiml
|
dall-e-2 |
dall-e-2
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
dall-e-3 |
dall-e-3
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
flux-pro |
flux-pro
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
v1.1 |
v1.1
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
v1.1-ultra |
v1.1-ultra
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
flux-realism |
flux-realism
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
dev |
dev
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
text-to-image |
text-to-image
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
schnell |
schnell
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
imagen-4.0-ultra-generate-001 |
imagen-4.0-ultra-generate-001
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
aiml
|
nano-banana-pro |
nano-banana-pro
|
0.00 |
0.00 |
Source: aiml, Context: N/A
|
|
|
bedrockconverse
|
us.writer.palmyra-x4-v1:0 |
us.writer.palmyra-x4-v1:0
|
2.50 |
10.00 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
us.writer.palmyra-x5-v1:0 |
us.writer.palmyra-x5-v1:0
|
0.60 |
6.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
writer.palmyra-x4-v1:0 |
writer.palmyra-x4-v1:0
|
2.50 |
10.00 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
writer.palmyra-x5-v1:0 |
writer.palmyra-x5-v1:0
|
0.60 |
6.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
amazon.nova-lite-v1:0 |
amazon.nova-lite-v1:0
|
0.06 |
0.24 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrockconverse
|
amazon.nova-2-lite-v1:0 |
amazon.nova-2-lite-v1:0
|
0.30 |
2.50 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
apac.amazon.nova-2-lite-v1:0 |
apac.amazon.nova-2-lite-v1:0
|
0.33 |
2.75 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
eu.amazon.nova-2-lite-v1:0 |
eu.amazon.nova-2-lite-v1:0
|
0.33 |
2.75 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
us.amazon.nova-2-lite-v1:0 |
us.amazon.nova-2-lite-v1:0
|
0.33 |
2.75 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
amazon.nova-micro-v1:0 |
amazon.nova-micro-v1:0
|
0.04 |
0.14 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
amazon.nova-pro-v1:0 |
amazon.nova-pro-v1:0
|
0.80 |
3.20 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrock
|
amazon.rerank-v1:0 |
amazon.rerank-v1:0
|
0.00 |
0.00 |
Source: bedrock, Context: 32000
|
|
|
bedrock
|
amazon.titan-embed-image-v1 |
amazon.titan-embed-image-v1
|
0.80 |
0.00 |
Source: bedrock, Context: 128
|
|
|
bedrock
|
amazon.titan-embed-text-v1 |
amazon.titan-embed-text-v1
|
0.10 |
0.00 |
Source: bedrock, Context: 8192
|
|
|
bedrock
|
amazon.titan-embed-text-v2:0 |
amazon.titan-embed-text-v2:0
|
0.20 |
0.00 |
Source: bedrock, Context: 8192
|
|
|
bedrock
|
amazon.titan-image-generator-v1 |
amazon.titan-image-generator-v1
|
0.00 |
0.00 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
amazon.titan-image-generator-v2 |
amazon.titan-image-generator-v2
|
0.00 |
0.00 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
amazon.titan-image-generator-v2:0 |
amazon.titan-image-generator-v2:0
|
0.00 |
0.00 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
twelvelabs.marengo-embed-2-7-v1:0 |
twelvelabs.marengo-embed-2-7-v1:0
|
70.00 |
0.00 |
Source: bedrock, Context: 77
|
|
|
bedrock
|
us.twelvelabs.marengo-embed-2-7-v1:0 |
us.twelvelabs.marengo-embed-2-7-v1:0
|
70.00 |
0.00 |
Source: bedrock, Context: 77
|
|
|
bedrock
|
eu.twelvelabs.marengo-embed-2-7-v1:0 |
eu.twelvelabs.marengo-embed-2-7-v1:0
|
70.00 |
0.00 |
Source: bedrock, Context: 77
|
|
|
bedrock
|
twelvelabs.pegasus-1-2-v1:0 |
twelvelabs.pegasus-1-2-v1:0
|
0.00 |
7.50 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
us.twelvelabs.pegasus-1-2-v1:0 |
us.twelvelabs.pegasus-1-2-v1:0
|
0.00 |
7.50 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
eu.twelvelabs.pegasus-1-2-v1:0 |
eu.twelvelabs.pegasus-1-2-v1:0
|
0.00 |
7.50 |
Source: bedrock, Context: N/A
|
|
|
bedrock
|
amazon.titan-text-express-v1 |
amazon.titan-text-express-v1
|
1.30 |
1.70 |
Source: bedrock, Context: 42000
|
|
|
bedrock
|
amazon.titan-text-lite-v1 |
amazon.titan-text-lite-v1
|
0.30 |
0.40 |
Source: bedrock, Context: 42000
|
|
|
bedrock
|
amazon.titan-text-premier-v1:0 |
amazon.titan-text-premier-v1:0
|
0.50 |
1.50 |
Source: bedrock, Context: 42000
|
|
|
bedrock
|
anthropic.claude-3-5-haiku-20241022-v1:0 |
anthropic.claude-3-5-haiku-20241022-v1:0
|
0.80 |
4.00 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-haiku-4-5-20251001-v1:0 |
anthropic.claude-haiku-4-5-20251001-v1:0
|
1.00 |
5.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-haiku-4-5@20251001 |
anthropic.claude-haiku-4-5@20251001
|
1.00 |
5.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-5-sonnet-20240620-v1:0 |
anthropic.claude-3-5-sonnet-20240620-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-5-sonnet-20241022-v2:0 |
anthropic.claude-3-5-sonnet-20241022-v2:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-7-sonnet-20240620-v1:0 |
anthropic.claude-3-7-sonnet-20240620-v1:0
|
3.60 |
18.00 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-3-7-sonnet-20250219-v1:0 |
anthropic.claude-3-7-sonnet-20250219-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-haiku-20240307-v1:0 |
anthropic.claude-3-haiku-20240307-v1:0
|
0.25 |
1.25 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-opus-20240229-v1:0 |
anthropic.claude-3-opus-20240229-v1:0
|
15.00 |
75.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-sonnet-20240229-v1:0 |
anthropic.claude-3-sonnet-20240229-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-instant-v1 |
anthropic.claude-instant-v1
|
0.80 |
2.40 |
Source: bedrock, Context: 100000
|
|
|
bedrockconverse
|
anthropic.claude-opus-4-1-20250805-v1:0 |
anthropic.claude-opus-4-1-20250805-v1:0
|
15.00 |
75.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-opus-4-20250514-v1:0 |
anthropic.claude-opus-4-20250514-v1:0
|
15.00 |
75.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-opus-4-5-20251101-v1:0 |
anthropic.claude-opus-4-5-20251101-v1:0
|
5.00 |
25.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
anthropic.claude-sonnet-4-20250514-v1:0 |
anthropic.claude-sonnet-4-20250514-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
anthropic.claude-sonnet-4-5-20250929-v1:0 |
anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
anthropic.claude-v1 |
anthropic.claude-v1
|
8.00 |
24.00 |
Source: bedrock, Context: 100000
|
|
|
bedrock
|
anthropic.claude-v2:1 |
anthropic.claude-v2:1
|
8.00 |
24.00 |
Source: bedrock, Context: 100000
|
|
|
anyscale
|
zephyr-7b-beta |
zephyr-7b-beta
|
0.15 |
0.15 |
Source: anyscale, Context: 16384
|
|
|
anyscale
|
CodeLlama-34b-Instruct-hf |
codellama-34b-instruct-hf
|
1.00 |
1.00 |
Source: anyscale, Context: 4096
|
|
|
anyscale
|
CodeLlama-70b-Instruct-hf |
codellama-70b-instruct-hf
|
1.00 |
1.00 |
Source: anyscale, Context: 4096
|
|
|
anyscale
|
gemma-7b-it |
gemma-7b-it
|
0.15 |
0.15 |
Source: anyscale, Context: 8192
|
|
|
anyscale
|
Llama-2-13b-chat-hf |
llama-2-13b-chat-hf
|
0.25 |
0.25 |
Source: anyscale, Context: 4096
|
|
|
anyscale
|
Llama-2-70b-chat-hf |
llama-2-70b-chat-hf
|
1.00 |
1.00 |
Source: anyscale, Context: 4096
|
|
|
anyscale
|
Llama-2-7b-chat-hf |
llama-2-7b-chat-hf
|
0.15 |
0.15 |
Source: anyscale, Context: 4096
|
|
|
anyscale
|
Meta-Llama-3-70B-Instruct |
meta-llama-3-70b-instruct
|
1.00 |
1.00 |
Source: anyscale, Context: 8192
|
|
|
anyscale
|
Meta-Llama-3-8B-Instruct |
meta-llama-3-8b-instruct
|
0.15 |
0.15 |
Source: anyscale, Context: 8192
|
|
|
anyscale
|
Mistral-7B-Instruct-v0.1 |
mistral-7b-instruct-v0.1
|
0.15 |
0.15 |
Source: anyscale, Context: 16384
|
|
|
anyscale
|
Mixtral-8x22B-Instruct-v0.1 |
mixtral-8x22b-instruct-v0.1
|
0.90 |
0.90 |
Source: anyscale, Context: 65536
|
|
|
anyscale
|
Mixtral-8x7B-Instruct-v0.1 |
mixtral-8x7b-instruct-v0.1
|
0.15 |
0.15 |
Source: anyscale, Context: 16384
|
|
|
bedrockconverse
|
apac.amazon.nova-lite-v1:0 |
apac.amazon.nova-lite-v1:0
|
0.06 |
0.25 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrockconverse
|
apac.amazon.nova-micro-v1:0 |
apac.amazon.nova-micro-v1:0
|
0.04 |
0.15 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
apac.amazon.nova-pro-v1:0 |
apac.amazon.nova-pro-v1:0
|
0.84 |
3.36 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrock
|
apac.anthropic.claude-3-5-sonnet-20240620-v1:0 |
apac.anthropic.claude-3-5-sonnet-20240620-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
apac.anthropic.claude-3-5-sonnet-20241022-v2:0 |
apac.anthropic.claude-3-5-sonnet-20241022-v2:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
apac.anthropic.claude-3-haiku-20240307-v1:0 |
apac.anthropic.claude-3-haiku-20240307-v1:0
|
0.25 |
1.25 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
apac.anthropic.claude-haiku-4-5-20251001-v1:0 |
apac.anthropic.claude-haiku-4-5-20251001-v1:0
|
1.10 |
5.50 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
apac.anthropic.claude-3-sonnet-20240229-v1:0 |
apac.anthropic.claude-3-sonnet-20240229-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
apac.anthropic.claude-sonnet-4-20250514-v1:0 |
apac.anthropic.claude-sonnet-4-20250514-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
assemblyai
|
best |
best
|
0.00 |
0.00 |
Source: assemblyai, Context: N/A
|
|
|
assemblyai
|
nano |
nano
|
0.00 |
0.00 |
Source: assemblyai, Context: N/A
|
|
|
bedrockconverse
|
au.anthropic.claude-sonnet-4-5-20250929-v1:0 |
au.anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.30 |
16.50 |
Source: bedrock_converse, Context: 200000
|
|
|
azure
|
ada |
ada
|
0.10 |
0.00 |
Source: azure, Context: 8191
|
|
|
azure
|
command-r-plus |
command-r-plus
|
3.00 |
15.00 |
Source: azure, Context: 128000
|
|
|
azureai
|
claude-haiku-4-5 |
claude-haiku-4-5
|
1.00 |
5.00 |
Source: azure_ai, Context: 200000
|
|
|
azureai
|
claude-opus-4-1 |
claude-opus-4-1
|
15.00 |
75.00 |
Source: azure_ai, Context: 200000
|
|
|
azureai
|
claude-sonnet-4-5 |
claude-sonnet-4-5
|
3.00 |
15.00 |
Source: azure_ai, Context: 200000
|
|
|
azure
|
computer-use-preview |
computer-use-preview
|
3.00 |
12.00 |
Source: azure, Context: 8192
|
|
|
azure
|
container |
container
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azureai
|
gpt-oss-120b |
gpt-oss-120b
|
0.15 |
0.60 |
Source: azure_ai, Context: 131072
|
|
|
azure
|
gpt-4o-2024-08-06 |
gpt-4o-2024-08-06
|
2.75 |
11.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-2024-11-20 |
gpt-4o-2024-11-20
|
2.75 |
11.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-mini-2024-07-18 |
gpt-4o-mini-2024-07-18
|
0.17 |
0.66 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-mini-realtime-preview-2024-12-17 |
gpt-4o-mini-realtime-preview-2024-12-17
|
0.66 |
2.64 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-realtime-preview-2024-10-01 |
gpt-4o-realtime-preview-2024-10-01
|
5.50 |
22.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-realtime-preview-2024-12-17 |
gpt-4o-realtime-preview-2024-12-17
|
5.50 |
22.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-5-2025-08-07 |
gpt-5-2025-08-07
|
1.38 |
11.00 |
Source: azure, Context: 272000
|
|
|
azure
|
gpt-5-mini-2025-08-07 |
gpt-5-mini-2025-08-07
|
0.28 |
2.20 |
Source: azure, Context: 272000
|
|
|
azure
|
gpt-5-nano-2025-08-07 |
gpt-5-nano-2025-08-07
|
0.06 |
0.44 |
Source: azure, Context: 272000
|
|
|
azure
|
o1-2024-12-17 |
o1-2024-12-17
|
16.50 |
66.00 |
Source: azure, Context: 200000
|
|
|
azure
|
o1-mini-2024-09-12 |
o1-mini-2024-09-12
|
1.21 |
4.84 |
Source: azure, Context: 128000
|
|
|
azure
|
o1-preview-2024-09-12 |
o1-preview-2024-09-12
|
16.50 |
66.00 |
Source: azure, Context: 128000
|
|
|
azure
|
o3-mini-2025-01-31 |
o3-mini-2025-01-31
|
1.21 |
4.84 |
Source: azure, Context: 200000
|
|
|
azure
|
gpt-3.5-turbo |
gpt-3.5-turbo
|
0.50 |
1.50 |
Source: azure, Context: 4097
|
|
|
azuretext
|
gpt-3.5-turbo-instruct-0914 |
gpt-3.5-turbo-instruct-0914
|
1.50 |
2.00 |
Source: azure_text, Context: 4097
|
|
|
azure
|
gpt-35-turbo |
gpt-35-turbo
|
0.50 |
1.50 |
Source: azure, Context: 4097
|
|
|
azure
|
gpt-35-turbo-0125 |
gpt-35-turbo-0125
|
0.50 |
1.50 |
Source: azure, Context: 16384
|
|
|
azure
|
gpt-35-turbo-0301 |
gpt-35-turbo-0301
|
0.20 |
2.00 |
Source: azure, Context: 4097
|
|
|
azure
|
gpt-35-turbo-0613 |
gpt-35-turbo-0613
|
1.50 |
2.00 |
Source: azure, Context: 4097
|
|
|
azure
|
gpt-35-turbo-1106 |
gpt-35-turbo-1106
|
1.00 |
2.00 |
Source: azure, Context: 16384
|
|
|
azure
|
gpt-35-turbo-16k |
gpt-35-turbo-16k
|
3.00 |
4.00 |
Source: azure, Context: 16385
|
|
|
azure
|
gpt-35-turbo-16k-0613 |
gpt-35-turbo-16k-0613
|
3.00 |
4.00 |
Source: azure, Context: 16385
|
|
|
azuretext
|
gpt-35-turbo-instruct |
gpt-35-turbo-instruct
|
1.50 |
2.00 |
Source: azure_text, Context: 4097
|
|
|
azuretext
|
gpt-35-turbo-instruct-0914 |
gpt-35-turbo-instruct-0914
|
1.50 |
2.00 |
Source: azure_text, Context: 4097
|
|
|
azure
|
gpt-4-0125-preview |
gpt-4-0125-preview
|
10.00 |
30.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4-0613 |
gpt-4-0613
|
30.00 |
60.00 |
Source: azure, Context: 8192
|
|
|
azure
|
gpt-4-1106-preview |
gpt-4-1106-preview
|
10.00 |
30.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4-32k-0613 |
gpt-4-32k-0613
|
60.00 |
120.00 |
Source: azure, Context: 32768
|
|
|
azure
|
gpt-4-turbo-2024-04-09 |
gpt-4-turbo-2024-04-09
|
10.00 |
30.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4-turbo-vision-preview |
gpt-4-turbo-vision-preview
|
10.00 |
30.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4.1-2025-04-14 |
gpt-4.1-2025-04-14
|
2.00 |
8.00 |
Source: azure, Context: 1047576
|
|
|
azure
|
gpt-4.1-mini-2025-04-14 |
gpt-4.1-mini-2025-04-14
|
0.40 |
1.60 |
Source: azure, Context: 1047576
|
|
|
azure
|
gpt-4.1-nano-2025-04-14 |
gpt-4.1-nano-2025-04-14
|
0.10 |
0.40 |
Source: azure, Context: 1047576
|
|
|
azure
|
gpt-4.5-preview |
gpt-4.5-preview
|
75.00 |
150.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-2024-05-13 |
gpt-4o-2024-05-13
|
5.00 |
15.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-audio-2025-08-28 |
gpt-audio-2025-08-28
|
2.50 |
10.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-audio-mini-2025-10-06 |
gpt-audio-mini-2025-10-06
|
0.60 |
2.40 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-audio-preview-2024-12-17 |
gpt-4o-audio-preview-2024-12-17
|
2.50 |
10.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-4o-mini-audio-preview-2024-12-17 |
gpt-4o-mini-audio-preview-2024-12-17
|
2.50 |
10.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-realtime-2025-08-28 |
gpt-realtime-2025-08-28
|
4.00 |
16.00 |
Source: azure, Context: 32000
|
|
|
azure
|
gpt-realtime-mini-2025-10-06 |
gpt-realtime-mini-2025-10-06
|
0.60 |
2.40 |
Source: azure, Context: 32000
|
|
|
azure
|
gpt-4o-mini-transcribe |
gpt-4o-mini-transcribe
|
1.25 |
5.00 |
Source: azure, Context: 16000
|
|
|
azure
|
gpt-4o-mini-tts |
gpt-4o-mini-tts
|
2.50 |
10.00 |
Source: azure, Context: N/A
|
|
|
azure
|
gpt-4o-transcribe |
gpt-4o-transcribe
|
2.50 |
10.00 |
Source: azure, Context: 16000
|
|
|
azure
|
gpt-4o-transcribe-diarize |
gpt-4o-transcribe-diarize
|
2.50 |
10.00 |
Source: azure, Context: 16000
|
|
|
azure
|
gpt-5.1-2025-11-13 |
gpt-5.1-2025-11-13
|
1.25 |
10.00 |
Source: azure, Context: 272000
|
|
|
azure
|
gpt-5.1-chat-2025-11-13 |
gpt-5.1-chat-2025-11-13
|
1.25 |
10.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-5.1-codex-2025-11-13 |
gpt-5.1-codex-2025-11-13
|
1.25 |
10.00 |
Source: azure, Context: 272000
|
|
|
azure
|
gpt-5.1-codex-mini-2025-11-13 |
gpt-5.1-codex-mini-2025-11-13
|
0.25 |
2.00 |
Source: azure, Context: 272000
|
|
|
azure
|
gpt-5-chat-latest |
gpt-5-chat-latest
|
1.25 |
10.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-5.2-2025-12-11 |
gpt-5.2-2025-12-11
|
1.75 |
14.00 |
Source: azure, Context: 400000
|
|
|
azure
|
gpt-5.2-chat-2025-12-11 |
gpt-5.2-chat-2025-12-11
|
1.75 |
14.00 |
Source: azure, Context: 128000
|
|
|
azure
|
gpt-5.2-pro |
gpt-5.2-pro
|
21.00 |
168.00 |
Source: azure, Context: 400000
|
|
|
azure
|
gpt-5.2-pro-2025-12-11 |
gpt-5.2-pro-2025-12-11
|
21.00 |
168.00 |
Source: azure, Context: 400000
|
|
|
azure
|
gpt-image-1 |
gpt-image-1
|
5.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
dall-e-3 |
dall-e-3
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
gpt-image-1-mini |
gpt-image-1-mini
|
2.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
gpt-image-1.5 |
gpt-image-1.5
|
5.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
gpt-image-1.5-2025-12-16 |
gpt-image-1.5-2025-12-16
|
5.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
mistral-large-2402 |
mistral-large-2402
|
8.00 |
24.00 |
Source: azure, Context: 32000
|
|
|
azure
|
mistral-large-latest |
mistral-large-latest
|
8.00 |
24.00 |
Source: azure, Context: 32000
|
|
|
azure
|
o3-2025-04-16 |
o3-2025-04-16
|
2.00 |
8.00 |
Source: azure, Context: 200000
|
|
|
azure
|
o3-deep-research |
o3-deep-research
|
10.00 |
40.00 |
Source: azure, Context: 200000
|
|
|
azure
|
o3-pro |
o3-pro
|
20.00 |
80.00 |
Source: azure, Context: 200000
|
|
|
azure
|
o3-pro-2025-06-10 |
o3-pro-2025-06-10
|
20.00 |
80.00 |
Source: azure, Context: 200000
|
|
|
azure
|
o4-mini-2025-04-16 |
o4-mini-2025-04-16
|
1.10 |
4.40 |
Source: azure, Context: 200000
|
|
|
azure
|
dall-e-2 |
dall-e-2
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
azure-tts |
azure-tts
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
azure-tts-hd |
azure-tts-hd
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
tts-1 |
tts-1
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
tts-1-hd |
tts-1-hd
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azure
|
whisper-1 |
whisper-1
|
0.00 |
0.00 |
Source: azure, Context: N/A
|
|
|
azureai
|
Cohere-embed-v3-english |
cohere-embed-v3-english
|
0.10 |
0.00 |
Source: azure_ai, Context: 512
|
|
|
azureai
|
Cohere-embed-v3-multilingual |
cohere-embed-v3-multilingual
|
0.10 |
0.00 |
Source: azure_ai, Context: 512
|
|
|
azureai
|
FLUX-1.1-pro |
flux-1.1-pro
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
FLUX.1-Kontext-pro |
flux.1-kontext-pro
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
Llama-3.2-11B-Vision-Instruct |
llama-3.2-11b-vision-instruct
|
0.37 |
0.37 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Llama-3.2-90B-Vision-Instruct |
llama-3.2-90b-vision-instruct
|
2.04 |
2.04 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Llama-3.3-70B-Instruct |
llama-3.3-70b-instruct
|
0.71 |
0.71 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Llama-4-Maverick-17B-128E-Instruct-FP8 |
llama-4-maverick-17b-128e-instruct-fp8
|
1.41 |
0.35 |
Source: azure_ai, Context: 1000000
|
|
|
azureai
|
Llama-4-Scout-17B-16E-Instruct |
llama-4-scout-17b-16e-instruct
|
0.20 |
0.78 |
Source: azure_ai, Context: 10000000
|
|
|
azureai
|
Meta-Llama-3-70B-Instruct |
meta-llama-3-70b-instruct
|
1.10 |
0.37 |
Source: azure_ai, Context: 8192
|
|
|
azureai
|
Meta-Llama-3.1-405B-Instruct |
meta-llama-3.1-405b-instruct
|
5.33 |
16.00 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Meta-Llama-3.1-70B-Instruct |
meta-llama-3.1-70b-instruct
|
2.68 |
3.54 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Meta-Llama-3.1-8B-Instruct |
meta-llama-3.1-8b-instruct
|
0.30 |
0.61 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3-medium-128k-instruct |
phi-3-medium-128k-instruct
|
0.17 |
0.68 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3-medium-4k-instruct |
phi-3-medium-4k-instruct
|
0.17 |
0.68 |
Source: azure_ai, Context: 4096
|
|
|
azureai
|
Phi-3-mini-128k-instruct |
phi-3-mini-128k-instruct
|
0.13 |
0.52 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3-mini-4k-instruct |
phi-3-mini-4k-instruct
|
0.13 |
0.52 |
Source: azure_ai, Context: 4096
|
|
|
azureai
|
Phi-3-small-128k-instruct |
phi-3-small-128k-instruct
|
0.15 |
0.60 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3-small-8k-instruct |
phi-3-small-8k-instruct
|
0.15 |
0.60 |
Source: azure_ai, Context: 8192
|
|
|
azureai
|
Phi-3.5-MoE-instruct |
phi-3.5-moe-instruct
|
0.16 |
0.64 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3.5-mini-instruct |
phi-3.5-mini-instruct
|
0.13 |
0.52 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-3.5-vision-instruct |
phi-3.5-vision-instruct
|
0.13 |
0.52 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
Phi-4 |
phi-4
|
0.13 |
0.50 |
Source: azure_ai, Context: 16384
|
|
|
azureai
|
Phi-4-mini-instruct |
phi-4-mini-instruct
|
0.08 |
0.30 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
Phi-4-multimodal-instruct |
phi-4-multimodal-instruct
|
0.08 |
0.32 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
Phi-4-mini-reasoning |
phi-4-mini-reasoning
|
0.08 |
0.32 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
Phi-4-reasoning |
phi-4-reasoning
|
0.13 |
0.50 |
Source: azure_ai, Context: 32768
|
|
|
azureai
|
mistral-document-ai-2505 |
mistral-document-ai-2505
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
prebuilt-read |
prebuilt-read
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
prebuilt-layout |
prebuilt-layout
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
prebuilt-document |
prebuilt-document
|
0.00 |
0.00 |
Source: azure_ai, Context: N/A
|
|
|
azureai
|
MAI-DS-R1 |
mai-ds-r1
|
1.35 |
5.40 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
cohere-rerank-v3-english |
cohere-rerank-v3-english
|
0.00 |
0.00 |
Source: azure_ai, Context: 4096
|
|
|
azureai
|
cohere-rerank-v3-multilingual |
cohere-rerank-v3-multilingual
|
0.00 |
0.00 |
Source: azure_ai, Context: 4096
|
|
|
azureai
|
cohere-rerank-v3.5 |
cohere-rerank-v3.5
|
0.00 |
0.00 |
Source: azure_ai, Context: 4096
|
|
|
azureai
|
cohere-rerank-v4.0-pro |
cohere-rerank-v4.0-pro
|
0.00 |
0.00 |
Source: azure_ai, Context: 32768
|
|
|
azureai
|
cohere-rerank-v4.0-fast |
cohere-rerank-v4.0-fast
|
0.00 |
0.00 |
Source: azure_ai, Context: 32768
|
|
|
azureai
|
deepseek-v3.2 |
deepseek-v3.2
|
0.58 |
1.68 |
Source: azure_ai, Context: 163840
|
|
|
azureai
|
deepseek-v3.2-speciale |
deepseek-v3.2-speciale
|
0.58 |
1.68 |
Source: azure_ai, Context: 163840
|
|
|
azureai
|
deepseek-r1 |
deepseek-r1
|
1.35 |
5.40 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
deepseek-v3 |
deepseek-v3
|
1.14 |
4.56 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
deepseek-v3-0324 |
deepseek-v3-0324
|
1.14 |
4.56 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
embed-v-4-0 |
embed-v-4-0
|
0.12 |
0.00 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
grok-3 |
grok-3
|
3.00 |
15.00 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
grok-3-mini |
grok-3-mini
|
0.25 |
1.27 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
grok-4 |
grok-4
|
5.50 |
27.50 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
grok-4-fast-non-reasoning |
grok-4-fast-non-reasoning
|
0.43 |
1.73 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
grok-4-fast-reasoning |
grok-4-fast-reasoning
|
0.43 |
1.73 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
grok-code-fast-1 |
grok-code-fast-1
|
3.50 |
17.50 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
jais-30b-chat |
jais-30b-chat
|
3200.00 |
9710.00 |
Source: azure_ai, Context: 8192
|
|
|
azureai
|
jamba-instruct |
jamba-instruct
|
0.50 |
0.70 |
Source: azure_ai, Context: 70000
|
|
|
azureai
|
ministral-3b |
ministral-3b
|
0.04 |
0.04 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
mistral-large |
mistral-large
|
4.00 |
12.00 |
Source: azure_ai, Context: 32000
|
|
|
azureai
|
mistral-large-2407 |
mistral-large-2407
|
2.00 |
6.00 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
mistral-large-latest |
mistral-large-latest
|
2.00 |
6.00 |
Source: azure_ai, Context: 128000
|
|
|
azureai
|
mistral-large-3 |
mistral-large-3
|
0.50 |
1.50 |
Source: azure_ai, Context: 256000
|
|
|
azureai
|
mistral-medium-2505 |
mistral-medium-2505
|
0.40 |
2.00 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
mistral-nemo |
mistral-nemo
|
0.15 |
0.15 |
Source: azure_ai, Context: 131072
|
|
|
azureai
|
mistral-small |
mistral-small
|
1.00 |
3.00 |
Source: azure_ai, Context: 32000
|
|
|
azureai
|
mistral-small-2503 |
mistral-small-2503
|
1.00 |
3.00 |
Source: azure_ai, Context: 128000
|
|
|
textcompletionopenai
|
babbage-002 |
babbage-002
|
0.40 |
0.40 |
Source: text-completion-openai, Context: 16384
|
|
|
bedrock
|
cohere.command-light-text-v14 |
cohere.command-light-text-v14
|
0.30 |
0.60 |
Source: bedrock, Context: 4096
|
|
|
bedrock
|
cohere.command-text-v14 |
cohere.command-text-v14
|
1.50 |
2.00 |
Source: bedrock, Context: 4096
|
|
|
bedrock
|
meta.llama3-70b-instruct-v1:0 |
meta.llama3-70b-instruct-v1:0
|
3.18 |
4.20 |
Source: bedrock, Context: 8192
|
|
|
bedrock
|
meta.llama3-8b-instruct-v1:0 |
meta.llama3-8b-instruct-v1:0
|
0.36 |
0.72 |
Source: bedrock, Context: 8192
|
|
|
bedrock
|
mistral.mistral-7b-instruct-v0:2 |
mistral.mistral-7b-instruct-v0:2
|
0.20 |
0.26 |
Source: bedrock, Context: 32000
|
|
|
bedrock
|
mistral.mistral-large-2402-v1:0 |
mistral.mistral-large-2402-v1:0
|
10.40 |
31.20 |
Source: bedrock, Context: 32000
|
|
|
bedrock
|
mistral.mixtral-8x7b-instruct-v0:1 |
mistral.mixtral-8x7b-instruct-v0:1
|
0.59 |
0.91 |
Source: bedrock, Context: 32000
|
|
|
bedrock
|
amazon.nova-pro-v1:0 |
amazon.nova-pro-v1:0
|
0.96 |
3.84 |
Source: bedrock, Context: 300000
|
|
|
bedrock
|
claude-sonnet-4-5-20250929-v1:0 |
claude-sonnet-4-5-20250929-v1:0
|
3.30 |
16.50 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
anthropic.claude-3-7-sonnet-20250219-v1:0 |
anthropic.claude-3-7-sonnet-20250219-v1:0
|
3.60 |
18.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
us.anthropic.claude-3-5-haiku-20241022-v1:0 |
us.anthropic.claude-3-5-haiku-20241022-v1:0
|
0.80 |
4.00 |
Source: bedrock, Context: 200000
|
|
|
cerebras
|
llama-3.3-70b |
llama-3.3-70b
|
0.85 |
1.20 |
Source: cerebras, Context: 128000
|
|
|
cerebras
|
llama3.1-70b |
llama3.1-70b
|
0.60 |
0.60 |
Source: cerebras, Context: 128000
|
|
|
cerebras
|
llama3.1-8b |
llama3.1-8b
|
0.10 |
0.10 |
Source: cerebras, Context: 128000
|
|
|
cerebras
|
qwen-3-32b |
qwen-3-32b
|
0.40 |
0.80 |
Source: cerebras, Context: 128000
|
|
|
vertex
|
chat-bison |
chat-bison
|
0.13 |
0.13 |
Source: vertex, Context: 8192
|
|
|
vertex
|
chat-bison-32k |
chat-bison-32k
|
0.13 |
0.13 |
Source: vertex, Context: 32000
|
|
|
vertex
|
chat-bison-32k@002 |
chat-bison-32k@002
|
0.13 |
0.13 |
Source: vertex, Context: 32000
|
|
|
vertex
|
chat-bison@001 |
chat-bison@001
|
0.13 |
0.13 |
Source: vertex, Context: 8192
|
|
|
vertex
|
chat-bison@002 |
chat-bison@002
|
0.13 |
0.13 |
Source: vertex, Context: 8192
|
|
|
nlpcloud
|
chatdolphin |
chatdolphin
|
0.50 |
0.50 |
Source: nlp_cloud, Context: 16384
|
|
|
openai
|
chatgpt-4o-latest |
chatgpt-4o-latest
|
5.00 |
15.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-transcribe-diarize |
gpt-4o-transcribe-diarize
|
2.50 |
10.00 |
Source: openai, Context: 16000
|
|
|
anthropic
|
claude-3-5-sonnet-latest |
claude-3-5-sonnet-latest
|
3.00 |
15.00 |
Source: anthropic, Context: 200000
|
|
|
anthropic
|
claude-3-opus-latest |
claude-3-opus-latest
|
15.00 |
75.00 |
Source: anthropic, Context: 200000
|
|
|
anthropic
|
claude-4-opus-20250514 |
claude-4-opus-20250514
|
15.00 |
75.00 |
Source: anthropic, Context: 200000
|
|
|
anthropic
|
claude-4-sonnet-20250514 |
claude-4-sonnet-20250514
|
3.00 |
15.00 |
Source: anthropic, Context: 1000000
|
|
|
cloudflare
|
llama-2-7b-chat-fp16 |
llama-2-7b-chat-fp16
|
1.92 |
1.92 |
Source: cloudflare, Context: 3072
|
|
|
cloudflare
|
llama-2-7b-chat-int8 |
llama-2-7b-chat-int8
|
1.92 |
1.92 |
Source: cloudflare, Context: 2048
|
|
|
cloudflare
|
mistral-7b-instruct-v0.1 |
mistral-7b-instruct-v0.1
|
1.92 |
1.92 |
Source: cloudflare, Context: 8192
|
|
|
cloudflare
|
codellama-7b-instruct-awq |
codellama-7b-instruct-awq
|
1.92 |
1.92 |
Source: cloudflare, Context: 4096
|
|
|
vertex
|
code-bison |
code-bison
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
code-bison-32k@002 |
code-bison-32k@002
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
code-bison32k |
code-bison32k
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
code-bison@001 |
code-bison@001
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
code-bison@002 |
code-bison@002
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
code-gecko |
code-gecko
|
0.13 |
0.13 |
Source: vertex, Context: 2048
|
|
|
vertex
|
code-gecko-latest |
code-gecko-latest
|
0.13 |
0.13 |
Source: vertex, Context: 2048
|
|
|
vertex
|
code-gecko@001 |
code-gecko@001
|
0.13 |
0.13 |
Source: vertex, Context: 2048
|
|
|
vertex
|
code-gecko@002 |
code-gecko@002
|
0.13 |
0.13 |
Source: vertex, Context: 2048
|
|
|
vertex
|
codechat-bison |
codechat-bison
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
codechat-bison-32k |
codechat-bison-32k
|
0.13 |
0.13 |
Source: vertex, Context: 32000
|
|
|
vertex
|
codechat-bison-32k@002 |
codechat-bison-32k@002
|
0.13 |
0.13 |
Source: vertex, Context: 32000
|
|
|
vertex
|
codechat-bison@001 |
codechat-bison@001
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
codechat-bison@002 |
codechat-bison@002
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
vertex
|
codechat-bison@latest |
codechat-bison@latest
|
0.13 |
0.13 |
Source: vertex, Context: 6144
|
|
|
codestral
|
codestral-2405 |
codestral-2405
|
0.00 |
0.00 |
Source: codestral, Context: 32000
|
|
|
codestral
|
codestral-latest |
codestral-latest
|
0.00 |
0.00 |
Source: codestral, Context: 32000
|
|
|
bedrock
|
cohere.command-r-plus-v1:0 |
cohere.command-r-plus-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
cohere.command-r-v1:0 |
cohere.command-r-v1:0
|
0.50 |
1.50 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
cohere.embed-english-v3 |
cohere.embed-english-v3
|
0.10 |
0.00 |
Source: bedrock, Context: 512
|
|
|
bedrock
|
cohere.embed-multilingual-v3 |
cohere.embed-multilingual-v3
|
0.10 |
0.00 |
Source: bedrock, Context: 512
|
|
|
bedrock
|
cohere.embed-v4:0 |
cohere.embed-v4:0
|
0.12 |
0.00 |
Source: bedrock, Context: 128000
|
|
|
cohere
|
embed-v4.0 |
embed-v4.0
|
0.12 |
0.00 |
Source: cohere, Context: 128000
|
|
|
bedrock
|
cohere.rerank-v3-5:0 |
cohere.rerank-v3-5:0
|
0.00 |
0.00 |
Source: bedrock, Context: 32000
|
|
|
cohere
|
command |
command
|
1.00 |
2.00 |
Source: cohere, Context: 4096
|
|
|
coherechat
|
command-a-03-2025 |
command-a-03-2025
|
2.50 |
10.00 |
Source: cohere_chat, Context: 256000
|
|
|
coherechat
|
command-light |
command-light
|
0.30 |
0.60 |
Source: cohere_chat, Context: 4096
|
|
|
cohere
|
command-nightly |
command-nightly
|
1.00 |
2.00 |
Source: cohere, Context: 4096
|
|
|
coherechat
|
command-r |
command-r
|
0.15 |
0.60 |
Source: cohere_chat, Context: 128000
|
|
|
coherechat
|
command-r-08-2024 |
command-r-08-2024
|
0.15 |
0.60 |
Source: cohere_chat, Context: 128000
|
|
|
coherechat
|
command-r-plus |
command-r-plus
|
2.50 |
10.00 |
Source: cohere_chat, Context: 128000
|
|
|
coherechat
|
command-r-plus-08-2024 |
command-r-plus-08-2024
|
2.50 |
10.00 |
Source: cohere_chat, Context: 128000
|
|
|
coherechat
|
command-r7b-12-2024 |
command-r7b-12-2024
|
0.15 |
0.04 |
Source: cohere_chat, Context: 128000
|
|
|
dashscope
|
qwen-coder |
qwen-coder
|
0.30 |
1.50 |
Source: dashscope, Context: 1000000
|
|
|
dashscope
|
qwen-flash |
qwen-flash
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen-flash-2025-07-28 |
qwen-flash-2025-07-28
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen-max |
qwen-max
|
1.60 |
6.40 |
Source: dashscope, Context: 30720
|
|
|
dashscope
|
qwen-plus |
qwen-plus
|
0.40 |
1.20 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen-plus-2025-01-25 |
qwen-plus-2025-01-25
|
0.40 |
1.20 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen-plus-2025-04-28 |
qwen-plus-2025-04-28
|
0.40 |
1.20 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen-plus-2025-07-14 |
qwen-plus-2025-07-14
|
0.40 |
1.20 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen-plus-2025-07-28 |
qwen-plus-2025-07-28
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen-plus-2025-09-11 |
qwen-plus-2025-09-11
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen-plus-latest |
qwen-plus-latest
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen-turbo |
qwen-turbo
|
0.05 |
0.20 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen-turbo-2024-11-01 |
qwen-turbo-2024-11-01
|
0.05 |
0.20 |
Source: dashscope, Context: 1000000
|
|
|
dashscope
|
qwen-turbo-2025-04-28 |
qwen-turbo-2025-04-28
|
0.05 |
0.20 |
Source: dashscope, Context: 1000000
|
|
|
dashscope
|
qwen-turbo-latest |
qwen-turbo-latest
|
0.05 |
0.20 |
Source: dashscope, Context: 1000000
|
|
|
dashscope
|
qwen3-30b-a3b |
qwen3-30b-a3b
|
0.00 |
0.00 |
Source: dashscope, Context: 129024
|
|
|
dashscope
|
qwen3-coder-flash |
qwen3-coder-flash
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen3-coder-flash-2025-07-28 |
qwen3-coder-flash-2025-07-28
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen3-coder-plus |
qwen3-coder-plus
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen3-coder-plus-2025-07-22 |
qwen3-coder-plus-2025-07-22
|
0.00 |
0.00 |
Source: dashscope, Context: 997952
|
|
|
dashscope
|
qwen3-max-preview |
qwen3-max-preview
|
0.00 |
0.00 |
Source: dashscope, Context: 258048
|
|
|
dashscope
|
qwq-plus |
qwq-plus
|
0.80 |
2.40 |
Source: dashscope, Context: 98304
|
|
|
databricks
|
databricks-bge-large-en |
databricks-bge-large-en
|
0.10 |
0.00 |
Source: databricks, Context: 512
|
|
|
databricks
|
databricks-claude-3-7-sonnet |
databricks-claude-3-7-sonnet
|
3.00 |
15.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-haiku-4-5 |
databricks-claude-haiku-4-5
|
1.00 |
5.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-opus-4 |
databricks-claude-opus-4
|
15.00 |
75.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-opus-4-1 |
databricks-claude-opus-4-1
|
15.00 |
75.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-opus-4-5 |
databricks-claude-opus-4-5
|
5.00 |
25.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-sonnet-4 |
databricks-claude-sonnet-4
|
3.00 |
15.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-sonnet-4-1 |
databricks-claude-sonnet-4-1
|
3.00 |
15.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-claude-sonnet-4-5 |
databricks-claude-sonnet-4-5
|
3.00 |
15.00 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-gemini-2-5-flash |
databricks-gemini-2-5-flash
|
0.30 |
2.50 |
Source: databricks, Context: 1048576
|
|
|
databricks
|
databricks-gemini-2-5-pro |
databricks-gemini-2-5-pro
|
1.25 |
10.00 |
Source: databricks, Context: 1048576
|
|
|
databricks
|
databricks-gemma-3-12b |
databricks-gemma-3-12b
|
0.15 |
0.50 |
Source: databricks, Context: 128000
|
|
|
databricks
|
databricks-gpt-5 |
databricks-gpt-5
|
1.25 |
10.00 |
Source: databricks, Context: 400000
|
|
|
databricks
|
databricks-gpt-5-1 |
databricks-gpt-5-1
|
1.25 |
10.00 |
Source: databricks, Context: 400000
|
|
|
databricks
|
databricks-gpt-5-mini |
databricks-gpt-5-mini
|
0.25 |
2.00 |
Source: databricks, Context: 400000
|
|
|
databricks
|
databricks-gpt-5-nano |
databricks-gpt-5-nano
|
0.05 |
0.40 |
Source: databricks, Context: 400000
|
|
|
databricks
|
databricks-gpt-oss-120b |
databricks-gpt-oss-120b
|
0.15 |
0.60 |
Source: databricks, Context: 131072
|
|
|
databricks
|
databricks-gpt-oss-20b |
databricks-gpt-oss-20b
|
0.07 |
0.30 |
Source: databricks, Context: 131072
|
|
|
databricks
|
databricks-gte-large-en |
databricks-gte-large-en
|
0.13 |
0.00 |
Source: databricks, Context: 8192
|
|
|
databricks
|
databricks-llama-2-70b-chat |
databricks-llama-2-70b-chat
|
0.50 |
1.50 |
Source: databricks, Context: 4096
|
|
|
databricks
|
databricks-llama-4-maverick |
databricks-llama-4-maverick
|
0.50 |
1.50 |
Source: databricks, Context: 128000
|
|
|
databricks
|
databricks-meta-llama-3-1-405b-instruct |
databricks-meta-llama-3-1-405b-instruct
|
5.00 |
15.00 |
Source: databricks, Context: 128000
|
|
|
databricks
|
databricks-meta-llama-3-1-8b-instruct |
databricks-meta-llama-3-1-8b-instruct
|
0.15 |
0.45 |
Source: databricks, Context: 200000
|
|
|
databricks
|
databricks-meta-llama-3-3-70b-instruct |
databricks-meta-llama-3-3-70b-instruct
|
0.50 |
1.50 |
Source: databricks, Context: 128000
|
|
|
databricks
|
databricks-meta-llama-3-70b-instruct |
databricks-meta-llama-3-70b-instruct
|
1.00 |
3.00 |
Source: databricks, Context: 128000
|
|
|
databricks
|
databricks-mixtral-8x7b-instruct |
databricks-mixtral-8x7b-instruct
|
0.50 |
1.00 |
Source: databricks, Context: 4096
|
|
|
databricks
|
databricks-mpt-30b-instruct |
databricks-mpt-30b-instruct
|
1.00 |
1.00 |
Source: databricks, Context: 8192
|
|
|
databricks
|
databricks-mpt-7b-instruct |
databricks-mpt-7b-instruct
|
0.50 |
0.00 |
Source: databricks, Context: 8192
|
|
|
dataforseo
|
search |
search
|
0.00 |
0.00 |
Source: dataforseo, Context: N/A
|
|
|
textcompletionopenai
|
davinci-002 |
davinci-002
|
2.00 |
2.00 |
Source: text-completion-openai, Context: 16384
|
|
|
deepgram
|
base |
base
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-conversationalai |
base-conversationalai
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-finance |
base-finance
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-general |
base-general
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-meeting |
base-meeting
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-phonecall |
base-phonecall
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-video |
base-video
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
base-voicemail |
base-voicemail
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
enhanced |
enhanced
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
enhanced-finance |
enhanced-finance
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
enhanced-general |
enhanced-general
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
enhanced-meeting |
enhanced-meeting
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
enhanced-phonecall |
enhanced-phonecall
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova |
nova
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2 |
nova-2
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-atc |
nova-2-atc
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-automotive |
nova-2-automotive
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-conversationalai |
nova-2-conversationalai
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-drivethru |
nova-2-drivethru
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-finance |
nova-2-finance
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-general |
nova-2-general
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-meeting |
nova-2-meeting
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-phonecall |
nova-2-phonecall
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-video |
nova-2-video
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-2-voicemail |
nova-2-voicemail
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-3 |
nova-3
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-3-general |
nova-3-general
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-3-medical |
nova-3-medical
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-general |
nova-general
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
nova-phonecall |
nova-phonecall
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper |
whisper
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper-base |
whisper-base
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper-large |
whisper-large
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper-medium |
whisper-medium
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper-small |
whisper-small
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepgram
|
whisper-tiny |
whisper-tiny
|
0.00 |
0.00 |
Source: deepgram, Context: N/A
|
|
|
deepinfra
|
MythoMax-L2-13b |
mythomax-l2-13b
|
0.08 |
0.09 |
Source: deepinfra, Context: 4096
|
|
|
deepinfra
|
Hermes-3-Llama-3.1-405B |
hermes-3-llama-3.1-405b
|
1.00 |
1.00 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Hermes-3-Llama-3.1-70B |
hermes-3-llama-3.1-70b
|
0.30 |
0.30 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
QwQ-32B |
qwq-32b
|
0.15 |
0.40 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Qwen2.5-72B-Instruct |
qwen2.5-72b-instruct
|
0.12 |
0.39 |
Source: deepinfra, Context: 32768
|
|
|
deepinfra
|
Qwen2.5-7B-Instruct |
qwen2.5-7b-instruct
|
0.04 |
0.10 |
Source: deepinfra, Context: 32768
|
|
|
deepinfra
|
Qwen2.5-VL-32B-Instruct |
qwen2.5-vl-32b-instruct
|
0.20 |
0.60 |
Source: deepinfra, Context: 128000
|
|
|
deepinfra
|
Qwen3-14B |
qwen3-14b
|
0.06 |
0.24 |
Source: deepinfra, Context: 40960
|
|
|
deepinfra
|
Qwen3-235B-A22B |
qwen3-235b-a22b
|
0.18 |
0.54 |
Source: deepinfra, Context: 40960
|
|
|
deepinfra
|
Qwen3-235B-A22B-Instruct-2507 |
qwen3-235b-a22b-instruct-2507
|
0.09 |
0.60 |
Source: deepinfra, Context: 262144
|
|
|
deepinfra
|
Qwen3-235B-A22B-Thinking-2507 |
qwen3-235b-a22b-thinking-2507
|
0.30 |
2.90 |
Source: deepinfra, Context: 262144
|
|
|
deepinfra
|
Qwen3-30B-A3B |
qwen3-30b-a3b
|
0.08 |
0.29 |
Source: deepinfra, Context: 40960
|
|
|
deepinfra
|
Qwen3-32B |
qwen3-32b
|
0.10 |
0.28 |
Source: deepinfra, Context: 40960
|
|
|
deepinfra
|
Qwen3-Next-80B-A3B-Instruct |
qwen3-next-80b-a3b-instruct
|
0.14 |
1.40 |
Source: deepinfra, Context: 262144
|
|
|
deepinfra
|
Qwen3-Next-80B-A3B-Thinking |
qwen3-next-80b-a3b-thinking
|
0.14 |
1.40 |
Source: deepinfra, Context: 262144
|
|
|
deepinfra
|
L3-8B-Lunaris-v1-Turbo |
l3-8b-lunaris-v1-turbo
|
0.04 |
0.05 |
Source: deepinfra, Context: 8192
|
|
|
deepinfra
|
L3.1-70B-Euryale-v2.2 |
l3.1-70b-euryale-v2.2
|
0.65 |
0.75 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
L3.3-70B-Euryale-v2.3 |
l3.3-70b-euryale-v2.3
|
0.65 |
0.75 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
olmOCR-7B-0725-FP8 |
olmocr-7b-0725-fp8
|
0.27 |
1.50 |
Source: deepinfra, Context: 16384
|
|
|
deepinfra
|
claude-3-7-sonnet-latest |
claude-3-7-sonnet-latest
|
3.30 |
16.50 |
Source: deepinfra, Context: 200000
|
|
|
deepinfra
|
claude-4-opus |
claude-4-opus
|
16.50 |
82.50 |
Source: deepinfra, Context: 200000
|
|
|
deepinfra
|
claude-4-sonnet |
claude-4-sonnet
|
3.30 |
16.50 |
Source: deepinfra, Context: 200000
|
|
|
deepinfra
|
DeepSeek-R1 |
deepseek-r1
|
0.70 |
2.40 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
DeepSeek-R1-0528 |
deepseek-r1-0528
|
0.50 |
2.15 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
DeepSeek-R1-0528-Turbo |
deepseek-r1-0528-turbo
|
1.00 |
3.00 |
Source: deepinfra, Context: 32768
|
|
|
deepinfra
|
DeepSeek-R1-Distill-Llama-70B |
deepseek-r1-distill-llama-70b
|
0.20 |
0.60 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
DeepSeek-R1-Distill-Qwen-32B |
deepseek-r1-distill-qwen-32b
|
0.27 |
0.27 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
DeepSeek-R1-Turbo |
deepseek-r1-turbo
|
1.00 |
3.00 |
Source: deepinfra, Context: 40960
|
|
|
deepinfra
|
DeepSeek-V3 |
deepseek-v3
|
0.38 |
0.89 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
DeepSeek-V3-0324 |
deepseek-v3-0324
|
0.25 |
0.88 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
DeepSeek-V3.1 |
deepseek-v3.1
|
0.27 |
1.00 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
DeepSeek-V3.1-Terminus |
deepseek-v3.1-terminus
|
0.27 |
1.00 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
gemini-2.0-flash-001 |
gemini-2.0-flash-001
|
0.10 |
0.40 |
Source: deepinfra, Context: 1000000
|
|
|
deepinfra
|
gemini-2.5-flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Source: deepinfra, Context: 1000000
|
|
|
deepinfra
|
gemini-2.5-pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Source: deepinfra, Context: 1000000
|
|
|
deepinfra
|
gemma-3-12b-it |
gemma-3-12b-it
|
0.05 |
0.10 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
gemma-3-27b-it |
gemma-3-27b-it
|
0.09 |
0.16 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
gemma-3-4b-it |
gemma-3-4b-it
|
0.04 |
0.08 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-3.2-11B-Vision-Instruct |
llama-3.2-11b-vision-instruct
|
0.05 |
0.05 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-3.2-3B-Instruct |
llama-3.2-3b-instruct
|
0.02 |
0.02 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-3.3-70B-Instruct |
llama-3.3-70b-instruct
|
0.23 |
0.40 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-3.3-70B-Instruct-Turbo |
llama-3.3-70b-instruct-turbo
|
0.13 |
0.39 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-4-Maverick-17B-128E-Instruct-FP8 |
llama-4-maverick-17b-128e-instruct-fp8
|
0.15 |
0.60 |
Source: deepinfra, Context: 1048576
|
|
|
deepinfra
|
Llama-4-Scout-17B-16E-Instruct |
llama-4-scout-17b-16e-instruct
|
0.08 |
0.30 |
Source: deepinfra, Context: 327680
|
|
|
deepinfra
|
Llama-Guard-3-8B |
llama-guard-3-8b
|
0.06 |
0.06 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-Guard-4-12B |
llama-guard-4-12b
|
0.18 |
0.18 |
Source: deepinfra, Context: 163840
|
|
|
deepinfra
|
Meta-Llama-3-8B-Instruct |
meta-llama-3-8b-instruct
|
0.03 |
0.06 |
Source: deepinfra, Context: 8192
|
|
|
deepinfra
|
Meta-Llama-3.1-70B-Instruct |
meta-llama-3.1-70b-instruct
|
0.40 |
0.40 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Meta-Llama-3.1-70B-Instruct-Turbo |
meta-llama-3.1-70b-instruct-turbo
|
0.10 |
0.28 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Meta-Llama-3.1-8B-Instruct |
meta-llama-3.1-8b-instruct
|
0.03 |
0.05 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Meta-Llama-3.1-8B-Instruct-Turbo |
meta-llama-3.1-8b-instruct-turbo
|
0.02 |
0.03 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
WizardLM-2-8x22B |
wizardlm-2-8x22b
|
0.48 |
0.48 |
Source: deepinfra, Context: 65536
|
|
|
deepinfra
|
phi-4 |
phi-4
|
0.07 |
0.14 |
Source: deepinfra, Context: 16384
|
|
|
deepinfra
|
Mistral-Nemo-Instruct-2407 |
mistral-nemo-instruct-2407
|
0.02 |
0.04 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Mistral-Small-24B-Instruct-2501 |
mistral-small-24b-instruct-2501
|
0.05 |
0.08 |
Source: deepinfra, Context: 32768
|
|
|
deepinfra
|
Mistral-Small-3.2-24B-Instruct-2506 |
mistral-small-3.2-24b-instruct-2506
|
0.08 |
0.20 |
Source: deepinfra, Context: 128000
|
|
|
deepinfra
|
Mixtral-8x7B-Instruct-v0.1 |
mixtral-8x7b-instruct-v0.1
|
0.40 |
0.40 |
Source: deepinfra, Context: 32768
|
|
|
deepinfra
|
Kimi-K2-Instruct-0905 |
kimi-k2-instruct-0905
|
0.50 |
2.00 |
Source: deepinfra, Context: 262144
|
|
|
deepinfra
|
Llama-3.1-Nemotron-70B-Instruct |
llama-3.1-nemotron-70b-instruct
|
0.60 |
0.60 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
Llama-3.3-Nemotron-Super-49B-v1.5 |
llama-3.3-nemotron-super-49b-v1.5
|
0.10 |
0.40 |
Source: deepinfra, Context: 131072
|
|
|
deepinfra
|
NVIDIA-Nemotron-Nano-9B-v2 |
nvidia-nemotron-nano-9b-v2
|
0.04 |
0.16 |
Source: deepinfra, Context: 131072
|
|
|
deepseek
|
deepseek-coder |
deepseek-coder
|
0.14 |
0.28 |
Source: deepseek, Context: 128000
|
|
|
deepseek
|
deepseek-r1 |
deepseek-r1
|
0.55 |
2.19 |
Source: deepseek, Context: 65536
|
|
|
deepseek
|
deepseek-v3 |
deepseek-v3
|
0.27 |
1.10 |
Source: deepseek, Context: 65536
|
|
|
deepseek
|
deepseek-v3.2 |
deepseek-v3.2
|
0.28 |
0.40 |
Source: deepseek, Context: 163840
|
|
|
bedrockconverse
|
deepseek.v3-v1:0 |
deepseek.v3-v1:0
|
0.58 |
1.68 |
Source: bedrock_converse, Context: 163840
|
|
|
nlpcloud
|
dolphin |
dolphin
|
0.50 |
0.50 |
Source: nlp_cloud, Context: 16384
|
|
|
volcengine
|
doubao-embedding |
doubao-embedding
|
0.00 |
0.00 |
Source: volcengine, Context: 4096
|
|
|
volcengine
|
doubao-embedding-large |
doubao-embedding-large
|
0.00 |
0.00 |
Source: volcengine, Context: 4096
|
|
|
volcengine
|
doubao-embedding-large-text-240915 |
doubao-embedding-large-text-240915
|
0.00 |
0.00 |
Source: volcengine, Context: 4096
|
|
|
volcengine
|
doubao-embedding-large-text-250515 |
doubao-embedding-large-text-250515
|
0.00 |
0.00 |
Source: volcengine, Context: 4096
|
|
|
volcengine
|
doubao-embedding-text-240715 |
doubao-embedding-text-240715
|
0.00 |
0.00 |
Source: volcengine, Context: 4096
|
|
|
exaai
|
search |
search
|
0.00 |
0.00 |
Source: exa_ai, Context: N/A
|
|
|
firecrawl
|
search |
search
|
0.00 |
0.00 |
Source: firecrawl, Context: N/A
|
|
|
perplexity
|
search |
search
|
0.00 |
0.00 |
Source: perplexity, Context: N/A
|
|
|
searxng
|
search |
search
|
0.00 |
0.00 |
Source: searxng, Context: N/A
|
|
|
elevenlabs
|
scribe_v1 |
scribe_v1
|
0.00 |
0.00 |
Source: elevenlabs, Context: N/A
|
|
|
elevenlabs
|
scribe_v1_experimental |
scribe_v1_experimental
|
0.00 |
0.00 |
Source: elevenlabs, Context: N/A
|
|
|
cohere
|
embed-english-light-v2.0 |
embed-english-light-v2.0
|
0.10 |
0.00 |
Source: cohere, Context: 1024
|
|
|
cohere
|
embed-english-light-v3.0 |
embed-english-light-v3.0
|
0.10 |
0.00 |
Source: cohere, Context: 1024
|
|
|
cohere
|
embed-english-v2.0 |
embed-english-v2.0
|
0.10 |
0.00 |
Source: cohere, Context: 4096
|
|
|
cohere
|
embed-english-v3.0 |
embed-english-v3.0
|
0.10 |
0.00 |
Source: cohere, Context: 1024
|
|
|
cohere
|
embed-multilingual-v2.0 |
embed-multilingual-v2.0
|
0.10 |
0.00 |
Source: cohere, Context: 768
|
|
|
cohere
|
embed-multilingual-v3.0 |
embed-multilingual-v3.0
|
0.10 |
0.00 |
Source: cohere, Context: 1024
|
|
|
cohere
|
embed-multilingual-light-v3.0 |
embed-multilingual-light-v3.0
|
100.00 |
0.00 |
Source: cohere, Context: 1024
|
|
|
bedrockconverse
|
eu.amazon.nova-lite-v1:0 |
eu.amazon.nova-lite-v1:0
|
0.08 |
0.31 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrockconverse
|
eu.amazon.nova-micro-v1:0 |
eu.amazon.nova-micro-v1:0
|
0.05 |
0.18 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
eu.amazon.nova-pro-v1:0 |
eu.amazon.nova-pro-v1:0
|
1.05 |
4.20 |
Source: bedrock_converse, Context: 300000
|
|
|
bedrock
|
eu.anthropic.claude-3-5-haiku-20241022-v1:0 |
eu.anthropic.claude-3-5-haiku-20241022-v1:0
|
0.25 |
1.25 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
eu.anthropic.claude-haiku-4-5-20251001-v1:0 |
eu.anthropic.claude-haiku-4-5-20251001-v1:0
|
1.10 |
5.50 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-5-sonnet-20240620-v1:0 |
eu.anthropic.claude-3-5-sonnet-20240620-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-5-sonnet-20241022-v2:0 |
eu.anthropic.claude-3-5-sonnet-20241022-v2:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-7-sonnet-20250219-v1:0 |
eu.anthropic.claude-3-7-sonnet-20250219-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-haiku-20240307-v1:0 |
eu.anthropic.claude-3-haiku-20240307-v1:0
|
0.25 |
1.25 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-opus-20240229-v1:0 |
eu.anthropic.claude-3-opus-20240229-v1:0
|
15.00 |
75.00 |
Source: bedrock, Context: 200000
|
|
|
bedrock
|
eu.anthropic.claude-3-sonnet-20240229-v1:0 |
eu.anthropic.claude-3-sonnet-20240229-v1:0
|
3.00 |
15.00 |
Source: bedrock, Context: 200000
|
|
|
bedrockconverse
|
eu.anthropic.claude-opus-4-1-20250805-v1:0 |
eu.anthropic.claude-opus-4-1-20250805-v1:0
|
15.00 |
75.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
eu.anthropic.claude-opus-4-20250514-v1:0 |
eu.anthropic.claude-opus-4-20250514-v1:0
|
15.00 |
75.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
eu.anthropic.claude-sonnet-4-20250514-v1:0 |
eu.anthropic.claude-sonnet-4-20250514-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
eu.anthropic.claude-sonnet-4-5-20250929-v1:0 |
eu.anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.30 |
16.50 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrock
|
eu.meta.llama3-2-1b-instruct-v1:0 |
eu.meta.llama3-2-1b-instruct-v1:0
|
0.13 |
0.13 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
eu.meta.llama3-2-3b-instruct-v1:0 |
eu.meta.llama3-2-3b-instruct-v1:0
|
0.19 |
0.19 |
Source: bedrock, Context: 128000
|
|
|
bedrockconverse
|
eu.mistral.pixtral-large-2502-v1:0 |
eu.mistral.pixtral-large-2502-v1:0
|
2.00 |
6.00 |
Source: bedrock_converse, Context: 128000
|
|
|
falai
|
3.2 |
3.2
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
v1.1 |
v1.1
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
v1.1-ultra |
v1.1-ultra
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
schnell |
schnell
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
text-to-image |
text-to-image
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
v3 |
v3
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
preview |
preview
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
fast |
fast
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
ultra |
ultra
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
falai
|
stable-diffusion-v35-medium |
stable-diffusion-v35-medium
|
0.00 |
0.00 |
Source: fal_ai, Context: N/A
|
|
|
featherlessai
|
Qwerky-72B |
qwerky-72b
|
0.00 |
0.00 |
Source: featherless_ai, Context: 32768
|
|
|
featherlessai
|
Qwerky-QwQ-32B |
qwerky-qwq-32b
|
0.00 |
0.00 |
Source: featherless_ai, Context: 32768
|
|
|
fireworksai
|
fireworks-ai-4.1b-to-16b |
fireworks-ai-4.1b-to-16b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksai
|
fireworks-ai-56b-to-176b |
fireworks-ai-56b-to-176b
|
1.20 |
1.20 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksai
|
fireworks-ai-above-16b |
fireworks-ai-above-16b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksai
|
fireworks-ai-default |
fireworks-ai-default
|
0.00 |
0.00 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksaiembeddingmodels
|
fireworks-ai-embedding-150m-to-350m |
fireworks-ai-embedding-150m-to-350m
|
0.02 |
0.00 |
Source: fireworks_ai-embedding-models, Context: N/A
|
|
|
fireworksaiembeddingmodels
|
fireworks-ai-embedding-up-to-150m |
fireworks-ai-embedding-up-to-150m
|
0.01 |
0.00 |
Source: fireworks_ai-embedding-models, Context: N/A
|
|
|
fireworksai
|
fireworks-ai-moe-up-to-56b |
fireworks-ai-moe-up-to-56b
|
0.50 |
0.50 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksai
|
fireworks-ai-up-to-4b |
fireworks-ai-up-to-4b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: N/A
|
|
|
fireworksaiembeddingmodels
|
UAE-Large-V1 |
uae-large-v1
|
0.02 |
0.00 |
Source: fireworks_ai-embedding-models, Context: 512
|
|
|
fireworksai
|
deepseek-coder-v2-instruct |
deepseek-coder-v2-instruct
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 65536
|
|
|
fireworksai
|
deepseek-r1 |
deepseek-r1
|
3.00 |
8.00 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
deepseek-r1-basic |
deepseek-r1-basic
|
0.55 |
2.19 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
deepseek-v3 |
deepseek-v3
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
deepseek-v3p1-terminus |
deepseek-v3p1-terminus
|
0.56 |
1.68 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
firefunction-v2 |
firefunction-v2
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
kimi-k2-instruct-0905 |
kimi-k2-instruct-0905
|
0.60 |
2.50 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
llama-v3p1-405b-instruct |
llama-v3p1-405b-instruct
|
3.00 |
3.00 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
llama-v3p1-8b-instruct |
llama-v3p1-8b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
llama-v3p2-11b-vision-instruct |
llama-v3p2-11b-vision-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
llama-v3p2-1b-instruct |
llama-v3p2-1b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
llama-v3p2-3b-instruct |
llama-v3p2-3b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
llama-v3p2-90b-vision-instruct |
llama-v3p2-90b-vision-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
llama4-maverick-instruct-basic |
llama4-maverick-instruct-basic
|
0.22 |
0.88 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama4-scout-instruct-basic |
llama4-scout-instruct-basic
|
0.15 |
0.60 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
mixtral-8x22b-instruct-hf |
mixtral-8x22b-instruct-hf
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 65536
|
|
|
fireworksai
|
qwen2-72b-instruct |
qwen2-72b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-32b-instruct |
qwen2p5-coder-32b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
yi-large |
yi-large
|
3.00 |
3.00 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksaiembeddingmodels
|
nomic-embed-text-v1 |
nomic-embed-text-v1
|
0.01 |
0.00 |
Source: fireworks_ai-embedding-models, Context: 8192
|
|
|
fireworksaiembeddingmodels
|
nomic-embed-text-v1.5 |
nomic-embed-text-v1.5
|
0.01 |
0.00 |
Source: fireworks_ai-embedding-models, Context: 8192
|
|
|
fireworksaiembeddingmodels
|
gte-base |
gte-base
|
0.01 |
0.00 |
Source: fireworks_ai-embedding-models, Context: 512
|
|
|
fireworksaiembeddingmodels
|
gte-large |
gte-large
|
0.02 |
0.00 |
Source: fireworks_ai-embedding-models, Context: 512
|
|
|
friendliai
|
meta-llama-3.1-70b-instruct |
meta-llama-3.1-70b-instruct
|
0.60 |
0.60 |
Source: friendliai, Context: 8192
|
|
|
friendliai
|
meta-llama-3.1-8b-instruct |
meta-llama-3.1-8b-instruct
|
0.10 |
0.10 |
Source: friendliai, Context: 8192
|
|
|
textcompletionopenai
|
ft:babbage-002 |
ft:babbage-002
|
1.60 |
1.60 |
Source: text-completion-openai, Context: 16384
|
|
|
textcompletionopenai
|
ft:davinci-002 |
ft:davinci-002
|
12.00 |
12.00 |
Source: text-completion-openai, Context: 16384
|
|
|
openai
|
ft:gpt-3.5-turbo |
ft:gpt-3.5-turbo
|
3.00 |
6.00 |
Source: openai, Context: 16385
|
|
|
openai
|
ft:gpt-3.5-turbo-0125 |
ft:gpt-3.5-turbo-0125
|
3.00 |
6.00 |
Source: openai, Context: 16385
|
|
|
openai
|
ft:gpt-3.5-turbo-0613 |
ft:gpt-3.5-turbo-0613
|
3.00 |
6.00 |
Source: openai, Context: 4096
|
|
|
openai
|
ft:gpt-3.5-turbo-1106 |
ft:gpt-3.5-turbo-1106
|
3.00 |
6.00 |
Source: openai, Context: 16385
|
|
|
openai
|
ft:gpt-4-0613 |
ft:gpt-4-0613
|
30.00 |
60.00 |
Source: openai, Context: 8192
|
|
|
openai
|
ft:gpt-4o-2024-08-06 |
ft:gpt-4o-2024-08-06
|
3.75 |
15.00 |
Source: openai, Context: 128000
|
|
|
openai
|
ft:gpt-4o-2024-11-20 |
ft:gpt-4o-2024-11-20
|
3.75 |
15.00 |
Source: openai, Context: 128000
|
|
|
openai
|
ft:gpt-4o-mini-2024-07-18 |
ft:gpt-4o-mini-2024-07-18
|
0.30 |
1.20 |
Source: openai, Context: 128000
|
|
|
openai
|
ft:gpt-4.1-2025-04-14 |
ft:gpt-4.1-2025-04-14
|
3.00 |
12.00 |
Source: openai, Context: 1047576
|
|
|
openai
|
ft:gpt-4.1-mini-2025-04-14 |
ft:gpt-4.1-mini-2025-04-14
|
0.80 |
3.20 |
Source: openai, Context: 1047576
|
|
|
openai
|
ft:gpt-4.1-nano-2025-04-14 |
ft:gpt-4.1-nano-2025-04-14
|
0.20 |
0.80 |
Source: openai, Context: 1047576
|
|
|
openai
|
ft:o4-mini-2025-04-16 |
ft:o4-mini-2025-04-16
|
4.00 |
16.00 |
Source: openai, Context: 200000
|
|
|
vertex
|
gemini-1.0-pro |
gemini-1.0-pro
|
0.50 |
1.50 |
Source: vertex, Context: 32760
|
|
|
vertex
|
gemini-1.0-pro-001 |
gemini-1.0-pro-001
|
0.50 |
1.50 |
Source: vertex, Context: 32760
|
|
|
vertex
|
gemini-1.0-pro-002 |
gemini-1.0-pro-002
|
0.50 |
1.50 |
Source: vertex, Context: 32760
|
|
|
vertex
|
gemini-1.0-pro-vision |
gemini-1.0-pro-vision
|
0.50 |
1.50 |
Source: vertex, Context: 16384
|
|
|
vertex
|
gemini-1.0-pro-vision-001 |
gemini-1.0-pro-vision-001
|
0.50 |
1.50 |
Source: vertex, Context: 16384
|
|
|
vertex
|
gemini-1.0-ultra |
gemini-1.0-ultra
|
0.50 |
1.50 |
Source: vertex, Context: 8192
|
|
|
vertex
|
gemini-1.0-ultra-001 |
gemini-1.0-ultra-001
|
0.50 |
1.50 |
Source: vertex, Context: 8192
|
|
|
vertex
|
gemini-1.5-flash |
gemini-1.5-flash
|
0.08 |
0.30 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-flash-001 |
gemini-1.5-flash-001
|
0.08 |
0.30 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-flash-002 |
gemini-1.5-flash-002
|
0.08 |
0.30 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-1.5-flash-exp-0827 |
gemini-1.5-flash-exp-0827
|
0.00 |
0.00 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-flash-preview-0514 |
gemini-1.5-flash-preview-0514
|
0.08 |
0.00 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-pro |
gemini-1.5-pro
|
1.25 |
5.00 |
Source: vertex, Context: 2097152
|
|
|
vertex
|
gemini-1.5-pro-001 |
gemini-1.5-pro-001
|
1.25 |
5.00 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-pro-002 |
gemini-1.5-pro-002
|
1.25 |
5.00 |
Source: vertex, Context: 2097152
|
|
|
vertex
|
gemini-1.5-pro-preview-0215 |
gemini-1.5-pro-preview-0215
|
0.08 |
0.31 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-pro-preview-0409 |
gemini-1.5-pro-preview-0409
|
0.08 |
0.31 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-1.5-pro-preview-0514 |
gemini-1.5-pro-preview-0514
|
0.08 |
0.31 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-2.0-flash |
gemini-2.0-flash
|
0.10 |
0.40 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-001 |
gemini-2.0-flash-001
|
0.15 |
0.60 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-exp |
gemini-2.0-flash-exp
|
0.15 |
0.60 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-lite |
gemini-2.0-flash-lite
|
0.08 |
0.30 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-lite-001 |
gemini-2.0-flash-lite-001
|
0.08 |
0.30 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-live-preview-04-09 |
gemini-2.0-flash-live-preview-04-09
|
0.50 |
2.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-preview-image-generation |
gemini-2.0-flash-preview-image-generation
|
0.10 |
0.40 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-thinking-exp |
gemini-2.0-flash-thinking-exp
|
0.00 |
0.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-flash-thinking-exp-01-21 |
gemini-2.0-flash-thinking-exp-01-21
|
0.00 |
0.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.0-pro-exp-02-05 |
gemini-2.0-pro-exp-02-05
|
1.25 |
10.00 |
Source: vertex, Context: 2097152
|
|
|
vertex
|
gemini-2.5-flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-image |
gemini-2.5-flash-image
|
0.30 |
2.50 |
Source: vertex, Context: 32768
|
|
|
vertex
|
gemini-2.5-flash-image-preview |
gemini-2.5-flash-image-preview
|
0.30 |
30.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-3-pro-image-preview |
gemini-3-pro-image-preview
|
2.00 |
12.00 |
Source: vertex, Context: 65536
|
|
|
vertex
|
gemini-2.5-flash-lite |
gemini-2.5-flash-lite
|
0.10 |
0.40 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-lite-preview-09-2025 |
gemini-2.5-flash-lite-preview-09-2025
|
0.10 |
0.40 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-preview-09-2025 |
gemini-2.5-flash-preview-09-2025
|
0.30 |
2.50 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-live-2.5-flash-preview-native-audio-09-2025 |
gemini-live-2.5-flash-preview-native-audio-09-2025
|
0.30 |
2.00 |
Source: vertex, Context: 1048576
|
|
|
gemini
|
gemini-live-2.5-flash-preview-native-audio-09-2025 |
gemini-live-2.5-flash-preview-native-audio-09-2025
|
0.30 |
2.00 |
Source: gemini, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-lite-preview-06-17 |
gemini-2.5-flash-lite-preview-06-17
|
0.10 |
0.40 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-preview-04-17 |
gemini-2.5-flash-preview-04-17
|
0.15 |
0.60 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-flash-preview-05-20 |
gemini-2.5-flash-preview-05-20
|
0.30 |
2.50 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-3-pro-preview |
gemini-3-pro-preview
|
2.00 |
12.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-3-flash-preview |
gemini-3-flash-preview
|
0.50 |
3.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro-exp-03-25 |
gemini-2.5-pro-exp-03-25
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro-preview-03-25 |
gemini-2.5-pro-preview-03-25
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro-preview-05-06 |
gemini-2.5-pro-preview-05-06
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro-preview-06-05 |
gemini-2.5-pro-preview-06-05
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-2.5-pro-preview-tts |
gemini-2.5-pro-preview-tts
|
1.25 |
10.00 |
Source: vertex, Context: 1048576
|
|
|
vertex
|
gemini-embedding-001 |
gemini-embedding-001
|
0.15 |
0.00 |
Source: vertex, Context: 2048
|
|
|
vertex
|
gemini-flash-experimental |
gemini-flash-experimental
|
0.00 |
0.00 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-pro |
gemini-pro
|
0.50 |
1.50 |
Source: vertex, Context: 32760
|
|
|
vertex
|
gemini-pro-experimental |
gemini-pro-experimental
|
0.00 |
0.00 |
Source: vertex, Context: 1000000
|
|
|
vertex
|
gemini-pro-vision |
gemini-pro-vision
|
0.50 |
1.50 |
Source: vertex, Context: 16384
|
|
|
gemini
|
gemini-embedding-001 |
gemini-embedding-001
|
0.15 |
0.00 |
Source: gemini, Context: 2048
|
|
|
gemini
|
gemini-1.5-flash |
gemini-1.5-flash
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-001 |
gemini-1.5-flash-001
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-002 |
gemini-1.5-flash-002
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-8b |
gemini-1.5-flash-8b
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-8b-exp-0827 |
gemini-1.5-flash-8b-exp-0827
|
0.00 |
0.00 |
Source: gemini, Context: 1000000
|
|
|
gemini
|
gemini-1.5-flash-8b-exp-0924 |
gemini-1.5-flash-8b-exp-0924
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-exp-0827 |
gemini-1.5-flash-exp-0827
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-flash-latest |
gemini-1.5-flash-latest
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-1.5-pro |
gemini-1.5-pro
|
3.50 |
10.50 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-1.5-pro-001 |
gemini-1.5-pro-001
|
3.50 |
10.50 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-1.5-pro-002 |
gemini-1.5-pro-002
|
3.50 |
10.50 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-1.5-pro-exp-0801 |
gemini-1.5-pro-exp-0801
|
3.50 |
10.50 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-1.5-pro-exp-0827 |
gemini-1.5-pro-exp-0827
|
0.00 |
0.00 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-1.5-pro-latest |
gemini-1.5-pro-latest
|
3.50 |
1.05 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash |
gemini-2.0-flash
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-001 |
gemini-2.0-flash-001
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-exp |
gemini-2.0-flash-exp
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-lite |
gemini-2.0-flash-lite
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-lite-preview-02-05 |
gemini-2.0-flash-lite-preview-02-05
|
0.08 |
0.30 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-live-001 |
gemini-2.0-flash-live-001
|
0.35 |
1.50 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-preview-image-generation |
gemini-2.0-flash-preview-image-generation
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-thinking-exp |
gemini-2.0-flash-thinking-exp
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-flash-thinking-exp-01-21 |
gemini-2.0-flash-thinking-exp-01-21
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.0-pro-exp-02-05 |
gemini-2.0-pro-exp-02-05
|
0.00 |
0.00 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-2.5-flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-image-preview |
gemini-2.5-flash-image-preview
|
0.30 |
30.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-3-pro-image-preview |
gemini-3-pro-image-preview
|
2.00 |
12.00 |
Source: gemini, Context: 65536
|
|
|
gemini
|
gemini-2.5-flash-lite |
gemini-2.5-flash-lite
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-lite-preview-09-2025 |
gemini-2.5-flash-lite-preview-09-2025
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-preview-09-2025 |
gemini-2.5-flash-preview-09-2025
|
0.30 |
2.50 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-flash-latest |
gemini-flash-latest
|
0.30 |
2.50 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-flash-lite-latest |
gemini-flash-lite-latest
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-lite-preview-06-17 |
gemini-2.5-flash-lite-preview-06-17
|
0.10 |
0.40 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-preview-04-17 |
gemini-2.5-flash-preview-04-17
|
0.15 |
0.60 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-preview-05-20 |
gemini-2.5-flash-preview-05-20
|
0.30 |
2.50 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-flash-preview-tts |
gemini-2.5-flash-preview-tts
|
0.15 |
0.60 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-computer-use-preview-10-2025 |
gemini-2.5-computer-use-preview-10-2025
|
1.25 |
10.00 |
Source: gemini, Context: 128000
|
|
|
gemini
|
gemini-3-pro-preview |
gemini-3-pro-preview
|
2.00 |
12.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-3-flash-preview |
gemini-3-flash-preview
|
0.50 |
3.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro-exp-03-25 |
gemini-2.5-pro-exp-03-25
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro-preview-03-25 |
gemini-2.5-pro-preview-03-25
|
1.25 |
10.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro-preview-05-06 |
gemini-2.5-pro-preview-05-06
|
1.25 |
10.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro-preview-06-05 |
gemini-2.5-pro-preview-06-05
|
1.25 |
10.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-2.5-pro-preview-tts |
gemini-2.5-pro-preview-tts
|
1.25 |
10.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-exp-1114 |
gemini-exp-1114
|
0.00 |
0.00 |
Source: gemini, Context: 1048576
|
|
|
gemini
|
gemini-exp-1206 |
gemini-exp-1206
|
0.00 |
0.00 |
Source: gemini, Context: 2097152
|
|
|
gemini
|
gemini-gemma-2-27b-it |
gemini-gemma-2-27b-it
|
0.35 |
1.05 |
Source: gemini, Context: 8192
|
|
|
gemini
|
gemini-gemma-2-9b-it |
gemini-gemma-2-9b-it
|
0.35 |
1.05 |
Source: gemini, Context: 8192
|
|
|
gemini
|
gemini-pro |
gemini-pro
|
0.35 |
1.05 |
Source: gemini, Context: 32760
|
|
|
gemini
|
gemini-pro-vision |
gemini-pro-vision
|
0.35 |
1.05 |
Source: gemini, Context: 30720
|
|
|
gemini
|
gemma-3-27b-it |
gemma-3-27b-it
|
0.00 |
0.00 |
Source: gemini, Context: 131072
|
|
|
gemini
|
imagen-3.0-fast-generate-001 |
imagen-3.0-fast-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
imagen-3.0-generate-001 |
imagen-3.0-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
imagen-3.0-generate-002 |
imagen-3.0-generate-002
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
imagen-4.0-fast-generate-001 |
imagen-4.0-fast-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
imagen-4.0-generate-001 |
imagen-4.0-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
imagen-4.0-ultra-generate-001 |
imagen-4.0-ultra-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: N/A
|
|
|
gemini
|
learnlm-1.5-pro-experimental |
learnlm-1.5-pro-experimental
|
0.00 |
0.00 |
Source: gemini, Context: 32767
|
|
|
gemini
|
veo-2.0-generate-001 |
veo-2.0-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.0-fast-generate-preview |
veo-3.0-fast-generate-preview
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.0-generate-preview |
veo-3.0-generate-preview
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.1-fast-generate-preview |
veo-3.1-fast-generate-preview
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.1-generate-preview |
veo-3.1-generate-preview
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.1-fast-generate-001 |
veo-3.1-fast-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
gemini
|
veo-3.1-generate-001 |
veo-3.1-generate-001
|
0.00 |
0.00 |
Source: gemini, Context: 1024
|
|
|
githubcopilot
|
gpt-3.5-turbo |
gpt-3.5-turbo
|
0.00 |
0.00 |
Source: github_copilot, Context: 16384
|
|
|
githubcopilot
|
gpt-3.5-turbo-0613 |
gpt-3.5-turbo-0613
|
0.00 |
0.00 |
Source: github_copilot, Context: 16384
|
|
|
githubcopilot
|
gpt-4 |
gpt-4
|
0.00 |
0.00 |
Source: github_copilot, Context: 32768
|
|
|
githubcopilot
|
gpt-4-0613 |
gpt-4-0613
|
0.00 |
0.00 |
Source: github_copilot, Context: 32768
|
|
|
githubcopilot
|
gpt-4-o-preview |
gpt-4-o-preview
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
gpt-4.1-2025-04-14 |
gpt-4.1-2025-04-14
|
0.00 |
0.00 |
Source: github_copilot, Context: 128000
|
|
|
githubcopilot
|
gpt-41-copilot |
gpt-41-copilot
|
0.00 |
0.00 |
Source: github_copilot, Context: N/A
|
|
|
githubcopilot
|
gpt-4o-2024-05-13 |
gpt-4o-2024-05-13
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
gpt-4o-2024-08-06 |
gpt-4o-2024-08-06
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
gpt-4o-2024-11-20 |
gpt-4o-2024-11-20
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
gpt-4o-mini |
gpt-4o-mini
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
gpt-4o-mini-2024-07-18 |
gpt-4o-mini-2024-07-18
|
0.00 |
0.00 |
Source: github_copilot, Context: 64000
|
|
|
githubcopilot
|
text-embedding-3-small |
text-embedding-3-small
|
0.00 |
0.00 |
Source: github_copilot, Context: 8191
|
|
|
githubcopilot
|
text-embedding-3-small-inference |
text-embedding-3-small-inference
|
0.00 |
0.00 |
Source: github_copilot, Context: 8191
|
|
|
githubcopilot
|
text-embedding-ada-002 |
text-embedding-ada-002
|
0.00 |
0.00 |
Source: github_copilot, Context: 8191
|
|
|
bedrockconverse
|
google.gemma-3-12b-it |
google.gemma-3-12b-it
|
0.09 |
0.29 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
google.gemma-3-27b-it |
google.gemma-3-27b-it
|
0.23 |
0.38 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
google.gemma-3-4b-it |
google.gemma-3-4b-it
|
0.04 |
0.08 |
Source: bedrock_converse, Context: 128000
|
|
|
googlepse
|
search |
search
|
0.00 |
0.00 |
Source: google_pse, Context: N/A
|
|
|
bedrockconverse
|
global.anthropic.claude-sonnet-4-5-20250929-v1:0 |
global.anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
global.anthropic.claude-sonnet-4-20250514-v1:0 |
global.anthropic.claude-sonnet-4-20250514-v1:0
|
3.00 |
15.00 |
Source: bedrock_converse, Context: 1000000
|
|
|
bedrockconverse
|
global.anthropic.claude-haiku-4-5-20251001-v1:0 |
global.anthropic.claude-haiku-4-5-20251001-v1:0
|
1.00 |
5.00 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
global.amazon.nova-2-lite-v1:0 |
global.amazon.nova-2-lite-v1:0
|
0.30 |
2.50 |
Source: bedrock_converse, Context: 1000000
|
|
|
openai
|
gpt-3.5-turbo-0125 |
gpt-3.5-turbo-0125
|
0.50 |
1.50 |
Source: openai, Context: 16385
|
|
|
openai
|
gpt-3.5-turbo-0301 |
gpt-3.5-turbo-0301
|
1.50 |
2.00 |
Source: openai, Context: 4097
|
|
|
openai
|
gpt-3.5-turbo-0613 |
gpt-3.5-turbo-0613
|
1.50 |
2.00 |
Source: openai, Context: 4097
|
|
|
openai
|
gpt-3.5-turbo-1106 |
gpt-3.5-turbo-1106
|
1.00 |
2.00 |
Source: openai, Context: 16385
|
|
|
openai
|
gpt-3.5-turbo-16k |
gpt-3.5-turbo-16k
|
3.00 |
4.00 |
Source: openai, Context: 16385
|
|
|
openai
|
gpt-3.5-turbo-16k-0613 |
gpt-3.5-turbo-16k-0613
|
3.00 |
4.00 |
Source: openai, Context: 16385
|
|
|
textcompletionopenai
|
gpt-3.5-turbo-instruct |
gpt-3.5-turbo-instruct
|
1.50 |
2.00 |
Source: text-completion-openai, Context: 8192
|
|
|
textcompletionopenai
|
gpt-3.5-turbo-instruct-0914 |
gpt-3.5-turbo-instruct-0914
|
1.50 |
2.00 |
Source: text-completion-openai, Context: 8192
|
|
|
openai
|
gpt-4-0125-preview |
gpt-4-0125-preview
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4-0314 |
gpt-4-0314
|
30.00 |
60.00 |
Source: openai, Context: 8192
|
|
|
openai
|
gpt-4-0613 |
gpt-4-0613
|
30.00 |
60.00 |
Source: openai, Context: 8192
|
|
|
openai
|
gpt-4-1106-preview |
gpt-4-1106-preview
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4-1106-vision-preview |
gpt-4-1106-vision-preview
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4-32k |
gpt-4-32k
|
60.00 |
120.00 |
Source: openai, Context: 32768
|
|
|
openai
|
gpt-4-32k-0314 |
gpt-4-32k-0314
|
60.00 |
120.00 |
Source: openai, Context: 32768
|
|
|
openai
|
gpt-4-32k-0613 |
gpt-4-32k-0613
|
60.00 |
120.00 |
Source: openai, Context: 32768
|
|
|
openai
|
gpt-4-turbo-2024-04-09 |
gpt-4-turbo-2024-04-09
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4-turbo-preview |
gpt-4-turbo-preview
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4-vision-preview |
gpt-4-vision-preview
|
10.00 |
30.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4.1-2025-04-14 |
gpt-4.1-2025-04-14
|
2.00 |
8.00 |
Source: openai, Context: 1047576
|
|
|
openai
|
gpt-4.1-mini-2025-04-14 |
gpt-4.1-mini-2025-04-14
|
0.40 |
1.60 |
Source: openai, Context: 1047576
|
|
|
openai
|
gpt-4.1-nano-2025-04-14 |
gpt-4.1-nano-2025-04-14
|
0.10 |
0.40 |
Source: openai, Context: 1047576
|
|
|
openai
|
gpt-4.5-preview |
gpt-4.5-preview
|
75.00 |
150.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4.5-preview-2025-02-27 |
gpt-4.5-preview-2025-02-27
|
75.00 |
150.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-audio-preview |
gpt-4o-audio-preview
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-audio-preview-2024-10-01 |
gpt-4o-audio-preview-2024-10-01
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-audio-preview-2024-12-17 |
gpt-4o-audio-preview-2024-12-17
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-audio-preview-2025-06-03 |
gpt-4o-audio-preview-2025-06-03
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-2024-07-18 |
gpt-4o-mini-2024-07-18
|
0.15 |
0.60 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-audio-preview |
gpt-4o-mini-audio-preview
|
0.15 |
0.60 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-audio-preview-2024-12-17 |
gpt-4o-mini-audio-preview-2024-12-17
|
0.15 |
0.60 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-realtime-preview |
gpt-4o-mini-realtime-preview
|
0.60 |
2.40 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-realtime-preview-2024-12-17 |
gpt-4o-mini-realtime-preview-2024-12-17
|
0.60 |
2.40 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-search-preview |
gpt-4o-mini-search-preview
|
0.15 |
0.60 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-search-preview-2025-03-11 |
gpt-4o-mini-search-preview-2025-03-11
|
0.15 |
0.60 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-mini-transcribe |
gpt-4o-mini-transcribe
|
1.25 |
5.00 |
Source: openai, Context: 16000
|
|
|
openai
|
gpt-4o-mini-tts |
gpt-4o-mini-tts
|
2.50 |
10.00 |
Source: openai, Context: N/A
|
|
|
openai
|
gpt-4o-realtime-preview |
gpt-4o-realtime-preview
|
5.00 |
20.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-realtime-preview-2024-10-01 |
gpt-4o-realtime-preview-2024-10-01
|
5.00 |
20.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-realtime-preview-2024-12-17 |
gpt-4o-realtime-preview-2024-12-17
|
5.00 |
20.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-realtime-preview-2025-06-03 |
gpt-4o-realtime-preview-2025-06-03
|
5.00 |
20.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-search-preview |
gpt-4o-search-preview
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-search-preview-2025-03-11 |
gpt-4o-search-preview-2025-03-11
|
2.50 |
10.00 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-4o-transcribe |
gpt-4o-transcribe
|
2.50 |
10.00 |
Source: openai, Context: 16000
|
|
|
openai
|
gpt-image-1.5 |
gpt-image-1.5
|
5.00 |
10.00 |
Source: openai, Context: N/A
|
|
|
openai
|
gpt-image-1.5-2025-12-16 |
gpt-image-1.5-2025-12-16
|
5.00 |
10.00 |
Source: openai, Context: N/A
|
|
|
openai
|
gpt-5.1-2025-11-13 |
gpt-5.1-2025-11-13
|
1.25 |
10.00 |
Source: openai, Context: 272000
|
|
|
openai
|
gpt-5.2-2025-12-11 |
gpt-5.2-2025-12-11
|
1.75 |
14.00 |
Source: openai, Context: 400000
|
|
|
openai
|
gpt-5.2-pro-2025-12-11 |
gpt-5.2-pro-2025-12-11
|
21.00 |
168.00 |
Source: openai, Context: 400000
|
|
|
openai
|
gpt-5-pro-2025-10-06 |
gpt-5-pro-2025-10-06
|
15.00 |
120.00 |
Source: openai, Context: 400000
|
|
|
openai
|
gpt-5-2025-08-07 |
gpt-5-2025-08-07
|
1.25 |
10.00 |
Source: openai, Context: 272000
|
|
|
openai
|
gpt-5-chat |
gpt-5-chat
|
1.25 |
10.00 |
Source: openai, Context: 272000
|
|
|
openai
|
gpt-5-mini-2025-08-07 |
gpt-5-mini-2025-08-07
|
0.25 |
2.00 |
Source: openai, Context: 272000
|
|
|
openai
|
gpt-5-nano-2025-08-07 |
gpt-5-nano-2025-08-07
|
0.05 |
0.40 |
Source: openai, Context: 272000
|
|
|
openai
|
gpt-image-1 |
gpt-image-1
|
5.00 |
0.00 |
Source: openai, Context: N/A
|
|
|
openai
|
gpt-image-1-mini |
gpt-image-1-mini
|
2.00 |
0.00 |
Source: openai, Context: N/A
|
|
|
openai
|
gpt-realtime |
gpt-realtime
|
4.00 |
16.00 |
Source: openai, Context: 32000
|
|
|
openai
|
gpt-realtime-mini |
gpt-realtime-mini
|
0.60 |
2.40 |
Source: openai, Context: 128000
|
|
|
openai
|
gpt-realtime-2025-08-28 |
gpt-realtime-2025-08-28
|
4.00 |
16.00 |
Source: openai, Context: 32000
|
|
|
gradientai
|
alibaba-qwen3-32b |
alibaba-qwen3-32b
|
0.00 |
0.00 |
Source: gradient_ai, Context: 2048
|
|
|
gradientai
|
anthropic-claude-3-opus |
anthropic-claude-3-opus
|
15.00 |
75.00 |
Source: gradient_ai, Context: 1024
|
|
|
gradientai
|
anthropic-claude-3.5-haiku |
anthropic-claude-3.5-haiku
|
0.80 |
4.00 |
Source: gradient_ai, Context: 1024
|
|
|
gradientai
|
anthropic-claude-3.5-sonnet |
anthropic-claude-3.5-sonnet
|
3.00 |
15.00 |
Source: gradient_ai, Context: 1024
|
|
|
gradientai
|
anthropic-claude-3.7-sonnet |
anthropic-claude-3.7-sonnet
|
3.00 |
15.00 |
Source: gradient_ai, Context: 1024
|
|
|
gradientai
|
deepseek-r1-distill-llama-70b |
deepseek-r1-distill-llama-70b
|
0.99 |
0.99 |
Source: gradient_ai, Context: 8000
|
|
|
gradientai
|
llama3-8b-instruct |
llama3-8b-instruct
|
0.20 |
0.20 |
Source: gradient_ai, Context: 512
|
|
|
gradientai
|
llama3.3-70b-instruct |
llama3.3-70b-instruct
|
0.65 |
0.65 |
Source: gradient_ai, Context: 2048
|
|
|
gradientai
|
mistral-nemo-instruct-2407 |
mistral-nemo-instruct-2407
|
0.30 |
0.30 |
Source: gradient_ai, Context: 512
|
|
|
gradientai
|
openai-gpt-4o |
openai-gpt-4o
|
0.00 |
0.00 |
Source: gradient_ai, Context: 16384
|
|
|
gradientai
|
openai-gpt-4o-mini |
openai-gpt-4o-mini
|
0.00 |
0.00 |
Source: gradient_ai, Context: 16384
|
|
|
gradientai
|
openai-o3 |
openai-o3
|
2.00 |
8.00 |
Source: gradient_ai, Context: 100000
|
|
|
gradientai
|
openai-o3-mini |
openai-o3-mini
|
1.10 |
4.40 |
Source: gradient_ai, Context: 100000
|
|
|
lemonade
|
Qwen3-Coder-30B-A3B-Instruct-GGUF |
qwen3-coder-30b-a3b-instruct-gguf
|
0.00 |
0.00 |
Source: lemonade, Context: 262144
|
|
|
lemonade
|
gpt-oss-20b-mxfp4-GGUF |
gpt-oss-20b-mxfp4-gguf
|
0.00 |
0.00 |
Source: lemonade, Context: 131072
|
|
|
lemonade
|
gpt-oss-120b-mxfp-GGUF |
gpt-oss-120b-mxfp-gguf
|
0.00 |
0.00 |
Source: lemonade, Context: 131072
|
|
|
lemonade
|
Gemma-3-4b-it-GGUF |
gemma-3-4b-it-gguf
|
0.00 |
0.00 |
Source: lemonade, Context: 128000
|
|
|
lemonade
|
Qwen3-4B-Instruct-2507-GGUF |
qwen3-4b-instruct-2507-gguf
|
0.00 |
0.00 |
Source: lemonade, Context: 262144
|
|
|
amazonnova
|
nova-micro-v1 |
nova-micro-v1
|
0.04 |
0.14 |
Source: amazon_nova, Context: 128000
|
|
|
amazonnova
|
nova-lite-v1 |
nova-lite-v1
|
0.06 |
0.24 |
Source: amazon_nova, Context: 300000
|
|
|
amazonnova
|
nova-premier-v1 |
nova-premier-v1
|
2.50 |
12.50 |
Source: amazon_nova, Context: 1000000
|
|
|
amazonnova
|
nova-pro-v1 |
nova-pro-v1
|
0.80 |
3.20 |
Source: amazon_nova, Context: 300000
|
|
|
groq
|
gemma-7b-it |
gemma-7b-it
|
0.05 |
0.08 |
Source: groq, Context: 8192
|
|
|
groq
|
playai-tts |
playai-tts
|
0.00 |
0.00 |
Source: groq, Context: 10000
|
|
|
groq
|
whisper-large-v3 |
whisper-large-v3
|
0.00 |
0.00 |
Source: groq, Context: N/A
|
|
|
groq
|
whisper-large-v3-turbo |
whisper-large-v3-turbo
|
0.00 |
0.00 |
Source: groq, Context: N/A
|
|
|
openai
|
dall-e-3 |
dall-e-3
|
0.00 |
0.00 |
Source: openai, Context: N/A
|
|
|
heroku
|
claude-3-5-haiku |
claude-3-5-haiku
|
0.00 |
0.00 |
Source: heroku, Context: 4096
|
|
|
heroku
|
claude-3-5-sonnet-latest |
claude-3-5-sonnet-latest
|
0.00 |
0.00 |
Source: heroku, Context: 8192
|
|
|
heroku
|
claude-3-7-sonnet |
claude-3-7-sonnet
|
0.00 |
0.00 |
Source: heroku, Context: 8192
|
|
|
heroku
|
claude-4-sonnet |
claude-4-sonnet
|
0.00 |
0.00 |
Source: heroku, Context: 8192
|
|
|
hyperbolic
|
Hermes-3-Llama-3.1-70B |
hermes-3-llama-3.1-70b
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
QwQ-32B |
qwq-32b
|
0.20 |
0.20 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
Qwen2.5-72B-Instruct |
qwen2.5-72b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
Qwen2.5-Coder-32B-Instruct |
qwen2.5-coder-32b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Qwen3-235B-A22B |
qwen3-235b-a22b
|
2.00 |
2.00 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
DeepSeek-R1 |
deepseek-r1
|
0.40 |
0.40 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
DeepSeek-R1-0528 |
deepseek-r1-0528
|
0.25 |
0.25 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
DeepSeek-V3 |
deepseek-v3
|
0.20 |
0.20 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
DeepSeek-V3-0324 |
deepseek-v3-0324
|
0.40 |
0.40 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Llama-3.2-3B-Instruct |
llama-3.2-3b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Llama-3.3-70B-Instruct |
llama-3.3-70b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
Meta-Llama-3-70B-Instruct |
meta-llama-3-70b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 131072
|
|
|
hyperbolic
|
Meta-Llama-3.1-405B-Instruct |
meta-llama-3.1-405b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Meta-Llama-3.1-70B-Instruct |
meta-llama-3.1-70b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Meta-Llama-3.1-8B-Instruct |
meta-llama-3.1-8b-instruct
|
0.12 |
0.30 |
Source: hyperbolic, Context: 32768
|
|
|
hyperbolic
|
Kimi-K2-Instruct |
kimi-k2-instruct
|
2.00 |
2.00 |
Source: hyperbolic, Context: 131072
|
|
|
ai21
|
j2-light |
j2-light
|
3.00 |
3.00 |
Source: ai21, Context: 8192
|
|
|
ai21
|
j2-mid |
j2-mid
|
10.00 |
10.00 |
Source: ai21, Context: 8192
|
|
|
ai21
|
j2-ultra |
j2-ultra
|
15.00 |
15.00 |
Source: ai21, Context: 8192
|
|
|
ai21
|
jamba-1.5 |
jamba-1.5
|
0.20 |
0.40 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-1.5-large |
jamba-1.5-large
|
2.00 |
8.00 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-1.5-large@001 |
jamba-1.5-large@001
|
2.00 |
8.00 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-1.5-mini |
jamba-1.5-mini
|
0.20 |
0.40 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-1.5-mini@001 |
jamba-1.5-mini@001
|
0.20 |
0.40 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-large-1.6 |
jamba-large-1.6
|
2.00 |
8.00 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-large-1.7 |
jamba-large-1.7
|
2.00 |
8.00 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-mini-1.6 |
jamba-mini-1.6
|
0.20 |
0.40 |
Source: ai21, Context: 256000
|
|
|
ai21
|
jamba-mini-1.7 |
jamba-mini-1.7
|
0.20 |
0.40 |
Source: ai21, Context: 256000
|
|
|
jinaai
|
jina-reranker-v2-base-multilingual |
jina-reranker-v2-base-multilingual
|
0.02 |
0.02 |
Source: jina_ai, Context: 1024
|
|
|
bedrockconverse
|
jp.anthropic.claude-sonnet-4-5-20250929-v1:0 |
jp.anthropic.claude-sonnet-4-5-20250929-v1:0
|
3.30 |
16.50 |
Source: bedrock_converse, Context: 200000
|
|
|
bedrockconverse
|
jp.anthropic.claude-haiku-4-5-20251001-v1:0 |
jp.anthropic.claude-haiku-4-5-20251001-v1:0
|
1.10 |
5.50 |
Source: bedrock_converse, Context: 200000
|
|
|
lambdaai
|
deepseek-llama3.3-70b |
deepseek-llama3.3-70b
|
0.20 |
0.60 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
deepseek-r1-0528 |
deepseek-r1-0528
|
0.20 |
0.60 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
deepseek-r1-671b |
deepseek-r1-671b
|
0.80 |
0.80 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
deepseek-v3-0324 |
deepseek-v3-0324
|
0.20 |
0.60 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
hermes3-405b |
hermes3-405b
|
0.80 |
0.80 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
hermes3-70b |
hermes3-70b
|
0.12 |
0.30 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
hermes3-8b |
hermes3-8b
|
0.03 |
0.04 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
lfm-40b |
lfm-40b
|
0.10 |
0.20 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
lfm-7b |
lfm-7b
|
0.03 |
0.04 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama-4-maverick-17b-128e-instruct-fp8 |
llama-4-maverick-17b-128e-instruct-fp8
|
0.05 |
0.10 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama-4-scout-17b-16e-instruct |
llama-4-scout-17b-16e-instruct
|
0.05 |
0.10 |
Source: lambda_ai, Context: 16384
|
|
|
lambdaai
|
llama3.1-405b-instruct-fp8 |
llama3.1-405b-instruct-fp8
|
0.80 |
0.80 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.1-70b-instruct-fp8 |
llama3.1-70b-instruct-fp8
|
0.12 |
0.30 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.1-8b-instruct |
llama3.1-8b-instruct
|
0.03 |
0.04 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.1-nemotron-70b-instruct-fp8 |
llama3.1-nemotron-70b-instruct-fp8
|
0.12 |
0.30 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.2-11b-vision-instruct |
llama3.2-11b-vision-instruct
|
0.02 |
0.03 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.2-3b-instruct |
llama3.2-3b-instruct
|
0.02 |
0.03 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
llama3.3-70b-instruct-fp8 |
llama3.3-70b-instruct-fp8
|
0.12 |
0.30 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
qwen25-coder-32b-instruct |
qwen25-coder-32b-instruct
|
0.05 |
0.10 |
Source: lambda_ai, Context: 131072
|
|
|
lambdaai
|
qwen3-32b-fp8 |
qwen3-32b-fp8
|
0.05 |
0.10 |
Source: lambda_ai, Context: 131072
|
|
|
alephalpha
|
luminous-base |
luminous-base
|
30.00 |
33.00 |
Source: aleph_alpha, Context: 2048
|
|
|
alephalpha
|
luminous-base-control |
luminous-base-control
|
37.50 |
41.25 |
Source: aleph_alpha, Context: 2048
|
|
|
alephalpha
|
luminous-extended |
luminous-extended
|
45.00 |
49.50 |
Source: aleph_alpha, Context: 2048
|
|
|
alephalpha
|
luminous-extended-control |
luminous-extended-control
|
56.25 |
61.88 |
Source: aleph_alpha, Context: 2048
|
|
|
alephalpha
|
luminous-supreme |
luminous-supreme
|
175.00 |
192.50 |
Source: aleph_alpha, Context: 2048
|
|
|
alephalpha
|
luminous-supreme-control |
luminous-supreme-control
|
218.75 |
240.63 |
Source: aleph_alpha, Context: 2048
|
|
|
vertex
|
medlm-large |
medlm-large
|
0.00 |
0.00 |
Source: vertex, Context: 8192
|
|
|
vertex
|
medlm-medium |
medlm-medium
|
0.00 |
0.00 |
Source: vertex, Context: 32768
|
|
|
bedrock
|
meta.llama2-13b-chat-v1 |
meta.llama2-13b-chat-v1
|
0.75 |
1.00 |
Source: bedrock, Context: 4096
|
|
|
bedrock
|
meta.llama2-70b-chat-v1 |
meta.llama2-70b-chat-v1
|
1.95 |
2.56 |
Source: bedrock, Context: 4096
|
|
|
bedrock
|
meta.llama3-1-405b-instruct-v1:0 |
meta.llama3-1-405b-instruct-v1:0
|
5.32 |
16.00 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-1-70b-instruct-v1:0 |
meta.llama3-1-70b-instruct-v1:0
|
0.99 |
0.99 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-1-8b-instruct-v1:0 |
meta.llama3-1-8b-instruct-v1:0
|
0.22 |
0.22 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-2-11b-instruct-v1:0 |
meta.llama3-2-11b-instruct-v1:0
|
0.35 |
0.35 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-2-1b-instruct-v1:0 |
meta.llama3-2-1b-instruct-v1:0
|
0.10 |
0.10 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-2-3b-instruct-v1:0 |
meta.llama3-2-3b-instruct-v1:0
|
0.15 |
0.15 |
Source: bedrock, Context: 128000
|
|
|
bedrock
|
meta.llama3-2-90b-instruct-v1:0 |
meta.llama3-2-90b-instruct-v1:0
|
2.00 |
2.00 |
Source: bedrock, Context: 128000
|
|
|
bedrockconverse
|
meta.llama3-3-70b-instruct-v1:0 |
meta.llama3-3-70b-instruct-v1:0
|
0.72 |
0.72 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
meta.llama4-maverick-17b-instruct-v1:0 |
meta.llama4-maverick-17b-instruct-v1:0
|
0.24 |
0.97 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
meta.llama4-scout-17b-instruct-v1:0 |
meta.llama4-scout-17b-instruct-v1:0
|
0.17 |
0.66 |
Source: bedrock_converse, Context: 128000
|
|
|
metallama
|
Llama-3.3-70B-Instruct |
llama-3.3-70b-instruct
|
0.00 |
0.00 |
Source: meta_llama, Context: 128000
|
|
|
metallama
|
Llama-3.3-8B-Instruct |
llama-3.3-8b-instruct
|
0.00 |
0.00 |
Source: meta_llama, Context: 128000
|
|
|
metallama
|
Llama-4-Maverick-17B-128E-Instruct-FP8 |
llama-4-maverick-17b-128e-instruct-fp8
|
0.00 |
0.00 |
Source: meta_llama, Context: 1000000
|
|
|
metallama
|
Llama-4-Scout-17B-16E-Instruct-FP8 |
llama-4-scout-17b-16e-instruct-fp8
|
0.00 |
0.00 |
Source: meta_llama, Context: 10000000
|
|
|
bedrockconverse
|
minimax.minimax-m2 |
minimax.minimax-m2
|
0.30 |
1.20 |
Source: bedrock_converse, Context: 128000
|
|
|
minimax
|
speech-02-hd |
speech-02-hd
|
0.00 |
0.00 |
Source: minimax, Context: N/A
|
|
|
minimax
|
speech-02-turbo |
speech-02-turbo
|
0.00 |
0.00 |
Source: minimax, Context: N/A
|
|
|
minimax
|
speech-2.6-hd |
speech-2.6-hd
|
0.00 |
0.00 |
Source: minimax, Context: N/A
|
|
|
minimax
|
speech-2.6-turbo |
speech-2.6-turbo
|
0.00 |
0.00 |
Source: minimax, Context: N/A
|
|
|
minimax
|
MiniMax-M2.1-lightning |
minimax-m2.1-lightning
|
0.30 |
2.40 |
Source: minimax, Context: 1000000
|
|
|
bedrockconverse
|
mistral.magistral-small-2509 |
mistral.magistral-small-2509
|
0.50 |
1.50 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
mistral.ministral-3-14b-instruct |
mistral.ministral-3-14b-instruct
|
0.20 |
0.20 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
mistral.ministral-3-3b-instruct |
mistral.ministral-3-3b-instruct
|
0.10 |
0.10 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
mistral.ministral-3-8b-instruct |
mistral.ministral-3-8b-instruct
|
0.15 |
0.15 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrock
|
mistral.mistral-large-2407-v1:0 |
mistral.mistral-large-2407-v1:0
|
3.00 |
9.00 |
Source: bedrock, Context: 128000
|
|
|
bedrockconverse
|
mistral.mistral-large-3-675b-instruct |
mistral.mistral-large-3-675b-instruct
|
0.50 |
1.50 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrock
|
mistral.mistral-small-2402-v1:0 |
mistral.mistral-small-2402-v1:0
|
1.00 |
3.00 |
Source: bedrock, Context: 32000
|
|
|
bedrockconverse
|
mistral.voxtral-mini-3b-2507 |
mistral.voxtral-mini-3b-2507
|
0.04 |
0.04 |
Source: bedrock_converse, Context: 128000
|
|
|
bedrockconverse
|
mistral.voxtral-small-24b-2507 |
mistral.voxtral-small-24b-2507
|
0.10 |
0.30 |
Source: bedrock_converse, Context: 128000
|
|
|
mistral
|
codestral-2405 |
codestral-2405
|
1.00 |
3.00 |
Source: mistral, Context: 32000
|
|
|
mistral
|
codestral-2508 |
codestral-2508
|
0.30 |
0.90 |
Source: mistral, Context: 256000
|
|
|
mistral
|
codestral-mamba-latest |
codestral-mamba-latest
|
0.25 |
0.25 |
Source: mistral, Context: 256000
|
|
|
mistral
|
magistral-medium-2506 |
magistral-medium-2506
|
2.00 |
5.00 |
Source: mistral, Context: 40000
|
|
|
mistral
|
magistral-medium-2509 |
magistral-medium-2509
|
2.00 |
5.00 |
Source: mistral, Context: 40000
|
|
|
mistral
|
mistral-ocr-latest |
mistral-ocr-latest
|
0.00 |
0.00 |
Source: mistral, Context: N/A
|
|
|
mistral
|
mistral-ocr-2505-completion |
mistral-ocr-2505-completion
|
0.00 |
0.00 |
Source: mistral, Context: N/A
|
|
|
mistral
|
magistral-small-2506 |
magistral-small-2506
|
0.50 |
1.50 |
Source: mistral, Context: 40000
|
|
|
mistral
|
magistral-small-latest |
magistral-small-latest
|
0.50 |
1.50 |
Source: mistral, Context: 40000
|
|
|
mistral
|
codestral-embed |
codestral-embed
|
0.15 |
0.00 |
Source: mistral, Context: 8192
|
|
|
mistral
|
codestral-embed-2505 |
codestral-embed-2505
|
0.15 |
0.00 |
Source: mistral, Context: 8192
|
|
|
mistral
|
mistral-large-2402 |
mistral-large-2402
|
4.00 |
12.00 |
Source: mistral, Context: 32000
|
|
|
mistral
|
mistral-large-2407 |
mistral-large-2407
|
3.00 |
9.00 |
Source: mistral, Context: 128000
|
|
|
mistral
|
mistral-large-3 |
mistral-large-3
|
0.50 |
1.50 |
Source: mistral, Context: 256000
|
|
|
mistral
|
mistral-medium |
mistral-medium
|
2.70 |
8.10 |
Source: mistral, Context: 32000
|
|
|
mistral
|
mistral-medium-2312 |
mistral-medium-2312
|
2.70 |
8.10 |
Source: mistral, Context: 32000
|
|
|
mistral
|
mistral-small |
mistral-small
|
0.10 |
0.30 |
Source: mistral, Context: 32000
|
|
|
mistral
|
mistral-tiny |
mistral-tiny
|
0.25 |
0.25 |
Source: mistral, Context: 32000
|
|
|
mistral
|
open-codestral-mamba |
open-codestral-mamba
|
0.25 |
0.25 |
Source: mistral, Context: 256000
|
|
|
mistral
|
open-mistral-nemo |
open-mistral-nemo
|
0.30 |
0.30 |
Source: mistral, Context: 128000
|
|
|
mistral
|
open-mistral-nemo-2407 |
open-mistral-nemo-2407
|
0.30 |
0.30 |
Source: mistral, Context: 128000
|
|
|
mistral
|
pixtral-12b-2409 |
pixtral-12b-2409
|
0.15 |
0.15 |
Source: mistral, Context: 128000
|
|
|
mistral
|
pixtral-large-2411 |
pixtral-large-2411
|
2.00 |
6.00 |
Source: mistral, Context: 128000
| bedrockconverse | moonshot.kimi-k2-thinking | moonshot.kimi-k2-thinking | 0.60 | 2.50 | bedrock_converse | 128000 |
| moonshot | kimi-k2-0711-preview | kimi-k2-0711-preview | 0.60 | 2.50 | moonshot | 131072 |
| moonshot | kimi-k2-0905-preview | kimi-k2-0905-preview | 0.60 | 2.50 | moonshot | 262144 |
| moonshot | kimi-k2-turbo-preview | kimi-k2-turbo-preview | 1.15 | 8.00 | moonshot | 262144 |
| moonshot | kimi-latest | kimi-latest | 2.00 | 5.00 | moonshot | 131072 |
| moonshot | kimi-latest-128k | kimi-latest-128k | 2.00 | 5.00 | moonshot | 131072 |
| moonshot | kimi-latest-32k | kimi-latest-32k | 1.00 | 3.00 | moonshot | 32768 |
| moonshot | kimi-latest-8k | kimi-latest-8k | 0.20 | 2.00 | moonshot | 8192 |
| moonshot | kimi-thinking-preview | kimi-thinking-preview | 0.60 | 2.50 | moonshot | 131072 |
| moonshot | kimi-k2-thinking | kimi-k2-thinking | 0.60 | 2.50 | moonshot | 262144 |
| moonshot | kimi-k2-thinking-turbo | kimi-k2-thinking-turbo | 1.15 | 8.00 | moonshot | 262144 |
| moonshot | moonshot-v1-128k | moonshot-v1-128k | 2.00 | 5.00 | moonshot | 131072 |
| moonshot | moonshot-v1-128k-0430 | moonshot-v1-128k-0430 | 2.00 | 5.00 | moonshot | 131072 |
| moonshot | moonshot-v1-128k-vision-preview | moonshot-v1-128k-vision-preview | 2.00 | 5.00 | moonshot | 131072 |
| moonshot | moonshot-v1-32k | moonshot-v1-32k | 1.00 | 3.00 | moonshot | 32768 |
| moonshot | moonshot-v1-32k-0430 | moonshot-v1-32k-0430 | 1.00 | 3.00 | moonshot | 32768 |
| moonshot | moonshot-v1-32k-vision-preview | moonshot-v1-32k-vision-preview | 1.00 | 3.00 | moonshot | 32768 |
| moonshot | moonshot-v1-8k | moonshot-v1-8k | 0.20 | 2.00 | moonshot | 8192 |
| moonshot | moonshot-v1-8k-0430 | moonshot-v1-8k-0430 | 0.20 | 2.00 | moonshot | 8192 |
| moonshot | moonshot-v1-8k-vision-preview | moonshot-v1-8k-vision-preview | 0.20 | 2.00 | moonshot | 8192 |
| moonshot | moonshot-v1-auto | moonshot-v1-auto | 2.00 | 5.00 | moonshot | 131072 |
| vertex | multimodalembedding | multimodalembedding | 0.80 | 0.00 | vertex | 2048 |
| vertex | multimodalembedding@001 | multimodalembedding@001 | 0.80 | 0.00 | vertex | 2048 |
| nscale | QwQ-32B | qwq-32b | 0.18 | 0.20 | nscale | N/A |
| nscale | Qwen2.5-Coder-32B-Instruct | qwen2.5-coder-32b-instruct | 0.06 | 0.20 | nscale | N/A |
| nscale | Qwen2.5-Coder-3B-Instruct | qwen2.5-coder-3b-instruct | 0.01 | 0.03 | nscale | N/A |
| nscale | Qwen2.5-Coder-7B-Instruct | qwen2.5-coder-7b-instruct | 0.01 | 0.03 | nscale | N/A |
| nscale | FLUX.1-schnell | flux.1-schnell | 0.00 | 0.00 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Llama-70B | deepseek-r1-distill-llama-70b | 0.38 | 0.38 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Llama-8B | deepseek-r1-distill-llama-8b | 0.03 | 0.03 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-r1-distill-qwen-1.5b | 0.09 | 0.09 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Qwen-14B | deepseek-r1-distill-qwen-14b | 0.07 | 0.07 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Qwen-32B | deepseek-r1-distill-qwen-32b | 0.15 | 0.15 | nscale | N/A |
| nscale | DeepSeek-R1-Distill-Qwen-7B | deepseek-r1-distill-qwen-7b | 0.20 | 0.20 | nscale | N/A |
| nscale | Llama-3.1-8B-Instruct | llama-3.1-8b-instruct | 0.03 | 0.03 | nscale | N/A |
| nscale | Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | 0.20 | 0.20 | nscale | N/A |
| nscale | Llama-4-Scout-17B-16E-Instruct | llama-4-scout-17b-16e-instruct | 0.09 | 0.29 | nscale | N/A |
| nscale | mixtral-8x22b-instruct-v0.1 | mixtral-8x22b-instruct-v0.1 | 0.60 | 0.60 | nscale | N/A |
| nscale | stable-diffusion-xl-base-1.0 | stable-diffusion-xl-base-1.0 | 0.00 | 0.00 | nscale | N/A |
| bedrockconverse | nvidia.nemotron-nano-12b-v2 | nvidia.nemotron-nano-12b-v2 | 0.20 | 0.60 | bedrock_converse | 128000 |
| bedrockconverse | nvidia.nemotron-nano-9b-v2 | nvidia.nemotron-nano-9b-v2 | 0.06 | 0.23 | bedrock_converse | 128000 |
| openai | o1-2024-12-17 | o1-2024-12-17 | 15.00 | 60.00 | openai | 200000 |
| openai | o1-mini-2024-09-12 | o1-mini-2024-09-12 | 3.00 | 12.00 | openai | 128000 |
| openai | o1-preview-2024-09-12 | o1-preview-2024-09-12 | 15.00 | 60.00 | openai | 128000 |
| openai | o1-pro-2025-03-19 | o1-pro-2025-03-19 | 150.00 | 600.00 | openai | 200000 |
| openai | o3-2025-04-16 | o3-2025-04-16 | 2.00 | 8.00 | openai | 200000 |
| openai | o3-deep-research-2025-06-26 | o3-deep-research-2025-06-26 | 10.00 | 40.00 | openai | 200000 |
| openai | o3-mini-2025-01-31 | o3-mini-2025-01-31 | 1.10 | 4.40 | openai | 200000 |
| openai | o3-pro-2025-06-10 | o3-pro-2025-06-10 | 20.00 | 80.00 | openai | 200000 |
| openai | o4-mini-2025-04-16 | o4-mini-2025-04-16 | 1.10 | 4.40 | openai | 200000 |
| openai | o4-mini-deep-research-2025-06-26 | o4-mini-deep-research-2025-06-26 | 2.00 | 8.00 | openai | 200000 |
| oci | meta.llama-3.1-405b-instruct | meta.llama-3.1-405b-instruct | 10.68 | 10.68 | oci | 128000 |
| oci | meta.llama-3.2-90b-vision-instruct | meta.llama-3.2-90b-vision-instruct | 2.00 | 2.00 | oci | 128000 |
| oci | meta.llama-3.3-70b-instruct | meta.llama-3.3-70b-instruct | 0.72 | 0.72 | oci | 128000 |
| oci | meta.llama-4-maverick-17b-128e-instruct-fp8 | meta.llama-4-maverick-17b-128e-instruct-fp8 | 0.72 | 0.72 | oci | 512000 |
| oci | meta.llama-4-scout-17b-16e-instruct | meta.llama-4-scout-17b-16e-instruct | 0.72 | 0.72 | oci | 192000 |
| oci | xai.grok-3 | xai.grok-3 | 3.00 | 0.15 | oci | 131072 |
| oci | xai.grok-3-fast | xai.grok-3-fast | 5.00 | 25.00 | oci | 131072 |
| oci | xai.grok-3-mini | xai.grok-3-mini | 0.30 | 0.50 | oci | 131072 |
| oci | xai.grok-3-mini-fast | xai.grok-3-mini-fast | 0.60 | 4.00 | oci | 131072 |
| oci | xai.grok-4 | xai.grok-4 | 3.00 | 0.15 | oci | 128000 |
| oci | cohere.command-latest | cohere.command-latest | 1.56 | 1.56 | oci | 128000 |
| oci | cohere.command-a-03-2025 | cohere.command-a-03-2025 | 1.56 | 1.56 | oci | 256000 |
| oci | cohere.command-plus-latest | cohere.command-plus-latest | 1.56 | 1.56 | oci | 128000 |
| ollama | codegeex4 | codegeex4 | 0.00 | 0.00 | ollama | 32768 |
| ollama | codegemma | codegemma | 0.00 | 0.00 | ollama | 8192 |
| ollama | codellama | codellama | 0.00 | 0.00 | ollama | 4096 |
| ollama | deepseek-coder-v2-base | deepseek-coder-v2-base | 0.00 | 0.00 | ollama | 8192 |
| ollama | deepseek-coder-v2-instruct | deepseek-coder-v2-instruct | 0.00 | 0.00 | ollama | 32768 |
| ollama | deepseek-coder-v2-lite-base | deepseek-coder-v2-lite-base | 0.00 | 0.00 | ollama | 8192 |
| ollama | deepseek-coder-v2-lite-instruct | deepseek-coder-v2-lite-instruct | 0.00 | 0.00 | ollama | 32768 |
| ollama | deepseek-v3.1:671b-cloud | deepseek-v3.1:671b-cloud | 0.00 | 0.00 | ollama | 163840 |
| ollama | gpt-oss:120b-cloud | gpt-oss:120b-cloud | 0.00 | 0.00 | ollama | 131072 |
| ollama | gpt-oss:20b-cloud | gpt-oss:20b-cloud | 0.00 | 0.00 | ollama | 131072 |
| ollama | internlm2_5-20b-chat | internlm2_5-20b-chat | 0.00 | 0.00 | ollama | 32768 |
| ollama | llama2 | llama2 | 0.00 | 0.00 | ollama | 4096 |
| ollama | llama2-uncensored | llama2-uncensored | 0.00 | 0.00 | ollama | 4096 |
| ollama | llama2:13b | llama2:13b | 0.00 | 0.00 | ollama | 4096 |
| ollama | llama2:70b | llama2:70b | 0.00 | 0.00 | ollama | 4096 |
| ollama | llama2:7b | llama2:7b | 0.00 | 0.00 | ollama | 4096 |
| ollama | llama3 | llama3 | 0.00 | 0.00 | ollama | 8192 |
| ollama | llama3.1 | llama3.1 | 0.00 | 0.00 | ollama | 8192 |
| ollama | llama3:70b | llama3:70b | 0.00 | 0.00 | ollama | 8192 |
| ollama | llama3:8b | llama3:8b | 0.00 | 0.00 | ollama | 8192 |
| ollama | mistral | mistral | 0.00 | 0.00 | ollama | 8192 |
| ollama | mistral-7B-Instruct-v0.1 | mistral-7b-instruct-v0.1 | 0.00 | 0.00 | ollama | 8192 |
| ollama | mistral-7B-Instruct-v0.2 | mistral-7b-instruct-v0.2 | 0.00 | 0.00 | ollama | 32768 |
| ollama | mistral-large-instruct-2407 | mistral-large-instruct-2407 | 0.00 | 0.00 | ollama | 65536 |
| ollama | mixtral-8x22B-Instruct-v0.1 | mixtral-8x22b-instruct-v0.1 | 0.00 | 0.00 | ollama | 65536 |
| ollama | mixtral-8x7B-Instruct-v0.1 | mixtral-8x7b-instruct-v0.1 | 0.00 | 0.00 | ollama | 32768 |
| ollama | orca-mini | orca-mini | 0.00 | 0.00 | ollama | 4096 |
| ollama | qwen3-coder:480b-cloud | qwen3-coder:480b-cloud | 0.00 | 0.00 | ollama | 262144 |
| ollama | vicuna | vicuna | 0.00 | 0.00 | ollama | 2048 |
| openai | omni-moderation-2024-09-26 | omni-moderation-2024-09-26 | 0.00 | 0.00 | openai | 32768 |
| openai | omni-moderation-latest | omni-moderation-latest | 0.00 | 0.00 | openai | 32768 |
| openai | omni-moderation-latest-intents | omni-moderation-latest-intents | 0.00 | 0.00 | openai | 32768 |
| bedrockconverse | openai.gpt-oss-120b-1:0 | openai.gpt-oss-120b-1:0 | 0.15 | 0.60 | bedrock_converse | 128000 |
| bedrockconverse | openai.gpt-oss-20b-1:0 | openai.gpt-oss-20b-1:0 | 0.07 | 0.30 | bedrock_converse | 128000 |
| bedrockconverse | openai.gpt-oss-safeguard-120b | openai.gpt-oss-safeguard-120b | 0.15 | 0.60 | bedrock_converse | 128000 |
| bedrockconverse | openai.gpt-oss-safeguard-20b | openai.gpt-oss-safeguard-20b | 0.07 | 0.20 | bedrock_converse | 128000 |
| ovhcloud | llava-v1.6-mistral-7b-hf | llava-v1.6-mistral-7b-hf | 0.29 | 0.29 | ovhcloud | 32000 |
| ovhcloud | mamba-codestral-7B-v0.1 | mamba-codestral-7b-v0.1 | 0.19 | 0.19 | ovhcloud | 256000 |
| palm | chat-bison | chat-bison | 0.13 | 0.13 | palm | 8192 |
| palm | chat-bison-001 | chat-bison-001 | 0.13 | 0.13 | palm | 8192 |
| palm | text-bison | text-bison | 0.13 | 0.13 | palm | 8192 |
| palm | text-bison-001 | text-bison-001 | 0.13 | 0.13 | palm | 8192 |
| palm | text-bison-safety-off | text-bison-safety-off | 0.13 | 0.13 | palm | 8192 |
| palm | text-bison-safety-recitation-off | text-bison-safety-recitation-off | 0.13 | 0.13 | palm | 8192 |
| parallelai | search | search | 0.00 | 0.00 | parallel_ai | N/A |
| parallelai | search-pro | search-pro | 0.00 | 0.00 | parallel_ai | N/A |
| perplexity | codellama-34b-instruct | codellama-34b-instruct | 0.35 | 1.40 | perplexity | 16384 |
| perplexity | codellama-70b-instruct | codellama-70b-instruct | 0.70 | 2.80 | perplexity | 16384 |
| perplexity | llama-2-70b-chat | llama-2-70b-chat | 0.70 | 2.80 | perplexity | 4096 |
| perplexity | llama-3.1-70b-instruct | llama-3.1-70b-instruct | 1.00 | 1.00 | perplexity | 131072 |
| perplexity | llama-3.1-8b-instruct | llama-3.1-8b-instruct | 0.20 | 0.20 | perplexity | 131072 |
| perplexity | llama-3.1-sonar-huge-128k-online | llama-3.1-sonar-huge-128k-online | 5.00 | 5.00 | perplexity | 127072 |
| perplexity | llama-3.1-sonar-large-128k-chat | llama-3.1-sonar-large-128k-chat | 1.00 | 1.00 | perplexity | 131072 |
| perplexity | llama-3.1-sonar-large-128k-online | llama-3.1-sonar-large-128k-online | 1.00 | 1.00 | perplexity | 127072 |
| perplexity | llama-3.1-sonar-small-128k-chat | llama-3.1-sonar-small-128k-chat | 0.20 | 0.20 | perplexity | 131072 |
| perplexity | llama-3.1-sonar-small-128k-online | llama-3.1-sonar-small-128k-online | 0.20 | 0.20 | perplexity | 127072 |
| perplexity | mistral-7b-instruct | mistral-7b-instruct | 0.07 | 0.28 | perplexity | 4096 |
| perplexity | mixtral-8x7b-instruct | mixtral-8x7b-instruct | 0.07 | 0.28 | perplexity | 4096 |
| perplexity | pplx-70b-chat | pplx-70b-chat | 0.70 | 2.80 | perplexity | 4096 |
| perplexity | pplx-70b-online | pplx-70b-online | 0.00 | 2.80 | perplexity | 4096 |
| perplexity | pplx-7b-chat | pplx-7b-chat | 0.07 | 0.28 | perplexity | 8192 |
| perplexity | pplx-7b-online | pplx-7b-online | 0.00 | 0.28 | perplexity | 4096 |
| perplexity | sonar-deep-research | sonar-deep-research | 2.00 | 8.00 | perplexity | 128000 |
| perplexity | sonar-medium-chat | sonar-medium-chat | 0.60 | 1.80 | perplexity | 16384 |
| perplexity | sonar-medium-online | sonar-medium-online | 0.00 | 1.80 | perplexity | 12000 |
| perplexity | sonar-reasoning | sonar-reasoning | 1.00 | 5.00 | perplexity | 128000 |
| perplexity | sonar-small-chat | sonar-small-chat | 0.07 | 0.28 | perplexity | 16384 |
| perplexity | sonar-small-online | sonar-small-online | 0.00 | 0.28 | perplexity | 12000 |
| publicai | apertus-8b-instruct | apertus-8b-instruct | 0.00 | 0.00 | publicai | 8192 |
| publicai | apertus-70b-instruct | apertus-70b-instruct | 0.00 | 0.00 | publicai | 8192 |
| publicai | Gemma-SEA-LION-v4-27B-IT | gemma-sea-lion-v4-27b-it | 0.00 | 0.00 | publicai | 8192 |
| publicai | salamandra-7b-instruct-tools-16k | salamandra-7b-instruct-tools-16k | 0.00 | 0.00 | publicai | 16384 |
| publicai | ALIA-40b-instruct_Q8_0 | alia-40b-instruct_q8_0 | 0.00 | 0.00 | publicai | 8192 |
| publicai | Olmo-3-7B-Instruct | olmo-3-7b-instruct | 0.00 | 0.00 | publicai | 32768 |
| publicai | Qwen-SEA-LION-v4-32B-IT | qwen-sea-lion-v4-32b-it | 0.00 | 0.00 | publicai | 32768 |
| publicai | Olmo-3-7B-Think | olmo-3-7b-think | 0.00 | 0.00 | publicai | 32768 |
| publicai | Olmo-3-32B-Think | olmo-3-32b-think | 0.00 | 0.00 | publicai | 32768 |
| bedrockconverse | qwen.qwen3-coder-480b-a35b-v1:0 | qwen.qwen3-coder-480b-a35b-v1:0 | 0.22 | 1.80 | bedrock_converse | 262000 |
| bedrockconverse | qwen.qwen3-235b-a22b-2507-v1:0 | qwen.qwen3-235b-a22b-2507-v1:0 | 0.22 | 0.88 | bedrock_converse | 262144 |
| bedrockconverse | qwen.qwen3-coder-30b-a3b-v1:0 | qwen.qwen3-coder-30b-a3b-v1:0 | 0.15 | 0.60 | bedrock_converse | 262144 |
| bedrockconverse | qwen.qwen3-32b-v1:0 | qwen.qwen3-32b-v1:0 | 0.15 | 0.60 | bedrock_converse | 131072 |
| bedrockconverse | qwen.qwen3-next-80b-a3b | qwen.qwen3-next-80b-a3b | 0.15 | 1.20 | bedrock_converse | 128000 |
| bedrockconverse | qwen.qwen3-vl-235b-a22b | qwen.qwen3-vl-235b-a22b | 0.53 | 2.66 | bedrock_converse | 128000 |
| recraft | recraftv2 | recraftv2 | 0.00 | 0.00 | recraft | N/A |
| recraft | recraftv3 | recraftv3 | 0.00 | 0.00 | recraft | N/A |
| replicate | llama-2-13b | llama-2-13b | 0.10 | 0.50 | replicate | 4096 |
| replicate | llama-2-13b-chat | llama-2-13b-chat | 0.10 | 0.50 | replicate | 4096 |
| replicate | llama-2-70b | llama-2-70b | 0.65 | 2.75 | replicate | 4096 |
| replicate | llama-2-70b-chat | llama-2-70b-chat | 0.65 | 2.75 | replicate | 4096 |
| replicate | llama-2-7b | llama-2-7b | 0.05 | 0.25 | replicate | 4096 |
| replicate | llama-2-7b-chat | llama-2-7b-chat | 0.05 | 0.25 | replicate | 4096 |
| replicate | llama-3-70b | llama-3-70b | 0.65 | 2.75 | replicate | 8192 |
| replicate | llama-3-70b-instruct | llama-3-70b-instruct | 0.65 | 2.75 | replicate | 8192 |
| replicate | llama-3-8b | llama-3-8b | 0.05 | 0.25 | replicate | 8086 |
| replicate | llama-3-8b-instruct | llama-3-8b-instruct | 0.05 | 0.25 | replicate | 8086 |
| replicate | mistral-7b-instruct-v0.2 | mistral-7b-instruct-v0.2 | 0.05 | 0.25 | replicate | 4096 |
| replicate | mistral-7b-v0.1 | mistral-7b-v0.1 | 0.05 | 0.25 | replicate | 4096 |
| replicate | mixtral-8x7b-instruct-v0.1 | mixtral-8x7b-instruct-v0.1 | 0.30 | 1.00 | replicate | 4096 |
| cohere | rerank-english-v2.0 | rerank-english-v2.0 | 0.00 | 0.00 | cohere | 4096 |
| cohere | rerank-english-v3.0 | rerank-english-v3.0 | 0.00 | 0.00 | cohere | 4096 |
| cohere | rerank-multilingual-v2.0 | rerank-multilingual-v2.0 | 0.00 | 0.00 | cohere | 4096 |
| cohere | rerank-multilingual-v3.0 | rerank-multilingual-v3.0 | 0.00 | 0.00 | cohere | 4096 |
| cohere | rerank-v3.5 | rerank-v3.5 | 0.00 | 0.00 | cohere | 4096 |
| nvidianim | nv-rerankqa-mistral-4b-v3 | nv-rerankqa-mistral-4b-v3 | 0.00 | 0.00 | nvidia_nim | N/A |
| nvidianim | llama-3_2-nv-rerankqa-1b-v2 | llama-3_2-nv-rerankqa-1b-v2 | 0.00 | 0.00 | nvidia_nim | N/A |
| nvidianim | llama-3.2-nv-rerankqa-1b-v2 | llama-3.2-nv-rerankqa-1b-v2 | 0.00 | 0.00 | nvidia_nim | N/A |
| sagemaker | meta-textgeneration-llama-2-13b | meta-textgeneration-llama-2-13b | 0.00 | 0.00 | sagemaker | 4096 |
| sagemaker | meta-textgeneration-llama-2-13b-f | meta-textgeneration-llama-2-13b-f | 0.00 | 0.00 | sagemaker | 4096 |
| sagemaker | meta-textgeneration-llama-2-70b | meta-textgeneration-llama-2-70b | 0.00 | 0.00 | sagemaker | 4096 |
| sagemaker | meta-textgeneration-llama-2-70b-b-f | meta-textgeneration-llama-2-70b-b-f | 0.00 | 0.00 | sagemaker | 4096 |
| sagemaker | meta-textgeneration-llama-2-7b | meta-textgeneration-llama-2-7b | 0.00 | 0.00 | sagemaker | 4096 |
| sagemaker | meta-textgeneration-llama-2-7b-f | meta-textgeneration-llama-2-7b-f | 0.00 | 0.00 | sagemaker | 4096 |
| sambanova | DeepSeek-R1 | deepseek-r1 | 5.00 | 7.00 | sambanova | 32768 |
| sambanova | DeepSeek-R1-Distill-Llama-70B | deepseek-r1-distill-llama-70b | 0.70 | 1.40 | sambanova | 131072 |
| sambanova | DeepSeek-V3-0324 | deepseek-v3-0324 | 3.00 | 4.50 | sambanova | 32768 |
| sambanova | Llama-4-Maverick-17B-128E-Instruct | llama-4-maverick-17b-128e-instruct | 0.63 | 1.80 | sambanova | 131072 |
| sambanova | Llama-4-Scout-17B-16E-Instruct | llama-4-scout-17b-16e-instruct | 0.40 | 0.70 | sambanova | 8192 |
| sambanova | Meta-Llama-3.1-405B-Instruct | meta-llama-3.1-405b-instruct | 5.00 | 10.00 | sambanova | 16384 |
| sambanova | Meta-Llama-3.1-8B-Instruct | meta-llama-3.1-8b-instruct | 0.10 | 0.20 | sambanova | 16384 |
| sambanova | Meta-Llama-3.2-1B-Instruct | meta-llama-3.2-1b-instruct | 0.04 | 0.08 | sambanova | 16384 |
| sambanova | Meta-Llama-3.2-3B-Instruct | meta-llama-3.2-3b-instruct | 0.08 | 0.16 | sambanova | 4096 |
| sambanova | Meta-Llama-3.3-70B-Instruct | meta-llama-3.3-70b-instruct | 0.60 | 1.20 | sambanova | 131072 |
| sambanova | Meta-Llama-Guard-3-8B | meta-llama-guard-3-8b | 0.30 | 0.30 | sambanova | 16384 |
| sambanova | QwQ-32B | qwq-32b | 0.50 | 1.00 | sambanova | 16384 |
| sambanova | Qwen2-Audio-7B-Instruct | qwen2-audio-7b-instruct | 0.50 | 100.00 | sambanova | 4096 |
| sambanova | Qwen3-32B | qwen3-32b | 0.40 | 0.80 | sambanova | 8192 |
| sambanova | DeepSeek-V3.1 | deepseek-v3.1 | 3.00 | 4.50 | sambanova | 32768 |
| sambanova | gpt-oss-120b | gpt-oss-120b | 3.00 | 4.50 | sambanova | 131072 |
| snowflake | claude-3-5-sonnet | claude-3-5-sonnet | 0.00 | 0.00 | snowflake | 18000 |
| snowflake | deepseek-r1 | deepseek-r1 | 0.00 | 0.00 | snowflake | 32768 |
| snowflake | gemma-7b | gemma-7b | 0.00 | 0.00 | snowflake | 8000 |
| snowflake | jamba-1.5-large | jamba-1.5-large | 0.00 | 0.00 | snowflake | 256000 |
| snowflake | jamba-1.5-mini | jamba-1.5-mini | 0.00 | 0.00 | snowflake | 256000 |
| snowflake | jamba-instruct | jamba-instruct | 0.00 | 0.00 | snowflake | 256000 |
| snowflake | llama2-70b-chat | llama2-70b-chat | 0.00 | 0.00 | snowflake | 4096 |
| snowflake | llama3-70b | llama3-70b | 0.00 | 0.00 | snowflake | 8000 |
| snowflake | llama3-8b | llama3-8b | 0.00 | 0.00 | snowflake | 8000 |
| snowflake | llama3.1-405b | llama3.1-405b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | llama3.1-70b | llama3.1-70b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | llama3.1-8b | llama3.1-8b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | llama3.2-1b | llama3.2-1b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | llama3.2-3b | llama3.2-3b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | llama3.3-70b | llama3.3-70b | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | mistral-7b | mistral-7b | 0.00 | 0.00 | snowflake | 32000 |
| snowflake | mistral-large | mistral-large | 0.00 | 0.00 | snowflake | 32000 |
| snowflake | mistral-large2 | mistral-large2 | 0.00 | 0.00 | snowflake | 128000 |
| snowflake | mixtral-8x7b | mixtral-8x7b | 0.00 | 0.00 | snowflake | 32000 |
| snowflake | reka-core | reka-core | 0.00 | 0.00 | snowflake | 32000 |
| snowflake | reka-flash | reka-flash | 0.00 | 0.00 | snowflake | 100000 |
| snowflake | snowflake-arctic | snowflake-arctic | 0.00 | 0.00 | snowflake | 4096 |
| snowflake | snowflake-llama-3.1-405b | snowflake-llama-3.1-405b | 0.00 | 0.00 | snowflake | 8000 |
| snowflake | snowflake-llama-3.3-70b | snowflake-llama-3.3-70b | 0.00 | 0.00 | snowflake | 8000 |
| stability | sd3 | sd3 | 0.00 | 0.00 | stability | N/A |
| stability | sd3-large | sd3-large | 0.00 | 0.00 | stability | N/A |
| stability | sd3-large-turbo | sd3-large-turbo | 0.00 | 0.00 | stability | N/A |
| stability | sd3-medium | sd3-medium | 0.00 | 0.00 | stability | N/A |
| stability | sd3.5-large | sd3.5-large | 0.00 | 0.00 | stability | N/A |
| stability | sd3.5-large-turbo | sd3.5-large-turbo | 0.00 | 0.00 | stability | N/A |
| stability | sd3.5-medium | sd3.5-medium | 0.00 | 0.00 | stability | N/A |
| stability | stable-image-ultra | stable-image-ultra | 0.00 | 0.00 | stability | N/A |
| stability | inpaint | inpaint | 0.00 | 0.00 | stability | N/A |
| stability | outpaint | outpaint | 0.00 | 0.00 | stability | N/A |
| stability | erase | erase | 0.00 | 0.00 | stability | N/A |
| stability | search-and-replace | search-and-replace | 0.00 | 0.00 | stability | N/A |
| stability | search-and-recolor | search-and-recolor | 0.00 | 0.00 | stability | N/A |
| stability | remove-background | remove-background | 0.00 | 0.00 | stability | N/A |
| stability | replace-background-and-relight | replace-background-and-relight | 0.00 | 0.00 | stability | N/A |
| stability | sketch | sketch | 0.00 | 0.00 | stability | N/A |
| stability | structure | structure | 0.00 | 0.00 | stability | N/A |
| stability | style | style | 0.00 | 0.00 | stability | N/A |
| stability | style-transfer | style-transfer | 0.00 | 0.00 | stability | N/A |
| stability | fast | fast | 0.00 | 0.00 | stability | N/A |
| stability | conservative | conservative | 0.00 | 0.00 | stability | N/A |
| stability | creative | creative | 0.00 | 0.00 | stability | N/A |
| stability | stable-image-core | stable-image-core | 0.00 | 0.00 | stability | N/A |
| bedrock | stability.sd3-5-large-v1:0 | stability.sd3-5-large-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.sd3-large-v1:0 | stability.sd3-large-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-core-v1:0 | stability.stable-image-core-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-conservative-upscale-v1:0 | stability.stable-conservative-upscale-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-creative-upscale-v1:0 | stability.stable-creative-upscale-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-fast-upscale-v1:0 | stability.stable-fast-upscale-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-outpaint-v1:0 | stability.stable-outpaint-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-control-sketch-v1:0 | stability.stable-image-control-sketch-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-control-structure-v1:0 | stability.stable-image-control-structure-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-erase-object-v1:0 | stability.stable-image-erase-object-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-inpaint-v1:0 | stability.stable-image-inpaint-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-remove-background-v1:0 | stability.stable-image-remove-background-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-search-recolor-v1:0 | stability.stable-image-search-recolor-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-search-replace-v1:0 | stability.stable-image-search-replace-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-style-guide-v1:0 | stability.stable-image-style-guide-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-style-transfer-v1:0 | stability.stable-style-transfer-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-core-v1:1 | stability.stable-image-core-v1:1 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-ultra-v1:0 | stability.stable-image-ultra-v1:0 | 0.00 | 0.00 | bedrock | 77 |
| bedrock | stability.stable-image-ultra-v1:1 | stability.stable-image-ultra-v1:1 | 0.00 | 0.00 | bedrock | 77 |
| linkup | search | search | 0.00 | 0.00 | linkup | N/A |
| linkup | search-deep | search-deep | 0.00 | 0.00 | linkup | N/A |
| tavily | search | search | 0.00 | 0.00 | tavily | N/A |
| tavily | search-advanced | search-advanced | 0.00 | 0.00 | tavily | N/A |
| vertex | text-bison | text-bison | 0.00 | 0.00 | vertex | 8192 |
| vertex | text-bison32k | text-bison32k | 0.13 | 0.13 | vertex | 8192 |
| vertex | text-bison32k@002 | text-bison32k@002 | 0.13 | 0.13 | vertex | 8192 |
| vertex | text-bison@001 | text-bison@001 | 0.00 | 0.00 | vertex | 8192 |
| vertex | text-bison@002 | text-bison@002 | 0.00 | 0.00 | vertex | 8192 |
| textcompletioncodestral | codestral-2405 | codestral-2405 | 0.00 | 0.00 | text-completion-codestral | 32000 |
| textcompletioncodestral | codestral-latest | codestral-latest | 0.00 | 0.00 | text-completion-codestral | 32000 |
| vertex | text-embedding-004 | text-embedding-004 | 0.10 | 0.00 | vertex | 2048 |
| vertex | text-embedding-005 | text-embedding-005 | 0.10 | 0.00 | vertex | 2048 |
| openai | text-embedding-ada-002-v2 | text-embedding-ada-002-v2 | 0.10 | 0.00 | openai | 8191 |
| vertex | text-embedding-large-exp-03-07 | text-embedding-large-exp-03-07 | 0.10 | 0.00 | vertex | 8192 |
| vertex | text-embedding-preview-0409 | text-embedding-preview-0409 | 0.01 | 0.00 | vertex | 3072 |
| openai | text-moderation-007 | text-moderation-007 | 0.00 | 0.00 | openai | 32768 |
| openai | text-moderation-latest | text-moderation-latest | 0.00 | 0.00 | openai | 32768 |
| openai | text-moderation-stable | text-moderation-stable | 0.00 | 0.00 | openai | 32768 |
| vertex | text-multilingual-embedding-002 | text-multilingual-embedding-002 | 0.10 | 0.00 | vertex | 2048 |
| vertex | text-multilingual-embedding-preview-0409 | text-multilingual-embedding-preview-0409 | 0.01 | 0.00 | vertex | 3072 |
| vertex | text-unicorn | text-unicorn | 10.00 | 28.00 | vertex | 8192 |
| vertex | text-unicorn@001 | text-unicorn@001 | 10.00 | 28.00 | vertex | 8192 |
| vertex | textembedding-gecko | textembedding-gecko | 0.10 | 0.00 | vertex | 3072 |
| vertex | textembedding-gecko-multilingual | textembedding-gecko-multilingual | 0.10 | 0.00 | vertex | 3072 |
| vertex | textembedding-gecko-multilingual@001 | textembedding-gecko-multilingual@001 | 0.10 | 0.00 | vertex | 3072 |
| vertex | textembedding-gecko@001 | textembedding-gecko@001 | 0.10 | 0.00 | vertex | 3072 |
| vertex | textembedding-gecko@003 | textembedding-gecko@003 | 0.10 | 0.00 | vertex | 3072 |
| openai | tts-1 | tts-1 | 0.00 | 0.00 | openai | N/A |
| openai | tts-1-hd | tts-1-hd | 0.00 | 0.00 | openai | N/A |
| awspolly | standard | standard | 0.00 | 0.00 | aws_polly | N/A |
| awspolly | neural | neural | 0.00 | 0.00 | aws_polly | N/A |
| awspolly | long-form | long-form | 0.00 | 0.00 | aws_polly | N/A |
| awspolly | generative | generative | 0.00 | 0.00 | aws_polly | N/A |
| bedrockconverse | us.amazon.nova-lite-v1:0 | us.amazon.nova-lite-v1:0 | 0.06 | 0.24 | bedrock_converse | 300000 |
| bedrockconverse | us.amazon.nova-micro-v1:0 | us.amazon.nova-micro-v1:0 | 0.04 | 0.14 | bedrock_converse | 128000 |
| bedrockconverse | us.amazon.nova-premier-v1:0 | us.amazon.nova-premier-v1:0 | 2.50 | 12.50 | bedrock_converse | 1000000 |
| bedrockconverse | us.amazon.nova-pro-v1:0 | us.amazon.nova-pro-v1:0 | 0.80 | 3.20 | bedrock_converse | 300000 |
| bedrockconverse | us.anthropic.claude-haiku-4-5-20251001-v1:0 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | 1.10 | 5.50 | bedrock_converse | 200000 |
| bedrock | us.anthropic.claude-3-5-sonnet-20240620-v1:0 | us.anthropic.claude-3-5-sonnet-20240620-v1:0 | 3.00 | 15.00 | bedrock | 200000 |
| bedrock | us.anthropic.claude-3-5-sonnet-20241022-v2:0 | us.anthropic.claude-3-5-sonnet-20241022-v2:0 | 3.00 | 15.00 | bedrock | 200000 |
| bedrockconverse | us.anthropic.claude-3-7-sonnet-20250219-v1:0 | us.anthropic.claude-3-7-sonnet-20250219-v1:0 | 3.00 | 15.00 | bedrock_converse | 200000 |
| bedrock | us.anthropic.claude-3-haiku-20240307-v1:0 | us.anthropic.claude-3-haiku-20240307-v1:0 | 0.25 | 1.25 | bedrock | 200000 |
| bedrock | us.anthropic.claude-3-opus-20240229-v1:0 | us.anthropic.claude-3-opus-20240229-v1:0 | 15.00 | 75.00 | bedrock | 200000 |
| bedrock | us.anthropic.claude-3-sonnet-20240229-v1:0 | us.anthropic.claude-3-sonnet-20240229-v1:0 | 3.00 | 15.00 | bedrock | 200000 |
| bedrockconverse | us.anthropic.claude-opus-4-1-20250805-v1:0 | us.anthropic.claude-opus-4-1-20250805-v1:0 | 15.00 | 75.00 | bedrock_converse | 200000 |
| bedrockconverse | us.anthropic.claude-sonnet-4-5-20250929-v1:0 | us.anthropic.claude-sonnet-4-5-20250929-v1:0 | 3.30 | 16.50 | bedrock_converse | 200000 |
| bedrockconverse | au.anthropic.claude-haiku-4-5-20251001-v1:0 | au.anthropic.claude-haiku-4-5-20251001-v1:0 | 1.10 | 5.50 | bedrock_converse | 200000 |
| bedrockconverse | us.anthropic.claude-opus-4-20250514-v1:0 | us.anthropic.claude-opus-4-20250514-v1:0 | 15.00 | 75.00 | bedrock_converse | 200000 |
| bedrockconverse | us.anthropic.claude-opus-4-5-20251101-v1:0 | us.anthropic.claude-opus-4-5-20251101-v1:0 | 5.00 | 25.00 | bedrock_converse | 200000 |
| bedrockconverse | global.anthropic.claude-opus-4-5-20251101-v1:0 | global.anthropic.claude-opus-4-5-20251101-v1:0 | 5.00 | 25.00 | bedrock_converse | 200000 |
| bedrockconverse | eu.anthropic.claude-opus-4-5-20251101-v1:0 | eu.anthropic.claude-opus-4-5-20251101-v1:0 | 5.00 | 25.00 | bedrock_converse | 200000 |
| bedrockconverse | us.anthropic.claude-sonnet-4-20250514-v1:0 | us.anthropic.claude-sonnet-4-20250514-v1:0 | 3.00 | 15.00 | bedrock_converse | 1000000 |
| bedrockconverse | us.deepseek.r1-v1:0 | us.deepseek.r1-v1:0 | 1.35 | 5.40 | bedrock_converse | 128000 |
| bedrock | us.meta.llama3-1-405b-instruct-v1:0 | us.meta.llama3-1-405b-instruct-v1:0 | 5.32 | 16.00 | bedrock | 128000 |
| bedrock | us.meta.llama3-1-70b-instruct-v1:0 | us.meta.llama3-1-70b-instruct-v1:0 | 0.99 | 0.99 | bedrock | 128000 |
| bedrock | us.meta.llama3-1-8b-instruct-v1:0 | us.meta.llama3-1-8b-instruct-v1:0 | 0.22 | 0.22 | bedrock | 128000 |
| bedrock | us.meta.llama3-2-11b-instruct-v1:0 | us.meta.llama3-2-11b-instruct-v1:0 | 0.35 | 0.35 | bedrock | 128000 |
| bedrock | us.meta.llama3-2-1b-instruct-v1:0 | us.meta.llama3-2-1b-instruct-v1:0 | 0.10 | 0.10 | bedrock | 128000 |
| bedrock | us.meta.llama3-2-3b-instruct-v1:0 | us.meta.llama3-2-3b-instruct-v1:0 | 0.15 | 0.15 | bedrock | 128000 |
| bedrock | us.meta.llama3-2-90b-instruct-v1:0 | us.meta.llama3-2-90b-instruct-v1:0 | 2.00 | 2.00 | bedrock | 128000 |
| bedrockconverse | us.meta.llama3-3-70b-instruct-v1:0 | us.meta.llama3-3-70b-instruct-v1:0 | 0.72 | 0.72 | bedrock_converse | 128000 |
| bedrockconverse | us.meta.llama4-maverick-17b-instruct-v1:0 | us.meta.llama4-maverick-17b-instruct-v1:0 | 0.24 | 0.97 | bedrock_converse | 128000 |
| bedrockconverse | us.meta.llama4-scout-17b-instruct-v1:0 | us.meta.llama4-scout-17b-instruct-v1:0 | 0.17 | 0.66 | bedrock_converse | 128000 |
| bedrockconverse | us.mistral.pixtral-large-2502-v1:0 | us.mistral.pixtral-large-2502-v1:0 | 2.00 | 6.00 | bedrock_converse | 128000 |
| vercel | claude-4-opus | claude-4-opus | 15.00 | 75.00 | vercel | 200000 |
| vercel | claude-4-sonnet | claude-4-sonnet | 3.00 | 15.00 | vercel | 200000 |
| vercel | command-r | command-r | 0.15 | 0.60 | vercel | 128000 |
| vercel | command-r-plus | command-r-plus | 2.50 | 10.00 | vercel | 128000 |
| vercel | deepseek-r1-distill-llama-70b | deepseek-r1-distill-llama-70b | 0.75 | 0.99 | vercel | 131072 |
| vercel | gemma-2-9b | gemma-2-9b | 0.20 | 0.20 | vercel | 8192 |
| vercel | llama-3-70b | llama-3-70b | 0.59 | 0.79 | vercel | 8192 |
| vercel | llama-3-8b | llama-3-8b | 0.05 | 0.08 | vercel | 8192 |
| vercel | mistral-large | mistral-large | 2.00 | 6.00 | vercel | 32000 |
| vercel | mistral-saba-24b | mistral-saba-24b | 0.79 | 0.79 | vercel | 32768 |
| vertex | chirp | chirp | 0.00 | 0.00 | vertex | N/A |
| vertex | claude-3-5-haiku | claude-3-5-haiku | 1.00 | 5.00 | vertex | 200000 |
| vertex | claude-3-5-haiku@20241022 | claude-3-5-haiku@20241022 | 1.00 | 5.00 | vertex | 200000 |
| vertex | claude-haiku-4-5@20251001 | claude-haiku-4-5@20251001 | 1.00 | 5.00 | vertex | 200000 |
| vertex | claude-3-5-sonnet | claude-3-5-sonnet | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-5-sonnet-v2 | claude-3-5-sonnet-v2 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-5-sonnet-v2@20241022 | claude-3-5-sonnet-v2@20241022 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-5-sonnet@20240620 | claude-3-5-sonnet@20240620 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-7-sonnet@20250219 | claude-3-7-sonnet@20250219 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-haiku | claude-3-haiku | 0.25 | 1.25 | vertex | 200000 |
| vertex | claude-3-haiku@20240307 | claude-3-haiku@20240307 | 0.25 | 1.25 | vertex | 200000 |
| vertex | claude-3-opus | claude-3-opus | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-3-opus@20240229 | claude-3-opus@20240229 | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-3-sonnet | claude-3-sonnet | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-3-sonnet@20240229 | claude-3-sonnet@20240229 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-opus-4 | claude-opus-4 | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-opus-4-1 | claude-opus-4-1 | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-opus-4-1@20250805 | claude-opus-4-1@20250805 | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-opus-4-5 | claude-opus-4-5 | 5.00 | 25.00 | vertex | 200000 |
| vertex | claude-opus-4-5@20251101 | claude-opus-4-5@20251101 | 5.00 | 25.00 | vertex | 200000 |
| vertex | claude-sonnet-4-5 | claude-sonnet-4-5 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-sonnet-4-5@20250929 | claude-sonnet-4-5@20250929 | 3.00 | 15.00 | vertex | 200000 |
| vertex | claude-opus-4@20250514 | claude-opus-4@20250514 | 15.00 | 75.00 | vertex | 200000 |
| vertex | claude-sonnet-4 | claude-sonnet-4 | 3.00 | 15.00 | vertex | 1000000 |
| vertex | claude-sonnet-4@20250514 | claude-sonnet-4@20250514 | 3.00 | 15.00 | vertex | 1000000 |
| vertex | codestral-2@001 | codestral-2@001 | 0.30 | 0.90 | vertex | 128000 |
| vertex | codestral-2 | codestral-2 | 0.30 | 0.90 | vertex | 128000 |
| vertex | codestral-2501 | codestral-2501 | 0.20 | 0.60 | vertex | 128000 |
| vertex | codestral@2405 | codestral@2405 | 0.20 | 0.60 | vertex | 128000 |
| vertex | codestral@latest | codestral@latest | 0.20 | 0.60 | vertex | 128000 |
| vertex | deepseek-v3.1-maas | deepseek-v3.1-maas | 1.35 | 5.40 | vertex | 163840 |
| vertex | deepseek-v3.2-maas | deepseek-v3.2-maas | 0.56 | 1.68 | vertex | 163840 |
| vertex | deepseek-r1-0528-maas | deepseek-r1-0528-maas | 1.35 | 5.40 | vertex | 65336 |
| vertex | imagegeneration@006 | imagegeneration@006 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-3.0-fast-generate-001 | imagen-3.0-fast-generate-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-3.0-generate-001 | imagen-3.0-generate-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-3.0-generate-002 | imagen-3.0-generate-002 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-3.0-capability-001 | imagen-3.0-capability-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-4.0-fast-generate-001 | imagen-4.0-fast-generate-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-4.0-generate-001 | imagen-4.0-generate-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | imagen-4.0-ultra-generate-001 | imagen-4.0-ultra-generate-001 | 0.00 | 0.00 | vertex | N/A |
| vertex | jamba-1.5 | jamba-1.5 | 0.20 | 0.40 | vertex | 256000 |
| vertex | jamba-1.5-large | jamba-1.5-large | 2.00 | 8.00 | vertex | 256000 |
| vertex | jamba-1.5-large@001 | jamba-1.5-large@001 | 2.00 | 8.00 | vertex | 256000 |
| vertex | jamba-1.5-mini | jamba-1.5-mini | 0.20 | 0.40 | vertex | 256000 |
| vertex | jamba-1.5-mini@001 | jamba-1.5-mini@001 | 0.20 | 0.40 | vertex | 256000 |
| vertex | llama-3.1-405b-instruct-maas | llama-3.1-405b-instruct-maas | 5.00 | 16.00 | vertex | 128000 |
| vertex | llama-3.1-70b-instruct-maas | llama-3.1-70b-instruct-maas | 0.00 | 0.00 | vertex | 128000 |
| vertex | llama-3.1-8b-instruct-maas | llama-3.1-8b-instruct-maas | 0.00 | 0.00 | vertex | 128000 |
| vertex | llama-3.2-90b-vision-instruct-maas | llama-3.2-90b-vision-instruct-maas | 0.00 | 0.00 | vertex | 128000 |
| vertex | llama-4-maverick-17b-128e-instruct-maas | llama-4-maverick-17b-128e-instruct-maas | 0.35 | 1.15 | vertex | 1000000 |
| vertex | llama-4-maverick-17b-16e-instruct-maas | llama-4-maverick-17b-16e-instruct-maas | 0.35 | 1.15 | vertex | 1000000 |
| vertex | llama-4-scout-17b-128e-instruct-maas | llama-4-scout-17b-128e-instruct-maas | 0.25 | 0.70 | vertex | 10000000 |
| vertex | llama-4-scout-17b-16e-instruct-maas | llama-4-scout-17b-16e-instruct-maas | 0.25 | 0.70 | vertex | 10000000 |
| vertex | llama3-405b-instruct-maas | llama3-405b-instruct-maas | 0.00 | 0.00 | vertex | 32000 |
| vertex | llama3-70b-instruct-maas | llama3-70b-instruct-maas | 0.00 | 0.00 | vertex | 32000 |
| vertex | llama3-8b-instruct-maas | llama3-8b-instruct-maas | 0.00 | 0.00 | vertex | 32000 |
| vertex | minimax-m2-maas | minimax-m2-maas | 0.30 | 1.20 | vertex | 196608 |
| vertex | kimi-k2-thinking-maas | kimi-k2-thinking-maas | 0.60 | 2.50 | vertex | 256000 |
| vertex | mistral-medium-3 | mistral-medium-3 | 0.40 | 2.00 | vertex | 128000 |
| vertex | mistral-medium-3@001 | mistral-medium-3@001 | 0.40 | 2.00 | vertex | 128000 |
| vertex | mistral-large-2411 | mistral-large-2411 | 2.00 | 6.00 | vertex | 128000 |
| vertex | mistral-large@2407 | mistral-large@2407 | 2.00 | 6.00 | vertex | 128000 |
| vertex | mistral-large@2411-001 | mistral-large@2411-001 | 2.00 | 6.00 | vertex | 128000 |
| vertex | mistral-large@latest | mistral-large@latest | 2.00 | 6.00 | vertex | 128000 |
| vertex | mistral-nemo@2407 | mistral-nemo@2407 | 3.00 | 3.00 | vertex | 128000 |
| vertex | mistral-nemo@latest | mistral-nemo@latest | 0.15 | 0.15 | vertex | 128000 |
| vertex | mistral-small-2503 | mistral-small-2503 | 1.00 | 3.00 | vertex | 128000 |
| vertex | mistral-small-2503@001 | mistral-small-2503@001 | 1.00 | 3.00 | vertex | 32000 |
| vertex | mistral-ocr-2505 | mistral-ocr-2505 | 0.00 | 0.00 | vertex | N/A |
| vertex | deepseek-ocr-maas | deepseek-ocr-maas | 0.30 | 1.20 | vertex | N/A |
| vertex | gpt-oss-120b-maas | gpt-oss-120b-maas | 0.15 | 0.60 | vertex | 131072 |
| vertex | gpt-oss-20b-maas | gpt-oss-20b-maas | 0.08 | 0.30 | vertex | 131072 |
| vertex | qwen3-235b-a22b-instruct-2507-maas | qwen3-235b-a22b-instruct-2507-maas | 0.25 | 1.00 | vertex | 262144 |
| vertex | qwen3-coder-480b-a35b-instruct-maas | qwen3-coder-480b-a35b-instruct-maas | 1.00 | 4.00 | vertex | 262144 |
| vertex | qwen3-next-80b-a3b-instruct-maas | qwen3-next-80b-a3b-instruct-maas | 0.15 | 1.20 | vertex | 262144 |
| vertex | qwen3-next-80b-a3b-thinking-maas | qwen3-next-80b-a3b-thinking-maas | 0.15 | 1.20 | vertex | 262144 |
| vertex | veo-2.0-generate-001 | veo-2.0-generate-001 | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.0-fast-generate-preview | veo-3.0-fast-generate-preview | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.0-generate-preview | veo-3.0-generate-preview | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.0-fast-generate-001 | veo-3.0-fast-generate-001 | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.0-generate-001 | veo-3.0-generate-001 | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.1-generate-preview | veo-3.1-generate-preview | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.1-fast-generate-preview | veo-3.1-fast-generate-preview | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.1-generate-001 | veo-3.1-generate-001 | 0.00 | 0.00 | vertex | 1024 |
| vertex | veo-3.1-fast-generate-001 | veo-3.1-fast-generate-001 | 0.00 | 0.00 | vertex | 1024 |
| voyage | rerank-2 | rerank-2 | 0.05 | 0.00 | voyage | 16000 |
| voyage | rerank-2-lite | rerank-2-lite | 0.02 | 0.00 | voyage | 8000 |
| voyage | rerank-2.5 | rerank-2.5 | 0.05 | 0.00 | voyage | 32000 |
| voyage | rerank-2.5-lite | rerank-2.5-lite | 0.02 | 0.00 | voyage | 32000 |
| voyage | voyage-2 | voyage-2 | 0.10 | 0.00 | voyage | 4000 |
| voyage | voyage-3 | voyage-3 | 0.06 | 0.00 | voyage | 32000 |
| voyage | voyage-3-large | voyage-3-large | 0.18 | 0.00 | voyage | 32000 |
| voyage | voyage-3-lite | voyage-3-lite | 0.02 | 0.00 | voyage | 32000 |
| voyage | voyage-3.5 | voyage-3.5 | 0.06 | 0.00 | voyage | 32000 |
| voyage | voyage-3.5-lite | voyage-3.5-lite | 0.02 | 0.00 | voyage | 32000 |
| voyage | voyage-code-2 | voyage-code-2 | 0.12 | 0.00 | voyage | 16000 |
| voyage | voyage-code-3 | voyage-code-3 | 0.18 | 0.00 | voyage | 32000 |
| voyage | voyage-context-3 | voyage-context-3 | 0.18 | 0.00 | voyage | 120000 |
| voyage | voyage-finance-2 | voyage-finance-2 | 0.12 | 0.00 | voyage | 32000 |
| voyage | voyage-large-2 | voyage-large-2 | 0.12 | 0.00 | voyage | 16000 |
| voyage | voyage-law-2 | voyage-law-2 | 0.12 | 0.00 | voyage | 16000 |
| voyage | voyage-lite-01 | voyage-lite-01 | 0.10 | 0.00 | voyage | 4096 |
| voyage | voyage-lite-02-instruct | voyage-lite-02-instruct | 0.10 | 0.00 | voyage | 4000 |
| voyage | voyage-multimodal-3 | voyage-multimodal-3 | 0.12 | 0.00 | voyage | 32000 |
| wandb | gpt-oss-120b | gpt-oss-120b | 15,000.00 | 60,000.00 | wandb | 131072 |
| wandb | gpt-oss-20b | gpt-oss-20b | 5,000.00 | 20,000.00 | wandb | 131072 |
| wandb | GLM-4.5 | glm-4.5 | 55,000.00 | 200,000.00 | wandb | 131072 |
| wandb | DeepSeek-V3.1 | deepseek-v3.1 | 55,000.00 | 165,000.00 | wandb | 128000 |
| watsonx | granite-3-8b-instruct | granite-3-8b-instruct | 0.20 | 0.20 | watsonx | 8192 |
| watsonx | mistral-large | mistral-large | 3.00 | 10.00 | watsonx | 131072 |
| watsonx | mt0-xxl-13b | mt0-xxl-13b | 500.00 | 2,000.00 | watsonx | 8192 |
| watsonx | jais-13b-chat | jais-13b-chat | 500.00 | 2,000.00 | watsonx | 8192 |
| watsonx | flan-t5-xl-3b | flan-t5-xl-3b | 0.60 | 0.60 | watsonx | 8192 |
| watsonx | granite-13b-chat-v2 | granite-13b-chat-v2 | 0.60 | 0.60 | watsonx | 8192 |
| watsonx | granite-13b-instruct-v2 | granite-13b-instruct-v2 | 0.60 | 0.60 | watsonx | 8192 |
| watsonx | granite-3-3-8b-instruct | granite-3-3-8b-instruct | 0.20 | 0.20 | watsonx | 8192 |
| watsonx | granite-4-h-small | granite-4-h-small | 0.06 | 0.25 | watsonx | 20480 |
| watsonx | granite-guardian-3-2-2b | granite-guardian-3-2-2b | 0.10 | 0.10 | watsonx | 8192 |
| watsonx | granite-guardian-3-3-8b | granite-guardian-3-3-8b | 0.20 | 0.20 | watsonx | 8192 |
| watsonx | granite-ttm-1024-96-r2 | granite-ttm-1024-96-r2 | 0.38 | 0.38 | watsonx | 512 |
| watsonx | granite-ttm-1536-96-r2 | granite-ttm-1536-96-r2 | 0.38 | 0.38 | watsonx | 512 |
| watsonx | granite-ttm-512-96-r2 | granite-ttm-512-96-r2 | 0.38 | 0.38 | watsonx | 512 |
| watsonx | granite-vision-3-2-2b | granite-vision-3-2-2b | 0.10 | 0.10 | watsonx | 8192 |
| watsonx | llama-3-2-11b-vision-instruct | llama-3-2-11b-vision-instruct | 0.35 | 0.35 | watsonx | 128000 |
| watsonx | llama-3-2-1b-instruct | llama-3-2-1b-instruct | 0.10 | 0.10 | watsonx | 128000 |
| watsonx | llama-3-2-3b-instruct | llama-3-2-3b-instruct | 0.15 | 0.15 | watsonx | 128000 |
| watsonx | llama-3-2-90b-vision-instruct | llama-3-2-90b-vision-instruct | 2.00 | 2.00 | watsonx | 128000 |
| watsonx | llama-3-3-70b-instruct | llama-3-3-70b-instruct | 0.71 | 0.71 | watsonx | 128000 |
| watsonx | llama-4-maverick-17b | llama-4-maverick-17b | 0.35 | 1.40 | watsonx | 128000 |
| watsonx | llama-guard-3-11b-vision | llama-guard-3-11b-vision | 0.35 | 0.35 | watsonx | 128000 |
| watsonx | mistral-medium-2505 | mistral-medium-2505 | 3.00 | 10.00 | watsonx | 128000 |
| watsonx | mistral-small-2503 | mistral-small-2503 | 0.10 | 0.30 | watsonx | 32000 |
| watsonx | mistral-small-3-1-24b-instruct-2503 | mistral-small-3-1-24b-instruct-2503 | 0.10 | 0.30 | watsonx | 32000 |
| watsonx | pixtral-12b-2409 | pixtral-12b-2409 | 0.35 | 0.35 | watsonx | 128000 |
| watsonx | gpt-oss-120b | gpt-oss-120b | 0.15 | 0.60 | watsonx | 8192 |
| watsonx | allam-1-13b-instruct | allam-1-13b-instruct | 1.80 | 1.80 | watsonx | 8192 |
| watsonx | whisper-large-v3-turbo | whisper-large-v3-turbo | 0.00 | 0.00 | watsonx | N/A |
| openai | whisper-1 | whisper-1 | 0.00 | 0.00 | openai | N/A |
| xai | grok-3-beta | grok-3-beta | 3.00 | 15.00 | xai | 131072 |
| xai | grok-3-fast-beta | grok-3-fast-beta | 5.00 | 25.00 | xai | 131072 |
| xai | grok-3-mini-beta | grok-3-mini-beta | 0.30 | 0.50 | xai | 131072 |
| xai | grok-3-mini-fast-beta | grok-3-mini-fast-beta | 0.60 | 4.00 | xai | 131072 |
| xai | grok-4-fast-reasoning | grok-4-fast-reasoning | 0.20 | 0.50 | xai | 2000000 |
| xai | grok-4-0709 | grok-4-0709 | 3.00 | 15.00 | xai | 256000 |
| xai | grok-4-latest | grok-4-latest | 3.00 | 15.00 | xai | 256000 |
| xai | grok-4-1-fast-reasoning | grok-4-1-fast-reasoning | 0.20 | 0.50 | xai | 2000000 |
| xai | grok-4-1-fast-reasoning-latest | grok-4-1-fast-reasoning-latest | 0.20 | 0.50 | xai | 2000000 |
| xai | grok-4-1-fast-non-reasoning-latest | grok-4-1-fast-non-reasoning-latest | 0.20 | 0.50 | xai | 2000000 |
| xai | grok-code-fast | grok-code-fast | 0.20 | 1.50 | xai | 256000 |
| xai | grok-code-fast-1-0825 | grok-code-fast-1-0825 | 0.20 | 1.50 | xai | 256000 |
| vertex | search_api | search_api | 0.00 | 0.00 | vertex | N/A |
| openai | container | container | 0.00 | 0.00 | openai | N/A |
| openai | sora-2 | sora-2 | 0.00 | 0.00 | openai | N/A |
| openai | sora-2-pro | sora-2-pro | 0.00 | 0.00 | openai | N/A |
| azure | sora-2 | sora-2 | 0.00 | 0.00 | azure | N/A |
| azure | sora-2-pro | sora-2-pro | 0.00 | 0.00 | azure | N/A |
| azure | sora-2-pro-high-res | sora-2-pro-high-res | 0.00 | 0.00 | azure | N/A |
| runwayml | gen4_turbo | gen4_turbo | 0.00 | 0.00 | runwayml | N/A |
| runwayml | gen4_aleph | gen4_aleph | 0.00 | 0.00 | runwayml | N/A |
| runwayml | gen3a_turbo | gen3a_turbo | 0.00 | 0.00 | runwayml | N/A |
| runwayml | gen4_image | gen4_image | 0.00 | 0.00 | runwayml | N/A |
| runwayml | gen4_image_turbo | gen4_image_turbo | 0.00 | 0.00 | runwayml | N/A |
| runwayml | eleven_multilingual_v2 | eleven_multilingual_v2 | 0.00 | 0.00 | runwayml | N/A |
| fireworksai | flux-kontext-pro | flux-kontext-pro | 0.04 | 0.04 | fireworks_ai | 4096 |
| fireworksai | SSD-1B | ssd-1b | 0.00 | 0.00 | fireworks_ai | 4096 |
| fireworksai | chronos-hermes-13b-v2 | chronos-hermes-13b-v2 | 0.20 | 0.20 | fireworks_ai | 4096 |
| fireworksai | code-llama-13b | code-llama-13b | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-llama-13b-instruct | code-llama-13b-instruct | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-llama-13b-python | code-llama-13b-python | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-llama-34b | code-llama-34b | 0.90 | 0.90 | fireworks_ai | 16384 |
| fireworksai | code-llama-34b-instruct | code-llama-34b-instruct | 0.90 | 0.90 | fireworks_ai | 16384 |
| fireworksai | code-llama-34b-python | code-llama-34b-python | 0.90 | 0.90 | fireworks_ai | 16384 |
| fireworksai | code-llama-70b | code-llama-70b | 0.90 | 0.90 | fireworks_ai | 4096 |
| fireworksai | code-llama-70b-instruct | code-llama-70b-instruct | 0.90 | 0.90 | fireworks_ai | 4096 |
| fireworksai | code-llama-70b-python | code-llama-70b-python | 0.90 | 0.90 | fireworks_ai | 4096 |
| fireworksai | code-llama-7b | code-llama-7b | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-llama-7b-instruct | code-llama-7b-instruct | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-llama-7b-python | code-llama-7b-python | 0.20 | 0.20 | fireworks_ai | 16384 |
| fireworksai | code-qwen-1p5-7b | code-qwen-1p5-7b | 0.20 | 0.20 | fireworks_ai | 65536 |
| fireworksai | codegemma-2b | codegemma-2b | 0.10 | 0.10 | fireworks_ai | 8192 |
| fireworksai | codegemma-7b | codegemma-7b | 0.20 | 0.20 | fireworks_ai | 8192 |
| fireworksai | cogito-671b-v2-p1 | cogito-671b-v2-p1 | 1.20 | 1.20 | fireworks_ai | 163840 |
| fireworksai | cogito-v1-preview-llama-3b | cogito-v1-preview-llama-3b | 0.10 | 0.10 | fireworks_ai | 131072 |
| fireworksai | cogito-v1-preview-llama-70b | cogito-v1-preview-llama-70b | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | cogito-v1-preview-llama-8b | cogito-v1-preview-llama-8b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | cogito-v1-preview-qwen-14b | cogito-v1-preview-qwen-14b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | cogito-v1-preview-qwen-32b | cogito-v1-preview-qwen-32b | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | flux-kontext-max | flux-kontext-max | 0.08 | 0.08 | fireworks_ai | 4096 |
| fireworksai | dbrx-instruct | dbrx-instruct | 1.20 | 1.20 | fireworks_ai | 32768 |
| fireworksai | deepseek-coder-1b-base | deepseek-coder-1b-base | 0.10 | 0.10 | fireworks_ai | 16384 |
| fireworksai | deepseek-coder-33b-instruct | deepseek-coder-33b-instruct | 0.90 | 0.90 | fireworks_ai | 16384 |
| fireworksai | deepseek-coder-7b-base | deepseek-coder-7b-base | 0.20 | 0.20 | fireworks_ai | 4096 |
| fireworksai | deepseek-coder-7b-base-v1p5 | deepseek-coder-7b-base-v1p5 | 0.20 | 0.20 | fireworks_ai | 4096 |
| fireworksai | deepseek-coder-7b-instruct-v1p5 | deepseek-coder-7b-instruct-v1p5 | 0.20 | 0.20 | fireworks_ai | 4096 |
| fireworksai | deepseek-coder-v2-lite-base | deepseek-coder-v2-lite-base | 0.50 | 0.50 | fireworks_ai | 163840 |
| fireworksai | deepseek-coder-v2-lite-instruct | deepseek-coder-v2-lite-instruct | 0.50 | 0.50 | fireworks_ai | 163840 |
| fireworksai | deepseek-prover-v2 | deepseek-prover-v2 | 1.20 | 1.20 | fireworks_ai | 163840 |
| fireworksai | deepseek-r1-0528-distill-qwen3-8b | deepseek-r1-0528-distill-qwen3-8b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-llama-70b | deepseek-r1-distill-llama-70b | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-llama-8b | deepseek-r1-distill-llama-8b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-qwen-14b | deepseek-r1-distill-qwen-14b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-qwen-1p5b | deepseek-r1-distill-qwen-1p5b | 0.10 | 0.10 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-qwen-32b | deepseek-r1-distill-qwen-32b | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | deepseek-r1-distill-qwen-7b | deepseek-r1-distill-qwen-7b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | deepseek-v2-lite-chat | deepseek-v2-lite-chat | 0.50 | 0.50 | fireworks_ai | 163840 |
| fireworksai | deepseek-v2p5 | deepseek-v2p5 | 1.20 | 1.20 | fireworks_ai | 32768 |
| fireworksai | devstral-small-2505 | devstral-small-2505 | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | dobby-mini-unhinged-plus-llama-3-1-8b | dobby-mini-unhinged-plus-llama-3-1-8b | 0.20 | 0.20 | fireworks_ai | 131072 |
| fireworksai | dobby-unhinged-llama-3-3-70b-new | dobby-unhinged-llama-3-3-70b-new | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | dolphin-2-9-2-qwen2-72b | dolphin-2-9-2-qwen2-72b | 0.90 | 0.90 | fireworks_ai | 131072 |
| fireworksai | dolphin-2p6-mixtral-8x7b | dolphin-2p6-mixtral-8x7b | 0.50 | 0.50 | fireworks_ai | 32768 |
| fireworksai | ernie-4p5-21b-a3b-pt | ernie-4p5-21b-a3b-pt | 0.10 | 0.10 | fireworks_ai | 4096 |
| fireworksai | ernie-4p5-300b-a47b-pt | ernie-4p5-300b-a47b-pt | 0.10 | 0.10 | fireworks_ai | 4096 |
|
|
|
fireworksai
|
fare-20b |
fare-20b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
firefunction-v1 |
firefunction-v1
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
firellava-13b |
firellava-13b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
firesearch-ocr-v6 |
firesearch-ocr-v6
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
fireworks-asr-large |
fireworks-asr-large
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
fireworks-asr-v2 |
fireworks-asr-v2
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
flux-1-dev |
flux-1-dev
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
flux-1-dev-controlnet-union |
flux-1-dev-controlnet-union
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
flux-1-dev-fp8 |
flux-1-dev-fp8
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
flux-1-schnell |
flux-1-schnell
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
flux-1-schnell-fp8 |
flux-1-schnell-fp8
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
gemma-2b-it |
gemma-2b-it
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
gemma-3-27b-it |
gemma-3-27b-it
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
gemma-7b |
gemma-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
gemma-7b-it |
gemma-7b-it
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
gemma2-9b-it |
gemma2-9b-it
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
glm-4p5v |
glm-4p5v
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
gpt-oss-safeguard-120b |
gpt-oss-safeguard-120b
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
gpt-oss-safeguard-20b |
gpt-oss-safeguard-20b
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
hermes-2-pro-mistral-7b |
hermes-2-pro-mistral-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
internvl3-38b |
internvl3-38b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
internvl3-78b |
internvl3-78b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
internvl3-8b |
internvl3-8b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
japanese-stable-diffusion-xl |
japanese-stable-diffusion-xl
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
kat-coder |
kat-coder
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
kat-dev-32b |
kat-dev-32b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
kat-dev-72b-exp |
kat-dev-72b-exp
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-guard-2-8b |
llama-guard-2-8b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
llama-guard-3-1b |
llama-guard-3-1b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-guard-3-8b |
llama-guard-3-8b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-v2-13b |
llama-v2-13b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v2-13b-chat |
llama-v2-13b-chat
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v2-70b |
llama-v2-70b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v2-70b-chat |
llama-v2-70b-chat
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 2048
|
|
|
fireworksai
|
llama-v2-7b |
llama-v2-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v2-7b-chat |
llama-v2-7b-chat
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v3-70b-instruct |
llama-v3-70b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
llama-v3-70b-instruct-hf |
llama-v3-70b-instruct-hf
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
llama-v3-8b |
llama-v3-8b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
llama-v3-8b-instruct-hf |
llama-v3-8b-instruct-hf
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
llama-v3p1-405b-instruct-long |
llama-v3p1-405b-instruct-long
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v3p1-70b-instruct |
llama-v3p1-70b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-v3p1-70b-instruct-1b |
llama-v3p1-70b-instruct-1b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llama-v3p1-nemotron-70b-instruct |
llama-v3p1-nemotron-70b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-v3p2-1b |
llama-v3p2-1b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-v3p2-3b |
llama-v3p2-3b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llama-v3p3-70b-instruct |
llama-v3p3-70b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
llamaguard-7b |
llamaguard-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
llava-yi-34b |
llava-yi-34b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
minimax-m1-80k |
minimax-m1-80k
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
ministral-3-14b-instruct-2512 |
ministral-3-14b-instruct-2512
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 256000
|
|
|
fireworksai
|
ministral-3-3b-instruct-2512 |
ministral-3-3b-instruct-2512
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 256000
|
|
|
fireworksai
|
ministral-3-8b-instruct-2512 |
ministral-3-8b-instruct-2512
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 256000
|
|
|
fireworksai
|
mistral-7b |
mistral-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mistral-7b-instruct-4k |
mistral-7b-instruct-4k
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mistral-7b-instruct-v0p2 |
mistral-7b-instruct-v0p2
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mistral-7b-instruct-v3 |
mistral-7b-instruct-v3
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mistral-7b-v0p2 |
mistral-7b-v0p2
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mistral-large-3-fp8 |
mistral-large-3-fp8
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 256000
|
|
|
fireworksai
|
mistral-nemo-base-2407 |
mistral-nemo-base-2407
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
mistral-nemo-instruct-2407 |
mistral-nemo-instruct-2407
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
mistral-small-24b-instruct-2501 |
mistral-small-24b-instruct-2501
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mixtral-8x22b |
mixtral-8x22b
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 65536
|
|
|
fireworksai
|
mixtral-8x22b-instruct |
mixtral-8x22b-instruct
|
1.20 |
1.20 |
Source: fireworks_ai, Context: 65536
|
|
|
fireworksai
|
mixtral-8x7b |
mixtral-8x7b
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mixtral-8x7b-instruct |
mixtral-8x7b-instruct
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mixtral-8x7b-instruct-hf |
mixtral-8x7b-instruct-hf
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
mythomax-l2-13b |
mythomax-l2-13b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nemotron-nano-v2-12b-vl |
nemotron-nano-v2-12b-vl
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nous-capybara-7b-v1p9 |
nous-capybara-7b-v1p9
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
nous-hermes-2-mixtral-8x7b-dpo |
nous-hermes-2-mixtral-8x7b-dpo
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
nous-hermes-2-yi-34b |
nous-hermes-2-yi-34b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nous-hermes-llama2-13b |
nous-hermes-llama2-13b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nous-hermes-llama2-70b |
nous-hermes-llama2-70b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nous-hermes-llama2-7b |
nous-hermes-llama2-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
nvidia-nemotron-nano-12b-v2 |
nvidia-nemotron-nano-12b-v2
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
nvidia-nemotron-nano-9b-v2 |
nvidia-nemotron-nano-9b-v2
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
openchat-3p5-0106-7b |
openchat-3p5-0106-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
openhermes-2-mistral-7b |
openhermes-2-mistral-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
openhermes-2p5-mistral-7b |
openhermes-2p5-mistral-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
openorca-7b |
openorca-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
phi-2-3b |
phi-2-3b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 2048
|
|
|
fireworksai
|
phi-3-mini-128k-instruct |
phi-3-mini-128k-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
phi-3-vision-128k-instruct |
phi-3-vision-128k-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32064
|
|
|
fireworksai
|
phind-code-llama-34b-python-v1 |
phind-code-llama-34b-python-v1
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
phind-code-llama-34b-v1 |
phind-code-llama-34b-v1
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
phind-code-llama-34b-v2 |
phind-code-llama-34b-v2
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
playground-v2-1024px-aesthetic |
playground-v2-1024px-aesthetic
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
playground-v2-5-1024px-aesthetic |
playground-v2-5-1024px-aesthetic
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
pythia-12b |
pythia-12b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 2048
|
|
|
fireworksai
|
qwen-qwq-32b-preview |
qwen-qwq-32b-preview
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen-v2p5-14b-instruct |
qwen-v2p5-14b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen-v2p5-7b |
qwen-v2p5-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen1p5-72b-chat |
qwen1p5-72b-chat
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2-7b-instruct |
qwen2-7b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2-vl-2b-instruct |
qwen2-vl-2b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2-vl-72b-instruct |
qwen2-vl-72b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2-vl-7b-instruct |
qwen2-vl-7b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-0p5b-instruct |
qwen2p5-0p5b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-14b |
qwen2p5-14b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen2p5-1p5b-instruct |
qwen2p5-1p5b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-32b |
qwen2p5-32b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen2p5-32b-instruct |
qwen2p5-32b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-72b |
qwen2p5-72b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen2p5-72b-instruct |
qwen2p5-72b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-7b-instruct |
qwen2p5-7b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-0p5b |
qwen2p5-coder-0p5b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-0p5b-instruct |
qwen2p5-coder-0p5b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-14b |
qwen2p5-coder-14b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-14b-instruct |
qwen2p5-coder-14b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-1p5b |
qwen2p5-coder-1p5b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-1p5b-instruct |
qwen2p5-coder-1p5b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-32b |
qwen2p5-coder-32b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-32b-instruct-128k |
qwen2p5-coder-32b-instruct-128k
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen2p5-coder-32b-instruct-32k-rope |
qwen2p5-coder-32b-instruct-32k-rope
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-32b-instruct-64k |
qwen2p5-coder-32b-instruct-64k
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 65536
|
|
|
fireworksai
|
qwen2p5-coder-3b |
qwen2p5-coder-3b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-3b-instruct |
qwen2p5-coder-3b-instruct
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-7b |
qwen2p5-coder-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-coder-7b-instruct |
qwen2p5-coder-7b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen2p5-math-72b-instruct |
qwen2p5-math-72b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwen2p5-vl-32b-instruct |
qwen2p5-vl-32b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
qwen2p5-vl-3b-instruct |
qwen2p5-vl-3b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
qwen2p5-vl-72b-instruct |
qwen2p5-vl-72b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
qwen2p5-vl-7b-instruct |
qwen2p5-vl-7b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
qwen3-0p6b |
qwen3-0p6b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-14b |
qwen3-14b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-1p7b |
qwen3-1p7b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen3-1p7b-fp8-draft |
qwen3-1p7b-fp8-draft
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-1p7b-fp8-draft-131072 |
qwen3-1p7b-fp8-draft-131072
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen3-1p7b-fp8-draft-40960 |
qwen3-1p7b-fp8-draft-40960
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-235b-a22b-instruct-2507 |
qwen3-235b-a22b-instruct-2507
|
0.22 |
0.88 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-235b-a22b-thinking-2507 |
qwen3-235b-a22b-thinking-2507
|
0.22 |
0.88 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-30b-a3b |
qwen3-30b-a3b
|
0.15 |
0.60 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen3-30b-a3b-instruct-2507 |
qwen3-30b-a3b-instruct-2507
|
0.50 |
0.50 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-30b-a3b-thinking-2507 |
qwen3-30b-a3b-thinking-2507
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-32b |
qwen3-32b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
qwen3-4b |
qwen3-4b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-4b-instruct-2507 |
qwen3-4b-instruct-2507
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-8b |
qwen3-8b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-coder-30b-a3b-instruct |
qwen3-coder-30b-a3b-instruct
|
0.15 |
0.60 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-coder-480b-instruct-bf16 |
qwen3-coder-480b-instruct-bf16
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwen3-embedding-0p6b |
qwen3-embedding-0p6b
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
qwen3-embedding-4b |
qwen3-embedding-4b
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-embedding-8b |
qwen3-embedding-8b
|
0.10 |
0.00 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-next-80b-a3b-instruct |
qwen3-next-80b-a3b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwen3-next-80b-a3b-thinking |
qwen3-next-80b-a3b-thinking
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwen3-reranker-0p6b |
qwen3-reranker-0p6b
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-reranker-4b |
qwen3-reranker-4b
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-reranker-8b |
qwen3-reranker-8b
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 40960
|
|
|
fireworksai
|
qwen3-vl-235b-a22b-instruct |
qwen3-vl-235b-a22b-instruct
|
0.22 |
0.88 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-vl-235b-a22b-thinking |
qwen3-vl-235b-a22b-thinking
|
0.22 |
0.88 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-vl-30b-a3b-instruct |
qwen3-vl-30b-a3b-instruct
|
0.15 |
0.60 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-vl-30b-a3b-thinking |
qwen3-vl-30b-a3b-thinking
|
0.15 |
0.60 |
Source: fireworks_ai, Context: 262144
|
|
|
fireworksai
|
qwen3-vl-32b-instruct |
qwen3-vl-32b-instruct
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwen3-vl-8b-instruct |
qwen3-vl-8b-instruct
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
qwq-32b |
qwq-32b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 131072
|
|
|
fireworksai
|
rolm-ocr |
rolm-ocr
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 128000
|
|
|
fireworksai
|
snorkel-mistral-7b-pairrm-dpo |
snorkel-mistral-7b-pairrm-dpo
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
stable-diffusion-xl-1024-v1-0 |
stable-diffusion-xl-1024-v1-0
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
stablecode-3b |
stablecode-3b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
starcoder-16b |
starcoder-16b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
starcoder-7b |
starcoder-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 8192
|
|
|
fireworksai
|
starcoder2-15b |
starcoder2-15b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
starcoder2-3b |
starcoder2-3b
|
0.10 |
0.10 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
starcoder2-7b |
starcoder2-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 16384
|
|
|
fireworksai
|
toppy-m-7b |
toppy-m-7b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
fireworksai
|
whisper-v3 |
whisper-v3
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
whisper-v3-turbo |
whisper-v3-turbo
|
0.00 |
0.00 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
yi-34b |
yi-34b
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
yi-34b-200k-capybara |
yi-34b-200k-capybara
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 200000
|
|
|
fireworksai
|
yi-34b-chat |
yi-34b-chat
|
0.90 |
0.90 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
yi-6b |
yi-6b
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 4096
|
|
|
fireworksai
|
zephyr-7b-beta |
zephyr-7b-beta
|
0.20 |
0.20 |
Source: fireworks_ai, Context: 32768
|
|
|
openrouter
|
ByteDance Seed: Seed 1.6 Flash |
seed-1.6-flash
|
0.08 |
0.30 |
Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens. Context: 262144
|
|
|
openrouter
|
ByteDance Seed: Seed 1.6 |
seed-1.6
|
0.25 |
2.00 |
Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window. Context: 262144
|
|
|
openrouter
|
MiniMax: MiniMax M2.1 |
minimax-m2.1
|
0.12 |
0.48 |
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency.
Compared to its predecessor, M2.1 delivers cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance.
To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
|
|
|
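The MiniMax M2.1 entry above recommends preserving reasoning between turns. A minimal sketch of that pattern follows, under stated assumptions: the fully qualified OpenRouter id (`minimax/minimax-m2.1` is assumed here; the catalog lists the bare id) and the standard `/chat/completions` endpoint. The exact shape of `reasoning_details` is defined in the OpenRouter docs linked above.

```python
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

messages = [{"role": "user", "content": "Refactor this function to be iterative."}]
first = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",  # assumed fully qualified id
    "messages": messages,
}).json()

assistant = first["choices"][0]["message"]
# Append the assistant turn intact -- including any reasoning_details field it
# carries -- so the model sees its prior reasoning on the next turn instead of
# losing it, per the recommendation in the entry above.
messages.append(assistant)
messages.append({"role": "user", "content": "Now add type hints."})

second = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",
    "messages": messages,
})
print(second.json()["choices"][0]["message"]["content"])
```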
openrouter
|
Z.AI: GLM 4.7 |
glm-4.7
|
0.16 |
0.80 |
GLM-4.7 is Z.AI’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics. Context: 202752
|
|
|
openrouter
|
Google: Gemini 3 Flash Preview |
gemini-3-flash-preview
|
0.50 |
3.00 |
Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.
The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models. Context: 1048576
|
|
|
openrouter
|
Mistral: Mistral Small Creative |
mistral-small-creative
|
0.10 |
0.30 |
Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents. Context: 32768
|
|
|
openrouter
|
AllenAI: Olmo 3.1 32B Think (free) |
olmo-3.1-32b-think:free
|
0.00 |
0.00 |
Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology. Context: 65536
|
|
|
openrouter
|
Xiaomi: MiMo-V2-Flash (free) |
mimo-v2-flash:free
|
0.00 |
0.00 |
MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks #1 among open-source models globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much.
Note: when integrating with agentic tools such as Claude Code, Cline, or Roo Code, **turn off reasoning mode** for the best and fastest performance—this model is deeply optimized for this scenario.
Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config). Context: 262144
|
|
|
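The MiMo-V2-Flash entry above advises turning off reasoning mode for agentic tooling, via the `reasoning` `enabled` boolean it references. A minimal sketch, assuming the fully qualified id `xiaomi/mimo-v2-flash:free` (the catalog lists the bare id):

```python
import os
import requests

# Sketch: disable reasoning mode for MiMo-V2-Flash, as recommended above for
# agentic coding tools. The model id is an assumed fully qualified form.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "xiaomi/mimo-v2-flash:free",
        "messages": [{"role": "user", "content": "Write a binary search in Go."}],
        # OpenRouter's unified reasoning parameter; see the docs link above.
        "reasoning": {"enabled": False},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```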
openrouter
|
NVIDIA: Nemotron 3 Nano 30B A3B (free) |
nemotron-3-nano-30b-a3b:free
|
0.00 |
0.00 |
NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model offering high compute efficiency and accuracy for developers building specialized agentic AI systems.
The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.
Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its products and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is for trial use only; do not use it for production or business-critical systems. Context: 256000
|
|
|
openrouter
|
NVIDIA: Nemotron 3 Nano 30B A3B |
nemotron-3-nano-30b-a3b
|
0.06 |
0.24 |
NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model offering high compute efficiency and accuracy for developers building specialized agentic AI systems.
The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.
Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its products and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is for trial use only; do not use it for production or business-critical systems. Context: 262144
|
|
|
openrouter
|
OpenAI: GPT-5.2 Chat |
gpt-5.2-chat
|
1.75 |
14.00 |
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-5.2 Pro |
gpt-5.2-pro
|
21.00 |
168.00 |
GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
|
|
|
openrouter
|
OpenAI: GPT-5.2 |
gpt-5.2
|
1.75 |
14.00 |
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks.
Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability. Context: 400000
|
|
|
openrouter
|
Mistral: Devstral 2 2512 (free) |
devstral-2512:free
|
0.00 |
0.00 |
Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window.
Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
|
|
|
openrouter
|
Mistral: Devstral 2 2512 |
devstral-2512
|
0.05 |
0.22 |
Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window.
Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
|
|
|
openrouter
|
Relace: Relace Search |
relace-search
|
1.00 |
3.00 |
The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request.
In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It's designed to serve as a subagent that passes its findings to an "oracle" coding agent, which orchestrates and performs the rest of the coding task.
To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the [Relace documentation](https://docs.relace.ai/docs/fast-agentic-search/agent). Context: 256000
|
|
|
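A rough harness sketch for the relace-search entry above, under stated assumptions: the fully qualified id `relace/relace-search` (the catalog lists the bare id), the use of `anthropic/claude-opus-4.5` as an illustrative "oracle", and the simplifying idea that the subagent's final message can be handed over as plain text. The actual response format to parse is defined in the Relace documentation linked above.

```python
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def post(model: str, messages: list) -> str:
    # Minimal helper: one chat-completions call, return the assistant text.
    resp = requests.post(URL, headers=HEADERS, json={"model": model, "messages": messages})
    return resp.json()["choices"][0]["message"]["content"]

# Step 1: the search subagent explores the codebase (assumed model id).
findings = post("relace/relace-search",
                [{"role": "user", "content": "Where is retry logic for HTTP calls implemented?"}])

# Step 2: hand the findings off to an "oracle" coding model, which performs
# the rest of the task. Any strong coding model could play this role.
answer = post("anthropic/claude-opus-4.5", [
    {"role": "user",
     "content": f"Relevant files found by a search subagent:\n{findings}\n\n"
                "Using these, explain how to add exponential backoff."}])
print(answer)
```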
openrouter
|
Z.AI: GLM 4.6V |
glm-4.6v
|
0.30 |
0.90 |
GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing. Context: 131072
|
|
|
openrouter
|
Nex AGI: DeepSeek V3.1 Nex N1 (free) |
deepseek-v3.1-nex-n1:free
|
0.00 |
0.00 |
DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity.
Nex-N1 demonstrates competitive performance across all evaluation scenarios, showing particularly strong results in practical coding and HTML generation tasks. Context: 131072
|
|
|
openrouter
|
EssentialAI: Rnj 1 Instruct |
rnj-1-instruct
|
0.15 |
0.15 |
Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent). Context: 32768
|
|
|
openrouter
|
Body Builder (beta) |
bodybuilder
|
-1,000,000.00 |
-1,000,000.00 |
Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example: "count to 10 using gemini and opus."
This is useful for creating multi-model requests, custom model routers, or programmatic generation of API calls from human descriptions.
**BETA NOTICE**: Body Builder is in beta, and currently free. Pricing and functionality may change in the future. Context: 128000
|
|
|
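A minimal sketch of the Body Builder flow described above: send a natural-language description, get back a structured OpenRouter request object, then execute it. The id `openrouter/bodybuilder` is an assumed fully qualified form of the catalog's bare id, and treating the beta's output as replayable JSON is an assumption for illustration.

```python
import json
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

# Ask Body Builder to construct a request object from plain English,
# using the example prompt from the entry above.
built = requests.post(URL, headers=HEADERS, json={
    "model": "openrouter/bodybuilder",  # assumed fully qualified id
    "messages": [{"role": "user", "content": "count to 10 using gemini and opus"}],
}).json()["choices"][0]["message"]["content"]

# Assumption: the beta returns JSON text describing an API request that can
# be replayed directly against the same endpoint. Check the actual schema.
request_obj = json.loads(built)
result = requests.post(URL, headers=HEADERS, json=request_obj)
print(result.json())
```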
openrouter
|
OpenAI: GPT-5.1-Codex-Max |
gpt-5.1-codex-max
|
1.25 |
10.00 |
GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research.
GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle. Context: 400000
|
|
|
openrouter
|
Amazon: Nova 2 Lite |
nova-2-lite-v1
|
0.30 |
2.50 |
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text.
Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows. Context: 1000000
|
|
|
openrouter
|
Mistral: Ministral 3 14B 2512 |
ministral-14b-2512
|
0.20 |
0.20 |
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. Context: 262144
|
|
|
openrouter
|
Mistral: Ministral 3 8B 2512 |
ministral-8b-2512
|
0.15 |
0.15 |
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. Context: 262144
|
|
|
openrouter
|
Mistral: Ministral 3 3B 2512 |
ministral-3b-2512
|
0.10 |
0.10 |
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. Context: 131072
|
|
|
openrouter
|
Mistral: Mistral Large 3 2512 |
mistral-large-2512
|
0.50 |
1.50 |
Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license. Context: 262144
|
|
|
openrouter
|
Arcee AI: Trinity Mini (free) |
trinity-mini:free
|
0.00 |
0.00 |
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
|
|
|
openrouter
|
Arcee AI: Trinity Mini |
trinity-mini
|
0.05 |
0.15 |
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.2 Speciale |
deepseek-v3.2-speciale
|
0.27 |
0.41 |
DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context processing, then scales post-training reinforcement learning to push capability beyond the base model. Reported evaluations place Speciale ahead of GPT-5 on difficult reasoning workloads, with proficiency comparable to Gemini-3.0-Pro, while retaining strong coding and tool-use reliability. Like V3.2, it benefits from a large-scale agentic task synthesis pipeline that improves compliance and generalization in interactive environments. Context: 163840
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.2 |
deepseek-v3.2
|
0.25 |
0.38 |
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.
Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 163840
|
|
|
openrouter
|
Prime Intellect: INTELLECT-3 |
intellect-3
|
0.20 |
1.10 |
INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL). It offers state-of-the-art performance for its size across math, code, science, and general reasoning, consistently outperforming many larger frontier models. Designed for strong multi-step problem solving, it maintains high accuracy on structured tasks while remaining efficient at inference thanks to its MoE architecture. Context: 131072
|
|
|
openrouter
|
TNG: R1T Chimera |
tng-r1t-chimera
|
0.25 |
0.85 |
TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter.
Characteristics and improvements include:
- A creative and pleasant personality, in the authors' estimation.
- A preliminary EQ-Bench3 value of about 1305.
- Noticeably more intelligent than the original, albeit slightly slower.
- Much more think-token consistent, i.e. reasoning and answer blocks are properly delineated.
- Much-improved tool calling.
TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft created for its "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1). Context: 163840
|
|
|
openrouter
|
Anthropic: Claude Opus 4.5 |
claude-opus-4.5
|
5.00 |
25.00 |
Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high.
Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks. Context: 200000
|
|
|
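The Opus 4.5 entry above mentions a token-efficiency control exposed through OpenRouter's Verbosity parameter. A minimal sketch, assuming the parameter is passed top-level as `verbosity` alongside the standard chat payload:

```python
import os
import requests

# Sketch: trade depth for token efficiency on Claude Opus 4.5 via the
# verbosity parameter mentioned above ("low", "medium", or "high").
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.5",
        "messages": [{"role": "user", "content": "Summarize RAII in two sentences."}],
        "verbosity": "low",  # assumed top-level placement; named in the entry above
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```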
openrouter
|
AllenAI: Olmo 3 32B Think (free) |
olmo-3-32b-think:free
|
0.00 |
0.00 |
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology. Context: 65536
|
|
|
openrouter
|
AllenAI: Olmo 3 7B Instruct |
olmo-3-7b-instruct
|
0.10 |
0.20 |
Olmo 3 7B Instruct is a supervised instruction-fine-tuned variant of the Olmo 3 7B base model, optimized for instruction-following, question-answering, and natural conversational dialogue. By leveraging high-quality instruction data and an open training pipeline, it delivers strong performance across everyday NLP tasks while remaining accessible and easy to integrate. Developed by Ai2 under the Apache 2.0 license, the model offers a transparent, community-friendly option for instruction-driven applications. Context: 65536
|
|
|
openrouter
|
AllenAI: Olmo 3 7B Think |
olmo-3-7b-think
|
0.12 |
0.20 |
Olmo 3 7B Think is a research-oriented language model in the Olmo family designed for advanced reasoning and instruction-driven tasks. It excels at multi-step problem solving, logical inference, and maintaining coherent conversational context. Developed by Ai2 under the Apache 2.0 license, Olmo 3 7B Think supports transparent, fully open experimentation and provides a lightweight yet capable foundation for academic research and practical NLP workflows. Context: 65536
|
|
|
openrouter
|
Google: Nano Banana Pro (Gemini 3 Pro Image Preview) |
gemini-3-pro-image-preview
|
2.00 |
12.00 |
Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding.
It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows. Context: 65536
|
|
|
openrouter
|
xAI: Grok 4.1 Fast |
grok-4.1-fast
|
0.20 |
0.50 |
Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window.
Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
|
|
|
openrouter
|
Google: Gemini 3 Pro Preview |
gemini-3-pro-preview
|
2.00 |
12.00 |
Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.
Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing. Context: 1048576
|
|
|
openrouter
|
Deep Cogito: Cogito v2.1 671B |
cogito-v2.1-671b
|
1.25 |
1.25 |
Cogito v2.1 671B MoE represents one of the strongest open models globally, matching the performance of frontier closed and open models. This model is trained using self-play with reinforcement learning to reach state-of-the-art performance on multiple categories (instruction following, coding, longer queries, and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-5.1 |
gpt-5.1
|
1.25 |
10.00 |
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems.
Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5. Context: 400000
|
|
|
openrouter
|
OpenAI: GPT-5.1 Chat |
gpt-5.1-chat
|
1.25 |
10.00 |
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-5.1-Codex |
gpt-5.1-codex
|
1.25 |
10.00 |
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level)
Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
|
|
|
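The Codex entry above notes that reasoning effort is adjusted with the `reasoning.effort` parameter. A minimal sketch, assuming the fully qualified id `openai/gpt-5.1-codex` (the catalog lists the bare id):

```python
import os
import requests

# Sketch: request higher reasoning effort for a tricky refactor, using the
# reasoning.effort parameter referenced in the entry above.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-5.1-codex",  # assumed fully qualified id
        "messages": [{"role": "user",
                      "content": "Untangle the circular imports in this package."}],
        "reasoning": {"effort": "high"},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```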
openrouter
|
OpenAI: GPT-5.1-Codex-Mini |
gpt-5.1-codex-mini
|
0.25 |
2.00 |
GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex Context: 400000
|
|
|
openrouter
|
Kwaipilot: KAT-Coder-Pro V1 (free) |
kat-coder-pro:free
|
0.00 |
0.00 |
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark.
The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
|
|
|
openrouter
|
Kwaipilot: KAT-Coder-Pro V1 |
kat-coder-pro
|
0.21 |
0.83 |
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark.
The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
|
|
|
openrouter
|
MoonshotAI: Kimi K2 Thinking |
kimi-k2-thinking
|
0.32 |
0.48 |
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256K-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.
It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks. Context: 262144
|
|
|
openrouter
|
Amazon: Nova Premier 1.0 |
nova-premier-v1
|
2.50 |
12.50 |
Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models. Context: 1000000
|
|
|
openrouter
|
Perplexity: Sonar Pro Search |
sonar-pro-search
|
3.00 |
15.00 |
Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform.
Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro: instead of a single query-plus-synthesis pass, it plans and executes entire research workflows using tools. Context: 200000
|
|
|
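Since Sonar Pro Search bills tokens plus a per-request fee, a small worked example of the blended cost under the prices listed above ($3/M input, $15/M output, $18 per thousand requests); the token counts are made up for illustration.

```python
# Worked cost example for Sonar Pro Search using the listed prices:
# $3 per 1M input tokens, $15 per 1M output tokens, $18 per 1,000 requests.
# The token counts below are illustrative, not measured.
input_tokens = 2_000       # prompt + search context
output_tokens = 1_200      # synthesized answer
requests_made = 1

cost = (
    input_tokens / 1_000_000 * 3.00
    + output_tokens / 1_000_000 * 15.00
    + requests_made / 1_000 * 18.00
)
print(f"${cost:.4f} per call")   # -> $0.0420 per call
```

At these illustrative counts the per-request fee ($0.018) is roughly three times the input-token cost, so the flat fee dominates for short queries.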
openrouter
|
Mistral: Voxtral Small 24B 2507 |
voxtral-small-24b-2507
|
0.10 |
0.30 |
Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio is priced at $100 per million seconds. Context: 32000
|
|
|
openrouter
|
OpenAI: gpt-oss-safeguard-20b |
gpt-oss-safeguard-20b
|
0.08 |
0.30 |
gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling.
Learn more about this model in OpenAI's gpt-oss-safeguard [user guide](https://cookbook.openai.com/articles/gpt-oss-safeguard-guide). Context: 131072
|
|
|
openrouter
|
NVIDIA: Nemotron Nano 12B 2 VL (free) |
nemotron-nano-12b-v2-vl:free
|
0.00 |
0.00 |
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency.
The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension.
Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost.
Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 128000
|
|
|
openrouter
|
NVIDIA: Nemotron Nano 12B 2 VL |
nemotron-nano-12b-v2-vl
|
0.20 |
0.60 |
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency.
The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension.
Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost.
Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 131072
|
|
|
openrouter
|
MiniMax: MiniMax M2 |
minimax-m2
|
0.20 |
1.00 |
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency.
The model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors.
Benchmarked by [Artificial Analysis](https://artificialanalysis.ai/models/minimax-m2), MiniMax-M2 ranks among the top open-source models for composite intelligence, spanning mathematics, science, and instruction-following. Its small activation footprint enables fast inference, high concurrency, and improved unit economics, making it well-suited for large-scale agents, developer assistants, and reasoning-driven applications that require responsiveness and cost efficiency.
To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
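As a minimal sketch of that recommendation, assuming the OpenRouter chat-completions endpoint and that `reasoning_details` is returned on the assistant message as described in the linked docs (the full model id `minimax/minimax-m2` is an assumption):

```python
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}  # placeholder key
messages = [{"role": "user", "content": "Plan a refactor of the billing module."}]

first = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2",  # assumed full OpenRouter id for this entry
    "messages": messages,
}).json()

assistant = first["choices"][0]["message"]
# Pass the assistant turn back with its reasoning_details intact, so the
# model's reasoning is preserved between turns as MiniMax recommends.
messages.append({
    "role": "assistant",
    "content": assistant["content"],
    "reasoning_details": assistant.get("reasoning_details"),
})
messages.append({"role": "user", "content": "Now apply step 1."})
second = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2",
    "messages": messages,
}).json()
```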
|
|
|
openrouter
|
Qwen: Qwen3 VL 32B Instruct |
qwen3-vl-32b-instruct
|
0.50 |
1.50 |
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks. Context: 262144
|
|
|
openrouter
|
LiquidAI/LFM2-8B-A1B |
lfm2-8b-a1b
|
0.05 |
0.10 |
LFM2-8B-A1B is a Mixture-of-Experts member of Liquid AI's LFM2 family of hybrid models, designed for edge AI and on-device deployment. Context: 32768
|
|
|
openrouter
|
LiquidAI/LFM2-2.6B |
lfm2-2.6b
|
0.05 |
0.10 |
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Context: 32768
|
|
|
openrouter
|
IBM: Granite 4.0 Micro |
granite-4.0-h-micro
|
0.02 |
0.11 |
Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family, the latest series of models released by IBM. It is fine-tuned for long-context tool calling. Context: 131000
|
|
|
openrouter
|
Deep Cogito: Cogito V2 Preview Llama 405B |
cogito-v2-preview-llama-405b
|
3.50 |
3.50 |
Cogito v2 405B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. It represents a significant step toward frontier intelligence, with a dense architecture that combines iterative policy improvement with massive scale to deliver performance competitive with leading closed models. Context: 32768
|
|
|
openrouter
|
OpenAI: GPT-5 Image Mini |
gpt-5-image-mini
|
2.50 |
2.00 |
GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text rendering, and detailed image editing with reduced latency and cost. It excels at high-quality visual creation while maintaining strong text understanding, making it ideal for applications that require both efficient image generation and text processing at scale. Context: 400000
|
|
|
openrouter
|
Anthropic: Claude Haiku 4.5 |
claude-haiku-4.5
|
1.00 |
5.00 |
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications.
It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment. Context: 200000
|
|
|
openrouter
|
Qwen: Qwen3 VL 8B Thinking |
qwen3-vl-8b-thinking
|
0.18 |
2.10 |
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs.
Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs. Context: 256000
|
|
|
openrouter
|
Qwen: Qwen3 VL 8B Instruct |
qwen3-vl-8b-instruct
|
0.08 |
0.50 |
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization.
The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions. Context: 131072
|
|
|
openrouter
|
OpenAI: GPT-5 Image |
gpt-5-image
|
10.00 |
10.00 |
[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following, text rendering, and detailed image editing. Context: 400000
|
|
|
openrouter
|
OpenAI: o3 Deep Research |
o3-deep-research
|
10.00 |
40.00 |
o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks.
Note: This model always uses the 'web_search' tool, which adds additional cost. Context: 200000
|
|
|
openrouter
|
OpenAI: o4 Mini Deep Research |
o4-mini-deep-research
|
2.00 |
8.00 |
o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step research tasks.
Note: This model always uses the 'web_search' tool, which adds additional cost. Context: 200000
|
|
|
openrouter
|
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 |
llama-3.3-nemotron-super-49b-v1.5
|
0.10 |
0.40 |
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality.
In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy decoding recommended when reasoning is off). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter. Context: 131072
|
|
|
openrouter
|
Baidu: ERNIE 4.5 21B A3B Thinking |
ernie-4.5-21b-a3b-thinking
|
0.07 |
0.28 |
ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks. Context: 131072
|
|
|
openrouter
|
Google: Gemini 2.5 Flash Image (Nano Banana) |
gemini-2.5-flash-image
|
0.30 |
2.50 |
Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding, capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration). Context: 32768
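A minimal request sketch for the aspect-ratio control mentioned above; the `modalities` field and the full model id `google/gemini-2.5-flash-image` follow OpenRouter's image-generation docs but are assumptions here:

```python
import requests

# Hedged sketch: request a 16:9 image via the image_config parameter.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},  # placeholder key
    json={
        "model": "google/gemini-2.5-flash-image",  # assumed full OpenRouter id
        "messages": [{"role": "user", "content": "A watercolor banana on a desk"}],
        "modalities": ["image", "text"],           # ask for image output
        "image_config": {"aspect_ratio": "16:9"},  # aspect-ratio control
    },
)
print(resp.status_code)
```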
|
|
|
openrouter
|
Qwen: Qwen3 VL 30B A3B Thinking |
qwen3-vl-30b-a3b-thinking
|
0.20 |
1.00 |
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 131072
|
|
|
openrouter
|
Qwen: Qwen3 VL 30B A3B Instruct |
qwen3-vl-30b-a3b-instruct
|
0.15 |
0.60 |
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 262144
|
|
|
openrouter
|
OpenAI: GPT-5 Pro |
gpt-5-pro
|
15.00 |
120.00 |
GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
|
|
|
openrouter
|
Z.AI: GLM 4.6 |
glm-4.6
|
0.35 |
1.50 |
Compared with GLM-4.5, this generation brings several key improvements:
Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.
Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 202752
|
|
|
openrouter
|
Z.AI: GLM 4.6 (exacto) |
glm-4.6:exacto
|
0.44 |
1.76 |
Compared with GLM-4.5, this generation brings several key improvements:
Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.
Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 204800
|
|
|
openrouter
|
Anthropic: Claude Sonnet 4.5 |
claude-sonnet-4.5
|
3.00 |
15.00 |
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking.
Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use. Context: 1000000
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.2 Exp |
deepseek-v3.2-exp
|
0.21 |
0.32 |
DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs. Context: 163840
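A minimal request-body sketch for the `reasoning` `enabled` boolean mentioned above; the full model id `deepseek/deepseek-v3.2-exp` is assumed (see the linked docs for the canonical form):

```python
# Hedged sketch of toggling reasoning for DeepSeek-V3.2-Exp via OpenRouter.
payload = {
    "model": "deepseek/deepseek-v3.2-exp",  # assumed full OpenRouter id
    "messages": [{"role": "user", "content": "Summarize this 100K-token log."}],
    "reasoning": {"enabled": True},  # set False to skip reasoning traces
}
# POST `payload` to https://openrouter.ai/api/v1/chat/completions with your API key.
```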
|
|
|
openrouter
|
TheDrummer: Cydonia 24B V4.1 |
cydonia-24b-v4.1
|
0.30 |
0.50 |
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence. Context: 131072
|
|
|
openrouter
|
Relace: Relace Apply 3 |
relace-apply-3
|
0.85 |
1.25 |
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files, applying updates from GPT-4o, Claude, and others at an average of 10,000 tokens/sec.
The model requires the prompt to be in the following format:
<instruction>{instruction}</instruction>
<code>{initial_code}</code>
<update>{edit_snippet}</update>
Zero Data Retention is enabled for Relace. Learn more about this model in their [documentation](https://docs.relace.ai/api-reference/instant-apply/apply). Context: 256000
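A small helper illustrating the required prompt format above; the helper name is hypothetical, while the tags are copied verbatim from this entry:

```python
# Hypothetical helper that assembles the instruction/code/update prompt
# in the exact format Relace Apply 3 expects.
def build_apply_prompt(instruction: str, initial_code: str, edit_snippet: str) -> str:
    return (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{initial_code}</code>\n"
        f"<update>{edit_snippet}</update>"
    )

prompt = build_apply_prompt(
    "Add a docstring to the function.",
    'def add(a, b):\n    return a + b',
    'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b',
)
print(prompt)
```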
|
|
|
openrouter
|
Google: Gemini 2.5 Flash Preview 09-2025 |
gemini-2.5-flash-preview-09-2025
|
0.30 |
2.50 |
Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.
Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the [documentation](https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
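A minimal request-body sketch of the thinking-budget control described above, per the linked docs; the budget value is illustrative and the full model id is an assumption:

```python
# Hedged sketch: cap Gemini 2.5 Flash's thinking budget via reasoning.max_tokens.
payload = {
    "model": "google/gemini-2.5-flash-preview-09-2025",  # assumed full id
    "messages": [{"role": "user", "content": "Derive the closed form of the sum."}],
    "reasoning": {"max_tokens": 2048},  # illustrative thinking budget
}
# POST `payload` to https://openrouter.ai/api/v1/chat/completions with your API key.
```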
|
|
|
openrouter
|
Google: Gemini 2.5 Flash Lite Preview 09-2025 |
gemini-2.5-flash-lite-preview-09-2025
|
0.10 |
0.40 |
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
|
|
|
openrouter
|
Qwen: Qwen3 VL 235B A22B Thinking |
qwen3-vl-235b-a22b-thinking
|
0.45 |
3.50 |
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.
Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
|
|
|
openrouter
|
Qwen: Qwen3 VL 235B A22B Instruct |
qwen3-vl-235b-a22b-instruct
|
0.12 |
0.56 |
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.
Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
|
|
|
openrouter
|
Qwen: Qwen3 Max |
qwen3-max
|
1.20 |
6.00 |
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. Context: 256000
|
|
|
openrouter
|
Qwen: Qwen3 Coder Plus |
qwen3-coder-plus
|
1.00 |
5.00 |
Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-5 Codex |
gpt-5-codex
|
1.25 |
10.00 |
GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level).
Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
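As a minimal sketch of the `reasoning.effort` control named above (the `high` level and the full model id `openai/gpt-5-codex` are assumptions based on the linked docs):

```python
import requests

# Hedged sketch: raise Codex's reasoning effort for a large refactor task.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},  # placeholder key
    json={
        "model": "openai/gpt-5-codex",  # assumed full OpenRouter id
        "messages": [{"role": "user", "content": "Refactor this package to remove global state."}],
        "reasoning": {"effort": "high"},  # documented effort levels: low/medium/high
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```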
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.1 Terminus (exacto) |
deepseek-v3.1-terminus:exacto
|
0.21 |
0.79 |
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.1 Terminus |
deepseek-v3.1-terminus
|
0.21 |
0.79 |
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
|
|
|
openrouter
|
xAI: Grok 4 Fast |
grok-4-fast
|
0.20 |
0.50 |
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model on xAI's [news post](http://x.ai/news/grok-4-fast).
Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
|
|
|
openrouter
|
Tongyi DeepResearch 30B A3B |
tongyi-deepresearch-30b-a3b
|
0.09 |
0.40 |
Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks and delivers state-of-the-art performance on benchmarks like Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES. This makes it superior for complex agentic search, reasoning, and multi-step problem-solving compared to prior models.
The model includes a fully automated synthetic data pipeline for scalable pre-training, fine-tuning, and reinforcement learning. It uses large-scale continual pre-training on diverse agentic data to boost reasoning and keep its knowledge current. It also features end-to-end on-policy RL with a customized Group Relative Policy Optimization, including token-level gradients and negative-sample filtering for stable training. The model supports ReAct for core ability checks and an IterResearch-based 'Heavy' mode for maximum performance through test-time scaling. It's ideal for advanced research agents, tool use, and heavy inference workflows. Context: 131072
|
|
|
openrouter
|
Qwen: Qwen3 Coder Flash |
qwen3-coder-flash
|
0.30 |
1.50 |
Qwen3 Coder Flash is Alibaba's fast and cost-efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
|
|
|
openrouter
|
OpenGVLab: InternVL3 78B |
internvl3-78b
|
0.10 |
0.39 |
The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities.
In addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance. Context: 32768
|
|
|
openrouter
|
Qwen: Qwen3 Next 80B A3B Thinking |
qwen3-next-80b-a3b-thinking
|
0.15 |
1.20 |
Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic planning. It reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior.
The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode. Context: 262144
|
|
|
openrouter
|
Qwen: Qwen3 Next 80B A3B Instruct |
qwen3-next-80b-a3b-instruct
|
0.06 |
0.60 |
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought.
The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred. Context: 262144
|
|
|
openrouter
|
Meituan: LongCat Flash Chat |
longcat-flash-chat
|
0.20 |
0.80 |
LongCat-Flash-Chat is a large-scale Mixture-of-Experts (MoE) model with 560B total parameters, of which 18.6B–31.3B (≈27B on average) are dynamically activated per input. It introduces a shortcut-connected MoE design to reduce communication overhead and achieve high throughput while maintaining training stability through advanced scaling strategies such as hyperparameter transfer, deterministic computation, and multi-stage optimization.
This release, LongCat-Flash-Chat, is a non-thinking foundation model optimized for conversational and agentic tasks. It supports long context windows up to 128K tokens and shows competitive performance across reasoning, coding, instruction following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions. Context: 131072
|
|
|
openrouter
|
Qwen: Qwen Plus 0728 |
qwen-plus-2025-07-28
|
0.40 |
1.20 |
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1-million-token context that balances performance, speed, and cost. Context: 1000000
|
|
|
openrouter
|
Qwen: Qwen Plus 0728 (thinking) |
qwen-plus-2025-07-28:thinking
|
0.40 |
4.00 |
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1-million-token context that balances performance, speed, and cost. Context: 1000000
|
|
|
openrouter
|
NVIDIA: Nemotron Nano 9B V2 (free) |
nemotron-nano-9b-v2:free
|
0.00 |
0.00 |
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.
The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 128000
|
|
|
openrouter
|
NVIDIA: Nemotron Nano 9B V2 |
nemotron-nano-9b-v2
|
0.04 |
0.16 |
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.
The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 131072
|
|
|
openrouter
|
MoonshotAI: Kimi K2 0905 |
kimi-k2-0905
|
0.39 |
1.90 |
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.
This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
|
|
|
openrouter
|
MoonshotAI: Kimi K2 0905 (exacto) |
kimi-k2-0905:exacto
|
0.60 |
2.50 |
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.
This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
|
|
|
openrouter
|
Deep Cogito: Cogito V2 Preview Llama 70B |
cogito-v2-preview-llama-70b
|
0.88 |
0.88 |
Cogito v2 70B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. Built with iterative policy improvement, it delivers strong performance across reasoning tasks while maintaining efficiency through shorter reasoning chains and improved intuition. Context: 32768
|
|
|
openrouter
|
Cogito V2 Preview Llama 109B |
cogito-v2-preview-llama-109b-moe
|
0.18 |
0.59 |
An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage an extended “thinking” phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens) and standard Transformers workflows. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 32767
|
|
|
openrouter
|
StepFun: Step3 |
step3
|
0.57 |
1.42 |
Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators. Context: 65536
|
|
|
openrouter
|
Qwen: Qwen3 30B A3B Thinking 2507 |
qwen3-30b-a3b-thinking-2507
|
0.05 |
0.34 |
Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated from final answers.
Compared to earlier Qwen3-30B releases, this version improves performance across logical reasoning, mathematics, science, coding, and multilingual benchmarks. It also demonstrates stronger instruction following, tool use, and alignment with human preferences. With higher reasoning efficiency and extended output budgets, it is best suited for advanced research, competitive problem solving, and agentic applications requiring structured long-context reasoning. Context: 32768
|
|
|
openrouter
|
xAI: Grok Code Fast 1 |
grok-code-fast-1
|
0.20 |
1.50 |
Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code toward high-quality workflows. Context: 256000
|
|
|
openrouter
|
Nous: Hermes 4 70B |
hermes-4-70b
|
0.11 |
0.38 |
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>...</think> reasoning traces before answering. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates. Context: 131072
|
|
|
openrouter
|
Nous: Hermes 4 405B |
hermes-4-405b
|
1.00 |
3.00 |
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>...</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior. Context: 131072
|
|
|
openrouter
|
Google: Gemini 2.5 Flash Image Preview (Nano Banana) |
gemini-2.5-flash-image-preview
|
0.30 |
2.50 |
Gemini 2.5 Flash Image Preview, a.k.a. "Nano Banana," is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Context: 32768
|
|
|
openrouter
|
DeepSeek: DeepSeek V3.1 |
deepseek-chat-v3.1
|
0.15 |
0.75 |
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
It succeeds the [DeepSeek V3-0324](/deepseek/deepseek-chat-v3-0324) model and performs well on a variety of tasks. Context: 32768
|
|
|
openrouter
|
OpenAI: GPT-4o Audio |
gpt-4o-audio-preview
|
2.50 |
10.00 |
The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million input audio tokens. Context: 128000
|
|
|
openrouter
|
Mistral: Mistral Medium 3.1 |
mistral-medium-3.1
|
0.40 |
2.00 |
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases.
The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
|
|
|
openrouter
|
Baidu: ERNIE 4.5 21B A3B |
ernie-4.5-21b-a3b
|
0.07 |
0.28 |
A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptional language understanding and generation through heterogeneous MoE structures and modality-isolated routing. Supporting an extensive 131K token context length, the model achieves efficient inference via multi-expert parallel collaboration and quantization, while advanced post-training techniques including SFT, DPO, and UPO ensure optimized performance across diverse applications with specialized routing and balancing losses for superior task handling. Context: 120000
|
|
|
openrouter
|
Baidu: ERNIE 4.5 VL 28B A3B |
ernie-4.5-vl-28b-a3b
|
0.14 |
0.56 |
A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing. Built with scaling-efficient infrastructure for high-throughput training and inference, the model leverages advanced post-training techniques including SFT, DPO, and UPO for optimized performance, while supporting an impressive 131K context length and RLVR alignment for superior cross-modal reasoning and generation capabilities. Context: 30000
|
|
|
openrouter
|
Z.AI: GLM 4.5V |
glm-4.5v
|
0.60 |
1.80 |
GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 65536
|
|
|
openrouter
|
AI21: Jamba Mini 1.7 |
jamba-mini-1.7
|
0.20 |
0.40 |
Jamba Mini 1.7 is a compact and efficient member of the Jamba open model family, incorporating key improvements in grounding and instruction-following while maintaining the benefits of the SSM-Transformer hybrid architecture and 256K context window. Despite its compact size, it delivers accurate, contextually grounded responses and improved steerability. Context: 256000
|
|
|
openrouter
|
AI21: Jamba Large 1.7 |
jamba-large-1.7
|
2.00 |
8.00 |
Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions. Context: 256000
|
|
|
openrouter
|
OpenAI: GPT-5 Chat |
gpt-5-chat
|
1.25 |
10.00 |
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-5 |
gpt-5
|
1.25 |
10.00 |
GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
|
|
|
openrouter
|
OpenAI: GPT-5 Mini |
gpt-5-mini
|
0.25 |
2.00 |
GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model. Context: 400000
|
|
|
openrouter
|
OpenAI: GPT-5 Nano |
gpt-5-nano
|
0.05 |
0.40 |
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications. Context: 400000
|
|
|
openrouter
|
OpenAI: gpt-oss-120b (free) |
gpt-oss-120b:free
|
0.00 |
0.00 |
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
|
|
|
openrouter
|
OpenAI: gpt-oss-120b |
gpt-oss-120b
|
0.02 |
0.10 |
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
|
|
|
openrouter
|
OpenAI: gpt-oss-120b (exacto) |
gpt-oss-120b:exacto
|
0.04 |
0.19 |
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
|
|
|
openrouter
|
OpenAI: gpt-oss-20b (free) |
gpt-oss-20b:free
|
0.00 |
0.00 |
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
|
|
|
openrouter
|
OpenAI: gpt-oss-20b |
gpt-oss-20b
|
0.02 |
0.06 |
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
|
|
|
openrouter
|
Anthropic: Claude Opus 4.1 |
claude-opus-4.1
|
15.00 |
75.00 |
Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning. Context: 200000
|
|
|
openrouter
|
Mistral: Codestral 2508 |
codestral-2508
|
0.30 |
0.90 |
Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation.
[Blog Post](https://mistral.ai/news/codestral-25-08) Context: 256000
|
|
|
openrouter
|
Qwen: Qwen3 Coder 30B A3B Instruct |
qwen3-coder-30b-a3b-instruct
|
0.07 |
0.27 |
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with Yarn) and performs strongly in tasks involving function calls, browser use, and structured code completion.
This model is optimized for instruction-following without “thinking mode”, and integrates well with OpenAI-compatible tool-use formats. Context: 160000
|
|
|
openrouter
|
Qwen: Qwen3 30B A3B Instruct 2507 |
qwen3-30b-a3b-instruct-2507
|
0.08 |
0.33 |
Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance. Context: 262144
|
|
|
openrouter
|
Z.AI: GLM 4.5 |
glm-4.5
|
0.35 |
1.55 |
GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
|
|
|
openrouter
|
Z.AI: GLM 4.5 Air (free) |
glm-4.5-air:free
|
0.00 |
0.00 |
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
|
|
|
openrouter
|
Z.AI: GLM 4.5 Air |
glm-4.5-air
|
0.05 |
0.22 |
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
|
|
|
openrouter
|
Qwen: Qwen3 235B A22B Thinking 2507 |
qwen3-235b-a22b-thinking-2507
|
0.11 |
0.60 |
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains.
The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases. Context: 262144
|
|
|
openrouter
|
Z.AI: GLM 4 32B |
glm-4-32b
|
0.10 |
0.10 |
GLM 4 32B is a cost-effective foundation language model.
It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks.
It is made by the same lab behind the thudm models. Context: 128000
|
|
|
openrouter
|
Qwen: Qwen3 Coder 480B A35B (free) |
qwen3-coder:free
|
0.00 |
0.00 |
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Pricing for the Alibaba endpoints varies by context length. Once a request exceeds 128k input tokens, the higher pricing is used. Context: 262000
|
|
|
openrouter
|
Qwen: Qwen3 Coder 480B A35B |
qwen3-coder
|
0.22 |
0.95 |
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Pricing for the Alibaba endpoints varies by context length. Once a request exceeds 128k input tokens, the higher pricing is used. Context: 262144
|
|
|
openrouter
|
Qwen: Qwen3 Coder 480B A35B (exacto) |
qwen3-coder:exacto
|
0.22 |
1.80 |
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Pricing for the Alibaba endpoints varies by context length. Once a request exceeds 128k input tokens, the higher pricing is used. Context: 262144
|
|
|
openrouter
|
ByteDance: UI-TARS 7B |
ui-tars-1.5-7b
|
0.10 |
0.20 |
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces.
This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints. Context: 128000
|
|
|
openrouter
|
Google: Gemini 2.5 Flash Lite |
gemini-2.5-flash-lite
|
0.10 |
0.40 |
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
|
|
|
openrouter
|
Qwen: Qwen3 235B A22B Instruct 2507 |
qwen3-235b-a22b-2507
|
0.07 |
0.46 |
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks).
Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench. Context: 262144
|
|
|
openrouter
|
Switchpoint Router |
router
|
0.85 |
3.40 |
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library.
As the world of LLMs advances, our router gets smarter, ensuring you always benefit from the industry's newest models without changing your workflow.
This model is configured for a simple, flat rate per response here on OpenRouter. It's powered by the full routing engine from [Switchpoint AI](https://www.switchpoint.dev). Context: 131072
|
|
|
openrouter
|
MoonshotAI: Kimi K2 0711 (free) |
kimi-k2:free
|
0.00 |
0.00 |
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 32768
|
|
|
openrouter
|
MoonshotAI: Kimi K2 0711 |
kimi-k2
|
0.50 |
2.40 |
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 131072
|
|
|
openrouter
|
THUDM: GLM 4.1V 9B Thinking |
glm-4.1v-9b-thinking
|
0.04 |
0.14 |
GLM-4.1V-9B-Thinking is a 9B parameter vision-language model developed by THUDM, based on the GLM-4-9B foundation. It introduces a reasoning-centric "thinking paradigm" enhanced with reinforcement learning to improve multimodal reasoning, long-context understanding (up to 64K tokens), and complex problem solving. It achieves state-of-the-art performance among models in its class, outperforming even larger models like Qwen-2.5-VL-72B on a majority of benchmark tasks. Context: 65536
|
|
|
openrouter
|
Mistral: Devstral Medium |
devstral-medium
|
0.40 |
2.00 |
Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost. It is designed for generalization across prompt styles and tool use in code agents and frameworks.
Devstral Medium is available via API only (not open-weight), and supports enterprise deployment on private infrastructure, with optional fine-tuning capabilities. Context: 131072
|
|
|
openrouter
|
Mistral: Devstral Small 1.1 |
devstral-small
|
0.07 |
0.28 |
Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats.
Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.
Context: 128000
|
|
|
openrouter
|
Venice: Uncensored (free) |
dolphin-mistral-24b-venice-edition:free
|
0.00 |
0.00 |
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models. Context: 32768
|
|
|
openrouter
|
xAI: Grok 4 |
grok-4
|
3.00 |
15.00 |
Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total tokens in a given request exceed 128k. See more details on the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709) Context: 256000
|
|
|
openrouter
|
Google: Gemma 3n 2B (free) |
gemma-3n-e2b-it:free
|
0.00 |
0.00 |
Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data. Context: 8192
|
|
|
openrouter
|
Tencent: Hunyuan A13B Instruct |
hunyuan-a13b-instruct
|
0.14 |
0.57 |
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.). Context: 131072
|
|
|
openrouter
|
TNG: DeepSeek R1T2 Chimera (free) |
deepseek-r1t2-chimera:free
|
0.00 |
0.00 |
DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks. Context: 163840
|
|
|
openrouter
|
TNG: DeepSeek R1T2 Chimera |
deepseek-r1t2-chimera
|
0.25 |
0.85 |
DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks. Context: 163840
|
|
|
openrouter
|
Morph: Morph V3 Large |
morph-v3-large
|
0.90 |
1.90 |
Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations.
The model requires the prompt to be in the following format:
<instruction>{instruction}</instruction>
<code>{initial_code}</code>
<update>{edit_snippet}</update>
Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 262144
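Because the prompt format above is strict, a small helper that assembles it may be useful. This is a sketch only: the `morph/morph-v3-large` slug and the OpenRouter endpoint usage are assumptions, and Morph's own [documentation](https://docs.morphllm.com/quickstart) is authoritative.

```python
import os
import requests

def morph_apply(instruction: str, initial_code: str, edit_snippet: str) -> str:
    """Assemble the required <instruction>/<code>/<update> prompt and
    return the transformed code (sketch; slug and endpoint assumed)."""
    prompt = (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{initial_code}</code>\n"
        f"<update>{edit_snippet}</update>"
    )
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "morph/morph-v3-large",  # assumed slug
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    return resp.json()["choices"][0]["message"]["content"]
```

The same format applies to Morph V3 Fast below; only the model slug changes.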
|
|
|
openrouter
|
Morph: Morph V3 Fast |
morph-v3-fast
|
0.80 |
1.20 |
Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations.
The model requires the prompt to be in the following format:
<instruction>{instruction}</instruction>
<code>{initial_code}</code>
<update>{edit_snippet}</update>
Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 81920
|
|
|
openrouter
|
Baidu: ERNIE 4.5 VL 424B A47B |
ernie-4.5-vl-424b-a47b
|
0.42 |
1.25 |
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization. Context: 123000
|
|
|
openrouter
|
Baidu: ERNIE 4.5 300B A47B |
ernie-4.5-300b-a47b
|
0.28 |
1.10 |
ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands. Context: 123000
|
|
|
openrouter
|
Inception: Mercury |
mercury
|
0.25 |
1.00 |
Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
|
|
|
openrouter
|
Mistral: Mistral Small 3.2 24B |
mistral-small-3.2-24b-instruct
|
0.06 |
0.18 |
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks.
It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA). Context: 131072
|
|
|
openrouter
|
MiniMax: MiniMax M1 |
minimax-m1
|
0.40 |
2.20 |
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks.
Trained via a custom reinforcement learning pipeline (CISPO), M1 excels in long-context understanding, software engineering, agentic tool use, and mathematical reasoning. Benchmarks show strong performance across FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, often outperforming other open models like DeepSeek R1 and Qwen3-235B. Context: 1000000
|
|
|
openrouter
|
Google: Gemini 2.5 Flash |
gemini-2.5-flash
|
0.30 |
2.50 |
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.
Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in [the documentation](https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
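A hedged sketch of the "max tokens for reasoning" parameter mentioned above, reusing the request shape from the GLM-4.5 example earlier; the model id and budget value are illustrative assumptions.

```python
# Sketch: cap Gemini 2.5 Flash's thinking budget via OpenRouter's
# reasoning.max_tokens field (see the docs linked above). Pass this
# payload to the same requests.post call shown in the GLM-4.5 example.
payload = {
    "model": "google/gemini-2.5-flash",  # assumed slug
    "messages": [{"role": "user", "content": "Summarize the trade-offs."}],
    "reasoning": {"max_tokens": 2048},  # illustrative upper bound on thinking tokens
}
```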
|
|
|
openrouter
|
Google: Gemini 2.5 Pro |
gemini-2.5-pro
|
1.25 |
10.00 |
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
|
|
|
openrouter
|
MoonshotAI: Kimi Dev 72B |
kimi-dev-72b
|
0.29 |
1.15 |
Kimi-Dev-72B is an open-source large language model fine-tuned for software engineering and issue resolution tasks. Based on Qwen2.5-72B, it is optimized using large-scale reinforcement learning that applies code patches in real repositories and validates them via full test suite execution—rewarding only correct, robust completions. The model achieves 60.4% on SWE-bench Verified, setting a new benchmark among open-source models for software bug fixing and code reasoning. Context: 131072
|
|
|
openrouter
|
OpenAI: o3 Pro |
o3-pro
|
20.00 |
80.00 |
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
Note that BYOK is required for this model. Set up here: https://openrouter.ai/settings/integrations Context: 200000
|
|
|
openrouter
|
xAI: Grok 3 Mini |
grok-3-mini
|
0.30 |
0.50 |
A lightweight model that thinks before responding. Fast, smart, and great for logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. Context: 131072
|
|
|
openrouter
|
xAI: Grok 3 |
grok-3
|
3.00 |
15.00 |
Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
Context: 131072
|
|
|
openrouter
|
Google: Gemini 2.5 Pro Preview 06-05 |
gemini-2.5-pro-preview
|
1.25 |
10.00 |
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
Context: 1048576
|
|
|
openrouter
|
DeepSeek: DeepSeek R1 0528 Qwen3 8B |
deepseek-r1-0528-qwen3-8b
|
0.06 |
0.09 |
DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and smarter post-training techniques, bringing its reasoning and inference close to flagship models like o3 and Gemini 2.5 Pro.
It now tops math, programming, and logic leaderboards, showcasing a step-change in depth-of-thought.
The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235B "thinking" giant on AIME 2024. Context: 128000
|
|
|
openrouter
|
DeepSeek: R1 0528 (free) |
deepseek-r1-0528:free
|
0.00 |
0.00 |
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model. Context: 163840
|
|
|
openrouter
|
DeepSeek: R1 0528 |
deepseek-r1-0528
|
0.40 |
1.75 |
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model. Context: 163840
|
|
|
openrouter
|
Anthropic: Claude Opus 4 |
claude-opus-4
|
15.00 |
75.00 |
Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation.
Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 200000
|
|
|
openrouter
|
Anthropic: Claude Sonnet 4 |
claude-sonnet-4
|
3.00 |
15.00 |
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios.
Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 1000000
|
|
|
openrouter
|
Mistral: Devstral Small 2505 |
devstral-small-2505
|
0.06 |
0.12 |
Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%).
Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license. Context: 128000
|
|
|
openrouter
|
Google: Gemma 3n 4B (free) |
gemma-3n-e4b-it:free
|
0.00 |
0.00 |
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements.
This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 8192
|
|
|
openrouter
|
Google: Gemma 3n 4B |
gemma-3n-e4b-it
|
0.02 |
0.04 |
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements.
This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 32768
|
|
|
openrouter
|
OpenAI: Codex Mini |
codex-mini
|
1.50 |
6.00 |
codex-mini-latest is a fine-tuned version of o4-mini specifically for use in Codex CLI. For direct use in the API, we recommend starting with gpt-4.1. Context: 200000
|
|
|
openrouter
|
Nous: DeepHermes 3 Mistral 24B Preview |
deephermes-3-mistral-24b-preview
|
0.02 |
0.10 |
DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications.
DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode—generating extended chains of thought wrapped in `<think></think>` tags before delivering a final answer.
System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
Context: 32768
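To make the toggle concrete, here is a sketch of a messages array that uses the system prompt quoted above verbatim; the user question is illustrative.

```python
# Sketch: DeepHermes 3's "deep thinking" mode is enabled purely by the
# system prompt quoted above; no special API parameter is involved.
DEEP_THINKING = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via "
    "systematic reasoning processes to help come to a correct solution prior "
    "to answering. You should enclose your thoughts and internal monologue "
    "inside <think> </think> tags, and then provide your solution or response "
    "to the problem."
)

messages = [
    {"role": "system", "content": DEEP_THINKING},
    {"role": "user", "content": "Is 2**61 - 1 prime?"},  # illustrative query
]
# Omit (or replace) the system prompt to stay in the fast, intuitive chat mode.
```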
|
|
|
openrouter
|
Mistral: Mistral Medium 3 |
mistral-medium-3
|
0.40 |
2.00 |
Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases.
The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
|
|
|
openrouter
|
Google: Gemini 2.5 Pro Preview 05-06 |
gemini-2.5-pro-preview-05-06
|
1.25 |
10.00 |
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
|
|
|
openrouter
|
Arcee AI: Spotlight |
spotlight
|
0.18 |
0.18 |
Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual‐question‑answering, and diagram‑analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mock‑ups need to be interpreted on the fly. Early benchmarks show it matching or out‑scoring larger VLMs such as LLaVA‑1.6 13 B on popular VQA and POPE alignment tests. Context: 131072
|
|
|
openrouter
|
Arcee AI: Maestro Reasoning |
maestro-reasoning
|
0.90 |
3.30 |
Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL for step‑by‑step logic. Compared to the earlier 7 B preview, the production 32 B release widens the context window to 128 k tokens and doubles pass‑rate on MATH and GSM‑8K, while also lifting code completion accuracy. Its instruction style encourages structured "thought → answer" traces that can be parsed or hidden according to user preference. That transparency pairs well with audit‑focused industries like finance or healthcare where seeing the reasoning path matters. In Arcee Conductor, Maestro is automatically selected for complex, multi‑constraint queries that smaller SLMs bounce. Context: 131072
|
|
|
openrouter
|
Arcee AI: Virtuoso Large |
virtuoso-large
|
0.75 |
1.20 |
Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and enterprise QA. Unlike many 70 B peers, it retains the 128 k context inherited from Qwen 2.5, letting it ingest books, codebases or financial filings wholesale. Training blended DeepSeek R1 distillation, multi‑epoch supervised fine‑tuning and a final DPO/RLHF alignment stage, yielding strong performance on BIG‑Bench‑Hard, GSM‑8K and long‑context Needle‑In‑Haystack tests. Enterprises use Virtuoso‑Large as the "fallback" brain in Conductor pipelines when other SLMs flag low confidence. Despite its size, aggressive KV‑cache optimizations keep first‑token latency in the low‑second range on 8× H100 nodes, making it a practical production‑grade powerhouse. Context: 131072
|
|
|
openrouter
|
Arcee AI: Coder Large |
coder-large
|
0.50 |
0.80 |
Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed GitHub, CodeSearchNet and synthetic bug‑fix corpora. It supports a 32k context window, enabling multi‑file refactoring or long diff review in a single call, and understands 30‑plus programming languages with special attention to TypeScript, Go and Terraform. Internal benchmarks show 5–8 pt gains over CodeLlama‑34 B‑Python on HumanEval and competitive BugFix scores thanks to a reinforcement pass that rewards compilable output. The model emits structured explanations alongside code blocks by default, making it suitable for educational tooling as well as production copilot scenarios. Cost‑wise, Together AI prices it well below proprietary incumbents, so teams can scale interactive coding without runaway spend. Context: 32768
|
|
|
openrouter
|
Microsoft: Phi 4 Reasoning Plus |
phi-4-reasoning-plus
|
0.07 |
0.35 |
Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer.
While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed. Context: 32768
|
|
|
openrouter
|
Inception: Mercury Coder |
mercury-coder
|
0.25 |
1.00 |
Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
|
|
|
openrouter
|
Qwen: Qwen3 4B (free) |
qwen3-4b:free
|
0.00 |
0.00 |
Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows. Context: 40960
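When running Qwen3 weights locally, the thinking/non-thinking switch is typically exposed through the chat template. The sketch below assumes the Hugging Face `transformers` library and the `Qwen/Qwen3-4B` checkpoint; the `enable_thinking` flag follows Qwen's published usage and should be checked against the model card.

```python
from transformers import AutoTokenizer

# Sketch (assumes the Qwen/Qwen3-4B checkpoint and Qwen's documented
# chat-template flag): render a prompt in thinking or non-thinking mode.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
messages = [{"role": "user", "content": "How many primes are below 30?"}]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # False selects the efficient dialogue mode
)
print(prompt)  # the rendered prompt is then passed to your inference engine
```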
|
|
|
openrouter
|
DeepSeek: DeepSeek Prover V2 |
deepseek-prover-v2
|
0.50 |
2.18 |
DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from [DeepSeek-Prover-V1.5](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL). Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description. Context: 163840
|
|
|
openrouter
|
Meta: Llama Guard 4 12B |
llama-guard-4-12b
|
0.18 |
0.18 |
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM—generating text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images. Context: 163840
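As a sketch of the prompt-classification flow described above: the content to be moderated is sent as an ordinary chat turn, and the model's text output indicates safe/unsafe plus any violated MLCommons category codes. The model slug and the exact output shape shown are assumptions for illustration.

```python
import os
import requests

# Sketch: prompt classification with Llama Guard 4 (assumed slug).
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-guard-4-12b",  # assumed slug
        "messages": [{"role": "user", "content": "Tell me how to hotwire a car."}],
    },
    timeout=60,
)
verdict = resp.json()["choices"][0]["message"]["content"]
# Expected shape (assumption): "safe", or "unsafe" followed by the violated
# MLCommons hazards-taxonomy codes, e.g. "unsafe\nS2".
print(verdict)
```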
|
|
|
openrouter
|
Qwen: Qwen3 30B A3B |
qwen3-30b-a3b
|
0.06 |
0.22 |
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance.
Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models. Context: 40960
|
|
|
openrouter
|
Qwen: Qwen3 8B |
qwen3-8b
|
0.04 |
0.14 |
Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling. Context: 128000
|
|
|
openrouter
|
Qwen: Qwen3 14B |
qwen3-14b
|
0.05 |
0.22 |
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
|
|
|
openrouter
|
Qwen: Qwen3 32B |
qwen3-32b
|
0.08 |
0.24 |
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
|
|
|
openrouter
|
Qwen: Qwen3 235B A22B |
qwen3-235b-a22b
|
0.18 |
0.54 |
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling. Context: 40960
|
|
|
openrouter
|
TNG: DeepSeek R1T Chimera (free) |
deepseek-r1t-chimera:free
|
0.00 |
0.00 |
DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.
The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
|
|
|
openrouter
|
TNG: DeepSeek R1T Chimera |
deepseek-r1t-chimera
|
0.30 |
1.20 |
DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.
The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
|
|
|
openrouter
|
OpenAI: o4 Mini High |
o4-mini-high
|
1.10 |
4.40 |
OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high.
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
|
|
|
openrouter
|
OpenAI: o3 |
o3
|
2.00 |
8.00 |
o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. Context: 200000
|
|
|
openrouter
|
OpenAI: o4 Mini |
o4-mini
|
1.10 |
4.40 |
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
|
|
|
openrouter
|
Qwen: Qwen2.5 Coder 7B Instruct |
qwen2.5-coder-7b-instruct
|
0.03 |
0.09 |
Qwen2.5-Coder-7B-Instruct is a 7B parameter instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE, SwiGLU, RMSNorm, and GQA attention with support for up to 128K tokens using YaRN-based extrapolation. It is trained on a large corpus of source code, synthetic data, and text-code grounding, providing robust performance across programming languages and agentic coding workflows.
This model is part of the Qwen2.5-Coder family and offers strong compatibility with tools like vLLM for efficient deployment. Released under the Apache 2.0 license. Context: 32768
|
|
|
openrouter
|
OpenAI: GPT-4.1 |
gpt-4.1
|
2.00 |
8.00 |
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval. Context: 1047576
|
|
|
openrouter
|
OpenAI: GPT-4.1 Mini |
gpt-4.1-mini
|
0.40 |
1.60 |
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints. Context: 1047576
|
|
|
openrouter
|
OpenAI: GPT-4.1 Nano |
gpt-4.1-nano
|
0.10 |
0.40 |
For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion. Context: 1047576
|
|
|
openrouter
|
EleutherAI: Llemma 7b |
llemma_7b
|
0.80 |
1.20 |
Llemma 7B is a language model for mathematics. It was initialized with Code Llama 7B weights and trained on the Proof-Pile-2 for 200B tokens. Llemma models are particularly strong at chain-of-thought mathematical reasoning and using computational tools for mathematics, such as Python and formal theorem provers. Context: 4096
|
|
|
openrouter
|
AlfredPros: CodeLLaMa 7B Instruct Solidity |
codellama-7b-instruct-solidity
|
0.80 |
1.20 |
A fine-tuned 7-billion-parameter Code LLaMA Instruct model for generating Solidity smart contracts, trained with 4-bit QLoRA fine-tuning via the PEFT library. Context: 4096
|
|
|
openrouter
|
xAI: Grok 3 Mini Beta |
grok-3-mini-beta
|
0.30 |
0.50 |
Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.
Transparent "thinking" traces accessible. Defaults to low reasoning, can boost with setting `reasoning: { effort: "high" }`
Note: That there are two xAI endpoints for this model. By default when using this model we will always route you to the base endpoint. If you want the fast endpoint you can add `provider: { sort: throughput}`, to sort by throughput instead.
Context: 131072
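Putting the two knobs above together in one request body (the model slug is an assumption; the `reasoning` and `provider` fields follow the snippets quoted above):

```python
# Sketch: boost reasoning effort and prefer the faster endpoint, per the
# notes above. Pass this payload to the same requests.post call shown in
# the GLM-4.5 example earlier.
payload = {
    "model": "x-ai/grok-3-mini-beta",  # assumed slug
    "messages": [{"role": "user", "content": "Solve: 17 * 23 - 19"}],
    "reasoning": {"effort": "high"},     # defaults to low effort
    "provider": {"sort": "throughput"},  # sort endpoints by throughput
}
```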
|
|
|
openrouter
|
xAI: Grok 3 Beta |
grok-3-beta
|
3.00 |
15.00 |
Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
It excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro, where it outperforms Grok 3 Mini even at high reasoning effort.
Note that there are two xAI endpoints for this model. By default, we route you to the base endpoint; if you want the fast endpoint, add `provider: { sort: throughput }` to sort by throughput instead.
Context: 131072
|
|
|
openrouter
|
NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 |
llama-3.1-nemotron-ultra-253b-v1
|
0.60 |
1.80 |
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node.
Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more. Context: 131072
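A minimal sketch of the system-prompt gate described in the note above; the "detailed thinking off" counterpart follows NVIDIA's usage recommendations but is an assumption here.

```python
# Sketch: Nemotron Ultra gates reasoning on an exact system-prompt phrase.
messages = [
    {"role": "system", "content": "detailed thinking on"},  # enables reasoning
    {"role": "user", "content": "Compare quicksort and mergesort."},
]
# Per the linked usage recommendations, "detailed thinking off" (an
# assumption here) selects the non-reasoning behavior instead.
```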
|
|
|
openrouter
|
Meta: Llama 4 Maverick |
llama-4-maverick
|
0.15 |
0.60 |
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction.
Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput. Context: 1048576
|
|
|
openrouter
|
Meta: Llama 4 Scout |
llama-4-scout
|
0.08 |
0.30 |
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens.
Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025. Context: 327680
|
|
|
openrouter
|
Qwen: Qwen2.5 VL 32B Instruct |
qwen2.5-vl-32b-instruct
|
0.05 |
0.22 |
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation. Context: 16384
|
|
|
openrouter
|
DeepSeek: DeepSeek V3 0324 |
deepseek-chat-v3-0324
|
0.19 |
0.87 |
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well on a variety of tasks. Context: 163840
|
|
|
openrouter
|
OpenAI: o1-pro |
o1-pro
|
150.00 |
600.00 |
The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. Context: 200000
|
|
|
openrouter
|
Mistral: Mistral Small 3.1 24B (free) |
mistral-small-3.1-24b-instruct:free
|
0.00 |
0.00 |
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 128000
|
|
|
openrouter
|
Mistral: Mistral Small 3.1 24B |
mistral-small-3.1-24b-instruct
|
0.03 |
0.11 |
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 131072
|
|
|
openrouter
|
AllenAI: Olmo 2 32B Instruct |
olmo-2-0325-32b-instruct
|
0.05 |
0.20 |
OLMo-2 32B Instruct is a supervised instruction-finetuned variant of the OLMo-2 32B March 2025 base model. It excels in complex reasoning and instruction-following tasks across diverse benchmarks such as GSM8K, MATH, IFEval, and general NLP evaluation. Developed by AI2, OLMo-2 32B is part of an open, research-oriented initiative, trained primarily on English-language datasets to advance the understanding and development of open-source language models. Context: 128000
|
|
|
openrouter
|
Google: Gemma 3 4B (free) |
gemma-3-4b-it:free
|
0.00 |
0.00 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 32768
|
|
|
openrouter
|
Google: Gemma 3 4B |
gemma-3-4b-it
|
0.02 |
0.07 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 96000
|
|
|
openrouter
|
Google: Gemma 3 12B (free) |
gemma-3-12b-it:free
|
0.00 |
0.00 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 32768
|
|
|
openrouter
|
Google: Gemma 3 12B |
gemma-3-12b-it
|
0.03 |
0.10 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 131072
|
|
|
openrouter
|
Cohere: Command A |
command-a
|
2.50 |
10.00 |
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases.
Compared to other leading proprietary and open-weights models, Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks. Context: 256000
|
|
|
openrouter
|
OpenAI: GPT-4o-mini Search Preview |
gpt-4o-mini-search-preview
|
0.15 |
0.60 |
GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-4o Search Preview |
gpt-4o-search-preview
|
2.50 |
10.00 |
GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
|
|
|
openrouter
|
Google: Gemma 3 27B (free) |
gemma-3-27b-it:free
|
0.00 |
0.00 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
|
|
|
openrouter
|
Google: Gemma 3 27B |
gemma-3-27b-it
|
0.04 |
0.06 |
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
|
|
|
openrouter
|
TheDrummer: Skyfall 36B V2 |
skyfall-36b-v2
|
0.55 |
0.80 |
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling. Context: 32768
|
|
|
openrouter
|
Microsoft: Phi 4 Multimodal Instruct |
phi-4-multimodal-instruct
|
0.05 |
0.10 |
Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the [Phi-4 Multimodal blog post](https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/).
Context: 131072
|
|
|
openrouter
|
Perplexity: Sonar Reasoning Pro |
sonar-reasoning-pro
|
2.00 |
8.00 |
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro)
Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for advanced use cases, it supports in-depth, multi-step queries with a larger context window and can surface more citations per search, enabling more comprehensive and extensible responses. Context: 128000
|
|
|
openrouter
|
Perplexity: Sonar Pro |
sonar-pro
|
3.00 |
15.00 |
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro)
For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, returning on average double the number of citations per search compared to Sonar. Plus, with a larger context window, it can handle longer and more nuanced searches and follow-up questions. Context: 200000
|
|
|
openrouter
|
Perplexity: Sonar Deep Research |
sonar-deep-research
|
2.00 |
8.00 |
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events.
Notes on Pricing ([Source](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-deep-research))
- Input tokens comprise Prompt tokens (user prompt) + Citation tokens (these are processed tokens from running searches)
- Deep Research runs multiple searches to conduct exhaustive research. Searches are priced at $5/1000 searches; a request that does 30 searches will cost $0.15 in this step.
- Reasoning is a distinct step in Deep Research, since the model does extensive automated reasoning through all the material it gathers during its research phase. These reasoning tokens differ from the CoTs in the answer: they are the tokens used to reason through the research material prior to generating the outputs via the CoTs. Reasoning tokens are priced at $3/1M tokens; see the cost sketch below. Context: 128000
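To make the billing components concrete, here is a minimal cost sketch in Python. It assumes the rates quoted above ($2/1M input tokens, $8/1M output tokens, $3/1M reasoning tokens, $5/1000 searches); the request sizes are made-up numbers, and Perplexity's pricing page remains the authoritative source.

```python
# Hedged cost estimate for one hypothetical Sonar Deep Research request.
# Rates are the ones quoted above; token counts below are illustrative only.
PRICE_INPUT_PER_M = 2.00        # $/1M input tokens (prompt + citation tokens)
PRICE_OUTPUT_PER_M = 8.00       # $/1M output tokens
PRICE_REASONING_PER_M = 3.00    # $/1M reasoning tokens
PRICE_PER_SEARCH = 5.00 / 1000  # $5 per 1000 searches

def deep_research_cost(input_tokens, output_tokens, reasoning_tokens, searches):
    """Sum the four billed components for a single request, in dollars."""
    return (
        input_tokens / 1e6 * PRICE_INPUT_PER_M
        + output_tokens / 1e6 * PRICE_OUTPUT_PER_M
        + reasoning_tokens / 1e6 * PRICE_REASONING_PER_M
        + searches * PRICE_PER_SEARCH
    )

# The 30-search example from the notes: the search step alone is $0.15.
print(f"search step: ${30 * PRICE_PER_SEARCH:.2f}")
print(f"full request: ${deep_research_cost(20_000, 4_000, 50_000, 30):.4f}")
```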
|
|
|
openrouter
|
Qwen: QwQ 32B |
qwq-32b
|
0.15 |
0.40 |
QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. Context: 32768
|
|
|
openrouter
|
Google: Gemini 2.0 Flash Lite |
gemini-2.0-flash-lite-001
|
0.08 |
0.30 |
Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices. Context: 1048576
|
|
|
openrouter
|
Anthropic: Claude 3.7 Sonnet (thinking) |
claude-3.7-sonnet:thinking
|
3.00 |
15.00 |
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
|
|
|
openrouter
|
Anthropic: Claude 3.7 Sonnet |
claude-3.7-sonnet
|
3.00 |
15.00 |
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
|
|
|
openrouter
|
Mistral: Saba |
mistral-saba
|
0.20 |
0.60 |
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba) Context: 32768
|
|
|
openrouter
|
Llama Guard 3 8B |
llama-guard-3-8b
|
0.02 |
0.06 |
Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.
Context: 131072
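As a concrete illustration of prompt classification, here is a minimal sketch that sends a user prompt to Llama Guard 3 through an OpenAI-compatible chat endpoint. It assumes OpenRouter's endpoint, the `meta-llama/llama-guard-3-8b` slug, and an `OPENROUTER_API_KEY` environment variable; the output convention (`safe`, or `unsafe` followed by violated category codes) is taken from the model card.

```python
# Sketch: prompt classification with Llama Guard 3 over a chat endpoint.
# Assumptions: OpenRouter's OpenAI-compatible API, the slug below, and the
# "safe" / "unsafe\n<codes>" output convention from the model card.
import os
import requests

user_prompt = "How do I pick a lock?"  # the input to classify

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-guard-3-8b",
        "messages": [{"role": "user", "content": user_prompt}],
    },
    timeout=60,
)
verdict = resp.json()["choices"][0]["message"]["content"].strip()
print("flagged" if verdict.lower().startswith("unsafe") else "ok", "->", verdict)
```

Response classification works the same way, with the assistant turn to be judged appended to the conversation.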
|
|
|
openrouter
|
OpenAI: o3 Mini High |
o3-mini-high
|
1.10 |
4.40 |
OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high.
o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
|
|
|
openrouter
|
Google: Gemini 2.0 Flash |
gemini-2.0-flash-001
|
0.10 |
0.40 |
Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
|
|
|
openrouter
|
Qwen: Qwen VL Plus |
qwen-vl-plus
|
0.21 |
0.63 |
Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers strong performance across a broad range of visual tasks.
Context: 7500
|
|
|
openrouter
|
AionLabs: Aion-1.0 |
aion-1.0
|
4.00 |
8.00 |
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model. Context: 131072
|
|
|
openrouter
|
AionLabs: Aion-1.0-Mini |
aion-1.0-mini
|
0.70 |
1.40 |
Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification. Context: 131072
|
|
|
openrouter
|
AionLabs: Aion-RP 1.0 (8B) |
aion-rp-llama-3.1-8b
|
0.80 |
1.60 |
Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing. Context: 32768
|
|
|
openrouter
|
Qwen: Qwen VL Max |
qwen-vl-max
|
0.80 |
3.20 |
Qwen VL Max is a visual understanding model that excels in delivering optimal performance for a broader spectrum of complex tasks.
Context: 131072
|
|
|
openrouter
|
Qwen: Qwen-Turbo |
qwen-turbo
|
0.05 |
0.20 |
Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks. Context: 1000000
|
|
|
openrouter
|
Qwen: Qwen2.5 VL 72B Instruct |
qwen2.5-vl-72b-instruct
|
0.15 |
0.60 |
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. Context: 32768
|
|
|
openrouter
|
Qwen: Qwen-Plus |
qwen-plus
|
0.40 |
1.20 |
Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination. Context: 131072
|
|
|
openrouter
|
Qwen: Qwen-Max |
qwen-max
|
1.60 |
6.40 |
Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown. Context: 32768
|
|
|
openrouter
|
OpenAI: o3 Mini |
o3-mini
|
1.10 |
4.40 |
OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high".
The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
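A minimal request sketch for the `reasoning_effort` parameter described above, assuming OpenRouter's OpenAI-compatible chat completions endpoint and an `OPENROUTER_API_KEY` environment variable; the prompt is illustrative.

```python
# Sketch: setting reasoning_effort on o3-mini per request.
# Values: "low" | "medium" (default) | "high"; openai/o3-mini-high simply
# defaults this parameter to "high", as noted above.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/o3-mini",
        "reasoning_effort": "high",
        "messages": [
            {"role": "user", "content": "Prove that the square root of 2 is irrational."}
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```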
|
|
|
openrouter
|
Mistral: Mistral Small 3 |
mistral-small-24b-instruct-2501
|
0.03 |
0.11 |
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/) Context: 32768
|
|
|
openrouter
|
DeepSeek: R1 Distill Qwen 32B |
deepseek-r1-distill-qwen-32b
|
0.27 |
0.27 |
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 72.6
- MATH-500 pass@1: 94.3
- CodeForces Rating: 1691
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
|
|
|
openrouter
|
DeepSeek: R1 Distill Qwen 14B |
deepseek-r1-distill-qwen-14b
|
0.15 |
0.15 |
DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 69.7
- MATH-500 pass@1: 93.9
- CodeForces Rating: 1481
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 32768
|
|
|
openrouter
|
Perplexity: Sonar Reasoning |
sonar-reasoning
|
1.00 |
5.00 |
Sonar Reasoning is a reasoning model provided by Perplexity based on [DeepSeek R1](/deepseek/deepseek-r1).
It allows developers to utilize long chain of thought with built-in web search. Sonar Reasoning is uncensored and hosted in US datacenters. Context: 127000
|
|
|
openrouter
|
Perplexity: Sonar |
sonar
|
1.00 |
1.00 |
Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features optimized for speed. Context: 127072
|
|
|
openrouter
|
DeepSeek: R1 Distill Llama 70B |
deepseek-r1-distill-llama-70b
|
0.03 |
0.11 |
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
|
|
|
openrouter
|
DeepSeek: R1 |
deepseek-r1
|
0.70 |
2.40 |
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
MIT licensed: Distill & commercialize freely! Context: 163840
|
|
|
openrouter
|
MiniMax: MiniMax-01 |
minimax-01
|
0.20 |
1.10 |
MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens.
The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model.
To read more about the release, see: https://www.minimaxi.com/en/news/minimax-01-series-2 Context: 1000192
|
|
|
openrouter
|
Microsoft: Phi 4 |
phi-4
|
0.06 |
0.14 |
[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.
At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.
For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)
Context: 16384
|
|
|
openrouter
|
Sao10K: Llama 3.1 70B Hanami x1 |
l3.1-70b-hanami-x1
|
3.00 |
3.00 |
This is [Sao10K](/sao10k)'s experimental finetune built on [Euryale v2.2](/sao10k/l3.1-euryale-70b). Context: 16000
|
|
|
openrouter
|
DeepSeek: DeepSeek V3 |
deepseek-chat
|
0.30 |
1.20 |
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226). Context: 163840
|
|
|
openrouter
|
Sao10K: Llama 3.3 Euryale 70B |
l3.3-euryale-70b
|
0.65 |
0.75 |
Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b). Context: 131072
|
|
|
openrouter
|
OpenAI: o1 |
o1
|
15.00 |
60.00 |
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
Context: 200000
|
|
|
openrouter
|
Cohere: Command R7B (12-2024) |
command-r7b-12-2024
|
0.04 |
0.15 |
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
|
|
|
openrouter
|
Google: Gemini 2.0 Flash Experimental (free) |
gemini-2.0-flash-exp:free
|
0.00 |
0.00 |
Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
|
|
|
openrouter
|
Meta: Llama 3.3 70B Instruct (free) |
llama-3.3-70b-instruct:free
|
0.00 |
0.00 |
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
[Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
|
|
|
openrouter
|
Meta: Llama 3.3 70B Instruct |
llama-3.3-70b-instruct
|
0.10 |
0.32 |
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
[Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
|
|
|
openrouter
|
Amazon: Nova Lite 1.0 |
nova-lite-v1
|
0.06 |
0.24 |
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.
With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input. Context: 300000
|
|
|
openrouter
|
Amazon: Nova Micro 1.0 |
nova-micro-v1
|
0.04 |
0.14 |
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities. Context: 128000
|
|
|
openrouter
|
Amazon: Nova Pro 1.0 |
nova-pro-v1
|
0.80 |
3.20 |
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX).
Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and at analyzing financial documents.
**NOTE**: Video input is not supported at this time. Context: 300000
|
|
|
openrouter
|
OpenAI: GPT-4o (2024-11-20) |
gpt-4o-2024-11-20
|
2.50 |
10.00 |
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses.
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. Context: 128000
|
|
|
openrouter
|
Mistral Large 2411 |
mistral-large-2411
|
2.00 |
6.00 |
Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411)
It provides a significant upgrade on the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable improvements in long context understanding, a new system prompt, and more accurate function calling. Context: 131072
|
|
|
openrouter
|
Mistral Large 2407 |
mistral-large-2407
|
2.00 |
6.00 |
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/).
It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.
Context: 131072
|
|
|
openrouter
|
Mistral: Pixtral Large 2411 |
pixtral-large-2411
|
2.00 |
6.00 |
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images.
The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
Context: 131072
|
|
|
openrouter
|
Qwen2.5 Coder 32B Instruct |
qwen-2.5-coder-32b-instruct
|
0.03 |
0.11 |
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
- Significant improvements in **code generation**, **code reasoning** and **code fixing**.
- A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
To read more about its evaluation results, check out [Qwen 2.5 Coder's blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/). Context: 32768
|
|
|
openrouter
|
SorcererLM 8x22B |
sorcererlm-8x22b
|
4.50 |
4.50 |
SorcererLM is an advanced RP and storytelling model, built as a low-rank 16-bit LoRA fine-tune of [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b).
- Advanced reasoning and emotional intelligence for engaging and immersive interactions
- Vivid writing capabilities enriched with spatial and contextual awareness
- Enhanced narrative depth, promoting creative and dynamic storytelling Context: 16000
|
|
|
openrouter
|
TheDrummer: UnslopNemo 12B |
unslopnemo-12b
|
0.40 |
0.40 |
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios. Context: 32768
|
|
|
openrouter
|
Anthropic: Claude 3.5 Haiku (2024-10-22) |
claude-3.5-haiku-20241022
|
0.80 |
4.00 |
Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries.
It does not support image inputs.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use) Context: 200000
|
|
|
openrouter
|
Anthropic: Claude 3.5 Haiku |
claude-3.5-haiku
|
0.80 |
4.00 |
Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.
This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.
This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022). Context: 200000
|
|
|
openrouter
|
Anthropic: Claude 3.5 Sonnet |
claude-3.5-sonnet
|
6.00 |
30.00 |
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal Context: 200000
|
|
|
openrouter
|
Magnum v4 72B |
magnum-v4-72b
|
3.00 |
5.00 |
This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus).
The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct). Context: 16384
|
|
|
openrouter
|
Mistral: Ministral 8B |
ministral-8b
|
0.10 |
0.10 |
Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications. Context: 131072
|
|
|
openrouter
|
Mistral: Ministral 3B |
ministral-3b
|
0.04 |
0.04 |
Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference. Context: 131072
|
|
|
openrouter
|
Qwen: Qwen2.5 7B Instruct |
qwen-2.5-7b-instruct
|
0.04 |
0.10 |
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support for up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
|
|
|
openrouter
|
NVIDIA: Llama 3.1 Nemotron 70B Instruct |
llama-3.1-nemotron-70b-instruct
|
1.20 |
1.20 |
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Inflection: Inflection 3 Productivity |
inflection-3-productivity
|
2.50 |
10.00 |
Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news.
For emotional intelligence similar to Pi, see [Inflection 3 Pi](/inflection/inflection-3-pi)
See [Inflection's announcement](https://inflection.ai/blog/enterprise) for more details. Context: 8000
|
|
|
openrouter
|
Inflection: Inflection 3 Pi |
inflection-3-pi
|
2.50 |
10.00 |
Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay.
Pi has been trained to mirror your tone and style: if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles. Context: 8000
|
|
|
openrouter
|
TheDrummer: Rocinante 12B |
rocinante-12b
|
0.17 |
0.43 |
Rocinante 12B is designed for engaging storytelling and rich prose.
Early testers have reported:
- Expanded vocabulary with unique and expressive word choices
- Enhanced creativity for vivid narratives
- Adventure-filled and captivating stories Context: 32768
|
|
|
openrouter
|
Meta: Llama 3.2 90B Vision Instruct |
llama-3.2-90b-vision-instruct
|
0.35 |
0.40 |
The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.
Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 32768
|
|
|
openrouter
|
Meta: Llama 3.2 11B Vision Instruct |
llama-3.2-11b-vision-instruct
|
0.05 |
0.05 |
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.
Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.
Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Meta: Llama 3.2 1B Instruct |
llama-3.2-1b-instruct
|
0.03 |
0.20 |
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.
Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.
Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 60000
|
|
|
openrouter
|
Meta: Llama 3.2 3B Instruct (free) |
llama-3.2-3b-instruct:free
|
0.00 |
0.00 |
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Meta: Llama 3.2 3B Instruct |
llama-3.2-3b-instruct
|
0.02 |
0.02 |
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Qwen2.5 72B Instruct |
qwen-2.5-72b-instruct
|
0.12 |
0.39 |
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support for up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
|
|
|
openrouter
|
NeverSleep: Lumimaid v0.2 8B |
llama-3.1-lumimaid-8b
|
0.09 |
0.60 |
Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/models/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chat outputs were purged.
Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
|
|
|
openrouter
|
Mistral: Pixtral 12B |
pixtral-12b
|
0.10 |
0.10 |
The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836. Context: 32768
|
|
|
openrouter
|
Cohere: Command R (08-2024) |
command-r-08-2024
|
0.15 |
0.60 |
command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model.
Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
|
|
|
openrouter
|
Cohere: Command R+ (08-2024) |
command-r-plus-08-2024
|
2.50 |
10.00 |
command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.
Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
|
|
|
openrouter
|
Qwen: Qwen2.5-VL 7B Instruct (free) |
qwen-2.5-vl-7b-instruct:free
|
0.00 |
0.00 |
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:
- SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
|
|
|
openrouter
|
Qwen: Qwen2.5-VL 7B Instruct |
qwen-2.5-vl-7b-instruct
|
0.20 |
0.20 |
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:
- SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
|
|
|
openrouter
|
Sao10K: Llama 3.1 Euryale 70B v2.2 |
l3.1-euryale-70b
|
0.65 |
0.75 |
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b). Context: 32768
|
|
|
openrouter
|
Microsoft: Phi-3.5 Mini 128K Instruct |
phi-3.5-mini-128k-instruct
|
0.10 |
0.10 |
Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/models/microsoft/phi-3-mini-128k-instruct).
The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with fewer than 13 billion parameters. Context: 128000
|
|
|
openrouter
|
Nous: Hermes 3 70B Instruct |
hermes-3-llama-3.1-70b
|
0.30 |
0.30 |
Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 70B is a competitive, if not superior, finetune of the [Llama-3.1 70B foundation model](/models/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Context: 65536
|
|
|
openrouter
|
Nous: Hermes 3 405B Instruct (free) |
hermes-3-llama-3.1-405b:free
|
0.00 |
0.00 |
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
|
|
|
openrouter
|
Nous: Hermes 3 405B Instruct |
hermes-3-llama-3.1-405b
|
1.00 |
1.00 |
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
|
|
|
openrouter
|
OpenAI: ChatGPT-4o |
chatgpt-4o-latest
|
5.00 |
15.00 |
OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation.
OpenAI notes that this model is not suited for production use-cases as it may be removed or redirected to another model in the future. Context: 128000
|
|
|
openrouter
|
Sao10K: Llama 3 8B Lunaris |
l3-lunaris-8b
|
0.04 |
0.05 |
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge.
Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning.
For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1. Context: 8192
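A minimal sketch of those recommended sampling settings as request parameters, assuming OpenRouter's chat completions endpoint, the `sao10k/l3-lunaris-8b` slug, and an `OPENROUTER_API_KEY` environment variable. Note that `min_p` is a sampler extension whose availability depends on the serving provider.

```python
# Sketch: Lunaris with the card's recommended sampling settings.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "sao10k/l3-lunaris-8b",  # assumed full slug
        "temperature": 1.4,  # recommended by the model card
        "min_p": 0.1,        # recommended by the model card
        "messages": [
            {"role": "user", "content": "Narrate a short scene in a rain-soaked city."}
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```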
|
|
|
openrouter
|
OpenAI: GPT-4o (2024-08-06) |
gpt-4o-2024-08-06
|
2.50 |
10.00 |
The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format parameter. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/).
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) Context: 128000
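A minimal Structured Outputs sketch: supply a JSON Schema via `response_format` and parse the schema-conformant reply. It assumes OpenRouter's OpenAI-compatible endpoint, the `openai/gpt-4o-2024-08-06` slug, and an `OPENROUTER_API_KEY` environment variable; the schema itself is illustrative.

```python
# Sketch: Structured Outputs with a JSON Schema in response_format.
import json
import os
import requests

schema = {  # illustrative schema
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
    "additionalProperties": False,
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o-2024-08-06",
        "messages": [{"role": "user", "content": "Largest city in Japan, as JSON."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "city_info", "strict": True, "schema": schema},
        },
    },
    timeout=60,
)
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```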
|
|
|
openrouter
|
Meta: Llama 3.1 405B (base) |
llama-3.1-405b
|
4.00 |
4.00 |
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
|
|
|
openrouter
|
Meta: Llama 3.1 405B Instruct (free) |
llama-3.1-405b-instruct:free
|
0.00 |
0.00 |
The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Meta: Llama 3.1 405B Instruct |
llama-3.1-405b-instruct
|
3.50 |
3.50 |
The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 10000
|
|
|
openrouter
|
Meta: Llama 3.1 8B Instruct |
llama-3.1-8b-instruct
|
0.02 |
0.03 |
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Meta: Llama 3.1 70B Instruct |
llama-3.1-70b-instruct
|
0.40 |
0.40 |
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
|
|
|
openrouter
|
Mistral: Mistral Nemo |
mistral-nemo
|
0.02 |
0.04 |
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license. Context: 131072
|
|
|
openrouter
|
OpenAI: GPT-4o-mini |
gpt-4o-mini
|
0.15 |
0.60 |
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on common chat-preference [leaderboards](https://arena.lmsys.org/).
Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
#multimodal Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-4o-mini (2024-07-18) |
gpt-4o-mini-2024-07-18
|
0.15 |
0.60 |
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on common chat-preference [leaderboards](https://arena.lmsys.org/).
Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
#multimodal Context: 128000
|
|
|
openrouter
|
Google: Gemma 2 27B |
gemma-2-27b-it
|
0.65 |
0.65 |
Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini).
Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
|
|
|
openrouter
|
Google: Gemma 2 9B |
gemma-2-9b-it
|
0.03 |
0.09 |
Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
|
|
|
openrouter
|
Sao10k: Llama 3 Euryale 70B v2.1 |
l3-euryale-70b
|
1.48 |
1.48 |
Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k).
- Better prompt adherence.
- Better anatomy / spatial awareness.
- Adapts much better to unique and custom formatting / reply formats.
- Very creative, lots of unique swipes.
- Is not restrictive during roleplays. Context: 8192
|
|
|
openrouter
|
Mistral: Mistral 7B Instruct (free) |
mistral-7b-instruct:free
|
0.00 |
0.00 |
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
*Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
|
|
|
openrouter
|
Mistral: Mistral 7B Instruct |
mistral-7b-instruct
|
0.03 |
0.05 |
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
*Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
|
|
|
openrouter
|
Mistral: Mistral 7B Instruct v0.3 |
mistral-7b-instruct-v0.3
|
0.20 |
0.20 |
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of [Mistral 7B Instruct v0.2](/models/mistralai/mistral-7b-instruct-v0.2), with the following changes:
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling
NOTE: Support for function calling depends on the provider. Context: 32768
|
|
|
openrouter
|
NousResearch: Hermes 2 Pro - Llama-3 8B |
hermes-2-pro-llama-3-8b
|
0.03 |
0.08 |
Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. Context: 8192
|
|
|
openrouter
|
Microsoft: Phi-3 Mini 128K Instruct |
phi-3-mini-128k-instruct
|
0.10 |
0.10 |
Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
At time of release, Phi-3 Mini demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date. Context: 128000
|
|
|
openrouter
|
Microsoft: Phi-3 Medium 128K Instruct |
phi-3-medium-128k-instruct
|
1.00 |
1.00 |
Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to Llama 3 70B-level performance.
For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct). Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-4o (2024-05-13) |
gpt-4o-2024-05-13
|
5.00 |
15.00 |
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
#multimodal Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-4o |
gpt-4o
|
2.50 |
10.00 |
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
#multimodal Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-4o (extended) |
gpt-4o:extended
|
6.00 |
18.00 |
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
#multimodal Context: 128000
|
|
|
openrouter
|
Meta: LlamaGuard 2 8B |
llama-guard-2-8b
|
0.20 |
0.20 |
This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification.
LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated.
For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
|
|
|
openrouter
|
Meta: Llama 3 70B Instruct |
llama-3-70b-instruct
|
0.30 |
0.40 |
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue use cases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
|
|
|
openrouter
|
Meta: Llama 3 8B Instruct |
llama-3-8b-instruct
|
0.03 |
0.06 |
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue use cases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
|
|
|
openrouter
|
Mistral: Mixtral 8x22B Instruct |
mixtral-8x22b-instruct
|
2.00 |
6.00 |
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish
See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/).
#moe Context: 65536
|
|
|
openrouter
|
WizardLM-2 8x22B |
wizardlm-2-8x22b
|
0.48 |
0.48 |
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models.
It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b).
To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).
#moe Context: 65536
|
|
|
openrouter
|
OpenAI: GPT-4 Turbo |
gpt-4-turbo
|
10.00 |
30.00 |
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.
Training data: up to December 2023. Context: 128000
|
|
|
openrouter
|
Anthropic: Claude 3 Haiku |
claude-3-haiku
|
0.25 |
1.25 |
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)
#multimodal Context: 200000
|
|
|
openrouter
|
Anthropic: Claude 3 Opus |
claude-3-opus
|
15.00 |
75.00 |
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
#multimodal Context: 200000
|
|
|
openrouter
|
Mistral Large |
mistral-large
|
2.00 |
6.00 |
This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/).
It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents. Context: 128000
|
|
|
openrouter
|
OpenAI: GPT-3.5 Turbo (older v0613) |
gpt-3.5-turbo-0613
|
1.00 |
2.00 |
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
Training data up to Sep 2021. Context: 4095
|
|
|
openrouter
|
OpenAI: GPT-4 Turbo Preview |
gpt-4-turbo-preview
|
10.00 |
30.00 |
The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023.
**Note:** heavily rate limited by OpenAI while in preview. Context: 128000
|
|
|
openrouter
|
Mistral Tiny |
mistral-tiny
|
0.25 |
0.25 |
Note: This model is being deprecated. The recommended replacement is the newer [Ministral 8B](/mistral/ministral-8b).
This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial. Context: 32768
|
|
|
openrouter
|
Mistral: Mistral 7B Instruct v0.2 |
mistral-7b-instruct-v0.2
|
0.20 |
0.20 |
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct-v0.1), with the following changes:
- 32k context window (vs 8k context in v0.1)
- Rope-theta = 1e6
- No Sliding-Window Attention Context: 32768
|
|
|
openrouter
|
Mistral: Mixtral 8x7B Instruct |
mixtral-8x7b-instruct
|
0.54 |
0.54 |
Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts model by Mistral AI, fine-tuned for chat and instruction use. It incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. #moe Context: 32768
|
|
|
openrouter
|
Noromaid 20B |
noromaid-20b
|
1.00 |
1.75 |
A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge.
#merge #uncensored Context: 4096
|
|
|
openrouter
|
Goliath 120B |
goliath-120b
|
6.00 |
8.00 |
A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.
Credits to
- [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit).
- [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.
#merge Context: 6144
|
|
|
openrouter
|
Auto Router |
auto
|
-1,000,000.00 |
-1,000,000.00 |
Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output.
To see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model.
The meta-model is powered by [Not Diamond](https://docs.notdiamond.ai/docs/how-not-diamond-works). Learn more in our [docs](/docs/model-routing).
Requests will be routed to the following models:
- [openai/gpt-5.1](/openai/gpt-5.1)
- [openai/gpt-5](/openai/gpt-5)
- [openai/gpt-5-mini](/openai/gpt-5-mini)
- [openai/gpt-5-nano](/openai/gpt-5-nano)
- [openai/gpt-4.1](/openai/gpt-4.1)
- [openai/gpt-4.1-mini](/openai/gpt-4.1-mini)
- [openai/gpt-4.1-nano](/openai/gpt-4.1-nano)
- [openai/gpt-4o](/openai/gpt-4o)
- [openai/gpt-4o-2024-05-13](/openai/gpt-4o-2024-05-13)
- [openai/gpt-4o-2024-08-06](/openai/gpt-4o-2024-08-06)
- [openai/gpt-4o-2024-11-20](/openai/gpt-4o-2024-11-20)
- [openai/gpt-4o-mini](/openai/gpt-4o-mini)
- [openai/gpt-4o-mini-2024-07-18](/openai/gpt-4o-mini-2024-07-18)
- [openai/gpt-4-turbo](/openai/gpt-4-turbo)
- [openai/gpt-4-turbo-preview](/openai/gpt-4-turbo-preview)
- [openai/gpt-4-1106-preview](/openai/gpt-4-1106-preview)
- [openai/gpt-4](/openai/gpt-4)
- [openai/gpt-3.5-turbo](/openai/gpt-3.5-turbo)
- [openai/gpt-oss-120b](/openai/gpt-oss-120b)
- [anthropic/claude-opus-4.5](/anthropic/claude-opus-4.5)
- [anthropic/claude-opus-4.1](/anthropic/claude-opus-4.1)
- [anthropic/claude-opus-4](/anthropic/claude-opus-4)
- [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5)
- [anthropic/claude-sonnet-4](/anthropic/claude-sonnet-4)
- [anthropic/claude-3.7-sonnet](/anthropic/claude-3.7-sonnet)
- [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5)
- [anthropic/claude-3.5-haiku](/anthropic/claude-3.5-haiku)
- [anthropic/claude-3-haiku](/anthropic/claude-3-haiku)
- [google/gemini-3-pro-preview](/google/gemini-3-pro-preview)
- [google/gemini-2.5-pro](/google/gemini-2.5-pro)
- [google/gemini-2.0-flash-001](/google/gemini-2.0-flash-001)
- [google/gemini-2.5-flash](/google/gemini-2.5-flash)
- [mistralai/mistral-large](/mistralai/mistral-large)
- [mistralai/mistral-large-2407](/mistralai/mistral-large-2407)
- [mistralai/mistral-large-2411](/mistralai/mistral-large-2411)
- [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1)
- [mistralai/mistral-nemo](/mistralai/mistral-nemo)
- [mistralai/mistral-7b-instruct](/mistralai/mistral-7b-instruct)
- [mistralai/mixtral-8x7b-instruct](/mistralai/mixtral-8x7b-instruct)
- [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct)
- [mistralai/codestral-2508](/mistralai/codestral-2508)
- [x-ai/grok-4](/x-ai/grok-4)
- [x-ai/grok-3](/x-ai/grok-3)
- [x-ai/grok-3-mini](/x-ai/grok-3-mini)
- [deepseek/deepseek-r1](/deepseek/deepseek-r1)
- [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct)
- [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct)
- [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct)
- [meta-llama/llama-3.1-8b-instruct](/meta-llama/llama-3.1-8b-instruct)
- [meta-llama/llama-3-70b-instruct](/meta-llama/llama-3-70b-instruct)
- [meta-llama/llama-3-8b-instruct](/meta-llama/llama-3-8b-instruct)
- [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b)
- [qwen/qwen3-32b](/qwen/qwen3-32b)
- [qwen/qwen3-14b](/qwen/qwen3-14b)
- [cohere/command-r-plus-08-2024](/cohere/command-r-plus-08-2024)
- [cohere/command-r-08-2024](/cohere/command-r-08-2024)
- [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking)
- [perplexity/sonar](/perplexity/sonar) Context: 2000000
|
|
|
openrouter
|
OpenAI: GPT-4 Turbo (older v1106) |
gpt-4-1106-preview
|
10.00 |
30.00 |
An older preview of GPT-4 Turbo with vision capabilities. Vision requests can use JSON mode and function calling.
Training data: up to April 2023. Context: 128000
|
|
|
openrouter
|
Mistral: Mistral 7B Instruct v0.1 |
mistral-7b-instruct-v0.1
|
0.11 |
0.19 |
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. Context: 2824
|
|
|
openrouter
|
OpenAI: GPT-3.5 Turbo Instruct |
gpt-3.5-turbo-instruct
|
1.50 |
2.00 |
This model is a variant of GPT-3.5 Turbo tuned for instructional prompts, omitting chat-related optimizations. Training data: up to Sep 2021. Context: 4095
|
|
|
openrouter
|
OpenAI: GPT-3.5 Turbo 16k |
gpt-3.5-turbo-16k
|
3.00 |
4.00 |
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021. Context: 16385
|
|
|
openrouter
|
Mancer: Weaver (alpha) |
weaver
|
0.75 |
1.00 |
An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations. Context: 8000
|
|
|
openrouter
|
ReMM SLERP 13B |
remm-slerp-l2-13b
|
0.45 |
0.65 |
A recreation trial of the original MythoMax-L2-13B, but with updated models. #merge Context: 6144
|
|
|
openrouter
|
MythoMax 13B |
mythomax-l2-13b
|
0.06 |
0.06 |
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge Context: 4096
|
|
|
openrouter
|
OpenAI: GPT-4 (older v0314) |
gpt-4-0314
|
30.00 |
60.00 |
GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021. Context: 8191
|
|
|
openrouter
|
OpenAI: GPT-4 |
gpt-4
|
30.00 |
60.00 |
OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021. Context: 8191
|
|
|
openrouter
|
OpenAI: GPT-3.5 Turbo |
gpt-3.5-turbo
|
0.50 |
1.50 |
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
Training data up to Sep 2021. Context: 16385
|
|
|
factoryai
|
glm-4.6 |
glm-4.6
|
- |
- |
-
|
|
|
factoryai
|
claude-haiku-4-5-20251001 |
claude-haiku-4-5-20251001
|
- |
- |
-
|
|
|
factoryai
|
gpt-5.1 |
gpt-5.1
|
- |
- |
-
|
|
|
factoryai
|
gpt-5.1-codex |
gpt-5.1-codex
|
- |
- |
-
|
|
|
factoryai
|
gpt-5.1-codex-max |
gpt-5.1-codex-max
|
- |
- |
-
|
|
|
factoryai
|
gpt-5.2 |
gpt-5.2
|
- |
- |
-
|
|
|
factoryai
|
gemini-3-pro-preview |
gemini-3-pro-preview
|
- |
- |
-
|
|
|
factoryai
|
gemini-3-flash-preview |
gemini-3-flash-preview
|
- |
- |
-
|
|
|
factoryai
|
claude-sonnet-4-5-20250929 |
claude-sonnet-4-5-20250929
|
- |
- |
-
|
|
|
factoryai
|
claude-opus-4-5-20251101 |
claude-opus-4-5-20251101
|
- |
- |
-
|
|
|
zai
|
GLM-4.7 |
glm-4.7
|
0.60 |
0.11 |
-
|
|
|
zai
|
GLM-4.6 |
glm-4.6
|
0.60 |
0.11 |
-
|
|
|
zai
|
GLM-4.6V |
glm-4.6v
|
0.30 |
0.05 |
-
|
|
|
zai
|
GLM-4.6V-FlashX |
glm-4.6v-flashx
|
0.04 |
0.00 |
-
|
|
|
zai
|
GLM-4.5 |
glm-4.5
|
0.60 |
0.11 |
-
|
|
|
zai
|
GLM-4.5V |
glm-4.5v
|
0.60 |
0.11 |
-
|
|
|
zai
|
GLM-4.5-X |
glm-4.5-x
|
2.20 |
0.45 |
-
|
|
|
zai
|
GLM-4.5-Air |
glm-4.5-air
|
0.20 |
0.03 |
-
|
|
|
zai
|
GLM-4.5-AirX |
glm-4.5-airx
|
1.10 |
0.22 |
-
|
|
|
zai
|
GLM-4-32B-0414-128K |
glm-4-32b-0414-128k
|
0.10 |
- |
-
|
|
|
zai
|
GLM-4.6V-Flash |
glm-4.6v-flash
|
0.00 |
0.00 |
-
|
|
|
zai
|
GLM-4.5-Flash |
glm-4.5-flash
|
0.00 |
0.00 |
-
|
|