AI Models List

4226 Models Found
Provider Name Model ID Input Price ($/1M) Output Price ($/1M) Description Free
vercel Grok Code Fast 1 grok-code-fast-1 0.20 1.50 xAI's latest coding model that offers fast agentic coding with a 256K context window.
vercel Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.
vercel Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
vercel Gemini 3 Flash gemini-3-flash 0.50 3.00 Google's most intelligent model built for speed, combining frontier intelligence with superior search and grounding.
vercel Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Claude Haiku 4.5 matches Sonnet 4's performance on coding, computer use, and agent tasks at substantially lower cost and faster speeds. It delivers near-frontier performance and Claude’s unique character at a price point that works for scaled sub-agent deployments, free tier products, and intelligence-sensitive applications with budget constraints.
vercel MiniMax M2 minimax-m2 0.27 1.15 MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence.
vercel Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
vercel DeepSeek V3.2 deepseek-v3.2 0.27 0.40 DeepSeek-V3.2: Official successor to V3.2-Exp.
vercel Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Claude Opus 4.5 is Anthropic’s latest model in the Opus series, meant for demanding reasoning tasks and complex problem solving. This model has improvements in general intelligence and vision compared to previous iterations. In addition, it is suited for difficult coding tasks and agentic workflows, especially those with computer use and tool use, and can effectively handle context usage and external memory files.
vercel Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Claude 3.7 Sonnet is the first hybrid reasoning model and Anthropic's most intelligent model to date. It delivers state-of-the-art performance for coding, content generation, data analysis, and planning tasks, building upon its predecessor Claude 3.5 Sonnet's capabilities in software engineering and computer use.
vercel GPT-5.2 gpt-5.2 1.75 14.00 GPT-5.2 is OpenAI's best general-purpose model, part of the GPT-5 flagship model family. It's their most intelligent model yet for both general and agentic tasks.
vercel Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Claude Sonnet 4 significantly improves on Sonnet 3.7's industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.
vercel Grok 4.1 Fast Non-Reasoning grok-4.1-fast-non-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for speed, use this variant; otherwise, use the reasoning version.
vercel Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 This model improves upon Gemini 2.5 Pro and is catered towards challenging tasks, especially those involving complex reasoning or agentic workflows. Improvements highlighted include use cases for coding, multi-step function calling, planning, reasoning, deep knowledge tasks, and instruction following.
vercel GPT-5 mini gpt-5-mini 0.25 2.00 GPT-5 mini is a cost optimized model that excels at reasoning/chat tasks. It offers an optimal balance between speed, cost, and capability.
vercel GPT-5 gpt-5 1.25 10.00 GPT-5 is OpenAI's flagship language model that excels at complex reasoning, broad real-world knowledge, code-intensive, and multi-step agentic tasks.
vercel GPT-5 Chat gpt-5-chat 1.25 10.00 GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT.
vercel GPT-5 nano gpt-5-nano 0.05 0.40 GPT-5 nano is a high throughput model that excels at simple instruction or classification tasks.
vercel GPT-4.1 mini gpt-4.1-mini 0.40 1.60 GPT-4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases.
vercel GPT-5-Codex gpt-5-codex 1.25 10.00 GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
vercel Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Gemini 2.5 Pro is our most advanced reasoning Gemini model, capable of solving complex problems. Gemini 2.5 Pro can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories.
vercel GLM 4.6 glm-4.6 0.45 1.80 As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
vercel Grok 4 Fast Non-Reasoning grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
vercel gpt-oss-120b gpt-oss-120b 0.10 0.50 Extremely capable general-purpose LLM with strong, controllable reasoning capabilities
vercel gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.08 0.30 OpenAI's first open weight reasoning model specifically trained for safety classification tasks. Fine-tuned from GPT-OSS, this model helps classify text content based on customizable policies, enabling bring-your-own-policy Trust & Safety AI where your own taxonomy, definitions, and thresholds guide classification decisions.
vercel GPT-5.1 Instant gpt-5.1-instant 1.25 10.00 GPT-5.1 Instant (or GPT-5.1 chat) is a warmer and more conversational version of GPT-5-chat, with improved instruction following and adaptive reasoning for deciding when to think before responding.
vercel GPT-4o mini gpt-4o-mini 0.15 0.60 GPT-4o mini from OpenAI is their most advanced and cost-efficient small model. It is multi-modal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo but is just as fast.
vercel MiniMax M2.1 minimax-m2.1 0.30 1.20 MiniMax 2.1 is MiniMax's latest model, optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning.
vercel Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
vercel Devstral 2 devstral-2 0.00 0.00 An enterprise-grade text model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel GPT-5.1 Thinking gpt-5.1-thinking 1.25 10.00 An upgraded version of GPT-5 that adapts thinking time more precisely to the question, spending more time on complex questions and responding more quickly to simpler tasks.
vercel text-embedding-3-small text-embedding-3-small 0.02 0.00 OpenAI's improved, more performant version of their ada embedding model.
vercel Grok 4.1 Fast Reasoning grok-4.1-fast-reasoning 0.20 0.50 Grok 4.1 Fast is xAI's best tool-calling model with a 2M context window. It reasons and completes agentic tasks accurately and rapidly, excelling at complex real-world use cases such as customer support and finance. To optimize for maximal intelligence, use this variant; otherwise, use the non-reasoning version.
vercel DeepSeek V3.2 Thinking deepseek-v3.2-thinking 0.28 0.42 Thinking mode of DeepSeek V3.2
vercel GLM 4.7 glm-4.7 0.43 1.75 GLM-4.7 is Z.ai’s latest flagship model, with major upgrades focused on two key areas: stronger coding capabilities and more stable multi-step reasoning and execution.
vercel Ministral 3B ministral-3b 0.04 0.04 A compact, efficient model for on-device tasks like smart assistants and local analytics, offering low-latency performance.
vercel Devstral Small 2 devstral-small-2 0.00 0.00 Our open source model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel Mistral Embed mistral-embed 0.10 0.00 General-purpose text embedding model for semantic search, similarity, clustering, and RAG workflows.
vercel Nova Lite nova-lite 0.06 0.24 A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.
vercel Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks. Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified, and handles complex, multi-step problems with more rigor and attention to detail.
vercel Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.09 1.10 A new generation of open-source, non-thinking mode model powered by Qwen3. This version demonstrates superior Chinese text understanding, augmented logical reasoning, and enhanced capabilities in text generation tasks over the previous iteration (Qwen3-235B-A22B-Instruct-2507).
vercel GPT-4.1 gpt-4.1 2.00 8.00 GPT-4.1 is OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains.
vercel GPT-4o gpt-4o 2.50 10.00 GPT-4o from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It matches GPT-4 Turbo performance with a faster and cheaper API.
vercel GPT-4.1 nano gpt-4.1-nano 0.10 0.40 GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model.
vercel GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 GPT-5.1-Codex-Max is purpose-built for agentic coding.
vercel Grok 4 Fast Reasoning grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning.
vercel Grok 4 grok-4 3.00 15.00 xAI's latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
vercel Nano Banana (Gemini 2.5 Flash Image) gemini-2.5-flash-image 0.30 2.50 Nano Banana (Gemini 2.5 Flash Image) is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
vercel Nano Banana Pro (Gemini 3 Pro Image) gemini-3-pro-image 2.00 120.00 Nano Banana Pro (Gemini 3 Pro Image) builds on Nano Banana's generation capabilities, ushering in a new era of studio-quality, functional design to help you create and edit high-fidelity, production-ready visuals with unparalleled precision and control. Improvements include enhanced world knowledge and reasoning, dynamic text and translation, and studio-level controls.
vercel gpt-oss-20b gpt-oss-20b 0.07 0.30 A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments
vercel Gemini Embedding 001 gemini-embedding-001 0.15 0.00 State-of-the-art embedding model with excellent performance across English, multilingual and code tasks.
vercel o4-mini o4-mini 1.10 4.40 OpenAI's o4-mini delivers fast, cost-efficient reasoning with exceptional performance for its size, particularly excelling in math (best-performing on AIME benchmarks), coding, and visual tasks.
vercel Sonar sonar 1.00 1.00 Perplexity's lightweight offering with search grounding, quicker and cheaper than Sonar Pro.
vercel Kimi K2 0905 kimi-k2-0905 0.60 2.50 Kimi K2 0905 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.
vercel Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Gemini 2.5 Flash-Lite is a balanced, low-latency model with configurable thinking budgets and tool connectivity (e.g., Google Search grounding and code execution). It supports multimodal input and offers a 1M-token context window.
vercel text-embedding-3-large text-embedding-3-large 0.13 0.00 OpenAI's most capable embedding model for both English and non-English tasks.
vercel Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Gemini 2.0 Flash Lite delivers next-gen features and improved capabilities, including superior speed, built-in tool use, multimodal generation, and a 1M token context window.
vercel Claude Opus 4 claude-opus-4 15.00 75.00 Claude Opus 4 is Anthropic's most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.
vercel Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks.
vercel GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 GPT-5.2 Chat points to gpt-5.2-chat-latest, the snapshot currently powering ChatGPT. It is OpenAI's best general-purpose model, part of the GPT-5 flagship model family.
vercel Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance with multimodal support and a 1M token context window.
vercel GPT-5.1 Codex mini gpt-5.1-codex-mini 0.25 2.00 GPT-5.1 Codex mini is a smaller, faster, and cheaper version of GPT-5.1 Codex.
vercel DeepSeek V3.2 Exp deepseek-v3.2-exp 0.27 0.40 DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality.
vercel MiMo V2 Flash mimo-v2-flash 0.10 0.29 Xiaomi MiMo-V2-Flash is a proprietary MoE model developed by Xiaomi, designed for extreme inference efficiency with 309B total parameters (15B active). By incorporating an innovative Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks among the top 2 global open-source models across multiple Agent benchmarks.
vercel DeepSeek V3 0324 deepseek-v3 0.77 0.77 Fast general-purpose LLM with enhanced reasoning capabilities
vercel Mistral Small mistral-small 0.10 0.30 Mistral Small is the ideal choice for simple tasks that one can do in bulk - like Classification, Customer Support, or Text Generation. It offers excellent performance at an affordable price point.
vercel o3 o3 2.00 8.00 OpenAI's o3 is their most powerful reasoning model, setting new state-of-the-art benchmarks in coding, math, science, and visual perception. It excels at complex queries requiring multi-faceted analysis, with particular strength in analyzing images, charts, and graphics.
vercel Qwen3 Max qwen3-max 1.20 6.00 Compared to the preview version, the Qwen3 Max model has received specialized upgrades in agentic programming and tool invocation. The official release achieves state-of-the-art (SOTA) performance in its field and is better suited to the demands of agents operating in more complex scenarios.
vercel Llama 3.3 70B llama-3.3-70b 0.72 0.72 An upgrade to Llama 3.1 70B featuring enhanced reasoning, tool use, and multilingual abilities, along with an expanded 128K context window. These improvements make it well-suited for demanding tasks such as long-form summarization, multilingual conversations, and coding assistance.
vercel Llama 3.1 8B llama-3.1-8b 0.03 0.05 Llama 3.1 8B brings powerful performance in a smaller, more efficient package. With improved multilingual support, tool use, and a 128K context length, it enables sophisticated use cases like interactive agents and compact coding assistants while remaining lightweight and accessible.
vercel GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 GPT-5.1-Codex is a version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
vercel Kimi K2 Thinking kimi-k2-thinking 0.47 2.00 Kimi K2 Thinking is an advanced open-source thinking model by Moonshot AI. It can execute up to 200–300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
vercel KAT-Coder-Pro V1 kat-coder-pro-v1 0.00 0.00 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KwaiKAT series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a remarkable 73.4% solve rate on the SWE-Bench Verified benchmark. KAT-Coder-Pro V1 delivers top-tier coding performance and has been rigorously tested by thousands of in-house engineers. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
vercel Qwen3 235B A22B Instruct 2507 qwen-3-235b 0.13 0.60 Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
vercel MiniMax M2.1 Lightning minimax-m2.1-lightning 0.30 2.40 MiniMax-M2.1-Lightning is a faster version of MiniMax-M2.1, offering the same performance with significantly higher throughput (~100 TPS output speed vs. ~60 TPS for MiniMax-M2).
vercel Kimi K2 kimi-k2 0.50 2.00 Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
vercel DeepSeek R1 0528 deepseek-r1 0.50 2.15 The latest revision of DeepSeek's first-generation reasoning model
vercel text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 OpenAI's legacy text embedding model.
vercel Llama 4 Scout 17B 16E Instruct llama-4-scout 0.08 0.30 The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts. Served by DeepInfra.
vercel o3-mini o3-mini 1.10 4.40 o3-mini is OpenAI's most recent small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini.
vercel DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.27 1.00 DeepSeek-V3.1-Terminus delivers more stable and reliable outputs across benchmarks compared to the previous version and addresses user feedback (e.g., language consistency and agent upgrades).
vercel Mistral Large 3 mistral-large-3 0.50 1.50 Mistral Large 3 2512 is Mistral’s most capable model to date. It has a sparse mixture-of-experts architecture with 41B active parameters (675B total).
vercel Pixtral 12B 2409 pixtral-12b 0.15 0.15 A 12B model with image understanding capabilities in addition to text.
vercel Sonar Pro sonar-pro 3.00 15.00 Perplexity's premier offering with search grounding, supporting advanced queries and follow-ups.
vercel GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 For local deployment and low-latency applications. The GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128k tokens in training and achieves SOTA performance in visual understanding among models of similar parameter scale.
vercel Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 High-speed version of kimi-k2-thinking, suitable for scenarios requiring both deep reasoning and extremely fast responses
vercel Llama 4 Maverick 17B 128E Instruct llama-4-maverick 0.15 0.60 Llama 4 Maverick 17B-128E is Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities.
vercel DeepSeek V3.1 deepseek-v3.1 0.30 1.00 DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long-context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
vercel Kimi K2 Turbo kimi-k2-turbo 2.40 10.00 Kimi K2 Turbo is the high-speed version of kimi-k2. It has the same model parameters as kimi-k2, but output speed is increased to 60 tokens per second (up to a maximum of 100 tokens per second), with a 256K context length.
vercel Grok 3 Mini Beta grok-3-mini 0.30 0.50 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.
vercel Claude 3.5 Sonnet claude-3.5-sonnet 3.00 15.00 The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
vercel LongCat Flash Chat longcat-flash-chat 0.00 0.00 LongCat-Flash-Chat is a high-throughput MoE chat model (128k context) designed for agentic tasks.
vercel Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.50 A new generation of Qwen3-based open-source thinking mode models. This version offers improved instruction following and streamlined summary responses over the previous iteration (Qwen3-235B-A22B-Thinking-2507).
vercel Qwen3 32B qwen-3-32b 0.10 0.30 Qwen3-32B is a world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. It excels in code-gen, tool-calling, and advanced reasoning, making it an exceptional model for a wide range of production use cases.
vercel Claude 3 Haiku claude-3-haiku 0.25 1.25 Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. Haiku can quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.
vercel Qwen3 VL 235B A22B Instruct qwen3-vl-instruct 0.70 2.80 The Qwen3 VL series has been comprehensively upgraded in areas such as visual coding and spatial perception. Its visual perception and recognition capabilities have significantly improved, it supports understanding of ultra-long videos, and its OCR functionality has undergone a major enhancement.
vercel Text Embedding 005 text-embedding-005 0.03 0.00 English-focused text embedding model optimized for code and English language tasks.
vercel Nano Banana Preview (Gemini 2.5 Flash Image Preview) gemini-2.5-flash-image-preview 0.30 2.50 Gemini 2.5 Flash Image Preview is Google's first fully hybrid reasoning model, letting developers turn thinking on or off and set thinking budgets to balance quality, cost, and latency. Upgraded for rapid creative workflows, it can generate interleaved text and images and supports conversational, multi‑turn image editing in natural language. It’s also locale‑aware, enabling culturally and linguistically appropriate image generation for audiences worldwide.
vercel GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 A version of GPT-5.2 that produces smarter and more precise responses.
vercel Qwen 3 Coder 30B A3B Instruct qwen3-coder-30b-a3b 0.07 0.27 Efficient coding specialist balancing performance with cost-effectiveness for daily development tasks while maintaining strong tool integration capabilities.
vercel Qwen3 Coder 480B A35B Instruct qwen3-coder 0.38 1.53 Mixture-of-experts LLM with advanced coding and reasoning capabilities
vercel Grok 2 Vision grok-2-vision 2.00 10.00 Grok 2 vision model excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual information including documents, diagrams, charts, screenshots, and photographs.
vercel Morph V3 Fast morph-v3-fast 0.80 1.20 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast (4,500+ tokens/second). It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
vercel Grok 3 Beta grok-3 3.00 15.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.
vercel Nova Micro nova-micro 0.04 0.14 A text-only model that delivers the lowest latency responses at very low cost.
vercel Ministral 14B ministral-14b 0.20 0.20 Ministral 3 14B is the largest model in the Ministral 3 family, offering state-of-the-art capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. Optimized for local deployment, it delivers high performance across diverse hardware.
vercel Ministral 8B ministral-8b 0.10 0.10 A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.
vercel Mistral Codestral codestral 0.30 0.90 Mistral's cutting-edge language model for coding released end of July 2025, Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.
vercel Claude 3 Opus claude-3-opus 15.00 75.00 Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI.
vercel Pixtral Large pixtral-large 2.00 6.00 Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.
vercel GPT-4 Turbo gpt-4-turbo 10.00 30.00 gpt-4-turbo from OpenAI has broad general knowledge and domain expertise allowing it to follow complex instructions in natural language and solve difficult problems accurately. It has a knowledge cutoff of April 2023 and a 128,000 token context window.
vercel voyage-3.5 voyage-3.5 0.06 0.00 Voyage AI's embedding model optimized for general-purpose and multilingual retrieval quality.
vercel Llama 3.1 70B Instruct llama-3.1-70b 0.40 0.40 An update to Meta Llama 3 70B Instruct that includes an expanded 128K context length, multilinguality and improved reasoning capabilities.
vercel Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.06 0.24 NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.
vercel Qwen3 VL 235B A22B Thinking qwen3-vl-thinking 0.70 8.40 Qwen3 series VL models feature significantly enhanced multimodal reasoning capabilities, with a particular focus on optimizing the model for STEM and mathematical reasoning. Visual perception and recognition abilities have been comprehensively improved, and OCR capabilities have undergone a major upgrade.
vercel Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 A premium reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing comprehensive explanations with enhanced search capabilities and multiple search queries per request.
vercel GPT-3.5 Turbo gpt-3.5-turbo 0.50 1.50 OpenAI's most capable and cost effective model in the GPT-3.5 family optimized for chat purposes, but also works well for traditional completions tasks.
vercel Qwen3 Embedding 8B qwen3-embedding-8b 0.05 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Mistral Medium 3.1 mistral-medium 0.40 2.00 Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost.
vercel INTELLECT 3 intellect-3 0.20 1.10 INTELLECT-3 scales RL to a 100B+ MoE model on an end-to-end stack, achieving state-of-the-art performance for its size across math, code, and reasoning.
vercel Nvidia Nemotron Nano 12B V2 VL nemotron-nano-12b-v2-vl 0.20 0.60 An auto-regressive vision-language model built on an optimized transformer architecture. It enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities.
vercel Qwen3-14B qwen-3-14b 0.06 0.24 Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support
vercel Embed v4.0 embed-v4.0 0.12 0.00 A model that allows for text, images, or mixed content to be classified or turned into embeddings.
vercel GLM 4.5 Air glm-4.5-air 0.20 1.10 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
vercel GPT-5 pro gpt-5-pro 15.00 120.00 GPT-5 pro uses more compute to think harder and provide consistently better answers. Since GPT-5 pro is designed to tackle tough problems, some requests may take several minutes to finish.
vercel Llama 3.2 3B Instruct llama-3.2-3b 0.15 0.15 Text-only model, fine-tuned for supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
vercel voyage-3-large voyage-3-large 0.18 0.00 Voyage AI's embedding model with the best general-purpose and multilingual retrieval quality.
vercel Titan Text Embeddings V2 titan-embed-text-v2 0.02 0.00 Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.
vercel Grok 3 Fast Beta grok-3-fast 5.00 25.00 xAI's flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
vercel Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking 0.30 2.90 Qwen3-235B-A22B-Thinking-2507 scales the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.
vercel v0-1.5-md v0-1.5-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
vercel Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Powered by Qwen3, this is a powerful coding agent that excels in tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
vercel Qwen3 Embedding 4B qwen3-embedding-4b 0.02 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Grok 3 Mini Fast Beta grok-3-mini-fast 0.60 4.00 xAI's lightweight model that thinks before responding. Great for simple or logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. The fast model variant is served on faster infrastructure, offering response times that are significantly faster than the standard. The increased speed comes at a higher cost per output token.
vercel v0-1.0-md v0-1.0-md 3.00 15.00 Access the model behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.
vercel Qwen3-30B-A3B qwen-3-30b 0.08 0.29 Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support
vercel o3 Pro o3-pro 20.00 80.00 The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
vercel GLM-4.6V glm-4.6v 0.30 0.90 The GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128k tokens in training and achieves SOTA performance in visual understanding among models of similar parameter scale.
vercel Grok 2 grok-2 2.00 10.00 Grok 2 is a frontier language model with state-of-the-art reasoning capabilities. It features advanced capabilities in chat, coding, and reasoning, outperforming both Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.
vercel Claude 3.5 Sonnet (2024-06-20) claude-3.5-sonnet-20240620 3.00 15.00 Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.
vercel Nova Pro nova-pro 0.80 3.20 A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.
vercel Command A command-a 2.50 10.00 Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.
vercel Nova 2 Lite nova-2-lite 0.30 2.50 Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text.
vercel Sonoma Sky Alpha sonoma-sky-alpha 0.20 0.50 This model is no longer in stealth: it gets responses from Grok 4 Fast Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
vercel Sonoma Dusk Alpha sonoma-dusk-alpha 0.20 0.50 This model is no longer in stealth: it gets responses from Grok 4 Fast Non-Reasoning, xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
vercel Llama 3.2 1B Instruct llama-3.2-1b 0.10 0.10 Text-only model, supporting on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
vercel o1 o1 15.00 60.00 o1 is OpenAI's flagship reasoning model, designed for complex problems that require deep thinking. It provides strong reasoning capabilities with improved accuracy for complex multi-step tasks.
vercel GLM 4.5V glm-4.5v 0.60 1.80 Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
vercel GLM 4.5 glm-4.5 0.60 2.20 GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.
vercel Qwen3 Max Preview qwen3-max-preview 1.20 6.00 Qwen3-Max-Preview shows substantial gains over the 2.5 series in overall capability, with significant enhancements in Chinese-English text understanding, complex instruction following, handling of subjective open-ended tasks, multilingual ability, and tool invocation; model knowledge hallucinations are reduced.
vercel Devstral Small 1.1 devstral-small 0.10 0.30 Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
vercel voyage-3.5-lite voyage-3.5-lite 0.02 0.00 Voyage AI's embedding model optimized for latency and cost.
vercel FLUX.1 Kontext Max flux-kontext-max 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 Fast imagen-4.0-fast-generate-001 0.00 0.00 Imagen 4 Fast is Google’s speed-optimized variant of the Imagen 4 text-to-image model, designed for rapid, high-volume image generation. It’s ideal for workflows like quick drafts, mockups, and iterative creative exploration. Despite emphasizing speed, it still benefits from the broader Imagen 4 family’s improvements in clarity, text rendering, and stylistic flexibility, and supports high-resolution outputs up to 2K.
vercel o3-deep-research o3-deep-research 10.00 40.00 o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data—brought in through MCP connectors.
vercel FLUX1.1 [pro] flux-pro-1.1 0.00 0.00 FLUX1.1 [pro] is the standard for text-to-image generation with fast, reliable and consistently stunning results. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 imagen-4.0-generate-001 0.00 0.00 Imagen 4: Google's flagship text-to-image model that serves as the go-to choice for a wide variety of high-quality image generation tasks, featuring significant improvements in text rendering over previous models. It now supports up to 2K resolution generation for creating detailed and crisp visuals, making it suitable for everything from marketing assets to artistic compositions.
vercel FLUX.2 [flex] flux-2-flex 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [flex] supports customizable image generation and editing with adjustable steps and guidance. It's better at typography and text rendering. It supports up to 10 reference images (up to 14 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.2 [pro] flux-2-pro 0.00 0.00 FLUX.2 is a completely new base model trained for visual intelligence, not just pixel generation, setting a new standard for both image generation and image editing. With FLUX.2 models you can expect the highest quality, higher resolutions (up to 4MP), and new capabilities like multi-ref images. FLUX.2 [pro] supports generation, editing, and multiple reference images (up to 9 MP total input). This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel Imagen 4 Ultra imagen-4.0-ultra-generate-001 0.00 0.00 Imagen 4 Ultra: Highest quality image generation model for detailed and photorealistic outputs.
vercel Sonar Reasoning sonar-reasoning 1.00 5.00 A reasoning-focused model that outputs Chain of Thought (CoT) in responses, providing detailed explanations with search grounding.
vercel FLUX1.1 [pro] Ultra flux-pro-1.1-ultra 0.00 0.00 FLUX1.1 [pro] Ultra delivers ultra-fast, ultra high-resolution image creation - with more pixels in every picture. Generate varying aspect ratios from text, at 4MP resolution fast. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.1 Kontext Pro flux-kontext-pro 0.00 0.00 FLUX.1 Kontext creates images from text prompts with unique capabilities for character consistency and advanced editing. It also edits images using simple text prompts. No complex workflows or fine-tuning needed. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Similar capabilities to GPT-3 era models. Compatible with the legacy Completions endpoint, not Chat Completions.
vercel Llama 3.2 90B Vision Instruct llama-3.2-90b 0.72 0.72 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
vercel Qwen3 Embedding 0.6B qwen3-embedding-0.6b 0.01 0.00 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).
vercel Trinity Mini trinity-mini 0.05 0.15 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows.
vercel FLUX.1 Fill [pro] flux-pro-1.0-fill 0.00 0.00 A state-of-the-art inpainting model, enabling editing and expansion of real and generated images given a text description and a binary mask. This provider gives the option to change the moderation level for inputs and outputs. The control is under safety tolerance and is by default 2 on a range from 0 (more strict) through 6 (more permissive).
vercel FLUX.2 [max] flux-2-max 0.00 0.00 FLUX.2 [max] offers image generation and image editing with the highest quality available. It delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. Built for professional use, FLUX.2 [max] produces production-ready outputs for marketing teams, creatives, filmmakers, and creators around the world.
vercel Text Multilingual Embedding 002 text-multilingual-embedding-002 0.03 0.00 Multilingual text embedding model optimized for cross-lingual tasks across many languages.
vercel Mercury Coder Small Beta mercury-coder-small 0.25 1.00 Mercury Coder Small is ideal for code generation, debugging, and refactoring tasks with minimal latency.
vercel LongCat Flash Thinking longcat-flash-thinking 0.15 1.50 LongCat-Flash-Thinking is a high-throughput MoE reasoning model (128k context) optimized for agentic tasks.
vercel Llama 3.2 11B Vision Instruct llama-3.2-11b 0.16 0.16 Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.
vercel Codestral Embed codestral-embed 0.15 0.00 Code embedding model that can embed code databases and repositories to power coding assistants.
vercel Magistral Medium 2509 magistral-medium 2.00 5.00 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
vercel Magistral Small 2509 magistral-small 0.50 1.50 Complex thinking, backed by deep understanding, with transparent reasoning you can follow and verify. The model excels in maintaining high-fidelity reasoning across numerous languages, even when switching between languages mid-task.
vercel Mistral Nemo mistral-nemo 0.04 0.17 A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.
vercel Mixtral MoE 8x22B Instruct mixtral-8x22b-instruct 1.20 1.20 Mixtral 8x22B Instruct is an open-source mixture-of-experts model by Mistral, served by Fireworks.
vercel Morph V3 Large morph-v3-large 0.90 1.90 Morph offers a specialized AI model that applies code changes suggested by frontier models (like Claude or GPT-4o) to your existing code files fast (2,500+ tokens/second). It acts as the final step in the AI coding workflow. Supports 16k input tokens and 16k output tokens.
vercel Nvidia Nemotron Nano 9B V2 nemotron-nano-9b-v2 0.04 0.16 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.
vercel Codex Mini codex-mini 1.50 6.00 Codex Mini is a fine-tuned version of o4-mini specifically for use in Codex CLI.
vercel voyage-code-2 voyage-code-2 0.12 0.00 Voyage AI's embedding model optimized for code retrieval (17% better than alternatives). This is the previous generation of code embeddings models.
vercel voyage-code-3 voyage-code-3 0.18 0.00 Voyage AI's embedding model optimized for code retrieval.
vercel voyage-finance-2 voyage-finance-2 0.12 0.00 Voyage AI's embedding model optimized for finance retrieval and RAG.
vercel voyage-law-2 voyage-law-2 0.12 0.00 Voyage AI's embedding model optimized for legal retrieval and RAG.
together Llama 4 Maverick llama-4-maverick 0.27 0.85 -
together Llama 4 Scout llama-4-scout 0.18 0.59 -
together Llama 3.3 70B Instruct-Turbo llama-3-3-70b-instruct-turbo 0.88 0.88 -
together Llama 3.2 3B Instruct Turbo llama-3-2-3b-instruct-turbo 0.06 0.06 -
together Llama 3.1 405B Instruct Turbo llama-3-1-405b-instruct-turbo 3.50 3.50 -
together Llama 3.1 70B Instruct Turbo llama-3-1-70b-instruct-turbo 0.88 0.88 -
together Llama 3.1 8B Instruct Turbo llama-3-1-8b-instruct-turbo 0.18 0.18 -
together Llama 3 8B Instruct Lite llama-3-8b-instruct-lite 0.10 0.10 -
together Llama 3 70B Instruct Reference llama-3-70b-instruct-reference 0.88 0.88 -
together Llama 3 70B Instruct Turbo llama-3-70b-instruct-turbo 0.88 0.88 -
together LLaMA-2 llama-2 0.90 0.90 -
together DeepSeek-R1 deepseek-r1 3.00 7.00 -
together DeepSeek R1 Distilled Qwen 14B deepseek-r1-distilled-qwen-14b 0.18 0.18 -
together DeepSeek R1 Distilled Llama 70B deepseek-r1-distilled-llama-70b 2.00 2.00 -
together DeepSeek R1-0528-tput deepseek-r1-0528-tput 0.55 2.19 -
together DeepSeek-V3-1 deepseek-v3-1 0.60 1.70 -
together DeepSeek-V3 deepseek-v3 1.25 1.25 -
together gpt-oss-120B gpt-oss-120b 0.15 0.60 -
together gpt-oss-20B gpt-oss-20b 0.05 0.20 -
together Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.15 1.50 -
together Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.50 -
together Qwen3-VL 32B Instruct qwen3-vl-32b-instruct 0.50 1.50 -
together Qwen3-Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 2.00 2.00 -
together Qwen3 235B A22B Instruct 2507 FP8 qwen3-235b-a22b-instruct-2507-fp8 0.20 0.60 -
together Qwen3 235B A22B Thinking 2507 FP8 qwen3-235b-a22b-thinking-2507-fp8 0.65 3.00 -
together Qwen3 235B A22B FP8 Throughput qwen3-235b-a22b-fp8-throughput 0.20 0.60 -
together Qwen 2.5 72B qwen-2-5-72b 1.20 1.20 -
together Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 1.95 8.00 -
together Qwen2.5 Coder 32B Instruct qwen2-5-coder-32b-instruct 0.80 0.80 -
together Qwen2.5 7B Instruct Turbo qwen2-5-7b-instruct-turbo 0.30 0.30 -
together Qwen QwQ-32B qwen-qwq-32b 1.20 1.20 -
together GLM-4.6 glm-4-6 0.60 2.20 -
together GLM-4.5-Air glm-4-5-air 0.20 1.10 -
together Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 -
together Kimi K2 Thinking kimi-k2-thinking 1.20 4.00 -
together Kimi K2 0905 kimi-k2-0905 1.00 3.00 -
together Mistral (7B) Instruct v0.2 mistral-7b-instruct-v0-2 0.20 0.20 -
together Mistral Instruct mistral-instruct 0.20 0.20 -
together Mistral Small 3 mistral-small-3 0.80 0.80 -
together Mixtral 8x7B Instruct v0.1 mixtral-8x7b-instruct-v0-1 0.60 0.60 -
together Marin 8B Instruct marin-8b-instruct 0.18 0.18 -
together Arcee AI AFM-4.5B arcee-ai-afm-4-5b 0.10 0.40 -
together Arcee AI Coder-Large arcee-ai-coder-large 0.50 0.80 -
together Arcee AI Maestro arcee-ai-maestro 0.90 3.30 -
together Arcee AI Virtuoso-Large arcee-ai-virtuoso-large 0.75 1.20 -
together Cogito v2 preview - 109B MoE cogito-v2-preview-109b-moe 0.18 0.59 -
together Cogito v2 preview - 405B cogito-v2-preview-405b 3.50 3.50 -
together Cogito v2 preview - 671B MoE cogito-v2-preview-671b-moe 1.25 1.25 -
together Cogito v2 preview - 70B cogito-v2-preview-70b 0.88 0.88 -
together Refuel LLM-2 refuel-llm-2 0.60 0.60 -
together Refuel LLM-2 Small refuel-llm-2-small 0.20 0.20 -
together Typhoon 2 70B Instruct typhoon-2-70b-instruct 0.88 0.88 -
together gemma-3n-E4B-it gemma-3n-e4b-it 0.02 0.04 -
poe - assistant - - General-purpose assistant. Write, code, ask for real-time information, create images, and more. Queries are automatically routed based on the task and subscription status. For subscribers: general queries go to @GPT-5.2-Instant; web searches to @Web-Search; image generation to @Nano-Banana; video-input tasks to @Gemini-2.5-Pro. For non-subscribers: general queries go to @GPT-4o-Mini; web searches to @Web-Search; image generation to @FLUX-schnell; video-input tasks to @Gemini-2.5-Flash.
poe - gpt-5.2-instant 1.60 13.00 A fast, steady conversational model built for day-to-day use. It handles long threads without drifting, keeps context clean, and answers in a straightforward way. Good for planning, rewriting, summarizing, and quick technical help. Supports 400k tokens of context and native vision. Optional parameters: Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - claude-opus-4.5 4.30 21.00 Claude Opus 4.5 from Anthropic supports a customizable thinking budget (up to 64k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63999 to the end of your message.
poe - gemini-3-flash 0.40 2.40 Building on the reasoning capabilities of Gemini 3 Pro, Gemini 3 Flash is a powerful but affordable and performant model. It has exceptional world knowledge, multimodal understanding, and reasoning capabilities at a fraction of the cost of equivalent models (as of December 2025). Optional parameters: To set the thinking level, add --thinking_level and set it to `minimal`, `low`, or `high` (default: `low`). To use web search and real-time information access, add `--web_search true` to enable or `--web_search false` to disable (the default).
poe - gemini-3-pro 1.60 9.60 Gemini 3 Pro is a state-of-the-art model for math, coding, computer use, and long-horizon agent tasks, delivering top benchmark results including 23.4% on MathArena Apex (up from 1.6%), SOTA on tau-bench, an Elo of 2,439 on LiveCodeBench Pro (vs. 2,234), 72.7% on ScreenSpot-Pro (~2× the previous best), and a higher mean net worth on Vending-Bench 2 ($5,478 vs. $3,838). It has a 1M input context window and a maximum output of 64k tokens. Optional parameters: To instruct the bot to use more thinking effort, select "Low" or "High". To enable web search and real-time information updates, toggle "enable web search"; this is disabled by default.
poe - gpt-5.2-pro 19.00 150.00 A powerful reasoning model that is ideal for your most complex, highest-difficulty tasks. On x-high reasoning effort, it scores 90.5% on ARC-AGI-1, an incredibly difficult problem-solving benchmark where humans score 100%. Note: the model can take up to 30 minutes to think through a problem and is quite expensive. Supports 400k tokens of context and native vision. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "medium", "high", or "Xhigh" (default: "medium"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message with one of "low", "medium", or "high" (default: "medium") to control response detail.
poe - gpt-5.2 1.60 13.00 GPT-5.2 is a state-of-the-art AI model from OpenAI designed for real work across writing, analysis, coding, and problem solving. It handles long contexts and multi-step tasks better than earlier versions, and it's tuned to give accurate responses with fewer errors. Supports 400k tokens of context and native vision. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", "high", or "Xhigh" (default: "None"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message with one of "low", "medium", or "high" (default: "medium") to control response detail.
poe - claude-sonnet-4.5 2.60 13.00 Claude Sonnet 4.5 represents a major leap forward in AI capability and alignment. It is the most advanced model released by Anthropic to date, distinguished by dramatic improvements in reasoning, mathematics, and real-world coding. Supports 1M tokens of context. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - grok-4 3.00 15.00 Grok 4 is xAI's latest and most intelligent language model. It features state-of-the-art capabilities in coding, reasoning, and answering questions. It excels at handling complex and multi-step tasks. Reasoning traces are not available via the xAI API.
poe - claude-haiku-4.5 0.85 4.30 Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, and scores >73% on SWE-bench Verified, ranking among the world's best coding models. Supports 200k tokens of context. Optional parameters: To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 63,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - claude-opus-4.1 13.00 64.00 Claude Opus 4.1 from Anthropic; supports a customizable thinking budget (up to 32k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message.
poe - glm-4.7 - - GLM-4.7 is Z.AI's latest flagship model, with major upgrades focused on advanced coding capabilities and more reliable multi-step reasoning and execution. It shows clear gains in complex agent workflows, while delivering a more natural conversational experience and stronger front-end design sensibility. File Support: Text, Markdown and PDF files. Context window: 205k tokens. Optional parameters: Use `--enable_thinking true` to have the model think about the response before giving a final answer; this is disabled by default. Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic (default: 0.7). Use `max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response (default: 131072).
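Illustrative example (invented prompt; flags as documented above): `Refactor this React component to use hooks and explain the trade-offs --enable_thinking true --temperature 0.3`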
poe - minimax-m2.1 - - MiniMax M2.1 is a cutting-edge AI model designed to revolutionize how developers build software. With enhanced multi-language programming support, it excels in generating high-quality code across popular languages like Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript. Key improvements include: 22% faster response times and 30% lower token consumption for efficient workflows; seamless integration with leading development frameworks (Claude Code, Droid Factory AI, BlackBox, etc.); full-stack development capabilities, from mobile (Android/iOS) to web and 3D interactive prototyping; and an optimized performance-to-cost ratio, making AI-assisted development more accessible. Whether you're a software engineer, app developer, or tech innovator, M2.1 empowers smarter coding with industry-leading AI. File Support: Text, Markdown and PDF files. Context window: 205k tokens. Optional parameters: Use `--enable_thinking true` to have the model think about the response before giving a final answer; this is disabled by default. Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic (default: 0.7). Use `max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response (default: 131072).
poe - gemini-2.5-flash 0.21 1.80 Gemini 2.5 Flash builds upon the popular foundation of Google's 2.0 Flash; this new version delivers a major upgrade in reasoning capabilities, search capabilities, and image/video understanding while still prioritizing speed and cost. Supports 1M tokens of input context. Serves the latest `gemini-2.5-flash-preview-09-2025` snapshot. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gemini-2.5-pro 0.87 7.00 Gemini 2.5 Pro is Google's advanced model with frontier performance on various key benchmarks; supports web search and 1 million tokens of input context. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 32,768 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - kling-omni - - Bot for Kling Omni Image-to-Video inference. Send one image for image-to-video generation, or two images for first-to-last-frame video generation. Set the duration to either 5 or 10 seconds with `--duration`. Accepted file types: jpeg, png, webp, heic, heif. This bot does not accept video files. Note: a prompt is required after attaching images to generate a video.
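Illustrative example (invented prompt, sent with one attached image; flag as documented above): `The paper lantern drifts slowly upward into a starry sky --duration 10`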
poe - deepseek-r1 18,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide to this bot will not be used in training and is sent only to Together AI, a US-based company. Supports 164k tokens of input context and 33k tokens of output context. Uses the latest May 28th snapshot (DeepSeek-R1-0528).
poe - manus - - Manus is an autonomous AI agent that executes tasks. It can take a high-level prompt, break it into subtasks, interact with tools/APIs, and deliver end-to-end results (like reports, code, websites, images, and more) without you managing each step. Notes: In Agent mode, responses may take several minutes to complete. Sometimes, files that Manus has created are incorrectly uploaded to the Poe message; in such cases, please check the Manus chat for the file. Parameter controls available: 1. Task Mode - Default: `--task_mode adaptive` (smart routing: may choose Chat or Agent); conversational single turn: `--task_mode chat` (fixed price); autonomous multi-step: `--task_mode agent`. 2. Agent Profile - Default: `--agent_profile manus-1.6` (standard tasks); lower usage: `--agent_profile manus-1.6-lite` (speed/savings); maximum capability: `--agent_profile manus-1.6-max` (complex reasoning).
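Illustrative example (invented prompt; flags as documented above): `Compare the top five open-source vector databases and deliver a report with citations --task_mode agent --agent_profile manus-1.6-lite`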
poe - glm-4.6 6,600.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications. Use `--enable_thinking false` to disable thinking about the response before giving a final answer; this is enabled by default. The bot does not support media (video and audio) attachments. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 200k tokens
poe - gpt-5.1-instant 1.10 9.00 OpenAI’s flagship model optimized for conversational intelligence. It excels at natural dialogue, contextual memory, and adaptive tone, making it perfect for interactive agents, tutoring, and customer support. It balances speed, reliability, and empathy for seamless real‑time communication. Supports 128k tokens of input context.
poe - gpt-5.1 1.10 9.00 OpenAI’s flagship general‑purpose model, built for advanced reasoning, comprehension, and creativity. It delivers robust performance across text and code, with significant improvements in factual accuracy, long‑context understanding, and multilingual fluency. Ideal for research, content creation, analysis, and problem‑solving in any domain. Supports a 400k-token input context window. Optional parameters: To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high" (default: "None"). Use `--web_search true` to enable web search and real-time information access; this is disabled by default. Use `--verbosity` at the end of your message to control response detail, with one of "low", "medium", or "high" (default: "medium").
poe - gpt-image-1.5 - - OpenAI's frontier image generation model in ChatGPT as of December 2025, offering exceptional prompt adherence, world knowledge, precise edits, facial preservation, level of detail, and overall quality with improved latency/generation times. It supports editing, restyling, and combining images attached to the latest user query. For a conversational image generation and editing experience, use https://poe.com/GPT-5.2. Optional Parameters: Set the aspect ratio, with options 3:2, 1:1, and 2:3. Set quality to low, medium, or high (default: high). Enable masking by toggling it on or by typing 'use_mask' in the prompt; this option is turned off by default. Disable high fidelity by toggling it off or by typing 'use_high_fidelity'; this option is turned on by default.
poe - kimi-k2-thinking 6,700.00 - Built as a thinking agent, it performs step-by-step reasoning while utilizing tools, achieving state-of-the-art performance on benchmarks such as Humanity's Last Exam (HLE), BrowseComp, and others. The model demonstrates substantial advancements in reasoning, agentic search, coding, writing, and general problem-solving capabilities. Kimi K2 Thinking is capable of executing 200–300 sequential tool calls autonomously, maintaining coherent reasoning across hundreds of steps to solve complex tasks. File Support: Text, Markdown and PDF files. Context window: 256k tokens
poe - deepseek-v3.2 - - We introduce DeepSeek-V3.2, a next-generation foundation model designed to unify high computational efficiency with state-of-the-art reasoning and agentic performance. DeepSeek-V3.2 is built upon three core technical breakthroughs: • DeepSeek Sparse Attention (DSA): A new highly efficient attention mechanism that significantly reduces computational overhead while preserving model quality, purpose-built for long-context reasoning and high-throughput workloads. • Scalable Reinforcement Learning Framework: DeepSeek-V3.2 leverages a robust RL training protocol and expanded post-training compute to reach GPT-5-level performance. Its high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and demonstrates reasoning capabilities comparable to Gemini-3.0-Pro. • Large-Scale Agentic Task Synthesis Pipeline: To enable reliable tool-use and multi-step decision-making, we develop a novel agentic data synthesis pipeline that generates high-quality interactive reasoning tasks at scale, greatly enhancing the model’s agentic capabilities. File Support: Text, Markdown and PDF files. Context window: 164k tokens
poe - glm-4.6v - - GLM-4.6V represents a significant multimodal advancement in the GLM series, achieving state-of-the-art visual understanding accuracy for models of its parameter scale. Notably, it's the first visual model to natively integrate Function Call capabilities directly into its architecture, creating a seamless pathway from visual perception to executable actions. This breakthrough establishes a unified technical foundation for deploying multimodal agents in real-world business applications. File Support: Text, Markdown, Image and PDF files. Context window: 131k tokens. Optional parameters: Enable Thinking - toggle this on for the model to think before providing a response; this is disabled by default. Temperature - controls randomness in the response; lower values make the output more focused and deterministic; select from the 0 to 2 range (default: 0.7). Max Output Tokens - maximum number of tokens to generate in the response; can be set from 1 to 32768 (default: 32768).
poe - gpt-5.1-codex 1.10 9.00 GPT‑5.1‑Codex extends GPT‑5.1’s capabilities for software development. It understands complex codebases, provides accurate completions, explains algorithms, and assists with debugging across modern programming languages. Designed for developers, it elevates productivity and supports full‑stack coding workflows with precision. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
poe - gpt-5-pro 14.00 110.00 OpenAI’s latest flagship model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Use `--web_search true` to enable web search and real-time information access; this is disabled by default. GPT-5-Pro thinks long and hard; when using this bot through the API, consider increasing your request timeouts.
poe - gpt-5-chat 1.10 9.00 ChatGPT-5 points to the non-reasoning model GPT-5 snapshot (gpt-5-chat-latest) currently used in ChatGPT. Supports native vision, 400k tokens of context, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount.
poe - claude-code - - A powerful assistant that can read, write, and analyze files across many formats. It can also delegate to other Poe bots to handle complex, multi-step tasks. Built on the Claude Agent SDK from Anthropic.
poe - grok-4.1-fast-reasoning - - Grok-4.1-Fast-Reasoning is a high-performance version of xAI’s Grok 4.1 Fast, the company’s best agentic tool‑calling model. It works great in real-world use cases like customer support, deep research, and advanced analytical reasoning. Equipped with a 2M‑token context window, this model processes vast information seamlessly, delivering coherent, context‑aware, and deeply reasoned insights at exceptional speed.
poe - zai-glm-4.6-cs 19,000.00 - World’s fastest inference for ZAI GLM 4.6 with Cerebras. ZAI GLM 4.6 is a high‑performance AI model designed for advanced reasoning, superior coding, and effective tool use. It supports structured outputs, parallel tool calling, and real‑time streaming responses. Optimized for agentic coding and automation tasks, the model delivers strong real‑world performance with a context window of up to 131K tokens and output up to 40K tokens. For more information see: https://inference-docs.cerebras.ai/models/zai-glm-46 Context Limit: 131k
poe - gpt-5.1-codex-max 1.10 9.00 OpenAI's most capable agentic coding model; recommended for use in agentic harnesses or similar environments (e.g. Cursor, Claude Code, Codex). The default reasoning effort is set to `Xhigh`, so the model will reason extensively on problems given to it (i.e., expect long generation times and high point usage). Accepts image attachments.
poe - gpt-5.1-codex-mini 0.22 1.80 GPT‑5.1‑Codex‑Mini is a lightweight, fast, and efficient code‑generation model derived from GPT‑5.1‑Codex. It’s optimized for quick iterations, smaller environments, and edge applications—offering strong coding assistance with lower computational cost while maintaining accuracy and utility. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
poe - gpt-4o - - OpenAI's GPT-4o answers user prompts with natural, engaging & tailored writing and strong overall world knowledge. Uses GPT-Image-1 to create and edit images conversationally. For fine-grained image generation control (e.g. image quality), use https://poe.com/GPT-Image-1. Supports a context window of 128k tokens. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - nano-banana-pro 1.70 10.00 Nano Banana Pro (Gemini 3 Pro Image Preview) can make detailed, context-rich visuals, precisely edit or restyle input images with exceptional fidelity, and even generate legible text in images in multiple languages. Optional parameters: `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9) sets the aspect ratio of the output image. `--web_search true` enables web search and real-time information access; this is disabled by default. `--image_only` (default: false) determines whether to only generate image output. `--image_size` (options: 1K, 2K, 4K) sets the resolution of the image. Note: simply enabling `--image_only` will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
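Illustrative example (invented prompt; flags as documented above): `A hand-lettered café chalkboard menu in French, warm morning light --aspect_ratio 4:5 --image_size 2K`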
poe - nano-banana 0.21 1.80 Google DeepMind's Nano Banana (i.e. the Gemini 2.5 Flash Image model) offers image generation and editing capabilities, with state-of-the-art performance in photo-realistic multi-turn edits at exceptional speed. Supports a maximum input context of 32k tokens. Optional parameters: `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9) sets the aspect ratio of the output image. `--image_only` (default: false) determines whether to only generate image output. Note: simply enabling `--image_only` will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
poe - grok-4.1-fast-non-reasoning - - Grok-4.1-Fast-Non-Reasoning is a streamlined companion to Grok 4.1 Fast, xAI’s best agentic tool‑calling model. It has a 2M context window and high responsiveness but is optimized for non‑reasoning tasks, excelling at text generation, summarization, and automated workflows that demand speed and efficiency over deep logic. Ideal for high-throughput use cases like customer support automation, bulk content creation, and fast conversational responses.
poe - gpt-5 1.10 9.00 OpenAI’s most advanced general model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - gpt-5-nano 0.04 0.36 GPT-5 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 400k input tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - gpt-5-mini 0.22 1.80 GPT-5 mini is a small, fast & affordable model that matches or beats GPT-4.1 in many intelligence and vision-related tasks. Supports 400k tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
poe - o3-pro 18.00 72.00 o3-pro is a well-rounded and powerful model across domains, with more capability than https://poe.com/o3 at the cost of higher price and lower speed. It is especially capable at math, science, coding, visual reasoning tasks, technical writing, and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - gemini-2.5-flash-lite 0.07 0.28 A lightweight Gemini 2.5 Flash reasoning model optimized for cost efficiency and low latency. Supports web search. Supports 1 million tokens of input context. Serves the latest `gemini-2.5-flash-lite-preview-09-2025` snapshot. For more complex queries, use https://poe.com/Gemini-2.5-Pro or https://poe.com/Gemini-2.5-Flash. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gpt-5-codex 1.10 9.00 GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. It supports multimodal inputs such as images or screenshots for UI development and a 400k token context window. We recommend using GPT-5-Codex only for agentic and interactive coding use cases. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high"
poe - grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast Non-Reasoning is designed for fast, efficient tasks like content generation with a 2M token context window. Combining cutting-edge performance with cost-efficiency, it ensures high-quality results for simpler, everyday applications.
poe - qwen-3-next-80b-think 3,000.00 - The Qwen3-Next-80B-Think (with thinking mode enabled by default) is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B-Thinking." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. This is the thinking version of https://poe.com/Qwen3-Next-80B; supports 65k tokens of context. Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters: 1. Enable/Disable Thinking - this will cause the model to think about the response before giving a final answer. Technical Specifications: File Support: PDF, DOC and XLSX files. File Attachment Limitation: audio, video and image files are not accepted. Context Window: 65k tokens
poe - qwen3-next-80b 2,400.00 - The Qwen3-Next-80B is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks - while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. Use `--enable_thinking false` to disable thinking mode before giving an answer. This is the non-thinking version of https://poe.com/Qwen3-Next-80B-Think; supports 65k tokens of context.
poe - deepseek-v3.2-exp 3,900.00 - DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality. This delivers substantial computational efficiency improvements without compromising accuracy. Comprehensive benchmarks confirm V3.2-Exp matches V3.1-Terminus performance, proving efficiency gains don't sacrifice capability. As both a powerful tool and research platform, it establishes new paradigms for efficient long-context AI processing. Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters: 1. Enable/Disable Thinking - this will cause the model to think about the response before giving a final answer. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 160k tokens
poe - nova-pro-1.0 - - Amazon Nova Pro 1.0 is a highly capable multimodal foundation model from Amazon Nova, offering a strong balance of accuracy, speed, and cost for processing text, images, and video. Its context window is 300,000 tokens, which enables handling very large inputs (including up to ~30 minutes of video input) in a single request. Use `--enable_latency_optimized [true/false]` (default: false) to enable or disable latency-optimized inference. Note that if enabled, costs may increase; check the rate card for more information.
poe - nova-premier-1.0 - - The Amazon Nova Premier 1.0 model is Amazon’s most capable foundation model, able to handle extremely long contexts (≈ 1 million tokens) and multimodal inputs like text, images, and video while excelling at complex, multi‑step tasks across tools and data sources. It supports chain‑of‑thought style reasoning and breaks down problems into intermediate steps before arriving at an answer, improving coherence and accuracy. Use `--enable_thinking [true/false]` (default: true) to enable/disable thinking accordingly.
poe - grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast Reasoning delivers exceptional performance for tasks requiring logical thinking and problem-solving. With a 2M token context window and state-of-the-art cost-efficiency, it handles complex reasoning tasks with accuracy and speed, making advanced AI capabilities accessible to more users.
poe - nova-micro-1.0 - - Amazon Nova Micro is a text-only foundation model in the Amazon Nova family, designed for ultra‑low latency and very low cost, optimized for tasks like summarization, translation, and interactive chat. It supports a context window of 128,000 tokens, enabling handling of large text inputs in a single request.
poe - nova-lite-1.0 - - Amazon Nova Lite is a low‑cost multimodal foundation model from Amazon that can process text, images, and video and is optimized for speed and affordability. It offers a context window of 300,000 tokens, allowing handling of very large inputs in a single request (including up to ~30 minutes of video).
poe - minimax-m2 3,300.00 - MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever. Technical Specifications: File Support: Text, Markdown and PDF files. Context window: 200k tokens
poe - hunyuan-image-3 - - Hunyuan Image 3.0 is Tencent’s next‑generation open‑source text-to-image model that uses a large multimodal Mixture-of-Experts architecture to unify image understanding and generation in one system. It produces high-fidelity, often photorealistic images with strong prompt adherence, multilingual text rendering, and intelligent world-knowledge reasoning that can enrich sparse prompts with appropriate visual details. Note: uploading attachments is not supported. Parameter controls available: 1. Image Settings - Size / Aspect Ratio - default: `--size 1024x1024` (Square 1:1); `--size 768x1024` (Portrait 3:4); `--size 1024x768` (Landscape 4:3); `--size 1024x1536` (Tall Portrait 2:3); `--size 1536x1024` (Wide Landscape 3:2); `--size 512x512` (Small Square 1:1). 2. Quantity - `--num_images [1-4]` number of images to generate (default: 1). 3. Quality & Generation - `--num_inference_steps [10-50]` denoising steps for quality (default: 28; higher = better quality but slower); `--guidance_scale [1.0-20.0]` how closely to follow the prompt (default: 7.5). 4. Customization - `--negative_prompt "text"` things to avoid in generated images; `--seed [integer]` reproducible generation with a fixed seed (e.g., 42).
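Illustrative example (invented prompt; flags as documented above): `A misty mountain monastery at dawn in ink-wash style --size 1536x1024 --num_inference_steps 40 --negative_prompt "watermark, text" --seed 42`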
poe - kling-image-o1 - - Kling Image O1 image generation and image editing bot. Send up to 10 images to use as references, and refer to each image with $image1, $image2, etc. in the prompt to specify interactions. Set resolution with `--resolution` and aspect ratio with `--aspect`. Note: `auto` aspect ratio is the default and can be used only for editing; text-to-image generation has a default of `1:1`. Supports jpeg, png, heic, webp images.
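Illustrative example (invented prompt, sent with two attached images; syntax as documented above): `Place the ceramic mug from $image1 onto the desk in $image2, matching the lighting --aspect 3:2`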
poe - kling-2.6-pro - - Generate high-quality videos with native audio from text and images using Kling 2.6 Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`; only works for text-to-video). Use `--duration` to set either a 5 or 10 second video. Use `--silent` to generate a silent video.
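Illustrative example (invented prompt; flags as documented above): `A koi pond rippling in gentle rain with soft ambient sound --aspect 16:9 --duration 5 --cfg_scale 0.5`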
poe - flux-2-pro - - Flux.2 [Pro] is Black Forest Labs' state-of-the-art model with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows the use of hex color codes within the prompt for precise coloring. Send up to 8 images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 9 megapixels. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - flux-2-flex - - Flux.2 [Flex] is Black Forest Labs' latest model, with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows the use of hex color codes within the prompt for precise coloring. Send images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 14 megapixels. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - flux-2-dev - - A 32B open-weight image-generation model derived from the FLUX.2 base model. The most powerful open-weight image generation and editing model available today, combining text-to-image synthesis and image editing with multiple input images in a single checkpoint. Optional parameters: `--aspect` to set the aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
poe - mistral-medium-3.1 - - Mistral Medium 3.1 is a high-performance, enterprise-grade language model that delivers strong reasoning, coding, and STEM capabilities. It supports hybrid, on-prem, and in-VPC deployments, offering competitive accuracy and easy integration across cloud environments. Context Length: 131k
poe - exa-answer - - Get a quick LLM-style answer to a question informed by Exa search results. For more in-depth results, consider using the following endpoint: https://poe.com/Exa-Research Supported file types for upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported. Parameter controls available: `--text false/true` shows text snippets under each source citation (default: false)
poe - exa-search - - Utilize Exa's technology for searching web pages, finding similar web pages, crawling, and more. Note: This endpoint does not return an LLM-style response (visit the following if you want an LLM-style response: https://poe.com/Exa-Answer or https://poe.com/Exa-Research). File upload is not supported. Parameter Controls Available: 1. Operation Mode - Default: `--operation search` (Web Search) - For finding similar pages: `--operation similar` - For getting page contents: `--operation contents` - For code search: `--operation code` 2. Search Settings (search operation) - `--search_type [auto|neural|deep|fast]` search algorithm (default: auto) - `--show_content` display full page content in results - `--include_domains` comma-separated domains to include - `--include_text` text that must appear (up to 5 words) - `--exclude_text` text that must NOT appear (up to 5 words) 3. Common Search Settings (search & similar operations) - `--num_results [1-100]` number of results to return (default: 10) - `--category [company|research paper|news|pdf|github|tweet|personal site|linkedin profile|financial report]` - `--exclude_domains` comma-separated domains to exclude 4. Date Filters (search operation) - `--start_crawl_date` results crawled after this date (ISO 8601) - `--end_crawl_date` results crawled before this date (ISO 8601) - `--start_published_date` content published after this date (ISO 8601) - `--end_published_date` content published before this date (ISO 8601) 5. Content Options (search, similar, & contents operations) - `--return_text` fetch page text content (default: true) - `--text_max_chars` limit text length (empty = unlimited) - `--include_html_tags` preserve HTML structure - `--return_highlights` get AI-selected key snippets - `--highlights_sentences [1-10]` sentences per highlight (default: 3) - `--highlights_per_url [1-10]` highlights per result (default: 3) - `--highlights_query` guide highlight selection - `--return_summary` get AI-generated summaries - `--summary_query` guide summary generation 6. Advanced Options (search, similar, & contents operations) - `--livecrawl [fallback|never|always|preferred]` when to fetch fresh content (default: fallback) - `--subpages [0-10]` number of linked subpages to crawl (default: 0) - `--subpage_target` find specific subpages matching keyword 7. Code Search Controls (code operation) - `--code_tokens [dynamic|5000|10000|20000]` response length (default: dynamic)
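Illustrative example (invented query; flags as documented above): `rust async runtimes --operation search --search_type neural --num_results 5 --category github --return_summary`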
poe - exa-research - - Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations. Note: Responses may take several minutes to complete depending on complexity. Supported file types for upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported. Parameter Controls Available: Model Selection - `--model exa-research` (Standard, default) - `--model exa-research-pro` (Deepest, highest quality) - `--model exa-research-fast` (Fastest, lightest)
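Illustrative example (invented prompt; flag as documented above): `Survey the current state of solid-state battery manufacturing and the key players --model exa-research-pro`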
poe - kat-coder-pro - - KAT-Coder-Pro V1 by KwaiKAT is a non-reasoning model optimized for agentic coding. It delivers strong performance on reasoning-style tasks while requiring significantly fewer output tokens than peer models. With the 1210 release, it achieved a score of 64 on the Artificial Analysis Intelligence Index, placing it in the global Top 10 and ranking first among all non-reasoning models. File Support: Text, Markdown and PDF files. Context window: 256k tokens
poe - deepseek-v3.2-fw 5,300.00 - Model from DeepSeek that harmonizes high computational efficiency with superior reasoning and agent performance. File Support: Image (JPG, JPEG, PNG, HEIC), Other File Types (PDF, PYTHON, XLSX)
poe - nova-lite-2 - - Amazon Nova 2 Lite is a fast, cost-effective multimodal reasoning model from Amazon that can process text, images, documents, and video, designed for everyday workloads like chatbots, document processing, and business automation. It offers a 1 million token context window, enabling very large, complex inputs in a single request, including long documents and extended video clips (~90 minutes). Note: Video file uploads are limited to ~1GB. Also note that reasoning traces are not exposed from AWS. Supported file types: JPEG, PNG, GIF, WEBP, PDF, DOCX, TXT, MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP Parameter controls available: `--enable_reasoning true/false` enables step-by-step reasoning (default: true). `--reasoning_effort low/medium/high` specifies the reasoning effort level (default: medium).
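Illustrative example (invented prompt, sent with a PDF attached; flags as documented above): `Summarize the attached contract and flag any unusual clauses --enable_reasoning true --reasoning_effort high`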
poe - gpt-oss-120b-t 1,500.00 - OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. Built with community feedback and released under Apache 2.0, this 120B parameter model provides transparency, customization, and deployment flexibility for organizations requiring complete data security & privacy control.
poe - gpt-oss-20b-t 450.00 - OpenAI's GPT-OSS-20B provides powerful chain-of-thought reasoning in an efficient 20B parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0 licensed model offers the perfect balance of performance and resource efficiency for diverse applications.
poe - amazon-nova-reel-1.1 - - Amazon Nova Reel 1.1 is an advanced AI video generation model that creates up to 2-minute multi-shot videos from text and optional image prompts, offering improved video quality, latency, and visual consistency compared to its predecessor.
poe - kimi-k2-think-t 13,000.00 - Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, K2 Thinking dramatically scales multi-step reasoning depth while maintaining stable tool-use across 200–300 sequential calls — a breakthrough in long-horizon agency with native INT4 quantization for 2x inference speed. Supported File Types: JPEG, PNG, PDF
poe - amazon-nova-canvas - - Amazon Nova Canvas is a high-quality image‐generation model that creates and edits images from text or image inputs—offering features like inpainting/outpainting, virtual try‑on, style controls, and background removal—all with built‑in customization.
poe - kimi-k2 6,300.00 - Kimi K2-Instruct-0905 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Key Features: - Large-Scale Training: pre-trained as a 1T-parameter MoE model on 15.5T tokens with zero training instability. - MuonClip Optimizer: the Muon optimizer is applied at an unprecedented scale, with novel optimization techniques developed to resolve instabilities while scaling up. - Agentic Intelligence: specifically designed for tool use, reasoning, and autonomous problem-solving. Technical Specifications: File Support: attachments not supported. Context window: 256k tokens
poe - kimi-k2-0905-t 11,000.00 - The new Kimi K2-0905 model from Moonshot AI features a massive 256,000-token context window, double the length of its predecessor (Kimi K2), along with greatly improved coding abilities and front-end generation accuracy. It boasts 1 trillion total parameters (with 32 billion activated at a time) and claims 100% tool-call success in real-world tests, setting a new bar for open-source AI performance in complex, multi-step tasks
poe - kimi-k2-t 11,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
poe - kimi-k2-instruct 6,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Uses the latest September 5th, 2025 snapshot. The updated version has improved coding abilities, agentic tool use, and a longer (256K) context window.
poe - deepseek-v3.1 7,800.00 - Latest Update: Terminus Enhancement. This model has been updated with the Terminus release, addressing key user-reported issues while maintaining all original capabilities: - Language consistency: reduced instances of mixed Chinese-English text and abnormal characters. - Enhanced agent capabilities: optimized performance of the Code Agent and Search Agent. Core Capabilities: DeepSeek-V3.1 is a hybrid model supporting both thinking mode and non-thinking mode, built upon the original V3 base checkpoint through a two-phase long-context extension approach. Technical Specifications: Context Window: 128k tokens. File Support: PDF, DOC, and XLSX files. File Restrictions: does not accept audio and video files
poe - glm-4.6-fw 6,000.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
poe - deepseek-v3.1-t 6,000.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects: - Hybrid thinking mode: one model supports both thinking mode and non-thinking mode by changing the chat template. - Smarter tool calling: through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
poe - glm-4.5 5,700.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. Technical Specifications: File Support: PDF and Markdown files. Context window: 128k tokens
poe - deepseek-v3.1-n 5,700.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects: - Hybrid thinking mode: one model supports both thinking mode and non-thinking mode by changing the chat template. - Smarter tool calling: through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly. Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - qwen3-coder 9,000.00 - Qwen3 Coder 480B A35B Instruct is a state-of-the-art 480B-parameter Mixture-of-Experts model (35B active) that achieves top-tier performance across multiple agentic coding benchmarks. Supports 256K native context length and scales to 1M tokens with extrapolation. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company.
poe - claude-sonnet-4 2.60 13.00 Claude Sonnet 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 1M context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-opus-4 13.00 64.00 Claude Opus 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-opus-4-reasoning 13.00 64.00 Claude Opus 4 from Anthropic; supports a customizable thinking budget (up to 30k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 30,768 to the end of your message.
poe - claude-sonnet-4-reasoning 2.60 13.00 Claude Sonnet 4 from Anthropic; supports a customizable thinking budget (up to 60k tokens) and a 200k context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 61,440 to the end of your message.
poe - o4-mini 0.99 4.00 o4-mini provides high intelligence on a variety of tasks and domains, including science, math, and coding at an affordable price point. This bot uses medium reasoning effort by default, but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high".
poe - gemini-deep-research 1.60 9.60 Gemini Deep Research plans, executes, and synthesizes complex, multi-step investigations by querying the web and other data to produce detailed, structured reports. Offers best-in-the-world performance on Google's newly released DeepSearchQA benchmark as of December 2025. Be sure to give your entire research request in the initial prompt and include as much detail as you can! Use the `--interaction_id` flag if you want to continue the discussion in a previous research task.
poe - o4-mini-deep-research 1.80 7.20 Deep Research from OpenAI powered by the o4-mini model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
poe - glm-4.5-air-t 2,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
poe - glm-4.5-fw 5,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters. It unifies reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
poe - grok-3 - - xAI's February 2025 flagship release representing nearly state-of-the-art performance in several reasoning/problem solving domains. The API doesn't yet support reasoning mode for Grok 3, but does for https://poe.com/Grok-3-Mini; this bot also doesn't have access to the X data feed. Supports 131k tokens of context, uses Grok 2 for native vision.
poe - grok-3-mini - - xAI's February 2025 release with strong performance across many domains but at a more affordable price point. Supports reasoning with a configurable reasoning effort level, and 131k tokens of context; doesn't have access to the X data feed. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low" or "high".
poe - o3 1.80 7.20 o3 provides state-of-the-art intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - o3-deep-research 9.00 36.00 Deep Research from OpenAI powered by the o3 model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
poe - elevenlabs-v3 - - ElevenLabs v3 is a cutting-edge text-to-speech model that brings scripts to life with remarkable realism and performance-level control. Unlike traditional TTS systems, it allows creators to shape the emotional tone, pacing, and soundscape of their audio through the use of inline audio tags. These tags are enclosed in square brackets and act as stage directions—guiding how a line is spoken or what sound effects are inserted—without being spoken aloud. This enables rich, expressive narration and dialogue for applications like audiobooks, games, podcasts, and interactive media. Whether you’re aiming for a tense whisper, a sarcastic remark, or a dramatic soundscape full of explosions and ambient effects, v3 gives you granular control directly in the text prompt. This bot will also run text-to-speech on PDF attachments / URL links. Examples of voice delivery tags include: * [whispers] I have to tell you a secret. * [angry] That was *never* the plan. * [sarcastic] Oh, sure. That’ll totally work. * and [laughs] You're hilarious. Examples of sound effect tags are: * [gunshot] Get down! * [applause] Thank you, everyone. * and [explosion] What was that?! These can also be combined. Multiple speakers can be supported via the parameter control. Dialogue for multiple speakers must follow the format, e.g. for 3 speakers: Speaker 1: [dialogue] Speaker 2: [dialogue] Speaker 3: [dialogue] Speaker 1: [dialogue] Speaker 2: [dialogue] --speaker_count 3 --voice_1 [voice_1] --voice_2 [voice_2] --voice_3 [voice_3] The following voices are supported: Alexandra - Conversational & Real Amy - Young & Natural Arabella - Mature Female Narrator Austin - Good Ol' Texas Boy Blondie - Warm & Conversational Bradford - British Male Storyteller Callum - Gravelly Yet Unsettling Charlotte - Raspy & Sensual Chris - Down-to-Earth Coco Li - Shanghainese Female Gaming - Unreal Tonemanagement 2003 Harry - Animated Warrior Hayato - Soothing Zen Male Hope - Upbeat & Clear James - Husky & Engaging James Gao - Calm Chinese Voice Jane - Professional Audiobook Reader Jessica - Playful American Female Juniper - Grounded Female Professional Karo Yang - Youthful Asian Male Kuon - Acute Fantastic Female Laura - Quirky Female Voice Liam - Warm, Energetic Youth Monika Sogam - Indian-English Accent Nichalia Schwartz - Engaging Female American Priyanka Sogam - Late-Night Radio Reginald - Brooding, Intense Villain ShanShan - Young, Energetic Female Xiao Bai - Shrill & Annoying Prompt input cannot exceed 5,000 characters.
poe - deepseek-v3 12,000.00 - DeepSeek-V3 – the new top open-source LLM. Updated to the March 24, 2025 checkpoint. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to Together, a US-based company. Supports 131k context window and max output of 12k tokens.
poe - deepseek-v3-fw 9,000.00 - DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model; able to perform well on competitive benchmarks with cost-effective training & inference. All data submitted to this bot is governed by the Poe privacy policy and is sent to Fireworks, a US-based company. Supports 131k context window and max output of 131k tokens. Updated to serve the latest March 24th, 2025 snapshot.
poe - deepseek-v3.1-tm 5,700.00 - DeepSeek-V3.1-Terminus preserves all original model capabilities while resolving key user-reported issues, including: - Language consistency: significantly reducing mixed Chinese-English output and eliminating abnormal character occurrences. - Agent performance: enhanced optimization of both Code Agent and Search Agent functionality. Use `--enable_thinking false` to disable thinking about the response before giving a final answer. The bot does not accept attachments; it also does not support billing logic. Context window: 128k tokens.
poe - gpt-4.1 1.80 7.20 OpenAI’s GPT-4.1 significantly improves on past models in terms of its coding skills, long context (1M tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4o. Provides a 75% chat history cache discount. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-4.1-mini 0.36 1.40 GPT-4.1 mini is a small, fast & affordable model that matches or beats GPT-4o in many intelligence and vision-related tasks. Supports 1M tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
poe - gpt-4.1-nano 0.09 0.36 GPT-4.1 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 1M input tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-nano.
poe - llama-4-scout-t 1,000.00 - Llama 4 Scout, a fast long-context multimodal model from Meta. A 16-expert MoE model that excels at multi-document analysis, codebase reasoning, and personalized tasks. Smaller than Maverick but state-of-the-art for its size, with text + image input support. Supports 300k context.
poe - claude-opus-4-search 13.00 64.00 Claude Opus 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-sonnet-4-search 2.60 13.00 Claude Sonnet 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-sonnet-3.7 2.60 13.00 Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. For maximum extended thinking, please use https://poe.com/Claude-Sonnet-Reasoning-3.7. Supports a 200k token context window. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 16,384 to the end of your message.
poe - claude-sonnet-3.5 2.60 13.00 Anthropic's Claude Sonnet 3.5 using the October 22, 2024 model snapshot. Excels in complex tasks like coding, writing, analysis and visual processing. Has a context window of 200k tokens (approximately 150k English words).
poe - claude-haiku-3.5 0.68 3.40 The latest generation of Anthropic's fastest model. Claude Haiku 3.5 has fast speeds and improved instruction following.
poe - gemini-2.0-flash 0.10 0.42 Gemini 2.0 Flash is Google's most popular model yet with enhanced performance and blazingly fast response times; supports web search grounding, so it can intelligently answer questions related to recent events. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. Supports 1 million tokens of input context. To use web search and real-time information access, add `--web_search true` to enable it or `--web_search false` to disable it (the default).
poe - gemini-2.0-flash-lite 0.05 0.21 Gemini 2.0 Flash Lite is a new model variant that is Google's most cost-efficient yet, often considered a spiritual successor to Gemini 1.5 Flash in terms of capability, context window size and cost. Does not support web search (if you need search, we recommend using https://poe.com/Gemini-2.0-Flash); supports 1 million tokens of input context.
poe - claude-sonnet-3.7-search 2.60 13.00 Claude Sonnet 3.7 with access to real-time information from the web. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - claude-haiku-3.5-search 0.68 3.40 Claude Haiku 3.5 with access to real-time information from the web.
poe - qwen3-max - - Qwen3-Max is a major update to the Qwen3 series, delivering significant improvements in reasoning, instruction following, and multilingual support. It provides higher accuracy in complex tasks like coding and math, along with reduced hallucinations and better performance on open-ended questions. This model is served by Alibaba Cloud Int. from Singapore.
poe - gpt-oss-120b 1,200.00 - OpenAI introduces the GPT-OSS-120B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - gpt-oss-20b 450.00 - OpenAI introduces the GPT-OSS-20B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-20B model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications: File Support: attachments not supported. Context window: 128k tokens
poe - gpt-oss-120b-cs 3,200.00 - World’s fastest inference for GPT OSS 120B with Cerebras. OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. The bot does not accept video, PPT, DOCX, or Excel files.
poe - openai-gpt-oss-120b 1,500.00 - GPT-OSS-120b is a high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs.
poe - openai-gpt-oss-20b 750.00 - GPT-OSS-20B is a compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as 120B, with faster inference and easier deployment that is ideal for specialized or offline use cases, fast responsive performance, chain-of-thought output, and agentic workflows.
poe - qwen3-next-instruct-t 2,400.00 - Qwen3-Next Instruct features a highly sparse MoE structure that activates only 3B of its 80B parameters during inference. Supports only instruct mode without thinking blocks, delivering performance on par with Qwen3-235B-A22B-Instruct-2507 on certain benchmarks while using less than 10% training cost and providing 10x+ higher throughput on contexts over 32K tokens.
poe - qwen3-next-think-t 3,000.00 - Qwen3-Next Thinking features the same highly sparse MoE architecture but specialized for complex reasoning tasks. Supports only thinking mode with automatic tag inclusion, delivering exceptional analytical performance while maintaining extreme efficiency with 10x+ higher throughput on long contexts and may generate longer thinking content than predecessors.
poe - qwen3-max-n 22,000.00 - Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. File Support: Text, Markdown and PDF files Context window: 256k tokens
poe - qwen3-vl-235b-a22b-t 4,800.00 - Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment. Key Features: - Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks. - Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos. - Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis. - Long Context: Processes up to 1M tokens for books or long videos. - Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers. - Visual Recognition: Recognizes a wide range of objects, landmarks, and more. - OCR: Supports 32 languages with improved performance in challenging conditions. - Text-Vision Fusion: Achieves seamless, unified comprehension. Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
poe - qwen3-vl-235b-a22b-i 3,600.00 - This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment. Key Enhancements: Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos. Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers. Upgraded Visual Recognition: Broader, higher-quality pretraining is able to "recognize everything"—celebrities, anime, products, landmarks, flora/fauna, etc. Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
poe - qwen-3-235b-2507-t 1,900.00 - Qwen3 235B A22B 2507 is currently among the strongest instruct (non-reasoning) models, closed or open source. It excels in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It is also great at multilingual tasks and supports a long context window (262k).
poe - qwen3-235b-2507-fw 2,700.00 - State-of-the-art language model with exceptional math, coding, and problem-solving performance. Operates in non-thinking mode, and does not generate <think></think> blocks in its output. Supports 256k tokens of native context length. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company. Uses the latest July 21st, 2025 snapshot (Qwen3-235B-A22B-Instruct-2507).
poe - qwen3-235b-2507-cs 6,000.00 - World's fastest inference with Qwen3 235B Instruct (2507) model with Cerebras. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage.
poe - qwen3-coder-480b-t 17,000.00 - Qwen3‑Coder‑480B is a state of the art mixture‑of‑experts (MoE) code‑specialized language model with 480 billion total parameters and 35 billion activated parameters. Qwen3‑Coder delivers exceptional performance across code generation, function calling, tool use, and long‑context reasoning. It natively supports up to 262,144‑token context windows, making it ideal for large repository and multi‑file coding tasks.
poe - qwen3-coder-480b-n 7,200.00 - Qwen3-Coder-480B-A35B-Instruct delivers Claude Sonnet-comparable performance on agentic coding and browser tasks while supporting 256K-1M token long-context processing and multi-platform agentic coding capabilities. Technical Specifications File Support: Attachments not supported Context window: 256k tokens
poe - qwen3-235b-a22b-di 1,900.00 - Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP8.
poe - qwen3-235b-a22b-n 1,800.00 - It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). The bot does not currently support attachments. It features the following key enhancements: - Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. - Substantial gains in long-tail knowledge coverage across multiple languages. - Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. - Enhanced capabilities in 256K long-context understanding. Technical Specifications File Support: Attachments not supported Context window: 128k tokens
poe - magistral-medium-2509-thinking - - Magistral Medium 2509 (thinking) by EmpirioLabs. Magistral is Mistral's first reasoning model. It is ideal for general-purpose use requiring longer thought processing and better accuracy than non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling, this model solves multi-step challenges where transparency and precision are critical. Context Window: 40k Supported file type uploads: PDF, XLSX, TXT, PNG, JPG, JPEG
poe - o1 14.00 54.00 OpenAI's o1 is designed to reason before it responds and provides world-class capabilities on complex tasks (e.g. science, coding, and math). Improving upon o1-preview and with higher reasoning effort, it is also capable of reasoning through images and supports 200k tokens of input context. By default, uses reasoning_effort of medium, but low, medium & high are also selectable.
poe - o1-pro 140.00 540.00 OpenAI’s o1-pro is a highly capable reasoning model, tailored for complex, compute- or context-heavy tasks, dedicating additional thinking time to deliver more accurate, reliable answers. For less complex tasks at lower cost, https://poe.com/o3-mini is recommended. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
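For example (hypothetical prompt; the flag and values are as documented above): `Design a migration plan for this database schema and list the risks --reasoning_effort high`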
poe - cartesia-ink-whisper - - Transcribe audio files using Speech-to-Text with the Cartesia Ink Whisper model. Select the Language (`--language`) of your audio file in Settings. Default is English (en). Supported Languages: English (en) Chinese (zh) German (de) Spanish (es) Russian (ru) Korean (ko) French (fr) Japanese (ja) Portuguese (pt) Turkish (tr) Polish (pl) Catalan (ca) Dutch (nl) Arabic (ar) Swedish (sv) Italian (it) Indonesian (id) Hindi (hi) Finnish (fi) Vietnamese (vi) Hebrew (he) Ukrainian (uk) Greek (el) Malay (ms) Czech (cs) Romanian (ro) Danish (da) Hungarian (hu) Tamil (ta) Norwegian (no) Thai (th) Urdu (ur) Croatian (hr) Bulgarian (bg) Lithuanian (lt) Latin (la) Maori (mi) Malayalam (ml) Welsh (cy) Slovak (sk) Telugu (te) Persian (fa) Latvian (lv) Bengali (bn) Serbian (sr) Azerbaijani (az) Slovenian (sl) Kannada (kn) Estonian (et) Macedonian (mk) Breton (br) Basque (eu) Icelandic (is) Armenian (hy) Nepali (ne) Mongolian (mn) Bosnian (bs) Kazakh (kk) Albanian (sq) Swahili (sw) Galician (gl) Marathi (mr) Punjabi (pa) Sinhala (si) Khmer (km) Shona (sn) Yoruba (yo) Somali (so) Afrikaans (af) Occitan (oc) Georgian (ka) Belarusian (be) Tajik (tg) Sindhi (sd) Gujarati (gu) Amharic (am) Yiddish (yi) Lao (lo) Uzbek (uz) Faroese (fo) Haitian Creole (ht) Pashto (ps) Turkmen (tk) Nynorsk (nn) Maltese (mt) Sanskrit (sa) Luxembourgish (lb) Myanmar (my) Tibetan (bo) Tagalog (tl) Malagasy (mg) Assamese (as) Tatar (tt) Hawaiian (haw) Lingala (ln) Hausa (ha) Bashkir (ba) Javanese (jw) Sundanese (su) Cantonese (yue)
poe - chatgpt-4o-latest 4.50 14.00 Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Supports a context window of 128k tokens; cannot generate images.
poe - gpt-4o-mini 0.14 0.54 This intelligent small model from OpenAI is significantly smarter, cheaper, and just as fast as GPT-3.5 Turbo. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
poe - glm-4.6-t 6,600.00 - GLM-4.6 is the latest flagship model from Z.ai's GLM series, delivering state-of-the-art agentic and coding capabilities that rival Claude Sonnet 4. With 357B parameters in a Mixture-of-Experts architecture, an expanded 200K context window, and 30% improved token efficiency, GLM-4.6 represents the top-performing model developed in China.
poe - qwen3-max-preview - - A preview version of the Max model in the Tongyi Qianwen 3 series, achieving an effective integration of thinking and non-thinking modes. In thinking mode, there is a significant enhancement in capabilities such as intelligent agent programming, common-sense reasoning, and reasoning across mathematics, science, and general domains. This model is served by Alibaba Cloud Int. from Singapore. Notes: - Audio/Video files are not supported. - Max Context Window: 252k Use `--enable_thinking true/false` to enable/disable Deep Thinking accordingly.
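For example (hypothetical prompt; the flag is as documented above): `How many ways can 8 rooks be placed on a chessboard so that none attack each other? --enable_thinking true`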
poe - o3-mini 0.99 4.00 o3-mini is OpenAI's reasoning model, providing high intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high can be selected; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
poe - o3-mini-high 0.99 4.00 o3-mini-high is OpenAI's most recent reasoning model with reasoning_effort set to high, providing frontier intelligence on most tasks. Like other models in the o-series, it is designed to excel at science, math, and coding tasks. Supports 200k tokens of input context and 100k tokens of output context.
poe - llama-3.1-8b-di 300.00 - The smallest and fastest model from Meta's Llama 3.1 family. This open-source language model excels in multilingual dialogue, outperforming numerous industry benchmarks for both closed and open-source conversational AI systems. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Input token limit 128k, output token limit 8k. Quantization: FP16 (official).
poe - claude-sonnet-3.7-reasoning 2.60 13.00 Reasoning capabilities on by default. Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. Recommended for complex math or coding problems. Supports a 200k token context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
poe - inception-mercury - - Mercury is the first diffusion large language model (dLLM). On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. A new generation of LLMs that push the frontier of fast, high-quality text generation.
poe - inception-mercury-coder - - Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder Small's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the blog post here: https://www.inceptionlabs.ai/introducing-mercury.
poe - mistral-medium-3 - - Mistral Medium 3 is a powerful, cost-efficient language model offering top-tier reasoning and multimodal performance. Context Window: 130k
poe - mistral-medium 2.70 8.10 Mistral AI's medium-sized model. Supports a context window of 32k tokens (around 24,000 words) and is stronger than Mixtral-8x7b and Mistral-7b on benchmarks across the board.
poe - llama-4-maverick-t 1,600.00 - Llama 4 Maverick, state of the art long-context multimodal model from Meta. A 128-expert MoE powerhouse for multilingual image/text understanding (12 languages), creative writing, and enterprise-scale applications—outperforming Llama 3.3 70B. Supports 500k tokens context.
poe - llama-3.3-70b-fw 4,200.00 - Meta's Llama 3.3 70B Instruct, hosted by Fireworks AI. Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
poe - llama-3.3-70b 3,900.00 - Llama 3.3 70B – with similar performance as Llama 3.1 405B while being faster and much smaller! Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
poe - deepseek-prover-v2 - - DeepSeek-Prover-V2 is an open-source large language model specifically designed for formal theorem proving in Lean 4. The model builds on a recursive theorem proving pipeline powered by the company's DeepSeek-V3 foundation model.
poe - deepseek-r1-fw 18,000.00 - State-of-the-art large reasoning model with problem-solving, math, and coding performance at a fraction of the cost; explains its chain of thought. All data you provide this bot will not be used in training, and is sent only to Fireworks AI, a US-based company. Supports 164k tokens of input context and 164k tokens of output context. Uses the latest May 28th, 2025 snapshot.
poe - deepseek-r1-di 6,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
poe - deepseek-r1-n 6,000.00 - The DeepSeek-R1 (latest Snapshot model DeepSeek-R1-0528) model features enhanced reasoning and inference capabilities through optimized algorithms and increased computational resources. It excels in mathematics, programming, and logic, with performance nearing top-tier models like o3 and Gemini 2.5 Pro. This bot does not accept attachments. Technical Specifications File Support: Attachments not supported Context window: 160k tokens
poe - llama-3.3-70b-n 1,400.00 - The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Technical Specifications File Support: Attachments not supported Context window: 128k tokens
poe - llama-3.3-70b-cs 7,800.00 - World’s fastest inference for Llama 3.3 70B with Cerebras. The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
poe - llama-3.1-70b-t 14,000.00 - Llama 3.1 70B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
poe - llama-3.1-8b-cs 900.00 - World’s fastest inference for Llama 3.1 8B with Cerebras. This Llama 8B instruct-tuned version is fast and efficient. The Llama 3.1 8B is an instruction tuned text only model, optimized for multilingual dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.
poe - gpt-researcher - - GPT Researcher is an agent that conducts deep research on any topic and generates a comprehensive report with citations. GPT Researcher is powered by Tavily's search engine. GPTR is based on the popular open source project: https://github.com/assafelovic/gpt-researcher -- by integrating Tavily search, it is optimized for curation and ranking of trusted research sources. Learn more at https://gptr.dev or https://tavily.com
poe - web-search - - Web-enabled assistant bot that searches the internet to inform its responses. Particularly good for queries regarding up-to-date information or specific facts. Powered by Gemini 2.0 Flash.
poe - gpt-4o-search 2.20 9.00 OpenAI's fine-tuned model for searching the web for real-time information. For less expensive messages, consider https://poe.com/GPT-4o-mini-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
poe - gpt-4o-mini-search 0.14 0.54 OpenAI's fine-tuned model for searching the web for real-time information. For higher-performance, consider https://poe.com/GPT-4o-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
poe - reka-research - - Reka Research is a state-of-the-art agentic AI that answers complex questions by browsing the web. It excels at synthesizing information from multiple sources, performing work in minutes that usually takes hours.
poe - perplexity-sonar - - Sonar by Perplexity is a cutting-edge AI model that delivers real-time, web-connected search results with accurate citations. It's designed to provide up-to-date information and customizable search sources, making it a powerful tool for integrating AI search into various applications. Context Length: 127k
poe - linkup-deep-search - - Linkup Deep Search is an AI-powered search bot that continues to search iteratively if it hasn't found sufficient information on the first attempt. Results are slower compared to its Standard search counterpart, but often more comprehensive. Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Context Window: 100k Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: To search only within specific domains use --include_domains, to exclude domains from the search results use --exclude_domains, to give domains higher priority in search use --prioritize_domains. 2. Date range: Use --from_date and --to_date to restrict the search date range, in YYYY-MM-DD format. 3. Content options: Use --include_image true to include relevant images in results and --image_count (up to 45) to set how many images to display. Learn more: https://www.linkup.so/
poe - linkup-standard - - Linkup Standard is an AI-powered search bot that provides detailed overviews and answers sourced from the web, helping you find high-quality information quickly and accurately. Results are faster compared to its Deep search counterpart. Context Window: 100k Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: To search only within specific domains use --include_domains, to exclude domains from the search results use --exclude_domains, to give domains higher priority in search use --prioritize_domains. 2. Date range: Use --from_date and --to_date to restrict the search date range, in YYYY-MM-DD format. 3. Content options: Use --include_image true to include relevant images in results and --image_count (up to 45) to set how many images to display. Learn more: https://www.linkup.so/
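As a combined illustration for both Linkup bots (the query and domain are hypothetical; the flags are as documented above): `Recent guidance on screen time for children --include_domains who.int --from_date 2024-01-01 --to_date 2024-12-31 --include_image true --image_count 3`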
poe - perplexity-sonar-pro - - Sonar Pro by Perplexity is an advanced AI model that enhances real-time, web-connected search capabilities with double the citations and a larger context window. It's designed for complex queries, providing in-depth, nuanced answers and extended extensibility, making it ideal for enterprises and developers needing robust search solutions. Context Length: 200k (max output token limit of 8k)
poe - perplexity-sonar-rsn-pro - - This model operates on the open-sourced uncensored R1-1776 model from Perplexity with web search capabilities. Perplexity's Sonar Reasoning Pro model takes AI-powered answers to the next level, offering unmatched quality and precision. Outperforming leading search engines and LLMs, it has demonstrated superior performance on the SimpleQA benchmark, making it a gold standard for high-quality answer generation. Context Length: 128k (max output token limit of 8k)
poe - perplexity-deep-research - - Perplexity Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Context Length: 128k
poe - flux-pro-1.1-ultra - - State-of-the-art image generation with four times the resolution of standard FLUX-1.1-pro. Best-in-class prompt adherence and pixel-perfect image detail. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Add "--raw" (no other arguments needed) for raw photographic detail and an overall less processed, everyday aesthetic. Valid aspect ratios are 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21. Send an image to have this model reimagine/regenerate it via FLUX Redux, and use "--strength" (e.g --strength 0.7) to control the impact of the text prompt (1 gives greater influence, 0 means very little).
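For example (hypothetical prompt; flags as documented above): `A misty harbor at dawn, fishing boats at anchor --aspect 16:9 --raw`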
poe - mistral-small-3.1 - - Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
poe - claude-opus-3 13.00 64.00 Anthropic's Claude Opus 3 can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks. Supports 200k tokens of context (approximately 150k English words).
poe - sonic-3.0 6,000.00 - Generates audio based on your prompt using Cartesia's latest Sonic 3.0 text-to-speech model in your voice of choice. Supports 10k characters. You can select a voice and language in the options menu in the input bar. The following voices are supported covering 42 languages (English, Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese): -- English -- Ariana Kiefer Tessa Brandon Linda - Conversational Guide Ronald - Thinker Brooke - Big Sister Katie - Friendly Fixer Jacqueline - Reassuring Agent Caroline - Southern Guide -- Arabic -- Amira - Dreamy Whisperer Omar - High-Energy Presenter -- Bengali -- Pooja - Everyday Assistant Rubel - City Guide -- Bulgarian -- Ivana - Instruction Provider Georgi - Conversationalist -- Chinese -- Hua - Sunny Support Yue - Gentle Woman Tao - Lecturer Lan - Instructor -- Croatian -- Petra - Strict Lecturer Ivan - Bar Companion -- Czech -- Jana - Crisp Conversationalist Petr - Pastor -- Danish -- Katrine - Calm Caregiver -- Dutch -- Bram - Instructional Daan - Business Baritone Sanne - Clear Companion Lucas - Storyteller -- Finnish -- Helmi - Warm Friend Mikko - Narration Expert -- French -- Helpful French Lady French Narrator Man Calm French Woman Antoine - Stern Man -- Georgian -- Levan - Support Guide Tamara - Support Specialist -- German -- Thomas - Anchor Viktoria - Phone Conversationalist Lukas - Professional Lena - Muse -- Greek -- Despina - Motherly Woman Nikos - Radio Storyteller -- Gujarati -- Isha - Learner Amit - Sports Student -- Hebrew -- Noam - Broadcaster -- Hindi -- Arushi - Hinglish Speaker Sunil - Official Announcer Riya - College Roommate Aadhya - Soother -- Hungarian -- Gabor - Reassuring Eszter - Customer Companion -- Indonesian -- Siti - Ad Narrator Andi - Dynamic Presenter -- Italian -- Liv - Casual Friend Alessandra - Melodic Guide Francesca - Elegant Partner Giancarlo - Support Leader -- Japanese -- Yumiko - Friendly Agent Emi - Soft-Spoken Friend Yuki - Calm Woman Daisuke - Businessman -- Kannada -- Prakash - Instructor Divya - Joyful Narrator -- Korean -- Jihyun - Anchorwoman Mimi - Show Stopper Byungtae - Enforcer Jiwoo - Service Specialist -- Malay -- Aisyah - Chat Partner Faiz - Family Guide -- Malayalam -- Latha - Friendly Host -- Marathi -- Suresh - Instruction Anika - Enthusiastic Seller -- Norwegian -- Lars - Casual Conversationalist -- Polish -- Tomek - Casual Companion Wojciech - Documentarian Piotr - Corporate Lead Katarzyna - Melodic Storyteller -- Portuguese -- Luana - Public Speaker Felipe - Casual Talker Ana Paula - Marketer Beatriz - Support Guide -- Punjabi -- Gurpreet - Companion Jaspreet - Commercial Woman -- Romanian -- Andrada - Steady Speaker Andrei - Conversationalist Guy -- Russian -- Tatiana - Friendly Storyteller Natalya - Soothing Guide Irina - Poetic Sergei - Expressive Narrator -- Slovak -- Katarina - Friendly Sales Peter - Narrator Man -- Spanish -- Pedro - Formal Speaker Daniela - Relaxed Woman Fran - Confident Young Professional Isabel - Teacher -- Swedish -- Freja - Nordic Reader Ingrid - Peaceful Guide Anders - Nordic Baritone Cees - Nordic Narrator -- Tagalog -- Luz - Casual Speaker Angelo - Calm Narrator -- Tamil -- Arun - Lively Lakshmi - Everyday -- Telugu -- Sindhu - Conversational Partner Vikram - Folk Narrator -- Thai -- Somchai - Star Suda - Fortune Teller -- Turkish -- Emre - Calming Speaker Leyla - Story Companion Azra - Service Specialist Taylan - Expressive -- Ukrainian -- Oleh - Professional Guy -- Vietnamese -- Minh - Conversational Partner Xia - Calm Companion
poe - hailuo-music-v1.5 - - Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. Send the lyrics of the music over as your prompt. Use `--style` to set the style of the generated music - for example, rock and roll, hip-hop, etc. Both prompt/lyrics and style must be sent over for best quality. The prompt supports [intro][verse][chorus][bridge][outro] sections.
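For example (the lyrics are hypothetical; the flag, section tags, and "hip-hop" style example are as documented above): `[verse] Neon rain on empty streets [chorus] We run until the morning light --style hip-hop`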
poe - elevenlabs-music - - The ElevenLabs music model is a generative AI system designed to compose original music from text prompts. It allows creators to specify genres, moods, instruments, and structure, producing royalty-free tracks tailored to their needs. The model emphasizes speed, creative flexibility, and high-quality audio output, making it suitable for use in videos, podcasts, games, and other multimedia projects. This bot can produce songs with suggested lyrics based on general descriptions, exact lyrics if specified as such, or instrumental ones, all via prompting. Use `--music_length_ms` to set the length of the song in milliseconds (10,000 to 300,000 ms). Prompt input cannot exceed 2,000 characters.
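For example (hypothetical prompt; the flag and range are as documented above): `A warm acoustic folk track with soft vocals about leaving home --music_length_ms 60000`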
poe - whisper-v3-large-t 3,000.00 - Whisper v3 Large is a state-of-the-art automatic speech recognition and translation model developed by OpenAI, offering 10–20% lower error rates than its predecessor, Whisper large-v2. It supports transcription and translation across numerous languages, with improvements in handling diverse audio inputs, including noisy conditions and long-form audio files.
poe - stable-audio-2.5 - - Stable Audio 2.5 generates high-quality audio up to 3 minutes long from text prompts, supporting text-to-audio, audio-to-audio transformations, and inpainting with customizable settings like duration, steps, CFG scale, and more. It is ideal for music production, cinematic sound design, and remixing. Note: Audio-to-audio and inpaint modes require a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - If replacing specific parts: `--mode audio-inpaint` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - `--random_seed false --seed [0-4294967294]` disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [4-8]`: Higher = better quality (recommended 6-8) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical) 5. Inpainting control (only for audio-inpaint) - `--mask_start_time [seconds]` start time of the uploaded audio to modify - `--mask_end_time [seconds]` end time of the uploaded audio to modify
poe - stable-audio-2.0 - - Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. It is ideal for creative professionals seeking detailed and extended outputs from simple prompts. Note: Audio-to-audio mode requires a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - `--random_seed false --seed [0-4294967294]` disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [30-100]`: Higher = better quality (recommended 50-80) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
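As a text-to-audio illustration for Stable Audio 2.0 (the prompt is hypothetical; flags and ranges are as documented above; note that Stable Audio 2.5 uses a 4-8 `--steps` range instead): `Slow cinematic orchestral swell with soft strings and a distant choir --duration 45 --cfg_scale 10 --steps 60 --output_format wav`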
poe - hailuo-speech-02 - - Generate speech from text prompts using the MiniMax Speech-02 model. Include `--hd` at the end of your prompt for higher quality output at a higher price. You may set language with `--language`, voice with `--voice`, pitch with `--pitch`, speed with `--speed`, and volume with `--volume`. Please check the UI for allowed values for each parameter.
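For example (hypothetical text; only the documented `--hd` flag is shown, since allowed values for the other parameters are listed in the UI): `Welcome back to the weekly roundup! --hd`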
poe - elevenlabs-v2.5-turbo - - ElevenLabs' leading text-to-speech technology converts your text into natural-sounding speech, using the Turbo v2.5 model. Simply send a text prompt, and the bot will generate audio using your choice of available voices. If you link a URL or a PDF, it will do its best to read it aloud to you. The overall default voice is Jessica, an American-English female. Add --voice "Voice Name" to the end of a message (e.g. "Hello world --voice Eric") to customize the voice used. Add --language and the two-letter ISO-639-1 language code to your message if you notice pronunciation errors; a table of ISO-639-1 codes is here: https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes (e.g. zh for Chinese, es for Spanish, hi for Hindi). The following voices are supported and recommended for each language: English -- Sarah, George, River, Matilda, Will, Jessica, Brian, Lily, Monika Sogam Chinese -- James Gao, Martin Li, Will, River Spanish -- David Martin, Will, Efrayn, Alejandro, Sara Martin, Regina Martin Hindi -- Ranga, Niraj, Liam, Raju, Leo, Manu, Vihana Huja, Kanika, River, Monika Sogam, Muskaan, Saanu, Riya, Devi Arabic -- Bill, Mo Wiseman, Haytham, George, Mona, Sarah, Sana, Laura German -- Bill, Otto, Leon Stern, Mila, Emilia, Lea, Leonie Indonesian -- Jessica, Putra, Mahaputra Portuguese -- Will, Muhammad, Onildo, Lily, Jessica, Alice Vietnamese -- Bill, Liam, Trung Caha, Van Phuc, Ca Dao, Trang, Jessica, Alice, Matilda Filipino -- Roger, Brian, Alice, Matilda French -- Roger, Louis, Emilie Swedish -- Will, Chris, Jessica, Charlotte Turkish -- Cavit Pancar, Sohbet Adami, Belma, Sultan, Mahidevran Romanian -- Eric, Bill, Brian, Charlotte, Lily Italian -- Carmelo, Luca, Alice, Lily Polish -- Robert, Rob, Eric, Pawel, Lily, Alice Norwegian -- Chris, Charlotte Czech -- Pawel Finnish -- Callum, River Hungarian -- Brian, Sarah Japanese -- Alice Prompt input cannot exceed 40,000 characters.
poe - sonic-2.0 - - Generates audio based on your prompt using Cartesia's latest Sonic 2.0 text-to-speech model in your voice of choice (see below). Add --voice [Voice Name] to the end of a message to customize the voice used or to handle different language inputs (e.g. 你好 --voice Chinese Commercial Woman). All of Cartesia's voices are supported on Poe. The following voices are supported covering 15 languages (English, French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish): Here's the alphabetical list of all the top voice names: "1920's Radioman" Aadhya Adele Alabama Man Alina American Voiceover Man Ananya Anna Announcer Man Apoorva ASMR Lady Australian Customer Support Man Australian Man Australian Narrator Lady Australian Salesman Australian Woman Barbershop Man Brenda British Customer Support Lady British Lady British Reading Lady Brooke California Girl Calm French Woman Calm Lady Camille Carson Casper Cathy Chongz Classy British Man Commercial Lady Commercial Man Confident British Man Connie Corinne Customer Support Lady Customer Support Man Dallas Dave David Devansh Elena Ellen Ethan Female Nurse Florence Francesca French Conversational Lady French Narrator Lady French Narrator Man Friendly Australian Man Friendly French Man Friendly Reading Man Friendly Sidekick German Conversational Woman German Conversation Man German Reporter Man German Woman Grace Griffin Happy Carson Helpful French Lady Helpful Woman Hindi Calm Man Hinglish Speaking Woman Indian Lady Indian Man Isabel Ishan Jacqueline Janvi Japanese Male Conversational Joan of Ark John Jordan Katie Keith Kenneth Kentucky Man Korean Support Woman Laidback Woman Lena Lily Whisper Little Gaming Girl Little Narrator Girl Liv Lukas Luke Madame Mischief Madison Maria Mateo Mexican Man Mexican Woman Mia Middle Eastern Woman Midwestern Man Midwestern Woman Movieman Nathan Newslady Newsman New York Man Nico Nonfiction Man Olivia Orion Peninsular Spanish Narrator Lady Pleasant Brazilian Lady Pleasant Man Polite Man Princess Professional Woman Rebecca Reflective Woman Ronald Russian Storyteller Man Salesman Samantha Angry Samantha Happy Sarah Sarah Curious Savannah Silas Sophie Southern Man Southern Woman Spanish Narrator Woman Spanish Reporter Woman Spanish-speaking Reporter Man Sportsman Stacy Stern French Man Steve Storyteller Lady Sweet Lady Tatiana Taylor Teacher Lady The Merchant Tutorial Man Wise Guide Man Wise Lady Wise Man Wizardman Yogaman Young Shy Japanese Woman Zia
poe - gemini-2.5-flash-tts - - Gemini‑2.5‑Flash‑TTS is Google’s low‐latency text‑to‑speech model that converts text input into audio output, supporting both single‑ and multi‑speaker voices with controllable style, accent, and expressive tone — ideal for applications like podcasts, audiobooks, and conversational voice systems. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
poe - gemini-2.5-pro-tts - - Gemini‑2.5‑Pro‑TTS is Google’s highest‑quality text‑to‑speech model preview, designed for complex workflows like podcasts, audiobooks, and customer support; it delivers expressive, accent‑ and style‑controllable single‑ or multi‑speaker speech, supporting over 23 languages, and built for state‑of‑the‑art output with the most powerful model architecture. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
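As a multi-speaker illustration that applies to both Gemini TTS bots (the dialogue and speaker names are hypothetical; the flags, default voices, style prompt example, and `SpeakerName: text` format are as documented above): `Host: Welcome to the show. Guest: Thanks, glad to be here. --mode multi --speaker1_name Host --speaker2_name Guest --voice Charon --voice2 Kore --style_prompt Cheerful tone`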
poe - orpheus-tts - - Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. Send a text prompt to voice it. Use --voice to choose from one of the available voices (`tara`, `leah`, `jess`, `leo`, `dan`, `mia`, `zac`, `zoe`). Officially supported sound effects are: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>, and <giggle>.
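For example (hypothetical text; the voice name and effect tag are as documented above): `It has been such a long day <sigh> but we finally made it. --voice tara`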
poe - deepgram-nova-3 - - Transcribe audio files using Speech-to-Text technology with the Deepgram Nova-3 model, featuring multi-language support and advanced customizable settings. [1] Basic Features: Use `--generate_pdf true` to generate a PDF file of the transcription. Use `--diarize true` to identify different speakers in the audio; this will automatically enable utterances. Use `--smart_format false` to disable automatic text formatting for improved readability, including punctuation and paragraphs; this feature is enabled by default. [2] Advanced Features: Use `--dictation true` to convert spoken commands for punctuation into their respective marks (e.g., 'period' becomes '.'); this will automatically enable punctuation. Use `--measurements true` to format spoken measurement units into abbreviations. Use `--profanity_filter true` to replace profanity with asterisks. Use `--redact_pci true` to redact payment card information. Use `--redact_pii true` to redact personally identifiable information. Use `--utterances true` to segment speech into meaningful semantic units. Use `--paragraphs false` to disable the paragraphs feature, which splits audio into paragraphs to improve transcript readability and automatically enables punctuation; enabled by default. Use `--punctuate false` to disable the punctuate feature, which adds punctuation and capitalization to your transcript; enabled by default. Use `--numerals false` to disable the numerals feature, which converts numbers from written format to numerical format. [3] Languages Supported: Auto-detect (Default) English Spanish French German Italian Portuguese Japanese Chinese Hindi Russian Dutch [4] Key Terms: `--keyterm` to enter important terms to improve recognition accuracy, separated by commas. English only, limited to 500 tokens total.
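For example (hypothetical usage; flags as documented above), attach an audio file and send: `--diarize true --generate_pdf true`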
poe - playai-tts - - Generates audio based on your prompt using PlayHT's text-to-speech model, in the voice of your choice. Use --voice [voice_name] to pass in the voice of your choice, choosing one from below. Voice defaults to `Jennifer_(English_(US)/American)`. Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli)
poe - unreal-speech-tts - - Convert chats, URLs, and documents into natural speech. 8 Languages: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese. Use `--voice <VOICE_NAME>`. Defaults to `--voice Sierra`. Full list below: American English - Male: Noah, Jasper, Caleb, Ronan, Ethan, Daniel, Zane, Rowan - Female: Autumn, Melody, Hannah, Emily, Ivy, Kaitlyn, Luna, Willow, Lauren, Sierra British English - Male: Benjamin, Arthur, Edward, Oliver - Female: Eleanor, Chloe, Amelia, Charlotte Japanese - Male: Haruto - Female: Sakura, Hana, Yuki, Rina Chinese - Male: Wei, Jian, Hao, Sheng - Female: Mei, Lian, Ting, Jing Spanish - Male: Mateo, Javier - Female: Lucía French - Female: Élodie Hindi - Male: Arjun, Rohan - Female: Ananya, Priya Italian - Male: Luca - Female: Giulia Portuguese - Male: Thiago, Rafael - Female: Camila
poe - imagen-4-ultra 42,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-exp-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
poe - imagen-4-fast 14,000.00 - DeepMind's June 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-fast-generate-preview-06-06` model from Google Vertex, and has a maximum input of 480 tokens.
poe - imagen-4 28,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
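For example (hypothetical prompt; the flag applies to all three Imagen 4 bots as documented above): `A sunlit reading nook filled with plants, soft film grain --aspect_ratio 16:9`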
poe - phoenix-1.0 17,000.00 - High-fidelity image generation with strong prompt adherence, especially for long and detailed instructions. Phoenix is capable of rendering coherent text in a wide variety of contexts. Prompt enhance is on by default to show the full power of a long, detailed prompt, but it can be turned off for full control. Uses the Phoenix 1.0 Fast model for performant, high-quality generations. Parameters: - Aspect Ratio (1:1, 3:2, 2:3, 9:16, 16:9) - Prompt Enhance (enhances the prompt for better image generation) - Style (please see parameter control to identify available styles) Image generation prompts can be a maximum of 1500 characters.
poe - dreamina-3.1 - - ByteDance's Dreamina 3.1 Text-to-Image showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details. This model excels with long, detailed prompts; use longer prompts if you run into Content Checker issues. The model does not accept attachments. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, & 9:16.
poe - qwen-image 20,000.00 - Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Experiments show strong general capabilities in image generation, with exceptional performance in text rendering, especially for Chinese. Prompt input cannot exceed 2,000 characters.
poe - qwen-image-20b - - Qwen-Image (20B) is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt.
poe - hunyuan-image-2.1 - - Hunyuan Image 2.1 is a high-quality, highly efficient text-to-image model. Send a prompt to generate an image. Use `--aspect` (one of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`) to set the aspect ratio of the generated image. Use `--negative_prompt` (examples: blur, low resolution, poor quality) to set a negative prompt on the generated image. This bot does not accept attachments.
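For example (hypothetical prompt; the flags apply to both Qwen-Image (20B) and Hunyuan Image 2.1 as documented above): `A hand-painted storefront sign reading "Fresh Tea", morning light --aspect 4:3 --negative_prompt blur, low resolution`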
poe - flux-kontext-max - - FLUX.1 Kontext [max] is a new premium model from Black Forest Labs that brings maximum performance across all aspects. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image-generation. Available aspect ratio (21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21)
poe - flux-kontext-pro - - The FLUX.1 Kontext [pro] model delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, flawless typography, and image editing capabilities. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image-generation. Available aspect ratio (21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21)
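For example (hypothetical prompts; the `--aspect` flag applies to both FLUX.1 Kontext bots as documented above): `A vintage travel poster of a coastal town at sunset --aspect 3:4`, or send an image with an editing instruction such as `Replace the sky with a starry night`.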
poe - flux-krea - - FLUX-Krea is a version of FLUX Dev tuned for superior aesthetics. Use "--aspect" to select an aspect ratio (e.g --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Krea Redux.
poe - imagen-3 28,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For simpler prompts, faster results, & lower cost, use @Imagen3-Fast. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
poe - wan-animate - - Wan Animate takes in an image and a video to generate another video where a character in the image replaces a character in the video (default), or the video character's motion is used to animate the character in the image; pass --animate for the second mode. The bot supports only four file types: JPEG, PNG, WebP, and MP4.
poe - imagen-3-fast 14,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts, optimized for short, simple prompts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For more complex prompts, use @Imagen3. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
poe - seedream-3.0 - - Seedream 3.0 by ByteDance is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation.
poe - seedance-1.0-pro - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`). Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution. Use `--duration` (3 to 12) to set the video duration. The number of video tokens calculated for pricing is approximately `height * width * fps * duration / 1024`.
poe - seedance-1.0-lite - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, and `9:16`). Use `--resolution` (one of `480p`, `720p`, and `1080p`) to set the video resolution. Use `--duration` (3 to 12) to set the video duration. The number of video tokens calculated for pricing is approximately `height * width * fps * duration / 1024`.
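For example (hypothetical prompt; the flags apply to both Seedance bots as documented above): `A paper boat drifting down a rain-filled gutter, macro shot --aspect 16:9 --resolution 720p --duration 5`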
poe - ideogram-v3 - - Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. Use `--aspect` to set the aspect ratio (valid aspect ratios are 5:4, 4:3, 4:5, 1:1, 1:2, 1:3, 3:4, 3:1, 3:2, 2:1, 2:3, 16:9, 16:10, 10:16, 9:16), and use `--style` to specify a style (one of `AUTO`, `GENERAL`, `REALISTIC`, and `DESIGN`; default: `AUTO`). Send one image with a prompt for image remixing/restyling. Send two images (one an image and the other a black-and-white mask image denoting an area) for image editing.
poe - ideogram-v2 57,000.00 - Latest image model from Ideogram, with industry-leading capabilities in generating realistic images, graphic design, typography, and more. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, 1:1. The "--style" parameter can be set to specify the style of the generated image (GENERAL, REALISTIC, DESIGN, RENDER_3D, ANIME). Powered by Ideogram.
poe - flux-dev-di 5,000.00 - High quality image generator using the FLUX dev model. Top-of-the-line prompt following, visual quality and output diversity. This model is text-to-image only and does not accept attachments. Optional parameters: "--width" (128 to 1920 pixels; default: 1024), "--height" (128 to 1920 pixels; default: 1024), "--seed" for reproducible results (1 up to 2**32; default: random), and "--num_inference_steps" (1 to 50; default: 25). Example message: "A misty forest at dawn --width 1280 --height 720 --seed 42"
poe - flux-schnell-di 990.00 - This is the fastest version of FLUX, featuring highly optimized abstract models that excel at creative and unconventional renders. Optional parameters: "--width" (128 to 1920 pixels; default: 1024), "--height" (128 to 1920 pixels; default: 1024), "--seed" for reproducible results (1 up to 2**32; default: random), and "--num_inference_steps" (1 to 50; default: 1).
poe - flux-pro-1.1 - - State-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is the most powerful version of FLUX 1.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - luma-photon-flash - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards, not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
poe - hidream-i1-full - - HiDream-I1 is a state-of-the-art text-to-image model by HiDream. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Use `--negative_prompt` to set a negative prompt. Hosted by fal.ai.
poe - retro-diffusion-core - - Generate true game-ready pixel art in seconds at any resolution between 16x16 and 512x512, across a variety of styles. Create 48x48 walking animations of sprites using the "animation__four_angle_walking" style! First 50 basic image requests' worth of points free! Check out more settings below 👇 Example message: "A cute corgi wearing sunglasses and a party hat --ar 128:128 --style rd_fast__portrait" Settings: --ar <width>:<height> (image size in pixels; larger images cost more. Also accepts an aspect ratio like 16:9) --style <style_name> (the name of the style you want to use. Available styles: rd_fast__anime, rd_fast__retro, rd_fast__simple, rd_fast__detailed, rd_fast__game_asset, rd_fast__portrait, rd_fast__texture, rd_fast__ui, rd_fast__item_sheet, rd_fast__mc_texture, rd_fast__mc_item, rd_fast__character_turnaround, rd_fast__1_bit, animation__four_angle_walking, rd_plus__default, rd_plus__retro, rd_plus__watercolor, rd_plus__textured, rd_plus__cartoon, rd_plus__ui_element, rd_plus__item_sheet, rd_plus__character_turnaround, rd_plus__isometric, rd_plus__isometric_asset, rd_plus__topdown_map, rd_plus__top_down_asset) --seed (random number; keep the same for consistent generations) --tile (creates seamless edges on applicable images) --tilex (seamless horizontally only) --tiley (seamless vertically only) --native (returns pixel art at native resolution, without upscaling) --removebg (automatically removes the background) --iw <decimal between 0.0 and 1.0> (controls how strong the image guidance is: 0.0 for small changes, 1.0 for big changes) Additional notes: all styles have a size range of 48x48 to 512x512, except for the "mc" styles, which have a size range of 16x16 to 128x128, and the "animation__four_angle_walking" style, which only creates 48x48 animations.
poe - stablediffusion3.5-l - - Stability.ai's StableDiffusion3.5 Large, hosted by @fal, is the Stable Diffusion family's most powerful image generation model in terms of both image quality and prompt adherence. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16.
poe - flux-schnell - - Turbo speed image generation with strengths in prompt following, visual quality, image detail and output diversity. This is the fastest version of FLUX.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - gpt-image-1 - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. For a conversational editing experience, use https://poe.com/GPT-4o (all users) or https://poe.com/Assistant (subscribers) instead. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): aspect ratio of the output image. `--quality` (options: high, medium, low): image resolution. `--use_mask`: indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas marking the parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
poe - gpt-image-1-mini - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): aspect ratio of the output image. `--quality` (options: high, medium, low): image resolution. `--use_mask`: indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas marking the parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
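For the `--use_mask` option in the two GPT-Image entries above, the mask is an image the same size as the base with zero-alpha (transparent) pixels over the region to edit. A minimal Pillow sketch of building one; the file names, attachment order, and edit rectangle are illustrative assumptions:

```
from PIL import Image, ImageDraw

base = Image.open("base.png").convert("RGBA")
mask = Image.new("RGBA", base.size, (255, 255, 255, 255))  # fully opaque = keep as-is

# Zero-alpha pixels mark the region the model may repaint.
ImageDraw.Draw(mask).rectangle([100, 100, 300, 300], fill=(0, 0, 0, 0))
mask.save("mask.png")  # attach base.png, then mask.png, and add --use_mask
```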
poe - veo-3.1 - - Google’s Veo 3.1 is an updated version of the Veo family of models that features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`; defaults to `16:9`); `--no` to set a negative prompt on elements to avoid (e.g. `--no blurry`, `--no cloudy`); `--duration` to set the duration (one of `4s`, `6s`, or `8s`; defaults to `8s`); `--seed` to set the seed (a number value); `--reference-mode` to use input images (3 max) as references for video generation. For first & last frame video generation and reference support, please use www.poe.com/Veo-v3.1
poe - veo-3.1-fast - - Google’s Veo 3.1 Fast is an updated version of the Veo family of models that's optimized for speed and cost, but still features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`; defaults to `16:9`); `--no` to set a negative prompt on elements to avoid (e.g. `--no blurry`, `--no cloudy`); `--duration` to set the duration (one of `4s`, `6s`, or `8s`; defaults to `8s`); `--seed` to set the seed (a number value). For first & last frame video generation support, please use www.poe.com/Veo-v3.1-Fast
poe - sora-2-pro - - Sora 2 Pro is OpenAI’s state-of-the-art video and audio generation model, capable of creating richly detailed, dynamic clips with synchronized audio from natural language prompts or images. It builds on Sora 2’s capabilities with enhanced physical accuracy, intricate world-state persistence, and higher fidelity in cinematic styles. The model excels at generating synchronized dialogue, sound effects, and realistic simulations, all while adhering to real-world physics. Sora 2 Pro also supports seamless editing, complex multi-shot prompt execution, and the integration of real-world elements like people, animals, and objects with unparalleled detail and accuracy. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): video output duration in seconds. `--size` (options: [Landscape] 1280x720, 1792x1024; [Portrait] 720x1280, 1024x1792): resolution of the output video.
poe - sora-2 - - Sora 2 is OpenAI’s latest video and audio generation model, delivering exceptional realism, physical accuracy, and controllability. It excels at creating cinematic scenes, synchronized dialogue, sound effects, and dynamic simulations while faithfully adhering to the laws of physics. The model supports editing, multi-shot prompt adherence, and the integration of real-world elements, such as people, animals, and objects. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): video output duration in seconds. `--size` (options: [landscape] 1280x720; [portrait] 720x1280): resolution of the output video.
poe - kling-2.5-turbo-std - - Generate high-quality videos from images using Kling 2.5 Turbo Standard. Optional parameters: Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--duration` to set either a 5 or 10 second video. Note: only image-to-video is supported; the aspect ratio is inferred automatically from the input image and cannot be set. Supported image file formats: JPEG, PNG, WebP.
poe - wan-2.6 - - WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling: it creates high-fidelity videos from text and/or images while keeping characters and style consistent across scenes. It also supports native audio-visual sync (including lip-sync) and can generate or align dialogue/music/SFX with the visuals, enabling “prompt-to-video” results that feel production-ready without heavy post work. Notes: - This model is served from the Singapore area. - Upload an image to enable image-to-video generation or video(s) for video-to-video generation. - Responses may take upwards of 5 minutes (or more) to finish generating. Parameter controls available: 1. Video Settings - `--resolution 1080p` (default) or `--resolution 720p` - `--aspect_ratio 16:9` (default), `9:16`, `1:1`, `4:3`, or `3:4` (ignored for image-to-video, which uses the input image's aspect ratio) - `--duration [5, 10, or 15]` seconds (default: 5) (video-to-video limited to 10s max) 2. Advanced Settings - `--prompt_extend true` (default) or `--prompt_extend false`: AI prompt enhancement - `--audio true` (default) or `--audio false`: enable/disable audio generation - `--shot_type multi` (default) or `--shot_type single`: multi-shot narrative vs. single continuous shot - `--seed [0-2147483646]`: random seed for reproducibility - `--negative_prompt "text"`: describe what you don't want in the video 3. Attachments - For i2v: attach an image as the first frame - For r2v: attach 1-3 reference videos (2-30 seconds each, MP4/MOV) (use `character1`, `character2`, `character3` in the prompt to reference subjects; e.g. character1 references the subject in the first uploaded video) - For t2v/i2v: optionally attach an audio file (3-30 seconds, max 15 MB, .mp3/.wav) for custom audio 4. Multi-Shot Prompting - For multi-shot mode, use timeline syntax: `[Shot #] [Timestamp] [Action]`. Example: `[Shot 1] [0-5s] Wide shot of city skyline. [Shot 2] [5-10s] Close-up of character walking.` - Ensure timestamps match your selected duration and use transition keywords like "Hard cut" or "Fade in" between shots.
poe - seedream-4.0 - - Seedream 4.0 is ByteDance's latest and best text-to-image model, capable of impressive high fidelity image generation, with great text-rendering ability. Seedream 4.0 can also take in multiple images as references and combine them together or edit them to return an output. Pass `--aspect` to set the aspect ratio for the model (One of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`).
poe - kling-2.5-turbo-pro - - Generate high-quality videos from text and images using Kling 2.5 Turbo Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`, only works for text-to-video). Use `--duration` to set either 5 or 10 second video.
poe - kling-2.1-master - - Kling 2.1 Master: the premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either a 5 or 10 second video.
poe - hailuo-02 - - Hailuo-02 is MiniMax's latest video generation model. It generates 6-second, 768p videos: just submit a text prompt, or an image plus a prompt describing the desired video behavior, and it will create it; generation typically takes ~5 minutes. Strong motion effects and ultra-clear quality.
poe - hailuo-02-standard - - MiniMax Hailuo-02 Video Generation model: Advanced image-to-video generation model with 768p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Use `--duration` to set the video duration (6 or 10 seconds).
poe - hailuo-02-pro - - MiniMax Hailuo-02 Pro Video Generation model: Advanced image-to-video generation model with 1080p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Generates a 5-second video.
poe - deepseek-r1-turbo-di 15,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. Turbo model is quantized to achieve higher speeds. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
poe - hailuo-director-01 - - Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control. Both text-to-video and image-to-video are supported. Camera movement instructions can be added using square brackets (e.g. [Pan left] or [Zoom in]). You can use up to 3 combined movements per prompt. Duration is fixed to 5 seconds. Supported movements: Truck left/right, Pan left/right, Push in/Pull out, Pedestal up/down, Tilt up/down, Zoom in/out, Shake, Tracking shot, Static shot. For example: [Truck left, Pan right, Zoom in]. For a more detailed guide, refer to https://sixth-switch-2ac.notion.site/T2V-01-Director-Model-Tutorial-with-camera-movement-1886c20a98eb80f395b8e05291ad8645
poe - pixverse-v5 - - Pixverse v5 offers advanced creative tools with three main features: Text-to-Video, which transforms written prompts into cinematic, high-detail video clips with fluid motion and accurate visual interpretation; Image-to-Video, which animates static images into dynamic short videos with lifelike motion and smooth transitions; and Transition, which generates seamless morphs between frames or scenes to create unified, professional-quality visual flow. Parameter Controls and Usage: 1. Video Generation (Main Control Section) - `--resolution [360p|540p|720p|1080p]` - Description: Video resolution. - Default: 720p - `--duration [5|8]` - Description: Video length in seconds. - Default: 5 - `--aspect_ratio [16:9|4:3|1:1|3:4|9:16]` - Description: Video aspect ratio. - Default: 16:9 - `--style [none|anime|3d_animation|clay|comic|cyberpunk]` - Description: Video style (optional). - Default: none - `--negative_prompt "[text]"` - Description: Elements to avoid (optional). - Default: "" (empty) - `--seed [integer]` - Description: Optional seed for reproducibility (e.g., 12345). - Default: "" (empty/random) 2. Generation Modes (Determined by attachments) - Text-to-Video: Provide a prompt with 0 image attachments. - Image-to-Video: Provide 1 image attachment. - Transition: Provide 2 image attachments (first is start frame, second is end frame). 3. Limitations - The combination of `--resolution 1080p` and `--duration 8` is not supported. - Only 0, 1, or 2 image attachments are supported. - Attachments must be images (PNG/JPEG/WEBP/TIFF/BMP/HEIC/GIF).
poe - wan-2.5 - - Wan-2.5 Video Generation bot. Has text-to-video and image-to-video capabilities. Optionally, send an audio file (MP3) to guide the video generation. Optional parameters: control the output resolution with `--resolution` (480p, 720p, or 1080p; defaults to 720p). Pricing varies by resolution. Set the aspect ratio with `--aspect` (16:9, 1:1, or 9:16; defaults to 16:9) and the duration with `--duration` (5s or 10s; defaults to 5s).
poe - pixverse-v4.5 - - Pixverse v4.5 is a video generation model capable of generating high quality videos in under a minute. Use `--negative_prompt` to set the negative prompt. Use `--duration` to set the video duration (5 or 8 seconds). Set the resolution (360p, 540p, 720p, or 1080p) using `--resolution`. Send 1 image to perform an image-to-video task or a video effect generation task, and 2 images to perform a video transition task, using the first image as the first frame and the second image as the last frame. Use `--effect` to set the video generation effect, provided 1 image is given (options: `Kiss_Me_AI`, `Kiss`, `Muscle_Surge`, `Warmth_of_Jesus`, `Anything,_Robot`, `The_Tiger_Touch`, `Hug`, `Holy_Wings`, `Hulk`, `Venom`, `Microwave`). Use `--style` to set the video generation style (for text-to-video, image-to-video, and transition only; options: `anime`, `3d_animation`, `clay`, `comic`, `cyberpunk`). Use `--seed` to set the seed and `--aspect` to set the aspect ratio.
poe - flux-dev - - High-performance image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is a more efficient version of FLUX-pro, balancing quality and speed. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, and 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
poe - lyria - - Google DeepMind's Lyria 2 delivers high-quality audio generation, capable of creating diverse soundscapes and musical pieces from text prompts. Allows users to specify elements to exclude in the audio using the "--no" parameter at the end of the prompt. Also supports "--seed" for deterministic generation. e.g. "An energetic electronic dance track --no vocals, slow tempo --seed 123". Lyria blocks prompts that name specific artists or songs (artist-intent and recitation checks). This bot does not support attachments. This bot accepts input prompts of up to 480 tokens.
poe - kling-1.6-pro - - Kling v1.6 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
poe - clarity-upscaler - - Upscales images with high fidelity to the original image. Use "--upscale_factor" (a number between 1 and 4) to set the upscaled image's size (2 means the output image is 2x the size, etc.). "--creativity" and "--clarity" can be set between 0 and 1 to alter the faithfulness to the original image and the sharpness, respectively. This bot supports .jpg and .png images.
poe - topazlabs 30.00 - Topaz Labs’ image upscaler is a best-in-class generative AI model that increases the overall clarity and pixel count of input photos, whether generated by AI image models or captured in the real world, while preserving the original photo’s contents. It can produce images as small as ~10MB and as large as 512MB, depending on the size of the input photo. Specify --upscale with a number up to 16 to control the upscaling factor, --output_height and/or --output_width to specify the number of pixels for each dimension, and add --generated if the input photo is AI-generated. With no parameters specified, it will increase both the input photo's height and width by 2x; it is especially effective on images of human faces.
poe - veo-v3.1 - - Google's Veo 3.1 is an improved version of Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`); T2V only. Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task. Pass up to 3 images with `--reference` for a reference-to-video task; reference images will be used directly in the video generation.
poe - veo-v3.1-fast - - Google's Veo 3.1 Fast is a fast version of Veo 3.1. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`); T2V only. Use `--duration` to set the duration (`4s`, `6s`, or `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task.
poe - wan-2.2 - - Wan-2.2 is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Send one image for image-to-video tasks, and send two images for first-frame-to-last-frame generation. Use `--aspect` to set the aspect ratio (One of `16:9`, `1:1`, `9:16`) for text-to-video requests. Duration is limited to 5 seconds, with up to 720p resolution.
poe - ltx-2-fast - - LTX-2 Fast is a video model by Lightricks that delivers exceptional quality and speed. It can generate videos at up to 50 FPS in high resolutions and supports both text-to-video and image-to-video generation. Optional parameters: Use `--generate-audio` to generate audio with the video (disabled by default). Set the resolution with `--resolution` (one of `1080p`, `1440p`, `2160p`; default: 1080p). Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`; default: 6s). Duration and resolution values change the price. Set the fps of the generated video with `--fps` (25 or 50; default: 25). File attachments accepted: JPEG, PNG, WebP.
poe - ltx-2-pro - - LTX-2 Pro is an advanced video generation model by Lightricks designed for professional-grade results. It offers high-quality, realistic video generation at exceptional speed and supports outputs up to 2K resolution. Perfect for both text-to-video and image-to-video creation, it delivers cinematic detail and smooth performance. Optional parameters: Use `--generate_audio` to generate audio with the video (disabled by default). Set the resolution with `--resolution` (one of `1080p`, `1440p`, `2160p`; default: 1080p). Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`; default: 6s). Duration and resolution values change the price. Set the fps of the generated video with `--fps` (25 or 50; default: 25). File attachments accepted: JPEG, PNG, WebP.
poe - veo-3 - - Veo 3 produces incredibly high-quality videos across a diverse range of subjects and styles. It incorporates an enhanced understanding of real-world physics and the subtleties of human movement and expression, resulting in greater detail and overall realism. Veo 3 is fluent in the unique language of cinematography: you can request a specific genre, specify a lens, or suggest cinematic effects, and Veo 3 will deliver stunning 8-second video clips. It supports both text-to-video and image-to-video generation and also features native audio generation based on text prompts. Please note that Veo 3 does not accept audio attachments. To exclude specific elements, use --no followed by a negative prompt (e.g., blurry, cloudy, or other attributes). To set a specific seed value, use `--seed` followed by the desired number (e.g., --seed 2). To set the aspect ratio, use `--aspect_ratio` followed by either 16:9 or 9:16. To set the duration, use `--duration` followed by one of 4s, 6s, or 8s.
poe - veo-3-vfast - - Veo-3 Fast is a faster and more cost-effective version of Google's Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `1:1`, `9:16`). Use `--generate_audio` to generate audio with your video at a higher cost. Use `--negative_prompt` to set a negative prompt option (`blur`, `low resolution`, `poor quality`). Duration is limited to 7 seconds. This is a text-to-video generation model only.
poe - vidu - - The Vidu Video Generation Bot creates videos using images and text prompts. You can generate videos in four modes: (1) Image-to-Video: send 1 image with a prompt, (2) Start-to-End Frame: send 2 images with a prompt for transition videos, (3) Reference-to-Video: send up to 3 images with the `--reference` flag for guidance, and (4) Template-to-Video: use `--template` to apply pre-designed templates (1-3 images required; pricing varies by template). The number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, and all other templates accept only 1 image. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude, and accepts PNG, JPEG, and WEBP formats. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video). Duration is limited to 5 seconds.
poe - vidu-q1 - - The Vidu Q1 Video Generation Bot creates videos using text prompts and images. You can generate videos in three modes: (1) Text-to-Video: send a text prompt, (2) Image-to-Video: send 1 image with a prompt, and (3) Reference-to-Video: send up to 7 images with the `--reference` flag. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude, which can be customized for text-to-video and reference-to-video tasks. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video generation). The bot accepts PNG, JPEG, and WEBP formats. Duration is limited to 5 seconds.
poe - veo-3-fast - - Veo 3 Fast is a speed-optimized variant of Google’s Veo 3 AI video generation engine, designed for rapid, cost-efficient production of short clips with synchronized audio (dialogue, ambient sound, effects). It prioritizes faster generation times while still delivering solid visual and audio quality, supports text-to-video and image-to-video workflows (allowing creators to animate still images into motion sequences), and operates under defined constraints: video lengths of 4, 6, or 8 seconds, specified via the --duration parameter (e.g. "A cat dances --duration 6" will produce a 6-second video). Use `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`. Please only upload photos that you own or have the right to use; otherwise the bot will throw an error.
poe - seedance-1.0-pro-fast - - Seedance Pro Fast is a faster version of Seedance 1.0 Pro that balances speed, quality and cost. Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`; default: `16:9`). Use `--resolution` (one of `480p`, `720p`, `1080p`; default: `1080p`) to set the video resolution. Use `--duration` (3 to 12; default: 5) to set the video duration. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`. File attachments accepted: JPEG, PNG, WebP.
poe - sora - - Sora is OpenAI's video generation model. Use `--duration` to set the duration of the generated video, and `--resolution` to set the video's resolution (480p, 720p, or 1080p). Set the aspect ratio of the generated video with `--aspect` (Valid aspect ratios are 16:9, 1:1, 9:16). This is a text-to-video model only. Switch to the newest models for improved video and audio creation: https://poe.com/Sora-2-Pro for cinematic excellence or https://poe.com/Sora-2 for unmatched realism and precision.
poe - omnihuman - - OmniHuman, by ByteDance, generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. Send an image including a human figure with a visible face, plus an audio file, and the bot will return a video. The maximum audio length accepted is 30 seconds.
poe - grok-code-fast-1 - - Grok-Code-Fast-1 from xAI is a high-performance, cost-efficient model designed for agentic coding. It offers visible reasoning traces, strong steerability, and supports a 256k context window.
poe - bagoodex-web-search - - Bagoodex delivers real-time AI-powered web search offering instant access to videos, images, weather, and more. Audio and video uploads are not supported at this time.
poe - deep-ai-search - - Deep search engine integrating Brave AI with real-time web search. This chatbot executes commands and scrapes websites at scale while preserving its hallmark intelligence advantage. The bot doesn't accept file attachments. Examples: https://poe.com/s/P0BQmsvbE7zusdY0n49l https://poe.com/s/QgQSPsLD9efQrIwbmwuO
poe - kling-avatar-pro - - Create lifelike avatar videos featuring realistic humans, animals, cartoons, or stylized characters. Simply upload an image and an audio file to generate a video of your character speaking. Supported file formats: Images: JPEG, PNG, WEBP; Audio: MP3, WAV.
poe - playai-dialog - - Generates dialogues based on your script using PlayHT's text-to-speech model, in the voices of your choice. Use --speaker_1 [voice_name] and --speaker_2 [voice_name] to pass in the voices of your choice, choosing from below. Voice defaults to `Jennifer_(English_(US)/American)`. Follow the below format while prompting (case sensitive): FORMAT: ``` Speaker 1: ...... Speaker 2: ...... Speaker 1: ...... Speaker 2: ...... --speaker_1 [voice_1] --speaker_2 [voice_2] ``` VOICES AVAILABLE: Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli) Prompt input cannot exceed 10,000 characters.
poe - luma-photon - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards, not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
poe - ideogram 45,000.00 - Excels at creating high-quality images from text prompts. For most prompts, https://poe.com/Ideogram-v2 will produce better results. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, & 1:1.
poe - seededit-3.0 - - SeedEdit 3.0 is an image editing model independently developed by ByteDance. It accurately follows editing instructions and effectively preserves image content, and is especially strong on real images. Please send an image with a prompt to edit the image.
poe - kling-2.1-pro - - Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`. Requires an image attachment.
poe - kling-2.1-std - - Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`.
poe - runway-gen-4-turbo - - Runway's Gen-4 Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 1:1, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds. Full prompting guide here: https://help.runwayml.com/hc/en-us/articles/39789879462419-Gen-4-Video-Prompting-Guide
poe - runway - - Runway's Gen-3 Alpha Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds.
poe - veo-2 - - Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver in 8-second clips. Use `--aspect_ratio` (16:9 or 9:16) to customize the video aspect ratio. Supports text-to-video as well as image-to-video. Non-English input will be translated first. Note: currently has a low rate limit, so you may need to retry your request at times of peak usage.
poe - dream-machine 360,000.00 - Luma AI's Dream Machine is an AI model that makes high-quality, realistic videos fast from text and images. Iterate at the speed of thought, create action-packed shots, and dream worlds with consistent characters on Poe today! To specify the aspect ratio of your video add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21). To loop your video add --loop True.
poe - kling-2.0-master - - Generate high-quality videos from text or images using Kling 2.0 Master. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either 5 or 10 second video.
poe - qwen-edit - - Image editing model based on Qwen-Image, with superior text editing capabilities.
poe - gptzero - - GPTZero is a deep-learning-driven platform designed to analyze and flag portions of text that are likely generated by AI vs. human authors. It distinguishes between “entirely human,” “entirely AI,” or “mixed” content and highlights the specific sentences involved. *The max number of files that can be submitted simultaneously is 50, and the max file size for all files combined is 15 MB. Each file's document will be truncated to 50,000 characters. Supported file types: PDF, DOC/DOCX, TXT, ODT. Parameter controls available: 1. Detection Options - Multilingual (FR/ES): - `--multilingual true` (enables the GPTZero multilingual model) - `--multilingual false` (default/disabled) - Model Version: - `--modelVersion [version_string]` (selects a specific GPTZero model version, e.g., '2025-10-30-base') - `--modelVersion __latest__` (default: automatically uses the latest model version)
poe - kling-pro-effects - - Generate videos with effects like squishing an object, two people hugging, making heart gestures, etc. using Kling-Pro-Effects. Requires an image input. Send a single image for the `squish` and `expansion` effects and two images (of people) for the `hug`, `kiss`, and `heart_gesture` effects. Set the effect with `--effect` (default: `squish`). Set the duration with `--duration` (5s or 10s; default: 5s).
poe - hailuo-live - - Hailuo Live, the latest model from MiniMax, sets a new standard for bringing still images to life. From breathtakingly vivid motion to finely tuned expressions, this state-of-the-art model enables your characters to captivate, move, and shine like never before. It excels at bringing art and drawings to life, with exceptional realism without morphing, a wide emotional range, and unparalleled character consistency. Generates a 5-second video.
poe - hailuo-ai - - Best-in-class text and image to video model by MiniMax.
poe - ray2 - - Ray2 is a large-scale generative video model capable of creating realistic visuals with natural, coherent motion. It has a strong understanding of text instructions and can also take image input. It can produce videos from 540p to 4K resolution, with durations of either 5 or 9 seconds.
poe - veo-2-video - - Veo2 is Google's cutting-edge video generation model. Veo creates videos with realistic motion and high quality output.
poe - wan-2.1 - - Wan-2.1 is a text-to-video and image-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts. Generates a 5-second video.
poe - ideogram-v2a-turbo 24,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For higher quality, use https://poe.com/Ideogram-v2A. Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
poe - ideogram-v2a 39,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For faster and more cost-effective generations, use https://poe.com/Ideogram-v2A-Turbo. Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER`, and `ANIME`; default: `GENERAL`).
poe - trellis-3d - - Generate 3D models from your images using Trellis, a native 3D generative model enabling versatile and high-quality 3D asset creation. Send an image to convert it into a 3D model.
poe - flux-dev-finetuner - - Fine-tune the FLUX dev model with your own pictures! Upload 8-12 of them (same subject, only one subject in the picture, ideally from different poses and backgrounds) and wait ~2-5 minutes to create your own finetuned bot that will generate pictures of this subject in whatever setting you want.
poe - flux-inpaint - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
poe - flux-fill - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
poe - bria-eraser - - Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Send an image and a black-and-white mask image denoting the objects to be cleared out from the image. The input prompt is only used to create the filename of the output image.
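The mask-based bots above (FLUX-Inpaint, FLUX-Fill, Bria Eraser) all take a black-and-white mask as the second image, with white marking the region to repaint or remove. A minimal Pillow sketch of producing one; the file names and the elliptical region are illustrative assumptions:

```
from PIL import Image, ImageDraw

base = Image.open("photo.png")
mask = Image.new("L", base.size, 0)  # 8-bit grayscale, all black = leave untouched
ImageDraw.Draw(mask).ellipse([200, 150, 400, 350], fill=255)  # white = repaint/remove
mask.save("mask.png")  # send photo.png first, then mask.png, plus the prompt
```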
poe - aya-vision 30.00 - Aya Vision is a 32B open-weights multimodal model with advanced capabilities optimized for a variety of vision-language use cases. It is trained to excel in 23 languages in both vision and text: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
poe - kling-1.5-pro - - Kling v1.5 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
poe - deepreasoning - - DeepReasoning (previously DeepClaude) is a high-performance LLM inference that combines DeepSeek R1's Chain of Thought (CoT) reasoning capabilities with Anthropic Claude's creative and code generation prowess. It provides a unified interface for leveraging the strengths of both models while maintaining complete control over your data. Learn more: https://deepclaude.com/
poe - gemma-3-27b - - Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, successor to Gemma 2.
poe - qwen3-32b-cs 3,600.00 - World’s fastest inference for Qwen 3 32B with Cerebras. Append /no_think to your prompt to disable the model's default reasoning behavior.
poe - qwen-2.5-vl-32b 6,600.00 - Qwen2.5-VL-32B's mathematical and problem-solving capabilities have been strengthened through reinforcement learning, leading to a significantly improved user experience. The model's response styles have been refined to better align with human preferences, particularly for objective queries involving mathematics, logical reasoning, and knowledge-based Q&A. As a result, responses now feature greater detail, improved clarity, and enhanced formatting.
poe - qwen2.5-vl-72b-t 8,700.00 - Qwen 2.5 VL 72B, a cutting-edge multimodal model from the Qwen Team, excels in visual and video understanding, multilingual text/image processing (including Japanese, Arabic, and Korean), and dynamic agentic reasoning for automation. It supports long-context comprehension (32K tokens).
poe - mistral-small-3 0.10 0.30 Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks: those that require robust language and instruction-following performance with very low latency. Released under an Apache 2.0 license and comparable to Llama-3.3-70B and Qwen2.5-32B-Instruct.
poe - deepseek-v3-di 4,300.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
poe - deepseek-v3-turbo-di 5,900.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. Turbo variant is quantized to achieve higher speeds. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
poe - phi-4-di 300.00 - Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 16k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
poe - mistral-7b-v0.3-di 150.00 - Mistral Instruct 7B v0.3 from Mistral AI. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
poe - aya-expanse-32b 5,100.00 - Aya Expanse is a 32B open-weight research release of a model with highly advanced multilingual capabilities. Aya supports state-of-art generative capabilities in 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
poe - liveportrait - - Animates a given portrait with the motions in the provided video. Powered by fal.ai.
poe - llama-3.1-8b-t-128k 3,000.00 - Llama 3.1 8B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
poe - stablediffusion3-2b - - Stable Diffusion v3 Medium - by fal.ai
poe - mixtral8x22b-inst-fw 3,600.00 - Mixtral 8x22B Mixture-of-Experts instruct model from Mistral hosted by Fireworks.
poe - command-r 5,100.00 - I can search the web for up-to-date information and respond in over 10 languages!
poe - mistral-large-2 3.00 9.00 Mistral's latest text generation model (Mistral-Large-2407) with top-tier reasoning capabilities. It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. This bot has the full 128k context window supported by the model.
poe - dall-e-3 45,000.00 - OpenAI's most powerful image generation model. Generates high quality images with intricate details based on the user's most recent prompt. For most prompts, https://poe.com/FLUX-pro-1.1-ultra or https://poe.com/FLUX-dev or https://poe.com/Imagen3 will produce better results. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 1:1, 7:4, & 4:7.
poe - reka-core - - Reka's largest and most capable multimodal language model. Works with text, images, and video inputs. 8k context length.
poe - reka-flash - - Reka's efficient and capable 21B multimodal model optimized for fast workloads and amazing quality. Works with text, images and video inputs.
poe - command-r-plus 5,100.00 - A supercharged version of Command R. I can search the web for up-to-date information and respond in over 10 languages!
poe - claude-sonnet-3.5-june 2.60 13.00 Anthropic's legacy Sonnet 3.5 model, specifically the June 2024 snapshot (for the latest, please use https://poe.com/Claude-Sonnet-3.5). Excels in complex tasks like coding, writing, analysis and visual processing; generally, more verbose than the more concise October 2024 snapshot.
poe - gpt-3.5-turbo 0.45 1.40 OpenAI’s GPT 3.5 Turbo model is a powerful language generation system designed to provide highly coherent, contextually relevant, and detailed responses. Supports 16,384 tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - sketch-to-image - - Takes in sketches and converts them to colored images.
poe - qwen2.5-coder-32b 1,500.00 - Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen), developed by Alibaba.
poe - stablediffusion3.5-t - - Faster version of Stable Diffusion 3.5 Large, hosted by @fal. Excels at fast image generation. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1).
poe - flux-pro-1.1-t 30,000.00 - The best state of the art image model from BFL. FLUX 1.1 Pro generates images six times faster than its predecessor, FLUX 1 Pro, while also improving image quality, prompt adherence, and output diversity. The bot does not support any attachments.
poe - flux-schnell-t 2,100.00 - Lightning-fast AI image generation model that excels in producing high-quality visuals in just seconds. Great for quick prototyping or real-time use cases. This is the fastest version of FLUX.1. The bot does not support any attachments.
poe - recraft-v3 - - Recraft V3, state of the art image generation. Prompt input cannot exceed 1,000 characters. Use --style for styles, and --aspect for aspect ratio configuration (16:9, 4:3, 1:1, 3:4, 9:16). Available styles: realistic_image, digital_illustration, vector_illustration, realistic_image/b_and_w, realistic_image/hard_flash, realistic_image/hdr, realistic_image/natural_light, realistic_image/studio_portrait, realistic_image/enterprise, realistic_image/motion_blur, digital_illustration/pixel_art, digital_illustration/hand_drawn, digital_illustration/grain, digital_illustration/infantile_sketch, digital_illustration/2d_art_poster, digital_illustration/handmade_3d, digital_illustration/hand_drawn_outline, digital_illustration/engraving_color, digital_illustration/2d_art_poster_2, vector_illustration/engraving, vector_illustration/line_art, vector_illustration/line_circuit, vector_illustration/linocut
poe - llama-3-70b-t 2,300.00 - Llama 3 70B Instruct from Meta. For most use cases, https://poe.com/Llama-3.3-70B will perform better.
poe - gpt-4o-aug 2.20 9.00 OpenAI's most powerful model, GPT-4o, using the August 2024 model snapshot. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-4-classic-0314 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0314 (non-Turbo) for text input and gpt-4o for image input. For most use cases, https://poe.com/GPT-4o will perform significantly better.
poe - gpt-4-classic 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0613 (non-Turbo) for text input and gpt-4o for image input. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - solar-pro-2 2,100.00 - Solar Pro 2 is Upstage's latest frontier-scale LLM. With just 31B parameters, it delivers top-tier performance through world-class multilingual support, advanced reasoning, and real-world tool use. Especially in Korean, it outperforms much larger models across critical benchmarks. Built for the next generation of practical LLMs, Solar Pro 2 proves that smaller models can still lead. Supports a context length of 64k tokens.
poe - remove-background - - Remove background from your images
poe - sana-t2i - - SANA can synthesize high-resolution, high-quality images at a remarkably fast rate, with the ability to generate 4K images in less than a second. Optional parameters: set the aspect ratio with options 16:9, 4:3, 1:1, 3:4, and 9:16 (default: 4:3).
poe - mistral-7b-v0.3-t 1,400.00 - Mistral Instruct 7B v0.3 from Mistral AI. The points price is subject to change.
poe - tako 30,000.00 - Tako is a bot that transforms your questions about stocks, sports, economics or politics into interactive, shareable knowledge cards from trusted sources. Tako's knowledge graph is built exclusively from authoritative, real-time data providers, and is embeddable in your apps, research and storytelling. You can adjust the specificity threshold by typing `--specificity 30` (or a value between 0 - 100) at the end of your query/question; the default is 60.
poe - llama-3.1-405b-fp16 62,000.00 - The biggest and best open-source AI model trained by Meta, beating GPT-4o across most benchmarks. This bot runs in BF16 and has a 128K context length.
poe - llama-3.1-8b-fp16 1,500.00 - The smallest and fastest member of the Llama 3.1 family, offering exceptional efficiency and rapid response times with 128K context length.
poe - llama-3.1-70b-fp16 6,000.00 - The best LLM at its size, offering faster response times than the 405B model and a 128K context length.
poe - llama-3-70b-fp16 6,000.00 - A highly efficient and powerful model designed for a variety of tasks with 128K context length.
poe - restyler - - This bot enables rapid transformation of existing images, delivering high-quality style transfers and image modifications. Takes in a text input and an image attachment. Use --strength to control the guidance given by the initial image, with higher values adhering to the image more strongly.
poe - stablediffusionxl 3,600.00 - Generates high quality images based on the user's most recent prompt. Allows users to specify elements to avoid in the image using the "--no" parameter at the end of the prompt. Select an aspect ratio with "--aspect". (e.g. "Tall trees, daylight --no rain --aspect 7:4"). Valid aspect ratios are 1:1, 7:4, 4:7, 9:7, 7:9, 19:13, 13:19, 12:5, & 5:12. Powered by Stable Diffusion XL.
poe - qwen-2.5-7b-t 2,300.00 - Qwen 2.5 7B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, with strong multilingual support covering more than 29 languages.
poe - qwen-2.5-72b-t 9,000.00 - Qwen 2.5 72B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, with strong multilingual support covering more than 29 languages. It delivers results on par with Llama-3-405B despite using only one-fifth of the parameters.
poe - python 30.00 - Executes Python code (version 3.11) from the user message and outputs the results. If there are code blocks in the user message (surrounded by triple backticks), then only the code blocks will be executed. These libraries are imported into the bot's runtime automatically (numpy, pandas, requests, matplotlib, scikit-learn, torch, PyYAML, tensorflow, scipy, pytest), along with ~150 of the most widely used Python libraries.
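A minimal sketch of a message this bot could execute: only the fenced block runs, and the imports shown are among the libraries the description says are preloaded (explicit imports keep the snippet runnable anywhere).

```python
import numpy as np
import pandas as pd

# Build a small table and print summary statistics; the bot returns stdout.
df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})
print(df.describe())
```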
poe - markitdown - - Convert anything to Markdown: URLs, PDFs, Word, Excel, PowerPoint, images (EXIF metadata), audio (EXIF metadata and transcription), and more. This bot wraps Microsoft’s MarkItDown MCP server (https://github.com/microsoft/markitdown).
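The bot itself is driven through chat, but the open-source library it wraps can also be scripted directly; a minimal sketch, assuming the markitdown Python package (`pip install markitdown`) and a placeholder file name:

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("quarterly-report.xlsx")  # placeholder path; URLs and PDFs also work
print(result.text_content)                    # the document rendered as Markdown
```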
poe - gpt-4-turbo 9.00 27.00 Powered by OpenAI's GPT-4 Turbo with Vision. For most tasks, https://poe.com/GPT-4o will perform better. Supports 128k tokens of context. Requests with images will be routed to @GPT-4o. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - flux-1-schnell-fw 1,000.00 - FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: (1) cutting-edge output quality and competitive prompt following, matching the performance of closed-source alternatives; (2) trained using latent adversarial diffusion distillation, it can generate high-quality images in only 1 to 4 steps; (3) released under the Apache 2.0 license, so the model can be used for personal, scientific, and commercial purposes.
poe - flux-1-dev-fw 11,000.00 - FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: (1) cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro]; (2) competitive prompt following, matching the performance of closed-source alternatives; (3) trained using guidance distillation, making it more efficient; (4) open weights to drive new scientific research and empower artists to develop innovative workflows; (5) generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
poe - mochi-preview - - Open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. Supports both text-to-video and image-to-video. Generates 5-second videos.
poe - gpt-3.5-turbo-instruct 1.40 1.80 Powered by gpt-3.5-turbo-instruct. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - gpt-3.5-turbo-raw 0.45 1.40 Powered by gpt-3.5-turbo without a system prompt. Check out the newest version of this bot here: https://poe.com/GPT-5.
poe - interpreter - - Interpreter for Poe Python
poe - claude-haiku-3 0.21 1.10 Anthropic's Claude Haiku 3 outperforms models in its intelligence category on performance, speed and cost without the need for specialized fine-tuning. The compute points value is subject to change. For most use cases, https://poe.com/Claude-Haiku-3.5 will be better.
poe - code-saver - - A system bot that handles Poe scripts in chat.
poe - code-editor - - Official code editor for Poe Scripting using Python, used to connect multiple Poe bots and create AI workflows. Guide and tips: https://creator.poe.com/docs/script-bots/poe-python-reference
moonshotaicn Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 0905 kimi-k2-0905-preview 0.60 2.50 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotaicn Kimi K2 0711 kimi-k2-0711-preview 0.60 2.50 Provider: Moonshot AI (China), Context: 131072, Output Limit: 16384
moonshotaicn Kimi K2 Turbo kimi-k2-turbo-preview 2.40 10.00 Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
lucidquery LucidQuery Nexus Coder lucidquery-nexus-coder 2.00 5.00 Provider: LucidQuery AI, Context: 250000, Output Limit: 60000
lucidquery LucidNova RF1 100B lucidnova-rf1-100b 2.00 5.00 Provider: LucidQuery AI, Context: 120000, Output Limit: 8000
moonshotai Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 Turbo kimi-k2-turbo-preview 2.40 10.00 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 0711 kimi-k2-0711-preview 0.60 2.50 Provider: Moonshot AI, Context: 131072, Output Limit: 16384
moonshotai Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
moonshotai Kimi K2 0905 kimi-k2-0905-preview 0.60 2.50 Provider: Moonshot AI, Context: 262144, Output Limit: 262144
ollamacloud Kimi K2 Thinking kimi-k2-thinking:cloud - - Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
ollamacloud Qwen3-VL 235B Instruct qwen3-vl-235b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Qwen3 Coder 480B qwen3-coder:480b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud GPT-OSS 120B gpt-oss:120b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud DeepSeek-V3.1 671B deepseek-v3.1:671b-cloud - - Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
ollamacloud GLM-4.6 glm-4.6:cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Cogito 2.1 671B cogito-2.1:671b-cloud - - Provider: Ollama Cloud, Context: 160000, Output Limit: 8192
ollamacloud GPT-OSS 20B gpt-oss:20b-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Qwen3-VL 235B Instruct qwen3-vl-235b-instruct-cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Kimi K2 kimi-k2:1t-cloud - - Provider: Ollama Cloud, Context: 256000, Output Limit: 8192
ollamacloud MiniMax M2 minimax-m2:cloud - - Provider: Ollama Cloud, Context: 200000, Output Limit: 8192
ollamacloud Gemini 3 Pro Preview gemini-3-pro-preview:latest - - Provider: Ollama Cloud, Context: 1000000, Output Limit: 64000
xiaomi MiMo-V2-Flash mimo-v2-flash 0.07 0.21 Provider: Xiaomi, Context: 256000, Output Limit: 32000
alibaba Qwen3-LiveTranslate Flash Realtime qwen3-livetranslate-flash-realtime 10.00 10.00 Provider: Alibaba, Context: 53248, Output Limit: 4096
alibaba Qwen3-ASR Flash qwen3-asr-flash 0.04 0.04 Provider: Alibaba, Context: 53248, Output Limit: 4096
alibaba Qwen-Omni Turbo qwen-omni-turbo 0.07 0.27 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen-VL Max qwen-vl-max 0.80 3.20 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Next 80B-A3B Instruct qwen3-next-80b-a3b-instruct 0.50 2.00 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen Turbo qwen-turbo 0.05 0.20 Provider: Alibaba, Context: 1000000, Output Limit: 16384
alibaba Qwen3-VL 235B-A22B qwen3-vl-235b-a22b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 Coder Flash qwen3-coder-flash 0.30 1.50 Provider: Alibaba, Context: 1000000, Output Limit: 65536
alibaba Qwen3-VL 30B-A3B qwen3-vl-30b-a3b 0.20 0.80 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 14B qwen3-14b 0.35 1.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba QVQ Max qvq-max 1.20 4.80 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen Plus Character (Japanese) qwen-plus-character-ja 0.50 1.40 Provider: Alibaba, Context: 8192, Output Limit: 512
alibaba Qwen2.5 14B Instruct qwen2-5-14b-instruct 0.35 1.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba QwQ Plus qwq-plus 0.80 2.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.45 2.25 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen-VL OCR qwen-vl-ocr 0.72 0.72 Provider: Alibaba, Context: 34096, Output Limit: 4096
alibaba Qwen2.5 72B Instruct qwen2-5-72b-instruct 1.40 5.60 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Omni Flash qwen3-omni-flash 0.43 1.66 Provider: Alibaba, Context: 65536, Output Limit: 16384
alibaba Qwen Flash qwen-flash 0.05 0.40 Provider: Alibaba, Context: 1000000, Output Limit: 32768
alibaba Qwen3 8B qwen3-8b 0.18 0.70 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-Omni Flash Realtime qwen3-omni-flash-realtime 0.52 1.99 Provider: Alibaba, Context: 65536, Output Limit: 16384
alibaba Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 2.80 8.40 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3-VL Plus qwen3-vl-plus 0.20 1.60 Provider: Alibaba, Context: 262144, Output Limit: 32768
alibaba Qwen Plus qwen-plus 0.40 1.20 Provider: Alibaba, Context: 1000000, Output Limit: 32768
alibaba Qwen2.5 32B Instruct qwen2-5-32b-instruct 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen2.5-Omni 7B qwen2-5-omni-7b 0.10 0.40 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen Max qwen-max 1.60 6.40 Provider: Alibaba, Context: 32768, Output Limit: 8192
alibaba Qwen2.5 7B Instruct qwen2-5-7b-instruct 0.18 0.70 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen2.5-VL 7B Instruct qwen2-5-vl-7b-instruct 0.35 1.05 Provider: Alibaba, Context: 131072, Output Limit: 8192
alibaba Qwen3 235B-A22B qwen3-235b-a22b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 16384
alibaba Qwen-Omni Turbo Realtime qwen-omni-turbo-realtime 0.27 1.07 Provider: Alibaba, Context: 32768, Output Limit: 2048
alibaba Qwen-MT Turbo qwen-mt-turbo 0.16 0.49 Provider: Alibaba, Context: 16384, Output Limit: 8192
alibaba Qwen3-Coder 480B-A35B Instruct qwen3-coder-480b-a35b-instruct 1.50 7.50 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen-MT Plus qwen-mt-plus 2.46 7.37 Provider: Alibaba, Context: 16384, Output Limit: 8192
alibaba Qwen3 Max qwen3-max 1.20 6.00 Provider: Alibaba, Context: 262144, Output Limit: 65536
alibaba Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Provider: Alibaba, Context: 1048576, Output Limit: 65536
alibaba Qwen3-Next 80B-A3B (Thinking) qwen3-next-80b-a3b-thinking 0.50 6.00 Provider: Alibaba, Context: 131072, Output Limit: 32768
alibaba Qwen3 32B qwen3-32b 0.70 2.80 Provider: Alibaba, Context: 131072, Output Limit: 16384
alibaba Qwen-VL Plus qwen-vl-plus 0.21 0.63 Provider: Alibaba, Context: 131072, Output Limit: 8192
xai Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 3 Fast grok-3-fast 5.00 25.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4 grok-4 3.00 15.00 Provider: xAI, Context: 256000, Output Limit: 64000
xai Grok 2 Vision grok-2-vision 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: xAI, Context: 256000, Output Limit: 10000
xai Grok 2 grok-2 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Mini Fast Latest grok-3-mini-fast-latest 0.60 4.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 2 Vision (1212) grok-2-vision-1212 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok 3 grok-3 3.00 15.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4 Fast grok-4-fast 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 2 Latest grok-2-latest 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4.1 Fast grok-4-1-fast 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 2 (1212) grok-2-1212 2.00 10.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Fast Latest grok-3-fast-latest 5.00 25.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 3 Latest grok-3-latest 3.00 15.00 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 2 Vision Latest grok-2-vision-latest 2.00 10.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok Vision Beta grok-vision-beta 5.00 15.00 Provider: xAI, Context: 8192, Output Limit: 4096
xai Grok 3 Mini grok-3-mini 0.30 0.50 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok Beta grok-beta 5.00 15.00 Provider: xAI, Context: 131072, Output Limit: 4096
xai Grok 3 Mini Latest grok-3-mini-latest 0.30 0.50 Provider: xAI, Context: 131072, Output Limit: 8192
xai Grok 4.1 Fast (Non-Reasoning) grok-4-1-fast-non-reasoning 0.20 0.50 Provider: xAI, Context: 2000000, Output Limit: 30000
xai Grok 3 Mini Fast grok-3-mini-fast 0.60 4.00 Provider: xAI, Context: 131072, Output Limit: 8192
vultr DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
vultr Qwen2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.20 0.20 Provider: Vultr, Context: 12952, Output Limit: 2048
vultr Kimi K2 Instruct kimi-k2-instruct 0.20 0.20 Provider: Vultr, Context: 58904, Output Limit: 4096
vultr DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
vultr GPT OSS 120B gpt-oss-120b 0.20 0.20 Provider: Vultr, Context: 121808, Output Limit: 8192
nvidia Kimi K2 0905 kimi-k2-instruct-0905 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Kimi K2 Thinking kimi-k2-thinking 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Kimi K2 Instruct kimi-k2-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Nemotron Nano 9B v2 nvidia-nemotron-nano-9b-v2 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 131072
nvidia Cosmos Nemotron 34B cosmos-nemotron-34b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Llama Embed Nemotron 8B llama-embed-nemotron-8b 0.00 0.00 Provider: Nvidia, Context: 32768, Output Limit: 2048
nvidia Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 131072
nvidia Parakeet TDT 0.6B v2 parakeet-tdt-0.6b-v2 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia NeMo Retriever OCR v1 nemoretriever-ocr-v1 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia Llama 3.3 Nemotron Super 49b V1 llama-3.3-nemotron-super-49b-v1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 Nemotron 51b Instruct llama-3.1-nemotron-51b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 Chatqa 1.5 70b llama3-chatqa-1.5-70b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama-3.1-Nemotron-Ultra-253B-v1 llama-3.1-nemotron-ultra-253b-v1 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Llama 3.1 Nemotron 70b Instruct llama-3.1-nemotron-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Nemotron 4 340b Instruct nemotron-4-340b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.3 Nemotron Super 49b V1.5 llama-3.3-nemotron-super-49b-v1.5 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia MiniMax-M2 minimax-m2 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 16384
nvidia Gemma 3n E2b It gemma-3n-e2b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codegemma 1.1 7b codegemma-1.1-7b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3n E4b It gemma-3n-e4b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 2 2b It gemma-2-2b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3 12b It gemma-3-12b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codegemma 7b codegemma-7b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 3 1b It gemma-3-1b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma 2 27b It gemma-2-27b-it 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Gemma-3-27B-IT gemma-3-27b-it 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Phi 3 Medium 128k Instruct phi-3-medium-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3 Small 128k Instruct phi-3-small-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3.5 Vision Instruct phi-3.5-vision-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi 3 Small 8k Instruct phi-3-small-8k-instruct 0.00 0.00 Provider: Nvidia, Context: 8000, Output Limit: 4096
nvidia Phi 3.5 Moe Instruct phi-3.5-moe-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Phi-4-Mini phi-4-mini-instruct 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Phi 3 Medium 4k Instruct phi-3-medium-4k-instruct 0.00 0.00 Provider: Nvidia, Context: 4000, Output Limit: 4096
nvidia Phi 3 Vision 128k Instruct phi-3-vision-128k-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Whisper Large v3 whisper-large-v3 0.00 0.00 Provider: Nvidia, Context: N/A, Output Limit: 4096
nvidia GPT-OSS-120B gpt-oss-120b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 16384
nvidia Qwen2.5 Coder 32b Instruct qwen2.5-coder-32b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen2.5 Coder 7b Instruct qwen2.5-coder-7b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen3-235B-A22B qwen3-235b-a22b 0.00 0.00 Provider: Nvidia, Context: 131072, Output Limit: 8192
nvidia Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 65536
nvidia QwQ 32B qwq-32b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 16384
nvidia Devstral-2-123B-Instruct-2512 devstral-2-123b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Mistral Large 3 675B Instruct 2512 mistral-large-3-675b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Ministral 3 14B Instruct 2512 ministral-14b-instruct-2512 0.00 0.00 Provider: Nvidia, Context: 262144, Output Limit: 262144
nvidia Mamba Codestral 7b V0.1 mamba-codestral-7b-v0.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Mistral Large 2 Instruct mistral-large-2-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codestral 22b Instruct V0.1 codestral-22b-instruct-v0.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Mistral Small 3.1 24b Instruct 2503 mistral-small-3.1-24b-instruct-2503 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.2 11b Vision Instruct llama-3.2-11b-vision-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 70b Instruct llama3-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.3 70b Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.2 1b Instruct llama-3.2-1b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 4 Scout 17b 16e Instruct llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 4 Maverick 17b 128e Instruct llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Codellama 70b codellama-70b 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 405b Instruct llama-3.1-405b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama3 8b Instruct llama3-8b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Llama 3.1 70b Instruct llama-3.1-70b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Deepseek R1 0528 deepseek-r1-0528 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia Deepseek R1 deepseek-r1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia DeepSeek V3.1 deepseek-v3.1 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 8192
nvidia Deepseek Coder 6.7b Instruct deepseek-coder-6.7b-instruct 0.00 0.00 Provider: Nvidia, Context: 128000, Output Limit: 4096
nvidia FLUX.1-dev flux.1-dev 0.00 0.00 Provider: Nvidia, Context: 4096, Output Limit: N/A
cohere Command A Translate command-a-translate-08-2025 2.50 10.00 Provider: Cohere, Context: 8000, Output Limit: 8000
cohere Command A command-a-03-2025 2.50 10.00 Provider: Cohere, Context: 256000, Output Limit: 8000
cohere Command R command-r-08-2024 0.15 0.60 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command R+ command-r-plus-08-2024 2.50 10.00 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command R7B command-r7b-12-2024 0.04 0.15 Provider: Cohere, Context: 128000, Output Limit: 4000
cohere Command A Reasoning command-a-reasoning-08-2025 2.50 10.00 Provider: Cohere, Context: 256000, Output Limit: 32000
cohere Command A Vision command-a-vision-07-2025 2.50 10.00 Provider: Cohere, Context: 128000, Output Limit: 8000
upstage Solar Mini solar-mini 0.15 0.15 Provider: Upstage, Context: 32768, Output Limit: 4096
upstage Solar Pro 2 solar-pro2 0.25 0.25 Provider: Upstage, Context: 65536, Output Limit: 8192
groq Llama 3.1 8B Instant llama-3.1-8b-instant 0.05 0.08 Provider: Groq, Context: 131072, Output Limit: 131072
groq Mistral Saba 24B mistral-saba-24b 0.79 0.79 Provider: Groq, Context: 32768, Output Limit: 32768
groq Llama 3 8B llama3-8b-8192 0.05 0.08 Provider: Groq, Context: 8192, Output Limit: 8192
groq Qwen QwQ 32B qwen-qwq-32b 0.29 0.39 Provider: Groq, Context: 131072, Output Limit: 16384
groq Llama 3 70B llama3-70b-8192 0.59 0.79 Provider: Groq, Context: 8192, Output Limit: 8192
groq DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.75 0.99 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama Guard 3 8B llama-guard-3-8b 0.20 0.20 Provider: Groq, Context: 8192, Output Limit: 8192
groq Gemma 2 9B gemma2-9b-it 0.20 0.20 Provider: Groq, Context: 8192, Output Limit: 8192
groq Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Groq, Context: 131072, Output Limit: 32768
groq Kimi K2 Instruct 0905 kimi-k2-instruct-0905 1.00 3.00 Provider: Groq, Context: 262144, Output Limit: 16384
groq Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Groq, Context: 131072, Output Limit: 16384
groq GPT OSS 20B gpt-oss-20b 0.08 0.30 Provider: Groq, Context: 131072, Output Limit: 65536
groq GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Groq, Context: 131072, Output Limit: 65536
groq Qwen3 32B qwen3-32b 0.29 0.59 Provider: Groq, Context: 131072, Output Limit: 16384
groq Llama 4 Scout 17B llama-4-scout-17b-16e-instruct 0.11 0.34 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama 4 Maverick 17B llama-4-maverick-17b-128e-instruct 0.20 0.60 Provider: Groq, Context: 131072, Output Limit: 8192
groq Llama Guard 4 12B llama-guard-4-12b 0.20 0.20 Provider: Groq, Context: 131072, Output Limit: 1024
bailing Ling-1T ling-1t 0.57 2.29 Provider: Bailing, Context: 128000, Output Limit: 32000
bailing Ring-1T ring-1t 0.57 2.29 Provider: Bailing, Context: 128000, Output Limit: 32000
githubcopilot Gemini 2.0 Flash gemini-2.0-flash-001 0.00 0.00 Provider: GitHub Copilot, Context: 1000000, Output Limit: 8192
githubcopilot Claude Opus 4 claude-opus-4 0.00 0.00 Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
githubcopilot Gemini 3 Flash gemini-3-flash-preview 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Grok Code Fast 1 grok-code-fast-1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot GPT-5.1-Codex gpt-5.1-codex 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot Claude Haiku 4.5 claude-haiku-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot Gemini 3 Pro Preview gemini-3-pro-preview 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Raptor Mini (Preview) oswe-vscode-prime 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 64000
githubcopilot Claude Sonnet 3.5 claude-3.5-sonnet 0.00 0.00 Provider: GitHub Copilot, Context: 90000, Output Limit: 8192
githubcopilot GPT-5.1-Codex-mini gpt-5.1-codex-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 100000
githubcopilot o3-mini o3-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
githubcopilot GPT-5.1 gpt-5.1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot GPT-5-Codex gpt-5-codex 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot GPT-4o gpt-4o 0.00 0.00 Provider: GitHub Copilot, Context: 64000, Output Limit: 16384
githubcopilot GPT-4.1 gpt-4.1 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
githubcopilot o4-mini (Preview) o4-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 65536
githubcopilot Claude Opus 4.1 claude-opus-41 0.00 0.00 Provider: GitHub Copilot, Context: 80000, Output Limit: 16000
githubcopilot GPT-5-mini gpt-5-mini 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Claude Sonnet 3.7 claude-3.7-sonnet 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
githubcopilot Gemini 2.5 Pro gemini-2.5-pro 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot GPT-5.1-Codex-max gpt-5.1-codex-max 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot o3 (Preview) o3 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16384
githubcopilot Claude Sonnet 4 claude-sonnet-4 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot GPT-5 gpt-5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 128000
githubcopilot Claude Sonnet 3.7 Thinking claude-3.7-sonnet-thought 0.00 0.00 Provider: GitHub Copilot, Context: 200000, Output Limit: 16384
githubcopilot Claude Opus 4.5 claude-opus-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
githubcopilot GPT-5.2 gpt-5.2 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 64000
githubcopilot Claude Sonnet 4.5 claude-sonnet-4.5 0.00 0.00 Provider: GitHub Copilot, Context: 128000, Output Limit: 16000
mistral Devstral Medium devstral-medium-2507 0.40 2.00 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Large 3 mistral-large-2512 0.50 1.50 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mixtral 8x22B open-mixtral-8x22b 2.00 6.00 Provider: Mistral, Context: 64000, Output Limit: 64000
mistral Ministral 8B ministral-8b-latest 0.10 0.10 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Pixtral Large pixtral-large-latest 2.00 6.00 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Small 3.2 mistral-small-2506 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Devstral 2512 devstral-2512 0.40 2.00 Provider: Mistral, Context: 256000
mistral Ministral 3B ministral-3b-latest 0.04 0.04 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Pixtral 12B pixtral-12b 0.15 0.15 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Mistral, Context: 131072, Output Limit: 131072
mistral Labs Devstral Small 2512 labs-devstral-small-2512 0.10 0.30 Provider: Mistral, Context: 256000
mistral Devstral 2 devstral-medium-latest 0.40 2.00 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Devstral Small 2505 devstral-small-2505 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral Medium 3.1 mistral-medium-2508 0.40 2.00 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mistral Embed mistral-embed 0.10 0.00 Provider: Mistral, Context: 8000, Output Limit: 3072
mistral Mistral Small mistral-small-latest 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Magistral Small magistral-small 0.50 1.50 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Devstral Small devstral-small-2507 0.10 0.30 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Codestral codestral-latest 0.30 0.90 Provider: Mistral, Context: 256000, Output Limit: 4096
mistral Mixtral 8x7B open-mixtral-8x7b 0.70 0.70 Provider: Mistral, Context: 32000, Output Limit: 32000
mistral Mistral Nemo mistral-nemo 0.15 0.15 Provider: Mistral, Context: 128000, Output Limit: 128000
mistral Mistral 7B open-mistral-7b 0.25 0.25 Provider: Mistral, Context: 8000, Output Limit: 8000
mistral Mistral Large mistral-large-latest 0.50 1.50 Provider: Mistral, Context: 262144, Output Limit: 262144
mistral Mistral Medium mistral-medium-latest 0.40 2.00 Provider: Mistral, Context: 128000, Output Limit: 16384
mistral Mistral Large 2.1 mistral-large-2411 2.00 6.00 Provider: Mistral, Context: 131072, Output Limit: 16384
mistral Magistral Medium magistral-medium-latest 2.00 5.00 Provider: Mistral, Context: 128000, Output Limit: 16384
abacus GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Abacus, Context: 2000000, Output Limit: 16384
abacus Gemini 2.0 Flash gemini-2.0-flash-001 0.10 0.40 Provider: Abacus, Context: 1000000, Output Limit: 8192
abacus DeepSeek V3.2 deepseek-ai-deepseek-v3.2 0.27 0.40 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Llama 3.1 405B Instruct Turbo meta-llama-meta-llama-3.1-405b-instruct-turbo 3.50 3.50 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus Qwen3 235B A22B Instruct qwen-qwen3-235b-a22b-instruct-2507 0.13 0.60 Provider: Abacus, Context: 262144, Output Limit: 8192
abacus Llama 3.1 8B Instruct meta-llama-meta-llama-3.1-8b-instruct 0.02 0.05 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Abacus, Context: 256000, Output Limit: 16384
abacus DeepSeek R1 deepseek-ai-deepseek-r1 3.00 7.00 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Kimi K2 Turbo Preview kimi-k2-turbo-preview 0.15 8.00 Provider: Abacus, Context: 256000, Output Limit: 8192
abacus Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Abacus, Context: 1000000, Output Limit: 65000
abacus Qwen3 Coder 480B A35B Instruct qwen-qwen3-coder-480b-a35b-instruct 0.29 1.20 Provider: Abacus, Context: 262144, Output Limit: 65536
abacus Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus Claude Opus 4.5 claude-opus-4-5-20251101 5.00 25.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus Qwen 2.5 Coder 32B qwen-2.5-coder-32b 0.79 0.79 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Claude Sonnet 4.5 claude-sonnet-4-5-20250929 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus GPT-OSS 120B openai-gpt-oss-120b 0.08 0.44 Provider: Abacus, Context: 128000, Output Limit: 32768
abacus Qwen3 Max qwen-qwen3-max 1.20 6.00 Provider: Abacus, Context: 131072, Output Limit: 16384
abacus Grok 4 grok-4-0709 3.00 15.00 Provider: Abacus, Context: 256000, Output Limit: 16384
abacus Llama 3.1 70B Instruct meta-llama-meta-llama-3.1-70b-instruct 0.40 0.40 Provider: Abacus, Context: 128000, Output Limit: 4096
abacus o3-mini o3-mini 1.10 4.40 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus GLM-4.5 zai-org-glm-4.5 0.60 2.20 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Gemini 2.0 Pro Exp gemini-2.0-pro-exp-02-05 - - Provider: Abacus, Context: 2000000, Output Limit: 8192
abacus GPT-5.1 gpt-5.1 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Claude Sonnet 4 claude-sonnet-4-20250514 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus GPT-4.1 gpt-4.1 2.00 8.00 Provider: Abacus, Context: 1047576, Output Limit: 32768
abacus o4-mini o4-mini 1.10 4.40 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Qwen3 32B qwen-qwen3-32b 0.09 0.29 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Claude Opus 4 claude-opus-4-20250514 15.00 75.00 Provider: Abacus, Context: 200000, Output Limit: 32000
abacus GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Llama 4 Maverick 17B 128E Instruct FP8 meta-llama-llama-4-maverick-17b-128e-instruct-fp8 0.14 0.59 Provider: Abacus, Context: 1000000, Output Limit: 32768
abacus o3-pro o3-pro 20.00 80.00 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Claude Sonnet 3.7 claude-3-7-sonnet-20250219 3.00 15.00 Provider: Abacus, Context: 200000, Output Limit: 64000
abacus DeepSeek V3.1 Terminus deepseek-ai-deepseek-v3.1-terminus 0.27 1.00 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Abacus, Context: 1048576, Output Limit: 65536
abacus GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: Abacus, Context: 128000, Output Limit: 16384
abacus o3 o3 2.00 8.00 Provider: Abacus, Context: 200000, Output Limit: 100000
abacus Qwen 2.5 72B Instruct qwen-qwen2.5-72b-instruct 0.11 0.38 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus GLM-4.6 zai-org-glm-4.6 0.60 2.20 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus DeepSeek V3.1 deepseek-deepseek-v3.1 0.55 1.66 Provider: Abacus, Context: 128000, Output Limit: 8192
abacus QwQ 32B qwen-qwq-32b 0.40 0.40 Provider: Abacus, Context: 32768, Output Limit: 32768
abacus GPT-4o Mini gpt-4o-mini 0.15 0.60 Provider: Abacus, Context: 128000, Output Limit: 16384
abacus GPT-5 gpt-5 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Grok 4.1 Fast (Non-Reasoning) grok-4-1-fast-non-reasoning 0.20 0.50 Provider: Abacus, Context: 2000000, Output Limit: 16384
abacus Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Abacus, Context: 128000, Output Limit: 32768
abacus Claude Opus 4.1 claude-opus-4-1-20250805 15.00 75.00 Provider: Abacus, Context: 200000, Output Limit: 32000
abacus GPT-5.2 gpt-5.2 1.75 14.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus GPT-5.1 Chat Latest gpt-5.1-chat-latest 1.25 10.00 Provider: Abacus, Context: 400000, Output Limit: 128000
abacus Claude Haiku 4.5 claude-haiku-4-5-20251001 1.00 5.00 Provider: Abacus, Context: 200000, Output Limit: 64000
nebius Hermes 4 70B hermes-4-70b 0.13 0.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Hermes 4 405B hermes-4-405b 1.00 3.00 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Kimi K2 Instruct kimi-k2-instruct 0.50 2.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama 3.1 Nemotron Ultra 253B v1 llama-3_1-nemotron-ultra-253b-v1 0.60 1.80 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.60 Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
nebius Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.20 0.80 Provider: Nebius Token Factory, Context: 262144, Output Limit: 8192
nebius Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.40 1.80 Provider: Nebius Token Factory, Context: 262144, Output Limit: 65536
nebius Llama 3.1 405B Instruct llama-3_1-405b-instruct 1.00 3.00 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama-3.3-70B-Instruct (Fast) llama-3.3-70b-instruct-fast 0.25 0.75 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius Llama-3.3-70B-Instruct (Base) llama-3.3-70b-instruct-base 0.13 0.40 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GLM 4.5 glm-4.5 0.60 2.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius GLM 4.5 Air glm-4.5-air 0.20 1.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
nebius DeepSeek V3 deepseek-v3 0.50 1.50 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
deepseek DeepSeek Chat deepseek-chat 0.28 0.42 Provider: DeepSeek, Context: 128000, Output Limit: 8192
deepseek DeepSeek Reasoner deepseek-reasoner 0.28 0.42 Provider: DeepSeek, Context: 128000, Output Limit: 128000
alibabacn DeepSeek R1 Distill Qwen 7B deepseek-r1-distill-qwen-7b 0.07 0.14 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3-ASR Flash qwen3-asr-flash 0.03 0.03 Provider: Alibaba (China), Context: 53248, Output Limit: 4096
alibabacn DeepSeek R1 0528 deepseek-r1-0528 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn DeepSeek V3 deepseek-v3 0.29 1.15 Provider: Alibaba (China), Context: 65536, Output Limit: 8192
alibabacn Qwen-Omni Turbo qwen-omni-turbo 0.06 0.23 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen-VL Max qwen-vl-max 0.23 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek V3.2 Exp deepseek-v3-2-exp 0.29 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn Qwen3-Next 80B-A3B Instruct qwen3-next-80b-a3b-instruct 0.14 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn DeepSeek R1 deepseek-r1 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen Turbo qwen-turbo 0.04 0.09 Provider: Alibaba (China), Context: 1000000, Output Limit: 16384
alibabacn Qwen3-VL 235B-A22B qwen3-vl-235b-a22b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn Qwen3 Coder Flash qwen3-coder-flash 0.14 0.57 Provider: Alibaba (China), Context: 1000000, Output Limit: 65536
alibabacn Qwen3-VL 30B-A3B qwen3-vl-30b-a3b 0.11 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn Qwen3 14B qwen3-14b 0.14 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn QVQ Max qvq-max 1.15 4.59 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.29 0.86 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen Plus Character qwen-plus-character 0.12 0.29 Provider: Alibaba (China), Context: 32768, Output Limit: 4096
alibabacn Qwen2.5 14B Instruct qwen2-5-14b-instruct 0.14 0.43 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn QwQ Plus qwq-plus 0.23 0.57 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Coder 32B Instruct qwen2-5-coder-32b-instruct 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.22 0.86 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn Qwen Math Plus qwen-math-plus 0.57 1.72 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen-VL OCR qwen-vl-ocr 0.72 0.72 Provider: Alibaba (China), Context: 34096, Output Limit: 4096
alibabacn Qwen Doc Turbo qwen-doc-turbo 0.09 0.14 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen Deep Research qwen-deep-research 7.74 23.37 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen2.5 72B Instruct qwen2-5-72b-instruct 0.57 1.72 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Omni Flash qwen3-omni-flash 0.06 0.23 Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn Qwen Flash qwen-flash 0.02 0.22 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen3 8B qwen3-8b 0.07 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-Omni Flash Realtime qwen3-omni-flash-realtime 0.23 0.92 Provider: Alibaba (China), Context: 65536, Output Limit: 16384
alibabacn Qwen2.5-VL 72B Instruct qwen2-5-vl-72b-instruct 2.29 6.88 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3-VL Plus qwen3-vl-plus 0.14 1.43 Provider: Alibaba (China), Context: 262144, Output Limit: 32768
alibabacn Qwen Plus qwen-plus 0.12 0.29 Provider: Alibaba (China), Context: 1000000, Output Limit: 32768
alibabacn Qwen2.5 32B Instruct qwen2-5-32b-instruct 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Omni 7B qwen2-5-omni-7b 0.09 0.35 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen Max qwen-max 0.35 1.38 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen Long qwen-long 0.07 0.29 Provider: Alibaba (China), Context: 10000000, Output Limit: 8192
alibabacn Qwen2.5-Math 72B Instruct qwen2-5-math-72b-instruct 0.57 1.72 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Moonshot Kimi K2 Instruct moonshot-kimi-k2-instruct 0.57 2.29 Provider: Alibaba (China), Context: 131072, Output Limit: 131072
alibabacn Tongyi Intent Detect V3 tongyi-intent-detect-v3 0.06 0.14 Provider: Alibaba (China), Context: 8192, Output Limit: 1024
alibabacn Qwen2.5 7B Instruct qwen2-5-7b-instruct 0.07 0.14 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-VL 7B Instruct qwen2-5-vl-7b-instruct 0.29 0.72 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek V3.1 deepseek-v3-1 0.57 1.72 Provider: Alibaba (China), Context: 131072, Output Limit: 65536
alibabacn DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.29 0.86 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3 235B-A22B qwen3-235b-a22b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen2.5-Coder 7B Instruct qwen2-5-coder-7b-instruct 0.14 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn DeepSeek R1 Distill Qwen 14B deepseek-r1-distill-qwen-14b 0.14 0.43 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen-Omni Turbo Realtime qwen-omni-turbo-realtime 0.23 0.92 Provider: Alibaba (China), Context: 32768, Output Limit: 2048
alibabacn Qwen Math Turbo qwen-math-turbo 0.29 0.86 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen-MT Turbo qwen-mt-turbo 0.10 0.28 Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn DeepSeek R1 Distill Llama 8B deepseek-r1-distill-llama-8b 0.00 0.00 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3-Coder 480B-A35B Instruct qwen3-coder-480b-a35b-instruct 0.86 3.44 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn Qwen-MT Plus qwen-mt-plus 0.26 0.78 Provider: Alibaba (China), Context: 16384, Output Limit: 8192
alibabacn Qwen3 Max qwen3-max 0.86 3.44 Provider: Alibaba (China), Context: 262144, Output Limit: 65536
alibabacn QwQ 32B qwq-32b 0.29 0.86 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen2.5-Math 7B Instruct qwen2-5-math-7b-instruct 0.14 0.29 Provider: Alibaba (China), Context: 4096, Output Limit: 3072
alibabacn Qwen3-Next 80B-A3B (Thinking) qwen3-next-80b-a3b-thinking 0.14 1.43 Provider: Alibaba (China), Context: 131072, Output Limit: 32768
alibabacn DeepSeek R1 Distill Qwen 1.5B deepseek-r1-distill-qwen-1-5b 0.00 0.00 Provider: Alibaba (China), Context: 32768, Output Limit: 16384
alibabacn Qwen3 32B qwen3-32b 0.29 1.15 Provider: Alibaba (China), Context: 131072, Output Limit: 16384
alibabacn Qwen-VL Plus qwen-vl-plus 0.12 0.29 Provider: Alibaba (China), Context: 131072, Output Limit: 8192
alibabacn Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Provider: Alibaba (China), Context: 1048576, Output Limit: 65536
googlevertexanthropic Claude Opus 4.5 claude-opus-4-5@20251101 5.00 25.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 3.5 v2 claude-3-5-sonnet@20241022 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic Claude Haiku 3.5 claude-3-5-haiku@20241022 0.80 4.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 8192
googlevertexanthropic Claude Sonnet 4 claude-sonnet-4@20250514 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 4.5 claude-sonnet-4-5@20250929 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Opus 4.1 claude-opus-4-1@20250805 15.00 75.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
googlevertexanthropic Claude Haiku 4.5 claude-haiku-4-5@20251001 1.00 5.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Sonnet 3.7 claude-3-7-sonnet@20250219 3.00 15.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 64000
googlevertexanthropic Claude Opus 4 claude-opus-4@20250514 15.00 75.00 Provider: Vertex (Anthropic), Context: 200000, Output Limit: 32000
venice Grok 4.1 Fast grok-41-fast 0.50 1.25 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Qwen 3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.15 0.75 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Gemini 3 Flash Preview gemini-3-flash-preview 0.70 3.75 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Claude Opus 4.5 claude-opus-45 6.00 30.00 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Venice Medium mistral-31-24b 0.50 2.00 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Grok Code Fast 1 grok-code-fast-1 0.25 1.87 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice GLM 4.7 zai-org-glm-4.7 0.85 2.75 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Venice Uncensored 1.1 venice-uncensored 0.20 0.90 Provider: Venice AI, Context: 32768, Output Limit: 8192
venice Gemini 3 Pro Preview gemini-3-pro-preview 2.50 15.00 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice GPT-5.2 openai-gpt-52 2.19 17.50 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Venice Small qwen3-4b 0.05 0.15 Provider: Venice AI, Context: 32768, Output Limit: 8192
venice Llama 3.3 70B llama-3.3-70b 0.70 2.80 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice OpenAI GPT OSS 120B openai-gpt-oss-120b 0.07 0.30 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Kimi K2 Thinking kimi-k2-thinking 0.75 3.20 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice Qwen 3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.45 3.50 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Llama 3.2 3B llama-3.2-3b 0.15 0.60 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice Google Gemma 3 27B Instruct google-gemma-3-27b-it 0.12 0.20 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Hermes 3 Llama 3.1 405b hermes-3-llama-3.1-405b 1.10 3.00 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice GLM 4.6V zai-org-glm-4.6v 0.39 1.13 Provider: Venice AI, Context: 131072, Output Limit: 32768
venice MiniMax M2.1 minimax-m21 0.40 1.60 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Qwen 3 Next 80b qwen3-next-80b 0.35 1.90 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice GLM 4.6 zai-org-glm-4.6 0.85 2.75 Provider: Venice AI, Context: 202752, Output Limit: 50688
venice Qwen 3 Coder 480b qwen3-coder-480b-a35b-instruct 0.75 3.00 Provider: Venice AI, Context: 262144, Output Limit: 65536
venice DeepSeek V3.2 deepseek-v3.2 0.40 1.00 Provider: Venice AI, Context: 163840, Output Limit: 40960
siliconflowcn inclusionAI/Ring-flash-2.0 ring-flash-2.0 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn inclusionAI/Ling-flash-2.0 ling-flash-2.0 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn inclusionAI/Ling-mini-2.0 ling-mini-2.0 0.07 0.28 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn moonshotai/Kimi-K2-Thinking kimi-k2-thinking 0.55 2.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn moonshotai/Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.40 2.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn moonshotai/Kimi-Dev-72B kimi-dev-72b 0.29 1.15 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn moonshotai/Kimi-K2-Instruct kimi-k2-instruct 0.58 2.29 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn tencent/Hunyuan-A13B-Instruct hunyuan-a13b-instruct 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn tencent/Hunyuan-MT-7B hunyuan-mt-7b 0.00 0.00 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn MiniMaxAI/MiniMax-M1-80k minimax-m1-80k 0.55 2.20 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn MiniMaxAI/MiniMax-M2 minimax-m2 0.30 1.20 Provider: SiliconFlow (China), Context: 197000, Output Limit: 131000
siliconflowcn THUDM/GLM-Z1-32B-0414 glm-z1-32b-0414 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn THUDM/GLM-4-9B-0414 glm-4-9b-0414 0.09 0.09 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn THUDM/GLM-Z1-9B-0414 glm-z1-9b-0414 0.09 0.09 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn THUDM/GLM-4.1V-9B-Thinking glm-4.1v-9b-thinking 0.04 0.14 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn THUDM/GLM-4-32B-0414 glm-4-32b-0414 0.27 0.27 Provider: SiliconFlow (China), Context: 33000, Output Limit: 33000
siliconflowcn openai/gpt-oss-120b gpt-oss-120b 0.05 0.45 Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn openai/gpt-oss-20b gpt-oss-20b 0.04 0.18 Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
siliconflowcn stepfun-ai/step3 step3 0.57 1.42 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn nex-agi/DeepSeek-V3.1-Nex-N1 deepseek-v3.1-nex-n1 0.50 2.00 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn baidu/ERNIE-4.5-300B-A47B ernie-4.5-300b-a47b 0.28 1.10 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn z-ai/GLM-4.5-Air glm-4.5-air 0.14 0.86 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn z-ai/GLM-4.5 glm-4.5 0.40 2.00 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn ByteDance-Seed/Seed-OSS-36B-Instruct seed-oss-36b-instruct 0.21 0.57 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn meta-llama/Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.06 0.06 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 0.57 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen2.5-14B-Instruct qwen2.5-14b-instruct 0.10 0.10 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-32B-Instruct qwen3-vl-32b-instruct 0.20 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Thinking qwen3-omni-30b-a3b-thinking 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.13 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-32B-Thinking qwen3-vl-32b-thinking 0.20 1.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-30B-A3B-Thinking qwen3-vl-30b-a3b-thinking 0.29 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-30B-A3B-Instruct-2507 qwen3-30b-a3b-instruct-2507 0.09 0.30 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-235B-A22B-Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 0.25 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-235B-A22B-Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct 0.18 0.68 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-32B qwen3-32b 0.14 0.57 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-VL-7B-Instruct qwen2.5-vl-7b-instruct 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/QwQ-32B qwq-32b 0.15 0.58 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 0.59 0.59 Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-235B-A22B qwen3-235b-a22b 0.35 1.42 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.28 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.59 0.59 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-72B-Instruct-128K qwen2.5-72b-instruct-128k 0.59 0.59 Provider: SiliconFlow (China), Context: 131000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-32B-Instruct qwen2.5-32b-instruct 0.18 0.18 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.18 0.18 Provider: SiliconFlow (China), Context: 33000, Output Limit: 4000
siliconflowcn Qwen/Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-VL-8B-Thinking qwen3-vl-8b-thinking 0.18 2.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Instruct qwen3-omni-30b-a3b-instruct 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen3-8B qwen3-8b 0.06 0.06 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-Omni-30B-A3B-Captioner qwen3-omni-30b-a3b-captioner 0.10 0.40 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn Qwen/Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.27 0.27 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-14B qwen3-14b 0.07 0.28 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-VL-30B-A3B-Instruct qwen3-vl-30b-a3b-instruct 0.29 1.00 Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
siliconflowcn Qwen/Qwen3-30B-A3B-Thinking-2507 qwen3-30b-a3b-thinking-2507 0.09 0.30 Provider: SiliconFlow (China), Context: 262000, Output Limit: 131000
siliconflowcn Qwen/Qwen3-30B-A3B qwen3-30b-a3b 0.09 0.45 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn zai-org/GLM-4.5V glm-4.5v 0.14 0.86 Provider: SiliconFlow (China), Context: 66000, Output Limit: 66000
siliconflowcn zai-org/GLM-4.6 glm-4.6 0.50 1.90 Provider: SiliconFlow (China), Context: 205000, Output Limit: 205000
siliconflowcn deepseek-ai/DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-V3 deepseek-v3 0.25 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.05 0.05 Provider: SiliconFlow (China), Context: 33000, Output Limit: 16000
siliconflowcn deepseek-ai/DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.27 0.41 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.10 0.10 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn deepseek-ai/deepseek-vl2 deepseek-vl2 0.15 0.15 Provider: SiliconFlow (China), Context: 4000, Output Limit: 4000
siliconflowcn deepseek-ai/DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.18 0.18 Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000
siliconflowcn deepseek-ai/DeepSeek-R1 deepseek-r1 0.50 2.18 Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000
chutes Hermes 4.3 36B hermes-4.3-36b 0.10 0.39 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 70B hermes-4-70b 0.11 0.38 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Hermes 4 14B hermes-4-14b 0.01 0.05 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 405B FP8 TEE hermes-4-405b-fp8-tee 0.30 1.20 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Hermes 4 405B FP8 hermes-4-405b-fp8 0.30 1.20 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepHermes 3 Mistral 24B Preview deephermes-3-mistral-24b-preview 0.02 0.10 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes dots.ocr dots.ocr 0.01 0.01 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Kimi K2 Instruct 0905 kimi-k2-instruct-0905 0.39 1.90 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Kimi K2 Thinking TEE kimi-k2-thinking-tee 0.40 1.75 Provider: Chutes, Context: 262144, Output Limit: 65535
chutes MiniMax M2 minimax-m2 0.26 1.02 Provider: Chutes, Context: 196608, Output Limit: 196608
chutes MiniMax M2.1 TEE minimax-m2.1-tee 0.30 1.20 Provider: Chutes, Context: 196608, Output Limit: 65536
chutes NVIDIA Nemotron 3 Nano 30B A3B BF16 nvidia-nemotron-3-nano-30b-a3b-bf16 0.06 0.24 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes QwQ 32B ArliAI RpR v1 qwq-32b-arliai-rpr-v1 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes DeepSeek R1T Chimera deepseek-r1t-chimera 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek TNG R1T2 Chimera deepseek-tng-r1t2-chimera 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes TNG R1T Chimera TEE tng-r1t-chimera-tee 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes MiMo V2 Flash mimo-v2-flash 0.17 0.65 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes InternVL3 78B internvl3-78b 0.10 0.39 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes gpt oss 120b TEE gpt-oss-120b-tee 0.04 0.25 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes gpt oss 20b gpt-oss-20b 0.02 0.10 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Mistral Small 3.1 24B Instruct 2503 mistral-small-3.1-24b-instruct-2503 0.03 0.11 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Mistral Small 3.2 24B Instruct 2506 mistral-small-3.2-24b-instruct-2506 0.06 0.18 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Tongyi DeepResearch 30B A3B tongyi-deepresearch-30b-a3b 0.10 0.39 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes Devstral 2 123B Instruct 2512 devstral-2-123b-instruct-2512 0.05 0.22 Provider: Chutes, Context: 262144, Output Limit: 65536
chutes Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.02 0.04 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes gemma 3 4b it gemma-3-4b-it 0.01 0.03 Provider: Chutes, Context: 96000, Output Limit: 96000
chutes Mistral Small 24B Instruct 2501 mistral-small-24b-instruct-2501 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes gemma 3 12b it gemma-3-12b-it 0.03 0.10 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes gemma 3 27b it gemma-3-27b-it 0.04 0.15 Provider: Chutes, Context: 96000, Output Limit: 96000
chutes Qwen3 30B A3B qwen3-30b-a3b 0.06 0.22 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 14B qwen3-14b 0.05 0.22 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Provider: Chutes, Context: 16384, Output Limit: 16384
chutes Qwen3Guard Gen 0.6B qwen3guard-gen-0.6b 0.01 0.01 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.08 0.55 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.03 0.11 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen2.5 72B Instruct qwen2.5-72b-instruct 0.13 0.52 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen2.5 VL 72B Instruct TEE qwen2.5-vl-72b-instruct-tee 0.15 0.60 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 235B A22B qwen3-235b-a22b 0.30 1.20 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen2.5 VL 72B Instruct qwen2.5-vl-72b-instruct 0.07 0.26 Provider: Chutes, Context: 32768, Output Limit: 32768
chutes Qwen3 235B A22B Instruct 2507 TEE qwen3-235b-a22b-instruct-2507-tee 0.08 0.55 Provider: Chutes, Context: 262144, Output Limit: 65536
chutes Qwen3 32B qwen3-32b 0.08 0.24 Provider: Chutes, Context: 40960, Output Limit: 40960
chutes Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.30 1.20 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 VL 235B A22B Thinking qwen3-vl-235b-a22b-thinking 0.30 1.20 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.08 0.33 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 Coder 480B A35B Instruct FP8 TEE qwen3-coder-480b-a35b-instruct-fp8-tee 0.22 0.95 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.11 0.60 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.10 0.80 Provider: Chutes, Context: 262144, Output Limit: 262144
chutes GLM 4.6 TEE glm-4.6-tee 0.40 1.75 Provider: Chutes, Context: 202752, Output Limit: 65536
chutes GLM 4.5 TEE glm-4.5-tee 0.35 1.55 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes GLM 4.6V glm-4.6v 0.30 0.90 Provider: Chutes, Context: 131072, Output Limit: 65536
chutes GLM 4.7 TEE glm-4.7-tee 0.40 1.50 Provider: Chutes, Context: 202752, Output Limit: 65535
chutes GLM 4.5 Air glm-4.5-air 0.05 0.22 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepSeek V3 0324 TEE deepseek-v3-0324-tee 0.24 0.84 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3.2 Speciale TEE deepseek-v3.2-speciale-tee 0.27 0.41 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3.1 Terminus TEE deepseek-v3.1-terminus-tee 0.23 0.90 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek V3 deepseek-v3 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek R1 TEE deepseek-r1-tee 0.30 1.20 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.11 Provider: Chutes, Context: 131072, Output Limit: 131072
chutes DeepSeek V3.1 deepseek-v3.1 0.20 0.80 Provider: Chutes, Context: 163840, Output Limit: 65536
chutes DeepSeek R1 0528 TEE deepseek-r1-0528-tee 0.40 1.75 Provider: Chutes, Context: 163840, Output Limit: 163840
chutes DeepSeek V3.2 TEE deepseek-v3.2-tee 0.27 0.41 Provider: Chutes, Context: 163840, Output Limit: 16384
chutes DeepSeek V3.1 TEE deepseek-v3.1-tee 0.20 0.80 Provider: Chutes, Context: 163840, Output Limit: 65536
kimiforcoding Kimi K2 Thinking kimi-k2-thinking 0.00 0.00 Provider: Kimi For Coding, Context: 262144, Output Limit: 32768
cortecs Nova Pro 1.0 nova-pro-v1 1.02 4.06 Provider: Cortecs, Context: 300000, Output Limit: 5000
cortecs Devstral 2 2512 devstral-2512 0.00 0.00 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs INTELLECT 3 intellect-3 0.22 1.20 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Claude 4.5 Sonnet claude-4-5-sonnet 3.26 16.30 Provider: Cortecs, Context: 200000, Output Limit: 200000
cortecs DeepSeek V3 0324 deepseek-v3-0324 0.55 1.65 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Kimi K2 Thinking kimi-k2-thinking 0.66 2.73 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Kimi K2 Instruct kimi-k2-instruct 0.55 2.65 Provider: Cortecs, Context: 131000, Output Limit: 131000
cortecs GPT 4.1 gpt-4.1 2.35 9.42 Provider: Cortecs, Context: 1047576, Output Limit: 32768
cortecs Gemini 2.5 Pro gemini-2.5-pro 1.65 11.02 Provider: Cortecs, Context: 1048576, Output Limit: 65535
cortecs GPT Oss 120b gpt-oss-120b 0.00 0.00 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Devstral Small 2 2512 devstral-small-2512 0.00 0.00 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.44 1.98 Provider: Cortecs, Context: 262000, Output Limit: 262000
cortecs Claude Sonnet 4 claude-sonnet-4 3.31 16.54 Provider: Cortecs, Context: 200000, Output Limit: 64000
cortecs Llama 3.1 405B Instruct llama-3.1-405b-instruct 0.00 0.00 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.16 1.31 Provider: Cortecs, Context: 128000, Output Limit: 128000
cortecs Qwen3 32B qwen3-32b 0.10 0.33 Provider: Cortecs, Context: 16384, Output Limit: 16384
githubmodels JAIS 30b Chat jais-30b-chat 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Grok 3 grok-3 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Grok 3 Mini grok-3-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Cohere Command R 08-2024 cohere-command-r-08-2024 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command A cohere-command-a 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R+ 08-2024 cohere-command-r-plus-08-2024 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R cohere-command-r 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Cohere Command R+ cohere-command-r-plus 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels DeepSeek-R1-0528 deepseek-r1-0528 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels DeepSeek-R1 deepseek-r1 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels DeepSeek-V3-0324 deepseek-v3-0324 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Medium 3 (25.05) mistral-medium-2505 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Ministral 3B ministral-3b 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Nemo mistral-nemo 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Mistral Large 24.11 mistral-large-2411 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Codestral 25.01 codestral-2501 0.00 0.00 Provider: GitHub Models, Context: 32000, Output Limit: 8192
githubmodels Mistral Small 3.1 mistral-small-2503 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Phi-3-medium instruct (128k) phi-3-medium-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-mini instruct (4k) phi-3-mini-4k-instruct 0.00 0.00 Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels Phi-3-small instruct (128k) phi-3-small-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3.5-vision instruct (128k) phi-3.5-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4 phi-4 0.00 0.00 Provider: GitHub Models, Context: 16000, Output Limit: 4096
githubmodels Phi-4-mini-reasoning phi-4-mini-reasoning 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-small instruct (8k) phi-3-small-8k-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Phi-3.5-mini instruct (128k) phi-3.5-mini-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4-multimodal-instruct phi-4-multimodal-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-mini instruct (128k) phi-3-mini-128k-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3.5-MoE instruct (128k) phi-3.5-moe-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-4-mini-instruct phi-4-mini-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels Phi-3-medium instruct (4k) phi-3-medium-4k-instruct 0.00 0.00 Provider: GitHub Models, Context: 4096, Output Limit: 1024
githubmodels Phi-4-Reasoning phi-4-reasoning 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 4096
githubmodels MAI-DS-R1 mai-ds-r1 0.00 0.00 Provider: GitHub Models, Context: 65536, Output Limit: 8192
githubmodels GPT-4.1-nano gpt-4.1-nano 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels GPT-4.1-mini gpt-4.1-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels OpenAI o1-preview o1-preview 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels OpenAI o3-mini o3-mini 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels GPT-4o gpt-4o 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels GPT-4.1 gpt-4.1 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels OpenAI o4-mini o4-mini 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels OpenAI o1 o1 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels OpenAI o1-mini o1-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 65536
githubmodels OpenAI o3 o3 0.00 0.00 Provider: GitHub Models, Context: 200000, Output Limit: 100000
githubmodels GPT-4o mini gpt-4o-mini 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 16384
githubmodels Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.00 0.00 Provider: GitHub Models, Context: 8192, Output Limit: 2048
githubmodels Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 8192
githubmodels Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.00 0.00 Provider: GitHub Models, Context: 128000, Output Limit: 32768
githubmodels AI21 Jamba 1.5 Large ai21-jamba-1.5-large 0.00 0.00 Provider: GitHub Models, Context: 256000, Output Limit: 4096
githubmodels AI21 Jamba 1.5 Mini ai21-jamba-1.5-mini 0.00 0.00 Provider: GitHub Models, Context: 256000, Output Limit: 4096
togetherai Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Together AI, Context: 131072, Output Limit: 32768
togetherai Kimi K2 Thinking kimi-k2-thinking 1.20 4.00 Provider: Together AI, Context: 262144, Output Limit: 32768
togetherai Rnj-1 Instruct rnj-1-instruct 0.15 0.15 Provider: Together AI, Context: 32768, Output Limit: 32768
togetherai GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Together AI, Context: 131072, Output Limit: 131072
togetherai Llama 3.3 70B llama-3.3-70b-instruct-turbo 0.88 0.88 Provider: Together AI, Context: 131072, Output Limit: 65536
togetherai Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct-fp8 2.00 2.00 Provider: Together AI, Context: 262144, Output Limit: 65536
togetherai GLM 4.6 glm-4.6 0.60 2.20 Provider: Together AI, Context: 200000, Output Limit: 32768
togetherai DeepSeek R1 deepseek-r1 3.00 7.00 Provider: Together AI, Context: 163839, Output Limit: 12288
togetherai DeepSeek V3 deepseek-v3 1.25 1.25 Provider: Together AI, Context: 131072, Output Limit: 12288
togetherai DeepSeek V3.1 deepseek-v3-1 0.60 1.70 Provider: Together AI, Context: 131072, Output Limit: 12288
azure GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: Azure, Context: 1047576, Output Limit: 32768
azure text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: Azure, Context: 8191, Output Limit: 1536
azure Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Azure, Context: 2000000, Output Limit: 30000
azure DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Azure, Context: 163840, Output Limit: 163840
azure Grok 4 Fast (Reasoning) grok-4-fast-reasoning 0.20 0.50 Provider: Azure, Context: 2000000, Output Limit: 30000
azure Phi-3-medium-instruct (128k) phi-3-medium-128k-instruct 0.17 0.68 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4 gpt-4 60.00 120.00 Provider: Azure, Context: 8192, Output Limit: 8192
azure Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Azure, Context: 200000, Output Limit: 32000
azure GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Provider: Azure, Context: 128000, Output Limit: 8192
azure Embed v4 cohere-embed-v-4-0 0.12 0.00 Provider: Azure, Context: 128000, Output Limit: 1536
azure Command R cohere-command-r-08-2024 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 4000
azure Grok 4 grok-4 3.00 15.00 Provider: Azure, Context: 256000, Output Limit: 64000
azure Embed v3 Multilingual cohere-embed-v3-multilingual 0.10 0.00 Provider: Azure, Context: 512, Output Limit: 1024
azure Phi-4-mini phi-4-mini 0.08 0.30 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4 32K gpt-4-32k 60.00 120.00 Provider: Azure, Context: 32768, Output Limit: 32768
azure Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure DeepSeek-R1 deepseek-r1 1.35 5.40 Provider: Azure, Context: 163840, Output Limit: 163840
azure Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Azure, Context: 256000, Output Limit: 10000
azure GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Phi-3-mini-instruct (4k) phi-3-mini-4k-instruct 0.13 0.52 Provider: Azure, Context: 4096, Output Limit: 1024
azure Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure DeepSeek-V3.2-Speciale deepseek-v3.2-speciale 0.28 0.42 Provider: Azure, Context: 128000, Output Limit: 128000
azure Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Azure, Context: 128000, Output Limit: 128000
azure Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure Phi-3-small-instruct (128k) phi-3-small-128k-instruct 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 4096
azure Command A cohere-command-a 2.50 10.00 Provider: Azure, Context: 256000, Output Limit: 8000
azure Command R+ cohere-command-r-plus-08-2024 2.50 10.00 Provider: Azure, Context: 128000, Output Limit: 4000
azure Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.25 1.00 Provider: Azure, Context: 128000, Output Limit: 8192
azure GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: Azure, Context: 1047576, Output Limit: 32768
azure GPT-5 Chat gpt-5-chat 1.25 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure DeepSeek-V3.1 deepseek-v3.1 0.56 1.68 Provider: Azure, Context: 131072, Output Limit: 131072
azure Phi-4 phi-4 0.13 0.50 Provider: Azure, Context: 128000, Output Limit: 4096
azure Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.30 Provider: Azure, Context: 128000, Output Limit: 4096
azure Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Azure, Context: 200000, Output Limit: 64000
azure GPT-3.5 Turbo 0125 gpt-3.5-turbo-0125 0.50 1.50 Provider: Azure, Context: 16384, Output Limit: 16384
azure Grok 3 grok-3 3.00 15.00 Provider: Azure, Context: 131072, Output Limit: 8192
azure text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: Azure, Context: 8191, Output Limit: 3072
azure Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 2.68 3.54 Provider: Azure, Context: 8192, Output Limit: 2048
azure DeepSeek-V3-0324 deepseek-v3-0324 1.14 4.56 Provider: Azure, Context: 131072, Output Limit: 131072
azure Phi-3-small-instruct (8k) phi-3-small-8k-instruct 0.15 0.60 Provider: Azure, Context: 8192, Output Limit: 2048
azure Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-3.5 Turbo 0613 gpt-3.5-turbo-0613 3.00 4.00 Provider: Azure, Context: 16384, Output Limit: 16384
azure Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Provider: Azure, Context: 128000, Output Limit: 4096
azure o1-preview o1-preview 16.50 66.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Azure, Context: 262144, Output Limit: 262144
azure Model Router model-router 0.14 0.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure o3-mini o3-mini 1.10 4.40 Provider: Azure, Context: 200000, Output Limit: 100000
azure GPT-5.1 gpt-5.1 1.25 10.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Azure, Context: 272000, Output Limit: 128000
azure GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Provider: Azure, Context: 128000, Output Limit: 8192
azure Phi-3-mini-instruct (128k) phi-3-mini-128k-instruct 0.13 0.52 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-4o gpt-4o 2.50 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure GPT-3.5 Turbo 0301 gpt-3.5-turbo-0301 1.50 2.00 Provider: Azure, Context: 4096, Output Limit: 4096
azure Ministral 3B ministral-3b 0.04 0.04 Provider: Azure, Context: 128000, Output Limit: 8192
azure GPT-4.1 gpt-4.1 2.00 8.00 Provider: Azure, Context: 1047576, Output Limit: 32768
azure o4-mini o4-mini 1.10 4.40 Provider: Azure, Context: 200000, Output Limit: 100000
azure Phi-4-multimodal phi-4-multimodal 0.08 0.32 Provider: Azure, Context: 128000, Output Limit: 4096
azure Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.30 0.61 Provider: Azure, Context: 8192, Output Limit: 2048
azure o1 o1 15.00 60.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Azure, Context: 131072, Output Limit: 8192
azure GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: Azure, Context: 128000, Output Limit: 16384
azure Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Provider: Azure, Context: 128000, Output Limit: 4096
azure GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure o1-mini o1-mini 1.10 4.40 Provider: Azure, Context: 128000, Output Limit: 65536
azure Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Provider: Azure, Context: 128000, Output Limit: 8192
azure Embed v3 English cohere-embed-v3-english 0.10 0.00 Provider: Azure, Context: 512, Output Limit: 1024
azure text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: Azure, Context: 8192, Output Limit: 1536
azure Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Provider: Azure, Context: 4096, Output Limit: 4096
azure Mistral Nemo mistral-nemo 0.15 0.15 Provider: Azure, Context: 128000, Output Limit: 128000
azure o3 o3 2.00 8.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Codex Mini codex-mini 1.50 6.00 Provider: Azure, Context: 200000, Output Limit: 100000
azure Phi-3-medium-instruct (4k) phi-3-medium-4k-instruct 0.17 0.68 Provider: Azure, Context: 4096, Output Limit: 1024
azure Phi-4-reasoning phi-4-reasoning 0.13 0.50 Provider: Azure, Context: 32000, Output Limit: 4096
azure GPT-4 Turbo Vision gpt-4-turbo-vision 10.00 30.00 Provider: Azure, Context: 128000, Output Limit: 4096
azure Phi-4-reasoning-plus phi-4-reasoning-plus 0.13 0.50 Provider: Azure, Context: 32000, Output Limit: 4096
azure GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Azure, Context: 128000, Output Limit: 16384
azure GPT-5 gpt-5 1.25 10.00 Provider: Azure, Context: 272000, Output Limit: 128000
azure MAI-DS-R1 mai-ds-r1 1.35 5.40 Provider: Azure, Context: 128000, Output Limit: 8192
azure DeepSeek-V3.2 deepseek-v3.2 0.28 0.42 Provider: Azure, Context: 128000, Output Limit: 128000
azure GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Azure, Context: 400000, Output Limit: 272000
azure Mistral Large 24.11 mistral-large-2411 2.00 6.00 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-5.2 gpt-5.2 1.75 14.00 Provider: Azure, Context: 400000, Output Limit: 128000
azure Codestral 25.01 codestral-2501 0.30 0.90 Provider: Azure, Context: 256000, Output Limit: 256000
azure Mistral Small 3.1 mistral-small-2503 0.10 0.30 Provider: Azure, Context: 128000, Output Limit: 32768
azure GPT-3.5 Turbo 1106 gpt-3.5-turbo-1106 1.00 2.00 Provider: Azure, Context: 16384, Output Limit: 16384
baseten Kimi K2 Instruct 0905 kimi-k2-instruct-0905 0.60 2.50 Provider: Baseten, Context: 262144, Output Limit: 262144
baseten Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Baseten, Context: 262144, Output Limit: 262144
baseten Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.38 1.53 Provider: Baseten, Context: 262144, Output Limit: 65536
baseten GLM-4.7 glm-4.7 0.60 2.20 Provider: Baseten, Context: 204800, Output Limit: 131072
baseten GLM 4.6 glm-4.6 0.60 2.20 Provider: Baseten, Context: 200000, Output Limit: 200000
baseten DeepSeek V3.2 deepseek-v3.2 0.30 0.45 Provider: Baseten, Context: 163800, Output Limit: 131100
siliconflow inclusionAI/Ling-mini-2.0 ling-mini-2.0 0.07 0.28 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow inclusionAI/Ling-flash-2.0 ling-flash-2.0 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow inclusionAI/Ring-flash-2.0 ring-flash-2.0 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-K2-Instruct kimi-k2-instruct 0.58 2.29 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-Dev-72B kimi-dev-72b 0.29 1.15 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow moonshotai/Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.40 2.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow moonshotai/Kimi-K2-Thinking kimi-k2-thinking 0.55 2.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow tencent/Hunyuan-MT-7B hunyuan-mt-7b 0.00 0.00 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow tencent/Hunyuan-A13B-Instruct hunyuan-a13b-instruct 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow MiniMaxAI/MiniMax-M2 minimax-m2 0.30 1.20 Provider: SiliconFlow, Context: 197000, Output Limit: 131000
siliconflow MiniMaxAI/MiniMax-M1-80k minimax-m1-80k 0.55 2.20 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow THUDM/GLM-4-32B-0414 glm-4-32b-0414 0.27 0.27 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow THUDM/GLM-4.1V-9B-Thinking glm-4.1v-9b-thinking 0.04 0.14 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow THUDM/GLM-Z1-9B-0414 glm-z1-9b-0414 0.09 0.09 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow THUDM/GLM-4-9B-0414 glm-4-9b-0414 0.09 0.09 Provider: SiliconFlow, Context: 33000, Output Limit: 33000
siliconflow THUDM/GLM-Z1-32B-0414 glm-z1-32b-0414 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow openai/gpt-oss-20b gpt-oss-20b 0.04 0.18 Provider: SiliconFlow, Context: 131000, Output Limit: 8000
siliconflow openai/gpt-oss-120b gpt-oss-120b 0.05 0.45 Provider: SiliconFlow, Context: 131000, Output Limit: 8000
siliconflow stepfun-ai/step3 step3 0.57 1.42 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow nex-agi/DeepSeek-V3.1-Nex-N1 deepseek-v3.1-nex-n1 0.50 2.00 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow baidu/ERNIE-4.5-300B-A47B ernie-4.5-300b-a47b 0.28 1.10 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow z-ai/GLM-4.5 glm-4.5 0.40 2.00 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow z-ai/GLM-4.5-Air glm-4.5-air 0.14 0.86 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow ByteDance-Seed/Seed-OSS-36B-Instruct seed-oss-36b-instruct 0.21 0.57 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow meta-llama/Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.06 0.06 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-30B-A3B qwen3-30b-a3b 0.09 0.45 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-30B-A3B-Thinking-2507 qwen3-30b-a3b-thinking-2507 0.09 0.30 Provider: SiliconFlow, Context: 262000, Output Limit: 131000
siliconflow Qwen/Qwen3-VL-30B-A3B-Instruct qwen3-vl-30b-a3b-instruct 0.29 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-14B qwen3-14b 0.07 0.28 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.27 0.27 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Captioner qwen3-omni-30b-a3b-captioner 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-8B qwen3-8b 0.06 0.06 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Instruct qwen3-omni-30b-a3b-instruct 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-VL-8B-Thinking qwen3-vl-8b-thinking 0.18 2.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.18 0.18 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-32B-Instruct qwen2.5-32b-instruct 0.18 0.18 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-72B-Instruct-128K qwen2.5-72b-instruct-128k 0.59 0.59 Provider: SiliconFlow, Context: 131000, Output Limit: 4000
siliconflow Qwen/Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.59 0.59 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.28 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-235B-A22B qwen3-235b-a22b 0.35 1.42 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 0.59 0.59 Provider: SiliconFlow, Context: 131000, Output Limit: 4000
siliconflow Qwen/QwQ-32B qwq-32b 0.15 0.58 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen2.5-VL-7B-Instruct qwen2.5-vl-7b-instruct 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-32B qwen3-32b 0.14 0.57 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow Qwen/Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct 0.18 0.68 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-235B-A22B-Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 0.25 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-235B-A22B-Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-30B-A3B-Instruct-2507 qwen3-30b-a3b-instruct-2507 0.09 0.30 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-30B-A3B-Thinking qwen3-vl-30b-a3b-thinking 0.29 1.00 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-VL-32B-Thinking qwen3-vl-32b-thinking 0.20 1.50 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.13 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Omni-30B-A3B-Thinking qwen3-omni-30b-a3b-thinking 0.10 0.40 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow Qwen/Qwen3-VL-32B-Instruct qwen3-vl-32b-instruct 0.20 0.60 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow Qwen/Qwen2.5-14B-Instruct qwen2.5-14b-instruct 0.10 0.10 Provider: SiliconFlow, Context: 33000, Output Limit: 4000
siliconflow Qwen/Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 0.57 Provider: SiliconFlow, Context: 262000, Output Limit: 262000
siliconflow zai-org/GLM-4.6 glm-4.6 0.50 1.90 Provider: SiliconFlow, Context: 205000, Output Limit: 205000
siliconflow zai-org/GLM-4.5V glm-4.5v 0.14 0.86 Provider: SiliconFlow, Context: 66000, Output Limit: 66000
siliconflow deepseek-ai/DeepSeek-R1 deepseek-r1 0.50 2.18 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.18 0.18 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow deepseek-ai/deepseek-vl2 deepseek-vl2 0.15 0.15 Provider: SiliconFlow, Context: 4000, Output Limit: 4000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.10 0.10 Provider: SiliconFlow, Context: 131000, Output Limit: 131000
siliconflow deepseek-ai/DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.27 0.41 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.05 0.05 Provider: SiliconFlow, Context: 33000, Output Limit: 16000
siliconflow deepseek-ai/DeepSeek-V3 deepseek-v3 0.25 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
siliconflow deepseek-ai/DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Provider: SiliconFlow, Context: 164000, Output Limit: 164000
helicone OpenAI GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone xAI Grok 4 Fast Non-Reasoning grok-4-fast-non-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone Qwen3 Coder 480B A35B Instruct Turbo qwen3-coder 0.22 0.95 Provider: Helicone, Context: 262144, Output Limit: 16384
helicone DeepSeek V3 deepseek-v3 0.56 1.68 Provider: Helicone, Context: 128000, Output Limit: 8192
helicone Anthropic: Claude Opus 4 claude-opus-4 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone xAI: Grok 4 Fast Reasoning grok-4-fast-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone Meta Llama 3.1 8B Instant llama-3.1-8b-instant 0.05 0.08 Provider: Helicone, Context: 131072, Output Limit: 32768
helicone Anthropic: Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone xAI Grok 4 grok-4 3.00 15.00 Provider: Helicone, Context: 256000, Output Limit: 256000
helicone Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Provider: Helicone, Context: 262000, Output Limit: 16384
helicone Meta Llama 4 Maverick 17B 128E llama-4-maverick 0.15 0.60 Provider: Helicone, Context: 131072, Output Limit: 8192
helicone Meta Llama Prompt Guard 2 86M llama-prompt-guard-2-86m 0.01 0.01 Provider: Helicone, Context: 512, Output Limit: 2
helicone xAI Grok 4.1 Fast Reasoning grok-4-1-fast-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 2000000
helicone xAI Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Helicone, Context: 256000, Output Limit: 10000
helicone Anthropic: Claude 4.5 Haiku claude-4.5-haiku 1.00 5.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Meta Llama 3.1 8B Instruct Turbo llama-3.1-8b-instruct-turbo 0.02 0.03 Provider: Helicone, Context: 128000, Output Limit: 128000
helicone OpenAI: GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI GPT-4.1 Mini gpt-4.1-mini-2025-04-14 0.40 1.60 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone Meta Llama Guard 4 12B llama-guard-4 0.21 0.21 Provider: Helicone, Context: 131072, Output Limit: 1024
helicone Meta Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.02 0.05 Provider: Helicone, Context: 16384, Output Limit: 16384
helicone Google Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Helicone, Context: 1048576, Output Limit: 65536
helicone Google Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Helicone, Context: 1048576, Output Limit: 65535
helicone OpenAI GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.27 1.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Meta Llama Prompt Guard 2 22M llama-prompt-guard-2-22m 0.01 0.01 Provider: Helicone, Context: 512, Output Limit: 2
helicone Anthropic: Claude 3.5 Sonnet v2 claude-3.5-sonnet-v2 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Perplexity Sonar Deep Research sonar-deep-research 2.00 8.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Google Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Helicone, Context: 1048576, Output Limit: 65535
helicone Anthropic: Claude Sonnet 4.5 (20250929) claude-sonnet-4-5-20250929 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone xAI Grok 3 grok-3 3.00 15.00 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Mistral Small mistral-small 75.00 200.00 Provider: Helicone, Context: 128000, Output Limit: 128000
helicone Kimi K2 (07/11) kimi-k2-0711 0.57 2.30 Provider: Helicone, Context: 131072, Output Limit: 16384
helicone OpenAI ChatGPT-4o chatgpt-4o-latest 5.00 20.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.10 0.30 Provider: Helicone, Context: 262144, Output Limit: 262144
helicone Kimi K2 (09/05) kimi-k2-0905 0.50 2.00 Provider: Helicone, Context: 262144, Output Limit: 16384
helicone Perplexity Sonar Reasoning sonar-reasoning 1.00 5.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Meta Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.13 0.39 Provider: Helicone, Context: 128000, Output Limit: 16400
helicone OpenAI: GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone Kimi K2 Thinking kimi-k2-thinking 0.48 2.00 Provider: Helicone, Context: 256000, Output Limit: 262144
helicone OpenAI o3 Mini o3-mini 1.10 4.40 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Anthropic: Claude Sonnet 4.5 claude-4.5-sonnet 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone OpenAI GPT-5.1 gpt-5.1 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI Codex Mini Latest codex-mini-latest 1.50 6.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone OpenAI GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI: GPT-5 Codex gpt-5-codex 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone OpenAI GPT-4o gpt-4o 2.50 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone DeepSeek TNG R1T2 Chimera deepseek-tng-r1t2-chimera 0.30 1.20 Provider: Helicone, Context: 130000, Output Limit: 163840
helicone Anthropic: Claude Opus 4.5 claude-4.5-opus 5.00 25.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone OpenAI GPT-4.1 gpt-4.1 2.00 8.00 Provider: Helicone, Context: 1047576, Output Limit: 32768
helicone Perplexity Sonar sonar 1.00 1.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone Zai GLM-4.6 glm-4.6 0.45 1.50 Provider: Helicone, Context: 204800, Output Limit: 131072
helicone OpenAI o4 Mini o4-mini 1.10 4.40 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Qwen3 235B A22B Thinking qwen3-235b-a22b-thinking 0.30 2.90 Provider: Helicone, Context: 262144, Output Limit: 81920
helicone Hermes 2 Pro Llama 3 8B hermes-2-pro-llama-3-8b 0.14 0.14 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone OpenAI: o1 o1 15.00 60.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone xAI Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Perplexity Sonar Pro sonar-pro 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 4096
helicone OpenAI GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.13 Provider: Helicone, Context: 128000, Output Limit: 4096
helicone OpenAI: o1-mini o1-mini 1.10 4.40 Provider: Helicone, Context: 128000, Output Limit: 65536
helicone Anthropic: Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone Anthropic: Claude 3 Haiku claude-3-haiku-20240307 0.25 1.25 Provider: Helicone, Context: 200000, Output Limit: 4096
helicone OpenAI o3 Pro o3-pro 20.00 80.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone Qwen2.5 Coder 7B fast qwen2.5-coder-7b-fast 0.03 0.09 Provider: Helicone, Context: 32000, Output Limit: 8192
helicone DeepSeek Reasoner deepseek-reasoner 0.56 1.68 Provider: Helicone, Context: 128000, Output Limit: 64000
helicone Google Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Helicone, Context: 1048576, Output Limit: 65536
helicone Google Gemma 3 12B gemma-3-12b-it 0.05 0.10 Provider: Helicone, Context: 131072, Output Limit: 8192
helicone Mistral Nemo mistral-nemo 20.00 40.00 Provider: Helicone, Context: 128000, Output Limit: 16400
helicone OpenAI o3 o3 2.00 8.00 Provider: Helicone, Context: 200000, Output Limit: 100000
helicone OpenAI GPT-OSS 20b gpt-oss-20b 0.05 0.20 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone OpenAI GPT-OSS 120b gpt-oss-120b 0.04 0.16 Provider: Helicone, Context: 131072, Output Limit: 131072
helicone Anthropic: Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone OpenAI GPT-5 Chat Latest gpt-5-chat-latest 1.25 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone OpenAI GPT-4o-mini gpt-4o-mini 0.15 0.60 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Google Gemma 2 gemma2-9b-it 0.01 0.03 Provider: Helicone, Context: 8192, Output Limit: 8192
helicone Anthropic: Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: Helicone, Context: 200000, Output Limit: 64000
helicone Perplexity Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Provider: Helicone, Context: 127000, Output Limit: 4096
helicone OpenAI GPT-5 gpt-5 1.25 10.00 Provider: Helicone, Context: 400000, Output Limit: 128000
helicone Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.30 1.50 Provider: Helicone, Context: 256000, Output Limit: 16384
helicone Qwen3 30B A3B qwen3-30b-a3b 0.08 0.29 Provider: Helicone, Context: 41000, Output Limit: 41000
helicone DeepSeek V3.2 deepseek-v3.2 0.27 0.41 Provider: Helicone, Context: 163840, Output Limit: 65536
helicone xAI Grok 4.1 Fast Non-Reasoning grok-4-1-fast-non-reasoning 0.20 0.50 Provider: Helicone, Context: 2000000, Output Limit: 30000
helicone OpenAI: GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Helicone, Context: 128000, Output Limit: 32768
helicone Meta Llama 3.3 70B Versatile llama-3.3-70b-versatile 0.59 0.79 Provider: Helicone, Context: 131072, Output Limit: 32768
helicone Mistral-Large mistral-large-2411 2.00 6.00 Provider: Helicone, Context: 128000, Output Limit: 32768
helicone Anthropic: Claude Opus 4.1 (20250805) claude-opus-4-1-20250805 15.00 75.00 Provider: Helicone, Context: 200000, Output Limit: 32000
helicone Baidu Ernie 4.5 21B A3B Thinking ernie-4.5-21b-a3b-thinking 0.07 0.28 Provider: Helicone, Context: 128000, Output Limit: 8000
helicone OpenAI GPT-5.1 Chat gpt-5.1-chat-latest 1.25 10.00 Provider: Helicone, Context: 128000, Output Limit: 16384
helicone Qwen3 32B qwen3-32b 0.29 0.59 Provider: Helicone, Context: 131072, Output Limit: 40960
helicone Anthropic: Claude 4.5 Haiku (20251001) claude-haiku-4-5-20251001 1.00 5.00 Provider: Helicone, Context: 200000, Output Limit: 8192
helicone Meta Llama 4 Scout 17B 16E llama-4-scout 0.08 0.30 Provider: Helicone, Context: 131072, Output Limit: 8192
huggingface Kimi-K2-Instruct kimi-k2-instruct 1.00 3.00 Provider: Hugging Face, Context: 131072, Output Limit: 16384
huggingface Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 1.00 3.00 Provider: Hugging Face, Context: 262144, Output Limit: 16384
huggingface MiniMax-M2 minimax-m2 0.30 1.20 Provider: Hugging Face, Context: 204800, Output Limit: 204800
huggingface Qwen 3 Embedding 8B qwen3-embedding-8b 0.01 0.00 Provider: Hugging Face, Context: 32000, Output Limit: 4096
huggingface Qwen 3 Embedding 4B qwen3-embedding-4b 0.01 0.00 Provider: Hugging Face, Context: 32000, Output Limit: 2048
huggingface Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 2.00 2.00 Provider: Hugging Face, Context: 262144, Output Limit: 65536
huggingface Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.30 3.00 Provider: Hugging Face, Context: 262144, Output Limit: 131072
huggingface Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.25 1.00 Provider: Hugging Face, Context: 262144, Output Limit: 65536
huggingface Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.30 2.00 Provider: Hugging Face, Context: 262144, Output Limit: 131072
huggingface GLM-4.5 glm-4.5 0.60 2.20 Provider: Hugging Face, Context: 131072, Output Limit: 98304
huggingface GLM-4.6 glm-4.6 0.60 2.20 Provider: Hugging Face, Context: 200000, Output Limit: 128000
huggingface GLM-4.5-Air glm-4.5-air 0.20 1.10 Provider: Hugging Face, Context: 128000, Output Limit: 96000
huggingface DeepSeek-V3-0324 deepseek-v3-0324 1.25 1.25 Provider: Hugging Face, Context: 16384, Output Limit: 8192
huggingface DeepSeek-R1-0528 deepseek-r1-0528 3.00 5.00 Provider: Hugging Face, Context: 163840, Output Limit: 163840
opencode Qwen3 Coder qwen3-coder 0.45 1.80 Provider: OpenCode Zen, Context: 262144, Output Limit: 65536
opencode Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 32000
opencode Kimi K2 kimi-k2 0.40 2.50 Provider: OpenCode Zen, Context: 262144, Output Limit: 262144
opencode GPT-5.1 Codex gpt-5.1-codex 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 64000
opencode Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 64000
opencode Gemini 3 Pro gemini-3-pro 2.00 12.00 Provider: OpenCode Zen, Context: 1048576, Output Limit: 65536
opencode Alpha GLM-4.7 alpha-glm-4.7 0.60 2.20 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: OpenCode Zen, Context: 1000000, Output Limit: 64000
opencode GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Alpha GD4 alpha-gd4 0.50 2.00 Provider: OpenCode Zen, Context: 262144, Output Limit: 32768
opencode Kimi K2 Thinking kimi-k2-thinking 0.40 2.50 Provider: OpenCode Zen, Context: 262144, Output Limit: 262144
opencode GPT-5.1 gpt-5.1 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5 Nano gpt-5-nano 0.00 0.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5 Codex gpt-5-codex 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode Big Pickle big-pickle 0.00 0.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 128000
opencode Claude Haiku 3.5 claude-3-5-haiku 0.80 4.00 Provider: OpenCode Zen, Context: 200000, Output Limit: 8192
opencode GLM-4.6 glm-4.6 0.60 2.20 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode GLM-4.7 glm-4.7-free 0.00 0.00 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Grok Code Fast 1 grok-code 0.00 0.00 Provider: OpenCode Zen, Context: 256000, Output Limit: 256000
opencode Gemini 3 Flash gemini-3-flash 0.50 3.00 Provider: OpenCode Zen, Context: 1048576, Output Limit: 65536
opencode GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode MiniMax M2.1 minimax-m2.1-free 0.00 0.00 Provider: OpenCode Zen, Context: 204800, Output Limit: 131072
opencode Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: OpenCode Zen, Context: 1000000, Output Limit: 64000
opencode GPT-5 gpt-5 1.07 8.50 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
opencode GPT-5.2 gpt-5.2 1.75 14.00 Provider: OpenCode Zen, Context: 400000, Output Limit: 128000
fastrouter Kimi K2 kimi-k2 0.55 2.20 Provider: FastRouter, Context: 131072, Output Limit: 32768
fastrouter Grok 4 grok-4 3.00 15.00 Provider: FastRouter, Context: 256000, Output Limit: 64000
fastrouter Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: FastRouter, Context: 1048576, Output Limit: 65536
fastrouter Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: FastRouter, Context: 1048576, Output Limit: 65536
fastrouter GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter GPT-4.1 gpt-4.1 2.00 8.00 Provider: FastRouter, Context: 1047576, Output Limit: 32768
fastrouter GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: FastRouter, Context: 131072, Output Limit: 65536
fastrouter GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: FastRouter, Context: 131072, Output Limit: 32768
fastrouter GPT-5 gpt-5 1.25 10.00 Provider: FastRouter, Context: 400000, Output Limit: 128000
fastrouter Qwen3 Coder qwen3-coder 0.30 1.20 Provider: FastRouter, Context: 262144, Output Limit: 65536
fastrouter Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Provider: FastRouter, Context: 200000, Output Limit: 32000
fastrouter Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: FastRouter, Context: 200000, Output Limit: 64000
fastrouter DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.14 Provider: FastRouter, Context: 131072, Output Limit: 131072
minimax MiniMax-M2 minimax-m2 0.30 1.20 Provider: MiniMax, Context: 196608, Output Limit: 128000
minimax MiniMax-M2.1 minimax-m2.1 0.30 1.20 Provider: MiniMax, Context: 204800, Output Limit: 131072
google Gemini Embedding 001 gemini-embedding-001 0.15 0.00 Provider: Google, Context: 2048, Output Limit: 3072
google Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Image gemini-2.5-flash-image 0.30 30.00 Provider: Google, Context: 32768, Output Limit: 32768
google Gemini 2.5 Flash Preview 05-20 gemini-2.5-flash-preview-05-20 0.15 0.60 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Flash-Lite Latest gemini-flash-lite-latest 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Google, Context: 1000000, Output Limit: 64000
google Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Flash Latest gemini-flash-latest 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts 0.50 10.00 Provider: Google, Context: 8000, Output Limit: 16000
google Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Provider: Google, Context: 1048576, Output Limit: 8192
google Gemini Live 2.5 Flash Preview Native Audio gemini-live-2.5-flash-preview-native-audio 0.50 2.00 Provider: Google, Context: 131072, Output Limit: 65536
google Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 8192
google Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini Live 2.5 Flash gemini-live-2.5-flash 0.50 2.00 Provider: Google, Context: 128000, Output Limit: 8000
google Gemini 2.5 Flash Lite Preview 06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Image (Preview) gemini-2.5-flash-image-preview 0.30 30.00 Provider: Google, Context: 32768, Output Limit: 32768
google Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Flash Preview 04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts 1.00 20.00 Provider: Google, Context: 8000, Output Limit: 16000
google Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 1.5 Flash gemini-1.5-flash 0.08 0.30 Provider: Google, Context: 1000000, Output Limit: 8192
google Gemini 1.5 Flash-8B gemini-1.5-flash-8b 0.04 0.15 Provider: Google, Context: 1000000, Output Limit: 8192
google Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Provider: Google, Context: 1048576, Output Limit: 65536
google Gemini 1.5 Pro gemini-1.5-pro 1.25 5.00 Provider: Google, Context: 1000000, Output Limit: 8192
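The Google rows above are served through the Gemini API. A minimal sketch of calling one of them with the google-genai Python SDK, assuming a GEMINI_API_KEY environment variable (the prompt is illustrative); the usage_metadata token counts are what you would price against the $0.30 / $2.50 per-1M columns of the Gemini 2.5 Flash row:

    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # 1,048,576-token context per the row above
        contents="Summarize the tradeoffs of MoE models in two sentences.",
    )
    print(response.text)
    # Token counts for pricing against the per-1M price columns:
    print(response.usage_metadata.prompt_token_count,
          response.usage_metadata.candidates_token_count)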
googlevertex Gemini Embedding 001 gemini-embedding-001 0.15 0.00 Provider: Vertex, Context: 2048, Output Limit: 3072
googlevertex Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 05-20 gemini-2.5-flash-preview-05-20 0.15 0.60 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini Flash-Lite Latest gemini-flash-lite-latest 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini Flash Latest gemini-flash-latest 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.0 Flash Lite gemini-2.0-flash-lite 0.08 0.30 Provider: Vertex, Context: 1048576, Output Limit: 8192
googlevertex Gemini 2.0 Flash gemini-2.0-flash 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 8192
googlevertex Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Lite Preview 06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Preview 04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Provider: Vertex, Context: 1048576, Output Limit: 65536
googlevertex GPT OSS 120B gpt-oss-120b-maas 0.09 0.36 Provider: Vertex, Context: 131072, Output Limit: 32768
googlevertex GPT OSS 20B gpt-oss-20b-maas 0.07 0.25 Provider: Vertex, Context: 131072, Output Limit: 32768
cloudflareworkersai @hf/thebloke/mistral-7b-instruct-v0.1-awq mistral-7b-instruct-v0.1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/deepgram/aura-1 aura-1 0.02 0.02 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @hf/mistral/mistral-7b-instruct-v0.2 mistral-7b-instruct-v0.2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 3072, Output Limit: 4096
cloudflareworkersai @cf/tinyllama/tinyllama-1.1b-chat-v1.0 tinyllama-1.1b-chat-v1.0 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: 2048
cloudflareworkersai @cf/qwen/qwen1.5-0.5b-chat qwen1.5-0.5b-chat 0.00 0.00 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/meta/llama-3.2-11b-vision-instruct llama-3.2-11b-vision-instruct 0.05 0.68 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @hf/thebloke/llama-2-13b-chat-awq llama-2-13b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-fp8 llama-3.1-8b-instruct-fp8 0.15 0.29 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/openai/whisper whisper 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/stabilityai/stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-2-7b-chat-fp16 llama-2-7b-chat-fp16 0.56 6.67 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/microsoft/resnet-50 resnet-50 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/runwayml/stable-diffusion-v1-5-inpainting stable-diffusion-v1-5-inpainting 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/defog/sqlcoder-7b-2 sqlcoder-7b-2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 10000, Output Limit: 10000
cloudflareworkersai @cf/meta/llama-3-8b-instruct llama-3-8b-instruct 0.28 0.83 Provider: Cloudflare Workers AI, Context: 7968, Output Limit: 7968
cloudflareworkersai @cf/meta-llama/llama-2-7b-chat-hf-lora llama-2-7b-chat-hf-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct llama-3.1-8b-instruct 0.28 0.83 Provider: Cloudflare Workers AI, Context: 7968, Output Limit: 7968
cloudflareworkersai @cf/openchat/openchat-3.5-0106 openchat-3.5-0106 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @hf/thebloke/openhermes-2.5-mistral-7b-awq openhermes-2.5-mistral-7b-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/leonardo/lucid-origin lucid-origin 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/facebook/bart-large-cnn bart-large-cnn 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/black-forest-labs/flux-1-schnell flux-1-schnell 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: N/A
cloudflareworkersai @cf/deepseek-ai/deepseek-r1-distill-qwen-32b deepseek-r1-distill-qwen-32b 0.50 4.88 Provider: Cloudflare Workers AI, Context: 80000, Output Limit: 80000
cloudflareworkersai @cf/google/gemma-2b-it-lora gemma-2b-it-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/fblgit/una-cybertron-7b-v2-bf16 una-cybertron-7b-v2-bf16 0.00 0.00 Provider: Cloudflare Workers AI, Context: 15000, Output Limit: 15000
cloudflareworkersai @cf/aisingapore/gemma-sea-lion-v4-27b-it gemma-sea-lion-v4-27b-it 0.35 0.56 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: N/A
cloudflareworkersai @cf/meta/m2m100-1.2b m2m100-1.2b 0.34 0.34 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-3.2-3b-instruct llama-3.2-3b-instruct 0.05 0.34 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/qwen/qwen2.5-coder-32b-instruct qwen2.5-coder-32b-instruct 0.66 1.00 Provider: Cloudflare Workers AI, Context: 32768, Output Limit: 32768
cloudflareworkersai @cf/runwayml/stable-diffusion-v1-5-img2img stable-diffusion-v1-5-img2img 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/google/gemma-7b-it-lora gemma-7b-it-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 3500, Output Limit: 3500
cloudflareworkersai @cf/qwen/qwen1.5-14b-chat-awq qwen1.5-14b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 7500, Output Limit: 7500
cloudflareworkersai @cf/qwen/qwen1.5-1.8b-chat qwen1.5-1.8b-chat 0.00 0.00 Provider: Cloudflare Workers AI, Context: 32000, Output Limit: 32000
cloudflareworkersai @cf/mistralai/mistral-small-3.1-24b-instruct mistral-small-3.1-24b-instruct 0.35 0.56 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @hf/google/gemma-7b-it gemma-7b-it 0.00 0.00 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/qwen/qwen3-30b-a3b-fp8 qwen3-30b-a3b-fp8 0.05 0.34 Provider: Cloudflare Workers AI, Context: 32768, Output Limit: N/A
cloudflareworkersai @hf/thebloke/llamaguard-7b-awq llamaguard-7b-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/nousresearch/hermes-2-pro-mistral-7b hermes-2-pro-mistral-7b 0.00 0.00 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/ibm-granite/granite-4.0-h-micro granite-4.0-h-micro 0.02 0.11 Provider: Cloudflare Workers AI, Context: 131000, Output Limit: N/A
cloudflareworkersai @cf/tiiuae/falcon-7b-instruct falcon-7b-instruct 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.3-70b-instruct-fp8-fast llama-3.3-70b-instruct-fp8-fast 0.29 2.25 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/meta/llama-3-8b-instruct-awq llama-3-8b-instruct-awq 0.12 0.27 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/leonardo/phoenix-1.0 phoenix-1.0 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/microsoft/phi-2 phi-2 0.00 0.00 Provider: Cloudflare Workers AI, Context: 2048, Output Limit: 2048
cloudflareworkersai @cf/lykon/dreamshaper-8-lcm dreamshaper-8-lcm 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/thebloke/discolm-german-7b-v1-awq discolm-german-7b-v1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-2-7b-chat-int8 llama-2-7b-chat-int8 0.56 6.67 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/meta/llama-3.2-1b-instruct llama-3.2-1b-instruct 0.03 0.20 Provider: Cloudflare Workers AI, Context: 60000, Output Limit: 60000
cloudflareworkersai @cf/openai/whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-4-scout-17b-16e-instruct llama-4-scout-17b-16e-instruct 0.27 0.85 Provider: Cloudflare Workers AI, Context: 131000, Output Limit: 131000
cloudflareworkersai @hf/nexusflow/starling-lm-7b-beta starling-lm-7b-beta 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/thebloke/deepseek-coder-6.7b-base-awq deepseek-coder-6.7b-base-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/google/gemma-3-12b-it gemma-3-12b-it 0.35 0.56 Provider: Cloudflare Workers AI, Context: 80000, Output Limit: 80000
cloudflareworkersai @cf/meta/llama-guard-3-8b llama-guard-3-8b 0.48 0.03 Provider: Cloudflare Workers AI, Context: 131072, Output Limit: N/A
cloudflareworkersai @hf/thebloke/neural-chat-7b-v3-1-awq neural-chat-7b-v3-1-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/openai/whisper-tiny-en whisper-tiny-en 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/bytedance/stable-diffusion-xl-lightning stable-diffusion-xl-lightning 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/mistral/mistral-7b-instruct-v0.1 mistral-7b-instruct-v0.1 0.11 0.19 Provider: Cloudflare Workers AI, Context: 2824, Output Limit: 2824
cloudflareworkersai @cf/llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/openai/gpt-oss-20b gpt-oss-20b 0.20 0.30 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/deepseek-ai/deepseek-math-7b-instruct deepseek-math-7b-instruct 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/openai/gpt-oss-120b gpt-oss-120b 0.35 0.75 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/myshell-ai/melotts melotts 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/qwen/qwen1.5-7b-chat-awq qwen1.5-7b-chat-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 20000, Output Limit: 20000
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-fast llama-3.1-8b-instruct-fast 0.05 0.38 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareworkersai @cf/deepgram/nova-3 nova-3 0.01 0.01 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
cloudflareworkersai @cf/meta/llama-3.1-70b-instruct llama-3.1-70b-instruct 0.29 2.25 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @cf/qwen/qwq-32b qwq-32b 0.66 1.00 Provider: Cloudflare Workers AI, Context: 24000, Output Limit: 24000
cloudflareworkersai @hf/thebloke/zephyr-7b-beta-awq zephyr-7b-beta-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @hf/thebloke/deepseek-coder-6.7b-instruct-awq deepseek-coder-6.7b-instruct-awq 0.00 0.00 Provider: Cloudflare Workers AI, Context: 4096, Output Limit: 4096
cloudflareworkersai @cf/meta/llama-3.1-8b-instruct-awq llama-3.1-8b-instruct-awq 0.12 0.27 Provider: Cloudflare Workers AI, Context: 8192, Output Limit: 8192
cloudflareworkersai @cf/mistral/mistral-7b-instruct-v0.2-lora mistral-7b-instruct-v0.2-lora 0.00 0.00 Provider: Cloudflare Workers AI, Context: 15000, Output Limit: 15000
cloudflareworkersai @cf/unum/uform-gen2-qwen-500m uform-gen2-qwen-500m 0.00 0.00 Provider: Cloudflare Workers AI, Context: N/A, Output Limit: N/A
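Workers AI models are addressed by the full "@cf/..." or "@hf/..." path shown in the name column. A minimal REST sketch with the requests library, assuming CF_ACCOUNT_ID and CF_API_TOKEN hold your own account ID and API token:

    import os
    import requests

    ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
    API_TOKEN = os.environ["CF_API_TOKEN"]

    # Model path exactly as listed above (Llama 3.1 8B Instruct here).
    url = (f"https://api.cloudflare.com/client/v4/accounts/"
           f"{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct")
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"messages": [{"role": "user", "content": "Hello!"}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["result"]["response"])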
inception Mercury Coder mercury-coder 0.25 1.00 Provider: Inception, Context: 128000, Output Limit: 16384
inception Mercury mercury 0.25 1.00 Provider: Inception, Context: 128000, Output Limit: 16384
wandb Kimi-K2-Instruct kimi-k2-instruct 1.35 4.00 Provider: Weights & Biases, Context: 128000, Output Limit: 16384
wandb Phi-4-mini-instruct phi-4-mini-instruct 0.08 0.35 Provider: Weights & Biases, Context: 128000, Output Limit: 4096
wandb Meta-Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.22 0.22 Provider: Weights & Biases, Context: 128000, Output Limit: 32768
wandb Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Weights & Biases, Context: 128000, Output Limit: 32768
wandb Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.17 0.66 Provider: Weights & Biases, Context: 64000, Output Limit: 8192
wandb Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.10 0.10 Provider: Weights & Biases, Context: 262144, Output Limit: 131072
wandb Qwen3-Coder-480B-A35B-Instruct qwen3-coder-480b-a35b-instruct 1.00 1.50 Provider: Weights & Biases, Context: 262144, Output Limit: 65536
wandb Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.10 0.10 Provider: Weights & Biases, Context: 262144, Output Limit: 131072
wandb DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Weights & Biases, Context: 161000, Output Limit: 163840
wandb DeepSeek-V3-0324 deepseek-v3-0324 1.14 2.75 Provider: Weights & Biases, Context: 161000, Output Limit: 8192
cloudflareaigateway IBM Granite 4.0 H Micro granite-4.0-h-micro 0.02 0.11 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BART Large CNN bart-large-cnn 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Mistral 7B Instruct v0.1 mistral-7b-instruct-v0.1 0.11 0.19 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway DistilBERT SST-2 INT8 distilbert-sst-2-int8 0.03 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway MyShell MeloTTS melotts 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Gemma 3 12B IT gemma-3-12b-it 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway PLaMo Embedding 1B plamo-embedding-1b 0.02 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT OSS 20B gpt-oss-20b 0.20 0.30 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT OSS 120B gpt-oss-120b 0.35 0.75 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway IndicTrans2 EN-Indic 1B indictrans2-en-indic-1b 0.34 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Pipecat Smart Turn v2 smart-turn-v2 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen 2.5 Coder 32B Instruct qwen2.5-coder-32b-instruct 0.66 1.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen3 30B A3B FP8 qwen3-30b-a3b-fp8 0.05 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Qwen3 Embedding 0.6B qwen3-embedding-0.6b 0.01 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway QwQ 32B qwq-32b 0.66 1.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Mistral Small 3.1 24B Instruct mistral-small-3.1-24b-instruct 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Aura 2 (ES) aura-2-es 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Aura 2 (EN) aura-2-en 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Deepgram Nova 3 nova-3 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Gemma SEA-LION v4 27B IT gemma-sea-lion-v4-27b-it 0.35 0.56 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.05 0.68 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct FP8 llama-3.1-8b-instruct-fp8 0.15 0.29 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 2 7B Chat FP16 llama-2-7b-chat-fp16 0.56 6.67 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3 8B Instruct llama-3-8b-instruct 0.28 0.83 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.28 0.83 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway M2M100 1.2B m2m100-1.2b 0.34 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.05 0.34 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.3 70B Instruct FP8 Fast llama-3.3-70b-instruct-fp8-fast 0.29 2.25 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3 8B Instruct AWQ llama-3-8b-instruct-awq 0.12 0.27 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.03 0.20 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.27 0.85 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama Guard 3 8B llama-guard-3-8b 0.48 0.03 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway Llama 3.1 8B Instruct AWQ llama-3.1-8b-instruct-awq 0.12 0.27 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE M3 bge-m3 0.01 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Base EN v1.5 bge-base-en-v1.5 0.07 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Large EN v1.5 bge-large-en-v1.5 0.20 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Reranker Base bge-reranker-base 0.00 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway BGE Small EN v1.5 bge-small-en-v1.5 0.02 0.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.50 4.88 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT-4 gpt-4 30.00 60.00 Provider: Cloudflare AI Gateway, Context: 8192, Output Limit: 8192
cloudflareaigateway GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway GPT-3.5-turbo gpt-3.5-turbo 0.50 1.50 Provider: Cloudflare AI Gateway, Context: 16385, Output Limit: 4096
cloudflareaigateway GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 4096
cloudflareaigateway o3-mini o3-mini 1.10 4.40 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway GPT-5.1 gpt-5.1 1.25 10.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway GPT-4o gpt-4o 2.50 10.00 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway o4-mini o4-mini 1.10 4.40 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o1 o1 15.00 60.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o3-pro o3-pro 20.00 80.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway o3 o3 2.00 8.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 100000
cloudflareaigateway GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
cloudflareaigateway GPT-5.2 gpt-5.2 1.75 14.00 Provider: Cloudflare AI Gateway, Context: 400000, Output Limit: 128000
cloudflareaigateway Claude Opus 4 (latest) claude-opus-4 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 32000
cloudflareaigateway Claude Opus 4.1 (latest) claude-opus-4-1 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 32000
cloudflareaigateway Claude Haiku 4.5 (latest) claude-haiku-4-5 1.00 5.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Haiku 3 claude-3-haiku 0.25 1.25 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Opus 4.5 (latest) claude-opus-4-5 5.00 25.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Opus 3 claude-3-opus 15.00 75.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Sonnet 4.5 (latest) claude-sonnet-4-5 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
cloudflareaigateway Claude Sonnet 3.5 v2 claude-3.5-sonnet 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Sonnet 3 claude-3-sonnet 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 4096
cloudflareaigateway Claude Haiku 3.5 (latest) claude-3-5-haiku 0.80 4.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Haiku 3.5 (latest) claude-3.5-haiku 0.80 4.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 8192
cloudflareaigateway Claude Sonnet 4 (latest) claude-sonnet-4 3.00 15.00 Provider: Cloudflare AI Gateway, Context: 200000, Output Limit: 64000
openai GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: OpenAI, Context: 8191, Output Limit: 1536
openai GPT-4 gpt-4 30.00 60.00 Provider: OpenAI, Context: 8192, Output Limit: 8192
openai o1-pro o1-pro 150.00 600.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-4o (2024-05-13) gpt-4o-2024-05-13 5.00 15.00 Provider: OpenAI, Context: 128000, Output Limit: 4096
openai GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o (2024-08-06) gpt-4o-2024-08-06 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai o3-deep-research o3-deep-research 10.00 40.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-3.5-turbo gpt-3.5-turbo 0.50 1.50 Provider: OpenAI, Context: 16385, Output Limit: 4096
openai GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: OpenAI, Context: 8191, Output Limit: 3072
openai GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: OpenAI, Context: 128000, Output Limit: 4096
openai o1-preview o1-preview 15.00 60.00 Provider: OpenAI, Context: 128000, Output Limit: 32768
openai GPT-5.1 Codex mini gpt-5.1-codex-mini 0.25 2.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o3-mini o3-mini 1.10 4.40 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5.2 Chat gpt-5.2-chat-latest 1.75 14.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5.1 gpt-5.1 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai Codex Mini codex-mini-latest 1.50 6.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o gpt-4o 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-4.1 gpt-4.1 2.00 8.00 Provider: OpenAI, Context: 1047576, Output Limit: 32768
openai o4-mini o4-mini 1.10 4.40 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai o1 o1 15.00 60.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o1-mini o1-mini 1.10 4.40 Provider: OpenAI, Context: 128000, Output Limit: 65536
openai text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: OpenAI, Context: 8192, Output Limit: 1536
openai o3-pro o3-pro 20.00 80.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5.1 Codex Max gpt-5.1-codex-max 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai o3 o3 2.00 8.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai o4-mini-deep-research o4-mini-deep-research 2.00 8.00 Provider: OpenAI, Context: 200000, Output Limit: 100000
openai GPT-5 Chat (latest) gpt-5-chat-latest 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: OpenAI, Context: 128000, Output Limit: 16384
openai GPT-5 gpt-5 1.25 10.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: OpenAI, Context: 400000, Output Limit: 272000
openai GPT-5.2 gpt-5.2 1.75 14.00 Provider: OpenAI, Context: 400000, Output Limit: 128000
openai GPT-5.1 Chat gpt-5.1-chat-latest 1.25 10.00 Provider: OpenAI, Context: 128000, Output Limit: 16384
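A minimal sketch of exercising one of the OpenAI rows and pricing the reply from the returned usage object, assuming an OPENAI_API_KEY environment variable; the $0.15 / $0.60 figures are the GPT-4o mini row above:

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name three uses of embeddings."}],
    )
    print(resp.choices[0].message.content)

    # Price the call against the table: $0.15 in / $0.60 out per 1M tokens.
    usage = resp.usage
    cost = (usage.prompt_tokens * 0.15 + usage.completion_tokens * 0.60) / 1_000_000
    print(f"~${cost:.6f}")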
minimaxcn MiniMax-M2.1 minimax-m2.1 0.30 1.20 Provider: MiniMax (China), Context: 204800, Output Limit: 131072
minimaxcn MiniMax-M2 minimax-m2 0.30 1.20 Provider: MiniMax (China), Context: 196608, Output Limit: 128000
perplexity Sonar sonar 1.00 1.00 Provider: Perplexity, Context: 128000, Output Limit: 4096
perplexity Sonar Pro sonar-pro 3.00 15.00 Provider: Perplexity, Context: 200000, Output Limit: 8192
perplexity Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Provider: Perplexity, Context: 128000, Output Limit: 4096
zenmux Step-3 step-3 0.21 0.57 Provider: ZenMux, Context: 65536, Output Limit: 64000
zenmux Kimi K2 Thinking Turbo kimi-k2-thinking-turbo 1.15 8.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Kimi K2 0905 kimi-k2-0905 0.60 2.50 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux MiMo-V2-Flash Free mimo-v2-flash-free 0.00 0.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux MiMo-V2-Flash mimo-v2-flash 0.00 0.00 Provider: ZenMux, Context: 262144, Output Limit: 64000
zenmux Grok 4 grok-4 3.00 15.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Grok 4.1 Fast Non Reasoning grok-4.1-fast-non-reasoning 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux Grok 4 Fast grok-4-fast 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux Grok 4.1 Fast grok-4.1-fast 0.20 0.50 Provider: ZenMux, Context: 2000000, Output Limit: 64000
zenmux DeepSeek-V3.2 (Non-thinking Mode) deepseek-chat 0.28 0.42 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux DeepSeek-V3.2-Exp deepseek-v3.2-exp 0.22 0.33 Provider: ZenMux, Context: 163840, Output Limit: 64000
zenmux DeepSeek-V3.2 (Thinking Mode) deepseek-reasoner 0.28 0.42 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux DeepSeek V3.2 deepseek-v3.2 0.28 0.43 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux MiniMax M2 minimax-m2 0.30 1.20 Provider: ZenMux, Context: 204800, Output Limit: 64000
zenmux MiniMax M2.1 minimax-m2.1 0.30 1.20 Provider: ZenMux, Context: 204800, Output Limit: 64000
zenmux Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 3 Flash Preview Free gemini-3-flash-preview-free 0.00 0.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Provider: ZenMux, Context: 1048576, Output Limit: 64000
zenmux Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: ZenMux, Context: 1048576, Output Limit: 65536
zenmux Doubao-Seed-Code doubao-seed-code 0.17 1.12 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Doubao-Seed-1.8 doubao-seed-1.8 0.11 0.28 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1-Codex-Mini gpt-5.1-codex-mini 0.25 2.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1 gpt-5.1 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5 Codex gpt-5-codex 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GPT-5 gpt-5 1.25 10.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux GPT-5.2 gpt-5.2 1.75 14.00 Provider: ZenMux, Context: 400000, Output Limit: 64000
zenmux ERNIE-5.0-Thinking-Preview ernie-5.0-thinking-preview 0.84 3.37 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux Ring-1T ring-1t 0.56 2.24 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux Ling-1T ling-1t 0.56 2.24 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.7 glm-4.7 0.28 1.14 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.6V Flash (Free) glm-4.6v-flash-free 0.00 0.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.6V FlashX glm-4.6v-flash 0.00 0.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux GLM 4.5 glm-4.5 0.35 1.54 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.5 Air glm-4.5-air 0.11 0.56 Provider: ZenMux, Context: 128000, Output Limit: 64000
zenmux GLM 4.6 glm-4.6 0.35 1.54 Provider: ZenMux, Context: 200000, Output Limit: 128000
zenmux GLM 4.6V glm-4.6v 0.14 0.42 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Qwen3-Coder-Plus qwen3-coder-plus 1.00 5.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
zenmux KAT-Coder-Pro-V1 Free kat-coder-pro-v1-free 0.00 0.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux KAT-Coder-Pro-V1 kat-coder-pro-v1 0.00 0.00 Provider: ZenMux, Context: 256000, Output Limit: 64000
zenmux Claude Opus 4 claude-opus-4 15.00 75.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Provider: ZenMux, Context: 200000, Output Limit: 32000
zenmux Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
zenmux Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Provider: ZenMux, Context: 200000, Output Limit: 64000
zenmux Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Provider: ZenMux, Context: 1000000, Output Limit: 64000
ovhcloud Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.70 0.70 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Mistral-7B-Instruct-v0.3 mistral-7b-instruct-v0.3 0.11 0.11 Provider: OVHcloud AI Endpoints, Context: 127000, Output Limit: 127000
ovhcloud Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.11 0.11 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Qwen2.5-VL-72B-Instruct qwen2.5-vl-72b-instruct 1.01 1.01 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Mistral-Nemo-Instruct-2407 mistral-nemo-instruct-2407 0.14 0.14 Provider: OVHcloud AI Endpoints, Context: 118000, Output Limit: 118000
ovhcloud Mistral-Small-3.2-24B-Instruct-2506 mistral-small-3.2-24b-instruct-2506 0.10 0.31 Provider: OVHcloud AI Endpoints, Context: 128000, Output Limit: 128000
ovhcloud Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.96 0.96 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud Qwen3-Coder-30B-A3B-Instruct qwen3-coder-30b-a3b-instruct 0.07 0.26 Provider: OVHcloud AI Endpoints, Context: 256000, Output Limit: 256000
ovhcloud llava-next-mistral-7b llava-next-mistral-7b 0.32 0.32 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
ovhcloud DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Meta-Llama-3_1-70B-Instruct meta-llama-3_1-70b-instruct 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud gpt-oss-20b gpt-oss-20b 0.05 0.18 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud gpt-oss-120b gpt-oss-120b 0.09 0.47 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Meta-Llama-3_3-70B-Instruct meta-llama-3_3-70b-instruct 0.74 0.74 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
ovhcloud Qwen3-32B qwen3-32b 0.09 0.25 Provider: OVHcloud AI Endpoints, Context: 32000, Output Limit: 32000
v0 v0-1.5-lg v0-1.5-lg 15.00 75.00 Provider: v0, Context: 512000, Output Limit: 32000
v0 v0-1.5-md v0-1.5-md 3.00 15.00 Provider: v0, Context: 128000, Output Limit: 32000
v0 v0-1.0-md v0-1.0-md 3.00 15.00 Provider: v0, Context: 128000, Output Limit: 32000
iflowcn Qwen3-Coder-480B-A35B qwen3-coder 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn DeepSeek-V3 deepseek-v3 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn Kimi-K2 kimi-k2 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn DeepSeek-R1 deepseek-r1 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn DeepSeek-V3.1-Terminus deepseek-v3.1 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn MiniMax-M2 minimax-m2 0.00 0.00 Provider: iFlow, Context: 204800, Output Limit: 131072
iflowcn Qwen3-235B-A22B qwen3-235b 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
iflowcn DeepSeek-V3.2 deepseek-v3.2-chat 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Kimi-K2-0905 kimi-k2-0905 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Kimi-K2-Thinking kimi-k2-thinking 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-235B-A22B-Thinking qwen3-235b-a22b-thinking-2507 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-VL-Plus qwen3-vl-plus 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn GLM-4.6 glm-4.6 0.00 0.00 Provider: iFlow, Context: 200000, Output Limit: 128000
iflowcn TStars-2.0 tstars2.0 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-235B-A22B-Instruct qwen3-235b-a22b-instruct 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-Max qwen3-max 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn DeepSeek-V3.2-Exp deepseek-v3.2 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 64000
iflowcn Qwen3-Max-Preview qwen3-max-preview 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 32000
iflowcn Qwen3-Coder-Plus qwen3-coder-plus 0.00 0.00 Provider: iFlow, Context: 256000, Output Limit: 64000
iflowcn Qwen3-32B qwen3-32b 0.00 0.00 Provider: iFlow, Context: 128000, Output Limit: 32000
synthetic Qwen 3 235B Instruct qwen3-235b-a22b-instruct-2507 0.20 0.60 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.80 0.80 Provider: Synthetic, Context: 32768, Output Limit: 32768
synthetic Qwen 3 Coder 480B qwen3-coder-480b-a35b-instruct 2.00 2.00 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.65 3.00 Provider: Synthetic, Context: 256000, Output Limit: 32000
synthetic MiniMax-M2 minimax-m2 0.55 2.19 Provider: Synthetic, Context: 196608, Output Limit: 131000
synthetic MiniMax-M2.1 minimax-m2.1 0.55 2.19 Provider: Synthetic, Context: 204800, Output Limit: 131072
synthetic Llama-3.1-70B-Instruct llama-3.1-70b-instruct 0.90 0.90 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.20 0.20 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.90 0.90 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.15 0.60 Provider: Synthetic, Context: 328000, Output Limit: 4096
synthetic Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.22 0.88 Provider: Synthetic, Context: 524000, Output Limit: 4096
synthetic Llama-3.1-405B-Instruct llama-3.1-405b-instruct 3.00 3.00 Provider: Synthetic, Context: 128000, Output Limit: 32768
synthetic Kimi K2 0905 kimi-k2-instruct-0905 1.20 1.20 Provider: Synthetic, Context: 262144, Output Limit: 32768
synthetic Kimi K2 Thinking kimi-k2-thinking 0.55 2.19 Provider: Synthetic, Context: 262144, Output Limit: 262144
synthetic GLM 4.5 glm-4.5 0.55 2.19 Provider: Synthetic, Context: 128000, Output Limit: 96000
synthetic GLM 4.7 glm-4.7 0.55 2.19 Provider: Synthetic, Context: 200000, Output Limit: 64000
synthetic GLM 4.6 glm-4.6 0.55 2.19 Provider: Synthetic, Context: 200000, Output Limit: 64000
synthetic DeepSeek R1 deepseek-r1 0.55 2.19 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek R1 (0528) deepseek-r1-0528 3.00 8.00 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.1 Terminus deepseek-v3.1-terminus 1.20 1.20 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.2 deepseek-v3.2 0.27 0.40 Provider: Synthetic, Context: 162816, Output Limit: 8000
synthetic DeepSeek V3 deepseek-v3 1.25 1.25 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3.1 deepseek-v3.1 0.56 1.68 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic DeepSeek V3 (0324) deepseek-v3-0324 1.20 1.20 Provider: Synthetic, Context: 128000, Output Limit: 128000
synthetic GPT OSS 120B gpt-oss-120b 0.10 0.10 Provider: Synthetic, Context: 128000, Output Limit: 32768
deepinfra Kimi K2 kimi-k2-instruct 0.50 2.00 Provider: Deep Infra, Context: 131072, Output Limit: 32768
deepinfra Kimi K2 Thinking kimi-k2-thinking 0.47 2.00 Provider: Deep Infra, Context: 131072, Output Limit: 32768
deepinfra MiniMax M2 minimax-m2 0.25 1.02 Provider: Deep Infra, Context: 262144, Output Limit: 32768
deepinfra GPT OSS 20B gpt-oss-20b 0.03 0.14 Provider: Deep Infra, Context: 131072, Output Limit: 16384
deepinfra GPT OSS 120B gpt-oss-120b 0.05 0.24 Provider: Deep Infra, Context: 131072, Output Limit: 16384
deepinfra Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.40 1.60 Provider: Deep Infra, Context: 262144, Output Limit: 65536
deepinfra Qwen3 Coder 480B A35B Instruct Turbo qwen3-coder-480b-a35b-instruct-turbo 0.30 1.20 Provider: Deep Infra, Context: 262144, Output Limit: 65536
deepinfra GLM-4.5 glm-4.5 0.60 2.20 Provider: Deep Infra, Context: 131072, Output Limit: 98304
deepinfra GLM-4.7 glm-4.7 0.43 1.75 Provider: Deep Infra, Context: 202752, Output Limit: 16384
zhipuai GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 Provider: Zhipu AI, Context: 128000, Output Limit: 32768
zhipuai GLM-4.6V glm-4.6v 0.30 0.90 Provider: Zhipu AI, Context: 128000, Output Limit: 32768
zhipuai GLM-4.6 glm-4.6 0.60 2.20 Provider: Zhipu AI, Context: 204800, Output Limit: 131072
zhipuai GLM-4.5V glm-4.5v 0.60 1.80 Provider: Zhipu AI, Context: 64000, Output Limit: 16384
zhipuai GLM-4.5-Air glm-4.5-air 0.20 1.10 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.5 glm-4.5 0.60 2.20 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.5-Flash glm-4.5-flash 0.00 0.00 Provider: Zhipu AI, Context: 131072, Output Limit: 98304
zhipuai GLM-4.7 glm-4.7 0.60 2.20 Provider: Zhipu AI, Context: 204800, Output Limit: 131072
submodel GPT OSS 120B gpt-oss-120b 0.10 0.50 Provider: submodel, Context: 131072, Output Limit: 32768
submodel Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.30 Provider: submodel, Context: 262144, Output Limit: 131072
submodel Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct-fp8 0.20 0.80 Provider: submodel, Context: 262144, Output Limit: 262144
submodel Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.20 0.60 Provider: submodel, Context: 262144, Output Limit: 131072
submodel GLM 4.5 FP8 glm-4.5-fp8 0.20 0.80 Provider: submodel, Context: 131072, Output Limit: 131072
submodel GLM 4.5 Air glm-4.5-air 0.10 0.50 Provider: submodel, Context: 131072, Output Limit: 131072
submodel DeepSeek R1 0528 deepseek-r1-0528 0.50 2.15 Provider: submodel, Context: 75000, Output Limit: 163840
submodel DeepSeek V3.1 deepseek-v3.1 0.20 0.80 Provider: submodel, Context: 75000, Output Limit: 163840
submodel DeepSeek V3 0324 deepseek-v3-0324 0.20 0.80 Provider: submodel, Context: 75000, Output Limit: 163840
nanogpt Kimi K2 Thinking kimi-k2-thinking 1.00 2.00 Provider: NanoGPT, Context: 32768, Output Limit: 8192
nanogpt Kimi K2 Instruct kimi-k2-instruct 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Hermes 4 405B Thinking hermes-4-405b:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt Llama 3.3 Nemotron Super 49B V1.5 llama-3_3-nemotron-super-49b-v1_5 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt DeepSeek V3.2 Thinking deepseek-v3.2:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt DeepSeek R1 deepseek-r1 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt MiniMax M2.1 minimax-m2.1 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GPT OSS 120B gpt-oss-120b 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.6 Thinking glm-4.6:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.6 glm-4.6 1.00 2.00 Provider: NanoGPT, Context: 200000, Output Limit: 8192
nanogpt Qwen3 Coder qwen3-coder 1.00 2.00 Provider: NanoGPT, Context: 106000, Output Limit: 8192
nanogpt Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 1.00 2.00 Provider: NanoGPT, Context: 262144, Output Limit: 8192
nanogpt Devstral 2 123B Instruct 2512 devstral-2-123b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Mistral Large 3 675B Instruct 2512 mistral-large-3-675b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Ministral 14B Instruct 2512 ministral-14b-instruct-2512 1.00 2.00 Provider: NanoGPT, Context: 131072, Output Limit: 8192
nanogpt Llama 4 Maverick llama-4-maverick 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt Llama 3.3 70B Instruct llama-3.3-70b-instruct 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.7 glm-4.7 1.00 2.00 Provider: NanoGPT, Context: 204800, Output Limit: 8192
nanogpt GLM 4.5 Air glm-4.5-air 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.7 Thinking glm-4.7:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
nanogpt GLM 4.5 Air Thinking glm-4.5-air:thinking 1.00 2.00 Provider: NanoGPT, Context: 128000, Output Limit: 8192
inference Mistral Nemo 12B Instruct mistral-nemo-12b-instruct 0.04 0.10 Provider: Inference, Context: 16000, Output Limit: 4096
inference Google Gemma 3 gemma-3 0.15 0.30 Provider: Inference, Context: 125000, Output Limit: 4096
inference Osmosis Structure 0.6B osmosis-structure-0.6b 0.10 0.50 Provider: Inference, Context: 4000, Output Limit: 2048
inference Qwen 3 Embedding 4B qwen3-embedding-4b 0.01 0.00 Provider: Inference, Context: 32000, Output Limit: 2048
inference Qwen 2.5 7B Vision Instruct qwen-2.5-7b-vision-instruct 0.20 0.20 Provider: Inference, Context: 125000, Output Limit: 4096
inference Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.06 0.06 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.03 0.03 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.02 0.02 Provider: Inference, Context: 16000, Output Limit: 4096
inference Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.01 0.01 Provider: Inference, Context: 16000, Output Limit: 4096
requesty Grok 4 grok-4 3.00 15.00 Provider: Requesty, Context: 256000, Output Limit: 64000
requesty Grok 4 Fast grok-4-fast 0.20 0.50 Provider: Requesty, Context: 2000000, Output Limit: 64000
requesty Gemini 3 Flash gemini-3-flash-preview 0.50 3.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 3 Pro gemini-3-pro-preview 2.00 12.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Provider: Requesty, Context: 1048576, Output Limit: 65536
requesty GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 Provider: Requesty, Context: 1047576, Output Limit: 32768
requesty GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Requesty, Context: 16000, Output Limit: 4000
requesty GPT-4.1 gpt-4.1 2.00 8.00 Provider: Requesty, Context: 1047576, Output Limit: 32768
requesty o4 Mini o4-mini 1.10 4.40 Provider: Requesty, Context: 200000, Output Limit: 100000
requesty GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Requesty, Context: 128000, Output Limit: 32000
requesty GPT-4o Mini gpt-4o-mini 0.15 0.60 Provider: Requesty, Context: 128000, Output Limit: 16384
requesty GPT-5 gpt-5 1.25 10.00 Provider: Requesty, Context: 400000, Output Limit: 128000
requesty Claude Opus 4 claude-opus-4 15.00 75.00 Provider: Requesty, Context: 200000, Output Limit: 32000
requesty Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Requesty, Context: 200000, Output Limit: 32000
requesty Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Requesty, Context: 200000, Output Limit: 62000
requesty Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Requesty, Context: 200000, Output Limit: 64000
requesty Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Requesty, Context: 1000000, Output Limit: 64000
requesty Claude Sonnet 3.7 claude-3-7-sonnet 3.00 15.00 Provider: Requesty, Context: 200000, Output Limit: 64000
requesty Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Provider: Requesty, Context: 200000, Output Limit: 64000
morph Morph v3 Large morph-v3-large 0.90 1.90 Provider: Morph, Context: 32000, Output Limit: 32000
morph Auto auto 0.85 1.55 Provider: Morph, Context: 32000, Output Limit: 32000
morph Morph v3 Fast morph-v3-fast 0.80 1.20 Provider: Morph, Context: 16000, Output Limit: 16000
lmstudio GPT OSS 20B gpt-oss-20b 0.00 0.00 Provider: LMStudio, Context: 131072, Output Limit: 32768
lmstudio Qwen3 30B A3B 2507 qwen3-30b-a3b-2507 0.00 0.00 Provider: LMStudio, Context: 262144, Output Limit: 16384
lmstudio Qwen3 Coder 30B qwen3-coder-30b 0.00 0.00 Provider: LMStudio, Context: 262144, Output Limit: 65536
friendli Llama 3.3 70B Instruct meta-llama-3.3-70b-instruct 0.60 0.60 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Llama 3.1 8B Instruct meta-llama-3.1-8b-instruct 0.10 0.10 Provider: Friendli, Context: 131072, Output Limit: 8000
friendli EXAONE 4.0.1 32B exaone-4.0.1-32b 0.60 1.00 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Llama 4 Maverick 17B 128E Instruct llama-4-maverick-17b-128e-instruct - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 30B A3B qwen3-30b-a3b - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.20 0.80 Provider: Friendli, Context: 131072, Output Limit: 131072
friendli Qwen3 32B qwen3-32b - - Provider: Friendli, Context: 131072, Output Limit: 8000
friendli Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 - - Provider: Friendli, Context: 131072, Output Limit: 131072
friendli GLM 4.6 glm-4.6 - - Provider: Friendli, Context: 131072, Output Limit: 131072
friendli DeepSeek R1 0528 deepseek-r1-0528 - - Provider: Friendli, Context: 163840, Output Limit: 163840
sapaicore anthropic--claude-3.5-sonnet anthropic--claude-3.5-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 8192
sapaicore anthropic--claude-4.5-haiku anthropic--claude-4.5-haiku 1.00 5.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore anthropic--claude-4-opus anthropic--claude-4-opus 15.00 75.00 Provider: SAP AI Core, Context: 200000, Output Limit: 32000
sapaicore gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Provider: SAP AI Core, Context: 1048576, Output Limit: 65536
sapaicore anthropic--claude-3-haiku anthropic--claude-3-haiku 0.25 1.25 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore anthropic--claude-3-sonnet anthropic--claude-3-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore gpt-5-nano gpt-5-nano 0.05 0.40 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
sapaicore anthropic--claude-3.7-sonnet anthropic--claude-3.7-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gpt-5-mini gpt-5-mini 0.25 2.00 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
sapaicore anthropic--claude-4.5-sonnet anthropic--claude-4.5-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Provider: SAP AI Core, Context: 1048576, Output Limit: 65536
sapaicore anthropic--claude-3-opus anthropic--claude-3-opus 15.00 75.00 Provider: SAP AI Core, Context: 200000, Output Limit: 4096
sapaicore anthropic--claude-4-sonnet anthropic--claude-4-sonnet 3.00 15.00 Provider: SAP AI Core, Context: 200000, Output Limit: 64000
sapaicore gpt-5 gpt-5 1.25 10.00 Provider: SAP AI Core, Context: 400000, Output Limit: 128000
anthropic Claude Opus 4 (latest) claude-opus-4-0 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Sonnet 3.5 v2 claude-3-5-sonnet-20241022 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Opus 4.1 (latest) claude-opus-4-1 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Haiku 4.5 (latest) claude-haiku-4-5 1.00 5.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 3.5 claude-3-5-sonnet-20240620 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Haiku 3.5 (latest) claude-3-5-haiku-latest 0.80 4.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Opus 4.5 (latest) claude-opus-4-5 5.00 25.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 3 claude-3-opus-20240229 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Opus 4.5 claude-opus-4-5-20251101 5.00 25.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4.5 (latest) claude-sonnet-4-5 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4.5 claude-sonnet-4-5-20250929 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4 claude-sonnet-4-20250514 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 4 claude-opus-4-20250514 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Haiku 3.5 claude-3-5-haiku-20241022 0.80 4.00 Provider: Anthropic, Context: 200000, Output Limit: 8192
anthropic Claude Haiku 3 claude-3-haiku-20240307 0.25 1.25 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Sonnet 3.7 claude-3-7-sonnet-20250219 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 3.7 (latest) claude-3-7-sonnet-latest 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Sonnet 4 (latest) claude-sonnet-4-0 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
anthropic Claude Opus 4.1 claude-opus-4-1-20250805 15.00 75.00 Provider: Anthropic, Context: 200000, Output Limit: 32000
anthropic Claude Sonnet 3 claude-3-sonnet-20240229 3.00 15.00 Provider: Anthropic, Context: 200000, Output Limit: 4096
anthropic Claude Haiku 4.5 claude-haiku-4-5-20251001 1.00 5.00 Provider: Anthropic, Context: 200000, Output Limit: 64000
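For the Anthropic rows, the Output Limit column matters operationally: max_tokens is a required parameter and must not exceed that limit. A minimal sketch, assuming an ANTHROPIC_API_KEY environment variable; the prices cited in the comments come from the Claude Haiku 3.5 row above:

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1024,  # must stay within the 8192 Output Limit listed above
        messages=[{"role": "user", "content": "Explain AWQ quantization briefly."}],
    )
    print(msg.content[0].text)
    # Token counts for the $0.80 / $4.00 per-1M price columns:
    print(msg.usage.input_tokens, msg.usage.output_tokens)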
aihubmix GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix GLM-4.7 glm-4.7 0.27 1.10 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.28 1.12 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix Claude Opus 4.1 claude-opus-4-1 16.50 82.50 Provider: AIHubMix, Context: 200000, Output Limit: 32000
aihubmix GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix Claude Haiku 4.5 claude-haiku-4-5 1.10 5.50 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: AIHubMix, Context: 200000, Output Limit: 32000
aihubmix Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Provider: AIHubMix, Context: 1000000, Output Limit: 65000
aihubmix Gemini 2.5 Flash gemini-2.5-flash 0.08 0.30 Provider: AIHubMix, Context: 1000000, Output Limit: 65000
aihubmix GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix Claude Sonnet 4.5 claude-sonnet-4-5 3.30 16.50 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Coding GLM-4.7 Free coding-glm-4.7-free 0.00 0.00 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.28 2.80 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix GPT-5.1 gpt-5.1 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-5-Nano gpt-5-nano 0.50 2.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-4o gpt-4o 2.50 10.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-4.1 gpt-4.1 2.00 8.00 Provider: AIHubMix, Context: 1047576, Output Limit: 32768
aihubmix o4-mini o4-mini 1.50 6.00 Provider: AIHubMix, Context: 200000, Output Limit: 65536
aihubmix GPT-5-Mini gpt-5-mini 1.50 6.00 Provider: AIHubMix, Context: 200000, Output Limit: 64000
aihubmix Gemini 2.5 Pro gemini-2.5-pro 1.25 5.00 Provider: AIHubMix, Context: 2000000, Output Limit: 65000
aihubmix GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 Provider: AIHubMix, Context: 128000, Output Limit: 16384
aihubmix GPT-5.1-Codex-Max gpt-5.1-codex-max 1.25 10.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix MiniMax M2.1 Free minimax-m2.1-free 0.00 0.00 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.82 3.29 Provider: AIHubMix, Context: 262144, Output Limit: 131000
aihubmix DeepSeek-V3.2-Think deepseek-v3.2-think 0.30 0.45 Provider: AIHubMix, Context: 131000, Output Limit: 64000
aihubmix GPT-5 gpt-5 5.00 20.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix MiniMax M2.1 minimax-m2.1 0.29 1.15 Provider: AIHubMix, Context: 204800, Output Limit: 131072
aihubmix DeepSeek-V3.2 deepseek-v3.2 0.30 0.45 Provider: AIHubMix, Context: 131000, Output Limit: 64000
aihubmix Kimi K2 0905 kimi-k2-0905 0.55 2.19 Provider: AIHubMix, Context: 262144, Output Limit: 262144
aihubmix GPT-5-Pro gpt-5-pro 7.00 28.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
aihubmix GPT-5.2 gpt-5.2 1.75 14.00 Provider: AIHubMix, Context: 400000, Output Limit: 128000
fireworksai DeepSeek R1 05/28 deepseek-r1-0528 3.00 8.00 Provider: Fireworks AI, Context: 160000, Output Limit: 16384
fireworksai DeepSeek V3.1 deepseek-v3p1 0.56 1.68 Provider: Fireworks AI, Context: 163840, Output Limit: 163840
fireworksai DeepSeek V3.2 deepseek-v3p2 0.56 1.68 Provider: Fireworks AI, Context: 160000, Output Limit: 160000
fireworksai MiniMax-M2 minimax-m2 0.30 1.20 Provider: Fireworks AI, Context: 192000, Output Limit: 192000
fireworksai MiniMax-M2.1 minimax-m2p1 0.30 1.20 Provider: Fireworks AI, Context: 200000, Output Limit: 200000
fireworksai GLM 4.7 glm-4p7 0.60 2.20 Provider: Fireworks AI, Context: 198000, Output Limit: 198000
fireworksai DeepSeek V3 03-24 deepseek-v3-0324 0.90 0.90 Provider: Fireworks AI, Context: 160000, Output Limit: 16384
fireworksai GLM 4.6 glm-4p6 0.55 2.19 Provider: Fireworks AI, Context: 198000, Output Limit: 198000
fireworksai Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Fireworks AI, Context: 256000, Output Limit: 256000
fireworksai Kimi K2 Instruct kimi-k2-instruct 1.00 3.00 Provider: Fireworks AI, Context: 128000, Output Limit: 16384
fireworksai Qwen3 235B-A22B qwen3-235b-a22b 0.22 0.88 Provider: Fireworks AI, Context: 128000, Output Limit: 16384
fireworksai GPT OSS 20B gpt-oss-20b 0.05 0.20 Provider: Fireworks AI, Context: 131072, Output Limit: 32768
fireworksai GPT OSS 120B gpt-oss-120b 0.15 0.60 Provider: Fireworks AI, Context: 131072, Output Limit: 32768
fireworksai GLM 4.5 Air glm-4p5-air 0.22 0.88 Provider: Fireworks AI, Context: 131072, Output Limit: 131072
fireworksai Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct 0.45 1.80 Provider: Fireworks AI, Context: 256000, Output Limit: 32768
fireworksai GLM 4.5 glm-4p5 0.55 2.19 Provider: Fireworks AI, Context: 131072, Output Limit: 131072
ionet Kimi K2 Instruct kimi-k2-instruct-0905 0.39 1.90 Provider: IO.NET, Context: 32768, Output Limit: 4096
ionet Kimi K2 Thinking kimi-k2-thinking 0.55 2.25 Provider: IO.NET, Context: 32768, Output Limit: 4096
ionet GPT-OSS 20B gpt-oss-20b 0.03 0.14 Provider: IO.NET, Context: 64000, Output Limit: 4096
ionet GPT-OSS 120B gpt-oss-120b 0.04 0.40 Provider: IO.NET, Context: 131072, Output Limit: 4096
ionet Devstral Small 2505 devstral-small-2505 0.05 0.22 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.02 0.04 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Magistral Small 2506 magistral-small-2506 0.50 1.50 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Mistral Large Instruct 2411 mistral-large-instruct-2411 2.00 6.00 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.13 0.38 Provider: IO.NET, Context: 128000, Output Limit: 4096
ionet Llama 4 Maverick 17B 128E Instruct llama-4-maverick-17b-128e-instruct-fp8 0.15 0.60 Provider: IO.NET, Context: 430000, Output Limit: 4096
ionet Llama 3.2 90B Vision Instruct llama-3.2-90b-vision-instruct 0.35 0.40 Provider: IO.NET, Context: 16000, Output Limit: 4096
ionet Qwen 3 Coder 480B qwen3-coder-480b-a35b-instruct-int4-mixed-ar 0.22 0.95 Provider: IO.NET, Context: 106000, Output Limit: 4096
ionet Qwen 2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Provider: IO.NET, Context: 32000, Output Limit: 4096
ionet Qwen 3 235B Thinking qwen3-235b-a22b-thinking-2507 0.11 0.60 Provider: IO.NET, Context: 262144, Output Limit: 4096
ionet Qwen 3 Next 80B Instruct qwen3-next-80b-a3b-instruct 0.10 0.80 Provider: IO.NET, Context: 262144, Output Limit: 4096
ionet GLM 4.6 glm-4.6 0.40 1.75 Provider: IO.NET, Context: 200000, Output Limit: 4096
ionet DeepSeek R1 deepseek-r1-0528 2.00 8.75 Provider: IO.NET, Context: 128000, Output Limit: 4096
modelscope GLM-4.5 glm-4.5 0.00 0.00 Provider: ModelScope, Context: 131072, Output Limit: 98304
modelscope GLM-4.6 glm-4.6 0.00 0.00 Provider: ModelScope, Context: 202752, Output Limit: 98304
modelscope Qwen3 30B A3B Thinking 2507 qwen3-30b-a3b-thinking-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 32768
modelscope Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 131072
modelscope Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 65536
modelscope Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 16384
modelscope Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.00 0.00 Provider: ModelScope, Context: 262144, Output Limit: 131072
azurecognitiveservices GPT-3.5 Turbo 1106 gpt-3.5-turbo-1106 1.00 2.00 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices Mistral Small 3.1 mistral-small-2503 0.10 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Codestral 25.01 codestral-2501 0.30 0.90 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 256000
azurecognitiveservices Mistral Large 24.11 mistral-large-2411 2.00 6.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices GPT-5 Pro gpt-5-pro 15.00 120.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 272000
azurecognitiveservices DeepSeek-V3.2 deepseek-v3.2 0.28 0.42 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices MAI-DS-R1 mai-ds-r1 1.35 5.40 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5 gpt-5 1.25 10.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices GPT-4o mini gpt-4o-mini 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Phi-4-reasoning-plus phi-4-reasoning-plus 0.13 0.50 Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096
azurecognitiveservices GPT-4 Turbo Vision gpt-4-turbo-vision 10.00 30.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Phi-4-reasoning phi-4-reasoning 0.13 0.50 Provider: Azure Cognitive Services, Context: 32000, Output Limit: 4096
azurecognitiveservices Phi-3-medium-instruct (4k) phi-3-medium-4k-instruct 0.17 0.68 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024
azurecognitiveservices Codex Mini codex-mini 1.50 6.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices o3 o3 2.00 8.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Mistral Nemo mistral-nemo 0.15 0.15 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096
azurecognitiveservices Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices text-embedding-ada-002 text-embedding-ada-002 0.10 0.00 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 1536
azurecognitiveservices Embed v3 English cohere-embed-v3-english 0.10 0.00 Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024
azurecognitiveservices Llama 4 Scout 17B 16E Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices o1-mini o1-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 65536
azurecognitiveservices GPT-5 Mini gpt-5-mini 0.25 2.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Grok 3 Mini grok-3-mini 0.30 0.50 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192
azurecognitiveservices o1 o1 15.00 60.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.30 0.61 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices Phi-4-multimodal phi-4-multimodal 0.08 0.32 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices o4-mini o4-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices GPT-4.1 gpt-4.1 2.00 8.00 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
azurecognitiveservices Ministral 3B ministral-3b 0.04 0.04 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-3.5 Turbo 0301 gpt-3.5-turbo-0301 1.50 2.00 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 4096
azurecognitiveservices GPT-4o gpt-4o 2.50 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Phi-3-mini-instruct (128k) phi-3-mini-128k-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5-Codex gpt-5-codex 1.25 10.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices GPT-5 Nano gpt-5-nano 0.05 0.40 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices GPT-5.1 gpt-5.1 1.25 10.00 Provider: Azure Cognitive Services, Context: 272000, Output Limit: 128000
azurecognitiveservices o3-mini o3-mini 1.10 4.40 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 100000
azurecognitiveservices Model Router model-router 0.14 0.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Kimi K2 Thinking kimi-k2-thinking 0.60 2.50 Provider: Azure Cognitive Services, Context: 262144, Output Limit: 262144
azurecognitiveservices GPT-5.1 Codex Mini gpt-5.1-codex-mini 0.25 2.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices o1-preview o1-preview 16.50 66.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices GPT-3.5 Turbo 0613 gpt-3.5-turbo-0613 3.00 4.00 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices GPT-4 Turbo gpt-4-turbo 10.00 30.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices Phi-3-small-instruct (8k) phi-3-small-8k-instruct 0.15 0.60 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices DeepSeek-V3-0324 deepseek-v3-0324 1.14 4.56 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072
azurecognitiveservices Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 2.68 3.54 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 2048
azurecognitiveservices text-embedding-3-large text-embedding-3-large 0.13 0.00 Provider: Azure Cognitive Services, Context: 8191, Output Limit: 3072
azurecognitiveservices Grok 3 grok-3 3.00 15.00 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 8192
azurecognitiveservices GPT-3.5 Turbo 0125 gpt-3.5-turbo-0125 0.50 1.50 Provider: Azure Cognitive Services, Context: 16384, Output Limit: 16384
azurecognitiveservices Claude Sonnet 4.5 claude-sonnet-4-5 3.00 15.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Phi-4 phi-4 0.13 0.50 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices DeepSeek-V3.1 deepseek-v3.1 0.56 1.68 Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072
azurecognitiveservices GPT-5 Chat gpt-5-chat 1.25 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices GPT-4.1 mini gpt-4.1-mini 0.40 1.60 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
azurecognitiveservices Llama 4 Maverick 17B 128E Instruct FP8 llama-4-maverick-17b-128e-instruct-fp8 0.25 1.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices Command R+ cohere-command-r-plus-08-2024 2.50 10.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000
azurecognitiveservices Command A cohere-command-a 2.50 10.00 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 8000
azurecognitiveservices Phi-3-small-instruct (128k) phi-3-small-128k-instruct 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Claude Opus 4.5 claude-opus-4-5 5.00 25.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Mistral Medium 3 mistral-medium-2505 0.40 2.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices DeepSeek-V3.2-Speciale deepseek-v3.2-speciale 0.28 0.42 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 128000
azurecognitiveservices Claude Haiku 4.5 claude-haiku-4-5 1.00 5.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 64000
azurecognitiveservices Phi-3-mini-instruct (4k) phi-3-mini-4k-instruct 0.13 0.52 Provider: Azure Cognitive Services, Context: 4096, Output Limit: 1024
azurecognitiveservices GPT-5.1 Codex gpt-5.1-codex 1.25 10.00 Provider: Azure Cognitive Services, Context: 400000, Output Limit: 128000
azurecognitiveservices Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 10000
azurecognitiveservices DeepSeek-R1 deepseek-r1 1.35 5.40 Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840
azurecognitiveservices Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 32768
azurecognitiveservices GPT-4 32K gpt-4-32k 60.00 120.00 Provider: Azure Cognitive Services, Context: 32768, Output Limit: 32768
azurecognitiveservices Phi-4-mini phi-4-mini 0.08 0.30 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Embed v3 Multilingual cohere-embed-v3-multilingual 0.10 0.00 Provider: Azure Cognitive Services, Context: 512, Output Limit: 1024
azurecognitiveservices Grok 4 grok-4 3.00 15.00 Provider: Azure Cognitive Services, Context: 256000, Output Limit: 64000
azurecognitiveservices Command R cohere-command-r-08-2024 0.15 0.60 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4000
azurecognitiveservices Embed v4 cohere-embed-v-4-0 0.12 0.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 1536
azurecognitiveservices Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 8192
azurecognitiveservices GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 16384
azurecognitiveservices Claude Opus 4.1 claude-opus-4-1 15.00 75.00 Provider: Azure Cognitive Services, Context: 200000, Output Limit: 32000
azurecognitiveservices GPT-4 gpt-4 60.00 120.00 Provider: Azure Cognitive Services, Context: 8192, Output Limit: 8192
azurecognitiveservices Phi-3-medium-instruct (128k) phi-3-medium-128k-instruct 0.17 0.68 Provider: Azure Cognitive Services, Context: 128000, Output Limit: 4096
azurecognitiveservices Grok 4 Fast (Reasoning) grok-4-fast-reasoning 0.20 0.50 Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000
azurecognitiveservices DeepSeek-R1-0528 deepseek-r1-0528 1.35 5.40 Provider: Azure Cognitive Services, Context: 163840, Output Limit: 163840
azurecognitiveservices Grok 4 Fast (Non-Reasoning) grok-4-fast-non-reasoning 0.20 0.50 Provider: Azure Cognitive Services, Context: 2000000, Output Limit: 30000
azurecognitiveservices text-embedding-3-small text-embedding-3-small 0.02 0.00 Provider: Azure Cognitive Services, Context: 8191, Output Limit: 1536
azurecognitiveservices GPT-4.1 nano gpt-4.1-nano 0.10 0.40 Provider: Azure Cognitive Services, Context: 1047576, Output Limit: 32768
llama Llama-3.3-8B-Instruct llama-3.3-8b-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Llama-4-Scout-17B-16E-Instruct-FP8 llama-4-scout-17b-16e-instruct-fp8 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Groq-Llama-4-Maverick-17B-128E-Instruct groq-llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Cerebras-Llama-4-Scout-17B-16E-Instruct cerebras-llama-4-scout-17b-16e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
llama Cerebras-Llama-4-Maverick-17B-128E-Instruct cerebras-llama-4-maverick-17b-128e-instruct 0.00 0.00 Provider: Llama, Context: 128000, Output Limit: 4096
scaleway Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507 0.75 2.25 Provider: Scaleway, Context: 260000, Output Limit: 8192
scaleway Pixtral 12B 2409 pixtral-12b-2409 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 4096
scaleway Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 16384
scaleway Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407 0.20 0.20 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Mistral Small 3.2 24B Instruct (2506) mistral-small-3.2-24b-instruct-2506 0.15 0.35 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Qwen3-Coder 30B-A3B Instruct qwen3-coder-30b-a3b-instruct 0.20 0.80 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.90 0.90 Provider: Scaleway, Context: 100000, Output Limit: 4096
scaleway Whisper Large v3 whisper-large-v3 0.00 0.00 Provider: Scaleway, Context: N/A, Output Limit: 4096
scaleway DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.90 0.90 Provider: Scaleway, Context: 32000, Output Limit: 4096
scaleway Voxtral Small 24B 2507 voxtral-small-24b-2507 0.15 0.35 Provider: Scaleway, Context: 32000, Output Limit: 8192
scaleway GPT-OSS 120B gpt-oss-120b 0.15 0.60 Provider: Scaleway, Context: 128000, Output Limit: 8192
scaleway BGE Multilingual Gemma2 bge-multilingual-gemma2 0.13 0.00 Provider: Scaleway, Context: 8191, Output Limit: 3072
scaleway Gemma-3-27B-IT gemma-3-27b-it 0.25 0.50 Provider: Scaleway, Context: 40000, Output Limit: 8192
amazonbedrock Command R+ cohere.command-r-plus-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude 2 anthropic.claude-v2 8.00 24.00 Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
amazonbedrock Claude Sonnet 3.7 anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Claude Sonnet 4 anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Qwen3 Coder 30B A3B Instruct qwen.qwen3-coder-30b-a3b-v1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
amazonbedrock Gemma 3 4B IT google.gemma-3-4b-it 0.04 0.08 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock MiniMax M2 minimax.minimax-m2 0.30 1.20 Provider: Amazon Bedrock, Context: 204608, Output Limit: 128000
amazonbedrock Llama 3.2 11B Instruct meta.llama3-2-11b-instruct-v1:0 0.16 0.16 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 Next 80B A3B Instruct qwen.qwen3-next-80b-a3b 0.14 1.40 Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
amazonbedrock Claude Haiku 3 anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock Llama 3.2 90B Instruct meta.llama3-2-90b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 VL 235B A22B Instruct qwen.qwen3-vl-235b-a22b 0.30 1.50 Provider: Amazon Bedrock, Context: 262000, Output Limit: 262000
amazonbedrock Llama 3.2 1B Instruct meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
amazonbedrock Claude 2.1 anthropic.claude-v2:1 8.00 24.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock DeepSeek-V3.1 deepseek.v3-v1:0 0.58 1.68 Provider: Amazon Bedrock, Context: 163840, Output Limit: 81920
amazonbedrock Claude Opus 4.5 anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Command Light cohere.command-light-text-v14 0.30 0.60 Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
amazonbedrock Mistral Large (24.02) mistral.mistral-large-2402-v1:0 0.50 1.50 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Google Gemma 3 27B Instruct google.gemma-3-27b-it 0.12 0.20 Provider: Amazon Bedrock, Context: 202752, Output Limit: 8192
amazonbedrock NVIDIA Nemotron Nano 12B v2 VL BF16 nvidia.nemotron-nano-12b-v2 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Google Gemma 3 12B google.gemma-3-12b-it 0.05 0.10 Provider: Amazon Bedrock, Context: 131072, Output Limit: 8192
amazonbedrock Jamba 1.5 Large ai21.jamba-1-5-large-v1:0 2.00 8.00 Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
amazonbedrock Llama 3.3 70B Instruct meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Opus 3 anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock Nova Pro amazon.nova-pro-v1:0 0.80 3.20 Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
amazonbedrock Llama 3.1 8B Instruct meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock gpt-oss-120b openai.gpt-oss-120b-1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 32B (dense) qwen.qwen3-32b-v1:0 0.15 0.60 Provider: Amazon Bedrock, Context: 16384, Output Limit: 16384
amazonbedrock Claude Sonnet 3.5 anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Claude Haiku 4.5 anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Command R cohere.command-r-v1:0 0.50 1.50 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Voxtral Small 24B 2507 mistral.voxtral-small-24b-2507 0.15 0.35 Provider: Amazon Bedrock, Context: 32000, Output Limit: 8192
amazonbedrock Nova Micro amazon.nova-micro-v1:0 0.04 0.14 Provider: Amazon Bedrock, Context: 128000, Output Limit: 8192
amazonbedrock Llama 3.1 70B Instruct meta.llama3-1-70b-instruct-v1:0 0.72 0.72 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 3 70B Instruct meta.llama3-70b-instruct-v1:0 2.65 3.50 Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
amazonbedrock DeepSeek-R1 deepseek.r1-v1:0 1.35 5.40 Provider: Amazon Bedrock, Context: 128000, Output Limit: 32768
amazonbedrock Claude Sonnet 3.5 v2 anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Ministral 3 8B mistral.ministral-3-8b-instruct 0.15 0.15 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Command cohere.command-text-v14 1.50 2.00 Provider: Amazon Bedrock, Context: 4096, Output Limit: 4096
amazonbedrock Claude Opus 4 anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
amazonbedrock Voxtral Mini 3B 2507 mistral.voxtral-mini-3b-2507 0.04 0.04 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Opus 4.5 (Global) global.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock Nova 2 Lite amazon.nova-2-lite-v1:0 0.33 2.75 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 Coder 480B A35B Instruct qwen.qwen3-coder-480b-a35b-v1:0 0.22 1.80 Provider: Amazon Bedrock, Context: 131072, Output Limit: 65536
amazonbedrock Claude Sonnet 4.5 anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 64000
amazonbedrock GPT OSS Safeguard 20B openai.gpt-oss-safeguard-20b 0.07 0.20 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock gpt-oss-20b openai.gpt-oss-20b-1:0 0.07 0.30 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 3.2 3B Instruct meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Provider: Amazon Bedrock, Context: 131000, Output Limit: 4096
amazonbedrock Claude Instant anthropic.claude-instant-v1 0.80 2.40 Provider: Amazon Bedrock, Context: 100000, Output Limit: 4096
amazonbedrock Nova Premier amazon.nova-premier-v1:0 2.50 12.50 Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
amazonbedrock Mistral-7B-Instruct-v0.2 mistral.mistral-7b-instruct-v0:2 0.11 0.11 Provider: Amazon Bedrock, Context: 127000, Output Limit: 127000
amazonbedrock Mixtral-8x7B-Instruct-v0.1 mistral.mixtral-8x7b-instruct-v0:1 0.70 0.70 Provider: Amazon Bedrock, Context: 32000, Output Limit: 32000
amazonbedrock Claude Opus 4.1 anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 32000
amazonbedrock Llama 4 Scout 17B Instruct meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Provider: Amazon Bedrock, Context: 3500000, Output Limit: 16384
amazonbedrock Jamba 1.5 Mini ai21.jamba-1-5-mini-v1:0 0.20 0.40 Provider: Amazon Bedrock, Context: 256000, Output Limit: 4096
amazonbedrock Llama 3 8B Instruct meta.llama3-8b-instruct-v1:0 0.30 0.60 Provider: Amazon Bedrock, Context: 8192, Output Limit: 2048
amazonbedrock Titan Text G1 - Express amazon.titan-text-express-v1:0:8k 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Claude Sonnet 3 anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 4096
amazonbedrock NVIDIA Nemotron Nano 9B v2 nvidia.nemotron-nano-9b-v2 0.06 0.23 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Titan Text G1 - Express amazon.titan-text-express-v1 0.20 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Llama 4 Maverick 17B Instruct meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Provider: Amazon Bedrock, Context: 1000000, Output Limit: 16384
amazonbedrock Ministral 3 14B mistral.ministral-3-14b-instruct 0.20 0.20 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock GPT OSS Safeguard 120B openai.gpt-oss-safeguard-120b 0.15 0.60 Provider: Amazon Bedrock, Context: 128000, Output Limit: 4096
amazonbedrock Qwen3 235B A22B 2507 qwen.qwen3-235b-a22b-2507-v1:0 0.22 0.88 Provider: Amazon Bedrock, Context: 262144, Output Limit: 131072
amazonbedrock Nova Lite amazon.nova-lite-v1:0 0.06 0.24 Provider: Amazon Bedrock, Context: 300000, Output Limit: 8192
amazonbedrock Claude Haiku 3.5 anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Provider: Amazon Bedrock, Context: 200000, Output Limit: 8192
amazonbedrock Kimi K2 Thinking moonshot.kimi-k2-thinking 0.60 2.50 Provider: Amazon Bedrock, Context: 256000, Output Limit: 256000
cerebras Qwen 3 235B Instruct qwen-3-235b-a22b-instruct-2507 0.60 1.20 Provider: Cerebras, Context: 131000, Output Limit: 32000
cerebras zai-glm-4.6 zai-glm-4.6 2.25 2.75 Source: cerebras, Context: 128000
cerebras GPT OSS 120B gpt-oss-120b 0.25 0.69 Provider: Cerebras, Context: 131072, Output Limit: 32768
bedrock amazon.nova-canvas-v1:0 amazon.nova-canvas-v1:0 0.00 0.00 Source: bedrock, Context: 2600
bedrock stability.stable-diffusion-xl-v1 stability.stable-diffusion-xl-v1 0.00 0.00 Source: bedrock, Context: 77
openai dall-e-2 dall-e-2 0.00 0.00 Source: openai, Context: N/A
bedrock stability.stable-diffusion-xl-v0 stability.stable-diffusion-xl-v0 0.00 0.00 Source: bedrock, Context: 77
bedrock ai21.j2-mid-v1 ai21.j2-mid-v1 12.50 12.50 Source: bedrock, Context: 8191
bedrock ai21.j2-ultra-v1 ai21.j2-ultra-v1 18.80 18.80 Source: bedrock, Context: 8191
bedrock ai21.jamba-1-5-large-v1:0 ai21.jamba-1-5-large-v1:0 2.00 8.00 Source: bedrock, Context: 256000
bedrock ai21.jamba-1-5-mini-v1:0 ai21.jamba-1-5-mini-v1:0 0.20 0.40 Source: bedrock, Context: 256000
bedrock ai21.jamba-instruct-v1:0 ai21.jamba-instruct-v1:0 0.50 0.70 Source: bedrock, Context: 70000
aiml dall-e-2 dall-e-2 0.00 0.00 Source: aiml, Context: N/A
aiml dall-e-3 dall-e-3 0.00 0.00 Source: aiml, Context: N/A
aiml flux-pro flux-pro 0.00 0.00 Source: aiml, Context: N/A
aiml v1.1 v1.1 0.00 0.00 Source: aiml, Context: N/A
aiml v1.1-ultra v1.1-ultra 0.00 0.00 Source: aiml, Context: N/A
aiml flux-realism flux-realism 0.00 0.00 Source: aiml, Context: N/A
aiml dev dev 0.00 0.00 Source: aiml, Context: N/A
aiml text-to-image text-to-image 0.00 0.00 Source: aiml, Context: N/A
aiml schnell schnell 0.00 0.00 Source: aiml, Context: N/A
aiml imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: aiml, Context: N/A
aiml nano-banana-pro nano-banana-pro 0.00 0.00 Source: aiml, Context: N/A
bedrockconverse us.writer.palmyra-x4-v1:0 us.writer.palmyra-x4-v1:0 2.50 10.00 Source: bedrock_converse, Context: 128000
bedrockconverse us.writer.palmyra-x5-v1:0 us.writer.palmyra-x5-v1:0 0.60 6.00 Source: bedrock_converse, Context: 1000000
bedrockconverse writer.palmyra-x4-v1:0 writer.palmyra-x4-v1:0 2.50 10.00 Source: bedrock_converse, Context: 128000
bedrockconverse writer.palmyra-x5-v1:0 writer.palmyra-x5-v1:0 0.60 6.00 Source: bedrock_converse, Context: 1000000
bedrockconverse amazon.nova-lite-v1:0 amazon.nova-lite-v1:0 0.06 0.24 Source: bedrock_converse, Context: 300000
bedrockconverse amazon.nova-2-lite-v1:0 amazon.nova-2-lite-v1:0 0.30 2.50 Source: bedrock_converse, Context: 1000000
bedrockconverse apac.amazon.nova-2-lite-v1:0 apac.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse eu.amazon.nova-2-lite-v1:0 eu.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse us.amazon.nova-2-lite-v1:0 us.amazon.nova-2-lite-v1:0 0.33 2.75 Source: bedrock_converse, Context: 1000000
bedrockconverse amazon.nova-micro-v1:0 amazon.nova-micro-v1:0 0.04 0.14 Source: bedrock_converse, Context: 128000
bedrockconverse amazon.nova-pro-v1:0 amazon.nova-pro-v1:0 0.80 3.20 Source: bedrock_converse, Context: 300000
bedrock amazon.rerank-v1:0 amazon.rerank-v1:0 0.00 0.00 Source: bedrock, Context: 32000
bedrock amazon.titan-embed-image-v1 amazon.titan-embed-image-v1 0.80 0.00 Source: bedrock, Context: 128
bedrock amazon.titan-embed-text-v1 amazon.titan-embed-text-v1 0.10 0.00 Source: bedrock, Context: 8192
bedrock amazon.titan-embed-text-v2:0 amazon.titan-embed-text-v2:0 0.20 0.00 Source: bedrock, Context: 8192
bedrock amazon.titan-image-generator-v1 amazon.titan-image-generator-v1 0.00 0.00 Source: bedrock, Context: N/A
bedrock amazon.titan-image-generator-v2 amazon.titan-image-generator-v2 0.00 0.00 Source: bedrock, Context: N/A
bedrock amazon.titan-image-generator-v2:0 amazon.titan-image-generator-v2:0 0.00 0.00 Source: bedrock, Context: N/A
bedrock twelvelabs.marengo-embed-2-7-v1:0 twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock us.twelvelabs.marengo-embed-2-7-v1:0 us.twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock eu.twelvelabs.marengo-embed-2-7-v1:0 eu.twelvelabs.marengo-embed-2-7-v1:0 70.00 0.00 Source: bedrock, Context: 77
bedrock twelvelabs.pegasus-1-2-v1:0 twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock us.twelvelabs.pegasus-1-2-v1:0 us.twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock eu.twelvelabs.pegasus-1-2-v1:0 eu.twelvelabs.pegasus-1-2-v1:0 0.00 7.50 Source: bedrock, Context: N/A
bedrock amazon.titan-text-express-v1 amazon.titan-text-express-v1 1.30 1.70 Source: bedrock, Context: 42000
bedrock amazon.titan-text-lite-v1 amazon.titan-text-lite-v1 0.30 0.40 Source: bedrock, Context: 42000
bedrock amazon.titan-text-premier-v1:0 amazon.titan-text-premier-v1:0 0.50 1.50 Source: bedrock, Context: 42000
bedrock anthropic.claude-3-5-haiku-20241022-v1:0 anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Source: bedrock, Context: 200000
bedrockconverse anthropic.claude-haiku-4-5-20251001-v1:0 anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-haiku-4-5@20251001 anthropic.claude-haiku-4-5@20251001 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-3-5-sonnet-20240620-v1:0 anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-5-sonnet-20241022-v2:0 anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-7-sonnet-20240620-v1:0 anthropic.claude-3-7-sonnet-20240620-v1:0 3.60 18.00 Source: bedrock, Context: 200000
bedrockconverse anthropic.claude-3-7-sonnet-20250219-v1:0 anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-3-haiku-20240307-v1:0 anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-opus-20240229-v1:0 anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-sonnet-20240229-v1:0 anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock anthropic.claude-instant-v1 anthropic.claude-instant-v1 0.80 2.40 Source: bedrock, Context: 100000
bedrockconverse anthropic.claude-opus-4-1-20250805-v1:0 anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-opus-4-20250514-v1:0 anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-opus-4-5-20251101-v1:0 anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse anthropic.claude-sonnet-4-20250514-v1:0 anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse anthropic.claude-sonnet-4-5-20250929-v1:0 anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock anthropic.claude-v1 anthropic.claude-v1 8.00 24.00 Source: bedrock, Context: 100000
bedrock anthropic.claude-v2:1 anthropic.claude-v2:1 8.00 24.00 Source: bedrock, Context: 100000
anyscale zephyr-7b-beta zephyr-7b-beta 0.15 0.15 Source: anyscale, Context: 16384
anyscale CodeLlama-34b-Instruct-hf codellama-34b-instruct-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale CodeLlama-70b-Instruct-hf codellama-70b-instruct-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale gemma-7b-it gemma-7b-it 0.15 0.15 Source: anyscale, Context: 8192
anyscale Llama-2-13b-chat-hf llama-2-13b-chat-hf 0.25 0.25 Source: anyscale, Context: 4096
anyscale Llama-2-70b-chat-hf llama-2-70b-chat-hf 1.00 1.00 Source: anyscale, Context: 4096
anyscale Llama-2-7b-chat-hf llama-2-7b-chat-hf 0.15 0.15 Source: anyscale, Context: 4096
anyscale Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 1.00 1.00 Source: anyscale, Context: 8192
anyscale Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.15 0.15 Source: anyscale, Context: 8192
anyscale Mistral-7B-Instruct-v0.1 mistral-7b-instruct-v0.1 0.15 0.15 Source: anyscale, Context: 16384
anyscale Mixtral-8x22B-Instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.90 0.90 Source: anyscale, Context: 65536
anyscale Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.15 0.15 Source: anyscale, Context: 16384
bedrockconverse apac.amazon.nova-lite-v1:0 apac.amazon.nova-lite-v1:0 0.06 0.25 Source: bedrock_converse, Context: 300000
bedrockconverse apac.amazon.nova-micro-v1:0 apac.amazon.nova-micro-v1:0 0.04 0.15 Source: bedrock_converse, Context: 128000
bedrockconverse apac.amazon.nova-pro-v1:0 apac.amazon.nova-pro-v1:0 0.84 3.36 Source: bedrock_converse, Context: 300000
bedrock apac.anthropic.claude-3-5-sonnet-20240620-v1:0 apac.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock apac.anthropic.claude-3-5-sonnet-20241022-v2:0 apac.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock apac.anthropic.claude-3-haiku-20240307-v1:0 apac.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrockconverse apac.anthropic.claude-haiku-4-5-20251001-v1:0 apac.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock apac.anthropic.claude-3-sonnet-20240229-v1:0 apac.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse apac.anthropic.claude-sonnet-4-20250514-v1:0 apac.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
assemblyai best best 0.00 0.00 Source: assemblyai, Context: N/A
assemblyai nano nano 0.00 0.00 Source: assemblyai, Context: N/A
bedrockconverse au.anthropic.claude-sonnet-4-5-20250929-v1:0 au.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
azure ada ada 0.10 0.00 Source: azure, Context: 8191
azure command-r-plus command-r-plus 3.00 15.00 Source: azure, Context: 128000
azureai claude-haiku-4-5 claude-haiku-4-5 1.00 5.00 Source: azure_ai, Context: 200000
azureai claude-opus-4-1 claude-opus-4-1 15.00 75.00 Source: azure_ai, Context: 200000
azureai claude-sonnet-4-5 claude-sonnet-4-5 3.00 15.00 Source: azure_ai, Context: 200000
azure computer-use-preview computer-use-preview 3.00 12.00 Source: azure, Context: 8192
azure container container 0.00 0.00 Source: azure, Context: N/A
azureai gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: azure_ai, Context: 131072
azure gpt-4o-2024-08-06 gpt-4o-2024-08-06 2.75 11.00 Source: azure, Context: 128000
azure gpt-4o-2024-11-20 gpt-4o-2024-11-20 2.75 11.00 Source: azure, Context: 128000
azure gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.17 0.66 Source: azure, Context: 128000
azure gpt-4o-mini-realtime-preview-2024-12-17 gpt-4o-mini-realtime-preview-2024-12-17 0.66 2.64 Source: azure, Context: 128000
azure gpt-4o-realtime-preview-2024-10-01 gpt-4o-realtime-preview-2024-10-01 5.50 22.00 Source: azure, Context: 128000
azure gpt-4o-realtime-preview-2024-12-17 gpt-4o-realtime-preview-2024-12-17 5.50 22.00 Source: azure, Context: 128000
azure gpt-5-2025-08-07 gpt-5-2025-08-07 1.38 11.00 Source: azure, Context: 272000
azure gpt-5-mini-2025-08-07 gpt-5-mini-2025-08-07 0.28 2.20 Source: azure, Context: 272000
azure gpt-5-nano-2025-08-07 gpt-5-nano-2025-08-07 0.06 0.44 Source: azure, Context: 272000
azure o1-2024-12-17 o1-2024-12-17 16.50 66.00 Source: azure, Context: 200000
azure o1-mini-2024-09-12 o1-mini-2024-09-12 1.21 4.84 Source: azure, Context: 128000
azure o1-preview-2024-09-12 o1-preview-2024-09-12 16.50 66.00 Source: azure, Context: 128000
azure o3-mini-2025-01-31 o3-mini-2025-01-31 1.21 4.84 Source: azure, Context: 200000
azure gpt-3.5-turbo gpt-3.5-turbo 0.50 1.50 Source: azure, Context: 4097
azuretext gpt-3.5-turbo-instruct-0914 gpt-3.5-turbo-instruct-0914 1.50 2.00 Source: azure_text, Context: 4097
azure gpt-35-turbo gpt-35-turbo 0.50 1.50 Source: azure, Context: 4097
azure gpt-35-turbo-0125 gpt-35-turbo-0125 0.50 1.50 Source: azure, Context: 16384
azure gpt-35-turbo-0301 gpt-35-turbo-0301 0.20 2.00 Source: azure, Context: 4097
azure gpt-35-turbo-0613 gpt-35-turbo-0613 1.50 2.00 Source: azure, Context: 4097
azure gpt-35-turbo-1106 gpt-35-turbo-1106 1.00 2.00 Source: azure, Context: 16384
azure gpt-35-turbo-16k gpt-35-turbo-16k 3.00 4.00 Source: azure, Context: 16385
azure gpt-35-turbo-16k-0613 gpt-35-turbo-16k-0613 3.00 4.00 Source: azure, Context: 16385
azuretext gpt-35-turbo-instruct gpt-35-turbo-instruct 1.50 2.00 Source: azure_text, Context: 4097
azuretext gpt-35-turbo-instruct-0914 gpt-35-turbo-instruct-0914 1.50 2.00 Source: azure_text, Context: 4097
azure gpt-4-0125-preview gpt-4-0125-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-0613 gpt-4-0613 30.00 60.00 Source: azure, Context: 8192
azure gpt-4-1106-preview gpt-4-1106-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-32k-0613 gpt-4-32k-0613 60.00 120.00 Source: azure, Context: 32768
azure gpt-4-turbo-2024-04-09 gpt-4-turbo-2024-04-09 10.00 30.00 Source: azure, Context: 128000
azure gpt-4-turbo-vision-preview gpt-4-turbo-vision-preview 10.00 30.00 Source: azure, Context: 128000
azure gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 2.00 8.00 Source: azure, Context: 1047576
azure gpt-4.1-mini-2025-04-14 gpt-4.1-mini-2025-04-14 0.40 1.60 Source: azure, Context: 1047576
azure gpt-4.1-nano-2025-04-14 gpt-4.1-nano-2025-04-14 0.10 0.40 Source: azure, Context: 1047576
azure gpt-4.5-preview gpt-4.5-preview 75.00 150.00 Source: azure, Context: 128000
azure gpt-4o-2024-05-13 gpt-4o-2024-05-13 5.00 15.00 Source: azure, Context: 128000
azure gpt-audio-2025-08-28 gpt-audio-2025-08-28 2.50 10.00 Source: azure, Context: 128000
azure gpt-audio-mini-2025-10-06 gpt-audio-mini-2025-10-06 0.60 2.40 Source: azure, Context: 128000
azure gpt-4o-audio-preview-2024-12-17 gpt-4o-audio-preview-2024-12-17 2.50 10.00 Source: azure, Context: 128000
azure gpt-4o-mini-audio-preview-2024-12-17 gpt-4o-mini-audio-preview-2024-12-17 2.50 10.00 Source: azure, Context: 128000
azure gpt-realtime-2025-08-28 gpt-realtime-2025-08-28 4.00 16.00 Source: azure, Context: 32000
azure gpt-realtime-mini-2025-10-06 gpt-realtime-mini-2025-10-06 0.60 2.40 Source: azure, Context: 32000
azure gpt-4o-mini-transcribe gpt-4o-mini-transcribe 1.25 5.00 Source: azure, Context: 16000
azure gpt-4o-mini-tts gpt-4o-mini-tts 2.50 10.00 Source: azure, Context: N/A
azure gpt-4o-transcribe gpt-4o-transcribe 2.50 10.00 Source: azure, Context: 16000
azure gpt-4o-transcribe-diarize gpt-4o-transcribe-diarize 2.50 10.00 Source: azure, Context: 16000
azure gpt-5.1-2025-11-13 gpt-5.1-2025-11-13 1.25 10.00 Source: azure, Context: 272000
azure gpt-5.1-chat-2025-11-13 gpt-5.1-chat-2025-11-13 1.25 10.00 Source: azure, Context: 128000
azure gpt-5.1-codex-2025-11-13 gpt-5.1-codex-2025-11-13 1.25 10.00 Source: azure, Context: 272000
azure gpt-5.1-codex-mini-2025-11-13 gpt-5.1-codex-mini-2025-11-13 0.25 2.00 Source: azure, Context: 272000
azure gpt-5-chat-latest gpt-5-chat-latest 1.25 10.00 Source: azure, Context: 128000
azure gpt-5.2-2025-12-11 gpt-5.2-2025-12-11 1.75 14.00 Source: azure, Context: 400000
azure gpt-5.2-chat-2025-12-11 gpt-5.2-chat-2025-12-11 1.75 14.00 Source: azure, Context: 128000
azure gpt-5.2-pro gpt-5.2-pro 21.00 168.00 Source: azure, Context: 400000
azure gpt-5.2-pro-2025-12-11 gpt-5.2-pro-2025-12-11 21.00 168.00 Source: azure, Context: 400000
azure gpt-image-1 gpt-image-1 5.00 0.00 Source: azure, Context: N/A
azure dall-e-3 dall-e-3 0.00 0.00 Source: azure, Context: N/A
azure gpt-image-1-mini gpt-image-1-mini 2.00 0.00 Source: azure, Context: N/A
azure gpt-image-1.5 gpt-image-1.5 5.00 0.00 Source: azure, Context: N/A
azure gpt-image-1.5-2025-12-16 gpt-image-1.5-2025-12-16 5.00 0.00 Source: azure, Context: N/A
azure mistral-large-2402 mistral-large-2402 8.00 24.00 Source: azure, Context: 32000
azure mistral-large-latest mistral-large-latest 8.00 24.00 Source: azure, Context: 32000
azure o3-2025-04-16 o3-2025-04-16 2.00 8.00 Source: azure, Context: 200000
azure o3-deep-research o3-deep-research 10.00 40.00 Source: azure, Context: 200000
azure o3-pro o3-pro 20.00 80.00 Source: azure, Context: 200000
azure o3-pro-2025-06-10 o3-pro-2025-06-10 20.00 80.00 Source: azure, Context: 200000
azure o4-mini-2025-04-16 o4-mini-2025-04-16 1.10 4.40 Source: azure, Context: 200000
azure dall-e-2 dall-e-2 0.00 0.00 Source: azure, Context: N/A
azure azure-tts azure-tts 0.00 0.00 Source: azure, Context: N/A
azure azure-tts-hd azure-tts-hd 0.00 0.00 Source: azure, Context: N/A
azure tts-1 tts-1 0.00 0.00 Source: azure, Context: N/A
azure tts-1-hd tts-1-hd 0.00 0.00 Source: azure, Context: N/A
azure whisper-1 whisper-1 0.00 0.00 Source: azure, Context: N/A
azureai Cohere-embed-v3-english cohere-embed-v3-english 0.10 0.00 Source: azure_ai, Context: 512
azureai Cohere-embed-v3-multilingual cohere-embed-v3-multilingual 0.10 0.00 Source: azure_ai, Context: 512
azureai FLUX-1.1-pro flux-1.1-pro 0.00 0.00 Source: azure_ai, Context: N/A
azureai FLUX.1-Kontext-pro flux.1-kontext-pro 0.00 0.00 Source: azure_ai, Context: N/A
azureai Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.37 0.37 Source: azure_ai, Context: 128000
azureai Llama-3.2-90B-Vision-Instruct llama-3.2-90b-vision-instruct 2.04 2.04 Source: azure_ai, Context: 128000
azureai Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.71 0.71 Source: azure_ai, Context: 128000
azureai Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 1.41 0.35 Source: azure_ai, Context: 1000000
azureai Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.20 0.78 Source: azure_ai, Context: 10000000
azureai Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 1.10 0.37 Source: azure_ai, Context: 8192
azureai Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.33 16.00 Source: azure_ai, Context: 128000
azureai Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 2.68 3.54 Source: azure_ai, Context: 128000
azureai Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.30 0.61 Source: azure_ai, Context: 128000
azureai Phi-3-medium-128k-instruct phi-3-medium-128k-instruct 0.17 0.68 Source: azure_ai, Context: 128000
azureai Phi-3-medium-4k-instruct phi-3-medium-4k-instruct 0.17 0.68 Source: azure_ai, Context: 4096
azureai Phi-3-mini-128k-instruct phi-3-mini-128k-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-3-mini-4k-instruct phi-3-mini-4k-instruct 0.13 0.52 Source: azure_ai, Context: 4096
azureai Phi-3-small-128k-instruct phi-3-small-128k-instruct 0.15 0.60 Source: azure_ai, Context: 128000
azureai Phi-3-small-8k-instruct phi-3-small-8k-instruct 0.15 0.60 Source: azure_ai, Context: 8192
azureai Phi-3.5-MoE-instruct phi-3.5-moe-instruct 0.16 0.64 Source: azure_ai, Context: 128000
azureai Phi-3.5-mini-instruct phi-3.5-mini-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-3.5-vision-instruct phi-3.5-vision-instruct 0.13 0.52 Source: azure_ai, Context: 128000
azureai Phi-4 phi-4 0.13 0.50 Source: azure_ai, Context: 16384
azureai Phi-4-mini-instruct phi-4-mini-instruct 0.08 0.30 Source: azure_ai, Context: 131072
azureai Phi-4-multimodal-instruct phi-4-multimodal-instruct 0.08 0.32 Source: azure_ai, Context: 131072
azureai Phi-4-mini-reasoning phi-4-mini-reasoning 0.08 0.32 Source: azure_ai, Context: 131072
azureai Phi-4-reasoning phi-4-reasoning 0.13 0.50 Source: azure_ai, Context: 32768
azureai mistral-document-ai-2505 mistral-document-ai-2505 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-read prebuilt-read 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-layout prebuilt-layout 0.00 0.00 Source: azure_ai, Context: N/A
azureai prebuilt-document prebuilt-document 0.00 0.00 Source: azure_ai, Context: N/A
azureai MAI-DS-R1 mai-ds-r1 1.35 5.40 Source: azure_ai, Context: 128000
azureai cohere-rerank-v3-english cohere-rerank-v3-english 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v3-multilingual cohere-rerank-v3-multilingual 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v3.5 cohere-rerank-v3.5 0.00 0.00 Source: azure_ai, Context: 4096
azureai cohere-rerank-v4.0-pro cohere-rerank-v4.0-pro 0.00 0.00 Source: azure_ai, Context: 32768
azureai cohere-rerank-v4.0-fast cohere-rerank-v4.0-fast 0.00 0.00 Source: azure_ai, Context: 32768
azureai deepseek-v3.2 deepseek-v3.2 0.58 1.68 Source: azure_ai, Context: 163840
azureai deepseek-v3.2-speciale deepseek-v3.2-speciale 0.58 1.68 Source: azure_ai, Context: 163840
azureai deepseek-r1 deepseek-r1 1.35 5.40 Source: azure_ai, Context: 128000
azureai deepseek-v3 deepseek-v3 1.14 4.56 Source: azure_ai, Context: 128000
azureai deepseek-v3-0324 deepseek-v3-0324 1.14 4.56 Source: azure_ai, Context: 128000
azureai embed-v-4-0 embed-v-4-0 0.12 0.00 Source: azure_ai, Context: 128000
azureai grok-3 grok-3 3.00 15.00 Source: azure_ai, Context: 131072
azureai grok-3-mini grok-3-mini 0.25 1.27 Source: azure_ai, Context: 131072
azureai grok-4 grok-4 5.50 27.50 Source: azure_ai, Context: 131072
azureai grok-4-fast-non-reasoning grok-4-fast-non-reasoning 0.43 1.73 Source: azure_ai, Context: 131072
azureai grok-4-fast-reasoning grok-4-fast-reasoning 0.43 1.73 Source: azure_ai, Context: 131072
azureai grok-code-fast-1 grok-code-fast-1 3.50 17.50 Source: azure_ai, Context: 131072
azureai jais-30b-chat jais-30b-chat 3200.00 9710.00 Source: azure_ai, Context: 8192
azureai jamba-instruct jamba-instruct 0.50 0.70 Source: azure_ai, Context: 70000
azureai ministral-3b ministral-3b 0.04 0.04 Source: azure_ai, Context: 128000
azureai mistral-large mistral-large 4.00 12.00 Source: azure_ai, Context: 32000
azureai mistral-large-2407 mistral-large-2407 2.00 6.00 Source: azure_ai, Context: 128000
azureai mistral-large-latest mistral-large-latest 2.00 6.00 Source: azure_ai, Context: 128000
azureai mistral-large-3 mistral-large-3 0.50 1.50 Source: azure_ai, Context: 256000
azureai mistral-medium-2505 mistral-medium-2505 0.40 2.00 Source: azure_ai, Context: 131072
azureai mistral-nemo mistral-nemo 0.15 0.15 Source: azure_ai, Context: 131072
azureai mistral-small mistral-small 1.00 3.00 Source: azure_ai, Context: 32000
azureai mistral-small-2503 mistral-small-2503 1.00 3.00 Source: azure_ai, Context: 128000
textcompletionopenai babbage-002 babbage-002 0.40 0.40 Source: text-completion-openai, Context: 16384
bedrock cohere.command-light-text-v14 cohere.command-light-text-v14 0.30 0.60 Source: bedrock, Context: 4096
bedrock cohere.command-text-v14 cohere.command-text-v14 1.50 2.00 Source: bedrock, Context: 4096
bedrock meta.llama3-70b-instruct-v1:0 meta.llama3-70b-instruct-v1:0 3.18 4.20 Source: bedrock, Context: 8192
bedrock meta.llama3-8b-instruct-v1:0 meta.llama3-8b-instruct-v1:0 0.36 0.72 Source: bedrock, Context: 8192
bedrock mistral.mistral-7b-instruct-v0:2 mistral.mistral-7b-instruct-v0:2 0.20 0.26 Source: bedrock, Context: 32000
bedrock mistral.mistral-large-2402-v1:0 mistral.mistral-large-2402-v1:0 10.40 31.20 Source: bedrock, Context: 32000
bedrock mistral.mixtral-8x7b-instruct-v0:1 mistral.mixtral-8x7b-instruct-v0:1 0.59 0.91 Source: bedrock, Context: 32000
bedrock amazon.nova-pro-v1:0 amazon.nova-pro-v1:0 0.96 3.84 Source: bedrock, Context: 300000
bedrock claude-sonnet-4-5-20250929-v1:0 claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock, Context: 200000
bedrock anthropic.claude-3-7-sonnet-20250219-v1:0 anthropic.claude-3-7-sonnet-20250219-v1:0 3.60 18.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-5-haiku-20241022-v1:0 us.anthropic.claude-3-5-haiku-20241022-v1:0 0.80 4.00 Source: bedrock, Context: 200000
cerebras llama-3.3-70b llama-3.3-70b 0.85 1.20 Source: cerebras, Context: 128000
cerebras llama3.1-70b llama3.1-70b 0.60 0.60 Source: cerebras, Context: 128000
cerebras llama3.1-8b llama3.1-8b 0.10 0.10 Source: cerebras, Context: 128000
cerebras qwen-3-32b qwen-3-32b 0.40 0.80 Source: cerebras, Context: 128000
vertex chat-bison chat-bison 0.13 0.13 Source: vertex, Context: 8192
vertex chat-bison-32k chat-bison-32k 0.13 0.13 Source: vertex, Context: 32000
vertex chat-bison-32k@002 chat-bison-32k@002 0.13 0.13 Source: vertex, Context: 32000
vertex chat-bison@001 chat-bison@001 0.13 0.13 Source: vertex, Context: 8192
vertex chat-bison@002 chat-bison@002 0.13 0.13 Source: vertex, Context: 8192
nlpcloud chatdolphin chatdolphin 0.50 0.50 Source: nlp_cloud, Context: 16384
openai chatgpt-4o-latest chatgpt-4o-latest 5.00 15.00 Source: openai, Context: 128000
openai gpt-4o-transcribe-diarize gpt-4o-transcribe-diarize 2.50 10.00 Source: openai, Context: 16000
anthropic claude-3-5-sonnet-latest claude-3-5-sonnet-latest 3.00 15.00 Source: anthropic, Context: 200000
anthropic claude-3-opus-latest claude-3-opus-latest 15.00 75.00 Source: anthropic, Context: 200000
anthropic claude-4-opus-20250514 claude-4-opus-20250514 15.00 75.00 Source: anthropic, Context: 200000
anthropic claude-4-sonnet-20250514 claude-4-sonnet-20250514 3.00 15.00 Source: anthropic, Context: 1000000
cloudflare llama-2-7b-chat-fp16 llama-2-7b-chat-fp16 1.92 1.92 Source: cloudflare, Context: 3072
cloudflare llama-2-7b-chat-int8 llama-2-7b-chat-int8 1.92 1.92 Source: cloudflare, Context: 2048
cloudflare mistral-7b-instruct-v0.1 mistral-7b-instruct-v0.1 1.92 1.92 Source: cloudflare, Context: 8192
cloudflare codellama-7b-instruct-awq codellama-7b-instruct-awq 1.92 1.92 Source: cloudflare, Context: 4096
vertex code-bison code-bison 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison-32k@002 code-bison-32k@002 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison32k code-bison32k 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison@001 code-bison@001 0.13 0.13 Source: vertex, Context: 6144
vertex code-bison@002 code-bison@002 0.13 0.13 Source: vertex, Context: 6144
vertex code-gecko code-gecko 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko-latest code-gecko-latest 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko@001 code-gecko@001 0.13 0.13 Source: vertex, Context: 2048
vertex code-gecko@002 code-gecko@002 0.13 0.13 Source: vertex, Context: 2048
vertex codechat-bison codechat-bison 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison-32k codechat-bison-32k 0.13 0.13 Source: vertex, Context: 32000
vertex codechat-bison-32k@002 codechat-bison-32k@002 0.13 0.13 Source: vertex, Context: 32000
vertex codechat-bison@001 codechat-bison@001 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison@002 codechat-bison@002 0.13 0.13 Source: vertex, Context: 6144
vertex codechat-bison@latest codechat-bison@latest 0.13 0.13 Source: vertex, Context: 6144
codestral codestral-2405 codestral-2405 0.00 0.00 Source: codestral, Context: 32000
codestral codestral-latest codestral-latest 0.00 0.00 Source: codestral, Context: 32000
bedrock cohere.command-r-plus-v1:0 cohere.command-r-plus-v1:0 3.00 15.00 Source: bedrock, Context: 128000
bedrock cohere.command-r-v1:0 cohere.command-r-v1:0 0.50 1.50 Source: bedrock, Context: 128000
bedrock cohere.embed-english-v3 cohere.embed-english-v3 0.10 0.00 Source: bedrock, Context: 512
bedrock cohere.embed-multilingual-v3 cohere.embed-multilingual-v3 0.10 0.00 Source: bedrock, Context: 512
bedrock cohere.embed-v4:0 cohere.embed-v4:0 0.12 0.00 Source: bedrock, Context: 128000
cohere embed-v4.0 embed-v4.0 0.12 0.00 Source: cohere, Context: 128000
bedrock cohere.rerank-v3-5:0 cohere.rerank-v3-5:0 0.00 0.00 Source: bedrock, Context: 32000
cohere command command 1.00 2.00 Source: cohere, Context: 4096
coherechat command-a-03-2025 command-a-03-2025 2.50 10.00 Source: cohere_chat, Context: 256000
coherechat command-light command-light 0.30 0.60 Source: cohere_chat, Context: 4096
cohere command-nightly command-nightly 1.00 2.00 Source: cohere, Context: 4096
coherechat command-r command-r 0.15 0.60 Source: cohere_chat, Context: 128000
coherechat command-r-08-2024 command-r-08-2024 0.15 0.60 Source: cohere_chat, Context: 128000
coherechat command-r-plus command-r-plus 2.50 10.00 Source: cohere_chat, Context: 128000
coherechat command-r-plus-08-2024 command-r-plus-08-2024 2.50 10.00 Source: cohere_chat, Context: 128000
coherechat command-r7b-12-2024 command-r7b-12-2024 0.04 0.15 Source: cohere_chat, Context: 128000
dashscope qwen-coder qwen-coder 0.30 1.50 Source: dashscope, Context: 1000000
dashscope qwen-flash qwen-flash 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-flash-2025-07-28 qwen-flash-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-max qwen-max 1.60 6.40 Source: dashscope, Context: 30720
dashscope qwen-plus qwen-plus 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-01-25 qwen-plus-2025-01-25 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-04-28 qwen-plus-2025-04-28 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-07-14 qwen-plus-2025-07-14 0.40 1.20 Source: dashscope, Context: 129024
dashscope qwen-plus-2025-07-28 qwen-plus-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-plus-2025-09-11 qwen-plus-2025-09-11 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-plus-latest qwen-plus-latest 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen-turbo qwen-turbo 0.05 0.20 Source: dashscope, Context: 129024
dashscope qwen-turbo-2024-11-01 qwen-turbo-2024-11-01 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen-turbo-2025-04-28 qwen-turbo-2025-04-28 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen-turbo-latest qwen-turbo-latest 0.05 0.20 Source: dashscope, Context: 1000000
dashscope qwen3-30b-a3b qwen3-30b-a3b 0.00 0.00 Source: dashscope, Context: 129024
dashscope qwen3-coder-flash qwen3-coder-flash 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-flash-2025-07-28 qwen3-coder-flash-2025-07-28 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-plus qwen3-coder-plus 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-coder-plus-2025-07-22 qwen3-coder-plus-2025-07-22 0.00 0.00 Source: dashscope, Context: 997952
dashscope qwen3-max-preview qwen3-max-preview 0.00 0.00 Source: dashscope, Context: 258048
dashscope qwq-plus qwq-plus 0.80 2.40 Source: dashscope, Context: 98304
databricks databricks-bge-large-en databricks-bge-large-en 0.10 0.00 Source: databricks, Context: 512
databricks databricks-claude-3-7-sonnet databricks-claude-3-7-sonnet 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-haiku-4-5 databricks-claude-haiku-4-5 1.00 5.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4 databricks-claude-opus-4 15.00 75.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4-1 databricks-claude-opus-4-1 15.00 75.00 Source: databricks, Context: 200000
databricks databricks-claude-opus-4-5 databricks-claude-opus-4-5 5.00 25.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4 databricks-claude-sonnet-4 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4-1 databricks-claude-sonnet-4-1 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-claude-sonnet-4-5 databricks-claude-sonnet-4-5 3.00 15.00 Source: databricks, Context: 200000
databricks databricks-gemini-2-5-flash databricks-gemini-2-5-flash 0.30 2.50 Source: databricks, Context: 1048576
databricks databricks-gemini-2-5-pro databricks-gemini-2-5-pro 1.25 10.00 Source: databricks, Context: 1048576
databricks databricks-gemma-3-12b databricks-gemma-3-12b 0.15 0.50 Source: databricks, Context: 128000
databricks databricks-gpt-5 databricks-gpt-5 1.25 10.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-1 databricks-gpt-5-1 1.25 10.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-mini databricks-gpt-5-mini 0.25 2.00 Source: databricks, Context: 400000
databricks databricks-gpt-5-nano databricks-gpt-5-nano 0.05 0.40 Source: databricks, Context: 400000
databricks databricks-gpt-oss-120b databricks-gpt-oss-120b 0.15 0.60 Source: databricks, Context: 131072
databricks databricks-gpt-oss-20b databricks-gpt-oss-20b 0.07 0.30 Source: databricks, Context: 131072
databricks databricks-gte-large-en databricks-gte-large-en 0.13 0.00 Source: databricks, Context: 8192
databricks databricks-llama-2-70b-chat databricks-llama-2-70b-chat 0.50 1.50 Source: databricks, Context: 4096
databricks databricks-llama-4-maverick databricks-llama-4-maverick 0.50 1.50 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-1-405b-instruct databricks-meta-llama-3-1-405b-instruct 5.00 15.00 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-1-8b-instruct databricks-meta-llama-3-1-8b-instruct 0.15 0.45 Source: databricks, Context: 200000
databricks databricks-meta-llama-3-3-70b-instruct databricks-meta-llama-3-3-70b-instruct 0.50 1.50 Source: databricks, Context: 128000
databricks databricks-meta-llama-3-70b-instruct databricks-meta-llama-3-70b-instruct 1.00 3.00 Source: databricks, Context: 128000
databricks databricks-mixtral-8x7b-instruct databricks-mixtral-8x7b-instruct 0.50 1.00 Source: databricks, Context: 4096
databricks databricks-mpt-30b-instruct databricks-mpt-30b-instruct 1.00 1.00 Source: databricks, Context: 8192
databricks databricks-mpt-7b-instruct databricks-mpt-7b-instruct 0.50 0.00 Source: databricks, Context: 8192
dataforseo search search 0.00 0.00 Source: dataforseo, Context: N/A
textcompletionopenai davinci-002 davinci-002 2.00 2.00 Source: text-completion-openai, Context: 16384
deepgram base base 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-conversationalai base-conversationalai 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-finance base-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-general base-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-meeting base-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-phonecall base-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-video base-video 0.00 0.00 Source: deepgram, Context: N/A
deepgram base-voicemail base-voicemail 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced enhanced 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-finance enhanced-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-general enhanced-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-meeting enhanced-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram enhanced-phonecall enhanced-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova nova 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2 nova-2 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-atc nova-2-atc 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-automotive nova-2-automotive 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-conversationalai nova-2-conversationalai 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-drivethru nova-2-drivethru 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-finance nova-2-finance 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-general nova-2-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-meeting nova-2-meeting 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-phonecall nova-2-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-video nova-2-video 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-2-voicemail nova-2-voicemail 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3 nova-3 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3-general nova-3-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-3-medical nova-3-medical 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-general nova-general 0.00 0.00 Source: deepgram, Context: N/A
deepgram nova-phonecall nova-phonecall 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper whisper 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-base whisper-base 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-large whisper-large 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-medium whisper-medium 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-small whisper-small 0.00 0.00 Source: deepgram, Context: N/A
deepgram whisper-tiny whisper-tiny 0.00 0.00 Source: deepgram, Context: N/A
deepinfra MythoMax-L2-13b mythomax-l2-13b 0.08 0.09 Source: deepinfra, Context: 4096
deepinfra Hermes-3-Llama-3.1-405B hermes-3-llama-3.1-405b 1.00 1.00 Source: deepinfra, Context: 131072
deepinfra Hermes-3-Llama-3.1-70B hermes-3-llama-3.1-70b 0.30 0.30 Source: deepinfra, Context: 131072
deepinfra QwQ-32B qwq-32b 0.15 0.40 Source: deepinfra, Context: 131072
deepinfra Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.12 0.39 Source: deepinfra, Context: 32768
deepinfra Qwen2.5-7B-Instruct qwen2.5-7b-instruct 0.04 0.10 Source: deepinfra, Context: 32768
deepinfra Qwen2.5-VL-32B-Instruct qwen2.5-vl-32b-instruct 0.20 0.60 Source: deepinfra, Context: 128000
deepinfra Qwen3-14B qwen3-14b 0.06 0.24 Source: deepinfra, Context: 40960
deepinfra Qwen3-235B-A22B qwen3-235b-a22b 0.18 0.54 Source: deepinfra, Context: 40960
deepinfra Qwen3-235B-A22B-Instruct-2507 qwen3-235b-a22b-instruct-2507 0.09 0.60 Source: deepinfra, Context: 262144
deepinfra Qwen3-235B-A22B-Thinking-2507 qwen3-235b-a22b-thinking-2507 0.30 2.90 Source: deepinfra, Context: 262144
deepinfra Qwen3-30B-A3B qwen3-30b-a3b 0.08 0.29 Source: deepinfra, Context: 40960
deepinfra Qwen3-32B qwen3-32b 0.10 0.28 Source: deepinfra, Context: 40960
deepinfra Qwen3-Next-80B-A3B-Instruct qwen3-next-80b-a3b-instruct 0.14 1.40 Source: deepinfra, Context: 262144
deepinfra Qwen3-Next-80B-A3B-Thinking qwen3-next-80b-a3b-thinking 0.14 1.40 Source: deepinfra, Context: 262144
deepinfra L3-8B-Lunaris-v1-Turbo l3-8b-lunaris-v1-turbo 0.04 0.05 Source: deepinfra, Context: 8192
deepinfra L3.1-70B-Euryale-v2.2 l3.1-70b-euryale-v2.2 0.65 0.75 Source: deepinfra, Context: 131072
deepinfra L3.3-70B-Euryale-v2.3 l3.3-70b-euryale-v2.3 0.65 0.75 Source: deepinfra, Context: 131072
deepinfra olmOCR-7B-0725-FP8 olmocr-7b-0725-fp8 0.27 1.50 Source: deepinfra, Context: 16384
deepinfra claude-3-7-sonnet-latest claude-3-7-sonnet-latest 3.30 16.50 Source: deepinfra, Context: 200000
deepinfra claude-4-opus claude-4-opus 16.50 82.50 Source: deepinfra, Context: 200000
deepinfra claude-4-sonnet claude-4-sonnet 3.30 16.50 Source: deepinfra, Context: 200000
deepinfra DeepSeek-R1 deepseek-r1 0.70 2.40 Source: deepinfra, Context: 163840
deepinfra DeepSeek-R1-0528 deepseek-r1-0528 0.50 2.15 Source: deepinfra, Context: 163840
deepinfra DeepSeek-R1-0528-Turbo deepseek-r1-0528-turbo 1.00 3.00 Source: deepinfra, Context: 32768
deepinfra DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.20 0.60 Source: deepinfra, Context: 131072
deepinfra DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.27 0.27 Source: deepinfra, Context: 131072
deepinfra DeepSeek-R1-Turbo deepseek-r1-turbo 1.00 3.00 Source: deepinfra, Context: 40960
deepinfra DeepSeek-V3 deepseek-v3 0.38 0.89 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3-0324 deepseek-v3-0324 0.25 0.88 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3.1 deepseek-v3.1 0.27 1.00 Source: deepinfra, Context: 163840
deepinfra DeepSeek-V3.1-Terminus deepseek-v3.1-terminus 0.27 1.00 Source: deepinfra, Context: 163840
deepinfra gemini-2.0-flash-001 gemini-2.0-flash-001 0.10 0.40 Source: deepinfra, Context: 1000000
deepinfra gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: deepinfra, Context: 1000000
deepinfra gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: deepinfra, Context: 1000000
deepinfra gemma-3-12b-it gemma-3-12b-it 0.05 0.10 Source: deepinfra, Context: 131072
deepinfra gemma-3-27b-it gemma-3-27b-it 0.09 0.16 Source: deepinfra, Context: 131072
deepinfra gemma-3-4b-it gemma-3-4b-it 0.04 0.08 Source: deepinfra, Context: 131072
deepinfra Llama-3.2-11B-Vision-Instruct llama-3.2-11b-vision-instruct 0.05 0.05 Source: deepinfra, Context: 131072
deepinfra Llama-3.2-3B-Instruct llama-3.2-3b-instruct 0.02 0.02 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.23 0.40 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-70B-Instruct-Turbo llama-3.3-70b-instruct-turbo 0.13 0.39 Source: deepinfra, Context: 131072
deepinfra Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.15 0.60 Source: deepinfra, Context: 1048576
deepinfra Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.08 0.30 Source: deepinfra, Context: 327680
deepinfra Llama-Guard-3-8B llama-guard-3-8b 0.06 0.06 Source: deepinfra, Context: 131072
deepinfra Llama-Guard-4-12B llama-guard-4-12b 0.18 0.18 Source: deepinfra, Context: 163840
deepinfra Meta-Llama-3-8B-Instruct meta-llama-3-8b-instruct 0.03 0.06 Source: deepinfra, Context: 8192
deepinfra Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.40 0.40 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-70B-Instruct-Turbo meta-llama-3.1-70b-instruct-turbo 0.10 0.28 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.03 0.05 Source: deepinfra, Context: 131072
deepinfra Meta-Llama-3.1-8B-Instruct-Turbo meta-llama-3.1-8b-instruct-turbo 0.02 0.03 Source: deepinfra, Context: 131072
deepinfra WizardLM-2-8x22B wizardlm-2-8x22b 0.48 0.48 Source: deepinfra, Context: 65536
deepinfra phi-4 phi-4 0.07 0.14 Source: deepinfra, Context: 16384
deepinfra Mistral-Nemo-Instruct-2407 mistral-nemo-instruct-2407 0.02 0.04 Source: deepinfra, Context: 131072
deepinfra Mistral-Small-24B-Instruct-2501 mistral-small-24b-instruct-2501 0.05 0.08 Source: deepinfra, Context: 32768
deepinfra Mistral-Small-3.2-24B-Instruct-2506 mistral-small-3.2-24b-instruct-2506 0.08 0.20 Source: deepinfra, Context: 128000
deepinfra Mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.40 0.40 Source: deepinfra, Context: 32768
deepinfra Kimi-K2-Instruct-0905 kimi-k2-instruct-0905 0.50 2.00 Source: deepinfra, Context: 262144
deepinfra Llama-3.1-Nemotron-70B-Instruct llama-3.1-nemotron-70b-instruct 0.60 0.60 Source: deepinfra, Context: 131072
deepinfra Llama-3.3-Nemotron-Super-49B-v1.5 llama-3.3-nemotron-super-49b-v1.5 0.10 0.40 Source: deepinfra, Context: 131072
deepinfra NVIDIA-Nemotron-Nano-9B-v2 nvidia-nemotron-nano-9b-v2 0.04 0.16 Source: deepinfra, Context: 131072
deepseek deepseek-coder deepseek-coder 0.14 0.28 Source: deepseek, Context: 128000
deepseek deepseek-r1 deepseek-r1 0.55 2.19 Source: deepseek, Context: 65536
deepseek deepseek-v3 deepseek-v3 0.27 1.10 Source: deepseek, Context: 65536
deepseek deepseek-v3.2 deepseek-v3.2 0.28 0.40 Source: deepseek, Context: 163840
bedrockconverse deepseek.v3-v1:0 deepseek.v3-v1:0 0.58 1.68 Source: bedrock_converse, Context: 163840
nlpcloud dolphin dolphin 0.50 0.50 Source: nlp_cloud, Context: 16384
volcengine doubao-embedding doubao-embedding 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large doubao-embedding-large 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large-text-240915 doubao-embedding-large-text-240915 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-large-text-250515 doubao-embedding-large-text-250515 0.00 0.00 Source: volcengine, Context: 4096
volcengine doubao-embedding-text-240715 doubao-embedding-text-240715 0.00 0.00 Source: volcengine, Context: 4096
exaai search search 0.00 0.00 Source: exa_ai, Context: N/A
firecrawl search search 0.00 0.00 Source: firecrawl, Context: N/A
perplexity search search 0.00 0.00 Source: perplexity, Context: N/A
searxng search search 0.00 0.00 Source: searxng, Context: N/A
elevenlabs scribe_v1 scribe_v1 0.00 0.00 Source: elevenlabs, Context: N/A
elevenlabs scribe_v1_experimental scribe_v1_experimental 0.00 0.00 Source: elevenlabs, Context: N/A
cohere embed-english-light-v2.0 embed-english-light-v2.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-english-light-v3.0 embed-english-light-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-english-v2.0 embed-english-v2.0 0.10 0.00 Source: cohere, Context: 4096
cohere embed-english-v3.0 embed-english-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-multilingual-v2.0 embed-multilingual-v2.0 0.10 0.00 Source: cohere, Context: 768
cohere embed-multilingual-v3.0 embed-multilingual-v3.0 0.10 0.00 Source: cohere, Context: 1024
cohere embed-multilingual-light-v3.0 embed-multilingual-light-v3.0 0.10 0.00 Source: cohere, Context: 1024
bedrockconverse eu.amazon.nova-lite-v1:0 eu.amazon.nova-lite-v1:0 0.08 0.31 Source: bedrock_converse, Context: 300000
bedrockconverse eu.amazon.nova-micro-v1:0 eu.amazon.nova-micro-v1:0 0.05 0.18 Source: bedrock_converse, Context: 128000
bedrockconverse eu.amazon.nova-pro-v1:0 eu.amazon.nova-pro-v1:0 1.05 4.20 Source: bedrock_converse, Context: 300000
bedrock eu.anthropic.claude-3-5-haiku-20241022-v1:0 eu.anthropic.claude-3-5-haiku-20241022-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrockconverse eu.anthropic.claude-haiku-4-5-20251001-v1:0 eu.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock eu.anthropic.claude-3-5-sonnet-20240620-v1:0 eu.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-5-sonnet-20241022-v2:0 eu.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-7-sonnet-20250219-v1:0 eu.anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-haiku-20240307-v1:0 eu.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-opus-20240229-v1:0 eu.anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock eu.anthropic.claude-3-sonnet-20240229-v1:0 eu.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-1-20250805-v1:0 eu.anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-20250514-v1:0 eu.anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-sonnet-4-20250514-v1:0 eu.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse eu.anthropic.claude-sonnet-4-5-20250929-v1:0 eu.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrock eu.meta.llama3-2-1b-instruct-v1:0 eu.meta.llama3-2-1b-instruct-v1:0 0.13 0.13 Source: bedrock, Context: 128000
bedrock eu.meta.llama3-2-3b-instruct-v1:0 eu.meta.llama3-2-3b-instruct-v1:0 0.19 0.19 Source: bedrock, Context: 128000
bedrockconverse eu.mistral.pixtral-large-2502-v1:0 eu.mistral.pixtral-large-2502-v1:0 2.00 6.00 Source: bedrock_converse, Context: 128000
falai 3.2 3.2 0.00 0.00 Source: fal_ai, Context: N/A
falai v1.1 v1.1 0.00 0.00 Source: fal_ai, Context: N/A
falai v1.1-ultra v1.1-ultra 0.00 0.00 Source: fal_ai, Context: N/A
falai schnell schnell 0.00 0.00 Source: fal_ai, Context: N/A
falai text-to-image text-to-image 0.00 0.00 Source: fal_ai, Context: N/A
falai v3 v3 0.00 0.00 Source: fal_ai, Context: N/A
falai preview preview 0.00 0.00 Source: fal_ai, Context: N/A
falai fast fast 0.00 0.00 Source: fal_ai, Context: N/A
falai ultra ultra 0.00 0.00 Source: fal_ai, Context: N/A
falai stable-diffusion-v35-medium stable-diffusion-v35-medium 0.00 0.00 Source: fal_ai, Context: N/A
featherlessai Qwerky-72B qwerky-72b 0.00 0.00 Source: featherless_ai, Context: 32768
featherlessai Qwerky-QwQ-32B qwerky-qwq-32b 0.00 0.00 Source: featherless_ai, Context: 32768
fireworksai fireworks-ai-4.1b-to-16b fireworks-ai-4.1b-to-16b 0.20 0.20 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-56b-to-176b fireworks-ai-56b-to-176b 1.20 1.20 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-above-16b fireworks-ai-above-16b 0.90 0.90 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-default fireworks-ai-default 0.00 0.00 Source: fireworks_ai, Context: N/A
fireworksaiembeddingmodels fireworks-ai-embedding-150m-to-350m fireworks-ai-embedding-150m-to-350m 0.02 0.00 Source: fireworks_ai-embedding-models, Context: N/A
fireworksaiembeddingmodels fireworks-ai-embedding-up-to-150m fireworks-ai-embedding-up-to-150m 0.01 0.00 Source: fireworks_ai-embedding-models, Context: N/A
fireworksai fireworks-ai-moe-up-to-56b fireworks-ai-moe-up-to-56b 0.50 0.50 Source: fireworks_ai, Context: N/A
fireworksai fireworks-ai-up-to-4b fireworks-ai-up-to-4b 0.20 0.20 Source: fireworks_ai, Context: N/A
fireworksaiembeddingmodels UAE-Large-V1 uae-large-v1 0.02 0.00 Source: fireworks_ai-embedding-models, Context: 512
fireworksai deepseek-coder-v2-instruct deepseek-coder-v2-instruct 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai deepseek-r1 deepseek-r1 3.00 8.00 Source: fireworks_ai, Context: 128000
fireworksai deepseek-r1-basic deepseek-r1-basic 0.55 2.19 Source: fireworks_ai, Context: 128000
fireworksai deepseek-v3 deepseek-v3 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai deepseek-v3p1-terminus deepseek-v3p1-terminus 0.56 1.68 Source: fireworks_ai, Context: 128000
fireworksai firefunction-v2 firefunction-v2 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai kimi-k2-instruct-0905 kimi-k2-instruct-0905 0.60 2.50 Source: fireworks_ai, Context: 262144
fireworksai llama-v3p1-405b-instruct llama-v3p1-405b-instruct 3.00 3.00 Source: fireworks_ai, Context: 128000
fireworksai llama-v3p1-8b-instruct llama-v3p1-8b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-11b-vision-instruct llama-v3p2-11b-vision-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-1b-instruct llama-v3p2-1b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-3b-instruct llama-v3p2-3b-instruct 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai llama-v3p2-90b-vision-instruct llama-v3p2-90b-vision-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai llama4-maverick-instruct-basic llama4-maverick-instruct-basic 0.22 0.88 Source: fireworks_ai, Context: 131072
fireworksai llama4-scout-instruct-basic llama4-scout-instruct-basic 0.15 0.60 Source: fireworks_ai, Context: 131072
fireworksai mixtral-8x22b-instruct-hf mixtral-8x22b-instruct-hf 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai qwen2-72b-instruct qwen2-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct qwen2p5-coder-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-large yi-large 3.00 3.00 Source: fireworks_ai, Context: 32768
fireworksaiembeddingmodels nomic-embed-text-v1 nomic-embed-text-v1 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 8192
fireworksaiembeddingmodels nomic-embed-text-v1.5 nomic-embed-text-v1.5 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 8192
fireworksaiembeddingmodels gte-base gte-base 0.01 0.00 Source: fireworks_ai-embedding-models, Context: 512
fireworksaiembeddingmodels gte-large gte-large 0.02 0.00 Source: fireworks_ai-embedding-models, Context: 512
friendliai meta-llama-3.1-70b-instruct meta-llama-3.1-70b-instruct 0.60 0.60 Source: friendliai, Context: 8192
friendliai meta-llama-3.1-8b-instruct meta-llama-3.1-8b-instruct 0.10 0.10 Source: friendliai, Context: 8192
textcompletionopenai ft:babbage-002 ft:babbage-002 1.60 1.60 Source: text-completion-openai, Context: 16384
textcompletionopenai ft:davinci-002 ft:davinci-002 12.00 12.00 Source: text-completion-openai, Context: 16384
openai ft:gpt-3.5-turbo ft:gpt-3.5-turbo 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-3.5-turbo-0125 ft:gpt-3.5-turbo-0125 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-3.5-turbo-0613 ft:gpt-3.5-turbo-0613 3.00 6.00 Source: openai, Context: 4096
openai ft:gpt-3.5-turbo-1106 ft:gpt-3.5-turbo-1106 3.00 6.00 Source: openai, Context: 16385
openai ft:gpt-4-0613 ft:gpt-4-0613 30.00 60.00 Source: openai, Context: 8192
openai ft:gpt-4o-2024-08-06 ft:gpt-4o-2024-08-06 3.75 15.00 Source: openai, Context: 128000
openai ft:gpt-4o-2024-11-20 ft:gpt-4o-2024-11-20 3.75 15.00 Source: openai, Context: 128000
openai ft:gpt-4o-mini-2024-07-18 ft:gpt-4o-mini-2024-07-18 0.30 1.20 Source: openai, Context: 128000
openai ft:gpt-4.1-2025-04-14 ft:gpt-4.1-2025-04-14 3.00 12.00 Source: openai, Context: 1047576
openai ft:gpt-4.1-mini-2025-04-14 ft:gpt-4.1-mini-2025-04-14 0.80 3.20 Source: openai, Context: 1047576
openai ft:gpt-4.1-nano-2025-04-14 ft:gpt-4.1-nano-2025-04-14 0.20 0.80 Source: openai, Context: 1047576
openai ft:o4-mini-2025-04-16 ft:o4-mini-2025-04-16 4.00 16.00 Source: openai, Context: 200000
vertex gemini-1.0-pro gemini-1.0-pro 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-001 gemini-1.0-pro-001 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-002 gemini-1.0-pro-002 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-1.0-pro-vision gemini-1.0-pro-vision 0.50 1.50 Source: vertex, Context: 16384
vertex gemini-1.0-pro-vision-001 gemini-1.0-pro-vision-001 0.50 1.50 Source: vertex, Context: 16384
vertex gemini-1.0-ultra gemini-1.0-ultra 0.50 1.50 Source: vertex, Context: 8192
vertex gemini-1.0-ultra-001 gemini-1.0-ultra-001 0.50 1.50 Source: vertex, Context: 8192
vertex gemini-1.5-flash gemini-1.5-flash 0.08 0.30 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-001 gemini-1.5-flash-001 0.08 0.30 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-002 gemini-1.5-flash-002 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-1.5-flash-exp-0827 gemini-1.5-flash-exp-0827 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-1.5-flash-preview-0514 gemini-1.5-flash-preview-0514 0.08 0.00 Source: vertex, Context: 1000000
vertex gemini-1.5-pro gemini-1.5-pro 1.25 5.00 Source: vertex, Context: 2097152
vertex gemini-1.5-pro-001 gemini-1.5-pro-001 1.25 5.00 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-002 gemini-1.5-pro-002 1.25 5.00 Source: vertex, Context: 2097152
vertex gemini-1.5-pro-preview-0215 gemini-1.5-pro-preview-0215 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-preview-0409 gemini-1.5-pro-preview-0409 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-1.5-pro-preview-0514 gemini-1.5-pro-preview-0514 0.08 0.31 Source: vertex, Context: 1000000
vertex gemini-2.0-flash gemini-2.0-flash 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-001 gemini-2.0-flash-001 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-exp gemini-2.0-flash-exp 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-lite gemini-2.0-flash-lite 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-lite-001 gemini-2.0-flash-lite-001 0.08 0.30 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-live-preview-04-09 gemini-2.0-flash-live-preview-04-09 0.50 2.00 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-thinking-exp gemini-2.0-flash-thinking-exp 0.00 0.00 Source: vertex, Context: 1048576
vertex gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-thinking-exp-01-21 0.00 0.00 Source: vertex, Context: 1048576
vertex gemini-2.0-pro-exp-02-05 gemini-2.0-pro-exp-02-05 1.25 10.00 Source: vertex, Context: 2097152
vertex gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-image gemini-2.5-flash-image 0.30 2.50 Source: vertex, Context: 32768
vertex gemini-2.5-flash-image-preview gemini-2.5-flash-image-preview 0.30 30.00 Source: vertex, Context: 1048576
vertex gemini-3-pro-image-preview gemini-3-pro-image-preview 2.00 12.00 Source: vertex, Context: 65536
vertex gemini-2.5-flash-lite gemini-2.5-flash-lite 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-lite-preview-09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-live-2.5-flash-preview-native-audio-09-2025 gemini-live-2.5-flash-preview-native-audio-09-2025 0.30 2.00 Source: vertex, Context: 1048576
gemini gemini-live-2.5-flash-preview-native-audio-09-2025 gemini-live-2.5-flash-preview-native-audio-09-2025 0.30 2.00 Source: gemini, Context: 1048576
vertex gemini-2.5-flash-lite-preview-06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Source: vertex, Context: 1048576
vertex gemini-2.5-flash-preview-05-20 gemini-2.5-flash-preview-05-20 0.30 2.50 Source: vertex, Context: 1048576
vertex gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-3-pro-preview gemini-3-pro-preview 2.00 12.00 Source: vertex, Context: 1048576
vertex gemini-3-flash-preview gemini-3-flash-preview 0.50 3.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-exp-03-25 gemini-2.5-pro-exp-03-25 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-03-25 gemini-2.5-pro-preview-03-25 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-2.5-pro-preview-tts gemini-2.5-pro-preview-tts 1.25 10.00 Source: vertex, Context: 1048576
vertex gemini-embedding-001 gemini-embedding-001 0.15 0.00 Source: vertex, Context: 2048
vertex gemini-flash-experimental gemini-flash-experimental 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-pro gemini-pro 0.50 1.50 Source: vertex, Context: 32760
vertex gemini-pro-experimental gemini-pro-experimental 0.00 0.00 Source: vertex, Context: 1000000
vertex gemini-pro-vision gemini-pro-vision 0.50 1.50 Source: vertex, Context: 16384
gemini gemini-embedding-001 gemini-embedding-001 0.15 0.00 Source: gemini, Context: 2048
gemini gemini-1.5-flash gemini-1.5-flash 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-001 gemini-1.5-flash-001 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-002 gemini-1.5-flash-002 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-8b gemini-1.5-flash-8b 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-8b-exp-0827 gemini-1.5-flash-8b-exp-0827 0.00 0.00 Source: gemini, Context: 1000000
gemini gemini-1.5-flash-8b-exp-0924 gemini-1.5-flash-8b-exp-0924 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-exp-0827 gemini-1.5-flash-exp-0827 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-1.5-flash-latest gemini-1.5-flash-latest 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-1.5-pro gemini-1.5-pro 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-001 gemini-1.5-pro-001 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-002 gemini-1.5-pro-002 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-exp-0801 gemini-1.5-pro-exp-0801 3.50 10.50 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-exp-0827 gemini-1.5-pro-exp-0827 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-1.5-pro-latest gemini-1.5-pro-latest 3.50 10.50 Source: gemini, Context: 1048576
gemini gemini-2.0-flash gemini-2.0-flash 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-001 gemini-2.0-flash-001 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-exp gemini-2.0-flash-exp 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-lite gemini-2.0-flash-lite 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-lite-preview-02-05 gemini-2.0-flash-lite-preview-02-05 0.08 0.30 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-live-001 gemini-2.0-flash-live-001 0.35 1.50 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-thinking-exp gemini-2.0-flash-thinking-exp 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-flash-thinking-exp-01-21 gemini-2.0-flash-thinking-exp-01-21 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.0-pro-exp-02-05 gemini-2.0-pro-exp-02-05 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-2.5-flash gemini-2.5-flash 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-image-preview gemini-2.5-flash-image-preview 0.30 30.00 Source: gemini, Context: 1048576
gemini gemini-3-pro-image-preview gemini-3-pro-image-preview 2.00 12.00 Source: gemini, Context: 65536
gemini gemini-2.5-flash-lite gemini-2.5-flash-lite 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-lite-preview-09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-flash-latest gemini-flash-latest 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-flash-lite-latest gemini-flash-lite-latest 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-lite-preview-06-17 gemini-2.5-flash-lite-preview-06-17 0.10 0.40 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-04-17 gemini-2.5-flash-preview-04-17 0.15 0.60 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-05-20 gemini-2.5-flash-preview-05-20 0.30 2.50 Source: gemini, Context: 1048576
gemini gemini-2.5-flash-preview-tts gemini-2.5-flash-preview-tts 0.15 0.60 Source: gemini, Context: 1048576
gemini gemini-2.5-pro gemini-2.5-pro 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-computer-use-preview-10-2025 gemini-2.5-computer-use-preview-10-2025 1.25 10.00 Source: gemini, Context: 128000
gemini gemini-3-pro-preview gemini-3-pro-preview 2.00 12.00 Source: gemini, Context: 1048576
gemini gemini-3-flash-preview gemini-3-flash-preview 0.50 3.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-exp-03-25 gemini-2.5-pro-exp-03-25 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-03-25 gemini-2.5-pro-preview-03-25 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-06-05 gemini-2.5-pro-preview-06-05 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-2.5-pro-preview-tts gemini-2.5-pro-preview-tts 1.25 10.00 Source: gemini, Context: 1048576
gemini gemini-exp-1114 gemini-exp-1114 0.00 0.00 Source: gemini, Context: 1048576
gemini gemini-exp-1206 gemini-exp-1206 0.00 0.00 Source: gemini, Context: 2097152
gemini gemini-gemma-2-27b-it gemini-gemma-2-27b-it 0.35 1.05 Source: gemini, Context: 8192
gemini gemini-gemma-2-9b-it gemini-gemma-2-9b-it 0.35 1.05 Source: gemini, Context: 8192
gemini gemini-pro gemini-pro 0.35 1.05 Source: gemini, Context: 32760
gemini gemini-pro-vision gemini-pro-vision 0.35 1.05 Source: gemini, Context: 30720
gemini gemma-3-27b-it gemma-3-27b-it 0.00 0.00 Source: gemini, Context: 131072
gemini imagen-3.0-fast-generate-001 imagen-3.0-fast-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-3.0-generate-001 imagen-3.0-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-3.0-generate-002 imagen-3.0-generate-002 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-fast-generate-001 imagen-4.0-fast-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-generate-001 imagen-4.0-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: gemini, Context: N/A
gemini learnlm-1.5-pro-experimental learnlm-1.5-pro-experimental 0.00 0.00 Source: gemini, Context: 32767
gemini veo-2.0-generate-001 veo-2.0-generate-001 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.0-fast-generate-preview veo-3.0-fast-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.0-generate-preview veo-3.0-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-fast-generate-preview veo-3.1-fast-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-generate-preview veo-3.1-generate-preview 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-fast-generate-001 veo-3.1-fast-generate-001 0.00 0.00 Source: gemini, Context: 1024
gemini veo-3.1-generate-001 veo-3.1-generate-001 0.00 0.00 Source: gemini, Context: 1024
githubcopilot gpt-3.5-turbo gpt-3.5-turbo 0.00 0.00 Source: github_copilot, Context: 16384
githubcopilot gpt-3.5-turbo-0613 gpt-3.5-turbo-0613 0.00 0.00 Source: github_copilot, Context: 16384
githubcopilot gpt-4 gpt-4 0.00 0.00 Source: github_copilot, Context: 32768
githubcopilot gpt-4-0613 gpt-4-0613 0.00 0.00 Source: github_copilot, Context: 32768
githubcopilot gpt-4-o-preview gpt-4-o-preview 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 0.00 0.00 Source: github_copilot, Context: 128000
githubcopilot gpt-41-copilot gpt-41-copilot 0.00 0.00 Source: github_copilot, Context: N/A
githubcopilot gpt-4o-2024-05-13 gpt-4o-2024-05-13 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-2024-08-06 gpt-4o-2024-08-06 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-2024-11-20 gpt-4o-2024-11-20 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-mini gpt-4o-mini 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.00 0.00 Source: github_copilot, Context: 64000
githubcopilot text-embedding-3-small text-embedding-3-small 0.00 0.00 Source: github_copilot, Context: 8191
githubcopilot text-embedding-3-small-inference text-embedding-3-small-inference 0.00 0.00 Source: github_copilot, Context: 8191
githubcopilot text-embedding-ada-002 text-embedding-ada-002 0.00 0.00 Source: github_copilot, Context: 8191
bedrockconverse google.gemma-3-12b-it google.gemma-3-12b-it 0.09 0.29 Source: bedrock_converse, Context: 128000
bedrockconverse google.gemma-3-27b-it google.gemma-3-27b-it 0.23 0.38 Source: bedrock_converse, Context: 128000
bedrockconverse google.gemma-3-4b-it google.gemma-3-4b-it 0.04 0.08 Source: bedrock_converse, Context: 128000
googlepse search search 0.00 0.00 Source: google_pse, Context: N/A
bedrockconverse global.anthropic.claude-sonnet-4-5-20250929-v1:0 global.anthropic.claude-sonnet-4-5-20250929-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.anthropic.claude-sonnet-4-20250514-v1:0 global.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse global.anthropic.claude-haiku-4-5-20251001-v1:0 global.anthropic.claude-haiku-4-5-20251001-v1:0 1.00 5.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.amazon.nova-2-lite-v1:0 global.amazon.nova-2-lite-v1:0 0.30 2.50 Source: bedrock_converse, Context: 1000000
openai gpt-3.5-turbo-0125 gpt-3.5-turbo-0125 0.50 1.50 Source: openai, Context: 16385
openai gpt-3.5-turbo-0301 gpt-3.5-turbo-0301 1.50 2.00 Source: openai, Context: 4097
openai gpt-3.5-turbo-0613 gpt-3.5-turbo-0613 1.50 2.00 Source: openai, Context: 4097
openai gpt-3.5-turbo-1106 gpt-3.5-turbo-1106 1.00 2.00 Source: openai, Context: 16385
openai gpt-3.5-turbo-16k gpt-3.5-turbo-16k 3.00 4.00 Source: openai, Context: 16385
openai gpt-3.5-turbo-16k-0613 gpt-3.5-turbo-16k-0613 3.00 4.00 Source: openai, Context: 16385
textcompletionopenai gpt-3.5-turbo-instruct gpt-3.5-turbo-instruct 1.50 2.00 Source: text-completion-openai, Context: 8192
textcompletionopenai gpt-3.5-turbo-instruct-0914 gpt-3.5-turbo-instruct-0914 1.50 2.00 Source: text-completion-openai, Context: 8192
openai gpt-4-0125-preview gpt-4-0125-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-0314 gpt-4-0314 30.00 60.00 Source: openai, Context: 8192
openai gpt-4-0613 gpt-4-0613 30.00 60.00 Source: openai, Context: 8192
openai gpt-4-1106-preview gpt-4-1106-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-1106-vision-preview gpt-4-1106-vision-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-32k gpt-4-32k 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-32k-0314 gpt-4-32k-0314 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-32k-0613 gpt-4-32k-0613 60.00 120.00 Source: openai, Context: 32768
openai gpt-4-turbo-2024-04-09 gpt-4-turbo-2024-04-09 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-turbo-preview gpt-4-turbo-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4-vision-preview gpt-4-vision-preview 10.00 30.00 Source: openai, Context: 128000
openai gpt-4.1-2025-04-14 gpt-4.1-2025-04-14 2.00 8.00 Source: openai, Context: 1047576
openai gpt-4.1-mini-2025-04-14 gpt-4.1-mini-2025-04-14 0.40 1.60 Source: openai, Context: 1047576
openai gpt-4.1-nano-2025-04-14 gpt-4.1-nano-2025-04-14 0.10 0.40 Source: openai, Context: 1047576
openai gpt-4.5-preview gpt-4.5-preview 75.00 150.00 Source: openai, Context: 128000
openai gpt-4.5-preview-2025-02-27 gpt-4.5-preview-2025-02-27 75.00 150.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview gpt-4o-audio-preview 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2024-10-01 gpt-4o-audio-preview-2024-10-01 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2024-12-17 gpt-4o-audio-preview-2024-12-17 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-audio-preview-2025-06-03 gpt-4o-audio-preview-2025-06-03 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-mini-2024-07-18 gpt-4o-mini-2024-07-18 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-audio-preview gpt-4o-mini-audio-preview 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-audio-preview-2024-12-17 gpt-4o-mini-audio-preview-2024-12-17 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-realtime-preview gpt-4o-mini-realtime-preview 0.60 2.40 Source: openai, Context: 128000
openai gpt-4o-mini-realtime-preview-2024-12-17 gpt-4o-mini-realtime-preview-2024-12-17 0.60 2.40 Source: openai, Context: 128000
openai gpt-4o-mini-search-preview gpt-4o-mini-search-preview 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-search-preview-2025-03-11 gpt-4o-mini-search-preview-2025-03-11 0.15 0.60 Source: openai, Context: 128000
openai gpt-4o-mini-transcribe gpt-4o-mini-transcribe 1.25 5.00 Source: openai, Context: 16000
openai gpt-4o-mini-tts gpt-4o-mini-tts 2.50 10.00 Source: openai, Context: N/A
openai gpt-4o-realtime-preview gpt-4o-realtime-preview 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2024-10-01 gpt-4o-realtime-preview-2024-10-01 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2024-12-17 gpt-4o-realtime-preview-2024-12-17 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-realtime-preview-2025-06-03 gpt-4o-realtime-preview-2025-06-03 5.00 20.00 Source: openai, Context: 128000
openai gpt-4o-search-preview gpt-4o-search-preview 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-search-preview-2025-03-11 gpt-4o-search-preview-2025-03-11 2.50 10.00 Source: openai, Context: 128000
openai gpt-4o-transcribe gpt-4o-transcribe 2.50 10.00 Source: openai, Context: 16000
openai gpt-image-1.5 gpt-image-1.5 5.00 10.00 Source: openai, Context: N/A
openai gpt-image-1.5-2025-12-16 gpt-image-1.5-2025-12-16 5.00 10.00 Source: openai, Context: N/A
openai gpt-5.1-2025-11-13 gpt-5.1-2025-11-13 1.25 10.00 Source: openai, Context: 272000
openai gpt-5.2-2025-12-11 gpt-5.2-2025-12-11 1.75 14.00 Source: openai, Context: 400000
openai gpt-5.2-pro-2025-12-11 gpt-5.2-pro-2025-12-11 21.00 168.00 Source: openai, Context: 400000
openai gpt-5-pro-2025-10-06 gpt-5-pro-2025-10-06 15.00 120.00 Source: openai, Context: 400000
openai gpt-5-2025-08-07 gpt-5-2025-08-07 1.25 10.00 Source: openai, Context: 272000
openai gpt-5-chat gpt-5-chat 1.25 10.00 Source: openai, Context: 272000
openai gpt-5-mini-2025-08-07 gpt-5-mini-2025-08-07 0.25 2.00 Source: openai, Context: 272000
openai gpt-5-nano-2025-08-07 gpt-5-nano-2025-08-07 0.05 0.40 Source: openai, Context: 272000
openai gpt-image-1 gpt-image-1 5.00 0.00 Source: openai, Context: N/A
openai gpt-image-1-mini gpt-image-1-mini 2.00 0.00 Source: openai, Context: N/A
openai gpt-realtime gpt-realtime 4.00 16.00 Source: openai, Context: 32000
openai gpt-realtime-mini gpt-realtime-mini 0.60 2.40 Source: openai, Context: 128000
openai gpt-realtime-2025-08-28 gpt-realtime-2025-08-28 4.00 16.00 Source: openai, Context: 32000
gradientai alibaba-qwen3-32b alibaba-qwen3-32b 0.00 0.00 Source: gradient_ai, Context: 2048
gradientai anthropic-claude-3-opus anthropic-claude-3-opus 15.00 75.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.5-haiku anthropic-claude-3.5-haiku 0.80 4.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.5-sonnet anthropic-claude-3.5-sonnet 3.00 15.00 Source: gradient_ai, Context: 1024
gradientai anthropic-claude-3.7-sonnet anthropic-claude-3.7-sonnet 3.00 15.00 Source: gradient_ai, Context: 1024
gradientai deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.99 0.99 Source: gradient_ai, Context: 8000
gradientai llama3-8b-instruct llama3-8b-instruct 0.20 0.20 Source: gradient_ai, Context: 512
gradientai llama3.3-70b-instruct llama3.3-70b-instruct 0.65 0.65 Source: gradient_ai, Context: 2048
gradientai mistral-nemo-instruct-2407 mistral-nemo-instruct-2407 0.30 0.30 Source: gradient_ai, Context: 512
gradientai openai-gpt-4o openai-gpt-4o 0.00 0.00 Source: gradient_ai, Context: 16384
gradientai openai-gpt-4o-mini openai-gpt-4o-mini 0.00 0.00 Source: gradient_ai, Context: 16384
gradientai openai-o3 openai-o3 2.00 8.00 Source: gradient_ai, Context: 100000
gradientai openai-o3-mini openai-o3-mini 1.10 4.40 Source: gradient_ai, Context: 100000
lemonade Qwen3-Coder-30B-A3B-Instruct-GGUF qwen3-coder-30b-a3b-instruct-gguf 0.00 0.00 Source: lemonade, Context: 262144
lemonade gpt-oss-20b-mxfp4-GGUF gpt-oss-20b-mxfp4-gguf 0.00 0.00 Source: lemonade, Context: 131072
lemonade gpt-oss-120b-mxfp4-GGUF gpt-oss-120b-mxfp4-gguf 0.00 0.00 Source: lemonade, Context: 131072
lemonade Gemma-3-4b-it-GGUF gemma-3-4b-it-gguf 0.00 0.00 Source: lemonade, Context: 128000
lemonade Qwen3-4B-Instruct-2507-GGUF qwen3-4b-instruct-2507-gguf 0.00 0.00 Source: lemonade, Context: 262144
amazonnova nova-micro-v1 nova-micro-v1 0.04 0.14 Source: amazon_nova, Context: 128000
amazonnova nova-lite-v1 nova-lite-v1 0.06 0.24 Source: amazon_nova, Context: 300000
amazonnova nova-premier-v1 nova-premier-v1 2.50 12.50 Source: amazon_nova, Context: 1000000
amazonnova nova-pro-v1 nova-pro-v1 0.80 3.20 Source: amazon_nova, Context: 300000
groq gemma-7b-it gemma-7b-it 0.05 0.08 Source: groq, Context: 8192
groq playai-tts playai-tts 0.00 0.00 Source: groq, Context: 10000
groq whisper-large-v3 whisper-large-v3 0.00 0.00 Source: groq, Context: N/A
groq whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Source: groq, Context: N/A
openai dall-e-3 dall-e-3 0.00 0.00 Source: openai, Context: N/A
heroku claude-3-5-haiku claude-3-5-haiku 0.00 0.00 Source: heroku, Context: 4096
heroku claude-3-5-sonnet-latest claude-3-5-sonnet-latest 0.00 0.00 Source: heroku, Context: 8192
heroku claude-3-7-sonnet claude-3-7-sonnet 0.00 0.00 Source: heroku, Context: 8192
heroku claude-4-sonnet claude-4-sonnet 0.00 0.00 Source: heroku, Context: 8192
hyperbolic Hermes-3-Llama-3.1-70B hermes-3-llama-3.1-70b 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic QwQ-32B qwq-32b 0.20 0.20 Source: hyperbolic, Context: 131072
hyperbolic Qwen2.5-72B-Instruct qwen2.5-72b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Qwen3-235B-A22B qwen3-235b-a22b 2.00 2.00 Source: hyperbolic, Context: 131072
hyperbolic DeepSeek-R1 deepseek-r1 0.40 0.40 Source: hyperbolic, Context: 32768
hyperbolic DeepSeek-R1-0528 deepseek-r1-0528 0.25 0.25 Source: hyperbolic, Context: 131072
hyperbolic DeepSeek-V3 deepseek-v3 0.20 0.20 Source: hyperbolic, Context: 32768
hyperbolic DeepSeek-V3-0324 deepseek-v3-0324 0.40 0.40 Source: hyperbolic, Context: 32768
hyperbolic Llama-3.2-3B-Instruct llama-3.2-3b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Meta-Llama-3-70B-Instruct meta-llama-3-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 131072
hyperbolic Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Meta-Llama-3.1-70B-Instruct meta-llama-3.1-70b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.12 0.30 Source: hyperbolic, Context: 32768
hyperbolic Kimi-K2-Instruct kimi-k2-instruct 2.00 2.00 Source: hyperbolic, Context: 131072
ai21 j2-light j2-light 3.00 3.00 Source: ai21, Context: 8192
ai21 j2-mid j2-mid 10.00 10.00 Source: ai21, Context: 8192
ai21 j2-ultra j2-ultra 15.00 15.00 Source: ai21, Context: 8192
ai21 jamba-1.5 jamba-1.5 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-1.5-large jamba-1.5-large 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-1.5-large@001 jamba-1.5-large@001 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-1.5-mini jamba-1.5-mini 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-1.5-mini@001 jamba-1.5-mini@001 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-large-1.6 jamba-large-1.6 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-large-1.7 jamba-large-1.7 2.00 8.00 Source: ai21, Context: 256000
ai21 jamba-mini-1.6 jamba-mini-1.6 0.20 0.40 Source: ai21, Context: 256000
ai21 jamba-mini-1.7 jamba-mini-1.7 0.20 0.40 Source: ai21, Context: 256000
jinaai jina-reranker-v2-base-multilingual jina-reranker-v2-base-multilingual 0.02 0.02 Source: jina_ai, Context: 1024
bedrockconverse jp.anthropic.claude-sonnet-4-5-20250929-v1:0 jp.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrockconverse jp.anthropic.claude-haiku-4-5-20251001-v1:0 jp.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
lambdaai deepseek-llama3.3-70b deepseek-llama3.3-70b 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai deepseek-r1-0528 deepseek-r1-0528 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai deepseek-r1-671b deepseek-r1-671b 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai deepseek-v3-0324 deepseek-v3-0324 0.20 0.60 Source: lambda_ai, Context: 131072
lambdaai hermes3-405b hermes3-405b 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai hermes3-70b hermes3-70b 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai hermes3-8b hermes3-8b 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai lfm-40b lfm-40b 0.10 0.20 Source: lambda_ai, Context: 131072
lambdaai lfm-7b lfm-7b 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai llama-4-maverick-17b-128e-instruct-fp8 llama-4-maverick-17b-128e-instruct-fp8 0.05 0.10 Source: lambda_ai, Context: 131072
lambdaai llama-4-scout-17b-16e-instruct llama-4-scout-17b-16e-instruct 0.05 0.10 Source: lambda_ai, Context: 16384
lambdaai llama3.1-405b-instruct-fp8 llama3.1-405b-instruct-fp8 0.80 0.80 Source: lambda_ai, Context: 131072
lambdaai llama3.1-70b-instruct-fp8 llama3.1-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai llama3.1-8b-instruct llama3.1-8b-instruct 0.03 0.04 Source: lambda_ai, Context: 131072
lambdaai llama3.1-nemotron-70b-instruct-fp8 llama3.1-nemotron-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai llama3.2-11b-vision-instruct llama3.2-11b-vision-instruct 0.02 0.03 Source: lambda_ai, Context: 131072
lambdaai llama3.2-3b-instruct llama3.2-3b-instruct 0.02 0.03 Source: lambda_ai, Context: 131072
lambdaai llama3.3-70b-instruct-fp8 llama3.3-70b-instruct-fp8 0.12 0.30 Source: lambda_ai, Context: 131072
lambdaai qwen25-coder-32b-instruct qwen25-coder-32b-instruct 0.05 0.10 Source: lambda_ai, Context: 131072
lambdaai qwen3-32b-fp8 qwen3-32b-fp8 0.05 0.10 Source: lambda_ai, Context: 131072
alephalpha luminous-base luminous-base 30.00 33.00 Source: aleph_alpha, Context: 2048
alephalpha luminous-base-control luminous-base-control 37.50 41.25 Source: aleph_alpha, Context: 2048
alephalpha luminous-extended luminous-extended 45.00 49.50 Source: aleph_alpha, Context: 2048
alephalpha luminous-extended-control luminous-extended-control 56.25 61.88 Source: aleph_alpha, Context: 2048
alephalpha luminous-supreme luminous-supreme 175.00 192.50 Source: aleph_alpha, Context: 2048
alephalpha luminous-supreme-control luminous-supreme-control 218.75 240.63 Source: aleph_alpha, Context: 2048
vertex medlm-large medlm-large 0.00 0.00 Source: vertex, Context: 8192
vertex medlm-medium medlm-medium 0.00 0.00 Source: vertex, Context: 32768
bedrock meta.llama2-13b-chat-v1 meta.llama2-13b-chat-v1 0.75 1.00 Source: bedrock, Context: 4096
bedrock meta.llama2-70b-chat-v1 meta.llama2-70b-chat-v1 1.95 2.56 Source: bedrock, Context: 4096
bedrock meta.llama3-1-405b-instruct-v1:0 meta.llama3-1-405b-instruct-v1:0 5.32 16.00 Source: bedrock, Context: 128000
bedrock meta.llama3-1-70b-instruct-v1:0 meta.llama3-1-70b-instruct-v1:0 0.99 0.99 Source: bedrock, Context: 128000
bedrock meta.llama3-1-8b-instruct-v1:0 meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Source: bedrock, Context: 128000
bedrock meta.llama3-2-11b-instruct-v1:0 meta.llama3-2-11b-instruct-v1:0 0.35 0.35 Source: bedrock, Context: 128000
bedrock meta.llama3-2-1b-instruct-v1:0 meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Source: bedrock, Context: 128000
bedrock meta.llama3-2-3b-instruct-v1:0 meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Source: bedrock, Context: 128000
bedrock meta.llama3-2-90b-instruct-v1:0 meta.llama3-2-90b-instruct-v1:0 2.00 2.00 Source: bedrock, Context: 128000
bedrockconverse meta.llama3-3-70b-instruct-v1:0 meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Source: bedrock_converse, Context: 128000
bedrockconverse meta.llama4-maverick-17b-instruct-v1:0 meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Source: bedrock_converse, Context: 128000
bedrockconverse meta.llama4-scout-17b-instruct-v1:0 meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Source: bedrock_converse, Context: 128000
metallama Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.00 0.00 Source: meta_llama, Context: 128000
metallama Llama-3.3-8B-Instruct llama-3.3-8b-instruct 0.00 0.00 Source: meta_llama, Context: 128000
metallama Llama-4-Maverick-17B-128E-Instruct-FP8 llama-4-maverick-17b-128e-instruct-fp8 0.00 0.00 Source: meta_llama, Context: 1000000
metallama Llama-4-Scout-17B-16E-Instruct-FP8 llama-4-scout-17b-16e-instruct-fp8 0.00 0.00 Source: meta_llama, Context: 10000000
bedrockconverse minimax.minimax-m2 minimax.minimax-m2 0.30 1.20 Source: bedrock_converse, Context: 128000
minimax speech-02-hd speech-02-hd 0.00 0.00 Source: minimax, Context: N/A
minimax speech-02-turbo speech-02-turbo 0.00 0.00 Source: minimax, Context: N/A
minimax speech-2.6-hd speech-2.6-hd 0.00 0.00 Source: minimax, Context: N/A
minimax speech-2.6-turbo speech-2.6-turbo 0.00 0.00 Source: minimax, Context: N/A
minimax MiniMax-M2.1-lightning minimax-m2.1-lightning 0.30 2.40 Source: minimax, Context: 1000000
bedrockconverse mistral.magistral-small-2509 mistral.magistral-small-2509 0.50 1.50 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-14b-instruct mistral.ministral-3-14b-instruct 0.20 0.20 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-3b-instruct mistral.ministral-3-3b-instruct 0.10 0.10 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.ministral-3-8b-instruct mistral.ministral-3-8b-instruct 0.15 0.15 Source: bedrock_converse, Context: 128000
bedrock mistral.mistral-large-2407-v1:0 mistral.mistral-large-2407-v1:0 3.00 9.00 Source: bedrock, Context: 128000
bedrockconverse mistral.mistral-large-3-675b-instruct mistral.mistral-large-3-675b-instruct 0.50 1.50 Source: bedrock_converse, Context: 128000
bedrock mistral.mistral-small-2402-v1:0 mistral.mistral-small-2402-v1:0 1.00 3.00 Source: bedrock, Context: 32000
bedrockconverse mistral.voxtral-mini-3b-2507 mistral.voxtral-mini-3b-2507 0.04 0.04 Source: bedrock_converse, Context: 128000
bedrockconverse mistral.voxtral-small-24b-2507 mistral.voxtral-small-24b-2507 0.10 0.30 Source: bedrock_converse, Context: 128000
mistral codestral-2405 codestral-2405 1.00 3.00 Source: mistral, Context: 32000
mistral codestral-2508 codestral-2508 0.30 0.90 Source: mistral, Context: 256000
mistral codestral-mamba-latest codestral-mamba-latest 0.25 0.25 Source: mistral, Context: 256000
mistral magistral-medium-2506 magistral-medium-2506 2.00 5.00 Source: mistral, Context: 40000
mistral magistral-medium-2509 magistral-medium-2509 2.00 5.00 Source: mistral, Context: 40000
mistral mistral-ocr-latest mistral-ocr-latest 0.00 0.00 Source: mistral, Context: N/A
mistral mistral-ocr-2505-completion mistral-ocr-2505-completion 0.00 0.00 Source: mistral, Context: N/A
mistral magistral-small-2506 magistral-small-2506 0.50 1.50 Source: mistral, Context: 40000
mistral magistral-small-latest magistral-small-latest 0.50 1.50 Source: mistral, Context: 40000
mistral codestral-embed codestral-embed 0.15 0.00 Source: mistral, Context: 8192
mistral codestral-embed-2505 codestral-embed-2505 0.15 0.00 Source: mistral, Context: 8192
mistral mistral-large-2402 mistral-large-2402 4.00 12.00 Source: mistral, Context: 32000
mistral mistral-large-2407 mistral-large-2407 3.00 9.00 Source: mistral, Context: 128000
mistral mistral-large-3 mistral-large-3 0.50 1.50 Source: mistral, Context: 256000
mistral mistral-medium mistral-medium 2.70 8.10 Source: mistral, Context: 32000
mistral mistral-medium-2312 mistral-medium-2312 2.70 8.10 Source: mistral, Context: 32000
mistral mistral-small mistral-small 0.10 0.30 Source: mistral, Context: 32000
mistral mistral-tiny mistral-tiny 0.25 0.25 Source: mistral, Context: 32000
mistral open-codestral-mamba open-codestral-mamba 0.25 0.25 Source: mistral, Context: 256000
mistral open-mistral-nemo open-mistral-nemo 0.30 0.30 Source: mistral, Context: 128000
mistral open-mistral-nemo-2407 open-mistral-nemo-2407 0.30 0.30 Source: mistral, Context: 128000
mistral pixtral-12b-2409 pixtral-12b-2409 0.15 0.15 Source: mistral, Context: 128000
mistral pixtral-large-2411 pixtral-large-2411 2.00 6.00 Source: mistral, Context: 128000
bedrockconverse moonshot.kimi-k2-thinking moonshot.kimi-k2-thinking 0.60 2.50 Source: bedrock_converse, Context: 128000
moonshot kimi-k2-0711-preview kimi-k2-0711-preview 0.60 2.50 Source: moonshot, Context: 131072
moonshot kimi-k2-0905-preview kimi-k2-0905-preview 0.60 2.50 Source: moonshot, Context: 262144
moonshot kimi-k2-turbo-preview kimi-k2-turbo-preview 1.15 8.00 Source: moonshot, Context: 262144
moonshot kimi-latest kimi-latest 2.00 5.00 Source: moonshot, Context: 131072
moonshot kimi-latest-128k kimi-latest-128k 2.00 5.00 Source: moonshot, Context: 131072
moonshot kimi-latest-32k kimi-latest-32k 1.00 3.00 Source: moonshot, Context: 32768
moonshot kimi-latest-8k kimi-latest-8k 0.20 2.00 Source: moonshot, Context: 8192
moonshot kimi-thinking-preview kimi-thinking-preview 0.60 2.50 Source: moonshot, Context: 131072
moonshot kimi-k2-thinking kimi-k2-thinking 0.60 2.50 Source: moonshot, Context: 262144
moonshot kimi-k2-thinking-turbo kimi-k2-thinking-turbo 1.15 8.00 Source: moonshot, Context: 262144
moonshot moonshot-v1-128k moonshot-v1-128k 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-128k-0430 moonshot-v1-128k-0430 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-128k-vision-preview moonshot-v1-128k-vision-preview 2.00 5.00 Source: moonshot, Context: 131072
moonshot moonshot-v1-32k moonshot-v1-32k 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-32k-0430 moonshot-v1-32k-0430 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-32k-vision-preview moonshot-v1-32k-vision-preview 1.00 3.00 Source: moonshot, Context: 32768
moonshot moonshot-v1-8k moonshot-v1-8k 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-8k-0430 moonshot-v1-8k-0430 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-8k-vision-preview moonshot-v1-8k-vision-preview 0.20 2.00 Source: moonshot, Context: 8192
moonshot moonshot-v1-auto moonshot-v1-auto 2.00 5.00 Source: moonshot, Context: 131072
vertex multimodalembedding multimodalembedding 0.80 0.00 Source: vertex, Context: 2048
vertex multimodalembedding@001 multimodalembedding@001 0.80 0.00 Source: vertex, Context: 2048
nscale QwQ-32B qwq-32b 0.18 0.20 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-32B-Instruct qwen2.5-coder-32b-instruct 0.06 0.20 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-3B-Instruct qwen2.5-coder-3b-instruct 0.01 0.03 Source: nscale, Context: N/A
nscale Qwen2.5-Coder-7B-Instruct qwen2.5-coder-7b-instruct 0.01 0.03 Source: nscale, Context: N/A
nscale FLUX.1-schnell flux.1-schnell 0.00 0.00 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.38 0.38 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Llama-8B deepseek-r1-distill-llama-8b 0.03 0.03 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-1.5B deepseek-r1-distill-qwen-1.5b 0.09 0.09 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-14B deepseek-r1-distill-qwen-14b 0.07 0.07 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-32B deepseek-r1-distill-qwen-32b 0.15 0.15 Source: nscale, Context: N/A
nscale DeepSeek-R1-Distill-Qwen-7B deepseek-r1-distill-qwen-7b 0.20 0.20 Source: nscale, Context: N/A
nscale Llama-3.1-8B-Instruct llama-3.1-8b-instruct 0.03 0.03 Source: nscale, Context: N/A
nscale Llama-3.3-70B-Instruct llama-3.3-70b-instruct 0.20 0.20 Source: nscale, Context: N/A
nscale Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.09 0.29 Source: nscale, Context: N/A
nscale mixtral-8x22b-instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.60 0.60 Source: nscale, Context: N/A
nscale stable-diffusion-xl-base-1.0 stable-diffusion-xl-base-1.0 0.00 0.00 Source: nscale, Context: N/A
bedrockconverse nvidia.nemotron-nano-12b-v2 nvidia.nemotron-nano-12b-v2 0.20 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse nvidia.nemotron-nano-9b-v2 nvidia.nemotron-nano-9b-v2 0.06 0.23 Source: bedrock_converse, Context: 128000
openai o1-2024-12-17 o1-2024-12-17 15.00 60.00 Source: openai, Context: 200000
openai o1-mini-2024-09-12 o1-mini-2024-09-12 3.00 12.00 Source: openai, Context: 128000
openai o1-preview-2024-09-12 o1-preview-2024-09-12 15.00 60.00 Source: openai, Context: 128000
openai o1-pro-2025-03-19 o1-pro-2025-03-19 150.00 600.00 Source: openai, Context: 200000
openai o3-2025-04-16 o3-2025-04-16 2.00 8.00 Source: openai, Context: 200000
openai o3-deep-research-2025-06-26 o3-deep-research-2025-06-26 10.00 40.00 Source: openai, Context: 200000
openai o3-mini-2025-01-31 o3-mini-2025-01-31 1.10 4.40 Source: openai, Context: 200000
openai o3-pro-2025-06-10 o3-pro-2025-06-10 20.00 80.00 Source: openai, Context: 200000
openai o4-mini-2025-04-16 o4-mini-2025-04-16 1.10 4.40 Source: openai, Context: 200000
openai o4-mini-deep-research-2025-06-26 o4-mini-deep-research-2025-06-26 2.00 8.00 Source: openai, Context: 200000
oci meta.llama-3.1-405b-instruct meta.llama-3.1-405b-instruct 10.68 10.68 Source: oci, Context: 128000
oci meta.llama-3.2-90b-vision-instruct meta.llama-3.2-90b-vision-instruct 2.00 2.00 Source: oci, Context: 128000
oci meta.llama-3.3-70b-instruct meta.llama-3.3-70b-instruct 0.72 0.72 Source: oci, Context: 128000
oci meta.llama-4-maverick-17b-128e-instruct-fp8 meta.llama-4-maverick-17b-128e-instruct-fp8 0.72 0.72 Source: oci, Context: 512000
oci meta.llama-4-scout-17b-16e-instruct meta.llama-4-scout-17b-16e-instruct 0.72 0.72 Source: oci, Context: 192000
oci xai.grok-3 xai.grok-3 3.00 15.00 Source: oci, Context: 131072
oci xai.grok-3-fast xai.grok-3-fast 5.00 25.00 Source: oci, Context: 131072
oci xai.grok-3-mini xai.grok-3-mini 0.30 0.50 Source: oci, Context: 131072
oci xai.grok-3-mini-fast xai.grok-3-mini-fast 0.60 4.00 Source: oci, Context: 131072
oci xai.grok-4 xai.grok-4 3.00 15.00 Source: oci, Context: 128000
oci cohere.command-latest cohere.command-latest 1.56 1.56 Source: oci, Context: 128000
oci cohere.command-a-03-2025 cohere.command-a-03-2025 1.56 1.56 Source: oci, Context: 256000
oci cohere.command-plus-latest cohere.command-plus-latest 1.56 1.56 Source: oci, Context: 128000
ollama codegeex4 codegeex4 0.00 0.00 Source: ollama, Context: 32768
ollama codegemma codegemma 0.00 0.00 Source: ollama, Context: 8192
ollama codellama codellama 0.00 0.00 Source: ollama, Context: 4096
ollama deepseek-coder-v2-base deepseek-coder-v2-base 0.00 0.00 Source: ollama, Context: 8192
ollama deepseek-coder-v2-instruct deepseek-coder-v2-instruct 0.00 0.00 Source: ollama, Context: 32768
ollama deepseek-coder-v2-lite-base deepseek-coder-v2-lite-base 0.00 0.00 Source: ollama, Context: 8192
ollama deepseek-coder-v2-lite-instruct deepseek-coder-v2-lite-instruct 0.00 0.00 Source: ollama, Context: 32768
ollama deepseek-v3.1:671b-cloud deepseek-v3.1:671b-cloud 0.00 0.00 Source: ollama, Context: 163840
ollama gpt-oss:120b-cloud gpt-oss:120b-cloud 0.00 0.00 Source: ollama, Context: 131072
ollama gpt-oss:20b-cloud gpt-oss:20b-cloud 0.00 0.00 Source: ollama, Context: 131072
ollama internlm2_5-20b-chat internlm2_5-20b-chat 0.00 0.00 Source: ollama, Context: 32768
ollama llama2 llama2 0.00 0.00 Source: ollama, Context: 4096
ollama llama2-uncensored llama2-uncensored 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:13b llama2:13b 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:70b llama2:70b 0.00 0.00 Source: ollama, Context: 4096
ollama llama2:7b llama2:7b 0.00 0.00 Source: ollama, Context: 4096
ollama llama3 llama3 0.00 0.00 Source: ollama, Context: 8192
ollama llama3.1 llama3.1 0.00 0.00 Source: ollama, Context: 8192
ollama llama3:70b llama3:70b 0.00 0.00 Source: ollama, Context: 8192
ollama llama3:8b llama3:8b 0.00 0.00 Source: ollama, Context: 8192
ollama mistral mistral 0.00 0.00 Source: ollama, Context: 8192
ollama mistral-7B-Instruct-v0.1 mistral-7b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 8192
ollama mistral-7B-Instruct-v0.2 mistral-7b-instruct-v0.2 0.00 0.00 Source: ollama, Context: 32768
ollama mistral-large-instruct-2407 mistral-large-instruct-2407 0.00 0.00 Source: ollama, Context: 65536
ollama mixtral-8x22B-Instruct-v0.1 mixtral-8x22b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 65536
ollama mixtral-8x7B-Instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.00 0.00 Source: ollama, Context: 32768
ollama orca-mini orca-mini 0.00 0.00 Source: ollama, Context: 4096
ollama qwen3-coder:480b-cloud qwen3-coder:480b-cloud 0.00 0.00 Source: ollama, Context: 262144
ollama vicuna vicuna 0.00 0.00 Source: ollama, Context: 2048
openai omni-moderation-2024-09-26 omni-moderation-2024-09-26 0.00 0.00 Source: openai, Context: 32768
openai omni-moderation-latest omni-moderation-latest 0.00 0.00 Source: openai, Context: 32768
openai omni-moderation-latest-intents omni-moderation-latest-intents 0.00 0.00 Source: openai, Context: 32768
bedrockconverse openai.gpt-oss-120b-1:0 openai.gpt-oss-120b-1:0 0.15 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-20b-1:0 openai.gpt-oss-20b-1:0 0.07 0.30 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-safeguard-120b openai.gpt-oss-safeguard-120b 0.15 0.60 Source: bedrock_converse, Context: 128000
bedrockconverse openai.gpt-oss-safeguard-20b openai.gpt-oss-safeguard-20b 0.07 0.20 Source: bedrock_converse, Context: 128000
ovhcloud llava-v1.6-mistral-7b-hf llava-v1.6-mistral-7b-hf 0.29 0.29 Source: ovhcloud, Context: 32000
ovhcloud mamba-codestral-7B-v0.1 mamba-codestral-7b-v0.1 0.19 0.19 Source: ovhcloud, Context: 256000
palm chat-bison chat-bison 0.13 0.13 Source: palm, Context: 8192
palm chat-bison-001 chat-bison-001 0.13 0.13 Source: palm, Context: 8192
palm text-bison text-bison 0.13 0.13 Source: palm, Context: 8192
palm text-bison-001 text-bison-001 0.13 0.13 Source: palm, Context: 8192
palm text-bison-safety-off text-bison-safety-off 0.13 0.13 Source: palm, Context: 8192
palm text-bison-safety-recitation-off text-bison-safety-recitation-off 0.13 0.13 Source: palm, Context: 8192
parallelai search search 0.00 0.00 Source: parallel_ai, Context: N/A
parallelai search-pro search-pro 0.00 0.00 Source: parallel_ai, Context: N/A
perplexity codellama-34b-instruct codellama-34b-instruct 0.35 1.40 Source: perplexity, Context: 16384
perplexity codellama-70b-instruct codellama-70b-instruct 0.70 2.80 Source: perplexity, Context: 16384
perplexity llama-2-70b-chat llama-2-70b-chat 0.70 2.80 Source: perplexity, Context: 4096
perplexity llama-3.1-70b-instruct llama-3.1-70b-instruct 1.00 1.00 Source: perplexity, Context: 131072
perplexity llama-3.1-8b-instruct llama-3.1-8b-instruct 0.20 0.20 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-huge-128k-online llama-3.1-sonar-huge-128k-online 5.00 5.00 Source: perplexity, Context: 127072
perplexity llama-3.1-sonar-large-128k-chat llama-3.1-sonar-large-128k-chat 1.00 1.00 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-large-128k-online llama-3.1-sonar-large-128k-online 1.00 1.00 Source: perplexity, Context: 127072
perplexity llama-3.1-sonar-small-128k-chat llama-3.1-sonar-small-128k-chat 0.20 0.20 Source: perplexity, Context: 131072
perplexity llama-3.1-sonar-small-128k-online llama-3.1-sonar-small-128k-online 0.20 0.20 Source: perplexity, Context: 127072
perplexity mistral-7b-instruct mistral-7b-instruct 0.07 0.28 Source: perplexity, Context: 4096
perplexity mixtral-8x7b-instruct mixtral-8x7b-instruct 0.07 0.28 Source: perplexity, Context: 4096
perplexity pplx-70b-chat pplx-70b-chat 0.70 2.80 Source: perplexity, Context: 4096
perplexity pplx-70b-online pplx-70b-online 0.00 2.80 Source: perplexity, Context: 4096
perplexity pplx-7b-chat pplx-7b-chat 0.07 0.28 Source: perplexity, Context: 8192
perplexity pplx-7b-online pplx-7b-online 0.00 0.28 Source: perplexity, Context: 4096
perplexity sonar-deep-research sonar-deep-research 2.00 8.00 Source: perplexity, Context: 128000
perplexity sonar-medium-chat sonar-medium-chat 0.60 1.80 Source: perplexity, Context: 16384
perplexity sonar-medium-online sonar-medium-online 0.00 1.80 Source: perplexity, Context: 12000
perplexity sonar-reasoning sonar-reasoning 1.00 5.00 Source: perplexity, Context: 128000
perplexity sonar-small-chat sonar-small-chat 0.07 0.28 Source: perplexity, Context: 16384
perplexity sonar-small-online sonar-small-online 0.00 0.28 Source: perplexity, Context: 12000
publicai apertus-8b-instruct apertus-8b-instruct 0.00 0.00 Source: publicai, Context: 8192
publicai apertus-70b-instruct apertus-70b-instruct 0.00 0.00 Source: publicai, Context: 8192
publicai Gemma-SEA-LION-v4-27B-IT gemma-sea-lion-v4-27b-it 0.00 0.00 Source: publicai, Context: 8192
publicai salamandra-7b-instruct-tools-16k salamandra-7b-instruct-tools-16k 0.00 0.00 Source: publicai, Context: 16384
publicai ALIA-40b-instruct_Q8_0 alia-40b-instruct_q8_0 0.00 0.00 Source: publicai, Context: 8192
publicai Olmo-3-7B-Instruct olmo-3-7b-instruct 0.00 0.00 Source: publicai, Context: 32768
publicai Qwen-SEA-LION-v4-32B-IT qwen-sea-lion-v4-32b-it 0.00 0.00 Source: publicai, Context: 32768
publicai Olmo-3-7B-Think olmo-3-7b-think 0.00 0.00 Source: publicai, Context: 32768
publicai Olmo-3-32B-Think olmo-3-32b-think 0.00 0.00 Source: publicai, Context: 32768
bedrockconverse qwen.qwen3-coder-480b-a35b-v1:0 qwen.qwen3-coder-480b-a35b-v1:0 0.22 1.80 Source: bedrock_converse, Context: 262000
bedrockconverse qwen.qwen3-235b-a22b-2507-v1:0 qwen.qwen3-235b-a22b-2507-v1:0 0.22 0.88 Source: bedrock_converse, Context: 262144
bedrockconverse qwen.qwen3-coder-30b-a3b-v1:0 qwen.qwen3-coder-30b-a3b-v1:0 0.15 0.60 Source: bedrock_converse, Context: 262144
bedrockconverse qwen.qwen3-32b-v1:0 qwen.qwen3-32b-v1:0 0.15 0.60 Source: bedrock_converse, Context: 131072
bedrockconverse qwen.qwen3-next-80b-a3b qwen.qwen3-next-80b-a3b 0.15 1.20 Source: bedrock_converse, Context: 128000
bedrockconverse qwen.qwen3-vl-235b-a22b qwen.qwen3-vl-235b-a22b 0.53 2.66 Source: bedrock_converse, Context: 128000
recraft recraftv2 recraftv2 0.00 0.00 Source: recraft, Context: N/A
recraft recraftv3 recraftv3 0.00 0.00 Source: recraft, Context: N/A
replicate llama-2-13b llama-2-13b 0.10 0.50 Source: replicate, Context: 4096
replicate llama-2-13b-chat llama-2-13b-chat 0.10 0.50 Source: replicate, Context: 4096
replicate llama-2-70b llama-2-70b 0.65 2.75 Source: replicate, Context: 4096
replicate llama-2-70b-chat llama-2-70b-chat 0.65 2.75 Source: replicate, Context: 4096
replicate llama-2-7b llama-2-7b 0.05 0.25 Source: replicate, Context: 4096
replicate llama-2-7b-chat llama-2-7b-chat 0.05 0.25 Source: replicate, Context: 4096
replicate llama-3-70b llama-3-70b 0.65 2.75 Source: replicate, Context: 8192
replicate llama-3-70b-instruct llama-3-70b-instruct 0.65 2.75 Source: replicate, Context: 8192
replicate llama-3-8b llama-3-8b 0.05 0.25 Source: replicate, Context: 8086
replicate llama-3-8b-instruct llama-3-8b-instruct 0.05 0.25 Source: replicate, Context: 8086
replicate mistral-7b-instruct-v0.2 mistral-7b-instruct-v0.2 0.05 0.25 Source: replicate, Context: 4096
replicate mistral-7b-v0.1 mistral-7b-v0.1 0.05 0.25 Source: replicate, Context: 4096
replicate mixtral-8x7b-instruct-v0.1 mixtral-8x7b-instruct-v0.1 0.30 1.00 Source: replicate, Context: 4096
cohere rerank-english-v2.0 rerank-english-v2.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-english-v3.0 rerank-english-v3.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-multilingual-v2.0 rerank-multilingual-v2.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-multilingual-v3.0 rerank-multilingual-v3.0 0.00 0.00 Source: cohere, Context: 4096
cohere rerank-v3.5 rerank-v3.5 0.00 0.00 Source: cohere, Context: 4096
nvidianim nv-rerankqa-mistral-4b-v3 nv-rerankqa-mistral-4b-v3 0.00 0.00 Source: nvidia_nim, Context: N/A
nvidianim llama-3_2-nv-rerankqa-1b-v2 llama-3_2-nv-rerankqa-1b-v2 0.00 0.00 Source: nvidia_nim, Context: N/A
nvidianim llama-3.2-nv-rerankqa-1b-v2 llama-3.2-nv-rerankqa-1b-v2 0.00 0.00 Source: nvidia_nim, Context: N/A
sagemaker meta-textgeneration-llama-2-13b meta-textgeneration-llama-2-13b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-13b-f meta-textgeneration-llama-2-13b-f 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-70b meta-textgeneration-llama-2-70b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-70b-b-f meta-textgeneration-llama-2-70b-b-f 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-7b meta-textgeneration-llama-2-7b 0.00 0.00 Source: sagemaker, Context: 4096
sagemaker meta-textgeneration-llama-2-7b-f meta-textgeneration-llama-2-7b-f 0.00 0.00 Source: sagemaker, Context: 4096
sambanova DeepSeek-R1 deepseek-r1 5.00 7.00 Source: sambanova, Context: 32768
sambanova DeepSeek-R1-Distill-Llama-70B deepseek-r1-distill-llama-70b 0.70 1.40 Source: sambanova, Context: 131072
sambanova DeepSeek-V3-0324 deepseek-v3-0324 3.00 4.50 Source: sambanova, Context: 32768
sambanova Llama-4-Maverick-17B-128E-Instruct llama-4-maverick-17b-128e-instruct 0.63 1.80 Source: sambanova, Context: 131072
sambanova Llama-4-Scout-17B-16E-Instruct llama-4-scout-17b-16e-instruct 0.40 0.70 Source: sambanova, Context: 8192
sambanova Meta-Llama-3.1-405B-Instruct meta-llama-3.1-405b-instruct 5.00 10.00 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.1-8B-Instruct meta-llama-3.1-8b-instruct 0.10 0.20 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.2-1B-Instruct meta-llama-3.2-1b-instruct 0.04 0.08 Source: sambanova, Context: 16384
sambanova Meta-Llama-3.2-3B-Instruct meta-llama-3.2-3b-instruct 0.08 0.16 Source: sambanova, Context: 4096
sambanova Meta-Llama-3.3-70B-Instruct meta-llama-3.3-70b-instruct 0.60 1.20 Source: sambanova, Context: 131072
sambanova Meta-Llama-Guard-3-8B meta-llama-guard-3-8b 0.30 0.30 Source: sambanova, Context: 16384
sambanova QwQ-32B qwq-32b 0.50 1.00 Source: sambanova, Context: 16384
sambanova Qwen2-Audio-7B-Instruct qwen2-audio-7b-instruct 0.50 100.00 Source: sambanova, Context: 4096
sambanova Qwen3-32B qwen3-32b 0.40 0.80 Source: sambanova, Context: 8192
sambanova DeepSeek-V3.1 deepseek-v3.1 3.00 4.50 Source: sambanova, Context: 32768
sambanova gpt-oss-120b gpt-oss-120b 3.00 4.50 Source: sambanova, Context: 131072
snowflake claude-3-5-sonnet claude-3-5-sonnet 0.00 0.00 Source: snowflake, Context: 18000
snowflake deepseek-r1 deepseek-r1 0.00 0.00 Source: snowflake, Context: 32768
snowflake gemma-7b gemma-7b 0.00 0.00 Source: snowflake, Context: 8000
snowflake jamba-1.5-large jamba-1.5-large 0.00 0.00 Source: snowflake, Context: 256000
snowflake jamba-1.5-mini jamba-1.5-mini 0.00 0.00 Source: snowflake, Context: 256000
snowflake jamba-instruct jamba-instruct 0.00 0.00 Source: snowflake, Context: 256000
snowflake llama2-70b-chat llama2-70b-chat 0.00 0.00 Source: snowflake, Context: 4096
snowflake llama3-70b llama3-70b 0.00 0.00 Source: snowflake, Context: 8000
snowflake llama3-8b llama3-8b 0.00 0.00 Source: snowflake, Context: 8000
snowflake llama3.1-405b llama3.1-405b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.1-70b llama3.1-70b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.1-8b llama3.1-8b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.2-1b llama3.2-1b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.2-3b llama3.2-3b 0.00 0.00 Source: snowflake, Context: 128000
snowflake llama3.3-70b llama3.3-70b 0.00 0.00 Source: snowflake, Context: 128000
snowflake mistral-7b mistral-7b 0.00 0.00 Source: snowflake, Context: 32000
snowflake mistral-large mistral-large 0.00 0.00 Source: snowflake, Context: 32000
snowflake mistral-large2 mistral-large2 0.00 0.00 Source: snowflake, Context: 128000
snowflake mixtral-8x7b mixtral-8x7b 0.00 0.00 Source: snowflake, Context: 32000
snowflake reka-core reka-core 0.00 0.00 Source: snowflake, Context: 32000
snowflake reka-flash reka-flash 0.00 0.00 Source: snowflake, Context: 100000
snowflake snowflake-arctic snowflake-arctic 0.00 0.00 Source: snowflake, Context: 4096
snowflake snowflake-llama-3.1-405b snowflake-llama-3.1-405b 0.00 0.00 Source: snowflake, Context: 8000
snowflake snowflake-llama-3.3-70b snowflake-llama-3.3-70b 0.00 0.00 Source: snowflake, Context: 8000
stability sd3 sd3 0.00 0.00 Source: stability, Context: N/A
stability sd3-large sd3-large 0.00 0.00 Source: stability, Context: N/A
stability sd3-large-turbo sd3-large-turbo 0.00 0.00 Source: stability, Context: N/A
stability sd3-medium sd3-medium 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-large sd3.5-large 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-large-turbo sd3.5-large-turbo 0.00 0.00 Source: stability, Context: N/A
stability sd3.5-medium sd3.5-medium 0.00 0.00 Source: stability, Context: N/A
stability stable-image-ultra stable-image-ultra 0.00 0.00 Source: stability, Context: N/A
stability inpaint inpaint 0.00 0.00 Source: stability, Context: N/A
stability outpaint outpaint 0.00 0.00 Source: stability, Context: N/A
stability erase erase 0.00 0.00 Source: stability, Context: N/A
stability search-and-replace search-and-replace 0.00 0.00 Source: stability, Context: N/A
stability search-and-recolor search-and-recolor 0.00 0.00 Source: stability, Context: N/A
stability remove-background remove-background 0.00 0.00 Source: stability, Context: N/A
stability replace-background-and-relight replace-background-and-relight 0.00 0.00 Source: stability, Context: N/A
stability sketch sketch 0.00 0.00 Source: stability, Context: N/A
stability structure structure 0.00 0.00 Source: stability, Context: N/A
stability style style 0.00 0.00 Source: stability, Context: N/A
stability style-transfer style-transfer 0.00 0.00 Source: stability, Context: N/A
stability fast fast 0.00 0.00 Source: stability, Context: N/A
stability conservative conservative 0.00 0.00 Source: stability, Context: N/A
stability creative creative 0.00 0.00 Source: stability, Context: N/A
stability stable-image-core stable-image-core 0.00 0.00 Source: stability, Context: N/A
bedrock stability.sd3-5-large-v1:0 stability.sd3-5-large-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.sd3-large-v1:0 stability.sd3-large-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-core-v1:0 stability.stable-image-core-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-conservative-upscale-v1:0 stability.stable-conservative-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-creative-upscale-v1:0 stability.stable-creative-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-fast-upscale-v1:0 stability.stable-fast-upscale-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-outpaint-v1:0 stability.stable-outpaint-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-control-sketch-v1:0 stability.stable-image-control-sketch-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-control-structure-v1:0 stability.stable-image-control-structure-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-erase-object-v1:0 stability.stable-image-erase-object-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-inpaint-v1:0 stability.stable-image-inpaint-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-remove-background-v1:0 stability.stable-image-remove-background-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-search-recolor-v1:0 stability.stable-image-search-recolor-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-search-replace-v1:0 stability.stable-image-search-replace-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-style-guide-v1:0 stability.stable-image-style-guide-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-style-transfer-v1:0 stability.stable-style-transfer-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-core-v1:1 stability.stable-image-core-v1:1 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-ultra-v1:0 stability.stable-image-ultra-v1:0 0.00 0.00 Source: bedrock, Context: 77
bedrock stability.stable-image-ultra-v1:1 stability.stable-image-ultra-v1:1 0.00 0.00 Source: bedrock, Context: 77
linkup search search 0.00 0.00 Source: linkup, Context: N/A
linkup search-deep search-deep 0.00 0.00 Source: linkup, Context: N/A
tavily search search 0.00 0.00 Source: tavily, Context: N/A
tavily search-advanced search-advanced 0.00 0.00 Source: tavily, Context: N/A
vertex text-bison text-bison 0.00 0.00 Source: vertex, Context: 8192
vertex text-bison32k text-bison32k 0.13 0.13 Source: vertex, Context: 8192
vertex text-bison32k@002 text-bison32k@002 0.13 0.13 Source: vertex, Context: 8192
vertex text-bison@001 text-bison@001 0.00 0.00 Source: vertex, Context: 8192
vertex text-bison@002 text-bison@002 0.00 0.00 Source: vertex, Context: 8192
textcompletioncodestral codestral-2405 codestral-2405 0.00 0.00 Source: text-completion-codestral, Context: 32000
textcompletioncodestral codestral-latest codestral-latest 0.00 0.00 Source: text-completion-codestral, Context: 32000
vertex text-embedding-004 text-embedding-004 0.10 0.00 Source: vertex, Context: 2048
vertex text-embedding-005 text-embedding-005 0.10 0.00 Source: vertex, Context: 2048
openai text-embedding-ada-002-v2 text-embedding-ada-002-v2 0.10 0.00 Source: openai, Context: 8191
vertex text-embedding-large-exp-03-07 text-embedding-large-exp-03-07 0.10 0.00 Source: vertex, Context: 8192
vertex text-embedding-preview-0409 text-embedding-preview-0409 0.01 0.00 Source: vertex, Context: 3072
openai text-moderation-007 text-moderation-007 0.00 0.00 Source: openai, Context: 32768
openai text-moderation-latest text-moderation-latest 0.00 0.00 Source: openai, Context: 32768
openai text-moderation-stable text-moderation-stable 0.00 0.00 Source: openai, Context: 32768
vertex text-multilingual-embedding-002 text-multilingual-embedding-002 0.10 0.00 Source: vertex, Context: 2048
vertex text-multilingual-embedding-preview-0409 text-multilingual-embedding-preview-0409 0.01 0.00 Source: vertex, Context: 3072
vertex text-unicorn text-unicorn 10.00 28.00 Source: vertex, Context: 8192
vertex text-unicorn@001 text-unicorn@001 10.00 28.00 Source: vertex, Context: 8192
vertex textembedding-gecko textembedding-gecko 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko-multilingual textembedding-gecko-multilingual 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko-multilingual@001 textembedding-gecko-multilingual@001 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko@001 textembedding-gecko@001 0.10 0.00 Source: vertex, Context: 3072
vertex textembedding-gecko@003 textembedding-gecko@003 0.10 0.00 Source: vertex, Context: 3072
openai tts-1 tts-1 0.00 0.00 Source: openai, Context: N/A
openai tts-1-hd tts-1-hd 0.00 0.00 Source: openai, Context: N/A
awspolly standard standard 0.00 0.00 Source: aws_polly, Context: N/A
awspolly neural neural 0.00 0.00 Source: aws_polly, Context: N/A
awspolly long-form long-form 0.00 0.00 Source: aws_polly, Context: N/A
awspolly generative generative 0.00 0.00 Source: aws_polly, Context: N/A
bedrockconverse us.amazon.nova-lite-v1:0 us.amazon.nova-lite-v1:0 0.06 0.24 Source: bedrock_converse, Context: 300000
bedrockconverse us.amazon.nova-micro-v1:0 us.amazon.nova-micro-v1:0 0.04 0.14 Source: bedrock_converse, Context: 128000
bedrockconverse us.amazon.nova-premier-v1:0 us.amazon.nova-premier-v1:0 2.50 12.50 Source: bedrock_converse, Context: 1000000
bedrockconverse us.amazon.nova-pro-v1:0 us.amazon.nova-pro-v1:0 0.80 3.20 Source: bedrock_converse, Context: 300000
bedrockconverse us.anthropic.claude-haiku-4-5-20251001-v1:0 us.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrock us.anthropic.claude-3-5-sonnet-20240620-v1:0 us.anthropic.claude-3-5-sonnet-20240620-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-5-sonnet-20241022-v2:0 us.anthropic.claude-3-5-sonnet-20241022-v2:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse us.anthropic.claude-3-7-sonnet-20250219-v1:0 us.anthropic.claude-3-7-sonnet-20250219-v1:0 3.00 15.00 Source: bedrock_converse, Context: 200000
bedrock us.anthropic.claude-3-haiku-20240307-v1:0 us.anthropic.claude-3-haiku-20240307-v1:0 0.25 1.25 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-opus-20240229-v1:0 us.anthropic.claude-3-opus-20240229-v1:0 15.00 75.00 Source: bedrock, Context: 200000
bedrock us.anthropic.claude-3-sonnet-20240229-v1:0 us.anthropic.claude-3-sonnet-20240229-v1:0 3.00 15.00 Source: bedrock, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-1-20250805-v1:0 us.anthropic.claude-opus-4-1-20250805-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-sonnet-4-5-20250929-v1:0 us.anthropic.claude-sonnet-4-5-20250929-v1:0 3.30 16.50 Source: bedrock_converse, Context: 200000
bedrockconverse au.anthropic.claude-haiku-4-5-20251001-v1:0 au.anthropic.claude-haiku-4-5-20251001-v1:0 1.10 5.50 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-20250514-v1:0 us.anthropic.claude-opus-4-20250514-v1:0 15.00 75.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-opus-4-5-20251101-v1:0 us.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse global.anthropic.claude-opus-4-5-20251101-v1:0 global.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse eu.anthropic.claude-opus-4-5-20251101-v1:0 eu.anthropic.claude-opus-4-5-20251101-v1:0 5.00 25.00 Source: bedrock_converse, Context: 200000
bedrockconverse us.anthropic.claude-sonnet-4-20250514-v1:0 us.anthropic.claude-sonnet-4-20250514-v1:0 3.00 15.00 Source: bedrock_converse, Context: 1000000
bedrockconverse us.deepseek.r1-v1:0 us.deepseek.r1-v1:0 1.35 5.40 Source: bedrock_converse, Context: 128000
bedrock us.meta.llama3-1-405b-instruct-v1:0 us.meta.llama3-1-405b-instruct-v1:0 5.32 16.00 Source: bedrock, Context: 128000
bedrock us.meta.llama3-1-70b-instruct-v1:0 us.meta.llama3-1-70b-instruct-v1:0 0.99 0.99 Source: bedrock, Context: 128000
bedrock us.meta.llama3-1-8b-instruct-v1:0 us.meta.llama3-1-8b-instruct-v1:0 0.22 0.22 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-11b-instruct-v1:0 us.meta.llama3-2-11b-instruct-v1:0 0.35 0.35 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-1b-instruct-v1:0 us.meta.llama3-2-1b-instruct-v1:0 0.10 0.10 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-3b-instruct-v1:0 us.meta.llama3-2-3b-instruct-v1:0 0.15 0.15 Source: bedrock, Context: 128000
bedrock us.meta.llama3-2-90b-instruct-v1:0 us.meta.llama3-2-90b-instruct-v1:0 2.00 2.00 Source: bedrock, Context: 128000
bedrockconverse us.meta.llama3-3-70b-instruct-v1:0 us.meta.llama3-3-70b-instruct-v1:0 0.72 0.72 Source: bedrock_converse, Context: 128000
bedrockconverse us.meta.llama4-maverick-17b-instruct-v1:0 us.meta.llama4-maverick-17b-instruct-v1:0 0.24 0.97 Source: bedrock_converse, Context: 128000
bedrockconverse us.meta.llama4-scout-17b-instruct-v1:0 us.meta.llama4-scout-17b-instruct-v1:0 0.17 0.66 Source: bedrock_converse, Context: 128000
bedrockconverse us.mistral.pixtral-large-2502-v1:0 us.mistral.pixtral-large-2502-v1:0 2.00 6.00 Source: bedrock_converse, Context: 128000
vercel claude-4-opus claude-4-opus 15.00 75.00 Source: vercel, Context: 200000
vercel claude-4-sonnet claude-4-sonnet 3.00 15.00 Source: vercel, Context: 200000
vercel command-r command-r 0.15 0.60 Source: vercel, Context: 128000
vercel command-r-plus command-r-plus 2.50 10.00 Source: vercel, Context: 128000
vercel deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.75 0.99 Source: vercel, Context: 131072
vercel gemma-2-9b gemma-2-9b 0.20 0.20 Source: vercel, Context: 8192
vercel llama-3-70b llama-3-70b 0.59 0.79 Source: vercel, Context: 8192
vercel llama-3-8b llama-3-8b 0.05 0.08 Source: vercel, Context: 8192
vercel mistral-large mistral-large 2.00 6.00 Source: vercel, Context: 32000
vercel mistral-saba-24b mistral-saba-24b 0.79 0.79 Source: vercel, Context: 32768
vertex chirp chirp 0.00 0.00 Source: vertex, Context: N/A
vertex claude-3-5-haiku claude-3-5-haiku 1.00 5.00 Source: vertex, Context: 200000
vertex claude-3-5-haiku@20241022 claude-3-5-haiku@20241022 1.00 5.00 Source: vertex, Context: 200000
vertex claude-haiku-4-5@20251001 claude-haiku-4-5@20251001 1.00 5.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet claude-3-5-sonnet 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet-v2 claude-3-5-sonnet-v2 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet-v2@20241022 claude-3-5-sonnet-v2@20241022 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-5-sonnet@20240620 claude-3-5-sonnet@20240620 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-7-sonnet@20250219 claude-3-7-sonnet@20250219 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-haiku claude-3-haiku 0.25 1.25 Source: vertex, Context: 200000
vertex claude-3-haiku@20240307 claude-3-haiku@20240307 0.25 1.25 Source: vertex, Context: 200000
vertex claude-3-opus claude-3-opus 15.00 75.00 Source: vertex, Context: 200000
vertex claude-3-opus@20240229 claude-3-opus@20240229 15.00 75.00 Source: vertex, Context: 200000
vertex claude-3-sonnet claude-3-sonnet 3.00 15.00 Source: vertex, Context: 200000
vertex claude-3-sonnet@20240229 claude-3-sonnet@20240229 3.00 15.00 Source: vertex, Context: 200000
vertex claude-opus-4 claude-opus-4 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-1 claude-opus-4-1 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-1@20250805 claude-opus-4-1@20250805 15.00 75.00 Source: vertex, Context: 200000
vertex claude-opus-4-5 claude-opus-4-5 5.00 25.00 Source: vertex, Context: 200000
vertex claude-opus-4-5@20251101 claude-opus-4-5@20251101 5.00 25.00 Source: vertex, Context: 200000
vertex claude-sonnet-4-5 claude-sonnet-4-5 3.00 15.00 Source: vertex, Context: 200000
vertex claude-sonnet-4-5@20250929 claude-sonnet-4-5@20250929 3.00 15.00 Source: vertex, Context: 200000
vertex claude-opus-4@20250514 claude-opus-4@20250514 15.00 75.00 Source: vertex, Context: 200000
vertex claude-sonnet-4 claude-sonnet-4 3.00 15.00 Source: vertex, Context: 1000000
vertex claude-sonnet-4@20250514 claude-sonnet-4@20250514 3.00 15.00 Source: vertex, Context: 1000000
vertex codestral-2@001 codestral-2@001 0.30 0.90 Source: vertex, Context: 128000
vertex codestral-2 codestral-2 0.30 0.90 Source: vertex, Context: 128000
vertex codestral-2501 codestral-2501 0.20 0.60 Source: vertex, Context: 128000
vertex codestral@2405 codestral@2405 0.20 0.60 Source: vertex, Context: 128000
vertex codestral@latest codestral@latest 0.20 0.60 Source: vertex, Context: 128000
vertex deepseek-v3.1-maas deepseek-v3.1-maas 1.35 5.40 Source: vertex, Context: 163840
vertex deepseek-v3.2-maas deepseek-v3.2-maas 0.56 1.68 Source: vertex, Context: 163840
vertex deepseek-r1-0528-maas deepseek-r1-0528-maas 1.35 5.40 Source: vertex, Context: 65536
vertex imagegeneration@006 imagegeneration@006 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-fast-generate-001 imagen-3.0-fast-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-generate-001 imagen-3.0-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-generate-002 imagen-3.0-generate-002 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-3.0-capability-001 imagen-3.0-capability-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-fast-generate-001 imagen-4.0-fast-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-generate-001 imagen-4.0-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex imagen-4.0-ultra-generate-001 imagen-4.0-ultra-generate-001 0.00 0.00 Source: vertex, Context: N/A
vertex jamba-1.5 jamba-1.5 0.20 0.40 Source: vertex, Context: 256000
vertex jamba-1.5-large jamba-1.5-large 2.00 8.00 Source: vertex, Context: 256000
vertex jamba-1.5-large@001 jamba-1.5-large@001 2.00 8.00 Source: vertex, Context: 256000
vertex jamba-1.5-mini jamba-1.5-mini 0.20 0.40 Source: vertex, Context: 256000
vertex jamba-1.5-mini@001 jamba-1.5-mini@001 0.20 0.40 Source: vertex, Context: 256000
vertex llama-3.1-405b-instruct-maas llama-3.1-405b-instruct-maas 5.00 16.00 Source: vertex, Context: 128000
vertex llama-3.1-70b-instruct-maas llama-3.1-70b-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-3.1-8b-instruct-maas llama-3.1-8b-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-3.2-90b-vision-instruct-maas llama-3.2-90b-vision-instruct-maas 0.00 0.00 Source: vertex, Context: 128000
vertex llama-4-maverick-17b-128e-instruct-maas llama-4-maverick-17b-128e-instruct-maas 0.35 1.15 Source: vertex, Context: 1000000
vertex llama-4-maverick-17b-16e-instruct-maas llama-4-maverick-17b-16e-instruct-maas 0.35 1.15 Source: vertex, Context: 1000000
vertex llama-4-scout-17b-128e-instruct-maas llama-4-scout-17b-128e-instruct-maas 0.25 0.70 Source: vertex, Context: 10000000
vertex llama-4-scout-17b-16e-instruct-maas llama-4-scout-17b-16e-instruct-maas 0.25 0.70 Source: vertex, Context: 10000000
vertex llama3-405b-instruct-maas llama3-405b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex llama3-70b-instruct-maas llama3-70b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex llama3-8b-instruct-maas llama3-8b-instruct-maas 0.00 0.00 Source: vertex, Context: 32000
vertex minimax-m2-maas minimax-m2-maas 0.30 1.20 Source: vertex, Context: 196608
vertex kimi-k2-thinking-maas kimi-k2-thinking-maas 0.60 2.50 Source: vertex, Context: 256000
vertex mistral-medium-3 mistral-medium-3 0.40 2.00 Source: vertex, Context: 128000
vertex mistral-medium-3@001 mistral-medium-3@001 0.40 2.00 Source: vertex, Context: 128000
vertex mistral-large-2411 mistral-large-2411 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@2407 mistral-large@2407 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@2411-001 mistral-large@2411-001 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-large@latest mistral-large@latest 2.00 6.00 Source: vertex, Context: 128000
vertex mistral-nemo@2407 mistral-nemo@2407 3.00 3.00 Source: vertex, Context: 128000
vertex mistral-nemo@latest mistral-nemo@latest 0.15 0.15 Source: vertex, Context: 128000
vertex mistral-small-2503 mistral-small-2503 1.00 3.00 Source: vertex, Context: 128000
vertex mistral-small-2503@001 mistral-small-2503@001 1.00 3.00 Source: vertex, Context: 32000
vertex mistral-ocr-2505 mistral-ocr-2505 0.00 0.00 Source: vertex, Context: N/A
vertex deepseek-ocr-maas deepseek-ocr-maas 0.30 1.20 Source: vertex, Context: N/A
vertex gpt-oss-120b-maas gpt-oss-120b-maas 0.15 0.60 Source: vertex, Context: 131072
vertex gpt-oss-20b-maas gpt-oss-20b-maas 0.08 0.30 Source: vertex, Context: 131072
vertex qwen3-235b-a22b-instruct-2507-maas qwen3-235b-a22b-instruct-2507-maas 0.25 1.00 Source: vertex, Context: 262144
vertex qwen3-coder-480b-a35b-instruct-maas qwen3-coder-480b-a35b-instruct-maas 1.00 4.00 Source: vertex, Context: 262144
vertex qwen3-next-80b-a3b-instruct-maas qwen3-next-80b-a3b-instruct-maas 0.15 1.20 Source: vertex, Context: 262144
vertex qwen3-next-80b-a3b-thinking-maas qwen3-next-80b-a3b-thinking-maas 0.15 1.20 Source: vertex, Context: 262144
vertex veo-2.0-generate-001 veo-2.0-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-fast-generate-preview veo-3.0-fast-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-generate-preview veo-3.0-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-fast-generate-001 veo-3.0-fast-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.0-generate-001 veo-3.0-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-generate-preview veo-3.1-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-fast-generate-preview veo-3.1-fast-generate-preview 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-generate-001 veo-3.1-generate-001 0.00 0.00 Source: vertex, Context: 1024
vertex veo-3.1-fast-generate-001 veo-3.1-fast-generate-001 0.00 0.00 Source: vertex, Context: 1024
voyage rerank-2 rerank-2 0.05 0.00 Source: voyage, Context: 16000
voyage rerank-2-lite rerank-2-lite 0.02 0.00 Source: voyage, Context: 8000
voyage rerank-2.5 rerank-2.5 0.05 0.00 Source: voyage, Context: 32000
voyage rerank-2.5-lite rerank-2.5-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-2 voyage-2 0.10 0.00 Source: voyage, Context: 4000
voyage voyage-3 voyage-3 0.06 0.00 Source: voyage, Context: 32000
voyage voyage-3-large voyage-3-large 0.18 0.00 Source: voyage, Context: 32000
voyage voyage-3-lite voyage-3-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-3.5 voyage-3.5 0.06 0.00 Source: voyage, Context: 32000
voyage voyage-3.5-lite voyage-3.5-lite 0.02 0.00 Source: voyage, Context: 32000
voyage voyage-code-2 voyage-code-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-code-3 voyage-code-3 0.18 0.00 Source: voyage, Context: 32000
voyage voyage-context-3 voyage-context-3 0.18 0.00 Source: voyage, Context: 120000
voyage voyage-finance-2 voyage-finance-2 0.12 0.00 Source: voyage, Context: 32000
voyage voyage-large-2 voyage-large-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-law-2 voyage-law-2 0.12 0.00 Source: voyage, Context: 16000
voyage voyage-lite-01 voyage-lite-01 0.10 0.00 Source: voyage, Context: 4096
voyage voyage-lite-02-instruct voyage-lite-02-instruct 0.10 0.00 Source: voyage, Context: 4000
voyage voyage-multimodal-3 voyage-multimodal-3 0.12 0.00 Source: voyage, Context: 32000
wandb gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: wandb, Context: 131072
wandb gpt-oss-20b gpt-oss-20b 0.05 0.20 Source: wandb, Context: 131072
wandb GLM-4.5 glm-4.5 0.55 2.00 Source: wandb, Context: 131072
wandb DeepSeek-V3.1 deepseek-v3.1 0.55 1.65 Source: wandb, Context: 128000
watsonx granite-3-8b-instruct granite-3-8b-instruct 0.20 0.20 Source: watsonx, Context: 8192
watsonx mistral-large mistral-large 3.00 10.00 Source: watsonx, Context: 131072
watsonx mt0-xxl-13b mt0-xxl-13b 500.00 2000.00 Source: watsonx, Context: 8192
watsonx jais-13b-chat jais-13b-chat 500.00 2000.00 Source: watsonx, Context: 8192
watsonx flan-t5-xl-3b flan-t5-xl-3b 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-13b-chat-v2 granite-13b-chat-v2 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-13b-instruct-v2 granite-13b-instruct-v2 0.60 0.60 Source: watsonx, Context: 8192
watsonx granite-3-3-8b-instruct granite-3-3-8b-instruct 0.20 0.20 Source: watsonx, Context: 8192
watsonx granite-4-h-small granite-4-h-small 0.06 0.25 Source: watsonx, Context: 20480
watsonx granite-guardian-3-2-2b granite-guardian-3-2-2b 0.10 0.10 Source: watsonx, Context: 8192
watsonx granite-guardian-3-3-8b granite-guardian-3-3-8b 0.20 0.20 Source: watsonx, Context: 8192
watsonx granite-ttm-1024-96-r2 granite-ttm-1024-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-ttm-1536-96-r2 granite-ttm-1536-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-ttm-512-96-r2 granite-ttm-512-96-r2 0.38 0.38 Source: watsonx, Context: 512
watsonx granite-vision-3-2-2b granite-vision-3-2-2b 0.10 0.10 Source: watsonx, Context: 8192
watsonx llama-3-2-11b-vision-instruct llama-3-2-11b-vision-instruct 0.35 0.35 Source: watsonx, Context: 128000
watsonx llama-3-2-1b-instruct llama-3-2-1b-instruct 0.10 0.10 Source: watsonx, Context: 128000
watsonx llama-3-2-3b-instruct llama-3-2-3b-instruct 0.15 0.15 Source: watsonx, Context: 128000
watsonx llama-3-2-90b-vision-instruct llama-3-2-90b-vision-instruct 2.00 2.00 Source: watsonx, Context: 128000
watsonx llama-3-3-70b-instruct llama-3-3-70b-instruct 0.71 0.71 Source: watsonx, Context: 128000
watsonx llama-4-maverick-17b llama-4-maverick-17b 0.35 1.40 Source: watsonx, Context: 128000
watsonx llama-guard-3-11b-vision llama-guard-3-11b-vision 0.35 0.35 Source: watsonx, Context: 128000
watsonx mistral-medium-2505 mistral-medium-2505 3.00 10.00 Source: watsonx, Context: 128000
watsonx mistral-small-2503 mistral-small-2503 0.10 0.30 Source: watsonx, Context: 32000
watsonx mistral-small-3-1-24b-instruct-2503 mistral-small-3-1-24b-instruct-2503 0.10 0.30 Source: watsonx, Context: 32000
watsonx pixtral-12b-2409 pixtral-12b-2409 0.35 0.35 Source: watsonx, Context: 128000
watsonx gpt-oss-120b gpt-oss-120b 0.15 0.60 Source: watsonx, Context: 8192
watsonx allam-1-13b-instruct allam-1-13b-instruct 1.80 1.80 Source: watsonx, Context: 8192
watsonx whisper-large-v3-turbo whisper-large-v3-turbo 0.00 0.00 Source: watsonx, Context: N/A
openai whisper-1 whisper-1 0.00 0.00 Source: openai, Context: N/A
xai grok-3-beta grok-3-beta 3.00 15.00 Source: xai, Context: 131072
xai grok-3-fast-beta grok-3-fast-beta 5.00 25.00 Source: xai, Context: 131072
xai grok-3-mini-beta grok-3-mini-beta 0.30 0.50 Source: xai, Context: 131072
xai grok-3-mini-fast-beta grok-3-mini-fast-beta 0.60 4.00 Source: xai, Context: 131072
xai grok-4-fast-reasoning grok-4-fast-reasoning 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-0709 grok-4-0709 3.00 15.00 Source: xai, Context: 256000
xai grok-4-latest grok-4-latest 3.00 15.00 Source: xai, Context: 256000
xai grok-4-1-fast-reasoning grok-4-1-fast-reasoning 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-1-fast-reasoning-latest grok-4-1-fast-reasoning-latest 0.20 0.50 Source: xai, Context: 2000000
xai grok-4-1-fast-non-reasoning-latest grok-4-1-fast-non-reasoning-latest 0.20 0.50 Source: xai, Context: 2000000
xai grok-code-fast grok-code-fast 0.20 1.50 Source: xai, Context: 256000
xai grok-code-fast-1-0825 grok-code-fast-1-0825 0.20 1.50 Source: xai, Context: 256000
vertex search_api search_api 0.00 0.00 Source: vertex, Context: N/A
openai container container 0.00 0.00 Source: openai, Context: N/A
openai sora-2 sora-2 0.00 0.00 Source: openai, Context: N/A
openai sora-2-pro sora-2-pro 0.00 0.00 Source: openai, Context: N/A
azure sora-2 sora-2 0.00 0.00 Source: azure, Context: N/A
azure sora-2-pro sora-2-pro 0.00 0.00 Source: azure, Context: N/A
azure sora-2-pro-high-res sora-2-pro-high-res 0.00 0.00 Source: azure, Context: N/A
runwayml gen4_turbo gen4_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_aleph gen4_aleph 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen3a_turbo gen3a_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_image gen4_image 0.00 0.00 Source: runwayml, Context: N/A
runwayml gen4_image_turbo gen4_image_turbo 0.00 0.00 Source: runwayml, Context: N/A
runwayml eleven_multilingual_v2 eleven_multilingual_v2 0.00 0.00 Source: runwayml, Context: N/A
fireworksai flux-kontext-pro flux-kontext-pro 0.04 0.04 Source: fireworks_ai, Context: 4096
fireworksai SSD-1B ssd-1b 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai chronos-hermes-13b-v2 chronos-hermes-13b-v2 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai code-llama-13b code-llama-13b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-13b-instruct code-llama-13b-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-13b-python code-llama-13b-python 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b code-llama-34b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b-instruct code-llama-34b-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-34b-python code-llama-34b-python 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai code-llama-70b code-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-70b-instruct code-llama-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-70b-python code-llama-70b-python 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai code-llama-7b code-llama-7b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-7b-instruct code-llama-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-llama-7b-python code-llama-7b-python 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai code-qwen-1p5-7b code-qwen-1p5-7b 0.20 0.20 Source: fireworks_ai, Context: 65536
fireworksai codegemma-2b codegemma-2b 0.10 0.10 Source: fireworks_ai, Context: 8192
fireworksai codegemma-7b codegemma-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai cogito-671b-v2-p1 cogito-671b-v2-p1 1.20 1.20 Source: fireworks_ai, Context: 163840
fireworksai cogito-v1-preview-llama-3b cogito-v1-preview-llama-3b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-llama-70b cogito-v1-preview-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-llama-8b cogito-v1-preview-llama-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-qwen-14b cogito-v1-preview-qwen-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai cogito-v1-preview-qwen-32b cogito-v1-preview-qwen-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai flux-kontext-max flux-kontext-max 0.08 0.08 Source: fireworks_ai, Context: 4096
fireworksai dbrx-instruct dbrx-instruct 1.20 1.20 Source: fireworks_ai, Context: 32768
fireworksai deepseek-coder-1b-base deepseek-coder-1b-base 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai deepseek-coder-33b-instruct deepseek-coder-33b-instruct 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai deepseek-coder-7b-base deepseek-coder-7b-base 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-7b-base-v1p5 deepseek-coder-7b-base-v1p5 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-7b-instruct-v1p5 deepseek-coder-7b-instruct-v1p5 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai deepseek-coder-v2-lite-base deepseek-coder-v2-lite-base 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-coder-v2-lite-instruct deepseek-coder-v2-lite-instruct 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-prover-v2 deepseek-prover-v2 1.20 1.20 Source: fireworks_ai, Context: 163840
fireworksai deepseek-r1-0528-distill-qwen3-8b deepseek-r1-0528-distill-qwen3-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-llama-70b deepseek-r1-distill-llama-70b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-llama-8b deepseek-r1-distill-llama-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-14b deepseek-r1-distill-qwen-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-1p5b deepseek-r1-distill-qwen-1p5b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-32b deepseek-r1-distill-qwen-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai deepseek-r1-distill-qwen-7b deepseek-r1-distill-qwen-7b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai deepseek-v2-lite-chat deepseek-v2-lite-chat 0.50 0.50 Source: fireworks_ai, Context: 163840
fireworksai deepseek-v2p5 deepseek-v2p5 1.20 1.20 Source: fireworks_ai, Context: 32768
fireworksai devstral-small-2505 devstral-small-2505 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dobby-mini-unhinged-plus-llama-3-1-8b dobby-mini-unhinged-plus-llama-3-1-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai dobby-unhinged-llama-3-3-70b-new dobby-unhinged-llama-3-3-70b-new 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dolphin-2-9-2-qwen2-72b dolphin-2-9-2-qwen2-72b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai dolphin-2p6-mixtral-8x7b dolphin-2p6-mixtral-8x7b 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai ernie-4p5-21b-a3b-pt ernie-4p5-21b-a3b-pt 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai ernie-4p5-300b-a47b-pt ernie-4p5-300b-a47b-pt 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai fare-20b fare-20b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai firefunction-v1 firefunction-v1 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai firellava-13b firellava-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai firesearch-ocr-v6 firesearch-ocr-v6 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai fireworks-asr-large fireworks-asr-large 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai fireworks-asr-v2 fireworks-asr-v2 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev flux-1-dev 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev-controlnet-union flux-1-dev-controlnet-union 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-dev-fp8 flux-1-dev-fp8 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai flux-1-schnell flux-1-schnell 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai flux-1-schnell-fp8 flux-1-schnell-fp8 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai gemma-2b-it gemma-2b-it 0.10 0.10 Source: fireworks_ai, Context: 8192
fireworksai gemma-3-27b-it gemma-3-27b-it 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai gemma-7b gemma-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai gemma-7b-it gemma-7b-it 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai gemma2-9b-it gemma2-9b-it 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai glm-4p5v glm-4p5v 1.20 1.20 Source: fireworks_ai, Context: 131072
fireworksai gpt-oss-safeguard-120b gpt-oss-safeguard-120b 1.20 1.20 Source: fireworks_ai, Context: 131072
fireworksai gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.50 0.50 Source: fireworks_ai, Context: 131072
fireworksai hermes-2-pro-mistral-7b hermes-2-pro-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai internvl3-38b internvl3-38b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai internvl3-78b internvl3-78b 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai internvl3-8b internvl3-8b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai japanese-stable-diffusion-xl japanese-stable-diffusion-xl 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai kat-coder kat-coder 0.90 0.90 Source: fireworks_ai, Context: 262144
fireworksai kat-dev-32b kat-dev-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai kat-dev-72b-exp kat-dev-72b-exp 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-guard-2-8b llama-guard-2-8b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-guard-3-1b llama-guard-3-1b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-guard-3-8b llama-guard-3-8b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai llama-v2-13b llama-v2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-13b-chat llama-v2-13b-chat 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-70b llama-v2-70b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-70b-chat llama-v2-70b-chat 0.90 0.90 Source: fireworks_ai, Context: 2048
fireworksai llama-v2-7b llama-v2-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v2-7b-chat llama-v2-7b-chat 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llama-v3-70b-instruct llama-v3-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-70b-instruct-hf llama-v3-70b-instruct-hf 0.90 0.90 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-8b llama-v3-8b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-v3-8b-instruct-hf llama-v3-8b-instruct-hf 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai llama-v3p1-405b-instruct-long llama-v3p1-405b-instruct-long 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v3p1-70b-instruct llama-v3p1-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p1-70b-instruct-1b llama-v3p1-70b-instruct-1b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai llama-v3p1-nemotron-70b-instruct llama-v3p1-nemotron-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p2-1b llama-v3p2-1b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p2-3b llama-v3p2-3b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai llama-v3p3-70b-instruct llama-v3p3-70b-instruct 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai llamaguard-7b llamaguard-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai llava-yi-34b llava-yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai minimax-m1-80k minimax-m1-80k 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai ministral-3-14b-instruct-2512 ministral-3-14b-instruct-2512 0.20 0.20 Source: fireworks_ai, Context: 256000
fireworksai ministral-3-3b-instruct-2512 ministral-3-3b-instruct-2512 0.10 0.10 Source: fireworks_ai, Context: 256000
fireworksai ministral-3-8b-instruct-2512 ministral-3-8b-instruct-2512 0.20 0.20 Source: fireworks_ai, Context: 256000
fireworksai mistral-7b mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-4k mistral-7b-instruct-4k 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-v0p2 mistral-7b-instruct-v0p2 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-instruct-v3 mistral-7b-instruct-v3 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-7b-v0p2 mistral-7b-v0p2 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai mistral-large-3-fp8 mistral-large-3-fp8 1.20 1.20 Source: fireworks_ai, Context: 256000
fireworksai mistral-nemo-base-2407 mistral-nemo-base-2407 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai mistral-nemo-instruct-2407 mistral-nemo-instruct-2407 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai mistral-small-24b-instruct-2501 mistral-small-24b-instruct-2501 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x22b mixtral-8x22b 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai mixtral-8x22b-instruct mixtral-8x22b-instruct 1.20 1.20 Source: fireworks_ai, Context: 65536
fireworksai mixtral-8x7b mixtral-8x7b 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x7b-instruct mixtral-8x7b-instruct 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mixtral-8x7b-instruct-hf mixtral-8x7b-instruct-hf 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai mythomax-l2-13b mythomax-l2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nemotron-nano-v2-12b-vl nemotron-nano-v2-12b-vl 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai nous-capybara-7b-v1p9 nous-capybara-7b-v1p9 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai nous-hermes-2-mixtral-8x7b-dpo nous-hermes-2-mixtral-8x7b-dpo 0.50 0.50 Source: fireworks_ai, Context: 32768
fireworksai nous-hermes-2-yi-34b nous-hermes-2-yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-13b nous-hermes-llama2-13b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-70b nous-hermes-llama2-70b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai nous-hermes-llama2-7b nous-hermes-llama2-7b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai nvidia-nemotron-nano-12b-v2 nvidia-nemotron-nano-12b-v2 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai nvidia-nemotron-nano-9b-v2 nvidia-nemotron-nano-9b-v2 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai openchat-3p5-0106-7b openchat-3p5-0106-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai openhermes-2-mistral-7b openhermes-2-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai openhermes-2p5-mistral-7b openhermes-2p5-mistral-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai openorca-7b openorca-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai phi-2-3b phi-2-3b 0.10 0.10 Source: fireworks_ai, Context: 2048
fireworksai phi-3-mini-128k-instruct phi-3-mini-128k-instruct 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai phi-3-vision-128k-instruct phi-3-vision-128k-instruct 0.20 0.20 Source: fireworks_ai, Context: 32064
fireworksai phind-code-llama-34b-python-v1 phind-code-llama-34b-python-v1 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai phind-code-llama-34b-v1 phind-code-llama-34b-v1 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai phind-code-llama-34b-v2 phind-code-llama-34b-v2 0.90 0.90 Source: fireworks_ai, Context: 16384
fireworksai playground-v2-1024px-aesthetic playground-v2-1024px-aesthetic 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai playground-v2-5-1024px-aesthetic playground-v2-5-1024px-aesthetic 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai pythia-12b pythia-12b 0.20 0.20 Source: fireworks_ai, Context: 2048
fireworksai qwen-qwq-32b-preview qwen-qwq-32b-preview 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen-v2p5-14b-instruct qwen-v2p5-14b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen-v2p5-7b qwen-v2p5-7b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai qwen1p5-72b-chat qwen1p5-72b-chat 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2-7b-instruct qwen2-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-2b-instruct qwen2-vl-2b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-72b-instruct qwen2-vl-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2-vl-7b-instruct qwen2-vl-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-0p5b-instruct qwen2p5-0p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-14b qwen2p5-14b 0.20 0.20 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-1p5b-instruct qwen2p5-1p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-32b qwen2p5-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-32b-instruct qwen2p5-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-72b qwen2p5-72b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-72b-instruct qwen2p5-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-7b-instruct qwen2p5-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-0p5b qwen2p5-coder-0p5b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-0p5b-instruct qwen2p5-coder-0p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-14b qwen2p5-coder-14b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-14b-instruct qwen2p5-coder-14b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-1p5b qwen2p5-coder-1p5b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-1p5b-instruct qwen2p5-coder-1p5b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b qwen2p5-coder-32b 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct-128k qwen2p5-coder-32b-instruct-128k 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen2p5-coder-32b-instruct-32k-rope qwen2p5-coder-32b-instruct-32k-rope 0.90 0.90 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-32b-instruct-64k qwen2p5-coder-32b-instruct-64k 0.90 0.90 Source: fireworks_ai, Context: 65536
fireworksai qwen2p5-coder-3b qwen2p5-coder-3b 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-3b-instruct qwen2p5-coder-3b-instruct 0.10 0.10 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-7b qwen2p5-coder-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-coder-7b-instruct qwen2p5-coder-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai qwen2p5-math-72b-instruct qwen2p5-math-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen2p5-vl-32b-instruct qwen2p5-vl-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-3b-instruct qwen2p5-vl-3b-instruct 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-72b-instruct qwen2p5-vl-72b-instruct 0.90 0.90 Source: fireworks_ai, Context: 128000
fireworksai qwen2p5-vl-7b-instruct qwen2p5-vl-7b-instruct 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai qwen3-0p6b qwen3-0p6b 0.10 0.10 Source: fireworks_ai, Context: 40960
fireworksai qwen3-14b qwen3-14b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-1p7b qwen3-1p7b 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai qwen3-1p7b-fp8-draft qwen3-1p7b-fp8-draft 0.10 0.10 Source: fireworks_ai, Context: 262144
fireworksai qwen3-1p7b-fp8-draft-131072 qwen3-1p7b-fp8-draft-131072 0.10 0.10 Source: fireworks_ai, Context: 131072
fireworksai qwen3-1p7b-fp8-draft-40960 qwen3-1p7b-fp8-draft-40960 0.10 0.10 Source: fireworks_ai, Context: 40960
fireworksai qwen3-235b-a22b-instruct-2507 qwen3-235b-a22b-instruct-2507 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-235b-a22b-thinking-2507 qwen3-235b-a22b-thinking-2507 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-30b-a3b qwen3-30b-a3b 0.15 0.60 Source: fireworks_ai, Context: 131072
fireworksai qwen3-30b-a3b-instruct-2507 qwen3-30b-a3b-instruct-2507 0.50 0.50 Source: fireworks_ai, Context: 262144
fireworksai qwen3-30b-a3b-thinking-2507 qwen3-30b-a3b-thinking-2507 0.90 0.90 Source: fireworks_ai, Context: 262144
fireworksai qwen3-32b qwen3-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai qwen3-4b qwen3-4b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-4b-instruct-2507 qwen3-4b-instruct-2507 0.20 0.20 Source: fireworks_ai, Context: 262144
fireworksai qwen3-8b qwen3-8b 0.20 0.20 Source: fireworks_ai, Context: 40960
fireworksai qwen3-coder-30b-a3b-instruct qwen3-coder-30b-a3b-instruct 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-coder-480b-instruct-bf16 qwen3-coder-480b-instruct-bf16 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-embedding-0p6b qwen3-embedding-0p6b 0.00 0.00 Source: fireworks_ai, Context: 32768
fireworksai qwen3-embedding-4b qwen3-embedding-4b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-embedding-8b qwen3-embedding-8b 0.10 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-next-80b-a3b-instruct qwen3-next-80b-a3b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-next-80b-a3b-thinking qwen3-next-80b-a3b-thinking 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-reranker-0p6b qwen3-reranker-0p6b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-reranker-4b qwen3-reranker-4b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-reranker-8b qwen3-reranker-8b 0.00 0.00 Source: fireworks_ai, Context: 40960
fireworksai qwen3-vl-235b-a22b-instruct qwen3-vl-235b-a22b-instruct 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-235b-a22b-thinking qwen3-vl-235b-a22b-thinking 0.22 0.88 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-30b-a3b-instruct qwen3-vl-30b-a3b-instruct 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-30b-a3b-thinking qwen3-vl-30b-a3b-thinking 0.15 0.60 Source: fireworks_ai, Context: 262144
fireworksai qwen3-vl-32b-instruct qwen3-vl-32b-instruct 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai qwen3-vl-8b-instruct qwen3-vl-8b-instruct 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai qwq-32b qwq-32b 0.90 0.90 Source: fireworks_ai, Context: 131072
fireworksai rolm-ocr rolm-ocr 0.20 0.20 Source: fireworks_ai, Context: 128000
fireworksai snorkel-mistral-7b-pairrm-dpo snorkel-mistral-7b-pairrm-dpo 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai stable-diffusion-xl-1024-v1-0 stable-diffusion-xl-1024-v1-0 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai stablecode-3b stablecode-3b 0.10 0.10 Source: fireworks_ai, Context: 4096
fireworksai starcoder-16b starcoder-16b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai starcoder-7b starcoder-7b 0.20 0.20 Source: fireworks_ai, Context: 8192
fireworksai starcoder2-15b starcoder2-15b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai starcoder2-3b starcoder2-3b 0.10 0.10 Source: fireworks_ai, Context: 16384
fireworksai starcoder2-7b starcoder2-7b 0.20 0.20 Source: fireworks_ai, Context: 16384
fireworksai toppy-m-7b toppy-m-7b 0.20 0.20 Source: fireworks_ai, Context: 32768
fireworksai whisper-v3 whisper-v3 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai whisper-v3-turbo whisper-v3-turbo 0.00 0.00 Source: fireworks_ai, Context: 4096
fireworksai yi-34b yi-34b 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-34b-200k-capybara yi-34b-200k-capybara 0.90 0.90 Source: fireworks_ai, Context: 200000
fireworksai yi-34b-chat yi-34b-chat 0.90 0.90 Source: fireworks_ai, Context: 4096
fireworksai yi-6b yi-6b 0.20 0.20 Source: fireworks_ai, Context: 4096
fireworksai zephyr-7b-beta zephyr-7b-beta 0.20 0.20 Source: fireworks_ai, Context: 32768
openrouter ByteDance Seed: Seed 1.6 Flash seed-1.6-flash 0.08 0.30 Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens. Context: 262144
openrouter ByteDance Seed: Seed 1.6 seed-1.6 0.25 2.00 Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window. Context: 262144
openrouter MiniMax: MiniMax M2.1 minimax-m2.1 0.12 0.48 MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency. Compared to its predecessor, M2.1 delivers cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
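A minimal sketch of the reasoning-preservation pattern recommended above, against the OpenRouter chat-completions endpoint. The `minimax/minimax-m2.1` slug, the prompts, and the response handling are illustrative assumptions, not a verified integration:

```python
# Sketch: pass reasoning_details back verbatim on the next turn (assumed slug/prompts).
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

messages = [{"role": "user", "content": "Refactor the retry loop in utils.py."}]
first = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",  # assumed OpenRouter slug
    "messages": messages,
}).json()

reply = first["choices"][0]["message"]
messages.append({
    "role": "assistant",
    "content": reply["content"],
    # Return the model's reasoning untouched so multi-turn quality holds up.
    "reasoning_details": reply.get("reasoning_details", []),
})
messages.append({"role": "user", "content": "Now add a unit test for it."})
second = requests.post(URL, headers=HEADERS, json={
    "model": "minimax/minimax-m2.1",
    "messages": messages,
}).json()
print(second["choices"][0]["message"]["content"])
```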
openrouter Z.AI: GLM 4.7 glm-4.7 0.16 0.80 GLM-4.7 is Z.AI’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics. Context: 202752
openrouter Google: Gemini 3 Flash Preview gemini-3-flash-preview 0.50 3.00 Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models. Context: 1048576
openrouter Mistral: Mistral Small Creative mistral-small-creative 0.10 0.30 Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents. Context: 32768
openrouter AllenAI: Olmo 3.1 32B Think (free) olmo-3.1-32b-think:free 0.00 0.00 Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology. Context: 65536
openrouter Xiaomi: MiMo-V2-Flash (free) mimo-v2-flash:free 0.00 0.00 MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much. Note: when integrating with agentic tools such as Claude Code, Cline, or Roo Code, **turn off reasoning mode** for the best and fastest performance; this model is deeply optimized for this scenario. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean, as sketched below. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config). Context: 262144
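A minimal sketch of that toggle, assuming the standard OpenRouter chat-completions endpoint; the vendor-prefixed slug is an assumption:

```python
# Sketch: disable reasoning mode for coding-agent use, per the note above.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "xiaomi/mimo-v2-flash:free",  # assumed vendor prefix
        "messages": [{"role": "user", "content": "Summarize this stack trace."}],
        "reasoning": {"enabled": False},  # fastest path for agentic tooling
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```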
openrouter NVIDIA: Nemotron 3 Nano 30B A3B (free) nemotron-3-nano-30b-a3b:free 0.00 0.00 NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security. Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems. Context: 256000
openrouter NVIDIA: Nemotron 3 Nano 30B A3B nemotron-3-nano-30b-a3b 0.06 0.24 NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security. Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems. Context: 262144
openrouter OpenAI: GPT-5.2 Chat gpt-5.2-chat 1.75 14.00 GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
openrouter OpenAI: GPT-5.2 Pro gpt-5.2-pro 21.00 168.00 GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter OpenAI: GPT-5.2 gpt-5.2 1.75 14.00 GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability. Context: 400000
openrouter Mistral: Devstral 2 2512 (free) devstral-2512:free 0.00 0.00 Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
openrouter Mistral: Devstral 2 2512 devstral-2512 0.05 0.22 Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license. Context: 262144
openrouter Relace: Relace Search relace-search 1.00 3.00 The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It's designed to serve as a subagent that passes its findings to an "oracle" coding agent, who orchestrates/performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the [Relace documentation](https://docs.relace.ai/docs/fast-agentic-search/agent). Context: 256000
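A minimal sketch of the subagent-to-oracle handoff described above, assuming both models are called through OpenRouter. The slugs, the prompt wiring, and treating the raw reply text as the findings are all simplifying assumptions; the real harness contract is in the Relace documentation:

```python
# Sketch: relace-search locates relevant code, an "oracle" model does the work.
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def chat(model: str, prompt: str) -> str:
    """One-shot chat-completion call; returns the assistant's text."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return requests.post(URL, headers=HEADERS, json=body).json()["choices"][0]["message"]["content"]

task = "Where is the retry logic for HTTP uploads implemented, and is it tested?"
findings = chat("relace/relace-search", task)  # assumed slug; subagent search step
answer = chat(
    "anthropic/claude-opus-4.5",               # any strong coding model as the oracle
    f"Code-search findings:\n{findings}\n\nTask: {task}",
)
print(answer)
```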
openrouter Z.AI: GLM 4.6V glm-4.6v 0.30 0.90 GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing. Context: 131072
openrouter Nex AGI: DeepSeek V3.1 Nex N1 (free) deepseek-v3.1-nex-n1:free 0.00 0.00 DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity. Nex-N1 demonstrates competitive performance across all evaluation scenarios, showing particularly strong results in practical coding and HTML generation tasks. Context: 131072
openrouter EssentialAI: Rnj 1 Instruct rnj-1-instruct 0.15 0.15 Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent). Context: 32768
openrouter Body Builder (beta) bodybuilder -1,000,000.00 -1,000,000.00 Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example: "count to 10 using gemini and opus." This is useful for creating multi-model requests, custom model routers, or programmatic generation of API calls from human descriptions. **BETA NOTICE**: Body Builder is in beta, and currently free. Pricing and functionality may change in the future. Context: 128000
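A minimal sketch using the example prompt from the description; the `openrouter/bodybuilder` slug and the assumption that the request objects come back in the message content are unverified:

```python
# Sketch: turn a natural-language request into OpenRouter API request objects.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openrouter/bodybuilder",  # assumed slug for this beta model
        "messages": [{"role": "user", "content": "count to 10 using gemini and opus"}],
    },
)
# Expect structured request objects describing the two model calls; parse per the docs.
print(resp.json()["choices"][0]["message"]["content"])
```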
openrouter OpenAI: GPT-5.1-Codex-Max gpt-5.1-codex-max 1.25 10.00 GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research. GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle. Context: 400000
openrouter Amazon: Nova 2 Lite nova-2-lite-v1 0.30 2.50 Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows. Context: 1000000
openrouter Mistral: Ministral 3 14B 2512 ministral-14b-2512 0.20 0.20 The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. Context: 262144
openrouter Mistral: Ministral 3 8B 2512 ministral-8b-2512 0.15 0.15 A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. Context: 262144
openrouter Mistral: Ministral 3 3B 2512 ministral-3b-2512 0.10 0.10 The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. Context: 131072
openrouter Mistral: Mistral Large 3 2512 mistral-large-2512 0.50 1.50 Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license. Context: 262144
openrouter Arcee AI: Trinity Mini (free) trinity-mini:free 0.00 0.00 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
openrouter Arcee AI: Trinity Mini trinity-mini 0.05 0.15 Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows. Context: 131072
openrouter DeepSeek: DeepSeek V3.2 Speciale deepseek-v3.2-speciale 0.27 0.41 DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context processing, then scales post-training reinforcement learning to push capability beyond the base model. Reported evaluations place Speciale ahead of GPT-5 on difficult reasoning workloads, with proficiency comparable to Gemini-3.0-Pro, while retaining strong coding and tool-use reliability. Like V3.2, it benefits from a large-scale agentic task synthesis pipeline that improves compliance and generalization in interactive environments. Context: 163840
openrouter DeepSeek: DeepSeek V3.2 deepseek-v3.2 0.25 0.38 DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 163840
openrouter Prime Intellect: INTELLECT-3 intellect-3 0.20 1.10 INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL). It offers state-of-the-art performance for its size across math, code, science, and general reasoning, consistently outperforming many larger frontier models. Designed for strong multi-step problem solving, it maintains high accuracy on structured tasks while remaining efficient at inference thanks to its MoE architecture. Context: 131072
openrouter TNG: R1T Chimera tng-r1t-chimera 0.25 0.85 TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter. Characteristics and improvements include: We think that it has a creative and pleasant personality. It has a preliminary EQ-Bench3 value of about 1305. It is quite a bit more intelligent than the original, albeit slightly slower. It is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated. Tool calling is much improved. TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1). Context: 163840
openrouter Anthropic: Claude Opus 4.5 claude-opus-4.5 5.00 25.00 Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks. Context: 200000
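A minimal sketch of the verbosity control named above, assuming the standard OpenRouter chat-completions endpoint and a vendor-prefixed slug:

```python
# Sketch: trade output depth for token efficiency via verbosity (low|medium|high).
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.5",  # assumed vendor-prefixed slug
        "messages": [{"role": "user", "content": "Triage the failures in this CI log."}],
        "verbosity": "low",  # favor terse answers and lower token spend
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```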
openrouter AllenAI: Olmo 3 32B Think (free) olmo-3-32b-think:free 0.00 0.00 Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology. Context: 65536
openrouter AllenAI: Olmo 3 7B Instruct olmo-3-7b-instruct 0.10 0.20 Olmo 3 7B Instruct is a supervised instruction-fine-tuned variant of the Olmo 3 7B base model, optimized for instruction-following, question-answering, and natural conversational dialogue. By leveraging high-quality instruction data and an open training pipeline, it delivers strong performance across everyday NLP tasks while remaining accessible and easy to integrate. Developed by Ai2 under the Apache 2.0 license, the model offers a transparent, community-friendly option for instruction-driven applications. Context: 65536
openrouter AllenAI: Olmo 3 7B Think olmo-3-7b-think 0.12 0.20 Olmo 3 7B Think is a research-oriented language model in the Olmo family designed for advanced reasoning and instruction-driven tasks. It excels at multi-step problem solving, logical inference, and maintaining coherent conversational context. Developed by Ai2 under the Apache 2.0 license, Olmo 3 7B Think supports transparent, fully open experimentation and provides a lightweight yet capable foundation for academic research and practical NLP workflows. Context: 65536
openrouter Google: Nano Banana Pro (Gemini 3 Pro Image Preview) gemini-3-pro-image-preview 2.00 12.00 Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding. It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows. Context: 65536
openrouter xAI: Grok 4.1 Fast grok-4.1-fast 0.20 0.50 Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
openrouter Google: Gemini 3 Pro Preview gemini-3-pro-preview 2.00 12.00 Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing. Context: 1048576
openrouter Deep Cogito: Cogito v2.1 671B cogito-v2.1-671b 1.25 1.25 Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning to reach state-of-the-art performance on multiple categories (instruction following, coding, longer queries and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement. Context: 128000
openrouter OpenAI: GPT-5.1 gpt-5.1 1.25 10.00 GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5. Context: 400000
openrouter OpenAI: GPT-5.1 Chat gpt-5.1-chat 1.25 10.00 GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. Context: 128000
openrouter OpenAI: GPT-5.1-Codex gpt-5.1-codex 1.25 10.00 GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
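A minimal sketch of the `reasoning.effort` knob mentioned above; the slug and prompt are placeholders:

```python
# Sketch: raise reasoning effort for a large, multi-step engineering task.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-5.1-codex",  # assumed vendor-prefixed slug
        "messages": [{"role": "user", "content": "Plan and implement the schema migration."}],
        "reasoning": {"effort": "high"},  # spend more depth; use "low" for quick edits
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```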
openrouter OpenAI: GPT-5.1-Codex-Mini gpt-5.1-codex-mini 0.25 2.00 GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex. Context: 400000
openrouter Kwaipilot: KAT-Coder-Pro V1 (free) kat-coder-pro:free 0.00 0.00 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
openrouter Kwaipilot: KAT-Coder-Pro V1 kat-coder-pro 0.21 0.83 KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL. Context: 256000
openrouter MoonshotAI: Kimi K2 Thinking kimi-k2-thinking 0.32 0.48 Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports a 256K-token context window. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift. It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. With MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks. Context: 262144
openrouter Amazon: Nova Premier 1.0 nova-premier-v1 2.50 12.50 Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models. Context: 1000000
openrouter Perplexity: Sonar Pro Search sonar-pro-search 3.00 15.00 Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform. Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro: instead of a single query plus synthesis, it plans and executes entire research workflows using tools. Context: 200000
openrouter Mistral: Voxtral Small 24B 2507 voxtral-small-24b-2507 0.10 0.30 Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio is priced at $100 per million seconds. Context: 32000
openrouter OpenAI: gpt-oss-safeguard-20b gpt-oss-safeguard-20b 0.08 0.30 gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling. Learn more about this model in OpenAI's gpt-oss-safeguard [user guide](https://cookbook.openai.com/articles/gpt-oss-safeguard-guide). Context: 131072
openrouter NVIDIA: Nemotron Nano 12B 2 VL (free) nemotron-nano-12b-v2-vl:free 0.00 0.00 NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 128000
openrouter NVIDIA: Nemotron Nano 12B 2 VL nemotron-nano-12b-v2-vl 0.20 0.60 NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes. Context: 131072
openrouter MiniMax: MiniMax M2 minimax-m2 0.20 1.00 MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency. The model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors. Benchmarked by [Artificial Analysis](https://artificialanalysis.ai/models/minimax-m2), MiniMax-M2 ranks among the top open-source models for composite intelligence, spanning mathematics, science, and instruction-following. Its small activation footprint enables fast inference, high concurrency, and improved unit economics, making it well-suited for large-scale agents, developer assistants, and reasoning-driven applications that require responsiveness and cost efficiency. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). Context: 196608
openrouter Qwen: Qwen3 VL 32B Instruct qwen3-vl-32b-instruct 0.50 1.50 Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks. Context: 262144
openrouter LiquidAI/LFM2-8B-A1B lfm2-8b-a1b 0.05 0.10 Model created via inbox interface Context: 32768
openrouter LiquidAI/LFM2-2.6B lfm-2.2-6b 0.05 0.10 LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Context: 32768
openrouter IBM: Granite 4.0 Micro granite-4.0-h-micro 0.02 0.11 Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family, the latest series of models released by IBM, fine-tuned for long-context tool calling. Context: 131000
openrouter Deep Cogito: Cogito V2 Preview Llama 405B cogito-v2-preview-llama-405b 3.50 3.50 Cogito v2 405B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. It represents a significant step toward frontier intelligence with dense architecture delivering performance competitive with leading closed models. This advanced reasoning system combines policy improvement with massive scale for exceptional capabilities. Context: 32768
openrouter OpenAI: GPT-5 Image Mini gpt-5-image-mini 2.50 2.00 GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text rendering, and detailed image editing with reduced latency and cost. It excels at high-quality visual creation while maintaining strong text understanding, making it ideal for applications that require both efficient image generation and text processing at scale. Context: 400000
openrouter Anthropic: Claude Haiku 4.5 claude-haiku-4.5 1.00 5.00 Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment. Context: 200000
openrouter Qwen: Qwen3 VL 8B Thinking qwen3-vl-8b-thinking 0.18 2.10 Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs. Context: 256000
openrouter Qwen: Qwen3 VL 8B Instruct qwen3-vl-8b-instruct 0.08 0.50 Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions. Context: 131072
openrouter OpenAI: GPT-5 Image gpt-5-image 10.00 10.00 [GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following, text rendering, and detailed image editing. Context: 400000
openrouter OpenAI: o3 Deep Research o3-deep-research 10.00 40.00 o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: this model always uses the 'web_search' tool, which incurs additional cost. Context: 200000
openrouter OpenAI: o4 Mini Deep Research o4-mini-deep-research 2.00 8.00 o4-mini-deep-research is OpenAI's faster, more affordable deep research model, ideal for tackling complex, multi-step research tasks. Note: this model always uses the 'web_search' tool, which incurs additional cost. Context: 200000
openrouter NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 llama-3.3-nemotron-super-49b-v1.5 0.10 0.40 Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter. Context: 131072
openrouter Baidu: ERNIE 4.5 21B A3B Thinking ernie-4.5-21b-a3b-thinking 0.07 0.28 ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks. Context: 131072
openrouter Google: Gemini 2.5 Flash Image (Nano Banana) gemini-2.5-flash-image 0.30 2.50 Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration) Context: 32768
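A minimal sketch of the aspect-ratio control mentioned in the entry above, against OpenRouter's OpenAI-compatible chat completions endpoint. The `image_config` shape follows the linked docs page; the `modalities` field, the exact field names, and the vendor-prefixed model slug should be treated as assumptions to verify against that page.

```python
import requests

# Sketch: generating a 16:9 image with Gemini 2.5 Flash Image via OpenRouter.
# image_config/aspect_ratio follows the docs linked above; the "google/" slug
# prefix and the response field names are assumptions.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-2.5-flash-image",
        "messages": [{"role": "user", "content": "A watercolor banana on a space station"}],
        "modalities": ["image", "text"],           # request image output
        "image_config": {"aspect_ratio": "16:9"},  # aspect-ratio control
    },
)
# Generated images are returned on the assistant message, base64-encoded.
print(resp.json()["choices"][0]["message"].get("images"))
```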
openrouter Qwen: Qwen3 VL 30B A3B Thinking qwen3-vl-30b-a3b-thinking 0.20 1.00 Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 131072
openrouter Qwen: Qwen3 VL 30B A3B Instruct qwen3-vl-30b-a3b-instruct 0.15 0.60 Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research. Context: 262144
openrouter OpenAI: GPT-5 Pro gpt-5-pro 15.00 120.00 GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter Z.AI: GLM 4.6 glm-4.6 0.35 1.50 Compared with GLM-4.5, this generation brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: the model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks. Refined writing: better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 202752
openrouter Z.AI: GLM 4.6 (exacto) glm-4.6:exacto 0.44 1.76 Compared with GLM-4.5, this generation brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: the model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks. Refined writing: better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. Context: 204800
openrouter Anthropic: Claude Sonnet 4.5 claude-sonnet-4.5 3.00 15.00 Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use. Context: 1000000
openrouter DeepSeek: DeepSeek V3.2 Exp deepseek-v3.2-exp 0.21 0.32 DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs. Context: 163840
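Several entries in this list (this one, plus GLM-4.5, Hermes 4, Grok 4 Fast, and others below) expose the same `reasoning` `enabled` toggle. A minimal sketch against OpenRouter's OpenAI-compatible endpoint; the field shape follows the docs linked above, and the vendor-prefixed model slug is an assumption.

```python
import requests

# Sketch: disabling the reasoning trace for a hybrid model via the
# `reasoning.enabled` boolean described above (shape per the linked docs).
# The "deepseek/" slug prefix is an assumption; check the model page.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-v3.2-exp",
        "messages": [{"role": "user", "content": "Summarize sparse attention in two sentences."}],
        "reasoning": {"enabled": False},  # set True to get thinking-mode output
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```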
openrouter TheDrummer: Cydonia 24B V4.1 cydonia-24b-v4.1 0.30 0.50 Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence. Context: 131072
openrouter Relace: Relace Apply 3 relace-apply-3 0.85 1.25 Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at 10,000 tokens/sec on average. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Relace. Learn more about this model in their [documentation](https://docs.relace.ai/api-reference/instant-apply/apply) Context: 256000
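The required prompt format above is mechanical to assemble. A minimal sketch follows; the instruction, code, and edit snippet are illustrative only, and the request plumbing around the prompt is omitted.

```python
# Sketch: building a Relace Apply 3 prompt in the format the entry above
# requires. The instruction/code/update values here are illustrative only.
instruction = "Add a docstring to add()."
initial_code = "def add(a, b):\n    return a + b\n"
edit_snippet = (
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
    "    return a + b\n"
)

prompt = (
    f"<instruction>{instruction}</instruction>\n"
    f"<code>{initial_code}</code>\n"
    f"<update>{edit_snippet}</update>"
)
print(prompt)
```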
openrouter Google: Gemini 2.5 Flash Preview 09-2025 gemini-2.5-flash-preview-09-2025 0.30 2.50 Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
openrouter Google: Gemini 2.5 Flash Lite Preview 09-2025 gemini-2.5-flash-lite-preview-09-2025 0.10 0.40 Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
openrouter Qwen: Qwen3 VL 235B A22B Thinking qwen3-vl-235b-a22b-thinking 0.45 3.50 Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
openrouter Qwen: Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct 0.12 0.56 Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents. Context: 262144
openrouter Qwen: Qwen3 Max qwen3-max 1.20 6.00 Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. Context: 256000
openrouter Qwen: Qwen3 Coder Plus qwen3-coder-plus 1.00 5.00 Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
openrouter OpenAI: GPT-5 Codex gpt-5-codex 1.25 10.00 GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. Context: 400000
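A minimal sketch of the `reasoning.effort` knob mentioned above, as it would sit in an OpenRouter request body. The effort levels follow the linked docs; which levels a given provider accepts is an assumption to verify there.

```python
# Sketch: raising reasoning effort for a long refactoring task.
payload = {
    "model": "openai/gpt-5-codex",  # vendor-prefixed slug assumed
    "messages": [{"role": "user", "content": "Refactor this module to remove global state: ..."}],
    "reasoning": {"effort": "high"},  # e.g. "low" | "medium" | "high" per the linked docs
}
```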
openrouter DeepSeek: DeepSeek V3.1 Terminus (exacto) deepseek-v3.1-terminus:exacto 0.21 0.79 DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
openrouter DeepSeek: DeepSeek V3.1 Terminus deepseek-v3.1-terminus 0.21 0.79 DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. Context: 163840
openrouter xAI: Grok 4 Fast grok-4-fast 0.20 0.50 Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model on xAI's [news post](http://x.ai/news/grok-4-fast). Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) Context: 2000000
openrouter Tongyi DeepResearch 30B A3B tongyi-deepresearch-30b-a3b 0.09 0.40 Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks and delivers state-of-the-art performance on benchmarks like Humanity's Last Exam, BrowserComp, BrowserComp-ZH, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES. This makes it superior for complex agentic search, reasoning, and multi-step problem-solving compared to prior models. The model includes a fully automated synthetic data pipeline for scalable pre-training, fine-tuning, and reinforcement learning. It uses large-scale continual pre-training on diverse agentic data to boost reasoning and keep its knowledge current. It also features end-to-end on-policy RL with a customized Group Relative Policy Optimization, including token-level gradients and negative-sample filtering for stable training. The model supports ReAct for core ability checks and an IterResearch-based 'Heavy' mode for maximum performance through test-time scaling. It's ideal for advanced research agents, tool use, and heavy inference workflows. Context: 131072
openrouter Qwen: Qwen3 Coder Flash qwen3-coder-flash 0.30 1.50 Qwen3 Coder Flash is Alibaba's fast, cost-efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities. Context: 128000
openrouter OpenGVLab: InternVL3 78B internvl3-78b 0.10 0.39 The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities. In addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance. Context: 32768
openrouter Qwen: Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking 0.15 1.20 Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It is designed for hard multi-step problems such as math proofs, code synthesis/debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode. Context: 262144
openrouter Qwen: Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct 0.06 0.60 Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred. Context: 262144
openrouter Meituan: LongCat Flash Chat longcat-flash-chat 0.20 0.80 LongCat-Flash-Chat is a large-scale Mixture-of-Experts (MoE) model with 560B total parameters, of which 18.6B–31.3B (≈27B on average) are dynamically activated per input. It introduces a shortcut-connected MoE design to reduce communication overhead and achieve high throughput while maintaining training stability through advanced scaling strategies such as hyperparameter transfer, deterministic computation, and multi-stage optimization. This release, LongCat-Flash-Chat, is a non-thinking foundation model optimized for conversational and agentic tasks. It supports long context windows up to 128K tokens and shows competitive performance across reasoning, coding, instruction following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions. Context: 131072
openrouter Qwen: Qwen Plus 0728 qwen-plus-2025-07-28 0.40 1.20 Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context window that balances performance, speed, and cost. Context: 1000000
openrouter Qwen: Qwen Plus 0728 (thinking) qwen-plus-2025-07-28:thinking 0.40 4.00 Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context window that balances performance, speed, and cost. Context: 1000000
openrouter NVIDIA: Nemotron Nano 9B V2 (free) nemotron-nano-9b-v2:free 0.00 0.00 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 128000
openrouter NVIDIA: Nemotron Nano 9B V2 nemotron-nano-9b-v2 0.04 0.16 NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so. Context: 131072
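A hedged sketch of the system-prompt control described in the two Nemotron entries above. The exact control string is an assumption (NVIDIA's model card documents /think and /no_think) and should be verified before use.

```python
# Sketch: asking Nemotron Nano 9B V2 to skip its reasoning trace via the
# system prompt, as described above. The "/no_think" control string is an
# assumption taken from NVIDIA's model card; verify against current docs.
payload = {
    "model": "nvidia/nemotron-nano-9b-v2",  # vendor-prefixed slug assumed
    "messages": [
        {"role": "system", "content": "/no_think"},  # assumed: final answer only
        {"role": "user", "content": "What is the capital of Australia?"},
    ],
}
```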
openrouter MoonshotAI: Kimi K2 0905 kimi-k2-0905 0.39 1.90 Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
openrouter MoonshotAI: Kimi K2 0905 (exacto) kimi-k2-0905:exacto 0.60 2.50 Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training. Context: 262144
openrouter Deep Cogito: Cogito V2 Preview Llama 70B cogito-v2-preview-llama-70b 0.88 0.88 Cogito v2 70B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. Built with iterative policy improvement, it delivers strong performance across reasoning tasks while maintaining efficiency through shorter reasoning chains and improved intuition. Context: 32768
openrouter Cogito V2 Preview Llama 109B cogito-v2-preview-llama-109b-moe 0.18 0.59 An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage in an extended “thinking” phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens) and standard Transformers workflows. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 32767
openrouter StepFun: Step3 step3 0.57 1.42 Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators. Context: 65536
openrouter Qwen: Qwen3 30B A3B Thinking 2507 qwen3-30b-a3b-thinking-2507 0.05 0.34 Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated from final answers. Compared to earlier Qwen3-30B releases, this version improves performance across logical reasoning, mathematics, science, coding, and multilingual benchmarks. It also demonstrates stronger instruction following, tool use, and alignment with human preferences. With higher reasoning efficiency and extended output budgets, it is best suited for advanced research, competitive problem solving, and agentic applications requiring structured long-context reasoning. Context: 32768
openrouter xAI: Grok Code Fast 1 grok-code-fast-1 0.20 1.50 Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code toward high-quality workflows. Context: 256000
openrouter Nous: Hermes 4 70B hermes-4-70b 0.11 0.38 Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>...</think> reasoning traces before answering. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates. Context: 131072
openrouter Nous: Hermes 4 405B hermes-4-405b 1.00 3.00 Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>...</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior. Context: 131072
openrouter Google: Gemini 2.5 Flash Image Preview (Nano Banana) gemini-2.5-flash-image-preview 0.30 2.50 Gemini 2.5 Flash Image Preview, a.k.a. "Nano Banana," is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Context: 32768
openrouter DeepSeek: DeepSeek V3.1 deepseek-chat-v3.1 0.15 0.75 DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the [DeepSeek V3-0324](/deepseek/deepseek-chat-v3-0324) model and performs well on a variety of tasks. Context: 32768
openrouter OpenAI: GPT-4o Audio gpt-4o-audio-preview 2.50 10.00 The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million input audio tokens. Context: 128000
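Given the rates in this entry ($2.50/M text input, $40/M audio input, $10/M output), per-request cost is simple arithmetic; the token counts below are illustrative only.

```python
# Back-of-the-envelope cost for a gpt-4o-audio-preview request, using the
# per-million-token rates listed in this entry.
def request_cost(text_in: int, audio_in: int, text_out: int) -> float:
    return text_in * 2.50e-6 + audio_in * 40.00e-6 + text_out * 10.00e-6

# e.g. 1,200 text input tokens + 8,000 audio tokens + 500 output tokens:
print(f"${request_cost(1_200, 8_000, 500):.4f}")  # $0.3280
```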
openrouter Mistral: Mistral Medium 3.1 mistral-medium-3.1 0.40 2.00 Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
openrouter Baidu: ERNIE 4.5 21B A3B ernie-4.5-21b-a3b 0.07 0.28 A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering strong understanding and generation through heterogeneous MoE structures and modality-isolated routing. Supporting an extensive 131K token context length, the model achieves efficient inference via multi-expert parallel collaboration and quantization, while advanced post-training techniques including SFT, DPO, and UPO ensure optimized performance across diverse applications, with specialized routing and balancing losses for superior task handling. Context: 120000
openrouter Baidu: ERNIE 4.5 VL 28B A3B ernie-4.5-vl-28b-a3b 0.14 0.56 A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing. Built with scaling-efficient infrastructure for high-throughput training and inference, the model leverages advanced post-training techniques including SFT, DPO, and UPO for optimized performance, while supporting an impressive 131K context length and RLVR alignment for superior cross-modal reasoning and generation capabilities. Context: 30000
openrouter Z.AI: GLM 4.5V glm-4.5v 0.60 1.80 GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 65536
openrouter AI21: Jamba Mini 1.7 jamba-mini-1.7 0.20 0.40 Jamba Mini 1.7 is a compact and efficient member of the Jamba open model family, incorporating key improvements in grounding and instruction-following while maintaining the benefits of the SSM-Transformer hybrid architecture and 256K context window. Despite its compact size, it delivers accurate, contextually grounded responses and improved steerability. Context: 256000
openrouter AI21: Jamba Large 1.7 jamba-large-1.7 2.00 8.00 Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions. Context: 256000
openrouter OpenAI: GPT-5 Chat gpt-5-chat 1.25 10.00 GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications. Context: 128000
openrouter OpenAI: GPT-5 gpt-5 1.25 10.00 GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks. Context: 400000
openrouter OpenAI: GPT-5 Mini gpt-5-mini 0.25 2.00 GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model. Context: 400000
openrouter OpenAI: GPT-5 Nano gpt-5-nano 0.05 0.40 GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications. Context: 400000
openrouter OpenAI: gpt-oss-120b (free) gpt-oss-120b:free 0.00 0.00 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-120b gpt-oss-120b 0.02 0.10 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-120b (exacto) gpt-oss-120b:exacto 0.04 0.19 gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. Context: 131072
openrouter OpenAI: gpt-oss-20b (free) gpt-oss-20b:free 0.00 0.00 gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
openrouter OpenAI: gpt-oss-20b gpt-oss-20b 0.02 0.06 gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
openrouter Anthropic: Claude Opus 4.1 claude-opus-4.1 15.00 75.00 Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning. Context: 200000
openrouter Mistral: Codestral 2508 codestral-2508 0.30 0.90 Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08) Context: 256000
openrouter Qwen: Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct 0.07 0.27 Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with Yarn) and performs strongly in tasks involving function calls, browser use, and structured code completion. This model is optimized for instruction-following without “thinking mode”, and integrates well with OpenAI-compatible tool-use formats. Context: 160000
openrouter Qwen: Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507 0.08 0.33 Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance. Context: 262144
openrouter Z.AI: GLM 4.5 glm-4.5 0.35 1.55 GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Z.AI: GLM 4.5 Air (free) glm-4.5-air:free 0.00 0.00 GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Z.AI: GLM 4.5 Air glm-4.5-air 0.05 0.22 GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072
openrouter Qwen: Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507 0.11 0.60 Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases. Context: 262144
openrouter Z.AI: GLM 4 32B glm-4-32b 0.10 0.10 GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models. Context: 128000
openrouter Qwen: Qwen3 Coder 480B A35B (free) qwen3-coder:free 0.00 0.00 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262000
openrouter Qwen: Qwen3 Coder 480B A35B qwen3-coder 0.22 0.95 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262144
openrouter Qwen: Qwen3 Coder 480B A35B (exacto) qwen3-coder:exacto 0.22 1.80 Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used. Context: 262144
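The tiered-pricing note repeated in the three Qwen3 Coder entries above reduces to a threshold check. In this sketch the base-tier rates come from the qwen3-coder row; the >128k rates are placeholders, since the list does not state them.

```python
# Sketch of the context-length pricing tiers described above. Requests with
# more than 128k input tokens bill at a higher rate; the LONG_RATES below
# are hypothetical placeholders, not published prices.
BASE_RATES = (0.22, 0.95)  # ($/M input, $/M output) from the qwen3-coder row
LONG_RATES = (0.44, 1.90)  # hypothetical >128k tier, for illustration only

def qwen3_coder_cost(input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = LONG_RATES if input_tokens > 128_000 else BASE_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

print(f"${qwen3_coder_cost(200_000, 4_000):.4f}")  # long-context request
```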
openrouter ByteDance: UI-TARS 7B ui-tars-1.5-7b 0.10 0.20 UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds upon the UI-TARS framework with reinforcement-learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints. Context: 128000
openrouter Google: Gemini 2.5 Flash Lite gemini-2.5-flash-lite 0.10 0.40 Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence. Context: 1048576
openrouter Qwen: Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-2507 0.07 0.46 Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench. Context: 262144
openrouter Switchpoint Router router 0.85 3.40 Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you always benefit from the industry's newest models without changing your workflow. This model is configured for a simple, flat rate per response here on OpenRouter. It's powered by the full routing engine from [Switchpoint AI](https://www.switchpoint.dev). Context: 131072
openrouter MoonshotAI: Kimi K2 0711 (free) kimi-k2:free 0.00 0.00 Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 32768
openrouter MoonshotAI: Kimi K2 0711 kimi-k2 0.50 2.40 Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training. Context: 131072
openrouter THUDM: GLM 4.1V 9B Thinking glm-4.1v-9b-thinking 0.04 0.14 GLM-4.1V-9B-Thinking is a 9B parameter vision-language model developed by THUDM, based on the GLM-4-9B foundation. It introduces a reasoning-centric "thinking paradigm" enhanced with reinforcement learning to improve multimodal reasoning, long-context understanding (up to 64K tokens), and complex problem solving. It achieves state-of-the-art performance among models in its class, outperforming even larger models like Qwen-2.5-VL-72B on a majority of benchmark tasks. Context: 65536
openrouter Mistral: Devstral Medium devstral-medium 0.40 2.00 Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost. It is designed for generalization across prompt styles and tool use in code agents and frameworks. Devstral Medium is available via API only (not open-weight), and supports enterprise deployment on private infrastructure, with optional fine-tuning capabilities. Context: 131072
openrouter Mistral: Devstral Small 1.1 devstral-small 0.07 0.28 Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats. Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes. Context: 128000
openrouter Venice: Uncensored (free) dolphin-mistral-24b-venice-edition:free 0.00 0.00 Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models. Context: 32768
openrouter xAI: Grok 4 grok-4 3.00 15.00 Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total token count in a given request exceeds 128k tokens. See more details in the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709). Context: 256000
openrouter Google: Gemma 3n 2B (free) gemma-3n-e2b-it:free 0.00 0.00 Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data. Context: 8192
openrouter Tencent: Hunyuan A13B Instruct hunyuan-a13b-instruct 0.14 0.57 Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.). Context: 131072
openrouter TNG: DeepSeek R1T2 Chimera (free) deepseek-r1t2-chimera:free 0.00 0.00 DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks. Context: 163840
openrouter TNG: DeepSeek R1T2 Chimera deepseek-r1t2-chimera 0.25 0.85 DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks. Context: 163840
openrouter Morph: Morph V3 Large morph-v3-large 0.90 1.90 Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 262144
openrouter Morph: Morph V3 Fast morph-v3-fast 0.80 1.20 Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update> Zero Data Retention is enabled for Morph. Learn more about this model in their [documentation](https://docs.morphllm.com/quickstart) Context: 81920
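Both Morph apply models above consume the same tagged prompt. Below is a minimal sketch of what a request might look like through OpenRouter's OpenAI-compatible chat endpoint; the provider-prefixed slug, the instruction, and the code/edit snippets are all illustrative assumptions, with only the `<instruction>/<code>/<update>` layout taken from the documented format.

```python
import requests

# Hypothetical edit task: only the <instruction>/<code>/<update> tag layout
# comes from the documented prompt format; the contents are placeholders.
prompt = (
    "<instruction>Rename the function add to add_numbers</instruction> "
    "<code>def add(a, b):\n    return a + b</code> "
    "<update>def add_numbers(a, b):\n    return a + b</update>"
)

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "morph/morph-v3-fast",  # provider-prefixed slug assumed
        "messages": [{"role": "user", "content": prompt}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # the applied edit
```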
openrouter Baidu: ERNIE 4.5 VL 424B A47B ernie-4.5-vl-424b-a47b 0.42 1.25 ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization. Context: 123000
openrouter Baidu: ERNIE 4.5 300B A47B ernie-4.5-300b-a47b 0.28 1.10 ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands. Context: 123000
openrouter Inception: Mercury mercury 0.25 1.00 Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
openrouter Mistral: Mistral Small 3.2 24B mistral-small-3.2-24b-instruct 0.06 0.18 Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks. It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA). Context: 131072
openrouter MiniMax: MiniMax M1 minimax-m1 0.40 2.20 MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks. Trained via a custom reinforcement learning pipeline (CISPO), M1 excels in long-context understanding, software engineering, agentic tool use, and mathematical reasoning. Benchmarks show strong performance across FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, often outperforming other open models like DeepSeek R1 and Qwen3-235B. Context: 1000000
openrouter Google: Gemini 2.5 Flash gemini-2.5-flash 0.30 2.50 Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning). Context: 1048576
openrouter Google: Gemini 2.5 Pro gemini-2.5-pro 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter MoonshotAI: Kimi Dev 72B kimi-dev-72b 0.29 1.15 Kimi-Dev-72B is an open-source large language model fine-tuned for software engineering and issue resolution tasks. Based on Qwen2.5-72B, it is optimized using large-scale reinforcement learning that applies code patches in real repositories and validates them via full test suite execution—rewarding only correct, robust completions. The model achieves 60.4% on SWE-bench Verified, setting a new benchmark among open-source models for software bug fixing and code reasoning. Context: 131072
openrouter OpenAI: o3 Pro o3-pro 20.00 80.00 The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers. Note that BYOK is required for this model. Set up here: https://openrouter.ai/settings/integrations Context: 200000
openrouter xAI: Grok 3 Mini grok-3-mini 0.30 0.50 A lightweight model that thinks before responding. Fast, smart, and great for logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible. Context: 131072
openrouter xAI: Grok 3 grok-3 3.00 15.00 Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Context: 131072
openrouter Google: Gemini 2.5 Pro Preview 06-05 gemini-2.5-pro-preview 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter DeepSeek: DeepSeek R1 0528 Qwen3 8B deepseek-r1-0528-qwen3-8b 0.06 0.09 DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and improved post-training techniques, pushing its reasoning and inference close to flagship models like o3 and Gemini 2.5 Pro. It now tops math, programming, and logic leaderboards, showcasing a step-change in depth of thought. The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235B “thinking” giant on AIME 2024. Context: 128000
openrouter DeepSeek: R1 0528 (free) deepseek-r1-0528:free 0.00 0.00 May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is fully open-source, with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass. Context: 163840
openrouter DeepSeek: R1 0528 deepseek-r1-0528 0.40 1.75 May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is fully open-source, with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass. Context: 163840
openrouter Anthropic: Claude Opus 4 claude-opus-4 15.00 75.00 Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation. Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 200000
openrouter Anthropic: Claude Sonnet 4 claude-sonnet-4 3.00 15.00 Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios. Read more at the [blog post here](https://www.anthropic.com/news/claude-4) Context: 1000000
openrouter Mistral: Devstral Small 2505 devstral-small-2505 0.06 0.12 Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%). Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license. Context: 128000
openrouter Google: Gemma 3n 4B (free) gemma-3n-e4b-it:free 0.00 0.00 Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 8192
openrouter Google: Gemma 3n 4B gemma-3n-e4b-it 0.02 0.04 Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) Context: 32768
openrouter OpenAI: Codex Mini codex-mini 1.50 6.00 codex-mini-latest is a fine-tuned version of o4-mini specifically for use in Codex CLI. For direct use in the API, we recommend starting with gpt-4.1. Context: 200000
openrouter Nous: DeepHermes 3 Mistral 24B Preview deephermes-3-mistral-24b-preview 0.02 0.10 DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications. DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode—generating extended chains of thought wrapped in `<think></think>` tags before delivering a final answer. System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem. Context: 32768
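Because the deep-thinking mode is driven entirely by that system prompt, it can be toggled per request. Here is a minimal sketch against OpenRouter's chat-completions endpoint; the provider-prefixed slug and the user question are illustrative assumptions, while the system prompt is quoted verbatim from the model card above.

```python
import requests

# System prompt quoted verbatim from the model card; everything else
# (slug, question) is an illustrative assumption.
DEEP_THINKING = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "nousresearch/deephermes-3-mistral-24b-preview",  # slug assumed
        "messages": [
            {"role": "system", "content": DEEP_THINKING},
            {"role": "user", "content": "Which is larger, 9.11 or 9.9?"},
        ],
    },
)
# The reply should open with a <think>...</think> trace, then the final answer;
# omitting the system prompt should yield the fast intuitive mode instead.
print(resp.json()["choices"][0]["message"]["content"])
```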
openrouter Mistral: Mistral Medium 3 mistral-medium-3 0.40 2.00 Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. Context: 131072
openrouter Google: Gemini 2.5 Pro Preview 05-06 gemini-2.5-pro-preview-05-06 1.25 10.00 Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities. Context: 1048576
openrouter Arcee AI: Spotlight spotlight 0.18 0.18 Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual-question-answering, and diagram-analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or out-scoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests. Context: 131072
openrouter Arcee AI: Maestro Reasoning maestro-reasoning 0.90 3.30 Maestro Reasoning is Arcee's flagship analysis model: a 32B-parameter derivative of Qwen 2.5-32B tuned with DPO and chain-of-thought RL for step-by-step logic. Compared to the earlier 7B preview, the production 32B release widens the context window to 128k tokens and doubles the pass rate on MATH and GSM-8K, while also lifting code-completion accuracy. Its instruction style encourages structured "thought → answer" traces that can be parsed or hidden according to user preference. That transparency pairs well with audit-focused industries like finance or healthcare, where seeing the reasoning path matters. In Arcee Conductor, Maestro is automatically selected for complex, multi-constraint queries that smaller SLMs bounce. Context: 131072
openrouter Arcee AI: Virtuoso Large virtuoso-large 0.75 1.20 Virtuoso-Large is Arcee's top-tier general-purpose LLM at 72B parameters, tuned to tackle cross-domain reasoning, creative writing, and enterprise QA. Unlike many 70B peers, it retains the 128k context inherited from Qwen 2.5, letting it ingest books, codebases, or financial filings wholesale. Training blended DeepSeek R1 distillation, multi-epoch supervised fine-tuning, and a final DPO/RLHF alignment stage, yielding strong performance on BIG-Bench-Hard, GSM-8K, and long-context Needle-In-Haystack tests. Enterprises use Virtuoso-Large as the "fallback" brain in Conductor pipelines when other SLMs flag low confidence. Despite its size, aggressive KV-cache optimizations keep first-token latency in the low-second range on 8× H100 nodes, making it a practical production-grade powerhouse. Context: 131072
openrouter Arcee AI: Coder Large coder-large 0.50 0.80 Coder-Large is a 32B-parameter offspring of Qwen 2.5-Instruct that has been further trained on permissively licensed GitHub, CodeSearchNet, and synthetic bug-fix corpora. It supports a 32k context window, enabling multi-file refactoring or long diff review in a single call, and understands 30-plus programming languages with special attention to TypeScript, Go, and Terraform. Internal benchmarks show 5–8 pt gains over CodeLlama-34B-Python on HumanEval and competitive BugFix scores, thanks to a reinforcement pass that rewards compilable output. The model emits structured explanations alongside code blocks by default, making it suitable for educational tooling as well as production copilot scenarios. Cost-wise, Together AI prices it well below proprietary incumbents, so teams can scale interactive coding without runaway spend. Context: 32768
openrouter Microsoft: Phi 4 Reasoning Plus phi-4-reasoning-plus 0.07 0.35 Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer. While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed. Context: 32768
openrouter Inception: Mercury Coder mercury-coder 0.25 1.00 Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury). Context: 128000
openrouter Qwen: Qwen3 4B (free) qwen3-4b:free 0.00 0.00 Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows. Context: 40960
openrouter DeepSeek: DeepSeek Prover V2 deepseek-prover-v2 0.50 2.18 DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. It is likely an upgrade from [DeepSeek-Prover-V1.5](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL). Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description. Context: 163840
openrouter Meta: Llama Guard 4 12B llama-guard-4-12b 0.18 0.18 Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM—generating text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images. Context: 163840
openrouter Qwen: Qwen3 30B A3B qwen3-30b-a3b 0.06 0.22 Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models. Context: 40960
openrouter Qwen: Qwen3 8B qwen3-8b 0.04 0.14 Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling. Context: 128000
openrouter Qwen: Qwen3 14B qwen3-14b 0.05 0.22 Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
openrouter Qwen: Qwen3 32B qwen3-32b 0.08 0.24 Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling. Context: 40960
openrouter Qwen: Qwen3 235B A22B qwen3-235b-a22b 0.18 0.54 Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling. Context: 40960
openrouter TNG: DeepSeek R1T Chimera (free) deepseek-r1t-chimera:free 0.00 0.00 DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
openrouter TNG: DeepSeek R1T Chimera deepseek-r1t-chimera 0.30 1.20 DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use. Context: 163840
openrouter OpenAI: o4 Mini High o4-mini-high 1.10 4.40 OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
openrouter OpenAI: o3 o3 2.00 8.00 o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. Context: 200000
openrouter OpenAI: o4 Mini o4-mini 1.10 4.40 OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. Context: 200000
openrouter Qwen: Qwen2.5 Coder 7B Instruct qwen2.5-coder-7b-instruct 0.03 0.09 Qwen2.5-Coder-7B-Instruct is a 7B parameter instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE, SwiGLU, RMSNorm, and GQA attention with support for up to 128K tokens using YaRN-based extrapolation. It is trained on a large corpus of source code, synthetic data, and text-code grounding, providing robust performance across programming languages and agentic coding workflows. This model is part of the Qwen2.5-Coder family and offers strong compatibility with tools like vLLM for efficient deployment. Released under the Apache 2.0 license. Context: 32768
openrouter OpenAI: GPT-4.1 gpt-4.1 2.00 8.00 GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval. Context: 1047576
openrouter OpenAI: GPT-4.1 Mini gpt-4.1-mini 0.40 1.60 GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints. Context: 1047576
openrouter OpenAI: GPT-4.1 Nano gpt-4.1-nano 0.10 0.40 For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion. Context: 1047576
openrouter EleutherAI: Llemma 7b llemma_7b 0.80 1.20 Llemma 7B is a language model for mathematics. It was initialized with Code Llama 7B weights, and trained on the Proof-Pile-2 for 200B tokens. Llemma models are particularly strong at chain-of-thought mathematical reasoning and using computational tools for mathematics, such as Python and formal theorem provers. Context: 4096
openrouter AlfredPros: CodeLLaMa 7B Instruct Solidity codellama-7b-instruct-solidity 0.80 1.20 A fine-tuned 7-billion-parameter Code LLaMA Instruct model for generating Solidity smart contracts, trained with 4-bit QLoRA fine-tuning via the PEFT library. Context: 4096
openrouter xAI: Grok 3 Mini Beta grok-3-mini-beta 0.30 0.50 Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems. Transparent "thinking" traces are accessible. It defaults to low reasoning effort, which can be boosted by setting `reasoning: { effort: "high" }`. Note that there are two xAI endpoints for this model. By default, when using this model, we will always route you to the base endpoint. If you want the fast endpoint, add `provider: { sort: "throughput" }` to sort by throughput instead. Context: 131072
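A sketch combining the two request options mentioned above; the `reasoning` and `provider` fields are taken from the description, while the provider-prefixed slug and the prompt are illustrative assumptions.

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "x-ai/grok-3-mini-beta",  # provider-prefixed slug assumed
        "messages": [{"role": "user", "content": "How many primes are below 50?"}],
        "reasoning": {"effort": "high"},     # boost from the default low effort
        "provider": {"sort": "throughput"},  # route to the fast endpoint
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```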
openrouter xAI: Grok 3 Beta grok-3-beta 3.00 15.00 Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. It possesses deep domain knowledge in finance, healthcare, law, and science, and excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro, where it outperforms Grok 3 Mini even at high reasoning effort. Note that there are two xAI endpoints for this model. By default, when using this model, we will always route you to the base endpoint. If you want the fast endpoint, add `provider: { sort: "throughput" }` to sort by throughput instead. Context: 131072
openrouter NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 llama-3.1-nemotron-ultra-253b-v1 0.60 1.80 Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more. Context: 131072
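Since Nemotron Ultra's reasoning toggle is just a fixed system-prompt string, enabling it is a one-line change. A minimal sketch, with the provider-prefixed slug and the user task assumed for illustration:

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",  # slug assumed
        "messages": [
            # Exact toggle string from NVIDIA's usage recommendations above.
            {"role": "system", "content": "detailed thinking on"},
            {"role": "user", "content": "Outline a test plan for a CSV parser."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```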
openrouter Meta: Llama 4 Maverick llama-4-maverick 0.15 0.60 Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput. Context: 1048576
openrouter Meta: Llama 4 Scout llama-4-scout 0.08 0.30 Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025. Context: 327680
openrouter Qwen: Qwen2.5 VL 32B Instruct qwen2.5-vl-32b-instruct 0.05 0.22 Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation. Context: 16384
openrouter DeepSeek: DeepSeek V3 0324 deepseek-chat-v3-0324 0.19 0.87 DeepSeek V3 0324, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs well across a variety of tasks. Context: 163840
openrouter OpenAI: o1-pro o1-pro 150.00 600.00 The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. Context: 200000
openrouter Mistral: Mistral Small 3.1 24B (free) mistral-small-3.1-24b-instruct:free 0.00 0.00 Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 128000
openrouter Mistral: Mistral Small 3.1 24B mistral-small-3.1-24b-instruct 0.03 0.11 Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct) Context: 131072
openrouter AllenAI: Olmo 2 32B Instruct olmo-2-0325-32b-instruct 0.05 0.20 OLMo-2 32B Instruct is a supervised instruction-finetuned variant of the OLMo-2 32B March 2025 base model. It excels in complex reasoning and instruction-following tasks across diverse benchmarks such as GSM8K, MATH, IFEval, and general NLP evaluation. Developed by AI2, OLMo-2 32B is part of an open, research-oriented initiative, trained primarily on English-language datasets to advance the understanding and development of open-source language models. Context: 128000
openrouter Google: Gemma 3 4B (free) gemma-3-4b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 32768
openrouter Google: Gemma 3 4B gemma-3-4b-it 0.02 0.07 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Context: 96000
openrouter Google: Gemma 3 12B (free) gemma-3-12b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 32768
openrouter Google: Gemma 3 12B gemma-3-12b-it 0.03 0.10 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it) Context: 131072
openrouter Cohere: Command A command-a 2.50 10.00 Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary and open-weights models, Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks. Context: 256000
openrouter OpenAI: GPT-4o-mini Search Preview gpt-4o-mini-search-preview 0.15 0.60 GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
openrouter OpenAI: GPT-4o Search Preview gpt-4o-search-preview 2.50 10.00 GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries. Context: 128000
openrouter Google: Gemma 3 27B (free) gemma-3-27b-it:free 0.00 0.00 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
openrouter Google: Gemma 3 27B gemma-3-27b-it 0.04 0.06 Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it) Context: 131072
openrouter TheDrummer: Skyfall 36B V2 skyfall-36b-v2 0.55 0.80 Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling. Context: 32768
openrouter Microsoft: Phi 4 Multimodal Instruct phi-4-multimodal-instruct 0.05 0.10 Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the [Phi-4 Multimodal blog post](https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/). Context: 131072
openrouter Perplexity: Sonar Reasoning Pro sonar-reasoning-pro 2.00 8.00 Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for advanced use cases, it supports in-depth, multi-step queries with a larger context window and can surface more citations per search, enabling more comprehensive and extensible responses. Context: 128000
openrouter Perplexity: Sonar Pro sonar-pro 3.00 15.00 Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like double the number of citations per search as Sonar on average. Plus, with a larger context window, it can handle longer and more nuanced searches and follow-up questions. Context: 200000
openrouter Perplexity: Sonar Deep Research sonar-deep-research 2.00 8.00 Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Notes on pricing ([source](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-deep-research)): Input tokens comprise prompt tokens (the user prompt) plus citation tokens (processed tokens from running searches). Deep Research runs multiple searches to conduct exhaustive research; searches are priced at $5/1,000 searches, so a request that performs 30 searches costs $0.15 in this step. Reasoning is a distinct step in Deep Research, since the model reasons extensively through all the material it gathers during its research phase; these reasoning tokens differ from the CoTs in the answer, as they are used to reason through the research material prior to generating the outputs, and are priced at $3/1M tokens. Context: 128000
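Putting those per-unit prices together, here is a worked cost estimate for a single hypothetical Deep Research request. Every search and token count below is made up for illustration; only the unit prices come from the row and the pricing notes above.

```python
# Unit prices taken from the row and pricing notes above; counts are hypothetical.
searches = 30                # hypothetical
input_tokens = 20_000        # prompt + citation tokens (hypothetical)
reasoning_tokens = 100_000   # hypothetical
output_tokens = 5_000        # hypothetical

cost = (
    searches * 5.00 / 1_000          # $5 per 1,000 searches -> $0.15 here
    + input_tokens * 2.00 / 1e6      # $2.00 per 1M input tokens
    + reasoning_tokens * 3.00 / 1e6  # $3 per 1M reasoning tokens
    + output_tokens * 8.00 / 1e6     # $8.00 per 1M output tokens
)
print(f"${cost:.4f}")  # prints $0.5300
```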
openrouter Qwen: QwQ 32B qwq-32b 0.15 0.40 QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. Context: 32768
openrouter Google: Gemini 2.0 Flash Lite gemini-2.0-flash-lite-001 0.08 0.30 Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices. Context: 1048576
openrouter Anthropic: Claude 3.7 Sonnet (thinking) claude-3.7-sonnet:thinking 3.00 15.00 Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
openrouter Anthropic: Claude 3.7 Sonnet claude-3.7-sonnet 3.00 15.00 Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet) Context: 200000
openrouter Mistral: Saba mistral-saba 0.20 0.60 Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba) Context: 32768
openrouter Llama Guard 3 8B llama-guard-3-8b 0.02 0.06 Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls. Context: 131072
openrouter OpenAI: o3 Mini High o3-mini-high 1.10 4.40 OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
openrouter Google: Gemini 2.0 Flash gemini-2.0-flash-001 0.10 0.40 Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
openrouter Qwen: Qwen VL Plus qwen-vl-plus 0.21 0.63 Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers strong performance across a broad range of visual tasks. Context: 7500
openrouter AionLabs: Aion-1.0 aion-1.0 4.00 8.00 Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model. Context: 131072
openrouter AionLabs: Aion-1.0-Mini aion-1.0-mini 0.70 1.40 Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification. Context: 131072
openrouter AionLabs: Aion-RP 1.0 (8B) aion-rp-llama-3.1-8b 0.80 1.60 Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing. Context: 32768
openrouter Qwen: Qwen VL Max qwen-vl-max 0.80 3.20 Qwen VL Max is a visual understanding model that excels at delivering optimal performance for a broad spectrum of complex tasks. Context: 131072
openrouter Qwen: Qwen-Turbo qwen-turbo 0.05 0.20 Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks. Context: 1000000
openrouter Qwen: Qwen2.5 VL 72B Instruct qwen2.5-vl-72b-instruct 0.15 0.60 Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. Context: 32768
openrouter Qwen: Qwen-Plus qwen-plus 0.40 1.20 Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination. Context: 131072
openrouter Qwen: Qwen-Max qwen-max 1.60 6.40 Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. Its exact parameter count has not been disclosed. Context: 32768
openrouter OpenAI: o3 Mini o3-mini 1.10 4.40 OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high". The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. Context: 200000
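The entry above states that o3-mini accepts a `reasoning_effort` parameter with values "high", "medium" (default), or "low". A minimal sketch follows, assuming the OpenAI-style parameter is passed through unchanged by the endpoint.

```python
# Minimal sketch: setting reasoning_effort on o3-mini. Assumes an
# OpenAI-compatible endpoint that forwards the parameter to the model.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/o3-mini",
        "reasoning_effort": "high",  # "low" | "medium" (default) | "high"
        "messages": [{"role": "user", "content": "Prove sqrt(2) is irrational."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Per the description, requesting the `openai/o3-mini-high` slug instead should be equivalent to defaulting this parameter to "high".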
openrouter Mistral: Mistral Small 3 mistral-small-24b-instruct-2501 0.03 0.11 Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/) Context: 32768
openrouter DeepSeek: R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b 0.27 0.27 DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 72.6 - MATH-500 pass@1: 94.3 - CodeForces Rating: 1691 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
openrouter DeepSeek: R1 Distill Qwen 14B deepseek-r1-distill-qwen-14b 0.15 0.15 DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 69.7 - MATH-500 pass@1: 93.9 - CodeForces Rating: 1481 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 32768
openrouter Perplexity: Sonar Reasoning sonar-reasoning 1.00 5.00 Sonar Reasoning is a reasoning model provided by Perplexity based on [DeepSeek R1](/deepseek/deepseek-r1). It allows developers to utilize long chain of thought with built-in web search. Sonar Reasoning is uncensored and hosted in US datacenters. Context: 127000
openrouter Perplexity: Sonar sonar 1.00 1.00 Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features optimized for speed. Context: 127072
openrouter DeepSeek: R1 Distill Llama 70B deepseek-r1-distill-llama-70b 0.03 0.11 DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Context: 131072
openrouter DeepSeek: R1 deepseek-r1 0.70 2.40 DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120). MIT licensed: Distill & commercialize freely! Context: 163840
openrouter MiniMax: MiniMax-01 minimax-01 0.20 1.10 MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens. The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model. To read more about the release, see [the announcement](https://www.minimaxi.com/en/news/minimax-01-series-2). Context: 1000192
openrouter Microsoft: Phi 4 phi-4 0.06 0.14 [Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905) Context: 16384
openrouter Sao10K: Llama 3.1 70B Hanami x1 l3.1-70b-hanami-x1 3.00 3.00 This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b). Context: 16000
openrouter DeepSeek: DeepSeek V3 deepseek-chat 0.30 1.20 DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models. For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226). Context: 163840
openrouter Sao10K: Llama 3.3 Euryale 70B l3.3-euryale-70b 0.65 0.75 Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b). Context: 131072
openrouter OpenAI: o1 o1 15.00 60.00 The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Context: 200000
openrouter Cohere: Command R7B (12-2024) command-r7b-12-2024 0.04 0.15 Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps. Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Google: Gemini 2.0 Flash Experimental (free) gemini-2.0-flash-exp:free 0.00 0.00 Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences. Context: 1048576
openrouter Meta: Llama 3.3 70B Instruct (free) llama-3.3-70b-instruct:free 0.00 0.00 The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
openrouter Meta: Llama 3.3 70B Instruct llama-3.3-70b-instruct 0.10 0.32 The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md) Context: 131072
openrouter Amazon: Nova Lite 1.0 nova-lite-v1 0.06 0.24 Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input. Context: 300000
openrouter Amazon: Nova Micro 1.0 nova-micro-v1 0.04 0.14 Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities. Context: 128000
openrouter Amazon: Nova Pro 1.0 nova-pro-v1 0.80 3.20 Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and at analyzing financial documents. **NOTE**: Video input is not supported at this time. Context: 300000
openrouter OpenAI: GPT-4o (2024-11-20) gpt-4o-2024-11-20 2.50 10.00 The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. Context: 128000
openrouter Mistral Large 2411 mistral-large-2411 2.00 6.00 Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411). It provides a significant upgrade on the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable improvements in long context understanding, a new system prompt, and more accurate function calling. Context: 131072
openrouter Mistral Large 2407 mistral-large-2407 2.00 6.00 This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents. Context: 131072
openrouter Mistral: Pixtral Large 2411 pixtral-large-2411 2.00 6.00 Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes. Context: 131072
openrouter Qwen2.5 Coder 32B Instruct qwen-2.5-coder-32b-instruct 0.03 0.11 Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning**, and **code fixing**. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. To read more about its evaluation results, check out [Qwen 2.5 Coder's blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/). Context: 32768
openrouter SorcererLM 8x22B sorcererlm-8x22b 4.50 4.50 SorcererLM is an advanced RP and storytelling model, built as a low-rank 16-bit LoRA fine-tune of [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b). - Advanced reasoning and emotional intelligence for engaging and immersive interactions - Vivid writing capabilities enriched with spatial and contextual awareness - Enhanced narrative depth, promoting creative and dynamic storytelling Context: 16000
openrouter TheDrummer: UnslopNemo 12B unslopnemo-12b 0.40 0.40 UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios. Context: 32768
openrouter Anthropic: Claude 3.5 Haiku (2024-10-22) claude-3.5-haiku-20241022 0.80 4.00 Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries. It does not support image inputs. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use) Context: 200000
openrouter Anthropic: Claude 3.5 Haiku claude-3.5-haiku 0.80 4.00 Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions. This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems. This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022). Context: 200000
openrouter Anthropic: Claude 3.5 Sonnet claude-3.5-sonnet 6.00 30.00 New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at: - Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems) #multimodal Context: 200000
openrouter Magnum v4 72B magnum-v4-72b 3.00 5.00 This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct). Context: 16384
openrouter Mistral: Ministral 8B ministral-8b 0.10 0.10 Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications. Context: 131072
openrouter Mistral: Ministral 3B ministral-3b 0.04 0.04 Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference. Context: 131072
openrouter Qwen: Qwen2.5 7B Instruct qwen-2.5-7b-instruct 0.04 0.10 Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON; more resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context support up to 128K tokens, with generation of up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter NVIDIA: Llama 3.1 Nemotron 70B Instruct llama-3.1-nemotron-70b-instruct 1.20 1.20 NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Inflection: Inflection 3 Productivity inflection-3-productivity 2.50 10.00 Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional intelligence similar to Pi, see [Inflection 3 Pi](/inflection/inflection-3-pi). See [Inflection's announcement](https://inflection.ai/blog/enterprise) for more details. Context: 8000
openrouter Inflection: Inflection 3 Pi inflection-3-pi 2.50 10.00 Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi has been trained to mirror your tone and style: if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles. Context: 8000
openrouter TheDrummer: Rocinante 12B rocinante-12b 0.17 0.43 Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives - Adventure-filled and captivating stories Context: 32768
openrouter Meta: Llama 3.2 90B Vision Instruct llama-3.2-90b-vision-instruct 0.35 0.40 The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 32768
openrouter Meta: Llama 3.2 11B Vision Instruct llama-3.2-11b-vision-instruct 0.05 0.05 Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.2 1B Instruct llama-3.2-1b-instruct 0.03 0.20 Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 60000
openrouter Meta: Llama 3.2 3B Instruct (free) llama-3.2-3b-instruct:free 0.00 0.00 Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.2 3B Instruct llama-3.2-3b-instruct 0.02 0.02 Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/). Context: 131072
openrouter Qwen2.5 72B Instruct qwen-2.5-72b-instruct 0.12 0.39 Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON; more resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context support up to 128K tokens, with generation of up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter NeverSleep: Lumimaid v0.2 8B llama-3.1-lumimaid-8b 0.09 0.60 Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/models/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1; sloppy chat outputs were purged. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
openrouter Mistral: Pixtral 12B pixtral-12b 0.10 0.10 The first multimodal, text+image-to-text model from Mistral AI. Its weights were launched [via torrent](https://x.com/mistralai/status/1833758285167722836). Context: 32768
openrouter Cohere: Command R (08-2024) command-r-08-2024 0.15 0.60 command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model. Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Cohere: Command R+ (08-2024) command-r-plus-08-2024 2.50 10.00 command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same. Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement). Context: 128000
openrouter Qwen: Qwen2.5-VL 7B Instruct (free) qwen-2.5-vl-7b-instruct:free 0.00 0.00 Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolutions & ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agentic device operation: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions. - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
openrouter Qwen: Qwen2.5-VL 7B Instruct qwen-2.5-vl-7b-instruct 0.20 0.20 Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolutions & ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agentic device operation: with complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions. - Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). Context: 32768
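The two Qwen2.5-VL entries above describe multimodal input; a minimal sketch of an image-plus-text request using OpenAI-style content parts follows. The full slug `qwen/qwen-2.5-vl-7b-instruct` and the image URL are assumptions for illustration.

```python
# Minimal sketch: one image URL plus a text question sent to Qwen2.5-VL,
# using OpenAI-style content parts. Model slug and URL are placeholders.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen-2.5-vl-7b-instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```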
openrouter Sao10K: Llama 3.1 Euryale 70B v2.2 l3.1-euryale-70b 0.65 0.75 Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b). Context: 32768
openrouter Microsoft: Phi-3.5 Mini 128K Instruct phi-3.5-mini-128k-instruct 0.10 0.10 Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/models/microsoft/phi-3-mini-128k-instruct). The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters. Context: 128000
openrouter Nous: Hermes 3 70B Instruct hermes-3-llama-3.1-70b 0.30 0.30 Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior finetune of the [Llama-3.1 70B foundation model](/models/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Context: 65536
openrouter Nous: Hermes 3 405B Instruct (free) hermes-3-llama-3.1-405b:free 0.00 0.00 Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
openrouter Nous: Hermes 3 405B Instruct hermes-3-llama-3.1-405b 1.00 1.00 Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two. Context: 131072
openrouter OpenAI: ChatGPT-4o chatgpt-4o-latest 5.00 15.00 OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation. OpenAI notes that this model is not suited for production use-cases as it may be removed or redirected to another model in the future. Context: 128000
openrouter Sao10K: Llama 3 8B Lunaris l3-lunaris-8b 0.04 0.05 Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning. For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1. Context: 8192
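The Lunaris entry above recommends specific sampling settings (temperature 1.4, min_p 0.1). A minimal sketch applying them follows; min_p support varies by provider, and the full slug `sao10k/l3-lunaris-8b` is assumed from the listed ID.

```python
# Minimal sketch: the author-recommended sampling settings for Lunaris.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "sao10k/l3-lunaris-8b",
        "temperature": 1.4,  # recommended by the model author
        "min_p": 0.1,        # prune tokens below 10% of the top token's probability
        "messages": [{"role": "user", "content": "Narrate a storm at sea."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```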
openrouter OpenAI: GPT-4o (2024-08-06) gpt-4o-2024-08-06 2.50 10.00 The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) Context: 128000
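Since this entry highlights supplying a JSON schema via `response_format`, here is a minimal sketch using the OpenAI-style strict structured-outputs shape; the schema itself is a made-up example.

```python
# Minimal sketch: structured outputs with gpt-4o-2024-08-06 via a JSON
# schema in response_format. The schema below is illustrative only.
import os
import requests

schema = {
    "name": "city_info",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "population": {"type": "integer"},
        },
        "required": ["city", "population"],
        "additionalProperties": False,
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o-2024-08-06",
        "response_format": {"type": "json_schema", "json_schema": schema},
        "messages": [{"role": "user", "content": "Largest city in Japan?"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # JSON matching the schema
```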
openrouter Meta: Llama 3.1 405B (base) llama-3.1-405b 4.00 4.00 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 32768
openrouter Meta: Llama 3.1 405B Instruct (free) llama-3.1-405b-instruct:free 0.00 0.00 The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs. Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.1 405B Instruct llama-3.1-405b-instruct 3.50 3.50 The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs. Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 10000
openrouter Meta: Llama 3.1 8B Instruct llama-3.1-8b-instruct 0.02 0.03 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Meta: Llama 3.1 70B Instruct llama-3.1-70b-instruct 0.40 0.40 Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 131072
openrouter Mistral: Mistral Nemo mistral-nemo 0.02 0.04 A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license. Context: 131072
openrouter OpenAI: GPT-4o-mini gpt-4o-mini 0.15 0.60 GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 for chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal Context: 128000
openrouter OpenAI: GPT-4o-mini (2024-07-18) gpt-4o-mini-2024-07-18 0.15 0.60 GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 for chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal Context: 128000
openrouter Google: Gemma 2 27B gemma-2-27b-it 0.65 0.65 Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
openrouter Google: Gemma 2 9B gemma-2-9b-it 0.03 0.09 Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Context: 8192
openrouter Sao10k: Llama 3 Euryale 70B v2.1 l3-euryale-70b 1.48 1.48 Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom formatting / reply formats. - Very creative, lots of unique swipes. - Is not restrictive during roleplays. Context: 8192
openrouter Mistral: Mistral 7B Instruct (free) mistral-7b-instruct:free 0.00 0.00 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
openrouter Mistral: Mistral 7B Instruct mistral-7b-instruct 0.03 0.05 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.* Context: 32768
openrouter Mistral: Mistral 7B Instruct v0.3 mistral-7b-instruct-v0.3 0.20 0.20 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct v0.2](/models/mistralai/mistral-7b-instruct-v0.2), with the following changes: - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling NOTE: Support for function calling depends on the provider. Context: 32768
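The v0.3 entry above notes function-calling support (provider-dependent). A minimal sketch with an OpenAI-style tool definition follows; the `get_weather` tool is hypothetical, and the full slug `mistralai/mistral-7b-instruct-v0.3` is assumed from the listed ID.

```python
# Minimal sketch: function calling with Mistral 7B Instruct v0.3. As the
# entry notes, tool support depends on the provider serving the model.
import os
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mistral-7b-instruct-v0.3",
        "tools": tools,
        "messages": [{"role": "user", "content": "Weather in Paris?"}],
    },
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("tool_calls") or message["content"])
```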
openrouter NousResearch: Hermes 2 Pro - Llama-3 8B hermes-2-pro-llama-3-8b 0.03 0.08 Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. Context: 8192
openrouter Microsoft: Phi-3 Mini 128K Instruct phi-3-mini-128k-instruct 0.10 0.10 Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Mini demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date. Context: 128000
openrouter Microsoft: Phi-3 Medium 128K Instruct phi-3-medium-128k-instruct 1.00 1.00 Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance. For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct). Context: 128000
openrouter OpenAI: GPT-4o (2024-05-13) gpt-4o-2024-05-13 5.00 15.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter OpenAI: GPT-4o gpt-4o 2.50 10.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter OpenAI: GPT-4o (extended) gpt-4o:extended 6.00 18.00 GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal Context: 128000
openrouter Meta: LlamaGuard 2 8B llama-guard-2-8b 0.20 0.20 This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
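The entry above recommends raw prompt input via the `/completions` endpoint rather than the chat API; a minimal sketch of that usage follows. The exact Llama Guard 2 prompt template is model-specific (see its model card), so the prompt here is left as a fill-in, and the full slug `meta-llama/llama-guard-2-8b` is assumed from the listed ID.

```python
# Minimal sketch: raw-prompt classification with LlamaGuard 2 through the
# text /completions endpoint instead of the chat API.
import os
import requests

raw_prompt = "..."  # fill in: the Llama Guard 2 template wrapping the content to classify

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={"model": "meta-llama/llama-guard-2-8b", "prompt": raw_prompt},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])  # "safe", or "unsafe" + violated categories
```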
openrouter Meta: Llama 3 70B Instruct llama-3-70b-instruct 0.30 0.40 Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
openrouter Meta: Llama 3 8B Instruct llama-3-8b-instruct 0.03 0.06 Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/). Context: 8192
openrouter Mistral: Mixtral 8x22B Instruct mixtral-8x22b-instruct 2.00 6.00 Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe Context: 65536
openrouter WizardLM-2 8x22B wizardlm-2-8x22b 0.48 0.48 WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe Context: 65536
openrouter OpenAI: GPT-4 Turbo gpt-4-turbo 10.00 30.00 The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023. Context: 128000
openrouter Anthropic: Claude 3 Haiku claude-3-haiku 0.25 1.25 Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal Context: 200000
openrouter Anthropic: Claude 3 Opus claude-3-opus 15.00 75.00 Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal Context: 200000
openrouter Mistral Large mistral-large 2.00 6.00 This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents. Context: 128000
openrouter OpenAI: GPT-3.5 Turbo (older v0613) gpt-3.5-turbo-0613 1.00 2.00 GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021. Context: 4095
openrouter OpenAI: GPT-4 Turbo Preview gpt-4-turbo-preview 10.00 30.00 The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023. **Note:** heavily rate limited by OpenAI while in preview. Context: 128000
openrouter Mistral Tiny mistral-tiny 0.25 0.25 Note: This model is being deprecated. The recommended replacement is the newer [Ministral 8B](/mistralai/ministral-8b). This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial. Context: 32768
openrouter Mistral: Mistral 7B Instruct v0.2 mistral-7b-instruct-v0.2 0.20 0.20 A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct-v0.1), with the following changes: a 32k context window (vs. 8k in v0.1), rope-theta raised to 1e6, and no sliding-window attention. Context: 32768
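The v0.2 changes listed above are all visible in the model's published configuration. A small sketch that reads them from the Hugging Face Hub, assuming the `transformers` library and the public `mistralai/Mistral-7B-Instruct-v0.2` repository:

```python
# Sketch: confirming the v0.1 -> v0.2 config changes directly from
# the model's config.json on the Hub.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print(cfg.max_position_embeddings)  # 32768 -> the 32k context window
print(cfg.rope_theta)               # 1e6, up from 1e4 in v0.1
print(cfg.sliding_window)           # None -> sliding-window attention off
```

Raising rope-theta stretches the rotary position embeddings so attention stays usable at longer distances, which is what lets v0.2 drop sliding-window attention while quadrupling the context.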
openrouter Mistral: Mixtral 8x7B Instruct mixtral-8x7b-instruct 0.54 0.54 Mixtral 8x7B Instruct is a pretrained generative sparse mixture-of-experts model by Mistral AI, for chat and instruction use. It incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters, and was instruct fine-tuned by Mistral. #moe Context: 32768
openrouter Noromaid 20B noromaid-20b 1.00 1.75 A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge. #merge #uncensored Context: 4096
openrouter Goliath 120B goliath-120b 6.00 8.00 A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to [@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit), the framework used to merge the model, and to [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios. #merge Context: 6144
openrouter Auto Router auto - - Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model. The meta-model is powered by [Not Diamond](https://docs.notdiamond.ai/docs/how-not-diamond-works). Learn more in our [docs](/docs/model-routing). Requests will be routed to the following models: - [openai/gpt-5.1](/openai/gpt-5.1) - [openai/gpt-5](/openai/gpt-5) - [openai/gpt-5-mini](/openai/gpt-5-mini) - [openai/gpt-5-nano](/openai/gpt-5-nano) - [openai/gpt-4.1](/openai/gpt-4.1) - [openai/gpt-4.1-mini](/openai/gpt-4.1-mini) - [openai/gpt-4.1-nano](/openai/gpt-4.1-nano) - [openai/gpt-4o](/openai/gpt-4o) - [openai/gpt-4o-2024-05-13](/openai/gpt-4o-2024-05-13) - [openai/gpt-4o-2024-08-06](/openai/gpt-4o-2024-08-06) - [openai/gpt-4o-2024-11-20](/openai/gpt-4o-2024-11-20) - [openai/gpt-4o-mini](/openai/gpt-4o-mini) - [openai/gpt-4o-mini-2024-07-18](/openai/gpt-4o-mini-2024-07-18) - [openai/gpt-4-turbo](/openai/gpt-4-turbo) - [openai/gpt-4-turbo-preview](/openai/gpt-4-turbo-preview) - [openai/gpt-4-1106-preview](/openai/gpt-4-1106-preview) - [openai/gpt-4](/openai/gpt-4) - [openai/gpt-3.5-turbo](/openai/gpt-3.5-turbo) - [openai/gpt-oss-120b](/openai/gpt-oss-120b) - [anthropic/claude-opus-4.5](/anthropic/claude-opus-4.5) - [anthropic/claude-opus-4.1](/anthropic/claude-opus-4.1) - [anthropic/claude-opus-4](/anthropic/claude-opus-4) - [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5) - [anthropic/claude-sonnet-4](/anthropic/claude-sonnet-4) - [anthropic/claude-3.7-sonnet](/anthropic/claude-3.7-sonnet) - [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5) - [anthropic/claude-3.5-haiku](/anthropic/claude-3.5-haiku) - [anthropic/claude-3-haiku](/anthropic/claude-3-haiku) - [google/gemini-3-pro-preview](/google/gemini-3-pro-preview) - [google/gemini-2.5-pro](/google/gemini-2.5-pro) - [google/gemini-2.0-flash-001](/google/gemini-2.0-flash-001) - [google/gemini-2.5-flash](/google/gemini-2.5-flash) - [mistralai/mistral-large](/mistralai/mistral-large) - [mistralai/mistral-large-2407](/mistralai/mistral-large-2407) - [mistralai/mistral-large-2411](/mistralai/mistral-large-2411) - [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1) - [mistralai/mistral-nemo](/mistralai/mistral-nemo) - [mistralai/mistral-7b-instruct](/mistralai/mistral-7b-instruct) - [mistralai/mixtral-8x7b-instruct](/mistralai/mixtral-8x7b-instruct) - [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct) - [mistralai/codestral-2508](/mistralai/codestral-2508) - [x-ai/grok-4](/x-ai/grok-4) - [x-ai/grok-3](/x-ai/grok-3) - [x-ai/grok-3-mini](/x-ai/grok-3-mini) - [deepseek/deepseek-r1](/deepseek/deepseek-r1) - [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct) - [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct) - [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct) - [meta-llama/llama-3.1-8b-instruct](/meta-llama/llama-3.1-8b-instruct) - [meta-llama/llama-3-70b-instruct](/meta-llama/llama-3-70b-instruct) - [meta-llama/llama-3-8b-instruct](/meta-llama/llama-3-8b-instruct) - [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b) - [qwen/qwen3-32b](/qwen/qwen3-32b) - [qwen/qwen3-14b](/qwen/qwen3-14b) - [cohere/command-r-plus-08-2024](/cohere/command-r-plus-08-2024) - [cohere/command-r-08-2024](/cohere/command-r-08-2024) - [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking) - [perplexity/sonar](/perplexity/sonar) Context: 2000000
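Because the Auto Router bills at the routed model's rate, reading the `model` attribute back from the response is the programmatic way to see what you were charged for. A hedged sketch, assuming the OpenAI-compatible endpoint and the `openrouter/auto` slug that follows this list's provider/model convention:

```python
# Sketch: send a prompt to the meta-model router and inspect which
# underlying model actually served the request.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="YOUR_OPENROUTER_KEY")  # placeholder key

resp = client.chat.completions.create(
    model="openrouter/auto",  # assumed slug; the router picks the target
    messages=[{"role": "user",
               "content": "Summarize RFC 2119 in one line."}],
)
# The routed model's id, which also determines the billed rate:
print(resp.model)
print(resp.choices[0].message.content)
```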
openrouter OpenAI: GPT-4 Turbo (older v1106) gpt-4-1106-preview 10.00 30.00 An earlier GPT-4 Turbo preview (v1106) with improved instruction following, JSON mode, reproducible outputs, and parallel function calling. Training data: up to April 2023. Context: 128000
openrouter Mistral: Mistral 7B Instruct v0.1 mistral-7b-instruct-v0.1 0.11 0.19 A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. Context: 2824
openrouter OpenAI: GPT-3.5 Turbo Instruct gpt-3.5-turbo-instruct 1.50 2.00 This model is a variant of GPT-3.5 Turbo tuned for instructional prompts, omitting chat-related optimizations. Training data: up to Sep 2021. Context: 4095
openrouter OpenAI: GPT-3.5 Turbo 16k gpt-3.5-turbo-16k 3.00 4.00 This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021. Context: 16385
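The "approximately 20 pages" claim above is easy to sanity-check with common rules of thumb; the words-per-token and words-per-page constants below are generic assumptions, not figures from this entry:

```python
# Back-of-envelope: how many pages of text fit in a 16k context?
context_tokens = 16_385
words = context_tokens * 0.75  # ~0.75 English words per token (rule of thumb)
pages = words / 600            # ~600 words per dense page (assumption)
print(round(pages))            # ~20 pages
```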
openrouter Mancer: Weaver (alpha) weaver 0.75 1.00 An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations. Context: 8000
openrouter ReMM SLERP 13B remm-slerp-l2-13b 0.45 0.65 A recreation trial of the original MythoMax-L2-13B but with updated models. #merge Context: 6144
openrouter MythoMax 13B mythomax-l2-13b 0.06 0.06 One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge Context: 4096
openrouter OpenAI: GPT-4 (older v0314) gpt-4-0314 30.00 60.00 GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021. Context: 8191
openrouter OpenAI: GPT-4 gpt-4 30.00 60.00 OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021. Context: 8191
openrouter OpenAI: GPT-3.5 Turbo gpt-3.5-turbo 0.50 1.50 GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021. Context: 16385
factoryai glm-4.6 glm-4.6 - - -
factoryai claude-haiku-4-5-20251001 claude-haiku-4-5-20251001 - - -
factoryai gpt-5.1 gpt-5.1 - - -
factoryai gpt-5.1-codex gpt-5.1-codex - - -
factoryai gpt-5.1-codex-max gpt-5.1-codex-max - - -
factoryai gpt-5.2 gpt-5.2 - - -
factoryai gemini-3-pro-preview gemini-3-pro-preview - - -
factoryai gemini-3-flash-preview gemini-3-flash-preview - - -
factoryai claude-sonnet-4-5-20250929 claude-sonnet-4-5-20250929 - - -
factoryai claude-opus-4-5-20251101 claude-opus-4-5-20251101 - - -
zai GLM-4.7 glm-4.7 0.60 0.11 -
zai GLM-4.6 glm-4.6 0.60 0.11 -
zai GLM-4.6V glm-4.6v 0.30 0.05 -
zai GLM-4.6V-FlashX glm-4.6v-flashx 0.04 0.00 -
zai GLM-4.5 glm-4.5 0.60 0.11 -
zai GLM-4.5V glm-4.5v 0.60 0.11 -
zai GLM-4.5-X glm-4.5-x 2.20 0.45 -
zai GLM-4.5-Air glm-4.5-air 0.20 0.03 -
zai GLM-4.5-AirX glm-4.5-airx 1.10 0.22 -
zai GLM-4-32B-0414-128K glm-4-32b-0414-128k 0.10 - -
zai GLM-4.6V-Flash glm-4.6v-flash 0.00 0.00 -
zai GLM-4.5-Flash glm-4.5-flash 0.00 0.00 -