
Qwen3 Next 80B A3B Thinking

qwen3-next-80b-a3b-thinking

A new generation of Qwen3-based open-source thinking-mode models. This version offers improved instruction following and more streamlined summary responses than the previous iteration (Qwen3-235B-A22B-Thinking-2507).

Available at 12 Providers

| Provider | Source | Input ($/1M) | Output ($/1M) | Description | Free |
| --- | --- | --- | --- | --- | --- |
| vercel | vercel | $0.15 | $1.50 | A new generation of Qwen3-based open-source thinking-mode models, with improved instruction following and more streamlined summary responses than the previous iteration (Qwen3-235B-A22B-Thinking-2507). | |
| together | together | $0.15 | $1.50 | - | |
| alibaba | models-dev | $0.50 | $6.00 | Context: 131072, Output Limit: 32768 | |
| nvidia | models-dev | $0.00 | $0.00 | Context: 262144, Output Limit: 16384 | ✓ |
| alibabacn | models-dev | $0.14 | $1.43 | Alibaba (China); Context: 131072, Output Limit: 32768 | |
| siliconflowcn | models-dev | $0.14 | $0.57 | SiliconFlow (China); Context: 262000, Output Limit: 262000 | |
| cortecs | models-dev | $0.16 | $1.31 | Context: 128000, Output Limit: 128000 | |
| siliconflow | models-dev | $0.14 | $0.57 | Context: 262000, Output Limit: 262000 | |
| huggingface | models-dev | $0.30 | $2.00 | Context: 262144, Output Limit: 131072 | |
| deepinfra | litellm | $0.14 | $1.40 | Source: deepinfra; Context: 262144 | |
| fireworksai | litellm | $0.90 | $0.90 | Source: fireworks_ai; Context: 4096 | |
| openrouter | openrouter | $0.15 | $1.20 | Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured "thinking" traces by default. It is designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode. Context: 262144 | |
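The $/1M-token prices above can be turned into a per-request cost estimate. A minimal sketch, using the vercel rates from the table ($0.15 input, $1.50 output per 1M tokens); the token counts are hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given prices in $ per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: 50k prompt tokens plus 8k completion tokens at $0.15 / $1.50 per 1M.
cost = request_cost(50_000, 8_000, 0.15, 1.50)
print(f"${cost:.4f}")  # → $0.0195
```

Note that for a thinking-only model the emitted reasoning trace is billed as output tokens, so completions tend to be long and the output rate usually dominates the bill.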