
Qwen3 Next 80B A3B Thinking

qwen3-next-80b-a3b-thinking

A new generation of Qwen3-based open-source thinking-mode models. This version offers improved instruction following and more streamlined summary responses than the previous iteration (Qwen3-235B-A22B-Thinking-2507).

Available at 12 Providers

| Provider | Source | Input ($/1M) | Output ($/1M) | Description | Free |
| --- | --- | --- | --- | --- | --- |
| vercel | vercel | $0.15 | $1.50 | A new generation of Qwen3-based open-source thinking-mode models, with improved instruction following and more streamlined summary responses than the previous iteration (Qwen3-235B-A22B-Thinking-2507). | |
| together | together | $0.15 | $1.50 | - | |
| alibaba | models-dev | $0.50 | $6.00 | Context: 131072, Output Limit: 32768 | |
| nvidia | models-dev | $0.00 | $0.00 | Context: 262144, Output Limit: 16384 | ✓ |
| alibabacn | models-dev | $0.14 | $1.43 | Alibaba (China); Context: 131072, Output Limit: 32768 | |
| siliconflowcn | models-dev | $0.14 | $0.57 | SiliconFlow (China); Context: 262000, Output Limit: 262000 | |
| cortecs | models-dev | $0.16 | $1.31 | Context: 128000, Output Limit: 128000 | |
| siliconflow | models-dev | $0.14 | $0.57 | Context: 262000, Output Limit: 262000 | |
| huggingface | models-dev | $0.30 | $2.00 | Context: 262144, Output Limit: 131072 | |
| deepinfra | litellm | $0.14 | $1.40 | Source: deepinfra; Context: 262144 | |
| fireworksai | litellm | $0.90 | $0.90 | Source: fireworks_ai; Context: 4096 | |
| openrouter | openrouter | $0.15 | $1.20 | Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured "thinking" traces by default. It is designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode. Context: 262144 | |
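The $/1M-token prices above can be turned into a per-request cost estimate. A minimal sketch, using the vercel rates from the table ($0.15 input, $1.50 output per 1M tokens); the token counts are hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given prices in $ per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: 50k prompt tokens plus 8k completion tokens at $0.15 / $1.50 per 1M.
cost = request_cost(50_000, 8_000, 0.15, 1.50)
print(f"${cost:.4f}")  # → $0.0195
```

Note that for a thinking-only model the emitted reasoning trace is billed as output tokens, so completions tend to be long and the output rate usually dominates the bill.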