Kimi K2 Thinking

Provider	Source	Input Price ($/1M)	Output Price ($/1M)	Description
vercel	vercel	Input: $0.47	Output: $2.00	Kimi K2 Thinking is an advanced open-source thinking model by Moonshot AI. It can execute up to 200–300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
together	together	Input: $1.20	Output: $4.00	-
poe	poe	Input: $6,700.00	Output: -	Built as a thinking agent, it performs step-by-step reasoning while utilizing tools, achieving state-of-the-art performance on benchmarks such as Humanity's Last Exam (HLE), BrowseComp, and others. The model demonstrates substantial advancements in reasoning, agentic search, coding, writing, and general problem-solving capabilities. Kimi K2 Thinking is capable of executing 200–300 sequential tool calls autonomously, maintaining coherent reasoning across hundreds of steps to solve complex tasks. File support: text, Markdown, and PDF files. Context window: 256k tokens.
moonshotaicn	models-dev	Input: $0.60	Output: $2.50	Provider: Moonshot AI (China), Context: 262144, Output Limit: 262144
moonshotai	models-dev	Input: $0.60	Output: $2.50	Provider: Moonshot AI, Context: 262144, Output Limit: 262144
nvidia	models-dev	Input: $0.00	Output: $0.00	Provider: Nvidia, Context: 262144, Output Limit: 262144
venice	models-dev	Input: $0.75	Output: $3.20	Provider: Venice AI, Context: 262144, Output Limit: 65536
siliconflowcn	models-dev	Input: $0.55	Output: $2.50	Provider: SiliconFlow (China), Context: 262000, Output Limit: 262000
kimiforcoding	models-dev	Input: $0.00	Output: $0.00	Provider: Kimi For Coding, Context: 262144, Output Limit: 32768
cortecs	models-dev	Input: $0.66	Output: $2.73	Provider: Cortecs, Context: 262000, Output Limit: 262000
togetherai	models-dev	Input: $1.20	Output: $4.00	Provider: Together AI, Context: 262144, Output Limit: 32768
azure	models-dev	Input: $0.60	Output: $2.50	Provider: Azure, Context: 262144, Output Limit: 262144
baseten	models-dev	Input: $0.60	Output: $2.50	Provider: Baseten, Context: 262144, Output Limit: 262144
siliconflow	models-dev	Input: $0.55	Output: $2.50	Provider: SiliconFlow, Context: 262000, Output Limit: 262000
helicone	models-dev	Input: $0.48	Output: $2.00	Provider: Helicone, Context: 256000, Output Limit: 262144
opencode	models-dev	Input: $0.40	Output: $2.50	Provider: OpenCode Zen, Context: 262144, Output Limit: 262144
zenmux	models-dev	Input: $0.60	Output: $2.50	Provider: ZenMux, Context: 262144, Output Limit: 64000
iflowcn	models-dev	Input: $0.00	Output: $0.00	Provider: iFlow, Context: 128000, Output Limit: 64000
synthetic	models-dev	Input: $0.55	Output: $2.19	Provider: Synthetic, Context: 262144, Output Limit: 262144
deepinfra	models-dev	Input: $0.47	Output: $2.00	Provider: Deep Infra, Context: 131072, Output Limit: 32768
nanogpt	models-dev	Input: $1.00	Output: $2.00	Provider: NanoGPT, Context: 32768, Output Limit: 8192
fireworksai	models-dev	Input: $0.60	Output: $2.50	Provider: Fireworks AI, Context: 256000, Output Limit: 256000
ionet	models-dev	Input: $0.55	Output: $2.25	Provider: IO.NET, Context: 32768, Output Limit: 4096
azurecognitiveservices	models-dev	Input: $0.60	Output: $2.50	Provider: Azure Cognitive Services, Context: 262144, Output Limit: 262144
moonshot	litellm	Input: $0.60	Output: $2.50	Source: moonshot, Context: 262144
openrouter	openrouter	Input: $0.32	Output: $0.48	Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports a 256k-token context window. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift. It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks. Context: 262144

Available at 26 Providers
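
Because every price in the table is quoted per 1M tokens, the cost of a single request can be estimated by scaling each rate by the token counts. The following is a minimal sketch, not any provider's billing logic: the `PRICES` dictionary and the `request_cost` function are hypothetical names, the rates are copied from a few rows of the table above, and the calculation assumes that reasoning tokens are billed as output tokens and that no caching discounts or minimum charges apply.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
# Only a few providers are listed here for illustration.
PRICES = {
    "vercel": (0.47, 2.00),
    "together": (1.20, 4.00),
    "moonshotai": (0.60, 2.50),
    "openrouter": (0.32, 0.48),
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # table prices are quoted per 1M tokens


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes reasoning tokens are already included in output_tokens.
    """
    input_price, output_price = PRICES[provider]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare a 200k-token prompt with a 50k-token response across providers.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 200_000, 50_000):.4f}")
```

For long agentic runs with hundreds of tool calls, output tokens tend to dominate the total, so the output rate and each provider's output limit are worth comparing alongside the input rate.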