gpt-oss-120b

Extremely capable general-purpose LLM with strong, controllable reasoning capabilities

Available at 29 Providers

| Provider | Source | Input ($/1M) | Output ($/1M) | Context | Max Output | Notes |
|----------|--------|--------------|---------------|---------|------------|-------|
| vercel | vercel | $0.10 | $0.50 | — | — | Extremely capable general-purpose LLM with strong, controllable reasoning capabilities |
| together | together | $0.15 | $0.60 | — | — | — |
| poe | poe | $1,200.00 | — | 128,000 | — | Text-only; attachments not supported. See the Poe description below. |
| vultr | models-dev | $0.20 | $0.20 | 121,808 | 8,192 | — |
| nvidia | models-dev | $0.00 | $0.00 | 128,000 | 8,192 | — |
| groq | models-dev | $0.15 | $0.60 | 131,072 | 65,536 | — |
| nebius | models-dev | $0.15 | $0.60 | 131,072 | 8,192 | Nebius Token Factory |
| siliconflowcn | models-dev | $0.05 | $0.45 | 131,000 | 8,000 | SiliconFlow (China) |
| cortecs | models-dev | $0.00 | $0.00 | 128,000 | 128,000 | — |
| togetherai | models-dev | $0.15 | $0.60 | 131,072 | 131,072 | Together AI |
| siliconflow | models-dev | $0.05 | $0.45 | 131,000 | 8,000 | — |
| helicone | models-dev | $0.04 | $0.16 | 131,072 | 131,072 | — |
| fastrouter | models-dev | $0.15 | $0.60 | 131,072 | 32,768 | FastRouter |
| cloudflareworkersai | models-dev | $0.35 | $0.75 | 128,000 | 128,000 | Cloudflare Workers AI |
| cloudflareaigateway | models-dev | $0.35 | $0.75 | 128,000 | 16,384 | Cloudflare AI Gateway |
| ovhcloud | models-dev | $0.09 | $0.47 | 131,000 | 131,000 | OVHcloud AI Endpoints |
| synthetic | models-dev | $0.10 | $0.10 | 128,000 | 32,768 | — |
| deepinfra | models-dev | $0.05 | $0.24 | 131,072 | 16,384 | Deep Infra |
| submodel | models-dev | $0.10 | $0.50 | 131,072 | 32,768 | — |
| nanogpt | models-dev | $1.00 | $2.00 | 128,000 | 8,192 | NanoGPT |
| fireworksai | models-dev | $0.15 | $0.60 | 131,072 | 32,768 | Fireworks AI |
| ionet | models-dev | $0.04 | $0.40 | 131,072 | 4,096 | IO.NET |
| scaleway | models-dev | $0.15 | $0.60 | 128,000 | 8,192 | — |
| cerebras | models-dev | $0.25 | $0.69 | 131,072 | 32,768 | — |
| azureai | litellm | $0.15 | $0.60 | 131,072 | — | litellm source: azure_ai |
| sambanova | litellm | $3.00 | $4.50 | 131,072 | — | — |
| wandb | litellm | $15,000.00 | $60,000.00 | 131,072 | — | — |
| watsonx | litellm | $0.15 | $0.60 | 8,192 | — | — |
| openrouter | openrouter | $0.02 | $0.10 | 131,072 | — | See the OpenRouter description below. |

Poe description: OpenAI's GPT-OSS-120B is an open-weight reasoning model released under the Apache 2.0 license and the OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed for agentic workflows, with strong instruction following, tool use (such as web search and Python code execution), and reasoning. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU, and it performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen on the Tau-Bench agentic evaluation suite) as well as on HealthBench, where it even outperforms proprietary models such as OpenAI o1 and GPT-4o. Technical specifications: attachments not supported; context window 128k tokens.

OpenRouter description: gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
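All prices above are quoted per million tokens, so a request costs input_tokens/1,000,000 × input price plus output_tokens/1,000,000 × output price. A minimal sketch of that arithmetic, using the groq row ($0.15 in / $0.60 out) as the example (swap in any provider's numbers):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in dollars for $/1M-token pricing."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a 10k-token prompt with a 2k-token completion at groq's prices.
print(f"${request_cost(10_000, 2_000, 0.15, 0.60):.4f}")  # $0.0027
```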
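Most providers in the table expose an OpenAI-compatible chat-completions endpoint, so a request looks the same everywhere apart from the base URL and model slug. A minimal sketch via OpenRouter, assuming the openai Python SDK, the slug openai/gpt-oss-120b, and OpenRouter's reasoning.effort field for the model's controllable reasoning depth (check your provider's docs for their equivalents):

```python
import os
from openai import OpenAI

# Assumptions: OpenRouter's base URL, its "openai/gpt-oss-120b" slug, and
# its `reasoning` request field; other providers differ.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user",
               "content": "Explain MXFP4 quantization in two sentences."}],
    extra_body={"reasoning": {"effort": "high"}},  # controllable reasoning depth
    max_tokens=512,
)
print(resp.choices[0].message.content)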
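Both descriptions above highlight native tool use and function calling. A hedged sketch of the standard tools wire format follows; the get_weather tool and its schema are hypothetical, only the tools/tool_calls shape is the standard chat-completions contract:

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the model decides to call the tool, the reply carries a structured
# tool call rather than prose.
msg = resp.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(msg.content)
```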