gpt-oss-120b

Extremely capable general-purpose LLM with strong, controllable reasoning capabilities

Available at 29 Providers

| Provider | Source | Input ($/1M) | Output ($/1M) | Context | Max Output | Notes |
|----------|--------|--------------|---------------|---------|------------|-------|
| vercel | vercel | $0.10 | $0.50 | — | — | Extremely capable general-purpose LLM with strong, controllable reasoning capabilities |
| together | together | $0.15 | $0.60 | — | — | — |
| poe | poe | $1,200.00 | — | 128,000 | — | Text-only; attachments not supported. See the Poe description below. |
| vultr | models-dev | $0.20 | $0.20 | 121,808 | 8,192 | — |
| nvidia | models-dev | $0.00 | $0.00 | 128,000 | 8,192 | — |
| groq | models-dev | $0.15 | $0.60 | 131,072 | 65,536 | — |
| nebius | models-dev | $0.15 | $0.60 | 131,072 | 8,192 | Nebius Token Factory |
| siliconflowcn | models-dev | $0.05 | $0.45 | 131,000 | 8,000 | SiliconFlow (China) |
| cortecs | models-dev | $0.00 | $0.00 | 128,000 | 128,000 | — |
| togetherai | models-dev | $0.15 | $0.60 | 131,072 | 131,072 | Together AI |
| siliconflow | models-dev | $0.05 | $0.45 | 131,000 | 8,000 | — |
| helicone | models-dev | $0.04 | $0.16 | 131,072 | 131,072 | — |
| fastrouter | models-dev | $0.15 | $0.60 | 131,072 | 32,768 | FastRouter |
| cloudflareworkersai | models-dev | $0.35 | $0.75 | 128,000 | 128,000 | Cloudflare Workers AI |
| cloudflareaigateway | models-dev | $0.35 | $0.75 | 128,000 | 16,384 | Cloudflare AI Gateway |
| ovhcloud | models-dev | $0.09 | $0.47 | 131,000 | 131,000 | OVHcloud AI Endpoints |
| synthetic | models-dev | $0.10 | $0.10 | 128,000 | 32,768 | — |
| deepinfra | models-dev | $0.05 | $0.24 | 131,072 | 16,384 | Deep Infra |
| submodel | models-dev | $0.10 | $0.50 | 131,072 | 32,768 | — |
| nanogpt | models-dev | $1.00 | $2.00 | 128,000 | 8,192 | NanoGPT |
| fireworksai | models-dev | $0.15 | $0.60 | 131,072 | 32,768 | Fireworks AI |
| ionet | models-dev | $0.04 | $0.40 | 131,072 | 4,096 | IO.NET |
| scaleway | models-dev | $0.15 | $0.60 | 128,000 | 8,192 | — |
| cerebras | models-dev | $0.25 | $0.69 | 131,072 | 32,768 | — |
| azureai | litellm | $0.15 | $0.60 | 131,072 | — | litellm source: azure_ai |
| sambanova | litellm | $3.00 | $4.50 | 131,072 | — | — |
| wandb | litellm | $15,000.00 | $60,000.00 | 131,072 | — | — |
| watsonx | litellm | $0.15 | $0.60 | 8,192 | — | — |
| openrouter | openrouter | $0.02 | $0.10 | 131,072 | — | See the OpenRouter description below. |

Poe description: OpenAI's GPT-OSS-120B is an open-weight reasoning model released under the Apache 2.0 license and the OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed for agentic workflows, with strong instruction following, tool use (such as web search and Python code execution), and reasoning. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU, and it performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen on the Tau-Bench agentic evaluation suite) as well as on HealthBench, where it even outperforms proprietary models such as OpenAI o1 and GPT-4o. Technical specifications: attachments not supported; context window 128k tokens.

OpenRouter description: gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
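All prices above are quoted per million tokens, so a request costs input_tokens/1,000,000 × input price plus output_tokens/1,000,000 × output price. A minimal sketch of that arithmetic, using the groq row ($0.15 in / $0.60 out) as the example (swap in any provider's numbers):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in dollars for $/1M-token pricing."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a 10k-token prompt with a 2k-token completion at groq's prices.
print(f"${request_cost(10_000, 2_000, 0.15, 0.60):.4f}")  # $0.0027
```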
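Most providers in the table expose an OpenAI-compatible chat-completions endpoint, so a request looks the same everywhere apart from the base URL and model slug. A minimal sketch via OpenRouter, assuming the openai Python SDK, the slug openai/gpt-oss-120b, and OpenRouter's reasoning.effort field for the model's controllable reasoning depth (check your provider's docs for their equivalents):

```python
import os
from openai import OpenAI

# Assumptions: OpenRouter's base URL, its "openai/gpt-oss-120b" slug, and
# its `reasoning` request field; other providers differ.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user",
               "content": "Explain MXFP4 quantization in two sentences."}],
    extra_body={"reasoning": {"effort": "high"}},  # controllable reasoning depth
    max_tokens=512,
)
print(resp.choices[0].message.content)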
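Both descriptions above highlight native tool use and function calling. A hedged sketch of the standard tools wire format follows; the get_weather tool and its schema are hypothetical, only the tools/tool_calls shape is the standard chat-completions contract:

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the model decides to call the tool, the reply carries a structured
# tool call rather than prose.
msg = resp.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(msg.content)
```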