
gpt-oss-20b

A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments

Available at 19 Providers

Provider Source Input Price ($/1M) Output Price ($/1M) Description
vercel vercel Input: $0.07 Output: $0.30 A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments
together together Input: $0.05 Output: $0.20 -
poe poe Input: $450.00 Output: - OpenAI's GPT-OSS-20B is an open-weight reasoning model released under the Apache 2.0 license and the OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed for agentic workflows, with strong instruction following, tool use (such as web search and Python code execution), and reasoning capabilities. It delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. It also performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen on the Tau-Bench agentic evaluation suite), and on HealthBench it even outperforms proprietary models such as OpenAI o1 and GPT-4o. Technical specifications: attachments not supported; context window 128k tokens.
groq models-dev Input: $0.08 Output: $0.30 Provider: Groq, Context: 131072, Output Limit: 65536
nebius models-dev Input: $0.05 Output: $0.20 Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192
siliconflowcn models-dev Input: $0.04 Output: $0.18 Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000
chutes models-dev Input: $0.02 Output: $0.10 Provider: Chutes, Context: 131072, Output Limit: 131072
siliconflow models-dev Input: $0.04 Output: $0.18 Provider: SiliconFlow, Context: 131000, Output Limit: 8000
helicone models-dev Input: $0.05 Output: $0.20 Provider: Helicone, Context: 131072, Output Limit: 131072
fastrouter models-dev Input: $0.05 Output: $0.20 Provider: FastRouter, Context: 131072, Output Limit: 65536
cloudflareworkersai models-dev Input: $0.20 Output: $0.30 Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000
cloudflareaigateway models-dev Input: $0.20 Output: $0.30 Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384
ovhcloud models-dev Input: $0.05 Output: $0.18 Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000
deepinfra models-dev Input: $0.03 Output: $0.14 Provider: Deep Infra, Context: 131072, Output Limit: 16384
lmstudio models-dev Input: $0.00 Output: $0.00 Provider: LMStudio, Context: 131072, Output Limit: 32768
fireworksai models-dev Input: $0.05 Output: $0.20 Provider: Fireworks AI, Context: 131072, Output Limit: 32768
ionet models-dev Input: $0.03 Output: $0.14 Provider: IO.NET, Context: 64000, Output Limit: 4096
wandb litellm Input: $5,000.00 Output: $20,000.00 Source: wandb, Context: 131072
openrouter openrouter Input: $0.02 Output: $0.06 gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072
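Since all prices in the table above are quoted in USD per 1M tokens, per-request cost is simple arithmetic: tokens divided by 1,000,000, times the per-1M price, summed over input and output. A minimal sketch, using a hand-picked subset of the providers listed above (the `PRICES` dict and both helper functions are illustrative, not part of any provider's API):

```python
# Hypothetical cost estimator for gpt-oss-20b across a few of the
# providers listed above. Prices are USD per 1M tokens, copied from
# the table; this subset is chosen for illustration only.
PRICES = {
    "openrouter": (0.02, 0.06),
    "chutes": (0.02, 0.10),
    "deepinfra": (0.03, 0.14),
    "vercel": (0.07, 0.30),
    "groq": (0.08, 0.30),
}

def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: (tokens / 1e6) * price-per-1M, input + output."""
    in_price, out_price = PRICES[provider]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Provider with the lowest cost for this token mix."""
    return min(PRICES, key=lambda p: request_cost(p, input_tokens, output_tokens))

if __name__ == "__main__":
    # Example workload: 10k prompt tokens, 2k completion tokens.
    print(f"{request_cost('groq', 10_000, 2_000):.6f}")  # → 0.001400
    print(cheapest(10_000, 2_000))  # → openrouter
```

Note that the cheapest provider can depend on the input/output mix: a completion-heavy workload weights the output price more, so rankings may flip between providers whose input prices tie (e.g. openrouter vs. chutes above).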