gpt-oss-20b
A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments.
| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| vercel | vercel | $0.07 | $0.30 | A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments | |
| together | together | $0.05 | $0.20 | - | |
| poe | poe | $450.00 | - | GPT-OSS-20B is an open-weight reasoning model from OpenAI, released under the Apache 2.0 license and the OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed for agentic workflows, with strong instruction following, tool use (such as web search and Python code execution), and reasoning capabilities. GPT-OSS-20B delivers results similar to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. The model also performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen on the Tau-Bench agentic evaluation suite), as well as on HealthBench (even outperforming proprietary models such as OpenAI o1 and GPT‑4o). Technical specifications: attachments not supported; context window: 128k tokens. | |
| groq | models-dev | $0.08 | $0.30 | Provider: Groq, Context: 131072, Output Limit: 65536 | |
| nebius | models-dev | $0.05 | $0.20 | Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192 | |
| siliconflowcn | models-dev | $0.04 | $0.18 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000 | |
| chutes | models-dev | $0.02 | $0.10 | Provider: Chutes, Context: 131072, Output Limit: 131072 | |
| siliconflow | models-dev | $0.04 | $0.18 | Provider: SiliconFlow, Context: 131000, Output Limit: 8000 | |
| helicone | models-dev | $0.05 | $0.20 | Provider: Helicone, Context: 131072, Output Limit: 131072 | |
| fastrouter | models-dev | $0.05 | $0.20 | Provider: FastRouter, Context: 131072, Output Limit: 65536 | |
| cloudflareworkersai | models-dev | $0.20 | $0.30 | Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000 | |
| cloudflareaigateway | models-dev | $0.20 | $0.30 | Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384 | |
| ovhcloud | models-dev | $0.05 | $0.18 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 | |
| deepinfra | models-dev | $0.03 | $0.14 | Provider: Deep Infra, Context: 131072, Output Limit: 16384 | |
| lmstudio | models-dev | $0.00 | $0.00 | Provider: LMStudio, Context: 131072, Output Limit: 32768 | |
| fireworksai | models-dev | $0.05 | $0.20 | Provider: Fireworks AI, Context: 131072, Output Limit: 32768 | |
| ionet | models-dev | $0.03 | $0.14 | Provider: IO.NET, Context: 64000, Output Limit: 4096 | |
| wandb | litellm | $5,000.00 | $20,000.00 | Source: wandb, Context: 131072 | |
| openrouter | openrouter | $0.02 | $0.06 | gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI's Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072 | |
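Given the $/1M-token prices in the table, a request's cost is `input_tokens × input_price / 1e6 + output_tokens × output_price / 1e6`. A minimal sketch (the `request_cost` helper and the example token counts are illustrative, not part of any provider SDK; prices are taken from the Deep Infra row above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request, given $/1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion on
# Deep Infra ($0.03 in / $0.14 out per 1M tokens):
cost = request_cost(4_000, 1_000, 0.03, 0.14)
print(f"${cost:.6f}")  # -> $0.000260
```

The same helper makes provider comparisons easy: the cheapest listed row (chutes at $0.02/$0.10) works out to $0.000180 for the same request, versus $0.000580 on Vercel ($0.07/$0.30).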