gpt-oss-20b
A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments.
| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| vercel | vercel | $0.07 | $0.30 | A compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments | |
| together | together | $0.05 | $0.20 | - | |
| poe | poe | $450.00 | - | GPT-OSS-20B is an open-weight reasoning model from OpenAI, released under the Apache 2.0 license and the OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with the OpenAI Responses API and is designed for agentic workflows, with strong instruction following, tool use (such as web search and Python code execution), and reasoning capabilities. GPT-OSS-20B delivers results similar to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. The model also performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen on the Tau-Bench agentic evaluation suite), as well as on HealthBench (even outperforming proprietary models such as OpenAI o1 and GPT‑4o). Technical specifications: attachments not supported; context window: 128k tokens. | |
| groq | models-dev | $0.08 | $0.30 | Provider: Groq, Context: 131072, Output Limit: 65536 | |
| nebius | models-dev | $0.05 | $0.20 | Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192 | |
| siliconflowcn | models-dev | $0.04 | $0.18 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 8000 | |
| chutes | models-dev | $0.02 | $0.10 | Provider: Chutes, Context: 131072, Output Limit: 131072 | |
| siliconflow | models-dev | $0.04 | $0.18 | Provider: SiliconFlow, Context: 131000, Output Limit: 8000 | |
| helicone | models-dev | $0.05 | $0.20 | Provider: Helicone, Context: 131072, Output Limit: 131072 | |
| fastrouter | models-dev | $0.05 | $0.20 | Provider: FastRouter, Context: 131072, Output Limit: 65536 | |
| cloudflareworkersai | models-dev | $0.20 | $0.30 | Provider: Cloudflare Workers AI, Context: 128000, Output Limit: 128000 | |
| cloudflareaigateway | models-dev | $0.20 | $0.30 | Provider: Cloudflare AI Gateway, Context: 128000, Output Limit: 16384 | |
| ovhcloud | models-dev | $0.05 | $0.18 | Provider: OVHcloud AI Endpoints, Context: 131000, Output Limit: 131000 | |
| deepinfra | models-dev | $0.03 | $0.14 | Provider: Deep Infra, Context: 131072, Output Limit: 16384 | |
| lmstudio | models-dev | $0.00 | $0.00 | Provider: LMStudio, Context: 131072, Output Limit: 32768 | |
| fireworksai | models-dev | $0.05 | $0.20 | Provider: Fireworks AI, Context: 131072, Output Limit: 32768 | |
| ionet | models-dev | $0.03 | $0.14 | Provider: IO.NET, Context: 64000, Output Limit: 4096 | |
| wandb | litellm | $5,000.00 | $20,000.00 | Source: wandb, Context: 131072 | |
| openrouter | openrouter | $0.02 | $0.06 | gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI's Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs. Context: 131072 | |
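Given the $/1M-token prices in the table, a request's cost is `input_tokens × input_price / 1e6 + output_tokens × output_price / 1e6`. A minimal sketch (the `request_cost` helper and the example token counts are illustrative, not part of any provider SDK; prices are taken from the Deep Infra row above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request, given $/1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion on
# Deep Infra ($0.03 in / $0.14 out per 1M tokens):
cost = request_cost(4_000, 1_000, 0.03, 0.14)
print(f"${cost:.6f}")  # -> $0.000260
```

The same helper makes provider comparisons easy: the cheapest listed row (chutes at $0.02/$0.10) works out to $0.000180 for the same request, versus $0.000580 on Vercel ($0.07/$0.30).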