
GLM 4.5 Air

glm-4.5-air

GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.

Available from 12 providers

| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| vercel | vercel | $0.20 | $1.10 | GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters. | |
| nebius | models-dev | $0.20 | $1.20 | Provider: Nebius Token Factory, Context: 131072, Output Limit: 8192 | |
| siliconflowcn | models-dev | $0.14 | $0.86 | Provider: SiliconFlow (China), Context: 131000, Output Limit: 131000 | |
| chutes | models-dev | $0.05 | $0.22 | Provider: Chutes, Context: 131072, Output Limit: 131072 | |
| siliconflow | models-dev | $0.14 | $0.86 | Provider: SiliconFlow, Context: 131000, Output Limit: 131000 | |
| huggingface | models-dev | $0.20 | $1.10 | Provider: Hugging Face, Context: 128000, Output Limit: 96000 | |
| zenmux | models-dev | $0.11 | $0.56 | Provider: ZenMux, Context: 128000, Output Limit: 64000 | |
| zhipuai | models-dev | $0.20 | $1.10 | Provider: Zhipu AI, Context: 131072, Output Limit: 98304 | |
| submodel | models-dev | $0.10 | $0.50 | Provider: submodel, Context: 131072, Output Limit: 131072 | |
| nanogpt | models-dev | $1.00 | $2.00 | Provider: NanoGPT, Context: 128000, Output Limit: 8192 | |
| openrouter | openrouter | $0.05 | $0.22 | GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) Context: 131072 | |
| zai | zai | $0.20 | $0.03 | - | |
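The OpenRouter entry above notes that GLM-4.5-Air's hybrid "thinking"/"non-thinking" modes can be toggled with a `reasoning` `enabled` boolean. Below is a minimal sketch of such a request against OpenRouter's chat-completions endpoint, assuming the model slug `z-ai/glm-4.5-air`; the prompt and environment-variable name are illustrative, not from the listing.

```python
# Sketch: toggling GLM-4.5-Air's "thinking mode" via OpenRouter's
# chat-completions API. The reasoning.enabled field follows OpenRouter's
# reasoning-tokens docs linked above; model slug and prompt are assumptions.
import json
import os
import urllib.request

payload = {
    "model": "z-ai/glm-4.5-air",
    "messages": [
        {"role": "user", "content": "Plan a three-step tool-use agent."}
    ],
    # Set to False for the faster non-thinking mode (real-time interaction).
    "reasoning": {"enabled": True},
}


def send(api_key: str) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON response."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Only hit the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("OPENROUTER_API_KEY"):
    print(send(os.environ["OPENROUTER_API_KEY"]))
```

Setting `"enabled": False` (or omitting the `reasoning` object, per the linked docs) leaves the model in its non-thinking mode.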