qwen3-vl-235b-a22b-t

Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment.

Key Features:
- Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks.
- Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos.
- Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis.
- Long Context: Processes up to 1M tokens for books or long videos.
- Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers.
- Visual Recognition: Recognizes a wide range of objects, landmarks, and more.
- OCR: Supports 32 languages with improved performance in challenging conditions.
- Text-Vision Fusion: Achieves seamless, unified comprehension.

Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks.

Technical Specifications
- File Support: Image, Video, PDF, and Markdown files
- Context window: 128k tokens
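The listed context window can be turned into a simple budgeting check before sending a long document or video transcript. The sketch below is illustrative only: it assumes "128k" means 128,000 tokens (it could also mean 131,072), and it expects the caller to supply token counts, since the listing does not say which tokenizer applies. The names `fits_context` and `remaining_tokens` are hypothetical helpers, not part of any provider API.

```python
# Assumption: "128k tokens" is read as 128,000; adjust if the provider means 131,072.
CONTEXT_WINDOW_TOKENS = 128_000


def remaining_tokens(prompt_tokens: int, reserved_output_tokens: int = 0,
                     window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens are left in the window (negative if over budget)."""
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return window - prompt_tokens - reserved_output_tokens


def fits_context(prompt_tokens: int, reserved_output_tokens: int = 0,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return remaining_tokens(prompt_tokens, reserved_output_tokens, window) >= 0
```

For example, a 100,000-token prompt with 20,000 tokens reserved for the reply fits under this assumption, while a 120,000-token prompt with 10,000 reserved does not.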

Available at 1 Provider

| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
| --- | --- | --- | --- | --- | --- |
| poe | poe | $4,800.00 | - | See the model description above. | |
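Because the price is quoted per 1M tokens, the cost of a single request is the token count divided by 1,000,000, multiplied by the listed price. The following is a minimal sketch of that arithmetic, taking the input price exactly as listed and treating the output price as unknown because the listing shows "-". The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal
from typing import Optional

# Taken from the listing as-is; the output price is shown as "-" (not listed).
INPUT_PRICE_PER_1M = Decimal("4800.00")
OUTPUT_PRICE_PER_1M: Optional[Decimal] = None


def estimate_cost(input_tokens: int, output_tokens: int = 0,
                  input_price: Decimal = INPUT_PRICE_PER_1M,
                  output_price: Optional[Decimal] = OUTPUT_PRICE_PER_1M) -> Decimal:
    """Estimate request cost from per-1M-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    per_million = Decimal(1_000_000)
    cost = Decimal(input_tokens) * input_price / per_million
    if output_tokens:
        if output_price is None:
            # Refuse to guess a price the listing does not provide.
            raise ValueError("output price is not listed")
        cost += Decimal(output_tokens) * output_price / per_million
    return cost
```

Under the listed figure, a request that fills a 128,000-token input comes to 128,000 / 1,000,000 × 4,800.00 = 614.40 in the listing's price units.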