qwen3-vl-235b-a22b-t
Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment.

Key Features:
- Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks.
- Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos.
- Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis.
- Long Context: Processes up to 1M tokens for books or long videos.
- Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers.
- Visual Recognition: Recognizes a wide range of objects, landmarks, and more.
- OCR: Supports 32 languages with improved performance in challenging conditions.
- Text-Vision Fusion: Achieves seamless, unified comprehension.

Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks.

Technical Specifications
- File Support: Image, Video, PDF and Markdown files
- Context window: 128k tokens

| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| poe | poe | $4,800.00 | - | Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. | |
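
The listed context window and per-1M-token input price can be turned into a quick pre-flight check before sending a request. The sketch below is only illustrative: the helper names `fits_context` and `input_cost` are hypothetical, 128k is assumed to mean 128,000 tokens, and the price is taken at face value from the poe row of the table (the listing gives no output price, so only input cost is estimated).

```python
# Illustrative helpers; names are hypothetical and not part of any provider API.
CONTEXT_WINDOW_TOKENS = 128_000    # "Context window: 128k tokens" (assumed to mean 128,000)
INPUT_PRICE_PER_MILLION = 4800.00  # listed input price for the poe provider, per 1M tokens


def fits_context(prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the prompt plus any reserved output budget fits in the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


def input_cost(prompt_tokens: int,
               price_per_million: float = INPUT_PRICE_PER_MILLION) -> float:
    """Scale a per-1M-token price down to the given number of input tokens."""
    return prompt_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    tokens = 250_000
    print(fits_context(tokens))  # False: 250,000 exceeds the 128,000-token window
    print(input_cost(tokens))    # cost of 250,000 input tokens at the listed price
```

Token counts depend on the tokenizer and on how images, video frames, and PDF pages are converted to tokens, so any count fed into these helpers is itself an estimate.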