
Poe Models

339 Models
Name Model ID Input Price ($/1M) Output Price ($/1M) Description Free
- assistant - - General-purpose assistant. Write, code, ask for real-time information, create images, and more. Queries are automatically routed based on the task and subscription status.
  For subscribers:
  - General queries: @GPT-5.2-Instant
  - Web searches: @Web-Search
  - Image generation: @Nano-Banana
  - Video-input tasks: @Gemini-2.5-Pro
  For non-subscribers:
  - General queries: @GPT-4o-Mini
  - Web searches: @Web-Search
  - Image generation: @FLUX-schnell
  - Video-input tasks: @Gemini-2.5-Flash
- gpt-5.2-instant 1.60 13.00 A fast, steady conversational model built for day-to-day use. It handles long threads without drifting, keeps context clean, and answers in a straightforward way. Good for planning, rewriting, summarizing, and quick technical help. Supports 400k tokens of context and native vision. Optional parameters: Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- claude-opus-4.5 4.30 21.00 Claude Opus 4.5 from Anthropic, supports customizable thinking budget (up to 64k tokens) and 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63999 to the end of your message.
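To make the flag convention used throughout this listing concrete, here is a minimal sketch of sending a `--thinking_budget` flag to this bot through the Poe API, assuming the `fastapi_poe` client package (`pip install fastapi-poe`); the bot handle and API key placeholder are illustrative, not official usage.

```python
# Minimal sketch: append a --thinking_budget flag to a message and stream
# the reply via the fastapi_poe client. Bot handle and API key are
# placeholders; only the flag syntax comes from the listing above.
import asyncio

import fastapi_poe as fp


async def main() -> None:
    message = fp.ProtocolMessage(
        role="user",
        content="Outline a proof strategy for this lemma. --thinking_budget 32000",
    )
    async for partial in fp.get_bot_response(
        messages=[message],
        bot_name="Claude-Opus-4.5",   # assumed bot handle
        api_key="YOUR_POE_API_KEY",   # placeholder
    ):
        print(partial.text, end="")


asyncio.run(main())
```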
- gemini-3-flash 0.40 2.40 Building on the reasoning capabilities of Gemini 3 Pro, Gemini 3 Flash is a powerful but affordable and performant model. It has exceptional world knowledge, multimodal understanding and reasoning capabilities at a fraction of the cost of equivalent models (as of December 2025).
  Optional parameters:
  - To set the thinking level, add --thinking_level and set it to one of `minimal`, `low`, or `high`. This is set to `low` by default.
  - To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
- gemini-3-pro 1.60 9.60 Gemini 3 Pro is a state-of-the-art model for math, coding, computer use, and long‑horizon agent tasks, delivering top benchmark results including 23.4% on MathArena Apex (up from 1.6%), SOTA on tau-bench, an Elo of 2,439 on LiveCodeBench Pro (vs. 2,234), 72.7% on ScreenSpot‑Pro (~2× the previous best), and a higher mean net worth on Vending‑Bench 2 ($5,478 vs. $3,838). It has a 1M input context window and a max output of 64k tokens.
  Optional Parameters:
  - To instruct the bot to use more thinking effort, select from "Low" or "High".
  - To enable web search and real-time information access, toggle "enable web search". This is disabled by default.
- gpt-5.2-pro 19.00 150.00 A powerful reasoning model that is ideal for your most complex, highest difficulty tasks. On x-high reasoning effort, scores 90.5% on the ARC-AGI-1 benchmark, an incredibly difficult problem-solving benchmark where humans score 100%. Note: the model can take up to 30 minutes to think through a problem and is quite expensive. Supports 400k tokens of context and native vision.
  Optional parameters:
  - To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "medium", "high", or "Xhigh" (default: "medium").
  - Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
  - Use `--verbosity` to control response detail at the end of your message with one of "low", "medium", or "high" (default: "medium").
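The same flag convention extends to combining several parameters in one message. A sketch under the same `fastapi_poe` assumption as above, here collecting the full reply at once rather than streaming; the bot handle and API key are placeholders:

```python
# Sketch: combine --reasoning_effort and --verbosity in a single message.
# Bot handle and API key are placeholders; flag names come from the entry above.
import asyncio

import fastapi_poe as fp


async def ask(prompt: str) -> str:
    message = fp.ProtocolMessage(role="user", content=prompt)
    # get_final_response gathers the complete answer instead of streaming it.
    return await fp.get_final_response(
        messages=[message],
        bot_name="GPT-5.2-Pro",       # assumed bot handle
        api_key="YOUR_POE_API_KEY",   # placeholder
    )


print(asyncio.run(ask(
    "Derive a closed form for this recurrence. --reasoning_effort Xhigh --verbosity low"
)))
```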
- gpt-5.2 1.60 13.00 GPT-5.2 is a state-of-the-art AI model from OpenAI designed for real work across writing, analysis, coding, and problem solving. It handles long contexts and multi-step tasks better than earlier versions, and it’s tuned to give accurate responses with fewer errors. Supports 400k tokens of context and native vision.
  Optional parameters:
  - To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", "high", or "Xhigh" (default: "None").
  - Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
  - Use `--verbosity` to control response detail at the end of your message with one of "low", "medium", or "high" (default: "medium").
- claude-sonnet-4.5 2.60 13.00 Claude Sonnet 4.5 represents a major leap forward in AI capability and alignment. It is the most advanced model released by Anthropic to date, distinguished by dramatic improvements in reasoning, mathematics, and real-world coding. Supports 1M tokens of context. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 31,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access. This is disabled by default.
- grok-4 3.00 15.00 Grok 4 is xAI's latest and most intelligent language model. It features state-of-the-art capabilities in coding, reasoning, and answering questions. It excels at handling complex and multi-step tasks. Reasoning traces are not available via the xAI API.
- claude-haiku-4.5 0.85 4.30 Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, and scores >73% on SWE-bench Verified, ranking among the world's best coding models. Supports 200k tokens of context. Optional parameters: To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 63,999 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- claude-opus-4.1 13.00 64.00 Claude Opus 4.1 from Anthropic, supports customizable thinking budget (up to 32k tokens) and 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 31999 to the end of your message.
- glm-4.7 - - GLM-4.7 is Z.AI's latest flagship model, with major upgrades focused on advanced coding capabilities and more reliable multi-step reasoning and execution. It shows clear gains in complex agent workflows, while delivering a more natural conversational experience and stronger front-end design sensibility.
  File Support: Text, Markdown and PDF files
  Context window: 205k tokens
  Optional parameters:
  - Use `--enable_thinking true` to enable thinking about the response before giving a final answer. This is disabled by default.
  - Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic. This is set to 0.7 by default.
  - Use `--max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response. This is set to 131072 by default.
- minimax-m2.1 - - MiniMax M2.1 is a cutting-edge AI model designed to revolutionize how developers build software. With enhanced multi-language programming support, it excels at generating high-quality code across popular languages like Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript. Key improvements include:
  - 22% faster response times and 30% lower token consumption for efficient workflows.
  - Seamless integration with leading development frameworks (Claude Code, Droid Factory AI, BlackBox, etc.).
  - Full-stack development capabilities, from mobile (Android/iOS) to web and 3D interactive prototyping.
  - Optimized performance-to-cost ratio, making AI-assisted development more accessible.
  Whether you're a software engineer, app developer, or tech innovator, M2.1 empowers smarter coding with industry-leading AI.
  File Support: Text, Markdown and PDF files
  Context window: 205k tokens
  Optional parameters:
  - Use `--enable_thinking true` to enable thinking about the response before giving a final answer. This is disabled by default.
  - Use `--temperature` with a number from 0 to 2 to control randomness in the response; lower values make the output more focused and deterministic. This is set to 0.7 by default.
  - Use `--max_output_token` with a number from 1 to 131072 to set the number of tokens to generate in the response. This is set to 131072 by default.
- gemini-2.5-flash 0.21 1.80 Gemini 2.5 Flash builds upon the popular foundation of Google's 2.0 Flash; this new version delivers a major upgrade in reasoning capabilities, search capabilities, and image/video understanding while still prioritizing speed and cost. Supports 1M tokens of input context. Serves the latest `gemini-2.5-flash-preview-09-2025` snapshot. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
- gemini-2.5-pro 0.87 7.00 Gemini 2.5 Pro is Google's advanced model with frontier performance on various key benchmarks; supports web search and 1 million tokens of input context. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 32,768 to the end of your message. Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- kling-omni - - Bot for Kling Omni Image-to-Video inference. Send one image for image-to-video generation and two images for first-to-last-frame video generation. Set duration with `--duration` to either 5 or 10 seconds. Accepted file types: jpeg, png, webp, heic, heif. This bot does not accept video files. Note: A prompt is required after attaching images to generate video.
- deepseek-r1 18,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to Together AI, a US-based company. Supports 164k tokens of input context and 33k tokens of output context. Uses the latest May 28th snapshot (DeepSeek-R1-0528).
- manus - - Manus is an autonomous AI agent that executes tasks. It can take a high-level prompt, break it into subtasks, interact with tools/APIs, and deliver end-to-end results (like reports, code, websites, images, and more) without you managing each step.
  Notes:
  - In Agent mode, responses may take several minutes to complete.
  - Sometimes, files that Manus has created are incorrectly uploaded to the Poe message. In such cases, please check the Manus chat for the file.
  Parameter controls available (see the sketch after this entry):
  1. Task Mode
  - Default: `--task_mode adaptive` (smart routing: may choose Chat or Agent)
  - Conversational single turn: `--task_mode chat` (fixed price)
  - Autonomous multi-step: `--task_mode agent`
  2. Agent Profile
  - Default: `--agent_profile manus-1.6` (standard tasks)
  - Lower usage: `--agent_profile manus-1.6-lite` (speed/savings)
  - Maximum capability: `--agent_profile manus-1.6-max` (complex reasoning)
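A small, purely illustrative helper for composing Manus prompts from these controls; the function and its defaults are hypothetical, and only the flag names and values come from the entry above:

```python
# Hypothetical prompt builder for the Manus control flags documented above.
def manus_prompt(task: str, task_mode: str = "adaptive",
                 agent_profile: str = "manus-1.6") -> str:
    """Append Manus task-mode and agent-profile flags to a task description."""
    return f"{task} --task_mode {task_mode} --agent_profile {agent_profile}"


# Example: force autonomous multi-step execution with the lite profile.
print(manus_prompt("Research and summarize recent MoE inference papers",
                   task_mode="agent", agent_profile="manus-1.6-lite"))
```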
- glm-4.6 6,600.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications. Use `--enable_thinking false` to disable thinking about the response before giving a final answer. This is enabled by default. Bot does not support media (video and audio file) attachments.
  Technical Specifications
  File Support: Text, Markdown and PDF files
  Context window: 200k tokens
- gpt-5.1-instant 1.10 9.00 OpenAI’s flagship model optimized for conversational intelligence. It excels at natural dialogue, contextual memory, and adaptive tone, making it perfect for interactive agents, tutoring, and customer support. It balances speed, reliability, and empathy for seamless real‑time communication. Supports 128k tokens of input context.
- gpt-5.1 1.10 9.00 OpenAI’s flagship general‑purpose model, built for advanced reasoning, comprehension, and creativity. It delivers robust performance across text and code, with significant improvements in factual accuracy, long‑context understanding, and multilingual fluency. Ideal for research, content creation, analysis, and problem‑solving in any domain. Supports a 400k-token input context window.
  Optional parameters:
  - To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high" (default: "None").
  - Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
  - Use `--verbosity` to control response detail at the end of your message with one of "low", "medium", or "high" (default: "medium").
- gpt-image-1.5 - - OpenAI's frontier image generation model in ChatGPT as of December 2025, offering exceptional prompt adherence, world knowledge, precise edits, facial preservation, level of detail, and overall quality with improved latency/generation times. It supports editing, restyling, and combining images attached to the latest user query. For a conversational image generation and editing experience use: https://poe.com/GPT-5.2
  Optional Parameters:
  - Set aspect ratio, with options 3:2, 1:1 and 2:3.
  - Set quality to low, medium or high. Default is high.
  - Enable mask use by toggling it on or by typing 'use_mask' in the prompt. This option is turned off by default.
  - Disable high fidelity by toggling it off or by typing 'use_high_fidelity'. This option is turned on by default.
- kimi-k2-thinking 6,700.00 - Built as a thinking agent, it performs step-by-step reasoning while utilizing tools, achieving state-of-the-art performance on benchmarks such as Humanity's Last Exam (HLE), BrowseComp, and others. The model demonstrates substantial advancements in reasoning, agentic search, coding, writing, and general problem-solving capabilities. Kimi K2 Thinking is capable of executing 200–300 sequential tool calls autonomously, maintaining coherent reasoning across hundreds of steps to solve complex tasks.
  File Support: Text, Markdown and PDF files
  Context window: 256k tokens
- deepseek-v3.2 - - We introduce DeepSeek-V3.2, a next-generation foundation model designed to unify high computational efficiency with state-of-the-art reasoning and agentic performance. DeepSeek-V3.2 is built upon three core technical breakthroughs:
  • DeepSeek Sparse Attention (DSA): A new highly efficient attention mechanism that significantly reduces computational overhead while preserving model quality, purpose-built for long-context reasoning and high-throughput workloads.
  • Scalable Reinforcement Learning Framework: DeepSeek-V3.2 leverages a robust RL training protocol and expanded post-training compute to reach GPT-5-level performance. Its high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and demonstrates reasoning capabilities comparable to Gemini-3.0-Pro.
  • Large-Scale Agentic Task Synthesis Pipeline: To enable reliable tool-use and multi-step decision-making, we develop a novel agentic data synthesis pipeline that generates high-quality interactive reasoning tasks at scale, greatly enhancing the model’s agentic capabilities.
  File Support: Text, Markdown and PDF files
  Context window: 164k tokens
- glm-4.6v - - GLM-4.6V represents a significant multimodal advancement in the GLM series, achieving state-of-the-art visual understanding accuracy for models of its parameter scale. Notably, it's the first visual model to natively integrate Function Call capabilities directly into its architecture, creating a seamless pathway from visual perception to executable actions. This breakthrough establishes a unified technical foundation for deploying multimodal agents in real-world business applications.
  File Support: Text, Markdown, Image and PDF files
  Context window: 131k tokens
  Optional parameters:
  - Enable Thinking - Toggle this on for the model to think before providing a response. This is disabled by default.
  - Temperature - Controls randomness in the response; lower values make the output more focused and deterministic. Select from the 0 to 2 range. This is set to 0.7 by default.
  - Max Output Tokens - Maximum number of tokens to generate in the response, settable from 1 to 32768. Set to the maximum of 32768 by default.
- gpt-5.1-codex 1.10 9.00 GPT‑5.1‑Codex extends GPT‑5.1’s capabilities for software development. It understands complex codebases, provides accurate completions, explains algorithms, and assists with debugging across modern programming languages. Designed for developers, it elevates productivity and supports full‑stack coding workflows with precision. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
- gpt-5-pro 14.00 110.00 OpenAI’s latest flagship model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Use `--web_search true` to enable web search and real-time information access; this is disabled by default. GPT-5-Pro thinks long and hard. When using this bot through the API, consider increasing your request timeouts.
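One way to honor that timeout advice in code, sketched against an OpenAI-compatible client; the base URL and model name here are assumptions to verify against Poe's API documentation before use:

```python
# Sketch: raise the client timeout for a bot that may think for up to 30
# minutes. Base URL and model name are assumptions; the timeout kwarg is
# standard in the openai package.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_POE_API_KEY",         # placeholder
    base_url="https://api.poe.com/v1",  # assumed OpenAI-compatible endpoint
    timeout=1800.0,                     # 30 minutes, matching worst-case thinking time
)

resp = client.chat.completions.create(
    model="GPT-5-Pro",  # assumed model handle
    messages=[{"role": "user", "content": "Plan a formal verification of this protocol."}],
)
print(resp.choices[0].message.content)
```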
- gpt-5-chat 1.10 9.00 ChatGPT-5 points to the non-reasoning model GPT-5 snapshot (gpt-5-chat-latest) currently used in ChatGPT. Supports native vision, 400k tokens of context, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount.
- claude-code - - A powerful assistant that can read, write, and analyze files across many formats. It can also delegate to other Poe bots to handle complex, multi-step tasks. Built on the Claude Agent SDK from Anthropic.
- grok-4.1-fast-reasoning - - Grok-4.1-Fast-Reasoning is a high-performance version of xAI’s Grok 4.1 Fast, the company’s best agentic tool‑calling model. It works great in real-world use cases like customer support, deep research, and advanced analytical reasoning. Equipped with a 2M‑token context window, this model processes vast information seamlessly, delivering coherent, context‑aware, and deeply reasoned insights at exceptional speed.
- zai-glm-4.6-cs 19,000.00 - World’s fastest inference for ZAI GLM 4.6 with Cerebras. ZAI GLM 4.6 is a high‑performance AI model designed for advanced reasoning, superior coding, and effective tool use. It supports structured outputs, parallel tool calling, and real‑time streaming responses. Optimized for agentic coding and automation tasks, the model delivers strong real‑world performance with a context window of up to 131K tokens and output up to 40K tokens. For more information see: https://inference-docs.cerebras.ai/models/zai-glm-46 Context Limit: 131k
- gpt-5.1-codex-max 1.10 9.00 OpenAI's most capable agentic coding model; recommended for use in agentic harnesses or similar environments (e.g. Cursor, Claude Code, Codex). The default reasoning effort is set to `Xhigh`, so the model will reason extensively on problems given to it (i.e. expect long generation times and points-intensive runs). Accepts image attachments.
- gpt-5.1-codex-mini 0.22 1.80 GPT‑5.1‑Codex‑Mini is a lightweight, fast, and efficient code‑generation model derived from GPT‑5.1‑Codex. It’s optimized for quick iterations, smaller environments, and edge applications—offering strong coding assistance with lower computational cost while maintaining accuracy and utility. Supports 400k tokens of input context. Optional parameters: To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high"
- gpt-4o - - OpenAI's GPT-4o answers user prompts in a natural, engaging & tailored writing style with strong overall world knowledge. Uses GPT-Image-1 to create and edit images conversationally. For fine-grained image generation control (e.g. image quality), use https://poe.com/GPT-Image-1. Supports a context window of 128k tokens. Check out the newest version of this bot here: https://poe.com/GPT-5.
- nano-banana-pro 1.70 10.00 Nano Banana Pro (Gemini 3 Pro Image Preview) can make detailed, context-rich visuals, precisely edit or restyle input images with exceptional fidelity, and even generate legible text in images in multiple languages.
  Optional parameters:
  - `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9): Aspect ratio of the output image
  - `--web_search true` to enable web search and real-time information access; this is disabled by default.
  - `--image_only` (default: False): Determines whether to only generate image output
  - `--image_size` (options: 1K, 2K, 4K): Resolution of the image
  Note: Simply enabling --image_only will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
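Hedged examples of how these flags compose in a single prompt; the prompt text is invented, and the `true` value for `--image_only` is an assumption based on how boolean flags are written elsewhere in this listing:

```python
# Invented prompts showing flag combinations for Nano-Banana-Pro.
prompts = [
    "A hand-lettered cafe chalkboard menu in French --aspect_ratio 2:3 --image_size 2K",
    # --image_only suppresses text output; phrase the prompt as a generation request.
    "Restyle the attached photo as a woodcut print --image_only true --aspect_ratio 1:1",
]
for p in prompts:
    print(p)
```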
- nano-banana 0.21 1.80 Google DeepMind's Nano Banana (i.e. the Gemini 2.5 Flash Image model) offers image generation and editing capabilities, with state-of-the-art performance in photo-realistic multi-turn edits at exceptional speeds. Supports a maximum input context of 32k tokens.
  Optional parameters:
  - `--aspect_ratio` (options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9): Aspect ratio of the output image
  - `--image_only` (default: False): Determines whether to only generate image output
  Note: Simply enabling --image_only will not result in an image unless the prompt is phrased specifically for image generation, but it does guarantee that only a single image (or none) will be produced.
- grok-4.1-fast-non-reasoning - - Grok-4.1-Fast-Non-Reasoning is a streamlined companion to Grok 4.1 Fast, xAI’s best agentic tool‑calling model. It has 2M context window and high responsiveness but is optimized for non‑reasoning tasks — excelling at text generation, summarization, and automated workflows that demand speed and efficiency over deep logic. Ideal for high-throughput use cases like customer support automation, bulk content creation, and fast conversational responses.
- gpt-5 1.10 9.00 OpenAI’s most advanced general model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- gpt-5-nano 0.04 0.36 GPT-5 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 400k input tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- gpt-5-mini 0.22 1.80 GPT-5 mini is a small, fast & affordable model that matches or beats GPT-4.1 in many intelligence and vision-related tasks. Supports 400k tokens of context. Provides a 90% chat history cache discount. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "minimal", "low", "medium", or "high". Use `--web_search true` to enable web search and real-time information access; this is disabled by default.
- o3-pro 18.00 72.00 o3-pro is a well-rounded and powerful model across domains, with more capability than https://poe.com/o3 at the cost of higher price and lower speed. It is especially capable at math, science, coding, visual reasoning tasks, technical writing, and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
- gemini-2.5-flash-lite 0.07 0.28 A lightweight Gemini 2.5 Flash reasoning model optimized for cost efficiency and low latency. Supports web search. Supports 1 million tokens of input context. Serves the latest `gemini-2.5-flash-lite-preview-09-2025` snapshot. For more complex queries, use https://poe.com/Gemini-2.5-Pro or https://poe.com/Gemini-2.5-Flash. To instruct the bot to use more thinking effort, add `--thinking_budget` and a number ranging from 0 to 24,576 to the end of your message. To use web search and real-time information access, add `--web_search true` to enable and add `--web_search false` to disable (default setting).
- gpt-5-codex 1.10 9.00 GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. It supports multimodal inputs such as images or screenshots for UI development and a 400k token context window. We recommend using GPT-5-Codex only for agentic and interactive coding use cases. To instruct the bot to use more reasoning effort, add `--reasoning_effort` to the end of your message with one of "low", "medium", or "high"
- grok-4-fast-non-reasoning 0.20 0.50 Grok 4 Fast Non-Reasoning is designed for fast, efficient tasks like content generation with a 2M token context window. Combining cutting-edge performance with cost-efficiency, it ensures high-quality results for simpler, everyday applications.
- qwen-3-next-80b-think 3,000.00 - The Qwen3-Next-80B-Think (with thinking mode enabled by default) is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B-Thinking." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring less than 1/10 of the inference cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. This is the thinking version of https://poe.com/Qwen3-Next-80B; supports 65k tokens of context.
  Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters:
  1. Enable/Disable Thinking - This will cause the model to think about the response before giving a final answer.
  Technical Specifications:
  File Support: PDF, DOC and XLSX files
  File Attachment Limitations: audio, video and image files are not supported
  Context Window: 65k tokens
- qwen3-next-80b 2,400.00 - The Qwen3-Next-80B is the next-generation foundation model released by Qwen, optimized for extreme context length and large-scale parameter efficiency, also known as "Qwen3-Next-80B-A3B." Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32k tokens. Use `--enable_thinking false` to disable thinking mode before giving an answer. This is the non-thinking version of https://poe.com/Qwen3-Next-80B-Think; supports 65k tokens of context.
- deepseek-v3.2-exp 3,900.00 - DeepSeek-V3.2-Exp is an experimental model introducing the groundbreaking DeepSeek Sparse Attention (DSA) mechanism for enhanced long-context processing efficiency. Built on V3.1-Terminus, DSA achieves fine-grained sparse attention while maintaining identical output quality. This delivers substantial computational efficiency improvements without compromising accuracy. Comprehensive benchmarks confirm V3.2-Exp matches V3.1-Terminus performance, proving efficiency gains don't sacrifice capability. As both a powerful tool and research platform, it establishes new paradigms for efficient long-context AI processing.
  Optional Parameters: Use the additional input beside the attachment button to manage the optional parameters:
  1. Enable/Disable Thinking - This will cause the model to think about the response before giving a final answer.
  Technical Specifications:
  File Support: Text, Markdown and PDF files
  Context window: 160k tokens
- nova-pro-1.0 - - Amazon Nova Pro 1.0 is a highly capable multimodal foundation model from Amazon Nova, offering a strong balance of accuracy, speed, and cost for processing text, images, and video. Its context window is 300,000 tokens, which enables handling very large inputs (including up to ~30 minutes of video input) in a single request. Use `--enable_latency_optimized [false/true]` (default: false) to disable/enable latency-optimized inference accordingly. Note that if enabled, costs may increase; check the rate card for more information.
- nova-premier-1.0 - - The Amazon Nova Premier 1.0 model is Amazon’s most capable foundation model, able to handle extremely long contexts (≈ 1 million tokens) and multimodal inputs like text, images, and video while excelling at complex, multi‑step tasks across tools and data sources. It supports chain‑of‑thought style reasoning and breaks down problems into intermediate steps before arriving at an answer, improving coherence and accuracy. Use `--enable_thinking [true/false]` (default: true) to enable/disable thinking accordingly.
- grok-4-fast-reasoning 0.20 0.50 Grok 4 Fast Reasoning delivers exceptional performance for tasks requiring logical thinking and problem-solving. With a 2M token context window and state-of-the-art cost-efficiency, it handles complex reasoning tasks with accuracy and speed, making advanced AI capabilities accessible to more users.
- nova-micro-1.0 - - Amazon Nova Micro is a text-only foundation model in the Amazon Nova family, designed for ultra‑low latency and very low cost, optimized for tasks like summarization, translation, and interactive chat. It supports a context window of 128,000 tokens, enabling handling of large text inputs in a single request.
- nova-lite-1.0 - - Amazon Nova Lite is a low‑cost multimodal foundation model from Amazon that can process text, images, and video and is optimized for speed and affordability. It offers a context window of 300,000 tokens, allowing handling of very large inputs in a single request (including up to ~30 minutes of video).
- minimax-m2 3,300.00 - MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.
  Technical Specifications
  File Support: Text, Markdown and PDF files
  Context window: 200k tokens
- hunyuan-image-3 - - Hunyuan Image 3.0 is Tencent’s next‑generation open‑source text-to-image model that uses a large multimodal Mixture-of-Experts architecture to unify image understanding and generation in one system. It produces high-fidelity, often photorealistic images with strong prompt adherence, multilingual text rendering, and intelligent world-knowledge reasoning that can enrich sparse prompts with appropriate visual details. Note: Uploading attachments is not supported.
  Parameter controls available:
  1. Image Settings
  Size / Aspect Ratio
  - Default: `--size 1024x1024` (Square 1:1)
  - `--size 768x1024` (Portrait 3:4)
  - `--size 1024x768` (Landscape 4:3)
  - `--size 1024x1536` (Tall Portrait 2:3)
  - `--size 1536x1024` (Wide Landscape 3:2)
  - `--size 512x512` (Small Square 1:1)
  Quantity
  - `--num_images [1-4]` number of images to generate (default: 1)
  Quality & Generation
  - `--num_inference_steps [10-50]` denoising steps for quality (default: 28, higher = better quality but slower)
  - `--guidance_scale [1.0-20.0]` how closely to follow the prompt (default: 7.5)
  Customization
  - `--negative_prompt "text"` things to avoid in generated images
  - `--seed [integer]` reproducible generation with a fixed seed (e.g., 42)
- kling-image-o1 - - Kling Image O1 image generation and image editing bot. Send up to 10 images to use as a reference, and refer to each image with $image1, $image2, etc. in the prompt to specify interactions. Set resolution with `--resolution` and aspect ratio with `--aspect`. Note: `auto` aspect ratio is the default and can be used only for editing; text-to-image generation has a default of `1:1`. Supports jpeg, png, heic, webp images.
- kling-2.6-pro - - Generate high-quality videos with native audio from text and images using Kling 2.6 Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`; only works for text-to-video). Use `--duration` to set either a 5 or 10 second video. Use `--silent` to generate a silent video.
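For example (an invented prompt; the flags come from the entry above): A koi pond at dawn, gentle ripples and birdsong --duration 10 --aspect 16:9 --cfg_scale 0.5 --negative_prompt "blurry, low quality"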
- flux-2-pro - - Flux.2 [Pro] is Black Forest Labs' state-of-the-art model with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows use of hex colour codes within the prompt for precise colouring. Send up to 8 images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 9 megapixels. Optional parameters: `--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
- flux-2-flex - - Flux.2 [Flex] is Black Forest Labs' latest model, with multi-reference support, fine-grained text rendering, and other features. Supports structured JSON prompts, and allows use of hex color codes within the prompt for precise coloring. Send images in jpeg/png/webp format for editing. Total megapixels (input + output) should not exceed 14 megapixels. Optional parameters: `--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
- flux-2-dev - - An open-weight 32B image generation model derived from the FLUX.2 base model. The most powerful open-weight image generation and editing model available today, combining text-to-image synthesis and image editing with multiple input images in a single checkpoint. Optional parameters: `--aspect` to set aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16
- mistral-medium-3.1 - - Mistral Medium 3.1 is a high-performance, enterprise-grade language model that delivers strong reasoning, coding, and STEM capabilities. It supports hybrid, on-prem, and in-VPC deployments, offering competitive accuracy and easy integration across cloud environments. Context Length: 131k
- exa-answer - - Get a quick LLM-style answer to a question informed by Exa search results. For more in-depth results, consider using the following endpoint: https://poe.com/Exa-Research. Supported file type upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported.
  Parameter Controls Available:
  - `--text false/true` Show text snippets under each source citation (default: false)
- exa-search - - Utilize Exa's technology for searching web pages, finding similar web pages, crawling, and more. Note: This endpoint does not return an LLM-style response (visit the following if you want an LLM-style response: https://poe.com/Exa-Answer or https://poe.com/Exa-Research). File upload is not supported. An example invocation follows this entry.
  Parameter Controls Available:
  1. Operation Mode
  - Default: `--operation search` (Web Search)
  - For finding similar pages: `--operation similar`
  - For getting page contents: `--operation contents`
  - For code search: `--operation code`
  2. Search Settings (search operation)
  - `--search_type [auto|neural|deep|fast]` search algorithm (default: auto)
  - `--show_content` display full page content in results
  - `--include_domains` comma-separated domains to include
  - `--include_text` text that must appear (up to 5 words)
  - `--exclude_text` text that must NOT appear (up to 5 words)
  3. Common Search Settings (search & similar operations)
  - `--num_results [1-100]` number of results to return (default: 10)
  - `--category [company|research paper|news|pdf|github|tweet|personal site|linkedin profile|financial report]`
  - `--exclude_domains` comma-separated domains to exclude
  4. Date Filters (search operation)
  - `--start_crawl_date` results crawled after this date (ISO 8601)
  - `--end_crawl_date` results crawled before this date (ISO 8601)
  - `--start_published_date` content published after this date (ISO 8601)
  - `--end_published_date` content published before this date (ISO 8601)
  5. Content Options (search, similar, & contents operations)
  - `--return_text` fetch page text content (default: true)
  - `--text_max_chars` limit text length (empty = unlimited)
  - `--include_html_tags` preserve HTML structure
  - `--return_highlights` get AI-selected key snippets
  - `--highlights_sentences [1-10]` sentences per highlight (default: 3)
  - `--highlights_per_url [1-10]` highlights per result (default: 3)
  - `--highlights_query` guide highlight selection
  - `--return_summary` get AI-generated summaries
  - `--summary_query` guide summary generation
  6. Advanced Options (search, similar, & contents operations)
  - `--livecrawl [fallback|never|always|preferred]` when to fetch fresh content (default: fallback)
  - `--subpages [0-10]` number of linked subpages to crawl (default: 0)
  - `--subpage_target` find specific subpages matching keyword
  7. Code Search Controls (code operation)
  - `--code_tokens [dynamic|5000|10000|20000]` response length (default: dynamic)
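Example invocation (the query text is invented; the flags come from the list above): latest results on sparse attention --operation search --search_type neural --num_results 5 --category research paper --start_published_date 2025-01-01 --return_summary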
- exa-research - - Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations. Note: Responses may take several minutes to complete depending on complexity. Supported file type upload: PDF, TXT, PNG, JPG, JPEG. Audio and video file upload is not supported.
  Parameter Controls Available:
  Model Selection
  - `--model exa-research` (Standard, default)
  - `--model exa-research-pro` (Deepest, highest quality)
  - `--model exa-research-fast` (Fastest, lightest)
- kat-coder-pro - - KAT-Coder-Pro V1 by KwaiKAT is a non-reasoning model optimized for agentic coding. It delivers strong performance on reasoning-style tasks while requiring significantly fewer output tokens than peer models. With the 1210 release, it achieved a score of 64 on the Artificial Analysis Intelligence Index, placing it in the global Top 10 and ranking first among all non-reasoning models. File Support: Text, Markdown and PDF files Context window: 256k tokens
- deepseek-v3.2-fw 5,300.00 - Model from DeepSeek that harmonizes high computational efficiency with superior reasoning and agent performance. File Support: Image (JPG, JPEG, PNG, HEIC), Other File Types (PDF, PYTHON, XLSX)
- nova-lite-2 - - Amazon Nova 2 Lite is a fast, cost-effective multimodal reasoning model from Amazon that can process text, images, documents, and video, designed for everyday workloads like chatbots, document processing, and business automation. It offers a 1 million token context window, enabling very large, complex inputs in a single request, including long documents and extended video clips (~90 minutes). Note: Video file uploads are limited to ~1GB. Also note that reasoning traces are not exposed from AWS.
  Supported file types: JPEG, PNG, GIF, WEBP, PDF, DOCX, TXT, MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP
  Parameter controls available:
  - `--enable_reasoning true/false` - Enable step-by-step reasoning (default: true).
  - `--reasoning_effort low/medium/high` - Specify the reasoning effort level (default: medium).
- gpt-oss-120b-t 1,500.00 - OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. Built with community feedback and released under Apache 2.0, this 120B parameter model provides transparency, customization, and deployment flexibility for organizations requiring complete data security & privacy control.
- gpt-oss-20b-t 450.00 - OpenAI's GPT-OSS-20B provides powerful chain-of-thought reasoning in an efficient 20B parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0 licensed model offers the perfect balance of performance and resource efficiency for diverse applications.
- amazon-nova-reel-1.1 - - Amazon Nova Reel 1.1 is an advanced AI video generation model that creates up to 2-minute multi-shot videos from text and optional image prompts, offering improved video quality, latency, and visual consistency compared to its predecessor.
- kimi-k2-think-t 13,000.00 - Kimi K2 Thinking is Moonshot AI's most capable open-source thinking model, built as a thinking agent that reasons step-by-step while dynamically invoking tools. Setting new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, K2 Thinking dramatically scales multi-step reasoning depth while maintaining stable tool-use across 200–300 sequential calls — a breakthrough in long-horizon agency with native INT4 quantization for 2x inference speed. Supported File Types: JPEG, PNG, PDF
- amazon-nova-canvas - - Amazon Nova Canvas is a high-quality image‐generation model that creates and edits images from text or image inputs—offering features like inpainting/outpainting, virtual try‑on, style controls, and background removal—all with built‑in customization.
- kimi-k2 6,300.00 - Kimi K2-Instruct-0905 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
  Key Features:
  - Large-Scale Training: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.
  - MuonClip Optimizer: We apply the Muon optimizer to an unprecedented scale, and develop novel optimization techniques to resolve instabilities while scaling up.
  - Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
  Technical Specifications
  File Support: Attachments not supported
  Context window: 256k tokens
- kimi-k2-0905-t 11,000.00 - The new Kimi K2-0905 model from Moonshot AI features a massive 256,000-token context window, double the length of its predecessor (Kimi K2), along with greatly improved coding abilities and front-end generation accuracy. It boasts 1 trillion total parameters (with 32 billion activated at a time) and claims 100% tool-call success in real-world tests, setting a new bar for open-source AI performance in complex, multi-step tasks.
- kimi-k2-t 11,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
- kimi-k2-instruct 6,000.00 - Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. Uses the latest September 5th, 2025 snapshot. The updated version has improved coding abilities, agentic tool use, and a longer (256K) context window.
- deepseek-v3.1 7,800.00 - Latest Update: Terminus Enhancement. This model has been updated with the Terminus release, addressing key user-reported issues while maintaining all original capabilities:
  - Language consistency: Reduced instances of mixed Chinese-English text and abnormal characters
  - Enhanced agent capabilities: Optimized performance of the Code Agent and Search Agent
  Core Capabilities: DeepSeek-V3.1 is a hybrid model supporting both thinking mode and non-thinking mode, built upon the original V3 base checkpoint through a two-phase long context extension approach.
  Technical Specifications
  Context Window: 128k tokens
  File Support: PDF, DOC, and XLSX files
  File Restrictions: Does not accept audio and video files
- glm-4.6-fw 6,000.00 - As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
- deepseek-v3.1-t 6,000.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
  - Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
  - Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
  - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
- glm-4.5 5,700.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
  Technical Specifications
  File Support: PDF and Markdown files
  Context window: 128k tokens
- deepseek-v3.1-n 5,700.00 - DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
  - Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
  - Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
  - Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
  Technical Specifications
  File Support: Attachments not supported
  Context window: 128k tokens
- qwen3-coder 9,000.00 - Qwen3 Coder 480B A35B Instruct is a state-of-the-art 480B-parameter Mixture-of-Experts model (35B active) that achieves top-tier performance across multiple agentic coding benchmarks. Supports 256K native context length and scales to 1M tokens with extrapolation. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company.
- claude-sonnet-4 2.60 13.00 Claude Sonnet 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 1M context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
- claude-opus-4 13.00 64.00 Claude Opus 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
- claude-opus-4-reasoning 13.00 64.00 Claude Opus 4 from Anthropic, supports customizable thinking budget (up to 30k tokens) and 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 30,768 to the end of your message.
- claude-sonnet-4-reasoning 2.60 13.00 Claude Sonnet 4 from Anthropic, supports customizable thinking budget (up to 60k tokens) and 200k context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 61,440 to the end of your message.
- o4-mini 0.99 4.00 o4-mini provides high intelligence on a variety of tasks and domains, including science, math, and coding at an affordable price point. This bot uses medium reasoning effort by default, but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
- gemini-deep-research 1.60 9.60 Gemini Deep Research plans, executes, and synthesizes complex, multi-step investigations by querying the web and other data to produce detailed, structured reports. Offers the best performance in the world on Google's newly released DeepSearchQA benchmark as of December 2025. Be sure to give your entire research request in the initial prompt and include as much detail as you can! Use the `--interaction_id` flag if you want to continue the discussion from a previous research task.
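For example, an initial Deep Research request might read (the prompt is invented; the flag comes from the entry above): "Map the competitive landscape for on-device LLM inference: key vendors, benchmarks, pricing, and trends through 2025." To continue that task later, append --interaction_id with the ID returned by the first run; the ID value itself is task-specific.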
- o4-mini-deep-research 1.80 7.20 Deep Research from OpenAI powered by the o4-mini model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
- glm-4.5-air-t 2,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
- glm-4.5-fw 5,400.00 - The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters. It unifies reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
- grok-3 - - xAI's February 2025 flagship release representing nearly state-of-the-art performance in several reasoning/problem solving domains. The API doesn't yet support reasoning mode for Grok 3, but does for https://poe.com/Grok-3-Mini; this bot also doesn't have access to the X data feed. Supports 131k tokens of context, uses Grok 2 for native vision.
- grok-3-mini - - xAI's February 2025 release with strong performance across many domains but at a more affordable price point. Supports reasoning with a configurable reasoning effort level, and 131k tokens of context; doesn't have access to the X data feed. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low" or "high".
- o3 1.80 7.20 o3 provides state-of-the-art intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high are also selectable; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
- o3-deep-research 9.00 36.00 Deep Research from OpenAI powered by the o3 model, can search through extensive web information to answer complex, nuanced research questions in various domains such as finance, consulting, and science.
- elevenlabs-v3 - - ElevenLabs v3 is a cutting-edge text-to-speech model that brings scripts to life with remarkable realism and performance-level control. Unlike traditional TTS systems, it allows creators to shape the emotional tone, pacing, and soundscape of their audio through the use of inline audio tags. These tags are enclosed in square brackets and act as stage directions, guiding how a line is spoken or what sound effects are inserted, without being spoken aloud. This enables rich, expressive narration and dialogue for applications like audiobooks, games, podcasts, and interactive media. Whether you’re aiming for a tense whisper, a sarcastic remark, or a dramatic soundscape full of explosions and ambient effects, v3 gives you granular control directly in the text prompt. This bot will also run text-to-speech on PDF attachments / URL links.
  Examples of voice delivery tags include:
  - [whispers] I have to tell you a secret.
  - [angry] That was *never* the plan.
  - [sarcastic] Oh, sure. That’ll totally work.
  - [laughs] You're hilarious.
  Examples of sound effect tags are:
  - [gunshot] Get down!
  - [applause] Thank you, everyone.
  - [explosion] What was that?!
  These can also be combined. Multiple speakers can be supported via the parameter control. Dialogue for multiple speakers must follow the format, e.g. for 3 speakers:
  Speaker 1: [dialogue]
  Speaker 2: [dialogue]
  Speaker 3: [dialogue]
  Speaker 1: [dialogue]
  Speaker 2: [dialogue]
  --speaker_count 3 --voice_1 [voice_1] --voice_2 [voice_2] --voice_3 [voice_3]
  The following voices are supported:
  Alexandra - Conversational & Real
  Amy - Young & Natural
  Arabella - Mature Female Narrator
  Austin - Good Ol' Texas Boy
  Blondie - Warm & Conversational
  Bradford - British Male Storyteller
  Callum - Gravelly Yet Unsettling
  Charlotte - Raspy & Sensual
  Chris - Down-to-Earth
  Coco Li - Shanghainese Female
  Gaming - Unreal Tonemanagement 2003
  Harry - Animated Warrior
  Hayato - Soothing Zen Male
  Hope - Upbeat & Clear
  James - Husky & Engaging
  James Gao - Calm Chinese Voice
  Jane - Professional Audiobook Reader
  Jessica - Playful American Female
  Juniper - Grounded Female Professional
  Karo Yang - Youthful Asian Male
  Kuon - Acute Fantastic Female
  Laura - Quirky Female Voice
  Liam - Warm, Energetic Youth
  Monika Sogam - Indian-English Accent
  Nichalia Schwartz - Engaging Female American
  Priyanka Sogam - Late-Night Radio
  Reginald - Brooding, Intense Villain
  ShanShan - Young, Energetic Female
  Xiao Bai - Shrill & Annoying
  Prompt input cannot exceed 5,000 characters.
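A concrete two-speaker prompt in the format above (the dialogue is invented; the tags and voices come from the documented lists):
Speaker 1: [whispers] Did you hear that?
Speaker 2: [sarcastic] It's probably nothing. [gunshot] ...Run!
--speaker_count 2 --voice_1 Hope --voice_2 Bradford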
- deepseek-v3 12,000.00 - DeepSeek-V3 – the new top open-source LLM. Updated to the March 24, 2025 checkpoint. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to Together, a US-based company. Supports 131k context window and max output of 12k tokens.
- deepseek-v3-fw 9,000.00 - DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model; able to perform well on competitive benchmarks with cost-effective training & inference. All data submitted to this bot is governed by the Poe privacy policy and is sent to Fireworks, a US-based company. Supports 131k context window and max output of 131k tokens. Updated to serve the latest March 24th, 2025 snapshot.
- deepseek-v3.1-tm 5,700.00 - DeepSeek-V3.1-Terminus preserves all original model capabilities while resolving key user-reported issues, including:
  - Language consistency: Significantly reducing mixed Chinese-English output and eliminating abnormal character occurrences
  - Agent performance: Enhanced optimization of both Code Agent and Search Agent functionality
  Use `--enable_thinking false` to disable thinking about the response before giving a final answer. The bot does not accept attachments. It also does not support billing logic.
  Context window: 128k tokens
- gpt-4.1 1.80 7.20 OpenAI’s GPT-4.1 significantly improves on past models in terms of its coding skills, long context (1M tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4o. Provides a 75% chat history cache discount. Check out the newest version of this bot here: https://poe.com/GPT-5.
- gpt-4.1-mini 0.36 1.40 GPT-4.1 mini is a small, fast & affordable model that matches or beats GPT-4o in many intelligence and vision-related tasks. Supports 1M tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
- gpt-4.1-nano 0.09 0.36 GPT-4.1 nano is an extremely fast and cheap model, ideal for text/vision summarization/categorization tasks. Supports native vision and 1M input tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5-nano.
- llama-4-scout-t 1,000.00 - Llama 4 Scout, fast long-context multimodal model from Meta. A 16-expert MoE model that excels at multi-document analysis, codebase reasoning, and personalized tasks. A smaller model than Maverick but state of the art in its size & with text + image input support. Supports 300k context.
- claude-opus-4-search 13.00 64.00 Claude Opus 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
- claude-sonnet-4-search 2.60 13.00 Claude Sonnet 4 with access to real-time information from the web. Supports customizable thinking budget of up to 126k tokens. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
- claude-sonnet-3.7 2.60 13.00 Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. For maximum extended thinking, please use https://poe.com/Claude-Sonnet-Reasoning-3.7. Supports a 200k token context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 16,384 to the end of your message.
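An illustrative request with an explicit budget (task invented; the budget must fall within the documented 0 to 16,384 range): Prove that the sum of two odd integers is even. --thinking_budget 8000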
- claude-sonnet-3.5 2.60 13.00 Anthropic's Claude Sonnet 3.5 using the October 22, 2024 model snapshot. Excels in complex tasks like coding, writing, analysis and visual processing. Has a context window of 200k of tokens (approximately 150k English words).
- claude-haiku-3.5 0.68 3.40 The latest generation of Anthropic's fastest model. Claude Haiku 3.5 has fast speeds and improved instruction following.
- gemini-2.0-flash 0.10 0.42 Gemini 2.0 Flash is Google's most popular model yet, with enhanced performance and blazingly fast response times; it supports web search grounding, so it can intelligently answer questions related to recent events. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. Supports 1 million tokens of input context. To use web search and real-time information access, add `--web_search true` to enable and `--web_search false` to disable (default setting).
- gemini-2.0-flash-lite 0.05 0.21 Gemini 2.0 Flash Lite is Google's most cost-efficient model yet, often considered a spiritual successor to Gemini 1.5 Flash in terms of capability, context window size and cost. Does not support web search (if you need search, we recommend https://poe.com/Gemini-2.0-Flash); supports 1 million tokens of input context.
- claude-sonnet-3.7-search 2.60 13.00 Claude Sonnet 3.7 with access to real-time information from the web. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
- claude-haiku-3.5-search 0.68 3.40 Claude Haiku 3.5 with access to real-time information from the web.
- qwen3-max - - Qwen3-Max is a major update to the Qwen3 series, delivering significant improvements in reasoning, instruction following, and multilingual support. It provides higher accuracy in complex tasks like coding and math, along with reduced hallucinations and better performance on open-ended questions. This model is served by Alibaba Cloud Int. from Singapore.
- gpt-oss-120b 1,200.00 - OpenAI introduces the GPT-OSS-120B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications File Support: Attachments not supported Context window: 128k tokens
- gpt-oss-20b 450.00 - OpenAI introduces the GPT-OSS-20B, an open-weight reasoning model available under the Apache 2.0 license and OpenAI GPT-OSS usage policy. Developed with feedback from the open-source community, this text-only model is compatible with OpenAI Responses API and is designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities. The GPT-OSS-20B model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. This model also performs strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). Technical Specifications File Support: Attachments not supported Context window: 128k tokens
- gpt-oss-120b-cs 3,200.00 - World’s fastest inference for GPT-OSS-120B with Cerebras. OpenAI's GPT-OSS-120B delivers sophisticated chain-of-thought reasoning capabilities in a fully open model. The bot does not accept video, PPT, DOCX, or Excel files.
- openai-gpt-oss-120b 1,500.00 - GPT-OSS-120b is a high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs.
- openai-gpt-oss-20b 750.00 - GPT-OSS-20B is a compact, open-weight language model optimized for low-latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as 120B, with faster inference and easier deployment that is ideal for specialized or offline use cases, fast responsive performance, chain-of-thought output, and agentic workflows.
- qwen3-next-instruct-t 2,400.00 - Qwen3-Next Instruct features a highly sparse MoE structure that activates only 3B of its 80B parameters during inference. Supports only instruct mode without thinking blocks, delivering performance on par with Qwen3-235B-A22B-Instruct-2507 on certain benchmarks while using less than 10% of the training cost and providing 10x+ higher throughput on contexts over 32K tokens.
- qwen3-next-think-t 3,000.00 - Qwen3-Next Thinking features the same highly sparse MoE architecture but is specialized for complex reasoning tasks. Supports only thinking mode with automatic tag inclusion, delivering exceptional analytical performance while maintaining extreme efficiency with 10x+ higher throughput on long contexts; it may generate longer thinking content than its predecessors.
- qwen3-max-n 22,000.00 - Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode. File Support: Text, Markdown and PDF files Context window: 256k tokens
- qwen3-vl-235b-a22b-t 4,800.00 - Qwen3-VL is the most advanced vision-language model in the Qwen series, offering enhanced text understanding, visual reasoning, spatial perception, and agent capabilities. It supports Dense/MoE architectures and Instruct/Thinking editions for versatile deployment. Key Features: - Visual Agent: Operates GUIs, recognizes elements, invokes tools, and completes tasks. - Coding Boost: Generates Draw.io, HTML, CSS, and JS from images/videos. - Spatial Perception: Enables 2D/3D reasoning with strong object positioning and occlusion analysis. - Long Context: Processes up to 1M tokens for books or long videos. - Multimodal Reasoning: Excels in STEM, math, causal analysis, and evidence-based answers. - Visual Recognition: Recognizes a wide range of objects, landmarks, and more. - OCR: Supports 32 languages with improved performance in challenging conditions. - Text-Vision Fusion: Achieves seamless, unified comprehension. Ideal for multimodal reasoning, spatial analysis, and integrated text-vision tasks. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
- qwen3-vl-235b-a22b-i 3,600.00 - This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment. Key Enhancements: Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos. Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers. Upgraded Visual Recognition: Broader, higher-quality pretraining is able to "recognize everything"—celebrities, anime, products, landmarks, flora/fauna, etc. Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension. Technical Specifications File Support: Image, Video, PDF and Markdown files Context window: 128k tokens
- qwen-3-235b-2507-t 1,900.00 - Qwen3 235B A22B 2507, currently the best instruct model (non-reasoning) among both closed and open source models. It excels in instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. It is also great at multilingual tasks and supports a long context window (262k).
- qwen3-235b-2507-fw 2,700.00 - State-of-the-art language model with exceptional math, coding, and problem-solving performance. Operates in non-thinking mode, and does not generate <think></think> blocks in its output. Supports 256k tokens of native context length. All data provided will not be used in training, and is sent only to Fireworks AI, a US-based company. Uses the latest July 21st, 2025 snapshot (Qwen3-235B-A22B-Instruct-2507).
- qwen3-235b-2507-cs 6,000.00 - World's fastest inference with Qwen3 235B Instruct (2507) model with Cerebras. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage.
- qwen3-coder-480b-t 17,000.00 - Qwen3‑Coder‑480B is a state-of-the-art mixture‑of‑experts (MoE) code‑specialized language model with 480 billion total parameters and 35 billion activated parameters. Qwen3‑Coder delivers exceptional performance across code generation, function calling, tool use, and long‑context reasoning. It natively supports up to 262,144‑token context windows, making it ideal for large-repository and multi‑file coding tasks.
- qwen3-coder-480b-n 7,200.00 - Qwen3-Coder-480B-A35B-Instruct delivers Claude Sonnet-comparable performance on agentic coding and browser tasks while supporting 256K-1M token long-context processing and multi-platform agentic coding capabilities. Technical Specifications File Support: Attachments not supported Context window: 256k tokens
- qwen3-235b-a22b-di 1,900.00 - Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP8.
- qwen3-235b-a22b-n 1,800.00 - It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). The bot does not currently support attachments. It features the following key enhancements: - Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage. - Substantial gains in long-tail knowledge coverage across multiple languages. - Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. - Enhanced capabilities in 256K long-context understanding. Technical Specifications: File Support: Attachments not supported. Context window: 128k tokens
- magistral-medium-2509-thinking - - Magistral Medium 2509 (thinking) by EmpirioLabs. Magistral is Mistral's first reasoning model. It is ideal for general-purpose use requiring longer thought processing and better accuracy than non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling, this model solves multi-step challenges where transparency and precision are critical. Context Window: 40k tokens. Supported file type uploads: PDF, XLSX, TXT, PNG, JPG, JPEG
- o1 14.00 54.00 OpenAI's o1 is designed to reason before it responds and provides world-class capabilities on complex tasks (e.g. science, coding, and math). Improving upon o1-preview and with higher reasoning effort, it is also capable of reasoning through images and supports 200k tokens of input context. By default, uses reasoning_effort of medium, but low, medium & high are also selectable.
- o1-pro 140.00 540.00 OpenAI’s o1-pro is a highly capable reasoning model, tailored for complex, compute- or context-heavy tasks, dedicating additional thinking time to deliver more accurate, reliable answers. For complex tasks at lower cost, https://poe.com/o3-mini is recommended. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
- cartesia-ink-whisper - - Transcribe audio files using Speech-to-Text with the Cartesia Ink Whisper model. Select the Language (`--language`) of your audio file in Settings. Default is English (en). Supported Languages: English (en) Chinese (zh) German (de) Spanish (es) Russian (ru) Korean (ko) French (fr) Japanese (ja) Portuguese (pt) Turkish (tr) Polish (pl) Catalan (ca) Dutch (nl) Arabic (ar) Swedish (sv) Italian (it) Indonesian (id) Hindi (hi) Finnish (fi) Vietnamese (vi) Hebrew (he) Ukrainian (uk) Greek (el) Malay (ms) Czech (cs) Romanian (ro) Danish (da) Hungarian (hu) Tamil (ta) Norwegian (no) Thai (th) Urdu (ur) Croatian (hr) Bulgarian (bg) Lithuanian (lt) Latin (la) Maori (mi) Malayalam (ml) Welsh (cy) Slovak (sk) Telugu (te) Persian (fa) Latvian (lv) Bengali (bn) Serbian (sr) Azerbaijani (az) Slovenian (sl) Kannada (kn) Estonian (et) Macedonian (mk) Breton (br) Basque (eu) Icelandic (is) Armenian (hy) Nepali (ne) Mongolian (mn) Bosnian (bs) Kazakh (kk) Albanian (sq) Swahili (sw) Galician (gl) Marathi (mr) Punjabi (pa) Sinhala (si) Khmer (km) Shona (sn) Yoruba (yo) Somali (so) Afrikaans (af) Occitan (oc) Georgian (ka) Belarusian (be) Tajik (tg) Sindhi (sd) Gujarati (gu) Amharic (am) Yiddish (yi) Lao (lo) Uzbek (uz) Faroese (fo) Haitian Creole (ht) Pashto (ps) Turkmen (tk) Nynorsk (nn) Maltese (mt) Sanskrit (sa) Luxembourgish (lb) Myanmar (my) Tibetan (bo) Tagalog (tl) Malagasy (mg) Assamese (as) Tatar (tt) Hawaiian (haw) Lingala (ln) Hausa (ha) Bashkir (ba) Javanese (jw) Sundanese (su) Cantonese (yue)
- chatgpt-4o-latest 4.50 14.00 Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Stronger than GPT-3.5 in quantitative questions (math and physics), creative writing, and many other challenging tasks. Supports context window of 128k tokens, cannot generate images.
- gpt-4o-mini 0.14 0.54 This intelligent small model from OpenAI is significantly smarter, cheaper, and just as fast as GPT-3.5 Turbo. Check out the newest version of this bot here: https://poe.com/GPT-5-mini.
- glm-4.6-t 6,600.00 - GLM-4.6 is the latest flagship model from Z.ai's GLM series, delivering state-of-the-art agentic and coding capabilities that rival Claude Sonnet 4. With 357B parameters in a Mixture-of-Experts architecture, an expanded 200K context window, and 30% improved token efficiency, GLM-4.6 represents the top-performing model developed in China.
- qwen3-max-preview - - A preview version of the Max model in the Tongyi Qianwen 3 series, achieving an effective integration of thinking and non-thinking modes. In thinking mode, there is a significant enhancement in capabilities such as intelligent agent programming, common-sense reasoning, and reasoning across mathematics, science, and general domains. This model is served by Alibaba Cloud Int. from Singapore. Notes: - Audio/Video files are not supported. - Max Context Window: 252k. Use `--enable_thinking true/false` to enable/disable Deep Thinking accordingly.
- o3-mini 0.99 4.00 o3-mini is OpenAI's reasoning model, providing high intelligence on a variety of tasks and domains, including science, math, and coding. This bot uses medium reasoning effort by default but low, medium & high can be selected; supports 200k tokens of input context and 100k tokens of output context. To instruct the bot to use more reasoning effort, add --reasoning_effort to the end of your message with one of "low", "medium", or "high".
- o3-mini-high 0.99 4.00 o3-mini-high is OpenAI's most recent reasoning model with reasoning_effort set to high, providing frontier intelligence on most tasks. Like other models in the o-series, it is designed to excel at science, math, and coding tasks. Supports 200k tokens of input context and 100k tokens of output context.
- llama-3.1-8b-di 300.00 - The smallest and fastest model from Meta's Llama 3.1 family. This open-source language model excels in multilingual dialogue, outperforming numerous industry benchmarks for both closed and open-source conversational AI systems. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Input token limit 128k, output token limit 8k. Quantization: FP16 (official).
- claude-sonnet-3.7-reasoning 2.60 13.00 Reasoning capabilities on by default. Claude Sonnet 3.7 is a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. Recommended for complex math or coding problems. Supports a 200k token context window. To instruct the bot to use more thinking effort, add --thinking_budget and a number ranging from 0 to 126,000 to the end of your message.
- inception-mercury - - Mercury is the first diffusion large language model (dLLM). On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. It represents a new generation of LLMs that pushes the frontier of fast, high-quality text generation.
- inception-mercury-coder - - Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder Small's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the blog post here: https://www.inceptionlabs.ai/introducing-mercury.
- mistral-medium-3 - - Mistral Medium 3 is a powerful, cost-efficient language model offering top-tier reasoning and multimodal performance. Context Window: 130k
- mistral-medium 2.70 8.10 Mistral AI's medium-sized model. Supports a context window of 32k tokens (around 24,000 words) and is stronger than Mixtral-8x7b and Mistral-7b on benchmarks across the board.
- llama-4-maverick-t 1,600.00 - Llama 4 Maverick, state of the art long-context multimodal model from Meta. A 128-expert MoE powerhouse for multilingual image/text understanding (12 languages), creative writing, and enterprise-scale applications—outperforming Llama 3.3 70B. Supports 500k tokens context.
- llama-3.3-70b-fw 4,200.00 - Meta's Llama 3.3 70B Instruct, hosted by Fireworks AI. Llama 3.3 70B is a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
- llama-3.3-70b 3,900.00 - Llama 3.3 70B, with performance similar to Llama 3.1 405B while being faster and much smaller! Llama 3.3 70B is a new open-source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost, improving over Llama 3.1 70B.
- deepseek-prover-v2 - - DeepSeek-Prover-V2 is an open-source large language model specifically designed for formal theorem proving in Lean 4. The model builds on a recursive theorem proving pipeline powered by the company's DeepSeek-V3 foundation model.
- deepseek-r1-fw 18,000.00 - State-of-the-art large reasoning model with strong problem-solving, math, and coding performance at a fraction of the cost; explains its chain of thought. All data you provide this bot will not be used in training, and is sent only to Fireworks AI, a US-based company. Supports 164k tokens of input context and 164k tokens of output context. Uses the latest May 28th, 2025 snapshot.
- deepseek-r1-di 6,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
- deepseek-r1-n 6,000.00 - DeepSeek-R1 (latest snapshot: DeepSeek-R1-0528) features enhanced reasoning and inference capabilities through optimized algorithms and increased computational resources. It excels in mathematics, programming, and logic, with performance nearing top-tier models like o3 and Gemini 2.5 Pro. This bot does not accept attachments. Technical Specifications: File Support: Attachments not supported. Context window: 160k tokens
- llama-3.3-70b-n 1,400.00 - The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Technical Specifications File Support: Attachments not supported Context window: 128k tokens
- llama-3.3-70b-cs 7,800.00 - World’s fastest inference for Llama 3.3 70B with Cerebras. The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
- llama-3.1-70b-t 14,000.00 - Llama 3.1 70B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
- llama-3.1-8b-cs 900.00 - World’s fastest inference for Llama 3.1 8B with Cerebras. This Llama 8B instruct-tuned version is fast and efficient. The Llama 3.1 8B is an instruction tuned text only model, optimized for multilingual dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.
- gpt-researcher - - GPT Researcher is an agent that conducts deep research on any topic and generates a comprehensive report with citations. GPT Researcher is powered by Tavily's search engine. GPTR is based on the popular open source project: https://github.com/assafelovic/gpt-researcher -- by integrating Tavily search, it is optimized for curation and ranking of trusted research sources. Learn more at https://gptr.dev or https://tavily.com
- web-search - - Web-enabled assistant bot that searches the internet to inform its responses. Particularly good for queries regarding up-to-date information or specific facts. Powered by Gemini 2.0 Flash.
- gpt-4o-search 2.20 9.00 OpenAI's fine-tuned model for searching the web for real-time information. For less expensive messages, consider https://poe.com/GPT-4o-mini-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
- gpt-4o-mini-search 0.14 0.54 OpenAI's fine-tuned model for searching the web for real-time information. For higher-performance, consider https://poe.com/GPT-4o-Search. Uses medium search context size, currently in preview, supports 128k tokens of context. Does not support image search.
- reka-research - - Reka Research is a state-of-the-art agentic AI that answers complex questions by browsing the web. It excels at synthesizing information from multiple sources, performing in minutes work that usually takes hours.
- perplexity-sonar - - Sonar by Perplexity is a cutting-edge AI model that delivers real-time, web-connected search results with accurate citations. It's designed to provide up-to-date information and customizable search sources, making it a powerful tool for integrating AI search into various applications. Context Length: 127k
- linkup-deep-search - - Linkup Deep Search is an AI-powered search bot that continues to search iteratively if it hasn't found sufficient information on the first attempt. Results are slower than its Standard search counterpart, but often more comprehensive. Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Context Window: 100k Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: use --include_domains to search only within specific domains, --exclude_domains to exclude domains from the search results, and --prioritize_domains to give domains higher priority in the search. 2. Date range: use --from_date and --to_date to restrict the search to a date range, in YYYY-MM-DD format. 3. Content options: use --include_image true to include relevant images in results and --image_count to set the number of images displayed (up to 45). Learn more: https://www.linkup.so/
- linkup-standard - - Linkup Standard is an AI-powered search bot that provides detailed overviews and answers sourced from the web, helping you find high-quality information quickly and accurately. Results are faster than its Deep search counterpart. Context Window: 100k Linkup's technology ranks #1 globally for factual accuracy, achieving state-of-the-art scores on OpenAI’s SimpleQA benchmark. Audio/video files are not supported at this time. Parameter controls available: 1. Domain control: use --include_domains to search only within specific domains, --exclude_domains to exclude domains from the search results, and --prioritize_domains to give domains higher priority in the search. 2. Date range: use --from_date and --to_date to restrict the search to a date range, in YYYY-MM-DD format. 3. Content options: use --include_image true to include relevant images in results and --image_count to set the number of images displayed (up to 45). Learn more: https://www.linkup.so/
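Both Linkup bots accept the same parameter controls; an illustrative query (topic, domain, and dates invented): What were the major EU AI Act milestones this year? --include_domains europa.eu --from_date 2024-01-01 --to_date 2024-12-31 --include_image true --image_count 3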
- perplexity-sonar-pro - - Sonar Pro by Perplexity is an advanced AI model that enhances real-time, web-connected search capabilities with double the citations and a larger context window. It's designed for complex queries, providing in-depth, nuanced answers and extended extensibility, making it ideal for enterprises and developers needing robust search solutions. Context Length: 200k (max output token limit of 8k)
- perplexity-sonar-rsn-pro - - This model operates on the open-sourced, uncensored R1-1776 model from Perplexity with web search capabilities. The Perplexity Sonar Rsn Pro reasoning model takes AI-powered answers to the next level, offering unmatched quality and precision. Outperforming leading search engines and LLMs, this model has demonstrated superior performance on the SimpleQA benchmark, making it the gold standard for high-quality answer generation. Context Length: 128k (max output token limit of 8k)
- perplexity-deep-research - - Perplexity Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation across domains like finance, technology, health, and current events. Context Length: 128k
- flux-pro-1.1-ultra - - State-of-the-art image generation with four times the resolution of standard FLUX-1.1-pro. Best-in-class prompt adherence and pixel-perfect image detail. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Add "--raw" (or "--raw true"; no other arguments needed) for an overall less processed, everyday aesthetic with raw photographic detail. Valid aspect ratios are 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21. Send an image to have this model reimagine/regenerate it via FLUX Redux, and use "--strength" (e.g. --strength 0.7) to control the impact of the text prompt (1 gives greater influence, 0 means very little).
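An illustrative prompt combining the documented flags (subject invented): A lighthouse on a fog-covered cliff at dawn --aspect 16:9 --raw. When sending an image for FLUX Redux, appending something like --strength 0.7 keeps the text prompt dominant.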
- mistral-small-3.1 - - Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments.
- claude-opus-3 13.00 64.00 Anthropic's Claude Opus 3 can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks. Supports 200k tokens of context (approximately 150k English words).
- sonic-3.0 6,000.00 - Generates audio based on your prompt using Cartesia's latest Sonic 3.0 text-to-speech model in your voice of choice. Supports 10k characters. You can select a voice and language in the options menu in the input bar. The following voices are supported, covering 42 languages (English, Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese): -- English -- Ariana Kiefer Tessa Brandon Linda - Conversational Guide Ronald - Thinker Brooke - Big Sister Katie - Friendly Fixer Jacqueline - Reassuring Agent Caroline - Southern Guide -- Arabic -- Amira - Dreamy Whisperer Omar - High-Energy Presenter -- Bengali -- Pooja - Everyday Assistant Rubel - City Guide -- Bulgarian -- Ivana - Instruction Provider Georgi - Conversationalist -- Chinese -- Hua - Sunny Support Yue - Gentle Woman Tao - Lecturer Lan - Instructor -- Croatian -- Petra - Strict Lecturer Ivan - Bar Companion -- Czech -- Jana - Crisp Conversationalist Petr - Pastor -- Danish -- Katrine - Calm Caregiver -- Dutch -- Bram - Instructional Daan - Business Baritone Sanne - Clear Companion Lucas - Storyteller -- Finnish -- Helmi - Warm Friend Mikko - Narration Expert -- French -- Helpful French Lady French Narrator Man Calm French Woman Antoine - Stern Man -- Georgian -- Levan - Support Guide Tamara - Support Specialist -- German -- Thomas - Anchor Viktoria - Phone Conversationalist Lukas - Professional Lena - Muse -- Greek -- Despina - Motherly Woman Nikos - Radio Storyteller -- Gujarati -- Isha - Learner Amit - Sports Student -- Hebrew -- Noam - Broadcaster -- Hindi -- Arushi - Hinglish Speaker Sunil - Official Announcer Riya - College Roommate Aadhya - Soother -- Hungarian -- Gabor - Reassuring Eszter - Customer Companion -- Indonesian -- Siti - Ad Narrator Andi - Dynamic Presenter -- Italian -- Liv - Casual Friend Alessandra - Melodic Guide Francesca - Elegant Partner Giancarlo - Support Leader -- Japanese -- Yumiko - Friendly Agent Emi - Soft-Spoken Friend Yuki - Calm Woman Daisuke - Businessman -- Kannada -- Prakash - Instructor Divya - Joyful Narrator -- Korean -- Jihyun - Anchorwoman Mimi - Show Stopper Byungtae - Enforcer Jiwoo - Service Specialist -- Malay -- Aisyah - Chat Partner Faiz - Family Guide -- Malayalam -- Latha - Friendly Host -- Marathi -- Suresh - Instruction Anika - Enthusiastic Seller -- Norwegian -- Lars - Casual Conversationalist -- Polish -- Tomek - Casual Companion Wojciech - Documentarian Piotr - Corporate Lead Katarzyna - Melodic Storyteller -- Portuguese -- Luana - Public Speaker Felipe - Casual Talker Ana Paula - Marketer Beatriz - Support Guide -- Punjabi -- Gurpreet - Companion Jaspreet - Commercial Woman -- Romanian -- Andrada - Steady Speaker Andrei - Conversationalist Guy -- Russian -- Tatiana - Friendly Storyteller Natalya - Soothing Guide Irina - Poetic Sergei - Expressive Narrator -- Slovak -- Katarina - Friendly Sales Peter - Narrator Man -- Spanish -- Pedro - Formal Speaker Daniela - Relaxed Woman Fran - Confident Young Professional Isabel - Teacher -- Swedish -- Freja - Nordic Reader Ingrid - Peaceful Guide Anders - Nordic Baritone Cees - Nordic Narrator -- Tagalog -- Luz - Casual Speaker Angelo - Calm Narrator -- Tamil -- Arun - Lively Lakshmi - Everyday -- Telugu -- Sindhu - Conversational Partner Vikram - Folk Narrator -- Thai -- Somchai - Star Suda - Fortune Teller -- Turkish -- Emre - Calming Speaker Leyla - Story Companion Azra - Service Specialist Taylan - Expressive -- Ukrainian -- Oleh - Professional Guy -- Vietnamese -- Minh - Conversational Partner Xia - Calm Companion
- hailuo-music-v1.5 - - Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. Send the song lyrics as your prompt. Use `--style` to set the style of the generated music, for example rock and roll, hip-hop, etc. For best quality, provide both lyrics and a style. The prompt supports [intro][verse][chorus][bridge][outro] sections.
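An illustrative request (lyrics invented): [intro] [verse] Neon rain on empty streets, I keep my collar high [chorus] We run until the morning light [outro] --style hip-hop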
- elevenlabs-music - - The ElevenLabs music model is a generative AI system designed to compose original music from text prompts. It allows creators to specify genres, moods, instruments, and structure, producing royalty-free tracks tailored to their needs. The model emphasizes speed, creative flexibility, and high-quality audio output, making it suitable for use in videos, podcasts, games, and other multimedia projects. This bot can produce songs with suggested lyrics based on general descriptions, exact lyrics if specified as such, or instrumental ones, all via prompting. Use `--music_length_ms` to set the length of the song in milliseconds (10,000 to 300,000 ms). Prompt input cannot exceed 2,000 characters.
- whisper-v3-large-t 3,000.00 - Whisper v3 Large is a state-of-the-art automatic speech recognition and translation model developed by OpenAI, offering 10–20% lower error rates than its predecessor, Whisper large-v2. It supports transcription and translation across numerous languages, with improvements in handling diverse audio inputs, including noisy conditions and long-form audio files.
- stable-audio-2.5 - - Stable Audio 2.5 generates high-quality audio up to 3 minutes long from text prompts, supporting text-to-audio, audio-to-audio transformations, and inpainting with customizable settings like duration, steps, CFG scale, and more. It is ideal for music production, cinematic sound design, and remixing. Note: Audio-to-audio and inpaint modes require a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - If replacing specific parts: `--mode audio-inpaint` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - `--random_seed false --seed [0-4294967294]` disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [4-8]`: Higher = better quality (recommended 6-8) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical) 5. Inpainting control (only for audio-inpaint) - `--mask_start_time [seconds]` start time of the uploaded audio to modify - `--mask_end_time [seconds]` end time of the uploaded audio to modify
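An illustrative text-to-audio request (prompt invented; values chosen from the documented ranges): Cinematic orchestral swell with distant thunder --duration 90 --cfg_scale 10 --steps 8 --output_format wav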
- stable-audio-2.0 - - Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. It is ideal for creative professionals seeking detailed and extended outputs from simple prompts. Note: Audio-to-audio mode requires a prompt alongside an uploaded audio file for generation. Parameter controls available: 1. Basic - Default: text-to-audio (no `--mode` needed) - If transforming uploaded audio: `--mode audio-to-audio` - `--output_format wav` (for high quality, otherwise omit for mp3) 2. Timing and Randomness - `--duration [1-190 seconds]` controls how long generated audio is - '--random_seed false --seed [0-4294967294]' disables random seed generation 3. Advanced - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15) - `--steps [30-100]`: Higher = better quality (recommended 50-80) 4. Transformation control (only for audio-to-audio) - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
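An illustrative Stable Audio 2.0 audio-to-audio request, sent alongside an uploaded track (prompt invented; values chosen from the documented ranges): Rework this into a mellow lo-fi beat --mode audio-to-audio --strength 0.5 --steps 60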
- hailuo-speech-02 - - Generate speech from text prompts using the MiniMax Speech-02 model. Include `--hd` at the end of your prompt for higher quality output at a higher price. You may set language with `--language`, voice with `--voice`, pitch with `--pitch`, speed with `--speed`, and volume with `--volume`. Please check the UI for allowed values for each parameter.
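An illustrative request (text invented; confirm numeric values against the allowed ranges shown in the UI): Welcome back to the evening broadcast. --speed 1.1 --hd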
- elevenlabs-v2.5-turbo - - ElevenLabs' leading text-to-speech technology converts your text into natural-sounding speech, using the Turbo v2.5 model. Simply send a text prompt, and the bot will generate audio using your choice of available voices. If you link a URL or a PDF, it will do its best to read it aloud to you. The overall default voice is Jessica, an American-English female. Add --voice "Voice Name" to the end of a message (e.g. "Hello world --voice Eric") to customize the voice used. Add --language and the two-letter, Language ISO-639-1 code to your message if you notice pronunciation errors; table of ISO-639-1 codes here: https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes (e.g. zh for Chinese, es for Spanish, hi for Hindi) The following voices are supported and recommended for each language: English -- Sarah, George, River, Matilda, Will, Jessica, Brian, Lily, Monika Sogam Chinese -- James Gao, Martin Li, Will, River Spanish -- David Martin, Will, Efrayn, Alejandro, Sara Martin, Regina Martin Hindi -- Ranga, Niraj, Liam, Raju, Leo, Manu, Vihana Huja, Kanika, River, Monika Sogam, Muskaan, Saanu, Riya, Devi Arabic -- Bill, Mo Wiseman, Haytham, George, Mona, Sarah, Sana, Laura German -- Bill, Otto, Leon Stern, Mila, Emilia, Lea, Leonie Indonesian -- Jessica, Putra, Mahaputra Portuguese -- Will, Muhammad, Onildo, Lily, Jessica, Alice Vietnamese -- Bill, Liam, Trung Caha, Van Phuc, Ca Dao, Trang, Jessica, Alice, Matilda Filipino -- Roger, Brian, Alice, Matilda French -- Roger, Louis, Emilie Swedish -- Will, Chris, Jessica, Charlotte Turkish -- Cavit Pancar, Sohbet Adami, Belma, Sultan, Mahidevran Romanian -- Eric, Bill, Brian, Charlotte, Lily Italian -- Carmelo, Luca, Alice, Lily Polish -- Robert, Rob, Eric, Pawel, Lily, Alice Norwegian -- Chris, Charlotte Czech -- Pawel Finnish -- Callum, River Hungarian -- Brian, Sarah Japanese -- Alice Prompt input cannot exceed 40,000 characters.
- sonic-2.0 - - Generates audio based on your prompt using the latest Cartesia's Sonic 2.0 text-to-speech model in your voice of choice (see below) Add --voice [Voice Name] to the end of a message to customize the voice used or to handle different language inputs (e.g. 你好 --voice Chinese Commercial Woman). All of Cartesia's voices are supported on Poe. The following voices are supported covering 15 languages (English, French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish): Here's the alphabetical list of all the top voice names: "1920's Radioman" Aadhya Adele Alabama Man Alina American Voiceover Man Ananya Anna Announcer Man Apoorva ASMR Lady Australian Customer Support Man Australian Man Australian Narrator Lady Australian Salesman Australian Woman Barbershop Man Brenda British Customer Support Lady British Lady British Reading Lady Brooke California Girl Calm French Woman Calm Lady Camille Carson Casper Cathy Chongz Classy British Man Commercial Lady Commercial Man Confident British Man Connie Corinne Customer Support Lady Customer Support Man Dallas Dave David Devansh Elena Ellen Ethan Female Nurse Florence Francesca French Conversational Lady French Narrator Lady French Narrator Man Friendly Australian Man Friendly French Man Friendly Reading Man Friendly Sidekick German Conversational Woman German Conversation Man German Reporter Man German Woman Grace Griffin Happy Carson Helpful French Lady Helpful Woman Hindi Calm Man Hinglish Speaking Woman Indian Lady Indian Man Isabel Ishan Jacqueline Janvi Japanese Male Conversational Joan of Ark John Jordan Katie Keith Kenneth Kentucky Man Korean Support Woman Laidback Woman Lena Lily Whisper Little Gaming Girl Little Narrator Girl Liv Lukas Luke Madame Mischief Madison Maria Mateo Mexican Man Mexican Woman Mia Middle Eastern Woman Midwestern Man Midwestern Woman Movieman Nathan Newslady Newsman New York Man Nico Nonfiction Man Olivia Orion Peninsular Spanish Narrator Lady Pleasant Brazilian Lady Pleasant Man Polite Man Princess Professional Woman Rebecca Reflective Woman Ronald Russian Storyteller Man Salesman Samantha Angry Samantha Happy Sarah Sarah Curious Savannah Silas Sophie Southern Man Southern Woman Spanish Narrator Woman Spanish Reporter Woman Spanish-speaking Reporter Man Sportsman Stacy Stern French Man Steve Storyteller Lady Sweet Lady Tatiana Taylor Teacher Lady The Merchant Tutorial Man Wise Guide Man Wise Lady Wise Man Wizardman Yogaman Young Shy Japanese Woman Zia
- gemini-2.5-flash-tts - - Gemini‑2.5‑Flash‑TTS is Google’s low‐latency text‑to‑speech model that converts text input into audio output, supporting both single‑ and multi‑speaker voices with controllable style, accent, and expressive tone — ideal for applications like podcasts, audiobooks, and conversational voice systems. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
- gemini-2.5-pro-tts - - Gemini‑2.5‑Pro‑TTS is Google’s highest‑quality text‑to‑speech model preview, designed for complex workflows like podcasts, audiobooks, and customer support; it delivers expressive, accent‑ and style‑controllable single‑ or multi‑speaker speech, supporting over 23 languages, and built for state‑of‑the‑art output with the most powerful model architecture. This bot does not accept attachments. Parameter controls available: 1. Voice & Style Configuration - Basic Settings - `--mode single` (default) for single speaker or `--mode multi` for conversation - `--language [code]` (e.g., en-US, fr-FR, ja-JP; default: en-US) - `--output_format [MP3|WAV|OGG]` (default: MP3) - Single speaker: `--voice [voice_name]` (default: Charon) - Multi-speaker: `--voice [voice_name]` (primary speaker, default: Charon), `--voice2 [voice_name]` (secondary speaker, default: Kore) - Multi-speaker: `--speaker1_name [name]` (default: Speaker1), `--speaker2_name [name]` (default: Speaker2) - Style Instructions - `--style_prompt [text]` for tone/emotion (e.g., "Cheerful tone", "Slow British accent") 2. Limitations - Text and style prompt limited to 4000 bytes each - Multi-speaker requires `SpeakerName: text` format Available voices: Zephyr (Bright), Puck (Upbeat), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Umbriel (Easy-going), Algieba (Smooth), Despina (Smooth), Erinome (Clear), Algenib (Gravelly), Rasalgethi (Informative), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Achird (Friendly), Zubenelgenubi (Casual), Vindemiatrix (Gentle), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm) Available languages: English (US, en-US), Arabic (Egyptian, ar-EG), Bengali (Bangladesh, bn-BD), Dutch (Netherlands, nl-NL), French (France, fr-FR), German (Germany, de-DE), Hindi (India, hi-IN), Indonesian (Indonesia, id-ID), Italian (Italy, it-IT), Japanese (Japan, ja-JP), Korean (Korea, ko-KR), Marathi (India, mr-IN), Polish (Poland, pl-PL), Portuguese (Brazil, pt-BR), Romanian (Romania, ro-RO), Russian (Russia, ru-RU), Spanish (US, es-US), Tamil (India, ta-IN), Telugu (India, te-IN), Thai (Thailand, th-TH), Turkish (Turkey, tr-TR), Ukrainian (Ukraine, uk-UA), Vietnamese (Vietnam, vi-VN)
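Both Gemini TTS bots share the same parameter syntax; an illustrative multi-speaker request (names and dialogue invented; Puck and Kore are documented voices):
Alice: So, what's on the agenda today?
Bob: First, the quarterly numbers.
--mode multi --voice Puck --voice2 Kore --speaker1_name Alice --speaker2_name Bob --style_prompt "Light, conversational tone"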
- orpheus-tts - - Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. Send a text prompt to voice it. Use --voice to choose from one of the available voices (`tara`, `leah`, `jess`, `leo`, `dan`,`mia`, `zac`, `zoe`). Officially supported sound effects are: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>, and <giggle>.
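An illustrative prompt using the documented tags and a documented voice: I locked myself out again <sigh> classic me <laugh> --voice tara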
- deepgram-nova-3 - - Transcribe audio files using Speech-to-Text technology with the Deepgram Nova-3 model, featuring multi-language support and advanced customizable settings. [1] Basic Features: Use `--generate_pdf true` to generate a PDF file of the transcription. Use `--diarize true` to identify different speakers in the audio; this will automatically enable utterances. Use `--smart_format false` to disable automatic text formatting for improved readability, including punctuation and paragraphs; this feature is enabled by default. [2] Advanced Features: Use `--dictation true` to convert spoken commands for punctuation into their respective marks (e.g., 'period' becomes '.'); this will automatically enable punctuation. Use `--measurements true` to format spoken measurement units into abbreviations. Use `--profanity_filter true` to replace profanity with asterisks. Use `--redact_pci true` to redact payment card information. Use `--redact_pii true` to redact personally identifiable information. Use `--utterances true` to segment speech into meaningful semantic units. Use `--paragraphs false` to disable the paragraphs feature, which splits audio into paragraphs to improve transcript readability and automatically enables punctuation; this is enabled by default. Use `--punctuate false` to disable the punctuate feature, which adds punctuation and capitalization to your transcript; this is enabled by default. Use `--numerals false` to disable the numerals feature, which converts numbers from written format to numerical format. [3] Languages Supported: Auto-detect (Default) English Spanish French German Italian Portuguese Japanese Chinese Hindi Russian Dutch [4] Key Terms: `--keyterm` to enter important terms to improve recognition accuracy, separated by commas. English only, limited to 500 tokens total.
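An illustrative transcription request, sent with an audio file attached (key terms invented): --diarize true --redact_pii true --keyterm Deepgram, diarization --generate_pdf true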
- playai-tts - - Generates audio based on your prompt using PlayHT's text-to-speech model, in the voice of your choice. Use --voice [voice_name] to pass in the voice of your choice, choosing one from below. Voice defaults to `Jennifer_(English_(US)/American)`. Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli)
- unreal-speech-tts - - Convert chats, URLs, and documents into natural speech. 8 Languages: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese. Use `--voice <VOICE_NAME>`. Defaults to `--voice Sierra`. Full list below: American English - Male: Noah, Jasper, Caleb, Ronan, Ethan, Daniel, Zane, Rowan - Female: Autumn, Melody, Hannah, Emily, Ivy, Kaitlyn, Luna, Willow, Lauren, Sierra British English - Male: Benjamin, Arthur, Edward, Oliver - Female: Eleanor, Chloe, Amelia, Charlotte Japanese - Male: Haruto - Female: Sakura, Hana, Yuki, Rina Chinese - Male: Wei, Jian, Hao, Sheng - Female: Mei, Lian, Ting, Jing Spanish - Male: Mateo, Javier - Female: Lucía French - Female: Élodie Hindi - Male: Arjun, Rohan - Female: Ananya, Priya Italian - Male: Luca - Female: Giulia Portuguese - Male: Thiago, Rafael - Female: Camila
- imagen-4-ultra 42,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-exp-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
- imagen-4-fast 14,000.00 - DeepMind's June 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-fast-generate-preview-06-06` model from Google Vertex, and has a maximum input of 480 tokens.
- imagen-4 28,000.00 - DeepMind's May 2025 text-to-image model with exceptional prompt adherence, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). Non-English input will be translated first. Serves the `imagen-4.0-ultra-generate-05-20` model from Google Vertex, and has a maximum input of 480 tokens.
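All three Imagen 4 bots take the same flag; an illustrative prompt (subject invented): A watercolor fox curled up in a snowy pine forest --aspect_ratio 16:9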
- phoenix-1.0 17,000.00 - High-fidelity image generation with strong prompt adherence, especially for long and detailed instructions. Phoenix is capable of rendering coherent text in a wide variety of contexts. Prompt enhance is on by default to show the full power of a long, detailed prompt, but it can be turned off for full control. Uses the Phoenix 1.0 Fast model for performant, high-quality generations. Parameters: - Aspect Ratio (1:1, 3:2, 2:3, 9:16, 16:9) - Prompt Enhance (enhances the prompt for better image generation) - Style (please see the parameter controls for available styles) Image generation prompts can be a maximum of 1500 characters.
- dreamina-3.1 - - ByteDance's Dreamina 3.1 Text-to-Image showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details. This model excels with long prompts; please use detailed prompts if you encounter Content Checker issues. The model does not accept attachments. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, & 9:16.
- qwen-image 20,000.00 - Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Experiments show strong general capabilities in image generation, with exceptional performance in text rendering, especially for Chinese. Prompt input cannot exceed 2,000 characters.
- qwen-image-20b - - Qwen-Image (20B) is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt.
- hunyuan-image-2.1 - - Hunyuan Image 2.1 is a high-quality, highly efficient text-to-image model. Send a prompt to generate an image. Use `--aspect` (one of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`) to set the aspect ratio of the generated image. Use `--negative_prompt` (examples: blur, low resolution, poor quality) to set a negative prompt on the generated image. This bot does not accept attachments.
- flux-kontext-max - - FLUX.1 Kontext [max] is a new premium model from Black Forest Labs that brings maximum performance across all aspects. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image generation. Available aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21.
- flux-kontext-pro - - The FLUX.1 Kontext [pro] model delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, flawless typography, and image editing capabilities. Send a prompt to generate an image, or send an image along with an instruction to edit the image. Use `--aspect` to set the aspect ratio for text-to-image generation. Available aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, & 9:21.
- flux-krea - - FLUX-Krea is a version of FLUX Dev tuned for superior aesthetics. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Krea Redux.
- imagen-3 28,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For simpler prompts, faster results, & lower cost, use @Imagen3-Fast. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
- wan-animate - - Wan Animate takes in an image and a video to generate another video where a character in the image replaces a character in the video (default), or the video character's motion is used to animate the character in the image. Pass --animate for the second mode. The bot supports only four file types: JPEG, PNG, WebP, and MP4.
- imagen-3-fast 14,000.00 - Google DeepMind's highest quality text-to-image model, capable of generating images with great detail, rich lighting, and few distracting artifacts — optimized for short, simple prompts. To adjust the aspect ratio of your image add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4). For more complex prompts, use @Imagen3. Non-English input will be translated first. Image prompt cannot exceed 480 tokens.
- seedream-3.0 - - Seedream 3.0 by ByteDance is a bilingual (Chinese and English) text-to-image model.
- seedance-1.0-pro - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`). Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution. `--duration` (3 to 12) sets the video duration in seconds. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`.
- seedance-1.0-lite - - Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4` and `9:16`). Use `--resolution` (one of `480p`, `720p` and `1080p`) to set the video resolution. Use `--duration` (3 to 12) to set the video duration. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`.
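As an illustration for either Seedance bot above, a hypothetical prompt combining the documented parameters (the scene text is invented): `A paper boat drifting down a rainy street gutter --aspect 16:9 --resolution 720p --duration 5`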
- ideogram-v3 - - Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. Use `--aspect` to set the aspect ratio (valid aspect ratios are 5:4, 4:3, 4:5, 1:1, 1:2, 1:3, 3:4, 3:1, 3:2, 2:1, 2:3, 16:9, 16:10, 10:16, 9:16), and use `--style` to specify a style (one of `AUTO`, `GENERAL`, `REALISTIC`, and `DESIGN`; default: `AUTO`). Send one image with a prompt for image remixing/restyling. Send two images (one an image and the other a black-and-white mask image denoting an area) for image editing.
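A hypothetical example combining both documented flags (the subject is invented): `Minimalist poster of a mountain sunrise with the word "ASCEND" --aspect 3:4 --style DESIGN`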
- ideogram-v2 57,000.00 - Latest image model from Ideogram, with industry-leading capabilities in generating realistic images, graphic design, typography, and more. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, 1:1. The "--style" parameter can be used to specify the style of the generated image (GENERAL, REALISTIC, DESIGN, RENDER_3D, ANIME). Powered by Ideogram.
- flux-dev-di 5,000.00 - High quality image generator using the FLUX dev model. Top-of-the-line prompt following, visual quality and output diversity. This model is text-to-image only and does not accept attachments. Available parameters: set the width with "--width" (128 to 1920 pixels; default: 1024); set the height with "--height" (128 to 1920 pixels; default: 1024); set "--seed" for reproducible results (1 to 2**32; default: random); set the number of inference steps with "--num_inference_steps" (1 to 50; default: 25).
- flux-schnell-di 990.00 - This is the fastest version of FLUX, featuring highly optimized abstract models that excel at creative and unconventional renders. Available parameters: set the width with "--width" (128 to 1920 pixels; default: 1024); set the height with "--height" (128 to 1920 pixels; default: 1024); set "--seed" for reproducible results (1 to 2**32; default: random); set the number of inference steps with "--num_inference_steps" (1 to 50; default: 1).
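For either FLUX DI bot above, a sketch of a fully parameterized prompt (all values are arbitrary examples within the documented ranges): `A foggy harbor at dawn, muted palette --width 1280 --height 720 --seed 42 --num_inference_steps 25`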
- flux-pro-1.1 - - State-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is the most powerful version of FLUX 1.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
- luma-photon-flash - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards - not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
- hidream-i1-full - - HiDream-I1 is a state-of-the-art text-to-image model by HiDream. Use `--aspect` to set the aspect ratio. Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Use `--negative_prompt` to set the negative prompt. Hosted by fal.ai.
- retro-diffusion-core - - Generate true game-ready pixel art in seconds at any resolution between 16x16 and 512x512 across a variety of styles. Create 48x48 walking animations of sprites using the "animation__four_angle_walking" style! First 50 basic image requests worth of points free! Check out more settings below 👇 Example message: "A cute corgi wearing sunglasses and a party hat --ar 128:128 --style rd_fast__portrait" Settings: --ar <width>:<height> (Image size in pixels, larger images cost more. Or aspect ratio like 16:9) --style <style_name> (The name of the style you want to use. Available styles: rd_fast__anime, rd_fast__retro, rd_fast__simple, rd_fast__detailed, rd_fast__game_asset, rd_fast__portrait, rd_fast__texture, rd_fast__ui, rd_fast__item_sheet, rd_fast__mc_texture, rd_fast__mc_item, rd_fast__character_turnaround, rd_fast__1_bit, animation__four_angle_walking, rd_plus__default, rd_plus__retro, rd_plus__watercolor, rd_plus__textured, rd_plus__cartoon, rd_plus__ui_element, rd_plus__item_sheet, rd_plus__character_turnaround, rd_plus__isometric, rd_plus__isometric_asset, rd_plus__topdown_map, rd_plus__top_down_asset) --seed (Random number, keep the same for consistent generations) --tile (Creates seamless edges on applicable images) --tilex (Seamless horizontally only) --tiley (Seamless vertically only) --native (Returns pixel art at native resolution, without upscaling) --removebg (Automatically remove the background) --iw <decimal between 0.0 and 1.0> (Controls how strong the image generation is. 0.0 for small changes, 1.0 for big changes) Additional notes: All styles have a size range of 48x48 -> 512x512, except for the "mc" styles, which have a size range of 16x16 -> 128x128, and the "animation__four_angle_walking" style, which will only create 48x48 animations.
- stablediffusion3.5-l - - Stability.ai's StableDiffusion3.5 Large, hosted by @fal, is the Stable Diffusion family's most powerful image generation model both in terms of image quality and prompt adherence. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16.
- flux-schnell - - Turbo-speed image generation with strengths in prompt following, visual quality, image detail and output diversity. This is the fastest version of FLUX.1. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
- gpt-image-1 - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. For a conversational editing experience, use https://poe.com/GPT-4o (all users) or https://poe.com/Assistant (subscribers) instead. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): Aspect ratio of the output image. `--quality` (options: high, medium, low): Image resolution. `--use_mask`: Indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas showing which parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
- gpt-image-1-mini - - OpenAI's model that powers image generation in ChatGPT, offering exceptional prompt adherence, level of detail, and quality. It supports editing, restyling, and combining images attached to the latest user query. Optional parameters: `--aspect` (options: 1:1, 3:2, 2:3): Aspect ratio of the output image. `--quality` (options: high, medium, low): Image resolution. `--use_mask`: Indicates that the last attached image is a mask for in-painting (editing specific regions); the mask must match the dimensions of the base image, with transparent (zero-alpha) areas showing which parts to edit. Set `--use_high_fidelity` to false to disable high input fidelity; this option is enabled by default.
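A hypothetical edit request for either GPT-Image bot above (the attachment and wording are invented): attach a photo and send `Make the sky a pastel sunset --aspect 3:2 --quality high`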
- veo-3.1 - - Google’s Veo 3.1 is an updated version of the Veo family of models that features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`. A negative prompt can be set by adding `--no` before elements to avoid, e.g. `--no blurry`, `--no cloudy`. `--duration` to set the duration (one of `4s`, `6s`, or `8s`), which defaults to `8s`. `--seed` to set the seed (a number value). `--reference-mode` to use input images (3 max) as references for video generation. For first & last frame video generation and references support, please use www.poe.com/Veo-v3.1
- veo-3.1-fast - - Google’s Veo 3.1 Fast is an updated version of the Veo family of models that's optimized for speed and cost, but still features richer native audio, from natural conversations to synchronized sound effects, and offers greater narrative control with an improved understanding of cinematic styles. Enhanced image-to-video capabilities ensure better prompt adherence while delivering superior audio and visual quality and maintaining character consistency across multiple scenes. Optional parameters: `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`. A negative prompt can be set by adding `--no` before elements to avoid, e.g. `--no blurry`, `--no cloudy`. `--duration` to set the duration (one of `4s`, `6s`, or `8s`), which defaults to `8s`. `--seed` to set the seed (a number value). For first & last frame video generation support, please use www.poe.com/Veo-v3.1-Fast
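An illustrative prompt for either Veo 3.1 bot above, using the parameters as documented (the scene and values are invented): `A street musician plays violin as snow falls, soft camera push-in --aspect_ratio 9:16 --duration 6s --no blurry --seed 7`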
- sora-2-pro - - Sora 2 Pro is OpenAI’s state-of-the-art video and audio generation model, capable of creating richly detailed, dynamic clips with synchronized audio from natural language prompts or images. It builds on Sora 2’s capabilities with enhanced physical accuracy, intricate world-state persistence, and higher fidelity in cinematic styles. The model excels at generating synchronized dialogue, sound effects, and realistic simulations, all while adhering to real-world physics. Sora 2 Pro also supports seamless editing, complex multi-shot prompt execution, and the integration of real-world elements like people, animals, and objects with unparalleled detail and accuracy. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): Video output duration in seconds `--size` (options: [Landscape] - 1280x720, 1792x1024, [Portrait] - 720x1280, 1024x1792): Resolution of the output video
- sora-2 - - Sora 2 is OpenAI’s latest video and audio generation model, delivering exceptional realism, physical accuracy, and controllability. It excels at creating cinematic scenes, synchronized dialogue, sound effects, and dynamic simulations while faithfully adhering to the laws of physics. The model supports editing, multi-shot prompt adherence, and the integration of real-world elements, such as people, animals, and objects. This bot supports text-to-video and image-to-video generation. Optional parameters: `--duration` (options: 4, 8, 12): Video output duration in seconds `--size` (options: [landscape] - 1280x720, [portrait] - 720x1280): Resolution of the output video
- kling-2.5-turbo-std - - Generate high-quality videos from images using Kling 2.5 Turbo Standard. Optional parameters: Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--duration` to set either a 5 or 10 second video. Note: only image-to-video is supported; the aspect ratio is inferred automatically from the image and cannot be set. Supported image file formats: jpeg, png, webp.
- wan-2.6 - - WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling—creating high-fidelity videos from text and/or images while keeping characters and style consistent across scenes. It also supports native audio-visual sync (including lip-sync) and can generate or align dialogue/music/SFX with the visuals, enabling “prompt-to-video” results that feel production-ready without heavy post work. Notes: - This model is served from the Singapore area. - Upload an image to enable image-to-video generations or video(s) for video-to-video generations. - Responses may take 5 minutes or more to finish generating. Parameter controls available: 1. Video Settings - `--resolution 1080p` (default) or `--resolution 720p` - `--aspect_ratio 16:9` (default), `9:16`, `1:1`, `4:3`, or `3:4` (ignored for image-to-video as it uses the input image's aspect ratio) - `--duration [5, 10, or 15]` seconds (default: 5) (video-to-video limited to 10s max) 2. Advanced Settings - `--prompt_extend true` (default) or `--prompt_extend false`: AI prompt enhancement - `--audio true` (default) or `--audio false`: Enable/disable audio generation - `--shot_type multi` (default) or `--shot_type single`: Multi-shot narrative vs single continuous shot - `--seed [0-2147483646]`: Random seed for reproducibility - `--negative_prompt "text"`: Describe what you don't want in the video 3. Attachments - For i2v: Attach an image as the first frame - For r2v: Attach 1-3 reference videos (2-30 seconds each, MP4/MOV) (Use `character1`, `character2`, `character3` in prompt to reference subjects, ex. character1 references the subject in the first uploaded video) - For t2v/i2v: Optionally attach an audio file (3-30 seconds, max 15mb, .mp3/.wav) for custom audio 4. Multi-Shot Prompting - For multi-shot mode, use timeline syntax: `[Shot #] [Timestamp] [Action]`. Example: `[Shot 1] [0-5s] Wide shot of city skyline. [Shot 2] [5-10s] Close-up of character walking.` - Ensure timestamps match your selected duration and use transition keywords like "Hard cut" or "Fade in" between shots.
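A hypothetical full request in multi-shot mode (the shots and values are invented; the timestamps match the chosen duration): `[Shot 1] [0-5s] Aerial shot of a coastal village at dawn. Hard cut. [Shot 2] [5-10s] Close-up of a fisherman casting a net. --duration 10 --resolution 720p --aspect_ratio 16:9`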
- seedream-4.0 - - Seedream 4.0 is ByteDance's latest and best text-to-image model, capable of impressive high-fidelity image generation, with great text-rendering ability. Seedream 4.0 can also take in multiple images as references and combine them together or edit them to return an output. Pass `--aspect` to set the aspect ratio for the model (one of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`).
- kling-2.5-turbo-pro - - Generate high-quality videos from text and images using Kling 2.5 Turbo Pro. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`; only works for text-to-video). Use `--duration` to set either a 5 or 10 second video.
- kling-2.1-master - - Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (one of `16:9`, `9:16` and `1:1`). Use `--duration` to set either a 5 or 10 second video.
- hailuo-02 - - Hailuo-02, MiniMax's latest video generation model. Generates 6-second, 768p videos with strong motion effects and ultra-clear quality. Submit a text prompt, or an image with a prompt describing the desired video behavior, and it will create the video; generation typically takes ~5 minutes.
- hailuo-02-standard - - MiniMax Hailuo-02 Video Generation model: Advanced image-to-video generation model with 768p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Use `--duration` to set the video duration (6 or 10 seconds).
- hailuo-02-pro - - MiniMax Hailuo-02 Pro Video Generation model: Advanced image-to-video generation model with 1080p resolution. Send a prompt with an image for image-to-video, and just a prompt for text-to-video generation. Generates 5-second videos.
- deepseek-r1-turbo-di 15,000.00 - Top open-source reasoning LLM rivaling OpenAI's o1 model; delivers top-tier performance across math, code, and reasoning tasks at a fraction of the cost. Turbo model is quantized to achieve higher speeds. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
- hailuo-director-01 - - Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control. Both text-to-video and image-to-video are supported. Camera movement instructions can be added using square brackets (e.g. [Pan left] or [Zoom in]). You can use up to 3 combined movements per prompt. Duration is fixed to 5 seconds. Supported movements: Truck left/right, Pan left/right, Push in/Pull out, Pedestal up/down, Tilt up/down, Zoom in/out, Shake, Tracking shot, Static shot. For example: [Truck left, Pan right, Zoom in]. For a more detailed guide, refer to https://sixth-switch-2ac.notion.site/T2V-01-Director-Model-Tutorial-with-camera-movement-1886c20a98eb80f395b8e05291ad8645
- pixverse-v5 - - Pixverse v5 offers advanced creative tools with three main features: Text-to-Video, which transforms written prompts into cinematic, high-detail video clips with fluid motion and accurate visual interpretation; Image-to-Video, which animates static images into dynamic short videos with lifelike motion and smooth transitions; and Transition, which generates seamless morphs between frames or scenes to create unified, professional-quality visual flow. Parameter Controls and Usage: 1. Video Generation (Main Control Section) - `--resolution [360p|540p|720p|1080p]` - Description: Video resolution. - Default: 720p - `--duration [5|8]` - Description: Video length in seconds. - Default: 5 - `--aspect_ratio [16:9|4:3|1:1|3:4|9:16]` - Description: Video aspect ratio. - Default: 16:9 - `--style [none|anime|3d_animation|clay|comic|cyberpunk]` - Description: Video style (optional). - Default: none - `--negative_prompt "[text]"` - Description: Elements to avoid (optional). - Default: "" (empty) - `--seed [integer]` - Description: Optional seed for reproducibility (e.g., 12345). - Default: "" (empty/random) 2. Generation Modes (Determined by attachments) - Text-to-Video: Provide a prompt with 0 image attachments. - Image-to-Video: Provide 1 image attachment. - Transition: Provide 2 image attachments (first is start frame, second is end frame). 3. Limitations - The combination of `--resolution 1080p` and `--duration 8` is not supported. - Only 0, 1, or 2 image attachments are supported. - Attachments must be images (PNG/JPEG/WEBP/TIFF/BMP/HEIC/GIF).
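A hypothetical text-to-video request showing the flags above together (all values are arbitrary examples from the documented options): `A koi pond rippling in the rain --resolution 540p --duration 5 --aspect_ratio 9:16 --style anime --negative_prompt "text, watermark" --seed 12345`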
- wan-2.5 - - Wan-2.5 Video Generation bot. Has text-to-video and image-to-video capabilities. Optionally, send an audio file (mp3) to guide the video generation. Optional parameters: control the output resolution with `--resolution` (480p, 720p or 1080p; defaults to 720p); pricing varies by resolution. Set the aspect ratio with `--aspect` (16:9, 1:1, 9:16; defaults to 16:9) and the duration with `--duration` (5s or 10s; defaults to 5s).
- pixverse-v4.5 - - Pixverse v4.5 is a video generation model capable of generating high quality videos in under a minute. Use `--negative_prompt` to set the negative prompt. Use `--duration` to set the video duration (5 or 8 seconds). Set the resolution (360p, 540p, 720p or 1080p) using `--resolution`. Send 1 image to perform an image-to-video task or a video effect generation task, and 2 images to perform a video transition task, using the first image as the first frame and the second image as the last frame. Use `--effect` to set the video generation effect, provided 1 image is given (Options: `Kiss_Me_AI`, `Kiss`, `Muscle_Surge`, `Warmth_of_Jesus`, `Anything,_Robot`, `The_Tiger_Touch`, `Hug`, `Holy_Wings`, `Hulk`, `Venom`, `Microwave`). Use `--style` to set the video generation style (for text-to-video, image-to-video, and transition only; options: `anime`, `3d_animation`, `clay`, `comic`, `cyberpunk`). Use `--seed` to set the seed and `--aspect` to set the aspect ratio.
- flux-dev - - High-performance image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. This is a more efficient version of FLUX-pro, balancing quality and speed. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 16:9, 4:3, 1:1, 3:4, 9:16. Send an image to have this model reimagine/regenerate it via FLUX Redux.
- lyria - - Google DeepMind's Lyria 2 delivers high-quality audio generation, capable of creating diverse soundscapes and musical pieces from text prompts. Allows users to specify elements to exclude in the audio using the "--no" parameter at the end of the prompt. Also supports "--seed" for deterministic generation. e.g. "An energetic electronic dance track --no vocals, slow tempo --seed 123". Lyria blocks prompts that name specific artists or songs (artist-intent and recitation checks). This bot does not support attachments. This bot accepts input prompts of up to 480 tokens.
- kling-1.6-pro - - Kling v1.6 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
- clarity-upscaler - - Upscales images with high fidelity to the original image. Use "--upscale_factor" (value is a number between 1 and 4) to set the upscaled images' size (2 means the output image is 2x in size, etc.). "--creativity" and "--clarity" can be set between 0 and 1 to alter the faithfulness to the original image and the sharpness, respectively. This bot supports .jpg and .png images.
- topazlabs 30.00 - Topaz Labs’ image upscaler is a best-in-class generative AI model that increases the overall clarity and pixel count of input photos — whether generated by AI image models or captured in the real world — while preserving the original photo’s contents. It can produce images as small as ~10MB and as large as 512MB, depending on the size of the input photo. Specify --upscale and a number up to 16 to control the upscaling factor, output_height and/or output_width to specify the number of pixels for each dimension, and add --generated if the input photo is AI-generated. With no parameters specified, it will double both the input photo’s height and width; it is especially effective on images of human faces.
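An illustrative request (parameters chosen arbitrarily from the documented options): attach a photo and send `--upscale 4 --generated` to quadruple an AI-generated image's dimensions.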
- veo-v3.1 - - Google's Veo-3.1 is an improved version of Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt (examples: `blur`, `low resolution`, `poor quality`; T2V only). Use `--duration` to set the duration (one of `4s`, `6s`, `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task. Pass up to 3 images with `--reference` for a reference-to-video task. Reference images will be directly used in the video generation.
- veo-v3.1-fast - - Google's Veo 3.1 Fast is a fast version of Veo 3.1. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `9:16`). Use `--silent` to generate a silent video at a lower cost. Use `--negative_prompt` to set a negative prompt (examples: `blur`, `low resolution`, `poor quality`; T2V only). Use `--duration` to set the duration (one of `4s`, `6s`, `8s`; default `8s`); `4s` and `6s` are only supported for text-to-video generation. Pass a single image for image-to-video tasks. Pass two images for a first-frame-to-last-frame video generation task.
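For either Veo 3.1 bot above, a hypothetical text-to-video prompt (the scene is invented): `A lantern festival over a river at night --aspect 16:9 --duration 8s`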
- wan-2.2 - - Wan-2.2 is a video model that generates videos with high visual quality and motion diversity from text prompts. Send one image for image-to-video tasks, and send two images for first-frame-to-last-frame generation. Use `--aspect` to set the aspect ratio (one of `16:9`, `1:1`, `9:16`) for text-to-video requests. Duration is limited to 5 seconds, with resolution up to 720p.
- ltx-2-fast - - LTX-2 Fast is a video model by Lightricks that delivers exceptional quality and speed. It can generate videos at up to 50 FPS in high resolutions and supports both text-to-video and image-to-video generation. Optional parameters: Use `--generate-audio` to generate audio with the video; this is disabled by default. Pass the resolution as `--resolution` with one of `1080p`, `1440p`, `2160p`; this is set to 1080p by default. Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`); this is set to 6s by default. Duration and resolution values will change the price. Set the fps of the generated video with `--fps` (25 or 50); this is set to 25 by default. File attachments accepted: jpeg, png, webp.
- ltx-2-pro - - LTX-2 Pro is an advanced video generation model by Lightricks designed for professional‑grade results. It offers high‑quality, realistic video generation at exceptional speed and supports outputs up to 2K resolution. Perfect for both text‑to‑video and image‑to‑video creation, it delivers cinematic detail and smooth performance. Optional parameters: Use `--generate_audio` to generate audio with the video; this is disabled by default. Pass the resolution as `--resolution` with one of `1080p`, `1440p`, `2160p`; this is set to 1080p by default. Set the duration of the generated video with `--duration` (one of `6s`, `8s`, `10s`); this is set to 6s by default. Duration and resolution values will change the price. Set the fps of the generated video with `--fps` (25 or 50); this is set to 25 by default. File attachments accepted: jpeg, png, webp.
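For either LTX-2 bot above, a sketch of a parameterized prompt (values picked arbitrarily from the documented options; note that higher duration and resolution raise the price): `A hummingbird hovering over wildflowers in slow motion --resolution 1440p --duration 8s --fps 50`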
- veo-3 - - Veo 3 produces incredibly high-quality videos across a diverse range of subjects and styles. It incorporates an enhanced understanding of real-world physics and the subtleties of human movement and expression, resulting in greater detail and overall realism. Veo 3 is fluent in the unique language of cinematography: you can request a specific genre, specify a lens, or suggest cinematic effects, and Veo 3 will deliver stunning 8-second video clips. It supports both text-to-video and image-to-video generation and also features native audio generation based on text prompts. Please note that Veo 3 does not accept audio attachments. To exclude specific elements, use --no followed by a negative prompt (e.g., blurry, cloudy, or other attributes). To set a specific seed value, use `--seed` followed by the desired number (e.g., --seed 2). To set the aspect ratio, use `--aspect_ratio` followed by either 16:9 or 9:16. To set the duration, use `--duration` followed by one of 4s, 6s, or 8s.
- veo-3-vfast - - Veo-3 Fast is a faster and more cost-effective version of Google's Veo 3. Use `--aspect` to set the aspect ratio of the generated video (one of `16:9`, `1:1`, `9:16`). Use `--generate_audio` to generate audio with your video at a higher cost. Use `--negative_prompt` to set a negative prompt (examples: `blur`, `low resolution`, `poor quality`). Duration is limited to 7 seconds. This is a text-to-video generation model only.
- vidu - - The Vidu Video Generation Bot creates videos using images and text prompts. You can generate videos in four modes: (1) Image-to-Video: send 1 image with a prompt, (2) Start-to-End Frame: send 2 images with a prompt for transition videos, (3) Reference-to-Video: send up to 3 images with the `--reference` flag for guidance, and (4) Template-to-Video: use `--template` to apply pre-designed templates (1-3 images required, pricing varies by template). The number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, and all other templates accept only 1 image. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude, and accepts PNG, JPEG, and WEBP formats. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video). Duration is limited to 5 seconds.
- vidu-q1 - - The Vidu Q1 Video Generation Bot creates videos using text prompts and images. You can generate videos in three modes: (1) Text-to-Video: send a text prompt, (2) Image-to-Video: send 1 image with a prompt, and (3) Reference-to-Video: send up to 7 images with the `--reference` flag. The number of images required varies by template: `dynasty_dress` and `shop_frame` accept 1-2 images, `wish_sender` requires exactly 3 images, and all other templates accept only 1 image. The bot supports `--aspect` to set the aspect ratio (16:9, 1:1, 9:16) and `--movement-amplitude` to set the movement amplitude; both can be customized for text-to-video and reference-to-video tasks. Tasks are mutually exclusive (e.g., you cannot combine start-to-end frame and reference-to-video generation). The bot accepts PNG, JPEG, and WEBP formats. Duration is limited to 5 seconds.
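An illustrative image-to-video request for either Vidu bot above (the wording is invented): attach one image and send `Gentle camera push-in as autumn leaves drift past --aspect 16:9`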
- veo-3-fast - - Veo 3 Fast is a speed-optimized variant of Google’s Veo 3 AI video generation engine. It’s designed for rapid, cost-efficient production of short clips with synchronized audio (dialogue, ambient sound, effects). It prioritizes faster generation times while still delivering solid visual and audio quality, and supports text-to-video and image-to-video workflows, allowing creators to animate still images into motion sequences. It operates under defined constraints: video lengths of 4, 6, or 8 seconds, specified via the --duration parameter (e.g. "A cat dances --duration 6" will produce a 6-second video). Use `--aspect_ratio` to set the aspect ratio (either `16:9` or `9:16`), which defaults to `16:9`. Please only upload photos that you own or have the right to use, otherwise the bot will throw an error.
- seedance-1.0-pro-fast - - Seedance Pro Fast is a faster version of Seedance 1.0 Pro that balances speed, quality and cost. Seedance is a video generation model with text-to-video and image-to-video capabilities. It achieves breakthroughs in semantic understanding and prompt following. Optional parameters: Use `--aspect` to set the aspect ratio (available values: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`; default `16:9`). Use `--resolution` (one of `480p`, `720p`, `1080p`) to set the video resolution; default `1080p`. Use `--duration` (3 to 12) to set the video duration; default `5s`. The number of video tokens used for pricing is approximately `height * width * fps * duration / 1024`. File attachments accepted: jpeg, png, webp.
- sora - - Sora is OpenAI's video generation model. Use `--duration` to set the duration of the generated video, and `--resolution` to set the video's resolution (480p, 720p, or 1080p). Set the aspect ratio of the generated video with `--aspect` (Valid aspect ratios are 16:9, 1:1, 9:16). This is a text-to-video model only. Switch to the newest models for improved video and audio creation: https://poe.com/Sora-2-Pro for cinematic excellence or https://poe.com/Sora-2 for unmatched realism and precision.
- omnihuman - - OmniHuman, by Bytedance, generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. Send an image including a human figure with a visible face, along with an audio file, and the bot will return a video. The maximum audio length accepted is 30 seconds.
- grok-code-fast-1 - - Grok-Code-Fast-1 from xAI is a high-performance, cost-efficient model designed for agentic coding. It offers visible reasoning traces, strong steerability, and supports a 256k context window.
- bagoodex-web-search - - Bagoodex delivers real-time AI-powered web search offering instant access to videos, images, weather, and more. Audio and video uploads are not supported at this time.
- deep-ai-search - - Deep search engine integrating Brave AI with real-time web search. This chatbot executes commands and scrapes websites at scale while preserving its hallmark intelligence advantage. The bot doesn't accept file attachments. Examples: https://poe.com/s/P0BQmsvbE7zusdY0n49l https://poe.com/s/QgQSPsLD9efQrIwbmwuO
- kling-avatar-pro - - Create lifelike avatar videos featuring realistic humans, animals, cartoons, or stylized characters. Simply upload an image and an audio file to generate a video of your character speaking. Supported file formats: Images: JPEG, PNG, WEBP Audio: MP3, WAV
- playai-dialog - - Generates dialogues based on your script using PlayHT's text-to-speech model, in the voices of your choice. Use --speaker_1 [voice_name] and --speaker_2 [voice_name] to pass in the voices of your choice, choosing from below. Voice defaults to `Jennifer_(English_(US)/American)`. Follow the below format while prompting (case sensitive): FORMAT: ``` Speaker 1: ...... Speaker 2: ...... Speaker 1: ...... Speaker 2: ...... --speaker_1 [voice_1] --speaker_2 [voice_2] ``` VOICES AVAILABLE: Jennifer_(English_(US)/American) Dexter_(English_(US)/American) Ava_(English_(AU)/Australian) Tilly_(English_(AU)/Australian) Charlotte_(Advertising)_(English_(CA)/Canadian) Charlotte_(Meditation)_(English_(CA)/Canadian) Cecil_(English_(GB)/British) Sterling_(English_(GB)/British) Cillian_(English_(IE)/Irish) Madison_(English_(IE)/Irish) Ada_(English_(ZA)/South_African) Furio_(English_(IT)/Italian) Alessandro_(English_(IT)/Italian) Carmen_(English_(MX)/Mexican) Sumita_(English_(IN)/Indian) Navya_(English_(IN)/Indian) Baptiste_(English_(FR)/French) Lumi_(English_(FI)/Finnish) Ronel_Conversational_(Afrikaans/South_African) Ronel_Narrative_(Afrikaans/South_African) Abdo_Conversational_(Arabic/Arabic) Abdo_Narrative_(Arabic/Arabic) Mousmi_Conversational_(Bengali/Bengali) Mousmi_Narrative_(Bengali/Bengali) Caroline_Conversational_(Portuguese_(BR)/Brazilian) Caroline_Narrative_(Portuguese_(BR)/Brazilian) Ange_Conversational_(French/French) Ange_Narrative_(French/French) Anke_Conversational_(German/German) Anke_Narrative_(German/German) Bora_Conversational_(Greek/Greek) Bora_Narrative_(Greek/Greek) Anuj_Conversational_(Hindi/Indian) Anuj_Narrative_(Hindi/Indian) Alessandro_Conversational_(Italian/Italian) Alessandro_Narrative_(Italian/Italian) Kiriko_Conversational_(Japanese/Japanese) Kiriko_Narrative_(Japanese/Japanese) Dohee_Conversational_(Korean/Korean) Dohee_Narrative_(Korean/Korean) Ignatius_Conversational_(Malay/Malay) Ignatius_Narrative_(Malay/Malay) Adam_Conversational_(Polish/Polish) Adam_Narrative_(Polish/Polish) Andrei_Conversational_(Russian/Russian) Andrei_Narrative_(Russian/Russian) Aleksa_Conversational_(Serbian/Serbian) Aleksa_Narrative_(Serbian/Serbian) Carmen_Conversational_(Spanish/Spanish) Patricia_Conversational_(Spanish/Spanish) Aiken_Conversational_(Tagalog/Filipino) Aiken_Narrative_(Tagalog/Filipino) Katbundit_Conversational_(Thai/Thai) Katbundit_Narrative_(Thai/Thai) Ali_Conversational_(Turkish/Turkish) Ali_Narrative_(Turkish/Turkish) Sahil_Conversational_(Urdu/Pakistani) Sahil_Narrative_(Urdu/Pakistani) Mary_Conversational_(Hebrew/Israeli) Mary_Narrative_(Hebrew/Israeli) Prompt input cannot exceed 10,000 characters.
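A minimal filled-in example of the format above (the dialogue and voice picks are invented): `Speaker 1: Did you catch the launch this morning? Speaker 2: I did, and the landing was incredible. --speaker_1 Dexter_(English_(US)/American) --speaker_2 Ava_(English_(AU)/Australian)`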
- luma-photon - - Luma Photon delivers industry-specific visual excellence, crafting images that align perfectly with professional standards - not just generic AI art. From marketing to creative design, each generation is purposefully tailored to your industry's unique requirements. Add --aspect to the end of your prompts to change the aspect ratio of your generations (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21 are supported). Prompt input cannot exceed 5,000 characters.
- ideogram 45,000.00 - Excels at creating high-quality images from text prompts. For most prompts, https://poe.com/Ideogram-v2 will produce better results. Allows users to specify the aspect ratio of the image using the "--aspect" parameter at the end of the prompt (e.g. "Tall trees, daylight --aspect 9:16"). Valid aspect ratios are 10:16, 16:10, 9:16, 16:9, 3:2, 2:3, 4:3, 3:4, & 1:1.
- seededit-3.0 - - SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels in accurately following editing instructions and effectively preserving image content, especially excelling in handling real images. Please send an image with a prompt to edit the image.
- kling-2.1-pro - - Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`. Requires an image attachment.
- kling-2.1-std - - Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Set video duration to one of `5` or `10` seconds with `--duration`.
- runway-gen-4-turbo - - Runway's Gen-4 Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 1:1, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds. Full prompting guide here: https://help.runwayml.com/hc/en-us/articles/39789879462419-Gen-4-Video-Prompting-Guide
- runway - - Runway's Gen-3 Alpha Turbo model creates best-in-class, controllable, and high-fidelity video generations based on your prompts. Both text inputs (max 1000 characters) and image inputs are supported, but we recommend using image inputs for best results. Use --aspect_ratio (16:9, 9:16, landscape, portrait) for landscape/portrait videos. Use --duration (5, 10) to specify video length in seconds.
- veo-2 - - Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver in 8-second clips. Use `--aspect_ratio` (16:9 or 9:16) to customize video aspect ratio. Supports text-to-video as well as image-to-video. Non-English input will be translated first. Note: currently has a low rate limit, so you may need to retry your request at times of peak usage.
- dream-machine 360,000.00 - Luma AI's Dream Machine is an AI model that makes high-quality, realistic videos fast from text and images. Iterate at the speed of thought, create action-packed shots, and dream worlds with consistent characters on Poe today! To specify the aspect ratio of your video add --aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 9:21). To loop your video add --loop True.
- kling-2.0-master - - Generate high-quality videos from text or images using Kling 2.0 Master. Use `--negative_prompt` to send a negative prompt, and `--cfg_scale` to send a classifier-free guidance scale between 0.0 and 1.0 (inclusive). Use `--aspect` to set the aspect ratio (One of `16:9`, `9:16` and `1:1`). Use `--duration` to set either 5 or 10 second video.
- qwen-edit - - Image editing model based on Qwen-Image, with superior text editing capabilities.
- gptzero - - GPTZero is a deep-learning-driven platform designed to analyze and flag portions of text that are likely generated by AI vs. human authors. It distinguishes between “entirely human,” “entirely AI,” or “mixed” content and highlights the specific sentences involved. *The max number of files that can be submitted simultaneously is 50, and the max file size for all files combined is 15 MB. Each file will be truncated to 50,000 characters. Supported file types: PDF, DOC/DOCX, TXT, ODT Parameter controls available: 1. Detection Options - Multilingual (FR/ES): - `--multilingual true` (Enables the GPTZero multilingual model) - `--multilingual false` (Default/Disabled) - Model Version: - `--modelVersion [version_string]` (Selects a specific GPTZero model version, e.g., '2025-10-30-base') - `--modelVersion __latest__` (Default: Automatically uses the latest model version)
- kling-pro-effects - - Generate videos with effects like squishing an object, two people hugging, making heart gestures, etc. using Kling-Pro-Effects. Requires an image input. Send a single image for `squish` and `expansion` effects and two images (of people) for `hug`, `kiss`, and `heart_gesture` effects. Set the effect with --effect (default: `squish`). Set the duration with `--duration` (either 5s or 10s; default 5s).
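A hypothetical request (effect and duration chosen from the documented options): attach two photos of people and send `--effect hug --duration 5s`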
- hailuo-live - - Hailuo Live, the latest model from MiniMax, sets a new standard for bringing still images to life. From breathtakingly vivid motion to finely tuned expressions, this state-of-the-art model enables your characters to captivate, move, and shine like never before. It excels at bringing art and drawings to life, with exceptional realism without morphing, emotional range, and unparalleled character consistency. Generates a 5-second video.
- hailuo-ai - - Best-in-class text and image to video model by MiniMax.
- ray2 - - Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion. It has a strong understanding of text instructions and can also take image input. It can produce videos from 540p to 4K resolution, with durations of 5 or 9 seconds.
- veo-2-video - - Veo2 is Google's cutting-edge video generation model. Veo creates videos with realistic motion and high quality output.
- wan-2.1 - - Wan-2.1 is a text-to-video and image-to-video model that generates videos with high visual quality and motion diversity from text prompts. Generates 5-second videos.
- ideogram-v2a-turbo 24,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For higher quality, use https://poe.com/Ideogram-v2A Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER` and `ANIME`; default: `GENERAL`).
- ideogram-v2a 39,000.00 - Fast, affordable text-to-image model, optimized for graphic design and photography. For faster and more cost-effective generations, use https://poe.com/Ideogram-v2A-Turbo Use `--aspect` to set the aspect ratio, and use `--style` to specify a style (one of `GENERAL`, `REALISTIC`, `DESIGN`, `3D RENDER` and `ANIME`; default: `GENERAL`).
- trellis-3d - - Generate 3D models from your images using Trellis, a native 3D generative model enabling versatile and high-quality 3D asset creation. Send an image to convert it into a 3D model.
- flux-dev-finetuner - - Fine-tune the FLUX dev model with your own pictures! Upload 8-12 of them (same subject, only one subject in the picture, ideally from different poses and backgrounds) and wait ~2-5 minutes to create your own finetuned bot that will generate pictures of this subject in whatever setting you want.
- flux-inpaint - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
- flux-fill - - Given an image and a mask (separate images), fills in the region of the image given by the mask as per the prompt. The base image should be the first image attached and the black-and-white mask should be the second image; a text prompt is required and should specify what you want the model to inpaint in the white area of the mask.
- bria-eraser - - Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Send an image and a black-and-white mask image denoting the objects to be cleared out from the image. The input prompt is only used to create the filename of the output image.
- aya-vision 30.00 - Aya Vision is a 32B open-weights multimodal model with advanced capabilities optimized for a variety of vision-language use cases. It is trained to excel in 23 languages in both vision and text: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
- kling-1.5-pro - - Kling v1.5 video generation bot, hosted by fal.ai. For best results, upload an image attachment. Use `--aspect` to set the aspect ratio. Allowed values are `16:9`, `9:16` and `1:1`. Use `--duration` to set the duration of the generated video (5 or 10 seconds).
- deepreasoning - - DeepReasoning (previously DeepClaude) is a high-performance LLM inference service that combines DeepSeek R1's Chain of Thought (CoT) reasoning capabilities with Anthropic Claude's creative and code generation prowess. It provides a unified interface for leveraging the strengths of both models while maintaining complete control over your data. Learn more: https://deepclaude.com/
- gemma-3-27b - - Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, the successor to Gemma 2.
- qwen3-32b-cs 3,600.00 - World’s fastest inference for Qwen 3 32B with Cerebras. Append /no_think to your prompt to disable the model's default reasoning behavior.
- qwen-2.5-vl-32b 6,600.00 - Qwen2.5-VL-32B's mathematical and problem-solving capabilities have been strengthened through reinforcement learning, leading to a significantly improved user experience. The model's response styles have been refined to better align with human preferences, particularly for objective queries involving mathematics, logical reasoning, and knowledge-based Q&A. As a result, responses now feature greater detail, improved clarity, and enhanced formatting.
- qwen2.5-vl-72b-t 8,700.00 - Qwen 2.5 VL 72B, a cutting-edge multimodal model from the Qwen Team, excels in visual and video understanding, multilingual text/image processing (including Japanese, Arabic, and Korean), and dynamic agentic reasoning for automation. It supports long-context comprehension (32K tokens).
- mistral-small-3 0.10 0.30 Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks: those that require robust language and instruction-following performance with very low latency. Released under an Apache 2.0 license and comparable to Llama-3.3-70B and Qwen2.5-32B-Instruct.
- deepseek-v3-di 4,300.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 64k tokens of input context and 8k tokens of output context. Quantization: FP8 (official).
- deepseek-v3-turbo-di 5,900.00 - Deepseek-v3 – the new top open-source LLM. Achieves state-of-the-art performance in tasks such as coding, mathematics, and reasoning. Turbo variant is quantized to achieve higher speeds. All data you submit to this bot is governed by the Poe privacy policy and is only sent to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP4 (turbo).
- phi-4-di 300.00 - Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 16k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
- mistral-7b-v0.3-di 150.00 - Mistral Instruct 7B v0.3 from Mistral AI. All data you provide this bot will not be used in training, and is sent only to DeepInfra, a US-based company. Supports 32k tokens of input context and 8k tokens of output context. Quantization: FP16 (official).
- aya-expanse-32b 5,100.00 - Aya Expanse is a 32B open-weight research release of a model with highly advanced multilingual capabilities. Aya supports state-of-the-art generative capabilities in 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
- liveportrait - - Animates given portraits with the motion in the provided video. Powered by fal.ai
- llama-3.1-8b-t-128k 3,000.00 - Llama 3.1 8B Instruct from Meta. Supports 128k tokens of context. The points price is subject to change.
- stablediffusion3-2b - - Stable Diffusion v3 Medium - by fal.ai
- mixtral8x22b-inst-fw 3,600.00 - Mixtral 8x22B Mixture-of-Experts instruct model from Mistral hosted by Fireworks.
- command-r 5,100.00 - I can search the web for up-to-date information and respond in over 10 languages!
- mistral-large-2 3.00 9.00 Mistral's latest text generation model (Mistral-Large-2407) with top-tier reasoning capabilities. It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. This bot has the full 128k context window supported by the model.
- dall-e-3 45,000.00 - OpenAI's most powerful image generation model. Generates high quality images with intricate details based on the user's most recent prompt. For most prompts, https://poe.com/FLUX-pro-1.1-ultra or https://poe.com/FLUX-dev or https://poe.com/Imagen3 will produce better results. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1). Valid aspect ratios are 1:1, 7:4, & 4:7.
- reka-core - - Reka's largest and most capable multimodal language model. Works with text, images, and video inputs. 8k context length.
- reka-flash - - Reka's efficient and capable 21B multimodal model optimized for fast workloads and amazing quality. Works with text, images and video inputs.
- command-r-plus 5,100.00 - A supercharged version of Command R. I can search the web for up-to-date information and respond in over 10 languages!
- claude-sonnet-3.5-june 2.60 13.00 Anthropic's legacy Sonnet 3.5 model, specifically the June 2024 snapshot (for the latest, please use https://poe.com/Claude-Sonnet-3.5). Excels in complex tasks like coding, writing, analysis and visual processing; generally more verbose than the more concise October 2024 snapshot.
- gpt-3.5-turbo 0.45 1.40 OpenAI’s GPT 3.5 Turbo model is a powerful language generation system designed to provide highly coherent, contextually relevant, and detailed responses. Supports 16,384 tokens of context. Check out the newest version of this bot here: https://poe.com/GPT-5.
- sketch-to-image - - Takes in sketches and converts them to colored images.
- qwen2.5-coder-32b 1,500.00 - Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen), developed by Alibaba.
- stablediffusion3.5-t - - Faster version of Stable Diffusion 3.5 Large, hosted by @fal. Excels at fast image generation. Use "--aspect" to select an aspect ratio (e.g. --aspect 1:1).
- flux-pro-1.1-t 30,000.00 - The best state-of-the-art image model from BFL. FLUX 1.1 Pro generates images six times faster than its predecessor, FLUX 1 Pro, while also improving image quality, prompt adherence, and output diversity. The bot does not support any attachments.
- flux-schnell-t 2,100.00 - Lightning-fast AI image generation model that excels in producing high-quality visuals in just seconds. Great for quick prototyping or real-time use cases. This is the fastest version of FLUX.1. The bot does not support any attachments.
- recraft-v3 - - Recraft V3, state-of-the-art image generation. Prompt input cannot exceed 1,000 characters. Use --style for styles, and --aspect for aspect ratio configuration (16:9, 4:3, 1:1, 3:4, 9:16). Available styles: realistic_image, digital_illustration, vector_illustration, realistic_image/b_and_w, realistic_image/hard_flash, realistic_image/hdr, realistic_image/natural_light, realistic_image/studio_portrait, realistic_image/enterprise, realistic_image/motion_blur, digital_illustration/pixel_art, digital_illustration/hand_drawn, digital_illustration/grain, digital_illustration/infantile_sketch, digital_illustration/2d_art_poster, digital_illustration/handmade_3d, digital_illustration/hand_drawn_outline, digital_illustration/engraving_color, digital_illustration/2d_art_poster_2, vector_illustration/engraving, vector_illustration/line_art, vector_illustration/line_circuit, vector_illustration/linocut
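An illustrative prompt pairing a style with an aspect ratio (the subject is invented): `A fox reading a book under a tree --style digital_illustration/pixel_art --aspect 1:1`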
- llama-3-70b-t 2,300.00 - Llama 3 70B Instruct from Meta. For most use cases, https://poe.com/Llama-3.3-70B will perform better.
- gpt-4o-aug 2.20 9.00 OpenAI's most powerful model, GPT-4o, using the August 2024 model snapshot. Stronger than GPT-3.5 at quantitative questions (math and physics), creative writing, and many other challenging tasks. Check out the newest version of this bot here: https://poe.com/GPT-5.
- gpt-4-classic-0314 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0314 (non-Turbo) for text input and gpt-4o for image input. For most use cases, https://poe.com/GPT-4o will perform significantly better.
- gpt-4-classic 27.00 54.00 OpenAI's GPT-4 model. Powered by gpt-4-0613 (non-Turbo) for text input and gpt-4o for image input. Check out the newest version of this bot here: https://poe.com/GPT-5.
- solar-pro-2 2,100.00 - Solar Pro 2 is Upstage's latest frontier-scale LLM. With just 31B parameters, it delivers top-tier performance through world-class multilingual support, advanced reasoning, and real-world tool use. Especially in Korean, it outperforms much larger models across critical benchmarks. Built for the next generation of practical LLMs, Solar Pro 2 proves that smaller models can still lead. Supports a context length of 64k tokens.
- remove-background - - Removes the background from your images.
- sana-t2i - - SANA can synthesize high-resolution, high-quality images at a remarkably fast rate, with the ability to generate 4K images in less than a second. Optional parameters: Set the aspect ratio, with options 16:9, 4:3, 1:1, 3:4, and 9:16. This is set to 4:3 by default.
- mistral-7b-v0.3-t 1,400.00 - Mistral Instruct 7B v0.3 from Mistral AI. The points price is subject to change.
- tako 30,000.00 - Tako is a bot that transforms your questions about stocks, sports, economics, or politics into interactive, shareable knowledge cards from trusted sources. Tako's knowledge graph is built exclusively from authoritative, real-time data providers, and is embeddable in your apps, research, and storytelling. You can adjust the specificity threshold by typing `--specificity 30` (or any value between 0 and 100) at the end of your query/question; the default is 60.
- llama-3.1-405b-fp16 62,000.00 - The biggest and best open-source AI model trained by Meta, beating GPT-4o across most benchmarks. This bot runs in BF16 with a 128K context length.
- llama-3.1-8b-fp16 1,500.00 - The smallest and fastest member of the Llama 3.1 family, offering exceptional efficiency and rapid response times, with a 128K context length.
- llama-3.1-70b-fp16 6,000.00 - The best LLM at its size, with faster response times than the 405B model and a 128K context length.
- llama-3-70b-fp16 6,000.00 - A highly efficient and powerful model designed for a variety of tasks, with a 128K context length.
- restyler - - This bot enables rapid transformation of existing images, delivering high-quality style transfers and image modifications. Takes in a text input and an image attachment. Use --strength to control the guidance given by the initial image, with higher values adhering to the image more strongly.
- stablediffusionxl 3,600.00 - Generates high quality images based on the user's most recent prompt. Allows users to specify elements to avoid in the image using the "--no" parameter at the end of the prompt. Select an aspect ratio with "--aspect". (e.g. "Tall trees, daylight --no rain --aspect 7:4"). Valid aspect ratios are 1:1, 7:4, 4:7, 9:7, 7:9, 19:13, 13:19, 12:5, & 5:12. Powered by Stable Diffusion XL.
- qwen-2.5-7b-t 2,300.00 - Qwen 2.5 7B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, and has great multilingual support with more than 29 languages.
- qwen-2.5-72b-t 9,000.00 - Qwen 2.5 72B from Alibaba. Excels in coding, math, instruction following, and natural language understanding, and has great multilingual support with more than 29 languages. It delivers results on par with Llama-3-405B despite using only one-fifth of the parameters.
- python 30.00 - Executes Python code (version 3.11) from the user message and outputs the results. If there are code blocks in the user message (surrounded by triple backticks), then only the code blocks will be executed. The following libraries are imported into this bot's runtime automatically: numpy, pandas, requests, matplotlib, scikit-learn, torch, PyYAML, tensorflow, scipy, and pytest, along with ~150 of the most widely used Python libraries. A usage sketch follows this table.
- markitdown - - Convert anything to Markdown: URLs, PDFs, Word, Excel, PowerPoint, images (EXIF metadata), audio (EXIF metadata and transcription), and more. This bot wraps Microsoft's MarkItDown MCP server (https://github.com/microsoft/markitdown). A library sketch follows this table.
- gpt-4-turbo 9.00 27.00 Powered by OpenAI's GPT-4 Turbo with Vision. For most tasks, https://poe.com/GPT-4o will perform better. Supports 128k tokens of context. Requests with images will be routed to @GPT-4o. Check out the newest version of this bot here: https://poe.com/GPT-5.
- flux-1-schnell-fw 1,000.00 - FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: 1. Cutting-edge output quality and competitive prompt following, matching the performance of closed-source alternatives. 2. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps. 3. Released under the Apache-2.0 license, the model can be used for personal, scientific, and commercial purposes.
- flux-1-dev-fw 11,000.00 - FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key features: 1. Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro]. 2. Competitive prompt following, matching the performance of closed-source alternatives. 3. Trained using guidance distillation, making FLUX.1 [dev] more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
- mochi-preview - - Open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. Supports both text-to-video and image-to-video. Generates 5 second video.
- gpt-3.5-turbo-instruct 1.40 1.80 Powered by gpt-3.5-turbo-instruct. Check out the newest version of this bot here: https://poe.com/GPT-5.
- gpt-3.5-turbo-raw 0.45 1.40 Powered by gpt-3.5-turbo without a system prompt. Check out the newest version of this bot here: https://poe.com/GPT-5.
- interpreter - - Interpreter for Poe Python
- claude-haiku-3 0.21 1.10 Anthropic's Claude Haiku 3 outperforms models in its intelligence category on performance, speed, and cost without the need for specialized fine-tuning. The compute points value is subject to change. For most use cases, https://poe.com/Claude-Haiku-3.5 will be better.
- code-saver - - A system bot that handles Poe scripts in chat.
- code-editor - - Official code editor for Poe Scripting using Python, used to connect multiple Poe bots and create AI workflows. Guide and tips: https://creator.poe.com/docs/script-bots/poe-python-reference
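To make the @python bot's behavior concrete, here is a hypothetical message body it could execute. Only the fenced block runs; any surrounding prose is ignored. The imports are written out for self-containment, though numpy and pandas are among the bot's automatic imports. This is a sketch, not an official example.

```python
# Hypothetical message to the @python bot: only this fenced block is executed.
import numpy as np
import pandas as pd

# Build a small DataFrame and print summary statistics to stdout,
# which the bot returns as its output.
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})
print(df.describe())

# Fit a least-squares line through the points.
slope, intercept = np.polyfit(df["x"], df["y"], 1)
print(f"fit: y ~= {slope:.2f} * x + {intercept:.2f}")
```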
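Similarly, the markitdown bot wraps Microsoft's MarkItDown MCP server, and the underlying open-source Python library can also be called directly. A minimal sketch, assuming the `markitdown` package is installed and that `example.pdf` is a hypothetical local file:

```python
# Minimal sketch of the open-source MarkItDown library the bot wraps.
from markitdown import MarkItDown

md = MarkItDown()                   # default converters; no LLM required
result = md.convert("example.pdf")  # placeholder file; URLs and Office files also work
print(result.text_content)          # the extracted Markdown text
```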