wan-2.6

WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling. It creates high-fidelity videos from text and/or images while keeping characters and style consistent across scenes. It also supports native audio-visual sync (including lip-sync) and can generate or align dialogue, music, and SFX with the visuals, enabling “prompt-to-video” results that feel production-ready without heavy post work.

Notes:
- This model is served from the Singapore area.
- Upload an image to enable image-to-video generation, or one or more videos to enable video-to-video generation.
- Responses may take 5 minutes or more to finish generating.

Parameter controls available:

1. Video Settings
   - `--resolution 1080p` (default) or `--resolution 720p`
   - `--aspect_ratio 16:9` (default), `9:16`, `1:1`, `4:3`, or `3:4` (ignored for image-to-video, which uses the input image's aspect ratio)
   - `--duration [5, 10, or 15]` seconds (default: 5; video-to-video is limited to 10 seconds)
2. Advanced Settings
   - `--prompt_extend true` (default) or `--prompt_extend false`: AI prompt enhancement
   - `--audio true` (default) or `--audio false`: Enable/disable audio generation
   - `--shot_type multi` (default) or `--shot_type single`: Multi-shot narrative vs. single continuous shot
   - `--seed [0-2147483646]`: Random seed for reproducibility
   - `--negative_prompt "text"`: Describe what you don't want in the video
3. Attachments
   - For i2v: Attach an image to use as the first frame
   - For r2v: Attach 1-3 reference videos (2-30 seconds each, MP4/MOV). Use `character1`, `character2`, `character3` in the prompt to reference subjects; e.g., `character1` refers to the subject in the first uploaded video.
   - For t2v/i2v: Optionally attach an audio file (3-30 seconds, max 15 MB, .mp3/.wav) for custom audio
4. Multi-Shot Prompting
   - For multi-shot mode, use timeline syntax: `[Shot #] [Timestamp] [Action]`. Example: `[Shot 1] [0-5s] Wide shot of city skyline. [Shot 2] [5-10s] Close-up of character walking.`
   - Ensure timestamps match your selected duration, and use transition keywords like "Hard cut" or "Fade in" between shots.
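
The parameter flags and the multi-shot timeline syntax above can be assembled programmatically. The following is a minimal sketch, not an official client: `WanOptions`, `build_timeline`, and `build_message` are hypothetical helper names, and it assumes the flags are simply appended after the prompt text in the message sent to the model. It checks only the value ranges listed above; mode-specific rules (such as the 10-second limit for video-to-video) and attachment handling are not covered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Allowed values, copied from the parameter list in the description.
RESOLUTIONS = ("1080p", "720p")
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
DURATIONS = (5, 10, 15)
SHOT_TYPES = ("multi", "single")
MAX_SEED = 2147483646


@dataclass
class WanOptions:
    """Hypothetical container for the documented flags (not an official SDK type)."""
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"
    duration: int = 5
    prompt_extend: bool = True
    audio: bool = True
    shot_type: str = "multi"
    seed: Optional[int] = None
    negative_prompt: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if any option falls outside the documented values."""
        checks = [
            (self.resolution in RESOLUTIONS, "resolution"),
            (self.aspect_ratio in ASPECT_RATIOS, "aspect_ratio"),
            (self.duration in DURATIONS, "duration"),
            (self.shot_type in SHOT_TYPES, "shot_type"),
            (self.seed is None or 0 <= self.seed <= MAX_SEED, "seed"),
        ]
        for ok, name in checks:
            if not ok:
                raise ValueError(f"unsupported value for --{name}")

    def to_flags(self) -> str:
        """Render the options as a space-separated flag string."""
        self.validate()
        parts = [
            f"--resolution {self.resolution}",
            f"--aspect_ratio {self.aspect_ratio}",
            f"--duration {self.duration}",
            f"--prompt_extend {str(self.prompt_extend).lower()}",
            f"--audio {str(self.audio).lower()}",
            f"--shot_type {self.shot_type}",
        ]
        # Optional flags are emitted only when a value was provided.
        if self.seed is not None:
            parts.append(f"--seed {self.seed}")
        if self.negative_prompt:
            parts.append(f'--negative_prompt "{self.negative_prompt}"')
        return " ".join(parts)


def build_timeline(shots: Sequence[Tuple[int, str]], duration: int) -> str:
    """Build `[Shot #] [Timestamp] [Action]` lines from (length_seconds, action) pairs."""
    lines = []
    start = 0
    for number, (length, action) in enumerate(shots, start=1):
        end = start + length
        lines.append(f"[Shot {number}] [{start}-{end}s] {action}")
        start = end
    # Timestamps should add up to the selected --duration.
    if start != duration:
        raise ValueError(f"shots cover {start}s but duration is {duration}s")
    return "\n".join(lines)


def build_message(prompt: str, options: WanOptions) -> str:
    """Assumption: flags are appended after the prompt text."""
    return f"{prompt} {options.to_flags()}"


if __name__ == "__main__":
    options = WanOptions(resolution="720p", duration=10, seed=42)
    timeline = build_timeline(
        [(5, "Wide shot of city skyline."), (5, "Close-up of character walking.")],
        duration=options.duration,
    )
    print(build_message(timeline, options))
```

Validating before sending is a convenience here, since a generation can take several minutes and a mismatched timeline or unsupported value would otherwise only surface after the wait.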

Provider	Source	Input Price ($/1M)	Output Price ($/1M)	Description	Free
poe	poe	-	-	WAN 2.6 is Alibaba’s multimodal video generation model built for cinematic, multi-shot storytelling, with native audio-visual sync (including lip-sync).

Available at 1 Provider