stable-audio-2.0

Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. It is ideal for creative professionals seeking detailed and extended outputs from simple prompts.

Note: Audio-to-audio mode requires a prompt alongside an uploaded audio file for generation.

Parameter controls available:

1. Basic
   - Default: text-to-audio (no `--mode` needed)
   - If transforming uploaded audio: `--mode audio-to-audio`
   - `--output_format wav` (for high quality, otherwise omit for mp3)
2. Timing and Randomness
   - `--duration [1-190 seconds]` controls how long the generated audio is
   - `--random_seed false --seed [0-4294967294]` disables random seed generation
3. Advanced
   - `--cfg_scale [1-25]`: Higher = closer to prompt (recommended 7-15)
   - `--steps [30-100]`: Higher = better quality (recommended 50-80)
4. Transformation control (only for audio-to-audio)
   - `--strength [0-1]`: How much to change/transform (0.3-0.7 typical)
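The parameter controls above can be assembled programmatically. The following is a minimal sketch, not an official client: `build_prompt` is a hypothetical helper, and placing the flags after the prompt text is an assumption, since the listing does not specify their position. It only builds and range-checks the message string using the limits listed above; sending the message to a provider is out of scope.

```python
from typing import Optional


def build_prompt(
    prompt: str,
    *,
    audio_to_audio: bool = False,
    wav: bool = False,
    duration: Optional[int] = None,
    seed: Optional[int] = None,
    cfg_scale: Optional[float] = None,
    steps: Optional[int] = None,
    strength: Optional[float] = None,
) -> str:
    """Append Stable Audio 2.0 parameter flags to a text prompt.

    Hypothetical helper: flag names and ranges come from the model
    description; appending them after the prompt is an assumption.
    """

    def check(name, value, low, high):
        # Reject values outside the documented range.
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    parts = [prompt.strip()]
    if audio_to_audio:
        # Text-to-audio is the default, so --mode is only added here.
        parts.append("--mode audio-to-audio")
    if wav:
        # Omitting --output_format yields mp3.
        parts.append("--output_format wav")
    if duration is not None:
        check("duration", duration, 1, 190)
        parts.append(f"--duration {duration}")
    if seed is not None:
        # A fixed seed requires turning random seeding off.
        check("seed", seed, 0, 4294967294)
        parts.append(f"--random_seed false --seed {seed}")
    if cfg_scale is not None:
        check("cfg_scale", cfg_scale, 1, 25)
        parts.append(f"--cfg_scale {cfg_scale}")
    if steps is not None:
        check("steps", steps, 30, 100)
        parts.append(f"--steps {steps}")
    if strength is not None:
        # --strength is documented for audio-to-audio only.
        if not audio_to_audio:
            raise ValueError("strength applies only to audio-to-audio mode")
        check("strength", strength, 0, 1)
        parts.append(f"--strength {strength}")
    return " ".join(parts)


print(build_prompt("warm lo-fi piano loop", duration=45, seed=1234, cfg_scale=9, steps=60))
```

For an audio-to-audio request, the same helper could be called as `build_prompt("add tape saturation", audio_to_audio=True, strength=0.5)`, with the audio file uploaded alongside the message as the note above requires.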

Available at 1 Provider

| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| poe | poe | - | - | Stable Audio 2.0 generates audio up to 3 minutes long from text prompts, supporting text-to-audio and audio-to-audio transformations with customizable settings like duration, steps, CFG scale, and more. | |