Audio-to-Audio
Audio-to-audio encompasses speech enhancement, voice conversion, source separation, and style transfer: any task where audio goes in and transformed audio comes out. Speech enhancement (denoising) advanced sharply with neural models such as Meta's Demucs-based denoiser and DCCRN, a top-ranked entry in Microsoft's 2020 DNS Challenge, and neural noise suppression is now common in video-call software. Voice conversion took a leap with RVC and So-VITS-SVC, which clone a target voice from a small amount of training audio and have sparked both creative tools and deepfake concerns. Source separation (isolating vocals, drums, and bass from a mix) reached near-production quality with HTDemucs and Band-Split RNN, making stem extraction practical for most music. The field is converging toward unified models that handle multiple audio transformations through natural language instructions, blurring the line with text-to-audio generation.
DNS Challenge
Deep noise suppression on Microsoft DNS Challenge data.
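To make the idea of scoring a noise-suppression output concrete, here is a minimal sketch of one widely used objective metric, scale-invariant signal-to-distortion ratio (SI-SDR). This is an illustration under assumptions: the page does not say which metric this leaderboard uses, the `si_sdr` helper is hypothetical, and the signals are synthetic stand-ins rather than DNS Challenge recordings.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Remove DC offset so the score ignores constant bias.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever is left after removing the target counts as distortion.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


# Synthetic stand-in signals (assumption: 1 second of a 440 Hz tone at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)       # unprocessed input
less_noisy = clean + 0.1 * rng.standard_normal(clean.shape)  # pretend enhancer output

print(f"noisy input:     {si_sdr(noisy, clean):.1f} dB")
print(f"enhanced output: {si_sdr(less_noisy, clean):.1f} dB")
```

A real evaluation would replace the synthetic arrays with paired clean and enhanced recordings. Because SI-SDR needs a clean reference, it only applies to test sets that provide one.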