Audio · audio-to-audio

Audio-to-Audio.

Audio-to-audio encompasses speech enhancement, voice conversion, source separation, and style transfer: any task where audio goes in and transformed audio comes out. Speech enhancement (denoising) advanced sharply with waveform-domain models such as Meta's Demucs-based denoiser and with DCCRN, a top-ranked entry in Microsoft's 2020 DNS Challenge; neural denoising of this kind is now common in video-call software. Voice conversion took a leap with RVC and So-VITS-SVC, which clone a voice from a few minutes of target-speaker audio and have sparked both creative tools and deepfake concerns. Source separation (isolating vocals, drums, and bass from a mix) reached near-production quality with HTDemucs and Band-Split RNN, making stem extraction practical for most music. The field is converging toward unified models that handle multiple audio transformations through natural language instructions, blurring the line with text-to-audio generation.

2 datasets · 0 results · Canonical metric: si-snr

§ 02 · Canonical benchmark

The reference dataset.

DNS Challenge

Deep noise suppression on Microsoft DNS challenge data

Primary metric: si-snr
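
The page does not define si-snr, so the following is a minimal sketch of the commonly used scale-invariant signal-to-noise ratio, assuming equal-length mono signals given as plain lists of samples. The function name `si_snr` and the `eps` stabiliser are illustrative choices, not part of any leaderboard tooling, and the exact evaluation protocol used for DNS Challenge submissions here may differ.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (higher is better). Illustrative helper."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean (DC offset) from both signals.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target: this is the "signal" component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Whatever remains after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(v * v for v in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    sr = 16000  # assumed sample rate for the toy signals
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noisy = [c + 0.1 * math.sin(2 * math.pi * 3000 * t / sr) for t, c in enumerate(clean)]
    print(si_snr(noisy, clean))                      # roughly 20 dB
    print(si_snr([0.5 * x for x in noisy], clean))   # same value: gain does not matter
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the model output leaves the score unchanged, which is why the metric suits enhancement and separation systems whose output gain is arbitrary.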

§ 03 · Top 10

Leading models.

Leading models on DNS Challenge.

No results yet. Be the first to contribute.

§ 04 · All datasets

Tracked datasets.

2 datasets tracked for this task.

DNS Challenge

0 results · si-snr

VCTK (Voice Conversion)

0 results · pesq

§ 05 · Related tasks

Other tasks in Audio.

Audio Captioning · Music Generation · Sound Event Detection · Text-to-Audio · Voice Activity Detection
