The VocalSound dataset is a collection of over 21,000 crowdsourced audio recordings of non-speech human vocalizations, including laughter, coughs, sneezes, sighs, throat clearing, and sniffs. It was created to improve vocal sound recognition models and contains metadata about the speakers, such as age, gender, native language, and country. The dataset is designed to help researchers develop more robust and accurate systems for tasks like automatic transcription and health monitoring.
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.
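A reproduction script usually ends by reporting a single number that can be compared across submissions. A minimal sketch of such a scoring step is below; the six class names come from the dataset description above, while the reference labels and predictions are hypothetical placeholders standing in for a real model's output:

```python
# Minimal scoring sketch for a VocalSound-style evaluation.
# The six classes follow the dataset description; the labels and
# predictions below are hypothetical, for illustration only.

CLASSES = ["laughter", "sigh", "cough", "throat_clearing", "sneeze", "sniff"]
LABEL_TO_ID = {name: i for i, name in enumerate(CLASSES)}

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("reference and prediction lists differ in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical references and predictions for four clips.
refs  = ["cough", "sniff", "laughter", "sigh"]
preds = ["cough", "sniff", "sigh", "sigh"]
print(f"accuracy: {accuracy(refs, preds):.2f}")  # prints "accuracy: 0.75"
```

A real submission would replace the hard-coded lists with labels read from the dataset's metadata and predictions produced by the checkpoint, but the final comparison step looks the same.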