LibriCount is a dataset for speaker count estimation that simulates a "cocktail party" environment with up to 10 speakers. It includes audio wave files and JSON annotation files; the annotations contain metadata such as the ground-truth number of speakers, the speaker IDs, and vocal activity. Each recording is a 5-second, 16 kHz, 16-bit mono clip mixed from random utterances drawn from the LibriSpeech test-clean ("CleanTest") subset.
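As a rough illustration of the format described above, the following sketch checks that a clip is 5 seconds of 16 kHz, 16-bit mono audio and reads a JSON annotation next to it. It uses only the Python standard library. The function names, the file names, and the annotation keys used in the demo (`speaker_id`, `activity`) are assumptions made for this example, not a documented schema; inspect the actual files before relying on any of them.

```python
import json
import tempfile
import wave
from pathlib import Path

# Format stated in the dataset description.
EXPECTED_RATE = 16_000    # samples per second
EXPECTED_WIDTH = 2        # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1     # mono
EXPECTED_SECONDS = 5.0


def describe_wav(path):
    """Return the basic format parameters of a PCM wave file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "rate": rate,
            "width": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / rate,
        }


def matches_spec(info):
    """Check a describe_wav() result against the stated clip format."""
    return (
        info["rate"] == EXPECTED_RATE
        and info["width"] == EXPECTED_WIDTH
        and info["channels"] == EXPECTED_CHANNELS
        and abs(info["seconds"] - EXPECTED_SECONDS) < 1e-3
    )


def load_annotation(path):
    """Load a JSON annotation file and return its parsed content."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Demo on a silent stand-in clip so the sketch runs without the dataset.
    workdir = Path(tempfile.mkdtemp())
    wav_path = workdir / "example.wav"      # hypothetical file name
    json_path = workdir / "example.json"    # hypothetical file name

    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_WIDTH)
        wav.setframerate(EXPECTED_RATE)
        wav.writeframes(b"\x00\x00" * int(EXPECTED_RATE * EXPECTED_SECONDS))

    # Assumed annotation layout: one entry per speaker in the mixture.
    fake_annotation = [
        {"speaker_id": 1, "activity": [[0, 40000]]},
        {"speaker_id": 2, "activity": [[20000, 80000]]},
    ]
    json_path.write_text(json.dumps(fake_annotation), encoding="utf-8")

    info = describe_wav(wav_path)
    speakers = load_annotation(json_path)
    print("format ok:", matches_spec(info))
    print("speaker count:", len(speakers))
```

To use it on real data, point `describe_wav` and `load_annotation` at a wave file and its annotation file from the dataset. If the annotation layout differs from the assumed one, only the demo's speaker-count line needs to change.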