Model card

Higgs Audio v3 8B STT v2.

Boson AI · Speech-to-text · 8B params · Audio-language model (Higgs) · Open source

1.27% LibriSpeech test-clean — among the lowest on the HF Open ASR Leaderboard.

§ 01 · Card

Model card, inline.

Rendered server-side from the upstream README on Hugging Face — same content as the source repo, with editorial typography. The full card, sample weights, and revision history live on HF.


Source: bosonai/higgs-audio-v3-8b-stt-v2
License: apache-2.0
Pipeline: automatic-speech-recognition

Higgs Audio v3 8B STT v2

A speech-to-text model combining a Whisper-Large-v3 encoder with a Qwen3-8B decoder (8.91B total parameters), fine-tuned with LoRA on diverse ASR benchmarks.

Usage

```python
import importlib.util

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from transformers.utils import cached_file

# Load model
model = AutoModel.from_pretrained(
    "bosonai/higgs-audio-v3-8b-stt-v2",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="eager",
    device_map="cuda:0",
)
tokenizer = AutoTokenizer.from_pretrained("bosonai/higgs-audio-v3-8b-stt-v2")

# Import the transcribe.py helper that ships in the model repo
spec = importlib.util.spec_from_file_location(
    "transcribe",
    cached_file(
        "bosonai/higgs-audio-v3-8b-stt-v2",
        "transcribe.py",
        _raise_exceptions_for_connection_errors=False,
    ),
)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Transcribe audio (16 kHz mono numpy array)
audio_np = np.random.randn(16000).astype(np.float32)  # replace with your audio
text = mod.transcribe(model, tokenizer, audio_np)
print(text)
```
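The usage snippet feeds the model a random array as a placeholder. A minimal sketch of producing a real input follows, assuming the source is a 16-bit PCM WAV file. The helper name `load_wav_16k_mono` is hypothetical and not part of the model repo, and the linear-interpolation resampling is a deliberately naive stand-in for a proper resampler.

```python
import wave

import numpy as np

TARGET_SR = 16_000  # sample rate the card specifies for model input


def load_wav_16k_mono(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV file; return 16 kHz mono float32 samples in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        sr = wf.getframerate()
        channels = wf.getnchannels()
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")

    # De-interleave the channels and average them down to mono.
    audio = pcm.reshape(-1, channels).astype(np.float32).mean(axis=1) / 32768.0

    if sr != TARGET_SR and audio.size > 0:
        # Naive linear-interpolation resampling with no anti-aliasing filter.
        n_out = int(round(audio.size * TARGET_SR / sr))
        src_t = np.arange(audio.size) / sr
        dst_t = np.arange(n_out) / TARGET_SR
        audio = np.interp(dst_t, src_t, audio)

    return audio.astype(np.float32)
```

With the model loaded as in the usage snippet, the result could replace the random array, for example `mod.transcribe(model, tokenizer, load_wav_16k_mono("speech.wav"))`, where `speech.wav` is a placeholder path. For downsampling real recordings, a filtered resampler from an audio library is likely the better choice.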

Requirements

torch
transformers>=4.51.0
whisper  # for audio preprocessing (WhisperProcessor)
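The `transformers>=4.51.0` floor listed above can be checked before loading the model. The sketch below uses only the standard library. The function names are illustrative, and the comparison ignores pre-release ordering, so `4.51.0.dev0` is treated as `4.51.0`.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '4.51.0' (or '4.51.0.dev0') into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_transformers(minimum: str = "4.51.0") -> bool:
    """Return True if an installed transformers satisfies the minimum version."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

Calling `check_transformers()` at the top of a script gives an early, readable failure instead of an import error deep inside the remote model code.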

Architecture

  • Encoder: Whisper-Large-v3 (frozen)
  • Decoder: Qwen3-8B (LoRA fine-tuned, merged)
  • Total parameters: 8.91B
  • Audio input: 16 kHz mono WAV
  • Supports: Thinking mode for improved accuracy
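The 8.91B total listed above can be sanity-checked on a loaded model. This is a minimal sketch that assumes the remote-code model behaves like a standard `torch.nn.Module` and exposes all of its weights through `parameters()`. The helper names are illustrative, and the figure it prints may differ slightly from the card depending on how shared or buffer tensors were counted upstream.

```python
def count_parameters(model) -> int:
    """Sum element counts over every parameter tensor of a torch.nn.Module."""
    return sum(p.numel() for p in model.parameters())


def count_trainable(model) -> int:
    """Same as count_parameters, restricted to tensors with requires_grad=True."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def format_billions(n: int) -> str:
    """Format a raw parameter count in billions, e.g. 8_910_000_000 -> '8.91B'."""
    return f"{n / 1e9:.2f}B"
```

With `model` loaded as in the Usage section, `print(format_billions(count_parameters(model)))` reports the total.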

Performance (ESB Benchmark — Full Scale, All Samples)

| Dataset | WER |
|---------|-----|
| AMI | 10.14% |
| Earnings22 | 8.73% |
| GigaSpeech | 8.47% |
| LibriSpeech Clean | 1.25% |
| LibriSpeech Other | 2.38% |
| SPGISpeech | 3.60% |
| TED-LIUM | 3.09% |
| VoxPopuli | 5.92% |
| Average | 5.449% |
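For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain illustration of that definition and not the evaluation code behind the table. Benchmark pipelines typically normalize casing and punctuation before scoring, which this sketch leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row dynamic programme for word-level Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Per-dataset WER (%) copied from the table above.
ESB_WER = {
    "AMI": 10.14, "Earnings22": 8.73, "GigaSpeech": 8.47,
    "LibriSpeech Clean": 1.25, "LibriSpeech Other": 2.38,
    "SPGISpeech": 3.60, "TED-LIUM": 3.09, "VoxPopuli": 5.92,
}
macro_average = sum(ESB_WER.values()) / len(ESB_WER)  # unweighted mean over datasets
```

As a worked case, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 2/6. The unweighted mean of the rounded table entries comes to about 5.45%, which agrees with the reported 5.449% average to within rounding of the per-dataset figures.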

Card content reproduced from huggingface.co/bosonai/higgs-audio-v3-8b-stt-v2 under the upstream license. Rendering trims fenced HTML, raw widgets and tables for safety; tap the link for the untouched original.

§ 02 · Benchmarks

No recorded benchmark results yet.

This model is in the registry but has no benchmark_results rows yet.