Official Benchmarks
The canonical leaderboard for all offline speech-to-text models tested in the VoxBar engine arena.
Five test files are used to benchmark the transcription models across different scenarios.
| Test | Content | What It Stresses |
|---|---|---|
| T1 | Reading a List (Motivational) | Numbering, punctuation, structured content |
| T2 | Lecture (Varoufakis — Economics) | Numbers, percentages, proper nouns, complex sentences |
| T3 | Podcast (Dawkins on Darwin) | Philosophy vocab, nested clauses, proper nouns |
| T4 | Accent 1 (Joscha Bach — German) | Dense philosophy, accented speech, "simulacrum" |
| T5 | Accent 2 (Daniel Dennett — American) | Narrative speech, "hallucinatory", stream-of-consciousness |
The system-audio test measures how well each model transcribes internal computer audio (lectures, podcasts, videos) without microphone interference.
| Rank | Model (Size) | T1: List | T2: Lecture | T3: Podcast | T4: Accent 1 | T5: Accent 2 | AVG |
|---|---|---|---|---|---|---|---|
| 🥇 | VoxBar Voxtral 4B (14GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 9.5 | 9.7 |
| 🥈 | VoxBar Pro Native F16 (8.5GB VRAM) | 8.5 | 10.0 | 10.0 | 10.0 | 9.0 | 9.5 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.5 | 9.5 | 9.5 | 10.0 | 8.5 | 9.4 |
| 4️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 8.5 | 8.0 | 9.5 | 9.0 | 8.5 | 8.7 |
| 5️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 8.5 | 8.0 | 9.0 | 9.0 | 8.5 | 8.6 |
| 6️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 7.5 | 8.5 | 8.0 | 9.5 | 8.5 | 8.4 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 9.0 | 7.5 | 8.5 | 9.0 | 8.0 | 8.4 |
| 8️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 6.5 | 7.5 | 7.0 | 8.5 | 8.0 | 7.5 |
| 9️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 7.0 | 7.5 | 7.5 | 6.5 | 6.0 | 6.9 |
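The AVG column appears to be the unweighted mean of the five test scores, rounded to one decimal place. The sketch below reproduces it for three rows copied from the table above. The names `system_scores` and `row_average` are illustrative and do not come from the VoxBar codebase.

```python
from statistics import mean

# Per-test scores (T1..T5) copied from the system-audio table above.
system_scores = {
    "VoxBar Voxtral 4B": [9.5, 9.5, 10.0, 10.0, 9.5],
    "VoxBar Kyutai 1B": [9.0, 7.5, 8.5, 9.0, 8.0],
    "VoxBar Qwen ASR 1.7B": [7.0, 7.5, 7.5, 6.5, 6.0],
}


def row_average(scores):
    """Unweighted mean of the test scores, rounded to one decimal.

    Assumption: the AVG column uses a plain mean with no per-test weighting.
    """
    return round(mean(scores), 1)


averages = {model: row_average(scores) for model, scores in system_scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

The same function applies unchanged to the microphone scores, since both tables use the same five tests.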
The microphone test measures transcription accuracy on audio captured through a condenser microphone (WASAPI loopback), where the models must also handle room acoustics and breathing.
| Rank | Model (Size) | T1: List | T2: Lecture | T3: Podcast | T4: Accent 1 | T5: Accent 2 | AVG |
|---|---|---|---|---|---|---|---|
| 🥇 | VoxBar Pro Native F16 (8.5GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 9.0 | 9.6 |
| 🥈 | VoxBar Voxtral 4B (14GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 8.5 | 9.5 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.5 | 9.5 | 9.5 | 10.0 | 8.5 | 9.4 |
| 4️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 8.5 | 8.5 | 7.5 | 9.0 | 8.5 | 8.4 |
| 5️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 9.0 | 7.0 | 8.0 | 8.5 | 7.5 | 8.0 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 9.5 | 7.0 | 7.5 | 7.0 | 8.0 | 7.8 |
| 7️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 7.0 | 6.5 | 8.5 | 8.0 | 7.0 | 7.4 |
| 8️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 5.0 | 6.5 | 6.5 | 4.5 | 7.0 | 5.9 |
| 9️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 6.0 | 5.5 | 5.5 | 6.0 | 6.0 | 5.8 |
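The combined score is the mean of the system-audio and microphone averages, covering all 10 tests (5 files in each of the 2 capture modes). Gap to #1 is the difference from the top combined score.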
| Rank | Model (Size) | Sys AVG | Mic AVG | Combined | Gap to #1 |
|---|---|---|---|---|---|
| 🥇 | VoxBar Voxtral 4B (14GB VRAM) | 9.7 | 9.5 | 9.6 | — |
| 🥈 | VoxBar Pro Native F16 (8.5GB VRAM) | 9.5 | 9.6 | 9.55 | -0.05 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.4 | 9.4 | 9.4 | -0.2 |
| 4️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 8.4 | 8.4 | 8.4 | -1.2 |
| 5️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 8.7 | 8.0 | 8.35 | -1.25 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 8.4 | 7.8 | 8.1 | -1.5 |
| 7️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 8.6 | 7.4 | 8.0 | -1.6 |
| 8️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 7.5 | 5.8 | 6.65 | -2.95 |
| 9️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 6.9 | 5.9 | 6.4 | -3.2 |
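The Combined and Gap to #1 columns are consistent with a simple mean of the two averages and a subtraction from the leader's score. The sketch below recomputes them for a few rows copied from the table above. The helper names are illustrative, and the two-decimal rounding is an assumption chosen to match the printed values.

```python
# (Sys AVG, Mic AVG) pairs copied from the combined table above.
averages = {
    "VoxBar Voxtral 4B": (9.7, 9.5),
    "VoxBar Pro Native F16": (9.5, 9.6),
    "VoxBar Kyutai 2.6B": (9.4, 9.4),
    "VoxBar Distil-Whisper V3": (7.5, 5.8),
    "VoxBar Qwen ASR 1.7B": (6.9, 5.9),
}


def combined_score(sys_avg, mic_avg):
    """Mean of the two capture-mode averages, rounded to two decimals."""
    return round((sys_avg + mic_avg) / 2, 2)


combined = {model: combined_score(s, m) for model, (s, m) in averages.items()}

# Rank models from highest to lowest combined score.
ranking = sorted(combined, key=combined.get, reverse=True)

# Gap to #1: each model's combined score minus the leader's (zero for the leader).
top = combined[ranking[0]]
gaps = {model: round(combined[model] - top, 2) for model in ranking}

for model in ranking:
    print(f"{model}: combined={combined[model]}, gap={gaps[model]}")
```

Rounding after each step keeps floating-point noise out of the results, so the values match the table to two decimals.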