Pro-grade English transcription. The lowest VRAM in the Pro tier.
Powered by Kyutai STT 2.6B, a 2.6-billion-parameter decoder-only transformer optimised for English accuracy and paired with the Mimi neural audio codec.
$59 one-time · 🔥 LAUNCH: $29.50 with code EARLYBIRD
🎤 Transcribe your mic, or 🔊 listen to your system audio (meetings, podcasts, videos). All 100% local. Nothing leaves your machine.
Chunk-based processing with the Mimi neural codec: 2.6 billion parameters optimised for English accuracy.
Audio is captured at 24kHz in tiny 80ms frames from your microphone, or switch to system audio mode to capture anything playing on your PC (meetings, videos, podcasts). No virtual cables needed.
Each 80ms audio frame is encoded by Kyutai's Mimi codec into 32 parallel codebook streams, capturing both the meaning of speech and its acoustic characteristics. Mimi operates at 12.5 Hz with causal streaming, producing tokens the instant audio arrives.
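The numbers above fit together with simple arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it uses nothing beyond the 24kHz, 80ms and 32-codebook figures quoted on this page.

```python
# Figures quoted on this page; the names themselves are hypothetical.
SAMPLE_RATE_HZ = 24_000   # capture sample rate
FRAME_MS = 80             # length of one audio frame
NUM_CODEBOOKS = 32        # parallel codebook streams per frame


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> int:
    """Number of raw audio samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def frames_per_second(frame_ms: int = FRAME_MS) -> float:
    """Frame rate implied by the frame length."""
    return 1000 / frame_ms


def tokens_per_second(frame_ms: int = FRAME_MS,
                      num_codebooks: int = NUM_CODEBOOKS) -> float:
    """Audio tokens per second across all codebook streams."""
    return frames_per_second(frame_ms) * num_codebooks


if __name__ == "__main__":
    print(samples_per_frame())   # samples in one 80ms frame
    print(frames_per_second())   # matches the 12.5 Hz figure
    print(tokens_per_second())   # tokens per second over 32 streams
```

An 80ms frame at 24kHz holds 1,920 samples, and 1000 / 80 gives the 12.5 Hz frame rate quoted for Mimi.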
A 2.6-billion parameter autoregressive model converts Mimi's audio tokens into text using greedy decoding: no sampling randomness, just the most confident prediction every frame. With its larger parameter count, the 2.6B model achieves superior English accuracy (6.4% WER) with built-in punctuation and capitalisation.
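Greedy decoding simply means taking the highest-scoring token at every step. The following is a minimal sketch of that idea, not VoxBar's or Kyutai's code: it assumes the per-step scores (logits) are already available as plain lists, whereas a real autoregressive model computes each step's scores from the tokens chosen so far.

```python
from typing import List, Sequence


def greedy_pick(logits: Sequence[float]) -> int:
    """Return the index of the highest score (first one wins on ties)."""
    if not logits:
        raise ValueError("logits must not be empty")
    return max(range(len(logits)), key=logits.__getitem__)


def greedy_decode(step_logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most confident token id at every step, with no sampling."""
    return [greedy_pick(logits) for logits in step_logits]


if __name__ == "__main__":
    # Two steps over a toy 3-token vocabulary (made-up scores).
    print(greedy_decode([[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]))
```

Because nothing is sampled, the same scores always produce the same token ids.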
The 2.6B model processes audio in chunks, with text appearing roughly 2 seconds behind your speech. This short delay is the trade-off for the model's superior accuracy. During natural pauses, the model outputs padding tokens (silence markers) until speech resumes, keeping the pipeline alive without producing phantom text.
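As a rough illustration of the two ideas above, the sketch below converts the approximate 2-second delay into 80ms frames and drops padding tokens before display. The `PAD` marker and both function names are hypothetical; the real model's token representation is not described on this page.

```python
from typing import Iterable

FRAME_MS = 80      # frame length quoted on this page
PAD = "<pad>"      # hypothetical silence marker, for illustration only


def delay_in_frames(delay_ms: int, frame_ms: int = FRAME_MS) -> int:
    """How many whole frames fit into a given text delay."""
    return delay_ms // frame_ms


def visible_text(tokens: Iterable[str]) -> str:
    """Join text tokens, skipping padding emitted during pauses."""
    return "".join(tok for tok in tokens if tok != PAD)


if __name__ == "__main__":
    print(delay_in_frames(2000))
    print(visible_text(["Hello", PAD, PAD, " world", "."]))
```

A delay of about 2 seconds corresponds to about 25 frames of audio, and padding tokens contribute nothing to the displayed transcript.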
The model's attention cache is a fixed-capacity ring buffer, pre-allocated at startup and never growing. Following Kyutai's official inference design, GPU memory stays locked at ~5.8GB indefinitely. No memory leaks, no slowdowns, no matter how long you run it.
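The fixed-capacity idea can be sketched in a few lines. This is a generic ring buffer in plain Python, not the actual attention cache (which holds GPU tensors); the class and method names are hypothetical. It shows why memory stays flat: storage is allocated once, and new entries overwrite the oldest.

```python
from typing import Any, List


class RingBuffer:
    """Fixed-capacity buffer: allocated once, oldest entries overwritten."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Any] = [None] * capacity  # pre-allocated up front
        self._next = 0    # slot that the next append will write to
        self._count = 0   # number of valid entries, at most capacity

    @property
    def capacity(self) -> int:
        return len(self._slots)

    def append(self, item: Any) -> None:
        """Store an item, overwriting the oldest one when full."""
        self._slots[self._next] = item
        self._next = (self._next + 1) % self.capacity
        self._count = min(self._count + 1, self.capacity)

    def items(self) -> List[Any]:
        """Return the stored items from oldest to newest."""
        start = (self._next - self._count) % self.capacity
        return [self._slots[(start + i) % self.capacity]
                for i in range(self._count)]


if __name__ == "__main__":
    buf = RingBuffer(3)
    for step in range(5):
        buf.append(step)
    print(buf.items())  # only the three most recent entries remain
```

However many items are appended, the underlying list never grows beyond its initial capacity.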
| Metric | Value |
|---|---|
| Arena Score | 9.4 combined — Professional tier |
| WER (Word Error Rate) | 6.4% — best-in-class for the model size |
| Delivery | Chunk-based — text arrives ~2 seconds behind speech |
| Language | English |
| Punctuation | Context-aware, generated by the model |
| Capitalisation | Automatic, intelligent |
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | ~5.8GB (Kyutai STT 2.6B) | Stable — fixed-capacity ring buffer, never grows |
| RAM | ~1-2GB (Python process) | Stable |
| Disk | Zero temp files | Audio processed in memory, never written to disk |
| Network | None | Fully offline — no internet required |
VoxBar Kyutai 2.6B uses a fixed-capacity ring buffer for its attention cache. GPU memory stays locked at ~5.8GB indefinitely: no memory leaks, no slowdowns. Record for hours without interruption.
Silence timeout: 5 minutes of no detected speech triggers auto-stop. The semantic VAD system intelligently distinguishes between actual silence and natural pauses in conversation.
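The timeout logic can be pictured as a timer that restarts whenever speech is detected. The sketch below is an assumption about how such a timer might look, not VoxBar's implementation: the class name is hypothetical, and the speech/no-speech decision is taken as an input rather than computed by a VAD.

```python
SILENCE_TIMEOUT_S = 5 * 60  # the 5-minute timeout quoted on this page


class SilenceTimer:
    """Signals auto-stop once no speech has been seen for timeout_s seconds."""

    def __init__(self, timeout_s: float = SILENCE_TIMEOUT_S,
                 now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_speech = now  # time of the most recent speech frame

    def on_frame(self, has_speech: bool, now: float) -> bool:
        """Record one frame's result; return True when it is time to stop."""
        if has_speech:
            self.last_speech = now
        return now - self.last_speech >= self.timeout_s


if __name__ == "__main__":
    timer = SilenceTimer(now=0.0)
    print(timer.on_frame(has_speech=True, now=200.0))   # speech restarts the clock
    print(timer.on_frame(has_speech=False, now=499.0))  # still inside the window
    print(timer.on_frame(has_speech=False, now=500.0))  # 300 s of silence reached
```

Timestamps are passed in explicitly so the behaviour is easy to check without waiting five real minutes; a live app would more likely read a monotonic clock.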
A built-in staleness monitor watches the pipeline. If inference stalls for any reason, VoxBar automatically resets and reconnects, with no manual restart needed. This makes long sessions completely hands-free.
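A staleness monitor of this kind is commonly built as a watchdog: the pipeline reports a heartbeat whenever it makes progress, and a periodic check triggers a reset if the last heartbeat is too old. The sketch below illustrates that general pattern only. The class name, the callback and the 10-second threshold are all assumptions, since this page does not state how VoxBar detects a stall.

```python
from typing import Callable


class StalenessMonitor:
    """Calls on_stale when no heartbeat has arrived within stale_after_s."""

    def __init__(self, stale_after_s: float,
                 on_stale: Callable[[], None], now: float = 0.0) -> None:
        self.stale_after_s = stale_after_s
        self.on_stale = on_stale
        self.last_beat = now

    def heartbeat(self, now: float) -> None:
        """Called by the pipeline whenever inference makes progress."""
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Trigger a reset if the pipeline is stale; return True if it was."""
        if now - self.last_beat > self.stale_after_s:
            self.on_stale()
            self.last_beat = now  # give the restarted pipeline a fresh window
            return True
        return False


if __name__ == "__main__":
    events = []
    # Hypothetical 10-second threshold and a stand-in reset action.
    monitor = StalenessMonitor(10.0, lambda: events.append("reset"))
    monitor.heartbeat(now=5.0)
    print(monitor.check(now=12.0))  # recent heartbeat, nothing to do
    print(monitor.check(now=16.0))  # 11 s without progress triggers a reset
    print(events)
```

Resetting the timestamp after a recovery avoids firing the reset again on the very next check.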
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA with 6GB VRAM | NVIDIA with 8GB+ VRAM |
| RAM | 8GB | 16GB |
| Disk | ~5GB (model + app) | SSD |
| OS | Windows 10/11 | Windows 11 |
| Software | Python 3.11+ | Included in installer |
VoxBar™ Pro Kyutai 2.6B is powered by Kyutai STT 2.6B, created by Kyutai Labs (Paris) and licensed under CC-BY-4.0.
VoxBar™ is an independent product by Conjure Labs Limited and is not affiliated with, endorsed by, or sponsored by Kyutai Labs.
| Feature | Kyutai 2.6B | Pro Native |
|---|---|---|
| Arena Score | 9.4 combined | 9.5 combined |
| VRAM | ~5.8GB | ~8.5GB |
| System Audio | Yes (capture meetings, videos) | Microphone only |
| Languages | English | 13 languages |
| Delivery | ~2s delay | Sub-200ms real-time |
| Best for | Users with 6-8GB GPUs who want pro-grade English | Users with 10GB+ GPUs who want multilingual + speed |
One-time purchase. Lifetime license. 2 machines. Zero cloud.
Coming Soon
Secure checkout via Lemon Squeezy / Stripe