How It Works
VoxBar Pro Native runs the same Voxtral 4B model as VoxBar Pro Docker, but natively on Windows with no Docker, WSL, or container overhead. It uses a lightweight Python inference server powered by Hugging Face Transformers.
Here's what happens:
- One-click launch: double-click the VoxBar shortcut and the F16 model loads directly onto your GPU via CUDA
- Microphone capture: audio is captured at 16kHz and processed in efficient chunks
- Native inference: the Voxtral F16 4B model processes audio directly on your GPU with no containerisation overhead
- Real-time transcription: words show up as you speak, with sub-200ms latency
- No background services: when you close VoxBar, everything stops. No Docker daemon, no lingering containers
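The capture-and-chunk step above can be sketched in a few lines of Python. This is an illustration, not VoxBar's actual internals: the 0.5s chunk length and the function name are assumptions.

```python
# Illustrative sketch of 16 kHz chunked capture (not VoxBar's actual code).
# Assumes mono float samples; the 0.5 s chunk length is an arbitrary choice.

SAMPLE_RATE = 16_000          # 16 kHz microphone capture
CHUNK_SECONDS = 0.5
CHUNK_SAMPLES = int(SAMPLE_RATE * CHUNK_SECONDS)

def split_into_chunks(samples):
    """Slice a stream of samples into fixed-size chunks for inference.

    The final partial chunk is kept so no audio is dropped.
    """
    return [samples[i:i + CHUNK_SAMPLES]
            for i in range(0, len(samples), CHUNK_SAMPLES)]

# Example: 1.2 s of audio -> two full chunks plus one partial chunk.
stream = [0.0] * int(SAMPLE_RATE * 1.2)
chunks = split_into_chunks(stream)
print(len(chunks))            # 3
print(len(chunks[0]))         # 8000 samples = 0.5 s at 16 kHz
```

Fixed-size chunks are what make the per-chunk latency predictable: the model always sees the same input shape.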
This is the same S-tier model as Pro Docker, but running natively. You get 9.5 combined Arena accuracy with 40% less VRAM (8.5GB vs 14GB).
Why Native?
Docker adds overhead. With Pro Native:
- No Docker Desktop required: saves ~2GB RAM and eliminates WSL2 complexity
- 40% less VRAM: F16 precision uses 8.5GB vs Docker's 14GB allocation
- Instant startup: the model loads in seconds, not minutes
- Rock-solid sessions: no WebSocket drops, no reconnection delays
- Cleaner system: nothing runs in the background when VoxBar is closed
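The 8.5GB figure is roughly what a back-of-envelope estimate predicts for a 4B-parameter model in F16. The split between weights and overhead below is an assumption, not a measurement of VoxBar itself:

```python
# Back-of-envelope check of the F16 VRAM figure (an estimate, not a measurement).
params = 4e9                      # Voxtral 4B parameter count
bytes_per_param = 2               # F16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)                 # 8.0 GB for the weights alone
# Activations, the CUDA context, and audio buffers plausibly account for
# the remaining ~0.5GB, which lines up with the ~8.5GB observed total.
```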
Recording Limits
VoxBar Pro Native Has No Practical Recording Limit
Because Pro Native runs natively on your GPU with no Docker, no WebSocket, and no container overhead, it can record continuously for as long as you need: hours, or all day if required.
Why It Runs Forever
- Each audio chunk is processed independently: no state carries over between chunks
- GPU memory is fixed at ~8.5GB: the same model processes the same input size every time
- No WebSocket connection to drop, no Docker container to restart
- No context window that fills up or degrades over time
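The statelessness described above is why session length doesn't matter. A minimal sketch, with a stand-in for the real model call (the function names are illustrative):

```python
# Illustrative sketch of stateless per-chunk processing (not VoxBar's code).
# Because nothing is retained between chunks, memory use stays flat no
# matter how long the session runs.

def transcribe_chunk(chunk):
    """Stand-in for model inference: chunk in, text out, no retained state."""
    return f"<{len(chunk)} samples transcribed>"

def run_session(chunks):
    transcript = []
    for chunk in chunks:
        # Each call sees only its own chunk; there is no KV cache or
        # rolling context that could grow or degrade over time.
        transcript.append(transcribe_chunk(chunk))
    return transcript

# Ten thousand chunks behave exactly like one: same work per chunk.
out = run_session([[0.0] * 8000] * 10_000)
print(len(out))  # 10000
```

Chunk one and chunk ten thousand go through an identical code path, which is the property behind the "zero degradation" claim below.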
Auto-Stop Behaviour
- Silence timeout: 5 minutes of no detected speech triggers auto-stop
- This keeps your GPU free when you're not actively dictating
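The silence timeout can be sketched as an RMS-energy check with a resettable timer. The threshold value and chunk length here are assumptions for illustration; only the 5-minute timeout comes from the behaviour described above:

```python
# Minimal silence-timeout sketch (threshold and chunk length are assumptions).
SILENCE_TIMEOUT_S = 5 * 60        # auto-stop after 5 minutes of silence
RMS_THRESHOLD = 0.01              # below this, a chunk counts as silence
CHUNK_SECONDS = 0.5

def rms(chunk):
    """Root-mean-square energy of one audio chunk."""
    return (sum(s * s for s in chunk) / len(chunk)) ** 0.5

def should_auto_stop(chunks):
    """Return True once SILENCE_TIMEOUT_S of consecutive silence is seen."""
    silent_for = 0.0
    for chunk in chunks:
        if rms(chunk) < RMS_THRESHOLD:
            silent_for += CHUNK_SECONDS
            if silent_for >= SILENCE_TIMEOUT_S:
                return True
        else:
            silent_for = 0.0          # any detected speech resets the timer
    return False

# 5 minutes of silent chunks (600 chunks of 0.5 s) triggers auto-stop.
silent = [[0.0] * 8000] * 600
print(should_auto_stop(silent))       # True
print(should_auto_stop(silent[:-1]))  # False: 299.5 s, just under the limit
```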
Real-World Testing
During live testing, VoxBar Pro Native ran continuously for over 50 minutes of natural dictation with zero interruptions, zero restarts, and zero degradation. The text quality at minute 50 was identical to minute 1.
Memory & Resource Footprint
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | ~8.5GB (Voxtral F16 4B model) | Stable: no KV cache accumulation |
| RAM | ~1-2GB (Python process) | Stable |
| Disk | Zero temp files | Audio processed in memory, never written to disk |
| Network | None | Fully offline: no localhost server, no internet |
Accuracy & Speed
| Metric | Value |
|---|---|
| Arena Score | 9.5 combined (tied S-tier) |
| Delivery | Real-time: words appear as you speak |
| Latency | <200ms from speech to text on screen |
| Multilingual | Yes, 13 languages supported |
| Punctuation | Context-aware, appears naturally |
| Capitalisation | Automatic, intelligent |
Hardware Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA with 10GB VRAM | NVIDIA with 12GB+ VRAM |
| RAM | 8GB | 16GB |
| Disk | ~8.5GB for model | SSD recommended |
| OS | Windows 10/11 | Windows 11 |
| Software | NVIDIA drivers + CUDA | Latest NVIDIA drivers |
License & Attribution
| Detail | Value |
|---|---|
| Model | Voxtral-Mini-4B-Realtime-2602 (F16) |
| Creator | Mistral AI |
| License | Apache 2.0 (fully commercial) |
| Attribution | Not required (but appreciated) |
| Distribution | Can be bundled and sold commercially |
Pro Native vs Pro Docker
Both run the same Voxtral 4B model. The difference is how it's deployed:
| Feature | Pro Native | Pro Docker |
|---|---|---|
| Arena Score | 9.5 combined | 9.6 combined |
| Docker required | No | Yes |
| VRAM usage | ~8.5GB | ~14GB |
| Install | One-click | Docker Desktop + pull image |
| Session stability | Rock solid | Stable |
| AMD GPU | Not supported | Not supported |
| macOS | See Mac Models: a native Apple Metal build is available separately | See Mac Models |
| Best for | Windows users who want simplicity and lower VRAM | Power users who want the highest arena score |
VoxBar Pro Native is for Windows users who want S-tier accuracy with the simplest possible setup and 40% less VRAM. VoxBar Pro Docker achieves a slightly higher arena score (9.6 vs 9.5) but requires Docker Desktop and 14GB VRAM.