How It Works
VoxBar Pro Native runs the same Voxtral 4B model as VoxBar Pro Docker, but natively on Windows with no Docker, WSL, or container overhead. It uses a lightweight Python inference server powered by Hugging Face Transformers.
Here's what happens:
- One-click launch: double-click the VoxBar shortcut and the F16 model loads directly onto your GPU via CUDA
- Microphone capture: audio is captured at 16kHz and processed in efficient chunks
- Native inference: the Voxtral F16 4B model processes audio directly on your GPU with no containerisation overhead
- Real-time transcription: words show up as you speak, with sub-200ms latency
- No background services: when you close VoxBar, everything stops. No Docker daemon, no lingering containers
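The capture-and-chunk step above can be sketched in a few lines of Python. This is an illustration, not VoxBar's actual internals: the 0.5s chunk length and the function name are assumptions.

```python
# Illustrative sketch of 16 kHz chunked capture (not VoxBar's actual code).
# Assumes mono float samples; the 0.5 s chunk length is an arbitrary choice.

SAMPLE_RATE = 16_000          # 16 kHz microphone capture
CHUNK_SECONDS = 0.5
CHUNK_SAMPLES = int(SAMPLE_RATE * CHUNK_SECONDS)

def split_into_chunks(samples):
    """Slice a stream of samples into fixed-size chunks for inference.

    The final partial chunk is kept so no audio is dropped.
    """
    return [samples[i:i + CHUNK_SAMPLES]
            for i in range(0, len(samples), CHUNK_SAMPLES)]

# Example: 1.2 s of audio -> two full chunks plus one partial chunk.
stream = [0.0] * int(SAMPLE_RATE * 1.2)
chunks = split_into_chunks(stream)
print(len(chunks))            # 3
print(len(chunks[0]))         # 8000 samples = 0.5 s at 16 kHz
```

Fixed-size chunks are what make the per-chunk latency predictable: the model always sees the same input shape.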
This is the same S-tier model as Pro Docker, but running natively. You get 9.5 combined Arena accuracy with 40% less VRAM (8.5GB vs 14GB).
Why Native?
Docker adds overhead. With Pro Native:
- No Docker Desktop required: saves ~2GB RAM and eliminates WSL2 complexity
- 40% less VRAM: F16 precision uses 8.5GB vs Docker's 14GB allocation
- Instant startup: the model loads in seconds, not minutes
- Rock-solid sessions: no WebSocket drops, no reconnection delays
- Cleaner system: nothing runs in the background when VoxBar is closed
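The 8.5GB figure is roughly what a back-of-envelope estimate predicts for a 4B-parameter model in F16. The split between weights and overhead below is an assumption, not a measurement of VoxBar itself:

```python
# Back-of-envelope check of the F16 VRAM figure (an estimate, not a measurement).
params = 4e9                      # Voxtral 4B parameter count
bytes_per_param = 2               # F16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)                 # 8.0 GB for the weights alone
# Activations, the CUDA context, and audio buffers plausibly account for
# the remaining ~0.5GB, which lines up with the ~8.5GB observed total.
```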
Recording Limits
VoxBar Pro Native Has No Practical Recording Limit
Because Pro Native runs natively on your GPU with no Docker, no WebSocket, and no container overhead, it can record continuously for as long as you need: hours, or all day if required.
Why It Runs Forever
- Each audio chunk is processed independently: no state carries over between chunks
- GPU memory is fixed at ~8.5GB: the same model processes the same input size every time
- No WebSocket connection to drop, no Docker container to restart
- No context window that fills up or degrades over time
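The statelessness described above is why session length doesn't matter. A minimal sketch, with a stand-in for the real model call (the function names are illustrative):

```python
# Illustrative sketch of stateless per-chunk processing (not VoxBar's code).
# Because nothing is retained between chunks, memory use stays flat no
# matter how long the session runs.

def transcribe_chunk(chunk):
    """Stand-in for model inference: chunk in, text out, no retained state."""
    return f"<{len(chunk)} samples transcribed>"

def run_session(chunks):
    transcript = []
    for chunk in chunks:
        # Each call sees only its own chunk; there is no KV cache or
        # rolling context that could grow or degrade over time.
        transcript.append(transcribe_chunk(chunk))
    return transcript

# Ten thousand chunks behave exactly like one: same work per chunk.
out = run_session([[0.0] * 8000] * 10_000)
print(len(out))  # 10000
```

Chunk one and chunk ten thousand go through an identical code path, which is the property behind the "zero degradation" claim below.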
Auto-Stop Behaviour
- Silence timeout: 5 minutes of no detected speech triggers auto-stop
- This keeps your GPU free when you're not actively dictating
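The silence timeout can be sketched as an RMS-energy check with a resettable timer. The threshold value and chunk length here are assumptions for illustration; only the 5-minute timeout comes from the behaviour described above:

```python
# Minimal silence-timeout sketch (threshold and chunk length are assumptions).
SILENCE_TIMEOUT_S = 5 * 60        # auto-stop after 5 minutes of silence
RMS_THRESHOLD = 0.01              # below this, a chunk counts as silence
CHUNK_SECONDS = 0.5

def rms(chunk):
    """Root-mean-square energy of one audio chunk."""
    return (sum(s * s for s in chunk) / len(chunk)) ** 0.5

def should_auto_stop(chunks):
    """Return True once SILENCE_TIMEOUT_S of consecutive silence is seen."""
    silent_for = 0.0
    for chunk in chunks:
        if rms(chunk) < RMS_THRESHOLD:
            silent_for += CHUNK_SECONDS
            if silent_for >= SILENCE_TIMEOUT_S:
                return True
        else:
            silent_for = 0.0          # any detected speech resets the timer
    return False

# 5 minutes of silent chunks (600 chunks of 0.5 s) triggers auto-stop.
silent = [[0.0] * 8000] * 600
print(should_auto_stop(silent))       # True
print(should_auto_stop(silent[:-1]))  # False: 299.5 s, just under the limit
```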
Real-World Testing
During live testing, VoxBar Pro Native ran continuously for over 50 minutes of natural dictation with zero interruptions, zero restarts, and zero degradation. The text quality at minute 50 was identical to minute 1.
Memory & Resource Footprint
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | ~8.5GB (Voxtral F16 4B model) | Stable: no KV cache accumulation |
| RAM | ~1-2GB (Python process) | Stable |
| Disk | Zero temp files | Audio processed in memory, never written to disk |
| Network | None | Fully offline: no localhost server, no internet |
Accuracy & Speed
| Metric | Value |
|---|---|
| Arena Score | 9.5 combined (tied S-tier) |
| Delivery | Real-time: words appear as you speak |
| Latency | <200ms from speech to text on screen |
| Multilingual | Yes, 13 languages supported |
| Punctuation | Context-aware, appears naturally |
| Capitalisation | Automatic, intelligent |
Hardware Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA with 10GB VRAM | NVIDIA with 12GB+ VRAM |
| RAM | 8GB | 16GB |
| Disk | ~8.5GB for model | SSD recommended |
| OS | Windows 10/11 | Windows 11 |
| Software | NVIDIA drivers + CUDA | Latest NVIDIA drivers |
License & Attribution
| Detail | Value |
|---|---|
| Model | Voxtral-Mini-4B-Realtime-2602 (F16) |
| Creator | Mistral AI |
| License | Apache 2.0 (fully commercial) |
| Attribution | Not required (but appreciated) |
| Distribution | Can be bundled and sold commercially |
Pro Native vs Pro Docker
Both run the same Voxtral 4B model. The difference is how it's deployed:
| Feature | Pro Native | Pro Docker |
|---|---|---|
| Arena Score | 9.5 combined | 9.6 combined |
| Docker required | No | Yes |
| VRAM usage | ~8.5GB | ~14GB |
| Install | One-click | Docker Desktop + pull image |
| Session stability | Rock solid | Stable |
| AMD GPU | Not supported | Not supported |
| macOS | See Mac Models: a native Apple Metal build is available separately | See Mac Models |
| Best for | Windows users who want simplicity and lower VRAM | Power users who want the highest arena score |
VoxBar Pro Native is for Windows users who want S-tier accuracy with the simplest possible setup and 40% less VRAM. VoxBar Pro Docker achieves a slightly higher arena score (9.6 vs 9.5) but requires Docker Desktop and 14GB VRAM.