Ultra-lightweight real-time transcription. Just 600M parameters.
Powered by NVIDIA Nemotron Speech 0.6B, a FastConformer-RNNT streaming model built for speed and accuracy.
$39 one-time · 🔥 LAUNCH: $19.50 with code EARLYBIRD
🎤 Transcribe your mic, or listen to your system audio (meetings, podcasts, videos). All 100% local. Nothing leaves your machine.
NVIDIA's FastConformer architecture, designed for streaming speech recognition from the ground up.
Audio is captured at 16kHz from your microphone, or you can switch to system audio mode to capture anything playing on your PC (meetings, videos, podcasts). No virtual cables needed.
NVIDIA's FastConformer architecture processes audio with attention-based encoding, understanding context, not just sounds.
The Recurrent Neural Network Transducer (RNNT) decoder produces text tokens in real time, designed for streaming from day one.
Transcription flows directly into your textbox with millisecond latency. Smart autocorrect cleans up spacing and formatting during natural pauses. The smallest, fastest engine in the VoxBar lineup.
| Metric | Value |
|---|---|
| Arena Score | 8.35 combined — Sys 8.7 / Mic 8.0 |
| Architecture | FastConformer (24-layer encoder) + RNN-T decoder |
| Chunk Sizes | Configurable — 80ms, 160ms, 560ms, 1120ms |
| Language | English |
| Punctuation | Automatic — natively generated by the model |
| Capitalisation | Automatic, intelligent |
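To give a feel for what the chunk sizes above mean in practice, here is a minimal sketch that converts each option into a sample count at the 16kHz capture rate. It assumes 16-bit mono PCM, which this page does not state, and the function and constant names are illustrative rather than VoxBar internals.

```python
SAMPLE_RATE_HZ = 16_000                 # capture rate stated on this page
CHUNK_SIZES_MS = (80, 160, 560, 1120)   # configurable chunk sizes from the table
BYTES_PER_SAMPLE = 2                    # assumption: 16-bit mono PCM


def samples_per_chunk(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in one chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000


def chunk_table() -> dict[int, tuple[int, int]]:
    """Map each chunk size in ms to (samples, bytes) under the assumptions above."""
    return {
        ms: (samples_per_chunk(ms), samples_per_chunk(ms) * BYTES_PER_SAMPLE)
        for ms in CHUNK_SIZES_MS
    }


if __name__ == "__main__":
    for ms, (samples, size_bytes) in chunk_table().items():
        print(f"{ms:>5} ms -> {samples:>6} samples, {size_bytes:>6} bytes")
```

Smaller chunks mean text appears sooner; larger chunks give the model more audio per step.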
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | ~4.8GB (Nemotron 0.6B) | Stable — lightweight, barely touches your GPU |
| RAM | ~1-2GB (Python process) | Stable |
| Disk | Zero temp files | Audio processed in memory, never written to disk |
| Network | None | Fully offline — no internet required |
VoxBar Nemotron processes each audio chunk independently: no state accumulates, no context window fills up. GPU memory stays fixed at ~4.8GB. Record for hours without interruption.
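One way to picture the fixed-memory behaviour is a loop that only ever holds one chunk of audio at a time. The sketch below is a hypothetical illustration of that pattern, not VoxBar's actual pipeline; `iter_chunks` is an invented name, and a real app would hand each chunk to the model instead of just measuring it.

```python
from typing import Iterable, Iterator


def iter_chunks(samples: Iterable[int], chunk_samples: int) -> Iterator[list[int]]:
    """Yield fixed-size chunks from a sample stream, buffering one chunk at most."""
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    buffer: list[int] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_samples:
            yield buffer
            buffer = []  # start fresh; the previous chunk is no longer held here
    if buffer:
        yield buffer  # trailing partial chunk at end of stream


if __name__ == "__main__":
    # Three seconds of placeholder audio at 16kHz, split into 80ms chunks.
    stream = range(16_000 * 3)
    peak = max(len(chunk) for chunk in iter_chunks(stream, 1280))
    print(f"largest chunk held in memory: {peak} samples")
```

Because the buffer is reset after every chunk, the amount of audio held does not grow with recording length.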
Silence timeout: 15 minutes (900 seconds) of no detected speech triggers auto-stop.
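The silence timeout can be sketched as a small watchdog that remembers when speech was last heard. This is an assumption about how such a timer might work, not VoxBar's code: `SilenceWatchdog` and `on_chunk` are hypothetical names, and the speech/no-speech decision is taken as a given input.

```python
import time
from typing import Callable

SILENCE_TIMEOUT_S = 900.0  # 15 minutes, as stated on this page


class SilenceWatchdog:
    """Signals auto-stop after a continuous stretch with no detected speech."""

    def __init__(
        self,
        timeout_s: float = SILENCE_TIMEOUT_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the timer can be tested
        self._last_speech = clock()  # treat start of recording as "speech"

    def on_chunk(self, speech_detected: bool) -> bool:
        """Record one processed chunk; return True when recording should stop."""
        now = self._clock()
        if speech_detected:
            self._last_speech = now  # any speech resets the countdown
            return False
        return now - self._last_speech >= self._timeout_s
```

Calling `on_chunk` once per processed chunk is enough; any detected speech resets the 900-second countdown.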
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU (NVIDIA) | 5GB VRAM | 6GB+ VRAM |
| RAM | 8GB | 16GB |
| Disk | ~2GB (model + app) | SSD |
| OS | Windows 10/11 | Windows 11 |
| Software | Python 3.10+ / CUDA | Included in installer |
Note: VoxBar Nemotron requires an NVIDIA GPU with CUDA support. AMD and Apple Silicon are not currently supported.
VoxBar™ Nemotron is powered by Nemotron Speech Streaming EN 0.6B, created by the NVIDIA NeMo team and released under the NVIDIA Open Model License.
VoxBar™ is an independent product by Conjure Labs Limited and is not affiliated with, endorsed by, or sponsored by NVIDIA Corporation.
| Feature | Nemotron | GLM |
|---|---|---|
| Arena Score | 8.35 combined | 8.0 combined |
| VRAM | ~4.8GB | ~4GB |
| Architecture | FastConformer + RNNT | LLM-based (generative) |
| Languages | English | 17 languages |
| Best for | Highest mid-tier accuracy, NVIDIA-optimised | Multilingual needs, lightest GPU footprint |
One-time purchase. Lifetime license. 2 machines. Zero cloud.
Coming Soon
Secure checkout via Lemon Squeezy / Stripe