🚀 8.35 Arena · NVIDIA Powered

VoxBar Nemotron

Ultra-lightweight real-time transcription. Just 600M parameters.

Powered by NVIDIA Nemotron Speech 0.6B: a FastConformer-RNNT streaming model built for speed and accuracy.

$39 one-time · 🔥 LAUNCH: $19.50 with code EARLYBIRD

🎤 Transcribe your mic, or 🔊 listen to your system audio (meetings, podcasts, videos). All 100% local. Nothing leaves your machine.

0.6B parameters
<8% word error rate
~4.8GB VRAM usage
100% local & private

How It Works

NVIDIA's FastConformer architecture, in a model built specifically for streaming speech recognition.

🎤

Captures your audio

Audio is captured at 16kHz from your microphone β€” or switch to system audio mode to capture anything playing on your PC (meetings, videos, podcasts). No virtual cables needed.

⚡

FastConformer encoder

NVIDIA's FastConformer architecture processes audio with attention-based encoding, so it models context, not just individual sounds.

🧠

RNNT decoder streams text

The Recurrent Neural Network Transducer (RNNT) decoder produces text tokens in real time as audio arrives; transducer decoding is naturally suited to streaming.
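
Greedy decoding is the simplest way a transducer turns encoder frames into text. The sketch below is a toy illustration of that loop, not NeMo's decoder: `toy_joint`, the three-word vocabulary and the per-frame symbol cap are hypothetical, and the prediction network is reduced to remembering the last emitted token.

```python
from typing import Callable, Dict, List, Sequence

BLANK = 0  # hypothetical index of the blank symbol
VOCAB = ["<blank>", "hello", "world"]  # hypothetical toy vocabulary

def greedy_rnnt_decode(
    encoder_frames: Sequence[Dict[int, int]],
    joint: Callable[[Dict[int, int], int], List[float]],
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding: on each encoder frame, keep emitting the
    best-scoring token until the joint network prefers blank, then advance."""
    tokens: List[int] = []
    # A real prediction network keeps recurrent state; this toy keeps only the last token.
    last_token = BLANK
    for frame in encoder_frames:
        for _ in range(max_symbols_per_frame):  # cap emissions per frame
            scores = joint(frame, last_token)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                break  # blank: emit nothing more here, move to the next frame
            tokens.append(best)
            last_token = best  # the next prediction is conditioned on this token
    return tokens

def toy_joint(frame: Dict[int, int], last_token: int) -> List[float]:
    """Hypothetical joint network: each 'frame' maps the previous token to the
    token it wants to emit next; anything unmapped scores blank highest."""
    scores = [0.0] * len(VOCAB)
    scores[frame.get(last_token, BLANK)] = 1.0
    return scores

frames = [{}, {BLANK: 1}, {}, {1: 2}, {}]
print([VOCAB[t] for t in greedy_rnnt_decode(frames, toy_joint)])  # ['hello', 'world']
```

Because each step needs only the current encoder frame and what has already been emitted, tokens can be output as soon as a frame arrives, without waiting for the end of the utterance.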

✍️

Words appear as you speak

Transcription flows directly into your textbox with millisecond latency. Smart autocorrect cleans up spacing and formatting during natural pauses. The smallest, fastest engine in the VoxBar lineup.
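
The page does not say which rules the autocorrect applies. As a hedged illustration, a pause-triggered clean-up might collapse stray whitespace, pull punctuation onto the preceding word and capitalise sentence starts; the function name and rules below are hypothetical.

```python
import re

def tidy_spacing(text: str) -> str:
    """Hypothetical clean-up pass that could run when the speaker pauses."""
    text = re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # drop spaces before punctuation
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.!?] )([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

print(tidy_spacing("  hello   world , this is voxbar .  next sentence "))
# Hello world, this is voxbar. Next sentence
```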

Accuracy & Speed

Metric | Value
Arena Score | 8.35 combined (System audio 8.7 / Mic 8.0)
Architecture | FastConformer (24-layer encoder) + RNNT decoder
Chunk Sizes | Configurable: 80ms, 160ms, 560ms, 1120ms
Language | English
Punctuation | Automatic, generated natively by the model
Capitalisation | Automatic, intelligent
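
The chunk sizes above translate directly into audio samples at the 16kHz capture rate. The sketch also converts them into encoder frames under two assumptions not stated on this page: 10ms feature frames and the 8x subsampling typical of FastConformer, which would make each encoder frame 80ms long. Under those assumptions the four settings correspond to 1, 2, 7 and 14 encoder frames.

```python
SAMPLE_RATE_HZ = 16_000  # capture rate stated on this page
FEATURE_HOP_MS = 10      # assumption: standard 10 ms feature frames
SUBSAMPLING = 8          # assumption: FastConformer-style 8x downsampling
ENCODER_FRAME_MS = FEATURE_HOP_MS * SUBSAMPLING  # 80 ms per encoder output frame

def chunk_stats(chunk_ms: int) -> tuple[int, int]:
    """Return (audio samples, encoder frames) covered by one streaming chunk."""
    samples = SAMPLE_RATE_HZ * chunk_ms // 1000
    encoder_frames = chunk_ms // ENCODER_FRAME_MS
    return samples, encoder_frames

for chunk_ms in (80, 160, 560, 1120):
    samples, frames = chunk_stats(chunk_ms)
    print(f"{chunk_ms:>5} ms -> {samples:>6} samples, {frames:>2} encoder frame(s)")
```

As a general rule for streaming recognisers, smaller chunks lower latency, while larger chunks give the encoder more audio per step, which tends to help accuracy at the cost of delay.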

Memory & Resource Footprint

Resource | Usage | Behaviour Over Time
GPU VRAM | ~4.8GB (Nemotron 0.6B) | Stable; lightweight, barely touches your GPU
RAM | ~1-2GB (Python process) | Stable
Disk | Zero temp files | Audio processed in memory, never written to disk
Network | None | Fully offline, no internet required

Recording Limits

♾️

No Recording Limit

VoxBar Nemotron processes each audio chunk independently, so no state accumulates and no context window fills up. GPU memory stays fixed at ~4.8GB. Record for hours without interruption.
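
Chunk-by-chunk processing with no accumulated state keeps memory flat, because only the current chunk is ever buffered. The generator sketch below illustrates that idea; `transcribe_stream` and the `recognise` callback are hypothetical stand-ins, not VoxBar's code.

```python
from typing import Callable, Iterable, Iterator, List

def fixed_chunks(samples: Iterable[float], chunk_len: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks; at most one chunk is buffered at any time."""
    buf: List[float] = []
    for sample in samples:
        buf.append(sample)
        if len(buf) == chunk_len:
            yield buf
            buf = []  # start a fresh buffer instead of growing one forever
    if buf:
        yield buf  # trailing partial chunk

def transcribe_stream(samples: Iterable[float], chunk_len: int,
                      recognise: Callable[[List[float]], str]) -> Iterator[str]:
    """Feed each chunk to a stateless recogniser and yield any text it returns."""
    for chunk in fixed_chunks(samples, chunk_len):
        text = recognise(chunk)  # hypothetical: sees only this chunk, keeps no history
        if text:
            yield text

# Usage with a stand-in recogniser: 80 ms chunks at 16 kHz are 1280 samples.
one_second = [0.0] * 16_000
print(sum(1 for _ in transcribe_stream(one_second, 1280, lambda c: f"{len(c)} samples")))
```

However long the input runs, the loop never holds more than one chunk of samples at a time.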

⏱️

Auto-Stop Behaviour

Silence timeout: 15 minutes (900 seconds) of no detected speech triggers auto-stop.
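
A silence timeout like this can be implemented by remembering when speech was last detected and comparing against a clock on every chunk. The sketch assumes a per-chunk `speech_detected` flag from some voice-activity detector; the class name and the injectable clock are hypothetical, the latter added so the timer can be tested without waiting 15 minutes.

```python
import time
from typing import Callable

SILENCE_TIMEOUT_S = 900.0  # 15 minutes, as stated above

class SilenceAutoStop:
    """Tracks when speech was last heard and reports when recording should stop."""

    def __init__(self, timeout_s: float = SILENCE_TIMEOUT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_speech = clock()  # treat the start of recording as recent activity

    def on_chunk(self, speech_detected: bool) -> bool:
        """Call once per audio chunk; returns True once the silence timeout is reached."""
        now = self.clock()
        if speech_detected:
            self.last_speech = now  # any speech resets the countdown
        return now - self.last_speech >= self.timeout_s
```

Using `time.monotonic` rather than wall-clock time keeps the countdown correct even if the system clock is adjusted mid-recording.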

Why VoxBar Nemotron Is Different

What you DON'T need

No internet connection: everything runs locally
No cloud processing: your voice never leaves your machine
No Docker required: runs natively with Python + CUDA
No usage limits: unlimited transcription, forever
No subscriptions: one-time purchase, lifetime license

What makes it unique

NVIDIA-engineered: the underlying model was built by NVIDIA's NeMo team
System audio capture: transcribe meetings, YouTube, podcasts directly from your PC's audio output
Ultra-lightweight: just 600M parameters, runs in ~4.8GB of VRAM
Streaming-native: the FastConformer-RNNT model is built for real-time transcription
Low resource usage: barely touches your GPU, great for multitasking
Overlay Mode: transparent overlay sits on top of any app with adjustable transparency and font sizes
Mid-text editing: click anywhere in your text to insert new speech at that position
Voice commands: say "delete" to remove highlighted text, or use voice punctuation and formatting
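
Voice punctuation is essentially a substitution pass over the recognised words. The sketch below shows the idea with a hypothetical command vocabulary and function name; the page does not list VoxBar's actual commands.

```python
import re

# Hypothetical spoken-punctuation vocabulary; VoxBar's real command list may differ.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "full stop": ".",
    "period": ".",
    "question mark": "?",
    "exclamation mark": "!",
}

def apply_voice_punctuation(text: str) -> str:
    """Swap spoken punctuation words for symbols, attached to the preceding word."""
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        # \s* absorbs the space before the command, so "hello comma" becomes "hello,"
        pattern = rf"\s*\b{re.escape(phrase)}\b"
        text = re.sub(pattern, symbol, text, flags=re.IGNORECASE)
    return text

print(apply_voice_punctuation("hello comma how are you question mark"))  # hello, how are you?
```

Plain word substitution like this is naive: it would also convert a genuine use of "period" in dictation, so a production version would need context or an explicit command mode.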

Hardware Requirements

Requirement | Minimum | Recommended
GPU (NVIDIA) | 5GB VRAM | 6GB+ VRAM
RAM | 8GB | 16GB
Disk | ~2GB (model + app) | SSD
OS | Windows 10/11 | Windows 11
Software | Python 3.10+ / CUDA | Included in installer

Note: VoxBar Nemotron requires an NVIDIA GPU with CUDA support. AMD and Apple Silicon are not currently supported.

License & Attribution

VoxBar™ Nemotron is powered by Nemotron Speech Streaming EN 0.6B, created by the NVIDIA NeMo team and released under the NVIDIA Open Model License.

VoxBar™ is an independent product by Conjure Labs Limited and is not affiliated with, endorsed by, or sponsored by NVIDIA Corporation.

Nemotron vs GLM

Feature | Nemotron | GLM
Arena Score | 8.35 combined | 8.0 combined
VRAM | ~4.8GB | ~4GB
Architecture | FastConformer + RNNT | LLM-based (generative)
Languages | English | 17 languages
Best for | Highest mid-tier accuracy, NVIDIA-optimised | Multilingual needs, lightest GPU footprint

The smallest engine in the VoxBar lineup.

One-time purchase. Lifetime license. 2 machines. Zero cloud.

Coming Soon

Secure checkout via Lemon Squeezy / Stripe