How It Works
VoxBar Lite uses Moonshine v2 — a lightweight speech recognition model designed to run on anything, including machines with no dedicated GPU at all. Unlike the chunk-and-transcribe approach used by VoxBar AI and Ultra, Moonshine uses a true streaming architecture with an event-driven listener system.
Here's what happens, step by step:
- Opens your microphone via sounddevice — captures audio at 16kHz, 1024-sample blocks
- Feeds raw audio directly to the Moonshine Transcriber — no buffering, no temp files, no WAV conversion
- The Transcriber processes audio in real-time using its internal streaming pipeline
- Three event types fire as speech is detected:
  - on_line_started — a new utterance has begun (live indicator appears)
  - on_line_text_changed — the model is actively recognising words (live text updates)
  - on_line_completed — the utterance is finished (text is committed permanently)
- Live text appears immediately as the model processes — you see words forming in real-time
- Completed lines are committed and the model begins listening for the next utterance
- Repeats forever — the streaming pipeline runs continuously
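The event flow above can be sketched with a small stub. This is an illustrative stand-in, not the real moonshine-voice API: the StubTranscriber class and its feed/end_of_utterance methods are invented for this sketch, and only the three callback names mirror the events described in this document. The real pipeline receives raw 16kHz audio blocks rather than words.

```python
from typing import Callable, List

class StubTranscriber:
    """Minimal stand-in that fires the three lifecycle events per utterance."""

    def __init__(self,
                 on_line_started: Callable[[], None],
                 on_line_text_changed: Callable[[str], None],
                 on_line_completed: Callable[[str], None]):
        self.on_line_started = on_line_started
        self.on_line_text_changed = on_line_text_changed
        self.on_line_completed = on_line_completed
        self._partial: List[str] = []

    def feed(self, recognised_word: str) -> None:
        # The real pipeline would receive raw audio here; this stub
        # pretends each call recognised one word.
        if not self._partial:
            self.on_line_started()                          # utterance begins
        self._partial.append(recognised_word)
        self.on_line_text_changed(" ".join(self._partial))  # live update

    def end_of_utterance(self) -> None:
        self.on_line_completed(" ".join(self._partial))     # commit text
        self._partial = []                                  # listen for next utterance

committed: List[str] = []
t = StubTranscriber(
    on_line_started=lambda: print("[listening]"),
    on_line_text_changed=lambda text: print("live:", text),
    on_line_completed=committed.append,
)
for word in ["hello", "world"]:
    t.feed(word)
t.end_of_utterance()
print("committed:", committed)
```

The point of the shape: live text (on_line_text_changed) is provisional and may be revised, while only on_line_completed text is committed permanently.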
The key difference from every other VoxBar model: Moonshine never writes temp files to disk. Audio goes directly from your microphone into the model's memory. There is no WAV conversion step, no disk I/O, no cleanup needed.
Recording Limits
VoxBar Lite Has No Recording Limit
VoxBar Lite runs natively with no Docker, no server process, and no network connections. The Moonshine Transcriber processes audio in a continuous stream — there's no chunk accumulation or state that grows over time.
Important Note: Transcriber Lifecycle
When you stop and restart listening, VoxBar Lite creates a fresh Transcriber instance, because Moonshine's underlying C library doesn't support restarting a stopped stream. The model files are cached, so reloading is fast (~1 second). This is invisible to the user — it just works.
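The stop/restart pattern can be sketched as a session wrapper that always builds a new instance. The Session class and create-factory names here are illustrative, not the actual VoxBar Lite internals:

```python
class Session:
    """Wraps a transcriber that cannot be restarted once stopped."""

    def __init__(self, factory):
        self._factory = factory
        self._transcriber = None

    def start(self):
        # Always build a fresh instance; model weights are cached on disk,
        # so construction is fast (~1 s) after the first run.
        self._transcriber = self._factory()
        return self._transcriber

    def stop(self):
        self._transcriber = None  # old stream is discarded, never reused

# Demonstrate that every start() yields a new instance.
counter = {"created": 0}
def make():
    counter["created"] += 1
    return object()

s = Session(make)
first = s.start()
s.stop()
second = s.start()
assert first is not second
print("instances created:", counter["created"])
```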
Auto-Stop Behaviour
- Silence timeout: 60 seconds of no detected speech
- Check interval: Every 5 seconds
- Designed for active dictation — stops promptly to save CPU resources
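The auto-stop rule above reduces to a simple check. The constants mirror the values listed; the function name is illustrative:

```python
SILENCE_TIMEOUT = 60.0   # seconds of silence before stopping
CHECK_INTERVAL = 5.0     # how often the check runs

def should_stop(now: float, last_speech_at: float) -> bool:
    """Return True once the silence timeout has elapsed."""
    return (now - last_speech_at) >= SILENCE_TIMEOUT

# Simulate checks at 5-second ticks after speech stopped at t=0.
stops = [t for t in range(0, 70, int(CHECK_INTERVAL)) if should_stop(t, 0.0)]
print("first stop check that fires:", stops[0])  # 60
```

Because the check runs every 5 seconds, the actual stop can land up to one interval after the 60-second mark.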
Memory & Resource Footprint
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | 0GB (CPU-only) or <1GB (GPU-accelerated) | ✅ Minimal — smallest footprint in the entire suite |
| RAM | ~200-400MB | ✅ Stable — streaming architecture has no accumulation |
| Disk | Zero temp files | ✅ Audio is never written to disk — direct memory processing |
| Network | None | ✅ Completely offline |
| CPU | Moderate usage during active speech | ✅ Drops to near-zero during silence |
VoxBar Lite is the lightest model in the entire suite. It can run on machines that can't handle any other VoxBar product — older laptops, office PCs without dedicated GPUs, or machines where GPU resources are needed for other tasks.
Architecture Advantage
What makes VoxBar Lite special: It runs everywhere. While VoxBar Pro, AI, and Ultra all require NVIDIA GPUs with specific VRAM levels, Moonshine runs on:
- Any NVIDIA GPU (even ancient ones with <1GB VRAM)
- AMD GPUs
- Intel integrated graphics
- CPU-only (no GPU at all)
This makes it the universal fallback — the one VoxBar model that every customer can use, regardless of their hardware.
Streaming vs Chunking:
Unlike VoxBar AI and Ultra which buffer audio into chunks and batch-process them, Moonshine uses true streaming — audio flows continuously into the model, and text events fire as the model recognises speech. This gives a more responsive feel than chunked models, even though the underlying accuracy may be lower.
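A toy calculation makes the responsiveness difference concrete. The 1024-sample block size at 16kHz comes from the capture step above; the 10-second chunk length is an assumed figure for a chunked pipeline, not a measured VoxBar AI/Ultra value:

```python
SAMPLE_RATE = 16_000
BLOCK_SAMPLES = 1024
CHUNK_SECONDS = 10.0          # assumption for a chunked pipeline

# Streaming can emit its first live update after a single audio block;
# a chunked model must wait for the whole chunk to fill first.
streaming_first_update = BLOCK_SAMPLES / SAMPLE_RATE
chunked_first_update = CHUNK_SECONDS

print(f"streaming: first update after ~{streaming_first_update:.3f}s of audio")
print(f"chunked:   first update after ~{chunked_first_update:.1f}s of audio")
```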
What users DON'T have to worry about:
- ❌ No GPU required — works on pure CPU
- ❌ No Docker — runs natively
- ❌ No internet connection — completely offline
- ❌ No temp files — audio never touches disk
- ❌ No VRAM concerns — uses system RAM instead
- ❌ No cloud processing — your voice stays on your machine
- ❌ No API keys — the model runs locally
- ❌ No usage limits — unlimited transcription, forever
What users DO need to know:
- ⚠️ Lower accuracy than GPU-powered models (~7-8% WER vs 1.69-5.6%)
- ⚠️ Needs tuning — hallucination filtering and silence detection still being refined
- ⚠️ No built-in punctuation — Moonshine outputs raw text without periods or commas
- ⚠️ English-focused — multi-language model available but accuracy varies
- ⚠️ First launch downloads ~200MB model files (cached after that)
Accuracy & Speed
| Metric | Value |
|---|---|
| Delivery | Streaming — live text updates as you speak |
| Latency | ~0.5 seconds (update interval configurable) |
| Word Error Rate | ~7-8% (usable but needs tuning) |
| Inference Speed | 5x faster than Whisper on CPU |
| Punctuation | ❌ Not built-in — requires post-processing |
| Capitalisation | ❌ Not built-in — requires post-processing |
| Languages | English primary, multi-language available |
| Hallucination Risk | ⚠️ Moderate — silence detection needs improvement |
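Since Moonshine outputs raw text, a post-processing pass is needed for readable output. The sketch below is a deliberately naive example (capitalise the first letter, append a full stop); proper punctuation restoration would need a dedicated model:

```python
def tidy(line: str) -> str:
    """Naive cleanup for a committed transcription line."""
    line = line.strip()
    if not line:
        return line
    line = line[0].upper() + line[1:]   # capitalise first word
    if line[-1] not in ".!?":
        line += "."                     # ensure terminal punctuation
    return line

print(tidy("send the report by friday"))  # Send the report by friday.
```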
Accuracy Context
Moonshine's ~7-8% WER means roughly 1 in 13 words may be incorrect. For casual dictation, quick notes, and brainstorming, this is perfectly usable. For professional documents or medical/legal transcription, the GPU-powered models (VoxBar AI or Ultra) are recommended.
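The "1 in 13" figure follows directly from the WER range. Taking 7.5% as the midpoint:

```python
wer = 0.075                    # midpoint of the 7-8% range
words_per_error = 1 / wer      # one error roughly every N words
print(round(words_per_error, 1))   # 13.3

# Expected errors in a 500-word dictation at this rate:
print(round(500 * wer, 1))         # 37.5
```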
Hardware Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | ❌ Not required | Any GPU for acceleration |
| GPU (NVIDIA) | ✅ Supported (optional) | Any NVIDIA GPU |
| GPU (AMD) | ✅ Supported (optional) | Any AMD GPU |
| GPU (Intel) | ✅ Supported (optional) | Intel integrated |
| RAM | 4GB | 8GB+ |
| Disk | ~200MB for model | SSD recommended |
| OS | Windows 10/11 | Windows 10/11 |
| Software | Python 3.10+ | pip install moonshine-voice |
| Docker | ❌ Not required | — |
License & Attribution
| Detail | Value |
|---|---|
| Model | Moonshine v2 |
| Creator | Useful Sensors |
| License | Apache 2.0 (fully commercial) |
| Attribution | Not required (but appreciated) |
| Distribution | Can be bundled and sold commercially |
Where It Fits in the Suite
| Feature | VoxBar Pro | VoxBar AI | VoxBar Ultra | VoxBar Lite |
|---|---|---|---|---|
| Accuracy | ★★★★★ | ★★★★★ | ★★★★★ | ★★★☆☆ |
| GPU Required | Yes (8GB+) | Yes (6GB+) | Yes (2GB+) | No |
| VRAM | ~8-10GB | ~6-8GB | ~2GB | 0GB |
| Docker | Yes | No | No | No |
| CPU-only | ❌ | ❌ | ❌ | ✅ |
| AMD support | Docker only | ❌ | ❌ | ✅ |
| Punctuation | ✅ | ✅ | ✅ | ❌ |
| Model size | ~8GB | ~5GB | ~1.2GB | ~200MB |
| Best for | Premium users | Long sessions | Fast English | Everyone — any hardware |
Bottom line: VoxBar Lite is the universal access point to the VoxBar suite. It's the model that ensures every customer can use VoxBar, regardless of their hardware. It won't match the accuracy of the GPU-powered models, but it brings voice transcription to machines that otherwise couldn't run any AI model at all.