How It Works
VoxBar Lite uses Moonshine v2 — a lightweight speech recognition model designed to run on anything, including machines with no dedicated GPU at all. Unlike the chunk-and-transcribe approach used by VoxBar AI and Ultra, Moonshine uses a true streaming architecture with an event-driven listener system.
Here's what happens, step by step:
- Opens your microphone via sounddevice — captures audio at 16kHz, 1024-sample blocks
- Feeds raw audio directly to the Moonshine Transcriber — no buffering, no temp files, no WAV conversion
- The Transcriber processes audio in real-time using its internal streaming pipeline
- Three event types fire as speech is detected:
  - on_line_started — a new utterance has begun (live indicator appears)
  - on_line_text_changed — the model is actively recognising words (live text updates)
  - on_line_completed — the utterance is finished (text is committed permanently)
- Live text appears immediately as the model processes — you see words forming in real-time
- Completed lines are committed and the model begins listening for the next utterance
- Repeats forever — the streaming pipeline runs continuously
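The event flow above can be sketched with a small stub. This is an illustrative stand-in, not the real moonshine-voice API: the StubTranscriber class and its feed/end_of_utterance methods are invented for this sketch, and only the three callback names mirror the events described in this document. The real pipeline receives raw 16kHz audio blocks rather than words.

```python
from typing import Callable, List

class StubTranscriber:
    """Minimal stand-in that fires the three lifecycle events per utterance."""

    def __init__(self,
                 on_line_started: Callable[[], None],
                 on_line_text_changed: Callable[[str], None],
                 on_line_completed: Callable[[str], None]):
        self.on_line_started = on_line_started
        self.on_line_text_changed = on_line_text_changed
        self.on_line_completed = on_line_completed
        self._partial: List[str] = []

    def feed(self, recognised_word: str) -> None:
        # The real pipeline would receive raw audio here; this stub
        # pretends each call recognised one word.
        if not self._partial:
            self.on_line_started()                          # utterance begins
        self._partial.append(recognised_word)
        self.on_line_text_changed(" ".join(self._partial))  # live update

    def end_of_utterance(self) -> None:
        self.on_line_completed(" ".join(self._partial))     # commit text
        self._partial = []                                  # listen for next utterance

committed: List[str] = []
t = StubTranscriber(
    on_line_started=lambda: print("[listening]"),
    on_line_text_changed=lambda text: print("live:", text),
    on_line_completed=committed.append,
)
for word in ["hello", "world"]:
    t.feed(word)
t.end_of_utterance()
print("committed:", committed)
```

The point of the shape: live text (on_line_text_changed) is provisional and may be revised, while only on_line_completed text is committed permanently.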
The key difference from every other VoxBar model: Moonshine never writes temp files to disk. Audio goes directly from your microphone into the model's memory. There is no WAV conversion step, no disk I/O, no cleanup needed.
Recording Limits
VoxBar Lite Has No Recording Limit
VoxBar Lite runs natively with no Docker, no server process, and no network connections. The Moonshine Transcriber processes audio in a continuous stream — there's no chunk accumulation or state that grows over time.
Important Note: Transcriber Lifecycle
When you stop and restart listening, VoxBar Lite creates a fresh Transcriber instance, because Moonshine's underlying C library doesn't support restarting a stopped stream. The model files are cached, so reloading is fast (~1 second). This is invisible to the user — it just works.
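The stop/restart pattern can be sketched as a session wrapper that always builds a new instance. The Session class and create-factory names here are illustrative, not the actual VoxBar Lite internals:

```python
class Session:
    """Wraps a transcriber that cannot be restarted once stopped."""

    def __init__(self, factory):
        self._factory = factory
        self._transcriber = None

    def start(self):
        # Always build a fresh instance; model weights are cached on disk,
        # so construction is fast (~1 s) after the first run.
        self._transcriber = self._factory()
        return self._transcriber

    def stop(self):
        self._transcriber = None  # old stream is discarded, never reused

# Demonstrate that every start() yields a new instance.
counter = {"created": 0}
def make():
    counter["created"] += 1
    return object()

s = Session(make)
first = s.start()
s.stop()
second = s.start()
assert first is not second
print("instances created:", counter["created"])
```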
Auto-Stop Behaviour
- Silence timeout: 60 seconds of no detected speech
- Check interval: Every 5 seconds
- Designed for active dictation — stops promptly to save CPU resources
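The auto-stop rule above reduces to a simple check. The constants mirror the values listed; the function name is illustrative:

```python
SILENCE_TIMEOUT = 60.0   # seconds of silence before stopping
CHECK_INTERVAL = 5.0     # how often the check runs

def should_stop(now: float, last_speech_at: float) -> bool:
    """Return True once the silence timeout has elapsed."""
    return (now - last_speech_at) >= SILENCE_TIMEOUT

# Simulate checks at 5-second ticks after speech stopped at t=0.
stops = [t for t in range(0, 70, int(CHECK_INTERVAL)) if should_stop(t, 0.0)]
print("first stop check that fires:", stops[0])  # 60
```

Because the check runs every 5 seconds, the actual stop can land up to one interval after the 60-second mark.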
Memory & Resource Footprint
| Resource | Usage | Behaviour Over Time |
|---|---|---|
| GPU VRAM | 0GB (CPU-only) or <1GB (GPU-accelerated) | ✅ Minimal — smallest footprint in the entire suite |
| RAM | ~200-400MB | ✅ Stable — streaming architecture has no accumulation |
| Disk | Zero temp files | ✅ Audio is never written to disk — direct memory processing |
| Network | None | ✅ Completely offline |
| CPU | Moderate usage during active speech | ✅ Drops to near-zero during silence |
VoxBar Lite is the lightest model in the entire suite. It can run on machines that can't handle any other VoxBar product — older laptops, office PCs without dedicated GPUs, or machines where GPU resources are needed for other tasks.
Architecture Advantage
What makes VoxBar Lite special: It runs everywhere. While VoxBar Pro, AI, and Ultra all require NVIDIA GPUs with specific VRAM levels, Moonshine runs on:
- Any NVIDIA GPU (even ancient ones with <1GB VRAM)
- AMD GPUs
- Intel integrated graphics
- CPU-only (no GPU at all)
This makes it the universal fallback — the one VoxBar model that every customer can use, regardless of their hardware.
Streaming vs Chunking:
Unlike VoxBar AI and Ultra which buffer audio into chunks and batch-process them, Moonshine uses true streaming — audio flows continuously into the model, and text events fire as the model recognises speech. This gives a more responsive feel than chunked models, even though the underlying accuracy may be lower.
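A toy calculation makes the responsiveness difference concrete. The 1024-sample block size at 16kHz comes from the capture step above; the 10-second chunk length is an assumed figure for a chunked pipeline, not a measured VoxBar AI/Ultra value:

```python
SAMPLE_RATE = 16_000
BLOCK_SAMPLES = 1024
CHUNK_SECONDS = 10.0          # assumption for a chunked pipeline

# Streaming can emit its first live update after a single audio block;
# a chunked model must wait for the whole chunk to fill first.
streaming_first_update = BLOCK_SAMPLES / SAMPLE_RATE
chunked_first_update = CHUNK_SECONDS

print(f"streaming: first update after ~{streaming_first_update:.3f}s of audio")
print(f"chunked:   first update after ~{chunked_first_update:.1f}s of audio")
```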
What users DON'T have to worry about:
- ❌ No GPU required — works on pure CPU
- ❌ No Docker — runs natively
- ❌ No internet connection — completely offline
- ❌ No temp files — audio never touches disk
- ❌ No VRAM concerns — uses system RAM instead
- ❌ No cloud processing — your voice stays on your machine
- ❌ No API keys — the model runs locally
- ❌ No usage limits — unlimited transcription, forever
What users DO need to know:
- ⚠️ Lower accuracy than GPU-powered models (~7-8% WER vs 1.69-5.6%)
- ⚠️ Needs tuning — hallucination filtering and silence detection still being refined
- ⚠️ No built-in punctuation — Moonshine outputs raw text without periods or commas
- ⚠️ English-focused — multi-language model available but accuracy varies
- ⚠️ First launch downloads ~200MB model files (cached after that)
Accuracy & Speed
| Metric | Value |
|---|---|
| Delivery | Streaming — live text updates as you speak |
| Latency | ~0.5 seconds (update interval configurable) |
| Word Error Rate | ~7-8% (usable but needs tuning) |
| Inference Speed | 5x faster than Whisper on CPU |
| Punctuation | ❌ Not built-in — requires post-processing |
| Capitalisation | ❌ Not built-in — requires post-processing |
| Languages | English primary, multi-language available |
| Hallucination Risk | ⚠️ Moderate — silence detection needs improvement |
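Since Moonshine outputs raw text, a post-processing pass is needed for readable output. The sketch below is a deliberately naive example (capitalise the first letter, append a full stop); proper punctuation restoration would need a dedicated model:

```python
def tidy(line: str) -> str:
    """Naive cleanup for a committed transcription line."""
    line = line.strip()
    if not line:
        return line
    line = line[0].upper() + line[1:]   # capitalise first word
    if line[-1] not in ".!?":
        line += "."                     # ensure terminal punctuation
    return line

print(tidy("send the report by friday"))  # Send the report by friday.
```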
Accuracy Context
Moonshine's ~7-8% WER means roughly 1 in 13 words may be incorrect. For casual dictation, quick notes, and brainstorming, this is perfectly usable. For professional documents or medical/legal transcription, the GPU-powered models (VoxBar AI or Ultra) are recommended.
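The "1 in 13" figure follows directly from the WER range. Taking 7.5% as the midpoint:

```python
wer = 0.075                    # midpoint of the 7-8% range
words_per_error = 1 / wer      # one error roughly every N words
print(round(words_per_error, 1))   # 13.3

# Expected errors in a 500-word dictation at this rate:
print(round(500 * wer, 1))         # 37.5
```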
Hardware Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | ❌ Not required | Any GPU for acceleration |
| GPU (NVIDIA) | ✅ Supported (optional) | Any NVIDIA GPU |
| GPU (AMD) | ✅ Supported (optional) | Any AMD GPU |
| GPU (Intel) | ✅ Supported (optional) | Intel integrated |
| RAM | 4GB | 8GB+ |
| Disk | ~200MB for model | SSD recommended |
| OS | Windows 10/11 | Windows 10/11 |
| Software | Python 3.10+ | pip install moonshine-voice |
| Docker | ❌ Not required | — |
License & Attribution
| Detail | Value |
|---|---|
| Model | Moonshine v2 |
| Creator | Useful Sensors |
| License | Apache 2.0 (fully commercial) |
| Attribution | Not required (but appreciated) |
| Distribution | Can be bundled and sold commercially |
Where It Fits in the Suite
| Feature | VoxBar Pro | VoxBar AI | VoxBar Ultra | VoxBar Lite |
|---|---|---|---|---|
| Accuracy | ★★★★★ | ★★★★★ | ★★★★★ | ★★★☆☆ |
| GPU Required | Yes (8GB+) | Yes (6GB+) | Yes (2GB+) | No |
| VRAM | ~8-10GB | ~6-8GB | ~2GB | 0GB |
| Docker | Yes | No | No | No |
| CPU-only | ❌ | ❌ | ❌ | ✅ |
| AMD support | Docker only | ❌ | ❌ | ✅ |
| Punctuation | ✅ | ✅ | ✅ | ❌ |
| Model size | ~8GB | ~5GB | ~1.2GB | ~200MB |
| Best for | Premium users | Long sessions | Fast English | Everyone — any hardware |
Bottom line: VoxBar Lite is the universal access point to the VoxBar suite. It's the model that ensures every customer can use VoxBar, regardless of their hardware. It won't match the accuracy of the GPU-powered models, but it brings voice transcription to machines that otherwise couldn't run any AI model at all.