#4 — Budget

VoxBar Lite

Instant transcription on any hardware. No GPU required.

Powered by Moonshine v2 (Useful Sensors)

How It Works

VoxBar Lite uses Moonshine v2 — a lightweight speech recognition model designed to run on anything, including machines with no dedicated GPU at all. Unlike the chunk-and-transcribe approach used by VoxBar AI and Ultra, Moonshine uses a true streaming architecture with an event-driven listener system.

Here's what happens, step by step:

  1. Opens your microphone via sounddevice — captures audio at 16kHz in 1024-sample blocks
  2. Feeds raw audio directly to the Moonshine Transcriber — no buffering, no temp files, no WAV conversion
  3. The Transcriber processes audio in real time using its internal streaming pipeline
  4. Three event types fire as speech is detected:
     • on_line_started — a new utterance has begun (live indicator appears)
     • on_line_text_changed — the model is actively recognising words (live text updates)
     • on_line_completed — the utterance is finished (text is committed permanently)
  5. Live text appears immediately as the model processes — you see words forming in real time
  6. Completed lines are committed and the model begins listening for the next utterance
  7. Repeats forever — the streaming pipeline runs continuously
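The event flow above can be sketched with a minimal, self-contained simulation. The callback names (on_line_started, on_line_text_changed, on_line_completed) come from the description above; the FakeTranscriber class and its internals are illustrative stand-ins, not the actual moonshine-voice API:

```python
# Minimal simulation of the event-driven listener pattern described above.
# A real Transcriber is fed raw audio; here we replay pre-recognised partial
# hypotheses to show when each callback fires.

class FakeTranscriber:
    """Illustrative stand-in for the Moonshine Transcriber (not the real API)."""

    def __init__(self, on_line_started, on_line_text_changed, on_line_completed):
        self.on_line_started = on_line_started
        self.on_line_text_changed = on_line_text_changed
        self.on_line_completed = on_line_completed

    def feed(self, partials):
        """Replay a sequence of partial hypotheses for one utterance."""
        self.on_line_started()              # utterance begins: live indicator
        text = ""
        for text in partials:
            self.on_line_text_changed(text)  # live text updates while speaking
        self.on_line_completed(text)         # utterance ends: text committed

committed = []   # permanently committed lines
live = []        # live-updating partial text

t = FakeTranscriber(
    on_line_started=lambda: live.clear(),
    on_line_text_changed=lambda text: live.append(text),
    on_line_completed=lambda text: committed.append(text),
)

t.feed(["hello", "hello world", "hello world this is VoxBar"])
print(committed)   # ['hello world this is VoxBar']
```

The handlers stay tiny because all buffering lives inside the transcriber, which is what lets the UI simply react to events.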

The key difference from every other VoxBar model: Moonshine never writes temp files to disk. Audio goes directly from your microphone into the model's memory. There is no WAV conversion step, no disk I/O, no cleanup needed.

Recording Limits

VoxBar Lite Has No Recording Limit

VoxBar Lite runs natively with no Docker, no server process, and no network connections. The Moonshine Transcriber processes audio in a continuous stream — there's no chunk accumulation or state that grows over time.

Important Note: Transcriber Lifecycle

When you stop and restart listening, VoxBar Lite creates a fresh Transcriber instance. This is because Moonshine's underlying C library doesn't support restarting a stopped stream. The model files are cached, so re-loading is fast (~1 second). This is invisible to the user — it just works.
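The stop/restart lifecycle can be sketched as a cached model load plus a fresh session object per listen. All names here (load_model, StreamingSession, start_listening) are hypothetical, chosen only to illustrate the pattern:

```python
# Sketch of the lifecycle described above: the underlying stream cannot be
# restarted once stopped, so each listening session gets a fresh instance,
# while the expensive model load is cached and reused across sessions.

_model_cache = {}

def load_model(name):
    """Pretend model loader; the first call is slow, later calls hit the cache."""
    if name not in _model_cache:
        _model_cache[name] = {"name": name, "weights": "..."}  # slow path (~1s)
    return _model_cache[name]

class StreamingSession:
    """One listening session; once stopped it is discarded, never restarted."""

    def __init__(self, model):
        self.model = model
        self.stopped = False

    def stop(self):
        self.stopped = True

def start_listening(model_name="moonshine-v2"):
    return StreamingSession(load_model(model_name))

first = start_listening()
first.stop()
second = start_listening()          # fresh session, cached model

print(second is not first)          # True: new session object
print(second.model is first.model)  # True: model load was cached
```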

Auto-Stop Behaviour

  • Silence timeout: 60 seconds of no detected speech
  • Check interval: Every 5 seconds
  • Designed for active dictation — stops promptly to save CPU resources
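The auto-stop policy above amounts to a periodic silence check. This sketch uses a simulated clock so the logic is testable without real audio; the constants mirror the documented defaults, and the function names are illustrative:

```python
# Every CHECK_INTERVAL seconds, compare the time since speech was last
# detected against SILENCE_TIMEOUT and stop once it is exceeded.

SILENCE_TIMEOUT = 60.0   # seconds of no detected speech before stopping
CHECK_INTERVAL = 5.0     # how often the silence check runs

def should_stop(now, last_speech_time):
    """True once the silence window has been exceeded."""
    return (now - last_speech_time) >= SILENCE_TIMEOUT

def run_checks(last_speech_time, total_seconds):
    """Simulate periodic checks; return the time at which listening stopped."""
    t = 0.0
    while t <= total_seconds:
        if should_stop(t, last_speech_time):
            return t
        t += CHECK_INTERVAL
    return None   # never stopped within the simulated window

# Speech last heard at t=0; the checker fires at the t=60 check.
print(run_checks(last_speech_time=0.0, total_seconds=120.0))  # 60.0
```

Because the check only runs every 5 seconds, the actual stop can land up to one interval after the 60-second mark.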

Memory & Resource Footprint

| Resource | Usage | Behaviour over time |
| --- | --- | --- |
| GPU VRAM | 0GB (CPU-only) or <1GB (GPU-accelerated) | ✅ Minimal — smallest footprint in the entire suite |
| RAM | ~200-400MB | ✅ Stable — streaming architecture has no accumulation |
| Disk | Zero temp files | ✅ Audio is never written to disk — direct memory processing |
| Network | None | ✅ Completely offline |
| CPU | Moderate usage during active speech | ✅ Drops to near-zero during silence |

VoxBar Lite is the lightest model in the entire suite. It can run on machines that can't handle any other VoxBar product — older laptops, office PCs without dedicated GPUs, or machines where GPU resources are needed for other tasks.

Architecture Advantage

What makes VoxBar Lite special: It runs everywhere. While VoxBar Pro, AI, and Ultra all require NVIDIA GPUs with specific VRAM levels, Moonshine runs on:
- Any NVIDIA GPU (even ancient ones with <1GB VRAM)
- AMD GPUs
- Intel integrated graphics
- CPU-only (no GPU at all)

This makes it the universal fallback — the one VoxBar model that every customer can use, regardless of their hardware.

Streaming vs Chunking:
Unlike VoxBar AI and Ultra which buffer audio into chunks and batch-process them, Moonshine uses true streaming — audio flows continuously into the model, and text events fire as the model recognises speech. This gives a more responsive feel than chunked models, even though the underlying accuracy may be lower.
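The responsiveness difference can be illustrated with a toy comparison of the two delivery models. The "recogniser" here just upper-cases tokens, and both functions are invented for illustration; the point is how often results are emitted:

```python
# Toy contrast between chunk-and-transcribe and true streaming delivery.

def chunked(audio_tokens, chunk_size=4):
    """Chunk-and-transcribe: nothing is emitted until a full chunk is buffered."""
    out = []
    for i in range(0, len(audio_tokens), chunk_size):
        chunk = audio_tokens[i:i + chunk_size]
        out.append(" ".join(tok.upper() for tok in chunk))  # one result per chunk
    return out

def streaming(audio_tokens):
    """Streaming: a result is emitted for every audio block as it arrives."""
    return [tok.upper() for tok in audio_tokens]

audio = ["this", "is", "a", "streaming", "test"]
print(len(streaming(audio)))   # 5 updates: one per block
print(len(chunked(audio)))     # 2 updates: one per buffered chunk
```

More frequent, smaller updates are why streaming feels more responsive even when per-word accuracy is lower.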

What users DON'T have to worry about:
- ❌ No GPU required — works on pure CPU
- ❌ No Docker — runs natively
- ❌ No internet connection — completely offline
- ❌ No temp files — audio never touches disk
- ❌ No VRAM concerns — uses system RAM instead
- ❌ No cloud processing — your voice stays on your machine
- ❌ No API keys — the model runs locally
- ❌ No usage limits — unlimited transcription, forever

What users DO need to know:
- ⚠️ Lower accuracy than GPU-powered models (~7-8% WER vs 1.69-5.6%)
- ⚠️ Needs tuning — hallucination filtering and silence detection still being refined
- ⚠️ No built-in punctuation — Moonshine outputs raw text without periods or commas
- ⚠️ English-focused — multi-language model available but accuracy varies
- ⚠️ First launch downloads ~200MB model files (cached after that)
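The punctuation and capitalisation caveats above can be partially worked around with naive post-processing. This sketch (the function name and rules are invented for illustration) capitalises the first word and appends a final period; real punctuation restoration would need a dedicated model:

```python
# Naive post-processing for raw, unpunctuated transcription output.

def naive_postprocess(raw: str) -> str:
    """Capitalise the first character and ensure terminal punctuation."""
    text = raw.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text

print(naive_postprocess("hello world this is a test"))
# Hello world this is a test.
```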

Accuracy & Speed

| Metric | Value |
| --- | --- |
| Delivery | Streaming — live text updates as you speak |
| Latency | ~0.5 seconds (update interval configurable) |
| Word Error Rate | ~7-8% (usable but needs tuning) |
| Inference Speed | 5x faster than Whisper on CPU |
| Punctuation | ❌ Not built-in — requires post-processing |
| Capitalisation | ❌ Not built-in — requires post-processing |
| Languages | English primary, multi-language available |
| Hallucination Risk | ⚠️ Moderate — silence detection needs improvement |

Accuracy Context

Moonshine's ~7-8% WER means roughly 1 in 13 words may be incorrect. For casual dictation, quick notes, and brainstorming, this is perfectly usable. For professional documents or medical/legal transcription, the GPU-powered models (VoxBar AI or Ultra) are recommended.
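The "1 in 13" figure is just the reciprocal of the word error rate, which is easy to check:

```python
# Arithmetic behind the accuracy claim above: a word error rate of p
# means roughly one erroneous word in every 1/p words.

for wer in (0.07, 0.075, 0.08):
    print(f"WER {wer:.1%} -> about 1 in {1 / wer:.0f} words wrong")
# 7.5% WER sits at roughly 1 in 13 words.
```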

Hardware Requirements

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| GPU | ❌ Not required | Any GPU for acceleration |
| GPU (NVIDIA) | ✅ Supported (optional) | Any NVIDIA GPU |
| GPU (AMD) | ✅ Supported (optional) | Any AMD GPU |
| GPU (Intel) | ✅ Supported (optional) | Intel integrated |
| RAM | 4GB | 8GB+ |
| Disk | ~200MB for model | SSD recommended |
| OS | Windows 10/11 | Windows 10/11 |
| Software | Python 3.10+ | pip install moonshine-voice |
| Docker | ❌ Not required | |

License & Attribution

| Detail | Value |
| --- | --- |
| Model | Moonshine v2 |
| Creator | Useful Sensors |
| License | Apache 2.0 (fully commercial) |
| Attribution | Not required (but appreciated) |
| Distribution | Can be bundled and sold commercially |

Where It Fits in the Suite

| Feature | VoxBar Pro | VoxBar AI | VoxBar Ultra | VoxBar Lite |
| --- | --- | --- | --- | --- |
| Accuracy | ★★★★★ | ★★★★★ | ★★★★★ | ★★★☆☆ |
| GPU required | Yes (8GB+) | Yes (6GB+) | Yes (2GB+) | No |
| VRAM | ~8-10GB | ~6-8GB | ~2GB | 0GB |
| Docker | Yes | No | No | No |
| CPU-only | ❌ | ❌ | ❌ | ✅ |
| AMD support | Docker only | ❌ | ❌ | ✅ |
| Punctuation | ✅ | ✅ | ✅ | ❌ |
| Model size | ~8GB | ~5GB | ~1.2GB | ~200MB |
| Best for | Premium users | Long sessions | Fast English | Everyone — any hardware |

Bottom line: VoxBar Lite is the universal access point to the VoxBar suite. It's the model that ensures every customer can use VoxBar, regardless of their hardware. It won't match the accuracy of the GPU-powered models, but it brings voice transcription to machines that otherwise couldn't run any AI model at all.