Your Voice Never Leaves Your Machine

The Problem with Cloud Transcription

Popular transcription services such as Otter.ai, Rev, Google Docs voice typing, and Microsoft Dictate all work the same way: your microphone captures audio, the audio is sent to a server, the server transcribes it, and the text comes back. Your voice leaves your machine the moment you press "record."

For casual note-taking, that might be acceptable. But for anything sensitive — medical dictation, legal notes, board meetings, private conversations, financial discussions — sending raw audio to a third-party server is a genuine risk. You don't control the server. You don't know who has access. You can't verify what happens to the recording.

HIPAA, GDPR, SOC 2, and attorney-client privilege all exist to control who can see sensitive information and where it goes. Cloud transcription, by design, hands that information to a third party.

How VoxBar™ Is Different

VoxBar™ runs the entire transcription pipeline on your computer. The AI model loads into your GPU (or CPU). Audio goes from your microphone directly into the model, and text comes out. No audio or text is uploaded, sent off your machine, or logged anywhere.

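There is no "opt out" of cloud processing, because there is no cloud processing. Transcription opens no outbound connections and never phones home. The only network call the application makes is the one-time license check described below, and the Pro engine's WebSocket connects only to a container on your own machine.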

What Happens to Your Audio — Step by Step

Here's exactly what happens when you press record in each VoxBar™ engine:

🏆 VoxBar™ Pro (Voxtral 4B)

Mic captures audio → audio streams over a local WebSocket to a Docker container on your machine → Voxtral processes it in GPU memory → text tokens stream back → text appears in the textbox. Audio exists only in memory. No files written. No network egress.
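Because the Pro engine's privacy depends on that WebSocket staying on the local machine, a client can simply refuse to connect anywhere else. Below is a minimal sketch of such a guard, assuming a `ws://` or `wss://` endpoint. The function name and example URLs are hypothetical and are not VoxBar™'s actual code or configuration.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True only if a WebSocket URL points at this machine."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        return False
    host = parsed.hostname  # lowercased; IPv6 brackets already stripped
    if host is None:
        return False
    if host == "localhost":
        # Assumes the default hosts-file mapping of localhost to loopback.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname could resolve to a remote server, so reject it
        # outright instead of looking it up.
        return False


print(is_loopback_endpoint("ws://127.0.0.1:8000/asr"))  # True
print(is_loopback_endpoint("wss://example.com/asr"))    # False
```

On the Docker side, the complementary step is to publish the container's port on the loopback interface only, for example `-p 127.0.0.1:8000:8000` with a hypothetical port 8000, so that other machines on the network cannot reach it.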

⭐ VoxBar™ AI (Canary Qwen 2.5B)

Mic captures 5 seconds of audio → written to a temporary WAV file → Canary transcribes it on GPU → temp file is immediately deleted → text appended to textbox → repeat. Each chunk is independent. No accumulation on disk.
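The write, transcribe, delete cycle can be sketched with the standard library alone. This minimal version assumes 16 kHz, mono, 16-bit PCM chunks. `transcribe_file` is a hypothetical stand-in for the Canary call, which is not shown. The important part is the `try`/`finally`: the temporary file is removed even if transcription raises.

```python
import os
import tempfile
import wave
from typing import Callable

# Assumptions for illustration: 16 kHz, mono, 16-bit PCM, 5-second chunks.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 5


def transcribe_chunk(pcm: bytes, transcribe_file: Callable[[str], str]) -> str:
    """Write one chunk to a temporary WAV file, transcribe it, then delete it.

    `transcribe_file` is a hypothetical stand-in for the model call.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        # Closing the wave writer finalizes the WAV header before the model reads it.
        with os.fdopen(fd, "wb") as raw, wave.open(raw, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(pcm)
        return transcribe_file(path)
    finally:
        # Runs on success and on failure, so no chunk is left behind on disk.
        os.remove(path)
```

Because `mkstemp` creates a separate file for each chunk, chunks never share a file. The file exists only for the duration of a single `transcribe_chunk` call.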

⚡ VoxBar™ Kyutai (STT 1B)

Mic captures audio → the Mimi codec encodes it into audio tokens in GPU memory → the model decodes text frame-by-frame → text appears in real time. Audio exists only as a numpy array in RAM. No files written. No network.
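Frame-by-frame decoding means the model consumes audio in fixed-size slices instead of whole recordings. Below is a minimal sketch of that buffering. It assumes Mimi's published 24 kHz input rate and 12.5 Hz frame rate, which works out to 1,920 samples per frame; check these figures against the model card. The `FrameBuffer` class is hypothetical, and the model call that would consume each frame is omitted. To stay self-contained, the sketch uses the standard-library `array` type, but a numpy array would serve the same purpose.

```python
from array import array
from typing import Iterable, List

SAMPLE_RATE = 24_000   # assumption: Mimi's input sample rate
FRAME_RATE = 12.5      # assumption: Mimi frames per second
FRAME_SAMPLES = int(SAMPLE_RATE / FRAME_RATE)  # 1920 samples = 80 ms


class FrameBuffer:
    """Holds mic samples in RAM and releases them one codec frame at a time."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        self.frame_samples = frame_samples
        self._pending = array("f")  # float32 samples not yet handed to the model

    def push(self, samples: Iterable[float]) -> List[array]:
        """Add one mic callback's samples and return every complete frame."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_samples:
            frames.append(self._pending[: self.frame_samples])
            # Consumed audio is dropped from memory immediately.
            del self._pending[: self.frame_samples]
        return frames
```

Each mic callback's samples go into `push`. The returned frames go to the model, and any leftover tail shorter than one frame waits in RAM for the next callback. Nothing is ever written to disk.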

🚀 VoxBar™ Nemotron (0.6B)

Mic captures audio → audio streamed to FastConformer-RNNT on GPU → text tokens decoded frame-by-frame → text appears in textbox. Pure in-memory streaming. No temp files.
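In pure in-memory streaming, audio passes from the microphone thread to the decoder through RAM only. The hand-off can be sketched as a generic producer/consumer pattern, with a bounded `queue.Queue` and a worker thread. `decode_step` and `on_text` are hypothetical stand-ins for the Nemotron model step and the textbox update. They are not NeMo APIs.

```python
import queue
import threading
from typing import Callable, Optional, Tuple


def start_stream(
    decode_step: Callable[[bytes], str],
    on_text: Callable[[str], None],
    max_chunks: int = 50,
) -> Tuple[queue.Queue, threading.Thread]:
    """Start a decoder thread that consumes audio chunks from an in-memory queue.

    The mic callback puts raw chunks on the returned queue; putting None stops the worker.
    """
    # Bounded, so a stalled decoder cannot make audio pile up in RAM indefinitely.
    audio_q: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_chunks)

    def worker() -> None:
        while True:
            chunk = audio_q.get()
            if chunk is None:          # sentinel: recording stopped
                break
            text = decode_step(chunk)  # hypothetical streaming model step
            if text:
                on_text(text)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return audio_q, thread
```

A real audio callback should not block. Using `put_nowait` and handling `queue.Full` would keep the audio thread responsive if the decoder falls behind.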

💪 VoxBar™ Whisper+ (distil-large-v3)

Mic captures audio → numpy array passed directly to CTranslate2 Whisper model → transcribed in GPU/CPU memory → anti-hallucination filter cleans output → text appears. Zero disk I/O. No temp files at all.
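The page does not describe how the anti-hallucination filter works. As a hypothetical illustration only, a post-filter of this kind might drop segments that look like silence, repetition loops, or stock phrases that Whisper is known to produce on non-speech audio. The thresholds below are Whisper's own default values: 0.6 no-speech probability, -1.0 average log-probability, and 2.4 compression ratio. The `Segment` class, the phrase list, and the function name are assumptions and do not describe VoxBar™'s filter.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical stand-in using the same field names as faster-whisper's Segment."""
    text: str
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float


# Illustrative phrases Whisper tends to emit on silence or music.
PHANTOM_PHRASES = {"thank you for watching.", "thanks for watching!", "please subscribe."}


def filter_segments(segments: Iterable[Segment]) -> List[str]:
    kept: List[str] = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue
        # Probably silence: likely no speech AND low decoder confidence.
        if seg.no_speech_prob > 0.6 and seg.avg_logprob <= -1.0:
            continue
        # Highly compressible text is usually a repetition loop.
        if seg.compression_ratio > 2.4:
            continue
        if text.lower() in PHANTOM_PHRASES:
            continue
        # Drop exact back-to-back repeats.
        if kept and kept[-1] == text:
            continue
        kept.append(text)
    return kept
```

The engine may call CTranslate2 through the faster-whisper package, but that is an assumption. If it does, `WhisperModel.transcribe(audio)` accepts a 16 kHz float32 numpy array and returns a `(segments, info)` pair. Each segment exposes these same four fields, so the segments generator could be passed straight to `filter_segments`.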

System Audio — Same Privacy

When you switch to system audio mode (transcribing your PC's output — meetings, videos, podcasts), the same privacy guarantees apply. System audio is captured via Windows loopback, passed directly to the model in memory, and transcribed locally. The audio never touches the network.
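Loopback capture hands over whatever the speakers are playing. That audio is frequently stereo, while speech models usually expect mono input. Below is a minimal in-memory downmix sketch. It assumes the capture layer delivers interleaved 16-bit little-endian stereo, but the real format depends on the device. The function name is hypothetical.

```python
import sys
from array import array
from typing import List


def stereo_int16_to_mono(pcm: bytes) -> List[float]:
    """Downmix interleaved 16-bit stereo PCM to mono floats in [-1.0, 1.0)."""
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # capture data is assumed to be little-endian
    left, right = samples[0::2], samples[1::2]
    # Average the two channels and scale to float; the buffer never touches disk.
    return [(lft + rgt) / 65536.0 for lft, rgt in zip(left, right)]
```

The model would still need the audio resampled to its own rate, for example from 48 kHz down to 16 kHz. That step is omitted here, but like the downmix, it can run entirely on in-memory buffers.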

What About Licensing?

The only network call VoxBar™ makes is during license activation — a one-time check with Lemon Squeezy to validate your purchase. This sends your license key (not audio, not text, not usage data) and receives a yes/no response. After activation, VoxBar™ works fully offline.
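The activation check can be small enough to audit by eye. The sketch below assumes the validate endpoint of Lemon Squeezy's License API, which takes a form-encoded `license_key`. Whether VoxBar™ calls this endpoint or the related activate endpoint is an assumption, and the function names are hypothetical. Confirm the URL and response fields against Lemon Squeezy's current documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Lemon Squeezy License API "validate".
VALIDATE_URL = "https://api.lemonsqueezy.com/v1/licenses/validate"


def build_validation_request(license_key: str) -> urllib.request.Request:
    """Build the single activation-time request; the body carries only the key."""
    body = urllib.parse.urlencode({"license_key": license_key}).encode("ascii")
    return urllib.request.Request(
        VALIDATE_URL,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def is_valid(response_body: bytes) -> bool:
    """Reduce the JSON reply to the yes/no answer the app needs."""
    return json.loads(response_body).get("valid") is True
```

A real client would send the request with `urllib.request.urlopen(req, timeout=10)` and pass the response body to `is_valid`. A rejected key may come back with an HTTP error status, which `urlopen` raises as `urllib.error.HTTPError`. The body of that error can be read in the same way.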

There is no usage tracking, no analytics, no crash reporting, no telemetry of any kind. We don't know how often you use VoxBar™. We don't know what you transcribe. We don't want to know.

Why We Built It This Way

VoxBar™ was built by a developer who needed to transcribe sensitive client conversations and didn't trust cloud services with the audio. The product exists because of the privacy requirement — not in spite of it.

Privacy isn't a feature we added. It's the reason VoxBar™ exists. Every architectural decision — local models, no Docker networking egress, temp file deletion, in-memory processing — flows from that one principle: your voice should never leave your machine.

The VoxBar™ Privacy Promise

✅ All transcription happens locally on your hardware
✅ No audio is ever uploaded, transmitted, or logged
✅ No telemetry, analytics, or usage tracking
✅ No internet required after initial license activation
✅ Temp files (where used) are deleted immediately after processing
✅ Open-source AI models — you can inspect what runs on your machine