Listen Back: Text-to-Speech and Audio Studio Come to VoxBee

VoxBee already turns your voice into text — push-to-talk dictation, file transcripts, meeting notes. As of v0.8, it goes the other way too. Every note, transcript, summary, and history row in VoxBee can now be read back aloud, and a new Audio Studio page lets you paste any text and export it as MP3 or WAV.

A Floating Dock, Everywhere You Read

Anywhere there's text in VoxBee, there's a small play button next to it. Press it and a floating playback dock appears at the bottom-right of the window with play/pause, a voice picker, and a speed control. Move between notes, transcripts, and meeting summaries without losing your place: the dock keeps playing until you stop it.

Six Engines, One Interface

VoxBee ships with six TTS engines under a single playback interface:

System — Apple's built-in AVSpeechSynthesizer. Free, on-device, no internet, no API key. Great for quick re-reads.
OpenAI — natural, multilingual, well-rounded. If you already use OpenAI for STT or summaries, the same key works here.
ElevenLabs — the gold standard for expressive, emotive voices. Per-word timestamps so highlighting follows along precisely.
Deepgram Aura — fast, low-latency, great for long passages.
Groq PlayAI — high speed at low cost, ideal for batch playback.
Cartesia — Sonic-class voices with native word-level timestamps.

Cloud engines are bring-your-own-key. Enter your key once in Settings → Speech and it powers TTS automatically. VoxBee keeps a single Keychain entry per provider and shares it across STT, summarization, and TTS, so an OpenAI key entered for Whisper is the same key used for GPT summaries and TTS.
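To make the shared-key idea concrete, here is a minimal Python sketch. It is not VoxBee's code: the `KeyStore` class, the `FEATURE_PROVIDER` table, and the feature names are all hypothetical, and a plain dictionary stands in for the Keychain. It only illustrates how one stored key per provider can serve several features.

```python
class KeyStore:
    """Hypothetical stand-in for the Keychain: one stored key per provider."""

    def __init__(self):
        self._keys = {}

    def set_key(self, provider, key):
        # Entering a key again for the same provider overwrites the old one.
        self._keys[provider] = key

    def key_for(self, provider):
        # Returns None when no key has been entered for this provider.
        return self._keys.get(provider)


# Hypothetical routing: which provider serves each feature.
FEATURE_PROVIDER = {"stt": "openai", "summary": "openai", "tts": "openai"}


def key_for_feature(store, feature):
    """Look up the key by provider, so every feature shares the same entry."""
    return store.key_for(FEATURE_PROVIDER[feature])
```

Because the lookup goes through the provider rather than the feature, a key entered once is visible to STT, summaries, and TTS alike, and replacing it updates all three.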

Audio Studio: Paste, Preview, Export

The new Audio Studio page in VoxBee is purpose-built for exporting audio. Paste any text, pick an engine and voice, preview a short sample, and export the full piece to MP3 or WAV. The estimated cost is shown before you commit, and any export estimated to cost more than $1 asks for confirmation before it consumes API credits.
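The confirmation rule can be sketched in a few lines of Python. This is an illustration, not VoxBee's implementation: it assumes per-character billing, and the function names and the rate passed in are hypothetical. Only the $1 threshold comes from the description above.

```python
from decimal import Decimal

# From the post: exports estimated above $1 ask for confirmation first.
CONFIRM_THRESHOLD_USD = Decimal("1.00")


def estimate_cost_usd(text, usd_per_million_chars):
    """Estimate export cost, assuming the engine bills per character."""
    cost = Decimal(len(text)) * usd_per_million_chars / Decimal(1_000_000)
    # Round to a hundredth of a cent for display.
    return cost.quantize(Decimal("0.0001"))


def needs_confirmation(estimated_usd):
    """True only when the estimate is strictly above the threshold."""
    return estimated_usd > CONFIRM_THRESHOLD_USD
```

Using `Decimal` rather than floats keeps the comparison at exactly $1.00 predictable, which matters when the threshold decides whether a prompt appears.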

Cached, Not Re-billed

Replaying a piece you've already heard never hits the network. VoxBee caches generated audio on disk by content + voice + speed + format, so a re-listen is instant and free. The cache has a 500 MB LRU cap by default, configurable from Settings → Voice.
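A rough Python sketch of this kind of cache follows. It is not VoxBee's code: `cache_key` and `AudioCache` are hypothetical names, the cache lives in memory rather than on disk, and SHA-256 is just one reasonable way to combine the four fields. The 500 MB default and the content + voice + speed + format key come from the description above.

```python
import hashlib
from collections import OrderedDict


def cache_key(text, voice, speed, fmt):
    """Combine content, voice, speed, and format into one stable key."""
    # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x1f".join([text, voice, f"{speed:.2f}", fmt])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class AudioCache:
    """In-memory stand-in for an on-disk cache with a byte-size LRU cap."""

    def __init__(self, max_bytes=500 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # oldest entry first

    def get(self, key):
        audio = self._entries.get(key)
        if audio is not None:
            # A hit makes this entry the most recently used.
            self._entries.move_to_end(key)
        return audio

    def put(self, key, audio):
        if key in self._entries:
            self.total_bytes -= len(self._entries.pop(key))
        self._entries[key] = audio
        self.total_bytes += len(audio)
        # Evict least recently used entries until back under the cap.
        while self.total_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.total_bytes -= len(evicted)
```

A playback path would call `get(cache_key(...))` first and only contact the engine on a miss. Changing any one of the four fields, such as the speed, produces a different key and therefore a fresh synthesis.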

What's Not Here Yet

v0.8 is the first TTS release. A few things on the roadmap for v0.9+:

A global hotkey to read whatever text you have selected, from any app.
Voice cloning — capture a sample of your own voice and use it in Audio Studio.
A pronunciation dictionary, SSML support, and integration with media keys and the Now Playing widget.

Track progress on GitHub #297.

Download VoxBee and try it free for 14 days. The system TTS engine works without any API key — no signup, no setup.