Text to Speech
The Text to Speech skill transforms written content into high-quality audio using either the ElevenLabs or the OpenAI TTS engine. Choose from dozens of realistic voices across multiple languages and accents, or clone a custom voice from a short audio sample with ElevenLabs.

The skill supports streaming output for real-time playback, batch processing for long documents, and SSML tags for fine-grained control over pauses, emphasis, and pronunciation. Audio can be exported as MP3, WAV, or OGG. You can adjust speaking rate, pitch, and volume through simple natural language commands.

Practical use cases include accessibility tools that read website content aloud, automated audiobook production from manuscripts, voiceover generation for video scripts, and dynamic IVR system prompts. The skill integrates directly with the Podcast Generator and Summarize skills, letting you summarize an article and immediately convert the result to audio. When using ElevenLabs, multilingual voices allow seamless switching between languages mid-sentence, ideal for localized content production.
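As an illustration of the SSML control mentioned above, here is a minimal Python sketch that wraps plain text in standard W3C SSML elements (`<speak>`, `<break>`, `<emphasis>`, `<prosody>`). The helper name `build_ssml` and its parameters are hypothetical and not part of the skill. Which tags each engine actually honors is not stated here, so treat the output as a starting point to check against the engine you use.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, pause_ms=0, emphasize=None, rate=None):
    """Wrap plain text in a minimal SSML document (hypothetical helper).

    text:      the sentence to speak
    pause_ms:  optional pause appended after the text, in milliseconds
    emphasize: optional substring to wrap in <emphasis level="strong">
    rate:      optional speaking rate for <prosody>, e.g. "slow" or "90%"
    """
    # Escape &, <, > so the text cannot break the XML structure.
    body = escape(text)
    if emphasize:
        target = escape(emphasize)
        # Only the first occurrence of the substring is emphasized.
        body = body.replace(
            target, f'<emphasis level="strong">{target}</emphasis>', 1
        )
    if rate:
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    if pause_ms > 0:
        body += f'<break time="{int(pause_ms)}ms"/>'
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(
        build_ssml(
            "Welcome back. Your order has shipped.",
            pause_ms=500,
            emphasize="shipped",
            rate="slow",
        )
    )
```

The resulting string could then be passed to the skill as the text to speak, assuming the selected engine accepts SSML input.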
Installation
clawhub install tts