TTS Options Research for Daily Digest Podcast
Executive Summary
After evaluating seven TTS options, Piper TTS emerges as the best choice for a daily digest workflow, offering very good quality at zero cost with full local control.
Option Comparison
1. Piper TTS ⭐ RECOMMENDED
- Cost: FREE (open source)
- Quality: ⭐⭐⭐⭐ Very good (neural voices, natural sounding)
- Setup: Easy-Medium (binary download + voice model)
- Platform: macOS, Linux, Windows
- Automation: CLI tool, easily scripted
- Pros:
- Completely free, no API limits
- Runs locally (privacy, no internet needed)
- Fast inference on CPU
- Multiple high-quality voices available
- Active development (GitHub: rhasspy/piper)
- Cons:
- Requires downloading voice models (~50-100MB each)
- Not quite as expressive as premium APIs
- Integration (Piper writes WAV audio; convert to MP3 separately if needed):
echo "Your digest content" | piper --model en_US-lessac-medium.onnx --output_file digest.wav
2. macOS say Command
- Cost: FREE (built-in)
- Quality: ⭐⭐ Basic (functional but robotic)
- Setup: None (pre-installed)
- Platform: macOS only
- Automation: CLI, easily scripted
- Pros:
- Zero setup required
- Native macOS integration
- Multiple built-in voices
- Cons:
- Quality is noticeably robotic
- Limited voice options
- No neural/AI voices
- Integration:
say -v Samantha -o digest.aiff "Your digest content"
3. ElevenLabs Free Tier
- Cost: FREE tier: 10,000 characters/month (~10 min audio)
- Quality: ⭐⭐⭐⭐⭐ Excellent (best-in-class natural voices)
- Setup: Easy (API key signup)
- Platform: API-based (any platform)
- Automation: REST API or Python SDK
- Pros:
- Exceptional voice quality
- Voice cloning available (paid)
- Multiple languages
- Cons:
- 10K-character limit is very restrictive: at the ~1,000 characters per minute implied above, a 5-minute daily digest would exhaust it in about two days
- Paid tier starts at $5/month for 30K chars
- Requires internet, API dependency
- Integration: Python SDK or curl to API
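To check how far a monthly character quota stretches, the division can be scripted. This is a minimal sketch: `days_covered` is a hypothetical helper, and the 5,000-character digest size is an assumption for a roughly 5-minute digest, based on the ~1,000 characters per minute implied by the free-tier figure.

```shell
# days_covered QUOTA_CHARS CHARS_PER_DIGEST
# Hypothetical helper: whole number of daily digests a monthly quota covers.
days_covered() {
  local quota="$1"
  local per_digest="$2"
  if [ "$per_digest" -le 0 ]; then
    echo "chars per digest must be positive" >&2
    return 1
  fi
  echo $(( quota / per_digest ))
}

# Assumed digest size: 5,000 characters (about 5 minutes of audio).
days_covered 10000 5000   # free tier: prints 2
days_covered 30000 5000   # $5/month tier: prints 6
```

Running `wc -m < digest.md` on a real digest gives a measured character count to use in place of the assumed one.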
4. OpenAI TTS API
- Cost: $0.015 per 1,000 characters (~$0.018/minute)
- Quality: ⭐⭐⭐⭐⭐ Excellent (natural, expressive)
- Setup: Easy (API key)
- Platform: API-based
- Automation: REST API
- Pros:
- High quality voices (alloy, echo, fable, etc.)
- Fast, reliable API
- Good for moderate usage
- Cons:
- Not free: costs add up (~$2-4/month for a daily 5-minute digest)
- Requires internet connection
- Rate limits apply
- Cost Estimate: Daily 5-min digest ≈ $2-4/month
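The estimate can be reproduced with a little arithmetic. The sketch below assumes about 1,200 characters per minute of speech (the rate implied by $0.015 per 1,000 characters and ~$0.018/minute) and a 30-day month. `monthly_tts_cost` is a hypothetical helper name, not part of any API.

```shell
# monthly_tts_cost MINUTES_PER_DAY [CHARS_PER_MINUTE] [USD_PER_1K_CHARS] [DAYS]
# Hypothetical helper: prints the estimated monthly cost in USD, two decimals.
# The defaults are assumptions derived from the figures quoted above.
monthly_tts_cost() {
  local minutes="$1"
  local chars_per_min="${2:-1200}"
  local usd_per_1k="${3:-0.015}"
  local days="${4:-30}"
  awk -v m="$minutes" -v c="$chars_per_min" -v p="$usd_per_1k" -v d="$days" \
    'BEGIN { printf "%.2f\n", m * c / 1000 * p * d }'
}

monthly_tts_cost 5   # 5-minute daily digest: prints 2.70
```

A longer digest or a faster speaking rate scales the result linearly, which is why the estimate is given as a range rather than a single number.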
5. Coqui TTS
- Cost: FREE (open source)
- Quality: ⭐⭐⭐⭐ Good (varies by model)
- Setup: Hard (Python environment, dependencies)
- Platform: macOS, Linux, Windows
- Automation: Python scripts
- Pros:
- Free and open source
- Multiple voice models available
- Voice cloning capability
- Cons:
- Complex setup (conda/pip, GPU recommended)
- Heavier resource usage than Piper
- Project maintenance has slowed (Coqui, the company behind it, shut down)
- Integration: Python script with TTS library
6. Google Cloud TTS
- Cost: FREE tier: 1M characters/month (WaveNet) or 4M characters/month (Standard); then $16 per 1M (WaveNet) or $4 per 1M (Standard)
- Quality: ⭐⭐⭐⭐ Very good (WaveNet voices)
- Setup: Medium (GCP account, API setup)
- Platform: API-based
- Automation: REST API or SDK
- Pros:
- Generous free tier
- Multiple voice options
- Reliable infrastructure
- Cons:
- Requires GCP account
- API complexity
- Privacy concerns (sends text to cloud)
- Integration: REST API calls (the gcloud CLI can supply the access token) or client SDKs
7. Amazon Polly
- Cost: FREE tier: 5M characters/month for 12 months (standard voices; 1M/month for neural), then ~$4 per 1M (standard) or ~$16 per 1M (neural)
- Quality: ⭐⭐⭐⭐ Good (Neural voices available)
- Setup: Medium (AWS account)
- Platform: API-based
- Automation: AWS CLI or SDK
- Pros:
- Generous free tier initially
- Neural voices sound natural
- Cons:
- Requires AWS account
- Complexity of AWS ecosystem
- Integration: AWS CLI or boto3
Recommendation
Primary Choice: Piper TTS
- Best balance of quality, cost (free), and ease of automation
- Local processing means no privacy concerns
- No rate limits or API keys to manage
- Perfect for daily scheduled digest generation
Alternative if quality is paramount: OpenAI TTS
- Use if the ~$2-4/month cost is acceptable
- Slightly better voice quality
- Simpler than maintaining local models
Avoid for this use case:
- ElevenLabs free tier (too limiting for daily use)
- macOS say (quality too low for podcast format)
- Coqui (setup complexity not worth it vs Piper)
Suggested Integration Workflow
#!/bin/bash
# Daily Digest TTS Script
set -euo pipefail
# 1. Fetch or read markdown content
CONTENT=$(cat digest.md)
# 2. Convert markdown to plain text (strip formatting)
PLAIN_TEXT=$(printf '%s\n' "$CONTENT" | pandoc -f markdown -t plain)
# 3. Generate audio with Piper (Piper writes WAV, not MP3)
OUT="digest_$(date +%Y-%m-%d)"
piper \
  --model ~/.local/share/piper/en_US-lessac-medium.onnx \
  --output_file "$OUT.wav" \
  <<< "$PLAIN_TEXT"
# 4. Convert to MP3 for podcast distribution (assumes ffmpeg is installed)
ffmpeg -y -i "$OUT.wav" -codec:a libmp3lame -qscale:a 4 "$OUT.mp3"
# 5. Optional: Upload to podcast host or serve locally
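Because the script above is meant to run unattended, it can help to fail early when a required tool is missing rather than partway through a run. The following is a minimal sketch of such a check; `require_cmds` is a hypothetical helper name, and the check relies only on the standard `command -v` builtin.

```shell
# require_cmds CMD...
# Hypothetical helper: reports every command missing from PATH on stderr
# and returns non-zero if at least one is missing.
require_cmds() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use near the top of the digest script:
#   require_cmds pandoc piper || exit 1
```

For daily scheduling, a cron entry such as `0 6 * * * /path/to/daily_digest.sh` (the path is a placeholder) would run the script every morning at 06:00.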
Voice Model Recommendations for Piper
| Voice | Style | Best For |
|---|---|---|
| lessac | Neutral, clear | News/digest content |
| libritts | Natural, varied | Long-form content |
| ljspeech | Classic TTS | Short announcements |
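Switching between the voices in the table mostly means pointing `--model` at a different file. The sketch below builds a model path following the naming pattern of `en_US-lessac-medium.onnx` used earlier. `piper_model_path` and the `PIPER_VOICE_DIR` variable are hypothetical names introduced for illustration. Not every voice is necessarily published at every quality level, so the resulting path should be checked against the files actually downloaded.

```shell
# piper_model_path VOICE [QUALITY] [LANG_CODE]
# Hypothetical helper: prints the expected path of a Piper voice model.
# PIPER_VOICE_DIR is an assumed override; the default directory matches
# the one used in the workflow script.
piper_model_path() {
  local voice="$1"
  local quality="${2:-medium}"
  local lang_code="${3:-en_US}"
  local dir="${PIPER_VOICE_DIR:-$HOME/.local/share/piper}"
  echo "$dir/${lang_code}-${voice}-${quality}.onnx"
}

piper_model_path lessac
```

The printed path can be passed directly as the `--model` argument, which keeps the voice choice in one place when experimenting.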