# TTS Options Research for Daily Digest Podcast

## Executive Summary

After evaluating seven TTS solutions, **Piper TTS** emerges as the best choice for a daily digest workflow, offering very good neural voice quality at zero cost with full local control.

---

## Option Comparison

### 1. **Piper TTS** ⭐ RECOMMENDED
- **Cost**: FREE (open source)
- **Quality**: ⭐⭐⭐⭐ Very good (neural voices, natural sounding)
- **Setup**: Easy-Medium (binary download + voice model)
- **Platform**: macOS, Linux, Windows
- **Automation**: CLI tool, easily scripted
- **Pros**:
  - Completely free, no API limits
  - Runs locally (privacy, no internet needed)
  - Fast inference on CPU
  - Multiple high-quality voices available
  - Active development (GitHub: rhasspy/piper)
- **Cons**:
  - Requires downloading voice models (~50-100 MB each)
  - Not quite as expressive as premium APIs
  - Writes WAV audio; producing MP3 needs a separate encoding step
- **Integration**:
  ```bash
  echo "Your digest content" | piper --model en_US-lessac-medium.onnx --output_file digest.wav
  ```

### 2. **macOS `say` Command**
- **Cost**: FREE (built-in)
- **Quality**: ⭐⭐ Basic (functional but robotic)
- **Setup**: None (pre-installed)
- **Platform**: macOS only
- **Automation**: CLI, easily scripted
- **Pros**:
  - Zero setup required
  - Native macOS integration
  - Multiple built-in voices
- **Cons**:
  - Quality is noticeably robotic
  - Limited voice options
  - No neural/AI voices
- **Integration**:
  ```bash
  say -v Samantha -o digest.aiff "Your digest content"
  ```

### 3. **ElevenLabs Free Tier**
- **Cost**: FREE tier: 10,000 characters/month (~10 min audio)
- **Quality**: ⭐⭐⭐⭐⭐ Excellent (best-in-class natural voices)
- **Setup**: Easy (API key signup)
- **Platform**: API-based (any platform)
- **Automation**: REST API or Python SDK
- **Pros**:
  - Exceptional voice quality
  - Voice cloning available (paid)
  - Multiple languages
- **Cons**:
  - 10K character limit is very restrictive for a daily digest
  - Paid tier starts at $5/month for 30K characters
  - Requires internet, API dependency
  - Daily content would exhaust the free quota within the first few days of each month
- **Integration**: Python SDK or curl to API

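To make the free-tier constraint concrete, here is a minimal sketch that estimates how many daily digests fit into the monthly quota. It assumes roughly 1,000 characters per spoken minute (the ratio implied by "10,000 characters ≈ 10 min audio" above) and a 5-minute digest. `days_covered` is a hypothetical helper name, not part of any ElevenLabs tooling.

```bash
#!/bin/bash
# Rough free-tier budget check. All figures are assumptions taken from
# the estimates in this document, not values queried from an API.

# days_covered QUOTA_CHARS MINUTES_PER_DAY CHARS_PER_MINUTE
# Prints how many whole daily digests fit into the monthly quota.
days_covered() {
  local quota=$1 minutes=$2 chars_per_min=$3
  echo $(( quota / (minutes * chars_per_min) ))
}

# 10,000 chars/month, 5-minute digest, ~1,000 chars per spoken minute
days_covered 10000 5 1000
```

Under these assumptions the script prints `2`: the free quota covers about two digests per month.
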
### 4. **OpenAI TTS API**
- **Cost**: $0.015 per 1,000 characters (~$0.018/minute)
- **Quality**: ⭐⭐⭐⭐⭐ Excellent (natural, expressive)
- **Setup**: Easy (API key)
- **Platform**: API-based
- **Automation**: REST API
- **Pros**:
  - High quality voices (alloy, echo, fable, etc.)
  - Fast, reliable API
  - Good for moderate usage
- **Cons**:
  - Not free; costs add up (~$2-4/month for a daily digest)
  - Requires internet connection
  - Rate limits apply
- **Cost Estimate**: Daily 5-min digest ≈ $2-4/month

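The monthly estimate can be reproduced with a short calculation. This sketch assumes about 1,200 characters per spoken minute (the ratio implied by $0.018/minute at $0.015 per 1,000 characters) and a 30-day month. `monthly_cost` is a hypothetical helper, and the price is the figure quoted above rather than one fetched from OpenAI.

```bash
#!/bin/bash
# Back-of-the-envelope monthly cost for a per-character priced TTS API.

# monthly_cost MINUTES_PER_DAY CHARS_PER_MINUTE USD_PER_1K_CHARS [DAYS]
# Prints the estimated monthly cost in USD with two decimals.
monthly_cost() {
  awk -v m="$1" -v c="$2" -v p="$3" -v d="${4:-30}" \
    'BEGIN { printf "%.2f\n", m * c * d / 1000 * p }'
}

# 5-minute daily digest, ~1,200 chars/minute, $0.015 per 1,000 chars
monthly_cost 5 1200 0.015
```

With these inputs the result is `2.70`, which falls inside the $2-4/month range. Longer digests or a faster speaking rate push it toward the upper end.
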
### 5. **Coqui TTS**
- **Cost**: FREE (open source)
- **Quality**: ⭐⭐⭐⭐ Good (varies by model)
- **Setup**: Hard (Python environment, dependencies)
- **Platform**: macOS, Linux, Windows
- **Automation**: Python scripts
- **Pros**:
  - Free and open source
  - Multiple voice models available
  - Voice cloning capability
- **Cons**:
  - Complex setup (conda/pip, GPU recommended)
  - Heavier resource usage than Piper
  - Project maintenance has slowed (team laid off)
- **Integration**: Python script with TTS library

### 6. **Google Cloud TTS**
- **Cost**: FREE tier: 1M characters/month (WaveNet), then $4 per 1M
- **Quality**: ⭐⭐⭐⭐ Very good (WaveNet voices)
- **Setup**: Medium (GCP account, API setup)
- **Platform**: API-based
- **Automation**: REST API or SDK
- **Pros**:
  - Generous free tier
  - Multiple voice options
  - Reliable infrastructure
- **Cons**:
  - Requires GCP account
  - API complexity
  - Privacy concerns (sends text to cloud)
- **Integration**: gcloud CLI or API calls

### 7. **Amazon Polly**
- **Cost**: FREE tier: 5M characters/month for 12 months, then ~$4 per 1M
- **Quality**: ⭐⭐⭐⭐ Good (Neural voices available)
- **Setup**: Medium (AWS account)
- **Platform**: API-based
- **Automation**: AWS CLI or SDK
- **Pros**:
  - Generous free tier initially
  - Neural voices sound natural
- **Cons**:
  - Requires AWS account
  - Complexity of AWS ecosystem
- **Integration**: AWS CLI or boto3

---

## Recommendation

**Primary Choice: Piper TTS**
- Best balance of quality, cost (free), and ease of automation
- Local processing means digest text never leaves the machine
- No rate limits or API keys to manage
- Well suited to daily scheduled digest generation

**Alternative if quality is paramount: OpenAI TTS**
- Use if the ~$2-4/month cost is acceptable
- Slightly better voice quality
- Simpler than maintaining local models

**Avoid for this use case:**
- ElevenLabs free tier (too limiting for daily use)
- macOS `say` (quality too low for podcast format)
- Coqui (setup complexity not worth it vs Piper)

---

## Suggested Integration Workflow

```bash
#!/bin/bash
# Daily Digest TTS Script

# 1. Fetch or read markdown content
CONTENT=$(cat digest.md)

# 2. Convert markdown to plain text (strip formatting)
PLAIN_TEXT=$(echo "$CONTENT" | pandoc -f markdown -t plain)

# 3. Generate audio with Piper (Piper writes WAV, not MP3)
OUT="digest_$(date +%Y-%m-%d)"
piper \
  --model ~/.local/share/piper/en_US-lessac-medium.onnx \
  --output_file "$OUT.wav" \
  <<< "$PLAIN_TEXT"

# 4. Optional: encode to MP3 (assumes ffmpeg is installed)
ffmpeg -i "$OUT.wav" "$OUT.mp3"

# 5. Optional: Upload to podcast host or serve locally
```

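Because the workflow is meant to run unattended, a pre-flight check for its external tools can make failures easier to diagnose. The following is a minimal sketch rather than part of the original script. `missing_tools` is a hypothetical helper, and the tool list simply mirrors the commands used above.

```bash
#!/bin/bash
# Pre-flight check: report which required commands are not on PATH.

# missing_tools TOOL...
# Prints the name of each tool that cannot be found, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools the workflow script relies on.
missing=$(missing_tools cat pandoc piper)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names join onto one line.
  echo "Missing required tools:" $missing >&2
fi
```

In a scheduled job it would be reasonable to `exit 1` inside the `if` branch so that no partial output is produced.
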
---

## Voice Model Recommendations for Piper

| Voice | Style | Best For |
|-------|-------|----------|
| lessac | Neutral, clear | News/digest content |
| libritts | Natural, varied | Long-form content |
| ljspeech | Classic TTS | Short announcements |
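
The voice names in the table correspond to model files named like the `en_US-lessac-medium.onnx` used earlier. Piper also expects a matching `.onnx.json` config file next to each model. The sketch below builds both file names from a locale, voice, and quality tier. `model_files` is a hypothetical helper, and which quality tiers exist for a given voice is an assumption to verify against the published voice list.

```bash
#!/bin/bash
# Build the expected Piper model and config file names for a voice.

# model_files LOCALE VOICE QUALITY
# Prints the .onnx model name and its .onnx.json config name.
model_files() {
  local base="$1-$2-$3.onnx"
  printf '%s\n%s\n' "$base" "$base.json"
}

model_files en_US lessac medium
```

For the example call this prints `en_US-lessac-medium.onnx` followed by `en_US-lessac-medium.onnx.json`.
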