# TTS Options Research for Daily Digest Podcast

## Executive Summary

After evaluating seven TTS options, **Piper TTS** emerges as the best choice for a daily digest workflow: very good voice quality at zero cost, with full local control.

---
## Option Comparison
### 1. **Piper TTS** ⭐ RECOMMENDED
- **Cost**: FREE (open source)
- **Quality**: ⭐⭐⭐⭐ Very good (neural voices, natural sounding)
- **Setup**: Easy-Medium (binary download + voice model)
- **Platform**: macOS, Linux, Windows
- **Automation**: CLI tool, easily scripted
- **Pros**:
  - Completely free, no API limits
  - Runs locally (privacy, no internet needed)
  - Fast inference on CPU
  - Multiple high-quality voices available
  - Active development (GitHub: rhasspy/piper)
- **Cons**:
  - Requires downloading voice models (~50-100MB each)
  - Not quite as expressive as premium APIs
- **Integration**:
```bash
# Piper writes WAV audio; convert to MP3 afterwards if the podcast host requires it
echo "Your digest content" | piper --model en_US-lessac-medium.onnx --output_file digest.wav
```
### 2. **macOS say Command**
- **Cost**: FREE (built-in)
- **Quality**: ⭐⭐ Basic (functional but robotic)
- **Setup**: None (pre-installed)
- **Platform**: macOS only
- **Automation**: CLI, easily scripted
- **Pros**:
  - Zero setup required
  - Native macOS integration
  - Multiple built-in voices
- **Cons**:
  - Quality is noticeably robotic
  - Limited voice options
  - No neural/AI voices
- **Integration**:
```bash
say -v Samantha -o digest.aiff "Your digest content"
```
### 3. **ElevenLabs Free Tier**
- **Cost**: FREE tier: 10,000 characters/month (~10 min audio)
- **Quality**: ⭐⭐⭐⭐⭐ Excellent (best-in-class natural voices)
- **Setup**: Easy (API key signup)
- **Platform**: API-based (any platform)
- **Automation**: REST API or Python SDK
- **Pros**:
  - Exceptional voice quality
  - Voice cloning available (paid)
  - Multiple languages
- **Cons**:
  - 10K char limit is very restrictive for daily digest
  - Paid tier starts at $5/month for 30K chars
  - Requires internet, API dependency
  - Could exceed limits quickly with daily content
- **Integration**: Python SDK or curl to API
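
To make the quota constraint concrete, the sketch below spreads a monthly character allowance over a month of daily digests. The helper names `daily_char_budget` and `fits_daily_budget` are hypothetical, and a 30-day month is assumed. At the rate implied above (10,000 characters ≈ 10 minutes), the resulting 333 characters per day is only about 20 seconds of audio.

```bash
# Characters available per day under a monthly quota (integer division).
# Assumption: a 30-day month unless a second argument says otherwise.
daily_char_budget() {
    monthly_quota=$1
    days=${2:-30}
    echo $(( monthly_quota / days ))
}

# Exit status 0 if the text fits within the daily budget, 1 otherwise.
# Note: ${#text} counts characters in bash but bytes in some other shells.
fits_daily_budget() {
    text=$1
    budget=$(daily_char_budget "$2" "${3:-30}")
    [ "${#text}" -le "$budget" ]
}

daily_char_budget 10000    # prints 333
```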
### 4. **OpenAI TTS API**
- **Cost**: $0.015 per 1,000 characters (~$0.018/minute)
- **Quality**: ⭐⭐⭐⭐⭐ Excellent (natural, expressive)
- **Setup**: Easy (API key)
- **Platform**: API-based
- **Automation**: REST API
- **Pros**:
  - High quality voices (alloy, echo, fable, etc.)
  - Fast, reliable API
  - Good for moderate usage
- **Cons**:
  - Not free: costs add up (~$2-4/month for a daily digest)
  - Requires internet connection
  - Rate limits apply
- **Cost Estimate**: Daily 5-min digest ≈ $2-4/month
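
The estimate can be reproduced with a few lines of arithmetic. This is a rough sketch, not a billing calculator: it assumes about 1,200 characters per spoken minute (the rate implied by $0.015 per 1,000 characters ≈ $0.018/minute) and a 30-day month, and `monthly_tts_cost` is a hypothetical helper name.

```bash
# Estimate the monthly API cost of a daily digest.
# Assumptions: ~1,200 characters per spoken minute, 30 days per month.
monthly_tts_cost() {
    minutes_per_day=$1          # length of each digest in minutes
    price_per_1k_chars=$2       # e.g. 0.015 for the rate quoted above
    chars_per_minute=${3:-1200}
    days=${4:-30}
    awk -v m="$minutes_per_day" -v p="$price_per_1k_chars" \
        -v c="$chars_per_minute" -v d="$days" \
        'BEGIN { printf "%.2f\n", m * c * d * p / 1000 }'
}

monthly_tts_cost 5 0.015    # prints 2.70
```

A 5-minute daily digest therefore lands near the low end of the $2-4 range; longer episodes or a faster speaking rate push it upward.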
### 5. **Coqui TTS**
- **Cost**: FREE (open source)
- **Quality**: ⭐⭐⭐⭐ Good (varies by model)
- **Setup**: Hard (Python environment, dependencies)
- **Platform**: macOS, Linux, Windows
- **Automation**: Python scripts
- **Pros**:
  - Free and open source
  - Multiple voice models available
  - Voice cloning capability
- **Cons**:
  - Complex setup (conda/pip, GPU recommended)
  - Heavier resource usage than Piper
  - The company behind Coqui shut down in early 2024, so the original project is no longer actively maintained
- **Integration**: Python script with TTS library
### 6. **Google Cloud TTS**
- **Cost**: FREE tier: 1M characters/month (WaveNet), then $16 per 1M; Standard voices: 4M characters/month free, then $4 per 1M
- **Quality**: ⭐⭐⭐⭐ Very good (WaveNet voices)
- **Setup**: Medium (GCP account, API setup)
- **Platform**: API-based
- **Automation**: REST API or SDK
- **Pros**:
  - Generous free tier
  - Multiple voice options
  - Reliable infrastructure
- **Cons**:
  - Requires GCP account
  - API complexity
  - Privacy concerns (sends text to cloud)
- **Integration**: gcloud CLI or API calls
### 7. **Amazon Polly**
- **Cost**: FREE tier: 5M characters/month for 12 months (Standard voices), then ~$4 per 1M; Neural voices: 1M characters/month free for 12 months, then ~$16 per 1M
- **Quality**: ⭐⭐⭐⭐ Good (Neural voices available)
- **Setup**: Medium (AWS account)
- **Platform**: API-based
- **Automation**: AWS CLI or SDK
- **Pros**:
  - Generous free tier initially
  - Neural voices sound natural
- **Cons**:
  - Requires AWS account
  - Complexity of AWS ecosystem
- **Integration**: AWS CLI or boto3
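
Putting the free-tier quotas quoted in this comparison side by side shows which services could carry a daily digest at no cost. The sketch below is illustrative only: `free_tiers_covering` is a hypothetical helper, the quotas are the figures listed above (Polly's applies only during the first 12 months), and the 180,000-character input assumes a 5-minute daily digest at roughly 1,200 characters per minute over 30 days.

```bash
# Print the services whose free monthly quota covers the given usage.
# Quotas are the free-tier figures quoted in the comparison above.
free_tiers_covering() {
    monthly_chars=$1
    while read -r name quota; do
        if [ "$monthly_chars" -le "$quota" ]; then
            echo "$name"
        fi
    done <<EOF
ElevenLabs 10000
GoogleCloudTTS 1000000
AmazonPolly 5000000
EOF
}

# 5 min/day * ~1,200 chars/min * 30 days = 180,000 chars/month (assumed)
free_tiers_covering 180000
```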
---
## Recommendation
**Primary Choice: Piper TTS**

- Best balance of quality, cost (free), and ease of automation
- Local processing means no privacy concerns
- No rate limits or API keys to manage
- Perfect for daily scheduled digest generation

**Alternative if quality is paramount: OpenAI TTS**

- Use if the ~$2-4/month cost is acceptable
- Slightly better voice quality
- Simpler than maintaining local models

**Avoid for this use case:**

- ElevenLabs free tier (too limiting for daily use)
- macOS say (quality too low for podcast format)
- Coqui (setup complexity not worth it vs Piper)

---
## Suggested Integration Workflow
```bash
#!/bin/bash
# Daily Digest TTS Script
set -euo pipefail

# 1. Read the markdown digest and convert it to plain text (strip formatting)
PLAIN_TEXT=$(pandoc -f markdown -t plain digest.md)

# 2. Generate audio with Piper (Piper writes WAV, not MP3)
WAV_FILE="digest_$(date +%Y-%m-%d).wav"
piper \
  --model ~/.local/share/piper/en_US-lessac-medium.onnx \
  --output_file "$WAV_FILE" \
  <<< "$PLAIN_TEXT"

# 3. Encode to MP3 for podcast players (assumes ffmpeg is installed)
ffmpeg -y -i "$WAV_FILE" -codec:a libmp3lame -qscale:a 4 "${WAV_FILE%.wav}.mp3"

# 4. Optional: Upload to podcast host or serve locally
```
---
## Voice Model Recommendations for Piper
| Voice | Style | Best For |
|-------|-------|----------|
| lessac | Neutral, clear | News/digest content |
| libritts | Natural, varied | Long-form content |
| ljspeech | Classic TTS | Short announcements |
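
Switching between these voices in a script mostly comes down to pointing `--model` at a different file. The sketch below builds the path from the naming pattern seen in `en_US-lessac-medium.onnx`. The helper name `piper_model_path` and the `PIPER_MODEL_DIR` override are hypothetical. Whether a given voice is published at a given quality level is not verified here. Piper also expects the matching `.onnx.json` config file next to each model.

```bash
# Build the model path for a Piper voice using the
# <locale>-<voice>-<quality>.onnx naming pattern.
# PIPER_MODEL_DIR is a hypothetical override; the default matches
# the directory used in the workflow script above.
piper_model_path() {
    voice=$1
    quality=${2:-medium}
    locale=${3:-en_US}
    model_dir=${PIPER_MODEL_DIR:-$HOME/.local/share/piper}
    printf '%s/%s-%s-%s.onnx\n' "$model_dir" "$locale" "$voice" "$quality"
}

piper_model_path lessac    # path ending in en_US-lessac-medium.onnx
```

In the workflow script, `--model "$(piper_model_path lessac)"` would then select the voice by name.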