TTS Options Research for Daily Digest Podcast

Executive Summary

After evaluating multiple TTS solutions, Piper TTS emerges as the best choice for a daily digest workflow. It offers very good quality at zero cost with full local control.


Option Comparison

1. Piper TTS

  • Cost: FREE (open source)
  • Quality: Very good (neural voices, natural sounding)
  • Setup: Easy-Medium (binary download + voice model)
  • Platform: macOS, Linux, Windows
  • Automation: CLI tool, easily scripted
  • Pros:
    • Completely free, no API limits
    • Runs locally (privacy, no internet needed)
    • Fast inference on CPU
    • Multiple high-quality voices available
    • Active development (GitHub: rhasspy/piper)
  • Cons:
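    • Requires downloading voice models (~50-100MB each), each paired with a small .onnx.json config file
    • Not quite as expressive as premium APIs
    • Writes WAV only; MP3 output needs a separate encoder such as ffmpeg
  • Integration:
    echo "Your digest content" | piper --model en_US-lessac-medium.onnx --output_file digest.wav
    ffmpeg -i digest.wav -codec:a libmp3lame -qscale:a 2 digest.mp3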
    
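Piper loads a voice from two files, the `.onnx` model and its `.onnx.json` config, which must sit side by side. The sketch below fetches both. It assumes the voices are still published in the `rhasspy/piper-voices` repository on Hugging Face under a `<language>/<locale>/<speaker>/<quality>/` layout (verify the base URL before relying on it). `piper_voice_path` and `download_piper_voice` are hypothetical helper names, not part of Piper.

```shell
#!/bin/bash
set -euo pipefail

# Assumed base URL; check the piper-voices repo if downloads return 404.
PIPER_VOICES_BASE="https://huggingface.co/rhasspy/piper-voices/resolve/main"

# piper_voice_path VOICE  (hypothetical helper)
# Maps en_US-lessac-medium -> en/en_US/lessac/medium/en_US-lessac-medium
piper_voice_path() {
  local voice=$1 locale speaker quality
  # Voice ids are <locale>-<speaker>-<quality>; speakers use "_" not "-"
  IFS=- read -r locale speaker quality <<< "$voice"
  # ${locale%%_*} keeps the language family: en_US -> en
  echo "${locale%%_*}/${locale}/${speaker}/${quality}/${voice}"
}

# download_piper_voice VOICE [DEST]  (hypothetical helper)
# Saves VOICE.onnx and VOICE.onnx.json into DEST (default ~/.local/share/piper)
download_piper_voice() {
  local voice=$1 dest=${2:-"$HOME/.local/share/piper"}
  local path ext
  path=$(piper_voice_path "$voice")
  mkdir -p "$dest"
  # Piper expects the config next to the model, named <model>.onnx.json
  for ext in onnx onnx.json; do
    curl -fL -o "${dest}/${voice}.${ext}" "${PIPER_VOICES_BASE}/${path}.${ext}"
  done
}
```

Running `download_piper_voice en_US-lessac-medium` places the model at `~/.local/share/piper/en_US-lessac-medium.onnx`, the path the workflow script below passes to `--model`.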

2. macOS say Command

  • Cost: FREE (built-in)
  • Quality: Basic (functional but robotic)
  • Setup: None (pre-installed)
  • Platform: macOS only
  • Automation: CLI, easily scripted
  • Pros:
    • Zero setup required
    • Native macOS integration
    • Multiple built-in voices
  • Cons:
    • Quality is noticeably robotic
    • Few of the built-in voices suit long-form narration
    • No neural/AI voices
  • Integration:
    say -v Samantha -o digest.aiff "Your digest content"
    

3. ElevenLabs Free Tier

  • Cost: FREE tier: 10,000 characters/month (~10 min audio)
  • Quality: Excellent (best-in-class natural voices)
  • Setup: Easy (API key signup)
  • Platform: API-based (any platform)
  • Automation: REST API or Python SDK
  • Pros:
    • Exceptional voice quality
    • Voice cloning available (paid)
    • Multiple languages
  • Cons:
    • 10K chars covers only about two 5-minute digests per month; daily use needs ~150K chars/month
    • Paid tier starts at $5/month for 30K chars, still far short of daily use
    • Requires internet, API dependency
  • Integration: Python SDK or curl to API
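To see how quickly a daily digest would exhaust a character quota, a small shell sketch can measure one episode's plain-text length and scale it to a month. The function names `monthly_chars` and `fits_quota` are hypothetical, the 30-day month is an assumption, and the 10,000 default is the free-tier figure quoted above; pass a different quota for other plans or providers.

```shell
#!/bin/bash
set -euo pipefail

# monthly_chars FILE [DAYS]  (hypothetical helper)
# Estimates monthly characters, assuming every episode is as long as FILE.
monthly_chars() {
  local file=$1 days=${2:-30}
  # wc -m counts characters in the current locale; $(( )) strips the
  # leading padding that BSD/macOS wc adds to its output
  echo $(( $(wc -m < "$file") * days ))
}

# fits_quota FILE [QUOTA]  (hypothetical helper)
# Prints a verdict and returns non-zero if the monthly estimate exceeds QUOTA.
fits_quota() {
  local file=$1 quota=${2:-10000}
  local needed
  needed=$(monthly_chars "$file")
  if (( needed <= quota )); then
    echo "OK: ~${needed} chars/month fits within ${quota}"
  else
    echo "OVER: ~${needed} chars/month exceeds ${quota}"
    return 1
  fi
}
```

For example, `pandoc -f markdown -t plain digest.md > digest.txt && fits_quota digest.txt 10000` checks a real episode against the free tier; measuring the plain text rather than the Markdown avoids counting formatting characters.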

4. OpenAI TTS API

  • Cost: $0.015 per 1,000 characters with tts-1 (~$0.018/minute); tts-1-hd costs $0.030 per 1,000
  • Quality: Excellent (natural, expressive)
  • Setup: Easy (API key)
  • Platform: API-based
  • Automation: REST API
  • Pros:
    • High quality voices (alloy, echo, fable, etc.)
    • Fast, reliable API
    • Good for moderate usage
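  • Cons:
    • Not free: costs add up (~$2-4/month for a daily digest)
    • Requires internet connection
    • Rate limits apply
    • Input is capped at 4,096 characters per request, so a 5-minute digest must be split into chunks
  • Cost Estimate: Daily 5-min digest ≈ 5,000 chars × 30 days = 150K chars/month ≈ $2.25/month with tts-1; budget ~$2-4/month to allow for longer episodes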
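The OpenAI speech endpoint accepts at most 4,096 characters of input per request, so a ~5,000-character digest has to be sent in pieces. The sketch below assumes plain text with one paragraph per line (as `pandoc -t plain --wrap=none` produces) and mostly ASCII content, since `fold` and some awk implementations count bytes rather than characters. `chunk_text` is a hypothetical helper that writes numbered chunk files no larger than the limit, breaking between paragraphs where possible and at spaces otherwise.

```shell
#!/bin/bash
set -euo pipefail

# chunk_text INPUT PREFIX [MAX]  (hypothetical helper)
# Writes PREFIX001.txt, PREFIX002.txt, ... each at most MAX characters
# (newlines included) and prints the number of chunks written.
chunk_text() {
  local input=$1 prefix=$2 max=${3:-4096}
  # fold -s wraps any paragraph longer than MAX-1 at a space, so every
  # line plus its newline is guaranteed to fit in one chunk
  fold -s -w "$((max - 1))" "$input" |
    awk -v max="$max" -v prefix="$prefix" '
      BEGIN { n = 1; len = 0 }
      {
        add = length($0) + 1                # +1 for the newline
        if (len > 0 && len + add > max) {   # current chunk is full
          close(file); n++; len = 0
        }
        file = sprintf("%s%03d.txt", prefix, n)
        print > file
        len += add
      }
      END { if (NR == 0) n = 0; print n }
    '
}
```

Each chunk file can then be sent as the `input` of its own request, and the resulting MP3 segments joined in order, for example with ffmpeg's concat demuxer. Splitting on paragraph boundaries first keeps sentences intact, which avoids audible breaks mid-sentence.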

5. Coqui TTS

  • Cost: FREE (open source)
  • Quality: Good (varies by model)
  • Setup: Hard (Python environment, dependencies)
  • Platform: macOS, Linux, Windows
  • Automation: Python scripts
  • Pros:
    • Free and open source
    • Multiple voice models available
    • Voice cloning capability
  • Cons:
    • Complex setup (conda/pip, GPU recommended)
    • Heavier resource usage than Piper
    • The company behind it (Coqui) shut down in early 2024, so the original project is no longer actively maintained
  • Integration: Python script with TTS library

6. Google Cloud TTS

  • Cost: FREE tier: 1M characters/month (WaveNet), then $16 per 1M (Standard voices: 4M free, then $4 per 1M)
  • Quality: Very good (WaveNet voices)
  • Setup: Medium (GCP account, API setup)
  • Platform: API-based
  • Automation: REST API or SDK
  • Pros:
    • Generous free tier
    • Multiple voice options
    • Reliable infrastructure
  • Cons:
    • Requires GCP account
    • More setup overhead (project, billing, and credential configuration)
    • Privacy concerns (sends text to cloud)
  • Integration: REST API (e.g., curl authenticated with a gcloud access token) or client libraries

7. Amazon Polly

  • Cost: FREE tier (first 12 months): 5M characters/month for Standard voices, 1M/month for Neural voices; then ~$4 per 1M (Standard) or ~$16 per 1M (Neural)
  • Quality: Good (Neural voices available)
  • Setup: Medium (AWS account)
  • Platform: API-based
  • Automation: AWS CLI or SDK
  • Pros:
    • Generous free tier initially
    • Neural voices sound natural
  • Cons:
    • Requires AWS account
    • Complexity of AWS ecosystem
  • Integration: AWS CLI or boto3

Recommendation

Primary Choice: Piper TTS

  • Best balance of quality, cost (free), and ease of automation
  • Local processing means no privacy concerns
  • No rate limits or API keys to manage
  • Perfect for daily scheduled digest generation

Alternative if quality is paramount: OpenAI TTS

  • Use if the ~$2-4/month cost is acceptable
  • Slightly better voice quality
  • Simpler than maintaining local models

Avoid for this use case:

  • ElevenLabs free tier (too limiting for daily use)
  • macOS say (quality too low for podcast format)
  • Coqui (setup complexity not worth it vs Piper)

Suggested Integration Workflow

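#!/bin/bash
# Daily Digest TTS Script
set -euo pipefail

DATE=$(date +%Y-%m-%d)

# 1. Fetch or read markdown content
CONTENT=$(cat digest.md)

# 2. Convert markdown to plain text (strip formatting; --wrap=none keeps each paragraph on one line)
PLAIN_TEXT=$(printf '%s\n' "$CONTENT" | pandoc -f markdown -t plain --wrap=none)

# 3. Generate audio with Piper (Piper writes WAV, not MP3)
piper \
  --model ~/.local/share/piper/en_US-lessac-medium.onnx \
  --output_file "digest_${DATE}.wav" \
  <<< "$PLAIN_TEXT"

# 4. Encode to MP3 for podcast players (requires ffmpeg)
ffmpeg -y -loglevel error -i "digest_${DATE}.wav" -codec:a libmp3lame -qscale:a 2 "digest_${DATE}.mp3"

# 5. Optional: Upload to podcast host or serve locally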

Voice Model Recommendations for Piper

Voice       Style              Best For
lessac      Neutral, clear     News/digest content
libritts    Natural, varied    Long-form content
ljspeech    Classic TTS        Short announcements