Audio Mixing - Podcast Production

Overview

The blog-backup project generates podcast audio by mixing TTS (text-to-speech) with background music using ffmpeg.

Current Working Configuration

What It Does

  • TTS: Uses macOS say command (built-in, no external API)
  • Mixing: Music plays at 12% volume UNDER the speech (continuous bed)
  • Fades: Music fades in at the start and fades out at the end (both at 30% volume)
  • Format: MP3 output

Audio Flow

[Music fades in 30%] → [Speech with 12% music bed] → [Music fades out]
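
The percentages in this flow correspond to the volume=0.3 and weights=1 0.12 settings in the ffmpeg command under Technical Details. As a rough sketch, and assuming amix applies its default normalization (each input scaled by its weight divided by the sum of all weights), the effective gains can be estimated as follows. The function name bedGains is hypothetical and is not part of tts.ts.

```typescript
// Estimate the linear gain of each input after `volume=<musicVolume>`
// followed by `amix=weights=<speechWeight> <musicWeight>`.
// ASSUMPTION: amix normalizes, i.e. scales each input by weight / sum(weights).
function bedGains(
  musicVolume: number,
  speechWeight: number,
  musicWeight: number
): { speech: number; music: number } {
  const total = Math.abs(speechWeight) + Math.abs(musicWeight);
  return {
    speech: Math.abs(speechWeight) / total,
    music: musicVolume * (Math.abs(musicWeight) / total),
  };
}

// Values taken from the documented command: volume=0.3, weights=1 0.12.
const gains = bedGains(0.3, 1, 0.12);
console.log(gains.speech.toFixed(3), gains.music.toFixed(3)); // prints "0.893 0.032"
```

Under that assumption the bed sits closer to 3% than 12% of full scale, because the 0.3 volume filter and the 0.12 mix weight multiply. If an exact 12% bed is wanted, one of the two stages would need adjusting.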

Technical Details

File: blog-backup/src/lib/tts.ts

Environment Variables:

ENABLE_TTS=true
TTS_PROVIDER=macsay
ENABLE_PODCAST_MUSIC=true
INTRO_MUSIC_URL=/path/to/intro.mp3
OUTRO_MUSIC_URL=/path/to/outro.mp3
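
A minimal sketch of how these variables might be read and validated. The interface, the function name loadPodcastAudioConfig, the macsay default, and the decision to throw at load time are all assumptions for illustration; the actual logic in tts.ts may differ.

```typescript
// Hypothetical shape for the settings listed above.
interface PodcastAudioConfig {
  ttsEnabled: boolean;
  ttsProvider: string;
  musicEnabled: boolean;
  introPath: string | undefined;
  outroPath: string | undefined;
}

type Env = Record<string, string | undefined>;

// Takes the environment as a parameter so it can be exercised without
// touching the real process environment.
function loadPodcastAudioConfig(env: Env): PodcastAudioConfig {
  const config: PodcastAudioConfig = {
    ttsEnabled: env["ENABLE_TTS"] === "true",
    ttsProvider: env["TTS_PROVIDER"] ?? "macsay", // default is an assumption
    musicEnabled: env["ENABLE_PODCAST_MUSIC"] === "true",
    introPath: env["INTRO_MUSIC_URL"],
    outroPath: env["OUTRO_MUSIC_URL"],
  };
  // Mirrors the "No music files configured" entry under Common Errors.
  if (config.musicEnabled && (!config.introPath || !config.outroPath)) {
    throw new Error("No music files configured");
  }
  return config;
}

const config = loadPodcastAudioConfig({
  ENABLE_TTS: "true",
  TTS_PROVIDER: "macsay",
  ENABLE_PODCAST_MUSIC: "true",
  INTRO_MUSIC_URL: "/path/to/intro.mp3",
  OUTRO_MUSIC_URL: "/path/to/outro.mp3",
});
console.log(config.ttsProvider, config.musicEnabled);
```

In a Node process the function would typically be called with process.env.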

ffmpeg Command (Working):

ffmpeg -y -i "${ttsPath}" -stream_loop -1 -i "${introPath}" -i "${outroPath}" -filter_complex "
  [1:a]volume=0.3,apad=5[music];
  [2:a]volume=0.3[outro];
  [0:a][music]amix=duration=first:weights=1 0.12[speechbed];
  [speechbed]afade=t=in:st=0:d=1[in];
  [in][outro]concat=n=2:v=0:a=1[out]
" -map "[out]" -shortest "${outputPath}"
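
One way to assemble the same command from TypeScript is to build an argument array rather than a shell string, which sidesteps quoting problems when paths contain spaces. This is a sketch only: buildMixArgs is a hypothetical helper, and the filter graph is copied unchanged from the command above.

```typescript
// Build the ffmpeg argument list for the documented speech + music mix.
// Inputs: 0 = TTS speech, 1 = intro music (looped forever), 2 = outro music.
function buildMixArgs(
  ttsPath: string,
  introPath: string,
  outroPath: string,
  outputPath: string
): string[] {
  const filterGraph = [
    "[1:a]volume=0.3,apad=5[music]",                             // looped bed at 30%
    "[2:a]volume=0.3[outro]",                                    // outro at 30%
    "[0:a][music]amix=duration=first:weights=1 0.12[speechbed]", // stops when speech ends
    "[speechbed]afade=t=in:st=0:d=1[in]",                        // 1-second fade-in
    "[in][outro]concat=n=2:v=0:a=1[out]",                        // append the outro
  ].join(";");

  return [
    "-y",
    "-i", ttsPath,
    "-stream_loop", "-1", "-i", introPath,
    "-i", outroPath,
    "-filter_complex", filterGraph,
    "-map", "[out]",
    "-shortest",
    outputPath,
  ];
}

const args = buildMixArgs("speech.aiff", "intro.mp3", "outro.mp3", "episode.mp3");
console.log(["ffmpeg", ...args].join(" "));
```

The array can be handed to execFile from node:child_process with "ffmpeg" as the command. The file names in the example call are placeholders.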

Known Limitations

  1. Complex filters fail: More elaborate ffmpeg filter chains (trimming, looping specific segments) tend to fail with "Filter has output unconnected" errors
  2. Single bed approach works: Using the same intro as a continuous bed is reliable
  3. Pre-sliced clips would be better: For distinct intro/speech/outro, pre-create short clips (5-10 sec) and concatenate

Music Files

Location: blog-creator/public/podcast-audio/

| File      | Duration | Use                  |
|-----------|----------|----------------------|
| intro.mp3 | 71 sec   | Background music bed |
| outro.mp3 | 34 sec   | Outro music          |

Suggested Improvements

  1. Create short intro clip: Extract first 5-10 sec as separate file
  2. Create short outro clip: Extract last 5-10 sec as separate file
  3. Use a simpler two-step process:
    • Step 1: Mix speech with looped bed
    • Step 2: Prepend intro, append outro
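
The improvements above could be sketched as four small ffmpeg invocations, each with a single simple filter graph. All helper names and file names here are hypothetical, the 0.12 bed volume is borrowed from the current configuration, and normalize=0 assumes a recent ffmpeg build whose amix filter supports that option.

```typescript
// Pre-slice: keep the first `seconds` of a file (-t limits output duration).
function sliceHeadArgs(src: string, seconds: number, dest: string): string[] {
  return ["-y", "-i", src, "-t", String(seconds), dest];
}

// Pre-slice: keep the last `seconds` of a file (-sseof seeks relative to the end).
function sliceTailArgs(src: string, seconds: number, dest: string): string[] {
  return ["-y", "-sseof", `-${seconds}`, "-i", src, dest];
}

// Step 1: mix speech with a looped music bed; output ends when the speech ends.
function mixBedArgs(speech: string, bed: string, dest: string): string[] {
  const graph =
    "[1:a]volume=0.12[bed];" +
    "[0:a][bed]amix=inputs=2:duration=first:normalize=0[mix]";
  return [
    "-y", "-i", speech, "-stream_loop", "-1", "-i", bed,
    "-filter_complex", graph, "-map", "[mix]", dest,
  ];
}

// Step 2: prepend the intro clip and append the outro clip.
function wrapArgs(intro: string, mixed: string, outro: string, dest: string): string[] {
  const graph = "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]";
  return [
    "-y", "-i", intro, "-i", mixed, "-i", outro,
    "-filter_complex", graph, "-map", "[out]", dest,
  ];
}

const steps: string[][] = [
  sliceHeadArgs("intro.mp3", 8, "intro-clip.wav"),
  sliceTailArgs("outro.mp3", 8, "outro-clip.wav"),
  mixBedArgs("speech.aiff", "intro.mp3", "mixed.wav"),
  wrapArgs("intro-clip.wav", "mixed.wav", "outro-clip.wav", "episode.mp3"),
];
for (const step of steps) console.log(["ffmpeg", ...step].join(" "));
```

The sketch writes intermediates as WAV so that audio is only encoded to MP3 once, in the final step; that is a design choice of the example, not something the current pipeline does.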

Testing

# Test TTS with music
curl -X POST "http://localhost:3002/api/tts" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "text": "Your text here",
    "includeMusic": true
  }'
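
The same request can be prepared from TypeScript. The sketch below only builds the request description so it can be inspected without a running server; buildTtsRequest and the TtsRequest type are hypothetical, and the shape of the response is not documented here, so none is assumed.

```typescript
// Plain description of the POST request shown in the curl example.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTtsRequest(
  text: string,
  includeMusic: boolean,
  apiKey: string,
  baseUrl: string = "http://localhost:3002"
): TtsRequest {
  return {
    url: `${baseUrl}/api/tts`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
    },
    body: JSON.stringify({ text, includeMusic }),
  };
}

const req = buildTtsRequest("Your text here", true, "YOUR_API_KEY");
console.log(req.method, req.url, req.body);
```

To send it, pass the fields to fetch, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).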

Common Errors

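| Error                           | Cause                                                                                   | Fix                                                                   |
|---------------------------------|-----------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| "Filter has output unconnected" | Complex filter chain, typically a labeled output that is neither consumed nor mapped     | Simplify to fewer inputs; check that every labeled output is used     |
| "OPENAI_API_KEY not configured" | Wrong provider                                                                          | Set TTS_PROVIDER=macsay                                               |
| "No music files configured"     | Missing env vars                                                                        | Set INTRO_MUSIC_URL and OUTRO_MUSIC_URL                               |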

Future Enhancements

  • Pre-slice intro/outro for distinct segments
  • Add transition sounds between stories
  • Adjust bed volume based on speech pauses
  • Add compression/normalization for consistent levels
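
Two of these ideas map onto existing ffmpeg filters: sidechaincompress can lower the bed while speech is present and let it rise in pauses, and loudnorm can normalize overall loudness. The following is an untested sketch under those assumptions; duckedMixGraph is a hypothetical helper, and every numeric value (threshold, ratio, attack, release, loudness targets) is an illustrative starting point rather than a tuned setting.

```typescript
// Build a filter graph for: input 0 = speech, input 1 = music bed.
function duckedMixGraph(threshold: number = 0.05, ratio: number = 8): string {
  return [
    // Split speech: one copy is heard, one copy drives the compressor.
    "[0:a]asplit=2[speech][key]",
    // Compress the music whenever the speech copy exceeds the threshold.
    `[1:a][key]sidechaincompress=threshold=${threshold}:ratio=${ratio}:attack=20:release=400[ducked]`,
    // Mix speech with the ducked bed; stop when the speech ends.
    "[speech][ducked]amix=inputs=2:duration=first[mix]",
    // Normalize loudness of the final mix (targets are assumptions).
    "[mix]loudnorm=I=-16:TP=-1.5:LRA=11[out]",
  ].join(";");
}

const duckGraph = duckedMixGraph();
console.log(duckGraph);
```

The resulting string would be passed as the -filter_complex value with -map "[out]", in the same way as the current command.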