# Audio Mixing - Podcast Production

## Overview

The blog-backup project generates podcast audio by mixing TTS (text-to-speech) with background music using ffmpeg.

## Current Working Configuration

### What It Does
- **TTS:** Uses the macOS `say` command (built-in, no external API)
- **Mixing:** Music plays at 12% volume under the speech as a continuous bed
- **Fades:** Music fades in at the start (30% volume) and fades out at the end (30% volume)
- **Format:** MP3 output

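
As a rough illustration of the TTS step, the sketch below renders speech with `say` and converts it to MP3 with ffmpeg. It is an assumption about how the step could be done by hand, not the code in `tts.ts`; the function name `speak_to_mp3`, the `DRY_RUN` switch, and the file names are hypothetical.

```bash
# Hypothetical helper, not taken from tts.ts.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = text to speak, $2 = output base name (no extension)
speak_to_mp3() {
  # macOS only: render the speech to an AIFF file
  run say -o "$2.aiff" "$1" || return 1
  # Convert the AIFF file to MP3
  run ffmpeg -y -i "$2.aiff" "$2.mp3"
}
```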
### Audio Flow
```
[Music fades in 30%] → [Speech with 12% music bed] → [Music fades out]
```

### Technical Details

**File:** `blog-backup/src/lib/tts.ts`

**Environment Variables:**
```bash
ENABLE_TTS=true
TTS_PROVIDER=macsay
ENABLE_PODCAST_MUSIC=true
INTRO_MUSIC_URL=/path/to/intro.mp3
OUTRO_MUSIC_URL=/path/to/outro.mp3
```

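
A small pre-flight check can catch a misconfigured environment before ffmpeg runs. The sketch below is an assumption about how such a check might look in a shell wrapper, not code from `tts.ts`; the function name `check_music_config` is hypothetical, and the failure message mirrors the one listed under Common Errors.

```bash
# Hypothetical pre-flight check, not part of tts.ts.
# Returns non-zero when music is enabled but a music path is missing.
check_music_config() {
  if [ "${ENABLE_PODCAST_MUSIC:-false}" != "true" ]; then
    echo "music disabled"
    return 0
  fi
  if [ -z "${INTRO_MUSIC_URL:-}" ] || [ -z "${OUTRO_MUSIC_URL:-}" ]; then
    echo "No music files configured"
    return 1
  fi
  echo "music ok"
}
```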
**ffmpeg Command (Working):**
```bash
ffmpeg -y -i "${ttsPath}" -stream_loop -1 -i "${introPath}" -i "${outroPath}" -filter_complex "
  [1:a]volume=0.3,apad=5[music];
  [2:a]volume=0.3[outro];
  [0:a][music]amix=duration=first:weights=1 0.12[speechbed];
  [speechbed]afade=t=in:st=0:d=1[in];
  [in][outro]concat=n=2:v=0:a=1[out]
" -map "[out]" -shortest "${outputPath}"
```

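
To make the tunable numbers in this filter graph easier to adjust, the graph can be generated by a small helper. This is only a sketch: `build_filter` is a hypothetical name, the defaults simply repeat the values in the working command, and the graph itself is copied unchanged apart from being joined onto one line.

```bash
# Hypothetical helper: prints the filter graph from the working command
# with its three numbers exposed as arguments.
#   $1 = music and outro volume before mixing (default 0.3)
#   $2 = music weight in the mix (default 0.12)
#   $3 = fade-in length in seconds (default 1)
build_filter() {
  music_vol=${1:-0.3}
  bed_weight=${2:-0.12}
  fade_in=${3:-1}
  printf '%s' \
    "[1:a]volume=${music_vol},apad=5[music];" \
    "[2:a]volume=${music_vol}[outro];" \
    "[0:a][music]amix=duration=first:weights=1 ${bed_weight}[speechbed];" \
    "[speechbed]afade=t=in:st=0:d=${fade_in}[in];" \
    "[in][outro]concat=n=2:v=0:a=1[out]"
}

# Usage (paths are placeholders):
#   ffmpeg -y -i tts.mp3 -stream_loop -1 -i intro.mp3 -i outro.mp3 \
#     -filter_complex "$(build_filter 0.3 0.15 2)" \
#     -map "[out]" -shortest episode.mp3
```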
### Known Limitations

1. **Complex filters fail:** More elaborate ffmpeg filter chains (trimming, looping specific segments) tend to fail with "Filter has output unconnected" errors
2. **Single bed approach works:** Using the intro track as one continuous bed is reliable
3. **Pre-sliced clips would be better:** For distinct intro/speech/outro segments, pre-create short clips (5-10 sec) and concatenate them

## Music Files

**Location:** `blog-creator/public/podcast-audio/`

| File | Duration | Use |
|------|----------|-----|
| intro.mp3 | 71 sec | Background music bed |
| outro.mp3 | 34 sec | Outro music |

### Suggested Improvements

1. **Create short intro clip:** Extract the first 5-10 sec as a separate file
2. **Create short outro clip:** Extract the last 5-10 sec as a separate file
3. **Use a simpler 2-step process:**
   - Step 1: Mix speech with looped bed
   - Step 2: Prepend intro, append outro

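
The improvements above could be sketched as four small ffmpeg invocations. Everything here is an untested assumption rather than the project's implementation: the function names, the `DRY_RUN` switch, and the file names (`speech.mp3`, `intro_short.mp3`, `outro_short.mp3`, `speech_bed.mp3`, `episode.mp3`) are hypothetical. Each filter graph has a single labeled output that is mapped directly, which keeps the chains short.

```bash
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pre-slice: first N seconds of the intro, last N seconds of the outro.
# $1 = clip length in seconds
slice_clips() {
  run ffmpeg -y -t "$1" -i intro.mp3 intro_short.mp3 || return 1
  run ffmpeg -y -sseof "-$1" -i outro.mp3 outro_short.mp3
}

# Step 1: mix speech over the looped bed; output ends with the speech.
mix_bed() {
  run ffmpeg -y -i speech.mp3 -stream_loop -1 -i intro.mp3 \
    -filter_complex "[0:a][1:a]amix=inputs=2:duration=first:weights=1 0.12[out]" \
    -map "[out]" speech_bed.mp3
}

# Step 2: prepend the short intro and append the short outro.
join_clips() {
  run ffmpeg -y -i intro_short.mp3 -i speech_bed.mp3 -i outro_short.mp3 \
    -filter_complex "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]" \
    -map "[out]" episode.mp3
}
```

Running `slice_clips 8 && mix_bed && join_clips` would produce `episode.mp3`; setting `DRY_RUN=1` first prints the commands for review.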
## Testing

```bash
# Test TTS with music
curl -X POST "http://localhost:3002/api/tts" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "text": "Your text here",
    "includeMusic": true
  }'
```

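
One way to sanity-check the result, assuming the generated MP3 and the speech-only file are both available on disk, is to compare durations. The ffmpeg command under Technical Details concatenates the speech-length mix with the outro, so the episode should last roughly as long as the speech plus the 34-second outro (a 120-second speech gives about 154 seconds). The helper names and the one-second default tolerance below are hypothetical.

```bash
# Prints the duration of a media file in seconds (requires ffprobe).
duration_of() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeeds when the actual duration is within the tolerance of speech + outro.
#   $1 = speech seconds, $2 = outro seconds, $3 = actual seconds,
#   $4 = tolerance in seconds (default 1)
duration_ok() {
  awk -v s="$1" -v o="$2" -v a="$3" -v tol="${4:-1}" \
    'BEGIN { d = a - (s + o); if (d < 0) d = -d; exit !(d <= tol) }'
}

# Usage (file names are placeholders):
#   duration_ok "$(duration_of speech.mp3)" 34 "$(duration_of episode.mp3)"
```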
## Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| "Filter has output unconnected" | Complex filter chain, typically with a labeled output that nothing consumes | Simplify to fewer inputs and check that every label is used |
| "OPENAI_API_KEY not configured" | Wrong provider | Set `TTS_PROVIDER=macsay` |
| "No music files configured" | Missing env vars | Set `INTRO_MUSIC_URL` and `OUTRO_MUSIC_URL` |

## Future Enhancements

- [ ] Pre-slice intro/outro for distinct segments
- [ ] Add transition sounds between stories
- [ ] Adjust bed volume based on speech pauses
- [ ] Add compression/normalization for consistent levels

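
For the last two items, ffmpeg's `sidechaincompress` and `loudnorm` filters are plausible starting points. The sketch below is untested against this project: `build_duck_filter` is a hypothetical name and every numeric value is a placeholder to tune by ear. The speech is split so that one copy drives the compressor (lowering the music while someone is talking) and the other copy goes into the mix.

```bash
# Hypothetical helper: prints a filter graph that ducks the music under
# the speech and then normalizes overall loudness.
#   $1 = compressor threshold (default 0.05)
#   $2 = compressor ratio (default 8)
build_duck_filter() {
  threshold=${1:-0.05}
  ratio=${2:-8}
  printf '%s' \
    "[0:a]asplit=2[voice][key];" \
    "[1:a]volume=0.3[bed];" \
    "[bed][key]sidechaincompress=threshold=${threshold}:ratio=${ratio}:attack=20:release=400[ducked];" \
    "[voice][ducked]amix=inputs=2:duration=first[mix];" \
    "[mix]loudnorm=I=-16:TP=-1.5:LRA=11[out]"
}

# Usage (paths are placeholders; -ar sets an explicit output sample rate):
#   ffmpeg -y -i tts.mp3 -stream_loop -1 -i intro.mp3 \
#     -filter_complex "$(build_duck_filter)" -map "[out]" -ar 44100 episode.mp3
```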