# Podcast Audio Mixing Instructions - Layered/Professional Style

**For AI-Assisted Podcast Production**

*Version: 2.0 (Layered Overlap) | March 2026*

## Goal

Professional podcast sound: the intro music starts first, the voice fades in over it, the music stays underneath the spoken word as a subtle bed at 30% volume, and the outro music fades out after the voice ends.

## Core Rules (MUST Follow)

- **Intro music**: Use the first 8 seconds only (`atrim=0:8`).
- **Outro music**: Use the last 8 seconds only (`atrim=26:34` for a 34s music source).
- **Music volume**: Fixed at `0.3` (30%).
- **No full-track loop**: Do not loop the entire source track.
- **No heavy dynamics**: Do not use `sidechaincompress` or ducking chains.
- **Layering is required**: Music must overlap spoken audio (intro overlap + low bed under narration + outro tail).
- Spoken narration remains primary and clear.
- Use smooth fades/crossfades (1.5-3 seconds).
- Output format: MP3 (VBR via `-q:a 2`, averaging roughly 190 kbps).

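
The `atrim=26:34` window in the rules above is specific to a 34s source. For a source of a different length, the "last 8 seconds" window can be derived from the duration. The helper below is a minimal sketch; the name `outro_trim` is hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original script):
# print the atrim window covering the last N seconds of a music source.

outro_trim() {
  local duration="$1"       # music length in seconds, e.g. as reported by ffprobe
  local tail_len="${2:-8}"  # outro length in seconds (8s per the core rules)
  awk -v d="$duration" -v t="$tail_len" 'BEGIN {
    if (d < t) { exit 1 }   # source is shorter than the requested tail
    printf "atrim=%g:%g\n", d - t, d
  }'
}

outro_trim 34     # prints atrim=26:34
outro_trim 41.5   # prints atrim=33.5:41.5
```

The function returns a non-zero status when the source is shorter than the requested tail, so a calling script can stop before invoking FFmpeg.
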
## Recommended FFmpeg Command (True Layered Mix)

This version does all of the following:

- The intro starts alone; the voice enters at 5s with a 3s fade-in.
- The middle section loops the music's center slice (`8:26`, not the full track) to provide a continuous low bed.
- Voice and music are layered with a lightweight `amix` only.
- The outro is appended with a crossfade and fade-out.

```bash
#!/usr/bin/env bash
# mix_podcast_layered.sh
# Usage: ./mix_podcast_layered.sh spoken_narration.mp3 music_34s.mp3 final_podcast.mp3

set -euo pipefail

SPOKEN="${1:-}"
MUSIC="${2:-}"
OUTPUT="${3:-}"

if [ -z "$SPOKEN" ] || [ -z "$MUSIC" ] || [ -z "$OUTPUT" ]; then
  echo "Usage: $0 spoken.mp3 music.mp3 output.mp3" >&2
  exit 1
fi

# A labeled filter output can be consumed only once, so the formatted music
# stream is split three ways (intro, mid, outro) with asplit.
# amix's normalize=0 option needs a reasonably recent FFmpeg (4.4 or later).
ffmpeg -y \
  -i "$SPOKEN" \
  -i "$MUSIC" \
  -filter_complex "
    [0:a]aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo,asetpts=PTS-STARTPTS[voice_raw];
    [voice_raw]adelay=5000|5000,afade=t=in:st=5:d=3[voice];

    [1:a]aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo,asetpts=PTS-STARTPTS,asplit=3[m_intro][m_mid][m_outro];
    [m_intro]atrim=0:8,volume=0.3,afade=t=in:st=0:d=1.5[intro];
    [m_mid]atrim=8:26,asetpts=PTS-STARTPTS,volume=0.3[mid];
    [mid]aloop=loop=-1:size=2147483647,atrim=0:3600[mid_loop];

    [intro][mid_loop]concat=n=2:v=0:a=1[music_timeline];
    [music_timeline][voice]amix=inputs=2:duration=shortest:normalize=0[main];

    [m_outro]atrim=26:34,asetpts=PTS-STARTPTS,volume=0.3,afade=t=out:st=6:d=2[outro];
    [main][outro]acrossfade=d=2:c1=tri:c2=tri[mix]
  " \
  -map "[mix]" \
  -c:a libmp3lame -q:a 2 \
  "$OUTPUT"
```

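
As a sanity check on a render, the expected output length can be estimated from the filter graph: the voice is delayed by 5s, `amix` stops at the shorter input, and `acrossfade` overlaps the 8s outro by 2s. The sketch below encodes that arithmetic; `expected_duration` is a hypothetical name, and MP3 encoder padding means the real file will differ by a small fraction of a second.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original script):
# estimate the final mix length in seconds from the narration length.

expected_duration() {
  local voice_len="$1"  # narration length in seconds
  awk -v v="$voice_len" 'BEGIN {
    delay   = 5         # adelay=5000|5000
    bed_cap = 8 + 3600  # 8s intro + looped bed capped by atrim=0:3600
    outro   = 8         # atrim=26:34
    xfade   = 2         # acrossfade=d=2
    main = v + delay
    if (main > bed_cap) main = bed_cap  # amix duration=shortest
    printf "%.2f\n", main + outro - xfade
  }'
}

expected_duration 600   # prints 611.00
```

Comparing this figure with the `ffprobe` duration of the rendered file is a quick way to spot a truncated or runaway mix.
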
## Simpler Alternative (Intro/Outro Layering Only)

Use this if you want easier debugging. It crossfades the intro into the speech and appends the outro, but it does not keep a music bed under the narration.

```bash
# asetpts resets the outro's timestamps after atrim, so afade's st=6 is
# measured from the start of the trimmed clip rather than the source track.
ffmpeg -y \
  -i "$SPOKEN" \
  -i "$MUSIC" \
  -filter_complex "
    [1:a]atrim=0:8,volume=0.3,afade=t=in:st=0:d=2[intro];
    [1:a]atrim=26:34,asetpts=PTS-STARTPTS,volume=0.3,afade=t=out:st=6:d=2[outro];
    [intro][0:a]acrossfade=d=3:curve1=exp:curve2=exp[voiced];
    [voiced][outro]acrossfade=d=2:curve1=tri:curve2=tri[mix]
  " \
  -map "[mix]" \
  -c:a libmp3lame -q:a 2 \
  "$OUTPUT"
```

## File Preparation Checklist

- Spoken narration: MP3/M4A/WAV, clean and normalized.
- Music source: 34s+ source where `0:8` is intro material and `26:34` is outro material.
- Validate durations first:

```bash
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$SPOKEN"
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$MUSIC"
```

- Test with a short spoken sample before full renders.

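
The duration check in the checklist above can be turned into a pre-flight gate. This is a minimal sketch: the function name `music_long_enough` and the `MIN_MUSIC_SECONDS` variable are hypothetical, and the 34s threshold simply mirrors the `26:34` outro window.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (an assumption, not from the original script).

MIN_MUSIC_SECONDS=34  # the outro window 26:34 needs at least 34s of music

# Succeeds (exit 0) when the given duration in seconds meets the minimum.
music_long_enough() {
  awk -v d="$1" -v min="$MIN_MUSIC_SECONDS" 'BEGIN { exit !(d + 0 >= min + 0) }'
}

# Runs only when a music file is passed as the first argument.
if [ -n "${1:-}" ]; then
  dur="$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1")"
  if music_long_enough "$dur"; then
    echo "OK: music is ${dur}s"
  else
    echo "Music is only ${dur}s; need at least ${MIN_MUSIC_SECONDS}s" >&2
    exit 1
  fi
fi
```
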
## Troubleshooting

- Music too loud: lower `volume=0.3` to `0.25` or `0.2`.
- Voice starts too late/early: adjust `adelay=5000|5000`, and keep the voice `afade` start time (`st=5`) in sync with it.
- Intro overlap too long/short: adjust `afade` and crossfade durations.
- Outro too abrupt: increase `afade=t=out` duration or `acrossfade=d`.
- Want final loudness polish: append `loudnorm` to the last filter chain (for example `acrossfade=...,loudnorm[mix]`). FFmpeg rejects `-af` on a stream that comes out of `-filter_complex`.

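
When the voice entry point is changed, the `adelay` value (milliseconds) and the `afade` start time (seconds) have to move together. The sketch below generates the voice chain from a single delay value; `voice_chain` is a hypothetical helper name, and the labels `voice_raw` and `voice` match the recommended command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original script):
# build the voice filter chain so adelay and afade stay in sync.

voice_chain() {
  local delay_s="${1:-5}"  # seconds of intro before the voice enters
  local fade_s="${2:-3}"   # voice fade-in length in seconds
  local delay_ms
  # adelay takes milliseconds per channel; round to the nearest integer.
  delay_ms="$(awk -v s="$delay_s" 'BEGIN { printf "%d", s * 1000 + 0.5 }')"
  printf '[voice_raw]adelay=%s|%s,afade=t=in:st=%s:d=%s[voice]\n' \
    "$delay_ms" "$delay_ms" "$delay_s" "$fade_s"
}

voice_chain         # prints [voice_raw]adelay=5000|5000,afade=t=in:st=5:d=3[voice]
voice_chain 3.5 2   # prints [voice_raw]adelay=3500|3500,afade=t=in:st=3.5:d=2[voice]
```
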
## Notes

- This is the layered/pro baseline file for AI generation and scripting.
- For broadcast-level polish, the next step is LUFS target normalization plus a limiter (still without sidechain ducking).
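
One possible shape for the LUFS normalization plus limiter step is a `loudnorm` and `alimiter` pair appended to the final filter chain. The sketch below only builds the filter string. The helper name `polish_chain` and the values (-16 LUFS, -1.5 dBTP, an LRA of 11, a limiter ceiling of 0.9) are assumptions chosen for illustration, not targets stated in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original script):
# print a loudnorm + alimiter chain to append after the final acrossfade.

polish_chain() {
  local target_lufs="${1:--16}"  # integrated loudness target (assumed value)
  local true_peak="${2:--1.5}"   # true-peak ceiling in dBTP (assumed value)
  local limit="${3:-0.9}"        # alimiter ceiling, linear scale (assumed value)
  printf 'loudnorm=I=%s:TP=%s:LRA=11,alimiter=limit=%s\n' \
    "$target_lufs" "$true_peak" "$limit"
}

polish_chain   # prints loudnorm=I=-16:TP=-1.5:LRA=11,alimiter=limit=0.9
```

The resulting string would replace the tail of the last chain, for example `acrossfade=d=2:c1=tri:c2=tri,$(polish_chain)[mix]`, which keeps the graph free of sidechain ducking.
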