🎤 Karaoke Video Downloader – PRD (v2.2)
✅ Overview
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using yt-dlp.exe, with advanced tracking, songlist prioritization, and flexible configuration.
📋 Goals
- Download karaoke videos from YouTube channels or playlists.
- Organize downloads by channel (or playlist) in subfolders.
- Avoid re-downloading the same videos (robust tracking).
- Prioritize and track a custom songlist across channels.
- Allow flexible, user-friendly configuration.
🧑‍💻 Target Users
- Karaoke DJs, home karaoke users, event hosts, or anyone needing offline karaoke video libraries.
- Users comfortable with command-line tools.
⚙️ Platform & Stack
- Platform: Windows
- Interface: Command-line (CLI)
- Tech Stack: Python 3.7+, yt-dlp.exe, mutagen (for ID3 tagging)
📥 Input
- YouTube channel or playlist URLs (e.g. `https://www.youtube.com/@SingKingKaraoke/videos`)
- Optional: `data/channels.txt` file with multiple channel URLs (one per line); defaults to this file if not specified
- Optional: `data/songList.json` for prioritized song downloads
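As an illustration of the one-URL-per-line channel file, here is a minimal sketch of a loader. The helper name `load_channel_urls` and the handling of blank lines and `#` comment lines are assumptions made for this example; only the default path `data/channels.txt` comes from this document.

```python
from pathlib import Path
from typing import List, Optional

# Default path taken from the Input section; the helper itself is hypothetical.
DEFAULT_CHANNELS_FILE = Path("data/channels.txt")


def load_channel_urls(path: Optional[Path] = None) -> List[str]:
    """Return the channel URLs listed in a text file, one per line."""
    source = path or DEFAULT_CHANNELS_FILE
    urls: List[str] = []
    for line in source.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Calling `load_channel_urls()` with no argument reads the default file, which mirrors the fallback behaviour described for the songlist modes.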
Example Usage
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
python download_karaoke.py --songlist-only --limit 5
python download_karaoke.py --latest-per-channel --limit 3
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache SingKingKaraoke
📤 Output
- MP4 files in `downloads/<ChannelName>/` subfolders
- All videos tracked in `data/karaoke_tracking.json`
- Songlist progress tracked in `data/songlist_tracking.json`
- Logs in `logs/`
🛠️ Features
- ✅ Channel-based downloads (with per-channel folders)
- ✅ Robust JSON tracking (downloaded, partial, failed, etc.)
- ✅ Batch saving and channel video caching for performance
- ✅ Configurable download resolution and yt-dlp options (`data/config.json`)
- ✅ Songlist integration: prioritize and track custom songlists
- ✅ Songlist-only mode: download only songs from the songlist
- ✅ Global songlist tracking to avoid duplicates across channels
- ✅ ID3 tagging for artist/title in MP4 files (mutagen)
- ✅ Real-time progress and detailed logging
- ✅ Automatic cleanup of extra yt-dlp files
- ✅ Reset/clear channel tracking and files via CLI
- ✅ Clear channel cache via CLI
- ✅ Download plan pre-scan and caching: Before downloading, the tool pre-scans all channels for songlist matches, builds a download plan, and prints stats. The plan is cached for 1 day in data/download_plan_cache.json for fast resuming and reliability. Use --force-download-plan to force a refresh.
- ✅ Latest-per-channel download: Download the latest N videos from each channel in a single batch, with a per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- ✅ Fast mode with early exit: When a limit is set, the tool scans channels and songs in order, downloads each match as soon as it is found, and stops once the number of successful downloads reaches the limit. If a download fails, it keeps scanning until the limit is met or all channels are exhausted.
- ✅ Deduplication across channels: Ensures the same song (by artist + normalized title) is not downloaded more than once, even if it appears in multiple channels. Tracks unique keys and skips duplicates.
- ✅ Fuzzy matching: Optionally use fuzzy string matching for songlist-to-video matching with configurable threshold (0-100, default 85). Uses rapidfuzz if available, falls back to difflib.
- ✅ Default channel file: If no --file is specified for songlist-only or latest-per-channel modes, automatically uses data/channels.txt as the default channel list.
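The fast mode and cross-channel deduplication described in the list above can be sketched together. This is a rough illustration, not the tool's actual code: the function names, the `Video` dictionary shape, the injected `download` callable, and the exact normalization rule are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical shape: {"artist": ..., "title": ..., "url": ...}
Video = Dict[str, str]


def song_key(artist: str, title: str) -> str:
    """Build a dedup key from artist + normalized title."""
    def norm(text: str) -> str:
        # Assumed normalization: lowercase, drop punctuation, collapse spaces.
        text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
        return " ".join(text.split())
    return f"{norm(artist)}|{norm(title)}"


def fast_download(
    channels: Iterable[Tuple[str, List[Video]]],
    wanted: Set[str],
    download: Callable[[Video], bool],
    limit: int,
) -> List[str]:
    """Scan channels in order, download matches immediately, stop at `limit` successes."""
    done: List[str] = []
    seen: Set[str] = set()
    for _channel, videos in channels:
        for video in videos:
            key = song_key(video["artist"], video["title"])
            if key not in wanted or key in seen:
                continue  # not on the songlist, or already downloaded elsewhere
            if download(video):
                seen.add(key)
                done.append(key)
                if len(done) >= limit:
                    return done  # early exit once the limit is satisfied
            # On failure, keep scanning: another channel may offer the same song.
    return done
```

Because a key is only marked as seen after a successful download, a failed attempt on one channel leaves the song eligible on later channels, which matches the "continues scanning" behaviour described for failed downloads.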
📂 Folder Structure
KaroakeVideoDownloader/
├── karaoke_downloader/ # All core Python code and utilities
│ ├── downloader.py # Main downloader class
│ ├── cli.py # CLI entry point
│ ├── id3_utils.py # ID3 tagging helpers
│ ├── songlist_manager.py # Songlist logic
│ ├── youtube_utils.py # YouTube helpers
│ ├── tracking_manager.py # Tracking logic
│ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI
├── data/ # All config, tracking, cache, and songlist files
│ ├── config.json
│ ├── karaoke_tracking.json
│ ├── songlist_tracking.json
│ ├── channel_cache.json
│ ├── channels.txt
│ └── songList.json
├── downloads/ # All video output
│ └── [ChannelName]/ # Per-channel folders
├── logs/ # Download logs
├── downloader/yt-dlp.exe # yt-dlp binary
├── tests/ # Diagnostic and test scripts
│ └── test_installation.py
├── download_karaoke.py # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat # (optional Windows launcher)
🚦 CLI Options (Summary)
- `--file <data/channels.txt>`: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel <CHANNEL_NAME>`: Reset all tracking and files for a channel
- `--reset-songlist`: When used with --reset-channel, also reset songlist songs for this channel
- `--clear-cache <CHANNEL_ID|all>`: Clear channel video cache for a specific channel or all
- `--force-download-plan`: Force refresh the download plan cache (re-scan all channels for matches)
- `--latest-per-channel`: Download the latest N videos from each channel (use with --limit)
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
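A subset of these options could be wired up with the standard library's `argparse` roughly as follows. This is a sketch only: the flag names and defaults come from the summary above, while the function names `build_parser` and `parse_args` and the way the default channel file is applied after parsing are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented flags (names taken from the summary)."""
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("url", nargs="?", help="YouTube channel or playlist URL")
    parser.add_argument("--file", help="Text file with one channel URL per line")
    parser.add_argument("--songlist-only", action="store_true")
    parser.add_argument("--latest-per-channel", action="store_true")
    parser.add_argument("--limit", type=int, metavar="N")
    parser.add_argument("--fuzzy-match", action="store_true")
    parser.add_argument("--fuzzy-threshold", type=int, default=85, metavar="N")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and apply the documented default channel file."""
    args = build_parser().parse_args(argv)
    # Songlist-only and latest-per-channel modes fall back to data/channels.txt.
    if args.file is None and (args.songlist_only or args.latest_per_channel):
        args.file = "data/channels.txt"
    return args
```

For example, `parse_args(["--songlist-only", "--limit", "5"])` yields `limit == 5` and `file == "data/channels.txt"`, while a bare URL argument leaves `file` unset.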
🧠 Logic Highlights
- Tracking: All downloads, statuses, and formats are tracked in JSON files for reliability and deduplication.
- Songlist: Loads and normalizes `data/songList.json`, matches against available videos, and prioritizes or restricts downloads accordingly.
- Batch/Caching: Channel video lists are cached to minimize API calls; tracking is batch-saved for performance.
- ID3 Tagging: Artist/title extracted from video title and embedded in MP4 files.
- Cleanup: Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
- Reset/Clear: Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
- Download plan pre-scan: Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
- Latest-per-channel plan: Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
- Fast mode with early exit: When a limit is set, the tool scans channels and songs in order, downloads each match as soon as it is found, and stops once the number of successful downloads reaches the limit. For small limits this is much faster than the full pre-scan approach.
- Deduplication across channels: Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
- Fuzzy matching: Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
- Default channel file: For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
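The fuzzy matching highlight, with its rapidfuzz-or-difflib fallback, might look something like the sketch below. The helper names and the normalization step are hypothetical, and the choice of `fuzz.ratio` as the scorer is an assumption; the tool may well use a different rapidfuzz scorer that copes better with extra words in video titles.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz  # optional dependency
    _HAVE_RAPIDFUZZ = True
except ImportError:
    _HAVE_RAPIDFUZZ = False


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, using rapidfuzz when it is installed."""
    a, b = normalize(a), normalize(b)
    if _HAVE_RAPIDFUZZ:
        return float(fuzz.ratio(a, b))
    # difflib returns a 0-1 ratio, so scale it to the same 0-100 range.
    return SequenceMatcher(None, a, b).ratio() * 100.0


def is_match(song: str, video_title: str, threshold: float = 85) -> bool:
    """True when the score meets the threshold (default 85, as documented)."""
    return similarity(song, video_title) >= threshold
```

With either backend, a one-letter typo such as "bohemian rhapsodi" still clears the default threshold against "bohemian rhapsody", while unrelated titles fall well below it.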
🚀 Future Enhancements
- Web UI for easier management
- More advanced song matching (multi-language)
- Download scheduling and retry logic
- More granular status reporting
- Parallel downloads for improved speed