🎤 Karaoke Video Downloader
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using yt-dlp.exe, with advanced tracking, songlist prioritization, and flexible configuration.
✨ Features
- 🎵 Channel & Playlist Downloads: Download all videos from a YouTube channel or playlist
- 📂 Organized Storage: Each channel gets its own folder in downloads/
- 📝 Robust Tracking: Tracks all downloads, statuses, and formats in JSON
- 🏆 Songlist Prioritization: Prioritize or restrict downloads to a custom songlist
- 🔄 Batch Saving & Caching: Efficient, minimizes API calls
- 🏷️ ID3 Tagging: Adds artist/title metadata to MP4 files
- 🧹 Automatic Cleanup: Removes extra yt-dlp files
- 📈 Real-Time Progress: Detailed console and log output
- 🧹 Reset/Clear Channel: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ Latest-per-channel download: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- 🧩 Fuzzy Matching: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
- ⚡ Fast Mode with Early Exit: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 Deduplication Across Channels: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 Default Channel File: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
- 🛡️ Robust Interruption Handling: Progress is saved after each download, preventing re-downloads if the process is interrupted
- ⚡ Optimized Scanning: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ Server Duplicates Tracking: Automatically checks against local songs.json file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
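The fast-mode and cross-channel deduplication bullets above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `song_key`, `plan_downloads`, the `(artist, title)` tuples, and the exact-match key are all assumptions made for the example. It also only builds a plan, whereas the tool counts successful downloads toward the limit.

```python
from typing import Dict, List, Optional, Tuple

Song = Tuple[str, str]  # (artist, title); hypothetical shape for this sketch


def song_key(artist: str, title: str) -> str:
    """Normalize an artist/title pair into a case-insensitive lookup key."""
    return f"{artist.strip().lower()}|{title.strip().lower()}"


def plan_downloads(
    channels: Dict[str, List[Song]],
    songlist: List[Song],
    limit: Optional[int] = None,
) -> List[Tuple[str, Song]]:
    """Scan channels in order, keep each wanted song once, stop at `limit`."""
    wanted = {song_key(a, t) for a, t in songlist}  # pre-processed lookup set
    seen = set()  # keys already planned from any channel (deduplication)
    plan: List[Tuple[str, Song]] = []
    for channel, videos in channels.items():
        for artist, title in videos:
            key = song_key(artist, title)
            if key not in wanted or key in seen:
                continue
            seen.add(key)
            plan.append((channel, (artist, title)))
            if limit is not None and len(plan) >= limit:
                return plan  # early exit once the limit is reached
    return plan
```

Building the `wanted` set once keeps each per-video check to a constant-time lookup, and the `seen` set is what prevents the same song from being taken from a second channel.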
🏗️ Architecture
The codebase has been refactored into a modular architecture for better maintainability and separation of concerns:
- fuzzy_matcher.py: Fuzzy matching logic and similarity functions
- download_planner.py: Download plan building and channel scanning (optimized)
- cache_manager.py: Cache operations and file I/O management
- server_manager.py: Server songs loading and server duplicates tracking
- video_downloader.py: Core video download execution and orchestration
- channel_manager.py: Channel and file management operations
- downloader.py: Main orchestrator and CLI interface
📋 Requirements
- Windows 10/11
- Python 3.7+
- yt-dlp.exe (in downloader/)
- mutagen (for ID3 tagging, optional)
- ffmpeg/ffprobe (for video validation, optional but recommended)
- rapidfuzz (for fuzzy matching, optional, falls back to difflib)
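The optional rapidfuzz dependency with a difflib fallback might look something like the sketch below. The function names `similarity` and `is_match` are hypothetical, and the real fuzzy_matcher.py may normalize titles or score them differently; only the 0-100 threshold scale and its default of 85 come from this README.

```python
from difflib import SequenceMatcher

try:
    # Optional dependency: faster scoring when installed.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        """Return a 0-100 similarity score using rapidfuzz."""
        return fuzz.ratio(a.lower(), b.lower())

except ImportError:

    def similarity(a: str, b: str) -> float:
        """Return a 0-100 similarity score using the standard library."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_match(song: str, video_title: str, threshold: float = 85) -> bool:
    """Treat two strings as the same song when the score meets the threshold."""
    return similarity(song, video_title) >= threshold
```

The two backends use different algorithms, so scores for the same pair can differ slightly; a threshold tuned with one backend may need a small adjustment with the other.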
🚀 Quick Start
Download a Channel
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
Download Only Songlist Songs (Fast Mode)
python download_karaoke.py --songlist-only --limit 5
Download with Fuzzy Matching
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
Download Latest N Videos Per Channel
python download_karaoke.py --latest-per-channel --limit 5
Download Latest N Videos Per Channel (with fuzzy matching)
python download_karaoke.py --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
Prioritize Songlist in Download Queue
python download_karaoke.py --songlist-priority
Show Songlist Download Progress
python download_karaoke.py --songlist-status
Limit Number of Downloads
python download_karaoke.py --limit 5
Override Resolution
python download_karaoke.py --resolution 1080p
Reset/Start Over for a Channel
python download_karaoke.py --reset-channel SingKingKaraoke
Reset Channel and Songlist Songs
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
Clear Channel Cache
python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all
🧠 Songlist Integration
- Place your prioritized song list in data/songList.json (see example format below).
- The tool will match and prioritize these songs across all available channel videos.
- Use --songlist-only to download only these songs, or --songlist-priority to prioritize them in the queue.
- Download progress for the songlist is tracked globally in data/songlist_tracking.json.
Example data/songList.json
[
{ "artist": "Taylor Swift", "title": "Cruel Summer" },
{ "artist": "Billie Eilish", "title": "Happier Than Ever" }
]
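A file in the format above could be read with a few lines of Python. The helper below is a hypothetical sketch (`load_songlist` is not a documented function of this tool); it assumes entries without both an artist and a title should be skipped, which the README does not specify.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_songlist(path: str = "data/songList.json") -> List[Dict[str, str]]:
    """Load songlist entries, keeping only those with an artist and a title."""
    file = Path(path)
    if not file.exists():
        return []  # assumption: a missing songlist means "nothing to prioritize"
    with file.open(encoding="utf-8") as handle:
        entries = json.load(handle)
    return [
        entry
        for entry in entries
        if isinstance(entry, dict) and entry.get("artist") and entry.get("title")
    ]
```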
🛠️ Tracking & Caching
- data/karaoke_tracking.json: Tracks all downloads, statuses, and formats
- data/songlist_tracking.json: Tracks global songlist download progress
- data/server_duplicates_tracking.json: Tracks songs found to be duplicates on the server for future skipping
- data/channel_cache.json: Caches channel video lists for performance
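Saving progress after every download is what makes interrupted runs resumable. One common way to do that safely is to write the JSON to a temporary file and then swap it into place, as sketched below. The helper names and the `{video_id: {"status": ...}}` layout are assumptions for illustration; the actual schema of the tracking files is not described in this README.

```python
import json
import os
import tempfile


def save_json_atomic(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same folder, then replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        os.replace(tmp_path, path)  # readers see the old or the new file, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def mark_downloaded(path: str, video_id: str, status: str = "downloaded") -> dict:
    """Record one video's status and persist the tracking file immediately."""
    tracking = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            tracking = json.load(handle)
    tracking[video_id] = {"status": status}  # hypothetical record layout
    save_json_atomic(path, tracking)
    return tracking
```

Calling `mark_downloaded` once per finished video means a crash or Ctrl+C loses at most the download that was in flight.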
📂 Folder Structure
KaroakeVideoDownloader/
├── karaoke_downloader/ # All core Python code and utilities
│ ├── downloader.py # Main orchestrator and CLI interface
│ ├── cli.py # CLI entry point
│ ├── fuzzy_matcher.py # Fuzzy matching logic and similarity functions
│ ├── download_planner.py # Download plan building and channel scanning (optimized)
│ ├── cache_manager.py # Cache operations and file I/O management
│ ├── server_manager.py # Server songs loading and server duplicates tracking
│ ├── video_downloader.py # Core video download execution and orchestration
│ ├── channel_manager.py # Channel and file management operations
│ ├── id3_utils.py # ID3 tagging helpers
│ ├── songlist_manager.py # Songlist logic
│ ├── youtube_utils.py # YouTube helpers
│ ├── tracking_manager.py # Tracking logic
│ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI
├── data/ # All config, tracking, cache, and songlist files
│ ├── config.json
│ ├── karaoke_tracking.json
│ ├── songlist_tracking.json
│ ├── channel_cache.json
│ ├── channels.txt
│ └── songList.json
├── downloads/ # All video output
│ └── [ChannelName]/ # Per-channel folders
├── logs/ # Download logs
├── downloader/yt-dlp.exe # yt-dlp binary
├── tests/ # Diagnostic and test scripts
│ └── test_installation.py
├── download_karaoke.py # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat # (optional Windows launcher)
🚦 CLI Options
- --file <data/channels.txt>: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
- --songlist-priority: Prioritize songlist songs in download queue
- --songlist-only: Download only songs from the songlist
- --songlist-status: Show songlist download progress
- --limit <N>: Limit number of downloads (enables fast mode with early exit)
- --resolution <720p|1080p|...>: Override resolution
- --status: Show download/tracking status
- --reset-channel <CHANNEL_NAME>: Reset all tracking and files for a channel
- --reset-songlist: When used with --reset-channel, also reset songlist songs for this channel
- --clear-cache <CHANNEL_ID|all>: Clear channel video cache for a specific channel or all
- --clear-server-duplicates: Clear server duplicates tracking (allows re-checking songs against server)
- --latest-per-channel: Download the latest N videos from each channel (use with --limit)
- --fuzzy-match: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- --fuzzy-threshold <N>: Fuzzy match threshold (0-100, default 85)
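For readers curious how a subset of these flags could be wired up, here is a hedged argparse sketch. It is not the tool's cli.py: `build_parser` and `resolve_channel_file` are hypothetical names, only some flags are shown, and which modes count as "songlist modes" for the default channel file is an assumption.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering a subset of the documented flags."""
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("url", nargs="?", help="Channel or playlist URL")
    parser.add_argument("--file", help="Text file listing channels")
    parser.add_argument("--songlist-only", action="store_true")
    parser.add_argument("--songlist-priority", action="store_true")
    parser.add_argument("--latest-per-channel", action="store_true")
    parser.add_argument("--limit", type=int, help="Maximum number of downloads")
    parser.add_argument("--resolution", help="For example 720p or 1080p")
    parser.add_argument("--fuzzy-match", action="store_true")
    parser.add_argument("--fuzzy-threshold", type=int, default=85)
    return parser


def resolve_channel_file(args: argparse.Namespace) -> Optional[str]:
    """Fall back to data/channels.txt when a list-based mode omits --file."""
    if args.file:
        return args.file
    # Assumption: these three flags are the modes that use the default file.
    if args.songlist_only or args.songlist_priority or args.latest_per_channel:
        return "data/channels.txt"
    return None
```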
📝 Example Usage
# Fast mode with fuzzy matching (no need to specify --file)
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
# Latest videos per channel
python download_karaoke.py --latest-per-channel --limit 5
# Traditional full scan (no limit)
python download_karaoke.py --songlist-only
# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates
🏷️ ID3 Tagging
- Adds artist/title/album/genre to MP4 files using mutagen (if installed)
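As a rough sketch of what tagging with mutagen can look like: MP4 containers store metadata as iTunes-style atoms rather than ID3 frames, so the example uses mutagen's `MP4` class and its standard atom keys. The helper names `build_mp4_tags` and `tag_mp4` are hypothetical, and returning `False` when mutagen is missing is an assumption about how "optional" is handled.

```python
from typing import Dict, List, Optional


def build_mp4_tags(
    artist: str,
    title: str,
    album: Optional[str] = None,
    genre: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Map metadata fields to the MP4 atom names that mutagen expects."""
    tags = {"\xa9ART": [artist], "\xa9nam": [title]}
    if album:
        tags["\xa9alb"] = [album]
    if genre:
        tags["\xa9gen"] = [genre]
    return tags


def tag_mp4(path: str, artist: str, title: str, **extra: str) -> bool:
    """Write tags to an MP4 file; return False if mutagen is not installed."""
    try:
        from mutagen.mp4 import MP4  # optional dependency, imported lazily
    except ImportError:
        return False
    video = MP4(path)
    for key, value in build_mp4_tags(artist, title, **extra).items():
        video[key] = value
    video.save()
    return True
```

A call such as `tag_mp4("downloads/SingKingKaraoke/song.mp4", "Taylor Swift", "Cruel Summer")` would then set the artist and title on that file (the path here is only an example).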
🧹 Cleanup
- Removes .info.json and .meta files after download
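The cleanup step amounts to deleting sidecar files by suffix. A minimal sketch, assuming the files sit somewhere under the channel's download folder (`cleanup_sidecars` is a hypothetical helper, not the tool's own function):

```python
from pathlib import Path
from typing import List, Sequence


def cleanup_sidecars(
    folder: str, suffixes: Sequence[str] = (".info.json", ".meta")
) -> List[str]:
    """Delete leftover sidecar files under `folder`; return the removed names."""
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(folder).rglob("*")):
        if path.is_file() and path.name.endswith(tuple(suffixes)):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```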
🛠️ Configuration
- All options are in data/config.json (format, resolution, metadata, etc.)
- You can edit this file or use CLI flags to override
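The "file first, CLI flags override" behavior can be pictured with the sketch below. It assumes a flat dictionary of settings and a hypothetical `load_config` helper; the real config.json may be nested and use different key names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_config(
    path: str = "data/config.json",
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Read the config file, then apply any CLI values that were actually set."""
    config: Dict[str, Any] = {}
    file = Path(path)
    if file.exists():
        config = json.loads(file.read_text(encoding="utf-8"))
    for key, value in (overrides or {}).items():
        if value is not None:  # None means the flag was not passed
            config[key] = value
    return config
```

For example, `load_config(overrides={"resolution": "1080p"})` would keep every setting from the file except the resolution.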
🐞 Troubleshooting
- Ensure yt-dlp.exe is in the downloader/ folder
- Check logs/ for error details
- Use python -m karaoke_downloader.check_resolution to verify video quality
- If you see errors about ffmpeg/ffprobe, install ffmpeg and ensure it is in your PATH
- For best fuzzy matching, install rapidfuzz: pip install rapidfuzz (otherwise falls back to slower, less accurate difflib)
Happy Karaoke! 🎤