🎤 Karaoke Video Downloader
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using yt-dlp.exe, with advanced tracking, songlist prioritization, and flexible configuration.
✨ Features
- 🎵 Channel & Playlist Downloads: Download all videos from a YouTube channel or playlist
- 📂 Organized Storage: Each channel gets its own folder in downloads/
- 📝 Robust Tracking: Tracks all downloads, statuses, and formats in JSON
- 🏆 Songlist Prioritization: Prioritize or restrict downloads to a custom songlist
- 🔄 Batch Saving & Caching: Efficient, minimizes API calls
- 🏷️ ID3 Tagging: Adds artist/title metadata to MP4 files
- 🧹 Automatic Cleanup: Removes extra yt-dlp files
- 📈 Real-Time Progress: Detailed console and log output
- 🧹 Reset/Clear Channel: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ Latest-per-channel download: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- 🧩 Fuzzy Matching: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
- ⚡ Fast Mode with Early Exit: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 Deduplication Across Channels: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 Default Channel File: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
- 🛡️ Robust Interruption Handling: Progress is saved after each download, preventing re-downloads if the process is interrupted
- ⚡ Optimized Scanning: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ Server Duplicates Tracking: Automatically checks against local songs.json file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
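As a rough illustration of how fast mode, pre-processed lookups, and cross-channel deduplication could fit together, here is a minimal sketch. The function names, the data shapes, and the exact-match key are hypothetical and are not taken from the project's code; the sketch builds a plan instead of downloading, so it stops at the limit rather than at the Nth successful download.

```python
def normalize(artist, title):
    """Build a case-insensitive lookup key (hypothetical normalization)."""
    return f"{artist.strip().lower()} - {title.strip().lower()}"


def plan_downloads(channels, songlist, limit=None):
    """Pick at most `limit` videos, one per song, scanning channels in order.

    channels: dict of channel name -> list of {"artist", "title", "id"} dicts
    songlist: list of {"artist", "title"} dicts
    """
    # Pre-processed lookup: build the set of wanted keys once.
    wanted = {normalize(s["artist"], s["title"]) for s in songlist}
    seen = set()  # songs already planned, regardless of channel
    plan = []
    for channel, videos in channels.items():
        for video in videos:
            key = normalize(video["artist"], video["title"])
            if key not in wanted or key in seen:
                continue  # not on the songlist, or already taken from another channel
            seen.add(key)
            plan.append((channel, video["id"]))
            if limit is not None and len(plan) >= limit:
                return plan  # early exit once the limit is reached
    return plan
```

Because each video is checked against a set, the scan costs one lookup per video after the songlist has been indexed once.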
🏗️ Architecture
The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:
Core Modules:
- downloader.py: Main orchestrator and CLI interface
- video_downloader.py: Core video download execution and orchestration
- tracking_manager.py: Download tracking and status management
- download_planner.py: Download plan building and channel scanning
- cache_manager.py: Cache operations and file I/O management
- channel_manager.py: Channel and file management operations
- songlist_manager.py: Songlist operations and tracking
- server_manager.py: Server song availability checking
- fuzzy_matcher.py: Fuzzy matching logic and similarity functions
Utility Modules:
- youtube_utils.py: Centralized YouTube operations and yt-dlp command generation
- error_utils.py: Standardized error handling and formatting
- download_pipeline.py: Abstracted download → verify → tag → track pipeline
- id3_utils.py: ID3 tagging utilities
- config_manager.py: Configuration management
- resolution_cli.py: Resolution checking utilities
- tracking_cli.py: Tracking management CLI
Benefits:
- Centralized Utilities: Common operations (yt-dlp commands, error handling) are centralized
- Reduced Duplication: Eliminated code duplication across modules
- Consistency: Standardized error messages and processing pipelines
- Maintainability: Changes isolated to specific modules
- Testability: Modular components can be tested independently
📋 Requirements
- Windows 10/11
- Python 3.7+
- yt-dlp.exe (in downloader/)
- mutagen (for ID3 tagging, optional)
- ffmpeg/ffprobe (for video validation, optional but recommended)
- rapidfuzz (for fuzzy matching, optional, falls back to difflib)
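The optional rapidfuzz dependency with a difflib fallback can be sketched as follows. This is an assumption about how such a fallback might look, not the contents of fuzzy_matcher.py; `similarity` and `is_match` are hypothetical names, and `fuzz.ratio` is only one of several rapidfuzz scorers the project could be using.

```python
import difflib

try:
    from rapidfuzz import fuzz  # optional dependency

    def similarity(a, b):
        """Return a 0-100 similarity score using rapidfuzz."""
        return fuzz.ratio(a.lower(), b.lower())

except ImportError:

    def similarity(a, b):
        """Return a 0-100 similarity score using only the standard library."""
        matcher = difflib.SequenceMatcher(None, a.lower(), b.lower())
        return 100.0 * matcher.ratio()


def is_match(song, video_title, threshold=85):
    """True when the two strings score at or above the threshold (0-100)."""
    return similarity(song, video_title) >= threshold
```

The default of 85 mirrors the documented default for --fuzzy-threshold; lowering it accepts looser matches at the cost of more false positives.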
🚀 Quick Start
💡 Pro Tip: For a complete list of all available commands, see commands.txt. You can copy and paste any command directly into your terminal!
Download a Channel
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
Download Only Songlist Songs (Fast Mode)
python download_karaoke.py --songlist-only --limit 5
Focus on Specific Playlists by Title
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"
Download with Fuzzy Matching
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
Download Latest N Videos Per Channel
python download_karaoke.py --latest-per-channel --limit 5
Download Latest N Videos Per Channel (with fuzzy matching)
python download_karaoke.py --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
Prioritize Songlist in Download Queue
python download_karaoke.py --songlist-priority
Show Songlist Download Progress
python download_karaoke.py --songlist-status
Limit Number of Downloads
python download_karaoke.py --limit 5
Override Resolution
python download_karaoke.py --resolution 1080p
Reset/Start Over for a Channel
python download_karaoke.py --reset-channel SingKingKaraoke
Reset Channel and Songlist Songs
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
Clear Channel Cache
python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all
🧠 Songlist Integration
- Place your prioritized song list in data/songList.json (see example format below).
- The tool will match and prioritize these songs across all available channel videos.
- Use --songlist-only to download only these songs, or --songlist-priority to prioritize them in the queue.
- Use --songlist-focus to download only songs from specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100").
- Download progress for the songlist is tracked globally in data/songlist_tracking.json.
Example data/songList.json
[
  {
    "title": "2025 - Apple Top 50",
    "songs": [
      { "artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1 },
      { "artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2 }
    ]
  },
  {
    "title": "2024 - Billboard Hot 100",
    "songs": [
      { "artist": "Taylor Swift", "title": "Cruel Summer", "position": 1 },
      { "artist": "Billie Eilish", "title": "Happier Than Ever", "position": 2 }
    ]
  }
]
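To show how a file in this format might be consumed, the sketch below flattens the playlists into (artist, title) pairs and optionally keeps only the playlists named in a focus set, similar in spirit to --songlist-focus. The function name `load_songs` and the inline sample are illustrative assumptions, not the project's loader.

```python
import json

# Small inline sample in the same shape as data/songList.json.
EXAMPLE = """
[
  {"title": "2025 - Apple Top 50",
   "songs": [{"artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2},
             {"artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1}]},
  {"title": "2024 - Billboard Hot 100",
   "songs": [{"artist": "Taylor Swift", "title": "Cruel Summer", "position": 1}]}
]
"""


def load_songs(text, focus=None):
    """Flatten the playlists into (artist, title) pairs, ordered by position.

    text:  the JSON document as a string
    focus: optional collection of playlist titles to keep
    """
    songs = []
    for playlist in json.loads(text):
        if focus and playlist["title"] not in focus:
            continue  # skip playlists outside the requested focus
        for song in sorted(playlist["songs"], key=lambda s: s["position"]):
            songs.append((song["artist"], song["title"]))
    return songs


everything = load_songs(EXAMPLE)
billboard_only = load_songs(EXAMPLE, focus={"2024 - Billboard Hot 100"})
```

For a real run, the text would come from reading data/songList.json with UTF-8 encoding.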
🛠️ Tracking & Caching
- data/karaoke_tracking.json: Tracks all downloads, statuses, and formats
- data/songlist_tracking.json: Tracks global songlist download progress
- data/server_duplicates_tracking.json: Tracks songs found to be duplicates on the server for future skipping
- data/channel_cache.json: Caches channel video lists for performance
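Saving progress after every download is only safe if an interruption cannot leave a half-written JSON file behind. One common way to get that property is to write to a temporary file and then swap it into place. The sketch below assumes a flat `{video_id: {"status": ...}}` layout, which is a hypothetical schema and not the actual structure of karaoke_tracking.json.

```python
import json
import os
import tempfile


def load_tracking(path):
    """Read the tracking file, or return an empty dict if it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def save_tracking(path, data):
    """Write to a temp file first, then replace the target in a single step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        os.replace(tmp_path, path)  # the old file stays intact until this call
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave stray temp files behind
        raise


def mark_downloaded(path, video_id, status="DOWNLOADED"):
    """Record one finished download and persist it immediately."""
    data = load_tracking(path)
    data[video_id] = {"status": status}  # hypothetical record shape
    save_tracking(path, data)
```

Calling `mark_downloaded` once per completed video means a restart only has to skip the IDs already present in the file.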
📂 Folder Structure
KaroakeVideoDownloader/
├── commands.txt # Complete CLI commands reference (copy/paste ready)
├── karaoke_downloader/ # All core Python code and utilities
│ ├── downloader.py # Main orchestrator and CLI interface
│ ├── cli.py # CLI entry point
│ ├── video_downloader.py # Core video download execution and orchestration
│ ├── tracking_manager.py # Download tracking and status management
│ ├── download_planner.py # Download plan building and channel scanning
│ ├── cache_manager.py # Cache operations and file I/O management
│ ├── channel_manager.py # Channel and file management operations
│ ├── songlist_manager.py # Songlist operations and tracking
│ ├── server_manager.py # Server song availability checking
│ ├── fuzzy_matcher.py # Fuzzy matching logic and similarity functions
│ ├── youtube_utils.py # Centralized YouTube operations and yt-dlp commands
│ ├── error_utils.py # Standardized error handling and formatting
│ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline
│ ├── id3_utils.py # ID3 tagging utilities
│ ├── config_manager.py # Configuration management
│ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI
├── data/ # All config, tracking, cache, and songlist files
│ ├── config.json
│ ├── karaoke_tracking.json
│ ├── songlist_tracking.json
│ ├── channel_cache.json
│ ├── channels.txt
│ └── songList.json
├── downloads/ # All video output
│ └── [ChannelName]/ # Per-channel folders
├── logs/ # Download logs
├── downloader/yt-dlp.exe # yt-dlp binary
├── tests/ # Diagnostic and test scripts
│ └── test_installation.py
├── download_karaoke.py # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat # (optional Windows launcher)
🚦 CLI Options
📋 Complete Command Reference: See commands.txt for all available commands with examples, ready to copy and paste!
Key Options:
- --file <data/channels.txt>: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
- --songlist-priority: Prioritize songlist songs in download queue
- --songlist-only: Download only songs from the songlist
- --songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...: Focus on specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100")
- --songlist-status: Show songlist download progress
- --limit <N>: Limit number of downloads (enables fast mode with early exit)
- --resolution <720p|1080p|...>: Override resolution
- --status: Show download/tracking status
- --reset-channel <CHANNEL_NAME>: Reset all tracking and files for a channel
- --reset-songlist: When used with --reset-channel, also reset songlist songs for this channel
- --clear-cache <CHANNEL_ID|all>: Clear channel video cache for a specific channel or all
- --clear-server-duplicates: Clear server duplicates tracking (allows re-checking songs against server)
- --latest-per-channel: Download the latest N videos from each channel (use with --limit)
- --fuzzy-match: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- --fuzzy-threshold <N>: Fuzzy match threshold (0-100, default 85)
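For readers who want to see how a handful of these options could be declared, here is a minimal argparse sketch. It covers only a subset of the flags, and it is an illustration under stated assumptions, not the project's actual parser: the positional `url` argument, the help strings, and the `build_parser` name are hypothetical.

```python
import argparse


def build_parser():
    """Declare a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("url", nargs="?", help="channel or playlist URL")
    parser.add_argument("--file", help="text file with one channel per line")
    parser.add_argument("--songlist-only", action="store_true")
    parser.add_argument("--songlist-priority", action="store_true")
    # nargs="+" lets several quoted playlist titles follow a single flag.
    parser.add_argument("--songlist-focus", nargs="+", metavar="PLAYLIST_TITLE")
    parser.add_argument("--limit", type=int, help="maximum number of downloads")
    parser.add_argument("--resolution", help="for example 720p or 1080p")
    parser.add_argument("--latest-per-channel", action="store_true")
    parser.add_argument("--fuzzy-match", action="store_true")
    parser.add_argument("--fuzzy-threshold", type=int, default=85)
    return parser


args = build_parser().parse_args(
    ["--songlist-focus", "2025 - Apple Top 50", "2024 - Billboard Hot 100",
     "--limit", "5"]
)
```

With this layout, `args.songlist_focus` holds both titles as a list and `args.fuzzy_threshold` falls back to 85 when the flag is omitted.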
📝 Example Usage
💡 For complete examples: See commands.txt for all command variations with explanations!
# Fast mode with fuzzy matching (no need to specify --file)
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
# Latest videos per channel
python download_karaoke.py --latest-per-channel --limit 5
# Traditional full scan (no limit)
python download_karaoke.py --songlist-only
# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates
🏷️ ID3 Tagging
- Adds artist/title/album/genre to MP4 files using mutagen (if installed)
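MP4 containers store metadata in iTunes-style atoms rather than ID3 frames, and mutagen exposes them through its MP4 class. The sketch below shows one way the four fields could be mapped to atom keys. The helper names are hypothetical and this is not the code in id3_utils.py; `apply_tags` imports mutagen lazily so the rest of the example works without it.

```python
def build_mp4_tags(artist, title, album=None, genre=None):
    """Map plain fields to the MP4 atom keys that mutagen expects."""
    tags = {
        "\xa9ART": [artist],  # artist atom
        "\xa9nam": [title],   # title atom
    }
    if album:
        tags["\xa9alb"] = [album]
    if genre:
        tags["\xa9gen"] = [genre]
    return tags


def apply_tags(path, tags):
    """Write the tags to an MP4 file; requires the optional mutagen package."""
    from mutagen.mp4 import MP4  # imported here so mutagen stays optional

    video = MP4(path)
    for key, value in tags.items():
        video[key] = value
    video.save()


tags = build_mp4_tags("Taylor Swift", "Cruel Summer", genre="Karaoke")
```

Passing `tags` and the path of a downloaded file to `apply_tags` would then write the metadata in place.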
🧹 Cleanup
- Removes .info.json and .meta files after download
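A cleanup pass of this kind can be sketched with pathlib. The function name and the choice to scan a single folder without recursion are assumptions made for illustration.

```python
from pathlib import Path

EXTRA_SUFFIXES = (".info.json", ".meta")


def remove_extra_files(folder):
    """Delete leftover metadata files in `folder` and return the removed names."""
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith(EXTRA_SUFFIXES):
            path.unlink()
            removed.append(path.name)
    return removed
```

Returning the removed names makes it easy to log what was cleaned up after each download.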
🛠️ Configuration
- All options are in data/config.json (format, resolution, metadata, etc.)
- You can edit this file or use CLI flags to override
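The precedence between the config file and CLI flags can be pictured as a simple merge in which any flag that was actually supplied wins. The key names below are hypothetical, since the real layout of data/config.json is not shown here.

```python
import json


def effective_config(config_text, overrides):
    """Start from the file's values, then apply CLI overrides that are not None."""
    config = json.loads(config_text)
    for key, value in overrides.items():
        if value is not None:  # None means the flag was not given
            config[key] = value
    return config


# Hypothetical file contents and parsed CLI values.
settings = effective_config(
    '{"resolution": "720p", "format": "mp4"}',
    {"resolution": "1080p", "format": None},
)
```

Here the resolution comes from the command line while the format keeps the value from the file.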
📋 Command Reference File
commands.txt contains a comprehensive list of all CLI commands with explanations. This file is designed for easy copy/paste usage and includes:
- All basic download commands
- Songlist operations
- Latest-per-channel downloads
- Cache and tracking management
- Reset and cleanup operations
- Advanced combinations
- Common workflows
- Troubleshooting commands
🔄 Maintenance Note: The commands.txt file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.
🔧 Refactoring Improvements (v3.2)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication:
Key Improvements
- Centralized yt-dlp Command Generation: Standardized command building and execution across all download operations
- Enhanced Error Handling: Structured exception hierarchy with consistent error messages and formatting
- Abstracted Download Pipeline: Reusable download → verify → tag → track process for consistent processing
- Reduced Code Duplication: Eliminated duplicate code across modules through centralized utilities
New Utility Modules
- youtube_utils.py: Centralized YouTube operations and yt-dlp command generation
- error_utils.py: Standardized error handling with structured exception hierarchy
- download_pipeline.py: Abstracted download pipeline for consistent processing
Benefits
- Improved Maintainability: Changes to yt-dlp configuration only require updates in one place
- Better Error Handling: Consistent error messages and better debugging context
- Enhanced Testability: Modular components can be tested independently
- Reduced Complexity: Single source of truth for common operations
🐞 Troubleshooting
- Ensure yt-dlp.exe is in the downloader/ folder
- Check logs/ for error details
- Use python -m karaoke_downloader.check_resolution to verify video quality
- If you see errors about ffmpeg/ffprobe, install ffmpeg and ensure it is in your PATH
- For best fuzzy matching, install rapidfuzz: pip install rapidfuzz (otherwise falls back to slower, less accurate difflib)
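The first and fourth checks above are easy to automate. The following preflight sketch is not part of the project; the `preflight` name and its messages are made up for illustration. It reports a missing yt-dlp.exe and any missing ffmpeg or ffprobe binaries.

```python
import shutil
from pathlib import Path


def preflight(project_root="."):
    """Return a list of human-readable problems; empty means everything was found."""
    problems = []
    if not (Path(project_root) / "downloader" / "yt-dlp.exe").is_file():
        problems.append("yt-dlp.exe not found in downloader/")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:  # searches the directories on PATH
            problems.append(f"{tool} not found on PATH")
    return problems
```

Running `preflight()` from the project root before a long batch gives an early warning instead of a failure midway through the downloads.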
Happy Karaoke! 🎤