# 🎤 Karaoke Video Downloader

A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration.

## ✨ Features

- 🎵 **Channel & Playlist Downloads**: Download all videos from a YouTube channel or playlist
- 📂 **Organized Storage**: Each channel gets its own folder in `downloads/`
- 📝 **Robust Tracking**: Tracks all downloads, statuses, and formats in JSON
- 🏆 **Songlist Prioritization**: Prioritize or restrict downloads to a custom songlist
- 🔄 **Batch Saving & Caching**: Efficient, minimizes API calls
- 🏷️ **ID3 Tagging**: Adds artist/title metadata to MP4 files
- 🧹 **Automatic Cleanup**: Removes extra yt-dlp files
- 📈 **Real-Time Progress**: Detailed console and log output
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use `--latest-per-channel` and `--limit N`.
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with `--fuzzy-match`, requires rapidfuzz for best results)
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 **Default Channel File**: Automatically uses `data/channels.txt` as the default channel list for songlist modes (no need to specify `--file` every time)
- 🛡️ **Robust Interruption Handling**: Progress is saved after each download, preventing re-downloads if the process is interrupted
- ⚡ **Optimized Scanning**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ **Server Duplicates Tracking**: Automatically checks against local `songs.json` file and marks duplicates for future skipping, preventing re-downloads of songs already on the server

## 🏗️ Architecture

The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:

### Core Modules:

- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration
- **`tracking_manager.py`**: Download tracking and status management
- **`download_planner.py`**: Download plan building and channel scanning
- **`cache_manager.py`**: Cache operations and file I/O management
- **`channel_manager.py`**: Channel and file management operations
- **`songlist_manager.py`**: Songlist operations and tracking
- **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions

### Utility Modules:

- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
- **`id3_utils.py`**: ID3 tagging utilities
- **`config_manager.py`**: Configuration management
- **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI

### Benefits:

- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated code duplication across modules
- **Consistency**: Standardized error messages and processing pipelines
- **Maintainability**: Changes isolated to specific modules
- **Testability**: Modular components can be tested independently

## 📋 Requirements

- **Windows 10/11**
- **Python 3.7+**
- **yt-dlp.exe** (in `downloader/`)
- **mutagen** (for ID3 tagging, optional)
- **ffmpeg/ffprobe** (for video validation, optional but recommended)
- **rapidfuzz** (for fuzzy matching, optional, falls back to difflib)

## 🚀 Quick Start

> **💡 Pro Tip**: For a complete list of all available commands, see `commands.txt` - you can copy/paste any command directly into your terminal!

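
Before running the commands below, it can help to confirm that the items under Requirements are in place. The following is a minimal sketch, not the project's own `tests/test_installation.py`: the function name `check_requirements` and its return format are assumptions made for illustration, and only the locations and package names come from this README.

```python
import importlib.util
import shutil
from pathlib import Path


def check_requirements(project_root="."):
    """Return a dict mapping each requirement name to True (found) or False."""
    root = Path(project_root)
    return {
        # Required: this README expects the binary inside downloader/
        "yt-dlp.exe": (root / "downloader" / "yt-dlp.exe").is_file(),
        # Optional but recommended: looked up on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # Optional Python packages: checked without importing them
        "mutagen": importlib.util.find_spec("mutagen") is not None,
        "rapidfuzz": importlib.util.find_spec("rapidfuzz") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it from the project root. Only a missing `yt-dlp.exe` is a hard requirement according to the list above; the other entries are optional.
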
### Download a Channel

```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
```

### Download Only Songlist Songs (Fast Mode)

```bash
python download_karaoke.py --songlist-only --limit 5
```

### Focus on Specific Playlists by Title

```bash
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"
```

### Download with Fuzzy Matching

```bash
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
```

### Download Latest N Videos Per Channel

```bash
python download_karaoke.py --latest-per-channel --limit 5
```

### Download Latest N Videos Per Channel (with fuzzy matching)

```bash
python download_karaoke.py --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
```

### Prioritize Songlist in Download Queue

```bash
python download_karaoke.py --songlist-priority
```

### Show Songlist Download Progress

```bash
python download_karaoke.py --songlist-status
```

### Limit Number of Downloads

```bash
python download_karaoke.py --limit 5
```

### Override Resolution

```bash
python download_karaoke.py --resolution 1080p
```

### **Reset/Start Over for a Channel**

```bash
python download_karaoke.py --reset-channel SingKingKaraoke
```

### **Reset Channel and Songlist Songs**

```bash
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
```

### **Clear Channel Cache**

```bash
python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all
```

## 🧠 Songlist Integration

- Place your prioritized song list in `data/songList.json` (see example format below).
- The tool will match and prioritize these songs across all available channel videos.
- Use `--songlist-only` to download only these songs, or `--songlist-priority` to prioritize them in the queue.
- Use `--songlist-focus` to download only songs from specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`).
- Download progress for the songlist is tracked globally in `data/songlist_tracking.json`.

#### Example `data/songList.json`

```json
[
  {
    "title": "2025 - Apple Top 50",
    "songs": [
      { "artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1 },
      { "artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2 }
    ]
  },
  {
    "title": "2024 - Billboard Hot 100",
    "songs": [
      { "artist": "Taylor Swift", "title": "Cruel Summer", "position": 1 },
      { "artist": "Billie Eilish", "title": "Happier Than Ever", "position": 2 }
    ]
  }
]
```

## 🛠️ Tracking & Caching

- **data/karaoke_tracking.json**: Tracks all downloads, statuses, and formats
- **data/songlist_tracking.json**: Tracks global songlist download progress
- **data/server_duplicates_tracking.json**: Tracks songs found to be duplicates on the server for future skipping
- **data/channel_cache.json**: Caches channel video lists for performance

## 📂 Folder Structure

```
KaroakeVideoDownloader/
├── commands.txt                    # Complete CLI commands reference (copy/paste ready)
├── karaoke_downloader/             # All core Python code and utilities
│   ├── downloader.py               # Main orchestrator and CLI interface
│   ├── cli.py                      # CLI entry point
│   ├── video_downloader.py         # Core video download execution and orchestration
│   ├── tracking_manager.py         # Download tracking and status management
│   ├── download_planner.py         # Download plan building and channel scanning
│   ├── cache_manager.py            # Cache operations and file I/O management
│   ├── channel_manager.py          # Channel and file management operations
│   ├── songlist_manager.py         # Songlist operations and tracking
│   ├── server_manager.py           # Server song availability checking
│   ├── fuzzy_matcher.py            # Fuzzy matching logic and similarity functions
│   ├── youtube_utils.py            # Centralized YouTube operations and yt-dlp commands
│   ├── error_utils.py              # Standardized error handling and formatting
│   ├── download_pipeline.py        # Abstracted download → verify → tag → track pipeline
│   ├── id3_utils.py                # ID3 tagging utilities
│   ├── config_manager.py           # Configuration management
│   ├── check_resolution.py         # Resolution checker utility
│   ├── resolution_cli.py           # Resolution config CLI
│   └── tracking_cli.py             # Tracking management CLI
├── data/                           # All config, tracking, cache, and songlist files
│   ├── config.json
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.txt
│   └── songList.json
├── downloads/                      # All video output
│   └── [ChannelName]/              # Per-channel folders
├── logs/                           # Download logs
├── downloader/yt-dlp.exe           # yt-dlp binary
├── tests/                          # Diagnostic and test scripts
│   └── test_installation.py
├── download_karaoke.py             # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat            # (optional Windows launcher)
```

## 🚦 CLI Options

> **📋 Complete Command Reference**: See `commands.txt` for all available commands with examples - perfect for copy/paste!

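
To make the `--songlist-focus` option more concrete, the sketch below reads the `data/songList.json` layout shown under Songlist Integration and keeps only the songs from the named playlists. It is an illustration, not the code in `songlist_manager.py`: the function names `load_songlist` and `focus_songs` are hypothetical, and exact matching on the playlist title plus case-insensitive de-duplication are assumptions, since this README does not say how the tool compares titles.

```python
import json
from pathlib import Path


def load_songlist(path="data/songList.json"):
    """Load the list of playlists from the songlist file."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def focus_songs(playlists, focus_titles):
    """Return (artist, title) pairs from the playlists named in focus_titles.

    Songs are ordered by their "position" within each playlist, and a song
    that appears in more than one focused playlist is returned only once.
    """
    wanted = set(focus_titles)  # assumption: exact title match
    seen = set()
    songs = []
    for playlist in playlists:
        if playlist["title"] not in wanted:
            continue
        for song in sorted(playlist["songs"], key=lambda s: s["position"]):
            key = (song["artist"].lower(), song["title"].lower())
            if key not in seen:
                seen.add(key)
                songs.append((song["artist"], song["title"]))
    return songs
```

With the example file above, `focus_songs(load_songlist(), ["2025 - Apple Top 50"])` would return the two Kendrick Lamar entries in position order.
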
### Key Options:

- `--file `: Download from a list of channels (optional, defaults to `data/channels.txt` for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-focus ...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-status`: Show songlist download progress
- `--limit `: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel `: **Reset all tracking and files for a channel**
- `--reset-songlist`: **When used with `--reset-channel`, also reset songlist songs for this channel**
- `--clear-cache `: **Clear channel video cache for a specific channel or all**
- `--clear-server-duplicates`: **Clear server duplicates tracking (allows re-checking songs against server)**
- `--latest-per-channel`: **Download the latest N videos from each channel (use with `--limit`)**
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- `--fuzzy-threshold `: Fuzzy match threshold (0-100, default 85)

## 📝 Example Usage

> **💡 For complete examples**: See `commands.txt` for all command variations with explanations!

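
As a rough illustration of what `--fuzzy-match` and `--fuzzy-threshold` imply, the sketch below scores a songlist entry against a video title on a 0-100 scale and accepts the pair at or above the threshold. It is not the code in `fuzzy_matcher.py`: the names `similarity`, `normalize`, and `is_match`, and the choice of `fuzz.ratio` as the scorer, are assumptions made for illustration. Only the rapidfuzz-with-difflib fallback and the default threshold of 85 come from this README.

```python
try:
    # Preferred backend when installed: scores are already on a 0-100 scale.
    from rapidfuzz import fuzz

    def similarity(a, b):
        return fuzz.ratio(a, b)

except ImportError:
    # Standard-library fallback: ratio() is 0.0-1.0, so scale it to 0-100.
    from difflib import SequenceMatcher

    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio() * 100


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_match(song, video_title, threshold=85):
    """Return True when the two strings score at or above the threshold."""
    return similarity(normalize(song), normalize(video_title)) >= threshold
```

The two backends use different algorithms, so the same pair of strings can score slightly differently depending on whether rapidfuzz is installed. A threshold tuned with one backend may need adjusting for the other.
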
```bash
# Fast mode with fuzzy matching (no need to specify --file)
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85

# Latest videos per channel
python download_karaoke.py --latest-per-channel --limit 5

# Traditional full scan (no limit)
python download_karaoke.py --songlist-only

# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates
```

## 🏷️ ID3 Tagging

- Adds artist/title/album/genre to MP4 files using mutagen (if installed)

## 🧹 Cleanup

- Removes `.info.json` and `.meta` files after download

## 🛠️ Configuration

- All options are in `data/config.json` (format, resolution, metadata, etc.)
- You can edit this file or use CLI flags to override

## 📋 Command Reference File

**`commands.txt`** contains a comprehensive list of all CLI commands with explanations. This file is designed for easy copy/paste usage and includes:

- All basic download commands
- Songlist operations
- Latest-per-channel downloads
- Cache and tracking management
- Reset and cleanup operations
- Advanced combinations
- Common workflows
- Troubleshooting commands

> **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.

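
The cleanup step described under Cleanup (removing `.info.json` and `.meta` files after a download) can be sketched in a few lines. This is a hedged illustration rather than the tool's implementation: the function name `remove_leftovers` is hypothetical, and scanning subfolders recursively is an assumption.

```python
from pathlib import Path

# File endings this README lists as removed after download.
LEFTOVER_SUFFIXES = (".info.json", ".meta")


def remove_leftovers(folder):
    """Delete leftover yt-dlp side files under folder and return their paths."""
    removed = []
    # Build the full list first so files are not deleted mid-iteration.
    for path in list(Path(folder).rglob("*")):
        if path.is_file() and path.name.endswith(LEFTOVER_SUFFIXES):
            path.unlink()
            removed.append(path)
    return removed
```

For example, `remove_leftovers("downloads/SingKingKaraoke")` would clear the side files in that channel's folder while leaving the videos untouched.
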
## 🔧 Refactoring Improvements (v3.2)

The codebase has been comprehensively refactored to improve maintainability and reduce code duplication:

### **Key Improvements**

- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Reduced Code Duplication**: Eliminated duplicate code across modules through centralized utilities

### **New Utility Modules**

- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling with structured exception hierarchy
- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing

### **Benefits**

- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place
- **Better Error Handling**: Consistent error messages and better debugging context
- **Enhanced Testability**: Modular components can be tested independently
- **Reduced Complexity**: Single source of truth for common operations

## 🐞 Troubleshooting

- Ensure `yt-dlp.exe` is in the `downloader/` folder
- Check `logs/` for error details
- Use `python -m karaoke_downloader.check_resolution` to verify video quality
- If you see errors about ffmpeg/ffprobe, install [ffmpeg](https://ffmpeg.org/download.html) and ensure it is in your PATH
- For best fuzzy matching, install rapidfuzz: `pip install rapidfuzz` (otherwise falls back to slower, less accurate difflib)

---

**Happy Karaoke! 🎤**