# 🎤 Karaoke Video Downloader

A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows, macOS, and Linux with automatic platform detection, optimized caching, and FFmpeg integration.

## ✨ Features

- 🎵 **Channel & Playlist Downloads**: Download all videos from a YouTube channel or playlist
- 📂 **Organized Storage**: Each channel gets its own folder in `downloads/`
- 📝 **Robust Tracking**: Tracks all downloads, statuses, and formats in JSON
- 🏆 **Songlist Prioritization**: Prioritize or restrict downloads to a custom songlist
- 🔄 **Batch Saving & Caching**: Efficient, minimizes API calls
- 🏷️ **ID3 Tagging**: Adds artist/title metadata to MP4 files
- 🧹 **Automatic Cleanup**: Removes extra yt-dlp files
- 📈 **Real-Time Progress**: Detailed console and log output
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use `--latest-per-channel` and `--limit N`.
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with `--fuzzy-match`, requires rapidfuzz for best results)
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 **Default Channel File**: Automatically uses `data/channels.txt` as the default channel list for songlist modes (no need to specify `--file` every time)
- 🛡️ **Robust Interruption Handling**: Progress is saved after each download, preventing re-downloads if the process is interrupted
- ⚡ **Optimized Scanning**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ **Server Duplicates Tracking**: Automatically checks against local `songs.json` file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
- ⚡ **Parallel Downloads**: Enable concurrent downloads with `--parallel --workers N` for significantly faster batch downloads (3-5x speedup)
- 🌐 **Cross-Platform Support**: Automatic platform detection and yt-dlp integration for Windows, macOS, and Linux
- 🚀 **Optimized Caching**: Enhanced channel video caching with instant video list loading
- 🎬 **FFmpeg Integration**: Automatic FFmpeg installation and configuration for optimal video processing

## 🏗️ Architecture

The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:

### Core Modules:

- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration
- **`tracking_manager.py`**: Download tracking and status management
- **`download_planner.py`**: Download plan building and channel scanning
- **`cache_manager.py`**: Cache operations and file I/O management
- **`channel_manager.py`**: Channel and file management operations
- **`songlist_manager.py`**: Songlist operations and tracking
- **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions

### Utility Modules (v3.2):

- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
- **`id3_utils.py`**: ID3 tagging utilities
- **`config_manager.py`**: Configuration management
- **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI

### New Utility Modules (v3.3):

- **`parallel_downloader.py`**: Parallel download management with thread-safe operations
  - `ParallelDownloader` class: Manages concurrent downloads with configurable workers
  - `DownloadTask` and `DownloadResult` dataclasses: Structured task and result management
  - Thread-safe progress tracking and error handling
  - Automatic retry mechanism for failed downloads
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates

### Benefits:

- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Consistency**: Standardized error messages and processing pipelines
- **Maintainability**: Changes isolated to specific modules
- **Testability**: Modular components can be tested independently
- **Type Safety**: Comprehensive type hints across all new modules

## 📋 Requirements

- **Windows 10/11, macOS 10.14+, or Linux**
- **Python 3.7+**
- **yt-dlp binary** (platform-specific, see setup instructions below)
- **mutagen** (for ID3 tagging, optional)
- **ffmpeg/ffprobe** (for video validation, optional but recommended)
- **rapidfuzz** (for fuzzy matching, optional, falls back to difflib)

## 🖥️ Platform Setup

### Automatic Setup (Recommended)

Run the platform setup script to automatically set up yt-dlp for your system:

```bash
python setup_platform.py
```

This script will:

- Detect your platform (Windows, macOS, or Linux)
- Offer two installation options:
  1. **Download binary file** (recommended for most users)
  2. **Install via pip** (alternative method)
- Make binaries executable (on Unix-like systems)
- Install FFmpeg (for optimal video processing)
- Test the installation

### Manual Setup

If you prefer to set up manually:

#### Option 1: Download Binary Files

1. **Windows**: Download `yt-dlp.exe` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp.exe)
2. **macOS**: Download `yt-dlp_macos` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos)
3. **Linux**: Download `yt-dlp` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp)

Place the downloaded file in the `downloader/` directory and make it executable on Unix-like systems:

```bash
chmod +x downloader/yt-dlp_macos  # macOS
chmod +x downloader/yt-dlp        # Linux
```

#### Option 2: Install via pip

```bash
pip install yt-dlp
```

The tool will automatically detect and use the pip-installed version on macOS.

**Note**: FFmpeg is also required for optimal video processing. The setup script will attempt to install it automatically, or you can install it manually:

- **macOS**: `brew install ffmpeg`
- **Linux**: `sudo apt install ffmpeg` (Ubuntu/Debian) or `sudo yum install ffmpeg` (CentOS/RHEL)
- **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html)

## 🚀 Quick Start

> **💡 Pro Tip**: For a complete list of all available commands, see `commands.txt` - you can copy/paste any command directly into your terminal!
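
As background for the commands in this section, here is a minimal sketch of how the platform detection described under Platform Setup might choose a yt-dlp command. It is not the tool's actual code: the helper name `resolve_ytdlp_command`, the `base_dir` parameter, and the fallback order are assumptions made for illustration. Only the binary file names and the `downloader/` directory come from the setup instructions.

```python
import platform
import sys
from pathlib import Path
from typing import List

# Binary names as listed in the manual setup steps; the lookup logic is assumed.
BINARY_BY_SYSTEM = {
    "Windows": "yt-dlp.exe",
    "Darwin": "yt-dlp_macos",  # platform.system() reports macOS as "Darwin"
    "Linux": "yt-dlp",
}


def resolve_ytdlp_command(base_dir: Path, system: str = "") -> List[str]:
    """Return a command prefix for invoking yt-dlp (hypothetical helper)."""
    system = system or platform.system()
    name = BINARY_BY_SYSTEM.get(system)
    if name is not None:
        candidate = base_dir / "downloader" / name
        if candidate.is_file():
            # Prefer the bundled binary when it is present.
            return [str(candidate)]
    # Otherwise fall back to a pip-installed yt-dlp, run as a module.
    return [sys.executable, "-m", "yt_dlp"]
```

Calling `resolve_ytdlp_command(Path("."))` from the project root would return either the path to the bundled binary or a `python -m yt_dlp` style command, which can then be extended with yt-dlp arguments and passed to `subprocess.run`.
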
### Download a Channel

```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
```

### Download Only Songlist Songs (Fast Mode)

```bash
python download_karaoke.py --songlist-only --limit 5
```

### Download with Parallel Processing

```bash
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
```

### Focus on Specific Playlists by Title

```bash
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"
```

### Download with Fuzzy Matching

```bash
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
```

### Download Latest N Videos Per Channel

```bash
python download_karaoke.py --latest-per-channel --limit 5
```

### Download Latest N Videos Per Channel (with fuzzy matching)

```bash
python download_karaoke.py --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
```

### Prioritize Songlist in Download Queue

```bash
python download_karaoke.py --songlist-priority
```

### Show Songlist Download Progress

```bash
python download_karaoke.py --songlist-status
```

### Limit Number of Downloads

```bash
python download_karaoke.py --limit 5
```

### Override Resolution

```bash
python download_karaoke.py --resolution 1080p
```

### **Reset/Start Over for a Channel**

```bash
python download_karaoke.py --reset-channel SingKingKaraoke
```

### **Reset Channel and Songlist Songs**

```bash
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
```

### **Clear Channel Cache**

```bash
python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all
```

## 🧠 Songlist Integration

- Place your prioritized song list in `data/songList.json` (see example format below).
- The tool will match and prioritize these songs across all available channel videos.
- Use `--songlist-only` to download only these songs, or `--songlist-priority` to prioritize them in the queue.
- Use `--songlist-focus` to download only songs from specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`).
- Download progress for the songlist is tracked globally in `data/songlist_tracking.json`.

#### Example `data/songList.json`

```json
[
  {
    "title": "2025 - Apple Top 50",
    "songs": [
      { "artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1 },
      { "artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2 }
    ]
  },
  {
    "title": "2024 - Billboard Hot 100",
    "songs": [
      { "artist": "Taylor Swift", "title": "Cruel Summer", "position": 1 },
      { "artist": "Billie Eilish", "title": "Happier Than Ever", "position": 2 }
    ]
  }
]
```

## 🛠️ Tracking & Caching

- **data/karaoke_tracking.json**: Tracks all downloads, statuses, and formats
- **data/songlist_tracking.json**: Tracks global songlist download progress
- **data/server_duplicates_tracking.json**: Tracks songs found to be duplicates on the server for future skipping
- **data/channel_cache.json**: Caches channel video lists for performance

## 📂 Folder Structure

```
KaroakeVideoDownloader/
├── commands.txt                 # Complete CLI commands reference (copy/paste ready)
├── karaoke_downloader/          # All core Python code and utilities
│   ├── downloader.py            # Main orchestrator and CLI interface
│   ├── cli.py                   # CLI entry point
│   ├── video_downloader.py      # Core video download execution and orchestration
│   ├── tracking_manager.py      # Download tracking and status management
│   ├── download_planner.py      # Download plan building and channel scanning
│   ├── cache_manager.py         # Cache operations and file I/O management
│   ├── channel_manager.py       # Channel and file management operations
│   ├── songlist_manager.py      # Songlist operations and tracking
│   ├── server_manager.py        # Server song availability checking
│   ├── fuzzy_matcher.py         # Fuzzy matching logic and similarity functions
│   ├── youtube_utils.py         # Centralized YouTube operations and yt-dlp commands
│   ├── error_utils.py           # Standardized error handling and formatting
│   ├── download_pipeline.py     # Abstracted download → verify → tag → track pipeline
│   ├── id3_utils.py             # ID3 tagging utilities
│   ├── config_manager.py        # Configuration management with dataclasses
│   ├── file_utils.py            # Centralized file operations and filename handling
│   ├── song_validator.py        # Centralized song validation logic
│   ├── check_resolution.py      # Resolution checker utility
│   ├── resolution_cli.py        # Resolution config CLI
│   └── tracking_cli.py          # Tracking management CLI
├── data/                        # All config, tracking, cache, and songlist files
│   ├── config.json
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.txt
│   └── songList.json
├── downloads/                   # All video output
│   └── [ChannelName]/           # Per-channel folders
├── logs/                        # Download logs
├── downloader/yt-dlp.exe        # yt-dlp binary (Windows)
├── downloader/yt-dlp_macos      # yt-dlp binary (macOS)
├── downloader/yt-dlp            # yt-dlp binary (Linux)
├── setup_platform.py            # Platform setup script
├── test_platform.py             # Platform test script
├── tests/                       # Diagnostic and test scripts
│   └── test_installation.py
├── download_karaoke.py          # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat         # (optional Windows launcher)
```

## 🚦 CLI Options

> **📋 Complete Command Reference**: See `commands.txt` for all available commands with examples - perfect for copy/paste!
### Key Options:

- `--file`: Download from a list of channels (optional, defaults to `data/channels.txt` for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-focus ...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-status`: Show songlist download progress
- `--limit N`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel`: **Reset all tracking and files for a channel**
- `--reset-songlist`: **When used with `--reset-channel`, also reset songlist songs for this channel**
- `--clear-cache`: **Clear channel video cache for a specific channel or all**
- `--clear-server-duplicates`: **Clear server duplicates tracking (allows re-checking songs against server)**
- `--latest-per-channel`: **Download the latest N videos from each channel (use with `--limit`)**
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- `--fuzzy-threshold`: Fuzzy match threshold (0-100, default 85)
- `--parallel`: Enable parallel downloads for improved speed
- `--workers N`: Number of parallel download workers (1-10, default: 3)

## 📝 Example Usage

> **💡 For complete examples**: See `commands.txt` for all command variations with explanations!
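
To make the effect of `--fuzzy-match` and `--fuzzy-threshold` more concrete, the sketch below shows one way a 0-100 similarity score could be compared against the threshold, using rapidfuzz when it is installed and difflib otherwise. This is an illustration, not the logic in `fuzzy_matcher.py`: the function names `similarity`, `normalize`, and `is_match` and the normalization step are assumptions, and the two libraries can give different scores for the same pair of strings.

```python
from difflib import SequenceMatcher

try:
    # Optional dependency: fuzz.ratio returns a score between 0 and 100.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return fuzz.ratio(a, b)
except ImportError:
    def similarity(a: str, b: str) -> float:
        # difflib returns 0.0-1.0, so scale it to the same 0-100 range.
        return SequenceMatcher(None, a, b).ratio() * 100


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def is_match(song: str, video_title: str, threshold: float = 85.0) -> bool:
    """Return True when the similarity score reaches the threshold."""
    return similarity(normalize(song), normalize(video_title)) >= threshold
```

For example, `is_match("Taylor Swift - Cruel Summer", "TAYLOR SWIFT - CRUEL SUMMER")` returns `True` because the two strings are identical after normalization. Lowering the threshold accepts looser matches at the cost of more false positives.
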
```bash
# Fast mode with fuzzy matching (no need to specify --file)
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85

# Parallel downloads for faster processing
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10

# Latest videos per channel with parallel downloads
python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5

# Traditional full scan (no limit)
python download_karaoke.py --songlist-only

# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates
```

## 🏷️ ID3 Tagging

- Adds artist/title/album/genre to MP4 files using mutagen (if installed)

## 🧹 Cleanup

- Removes `.info.json` and `.meta` files after download

## 🛠️ Configuration

- All options are in `data/config.json` (format, resolution, metadata, etc.)
- You can edit this file or use CLI flags to override

## 📋 Command Reference File

**`commands.txt`** contains a comprehensive list of all CLI commands with explanations. This file is designed for easy copy/paste usage and includes:

- All basic download commands
- Songlist operations
- Latest-per-channel downloads
- Cache and tracking management
- Reset and cleanup operations
- Advanced combinations
- Common workflows
- Troubleshooting commands

> **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.

## 🔧 Refactoring Improvements (v3.5)

The codebase has been comprehensively refactored to improve maintainability and reduce code duplication.
Recent improvements have enhanced reliability, performance, and code organization:

### **New Utility Modules (v3.3)**

- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates

### **Benefits Achieved**

- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
- **Centralized File Operations**: Single source of truth for filename handling and file validation
- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
- **Enhanced Type Safety**: Comprehensive type hints across all new modules
- **Improved Configuration Management**: Structured configuration with validation and caching
- **Better Error Handling**: Consistent patterns via centralized utilities
- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
- **Improved Testability**: Modular components can be tested independently
- **Better Developer Experience**: Clear function signatures and comprehensive documentation

### **Cross-Platform Support (v3.5)**

- **Platform detection:** Automatic detection of Windows, macOS, and Linux systems
- **Flexible yt-dlp integration:** Supports both binary files and pip-installed yt-dlp modules
- **Platform-specific configuration:** Automatic selection of appropriate yt-dlp binary/command for each platform
- **Setup automation:** `setup_platform.py` script for easy platform-specific setup
- **Command parsing:** Intelligent parsing of yt-dlp commands (file paths vs. module commands)
- **Enhanced documentation:** Platform-specific setup instructions and troubleshooting
- **Backward compatibility:** Maintains full compatibility with existing Windows installations
- **FFmpeg integration:** Automatic FFmpeg installation and configuration for optimal video processing
- **Optimized caching:** Enhanced channel video caching with format compatibility and instant video list loading

### **New Parallel Download System (v3.4)**

- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
- **Backward compatibility:** Sequential downloads remain the default when `--parallel` is not used
- **Performance improvements:** Significantly faster downloads for large batches (3-5x speedup with 3-5 workers)
- **Integrated with all modes:** Works with both songlist-across-channels and latest-per-channel download modes

### **Previous Improvements (v3.2)**

- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless `--force-download-plan` is set.
- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
- **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
- **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
- **Default channel file:** For songlist-only and latest-per-channel modes, if no `--file` is specified, automatically uses `data/channels.txt` as the default channel list, reducing the need to specify the file path repeatedly.
- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.

## 🐞 Troubleshooting

- **Platform-specific yt-dlp setup**:
  - **Windows**: Ensure `yt-dlp.exe` is in the `downloader/` folder
  - **macOS**: Either ensure `yt-dlp_macos` is in the `downloader/` folder (make executable with `chmod +x`) OR install via pip (`pip install yt-dlp`)
  - **Linux**: Ensure `yt-dlp` is in the `downloader/` folder (make executable with `chmod +x`)
- Run `python setup_platform.py` to automatically set up yt-dlp for your platform
- Check `logs/` for error details
- Use `python -m karaoke_downloader.check_resolution` to verify video quality
- If you see errors about ffmpeg/ffprobe, install [ffmpeg](https://ffmpeg.org/download.html) and ensure it is in your PATH
- For best fuzzy matching, install rapidfuzz: `pip install rapidfuzz` (otherwise falls back to slower, less accurate difflib)

---

**Happy Karaoke! 🎤**