# 🎤 Karaoke Video Downloader – PRD (v3.4.4)

## ✅ Overview
A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows and macOS with automatic platform detection. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.

---

## 🏗️ Architecture
The codebase has been refactored into focused modules with centralized utilities:

### Core Modules:
- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration
- **`tracking_manager.py`**: Download tracking and status management
- **`download_planner.py`**: Download plan building and channel scanning
- **`cache_manager.py`**: Cache operations and file I/O management
- **`channel_manager.py`**: Channel and file management operations
- **`songlist_manager.py`**: Songlist operations and tracking
- **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions

### Utility Modules (v3.2):
- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
- **`id3_utils.py`**: ID3 tagging utilities
- **`config_manager.py`**: Configuration management
- **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI

### New Utility Modules (v3.3):
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded

### Benefits of Enhanced Modular Architecture:
- **Single Responsibility**: Each module has a
focused purpose
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Testability**: Individual components can be tested separately
- **Maintainability**: Easier to find and fix issues
- **Reusability**: Components can be used independently
- **Robustness**: Better error handling and interruption recovery
- **Consistency**: Standardized error messages and processing pipelines
- **Type Safety**: Comprehensive type hints across all new modules

---

## 📋 Goals
- Download karaoke videos from YouTube channels or playlists.
- Organize downloads by channel (or playlist) in subfolders.
- Avoid re-downloading the same videos (robust tracking).
- Prioritize and track a custom songlist across channels.
- Allow flexible, user-friendly configuration.
- Provide robust interruption handling and progress recovery.

---

## 🧑‍💻 Target Users
- Karaoke DJs, home karaoke users, event hosts, or anyone needing offline karaoke video libraries.
- Users comfortable with command-line tools.

---

## ⚙️ Platform & Stack
- **Platform:** Windows, macOS
- **Interface:** Command-line (CLI)
- **Tech Stack:** Python 3.7+, yt-dlp (platform-specific binary), mutagen (for ID3 tagging)

---

## 📥 Input
- YouTube channel or playlist URLs (e.g.
`https://www.youtube.com/@SingKingKaraoke/videos`)
- Optional: `data/channels.txt` file with multiple channel URLs (one per line) - **now defaults to this file if not specified**
- Optional: `data/songList.json` for prioritized song downloads

### Example Usage
```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
python download_karaoke.py --songlist-only --limit 5
python download_karaoke.py --latest-per-channel --limit 3
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache SingKingKaraoke
```

---

## 📤 Output
- MP4 files in `downloads//` subfolders
- All videos tracked in `data/karaoke_tracking.json`
- Songlist progress tracked in `data/songlist_tracking.json`
- Logs in `logs/`

---

## 🛠️ Features
- ✅ Channel-based downloads (with per-channel folders)
- ✅ Robust JSON tracking (downloaded, partial, failed, etc.)
- ✅ Batch saving and channel video caching for performance
- ✅ Configurable download resolution and yt-dlp options (`data/config.json`)
- ✅ Songlist integration: prioritize and track custom songlists
- ✅ Songlist-only mode: download only songs from the songlist
- ✅ Songlist focus mode: download only songs from specific playlists by title
- ✅ Force download mode: bypass all existing file checks and re-download songs regardless of server duplicates or existing files
- ✅ Global songlist tracking to avoid duplicates across channels
- ✅ ID3 tagging for artist/title in MP4 files (mutagen)
- ✅ Real-time progress and detailed logging
- ✅ Automatic cleanup of extra yt-dlp files
- ✅ **Reset/clear channel tracking and files via CLI**
- ✅ **Clear channel cache via CLI**
- ✅ **Download plan pre-scan and caching**: Before downloading, the tool pre-scans all channels for songlist matches, builds a download plan, and prints stats. The plan is cached for 1 day in data/download_plan_cache.json for fast resuming and reliability. Use --force-download-plan to force a refresh.
- ✅ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with a per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- ✅ **Fast mode with early exit**: When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. If a download fails, it continues scanning until the limit is satisfied or all channels are exhausted.
- ✅ **Deduplication across channels**: Ensures the same song (by artist + normalized title) is not downloaded more than once, even if it appears in multiple channels. Tracks unique keys and skips duplicates.
- ✅ **Fuzzy matching**: Optionally use fuzzy string matching for songlist-to-video matching with configurable threshold (0-100, default 85). Uses rapidfuzz if available, falls back to difflib.
- ✅ **Default channel file**: If no --file is specified for songlist-only or latest-per-channel modes, automatically uses data/channels.txt as the default channel list.
- ✅ **Robust interruption handling**: Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- ✅ **Optimized scanning performance**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching of large songlists and channels.
- ✅ **Centralized yt-dlp command generation**: Standardized command building and execution across all download operations
- ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
- ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
- ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.
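
The fuzzy-matching feature listed above (rapidfuzz when available, difflib otherwise, threshold 0-100 with a default of 85) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in `fuzzy_matcher.py`: the names `similarity`, `normalize`, and `is_fuzzy_match`, and the normalization rule itself, are hypothetical.

```python
import difflib

try:
    # rapidfuzz is optional; fuzz.ratio returns a score between 0 and 100.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return float(fuzz.ratio(a, b))

except ImportError:

    def similarity(a: str, b: str) -> float:
        # difflib's ratio is in the range 0.0-1.0, so scale it to 0-100.
        return difflib.SequenceMatcher(None, a, b).ratio() * 100


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization, for illustration)."""
    return " ".join(text.lower().split())


def is_fuzzy_match(song: str, video: str, threshold: float = 85) -> bool:
    """Return True when the normalized strings score at or above the threshold."""
    return similarity(normalize(song), normalize(video)) >= threshold
```

With either backend, `is_fuzzy_match("Hold On Loosely", "hold  on   LOOSELY")` is true because both strings normalize to the same text. Scores for near-matches can differ slightly between rapidfuzz and difflib, so a threshold tuned against one backend may behave a little differently on the other.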

---

## 📂 Folder Structure
```
KaroakeVideoDownloader/
├── karaoke_downloader/              # All core Python code and utilities
│   ├── downloader.py                # Main orchestrator and CLI interface
│   ├── cli.py                       # CLI entry point
│   ├── video_downloader.py          # Core video download execution and orchestration
│   ├── tracking_manager.py          # Download tracking and status management
│   ├── download_planner.py          # Download plan building and channel scanning
│   ├── cache_manager.py             # Cache operations and file I/O management
│   ├── channel_manager.py           # Channel and file management operations
│   ├── songlist_manager.py          # Songlist operations and tracking
│   ├── server_manager.py            # Server song availability checking
│   ├── fuzzy_matcher.py             # Fuzzy matching logic and similarity functions
│   ├── youtube_utils.py             # Centralized YouTube operations and yt-dlp commands
│   ├── error_utils.py               # Standardized error handling and formatting
│   ├── download_pipeline.py         # Abstracted download → verify → tag → track pipeline
│   ├── id3_utils.py                 # ID3 tagging utilities
│   ├── config_manager.py            # Configuration management with dataclasses
│   ├── file_utils.py                # Centralized file operations and filename handling
│   ├── song_validator.py            # Centralized song validation logic
│   ├── check_resolution.py          # Resolution checker utility
│   ├── resolution_cli.py            # Resolution config CLI
│   └── tracking_cli.py              # Tracking management CLI
├── config/                          # Configuration files
│   └── config.json                  # Main configuration file
├── data/                            # All tracking, cache, and songlist files
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.json                # Channel configuration with parsing rules
│   ├── channels.txt                 # Legacy channel list (backward compatibility)
│   ├── manual_videos.json           # Manual video collection
│   └── songList.json
├── utilities/                       # Utility scripts and tools
│   ├── add_manual_video.py          # Manual video management
│   ├── build_cache_from_raw.py      # Cache building utility
│   ├── cleanup_duplicate_files.py   # File cleanup utilities
│   ├── cleanup_recent_tracking.py   # Tracking cleanup utilities
│   ├── deduplicate_songlist_tracking.py  # Data deduplication
│   ├── fix_artist_name_format.py    # Data cleanup utilities
│   ├── fix_artist_name_format_simple.py
│   ├── fix_code_quality.py          # Development tools
│   ├── reset_and_redownload.py      # Maintenance utilities
│   └── songlist_report.py           # Reporting utilities
├── downloads/                       # All video output
│   └── [ChannelName]/               # Per-channel folders
├── logs/                            # Download logs
├── downloader/yt-dlp.exe            # yt-dlp binary (Windows)
├── downloader/yt-dlp_macos          # yt-dlp binary (macOS)
├── src/tests/                       # Test scripts
│   ├── test_macos.py                # macOS setup and functionality tests
│   └── test_platform.py             # Platform detection tests
├── download_karaoke.py              # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat             # (optional Windows launcher)
```

---

## 🚦 CLI Options (Summary)
- `--file `: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-focus ...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-file `: Custom songlist file path to use with --songlist-focus (default: data/songList.json)
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
- `--songlist-status`: Show songlist download progress
- `--limit `: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel `: **Reset all tracking and files for a channel**
- `--reset-songlist`: **When used with --reset-channel, also reset songlist songs for this channel**
- `--clear-cache `: **Clear channel video cache for a specific channel or all**
- `--force-download-plan`:
**Force refresh the download plan cache (re-scan all channels for matches)**
- `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
- `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
- `--fuzzy-threshold `: **Fuzzy match threshold (0-100, default 85)**
- `--parallel`: **Enable parallel downloads for improved speed**
- `--workers `: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
- `--manual`: **Download from manual videos collection (data/manual_videos.json)**
- `--channel-focus `: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files and songs in songs.json**
- `--dry-run`: **Build download plan and show what would be downloaded without actually downloading anything**

---

## 🧠 Logic Highlights
- **Tracking:** All downloads, statuses, and formats are tracked in JSON files for reliability and deduplication.
- **Songlist:** Loads and normalizes `data/songList.json`, matches against available videos, and prioritizes or restricts downloads accordingly.
- **Batch/Caching:** Channel video lists are cached to minimize API calls; tracking is batch-saved for performance.
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via `--manual` parameter.

## 🔧 Refactoring Improvements (v3.3)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:

### **New Utility Modules (v3.3)**
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates

### **Benefits Achieved**
- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
- **Centralized File Operations**: Single source of truth for filename handling and file validation
- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
- **Enhanced Type Safety**: Comprehensive type hints across all new modules
- **Improved
Configuration Management**: Structured configuration with validation and caching
- **Better Error Handling**: Consistent patterns via centralized utilities
- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
- **Improved Testability**: Modular components can be tested independently
- **Better Developer Experience**: Clear function signatures and comprehensive documentation

### **Previous Improvements (v3.2)**
- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
- **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
- **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
- **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
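
The scanning, deduplication, and early-exit behaviour described in the list above can be sketched as follows. This is a simplified illustration, not the project's planner: `song_key`, `build_plan`, and the input shapes (videos that already carry parsed `artist` and `title` fields) are assumptions, and it stops once the plan reaches the limit rather than counting successful downloads.

```python
from typing import Dict, List, Optional, Set


def song_key(artist: str, title: str) -> str:
    """Dedup key from artist + normalized title (normalization rule is assumed)."""
    def norm(text: str) -> str:
        return " ".join(text.lower().split())
    return norm(artist) + "|" + norm(title)


def build_plan(
    songlist: List[Dict[str, str]],
    channels: Dict[str, List[Dict[str, str]]],
    limit: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Scan channels in order and collect each wanted song at most once."""
    # Pre-process the songlist once so each video needs only a set lookup.
    wanted: Set[str] = {song_key(s["artist"], s["title"]) for s in songlist}
    seen: Set[str] = set()
    plan: List[Dict[str, str]] = []

    for channel_name, videos in channels.items():
        for video in videos:
            key = song_key(video["artist"], video["title"])
            if key not in wanted or key in seen:
                continue  # not on the songlist, or already planned from another channel
            seen.add(key)
            plan.append({
                "video_id": video["id"],
                "artist": video["artist"],
                "title": video["title"],
                "channel_name": channel_name,
            })
            if limit is not None and len(plan) >= limit:
                return plan  # early termination once the limit is reached
    return plan
```

Because the songlist is turned into a set up front, each video costs one key computation and one lookup, and a song that appears in several channels is taken from the first channel scanned.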
### **New Parallel Download System (v3.4)**
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
- **Backward compatibility:** Sequential downloads remain the default when `--parallel` is not used
- **Performance improvements:** Significantly faster downloads for large batches (3-5x speedup with 3-5 workers)
- **Integrated with all modes:** Works with both songlist-across-channels and latest-per-channel download modes

---

## 🚀 Future Enhancements
- [ ] Web UI for easier management
- [ ] More advanced song matching (multi-language)
- [ ] Download scheduling and retry logic
- [ ] More granular status reporting
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
- [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization

## 🔧 Recent Bug Fixes & Improvements (v3.4.1)

### **Enhanced Fuzzy Matching (v3.4.1)**
- **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
  - **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
  - **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
  - **Fallback handling**: Returns empty artist and full title for unparseable formats
- **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
  - **Single source of truth**: All modules now import from `fuzzy_matcher.py`
  - **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
  - **Better maintainability**: Changes to parsing logic only need to be made in one place

### **Fixed Import Conflicts**
- **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
- **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency

### **Enhanced --limit Parameter**
- **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
- **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists

### **Benefits of Recent Improvements**
- **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
- **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
- **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
- **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable

## 🔧 Recent Bug Fixes & Improvements (v3.4.2)

### **Duplicate File Prevention & Filename Consistency**
- **Enhanced file existence checking**: `check_file_exists_with_patterns()` now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
- **Automatic duplicate prevention**: Download pipeline skips downloads when files already exist (including duplicates)
- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files with suffixes
- **Cleanup utility**: `utilities/cleanup_duplicate_files.py` provides interactive cleanup of existing duplicate files
- **Filename vs ID3 tag consistency**: Removed "(Karaoke Version)" suffix from ID3 tags to match filenames exactly
- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction logic

### **Benefits of Duplicate Prevention**
- **No more duplicate files**: Eliminates `(2)`, `(3)` suffix files that waste disk space
- **Consistent metadata**: Filename and ID3 tag use identical artist/title format
- **Efficient disk usage**: Prevents unnecessary downloads of existing files
- **Clear file identification**: Consistent naming across all file operations

## 🛠️ Maintenance

### **Regular Cleanup**
- Run the cleanup utility periodically to remove any duplicate files
- Monitor downloads for any new duplicate creation (should be rare with fixes)

### **Configuration**
- Keep `"nooverwrites": false` in `data/config.json`
- This prevents yt-dlp from creating duplicate files

### **Monitoring**
- Check logs for "⏭️ Skipping download - file already exists" messages
- These indicate the duplicate prevention is working correctly

## 🔧 Recent Bug Fixes & Improvements (v3.4.3)

### **Manual Video Collection System**
- **New `--manual` parameter**: Simple access to manual video collection via `python download_karaoke.py --manual --limit 5`
- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
- **Helper script**:
`add_manual_video.py` provides easy management of manual video entries
- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
- **No yt-dlp dependency**: Manual videos bypass YouTube API calls for video listing, using static data instead

### **Channel-Specific Parsing Rules**
- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
- **Backward compatibility**: Still supports legacy `data/channels.txt` format

### **Benefits of New Features**
- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features

## 📚 Documentation Standards

### **Documentation Location**
- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
- **Use the existing sections in PRD.md and README.md to track all project evolution**

### **Where to Document Changes**
- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
- **README.md**: User-facing features,
usage instructions, and high-level improvements
- **CHANGELOG.md**: Version-specific release notes and change summaries

### **Documentation Requirements**
- **All new features must be documented in both PRD.md and README.md**
- **All refactoring efforts must be documented in the appropriate sections**
- **All bug fixes must be documented with technical details**
- **Version numbers and dates should be clearly marked**
- **Benefits and improvements should be explicitly stated**

### **Maintenance Responsibility**
- **Keep PRD.md and README.md synchronized with code changes**
- **Update documentation immediately when implementing new features**
- **Remove outdated information and consolidate related changes**
- **Ensure all CLI options and features are documented in both files**

## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **All Videos Download Mode**
- **New `--all-videos` parameter**: Download all videos from a channel, not just songlist matches
- **Smart MP3/MP4 detection**: Automatically detects if you have MP3 versions in songs.json and downloads MP4 video versions
- **Existing file skipping**: Skips videos that already exist on the filesystem
- **Progress tracking**: Shows clear progress with "Downloading X/Y videos" format
- **Parallel processing support**: Works with `--parallel --workers N` for faster downloads
- **Channel focus integration**: Works with `--channel-focus` to target specific channels
- **Limit support**: Works with `--limit N` to control download batch size

### **Smart Songlist Integration**
- **MP4 version detection**: Checks if MP4 version already exists in songs.json before downloading
- **MP3 upgrade path**: Downloads MP4 video versions when only MP3 versions exist in songlist
- **Duplicate prevention**: Skips downloads when MP4 versions already exist
- **Efficient filtering**: Only processes videos that need to be downloaded

### **Benefits of All Videos Mode**
- **Complete channel downloads**: Download entire channels without
songlist restrictions
- **Automatic format upgrading**: Upgrade MP3 collections to MP4 video versions
- **Efficient processing**: Only downloads videos that don't already exist
- **Flexible control**: Use with limits, parallel processing, and channel targeting
- **Clear progress feedback**: Real-time progress tracking for large downloads

## 🔧 Recent Bug Fixes & Improvements (v3.4.5)

### **Unified Download Workflow Architecture**
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
- **Centralized download logic**: Single `execute_unified_download_workflow()` method handles all download execution
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes

### **Architecture Pattern for New Download Modes**
When adding new download modes in the future, follow this pattern to ensure consistency:

#### **1. Download Plan Building (Mode-Specific)**
Each download mode should build a download plan (list of videos to download) with this structure:
```python
download_plan = [
    {
        "video_id": "video_id",
        "artist": "artist_name",
        "title": "song_title",
        "filename": "sanitized_filename.mp4",
        "channel_name": "channel_name",
        "video_title": "original_video_title",
        "force_download": False
    }
]
```

#### **2. Unified Execution (Shared)**
All modes should use the unified execution workflow:
```python
downloaded_count, success = self.execute_unified_download_workflow(
    download_plan=download_plan,
    cache_file=cache_file,  # Optional, for progress tracking
    limit=limit,            # Optional, for limiting downloads
    show_progress=True,     # Optional, for progress display
)
```

#### **3. Execution Method Selection (Automatic)**
The unified workflow automatically chooses execution method based on settings:
- **Sequential**: Uses `DownloadPipeline` for single-threaded downloads
- **Parallel**: Uses `ParallelDownloader` when `--parallel` is enabled

#### **4. Required Implementation Pattern**
```python
def download_new_mode(self, ...):
    """New download mode implementation."""

    # 1. Build download plan (mode-specific logic)
    download_plan = []
    for video in videos_to_download:
        download_plan.append({
            "video_id": video["id"],
            "artist": artist,
            "title": title,
            "filename": filename,
            "channel_name": channel_name,
            "video_title": video["title"],
            "force_download": force_download
        })

    # 2. Create cache file (optional, for progress tracking)
    cache_file = get_download_plan_cache_file("new_mode", **plan_kwargs)
    save_plan_cache(cache_file, download_plan, [])

    # 3. Use unified execution workflow
    downloaded_count, success = self.execute_unified_download_workflow(
        download_plan=download_plan,
        cache_file=cache_file,
        limit=limit,
        show_progress=True,
    )

    return success
```

### **Benefits of Unified Architecture**
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- **Maintainability**: Changes to download execution only need to be made in one place
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
- **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
- **Testing**: Easier to test since all modes use the same execution logic

### **What Was Fixed**
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
- **Missing Method**: Added the `download_latest_per_channel()` method, which the CLI referenced but which had never been implemented
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management

### **Future Development Guidelines**
1. **NEVER implement custom download execution logic** in new download modes
2. **ALWAYS use `execute_unified_download_workflow()`** for download execution
3. **Focus on download plan building**, since that is where mode-specific logic belongs
4. **Use the standard download plan structure** for consistency
5. **Implement cache file handling** for progress tracking and resume functionality
6. **Test with both sequential and parallel modes** to ensure compatibility

---

## 🚀 Future Enhancements
- [ ] Web UI for easier management
- [ ] More advanced song matching (multi-language)
- [ ] Download scheduling and retry logic
- [ ] More granular status reporting
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
- [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization

## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **macOS Support with Automatic Platform Detection**
- **Cross-platform compatibility**: Added support for macOS alongside Windows
- **Automatic platform detection**: Detects the operating system and selects the appropriate yt-dlp binary
- **Flexible yt-dlp integration**: Supports both binary files (`yt-dlp_macos`) and pip installation (`python3 -m yt_dlp`)
- **Setup automation**: `setup_macos.py` script for easy macOS setup with FFmpeg and yt-dlp installation
- **Command parsing**: Distinguishes yt-dlp commands given as file paths from those given as module commands
- **Enhanced validation**: Platform-specific error messages and validation in the CLI
- **Backward compatibility**: Maintains full compatibility with existing Windows installations

### **Benefits of macOS Support**
- **Native macOS experience**: No need for Windows compatibility layers or virtualization
- **Automatic setup**: Simple setup script handles all dependencies
- **Flexible installation**: Choose between binary download or pip installation
- **Consistent functionality**: All features work identically on both platforms
- **Easy maintenance**: Platform detection handles configuration automatically

### **Setup Instructions**

```bash
# Automatic setup (recommended)
python3 setup_macos.py

# Test installation
python3 src/tests/test_macos.py

# Manual setup options
# 1. Install yt-dlp via pip:
pip3 install yt-dlp
# 2. Download binary:
curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
# 3. Install FFmpeg:
brew install ffmpeg
```

## 🔧 Recent Bug Fixes & Improvements (v3.4.7)

### **Configurable Data Directory Path**
- **Centralized Data Path Management**: New `data_path_manager.py` module provides unified data directory path management
- **Configurable Location**: Data directory path can be set in `config/config.json` under `folder_structure.data_dir`
- **Backward Compatibility**: Defaults to the "data" directory if not configured
- **Cross-Project Integration**: Enables the karaoke downloader to be used as a component in other projects with different data directory structures
- **Updated All Modules**: All modules now use the data path manager instead of hardcoded "data/" paths
- **Utility Functions**: Provides `get_data_path()`, `get_data_dir()`, and `get_data_path_manager()` functions for easy access
- **Fixed Circular Dependency**: Moved `config.json` from `data/` to the `config/` directory to resolve the chicken-and-egg problem described below

### **Benefits of Configurable Data Directory**
- **Flexible Deployment**: Can be integrated into other projects with different directory structures
- **Centralized Configuration**: Single point of configuration for all data file paths
- **Maintainable Code**: Eliminates hardcoded paths throughout the codebase
- **Easy Testing**: Can use temporary directories for testing without affecting production data
- **Future-Proof**: Makes it easier to change the data directory structure in the future

### **Circular Dependency Solution**
The original implementation had a circular dependency problem:
- **Problem**: `config.json` was located in the `data/` directory
- **Issue**: Reading the config file required knowing where the data directory is
- **Conflict**: The data directory location is itself specified in the config file
- **Solution**: Moved `config.json` to the `config/` directory as a fixed location
- **Result**: The config file is always accessible in a dedicated config directory, and the data directory can be configured within it
- **Backward Compatibility**: The system still works with config files in custom data directories when explicitly specified

## 🔧 Recent Bug Fixes & Improvements (v3.4.6)

### **Dry Run Mode**
- **New `--dry-run` parameter**: Builds the download plan and shows what would be downloaded without actually downloading anything
- **Plan preview**: Shows the total number of videos in the plan and a preview of the first 5 videos
- **Safe testing**: Test download configurations without consuming bandwidth or disk space
- **All mode support**: Works with all download modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel)
- **Progress simulation**: Shows what the download process would look like without executing it

### **Benefits of Dry Run Mode**
- **Safe testing**: Test complex download configurations without downloading anything
- **Plan validation**: Verify that the download plan contains the expected videos
- **Configuration debugging**: Troubleshoot download settings before committing to downloads
- **Resource conservation**: Save bandwidth and disk space during testing
- **User education**: Help users understand what the tool will do before running it

### **Example Usage**

```bash
# Test songlist download plan
python download_karaoke.py --songlist-only --limit 5 --dry-run

# Test channel download plan
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 10 --dry-run

# Test with fuzzy matching
python download_karaoke.py --songlist-only --fuzzy-match --limit 3 --dry-run
```

### **Future Development Guidelines**