# 🎤 Karaoke Video Downloader – PRD (v3.4.3)

## ✅ Overview

A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.

---

## 🏗️ Architecture

The codebase has been refactored into focused modules with centralized utilities:

### Core Modules:

- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration
- **`tracking_manager.py`**: Download tracking and status management
- **`download_planner.py`**: Download plan building and channel scanning
- **`cache_manager.py`**: Cache operations and file I/O management
- **`channel_manager.py`**: Channel and file management operations
- **`songlist_manager.py`**: Songlist operations and tracking
- **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions

### Utility Modules (v3.2):

- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
- **`id3_utils.py`**: ID3 tagging utilities
- **`config_manager.py`**: Configuration management
- **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI

### New Utility Modules (v3.3):

- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded

### Benefits of Enhanced Modular Architecture:

- **Single Responsibility**: Each module has a focused purpose
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Testability**: Individual components can be tested separately
- **Maintainability**: Easier to find and fix issues
- **Reusability**: Components can be used independently
- **Robustness**: Better error handling and interruption recovery
- **Consistency**: Standardized error messages and processing pipelines
- **Type Safety**: Comprehensive type hints across all new modules

---

## 📋 Goals

- Download karaoke videos from YouTube channels or playlists.
- Organize downloads by channel (or playlist) in subfolders.
- Avoid re-downloading the same videos (robust tracking).
- Prioritize and track a custom songlist across channels.
- Allow flexible, user-friendly configuration.
- Provide robust interruption handling and progress recovery.

---

## 🧑💻 Target Users

- Karaoke DJs, home karaoke users, event hosts, or anyone needing offline karaoke video libraries.
- Users comfortable with command-line tools.

---

## ⚙️ Platform & Stack

- **Platform:** Windows
- **Interface:** Command-line (CLI)
- **Tech Stack:** Python 3.7+, yt-dlp.exe, mutagen (for ID3 tagging)

---

## 📥 Input

- YouTube channel or playlist URLs (e.g. `https://www.youtube.com/@SingKingKaraoke/videos`)
- Optional: `data/channels.txt` file with multiple channel URLs (one per line); **used by default when `--file` is not specified**
- Optional: `data/songList.json` for prioritized song downloads

### Example Usage

```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
python download_karaoke.py --songlist-only --limit 5
python download_karaoke.py --latest-per-channel --limit 3
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache SingKingKaraoke
```

---

## 📤 Output

- MP4 files in `downloads/<ChannelName>/` subfolders
- All videos tracked in `data/karaoke_tracking.json`
- Songlist progress tracked in `data/songlist_tracking.json`
- Logs in `logs/`

---

## 🛠️ Features

- ✅ Channel-based downloads (with per-channel folders)
- ✅ Robust JSON tracking (downloaded, partial, failed, etc.)
- ✅ Batch saving and channel video caching for performance
- ✅ Configurable download resolution and yt-dlp options (`data/config.json`)
- ✅ Songlist integration: prioritize and track custom songlists
- ✅ Songlist-only mode: download only songs from the songlist
- ✅ Songlist focus mode: download only songs from specific playlists by title
- ✅ Force download mode: bypass all existing file checks and re-download songs regardless of server duplicates or existing files
- ✅ Global songlist tracking to avoid duplicates across channels
- ✅ ID3 tagging for artist/title in MP4 files (mutagen)
- ✅ Real-time progress and detailed logging
- ✅ Automatic cleanup of extra yt-dlp files
- ✅ **Reset/clear channel tracking and files via CLI**
- ✅ **Clear channel cache via CLI**
- ✅ **Download plan pre-scan and caching**: Before downloading, the tool pre-scans all channels for songlist matches, builds a download plan, and prints stats. The plan is cached for 1 day in `data/download_plan_cache.json` for fast resuming and reliability. Use `--force-download-plan` to force a refresh.
- ✅ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with a per-channel download plan, robust resume, and unique plan cache. Use `--latest-per-channel` and `--limit N`.
- ✅ **Fast mode with early exit**: When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. If a download fails, it continues scanning until the limit is satisfied or all channels are exhausted.
- ✅ **Deduplication across channels**: Ensures the same song (by artist + normalized title) is not downloaded more than once, even if it appears in multiple channels. Tracks unique keys and skips duplicates.
- ✅ **Fuzzy matching**: Optionally use fuzzy string matching for songlist-to-video matching with configurable threshold (0-100, default 85). Uses rapidfuzz if available, falls back to difflib.
- ✅ **Default channel file**: If no `--file` is specified for songlist-only or latest-per-channel modes, automatically uses `data/channels.txt` as the default channel list.
- ✅ **Robust interruption handling**: Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- ✅ **Optimized scanning performance**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching of large songlists and channels.
- ✅ **Centralized yt-dlp command generation**: Standardized command building and execution across all download operations
- ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
- ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
- ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.

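The fuzzy matching feature above (threshold 0-100, default 85, rapidfuzz with a difflib fallback) could be sketched roughly as follows. The function names `similarity` and `is_fuzzy_match` are hypothetical and are not taken from `fuzzy_matcher.py`; the lowercase comparison is also an assumption. With the default threshold, `is_fuzzy_match("Hold On Loosely", "hold on loosely")` is true because the strings are identical after lowercasing.

```python
from difflib import SequenceMatcher

try:
    # rapidfuzz is optional; fuzz.ratio returns a score between 0 and 100.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        """Case-insensitive similarity score on a 0-100 scale."""
        return fuzz.ratio(a.lower(), b.lower())

except ImportError:

    def similarity(a: str, b: str) -> float:
        """Fallback: difflib's ratio is 0.0-1.0, so scale it to 0-100."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_fuzzy_match(song: str, video_title: str, threshold: float = 85) -> bool:
    """Return True when the similarity score reaches the threshold."""
    return similarity(song, video_title) >= threshold
```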

---

## 📂 Folder Structure

```
KaroakeVideoDownloader/
├── karaoke_downloader/          # All core Python code and utilities
│   ├── downloader.py            # Main orchestrator and CLI interface
│   ├── cli.py                   # CLI entry point
│   ├── video_downloader.py      # Core video download execution and orchestration
│   ├── tracking_manager.py      # Download tracking and status management
│   ├── download_planner.py      # Download plan building and channel scanning
│   ├── cache_manager.py         # Cache operations and file I/O management
│   ├── channel_manager.py       # Channel and file management operations
│   ├── songlist_manager.py      # Songlist operations and tracking
│   ├── server_manager.py        # Server song availability checking
│   ├── fuzzy_matcher.py         # Fuzzy matching logic and similarity functions
│   ├── youtube_utils.py         # Centralized YouTube operations and yt-dlp commands
│   ├── error_utils.py           # Standardized error handling and formatting
│   ├── download_pipeline.py     # Abstracted download → verify → tag → track pipeline
│   ├── id3_utils.py             # ID3 tagging utilities
│   ├── config_manager.py        # Configuration management with dataclasses
│   ├── file_utils.py            # Centralized file operations and filename handling
│   ├── song_validator.py        # Centralized song validation logic
│   ├── check_resolution.py      # Resolution checker utility
│   ├── resolution_cli.py        # Resolution config CLI
│   └── tracking_cli.py          # Tracking management CLI
├── data/                        # All config, tracking, cache, and songlist files
│   ├── config.json
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.json            # Channel configuration with parsing rules
│   ├── channels.txt             # Legacy channel list (backward compatibility)
│   ├── manual_videos.json       # Manual video collection
│   └── songList.json
├── downloads/                   # All video output
│   └── [ChannelName]/           # Per-channel folders
├── logs/                        # Download logs
├── downloader/yt-dlp.exe        # yt-dlp binary
├── tests/                       # Diagnostic and test scripts
│   └── test_installation.py
├── download_karaoke.py          # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat         # (optional Windows launcher)
```

---

## 🚦 CLI Options (Summary)

- `--file <data/channels.txt>`: Download from a list of channels (optional, defaults to `data/channels.txt` for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-file <FILE_PATH>`: Custom songlist file path to use with `--songlist-focus` (default: `data/songList.json`)
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
- `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel <CHANNEL_NAME>`: **Reset all tracking and files for a channel**
- `--reset-songlist`: **When used with `--reset-channel`, also reset songlist songs for this channel**
- `--clear-cache <CHANNEL_ID|all>`: **Clear channel video cache for a specific channel or all**
- `--force-download-plan`: **Force refresh the download plan cache (re-scan all channels for matches)**
- `--latest-per-channel`: **Download the latest N videos from each channel (use with `--limit`)**
- `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
- `--parallel`: **Enable parallel downloads for improved speed**
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with `--parallel`)**
- `--manual`: **Download from manual videos collection (`data/manual_videos.json`)**
- `--channel-focus <CHANNEL_NAME>`: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files and songs in songs.json**

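Several of the options above carry numeric ranges (`--workers` 1-10, `--fuzzy-threshold` 0-100). A minimal sketch of how such ranges might be enforced with `argparse` is shown here. It covers only a handful of the flags, and `bounded_int` and `build_parser` are hypothetical helpers rather than the contents of `cli.py`. Under these assumptions, `--workers 11` would be rejected with a usage error while `--workers 5` parses to the integer 5.

```python
import argparse


def bounded_int(low: int, high: int):
    """Build an argparse type that accepts integers in [low, high]."""

    def parse(value: str) -> int:
        number = int(value)  # a ValueError here is reported by argparse
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return number

    return parse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("url", nargs="?", help="channel or playlist URL")
    parser.add_argument("--limit", type=int, help="limit number of downloads")
    parser.add_argument("--parallel", action="store_true")
    parser.add_argument("--workers", type=bounded_int(1, 10), default=3)
    parser.add_argument("--fuzzy-match", action="store_true")
    parser.add_argument("--fuzzy-threshold", type=bounded_int(0, 100), default=85)
    return parser
```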

---

## 🧠 Logic Highlights

- **Tracking:** All downloads, statuses, and formats are tracked in JSON files for reliability and deduplication.
- **Songlist:** Loads and normalizes `data/songList.json`, matches against available videos, and prioritizes or restricts downloads accordingly.
- **Batch/Caching:** Channel video lists are cached to minimize API calls; tracking is batch-saved for performance.
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via `--manual` parameter.

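The caching behaviour can be illustrated with the download plan cache, which this document describes as valid for one day unless `--force-download-plan` is given. The sketch below assumes freshness is judged by the file's modification time; that choice, the function name `load_cached_plan`, and its parameters are illustrative assumptions, not the tool's actual implementation.

```python
import json
import time
from pathlib import Path
from typing import Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def load_cached_plan(
    cache_path: Path,
    max_age: float = ONE_DAY_SECONDS,
    force_refresh: bool = False,
) -> Optional[dict]:
    """Return the cached plan, or None when it must be rebuilt."""
    if force_refresh or not cache_path.exists():
        return None
    # Assumption: the cache age is the time since the file was last written.
    age = time.time() - cache_path.stat().st_mtime
    if age > max_age:
        return None
    try:
        return json.loads(cache_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A corrupt or unreadable cache is treated like a missing one.
        return None
```

A caller would rebuild the plan by scanning channels whenever this returns `None`, then write the new plan back to the same path.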

## 🔧 Refactoring Improvements (v3.3)

The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:

### **New Utility Modules (v3.3)**

- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation

- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling

- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates

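To make the file utilities more concrete, here is a rough sketch of two of them. The function names mirror those listed above, but the bodies are illustrative assumptions: the "Artist - Title" naming scheme, the exact set of characters removed, and the header rule (an `ftyp` box type at bytes 4-8, which is how MP4 files normally begin) are not confirmed by the source.

```python
import re

# Characters that Windows does not allow in filenames, plus control characters.
_INVALID_WINDOWS_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(artist: str, title: str) -> str:
    """Build a Windows-safe base filename (assumed "Artist - Title" format)."""
    name = f"{artist} - {title}" if artist else title
    name = _INVALID_WINDOWS_CHARS.sub("", name)
    name = re.sub(r"\s+", " ", name).strip()
    # Windows also rejects names that end with a dot or a space.
    return name.rstrip(". ")


def is_valid_mp4_file(path: str) -> bool:
    """Cheap header check: MP4 files normally carry 'ftyp' at bytes 4-8."""
    try:
        with open(path, "rb") as handle:
            header = handle.read(12)
    except OSError:
        return False
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

For example, `sanitize_filename("AC/DC", "Back in Black?")` drops the slash and the question mark and yields `ACDC - Back in Black`.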

### **Benefits Achieved**

- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
- **Centralized File Operations**: Single source of truth for filename handling and file validation
- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
- **Enhanced Type Safety**: Comprehensive type hints across all new modules
- **Improved Configuration Management**: Structured configuration with validation and caching
- **Better Error Handling**: Consistent patterns via centralized utilities
- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
- **Improved Testability**: Modular components can be tested independently
- **Better Developer Experience**: Clear function signatures and comprehensive documentation

### **Previous Improvements (v3.2)**

- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless `--force-download-plan` is set.
- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
- **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
- **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
- **Default channel file:** For songlist-only and latest-per-channel modes, if no `--file` is specified, automatically uses `data/channels.txt` as the default channel list, reducing the need to specify the file path repeatedly.
- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.

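The cross-channel deduplication described above relies on a unique key built from the artist and a normalized title. One way such a key could be derived is sketched below. The normalization rules (dropping bracketed qualifiers and the word "karaoke") and the helper names `normalize_title`, `song_key`, and `dedupe` are assumptions chosen for illustration.

```python
import re
from typing import Dict, Iterable, List


def normalize_title(title: str) -> str:
    """Reduce a title to a comparable form (assumed normalization rules)."""
    text = title.lower()
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", text)  # drop (...) and [...]
    text = re.sub(r"\bkaraoke\b", " ", text)           # drop the word "karaoke"
    text = re.sub(r"[\W_]+", " ", text)                # punctuation to spaces
    return text.strip()


def song_key(artist: str, title: str) -> str:
    """Unique key: lowercased artist plus normalized title."""
    return f"{artist.lower().strip()}|{normalize_title(title)}"


def dedupe(videos: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first video seen for each song key, preserving scan order."""
    seen = set()
    unique = []
    for video in videos:
        key = song_key(video["artist"], video["title"])
        if key in seen:
            continue  # same song already planned from an earlier channel
        seen.add(key)
        unique.append(video)
    return unique
```

Because the first occurrence wins, the order in which channels are scanned decides which channel a duplicated song is downloaded from.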

### **New Parallel Download System (v3.4)**

- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
- **Backward compatibility:** Sequential downloads remain the default when `--parallel` is not used
- **Performance improvements:** Significantly faster downloads for large batches (3-5x speedup with 3-5 workers)
- **Integrated with all modes:** Works with both songlist-across-channels and latest-per-channel download modes

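The pattern behind the parallel download system (a bounded worker pool plus a lock around shared tracking state) might look roughly like this. It is a sketch under stated assumptions: `download_in_parallel` and the `download_one` callback are hypothetical names, the retry step is omitted, and the real `parallel_downloader.py` may be structured differently. The callback is expected to return `True` on success; exceptions are recorded as failures.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_in_parallel(
    video_ids: Iterable[str],
    download_one: Callable[[str], bool],
    workers: int = 3,
) -> Dict[str, str]:
    """Run download_one for each video on a thread pool and record statuses."""
    workers = max(1, min(10, workers))  # clamp to the documented 1-10 range
    status: Dict[str, str] = {}
    lock = threading.Lock()  # guards the shared status dictionary

    def task(video_id: str) -> None:
        try:
            ok = download_one(video_id)
        except Exception:
            ok = False
        with lock:
            status[video_id] = "downloaded" if ok else "failed"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, video_id) for video_id in video_ids]
        for future in as_completed(futures):
            future.result()  # surface unexpected errors from the task itself

    return status
```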

---

## 🚀 Future Enhancements

- [ ] Web UI for easier management
- [ ] More advanced song matching (multi-language)
- [ ] Download scheduling and retry logic
- [ ] More granular status reporting
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
- [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization

## 🔧 Recent Bug Fixes & Improvements (v3.4.1)

### **Enhanced Fuzzy Matching (v3.4.1)**

- **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
  - **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
  - **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
  - **Fallback handling**: Returns empty artist and full title for unparseable formats
- **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
  - **Single source of truth**: All modules now import from `fuzzy_matcher.py`
  - **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
  - **Better maintainability**: Changes to parsing logic only need to be made in one place

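As an illustration of the title formats listed above, the following sketch parses the "Title Karaoke | Artist Karaoke Version" and "Artist - Title" patterns and falls back to an empty artist otherwise. `parse_video_title` is a hypothetical stand-in, not the actual `extract_artist_title` from `fuzzy_matcher.py`, and it deliberately leaves out the ambiguous "Title Artist KARAOKE" format.

```python
import re
from typing import Tuple

# Trailing "Karaoke", "Karaoke Version" or "(Karaoke Version)" markers.
_KARAOKE_SUFFIX = re.compile(r"\s*\(?karaoke(\s+version)?\)?\s*$", re.IGNORECASE)


def _strip_karaoke(text: str) -> str:
    """Remove a trailing karaoke marker and surrounding whitespace."""
    return _KARAOKE_SUFFIX.sub("", text).strip()


def parse_video_title(video_title: str) -> Tuple[str, str]:
    """Return (artist, title); artist is empty when no pattern applies."""
    if "|" in video_title:
        left, right = (part.strip() for part in video_title.split("|", 1))
        if re.search(r"karaoke", left, re.IGNORECASE):
            # "Title Karaoke | Artist Karaoke Version"
            return _strip_karaoke(right), _strip_karaoke(left)
    if " - " in video_title:
        # "Artist - Title", possibly followed by a karaoke marker
        artist, title = video_title.split(" - ", 1)
        return artist.strip(), _strip_karaoke(title)
    # Fallback: unparseable format, keep the full title
    return "", video_title.strip()
```

With this sketch, the example title from the list above, "Hold On Loosely Karaoke | 38 Special Karaoke Version", parses to the artist "38 Special" and the title "Hold On Loosely".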

### **Fixed Import Conflicts**

- **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
- **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency

### **Enhanced --limit Parameter**

- **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
- **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists

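The interaction between `--limit` and scanning follows the fast-mode behaviour described in this document: download as soon as a match is found, keep scanning after a failure, and stop once the limit is met by successful downloads. A minimal sketch of that control flow is given below. The function `download_with_limit` and the `find_match` and `download` callbacks are hypothetical placeholders for the tool's real matching and download steps.

```python
from typing import Callable, List, Optional, Sequence


def download_with_limit(
    songs: Sequence[str],
    channels: Sequence[str],
    find_match: Callable[[str, str], Optional[str]],
    download: Callable[[str], bool],
    limit: int,
) -> List[str]:
    """Scan songs in order and stop once `limit` downloads have succeeded."""
    downloaded: List[str] = []
    for song in songs:
        if len(downloaded) >= limit:
            break  # early exit: the limit is satisfied
        for channel in channels:
            video = find_match(song, channel)
            if video is None:
                continue  # no match in this channel, try the next one
            if download(video):
                downloaded.append(video)
                break  # song satisfied, move on to the next song
            # download failed: keep scanning the remaining channels
    return downloaded
```

Only successful downloads count toward the limit, so a failed attempt never consumes one of the N slots.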

### **Benefits of Recent Improvements**

- **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
- **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
- **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
- **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable

## 🔧 Recent Bug Fixes & Improvements (v3.4.2)

### **Duplicate File Prevention & Filename Consistency**

- **Enhanced file existence checking**: `check_file_exists_with_patterns()` now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
- **Automatic duplicate prevention**: Download pipeline skips downloads when files already exist (including duplicates)
- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files with suffixes
- **Cleanup utility**: `data/cleanup_duplicate_files.py` provides interactive cleanup of existing duplicate files
- **Filename vs ID3 tag consistency**: Removed "(Karaoke Version)" suffix from ID3 tags to match filenames exactly
- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction logic

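The suffix-aware existence check could be approximated as follows. This is a sketch only: `find_existing_file` is a hypothetical name, and the assumed suffix shape (a space followed by a number in parentheses before the extension) is inferred from the `(2)`, `(3)` examples above rather than taken from `check_file_exists_with_patterns()`.

```python
import re
from pathlib import Path
from typing import Optional


def find_existing_file(folder: Path, base_name: str, ext: str = ".mp4") -> Optional[Path]:
    """Return a file named base_name, or base_name with an ' (N)' suffix."""
    pattern = re.compile(
        rf"^{re.escape(base_name)}( \(\d+\))?{re.escape(ext)}$",
        re.IGNORECASE,
    )
    if not folder.is_dir():
        return None
    for candidate in sorted(folder.iterdir()):
        if candidate.is_file() and pattern.match(candidate.name):
            return candidate
    return None
```

A download step would skip the video whenever this returns a path, which covers both the plain filename and any numbered duplicate already on disk.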

### **Benefits of Duplicate Prevention**

- **No more duplicate files**: Eliminates `(2)`, `(3)` suffix files that waste disk space
- **Consistent metadata**: Filename and ID3 tag use identical artist/title format
- **Efficient disk usage**: Prevents unnecessary downloads of existing files
- **Clear file identification**: Consistent naming across all file operations

## 🛠️ Maintenance

### **Regular Cleanup**

- Run the cleanup utility periodically to remove any duplicate files
- Monitor downloads for any new duplicate creation (should be rare with fixes)

### **Configuration**

- Keep `"nooverwrites": false` in `data/config.json`
- This prevents yt-dlp from creating duplicate files

### **Monitoring**

- Check logs for "⏭️ Skipping download - file already exists" messages
- These indicate the duplicate prevention is working correctly

## 🔧 Recent Bug Fixes & Improvements (v3.4.3)

### **Manual Video Collection System**

- **New `--manual` parameter**: Simple access to manual video collection via `python download_karaoke.py --manual --limit 5`
- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
- **Helper script**: `add_manual_video.py` provides easy management of manual video entries
- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
- **No yt-dlp dependency for listing**: Manual videos skip the YouTube video-listing calls and use static data instead

### **Channel-Specific Parsing Rules**

- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
- **Backward compatibility**: Still supports legacy `data/channels.txt` format

### **Benefits of New Features**

- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features

## 📚 Documentation Standards

### **Documentation Location**

- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
- **Use the existing sections in PRD.md and README.md to track all project evolution**

### **Where to Document Changes**

- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
- **README.md**: User-facing features, usage instructions, and high-level improvements
- **CHANGELOG.md**: Version-specific release notes and change summaries

### **Documentation Requirements**

- **All new features must be documented in both PRD.md and README.md**
- **All refactoring efforts must be documented in the appropriate sections**
- **All bug fixes must be documented with technical details**
- **Version numbers and dates should be clearly marked**
- **Benefits and improvements should be explicitly stated**

### **Maintenance Responsibility**

- **Keep PRD.md and README.md synchronized with code changes**
- **Update documentation immediately when implementing new features**
- **Remove outdated information and consolidate related changes**
- **Ensure all CLI options and features are documented in both files**

## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **All Videos Download Mode**

- **New `--all-videos` parameter**: Download all videos from a channel, not just songlist matches
- **Smart MP3/MP4 detection**: Automatically detects if you have MP3 versions in songs.json and downloads MP4 video versions
- **Existing file skipping**: Skips videos that already exist on the filesystem
- **Progress tracking**: Shows clear progress with "Downloading X/Y videos" format
- **Parallel processing support**: Works with `--parallel --workers N` for faster downloads
- **Channel focus integration**: Works with `--channel-focus` to target specific channels
- **Limit support**: Works with `--limit N` to control download batch size

### **Smart Songlist Integration**

- **MP4 version detection**: Checks if MP4 version already exists in songs.json before downloading
- **MP3 upgrade path**: Downloads MP4 video versions when only MP3 versions exist in songlist
- **Duplicate prevention**: Skips downloads when MP4 versions already exist
- **Efficient filtering**: Only processes videos that need to be downloaded

### **Benefits of All Videos Mode**

- **Complete channel downloads**: Download entire channels without songlist restrictions
- **Automatic format upgrading**: Upgrade MP3 collections to MP4 video versions
- **Efficient processing**: Only downloads videos that don't already exist
- **Flexible control**: Use with limits, parallel processing, and channel targeting
- **Clear progress feedback**: Real-time progress tracking for large downloads