Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>

2025-07-27 22:49:35 -05:00 · 2025-07-27 22:49:35 -05:00 · 81b3d2d88c
commit 81b3d2d88c
parent 95a49bf39e
8 changed files with 195 additions and 31 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -0,0 +1,81 @@
 # Changelog
 ## [v3.4.1] - 2025-01-27
 ### 🐛 Bug Fixes
 - **Fixed --limit parameter behavior**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution. When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists.
 - **Fixed --limit logging accuracy**: The logging messages now accurately reflect the number of songs that will actually be processed when using `--limit`, rather than showing counts for all songs in the songlist.
 - **Resolved import conflicts**: Fixed inconsistencies between different `extract_artist_title` implementations across modules.
 ### ✨ Enhancements
 - **Enhanced fuzzy matching**: Improved `extract_artist_title` function in `fuzzy_matcher.py` to handle multiple video title formats:
  - `"Artist - Title"` format: "38 Special - Hold On Loosely"
  - `"Title Karaoke | Artist Karaoke Version"` format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
  - `"Title Artist KARAOKE"` format: "Hold On Loosely 38 Special KARAOKE"
 - **Consolidated parsing logic**: Removed duplicate `extract_artist_title` implementations and centralized all parsing logic in `fuzzy_matcher.py`
 - **Better matching accuracy**: Reduced false negatives for songs with non-standard title formats commonly found on YouTube karaoke channels
 ### 🔧 Code Quality
 - **Eliminated code duplication**: Removed duplicate `extract_artist_title` functions from `id3_utils.py` and `download_planner.py`
 - **Single source of truth**: All modules now import `extract_artist_title` from `fuzzy_matcher.py` for consistent behavior
 - **Enhanced documentation**: Added comprehensive docstrings and examples to the `extract_artist_title` function
 - **Improved maintainability**: Changes to parsing logic now only need to be made in one place
 ### 📚 Documentation
 - **Updated PRD.md**: Added section documenting recent bug fixes and improvements
 - **Updated README.md**: Enhanced feature descriptions and added recent improvements section
 - **Enhanced code comments**: Added explanatory comments for the --limit fix and import changes
 ### 🧪 Testing
 - **Verified functionality**: Successfully tested the enhanced fuzzy matching with real-world examples
 - **Confirmed performance improvements**: Validated that the --limit parameter now works as expected
 ---
 ## [v3.4.0] - 2025-01-XX
 ### ✨ New Features
 - **Parallel downloads**: Enable concurrent downloads with `--parallel --workers N` for significantly faster batch downloads (3-5x speedup)
 - **Thread-safe operations**: All tracking, caching, and progress operations are thread-safe
 - **Automatic retry mechanism**: Failed downloads are automatically retried with reduced concurrency
 ### 🔧 Improvements
 - **New parallel downloader module**: `parallel_downloader.py` provides thread-safe concurrent download management
 - **Configurable concurrency**: Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
 - **Real-time progress tracking**: Shows active downloads, completion status, and overall progress
 - **Backward compatibility**: Sequential downloads remain the default when `--parallel` is not used
 - **Integrated with all modes**: Works with both songlist-across-channels and latest-per-channel download modes
 ---
 ## [v3.3.0] - 2025-01-XX
 ### ✨ New Features
 - **Centralized file operations**: `file_utils.py` provides single source of truth for filename handling and file validation
 - **Centralized song validation**: `song_validator.py` provides unified logic for checking if songs should be downloaded
 - **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
 ### 🔧 Improvements
 - **Eliminated code duplication**: ~150 lines of duplicate code removed across modules
 - **Enhanced type safety**: Comprehensive type hints across all new modules
 - **Better error handling**: Consistent patterns via centralized utilities
 - **Improved maintainability**: Changes to file operations or song validation only require updates in one place
 ---
 ## [v3.2.0] - 2025-01-XX
 ### ✨ New Features
 - **Download plan pre-scan**: Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats
 - **Latest-per-channel plan**: Download the latest N videos from each channel, with a per-channel plan and robust resume
 - **Fast mode with early exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found
 - **Deduplication across channels**: Tracks unique song keys to ensure the same song is not downloaded from multiple channels
 - **Fuzzy matching**: Uses string similarity algorithms to find approximate matches between songlist entries and video titles
 - **Default channel file**: Automatically uses data/channels.txt as the default channel list for songlist modes
 ### 🔧 Improvements
 - **Centralized yt-dlp command generation**: Standardized command building and execution across all download operations
 - **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
 - **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
 - **Optimized scanning algorithm**: High-performance channel scanning with O(n×m) complexity and pre-processed lookups
 - **Robust interruption handling**: Progress is saved after each download, preventing re-downloads if the process is interrupted 
--- a/PRD.md
+++ b/PRD.md
@ -191,7 +191,7 @@ KaroakeVideoDownloader/
 - `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
 - `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
 - `--parallel`: **Enable parallel downloads for improved speed**
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3)**
+- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
 ---
@ -255,7 +255,7 @@ The codebase has been comprehensively refactored to improve maintainability and
 ### **New Parallel Download System (v3.4)**
 - **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
+- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
 - **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
 - **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
 - **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
@ -271,8 +271,36 @@ The codebase has been comprehensively refactored to improve maintainability and
 - [ ] Download scheduling and retry logic
 - [ ] More granular status reporting
 - [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
 - [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
 - [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
 - [ ] Unit tests for all modules
 - [ ] Integration tests for end-to-end workflows
 - [ ] Plugin system for custom file operations
 - [ ] Advanced configuration UI
 - [ ] Real-time download progress visualization
 ## 🔧 Recent Bug Fixes & Improvements (v3.4.1)
 ### **Enhanced Fuzzy Matching (v3.4.1)**
 - **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
  - **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
  - **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
  - **Fallback handling**: Returns empty artist and full title for unparseable formats
 - **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
  - **Single source of truth**: All modules now import from `fuzzy_matcher.py`
  - **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
  - **Better maintainability**: Changes to parsing logic only need to be made in one place
 ### **Fixed Import Conflicts**
 - **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
 - **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency
 ### **Enhanced --limit Parameter**
 - **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
 - **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists
 ### **Benefits of Recent Improvements**
 - **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
 - **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
 - **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
 - **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
 - **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable
--- a/README.md
+++ b/README.md
@ -13,7 +13,7 @@ A Python-based Windows CLI tool to download karaoke videos from YouTube channels
 - 📈 **Real-Time Progress**: Detailed console and log output
 - 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
 - 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
+- 🧩 **Enhanced Fuzzy Matching**: Advanced fuzzy string matching for songlist-to-video matching with improved video title parsing (handles multiple title formats like "Title Karaoke | Artist Karaoke Version")
 - ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
 - 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
 - 📋 **Default Channel File**: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
@ -80,6 +80,25 @@ The codebase has been comprehensively refactored into a modular architecture wit
 - **Testability**: Modular components can be tested independently
 - **Type Safety**: Comprehensive type hints across all new modules
 ## 🔧 Recent Improvements (v3.4.1)
 ### **Enhanced Fuzzy Matching**
 - **Improved video title parsing**: The `extract_artist_title` function now handles multiple title formats:
  - `"Title Karaoke | Artist Karaoke Version"` → Artist: "38 Special", Title: "Hold On Loosely"
  - `"Title Artist KARAOKE"` → Attempts to extract artist from complex titles
  - `"Artist - Title"` → Standard format (unchanged)
 - **Consolidated parsing logic**: All modules now use the same `extract_artist_title` function from `fuzzy_matcher.py`
 - **Better matching accuracy**: Reduced false negatives for songs with non-standard title formats
 ### **Fixed --limit Parameter**
 - **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
 - **Improved performance**: When using `--limit N`, only the first N songs are scanned, significantly reducing processing time
 - **Accurate logging**: Logging messages now show the correct counts for songs that will actually be processed when using `--limit`
 ### **Code Quality Improvements**
 - **Eliminated duplicate functions**: Removed duplicate `extract_artist_title` implementations
 - **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
 - **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`
 ## 📋 Requirements
 - **Windows 10/11**
 - **Python 3.7+**
@ -104,7 +123,7 @@ python download_karaoke.py --songlist-only --limit 5
 ### Download with Parallel Processing
 ```bash
-python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
+python download_karaoke.py --parallel --songlist-only --limit 10
 ```
 ### Focus on Specific Playlists by Title
@ -272,8 +291,8 @@ KaroakeVideoDownloader/
 - `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
 - `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
 - `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
- `--parallel`: Enable parallel downloads for improved speed
+- `--parallel`: Enable parallel downloads for improved speed (defaults to 3 workers)
- `--workers <N>`: Number of parallel download workers (1-10, default: 3)
+- `--workers <N>`: Number of parallel download workers (1-10, default: 3, only used with --parallel)
 - `--generate-songlist <DIR1> <DIR2>...`: **Generate song list from MP4 files with ID3 tags in specified directories**
 - `--no-append-songlist`: **Create a new song list instead of appending when using --generate-songlist**
 - `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
@ -287,10 +306,10 @@ KaroakeVideoDownloader/
 python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
 # Parallel downloads for faster processing
-python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
+python download_karaoke.py --parallel --songlist-only --limit 10
 # Latest videos per channel with parallel downloads
-python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
+python download_karaoke.py --parallel --latest-per-channel --limit 5
 # Traditional full scan (no limit)
 python download_karaoke.py --songlist-only
@ -388,7 +407,7 @@ The codebase has been comprehensively refactored to improve maintainability and
 ### **New Parallel Download System (v3.4)**
 - **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
+- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
 - **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
 - **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
 - **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
--- a/karaoke_downloader/cli.py
+++ b/karaoke_downloader/cli.py
@ -194,13 +194,13 @@ Examples:
    parser.add_argument(
        "--parallel",
        action="store_true",
-        help="Enable parallel downloads for improved speed (3-5x faster for large batches)",
+        help="Enable parallel downloads for improved speed (3-5x faster for large batches, defaults to 3 workers)",
    )
    parser.add_argument(
        "--workers",
        type=int,
        default=3,
-        help="Number of parallel download workers (default: 3, max: 10)",
+        help="Number of parallel download workers (default: 3, max: 10, only used with --parallel)",
    )
    parser.add_argument(
        "--generate-songlist",
--- a/karaoke_downloader/download_planner.py
+++ b/karaoke_downloader/download_planner.py
@ -17,6 +17,8 @@ from karaoke_downloader.cache_manager import (
    load_cached_plan,
    save_plan_cache,
 )
 # Import all fuzzy matching functions including the enhanced extract_artist_title
 # This ensures consistent parsing across all modules and supports multiple video title formats
 from karaoke_downloader.fuzzy_matcher import (
    create_song_key,
    create_video_key,
--- a/karaoke_downloader/downloader.py
+++ b/karaoke_downloader/downloader.py
@ -458,26 +458,33 @@ class KaraokeDownloader:
                not_on_server.append(song)
-            if server_available_mp4 > 0:
+        # Apply limit to undownloaded list before logging
-                print(
+        # This ensures that only the specified number of songs are processed and logged,
-                    f"\n🎵 {server_available_mp4} songs already available as MP4 on server, skipping."
+        # providing accurate counts when using --limit
-                )
+        if limit is not None:
-            if server_available_other > 0:
+            original_count = len(not_on_server)
-                print(
+            not_on_server = not_on_server[:limit]
-                    f"\n🎵 {server_available_other} songs found on server as MP3/CDG, will download video versions."
+            print(f"\n🎯 Limited to first {limit} songs (was {original_count} total)")
                )
            if marked_duplicates > 0:
                print(
                    f"\n🏷️ {marked_duplicates} songs previously marked as server duplicates, skipping."
                )
        undownloaded = not_on_server
-        # Apply limit to undownloaded list before scanning
+        # Now log the counts based on the limited list
-        if limit is not None:
+        if server_available_mp4 > 0:
-            original_count = len(undownloaded)
+            print(
-            undownloaded = undownloaded[:limit]
+                f"\n🎵 {server_available_mp4} songs already available as MP4 on server, skipping."
-            print(f"\n🎯 Limited to first {limit} songs (was {original_count} total)")
+            )
        if server_available_other > 0:
            # Only count songs that are in the limited list
            limited_server_other = sum(1 for song in not_on_server 
                                     if f"{song['artist'].lower()}_{normalize_title(song['title'])}" in server_songs)
            if limited_server_other > 0:
                print(
                    f"\n🎵 {limited_server_other} songs found on server as MP3/CDG, will download video versions."
                )
        if marked_duplicates > 0:
            print(
                f"\n🏷️ {marked_duplicates} songs previously marked as server duplicates, skipping."
            )
        print(f"\n🎯 {len(undownloaded)} songs need to be downloaded.")
        if not undownloaded:
--- a/karaoke_downloader/fuzzy_matcher.py
+++ b/karaoke_downloader/fuzzy_matcher.py
@ -32,8 +32,33 @@ def normalize_title(title):
 def extract_artist_title(video_title):
-    """Extract artist and title from video title."""
+    """
-    # Handle "Title - Artist" format
+    Extract artist and title from video title.
    This function handles multiple common video title formats found on YouTube karaoke channels:
    1. "Artist - Title" format: "38 Special - Hold On Loosely"
    2. "Title Karaoke | Artist Karaoke Version" format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
    3. "Title Artist KARAOKE" format: "Hold On Loosely 38 Special KARAOKE"
    Args:
        video_title (str): The YouTube video title to parse
    Returns:
        tuple: (artist, title) where artist and title are strings. If parsing fails,
               artist will be empty string and title will be the full video title.
    Examples:
        >>> extract_artist_title("38 Special - Hold On Loosely")
        ("38 Special", "Hold On Loosely")
        >>> extract_artist_title("Hold On Loosely Karaoke | 38 Special Karaoke Version")
        ("38 Special", "Hold On Loosely")
        >>> extract_artist_title("Unknown Format Video Title")
        ("", "Unknown Format Video Title")
    """
    # Handle "Artist - Title" format
    if " - " in video_title:
        parts = video_title.split(" - ", 1)
        return parts[0].strip(), parts[1].strip()
--- a/karaoke_downloader/id3_utils.py
+++ b/karaoke_downloader/id3_utils.py
@ -31,6 +31,8 @@ def clean_channel_name(channel_name: str) -> str:
    return "Unknown"
 # Import the enhanced extract_artist_title function from fuzzy_matcher.py
 # This ensures consistent parsing across all modules and supports multiple video title formats
 from karaoke_downloader.fuzzy_matcher import extract_artist_title