Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>
This commit is contained in:
parent
95a49bf39e
commit
81b3d2d88c
81
CHANGELOG.md
Normal file
81
CHANGELOG.md
Normal file
@ -0,0 +1,81 @@
|
|||||||
|
# Changelog
|
||||||
|
|
||||||
|
## [v3.4.1] - 2025-01-27
|
||||||
|
|
||||||
|
### 🐛 Bug Fixes
|
||||||
|
- **Fixed --limit parameter behavior**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution. When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists.
|
||||||
|
- **Fixed --limit logging accuracy**: The logging messages now accurately reflect the number of songs that will actually be processed when using `--limit`, rather than showing counts for all songs in the songlist.
|
||||||
|
- **Resolved import conflicts**: Fixed inconsistencies between different `extract_artist_title` implementations across modules.
|
||||||
|
|
||||||
|
### ✨ Enhancements
|
||||||
|
- **Enhanced fuzzy matching**: Improved `extract_artist_title` function in `fuzzy_matcher.py` to handle multiple video title formats:
|
||||||
|
- `"Artist - Title"` format: "38 Special - Hold On Loosely"
|
||||||
|
- `"Title Karaoke | Artist Karaoke Version"` format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
|
||||||
|
- `"Title Artist KARAOKE"` format: "Hold On Loosely 38 Special KARAOKE"
|
||||||
|
- **Consolidated parsing logic**: Removed duplicate `extract_artist_title` implementations and centralized all parsing logic in `fuzzy_matcher.py`
|
||||||
|
- **Better matching accuracy**: Reduced false negatives for songs with non-standard title formats commonly found on YouTube karaoke channels
|
||||||
|
|
||||||
|
### 🔧 Code Quality
|
||||||
|
- **Eliminated code duplication**: Removed duplicate `extract_artist_title` functions from `id3_utils.py` and `download_planner.py`
|
||||||
|
- **Single source of truth**: All modules now import `extract_artist_title` from `fuzzy_matcher.py` for consistent behavior
|
||||||
|
- **Enhanced documentation**: Added comprehensive docstrings and examples to the `extract_artist_title` function
|
||||||
|
- **Improved maintainability**: Changes to parsing logic now only need to be made in one place
|
||||||
|
|
||||||
|
### 📚 Documentation
|
||||||
|
- **Updated PRD.md**: Added section documenting recent bug fixes and improvements
|
||||||
|
- **Updated README.md**: Enhanced feature descriptions and added recent improvements section
|
||||||
|
- **Enhanced code comments**: Added explanatory comments for the --limit fix and import changes
|
||||||
|
|
||||||
|
### 🧪 Testing
|
||||||
|
- **Verified functionality**: Successfully tested the enhanced fuzzy matching with real-world examples
|
||||||
|
- **Confirmed performance improvements**: Validated that the --limit parameter now works as expected
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## [v3.4.0] - 2025-01-XX
|
||||||
|
|
||||||
|
### ✨ New Features
|
||||||
|
- **Parallel downloads**: Enable concurrent downloads with `--parallel --workers N` for significantly faster batch downloads (3-5x speedup)
|
||||||
|
- **Thread-safe operations**: All tracking, caching, and progress operations are thread-safe
|
||||||
|
- **Automatic retry mechanism**: Failed downloads are automatically retried with reduced concurrency
|
||||||
|
|
||||||
|
### 🔧 Improvements
|
||||||
|
- **New parallel downloader module**: `parallel_downloader.py` provides thread-safe concurrent download management
|
||||||
|
- **Configurable concurrency**: Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
|
||||||
|
- **Real-time progress tracking**: Shows active downloads, completion status, and overall progress
|
||||||
|
- **Backward compatibility**: Sequential downloads remain the default when `--parallel` is not used
|
||||||
|
- **Integrated with all modes**: Works with both songlist-across-channels and latest-per-channel download modes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## [v3.3.0] - 2025-01-XX
|
||||||
|
|
||||||
|
### ✨ New Features
|
||||||
|
- **Centralized file operations**: `file_utils.py` provides single source of truth for filename handling and file validation
|
||||||
|
- **Centralized song validation**: `song_validator.py` provides unified logic for checking if songs should be downloaded
|
||||||
|
- **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
|
||||||
|
|
||||||
|
### 🔧 Improvements
|
||||||
|
- **Eliminated code duplication**: ~150 lines of duplicate code removed across modules
|
||||||
|
- **Enhanced type safety**: Comprehensive type hints across all new modules
|
||||||
|
- **Better error handling**: Consistent patterns via centralized utilities
|
||||||
|
- **Improved maintainability**: Changes to file operations or song validation only require updates in one place
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## [v3.2.0] - 2025-01-XX
|
||||||
|
|
||||||
|
### ✨ New Features
|
||||||
|
- **Download plan pre-scan**: Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats
|
||||||
|
- **Latest-per-channel plan**: Download the latest N videos from each channel, with a per-channel plan and robust resume
|
||||||
|
- **Fast mode with early exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found
|
||||||
|
- **Deduplication across channels**: Tracks unique song keys to ensure the same song is not downloaded from multiple channels
|
||||||
|
- **Fuzzy matching**: Uses string similarity algorithms to find approximate matches between songlist entries and video titles
|
||||||
|
- **Default channel file**: Automatically uses data/channels.txt as the default channel list for songlist modes
|
||||||
|
|
||||||
|
### 🔧 Improvements
|
||||||
|
- **Centralized yt-dlp command generation**: Standardized command building and execution across all download operations
|
||||||
|
- **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
|
||||||
|
- **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
|
||||||
|
- **Optimized scanning algorithm**: High-performance channel scanning with O(n×m) complexity and pre-processed lookups
|
||||||
|
- **Robust interruption handling**: Progress is saved after each download, preventing re-downloads if the process is interrupted
|
||||||
32
PRD.md
32
PRD.md
@ -191,7 +191,7 @@ KaroakeVideoDownloader/
|
|||||||
- `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
|
- `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
|
||||||
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
|
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
|
||||||
- `--parallel`: **Enable parallel downloads for improved speed**
|
- `--parallel`: **Enable parallel downloads for improved speed**
|
||||||
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3)**
|
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -255,7 +255,7 @@ The codebase has been comprehensively refactored to improve maintainability and
|
|||||||
|
|
||||||
### **New Parallel Download System (v3.4)**
|
### **New Parallel Download System (v3.4)**
|
||||||
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
|
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
|
||||||
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
|
- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
|
||||||
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
|
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
|
||||||
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
|
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
|
||||||
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
|
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
|
||||||
@ -271,8 +271,36 @@ The codebase has been comprehensively refactored to improve maintainability and
|
|||||||
- [ ] Download scheduling and retry logic
|
- [ ] Download scheduling and retry logic
|
||||||
- [ ] More granular status reporting
|
- [ ] More granular status reporting
|
||||||
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
|
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
|
||||||
|
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
|
||||||
|
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
|
||||||
- [ ] Unit tests for all modules
|
- [ ] Unit tests for all modules
|
||||||
- [ ] Integration tests for end-to-end workflows
|
- [ ] Integration tests for end-to-end workflows
|
||||||
- [ ] Plugin system for custom file operations
|
- [ ] Plugin system for custom file operations
|
||||||
- [ ] Advanced configuration UI
|
- [ ] Advanced configuration UI
|
||||||
- [ ] Real-time download progress visualization
|
- [ ] Real-time download progress visualization
|
||||||
|
|
||||||
|
## 🔧 Recent Bug Fixes & Improvements (v3.4.1)
|
||||||
|
### **Enhanced Fuzzy Matching (v3.4.1)**
|
||||||
|
- **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
|
||||||
|
- **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
|
||||||
|
- **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
|
||||||
|
- **Fallback handling**: Returns empty artist and full title for unparseable formats
|
||||||
|
- **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
|
||||||
|
- **Single source of truth**: All modules now import from `fuzzy_matcher.py`
|
||||||
|
- **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
|
||||||
|
- **Better maintainability**: Changes to parsing logic only need to be made in one place
|
||||||
|
|
||||||
|
### **Fixed Import Conflicts**
|
||||||
|
- **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
|
||||||
|
- **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency
|
||||||
|
|
||||||
|
### **Enhanced --limit Parameter**
|
||||||
|
- **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
|
||||||
|
- **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists
|
||||||
|
|
||||||
|
### **Benefits of Recent Improvements**
|
||||||
|
- **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
|
||||||
|
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
|
||||||
|
- **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
|
||||||
|
- **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
|
||||||
|
- **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable
|
||||||
|
|||||||
33
README.md
33
README.md
@ -13,7 +13,7 @@ A Python-based Windows CLI tool to download karaoke videos from YouTube channels
|
|||||||
- 📈 **Real-Time Progress**: Detailed console and log output
|
- 📈 **Real-Time Progress**: Detailed console and log output
|
||||||
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
|
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
|
||||||
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
|
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
|
||||||
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
|
- 🧩 **Enhanced Fuzzy Matching**: Advanced fuzzy string matching for songlist-to-video matching with improved video title parsing (handles multiple title formats like "Title Karaoke | Artist Karaoke Version")
|
||||||
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
|
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
|
||||||
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
|
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
|
||||||
- 📋 **Default Channel File**: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
|
- 📋 **Default Channel File**: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
|
||||||
@ -80,6 +80,25 @@ The codebase has been comprehensively refactored into a modular architecture wit
|
|||||||
- **Testability**: Modular components can be tested independently
|
- **Testability**: Modular components can be tested independently
|
||||||
- **Type Safety**: Comprehensive type hints across all new modules
|
- **Type Safety**: Comprehensive type hints across all new modules
|
||||||
|
|
||||||
|
## 🔧 Recent Improvements (v3.4.1)
|
||||||
|
### **Enhanced Fuzzy Matching**
|
||||||
|
- **Improved video title parsing**: The `extract_artist_title` function now handles multiple title formats:
|
||||||
|
- `"Title Karaoke | Artist Karaoke Version"` → Artist: "38 Special", Title: "Hold On Loosely"
|
||||||
|
- `"Title Artist KARAOKE"` → Attempts to extract artist from complex titles
|
||||||
|
- `"Artist - Title"` → Standard format (unchanged)
|
||||||
|
- **Consolidated parsing logic**: All modules now use the same `extract_artist_title` function from `fuzzy_matcher.py`
|
||||||
|
- **Better matching accuracy**: Reduced false negatives for songs with non-standard title formats
|
||||||
|
|
||||||
|
### **Fixed --limit Parameter**
|
||||||
|
- **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
|
||||||
|
- **Improved performance**: When using `--limit N`, only the first N songs are scanned, significantly reducing processing time
|
||||||
|
- **Accurate logging**: Logging messages now show the correct counts for songs that will actually be processed when using `--limit`
|
||||||
|
|
||||||
|
### **Code Quality Improvements**
|
||||||
|
- **Eliminated duplicate functions**: Removed duplicate `extract_artist_title` implementations
|
||||||
|
- **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
|
||||||
|
- **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`
|
||||||
|
|
||||||
## 📋 Requirements
|
## 📋 Requirements
|
||||||
- **Windows 10/11**
|
- **Windows 10/11**
|
||||||
- **Python 3.7+**
|
- **Python 3.7+**
|
||||||
@ -104,7 +123,7 @@ python download_karaoke.py --songlist-only --limit 5
|
|||||||
|
|
||||||
### Download with Parallel Processing
|
### Download with Parallel Processing
|
||||||
```bash
|
```bash
|
||||||
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
|
python download_karaoke.py --parallel --songlist-only --limit 10
|
||||||
```
|
```
|
||||||
|
|
||||||
### Focus on Specific Playlists by Title
|
### Focus on Specific Playlists by Title
|
||||||
@ -272,8 +291,8 @@ KaroakeVideoDownloader/
|
|||||||
- `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
|
- `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
|
||||||
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
|
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
|
||||||
- `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
|
- `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
|
||||||
- `--parallel`: Enable parallel downloads for improved speed
|
- `--parallel`: Enable parallel downloads for improved speed (defaults to 3 workers)
|
||||||
- `--workers <N>`: Number of parallel download workers (1-10, default: 3)
|
- `--workers <N>`: Number of parallel download workers (1-10, default: 3, only used with --parallel)
|
||||||
- `--generate-songlist <DIR1> <DIR2>...`: **Generate song list from MP4 files with ID3 tags in specified directories**
|
- `--generate-songlist <DIR1> <DIR2>...`: **Generate song list from MP4 files with ID3 tags in specified directories**
|
||||||
- `--no-append-songlist`: **Create a new song list instead of appending when using --generate-songlist**
|
- `--no-append-songlist`: **Create a new song list instead of appending when using --generate-songlist**
|
||||||
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
|
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
|
||||||
@ -287,10 +306,10 @@ KaroakeVideoDownloader/
|
|||||||
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
|
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
|
||||||
|
|
||||||
# Parallel downloads for faster processing
|
# Parallel downloads for faster processing
|
||||||
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
|
python download_karaoke.py --parallel --songlist-only --limit 10
|
||||||
|
|
||||||
# Latest videos per channel with parallel downloads
|
# Latest videos per channel with parallel downloads
|
||||||
python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
|
python download_karaoke.py --parallel --latest-per-channel --limit 5
|
||||||
|
|
||||||
# Traditional full scan (no limit)
|
# Traditional full scan (no limit)
|
||||||
python download_karaoke.py --songlist-only
|
python download_karaoke.py --songlist-only
|
||||||
@ -388,7 +407,7 @@ The codebase has been comprehensively refactored to improve maintainability and
|
|||||||
|
|
||||||
### **New Parallel Download System (v3.4)**
|
### **New Parallel Download System (v3.4)**
|
||||||
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
|
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
|
||||||
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
|
- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
|
||||||
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
|
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
|
||||||
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
|
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
|
||||||
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
|
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
|
||||||
|
|||||||
@ -194,13 +194,13 @@ Examples:
|
|||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--parallel",
|
"--parallel",
|
||||||
action="store_true",
|
action="store_true",
|
||||||
help="Enable parallel downloads for improved speed (3-5x faster for large batches)",
|
help="Enable parallel downloads for improved speed (3-5x faster for large batches, defaults to 3 workers)",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--workers",
|
"--workers",
|
||||||
type=int,
|
type=int,
|
||||||
default=3,
|
default=3,
|
||||||
help="Number of parallel download workers (default: 3, max: 10)",
|
help="Number of parallel download workers (default: 3, max: 10, only used with --parallel)",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--generate-songlist",
|
"--generate-songlist",
|
||||||
|
|||||||
@ -17,6 +17,8 @@ from karaoke_downloader.cache_manager import (
|
|||||||
load_cached_plan,
|
load_cached_plan,
|
||||||
save_plan_cache,
|
save_plan_cache,
|
||||||
)
|
)
|
||||||
|
# Import all fuzzy matching functions including the enhanced extract_artist_title
|
||||||
|
# This ensures consistent parsing across all modules and supports multiple video title formats
|
||||||
from karaoke_downloader.fuzzy_matcher import (
|
from karaoke_downloader.fuzzy_matcher import (
|
||||||
create_song_key,
|
create_song_key,
|
||||||
create_video_key,
|
create_video_key,
|
||||||
|
|||||||
@ -458,26 +458,33 @@ class KaraokeDownloader:
|
|||||||
|
|
||||||
not_on_server.append(song)
|
not_on_server.append(song)
|
||||||
|
|
||||||
if server_available_mp4 > 0:
|
# Apply limit to undownloaded list before logging
|
||||||
print(
|
# This ensures that only the specified number of songs are processed and logged,
|
||||||
f"\n🎵 {server_available_mp4} songs already available as MP4 on server, skipping."
|
# providing accurate counts when using --limit
|
||||||
)
|
if limit is not None:
|
||||||
if server_available_other > 0:
|
original_count = len(not_on_server)
|
||||||
print(
|
not_on_server = not_on_server[:limit]
|
||||||
f"\n🎵 {server_available_other} songs found on server as MP3/CDG, will download video versions."
|
print(f"\n🎯 Limited to first {limit} songs (was {original_count} total)")
|
||||||
)
|
|
||||||
if marked_duplicates > 0:
|
|
||||||
print(
|
|
||||||
f"\n🏷️ {marked_duplicates} songs previously marked as server duplicates, skipping."
|
|
||||||
)
|
|
||||||
|
|
||||||
undownloaded = not_on_server
|
undownloaded = not_on_server
|
||||||
|
|
||||||
# Apply limit to undownloaded list before scanning
|
# Now log the counts based on the limited list
|
||||||
if limit is not None:
|
if server_available_mp4 > 0:
|
||||||
original_count = len(undownloaded)
|
print(
|
||||||
undownloaded = undownloaded[:limit]
|
f"\n🎵 {server_available_mp4} songs already available as MP4 on server, skipping."
|
||||||
print(f"\n🎯 Limited to first {limit} songs (was {original_count} total)")
|
)
|
||||||
|
if server_available_other > 0:
|
||||||
|
# Only count songs that are in the limited list
|
||||||
|
limited_server_other = sum(1 for song in not_on_server
|
||||||
|
if f"{song['artist'].lower()}_{normalize_title(song['title'])}" in server_songs)
|
||||||
|
if limited_server_other > 0:
|
||||||
|
print(
|
||||||
|
f"\n🎵 {limited_server_other} songs found on server as MP3/CDG, will download video versions."
|
||||||
|
)
|
||||||
|
if marked_duplicates > 0:
|
||||||
|
print(
|
||||||
|
f"\n🏷️ {marked_duplicates} songs previously marked as server duplicates, skipping."
|
||||||
|
)
|
||||||
|
|
||||||
print(f"\n🎯 {len(undownloaded)} songs need to be downloaded.")
|
print(f"\n🎯 {len(undownloaded)} songs need to be downloaded.")
|
||||||
if not undownloaded:
|
if not undownloaded:
|
||||||
|
|||||||
@ -32,8 +32,33 @@ def normalize_title(title):
|
|||||||
|
|
||||||
|
|
||||||
def extract_artist_title(video_title):
|
def extract_artist_title(video_title):
|
||||||
"""Extract artist and title from video title."""
|
"""
|
||||||
# Handle "Title - Artist" format
|
Extract artist and title from video title.
|
||||||
|
|
||||||
|
This function handles multiple common video title formats found on YouTube karaoke channels:
|
||||||
|
|
||||||
|
1. "Artist - Title" format: "38 Special - Hold On Loosely"
|
||||||
|
2. "Title Karaoke | Artist Karaoke Version" format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
|
||||||
|
3. "Title Artist KARAOKE" format: "Hold On Loosely 38 Special KARAOKE"
|
||||||
|
|
||||||
|
Args:
|
||||||
|
video_title (str): The YouTube video title to parse
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
tuple: (artist, title) where artist and title are strings. If parsing fails,
|
||||||
|
artist will be empty string and title will be the full video title.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
>>> extract_artist_title("38 Special - Hold On Loosely")
|
||||||
|
("38 Special", "Hold On Loosely")
|
||||||
|
|
||||||
|
>>> extract_artist_title("Hold On Loosely Karaoke | 38 Special Karaoke Version")
|
||||||
|
("38 Special", "Hold On Loosely")
|
||||||
|
|
||||||
|
>>> extract_artist_title("Unknown Format Video Title")
|
||||||
|
("", "Unknown Format Video Title")
|
||||||
|
"""
|
||||||
|
# Handle "Artist - Title" format
|
||||||
if " - " in video_title:
|
if " - " in video_title:
|
||||||
parts = video_title.split(" - ", 1)
|
parts = video_title.split(" - ", 1)
|
||||||
return parts[0].strip(), parts[1].strip()
|
return parts[0].strip(), parts[1].strip()
|
||||||
|
|||||||
@ -31,6 +31,8 @@ def clean_channel_name(channel_name: str) -> str:
|
|||||||
return "Unknown"
|
return "Unknown"
|
||||||
|
|
||||||
|
|
||||||
|
# Import the enhanced extract_artist_title function from fuzzy_matcher.py
|
||||||
|
# This ensures consistent parsing across all modules and supports multiple video title formats
|
||||||
from karaoke_downloader.fuzzy_matcher import extract_artist_title
|
from karaoke_downloader.fuzzy_matcher import extract_artist_title
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user