Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>
This commit is contained in:
parent
409e66780c
commit
9f0787d00a
97
PRD.md
97
PRD.md
@ -415,6 +415,103 @@ The codebase has been comprehensively refactored to improve maintainability and
|
|||||||
- **Flexible control**: Use with limits, parallel processing, and channel targeting
|
- **Flexible control**: Use with limits, parallel processing, and channel targeting
|
||||||
- **Clear progress feedback**: Real-time progress tracking for large downloads
|
- **Clear progress feedback**: Real-time progress tracking for large downloads
|
||||||
|
|
||||||
|
## 🔧 Recent Bug Fixes & Improvements (v3.4.5)
|
||||||
|
### **Unified Download Workflow Architecture**
|
||||||
|
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
|
||||||
|
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
|
||||||
|
- **Centralized download logic**: Single `execute_unified_download_workflow()` method handles all download execution
|
||||||
|
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
|
||||||
|
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes
|
||||||
|
|
||||||
|
### **Architecture Pattern for New Download Modes**
|
||||||
|
When adding new download modes in the future, follow this pattern to ensure consistency:
|
||||||
|
|
||||||
|
#### **1. Download Plan Building (Mode-Specific)**
|
||||||
|
Each download mode should build a download plan (list of videos to download) with this structure:
|
||||||
|
```python
|
||||||
|
download_plan = [
|
||||||
|
{
|
||||||
|
"video_id": "video_id",
|
||||||
|
"artist": "artist_name",
|
||||||
|
"title": "song_title",
|
||||||
|
"filename": "sanitized_filename.mp4",
|
||||||
|
"channel_name": "channel_name",
|
||||||
|
"video_title": "original_video_title",
|
||||||
|
"force_download": False
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
#### **2. Unified Execution (Shared)**
|
||||||
|
All modes should use the unified execution workflow:
|
||||||
|
```python
|
||||||
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
|
download_plan=download_plan,
|
||||||
|
cache_file=cache_file, # Optional, for progress tracking
|
||||||
|
limit=limit, # Optional, for limiting downloads
|
||||||
|
show_progress=True, # Optional, for progress display
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
#### **3. Execution Method Selection (Automatic)**
|
||||||
|
The unified workflow automatically chooses execution method based on settings:
|
||||||
|
- **Sequential**: Uses `DownloadPipeline` for single-threaded downloads
|
||||||
|
- **Parallel**: Uses `ParallelDownloader` when `--parallel` is enabled
|
||||||
|
|
||||||
|
#### **4. Required Implementation Pattern**
|
||||||
|
```python
|
||||||
|
def download_new_mode(self, ...):
|
||||||
|
"""New download mode implementation."""
|
||||||
|
|
||||||
|
# 1. Build download plan (mode-specific logic)
|
||||||
|
download_plan = []
|
||||||
|
for video in videos_to_download:
|
||||||
|
download_plan.append({
|
||||||
|
"video_id": video["id"],
|
||||||
|
"artist": artist,
|
||||||
|
"title": title,
|
||||||
|
"filename": filename,
|
||||||
|
"channel_name": channel_name,
|
||||||
|
"video_title": video["title"],
|
||||||
|
"force_download": force_download
|
||||||
|
})
|
||||||
|
|
||||||
|
# 2. Create cache file (optional, for progress tracking)
|
||||||
|
cache_file = get_download_plan_cache_file("new_mode", **plan_kwargs)
|
||||||
|
save_plan_cache(cache_file, download_plan, [])
|
||||||
|
|
||||||
|
# 3. Use unified execution workflow
|
||||||
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
|
download_plan=download_plan,
|
||||||
|
cache_file=cache_file,
|
||||||
|
limit=limit,
|
||||||
|
show_progress=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
return success
|
||||||
|
```
|
||||||
|
|
||||||
|
### **Benefits of Unified Architecture**
|
||||||
|
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
|
||||||
|
- **Maintainability**: Changes to download execution only need to be made in one place
|
||||||
|
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
|
||||||
|
- **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
|
||||||
|
- **Testing**: Easier to test since all modes use the same execution logic
|
||||||
|
|
||||||
|
### **What Was Fixed**
|
||||||
|
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
|
||||||
|
- **Missing Method**: Added missing `download_latest_per_channel()` method that was referenced in CLI but not implemented
|
||||||
|
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
|
||||||
|
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management
|
||||||
|
|
||||||
|
### **Future Development Guidelines**
|
||||||
|
1. **NEVER implement custom download execution logic** in new download modes
|
||||||
|
2. **ALWAYS use `execute_unified_download_workflow()`** for download execution
|
||||||
|
3. **Focus on download plan building** - that's where mode-specific logic belongs
|
||||||
|
4. **Use the standard download plan structure** for consistency
|
||||||
|
5. **Implement cache file handling** for progress tracking and resume functionality
|
||||||
|
6. **Test with both sequential and parallel modes** to ensure compatibility
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🚀 Future Enhancements
|
## 🚀 Future Enhancements
|
||||||
|
|||||||
118
README.md
118
README.md
@ -49,47 +49,82 @@ The codebase has been comprehensively refactored into a modular architecture wit
|
|||||||
- **`tracking_cli.py`**: Tracking management CLI
|
- **`tracking_cli.py`**: Tracking management CLI
|
||||||
|
|
||||||
### New Utility Modules (v3.3):
|
### New Utility Modules (v3.3):
|
||||||
- **`parallel_downloader.py`**: Parallel download management with thread-safe operations
|
|
||||||
- `ParallelDownloader` class: Manages concurrent downloads with configurable workers
|
|
||||||
- `DownloadTask` and `DownloadResult` dataclasses: Structured task and result management
|
|
||||||
- Thread-safe progress tracking and error handling
|
|
||||||
- Automatic retry mechanism for failed downloads
|
|
||||||
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
|
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
|
||||||
- `sanitize_filename()`: Create safe filenames from artist/title
|
- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded
|
||||||
- `generate_possible_filenames()`: Generate filename patterns for different modes
|
|
||||||
- `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
|
|
||||||
- `is_valid_mp4_file()`: Validate MP4 files with header checking
|
|
||||||
- `cleanup_temp_files()`: Remove temporary yt-dlp files
|
|
||||||
- `ensure_directory_exists()`: Safe directory creation
|
|
||||||
|
|
||||||
- **`song_validator.py`**: Centralized song validation logic
|
### **Unified Download Workflow (v3.4.5)**
|
||||||
- `SongValidator` class: Unified logic for checking if songs should be downloaded
|
- **`execute_unified_download_workflow()`**: Centralized download execution that all modes use
|
||||||
- `should_skip_song()`: Comprehensive validation with multiple criteria
|
- **`_execute_sequential_downloads()`**: Sequential download execution using DownloadPipeline
|
||||||
- `mark_song_failed()`: Consistent failure tracking
|
- **`_execute_parallel_downloads()`**: Parallel download execution using ParallelDownloader
|
||||||
- `handle_download_failure()`: Standardized error handling
|
|
||||||
|
|
||||||
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
|
### **Benefits of Enhanced Modular Architecture:**
|
||||||
- `ConfigManager` class: Type-safe configuration loading and caching
|
- **Single Responsibility**: Each module has a focused purpose
|
||||||
- `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
|
|
||||||
- Configuration validation and merging with defaults
|
|
||||||
- Dynamic resolution updates
|
|
||||||
|
|
||||||
### Benefits:
|
|
||||||
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
|
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
|
||||||
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
|
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
|
||||||
|
- **Testability**: Individual components can be tested separately
|
||||||
|
- **Maintainability**: Easier to find and fix issues
|
||||||
|
- **Reusability**: Components can be used independently
|
||||||
|
- **Robustness**: Better error handling and interruption recovery
|
||||||
- **Consistency**: Standardized error messages and processing pipelines
|
- **Consistency**: Standardized error messages and processing pipelines
|
||||||
- **Maintainability**: Changes isolated to specific modules
|
|
||||||
- **Testability**: Modular components can be tested independently
|
|
||||||
- **Type Safety**: Comprehensive type hints across all new modules
|
- **Type Safety**: Comprehensive type hints across all new modules
|
||||||
|
- **Unified Execution**: All download modes use the same execution pipeline for consistency
|
||||||
|
|
||||||
|
## 🔧 Development Guidelines
|
||||||
|
|
||||||
|
### **Adding New Download Modes**
|
||||||
|
When adding new download modes, follow the unified workflow pattern to ensure consistency:
|
||||||
|
|
||||||
|
#### **1. Build Download Plan (Mode-Specific)**
|
||||||
|
```python
|
||||||
|
def download_new_mode(self, ...):
|
||||||
|
# Build download plan with standard structure
|
||||||
|
download_plan = []
|
||||||
|
for video in videos_to_download:
|
||||||
|
download_plan.append({
|
||||||
|
"video_id": video["id"],
|
||||||
|
"artist": artist,
|
||||||
|
"title": title,
|
||||||
|
"filename": filename,
|
||||||
|
"channel_name": channel_name,
|
||||||
|
"video_title": video["title"],
|
||||||
|
"force_download": force_download
|
||||||
|
})
|
||||||
|
|
||||||
|
# Use unified execution workflow
|
||||||
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
|
download_plan=download_plan,
|
||||||
|
cache_file=cache_file,
|
||||||
|
limit=limit,
|
||||||
|
show_progress=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
return success
|
||||||
|
```
|
||||||
|
|
||||||
|
#### **2. Key Principles**
|
||||||
|
- **NEVER implement custom download execution logic** - always use `execute_unified_download_workflow()`
|
||||||
|
- **Focus on download plan building** - that's where mode-specific logic belongs
|
||||||
|
- **Use the standard download plan structure** for consistency
|
||||||
|
- **Implement cache file handling** for progress tracking and resume functionality
|
||||||
|
- **Test with both sequential and parallel modes** to ensure compatibility
|
||||||
|
|
||||||
|
#### **3. Benefits of Unified Architecture**
|
||||||
|
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
|
||||||
|
- **Automatic Features**: New modes automatically get parallel downloads, progress tracking, and cache management
|
||||||
|
- **Maintainability**: Changes to download execution only need to be made in one place
|
||||||
|
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
|
||||||
|
|
||||||
## 🔧 Recent Improvements (v3.4.1)
|
## 🔧 Recent Improvements (v3.4.1)
|
||||||
### **Enhanced Fuzzy Matching**
|
### **Enhanced Fuzzy Matching**
|
||||||
- **Improved video title parsing**: The `extract_artist_title` function now handles multiple title formats:
|
- **Improved title parsing**: Enhanced `extract_artist_title` function to handle multiple video title formats
|
||||||
- `"Title Karaoke | Artist Karaoke Version"` → Artist: "38 Special", Title: "Hold On Loosely"
|
- **Better matching accuracy**: Can now parse titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
|
||||||
- `"Title Artist KARAOKE"` → Attempts to extract artist from complex titles
|
- **Consistent parsing**: All modules now use the same parsing logic from `fuzzy_matcher.py`
|
||||||
- `"Artist - Title"` → Standard format (unchanged)
|
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
|
||||||
- **Consolidated parsing logic**: All modules now use the same `extract_artist_title` function from `fuzzy_matcher.py`
|
|
||||||
- **Better matching accuracy**: Reduced false negatives for songs with non-standard title formats
|
### **Fixed Import Conflicts**
|
||||||
|
- **Resolved import conflicts**: Updated modules to use the enhanced `extract_artist_title` from `fuzzy_matcher.py`
|
||||||
|
- **Consistent behavior**: All parts of the system use the same parsing logic
|
||||||
|
- **Cleaner codebase**: Eliminated duplicate code and import conflicts
|
||||||
|
|
||||||
### **Fixed --limit Parameter**
|
### **Fixed --limit Parameter**
|
||||||
- **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
|
- **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
|
||||||
@ -101,6 +136,27 @@ The codebase has been comprehensively refactored into a modular architecture wit
|
|||||||
- **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
|
- **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
|
||||||
- **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`
|
- **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`
|
||||||
|
|
||||||
|
## 🔧 Recent Improvements (v3.4.5)
|
||||||
|
### **Unified Download Workflow Architecture**
|
||||||
|
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
|
||||||
|
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
|
||||||
|
- **Centralized download logic**: Single `execute_unified_download_workflow()` method handles all download execution
|
||||||
|
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
|
||||||
|
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes
|
||||||
|
|
||||||
|
### **What Was Fixed**
|
||||||
|
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
|
||||||
|
- **Missing Method**: Added missing `download_latest_per_channel()` method that was referenced in CLI but not implemented
|
||||||
|
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
|
||||||
|
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management
|
||||||
|
|
||||||
|
### **Benefits**
|
||||||
|
- ✅ **Consistency**: All modes behave identically for execution, progress tracking, and error handling
|
||||||
|
- ✅ **Maintainability**: Changes to download execution only need to be made in one place
|
||||||
|
- ✅ **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
|
||||||
|
- ✅ **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
|
||||||
|
- ✅ **Testing**: Easier to test since all modes use the same execution logic
|
||||||
|
|
||||||
## 🛡️ Duplicate File Prevention & Filename Consistency (v3.4.2)
|
## 🛡️ Duplicate File Prevention & Filename Consistency (v3.4.2)
|
||||||
### **Duplicate File Prevention**
|
### **Duplicate File Prevention**
|
||||||
- **Enhanced file existence checking**: Now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
|
- **Enhanced file existence checking**: Now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
|
||||||
|
|||||||
@ -443,43 +443,16 @@ class KaraokeDownloader:
|
|||||||
print(f" 🎯 Limit applied: {limit} videos")
|
print(f" 🎯 Limit applied: {limit} videos")
|
||||||
print(f" 📁 Output directory: downloads/{channel_name}/")
|
print(f" 📁 Output directory: downloads/{channel_name}/")
|
||||||
print(f" 💾 Download plan cached to: {cache_file.name}")
|
print(f" 💾 Download plan cached to: {cache_file.name}")
|
||||||
print(f"\n🎬 Starting downloads...")
|
|
||||||
|
|
||||||
# Download videos using the download pipeline
|
# Use unified download workflow
|
||||||
pipeline = DownloadPipeline(
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
yt_dlp_path=str(self.yt_dlp_path),
|
download_plan=download_plan,
|
||||||
config=self.config,
|
cache_file=cache_file,
|
||||||
downloads_dir=self.downloads_dir,
|
limit=limit,
|
||||||
songlist_tracking=self.songlist_tracking,
|
show_progress=True,
|
||||||
tracker=self.tracker,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
success_count = 0
|
return success
|
||||||
total_to_download = len(download_plan)
|
|
||||||
for i, plan_item in enumerate(download_plan[:], 1): # Use slice to create a copy for iteration
|
|
||||||
print(f"⬇️ Downloading {i}/{total_to_download}: {plan_item['artist']} - {plan_item['title']}")
|
|
||||||
|
|
||||||
if pipeline.execute_pipeline(
|
|
||||||
video_id=plan_item["video_id"],
|
|
||||||
artist=plan_item["artist"],
|
|
||||||
title=plan_item["title"],
|
|
||||||
channel_name=plan_item["channel_name"],
|
|
||||||
video_title=plan_item["video_title"],
|
|
||||||
):
|
|
||||||
success_count += 1
|
|
||||||
# Remove completed item from cache
|
|
||||||
download_plan.remove(plan_item) # Remove the current item
|
|
||||||
if download_plan: # If there are still items left
|
|
||||||
save_plan_cache(cache_file, download_plan, [])
|
|
||||||
print(f"🗑️ Removed completed item from download plan. {len(download_plan)} items remaining.")
|
|
||||||
else:
|
|
||||||
# All downloads completed, delete the cache file
|
|
||||||
from karaoke_downloader.cache_manager import delete_plan_cache
|
|
||||||
delete_plan_cache(cache_file)
|
|
||||||
print("🗑️ All downloads completed, deleted download plan cache.")
|
|
||||||
|
|
||||||
print(f"\n🎉 Download complete! {success_count}/{total_to_download} videos downloaded successfully")
|
|
||||||
return success_count > 0
|
|
||||||
|
|
||||||
def download_songlist_across_channels(
|
def download_songlist_across_channels(
|
||||||
self,
|
self,
|
||||||
@ -718,14 +691,134 @@ class KaraokeDownloader:
|
|||||||
print(f" ...and {len(unmatched)-DEFAULT_DISPLAY_LIMIT} more.")
|
print(f" ...and {len(unmatched)-DEFAULT_DISPLAY_LIMIT} more.")
|
||||||
|
|
||||||
# --- Download phase ---
|
# --- Download phase ---
|
||||||
downloaded_count, success = self.execute_download_plan_parallel(
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
download_plan=download_plan,
|
download_plan=download_plan,
|
||||||
unmatched=unmatched,
|
|
||||||
cache_file=cache_file,
|
cache_file=cache_file,
|
||||||
limit=limit,
|
limit=limit,
|
||||||
)
|
)
|
||||||
return success
|
return success
|
||||||
|
|
||||||
|
def download_latest_per_channel(
|
||||||
|
self,
|
||||||
|
channel_urls,
|
||||||
|
limit=None,
|
||||||
|
force_refresh_download_plan=False,
|
||||||
|
fuzzy_match=False,
|
||||||
|
fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
|
||||||
|
force_download=False,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Download the latest N videos from each channel.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
channel_urls: List of channel URLs to process
|
||||||
|
limit: Number of latest videos to download from each channel
|
||||||
|
force_refresh_download_plan: Force refresh the download plan cache
|
||||||
|
fuzzy_match: Whether to use fuzzy matching
|
||||||
|
fuzzy_threshold: Threshold for fuzzy matching
|
||||||
|
force_download: Force download regardless of existing files
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
bool: True if successful, False otherwise
|
||||||
|
"""
|
||||||
|
print(f"\n🎬 Downloading latest {limit} videos from {len(channel_urls)} channels")
|
||||||
|
|
||||||
|
# Build download plan for latest videos from each channel
|
||||||
|
download_plan = []
|
||||||
|
total_videos_found = 0
|
||||||
|
|
||||||
|
for i, channel_url in enumerate(channel_urls, 1):
|
||||||
|
print(f"\n🚦 Processing channel {i}/{len(channel_urls)}: {channel_url}")
|
||||||
|
|
||||||
|
# Get channel info
|
||||||
|
channel_name, channel_id = get_channel_info(channel_url)
|
||||||
|
print(f" ✅ Channel: {channel_name}")
|
||||||
|
|
||||||
|
# Get videos from channel
|
||||||
|
available_videos = self.tracker.get_channel_video_list(
|
||||||
|
channel_url,
|
||||||
|
str(self.yt_dlp_path),
|
||||||
|
force_refresh=False
|
||||||
|
)
|
||||||
|
|
||||||
|
if not available_videos:
|
||||||
|
print(f" ⚠️ No videos found for {channel_name}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
print(f" 📊 Found {len(available_videos)} videos")
|
||||||
|
|
||||||
|
# Take the latest N videos (they're already sorted by date)
|
||||||
|
latest_videos = available_videos[:limit] if limit else available_videos
|
||||||
|
print(f" 🎯 Processing latest {len(latest_videos)} videos")
|
||||||
|
|
||||||
|
# Process each video
|
||||||
|
for video in latest_videos:
|
||||||
|
video_title = video["title"]
|
||||||
|
video_id = video["id"]
|
||||||
|
|
||||||
|
# Extract artist and title
|
||||||
|
artist, extracted_title = self.channel_parser.extract_artist_title(video_title, channel_name)
|
||||||
|
if not artist and not extracted_title:
|
||||||
|
# Fallback: use the full title
|
||||||
|
artist = ""
|
||||||
|
extracted_title = video_title
|
||||||
|
|
||||||
|
# Create filename
|
||||||
|
filename = sanitize_filename(artist, extracted_title)
|
||||||
|
|
||||||
|
# Add to download plan
|
||||||
|
download_plan.append({
|
||||||
|
"video_id": video_id,
|
||||||
|
"artist": artist,
|
||||||
|
"title": extracted_title,
|
||||||
|
"filename": filename,
|
||||||
|
"channel_name": channel_name,
|
||||||
|
"video_title": video_title,
|
||||||
|
"force_download": force_download
|
||||||
|
})
|
||||||
|
|
||||||
|
total_videos_found += 1
|
||||||
|
|
||||||
|
print(f"\n📋 Download plan created: {total_videos_found} videos from {len(channel_urls)} channels")
|
||||||
|
|
||||||
|
if not download_plan:
|
||||||
|
print("❌ No videos to download")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Create cache file for progress tracking
|
||||||
|
import hashlib
|
||||||
|
from karaoke_downloader.cache_manager import get_download_plan_cache_file, save_plan_cache
|
||||||
|
|
||||||
|
plan_kwargs = {
|
||||||
|
"mode": "latest_per_channel",
|
||||||
|
"channels": len(channel_urls),
|
||||||
|
"limit_per_channel": limit,
|
||||||
|
"force_download": force_download,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Add channel URLs hash to ensure same channels = same cache
|
||||||
|
channels_hash = hashlib.md5(
|
||||||
|
"|".join(sorted(channel_urls)).encode()
|
||||||
|
).hexdigest()[:8]
|
||||||
|
plan_kwargs["channels_hash"] = channels_hash
|
||||||
|
|
||||||
|
cache_file = get_download_plan_cache_file("latest_per_channel", **plan_kwargs)
|
||||||
|
|
||||||
|
# Save the plan to cache
|
||||||
|
save_plan_cache(cache_file, download_plan, []) # No unmatched for latest-per-channel mode
|
||||||
|
|
||||||
|
print(f"💾 Download plan cached to: {cache_file.name}")
|
||||||
|
|
||||||
|
# Use unified download workflow
|
||||||
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
|
download_plan=download_plan,
|
||||||
|
cache_file=cache_file,
|
||||||
|
limit=None, # Limit already applied during plan building
|
||||||
|
show_progress=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
return success
|
||||||
|
|
||||||
def _process_videos_for_download(self, available_videos, channel_name, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD, force_download=False):
|
def _process_videos_for_download(self, available_videos, channel_name, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD, force_download=False):
|
||||||
"""Process videos for download (used for both manual and regular channels)."""
|
"""Process videos for download (used for both manual and regular channels)."""
|
||||||
songlist = load_songlist(self.songlist_file_path)
|
songlist = load_songlist(self.songlist_file_path)
|
||||||
@ -841,49 +934,39 @@ class KaraokeDownloader:
|
|||||||
matches = matches[:limit]
|
matches = matches[:limit]
|
||||||
print(f"🎯 Limiting to {len(matches)} downloads")
|
print(f"🎯 Limiting to {len(matches)} downloads")
|
||||||
|
|
||||||
# Download matched videos
|
# Convert matches to a download plan
|
||||||
success_count = 0
|
download_plan = []
|
||||||
for i, match in enumerate(matches, 1):
|
for match in matches:
|
||||||
video = match["video"]
|
video = match["video"]
|
||||||
song = match["song"]
|
song = match["song"]
|
||||||
artist = match["artist"]
|
artist = match["artist"]
|
||||||
title = match["title"]
|
title = match["title"]
|
||||||
video_id = video["id"]
|
video_id = video["id"]
|
||||||
|
|
||||||
print(f"\n⬇️ Downloading {i}/{len(matches)}: {artist} - {title}")
|
|
||||||
print(f" 🎬 Video: {video['title']} ({channel_name})")
|
|
||||||
if match["match_type"] == "fuzzy":
|
|
||||||
print(f" 🎯 Match Score: {match['match_score']:.1f}%")
|
|
||||||
|
|
||||||
# Create filename
|
# Create filename
|
||||||
filename = sanitize_filename(artist, title)
|
filename = sanitize_filename(artist, title)
|
||||||
output_path = self.downloads_dir / channel_name / filename
|
output_path = self.downloads_dir / channel_name / filename
|
||||||
|
|
||||||
# Use the download pipeline
|
# Add to download plan
|
||||||
pipeline = DownloadPipeline(
|
download_plan.append({
|
||||||
yt_dlp_path=str(self.yt_dlp_path),
|
"video_id": video_id,
|
||||||
config=self.config,
|
"artist": artist,
|
||||||
downloads_dir=self.downloads_dir,
|
"title": title,
|
||||||
songlist_tracking=self.songlist_tracking,
|
"filename": filename,
|
||||||
tracker=self.tracker,
|
"channel_name": channel_name,
|
||||||
)
|
"video_title": video["title"],
|
||||||
|
"force_download": force_download
|
||||||
success = pipeline.execute_pipeline(
|
})
|
||||||
video_id=video_id,
|
|
||||||
artist=artist,
|
|
||||||
title=title,
|
|
||||||
channel_name=channel_name,
|
|
||||||
video_title=video["title"]
|
|
||||||
)
|
|
||||||
|
|
||||||
if success:
|
|
||||||
success_count += 1
|
|
||||||
print(f"✅ Successfully downloaded: {artist} - {title}")
|
|
||||||
else:
|
|
||||||
print(f"❌ Failed to download: {artist} - {title}")
|
|
||||||
|
|
||||||
print(f"\n🎉 Download complete! {success_count}/{len(matches)} videos downloaded successfully")
|
# Use the unified download workflow
|
||||||
return success_count > 0
|
downloaded_count, success = self.execute_unified_download_workflow(
|
||||||
|
download_plan=download_plan,
|
||||||
|
cache_file=None, # No specific cache file for this mode
|
||||||
|
limit=limit,
|
||||||
|
show_progress=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
return success
|
||||||
|
|
||||||
def _download_single_video(self, video, channel_name, filename, force_download=False):
|
def _download_single_video(self, video, channel_name, filename, force_download=False):
|
||||||
"""Download a single video using the download pipeline."""
|
"""Download a single video using the download pipeline."""
|
||||||
@ -923,6 +1006,138 @@ class KaraokeDownloader:
|
|||||||
|
|
||||||
return success
|
return success
|
||||||
|
|
||||||
|
def execute_unified_download_workflow(
|
||||||
|
self,
|
||||||
|
download_plan,
|
||||||
|
cache_file=None,
|
||||||
|
limit=None,
|
||||||
|
show_progress=True,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Unified download workflow that all download modes use.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
download_plan: List of download items with video_id, artist, title, channel_name, video_title
|
||||||
|
cache_file: Optional cache file for progress tracking
|
||||||
|
limit: Optional limit on number of downloads
|
||||||
|
show_progress: Whether to show progress information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
tuple: (downloaded_count, success)
|
||||||
|
"""
|
||||||
|
if not download_plan:
|
||||||
|
print("📋 No videos to download in plan")
|
||||||
|
return 0, True
|
||||||
|
|
||||||
|
total_to_download = len(download_plan)
|
||||||
|
if limit:
|
||||||
|
total_to_download = min(limit, total_to_download)
|
||||||
|
download_plan = download_plan[:limit]
|
||||||
|
|
||||||
|
if show_progress:
|
||||||
|
print(f"\n🎬 Starting downloads: {total_to_download} videos")
|
||||||
|
print(f" 📁 Output directory: downloads/")
|
||||||
|
if cache_file:
|
||||||
|
print(f" 💾 Progress tracking: {cache_file.name}")
|
||||||
|
|
||||||
|
# Choose execution method based on parallel settings
|
||||||
|
if self.enable_parallel_downloads:
|
||||||
|
return self._execute_parallel_downloads(download_plan, cache_file, show_progress)
|
||||||
|
else:
|
||||||
|
return self._execute_sequential_downloads(download_plan, cache_file, show_progress)
|
||||||
|
|
||||||
|
def _execute_sequential_downloads(self, download_plan, cache_file, show_progress):
|
||||||
|
"""Execute downloads sequentially using the download pipeline."""
|
||||||
|
success_count = 0
|
||||||
|
total_to_download = len(download_plan)
|
||||||
|
|
||||||
|
# Create download pipeline
|
||||||
|
pipeline = DownloadPipeline(
|
||||||
|
yt_dlp_path=str(self.yt_dlp_path),
|
||||||
|
config=self.config,
|
||||||
|
downloads_dir=self.downloads_dir,
|
||||||
|
songlist_tracking=self.songlist_tracking,
|
||||||
|
tracker=self.tracker,
|
||||||
|
)
|
||||||
|
|
||||||
|
for i, plan_item in enumerate(download_plan, 1):
|
||||||
|
if show_progress:
|
||||||
|
print(f"\n⬇️ Downloading {i}/{total_to_download}: {plan_item['artist']} - {plan_item['title']}")
|
||||||
|
print(f" 🎬 Video: {plan_item['video_title']} ({plan_item['channel_name']})")
|
||||||
|
|
||||||
|
success = pipeline.execute_pipeline(
|
||||||
|
video_id=plan_item["video_id"],
|
||||||
|
artist=plan_item["artist"],
|
||||||
|
title=plan_item["title"],
|
||||||
|
channel_name=plan_item["channel_name"],
|
||||||
|
video_title=plan_item["video_title"],
|
||||||
|
)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
success_count += 1
|
||||||
|
if show_progress:
|
||||||
|
print(f"✅ Successfully downloaded: {plan_item['artist']} - {plan_item['title']}")
|
||||||
|
else:
|
||||||
|
if show_progress:
|
||||||
|
print(f"❌ Failed to download: {plan_item['artist']} - {plan_item['title']}")
|
||||||
|
|
||||||
|
# Update cache if provided
|
||||||
|
if cache_file:
|
||||||
|
# Remove completed item from plan and update cache
|
||||||
|
download_plan.remove(plan_item)
|
||||||
|
from karaoke_downloader.cache_manager import save_plan_cache
|
||||||
|
save_plan_cache(cache_file, download_plan, []) # No unmatched for unified workflow
|
||||||
|
|
||||||
|
if not download_plan: # All downloads completed
|
||||||
|
from karaoke_downloader.cache_manager import delete_plan_cache
|
||||||
|
delete_plan_cache(cache_file)
|
||||||
|
if show_progress:
|
||||||
|
print("🗑️ All downloads completed, deleted download plan cache.")
|
||||||
|
|
||||||
|
if show_progress:
|
||||||
|
print(f"\n🎉 Download complete! {success_count}/{total_to_download} videos downloaded successfully")
|
||||||
|
|
||||||
|
return success_count, success_count > 0
|
||||||
|
|
||||||
|
def _execute_parallel_downloads(self, download_plan, cache_file, show_progress):
|
||||||
|
"""Execute downloads in parallel using the parallel downloader."""
|
||||||
|
from karaoke_downloader.parallel_downloader import create_parallel_downloader
|
||||||
|
|
||||||
|
# Create parallel downloader
|
||||||
|
parallel_downloader = create_parallel_downloader(
|
||||||
|
yt_dlp_path=str(self.yt_dlp_path),
|
||||||
|
config=self.config,
|
||||||
|
downloads_dir=self.downloads_dir,
|
||||||
|
max_workers=self.parallel_workers,
|
||||||
|
songlist_tracking=self.songlist_tracking,
|
||||||
|
tracker=self.tracker,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Convert download plan to tasks
|
||||||
|
tasks = []
|
||||||
|
for item in download_plan:
|
||||||
|
from karaoke_downloader.parallel_downloader import DownloadTask
|
||||||
|
task = DownloadTask(
|
||||||
|
video_id=item["video_id"],
|
||||||
|
artist=item["artist"],
|
||||||
|
title=item["title"],
|
||||||
|
channel_name=item["channel_name"],
|
||||||
|
video_title=item["video_title"],
|
||||||
|
)
|
||||||
|
tasks.append(task)
|
||||||
|
|
||||||
|
# Execute parallel downloads
|
||||||
|
results = parallel_downloader.execute_downloads(tasks)
|
||||||
|
|
||||||
|
# Count successes
|
||||||
|
success_count = sum(1 for result in results if result.success)
|
||||||
|
total_to_download = len(tasks)
|
||||||
|
|
||||||
|
if show_progress:
|
||||||
|
print(f"\n🎉 Parallel download complete! {success_count}/{total_to_download} videos downloaded successfully")
|
||||||
|
|
||||||
|
return success_count, success_count > 0
|
||||||
|
|
||||||
|
|
||||||
def reset_songlist_all():
|
def reset_songlist_all():
|
||||||
"""Delete all files tracked in songlist_tracking.json, clear songlist_tracking.json, and remove songlist songs from karaoke_tracking.json."""
|
"""Delete all files tracked in songlist_tracking.json, clear songlist_tracking.json, and remove songlist songs from karaoke_tracking.json."""
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user