Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>

mbrucedogs 2025-07-25 13:48:45 -05:00
parent a135efa13a
commit c5a3838e82
10 changed files with 838 additions and 337 deletions

PRD.md

@ -1,5 +1,5 @@
# 🎤 Karaoke Video Downloader PRD (v3.2) # 🎤 Karaoke Video Downloader PRD (v3.3)
## ✅ Overview ## ✅ Overview
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse. A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
@ -20,7 +20,7 @@ The codebase has been refactored into focused modules with centralized utilities
- **`server_manager.py`**: Server song availability checking - **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions
### New Utility Modules (v3.2): ### Utility Modules (v3.2):
- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting - **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
@ -29,15 +29,20 @@ The codebase has been refactored into focused modules with centralized utilities
- **`resolution_cli.py`**: Resolution checking utilities - **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI - **`tracking_cli.py`**: Tracking management CLI
### New Utility Modules (v3.3):
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded
### Benefits of Enhanced Modular Architecture: ### Benefits of Enhanced Modular Architecture:
- **Single Responsibility**: Each module has a focused purpose - **Single Responsibility**: Each module has a focused purpose
- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized - **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated code duplication across modules - **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Testability**: Individual components can be tested separately - **Testability**: Individual components can be tested separately
- **Maintainability**: Easier to find and fix issues - **Maintainability**: Easier to find and fix issues
- **Reusability**: Components can be used independently - **Reusability**: Components can be used independently
- **Robustness**: Better error handling and interruption recovery - **Robustness**: Better error handling and interruption recovery
- **Consistency**: Standardized error messages and processing pipelines - **Consistency**: Standardized error messages and processing pipelines
- **Type Safety**: Comprehensive type hints across all new modules
--- ---
@ -95,6 +100,7 @@ python download_karaoke.py --clear-cache SingKingKaraoke
- ✅ Configurable download resolution and yt-dlp options (`data/config.json`) - ✅ Configurable download resolution and yt-dlp options (`data/config.json`)
- ✅ Songlist integration: prioritize and track custom songlists - ✅ Songlist integration: prioritize and track custom songlists
- ✅ Songlist-only mode: download only songs from the songlist - ✅ Songlist-only mode: download only songs from the songlist
- ✅ Songlist focus mode: download only songs from specific playlists by title
- ✅ Global songlist tracking to avoid duplicates across channels - ✅ Global songlist tracking to avoid duplicates across channels
- ✅ ID3 tagging for artist/title in MP4 files (mutagen) - ✅ ID3 tagging for artist/title in MP4 files (mutagen)
- ✅ Real-time progress and detailed logging - ✅ Real-time progress and detailed logging
@ -113,6 +119,9 @@ python download_karaoke.py --clear-cache SingKingKaraoke
- ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting - ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
- ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing - ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
- ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities - ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
--- ---
@ -134,7 +143,9 @@ KaroakeVideoDownloader/
│ ├── error_utils.py # Standardized error handling and formatting │ ├── error_utils.py # Standardized error handling and formatting
│ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline │ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline
│ ├── id3_utils.py # ID3 tagging utilities │ ├── id3_utils.py # ID3 tagging utilities
│ ├── config_manager.py # Configuration management │ ├── config_manager.py # Configuration management with dataclasses
│ ├── file_utils.py # Centralized file operations and filename handling
│ ├── song_validator.py # Centralized song validation logic
│ ├── check_resolution.py # Resolution checker utility │ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI │ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI │ └── tracking_cli.py # Tracking management CLI
@ -164,6 +175,7 @@ KaroakeVideoDownloader/
- `--file <data/channels.txt>`: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes) - `--file <data/channels.txt>`: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
- `--songlist-priority`: Prioritize songlist songs in download queue - `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist - `--songlist-only`: Download only songs from the songlist
- `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-status`: Show songlist download progress - `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads (enables fast mode with early exit) - `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution - `--resolution <720p|1080p|...>`: Override resolution
@ -186,20 +198,45 @@ KaroakeVideoDownloader/
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download. - **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels. - **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
## 🔧 Refactoring Improvements (v3.2) ## 🔧 Refactoring Improvements (v3.3)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication: The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
### **Centralized Utilities** ### **New Utility Modules (v3.3)**
- **`youtube_utils.py`**: Centralized yt-dlp command generation and YouTube operations - **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- **`error_utils.py`**: Standardized error handling with structured exception hierarchy - `sanitize_filename()`: Create safe filenames from artist/title
- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing - `generate_possible_filenames()`: Generate filename patterns for different modes
- `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
- `is_valid_mp4_file()`: Validate MP4 files with header checking
- `cleanup_temp_files()`: Remove temporary yt-dlp files
- `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
- `SongValidator` class: Unified logic for checking if songs should be downloaded
- `should_skip_song()`: Comprehensive validation with multiple criteria
- `mark_song_failed()`: Consistent failure tracking
- `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
- `ConfigManager` class: Type-safe configuration loading and caching
- `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
- Configuration validation and merging with defaults
- Dynamic resolution updates
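
As a rough illustration of two of the `file_utils.py` helpers listed above, the sketch below shows one way filename sanitization and MP4 header checking could work. The signatures, the `Artist - Title.mp4` naming pattern, and the `ftyp` check are assumptions for illustration only; the actual module contents are not shown in this diff.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in filenames.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(artist: str, title: str) -> str:
    """Build a Windows-safe 'Artist - Title.mp4' name (hypothetical pattern)."""
    raw = f"{artist} - {title}"
    cleaned = _INVALID_CHARS.sub("", raw)
    # Collapse whitespace runs and trim trailing dots/spaces, which Windows rejects.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return f"{cleaned}.mp4"


def is_valid_mp4_file(path: Path) -> bool:
    """Return True if the file exists and bytes 4-8 hold the MP4 'ftyp' box tag."""
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        header = handle.read(12)
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

A header check like this only catches truncated or obviously wrong files; it does not prove the video is playable.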
### **Benefits Achieved** ### **Benefits Achieved**
- **Reduced Duplication**: Eliminated ~50 lines of duplicated yt-dlp command generation - **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place - **Centralized File Operations**: Single source of truth for filename handling and file validation
- **Enhanced Error Handling**: Consistent error messages and better debugging context - **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
- **Better Code Organization**: Clear separation of concerns and logical module structure - **Enhanced Type Safety**: Comprehensive type hints across all new modules
- **Increased Testability**: Modular components can be tested independently - **Improved Configuration Management**: Structured configuration with validation and caching
- **Better Error Handling**: Consistent patterns via centralized utilities
- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
- **Improved Testability**: Modular components can be tested independently
- **Better Developer Experience**: Clear function signatures and comprehensive documentation
### **Previous Improvements (v3.2)**
- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set. - **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done. - **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach. - **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
@ -208,6 +245,8 @@ The codebase has been comprehensively refactored to improve maintainability and
- **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly. - **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted. - **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels. - **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
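
The deduplication and fuzzy-matching behaviour described above can be sketched as follows. This is a minimal illustration that uses the standard-library `difflib`; the function names, the noise-word list, and the 0.85 threshold are assumptions, not the contents of `fuzzy_matcher.py`.

```python
import re
from difflib import SequenceMatcher

# Noise words stripped before comparison (illustrative list, not exhaustive).
_NOISE = re.compile(r"\b(karaoke|official video|lyrics|instrumental)\b", re.IGNORECASE)


def normalize_title(text: str) -> str:
    """Drop noise words and punctuation, lowercase, and collapse whitespace."""
    text = _NOISE.sub(" ", text)
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def song_key(artist: str, title: str) -> str:
    """Key (artist + normalized title) used to deduplicate songs across channels."""
    return f"{normalize_title(artist)}|{normalize_title(title)}"


def is_fuzzy_match(song_title: str, video_title: str, threshold: float = 0.85) -> bool:
    """Compare normalized titles using difflib's similarity ratio."""
    matcher = SequenceMatcher(None, normalize_title(song_title), normalize_title(video_title))
    return matcher.ratio() >= threshold


# The same song offered by two channels is planned only once.
seen_keys = set()
for artist, title in [("Adele", "Hello"), ("ADELE", "Hello (Karaoke)")]:
    key = song_key(artist, title)
    if key in seen_keys:
        continue  # already planned from another channel
    seen_keys.add(key)
```

Normalizing before comparing keeps the threshold meaningful: without it, suffixes such as "Karaoke" would dominate the similarity score for short titles.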
--- ---
@ -219,3 +258,6 @@ The codebase has been comprehensively refactored to improve maintainability and
- [ ] Parallel downloads for improved speed - [ ] Parallel downloads for improved speed
- [ ] Unit tests for all modules - [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows - [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization


@ -35,7 +35,7 @@ The codebase has been comprehensively refactored into a modular architecture wit
- **`server_manager.py`**: Server song availability checking - **`server_manager.py`**: Server song availability checking
- **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions
### Utility Modules: ### Utility Modules (v3.2):
- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
- **`error_utils.py`**: Standardized error handling and formatting - **`error_utils.py`**: Standardized error handling and formatting
- **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
@ -44,12 +44,34 @@ The codebase has been comprehensively refactored into a modular architecture wit
- **`resolution_cli.py`**: Resolution checking utilities - **`resolution_cli.py`**: Resolution checking utilities
- **`tracking_cli.py`**: Tracking management CLI - **`tracking_cli.py`**: Tracking management CLI
### New Utility Modules (v3.3):
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- `sanitize_filename()`: Create safe filenames from artist/title
- `generate_possible_filenames()`: Generate filename patterns for different modes
- `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
- `is_valid_mp4_file()`: Validate MP4 files with header checking
- `cleanup_temp_files()`: Remove temporary yt-dlp files
- `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
- `SongValidator` class: Unified logic for checking if songs should be downloaded
- `should_skip_song()`: Comprehensive validation with multiple criteria
- `mark_song_failed()`: Consistent failure tracking
- `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
- `ConfigManager` class: Type-safe configuration loading and caching
- `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
- Configuration validation and merging with defaults
- Dynamic resolution updates
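
To illustrate "merging with defaults", here is a minimal sketch of building dataclass-based configuration from a partial dictionary. `LoggingConfig` and `AppConfig` mirror names from `config_manager.py` but are trimmed to a few fields; `config_from_dict` and `_known_fields` are hypothetical helpers, and the real `_create_config_from_dict` may work differently.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class LoggingConfig:
    level: str = "INFO"
    include_console: bool = True


@dataclass
class AppConfig:
    logging: LoggingConfig = field(default_factory=LoggingConfig)
    yt_dlp_path: str = "downloader/yt-dlp.exe"


def _known_fields(cls, data: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only keys the dataclass declares, so unknown keys are ignored."""
    names = {f.name for f in fields(cls)}
    return {key: value for key, value in data.items() if key in names}


def config_from_dict(data: Dict[str, Any]) -> AppConfig:
    """Build an AppConfig; any key missing from `data` keeps its default."""
    logging_cfg = LoggingConfig(**_known_fields(LoggingConfig, data.get("logging", {})))
    top_level = _known_fields(AppConfig, {k: v for k, v in data.items() if k != "logging"})
    return AppConfig(logging=logging_cfg, **top_level)
```

For example, `config_from_dict({"logging": {"level": "DEBUG"}})` would override only the log level and leave every other setting at its default.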
### Benefits: ### Benefits:
- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized - **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated code duplication across modules - **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Consistency**: Standardized error messages and processing pipelines - **Consistency**: Standardized error messages and processing pipelines
- **Maintainability**: Changes isolated to specific modules - **Maintainability**: Changes isolated to specific modules
- **Testability**: Modular components can be tested independently - **Testability**: Modular components can be tested independently
- **Type Safety**: Comprehensive type hints across all new modules
## 📋 Requirements ## 📋 Requirements
- **Windows 10/11** - **Windows 10/11**
@ -181,7 +203,9 @@ KaroakeVideoDownloader/
│ ├── error_utils.py # Standardized error handling and formatting │ ├── error_utils.py # Standardized error handling and formatting
│ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline │ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline
│ ├── id3_utils.py # ID3 tagging utilities │ ├── id3_utils.py # ID3 tagging utilities
│ ├── config_manager.py # Configuration management │ ├── config_manager.py # Configuration management with dataclasses
│ ├── file_utils.py # Centralized file operations and filename handling
│ ├── song_validator.py # Centralized song validation logic
│ ├── check_resolution.py # Resolution checker utility │ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI │ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI │ └── tracking_cli.py # Tracking management CLI
@ -271,25 +295,55 @@ python download_karaoke.py --clear-server-duplicates
> **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage. > **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.
## 🔧 Refactoring Improvements (v3.2) ## 🔧 Refactoring Improvements (v3.3)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication: The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
### **Key Improvements** ### **New Utility Modules (v3.3)**
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
- `sanitize_filename()`: Create safe filenames from artist/title
- `generate_possible_filenames()`: Generate filename patterns for different modes
- `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
- `is_valid_mp4_file()`: Validate MP4 files with header checking
- `cleanup_temp_files()`: Remove temporary yt-dlp files
- `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
- `SongValidator` class: Unified logic for checking if songs should be downloaded
- `should_skip_song()`: Comprehensive validation with multiple criteria
- `mark_song_failed()`: Consistent failure tracking
- `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
- `ConfigManager` class: Type-safe configuration loading and caching
- `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
- Configuration validation and merging with defaults
- Dynamic resolution updates
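
A "dynamic resolution update" presumably rebuilds the yt-dlp format selector for the requested height. The sketch below reuses the resolution map and selector pattern that this commit moves out of the CLI code; `build_format_selector` is a hypothetical name, and the real `update_resolution()` method may differ.

```python
# Resolution labels accepted on the command line, mapped to pixel heights.
RESOLUTION_MAP = {
    "480p": "480",
    "720p": "720",
    "1080p": "1080",
    "1440p": "1440",
    "2160p": "2160",
}


def build_format_selector(resolution: str) -> str:
    """Return a yt-dlp format selector capped at the given resolution."""
    if resolution not in RESOLUTION_MAP:
        raise ValueError(f"Unsupported resolution: {resolution}")
    height = RESOLUTION_MAP[resolution]
    # Prefer MP4 at or below the height, then any container, then best available.
    return f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
```

Validating the label before the lookup gives a clearer error than the bare `KeyError` a direct dictionary access would raise.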
### **Benefits Achieved**
- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
- **Centralized File Operations**: Single source of truth for filename handling and file validation
- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
- **Enhanced Type Safety**: Comprehensive type hints across all new modules
- **Improved Configuration Management**: Structured configuration with validation and caching
- **Better Error Handling**: Consistent patterns via centralized utilities
- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
- **Improved Testability**: Modular components can be tested independently
- **Better Developer Experience**: Clear function signatures and comprehensive documentation
### **Previous Improvements (v3.2)**
- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations - **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting - **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing - **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
- **Reduced Code Duplication**: Eliminated duplicate code across modules through centralized utilities - **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
### **New Utility Modules** - **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation - **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
- **`error_utils.py`**: Standardized error handling with structured exception hierarchy - **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing - **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
### **Benefits** - **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place - **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
- **Better Error Handling**: Consistent error messages and better debugging context - **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
- **Enhanced Testability**: Modular components can be tested independently
- **Reduced Complexity**: Single source of truth for common operations
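
The one-day download plan cache described above could be implemented along these lines. This is a sketch under assumptions: the function names, the JSON file layout, and the use of the file's modification time to measure age are illustrative, and `force` stands in for the `--force-download-plan` flag.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional

PLAN_MAX_AGE_SECONDS = 24 * 60 * 60  # one day


def load_cached_plan(cache_file: Path, force: bool = False) -> Optional[Any]:
    """Return the cached plan, or None if missing, stale, unreadable, or forced."""
    if force or not cache_file.is_file():
        return None
    age = time.time() - cache_file.stat().st_mtime
    if age > PLAN_MAX_AGE_SECONDS:
        return None
    try:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None  # treat a corrupt cache as a miss and rebuild the plan


def save_plan(cache_file: Path, plan: Any) -> None:
    """Write the plan as JSON, creating the parent directory if needed."""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(plan, indent=2), encoding="utf-8")
```

Rewriting the file after each completed download would also refresh its modification time, so a long-running session would not expire its own plan midway.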
## 🐞 Troubleshooting ## 🐞 Troubleshooting
- Ensure `yt-dlp.exe` is in the `downloader/` folder - Ensure `yt-dlp.exe` is in the `downloader/` folder


@ -75,17 +75,7 @@ Examples:
downloader.songlist_only = True # Enable songlist-only mode when focusing downloader.songlist_only = True # Enable songlist-only mode when focusing
print(f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}") print(f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}")
if args.resolution != '720p': if args.resolution != '720p':
resolution_map = { downloader.config_manager.update_resolution(args.resolution)
'480p': '480',
'720p': '720',
'1080p': '1080',
'1440p': '1440',
'2160p': '2160'
}
height = resolution_map[args.resolution]
downloader.config["download_settings"]["format"] = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
downloader.config["download_settings"]["preferred_resolution"] = args.resolution
print(f"🎬 Using resolution: {args.resolution}")
# --- NEW: Reset channel CLI command --- # --- NEW: Reset channel CLI command ---
if args.reset_channel: if args.reset_channel:


@ -1,77 +1,303 @@
""" """
Configuration management utilities. Configuration management utilities for the karaoke downloader.
Handles loading and managing application configuration. Provides centralized configuration loading, validation, and management.
""" """
import json import json
from pathlib import Path from pathlib import Path
from typing import Dict, Any, Optional, Union
from dataclasses import dataclass, field
from datetime import datetime
DATA_DIR = Path("data") # Default configuration values
DEFAULT_CONFIG = {
"download_settings": {
"format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best",
"preferred_resolution": "720p",
"audio_format": "mp3",
"audio_quality": "0",
"subtitle_language": "en",
"subtitle_format": "srt",
"write_metadata": False,
"write_thumbnail": False,
"write_description": False,
"write_annotations": False,
"write_comments": False,
"write_subtitles": False,
"embed_metadata": False,
"add_metadata": False,
"continue_downloads": True,
"no_overwrites": True,
"ignore_errors": True,
"no_warnings": False
},
"folder_structure": {
"downloads_dir": "downloads",
"logs_dir": "logs",
"tracking_file": "data/karaoke_tracking.json"
},
"logging": {
"level": "INFO",
"format": "%(asctime)s - %(levelname)s - %(message)s",
"include_console": True,
"include_file": True
},
"yt_dlp_path": "downloader/yt-dlp.exe"
}
def load_config(): # Resolution mapping for CLI arguments
"""Load configuration from data/config.json or return defaults.""" RESOLUTION_MAP = {
config_file = DATA_DIR / "config.json" '480p': '480',
if config_file.exists(): '720p': '720',
try: '1080p': '1080',
with open(config_file, 'r', encoding='utf-8') as f: '1440p': '1440',
return json.load(f) '2160p': '2160'
except (json.JSONDecodeError, FileNotFoundError) as e: }
print(f"Warning: Could not load config.json: {e}")
return get_default_config() @dataclass
class DownloadSettings:
"""Configuration for download settings."""
format: str = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"
outtmpl: str = "%(title)s_720p.%(ext)s"
merge_output_format: str = "mp4"
noplaylist: bool = True
postprocessors: list = None
preferred_resolution: str = "720p"
audio_format: str = "mp3"
audio_quality: str = "0"
subtitle_language: str = "en"
subtitle_format: str = "srt"
write_metadata: bool = False
write_thumbnail: bool = False
write_description: bool = False
writedescription: bool = False
write_annotations: bool = False
writeannotations: bool = False
write_comments: bool = False
writecomments: bool = False
write_subtitles: bool = False
writesubtitles: bool = False
writeinfojson: bool = False
writethumbnail: bool = False
embed_metadata: bool = False
add_metadata: bool = False
continue_downloads: bool = True
continuedl: bool = True
no_overwrites: bool = True
nooverwrites: bool = True
ignore_errors: bool = True
ignoreerrors: bool = True
no_warnings: bool = False
def get_default_config(): def __post_init__(self):
"""Get the default configuration.""" """Initialize default values for complex fields."""
return { if self.postprocessors is None:
"download_settings": { self.postprocessors = [{
"format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best", "key": "FFmpegExtractAudio",
"preferred_resolution": "720p", "preferredcodec": "mp3",
"audio_format": "mp3", "preferredquality": "0"
"audio_quality": "0", }]
"subtitle_language": "en",
"subtitle_format": "srt",
"write_metadata": False,
"write_thumbnail": False,
"write_description": False,
"write_annotations": False,
"write_comments": False,
"write_subtitles": False,
"embed_metadata": False,
"add_metadata": False,
"continue_downloads": True,
"no_overwrites": True,
"ignore_errors": True,
"no_warnings": False
},
"folder_structure": {
"downloads_dir": "downloads",
"logs_dir": "logs",
"tracking_file": str(DATA_DIR / "karaoke_tracking.json")
},
"logging": {
"level": "INFO",
"format": "%(asctime)s - %(levelname)s - %(message)s",
"include_console": True,
"include_file": True
},
"yt_dlp_path": "downloader/yt-dlp.exe"
}
def save_config(config): @dataclass
"""Save configuration to data/config.json.""" class FolderStructure:
config_file = DATA_DIR / "config.json" """Configuration for folder structure."""
config_file.parent.mkdir(exist_ok=True) downloads_dir: str = "downloads"
logs_dir: str = "logs"
tracking_file: str = "data/karaoke_tracking.json"
try: @dataclass
with open(config_file, 'w', encoding='utf-8') as f: class LoggingConfig:
json.dump(config, f, indent=2, ensure_ascii=False) """Configuration for logging."""
return True level: str = "INFO"
except Exception as e: format: str = "%(asctime)s - %(levelname)s - %(message)s"
print(f"Error saving config: {e}") include_console: bool = True
return False include_file: bool = True
def update_config(updates): @dataclass
"""Update configuration with new values.""" class AppConfig:
config = load_config() """Main application configuration."""
config.update(updates) download_settings: DownloadSettings = field(default_factory=DownloadSettings)
return save_config(config) folder_structure: FolderStructure = field(default_factory=FolderStructure)
logging: LoggingConfig = field(default_factory=LoggingConfig)
yt_dlp_path: str = "downloader/yt-dlp.exe"
_config_file: Optional[Path] = None
_last_modified: Optional[datetime] = None
class ConfigManager:
"""
Manages application configuration with loading, validation, and caching.
"""
def __init__(self, config_file: Union[str, Path] = "data/config.json"):
"""
Initialize the configuration manager.
Args:
config_file: Path to the configuration file
"""
self.config_file = Path(config_file)
self._config: Optional[AppConfig] = None
self._last_modified: Optional[datetime] = None
def load_config(self, force_reload: bool = False) -> AppConfig:
"""
Load configuration from file with caching.
Args:
force_reload: Force reload even if file hasn't changed
Returns:
AppConfig instance
"""
# Check if we need to reload
if not force_reload and self._config is not None:
if self.config_file.exists():
current_mtime = datetime.fromtimestamp(self.config_file.stat().st_mtime)
if self._last_modified and current_mtime <= self._last_modified:
return self._config
# Load configuration
config_data = self._load_config_file()
self._config = self._create_config_from_dict(config_data)
self._last_modified = datetime.now()
return self._config
def _load_config_file(self) -> Dict[str, Any]:
"""
Load configuration from file with fallback to defaults.
Returns:
Configuration dictionary
"""
if self.config_file.exists():
try:
with open(self.config_file, 'r', encoding='utf-8') as f:
file_config = json.load(f)
# Merge with defaults
return self._merge_configs(DEFAULT_CONFIG, file_config)
except (json.JSONDecodeError, FileNotFoundError) as e:
print(f"Warning: Could not load config.json: {e}")
print("Using default configuration.")
return DEFAULT_CONFIG.copy()
def _merge_configs(self, default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
"""
Merge user configuration with defaults.
Args:
default: Default configuration
user: User configuration
Returns:
Merged configuration
"""
merged = default.copy()
for key, value in user.items():
if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
merged[key] = self._merge_configs(merged[key], value)
else:
merged[key] = value
return merged
def _create_config_from_dict(self, config_data: Dict[str, Any]) -> AppConfig:
"""
Create AppConfig from dictionary.
Args:
config_data: Configuration dictionary
Returns:
AppConfig instance
"""
download_settings = DownloadSettings(**config_data.get("download_settings", {}))
folder_structure = FolderStructure(**config_data.get("folder_structure", {}))
logging_config = LoggingConfig(**config_data.get("logging", {}))
return AppConfig(
download_settings=download_settings,
folder_structure=folder_structure,
logging=logging_config,
yt_dlp_path=config_data.get("yt_dlp_path", "downloader/yt-dlp.exe"),
_config_file=self.config_file
)
def update_resolution(self, resolution: str) -> None:
"""
Update the download format based on resolution.
Args:
resolution: Resolution string (e.g., "720p", "1080p")
"""
if self._config is None:
self.load_config()
if resolution in RESOLUTION_MAP:
height = RESOLUTION_MAP[resolution]
format_str = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
self._config.download_settings.format = format_str
self._config.download_settings.preferred_resolution = resolution
print(f"🎬 Using resolution: {resolution}")
def get_config(self) -> AppConfig:
"""
Get the current configuration.
Returns:
AppConfig instance
"""
if self._config is None:
return self.load_config()
return self._config
def save_config(self) -> None:
"""
Save current configuration to file.
"""
if self._config is None:
return
config_dict = {
"download_settings": self._config.download_settings.__dict__,
"folder_structure": self._config.folder_structure.__dict__,
"logging": self._config.logging.__dict__,
"yt_dlp_path": self._config.yt_dlp_path
}
# Ensure directory exists
self.config_file.parent.mkdir(parents=True, exist_ok=True)
with open(self.config_file, 'w', encoding='utf-8') as f:
json.dump(config_dict, f, indent=2, ensure_ascii=False)
print(f"Configuration saved to {self.config_file}")
# Global configuration manager instance
_config_manager: Optional[ConfigManager] = None
def get_config_manager() -> ConfigManager:
"""
Get the global configuration manager instance.
Returns:
ConfigManager instance
"""
global _config_manager
if _config_manager is None:
_config_manager = ConfigManager()
return _config_manager
def load_config(force_reload: bool = False) -> AppConfig:
"""
Load configuration using the global manager.
Args:
force_reload: Force reload even if file hasn't changed
Returns:
AppConfig instance
"""
return get_config_manager().load_config(force_reload)
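
The merge-then-construct flow used by `_merge_configs` and `_create_config_from_dict` can be sketched in isolation. This is a minimal illustration, not the module itself: the two dataclasses are trimmed-down stand-ins, and `merge_configs`, `defaults`, and `user` are names chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LoggingConfig:
    # Trimmed stand-in for the LoggingConfig dataclass in the diff.
    level: str = "INFO"
    include_console: bool = True


@dataclass
class AppConfig:
    # Trimmed stand-in for AppConfig; only two fields are kept.
    logging: LoggingConfig = field(default_factory=LoggingConfig)
    yt_dlp_path: str = "downloader/yt-dlp.exe"


def merge_configs(default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user values on defaults without mutating either input."""
    merged = default.copy()
    for key, value in user.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are dicts: merge one level deeper.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and new keys: the user value wins.
            merged[key] = value
    return merged


defaults = {
    "logging": {"level": "INFO", "include_console": True},
    "yt_dlp_path": "downloader/yt-dlp.exe",
}
user = {"logging": {"level": "DEBUG"}}  # partial override, as in a sparse config.json

merged = merge_configs(defaults, user)
config = AppConfig(
    logging=LoggingConfig(**merged["logging"]),
    yt_dlp_path=merged["yt_dlp_path"],
)
```

One consequence of unpacking with `**` is that a key in the JSON file with no matching dataclass field raises `TypeError` at construction time, so unknown keys would need to be filtered out if older config files are to keep loading.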


@ -96,11 +96,11 @@ class DownloadPipeline:
) )
print(f"🔧 Running command: {' '.join(cmd)}") print(f"🔧 Running command: {' '.join(cmd)}")
print(f"📺 Resolution settings: {self.config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}") print(f"📺 Resolution settings: {self.config.download_settings.preferred_resolution}")
print(f"🎬 Format string: {self.config.get('download_settings', {}).get('format', 'Unknown')}") print(f"🎬 Format string: {self.config.download_settings.format}")
# Debug: Show available formats (optional) # Debug: Show available formats (optional)
if self.config.get('debug_show_formats', False): if hasattr(self.config, 'debug_show_formats') and self.config.debug_show_formats:
show_available_formats(video_url, self.yt_dlp_path) show_available_formats(video_url, self.yt_dlp_path)
try: try:


@ -5,6 +5,7 @@ import json
import re import re
from pathlib import Path from pathlib import Path
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Dict, Any, Optional, List, Tuple
from karaoke_downloader.tracking_manager import TrackingManager, SongStatus, FormatType from karaoke_downloader.tracking_manager import TrackingManager, SongStatus, FormatType
from karaoke_downloader.id3_utils import add_id3_tags, extract_artist_title from karaoke_downloader.id3_utils import add_id3_tags, extract_artist_title
from karaoke_downloader.songlist_manager import ( from karaoke_downloader.songlist_manager import (
@ -27,30 +28,47 @@ from karaoke_downloader.video_downloader import download_video_and_track, is_val
from karaoke_downloader.channel_manager import reset_channel_downloads, download_from_file from karaoke_downloader.channel_manager import reset_channel_downloads, download_from_file
from karaoke_downloader.download_pipeline import DownloadPipeline from karaoke_downloader.download_pipeline import DownloadPipeline
from karaoke_downloader.error_utils import handle_yt_dlp_error, log_error from karaoke_downloader.error_utils import handle_yt_dlp_error, log_error
from karaoke_downloader.song_validator import create_song_validator
from karaoke_downloader.config_manager import load_config, get_config_manager
from karaoke_downloader.file_utils import sanitize_filename, ensure_directory_exists
# Constants # Constants
DEFAULT_FUZZY_THRESHOLD = 85 DEFAULT_FUZZY_THRESHOLD = 85
DEFAULT_CACHE_EXPIRATION_DAYS = 1 DEFAULT_CACHE_EXPIRATION_DAYS = 1
DEFAULT_FILENAME_LENGTH_LIMIT = 100
DEFAULT_ARTIST_LENGTH_LIMIT = 30
DEFAULT_TITLE_LENGTH_LIMIT = 60
DEFAULT_DISPLAY_LIMIT = 10 DEFAULT_DISPLAY_LIMIT = 10
DATA_DIR = Path("data") DATA_DIR = Path("data")
class KaraokeDownloader: class KaraokeDownloader:
def __init__(self): def __init__(self):
self.yt_dlp_path = Path("downloader/yt-dlp.exe") # Load configuration
self.downloads_dir = Path("downloads") self.config_manager = get_config_manager()
self.logs_dir = Path("logs") self.config = self.config_manager.load_config()
self.downloads_dir.mkdir(exist_ok=True)
self.logs_dir.mkdir(exist_ok=True) # Initialize paths
self.tracker = TrackingManager(tracking_file=DATA_DIR / "karaoke_tracking.json", cache_file=DATA_DIR / "channel_cache.json") self.yt_dlp_path = Path(self.config.yt_dlp_path)
self.config = self._load_config() self.downloads_dir = Path(self.config.folder_structure.downloads_dir)
self.logs_dir = Path(self.config.folder_structure.logs_dir)
# Ensure directories exist
ensure_directory_exists(self.downloads_dir)
ensure_directory_exists(self.logs_dir)
# Initialize tracking
tracking_file = DATA_DIR / "karaoke_tracking.json"
cache_file = DATA_DIR / "channel_cache.json"
self.tracker = TrackingManager(tracking_file=tracking_file, cache_file=cache_file)
# Initialize song validator
self.song_validator = create_song_validator(self.tracker, self.downloads_dir)
# Load songlist tracking
self.songlist_tracking_file = DATA_DIR / "songlist_tracking.json" self.songlist_tracking_file = DATA_DIR / "songlist_tracking.json"
self.songlist_tracking = load_songlist_tracking(str(self.songlist_tracking_file)) self.songlist_tracking = load_songlist_tracking(str(self.songlist_tracking_file))
# Load server songs for availability checking # Load server songs for availability checking
self.server_songs = load_server_songs() self.server_songs = load_server_songs()
# Songlist focus mode attributes # Songlist focus mode attributes
self.songlist_focus_titles = None self.songlist_focus_titles = None
self.songlist_only = False self.songlist_only = False
@ -58,114 +76,30 @@ class KaraokeDownloader:
self.download_limit = None self.download_limit = None
def _load_config(self): def _load_config(self):
config_file = DATA_DIR / "config.json" """Load configuration using the config manager."""
if config_file.exists(): return self.config_manager.load_config()
try:
with open(config_file, 'r', encoding='utf-8') as f:
return json.load(f)
except (json.JSONDecodeError, FileNotFoundError) as e:
print(f"Warning: Could not load config.json: {e}")
return {
"download_settings": {
"format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best",
"preferred_resolution": "720p",
"audio_format": "mp3",
"audio_quality": "0",
"subtitle_language": "en",
"subtitle_format": "srt",
"write_metadata": False,
"write_thumbnail": False,
"write_description": False,
"write_annotations": False,
"write_comments": False,
"write_subtitles": False,
"embed_metadata": False,
"add_metadata": False,
"continue_downloads": True,
"no_overwrites": True,
"ignore_errors": True,
"no_warnings": False
},
"folder_structure": {
"downloads_dir": "downloads",
"logs_dir": "logs",
"tracking_file": str(DATA_DIR / "karaoke_tracking.json")
},
"logging": {
"level": "INFO",
"format": "%(asctime)s - %(levelname)s - %(message)s",
"include_console": True,
"include_file": True
},
"yt_dlp_path": "downloader/yt-dlp.exe"
}
def _should_skip_song(self, artist, title, channel_name, video_id, video_title, server_songs=None, server_duplicates_tracking=None): def _should_skip_song(self, artist, title, channel_name, video_id, video_title, server_songs=None, server_duplicates_tracking=None):
""" """
Centralized method to check if a song should be skipped. Check if a song should be skipped using the centralized SongValidator.
Performs four checks in order:
1. Already downloaded (tracking)
2. File exists on filesystem
3. Already on server
4. Previously failed download (bad file)
Returns: Returns:
tuple: (should_skip, reason, total_filtered) tuple: (should_skip, reason, total_filtered)
""" """
total_filtered = 0 return self.song_validator.should_skip_song(
artist, title, channel_name, video_id, video_title,
# Check 1: Already downloaded by this system server_songs, server_duplicates_tracking
if self.tracker.is_song_downloaded(artist, title, channel_name, video_id): )
return True, "already downloaded", total_filtered
# Check 2: File already exists on filesystem
# Generate the expected filename based on the download mode context
safe_title = title
invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
for char in invalid_chars:
safe_title = safe_title.replace(char, "")
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
# Try different filename patterns that might exist
possible_filenames = [
f"{artist} - {safe_title}.mp4", # Songlist mode
f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode
f"{artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode
]
for filename in possible_filenames:
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
# Apply length limits if needed
safe_artist = artist.replace("'", "").replace('"', "").strip()
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
output_path = self.downloads_dir / channel_name / filename
if output_path.exists() and output_path.stat().st_size > 0:
return True, "file exists", total_filtered
# Check 3: Already on server (if server data provided)
if server_songs is not None and server_duplicates_tracking is not None:
from karaoke_downloader.server_manager import check_and_mark_server_duplicate
if check_and_mark_server_duplicate(server_songs, server_duplicates_tracking, artist, title, video_title, channel_name):
total_filtered += 1
return True, "on server", total_filtered
# Check 4: Previously failed download (bad file)
if self.tracker.is_song_failed(artist, title, channel_name, video_id):
return True, "previously failed", total_filtered
return False, None, total_filtered
def _mark_song_failed(self, artist, title, video_id, channel_name, error_message): def _mark_song_failed(self, artist, title, video_id, channel_name, error_message):
""" """
Centralized method to mark a song as failed in tracking. Mark a song as failed in tracking using the SongValidator.
""" """
self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message) self.song_validator.mark_song_failed(artist, title, video_id, channel_name, error_message)
print(f"🏷️ Marked song as failed: {artist} - {title}")
def _handle_download_failure(self, artist, title, video_id, channel_name, error_type, error_details=""): def _handle_download_failure(self, artist, title, video_id, channel_name, error_type, error_details=""):
""" """
Centralized method to handle download failures. Handle download failures using the SongValidator.
Args: Args:
artist: Song artist artist: Song artist
@ -175,10 +109,7 @@ class KaraokeDownloader:
error_type: Type of error (e.g., "yt-dlp failed", "file verification failed") error_type: Type of error (e.g., "yt-dlp failed", "file verification failed")
error_details: Additional error details error_details: Additional error details
""" """
error_msg = f"{error_type}" self.song_validator.handle_download_failure(artist, title, video_id, channel_name, error_type, error_details)
if error_details:
error_msg += f": {error_details}"
self._mark_song_failed(artist, title, video_id, channel_name, error_msg)
def download_channel_videos(self, url, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD): def download_channel_videos(self, url, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD):
"""Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching.""" """Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching."""
@ -193,7 +124,7 @@ class KaraokeDownloader:
server_songs = load_server_songs() server_songs = load_server_songs()
server_duplicates_tracking = load_server_duplicates_tracking() server_duplicates_tracking = load_server_duplicates_tracking()
limit = self.config.get('limit', 1) limit = getattr(self.config, 'limit', 1)
cmd = [ cmd = [
str(self.yt_dlp_path), str(self.yt_dlp_path),
'--flat-playlist', '--flat-playlist',


@ -0,0 +1,182 @@
"""
File utilities for filename sanitization, path operations, and file validation.
Centralizes common file operations to eliminate code duplication.
"""
import re
from pathlib import Path
from typing import List, Optional, Tuple
# Constants for filename operations
DEFAULT_FILENAME_LENGTH_LIMIT = 100
DEFAULT_ARTIST_LENGTH_LIMIT = 30
DEFAULT_TITLE_LENGTH_LIMIT = 60
# Windows invalid characters
INVALID_FILENAME_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
def sanitize_filename(artist: str, title: str, max_length: int = DEFAULT_FILENAME_LENGTH_LIMIT) -> str:
"""
Create a safe filename from artist and title.
Args:
artist: Song artist name
title: Song title
max_length: Maximum filename length (default: 100)
Returns:
Sanitized filename string
"""
# Clean up title
safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "")
safe_title = safe_title.replace("'", "").replace('"', "")
# Clean up artist
safe_artist = artist.replace("'", "").replace('"', "").strip()
# Remove invalid characters
for char in INVALID_FILENAME_CHARS:
safe_title = safe_title.replace(char, "")
safe_artist = safe_artist.replace(char, "")
# Remove problematic patterns
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
safe_artist = safe_artist.strip()
# Create filename
filename = f"{safe_artist} - {safe_title}.mp4"
# Limit filename length if needed
if len(filename) > max_length:
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
return filename
def generate_possible_filenames(artist: str, title: str, channel_name: str) -> List[str]:
"""
Generate possible filename patterns for different download modes.
Args:
artist: Song artist name
title: Song title
channel_name: Channel name
Returns:
List of possible filename patterns
"""
safe_title = sanitize_title_for_filenames(title)
safe_artist = artist.replace("'", "").replace('"', "").strip()
return [
f"{safe_artist} - {safe_title}.mp4", # Songlist mode
f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode
f"{safe_artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode
]
def sanitize_title_for_filenames(title: str) -> str:
"""
Sanitize title specifically for filename generation.
Args:
title: Song title
Returns:
Sanitized title string
"""
safe_title = title
for char in INVALID_FILENAME_CHARS:
safe_title = safe_title.replace(char, "")
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
return safe_title
def check_file_exists_with_patterns(
downloads_dir: Path,
channel_name: str,
artist: str,
title: str
) -> Tuple[bool, Optional[Path]]:
"""
Check if a file exists using multiple possible filename patterns.
Args:
downloads_dir: Base downloads directory
channel_name: Channel name
artist: Song artist
title: Song title
Returns:
Tuple of (exists, file_path) where file_path is None if not found
"""
possible_filenames = generate_possible_filenames(artist, title, channel_name)
channel_dir = downloads_dir / channel_name
for filename in possible_filenames:
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
# Apply length limits if needed
safe_artist = artist.replace("'", "").replace('"', "").strip()
safe_title = sanitize_title_for_filenames(title)
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
file_path = channel_dir / filename
if file_path.exists() and file_path.stat().st_size > 0:
return True, file_path
return False, None
def ensure_directory_exists(directory: Path) -> None:
"""
Ensure a directory exists, creating it if necessary.
Args:
directory: Directory path to ensure exists
"""
directory.mkdir(parents=True, exist_ok=True)
def is_valid_mp4_file(file_path: Path) -> bool:
"""
Check if a file is a valid MP4 file.
Args:
file_path: Path to the file to check
Returns:
True if file is a valid MP4, False otherwise
"""
if not file_path.exists():
return False
# Check file size
if file_path.stat().st_size == 0:
return False
# Check file extension
if file_path.suffix.lower() != '.mp4':
return False
# Basic MP4 header check (bytes 4-7 should be 'ftyp'; the first 4 bytes are the box size)
try:
with open(file_path, 'rb') as f:
header = f.read(8)
if len(header) >= 8 and header[4:8] == b'ftyp':
return True
except (IOError, OSError):
pass
return False
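
The header test in `is_valid_mp4_file` relies on the MP4 container layout: a file normally opens with a box whose first four bytes are its size and whose next four bytes are the type `ftyp`. The sketch below exercises that check against two throwaway files. `looks_like_mp4` and the file names are hypothetical, and the check is only a heuristic; unlike the `ffprobe` call it replaces, it does not confirm that the streams are decodable.

```python
import tempfile
from pathlib import Path


def looks_like_mp4(path: Path) -> bool:
    """Return True when bytes 4-7 of the file are the 'ftyp' box type."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    return len(header) >= 8 and header[4:8] == b"ftyp"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.mp4"
    # 4-byte big-endian box size (0x18 = 24), box type 'ftyp', major brand 'isom', padding.
    good.write_bytes(b"\x00\x00\x00\x18ftypisom" + b"\x00" * 12)

    bad = Path(tmp) / "bad.mp4"
    # Right extension, wrong content (for example, an HTML error page saved as .mp4).
    bad.write_bytes(b"<html>not a video</html>")

    good_result = looks_like_mp4(good)
    bad_result = looks_like_mp4(bad)
```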
def cleanup_temp_files(file_path: Path) -> None:
"""
Clean up temporary files created by yt-dlp.
Args:
file_path: Base file path (without extension)
"""
temp_extensions = ['.info.json', '.meta', '.webp', '.jpg', '.png']
for ext in temp_extensions:
temp_file = file_path.with_suffix(ext)
if temp_file.exists():
try:
temp_file.unlink()
except (IOError, OSError):
pass # Ignore cleanup errors
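
One detail of `cleanup_temp_files` worth checking: `Path.with_suffix` replaces everything after the last dot in the file name, and `sanitize_filename` strips dots from the title but not from the artist. The sketch below shows what that means for a dotted artist name and one possible alternative. The artist, title, and `sidecar_path` helper are made up for illustration, and whether yt-dlp names its side files this way depends on the output template in use.

```python
from pathlib import Path


def sidecar_path(base: Path, ext: str) -> Path:
    """Append ext to the full file name instead of replacing the last dot-suffix."""
    return base.with_name(base.name + ext)


# Base path as cleanup_temp_files would receive it (the .mp4 suffix already removed).
base = Path("downloads") / "SomeChannel" / "A.B.C. - Song Title"

# with_suffix treats ". - Song Title" as the suffix and discards it.
replaced = base.with_suffix(".info.json")

# Appending keeps the whole stem intact.
appended = sidecar_path(base, ".info.json")
```

Here `replaced.name` is `A.B.C.info.json`, while `appended.name` is `A.B.C. - Song Title.info.json`.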


@ -0,0 +1,144 @@
"""
Song validation utilities for checking if songs should be downloaded.
Centralizes song validation logic to eliminate code duplication.
"""
from pathlib import Path
from typing import Dict, Any, Optional, Tuple, List
from karaoke_downloader.file_utils import check_file_exists_with_patterns
from karaoke_downloader.tracking_manager import TrackingManager
class SongValidator:
"""
Centralized song validation logic for checking if songs should be downloaded.
"""
def __init__(self, tracker: TrackingManager, downloads_dir: Path):
"""
Initialize the song validator.
Args:
tracker: Tracking manager instance
downloads_dir: Base downloads directory
"""
self.tracker = tracker
self.downloads_dir = downloads_dir
def should_skip_song(
self,
artist: str,
title: str,
channel_name: str,
video_id: Optional[str] = None,
video_title: Optional[str] = None,
server_songs: Optional[Dict[str, Any]] = None,
server_duplicates_tracking: Optional[Dict[str, Any]] = None
) -> Tuple[bool, Optional[str], int]:
"""
Check if a song should be skipped based on multiple criteria.
Performs checks in order:
1. Already downloaded (tracking)
2. File exists on filesystem
3. Already on server
4. Previously failed download (bad file)
Args:
artist: Song artist name
title: Song title
channel_name: Channel name
video_id: YouTube video ID (optional)
video_title: YouTube video title (optional)
server_songs: Server songs data (optional)
server_duplicates_tracking: Server duplicates tracking (optional)
Returns:
Tuple of (should_skip, reason, total_filtered)
"""
total_filtered = 0
# Check 1: Already downloaded by this system
if self.tracker.is_song_downloaded(artist, title, channel_name, video_id):
return True, "already downloaded", total_filtered
# Check 2: File already exists on filesystem
file_exists, _ = check_file_exists_with_patterns(
self.downloads_dir, channel_name, artist, title
)
if file_exists:
return True, "file exists", total_filtered
# Check 3: Already on server (if server data provided)
if server_songs is not None and server_duplicates_tracking is not None:
from karaoke_downloader.server_manager import check_and_mark_server_duplicate
if check_and_mark_server_duplicate(
server_songs, server_duplicates_tracking,
artist, title, video_title, channel_name
):
total_filtered += 1
return True, "on server", total_filtered
# Check 4: Previously failed download (bad file)
if self.tracker.is_song_failed(artist, title, channel_name, video_id):
return True, "previously failed", total_filtered
return False, None, total_filtered
def mark_song_failed(
self,
artist: str,
title: str,
video_id: Optional[str],
channel_name: str,
error_message: str
) -> None:
"""
Mark a song as failed in tracking.
Args:
artist: Song artist name
title: Song title
video_id: YouTube video ID (optional)
channel_name: Channel name
error_message: Error message to record
"""
self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message)
print(f"🏷️ Marked song as failed: {artist} - {title}")
def handle_download_failure(
self,
artist: str,
title: str,
video_id: Optional[str],
channel_name: str,
error_type: str,
error_details: str = ""
) -> None:
"""
Handle download failures with consistent error formatting.
Args:
artist: Song artist name
title: Song title
video_id: YouTube video ID (optional)
channel_name: Channel name
error_type: Type of error (e.g., "yt-dlp failed", "file verification failed")
error_details: Additional error details
"""
error_msg = f"{error_type}"
if error_details:
error_msg += f": {error_details}"
self.mark_song_failed(artist, title, video_id, channel_name, error_msg)
def create_song_validator(tracker: TrackingManager, downloads_dir: Path) -> SongValidator:
"""
Factory function to create a song validator instance.
Args:
tracker: Tracking manager instance
downloads_dir: Base downloads directory
Returns:
SongValidator instance
"""
return SongValidator(tracker, downloads_dir)
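
`should_skip_song` is an ordered chain of checks in which the first match decides the reported reason. That short-circuit behaviour can be sketched without the real tracker. Everything here is a stand-in: the in-memory sets replace `TrackingManager`, only two of the four checks are modelled, and `first_skip_reason` and `skip_decision` are hypothetical names.

```python
from typing import Callable, List, Optional, Set, Tuple

SongKey = Tuple[str, str]
Check = Tuple[str, Callable[[], bool]]


def first_skip_reason(checks: List[Check]) -> Tuple[bool, Optional[str]]:
    """Run checks in order and report the first one that matches."""
    for reason, predicate in checks:
        if predicate():
            return True, reason
    return False, None


# In-memory stand-ins for the tracking data.
downloaded: Set[SongKey] = {("Artist A", "Song One")}
failed: Set[SongKey] = {("Artist A", "Song One"), ("Artist B", "Song Two")}


def skip_decision(artist: str, title: str) -> Tuple[bool, Optional[str]]:
    key = (artist, title)
    return first_skip_reason([
        ("already downloaded", lambda: key in downloaded),
        ("previously failed", lambda: key in failed),
    ])


both = skip_decision("Artist A", "Song One")      # matches both sets; the earlier check wins
only_failed = skip_decision("Artist B", "Song Two")
neither = skip_decision("Artist C", "Song Three")
```

Because the order is fixed, a song that is both downloaded and marked as failed is reported as "already downloaded", which mirrors the ordering of checks 1 and 4 in the class above.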


@ -5,70 +5,29 @@ Handles the actual downloading and post-processing of videos.
import subprocess import subprocess
from pathlib import Path from pathlib import Path
from typing import Dict, Any, Optional, Tuple
from karaoke_downloader.id3_utils import add_id3_tags from karaoke_downloader.id3_utils import add_id3_tags
from karaoke_downloader.songlist_manager import mark_songlist_song_downloaded from karaoke_downloader.songlist_manager import mark_songlist_song_downloaded
from karaoke_downloader.download_planner import save_plan_cache from karaoke_downloader.download_planner import save_plan_cache
from karaoke_downloader.youtube_utils import build_yt_dlp_command, execute_yt_dlp_command, show_available_formats from karaoke_downloader.youtube_utils import build_yt_dlp_command, execute_yt_dlp_command, show_available_formats
from karaoke_downloader.error_utils import handle_yt_dlp_error, handle_file_validation_error, log_error from karaoke_downloader.error_utils import handle_yt_dlp_error, handle_file_validation_error, log_error
from karaoke_downloader.file_utils import sanitize_filename, is_valid_mp4_file, cleanup_temp_files, ensure_directory_exists
# Constants # Constants
DEFAULT_FILENAME_LENGTH_LIMIT = 100
DEFAULT_ARTIST_LENGTH_LIMIT = 30
DEFAULT_TITLE_LENGTH_LIMIT = 60
DEFAULT_FORMAT_CHECK_TIMEOUT = 30 DEFAULT_FORMAT_CHECK_TIMEOUT = 30
def sanitize_filename(artist, title): def is_valid_mp4(file_path: Path) -> bool:
"""
Create a safe filename from artist and title.
Removes invalid characters and limits length.
"""
# Create a shorter, safer filename
safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "").replace("'", "").replace('"', "")
safe_artist = artist.replace("'", "").replace('"', "")
# Remove all Windows-invalid characters
invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
for char in invalid_chars:
safe_title = safe_title.replace(char, "")
safe_artist = safe_artist.replace(char, "")
# Also remove any other potentially problematic characters
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
safe_artist = safe_artist.strip()
filename = f"{safe_artist} - {safe_title}.mp4"
# Limit filename length to avoid Windows path issues
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
return filename
def is_valid_mp4(file_path):
""" """
Check if a file is a valid MP4 file. Check if a file is a valid MP4 file.
Uses ffprobe if available, otherwise checks file extension and size. Uses ffprobe if available, otherwise checks file extension and size.
Args:
file_path: Path to the file to check
Returns:
True if file is a valid MP4, False otherwise
""" """
if not file_path.exists(): return is_valid_mp4_file(file_path)
return False
# Check file size
if file_path.stat().st_size == 0:
return False
# Try to use ffprobe for validation
try:
import subprocess
result = subprocess.run(
['ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', str(file_path)],
capture_output=True,
text=True,
check=True
)
return True
except (subprocess.CalledProcessError, FileNotFoundError):
# If ffprobe is not available, just check the extension and size
return file_path.suffix.lower() == '.mp4' and file_path.stat().st_size > 0
def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracking, def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracking,
channel_name, channel_url, video_id, video_title, channel_name, channel_url, video_id, video_title,
@ -83,10 +42,33 @@ def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracki
artist, title, channel_name, songlist_tracking artist, title, channel_name, songlist_tracking
) )
def download_single_video(output_path, video_id, config, yt_dlp_path, def download_single_video(
artist, title, channel_name, songlist_tracking): output_path: Path,
"""Download a single video and handle post-processing.""" video_id: str,
output_path.parent.mkdir(parents=True, exist_ok=True) config: Dict[str, Any],
yt_dlp_path: str,
artist: str,
title: str,
channel_name: str,
songlist_tracking: Dict[str, Any]
) -> bool:
"""
Download a single video and handle post-processing.
Args:
output_path: Output file path
video_id: YouTube video ID
config: Configuration dictionary
yt_dlp_path: Path to yt-dlp executable
artist: Song artist name
title: Song title
channel_name: Channel name
songlist_tracking: Songlist tracking data
Returns:
True if successful, False otherwise
"""
ensure_directory_exists(output_path.parent)
print(f"⬇️ Downloading: {artist} - {title} -> {output_path}") print(f"⬇️ Downloading: {artist} - {title} -> {output_path}")
video_url = f"https://www.youtube.com/watch?v={video_id}" video_url = f"https://www.youtube.com/watch?v={video_id}"
@ -95,11 +77,11 @@ def download_single_video(output_path, video_id, config, yt_dlp_path,
cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config) cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config)
print(f"🔧 Running command: {' '.join(cmd)}") print(f"🔧 Running command: {' '.join(cmd)}")
print(f"📺 Resolution settings: {config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}") print(f"📺 Resolution settings: {config.download_settings.preferred_resolution}")
print(f"🎬 Format string: {config.get('download_settings', {}).get('format', 'Unknown')}") print(f"🎬 Format string: {config.download_settings.format}")
# Debug: Show available formats (optional) # Debug: Show available formats (optional)
if config.get('debug_show_formats', False): if hasattr(config, 'debug_show_formats') and config.debug_show_formats:
show_available_formats(video_url, yt_dlp_path) show_available_formats(video_url, yt_dlp_path)
try: try:
@ -121,6 +103,9 @@ def download_single_video(output_path, video_id, config, yt_dlp_path,
add_id3_tags(output_path, f"{artist} - {title} (Karaoke Version)", channel_name) add_id3_tags(output_path, f"{artist} - {title} (Karaoke Version)", channel_name)
mark_songlist_song_downloaded(songlist_tracking, artist, title, channel_name, output_path) mark_songlist_song_downloaded(songlist_tracking, artist, title, channel_name, output_path)
# Clean up temporary files
cleanup_temp_files(output_path.with_suffix(''))
print(f"✅ Downloaded and tracked: {artist} - {title}") print(f"✅ Downloaded and tracked: {artist} - {title}")
print(f"🎉 All post-processing complete for: {output_path}") print(f"🎉 All post-processing complete for: {output_path}")
@@ -255,58 +240,5 @@ def cleanup_cache(cache_file):
     except Exception as e:
         print(f"⚠️ Could not delete download plan cache: {e}")
-def should_skip_song_standalone(artist, title, channel_name, video_id, video_title, downloads_dir, tracker=None, server_songs=None, server_duplicates_tracking=None):
-    """
-    Standalone function to check if a song should be skipped.
-    Performs four checks in order:
-    1. Already downloaded (tracking) - if tracker provided
-    2. File exists on filesystem
-    3. Already on server - if server data provided
-    4. Previously failed download (bad file) - if tracker provided
-    Returns:
-        tuple: (should_skip, reason, total_filtered)
-    """
-    total_filtered = 0
-    # Check 1: Already downloaded by this system (if tracker provided)
-    if tracker and tracker.is_song_downloaded(artist, title, channel_name, video_id):
-        return True, "already downloaded", total_filtered
-    # Check 2: File already exists on filesystem
-    # Generate the expected filename based on the download mode context
-    safe_title = title
-    invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
-    for char in invalid_chars:
-        safe_title = safe_title.replace(char, "")
-    safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
-    # Try different filename patterns that might exist
-    possible_filenames = [
-        f"{artist} - {safe_title}.mp4",  # Songlist mode
-        f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
-        f"{artist} - {safe_title} (Karaoke Version).mp4"  # Channel videos mode
-    ]
-    for filename in possible_filenames:
-        if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
-            # Apply length limits if needed
-            safe_artist = artist.replace("'", "").replace('"', "").strip()
-            filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
-        output_path = downloads_dir / channel_name / filename
-        if output_path.exists() and output_path.stat().st_size > 0:
-            return True, "file exists", total_filtered
-    # Check 3: Already on server (if server data provided)
-    if server_songs is not None and server_duplicates_tracking is not None:
-        from karaoke_downloader.server_manager import check_and_mark_server_duplicate
-        if check_and_mark_server_duplicate(server_songs, server_duplicates_tracking, artist, title, video_title, channel_name):
-            total_filtered += 1
-            return True, "on server", total_filtered
-    # Check 4: Previously failed download (bad file) - if tracker provided
-    if tracker and tracker.is_song_failed(artist, title, channel_name, video_id):
-        return True, "previously failed", total_filtered
-    return False, None, total_filtered
+# Note: should_skip_song_standalone function has been removed and replaced with SongValidator class
+# Use karaoke_downloader.song_validator.create_song_validator() instead
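
The filesystem check (Check 2) in the removed function can be sketched as a small standalone helper. This is only an illustration of that one check, not the `SongValidator` implementation, which this diff does not show. The helper names (`sanitize_title`, `candidate_filenames`, `file_already_exists`) and the three limit values are hypothetical; the real `DEFAULT_*` constants are defined elsewhere in the project.

```python
from pathlib import Path

# Hypothetical limits for illustration; the project's DEFAULT_* constants
# are not visible in this diff.
FILENAME_LIMIT = 100
ARTIST_LIMIT = 30
TITLE_LIMIT = 60

INVALID_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']


def sanitize_title(title):
    """Drop characters that are invalid in Windows filenames, and all dots."""
    for char in INVALID_CHARS:
        title = title.replace(char, "")
    # Removing every "." covers the "...", ".." and "." cases in one pass.
    return title.replace(".", "").strip()


def candidate_filenames(artist, title, channel_name):
    """Return the three filename patterns the removed function probed."""
    safe_title = sanitize_title(title)
    names = [
        f"{artist} - {safe_title}.mp4",                    # songlist mode
        f"{channel_name} - {safe_title}.mp4",              # latest-per-channel mode
        f"{artist} - {safe_title} (Karaoke Version).mp4",  # channel videos mode
    ]
    # Over-long names fall back to a truncated "artist - title" form.
    safe_artist = artist.replace("'", "").replace('"', "").strip()
    short = f"{safe_artist[:ARTIST_LIMIT]} - {safe_title[:TITLE_LIMIT]}.mp4"
    return [name if len(name) <= FILENAME_LIMIT else short for name in names]


def file_already_exists(downloads_dir, artist, title, channel_name):
    """True if any candidate file exists under the channel folder and is non-empty."""
    for name in candidate_filenames(artist, title, channel_name):
        path = Path(downloads_dir) / channel_name / name
        if path.exists() and path.stat().st_size > 0:
            return True
    return False
```

Keeping the filename logic in pure functions like these makes it testable without a tracker or server data, which is one plausible motivation for moving the checks into a validator class.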

View File

@@ -78,7 +78,7 @@ def build_yt_dlp_command(
         "--ignore-errors",
         "--no-warnings",
         "-o", str(output_path),
-        "-f", config.get("download_settings", {}).get("format", "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"),
+        "-f", config.download_settings.format,
         video_url
     ]
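
The move from `config.get("download_settings", {}).get("format", ...)` to `config.download_settings.format` implies a config object that carries its own defaults instead of repeating them at every call site. A minimal sketch of such an object using dataclasses follows. The class names `DownloadSettings` and `AppConfig`, the `from_dict` loader, and the `"720p"` default are assumptions made for illustration; the project's actual config class is not shown in this diff. The default format string is the fallback that the old `.get()` call used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DownloadSettings:
    # Fallback taken from the old dict-based lookup in build_yt_dlp_command.
    format: str = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"
    # Assumed default; the old code only printed 'Unknown' when this was missing.
    preferred_resolution: str = "720p"


@dataclass
class AppConfig:
    download_settings: DownloadSettings = field(default_factory=DownloadSettings)
    debug_show_formats: bool = False

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "AppConfig":
        """Build a config from a parsed JSON dict, ignoring unknown keys."""
        settings = raw.get("download_settings", {})
        known = {
            key: value
            for key, value in settings.items()
            if key in DownloadSettings.__dataclass_fields__
        }
        return cls(
            download_settings=DownloadSettings(**known),
            debug_show_formats=bool(raw.get("debug_show_formats", False)),
        )


# Missing keys fall back to the declared defaults.
config = AppConfig.from_dict({"download_settings": {"preferred_resolution": "1080p"}})
print(config.download_settings.preferred_resolution)
print(config.download_settings.format)
```

If `debug_show_formats` is declared with a default like this, it could be read directly as `config.debug_show_formats`, and the `hasattr` guard would no longer be needed.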