diff --git a/PRD.md b/PRD.md index b7e2f6e..e7a862e 100644 --- a/PRD.md +++ b/PRD.md @@ -1,5 +1,5 @@ -# 🎤 Karaoke Video Downloader – PRD (v3.2) +# 🎤 Karaoke Video Downloader – PRD (v3.3) ## ✅ Overview A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse. @@ -20,7 +20,7 @@ The codebase has been refactored into focused modules with centralized utilities - **`server_manager.py`**: Server song availability checking - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions -### New Utility Modules (v3.2): +### Utility Modules (v3.2): - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation - **`error_utils.py`**: Standardized error handling and formatting - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline @@ -29,15 +29,20 @@ The codebase has been refactored into focused modules with centralized utilities - **`resolution_cli.py`**: Resolution checking utilities - **`tracking_cli.py`**: Tracking management CLI +### New Utility Modules (v3.3): +- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation +- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded + ### Benefits of Enhanced Modular Architecture: - **Single Responsibility**: Each module has a focused purpose -- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized -- **Reduced Duplication**: Eliminated code duplication across modules +- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized +- **Reduced Duplication**: Eliminated ~150 lines of code 
duplication across modules - **Testability**: Individual components can be tested separately - **Maintainability**: Easier to find and fix issues - **Reusability**: Components can be used independently - **Robustness**: Better error handling and interruption recovery - **Consistency**: Standardized error messages and processing pipelines +- **Type Safety**: Comprehensive type hints across all new modules --- @@ -95,6 +100,7 @@ python download_karaoke.py --clear-cache SingKingKaraoke - ✅ Configurable download resolution and yt-dlp options (`data/config.json`) - ✅ Songlist integration: prioritize and track custom songlists - ✅ Songlist-only mode: download only songs from the songlist +- ✅ Songlist focus mode: download only songs from specific playlists by title - ✅ Global songlist tracking to avoid duplicates across channels - ✅ ID3 tagging for artist/title in MP4 files (mutagen) - ✅ Real-time progress and detailed logging @@ -113,6 +119,9 @@ python download_karaoke.py --clear-cache SingKingKaraoke - ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting - ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing - ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities +- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations +- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules +- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation --- @@ -134,7 +143,9 @@ KaroakeVideoDownloader/ │ ├── error_utils.py # Standardized error handling and formatting │ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline │ ├── id3_utils.py # ID3 tagging utilities -│ ├── config_manager.py # Configuration management +│ ├── 
config_manager.py # Configuration management with dataclasses +│ ├── file_utils.py # Centralized file operations and filename handling +│ ├── song_validator.py # Centralized song validation logic │ ├── check_resolution.py # Resolution checker utility │ ├── resolution_cli.py # Resolution config CLI │ └── tracking_cli.py # Tracking management CLI @@ -164,6 +175,7 @@ KaroakeVideoDownloader/ - `--file `: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes) - `--songlist-priority`: Prioritize songlist songs in download queue - `--songlist-only`: Download only songs from the songlist +- `--songlist-focus ...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`) - `--songlist-status`: Show songlist download progress - `--limit `: Limit number of downloads (enables fast mode with early exit) - `--resolution <720p|1080p|...>`: Override resolution @@ -186,20 +198,45 @@ KaroakeVideoDownloader/ - **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download. - **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels. -## 🔧 Refactoring Improvements (v3.2) -The codebase has been comprehensively refactored to improve maintainability and reduce code duplication: +## 🔧 Refactoring Improvements (v3.3) +The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. 
Recent improvements have enhanced reliability, performance, and code organization: -### **Centralized Utilities** -- **`youtube_utils.py`**: Centralized yt-dlp command generation and YouTube operations -- **`error_utils.py`**: Standardized error handling with structured exception hierarchy -- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing +### **New Utility Modules (v3.3)** +- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation + - `sanitize_filename()`: Create safe filenames from artist/title + - `generate_possible_filenames()`: Generate filename patterns for different modes + - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns + - `is_valid_mp4_file()`: Validate MP4 files with header checking + - `cleanup_temp_files()`: Remove temporary yt-dlp files + - `ensure_directory_exists()`: Safe directory creation + +- **`song_validator.py`**: Centralized song validation logic + - `SongValidator` class: Unified logic for checking if songs should be downloaded + - `should_skip_song()`: Comprehensive validation with multiple criteria + - `mark_song_failed()`: Consistent failure tracking + - `handle_download_failure()`: Standardized error handling + +- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses + - `ConfigManager` class: Type-safe configuration loading and caching + - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses + - Configuration validation and merging with defaults + - Dynamic resolution updates ### **Benefits Achieved** -- **Reduced Duplication**: Eliminated ~50 lines of duplicated yt-dlp command generation -- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place -- **Enhanced Error Handling**: Consistent error messages and better debugging context -- **Better Code Organization**: Clear separation of concerns and logical module structure -- **Increased 
Testability**: Modular components can be tested independently +- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules +- **Centralized File Operations**: Single source of truth for filename handling and file validation +- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded +- **Enhanced Type Safety**: Comprehensive type hints across all new modules +- **Improved Configuration Management**: Structured configuration with validation and caching +- **Better Error Handling**: Consistent patterns via centralized utilities +- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place +- **Improved Testability**: Modular components can be tested independently +- **Better Developer Experience**: Clear function signatures and comprehensive documentation + +### **Previous Improvements (v3.2)** +- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations +- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting +- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing - **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set. - **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done. - **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. 
This provides much faster performance for small limits compared to the full pre-scan approach. @@ -208,6 +245,8 @@ The codebase has been comprehensively refactored to improve maintainability and - **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly. - **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted. - **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels. +- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls. +- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads. 
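Reviewer sketch (not part of the patch): the cross-channel deduplication and set-based lookups described in the list above could look roughly like this. The names `song_key` and `plan_downloads` and the normalization rules are hypothetical illustrations, not the project's actual functions, and the `limit` handling only approximates the early exit described for fast mode (it counts planned items, not successful downloads).

```python
import re


def song_key(artist: str, title: str) -> str:
    """Build a dedup key from artist plus a normalized title (hypothetical helper)."""
    # Drop bracketed qualifiers such as "(Karaoke Version)" or "[Official Video]".
    title = re.sub(r"[\(\[][^\)\]]*[\)\]]", " ", title)

    def norm(text: str) -> str:
        # Lowercase and collapse every run of non-alphanumerics to one space.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

    return f"{norm(artist)}|{norm(title)}"


def plan_downloads(channels, limit=None):
    """Return (channel, artist, title) tuples, keeping each song only once.

    channels maps a channel name to a list of (artist, title) pairs in scan order.
    """
    seen = set()  # O(1) membership checks instead of rescanning the plan
    plan = []
    for channel, videos in channels.items():
        for artist, title in videos:
            key = song_key(artist, title)
            if key in seen:
                continue  # already planned from an earlier channel
            seen.add(key)
            plan.append((channel, artist, title))
            if limit is not None and len(plan) >= limit:
                return plan  # early exit once the limit is reached
    return plan
```

Keeping the keys in a set is what makes the per-video check constant time; the overall scan is still linear in the number of videos visited.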
--- @@ -219,3 +258,6 @@ The codebase has been comprehensively refactored to improve maintainability and - [ ] Parallel downloads for improved speed - [ ] Unit tests for all modules - [ ] Integration tests for end-to-end workflows +- [ ] Plugin system for custom file operations +- [ ] Advanced configuration UI +- [ ] Real-time download progress visualization diff --git a/README.md b/README.md index a1a5000..28e7325 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ The codebase has been comprehensively refactored into a modular architecture wit - **`server_manager.py`**: Server song availability checking - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions -### Utility Modules: +### Utility Modules (v3.2): - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation - **`error_utils.py`**: Standardized error handling and formatting - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline @@ -44,12 +44,34 @@ The codebase has been comprehensively refactored into a modular architecture wit - **`resolution_cli.py`**: Resolution checking utilities - **`tracking_cli.py`**: Tracking management CLI +### New Utility Modules (v3.3): +- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation + - `sanitize_filename()`: Create safe filenames from artist/title + - `generate_possible_filenames()`: Generate filename patterns for different modes + - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns + - `is_valid_mp4_file()`: Validate MP4 files with header checking + - `cleanup_temp_files()`: Remove temporary yt-dlp files + - `ensure_directory_exists()`: Safe directory creation + +- **`song_validator.py`**: Centralized song validation logic + - `SongValidator` class: Unified logic for checking if songs should be downloaded + - `should_skip_song()`: Comprehensive validation with multiple criteria + - `mark_song_failed()`: Consistent 
failure tracking + - `handle_download_failure()`: Standardized error handling + +- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses + - `ConfigManager` class: Type-safe configuration loading and caching + - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses + - Configuration validation and merging with defaults + - Dynamic resolution updates + ### Benefits: -- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized -- **Reduced Duplication**: Eliminated code duplication across modules +- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized +- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules - **Consistency**: Standardized error messages and processing pipelines - **Maintainability**: Changes isolated to specific modules - **Testability**: Modular components can be tested independently +- **Type Safety**: Comprehensive type hints across all new modules ## 📋 Requirements - **Windows 10/11** @@ -181,7 +203,9 @@ KaroakeVideoDownloader/ │ ├── error_utils.py # Standardized error handling and formatting │ ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline │ ├── id3_utils.py # ID3 tagging utilities -│ ├── config_manager.py # Configuration management +│ ├── config_manager.py # Configuration management with dataclasses +│ ├── file_utils.py # Centralized file operations and filename handling +│ ├── song_validator.py # Centralized song validation logic │ ├── check_resolution.py # Resolution checker utility │ ├── resolution_cli.py # Resolution config CLI │ └── tracking_cli.py # Tracking management CLI @@ -271,25 +295,55 @@ python download_karaoke.py --clear-server-duplicates > **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. 
When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage. -## 🔧 Refactoring Improvements (v3.2) -The codebase has been comprehensively refactored to improve maintainability and reduce code duplication: +## 🔧 Refactoring Improvements (v3.3) +The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization: -### **Key Improvements** +### **New Utility Modules (v3.3)** +- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation + - `sanitize_filename()`: Create safe filenames from artist/title + - `generate_possible_filenames()`: Generate filename patterns for different modes + - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns + - `is_valid_mp4_file()`: Validate MP4 files with header checking + - `cleanup_temp_files()`: Remove temporary yt-dlp files + - `ensure_directory_exists()`: Safe directory creation + +- **`song_validator.py`**: Centralized song validation logic + - `SongValidator` class: Unified logic for checking if songs should be downloaded + - `should_skip_song()`: Comprehensive validation with multiple criteria + - `mark_song_failed()`: Consistent failure tracking + - `handle_download_failure()`: Standardized error handling + +- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses + - `ConfigManager` class: Type-safe configuration loading and caching + - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses + - Configuration validation and merging with defaults + - Dynamic resolution updates + +### **Benefits Achieved** +- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules +- **Centralized File Operations**: Single source of truth for filename handling and file validation +- **Unified Song Validation**: Consistent 
logic for checking if songs should be downloaded +- **Enhanced Type Safety**: Comprehensive type hints across all new modules +- **Improved Configuration Management**: Structured configuration with validation and caching +- **Better Error Handling**: Consistent patterns via centralized utilities +- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place +- **Improved Testability**: Modular components can be tested independently +- **Better Developer Experience**: Clear function signatures and comprehensive documentation + +### **Previous Improvements (v3.2)** - **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations - **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting - **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing -- **Reduced Code Duplication**: Eliminated duplicate code across modules through centralized utilities - -### **New Utility Modules** -- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation -- **`error_utils.py`**: Standardized error handling with structured exception hierarchy -- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing - -### **Benefits** -- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place -- **Better Error Handling**: Consistent error messages and better debugging context -- **Enhanced Testability**: Modular components can be tested independently -- **Reduced Complexity**: Single source of truth for common operations +- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set. 
+- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done. +- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach. +- **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list. +- **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video". +- **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly. +- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted. +- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels. +- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls. +- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads. 
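Reviewer sketch (not part of the patch): the fuzzy matching bullet above can be illustrated with the standard library's `difflib.SequenceMatcher`. This is an assumption for illustration only; the diff does not show which similarity algorithm `fuzzy_matcher.py` uses. The `NOISE` word list and the helper names are hypothetical, and the threshold of 85 mirrors `DEFAULT_FUZZY_THRESHOLD` from `downloader.py`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of filler words to ignore when comparing titles.
NOISE = re.compile(r"\b(karaoke|official video|version|instrumental)\b", re.IGNORECASE)


def clean(text: str) -> str:
    """Lowercase, drop filler words, and collapse punctuation to single spaces."""
    text = NOISE.sub(" ", text.lower())
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Similarity score on a 0-100 scale after cleaning both strings."""
    return 100 * SequenceMatcher(None, clean(a), clean(b)).ratio()


def best_match(entry: str, video_titles, threshold: float = 85):
    """Return the closest video title, or None if nothing reaches the threshold."""
    scored = [(similarity(entry, title), title) for title in video_titles]
    if not scored:
        return None
    score, title = max(scored)
    return title if score >= threshold else None
```

Stripping filler words before scoring is what lets a songlist entry such as "Adele - Hello" match a video titled "Adele - Hello (Karaoke Version)" without lowering the threshold.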
 ## 🐞 Troubleshooting
 - Ensure `yt-dlp.exe` is in the `downloader/` folder
diff --git a/karaoke_downloader/cli.py b/karaoke_downloader/cli.py
index 085ae65..76bcd6c 100644
--- a/karaoke_downloader/cli.py
+++ b/karaoke_downloader/cli.py
@@ -75,17 +75,7 @@ Examples:
         downloader.songlist_only = True # Enable songlist-only mode when focusing
         print(f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}")
     if args.resolution != '720p':
-        resolution_map = {
-            '480p': '480',
-            '720p': '720',
-            '1080p': '1080',
-            '1440p': '1440',
-            '2160p': '2160'
-        }
-        height = resolution_map[args.resolution]
-        downloader.config["download_settings"]["format"] = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
-        downloader.config["download_settings"]["preferred_resolution"] = args.resolution
-        print(f"🎬 Using resolution: {args.resolution}")
+        downloader.config_manager.update_resolution(args.resolution)
 
     # --- NEW: Reset channel CLI command ---
     if args.reset_channel:
diff --git a/karaoke_downloader/config_manager.py b/karaoke_downloader/config_manager.py
index 4920c92..5cbf8e3 100644
--- a/karaoke_downloader/config_manager.py
+++ b/karaoke_downloader/config_manager.py
@@ -1,77 +1,303 @@
 """
-Configuration management utilities.
-Handles loading and managing application configuration.
+Configuration management utilities for the karaoke downloader.
+Provides centralized configuration loading, validation, and management.
""" import json from pathlib import Path +from typing import Dict, Any, Optional, Union +from dataclasses import dataclass, field +from datetime import datetime -DATA_DIR = Path("data") +# Default configuration values +DEFAULT_CONFIG = { + "download_settings": { + "format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best", + "preferred_resolution": "720p", + "audio_format": "mp3", + "audio_quality": "0", + "subtitle_language": "en", + "subtitle_format": "srt", + "write_metadata": False, + "write_thumbnail": False, + "write_description": False, + "write_annotations": False, + "write_comments": False, + "write_subtitles": False, + "embed_metadata": False, + "add_metadata": False, + "continue_downloads": True, + "no_overwrites": True, + "ignore_errors": True, + "no_warnings": False + }, + "folder_structure": { + "downloads_dir": "downloads", + "logs_dir": "logs", + "tracking_file": "data/karaoke_tracking.json" + }, + "logging": { + "level": "INFO", + "format": "%(asctime)s - %(levelname)s - %(message)s", + "include_console": True, + "include_file": True + }, + "yt_dlp_path": "downloader/yt-dlp.exe" +} -def load_config(): - """Load configuration from data/config.json or return defaults.""" - config_file = DATA_DIR / "config.json" - if config_file.exists(): - try: - with open(config_file, 'r', encoding='utf-8') as f: - return json.load(f) - except (json.JSONDecodeError, FileNotFoundError) as e: - print(f"Warning: Could not load config.json: {e}") +# Resolution mapping for CLI arguments +RESOLUTION_MAP = { + '480p': '480', + '720p': '720', + '1080p': '1080', + '1440p': '1440', + '2160p': '2160' +} + +@dataclass +class DownloadSettings: + """Configuration for download settings.""" + format: str = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best" + outtmpl: str = "%(title)s_720p.%(ext)s" + merge_output_format: str = "mp4" + noplaylist: bool = True + postprocessors: list = None + preferred_resolution: str = "720p" + audio_format: str = "mp3" 
+ audio_quality: str = "0" + subtitle_language: str = "en" + subtitle_format: str = "srt" + write_metadata: bool = False + write_thumbnail: bool = False + write_description: bool = False + writedescription: bool = False + write_annotations: bool = False + writeannotations: bool = False + write_comments: bool = False + writecomments: bool = False + write_subtitles: bool = False + writesubtitles: bool = False + writeinfojson: bool = False + writethumbnail: bool = False + embed_metadata: bool = False + add_metadata: bool = False + continue_downloads: bool = True + continuedl: bool = True + no_overwrites: bool = True + nooverwrites: bool = True + ignore_errors: bool = True + ignoreerrors: bool = True + no_warnings: bool = False - return get_default_config() + def __post_init__(self): + """Initialize default values for complex fields.""" + if self.postprocessors is None: + self.postprocessors = [{ + "key": "FFmpegExtractAudio", + "preferredcodec": "mp3", + "preferredquality": "0" + }] -def get_default_config(): - """Get the default configuration.""" - return { - "download_settings": { - "format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best", - "preferred_resolution": "720p", - "audio_format": "mp3", - "audio_quality": "0", - "subtitle_language": "en", - "subtitle_format": "srt", - "write_metadata": False, - "write_thumbnail": False, - "write_description": False, - "write_annotations": False, - "write_comments": False, - "write_subtitles": False, - "embed_metadata": False, - "add_metadata": False, - "continue_downloads": True, - "no_overwrites": True, - "ignore_errors": True, - "no_warnings": False - }, - "folder_structure": { - "downloads_dir": "downloads", - "logs_dir": "logs", - "tracking_file": str(DATA_DIR / "karaoke_tracking.json") - }, - "logging": { - "level": "INFO", - "format": "%(asctime)s - %(levelname)s - %(message)s", - "include_console": True, - "include_file": True - }, - "yt_dlp_path": "downloader/yt-dlp.exe" - } +@dataclass +class 
FolderStructure: + """Configuration for folder structure.""" + downloads_dir: str = "downloads" + logs_dir: str = "logs" + tracking_file: str = "data/karaoke_tracking.json" -def save_config(config): - """Save configuration to data/config.json.""" - config_file = DATA_DIR / "config.json" - config_file.parent.mkdir(exist_ok=True) +@dataclass +class LoggingConfig: + """Configuration for logging.""" + level: str = "INFO" + format: str = "%(asctime)s - %(levelname)s - %(message)s" + include_console: bool = True + include_file: bool = True + +@dataclass +class AppConfig: + """Main application configuration.""" + download_settings: DownloadSettings = field(default_factory=DownloadSettings) + folder_structure: FolderStructure = field(default_factory=FolderStructure) + logging: LoggingConfig = field(default_factory=LoggingConfig) + yt_dlp_path: str = "downloader/yt-dlp.exe" + _config_file: Optional[Path] = None + _last_modified: Optional[datetime] = None + +class ConfigManager: + """ + Manages application configuration with loading, validation, and caching. + """ - try: - with open(config_file, 'w', encoding='utf-8') as f: - json.dump(config, f, indent=2, ensure_ascii=False) - return True - except Exception as e: - print(f"Error saving config: {e}") - return False + def __init__(self, config_file: Union[str, Path] = "data/config.json"): + """ + Initialize the configuration manager. + + Args: + config_file: Path to the configuration file + """ + self.config_file = Path(config_file) + self._config: Optional[AppConfig] = None + self._last_modified: Optional[datetime] = None + + def load_config(self, force_reload: bool = False) -> AppConfig: + """ + Load configuration from file with caching. 
+ + Args: + force_reload: Force reload even if file hasn't changed + + Returns: + AppConfig instance + """ + # Check if we need to reload + if not force_reload and self._config is not None: + if self.config_file.exists(): + current_mtime = datetime.fromtimestamp(self.config_file.stat().st_mtime) + if self._last_modified and current_mtime <= self._last_modified: + return self._config + + # Load configuration + config_data = self._load_config_file() + self._config = self._create_config_from_dict(config_data) + self._last_modified = datetime.now() + + return self._config + + def _load_config_file(self) -> Dict[str, Any]: + """ + Load configuration from file with fallback to defaults. + + Returns: + Configuration dictionary + """ + if self.config_file.exists(): + try: + with open(self.config_file, 'r', encoding='utf-8') as f: + file_config = json.load(f) + # Merge with defaults + return self._merge_configs(DEFAULT_CONFIG, file_config) + except (json.JSONDecodeError, FileNotFoundError) as e: + print(f"Warning: Could not load config.json: {e}") + print("Using default configuration.") + + return DEFAULT_CONFIG.copy() + + def _merge_configs(self, default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]: + """ + Merge user configuration with defaults. + + Args: + default: Default configuration + user: User configuration + + Returns: + Merged configuration + """ + merged = default.copy() + + for key, value in user.items(): + if key in merged and isinstance(merged[key], dict) and isinstance(value, dict): + merged[key] = self._merge_configs(merged[key], value) + else: + merged[key] = value + + return merged + + def _create_config_from_dict(self, config_data: Dict[str, Any]) -> AppConfig: + """ + Create AppConfig from dictionary. 
+ + Args: + config_data: Configuration dictionary + + Returns: + AppConfig instance + """ + download_settings = DownloadSettings(**config_data.get("download_settings", {})) + folder_structure = FolderStructure(**config_data.get("folder_structure", {})) + logging_config = LoggingConfig(**config_data.get("logging", {})) + + return AppConfig( + download_settings=download_settings, + folder_structure=folder_structure, + logging=logging_config, + yt_dlp_path=config_data.get("yt_dlp_path", "downloader/yt-dlp.exe"), + _config_file=self.config_file + ) + + def update_resolution(self, resolution: str) -> None: + """ + Update the download format based on resolution. + + Args: + resolution: Resolution string (e.g., "720p", "1080p") + """ + if self._config is None: + self.load_config() + + if resolution in RESOLUTION_MAP: + height = RESOLUTION_MAP[resolution] + format_str = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best" + self._config.download_settings.format = format_str + self._config.download_settings.preferred_resolution = resolution + print(f"🎬 Using resolution: {resolution}") + + def get_config(self) -> AppConfig: + """ + Get the current configuration. + + Returns: + AppConfig instance + """ + if self._config is None: + return self.load_config() + return self._config + + def save_config(self) -> None: + """ + Save current configuration to file. 
+ """ + if self._config is None: + return + + config_dict = { + "download_settings": self._config.download_settings.__dict__, + "folder_structure": self._config.folder_structure.__dict__, + "logging": self._config.logging.__dict__, + "yt_dlp_path": self._config.yt_dlp_path + } + + # Ensure directory exists + self.config_file.parent.mkdir(parents=True, exist_ok=True) + + with open(self.config_file, 'w', encoding='utf-8') as f: + json.dump(config_dict, f, indent=2, ensure_ascii=False) + + print(f"Configuration saved to {self.config_file}") -def update_config(updates): - """Update configuration with new values.""" - config = load_config() - config.update(updates) - return save_config(config) \ No newline at end of file +# Global configuration manager instance +_config_manager: Optional[ConfigManager] = None + +def get_config_manager() -> ConfigManager: + """ + Get the global configuration manager instance. + + Returns: + ConfigManager instance + """ + global _config_manager + if _config_manager is None: + _config_manager = ConfigManager() + return _config_manager + +def load_config(force_reload: bool = False) -> AppConfig: + """ + Load configuration using the global manager. 
+
+    Args:
+        force_reload: Force reload even if file hasn't changed
+
+    Returns:
+        AppConfig instance
+    """
+    return get_config_manager().load_config(force_reload)
\ No newline at end of file
diff --git a/karaoke_downloader/download_pipeline.py b/karaoke_downloader/download_pipeline.py
index 4df5545..a32ab01 100644
--- a/karaoke_downloader/download_pipeline.py
+++ b/karaoke_downloader/download_pipeline.py
@@ -96,11 +96,11 @@ class DownloadPipeline:
         )
 
         print(f"🔧 Running command: {' '.join(cmd)}")
-        print(f"📺 Resolution settings: {self.config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}")
-        print(f"🎬 Format string: {self.config.get('download_settings', {}).get('format', 'Unknown')}")
+        print(f"📺 Resolution settings: {self.config.download_settings.preferred_resolution}")
+        print(f"🎬 Format string: {self.config.download_settings.format}")
 
         # Debug: Show available formats (optional)
-        if self.config.get('debug_show_formats', False):
+        if hasattr(self.config, 'debug_show_formats') and self.config.debug_show_formats:
             show_available_formats(video_url, self.yt_dlp_path)
 
         try:
diff --git a/karaoke_downloader/downloader.py b/karaoke_downloader/downloader.py
index 947511a..86aa658 100644
--- a/karaoke_downloader/downloader.py
+++ b/karaoke_downloader/downloader.py
@@ -5,6 +5,7 @@ import json
 import re
 from pathlib import Path
 from datetime import datetime, timedelta
+from typing import Dict, Any, Optional, List, Tuple
 from karaoke_downloader.tracking_manager import TrackingManager, SongStatus, FormatType
 from karaoke_downloader.id3_utils import add_id3_tags, extract_artist_title
 from karaoke_downloader.songlist_manager import (
@@ -27,30 +28,47 @@ from karaoke_downloader.video_downloader import download_video_and_track, is_val
 from karaoke_downloader.channel_manager import reset_channel_downloads, download_from_file
 from karaoke_downloader.download_pipeline import DownloadPipeline
 from karaoke_downloader.error_utils import handle_yt_dlp_error, log_error
+from karaoke_downloader.song_validator import create_song_validator +from karaoke_downloader.config_manager import load_config, get_config_manager +from karaoke_downloader.file_utils import sanitize_filename, ensure_directory_exists # Constants DEFAULT_FUZZY_THRESHOLD = 85 DEFAULT_CACHE_EXPIRATION_DAYS = 1 -DEFAULT_FILENAME_LENGTH_LIMIT = 100 -DEFAULT_ARTIST_LENGTH_LIMIT = 30 -DEFAULT_TITLE_LENGTH_LIMIT = 60 DEFAULT_DISPLAY_LIMIT = 10 DATA_DIR = Path("data") class KaraokeDownloader: def __init__(self): - self.yt_dlp_path = Path("downloader/yt-dlp.exe") - self.downloads_dir = Path("downloads") - self.logs_dir = Path("logs") - self.downloads_dir.mkdir(exist_ok=True) - self.logs_dir.mkdir(exist_ok=True) - self.tracker = TrackingManager(tracking_file=DATA_DIR / "karaoke_tracking.json", cache_file=DATA_DIR / "channel_cache.json") - self.config = self._load_config() + # Load configuration + self.config_manager = get_config_manager() + self.config = self.config_manager.load_config() + + # Initialize paths + self.yt_dlp_path = Path(self.config.yt_dlp_path) + self.downloads_dir = Path(self.config.folder_structure.downloads_dir) + self.logs_dir = Path(self.config.folder_structure.logs_dir) + + # Ensure directories exist + ensure_directory_exists(self.downloads_dir) + ensure_directory_exists(self.logs_dir) + + # Initialize tracking + tracking_file = DATA_DIR / "karaoke_tracking.json" + cache_file = DATA_DIR / "channel_cache.json" + self.tracker = TrackingManager(tracking_file=tracking_file, cache_file=cache_file) + + # Initialize song validator + self.song_validator = create_song_validator(self.tracker, self.downloads_dir) + + # Load songlist tracking self.songlist_tracking_file = DATA_DIR / "songlist_tracking.json" self.songlist_tracking = load_songlist_tracking(str(self.songlist_tracking_file)) + # Load server songs for availability checking self.server_songs = load_server_songs() + # Songlist focus mode attributes self.songlist_focus_titles = None self.songlist_only = 
False @@ -58,114 +76,30 @@ class KaraokeDownloader: self.download_limit = None def _load_config(self): - config_file = DATA_DIR / "config.json" - if config_file.exists(): - try: - with open(config_file, 'r', encoding='utf-8') as f: - return json.load(f) - except (json.JSONDecodeError, FileNotFoundError) as e: - print(f"Warning: Could not load config.json: {e}") - return { - "download_settings": { - "format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best", - "preferred_resolution": "720p", - "audio_format": "mp3", - "audio_quality": "0", - "subtitle_language": "en", - "subtitle_format": "srt", - "write_metadata": False, - "write_thumbnail": False, - "write_description": False, - "write_annotations": False, - "write_comments": False, - "write_subtitles": False, - "embed_metadata": False, - "add_metadata": False, - "continue_downloads": True, - "no_overwrites": True, - "ignore_errors": True, - "no_warnings": False - }, - "folder_structure": { - "downloads_dir": "downloads", - "logs_dir": "logs", - "tracking_file": str(DATA_DIR / "karaoke_tracking.json") - }, - "logging": { - "level": "INFO", - "format": "%(asctime)s - %(levelname)s - %(message)s", - "include_console": True, - "include_file": True - }, - "yt_dlp_path": "downloader/yt-dlp.exe" - } + """Load configuration using the config manager.""" + return self.config_manager.load_config() def _should_skip_song(self, artist, title, channel_name, video_id, video_title, server_songs=None, server_duplicates_tracking=None): """ - Centralized method to check if a song should be skipped. - Performs four checks in order: - 1. Already downloaded (tracking) - 2. File exists on filesystem - 3. Already on server - 4. Previously failed download (bad file) + Check if a song should be skipped using the centralized SongValidator. 
Returns: tuple: (should_skip, reason, total_filtered) """ - total_filtered = 0 - - # Check 1: Already downloaded by this system - if self.tracker.is_song_downloaded(artist, title, channel_name, video_id): - return True, "already downloaded", total_filtered - - # Check 2: File already exists on filesystem - # Generate the expected filename based on the download mode context - safe_title = title - invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\'] - for char in invalid_chars: - safe_title = safe_title.replace(char, "") - safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip() - - # Try different filename patterns that might exist - possible_filenames = [ - f"{artist} - {safe_title}.mp4", # Songlist mode - f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode - f"{artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode - ] - - for filename in possible_filenames: - if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT: - # Apply length limits if needed - safe_artist = artist.replace("'", "").replace('"', "").strip() - filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4" - - output_path = self.downloads_dir / channel_name / filename - if output_path.exists() and output_path.stat().st_size > 0: - return True, "file exists", total_filtered - - # Check 3: Already on server (if server data provided) - if server_songs is not None and server_duplicates_tracking is not None: - from karaoke_downloader.server_manager import check_and_mark_server_duplicate - if check_and_mark_server_duplicate(server_songs, server_duplicates_tracking, artist, title, video_title, channel_name): - total_filtered += 1 - return True, "on server", total_filtered - - # Check 4: Previously failed download (bad file) - if self.tracker.is_song_failed(artist, title, channel_name, video_id): - return True, "previously failed", total_filtered - - return False, None, total_filtered + return 
self.song_validator.should_skip_song( + artist, title, channel_name, video_id, video_title, + server_songs, server_duplicates_tracking + ) def _mark_song_failed(self, artist, title, video_id, channel_name, error_message): """ - Centralized method to mark a song as failed in tracking. + Mark a song as failed in tracking using the SongValidator. """ - self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message) - print(f"🏷️ Marked song as failed: {artist} - {title}") + self.song_validator.mark_song_failed(artist, title, video_id, channel_name, error_message) def _handle_download_failure(self, artist, title, video_id, channel_name, error_type, error_details=""): """ - Centralized method to handle download failures. + Handle download failures using the SongValidator. Args: artist: Song artist @@ -175,10 +109,7 @@ class KaraokeDownloader: error_type: Type of error (e.g., "yt-dlp failed", "file verification failed") error_details: Additional error details """ - error_msg = f"{error_type}" - if error_details: - error_msg += f": {error_details}" - self._mark_song_failed(artist, title, video_id, channel_name, error_msg) + self.song_validator.handle_download_failure(artist, title, video_id, channel_name, error_type, error_details) def download_channel_videos(self, url, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD): """Download videos from a channel or playlist URL, respecting songlist-only and limit flags. 
Supports fuzzy matching.""" @@ -193,7 +124,7 @@ class KaraokeDownloader: server_songs = load_server_songs() server_duplicates_tracking = load_server_duplicates_tracking() - limit = self.config.get('limit', 1) + limit = getattr(self.config, 'limit', 1) cmd = [ str(self.yt_dlp_path), '--flat-playlist', diff --git a/karaoke_downloader/file_utils.py b/karaoke_downloader/file_utils.py new file mode 100644 index 0000000..c9f29b3 --- /dev/null +++ b/karaoke_downloader/file_utils.py @@ -0,0 +1,182 @@ +""" +File utilities for filename sanitization, path operations, and file validation. +Centralizes common file operations to eliminate code duplication. +""" + +import re +from pathlib import Path +from typing import List, Optional, Tuple + +# Constants for filename operations +DEFAULT_FILENAME_LENGTH_LIMIT = 100 +DEFAULT_ARTIST_LENGTH_LIMIT = 30 +DEFAULT_TITLE_LENGTH_LIMIT = 60 + +# Windows invalid characters +INVALID_FILENAME_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\'] + +def sanitize_filename(artist: str, title: str, max_length: int = DEFAULT_FILENAME_LENGTH_LIMIT) -> str: + """ + Create a safe filename from artist and title. 
+ + Args: + artist: Song artist name + title: Song title + max_length: Maximum filename length (default: 100) + + Returns: + Sanitized filename string + """ + # Clean up title + safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "") + safe_title = safe_title.replace("'", "").replace('"', "") + + # Clean up artist + safe_artist = artist.replace("'", "").replace('"', "").strip() + + # Remove invalid characters + for char in INVALID_FILENAME_CHARS: + safe_title = safe_title.replace(char, "") + safe_artist = safe_artist.replace(char, "") + + # Remove problematic patterns + safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip() + safe_artist = safe_artist.strip() + + # Create filename + filename = f"{safe_artist} - {safe_title}.mp4" + + # Limit filename length if needed + if len(filename) > max_length: + filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4" + + return filename + +def generate_possible_filenames(artist: str, title: str, channel_name: str) -> List[str]: + """ + Generate possible filename patterns for different download modes. + + Args: + artist: Song artist name + title: Song title + channel_name: Channel name + + Returns: + List of possible filename patterns + """ + safe_title = sanitize_title_for_filenames(title) + safe_artist = artist.replace("'", "").replace('"', "").strip() + + return [ + f"{safe_artist} - {safe_title}.mp4", # Songlist mode + f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode + f"{safe_artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode + ] + +def sanitize_title_for_filenames(title: str) -> str: + """ + Sanitize title specifically for filename generation. 
+ + Args: + title: Song title + + Returns: + Sanitized title string + """ + safe_title = title + for char in INVALID_FILENAME_CHARS: + safe_title = safe_title.replace(char, "") + safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip() + return safe_title + +def check_file_exists_with_patterns( + downloads_dir: Path, + channel_name: str, + artist: str, + title: str +) -> Tuple[bool, Optional[Path]]: + """ + Check if a file exists using multiple possible filename patterns. + + Args: + downloads_dir: Base downloads directory + channel_name: Channel name + artist: Song artist + title: Song title + + Returns: + Tuple of (exists, file_path) where file_path is None if not found + """ + possible_filenames = generate_possible_filenames(artist, title, channel_name) + channel_dir = downloads_dir / channel_name + + for filename in possible_filenames: + if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT: + # Apply length limits if needed + safe_artist = artist.replace("'", "").replace('"', "").strip() + safe_title = sanitize_title_for_filenames(title) + filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4" + + file_path = channel_dir / filename + if file_path.exists() and file_path.stat().st_size > 0: + return True, file_path + + return False, None + +def ensure_directory_exists(directory: Path) -> None: + """ + Ensure a directory exists, creating it if necessary. + + Args: + directory: Directory path to ensure exists + """ + directory.mkdir(parents=True, exist_ok=True) + +def is_valid_mp4_file(file_path: Path) -> bool: + """ + Check if a file is a valid MP4 file. 
+ + Args: + file_path: Path to the file to check + + Returns: + True if file is a valid MP4, False otherwise + """ + if not file_path.exists(): + return False + + # Check file size + if file_path.stat().st_size == 0: + return False + + # Check file extension + if file_path.suffix.lower() != '.mp4': + return False + + # Basic MP4 header check (bytes 4-8 should be 'ftyp') + try: + with open(file_path, 'rb') as f: + header = f.read(8) + if len(header) >= 8 and header[4:8] == b'ftyp': + return True + except (IOError, OSError): + pass + + return False + +def cleanup_temp_files(file_path: Path) -> None: + """ + Clean up temporary files created by yt-dlp. + + Args: + file_path: Base file path (without extension) + """ + temp_extensions = ['.info.json', '.meta', '.webp', '.jpg', '.png'] + + for ext in temp_extensions: + temp_file = file_path.with_name(file_path.name + ext) + if temp_file.exists(): + try: + temp_file.unlink() + except (IOError, OSError): + pass # Ignore cleanup errors \ No newline at end of file diff --git a/karaoke_downloader/song_validator.py b/karaoke_downloader/song_validator.py new file mode 100644 index 0000000..ce69292 --- /dev/null +++ b/karaoke_downloader/song_validator.py @@ -0,0 +1,144 @@ +""" +Song validation utilities for checking if songs should be downloaded. +Centralizes song validation logic to eliminate code duplication. +""" + +from pathlib import Path +from typing import Dict, Any, Optional, Tuple, List +from karaoke_downloader.file_utils import check_file_exists_with_patterns +from karaoke_downloader.tracking_manager import TrackingManager + +class SongValidator: + """ + Centralized song validation logic for checking if songs should be downloaded. + """ + + def __init__(self, tracker: TrackingManager, downloads_dir: Path): + """ + Initialize the song validator.
+ + Args: + tracker: Tracking manager instance + downloads_dir: Base downloads directory + """ + self.tracker = tracker + self.downloads_dir = downloads_dir + + def should_skip_song( + self, + artist: str, + title: str, + channel_name: str, + video_id: Optional[str] = None, + video_title: Optional[str] = None, + server_songs: Optional[Dict[str, Any]] = None, + server_duplicates_tracking: Optional[Dict[str, Any]] = None + ) -> Tuple[bool, Optional[str], int]: + """ + Check if a song should be skipped based on multiple criteria. + + Performs checks in order: + 1. Already downloaded (tracking) + 2. File exists on filesystem + 3. Already on server + 4. Previously failed download (bad file) + + Args: + artist: Song artist name + title: Song title + channel_name: Channel name + video_id: YouTube video ID (optional) + video_title: YouTube video title (optional) + server_songs: Server songs data (optional) + server_duplicates_tracking: Server duplicates tracking (optional) + + Returns: + Tuple of (should_skip, reason, total_filtered) + """ + total_filtered = 0 + + # Check 1: Already downloaded by this system + if self.tracker.is_song_downloaded(artist, title, channel_name, video_id): + return True, "already downloaded", total_filtered + + # Check 2: File already exists on filesystem + file_exists, _ = check_file_exists_with_patterns( + self.downloads_dir, channel_name, artist, title + ) + if file_exists: + return True, "file exists", total_filtered + + # Check 3: Already on server (if server data provided) + if server_songs is not None and server_duplicates_tracking is not None: + from karaoke_downloader.server_manager import check_and_mark_server_duplicate + if check_and_mark_server_duplicate( + server_songs, server_duplicates_tracking, + artist, title, video_title, channel_name + ): + total_filtered += 1 + return True, "on server", total_filtered + + # Check 4: Previously failed download (bad file) + if self.tracker.is_song_failed(artist, title, channel_name, video_id): 
+ return True, "previously failed", total_filtered + + return False, None, total_filtered + + def mark_song_failed( + self, + artist: str, + title: str, + video_id: Optional[str], + channel_name: str, + error_message: str + ) -> None: + """ + Mark a song as failed in tracking. + + Args: + artist: Song artist name + title: Song title + video_id: YouTube video ID (optional) + channel_name: Channel name + error_message: Error message to record + """ + self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message) + print(f"🏷️ Marked song as failed: {artist} - {title}") + + def handle_download_failure( + self, + artist: str, + title: str, + video_id: Optional[str], + channel_name: str, + error_type: str, + error_details: str = "" + ) -> None: + """ + Handle download failures with consistent error formatting. + + Args: + artist: Song artist name + title: Song title + video_id: YouTube video ID (optional) + channel_name: Channel name + error_type: Type of error (e.g., "yt-dlp failed", "file verification failed") + error_details: Additional error details + """ + error_msg = f"{error_type}" + if error_details: + error_msg += f": {error_details}" + self.mark_song_failed(artist, title, video_id, channel_name, error_msg) + +def create_song_validator(tracker: TrackingManager, downloads_dir: Path) -> SongValidator: + """ + Factory function to create a song validator instance. + + Args: + tracker: Tracking manager instance + downloads_dir: Base downloads directory + + Returns: + SongValidator instance + """ + return SongValidator(tracker, downloads_dir) \ No newline at end of file diff --git a/karaoke_downloader/video_downloader.py b/karaoke_downloader/video_downloader.py index 6a7fa56..050820c 100644 --- a/karaoke_downloader/video_downloader.py +++ b/karaoke_downloader/video_downloader.py @@ -5,70 +5,29 @@ Handles the actual downloading and post-processing of videos. 
import subprocess from pathlib import Path +from typing import Dict, Any, Optional, Tuple from karaoke_downloader.id3_utils import add_id3_tags from karaoke_downloader.songlist_manager import mark_songlist_song_downloaded from karaoke_downloader.download_planner import save_plan_cache from karaoke_downloader.youtube_utils import build_yt_dlp_command, execute_yt_dlp_command, show_available_formats from karaoke_downloader.error_utils import handle_yt_dlp_error, handle_file_validation_error, log_error +from karaoke_downloader.file_utils import sanitize_filename, is_valid_mp4_file, cleanup_temp_files, ensure_directory_exists # Constants -DEFAULT_FILENAME_LENGTH_LIMIT = 100 -DEFAULT_ARTIST_LENGTH_LIMIT = 30 -DEFAULT_TITLE_LENGTH_LIMIT = 60 DEFAULT_FORMAT_CHECK_TIMEOUT = 30 -def sanitize_filename(artist, title): - """ - Create a safe filename from artist and title. - Removes invalid characters and limits length. - """ - # Create a shorter, safer filename - safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "").replace("'", "").replace('"', "") - safe_artist = artist.replace("'", "").replace('"', "") - - # Remove all Windows-invalid characters - invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\'] - for char in invalid_chars: - safe_title = safe_title.replace(char, "") - safe_artist = safe_artist.replace(char, "") - - # Also remove any other potentially problematic characters - safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip() - safe_artist = safe_artist.strip() - - filename = f"{safe_artist} - {safe_title}.mp4" - - # Limit filename length to avoid Windows path issues - if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT: - filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4" - - return filename - -def is_valid_mp4(file_path): +def is_valid_mp4(file_path: Path) -> bool: """ Check if a file is a valid MP4 file. 
Uses ffprobe if available, otherwise checks file extension and size. + + Args: + file_path: Path to the file to check + + Returns: + True if file is a valid MP4, False otherwise """ - if not file_path.exists(): - return False - - # Check file size - if file_path.stat().st_size == 0: - return False - - # Try to use ffprobe for validation - try: - import subprocess - result = subprocess.run( - ['ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', str(file_path)], - capture_output=True, - text=True, - check=True - ) - return True - except (subprocess.CalledProcessError, FileNotFoundError): - # If ffprobe is not available, just check the extension and size - return file_path.suffix.lower() == '.mp4' and file_path.stat().st_size > 0 + return is_valid_mp4_file(file_path) def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracking, channel_name, channel_url, video_id, video_title, @@ -83,10 +42,33 @@ def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracki artist, title, channel_name, songlist_tracking ) -def download_single_video(output_path, video_id, config, yt_dlp_path, - artist, title, channel_name, songlist_tracking): - """Download a single video and handle post-processing.""" - output_path.parent.mkdir(parents=True, exist_ok=True) +def download_single_video( + output_path: Path, + video_id: str, + config: Any, + yt_dlp_path: str, + artist: str, + title: str, + channel_name: str, + songlist_tracking: Dict[str, Any] +) -> bool: + """ + Download a single video and handle post-processing.
+ + Args: + output_path: Output file path + video_id: YouTube video ID + config: Configuration object (attribute access, e.g. config.download_settings.format) + yt_dlp_path: Path to yt-dlp executable + artist: Song artist name + title: Song title + channel_name: Channel name + songlist_tracking: Songlist tracking data + + Returns: + True if successful, False otherwise + """ + ensure_directory_exists(output_path.parent) print(f"⬇️ Downloading: {artist} - {title} -> {output_path}") video_url = f"https://www.youtube.com/watch?v={video_id}" @@ -95,11 +77,11 @@ def download_single_video(output_path, video_id, config, yt_dlp_path, cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config) print(f"🔧 Running command: {' '.join(cmd)}") - print(f"📺 Resolution settings: {config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}") - print(f"🎬 Format string: {config.get('download_settings', {}).get('format', 'Unknown')}") + print(f"📺 Resolution settings: {config.download_settings.preferred_resolution}") + print(f"🎬 Format string: {config.download_settings.format}") # Debug: Show available formats (optional) - if config.get('debug_show_formats', False): + if hasattr(config, 'debug_show_formats') and config.debug_show_formats: show_available_formats(video_url, yt_dlp_path) try: @@ -121,6 +103,9 @@ def download_single_video(output_path, video_id, config, yt_dlp_path, add_id3_tags(output_path, f"{artist} - {title} (Karaoke Version)", channel_name) mark_songlist_song_downloaded(songlist_tracking, artist, title, channel_name, output_path) + # Clean up temporary files + cleanup_temp_files(output_path.with_suffix('')) + print(f"✅ Downloaded and tracked: {artist} - {title}") print(f"🎉 All post-processing complete for: {output_path}") @@ -255,58 +240,5 @@ def cleanup_cache(cache_file): except Exception as e: print(f"⚠️ Could not delete download plan cache: {e}") -def should_skip_song_standalone(artist, title, channel_name, video_id, video_title, downloads_dir, tracker=None, server_songs=None,
server_duplicates_tracking=None): - """ - Standalone function to check if a song should be skipped. - Performs four checks in order: - 1. Already downloaded (tracking) - if tracker provided - 2. File exists on filesystem - 3. Already on server - if server data provided - 4. Previously failed download (bad file) - if tracker provided - - Returns: - tuple: (should_skip, reason, total_filtered) - """ - total_filtered = 0 - - # Check 1: Already downloaded by this system (if tracker provided) - if tracker and tracker.is_song_downloaded(artist, title, channel_name, video_id): - return True, "already downloaded", total_filtered - - # Check 2: File already exists on filesystem - # Generate the expected filename based on the download mode context - safe_title = title - invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\'] - for char in invalid_chars: - safe_title = safe_title.replace(char, "") - safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip() - - # Try different filename patterns that might exist - possible_filenames = [ - f"{artist} - {safe_title}.mp4", # Songlist mode - f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode - f"{artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode - ] - - for filename in possible_filenames: - if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT: - # Apply length limits if needed - safe_artist = artist.replace("'", "").replace('"', "").strip() - filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4" - - output_path = downloads_dir / channel_name / filename - if output_path.exists() and output_path.stat().st_size > 0: - return True, "file exists", total_filtered - - # Check 3: Already on server (if server data provided) - if server_songs is not None and server_duplicates_tracking is not None: - from karaoke_downloader.server_manager import check_and_mark_server_duplicate - if check_and_mark_server_duplicate(server_songs, 
server_duplicates_tracking, artist, title, video_title, channel_name): - total_filtered += 1 - return True, "on server", total_filtered - - # Check 4: Previously failed download (bad file) - if tracker provided - if tracker and tracker.is_song_failed(artist, title, channel_name, video_id): - return True, "previously failed", total_filtered - - return False, None, total_filtered \ No newline at end of file +# Note: should_skip_song_standalone function has been removed and replaced with SongValidator class +# Use karaoke_downloader.song_validator.create_song_validator() instead \ No newline at end of file diff --git a/karaoke_downloader/youtube_utils.py b/karaoke_downloader/youtube_utils.py index 5ada9f2..fcb4f64 100644 --- a/karaoke_downloader/youtube_utils.py +++ b/karaoke_downloader/youtube_utils.py @@ -78,7 +78,7 @@ def build_yt_dlp_command( "--ignore-errors", "--no-warnings", "-o", str(output_path), - "-f", config.get("download_settings", {}).get("format", "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"), + "-f", config.download_settings.format, video_url ]
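
Review note on `cleanup_temp_files`: the function receives a base path without its extension and probes for sidecar files next to it. `sanitize_filename` strips `.` from the title but not from the artist, so the base name can still contain dots. The sketch below shows how candidate sidecar paths could be built in that case. It is a minimal illustration under stated assumptions, not the project's implementation: `sidecar_candidates` and the example path are hypothetical names, and only the extension list is taken from the diff.

```python
from pathlib import Path
from typing import List, Sequence

# Sidecar extensions as listed in cleanup_temp_files in this diff.
TEMP_EXTENSIONS = ('.info.json', '.meta', '.webp', '.jpg', '.png')


def sidecar_candidates(
    base_path: Path,
    extensions: Sequence[str] = TEMP_EXTENSIONS,
) -> List[Path]:
    """Hypothetical helper: append each extension to the full base name.

    Appending keeps dots that are part of the name itself, whereas
    Path.with_suffix() replaces everything after the last dot.
    """
    return [base_path.with_name(base_path.name + ext) for ext in extensions]


if __name__ == "__main__":
    # Hypothetical base path whose artist part contains dots.
    base = Path("downloads") / "SomeChannel" / "P.O.D. - Youth of the Nation"

    # with_suffix() treats ". - Youth of the Nation" as the suffix and drops it.
    print(base.with_suffix(".webp").name)    # P.O.D.webp

    # Appending to the name preserves the whole base name.
    print(sidecar_candidates(base)[2].name)  # P.O.D. - Youth of the Nation.webp
```

For base names without dots both approaches give the same result, so the difference only shows up for artists such as the one in the example.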