Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>
parent a135efa13a
commit c5a3838e82
74
PRD.md
@@ -1,5 +1,5 @@
 
-# 🎤 Karaoke Video Downloader – PRD (v3.2)
+# 🎤 Karaoke Video Downloader – PRD (v3.3)
 
 ## ✅ Overview
 A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
@@ -20,7 +20,7 @@ The codebase has been refactored into focused modules with centralized utilities
 - **`server_manager.py`**: Server song availability checking
 - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions
 
-### New Utility Modules (v3.2):
+### Utility Modules (v3.2):
 - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
 - **`error_utils.py`**: Standardized error handling and formatting
 - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
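The download → verify → tag → track pipeline named in the hunk above is not shown in this commit. A minimal sketch of how such a staged pipeline could be structured follows; `PipelineResult`, `run_pipeline`, and the step callables are hypothetical names chosen for illustration, not the contents of `download_pipeline.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


@dataclass
class PipelineResult:
    """Outcome of one pipeline run (hypothetical structure)."""
    ok: bool = True
    failed_step: str = ""
    completed: List[str] = field(default_factory=list)


def run_pipeline(steps: List[Step]) -> PipelineResult:
    """Run steps in order and stop at the first one that fails or raises."""
    result = PipelineResult()
    for name, action in steps:
        try:
            succeeded = action()
        except Exception:
            # Treat any exception as a failed step so the caller can track it.
            succeeded = False
        if not succeeded:
            result.ok = False
            result.failed_step = name
            return result
        result.completed.append(name)
    return result
```

Stopping at the first failed step means a file that fails verification is never tagged or recorded as downloaded, which is one way to get the consistent processing the document describes.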
@@ -29,15 +29,20 @@ The codebase has been refactored into focused modules with centralized utilities
 - **`resolution_cli.py`**: Resolution checking utilities
 - **`tracking_cli.py`**: Tracking management CLI
 
+### New Utility Modules (v3.3):
+- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
+- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded
+
 ### Benefits of Enhanced Modular Architecture:
 - **Single Responsibility**: Each module has a focused purpose
-- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized
-- **Reduced Duplication**: Eliminated code duplication across modules
+- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
+- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
 - **Testability**: Individual components can be tested separately
 - **Maintainability**: Easier to find and fix issues
 - **Reusability**: Components can be used independently
 - **Robustness**: Better error handling and interruption recovery
 - **Consistency**: Standardized error messages and processing pipelines
+- **Type Safety**: Comprehensive type hints across all new modules
 
 ---
 
@@ -95,6 +100,7 @@ python download_karaoke.py --clear-cache SingKingKaraoke
 - ✅ Configurable download resolution and yt-dlp options (`data/config.json`)
 - ✅ Songlist integration: prioritize and track custom songlists
 - ✅ Songlist-only mode: download only songs from the songlist
+- ✅ Songlist focus mode: download only songs from specific playlists by title
 - ✅ Global songlist tracking to avoid duplicates across channels
 - ✅ ID3 tagging for artist/title in MP4 files (mutagen)
 - ✅ Real-time progress and detailed logging
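Global tracking that avoids duplicates across channels comes down to reducing every candidate to a comparable key. The sketch below assumes a key built from a lowercased artist and a title with bracketed suffixes removed; the exact normalization the tool applies is not shown in this commit, so `song_key` and `dedupe` are illustrative names only.

```python
import re
from typing import Iterable, List, Set, Tuple


def song_key(artist: str, title: str) -> str:
    """Build a comparison key from artist and title (assumed normalization)."""
    def normalize(text: str) -> str:
        text = text.lower()
        text = re.sub(r"\(.*?\)|\[.*?\]", " ", text)  # drop "(Karaoke Version)" etc.
        text = re.sub(r"[^a-z0-9]+", " ", text)       # collapse punctuation
        return " ".join(text.split())
    return f"{normalize(artist)}|{normalize(title)}"


def dedupe(candidates: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Keep the first (channel, artist, title) seen for each song key."""
    seen: Set[str] = set()
    unique: List[Tuple[str, str, str]] = []
    for channel, artist, title in candidates:
        key = song_key(artist, title)
        if key in seen:
            continue
        seen.add(key)
        unique.append((channel, artist, title))
    return unique
```

Because the first occurrence wins, channel order acts as the priority order in this sketch.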
@@ -113,6 +119,9 @@ python download_karaoke.py --clear-cache SingKingKaraoke
 - ✅ **Enhanced error handling**: Structured exception hierarchy with consistent error messages and formatting
 - ✅ **Abstracted download pipeline**: Reusable download → verify → tag → track process for consistent processing
 - ✅ **Reduced code duplication**: Eliminated duplicate code across modules through centralized utilities
+- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
+- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
+- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
 
 ---
 
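Structured configuration with dataclasses and validation, as listed above, usually involves two steps: merging the user's JSON over the defaults, then building typed objects from the merged dictionary. The sketch below is one possible arrangement under stated assumptions: `DownloadSettings` here is a trimmed stand-in with only two fields, and `deep_merge` and `settings_from_dict` are hypothetical helpers rather than the functions in `config_manager.py`.

```python
import copy
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class DownloadSettings:
    """Trimmed stand-in for the real settings dataclass."""
    format: str = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"
    preferred_resolution: str = "720p"


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with overrides, without mutating either input."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def settings_from_dict(data: Dict[str, Any]) -> DownloadSettings:
    """Build DownloadSettings, warning about keys the dataclass does not define."""
    known = {f.name for f in fields(DownloadSettings)}
    unknown = sorted(set(data) - known)
    if unknown:
        print(f"Warning: ignoring unknown download_settings keys: {unknown}")
    return DownloadSettings(**{k: v for k, v in data.items() if k in known})
```

Deep-copying the defaults keeps nested dictionaries from being shared between the defaults and the loaded configuration, and filtering unknown keys avoids a `TypeError` when a config file contains an option the dataclass does not declare.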
@@ -134,7 +143,9 @@ KaroakeVideoDownloader/
 │   ├── error_utils.py # Standardized error handling and formatting
 │   ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline
 │   ├── id3_utils.py # ID3 tagging utilities
-│   ├── config_manager.py # Configuration management
+│   ├── config_manager.py # Configuration management with dataclasses
+│   ├── file_utils.py # Centralized file operations and filename handling
+│   ├── song_validator.py # Centralized song validation logic
 │   ├── check_resolution.py # Resolution checker utility
 │   ├── resolution_cli.py # Resolution config CLI
 │   └── tracking_cli.py # Tracking management CLI
@@ -164,6 +175,7 @@ KaroakeVideoDownloader/
 - `--file <data/channels.txt>`: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
 - `--songlist-priority`: Prioritize songlist songs in download queue
 - `--songlist-only`: Download only songs from the songlist
+- `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
 - `--songlist-status`: Show songlist download progress
 - `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
 - `--resolution <720p|1080p|...>`: Override resolution
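A flag that accepts several quoted playlist titles, like `--songlist-focus` above, maps naturally onto `argparse` with `nargs="+"`. The sketch below is an assumption about how the option could be declared, not the parser from `download_karaoke.py`; the only behavior taken from this commit is that focus mode also switches on songlist-only mode.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("--songlist-only", action="store_true")
    # nargs="+" collects one or more playlist titles into a list.
    parser.add_argument("--songlist-focus", nargs="+", metavar="PLAYLIST_TITLE")
    parser.add_argument("--limit", type=int)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.songlist_focus:
        # Focusing on playlists implies downloading only songlist songs.
        args.songlist_only = True
    return args
```

With this declaration each quoted title arrives as a single list element, so titles containing spaces or dashes need no extra parsing.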
@@ -186,20 +198,45 @@ KaroakeVideoDownloader/
 - **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
 - **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
 
-## 🔧 Refactoring Improvements (v3.2)
-The codebase has been comprehensively refactored to improve maintainability and reduce code duplication:
+## 🔧 Refactoring Improvements (v3.3)
+The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
 
-### **Centralized Utilities**
-- **`youtube_utils.py`**: Centralized yt-dlp command generation and YouTube operations
-- **`error_utils.py`**: Standardized error handling with structured exception hierarchy
-- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing
+### **New Utility Modules (v3.3)**
+- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
+  - `sanitize_filename()`: Create safe filenames from artist/title
+  - `generate_possible_filenames()`: Generate filename patterns for different modes
+  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
+  - `is_valid_mp4_file()`: Validate MP4 files with header checking
+  - `cleanup_temp_files()`: Remove temporary yt-dlp files
+  - `ensure_directory_exists()`: Safe directory creation
+
+- **`song_validator.py`**: Centralized song validation logic
+  - `SongValidator` class: Unified logic for checking if songs should be downloaded
+  - `should_skip_song()`: Comprehensive validation with multiple criteria
+  - `mark_song_failed()`: Consistent failure tracking
+  - `handle_download_failure()`: Standardized error handling
+
+- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
+  - `ConfigManager` class: Type-safe configuration loading and caching
+  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
+  - Configuration validation and merging with defaults
+  - Dynamic resolution updates
 
 ### **Benefits Achieved**
-- **Reduced Duplication**: Eliminated ~50 lines of duplicated yt-dlp command generation
-- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place
-- **Enhanced Error Handling**: Consistent error messages and better debugging context
-- **Better Code Organization**: Clear separation of concerns and logical module structure
-- **Increased Testability**: Modular components can be tested independently
+- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
+- **Centralized File Operations**: Single source of truth for filename handling and file validation
+- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
+- **Enhanced Type Safety**: Comprehensive type hints across all new modules
+- **Improved Configuration Management**: Structured configuration with validation and caching
+- **Better Error Handling**: Consistent patterns via centralized utilities
+- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
+- **Improved Testability**: Modular components can be tested independently
+- **Better Developer Experience**: Clear function signatures and comprehensive documentation
+
+### **Previous Improvements (v3.2)**
+- **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
+- **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
+- **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
 - **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
 - **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
 - **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
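The `file_utils.py` functions are listed above by name only; their bodies are not part of the visible diff. As a rough illustration of what filename sanitization and MP4 header checking can look like, here is a sketch with assumed signatures and defaults (the real `sanitize_filename()` and `is_valid_mp4_file()` may take different parameters). The header check relies on the `ftyp` box tag that normally sits at byte offset 4 of an MP4 file.

```python
import re
from pathlib import Path

# Characters Windows does not allow in filenames, plus control characters.
INVALID_CHARS = r'[<>:"/\\|?*\x00-\x1f]'


def sanitize_filename(artist: str, title: str, max_length: int = 100) -> str:
    """Build "Artist - Title.mp4" with unsafe characters removed (assumed format)."""
    name = f"{artist} - {title}"
    name = re.sub(INVALID_CHARS, "", name)
    name = " ".join(name.split()).rstrip(" .")
    return name[:max_length].rstrip(" .") + ".mp4"


def is_valid_mp4_file(path: Path, min_size: int = 1024) -> bool:
    """Check that the file exists, is not tiny, and carries an 'ftyp' box tag."""
    if not path.is_file() or path.stat().st_size < min_size:
        return False
    with path.open("rb") as handle:
        header = handle.read(12)
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

A header check like this catches truncated or HTML-error downloads cheaply, but it does not prove the video is playable.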
@@ -208,6 +245,8 @@ The codebase has been comprehensively refactored to improve maintainability and
 - **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
 - **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
 - **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
+- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
+- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
 
 ---
 
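The combination of pre-processed lookups and early termination described above can be illustrated with a dictionary keyed by normalized song names. This sketch handles exact matches only and assumes video titles follow an "Artist - Title" pattern; the tool's own scanner also applies fuzzy matching, which is left out here. `build_lookup` and `scan_channels` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups are insensitive to both."""
    return " ".join(text.lower().split())


def build_lookup(songlist: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Pre-process the songlist once into a dict for constant-time lookups."""
    return {normalize(f"{artist} - {title}"): (artist, title) for artist, title in songlist}


def scan_channels(
    channels: Dict[str, List[str]],
    songlist: Iterable[Tuple[str, str]],
    limit: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (channel, video_title) matches, stopping once limit is reached."""
    lookup = build_lookup(songlist)
    matches: List[Tuple[str, str]] = []
    for channel, video_titles in channels.items():
        for video_title in video_titles:
            # pop() removes the song so it cannot match again in a later channel.
            if lookup.pop(normalize(video_title), None) is None:
                continue
            matches.append((channel, video_title))
            if limit is not None and len(matches) >= limit:
                return matches  # early exit: no need to scan further
    return matches
```

With the dictionary in place, each video title costs one lookup instead of a pass over the whole songlist, and a small `limit` ends the scan after the first few matches.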
@@ -219,3 +258,6 @@ The codebase has been comprehensively refactored to improve maintainability and
 - [ ] Parallel downloads for improved speed
 - [ ] Unit tests for all modules
 - [ ] Integration tests for end-to-end workflows
+- [ ] Plugin system for custom file operations
+- [ ] Advanced configuration UI
+- [ ] Real-time download progress visualization
92
README.md
@@ -35,7 +35,7 @@ The codebase has been comprehensively refactored into a modular architecture wit
 - **`server_manager.py`**: Server song availability checking
 - **`fuzzy_matcher.py`**: Fuzzy matching logic and similarity functions
 
-### Utility Modules:
+### Utility Modules (v3.2):
 - **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
 - **`error_utils.py`**: Standardized error handling and formatting
 - **`download_pipeline.py`**: Abstracted download → verify → tag → track pipeline
@@ -44,12 +44,34 @@ The codebase has been comprehensively refactored into a modular architecture wit
 - **`resolution_cli.py`**: Resolution checking utilities
 - **`tracking_cli.py`**: Tracking management CLI
 
+### New Utility Modules (v3.3):
+- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
+  - `sanitize_filename()`: Create safe filenames from artist/title
+  - `generate_possible_filenames()`: Generate filename patterns for different modes
+  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
+  - `is_valid_mp4_file()`: Validate MP4 files with header checking
+  - `cleanup_temp_files()`: Remove temporary yt-dlp files
+  - `ensure_directory_exists()`: Safe directory creation
+
+- **`song_validator.py`**: Centralized song validation logic
+  - `SongValidator` class: Unified logic for checking if songs should be downloaded
+  - `should_skip_song()`: Comprehensive validation with multiple criteria
+  - `mark_song_failed()`: Consistent failure tracking
+  - `handle_download_failure()`: Standardized error handling
+
+- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
+  - `ConfigManager` class: Type-safe configuration loading and caching
+  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
+  - Configuration validation and merging with defaults
+  - Dynamic resolution updates
+
 ### Benefits:
-- **Centralized Utilities**: Common operations (yt-dlp commands, error handling) are centralized
-- **Reduced Duplication**: Eliminated code duplication across modules
+- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
+- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
 - **Consistency**: Standardized error messages and processing pipelines
 - **Maintainability**: Changes isolated to specific modules
 - **Testability**: Modular components can be tested independently
+- **Type Safety**: Comprehensive type hints across all new modules
 
 ## 📋 Requirements
 - **Windows 10/11**
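The `SongValidator` class is described above as one place that decides whether a song should be downloaded, using multiple criteria. Its implementation is not shown in this part of the diff, so the sketch below is only a guess at the shape: the constructor fields, the argument list of `should_skip_song()`, and the order of the checks are all assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Set


@dataclass
class SongValidator:
    """Collects the skip checks in one place (hypothetical fields)."""
    downloads_dir: Path
    downloaded_ids: Set[str] = field(default_factory=set)
    failed_ids: Set[str] = field(default_factory=set)
    server_songs: Set[str] = field(default_factory=set)

    def should_skip_song(self, video_id: str, filename: str, song_key: str) -> Optional[str]:
        """Return a reason to skip the song, or None if it should be downloaded."""
        if video_id in self.downloaded_ids:
            return "already downloaded"
        if video_id in self.failed_ids:
            return "previously failed"
        if song_key in self.server_songs:
            return "already on server"
        if (self.downloads_dir / filename).exists():
            return "file exists"
        return None

    def mark_song_failed(self, video_id: str) -> None:
        """Record a failure so later runs skip this video."""
        self.failed_ids.add(video_id)
```

Returning the reason rather than a bare boolean lets every caller log the same message for the same condition.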
@@ -181,7 +203,9 @@ KaroakeVideoDownloader/
 │   ├── error_utils.py # Standardized error handling and formatting
 │   ├── download_pipeline.py # Abstracted download → verify → tag → track pipeline
 │   ├── id3_utils.py # ID3 tagging utilities
-│   ├── config_manager.py # Configuration management
+│   ├── config_manager.py # Configuration management with dataclasses
+│   ├── file_utils.py # Centralized file operations and filename handling
+│   ├── song_validator.py # Centralized song validation logic
 │   ├── check_resolution.py # Resolution checker utility
 │   ├── resolution_cli.py # Resolution config CLI
 │   └── tracking_cli.py # Tracking management CLI
@@ -271,25 +295,55 @@ python download_karaoke.py --clear-server-duplicates
 
 > **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.
 
-## 🔧 Refactoring Improvements (v3.2)
-The codebase has been comprehensively refactored to improve maintainability and reduce code duplication:
+## 🔧 Refactoring Improvements (v3.3)
+The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
 
-### **Key Improvements**
+### **New Utility Modules (v3.3)**
+- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
+  - `sanitize_filename()`: Create safe filenames from artist/title
+  - `generate_possible_filenames()`: Generate filename patterns for different modes
+  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
+  - `is_valid_mp4_file()`: Validate MP4 files with header checking
+  - `cleanup_temp_files()`: Remove temporary yt-dlp files
+  - `ensure_directory_exists()`: Safe directory creation
+
+- **`song_validator.py`**: Centralized song validation logic
+  - `SongValidator` class: Unified logic for checking if songs should be downloaded
+  - `should_skip_song()`: Comprehensive validation with multiple criteria
+  - `mark_song_failed()`: Consistent failure tracking
+  - `handle_download_failure()`: Standardized error handling
+
+- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
+  - `ConfigManager` class: Type-safe configuration loading and caching
+  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
+  - Configuration validation and merging with defaults
+  - Dynamic resolution updates
+
+### **Benefits Achieved**
+- **Eliminated Code Duplication**: ~150 lines of duplicate code removed across modules
+- **Centralized File Operations**: Single source of truth for filename handling and file validation
+- **Unified Song Validation**: Consistent logic for checking if songs should be downloaded
+- **Enhanced Type Safety**: Comprehensive type hints across all new modules
+- **Improved Configuration Management**: Structured configuration with validation and caching
+- **Better Error Handling**: Consistent patterns via centralized utilities
+- **Enhanced Maintainability**: Changes to file operations or song validation only require updates in one place
+- **Improved Testability**: Modular components can be tested independently
+- **Better Developer Experience**: Clear function signatures and comprehensive documentation
+
+### **Previous Improvements (v3.2)**
 - **Centralized yt-dlp Command Generation**: Standardized command building and execution across all download operations
 - **Enhanced Error Handling**: Structured exception hierarchy with consistent error messages and formatting
 - **Abstracted Download Pipeline**: Reusable download → verify → tag → track process for consistent processing
 - **Reduced Code Duplication**: Eliminated duplicate code across modules through centralized utilities
-
-### **New Utility Modules**
-- **`youtube_utils.py`**: Centralized YouTube operations and yt-dlp command generation
-- **`error_utils.py`**: Standardized error handling with structured exception hierarchy
-- **`download_pipeline.py`**: Abstracted download pipeline for consistent processing
-
-### **Benefits**
-- **Improved Maintainability**: Changes to yt-dlp configuration only require updates in one place
-- **Better Error Handling**: Consistent error messages and better debugging context
-- **Enhanced Testability**: Modular components can be tested independently
-- **Reduced Complexity**: Single source of truth for common operations
+- **Download plan pre-scan:** Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
+- **Latest-per-channel plan:** Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
+- **Fast mode with early exit:** When a limit is set, the tool scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads. This provides much faster performance for small limits compared to the full pre-scan approach.
+- **Deduplication across channels:** Tracks unique song keys (artist + normalized title) to ensure the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list.
+- **Fuzzy matching:** Uses string similarity algorithms to find approximate matches between songlist entries and video titles, tolerating minor differences, typos, or extra words like "Karaoke" or "Official Video".
+- **Default channel file:** For songlist-only and latest-per-channel modes, if no --file is specified, automatically uses data/channels.txt as the default channel list, reducing the need to specify the file path repeatedly.
+- **Robust interruption handling:** Progress is saved after each download, and files are checked for existence before downloading to prevent re-downloads if the process is interrupted.
+- **Optimized scanning algorithm:** High-performance channel scanning with O(n×m) complexity, pre-processed song lookups using sets and dictionaries, and early termination for faster matching of large songlists and channels.
+- **Enhanced cache management:** Improved channel cache key handling for better cache hit rates and reduced YouTube API calls.
+- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
 
 ## 🐞 Troubleshooting
 - Ensure `yt-dlp.exe` is in the `downloader/` folder
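The fuzzy matching described above tolerates typos and extra words such as "Karaoke" or "Official Video". The README does not say which similarity algorithm `fuzzy_matcher.py` uses, so the sketch below substitutes the standard library's `difflib.SequenceMatcher`; the noise-word list, the 0.85 threshold, and the function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Words that carry no information about which song a video is (assumed list).
NOISE = re.compile(r"\b(karaoke|official video|version|instrumental|lyrics)\b", re.IGNORECASE)


def clean(text: str) -> str:
    """Strip noise words and punctuation, then lowercase and collapse spaces."""
    text = NOISE.sub(" ", text)
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two cleaned strings."""
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def is_match(song: str, video_title: str, threshold: float = 0.85) -> bool:
    """True when the cleaned strings are at least `threshold` similar."""
    return similarity(song, video_title) >= threshold
```

A high threshold keeps false positives down when two different songs share an artist or a common word; lowering it trades precision for recall.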
@@ -75,17 +75,7 @@ Examples:
         downloader.songlist_only = True  # Enable songlist-only mode when focusing
         print(f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}")
     if args.resolution != '720p':
-        resolution_map = {
-            '480p': '480',
-            '720p': '720',
-            '1080p': '1080',
-            '1440p': '1440',
-            '2160p': '2160'
-        }
-        height = resolution_map[args.resolution]
-        downloader.config["download_settings"]["format"] = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
-        downloader.config["download_settings"]["preferred_resolution"] = args.resolution
-        print(f"🎬 Using resolution: {args.resolution}")
+        downloader.config_manager.update_resolution(args.resolution)
 
     # --- NEW: Reset channel CLI command ---
     if args.reset_channel:
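The hunk above moves the resolution-to-format logic out of the CLI. The mapping itself is small enough to isolate as a pure function, which makes it easy to test. The sketch below reuses the mapping and format string from this commit; `format_for_resolution` is a hypothetical helper name, and raising `ValueError` for an unknown resolution is a design choice of this sketch rather than the commit's behavior.

```python
RESOLUTION_MAP = {
    "480p": "480",
    "720p": "720",
    "1080p": "1080",
    "1440p": "1440",
    "2160p": "2160",
}


def format_for_resolution(resolution: str) -> str:
    """Return the yt-dlp format selector for a resolution such as "1080p"."""
    try:
        height = RESOLUTION_MAP[resolution]
    except KeyError:
        supported = ", ".join(RESOLUTION_MAP)
        raise ValueError(
            f"Unsupported resolution {resolution!r}; expected one of: {supported}"
        ) from None
    # Prefer MP4 at or below the height, then any container, then any MP4, then best.
    return f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
```

Failing loudly on an unknown value surfaces a mistyped `--resolution` argument immediately instead of silently keeping the previous format.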
@@ -1,77 +1,303 @@
 """
-Configuration management utilities.
-Handles loading and managing application configuration.
+Configuration management utilities for the karaoke downloader.
+Provides centralized configuration loading, validation, and management.
 """
 
 import json
 from pathlib import Path
+from typing import Dict, Any, Optional, Union
+from dataclasses import dataclass, field
+from datetime import datetime
 
-DATA_DIR = Path("data")
+# Default configuration values
+DEFAULT_CONFIG = {
+    "download_settings": {
+        "format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best",
+        "preferred_resolution": "720p",
+        "audio_format": "mp3",
+        "audio_quality": "0",
+        "subtitle_language": "en",
+        "subtitle_format": "srt",
+        "write_metadata": False,
+        "write_thumbnail": False,
+        "write_description": False,
+        "write_annotations": False,
+        "write_comments": False,
+        "write_subtitles": False,
+        "embed_metadata": False,
+        "add_metadata": False,
+        "continue_downloads": True,
+        "no_overwrites": True,
+        "ignore_errors": True,
+        "no_warnings": False
+    },
+    "folder_structure": {
+        "downloads_dir": "downloads",
+        "logs_dir": "logs",
+        "tracking_file": "data/karaoke_tracking.json"
+    },
+    "logging": {
+        "level": "INFO",
+        "format": "%(asctime)s - %(levelname)s - %(message)s",
+        "include_console": True,
+        "include_file": True
+    },
+    "yt_dlp_path": "downloader/yt-dlp.exe"
+}
 
-def load_config():
-    """Load configuration from data/config.json or return defaults."""
-    config_file = DATA_DIR / "config.json"
-    if config_file.exists():
-        try:
-            with open(config_file, 'r', encoding='utf-8') as f:
-                return json.load(f)
-        except (json.JSONDecodeError, FileNotFoundError) as e:
-            print(f"Warning: Could not load config.json: {e}")
+# Resolution mapping for CLI arguments
+RESOLUTION_MAP = {
+    '480p': '480',
+    '720p': '720',
+    '1080p': '1080',
+    '1440p': '1440',
+    '2160p': '2160'
+}
+
+@dataclass
+class DownloadSettings:
+    """Configuration for download settings."""
+    format: str = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"
+    outtmpl: str = "%(title)s_720p.%(ext)s"
+    merge_output_format: str = "mp4"
+    noplaylist: bool = True
+    postprocessors: list = None
+    preferred_resolution: str = "720p"
+    audio_format: str = "mp3"
+    audio_quality: str = "0"
+    subtitle_language: str = "en"
+    subtitle_format: str = "srt"
+    write_metadata: bool = False
+    write_thumbnail: bool = False
+    write_description: bool = False
+    writedescription: bool = False
+    write_annotations: bool = False
+    writeannotations: bool = False
+    write_comments: bool = False
+    writecomments: bool = False
+    write_subtitles: bool = False
+    writesubtitles: bool = False
+    writeinfojson: bool = False
+    writethumbnail: bool = False
+    embed_metadata: bool = False
+    add_metadata: bool = False
+    continue_downloads: bool = True
+    continuedl: bool = True
+    no_overwrites: bool = True
+    nooverwrites: bool = True
+    ignore_errors: bool = True
+    ignoreerrors: bool = True
+    no_warnings: bool = False
 
-    return get_default_config()
+    def __post_init__(self):
+        """Initialize default values for complex fields."""
+        if self.postprocessors is None:
+            self.postprocessors = [{
+                "key": "FFmpegExtractAudio",
+                "preferredcodec": "mp3",
+                "preferredquality": "0"
+            }]
 
-def get_default_config():
-    """Get the default configuration."""
-    return {
-        "download_settings": {
-            "format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best",
-            "preferred_resolution": "720p",
-            "audio_format": "mp3",
-            "audio_quality": "0",
-            "subtitle_language": "en",
-            "subtitle_format": "srt",
-            "write_metadata": False,
-            "write_thumbnail": False,
-            "write_description": False,
-            "write_annotations": False,
-            "write_comments": False,
-            "write_subtitles": False,
-            "embed_metadata": False,
-            "add_metadata": False,
-            "continue_downloads": True,
-            "no_overwrites": True,
-            "ignore_errors": True,
-            "no_warnings": False
-        },
-        "folder_structure": {
-            "downloads_dir": "downloads",
-            "logs_dir": "logs",
-            "tracking_file": str(DATA_DIR / "karaoke_tracking.json")
-        },
-        "logging": {
-            "level": "INFO",
-            "format": "%(asctime)s - %(levelname)s - %(message)s",
-            "include_console": True,
-            "include_file": True
-        },
-        "yt_dlp_path": "downloader/yt-dlp.exe"
-    }
+@dataclass
+class FolderStructure:
+    """Configuration for folder structure."""
+    downloads_dir: str = "downloads"
+    logs_dir: str = "logs"
+    tracking_file: str = "data/karaoke_tracking.json"
 
-def save_config(config):
-    """Save configuration to data/config.json."""
-    config_file = DATA_DIR / "config.json"
-    config_file.parent.mkdir(exist_ok=True)
+@dataclass
+class LoggingConfig:
+    """Configuration for logging."""
+    level: str = "INFO"
+    format: str = "%(asctime)s - %(levelname)s - %(message)s"
+    include_console: bool = True
+    include_file: bool = True
+
+@dataclass
+class AppConfig:
+    """Main application configuration."""
+    download_settings: DownloadSettings = field(default_factory=DownloadSettings)
+    folder_structure: FolderStructure = field(default_factory=FolderStructure)
+    logging: LoggingConfig = field(default_factory=LoggingConfig)
+    yt_dlp_path: str = "downloader/yt-dlp.exe"
+    _config_file: Optional[Path] = None
+    _last_modified: Optional[datetime] = None
+
+class ConfigManager:
+    """
+    Manages application configuration with loading, validation, and caching.
+    """
 
-    try:
-        with open(config_file, 'w', encoding='utf-8') as f:
-            json.dump(config, f, indent=2, ensure_ascii=False)
-        return True
-    except Exception as e:
-        print(f"Error saving config: {e}")
-        return False
+    def __init__(self, config_file: Union[str, Path] = "data/config.json"):
+        """
+        Initialize the configuration manager.
+
+        Args:
+            config_file: Path to the configuration file
+        """
+        self.config_file = Path(config_file)
+        self._config: Optional[AppConfig] = None
+        self._last_modified: Optional[datetime] = None
+
+    def load_config(self, force_reload: bool = False) -> AppConfig:
+        """
+        Load configuration from file with caching.
+
+        Args:
+            force_reload: Force reload even if file hasn't changed
+
+        Returns:
+            AppConfig instance
+        """
+        # Check if we need to reload
+        if not force_reload and self._config is not None:
+            if self.config_file.exists():
+                current_mtime = datetime.fromtimestamp(self.config_file.stat().st_mtime)
+                if self._last_modified and current_mtime <= self._last_modified:
+                    return self._config
+
+        # Load configuration
+        config_data = self._load_config_file()
+        self._config = self._create_config_from_dict(config_data)
+        self._last_modified = datetime.now()
+
+        return self._config
+
def _load_config_file(self) -> Dict[str, Any]:
|
||||
"""
|
||||
Load configuration from file with fallback to defaults.
|
||||
|
||||
Returns:
|
||||
Configuration dictionary
|
||||
"""
|
||||
if self.config_file.exists():
|
||||
try:
|
||||
with open(self.config_file, 'r', encoding='utf-8') as f:
|
||||
file_config = json.load(f)
|
||||
# Merge with defaults
|
||||
return self._merge_configs(DEFAULT_CONFIG, file_config)
|
||||
except (json.JSONDecodeError, FileNotFoundError) as e:
|
||||
print(f"Warning: Could not load config.json: {e}")
|
||||
print("Using default configuration.")
|
||||
|
||||
return DEFAULT_CONFIG.copy()
|
||||
|
||||
def _merge_configs(self, default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Merge user configuration with defaults.
|
||||
|
||||
Args:
|
||||
default: Default configuration
|
||||
user: User configuration
|
||||
|
||||
Returns:
|
||||
Merged configuration
|
||||
"""
|
||||
merged = default.copy()
|
||||
|
||||
for key, value in user.items():
|
||||
if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
|
||||
merged[key] = self._merge_configs(merged[key], value)
|
||||
else:
|
||||
merged[key] = value
|
||||
|
||||
return merged
|
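Review note: the recursive merge above can be exercised on its own. The sketch below is illustrative only; `deep_merge` is a hypothetical name, and unlike `_merge_configs` it starts from `copy.deepcopy`, on the assumption that callers do not want nested default dictionaries shared with the merged result (the method in the diff uses a shallow `dict.copy()`).

```python
import copy
from typing import Any, Dict


def deep_merge(default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `user` on `default` without mutating either input."""
    merged = copy.deepcopy(default)
    for key, value in user.items():
        # Recurse only when both sides hold a dict; otherwise the user value wins.
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical sample data shaped like the configuration in this commit.
defaults = {
    "download_settings": {"format": "best", "no_overwrites": True},
    "yt_dlp_path": "downloader/yt-dlp.exe",
}
overrides = {"download_settings": {"format": "best[ext=mp4]"}}
result = deep_merge(defaults, overrides)
```

Keys missing from the user file keep their default values, and editing `result` afterwards leaves `defaults` untouched.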
||||
|
||||
def _create_config_from_dict(self, config_data: Dict[str, Any]) -> AppConfig:
|
||||
"""
|
||||
Create AppConfig from dictionary.
|
||||
|
||||
Args:
|
||||
config_data: Configuration dictionary
|
||||
|
||||
Returns:
|
||||
AppConfig instance
|
||||
"""
|
||||
download_settings = DownloadSettings(**config_data.get("download_settings", {}))
|
||||
folder_structure = FolderStructure(**config_data.get("folder_structure", {}))
|
||||
logging_config = LoggingConfig(**config_data.get("logging", {}))
|
||||
|
||||
return AppConfig(
|
||||
download_settings=download_settings,
|
||||
folder_structure=folder_structure,
|
||||
logging=logging_config,
|
||||
yt_dlp_path=config_data.get("yt_dlp_path", "downloader/yt-dlp.exe"),
|
||||
_config_file=self.config_file
|
||||
)
|
||||
|
||||
def update_resolution(self, resolution: str) -> None:
|
||||
"""
|
||||
Update the download format based on resolution.
|
||||
|
||||
Args:
|
||||
resolution: Resolution string (e.g., "720p", "1080p")
|
||||
"""
|
||||
if self._config is None:
|
||||
self.load_config()
|
||||
|
||||
if resolution in RESOLUTION_MAP:
|
||||
height = RESOLUTION_MAP[resolution]
|
||||
format_str = f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"
|
||||
self._config.download_settings.format = format_str
|
||||
self._config.download_settings.preferred_resolution = resolution
|
||||
print(f"🎬 Using resolution: {resolution}")
|
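Review note: the format selector built in `update_resolution` is a fallback chain that yt-dlp evaluates left to right, with alternatives separated by `/`. A minimal sketch follows; `ASSUMED_RESOLUTION_MAP` and `build_format_string` are hypothetical names, and the map values are assumptions because the contents of the real `RESOLUTION_MAP` are not shown in this diff.

```python
from typing import Dict

# Assumed mapping; the real RESOLUTION_MAP lives elsewhere in config_manager.py.
ASSUMED_RESOLUTION_MAP: Dict[str, int] = {"480p": 480, "720p": 720, "1080p": 1080}


def build_format_string(resolution: str) -> str:
    """Return a yt-dlp format selector capped at the given height."""
    height = ASSUMED_RESOLUTION_MAP[resolution]
    # Preference order: capped MP4, capped any container, any MP4, anything.
    return f"best[height<={height}][ext=mp4]/best[height<={height}]/best[ext=mp4]/best"


selector = build_format_string("720p")
```

Note that the method in the diff silently does nothing when the resolution is not in the map; whether that should warn instead is a design question for the author.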
||||
|
||||
def get_config(self) -> AppConfig:
|
||||
"""
|
||||
Get the current configuration.
|
||||
|
||||
Returns:
|
||||
AppConfig instance
|
||||
"""
|
||||
if self._config is None:
|
||||
return self.load_config()
|
||||
return self._config
|
||||
|
||||
def save_config(self) -> None:
|
||||
"""
|
||||
Save current configuration to file.
|
||||
"""
|
||||
if self._config is None:
|
||||
return
|
||||
|
||||
config_dict = {
|
||||
"download_settings": self._config.download_settings.__dict__,
|
||||
"folder_structure": self._config.folder_structure.__dict__,
|
||||
"logging": self._config.logging.__dict__,
|
||||
"yt_dlp_path": self._config.yt_dlp_path
|
||||
}
|
||||
|
||||
# Ensure directory exists
|
||||
self.config_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with open(self.config_file, 'w', encoding='utf-8') as f:
|
||||
json.dump(config_dict, f, indent=2, ensure_ascii=False)
|
||||
|
||||
print(f"Configuration saved to {self.config_file}")
|
||||
|
||||
def update_config(updates):
|
||||
"""Update configuration with new values."""
|
||||
config = load_config()
|
||||
config.update(updates)
|
||||
return save_config(config)
|
||||
# Global configuration manager instance
|
||||
_config_manager: Optional[ConfigManager] = None
|
||||
|
||||
def get_config_manager() -> ConfigManager:
|
||||
"""
|
||||
Get the global configuration manager instance.
|
||||
|
||||
Returns:
|
||||
ConfigManager instance
|
||||
"""
|
||||
global _config_manager
|
||||
if _config_manager is None:
|
||||
_config_manager = ConfigManager()
|
||||
return _config_manager
|
||||
|
||||
def load_config(force_reload: bool = False) -> AppConfig:
|
||||
"""
|
||||
Load configuration using the global manager.
|
||||
|
||||
Args:
|
||||
force_reload: Force reload even if file hasn't changed
|
||||
|
||||
Returns:
|
||||
AppConfig instance
|
||||
"""
|
||||
return get_config_manager().load_config(force_reload)
|
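Review note: `ConfigManager.load_config` records `datetime.now()` as `_last_modified` and compares it with the file's mtime. One alternative, sketched below under stated assumptions, is to store the file's own mtime and reload whenever it differs. `CachedJsonFile` and its `reads` counter are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class CachedJsonFile:
    """Re-read a JSON file only when its modification time changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._data: Optional[Dict[str, Any]] = None
        self._mtime_ns: Optional[int] = None
        self.reads = 0  # counts real disk reads, for demonstration only

    def load(self, force_reload: bool = False) -> Dict[str, Any]:
        current = self.path.stat().st_mtime_ns
        if not force_reload and self._data is not None and current == self._mtime_ns:
            return self._data
        with open(self.path, "r", encoding="utf-8") as f:
            self._data = json.load(f)
        self._mtime_ns = current
        self.reads += 1
        return self._data


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text('{"yt_dlp_path": "a"}', encoding="utf-8")
    cache = CachedJsonFile(cfg_path)
    first = cache.load()
    second = cache.load()  # served from the cache
    reads_before_change = cache.reads

    cfg_path.write_text('{"yt_dlp_path": "b"}', encoding="utf-8")
    # Bump the mtime explicitly so the demo does not rely on timestamp resolution.
    st = cfg_path.stat()
    os.utime(cfg_path, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    third = cache.load()  # mtime changed, so the file is read again
    reads_after_change = cache.reads
```

Comparing for inequality rather than "newer than" also picks up a config file that was restored from an older backup.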
||||
@ -96,11 +96,11 @@ class DownloadPipeline:
|
||||
)
|
||||
|
||||
print(f"🔧 Running command: {' '.join(cmd)}")
|
||||
print(f"📺 Resolution settings: {self.config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}")
|
||||
print(f"🎬 Format string: {self.config.get('download_settings', {}).get('format', 'Unknown')}")
|
||||
print(f"📺 Resolution settings: {self.config.download_settings.preferred_resolution}")
|
||||
print(f"🎬 Format string: {self.config.download_settings.format}")
|
||||
|
||||
# Debug: Show available formats (optional)
|
||||
if self.config.get('debug_show_formats', False):
|
||||
if getattr(self.config, 'debug_show_formats', False):
|
||||
show_available_formats(video_url, self.yt_dlp_path)
|
||||
|
||||
try:
|
||||
|
||||
@ -5,6 +5,7 @@ import json
|
||||
import re
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
from typing import Dict, Any, Optional, List, Tuple
|
||||
from karaoke_downloader.tracking_manager import TrackingManager, SongStatus, FormatType
|
||||
from karaoke_downloader.id3_utils import add_id3_tags, extract_artist_title
|
||||
from karaoke_downloader.songlist_manager import (
|
||||
@ -27,30 +28,47 @@ from karaoke_downloader.video_downloader import download_video_and_track, is_val
|
||||
from karaoke_downloader.channel_manager import reset_channel_downloads, download_from_file
|
||||
from karaoke_downloader.download_pipeline import DownloadPipeline
|
||||
from karaoke_downloader.error_utils import handle_yt_dlp_error, log_error
|
||||
from karaoke_downloader.song_validator import create_song_validator
|
||||
from karaoke_downloader.config_manager import load_config, get_config_manager
|
||||
from karaoke_downloader.file_utils import sanitize_filename, ensure_directory_exists
|
||||
|
||||
# Constants
|
||||
DEFAULT_FUZZY_THRESHOLD = 85
|
||||
DEFAULT_CACHE_EXPIRATION_DAYS = 1
|
||||
DEFAULT_FILENAME_LENGTH_LIMIT = 100
|
||||
DEFAULT_ARTIST_LENGTH_LIMIT = 30
|
||||
DEFAULT_TITLE_LENGTH_LIMIT = 60
|
||||
DEFAULT_DISPLAY_LIMIT = 10
|
||||
|
||||
DATA_DIR = Path("data")
|
||||
|
||||
class KaraokeDownloader:
|
||||
def __init__(self):
|
||||
self.yt_dlp_path = Path("downloader/yt-dlp.exe")
|
||||
self.downloads_dir = Path("downloads")
|
||||
self.logs_dir = Path("logs")
|
||||
self.downloads_dir.mkdir(exist_ok=True)
|
||||
self.logs_dir.mkdir(exist_ok=True)
|
||||
self.tracker = TrackingManager(tracking_file=DATA_DIR / "karaoke_tracking.json", cache_file=DATA_DIR / "channel_cache.json")
|
||||
self.config = self._load_config()
|
||||
# Load configuration
|
||||
self.config_manager = get_config_manager()
|
||||
self.config = self.config_manager.load_config()
|
||||
|
||||
# Initialize paths
|
||||
self.yt_dlp_path = Path(self.config.yt_dlp_path)
|
||||
self.downloads_dir = Path(self.config.folder_structure.downloads_dir)
|
||||
self.logs_dir = Path(self.config.folder_structure.logs_dir)
|
||||
|
||||
# Ensure directories exist
|
||||
ensure_directory_exists(self.downloads_dir)
|
||||
ensure_directory_exists(self.logs_dir)
|
||||
|
||||
# Initialize tracking
|
||||
tracking_file = DATA_DIR / "karaoke_tracking.json"
|
||||
cache_file = DATA_DIR / "channel_cache.json"
|
||||
self.tracker = TrackingManager(tracking_file=tracking_file, cache_file=cache_file)
|
||||
|
||||
# Initialize song validator
|
||||
self.song_validator = create_song_validator(self.tracker, self.downloads_dir)
|
||||
|
||||
# Load songlist tracking
|
||||
self.songlist_tracking_file = DATA_DIR / "songlist_tracking.json"
|
||||
self.songlist_tracking = load_songlist_tracking(str(self.songlist_tracking_file))
|
||||
|
||||
# Load server songs for availability checking
|
||||
self.server_songs = load_server_songs()
|
||||
|
||||
# Songlist focus mode attributes
|
||||
self.songlist_focus_titles = None
|
||||
self.songlist_only = False
|
||||
@ -58,114 +76,30 @@ class KaraokeDownloader:
|
||||
self.download_limit = None
|
||||
|
||||
def _load_config(self):
|
||||
config_file = DATA_DIR / "config.json"
|
||||
if config_file.exists():
|
||||
try:
|
||||
with open(config_file, 'r', encoding='utf-8') as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, FileNotFoundError) as e:
|
||||
print(f"Warning: Could not load config.json: {e}")
|
||||
return {
|
||||
"download_settings": {
|
||||
"format": "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best",
|
||||
"preferred_resolution": "720p",
|
||||
"audio_format": "mp3",
|
||||
"audio_quality": "0",
|
||||
"subtitle_language": "en",
|
||||
"subtitle_format": "srt",
|
||||
"write_metadata": False,
|
||||
"write_thumbnail": False,
|
||||
"write_description": False,
|
||||
"write_annotations": False,
|
||||
"write_comments": False,
|
||||
"write_subtitles": False,
|
||||
"embed_metadata": False,
|
||||
"add_metadata": False,
|
||||
"continue_downloads": True,
|
||||
"no_overwrites": True,
|
||||
"ignore_errors": True,
|
||||
"no_warnings": False
|
||||
},
|
||||
"folder_structure": {
|
||||
"downloads_dir": "downloads",
|
||||
"logs_dir": "logs",
|
||||
"tracking_file": str(DATA_DIR / "karaoke_tracking.json")
|
||||
},
|
||||
"logging": {
|
||||
"level": "INFO",
|
||||
"format": "%(asctime)s - %(levelname)s - %(message)s",
|
||||
"include_console": True,
|
||||
"include_file": True
|
||||
},
|
||||
"yt_dlp_path": "downloader/yt-dlp.exe"
|
||||
}
|
||||
"""Load configuration using the config manager."""
|
||||
return self.config_manager.load_config()
|
||||
|
||||
def _should_skip_song(self, artist, title, channel_name, video_id, video_title, server_songs=None, server_duplicates_tracking=None):
|
||||
"""
|
||||
Centralized method to check if a song should be skipped.
|
||||
Performs four checks in order:
|
||||
1. Already downloaded (tracking)
|
||||
2. File exists on filesystem
|
||||
3. Already on server
|
||||
4. Previously failed download (bad file)
|
||||
Check if a song should be skipped using the centralized SongValidator.
|
||||
|
||||
Returns:
|
||||
tuple: (should_skip, reason, total_filtered)
|
||||
"""
|
||||
total_filtered = 0
|
||||
|
||||
# Check 1: Already downloaded by this system
|
||||
if self.tracker.is_song_downloaded(artist, title, channel_name, video_id):
|
||||
return True, "already downloaded", total_filtered
|
||||
|
||||
# Check 2: File already exists on filesystem
|
||||
# Generate the expected filename based on the download mode context
|
||||
safe_title = title
|
||||
invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
|
||||
for char in invalid_chars:
|
||||
safe_title = safe_title.replace(char, "")
|
||||
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
|
||||
|
||||
# Try different filename patterns that might exist
|
||||
possible_filenames = [
|
||||
f"{artist} - {safe_title}.mp4", # Songlist mode
|
||||
f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode
|
||||
f"{artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode
|
||||
]
|
||||
|
||||
for filename in possible_filenames:
|
||||
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
|
||||
# Apply length limits if needed
|
||||
safe_artist = artist.replace("'", "").replace('"', "").strip()
|
||||
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
|
||||
|
||||
output_path = self.downloads_dir / channel_name / filename
|
||||
if output_path.exists() and output_path.stat().st_size > 0:
|
||||
return True, "file exists", total_filtered
|
||||
|
||||
# Check 3: Already on server (if server data provided)
|
||||
if server_songs is not None and server_duplicates_tracking is not None:
|
||||
from karaoke_downloader.server_manager import check_and_mark_server_duplicate
|
||||
if check_and_mark_server_duplicate(server_songs, server_duplicates_tracking, artist, title, video_title, channel_name):
|
||||
total_filtered += 1
|
||||
return True, "on server", total_filtered
|
||||
|
||||
# Check 4: Previously failed download (bad file)
|
||||
if self.tracker.is_song_failed(artist, title, channel_name, video_id):
|
||||
return True, "previously failed", total_filtered
|
||||
|
||||
return False, None, total_filtered
|
||||
return self.song_validator.should_skip_song(
|
||||
artist, title, channel_name, video_id, video_title,
|
||||
server_songs, server_duplicates_tracking
|
||||
)
|
||||
|
||||
def _mark_song_failed(self, artist, title, video_id, channel_name, error_message):
|
||||
"""
|
||||
Centralized method to mark a song as failed in tracking.
|
||||
Mark a song as failed in tracking using the SongValidator.
|
||||
"""
|
||||
self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message)
|
||||
print(f"🏷️ Marked song as failed: {artist} - {title}")
|
||||
self.song_validator.mark_song_failed(artist, title, video_id, channel_name, error_message)
|
||||
|
||||
def _handle_download_failure(self, artist, title, video_id, channel_name, error_type, error_details=""):
|
||||
"""
|
||||
Centralized method to handle download failures.
|
||||
Handle download failures using the SongValidator.
|
||||
|
||||
Args:
|
||||
artist: Song artist
|
||||
@ -175,10 +109,7 @@ class KaraokeDownloader:
|
||||
error_type: Type of error (e.g., "yt-dlp failed", "file verification failed")
|
||||
error_details: Additional error details
|
||||
"""
|
||||
error_msg = f"{error_type}"
|
||||
if error_details:
|
||||
error_msg += f": {error_details}"
|
||||
self._mark_song_failed(artist, title, video_id, channel_name, error_msg)
|
||||
self.song_validator.handle_download_failure(artist, title, video_id, channel_name, error_type, error_details)
|
||||
|
||||
def download_channel_videos(self, url, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD):
|
||||
"""Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching."""
|
||||
@ -193,7 +124,7 @@ class KaraokeDownloader:
|
||||
server_songs = load_server_songs()
|
||||
server_duplicates_tracking = load_server_duplicates_tracking()
|
||||
|
||||
limit = self.config.get('limit', 1)
|
||||
limit = getattr(self.config, 'limit', 1)
|
||||
cmd = [
|
||||
str(self.yt_dlp_path),
|
||||
'--flat-playlist',
|
||||
|
||||
182
karaoke_downloader/file_utils.py
Normal file
@ -0,0 +1,182 @@
|
||||
"""
|
||||
File utilities for filename sanitization, path operations, and file validation.
|
||||
Centralizes common file operations to eliminate code duplication.
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import List, Optional, Tuple
|
||||
|
||||
# Constants for filename operations
|
||||
DEFAULT_FILENAME_LENGTH_LIMIT = 100
|
||||
DEFAULT_ARTIST_LENGTH_LIMIT = 30
|
||||
DEFAULT_TITLE_LENGTH_LIMIT = 60
|
||||
|
||||
# Windows invalid characters
|
||||
INVALID_FILENAME_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
|
||||
|
||||
def sanitize_filename(artist: str, title: str, max_length: int = DEFAULT_FILENAME_LENGTH_LIMIT) -> str:
|
||||
"""
|
||||
Create a safe filename from artist and title.
|
||||
|
||||
Args:
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
max_length: Maximum filename length (default: 100)
|
||||
|
||||
Returns:
|
||||
Sanitized filename string
|
||||
"""
|
||||
# Clean up title
|
||||
safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "")
|
||||
safe_title = safe_title.replace("'", "").replace('"', "")
|
||||
|
||||
# Clean up artist
|
||||
safe_artist = artist.replace("'", "").replace('"', "").strip()
|
||||
|
||||
# Remove invalid characters
|
||||
for char in INVALID_FILENAME_CHARS:
|
||||
safe_title = safe_title.replace(char, "")
|
||||
safe_artist = safe_artist.replace(char, "")
|
||||
|
||||
# Remove problematic patterns
|
||||
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
|
||||
safe_artist = safe_artist.strip()
|
||||
|
||||
# Create filename
|
||||
filename = f"{safe_artist} - {safe_title}.mp4"
|
||||
|
||||
# Limit filename length if needed
|
||||
if len(filename) > max_length:
|
||||
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
|
||||
|
||||
return filename
|
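Review note: a condensed, self-contained restatement of `sanitize_filename` makes its behavior easy to check by hand. `sanitize_filename_sketch` is a hypothetical name; the steps are intended to be equivalent to the function above (the explicit `:` and `"` replacements are folded into the invalid-character loop, and the three dot replacements into one).

```python
INVALID_FILENAME_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']


def sanitize_filename_sketch(artist: str, title: str, max_length: int = 100) -> str:
    """Condensed restatement of the sanitizer, for illustration."""
    safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ")
    safe_title = safe_title.replace("'", "")
    safe_artist = artist.replace("'", "")
    for char in INVALID_FILENAME_CHARS:
        safe_title = safe_title.replace(char, "")
        safe_artist = safe_artist.replace(char, "")
    # Dots are removed from the title only; the artist keeps them.
    safe_title = safe_title.replace(".", "").strip()
    safe_artist = safe_artist.strip()
    filename = f"{safe_artist} - {safe_title}.mp4"
    if len(filename) > max_length:
        filename = f"{safe_artist[:30]} - {safe_title[:60]}.mp4"
    return filename


soundtrack = sanitize_filename_sketch("AC/DC", "Who Made Who? (From Maximum Overdrive)")
apostrophes = sanitize_filename_sketch("Guns N' Roses", "Don't Cry")
unbalanced = sanitize_filename_sketch("A", "Song (Live)")
truncated = sanitize_filename_sketch("A" * 40, "B" * 80)
```

One observation from the third call: every closing parenthesis is removed, so a title such as `Song (Live)` keeps its opening parenthesis and loses the closing one. Whether that is intended is for the author to confirm.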
||||
|
||||
def generate_possible_filenames(artist: str, title: str, channel_name: str) -> List[str]:
|
||||
"""
|
||||
Generate possible filename patterns for different download modes.
|
||||
|
||||
Args:
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
channel_name: Channel name
|
||||
|
||||
Returns:
|
||||
List of possible filename patterns
|
||||
"""
|
||||
safe_title = sanitize_title_for_filenames(title)
|
||||
safe_artist = artist.replace("'", "").replace('"', "").strip()
|
||||
|
||||
return [
|
||||
f"{safe_artist} - {safe_title}.mp4", # Songlist mode
|
||||
f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode
|
||||
f"{safe_artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode
|
||||
]
|
||||
|
||||
def sanitize_title_for_filenames(title: str) -> str:
|
||||
"""
|
||||
Sanitize title specifically for filename generation.
|
||||
|
||||
Args:
|
||||
title: Song title
|
||||
|
||||
Returns:
|
||||
Sanitized title string
|
||||
"""
|
||||
safe_title = title
|
||||
for char in INVALID_FILENAME_CHARS:
|
||||
safe_title = safe_title.replace(char, "")
|
||||
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
|
||||
return safe_title
|
||||
|
||||
def check_file_exists_with_patterns(
|
||||
downloads_dir: Path,
|
||||
channel_name: str,
|
||||
artist: str,
|
||||
title: str
|
||||
) -> Tuple[bool, Optional[Path]]:
|
||||
"""
|
||||
Check if a file exists using multiple possible filename patterns.
|
||||
|
||||
Args:
|
||||
downloads_dir: Base downloads directory
|
||||
channel_name: Channel name
|
||||
artist: Song artist
|
||||
title: Song title
|
||||
|
||||
Returns:
|
||||
Tuple of (exists, file_path) where file_path is None if not found
|
||||
"""
|
||||
possible_filenames = generate_possible_filenames(artist, title, channel_name)
|
||||
channel_dir = downloads_dir / channel_name
|
||||
|
||||
for filename in possible_filenames:
|
||||
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
|
||||
# Apply length limits if needed
|
||||
safe_artist = artist.replace("'", "").replace('"', "").strip()
|
||||
safe_title = sanitize_title_for_filenames(title)
|
||||
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
|
||||
|
||||
file_path = channel_dir / filename
|
||||
if file_path.exists() and file_path.stat().st_size > 0:
|
||||
return True, file_path
|
||||
|
||||
return False, None
|
||||
|
||||
def ensure_directory_exists(directory: Path) -> None:
|
||||
"""
|
||||
Ensure a directory exists, creating it if necessary.
|
||||
|
||||
Args:
|
||||
directory: Directory path to ensure exists
|
||||
"""
|
||||
directory.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def is_valid_mp4_file(file_path: Path) -> bool:
|
||||
"""
|
||||
Check if a file is a valid MP4 file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to check
|
||||
|
||||
Returns:
|
||||
True if file is a valid MP4, False otherwise
|
||||
"""
|
||||
if not file_path.exists():
|
||||
return False
|
||||
|
||||
# Check file size
|
||||
if file_path.stat().st_size == 0:
|
||||
return False
|
||||
|
||||
# Check file extension
|
||||
if file_path.suffix.lower() != '.mp4':
|
||||
return False
|
||||
|
||||
# Basic MP4 header check (bytes 4-7 of the file should be 'ftyp')
|
||||
try:
|
||||
with open(file_path, 'rb') as f:
|
||||
header = f.read(8)
|
||||
if len(header) >= 8 and header[4:8] == b'ftyp':
|
||||
return True
|
||||
except (IOError, OSError):
|
||||
pass
|
||||
|
||||
return False
|
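Review note: an MP4 file normally begins with an `ftyp` box, which is a 4-byte big-endian size followed by the 4-byte type, so the type sits at byte offsets 4 to 7. The sketch below isolates that check; `has_ftyp_box` is a hypothetical helper, and the check is a heuristic rather than full container validation.

```python
import tempfile
from pathlib import Path


def has_ftyp_box(file_path: Path) -> bool:
    """Return True if bytes 4-7 of the file spell 'ftyp'."""
    try:
        with open(file_path, "rb") as f:
            header = f.read(8)
    except OSError:
        # Covers missing files and permission problems alike.
        return False
    return len(header) >= 8 and header[4:8] == b"ftyp"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.mp4"
    # 4-byte box size (24), the box type 'ftyp', a brand, then padding.
    good.write_bytes(b"\x00\x00\x00\x18ftypisom" + b"\x00" * 12)
    bad = Path(tmp) / "bad.mp4"
    bad.write_bytes(b"<html>not a video</html>")
    good_ok = has_ftyp_box(good)
    bad_ok = has_ftyp_box(bad)
    missing_ok = has_ftyp_box(Path(tmp) / "missing.mp4")
```

This catches the common failure where a download leaves an HTML error page or a truncated stub with an `.mp4` extension, without requiring `ffprobe` to be installed.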
||||
|
||||
def cleanup_temp_files(file_path: Path) -> None:
|
||||
"""
|
||||
Clean up temporary files created by yt-dlp.
|
||||
|
||||
Args:
|
||||
file_path: Base file path (without extension)
|
||||
"""
|
||||
temp_extensions = ['.info.json', '.meta', '.webp', '.jpg', '.png']
|
||||
|
||||
for ext in temp_extensions:
|
||||
temp_file = file_path.with_suffix(ext)
|
||||
if temp_file.exists():
|
||||
try:
|
||||
temp_file.unlink()
|
||||
except (IOError, OSError):
|
||||
pass # Ignore cleanup errors
|
||||
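Review note: `cleanup_temp_files` derives side-file names with `Path.with_suffix`, which replaces everything after the last dot in the final path component. Because `sanitize_filename` strips dots from the title but not from the artist, a dotted artist name could produce the wrong path. The sketch below shows the difference; the sample path is hypothetical, and it assumes side files are named by appending to the base name.

```python
from pathlib import Path

# Hypothetical base path (extension already stripped) with a dot in the artist.
base = Path("downloads/Channel/Mr. Big - To Be With You")

# with_suffix() treats ". Big - To Be With You" as the suffix and replaces it.
replaced = base.with_suffix(".info.json")

# Appending to the name keeps the whole base name intact.
appended = base.with_name(base.name + ".info.json")
```

If this matters in practice, appending via `with_name` would be one possible adjustment; the commit itself is unchanged here.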
144
karaoke_downloader/song_validator.py
Normal file
@ -0,0 +1,144 @@
|
||||
"""
|
||||
Song validation utilities for checking if songs should be downloaded.
|
||||
Centralizes song validation logic to eliminate code duplication.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, Optional, Tuple, List
|
||||
from karaoke_downloader.file_utils import check_file_exists_with_patterns
|
||||
from karaoke_downloader.tracking_manager import TrackingManager
|
||||
|
||||
class SongValidator:
|
||||
"""
|
||||
Centralized song validation logic for checking if songs should be downloaded.
|
||||
"""
|
||||
|
||||
def __init__(self, tracker: TrackingManager, downloads_dir: Path):
|
||||
"""
|
||||
Initialize the song validator.
|
||||
|
||||
Args:
|
||||
tracker: Tracking manager instance
|
||||
downloads_dir: Base downloads directory
|
||||
"""
|
||||
self.tracker = tracker
|
||||
self.downloads_dir = downloads_dir
|
||||
|
||||
def should_skip_song(
|
||||
self,
|
||||
artist: str,
|
||||
title: str,
|
||||
channel_name: str,
|
||||
video_id: Optional[str] = None,
|
||||
video_title: Optional[str] = None,
|
||||
server_songs: Optional[Dict[str, Any]] = None,
|
||||
server_duplicates_tracking: Optional[Dict[str, Any]] = None
|
||||
) -> Tuple[bool, Optional[str], int]:
|
||||
"""
|
||||
Check if a song should be skipped based on multiple criteria.
|
||||
|
||||
Performs checks in order:
|
||||
1. Already downloaded (tracking)
|
||||
2. File exists on filesystem
|
||||
3. Already on server
|
||||
4. Previously failed download (bad file)
|
||||
|
||||
Args:
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
channel_name: Channel name
|
||||
video_id: YouTube video ID (optional)
|
||||
video_title: YouTube video title (optional)
|
||||
server_songs: Server songs data (optional)
|
||||
server_duplicates_tracking: Server duplicates tracking (optional)
|
||||
|
||||
Returns:
|
||||
Tuple of (should_skip, reason, total_filtered)
|
||||
"""
|
||||
total_filtered = 0
|
||||
|
||||
# Check 1: Already downloaded by this system
|
||||
if self.tracker.is_song_downloaded(artist, title, channel_name, video_id):
|
||||
return True, "already downloaded", total_filtered
|
||||
|
||||
# Check 2: File already exists on filesystem
|
||||
file_exists, _ = check_file_exists_with_patterns(
|
||||
self.downloads_dir, channel_name, artist, title
|
||||
)
|
||||
if file_exists:
|
||||
return True, "file exists", total_filtered
|
||||
|
||||
# Check 3: Already on server (if server data provided)
|
||||
if server_songs is not None and server_duplicates_tracking is not None:
|
||||
from karaoke_downloader.server_manager import check_and_mark_server_duplicate
|
||||
if check_and_mark_server_duplicate(
|
||||
server_songs, server_duplicates_tracking,
|
||||
artist, title, video_title, channel_name
|
||||
):
|
||||
total_filtered += 1
|
||||
return True, "on server", total_filtered
|
||||
|
||||
# Check 4: Previously failed download (bad file)
|
||||
if self.tracker.is_song_failed(artist, title, channel_name, video_id):
|
||||
return True, "previously failed", total_filtered
|
||||
|
||||
return False, None, total_filtered
|
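Review note: `should_skip_song` runs its four checks in a fixed order and returns at the first match, so later checks (including the server check, which also marks duplicates) are never reached once an earlier one succeeds. The sketch below models only that short-circuit ordering; `first_skip_reason` and `record` are hypothetical helpers, and the predicates are stand-ins for the tracker, filesystem, and server lookups.

```python
from typing import Callable, List, Optional, Tuple

# Each check is a (reason, predicate) pair; the first predicate returning True wins.
Check = Tuple[str, Callable[[], bool]]


def first_skip_reason(checks: List[Check]) -> Tuple[bool, Optional[str]]:
    for reason, predicate in checks:
        if predicate():
            return True, reason
    return False, None


calls: List[str] = []


def record(name: str, result: bool) -> Callable[[], bool]:
    """Build a stand-in predicate that logs when it is evaluated."""
    def predicate() -> bool:
        calls.append(name)
        return result
    return predicate


outcome = first_skip_reason([
    ("already downloaded", record("tracking", False)),
    ("file exists", record("filesystem", True)),
    ("on server", record("server", False)),
    ("previously failed", record("failed", False)),
])
```

Here the filesystem check matches, so the server and failed-download checks are skipped. In the diff, `total_filtered` is incremented only in the server branch, which means it stays 0 for every other skip reason.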
||||
|
||||
def mark_song_failed(
|
||||
self,
|
||||
artist: str,
|
||||
title: str,
|
||||
video_id: Optional[str],
|
||||
channel_name: str,
|
||||
error_message: str
|
||||
) -> None:
|
||||
"""
|
||||
Mark a song as failed in tracking.
|
||||
|
||||
Args:
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
video_id: YouTube video ID (optional)
|
||||
channel_name: Channel name
|
||||
error_message: Error message to record
|
||||
"""
|
||||
self.tracker.mark_song_failed(artist, title, video_id, channel_name, error_message)
|
||||
print(f"🏷️ Marked song as failed: {artist} - {title}")
|
||||
|
||||
def handle_download_failure(
|
||||
self,
|
||||
artist: str,
|
||||
title: str,
|
||||
video_id: Optional[str],
|
||||
channel_name: str,
|
||||
error_type: str,
|
||||
error_details: str = ""
|
||||
) -> None:
|
||||
"""
|
||||
Handle download failures with consistent error formatting.
|
||||
|
||||
Args:
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
video_id: YouTube video ID (optional)
|
||||
channel_name: Channel name
|
||||
error_type: Type of error (e.g., "yt-dlp failed", "file verification failed")
|
||||
error_details: Additional error details
|
||||
"""
|
||||
error_msg = f"{error_type}"
|
||||
if error_details:
|
||||
error_msg += f": {error_details}"
|
||||
self.mark_song_failed(artist, title, video_id, channel_name, error_msg)
|
||||
|
||||
def create_song_validator(tracker: TrackingManager, downloads_dir: Path) -> SongValidator:
|
||||
"""
|
||||
Factory function to create a song validator instance.
|
||||
|
||||
Args:
|
||||
tracker: Tracking manager instance
|
||||
downloads_dir: Base downloads directory
|
||||
|
||||
Returns:
|
||||
SongValidator instance
|
||||
"""
|
||||
return SongValidator(tracker, downloads_dir)
|
||||
@ -5,70 +5,29 @@ Handles the actual downloading and post-processing of videos.
|
||||
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, Optional, Tuple
|
||||
from karaoke_downloader.id3_utils import add_id3_tags
|
||||
from karaoke_downloader.songlist_manager import mark_songlist_song_downloaded
|
||||
from karaoke_downloader.download_planner import save_plan_cache
|
||||
from karaoke_downloader.youtube_utils import build_yt_dlp_command, execute_yt_dlp_command, show_available_formats
|
||||
from karaoke_downloader.error_utils import handle_yt_dlp_error, handle_file_validation_error, log_error
|
||||
from karaoke_downloader.file_utils import sanitize_filename, is_valid_mp4_file, cleanup_temp_files, ensure_directory_exists
|
||||
|
||||
# Constants
|
||||
DEFAULT_FILENAME_LENGTH_LIMIT = 100
|
||||
DEFAULT_ARTIST_LENGTH_LIMIT = 30
|
||||
DEFAULT_TITLE_LENGTH_LIMIT = 60
|
||||
DEFAULT_FORMAT_CHECK_TIMEOUT = 30
|
||||
|
||||
def sanitize_filename(artist, title):
|
||||
"""
|
||||
Create a safe filename from artist and title.
|
||||
Removes invalid characters and limits length.
|
||||
"""
|
||||
# Create a shorter, safer filename
|
||||
safe_title = title.replace("(From ", "").replace(")", "").replace(" - ", " ").replace(":", "").replace("'", "").replace('"', "")
|
||||
safe_artist = artist.replace("'", "").replace('"', "")
|
||||
|
||||
# Remove all Windows-invalid characters
|
||||
invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
|
||||
for char in invalid_chars:
|
||||
safe_title = safe_title.replace(char, "")
|
||||
safe_artist = safe_artist.replace(char, "")
|
||||
|
||||
# Also remove any other potentially problematic characters
|
||||
safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
|
||||
safe_artist = safe_artist.strip()
|
||||
|
||||
filename = f"{safe_artist} - {safe_title}.mp4"
|
||||
|
||||
# Limit filename length to avoid Windows path issues
|
||||
if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
|
||||
filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
|
||||
|
||||
return filename
|
||||
|
||||
def is_valid_mp4(file_path):
|
||||
def is_valid_mp4(file_path: Path) -> bool:
|
||||
"""
|
||||
Check if a file is a valid MP4 file.
|
||||
Uses ffprobe if available, otherwise checks file extension and size.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to check
|
||||
|
||||
Returns:
|
||||
True if file is a valid MP4, False otherwise
|
||||
"""
|
||||
if not file_path.exists():
|
||||
return False
|
||||
|
||||
# Check file size
|
||||
if file_path.stat().st_size == 0:
|
||||
return False
|
||||
|
||||
# Try to use ffprobe for validation
|
||||
try:
|
||||
import subprocess
|
||||
result = subprocess.run(
|
||||
['ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', str(file_path)],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True
|
||||
)
|
||||
return True
|
||||
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||
# If ffprobe is not available, just check the extension and size
|
||||
return file_path.suffix.lower() == '.mp4' and file_path.stat().st_size > 0
|
||||
return is_valid_mp4_file(file_path)
|
||||
|
||||
def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracking,
|
||||
channel_name, channel_url, video_id, video_title,
|
||||
@ -83,10 +42,33 @@ def download_video_and_track(yt_dlp_path, config, downloads_dir, songlist_tracki
|
||||
artist, title, channel_name, songlist_tracking
|
||||
)
|
||||
|
||||
def download_single_video(output_path, video_id, config, yt_dlp_path,
|
||||
artist, title, channel_name, songlist_tracking):
|
||||
"""Download a single video and handle post-processing."""
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
def download_single_video(
|
||||
output_path: Path,
|
||||
video_id: str,
|
||||
config: Any,  # AppConfig instance; fields are read via attribute access below
|
||||
yt_dlp_path: str,
|
||||
artist: str,
|
||||
title: str,
|
||||
channel_name: str,
|
||||
songlist_tracking: Dict[str, Any]
|
||||
) -> bool:
|
||||
"""
|
||||
Download a single video and handle post-processing.
|
||||
|
||||
Args:
|
||||
output_path: Output file path
|
||||
video_id: YouTube video ID
|
||||
config: Application configuration (AppConfig instance, read via attribute access)
|
||||
yt_dlp_path: Path to yt-dlp executable
|
||||
artist: Song artist name
|
||||
title: Song title
|
||||
channel_name: Channel name
|
||||
songlist_tracking: Songlist tracking data
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
ensure_directory_exists(output_path.parent)
|
||||
print(f"⬇️ Downloading: {artist} - {title} -> {output_path}")
|
||||
|
||||
video_url = f"https://www.youtube.com/watch?v={video_id}"
|
||||
@ -95,11 +77,11 @@ def download_single_video(output_path, video_id, config, yt_dlp_path,
|
||||
cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config)
|
||||
|
||||
print(f"🔧 Running command: {' '.join(cmd)}")
|
||||
print(f"📺 Resolution settings: {config.get('download_settings', {}).get('preferred_resolution', 'Unknown')}")
|
||||
print(f"🎬 Format string: {config.get('download_settings', {}).get('format', 'Unknown')}")
|
||||
print(f"📺 Resolution settings: {config.download_settings.preferred_resolution}")
|
||||
print(f"🎬 Format string: {config.download_settings.format}")
|
||||
|
||||
# Debug: Show available formats (optional)
|
||||
if config.get('debug_show_formats', False):
|
||||
if getattr(config, 'debug_show_formats', False):
|
||||
show_available_formats(video_url, yt_dlp_path)
|
||||
|
||||
try:
|
||||
@ -121,6 +103,9 @@ def download_single_video(output_path, video_id, config, yt_dlp_path,
|
||||
add_id3_tags(output_path, f"{artist} - {title} (Karaoke Version)", channel_name)
|
||||
mark_songlist_song_downloaded(songlist_tracking, artist, title, channel_name, output_path)
|
||||
|
||||
# Clean up temporary files
|
||||
cleanup_temp_files(output_path.with_suffix(''))
|
||||
|
||||
print(f"✅ Downloaded and tracked: {artist} - {title}")
|
||||
print(f"🎉 All post-processing complete for: {output_path}")
|
||||
|
||||
@@ -255,58 +240,5 @@ def cleanup_cache(cache_file):
     except Exception as e:
         print(f"⚠️ Could not delete download plan cache: {e}")
 
-def should_skip_song_standalone(artist, title, channel_name, video_id, video_title, downloads_dir, tracker=None, server_songs=None, server_duplicates_tracking=None):
-    """
-    Standalone function to check if a song should be skipped.
-    Performs four checks in order:
-    1. Already downloaded (tracking) - if tracker provided
-    2. File exists on filesystem
-    3. Already on server - if server data provided
-    4. Previously failed download (bad file) - if tracker provided
-
-    Returns:
-        tuple: (should_skip, reason, total_filtered)
-    """
-    total_filtered = 0
-
-    # Check 1: Already downloaded by this system (if tracker provided)
-    if tracker and tracker.is_song_downloaded(artist, title, channel_name, video_id):
-        return True, "already downloaded", total_filtered
-
-    # Check 2: File already exists on filesystem
-    # Generate the expected filename based on the download mode context
-    safe_title = title
-    invalid_chars = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']
-    for char in invalid_chars:
-        safe_title = safe_title.replace(char, "")
-    safe_title = safe_title.replace("...", "").replace("..", "").replace(".", "").strip()
-
-    # Try different filename patterns that might exist
-    possible_filenames = [
-        f"{artist} - {safe_title}.mp4", # Songlist mode
-        f"{channel_name} - {safe_title}.mp4", # Latest-per-channel mode
-        f"{artist} - {safe_title} (Karaoke Version).mp4" # Channel videos mode
-    ]
-
-    for filename in possible_filenames:
-        if len(filename) > DEFAULT_FILENAME_LENGTH_LIMIT:
-            # Apply length limits if needed
-            safe_artist = artist.replace("'", "").replace('"', "").strip()
-            filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
-
-        output_path = downloads_dir / channel_name / filename
-        if output_path.exists() and output_path.stat().st_size > 0:
-            return True, "file exists", total_filtered
-
-    # Check 3: Already on server (if server data provided)
-    if server_songs is not None and server_duplicates_tracking is not None:
-        from karaoke_downloader.server_manager import check_and_mark_server_duplicate
-        if check_and_mark_server_duplicate(server_songs, server_duplicates_tracking, artist, title, video_title, channel_name):
-            total_filtered += 1
-            return True, "on server", total_filtered
-
-    # Check 4: Previously failed download (bad file) - if tracker provided
-    if tracker and tracker.is_song_failed(artist, title, channel_name, video_id):
-        return True, "previously failed", total_filtered
-
-    return False, None, total_filtered
+# Note: should_skip_song_standalone function has been removed and replaced with SongValidator class
+# Use karaoke_downloader.song_validator.create_song_validator() instead
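The removed function built its candidate filenames inline. As a rough illustration of how that logic could live in a shared helper, here is a minimal sketch. The names `sanitize_title` and `candidate_filenames` are hypothetical and are not taken from `file_utils.py` or `song_validator.py`, whose real APIs are not shown in this diff. Only the character list and the three filename patterns come from the removed code.

```python
from typing import List

# Characters the removed function stripped from titles before building filenames.
INVALID_FILENAME_CHARS = ['?', ':', '*', '"', '<', '>', '|', '/', '\\']


def sanitize_title(title: str) -> str:
    """Hypothetical helper mirroring the inline sanitization of the removed function."""
    safe_title = title
    for char in INVALID_FILENAME_CHARS:
        safe_title = safe_title.replace(char, "")
    # The original chained replacements of "...", ".." and "."; removing every
    # "." in one pass yields the same result.
    return safe_title.replace(".", "").strip()


def candidate_filenames(artist: str, title: str, channel_name: str) -> List[str]:
    """Return the three filename patterns the removed function probed on disk."""
    safe_title = sanitize_title(title)
    return [
        f"{artist} - {safe_title}.mp4",                    # Songlist mode
        f"{channel_name} - {safe_title}.mp4",              # Latest-per-channel mode
        f"{artist} - {safe_title} (Karaoke Version).mp4",  # Channel videos mode
    ]
```

A caller would then check each candidate under `downloads_dir / channel_name`, as the removed code did. The length-limit fallback is left out of this sketch.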
@@ -78,7 +78,7 @@ def build_yt_dlp_command(
         "--ignore-errors",
         "--no-warnings",
         "-o", str(output_path),
-        "-f", config.get("download_settings", {}).get("format", "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"),
+        "-f", config.download_settings.format,
         video_url
     ]
 
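The hunks above move from dictionary lookups (`config.get("download_settings", {}).get("format", ...)`) to attribute access (`config.download_settings.format`). The config class itself is not part of this diff, so the following is only a sketch of one way such an object could be built. `DownloadSettings`, `AppConfig`, `config_from_dict`, and the `"720p"` default are assumptions made for illustration. The format string is the fallback from the removed line.

```python
from dataclasses import dataclass, field

# Fallback format string taken from the removed dict-based lookup.
DEFAULT_FORMAT = "best[height<=720][ext=mp4]/best[height<=720]/best[ext=mp4]/best"


@dataclass
class DownloadSettings:
    """Hypothetical settings holder; field names follow the attributes used in the diff."""
    format: str = DEFAULT_FORMAT
    preferred_resolution: str = "720p"  # assumed default, not from the source


@dataclass
class AppConfig:
    """Hypothetical top-level config exposing attribute access."""
    download_settings: DownloadSettings = field(default_factory=DownloadSettings)
    debug_show_formats: bool = False


def config_from_dict(raw: dict) -> AppConfig:
    """Build an AppConfig from a plain dict, applying defaults for missing keys."""
    settings = raw.get("download_settings", {})
    return AppConfig(
        download_settings=DownloadSettings(
            format=settings.get("format", DEFAULT_FORMAT),
            preferred_resolution=settings.get("preferred_resolution", "720p"),
        ),
        debug_show_formats=bool(raw.get("debug_show_formats", False)),
    )
```

With defaults applied at construction time, call sites such as `build_yt_dlp_command` no longer need to repeat the fallback format string, and the `hasattr` guard on `debug_show_formats` would become unnecessary if the field is always defined.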