# Compare commits

develop...multiplatf: 1 commit (bed46ff2d2)

## .gitignore (vendored, 3 changes)
```diff
@@ -14,6 +14,9 @@ logs/
 *.log
 
 # Tracking and cache files
+karaoke_tracking.json
+karaoke_tracking.json.backup
+songlist_tracking.json
 *.cache
 
 # yt-dlp temporary files
```
## PRD.md (385 changes)
```diff
@@ -1,8 +1,8 @@
 
-# 🎤 Karaoke Video Downloader – PRD (v3.4.4)
+# 🎤 Karaoke Video Downloader – PRD (v3.5)
 
 ## ✅ Overview
-A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows and macOS with automatic platform detection. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
+A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows, macOS, and Linux with automatic platform detection and optimized caching. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
 
 ---
 
```
```diff
@@ -63,7 +63,7 @@ The codebase has been refactored into focused modules with centralized utilities
 ---
 
 ## ⚙️ Platform & Stack
-- **Platform:** Windows, macOS
+- **Platform:** Windows, macOS, Linux
 - **Interface:** Command-line (CLI)
 - **Tech Stack:** Python 3.7+, yt-dlp (platform-specific binary), mutagen (for ID3 tagging)
 
```
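A minimal sketch of the platform detection this stack relies on, mapping `platform.system()` to the per-OS yt-dlp binaries listed elsewhere in this diff; the helper name is illustrative, not the project's actual API:

```python
import platform
from pathlib import Path

# Binary names mirror the downloader/ entries shown in the project tree.
YTDLP_BINARIES = {
    "Windows": "yt-dlp.exe",
    "Darwin": "yt-dlp_macos",
    "Linux": "yt-dlp",
}

def select_ytdlp_binary(base_dir: str = "downloader") -> Path:
    """Pick the yt-dlp binary matching the current operating system."""
    system = platform.system()
    try:
        return Path(base_dir) / YTDLP_BINARIES[system]
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {system}")
```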
```diff
@@ -101,7 +101,6 @@ python download_karaoke.py --clear-cache SingKingKaraoke
 - ✅ Songlist integration: prioritize and track custom songlists
 - ✅ Songlist-only mode: download only songs from the songlist
 - ✅ Songlist focus mode: download only songs from specific playlists by title
-- ✅ Force download mode: bypass all existing file checks and re-download songs regardless of server duplicates or existing files
 - ✅ Global songlist tracking to avoid duplicates across channels
 - ✅ ID3 tagging for artist/title in MP4 files (mutagen)
 - ✅ Real-time progress and detailed logging
```
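The "global songlist tracking to avoid duplicates" feature above implies a duplicate check keyed on artist/title. A sketch under the assumption that `data/songlist_tracking.json` holds a list of `{artist, title}` entries; the real schema and function names may differ:

```python
import json
from pathlib import Path

def normalize_key(artist: str, title: str) -> str:
    # Case- and whitespace-insensitive key, so re-listed videos with
    # slightly different capitalization still count as duplicates.
    return f"{artist.strip().lower()}|{title.strip().lower()}"

def already_downloaded(artist: str, title: str,
                       tracking_file: str = "data/songlist_tracking.json") -> bool:
    """Check a song against the global tracking file before downloading."""
    path = Path(tracking_file)
    if not path.exists():
        return False
    entries = json.loads(path.read_text(encoding="utf-8"))
    seen = {normalize_key(e["artist"], e["title"]) for e in entries}
    return normalize_key(artist, title) in seen
```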
```diff
@@ -123,8 +122,6 @@ python download_karaoke.py --clear-cache SingKingKaraoke
 - ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
 - ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
 - ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
-- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
-- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.
 
 ---
 
```
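The "centralized file operations" bullet above names filename sanitization as a single source of truth. A minimal sketch of what such a helper could look like; the character set and the "Artist - Title" format are assumptions, not the project's actual implementation:

```python
import re

# Characters that are invalid in Windows filenames; stripping them keeps
# generated names safe on Windows, macOS, and Linux alike.
INVALID_CHARS = r'[<>:"/\\|?*]'

def sanitize_filename(artist: str, title: str, ext: str = "mp4") -> str:
    """Build a cross-platform-safe filename from artist/title metadata."""
    stem = f"{artist} - {title}" if artist else title
    stem = re.sub(INVALID_CHARS, "", stem)    # drop forbidden characters
    stem = re.sub(r"\s+", " ", stem).strip()  # collapse runs of whitespace
    return f"{stem}.{ext}"
```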
```diff
@@ -152,34 +149,21 @@ KaroakeVideoDownloader/
 │   ├── check_resolution.py        # Resolution checker utility
 │   ├── resolution_cli.py          # Resolution config CLI
 │   └── tracking_cli.py            # Tracking management CLI
-├── config/                        # Configuration files
-│   └── config.json                # Main configuration file
-├── data/                          # All tracking, cache, and songlist files
+├── data/                          # All config, tracking, cache, and songlist files
+│   ├── config.json
 │   ├── karaoke_tracking.json
 │   ├── songlist_tracking.json
 │   ├── channel_cache.json
-│   ├── channels.json              # Channel configuration with parsing rules
-│   ├── manual_videos.json         # Manual video collection
+│   ├── channels.txt
 │   └── songList.json
-├── utilities/                     # Utility scripts and tools
-│   ├── add_manual_video.py        # Manual video management
-│   ├── build_cache_from_raw.py    # Cache building utility
-│   ├── cleanup_duplicate_files.py # File cleanup utilities
-│   ├── cleanup_recent_tracking.py # Tracking cleanup utilities
-│   ├── deduplicate_songlist_tracking.py # Data deduplication
-│   ├── fix_artist_name_format.py  # Data cleanup utilities
-│   ├── fix_artist_name_format_simple.py
-│   ├── fix_code_quality.py        # Development tools
-│   ├── reset_and_redownload.py    # Maintenance utilities
-│   └── songlist_report.py         # Reporting utilities
 ├── downloads/                     # All video output
 │   └── [ChannelName]/             # Per-channel folders
 ├── logs/                          # Download logs
 ├── downloader/yt-dlp.exe          # yt-dlp binary (Windows)
 ├── downloader/yt-dlp_macos        # yt-dlp binary (macOS)
-├── src/tests/                     # Test scripts
-│   ├── test_macos.py              # macOS setup and functionality tests
-│   └── test_platform.py           # Platform detection tests
+├── downloader/yt-dlp              # yt-dlp binary (Linux)
+├── tests/                         # Diagnostic and test scripts
+│   └── test_installation.py
 ├── download_karaoke.py            # Main entry point (thin wrapper)
 ├── README.md
 ├── PRD.md
```
```diff
@@ -194,8 +178,6 @@ KaroakeVideoDownloader/
 - `--songlist-priority`: Prioritize songlist songs in download queue
 - `--songlist-only`: Download only songs from the songlist
 - `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
-- `--songlist-file <FILE_PATH>`: Custom songlist file path to use with --songlist-focus (default: data/songList.json)
-- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
 - `--songlist-status`: Show songlist download progress
 - `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
 - `--resolution <720p|1080p|...>`: Override resolution
```
```diff
@@ -208,11 +190,7 @@ KaroakeVideoDownloader/
 - `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
 - `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
 - `--parallel`: **Enable parallel downloads for improved speed**
-- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
-- `--manual`: **Download from manual videos collection (data/manual_videos.json)**
-- `--channel-focus <CHANNEL_NAME>`: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
-- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files and songs in songs.json**
-- `--dry-run`: **Build download plan and show what would be downloaded without actually downloading anything**
+- `--workers <N>`: **Number of parallel download workers (1-10, default: 3)**
 
 ---
 
```
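The `--fuzzy-match` and `--fuzzy-threshold` options above describe score-based songlist-to-video matching. A hedged sketch that uses `rapidfuzz` when installed and falls back to the standard library, as the "uses rapidfuzz if available" note implies; the function names are illustrative:

```python
try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        # Word-order-insensitive score in the range 0-100.
        return fuzz.token_sort_ratio(a.lower(), b.lower())
except ImportError:
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        # Standard-library fallback, scaled to the same 0-100 range.
        return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches(song: str, video_title: str, threshold: float = 85) -> bool:
    """True when the songlist entry and video title are similar enough."""
    return similarity(song, video_title) >= threshold
```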
```diff
@@ -223,8 +201,6 @@ KaroakeVideoDownloader/
 - **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
 - **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
 - **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
-- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
-- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via `--manual` parameter.
 
 ## 🔧 Refactoring Improvements (v3.3)
 The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
```
```diff
@@ -278,7 +254,7 @@ The codebase has been comprehensively refactored to improve maintainability and
 
 ### **New Parallel Download System (v3.4)**
 - **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
-- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
+- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
 - **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
 - **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
 - **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
```
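The parallel-download bullets above describe a configurable worker pool with thread-safe progress tracking. A minimal self-contained sketch using `concurrent.futures`; `download_one` is a hypothetical stand-in for the real per-video download call in `parallel_downloader.py`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

def download_one(video_id: str) -> str:
    """Illustrative stand-in: pretend the download succeeded."""
    return video_id

def download_parallel(video_ids, workers: int = 3):
    """Run downloads concurrently; a lock keeps progress updates thread-safe."""
    done, lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, v): v for v in video_ids}
        for fut in as_completed(futures):
            with lock:          # shared progress state needs the lock
                done.append(fut.result())
    return done
```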
````diff
@@ -286,245 +262,16 @@ The codebase has been comprehensively refactored to improve maintainability and
 - **Performance improvements:** Significantly faster downloads for large batches (3-5x speedup with 3-5 workers)
 - **Integrated with all modes:** Works with both songlist-across-channels and latest-per-channel download modes
 
----
-
-## 🚀 Future Enhancements
-- [ ] Web UI for easier management
-- [ ] More advanced song matching (multi-language)
-- [ ] Download scheduling and retry logic
-- [ ] More granular status reporting
-- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
-- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
-- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
-- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
-- [ ] Unit tests for all modules
-- [ ] Integration tests for end-to-end workflows
-- [ ] Plugin system for custom file operations
-- [ ] Advanced configuration UI
-- [ ] Real-time download progress visualization
-
-## 🔧 Recent Bug Fixes & Improvements (v3.4.1)
-### **Enhanced Fuzzy Matching (v3.4.1)**
-- **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
-- **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
-- **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
-- **Fallback handling**: Returns empty artist and full title for unparseable formats
-- **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
-- **Single source of truth**: All modules now import from `fuzzy_matcher.py`
-- **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
-- **Better maintainability**: Changes to parsing logic only need to be made in one place
-
-### **Fixed Import Conflicts**
-- **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
-- **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency
-
-### **Enhanced --limit Parameter**
-- **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
-- **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists
-
-### **Benefits of Recent Improvements**
-- **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
-- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
-- **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
-- **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
-- **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable
-
-## 🔧 Recent Bug Fixes & Improvements (v3.4.2)
-### **Duplicate File Prevention & Filename Consistency**
-- **Enhanced file existence checking**: `check_file_exists_with_patterns()` now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
-- **Automatic duplicate prevention**: Download pipeline skips downloads when files already exist (including duplicates)
-- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files with suffixes
-- **Cleanup utility**: `data/cleanup_duplicate_files.py` provides interactive cleanup of existing duplicate files
-- **Filename vs ID3 tag consistency**: Removed "(Karaoke Version)" suffix from ID3 tags to match filenames exactly
-- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction logic
-
-### **Benefits of Duplicate Prevention**
-- **No more duplicate files**: Eliminates `(2)`, `(3)` suffix files that waste disk space
-- **Consistent metadata**: Filename and ID3 tag use identical artist/title format
-- **Efficient disk usage**: Prevents unnecessary downloads of existing files
-- **Clear file identification**: Consistent naming across all file operations
-
-## 🛠️ Maintenance
-
-### **Regular Cleanup**
-- Run the cleanup utility periodically to remove any duplicate files
-- Monitor downloads for any new duplicate creation (should be rare with fixes)
-
-### **Configuration**
-- Keep `"nooverwrites": false` in `data/config.json`
-- This prevents yt-dlp from creating duplicate files
-
-### **Monitoring**
-- Check logs for "⏭️ Skipping download - file already exists" messages
-- These indicate the duplicate prevention is working correctly
-
-## 🔧 Recent Bug Fixes & Improvements (v3.4.3)
-### **Manual Video Collection System**
-- **New `--manual` parameter**: Simple access to manual video collection via `python download_karaoke.py --manual --limit 5`
-- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
-- **Helper script**: `add_manual_video.py` provides easy management of manual video entries
-- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
-- **No yt-dlp dependency**: Manual videos bypass YouTube API calls for video listing, using static data instead
-
-### **Channel-Specific Parsing Rules**
-- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
-- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
-- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
-- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
-- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
-- **Backward compatibility**: Still supports legacy `data/channels.txt` format
-
-### **Benefits of New Features**
-- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
-- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
-- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
-- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
-- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features
-
-## 📚 Documentation Standards
-
-### **Documentation Location**
-- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
-- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
-- **Use the existing sections in PRD.md and README.md to track all project evolution**
-
-### **Where to Document Changes**
-- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
-- **README.md**: User-facing features, usage instructions, and high-level improvements
-- **CHANGELOG.md**: Version-specific release notes and change summaries
-
-### **Documentation Requirements**
-- **All new features must be documented in both PRD.md and README.md**
-- **All refactoring efforts must be documented in the appropriate sections**
-- **All bug fixes must be documented with technical details**
-- **Version numbers and dates should be clearly marked**
-- **Benefits and improvements should be explicitly stated**
-
-### **Maintenance Responsibility**
-- **Keep PRD.md and README.md synchronized with code changes**
-- **Update documentation immediately when implementing new features**
-- **Remove outdated information and consolidate related changes**
-- **Ensure all CLI options and features are documented in both files**
-
-## 🔧 Recent Bug Fixes & Improvements (v3.4.4)
-### **All Videos Download Mode**
-- **New `--all-videos` parameter**: Download all videos from a channel, not just songlist matches
-- **Smart MP3/MP4 detection**: Automatically detects if you have MP3 versions in songs.json and downloads MP4 video versions
-- **Existing file skipping**: Skips videos that already exist on the filesystem
-- **Progress tracking**: Shows clear progress with "Downloading X/Y videos" format
-- **Parallel processing support**: Works with `--parallel --workers N` for faster downloads
-- **Channel focus integration**: Works with `--channel-focus` to target specific channels
-- **Limit support**: Works with `--limit N` to control download batch size
-
-### **Smart Songlist Integration**
-- **MP4 version detection**: Checks if MP4 version already exists in songs.json before downloading
-- **MP3 upgrade path**: Downloads MP4 video versions when only MP3 versions exist in songlist
-- **Duplicate prevention**: Skips downloads when MP4 versions already exist
-- **Efficient filtering**: Only processes videos that need to be downloaded
-
-### **Benefits of All Videos Mode**
-- **Complete channel downloads**: Download entire channels without songlist restrictions
-- **Automatic format upgrading**: Upgrade MP3 collections to MP4 video versions
-- **Efficient processing**: Only downloads videos that don't already exist
-- **Flexible control**: Use with limits, parallel processing, and channel targeting
-- **Clear progress feedback**: Real-time progress tracking for large downloads
-
-## 🔧 Recent Bug Fixes & Improvements (v3.4.5)
-### **Unified Download Workflow Architecture**
-- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
-- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
-- **Centralized download logic**: Single `execute_unified_download_workflow()` method handles all download execution
-- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
-- **Unified cache management**: Consistent progress tracking and resume functionality across all modes
-
-### **Architecture Pattern for New Download Modes**
-When adding new download modes in the future, follow this pattern to ensure consistency:
-
-#### **1. Download Plan Building (Mode-Specific)**
-Each download mode should build a download plan (list of videos to download) with this structure:
-```python
-download_plan = [
-    {
-        "video_id": "video_id",
-        "artist": "artist_name",
-        "title": "song_title",
-        "filename": "sanitized_filename.mp4",
-        "channel_name": "channel_name",
-        "video_title": "original_video_title",
-        "force_download": False
-    }
-]
-```
-
-#### **2. Unified Execution (Shared)**
-All modes should use the unified execution workflow:
-```python
-downloaded_count, success = self.execute_unified_download_workflow(
-    download_plan=download_plan,
-    cache_file=cache_file,    # Optional, for progress tracking
-    limit=limit,              # Optional, for limiting downloads
-    show_progress=True,       # Optional, for progress display
-)
-```
-
-#### **3. Execution Method Selection (Automatic)**
-The unified workflow automatically chooses execution method based on settings:
-- **Sequential**: Uses `DownloadPipeline` for single-threaded downloads
-- **Parallel**: Uses `ParallelDownloader` when `--parallel` is enabled
-
-#### **4. Required Implementation Pattern**
-```python
-def download_new_mode(self, ...):
-    """New download mode implementation."""
-
-    # 1. Build download plan (mode-specific logic)
-    download_plan = []
-    for video in videos_to_download:
-        download_plan.append({
-            "video_id": video["id"],
-            "artist": artist,
-            "title": title,
-            "filename": filename,
-            "channel_name": channel_name,
-            "video_title": video["title"],
-            "force_download": force_download
-        })
-
-    # 2. Create cache file (optional, for progress tracking)
-    cache_file = get_download_plan_cache_file("new_mode", **plan_kwargs)
-    save_plan_cache(cache_file, download_plan, [])
-
-    # 3. Use unified execution workflow
-    downloaded_count, success = self.execute_unified_download_workflow(
-        download_plan=download_plan,
-        cache_file=cache_file,
-        limit=limit,
-        show_progress=True,
-    )
-
-    return success
-```
-
-### **Benefits of Unified Architecture**
-- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
-- **Maintainability**: Changes to download execution only need to be made in one place
-- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
-- **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
-- **Testing**: Easier to test since all modes use the same execution logic
-
-### **What Was Fixed**
-- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
-- **Missing Method**: Added missing `download_latest_per_channel()` method that was referenced in CLI but not implemented
-- **Code Duplication**: Eliminated duplicate download execution logic across different modes
-- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management
-
-### **Future Development Guidelines**
-1. **NEVER implement custom download execution logic** in new download modes
-2. **ALWAYS use `execute_unified_download_workflow()`** for download execution
-3. **Focus on download plan building** - that's where mode-specific logic belongs
-4. **Use the standard download plan structure** for consistency
-5. **Implement cache file handling** for progress tracking and resume functionality
-6. **Test with both sequential and parallel modes** to ensure compatibility
-
+### **Cross-Platform Support (v3.5)**
+- **Platform detection:** Automatic detection of Windows, macOS, and Linux systems
+- **Flexible yt-dlp integration:** Supports both binary files and pip-installed yt-dlp modules
+- **Platform-specific configuration:** Automatic selection of appropriate yt-dlp binary/command for each platform
+- **Setup automation:** `setup_platform.py` script for easy platform-specific setup
+- **Command parsing:** Intelligent parsing of yt-dlp commands (file paths vs. module commands)
+- **Enhanced documentation:** Platform-specific setup instructions and troubleshooting
+- **Backward compatibility:** Maintains full compatibility with existing Windows installations
+- **FFmpeg integration:** Automatic FFmpeg installation and configuration for optimal video processing
+- **Optimized caching:** Enhanced channel video caching with format compatibility and instant video list loading
 
 ---
 
````
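The "flexible yt-dlp integration" added in v3.5 describes choosing between a bundled binary and the pip-installed module. A hedged sketch of that selection logic; the function name and exact fallback order are assumptions, not the project's actual code:

```python
import shutil
import sys
from pathlib import Path
from typing import List, Optional

def build_ytdlp_command(binary_path: Optional[str]) -> List[str]:
    """Prefer a bundled platform binary, then a PATH install, then the module."""
    if binary_path and Path(binary_path).is_file():
        return [binary_path]                 # e.g. downloader/yt-dlp_macos
    if shutil.which("yt-dlp"):
        return ["yt-dlp"]                    # yt-dlp available on PATH
    return [sys.executable, "-m", "yt_dlp"]  # pip-installed module fallback
```

The returned list is ready to pass to `subprocess.run`, with download arguments appended after it.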
@ -534,97 +281,9 @@ def download_new_mode(self, ...):
|
|||||||
- [ ] Download scheduling and retry logic
|
- [ ] Download scheduling and retry logic
|
||||||
- [ ] More granular status reporting
|
- [ ] More granular status reporting
|
||||||
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
|
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
|
||||||
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
|
- [x] **Cross-platform support (Windows, macOS, Linux)** ✅ **COMPLETED**
|
||||||
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
|
|
||||||
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
|
|
||||||
- [ ] Unit tests for all modules
|
- [ ] Unit tests for all modules
|
||||||
- [ ] Integration tests for end-to-end workflows
|
- [ ] Integration tests for end-to-end workflows
|
||||||
- [ ] Plugin system for custom file operations
|
- [ ] Plugin system for custom file operations
|
||||||
- [ ] Advanced configuration UI
|
- [ ] Advanced configuration UI
|
||||||
- [ ] Real-time download progress visualization
|
- [ ] Real-time download progress visualization
|
||||||
|
|
||||||
## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **macOS Support with Automatic Platform Detection**

- **Cross-platform compatibility**: Added support for macOS alongside Windows
- **Automatic platform detection**: Detects the operating system and selects the appropriate yt-dlp binary
- **Flexible yt-dlp integration**: Supports both binary files (`yt-dlp_macos`) and pip installation (`python3 -m yt_dlp`)
- **Setup automation**: `setup_macos.py` script for easy macOS setup with FFmpeg and yt-dlp installation
- **Command parsing**: Intelligent parsing of yt-dlp commands (file paths vs. module commands)
- **Enhanced validation**: Platform-specific error messages and validation in the CLI
- **Backward compatibility**: Maintains full compatibility with existing Windows installations
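The automatic platform detection described above could be sketched roughly as follows; this is a minimal illustration, assuming a `downloader/` directory holding the platform-specific binaries, not the project's actual detection module:

```python
import platform
import shutil
import sys
from pathlib import Path

def resolve_ytdlp_command(downloader_dir="downloader"):
    """Pick a yt-dlp invocation for the current OS.

    Prefers a platform-specific binary under downloader/, then a yt-dlp
    found on PATH, and finally falls back to the pip-installed module.
    """
    binary_names = {
        "Windows": "yt-dlp.exe",
        "Darwin": "yt-dlp_macos",
        "Linux": "yt-dlp",
    }
    name = binary_names.get(platform.system())
    if name:
        candidate = Path(downloader_dir) / name
        if candidate.is_file():
            return [str(candidate)]
    if shutil.which("yt-dlp"):
        return ["yt-dlp"]
    # pip-installed fallback: run yt-dlp as a module
    return [sys.executable, "-m", "yt_dlp"]
```

The returned list can be passed directly to `subprocess.run()`, which is why the function yields argument lists rather than shell strings.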
### **Benefits of macOS Support**

- **Native macOS experience**: No need for Windows compatibility layers or virtualization
- **Automatic setup**: A simple setup script handles all dependencies
- **Flexible installation**: Choose between binary download or pip installation
- **Consistent functionality**: All features work identically on both platforms
- **Easy maintenance**: Platform detection handles configuration automatically
### **Setup Instructions**

```bash
# Automatic setup (recommended)
python3 setup_macos.py

# Test installation
python3 src/tests/test_macos.py

# Manual setup options
# 1. Install yt-dlp via pip: pip3 install yt-dlp
# 2. Download binary: curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
# 3. Install FFmpeg: brew install ffmpeg
```
## 🔧 Recent Bug Fixes & Improvements (v3.4.7)

### **Configurable Data Directory Path**

- **Centralized Data Path Management**: New `data_path_manager.py` module provides unified data directory path management
- **Configurable Location**: Data directory path can be set in `config/config.json` under `folder_structure.data_dir`
- **Backward Compatibility**: Defaults to the "data" directory if not configured
- **Cross-Project Integration**: Enables the karaoke downloader to be used as a component in other projects with different data directory structures
- **Updated All Modules**: All modules now use the data path manager instead of hardcoded "data/" paths
- **Utility Functions**: Provides `get_data_path()`, `get_data_dir()`, and `get_data_path_manager()` functions for easy access
- **Fixed Circular Dependency**: Moved `config.json` from `data/` to the `config/` directory to resolve the chicken-and-egg problem
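In outline, a data path manager like the one described above might look like this; a sketch under the stated config layout, not the actual `data_path_manager.py`:

```python
import json
from pathlib import Path

class DataPathManager:
    """Resolve file paths under a configurable data directory.

    Reads folder_structure.data_dir from the config file and falls
    back to "data" when the key (or the file itself) is missing.
    """

    def __init__(self, config_file="config/config.json"):
        self.data_dir = Path("data")  # backward-compatible default
        try:
            with open(config_file, encoding="utf-8") as f:
                config = json.load(f)
            configured = config.get("folder_structure", {}).get("data_dir")
            if configured:
                self.data_dir = Path(configured)
        except (OSError, json.JSONDecodeError):
            pass  # keep the default when the config is absent or malformed

    def get_data_path(self, filename):
        """Return the full path of a file inside the data directory."""
        return self.data_dir / filename
```

With no config present, `DataPathManager().get_data_path("songList.json")` resolves to `data/songList.json`, which preserves the old hardcoded behavior.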
### **Benefits of Configurable Data Directory**

- **Flexible Deployment**: Can be integrated into other projects with different directory structures
- **Centralized Configuration**: A single point of configuration for all data file paths
- **Maintainable Code**: Eliminates hardcoded paths throughout the codebase
- **Easy Testing**: Can use temporary directories for testing without affecting production data
- **Future-Proof**: Makes it easier to change the data directory structure in the future
### **Circular Dependency Solution**

The original implementation had a circular dependency problem:

- **Problem**: `config.json` was located in the `data/` directory
- **Issue**: To read the config file, we needed to know where the data directory was
- **Conflict**: But the data directory location was specified in the config file
- **Solution**: Moved `config.json` to the `config/` directory as a fixed location
- **Result**: The config file is always accessible in a dedicated config directory, and the data directory can be configured within it
- **Backward Compatibility**: The system still works with config files in custom data directories when explicitly specified
## 🔧 Recent Bug Fixes & Improvements (v3.4.6)

### **Dry Run Mode**

- **New `--dry-run` parameter**: Builds the download plan and shows what would be downloaded without actually downloading anything
- **Plan preview**: Shows the total number of videos in the plan and a preview of the first 5 videos
- **Safe testing**: Test download configurations without consuming bandwidth or disk space
- **All-mode support**: Works with all download modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel)
- **Progress simulation**: Shows what the download process would look like without executing it
### **Benefits of Dry Run Mode**

- **Safe testing**: Test complex download configurations without downloading anything
- **Plan validation**: Verify that the download plan contains the expected videos
- **Configuration debugging**: Troubleshoot download settings before committing to downloads
- **Resource conservation**: Save bandwidth and disk space during testing
- **User education**: Help users understand what the tool will do before running it
### **Example Usage**

```bash
# Test songlist download plan
python download_karaoke.py --songlist-only --limit 5 --dry-run

# Test channel download plan
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 10 --dry-run

# Test with fuzzy matching
python download_karaoke.py --songlist-only --fuzzy-match --limit 3 --dry-run
```
374 README.md

@@ -1,6 +1,6 @@
# 🎤 Karaoke Video Downloader

A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows, macOS, and Linux with automatic platform detection, optimized caching, and FFmpeg integration.
## ✨ Features

- 🎵 **Channel & Playlist Downloads**: Download all videos from a YouTube channel or playlist

@@ -13,7 +13,7 @@ A Python-based cross-platform CLI tool to download karaoke videos from YouTube c

- 📈 **Real-Time Progress**: Detailed console and log output
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear the channel cache via CLI
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match; requires rapidfuzz for best results)
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 **Default Channel File**: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)

@@ -21,20 +21,13 @@ A Python-based cross-platform CLI tool to download karaoke videos from YouTube c

- ⚡ **Optimized Scanning**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ **Server Duplicates Tracking**: Automatically checks against the local songs.json file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
- ⚡ **Parallel Downloads**: Enable concurrent downloads with `--parallel --workers N` for significantly faster batch downloads (3-5x speedup)
- 🌐 **Cross-Platform Support**: Automatic platform detection and yt-dlp integration for Windows, macOS, and Linux
- 🚀 **Optimized Caching**: Enhanced channel video caching with instant video list loading
- 🎬 **FFmpeg Integration**: Automatic FFmpeg installation and configuration for optimal video processing
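The Fast Mode early-exit scan listed above can be sketched as a pair of nested loops that stop as soon as the limit is hit; `find_match` and `download` here are hypothetical stand-ins for the project's real matching and download functions:

```python
def fast_mode_scan(channels, songs, limit, find_match, download):
    """Scan channels and songs in order, download on match, stop at the limit.

    find_match(channel, song) returns a video or None; download(video)
    returns True on success. Both are injected for illustration.
    """
    downloaded = 0
    for channel in channels:
        for song in songs:
            if downloaded >= limit:
                return downloaded  # early exit: limit reached
            video = find_match(channel, song)
            if video is not None and download(video):
                downloaded += 1
    return downloaded
```

Because the check happens before each match attempt, no further channels or songs are scanned once the limit is satisfied.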
## 🏗️ Architecture
The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:

### **Configurable Data Directory (v3.4.7)**
- **Centralized Data Path Management**: `data_path_manager.py` provides unified data directory path management
- **Configurable Location**: Data directory path can be set in `config/config.json` under `folder_structure.data_dir`
- **Backward Compatibility**: Defaults to the "data" directory if not configured
- **Cross-Project Integration**: Enables the karaoke downloader to be used as a component in other projects with different data directory structures

### Core Modules:
- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration

@@ -56,191 +49,90 @@ The codebase has been comprehensively refactored into a modular architecture wit

- **`tracking_cli.py`**: Tracking management CLI
### New Utility Modules (v3.3):
- **`parallel_downloader.py`**: Parallel download management with thread-safe operations
  - `ParallelDownloader` class: Manages concurrent downloads with configurable workers
  - `DownloadTask` and `DownloadResult` dataclasses: Structured task and result management
  - Thread-safe progress tracking and error handling
  - Automatic retry mechanism for failed downloads
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation
- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling
- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates
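As an illustration of the `file_utils.py` helpers listed above, a `sanitize_filename()` might look roughly like this; a sketch of the idea, not the module's actual implementation:

```python
import re

def sanitize_filename(artist, title, max_length=150):
    """Build a filesystem-safe "Artist - Title" filename stem.

    Strips characters that are invalid on Windows/macOS/Linux paths
    and collapses runs of whitespace left behind by the stripping.
    """
    stem = f"{artist} - {title}"
    stem = re.sub(r'[<>:"/\\|?*]', "", stem)   # drop invalid path characters
    stem = re.sub(r"\s+", " ", stem).strip()   # collapse whitespace
    return stem[:max_length]
```

Keeping filename generation in one helper is what lets the filename and the ID3 tag stay in the identical artist/title format mentioned elsewhere in this document.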
### Benefits:
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Consistency**: Standardized error messages and processing pipelines
- **Maintainability**: Changes isolated to specific modules
- **Testability**: Modular components can be tested independently
- **Type Safety**: Comprehensive type hints across all new modules
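The dataclass-based configuration described above might be structured roughly as below; the field names are invented for illustration, and the real `config_manager.py` defines its own:

```python
import json
from dataclasses import dataclass, field

@dataclass
class FolderStructure:
    data_dir: str = "data"
    downloads_dir: str = "downloads"

@dataclass
class DownloadSettings:
    resolution: str = "1080"
    parallel_workers: int = 1

@dataclass
class ConfigManager:
    """Load config.json once and expose type-safe sections."""
    config_file: str = "config/config.json"
    folders: FolderStructure = field(default_factory=FolderStructure)
    downloads: DownloadSettings = field(default_factory=DownloadSettings)

    def load(self):
        try:
            with open(self.config_file, encoding="utf-8") as f:
                raw = json.load(f)
        except (OSError, json.JSONDecodeError):
            return self  # merging with defaults: keep them on failure
        self.folders = FolderStructure(**raw.get("folder_structure", {}))
        self.downloads = DownloadSettings(**raw.get("download_settings", {}))
        return self
```

The dataclass defaults double as the "merge with defaults" behavior: any section missing from the JSON simply keeps its typed default values.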
## 🔧 Development Guidelines

### **Adding New Download Modes**
When adding new download modes, follow the unified workflow pattern to ensure consistency:

#### **1. Build Download Plan (Mode-Specific)**
```python
def download_new_mode(self, ...):
    # Build download plan with standard structure
    download_plan = []
    for video in videos_to_download:
        download_plan.append({
            "video_id": video["id"],
            "artist": artist,
            "title": title,
            "filename": filename,
            "channel_name": channel_name,
            "video_title": video["title"],
            "force_download": force_download
        })

    # Use unified execution workflow
    downloaded_count, success = self.execute_unified_download_workflow(
        download_plan=download_plan,
        cache_file=cache_file,
        limit=limit,
        show_progress=True,
    )

    return success
```

#### **2. Key Principles**
- **NEVER implement custom download execution logic** - always use `execute_unified_download_workflow()`
- **Focus on download plan building** - that's where mode-specific logic belongs
- **Use the standard download plan structure** for consistency
- **Implement cache file handling** for progress tracking and resume functionality
- **Test with both sequential and parallel modes** to ensure compatibility

#### **3. Benefits of Unified Architecture**
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- **Automatic Features**: New modes automatically get parallel downloads, progress tracking, and cache management
- **Maintainability**: Changes to download execution only need to be made in one place
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
## 🔧 Recent Improvements (v3.4.1)

### **Enhanced Fuzzy Matching**
- **Improved title parsing**: Enhanced the `extract_artist_title` function to handle multiple video title formats
- **Better matching accuracy**: Can now parse titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
- **Consistent parsing**: All modules now use the same parsing logic from `fuzzy_matcher.py`
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
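The improved parsing can be illustrated with a simplified sketch; the real `extract_artist_title` in `fuzzy_matcher.py` handles more formats than the two shown here:

```python
import re

def extract_artist_title(video_title):
    """Best-effort split of a karaoke video title into (artist, title).

    Handles two common formats:
      "Artist - Title (Karaoke Version)"
      "Title Karaoke | Artist Karaoke Version"
    Returns (None, cleaned_title) when no artist can be recovered.
    """
    # Format: "Title Karaoke | Artist Karaoke Version"
    m = re.match(
        r"^(?P<title>.+?)\s+Karaoke\s*\|\s*(?P<artist>.+?)\s+Karaoke\s+Version\s*$",
        video_title,
        flags=re.IGNORECASE,
    )
    if m:
        return m.group("artist").strip(), m.group("title").strip()
    # Format: "Artist - Title (...)" with a karaoke suffix to strip
    if " - " in video_title:
        artist, title = video_title.split(" - ", 1)
        title = re.sub(r"\s*\(.*?karaoke.*?\)\s*", " ",
                       title, flags=re.IGNORECASE).strip()
        return artist.strip(), title
    return None, video_title.strip()
```

Centralizing this logic in one function is what keeps filename generation, ID3 tagging, and fuzzy matching in agreement.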
### **Fixed Import Conflicts**
- **Resolved import conflicts**: Updated modules to use the enhanced `extract_artist_title` from `fuzzy_matcher.py`
- **Consistent behavior**: All parts of the system use the same parsing logic
- **Cleaner codebase**: Eliminated duplicate code and import conflicts

### **Fixed --limit Parameter**
- **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
- **Improved performance**: When using `--limit N`, only the first N songs are scanned, significantly reducing processing time
- **Accurate logging**: Logging messages now show the correct counts for songs that will actually be processed when using `--limit`

### **Code Quality Improvements**
- **Eliminated duplicate functions**: Removed duplicate `extract_artist_title` implementations
- **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
- **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`
## 🔧 Recent Improvements (v3.4.5)

### **Unified Download Workflow Architecture**
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
- **Centralized download logic**: A single `execute_unified_download_workflow()` method handles all download execution
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes

### **What Was Fixed**
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
- **Missing Method**: Added the missing `download_latest_per_channel()` method that was referenced in the CLI but not implemented
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management

### **Benefits**
- ✅ **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- ✅ **Maintainability**: Changes to download execution only need to be made in one place
- ✅ **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
- ✅ **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
- ✅ **Testing**: Easier to test since all modes use the same execution logic
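In outline, the unified workflow runs one download plan through a single pipeline and switches only the execution strategy; this is a simplified sketch with a stand-in `download_one` callable, whereas the real method also manages cache files and limits:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_download_plan(download_plan, download_one, parallel=False, workers=4):
    """Run every entry of a download plan through the same pipeline.

    download_one(entry) -> bool stands in for the per-video download
    step; sequential and parallel modes differ only in how it is invoked.
    """
    if not parallel:
        results = [download_one(entry) for entry in download_plan]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(download_one, download_plan))
    downloaded = sum(1 for ok in results if ok)
    return downloaded, downloaded == len(download_plan)
```

Because the plan structure is identical for every mode, any new mode that builds a valid plan gets both execution strategies without extra code.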
## 🛡️ Duplicate File Prevention & Filename Consistency (v3.4.2)

### **Duplicate File Prevention**
- **Enhanced file existence checking**: Now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
- **Automatic duplicate prevention**: Skips downloads when files already exist (including duplicates)
- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files
- **Cleanup utility**: `data/cleanup_duplicate_files.py` helps identify and remove existing duplicate files

### **Filename vs ID3 Tag Consistency**
- **Consistent metadata**: Filename and ID3 tag now use an identical artist/title format
- **Removed extra suffixes**: No more "(Karaoke Version)" in ID3 tags that don't match filenames
- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction

### **Benefits**
- ✅ **No more duplicate files** with `(2)`, `(3)` suffixes
- ✅ **Consistent metadata** between filename and ID3 tags
- ✅ **Efficient disk usage** by preventing unnecessary downloads
- ✅ **Clear file identification** with consistent naming
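Detecting the `(2)`, `(3)` style duplicates described above can be sketched like this; an illustration of the idea, not the cleanup utility itself:

```python
import re
from pathlib import Path

def find_duplicate_files(directory):
    """Group files like "Song (2).mp4" with their base "Song.mp4".

    Returns the duplicate paths whose base file also exists, which are
    safe candidates for deletion.
    """
    suffix_re = re.compile(r"^(?P<base>.+?) \((?P<n>\d+)\)$")
    duplicates = []
    for path in Path(directory).glob("*.mp4"):
        m = suffix_re.match(path.stem)
        if m and path.with_name(m.group("base") + path.suffix).exists():
            duplicates.append(path)
    return duplicates
```

Only requiring that the base file exists keeps the check conservative: a `(2)` file whose original is missing is left alone rather than deleted.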
### **Clean Up Existing Duplicates**
```bash
# Run the cleanup utility to find and remove existing duplicates
python data/cleanup_duplicate_files.py

# Choose option 1 for dry run (recommended first)
# Choose option 2 to actually delete duplicates
```
## 📋 Requirements
- **Windows 10/11, macOS 10.14+, or Linux**
- **Python 3.7+**
- **yt-dlp binary** (platform-specific, see setup instructions below)
- **mutagen** (for ID3 tagging, optional)
- **ffmpeg/ffprobe** (for video validation, optional but recommended)
- **rapidfuzz** (for fuzzy matching, optional, falls back to difflib)
## 🖥️ Platform Setup

### Automatic Setup (Recommended)
Run the platform setup script to automatically set up yt-dlp for your system:

```bash
python setup_platform.py
```

This script will:
- Detect your platform (Windows, macOS, or Linux)
- Offer two installation options:
  1. **Download binary file** (recommended for most users)
  2. **Install via pip** (alternative method)
- Make binaries executable (on Unix-like systems)
- Install FFmpeg (for optimal video processing)
- Test the installation

### Manual Setup
If you prefer to set up manually:

#### Option 1: Download Binary Files
1. **Windows**: Download `yt-dlp.exe` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp.exe)
2. **macOS**: Download `yt-dlp_macos` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos)
3. **Linux**: Download `yt-dlp` from [yt-dlp releases](https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp)

Place the downloaded file in the `downloader/` directory and make it executable on Unix-like systems:
```bash
chmod +x downloader/yt-dlp_macos  # macOS
chmod +x downloader/yt-dlp        # Linux
```

#### Option 2: Install via pip
```bash
pip install yt-dlp
```

The tool will automatically detect and use the pip-installed version on macOS.

**Note**: FFmpeg is also required for optimal video processing. The setup script will attempt to install it automatically, or you can install it manually:
- **macOS**: `brew install ffmpeg`
- **Linux**: `sudo apt install ffmpeg` (Ubuntu/Debian) or `sudo yum install ffmpeg` (CentOS/RHEL)
- **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
## 🚀 Quick Start

@@ -251,21 +143,6 @@ python3 src/tests/test_macos.py

```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
```

### Download ALL Videos from a Channel (Not Just Songlist Matches)
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
```

### Download ALL Videos with Parallel Processing
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
```

### Download ALL Videos with Limit
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100
```

### Download Only Songlist Songs (Fast Mode)
```bash
python download_karaoke.py --songlist-only --limit 5
```

@@ -273,7 +150,7 @@ python download_karaoke.py --songlist-only --limit 5

### Download with Parallel Processing
```bash
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
```

### Focus on Specific Playlists by Title

@@ -281,31 +158,11 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10

```bash
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"
```

### Focus on Specific Playlists from Custom File
```bash
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json"
```

### Force Download from Channels (Bypass All Existing File Checks)
```bash
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force
```

### Download with Fuzzy Matching
```bash
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
```

### Test Download Plan (Dry Run)
```bash
python download_karaoke.py --songlist-only --limit 5 --dry-run
```

### Test Channel Download Plan (Dry Run)
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 10 --dry-run
```

### Download Latest N Videos Per Channel
```bash
python download_karaoke.py --latest-per-channel --limit 5
```
@@ -410,33 +267,23 @@ KaroakeVideoDownloader/
 │   ├── check_resolution.py              # Resolution checker utility
 │   ├── resolution_cli.py                # Resolution config CLI
 │   └── tracking_cli.py                  # Tracking management CLI
-├── config/                              # Configuration files
-│   └── config.json                      # Main configuration file
-├── data/                                # All tracking, cache, and songlist files
+├── data/                                # All config, tracking, cache, and songlist files
+│   ├── config.json
 │   ├── karaoke_tracking.json
 │   ├── songlist_tracking.json
 │   ├── channel_cache.json
-│   ├── channels.json                    # Channel configuration with parsing rules
+│   ├── channels.txt
 │   └── songList.json
-├── utilities/                           # Utility scripts and tools
-│   ├── add_manual_video.py              # Manual video management
-│   ├── build_cache_from_raw.py          # Cache building utility
-│   ├── cleanup_duplicate_files.py       # File cleanup utilities
-│   ├── cleanup_recent_tracking.py       # Tracking cleanup utilities
-│   ├── deduplicate_songlist_tracking.py # Data deduplication
-│   ├── fix_artist_name_format.py        # Data cleanup utilities
-│   ├── fix_artist_name_format_simple.py
-│   ├── fix_code_quality.py              # Development tools
-│   ├── reset_and_redownload.py          # Maintenance utilities
-│   └── songlist_report.py               # Reporting utilities
 ├── downloads/                           # All video output
 │   └── [ChannelName]/                   # Per-channel folders
 ├── logs/                                # Download logs
 ├── downloader/yt-dlp.exe                # yt-dlp binary (Windows)
 ├── downloader/yt-dlp_macos              # yt-dlp binary (macOS)
-├── src/tests/                           # Test scripts
-│   ├── test_macos.py                    # macOS setup and functionality tests
-│   └── test_platform.py                 # Platform detection tests
+├── downloader/yt-dlp                    # yt-dlp binary (Linux)
+├── setup_platform.py                    # Platform setup script
+├── test_platform.py                     # Platform test script
+├── tests/                               # Diagnostic and test scripts
+│   └── test_installation.py
 ├── download_karaoke.py                  # Main entry point (thin wrapper)
 ├── README.md
 ├── PRD.md
@@ -453,7 +300,6 @@ KaroakeVideoDownloader/
 - `--songlist-priority`: Prioritize songlist songs in download queue
 - `--songlist-only`: Download only songs from the songlist
 - `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
-- `--songlist-file <FILE_PATH>`: Custom songlist file path to use with --songlist-focus (default: data/songList.json)
 - `--songlist-status`: Show songlist download progress
 - `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
 - `--resolution <720p|1080p|...>`: Override resolution
@@ -465,14 +311,8 @@ KaroakeVideoDownloader/
 - `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
 - `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
 - `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
-- `--parallel`: Enable parallel downloads for improved speed (defaults to 3 workers)
-- `--workers <N>`: Number of parallel download workers (1-10, default: 3, only used with --parallel)
+- `--parallel`: Enable parallel downloads for improved speed
+- `--workers <N>`: Number of parallel download workers (1-10, default: 3)
-- `--generate-songlist <DIR1> <DIR2>...`: **Generate song list from MP4 files with ID3 tags in specified directories**
-- `--no-append-songlist`: **Create a new song list instead of appending when using --generate-songlist**
-- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
-- `--channel-focus <CHANNEL_NAME>`: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
-- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files**
-- `--dry-run`: **Build download plan and show what would be downloaded without actually downloading anything**
 
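The `--fuzzy-match`/`--fuzzy-threshold` flags above compare songlist entries against video titles on a 0-100 scale. A minimal sketch of that scoring, assuming the documented "rapidfuzz if available" behavior and using a stdlib `difflib` stand-in otherwise (the function names are illustrative, not the project's actual code):

```python
import difflib

try:
    # Preferred scorer when rapidfuzz is installed, as the flag description notes.
    from rapidfuzz import fuzz

    def similarity(a, b):
        return fuzz.ratio(a.lower(), b.lower())
except ImportError:
    # Stdlib stand-in: SequenceMatcher's ratio scaled to the same 0-100 range.
    def similarity(a, b):
        return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_songlist_match(song, video_title, threshold=85):
    """Accept a video when its title scores at or above --fuzzy-threshold."""
    return similarity(song, video_title) >= threshold
```

Lowering the threshold (e.g., `--fuzzy-threshold 80`) accepts looser matches such as titles with "(Karaoke Version)" suffixes; raising it demands near-exact titles.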
 ## 📝 Example Usage
 
@@ -483,61 +323,30 @@ KaroakeVideoDownloader/
 python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
 
 # Parallel downloads for faster processing
-python download_karaoke.py --parallel --songlist-only --limit 10
+python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
 
 # Latest videos per channel with parallel downloads
-python download_karaoke.py --parallel --latest-per-channel --limit 5
+python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
 
 # Traditional full scan (no limit)
 python download_karaoke.py --songlist-only
 
-# Focused fuzzy matching (target specific playlists with flexible matching)
-python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --fuzzy-match --fuzzy-threshold 80 --limit 10
-
-# Focus on specific playlists from a custom file
-python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --limit 10
-
-# Force download with fuzzy matching (bypass all existing file checks)
-python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --fuzzy-match --fuzzy-threshold 80 --limit 10
-
 # Channel-specific operations
 python download_karaoke.py --reset-channel SingKingKaraoke
 python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
 python download_karaoke.py --clear-cache all
 python download_karaoke.py --clear-server-duplicates
 
-# Download ALL videos from a specific channel
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100
-
-# Song list generation from MP4 files
-python download_karaoke.py --generate-songlist /path/to/mp4/directory
-python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 --no-append-songlist
-
-# Generate report of songs that couldn't be found
-python download_karaoke.py --generate-unmatched-report
-python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 85
 ```
 
 ## 🏷️ ID3 Tagging
 - Adds artist/title/album/genre to MP4 files using mutagen (if installed)
 
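The tag fields above map to iTunes-style MP4 atoms, which is how mutagen's `MP4` class addresses them. A small sketch of that mapping, assuming illustrative helper names and placeholder album/genre defaults (the project's actual defaults may differ):

```python
# iTunes-style MP4 tag atoms as read/written by mutagen's MP4 class:
# "\xa9ART" = artist, "\xa9nam" = title, "\xa9alb" = album, "\xa9gen" = genre.
def build_mp4_tags(artist, title, album="Karaoke", genre="Karaoke"):
    """Illustrative helper: tag dict for one karaoke MP4 (values are lists, per the MP4 format)."""
    return {
        "\xa9ART": [artist],
        "\xa9nam": [title],
        "\xa9alb": [album],
        "\xa9gen": [genre],
    }

# With mutagen installed, the tags would be applied roughly like this (assumed call site):
#   from mutagen.mp4 import MP4
#   f = MP4("downloads/SingKingKaraoke/some_song.mp4")
#   f.update(build_mp4_tags("Adele", "Hello"))
#   f.save()
```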
-## 📋 Song List Generation
-- **Generate song lists from existing MP4 files**: Use `--generate-songlist` to create song lists from directories containing MP4 files with ID3 tags
-- **Automatic ID3 extraction**: Extracts artist and title from MP4 files' ID3 tags
-- **Directory-based organization**: Each directory becomes a playlist with the directory name as the title
-- **Position tracking**: Songs are numbered starting from 1 based on file order
-- **Append or replace**: Choose to append to existing song list or create a new one with `--no-append-songlist`
-- **Multiple directories**: Process multiple directories in a single command
-
 ## 🧹 Cleanup
 - Removes `.info.json` and `.meta` files after download
 
 ## 🛠️ Configuration
-- All options are in `config/config.json` (format, resolution, metadata, etc.)
+- All options are in `data/config.json` (format, resolution, metadata, etc.)
 - You can edit this file or use CLI flags to override
-- **Configurable Data Directory**: The data directory path can be configured in `config/config.json` under `folder_structure.data_dir` (default: "data")
 
 ## 📋 Command Reference File
 
@@ -553,32 +362,7 @@ python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-thr
 
 > **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.
 
-## 📚 Documentation Standards
-
-### **Documentation Location**
-- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
-- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
-- **Use the existing sections in PRD.md and README.md to track all project evolution**
-
-### **Where to Document Changes**
-- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
-- **README.md**: User-facing features, usage instructions, and high-level improvements
-- **CHANGELOG.md**: Version-specific release notes and change summaries
-
-### **Documentation Requirements**
-- **All new features must be documented in both PRD.md and README.md**
-- **All refactoring efforts must be documented in the appropriate sections**
-- **All bug fixes must be documented with technical details**
-- **Version numbers and dates should be clearly marked**
-- **Benefits and improvements should be explicitly stated**
-
-### **Maintenance Responsibility**
-- **Keep PRD.md and README.md synchronized with code changes**
-- **Update documentation immediately when implementing new features**
-- **Remove outdated information and consolidate related changes**
-- **Ensure all CLI options and features are documented in both files**
-
-## 🔧 Refactoring Improvements (v3.3)
+## 🔧 Refactoring Improvements (v3.5)
 The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
 
 ### **New Utility Modules (v3.3)**
@@ -613,9 +397,20 @@ The codebase has been comprehensively refactored to improve maintainability and
 - **Improved Testability**: Modular components can be tested independently
 - **Better Developer Experience**: Clear function signatures and comprehensive documentation
 
+### **Cross-Platform Support (v3.5)**
+- **Platform detection:** Automatic detection of Windows, macOS, and Linux systems
+- **Flexible yt-dlp integration:** Supports both binary files and pip-installed yt-dlp modules
+- **Platform-specific configuration:** Automatic selection of appropriate yt-dlp binary/command for each platform
+- **Setup automation:** `setup_platform.py` script for easy platform-specific setup
+- **Command parsing:** Intelligent parsing of yt-dlp commands (file paths vs. module commands)
+- **Enhanced documentation:** Platform-specific setup instructions and troubleshooting
+- **Backward compatibility:** Maintains full compatibility with existing Windows installations
+- **FFmpeg integration:** Automatic FFmpeg installation and configuration for optimal video processing
+- **Optimized caching:** Enhanced channel video caching with format compatibility and instant video list loading
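The platform-detection and binary-vs-module selection described above can be sketched roughly as follows; the binary names match the `downloader/` layout in the project structure, but the function and dictionary names are illustrative, not the project's actual code:

```python
import platform
import sys
from pathlib import Path

# Bundled yt-dlp binary names per platform, matching the downloader/ folder layout.
YTDLP_BINARIES = {
    "Windows": "yt-dlp.exe",
    "Darwin": "yt-dlp_macos",  # platform.system() reports macOS as "Darwin"
    "Linux": "yt-dlp",
}


def ytdlp_command(downloader_dir="downloader"):
    """Prefer the platform's bundled binary; fall back to a pip-installed yt-dlp module."""
    name = YTDLP_BINARIES.get(platform.system())
    if name is not None:
        binary = Path(downloader_dir) / name
        if binary.exists():
            return [str(binary)]                 # file-path command
    return [sys.executable, "-m", "yt_dlp"]      # module command (pip install yt-dlp)
```

The returned list can be passed straight to `subprocess.run(...)` with the usual yt-dlp arguments appended, which is one way the "file paths vs. module commands" parsing point could work in practice.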
 
 ### **New Parallel Download System (v3.4)**
 - **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
-- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
+- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
 - **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
 - **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
 - **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
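The bounded, thread-safe worker pool described above might look like this in outline — a sketch built on `concurrent.futures`, not the actual `parallel_downloader.py`; only the 1-10 worker clamp mirrors the documented `--workers` range:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading


def download_all(urls, download_one, workers=3):
    """Run download_one(url) across a bounded worker pool, collecting results thread-safely."""
    workers = max(1, min(10, workers))  # clamp to the documented 1-10 range
    results, failures = [], []
    lock = threading.Lock()  # guards the shared result lists
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                outcome = future.result()
                with lock:
                    results.append((url, outcome))
            except Exception as exc:  # failed download; a candidate for retry
                with lock:
                    failures.append((url, exc))
    return results, failures
```

The `failures` list is where a retry pass with reduced concurrency could start, matching the "automatic retry mechanism" bullet.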
@@ -639,8 +434,11 @@ The codebase has been comprehensively refactored to improve maintainability and
 - **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.
 
 ## 🐞 Troubleshooting
-- **Windows**: Ensure `yt-dlp.exe` is in the `downloader/` folder
-- **macOS**: Run `python3 setup_macos.py` to set up yt-dlp and FFmpeg
+- **Platform-specific yt-dlp setup**:
+  - **Windows**: Ensure `yt-dlp.exe` is in the `downloader/` folder
+  - **macOS**: Either ensure `yt-dlp_macos` is in the `downloader/` folder (make executable with `chmod +x`) OR install via pip (`pip install yt-dlp`)
+  - **Linux**: Ensure `yt-dlp` is in the `downloader/` folder (make executable with `chmod +x`)
+  - Run `python setup_platform.py` to automatically set up yt-dlp for your platform
 - Check `logs/` for error details
 - Use `python -m karaoke_downloader.check_resolution` to verify video quality
 - If you see errors about ffmpeg/ffprobe, install [ffmpeg](https://ffmpeg.org/download.html) and ensure it is in your PATH

162 commands.txt
@@ -1,6 +1,6 @@
 # 🎤 Karaoke Video Downloader - CLI Commands Reference
 # Copy and paste these commands into your terminal
-# Updated: v3.4.4 (includes macOS support, all videos download mode, manual video collection, channel parsing rules, and all previous improvements)
+# Updated: v3.5 (includes cross-platform support, optimized caching, and all refactoring improvements)
 
 ## 📥 BASIC DOWNLOADS
 
@@ -8,7 +8,7 @@
 python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
 
 # Download from a file containing multiple channel URLs
-python download_karaoke.py --file data/channels.json
+python download_karaoke.py --file data/channels.txt
 
 # Download with custom resolution (480p, 720p, 1080p, 1440p, 2160p)
 python download_karaoke.py --resolution 1080p https://www.youtube.com/@SingKingKaraoke/videos
@@ -19,69 +19,9 @@ python download_karaoke.py --limit 10 https://www.youtube.com/@SingKingKaraoke/v
 # Enable parallel downloads for faster processing (3-5x speedup)
 python download_karaoke.py --parallel --workers 5 --limit 10 https://www.youtube.com/@SingKingKaraoke/videos
 
-## 🎤 MANUAL VIDEO COLLECTION (v3.4.3)
-
-# Download from manual videos collection (data/manual_videos.json)
-python download_karaoke.py --manual --limit 5
-
-# Download manual videos with fuzzy matching
-python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
-
-# Download manual videos with parallel processing
-python download_karaoke.py --parallel --workers 3 --manual --limit 5
-
-# Download manual videos with songlist matching
-python download_karaoke.py --manual --songlist-only --limit 10
-
-# Force download from manual videos (bypass existing file checks)
-python download_karaoke.py --manual --force --limit 5
-
-# Add a video to manual collection (interactive)
-python utilities/add_manual_video.py add "Artist - Song Title (Karaoke Version)" "https://www.youtube.com/watch?v=VIDEO_ID"
-
-# List all manual videos
-python utilities/add_manual_video.py list
-
-# Remove a video from manual collection
-python utilities/add_manual_video.py remove "Artist - Song Title (Karaoke Version)"
-
-## 🎬 ALL VIDEOS DOWNLOAD MODE (v3.4.4)
-
-# Download ALL videos from a specific channel (not just songlist matches)
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
-
-# Download ALL videos with parallel processing for speed
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
-
-# Download ALL videos with limit (download first N videos)
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100
-
-# Download ALL videos with parallel processing and limit
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 5 --limit 50
-
-# Download ALL videos from ZoomKaraokeOfficial channel
-python download_karaoke.py --channel-focus ZoomKaraokeOfficial --all-videos
-
-# Download ALL videos with custom resolution
-python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --resolution 1080p
-
-## 📋 SONG LIST GENERATION
-
-# Generate song list from MP4 files in a directory (append to existing song list)
-python download_karaoke.py --generate-songlist /path/to/mp4/directory
-
-# Generate song list from multiple directories
-python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 /path/to/dir3
-
-# Generate song list and create a new song list file (don't append)
-python download_karaoke.py --generate-songlist /path/to/mp4/directory --no-append-songlist
-
-# Generate song list from multiple directories and create new file
-python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 --no-append-songlist
-
 ## 🎵 SONGLIST OPERATIONS
 
-# Download only songs from your songlist (uses data/channels.json by default)
+# Download only songs from your songlist (uses data/channels.txt by default)
 python download_karaoke.py --songlist-only
 
 # Download only songlist songs with limit
@@ -111,18 +51,6 @@ python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --limit 5
 # Focus on specific playlists with parallel processing
 python download_karaoke.py --parallel --workers 3 --songlist-focus "2025 - Apple Top 50" --limit 5
 
-# Focus on specific playlists from a custom songlist file
-python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json"
-
-# Focus on specific playlists from a custom file with force mode
-python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --force
-
-# Force download from channels regardless of existing files or server duplicates
-python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force
-
-# Force download with parallel processing
-python download_karaoke.py --parallel --workers 5 --songlist-focus "2025 - Apple Top 50" --force --limit 10
-
 # Prioritize songlist songs in download queue (default behavior)
 python download_karaoke.py --songlist-priority https://www.youtube.com/@SingKingKaraoke/videos
 
@@ -132,35 +60,6 @@ python download_karaoke.py --no-songlist-priority https://www.youtube.com/@SingK
 # Show songlist download status and statistics
 python download_karaoke.py --songlist-status
 
-## 📊 UNMATCHED SONGS REPORTS
-
-# Generate report of songs that couldn't be found in any channel (standalone)
-python download_karaoke.py --generate-unmatched-report
-
-# Generate report with fuzzy matching enabled (standalone)
-python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 85
-
-# Generate report using a specific channel file (standalone)
-python download_karaoke.py --generate-unmatched-report --file data/my_channels.txt
-
-# Generate report from a custom songlist file (standalone)
-python download_karaoke.py --generate-unmatched-report --songlist-file "data/my_custom_songlist.json"
-
-# Generate report with focus on specific playlists from a custom file (standalone)
-python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --generate-unmatched-report
-
-# Download songs AND generate unmatched report (additive feature)
-python download_karaoke.py --songlist-only --limit 10 --generate-unmatched-report
-
-# Download with fuzzy matching AND generate unmatched report
-python download_karaoke.py --songlist-only --fuzzy-match --fuzzy-threshold 85 --limit 10 --generate-unmatched-report
-
-# Download from specific playlists AND generate unmatched report
-python download_karaoke.py --songlist-focus "CCKaraoke" --limit 10 --generate-unmatched-report
-
-# Generate report with custom fuzzy threshold
-python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 80
-
 ## ⚡ PARALLEL DOWNLOADS (v3.4)
 
 # Basic parallel downloads (3-5x faster than sequential)
@@ -195,7 +94,7 @@ python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
 python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
 
 # Download latest videos from specific channels file
-python download_karaoke.py --latest-per-channel --limit 5 --file data/channels.json
+python download_karaoke.py --latest-per-channel --limit 5 --file data/channels.txt
 
 ## 🔄 CACHE & TRACKING MANAGEMENT
 
@@ -254,7 +153,7 @@ python download_karaoke.py --version
 python download_karaoke.py --songlist-only --limit 20 --fuzzy-match --fuzzy-threshold 85 --resolution 1080p
 
 # Latest videos per channel with fuzzy matching
-python download_karaoke.py --latest-per-channel --limit 3 --fuzzy-match --fuzzy-threshold 90 --file data/channels.json
+python download_karaoke.py --latest-per-channel --limit 3 --fuzzy-match --fuzzy-threshold 90 --file data/channels.txt
 
 # Force refresh everything and download songlist
 python download_karaoke.py --songlist-only --force-download-plan --refresh --limit 10
@@ -273,9 +172,6 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
 # 1b. Focus on specific playlists (fast targeted download)
 python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --limit 5
 
-# 1c. Force download from specific playlists (bypass all existing file checks)
-python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --limit 5
-
 # 2. Latest videos from all channels
 python download_karaoke.py --latest-per-channel --limit 5
 
@@ -294,9 +190,6 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --fuzzy-match
 # 4b. Focused fuzzy matching (target specific playlists with flexible matching)
 python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --fuzzy-match --fuzzy-threshold 80 --limit 10
 
-# 4c. Force download with fuzzy matching (bypass all existing file checks)
-python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --fuzzy-match --fuzzy-threshold 80 --limit 10
-
 # 5. Reset and start fresh
 python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
 
@@ -304,38 +197,27 @@ python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
 python download_karaoke.py --status
 python download_karaoke.py --clear-cache all
 
-# 7. Download from manual video collection
-python download_karaoke.py --manual --limit 5
-
-# 7b. Fast parallel manual video download
-python download_karaoke.py --parallel --workers 3 --manual --limit 5
-
-# 7c. Manual videos with fuzzy matching
-python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
-
-## 🍎 macOS SETUP COMMANDS
-
-# Automatic macOS setup (detects OS and installs yt-dlp + FFmpeg)
-python3 setup_macos.py
-
-# Test macOS setup and functionality
-python3 src/tests/test_macos.py
-
-# Manual macOS setup options
-# Install yt-dlp via pip
-pip3 install yt-dlp
-
-# Download yt-dlp binary for macOS
-mkdir -p downloader && curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos && chmod +x downloader/yt-dlp_macos
-
-# Install FFmpeg via Homebrew
-brew install ffmpeg
+## 🌐 PLATFORM SETUP COMMANDS (v3.5)
+
+# Automatic platform setup (detects OS and installs yt-dlp + FFmpeg)
+python setup_platform.py
+
+# Test platform detection and yt-dlp integration
+python test_platform.py
+
+# Manual platform-specific setup
+# Windows: Download yt-dlp.exe to downloader/ folder
+# macOS: brew install ffmpeg && pip install yt-dlp
+# Linux: sudo apt install ffmpeg && download yt-dlp to downloader/ folder
 
 ## 🔧 TROUBLESHOOTING COMMANDS
 
 # Check if everything is working
 python download_karaoke.py --version
 
+# Test platform setup
+python test_platform.py
+
 # Force refresh everything
 python download_karaoke.py --force-download-plan --refresh --clear-cache all
 
@@ -346,9 +228,7 @@ python download_karaoke.py --clear-server-duplicates
 ## 📝 NOTES
 
 # Default files used:
-# - data/channels.json (channel configuration with parsing rules, preferred)
-# - data/channels.json (channel configuration with parsing rules)
-# - data/manual_videos.json (manual video collection)
+# - data/channels.txt (default channel list for songlist modes)
 # - data/songList.json (your prioritized song list)
 # - data/config.json (download settings)
 
@@ -357,12 +237,11 @@ python download_karaoke.py --clear-server-duplicates
 # Fuzzy threshold: 0-100 (higher = more strict matching, default 90)
 
 # The system automatically:
-# - Uses data/channels.json for channel configuration and parsing rules
+# - Uses data/channels.txt if no --file specified in songlist modes
 # - Caches channel data for 24 hours (configurable)
 # - Tracks all downloads in JSON files
 # - Avoids re-downloading existing files
 # - Checks for server duplicates
-# - Supports manual video collection via --manual parameter
 
 # For best performance:
 # - Use --parallel --workers 5 for 3-5x faster downloads
@@ -370,7 +249,8 @@ python download_karaoke.py --clear-server-duplicates
 # - Use --fuzzy-match for better song discovery
 # - Use --refresh sparingly (forces re-scan)
 # - Clear cache if you encounter issues
-# - macOS users: Run `python3 setup_macos.py` for automatic setup
+# - Channel caching provides instant video list loading (no YouTube API calls)
+# - FFmpeg integration ensures optimal video processing and merging
 
 # Parallel download tips:
 # - Start with --workers 3 for conservative approach
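The notes above describe a 0-100 fuzzy threshold for song matching (default 90, `--fuzzy-threshold 85` for looser matches). As a minimal sketch of how such a score can gate candidate titles, using only the standard library's `difflib` (the project itself may use a different fuzzy-matching library; these helper names are hypothetical, not the tool's actual code):

```python
from difflib import SequenceMatcher

def fuzzy_score(a: str, b: str) -> int:
    """Similarity of two song titles on a 0-100 scale (case-insensitive)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

def is_match(wanted: str, candidate: str, threshold: int = 90) -> bool:
    """Accept a candidate title only if its score meets the threshold."""
    return fuzzy_score(wanted, candidate) >= threshold
```

Lowering the threshold (e.g. to 85) accepts candidates with more spelling or punctuation drift, which is why the notes suggest using `--refresh` and strict thresholds sparingly.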
data/bak_songList.json: new file, 4022 lines (diff suppressed because it is too large)
data/channel_cache.json: new file, 164578 lines (diff suppressed because it is too large)
(two further large file diffs suppressed)
Deleted file (cached video list for @LetsSingKaraoke):

@@ -1,19 +0,0 @@
-{
-    "channel_id": "@LetsSingKaraoke",
-    "videos": [
-        {
-            "title": "Sub Urban - Cradles | Karaoke (instrumental)",
-            "id": "8uj7IzhdiO4"
-        },
-        {
-            "title": "Sia - Snowman | Karaoke (instrumental)",
-            "id": "ZbWHuncTgsM"
-        },
-        {
-            "title": "Trevor Daniel - Falling | Karaoke (Instrumental)",
-            "id": "nU7n2aq7f98"
-        }
-    ],
-    "last_updated": "2025-08-05T15:59:09.280488",
-    "video_count": 3
-}
Deleted file (raw yt-dlp capture for @LetsSingKaraoke):

@@ -1,10 +0,0 @@
-# Raw yt-dlp output for @LetsSingKaraoke
-# Channel URL: https://www.youtube.com/@LetsSingKaraoke/videos
-# Command: downloader/yt-dlp_macos --flat-playlist --print %(title)s|%(id)s|%(url)s --verbose https://www.youtube.com/@LetsSingKaraoke/videos
-# Timestamp: 2025-08-05T15:59:09.280155
-# Total lines: 3
-################################################################################
-
-1: Sub Urban - Cradles | Karaoke (instrumental)|8uj7IzhdiO4|https://www.youtube.com/watch?v=8uj7IzhdiO4
-2: Sia - Snowman | Karaoke (instrumental)|ZbWHuncTgsM|https://www.youtube.com/watch?v=ZbWHuncTgsM
-3: Trevor Daniel - Falling | Karaoke (Instrumental)|nU7n2aq7f98|https://www.youtube.com/watch?v=nU7n2aq7f98
(ten further large file diffs suppressed)
Deleted file: data/channels.json (191 lines)

@@ -1,191 +0,0 @@
-{
-  "channels": [
-    {
-      "name": "@SingKingKaraoke",
-      "url": "https://www.youtube.com/@SingKingKaraoke/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke)", "(Karaoke Version)", "Karaoke Version"]
-          }
-        },
-        "examples": [
-          "Artist - Title (Karaoke)",
-          "Artist - Title (Karaoke Version)"
-        ]
-      },
-      "description": "Standard artist - title format with karaoke suffix"
-    },
-    {
-      "name": "@KaraokeOnVEVO",
-      "url": "https://www.youtube.com/@KaraokeOnVEVO/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke)"]
-          }
-        },
-        "examples": [
-          "George Jones - A Picture Of Me (Without You) (Karaoke)",
-          "Iggy Pop, Kate Pierson - Candy (Karaoke)"
-        ]
-      },
-      "description": "Standard artist - title format with (Karaoke) suffix"
-    },
-    {
-      "name": "@StingrayKaraoke",
-      "url": "https://www.youtube.com/@StingrayKaraoke/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke Version)"]
-          }
-        },
-        "playlist_indicators": [
-          "TOP SONGS OF",
-          "THE BEST",
-          "BEST",
-          "NON-STOP",
-          "MASHUP",
-          "FEAT.",
-          "WITH LYRICS"
-        ],
-        "examples": [
-          "Gracie Abrams - That's So True (Karaoke Version)",
-          "TOP SONGS OF 2024 KARAOKE WITH LYRICS BY BILLIE EILISH, GRACIE ABRAMS & MORE"
-        ]
-      },
-      "description": "Standard artist - title format with (Karaoke Version) suffix, also has playlist titles"
-    },
-    {
-      "name": "@sing2karaoke",
-      "url": "https://www.youtube.com/@sing2karaoke/videos",
-      "parsing_rules": {
-        "format": "artist_title_spaces",
-        "separator": " ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke Version) Lyrics", "(Karaoke Version)", "Karaoke Version Lyrics"]
-          }
-        },
-        "multi_artist_separator": ", ",
-        "examples": [
-          "Lauren Spencer Smith Fingers Crossed",
-          "Calvin Harris, Clementine Douglas Blessings (Karaoke Version) Lyrics"
-        ]
-      },
-      "description": "Artist and title separated by multiple spaces, supports multiple artists"
-    },
-    {
-      "name": "@ZoomKaraokeOfficial",
-      "url": "https://www.youtube.com/@ZoomKaraokeOfficial/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": [
-              "(Karaoke)",
-              "(Karaoke Version)",
-              "Karaoke Version",
-              "- Karaoke Version from Zoom Karaoke",
-              "- Karaoke Version from Zoom",
-              "- Karaoke Version from Zoom Karaoke (Radiohead Cover)",
-              "- Karaoke Version from Zoom (Radiohead Cover)"
-            ]
-          }
-        },
-        "examples": [
-          "The Mavericks - Here Comes My Baby - Karaoke Version from Zoom Karaoke"
-        ]
-      },
-      "description": "Standard artist - title format with '- Karaoke Version from Zoom Karaoke' suffix"
-    },
-    {
-      "name": "@VocalStarKaraoke",
-      "url": "https://www.youtube.com/@VocalStarKaraoke/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": false,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["KARAOKE Without Backing Vocals", "KARAOKE With Vocal Guide", "KARAOKE"]
-          }
-        },
-        "examples": [
-          "Don't Say You Love Me - Jin KARAOKE Without Backing Vocals",
-          "Don't Say You Love Me - Jin KARAOKE With Vocal Guide"
-        ]
-      },
-      "description": "Title first, then dash separator, then artist with KARAOKE suffix"
-    },
-    {
-      "name": "@ManualVideos",
-      "url": "manual://static",
-      "manual_videos_file": "data/manual_videos.json",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
-          }
-        }
-      },
-      "description": "Manual collection of individual karaoke videos (static, never expires)"
-    },
-    {
-      "name": "Let's Sing Karaoke",
-      "url": "https://www.youtube.com/@LetsSingKaraoke/videos",
-      "parsing_rules": {
-        "format": "artist_title_separator",
-        "separator": " - ",
-        "artist_first": true,
-        "title_cleanup": {
-          "remove_suffix": {
-            "suffixes": ["(Karaoke)", "(Karaoke Version)", "Karaoke Version", "(In the style of)"]
-          }
-        },
-        "examples": [
-          "Artist - Title (Karaoke)",
-          "Artist - Title (In the style of Other Artist)"
-        ]
-      },
-      "artist_name_processing": true,
-      "description": "Let's Sing Karaoke with enhanced artist name processing"
-    }
-  ],
-  "global_parsing_settings": {
-    "fallback_format": "artist_title_separator",
-    "fallback_separator": " - ",
-    "common_suffixes": [
-      "(Karaoke)",
-      "(Karaoke Version)",
-      "Karaoke Version",
-      "(Karaoke Version) Lyrics",
-      "Karaoke Version Lyrics"
-    ],
-    "playlist_indicators": [
-      "TOP",
-      "BEST",
-      "MASHUP",
-      "FEAT.",
-      "WITH LYRICS",
-      "NON-STOP",
-      "PLAYLIST"
-    ]
-  }
-}
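The deleted configuration above drives title parsing: each channel supplies a separator, an artist/title order, and suffixes to strip. As a rough sketch of how one such rule maps a video title to an (artist, title) pair (assuming only the `separator`, `artist_first`, and `remove_suffix` fields shown above; `apply_rule` is a hypothetical helper, not the project's code):

```python
def apply_rule(video_title: str, separator: str = " - ",
               artist_first: bool = True,
               suffixes: tuple = ("(Karaoke)", "(Karaoke Version)", "Karaoke Version")) -> tuple:
    """Split on the first separator, then strip any configured suffix from the title."""
    if separator not in video_title:
        return "", video_title.strip()
    left, right = (p.strip() for p in video_title.split(separator, 1))
    artist, title = (left, right) if artist_first else (right, left)
    for suffix in suffixes:
        if title.endswith(suffix):
            title = title[:-len(suffix)].strip()
            break
    return artist, title
```

Setting `artist_first=False` models channels like @VocalStarKaraoke, where the title precedes the dash.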
New file: data/channels.txt (7 lines)

@@ -0,0 +1,7 @@
+https://www.youtube.com/@SingKingKaraoke/videos
+https://www.youtube.com/@karafun/videos
+https://www.youtube.com/@KaraokeOnVEVO/videos
+https://www.youtube.com/@StingrayKaraoke/videos
+https://www.youtube.com/@CCKaraoke/videos
+https://www.youtube.com/@AtomicKaraoke/videos
+https://www.youtube.com/@sing2karaoke/videos
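A plain-text channel list like the one added above needs only a trivial reader. This loader is a sketch under the assumption that the file is one URL per line (the project's actual reader may differ); it skips blank lines and `#` comments:

```python
def load_channel_urls(text: str) -> list:
    """Parse a channels.txt-style listing into a list of channel URLs."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```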
@@ -2,11 +2,7 @@ import json
 from pathlib import Path
 from datetime import datetime, time
 
-from karaoke_downloader.data_path_manager import get_data_path_manager
-
-def cleanup_recent_tracking(tracking_path=None, cutoff_time_str="11:00"):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
+def cleanup_recent_tracking(tracking_path="data/songlist_tracking.json", cutoff_time_str="11:00"):
     """Remove entries from songlist_tracking.json that were added after the specified time today."""
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
@@ -19,14 +19,13 @@
         "writethumbnail": false,
         "embed_metadata": false,
         "continuedl": true,
-        "nooverwrites": false,
+        "nooverwrites": true,
         "ignoreerrors": true,
         "no_warnings": false
     },
     "folder_structure": {
         "downloads_dir": "downloads",
         "logs_dir": "logs",
-        "data_dir": "data",
         "tracking_file": "downloaded_videos.json"
     },
     "logging": {
@@ -39,7 +38,8 @@
     "auto_detect_platform": true,
     "yt_dlp_paths": {
         "windows": "downloader/yt-dlp.exe",
-        "macos": "downloader/yt-dlp_macos"
+        "macos": "python3 -m yt_dlp",
+        "linux": "downloader/yt-dlp"
     },
 },
 "yt_dlp_path": "downloader/yt-dlp.exe"
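The `yt_dlp_paths` change above keys the yt-dlp command by operating system. A minimal sketch of the resolution logic, assuming the mapping shown in the diff (the project's real `auto_detect_platform` code may differ):

```python
import sys

# Mapping taken from the data/config.json diff above.
YT_DLP_PATHS = {
    "windows": "downloader/yt-dlp.exe",
    "macos": "python3 -m yt_dlp",
    "linux": "downloader/yt-dlp",
}

def detect_platform(platform: str = sys.platform) -> str:
    """Map a sys.platform value onto the config's platform keys."""
    if platform.startswith("win"):
        return "windows"
    if platform == "darwin":
        return "macos"
    return "linux"

def yt_dlp_command(platform: str = sys.platform) -> str:
    """Pick the configured yt-dlp invocation for the current (or given) platform."""
    return YT_DLP_PATHS[detect_platform(platform)]
```

Note the macOS entry is now a `python3 -m yt_dlp` module invocation rather than a bundled binary, so it must be split into argv parts before being passed to `subprocess`.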
data/karaoke_tracking.json: 115120 lines (diff suppressed because it is too large)
Deleted file: data/manual_videos.json (85 lines)

@@ -1,85 +0,0 @@
-{
-  "channel_name": "@ManualVideos",
-  "channel_url": "manual://static",
-  "description": "Manual collection of individual karaoke videos",
-  "videos": [
-    {
-      "title": "Nickelback - Photograph",
-      "url": "https://www.youtube.com/watch?v=qZXwpceqt9s",
-      "id": "qZXwpceqt9s",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "Ed Sheeran & Beyoncé - Perfect Duet",
-      "url": "https://www.youtube.com/watch?v=qegLWI99Wg0",
-      "id": "qegLWI99Wg0",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "10,000 Maniacs - More Than This",
-      "url": "https://www.youtube.com/watch?v=wxnuF-APJ5M",
-      "id": "wxnuF-APJ5M",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "AC/DC - Big Balls",
-      "url": "https://www.youtube.com/watch?v=kiSDpVmu4Bk",
-      "id": "kiSDpVmu4Bk",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "Jon Bon Jovi - Blaze of Glory",
-      "url": "https://www.youtube.com/watch?v=SzRAoDMlQY",
-      "id": "SzRAoDMlQY",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "ZZ Top - Sharp Dressed Man",
-      "url": "https://www.youtube.com/watch?v=prRalwto9iY",
-      "id": "prRalwto9iY",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "Nickelback - Photograph",
-      "url": "https://www.youtube.com/watch?v=qTphCTAUhUg",
-      "id": "qTphCTAUhUg",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    },
-    {
-      "title": "Billy Joel - Shes Got A Way",
-      "url": "https://www.youtube.com/watch?v=DeeTFIgKuC8",
-      "id": "DeeTFIgKuC8",
-      "upload_date": "2024-01-01",
-      "duration": 180,
-      "view_count": 1000
-    }
-  ],
-  "parsing_rules": {
-    "format": "artist_title_separator",
-    "separator": " - ",
-    "artist_first": true,
-    "title_cleanup": {
-      "remove_suffix": {
-        "suffixes": [
-          "(Karaoke)",
-          "(Karaoke Version)",
-          "(Karaoke Version) Lyrics"
-        ]
-      }
-    }
-  }
-}
(large file diff suppressed)

@@ -23902,7 +23902,7 @@
         "title": "Superman (It's Not Easy)"
     },
     {
-        "artist": "'NSync",
+        "artist": "'N Sync",
         "position": 16,
         "title": "Gone"
     },
@@ -24122,7 +24122,7 @@
         "title": "Turn Off The Light"
     },
     {
-        "artist": "'NSync",
+        "artist": "'N Sync",
         "position": 13,
         "title": "Gone"
     },
@@ -24617,7 +24617,7 @@
         "title": "Most Girls"
     },
     {
-        "artist": "'NSync",
+        "artist": "'N Sync",
         "position": 11,
         "title": "This I Promise You"
     },
@@ -24857,7 +24857,7 @@
         "title": "I Just Wanna Love U (Give It 2 Me)"
     },
     {
-        "artist": "'NSync",
+        "artist": "'N Sync",
         "position": 12,
         "title": "This I Promise You"
     },
@@ -25857,7 +25857,7 @@
         "title": "Tha Block Is Hot"
     },
     {
-        "artist": "'NSync & Gloria Estefan",
+        "artist": "'N Sync & Gloria Estefan",
         "position": 85,
         "title": "Music Of My Heart"
     },
@@ -26237,7 +26237,7 @@
         "title": "Touch It"
     },
     {
-        "artist": "NSync",
+        "artist": "N Sync",
         "position": 34,
         "title": "(God Must Have Spent) A Little More Time On You"
     },
@@ -1,15 +1,11 @@
 import json
 from pathlib import Path
 
-from karaoke_downloader.data_path_manager import get_data_path_manager
-
 def normalize_title(title):
     normalized = title.replace("(Karaoke Version)", "").replace("(Karaoke)", "").strip()
     return " ".join(normalized.split()).lower()
 
-def load_songlist(songlist_path=None):
-    if songlist_path is None:
-        songlist_path = str(get_data_path_manager().get_songlist_path())
+def load_songlist(songlist_path="data/songList.json"):
     songlist_file = Path(songlist_path)
     if not songlist_file.exists():
         print(f"⚠️ Songlist file not found: {songlist_path}")
@@ -28,18 +24,14 @@ def load_songlist(songlist_path=None):
         })
     return all_songs
 
-def load_songlist_tracking(tracking_path=None):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
+def load_songlist_tracking(tracking_path="data/songlist_tracking.json"):
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
         return {}
     with open(tracking_file, 'r', encoding='utf-8') as f:
         return json.load(f)
 
-def load_server_songs(songs_path=None):
-    if songs_path is None:
-        songs_path = str(get_data_path_manager().get_songs_path())
+def load_server_songs(songs_path="data/songs.json"):
     """Load the list of songs already available on the server."""
     songs_file = Path(songs_path)
     if not songs_file.exists():
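The `normalize_title` helper in the hunk above is unchanged by the diff; it is what makes tracking comparisons ignore case, spacing, and karaoke suffixes. Restated for reference (same logic as shown in the diff):

```python
def normalize_title(title):
    # Strip the karaoke suffixes, collapse runs of whitespace, lower-case.
    normalized = title.replace("(Karaoke Version)", "").replace("(Karaoke)", "").strip()
    return " ".join(normalized.split()).lower()
```

For example, `"Hello  (Karaoke Version)"` and `"hello"` normalize to the same key, so the tracker treats them as the same song.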
(large file diff suppressed)

New binary file: downloader/yt-dlp (binary file not shown)
@@ -9,8 +9,6 @@ import json
 from datetime import datetime, timedelta
 from pathlib import Path
 
-from karaoke_downloader.data_path_manager import get_data_path_manager
-
 # Constants
 DEFAULT_CACHE_EXPIRATION_DAYS = 1
 DEFAULT_CACHE_FILENAME_LENGTH_LIMIT = 200  # Increased from 60
@@ -39,7 +37,7 @@ def get_download_plan_cache_file(mode, **kwargs):
         + hashlib.md5(base.encode()).hexdigest()[:8]
     )
 
-    return get_data_path_manager().get_path(f"{base}.json")
+    return Path(f"data/{base}.json")
 
 
 def load_cached_plan(cache_file, max_age_days=DEFAULT_CACHE_EXPIRATION_DAYS):
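The hunk above keeps the md5-suffixed cache-file naming and changes only where the file lives (a hardcoded `data/` directory instead of the path manager). A sketch of that naming scheme; the exact base-string construction and truncation length are assumptions, but the shape (truncated base plus an 8-character md5 suffix) matches the diff:

```python
import hashlib
from pathlib import Path

def plan_cache_file(mode: str, limit: int = 200, **kwargs) -> Path:
    """Build a stable, filesystem-safe cache filename: truncated base plus md5 suffix."""
    base = "_".join([mode] + [f"{k}-{v}" for k, v in sorted(kwargs.items())])
    # Truncation keeps the name readable; the hash keeps truncated names unique.
    base = base[:limit] + "_" + hashlib.md5(base.encode()).hexdigest()[:8]
    return Path(f"data/{base}.json")
```

Because the hash is derived from the full argument string, identical download-plan arguments always resolve to the same path, which is what lets `load_cached_plan` find a previous run's plan.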
@ -1,260 +0,0 @@
|
|||||||
"""
|
|
||||||
Channel-specific parsing utilities for extracting artist and title from video titles.
|
|
||||||
|
|
||||||
This module handles the different title formats used by various karaoke channels,
|
|
||||||
providing channel-specific parsing rules to extract artist and title information
|
|
||||||
correctly for ID3 tagging and filename generation.
|
|
||||||
"""
|
|
||||||
|
|
||||||
import json
|
|
||||||
import re
|
|
||||||
from typing import Dict, List, Optional, Tuple, Any
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from karaoke_downloader.data_path_manager import get_data_path_manager
|
|
||||||
|
|
||||||
|
|
||||||
class ChannelParser:
|
|
||||||
"""Handles channel-specific parsing of video titles to extract artist and title."""
|
|
||||||
|
|
||||||
def __init__(self, channels_file: str = None):
|
|
||||||
if channels_file is None:
|
|
||||||
channels_file = str(get_data_path_manager().get_channels_json_path())
|
|
||||||
"""Initialize the parser with channel configuration."""
|
|
||||||
self.channels_file = Path(channels_file)
|
|
||||||
self.channels_config = self._load_channels_config()
|
|
||||||
|
|
||||||
def _load_channels_config(self) -> Dict[str, Any]:
|
|
||||||
"""Load the channels configuration from JSON file."""
|
|
||||||
if not self.channels_file.exists():
|
|
||||||
raise FileNotFoundError(f"Channels configuration file not found: {self.channels_file}")
|
|
||||||
|
|
||||||
with open(self.channels_file, 'r', encoding='utf-8') as f:
|
|
||||||
return json.load(f)
|
|
||||||
|
|
||||||
def get_channel_config(self, channel_name: str) -> Optional[Dict[str, Any]]:
|
|
||||||
"""Get the configuration for a specific channel."""
|
|
||||||
for channel in self.channels_config.get("channels", []):
|
|
||||||
if channel["name"] == channel_name:
|
|
||||||
return channel
|
|
||||||
return None
|
|
||||||
|
|
||||||
def extract_artist_title(self, video_title: str, channel_name: str) -> Tuple[str, str]:
|
|
||||||
"""
|
|
||||||
Extract artist and title from a video title using channel-specific parsing rules.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
video_title: The full video title from YouTube
|
|
||||||
channel_name: The name of the channel (must match config)
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
Tuple of (artist, title) - both may be empty strings if parsing fails
|
|
||||||
"""
|
|
||||||
channel_config = self.get_channel_config(channel_name)
|
|
||||||
if not channel_config:
|
|
||||||
# Fallback to global settings
|
|
||||||
return self._fallback_parse(video_title)
|
|
||||||
|
|
||||||
parsing_rules = channel_config.get("parsing_rules", {})
|
|
||||||
format_type = parsing_rules.get("format", "artist_title_separator")
|
|
||||||
|
|
||||||
if format_type == "artist_title_separator":
|
|
||||||
return self._parse_artist_title_separator(video_title, parsing_rules)
|
|
||||||
elif format_type == "artist_title_spaces":
|
|
||||||
return self._parse_artist_title_spaces(video_title, parsing_rules)
|
|
||||||
elif format_type == "title_artist_pipe":
|
|
||||||
return self._parse_title_artist_pipe(video_title, parsing_rules)
|
|
||||||
else:
|
|
||||||
return self._fallback_parse(video_title)
|
|
||||||
|
|
||||||
def _parse_artist_title_separator(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
|
|
||||||
"""Parse format: 'Artist - Title' or 'Title - Artist'."""
|
|
||||||
separator = rules.get("separator", " - ")
|
|
||||||
artist_first = rules.get("artist_first", True)
|
|
||||||
|
|
||||||
if separator not in video_title:
|
|
||||||
return "", video_title.strip()
|
|
||||||
|
|
||||||
parts = video_title.split(separator, 1)
|
|
||||||
if len(parts) != 2:
|
|
||||||
return "", video_title.strip()
|
|
||||||
|
|
||||||
part1, part2 = parts[0].strip(), parts[1].strip()
|
|
||||||
|
|
||||||
# Apply cleanup to both parts
|
|
||||||
part1_clean = self._cleanup_title(part1, rules.get("title_cleanup", {}))
|
|
||||||
part2_clean = self._cleanup_title(part2, rules.get("title_cleanup", {}))
|
|
||||||
|
|
||||||
if artist_first:
|
|
||||||
return part1_clean, part2_clean
|
|
||||||
else:
|
|
||||||
return part2_clean, part1_clean
|
|
||||||
|
|
||||||
def _parse_artist_title_spaces(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
|
|
||||||
"""Parse format: 'Artist Title' (multiple spaces)."""
|
|
||||||
separator = rules.get("separator", " ")
|
|
||||||
multi_artist_sep = rules.get("multi_artist_separator", ", ")
|
|
||||||
|
|
||||||
# Try multiple space patterns to handle inconsistent spacing
|
|
||||||
# Look for the LAST occurrence of multiple spaces to handle cases with commas
|
|
||||||
space_patterns = [" ", " ", " "] # 3, 2, 4 spaces
|
|
||||||
|
|
||||||
for pattern in space_patterns:
|
|
||||||
if pattern in video_title:
|
|
||||||
# Split on the LAST occurrence of the pattern
|
|
||||||
last_index = video_title.rfind(pattern)
|
|
||||||
if last_index != -1:
|
|
||||||
artist_part = video_title[:last_index].strip()
|
|
||||||
title_part = video_title[last_index + len(pattern):].strip()
|
|
||||||
|
|
||||||
# Handle multiple artists (e.g., "Artist1, Artist2")
|
|
||||||
if multi_artist_sep in artist_part:
|
|
||||||
# Keep the full artist string as is
|
|
||||||
artist = artist_part
|
|
||||||
else:
|
|
||||||
artist = artist_part
|
|
||||||
|
|
||||||
title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))
|
|
||||||
|
|
||||||
return artist, title
|
|
||||||
|
|
||||||
# Try dash patterns as fallback for inconsistent formatting
|
|
||||||
dash_patterns = [" - ", " – ", " -"] # Regular dash, en dash, dash without trailing space
|
|
||||||
|
|
||||||
for pattern in dash_patterns:
|
|
||||||
if pattern in video_title:
|
|
||||||
# Split on the LAST occurrence of the pattern
|
|
||||||
last_index = video_title.rfind(pattern)
|
|
||||||
if last_index != -1:
|
|
||||||
artist_part = video_title[:last_index].strip()
|
|
||||||
title_part = video_title[last_index + len(pattern):].strip()
|
|
||||||
|
|
||||||
# Handle multiple artists (e.g., "Artist1, Artist2")
|
|
||||||
if multi_artist_sep in artist_part:
|
|
||||||
# Keep the full artist string as is
|
|
||||||
artist = artist_part
|
|
||||||
else:
|
|
||||||
artist = artist_part
|
|
||||||
|
|
||||||
title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))
|
|
||||||
|
|
||||||
return artist, title
|
|
||||||
|
|
||||||
# If no pattern matches, return empty artist and full title
|
|
||||||
return "", video_title.strip()
|
|
||||||
|
|
||||||
def _parse_title_artist_pipe(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
|
|
||||||
"""Parse format: 'Title | Artist'."""
|
|
||||||
separator = rules.get("separator", " | ")
|
|
||||||
|
|
||||||
if separator not in video_title:
|
|
||||||
return "", video_title.strip()
|
|
||||||
|
|
||||||
parts = video_title.split(separator, 1)
|
|
||||||
        if len(parts) != 2:
            return "", video_title.strip()

        title_part, artist_part = parts[0].strip(), parts[1].strip()

        title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))
        artist = self._cleanup_title(artist_part, rules.get("artist_cleanup", {}))

        return artist, title

    def _cleanup_title(self, text: str, cleanup_rules: Dict[str, Any]) -> str:
        """Apply cleanup rules to remove suffixes and normalize text."""
        if not cleanup_rules:
            return text.strip()

        cleaned = text.strip()

        # Handle remove_suffix rule
        if "remove_suffix" in cleanup_rules:
            suffixes = cleanup_rules["remove_suffix"].get("suffixes", [])
            for suffix in suffixes:
                if cleaned.endswith(suffix):
                    cleaned = cleaned[:-len(suffix)].strip()
                    break

        return cleaned

    def _fallback_parse(self, video_title: str) -> Tuple[str, str]:
        """Fallback parsing using global settings."""
        global_settings = self.channels_config.get("global_parsing_settings", {})
        fallback_format = global_settings.get("fallback_format", "artist_title_separator")
        fallback_separator = global_settings.get("fallback_separator", " - ")

        if fallback_format == "artist_title_separator":
            if fallback_separator in video_title:
                parts = video_title.split(fallback_separator, 1)
                if len(parts) == 2:
                    artist = parts[0].strip()
                    title = parts[1].strip()
                    # Apply global suffix cleanup
                    for suffix in global_settings.get("common_suffixes", []):
                        if title.endswith(suffix):
                            title = title[:-len(suffix)].strip()
                            break
                    return artist, title

        # If all else fails, return empty artist and full title
        return "", video_title.strip()
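The fallback path above can be exercised outside the class. This is a minimal standalone sketch of the same split-then-strip-suffix logic; the function name and the suffix list passed in are illustrative, not the project's actual `global_parsing_settings` config:

```python
def fallback_parse(video_title, separator=" - ", common_suffixes=()):
    """Split 'Artist - Title' once on the separator, then drop a known suffix."""
    if separator in video_title:
        artist, title = video_title.split(separator, 1)
        artist, title = artist.strip(), title.strip()
        for suffix in common_suffixes:
            if title.endswith(suffix):
                title = title[:-len(suffix)].strip()
                break
        return artist, title
    # No separator: empty artist, full title (mirrors the method above)
    return "", video_title.strip()

print(fallback_parse("Queen - Bohemian Rhapsody (Karaoke Version)",
                     common_suffixes=["(Karaoke Version)"]))
# → ('Queen', 'Bohemian Rhapsody')
```

Note that `split(separator, 1)` splits only on the first occurrence, so artists whose names contain the separator still lose only the leading segment.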
    def is_playlist_title(self, video_title: str, channel_name: str) -> bool:
        """Check if a video title appears to be a playlist rather than a single song."""
        channel_config = self.get_channel_config(channel_name)
        if not channel_config:
            return self._is_playlist_by_global_rules(video_title)

        parsing_rules = channel_config.get("parsing_rules", {})
        playlist_indicators = parsing_rules.get("playlist_indicators", [])

        if not playlist_indicators:
            return self._is_playlist_by_global_rules(video_title)

        title_upper = video_title.upper()
        for indicator in playlist_indicators:
            if indicator.upper() in title_upper:
                return True

        return False

    def _is_playlist_by_global_rules(self, video_title: str) -> bool:
        """Check if title is a playlist using global rules."""
        global_settings = self.channels_config.get("global_parsing_settings", {})
        playlist_indicators = global_settings.get("playlist_indicators", [])

        title_upper = video_title.upper()
        for indicator in playlist_indicators:
            if indicator.upper() in title_upper:
                return True

        return False

    def get_all_channel_names(self) -> List[str]:
        """Get a list of all configured channel names."""
        return [channel["name"] for channel in self.channels_config.get("channels", [])]

    def get_channel_url(self, channel_name: str) -> Optional[str]:
        """Get the URL for a specific channel."""
        channel_config = self.get_channel_config(channel_name)
        return channel_config.get("url") if channel_config else None
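Both playlist checks above reduce to the same case-insensitive substring test. A standalone sketch (the indicator list shown is illustrative, not the shipped configuration):

```python
def is_playlist_title(video_title, indicators):
    """True if any configured indicator appears in the title, ignoring case."""
    title_upper = video_title.upper()
    return any(indicator.upper() in title_upper for indicator in indicators)

print(is_playlist_title("Top 40 Karaoke Mix", ["MIX", "MEDLEY"]))
# → True
```

Uppercasing both sides once keeps the comparison cheap while staying case-insensitive.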
# Convenience function for backward compatibility
def extract_artist_title(video_title: str, channel_name: str, channels_file: str = None) -> Tuple[str, str]:
    """
    Convenience function to extract artist and title from a video title.

    Args:
        video_title: The full video title from YouTube
        channel_name: The name of the channel
        channels_file: Path to the channels configuration file

    Returns:
        Tuple of (artist, title)
    """
    if channels_file is None:
        channels_file = str(get_data_path_manager().get_channels_json_path())
    parser = ChannelParser(channels_file)
    return parser.extract_artist_title(video_title, channel_name)
@@ -1,117 +1,27 @@
-#!/usr/bin/env python3
-"""
-Karaoke Video Downloader CLI
-Command-line interface for the karaoke video downloader.
-"""
-
 import argparse
 import os
 import sys
-from pathlib import Path
-from typing import List
-
-from karaoke_downloader.channel_parser import ChannelParser
+from pathlib import Path
-from karaoke_downloader.config_manager import AppConfig
-from karaoke_downloader.data_path_manager import get_data_path_manager
 from karaoke_downloader.downloader import KaraokeDownloader
 
 # Constants
-DEFAULT_LATEST_PER_CHANNEL_LIMIT = 10
 DEFAULT_FUZZY_THRESHOLD = 85
+DEFAULT_LATEST_PER_CHANNEL_LIMIT = 5
+DEFAULT_DISPLAY_LIMIT = 10
+DEFAULT_CACHE_DURATION_HOURS = 24
-
-def load_channels_from_json(channels_file: str = None) -> List[str]:
-    """
-    Load channel URLs from the new JSON format.
-
-    Args:
-        channels_file: Path to the channels.json file (if None, uses default from config)
-
-    Returns:
-        List of channel URLs
-    """
-    if channels_file is None:
-        channels_file = str(get_data_path_manager().get_channels_json_path())
-
-    try:
-        parser = ChannelParser(channels_file)
-        channels = parser.channels_config.get("channels", [])
-        return [channel["url"] for channel in channels]
-    except Exception as e:
-        print(f"❌ Error loading channels from {channels_file}: {e}")
-        return []
-
-
-def load_channels_from_text(channels_file: str = None) -> List[str]:
-    """
-    Load channel URLs from the old text format (for backward compatibility).
-
-    Args:
-        channels_file: Path to the channels.txt file (if None, uses default from config)
-
-    Returns:
-        List of channel URLs
-    """
-    if channels_file is None:
-        channels_file = str(get_data_path_manager().get_channels_txt_path())
-
-    try:
-        with open(channels_file, "r", encoding="utf-8") as f:
-            return [
-                line.strip()
-                for line in f
-                if line.strip() and not line.strip().startswith("#")
-            ]
-    except Exception as e:
-        print(f"❌ Error loading channels from {channels_file}: {e}")
-        return []
-
-
-def load_channels(channel_file: str = None) -> List[str]:
-    """Load channel URLs from file."""
-    if channel_file is None:
-        # Use JSON configuration
-        data_path_manager = get_data_path_manager()
-        if data_path_manager.file_exists("channels.json"):
-            return load_channels_from_json()
-        else:
-            return []
-    else:
-        if channel_file.endswith(".json"):
-            return load_channels_from_json(channel_file)
-        else:
-            return load_channels_from_text(channel_file)
-
-
-def get_channel_url_by_name(channel_name: str) -> str:
-    """Look up a channel URL by its name from the channels configuration."""
-    channel_urls = load_channels()
-
-    # Normalize the channel name for comparison
-    normalized_name = channel_name.lower().replace("@", "").replace("karaoke", "").strip()
-
-    for url in channel_urls:
-        # Extract channel name from URL
-        if "/@" in url:
-            url_channel_name = url.split("/@")[1].split("/")[0].lower()
-            if url_channel_name == normalized_name or url_channel_name.replace("karaoke", "").strip() == normalized_name:
-                return url
-
-    return None
-
-
 def main():
     parser = argparse.ArgumentParser(
-        description="Karaoke Video Downloader - Download YouTube playlists and channel videos for karaoke (default: downloads latest videos from all channels)",
+        description="Karaoke Video Downloader - Download YouTube playlists and channel videos for karaoke",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
 Examples:
-  python download_karaoke.py --limit 10  # Download latest 10 videos from all channels
-  python download_karaoke.py --songlist-only --limit 10  # Download only songlist songs across channels
-  python download_karaoke.py --channel-focus SingKingKaraoke --limit 5  # Download from specific channel
-  python download_karaoke.py --channel-focus SingKingKaraoke --all-videos  # Download ALL videos from channel
-  python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos  # Download from specific channel URL
-  python download_karaoke.py --file data/channels.txt  # Download from custom channel list
+  python download_karaoke.py https://www.youtube.com/playlist?list=XYZ
+  python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
+  python download_karaoke.py --file data/channels.txt
   python download_karaoke.py --reset-channel SingKingKaraoke --delete-files
 """,
     )
@@ -182,34 +92,13 @@ Examples:
     parser.add_argument(
         "--songlist-priority",
         action="store_true",
-        help="Prioritize downloads based on songList.json in the data directory (default: enabled)",
+        help="Prioritize downloads based on data/songList.json (default: enabled)",
     )
     parser.add_argument(
         "--no-songlist-priority",
         action="store_true",
         help="Disable songlist prioritization",
     )
-    parser.add_argument(
-        "--generate-unmatched-report",
-        action="store_true",
-        help="Generate a report of songs that couldn't be found in any channel (runs after downloads)",
-    )
-    parser.add_argument(
-        "--show-pagination",
-        action="store_true",
-        help="Show page-by-page progress when downloading channel video lists (slower but more detailed)",
-    )
-    parser.add_argument(
-        "--parallel-channels",
-        action="store_true",
-        help="Enable parallel channel scanning for faster channel processing (scans multiple channels simultaneously)",
-    )
-    parser.add_argument(
-        "--channel-workers",
-        type=int,
-        default=3,
-        help="Number of parallel channel scanning workers (default: 3, max: 10)",
-    )
     parser.add_argument(
         "--songlist-only",
         action="store_true",
@@ -221,16 +110,6 @@ Examples:
         metavar="PLAYLIST_TITLE",
         help='Focus on specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100")',
     )
-    parser.add_argument(
-        "--songlist-file",
-        metavar="FILE_PATH",
-        help="Custom songlist file path to use with --songlist-focus (default: songList.json in the data directory)",
-    )
-    parser.add_argument(
-        "--force",
-        action="store_true",
-        help="Force download from channels regardless of whether songs are already downloaded, on server, or marked as duplicates",
-    )
     parser.add_argument(
         "--songlist-status",
         action="store_true",
@@ -267,7 +146,7 @@ Examples:
     parser.add_argument(
         "--latest-per-channel",
         action="store_true",
-        help="Download the latest N videos from each channel (use with --limit) [DEPRECATED: This is now the default behavior]",
+        help="Download the latest N videos from each channel (use with --limit)",
     )
     parser.add_argument(
         "--fuzzy-match",
@@ -277,50 +156,19 @@ Examples:
     parser.add_argument(
         "--fuzzy-threshold",
         type=int,
-        default=DEFAULT_FUZZY_THRESHOLD,
-        help=f"Fuzzy match threshold (0-100, default {DEFAULT_FUZZY_THRESHOLD})",
+        default=90,
+        help="Fuzzy match threshold (0-100, default 90)",
     )
     parser.add_argument(
         "--parallel",
         action="store_true",
-        help="Enable parallel downloads for improved speed (3-5x faster for large batches, defaults to 3 workers)",
+        help="Enable parallel downloads for improved speed",
     )
     parser.add_argument(
         "--workers",
         type=int,
         default=3,
-        help="Number of parallel download workers (default: 3, max: 10, only used with --parallel)",
-    )
-    parser.add_argument(
-        "--generate-songlist",
-        nargs="+",
-        metavar="DIRECTORY",
-        help="Generate song list from MP4 files with ID3 tags in specified directories",
-    )
-    parser.add_argument(
-        "--no-append-songlist",
-        action="store_true",
-        help="Create a new song list instead of appending when using --generate-songlist",
-    )
-    parser.add_argument(
-        "--manual",
-        action="store_true",
-        help="Download from manual videos collection (manual_videos.json in the data directory)",
-    )
-    parser.add_argument(
-        "--channel-focus",
-        type=str,
-        help="Download from a specific channel by name (e.g., 'SingKingKaraoke')",
-    )
-    parser.add_argument(
-        "--all-videos",
-        action="store_true",
-        help="Download all videos from channel (not just songlist matches), skipping existing files",
-    )
-    parser.add_argument(
-        "--dry-run",
-        action="store_true",
-        help="Build download plan and show what would be downloaded without actually downloading anything",
+        help="Number of parallel download workers (default: 3, max: 10)",
     )
     args = parser.parse_args()
 
@@ -329,11 +177,6 @@ Examples:
         print("❌ Error: --workers must be between 1 and 10")
         sys.exit(1)
 
-    # Validate channel workers argument
-    if args.channel_workers < 1 or args.channel_workers > 10:
-        print("❌ Error: --channel-workers must be between 1 and 10")
-        sys.exit(1)
-
     # Load configuration to get platform-aware yt-dlp path
     from karaoke_downloader.config_manager import load_config
     config = load_config()
@@ -344,12 +187,13 @@ Examples:
         # It's a command string, test if it works
         try:
             import subprocess
-            cmd = yt_dlp_path.split() + ["--version"]
+            from karaoke_downloader.youtube_utils import _parse_yt_dlp_command
+            cmd = _parse_yt_dlp_command(yt_dlp_path) + ["--version"]
             result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
             if result.returncode != 0:
                 raise Exception(f"Command failed: {result.stderr}")
         except Exception as e:
-            platform_name = "macOS" if sys.platform == "darwin" else "Windows"
+            platform_name = "macOS" if sys.platform == "darwin" else "Windows" if sys.platform == "win32" else "Linux"
             print(f"❌ Error: yt-dlp command failed: {yt_dlp_path}")
             print(f"Please ensure yt-dlp is properly installed for {platform_name}")
             print(f"Error: {e}")
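The right-hand side of this hunk extends the platform-name chain with Linux. As a standalone sketch of that chained conditional (the helper name is illustrative; the CLI inlines the expression):

```python
import sys

def platform_name():
    # Mirrors the chained conditional in the hunk above:
    # macOS on darwin, Windows on win32, Linux otherwise.
    if sys.platform == "darwin":
        return "macOS"
    if sys.platform == "win32":
        return "Windows"
    return "Linux"
```

Treating Linux as the fallthrough case keeps the check short, at the cost of reporting "Linux" on any other POSIX platform.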
@@ -358,7 +202,7 @@ Examples:
         # It's a file path, check if it exists
         yt_dlp_file = Path(yt_dlp_path)
         if not yt_dlp_file.exists():
-            platform_name = "macOS" if sys.platform == "darwin" else "Windows"
+            platform_name = "macOS" if sys.platform == "darwin" else "Windows" if sys.platform == "win32" else "Linux"
             binary_name = yt_dlp_file.name
             print(f"❌ Error: {binary_name} not found in downloader/ directory")
             print(f"Please ensure {binary_name} is present in the downloader/ folder for {platform_name}")
@@ -392,19 +236,9 @@ Examples:
     if args.songlist_focus:
         downloader.songlist_focus_titles = args.songlist_focus
         downloader.songlist_only = True  # Enable songlist-only mode when focusing
-        args.songlist_only = True  # Also set the args flag to ensure CLI logic works
         print(
             f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}"
         )
-    if args.songlist_file:
-        downloader.songlist_file_path = args.songlist_file
-        print(f"📁 Using custom songlist file: {args.songlist_file}")
-    if args.force:
-        downloader.force_download = True
-        print("💪 Force mode enabled - will download regardless of existing files or server duplicates")
-    if args.dry_run:
-        downloader.dry_run = True
-        print("🔍 Dry run mode enabled - will show download plan without downloading")
     if args.resolution != "720p":
         downloader.config_manager.update_resolution(args.resolution)
 
@@ -418,16 +252,17 @@ Examples:
         sys.exit(0)
     # --- END NEW ---
 
-    # --- NEW: If no URL or file is provided, but --songlist-only is set, use all channels ---
-    if (args.songlist_only or args.songlist_focus) and not args.url and not args.file:
-        channel_urls = load_channels()
-        if channel_urls:
+    # --- NEW: If no URL or file is provided, but --songlist-only is set, use all channels in data/channels.txt ---
+    if args.songlist_only and not args.url and not args.file:
+        channels_file = Path("data/channels.txt")
+        if channels_file.exists():
+            args.file = str(channels_file)
             print(
-                "📋 No URL or --file provided, defaulting to all configured channels for songlist mode."
+                "📋 No URL or --file provided, defaulting to all channels in data/channels.txt for songlist-only mode."
             )
         else:
             print(
-                "❌ No URL, --file, or channel configuration found. Please provide a channel URL or create channels.json in the data directory."
+                "❌ No URL, --file, or data/channels.txt found. Please provide a channel URL or a file with channel URLs."
             )
             sys.exit(1)
     # --- END NEW ---
@@ -447,22 +282,6 @@ Examples:
         print("ℹ️ Songs will be re-checked against the server on next run.")
         sys.exit(0)
-
-    if args.generate_songlist:
-        from karaoke_downloader.songlist_generator import SongListGenerator
-
-        print("🎵 Generating song list from MP4 files with ID3 tags...")
-        generator = SongListGenerator()
-        try:
-            generator.generate_songlist_from_multiple_directories(
-                args.generate_songlist,
-                append=not args.no_append_songlist
-            )
-            print("✅ Song list generation completed successfully!")
-        except Exception as e:
-            print(f"❌ Error generating song list: {e}")
-            sys.exit(1)
-        sys.exit(0)
 
     if args.status:
         stats = downloader.tracker.get_statistics()
         print("🎤 Karaoke Downloader Status")
@@ -480,10 +299,9 @@ Examples:
         print("💾 Channel Cache Information")
         print("=" * 40)
         print(f"Total Channels: {cache_info['total_channels']}")
-        print(f"Total Cached Videos: {cache_info['total_videos']}")
-        print("\n📋 Channel Details:")
-        for channel in cache_info['channels']:
-            print(f" • {channel['channel']}: {channel['videos']} videos (updated: {channel['last_updated']})")
+        print(f"Total Cached Videos: {cache_info['total_cached_videos']}")
+        print(f"Cache Duration: {cache_info['cache_duration_hours']} hours")
+        print(f"Last Updated: {cache_info['last_updated']}")
         sys.exit(0)
     elif args.clear_cache:
         if args.clear_cache == "all":
@@ -523,243 +341,71 @@ Examples:
             if len(tracking) > 10:
                 print(f" ... and {len(tracking) - 10} more")
         sys.exit(0)
-    elif args.manual:
-        # Download from manual videos collection
-        print("🎤 Downloading from manual videos collection...")
-        success = downloader.download_channel_videos(
-            "manual://static",
-            force_refresh=args.refresh,
-            fuzzy_match=args.fuzzy_match,
-            fuzzy_threshold=args.fuzzy_threshold,
-            force_download=args.force,
-        )
-    elif args.channel_focus:
-        # Download from a specific channel by name
-        print(f"🎤 Looking up channel: {args.channel_focus}")
-        channel_url = get_channel_url_by_name(args.channel_focus)
-
-        if not channel_url:
-            print(f"❌ Channel '{args.channel_focus}' not found in configuration")
-            print("Available channels:")
-            channel_urls = load_channels()
-            for url in channel_urls:
-                if "/@" in url:
-                    channel_name = url.split("/@")[1].split("/")[0]
-                    print(f" • {channel_name}")
-            sys.exit(1)
-
-        if args.all_videos:
-            # Download ALL videos from the channel (not just songlist matches)
-            print(f"🎤 Downloading ALL videos from channel: {args.channel_focus} ({channel_url})")
-            success = downloader.download_all_channel_videos(
-                channel_url,
-                force_refresh=args.refresh,
-                force_download=args.force,
-                limit=args.limit,
-                dry_run=args.dry_run,
-            )
-        else:
-            # Download only songlist matches from the channel
-            print(f"🎤 Downloading from channel: {args.channel_focus} ({channel_url})")
-            success = downloader.download_channel_videos(
-                channel_url,
-                force_refresh=args.refresh,
-                fuzzy_match=args.fuzzy_match,
-                fuzzy_threshold=args.fuzzy_threshold,
-                force_download=args.force,
-                dry_run=args.dry_run,
-            )
     elif args.songlist_only or args.songlist_focus:
-        # Use provided file or default to channels configuration
-        channel_urls = load_channels(args.file)
-        if not channel_urls:
-            print(f"❌ No channels found in configuration")
-            sys.exit(1)
-        limit = args.limit if args.limit else None
-        success = downloader.download_songlist_across_channels(
-            channel_urls,
-            limit=args.limit,
-            force_refresh_download_plan=args.force_download_plan if hasattr(args, "force_download_plan") else False,
-            fuzzy_match=args.fuzzy_match,
-            fuzzy_threshold=args.fuzzy_threshold,
-            force_download=args.force,
-            show_pagination=args.show_pagination,
-            parallel_channels=args.parallel_channels,
-            max_channel_workers=args.channel_workers,
-            dry_run=args.dry_run,
-        )
-    elif args.latest_per_channel:
-        # Use provided file or default to channels configuration
-        channel_urls = load_channels(args.file)
-        if not channel_urls:
-            print(f"❌ No channels found in configuration")
-            sys.exit(1)
-        limit = args.limit if args.limit else DEFAULT_LATEST_PER_CHANNEL_LIMIT
-        force_refresh_download_plan = (
-            args.force_download_plan if hasattr(args, "force_download_plan") else False
-        )
-        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
-        fuzzy_threshold = (
-            args.fuzzy_threshold
-            if hasattr(args, "fuzzy_threshold")
-            else DEFAULT_FUZZY_THRESHOLD
-        )
-        success = downloader.download_latest_per_channel(
-            channel_urls,
-            limit=limit,
-            force_refresh_download_plan=force_refresh_download_plan,
-            fuzzy_match=fuzzy_match,
-            fuzzy_threshold=fuzzy_threshold,
-            force_download=args.force,
-            dry_run=args.dry_run,
-        )
-    elif args.url:
-        success = downloader.download_channel_videos(
-            args.url, force_refresh=args.refresh, dry_run=args.dry_run
-        )
-    else:
-        # Default behavior: download from channels (equivalent to --latest-per-channel)
-        print("🎯 No specific mode specified, defaulting to download from channels")
-        channel_urls = load_channels(args.file)
-        if not channel_urls:
-            print(f"❌ No channels found in configuration")
-            print("Please provide a channel URL or create channels.json in the data directory")
-            sys.exit(1)
-        limit = args.limit if args.limit else DEFAULT_LATEST_PER_CHANNEL_LIMIT
-        force_refresh_download_plan = (
-            args.force_download_plan if hasattr(args, "force_download_plan") else False
-        )
-        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
-        fuzzy_threshold = (
-            args.fuzzy_threshold
-            if hasattr(args, "fuzzy_threshold")
-            else DEFAULT_FUZZY_THRESHOLD
-        )
-        success = downloader.download_latest_per_channel(
-            channel_urls,
-            limit=limit,
-            force_refresh_download_plan=force_refresh_download_plan,
-            fuzzy_match=fuzzy_match,
-            fuzzy_threshold=fuzzy_threshold,
-            force_download=args.force,
-            dry_run=args.dry_run,
-        )
-
-    # Generate unmatched report if requested (additive feature)
-    if args.generate_unmatched_report:
-        from karaoke_downloader.download_planner import generate_unmatched_report, build_download_plan
-        from karaoke_downloader.songlist_manager import load_songlist
-
-        print("\n🔍 Generating unmatched songs report...")
-
-        # Load songlist based on focus mode
-        if args.songlist_focus:
-            # Load focused playlists
-            songlist_file_path = args.songlist_file if args.songlist_file else str(get_data_path_manager().get_songlist_path())
-            songlist_file = Path(songlist_file_path)
-            if not songlist_file.exists():
-                print(f"⚠️ Songlist file not found: {songlist_file_path}")
-            else:
-                try:
-                    with open(songlist_file, "r", encoding="utf-8") as f:
-                        raw_data = json.load(f)
-
-                    # Filter playlists by title
-                    focused_playlists = []
-                    for playlist in raw_data:
-                        playlist_title = playlist.get("title", "")
-                        if playlist_title in args.songlist_focus:
-                            focused_playlists.append(playlist)
-
-                    if focused_playlists:
-                        # Flatten the focused playlists into songs
-                        focused_songs = []
-                        seen = set()
-                        for playlist in focused_playlists:
-                            if "songs" in playlist:
-                                for song in playlist["songs"]:
-                                    if "artist" in song and "title" in song:
-                                        artist = song["artist"].strip()
-                                        title = song["title"].strip()
-                                        key = f"{artist.lower()}_{title.lower()}"
-                                        if key in seen:
-                                            continue
-                                        seen.add(key)
-                                        focused_songs.append(
-                                            {
-                                                "artist": artist,
-                                                "title": title,
-                                                "position": song.get("position", 0),
-                                            }
-                                        )
-
-                        songlist = focused_songs
-                    else:
-                        print(f"⚠️ No playlists found matching: {', '.join(args.songlist_focus)}")
-                        songlist = []
-
-                except (json.JSONDecodeError, FileNotFoundError) as e:
-                    print(f"⚠️ Could not load songlist for report: {e}")
-                    songlist = []
-        else:
-            # Load all songs from songlist
-            songlist_path = args.songlist_file if args.songlist_file else str(get_data_path_manager().get_songlist_path())
-            songlist = load_songlist(songlist_path)
-
-        if songlist:
-            # Load channel URLs
-            channel_file = args.file if args.file else str(get_data_path_manager().get_channels_txt_path())
-            if os.path.exists(channel_file):
-                with open(channel_file, "r", encoding='utf-8') as f:
-                    channel_urls = [
-                        line.strip()
-                        for line in f
-                        if line.strip() and not line.strip().startswith("#")
-                    ]
-
-                print(f"📋 Analyzing {len(songlist)} songs against {len(channel_urls)} channels...")
-
-                # Build download plan to get unmatched songs
-                fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
-                fuzzy_threshold = (
-                    args.fuzzy_threshold
-                    if hasattr(args, "fuzzy_threshold")
-                    else DEFAULT_FUZZY_THRESHOLD
-                )
-
-                try:
-                    download_plan, unmatched = build_download_plan(
-                        channel_urls,
-                        songlist,
-                        downloader.tracker,
-                        downloader.yt_dlp_path,
-                        fuzzy_match=fuzzy_match,
-                        fuzzy_threshold=fuzzy_threshold,
-                    )
-
-                    if unmatched:
-                        report_file = generate_unmatched_report(unmatched)
-                        print(f"\n📋 Unmatched songs report generated successfully!")
-                        print(f"📁 Report saved to: {report_file}")
-                        print(f"📊 Summary: {len(download_plan)} songs found, {len(unmatched)} songs not found")
-                        print(f"\n🔍 First 10 unmatched songs:")
-                        for i, song in enumerate(unmatched[:10], 1):
-                            print(f" {i:2d}. {song['artist']} - {song['title']}")
-                        if len(unmatched) > 10:
-                            print(f" ... and {len(unmatched) - 10} more songs")
-                    else:
-                        print(f"\n✅ All {len(songlist)} songs were found in the channels!")
-
-                except Exception as e:
-                    print(f"❌ Error generating report: {e}")
-            else:
-                print(f"❌ Channel file not found: {channel_file}")
+        # Use provided file or default to data/channels.txt
+        channel_file = args.file if args.file else "data/channels.txt"
+        if not os.path.exists(channel_file):
+            print(f"❌ Channel file not found: {channel_file}")
+            sys.exit(1)
+        with open(channel_file, "r", encoding="utf-8") as f:
+            channel_urls = [
+                line.strip()
+                for line in f
+                if line.strip() and not line.strip().startswith("#")
+            ]
+        limit = args.limit if args.limit else None
+        force_refresh_download_plan = (
+            args.force_download_plan if hasattr(args, "force_download_plan") else False
+        )
+        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
+        fuzzy_threshold = (
+            args.fuzzy_threshold
+            if hasattr(args, "fuzzy_threshold")
+            else DEFAULT_FUZZY_THRESHOLD
+        )
+        success = downloader.download_songlist_across_channels(
+            channel_urls,
+            limit=limit,
+            force_refresh_download_plan=force_refresh_download_plan,
+            fuzzy_match=fuzzy_match,
+            fuzzy_threshold=fuzzy_threshold,
+        )
+    elif args.latest_per_channel:
+        # Use provided file or default to data/channels.txt
+        channel_file = args.file if args.file else "data/channels.txt"
+        if not os.path.exists(channel_file):
+            print(f"❌ Channel file not found: {channel_file}")
+            sys.exit(1)
+        with open(channel_file, "r", encoding="utf-8") as f:
+            channel_urls = [
+                line.strip()
+                for line in f
+                if line.strip() and not line.strip().startswith("#")
+            ]
+        limit = args.limit if args.limit else DEFAULT_LATEST_PER_CHANNEL_LIMIT
+        force_refresh_download_plan = (
+            args.force_download_plan if hasattr(args, "force_download_plan") else False
+        )
+        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
+        fuzzy_threshold = (
+            args.fuzzy_threshold
+            if hasattr(args, "fuzzy_threshold")
+            else DEFAULT_FUZZY_THRESHOLD
+        )
+        success = downloader.download_latest_per_channel(
+            channel_urls,
+            limit=limit,
+            force_refresh_download_plan=force_refresh_download_plan,
+            fuzzy_match=fuzzy_match,
+            fuzzy_threshold=fuzzy_threshold,
+        )
+    elif args.url:
+        success = downloader.download_channel_videos(
+            args.url, force_refresh=args.refresh
|
||||||
|
)
|
||||||
else:
|
else:
|
||||||
print("❌ No songlist available for report generation")
|
parser.print_help()
|
||||||
|
sys.exit(1)
|
||||||
# Initialize success variable
|
|
||||||
success = False
|
|
||||||
|
|
||||||
downloader.tracker.force_save()
|
downloader.tracker.force_save()
|
||||||
if success:
|
if success:
|
||||||
print("\n🎤 All downloads completed successfully!")
|
print("\n🎤 All downloads completed successfully!")
|
||||||
|
|||||||
@@ -36,7 +36,6 @@ DEFAULT_CONFIG = {
     "folder_structure": {
         "downloads_dir": "downloads",
         "logs_dir": "logs",
-        "data_dir": "data",
        "tracking_file": "data/karaoke_tracking.json",
     },
     "logging": {
@@ -49,8 +48,9 @@ DEFAULT_CONFIG = {
     "auto_detect_platform": True,
     "yt_dlp_paths": {
         "windows": "downloader/yt-dlp.exe",
-        "macos": "downloader/yt-dlp_macos"
-    }
+        "macos": "downloader/yt-dlp_macos",
+        "linux": "downloader/yt-dlp",
+    },
     },
     "yt_dlp_path": "downloader/yt-dlp.exe",
 }
@@ -66,20 +66,23 @@ RESOLUTION_MAP = {


 def detect_platform() -> str:
-    """Detect the current platform and return platform name."""
+    """Detect the current platform and return the appropriate platform key."""
     system = platform.system().lower()
     if system == "windows":
         return "windows"
     elif system == "darwin":
         return "macos"
+    elif system == "linux":
+        return "linux"
     else:
-        return "windows"  # Default to Windows for other platforms
+        # Default to windows for unknown platforms
+        return "windows"


 def get_platform_yt_dlp_path(platform_paths: Dict[str, str]) -> str:
     """Get the appropriate yt-dlp path for the current platform."""
-    platform_name = detect_platform()
-    return platform_paths.get(platform_name, platform_paths.get("windows", "downloader/yt-dlp.exe"))
+    platform_key = detect_platform()
+    return platform_paths.get(platform_key, platform_paths.get("windows", "downloader/yt-dlp.exe"))


 @dataclass
@@ -136,7 +139,6 @@ class FolderStructure:

     downloads_dir: str = "downloads"
     logs_dir: str = "logs"
-    data_dir: str = "data"
     tracking_file: str = "data/karaoke_tracking.json"

@@ -167,21 +169,14 @@ class ConfigManager:
     Manages application configuration with loading, validation, and caching.
     """

-    def __init__(self, config_file: Union[str, Path] = "config/config.json", data_dir: Optional[str] = None):
+    def __init__(self, config_file: Union[str, Path] = "data/config.json"):
         """
         Initialize the configuration manager.

         Args:
             config_file: Path to the configuration file
-            data_dir: Optional custom data directory path
         """
-        # If config_file is relative and data_dir is provided, make it relative to data_dir
-        if data_dir and not Path(config_file).is_absolute():
-            self.config_file = Path(data_dir) / config_file
-        else:
-            self.config_file = Path(config_file)
+        self.config_file = Path(config_file)

-        self._data_dir = data_dir
         self._config: Optional[AppConfig] = None
         self._last_modified: Optional[datetime] = None

@@ -342,35 +337,27 @@ class ConfigManager:
 _config_manager: Optional[ConfigManager] = None


-def get_config_manager(config_file: Optional[Union[str, Path]] = None, data_dir: Optional[str] = None) -> ConfigManager:
+def get_config_manager() -> ConfigManager:
     """
     Get the global configuration manager instance.

-    Args:
-        config_file: Optional path to config file (default: "config.json" in root)
-        data_dir: Optional custom data directory path
-
     Returns:
         ConfigManager instance
     """
     global _config_manager
-    if _config_manager is None or config_file is not None or data_dir is not None:
-        if config_file is None:
-            config_file = "config/config.json"
-        _config_manager = ConfigManager(config_file, data_dir)
+    if _config_manager is None:
+        _config_manager = ConfigManager()
     return _config_manager


-def load_config(force_reload: bool = False, config_file: Optional[Union[str, Path]] = None, data_dir: Optional[str] = None) -> AppConfig:
+def load_config(force_reload: bool = False) -> AppConfig:
     """
     Load configuration using the global manager.

     Args:
         force_reload: Force reload even if file hasn't changed
-        config_file: Optional path to config file (default: "config.json" in root)
-        data_dir: Optional custom data directory path

     Returns:
         AppConfig instance
     """
-    return get_config_manager(config_file, data_dir).load_config(force_reload)
+    return get_config_manager().load_config(force_reload)
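The simplified accessor in this hunk is a plain lazy-singleton: the first call constructs the manager, every later call returns the same instance. A minimal sketch of that pattern, using stand-in names modeled on the hunk (the real `ConfigManager` does far more):

```python
from pathlib import Path
from typing import Optional


class ConfigManager:
    """Minimal stand-in for the project's ConfigManager."""

    def __init__(self, config_file: str = "data/config.json"):
        self.config_file = Path(config_file)


_config_manager: Optional[ConfigManager] = None


def get_config_manager() -> ConfigManager:
    """Create the singleton on first use; reuse it afterwards."""
    global _config_manager
    if _config_manager is None:
        _config_manager = ConfigManager()
    return _config_manager
```

Dropping the `config_file`/`data_dir` parameters is what makes the singleton safe: in the old version, passing a different path silently replaced the shared instance for every other caller.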
@@ -1,184 +0,0 @@
-"""
-Data path management utilities for the karaoke downloader.
-Provides centralized data directory path management and file path resolution.
-"""
-
-import os
-from pathlib import Path
-from typing import Optional
-
-from .config_manager import get_config_manager
-
-
-class DataPathManager:
-    """
-    Manages data directory paths and provides utilities for resolving file paths
-    relative to the configured data directory.
-    """
-
-    def __init__(self, data_dir: Optional[str] = None):
-        """
-        Initialize the data path manager.
-
-        Args:
-            data_dir: Optional custom data directory path. If None, uses config.
-        """
-        self._data_dir = data_dir
-
-        # If a custom data directory is provided, look for config.json in that directory
-        if data_dir:
-            config_file = Path(data_dir) / "config.json"
-            self._config_manager = get_config_manager(str(config_file))
-        else:
-            # Otherwise, use the default config.json in the root directory
-            self._config_manager = get_config_manager()
-
-    @property
-    def data_dir(self) -> Path:
-        """
-        Get the configured data directory path.
-
-        Returns:
-            Path to the data directory
-        """
-        if self._data_dir:
-            return Path(self._data_dir)
-
-        # Get from config
-        config = self._config_manager.get_config()
-        data_dir = getattr(config.folder_structure, 'data_dir', 'data')
-        return Path(data_dir)
-
-    def get_path(self, filename: str) -> Path:
-        """
-        Get the full path to a file in the data directory.
-
-        Args:
-            filename: Name of the file (e.g., 'config.json', 'channels.json')
-
-        Returns:
-            Full path to the file
-        """
-        return self.data_dir / filename
-
-    def get_channels_json_path(self) -> Path:
-        """Get path to channels.json file."""
-        return self.get_path('channels.json')
-
-    def get_channels_txt_path(self) -> Path:
-        """Get path to channels.txt file."""
-        return self.get_path('channels.txt')
-
-    def get_songlist_path(self) -> Path:
-        """Get path to songList.json file."""
-        return self.get_path('songList.json')
-
-    def get_songlist_tracking_path(self) -> Path:
-        """Get path to songlist_tracking.json file."""
-        return self.get_path('songlist_tracking.json')
-
-    def get_karaoke_tracking_path(self) -> Path:
-        """Get path to karaoke_tracking.json file."""
-        return self.get_path('karaoke_tracking.json')
-
-    def get_server_duplicates_tracking_path(self) -> Path:
-        """Get path to server_duplicates_tracking.json file."""
-        return self.get_path('server_duplicates_tracking.json')
-
-    def get_manual_videos_path(self) -> Path:
-        """Get path to manual_videos.json file."""
-        return self.get_path('manual_videos.json')
-
-    def get_songs_path(self) -> Path:
-        """Get path to songs.json file."""
-        return self.get_path('songs.json')
-
-    def get_channel_cache_dir(self) -> Path:
-        """Get path to channel_cache directory."""
-        return self.get_path('channel_cache')
-
-    def get_channel_cache_path(self, channel_id: str) -> Path:
-        """Get path to a specific channel cache file."""
-        return self.get_channel_cache_dir() / f"{channel_id}.json"
-
-    def get_download_plan_cache_path(self, plan_name: str, **kwargs) -> Path:
-        """Get path to download plan cache file."""
-        # Create a hash from kwargs for unique cache files
-        import hashlib
-        if kwargs:
-            kwargs_str = str(sorted(kwargs.items()))
-            hash_suffix = hashlib.md5(kwargs_str.encode()).hexdigest()[:8]
-            plan_name = f"{plan_name}_{hash_suffix}"
-        return self.get_path(f"plan_latest_per_channel_{plan_name}.json")
-
-    def get_unmatched_report_path(self, timestamp: Optional[str] = None) -> Path:
-        """Get path to unmatched songs report file."""
-        if timestamp:
-            return self.get_path(f"unmatched_songs_report_{timestamp}.json")
-        return self.get_path("unmatched_songs_report.json")
-
-    def ensure_data_dir_exists(self) -> None:
-        """Ensure the data directory exists."""
-        self.data_dir.mkdir(parents=True, exist_ok=True)
-
-    def list_data_files(self) -> list:
-        """List all files in the data directory."""
-        if not self.data_dir.exists():
-            return []
-
-        files = []
-        for file_path in self.data_dir.iterdir():
-            if file_path.is_file():
-                files.append(file_path.name)
-        return sorted(files)
-
-    def file_exists(self, filename: str) -> bool:
-        """Check if a file exists in the data directory."""
-        return self.get_path(filename).exists()
-
-
-# Global data path manager instance
-_data_path_manager: Optional[DataPathManager] = None
-
-
-def get_data_path_manager(data_dir: Optional[str] = None) -> DataPathManager:
-    """
-    Get the global data path manager instance.
-
-    Args:
-        data_dir: Optional custom data directory path
-
-    Returns:
-        DataPathManager instance
-    """
-    global _data_path_manager
-    if _data_path_manager is None or data_dir is not None:
-        _data_path_manager = DataPathManager(data_dir)
-    return _data_path_manager
-
-
-def get_data_path(filename: str, data_dir: Optional[str] = None) -> Path:
-    """
-    Get the full path to a file in the data directory.
-
-    Args:
-        filename: Name of the file
-
-    Returns:
-        Full path to the file
-    """
-    return get_data_path_manager(data_dir).get_path(filename)
-
-
-def get_data_dir(data_dir: Optional[str] = None) -> Path:
-    """
-    Get the configured data directory path.
-
-    Args:
-        data_dir: Optional custom data directory path
-
-    Returns:
-        Path to the data directory
-    """
-    return get_data_path_manager(data_dir).data_dir
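One detail worth keeping from the deleted module is how `get_download_plan_cache_path` derived a distinct cache file per parameter set: it hashes the sorted kwargs and appends a short suffix to the plan name. A standalone sketch of that naming scheme (the helper name and filename prefix are copied from the removed code; nothing else is assumed):

```python
import hashlib


def plan_cache_name(plan_name: str, **kwargs) -> str:
    """Derive a stable, per-parameter-set cache filename.

    Identical kwargs always hash to the same 8-char suffix, so repeated runs
    with the same options reuse one cache file, while different options get
    their own.
    """
    if kwargs:
        kwargs_str = str(sorted(kwargs.items()))  # sorted => order-independent
        hash_suffix = hashlib.md5(kwargs_str.encode()).hexdigest()[:8]
        plan_name = f"{plan_name}_{hash_suffix}"
    return f"plan_latest_per_channel_{plan_name}.json"
```

Sorting the items before hashing is what makes `limit=5, fuzzy=True` and `fuzzy=True, limit=5` map to the same file.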
@@ -20,12 +20,6 @@ from karaoke_downloader.youtube_utils import (
     execute_yt_dlp_command,
     show_available_formats,
 )
-from karaoke_downloader.file_utils import (
-    cleanup_temp_files,
-    get_unique_filename,
-    is_valid_mp4_file,
-    sanitize_filename,
-)


 class DownloadPipeline:
@@ -69,15 +63,9 @@ class DownloadPipeline:
             True if successful, False otherwise
         """
         try:
-            # Step 1: Prepare file path and check for existing files
-            output_path, file_exists = get_unique_filename(self.downloads_dir, channel_name, artist, title)
-            if file_exists:
-                print(f"⏭️ Skipping download - file already exists: {output_path.name}")
-                # Still add tags and track the existing file
-                if self._add_tags(output_path, artist, title, channel_name):
-                    self._track_download(output_path, artist, title, video_id, channel_name)
-                return True
+            # Step 1: Prepare file path
+            filename = sanitize_filename(artist, title)
+            output_path = self.downloads_dir / channel_name / filename

             # Step 2: Download video
             if not self._download_video(video_id, output_path, artist, title, channel_name):
@@ -226,10 +214,8 @@ class DownloadPipeline:
     ) -> bool:
         """Step 3: Add ID3 tags to the downloaded file."""
         try:
-            # Use the same artist/title as the filename for consistency
-            # Don't add "(Karaoke Version)" to the ID3 tag title
             add_id3_tags(
-                output_path, f"{artist} - {title}", channel_name
+                output_path, f"{artist} - {title} (Karaoke Version)", channel_name
             )
             print(f"🏷️ Added ID3 tags: {artist} - {title}")
             return True
@@ -297,10 +283,9 @@ class DownloadPipeline:
             video_title = video.get("title", "")

             # Extract artist and title from video title
-            from karaoke_downloader.channel_parser import ChannelParser
+            from karaoke_downloader.id3_utils import extract_artist_title

-            channel_parser = ChannelParser()
-            artist, title = channel_parser.extract_artist_title(video_title, channel_name)
+            artist, title = extract_artist_title(video_title)

             print(f" ({i}/{total}) Processing: {artist} - {title}")

@@ -3,31 +3,19 @@ Download plan building utilities.
 Handles pre-scanning channels and building download plans.
 """

-import concurrent.futures
-import hashlib
-import json
-import sys
-from datetime import datetime
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
-
 from karaoke_downloader.cache_manager import (
     delete_plan_cache,
     get_download_plan_cache_file,
     load_cached_plan,
     save_plan_cache,
 )
-# Import all fuzzy matching functions
 from karaoke_downloader.fuzzy_matcher import (
     create_song_key,
-    create_video_key,
+    extract_artist_title,
     get_similarity_function,
     is_exact_match,
     is_fuzzy_match,
-    normalize_title,
 )
-from karaoke_downloader.channel_parser import ChannelParser
-from karaoke_downloader.data_path_manager import get_data_path_manager
 from karaoke_downloader.youtube_utils import get_channel_info

 # Constants
@@ -35,156 +23,6 @@ DEFAULT_FILENAME_LENGTH_LIMIT = 100
 DEFAULT_ARTIST_LENGTH_LIMIT = 30
 DEFAULT_TITLE_LENGTH_LIMIT = 60
 DEFAULT_FUZZY_THRESHOLD = 85
-DEFAULT_DISPLAY_LIMIT = 10
-
-
-def generate_unmatched_report(unmatched: List[Dict[str, Any]], report_path: str = None) -> str:
-    """
-    Generate a detailed report of unmatched songs and save it to a file.
-
-    Args:
-        unmatched: List of unmatched songs from build_download_plan
-        report_path: Optional path to save the report (default: data/unmatched_songs_report.json)
-
-    Returns:
-        Path to the saved report file
-    """
-    if report_path is None:
-        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        report_path = str(get_data_path_manager().get_unmatched_report_path(timestamp))
-
-    report_data = {
-        "generated_at": datetime.now().isoformat(),
-        "total_unmatched": len(unmatched),
-        "unmatched_songs": []
-    }
-
-    for song in unmatched:
-        report_data["unmatched_songs"].append({
-            "artist": song["artist"],
-            "title": song["title"],
-            "position": song.get("position", 0),
-            "search_key": create_song_key(song["artist"], song["title"])
-        })
-
-    # Sort by artist, then by title for easier reading
-    report_data["unmatched_songs"].sort(key=lambda x: (x["artist"].lower(), x["title"].lower()))
-
-    # Ensure the data directory exists
-    report_file = Path(report_path)
-    report_file.parent.mkdir(parents=True, exist_ok=True)
-
-    # Save the report
-    with open(report_file, 'w', encoding='utf-8') as f:
-        json.dump(report_data, f, indent=2, ensure_ascii=False)
-
-    return str(report_file)
-
-
-def _scan_channel_for_matches(
-    channel_url,
-    channel_name,
-    channel_id,
-    song_keys,
-    song_lookup,
-    fuzzy_match,
-    fuzzy_threshold,
-    show_pagination,
-    yt_dlp_path,
-    tracker,
-):
-    """
-    Scan a single channel for matches (used in parallel processing).
-
-    Args:
-        channel_url: URL of the channel to scan
-        channel_name: Name of the channel
-        channel_id: ID of the channel
-        song_keys: Set of song keys to match against
-        song_lookup: Dictionary mapping song keys to song data
-        fuzzy_match: Whether to use fuzzy matching
-        fuzzy_threshold: Threshold for fuzzy matching
-        show_pagination: Whether to show pagination progress
-        yt_dlp_path: Path to yt-dlp executable
-        tracker: Tracking manager instance
-
-    Returns:
-        List of video matches found in this channel
-    """
-    print(f"\n🚦 Scanning channel: {channel_name} ({channel_url})")
-
-    # Get channel info if not provided
-    if not channel_name or not channel_id:
-        channel_name, channel_id = get_channel_info(channel_url)
-
-    # Fetch video list from channel
-    available_videos = tracker.get_channel_video_list(
-        channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
-    )
-
-    print(f" 📊 Channel has {len(available_videos)} videos to scan")
-
-    video_matches = []
-
-    # Pre-process video titles for efficient matching
-    channel_parser = ChannelParser()
-    if fuzzy_match:
-        # For fuzzy matching, create normalized video keys
-        for video in available_videos:
-            v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
-            video_key = create_song_key(v_artist, v_title)
-
-            # Find best match among remaining songs
-            best_match = None
-            best_score = 0
-            for song_key in song_keys:
-                if song_key in song_lookup:  # Only check unmatched songs
-                    score = get_similarity_function()(song_key, video_key)
-                    if score >= fuzzy_threshold and score > best_score:
-                        best_score = score
-                        best_match = song_key
-
-            if best_match:
-                song = song_lookup[best_match]
-                video_matches.append(
-                    {
-                        "artist": song["artist"],
-                        "title": song["title"],
-                        "channel_name": channel_name,
-                        "channel_url": channel_url,
-                        "video_id": video["id"],
-                        "video_title": video["title"],
-                        "match_score": best_score,
-                    }
-                )
-                # Remove matched song from future consideration
-                del song_lookup[best_match]
-                song_keys.remove(best_match)
-    else:
-        # For exact matching, use direct key comparison
-        for video in available_videos:
-            v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
-            video_key = create_song_key(v_artist, v_title)
-
-            if video_key in song_keys:
-                song = song_lookup[video_key]
-                video_matches.append(
-                    {
-                        "artist": song["artist"],
-                        "title": song["title"],
-                        "channel_name": channel_name,
-                        "channel_url": channel_url,
-                        "video_id": video["id"],
-                        "video_title": video["title"],
-                        "match_score": 100,
-                    }
-                )
-                # Remove matched song from future consideration
-                del song_lookup[video_key]
-                song_keys.remove(video_key)
-
-    print(f" ✅ Found {len(video_matches)} matches in {channel_name}")
-    return video_matches
-

 def build_download_plan(
@@ -194,9 +32,6 @@ def build_download_plan(
     yt_dlp_path,
     fuzzy_match=False,
     fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
-    show_pagination=False,
-    parallel_channels=False,
-    max_channel_workers=3,
 ):
     """
     For each song in undownloaded, scan all channels for a match.
@@ -217,120 +52,6 @@ def build_download_plan(
         song_keys.add(key)
         song_lookup[key] = song

-    if parallel_channels:
-        print(f"🚀 Running parallel channel scanning with {max_channel_workers} workers.")
-
-        # Create a thread-safe copy of song data for parallel processing
-        import threading
-        song_keys_lock = threading.Lock()
-        song_lookup_lock = threading.Lock()
-
-        def scan_channel_safe(channel_url):
-            """Thread-safe channel scanning function."""
-            print(f"\n🚦 Scanning channel: {channel_url}")
-
-            # Get channel info
-            channel_name, channel_id = get_channel_info(channel_url)
-            print(f" ✅ Channel info: {channel_name} (ID: {channel_id})")
-
-            # Fetch video list from channel
-            available_videos = tracker.get_channel_video_list(
-                channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
-            )
-            print(f" 📊 Channel has {len(available_videos)} videos to scan")
-
-            video_matches = []
-
-            # Pre-process video titles for efficient matching
-            channel_parser = ChannelParser()
-            if fuzzy_match:
-                # For fuzzy matching, create normalized video keys
-                for video in available_videos:
-                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
-                    video_key = create_song_key(v_artist, v_title)
-
-                    # Find best match among remaining songs (thread-safe)
-                    best_match = None
-                    best_score = 0
-                    with song_keys_lock:
-                        available_song_keys = list(song_keys)  # Copy for iteration
-
-                    for song_key in available_song_keys:
-                        with song_lookup_lock:
-                            if song_key in song_lookup:  # Only check unmatched songs
-                                score = get_similarity_function()(song_key, video_key)
-                                if score >= fuzzy_threshold and score > best_score:
-                                    best_score = score
-                                    best_match = song_key
-
-                    if best_match:
-                        with song_lookup_lock:
-                            if best_match in song_lookup:  # Double-check it's still available
-                                song = song_lookup[best_match]
-                                video_matches.append(
-                                    {
-                                        "artist": song["artist"],
-                                        "title": song["title"],
-                                        "channel_name": channel_name,
-                                        "channel_url": channel_url,
-                                        "video_id": video["id"],
-                                        "video_title": video["title"],
-                                        "match_score": best_score,
-                                    }
-                                )
-                                # Remove matched song from future consideration
-                                del song_lookup[best_match]
-                                with song_keys_lock:
-                                    song_keys.discard(best_match)
-            else:
-                # For exact matching, use direct key comparison
-                for video in available_videos:
-                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
-                    video_key = create_song_key(v_artist, v_title)
-
-                    with song_lookup_lock:
-                        if video_key in song_keys and video_key in song_lookup:
-                            song = song_lookup[video_key]
-                            video_matches.append(
-                                {
-                                    "artist": song["artist"],
-                                    "title": song["title"],
-                                    "channel_name": channel_name,
-                                    "channel_url": channel_url,
-                                    "video_id": video["id"],
-                                    "video_title": video["title"],
-                                    "match_score": 100,
-                                }
-                            )
-                            # Remove matched song from future consideration
-                            del song_lookup[video_key]
-                            with song_keys_lock:
-                                song_keys.discard(video_key)
-
-            print(f" ✅ Found {len(video_matches)} matches in {channel_name}")
-            return video_matches
-
-        # Execute parallel channel scanning
-        with concurrent.futures.ThreadPoolExecutor(max_workers=max_channel_workers) as executor:
-            # Submit all channel scanning tasks
-            future_to_channel = {
-                executor.submit(scan_channel_safe, channel_url): channel_url
-                for channel_url in channel_urls
-            }
-
-            # Process results as they complete
-            for future in concurrent.futures.as_completed(future_to_channel):
-                channel_url = future_to_channel[future]
-                try:
-                    video_matches = future.result()
-                    plan.extend(video_matches)
-                    channel_name, _ = get_channel_info(channel_url)
-                    channel_match_counts[channel_name] = len(video_matches)
-                except Exception as e:
-                    print(f"⚠️ Error processing channel {channel_url}: {e}")
-                    channel_name, _ = get_channel_info(channel_url)
-                    channel_match_counts[channel_name] = 0
-    else:
-        for i, channel_url in enumerate(channel_urls, 1):
-            print(f"\n🚦 Starting channel {i}/{len(channel_urls)}: {channel_url}")
-            print(f" 🔍 Getting channel info...")
+    for i, channel_url in enumerate(channel_urls, 1):
+        print(f"\n🚦 Starting channel {i}/{len(channel_urls)}: {channel_url}")
+        print(f" 🔍 Getting channel info...")
@@ -338,7 +59,7 @@ def build_download_plan(
             print(f" ✅ Channel info: {channel_name} (ID: {channel_id})")
             print(f" 🔍 Fetching video list from channel...")
             available_videos = tracker.get_channel_video_list(
-                channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
+                channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False
             )
             print(
                 f" 📊 Channel has {len(available_videos)} videos to scan against {len(undownloaded)} songlist songs"
@@ -347,11 +68,10 @@ def build_download_plan(
             video_matches = []  # Initialize video_matches for this channel

             # Pre-process video titles for efficient matching
-            channel_parser = ChannelParser()
             if fuzzy_match:
                 # For fuzzy matching, create normalized video keys
                 for video in available_videos:
-                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
+                    v_artist, v_title = extract_artist_title(video["title"])
                     video_key = create_song_key(v_artist, v_title)

                     # Find best match among remaining songs
@@ -384,7 +104,7 @@ def build_download_plan(
             else:
                 # For exact matching, use direct key comparison
                 for video in available_videos:
-                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
+                    v_artist, v_title = extract_artist_title(video["title"])
                     video_key = create_song_key(v_artist, v_title)

                     if video_key in song_keys:
@@ -423,13 +143,4 @@ def build_download_plan(
         f" TOTAL: {sum(channel_match_counts.values())} matches across {len(channel_match_counts)} channels."
     )
-
-    # Generate unmatched songs report if there are any
-    if unmatched:
-        try:
-            report_file = generate_unmatched_report(unmatched)
-            print(f"\n📋 Unmatched songs report saved to: {report_file}")
-            print(f"📋 Total unmatched songs: {len(unmatched)}")
-        except Exception as e:
-            print(f"⚠️ Could not generate unmatched songs report: {e}")

     return plan, unmatched
File diff suppressed because it is too large
@@ -34,6 +34,7 @@ def sanitize_filename(
     # Clean up title
     safe_title = (
         title.replace("(From ", "")
+        .replace(")", "")
         .replace(" - ", " ")
         .replace(":", "")
     )
@@ -53,18 +54,11 @@ def sanitize_filename(
     )
     safe_artist = safe_artist.strip()

-    # Create filename - handle empty artist case
-    if not safe_artist or safe_artist.strip() == "":
-        # If no artist, just use the title
-        filename = f"{safe_title}.mp4"
-    else:
-        filename = f"{safe_artist} - {safe_title}.mp4"
+    # Create filename
+    filename = f"{safe_artist} - {safe_title}.mp4"

     # Limit filename length if needed
     if len(filename) > max_length:
-        if not safe_artist or safe_artist.strip() == "":
-            filename = f"{safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
-        else:
-            filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+        filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"

     return filename
@@ -87,14 +81,6 @@ def generate_possible_filenames(
     safe_title = sanitize_title_for_filenames(title)
     safe_artist = artist.replace("'", "").replace('"', "").strip()

-    # Handle empty artist case
-    if not safe_artist or safe_artist.strip() == "":
-        return [
-            f"{safe_title}.mp4",  # Songlist mode (no artist)
-            f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
-            f"{safe_title} (Karaoke Version).mp4",  # Channel videos mode (no artist)
-        ]
-    else:
-        return [
-            f"{safe_artist} - {safe_title}.mp4",  # Songlist mode
-            f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
+    return [
+        f"{safe_artist} - {safe_title}.mp4",  # Songlist mode
+        f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
@@ -126,7 +112,6 @@ def check_file_exists_with_patterns(
 ) -> Tuple[bool, Optional[Path]]:
     """
     Check if a file exists using multiple possible filename patterns.
-    Also checks for files with (2), (3), etc. suffixes that yt-dlp might create.

     Args:
         downloads_dir: Base downloads directory
@@ -145,56 +130,15 @@ def check_file_exists_with_patterns(
     # Apply length limits if needed
     safe_artist = artist.replace("'", "").replace('"', "").strip()
     safe_title = sanitize_title_for_filenames(title)
-    if not safe_artist or safe_artist.strip() == "":
-        filename = f"{safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
-    else:
-        filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+    filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"

-    # Check for exact filename match
     file_path = channel_dir / filename
     if file_path.exists() and file_path.stat().st_size > 0:
         return True, file_path

-    # Check for files with (2), (3), etc. suffixes
-    base_name = filename.replace(".mp4", "")
-    for suffix in range(2, 10):  # Check up to (9)
-        suffixed_filename = f"{base_name} ({suffix}).mp4"
-        suffixed_path = channel_dir / suffixed_filename
-        if suffixed_path.exists() and suffixed_path.stat().st_size > 0:
-            return True, suffixed_path
-
     return False, None
-
-
-def get_unique_filename(
-    downloads_dir: Path, channel_name: str, artist: str, title: str
-) -> Tuple[Path, bool]:
-    """
-    Get a unique filename for download, checking for existing files including duplicates.
-
-    Args:
-        downloads_dir: Base downloads directory
-        channel_name: Channel name
-        artist: Song artist
-        title: Song title
-
-    Returns:
-        Tuple of (file_path, is_existing) where is_existing indicates if a file already exists
-    """
-    filename = sanitize_filename(artist, title)
-    channel_dir = downloads_dir / channel_name
-    file_path = channel_dir / filename
-
-    # Check if file already exists
-    exists, existing_path = check_file_exists_with_patterns(downloads_dir, channel_name, artist, title)
-
-    if exists and existing_path:
-        print(f"📁 File already exists: {existing_path.name}")
-        return existing_path, True
-
-    return file_path, False


 def ensure_directory_exists(directory: Path) -> None:
     """
     Ensure a directory exists, creating it if necessary.
@@ -32,72 +32,10 @@ def normalize_title(title):


 def extract_artist_title(video_title):
-    """
-    Extract artist and title from video title.
-
-    This function handles multiple common video title formats found on YouTube karaoke channels:
-
-    1. "Artist - Title" format: "38 Special - Hold On Loosely"
-    2. "Title Karaoke | Artist Karaoke Version" format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
-    3. "Title Artist KARAOKE" format: "Hold On Loosely 38 Special KARAOKE"
-
-    Args:
-        video_title (str): The YouTube video title to parse
-
-    Returns:
-        tuple: (artist, title) where artist and title are strings. If parsing fails,
-        artist will be empty string and title will be the full video title.
-
-    Examples:
-        >>> extract_artist_title("38 Special - Hold On Loosely")
-        ("38 Special", "Hold On Loosely")
-
-        >>> extract_artist_title("Hold On Loosely Karaoke | 38 Special Karaoke Version")
-        ("38 Special", "Hold On Loosely")
-
-        >>> extract_artist_title("Unknown Format Video Title")
-        ("", "Unknown Format Video Title")
-    """
-    # Handle "Artist - Title" format
+    """Extract artist and title from video title."""
     if " - " in video_title:
         parts = video_title.split(" - ", 1)
         return parts[0].strip(), parts[1].strip()

-    # Handle "Title Karaoke | Artist Karaoke Version" format
-    if " | " in video_title and "karaoke" in video_title.lower():
-        parts = video_title.split(" | ", 1)
-        title_part = parts[0].strip()
-        artist_part = parts[1].strip()
-
-        # Clean up the parts
-        title = title_part.replace("Karaoke", "").strip()
-        artist = artist_part.replace("Karaoke Version", "").strip()
-
-        return artist, title
-
-    # Handle "Title Artist KARAOKE" format
-    if "karaoke" in video_title.lower():
-        # Try to find the artist by looking for common patterns
-        title_lower = video_title.lower()
-
-        # Look for patterns like "Title Artist KARAOKE"
-        # This is a simplified approach - we'll need to improve this
-        words = video_title.split()
-        if len(words) >= 3:
-            # Assume the last word before "KARAOKE" is part of the artist
-            for i, word in enumerate(words):
-                if "karaoke" in word.lower():
-                    if i >= 2:
-                        # Everything before the last word before KARAOKE is title
-                        # Everything after is artist
-                        title = " ".join(words[:i-1])
-                        artist = " ".join(words[i-1:])
-                        return artist, title
-
-        # If we can't parse it, return empty artist and full title
-        return "", video_title
-
-    # Default: return empty artist and full title
     return "", video_title
@@ -7,33 +7,17 @@ except ImportError:
     MUTAGEN_AVAILABLE = False


-def clean_channel_name(channel_name: str) -> str:
-    """
-    Clean channel name for ID3 tagging by removing @ symbol and ensuring it's alpha-only.
-
-    Args:
-        channel_name: Raw channel name (may contain @ symbol)
-
-    Returns:
-        Cleaned channel name suitable for ID3 tags
-    """
-    # Remove @ symbol if present
-    if channel_name.startswith('@'):
-        channel_name = channel_name[1:]
-
-    # Remove any non-alphanumeric characters and convert to single word
-    # Keep only letters, numbers, and spaces, then take the first word
-    cleaned = re.sub(r'[^a-zA-Z0-9\s]', '', channel_name)
-    words = cleaned.split()
-    if words:
-        return words[0]  # Return only the first word
-
-    return "Unknown"
-
-
-# Import the enhanced extract_artist_title function from fuzzy_matcher.py
-# This ensures consistent parsing across all modules and supports multiple video title formats
-from karaoke_downloader.fuzzy_matcher import extract_artist_title
+def extract_artist_title(video_title):
+    title = (
+        video_title.replace("(Karaoke Version)", "").replace("(Karaoke)", "").strip()
+    )
+    if " - " in title:
+        parts = title.split(" - ", 1)
+        if len(parts) == 2:
+            artist = parts[0].strip()
+            song_title = parts[1].strip()
+            return artist, song_title
+    return "Unknown Artist", title


 def add_id3_tags(file_path, video_title, channel_name):
@@ -42,13 +26,12 @@ def add_id3_tags(file_path, video_title, channel_name):
         return
     try:
         artist, title = extract_artist_title(video_title)
-        clean_channel = clean_channel_name(channel_name)
         mp4 = MP4(str(file_path))
         mp4["\xa9nam"] = title
         mp4["\xa9ART"] = artist
-        mp4["\xa9alb"] = clean_channel  # Use clean channel name only, no suffix
+        mp4["\xa9alb"] = f"{channel_name} Karaoke"
         mp4["\xa9gen"] = "Karaoke"
         mp4.save()
-        print(f"📝 Added ID3 tags: Artist='{artist}', Title='{title}', Album='{clean_channel}'")
+        print(f"📝 Added ID3 tags: Artist='{artist}', Title='{title}'")
     except Exception as e:
         print(f"⚠️ Could not add ID3 tags: {e}")
@@ -1,83 +0,0 @@
-"""
-Manual video manager for handling static video collections.
-"""
-
-import json
-from pathlib import Path
-from typing import Dict, List, Optional, Any
-
-from karaoke_downloader.data_path_manager import get_data_path_manager
-
-
-def load_manual_videos(manual_file: str = None) -> List[Dict[str, Any]]:
-    if manual_file is None:
-        manual_file = str(get_data_path_manager().get_manual_videos_path())
-    """
-    Load manual videos from the JSON file.
-
-    Args:
-        manual_file: Path to manual videos JSON file
-
-    Returns:
-        List of video dictionaries
-    """
-    manual_path = Path(manual_file)
-
-    if not manual_path.exists():
-        print(f"⚠️ Manual videos file not found: {manual_file}")
-        return []
-
-    try:
-        with open(manual_path, 'r', encoding='utf-8') as f:
-            data = json.load(f)
-
-        videos = data.get("videos", [])
-        print(f"📋 Loaded {len(videos)} manual videos from {manual_file}")
-        return videos
-
-    except Exception as e:
-        print(f"❌ Error loading manual videos: {e}")
-        return []
-
-
-def get_manual_videos_for_channel(channel_name: str, manual_file: str = None) -> List[Dict[str, Any]]:
-    if manual_file is None:
-        manual_file = str(get_data_path_manager().get_manual_videos_path())
-    """
-    Get manual videos for a specific channel.
-
-    Args:
-        channel_name: Channel name (should be "@ManualVideos")
-        manual_file: Path to manual videos JSON file
-
-    Returns:
-        List of video dictionaries
-    """
-    if channel_name != "@ManualVideos":
-        return []
-
-    return load_manual_videos(manual_file)
-
-
-def is_manual_channel(channel_url: str) -> bool:
-    """
-    Check if a channel URL is a manual channel.
-
-    Args:
-        channel_url: Channel URL
-
-    Returns:
-        True if it's a manual channel
-    """
-    return channel_url == "manual://static"
-
-
-def get_manual_channel_info(channel_url: str) -> tuple[str, str]:
-    """
-    Get channel info for manual channels.
-
-    Args:
-        channel_url: Channel URL
-
-    Returns:
-        Tuple of (channel_name, channel_id)
-    """
-    if channel_url == "manual://static":
-        return "@ManualVideos", "manual"
-    return None, None
@@ -56,6 +56,14 @@ def update_resolution(resolution):
         "include_console": True,
         "include_file": True,
     },
+    "platform_settings": {
+        "auto_detect_platform": True,
+        "yt_dlp_paths": {
+            "windows": "downloader/yt-dlp.exe",
+            "macos": "downloader/yt-dlp_macos",
+            "linux": "downloader/yt-dlp",
+        },
+    },
     "yt_dlp_path": "downloader/yt-dlp.exe",
 }

@@ -7,40 +7,28 @@ import json
 from datetime import datetime
 from pathlib import Path

-from karaoke_downloader.data_path_manager import get_data_path_manager
-

-def load_server_songs(songs_path=None):
-    if songs_path is None:
-        songs_path = str(get_data_path_manager().get_songs_path())
-    """Load the list of songs already available on the server with format information."""
+def load_server_songs(songs_path="data/songs.json"):
+    """Load the list of songs already available on the server."""
     songs_file = Path(songs_path)
     if not songs_file.exists():
         print(f"⚠️ Server songs file not found: {songs_path}")
-        return {}
+        return set()
     try:
         with open(songs_file, "r", encoding="utf-8") as f:
             data = json.load(f)
-        server_songs = {}
+        server_songs = set()
         for song in data:
-            if "artist" in song and "title" in song and "path" in song:
+            if "artist" in song and "title" in song:
                 artist = song["artist"].strip()
                 title = song["title"].strip()
-                path = song["path"].strip()
                 key = f"{artist.lower()}_{normalize_title(title)}"
-                server_songs[key] = {
-                    "artist": artist,
-                    "title": title,
-                    "path": path,
-                    "is_mp3": path.lower().endswith('.mp3'),
-                    "is_cdg": 'cdg' in path.lower(),
-                    "is_mp4": path.lower().endswith('.mp4')
-                }
+                server_songs.add(key)
         print(f"📋 Loaded {len(server_songs)} songs from server (songs.json)")
         return server_songs
     except (json.JSONDecodeError, FileNotFoundError) as e:
         print(f"⚠️ Could not load server songs: {e}")
-        return {}
+        return set()


 def is_song_on_server(server_songs, artist, title):
@@ -49,24 +37,9 @@ def is_song_on_server(server_songs, artist, title):
     return key in server_songs


-def should_skip_server_song(server_songs, artist, title):
-    """Check if a song should be skipped because it's already available as MP4 on server.
-    Returns True if the song should be skipped (MP4 format), False if it should be downloaded (MP3/CDG format)."""
-    key = f"{artist.lower()}_{normalize_title(title)}"
-    if key not in server_songs:
-        return False  # Not on server, so don't skip
-
-    song_info = server_songs[key]
-    # Skip if it's an MP4 file (video format)
-    # Don't skip if it's MP3 or in CDG folder (different format)
-    return song_info.get("is_mp4", False) and not song_info.get("is_cdg", False)
-
-
 def load_server_duplicates_tracking(
-    tracking_path=None,
+    tracking_path="data/server_duplicates_tracking.json",
 ):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_server_duplicates_tracking_path())
     """Load the tracking of songs found to be duplicates on the server."""
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
@@ -80,10 +53,8 @@ def load_server_duplicates_tracking(


 def save_server_duplicates_tracking(
-    tracking, tracking_path=None
+    tracking, tracking_path="data/server_duplicates_tracking.json"
 ):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_server_duplicates_tracking_path())
     """Save the tracking of songs found to be duplicates on the server."""
     try:
         with open(tracking_path, "w", encoding="utf-8") as f:
@@ -115,9 +86,8 @@ def mark_song_as_server_duplicate(tracking, artist, title, video_title, channel_
 def check_and_mark_server_duplicate(
     server_songs, server_duplicates_tracking, artist, title, video_title, channel_name
 ):
-    """Check if a song should be skipped because it's already available as MP4 on server and mark it as duplicate if so.
-    Returns True if it should be skipped (MP4 format), False if it should be downloaded (MP3/CDG format)."""
-    if should_skip_server_song(server_songs, artist, title):
+    """Check if a song is on server and mark it as duplicate if so. Returns True if it's a duplicate."""
+    if is_song_on_server(server_songs, artist, title):
         if not is_song_marked_as_server_duplicate(
             server_duplicates_tracking, artist, title
         ):
@@ -35,7 +35,6 @@ class SongValidator:
         video_title: Optional[str] = None,
         server_songs: Optional[Dict[str, Any]] = None,
         server_duplicates_tracking: Optional[Dict[str, Any]] = None,
-        force_download: bool = False,
     ) -> Tuple[bool, Optional[str], int]:
         """
         Check if a song should be skipped based on multiple criteria.
@@ -54,15 +53,10 @@ class SongValidator:
             video_title: YouTube video title (optional)
             server_songs: Server songs data (optional)
             server_duplicates_tracking: Server duplicates tracking (optional)
-            force_download: If True, bypass all validation checks and force download

         Returns:
             Tuple of (should_skip, reason, total_filtered)
         """
-        # If force download is enabled, skip all validation checks
-        if force_download:
-            return False, None, 0
-
         total_filtered = 0

         # Check 1: Already downloaded by this system
@@ -1,265 +0,0 @@
-import json
-import os
-from pathlib import Path
-from typing import List, Dict, Any, Optional
-from mutagen.mp4 import MP4
-
-from karaoke_downloader.data_path_manager import get_data_path_manager
-
-
-class SongListGenerator:
-    """Utility class for generating song lists from MP4 files with ID3 tags."""
-
-    def __init__(self, songlist_path: str = None):
-        if songlist_path is None:
-            songlist_path = str(get_data_path_manager().get_songlist_path())
-        self.songlist_path = Path(songlist_path)
-        self.songlist_path.parent.mkdir(parents=True, exist_ok=True)
-
-    def read_existing_songlist(self) -> List[Dict[str, Any]]:
-        """Read existing song list from JSON file."""
-        if self.songlist_path.exists():
-            try:
-                with open(self.songlist_path, 'r', encoding='utf-8') as f:
-                    return json.load(f)
-            except (json.JSONDecodeError, IOError) as e:
-                print(f"⚠️ Warning: Could not read existing songlist: {e}")
-                return []
-        return []
-
-    def save_songlist(self, songlist: List[Dict[str, Any]]) -> None:
-        """Save song list to JSON file."""
-        try:
-            with open(self.songlist_path, 'w', encoding='utf-8') as f:
-                json.dump(songlist, f, indent=2, ensure_ascii=False)
-            print(f"✅ Song list saved to {self.songlist_path}")
-        except IOError as e:
-            print(f"❌ Error saving song list: {e}")
-            raise
-
-    def extract_id3_tags(self, mp4_path: Path) -> Optional[Dict[str, str]]:
-        """Extract ID3 tags from MP4 file."""
-        try:
-            mp4 = MP4(str(mp4_path))
-
-            # Extract artist and title from ID3 tags
-            artist = mp4.get("\xa9ART", ["Unknown Artist"])[0] if "\xa9ART" in mp4 else "Unknown Artist"
-            title = mp4.get("\xa9nam", ["Unknown Title"])[0] if "\xa9nam" in mp4 else "Unknown Title"
-
-            return {
-                "artist": artist,
-                "title": title
-            }
-        except Exception as e:
-            print(f"⚠️ Warning: Could not extract ID3 tags from {mp4_path.name}: {e}")
-            return None
-
-    def scan_directory_for_mp4_files(self, directory_path: str) -> List[Path]:
-        """Scan directory for MP4 files."""
-        directory = Path(directory_path)
-        if not directory.exists():
-            raise FileNotFoundError(f"Directory not found: {directory_path}")
-
-        if not directory.is_dir():
-            raise ValueError(f"Path is not a directory: {directory_path}")
-
-        mp4_files = list(directory.glob("*.mp4"))
-        if not mp4_files:
-            print(f"⚠️ No MP4 files found in {directory_path}")
-            return []
-
-        print(f"📁 Found {len(mp4_files)} MP4 files in {directory.name}")
-        return sorted(mp4_files)
-
-    def generate_songlist_from_directory(self, directory_path: str, append: bool = True) -> Dict[str, Any]:
-        """Generate a song list from MP4 files in a directory."""
-        directory = Path(directory_path)
-        directory_name = directory.name
-
-        # Scan for MP4 files
-        mp4_files = self.scan_directory_for_mp4_files(directory_path)
-        if not mp4_files:
-            return {}
-
-        # Extract ID3 tags and create songs list
-        songs = []
-        for index, mp4_file in enumerate(mp4_files, start=1):
-            id3_tags = self.extract_id3_tags(mp4_file)
-            if id3_tags:
-                song = {
-                    "position": index,
-                    "title": id3_tags["title"],
-                    "artist": id3_tags["artist"]
-                }
-                songs.append(song)
-                print(f" {index:3d}. {id3_tags['artist']} - {id3_tags['title']}")
-
-        if not songs:
-            print("❌ No valid ID3 tags found in any MP4 files")
-            return {}
-
-        # Create the song list entry
-        songlist_entry = {
-            "title": directory_name,
-            "songs": songs
-        }
-
-        # Handle appending to existing song list
-        if append:
-            existing_songlist = self.read_existing_songlist()
-
-            # Check if a playlist with this title already exists
-            existing_index = None
-            for i, entry in enumerate(existing_songlist):
-                if entry.get("title") == directory_name:
-                    existing_index = i
-                    break
-
-            if existing_index is not None:
-                # Replace existing entry
-                print(f"🔄 Replacing existing playlist: {directory_name}")
-                existing_songlist[existing_index] = songlist_entry
-            else:
-                # Add new entry to the beginning of the list
-                print(f"➕ Adding new playlist: {directory_name}")
-                existing_songlist.insert(0, songlist_entry)
-
-            self.save_songlist(existing_songlist)
-        else:
-            # Create new song list with just this entry
-            print(f"📝 Creating new song list with playlist: {directory_name}")
-            self.save_songlist([songlist_entry])
-
-        return songlist_entry
-
-    def generate_songlist_from_multiple_directories(self, directory_paths: List[str], append: bool = True) -> List[Dict[str, Any]]:
-        """Generate song lists from multiple directories."""
-        results = []
-        errors = []
-
-        # Read existing song list once at the beginning
-        existing_songlist = self.read_existing_songlist() if append else []
-
-        for directory_path in directory_paths:
-            try:
-                print(f"\n📂 Processing directory: {directory_path}")
-                directory = Path(directory_path)
-                directory_name = directory.name
-
-                # Scan for MP4 files
-                mp4_files = self.scan_directory_for_mp4_files(directory_path)
-                if not mp4_files:
-                    continue
-
-                # Extract ID3 tags and create songs list
-                songs = []
-                for index, mp4_file in enumerate(mp4_files, start=1):
-                    id3_tags = self.extract_id3_tags(mp4_file)
-                    if id3_tags:
-                        song = {
-                            "position": index,
-                            "title": id3_tags["title"],
-                            "artist": id3_tags["artist"]
-                        }
-                        songs.append(song)
-                        print(f" {index:3d}. {id3_tags['artist']} - {id3_tags['title']}")
-
-                if not songs:
-                    print("❌ No valid ID3 tags found in any MP4 files")
-                    continue
-
-                # Create the song list entry
-                songlist_entry = {
-                    "title": directory_name,
-                    "songs": songs
-                }
-
-                # Check if a playlist with this title already exists
-                existing_index = None
-                for i, entry in enumerate(existing_songlist):
-                    if entry.get("title") == directory_name:
-                        existing_index = i
-                        break
-
-                if existing_index is not None:
-                    # Replace existing entry
-                    print(f"🔄 Replacing existing playlist: {directory_name}")
-                    existing_songlist[existing_index] = songlist_entry
-                else:
-                    # Add new entry to the beginning of the list
-                    print(f"➕ Adding new playlist: {directory_name}")
-                    existing_songlist.insert(0, songlist_entry)
-
-                results.append(songlist_entry)
-
-            except Exception as e:
-                error_msg = f"Error processing {directory_path}: {e}"
-                print(f"❌ {error_msg}")
errors.append(error_msg)
|
|
||||||
|
|
||||||
# Save the final song list
|
|
||||||
if results:
|
|
||||||
if append:
|
|
||||||
# Save the updated existing song list
|
|
||||||
self.save_songlist(existing_songlist)
|
|
||||||
else:
|
|
||||||
# Create new song list with just the results
|
|
||||||
self.save_songlist(results)
|
|
||||||
|
|
||||||
# If there were any errors, raise an exception
|
|
||||||
if errors:
|
|
||||||
raise Exception(f"Failed to process {len(errors)} directories: {'; '.join(errors)}")
|
|
||||||
|
|
||||||
return results
|
|
||||||
|
|
||||||
|
|
||||||
def main():
|
|
||||||
"""CLI entry point for song list generation."""
|
|
||||||
import argparse
|
|
||||||
import sys
|
|
||||||
|
|
||||||
parser = argparse.ArgumentParser(
|
|
||||||
description="Generate song lists from MP4 files with ID3 tags",
|
|
||||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
|
||||||
epilog="""
|
|
||||||
Examples:
|
|
||||||
python -m karaoke_downloader.songlist_generator /path/to/mp4/directory
|
|
||||||
python -m karaoke_downloader.songlist_generator /path/to/dir1 /path/to/dir2 --no-append
|
|
||||||
python -m karaoke_downloader.songlist_generator /path/to/dir --songlist-path custom_songlist.json
|
|
||||||
"""
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
"directories",
|
|
||||||
nargs="+",
|
|
||||||
help="Directory paths containing MP4 files with ID3 tags"
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
"--no-append",
|
|
||||||
action="store_true",
|
|
||||||
help="Create a new song list instead of appending to existing one"
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
"--songlist-path",
|
|
||||||
default=None,
|
|
||||||
help="Path to the song list JSON file (default: songList.json in the data directory)"
|
|
||||||
)
|
|
||||||
|
|
||||||
args = parser.parse_args()
|
|
||||||
|
|
||||||
try:
|
|
||||||
generator = SongListGenerator(args.songlist_path)
|
|
||||||
generator.generate_songlist_from_multiple_directories(
|
|
||||||
args.directories,
|
|
||||||
append=not args.no_append
|
|
||||||
)
|
|
||||||
print("\n✅ Song list generation completed successfully!")
|
|
||||||
except Exception as e:
|
|
||||||
print(f"\n❌ Error: {e}")
|
|
||||||
sys.exit(1)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
main()
|
|
||||||
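Taken together, the generator writes songList.json as a list of playlist entries, each `{title, songs}` with 1-based positions. The helper below is an illustrative sketch only (its name `build_songlist_entry` and the sample tracks are made up; the real generator derives the data from ID3 tags via mutagen), reproducing just that output shape:

```python
import json


def build_songlist_entry(directory_name, tracks):
    """Build one playlist entry in the shape the generator writes.

    `tracks` is a list of (artist, title) tuples; positions are 1-based,
    matching enumerate(mp4_files, start=1) in the generator.
    """
    return {
        "title": directory_name,
        "songs": [
            {"position": i, "title": title, "artist": artist}
            for i, (artist, title) in enumerate(tracks, start=1)
        ],
    }


entry = build_songlist_entry("80s Night", [("a-ha", "Take On Me"), ("Toto", "Africa")])
print(json.dumps(entry, indent=2, ensure_ascii=False))
```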
@@ -7,7 +7,6 @@ import json
 from datetime import datetime
 from pathlib import Path
 
-from karaoke_downloader.data_path_manager import get_data_path_manager
 from karaoke_downloader.server_manager import (
     check_and_mark_server_duplicate,
     is_song_marked_as_server_duplicate,
@@ -17,9 +16,7 @@ from karaoke_downloader.server_manager import (
 )
 
 
-def load_songlist(songlist_path=None):
-    if songlist_path is None:
-        songlist_path = str(get_data_path_manager().get_songlist_path())
+def load_songlist(songlist_path="data/songList.json"):
     songlist_file = Path(songlist_path)
     if not songlist_file.exists():
         print(f"⚠️ Songlist file not found: {songlist_path}")
@@ -58,9 +55,7 @@ def normalize_title(title):
     return " ".join(normalized.split()).lower()
 
 
-def load_songlist_tracking(tracking_path=None):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
+def load_songlist_tracking(tracking_path="data/songlist_tracking.json"):
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
         return {}
@@ -72,9 +67,7 @@ def load_songlist_tracking(tracking_path=None):
     return {}
 
 
-def save_songlist_tracking(tracking, tracking_path=None):
-    if tracking_path is None:
-        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
+def save_songlist_tracking(tracking, tracking_path="data/songlist_tracking.json"):
     try:
         with open(tracking_path, "w", encoding="utf-8") as f:
             json.dump(tracking, f, indent=2, ensure_ascii=False)
@@ -1,12 +1,10 @@
-import json
-import os
-import re
-from datetime import datetime, timedelta
+import threading
 from enum import Enum
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
-
-from karaoke_downloader.data_path_manager import get_data_path_manager
+import json
+from datetime import datetime
+from pathlib import Path
 
 
 class SongStatus(str, Enum):
     NOT_DOWNLOADED = "NOT_DOWNLOADED"
@@ -27,133 +25,46 @@ class FormatType(str, Enum):
 class TrackingManager:
     def __init__(
         self,
-        tracking_file=None,
-        cache_dir=None,
+        tracking_file="data/karaoke_tracking.json",
+        cache_file="data/channel_cache.json",
     ):
-        if tracking_file is None:
-            tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
-        if cache_dir is None:
-            cache_dir = str(get_data_path_manager().get_channel_cache_dir())
-
         self.tracking_file = Path(tracking_file)
-        self.cache_dir = Path(cache_dir)
-        # Ensure cache directory exists
-        self.cache_dir.mkdir(parents=True, exist_ok=True)
-        self.data = self._load()
-        print(f"📊 Tracking manager initialized with {len(self.data.get('songs', {}))} tracked songs")
+        self.cache_file = Path(cache_file)
+        self.data = {"playlists": {}, "songs": {}}
+        self.cache = {}
+        self._lock = threading.Lock()
+        self._load()
+        self._load_cache()
 
     def _load(self):
-        """Load tracking data from JSON file."""
         if self.tracking_file.exists():
             try:
                 with open(self.tracking_file, "r", encoding="utf-8") as f:
-                    return json.load(f)
-            except json.JSONDecodeError:
-                print(f"⚠️ Corrupted tracking file, creating new one")
-
-        return {"songs": {}, "playlists": {}, "last_updated": datetime.now().isoformat()}
+                    self.data = json.load(f)
+            except Exception:
+                self.data = {"playlists": {}, "songs": {}}
 
     def _save(self):
-        """Save tracking data to JSON file."""
-        self.data["last_updated"] = datetime.now().isoformat()
-        self.tracking_file.parent.mkdir(parents=True, exist_ok=True)
-        with open(self.tracking_file, "w", encoding="utf-8") as f:
-            json.dump(self.data, f, indent=2, ensure_ascii=False)
+        with self._lock:
+            with open(self.tracking_file, "w", encoding="utf-8") as f:
+                json.dump(self.data, f, indent=2, ensure_ascii=False)
 
     def force_save(self):
-        """Force save the tracking data."""
         self._save()
 
-    def _get_channel_cache_file(self, channel_id: str) -> Path:
-        """Get the cache file path for a specific channel."""
-        # Sanitize channel ID for filename
-        safe_channel_id = re.sub(r'[<>:"/\\|?*]', '_', channel_id)
-        return self.cache_dir / f"{safe_channel_id}.json"
-
-    def _load_channel_cache(self, channel_id: str) -> List[Dict[str, str]]:
-        """Load cache for a specific channel."""
-        cache_file = self._get_channel_cache_file(channel_id)
-        if cache_file.exists():
-            try:
-                with open(cache_file, 'r', encoding='utf-8') as f:
-                    data = json.load(f)
-                return data.get('videos', [])
-            except (json.JSONDecodeError, KeyError):
-                print(f"  ⚠️ Corrupted cache file for {channel_id}, will recreate")
-                return []
-        return []
-
-    def _save_channel_cache(self, channel_id: str, videos: List[Dict[str, str]]):
-        """Save cache for a specific channel."""
-        cache_file = self._get_channel_cache_file(channel_id)
-        data = {
-            'channel_id': channel_id,
-            'videos': videos,
-            'last_updated': datetime.now().isoformat(),
-            'video_count': len(videos)
-        }
-        with open(cache_file, 'w', encoding='utf-8') as f:
-            json.dump(data, f, indent=2, ensure_ascii=False)
-
-    def _clear_channel_cache(self, channel_id: str):
-        """Clear cache for a specific channel."""
-        cache_file = self._get_channel_cache_file(channel_id)
-        if cache_file.exists():
-            cache_file.unlink()
-            print(f"  🗑️ Cleared cache file: {cache_file.name}")
-
-    def get_cache_info(self):
-        """Get information about all channel cache files."""
-        cache_files = list(self.cache_dir.glob("*.json"))
-        total_videos = 0
-        cache_info = []
-
-        for cache_file in cache_files:
-            try:
-                with open(cache_file, 'r', encoding='utf-8') as f:
-                    data = json.load(f)
-                video_count = len(data.get('videos', []))
-                total_videos += video_count
-                last_updated = data.get('last_updated', 'Unknown')
-                cache_info.append({
-                    'channel': data.get('channel_id', cache_file.stem),
-                    'videos': video_count,
-                    'last_updated': last_updated,
-                    'file': cache_file.name
-                })
-            except Exception as e:
-                print(f"⚠️ Error reading cache file {cache_file.name}: {e}")
-
-        return {
-            'total_channels': len(cache_files),
-            'total_videos': total_videos,
-            'channels': cache_info
-        }
-
-    def clear_channel_cache(self, channel_id=None):
-        """Clear cache for a specific channel or all channels."""
-        if channel_id:
-            self._clear_channel_cache(channel_id)
-            print(f"🗑️ Cleared cache for channel: {channel_id}")
-        else:
-            # Clear all cache files
-            cache_files = list(self.cache_dir.glob("*.json"))
-            for cache_file in cache_files:
-                cache_file.unlink()
-            print(f"🗑️ Cleared all {len(cache_files)} channel cache files")
-
-    def set_cache_duration(self, hours):
-        """Placeholder for cache duration logic"""
-        pass
-
-    def export_playlist_report(self, playlist_id):
-        """Export a report for a specific playlist."""
-        pass
+    def _load_cache(self):
+        if self.cache_file.exists():
+            try:
+                with open(self.cache_file, "r", encoding="utf-8") as f:
+                    self.cache = json.load(f)
+            except Exception:
+                self.cache = {}
+
+    def save_cache(self):
+        with open(self.cache_file, "w", encoding="utf-8") as f:
+            json.dump(self.cache, f, indent=2, ensure_ascii=False)
 
     def get_statistics(self):
-        """Get statistics about tracked songs."""
         total_songs = len(self.data["songs"])
         downloaded_songs = sum(
             1
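For reference, the per-channel cache files written on the removed side (`_save_channel_cache`) follow a small JSON schema: `channel_id`, `videos`, `last_updated`, and `video_count`. The sketch below reproduces that layout with stand-in data (the function name, channel ID, and track are invented for illustration; only the schema and filename sanitization mirror the diff):

```python
import json
import re
import tempfile
from datetime import datetime
from pathlib import Path


def save_channel_cache(cache_dir: Path, channel_id: str, videos: list) -> Path:
    # Mirrors _save_channel_cache: sanitize the channel ID for use as a
    # filename, then persist the video list plus bookkeeping metadata.
    safe_channel_id = re.sub(r'[<>:"/\\|?*]', '_', channel_id)
    cache_file = cache_dir / f"{safe_channel_id}.json"
    data = {
        "channel_id": channel_id,
        "videos": videos,
        "last_updated": datetime.now().isoformat(),
        "video_count": len(videos),
    }
    cache_file.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return cache_file


cache_dir = Path(tempfile.mkdtemp())
path = save_channel_cache(cache_dir, "UC/example?id", [{"title": "Song A", "id": "dQw4w9WgXcQ"}])
print(path.name)  # UC_example_id.json
```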
@@ -191,13 +102,11 @@ class TrackingManager:
         }
 
     def get_playlist_songs(self, playlist_id):
-        """Get songs for a specific playlist."""
         return [
             s for s in self.data["songs"].values() if s["playlist_id"] == playlist_id
         ]
 
     def get_failed_songs(self, playlist_id=None):
-        """Get failed songs, optionally filtered by playlist."""
         if playlist_id:
             return [
                 s
@@ -209,7 +118,6 @@
             ]
 
     def get_partial_downloads(self, playlist_id=None):
-        """Get partial downloads, optionally filtered by playlist."""
         if playlist_id:
             return [
                 s
@@ -221,7 +129,7 @@
             ]
 
     def cleanup_orphaned_files(self, downloads_dir):
-        """Remove tracking entries for files that no longer exist."""
+        # Remove tracking entries for files that no longer exist
        orphaned = []
        for song_id, song in list(self.data["songs"].items()):
            file_path = song.get("file_path")
@@ -231,17 +139,51 @@
         self.force_save()
         return orphaned
 
+    def get_cache_info(self):
+        total_channels = len(self.cache)
+        total_cached_videos = sum(len(v) for v in self.cache.values())
+        cache_duration_hours = 24  # default
+        last_updated = None
+        return {
+            "total_channels": total_channels,
+            "total_cached_videos": total_cached_videos,
+            "cache_duration_hours": cache_duration_hours,
+            "last_updated": last_updated,
+        }
+
+    def clear_channel_cache(self, channel_id=None):
+        if channel_id is None or channel_id == "all":
+            self.cache = {}
+        else:
+            self.cache.pop(channel_id, None)
+        self.save_cache()
+
+    def set_cache_duration(self, hours):
+        # Placeholder for cache duration logic
+        pass
+
+    def export_playlist_report(self, playlist_id):
+        playlist = self.data["playlists"].get(playlist_id)
+        if not playlist:
+            return f"Playlist '{playlist_id}' not found."
+        songs = self.get_playlist_songs(playlist_id)
+        report = {"playlist": playlist, "songs": songs}
+        return json.dumps(report, indent=2, ensure_ascii=False)
+
     def is_song_downloaded(self, artist, title, channel_name=None, video_id=None):
         """
-        Check if a song has already been downloaded.
-        Returns True if the song exists in tracking with DOWNLOADED status.
+        Check if a song has already been downloaded by this system.
+        Returns True if the song exists in tracking with DOWNLOADED or CONVERTED status.
         """
         # If we have video_id and channel_name, try direct key lookup first (most efficient)
         if video_id and channel_name:
             song_key = f"{video_id}@{channel_name}"
             if song_key in self.data["songs"]:
                 song_data = self.data["songs"][song_key]
-                if song_data.get("status") == SongStatus.DOWNLOADED:
+                if song_data.get("status") in [
+                    SongStatus.DOWNLOADED,
+                    SongStatus.CONVERTED,
+                ]:
                     return True
 
         # Fallback to content search (for cases where we don't have video_id)
@@ -249,14 +191,19 @@
             # Check if this song matches the artist and title
             if song_data.get("artist") == artist and song_data.get("title") == title:
                 # Check if it's marked as downloaded
-                if song_data.get("status") == SongStatus.DOWNLOADED:
+                if song_data.get("status") in [
+                    SongStatus.DOWNLOADED,
+                    SongStatus.CONVERTED,
+                ]:
                     return True
             # Also check the video title field which might contain the song info
             video_title = song_data.get("video_title", "")
             if video_title and artist in video_title and title in video_title:
-                if song_data.get("status") == SongStatus.DOWNLOADED:
+                if song_data.get("status") in [
+                    SongStatus.DOWNLOADED,
+                    SongStatus.CONVERTED,
+                ]:
                     return True
 
         return False
 
     def is_file_exists(self, file_path):
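The fast path in `is_song_downloaded` keys the tracking dict as `"<video_id>@<channel_name>"` and treats both final statuses as done. A minimal standalone sketch of that lookup (statuses reduced to plain strings and the sample entries invented for illustration):

```python
DOWNLOADED, CONVERTED = "DOWNLOADED", "CONVERTED"


def is_song_downloaded(songs: dict, video_id: str, channel_name: str) -> bool:
    # Direct key lookup, mirroring the fast path: the tracking key is
    # "<video_id>@<channel_name>", and either final status counts as done.
    song = songs.get(f"{video_id}@{channel_name}")
    return bool(song) and song.get("status") in (DOWNLOADED, CONVERTED)


songs = {
    "dQw4w9WgXcQ@SingKing": {"status": "CONVERTED", "artist": "a-ha", "title": "Take On Me"},
}
print(is_song_downloaded(songs, "dQw4w9WgXcQ", "SingKing"))      # True
print(is_song_downloaded(songs, "dQw4w9WgXcQ", "OtherChannel"))  # False
```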
@@ -336,359 +283,114 @@
         self._save()
 
     def get_channel_video_list(
-        self, channel_url, yt_dlp_path="downloader/yt-dlp.exe", force_refresh=False, show_pagination=False
+        self, channel_url, yt_dlp_path=None, force_refresh=False
     ):
         """
         Return a list of videos (dicts with 'title' and 'id') for the channel, using cache if available unless force_refresh is True.
-
-        Args:
-            channel_url: YouTube channel URL
-            yt_dlp_path: Path to yt-dlp executable
-            force_refresh: Force refresh cache even if available
-            show_pagination: Show page-by-page progress (slower but more detailed)
         """
+        # Use platform-aware path if none provided
+        if yt_dlp_path is None:
+            from karaoke_downloader.config_manager import load_config
+            config = load_config()
+            yt_dlp_path = config.yt_dlp_path
+
         channel_name, channel_id = None, None
 
-        # Check if this is a manual channel
-        from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
-
-        if is_manual_channel(channel_url):
-            channel_name, channel_id = get_manual_channel_info(channel_url)
-            if channel_name and channel_id:
-                print(f"  📋 Loading manual videos for {channel_name}")
-                manual_videos = get_manual_videos_for_channel(channel_name)
-                # Convert to the expected format
-                videos = []
-                for video in manual_videos:
-                    videos.append({
-                        "title": video.get("title", ""),
-                        "id": video.get("id", ""),
-                        "url": video.get("url", "")
-                    })
-                print(f"  ✅ Loaded {len(videos)} manual videos")
-                return videos
-            else:
-                print(f"  ❌ Could not get manual channel info for: {channel_url}")
-                return []
-
-        # Regular YouTube channel processing
         from karaoke_downloader.youtube_utils import get_channel_info
 
         channel_name, channel_id = get_channel_info(channel_url)
 
-        if not channel_id:
-            print(f"  ❌ Could not extract channel ID from URL: {channel_url}")
-            return []
-
-        print(f"  🔍 Channel: {channel_name} (ID: {channel_id})")
-
-        # Check if we have cached data for this channel
-        if not force_refresh:
-            cached_videos = self._load_channel_cache(channel_id)
-            if cached_videos:
-                # Validate that the cached data has proper video IDs
-                corrupted = False
-
-                # Check if any video IDs look like titles instead of proper YouTube IDs
-                for video in cached_videos[:20]:  # Check first 20 videos
-                    video_id = video.get("id", "")
-                    # More comprehensive validation - YouTube IDs should be 11 characters and contain only alphanumeric, hyphens, and underscores
-                    if video_id and (
-                        len(video_id) != 11 or
-                        not video_id.replace('-', '').replace('_', '').isalnum() or
-                        " " in video_id or
-                        "Lyrics" in video_id or
-                        "KARAOKE" in video_id.upper() or
-                        "Vocal" in video_id or
-                        "Guide" in video_id
-                    ):
-                        print(f"  ⚠️ Detected corrupted video ID in cache: '{video_id}'")
-                        corrupted = True
-                        break
-
-                if corrupted:
-                    print(f"  🧹 Clearing corrupted cache for {channel_id}")
-                    self._clear_channel_cache(channel_id)
-                    force_refresh = True
-                else:
-                    print(f"  📋 Using cached video list ({len(cached_videos)} videos)")
-                    return cached_videos
-
-        # Choose fetch method based on show_pagination flag
-        if show_pagination:
-            return self._fetch_videos_with_pagination(channel_url, channel_id, yt_dlp_path)
-        else:
-            return self._fetch_videos_flat_playlist(channel_url, channel_id, yt_dlp_path)
-
-    def _fetch_videos_with_pagination(self, channel_url, channel_id, yt_dlp_path):
-        """Fetch videos showing page-by-page progress."""
-        print(f"  🌐 Fetching video list from YouTube (page-by-page mode)...")
-        print(f"  📡 Channel URL: {channel_url}")
-
-        import subprocess
-
-        all_videos = []
-        page = 1
-        videos_per_page = 200  # YouTube/yt-dlp supports up to 200 videos per page, reducing API calls and errors
-
-        while True:
-            print(f"  📄 Fetching page {page}...")
-
-            # Fetch one page at a time
-            cmd = [
-                yt_dlp_path,
-                "--flat-playlist",
-                "--print",
-                "%(title)s|%(id)s|%(url)s",
-                "--playlist-start",
-                str((page - 1) * videos_per_page + 1),
-                "--playlist-end",
-                str(page * videos_per_page),
-                channel_url,
-            ]
-
-            try:
-                # Increased timeout to 180 seconds for larger pages (200 videos)
-                result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=180)
-                lines = result.stdout.strip().splitlines()
-
-                # Save raw output for debugging (for each page)
-                raw_output_file = self._get_channel_cache_file(channel_id).parent / f"{channel_id}_raw_output_page{page}.txt"
-                try:
-                    with open(raw_output_file, 'w', encoding='utf-8') as f:
-                        f.write(f"# Raw yt-dlp output for {channel_id} - Page {page}\n")
-                        f.write(f"# Channel URL: {channel_url}\n")
-                        f.write(f"# Command: {' '.join(cmd)}\n")
-                        f.write(f"# Timestamp: {datetime.now().isoformat()}\n")
-                        f.write(f"# Total lines: {len(lines)}\n")
-                        f.write("#" * 80 + "\n\n")
-                        for i, line in enumerate(lines, 1):
-                            f.write(f"{i:6d}: {line}\n")
-                    print(f"  💾 Saved raw output to: {raw_output_file.name}")
-                except Exception as e:
-                    print(f"  ⚠️ Could not save raw output: {e}")
-
-                if not lines:
-                    print(f"  ✅ No more videos found on page {page}")
-                    break
-
-                print(f"  📊 Page {page}: Found {len(lines)} videos")
-
-                page_videos = []
-                invalid_count = 0
-
-                for line in lines:
-                    if not line.strip():
-                        continue
-
-                    # More robust parsing that handles titles with | characters
-                    # Extract video ID directly from the URL that yt-dlp provides
-
-                    # Find the URL and extract video ID from it
-                    url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
-                    if not url_match:
-                        continue
-
-                    # Extract video ID directly from the URL
-                    video_id = url_match.group(1)
-
-                    # Extract title (everything before the video ID in the line)
-                    title = line[:line.find(video_id)].rstrip('|').strip()
-
-                    # Validate video ID
-                    if video_id and (
-                        len(video_id) == 11 and
-                        video_id.replace('-', '').replace('_', '').isalnum() and
-                        " " not in video_id and
-                        "Lyrics" not in video_id and
-                        "KARAOKE" not in video_id.upper() and
-                        "Vocal" not in video_id and
-                        "Guide" not in video_id
-                    ):
-                        page_videos.append({"title": title, "id": video_id})
-                    else:
-                        invalid_count += 1
-                        if invalid_count <= 3:  # Show first 3 invalid IDs per page
-                            print(f"  ⚠️ Invalid ID: '{video_id}' for '{title[:50]}...'")
-
-                if invalid_count > 3:
-                    print(f"  ⚠️ ... and {invalid_count - 3} more invalid IDs on this page")
-
-                all_videos.extend(page_videos)
-                print(f"  ✅ Page {page}: Added {len(page_videos)} valid videos (total: {len(all_videos)})")
-
-                # If we got fewer videos than expected, we're probably at the end
-                if len(lines) < videos_per_page:
-                    print(f"  🏁 Reached end of channel (last page had {len(lines)} videos)")
-                    break
-
-                page += 1
-
-                # Safety check to prevent infinite loops
-                if page > 50:  # Max 50 pages (10,000 videos with 200 per page)
-                    print(f"  ⚠️ Reached maximum page limit (50 pages), stopping")
-                    break
-
-            except subprocess.TimeoutExpired:
-                print(f"  ⚠️ Page {page} timed out, stopping")
-                break
-            except subprocess.CalledProcessError as e:
-                print(f"  ❌ Error fetching page {page}: {e}")
-                break
-            except KeyboardInterrupt:
-                print(f"  ⏹️ User interrupted, stopping at page {page}")
-                break
-
-        if not all_videos:
-            print(f"  ❌ No valid videos found")
-            return []
-
-        print(f"  🎉 Channel download complete!")
-        print(f"  📊 Total videos fetched: {len(all_videos)}")
-
-        # Save to individual channel cache file
-        self._save_channel_cache(channel_id, all_videos)
-        print(f"  💾 Saved cache to: {self._get_channel_cache_file(channel_id).name}")
-
-        return all_videos
-
-    def _fetch_videos_flat_playlist(self, channel_url, channel_id, yt_dlp_path):
-        """Fetch all videos using flat playlist (faster but less detailed progress)."""
+        # Check if cache has the old flat structure or new nested structure
+        cache_data = None
+        cache_key = None
+
+        # Try nested structure first (new format)
+        if "channels" in self.cache:
+            # Try multiple possible cache keys in nested structure
+            possible_keys = [
+                channel_id,  # The extracted channel ID
+                channel_url,  # The full URL
+                channel_name,  # The extracted channel name
+            ]
+            for key in possible_keys:
+                if key and key in self.cache["channels"]:
+                    cache_data = self.cache["channels"][key]["videos"]
+                    cache_key = key
+                    break
+
+        # Try flat structure (old format) as fallback
+        if cache_data is None:
+            possible_keys = [
+                channel_id,  # The extracted channel ID
+                channel_url,  # The full URL
+                channel_name,  # The extracted channel name
+            ]
+            for key in possible_keys:
+                if key and key in self.cache:
+                    cache_data = self.cache[key]
+                    cache_key = key
+                    break
+
+        if not cache_key:
+            cache_key = channel_id or channel_url  # Use as fallback for new entries
+
+        print(f"  🔍 Trying cache keys: {possible_keys}")
+        print(f"  🔍 Selected cache key: '{cache_key}'")
+
+        if not force_refresh and cache_data is not None:
+            print(
+                f"  📋 Using cached video list ({len(cache_data)} videos)"
+            )
+            # Convert old cache format to new format if needed
+            converted_videos = []
+            for video in cache_data:
+                if "video_id" in video and "id" not in video:
+                    # Convert old format to new format
+                    converted_videos.append({
+                        "title": video["title"],
+                        "id": video["video_id"]
+                    })
+                else:
+                    # Already in new format
+                    converted_videos.append(video)
+            return converted_videos
+        else:
+            print(f"  ❌ Cache miss for all keys")
+
         # Fetch with yt-dlp
         print(f"  🌐 Fetching video list from YouTube (this may take a while)...")
-        print(f"  📡 Channel URL: {channel_url}")
 
         import subprocess
         from karaoke_downloader.youtube_utils import _parse_yt_dlp_command
 
-        # First, let's get the total count to show progress
-        count_cmd = _parse_yt_dlp_command(yt_dlp_path) + [
-            "--flat-playlist",
-            "--print",
-            "%(title)s",
-            "--playlist-end",
-            "1",  # Just get first video to test
-            channel_url,
-        ]
-
-        try:
-            print(f"  🔍 Testing channel access...")
-            test_result = subprocess.run(count_cmd, capture_output=True, text=True, timeout=30)
-            if test_result.returncode == 0:
-                print(f"  ✅ Channel is accessible")
-            else:
-                print(f"  ⚠️ Channel test failed: {test_result.stderr}")
-        except subprocess.TimeoutExpired:
-            print(f"  ⚠️ Channel test timed out")
-        except Exception as e:
-            print(f"  ⚠️ Channel test error: {e}")
-
-        # Now fetch all videos with progress indicators
         cmd = _parse_yt_dlp_command(yt_dlp_path) + [
             "--flat-playlist",
             "--print",
             "%(title)s|%(id)s|%(url)s",
-            "--verbose",  # Add verbose output to see what's happening
             channel_url,
         ]
 
         try:
-            print(f"  🔧 Running yt-dlp command: {' '.join(cmd)}")
-            print(f"  📥 Starting video list download...")
-
-            # Use a timeout and show progress
-            result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=300)
+            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
             lines = result.stdout.strip().splitlines()
 
-            # Save raw output for debugging
-            raw_output_file = self._get_channel_cache_file(channel_id).parent / f"{channel_id}_raw_output.txt"
-            try:
-                with open(raw_output_file, 'w', encoding='utf-8') as f:
-                    f.write(f"# Raw yt-dlp output for {channel_id}\n")
-                    f.write(f"# Channel URL: {channel_url}\n")
-                    f.write(f"# Command: {' '.join(cmd)}\n")
|
|
||||||
f.write(f"# Timestamp: {datetime.now().isoformat()}\n")
|
|
||||||
f.write(f"# Total lines: {len(lines)}\n")
|
|
||||||
f.write("#" * 80 + "\n\n")
|
|
||||||
for i, line in enumerate(lines, 1):
|
|
||||||
f.write(f"{i:6d}: {line}\n")
|
|
||||||
print(f" 💾 Saved raw output to: {raw_output_file.name}")
|
|
||||||
except Exception as e:
|
|
||||||
print(f" ⚠️ Could not save raw output: {e}")
|
|
||||||
|
|
||||||
print(f" 📄 Raw output lines: {len(lines)}")
|
|
||||||
print(f" 📊 Download completed successfully!")
|
|
||||||
|
|
||||||
# Show some sample lines to understand the format
|
|
||||||
if lines:
|
|
||||||
print(f" 📋 Sample output format:")
|
|
||||||
for i, line in enumerate(lines[:3]):
|
|
||||||
print(f" Line {i+1}: {line[:100]}...")
|
|
||||||
if len(lines) > 3:
|
|
||||||
print(f" ... and {len(lines) - 3} more lines")
|
|
||||||
|
|
||||||
videos = []
|
videos = []
|
||||||
invalid_count = 0
|
for line in lines:
|
||||||
|
parts = line.split("|")
|
||||||
print(f" 🔍 Processing {len(lines)} video entries...")
|
if len(parts) >= 2:
|
||||||
|
title, video_id = parts[0].strip(), parts[1].strip()
|
||||||
for i, line in enumerate(lines):
|
|
||||||
if i % 1000 == 0 and i > 0: # Progress indicator every 1000 lines
|
|
||||||
print(f" 📊 Processing line {i}/{len(lines)}... ({i/len(lines)*100:.1f}%)")
|
|
||||||
|
|
||||||
# More robust parsing that handles titles with | characters
|
|
||||||
# Extract video ID directly from the URL that yt-dlp provides
|
|
||||||
|
|
||||||
# Find the URL and extract video ID from it
|
|
||||||
url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
|
|
||||||
if not url_match:
|
|
||||||
invalid_count += 1
|
|
||||||
if invalid_count <= 5:
|
|
||||||
print(f" ⚠️ Skipping line with no URL: '{line[:100]}...'")
|
|
||||||
elif invalid_count == 6:
|
|
||||||
print(f" ⚠️ ... and {len(lines) - i - 1} more invalid lines")
|
|
||||||
continue
|
|
||||||
|
|
||||||
# Extract video ID directly from the URL
|
|
||||||
video_id = url_match.group(1)
|
|
||||||
|
|
||||||
# Extract title (everything before the video ID in the line)
|
|
||||||
title = line[:line.find(video_id)].rstrip('|').strip()
|
|
||||||
|
|
||||||
# Validate video ID
|
|
||||||
if video_id and (
|
|
||||||
len(video_id) == 11 and
|
|
||||||
video_id.replace('-', '').replace('_', '').isalnum() and
|
|
||||||
" " not in video_id and
|
|
||||||
"Lyrics" not in video_id and
|
|
||||||
"KARAOKE" not in video_id.upper() and
|
|
||||||
"Vocal" not in video_id and
|
|
||||||
"Guide" not in video_id
|
|
||||||
):
|
|
||||||
videos.append({"title": title, "id": video_id})
|
videos.append({"title": title, "id": video_id})
|
||||||
else:
|
|
||||||
invalid_count += 1
|
|
||||||
if invalid_count <= 5: # Only show first 5 invalid IDs
|
|
||||||
print(f" ⚠️ Skipping invalid video ID: '{video_id}' for title: '{title[:50]}...'")
|
|
||||||
elif invalid_count == 6:
|
|
||||||
print(f" ⚠️ ... and {len(lines) - i - 1} more invalid IDs")
|
|
||||||
|
|
||||||
if not videos:
|
# Save in nested structure format
|
||||||
print(f" ❌ No valid videos found after parsing")
|
if "channels" not in self.cache:
|
||||||
return []
|
self.cache["channels"] = {}
|
||||||
|
|
||||||
print(f" ✅ Parsed {len(videos)} valid videos from YouTube")
|
self.cache["channels"][cache_key] = {
|
||||||
print(f" ⚠️ Skipped {invalid_count} invalid video IDs")
|
"videos": videos,
|
||||||
|
"last_updated": datetime.now().isoformat(),
|
||||||
# Save to individual channel cache file
|
"channel_name": channel_name,
|
||||||
self._save_channel_cache(channel_id, videos)
|
"channel_id": channel_id
|
||||||
print(f" 💾 Saved cache to: {self._get_channel_cache_file(channel_id).name}")
|
}
|
||||||
|
|
||||||
|
self.save_cache()
|
||||||
return videos
|
return videos
|
||||||
|
|
||||||
except subprocess.TimeoutExpired:
|
|
||||||
print(f"❌ yt-dlp timed out after 5 minutes - channel may be too large")
|
|
||||||
return []
|
|
||||||
except subprocess.CalledProcessError as e:
|
except subprocess.CalledProcessError as e:
|
||||||
print(f"❌ yt-dlp failed to fetch playlist for cache: {e}")
|
print(f"❌ yt-dlp failed to fetch playlist for cache: {e}")
|
||||||
print(f" 📄 stderr: {e.stderr}")
|
|
||||||
return []
|
return []
|
||||||
|
|||||||
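The hunk above replaces the URL-regex parsing with a plain `line.split("|")` over yt-dlp's `%(title)s|%(id)s|%(url)s` output. A minimal sketch (not part of this diff; `parse_flat_playlist_line` is a hypothetical helper) of how such a line can be split from the right, so that titles containing `|` survive, since the last two fields are always the 11-character ID and the watch URL:

```python
import re

def parse_flat_playlist_line(line):
    """Parse one '%(title)s|%(id)s|%(url)s' line printed by yt-dlp.

    Titles may themselves contain '|', so split from the right: the last
    two fields are always the video ID and the watch URL.
    """
    parts = line.rsplit("|", 2)
    if len(parts) != 3:
        return None
    title, video_id, url = (p.strip() for p in parts)
    # YouTube video IDs are 11 characters from [A-Za-z0-9_-]
    if not re.fullmatch(r"[A-Za-z0-9_-]{11}", video_id):
        return None
    return {"title": title, "id": video_id, "url": url}

line = "Artist - Song | Karaoke|dQw4w9WgXcQ|https://www.youtube.com/watch?v=dQw4w9WgXcQ"
print(parse_flat_playlist_line(line))
```

A left-to-right `split("|")` would truncate the title in the sample line above at the first pipe; the right split keeps it whole.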
@@ -107,10 +107,6 @@ def download_single_video(
     video_url = f"https://www.youtube.com/watch?v={video_id}"

-    # Debug: Show the video_id and URL being used
-    print(f"🔍 DEBUG: video_id = '{video_id}'")
-    print(f"🔍 DEBUG: video_url = '{video_url}'")
-
     # Build command using centralized utility
     cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config)

@@ -259,7 +255,7 @@ def execute_download_plan(
         video_id = item["video_id"]
         video_title = item["video_title"]

-        print(f"\n⬇️ Downloading {downloaded_count + 1} of {total_to_download}:")
+        print(f"\n⬇️ Downloading {len(download_plan) - idx} of {total_to_download}:")
         print(f"   📋 Songlist: {artist} - {title}")
         print(f"   🎬 Video: {video_title} ({channel_name})")
         if "match_score" in item:
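The counter change above swaps an ascending "completed + 1" label for one derived from the loop index. Assuming `idx` comes from `enumerate()` over the plan, the new expression counts down the items still left rather than up; a toy sketch (names hypothetical) of the difference:

```python
def remaining_label(download_plan, idx):
    # idx comes from enumerate(); len(plan) - idx is the number of
    # items not yet finished, so the label counts down
    return f"{len(download_plan) - idx} of {len(download_plan)}"

plan = ["song_a", "song_b", "song_c"]
for idx, item in enumerate(plan):
    print(f"⬇️ Downloading {remaining_label(plan, idx)}: {item}")
```

Unlike `downloaded_count + 1`, the index-based form stays correct even when an item is skipped without incrementing a success counter.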
@@ -23,9 +23,15 @@ def _parse_yt_dlp_command(yt_dlp_path: str) -> List[str]:


 def get_channel_info(
-    channel_url: str, yt_dlp_path: str = "downloader/yt-dlp.exe"
+    channel_url: str, yt_dlp_path: str = None
 ) -> tuple[str, str]:
     """Get channel information using yt-dlp. Returns (channel_name, channel_id)."""
+    # Use platform-aware path if none provided
+    if yt_dlp_path is None:
+        from karaoke_downloader.config_manager import load_config
+        config = load_config()
+        yt_dlp_path = config.yt_dlp_path
+
     try:
         # Extract channel name from URL for now (faster than calling yt-dlp)
         if "/@" in channel_url:

@@ -52,9 +58,15 @@ def get_channel_info(


 def get_playlist_info(
-    playlist_url: str, yt_dlp_path: str = "downloader/yt-dlp.exe"
+    playlist_url: str, yt_dlp_path: str = None
 ) -> List[Dict[str, Any]]:
     """Get playlist information using yt-dlp."""
+    # Use platform-aware path if none provided
+    if yt_dlp_path is None:
+        from karaoke_downloader.config_manager import load_config
+        config = load_config()
+        yt_dlp_path = config.yt_dlp_path
+
     try:
         cmd = _parse_yt_dlp_command(yt_dlp_path) + ["--dump-json", "--flat-playlist", playlist_url]
         result = subprocess.run(cmd, capture_output=True, text=True, check=True)

@@ -88,7 +100,7 @@ def build_yt_dlp_command(
     Returns:
         List of command arguments for subprocess.run
     """
-    cmd = _parse_yt_dlp_command(yt_dlp_path) + [
+    cmd = _parse_yt_dlp_command(str(yt_dlp_path)) + [
         "--no-check-certificates",
         "--ignore-errors",
         "--no-warnings",

@@ -129,7 +141,7 @@ def execute_yt_dlp_command(


 def show_available_formats(
-    video_url: str, yt_dlp_path: str = "downloader/yt-dlp.exe", timeout: int = 30
+    video_url: str, yt_dlp_path: str = None, timeout: int = 30
 ) -> None:
     """
     Show available formats for a video (debugging utility).

@@ -139,8 +151,14 @@ def show_available_formats(
         yt_dlp_path: Path to yt-dlp executable
         timeout: Timeout in seconds
     """
+    # Use platform-aware path if none provided
+    if yt_dlp_path is None:
+        from karaoke_downloader.config_manager import load_config
+        config = load_config()
+        yt_dlp_path = config.yt_dlp_path
+
     print(f"🔍 Checking available formats for: {video_url}")
-    format_cmd = _parse_yt_dlp_command(yt_dlp_path) + ["--list-formats", video_url]
+    format_cmd = _parse_yt_dlp_command(str(yt_dlp_path)) + ["--list-formats", video_url]
     try:
         format_result = subprocess.run(
             format_cmd, capture_output=True, text=True, timeout=timeout
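The hunks above change every hard-coded `"downloader/yt-dlp.exe"` default into `None` resolved at call time from config. A self-contained sketch of that idiom (the platform-to-binary mapping here is an illustrative stand-in for `config_manager.load_config().yt_dlp_path`, and this `get_channel_info` is a simplified stand-in, not the real function):

```python
import platform

def default_yt_dlp_path():
    # Stand-in for config_manager.load_config().yt_dlp_path: map the
    # current OS to the bundled binary name (assumption for illustration)
    return {
        "Windows": "downloader/yt-dlp.exe",
        "Darwin": "downloader/yt-dlp_macos",
    }.get(platform.system(), "downloader/yt-dlp")

def get_channel_info(channel_url, yt_dlp_path=None):
    if yt_dlp_path is None:  # resolved per call, per platform
        yt_dlp_path = default_yt_dlp_path()
    return channel_url, yt_dlp_path
```

Resolving the default at call time keeps Windows-specific paths out of function signatures, so the same module works unchanged on macOS and Linux.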
setup_macos.py (deleted, 220 lines)
@@ -1,220 +0,0 @@
-#!/usr/bin/env python3
-"""
-macOS setup script for Karaoke Video Downloader.
-This script helps users set up yt-dlp and FFmpeg on macOS.
-"""
-
-import os
-import sys
-import subprocess
-from pathlib import Path
-
-
-def check_ffmpeg():
-    """Check if FFmpeg is installed."""
-    try:
-        result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, timeout=10)
-        return result.returncode == 0
-    except (subprocess.TimeoutExpired, FileNotFoundError):
-        return False
-
-
-def check_yt_dlp():
-    """Check if yt-dlp is installed via pip or binary."""
-    # Check pip installation
-    try:
-        result = subprocess.run([sys.executable, "-m", "yt_dlp", "--version"],
-                                capture_output=True, text=True, timeout=10)
-        if result.returncode == 0:
-            return True
-    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
-        pass
-
-    # Check binary file
-    binary_path = Path("downloader/yt-dlp_macos")
-    if binary_path.exists():
-        try:
-            result = subprocess.run([str(binary_path), "--version"],
-                                    capture_output=True, text=True, timeout=10)
-            return result.returncode == 0
-        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
-            pass
-
-    return False
-
-
-def install_ffmpeg():
-    """Install FFmpeg via Homebrew."""
-    print("🎬 Installing FFmpeg...")
-
-    # Check if Homebrew is installed
-    try:
-        subprocess.run(["brew", "--version"], capture_output=True, check=True)
-    except (subprocess.CalledProcessError, FileNotFoundError):
-        print("❌ Homebrew is not installed. Please install Homebrew first:")
-        print("   /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"")
-        return False
-
-    try:
-        print("🍺 Installing FFmpeg via Homebrew...")
-        result = subprocess.run(["brew", "install", "ffmpeg"],
-                                capture_output=True, text=True, check=True)
-        print("✅ FFmpeg installed successfully!")
-        return True
-    except subprocess.CalledProcessError as e:
-        print(f"❌ Failed to install FFmpeg: {e}")
-        return False
-
-
-def download_yt_dlp_binary():
-    """Download yt-dlp binary for macOS."""
-    print("📥 Downloading yt-dlp binary for macOS...")
-
-    # Create downloader directory if it doesn't exist
-    downloader_dir = Path("downloader")
-    downloader_dir.mkdir(exist_ok=True)
-
-    # Download yt-dlp binary
-    binary_path = downloader_dir / "yt-dlp_macos"
-    url = "https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos"
-
-    try:
-        print(f"📡 Downloading from: {url}")
-        result = subprocess.run(["curl", "-L", "-o", str(binary_path), url],
-                                capture_output=True, text=True, check=True)
-
-        # Make it executable
-        binary_path.chmod(0o755)
-        print(f"✅ yt-dlp binary downloaded to: {binary_path}")
-
-        # Test the binary
-        test_result = subprocess.run([str(binary_path), "--version"],
-                                     capture_output=True, text=True, timeout=10)
-        if test_result.returncode == 0:
-            version = test_result.stdout.strip()
-            print(f"✅ Binary test successful! Version: {version}")
-            return True
-        else:
-            print(f"❌ Binary test failed: {test_result.stderr}")
-            return False
-
-    except subprocess.CalledProcessError as e:
-        print(f"❌ Failed to download yt-dlp binary: {e}")
-        return False
-    except Exception as e:
-        print(f"❌ Error downloading binary: {e}")
-        return False
-
-
-def install_yt_dlp():
-    """Install yt-dlp via pip."""
-    print("📦 Installing yt-dlp...")
-
-    try:
-        result = subprocess.run([sys.executable, "-m", "pip", "install", "yt-dlp"],
-                                capture_output=True, text=True, check=True)
-        print("✅ yt-dlp installed successfully!")
-        return True
-    except subprocess.CalledProcessError as e:
-        print(f"❌ Failed to install yt-dlp: {e}")
-        return False
-
-
-def test_installation():
-    """Test the installation."""
-    print("\n🧪 Testing installation...")
-
-    # Test FFmpeg
-    if check_ffmpeg():
-        print("✅ FFmpeg is working!")
-    else:
-        print("❌ FFmpeg is not working")
-        return False
-
-    # Test yt-dlp
-    if check_yt_dlp():
-        print("✅ yt-dlp is working!")
-    else:
-        print("❌ yt-dlp is not working")
-        return False
-
-    return True
-
-
-def main():
-    print("🍎 macOS Setup for Karaoke Video Downloader")
-    print("=" * 50)
-
-    # Check current status
-    print("🔍 Checking current installation...")
-    ffmpeg_installed = check_ffmpeg()
-    yt_dlp_installed = check_yt_dlp()
-
-    print(f"FFmpeg: {'✅ Installed' if ffmpeg_installed else '❌ Not installed'}")
-    print(f"yt-dlp: {'✅ Installed' if yt_dlp_installed else '❌ Not installed'}")
-
-    if ffmpeg_installed and yt_dlp_installed:
-        print("\n🎉 Everything is already installed and working!")
-        return
-
-    # Install missing components
-    print("\n🚀 Installing missing components...")
-
-    # Install FFmpeg if needed
-    if not ffmpeg_installed:
-        print("\n🎬 FFmpeg Installation Options:")
-        print("1. Install via Homebrew (recommended)")
-        print("2. Download from ffmpeg.org")
-        print("3. Skip FFmpeg installation")
-
-        choice = input("\nChoose an option (1-3): ").strip()
-
-        if choice == "1":
-            if not install_ffmpeg():
-                print("❌ FFmpeg installation failed")
-                return
-        elif choice == "2":
-            print("📥 Please download FFmpeg from: https://ffmpeg.org/download.html")
-            print("   Extract and add to your PATH, then run this script again.")
-            return
-        elif choice == "3":
-            print("⚠️ FFmpeg is required for video processing. Some features may not work.")
-        else:
-            print("❌ Invalid choice")
-            return
-
-    # Install yt-dlp if needed
-    if not yt_dlp_installed:
-        print("\n📦 yt-dlp Installation Options:")
-        print("1. Install via pip (recommended)")
-        print("2. Download binary file")
-        print("3. Skip yt-dlp installation")
-
-        choice = input("\nChoose an option (1-3): ").strip()
-
-        if choice == "1":
-            if not install_yt_dlp():
-                print("❌ yt-dlp installation failed")
-                return
-        elif choice == "2":
-            if not download_yt_dlp_binary():
-                print("❌ yt-dlp binary download failed")
-                return
-        elif choice == "3":
-            print("❌ yt-dlp is required for video downloading.")
-            return
-        else:
-            print("❌ Invalid choice")
-            return
-
-    # Test installation
-    if test_installation():
-        print("\n🎉 Setup completed successfully!")
-        print("You can now use the Karaoke Video Downloader on macOS.")
-        print("Run: python download_karaoke.py --help")
-    else:
-        print("\n❌ Setup failed. Please check the error messages above.")
-
-
-if __name__ == "__main__":
-    main()
setup_platform.py (new file, 288 lines)
@@ -0,0 +1,288 @@
+#!/usr/bin/env python3
+"""
+Platform setup script for Karaoke Video Downloader.
+This script helps users download the correct yt-dlp binary for their platform.
+"""
+
+import os
+import platform
+import sys
+import urllib.request
+import zipfile
+import tarfile
+from pathlib import Path
+
+
+def detect_platform():
+    """Detect the current platform and return platform info."""
+    system = platform.system().lower()
+    machine = platform.machine().lower()
+
+    if system == "windows":
+        return "windows", "yt-dlp.exe"
+    elif system == "darwin":
+        return "macos", "yt-dlp_macos"
+    elif system == "linux":
+        return "linux", "yt-dlp"
+    else:
+        return "unknown", "yt-dlp"
+
+
+def get_download_url(platform_name):
+    """Get the download URL for yt-dlp based on platform."""
+    base_url = "https://github.com/yt-dlp/yt-dlp/releases/latest/download"
+
+    if platform_name == "windows":
+        return f"{base_url}/yt-dlp.exe"
+    elif platform_name == "macos":
+        return f"{base_url}/yt-dlp_macos"
+    elif platform_name == "linux":
+        return f"{base_url}/yt-dlp"
+    else:
+        raise ValueError(f"Unsupported platform: {platform_name}")
+
+
+def install_via_pip():
+    """Install yt-dlp via pip."""
+    print("📦 Installing yt-dlp via pip...")
+    try:
+        import subprocess
+        result = subprocess.run([sys.executable, "-m", "pip", "install", "yt-dlp"],
+                                capture_output=True, text=True, check=True)
+        print("✅ yt-dlp installed successfully via pip!")
+        return True
+    except subprocess.CalledProcessError as e:
+        print(f"❌ Failed to install yt-dlp via pip: {e}")
+        return False
+
+
+def check_ffmpeg():
+    """Check if FFmpeg is installed and available."""
+    try:
+        import subprocess
+        result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, timeout=10)
+        return result.returncode == 0
+    except (subprocess.TimeoutExpired, FileNotFoundError):
+        return False
+
+
+def install_ffmpeg():
+    """Install FFmpeg based on platform."""
+    import subprocess
+    platform_name, _ = detect_platform()
+
+    print("🎬 Installing FFmpeg...")
+
+    if platform_name == "macos":
+        # Try using Homebrew first
+        try:
+            print("🍺 Attempting to install FFmpeg via Homebrew...")
+            result = subprocess.run(["brew", "install", "ffmpeg"],
+                                    capture_output=True, text=True, check=True)
+            print("✅ FFmpeg installed successfully via Homebrew!")
+            return True
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            print("⚠️ Homebrew not found or failed. Trying alternative methods...")
+
+        # Try using MacPorts
+        try:
+            print("🍎 Attempting to install FFmpeg via MacPorts...")
+            result = subprocess.run(["sudo", "port", "install", "ffmpeg"],
+                                    capture_output=True, text=True, check=True)
+            print("✅ FFmpeg installed successfully via MacPorts!")
+            return True
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            print("❌ Could not install FFmpeg automatically.")
+            print("Please install FFmpeg manually:")
+            print("1. Install Homebrew: https://brew.sh/")
+            print("2. Run: brew install ffmpeg")
+            print("3. Or download from: https://ffmpeg.org/download.html")
+            return False
+
+    elif platform_name == "linux":
+        try:
+            print("🐧 Attempting to install FFmpeg via package manager...")
+            # Try apt (Ubuntu/Debian)
+            try:
+                result = subprocess.run(["sudo", "apt", "update"], capture_output=True, text=True, check=True)
+                result = subprocess.run(["sudo", "apt", "install", "-y", "ffmpeg"],
+                                        capture_output=True, text=True, check=True)
+                print("✅ FFmpeg installed successfully via apt!")
+                return True
+            except subprocess.CalledProcessError:
+                # Try yum (CentOS/RHEL)
+                try:
+                    result = subprocess.run(["sudo", "yum", "install", "-y", "ffmpeg"],
+                                            capture_output=True, text=True, check=True)
+                    print("✅ FFmpeg installed successfully via yum!")
+                    return True
+                except subprocess.CalledProcessError:
+                    print("❌ Could not install FFmpeg automatically.")
+                    print("Please install FFmpeg manually for your Linux distribution.")
+                    return False
+        except FileNotFoundError:
+            print("❌ Could not install FFmpeg automatically.")
+            print("Please install FFmpeg manually for your Linux distribution.")
+            return False
+
+    elif platform_name == "windows":
+        print("❌ FFmpeg installation not automated for Windows.")
+        print("Please install FFmpeg manually:")
+        print("1. Download from: https://ffmpeg.org/download.html")
+        print("2. Extract to a folder and add to PATH")
+        print("3. Or use Chocolatey: choco install ffmpeg")
+        return False
+
+    return False
+
+
+def download_file(url, destination):
+    """Download a file from URL to destination."""
+    print(f"📥 Downloading from: {url}")
+    print(f"📁 Saving to: {destination}")
+
+    try:
+        urllib.request.urlretrieve(url, destination)
+        print("✅ Download completed successfully!")
+        return True
+    except Exception as e:
+        print(f"❌ Download failed: {e}")
+        return False
+
+
+def make_executable(file_path):
+    """Make a file executable (for Unix-like systems)."""
+    try:
+        os.chmod(file_path, 0o755)
+        print(f"🔧 Made {file_path} executable")
+    except Exception as e:
+        print(f"⚠️ Could not make file executable: {e}")
+
+
+def main():
+    print("🎤 Karaoke Video Downloader - Platform Setup")
+    print("=" * 50)
+
+    # Detect platform
+    platform_name, binary_name = detect_platform()
+    print(f"🖥️ Detected platform: {platform_name}")
+    print(f"📦 Binary name: {binary_name}")
+
+    # Create downloader directory if it doesn't exist
+    downloader_dir = Path("downloader")
+    downloader_dir.mkdir(exist_ok=True)
+
+    # Check if binary already exists
+    binary_path = downloader_dir / binary_name
+    if binary_path.exists():
+        print(f"✅ {binary_name} already exists in downloader/ directory")
+        response = input("Do you want to re-download it? (y/N): ").strip().lower()
+        if response != 'y':
+            print("Setup completed!")
+            return
+
+    # Offer installation options
+    print(f"\n🔧 Installation options for {platform_name}:")
+    print("1. Download binary file (recommended for most users)")
+    print("2. Install via pip (alternative method)")
+
+    choice = input("Choose installation method (1 or 2): ").strip()
+
+    if choice == "2":
+        # Install via pip
+        if install_via_pip():
+            print(f"\n✅ yt-dlp installed successfully!")
+
+            # Test the installation
+            print(f"\n🧪 Testing yt-dlp installation...")
+            try:
+                import subprocess
+                result = subprocess.run([sys.executable, "-m", "yt_dlp", "--version"],
+                                        capture_output=True, text=True, timeout=10)
+                if result.returncode == 0:
+                    version = result.stdout.strip()
+                    print(f"✅ yt-dlp is working! Version: {version}")
+                else:
+                    print(f"⚠️ yt-dlp test failed: {result.stderr}")
+            except Exception as e:
+                print(f"⚠️ Could not test yt-dlp: {e}")
+
+            # Check and install FFmpeg
+            print(f"\n🎬 Checking FFmpeg installation...")
+            if check_ffmpeg():
+                print(f"✅ FFmpeg is already installed and working!")
+            else:
+                print(f"⚠️ FFmpeg not found. Installing...")
+                if install_ffmpeg():
+                    print(f"✅ FFmpeg installed successfully!")
+                else:
+                    print(f"⚠️ FFmpeg installation failed. The tool will still work but may be slower.")
+
+            print(f"\n🎉 Setup completed successfully!")
+            print(f"📦 yt-dlp installed via pip")
+            print(f"🖥️ Platform: {platform_name}")
+            print(f"\n🎉 You're ready to use the Karaoke Video Downloader!")
+            print(f"Run: python download_karaoke.py --help")
+            return
+        else:
+            print("❌ Pip installation failed. Trying binary download...")
+
+    # Download binary file
+    try:
+        download_url = get_download_url(platform_name)
+    except ValueError as e:
+        print(f"❌ {e}")
+        print("Please manually download yt-dlp for your platform from:")
+        print("https://github.com/yt-dlp/yt-dlp/releases/latest")
+        return
+
+    # Download the binary
+    print(f"\n🚀 Downloading yt-dlp for {platform_name}...")
+    if download_file(download_url, binary_path):
+        # Make executable on Unix-like systems
+        if platform_name in ["macos", "linux"]:
+            make_executable(binary_path)
+
+        print(f"\n✅ yt-dlp binary downloaded successfully!")
+        print(f"📁 yt-dlp binary location: {binary_path}")
+        print(f"🖥️ Platform: {platform_name}")
+
+        # Test the binary
+        print(f"\n🧪 Testing yt-dlp installation...")
+        try:
+            import subprocess
+            result = subprocess.run([str(binary_path), "--version"],
+                                    capture_output=True, text=True, timeout=10)
+            if result.returncode == 0:
+                version = result.stdout.strip()
+                print(f"✅ yt-dlp is working! Version: {version}")
+            else:
+                print(f"⚠️ yt-dlp test failed: {result.stderr}")
+        except Exception as e:
+            print(f"⚠️ Could not test yt-dlp: {e}")
+
+        # Check and install FFmpeg
+        print(f"\n🎬 Checking FFmpeg installation...")
+        if check_ffmpeg():
+            print(f"✅ FFmpeg is already installed and working!")
+        else:
+            print(f"⚠️ FFmpeg not found. Installing...")
+            if install_ffmpeg():
+                print(f"✅ FFmpeg installed successfully!")
+            else:
+                print(f"⚠️ FFmpeg installation failed. The tool will still work but may be slower.")
+
+        print(f"\n🎉 Setup completed successfully!")
+        print(f"📁 yt-dlp binary location: {binary_path}")
+        print(f"🖥️ Platform: {platform_name}")
+        print(f"\n🎉 You're ready to use the Karaoke Video Downloader!")
+        print(f"Run: python download_karaoke.py --help")
+
+    else:
+        print(f"\n❌ Setup failed. Please manually download yt-dlp for {platform_name}")
+        print(f"Download URL: {download_url}")
+        print(f"Save to: {binary_path}")
+
+
+if __name__ == "__main__":
+    main()
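setup_platform.py above downloads a per-platform binary, while the pip module remains an alternative install path. A minimal sketch (assumption: `resolve_yt_dlp_command` is a hypothetical helper, not part of this diff) of how a caller could prefer the downloaded binary and fall back to `python -m yt_dlp` when it is absent:

```python
import sys
from pathlib import Path

def resolve_yt_dlp_command(binary_path):
    """Prefer the platform binary fetched by setup_platform.py; fall back
    to the pip-installed module when the binary is missing."""
    path = Path(binary_path)
    if path.exists():
        return [str(path)]
    # `python -m yt_dlp` works wherever yt-dlp was installed via pip
    return [sys.executable, "-m", "yt_dlp"]

cmd = resolve_yt_dlp_command("downloader/yt-dlp_macos") + ["--version"]
print(cmd)
```

Either resolved prefix can then be extended with the usual yt-dlp arguments and passed to `subprocess.run`.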
@ -1,198 +0,0 @@
#!/usr/bin/env python3
"""
Helper script to add manual videos to the manual videos collection.
"""

import json
import re
from pathlib import Path
from typing import Dict, List, Optional

from karaoke_downloader.data_path_manager import get_data_path_manager


def extract_video_id(url: str) -> Optional[str]:
    """Extract video ID from YouTube URL."""
    patterns = [
        r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
        r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})'
    ]

    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None


def add_manual_video(title: str, url: str, manual_file: str = None):
    """
    Add a manual video to the collection.

    Args:
        title: Video title (e.g., "Artist - Song (Karaoke Version)")
        url: YouTube URL
        manual_file: Path to manual videos JSON file
    """
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    # Load existing data or create new
    if manual_path.exists():
        with open(manual_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    else:
        data = {
            "channel_name": "@ManualVideos",
            "channel_url": "manual://static",
            "description": "Manual collection of individual karaoke videos",
            "videos": [],
            "parsing_rules": {
                "format": "artist_title_separator",
                "separator": " - ",
                "artist_first": True,
                "title_cleanup": {
                    "remove_suffix": {
                        "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
                    }
                }
            }
        }

    # Extract video ID
    video_id = extract_video_id(url)
    if not video_id:
        print(f"❌ Could not extract video ID from URL: {url}")
        return False

    # Check if video already exists
    existing_ids = [video.get("id") for video in data["videos"]]
    if video_id in existing_ids:
        print(f"⚠️ Video already exists: {title}")
        return False

    # Add new video
    new_video = {
        "title": title,
        "url": url,
        "id": video_id,
        "upload_date": "2024-01-01",  # Default date
        "duration": 180,  # Default duration
        "view_count": 1000  # Default view count
    }

    data["videos"].append(new_video)

    # Save updated data
    manual_path.parent.mkdir(parents=True, exist_ok=True)
    with open(manual_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(f"✅ Added video: {title}")
    print(f"   URL: {url}")
    print(f"   ID: {video_id}")
    return True


def list_manual_videos(manual_file: str = None):
    """List all manual videos."""
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    print(f"📋 Manual Videos ({len(data['videos'])} videos):")
    print("=" * 60)

    for i, video in enumerate(data['videos'], 1):
        print(f"{i:2d}. {video['title']}")
        print(f"    URL: {video['url']}")
        print(f"    ID: {video['id']}")
        print()


def remove_manual_video(video_id: str, manual_file: str = None):
    """Remove a manual video by ID."""
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return False

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Find and remove video
    for i, video in enumerate(data['videos']):
        if video['id'] == video_id:
            removed_video = data['videos'].pop(i)
            with open(manual_path, 'w', encoding='utf-8') as f:
                json.dump(data, f, indent=2, ensure_ascii=False)
            print(f"✅ Removed video: {removed_video['title']}")
            return True

    print(f"❌ Video with ID '{video_id}' not found")
    return False


def main():
    """Interactive mode for adding manual videos."""
    print("🎤 Manual Video Manager")
    print("=" * 30)
    print("1. Add video")
    print("2. List videos")
    print("3. Remove video")
    print("4. Exit")

    while True:
        choice = input("\nSelect option (1-4): ").strip()

        if choice == "1":
            title = input("Enter video title (e.g., 'Artist - Song (Karaoke Version)'): ").strip()
            url = input("Enter YouTube URL: ").strip()

            if title and url:
                add_manual_video(title, url)
            else:
                print("❌ Title and URL are required")

        elif choice == "2":
            list_manual_videos()

        elif choice == "3":
            video_id = input("Enter video ID to remove: ").strip()
            if video_id:
                remove_manual_video(video_id)
            else:
                print("❌ Video ID is required")

        elif choice == "4":
            print("👋 Goodbye!")
            break

        else:
            print("❌ Invalid option")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Command line mode
        if sys.argv[1] == "add" and len(sys.argv) >= 4:
            add_manual_video(sys.argv[2], sys.argv[3])
        elif sys.argv[1] == "list":
            list_manual_videos()
        elif sys.argv[1] == "remove" and len(sys.argv) >= 3:
            remove_manual_video(sys.argv[2])
        else:
            print("Usage:")
            print("  python add_manual_video.py add 'Title' 'URL'")
            print("  python add_manual_video.py list")
            print("  python add_manual_video.py remove VIDEO_ID")
    else:
        # Interactive mode
        main()
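The two regular expressions in `extract_video_id` above cover the common YouTube URL shapes (`watch?v=`, `youtu.be/`, `embed/`, and `watch?` with extra query parameters). A quick standalone check of the same patterns:

```python
import re
from typing import Optional

# Same patterns as extract_video_id above
PATTERNS = [
    r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
    r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})',
]


def video_id(url: str) -> Optional[str]:
    """Return the 11-character video ID, or None if the URL has none."""
    for pattern in PATTERNS:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None


for url in ("https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "https://youtu.be/dQw4w9WgXcQ",
            "https://www.youtube.com/embed/dQw4w9WgXcQ"):
    print(video_id(url))  # dQw4w9WgXcQ for each URL shape
```

The fixed `{11}` length is what lets the later validation assume exactly eleven `[a-zA-Z0-9_-]` characters.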
@ -1,127 +0,0 @@
#!/usr/bin/env python3
"""
Script to build channel cache from raw yt-dlp output file.
This uses the fixed parsing logic to handle titles with | characters.
"""

import json
import re
from datetime import datetime
from pathlib import Path

from karaoke_downloader.data_path_manager import get_data_path_manager


def parse_raw_output_file(raw_file_path):
    """Parse the raw output file and extract valid videos."""
    videos = []
    invalid_count = 0

    print(f"🔍 Parsing raw output file: {raw_file_path}")

    with open(raw_file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Skip header lines (lines starting with #)
    data_lines = [line for line in lines if not line.strip().startswith('#') and line.strip()]

    print(f"📄 Found {len(data_lines)} data lines to process")

    for i, line in enumerate(data_lines):
        if i % 1000 == 0 and i > 0:  # Progress indicator every 1000 lines
            print(f"📊 Processing line {i}/{len(data_lines)}... ({i/len(data_lines)*100:.1f}%)")

        # Remove line number prefix (e.g., "  1234: ")
        line = re.sub(r'^\s*\d+:\s*', '', line.strip())

        # More robust parsing that handles titles with | characters:
        # extract the video ID directly from the URL that yt-dlp provides.

        # Find the URL and extract the video ID from it
        url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
        if not url_match:
            invalid_count += 1
            if invalid_count <= 5:
                print(f"⚠️ Skipping line with no URL: '{line[:100]}...'")
            elif invalid_count == 6:
                print(f"⚠️ ... and {len(data_lines) - i - 1} more invalid lines")
            continue

        # Extract video ID directly from the URL
        video_id = url_match.group(1)

        # Extract title (everything before the video ID in the line)
        title = line[:line.find(video_id)].rstrip('|').strip()

        # Validate video ID
        if video_id and (
            len(video_id) == 11 and
            video_id.replace('-', '').replace('_', '').isalnum() and
            " " not in video_id and
            "Lyrics" not in video_id and
            "KARAOKE" not in video_id.upper() and
            "Vocal" not in video_id and
            "Guide" not in video_id
        ):
            videos.append({"title": title, "id": video_id})
        else:
            invalid_count += 1
            if invalid_count <= 5:  # Only show first 5 invalid IDs
                print(f"⚠️ Skipping invalid video ID: '{video_id}' for title: '{title[:50]}...'")
            elif invalid_count == 6:
                print(f"⚠️ ... and {len(data_lines) - i - 1} more invalid IDs")

    print(f"✅ Parsed {len(videos)} valid videos from raw output")
    print(f"⚠️ Skipped {invalid_count} invalid video IDs")

    return videos


def save_cache_file(channel_id, videos, cache_dir=None):
    """Save the parsed videos to a cache file."""
    if cache_dir is None:
        cache_dir = str(get_data_path_manager().get_channel_cache_dir())
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Sanitize channel ID for filename
    safe_channel_id = re.sub(r'[<>:"/\\|?*]', '_', channel_id)
    cache_file = cache_dir / f"{safe_channel_id}.json"

    data = {
        'channel_id': channel_id,
        'videos': videos,
        'last_updated': datetime.now().isoformat(),
        'video_count': len(videos)
    }

    with open(cache_file, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(f"💾 Saved cache to: {cache_file.name}")
    return cache_file


def main():
    """Main function to build cache from raw output."""
    data_path_manager = get_data_path_manager()
    raw_file_path = data_path_manager.get_channel_cache_dir() / "@VocalStarKaraoke_raw_output.txt"

    if not raw_file_path.exists():
        print(f"❌ Raw output file not found: {raw_file_path}")
        return

    # Parse the raw output file
    videos = parse_raw_output_file(raw_file_path)

    if not videos:
        print("❌ No valid videos found")
        return

    # Save to cache file
    channel_id = "@VocalStarKaraoke"
    cache_file = save_cache_file(channel_id, videos)

    print(f"🎉 Cache build complete!")
    print(f"📊 Total videos in cache: {len(videos)}")
    print(f"📁 Cache file: {cache_file}")


if __name__ == "__main__":
    main()
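The URL-anchored parsing above sidesteps splitting on `|` entirely, which is why titles containing `|` survive. A standalone sketch of the same three steps on one synthetic raw line (the sample title, ID, and `title|id|url` layout are illustrative, not the guaranteed raw-output format):

```python
import re

# Synthetic example of one raw output line, with a leading line-number prefix
raw = "  42: Shape of You - Karaoke|abcdefghijk|https://www.youtube.com/watch?v=abcdefghijk"

# Step 1: strip the line-number prefix, as the parser above does
line = re.sub(r'^\s*\d+:\s*', '', raw.strip())

# Step 2: anchor on the URL instead of splitting on "|"
url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
video_id = url_match.group(1)

# Step 3: everything before the first occurrence of the ID is the title
title = line[:line.find(video_id)].rstrip('|').strip()

print(video_id, "|", title)
```

Because the ID also appears standalone before the URL here, `line.find(video_id)` stops at that first occurrence and the trailing `|` is stripped off the title.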
@ -1,164 +0,0 @@
#!/usr/bin/env python3
"""
Utility script to identify and clean up duplicate files with (2), (3) suffixes.
This helps clean up files that were created before the duplicate prevention was implemented.
"""

import re
from pathlib import Path
from typing import Dict, List, Tuple


def find_duplicate_files(downloads_dir: str = "downloads") -> Dict[str, List[Tuple[Path, int]]]:
    """
    Find duplicate files with (2), (3), etc. suffixes in the downloads directory.

    Args:
        downloads_dir: Path to downloads directory

    Returns:
        Dictionary mapping base filenames to lists of (file path, suffix number) tuples
    """
    downloads_path = Path(downloads_dir)
    if not downloads_path.exists():
        print(f"❌ Downloads directory not found: {downloads_dir}")
        return {}

    duplicates = {}

    # Scan all MP4 files in the downloads directory
    for mp4_file in downloads_path.rglob("*.mp4"):
        filename = mp4_file.name

        # Check if this is a duplicate file with (2), (3), etc.
        match = re.match(r'^(.+?)\s*\((\d+)\)\.mp4$', filename)
        if match:
            base_name = match.group(1)
            suffix_num = int(match.group(2))

            if base_name not in duplicates:
                duplicates[base_name] = []

            duplicates[base_name].append((mp4_file, suffix_num))

    # Sort duplicates by suffix number
    for base_name in duplicates:
        duplicates[base_name].sort(key=lambda x: x[1])

    return duplicates


def analyze_duplicates(duplicates: Dict[str, List[Tuple[Path, int]]]) -> None:
    """
    Analyze and display information about found duplicates.

    Args:
        duplicates: Dictionary of duplicate files
    """
    if not duplicates:
        print("✅ No duplicate files found!")
        return

    print(f"🔍 Found {len(duplicates)} sets of duplicate files:")
    print()

    total_duplicates = 0
    for base_name, files in duplicates.items():
        print(f"📁 {base_name}")
        for file_path, suffix in files:
            file_size = file_path.stat().st_size / (1024 * 1024)  # MB
            print(f"   ({suffix}) {file_path.name} - {file_size:.1f} MB")
        print()
        total_duplicates += len(files) - 1  # -1 because we keep the original

    print(f"📊 Summary: {len(duplicates)} base files with {total_duplicates} duplicate files")


def cleanup_duplicates(duplicates: Dict[str, List[Tuple[Path, int]]], dry_run: bool = True) -> None:
    """
    Clean up duplicate files, keeping only the first occurrence.

    Args:
        duplicates: Dictionary of duplicate files
        dry_run: If True, only show what would be deleted without actually deleting
    """
    if not duplicates:
        print("✅ No duplicates to clean up!")
        return

    mode = "DRY RUN" if dry_run else "ACTUAL CLEANUP"
    print(f"🧹 Starting {mode}...")
    print()

    total_deleted = 0
    total_size_freed = 0

    for base_name, files in duplicates.items():
        print(f"📁 Processing: {base_name}")

        # Keep the first file (lowest suffix number), delete the rest
        files_to_delete = files[1:]  # Skip the first file

        for file_path, suffix in files_to_delete:
            file_size = file_path.stat().st_size / (1024 * 1024)  # MB

            if dry_run:
                print(f"   🗑️ Would delete: {file_path.name} ({file_size:.1f} MB)")
            else:
                try:
                    file_path.unlink()
                    print(f"   ✅ Deleted: {file_path.name} ({file_size:.1f} MB)")
                    total_deleted += 1
                    total_size_freed += file_size
                except Exception as e:
                    print(f"   ❌ Failed to delete {file_path.name}: {e}")

        print()

    if dry_run:
        print(f"📊 DRY RUN SUMMARY: Would delete {len([f for files in duplicates.values() for f in files[1:]])} files")
    else:
        print(f"📊 CLEANUP SUMMARY: Deleted {total_deleted} files, freed {total_size_freed:.1f} MB")


def main():
    """Main function to run the duplicate file cleanup."""
    print("🎵 Karaoke Video Downloader - Duplicate File Cleanup")
    print("=" * 50)
    print()

    # Find duplicates
    duplicates = find_duplicate_files()

    if not duplicates:
        print("✅ No duplicate files found!")
        return

    # Analyze duplicates
    analyze_duplicates(duplicates)
    print()

    # Ask user what to do
    while True:
        print("Options:")
        print("1. Dry run (show what would be deleted)")
        print("2. Actually delete duplicate files")
        print("3. Exit without doing anything")

        choice = input("\nEnter your choice (1-3): ").strip()

        if choice == "1":
            cleanup_duplicates(duplicates, dry_run=True)
            break
        elif choice == "2":
            confirm = input("⚠️ Are you sure you want to delete duplicate files? (yes/no): ").strip().lower()
            if confirm in ["yes", "y"]:
                cleanup_duplicates(duplicates, dry_run=False)
            else:
                print("❌ Cleanup cancelled.")
            break
        elif choice == "3":
            print("❌ Exiting without cleanup.")
            break
        else:
            print("❌ Invalid choice. Please enter 1, 2, or 3.")


if __name__ == "__main__":
    main()
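The `(N)` suffix detection above hinges on a single regex: a lazy base-name group, optional whitespace, and a parenthesized number before `.mp4`. A standalone sketch grouping a few sample filenames (the names are made up):

```python
import re

sample_names = [
    "Artist - Song.mp4",       # original file, no suffix — not matched
    "Artist - Song (2).mp4",
    "Artist - Song (3).mp4",
    "Other - Track (2).mp4",
]

duplicates = {}
for name in sample_names:
    # Same pattern as find_duplicate_files above
    match = re.match(r'^(.+?)\s*\((\d+)\)\.mp4$', name)
    if match:
        base_name = match.group(1)
        suffix_num = int(match.group(2))
        duplicates.setdefault(base_name, []).append(suffix_num)

print(duplicates)  # {'Artist - Song': [2, 3], 'Other - Track': [2]}
```

The lazy `(.+?)` plus `\s*` keeps the trailing space out of the base name, so `"Artist - Song (2).mp4"` and `"Artist - Song (3).mp4"` group under the same key.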
@ -1,465 +0,0 @@
#!/usr/bin/env python3
"""
Fix artist name formatting for Let's Sing Karaoke channel.

This script specifically targets the "Last Name, First Name" format and converts it to
"First Name Last Name" format in ID3 tags. It only processes entries where there is exactly one comma
followed by exactly 2 words, to avoid affecting multi-artist entries.

Usage:
    python fix_artist_name_format.py --preview   # Show what would be changed
    python fix_artist_name_format.py --apply     # Actually make the changes
    python fix_artist_name_format.py --external "D:\Karaoke\Karaoke\MP4\Let's Sing Karaoke"  # Use external directory
"""

import json
import os
import re
import shutil
import argparse
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Try to import mutagen for ID3 tag manipulation
try:
    from mutagen.mp4 import MP4
    MUTAGEN_AVAILABLE = True
except ImportError:
    MUTAGEN_AVAILABLE = False
    print("⚠️ mutagen not available - install with: pip install mutagen")


def is_lastname_firstname_format(artist_name: str) -> bool:
    """
    Check if artist name is in "Last Name, First Name" format.

    Args:
        artist_name: The artist name to check

    Returns:
        True if the name matches "Last Name, First Name" format with exactly 2 words after the comma
    """
    if ',' not in artist_name:
        return False

    # Split by comma
    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return False

    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Check if there are exactly 2 words after the comma
    words_after_comma = first_name_part.split()
    if len(words_after_comma) != 2:
        return False

    # Additional check: make sure it's not a multi-artist entry.
    # If there are more than 4 words total in the artist name, it might be multi-artist.
    total_words = len(artist_name.split())
    if total_words > 4:  # "Last, First Middle" (4 words max for a single artist)
        return False

    return True


def convert_to_firstname_lastname(artist_name: str) -> str:
    """
    Convert "Last Name, First Name" to "First Name Last Name".

    Args:
        artist_name: Artist name in "Last Name, First Name" format

    Returns:
        Artist name in "First Name Last Name" format
    """
    parts = artist_name.split(',', 1)
    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Split the first name part into words
    words = first_name_part.split()
    if len(words) == 2:
        first_name = words[0]
        middle_name = words[1]
        return f"{first_name} {middle_name} {last_name}"
    else:
        # Fallback - just reverse the parts
        return f"{first_name_part} {last_name}"
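The split-and-reverse logic above reduces to: cut at the first comma, then put the post-comma part in front of the pre-comma part. A quick standalone sketch of that core move (the sample names are illustrative):

```python
def flip(artist_name: str) -> str:
    """Convert 'Last, First [Middle]' to 'First [Middle] Last' (same core logic as above)."""
    last_name, first_part = (p.strip() for p in artist_name.split(',', 1))
    return f"{first_part} {last_name}"


print(flip("Presley, Elvis"))    # Elvis Presley
print(flip("Cash, Johnny R."))   # Johnny R. Cash
```

Both branches of `convert_to_firstname_lastname` above produce this same word order; the two-word branch only differs in how it names the pieces.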
def extract_artist_title_from_filename(filename: str) -> Tuple[str, str]:
    """
    Extract artist and title from a filename.

    Args:
        filename: MP4 filename (without extension)

    Returns:
        Tuple of (artist, title)
    """
    # Remove .mp4 extension
    if filename.endswith('.mp4'):
        filename = filename[:-4]

    # Look for " - " separator
    if " - " in filename:
        parts = filename.split(" - ", 1)
        return parts[0].strip(), parts[1].strip()

    return "", filename


def update_id3_tags(file_path: str, new_artist: str, apply_changes: bool = False) -> bool:
    """
    Update the ID3 tags in an MP4 file.

    Args:
        file_path: Path to the MP4 file
        new_artist: New artist name to set
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        True if successful, False otherwise
    """
    if not MUTAGEN_AVAILABLE:
        print(f"⚠️ mutagen not available - cannot update ID3 tags for {file_path}")
        return False

    try:
        mp4 = MP4(file_path)

        if apply_changes:
            # Update the artist tag
            mp4["\xa9ART"] = new_artist
            mp4.save()
            print(f"📝 Updated ID3 tag: {os.path.basename(file_path)} → Artist: '{new_artist}'")
        else:
            # Just preview what would be changed
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"
            print(f"📝 Would update ID3 tag: {os.path.basename(file_path)} → Artist: '{current_artist}' → '{new_artist}'")

        return True

    except Exception as e:
        print(f"❌ Failed to update ID3 tags for {file_path}: {e}")
        return False


def scan_external_directory(directory_path: str) -> List[Dict]:
    """
    Scan external directory for MP4 files with "Last Name, First Name" format in ID3 tags.

    Args:
        directory_path: Path to the external directory

    Returns:
        List of files that need ID3 tag updates
    """
    if not os.path.exists(directory_path):
        print(f"❌ Directory not found: {directory_path}")
        return []

    if not MUTAGEN_AVAILABLE:
        print("❌ mutagen not available - cannot scan ID3 tags")
        return []

    files_to_update = []

    # Scan for MP4 files
    for file_path in Path(directory_path).glob("*.mp4"):
        try:
            mp4 = MP4(str(file_path))
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"

            if current_artist and is_lastname_firstname_format(current_artist):
                new_artist = convert_to_firstname_lastname(current_artist)

                files_to_update.append({
                    'file_path': str(file_path),
                    'filename': file_path.name,
                    'old_artist': current_artist,
                    'new_artist': new_artist
                })

        except Exception as e:
            print(f"⚠️ Could not read ID3 tags from {file_path.name}: {e}")

    return files_to_update


def update_tracking_file(tracking_file: str, channel_name: str = "Let's Sing Karaoke", apply_changes: bool = False) -> Tuple[int, List[Dict]]:
    """
    Update the karaoke tracking file to fix artist name formatting.

    Args:
        tracking_file: Path to the tracking JSON file
        channel_name: Channel name to target (default: Let's Sing Karaoke)
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Tuple of (number of changes made, list of changed entries)
    """
    if not os.path.exists(tracking_file):
        print(f"❌ Tracking file not found: {tracking_file}")
        return 0, []

    # Load the tracking data
    with open(tracking_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    changes_made = 0
    changed_entries = []

    # Process songs
    for song_key, song_data in data.get('songs', {}).items():
        if song_data.get('channel_name') != channel_name:
            continue

        artist = song_data.get('artist', '')
        if not artist or not is_lastname_firstname_format(artist):
            continue

        # Convert the artist name
        new_artist = convert_to_firstname_lastname(artist)

        if apply_changes:
            # Update the tracking data
            song_data['artist'] = new_artist

            # Update the video title if it exists and contains the old artist name
            video_title = song_data.get('video_title', '')
            if video_title and artist in video_title:
                song_data['video_title'] = video_title.replace(artist, new_artist)

            # Update the file path if it exists
            file_path = song_data.get('file_path', '')
            if file_path and artist in file_path:
                song_data['file_path'] = file_path.replace(artist, new_artist)

        changes_made += 1
        changed_entries.append({
            'song_key': song_key,
            'old_artist': artist,
            'new_artist': new_artist,
            'title': song_data.get('title', ''),
            'file_path': song_data.get('file_path', '')
        })

        print(f"🔄 {'Updated' if apply_changes else 'Would update'}: '{artist}' → '{new_artist}' ({song_data.get('title', '')})")

    # Save the updated data
    if apply_changes and changes_made > 0:
        # Create backup
        backup_file = f"{tracking_file}.backup"
        shutil.copy2(tracking_file, backup_file)
        print(f"💾 Created backup: {backup_file}")

        # Save updated file
        with open(tracking_file, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"💾 Updated tracking file: {tracking_file}")

    return changes_made, changed_entries


def update_songlist_tracking(songlist_file: str, channel_name: str = "Let's Sing Karaoke", apply_changes: bool = False) -> Tuple[int, List[Dict]]:
    """
    Update the songlist tracking file to fix artist name formatting.

    Args:
        songlist_file: Path to the songlist tracking JSON file
        channel_name: Channel name to target (default: Let's Sing Karaoke)
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Tuple of (number of changes made, list of changed entries)
    """
    if not os.path.exists(songlist_file):
        print(f"❌ Songlist tracking file not found: {songlist_file}")
        return 0, []

    # Load the songlist data
    with open(songlist_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    changes_made = 0
    changed_entries = []

    # Process songlist entries
    for song_key, song_data in data.items():
        artist = song_data.get('artist', '')
        if not artist or not is_lastname_firstname_format(artist):
            continue

        # Convert the artist name
        new_artist = convert_to_firstname_lastname(artist)

        if apply_changes:
            # Update the songlist data
            song_data['artist'] = new_artist

        changes_made += 1
        changed_entries.append({
            'song_key': song_key,
            'old_artist': artist,
            'new_artist': new_artist,
            'title': song_data.get('title', '')
        })

        print(f"🔄 {'Updated' if apply_changes else 'Would update'} songlist: '{artist}' → '{new_artist}' ({song_data.get('title', '')})")

    # Save the updated data
    if apply_changes and changes_made > 0:
        # Create backup
        backup_file = f"{songlist_file}.backup"
        shutil.copy2(songlist_file, backup_file)
        print(f"💾 Created backup: {backup_file}")

        # Save updated file
        with open(songlist_file, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"💾 Updated songlist file: {songlist_file}")

    return changes_made, changed_entries


def update_id3_tags_for_files(files_to_update: List[Dict], apply_changes: bool = False) -> int:
    """
    Update ID3 tags for a list of files.

    Args:
        files_to_update: List of files to update
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Number of files successfully updated
    """
    updated_count = 0

    for file_info in files_to_update:
        file_path = file_info['file_path']
        new_artist = file_info['new_artist']

        if update_id3_tags(file_path, new_artist, apply_changes):
            updated_count += 1

    return updated_count


def main():
    """Main function to run the artist name fix script."""
    parser = argparse.ArgumentParser(description="Fix artist name formatting in ID3 tags for Let's Sing Karaoke")
    parser.add_argument('--preview', action='store_true', help='Show what would be changed without making changes')
    parser.add_argument('--apply', action='store_true', help='Actually apply the changes')
    parser.add_argument('--external', type=str, help='Path to external karaoke directory')

    args = parser.parse_args()

    # Default to preview mode if no action specified
    if not args.preview and not args.apply:
        args.preview = True

    print("🎤 Artist Name Format Fix Script (ID3 Tags Only)")
    print("=" * 60)
    print("This script will fix 'Last Name, First Name' format to 'First Name Last Name'")
    print("Only targeting Let's Sing Karaoke channel to avoid affecting other channels.")
    print("Focusing on ID3 tags only - filenames will not be changed.")
    print()

    if not MUTAGEN_AVAILABLE:
        print("❌ mutagen library not available!")
        print("Please install it with: pip install mutagen")
        return

    if args.preview:
        print("🔍 PREVIEW MODE - No changes will be made")
    else:
        print("⚡ APPLY MODE - Changes will be made")
    print()

    # File paths
    tracking_file = "data/karaoke_tracking.json"
    songlist_file = "data/songlist_tracking.json"

    # Process external directory if specified
    if args.external:
        print(f"📁 Scanning external directory: {args.external}")
        external_files = scan_external_directory(args.external)

        if external_files:
            print(f"\n📋 Found {len(external_files)} files with 'Last Name, First Name' format in ID3 tags:")
            for file_info in external_files:
                print(f"  • {file_info['filename']}: '{file_info['old_artist']}' → '{file_info['new_artist']}'")

            if args.apply:
                print(f"\n📝 Updating ID3 tags in external files...")
                updated_count = update_id3_tags_for_files(external_files, apply_changes=True)
                print(f"✅ Updated ID3 tags in {updated_count} external files")
            else:
                print(f"\n📝 Would update ID3 tags in {len(external_files)} external files")
|
|
||||||
else:
|
|
||||||
print("✅ No files with 'Last Name, First Name' format found in ID3 tags")
|
|
||||||
|
|
||||||
# Process tracking files (only if they exist in current project)
|
|
||||||
if os.path.exists(tracking_file):
|
|
||||||
print(f"\n📊 Processing karaoke tracking file...")
|
|
||||||
tracking_changes, tracking_entries = update_tracking_file(tracking_file, apply_changes=args.apply)
|
|
||||||
else:
|
|
||||||
print(f"\n⚠️ Tracking file not found: {tracking_file}")
|
|
||||||
tracking_changes = 0
|
|
||||||
|
|
||||||
if os.path.exists(songlist_file):
|
|
||||||
print(f"\n📊 Processing songlist tracking file...")
|
|
||||||
songlist_changes, songlist_entries = update_songlist_tracking(songlist_file, apply_changes=args.apply)
|
|
||||||
else:
|
|
||||||
print(f"\n⚠️ Songlist tracking file not found: {songlist_file}")
|
|
||||||
songlist_changes = 0
|
|
||||||
|
|
||||||
# Process local downloads directory ID3 tags
|
|
||||||
downloads_dir = "downloads"
|
|
||||||
local_id3_updates = 0
|
|
||||||
if os.path.exists(downloads_dir) and tracking_changes > 0:
|
|
||||||
print(f"\n📝 Processing ID3 tags in local downloads directory...")
|
|
||||||
# Scan local downloads for files that need ID3 tag updates
|
|
||||||
local_files = []
|
|
||||||
for entry in tracking_entries:
|
|
||||||
file_path = entry.get('file_path', '')
|
|
||||||
if file_path and os.path.exists(file_path.replace('\\', '/')):
|
|
||||||
local_files.append({
|
|
||||||
'file_path': file_path.replace('\\', '/'),
|
|
||||||
'filename': os.path.basename(file_path),
|
|
||||||
'old_artist': entry['old_artist'],
|
|
||||||
'new_artist': entry['new_artist']
|
|
||||||
})
|
|
||||||
|
|
||||||
if local_files:
|
|
||||||
local_id3_updates = update_id3_tags_for_files(local_files, apply_changes=args.apply)
|
|
||||||
|
|
||||||
total_changes = tracking_changes + songlist_changes
|
|
||||||
|
|
||||||
print("\n" + "=" * 60)
|
|
||||||
print("📋 Summary:")
|
|
||||||
print(f" • Tracking file changes: {tracking_changes}")
|
|
||||||
print(f" • Songlist file changes: {songlist_changes}")
|
|
||||||
print(f" • Local ID3 tag updates: {local_id3_updates}")
|
|
||||||
print(f" • Total changes: {total_changes}")
|
|
||||||
|
|
||||||
if args.external:
|
|
||||||
external_count = len(scan_external_directory(args.external)) if args.preview else len(external_files)
|
|
||||||
print(f" • External ID3 tag updates: {external_count}")
|
|
||||||
|
|
||||||
if total_changes > 0 or (args.external and external_count > 0):
|
|
||||||
if args.apply:
|
|
||||||
print("\n✅ Artist name formatting in ID3 tags has been fixed!")
|
|
||||||
print("💾 Backups have been created for all modified files.")
|
|
||||||
print("🔄 You may need to re-run your karaoke downloader to update any cached data.")
|
|
||||||
else:
|
|
||||||
print("\n🔍 Preview complete. Use --apply to make these changes.")
|
|
||||||
else:
|
|
||||||
print("\n✅ No changes needed! All artist names are already in the correct format.")
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
main()
|
|
||||||
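The script above backs a tracking file up before overwriting it (`shutil.copy2` to a `.backup` sibling, then a full JSON rewrite). A minimal standalone sketch of that backup-then-write pattern — `save_with_backup` is a hypothetical helper for illustration, not part of the project:

```python
import json
import shutil
from pathlib import Path

def save_with_backup(path: str, data: dict) -> None:
    """Copy any existing file to <path>.backup, then write the new JSON."""
    if Path(path).exists():
        shutil.copy2(path, f"{path}.backup")  # copy2 preserves timestamps as well as contents
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

save_with_backup("songlist_tracking.json", {"songs": {}})
```

Because the backup is taken just before each rewrite, a bad run can always be undone by copying the `.backup` file back over the original.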
@ -1,295 +0,0 @@
#!/usr/bin/env python3
"""
Fix artist name formatting for Let's Sing Karaoke channel.

This script specifically targets the "Last Name, First Name" format and converts it to
"First Name Last Name" format in ID3 tags. It only processes entries where there is exactly one comma
followed by one or two words, to avoid affecting multi-artist entries.

Usage:
    python fix_artist_name_format_simple.py --preview   # Show what would be changed
    python fix_artist_name_format_simple.py --apply     # Actually make the changes
    python fix_artist_name_format_simple.py --external "D:\Karaoke\Karaoke\MP4\Let's Sing Karaoke"  # Use external directory
"""

import json
import os
import re
import shutil
import argparse
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Try to import mutagen for ID3 tag manipulation
try:
    from mutagen.mp4 import MP4
    MUTAGEN_AVAILABLE = True
except ImportError:
    MUTAGEN_AVAILABLE = False
    print("WARNING: mutagen not available - install with: pip install mutagen")


def is_lastname_firstname_format(artist_name: str) -> bool:
    """
    Check if artist name is in "Last Name, First Name" format.

    Args:
        artist_name: The artist name to check

    Returns:
        True if the name matches "Last Name, First Name" format with exactly 1 or 2 words after comma
    """
    if ',' not in artist_name:
        return False

    # Split by comma
    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return False

    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Check if there are exactly 1 or 2 words after the comma
    words_after_comma = first_name_part.split()
    if len(words_after_comma) not in [1, 2]:
        return False

    # Additional check: make sure it's not a multi-artist entry
    # If there are more than 4 words total in the artist name, it might be multi-artist
    total_words = len(artist_name.split())
    if total_words > 4:  # Last, First Name (4 words max for single artist)
        return False

    return True


def convert_lastname_firstname(artist_name: str) -> str:
    """
    Convert "Last Name, First Name" to "First Name Last Name".

    Args:
        artist_name: The artist name to convert

    Returns:
        The converted artist name
    """
    if ',' not in artist_name:
        return artist_name

    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return artist_name

    last_name = parts[0].strip()
    first_name = parts[1].strip()

    return f"{first_name} {last_name}"


def process_artist_name(artist_name: str) -> str:
    """
    Process an artist name, handling both single artists and multiple artists separated by "&".

    Args:
        artist_name: The artist name to process

    Returns:
        The processed artist name
    """
    if '&' in artist_name:
        # Split by "&" and process each artist individually
        artists = [artist.strip() for artist in artist_name.split('&')]
        processed_artists = []

        for artist in artists:
            if is_lastname_firstname_format(artist):
                processed_artist = convert_lastname_firstname(artist)
                processed_artists.append(processed_artist)
            else:
                processed_artists.append(artist)

        # Rejoin with "&"
        return ' & '.join(processed_artists)
    else:
        # Single artist
        if is_lastname_firstname_format(artist_name):
            return convert_lastname_firstname(artist_name)
        else:
            return artist_name


def update_id3_tags(file_path: str, new_artist: str, apply_changes: bool = False) -> bool:
    """
    Update the ID3 tags in an MP4 file.

    Args:
        file_path: Path to the MP4 file
        new_artist: New artist name to set
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        True if successful, False otherwise
    """
    if not MUTAGEN_AVAILABLE:
        print(f"WARNING: mutagen not available - cannot update ID3 tags for {file_path}")
        return False

    try:
        mp4 = MP4(file_path)

        if apply_changes:
            # Update the artist tag
            mp4["\xa9ART"] = new_artist
            mp4.save()
            print(f"UPDATED ID3 tag: {os.path.basename(file_path)} -> Artist: '{new_artist}'")
        else:
            # Just preview what would be changed
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"
            print(f"WOULD UPDATE ID3 tag: {os.path.basename(file_path)} -> Artist: '{current_artist}' -> '{new_artist}'")

        return True

    except Exception as e:
        print(f"ERROR: Failed to update ID3 tags for {file_path}: {e}")
        return False


def scan_external_directory(directory_path: str, debug: bool = False) -> List[Dict]:
    """
    Scan external directory for MP4 files with "Last Name, First Name" format in ID3 tags.

    Args:
        directory_path: Path to the external directory
        debug: Whether to show debug information

    Returns:
        List of files that need ID3 tag updates
    """
    if not os.path.exists(directory_path):
        print(f"ERROR: Directory not found: {directory_path}")
        return []

    if not MUTAGEN_AVAILABLE:
        print("ERROR: mutagen not available - cannot scan ID3 tags")
        return []

    files_to_update = []
    total_files = 0
    files_with_artist_tags = 0

    # Scan for MP4 files
    for file_path in Path(directory_path).glob("*.mp4"):
        total_files += 1
        try:
            mp4 = MP4(str(file_path))
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"

            if current_artist != "Unknown":
                files_with_artist_tags += 1

                if debug:
                    print(f"DEBUG: {file_path.name} -> Artist: '{current_artist}'")

                # Process the artist name to handle multiple artists
                processed_artist = process_artist_name(current_artist)

                if processed_artist != current_artist:
                    files_to_update.append({
                        'file_path': str(file_path),
                        'filename': file_path.name,
                        'old_artist': current_artist,
                        'new_artist': processed_artist
                    })

                    if debug:
                        print(f"DEBUG: MATCH FOUND - {file_path.name}: '{current_artist}' -> '{processed_artist}'")

        except Exception as e:
            if debug:
                print(f"WARNING: Could not read ID3 tags from {file_path.name}: {e}")

    print(f"INFO: Scanned {total_files} MP4 files, {files_with_artist_tags} had artist tags, {len(files_to_update)} need updates")
    return files_to_update


def update_id3_tags_for_files(files_to_update: List[Dict], apply_changes: bool = False) -> int:
    """
    Update ID3 tags for a list of files.

    Args:
        files_to_update: List of files to update
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Number of files successfully updated
    """
    updated_count = 0

    for file_info in files_to_update:
        file_path = file_info['file_path']
        new_artist = file_info['new_artist']

        if update_id3_tags(file_path, new_artist, apply_changes):
            updated_count += 1

    return updated_count


def main():
    """Main function to run the artist name fix script."""
    parser = argparse.ArgumentParser(description="Fix artist name formatting in ID3 tags for Let's Sing Karaoke")
    parser.add_argument('--preview', action='store_true', help='Show what would be changed without making changes')
    parser.add_argument('--apply', action='store_true', help='Actually apply the changes')
    parser.add_argument('--external', type=str, help='Path to external karaoke directory')
    parser.add_argument('--debug', action='store_true', help='Show debug information')

    args = parser.parse_args()

    # Default to preview mode if no action specified
    if not args.preview and not args.apply:
        args.preview = True

    print("Artist Name Format Fix Script (ID3 Tags Only)")
    print("=" * 60)
    print("This script will fix 'Last Name, First Name' format to 'First Name Last Name'")
    print("Only targeting Let's Sing Karaoke channel to avoid affecting other channels.")
    print("Focusing on ID3 tags only - filenames will not be changed.")
    print()

    if not MUTAGEN_AVAILABLE:
        print("ERROR: mutagen library not available!")
        print("Please install it with: pip install mutagen")
        return

    if args.preview:
        print("PREVIEW MODE - No changes will be made")
    else:
        print("APPLY MODE - Changes will be made")
    print()

    # Process external directory if specified
    if args.external:
        print(f"Scanning external directory: {args.external}")
        external_files = scan_external_directory(args.external, debug=args.debug)

        if external_files:
            print(f"\nFound {len(external_files)} files with 'Last Name, First Name' format in ID3 tags:")
            for file_info in external_files:
                print(f"  * {file_info['filename']}: '{file_info['old_artist']}' -> '{file_info['new_artist']}'")

            if args.apply:
                print(f"\nUpdating ID3 tags in external files...")
                updated_count = update_id3_tags_for_files(external_files, apply_changes=True)
                print(f"SUCCESS: Updated ID3 tags in {updated_count} external files")
            else:
                print(f"\nWould update ID3 tags in {len(external_files)} external files")
        else:
            print("SUCCESS: No files with 'Last Name, First Name' format found in ID3 tags")

    print("\n" + "=" * 60)
    print("Summary complete.")


if __name__ == "__main__":
    main()
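The deleted `fix_artist_name_format_simple.py` above centers on one transformation: swap "Last, First" to "First Last", but only when the part after the comma is one or two words, and handle "&"-separated multi-artist fields per artist. The same rules can be exercised in isolation — `swap_comma_name` and `fix_artist_field` are hypothetical names condensing the script's `is_lastname_firstname_format`, `convert_lastname_firstname`, and `process_artist_name`:

```python
def swap_comma_name(artist: str) -> str:
    """Convert 'Last, First' to 'First Last' when the remainder is one or two words."""
    if ',' not in artist or len(artist.split()) > 4:
        return artist  # no comma, or likely a multi-artist string
    last, first = (part.strip() for part in artist.split(',', 1))
    if len(first.split()) not in (1, 2):
        return artist
    return f"{first} {last}"

def fix_artist_field(field: str) -> str:
    """Apply the swap to each artist in an '&'-separated field."""
    return ' & '.join(swap_comma_name(a.strip()) for a in field.split('&'))

print(fix_artist_field("Presley, Elvis & Parton, Dolly"))  # Elvis Presley & Dolly Parton
```

Note that splitting on "&" first protects band names like "Earth, Wind & Fire": each "&"-separated piece is checked independently, and "Earth, Wind" fails the one-or-two-words test only when more than two words follow the comma, so the total-word cap is the real safety net there.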
@ -1,151 +0,0 @@
#!/usr/bin/env python3
"""
Script to reset karaoke tracking and re-download files with the new channel parser.

This script will:
1. Reset the karaoke_tracking.json to remove all downloaded entries
2. Optionally delete the downloaded files
3. Allow you to re-download with the new channel parser system
"""

import json
import os
import shutil
from pathlib import Path
from typing import List, Dict, Any

from karaoke_downloader.data_path_manager import get_data_path_manager


def reset_karaoke_tracking(tracking_file: str = None) -> None:
    """Reset the karaoke tracking file to empty state."""
    if tracking_file is None:
        tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
    print(f"Resetting {tracking_file}...")

    # Create backup of current tracking
    backup_file = f"{tracking_file}.backup"
    if os.path.exists(tracking_file):
        shutil.copy2(tracking_file, backup_file)
        print(f"Created backup: {backup_file}")

    # Reset to empty state
    empty_tracking = {
        "playlists": {},
        "songs": {}
    }

    with open(tracking_file, 'w', encoding='utf-8') as f:
        json.dump(empty_tracking, f, indent=2, ensure_ascii=False)

    print(f"✅ Reset {tracking_file} to empty state")


def delete_downloaded_files(downloads_dir: str = "downloads") -> None:
    """Delete all downloaded files and folders."""
    if not os.path.exists(downloads_dir):
        print(f"Downloads directory {downloads_dir} does not exist.")
        return

    print(f"Deleting all files in {downloads_dir}...")

    try:
        shutil.rmtree(downloads_dir)
        print(f"✅ Deleted {downloads_dir} directory")
    except Exception as e:
        print(f"❌ Error deleting {downloads_dir}: {e}")


def show_download_stats(tracking_file: str = None) -> None:
    """Show statistics about current downloads."""
    if tracking_file is None:
        tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
    if not os.path.exists(tracking_file):
        print("No tracking file found.")
        return

    with open(tracking_file, 'r', encoding='utf-8') as f:
        tracking = json.load(f)

    songs = tracking.get("songs", {})
    total_songs = len(songs)

    if total_songs == 0:
        print("No songs in tracking file.")
        return

    # Count by status
    status_counts = {}
    channel_counts = {}

    for song_id, song_data in songs.items():
        status = song_data.get("status", "UNKNOWN")
        channel = song_data.get("channel_name", "UNKNOWN")

        status_counts[status] = status_counts.get(status, 0) + 1
        channel_counts[channel] = channel_counts.get(channel, 0) + 1

    print(f"\n📊 Current Download Statistics:")
    print(f"Total songs: {total_songs}")
    print(f"\nBy Status:")
    for status, count in status_counts.items():
        print(f"  {status}: {count}")

    print(f"\nBy Channel:")
    for channel, count in channel_counts.items():
        print(f"  {channel}: {count}")


def main():
    """Main function to handle reset and re-download process."""
    print("🔄 Karaoke Download Reset and Re-download Tool")
    print("=" * 50)

    # Show current stats
    print("\nCurrent download statistics:")
    show_download_stats()

    # Ask user what they want to do
    print("\nOptions:")
    print("1. Reset tracking only (keep files)")
    print("2. Reset tracking and delete all downloaded files")
    print("3. Show current stats only")
    print("4. Exit")

    choice = input("\nEnter your choice (1-4): ").strip()

    if choice == "1":
        print("\n🔄 Resetting tracking only...")
        reset_karaoke_tracking()
        print("\n✅ Tracking reset complete!")
        print("You can now re-download files with the new channel parser system.")
        print("\nTo re-download, run:")
        print("python download_karaoke.py --file data/channels.txt --limit 50")

    elif choice == "2":
        print("\n🔄 Resetting tracking and deleting files...")
        confirm = input("Are you sure you want to delete ALL downloaded files? (yes/no): ").strip().lower()

        if confirm == "yes":
            reset_karaoke_tracking()
            delete_downloaded_files()
            print("\n✅ Reset complete! All tracking and files have been removed.")
            print("You can now re-download files with the new channel parser system.")
            print("\nTo re-download, run:")
            print("python download_karaoke.py --file data/channels.txt --limit 50")
        else:
            print("Operation cancelled.")

    elif choice == "3":
        print("\n📊 Current statistics:")
        show_download_stats()

    elif choice == "4":
        print("Exiting...")

    else:
        print("Invalid choice. Please enter 1, 2, 3, or 4.")


if __name__ == "__main__":
    main()
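The reset script's `show_download_stats` builds its per-status and per-channel tallies with repeated `dict.get` calls; the stdlib `collections.Counter` expresses the same aggregation more compactly. A sketch over the same `"songs"` shape from `karaoke_tracking.json` (the sample entries are illustrative, not real tracking data):

```python
from collections import Counter

# Shape mirrors the "songs" map in karaoke_tracking.json; values are made up
songs = {
    "vid1": {"status": "DOWNLOADED", "channel_name": "Let's Sing Karaoke"},
    "vid2": {"status": "FAILED", "channel_name": "Let's Sing Karaoke"},
    "vid3": {"status": "DOWNLOADED", "channel_name": "Sing King"},
}

status_counts = Counter(s.get("status", "UNKNOWN") for s in songs.values())
channel_counts = Counter(s.get("channel_name", "UNKNOWN") for s in songs.values())

# most_common() also gives the sorted-by-frequency ordering for free
for status, count in status_counts.most_common():
    print(f"  {status}: {count}")
```

`Counter` behaves like the manual dict here but additionally supports `most_common()`, arithmetic between counters, and a zero default for missing keys.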