Compare commits
31 Commits
multiplatf... → develop
| SHA1 |
|---|
| 1b6ac6454b |
| e34c43a8f4 |
| 6a796d8571 |
| b0eb76930a |
| 157f3a171b |
| eb3642d652 |
| a82c9741a5 |
| 50b402ddec |
| 9f0787d00a |
| 409e66780c |
| 42e7a6a09c |
| ec95b24a69 |
| 21f8348419 |
| d18ac54476 |
| c48c1d3696 |
| 273a748a1a |
| 5f3b00a39a |
| 24a6a37efd |
| c864af7794 |
| 613b64601a |
| 981f92ce95 |
| 8dbc2fb8fd |
| 81b3d2d88c |
| 95a49bf39e |
| c8f02ac3b4 |
| f914d54067 |
| ea07188739 |
| 2c63bf809b |
| 7090fad1fd |
| c78be7a7ad |
| e6b2c9443c |
.gitignore (vendored): 3 changes
@@ -14,9 +14,6 @@ logs/
*.log

# Tracking and cache files
-karaoke_tracking.json
-karaoke_tracking.json.backup
-songlist_tracking.json
*.cache

# yt-dlp temporary files
PRD.md: 379 changes
@@ -1,8 +1,8 @@
-# 🎤 Karaoke Video Downloader – PRD (v3.3)
+# 🎤 Karaoke Video Downloader – PRD (v3.4.4)

## ✅ Overview
-A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
+A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows and macOS with automatic platform detection. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.

---

@@ -63,9 +63,9 @@ The codebase has been refactored into focused modules with centralized utilities
---

## ⚙️ Platform & Stack
-- **Platform:** Windows
+- **Platform:** Windows, macOS
- **Interface:** Command-line (CLI)
-- **Tech Stack:** Python 3.7+, yt-dlp.exe, mutagen (for ID3 tagging)
+- **Tech Stack:** Python 3.7+, yt-dlp (platform-specific binary), mutagen (for ID3 tagging)

---
@@ -101,6 +101,7 @@ python download_karaoke.py --clear-cache SingKingKaraoke
- ✅ Songlist integration: prioritize and track custom songlists
- ✅ Songlist-only mode: download only songs from the songlist
+- ✅ Songlist focus mode: download only songs from specific playlists by title
- ✅ Force download mode: bypass all existing file checks and re-download songs regardless of server duplicates or existing files
- ✅ Global songlist tracking to avoid duplicates across channels
- ✅ ID3 tagging for artist/title in MP4 files (mutagen)
- ✅ Real-time progress and detailed logging

@@ -122,6 +123,8 @@ python download_karaoke.py --clear-cache SingKingKaraoke
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
+- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
+- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.

---
@@ -149,19 +152,34 @@ KaroakeVideoDownloader/
│   ├── check_resolution.py            # Resolution checker utility
│   ├── resolution_cli.py              # Resolution config CLI
│   └── tracking_cli.py                # Tracking management CLI
-├── data/                              # All config, tracking, cache, and songlist files
-│   ├── config.json
+├── config/                            # Configuration files
+│   └── config.json                    # Main configuration file
+├── data/                              # All tracking, cache, and songlist files
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.txt
+│   ├── channels.json                  # Channel configuration with parsing rules
+│   ├── manual_videos.json             # Manual video collection
│   └── songList.json
+├── utilities/                         # Utility scripts and tools
+│   ├── add_manual_video.py            # Manual video management
+│   ├── build_cache_from_raw.py        # Cache building utility
+│   ├── cleanup_duplicate_files.py     # File cleanup utilities
+│   ├── cleanup_recent_tracking.py     # Tracking cleanup utilities
+│   ├── deduplicate_songlist_tracking.py # Data deduplication
+│   ├── fix_artist_name_format.py      # Data cleanup utilities
+│   ├── fix_artist_name_format_simple.py
+│   ├── fix_code_quality.py            # Development tools
+│   ├── reset_and_redownload.py        # Maintenance utilities
+│   └── songlist_report.py             # Reporting utilities
├── downloads/                         # All video output
│   └── [ChannelName]/                 # Per-channel folders
├── logs/                              # Download logs
-├── downloader/yt-dlp.exe              # yt-dlp binary
-├── tests/                             # Diagnostic and test scripts
-│   └── test_installation.py
+├── downloader/yt-dlp.exe              # yt-dlp binary (Windows)
+├── downloader/yt-dlp_macos            # yt-dlp binary (macOS)
+├── src/tests/                         # Test scripts
+│   ├── test_macos.py                  # macOS setup and functionality tests
+│   └── test_platform.py               # Platform detection tests
├── download_karaoke.py                # Main entry point (thin wrapper)
├── README.md
├── PRD.md
@@ -176,6 +194,8 @@ KaroakeVideoDownloader/
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
+- `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
+- `--songlist-file <FILE_PATH>`: Custom songlist file path to use with --songlist-focus (default: data/songList.json)
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
- `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution

@@ -188,7 +208,11 @@ KaroakeVideoDownloader/
- `--fuzzy-match`: **Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)**
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
- `--parallel`: **Enable parallel downloads for improved speed**
-- `--workers <N>`: **Number of parallel download workers (1-10, default: 3)**
+- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
+- `--manual`: **Download from manual videos collection (data/manual_videos.json)**
+- `--channel-focus <CHANNEL_NAME>`: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
+- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files and songs in songs.json**
+- `--dry-run`: **Build download plan and show what would be downloaded without actually downloading anything**

---
@@ -199,6 +223,8 @@ KaroakeVideoDownloader/
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
+- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
+- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via the `--manual` parameter.

## 🔧 Refactoring Improvements (v3.3)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
@@ -252,7 +278,7 @@ The codebase has been comprehensively refactored to improve maintainability and

### **New Parallel Download System (v3.4)**
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
-- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
+- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for a custom worker count (1-10)
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
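The worker-pool behavior described above can be approximated with Python's standard thread pool. This is an illustrative sketch, not the actual `parallel_downloader.py` implementation; `download_one` is a hypothetical caller-supplied callable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(download_plan, download_one, workers=3):
    """Run every plan entry through `download_one` with `workers` threads.

    `download_one` takes one plan entry and returns True on success.
    A lock guards the shared result lists so progress tracking stays
    consistent across threads.
    """
    lock = threading.Lock()
    completed, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, item): item for item in download_plan}
        for future in as_completed(futures):
            item = futures[future]
            try:
                ok = future.result()
            except Exception:
                ok = False  # a raised error counts as a failed download
            with lock:  # thread-safe result/progress update
                (completed if ok else failed).append(item)
    return completed, failed
```

A retry pass with reduced concurrency, as the PRD describes, could then simply call `download_all(failed, download_one, workers=1)`.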
@@ -268,8 +294,337 @@ The codebase has been comprehensively refactored to improve maintainability and
- [ ] Download scheduling and retry logic
- [ ] More granular status reporting
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
- [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization
## 🔧 Recent Bug Fixes & Improvements (v3.4.1)

### **Enhanced Fuzzy Matching (v3.4.1)**
- **Improved `extract_artist_title` function**: Enhanced to handle multiple video title formats beyond simple "Artist - Title" patterns
- **"Title Karaoke | Artist Karaoke Version" format**: Correctly parses titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
- **"Title Artist KARAOKE" format**: Handles titles ending with "KARAOKE" and attempts to extract artist information
- **Fallback handling**: Returns empty artist and full title for unparseable formats
- **Consolidated function usage**: Removed duplicate `extract_artist_title` implementations across modules
- **Single source of truth**: All modules now import from `fuzzy_matcher.py`
- **Consistent parsing**: Eliminated inconsistencies between different parsing implementations
- **Better maintainability**: Changes to parsing logic only need to be made in one place
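A simplified sketch of the parsing logic described above; the real `extract_artist_title` in `fuzzy_matcher.py` handles more formats (this version covers only the pipe format, the classic dash format, and the fallback).

```python
import re

def extract_artist_title(video_title):
    """Illustrative multi-format title parser (not the real implementation)."""
    # "Hold On Loosely Karaoke | 38 Special Karaoke Version"
    m = re.match(r"^(.*?)\s*Karaoke\s*\|\s*(.*?)\s*Karaoke Version$",
                 video_title, re.IGNORECASE)
    if m:
        return m.group(2).strip(), m.group(1).strip()  # (artist, title)
    # Classic "Artist - Title"
    if " - " in video_title:
        artist, title = video_title.split(" - ", 1)
        return artist.strip(), title.strip()
    # Fallback: empty artist, full title
    return "", video_title.strip()
```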
### **Fixed Import Conflicts**
- **Resolved import conflict in `download_planner.py`**: Updated to use the enhanced `extract_artist_title` from `fuzzy_matcher.py` instead of the simpler version from `id3_utils.py`
- **Updated `id3_utils.py`**: Now imports `extract_artist_title` from `fuzzy_matcher.py` for consistency

### **Enhanced --limit Parameter**
- **Fixed limit application**: The `--limit` parameter now correctly applies to the scanning phase, not just the download execution
- **Improved performance**: When using `--limit N`, only the first N songs are scanned against channels, significantly reducing processing time for large songlists
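The early-exit scanning described above amounts to stopping the scan loop once the limit is reached. A minimal sketch, with hypothetical names (`matches_fn` stands in for the tool's real matching logic):

```python
def scan_with_limit(songs, channel_videos, matches_fn, limit):
    """Stop scanning as soon as `limit` matches are found (early exit)."""
    plan = []
    for song in songs:
        if len(plan) >= limit:
            break  # early exit: remaining songs are never scanned
        for video in channel_videos:
            if matches_fn(song, video):
                plan.append((song, video))
                break  # one match per song is enough
    return plan
```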
### **Benefits of Recent Improvements**
- **Better matching accuracy**: Enhanced fuzzy matching can now handle a wider variety of video title formats commonly found on YouTube karaoke channels
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
- **Consistent behavior**: All parts of the system use the same parsing logic, eliminating edge cases where different modules would parse the same title differently
- **Improved performance**: The `--limit` parameter now works as expected, providing faster processing for targeted downloads
- **Cleaner codebase**: Eliminated duplicate code and import conflicts, making the system more maintainable
## 🔧 Recent Bug Fixes & Improvements (v3.4.2)

### **Duplicate File Prevention & Filename Consistency**
- **Enhanced file existence checking**: `check_file_exists_with_patterns()` now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
- **Automatic duplicate prevention**: Download pipeline skips downloads when files already exist (including duplicates)
- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files with suffixes
- **Cleanup utility**: `utilities/cleanup_duplicate_files.py` provides interactive cleanup of existing duplicate files
- **Filename vs ID3 tag consistency**: Removed "(Karaoke Version)" suffix from ID3 tags to match filenames exactly
- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction logic
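The duplicate-aware existence check can be sketched with a glob over `(N)`-suffixed variants. This is illustrative only and is not the real `check_file_exists_with_patterns()` implementation:

```python
from pathlib import Path

def file_exists_with_suffix_patterns(directory, filename):
    """Return True if `filename` or any "name (N).ext" variant exists.

    Sketch of the duplicate detection described above; assumes the stem
    contains no glob metacharacters.
    """
    directory = Path(directory)
    if (directory / filename).exists():
        return True
    stem, ext = Path(filename).stem, Path(filename).suffix
    # yt-dlp style duplicates: "Song (2).mp4", "Song (3).mp4", ...
    return any(directory.glob(f"{stem} ([0-9]*){ext}"))
```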
### **Benefits of Duplicate Prevention**
- **No more duplicate files**: Eliminates `(2)`, `(3)` suffix files that waste disk space
- **Consistent metadata**: Filename and ID3 tag use identical artist/title format
- **Efficient disk usage**: Prevents unnecessary downloads of existing files
- **Clear file identification**: Consistent naming across all file operations

## 🛠️ Maintenance

### **Regular Cleanup**
- Run the cleanup utility periodically to remove any duplicate files
- Monitor downloads for any new duplicate creation (should be rare with the fixes in place)

### **Configuration**
- Keep `"nooverwrites": false` in `data/config.json`
- This prevents yt-dlp from creating duplicate files

### **Monitoring**
- Check logs for "⏭️ Skipping download - file already exists" messages
- These indicate the duplicate prevention is working correctly
## 🔧 Recent Bug Fixes & Improvements (v3.4.3)

### **Manual Video Collection System**
- **New `--manual` parameter**: Simple access to the manual video collection via `python download_karaoke.py --manual --limit 5`
- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
- **Helper script**: `add_manual_video.py` provides easy management of manual video entries
- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
- **No yt-dlp dependency**: Manual videos bypass YouTube API calls for video listing, using static data instead
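Since the collection is static data, loading it is a plain JSON read with no yt-dlp call. The entry schema below is a hypothetical illustration; the real `data/manual_videos.json` format is not shown in this document and may differ:

```python
import json

# Hypothetical manual_videos.json content -- the actual schema may differ.
SAMPLE = """
{
  "videos": [
    {
      "video_id": "dQw4w9WgXcQ",
      "artist": "Example Artist",
      "title": "Example Song",
      "channel_name": "ManualCollection"
    }
  ]
}
"""

def load_manual_videos(raw_json):
    """Parse a manual video collection from static JSON data."""
    data = json.loads(raw_json)
    return data.get("videos", [])
```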
### **Channel-Specific Parsing Rules**
- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
- **Backward compatibility**: Still supports legacy `data/channels.txt` format
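A per-channel rule can be applied as suffix cleanup followed by format-specific splitting. The rule shape below is hypothetical (the real `channels.json` schema is not shown here); the sketch illustrates the idea, not the tool's implementation:

```python
# Hypothetical rule shape for one channel -- illustrative only.
RULES = {
    "ExampleChannel": {
        "format": "title_pipe_artist",  # "Title | Artist"
        "strip_suffixes": [" (Karaoke Version)", " Karaoke"],
    }
}

def parse_title(channel, video_title, rules=RULES):
    """Apply a channel's cleanup and splitting rule; return (artist, title)."""
    rule = rules.get(channel, {})
    cleaned = video_title
    for suffix in rule.get("strip_suffixes", []):  # suffix cleanup first
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
    if rule.get("format") == "title_pipe_artist" and "|" in cleaned:
        title, artist = [part.strip() for part in cleaned.split("|", 1)]
        return artist, title
    # Default: "Artist - Title"
    if " - " in cleaned:
        artist, title = [part.strip() for part in cleaned.split(" - ", 1)]
        return artist, title
    return "", cleaned.strip()
```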
### **Benefits of New Features**
- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features

## 📚 Documentation Standards

### **Documentation Location**
- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
- **Use the existing sections in PRD.md and README.md to track all project evolution**

### **Where to Document Changes**
- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
- **README.md**: User-facing features, usage instructions, and high-level improvements
- **CHANGELOG.md**: Version-specific release notes and change summaries

### **Documentation Requirements**
- **All new features must be documented in both PRD.md and README.md**
- **All refactoring efforts must be documented in the appropriate sections**
- **All bug fixes must be documented with technical details**
- **Version numbers and dates should be clearly marked**
- **Benefits and improvements should be explicitly stated**

### **Maintenance Responsibility**
- **Keep PRD.md and README.md synchronized with code changes**
- **Update documentation immediately when implementing new features**
- **Remove outdated information and consolidate related changes**
- **Ensure all CLI options and features are documented in both files**
## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **All Videos Download Mode**
- **New `--all-videos` parameter**: Download all videos from a channel, not just songlist matches
- **Smart MP3/MP4 detection**: Automatically detects if you have MP3 versions in songs.json and downloads MP4 video versions
- **Existing file skipping**: Skips videos that already exist on the filesystem
- **Progress tracking**: Shows clear progress with a "Downloading X/Y videos" format
- **Parallel processing support**: Works with `--parallel --workers N` for faster downloads
- **Channel focus integration**: Works with `--channel-focus` to target specific channels
- **Limit support**: Works with `--limit N` to control download batch size

### **Smart Songlist Integration**
- **MP4 version detection**: Checks if an MP4 version already exists in songs.json before downloading
- **MP3 upgrade path**: Downloads MP4 video versions when only MP3 versions exist in the songlist
- **Duplicate prevention**: Skips downloads when MP4 versions already exist
- **Efficient filtering**: Only processes videos that need to be downloaded
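The MP3-to-MP4 filtering above can be sketched as a simple set lookup. `songs_json_entries` and the filename convention are assumptions for illustration, not the tool's actual data model:

```python
def videos_needing_download(candidates, songs_json_entries):
    """Return the candidates that still need an MP4 download.

    `songs_json_entries` is a set of filenames already on the server.
    A candidate is skipped only when its MP4 version already exists;
    an MP3-only entry still qualifies for an MP4 upgrade download.
    """
    existing = set(songs_json_entries)
    plan = []
    for base_name in candidates:           # e.g. "Artist - Title"
        if f"{base_name}.mp4" in existing:
            continue                       # MP4 already present: skip
        plan.append(base_name)             # missing or MP3-only: download MP4
    return plan
```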
### **Benefits of All Videos Mode**
- **Complete channel downloads**: Download entire channels without songlist restrictions
- **Automatic format upgrading**: Upgrade MP3 collections to MP4 video versions
- **Efficient processing**: Only downloads videos that don't already exist
- **Flexible control**: Use with limits, parallel processing, and channel targeting
- **Clear progress feedback**: Real-time progress tracking for large downloads
## 🔧 Recent Bug Fixes & Improvements (v3.4.5)

### **Unified Download Workflow Architecture**
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
- **Centralized download logic**: A single `execute_unified_download_workflow()` method handles all download execution
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes

### **Architecture Pattern for New Download Modes**
When adding new download modes in the future, follow this pattern to ensure consistency:
#### **1. Download Plan Building (Mode-Specific)**
Each download mode should build a download plan (a list of videos to download) with this structure:

```python
download_plan = [
    {
        "video_id": "video_id",
        "artist": "artist_name",
        "title": "song_title",
        "filename": "sanitized_filename.mp4",
        "channel_name": "channel_name",
        "video_title": "original_video_title",
        "force_download": False
    }
]
```
#### **2. Unified Execution (Shared)**
All modes should use the unified execution workflow:

```python
downloaded_count, success = self.execute_unified_download_workflow(
    download_plan=download_plan,
    cache_file=cache_file,  # Optional, for progress tracking
    limit=limit,            # Optional, for limiting downloads
    show_progress=True,     # Optional, for progress display
)
```

#### **3. Execution Method Selection (Automatic)**
The unified workflow automatically chooses the execution method based on settings:
- **Sequential**: Uses `DownloadPipeline` for single-threaded downloads
- **Parallel**: Uses `ParallelDownloader` when `--parallel` is enabled
#### **4. Required Implementation Pattern**

```python
def download_new_mode(self, ...):
    """New download mode implementation."""

    # 1. Build download plan (mode-specific logic)
    download_plan = []
    for video in videos_to_download:
        download_plan.append({
            "video_id": video["id"],
            "artist": artist,
            "title": title,
            "filename": filename,
            "channel_name": channel_name,
            "video_title": video["title"],
            "force_download": force_download
        })

    # 2. Create cache file (optional, for progress tracking)
    cache_file = get_download_plan_cache_file("new_mode", **plan_kwargs)
    save_plan_cache(cache_file, download_plan, [])

    # 3. Use unified execution workflow
    downloaded_count, success = self.execute_unified_download_workflow(
        download_plan=download_plan,
        cache_file=cache_file,
        limit=limit,
        show_progress=True,
    )

    return success
```
### **Benefits of Unified Architecture**
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- **Maintainability**: Changes to download execution only need to be made in one place
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
- **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
- **Testing**: Easier to test since all modes use the same execution logic

### **What Was Fixed**
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
- **Missing Method**: Added the missing `download_latest_per_channel()` method that was referenced in the CLI but not implemented
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management

### **Future Development Guidelines**
1. **NEVER implement custom download execution logic** in new download modes
2. **ALWAYS use `execute_unified_download_workflow()`** for download execution
3. **Focus on download plan building** - that's where mode-specific logic belongs
4. **Use the standard download plan structure** for consistency
5. **Implement cache file handling** for progress tracking and resume functionality
6. **Test with both sequential and parallel modes** to ensure compatibility
---

## 🚀 Future Enhancements
- [ ] Web UI for easier management
- [ ] More advanced song matching (multi-language)
- [ ] Download scheduling and retry logic
- [ ] More granular status reporting
- [x] **Parallel downloads for improved speed** ✅ **COMPLETED**
- [x] **Enhanced fuzzy matching with improved video title parsing** ✅ **COMPLETED**
- [x] **Consolidated extract_artist_title function** ✅ **COMPLETED**
- [x] **Duplicate file prevention and filename consistency** ✅ **COMPLETED**
- [ ] Unit tests for all modules
- [ ] Integration tests for end-to-end workflows
- [ ] Plugin system for custom file operations
- [ ] Advanced configuration UI
- [ ] Real-time download progress visualization
## 🔧 Recent Bug Fixes & Improvements (v3.4.4)

### **macOS Support with Automatic Platform Detection**
- **Cross-platform compatibility**: Added support for macOS alongside Windows
- **Automatic platform detection**: Detects the operating system and selects the appropriate yt-dlp binary
- **Flexible yt-dlp integration**: Supports both binary files (`yt-dlp_macos`) and pip installation (`python3 -m yt_dlp`)
- **Setup automation**: `setup_macos.py` script for easy macOS setup with FFmpeg and yt-dlp installation
- **Command parsing**: Intelligent parsing of yt-dlp commands (file paths vs. module commands)
- **Enhanced validation**: Platform-specific error messages and validation in the CLI
- **Backward compatibility**: Maintains full compatibility with existing Windows installations
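The binary-vs-pip selection described above can be sketched with the standard `platform` module. This is an illustrative sketch of the idea, with the function name chosen for the example, not the project's actual API:

```python
import platform
from pathlib import Path

def resolve_ytdlp_command(base_dir="downloader"):
    """Pick a yt-dlp invocation for the current platform.

    Prefers a bundled binary (yt-dlp.exe on Windows, yt-dlp_macos
    otherwise) and falls back to a pip-installed module invocation.
    """
    system = platform.system()
    name = "yt-dlp.exe" if system == "Windows" else "yt-dlp_macos"
    binary = Path(base_dir) / name
    if binary.exists():
        return [str(binary)]  # file path command
    # Fallback: pip installation, run as a module command
    python = "python" if system == "Windows" else "python3"
    return [python, "-m", "yt_dlp"]
```

The returned list can be passed straight to `subprocess.run(cmd + [url, ...])`, which is why both the file-path and module forms are represented as argument lists.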
### **Benefits of macOS Support**
- **Native macOS experience**: No need for Windows compatibility layers or virtualization
- **Automatic setup**: Simple setup script handles all dependencies
- **Flexible installation**: Choose between binary download or pip installation
- **Consistent functionality**: All features work identically on both platforms
- **Easy maintenance**: Platform detection handles configuration automatically

### **Setup Instructions**

```bash
# Automatic setup (recommended)
python3 setup_macos.py

# Test installation
python3 src/tests/test_macos.py

# Manual setup options
# 1. Install yt-dlp via pip: pip3 install yt-dlp
# 2. Download binary: curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
# 3. Install FFmpeg: brew install ffmpeg
```
## 🔧 Recent Bug Fixes & Improvements (v3.4.7)

### **Configurable Data Directory Path**
- **Centralized Data Path Management**: New `data_path_manager.py` module provides unified data directory path management
- **Configurable Location**: Data directory path can be set in `config/config.json` under `folder_structure.data_dir`
- **Backward Compatibility**: Defaults to the "data" directory if not configured
- **Cross-Project Integration**: Enables the karaoke downloader to be used as a component in other projects with different data directory structures
- **Updated All Modules**: All modules now use the data path manager instead of hardcoded "data/" paths
- **Utility Functions**: Provides `get_data_path()`, `get_data_dir()`, and `get_data_path_manager()` functions for easy access
- **Fixed Circular Dependency**: Moved `config.json` from `data/` to the `config/` directory to resolve the chicken-and-egg problem
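The lookup described above amounts to reading `folder_structure.data_dir` from the fixed-location config and falling back to `"data"`. A minimal sketch of a `get_data_path`-style helper; the real `data_path_manager.py` module may differ in detail:

```python
import json
from pathlib import Path

def get_data_path(filename, config_file="config/config.json"):
    """Resolve a data file path using the configurable data directory.

    Reads folder_structure.data_dir from the config file and defaults
    to "data" when the key is missing or the file is unreadable.
    """
    data_dir = "data"  # backward-compatible default
    try:
        with open(config_file, encoding="utf-8") as f:
            config = json.load(f)
        data_dir = config.get("folder_structure", {}).get("data_dir", "data")
    except (OSError, json.JSONDecodeError):
        pass  # fall back to the default
    return Path(data_dir) / filename
```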
### **Benefits of Configurable Data Directory**
- **Flexible Deployment**: Can be integrated into other projects with different directory structures
- **Centralized Configuration**: Single point of configuration for all data file paths
- **Maintainable Code**: Eliminates hardcoded paths throughout the codebase
- **Easy Testing**: Can use temporary directories for testing without affecting production data
- **Future-Proof**: Makes it easier to change the data directory structure in the future

### **Circular Dependency Solution**
The original implementation had a circular dependency problem:
- **Problem**: `config.json` was located in the `data/` directory
- **Issue**: To read the config file, we needed to know where the data directory is
- **Conflict**: But the data directory location is specified in the config file
- **Solution**: Moved `config.json` to the `config/` directory as a fixed location
- **Result**: The config file is always accessible in a dedicated config directory, and the data directory can be configured within it
- **Backward Compatibility**: The system still works with config files in custom data directories when explicitly specified
## 🔧 Recent Bug Fixes & Improvements (v3.4.6)

### **Dry Run Mode**
- **New `--dry-run` parameter**: Build the download plan and show what would be downloaded without actually downloading anything
- **Plan preview**: Shows the total number of videos in the plan and a preview of the first 5 videos
- **Safe testing**: Test download configurations without consuming bandwidth or disk space
- **All mode support**: Works with all download modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel)
- **Progress simulation**: Shows what the download process would look like without executing it

### **Benefits of Dry Run Mode**
- **Safe testing**: Test complex download configurations without downloading anything
- **Plan validation**: Verify that the download plan contains the expected videos
- **Configuration debugging**: Troubleshoot download settings before committing to downloads
- **Resource conservation**: Save bandwidth and disk space during testing
- **User education**: Help users understand what the tool will do before running it

### **Example Usage**

```bash
# Test songlist download plan
python download_karaoke.py --songlist-only --limit 5 --dry-run

# Test channel download plan
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 10 --dry-run

# Test with fuzzy matching
python download_karaoke.py --songlist-only --fuzzy-match --limit 3 --dry-run
```
README.md: 352 changes
@ -1,6 +1,6 @@
# 🎤 Karaoke Video Downloader

A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration.
A Python-based cross-platform CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp`, with advanced tracking, songlist prioritization, and flexible configuration. Supports Windows and macOS with automatic platform detection.

## ✨ Features
- 🎵 **Channel & Playlist Downloads**: Download all videos from a YouTube channel or playlist
@ -13,7 +13,7 @@ A Python-based Windows CLI tool to download karaoke videos from YouTube channels
- 📈 **Real-Time Progress**: Detailed console and log output
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ **Latest-per-channel download**: Download the latest N videos from each channel in a single batch, with server deduplication, fuzzy matching support, per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
- 🧩 **Fuzzy Matching**: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
- 🧩 **Enhanced Fuzzy Matching**: Advanced fuzzy string matching for songlist-to-video matching with improved video title parsing (handles multiple title formats like "Title Karaoke | Artist Karaoke Version")
- ⚡ **Fast Mode with Early Exit**: When a limit is set, scans channels and songs in order, downloads immediately when a match is found, and stops as soon as the limit is reached with successful downloads
- 🔄 **Deduplication Across Channels**: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
- 📋 **Default Channel File**: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
@ -21,10 +21,20 @@ A Python-based Windows CLI tool to download karaoke videos from YouTube channels
- ⚡ **Optimized Scanning**: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
- 🏷️ **Server Duplicates Tracking**: Automatically checks against local songs.json file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
- ⚡ **Parallel Downloads**: Enable concurrent downloads with `--parallel --workers N` for significantly faster batch downloads (3-5x speedup)
- 📊 **Unmatched Songs Reports**: Generate detailed reports of songs that couldn't be found in any channel with `--generate-unmatched-report`
- 🛡️ **Duplicate File Prevention**: Automatically detects and prevents duplicate files with `(2)`, `(3)` suffixes, with cleanup utility for existing duplicates
- 🏷️ **Consistent Metadata**: Filename and ID3 tag use identical artist/title format for clear file identification
- 🍎 **macOS Support**: Automatic platform detection and setup with native macOS binaries and FFmpeg integration

## 🏗️ Architecture
The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:

### **Configurable Data Directory (v3.4.7)**
- **Centralized Data Path Management**: `data_path_manager.py` provides unified data directory path management
- **Configurable Location**: Data directory path can be set in `config/config.json` under `folder_structure.data_dir`
- **Backward Compatibility**: Defaults to "data" directory if not configured
- **Cross-Project Integration**: Enables the karaoke downloader to be used as a component in other projects with different data directory structures
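
The configurable data directory could be resolved along these lines (an illustrative sketch, not the actual `data_path_manager.py` implementation; only the `folder_structure.data_dir` key and the `"data"` default come from this document):

```python
import json
from pathlib import Path

def get_data_dir(config_path="config/config.json"):
    """Resolve the data directory from config, defaulting to 'data'."""
    try:
        cfg = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return Path("data")  # backward-compatible default
    return Path(cfg.get("folder_structure", {}).get("data_dir", "data"))
```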

### Core Modules:
- **`downloader.py`**: Main orchestrator and CLI interface
- **`video_downloader.py`**: Core video download execution and orchestration
@ -46,47 +56,192 @@ The codebase has been comprehensively refactored into a modular architecture wit
- **`tracking_cli.py`**: Tracking management CLI

### New Utility Modules (v3.3):
- **`parallel_downloader.py`**: Parallel download management with thread-safe operations
  - `ParallelDownloader` class: Manages concurrent downloads with configurable workers
  - `DownloadTask` and `DownloadResult` dataclasses: Structured task and result management
  - Thread-safe progress tracking and error handling
  - Automatic retry mechanism for failed downloads
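
The core idea behind the parallel downloader can be sketched with a bounded thread pool (an illustrative sketch under stated assumptions; the real `ParallelDownloader` class adds retries and thread-safe tracking, and these dataclass fields are simplified):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

@dataclass
class DownloadTask:
    video_id: str
    filename: str

@dataclass
class DownloadResult:
    task: DownloadTask
    success: bool

def run_parallel(tasks, download_fn, workers=3):
    """Run download_fn over all tasks using a bounded worker pool."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every task, then collect results as workers finish
        futures = {pool.submit(download_fn, task): task for task in tasks}
        for fut in as_completed(futures):
            results.append(DownloadResult(futures[fut], fut.result()))
    return results
```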
- **`file_utils.py`**: Centralized file operations, filename sanitization, and file validation
  - `sanitize_filename()`: Create safe filenames from artist/title
  - `generate_possible_filenames()`: Generate filename patterns for different modes
  - `check_file_exists_with_patterns()`: Check for existing files using multiple patterns
  - `is_valid_mp4_file()`: Validate MP4 files with header checking
  - `cleanup_temp_files()`: Remove temporary yt-dlp files
  - `ensure_directory_exists()`: Safe directory creation
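
For illustration, a minimal `sanitize_filename()` might strip characters that are invalid on Windows (a sketch; the actual rules in `file_utils.py` may differ):

```python
import re

def sanitize_filename(artist, title, max_length=200):
    """Build a filesystem-safe 'Artist - Title' name."""
    name = f"{artist} - {title}"
    name = re.sub(r'[<>:"/\\|?*]', "", name)  # characters invalid on Windows
    name = re.sub(r"\s+", " ", name).strip()  # collapse repeated whitespace
    return name[:max_length]
```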
- **`song_validator.py`**: Centralized song validation logic for checking if songs should be downloaded

- **`song_validator.py`**: Centralized song validation logic
  - `SongValidator` class: Unified logic for checking if songs should be downloaded
  - `should_skip_song()`: Comprehensive validation with multiple criteria
  - `mark_song_failed()`: Consistent failure tracking
  - `handle_download_failure()`: Standardized error handling

### New Utility Modules (v3.4.7):
- **`data_path_manager.py`**: Centralized data directory path management and file path resolution

- **Enhanced `config_manager.py`**: Robust configuration management with dataclasses
  - `ConfigManager` class: Type-safe configuration loading and caching
  - `DownloadSettings`, `FolderStructure`, `LoggingConfig` dataclasses
  - Configuration validation and merging with defaults
  - Dynamic resolution updates

### **Unified Download Workflow (v3.4.5)**
- **`execute_unified_download_workflow()`**: Centralized download execution that all modes use
- **`_execute_sequential_downloads()`**: Sequential download execution using DownloadPipeline
- **`_execute_parallel_downloads()`**: Parallel download execution using ParallelDownloader

### Benefits:
### **Benefits of Enhanced Modular Architecture:**
- **Single Responsibility**: Each module has a focused purpose
- **Centralized Utilities**: Common operations (file operations, song validation, yt-dlp commands, error handling) are centralized
- **Reduced Duplication**: Eliminated ~150 lines of code duplication across modules
- **Testability**: Individual components can be tested separately
- **Maintainability**: Easier to find and fix issues
- **Reusability**: Components can be used independently
- **Robustness**: Better error handling and interruption recovery
- **Consistency**: Standardized error messages and processing pipelines
- **Maintainability**: Changes isolated to specific modules
- **Testability**: Modular components can be tested independently
- **Type Safety**: Comprehensive type hints across all new modules
- **Unified Execution**: All download modes use the same execution pipeline for consistency

## 🔧 Development Guidelines

### **Adding New Download Modes**
When adding new download modes, follow the unified workflow pattern to ensure consistency:

#### **1. Build Download Plan (Mode-Specific)**
```python
def download_new_mode(self, ...):
    # Build download plan with standard structure
    download_plan = []
    for video in videos_to_download:
        download_plan.append({
            "video_id": video["id"],
            "artist": artist,
            "title": title,
            "filename": filename,
            "channel_name": channel_name,
            "video_title": video["title"],
            "force_download": force_download
        })

    # Use unified execution workflow
    downloaded_count, success = self.execute_unified_download_workflow(
        download_plan=download_plan,
        cache_file=cache_file,
        limit=limit,
        show_progress=True,
    )

    return success
```
#### **2. Key Principles**
- **NEVER implement custom download execution logic** - always use `execute_unified_download_workflow()`
- **Focus on download plan building** - that's where mode-specific logic belongs
- **Use the standard download plan structure** for consistency
- **Implement cache file handling** for progress tracking and resume functionality
- **Test with both sequential and parallel modes** to ensure compatibility

#### **3. Benefits of Unified Architecture**
- **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- **Automatic Features**: New modes automatically get parallel downloads, progress tracking, and cache management
- **Maintainability**: Changes to download execution only need to be made in one place
- **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes

## 🔧 Recent Improvements (v3.4.1)
### **Enhanced Fuzzy Matching**
- **Improved title parsing**: Enhanced `extract_artist_title` function to handle multiple video title formats
- **Better matching accuracy**: Can now parse titles like "Hold On Loosely Karaoke | 38 Special Karaoke Version"
- **Consistent parsing**: All modules now use the same parsing logic from `fuzzy_matcher.py`
- **Reduced false negatives**: Songs that previously couldn't be matched due to title format differences now have a higher chance of being found
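
The improved parsing can be illustrated with a simplified version of `extract_artist_title` (a sketch covering just the two formats named in this document; the real function in `fuzzy_matcher.py` handles more cases):

```python
import re

def extract_artist_title(video_title):
    """Parse (artist, title) from common karaoke video title formats."""
    # Format: "Title Karaoke | Artist Karaoke Version"
    m = re.match(r"^(.+?)\s+Karaoke\s*\|\s*(.+?)\s+Karaoke Version$", video_title)
    if m:
        return m.group(2), m.group(1)
    # Format: "Artist - Title (Karaoke Version)"
    m = re.match(r"^(.+?)\s*-\s*(.+?)\s*\(Karaoke.*\)$", video_title)
    if m:
        return m.group(1), m.group(2)
    return None, video_title  # unparsed: keep the full title, no artist
```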

### **Fixed Import Conflicts**
- **Resolved import conflicts**: Updated modules to use the enhanced `extract_artist_title` from `fuzzy_matcher.py`
- **Consistent behavior**: All parts of the system use the same parsing logic
- **Cleaner codebase**: Eliminated duplicate code and import conflicts

### **Fixed --limit Parameter**
- **Correct limit application**: The `--limit` parameter now properly limits the scanning phase, not just downloads
- **Improved performance**: When using `--limit N`, only the first N songs are scanned, significantly reducing processing time
- **Accurate logging**: Logging messages now show the correct counts for songs that will actually be processed when using `--limit`
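
The corrected `--limit` behavior amounts to bounding the scan loop itself, roughly as follows (an illustrative sketch, not the project's actual code):

```python
def scan_with_limit(songs, channels, match_fn, limit):
    """Scan songs in order, stopping as soon as `limit` matches are found."""
    matches = []
    for song in songs:
        for channel in channels:
            video = match_fn(song, channel)
            if video is not None:
                matches.append(video)
                break  # dedup: take this song from the first channel that has it
        if len(matches) >= limit:
            break  # early exit: limit reached, stop scanning entirely
    return matches
```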

### **Code Quality Improvements**
- **Eliminated duplicate functions**: Removed duplicate `extract_artist_title` implementations
- **Fixed import conflicts**: Resolved inconsistencies between different parsing implementations
- **Single source of truth**: All title parsing logic is now centralized in `fuzzy_matcher.py`

## 🔧 Recent Improvements (v3.4.5)
### **Unified Download Workflow Architecture**
- **Unified execution pipeline**: All download modes now use the same execution workflow, eliminating inconsistencies and broken pipelines
- **Consistent behavior**: All modes (--channel-focus, --all-videos, --songlist-only, --latest-per-channel) use identical download execution, progress tracking, and error handling
- **Centralized download logic**: Single `execute_unified_download_workflow()` method handles all download execution
- **Automatic parallel support**: All download modes automatically support `--parallel --workers N` without additional implementation
- **Unified cache management**: Consistent progress tracking and resume functionality across all modes

### **What Was Fixed**
- **Broken Pipeline**: Previously, different modes used different execution paths, leading to inconsistencies
- **Missing Method**: Added missing `download_latest_per_channel()` method that was referenced in CLI but not implemented
- **Code Duplication**: Eliminated duplicate download execution logic across different modes
- **Inconsistent Behavior**: All modes now have identical progress tracking, error handling, and cache management

### **Benefits**
- ✅ **Consistency**: All modes behave identically for execution, progress tracking, and error handling
- ✅ **Maintainability**: Changes to download execution only need to be made in one place
- ✅ **Reliability**: Eliminates broken pipelines and inconsistent behavior between modes
- ✅ **Extensibility**: New modes automatically get all existing features (parallel downloads, progress tracking, etc.)
- ✅ **Testing**: Easier to test since all modes use the same execution logic
## 🛡️ Duplicate File Prevention & Filename Consistency (v3.4.2)
### **Duplicate File Prevention**
- **Enhanced file existence checking**: Now detects files with `(2)`, `(3)`, etc. suffixes that yt-dlp creates
- **Automatic duplicate prevention**: Skips downloads when files already exist (including duplicates)
- **Updated yt-dlp configuration**: Set `"nooverwrites": false` to prevent yt-dlp from creating duplicate files
- **Cleanup utility**: `data/cleanup_duplicate_files.py` helps identify and remove existing duplicate files
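
Detecting `(2)`-style duplicates can be sketched like this (illustrative; the shipped cleanup utility may differ):

```python
import re
from pathlib import Path

# Matches stems like "Artist - Title (2)"
_DUP_SUFFIX = re.compile(r"^(?P<stem>.+) \(\d+\)$")

def find_duplicate_files(directory):
    """Return .mp4 files named 'Base (N).mp4' whose 'Base.mp4' also exists."""
    duplicates = []
    for path in Path(directory).glob("*.mp4"):
        m = _DUP_SUFFIX.match(path.stem)
        if m and (path.parent / (m.group("stem") + ".mp4")).exists():
            duplicates.append(path)
    return duplicates
```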

### **Filename vs ID3 Tag Consistency**
- **Consistent metadata**: Filename and ID3 tag now use identical artist/title format
- **Removed extra suffixes**: No more "(Karaoke Version)" in ID3 tags that don't match filenames
- **Unified parsing**: Both filename generation and ID3 tagging use the same artist/title extraction

### **Benefits**
- ✅ **No more duplicate files** with `(2)`, `(3)` suffixes
- ✅ **Consistent metadata** between filename and ID3 tags
- ✅ **Efficient disk usage** by preventing unnecessary downloads
- ✅ **Clear file identification** with consistent naming

### **Clean Up Existing Duplicates**
```bash
# Run the cleanup utility to find and remove existing duplicates
python data/cleanup_duplicate_files.py

# Choose option 1 for dry run (recommended first)
# Choose option 2 to actually delete duplicates
```

## 📋 Requirements
- **Windows 10/11**
- **Windows 10/11 or macOS 10.14+**
- **Python 3.7+**
- **yt-dlp.exe** (in `downloader/`)
- **yt-dlp binary** (platform-specific, see setup instructions below)
- **mutagen** (for ID3 tagging, optional)
- **ffmpeg/ffprobe** (for video validation, optional but recommended)
- **rapidfuzz** (for fuzzy matching, optional, falls back to difflib)
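
The rapidfuzz-with-difflib fallback is typically wired up like this (a sketch; the project's actual module layout is an assumption):

```python
try:
    from rapidfuzz import fuzz

    def similarity(a, b):
        """0-100 similarity score via rapidfuzz."""
        return fuzz.ratio(a, b)
except ImportError:
    from difflib import SequenceMatcher

    def similarity(a, b):
        """0-100 similarity score via the stdlib difflib fallback."""
        return SequenceMatcher(None, a, b).ratio() * 100

def is_match(a, b, threshold=85):
    """Case-insensitive match against a 0-100 threshold."""
    return similarity(a.lower(), b.lower()) >= threshold
```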

## 🍎 macOS Setup

### Automatic Setup (Recommended)
Run the macOS setup script to automatically set up yt-dlp and FFmpeg:

```bash
python3 setup_macos.py
```

This script will:
- Detect your macOS version
- Offer installation options for yt-dlp (pip or binary download)
- Install FFmpeg via Homebrew
- Test the installation

### Manual Setup
If you prefer to set up manually:

#### Option 1: Install yt-dlp via pip
```bash
pip3 install yt-dlp
```

#### Option 2: Download yt-dlp binary
```bash
mkdir -p downloader
curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
chmod +x downloader/yt-dlp_macos
```

#### Install FFmpeg
```bash
brew install ffmpeg
```

### Test Installation
```bash
python3 src/tests/test_macos.py
```

## 🚀 Quick Start

> **💡 Pro Tip**: For a complete list of all available commands, see `commands.txt` - you can copy/paste any command directly into your terminal!
@ -96,6 +251,21 @@ The codebase has been comprehensively refactored into a modular architecture wit
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
```

### Download ALL Videos from a Channel (Not Just Songlist Matches)
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
```

### Download ALL Videos with Parallel Processing
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
```

### Download ALL Videos with Limit
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100
```

### Download Only Songlist Songs (Fast Mode)
```bash
python download_karaoke.py --songlist-only --limit 5
@ -103,7 +273,7 @@ python download_karaoke.py --songlist-only --limit 5

### Download with Parallel Processing
```bash
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
python download_karaoke.py --parallel --songlist-only --limit 10
```

### Focus on Specific Playlists by Title
@ -111,11 +281,31 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"
```

### Focus on Specific Playlists from Custom File
```bash
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json"
```

### Force Download from Channels (Bypass All Existing File Checks)
```bash
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force
```

### Download with Fuzzy Matching
```bash
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85
```

### Test Download Plan (Dry Run)
```bash
python download_karaoke.py --songlist-only --limit 5 --dry-run
```

### Test Channel Download Plan (Dry Run)
```bash
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 10 --dry-run
```

### Download Latest N Videos Per Channel
```bash
python download_karaoke.py --latest-per-channel --limit 5
@ -220,19 +410,33 @@ KaroakeVideoDownloader/
│ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI
├── data/ # All config, tracking, cache, and songlist files
│ ├── config.json
├── config/ # Configuration files
│ └── config.json # Main configuration file
├── data/ # All tracking, cache, and songlist files
│ ├── karaoke_tracking.json
│ ├── songlist_tracking.json
│ ├── channel_cache.json
│ ├── channels.txt
│ ├── channels.json # Channel configuration with parsing rules
│ └── songList.json
├── utilities/ # Utility scripts and tools
│ ├── add_manual_video.py # Manual video management
│ ├── build_cache_from_raw.py # Cache building utility
│ ├── cleanup_duplicate_files.py # File cleanup utilities
│ ├── cleanup_recent_tracking.py # Tracking cleanup utilities
│ ├── deduplicate_songlist_tracking.py # Data deduplication
│ ├── fix_artist_name_format.py # Data cleanup utilities
│ ├── fix_artist_name_format_simple.py
│ ├── fix_code_quality.py # Development tools
│ ├── reset_and_redownload.py # Maintenance utilities
│ └── songlist_report.py # Reporting utilities
├── downloads/ # All video output
│ └── [ChannelName]/ # Per-channel folders
├── logs/ # Download logs
├── downloader/yt-dlp.exe # yt-dlp binary
├── tests/ # Diagnostic and test scripts
│ └── test_installation.py
├── downloader/yt-dlp.exe # yt-dlp binary (Windows)
├── downloader/yt-dlp_macos # yt-dlp binary (macOS)
├── src/tests/ # Test scripts
│ ├── test_macos.py # macOS setup and functionality tests
│ └── test_platform.py # Platform detection tests
├── download_karaoke.py # Main entry point (thin wrapper)
├── README.md
├── PRD.md
@ -249,6 +453,7 @@ KaroakeVideoDownloader/
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...`: Focus on specific playlists by title (e.g., `--songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"`)
- `--songlist-file <FILE_PATH>`: Custom songlist file path to use with --songlist-focus (default: data/songList.json)
- `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads (enables fast mode with early exit)
- `--resolution <720p|1080p|...>`: Override resolution
@ -260,8 +465,14 @@ KaroakeVideoDownloader/
- `--latest-per-channel`: **Download the latest N videos from each channel (use with --limit)**
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- `--fuzzy-threshold <N>`: Fuzzy match threshold (0-100, default 85)
- `--parallel`: Enable parallel downloads for improved speed
- `--workers <N>`: Number of parallel download workers (1-10, default: 3)
- `--parallel`: Enable parallel downloads for improved speed (defaults to 3 workers)
- `--workers <N>`: Number of parallel download workers (1-10, default: 3, only used with --parallel)
- `--generate-songlist <DIR1> <DIR2>...`: **Generate song list from MP4 files with ID3 tags in specified directories**
- `--no-append-songlist`: **Create a new song list instead of appending when using --generate-songlist**
- `--force`: **Force download from channels, bypassing all existing file checks and re-downloading if necessary**
- `--channel-focus <CHANNEL_NAME>`: **Download from a specific channel by name (e.g., 'SingKingKaraoke')**
- `--all-videos`: **Download all videos from channel (not just songlist matches), skipping existing files**
- `--dry-run`: **Build download plan and show what would be downloaded without actually downloading anything**

## 📝 Example Usage

@ -272,30 +483,61 @@ KaroakeVideoDownloader/
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85

# Parallel downloads for faster processing
python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
python download_karaoke.py --parallel --songlist-only --limit 10

# Latest videos per channel with parallel downloads
python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
python download_karaoke.py --parallel --latest-per-channel --limit 5

# Traditional full scan (no limit)
python download_karaoke.py --songlist-only

# Focused fuzzy matching (target specific playlists with flexible matching)
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --fuzzy-match --fuzzy-threshold 80 --limit 10

# Focus on specific playlists from a custom file
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --limit 10

# Force download with fuzzy matching (bypass all existing file checks)
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --fuzzy-match --fuzzy-threshold 80 --limit 10

# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates

# Download ALL videos from a specific channel
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100

# Song list generation from MP4 files
python download_karaoke.py --generate-songlist /path/to/mp4/directory
python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 --no-append-songlist

# Generate report of songs that couldn't be found
python download_karaoke.py --generate-unmatched-report
python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 85
```

## 🏷️ ID3 Tagging
- Adds artist/title/album/genre to MP4 files using mutagen (if installed)
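
MP4 tagging with mutagen uses iTunes-style atom names; a minimal sketch (the field mapping below is standard mutagen MP4 usage, not code copied from this project, and the album/genre defaults are assumptions):

```python
# iTunes-style MP4 atom names used by mutagen
MP4_ATOMS = {"artist": "\xa9ART", "title": "\xa9nam", "album": "\xa9alb", "genre": "\xa9gen"}

def tag_mp4(path, artist, title, album="Karaoke", genre="Karaoke"):
    """Write artist/title/album/genre tags to an MP4 file via mutagen."""
    from mutagen.mp4 import MP4  # optional dependency, imported lazily
    audio = MP4(path)
    for field, value in (("artist", artist), ("title", title),
                         ("album", album), ("genre", genre)):
        audio[MP4_ATOMS[field]] = [value]
    audio.save()
```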

## 📋 Song List Generation
- **Generate song lists from existing MP4 files**: Use `--generate-songlist` to create song lists from directories containing MP4 files with ID3 tags
- **Automatic ID3 extraction**: Extracts artist and title from MP4 files' ID3 tags
- **Directory-based organization**: Each directory becomes a playlist with the directory name as the title
- **Position tracking**: Songs are numbered starting from 1 based on file order
- **Append or replace**: Choose to append to existing song list or create a new one with `--no-append-songlist`
- **Multiple directories**: Process multiple directories in a single command
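
The directory-to-playlist mapping described above can be sketched as follows (illustrative; the real implementation reads artist/title from each file's ID3 tags, and the exact JSON keys are assumptions):

```python
from pathlib import Path

def build_songlist(directories):
    """One playlist per directory; songs numbered from 1 in file order."""
    playlists = []
    for directory in directories:
        directory = Path(directory)
        songs = [
            {"position": i, "filename": f.name}
            for i, f in enumerate(sorted(directory.glob("*.mp4")), start=1)
        ]
        playlists.append({"title": directory.name, "songs": songs})
    return playlists
```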

## 🧹 Cleanup
- Removes `.info.json` and `.meta` files after download

## 🛠️ Configuration
- All options are in `data/config.json` (format, resolution, metadata, etc.)
- All options are in `config/config.json` (format, resolution, metadata, etc.)
- You can edit this file or use CLI flags to override
- **Configurable Data Directory**: The data directory path can be configured in `config/config.json` under `folder_structure.data_dir` (default: "data")

## 📋 Command Reference File

@ -311,6 +553,31 @@ python download_karaoke.py --clear-server-duplicates

> **🔄 Maintenance Note**: The `commands.txt` file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.

## 📚 Documentation Standards

### **Documentation Location**
- **All changes, refactoring, and improvements should be documented in the PRD.md and README.md files**
- **Do NOT create separate .md files for documenting changes, refactoring, or improvements**
- **Use the existing sections in PRD.md and README.md to track all project evolution**

### **Where to Document Changes**
- **PRD.md**: Technical details, architecture changes, bug fixes, and implementation specifics
- **README.md**: User-facing features, usage instructions, and high-level improvements
- **CHANGELOG.md**: Version-specific release notes and change summaries

### **Documentation Requirements**
- **All new features must be documented in both PRD.md and README.md**
- **All refactoring efforts must be documented in the appropriate sections**
- **All bug fixes must be documented with technical details**
- **Version numbers and dates should be clearly marked**
- **Benefits and improvements should be explicitly stated**

### **Maintenance Responsibility**
- **Keep PRD.md and README.md synchronized with code changes**
- **Update documentation immediately when implementing new features**
- **Remove outdated information and consolidate related changes**
- **Ensure all CLI options and features are documented in both files**

## 🔧 Refactoring Improvements (v3.3)
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:

@ -348,7 +615,7 @@ The codebase has been comprehensively refactored to improve maintainability and

### **New Parallel Download System (v3.4)**
- **Parallel downloader module:** `parallel_downloader.py` provides thread-safe concurrent download management
- **Configurable concurrency:** Use `--parallel --workers N` to enable parallel downloads with N workers (1-10)
- **Configurable concurrency:** Use `--parallel` to enable parallel downloads with 3 workers by default, or `--parallel --workers N` for custom worker count (1-10)
- **Thread-safe operations:** All tracking, caching, and progress operations are thread-safe
- **Real-time progress tracking:** Shows active downloads, completion status, and overall progress
- **Automatic retry mechanism:** Failed downloads are automatically retried with reduced concurrency
@ -372,7 +639,8 @@ The codebase has been comprehensively refactored to improve maintainability and
- **Robust download plan execution:** Fixed index management in download plan execution to prevent errors during interrupted downloads.

## 🐞 Troubleshooting
- Ensure `yt-dlp.exe` is in the `downloader/` folder
- **Windows**: Ensure `yt-dlp.exe` is in the `downloader/` folder
- **macOS**: Run `python3 setup_macos.py` to set up yt-dlp and FFmpeg
- Check `logs/` for error details
- Use `python -m karaoke_downloader.check_resolution` to verify video quality
- If you see errors about ffmpeg/ffprobe, install [ffmpeg](https://ffmpeg.org/download.html) and ensure it is in your PATH
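
The platform-specific binary selection behind the troubleshooting steps above could look like this (a sketch; the binary names match the `downloader/` layout shown earlier, but the helper itself is hypothetical):

```python
import platform
from pathlib import Path

def get_ytdlp_path(downloader_dir="downloader"):
    """Pick the yt-dlp binary for the current platform."""
    if platform.system() == "Windows":
        binary = "yt-dlp.exe"
    else:
        binary = "yt-dlp_macos"  # macOS binary; a pip-installed yt-dlp also works
    return Path(downloader_dir) / binary
```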

152 commands.txt
@ -1,6 +1,6 @@
# 🎤 Karaoke Video Downloader - CLI Commands Reference
# Copy and paste these commands into your terminal
# Updated: v3.4 (includes parallel downloads and all refactoring improvements)
# Updated: v3.4.4 (includes macOS support, all videos download mode, manual video collection, channel parsing rules, and all previous improvements)

## 📥 BASIC DOWNLOADS

@ -8,7 +8,7 @@
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos

# Download from a file containing multiple channel URLs
python download_karaoke.py --file data/channels.txt
python download_karaoke.py --file data/channels.json

# Download with custom resolution (480p, 720p, 1080p, 1440p, 2160p)
python download_karaoke.py --resolution 1080p https://www.youtube.com/@SingKingKaraoke/videos
@ -19,9 +19,69 @@ python download_karaoke.py --limit 10 https://www.youtube.com/@SingKingKaraoke/v
# Enable parallel downloads for faster processing (3-5x speedup)
python download_karaoke.py --parallel --workers 5 --limit 10 https://www.youtube.com/@SingKingKaraoke/videos

## 🎤 MANUAL VIDEO COLLECTION (v3.4.3)

# Download from manual videos collection (data/manual_videos.json)
python download_karaoke.py --manual --limit 5

# Download manual videos with fuzzy matching
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10

# Download manual videos with parallel processing
python download_karaoke.py --parallel --workers 3 --manual --limit 5

# Download manual videos with songlist matching
python download_karaoke.py --manual --songlist-only --limit 10

# Force download from manual videos (bypass existing file checks)
python download_karaoke.py --manual --force --limit 5

# Add a video to manual collection (interactive)
python utilities/add_manual_video.py add "Artist - Song Title (Karaoke Version)" "https://www.youtube.com/watch?v=VIDEO_ID"

# List all manual videos
python utilities/add_manual_video.py list

# Remove a video from manual collection
python utilities/add_manual_video.py remove "Artist - Song Title (Karaoke Version)"
## 🎬 ALL VIDEOS DOWNLOAD MODE (v3.4.4)
|
||||
|
||||
# Download ALL videos from a specific channel (not just songlist matches)
|
||||
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos
|
||||
|
||||
# Download ALL videos with parallel processing for speed
|
||||
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 10
|
||||
|
||||
# Download ALL videos with limit (download first N videos)
|
||||
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --limit 100
|
||||
|
||||
# Download ALL videos with parallel processing and limit
|
||||
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --parallel --workers 5 --limit 50
|
||||
|
||||
# Download ALL videos from ZoomKaraokeOfficial channel
|
||||
python download_karaoke.py --channel-focus ZoomKaraokeOfficial --all-videos
|
||||
|
||||
# Download ALL videos with custom resolution
|
||||
python download_karaoke.py --channel-focus SingKingKaraoke --all-videos --resolution 1080p
|
||||
|
||||
## 📋 SONG LIST GENERATION
|
||||
|
||||
# Generate song list from MP4 files in a directory (append to existing song list)
|
||||
python download_karaoke.py --generate-songlist /path/to/mp4/directory
|
||||
|
||||
# Generate song list from multiple directories
|
||||
python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 /path/to/dir3
|
||||
|
||||
# Generate song list and create a new song list file (don't append)
|
||||
python download_karaoke.py --generate-songlist /path/to/mp4/directory --no-append-songlist
|
||||
|
||||
# Generate song list from multiple directories and create new file
|
||||
python download_karaoke.py --generate-songlist /path/to/dir1 /path/to/dir2 --no-append-songlist
|
||||
|
||||
## 🎵 SONGLIST OPERATIONS
|
||||
|
||||
# Download only songs from your songlist (uses data/channels.txt by default)
|
||||
# Download only songs from your songlist (uses data/channels.json by default)
|
||||
python download_karaoke.py --songlist-only
|
||||
|
||||
# Download only songlist songs with limit
|
||||
@ -51,6 +111,18 @@ python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --limit 5
|
||||
# Focus on specific playlists with parallel processing
|
||||
python download_karaoke.py --parallel --workers 3 --songlist-focus "2025 - Apple Top 50" --limit 5
|
||||
|
||||
# Focus on specific playlists from a custom songlist file
|
||||
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json"
|
||||
|
||||
# Focus on specific playlists from a custom file with force mode
|
||||
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --force
|
||||
|
||||
# Force download from channels regardless of existing files or server duplicates
|
||||
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force
|
||||
|
||||
# Force download with parallel processing
|
||||
python download_karaoke.py --parallel --workers 5 --songlist-focus "2025 - Apple Top 50" --force --limit 10
|
||||
|
||||
# Prioritize songlist songs in download queue (default behavior)
|
||||
python download_karaoke.py --songlist-priority https://www.youtube.com/@SingKingKaraoke/videos
|
||||
|
||||
@ -60,6 +132,35 @@ python download_karaoke.py --no-songlist-priority https://www.youtube.com/@SingK
|
||||
# Show songlist download status and statistics
|
||||
python download_karaoke.py --songlist-status
|
||||
|
||||
## 📊 UNMATCHED SONGS REPORTS
|
||||
|
||||
# Generate report of songs that couldn't be found in any channel (standalone)
|
||||
python download_karaoke.py --generate-unmatched-report
|
||||
|
||||
# Generate report with fuzzy matching enabled (standalone)
|
||||
python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 85
|
||||
|
||||
# Generate report using a specific channel file (standalone)
|
||||
python download_karaoke.py --generate-unmatched-report --file data/my_channels.txt
|
||||
|
||||
# Generate report from a custom songlist file (standalone)
|
||||
python download_karaoke.py --generate-unmatched-report --songlist-file "data/my_custom_songlist.json"
|
||||
|
||||
# Generate report with focus on specific playlists from a custom file (standalone)
|
||||
python download_karaoke.py --songlist-focus "CCKaraoke" --songlist-file "data/my_custom_songlist.json" --generate-unmatched-report
|
||||
|
||||
# Download songs AND generate unmatched report (additive feature)
|
||||
python download_karaoke.py --songlist-only --limit 10 --generate-unmatched-report
|
||||
|
||||
# Download with fuzzy matching AND generate unmatched report
|
||||
python download_karaoke.py --songlist-only --fuzzy-match --fuzzy-threshold 85 --limit 10 --generate-unmatched-report
|
||||
|
||||
# Download from specific playlists AND generate unmatched report
|
||||
python download_karaoke.py --songlist-focus "CCKaraoke" --limit 10 --generate-unmatched-report
|
||||
|
||||
# Generate report with custom fuzzy threshold
|
||||
python download_karaoke.py --generate-unmatched-report --fuzzy-match --fuzzy-threshold 80
|
||||
|
||||
## ⚡ PARALLEL DOWNLOADS (v3.4)
|
||||
|
||||
# Basic parallel downloads (3-5x faster than sequential)
|
||||
@ -94,7 +195,7 @@ python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5
|
||||
python download_karaoke.py --parallel --workers 3 --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85
|
||||
|
||||
# Download latest videos from specific channels file
|
||||
python download_karaoke.py --latest-per-channel --limit 5 --file data/channels.txt
|
||||
python download_karaoke.py --latest-per-channel --limit 5 --file data/channels.json
|
||||
|
||||
## 🔄 CACHE & TRACKING MANAGEMENT
|
||||
|
||||
@ -153,7 +254,7 @@ python download_karaoke.py --version
|
||||
python download_karaoke.py --songlist-only --limit 20 --fuzzy-match --fuzzy-threshold 85 --resolution 1080p
|
||||
|
||||
# Latest videos per channel with fuzzy matching
|
||||
python download_karaoke.py --latest-per-channel --limit 3 --fuzzy-match --fuzzy-threshold 90 --file data/channels.txt
|
||||
python download_karaoke.py --latest-per-channel --limit 3 --fuzzy-match --fuzzy-threshold 90 --file data/channels.json
|
||||
|
||||
# Force refresh everything and download songlist
|
||||
python download_karaoke.py --songlist-only --force-download-plan --refresh --limit 10
|
||||
@ -172,6 +273,9 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --limit 10
|
||||
# 1b. Focus on specific playlists (fast targeted download)
|
||||
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --limit 5
|
||||
|
||||
# 1c. Force download from specific playlists (bypass all existing file checks)
|
||||
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --limit 5
|
||||
|
||||
# 2. Latest videos from all channels
|
||||
python download_karaoke.py --latest-per-channel --limit 5
|
||||
|
||||
@ -190,6 +294,9 @@ python download_karaoke.py --parallel --workers 5 --songlist-only --fuzzy-match
|
||||
# 4b. Focused fuzzy matching (target specific playlists with flexible matching)
|
||||
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --fuzzy-match --fuzzy-threshold 80 --limit 10
|
||||
|
||||
# 4c. Force download with fuzzy matching (bypass all existing file checks)
|
||||
python download_karaoke.py --songlist-focus "2025 - Apple Top 50" --force --fuzzy-match --fuzzy-threshold 80 --limit 10
|
||||
|
||||
# 5. Reset and start fresh
|
||||
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
|
||||
|
||||
@ -197,6 +304,33 @@ python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
|
||||
python download_karaoke.py --status
|
||||
python download_karaoke.py --clear-cache all
|
||||
|
||||
# 7. Download from manual video collection
|
||||
python download_karaoke.py --manual --limit 5
|
||||
|
||||
# 7b. Fast parallel manual video download
|
||||
python download_karaoke.py --parallel --workers 3 --manual --limit 5
|
||||
|
||||
# 7c. Manual videos with fuzzy matching
|
||||
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
|
||||
|
||||
## 🍎 macOS SETUP COMMANDS
|
||||
|
||||
# Automatic macOS setup (detects OS and installs yt-dlp + FFmpeg)
|
||||
python3 setup_macos.py
|
||||
|
||||
# Test macOS setup and functionality
|
||||
python3 src/tests/test_macos.py
|
||||
|
||||
# Manual macOS setup options
|
||||
# Install yt-dlp via pip
|
||||
pip3 install yt-dlp
|
||||
|
||||
# Download yt-dlp binary for macOS
|
||||
mkdir -p downloader && curl -L -o downloader/yt-dlp_macos https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos && chmod +x downloader/yt-dlp_macos
|
||||
|
||||
# Install FFmpeg via Homebrew
|
||||
brew install ffmpeg
|
||||
|
||||
## 🔧 TROUBLESHOOTING COMMANDS
|
||||
|
||||
# Check if everything is working
|
||||
@ -212,7 +346,9 @@ python download_karaoke.py --clear-server-duplicates
|
||||
## 📝 NOTES
|
||||
|
||||
# Default files used:
|
||||
# - data/channels.txt (default channel list for songlist modes)
|
||||
# - data/channels.json (channel configuration with parsing rules, preferred)
|
||||
# - data/channels.json (channel configuration with parsing rules)
|
||||
# - data/manual_videos.json (manual video collection)
|
||||
# - data/songList.json (your prioritized song list)
|
||||
# - data/config.json (download settings)
|
||||
|
||||
@ -221,11 +357,12 @@ python download_karaoke.py --clear-server-duplicates
|
||||
# Fuzzy threshold: 0-100 (higher = more strict matching, default 90)
|
||||
|
||||
# The system automatically:
|
||||
# - Uses data/channels.txt if no --file specified in songlist modes
|
||||
# - Uses data/channels.json for channel configuration and parsing rules
|
||||
# - Caches channel data for 24 hours (configurable)
|
||||
# - Tracks all downloads in JSON files
|
||||
# - Avoids re-downloading existing files
|
||||
# - Checks for server duplicates
|
||||
# - Supports manual video collection via --manual parameter
|
||||
|
||||
# For best performance:
|
||||
# - Use --parallel --workers 5 for 3-5x faster downloads
|
||||
@ -233,6 +370,7 @@ python download_karaoke.py --clear-server-duplicates
|
||||
# - Use --fuzzy-match for better song discovery
|
||||
# - Use --refresh sparingly (forces re-scan)
|
||||
# - Clear cache if you encounter issues
|
||||
# - macOS users: Run `python3 setup_macos.py` for automatic setup
|
||||
|
||||
# Parallel download tips:
|
||||
# - Start with --workers 3 for conservative approach
|
||||
|
||||
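The `--parallel`/`--workers` flags above map naturally onto a thread pool, since each download is I/O-bound. A minimal standalone sketch of that pattern (the `download_one` worker here is a hypothetical placeholder, not the tool's actual yt-dlp invocation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_one(url: str) -> str:
    # Placeholder for a single download (the real tool shells out to yt-dlp here).
    return f"done: {url}"

def download_parallel(urls, workers=3):
    """Run downloads concurrently with a bounded worker pool."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every URL, then collect results as each download finishes.
        futures = {pool.submit(download_one, u): u for u in urls}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

print(sorted(download_parallel([f"video{i}" for i in range(5)], workers=3)))
```

Because workers complete in arbitrary order, results should be treated as unordered; the real tool's 3-5x speedup claim comes from overlapping network waits, not CPU parallelism.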
@@ -19,13 +19,14 @@
     "writethumbnail": false,
     "embed_metadata": false,
     "continuedl": true,
-    "nooverwrites": true,
+    "nooverwrites": false,
     "ignoreerrors": true,
     "no_warnings": false
   },
   "folder_structure": {
     "downloads_dir": "downloads",
     "logs_dir": "logs",
     "data_dir": "data",
     "tracking_file": "downloaded_videos.json"
   },
   "logging": {
@@ -34,5 +35,12 @@
     "include_console": true,
     "include_file": true
   },
+  "platform_settings": {
+    "auto_detect_platform": true,
+    "yt_dlp_paths": {
+      "windows": "downloader/yt-dlp.exe",
+      "macos": "downloader/yt-dlp_macos"
+    }
+  },
   "yt_dlp_path": "downloader/yt-dlp.exe"
 }
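The new `platform_settings` block lets the tool pick the right yt-dlp binary per OS, falling back to the legacy `yt_dlp_path` key. A sketch of the selection logic (the function name `resolve_yt_dlp_path` is illustrative, not the project's actual API):

```python
import platform

def resolve_yt_dlp_path(config: dict) -> str:
    """Pick the yt-dlp binary from platform_settings, falling back to yt_dlp_path."""
    settings = config.get("platform_settings", {})
    paths = settings.get("yt_dlp_paths", {})
    if settings.get("auto_detect_platform", False):
        system = platform.system()  # 'Windows', 'Darwin' (macOS), 'Linux'
        key = {"Windows": "windows", "Darwin": "macos"}.get(system)
        if key and key in paths:
            return paths[key]
    # Legacy single-path setting, with a bare 'yt-dlp' (on PATH) as last resort.
    return config.get("yt_dlp_path", "yt-dlp")

config = {
    "platform_settings": {
        "auto_detect_platform": True,
        "yt_dlp_paths": {
            "windows": "downloader/yt-dlp.exe",
            "macos": "downloader/yt-dlp_macos",
        },
    },
    "yt_dlp_path": "downloader/yt-dlp.exe",
}
print(resolve_yt_dlp_path(config))
```

Keeping `yt_dlp_path` in the config alongside `platform_settings` preserves backward compatibility for configs written before v3.4.4.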
data/channel_cache.json: 164578 lines changed (diff suppressed because it is too large)

data/channel_cache/@KaraokeOnVEVO.json: new file, 54775 lines (diff suppressed because it is too large)

data/channel_cache/@KaraokeOnVEVO_raw_output.txt: new file, 13699 lines (diff suppressed because it is too large)

data/channel_cache/@LetsSingKaraoke.json: new file, 19 lines
@@ -0,0 +1,19 @@
{
  "channel_id": "@LetsSingKaraoke",
  "videos": [
    {
      "title": "Sub Urban - Cradles | Karaoke (instrumental)",
      "id": "8uj7IzhdiO4"
    },
    {
      "title": "Sia - Snowman | Karaoke (instrumental)",
      "id": "ZbWHuncTgsM"
    },
    {
      "title": "Trevor Daniel - Falling | Karaoke (Instrumental)",
      "id": "nU7n2aq7f98"
    }
  ],
  "last_updated": "2025-08-05T15:59:09.280488",
  "video_count": 3
}
data/channel_cache/@LetsSingKaraoke_raw_output.txt: new file, 10 lines

@@ -0,0 +1,10 @@
# Raw yt-dlp output for @LetsSingKaraoke
# Channel URL: https://www.youtube.com/@LetsSingKaraoke/videos
# Command: downloader/yt-dlp_macos --flat-playlist --print %(title)s|%(id)s|%(url)s --verbose https://www.youtube.com/@LetsSingKaraoke/videos
# Timestamp: 2025-08-05T15:59:09.280155
# Total lines: 3
################################################################################

1: Sub Urban - Cradles | Karaoke (instrumental)|8uj7IzhdiO4|https://www.youtube.com/watch?v=8uj7IzhdiO4
2: Sia - Snowman | Karaoke (instrumental)|ZbWHuncTgsM|https://www.youtube.com/watch?v=ZbWHuncTgsM
3: Trevor Daniel - Falling | Karaoke (Instrumental)|nU7n2aq7f98|https://www.youtube.com/watch?v=nU7n2aq7f98
data/channel_cache/@SingKingKaraoke.json: new file, 20043 lines (diff suppressed because it is too large)

data/channel_cache/@SingKingKaraoke_raw_output.txt: new file, 5016 lines (diff suppressed because it is too large)

data/channel_cache/@StingrayKaraoke.json: new file, 12287 lines (diff suppressed because it is too large)

data/channel_cache/@StingrayKaraoke_raw_output.txt: new file, 3077 lines (diff suppressed because it is too large)

data/channel_cache/@VocalStarKaraoke.json: new file, 32951 lines (diff suppressed because it is too large)

data/channel_cache/@VocalStarKaraoke_raw_output.txt: new file, 8243 lines (diff suppressed because it is too large)

data/channel_cache/@ZoomKaraokeOfficial.json: new file, 38979 lines (diff suppressed because it is too large)

data/channel_cache/@ZoomKaraokeOfficial_raw_output.txt: new file, 9750 lines (diff suppressed because it is too large)

data/channel_cache/@sing2karaoke.json: new file, 5015 lines (diff suppressed because it is too large)

data/channel_cache/@sing2karaoke_raw_output.txt: new file, 1259 lines (diff suppressed because it is too large)

data/channels.json: new file, 191 lines
@@ -0,0 +1,191 @@
{
  "channels": [
    {
      "name": "@SingKingKaraoke",
      "url": "https://www.youtube.com/@SingKingKaraoke/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke)", "(Karaoke Version)", "Karaoke Version"]
          }
        },
        "examples": [
          "Artist - Title (Karaoke)",
          "Artist - Title (Karaoke Version)"
        ]
      },
      "description": "Standard artist - title format with karaoke suffix"
    },
    {
      "name": "@KaraokeOnVEVO",
      "url": "https://www.youtube.com/@KaraokeOnVEVO/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke)"]
          }
        },
        "examples": [
          "George Jones - A Picture Of Me (Without You) (Karaoke)",
          "Iggy Pop, Kate Pierson - Candy (Karaoke)"
        ]
      },
      "description": "Standard artist - title format with (Karaoke) suffix"
    },
    {
      "name": "@StingrayKaraoke",
      "url": "https://www.youtube.com/@StingrayKaraoke/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke Version)"]
          }
        },
        "playlist_indicators": [
          "TOP SONGS OF",
          "THE BEST",
          "BEST",
          "NON-STOP",
          "MASHUP",
          "FEAT.",
          "WITH LYRICS"
        ],
        "examples": [
          "Gracie Abrams - That's So True (Karaoke Version)",
          "TOP SONGS OF 2024 KARAOKE WITH LYRICS BY BILLIE EILISH, GRACIE ABRAMS & MORE"
        ]
      },
      "description": "Standard artist - title format with (Karaoke Version) suffix, also has playlist titles"
    },
    {
      "name": "@sing2karaoke",
      "url": "https://www.youtube.com/@sing2karaoke/videos",
      "parsing_rules": {
        "format": "artist_title_spaces",
        "separator": " ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke Version) Lyrics", "(Karaoke Version)", "Karaoke Version Lyrics"]
          }
        },
        "multi_artist_separator": ", ",
        "examples": [
          "Lauren Spencer Smith Fingers Crossed",
          "Calvin Harris, Clementine Douglas Blessings (Karaoke Version) Lyrics"
        ]
      },
      "description": "Artist and title separated by multiple spaces, supports multiple artists"
    },
    {
      "name": "@ZoomKaraokeOfficial",
      "url": "https://www.youtube.com/@ZoomKaraokeOfficial/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": [
              "(Karaoke)",
              "(Karaoke Version)",
              "Karaoke Version",
              "- Karaoke Version from Zoom Karaoke",
              "- Karaoke Version from Zoom",
              "- Karaoke Version from Zoom Karaoke (Radiohead Cover)",
              "- Karaoke Version from Zoom (Radiohead Cover)"
            ]
          }
        },
        "examples": [
          "The Mavericks - Here Comes My Baby - Karaoke Version from Zoom Karaoke"
        ]
      },
      "description": "Standard artist - title format with '- Karaoke Version from Zoom Karaoke' suffix"
    },
    {
      "name": "@VocalStarKaraoke",
      "url": "https://www.youtube.com/@VocalStarKaraoke/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": false,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["KARAOKE Without Backing Vocals", "KARAOKE With Vocal Guide", "KARAOKE"]
          }
        },
        "examples": [
          "Don't Say You Love Me - Jin KARAOKE Without Backing Vocals",
          "Don't Say You Love Me - Jin KARAOKE With Vocal Guide"
        ]
      },
      "description": "Title first, then dash separator, then artist with KARAOKE suffix"
    },
    {
      "name": "@ManualVideos",
      "url": "manual://static",
      "manual_videos_file": "data/manual_videos.json",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
          }
        }
      },
      "description": "Manual collection of individual karaoke videos (static, never expires)"
    },
    {
      "name": "Let's Sing Karaoke",
      "url": "https://www.youtube.com/@LetsSingKaraoke/videos",
      "parsing_rules": {
        "format": "artist_title_separator",
        "separator": " - ",
        "artist_first": true,
        "title_cleanup": {
          "remove_suffix": {
            "suffixes": ["(Karaoke)", "(Karaoke Version)", "Karaoke Version", "(In the style of)"]
          }
        },
        "examples": [
          "Artist - Title (Karaoke)",
          "Artist - Title (In the style of Other Artist)"
        ]
      },
      "artist_name_processing": true,
      "description": "Let's Sing Karaoke with enhanced artist name processing"
    }
  ],
  "global_parsing_settings": {
    "fallback_format": "artist_title_separator",
    "fallback_separator": " - ",
    "common_suffixes": [
      "(Karaoke)",
      "(Karaoke Version)",
      "Karaoke Version",
      "(Karaoke Version) Lyrics",
      "Karaoke Version Lyrics"
    ],
    "playlist_indicators": [
      "TOP",
      "BEST",
      "MASHUP",
      "FEAT.",
      "WITH LYRICS",
      "NON-STOP",
      "PLAYLIST"
    ]
  }
}
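Each channel's `parsing_rules` drives artist/title extraction. The repo's `ChannelParser` (added later in this diff) implements the full logic; a simplified, self-contained sketch of how the `artist_title_separator` format plus `title_cleanup` suffixes apply to a @SingKingKaraoke-style title:

```python
def extract_artist_title(video_title: str, rules: dict) -> tuple:
    """Split on the configured separator and strip configured title suffixes."""
    sep = rules.get("separator", " - ")
    suffixes = rules.get("title_cleanup", {}).get("remove_suffix", {}).get("suffixes", [])
    if sep not in video_title:
        # No separator: parsing fails, return the whole title with an empty artist.
        return "", video_title.strip()
    artist, title = (p.strip() for p in video_title.split(sep, 1))
    if not rules.get("artist_first", True):
        artist, title = title, artist  # e.g. @VocalStarKaraoke uses Title - Artist
    for sfx in suffixes:
        if title.endswith(sfx):
            title = title[: -len(sfx)].strip()
    return artist, title

rules = {
    "format": "artist_title_separator",
    "separator": " - ",
    "artist_first": True,
    "title_cleanup": {"remove_suffix": {"suffixes": ["(Karaoke)", "(Karaoke Version)"]}},
}
print(extract_artist_title("Adele - Hello (Karaoke Version)", rules))
```

This is a sketch only: the real parser also handles the `artist_title_spaces` and pipe-separated formats, multi-artist separators, and global fallbacks.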
@@ -1,7 +0,0 @@
-https://www.youtube.com/@SingKingKaraoke/videos
-https://www.youtube.com/@karafun/videos
-https://www.youtube.com/@KaraokeOnVEVO/videos
-https://www.youtube.com/@StingrayKaraoke/videos
-https://www.youtube.com/@CCKaraoke/videos
-https://www.youtube.com/@AtomicKaraoke/videos
-https://www.youtube.com/@sing2karaoke/videos
data/karaoke_tracking.json: new file, 115120 lines (diff suppressed because it is too large)

data/manual_videos.json: new file, 85 lines
@@ -0,0 +1,85 @@
{
  "channel_name": "@ManualVideos",
  "channel_url": "manual://static",
  "description": "Manual collection of individual karaoke videos",
  "videos": [
    {
      "title": "Nickelback - Photograph",
      "url": "https://www.youtube.com/watch?v=qZXwpceqt9s",
      "id": "qZXwpceqt9s",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "Ed Sheeran & Beyoncé - Perfect Duet",
      "url": "https://www.youtube.com/watch?v=qegLWI99Wg0",
      "id": "qegLWI99Wg0",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "10,000 Maniacs - More Than This",
      "url": "https://www.youtube.com/watch?v=wxnuF-APJ5M",
      "id": "wxnuF-APJ5M",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "AC/DC - Big Balls",
      "url": "https://www.youtube.com/watch?v=kiSDpVmu4Bk",
      "id": "kiSDpVmu4Bk",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "Jon Bon Jovi - Blaze of Glory",
      "url": "https://www.youtube.com/watch?v=SzRAoDMlQY",
      "id": "SzRAoDMlQY",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "ZZ Top - Sharp Dressed Man",
      "url": "https://www.youtube.com/watch?v=prRalwto9iY",
      "id": "prRalwto9iY",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "Nickelback - Photograph",
      "url": "https://www.youtube.com/watch?v=qTphCTAUhUg",
      "id": "qTphCTAUhUg",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "Billy Joel - Shes Got A Way",
      "url": "https://www.youtube.com/watch?v=DeeTFIgKuC8",
      "id": "DeeTFIgKuC8",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    }
  ],
  "parsing_rules": {
    "format": "artist_title_separator",
    "separator": " - ",
    "artist_first": true,
    "title_cleanup": {
      "remove_suffix": {
        "suffixes": [
          "(Karaoke)",
          "(Karaoke Version)",
          "(Karaoke Version) Lyrics"
        ]
      }
    }
  }
}
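The manual collection is plain JSON, so adding an entry (what `utilities/add_manual_video.py add` does per the commands reference) essentially means appending to the `videos` array and deriving the video ID from a `watch?v=` URL. A minimal in-memory sketch (the function name `add_manual_video` is illustrative; real file I/O and the metadata fields like `upload_date` are elided):

```python
from urllib.parse import urlparse, parse_qs

def add_manual_video(collection: dict, title: str, url: str) -> dict:
    """Append a video entry, deriving the id from a watch?v= URL."""
    video_id = parse_qs(urlparse(url).query).get("v", [""])[0]
    entry = {"title": title, "url": url, "id": video_id}
    collection.setdefault("videos", []).append(entry)
    return entry

collection = {"channel_name": "@ManualVideos", "videos": []}
print(add_manual_video(collection, "Nickelback - Photograph",
                       "https://www.youtube.com/watch?v=qZXwpceqt9s"))
```

Titles should follow the collection's `parsing_rules` ("Artist - Title" with an optional karaoke suffix) so songlist matching works on manual entries too.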
@@ -23902,7 +23902,7 @@
       "title": "Superman (It's Not Easy)"
     },
     {
-      "artist": "'N Sync",
+      "artist": "'NSync",
       "position": 16,
       "title": "Gone"
     },
@@ -24122,7 +24122,7 @@
       "title": "Turn Off The Light"
     },
     {
-      "artist": "'N Sync",
+      "artist": "'NSync",
       "position": 13,
       "title": "Gone"
     },
@@ -24617,7 +24617,7 @@
       "title": "Most Girls"
     },
     {
-      "artist": "'N Sync",
+      "artist": "'NSync",
       "position": 11,
       "title": "This I Promise You"
     },
@@ -24857,7 +24857,7 @@
       "title": "I Just Wanna Love U (Give It 2 Me)"
     },
     {
-      "artist": "'N Sync",
+      "artist": "'NSync",
       "position": 12,
       "title": "This I Promise You"
     },
@@ -25857,7 +25857,7 @@
       "title": "Tha Block Is Hot"
     },
     {
-      "artist": "'N Sync & Gloria Estefan",
+      "artist": "'NSync & Gloria Estefan",
       "position": 85,
       "title": "Music Of My Heart"
     },
@@ -26237,7 +26237,7 @@
       "title": "Touch It"
     },
     {
-      "artist": "N Sync",
+      "artist": "NSync",
       "position": 34,
       "title": "(God Must Have Spent) A Little More Time On You"
     },
data/songlist_tracking.json: new file, 93214 lines (diff suppressed because it is too large)

downloader/yt-dlp_macos: new executable file (binary file not shown)
@@ -9,6 +9,8 @@ import json
 from datetime import datetime, timedelta
 from pathlib import Path
 
+from karaoke_downloader.data_path_manager import get_data_path_manager
+
 # Constants
 DEFAULT_CACHE_EXPIRATION_DAYS = 1
 DEFAULT_CACHE_FILENAME_LENGTH_LIMIT = 200  # Increased from 60
@@ -37,7 +39,7 @@ def get_download_plan_cache_file(mode, **kwargs):
         + hashlib.md5(base.encode()).hexdigest()[:8]
     )
 
-    return Path(f"data/{base}.json")
+    return get_data_path_manager().get_path(f"{base}.json")
 
 
 def load_cached_plan(cache_file, max_age_days=DEFAULT_CACHE_EXPIRATION_DAYS):
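The hunk above shows `get_download_plan_cache_file` appending an 8-character MD5 digest of the plan name, which keeps cache filenames under the (raised) 200-character limit while still disambiguating plans whose truncated names collide. A standalone sketch of that idea (the exact truncation/concatenation in the repo may differ from this hypothetical `short_cache_name`):

```python
import hashlib

def short_cache_name(base: str, limit: int = 200) -> str:
    """Truncate long plan names and disambiguate with an 8-char MD5 suffix."""
    # Hash the FULL name before truncating, so two long names that share a
    # 200-char prefix still get distinct cache files.
    digest = hashlib.md5(base.encode()).hexdigest()[:8]
    if len(base) > limit:
        base = base[:limit]
    return f"{base}_{digest}.json"

print(short_cache_name("songlist_only_fuzzy85_limit10"))
```

Hashing before truncation is the key design point; hashing the truncated name would reintroduce the collision the digest is meant to prevent.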
260
karaoke_downloader/channel_parser.py
Normal file
260
karaoke_downloader/channel_parser.py
Normal file
@ -0,0 +1,260 @@
|
||||
"""
|
||||
Channel-specific parsing utilities for extracting artist and title from video titles.
|
||||
|
||||
This module handles the different title formats used by various karaoke channels,
|
||||
providing channel-specific parsing rules to extract artist and title information
|
||||
correctly for ID3 tagging and filename generation.
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
from typing import Dict, List, Optional, Tuple, Any
|
||||
from pathlib import Path
|
||||
|
||||
from karaoke_downloader.data_path_manager import get_data_path_manager
|
||||
|
||||
|
||||
class ChannelParser:
|
||||
"""Handles channel-specific parsing of video titles to extract artist and title."""
|
||||
|
||||
def __init__(self, channels_file: str = None):
|
||||
if channels_file is None:
|
||||
channels_file = str(get_data_path_manager().get_channels_json_path())
|
||||
"""Initialize the parser with channel configuration."""
|
||||
self.channels_file = Path(channels_file)
|
||||
self.channels_config = self._load_channels_config()
|
||||
|
||||
def _load_channels_config(self) -> Dict[str, Any]:
|
||||
"""Load the channels configuration from JSON file."""
|
||||
if not self.channels_file.exists():
|
||||
raise FileNotFoundError(f"Channels configuration file not found: {self.channels_file}")
|
||||
|
||||
with open(self.channels_file, 'r', encoding='utf-8') as f:
|
||||
return json.load(f)
|
||||
|
||||
def get_channel_config(self, channel_name: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get the configuration for a specific channel."""
|
||||
for channel in self.channels_config.get("channels", []):
|
||||
if channel["name"] == channel_name:
|
||||
return channel
|
||||
return None
|
||||
|
||||
def extract_artist_title(self, video_title: str, channel_name: str) -> Tuple[str, str]:
|
||||
"""
|
||||
Extract artist and title from a video title using channel-specific parsing rules.
|
||||
|
||||
Args:
|
||||
video_title: The full video title from YouTube
|
||||
channel_name: The name of the channel (must match config)
|
||||
|
||||
Returns:
|
||||
Tuple of (artist, title) - both may be empty strings if parsing fails
|
||||
"""
|
||||
channel_config = self.get_channel_config(channel_name)
|
||||
if not channel_config:
|
||||
# Fallback to global settings
|
||||
return self._fallback_parse(video_title)
|
||||
|
||||
parsing_rules = channel_config.get("parsing_rules", {})
|
||||
format_type = parsing_rules.get("format", "artist_title_separator")
|
||||
|
||||
if format_type == "artist_title_separator":
|
||||
return self._parse_artist_title_separator(video_title, parsing_rules)
|
||||
elif format_type == "artist_title_spaces":
|
||||
return self._parse_artist_title_spaces(video_title, parsing_rules)
|
||||
elif format_type == "title_artist_pipe":
|
||||
return self._parse_title_artist_pipe(video_title, parsing_rules)
|
||||
else:
|
||||
return self._fallback_parse(video_title)
|
||||
|
||||
def _parse_artist_title_separator(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
|
||||
"""Parse format: 'Artist - Title' or 'Title - Artist'."""
|
||||
separator = rules.get("separator", " - ")
|
||||
artist_first = rules.get("artist_first", True)
|
||||
|
||||
if separator not in video_title:
|
||||
return "", video_title.strip()
|
||||
|
||||
parts = video_title.split(separator, 1)
|
||||
if len(parts) != 2:
|
||||
return "", video_title.strip()
|
||||
|
||||
part1, part2 = parts[0].strip(), parts[1].strip()
|
||||
|
||||
# Apply cleanup to both parts
|
||||
part1_clean = self._cleanup_title(part1, rules.get("title_cleanup", {}))
|
||||
part2_clean = self._cleanup_title(part2, rules.get("title_cleanup", {}))
|
||||
|
||||
if artist_first:
|
||||
return part1_clean, part2_clean
|
||||
else:
|
||||
return part2_clean, part1_clean
|
||||
|
    def _parse_artist_title_spaces(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
        """Parse format: 'Artist   Title' (multiple spaces)."""
        separator = rules.get("separator", "   ")
        multi_artist_sep = rules.get("multi_artist_separator", ", ")

        # Try multiple space patterns to handle inconsistent spacing.
        # Look for the LAST occurrence of multiple spaces to handle cases with commas.
        space_patterns = ["   ", "  ", "    "]  # 3, 2, 4 spaces

        for pattern in space_patterns:
            if pattern in video_title:
                # Split on the LAST occurrence of the pattern
                last_index = video_title.rfind(pattern)
                if last_index != -1:
                    # Multi-artist strings (e.g. "Artist1, Artist2") are kept as-is
                    artist = video_title[:last_index].strip()
                    title_part = video_title[last_index + len(pattern):].strip()

                    title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))

                    return artist, title

        # Try dash patterns as a fallback for inconsistent formatting
        dash_patterns = [" - ", " – ", " -"]  # regular dash, en dash, dash without trailing space

        for pattern in dash_patterns:
            if pattern in video_title:
                # Split on the LAST occurrence of the pattern
                last_index = video_title.rfind(pattern)
                if last_index != -1:
                    # Multi-artist strings (e.g. "Artist1, Artist2") are kept as-is
                    artist = video_title[:last_index].strip()
                    title_part = video_title[last_index + len(pattern):].strip()

                    title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))

                    return artist, title

        # If no pattern matches, return empty artist and full title
        return "", video_title.strip()
    def _parse_title_artist_pipe(self, video_title: str, rules: Dict[str, Any]) -> Tuple[str, str]:
        """Parse format: 'Title | Artist'."""
        separator = rules.get("separator", " | ")

        if separator not in video_title:
            return "", video_title.strip()

        parts = video_title.split(separator, 1)
        if len(parts) != 2:
            return "", video_title.strip()

        title_part, artist_part = parts[0].strip(), parts[1].strip()

        title = self._cleanup_title(title_part, rules.get("title_cleanup", {}))
        artist = self._cleanup_title(artist_part, rules.get("artist_cleanup", {}))

        return artist, title

    def _cleanup_title(self, text: str, cleanup_rules: Dict[str, Any]) -> str:
        """Apply cleanup rules to remove suffixes and normalize text."""
        if not cleanup_rules:
            return text.strip()

        cleaned = text.strip()

        # Handle remove_suffix rule
        if "remove_suffix" in cleanup_rules:
            suffixes = cleanup_rules["remove_suffix"].get("suffixes", [])
            for suffix in suffixes:
                if cleaned.endswith(suffix):
                    cleaned = cleaned[:-len(suffix)].strip()
                    break

        return cleaned

    def _fallback_parse(self, video_title: str) -> Tuple[str, str]:
        """Fallback parsing using global settings."""
        global_settings = self.channels_config.get("global_parsing_settings", {})
        fallback_format = global_settings.get("fallback_format", "artist_title_separator")
        fallback_separator = global_settings.get("fallback_separator", " - ")

        if fallback_format == "artist_title_separator":
            if fallback_separator in video_title:
                parts = video_title.split(fallback_separator, 1)
                if len(parts) == 2:
                    artist = parts[0].strip()
                    title = parts[1].strip()
                    # Apply global suffix cleanup
                    for suffix in global_settings.get("common_suffixes", []):
                        if title.endswith(suffix):
                            title = title[:-len(suffix)].strip()
                            break
                    return artist, title

        # If all else fails, return empty artist and full title
        return "", video_title.strip()

    def is_playlist_title(self, video_title: str, channel_name: str) -> bool:
        """Check if a video title appears to be a playlist rather than a single song."""
        channel_config = self.get_channel_config(channel_name)
        if not channel_config:
            return self._is_playlist_by_global_rules(video_title)

        parsing_rules = channel_config.get("parsing_rules", {})
        playlist_indicators = parsing_rules.get("playlist_indicators", [])

        if not playlist_indicators:
            return self._is_playlist_by_global_rules(video_title)

        title_upper = video_title.upper()
        for indicator in playlist_indicators:
            if indicator.upper() in title_upper:
                return True

        return False

    def _is_playlist_by_global_rules(self, video_title: str) -> bool:
        """Check if title is a playlist using global rules."""
        global_settings = self.channels_config.get("global_parsing_settings", {})
        playlist_indicators = global_settings.get("playlist_indicators", [])

        title_upper = video_title.upper()
        for indicator in playlist_indicators:
            if indicator.upper() in title_upper:
                return True

        return False

    def get_all_channel_names(self) -> List[str]:
        """Get a list of all configured channel names."""
        return [channel["name"] for channel in self.channels_config.get("channels", [])]

    def get_channel_url(self, channel_name: str) -> Optional[str]:
        """Get the URL for a specific channel."""
        channel_config = self.get_channel_config(channel_name)
        return channel_config.get("url") if channel_config else None


# Convenience function for backward compatibility
def extract_artist_title(video_title: str, channel_name: str, channels_file: str = None) -> Tuple[str, str]:
    """
    Convenience function to extract artist and title from a video title.

    Args:
        video_title: The full video title from YouTube
        channel_name: The name of the channel
        channels_file: Path to the channels configuration file (if None, uses the default from config)

    Returns:
        Tuple of (artist, title)
    """
    if channels_file is None:
        channels_file = str(get_data_path_manager().get_channels_json_path())
    parser = ChannelParser(channels_file)
    return parser.extract_artist_title(video_title, channel_name)
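Taken together, the parser strategies above reduce to one core move: split the title once on a configured separator, then strip known karaoke suffixes from the title half. A self-contained sketch of that move, independent of `ChannelParser` and its config files (the separator and suffix values here are illustrative defaults, not the shipped configuration):

```python
from typing import Tuple

def split_artist_title(video_title: str, separator: str = " - ",
                       suffixes: Tuple[str, ...] = ("(Karaoke Version)", "[Karaoke]")) -> Tuple[str, str]:
    """Split 'Artist - Title (Karaoke Version)' into (artist, title)."""
    if separator not in video_title:
        # No separator: treat the whole string as the title, artist unknown
        return "", video_title.strip()
    artist, title = (part.strip() for part in video_title.split(separator, 1))
    for suffix in suffixes:
        if title.endswith(suffix):
            title = title[:-len(suffix)].strip()
            break
    return artist, title

print(split_artist_title("Queen - Bohemian Rhapsody (Karaoke Version)"))
# → ('Queen', 'Bohemian Rhapsody')
```

Splitting with `maxsplit=1` is what keeps dashes inside the song title intact; only the first separator is treated as the artist/title boundary.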
@@ -1,27 +1,117 @@
#!/usr/bin/env python3
"""
Karaoke Video Downloader CLI
Command-line interface for the karaoke video downloader.
"""

import argparse
import json
import os
import sys

from pathlib import Path
from typing import List

from karaoke_downloader.channel_parser import ChannelParser
from karaoke_downloader.config_manager import AppConfig
from karaoke_downloader.data_path_manager import get_data_path_manager
from karaoke_downloader.downloader import KaraokeDownloader

# Constants
DEFAULT_LATEST_PER_CHANNEL_LIMIT = 10
DEFAULT_FUZZY_THRESHOLD = 85
DEFAULT_LATEST_PER_CHANNEL_LIMIT = 5
DEFAULT_DISPLAY_LIMIT = 10
DEFAULT_CACHE_DURATION_HOURS = 24

def load_channels_from_json(channels_file: str = None) -> List[str]:
    """
    Load channel URLs from the new JSON format.

    Args:
        channels_file: Path to the channels.json file (if None, uses default from config)

    Returns:
        List of channel URLs
    """
    if channels_file is None:
        channels_file = str(get_data_path_manager().get_channels_json_path())

    try:
        parser = ChannelParser(channels_file)
        channels = parser.channels_config.get("channels", [])
        return [channel["url"] for channel in channels]
    except Exception as e:
        print(f"❌ Error loading channels from {channels_file}: {e}")
        return []

def load_channels_from_text(channels_file: str = None) -> List[str]:
    """
    Load channel URLs from the old text format (for backward compatibility).

    Args:
        channels_file: Path to the channels.txt file (if None, uses default from config)

    Returns:
        List of channel URLs
    """
    if channels_file is None:
        channels_file = str(get_data_path_manager().get_channels_txt_path())

    try:
        with open(channels_file, "r", encoding="utf-8") as f:
            return [
                line.strip()
                for line in f
                if line.strip() and not line.strip().startswith("#")
            ]
    except Exception as e:
        print(f"❌ Error loading channels from {channels_file}: {e}")
        return []

def load_channels(channel_file: str = None) -> List[str]:
    """Load channel URLs from file."""
    if channel_file is None:
        # Use JSON configuration
        data_path_manager = get_data_path_manager()
        if data_path_manager.file_exists("channels.json"):
            return load_channels_from_json()
        else:
            return []
    else:
        if channel_file.endswith(".json"):
            return load_channels_from_json(channel_file)
        else:
            return load_channels_from_text(channel_file)

def get_channel_url_by_name(channel_name: str) -> str:
    """Look up a channel URL by its name from the channels configuration."""
    channel_urls = load_channels()

    # Normalize the channel name for comparison
    normalized_name = channel_name.lower().replace("@", "").replace("karaoke", "").strip()

    for url in channel_urls:
        # Extract channel name from URL
        if "/@" in url:
            url_channel_name = url.split("/@")[1].split("/")[0].lower()
            if url_channel_name == normalized_name or url_channel_name.replace("karaoke", "").strip() == normalized_name:
                return url

    return None

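The lookup above hinges on a simple normalization — lower-case, drop `@` and the word `karaoke` — so that, for example, `SingKing` matches the `SingKingKaraoke` handle embedded in a URL. A standalone sketch of that matching rule (the URL below is illustrative, not taken from the shipped channel list):

```python
from typing import List, Optional

def normalize_channel_name(name: str) -> str:
    """Lower-case and strip '@' and the word 'karaoke' for loose name matching."""
    return name.lower().replace("@", "").replace("karaoke", "").strip()

def find_channel_url(name: str, urls: List[str]) -> Optional[str]:
    target = normalize_channel_name(name)
    for url in urls:
        if "/@" in url:
            # The handle is the path segment right after '/@'
            handle = url.split("/@")[1].split("/")[0].lower()
            if handle == target or handle.replace("karaoke", "").strip() == target:
                return url
    return None

urls = ["https://www.youtube.com/@SingKingKaraoke/videos"]
print(find_channel_url("SingKing", urls))
# → https://www.youtube.com/@SingKingKaraoke/videos
```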
def main():
    parser = argparse.ArgumentParser(
        description="Karaoke Video Downloader - Download YouTube playlists and channel videos for karaoke",
        description="Karaoke Video Downloader - Download YouTube playlists and channel videos for karaoke (default: downloads latest videos from all channels)",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python download_karaoke.py https://www.youtube.com/playlist?list=XYZ
  python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
  python download_karaoke.py --file data/channels.txt
  python download_karaoke.py --limit 10  # Download latest 10 videos from all channels
  python download_karaoke.py --songlist-only --limit 10  # Download only songlist songs across channels
  python download_karaoke.py --channel-focus SingKingKaraoke --limit 5  # Download from specific channel
  python download_karaoke.py --channel-focus SingKingKaraoke --all-videos  # Download ALL videos from channel
  python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos  # Download from specific channel URL
  python download_karaoke.py --file data/channels.txt  # Download from custom channel list
  python download_karaoke.py --reset-channel SingKingKaraoke --delete-files
""",
    )
@@ -92,13 +182,34 @@ Examples:
    parser.add_argument(
        "--songlist-priority",
        action="store_true",
        help="Prioritize downloads based on data/songList.json (default: enabled)",
        help="Prioritize downloads based on songList.json in the data directory (default: enabled)",
    )
    parser.add_argument(
        "--no-songlist-priority",
        action="store_true",
        help="Disable songlist prioritization",
    )
    parser.add_argument(
        "--generate-unmatched-report",
        action="store_true",
        help="Generate a report of songs that couldn't be found in any channel (runs after downloads)",
    )
    parser.add_argument(
        "--show-pagination",
        action="store_true",
        help="Show page-by-page progress when downloading channel video lists (slower but more detailed)",
    )
    parser.add_argument(
        "--parallel-channels",
        action="store_true",
        help="Enable parallel channel scanning for faster channel processing (scans multiple channels simultaneously)",
    )
    parser.add_argument(
        "--channel-workers",
        type=int,
        default=3,
        help="Number of parallel channel scanning workers (default: 3, max: 10)",
    )
    parser.add_argument(
        "--songlist-only",
        action="store_true",
@@ -110,6 +221,16 @@ Examples:
        metavar="PLAYLIST_TITLE",
        help='Focus on specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100")',
    )
    parser.add_argument(
        "--songlist-file",
        metavar="FILE_PATH",
        help="Custom songlist file path to use with --songlist-focus (default: songList.json in the data directory)",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Force download from channels regardless of whether songs are already downloaded, on server, or marked as duplicates",
    )
    parser.add_argument(
        "--songlist-status",
        action="store_true",
@@ -146,7 +267,7 @@ Examples:
    parser.add_argument(
        "--latest-per-channel",
        action="store_true",
        help="Download the latest N videos from each channel (use with --limit)",
        help="Download the latest N videos from each channel (use with --limit) [DEPRECATED: This is now the default behavior]",
    )
    parser.add_argument(
        "--fuzzy-match",
@@ -156,19 +277,50 @@ Examples:
    parser.add_argument(
        "--fuzzy-threshold",
        type=int,
        default=90,
        help="Fuzzy match threshold (0-100, default 90)",
        default=DEFAULT_FUZZY_THRESHOLD,
        help=f"Fuzzy match threshold (0-100, default {DEFAULT_FUZZY_THRESHOLD})",
    )
    parser.add_argument(
        "--parallel",
        action="store_true",
        help="Enable parallel downloads for improved speed",
        help="Enable parallel downloads for improved speed (3-5x faster for large batches, defaults to 3 workers)",
    )
    parser.add_argument(
        "--workers",
        type=int,
        default=3,
        help="Number of parallel download workers (default: 3, max: 10)",
        help="Number of parallel download workers (default: 3, max: 10, only used with --parallel)",
    )
    parser.add_argument(
        "--generate-songlist",
        nargs="+",
        metavar="DIRECTORY",
        help="Generate song list from MP4 files with ID3 tags in specified directories",
    )
    parser.add_argument(
        "--no-append-songlist",
        action="store_true",
        help="Create a new song list instead of appending when using --generate-songlist",
    )
    parser.add_argument(
        "--manual",
        action="store_true",
        help="Download from manual videos collection (manual_videos.json in the data directory)",
    )
    parser.add_argument(
        "--channel-focus",
        type=str,
        help="Download from a specific channel by name (e.g., 'SingKingKaraoke')",
    )
    parser.add_argument(
        "--all-videos",
        action="store_true",
        help="Download all videos from channel (not just songlist matches), skipping existing files",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Build download plan and show what would be downloaded without actually downloading anything",
    )
    args = parser.parse_args()

@@ -177,12 +329,42 @@ Examples:
        print("❌ Error: --workers must be between 1 and 10")
        sys.exit(1)

    yt_dlp_path = Path("downloader/yt-dlp.exe")
    if not yt_dlp_path.exists():
        print("❌ Error: yt-dlp.exe not found in downloader/ directory")
        print("Please ensure yt-dlp.exe is present in the downloader/ folder")
    # Validate channel workers argument
    if args.channel_workers < 1 or args.channel_workers > 10:
        print("❌ Error: --channel-workers must be between 1 and 10")
        sys.exit(1)

    # Load configuration to get platform-aware yt-dlp path
    from karaoke_downloader.config_manager import load_config
    config = load_config()
    yt_dlp_path = config.yt_dlp_path

    # Check if it's a command string (like "python3 -m yt_dlp") or a file path
    if yt_dlp_path.startswith(('python', 'python3')):
        # It's a command string, test if it works
        try:
            import subprocess
            cmd = yt_dlp_path.split() + ["--version"]
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            if result.returncode != 0:
                raise Exception(f"Command failed: {result.stderr}")
        except Exception as e:
            platform_name = "macOS" if sys.platform == "darwin" else "Windows"
            print(f"❌ Error: yt-dlp command failed: {yt_dlp_path}")
            print(f"Please ensure yt-dlp is properly installed for {platform_name}")
            print(f"Error: {e}")
            sys.exit(1)
    else:
        # It's a file path, check if it exists
        yt_dlp_file = Path(yt_dlp_path)
        if not yt_dlp_file.exists():
            platform_name = "macOS" if sys.platform == "darwin" else "Windows"
            binary_name = yt_dlp_file.name
            print(f"❌ Error: {binary_name} not found in downloader/ directory")
            print(f"Please ensure {binary_name} is present in the downloader/ folder for {platform_name}")
            print(f"Expected path: {yt_dlp_file}")
            sys.exit(1)

    downloader = KaraokeDownloader()
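The command-string branch above boils down to a `--version` probe: run the configured command with `--version` appended and treat a zero exit code as success. As a sketch, the same check isolated into a helper (a simplification of the inline logic, not part of the project's API):

```python
import subprocess

def yt_dlp_works(command: str) -> bool:
    """Probe a yt-dlp command string (e.g. 'python3 -m yt_dlp') by running '--version'."""
    try:
        result = subprocess.run(
            command.split() + ["--version"],
            capture_output=True, text=True, timeout=10,
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Missing interpreter/binary or a hung process both count as failure
        return False
```

Catching `OSError` covers `FileNotFoundError` for a missing interpreter, and the timeout keeps a hung probe from blocking startup.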

    # Set parallel download options
@@ -210,9 +392,19 @@ Examples:
    if args.songlist_focus:
        downloader.songlist_focus_titles = args.songlist_focus
        downloader.songlist_only = True  # Enable songlist-only mode when focusing
        args.songlist_only = True  # Also set the args flag to ensure CLI logic works
        print(
            f"🎯 Songlist focus mode enabled for playlists: {', '.join(args.songlist_focus)}"
        )
    if args.songlist_file:
        downloader.songlist_file_path = args.songlist_file
        print(f"📁 Using custom songlist file: {args.songlist_file}")
    if args.force:
        downloader.force_download = True
        print("💪 Force mode enabled - will download regardless of existing files or server duplicates")
    if args.dry_run:
        downloader.dry_run = True
        print("🔍 Dry run mode enabled - will show download plan without downloading")
    if args.resolution != "720p":
        downloader.config_manager.update_resolution(args.resolution)

@@ -226,17 +418,16 @@ Examples:
        sys.exit(0)
    # --- END NEW ---

    # --- NEW: If no URL or file is provided, but --songlist-only is set, use all channels in data/channels.txt ---
    if args.songlist_only and not args.url and not args.file:
        channels_file = Path("data/channels.txt")
        if channels_file.exists():
            args.file = str(channels_file)
    # --- NEW: If no URL or file is provided, but --songlist-only is set, use all channels ---
    if (args.songlist_only or args.songlist_focus) and not args.url and not args.file:
        channel_urls = load_channels()
        if channel_urls:
            print(
                "📋 No URL or --file provided, defaulting to all channels in data/channels.txt for songlist-only mode."
                "📋 No URL or --file provided, defaulting to all configured channels for songlist mode."
            )
        else:
            print(
                "❌ No URL, --file, or data/channels.txt found. Please provide a channel URL or a file with channel URLs."
                "❌ No URL, --file, or channel configuration found. Please provide a channel URL or create channels.json in the data directory."
            )
            sys.exit(1)
    # --- END NEW ---
@@ -256,6 +447,22 @@ Examples:
        print("ℹ️ Songs will be re-checked against the server on next run.")
        sys.exit(0)

    if args.generate_songlist:
        from karaoke_downloader.songlist_generator import SongListGenerator

        print("🎵 Generating song list from MP4 files with ID3 tags...")
        generator = SongListGenerator()
        try:
            generator.generate_songlist_from_multiple_directories(
                args.generate_songlist,
                append=not args.no_append_songlist
            )
            print("✅ Song list generation completed successfully!")
        except Exception as e:
            print(f"❌ Error generating song list: {e}")
            sys.exit(1)
        sys.exit(0)

    if args.status:
        stats = downloader.tracker.get_statistics()
        print("🎤 Karaoke Downloader Status")
@@ -273,9 +480,10 @@ Examples:
        print("💾 Channel Cache Information")
        print("=" * 40)
        print(f"Total Channels: {cache_info['total_channels']}")
        print(f"Total Cached Videos: {cache_info['total_cached_videos']}")
        print(f"Cache Duration: {cache_info['cache_duration_hours']} hours")
        print(f"Last Updated: {cache_info['last_updated']}")
        print(f"Total Cached Videos: {cache_info['total_videos']}")
        print("\n📋 Channel Details:")
        for channel in cache_info['channels']:
            print(f"  • {channel['channel']}: {channel['videos']} videos (updated: {channel['last_updated']})")
        sys.exit(0)
    elif args.clear_cache:
        if args.clear_cache == "all":
@@ -315,47 +523,77 @@ Examples:
            if len(tracking) > 10:
                print(f"  ... and {len(tracking) - 10} more")
        sys.exit(0)
    elif args.songlist_only or args.songlist_focus:
        # Use provided file or default to data/channels.txt
        channel_file = args.file if args.file else "data/channels.txt"
        if not os.path.exists(channel_file):
            print(f"❌ Channel file not found: {channel_file}")
    elif args.manual:
        # Download from manual videos collection
        print("🎤 Downloading from manual videos collection...")
        success = downloader.download_channel_videos(
            "manual://static",
            force_refresh=args.refresh,
            fuzzy_match=args.fuzzy_match,
            fuzzy_threshold=args.fuzzy_threshold,
            force_download=args.force,
        )
    elif args.channel_focus:
        # Download from a specific channel by name
        print(f"🎤 Looking up channel: {args.channel_focus}")
        channel_url = get_channel_url_by_name(args.channel_focus)

        if not channel_url:
            print(f"❌ Channel '{args.channel_focus}' not found in configuration")
            print("Available channels:")
            channel_urls = load_channels()
            for url in channel_urls:
                if "/@" in url:
                    channel_name = url.split("/@")[1].split("/")[0]
                    print(f"  • {channel_name}")
            sys.exit(1)

        if args.all_videos:
            # Download ALL videos from the channel (not just songlist matches)
            print(f"🎤 Downloading ALL videos from channel: {args.channel_focus} ({channel_url})")
            success = downloader.download_all_channel_videos(
                channel_url,
                force_refresh=args.refresh,
                force_download=args.force,
                limit=args.limit,
                dry_run=args.dry_run,
            )
        else:
            # Download only songlist matches from the channel
            print(f"🎤 Downloading from channel: {args.channel_focus} ({channel_url})")
            success = downloader.download_channel_videos(
                channel_url,
                force_refresh=args.refresh,
                fuzzy_match=args.fuzzy_match,
                fuzzy_threshold=args.fuzzy_threshold,
                force_download=args.force,
                dry_run=args.dry_run,
            )
    elif args.songlist_only or args.songlist_focus:
        # Use provided file or default to channels configuration
        channel_urls = load_channels(args.file)
        if not channel_urls:
            print(f"❌ No channels found in configuration")
            sys.exit(1)
        with open(channel_file, "r", encoding="utf-8") as f:
            channel_urls = [
                line.strip()
                for line in f
                if line.strip() and not line.strip().startswith("#")
            ]
        limit = args.limit if args.limit else None
        force_refresh_download_plan = (
            args.force_download_plan if hasattr(args, "force_download_plan") else False
        )
        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
        fuzzy_threshold = (
            args.fuzzy_threshold
            if hasattr(args, "fuzzy_threshold")
            else DEFAULT_FUZZY_THRESHOLD
        )
        success = downloader.download_songlist_across_channels(
            channel_urls,
            limit=limit,
            force_refresh_download_plan=force_refresh_download_plan,
            fuzzy_match=fuzzy_match,
            fuzzy_threshold=fuzzy_threshold,
            limit=args.limit,
            force_refresh_download_plan=args.force_download_plan if hasattr(args, "force_download_plan") else False,
            fuzzy_match=args.fuzzy_match,
            fuzzy_threshold=args.fuzzy_threshold,
            force_download=args.force,
            show_pagination=args.show_pagination,
            parallel_channels=args.parallel_channels,
            max_channel_workers=args.channel_workers,
            dry_run=args.dry_run,
        )
    elif args.latest_per_channel:
        # Use provided file or default to data/channels.txt
        channel_file = args.file if args.file else "data/channels.txt"
        if not os.path.exists(channel_file):
            print(f"❌ Channel file not found: {channel_file}")
        # Use provided file or default to channels configuration
        channel_urls = load_channels(args.file)
        if not channel_urls:
            print(f"❌ No channels found in configuration")
            sys.exit(1)
        with open(channel_file, "r", encoding="utf-8") as f:
            channel_urls = [
                line.strip()
                for line in f
                if line.strip() and not line.strip().startswith("#")
            ]
        limit = args.limit if args.limit else DEFAULT_LATEST_PER_CHANNEL_LIMIT
        force_refresh_download_plan = (
            args.force_download_plan if hasattr(args, "force_download_plan") else False
@@ -372,14 +610,156 @@ Examples:
            force_refresh_download_plan=force_refresh_download_plan,
            fuzzy_match=fuzzy_match,
            fuzzy_threshold=fuzzy_threshold,
            force_download=args.force,
            dry_run=args.dry_run,
        )
    elif args.url:
        success = downloader.download_channel_videos(
            args.url, force_refresh=args.refresh
            args.url, force_refresh=args.refresh, dry_run=args.dry_run
        )
    else:
        parser.print_help()
        sys.exit(1)
        # Default behavior: download from channels (equivalent to --latest-per-channel)
        print("🎯 No specific mode specified, defaulting to download from channels")
        channel_urls = load_channels(args.file)
        if not channel_urls:
            print(f"❌ No channels found in configuration")
            print("Please provide a channel URL or create channels.json in the data directory")
            sys.exit(1)
        limit = args.limit if args.limit else DEFAULT_LATEST_PER_CHANNEL_LIMIT
        force_refresh_download_plan = (
            args.force_download_plan if hasattr(args, "force_download_plan") else False
        )
        fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
        fuzzy_threshold = (
            args.fuzzy_threshold
            if hasattr(args, "fuzzy_threshold")
            else DEFAULT_FUZZY_THRESHOLD
        )
        success = downloader.download_latest_per_channel(
            channel_urls,
            limit=limit,
            force_refresh_download_plan=force_refresh_download_plan,
            fuzzy_match=fuzzy_match,
            fuzzy_threshold=fuzzy_threshold,
            force_download=args.force,
            dry_run=args.dry_run,
        )

    # Generate unmatched report if requested (additive feature)
    if args.generate_unmatched_report:
        from karaoke_downloader.download_planner import generate_unmatched_report, build_download_plan
        from karaoke_downloader.songlist_manager import load_songlist

        print("\n🔍 Generating unmatched songs report...")

        # Load songlist based on focus mode
        if args.songlist_focus:
            # Load focused playlists
            songlist_file_path = args.songlist_file if args.songlist_file else str(get_data_path_manager().get_songlist_path())
            songlist_file = Path(songlist_file_path)
            if not songlist_file.exists():
                print(f"⚠️ Songlist file not found: {songlist_file_path}")
            else:
                try:
                    with open(songlist_file, "r", encoding="utf-8") as f:
                        raw_data = json.load(f)

                    # Filter playlists by title
                    focused_playlists = []
                    for playlist in raw_data:
                        playlist_title = playlist.get("title", "")
                        if playlist_title in args.songlist_focus:
                            focused_playlists.append(playlist)

                    if focused_playlists:
                        # Flatten the focused playlists into songs
                        focused_songs = []
                        seen = set()
                        for playlist in focused_playlists:
                            if "songs" in playlist:
                                for song in playlist["songs"]:
                                    if "artist" in song and "title" in song:
                                        artist = song["artist"].strip()
                                        title = song["title"].strip()
                                        key = f"{artist.lower()}_{title.lower()}"
                                        if key in seen:
                                            continue
                                        seen.add(key)
                                        focused_songs.append(
                                            {
                                                "artist": artist,
                                                "title": title,
                                                "position": song.get("position", 0),
                                            }
                                        )

                        songlist = focused_songs
                    else:
                        print(f"⚠️ No playlists found matching: {', '.join(args.songlist_focus)}")
                        songlist = []

                except (json.JSONDecodeError, FileNotFoundError) as e:
                    print(f"⚠️ Could not load songlist for report: {e}")
                    songlist = []
        else:
            # Load all songs from songlist
            songlist_path = args.songlist_file if args.songlist_file else str(get_data_path_manager().get_songlist_path())
            songlist = load_songlist(songlist_path)

        if songlist:
            # Load channel URLs
            channel_file = args.file if args.file else str(get_data_path_manager().get_channels_txt_path())
            if os.path.exists(channel_file):
                with open(channel_file, "r", encoding="utf-8") as f:
                    channel_urls = [
                        line.strip()
                        for line in f
                        if line.strip() and not line.strip().startswith("#")
                    ]

                print(f"📋 Analyzing {len(songlist)} songs against {len(channel_urls)} channels...")

                # Build download plan to get unmatched songs
                fuzzy_match = args.fuzzy_match if hasattr(args, "fuzzy_match") else False
                fuzzy_threshold = (
                    args.fuzzy_threshold
                    if hasattr(args, "fuzzy_threshold")
                    else DEFAULT_FUZZY_THRESHOLD
                )

                try:
                    download_plan, unmatched = build_download_plan(
                        channel_urls,
                        songlist,
                        downloader.tracker,
                        downloader.yt_dlp_path,
                        fuzzy_match=fuzzy_match,
                        fuzzy_threshold=fuzzy_threshold,
                    )

                    if unmatched:
                        report_file = generate_unmatched_report(unmatched)
                        print(f"\n📋 Unmatched songs report generated successfully!")
                        print(f"📁 Report saved to: {report_file}")
                        print(f"📊 Summary: {len(download_plan)} songs found, {len(unmatched)} songs not found")
                        print(f"\n🔍 First 10 unmatched songs:")
                        for i, song in enumerate(unmatched[:10], 1):
                            print(f"  {i:2d}. {song['artist']} - {song['title']}")
                        if len(unmatched) > 10:
                            print(f"  ... and {len(unmatched) - 10} more songs")
                    else:
                        print(f"\n✅ All {len(songlist)} songs were found in the channels!")

                except Exception as e:
                    print(f"❌ Error generating report: {e}")
            else:
                print(f"❌ Channel file not found: {channel_file}")
        else:
            print("❌ No songlist available for report generation")

    # Initialize success only for code paths that never set it
    if "success" not in locals():
        success = False

    downloader.tracker.force_save()
    if success:
        print("\n🎤 All downloads completed successfully!")

@@ -4,6 +4,8 @@ Provides centralized configuration loading, validation, and management.
"""

import json
import platform
import sys
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path

@@ -34,6 +36,7 @@ DEFAULT_CONFIG = {
    "folder_structure": {
        "downloads_dir": "downloads",
        "logs_dir": "logs",
        "data_dir": "data",
        "tracking_file": "data/karaoke_tracking.json",
    },
    "logging": {

@@ -42,6 +45,13 @@ DEFAULT_CONFIG = {
        "include_console": True,
        "include_file": True,
    },
    "platform_settings": {
        "auto_detect_platform": True,
        "yt_dlp_paths": {
            "windows": "downloader/yt-dlp.exe",
            "macos": "downloader/yt-dlp_macos"
        }
    },
    "yt_dlp_path": "downloader/yt-dlp.exe",
}

@@ -55,6 +65,23 @@ RESOLUTION_MAP = {
}


def detect_platform() -> str:
    """Detect the current platform and return platform name."""
    system = platform.system().lower()
    if system == "windows":
        return "windows"
    elif system == "darwin":
        return "macos"
    else:
        return "windows"  # Default to Windows for other platforms


def get_platform_yt_dlp_path(platform_paths: Dict[str, str]) -> str:
    """Get the appropriate yt-dlp path for the current platform."""
    platform_name = detect_platform()
    return platform_paths.get(platform_name, platform_paths.get("windows", "downloader/yt-dlp.exe"))

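The two helpers above are small enough to sanity-check standalone; a minimal sketch (the functions copied as they appear in this diff, with `Dict` imported locally so the snippet is self-contained):

```python
import platform
from typing import Dict


def detect_platform() -> str:
    """Map platform.system() onto the two names used in platform_settings."""
    system = platform.system().lower()
    if system == "windows":
        return "windows"
    elif system == "darwin":
        return "macos"
    return "windows"  # Default to Windows for other platforms


def get_platform_yt_dlp_path(platform_paths: Dict[str, str]) -> str:
    """Pick the binary path for the detected platform, falling back to Windows."""
    platform_name = detect_platform()
    return platform_paths.get(platform_name, platform_paths.get("windows", "downloader/yt-dlp.exe"))


paths = {"windows": "downloader/yt-dlp.exe", "macos": "downloader/yt-dlp_macos"}
print(get_platform_yt_dlp_path(paths))   # one of the two entries above
print(get_platform_yt_dlp_path({}))      # no entries -> hard-coded Windows default
```

Note the double fallback: an unknown platform resolves to the `"windows"` entry, and a missing `"windows"` entry resolves to the literal default path.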
@dataclass
class DownloadSettings:
    """Configuration for download settings."""

@@ -109,6 +136,7 @@ class FolderStructure:

    downloads_dir: str = "downloads"
    logs_dir: str = "logs"
    data_dir: str = "data"
    tracking_file: str = "data/karaoke_tracking.json"


@@ -139,14 +167,21 @@ class ConfigManager:
    Manages application configuration with loading, validation, and caching.
    """

    def __init__(self, config_file: Union[str, Path] = "data/config.json"):
    def __init__(self, config_file: Union[str, Path] = "config/config.json", data_dir: Optional[str] = None):
        """
        Initialize the configuration manager.

        Args:
            config_file: Path to the configuration file
            data_dir: Optional custom data directory path
        """
        self.config_file = Path(config_file)
        # If config_file is relative and data_dir is provided, make it relative to data_dir
        if data_dir and not Path(config_file).is_absolute():
            self.config_file = Path(data_dir) / config_file
        else:
            self.config_file = Path(config_file)

        self._data_dir = data_dir
        self._config: Optional[AppConfig] = None
        self._last_modified: Optional[datetime] = None

@@ -234,11 +269,21 @@ class ConfigManager:
        folder_structure = FolderStructure(**config_data.get("folder_structure", {}))
        logging_config = LoggingConfig(**config_data.get("logging", {}))

        # Handle platform-specific yt-dlp path
        yt_dlp_path = config_data.get("yt_dlp_path", "downloader/yt-dlp.exe")

        # Check if platform auto-detection is enabled
        platform_settings = config_data.get("platform_settings", {})
        if platform_settings.get("auto_detect_platform", True):
            platform_paths = platform_settings.get("yt_dlp_paths", {})
            if platform_paths:
                yt_dlp_path = get_platform_yt_dlp_path(platform_paths)

        return AppConfig(
            download_settings=download_settings,
            folder_structure=folder_structure,
            logging=logging_config,
            yt_dlp_path=config_data.get("yt_dlp_path", "downloader/yt-dlp.exe"),
            yt_dlp_path=yt_dlp_path,
            _config_file=self.config_file,
        )

@@ -297,27 +342,35 @@ class ConfigManager:
_config_manager: Optional[ConfigManager] = None


def get_config_manager() -> ConfigManager:
def get_config_manager(config_file: Optional[Union[str, Path]] = None, data_dir: Optional[str] = None) -> ConfigManager:
    """
    Get the global configuration manager instance.

    Args:
        config_file: Optional path to config file (default: "config.json" in root)
        data_dir: Optional custom data directory path

    Returns:
        ConfigManager instance
    """
    global _config_manager
    if _config_manager is None:
        _config_manager = ConfigManager()
    if _config_manager is None or config_file is not None or data_dir is not None:
        if config_file is None:
            config_file = "config/config.json"
        _config_manager = ConfigManager(config_file, data_dir)
    return _config_manager


def load_config(force_reload: bool = False) -> AppConfig:
def load_config(force_reload: bool = False, config_file: Optional[Union[str, Path]] = None, data_dir: Optional[str] = None) -> AppConfig:
    """
    Load configuration using the global manager.

    Args:
        force_reload: Force reload even if file hasn't changed
        config_file: Optional path to config file (default: "config.json" in root)
        data_dir: Optional custom data directory path

    Returns:
        AppConfig instance
    """
    return get_config_manager().load_config(force_reload)
    return get_config_manager(config_file, data_dir).load_config(force_reload)

184 karaoke_downloader/data_path_manager.py Normal file
@@ -0,0 +1,184 @@
"""
|
||||
Data path management utilities for the karaoke downloader.
|
||||
Provides centralized data directory path management and file path resolution.
|
||||
"""
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from .config_manager import get_config_manager
|
||||
|
||||
|
||||
class DataPathManager:
|
||||
"""
|
||||
Manages data directory paths and provides utilities for resolving file paths
|
||||
relative to the configured data directory.
|
||||
"""
|
||||
|
||||
def __init__(self, data_dir: Optional[str] = None):
|
||||
"""
|
||||
Initialize the data path manager.
|
||||
|
||||
Args:
|
||||
data_dir: Optional custom data directory path. If None, uses config.
|
||||
"""
|
||||
self._data_dir = data_dir
|
||||
|
||||
# If a custom data directory is provided, look for config.json in that directory
|
||||
if data_dir:
|
||||
config_file = Path(data_dir) / "config.json"
|
||||
self._config_manager = get_config_manager(str(config_file))
|
||||
else:
|
||||
# Otherwise, use the default config.json in the root directory
|
||||
self._config_manager = get_config_manager()
|
||||
|
||||
@property
|
||||
def data_dir(self) -> Path:
|
||||
"""
|
||||
Get the configured data directory path.
|
||||
|
||||
Returns:
|
||||
Path to the data directory
|
||||
"""
|
||||
if self._data_dir:
|
||||
return Path(self._data_dir)
|
||||
|
||||
# Get from config
|
||||
config = self._config_manager.get_config()
|
||||
data_dir = getattr(config.folder_structure, 'data_dir', 'data')
|
||||
return Path(data_dir)
|
||||
|
||||
def get_path(self, filename: str) -> Path:
|
||||
"""
|
||||
Get the full path to a file in the data directory.
|
||||
|
||||
Args:
|
||||
filename: Name of the file (e.g., 'config.json', 'channels.json')
|
||||
|
||||
Returns:
|
||||
Full path to the file
|
||||
"""
|
||||
return self.data_dir / filename
|
||||
|
||||
def get_channels_json_path(self) -> Path:
|
||||
"""Get path to channels.json file."""
|
||||
return self.get_path('channels.json')
|
||||
|
||||
def get_channels_txt_path(self) -> Path:
|
||||
"""Get path to channels.txt file."""
|
||||
return self.get_path('channels.txt')
|
||||
|
||||
def get_songlist_path(self) -> Path:
|
||||
"""Get path to songList.json file."""
|
||||
return self.get_path('songList.json')
|
||||
|
||||
def get_songlist_tracking_path(self) -> Path:
|
||||
"""Get path to songlist_tracking.json file."""
|
||||
return self.get_path('songlist_tracking.json')
|
||||
|
||||
def get_karaoke_tracking_path(self) -> Path:
|
||||
"""Get path to karaoke_tracking.json file."""
|
||||
return self.get_path('karaoke_tracking.json')
|
||||
|
||||
def get_server_duplicates_tracking_path(self) -> Path:
|
||||
"""Get path to server_duplicates_tracking.json file."""
|
||||
return self.get_path('server_duplicates_tracking.json')
|
||||
|
||||
def get_manual_videos_path(self) -> Path:
|
||||
"""Get path to manual_videos.json file."""
|
||||
return self.get_path('manual_videos.json')
|
||||
|
||||
def get_songs_path(self) -> Path:
|
||||
"""Get path to songs.json file."""
|
||||
return self.get_path('songs.json')
|
||||
|
||||
def get_channel_cache_dir(self) -> Path:
|
||||
"""Get path to channel_cache directory."""
|
||||
return self.get_path('channel_cache')
|
||||
|
||||
def get_channel_cache_path(self, channel_id: str) -> Path:
|
||||
"""Get path to a specific channel cache file."""
|
||||
return self.get_channel_cache_dir() / f"{channel_id}.json"
|
||||
|
||||
def get_download_plan_cache_path(self, plan_name: str, **kwargs) -> Path:
|
||||
"""Get path to download plan cache file."""
|
||||
# Create a hash from kwargs for unique cache files
|
||||
import hashlib
|
||||
if kwargs:
|
||||
kwargs_str = str(sorted(kwargs.items()))
|
||||
hash_suffix = hashlib.md5(kwargs_str.encode()).hexdigest()[:8]
|
||||
plan_name = f"{plan_name}_{hash_suffix}"
|
||||
return self.get_path(f"plan_latest_per_channel_{plan_name}.json")
|
||||
|
||||
def get_unmatched_report_path(self, timestamp: Optional[str] = None) -> Path:
|
||||
"""Get path to unmatched songs report file."""
|
||||
if timestamp:
|
||||
return self.get_path(f"unmatched_songs_report_{timestamp}.json")
|
||||
return self.get_path("unmatched_songs_report.json")
|
||||
|
||||
def ensure_data_dir_exists(self) -> None:
|
||||
"""Ensure the data directory exists."""
|
||||
self.data_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def list_data_files(self) -> list:
|
||||
"""List all files in the data directory."""
|
||||
if not self.data_dir.exists():
|
||||
return []
|
||||
|
||||
files = []
|
||||
for file_path in self.data_dir.iterdir():
|
||||
if file_path.is_file():
|
||||
files.append(file_path.name)
|
||||
return sorted(files)
|
||||
|
||||
def file_exists(self, filename: str) -> bool:
|
||||
"""Check if a file exists in the data directory."""
|
||||
return self.get_path(filename).exists()
|
||||
|
||||
|
||||
# Global data path manager instance
|
||||
_data_path_manager: Optional[DataPathManager] = None
|
||||
|
||||
|
||||
def get_data_path_manager(data_dir: Optional[str] = None) -> DataPathManager:
|
||||
"""
|
||||
Get the global data path manager instance.
|
||||
|
||||
Args:
|
||||
data_dir: Optional custom data directory path
|
||||
|
||||
Returns:
|
||||
DataPathManager instance
|
||||
"""
|
||||
global _data_path_manager
|
||||
if _data_path_manager is None or data_dir is not None:
|
||||
_data_path_manager = DataPathManager(data_dir)
|
||||
return _data_path_manager
|
||||
|
||||
|
||||
def get_data_path(filename: str, data_dir: Optional[str] = None) -> Path:
|
||||
"""
|
||||
Get the full path to a file in the data directory.
|
||||
|
||||
Args:
|
||||
filename: Name of the file
|
||||
data_dir: Optional custom data directory path
|
||||
|
||||
Returns:
|
||||
Full path to the file
|
||||
"""
|
||||
return get_data_path_manager(data_dir).get_path(filename)
|
||||
|
||||
|
||||
def get_data_dir(data_dir: Optional[str] = None) -> Path:
|
||||
"""
|
||||
Get the configured data directory path.
|
||||
|
||||
Args:
|
||||
data_dir: Optional custom data directory path
|
||||
|
||||
Returns:
|
||||
Path to the data directory
|
||||
"""
|
||||
return get_data_path_manager(data_dir).data_dir
|
||||
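With an explicit `data_dir`, the manager never has to consult the config, so the path resolution above can be sketched without any config file present. A minimal stand-in (the class name and the hard-coded `data_dir` are illustrative, not part of the real module):

```python
from pathlib import Path


class DataPathSketch:
    """Minimal stand-in for DataPathManager with a fixed data_dir (no config lookup)."""

    def __init__(self, data_dir: str):
        self.data_dir = Path(data_dir)

    def get_path(self, filename: str) -> Path:
        # All data files live directly under the data directory
        return self.data_dir / filename

    def get_channel_cache_path(self, channel_id: str) -> Path:
        # Per-channel caches live one level deeper, in data/channel_cache/
        return self.get_path("channel_cache") / f"{channel_id}.json"


mgr = DataPathSketch("data")
print(mgr.get_path("channels.json"))        # e.g. data/channels.json on POSIX
print(mgr.get_channel_cache_path("UC123"))  # e.g. data/channel_cache/UC123.json
```

The real class additionally falls back to `folder_structure.data_dir` from the loaded config when no `data_dir` argument is given.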
@@ -20,6 +20,12 @@ from karaoke_downloader.youtube_utils import (
    execute_yt_dlp_command,
    show_available_formats,
)
from karaoke_downloader.file_utils import (
    cleanup_temp_files,
    get_unique_filename,
    is_valid_mp4_file,
    sanitize_filename,
)


class DownloadPipeline:

@@ -63,9 +69,15 @@ class DownloadPipeline:
            True if successful, False otherwise
        """
        try:
            # Step 1: Prepare file path
            filename = sanitize_filename(artist, title)
            output_path = self.downloads_dir / channel_name / filename
            # Step 1: Prepare file path and check for existing files
            output_path, file_exists = get_unique_filename(self.downloads_dir, channel_name, artist, title)

            if file_exists:
                print(f"⏭️ Skipping download - file already exists: {output_path.name}")
                # Still add tags and track the existing file
                if self._add_tags(output_path, artist, title, channel_name):
                    self._track_download(output_path, artist, title, video_id, channel_name)
                return True

            # Step 2: Download video
            if not self._download_video(video_id, output_path, artist, title, channel_name):

@@ -214,8 +226,10 @@ class DownloadPipeline:
    ) -> bool:
        """Step 3: Add ID3 tags to the downloaded file."""
        try:
            # Use the same artist/title as the filename for consistency
            # Don't add "(Karaoke Version)" to the ID3 tag title
            add_id3_tags(
                output_path, f"{artist} - {title} (Karaoke Version)", channel_name
                output_path, f"{artist} - {title}", channel_name
            )
            print(f"🏷️ Added ID3 tags: {artist} - {title}")
            return True

@@ -283,9 +297,10 @@ class DownloadPipeline:
            video_title = video.get("title", "")

            # Extract artist and title from video title
            from karaoke_downloader.id3_utils import extract_artist_title
            from karaoke_downloader.channel_parser import ChannelParser

            artist, title = extract_artist_title(video_title)
            channel_parser = ChannelParser()
            artist, title = channel_parser.extract_artist_title(video_title, channel_name)

            print(f" ({i}/{total}) Processing: {artist} - {title}")

@@ -3,19 +3,31 @@ Download plan building utilities.
Handles pre-scanning channels and building download plans.
"""

import concurrent.futures
import hashlib
import json
import sys
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

from karaoke_downloader.cache_manager import (
    delete_plan_cache,
    get_download_plan_cache_file,
    load_cached_plan,
    save_plan_cache,
)
# Import all fuzzy matching functions
from karaoke_downloader.fuzzy_matcher import (
    create_song_key,
    extract_artist_title,
    create_video_key,
    get_similarity_function,
    is_exact_match,
    is_fuzzy_match,
    normalize_title,
)
from karaoke_downloader.channel_parser import ChannelParser
from karaoke_downloader.data_path_manager import get_data_path_manager
from karaoke_downloader.youtube_utils import get_channel_info

# Constants
@@ -23,6 +35,156 @@ DEFAULT_FILENAME_LENGTH_LIMIT = 100
DEFAULT_ARTIST_LENGTH_LIMIT = 30
DEFAULT_TITLE_LENGTH_LIMIT = 60
DEFAULT_FUZZY_THRESHOLD = 85
DEFAULT_DISPLAY_LIMIT = 10


def generate_unmatched_report(unmatched: List[Dict[str, Any]], report_path: str = None) -> str:
    """
    Generate a detailed report of unmatched songs and save it to a file.

    Args:
        unmatched: List of unmatched songs from build_download_plan
        report_path: Optional path to save the report (default: data/unmatched_songs_report.json)

    Returns:
        Path to the saved report file
    """
    if report_path is None:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        report_path = str(get_data_path_manager().get_unmatched_report_path(timestamp))

    report_data = {
        "generated_at": datetime.now().isoformat(),
        "total_unmatched": len(unmatched),
        "unmatched_songs": []
    }

    for song in unmatched:
        report_data["unmatched_songs"].append({
            "artist": song["artist"],
            "title": song["title"],
            "position": song.get("position", 0),
            "search_key": create_song_key(song["artist"], song["title"])
        })

    # Sort by artist, then by title for easier reading
    report_data["unmatched_songs"].sort(key=lambda x: (x["artist"].lower(), x["title"].lower()))

    # Ensure the data directory exists
    report_file = Path(report_path)
    report_file.parent.mkdir(parents=True, exist_ok=True)

    # Save the report
    with open(report_file, 'w', encoding='utf-8') as f:
        json.dump(report_data, f, indent=2, ensure_ascii=False)

    return str(report_file)

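The report produced above is plain JSON. A self-contained sketch of the same shape, with `create_song_key` replaced by a trivial lowercase placeholder (the real normalization lives in `fuzzy_matcher` and is not shown in this diff):

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

unmatched = [
    {"artist": "Queen", "title": "Under Pressure", "position": 2},
    {"artist": "ABBA", "title": "Waterloo", "position": 1},
]

report_data = {
    "generated_at": datetime.now().isoformat(),
    "total_unmatched": len(unmatched),
    "unmatched_songs": [
        {
            "artist": s["artist"],
            "title": s["title"],
            "position": s.get("position", 0),
            # Placeholder for create_song_key(artist, title)
            "search_key": f"{s['artist'].lower()} - {s['title'].lower()}",
        }
        for s in unmatched
    ],
}
# Sort by artist, then title, as generate_unmatched_report does
report_data["unmatched_songs"].sort(key=lambda x: (x["artist"].lower(), x["title"].lower()))

report_file = Path(tempfile.mkdtemp()) / "unmatched_songs_report.json"
with open(report_file, "w", encoding="utf-8") as f:
    json.dump(report_data, f, indent=2, ensure_ascii=False)

loaded = json.loads(report_file.read_text(encoding="utf-8"))
print(loaded["unmatched_songs"][0]["artist"])  # -> ABBA (sorted ahead of Queen)
```

Sorting in the report (rather than at display time) keeps diffs between successive report files readable.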
def _scan_channel_for_matches(
    channel_url,
    channel_name,
    channel_id,
    song_keys,
    song_lookup,
    fuzzy_match,
    fuzzy_threshold,
    show_pagination,
    yt_dlp_path,
    tracker,
):
    """
    Scan a single channel for matches (used in parallel processing).

    Args:
        channel_url: URL of the channel to scan
        channel_name: Name of the channel
        channel_id: ID of the channel
        song_keys: Set of song keys to match against
        song_lookup: Dictionary mapping song keys to song data
        fuzzy_match: Whether to use fuzzy matching
        fuzzy_threshold: Threshold for fuzzy matching
        show_pagination: Whether to show pagination progress
        yt_dlp_path: Path to yt-dlp executable
        tracker: Tracking manager instance

    Returns:
        List of video matches found in this channel
    """
    print(f"\n🚦 Scanning channel: {channel_name} ({channel_url})")

    # Get channel info if not provided
    if not channel_name or not channel_id:
        channel_name, channel_id = get_channel_info(channel_url)

    # Fetch video list from channel
    available_videos = tracker.get_channel_video_list(
        channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
    )

    print(f" 📊 Channel has {len(available_videos)} videos to scan")

    video_matches = []

    # Pre-process video titles for efficient matching
    channel_parser = ChannelParser()
    if fuzzy_match:
        # For fuzzy matching, create normalized video keys
        for video in available_videos:
            v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
            video_key = create_song_key(v_artist, v_title)

            # Find best match among remaining songs
            best_match = None
            best_score = 0
            for song_key in song_keys:
                if song_key in song_lookup:  # Only check unmatched songs
                    score = get_similarity_function()(song_key, video_key)
                    if score >= fuzzy_threshold and score > best_score:
                        best_score = score
                        best_match = song_key

            if best_match:
                song = song_lookup[best_match]
                video_matches.append(
                    {
                        "artist": song["artist"],
                        "title": song["title"],
                        "channel_name": channel_name,
                        "channel_url": channel_url,
                        "video_id": video["id"],
                        "video_title": video["title"],
                        "match_score": best_score,
                    }
                )
                # Remove matched song from future consideration
                del song_lookup[best_match]
                song_keys.remove(best_match)
    else:
        # For exact matching, use direct key comparison
        for video in available_videos:
            v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
            video_key = create_song_key(v_artist, v_title)

            if video_key in song_keys:
                song = song_lookup[video_key]
                video_matches.append(
                    {
                        "artist": song["artist"],
                        "title": song["title"],
                        "channel_name": channel_name,
                        "channel_url": channel_url,
                        "video_id": video["id"],
                        "video_title": video["title"],
                        "match_score": 100,
                    }
                )
                # Remove matched song from future consideration
                del song_lookup[video_key]
                song_keys.remove(video_key)

    print(f" ✅ Found {len(video_matches)} matches in {channel_name}")
    return video_matches


def build_download_plan(
@@ -32,6 +194,9 @@ def build_download_plan(
    yt_dlp_path,
    fuzzy_match=False,
    fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
    show_pagination=False,
    parallel_channels=False,
    max_channel_workers=3,
):
    """
    For each song in undownloaded, scan all channels for a match.
@@ -52,85 +217,200 @@ def build_download_plan(
        song_keys.add(key)
        song_lookup[key] = song

    for i, channel_url in enumerate(channel_urls, 1):
        print(f"\n🚦 Starting channel {i}/{len(channel_urls)}: {channel_url}")
        print(f" 🔍 Getting channel info...")
        channel_name, channel_id = get_channel_info(channel_url)
        print(f" ✅ Channel info: {channel_name} (ID: {channel_id})")
        print(f" 🔍 Fetching video list from channel...")
        available_videos = tracker.get_channel_video_list(
            channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False
        )
        print(
            f" 📊 Channel has {len(available_videos)} videos to scan against {len(undownloaded)} songlist songs"
        )
        matches_this_channel = 0
        video_matches = []  # Initialize video_matches for this channel
    if parallel_channels:
        print(f"🚀 Running parallel channel scanning with {max_channel_workers} workers.")

        # Create a thread-safe copy of song data for parallel processing
        import threading
        song_keys_lock = threading.Lock()
        song_lookup_lock = threading.Lock()

        def scan_channel_safe(channel_url):
            """Thread-safe channel scanning function."""
            print(f"\n🚦 Scanning channel: {channel_url}")

            # Get channel info
            channel_name, channel_id = get_channel_info(channel_url)
            print(f" ✅ Channel info: {channel_name} (ID: {channel_id})")

            # Fetch video list from channel
            available_videos = tracker.get_channel_video_list(
                channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
            )
            print(f" 📊 Channel has {len(available_videos)} videos to scan")

            video_matches = []

            # Pre-process video titles for efficient matching
            channel_parser = ChannelParser()
            if fuzzy_match:
                # For fuzzy matching, create normalized video keys
                for video in available_videos:
                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
                    video_key = create_song_key(v_artist, v_title)

        # Pre-process video titles for efficient matching
        if fuzzy_match:
            # For fuzzy matching, create normalized video keys
            for video in available_videos:
                v_artist, v_title = extract_artist_title(video["title"])
                video_key = create_song_key(v_artist, v_title)
                    # Find best match among remaining songs (thread-safe)
                    best_match = None
                    best_score = 0
                    with song_keys_lock:
                        available_song_keys = list(song_keys)  # Copy for iteration

                    for song_key in available_song_keys:
                        with song_lookup_lock:
                            if song_key in song_lookup:  # Only check unmatched songs
                                score = get_similarity_function()(song_key, video_key)
                                if score >= fuzzy_threshold and score > best_score:
                                    best_score = score
                                    best_match = song_key

                # Find best match among remaining songs
                best_match = None
                best_score = 0
                for song_key in song_keys:
                    if song_key in song_lookup:  # Only check unmatched songs
                        score = get_similarity_function()(song_key, video_key)
                        if score >= fuzzy_threshold and score > best_score:
                            best_score = score
                            best_match = song_key
                    if best_match:
                        with song_lookup_lock:
                            if best_match in song_lookup:  # Double-check it's still available
                                song = song_lookup[best_match]
                                video_matches.append(
                                    {
                                        "artist": song["artist"],
                                        "title": song["title"],
                                        "channel_name": channel_name,
                                        "channel_url": channel_url,
                                        "video_id": video["id"],
                                        "video_title": video["title"],
                                        "match_score": best_score,
                                    }
                                )
                                # Remove matched song from future consideration
                                del song_lookup[best_match]
                                with song_keys_lock:
                                    song_keys.discard(best_match)
            else:
                # For exact matching, use direct key comparison
                for video in available_videos:
                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
                    video_key = create_song_key(v_artist, v_title)

                if best_match:
                    song = song_lookup[best_match]
                    video_matches.append(
                        {
                            "artist": song["artist"],
                            "title": song["title"],
                            "channel_name": channel_name,
                            "channel_url": channel_url,
                            "video_id": video["id"],
                            "video_title": video["title"],
                            "match_score": best_score,
                        }
                    )
                    # Remove matched song from future consideration
                    del song_lookup[best_match]
                    song_keys.remove(best_match)
                    matches_this_channel += 1
            else:
                # For exact matching, use direct key comparison
                for video in available_videos:
                    v_artist, v_title = extract_artist_title(video["title"])
                    video_key = create_song_key(v_artist, v_title)
                    with song_lookup_lock:
                        if video_key in song_keys and video_key in song_lookup:
                            song = song_lookup[video_key]
                            video_matches.append(
                                {
                                    "artist": song["artist"],
                                    "title": song["title"],
                                    "channel_name": channel_name,
                                    "channel_url": channel_url,
                                    "video_id": video["id"],
                                    "video_title": video["title"],
                                    "match_score": 100,
                                }
                            )
                            # Remove matched song from future consideration
                            del song_lookup[video_key]
                            with song_keys_lock:
                                song_keys.discard(video_key)

            print(f" ✅ Found {len(video_matches)} matches in {channel_name}")
            return video_matches

        # Execute parallel channel scanning
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_channel_workers) as executor:
            # Submit all channel scanning tasks
            future_to_channel = {
                executor.submit(scan_channel_safe, channel_url): channel_url
                for channel_url in channel_urls
            }

            # Process results as they complete
            for future in concurrent.futures.as_completed(future_to_channel):
                channel_url = future_to_channel[future]
                try:
                    video_matches = future.result()
                    plan.extend(video_matches)
                    channel_name, _ = get_channel_info(channel_url)
                    channel_match_counts[channel_name] = len(video_matches)
                except Exception as e:
                    print(f"⚠️ Error processing channel {channel_url}: {e}")
                    channel_name, _ = get_channel_info(channel_url)
                    channel_match_counts[channel_name] = 0
    else:
        for i, channel_url in enumerate(channel_urls, 1):
            print(f"\n🚦 Starting channel {i}/{len(channel_urls)}: {channel_url}")
            print(f" 🔍 Getting channel info...")
            channel_name, channel_id = get_channel_info(channel_url)
            print(f" ✅ Channel info: {channel_name} (ID: {channel_id})")
            print(f" 🔍 Fetching video list from channel...")
            available_videos = tracker.get_channel_video_list(
                channel_url, yt_dlp_path=str(yt_dlp_path), force_refresh=False, show_pagination=show_pagination
            )
            print(
                f" 📊 Channel has {len(available_videos)} videos to scan against {len(undownloaded)} songlist songs"
            )
            matches_this_channel = 0
            video_matches = []  # Initialize video_matches for this channel

                if video_key in song_keys:
                    song = song_lookup[video_key]
                    video_matches.append(
                        {
                            "artist": song["artist"],
                            "title": song["title"],
                            "channel_name": channel_name,
                            "channel_url": channel_url,
                            "video_id": video["id"],
                            "video_title": video["title"],
                            "match_score": 100,
                        }
                    )
                    # Remove matched song from future consideration
                    del song_lookup[video_key]
                    song_keys.remove(video_key)
                    matches_this_channel += 1
            # Pre-process video titles for efficient matching
            channel_parser = ChannelParser()
            if fuzzy_match:
                # For fuzzy matching, create normalized video keys
                for video in available_videos:
                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
                    video_key = create_song_key(v_artist, v_title)

        # Add matches to plan
        plan.extend(video_matches)
                    # Find best match among remaining songs
                    best_match = None
                    best_score = 0
                    for song_key in song_keys:
                        if song_key in song_lookup:  # Only check unmatched songs
                            score = get_similarity_function()(song_key, video_key)
                            if score >= fuzzy_threshold and score > best_score:
                                best_score = score
                                best_match = song_key

        # Print match count once per channel
        channel_match_counts[channel_name] = matches_this_channel
        print(f" → Found {matches_this_channel} songlist matches in this channel.")
                    if best_match:
                        song = song_lookup[best_match]
                        video_matches.append(
                            {
                                "artist": song["artist"],
                                "title": song["title"],
                                "channel_name": channel_name,
                                "channel_url": channel_url,
                                "video_id": video["id"],
                                "video_title": video["title"],
                                "match_score": best_score,
                            }
                        )
                        # Remove matched song from future consideration
                        del song_lookup[best_match]
                        song_keys.remove(best_match)
                        matches_this_channel += 1
            else:
                # For exact matching, use direct key comparison
                for video in available_videos:
                    v_artist, v_title = channel_parser.extract_artist_title(video["title"], channel_name)
                    video_key = create_song_key(v_artist, v_title)

                    if video_key in song_keys:
                        song = song_lookup[video_key]
                        video_matches.append(
                            {
                                "artist": song["artist"],
                                "title": song["title"],
                                "channel_name": channel_name,
                                "channel_url": channel_url,
                                "video_id": video["id"],
                                "video_title": video["title"],
                                "match_score": 100,
                            }
                        )
                        # Remove matched song from future consideration
                        del song_lookup[video_key]
                        song_keys.remove(video_key)
                        matches_this_channel += 1

            # Add matches to plan
            plan.extend(video_matches)

            # Print match count once per channel
            channel_match_counts[channel_name] = matches_this_channel
            print(f" → Found {matches_this_channel} songlist matches in this channel.")

    # Remaining unmatched songs
    unmatched = list(song_lookup.values())
@@ -143,4 +423,13 @@ def build_download_plan(
        f" TOTAL: {sum(channel_match_counts.values())} matches across {len(channel_match_counts)} channels."
    )

    # Generate unmatched songs report if there are any
    if unmatched:
        try:
            report_file = generate_unmatched_report(unmatched)
            print(f"\n📋 Unmatched songs report saved to: {report_file}")
            print(f"📋 Total unmatched songs: {len(unmatched)}")
        except Exception as e:
            print(f"⚠️ Could not generate unmatched songs report: {e}")

    return plan, unmatched

File diff suppressed because it is too large
@ -34,7 +34,6 @@ def sanitize_filename(
    # Clean up title
    safe_title = (
        title.replace("(From ", "")
        .replace(")", "")
        .replace(" - ", " ")
        .replace(":", "")
    )
@ -54,12 +53,19 @@ def sanitize_filename(
    )
    safe_artist = safe_artist.strip()

-    # Create filename
-    filename = f"{safe_artist} - {safe_title}.mp4"
+    # Create filename - handle empty artist case
+    if not safe_artist or safe_artist.strip() == "":
+        # If no artist, just use the title
+        filename = f"{safe_title}.mp4"
+    else:
+        filename = f"{safe_artist} - {safe_title}.mp4"

    # Limit filename length if needed
    if len(filename) > max_length:
-        filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+        if not safe_artist or safe_artist.strip() == "":
+            filename = f"{safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+        else:
+            filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"

    return filename
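A condensed sketch of the empty-artist filename rule introduced above. The character cleanup is simplified, and the two limits are stand-in values for the `DEFAULT_ARTIST_LENGTH_LIMIT` / `DEFAULT_TITLE_LENGTH_LIMIT` constants defined elsewhere in the module:

```python
ARTIST_LIMIT, TITLE_LIMIT = 50, 80  # assumed stand-ins for the DEFAULT_* constants

def build_filename(artist: str, title: str, max_length: int = 150) -> str:
    """Build '<artist> - <title>.mp4', falling back to '<title>.mp4' when artist is empty."""
    safe_artist = artist.replace("'", "").replace('"', "").strip()
    if not safe_artist:
        filename = f"{title}.mp4"
    else:
        filename = f"{safe_artist} - {title}.mp4"
    # Over-long names get truncated per-field rather than mid-extension
    if len(filename) > max_length:
        if not safe_artist:
            filename = f"{title[:TITLE_LIMIT]}.mp4"
        else:
            filename = f"{safe_artist[:ARTIST_LIMIT]} - {title[:TITLE_LIMIT]}.mp4"
    return filename
```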
@ -81,11 +87,19 @@ def generate_possible_filenames(
    safe_title = sanitize_title_for_filenames(title)
    safe_artist = artist.replace("'", "").replace('"', "").strip()

-    return [
-        f"{safe_artist} - {safe_title}.mp4",  # Songlist mode
-        f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
-        f"{safe_artist} - {safe_title} (Karaoke Version).mp4",  # Channel videos mode
-    ]
+    # Handle empty artist case
+    if not safe_artist or safe_artist.strip() == "":
+        return [
+            f"{safe_title}.mp4",  # Songlist mode (no artist)
+            f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
+            f"{safe_title} (Karaoke Version).mp4",  # Channel videos mode (no artist)
+        ]
+    else:
+        return [
+            f"{safe_artist} - {safe_title}.mp4",  # Songlist mode
+            f"{channel_name} - {safe_title}.mp4",  # Latest-per-channel mode
+            f"{safe_artist} - {safe_title} (Karaoke Version).mp4",  # Channel videos mode
+        ]
def sanitize_title_for_filenames(title: str) -> str:
@ -112,6 +126,7 @@ def check_file_exists_with_patterns(
) -> Tuple[bool, Optional[Path]]:
    """
    Check if a file exists using multiple possible filename patterns.
+    Also checks for files with (2), (3), etc. suffixes that yt-dlp might create.

    Args:
        downloads_dir: Base downloads directory
@ -130,15 +145,56 @@ def check_file_exists_with_patterns(
    # Apply length limits if needed
    safe_artist = artist.replace("'", "").replace('"', "").strip()
    safe_title = sanitize_title_for_filenames(title)
-    filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+    if not safe_artist or safe_artist.strip() == "":
+        filename = f"{safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"
+    else:
+        filename = f"{safe_artist[:DEFAULT_ARTIST_LENGTH_LIMIT]} - {safe_title[:DEFAULT_TITLE_LENGTH_LIMIT]}.mp4"

    # Check for exact filename match
    file_path = channel_dir / filename
    if file_path.exists() and file_path.stat().st_size > 0:
        return True, file_path

+    # Check for files with (2), (3), etc. suffixes
+    base_name = filename.replace(".mp4", "")
+    for suffix in range(2, 10):  # Check up to (9)
+        suffixed_filename = f"{base_name} ({suffix}).mp4"
+        suffixed_path = channel_dir / suffixed_filename
+        if suffixed_path.exists() and suffixed_path.stat().st_size > 0:
+            return True, suffixed_path
+
    return False, None


+def get_unique_filename(
+    downloads_dir: Path, channel_name: str, artist: str, title: str
+) -> Tuple[Path, bool]:
+    """
+    Get a unique filename for download, checking for existing files including duplicates.
+
+    Args:
+        downloads_dir: Base downloads directory
+        channel_name: Channel name
+        artist: Song artist
+        title: Song title
+
+    Returns:
+        Tuple of (file_path, is_existing) where is_existing indicates if a file already exists
+    """
+    filename = sanitize_filename(artist, title)
+    channel_dir = downloads_dir / channel_name
+    file_path = channel_dir / filename
+
+    # Check if file already exists
+    exists, existing_path = check_file_exists_with_patterns(downloads_dir, channel_name, artist, title)
+
+    if exists and existing_path:
+        print(f"📁 File already exists: {existing_path.name}")
+        return existing_path, True
+
+    return file_path, False


def ensure_directory_exists(directory: Path) -> None:
    """
    Ensure a directory exists, creating it if necessary.
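The duplicate-suffix scan above can be exercised in isolation. A minimal sketch of the same pattern, assuming the base name has already been sanitized:

```python
from pathlib import Path

def find_existing(channel_dir: Path, base_name: str):
    """Return the first non-empty 'name.mp4' or 'name (2..9).mp4' match, else None."""
    candidates = [f"{base_name}.mp4"] + [f"{base_name} ({n}).mp4" for n in range(2, 10)]
    for name in candidates:
        path = channel_dir / name
        # Zero-byte files are treated as missing, matching the st_size > 0 check above
        if path.exists() and path.stat().st_size > 0:
            return path
    return None
```

Checking size as well as existence is what lets a crashed or empty yt-dlp download be retried instead of silently skipped.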
@ -32,10 +32,72 @@ def normalize_title(title):


def extract_artist_title(video_title):
-    """Extract artist and title from video title."""
+    """
+    Extract artist and title from video title.
+
+    This function handles multiple common video title formats found on YouTube karaoke channels:
+
+    1. "Artist - Title" format: "38 Special - Hold On Loosely"
+    2. "Title Karaoke | Artist Karaoke Version" format: "Hold On Loosely Karaoke | 38 Special Karaoke Version"
+    3. "Title Artist KARAOKE" format: "Hold On Loosely 38 Special KARAOKE"
+
+    Args:
+        video_title (str): The YouTube video title to parse
+
+    Returns:
+        tuple: (artist, title) where artist and title are strings. If parsing fails,
+        artist will be empty string and title will be the full video title.
+
+    Examples:
+        >>> extract_artist_title("38 Special - Hold On Loosely")
+        ("38 Special", "Hold On Loosely")
+
+        >>> extract_artist_title("Hold On Loosely Karaoke | 38 Special Karaoke Version")
+        ("38 Special", "Hold On Loosely")
+
+        >>> extract_artist_title("Unknown Format Video Title")
+        ("", "Unknown Format Video Title")
+    """
+    # Handle "Artist - Title" format
+    if " - " in video_title:
+        parts = video_title.split(" - ", 1)
+        return parts[0].strip(), parts[1].strip()
+
+    # Handle "Title Karaoke | Artist Karaoke Version" format
+    if " | " in video_title and "karaoke" in video_title.lower():
+        parts = video_title.split(" | ", 1)
+        title_part = parts[0].strip()
+        artist_part = parts[1].strip()
+
+        # Clean up the parts
+        title = title_part.replace("Karaoke", "").strip()
+        artist = artist_part.replace("Karaoke Version", "").strip()
+
+        return artist, title
+
+    # Handle "Title Artist KARAOKE" format
+    if "karaoke" in video_title.lower():
+        # Try to find the artist by looking for common patterns
+        title_lower = video_title.lower()
+
+        # Look for patterns like "Title Artist KARAOKE"
+        # This is a simplified approach - we'll need to improve this
+        words = video_title.split()
+        if len(words) >= 3:
+            # Assume the last word before "KARAOKE" is part of the artist
+            for i, word in enumerate(words):
+                if "karaoke" in word.lower():
+                    if i >= 2:
+                        # Everything before the last word before KARAOKE is title
+                        # Everything after is artist
+                        title = " ".join(words[:i-1])
+                        artist = " ".join(words[i-1:])
+                        return artist, title
+
+        # If we can't parse it, return empty artist and full title
+        return "", video_title
+
+    # Default: return empty artist and full title
+    return "", video_title
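The pipe-delimited branch is the least obvious of the three formats, so here it is isolated as a runnable sketch (the other branches are omitted; behavior mirrors the diff above):

```python
def parse_pipe_format(video_title: str):
    """Parse 'Title Karaoke | Artist Karaoke Version' into (artist, title)."""
    if " | " in video_title and "karaoke" in video_title.lower():
        title_part, artist_part = video_title.split(" | ", 1)
        # Strip the channel's boilerplate words from each half
        artist = artist_part.replace("Karaoke Version", "").strip()
        title = title_part.replace("Karaoke", "").strip()
        return artist, title
    return "", video_title
```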
@ -7,17 +7,33 @@ except ImportError:
    MUTAGEN_AVAILABLE = False


-def extract_artist_title(video_title):
-    title = (
-        video_title.replace("(Karaoke Version)", "").replace("(Karaoke)", "").strip()
-    )
-    if " - " in title:
-        parts = title.split(" - ", 1)
-        if len(parts) == 2:
-            artist = parts[0].strip()
-            song_title = parts[1].strip()
-            return artist, song_title
-    return "Unknown Artist", title
+def clean_channel_name(channel_name: str) -> str:
+    """
+    Clean channel name for ID3 tagging by removing @ symbol and ensuring it's alpha-only.
+
+    Args:
+        channel_name: Raw channel name (may contain @ symbol)
+
+    Returns:
+        Cleaned channel name suitable for ID3 tags
+    """
+    # Remove @ symbol if present
+    if channel_name.startswith('@'):
+        channel_name = channel_name[1:]
+
+    # Remove any non-alphanumeric characters and convert to single word
+    # Keep only letters, numbers, and spaces, then take the first word
+    cleaned = re.sub(r'[^a-zA-Z0-9\s]', '', channel_name)
+    words = cleaned.split()
+    if words:
+        return words[0]  # Return only the first word
+
+    return "Unknown"
+
+
+# Import the enhanced extract_artist_title function from fuzzy_matcher.py
+# This ensures consistent parsing across all modules and supports multiple video title formats
+from karaoke_downloader.fuzzy_matcher import extract_artist_title


def add_id3_tags(file_path, video_title, channel_name):
@ -26,12 +42,13 @@ def add_id3_tags(file_path, video_title, channel_name):
        return
    try:
        artist, title = extract_artist_title(video_title)
+        clean_channel = clean_channel_name(channel_name)
        mp4 = MP4(str(file_path))
        mp4["\xa9nam"] = title
        mp4["\xa9ART"] = artist
-        mp4["\xa9alb"] = f"{channel_name} Karaoke"
+        mp4["\xa9alb"] = clean_channel  # Use clean channel name only, no suffix
        mp4["\xa9gen"] = "Karaoke"
        mp4.save()
-        print(f"📝 Added ID3 tags: Artist='{artist}', Title='{title}'")
+        print(f"📝 Added ID3 tags: Artist='{artist}', Title='{title}', Album='{clean_channel}'")
    except Exception as e:
        print(f"⚠️ Could not add ID3 tags: {e}")
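The channel-name cleanup boils down to three steps; this self-contained restatement mirrors the function in the diff so its behavior can be spot-checked (the channel names in the test are made up):

```python
import re

def clean_channel_name(channel_name: str) -> str:
    """Drop a leading '@', strip non-alphanumerics, keep only the first word."""
    if channel_name.startswith('@'):
        channel_name = channel_name[1:]
    words = re.sub(r'[^a-zA-Z0-9\s]', '', channel_name).split()
    return words[0] if words else "Unknown"
```

Note the first-word rule: a multi-word handle like `@Sing-King Karaoke` collapses to a single album token, which keeps players from splitting one channel across several album entries.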
83
karaoke_downloader/manual_video_manager.py
Normal file
@ -0,0 +1,83 @@
"""
Manual video manager for handling static video collections.
"""

import json
from pathlib import Path
from typing import Dict, List, Optional, Any

from karaoke_downloader.data_path_manager import get_data_path_manager


def load_manual_videos(manual_file: str = None) -> List[Dict[str, Any]]:
    """
    Load manual videos from the JSON file.

    Args:
        manual_file: Path to manual videos JSON file

    Returns:
        List of video dictionaries
    """
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print(f"⚠️ Manual videos file not found: {manual_file}")
        return []

    try:
        with open(manual_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        videos = data.get("videos", [])
        print(f"📋 Loaded {len(videos)} manual videos from {manual_file}")
        return videos

    except Exception as e:
        print(f"❌ Error loading manual videos: {e}")
        return []


def get_manual_videos_for_channel(channel_name: str, manual_file: str = None) -> List[Dict[str, Any]]:
    """
    Get manual videos for a specific channel.

    Args:
        channel_name: Channel name (should be "@ManualVideos")
        manual_file: Path to manual videos JSON file

    Returns:
        List of video dictionaries
    """
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    if channel_name != "@ManualVideos":
        return []

    return load_manual_videos(manual_file)


def is_manual_channel(channel_url: str) -> bool:
    """
    Check if a channel URL is a manual channel.

    Args:
        channel_url: Channel URL

    Returns:
        True if it's a manual channel
    """
    return channel_url == "manual://static"


def get_manual_channel_info(channel_url: str) -> tuple[str, str]:
    """
    Get channel info for manual channels.

    Args:
        channel_url: Channel URL

    Returns:
        Tuple of (channel_name, channel_id)
    """
    if channel_url == "manual://static":
        return "@ManualVideos", "manual"
    return None, None
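The on-disk shape of the manual videos file is implied by the loader: a top-level `"videos"` list of dicts. A minimal sketch of that happy path; the sample video id and title are hypothetical:

```python
import json
from pathlib import Path

# Assumed JSON shape, inferred from load_manual_videos above
SAMPLE = {"videos": [{"id": "abc123", "title": "38 Special - Hold On Loosely (Karaoke)"}]}

def load_videos(path: Path):
    """Missing file -> empty list; otherwise return the 'videos' list."""
    if not path.exists():
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f).get("videos", [])
```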
@ -7,28 +7,40 @@ import json
from datetime import datetime
from pathlib import Path

+from karaoke_downloader.data_path_manager import get_data_path_manager

-def load_server_songs(songs_path="data/songs.json"):
-    """Load the list of songs already available on the server."""
+
+def load_server_songs(songs_path=None):
+    if songs_path is None:
+        songs_path = str(get_data_path_manager().get_songs_path())
+    """Load the list of songs already available on the server with format information."""
    songs_file = Path(songs_path)
    if not songs_file.exists():
        print(f"⚠️ Server songs file not found: {songs_path}")
-        return set()
+        return {}
    try:
        with open(songs_file, "r", encoding="utf-8") as f:
            data = json.load(f)
-        server_songs = set()
+        server_songs = {}
        for song in data:
-            if "artist" in song and "title" in song:
+            if "artist" in song and "title" in song and "path" in song:
                artist = song["artist"].strip()
                title = song["title"].strip()
+                path = song["path"].strip()
                key = f"{artist.lower()}_{normalize_title(title)}"
-                server_songs.add(key)
+                server_songs[key] = {
+                    "artist": artist,
+                    "title": title,
+                    "path": path,
+                    "is_mp3": path.lower().endswith('.mp3'),
+                    "is_cdg": 'cdg' in path.lower(),
+                    "is_mp4": path.lower().endswith('.mp4')
+                }
        print(f"📋 Loaded {len(server_songs)} songs from server (songs.json)")
        return server_songs
    except (json.JSONDecodeError, FileNotFoundError) as e:
        print(f"⚠️ Could not load server songs: {e}")
-        return set()
+        return {}


def is_song_on_server(server_songs, artist, title):
@ -37,9 +49,24 @@ def is_song_on_server(server_songs, artist, title):
    return key in server_songs


+def should_skip_server_song(server_songs, artist, title):
+    """Check if a song should be skipped because it's already available as MP4 on server.
+    Returns True if the song should be skipped (MP4 format), False if it should be downloaded (MP3/CDG format)."""
+    key = f"{artist.lower()}_{normalize_title(title)}"
+    if key not in server_songs:
+        return False  # Not on server, so don't skip
+
+    song_info = server_songs[key]
+    # Skip if it's an MP4 file (video format)
+    # Don't skip if it's MP3 or in CDG folder (different format)
+    return song_info.get("is_mp4", False) and not song_info.get("is_cdg", False)
+
+
def load_server_duplicates_tracking(
-    tracking_path="data/server_duplicates_tracking.json",
+    tracking_path=None,
):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_server_duplicates_tracking_path())
    """Load the tracking of songs found to be duplicates on the server."""
    tracking_file = Path(tracking_path)
    if not tracking_file.exists():
@ -53,8 +80,10 @@ def load_server_duplicates_tracking(


def save_server_duplicates_tracking(
-    tracking, tracking_path="data/server_duplicates_tracking.json"
+    tracking, tracking_path=None
):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_server_duplicates_tracking_path())
    """Save the tracking of songs found to be duplicates on the server."""
    try:
        with open(tracking_path, "w", encoding="utf-8") as f:
@ -86,8 +115,9 @@ def mark_song_as_server_duplicate(tracking, artist, title, video_title, channel_
def check_and_mark_server_duplicate(
    server_songs, server_duplicates_tracking, artist, title, video_title, channel_name
):
-    """Check if a song is on server and mark it as duplicate if so. Returns True if it's a duplicate."""
-    if is_song_on_server(server_songs, artist, title):
+    """Check if a song should be skipped because it's already available as MP4 on server and mark it as duplicate if so.
+    Returns True if it should be skipped (MP4 format), False if it should be downloaded (MP3/CDG format)."""
+    if should_skip_server_song(server_songs, artist, title):
        if not is_song_marked_as_server_duplicate(
            server_duplicates_tracking, artist, title
        ):
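The new skip rule has three outcomes worth pinning down: skip only when the server copy is an MP4 video outside a CDG folder. A minimal sketch of that decision over the dict shape built by `load_server_songs` (the keys in the test data are illustrative):

```python
def should_skip(server_songs: dict, key: str) -> bool:
    """Skip only when the server copy is an MP4 and not inside a CDG folder."""
    info = server_songs.get(key)
    if info is None:
        return False  # not on server at all -> download it
    # MP3 and CDG copies are different formats, so the MP4 is still wanted
    return info.get("is_mp4", False) and not info.get("is_cdg", False)
```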
@ -35,6 +35,7 @@ class SongValidator:
        video_title: Optional[str] = None,
        server_songs: Optional[Dict[str, Any]] = None,
        server_duplicates_tracking: Optional[Dict[str, Any]] = None,
+        force_download: bool = False,
    ) -> Tuple[bool, Optional[str], int]:
        """
        Check if a song should be skipped based on multiple criteria.
@ -53,10 +54,15 @@ class SongValidator:
            video_title: YouTube video title (optional)
            server_songs: Server songs data (optional)
            server_duplicates_tracking: Server duplicates tracking (optional)
+            force_download: If True, bypass all validation checks and force download

        Returns:
            Tuple of (should_skip, reason, total_filtered)
        """
+        # If force download is enabled, skip all validation checks
+        if force_download:
+            return False, None, 0
+
        total_filtered = 0

        # Check 1: Already downloaded by this system
265
karaoke_downloader/songlist_generator.py
Normal file
@ -0,0 +1,265 @@
import json
import os
from pathlib import Path
from typing import List, Dict, Any, Optional
from mutagen.mp4 import MP4

from karaoke_downloader.data_path_manager import get_data_path_manager


class SongListGenerator:
    """Utility class for generating song lists from MP4 files with ID3 tags."""

    def __init__(self, songlist_path: str = None):
        if songlist_path is None:
            songlist_path = str(get_data_path_manager().get_songlist_path())
        self.songlist_path = Path(songlist_path)
        self.songlist_path.parent.mkdir(parents=True, exist_ok=True)

    def read_existing_songlist(self) -> List[Dict[str, Any]]:
        """Read existing song list from JSON file."""
        if self.songlist_path.exists():
            try:
                with open(self.songlist_path, 'r', encoding='utf-8') as f:
                    return json.load(f)
            except (json.JSONDecodeError, IOError) as e:
                print(f"⚠️ Warning: Could not read existing songlist: {e}")
                return []
        return []

    def save_songlist(self, songlist: List[Dict[str, Any]]) -> None:
        """Save song list to JSON file."""
        try:
            with open(self.songlist_path, 'w', encoding='utf-8') as f:
                json.dump(songlist, f, indent=2, ensure_ascii=False)
            print(f"✅ Song list saved to {self.songlist_path}")
        except IOError as e:
            print(f"❌ Error saving song list: {e}")
            raise

    def extract_id3_tags(self, mp4_path: Path) -> Optional[Dict[str, str]]:
        """Extract ID3 tags from MP4 file."""
        try:
            mp4 = MP4(str(mp4_path))

            # Extract artist and title from ID3 tags
            artist = mp4.get("\xa9ART", ["Unknown Artist"])[0] if "\xa9ART" in mp4 else "Unknown Artist"
            title = mp4.get("\xa9nam", ["Unknown Title"])[0] if "\xa9nam" in mp4 else "Unknown Title"

            return {
                "artist": artist,
                "title": title
            }
        except Exception as e:
            print(f"⚠️ Warning: Could not extract ID3 tags from {mp4_path.name}: {e}")
            return None

    def scan_directory_for_mp4_files(self, directory_path: str) -> List[Path]:
        """Scan directory for MP4 files."""
        directory = Path(directory_path)
        if not directory.exists():
            raise FileNotFoundError(f"Directory not found: {directory_path}")

        if not directory.is_dir():
            raise ValueError(f"Path is not a directory: {directory_path}")

        mp4_files = list(directory.glob("*.mp4"))
        if not mp4_files:
            print(f"⚠️ No MP4 files found in {directory_path}")
            return []

        print(f"📁 Found {len(mp4_files)} MP4 files in {directory.name}")
        return sorted(mp4_files)

    def generate_songlist_from_directory(self, directory_path: str, append: bool = True) -> Dict[str, Any]:
        """Generate a song list from MP4 files in a directory."""
        directory = Path(directory_path)
        directory_name = directory.name

        # Scan for MP4 files
        mp4_files = self.scan_directory_for_mp4_files(directory_path)
        if not mp4_files:
            return {}

        # Extract ID3 tags and create songs list
        songs = []
        for index, mp4_file in enumerate(mp4_files, start=1):
            id3_tags = self.extract_id3_tags(mp4_file)
            if id3_tags:
                song = {
                    "position": index,
                    "title": id3_tags["title"],
                    "artist": id3_tags["artist"]
                }
                songs.append(song)
                print(f" {index:3d}. {id3_tags['artist']} - {id3_tags['title']}")

        if not songs:
            print("❌ No valid ID3 tags found in any MP4 files")
            return {}

        # Create the song list entry
        songlist_entry = {
            "title": directory_name,
            "songs": songs
        }

        # Handle appending to existing song list
        if append:
            existing_songlist = self.read_existing_songlist()

            # Check if a playlist with this title already exists
            existing_index = None
            for i, entry in enumerate(existing_songlist):
                if entry.get("title") == directory_name:
                    existing_index = i
                    break

            if existing_index is not None:
                # Replace existing entry
                print(f"🔄 Replacing existing playlist: {directory_name}")
                existing_songlist[existing_index] = songlist_entry
            else:
                # Add new entry to the beginning of the list
                print(f"➕ Adding new playlist: {directory_name}")
                existing_songlist.insert(0, songlist_entry)

            self.save_songlist(existing_songlist)
        else:
            # Create new song list with just this entry
            print(f"📝 Creating new song list with playlist: {directory_name}")
            self.save_songlist([songlist_entry])

        return songlist_entry

    def generate_songlist_from_multiple_directories(self, directory_paths: List[str], append: bool = True) -> List[Dict[str, Any]]:
        """Generate song lists from multiple directories."""
        results = []
        errors = []

        # Read existing song list once at the beginning
        existing_songlist = self.read_existing_songlist() if append else []

        for directory_path in directory_paths:
            try:
                print(f"\n📂 Processing directory: {directory_path}")
                directory = Path(directory_path)
                directory_name = directory.name

                # Scan for MP4 files
                mp4_files = self.scan_directory_for_mp4_files(directory_path)
                if not mp4_files:
                    continue

                # Extract ID3 tags and create songs list
                songs = []
                for index, mp4_file in enumerate(mp4_files, start=1):
                    id3_tags = self.extract_id3_tags(mp4_file)
                    if id3_tags:
                        song = {
                            "position": index,
                            "title": id3_tags["title"],
                            "artist": id3_tags["artist"]
                        }
                        songs.append(song)
                        print(f" {index:3d}. {id3_tags['artist']} - {id3_tags['title']}")

                if not songs:
                    print("❌ No valid ID3 tags found in any MP4 files")
                    continue

                # Create the song list entry
                songlist_entry = {
                    "title": directory_name,
                    "songs": songs
                }

                # Check if a playlist with this title already exists
                existing_index = None
                for i, entry in enumerate(existing_songlist):
                    if entry.get("title") == directory_name:
                        existing_index = i
                        break

                if existing_index is not None:
                    # Replace existing entry
                    print(f"🔄 Replacing existing playlist: {directory_name}")
                    existing_songlist[existing_index] = songlist_entry
                else:
                    # Add new entry to the beginning of the list
                    print(f"➕ Adding new playlist: {directory_name}")
                    existing_songlist.insert(0, songlist_entry)

                results.append(songlist_entry)

            except Exception as e:
                error_msg = f"Error processing {directory_path}: {e}"
                print(f"❌ {error_msg}")
                errors.append(error_msg)

        # Save the final song list
        if results:
            if append:
                # Save the updated existing song list
                self.save_songlist(existing_songlist)
            else:
                # Create new song list with just the results
                self.save_songlist(results)

        # If there were any errors, raise an exception
        if errors:
            raise Exception(f"Failed to process {len(errors)} directories: {'; '.join(errors)}")

        return results


def main():
    """CLI entry point for song list generation."""
    import argparse
    import sys

    parser = argparse.ArgumentParser(
        description="Generate song lists from MP4 files with ID3 tags",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python -m karaoke_downloader.songlist_generator /path/to/mp4/directory
  python -m karaoke_downloader.songlist_generator /path/to/dir1 /path/to/dir2 --no-append
  python -m karaoke_downloader.songlist_generator /path/to/dir --songlist-path custom_songlist.json
"""
    )

    parser.add_argument(
        "directories",
        nargs="+",
        help="Directory paths containing MP4 files with ID3 tags"
    )

    parser.add_argument(
        "--no-append",
        action="store_true",
        help="Create a new song list instead of appending to existing one"
    )

    parser.add_argument(
        "--songlist-path",
        default=None,
        help="Path to the song list JSON file (default: songList.json in the data directory)"
    )

    args = parser.parse_args()

    try:
        generator = SongListGenerator(args.songlist_path)
        generator.generate_songlist_from_multiple_directories(
            args.directories,
            append=not args.no_append
        )
        print("\n✅ Song list generation completed successfully!")
    except Exception as e:
        print(f"\n❌ Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
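The generator's core bookkeeping, independent of mutagen and the filesystem, is "build a numbered playlist entry, then replace-or-prepend it". A minimal sketch of those two steps under the same entry shape as the file above:

```python
def build_playlist_entry(directory_name, tagged_files):
    """Build one songlist entry from (artist, title) pairs; positions start at 1."""
    songs = [{"position": i, "title": title, "artist": artist}
             for i, (artist, title) in enumerate(tagged_files, start=1)]
    return {"title": directory_name, "songs": songs}

def upsert_playlist(songlist, entry):
    """Replace a playlist with the same title in place, else insert at the front."""
    for i, existing in enumerate(songlist):
        if existing.get("title") == entry["title"]:
            songlist[i] = entry  # re-running on the same directory does not duplicate
            return songlist
    songlist.insert(0, entry)
    return songlist
```

Replace-or-prepend is what makes the generator idempotent: re-scanning a directory refreshes its playlist instead of appending a second copy.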
@ -7,6 +7,7 @@ import json
from datetime import datetime
from pathlib import Path

+from karaoke_downloader.data_path_manager import get_data_path_manager
from karaoke_downloader.server_manager import (
    check_and_mark_server_duplicate,
    is_song_marked_as_server_duplicate,
@ -16,7 +17,9 @@ from karaoke_downloader.server_manager import (
)


-def load_songlist(songlist_path="data/songList.json"):
+def load_songlist(songlist_path=None):
+    if songlist_path is None:
+        songlist_path = str(get_data_path_manager().get_songlist_path())
    songlist_file = Path(songlist_path)
    if not songlist_file.exists():
        print(f"⚠️ Songlist file not found: {songlist_path}")
@ -55,7 +58,9 @@ def normalize_title(title):
    return " ".join(normalized.split()).lower()


-def load_songlist_tracking(tracking_path="data/songlist_tracking.json"):
+def load_songlist_tracking(tracking_path=None):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
    tracking_file = Path(tracking_path)
    if not tracking_file.exists():
        return {}
@ -67,7 +72,9 @@ def load_songlist_tracking(tracking_path="data/songlist_tracking.json"):
    return {}


-def save_songlist_tracking(tracking, tracking_path="data/songlist_tracking.json"):
+def save_songlist_tracking(tracking, tracking_path=None):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
    try:
        with open(tracking_path, "w", encoding="utf-8") as f:
            json.dump(tracking, f, indent=2, ensure_ascii=False)
@ -1,10 +1,12 @@
|
||||
import threading
|
||||
from enum import Enum
|
||||
|
||||
import json
|
||||
from datetime import datetime
|
||||
import os
|
||||
import re
|
||||
from datetime import datetime, timedelta
|
||||
from enum import Enum
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
from karaoke_downloader.data_path_manager import get_data_path_manager
|
||||
|
||||
class SongStatus(str, Enum):
|
||||
NOT_DOWNLOADED = "NOT_DOWNLOADED"
|
||||
@ -25,46 +27,133 @@ class FormatType(str, Enum):
|
||||
class TrackingManager:
|
||||
def __init__(
|
||||
self,
|
||||
tracking_file="data/karaoke_tracking.json",
|
||||
cache_file="data/channel_cache.json",
|
||||
tracking_file=None,
|
||||
cache_dir=None,
|
||||
):
|
||||
if tracking_file is None:
|
||||
tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
|
||||
if cache_dir is None:
|
||||
cache_dir = str(get_data_path_manager().get_channel_cache_dir())
|
||||
|
||||
self.tracking_file = Path(tracking_file)
|
||||
self.cache_file = Path(cache_file)
|
||||
self.data = {"playlists": {}, "songs": {}}
|
||||
self.cache = {}
|
||||
self._lock = threading.Lock()
|
||||
self._load()
|
||||
self._load_cache()
|
||||
self.cache_dir = Path(cache_dir)
|
||||
|
||||
# Ensure cache directory exists
|
||||
self.cache_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
self.data = self._load()
|
||||
print(f"📊 Tracking manager initialized with {len(self.data.get('songs', {}))} tracked songs")
|
||||
|
||||
def _load(self):
|
||||
"""Load tracking data from JSON file."""
|
||||
if self.tracking_file.exists():
|
||||
try:
|
||||
with open(self.tracking_file, "r", encoding="utf-8") as f:
|
||||
self.data = json.load(f)
|
||||
except Exception:
|
||||
self.data = {"playlists": {}, "songs": {}}
|
||||
return json.load(f)
|
||||
except json.JSONDecodeError:
|
||||
print(f"⚠️ Corrupted tracking file, creating new one")
|
||||
|
||||
return {"songs": {}, "playlists": {}, "last_updated": datetime.now().isoformat()}
|
||||
|
||||
def _save(self):
|
||||
with self._lock:
|
||||
with open(self.tracking_file, "w", encoding="utf-8") as f:
|
||||
json.dump(self.data, f, indent=2, ensure_ascii=False)
|
||||
"""Save tracking data to JSON file."""
|
||||
self.data["last_updated"] = datetime.now().isoformat()
|
||||
self.tracking_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(self.tracking_file, "w", encoding="utf-8") as f:
|
||||
json.dump(self.data, f, indent=2, ensure_ascii=False)
|
||||
|
||||
def force_save(self):
|
||||
"""Force save the tracking data."""
|
||||
self._save()
|
||||
|
||||
def _load_cache(self):
|
||||
if self.cache_file.exists():
|
||||
try:
|
||||
with open(self.cache_file, "r", encoding="utf-8") as f:
|
||||
self.cache = json.load(f)
|
||||
except Exception:
|
||||
self.cache = {}
|
||||
    def _get_channel_cache_file(self, channel_id: str) -> Path:
        """Get the cache file path for a specific channel."""
        # Sanitize channel ID for filename
        safe_channel_id = re.sub(r'[<>:"/\\|?*]', '_', channel_id)
        return self.cache_dir / f"{safe_channel_id}.json"

    def save_cache(self):
        with open(self.cache_file, "w", encoding="utf-8") as f:
            json.dump(self.cache, f, indent=2, ensure_ascii=False)

    def _load_channel_cache(self, channel_id: str) -> List[Dict[str, str]]:
        """Load cache for a specific channel."""
        cache_file = self._get_channel_cache_file(channel_id)
        if cache_file.exists():
            try:
                with open(cache_file, 'r', encoding='utf-8') as f:
                    data = json.load(f)
                    return data.get('videos', [])
            except (json.JSONDecodeError, KeyError):
                print(f" ⚠️ Corrupted cache file for {channel_id}, will recreate")
                return []
        return []

    def _save_channel_cache(self, channel_id: str, videos: List[Dict[str, str]]):
        """Save cache for a specific channel."""
        cache_file = self._get_channel_cache_file(channel_id)
        data = {
            'channel_id': channel_id,
            'videos': videos,
            'last_updated': datetime.now().isoformat(),
            'video_count': len(videos)
        }
        with open(cache_file, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)

    def _clear_channel_cache(self, channel_id: str):
        """Clear cache for a specific channel."""
        cache_file = self._get_channel_cache_file(channel_id)
        if cache_file.exists():
            cache_file.unlink()
            print(f" 🗑️ Cleared cache file: {cache_file.name}")

    def get_cache_info(self):
        """Get information about all channel cache files."""
        cache_files = list(self.cache_dir.glob("*.json"))
        total_videos = 0
        cache_info = []

        for cache_file in cache_files:
            try:
                with open(cache_file, 'r', encoding='utf-8') as f:
                    data = json.load(f)
                    video_count = len(data.get('videos', []))
                    total_videos += video_count
                    last_updated = data.get('last_updated', 'Unknown')
                    cache_info.append({
                        'channel': data.get('channel_id', cache_file.stem),
                        'videos': video_count,
                        'last_updated': last_updated,
                        'file': cache_file.name
                    })
            except Exception as e:
                print(f"⚠️ Error reading cache file {cache_file.name}: {e}")

        return {
            'total_channels': len(cache_files),
            'total_videos': total_videos,
            'channels': cache_info
        }

    def clear_channel_cache(self, channel_id=None):
        """Clear cache for a specific channel or all channels."""
        if channel_id:
            self._clear_channel_cache(channel_id)
            print(f"🗑️ Cleared cache for channel: {channel_id}")
        else:
            # Clear all cache files
            cache_files = list(self.cache_dir.glob("*.json"))
            for cache_file in cache_files:
                cache_file.unlink()
            print(f"🗑️ Cleared all {len(cache_files)} channel cache files")

    def set_cache_duration(self, hours):
        """Placeholder for cache duration logic"""
        pass

    def export_playlist_report(self, playlist_id):
        """Export a report for a specific playlist."""
        pass

    def get_statistics(self):
        """Get statistics about tracked songs."""
        total_songs = len(self.data["songs"])
        downloaded_songs = sum(
            1
@@ -102,11 +191,13 @@ class TrackingManager:
        }

    def get_playlist_songs(self, playlist_id):
        """Get songs for a specific playlist."""
        return [
            s for s in self.data["songs"].values() if s["playlist_id"] == playlist_id
        ]

    def get_failed_songs(self, playlist_id=None):
        """Get failed songs, optionally filtered by playlist."""
        if playlist_id:
            return [
                s
@@ -118,6 +209,7 @@ class TrackingManager:
            ]

    def get_partial_downloads(self, playlist_id=None):
        """Get partial downloads, optionally filtered by playlist."""
        if playlist_id:
            return [
                s
@@ -129,7 +221,7 @@ class TrackingManager:
            ]

    def cleanup_orphaned_files(self, downloads_dir):
        # Remove tracking entries for files that no longer exist
        """Remove tracking entries for files that no longer exist."""
        orphaned = []
        for song_id, song in list(self.data["songs"].items()):
            file_path = song.get("file_path")
@@ -139,51 +231,17 @@ class TrackingManager:
        self.force_save()
        return orphaned

    def get_cache_info(self):
        total_channels = len(self.cache)
        total_cached_videos = sum(len(v) for v in self.cache.values())
        cache_duration_hours = 24  # default
        last_updated = None
        return {
            "total_channels": total_channels,
            "total_cached_videos": total_cached_videos,
            "cache_duration_hours": cache_duration_hours,
            "last_updated": last_updated,
        }

    def clear_channel_cache(self, channel_id=None):
        if channel_id is None or channel_id == "all":
            self.cache = {}
        else:
            self.cache.pop(channel_id, None)
        self.save_cache()

    def set_cache_duration(self, hours):
        # Placeholder for cache duration logic
        pass

    def export_playlist_report(self, playlist_id):
        playlist = self.data["playlists"].get(playlist_id)
        if not playlist:
            return f"Playlist '{playlist_id}' not found."
        songs = self.get_playlist_songs(playlist_id)
        report = {"playlist": playlist, "songs": songs}
        return json.dumps(report, indent=2, ensure_ascii=False)

    def is_song_downloaded(self, artist, title, channel_name=None, video_id=None):
        """
        Check if a song has already been downloaded by this system.
        Returns True if the song exists in tracking with DOWNLOADED or CONVERTED status.
        Check if a song has already been downloaded.
        Returns True if the song exists in tracking with DOWNLOADED status.
        """
        # If we have video_id and channel_name, try direct key lookup first (most efficient)
        if video_id and channel_name:
            song_key = f"{video_id}@{channel_name}"
            if song_key in self.data["songs"]:
                song_data = self.data["songs"][song_key]
                if song_data.get("status") in [
                    SongStatus.DOWNLOADED,
                    SongStatus.CONVERTED,
                ]:
                if song_data.get("status") == SongStatus.DOWNLOADED:
                    return True

        # Fallback to content search (for cases where we don't have video_id)
@@ -191,19 +249,14 @@ class TrackingManager:
            # Check if this song matches the artist and title
            if song_data.get("artist") == artist and song_data.get("title") == title:
                # Check if it's marked as downloaded
                if song_data.get("status") in [
                    SongStatus.DOWNLOADED,
                    SongStatus.CONVERTED,
                ]:
                if song_data.get("status") == SongStatus.DOWNLOADED:
                    return True
            # Also check the video title field which might contain the song info
            video_title = song_data.get("video_title", "")
            if video_title and artist in video_title and title in video_title:
                if song_data.get("status") in [
                    SongStatus.DOWNLOADED,
                    SongStatus.CONVERTED,
                ]:
                if song_data.get("status") == SongStatus.DOWNLOADED:
                    return True

        return False

    def is_file_exists(self, file_path):
@@ -283,65 +336,359 @@ class TrackingManager:
        self._save()

    def get_channel_video_list(
        self, channel_url, yt_dlp_path="downloader/yt-dlp.exe", force_refresh=False
        self, channel_url, yt_dlp_path="downloader/yt-dlp.exe", force_refresh=False, show_pagination=False
    ):
        """
        Return a list of videos (dicts with 'title' and 'id') for the channel, using cache if available unless force_refresh is True.

        Args:
            channel_url: YouTube channel URL
            yt_dlp_path: Path to yt-dlp executable
            force_refresh: Force refresh cache even if available
            show_pagination: Show page-by-page progress (slower but more detailed)
        """
        channel_name, channel_id = None, None

        # Check if this is a manual channel
        from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel

        if is_manual_channel(channel_url):
            channel_name, channel_id = get_manual_channel_info(channel_url)
            if channel_name and channel_id:
                print(f" 📋 Loading manual videos for {channel_name}")
                manual_videos = get_manual_videos_for_channel(channel_name)
                # Convert to the expected format
                videos = []
                for video in manual_videos:
                    videos.append({
                        "title": video.get("title", ""),
                        "id": video.get("id", ""),
                        "url": video.get("url", "")
                    })
                print(f" ✅ Loaded {len(videos)} manual videos")
                return videos
            else:
                print(f" ❌ Could not get manual channel info for: {channel_url}")
                return []

        # Regular YouTube channel processing
        from karaoke_downloader.youtube_utils import get_channel_info

        channel_name, channel_id = get_channel_info(channel_url)

        if not channel_id:
            print(f" ❌ Could not extract channel ID from URL: {channel_url}")
            return []

        # Try multiple possible cache keys
        possible_keys = [
            channel_id,  # The extracted channel ID
            channel_url,  # The full URL
            channel_name,  # The extracted channel name
        ]
        print(f" 🔍 Channel: {channel_name} (ID: {channel_id})")

        cache_key = None
        for key in possible_keys:
            if key and key in self.cache:
                cache_key = key
                break
        # Check if we have cached data for this channel
        if not force_refresh:
            cached_videos = self._load_channel_cache(channel_id)
            if cached_videos:
                # Validate that the cached data has proper video IDs
                corrupted = False

                # Check if any video IDs look like titles instead of proper YouTube IDs
                for video in cached_videos[:20]:  # Check first 20 videos
                    video_id = video.get("id", "")
                    # More comprehensive validation - YouTube IDs should be 11 characters and contain only alphanumeric, hyphens, and underscores
                    if video_id and (
                        len(video_id) != 11 or
                        not video_id.replace('-', '').replace('_', '').isalnum() or
                        " " in video_id or
                        "Lyrics" in video_id or
                        "KARAOKE" in video_id.upper() or
                        "Vocal" in video_id or
                        "Guide" in video_id
                    ):
                        print(f" ⚠️ Detected corrupted video ID in cache: '{video_id}'")
                        corrupted = True
                        break

                if corrupted:
                    print(f" 🧹 Clearing corrupted cache for {channel_id}")
                    self._clear_channel_cache(channel_id)
                    force_refresh = True
                else:
                    print(f" 📋 Using cached video list ({len(cached_videos)} videos)")
                    return cached_videos

        if not cache_key:
            cache_key = channel_id or channel_url  # Use as fallback for new entries

        print(f" 🔍 Trying cache keys: {possible_keys}")
        print(f" 🔍 Selected cache key: '{cache_key}'")

        if not force_refresh and cache_key in self.cache:
            print(
                f" 📋 Using cached video list ({len(self.cache[cache_key])} videos)"
            )
            return self.cache[cache_key]
        # Choose fetch method based on show_pagination flag
        if show_pagination:
            return self._fetch_videos_with_pagination(channel_url, channel_id, yt_dlp_path)
        else:
            print(f" ❌ Cache miss for all keys")
            return self._fetch_videos_flat_playlist(channel_url, channel_id, yt_dlp_path)

    def _fetch_videos_with_pagination(self, channel_url, channel_id, yt_dlp_path):
        """Fetch videos showing page-by-page progress."""
        print(f" 🌐 Fetching video list from YouTube (page-by-page mode)...")
        print(f" 📡 Channel URL: {channel_url}")

        import subprocess

        all_videos = []
        page = 1
        videos_per_page = 200  # YouTube/yt-dlp supports up to 200 videos per page, reducing API calls and errors

        while True:
            print(f" 📄 Fetching page {page}...")

            # Fetch one page at a time
            cmd = [
                yt_dlp_path,
                "--flat-playlist",
                "--print",
                "%(title)s|%(id)s|%(url)s",
                "--playlist-start",
                str((page - 1) * videos_per_page + 1),
                "--playlist-end",
                str(page * videos_per_page),
                channel_url,
            ]

            try:
                # Increased timeout to 180 seconds for larger pages (200 videos)
                result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=180)
                lines = result.stdout.strip().splitlines()

                # Save raw output for debugging (for each page)
                raw_output_file = self._get_channel_cache_file(channel_id).parent / f"{channel_id}_raw_output_page{page}.txt"
                try:
                    with open(raw_output_file, 'w', encoding='utf-8') as f:
                        f.write(f"# Raw yt-dlp output for {channel_id} - Page {page}\n")
                        f.write(f"# Channel URL: {channel_url}\n")
                        f.write(f"# Command: {' '.join(cmd)}\n")
                        f.write(f"# Timestamp: {datetime.now().isoformat()}\n")
                        f.write(f"# Total lines: {len(lines)}\n")
                        f.write("#" * 80 + "\n\n")
                        for i, line in enumerate(lines, 1):
                            f.write(f"{i:6d}: {line}\n")
                    print(f" 💾 Saved raw output to: {raw_output_file.name}")
                except Exception as e:
                    print(f" ⚠️ Could not save raw output: {e}")

                if not lines:
                    print(f" ✅ No more videos found on page {page}")
                    break

                print(f" 📊 Page {page}: Found {len(lines)} videos")

                page_videos = []
                invalid_count = 0

                for line in lines:
                    if not line.strip():
                        continue

                    # More robust parsing that handles titles with | characters
                    # Extract video ID directly from the URL that yt-dlp provides

                    # Find the URL and extract video ID from it
                    url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
                    if not url_match:
                        continue

                    # Extract video ID directly from the URL
                    video_id = url_match.group(1)

                    # Extract title (everything before the video ID in the line)
                    title = line[:line.find(video_id)].rstrip('|').strip()

                    # Validate video ID
                    if video_id and (
                        len(video_id) == 11 and
                        video_id.replace('-', '').replace('_', '').isalnum() and
                        " " not in video_id and
                        "Lyrics" not in video_id and
                        "KARAOKE" not in video_id.upper() and
                        "Vocal" not in video_id and
                        "Guide" not in video_id
                    ):
                        page_videos.append({"title": title, "id": video_id})
                    else:
                        invalid_count += 1
                        if invalid_count <= 3:  # Show first 3 invalid IDs per page
                            print(f" ⚠️ Invalid ID: '{video_id}' for '{title[:50]}...'")

                if invalid_count > 3:
                    print(f" ⚠️ ... and {invalid_count - 3} more invalid IDs on this page")

                all_videos.extend(page_videos)
                print(f" ✅ Page {page}: Added {len(page_videos)} valid videos (total: {len(all_videos)})")

                # If we got fewer videos than expected, we're probably at the end
                if len(lines) < videos_per_page:
                    print(f" 🏁 Reached end of channel (last page had {len(lines)} videos)")
                    break

                page += 1

                # Safety check to prevent infinite loops
                if page > 50:  # Max 50 pages (10,000 videos with 200 per page)
                    print(f" ⚠️ Reached maximum page limit (50 pages), stopping")
                    break

            except subprocess.TimeoutExpired:
                print(f" ⚠️ Page {page} timed out, stopping")
                break
            except subprocess.CalledProcessError as e:
                print(f" ❌ Error fetching page {page}: {e}")
                break
            except KeyboardInterrupt:
                print(f" ⏹️ User interrupted, stopping at page {page}")
                break

        if not all_videos:
            print(f" ❌ No valid videos found")
            return []

        print(f" 🎉 Channel download complete!")
        print(f" 📊 Total videos fetched: {len(all_videos)}")

        # Save to individual channel cache file
        self._save_channel_cache(channel_id, all_videos)
        print(f" 💾 Saved cache to: {self._get_channel_cache_file(channel_id).name}")

        return all_videos

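The paging arithmetic above maps a 1-based page number onto yt-dlp's inclusive `--playlist-start`/`--playlist-end` indices. A minimal sketch of that mapping (the helper name `page_bounds` is illustrative, not from the codebase):

```python
def page_bounds(page: int, per_page: int = 200) -> tuple[int, int]:
    """Return the inclusive (start, end) playlist indices for a 1-based page."""
    start = (page - 1) * per_page + 1
    end = page * per_page
    return start, end

# Page 1 covers items 1-200, page 2 covers 201-400, and so on.
```

Because both flags are inclusive, consecutive pages neither overlap nor skip entries, which is what lets the loop detect the final page by a short read.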
    def _fetch_videos_flat_playlist(self, channel_url, channel_id, yt_dlp_path):
        """Fetch all videos using flat playlist (faster but less detailed progress)."""
        # Fetch with yt-dlp
        print(f" 🌐 Fetching video list from YouTube (this may take a while)...")
        print(f" 📡 Channel URL: {channel_url}")

        import subprocess
        from karaoke_downloader.youtube_utils import _parse_yt_dlp_command

        cmd = [
            yt_dlp_path,
        # First, let's get the total count to show progress
        count_cmd = _parse_yt_dlp_command(yt_dlp_path) + [
            "--flat-playlist",
            "--print",
            "%(title)s",
            "--playlist-end",
            "1",  # Just get first video to test
            channel_url,
        ]

        try:
            print(f" 🔍 Testing channel access...")
            test_result = subprocess.run(count_cmd, capture_output=True, text=True, timeout=30)
            if test_result.returncode == 0:
                print(f" ✅ Channel is accessible")
            else:
                print(f" ⚠️ Channel test failed: {test_result.stderr}")
        except subprocess.TimeoutExpired:
            print(f" ⚠️ Channel test timed out")
        except Exception as e:
            print(f" ⚠️ Channel test error: {e}")

        # Now fetch all videos with progress indicators
        cmd = _parse_yt_dlp_command(yt_dlp_path) + [
            "--flat-playlist",
            "--print",
            "%(title)s|%(id)s|%(url)s",
            "--verbose",  # Add verbose output to see what's happening
            channel_url,
        ]

        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            print(f" 🔧 Running yt-dlp command: {' '.join(cmd)}")
            print(f" 📥 Starting video list download...")

            # Use a timeout and show progress
            result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=300)
            lines = result.stdout.strip().splitlines()

            # Save raw output for debugging
            raw_output_file = self._get_channel_cache_file(channel_id).parent / f"{channel_id}_raw_output.txt"
            try:
                with open(raw_output_file, 'w', encoding='utf-8') as f:
                    f.write(f"# Raw yt-dlp output for {channel_id}\n")
                    f.write(f"# Channel URL: {channel_url}\n")
                    f.write(f"# Command: {' '.join(cmd)}\n")
                    f.write(f"# Timestamp: {datetime.now().isoformat()}\n")
                    f.write(f"# Total lines: {len(lines)}\n")
                    f.write("#" * 80 + "\n\n")
                    for i, line in enumerate(lines, 1):
                        f.write(f"{i:6d}: {line}\n")
                print(f" 💾 Saved raw output to: {raw_output_file.name}")
            except Exception as e:
                print(f" ⚠️ Could not save raw output: {e}")

            print(f" 📄 Raw output lines: {len(lines)}")
            print(f" 📊 Download completed successfully!")

            # Show some sample lines to understand the format
            if lines:
                print(f" 📋 Sample output format:")
                for i, line in enumerate(lines[:3]):
                    print(f" Line {i+1}: {line[:100]}...")
                if len(lines) > 3:
                    print(f" ... and {len(lines) - 3} more lines")

            videos = []
            for line in lines:
                parts = line.split("|")
                if len(parts) >= 2:
                    title, video_id = parts[0].strip(), parts[1].strip()
            invalid_count = 0

            print(f" 🔍 Processing {len(lines)} video entries...")

            for i, line in enumerate(lines):
                if i % 1000 == 0 and i > 0:  # Progress indicator every 1000 lines
                    print(f" 📊 Processing line {i}/{len(lines)}... ({i/len(lines)*100:.1f}%)")

                # More robust parsing that handles titles with | characters
                # Extract video ID directly from the URL that yt-dlp provides

                # Find the URL and extract video ID from it
                url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
                if not url_match:
                    invalid_count += 1
                    if invalid_count <= 5:
                        print(f" ⚠️ Skipping line with no URL: '{line[:100]}...'")
                    elif invalid_count == 6:
                        print(f" ⚠️ ... and {len(lines) - i - 1} more invalid lines")
                    continue

                # Extract video ID directly from the URL
                video_id = url_match.group(1)

                # Extract title (everything before the video ID in the line)
                title = line[:line.find(video_id)].rstrip('|').strip()

                # Validate video ID
                if video_id and (
                    len(video_id) == 11 and
                    video_id.replace('-', '').replace('_', '').isalnum() and
                    " " not in video_id and
                    "Lyrics" not in video_id and
                    "KARAOKE" not in video_id.upper() and
                    "Vocal" not in video_id and
                    "Guide" not in video_id
                ):
                    videos.append({"title": title, "id": video_id})
            self.cache[cache_key] = videos
            self.save_cache()
                else:
                    invalid_count += 1
                    if invalid_count <= 5:  # Only show first 5 invalid IDs
                        print(f" ⚠️ Skipping invalid video ID: '{video_id}' for title: '{title[:50]}...'")
                    elif invalid_count == 6:
                        print(f" ⚠️ ... and {len(lines) - i - 1} more invalid IDs")

            if not videos:
                print(f" ❌ No valid videos found after parsing")
                return []

            print(f" ✅ Parsed {len(videos)} valid videos from YouTube")
            print(f" ⚠️ Skipped {invalid_count} invalid video IDs")

            # Save to individual channel cache file
            self._save_channel_cache(channel_id, videos)
            print(f" 💾 Saved cache to: {self._get_channel_cache_file(channel_id).name}")

            return videos

        except subprocess.TimeoutExpired:
            print(f"❌ yt-dlp timed out after 5 minutes - channel may be too large")
            return []
        except subprocess.CalledProcessError as e:
            print(f"❌ yt-dlp failed to fetch playlist for cache: {e}")
            print(f" 📄 stderr: {e.stderr}")
            return []

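The length-and-charset checks above can be collapsed into a single regular expression: a canonical YouTube video ID is exactly 11 characters drawn from letters, digits, `-`, and `_`. A hedged sketch of an equivalent validator (the function name is illustrative; note it covers only the structural checks, while the substring blacklist in the diff additionally guards against title fragments that leaked into the ID field):

```python
import re

# Canonical YouTube video IDs: exactly 11 URL-safe base64-style characters.
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")

def is_valid_video_id(video_id: str) -> bool:
    """True if the string is structurally a plausible YouTube video ID."""
    return bool(_VIDEO_ID_RE.fullmatch(video_id))
```

Using `fullmatch` rather than `search` rejects strings that merely contain an 11-character run, such as a song title with the ID embedded in it.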
@@ -106,6 +106,10 @@ def download_single_video(
    print(f"⬇️ Downloading: {artist} - {title} -> {output_path}")

    video_url = f"https://www.youtube.com/watch?v={video_id}"

    # Debug: Show the video_id and URL being used
    print(f"🔍 DEBUG: video_id = '{video_id}'")
    print(f"🔍 DEBUG: video_url = '{video_url}'")

    # Build command using centralized utility
    cmd = build_yt_dlp_command(yt_dlp_path, video_url, output_path, config)
@@ -255,7 +259,7 @@ def execute_download_plan(
        video_id = item["video_id"]
        video_title = item["video_title"]

        print(f"\n⬇️ Downloading {len(download_plan) - idx} of {total_to_download}:")
        print(f"\n⬇️ Downloading {downloaded_count + 1} of {total_to_download}:")
        print(f" 📋 Songlist: {artist} - {title}")
        print(f" 🎬 Video: {video_title} ({channel_name})")
        if "match_score" in item:

@@ -9,6 +9,19 @@ from typing import Any, Dict, List, Optional, Union
from karaoke_downloader.config_manager import AppConfig


def _parse_yt_dlp_command(yt_dlp_path: str) -> List[str]:
    """
    Parse yt-dlp path/command into a list of command arguments.
    Handles both file paths and command strings like 'python3 -m yt_dlp'.
    """
    if yt_dlp_path.startswith(('python', 'python3')):
        # It's a Python module command
        return yt_dlp_path.split()
    else:
        # It's a file path
        return [yt_dlp_path]

def get_channel_info(
    channel_url: str, yt_dlp_path: str = "downloader/yt-dlp.exe"
) -> tuple[str, str]:
@@ -43,7 +56,7 @@ def get_playlist_info(
) -> List[Dict[str, Any]]:
    """Get playlist information using yt-dlp."""
    try:
        cmd = [yt_dlp_path, "--dump-json", "--flat-playlist", playlist_url]
        cmd = _parse_yt_dlp_command(yt_dlp_path) + ["--dump-json", "--flat-playlist", playlist_url]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        videos = []
        for line in result.stdout.strip().split("\n"):
@@ -75,8 +88,7 @@ def build_yt_dlp_command(
    Returns:
        List of command arguments for subprocess.run
    """
    cmd = [
        str(yt_dlp_path),
    cmd = _parse_yt_dlp_command(yt_dlp_path) + [
        "--no-check-certificates",
        "--ignore-errors",
        "--no-warnings",
@@ -128,7 +140,7 @@ def show_available_formats(
        timeout: Timeout in seconds
    """
    print(f"🔍 Checking available formats for: {video_url}")
    format_cmd = [str(yt_dlp_path), "--list-formats", video_url]
    format_cmd = _parse_yt_dlp_command(yt_dlp_path) + ["--list-formats", video_url]
    try:
        format_result = subprocess.run(
            format_cmd, capture_output=True, text=True, timeout=timeout

220
setup_macos.py
Normal file
@@ -0,0 +1,220 @@
#!/usr/bin/env python3
"""
macOS setup script for Karaoke Video Downloader.
This script helps users set up yt-dlp and FFmpeg on macOS.
"""

import os
import sys
import subprocess
from pathlib import Path


def check_ffmpeg():
    """Check if FFmpeg is installed."""
    try:
        result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, timeout=10)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False


def check_yt_dlp():
    """Check if yt-dlp is installed via pip or binary."""
    # Check pip installation
    try:
        result = subprocess.run([sys.executable, "-m", "yt_dlp", "--version"],
                                capture_output=True, text=True, timeout=10)
        if result.returncode == 0:
            return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        pass

    # Check binary file
    binary_path = Path("downloader/yt-dlp_macos")
    if binary_path.exists():
        try:
            result = subprocess.run([str(binary_path), "--version"],
                                    capture_output=True, text=True, timeout=10)
            return result.returncode == 0
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            pass

    return False

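The two check functions above shell out and inspect return codes. For PATH-installed tools, `shutil.which` offers a cheaper first-pass probe before falling back to an actual `--version` invocation; a sketch under that assumption (not the script's actual logic):

```python
import shutil

def tool_on_path(name: str) -> bool:
    """True if an executable with this name is resolvable via PATH."""
    return shutil.which(name) is not None

# e.g. probe tool_on_path("ffmpeg") before spawning `ffmpeg -version`
```

This avoids process startup cost for the common "not installed at all" case, though it cannot detect a broken binary, so the return-code check is still the authoritative test.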
def install_ffmpeg():
|
||||
"""Install FFmpeg via Homebrew."""
|
||||
print("🎬 Installing FFmpeg...")
|
||||
|
||||
# Check if Homebrew is installed
|
||||
try:
|
||||
subprocess.run(["brew", "--version"], capture_output=True, check=True)
|
||||
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||
print("❌ Homebrew is not installed. Please install Homebrew first:")
|
||||
print(" /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"")
|
||||
return False
|
||||
|
||||
try:
|
||||
print("🍺 Installing FFmpeg via Homebrew...")
|
||||
result = subprocess.run(["brew", "install", "ffmpeg"],
|
||||
capture_output=True, text=True, check=True)
|
||||
print("✅ FFmpeg installed successfully!")
|
||||
return True
|
||||
except subprocess.CalledProcessError as e:
|
||||
print(f"❌ Failed to install FFmpeg: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def download_yt_dlp_binary():
|
||||
"""Download yt-dlp binary for macOS."""
|
||||
print("📥 Downloading yt-dlp binary for macOS...")
|
||||
|
||||
# Create downloader directory if it doesn't exist
|
||||
downloader_dir = Path("downloader")
|
||||
downloader_dir.mkdir(exist_ok=True)
|
||||
|
||||
# Download yt-dlp binary
|
||||
binary_path = downloader_dir / "yt-dlp_macos"
|
||||
url = "https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos"
|
||||
|
||||
try:
|
||||
print(f"📡 Downloading from: {url}")
|
||||
result = subprocess.run(["curl", "-L", "-o", str(binary_path), url],
|
||||
capture_output=True, text=True, check=True)
|
||||
|
||||
# Make it executable
|
||||
binary_path.chmod(0o755)
|
||||
print(f"✅ yt-dlp binary downloaded to: {binary_path}")
|
||||
|
||||
# Test the binary
|
||||
test_result = subprocess.run([str(binary_path), "--version"],
|
||||
capture_output=True, text=True, timeout=10)
|
||||
if test_result.returncode == 0:
|
||||
version = test_result.stdout.strip()
|
||||
print(f"✅ Binary test successful! Version: {version}")
|
||||
return True
|
||||
else:
|
||||
print(f"❌ Binary test failed: {test_result.stderr}")
|
||||
return False
|
||||
|
||||
except subprocess.CalledProcessError as e:
|
||||
print(f"❌ Failed to download yt-dlp binary: {e}")
|
||||
return False
|
||||
except Exception as e:
|
||||
print(f"❌ Error downloading binary: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def install_yt_dlp():
|
||||
"""Install yt-dlp via pip."""
|
||||
print("📦 Installing yt-dlp...")
|
||||
|
||||
try:
|
||||
result = subprocess.run([sys.executable, "-m", "pip", "install", "yt-dlp"],
|
||||
capture_output=True, text=True, check=True)
|
||||
print("✅ yt-dlp installed successfully!")
|
||||
return True
|
||||
except subprocess.CalledProcessError as e:
|
||||
print(f"❌ Failed to install yt-dlp: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def test_installation():
|
||||
"""Test the installation."""
|
||||
print("\n🧪 Testing installation...")
|
||||
|
||||
# Test FFmpeg
|
||||
if check_ffmpeg():
|
||||
print("✅ FFmpeg is working!")
|
||||
else:
|
||||
print("❌ FFmpeg is not working")
|
||||
return False
|
||||
|
||||
# Test yt-dlp
|
||||
if check_yt_dlp():
|
||||
print("✅ yt-dlp is working!")
|
||||
else:
|
||||
print("❌ yt-dlp is not working")
|
||||
return False
|
||||
|
||||
        return True


def main():
    print("🍎 macOS Setup for Karaoke Video Downloader")
    print("=" * 50)

    # Check current status
    print("🔍 Checking current installation...")
    ffmpeg_installed = check_ffmpeg()
    yt_dlp_installed = check_yt_dlp()

    print(f"FFmpeg: {'✅ Installed' if ffmpeg_installed else '❌ Not installed'}")
    print(f"yt-dlp: {'✅ Installed' if yt_dlp_installed else '❌ Not installed'}")

    if ffmpeg_installed and yt_dlp_installed:
        print("\n🎉 Everything is already installed and working!")
        return

    # Install missing components
    print("\n🚀 Installing missing components...")

    # Install FFmpeg if needed
    if not ffmpeg_installed:
        print("\n🎬 FFmpeg Installation Options:")
        print("1. Install via Homebrew (recommended)")
        print("2. Download from ffmpeg.org")
        print("3. Skip FFmpeg installation")

        choice = input("\nChoose an option (1-3): ").strip()

        if choice == "1":
            if not install_ffmpeg():
                print("❌ FFmpeg installation failed")
                return
        elif choice == "2":
            print("📥 Please download FFmpeg from: https://ffmpeg.org/download.html")
            print("   Extract and add to your PATH, then run this script again.")
            return
        elif choice == "3":
            print("⚠️ FFmpeg is required for video processing. Some features may not work.")
        else:
            print("❌ Invalid choice")
            return

    # Install yt-dlp if needed
    if not yt_dlp_installed:
        print("\n📦 yt-dlp Installation Options:")
        print("1. Install via pip (recommended)")
        print("2. Download binary file")
        print("3. Skip yt-dlp installation")

        choice = input("\nChoose an option (1-3): ").strip()

        if choice == "1":
            if not install_yt_dlp():
                print("❌ yt-dlp installation failed")
                return
        elif choice == "2":
            if not download_yt_dlp_binary():
                print("❌ yt-dlp binary download failed")
                return
        elif choice == "3":
            print("❌ yt-dlp is required for video downloading.")
            return
        else:
            print("❌ Invalid choice")
            return

    # Test installation
    if test_installation():
        print("\n🎉 Setup completed successfully!")
        print("You can now use the Karaoke Video Downloader on macOS.")
        print("Run: python download_karaoke.py --help")
    else:
        print("\n❌ Setup failed. Please check the error messages above.")


if __name__ == "__main__":
    main()
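The setup flow above calls `check_ffmpeg()` and `check_yt_dlp()` helpers defined earlier in the file (not shown in this diff). A minimal sketch of how such a check can work, assuming a PATH lookup plus a `--version` smoke test is sufficient (the helper name `tool_available` is hypothetical, not from the repo):

```python
import shutil
import subprocess

def tool_available(name: str) -> bool:
    """Return True if `name` resolves on PATH and actually executes."""
    if shutil.which(name) is None:
        return False
    try:
        # Quick smoke test that the binary runs at all
        subprocess.run([name, "--version"], capture_output=True, check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        return False

print(f"sh available: {tool_available('sh')}")
```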
utilities/add_manual_video.py (new file, 198 lines)
@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
Helper script to add manual videos to the manual videos collection.
"""

import json
import re
from pathlib import Path
from typing import Dict, List, Optional

from karaoke_downloader.data_path_manager import get_data_path_manager


def extract_video_id(url: str) -> Optional[str]:
    """Extract video ID from YouTube URL."""
    patterns = [
        r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
        r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})'
    ]

    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None


def add_manual_video(title: str, url: str, manual_file: str = None):
    """
    Add a manual video to the collection.

    Args:
        title: Video title (e.g., "Artist - Song (Karaoke Version)")
        url: YouTube URL
        manual_file: Path to manual videos JSON file (defaults to the managed data path)
    """
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    # Load existing data or create new
    if manual_path.exists():
        with open(manual_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    else:
        data = {
            "channel_name": "@ManualVideos",
            "channel_url": "manual://static",
            "description": "Manual collection of individual karaoke videos",
            "videos": [],
            "parsing_rules": {
                "format": "artist_title_separator",
                "separator": " - ",
                "artist_first": True,
                "title_cleanup": {
                    "remove_suffix": {
                        "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
                    }
                }
            }
        }

    # Extract video ID
    video_id = extract_video_id(url)
    if not video_id:
        print(f"❌ Could not extract video ID from URL: {url}")
        return False

    # Check if video already exists
    existing_ids = [video.get("id") for video in data["videos"]]
    if video_id in existing_ids:
        print(f"⚠️ Video already exists: {title}")
        return False

    # Add new video
    new_video = {
        "title": title,
        "url": url,
        "id": video_id,
        "upload_date": "2024-01-01",  # Default date
        "duration": 180,  # Default duration
        "view_count": 1000  # Default view count
    }

    data["videos"].append(new_video)

    # Save updated data
    manual_path.parent.mkdir(parents=True, exist_ok=True)
    with open(manual_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(f"✅ Added video: {title}")
    print(f"   URL: {url}")
    print(f"   ID: {video_id}")
    return True


def list_manual_videos(manual_file: str = None):
    """List all manual videos."""
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    print(f"📋 Manual Videos ({len(data['videos'])} videos):")
    print("=" * 60)

    for i, video in enumerate(data['videos'], 1):
        print(f"{i:2d}. {video['title']}")
        print(f"    URL: {video['url']}")
        print(f"    ID: {video['id']}")
        print()


def remove_manual_video(video_id: str, manual_file: str = None):
    """Remove a manual video by ID."""
    if manual_file is None:
        manual_file = str(get_data_path_manager().get_manual_videos_path())
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return False

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Find and remove video
    for i, video in enumerate(data['videos']):
        if video['id'] == video_id:
            removed_video = data['videos'].pop(i)
            with open(manual_path, 'w', encoding='utf-8') as f:
                json.dump(data, f, indent=2, ensure_ascii=False)
            print(f"✅ Removed video: {removed_video['title']}")
            return True

    print(f"❌ Video with ID '{video_id}' not found")
    return False


def main():
    """Interactive mode for adding manual videos."""
    print("🎤 Manual Video Manager")
    print("=" * 30)
    print("1. Add video")
    print("2. List videos")
    print("3. Remove video")
    print("4. Exit")

    while True:
        choice = input("\nSelect option (1-4): ").strip()

        if choice == "1":
            title = input("Enter video title (e.g., 'Artist - Song (Karaoke Version)'): ").strip()
            url = input("Enter YouTube URL: ").strip()

            if title and url:
                add_manual_video(title, url)
            else:
                print("❌ Title and URL are required")

        elif choice == "2":
            list_manual_videos()

        elif choice == "3":
            video_id = input("Enter video ID to remove: ").strip()
            if video_id:
                remove_manual_video(video_id)
            else:
                print("❌ Video ID is required")

        elif choice == "4":
            print("👋 Goodbye!")
            break

        else:
            print("❌ Invalid option")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Command line mode
        if sys.argv[1] == "add" and len(sys.argv) >= 4:
            add_manual_video(sys.argv[2], sys.argv[3])
        elif sys.argv[1] == "list":
            list_manual_videos()
        elif sys.argv[1] == "remove" and len(sys.argv) >= 3:
            remove_manual_video(sys.argv[2])
        else:
            print("Usage:")
            print("  python add_manual_video.py add 'Title' 'URL'")
            print("  python add_manual_video.py list")
            print("  python add_manual_video.py remove VIDEO_ID")
    else:
        # Interactive mode
        main()
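The two regex patterns in `extract_video_id` accept watch, short (`youtu.be`), and embed URLs. A quick standalone check of that extraction (patterns reproduced here for illustration; the example video ID is arbitrary):

```python
import re
from typing import Optional

# Same patterns as utilities/add_manual_video.py
PATTERNS = [
    r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
    r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})',
]

def extract_video_id(url: str) -> Optional[str]:
    for pattern in PATTERNS:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

for url in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/embed/dQw4w9WgXcQ",
    "https://example.com/not-youtube",
):
    print(url, "->", extract_video_id(url))
```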
utilities/build_cache_from_raw.py (new file, 127 lines)
@@ -0,0 +1,127 @@
#!/usr/bin/env python3
"""
Script to build channel cache from raw yt-dlp output file.
This uses the fixed parsing logic to handle titles with | characters.
"""

import json
import re
from datetime import datetime
from pathlib import Path

from karaoke_downloader.data_path_manager import get_data_path_manager


def parse_raw_output_file(raw_file_path):
    """Parse the raw output file and extract valid videos."""
    videos = []
    invalid_count = 0

    print(f"🔍 Parsing raw output file: {raw_file_path}")

    with open(raw_file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Skip header lines (lines starting with #)
    data_lines = [line for line in lines if not line.strip().startswith('#') and line.strip()]

    print(f"📄 Found {len(data_lines)} data lines to process")

    for i, line in enumerate(data_lines):
        if i % 1000 == 0 and i > 0:  # Progress indicator every 1000 lines
            print(f"📊 Processing line {i}/{len(data_lines)}... ({i/len(data_lines)*100:.1f}%)")

        # Remove line number prefix (e.g., "  1234: ")
        line = re.sub(r'^\s*\d+:\s*', '', line.strip())

        # More robust parsing that handles titles with | characters:
        # extract the video ID directly from the URL that yt-dlp provides.
        url_match = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
        if not url_match:
            invalid_count += 1
            if invalid_count <= 5:
                print(f"⚠️ Skipping line with no URL: '{line[:100]}...'")
            elif invalid_count == 6:
                print(f"⚠️ ... and {len(data_lines) - i - 1} more invalid lines")
            continue

        # Extract video ID directly from the URL
        video_id = url_match.group(1)

        # Extract title (everything before the video ID in the line)
        title = line[:line.find(video_id)].rstrip('|').strip()

        # Validate video ID
        if video_id and (
            len(video_id) == 11 and
            video_id.replace('-', '').replace('_', '').isalnum() and
            " " not in video_id and
            "Lyrics" not in video_id and
            "KARAOKE" not in video_id.upper() and
            "Vocal" not in video_id and
            "Guide" not in video_id
        ):
            videos.append({"title": title, "id": video_id})
        else:
            invalid_count += 1
            if invalid_count <= 5:  # Only show first 5 invalid IDs
                print(f"⚠️ Skipping invalid video ID: '{video_id}' for title: '{title[:50]}...'")
            elif invalid_count == 6:
                print(f"⚠️ ... and {len(data_lines) - i - 1} more invalid IDs")

    print(f"✅ Parsed {len(videos)} valid videos from raw output")
    print(f"⚠️ Skipped {invalid_count} invalid video IDs")

    return videos


def save_cache_file(channel_id, videos, cache_dir=None):
    """Save the parsed videos to a cache file."""
    if cache_dir is None:
        cache_dir = str(get_data_path_manager().get_channel_cache_dir())
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Sanitize channel ID for filename
    safe_channel_id = re.sub(r'[<>:"/\\|?*]', '_', channel_id)
    cache_file = cache_dir / f"{safe_channel_id}.json"

    data = {
        'channel_id': channel_id,
        'videos': videos,
        'last_updated': datetime.now().isoformat(),
        'video_count': len(videos)
    }

    with open(cache_file, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(f"💾 Saved cache to: {cache_file.name}")
    return cache_file


def main():
    """Main function to build cache from raw output."""
    data_path_manager = get_data_path_manager()
    raw_file_path = data_path_manager.get_channel_cache_dir() / "@VocalStarKaraoke_raw_output.txt"

    if not raw_file_path.exists():
        print(f"❌ Raw output file not found: {raw_file_path}")
        return

    # Parse the raw output file
    videos = parse_raw_output_file(raw_file_path)

    if not videos:
        print("❌ No valid videos found")
        return

    # Save to cache file
    channel_id = "@VocalStarKaraoke"
    cache_file = save_cache_file(channel_id, videos)

    print("🎉 Cache build complete!")
    print(f"📊 Total videos in cache: {len(videos)}")
    print(f"📁 Cache file: {cache_file}")


if __name__ == "__main__":
    main()
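The parser above anchors on the YouTube URL rather than splitting on `|`, which is why titles containing `|` survive intact. A self-contained illustration of that logic (the pipe-delimited raw-line layout below is assumed for the example):

```python
import re

def parse_line(line: str):
    """URL-anchored parse of one raw yt-dlp output line, or None if no URL."""
    line = re.sub(r'^\s*\d+:\s*', '', line.strip())  # strip a "1234: " prefix if present
    m = re.search(r'https://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})', line)
    if not m:
        return None
    video_id = m.group(1)
    # Title is everything before the first occurrence of the video ID
    title = line[:line.find(video_id)].rstrip('|').strip()
    return title, video_id

sample = "  7: Title With | Pipes|abcdefghijk|https://www.youtube.com/watch?v=abcdefghijk"
print(parse_line(sample))
```

A naive `line.split('|')` would have mangled this title; anchoring on the video ID sidesteps the delimiter entirely.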
utilities/cleanup_duplicate_files.py (new file, 164 lines)
@@ -0,0 +1,164 @@
#!/usr/bin/env python3
"""
Utility script to identify and clean up duplicate files with (2), (3) suffixes.
This helps clean up files that were created before the duplicate prevention was implemented.
"""

import json
import re
from pathlib import Path
from typing import Dict, List, Tuple


def find_duplicate_files(downloads_dir: str = "downloads") -> Dict[str, List[Tuple[Path, int]]]:
    """
    Find duplicate files with (2), (3), etc. suffixes in the downloads directory.

    Args:
        downloads_dir: Path to downloads directory

    Returns:
        Dictionary mapping base filenames to lists of (file path, suffix number) tuples
    """
    downloads_path = Path(downloads_dir)
    if not downloads_path.exists():
        print(f"❌ Downloads directory not found: {downloads_dir}")
        return {}

    duplicates = {}

    # Scan all MP4 files in the downloads directory
    for mp4_file in downloads_path.rglob("*.mp4"):
        filename = mp4_file.name

        # Check if this is a duplicate file with (2), (3), etc.
        match = re.match(r'^(.+?)\s*\((\d+)\)\.mp4$', filename)
        if match:
            base_name = match.group(1)
            suffix_num = int(match.group(2))

            if base_name not in duplicates:
                duplicates[base_name] = []

            duplicates[base_name].append((mp4_file, suffix_num))

    # Sort duplicates by suffix number
    for base_name in duplicates:
        duplicates[base_name].sort(key=lambda x: x[1])

    return duplicates


def analyze_duplicates(duplicates: Dict[str, List[Tuple[Path, int]]]) -> None:
    """
    Analyze and display information about found duplicates.

    Args:
        duplicates: Dictionary of duplicate files
    """
    if not duplicates:
        print("✅ No duplicate files found!")
        return

    print(f"🔍 Found {len(duplicates)} sets of duplicate files:")
    print()

    total_duplicates = 0
    for base_name, files in duplicates.items():
        print(f"📁 {base_name}")
        for file_path, suffix in files:
            file_size = file_path.stat().st_size / (1024 * 1024)  # MB
            print(f"   ({suffix}) {file_path.name} - {file_size:.1f} MB")
        print()
        total_duplicates += len(files) - 1  # -1 because we keep the original

    print(f"📊 Summary: {len(duplicates)} base files with {total_duplicates} duplicate files")


def cleanup_duplicates(duplicates: Dict[str, List[Tuple[Path, int]]], dry_run: bool = True) -> None:
    """
    Clean up duplicate files, keeping only the first occurrence.

    Args:
        duplicates: Dictionary of duplicate files
        dry_run: If True, only show what would be deleted without actually deleting
    """
    if not duplicates:
        print("✅ No duplicates to clean up!")
        return

    mode = "DRY RUN" if dry_run else "ACTUAL CLEANUP"
    print(f"🧹 Starting {mode}...")
    print()

    total_deleted = 0
    total_size_freed = 0

    for base_name, files in duplicates.items():
        print(f"📁 Processing: {base_name}")

        # Keep the first file (lowest suffix number), delete the rest
        files_to_delete = files[1:]  # Skip the first file

        for file_path, suffix in files_to_delete:
            file_size = file_path.stat().st_size / (1024 * 1024)  # MB

            if dry_run:
                print(f"   🗑️ Would delete: {file_path.name} ({file_size:.1f} MB)")
            else:
                try:
                    file_path.unlink()
                    print(f"   ✅ Deleted: {file_path.name} ({file_size:.1f} MB)")
                    total_deleted += 1
                    total_size_freed += file_size
                except Exception as e:
                    print(f"   ❌ Failed to delete {file_path.name}: {e}")

        print()

    if dry_run:
        would_delete = len([f for files in duplicates.values() for f in files[1:]])
        print(f"📊 DRY RUN SUMMARY: Would delete {would_delete} files")
    else:
        print(f"📊 CLEANUP SUMMARY: Deleted {total_deleted} files, freed {total_size_freed:.1f} MB")


def main():
    """Main function to run the duplicate file cleanup."""
    print("🎵 Karaoke Video Downloader - Duplicate File Cleanup")
    print("=" * 50)
    print()

    # Find duplicates
    duplicates = find_duplicate_files()

    if not duplicates:
        print("✅ No duplicate files found!")
        return

    # Analyze duplicates
    analyze_duplicates(duplicates)
    print()

    # Ask user what to do
    while True:
        print("Options:")
        print("1. Dry run (show what would be deleted)")
        print("2. Actually delete duplicate files")
        print("3. Exit without doing anything")

        choice = input("\nEnter your choice (1-3): ").strip()

        if choice == "1":
            cleanup_duplicates(duplicates, dry_run=True)
            break
        elif choice == "2":
            confirm = input("⚠️ Are you sure you want to delete duplicate files? (yes/no): ").strip().lower()
            if confirm in ["yes", "y"]:
                cleanup_duplicates(duplicates, dry_run=False)
            else:
                print("❌ Cleanup cancelled.")
            break
        elif choice == "3":
            print("❌ Exiting without cleanup.")
            break
        else:
            print("❌ Invalid choice. Please enter 1, 2, or 3.")


if __name__ == "__main__":
    main()
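The suffix regex used by `find_duplicate_files` is worth seeing in isolation: the non-greedy `(.+?)` means parentheses inside the base name don't confuse it, and only a trailing `(N)` right before `.mp4` counts as a duplicate marker. A small demonstration (sample filenames are invented):

```python
import re

# Same suffix pattern as cleanup_duplicate_files.py: "Name (2).mp4", "Name (3).mp4", ...
DUP_RE = re.compile(r'^(.+?)\s*\((\d+)\)\.mp4$')

for name in ["Song (2).mp4", "Song (3).mp4", "Song.mp4", "Band (Live) Song (2).mp4"]:
    m = DUP_RE.match(name)
    print(name, "->", (m.group(1), int(m.group(2))) if m else "original")
```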
@@ -2,7 +2,11 @@ import json
 from pathlib import Path
 from datetime import datetime, time
 
-def cleanup_recent_tracking(tracking_path="data/songlist_tracking.json", cutoff_time_str="11:00"):
+from karaoke_downloader.data_path_manager import get_data_path_manager
+
+def cleanup_recent_tracking(tracking_path=None, cutoff_time_str="11:00"):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
     """Remove entries from songlist_tracking.json that were added after the specified time today."""
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
utilities/fix_artist_name_format.py (new file, 465 lines)
@@ -0,0 +1,465 @@
#!/usr/bin/env python3
"""
Fix artist name formatting for Let's Sing Karaoke channel.

This script specifically targets the "Last Name, First Name" format and converts it to
"First Name Last Name" format in ID3 tags. It only processes entries where there is exactly one comma
followed by exactly 2 words, to avoid affecting multi-artist entries.

Usage:
    python fix_artist_name_format.py --preview   # Show what would be changed
    python fix_artist_name_format.py --apply     # Actually make the changes
    python fix_artist_name_format.py --external "D:\Karaoke\Karaoke\MP4\Let's Sing Karaoke"  # Use external directory
"""

import json
import os
import re
import shutil
import argparse
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Try to import mutagen for ID3/MP4 tag manipulation
try:
    from mutagen.mp4 import MP4
    MUTAGEN_AVAILABLE = True
except ImportError:
    MUTAGEN_AVAILABLE = False
    print("⚠️ mutagen not available - install with: pip install mutagen")


def is_lastname_firstname_format(artist_name: str) -> bool:
    """
    Check if artist name is in "Last Name, First Name" format.

    Args:
        artist_name: The artist name to check

    Returns:
        True if the name matches "Last Name, First Name" format with exactly 2 words after the comma
    """
    if ',' not in artist_name:
        return False

    # Split by comma
    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return False

    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Check if there are exactly 2 words after the comma
    words_after_comma = first_name_part.split()
    if len(words_after_comma) != 2:
        return False

    # Additional check: make sure it's not a multi-artist entry.
    # If there are more than 4 words total in the artist name, it might be multi-artist.
    total_words = len(artist_name.split())
    if total_words > 4:  # "Last, First Middle" (4 words max for a single artist)
        return False

    return True


def convert_to_firstname_lastname(artist_name: str) -> str:
    """
    Convert "Last Name, First Name" to "First Name Last Name".

    Args:
        artist_name: Artist name in "Last Name, First Name" format

    Returns:
        Artist name in "First Name Last Name" format
    """
    parts = artist_name.split(',', 1)
    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Split the first name part into words
    words = first_name_part.split()
    if len(words) == 2:
        first_name = words[0]
        middle_name = words[1]
        return f"{first_name} {middle_name} {last_name}"
    else:
        # Fallback - just reverse the parts
        return f"{first_name_part} {last_name}"
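Applied together, the two helpers above detect and then reverse a comma-inverted name. The core move can be sketched in isolation like this (a simplified reimplementation for illustration, mirroring only the reversal step, not the multi-artist guards):

```python
def swap_name(artist: str) -> str:
    """'Last, First Middle' -> 'First Middle Last' (simplified sketch)."""
    last, _, rest = artist.partition(',')
    return f"{rest.strip()} {last.strip()}"

print(swap_name("Presley, Elvis Aaron"))
```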
def extract_artist_title_from_filename(filename: str) -> Tuple[str, str]:
    """
    Extract artist and title from a filename.

    Args:
        filename: MP4 filename (without extension)

    Returns:
        Tuple of (artist, title)
    """
    # Remove .mp4 extension
    if filename.endswith('.mp4'):
        filename = filename[:-4]

    # Look for " - " separator
    if " - " in filename:
        parts = filename.split(" - ", 1)
        return parts[0].strip(), parts[1].strip()

    return "", filename


def update_id3_tags(file_path: str, new_artist: str, apply_changes: bool = False) -> bool:
    """
    Update the ID3 tags in an MP4 file.

    Args:
        file_path: Path to the MP4 file
        new_artist: New artist name to set
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        True if successful, False otherwise
    """
    if not MUTAGEN_AVAILABLE:
        print(f"⚠️ mutagen not available - cannot update ID3 tags for {file_path}")
        return False

    try:
        mp4 = MP4(file_path)

        if apply_changes:
            # Update the artist tag (MP4 text tag values are lists)
            mp4["\xa9ART"] = [new_artist]
            mp4.save()
            print(f"📝 Updated ID3 tag: {os.path.basename(file_path)} → Artist: '{new_artist}'")
        else:
            # Just preview what would be changed
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"
            print(f"📝 Would update ID3 tag: {os.path.basename(file_path)} → Artist: '{current_artist}' → '{new_artist}'")

        return True

    except Exception as e:
        print(f"❌ Failed to update ID3 tags for {file_path}: {e}")
        return False


def scan_external_directory(directory_path: str) -> List[Dict]:
    """
    Scan external directory for MP4 files with "Last Name, First Name" format in ID3 tags.

    Args:
        directory_path: Path to the external directory

    Returns:
        List of files that need ID3 tag updates
    """
    if not os.path.exists(directory_path):
        print(f"❌ Directory not found: {directory_path}")
        return []

    if not MUTAGEN_AVAILABLE:
        print("❌ mutagen not available - cannot scan ID3 tags")
        return []

    files_to_update = []

    # Scan for MP4 files
    for file_path in Path(directory_path).glob("*.mp4"):
        try:
            mp4 = MP4(str(file_path))
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"

            if current_artist and is_lastname_firstname_format(current_artist):
                new_artist = convert_to_firstname_lastname(current_artist)

                files_to_update.append({
                    'file_path': str(file_path),
                    'filename': file_path.name,
                    'old_artist': current_artist,
                    'new_artist': new_artist
                })

        except Exception as e:
            print(f"⚠️ Could not read ID3 tags from {file_path.name}: {e}")

    return files_to_update


def update_tracking_file(tracking_file: str, channel_name: str = "Let's Sing Karaoke", apply_changes: bool = False) -> Tuple[int, List[Dict]]:
    """
    Update the karaoke tracking file to fix artist name formatting.

    Args:
        tracking_file: Path to the tracking JSON file
        channel_name: Channel name to target (default: Let's Sing Karaoke)
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Tuple of (number of changes made, list of changed entries)
    """
    if not os.path.exists(tracking_file):
        print(f"❌ Tracking file not found: {tracking_file}")
        return 0, []

    # Load the tracking data
    with open(tracking_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    changes_made = 0
    changed_entries = []

    # Process songs
    for song_key, song_data in data.get('songs', {}).items():
        if song_data.get('channel_name') != channel_name:
            continue

        artist = song_data.get('artist', '')
        if not artist or not is_lastname_firstname_format(artist):
            continue

        # Convert the artist name
        new_artist = convert_to_firstname_lastname(artist)

        if apply_changes:
            # Update the tracking data
            song_data['artist'] = new_artist

            # Update the video title if it exists and contains the old artist name
            video_title = song_data.get('video_title', '')
            if video_title and artist in video_title:
                song_data['video_title'] = video_title.replace(artist, new_artist)

            # Update the file path if it exists
            file_path = song_data.get('file_path', '')
            if file_path and artist in file_path:
                song_data['file_path'] = file_path.replace(artist, new_artist)

        changes_made += 1
        changed_entries.append({
            'song_key': song_key,
            'old_artist': artist,
            'new_artist': new_artist,
            'title': song_data.get('title', ''),
            'file_path': song_data.get('file_path', '')
        })

        print(f"🔄 {'Updated' if apply_changes else 'Would update'}: '{artist}' → '{new_artist}' ({song_data.get('title', '')})")

    # Save the updated data
    if apply_changes and changes_made > 0:
        # Create backup
        backup_file = f"{tracking_file}.backup"
        shutil.copy2(tracking_file, backup_file)
        print(f"💾 Created backup: {backup_file}")

        # Save updated file
        with open(tracking_file, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"💾 Updated tracking file: {tracking_file}")

    return changes_made, changed_entries


def update_songlist_tracking(songlist_file: str, channel_name: str = "Let's Sing Karaoke", apply_changes: bool = False) -> Tuple[int, List[Dict]]:
    """
    Update the songlist tracking file to fix artist name formatting.

    Args:
        songlist_file: Path to the songlist tracking JSON file
        channel_name: Channel name to target (default: Let's Sing Karaoke)
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Tuple of (number of changes made, list of changed entries)
    """
    if not os.path.exists(songlist_file):
        print(f"❌ Songlist tracking file not found: {songlist_file}")
        return 0, []

    # Load the songlist data
    with open(songlist_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    changes_made = 0
    changed_entries = []

    # Process songlist entries
    for song_key, song_data in data.items():
        artist = song_data.get('artist', '')
        if not artist or not is_lastname_firstname_format(artist):
            continue

        # Convert the artist name
        new_artist = convert_to_firstname_lastname(artist)

        if apply_changes:
            # Update the songlist data
            song_data['artist'] = new_artist

        changes_made += 1
        changed_entries.append({
            'song_key': song_key,
            'old_artist': artist,
            'new_artist': new_artist,
            'title': song_data.get('title', '')
        })

        print(f"🔄 {'Updated' if apply_changes else 'Would update'} songlist: '{artist}' → '{new_artist}' ({song_data.get('title', '')})")

    # Save the updated data
    if apply_changes and changes_made > 0:
        # Create backup
        backup_file = f"{songlist_file}.backup"
        shutil.copy2(songlist_file, backup_file)
        print(f"💾 Created backup: {backup_file}")

        # Save updated file
        with open(songlist_file, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"💾 Updated songlist file: {songlist_file}")

    return changes_made, changed_entries


def update_id3_tags_for_files(files_to_update: List[Dict], apply_changes: bool = False) -> int:
    """
    Update ID3 tags for a list of files.

    Args:
        files_to_update: List of files to update
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Number of files successfully updated
    """
    updated_count = 0

    for file_info in files_to_update:
        file_path = file_info['file_path']
        new_artist = file_info['new_artist']

        if update_id3_tags(file_path, new_artist, apply_changes):
            updated_count += 1

    return updated_count


def main():
    """Main function to run the artist name fix script."""
    parser = argparse.ArgumentParser(description="Fix artist name formatting in ID3 tags for Let's Sing Karaoke")
    parser.add_argument('--preview', action='store_true', help='Show what would be changed without making changes')
    parser.add_argument('--apply', action='store_true', help='Actually apply the changes')
    parser.add_argument('--external', type=str, help='Path to external karaoke directory')

    args = parser.parse_args()

    # Default to preview mode if no action specified
    if not args.preview and not args.apply:
        args.preview = True

    print("🎤 Artist Name Format Fix Script (ID3 Tags Only)")
    print("=" * 60)
    print("This script will fix 'Last Name, First Name' format to 'First Name Last Name'")
    print("Only targeting Let's Sing Karaoke channel to avoid affecting other channels.")
    print("Focusing on ID3 tags only - filenames will not be changed.")
    print()

    if not MUTAGEN_AVAILABLE:
        print("❌ mutagen library not available!")
        print("Please install it with: pip install mutagen")
        return

    if args.preview:
        print("🔍 PREVIEW MODE - No changes will be made")
    else:
        print("⚡ APPLY MODE - Changes will be made")
    print()

    # File paths
    tracking_file = "data/karaoke_tracking.json"
    songlist_file = "data/songlist_tracking.json"

    # Process external directory if specified
    if args.external:
        print(f"📁 Scanning external directory: {args.external}")
        external_files = scan_external_directory(args.external)

        if external_files:
            print(f"\n📋 Found {len(external_files)} files with 'Last Name, First Name' format in ID3 tags:")
            for file_info in external_files:
                print(f"  • {file_info['filename']}: '{file_info['old_artist']}' → '{file_info['new_artist']}'")

            if args.apply:
                print("\n📝 Updating ID3 tags in external files...")
                updated_count = update_id3_tags_for_files(external_files, apply_changes=True)
                print(f"✅ Updated ID3 tags in {updated_count} external files")
            else:
                print(f"\n📝 Would update ID3 tags in {len(external_files)} external files")
        else:
            print("✅ No files with 'Last Name, First Name' format found in ID3 tags")

    # Process tracking files (only if they exist in current project)
    if os.path.exists(tracking_file):
        print("\n📊 Processing karaoke tracking file...")
        tracking_changes, tracking_entries = update_tracking_file(tracking_file, apply_changes=args.apply)
    else:
        print(f"\n⚠️ Tracking file not found: {tracking_file}")
        tracking_changes = 0

    if os.path.exists(songlist_file):
        print("\n📊 Processing songlist tracking file...")
        songlist_changes, songlist_entries = update_songlist_tracking(songlist_file, apply_changes=args.apply)
    else:
        print(f"\n⚠️ Songlist tracking file not found: {songlist_file}")
        songlist_changes = 0

    # Process local downloads directory ID3 tags
    downloads_dir = "downloads"
    local_id3_updates = 0
    if os.path.exists(downloads_dir) and tracking_changes > 0:
        print("\n📝 Processing ID3 tags in local downloads directory...")
        # Scan local downloads for files that need ID3 tag updates
        local_files = []
        for entry in tracking_entries:
            file_path = entry.get('file_path', '')
            if file_path and os.path.exists(file_path.replace('\\', '/')):
                local_files.append({
                    'file_path': file_path.replace('\\', '/'),
                    'filename': os.path.basename(file_path),
                    'old_artist': entry['old_artist'],
                    'new_artist': entry['new_artist']
                })

        if local_files:
|
||||
local_id3_updates = update_id3_tags_for_files(local_files, apply_changes=args.apply)
|
||||
|
||||
total_changes = tracking_changes + songlist_changes
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
print("📋 Summary:")
|
||||
print(f" • Tracking file changes: {tracking_changes}")
|
||||
print(f" • Songlist file changes: {songlist_changes}")
|
||||
print(f" • Local ID3 tag updates: {local_id3_updates}")
|
||||
print(f" • Total changes: {total_changes}")
|
||||
|
||||
if args.external:
|
||||
external_count = len(scan_external_directory(args.external)) if args.preview else len(external_files)
|
||||
print(f" • External ID3 tag updates: {external_count}")
|
||||
|
||||
if total_changes > 0 or (args.external and external_count > 0):
|
||||
if args.apply:
|
||||
print("\n✅ Artist name formatting in ID3 tags has been fixed!")
|
||||
print("💾 Backups have been created for all modified files.")
|
||||
print("🔄 You may need to re-run your karaoke downloader to update any cached data.")
|
||||
else:
|
||||
print("\n🔍 Preview complete. Use --apply to make these changes.")
|
||||
else:
|
||||
print("\n✅ No changes needed! All artist names are already in the correct format.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
295  utilities/fix_artist_name_format_simple.py  Normal file
@@ -0,0 +1,295 @@
#!/usr/bin/env python3
"""
Fix artist name formatting for Let's Sing Karaoke channel.

This script specifically targets the "Last Name, First Name" format and converts it to
"First Name Last Name" format in ID3 tags. It only processes entries where there is exactly one comma
followed by one or two words, to avoid affecting multi-artist entries.

Usage:
    python fix_artist_name_format_simple.py --preview   # Show what would be changed
    python fix_artist_name_format_simple.py --apply     # Actually make the changes
    python fix_artist_name_format_simple.py --external "D:\Karaoke\Karaoke\MP4\Let's Sing Karaoke"  # Use external directory
"""

import json
import os
import re
import shutil
import argparse
from pathlib import Path
from typing import Dict, List, Tuple, Optional

# Try to import mutagen for ID3 tag manipulation
try:
    from mutagen.mp4 import MP4
    MUTAGEN_AVAILABLE = True
except ImportError:
    MUTAGEN_AVAILABLE = False
    print("WARNING: mutagen not available - install with: pip install mutagen")


def is_lastname_firstname_format(artist_name: str) -> bool:
    """
    Check if artist name is in "Last Name, First Name" format.

    Args:
        artist_name: The artist name to check

    Returns:
        True if the name matches "Last Name, First Name" format with exactly 1 or 2 words after the comma
    """
    if ',' not in artist_name:
        return False

    # Split by comma
    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return False

    last_name = parts[0].strip()
    first_name_part = parts[1].strip()

    # Check if there are exactly 1 or 2 words after the comma
    words_after_comma = first_name_part.split()
    if len(words_after_comma) not in [1, 2]:
        return False

    # Additional check: make sure it's not a multi-artist entry
    # If there are more than 4 words total in the artist name, it might be multi-artist
    total_words = len(artist_name.split())
    if total_words > 4:  # "Last, First Name" (4 words max for a single artist)
        return False

    return True


def convert_lastname_firstname(artist_name: str) -> str:
    """
    Convert "Last Name, First Name" to "First Name Last Name".

    Args:
        artist_name: The artist name to convert

    Returns:
        The converted artist name
    """
    if ',' not in artist_name:
        return artist_name

    parts = artist_name.split(',', 1)
    if len(parts) != 2:
        return artist_name

    last_name = parts[0].strip()
    first_name = parts[1].strip()

    return f"{first_name} {last_name}"


def process_artist_name(artist_name: str) -> str:
    """
    Process an artist name, handling both single artists and multiple artists separated by "&".

    Args:
        artist_name: The artist name to process

    Returns:
        The processed artist name
    """
    if '&' in artist_name:
        # Split by "&" and process each artist individually
        artists = [artist.strip() for artist in artist_name.split('&')]
        processed_artists = []

        for artist in artists:
            if is_lastname_firstname_format(artist):
                processed_artist = convert_lastname_firstname(artist)
                processed_artists.append(processed_artist)
            else:
                processed_artists.append(artist)

        # Rejoin with "&"
        return ' & '.join(processed_artists)
    else:
        # Single artist
        if is_lastname_firstname_format(artist_name):
            return convert_lastname_firstname(artist_name)
        else:
            return artist_name


def update_id3_tags(file_path: str, new_artist: str, apply_changes: bool = False) -> bool:
    """
    Update the ID3 tags in an MP4 file.

    Args:
        file_path: Path to the MP4 file
        new_artist: New artist name to set
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        True if successful, False otherwise
    """
    if not MUTAGEN_AVAILABLE:
        print(f"WARNING: mutagen not available - cannot update ID3 tags for {file_path}")
        return False

    try:
        mp4 = MP4(file_path)

        if apply_changes:
            # Update the artist tag
            mp4["\xa9ART"] = new_artist
            mp4.save()
            print(f"UPDATED ID3 tag: {os.path.basename(file_path)} -> Artist: '{new_artist}'")
        else:
            # Just preview what would be changed
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"
            print(f"WOULD UPDATE ID3 tag: {os.path.basename(file_path)} -> Artist: '{current_artist}' -> '{new_artist}'")

        return True

    except Exception as e:
        print(f"ERROR: Failed to update ID3 tags for {file_path}: {e}")
        return False


def scan_external_directory(directory_path: str, debug: bool = False) -> List[Dict]:
    """
    Scan external directory for MP4 files with "Last Name, First Name" format in ID3 tags.

    Args:
        directory_path: Path to the external directory
        debug: Whether to show debug information

    Returns:
        List of files that need ID3 tag updates
    """
    if not os.path.exists(directory_path):
        print(f"ERROR: Directory not found: {directory_path}")
        return []

    if not MUTAGEN_AVAILABLE:
        print("ERROR: mutagen not available - cannot scan ID3 tags")
        return []

    files_to_update = []
    total_files = 0
    files_with_artist_tags = 0

    # Scan for MP4 files
    for file_path in Path(directory_path).glob("*.mp4"):
        total_files += 1
        try:
            mp4 = MP4(str(file_path))
            current_artist = mp4.get("\xa9ART", ["Unknown"])[0] if "\xa9ART" in mp4 else "Unknown"

            if current_artist != "Unknown":
                files_with_artist_tags += 1

                if debug:
                    print(f"DEBUG: {file_path.name} -> Artist: '{current_artist}'")

                # Process the artist name to handle multiple artists
                processed_artist = process_artist_name(current_artist)

                if processed_artist != current_artist:
                    files_to_update.append({
                        'file_path': str(file_path),
                        'filename': file_path.name,
                        'old_artist': current_artist,
                        'new_artist': processed_artist
                    })

                    if debug:
                        print(f"DEBUG: MATCH FOUND - {file_path.name}: '{current_artist}' -> '{processed_artist}'")

        except Exception as e:
            if debug:
                print(f"WARNING: Could not read ID3 tags from {file_path.name}: {e}")

    print(f"INFO: Scanned {total_files} MP4 files, {files_with_artist_tags} had artist tags, {len(files_to_update)} need updates")
    return files_to_update


def update_id3_tags_for_files(files_to_update: List[Dict], apply_changes: bool = False) -> int:
    """
    Update ID3 tags for a list of files.

    Args:
        files_to_update: List of files to update
        apply_changes: Whether to actually apply changes or just preview

    Returns:
        Number of files successfully updated
    """
    updated_count = 0

    for file_info in files_to_update:
        file_path = file_info['file_path']
        new_artist = file_info['new_artist']

        if update_id3_tags(file_path, new_artist, apply_changes):
            updated_count += 1

    return updated_count


def main():
    """Main function to run the artist name fix script."""
    parser = argparse.ArgumentParser(description="Fix artist name formatting in ID3 tags for Let's Sing Karaoke")
    parser.add_argument('--preview', action='store_true', help='Show what would be changed without making changes')
    parser.add_argument('--apply', action='store_true', help='Actually apply the changes')
    parser.add_argument('--external', type=str, help='Path to external karaoke directory')
    parser.add_argument('--debug', action='store_true', help='Show debug information')

    args = parser.parse_args()

    # Default to preview mode if no action specified
    if not args.preview and not args.apply:
        args.preview = True

    print("Artist Name Format Fix Script (ID3 Tags Only)")
    print("=" * 60)
    print("This script will fix 'Last Name, First Name' format to 'First Name Last Name'")
    print("Only targeting Let's Sing Karaoke channel to avoid affecting other channels.")
    print("Focusing on ID3 tags only - filenames will not be changed.")
    print()

    if not MUTAGEN_AVAILABLE:
        print("ERROR: mutagen library not available!")
        print("Please install it with: pip install mutagen")
        return

    if args.preview:
        print("PREVIEW MODE - No changes will be made")
    else:
        print("APPLY MODE - Changes will be made")
    print()

    # Process external directory if specified
    if args.external:
        print(f"Scanning external directory: {args.external}")
        external_files = scan_external_directory(args.external, debug=args.debug)

        if external_files:
            print(f"\nFound {len(external_files)} files with 'Last Name, First Name' format in ID3 tags:")
            for file_info in external_files:
                print(f"  * {file_info['filename']}: '{file_info['old_artist']}' -> '{file_info['new_artist']}'")

            if args.apply:
                print(f"\nUpdating ID3 tags in external files...")
                updated_count = update_id3_tags_for_files(external_files, apply_changes=True)
                print(f"SUCCESS: Updated ID3 tags in {updated_count} external files")
            else:
                print(f"\nWould update ID3 tags in {len(external_files)} external files")
        else:
            print("SUCCESS: No files with 'Last Name, First Name' format found in ID3 tags")

    print("\n" + "=" * 60)
    print("Summary complete.")


if __name__ == "__main__":
    main()
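The name-flipping heuristic in this new file is small enough to exercise on its own. The sketch below re-implements the three helpers exactly as they appear in the diff, condensed, so the conversion can be sanity-checked without touching any MP4 files:

```python
def is_lastname_firstname_format(artist_name: str) -> bool:
    # True for "Last, First" with 1-2 words after the comma and <= 4 words total
    if ',' not in artist_name:
        return False
    first_part = artist_name.split(',', 1)[1].strip()
    if len(first_part.split()) not in (1, 2):
        return False
    return len(artist_name.split()) <= 4

def convert_lastname_firstname(artist_name: str) -> str:
    # "Presley, Elvis" -> "Elvis Presley"
    last, first = (p.strip() for p in artist_name.split(',', 1))
    return f"{first} {last}"

def process_artist_name(artist_name: str) -> str:
    # Multi-artist entries joined with "&" are converted piecewise
    if '&' in artist_name:
        return ' & '.join(
            convert_lastname_firstname(a.strip())
            if is_lastname_firstname_format(a.strip()) else a.strip()
            for a in artist_name.split('&')
        )
    if is_lastname_firstname_format(artist_name):
        return convert_lastname_firstname(artist_name)
    return artist_name

print(process_artist_name("Presley, Elvis"))               # Elvis Presley
print(process_artist_name("Cash, Johnny & Carter, June"))  # Johnny Cash & June Carter
print(process_artist_name("The Beatles"))                  # The Beatles
```

Note the heuristic is purely lexical: splitting on "&" can still flip band names such as "Earth, Wind & Fire", which is exactly why the script defaults to preview mode.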
151  utilities/reset_and_redownload.py  Normal file
@@ -0,0 +1,151 @@
#!/usr/bin/env python3
"""
Script to reset karaoke tracking and re-download files with the new channel parser.

This script will:
1. Reset the karaoke_tracking.json to remove all downloaded entries
2. Optionally delete the downloaded files
3. Allow you to re-download with the new channel parser system
"""

import json
import os
import shutil
from pathlib import Path
from typing import List, Dict, Any

from karaoke_downloader.data_path_manager import get_data_path_manager


def reset_karaoke_tracking(tracking_file: str = None) -> None:
    """Reset the karaoke tracking file to empty state."""
    if tracking_file is None:
        tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
    print(f"Resetting {tracking_file}...")

    # Create backup of current tracking
    backup_file = f"{tracking_file}.backup"
    if os.path.exists(tracking_file):
        shutil.copy2(tracking_file, backup_file)
        print(f"Created backup: {backup_file}")

    # Reset to empty state
    empty_tracking = {
        "playlists": {},
        "songs": {}
    }

    with open(tracking_file, 'w', encoding='utf-8') as f:
        json.dump(empty_tracking, f, indent=2, ensure_ascii=False)

    print(f"✅ Reset {tracking_file} to empty state")


def delete_downloaded_files(downloads_dir: str = "downloads") -> None:
    """Delete all downloaded files and folders."""
    if not os.path.exists(downloads_dir):
        print(f"Downloads directory {downloads_dir} does not exist.")
        return

    print(f"Deleting all files in {downloads_dir}...")

    try:
        shutil.rmtree(downloads_dir)
        print(f"✅ Deleted {downloads_dir} directory")
    except Exception as e:
        print(f"❌ Error deleting {downloads_dir}: {e}")


def show_download_stats(tracking_file: str = None) -> None:
    """Show statistics about current downloads."""
    if tracking_file is None:
        tracking_file = str(get_data_path_manager().get_karaoke_tracking_path())
    if not os.path.exists(tracking_file):
        print("No tracking file found.")
        return

    with open(tracking_file, 'r', encoding='utf-8') as f:
        tracking = json.load(f)

    songs = tracking.get("songs", {})
    total_songs = len(songs)

    if total_songs == 0:
        print("No songs in tracking file.")
        return

    # Count by status
    status_counts = {}
    channel_counts = {}

    for song_id, song_data in songs.items():
        status = song_data.get("status", "UNKNOWN")
        channel = song_data.get("channel_name", "UNKNOWN")

        status_counts[status] = status_counts.get(status, 0) + 1
        channel_counts[channel] = channel_counts.get(channel, 0) + 1

    print(f"\n📊 Current Download Statistics:")
    print(f"Total songs: {total_songs}")
    print(f"\nBy Status:")
    for status, count in status_counts.items():
        print(f"  {status}: {count}")

    print(f"\nBy Channel:")
    for channel, count in channel_counts.items():
        print(f"  {channel}: {count}")


def main():
    """Main function to handle reset and re-download process."""
    print("🔄 Karaoke Download Reset and Re-download Tool")
    print("=" * 50)

    # Show current stats
    print("\nCurrent download statistics:")
    show_download_stats()

    # Ask user what they want to do
    print("\nOptions:")
    print("1. Reset tracking only (keep files)")
    print("2. Reset tracking and delete all downloaded files")
    print("3. Show current stats only")
    print("4. Exit")

    choice = input("\nEnter your choice (1-4): ").strip()

    if choice == "1":
        print("\n🔄 Resetting tracking only...")
        reset_karaoke_tracking()
        print("\n✅ Tracking reset complete!")
        print("You can now re-download files with the new channel parser system.")
        print("\nTo re-download, run:")
        print("python download_karaoke.py --file data/channels.txt --limit 50")

    elif choice == "2":
        print("\n🔄 Resetting tracking and deleting files...")
        confirm = input("Are you sure you want to delete ALL downloaded files? (yes/no): ").strip().lower()

        if confirm == "yes":
            reset_karaoke_tracking()
            delete_downloaded_files()
            print("\n✅ Reset complete! All tracking and files have been removed.")
            print("You can now re-download files with the new channel parser system.")
            print("\nTo re-download, run:")
            print("python download_karaoke.py --file data/channels.txt --limit 50")
        else:
            print("Operation cancelled.")

    elif choice == "3":
        print("\n📊 Current statistics:")
        show_download_stats()

    elif choice == "4":
        print("Exiting...")

    else:
        print("Invalid choice. Please enter 1, 2, 3, or 4.")


if __name__ == "__main__":
    main()
@@ -1,11 +1,15 @@
 import json
 from pathlib import Path
 
+from karaoke_downloader.data_path_manager import get_data_path_manager
+
 def normalize_title(title):
     normalized = title.replace("(Karaoke Version)", "").replace("(Karaoke)", "").strip()
     return " ".join(normalized.split()).lower()
 
-def load_songlist(songlist_path="data/songList.json"):
+def load_songlist(songlist_path=None):
+    if songlist_path is None:
+        songlist_path = str(get_data_path_manager().get_songlist_path())
     songlist_file = Path(songlist_path)
     if not songlist_file.exists():
         print(f"⚠️ Songlist file not found: {songlist_path}")
@@ -24,14 +28,18 @@ def load_songlist(songlist_path="data/songList.json"):
         })
     return all_songs
 
-def load_songlist_tracking(tracking_path="data/songlist_tracking.json"):
+def load_songlist_tracking(tracking_path=None):
+    if tracking_path is None:
+        tracking_path = str(get_data_path_manager().get_songlist_tracking_path())
     tracking_file = Path(tracking_path)
     if not tracking_file.exists():
         return {}
     with open(tracking_file, 'r', encoding='utf-8') as f:
         return json.load(f)
 
-def load_server_songs(songs_path="data/songs.json"):
+def load_server_songs(songs_path=None):
+    if songs_path is None:
+        songs_path = str(get_data_path_manager().get_songs_path())
     """Load the list of songs already available on the server."""
     songs_file = Path(songs_path)
     if not songs_file.exists():
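Every loader in this hunk follows the same refactor: the hard-coded `data/...` default in the signature becomes `None`, and the real path is resolved at call time through the path manager. Resolving lazily matters because a default in the signature is evaluated once at import, before platform detection or configuration can run. A minimal sketch of the pattern (the `_PathManager` stub here is hypothetical; only the `get_data_path_manager` name comes from the diff):

```python
from pathlib import Path

# Hypothetical stand-in for karaoke_downloader.data_path_manager
class _PathManager:
    def __init__(self, base: Path):
        self.base = base

    def get_songlist_path(self) -> Path:
        return self.base / "songList.json"

_manager = None

def get_data_path_manager() -> _PathManager:
    # Lazily construct a singleton so the base dir can be configured first
    global _manager
    if _manager is None:
        _manager = _PathManager(Path("data"))
    return _manager

def load_songlist(songlist_path=None):
    # Resolve the default at call time, not in the def line: a signature
    # default is frozen at import, before the manager is configured.
    if songlist_path is None:
        songlist_path = str(get_data_path_manager().get_songlist_path())
    return Path(songlist_path)

print(load_songlist())  # data/songList.json
```

Callers that passed an explicit path keep working unchanged; only the default route goes through the manager, which is what makes the refactor safe to land file by file.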