Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>
This commit is contained in:
parent
21f8348419
commit
ec95b24a69
34
PRD.md
34
PRD.md
@ -1,5 +1,5 @@
|
||||
|
||||
# 🎤 Karaoke Video Downloader – PRD (v3.3)
|
||||
# 🎤 Karaoke Video Downloader – PRD (v3.4.3)
|
||||
|
||||
## ✅ Overview
|
||||
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
|
||||
@ -123,6 +123,8 @@ python download_karaoke.py --clear-cache SingKingKaraoke
|
||||
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
|
||||
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
|
||||
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
|
||||
- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
|
||||
- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.
|
||||
|
||||
---
|
||||
|
||||
@ -155,7 +157,9 @@ KaroakeVideoDownloader/
|
||||
│ ├── karaoke_tracking.json
|
||||
│ ├── songlist_tracking.json
|
||||
│ ├── channel_cache.json
|
||||
│ ├── channels.txt
|
||||
│ ├── channels.json # Channel configuration with parsing rules
|
||||
│ ├── channels.txt # Legacy channel list (backward compatibility)
|
||||
│ ├── manual_videos.json # Manual video collection
|
||||
│ └── songList.json
|
||||
├── downloads/ # All video output
|
||||
│ └── [ChannelName]/ # Per-channel folders
|
||||
@ -192,6 +196,7 @@ KaroakeVideoDownloader/
|
||||
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
|
||||
- `--parallel`: **Enable parallel downloads for improved speed**
|
||||
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
|
||||
- `--manual`: **Download from manual videos collection (data/manual_videos.json)**
|
||||
|
||||
---
|
||||
|
||||
@ -202,6 +207,8 @@ KaroakeVideoDownloader/
|
||||
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
|
||||
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
|
||||
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
|
||||
- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
|
||||
- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via `--manual` parameter.
|
||||
|
||||
## 🔧 Refactoring Improvements (v3.3)
|
||||
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
|
||||
@ -335,6 +342,29 @@ The codebase has been comprehensively refactored to improve maintainability and
|
||||
- Check logs for "⏭️ Skipping download - file already exists" messages
|
||||
- These indicate the duplicate prevention is working correctly
|
||||
|
||||
## 🔧 Recent Bug Fixes & Improvements (v3.4.3)
|
||||
### **Manual Video Collection System**
|
||||
- **New `--manual` parameter**: Simple access to manual video collection via `python download_karaoke.py --manual --limit 5`
|
||||
- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
|
||||
- **Helper script**: `add_manual_video.py` provides easy management of manual video entries
|
||||
- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
|
||||
- **No yt-dlp dependency**: Manual videos bypass YouTube API calls for video listing, using static data instead
|
||||
|
||||
### **Channel-Specific Parsing Rules**
|
||||
- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
|
||||
- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
|
||||
- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
|
||||
- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
|
||||
- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
|
||||
- **Backward compatibility**: Still supports legacy `data/channels.txt` format
|
||||
|
||||
### **Benefits of New Features**
|
||||
- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
|
||||
- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
|
||||
- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
|
||||
- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
|
||||
- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features
|
||||
|
||||
## 📚 Documentation Standards
|
||||
|
||||
### **Documentation Location**
|
||||
|
||||
190
add_manual_video.py
Normal file
190
add_manual_video.py
Normal file
@ -0,0 +1,190 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Helper script to add manual videos to the manual videos collection.
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
def extract_video_id(url: str) -> Optional[str]:
|
||||
"""Extract video ID from YouTube URL."""
|
||||
patterns = [
|
||||
r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
|
||||
r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})'
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
match = re.search(pattern, url)
|
||||
if match:
|
||||
return match.group(1)
|
||||
return None
|
||||
|
||||
def add_manual_video(title: str, url: str, manual_file: str = "data/manual_videos.json"):
|
||||
"""
|
||||
Add a manual video to the collection.
|
||||
|
||||
Args:
|
||||
title: Video title (e.g., "Artist - Song (Karaoke Version)")
|
||||
url: YouTube URL
|
||||
manual_file: Path to manual videos JSON file
|
||||
"""
|
||||
manual_path = Path(manual_file)
|
||||
|
||||
# Load existing data or create new
|
||||
if manual_path.exists():
|
||||
with open(manual_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
else:
|
||||
data = {
|
||||
"channel_name": "@ManualVideos",
|
||||
"channel_url": "manual://static",
|
||||
"description": "Manual collection of individual karaoke videos",
|
||||
"videos": [],
|
||||
"parsing_rules": {
|
||||
"format": "artist_title_separator",
|
||||
"separator": " - ",
|
||||
"artist_first": true,
|
||||
"title_cleanup": {
|
||||
"remove_suffix": {
|
||||
"suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Extract video ID
|
||||
video_id = extract_video_id(url)
|
||||
if not video_id:
|
||||
print(f"❌ Could not extract video ID from URL: {url}")
|
||||
return False
|
||||
|
||||
# Check if video already exists
|
||||
existing_ids = [video.get("id") for video in data["videos"]]
|
||||
if video_id in existing_ids:
|
||||
print(f"⚠️ Video already exists: {title}")
|
||||
return False
|
||||
|
||||
# Add new video
|
||||
new_video = {
|
||||
"title": title,
|
||||
"url": url,
|
||||
"id": video_id,
|
||||
"upload_date": "2024-01-01", # Default date
|
||||
"duration": 180, # Default duration
|
||||
"view_count": 1000 # Default view count
|
||||
}
|
||||
|
||||
data["videos"].append(new_video)
|
||||
|
||||
# Save updated data
|
||||
manual_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(manual_path, 'w', encoding='utf-8') as f:
|
||||
json.dump(data, f, indent=2, ensure_ascii=False)
|
||||
|
||||
print(f"✅ Added video: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" ID: {video_id}")
|
||||
return True
|
||||
|
||||
def list_manual_videos(manual_file: str = "data/manual_videos.json"):
|
||||
"""List all manual videos."""
|
||||
manual_path = Path(manual_file)
|
||||
|
||||
if not manual_path.exists():
|
||||
print("❌ No manual videos file found")
|
||||
return
|
||||
|
||||
with open(manual_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
print(f"📋 Manual Videos ({len(data['videos'])} videos):")
|
||||
print("=" * 60)
|
||||
|
||||
for i, video in enumerate(data['videos'], 1):
|
||||
print(f"{i:2d}. {video['title']}")
|
||||
print(f" URL: {video['url']}")
|
||||
print(f" ID: {video['id']}")
|
||||
print()
|
||||
|
||||
def remove_manual_video(video_id: str, manual_file: str = "data/manual_videos.json"):
|
||||
"""Remove a manual video by ID."""
|
||||
manual_path = Path(manual_file)
|
||||
|
||||
if not manual_path.exists():
|
||||
print("❌ No manual videos file found")
|
||||
return False
|
||||
|
||||
with open(manual_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
# Find and remove video
|
||||
for i, video in enumerate(data['videos']):
|
||||
if video['id'] == video_id:
|
||||
removed_video = data['videos'].pop(i)
|
||||
with open(manual_path, 'w', encoding='utf-8') as f:
|
||||
json.dump(data, f, indent=2, ensure_ascii=False)
|
||||
print(f"✅ Removed video: {removed_video['title']}")
|
||||
return True
|
||||
|
||||
print(f"❌ Video with ID '{video_id}' not found")
|
||||
return False
|
||||
|
||||
def main():
|
||||
"""Interactive mode for adding manual videos."""
|
||||
print("🎤 Manual Video Manager")
|
||||
print("=" * 30)
|
||||
print("1. Add video")
|
||||
print("2. List videos")
|
||||
print("3. Remove video")
|
||||
print("4. Exit")
|
||||
|
||||
while True:
|
||||
choice = input("\nSelect option (1-4): ").strip()
|
||||
|
||||
if choice == "1":
|
||||
title = input("Enter video title (e.g., 'Artist - Song (Karaoke Version)'): ").strip()
|
||||
url = input("Enter YouTube URL: ").strip()
|
||||
|
||||
if title and url:
|
||||
add_manual_video(title, url)
|
||||
else:
|
||||
print("❌ Title and URL are required")
|
||||
|
||||
elif choice == "2":
|
||||
list_manual_videos()
|
||||
|
||||
elif choice == "3":
|
||||
video_id = input("Enter video ID to remove: ").strip()
|
||||
if video_id:
|
||||
remove_manual_video(video_id)
|
||||
else:
|
||||
print("❌ Video ID is required")
|
||||
|
||||
elif choice == "4":
|
||||
print("👋 Goodbye!")
|
||||
break
|
||||
|
||||
else:
|
||||
print("❌ Invalid option")
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
if len(sys.argv) > 1:
|
||||
# Command line mode
|
||||
if sys.argv[1] == "add" and len(sys.argv) >= 4:
|
||||
add_manual_video(sys.argv[2], sys.argv[3])
|
||||
elif sys.argv[1] == "list":
|
||||
list_manual_videos()
|
||||
elif sys.argv[1] == "remove" and len(sys.argv) >= 3:
|
||||
remove_manual_video(sys.argv[2])
|
||||
else:
|
||||
print("Usage:")
|
||||
print(" python add_manual_video.py add 'Title' 'URL'")
|
||||
print(" python add_manual_video.py list")
|
||||
print(" python add_manual_video.py remove VIDEO_ID")
|
||||
else:
|
||||
# Interactive mode
|
||||
main()
|
||||
44
commands.txt
44
commands.txt
@ -1,6 +1,6 @@
|
||||
# 🎤 Karaoke Video Downloader - CLI Commands Reference
|
||||
# Copy and paste these commands into your terminal
|
||||
# Updated: v3.4 (includes parallel downloads and all refactoring improvements)
|
||||
# Updated: v3.4.3 (includes manual video collection, channel parsing rules, and all previous improvements)
|
||||
|
||||
## 📥 BASIC DOWNLOADS
|
||||
|
||||
@ -19,6 +19,32 @@ python download_karaoke.py --limit 10 https://www.youtube.com/@SingKingKaraoke/v
|
||||
# Enable parallel downloads for faster processing (3-5x speedup)
|
||||
python download_karaoke.py --parallel --workers 5 --limit 10 https://www.youtube.com/@SingKingKaraoke/videos
|
||||
|
||||
## 🎤 MANUAL VIDEO COLLECTION (v3.4.3)
|
||||
|
||||
# Download from manual videos collection (data/manual_videos.json)
|
||||
python download_karaoke.py --manual --limit 5
|
||||
|
||||
# Download manual videos with fuzzy matching
|
||||
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
|
||||
|
||||
# Download manual videos with parallel processing
|
||||
python download_karaoke.py --parallel --workers 3 --manual --limit 5
|
||||
|
||||
# Download manual videos with songlist matching
|
||||
python download_karaoke.py --manual --songlist-only --limit 10
|
||||
|
||||
# Force download from manual videos (bypass existing file checks)
|
||||
python download_karaoke.py --manual --force --limit 5
|
||||
|
||||
# Add a video to manual collection (interactive)
|
||||
python add_manual_video.py add "Artist - Song Title (Karaoke Version)" "https://www.youtube.com/watch?v=VIDEO_ID"
|
||||
|
||||
# List all manual videos
|
||||
python add_manual_video.py list
|
||||
|
||||
# Remove a video from manual collection
|
||||
python add_manual_video.py remove "Artist - Song Title (Karaoke Version)"
|
||||
|
||||
## 📋 SONG LIST GENERATION
|
||||
|
||||
# Generate song list from MP4 files in a directory (append to existing song list)
|
||||
@ -258,6 +284,15 @@ python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
|
||||
python download_karaoke.py --status
|
||||
python download_karaoke.py --clear-cache all
|
||||
|
||||
# 7. Download from manual video collection
|
||||
python download_karaoke.py --manual --limit 5
|
||||
|
||||
# 7b. Fast parallel manual video download
|
||||
python download_karaoke.py --parallel --workers 3 --manual --limit 5
|
||||
|
||||
# 7c. Manual videos with fuzzy matching
|
||||
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
|
||||
|
||||
## 🔧 TROUBLESHOOTING COMMANDS
|
||||
|
||||
# Check if everything is working
|
||||
@ -273,7 +308,9 @@ python download_karaoke.py --clear-server-duplicates
|
||||
## 📝 NOTES
|
||||
|
||||
# Default files used:
|
||||
# - data/channels.txt (default channel list for songlist modes)
|
||||
# - data/channels.json (channel configuration with parsing rules, preferred)
|
||||
# - data/channels.txt (legacy channel list, backward compatibility)
|
||||
# - data/manual_videos.json (manual video collection)
|
||||
# - data/songList.json (your prioritized song list)
|
||||
# - data/config.json (download settings)
|
||||
|
||||
@ -282,11 +319,12 @@ python download_karaoke.py --clear-server-duplicates
|
||||
# Fuzzy threshold: 0-100 (higher = more strict matching, default 90)
|
||||
|
||||
# The system automatically:
|
||||
# - Uses data/channels.txt if no --file specified in songlist modes
|
||||
# - Uses data/channels.json if available, falls back to data/channels.txt if no --file specified in songlist modes
|
||||
# - Caches channel data for 24 hours (configurable)
|
||||
# - Tracks all downloads in JSON files
|
||||
# - Avoids re-downloading existing files
|
||||
# - Checks for server duplicates
|
||||
# - Supports manual video collection via --manual parameter
|
||||
|
||||
# For best performance:
|
||||
# - Use --parallel --workers 5 for 3-5x faster downloads
|
||||
|
||||
@ -131,6 +131,22 @@
|
||||
},
|
||||
"description": "Title first, then dash separator, then artist with KARAOKE suffix"
|
||||
},
|
||||
{
|
||||
"name": "@ManualVideos",
|
||||
"url": "manual://static",
|
||||
"manual_videos_file": "data/manual_videos.json",
|
||||
"parsing_rules": {
|
||||
"format": "artist_title_separator",
|
||||
"separator": " - ",
|
||||
"artist_first": true,
|
||||
"title_cleanup": {
|
||||
"remove_suffix": {
|
||||
"suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
|
||||
}
|
||||
}
|
||||
},
|
||||
"description": "Manual collection of individual karaoke videos (static, never expires)"
|
||||
},
|
||||
{
|
||||
"name": "Let's Sing Karaoke",
|
||||
"url": "https://www.youtube.com/@LetsSingKaraoke/videos",
|
||||
|
||||
45
data/manual_videos.json
Normal file
45
data/manual_videos.json
Normal file
@ -0,0 +1,45 @@
|
||||
{
|
||||
"channel_name": "@ManualVideos",
|
||||
"channel_url": "manual://static",
|
||||
"description": "Manual collection of individual karaoke videos",
|
||||
"videos": [
|
||||
{
|
||||
"title": "10,000 Maniacs - Because The Night",
|
||||
"url": "https://www.youtube.com/watch?v=7CoVTWBw1xs",
|
||||
"id": "7CoVTWBw1xs",
|
||||
"upload_date": "2024-01-01",
|
||||
"duration": 180,
|
||||
"view_count": 1000
|
||||
},
|
||||
{
|
||||
"title": "10,000 Maniacs - Like The Weather",
|
||||
"url": "https://www.youtube.com/watch?v=brc7wNVRv_4",
|
||||
"id": "brc7wNVRv_4",
|
||||
"upload_date": "2024-01-01",
|
||||
"duration": 180,
|
||||
"view_count": 1000
|
||||
},
|
||||
{
|
||||
"title": "10,000 Maniacs - More Than This",
|
||||
"url": "https://www.youtube.com/watch?v=wxnuF-APJ5M",
|
||||
"id": "wxnuF-APJ5M",
|
||||
"upload_date": "2024-01-01",
|
||||
"duration": 180,
|
||||
"view_count": 1000
|
||||
}
|
||||
],
|
||||
"parsing_rules": {
|
||||
"format": "artist_title_separator",
|
||||
"separator": " - ",
|
||||
"artist_first": true,
|
||||
"title_cleanup": {
|
||||
"remove_suffix": {
|
||||
"suffixes": [
|
||||
"(Karaoke)",
|
||||
"(Karaoke Version)",
|
||||
"(Karaoke Version) Lyrics"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -8286,5 +8286,21 @@
|
||||
"channel": "@VocalStarKaraoke",
|
||||
"marked_at": "2025-07-28T10:18:11.001221",
|
||||
"reason": "already_on_server"
|
||||
},
|
||||
"kendrick lamar_not like us": {
|
||||
"artist": "Kendrick Lamar",
|
||||
"title": "Not Like Us",
|
||||
"video_title": "Kendrick Lamar Not Like Us (Karaoke Version) Lyrics",
|
||||
"channel": "@sing2karaoke",
|
||||
"marked_at": "2025-07-28T14:24:01.915881",
|
||||
"reason": "already_on_server"
|
||||
},
|
||||
"ed sheeran_you need me i don't need you": {
|
||||
"artist": "Ed Sheeran",
|
||||
"title": "You Need Me I Don't Need You",
|
||||
"video_title": "Ed Sheeran You Need Me I Don't Need You (Karaoke Version) Lyrics",
|
||||
"channel": "@sing2karaoke",
|
||||
"marked_at": "2025-07-28T14:24:01.939201",
|
||||
"reason": "already_on_server"
|
||||
}
|
||||
}
|
||||
5810
data/songs.json
5810
data/songs.json
File diff suppressed because it is too large
Load Diff
@ -102,6 +102,7 @@ Examples:
|
||||
python download_karaoke.py --songlist-only --limit 10 # Download only songlist songs across channels
|
||||
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos # Download from specific channel
|
||||
python download_karaoke.py --file data/channels.txt # Download from custom channel list
|
||||
python download_karaoke.py --manual --limit 5 # Download from manual videos collection
|
||||
python download_karaoke.py --reset-channel SingKingKaraoke --delete-files
|
||||
""",
|
||||
)
|
||||
@ -292,6 +293,11 @@ Examples:
|
||||
action="store_true",
|
||||
help="Create a new song list instead of appending when using --generate-songlist",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--manual",
|
||||
action="store_true",
|
||||
help="Download from manual videos collection (data/manual_videos.json)",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
# Validate workers argument
|
||||
@ -464,6 +470,15 @@ Examples:
|
||||
if len(tracking) > 10:
|
||||
print(f" ... and {len(tracking) - 10} more")
|
||||
sys.exit(0)
|
||||
elif args.manual:
|
||||
# Download from manual videos collection
|
||||
print("🎤 Downloading from manual videos collection...")
|
||||
success = downloader.download_channel_videos(
|
||||
"manual://static",
|
||||
force_refresh=args.refresh,
|
||||
fuzzy_match=args.fuzzy_match,
|
||||
fuzzy_threshold=args.fuzzy_threshold,
|
||||
)
|
||||
elif args.songlist_only or args.songlist_focus:
|
||||
# Use provided file or default to channels configuration
|
||||
channel_urls = load_channels(args.file)
|
||||
|
||||
@ -63,6 +63,7 @@ from karaoke_downloader.parallel_downloader import (
|
||||
create_parallel_downloader,
|
||||
)
|
||||
from karaoke_downloader.youtube_utils import get_channel_info, get_playlist_info
|
||||
from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
|
||||
|
||||
# Constants
|
||||
DEFAULT_FUZZY_THRESHOLD = 85
|
||||
@ -186,6 +187,35 @@ class KaraokeDownloader:
|
||||
fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
|
||||
):
|
||||
"""Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching."""
|
||||
|
||||
# Check if this is a manual channel
|
||||
from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
|
||||
|
||||
if is_manual_channel(url):
|
||||
channel_name, channel_id = get_manual_channel_info(url)
|
||||
print(f"\n🎬 Downloading from manual channel: {channel_name} ({url})")
|
||||
|
||||
# Load manual videos
|
||||
manual_videos = get_manual_videos_for_channel(channel_name)
|
||||
if not manual_videos:
|
||||
print("⚠️ No manual videos found. Skipping.")
|
||||
return False
|
||||
|
||||
# Convert to the expected format
|
||||
available_videos = []
|
||||
for video in manual_videos:
|
||||
available_videos.append({
|
||||
"title": video.get("title", ""),
|
||||
"id": video.get("id", ""),
|
||||
"url": video.get("url", "")
|
||||
})
|
||||
|
||||
print(f"📋 Found {len(available_videos)} manual videos")
|
||||
|
||||
# Process manual videos (skip yt-dlp)
|
||||
return self._process_videos_for_download(available_videos, channel_name, force_refresh, fuzzy_match, fuzzy_threshold)
|
||||
|
||||
# Regular YouTube channel processing
|
||||
channel_name, channel_id = get_channel_info(url)
|
||||
print(f"\n🎬 Downloading from channel: {channel_name} ({url})")
|
||||
songlist = load_songlist(self.songlist_file_path)
|
||||
@ -1011,6 +1041,134 @@ class KaraokeDownloader:
|
||||
# --- Download phase ---
|
||||
return self.execute_latest_per_channel_parallel(channel_plans, cache_file)
|
||||
|
||||
def _process_videos_for_download(self, available_videos, channel_name, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD):
|
||||
"""Process videos for download (used for both manual and regular channels)."""
|
||||
songlist = load_songlist(self.songlist_file_path)
|
||||
if not songlist:
|
||||
print("⚠️ No songlist loaded. Skipping.")
|
||||
return False
|
||||
|
||||
# Load server songs and duplicates tracking for availability checking
|
||||
server_songs = load_server_songs()
|
||||
server_duplicates_tracking = load_server_duplicates_tracking()
|
||||
|
||||
limit = getattr(self.config, "limit", 1)
|
||||
|
||||
# Normalize songlist for matching
|
||||
normalized_songlist = {
|
||||
create_song_key(s["artist"], s["title"]): s for s in songlist
|
||||
}
|
||||
matches = []
|
||||
similarity = get_similarity_function()
|
||||
|
||||
print(f"🔍 Scanning {len(available_videos)} videos for songlist matches...")
|
||||
|
||||
for video in available_videos:
|
||||
title = video["title"]
|
||||
video_id = video["id"]
|
||||
|
||||
# Extract artist and title using channel parser
|
||||
artist, extracted_title = self.channel_parser.extract_artist_title(title, channel_name)
|
||||
|
||||
if not artist and not extracted_title:
|
||||
continue
|
||||
|
||||
song_key = create_song_key(artist, extracted_title)
|
||||
|
||||
# Check for exact matches first
|
||||
if song_key in normalized_songlist:
|
||||
song_data = normalized_songlist[song_key]
|
||||
matches.append({
|
||||
"video": video,
|
||||
"song": song_data,
|
||||
"match_type": "exact",
|
||||
"match_score": 100.0,
|
||||
"artist": artist,
|
||||
"title": extracted_title
|
||||
})
|
||||
print(f" ✅ Exact match: {artist} - {extracted_title}")
|
||||
continue
|
||||
|
||||
# Check for fuzzy matches if enabled
|
||||
if fuzzy_match:
|
||||
best_match = None
|
||||
best_score = 0
|
||||
|
||||
for song_key, song_data in normalized_songlist.items():
|
||||
score = similarity(f"{artist} {extracted_title}", f"{song_data['artist']} {song_data['title']}")
|
||||
if score > best_score and score >= fuzzy_threshold:
|
||||
best_score = score
|
||||
best_match = song_data
|
||||
|
||||
if best_match:
|
||||
matches.append({
|
||||
"video": video,
|
||||
"song": best_match,
|
||||
"match_type": "fuzzy",
|
||||
"match_score": best_score,
|
||||
"artist": artist,
|
||||
"title": extracted_title
|
||||
})
|
||||
print(f" 🎯 Fuzzy match ({best_score:.1f}%): {artist} - {extracted_title} -> {best_match['artist']} - {best_match['title']}")
|
||||
|
||||
print(f"📊 Found {len(matches)} matches out of {len(available_videos)} videos")
|
||||
|
||||
if not matches:
|
||||
print("❌ No matches found in songlist")
|
||||
return False
|
||||
|
||||
# Sort matches by score (exact matches first, then by fuzzy score)
|
||||
matches.sort(key=lambda x: (x["match_type"] != "exact", -x["match_score"]))
|
||||
|
||||
# Limit downloads
|
||||
if limit:
|
||||
matches = matches[:limit]
|
||||
print(f"🎯 Limiting to {len(matches)} downloads")
|
||||
|
||||
# Download matched videos
|
||||
success_count = 0
|
||||
for i, match in enumerate(matches, 1):
|
||||
video = match["video"]
|
||||
song = match["song"]
|
||||
artist = match["artist"]
|
||||
title = match["title"]
|
||||
video_id = video["id"]
|
||||
|
||||
print(f"\n⬇️ Downloading {i}/{len(matches)}: {artist} - {title}")
|
||||
print(f" 🎬 Video: {video['title']} ({channel_name})")
|
||||
if match["match_type"] == "fuzzy":
|
||||
print(f" 🎯 Match Score: {match['match_score']:.1f}%")
|
||||
|
||||
# Create filename
|
||||
filename = sanitize_filename(artist, title)
|
||||
output_path = self.downloads_dir / channel_name / filename
|
||||
|
||||
# Use the download pipeline
|
||||
pipeline = DownloadPipeline(
|
||||
yt_dlp_path=str(self.yt_dlp_path),
|
||||
config=self.config,
|
||||
downloads_dir=self.downloads_dir,
|
||||
songlist_tracking=self.songlist_tracking,
|
||||
tracker=self.tracker,
|
||||
)
|
||||
|
||||
success = pipeline.execute_pipeline(
|
||||
video_id=video_id,
|
||||
artist=artist,
|
||||
title=title,
|
||||
channel_name=channel_name,
|
||||
video_title=video["title"]
|
||||
)
|
||||
|
||||
if success:
|
||||
success_count += 1
|
||||
print(f"✅ Successfully downloaded: {artist} - {title}")
|
||||
else:
|
||||
print(f"❌ Failed to download: {artist} - {title}")
|
||||
|
||||
print(f"\n🎉 Download complete! {success_count}/{len(matches)} videos downloaded successfully")
|
||||
return success_count > 0
|
||||
|
||||
|
||||
def reset_songlist_all():
|
||||
"""Delete all files tracked in songlist_tracking.json, clear songlist_tracking.json, and remove songlist songs from karaoke_tracking.json."""
|
||||
|
||||
77
karaoke_downloader/manual_video_manager.py
Normal file
77
karaoke_downloader/manual_video_manager.py
Normal file
@ -0,0 +1,77 @@
|
||||
"""
|
||||
Manual video manager for handling static video collections.
|
||||
"""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional, Any
|
||||
|
||||
def load_manual_videos(manual_file: str = "data/manual_videos.json") -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Load manual videos from the JSON file.
|
||||
|
||||
Args:
|
||||
manual_file: Path to manual videos JSON file
|
||||
|
||||
Returns:
|
||||
List of video dictionaries
|
||||
"""
|
||||
manual_path = Path(manual_file)
|
||||
|
||||
if not manual_path.exists():
|
||||
print(f"⚠️ Manual videos file not found: {manual_file}")
|
||||
return []
|
||||
|
||||
try:
|
||||
with open(manual_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
videos = data.get("videos", [])
|
||||
print(f"📋 Loaded {len(videos)} manual videos from {manual_file}")
|
||||
return videos
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ Error loading manual videos: {e}")
|
||||
return []
|
||||
|
||||
def get_manual_videos_for_channel(channel_name: str, manual_file: str = "data/manual_videos.json") -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Get manual videos for a specific channel.
|
||||
|
||||
Args:
|
||||
channel_name: Channel name (should be "@ManualVideos")
|
||||
manual_file: Path to manual videos JSON file
|
||||
|
||||
Returns:
|
||||
List of video dictionaries
|
||||
"""
|
||||
if channel_name != "@ManualVideos":
|
||||
return []
|
||||
|
||||
return load_manual_videos(manual_file)
|
||||
|
||||
def is_manual_channel(channel_url: str) -> bool:
|
||||
"""
|
||||
Check if a channel URL is a manual channel.
|
||||
|
||||
Args:
|
||||
channel_url: Channel URL
|
||||
|
||||
Returns:
|
||||
True if it's a manual channel
|
||||
"""
|
||||
return channel_url == "manual://static"
|
||||
|
||||
def get_manual_channel_info(channel_url: str) -> tuple[str, str]:
|
||||
"""
|
||||
Get channel info for manual channels.
|
||||
|
||||
Args:
|
||||
channel_url: Channel URL
|
||||
|
||||
Returns:
|
||||
Tuple of (channel_name, channel_id)
|
||||
"""
|
||||
if channel_url == "manual://static":
|
||||
return "@ManualVideos", "manual"
|
||||
return None, None
|
||||
@ -341,8 +341,31 @@ class TrackingManager:
|
||||
show_pagination: Show page-by-page progress (slower but more detailed)
|
||||
"""
|
||||
channel_name, channel_id = None, None
|
||||
from karaoke_downloader.youtube_utils import get_channel_info
|
||||
|
||||
# Check if this is a manual channel
|
||||
from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
|
||||
|
||||
if is_manual_channel(channel_url):
|
||||
channel_name, channel_id = get_manual_channel_info(channel_url)
|
||||
if channel_name and channel_id:
|
||||
print(f" 📋 Loading manual videos for {channel_name}")
|
||||
manual_videos = get_manual_videos_for_channel(channel_name)
|
||||
# Convert to the expected format
|
||||
videos = []
|
||||
for video in manual_videos:
|
||||
videos.append({
|
||||
"title": video.get("title", ""),
|
||||
"id": video.get("id", ""),
|
||||
"url": video.get("url", "")
|
||||
})
|
||||
print(f" ✅ Loaded {len(videos)} manual videos")
|
||||
return videos
|
||||
else:
|
||||
print(f" ❌ Could not get manual channel info for: {channel_url}")
|
||||
return []
|
||||
|
||||
# Regular YouTube channel processing
|
||||
from karaoke_downloader.youtube_utils import get_channel_info
|
||||
channel_name, channel_id = get_channel_info(channel_url)
|
||||
|
||||
if not channel_id:
|
||||
|
||||
Loading…
Reference in New Issue
Block a user