Signed-off-by: mbrucedogs <mbrucedogs@gmail.com>
parent
21f8348419
commit
ec95b24a69
34
PRD.md
@ -1,5 +1,5 @@
|
|||||||
|
|
||||||
# 🎤 Karaoke Video Downloader – PRD (v3.3)
|
# 🎤 Karaoke Video Downloader – PRD (v3.4.3)
|
||||||
|
|
||||||
## ✅ Overview
|
## ✅ Overview
|
||||||
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
|
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration. The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse.
|
||||||
@ -123,6 +123,8 @@ python download_karaoke.py --clear-cache SingKingKaraoke
|
|||||||
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
|
- ✅ **Centralized file operations**: Single source of truth for filename sanitization, file validation, and path operations
|
||||||
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
|
- ✅ **Centralized song validation**: Unified logic for checking if songs should be downloaded across all modules
|
||||||
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
|
- ✅ **Enhanced configuration management**: Structured configuration with dataclasses, type safety, and validation
|
||||||
|
- ✅ **Manual video collection**: Static video collection system for managing individual karaoke videos that don't belong to regular channels. Use `--manual` to download from `data/manual_videos.json`.
|
||||||
|
- ✅ **Channel-specific parsing rules**: JSON-based configuration for parsing video titles from different YouTube channels, with support for various title formats and cleanup rules.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -155,7 +157,9 @@ KaroakeVideoDownloader/
|
|||||||
│ ├── karaoke_tracking.json
|
│ ├── karaoke_tracking.json
|
||||||
│ ├── songlist_tracking.json
|
│ ├── songlist_tracking.json
|
||||||
│ ├── channel_cache.json
|
│ ├── channel_cache.json
|
||||||
│ ├── channels.txt
|
│ ├── channels.json # Channel configuration with parsing rules
|
||||||
|
│ ├── channels.txt # Legacy channel list (backward compatibility)
|
||||||
|
│ ├── manual_videos.json # Manual video collection
|
||||||
│ └── songList.json
|
│ └── songList.json
|
||||||
├── downloads/ # All video output
|
├── downloads/ # All video output
|
||||||
│ └── [ChannelName]/ # Per-channel folders
|
│ └── [ChannelName]/ # Per-channel folders
|
||||||
@ -192,6 +196,7 @@ KaroakeVideoDownloader/
|
|||||||
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
|
- `--fuzzy-threshold <N>`: **Fuzzy match threshold (0-100, default 85)**
|
||||||
- `--parallel`: **Enable parallel downloads for improved speed**
|
- `--parallel`: **Enable parallel downloads for improved speed**
|
||||||
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
|
- `--workers <N>`: **Number of parallel download workers (1-10, default: 3, only used with --parallel)**
|
||||||
|
- `--manual`: **Download from manual videos collection (data/manual_videos.json)**
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -202,6 +207,8 @@ KaroakeVideoDownloader/
|
|||||||
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
|
- **ID3 Tagging:** Artist/title extracted from video title and embedded in MP4 files.
|
||||||
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
|
- **Cleanup:** Extra files from yt-dlp (e.g., `.info.json`) are automatically removed after download.
|
||||||
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
|
- **Reset/Clear:** Use `--reset-channel` to reset all tracking and files for a channel (optionally including songlist songs with `--reset-songlist`). Use `--clear-cache` to clear cached video lists for a channel or all channels.
|
||||||
- **Channel-Specific Parsing:** Uses `data/channels.json` to define parsing rules for each YouTube channel, handling different video title formats (e.g., "Artist - Title", "Artist Title", "Title | Artist", etc.).
- **Manual Video Collection:** Static video management system using `data/manual_videos.json` for individual karaoke videos that don't belong to regular channels. Accessible via `--manual` parameter.
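
As a rough illustration of the channel-specific parsing described above, the sketch below applies an `artist_title_separator` rule set to a video title. The field names mirror the `parsing_rules` block added in this commit, but `parse_title` and `EXAMPLE_RULES` are hypothetical names; the project's real parser (`channel_parser.extract_artist_title`) is not shown in this diff and may behave differently.

```python
from typing import Optional, Tuple

# Hypothetical rule set; the keys mirror the parsing_rules JSON in this commit.
EXAMPLE_RULES = {
    "format": "artist_title_separator",
    "separator": " - ",
    "artist_first": True,
    "title_cleanup": {
        "remove_suffix": {
            "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
        }
    },
}


def parse_title(video_title: str, rules: dict) -> Optional[Tuple[str, str]]:
    """Return (artist, title), or None when the separator is absent."""
    cleaned = video_title.strip()
    suffixes = rules.get("title_cleanup", {}).get("remove_suffix", {}).get("suffixes", [])
    # Try longer suffixes first so the most specific one is removed.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)].strip()
            break
    separator = rules.get("separator", " - ")
    if separator not in cleaned:
        return None
    first, second = (part.strip() for part in cleaned.split(separator, 1))
    # artist_first decides which side of the separator is the artist.
    return (first, second) if rules.get("artist_first", True) else (second, first)
```

Titles without the configured separator (the "Artist Title" format) return `None` here; handling them would need a different rule type than the one sketched.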
|
|
||||||
## 🔧 Refactoring Improvements (v3.3)
|
## 🔧 Refactoring Improvements (v3.3)
|
||||||
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
|
The codebase has been comprehensively refactored to improve maintainability and reduce code duplication. Recent improvements have enhanced reliability, performance, and code organization:
|
||||||
@ -335,6 +342,29 @@ The codebase has been comprehensively refactored to improve maintainability and
|
|||||||
- Check logs for "⏭️ Skipping download - file already exists" messages
|
- Check logs for "⏭️ Skipping download - file already exists" messages
|
||||||
- These indicate the duplicate prevention is working correctly
|
- These indicate the duplicate prevention is working correctly
|
||||||
|
|
||||||
## 🔧 Recent Bug Fixes & Improvements (v3.4.3)
### **Manual Video Collection System**
- **New `--manual` parameter**: Simple access to manual video collection via `python download_karaoke.py --manual --limit 5`
- **Static video management**: `data/manual_videos.json` stores individual karaoke videos that don't belong to regular channels
- **Helper script**: `add_manual_video.py` provides easy management of manual video entries
- **Full integration**: Manual videos work with all existing features (songlist matching, fuzzy matching, parallel downloads, etc.)
- **No yt-dlp listing step**: Manual videos skip the yt-dlp call that lists a channel's videos and use static data instead; yt-dlp is still used for the download itself
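
A minimal sketch of how the static collection might be read, assuming the file layout shown in `data/manual_videos.json` (a top-level `videos` array of objects with `title`, `url`, and `id`). The function name `load_manual_videos` is hypothetical; the commit's actual loader lives in `karaoke_downloader.manual_video_manager`, whose body is not part of this diff.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_manual_videos(manual_file: str = "data/manual_videos.json") -> List[Dict[str, str]]:
    """Return title/id/url dicts from the static collection, or [] if the file is missing."""
    path = Path(manual_file)
    if not path.exists():
        return []
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Keep only the fields a downloader would need; skip entries without an ID.
    return [
        {"title": v.get("title", ""), "id": v["id"], "url": v.get("url", "")}
        for v in data.get("videos", [])
        if v.get("id")
    ]
```

Because the list comes from a local file, no network request is needed to enumerate videos.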

### **Channel-Specific Parsing Rules**
- **JSON-based configuration**: `data/channels.json` replaces `data/channels.txt` with structured channel configuration
- **Parsing rules per channel**: Each channel can define custom parsing rules for video titles
- **Multiple format support**: Handles various title formats like "Artist - Title", "Artist Title", "Title | Artist", etc.
- **Suffix cleanup**: Automatic removal of common karaoke-related suffixes
- **Multi-artist support**: Parsing for titles with multiple artists separated by specific delimiters
- **Backward compatibility**: Still supports legacy `data/channels.txt` format
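
The preference for `data/channels.json` with a fallback to the legacy `data/channels.txt` could look roughly like the sketch below. It rests on two assumptions that this diff does not confirm: that channel entries sit under a top-level `"channels"` key, and that blank lines and `#` comments in `channels.txt` are ignored. `load_channel_urls` is a hypothetical name, not the project's `load_channels`.

```python
import json
from pathlib import Path
from typing import List


def load_channel_urls(data_dir: str = "data") -> List[str]:
    """Prefer channels.json; fall back to the legacy channels.txt."""
    json_path = Path(data_dir) / "channels.json"
    txt_path = Path(data_dir) / "channels.txt"
    if json_path.exists():
        with open(json_path, "r", encoding="utf-8") as f:
            config = json.load(f)
        # Assumption: entries live under a top-level "channels" key.
        return [c["url"] for c in config.get("channels", []) if c.get("url")]
    if txt_path.exists():
        lines = txt_path.read_text(encoding="utf-8").splitlines()
        # Assumption: blank lines and "#" comments are skipped.
        return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]
    return []
```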

### **Benefits of New Features**
- **Flexible video management**: Easy addition of individual karaoke videos without creating new channels
- **Accurate parsing**: Channel-specific rules ensure correct artist/title extraction for ID3 tags and filenames
- **Consistent metadata**: Proper parsing prevents filename and ID3 tag inconsistencies
- **Easy maintenance**: Simple JSON structure for managing both channels and manual videos
- **Full feature compatibility**: Manual videos work seamlessly with existing download modes and features

## 📚 Documentation Standards
|
## 📚 Documentation Standards
|
||||||
|
|
||||||
### **Documentation Location**
|
### **Documentation Location**
|
||||||
|
|||||||
190
add_manual_video.py
Normal file
@ -0,0 +1,190 @@
#!/usr/bin/env python3
"""
Helper script to add manual videos to the manual videos collection.
"""

import json
import re
from pathlib import Path
from typing import Dict, List, Optional

def extract_video_id(url: str) -> Optional[str]:
    """Extract video ID from YouTube URL."""
    patterns = [
        r'(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/)([a-zA-Z0-9_-]{11})',
        r'youtube\.com/watch\?.*v=([a-zA-Z0-9_-]{11})'
    ]

    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

def add_manual_video(title: str, url: str, manual_file: str = "data/manual_videos.json"):
    """
    Add a manual video to the collection.

    Args:
        title: Video title (e.g., "Artist - Song (Karaoke Version)")
        url: YouTube URL
        manual_file: Path to manual videos JSON file
    """
    manual_path = Path(manual_file)

    # Load existing data or create new
    if manual_path.exists():
        with open(manual_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    else:
        data = {
            "channel_name": "@ManualVideos",
            "channel_url": "manual://static",
            "description": "Manual collection of individual karaoke videos",
            "videos": [],
            "parsing_rules": {
                "format": "artist_title_separator",
                "separator": " - ",
                "artist_first": True,  # Python literal; serialized as JSON true
                "title_cleanup": {
                    "remove_suffix": {
                        "suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
                    }
                }
            }
        }

    # Extract video ID
    video_id = extract_video_id(url)
    if not video_id:
        print(f"❌ Could not extract video ID from URL: {url}")
        return False

    # Check if video already exists
    existing_ids = [video.get("id") for video in data["videos"]]
    if video_id in existing_ids:
        print(f"⚠️ Video already exists: {title}")
        return False

    # Add new video
    new_video = {
        "title": title,
        "url": url,
        "id": video_id,
        "upload_date": "2024-01-01",  # Default date
        "duration": 180,  # Default duration
        "view_count": 1000  # Default view count
    }

    data["videos"].append(new_video)

    # Save updated data
    manual_path.parent.mkdir(parents=True, exist_ok=True)
    with open(manual_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(f"✅ Added video: {title}")
    print(f" URL: {url}")
    print(f" ID: {video_id}")
    return True

def list_manual_videos(manual_file: str = "data/manual_videos.json"):
    """List all manual videos."""
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    print(f"📋 Manual Videos ({len(data['videos'])} videos):")
    print("=" * 60)

    for i, video in enumerate(data['videos'], 1):
        print(f"{i:2d}. {video['title']}")
        print(f" URL: {video['url']}")
        print(f" ID: {video['id']}")
        print()

def remove_manual_video(video_id: str, manual_file: str = "data/manual_videos.json"):
    """Remove a manual video by ID."""
    manual_path = Path(manual_file)

    if not manual_path.exists():
        print("❌ No manual videos file found")
        return False

    with open(manual_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Find and remove video
    for i, video in enumerate(data['videos']):
        if video['id'] == video_id:
            removed_video = data['videos'].pop(i)
            with open(manual_path, 'w', encoding='utf-8') as f:
                json.dump(data, f, indent=2, ensure_ascii=False)
            print(f"✅ Removed video: {removed_video['title']}")
            return True

    print(f"❌ Video with ID '{video_id}' not found")
    return False

def main():
    """Interactive mode for adding manual videos."""
    print("🎤 Manual Video Manager")
    print("=" * 30)
    print("1. Add video")
    print("2. List videos")
    print("3. Remove video")
    print("4. Exit")

    while True:
        choice = input("\nSelect option (1-4): ").strip()

        if choice == "1":
            title = input("Enter video title (e.g., 'Artist - Song (Karaoke Version)'): ").strip()
            url = input("Enter YouTube URL: ").strip()

            if title and url:
                add_manual_video(title, url)
            else:
                print("❌ Title and URL are required")

        elif choice == "2":
            list_manual_videos()

        elif choice == "3":
            video_id = input("Enter video ID to remove: ").strip()
            if video_id:
                remove_manual_video(video_id)
            else:
                print("❌ Video ID is required")

        elif choice == "4":
            print("👋 Goodbye!")
            break

        else:
            print("❌ Invalid option")

if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Command line mode
        if sys.argv[1] == "add" and len(sys.argv) >= 4:
            add_manual_video(sys.argv[2], sys.argv[3])
        elif sys.argv[1] == "list":
            list_manual_videos()
        elif sys.argv[1] == "remove" and len(sys.argv) >= 3:
            remove_manual_video(sys.argv[2])
        else:
            print("Usage:")
            print(" python add_manual_video.py add 'Title' 'URL'")
            print(" python add_manual_video.py list")
            print(" python add_manual_video.py remove VIDEO_ID")
    else:
        # Interactive mode
        main()
44
commands.txt
@ -1,6 +1,6 @@
|
|||||||
# 🎤 Karaoke Video Downloader - CLI Commands Reference
|
# 🎤 Karaoke Video Downloader - CLI Commands Reference
|
||||||
# Copy and paste these commands into your terminal
|
# Copy and paste these commands into your terminal
|
||||||
# Updated: v3.4 (includes parallel downloads and all refactoring improvements)
|
# Updated: v3.4.3 (includes manual video collection, channel parsing rules, and all previous improvements)
|
||||||
|
|
||||||
## 📥 BASIC DOWNLOADS
|
## 📥 BASIC DOWNLOADS
|
||||||
|
|
||||||
@ -19,6 +19,32 @@ python download_karaoke.py --limit 10 https://www.youtube.com/@SingKingKaraoke/v
|
|||||||
# Enable parallel downloads for faster processing (3-5x speedup)
|
# Enable parallel downloads for faster processing (3-5x speedup)
|
||||||
python download_karaoke.py --parallel --workers 5 --limit 10 https://www.youtube.com/@SingKingKaraoke/videos
|
python download_karaoke.py --parallel --workers 5 --limit 10 https://www.youtube.com/@SingKingKaraoke/videos
|
||||||
|
|
||||||
|
## 🎤 MANUAL VIDEO COLLECTION (v3.4.3)
|
||||||
|
|
||||||
|
# Download from manual videos collection (data/manual_videos.json)
|
||||||
|
python download_karaoke.py --manual --limit 5
|
||||||
|
|
||||||
|
# Download manual videos with fuzzy matching
|
||||||
|
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
|
||||||
|
|
||||||
|
# Download manual videos with parallel processing
|
||||||
|
python download_karaoke.py --parallel --workers 3 --manual --limit 5
|
||||||
|
|
||||||
|
# Download manual videos with songlist matching
|
||||||
|
python download_karaoke.py --manual --songlist-only --limit 10
|
||||||
|
|
||||||
|
# Force download from manual videos (bypass existing file checks)
|
||||||
|
python download_karaoke.py --manual --force --limit 5
|
||||||
|
|
||||||
|
# Add a video to manual collection (command-line mode; run the script with no arguments for the interactive menu)
|
||||||
|
python add_manual_video.py add "Artist - Song Title (Karaoke Version)" "https://www.youtube.com/watch?v=VIDEO_ID"
|
||||||
|
|
||||||
|
# List all manual videos
|
||||||
|
python add_manual_video.py list
|
||||||
|
|
||||||
|
# Remove a video from manual collection (by YouTube video ID, as shown by the list command)
|
||||||
|
python add_manual_video.py remove VIDEO_ID
|
||||||
|
|
||||||
## 📋 SONG LIST GENERATION
|
## 📋 SONG LIST GENERATION
|
||||||
|
|
||||||
# Generate song list from MP4 files in a directory (append to existing song list)
|
# Generate song list from MP4 files in a directory (append to existing song list)
|
||||||
@ -258,6 +284,15 @@ python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
|
|||||||
python download_karaoke.py --status
|
python download_karaoke.py --status
|
||||||
python download_karaoke.py --clear-cache all
|
python download_karaoke.py --clear-cache all
|
||||||
|
|
||||||
|
# 7. Download from manual video collection
|
||||||
|
python download_karaoke.py --manual --limit 5
|
||||||
|
|
||||||
|
# 7b. Fast parallel manual video download
|
||||||
|
python download_karaoke.py --parallel --workers 3 --manual --limit 5
|
||||||
|
|
||||||
|
# 7c. Manual videos with fuzzy matching
|
||||||
|
python download_karaoke.py --manual --fuzzy-match --fuzzy-threshold 85 --limit 10
|
||||||
|
|
||||||
## 🔧 TROUBLESHOOTING COMMANDS
|
## 🔧 TROUBLESHOOTING COMMANDS
|
||||||
|
|
||||||
# Check if everything is working
|
# Check if everything is working
|
||||||
@ -273,7 +308,9 @@ python download_karaoke.py --clear-server-duplicates
|
|||||||
## 📝 NOTES
|
## 📝 NOTES
|
||||||
|
|
||||||
# Default files used:
|
# Default files used:
|
||||||
# - data/channels.txt (default channel list for songlist modes)
|
# - data/channels.json (channel configuration with parsing rules, preferred)
|
||||||
|
# - data/channels.txt (legacy channel list, backward compatibility)
|
||||||
|
# - data/manual_videos.json (manual video collection)
|
||||||
# - data/songList.json (your prioritized song list)
|
# - data/songList.json (your prioritized song list)
|
||||||
# - data/config.json (download settings)
|
# - data/config.json (download settings)
|
||||||
|
|
||||||
@ -282,11 +319,12 @@ python download_karaoke.py --clear-server-duplicates
|
|||||||
# Fuzzy threshold: 0-100 (higher = more strict matching, default 85)
|
# Fuzzy threshold: 0-100 (higher = more strict matching, default 85)
|
||||||
|
|
||||||
# The system automatically:
|
# The system automatically:
|
||||||
# - Uses data/channels.txt if no --file specified in songlist modes
|
# - Uses data/channels.json if available, falls back to data/channels.txt if no --file specified in songlist modes
|
||||||
# - Caches channel data for 24 hours (configurable)
|
# - Caches channel data for 24 hours (configurable)
|
||||||
# - Tracks all downloads in JSON files
|
# - Tracks all downloads in JSON files
|
||||||
# - Avoids re-downloading existing files
|
# - Avoids re-downloading existing files
|
||||||
# - Checks for server duplicates
|
# - Checks for server duplicates
|
||||||
|
# - Supports manual video collection via --manual parameter
|
||||||
|
|
||||||
# For best performance:
|
# For best performance:
|
||||||
# - Use --parallel --workers 5 for 3-5x faster downloads
|
# - Use --parallel --workers 5 for 3-5x faster downloads
|
||||||
|
|||||||
@ -131,6 +131,22 @@
|
|||||||
},
|
},
|
||||||
"description": "Title first, then dash separator, then artist with KARAOKE suffix"
|
"description": "Title first, then dash separator, then artist with KARAOKE suffix"
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"name": "@ManualVideos",
|
||||||
|
"url": "manual://static",
|
||||||
|
"manual_videos_file": "data/manual_videos.json",
|
||||||
|
"parsing_rules": {
|
||||||
|
"format": "artist_title_separator",
|
||||||
|
"separator": " - ",
|
||||||
|
"artist_first": true,
|
||||||
|
"title_cleanup": {
|
||||||
|
"remove_suffix": {
|
||||||
|
"suffixes": ["(Karaoke)", "(Karaoke Version)", "(Karaoke Version) Lyrics"]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"description": "Manual collection of individual karaoke videos (static, never expires)"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"name": "Let's Sing Karaoke",
|
"name": "Let's Sing Karaoke",
|
||||||
"url": "https://www.youtube.com/@LetsSingKaraoke/videos",
|
"url": "https://www.youtube.com/@LetsSingKaraoke/videos",
|
||||||
|
|||||||
45
data/manual_videos.json
Normal file
@ -0,0 +1,45 @@
{
  "channel_name": "@ManualVideos",
  "channel_url": "manual://static",
  "description": "Manual collection of individual karaoke videos",
  "videos": [
    {
      "title": "10,000 Maniacs - Because The Night",
      "url": "https://www.youtube.com/watch?v=7CoVTWBw1xs",
      "id": "7CoVTWBw1xs",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "10,000 Maniacs - Like The Weather",
      "url": "https://www.youtube.com/watch?v=brc7wNVRv_4",
      "id": "brc7wNVRv_4",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    },
    {
      "title": "10,000 Maniacs - More Than This",
      "url": "https://www.youtube.com/watch?v=wxnuF-APJ5M",
      "id": "wxnuF-APJ5M",
      "upload_date": "2024-01-01",
      "duration": 180,
      "view_count": 1000
    }
  ],
  "parsing_rules": {
    "format": "artist_title_separator",
    "separator": " - ",
    "artist_first": true,
    "title_cleanup": {
      "remove_suffix": {
        "suffixes": [
          "(Karaoke)",
          "(Karaoke Version)",
          "(Karaoke Version) Lyrics"
        ]
      }
    }
  }
}
@ -8286,5 +8286,21 @@
|
|||||||
"channel": "@VocalStarKaraoke",
|
"channel": "@VocalStarKaraoke",
|
||||||
"marked_at": "2025-07-28T10:18:11.001221",
|
"marked_at": "2025-07-28T10:18:11.001221",
|
||||||
"reason": "already_on_server"
|
"reason": "already_on_server"
|
||||||
|
},
|
||||||
|
"kendrick lamar_not like us": {
|
||||||
|
"artist": "Kendrick Lamar",
|
||||||
|
"title": "Not Like Us",
|
||||||
|
"video_title": "Kendrick Lamar Not Like Us (Karaoke Version) Lyrics",
|
||||||
|
"channel": "@sing2karaoke",
|
||||||
|
"marked_at": "2025-07-28T14:24:01.915881",
|
||||||
|
"reason": "already_on_server"
|
||||||
|
},
|
||||||
|
"ed sheeran_you need me i don't need you": {
|
||||||
|
"artist": "Ed Sheeran",
|
||||||
|
"title": "You Need Me I Don't Need You",
|
||||||
|
"video_title": "Ed Sheeran You Need Me I Don't Need You (Karaoke Version) Lyrics",
|
||||||
|
"channel": "@sing2karaoke",
|
||||||
|
"marked_at": "2025-07-28T14:24:01.939201",
|
||||||
|
"reason": "already_on_server"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
5808
data/songs.json
File diff suppressed because it is too large
@ -102,6 +102,7 @@ Examples:
|
|||||||
python download_karaoke.py --songlist-only --limit 10 # Download only songlist songs across channels
|
python download_karaoke.py --songlist-only --limit 10 # Download only songlist songs across channels
|
||||||
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos # Download from specific channel
|
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos # Download from specific channel
|
||||||
python download_karaoke.py --file data/channels.txt # Download from custom channel list
|
python download_karaoke.py --file data/channels.txt # Download from custom channel list
|
||||||
|
python download_karaoke.py --manual --limit 5 # Download from manual videos collection
|
||||||
python download_karaoke.py --reset-channel SingKingKaraoke --delete-files
|
python download_karaoke.py --reset-channel SingKingKaraoke --delete-files
|
||||||
""",
|
""",
|
||||||
)
|
)
|
||||||
@ -292,6 +293,11 @@ Examples:
|
|||||||
action="store_true",
|
action="store_true",
|
||||||
help="Create a new song list instead of appending when using --generate-songlist",
|
help="Create a new song list instead of appending when using --generate-songlist",
|
||||||
)
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--manual",
|
||||||
|
action="store_true",
|
||||||
|
help="Download from manual videos collection (data/manual_videos.json)",
|
||||||
|
)
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
# Validate workers argument
|
# Validate workers argument
|
||||||
@ -464,6 +470,15 @@ Examples:
|
|||||||
if len(tracking) > 10:
|
if len(tracking) > 10:
|
||||||
print(f" ... and {len(tracking) - 10} more")
|
print(f" ... and {len(tracking) - 10} more")
|
||||||
sys.exit(0)
|
sys.exit(0)
|
||||||
|
elif args.manual:
|
||||||
|
# Download from manual videos collection
|
||||||
|
print("🎤 Downloading from manual videos collection...")
|
||||||
|
success = downloader.download_channel_videos(
|
||||||
|
"manual://static",
|
||||||
|
force_refresh=args.refresh,
|
||||||
|
fuzzy_match=args.fuzzy_match,
|
||||||
|
fuzzy_threshold=args.fuzzy_threshold,
|
||||||
|
)
|
||||||
elif args.songlist_only or args.songlist_focus:
|
elif args.songlist_only or args.songlist_focus:
|
||||||
# Use provided file or default to channels configuration
|
# Use provided file or default to channels configuration
|
||||||
channel_urls = load_channels(args.file)
|
channel_urls = load_channels(args.file)
|
||||||
|
|||||||
@ -63,6 +63,7 @@ from karaoke_downloader.parallel_downloader import (
|
|||||||
create_parallel_downloader,
|
create_parallel_downloader,
|
||||||
)
|
)
|
||||||
from karaoke_downloader.youtube_utils import get_channel_info, get_playlist_info
|
from karaoke_downloader.youtube_utils import get_channel_info, get_playlist_info
|
||||||
|
from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
|
||||||
|
|
||||||
# Constants
|
# Constants
|
||||||
DEFAULT_FUZZY_THRESHOLD = 85
|
DEFAULT_FUZZY_THRESHOLD = 85
|
||||||
@ -186,6 +187,35 @@ class KaraokeDownloader:
|
|||||||
fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
|
fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD,
|
||||||
):
|
):
|
||||||
"""Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching."""
|
"""Download videos from a channel or playlist URL, respecting songlist-only and limit flags. Supports fuzzy matching."""
|
||||||
|
|
||||||
|
# Check if this is a manual channel
|
||||||
|
from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
|
||||||
|
|
||||||
|
if is_manual_channel(url):
|
||||||
|
channel_name, channel_id = get_manual_channel_info(url)
|
||||||
|
print(f"\n🎬 Downloading from manual channel: {channel_name} ({url})")
|
||||||
|
|
||||||
|
# Load manual videos
|
||||||
|
manual_videos = get_manual_videos_for_channel(channel_name)
|
||||||
|
if not manual_videos:
|
||||||
|
print("⚠️ No manual videos found. Skipping.")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Convert to the expected format
|
||||||
|
available_videos = []
|
||||||
|
for video in manual_videos:
|
||||||
|
available_videos.append({
|
||||||
|
"title": video.get("title", ""),
|
||||||
|
"id": video.get("id", ""),
|
||||||
|
"url": video.get("url", "")
|
||||||
|
})
|
||||||
|
|
||||||
|
print(f"📋 Found {len(available_videos)} manual videos")
|
||||||
|
|
||||||
|
# Process manual videos (skip yt-dlp)
|
||||||
|
return self._process_videos_for_download(available_videos, channel_name, force_refresh, fuzzy_match, fuzzy_threshold)
|
||||||
|
|
||||||
|
# Regular YouTube channel processing
|
||||||
channel_name, channel_id = get_channel_info(url)
|
channel_name, channel_id = get_channel_info(url)
|
||||||
print(f"\n🎬 Downloading from channel: {channel_name} ({url})")
|
print(f"\n🎬 Downloading from channel: {channel_name} ({url})")
|
||||||
songlist = load_songlist(self.songlist_file_path)
|
songlist = load_songlist(self.songlist_file_path)
|
||||||
@ -1011,6 +1041,134 @@ class KaraokeDownloader:
|
|||||||
# --- Download phase ---
|
# --- Download phase ---
|
||||||
return self.execute_latest_per_channel_parallel(channel_plans, cache_file)
|
return self.execute_latest_per_channel_parallel(channel_plans, cache_file)
|
||||||
|
|
||||||
|
def _process_videos_for_download(self, available_videos, channel_name, force_refresh=False, fuzzy_match=False, fuzzy_threshold=DEFAULT_FUZZY_THRESHOLD):
|
||||||
|
"""Process videos for download (used for both manual and regular channels)."""
|
||||||
|
songlist = load_songlist(self.songlist_file_path)
|
||||||
|
if not songlist:
|
||||||
|
print("⚠️ No songlist loaded. Skipping.")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Load server songs and duplicates tracking for availability checking
|
||||||
|
server_songs = load_server_songs()
|
||||||
|
server_duplicates_tracking = load_server_duplicates_tracking()
|
||||||
|
|
||||||
|
limit = getattr(self.config, "limit", 1)
|
||||||
|
|
||||||
|
# Normalize songlist for matching
|
||||||
|
normalized_songlist = {
|
||||||
|
create_song_key(s["artist"], s["title"]): s for s in songlist
|
||||||
|
}
|
||||||
|
matches = []
|
||||||
|
similarity = get_similarity_function()
|
||||||
|
|
||||||
|
print(f"🔍 Scanning {len(available_videos)} videos for songlist matches...")
|
||||||
|
|
||||||
|
+        for video in available_videos:
+            title = video["title"]
+            video_id = video["id"]
+
+            # Extract artist and title using channel parser
+            artist, extracted_title = self.channel_parser.extract_artist_title(title, channel_name)
+
+            if not artist and not extracted_title:
+                continue
+
+            song_key = create_song_key(artist, extracted_title)
+
+            # Check for exact matches first
+            if song_key in normalized_songlist:
+                song_data = normalized_songlist[song_key]
+                matches.append({
+                    "video": video,
+                    "song": song_data,
+                    "match_type": "exact",
+                    "match_score": 100.0,
+                    "artist": artist,
+                    "title": extracted_title
+                })
+                print(f" ✅ Exact match: {artist} - {extracted_title}")
+                continue
+
+            # Check for fuzzy matches if enabled
+            if fuzzy_match:
+                best_match = None
+                best_score = 0
+
+                for song_key, song_data in normalized_songlist.items():
+                    score = similarity(f"{artist} {extracted_title}", f"{song_data['artist']} {song_data['title']}")
+                    if score > best_score and score >= fuzzy_threshold:
+                        best_score = score
+                        best_match = song_data
+
+                if best_match:
+                    matches.append({
+                        "video": video,
+                        "song": best_match,
+                        "match_type": "fuzzy",
+                        "match_score": best_score,
+                        "artist": artist,
+                        "title": extracted_title
+                    })
+                    print(f" 🎯 Fuzzy match ({best_score:.1f}%): {artist} - {extracted_title} -> {best_match['artist']} - {best_match['title']}")
+
+        print(f"📊 Found {len(matches)} matches out of {len(available_videos)} videos")
+
+        if not matches:
+            print("❌ No matches found in songlist")
+            return False
+
+        # Sort matches by score (exact matches first, then by fuzzy score)
+        matches.sort(key=lambda x: (x["match_type"] != "exact", -x["match_score"]))
+
+        # Limit downloads
+        if limit:
+            matches = matches[:limit]
+            print(f"🎯 Limiting to {len(matches)} downloads")
+
+        # Download matched videos
+        success_count = 0
+        for i, match in enumerate(matches, 1):
+            video = match["video"]
+            song = match["song"]
+            artist = match["artist"]
+            title = match["title"]
+            video_id = video["id"]
+
+            print(f"\n⬇️ Downloading {i}/{len(matches)}: {artist} - {title}")
+            print(f" 🎬 Video: {video['title']} ({channel_name})")
+            if match["match_type"] == "fuzzy":
+                print(f" 🎯 Match Score: {match['match_score']:.1f}%")
+
+            # Create filename
+            filename = sanitize_filename(artist, title)
+            output_path = self.downloads_dir / channel_name / filename
+
+            # Use the download pipeline
+            pipeline = DownloadPipeline(
+                yt_dlp_path=str(self.yt_dlp_path),
+                config=self.config,
+                downloads_dir=self.downloads_dir,
+                songlist_tracking=self.songlist_tracking,
+                tracker=self.tracker,
+            )
+
+            success = pipeline.execute_pipeline(
+                video_id=video_id,
+                artist=artist,
+                title=title,
+                channel_name=channel_name,
+                video_title=video["title"]
+            )
+
+            if success:
+                success_count += 1
+                print(f"✅ Successfully downloaded: {artist} - {title}")
+            else:
+                print(f"❌ Failed to download: {artist} - {title}")
+
+        print(f"\n🎉 Download complete! {success_count}/{len(matches)} videos downloaded successfully")
+        return success_count > 0
+
+
 def reset_songlist_all():
     """Delete all files tracked in songlist_tracking.json, clear songlist_tracking.json, and remove songlist songs from karaoke_tracking.json."""
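Reviewer note: the ordering key used in the matching hunk above can be checked in isolation. This is a minimal sketch, assuming match dictionaries shaped like the ones the hunk builds; the sample titles and scores are made up for illustration and do not come from the repository.

```python
# Hypothetical sample matches; only the keys mirror the ones built in the hunk.
matches = [
    {"match_type": "fuzzy", "match_score": 91.5, "title": "B"},
    {"match_type": "exact", "match_score": 100.0, "title": "A"},
    {"match_type": "fuzzy", "match_score": 97.0, "title": "C"},
]

# Same key as in the diff: exact matches first, then fuzzy by descending score.
matches.sort(key=lambda m: (m["match_type"] != "exact", -m["match_score"]))

# Same truthiness check as in the diff: 0 or None means "no limit".
limit = 2
selected = matches[:limit] if limit else matches

print([m["title"] for m in selected])  # ['A', 'C']
```

Because `False` sorts before `True`, the exact match leads; negating the score orders the fuzzy matches from highest to lowest. Note that `if limit:` treats both `0` and `None` as "no limit".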
77
karaoke_downloader/manual_video_manager.py
Normal file
@ -0,0 +1,77 @@
+"""
+Manual video manager for handling static video collections.
+"""
+
+import json
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+
+def load_manual_videos(manual_file: str = "data/manual_videos.json") -> List[Dict[str, Any]]:
+    """
+    Load manual videos from the JSON file.
+
+    Args:
+        manual_file: Path to manual videos JSON file
+
+    Returns:
+        List of video dictionaries
+    """
+    manual_path = Path(manual_file)
+
+    if not manual_path.exists():
+        print(f"⚠️ Manual videos file not found: {manual_file}")
+        return []
+
+    try:
+        with open(manual_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+
+        videos = data.get("videos", [])
+        print(f"📋 Loaded {len(videos)} manual videos from {manual_file}")
+        return videos
+
+    except Exception as e:
+        print(f"❌ Error loading manual videos: {e}")
+        return []
+
+def get_manual_videos_for_channel(channel_name: str, manual_file: str = "data/manual_videos.json") -> List[Dict[str, Any]]:
+    """
+    Get manual videos for a specific channel.
+
+    Args:
+        channel_name: Channel name (should be "@ManualVideos")
+        manual_file: Path to manual videos JSON file
+
+    Returns:
+        List of video dictionaries
+    """
+    if channel_name != "@ManualVideos":
+        return []
+
+    return load_manual_videos(manual_file)
+
+def is_manual_channel(channel_url: str) -> bool:
+    """
+    Check if a channel URL is a manual channel.
+
+    Args:
+        channel_url: Channel URL
+
+    Returns:
+        True if it's a manual channel
+    """
+    return channel_url == "manual://static"
+
+def get_manual_channel_info(channel_url: str) -> tuple[Optional[str], Optional[str]]:
+    """
+    Get channel info for manual channels.
+
+    Args:
+        channel_url: Channel URL
+
+    Returns:
+        Tuple of (channel_name, channel_id), or (None, None) if the URL is not a manual channel
+    """
+    if channel_url == "manual://static":
+        return "@ManualVideos", "manual"
+    return None, None
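Reviewer note: the loader above reads a top-level `videos` list, and the `TrackingManager` hunk in this commit reads `title`, `id`, and `url` from each entry. The sketch below shows a file layout that would satisfy both. The sample values and the `load_videos` helper are hypothetical stand-ins written for this note, not code from the repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical sample data: the "videos" key and the title/id/url fields are
# the ones the diff reads; the values are invented for illustration.
SAMPLE = {
    "videos": [
        {
            "title": "Example Artist - Example Song (Karaoke Version)",
            "id": "example_id_1",
            "url": "https://www.youtube.com/watch?v=example_id_1",
        }
    ]
}


def load_videos(manual_file: Path) -> List[Dict[str, Any]]:
    """Simplified stand-in for load_manual_videos: a missing file yields []."""
    if not manual_file.exists():
        return []
    with open(manual_file, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data.get("videos", [])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "manual_videos.json"
    path.write_text(json.dumps(SAMPLE, indent=2), encoding="utf-8")
    videos = load_videos(path)
    missing = load_videos(Path(tmp) / "does_not_exist.json")

print(videos[0]["title"])
print(len(missing))
```

Any extra per-video fields would simply be ignored by the conversion step in `TrackingManager`, which copies only `title`, `id`, and `url`.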
@ -341,8 +341,31 @@ class TrackingManager:
             show_pagination: Show page-by-page progress (slower but more detailed)
         """
         channel_name, channel_id = None, None
-        from karaoke_downloader.youtube_utils import get_channel_info
 
+        # Check if this is a manual channel
+        from karaoke_downloader.manual_video_manager import is_manual_channel, get_manual_channel_info, get_manual_videos_for_channel
+
+        if is_manual_channel(channel_url):
+            channel_name, channel_id = get_manual_channel_info(channel_url)
+            if channel_name and channel_id:
+                print(f" 📋 Loading manual videos for {channel_name}")
+                manual_videos = get_manual_videos_for_channel(channel_name)
+                # Convert to the expected format
+                videos = []
+                for video in manual_videos:
+                    videos.append({
+                        "title": video.get("title", ""),
+                        "id": video.get("id", ""),
+                        "url": video.get("url", "")
+                    })
+                print(f" ✅ Loaded {len(videos)} manual videos")
+                return videos
+            else:
+                print(f" ❌ Could not get manual channel info for: {channel_url}")
+                return []
+
+        # Regular YouTube channel processing
+        from karaoke_downloader.youtube_utils import get_channel_info
         channel_name, channel_id = get_channel_info(channel_url)
 
         if not channel_id: