# 🎤 Karaoke Video Downloader
A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using `yt-dlp.exe`, with advanced tracking, songlist prioritization, and flexible configuration.
## ✨ Features
- 🎵 **Channel & Playlist Downloads**: Download all videos from a YouTube channel or playlist
- 📂 **Organized Storage**: Each channel gets its own folder in `downloads/`
- 📝 **Robust Tracking**: Tracks all downloads, statuses, and formats in JSON
- 🏆 **Songlist Prioritization**: Prioritize or restrict downloads to a custom songlist
- 🔄 **Batch Saving & Caching**: Saves in batches and caches channel video lists to minimize API calls
- 🏷️ **ID3 Tagging**: Adds artist, title, album, and genre metadata to MP4 files (requires mutagen)
- 🧹 **Automatic Cleanup**: Removes extra yt-dlp files
- 📈 **Real-Time Progress**: Detailed console and log output
- 🧹 **Reset/Clear Channel**: Reset all tracking and files for a channel, or clear channel cache via CLI
- 🗂️ **Latest-per-channel Download**: Download the latest N videos from each channel in a single batch, with a per-channel download plan, robust resume, and a unique plan cache. Use `--latest-per-channel` together with `--limit N`.
- 🧩 **Fuzzy Matching**: Optionally match songlist entries to video titles with fuzzy string matching (`--fuzzy-match`; uses rapidfuzz if installed, otherwise falls back to difflib)
## 📋 Requirements
- **Windows 10/11**
- **Python 3.7+**
- **yt-dlp.exe** (in `downloader/`)
- **mutagen** (for ID3 tagging, optional)
- **ffmpeg/ffprobe** (for video validation, optional but recommended)
- **rapidfuzz** (for fuzzy matching, optional, falls back to difflib)
## 🚀 Quick Start
### Download a Channel
```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
```
### Download from a List of Channels
```bash
python download_karaoke.py --file data/channels.txt
```
### Download Only Songlist Songs
```bash
python download_karaoke.py --songlist-only
```
### Prioritize Songlist in Download Queue
```bash
python download_karaoke.py --songlist-priority
```
### Show Songlist Download Progress
```bash
python download_karaoke.py --songlist-status
```
### Limit Number of Downloads
```bash
python download_karaoke.py --limit 5
```
### Override Resolution
```bash
python download_karaoke.py --resolution 1080p
```
### Download Latest N Videos Per Channel
```bash
python download_karaoke.py --file data/channels.txt --latest-per-channel --limit 5
```
### Reset/Start Over for a Channel
```bash
python download_karaoke.py --reset-channel SingKingKaraoke
```
### Reset Channel and Songlist Songs
```bash
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
```
### Clear Channel Cache
```bash
python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all
```
## 🧠 Songlist Integration
- Place your prioritized song list in `data/songList.json` (see example format below).
- The tool will match and prioritize these songs across all available channel videos.
- Use `--songlist-only` to download only these songs, or `--songlist-priority` to prioritize them in the queue.
- Download progress for the songlist is tracked globally in `data/songlist_tracking.json`.
#### Example `data/songList.json`
```json
[
{ "artist": "Taylor Swift", "title": "Cruel Summer" },
{ "artist": "Billie Eilish", "title": "Happier Than Ever" }
]
```
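
The matching step described above can be sketched roughly as follows. This is only an illustration, not the tool's actual logic: the helper names `normalize` and `find_songlist_matches` are hypothetical, as is the assumption that a video title contains both the artist and the song title.

```python
import json
import re


def normalize(text):
    """Lowercase and strip punctuation so minor formatting differences don't block a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_songlist_matches(songlist, video_titles):
    """Return (song, video_title) pairs where artist and title both appear in the video title."""
    matches = []
    for song in songlist:
        artist = normalize(song["artist"])
        title = normalize(song["title"])
        for video_title in video_titles:
            candidate = normalize(video_title)
            if artist in candidate and title in candidate:
                matches.append((song, video_title))
                break  # one match per song is enough for this sketch
    return matches


# Same structure as data/songList.json
songlist = json.loads("""
[
  { "artist": "Taylor Swift", "title": "Cruel Summer" },
  { "artist": "Billie Eilish", "title": "Happier Than Ever" }
]
""")

# Hypothetical video titles from a channel listing
video_titles = [
    "Taylor Swift - Cruel Summer (Karaoke Version)",
    "Adele - Hello (Karaoke Version)",
]

matches = find_songlist_matches(songlist, video_titles)
```

Here only the Taylor Swift entry finds a match; the Billie Eilish entry would stay pending until a channel offers a matching video.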
## 🛠️ Tracking & Caching
- **data/karaoke_tracking.json**: Tracks all downloads, statuses, and formats
- **data/songlist_tracking.json**: Tracks global songlist download progress
- **data/channel_cache.json**: Caches channel video lists for performance
## 📂 Folder Structure
```
KaraokeVideoDownloader/
├── karaoke_downloader/ # All core Python code and utilities
│ ├── downloader.py # Main downloader class
│ ├── cli.py # CLI entry point
│ ├── id3_utils.py # ID3 tagging helpers
│ ├── songlist_manager.py # Songlist logic
│ ├── youtube_utils.py # YouTube helpers
│ ├── tracking_manager.py # Tracking logic
│ ├── check_resolution.py # Resolution checker utility
│ ├── resolution_cli.py # Resolution config CLI
│ └── tracking_cli.py # Tracking management CLI
├── data/ # All config, tracking, cache, and songlist files
│ ├── config.json
│ ├── karaoke_tracking.json
│ ├── songlist_tracking.json
│ ├── channel_cache.json
│ ├── channels.txt
│ └── songList.json
├── downloads/ # All video output
│ └── [ChannelName]/ # Per-channel folders
├── logs/ # Download logs
├── downloader/yt-dlp.exe # yt-dlp binary
├── tests/ # Diagnostic and test scripts
│ └── test_installation.py
├── download_karaoke.py # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat # (optional Windows launcher)
```
## 🚦 CLI Options
- `--file <data/channels.txt>`: Download from a list of channels
- `--songlist-priority`: Prioritize songlist songs in download queue
- `--songlist-only`: Download only songs from the songlist
- `--songlist-status`: Show songlist download progress
- `--limit <N>`: Limit number of downloads
- `--resolution <720p|1080p|...>`: Override resolution
- `--status`: Show download/tracking status
- `--reset-channel <CHANNEL_NAME>`: Reset all tracking and files for a channel
- `--reset-songlist`: When used with `--reset-channel`, also reset songlist songs for this channel
- `--clear-cache <CHANNEL_ID|all>`: Clear the channel video cache for a specific channel or for all channels
- `--latest-per-channel`: Download the latest N videos from each channel (use with `--limit`)
- `--fuzzy-match`: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
- `--fuzzy-threshold <N>`: Minimum similarity score for a fuzzy match (0-100, default 85)
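
To give a feel for how a 0-100 threshold behaves, here is a minimal sketch of fuzzy matching with a rapidfuzz-to-difflib fallback. The function names `similarity` and `is_fuzzy_match` are hypothetical, and the scorer (`fuzz.ratio`) is an assumption; the tool may compare strings differently.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz  # optional dependency

    def similarity(a, b):
        """Similarity score from 0 to 100 using rapidfuzz."""
        return fuzz.ratio(a, b)
except ImportError:
    def similarity(a, b):
        """Similarity score from 0 to 100 using the standard library."""
        # difflib returns 0.0-1.0, so scale it to match the 0-100 threshold
        return SequenceMatcher(None, a, b).ratio() * 100


def is_fuzzy_match(wanted, candidate, threshold=85):
    """True when the two strings score at or above the threshold (case-insensitive)."""
    return similarity(wanted.lower(), candidate.lower()) >= threshold
```

With the default threshold of 85, a one-letter typo such as "Cruel Sumer" still matches "Cruel Summer", while unrelated titles do not. Raising the threshold makes matching stricter.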
## 📝 Example Usage
```bash
python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos --songlist-priority --limit 10
python download_karaoke.py --file data/channels.txt --songlist-only
python download_karaoke.py --songlist-status
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
```
## 🏷️ ID3 Tagging
- Adds artist/title/album/genre to MP4 files using mutagen (if installed)
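
As a rough sketch of what tagging an MP4 with mutagen can look like: mutagen stores MP4 metadata as iTunes-style atoms (for example `©nam` for the title) rather than ID3 frames. The function names `build_mp4_tags` and `tag_file` are hypothetical, and the exact fields the tool writes are an assumption based on the list above.

```python
def build_mp4_tags(artist, title, album=None, genre=None):
    """Map plain fields onto the iTunes-style atom names used in MP4 files."""
    tags = {"\xa9ART": [artist], "\xa9nam": [title]}
    if album:
        tags["\xa9alb"] = [album]
    if genre:
        tags["\xa9gen"] = [genre]
    return tags


def tag_file(path, artist, title, album=None, genre=None):
    """Write the tags to an MP4 file; returns False when mutagen is not installed."""
    try:
        from mutagen.mp4 import MP4  # optional dependency
    except ImportError:
        return False
    video = MP4(path)
    for key, value in build_mp4_tags(artist, title, album, genre).items():
        video[key] = value
    video.save()
    return True
```

Importing mutagen inside `tag_file` keeps the dependency optional: without it, the function simply reports that nothing was written.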
## 🧹 Cleanup
- Removes `.info.json` and `.meta` files after download
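
If you ever need to clean up leftover files by hand, something along these lines would do it. This is a standalone sketch, not the tool's own cleanup code; the function name `remove_sidecar_files` is hypothetical.

```python
from pathlib import Path


def remove_sidecar_files(folder, suffixes=(".info.json", ".meta")):
    """Delete files under folder (recursively) whose names end with one of the suffixes."""
    removed = []
    # Materialize the listing first so deletions don't interfere with iteration
    for path in list(Path(folder).rglob("*")):
        if path.is_file() and path.name.endswith(tuple(suffixes)):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, `remove_sidecar_files("downloads")` would sweep every channel folder and return the names of the files it deleted.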
## 🛠️ Configuration
- All options live in `data/config.json` (format, resolution, metadata, etc.)
- Edit this file directly, or override individual settings for a single run with CLI flags
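
The way CLI flags take precedence over the config file can be pictured with a short sketch. The config keys shown here are hypothetical (the real `data/config.json` may be structured differently), and `merge_config` and `parse_overrides` are illustrative names only.

```python
import argparse
import json


def merge_config(config, overrides):
    """Return a copy of config where any override that is not None wins."""
    merged = dict(config)
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged


def parse_overrides(argv):
    """Parse the subset of CLI flags used in this sketch into a dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--resolution")  # e.g. 720p, 1080p
    parser.add_argument("--limit", type=int)
    return vars(parser.parse_args(argv))


# Hypothetical config content; the real file may use different keys
config = json.loads('{"resolution": "720p", "limit": null}')
effective = merge_config(config, parse_overrides(["--resolution", "1080p"]))
```

Flags that are not passed stay `None` and leave the config value untouched, so only the settings you name on the command line change for that run.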
## 🐞 Troubleshooting
- Ensure `yt-dlp.exe` is in the `downloader/` folder
- Check `logs/` for error details
- Use `python -m karaoke_downloader.check_resolution` to verify video quality
- If you see errors about ffmpeg/ffprobe, install [ffmpeg](https://ffmpeg.org/download.html) and ensure it is in your PATH
- For best fuzzy matching, install rapidfuzz: `pip install rapidfuzz` (otherwise the tool falls back to the slower, less accurate difflib)
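
Several of these checks can be automated. The sketch below is a hypothetical helper (it is not the bundled `tests/test_installation.py`) that reports which of the components mentioned above can be found.

```python
import importlib.util
import shutil
from pathlib import Path


def check_environment(project_root="."):
    """Return a dict mapping each component name to True (found) or False (missing)."""
    return {
        "yt-dlp.exe": (Path(project_root) / "downloader" / "yt-dlp.exe").is_file(),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        "mutagen": importlib.util.find_spec("mutagen") is not None,
        "rapidfuzz": importlib.util.find_spec("rapidfuzz") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it from the project root so that the relative `downloader/` path resolves correctly.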
---
**Happy Karaoke! 🎤**