🎤 Karaoke Video Downloader PRD (v2.1)

Overview

A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using yt-dlp.exe, with advanced tracking, songlist prioritization, and flexible configuration.


📋 Goals

  • Download karaoke videos from YouTube channels or playlists.
  • Organize downloads by channel (or playlist) in subfolders.
  • Avoid re-downloading the same videos (robust tracking).
  • Prioritize and track a custom songlist across channels.
  • Allow flexible, user-friendly configuration.

🧑‍💻 Target Users

  • Karaoke DJs, home karaoke users, event hosts, or anyone needing offline karaoke video libraries.
  • Users comfortable with command-line tools.

⚙️ Platform & Stack

  • Platform: Windows
  • Interface: Command-line (CLI)
  • Tech Stack: Python 3.7+, yt-dlp.exe, mutagen (for ID3 tagging)

📥 Input

  • YouTube channel or playlist URLs (e.g. https://www.youtube.com/@SingKingKaraoke/videos)
  • Optional: data/channels.txt file with multiple channel URLs (one per line)
  • Optional: data/songList.json for prioritized song downloads
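A minimal sketch of reading the one-URL-per-line channels file described above. The function name `load_channel_urls` is hypothetical, and skipping blank lines, `#` comment lines, and duplicate URLs is an assumption rather than documented behavior.

```python
from pathlib import Path


def load_channel_urls(path):
    """Return channel URLs from a text file, one per line, in file order.

    Assumption: blank lines and lines starting with '#' are ignored,
    and repeated URLs are kept only once.
    """
    urls = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls
```

Calling `load_channel_urls("data/channels.txt")` would then yield the list of URLs to process in the order they appear in the file.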

Example Usage

python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos
python download_karaoke.py --file data/channels.txt
python download_karaoke.py --songlist-only
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache SingKingKaraoke

📤 Output

  • MP4 files in downloads/<ChannelName>/ subfolders
  • All videos tracked in data/karaoke_tracking.json
  • Songlist progress tracked in data/songlist_tracking.json
  • Logs in logs/
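The tracking file listed above could be maintained along the following lines. This is only a sketch: the class name `TrackingStore`, the `{"video_id": {"status": ...}}` layout, and the `batch_size` parameter are assumptions made for illustration, not the tool's actual schema.

```python
import json
import os
import tempfile
from pathlib import Path


class TrackingStore:
    """Keeps per-video status in memory and writes it to JSON in batches."""

    def __init__(self, path, batch_size=10):
        self.path = Path(path)
        self.batch_size = batch_size  # hypothetical: writes per flush
        self._pending = 0
        self.videos = {}
        if self.path.exists():
            self.videos = json.loads(self.path.read_text(encoding="utf-8"))

    def is_downloaded(self, video_id):
        """True only if the video was recorded with status 'downloaded'."""
        return self.videos.get(video_id, {}).get("status") == "downloaded"

    def mark(self, video_id, status, **extra):
        """Record a status (e.g. downloaded, partial, failed) for a video."""
        self.videos[video_id] = {"status": status, **extra}
        self._pending += 1
        if self._pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Write to a temp file, then swap it in, so a crash mid-write
        does not leave a truncated tracking file behind."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.videos, fh, indent=2)
        os.replace(tmp, self.path)
        self._pending = 0
```

A caller would check `is_downloaded(video_id)` before queuing a video and call `flush()` once at exit so that the last partial batch is not lost.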

🛠️ Features

  • Channel-based downloads (with per-channel folders)
  • Robust JSON tracking (downloaded, partial, failed, etc.)
  • Batch saving and channel video caching for performance
  • Configurable download resolution and yt-dlp options (data/config.json)
  • Songlist integration: prioritize and track custom songlists
  • Songlist-only mode: download only songs from the songlist
  • Global songlist tracking to avoid duplicates across channels
  • ID3 tagging for artist/title in MP4 files (mutagen)
  • Real-time progress and detailed logging
  • Automatic cleanup of extra yt-dlp files
  • Reset/clear channel tracking and files via CLI
  • Clear channel cache via CLI
  • Download plan pre-scan and caching: Before downloading, the tool pre-scans all channels for songlist matches, builds a download plan, and prints stats. The plan is cached for 1 day in data/download_plan_cache.json for fast resuming and reliability. Use --force-download-plan to force a refresh.
  • Latest-per-channel download: Download the latest N videos from each channel in a single batch, with a per-channel download plan, robust resume, and unique plan cache. Use --latest-per-channel and --limit N.
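The one-day plan cache described above might look roughly like this. The payload layout (`created_at`, `plan`) and the function names are assumptions for illustration; only the file location and the one-day lifetime come from this document.

```python
import json
import time
from pathlib import Path

PLAN_TTL_SECONDS = 24 * 60 * 60  # "cached for 1 day"


def save_plan(cache_path, plan, now=None):
    """Store the plan together with its creation time (hypothetical layout)."""
    path = Path(cache_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"created_at": time.time() if now is None else now, "plan": plan}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_cached_plan(cache_path, force_refresh=False, now=None):
    """Return the cached plan, or None when a fresh scan is needed.

    None is returned if a refresh is forced (as with --force-download-plan),
    the file is missing or malformed, or the plan is older than one day.
    """
    path = Path(cache_path)
    if force_refresh or not path.exists():
        return None
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
        created = float(payload["created_at"])
        plan = payload["plan"]
    except (ValueError, KeyError, TypeError):
        return None
    now = time.time() if now is None else now
    if now - created > PLAN_TTL_SECONDS:
        return None
    return plan
```

When `load_cached_plan` returns `None`, the caller would re-scan the channels, build a new plan, and pass it to `save_plan`.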

📂 Folder Structure

KaraokeVideoDownloader/
├── karaoke_downloader/         # All core Python code and utilities
│   ├── downloader.py           # Main downloader class
│   ├── cli.py                  # CLI entry point
│   ├── id3_utils.py            # ID3 tagging helpers
│   ├── songlist_manager.py     # Songlist logic
│   ├── youtube_utils.py        # YouTube helpers
│   ├── tracking_manager.py     # Tracking logic
│   ├── check_resolution.py     # Resolution checker utility
│   ├── resolution_cli.py       # Resolution config CLI
│   └── tracking_cli.py         # Tracking management CLI
├── data/                      # All config, tracking, cache, and songlist files
│   ├── config.json
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── channel_cache.json
│   ├── channels.txt
│   └── songList.json
├── downloads/                 # All video output
│   └── [ChannelName]/         # Per-channel folders
├── logs/                      # Download logs
├── downloader/yt-dlp.exe      # yt-dlp binary
├── tests/                     # Diagnostic and test scripts
│   └── test_installation.py
├── download_karaoke.py        # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat       # (optional Windows launcher)

🚦 CLI Options (Summary)

  • --file <data/channels.txt>: Download from a list of channels
  • --songlist-priority: Prioritize songlist songs in download queue
  • --songlist-only: Download only songs from the songlist
  • --songlist-status: Show songlist download progress
  • --limit <N>: Limit number of downloads
  • --resolution <720p|1080p|...>: Override resolution
  • --status: Show download/tracking status
  • --reset-channel <CHANNEL_NAME>: Reset all tracking and files for a channel
  • --reset-songlist: When used with --reset-channel, also reset songlist songs for this channel
  • --clear-cache <CHANNEL_ID|all>: Clear channel video cache for a specific channel or all
  • --force-download-plan: Force refresh the download plan cache (re-scan all channels for matches)
  • --latest-per-channel: Download the latest N videos from each channel (use with --limit)
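For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch and not the contents of cli.py: the function names, the help strings, and the check that `--reset-songlist` is only accepted together with `--reset-channel` are assumptions.

```python
import argparse


def build_parser():
    """Declare the CLI options listed in this PRD (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="download_karaoke.py",
        description="Download karaoke videos from YouTube channels/playlists.",
    )
    parser.add_argument("url", nargs="?", help="Channel or playlist URL")
    parser.add_argument("--file", metavar="PATH", help="File with channel URLs")
    parser.add_argument("--songlist-priority", action="store_true")
    parser.add_argument("--songlist-only", action="store_true")
    parser.add_argument("--songlist-status", action="store_true")
    parser.add_argument("--limit", type=int, metavar="N")
    parser.add_argument("--resolution", metavar="RES", help="e.g. 720p, 1080p")
    parser.add_argument("--status", action="store_true")
    parser.add_argument("--reset-channel", metavar="CHANNEL_NAME")
    parser.add_argument("--reset-songlist", action="store_true")
    parser.add_argument("--clear-cache", metavar="CHANNEL_OR_ALL")
    parser.add_argument("--force-download-plan", action="store_true")
    parser.add_argument("--latest-per-channel", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse arguments and apply one cross-option check (an assumption)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.reset_songlist and not args.reset_channel:
        parser.error("--reset-songlist requires --reset-channel")
    return args
```

Note that `argparse` exposes dashed options with underscores, so `--latest-per-channel` is read as `args.latest_per_channel`.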

🧠 Logic Highlights

  • Tracking: All downloads, statuses, and formats are tracked in JSON files for reliability and deduplication.
  • Songlist: Loads and normalizes data/songList.json, matches against available videos, and prioritizes or restricts downloads accordingly.
  • Batch/Caching: Channel video lists are cached to minimize API calls; tracking is batch-saved for performance.
  • ID3 Tagging: Artist and title are extracted from the video title and embedded in the MP4 files as metadata tags (via mutagen).
  • Cleanup: Extra files from yt-dlp (e.g., .info.json) are automatically removed after download.
  • Reset/Clear: Use --reset-channel to reset all tracking and files for a channel (optionally including songlist songs with --reset-songlist). Use --clear-cache to clear cached video lists for a channel or all channels.
  • Download plan pre-scan: Before downloading, the tool scans all channels for songlist matches, builds a download plan, and prints stats (matches, unmatched, per-channel breakdown). The plan is cached for 1 day and reused unless --force-download-plan is set.
  • Latest-per-channel plan: Download the latest N videos from each channel, with a per-channel plan and robust resume. Each channel is removed from the plan as it completes. Plan cache is deleted when all channels are done.
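The songlist matching and artist/title extraction steps above can be sketched as follows. Several things here are assumptions: that video titles follow an "Artist - Title (Karaoke Version)" pattern, that data/songList.json holds a list of objects with `artist` and `title` keys, and all function names. The normalization shown is ASCII-only and exact-match, so it does not cover the fuzzy or multi-language matching listed under future enhancements.

```python
import re

# Bracketed suffixes that mention "karaoke", e.g. "(Karaoke Version)" or "[Karaoke]".
_NOISE = re.compile(r"[\(\[][^\)\]]*karaoke[^\)\]]*[\)\]]", re.IGNORECASE)


def split_artist_title(video_title):
    """Split 'Artist - Title (Karaoke Version)' into (artist, title).

    Returns None when the title has no ' - ' separator.
    """
    cleaned = _NOISE.sub("", video_title).strip()
    artist, sep, title = cleaned.partition(" - ")
    if not sep or not artist.strip() or not title.strip():
        return None
    return artist.strip(), title.strip()


def normalize(text):
    """Lowercase and collapse runs of non-alphanumerics into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_songlist(songs, video_titles):
    """Return the video titles whose parsed artist/title appear in the songlist."""
    wanted = {(normalize(s["artist"]), normalize(s["title"])) for s in songs}
    matches = []
    for video_title in video_titles:
        parsed = split_artist_title(video_title)
        if parsed and (normalize(parsed[0]), normalize(parsed[1])) in wanted:
            matches.append(video_title)
    return matches
```

The pair returned by `split_artist_title` is also what a tagging helper would write into the downloaded file, and the list returned by `match_songlist` is the kind of input a download plan could be built from.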

🚀 Future Enhancements

  • Web UI for easier management
  • More advanced song matching (fuzzy, multi-language)
  • Download scheduling and retry logic
  • More granular status reporting