🎤 Karaoke Video Downloader

A Python-based Windows CLI tool to download karaoke videos from YouTube channels/playlists using yt-dlp.exe, with advanced tracking, songlist prioritization, and flexible configuration.

Features

  • 🎵 Channel & Playlist Downloads: Download all videos from a YouTube channel or playlist
  • 📂 Organized Storage: Each channel gets its own folder in downloads/
  • 📝 Robust Tracking: Tracks all downloads, statuses, and formats in JSON
  • 🏆 Songlist Prioritization: Prioritize or restrict downloads to a custom songlist
  • 🔄 Batch Saving & Caching: Efficient, minimizes API calls
  • 🏷️ ID3 Tagging: Adds artist/title/album/genre metadata to MP4 files (requires mutagen)
  • 🧹 Automatic Cleanup: Removes extra yt-dlp files
  • 📈 Real-Time Progress: Detailed console and log output
  • 🧹 Reset/Clear Channel: Reset all tracking and files for a channel, or clear channel cache via CLI
  • 🗂️ Latest-per-channel download: Download the latest N videos from each channel in a single batch (use --latest-per-channel with --limit N). Supports server deduplication and fuzzy matching, builds a per-channel download plan, resumes reliably after interruption, and keeps a unique plan cache.
  • 🧩 Fuzzy Matching: Optionally use fuzzy string matching for songlist-to-video matching (with --fuzzy-match, requires rapidfuzz for best results)
  • Fast Mode with Early Exit: When a limit is set, scans channels and songs in order, downloads each match as soon as it is found, and stops once the number of successful downloads reaches the limit
  • 🔄 Deduplication Across Channels: Ensures the same song is not downloaded from multiple channels, even if it appears in more than one channel's video list
  • 📋 Default Channel File: Automatically uses data/channels.txt as the default channel list for songlist modes (no need to specify --file every time)
  • 🛡️ Robust Interruption Handling: Progress is saved after each download, preventing re-downloads if the process is interrupted
  • Optimized Scanning: High-performance channel scanning with O(n×m) complexity, pre-processed lookups, and early termination for faster matching
  • 🏷️ Server Duplicates Tracking: Automatically checks against local songs.json file and marks duplicates for future skipping, preventing re-downloads of songs already on the server
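
The fast-mode, cross-channel deduplication, and pre-processed lookup behaviour listed above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `plan_downloads`, `normalize`, and the input shapes are hypothetical names chosen for the example, and matching here is exact after normalization rather than fuzzy.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(text.lower().split())


def plan_downloads(
    songs: Iterable[Tuple[str, str]],
    channels: Dict[str, List[str]],
    limit: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (channel, video_title) pairs, at most one per song.

    `songs` holds (artist, title) tuples and `channels` maps a channel name
    to its video titles. Both shapes are assumptions made for this sketch.
    """
    # Pre-process the songlist once into a set of normalized keys.
    wanted = {normalize(f"{artist} - {title}") for artist, title in songs}
    taken = set()  # songs already claimed by an earlier channel
    plan = []
    for channel, titles in channels.items():
        for video_title in titles:
            key = normalize(video_title)
            if key in wanted and key not in taken:
                taken.add(key)
                plan.append((channel, video_title))
                if limit is not None and len(plan) >= limit:
                    return plan  # early exit once the limit is reached
    return plan
```

Real video titles usually carry extra text such as "(Karaoke Version)", so an actual matcher would need to strip or tolerate it; that step is left out here.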

🏗️ Architecture

The codebase has been comprehensively refactored into a modular architecture with centralized utilities for improved maintainability, error handling, and code reuse:

Core Modules:

  • downloader.py: Main orchestrator and CLI interface
  • video_downloader.py: Core video download execution and orchestration
  • tracking_manager.py: Download tracking and status management
  • download_planner.py: Download plan building and channel scanning
  • cache_manager.py: Cache operations and file I/O management
  • channel_manager.py: Channel and file management operations
  • songlist_manager.py: Songlist operations and tracking
  • server_manager.py: Server song availability checking
  • fuzzy_matcher.py: Fuzzy matching logic and similarity functions

Utility Modules:

  • youtube_utils.py: Centralized YouTube operations and yt-dlp command generation
  • error_utils.py: Standardized error handling and formatting
  • download_pipeline.py: Abstracted download → verify → tag → track pipeline
  • id3_utils.py: ID3 tagging utilities
  • config_manager.py: Configuration management
  • check_resolution.py: Resolution checker utility
  • resolution_cli.py: Resolution config CLI
  • tracking_cli.py: Tracking management CLI

Benefits:

  • Centralized Utilities: Common operations (yt-dlp commands, error handling) are centralized
  • Reduced Duplication: Eliminated code duplication across modules
  • Consistency: Standardized error messages and processing pipelines
  • Maintainability: Changes isolated to specific modules
  • Testability: Modular components can be tested independently
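
To illustrate why a shared download → verify → tag → track pipeline helps testability, here is a minimal sketch. The class name `DownloadPipeline`, its fields, and the status strings are all hypothetical; the real download_pipeline.py may be organized quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DownloadPipeline:
    """Runs download -> verify -> tag, then records the outcome (track)."""

    download: Callable[[str], bool]
    verify: Callable[[str], bool]
    tag: Callable[[str], bool]
    tracking: Dict[str, str] = field(default_factory=dict)

    def run(self, video_id: str) -> bool:
        steps = [("download", self.download), ("verify", self.verify), ("tag", self.tag)]
        for name, step in steps:
            if not step(video_id):
                # Record which stage failed so a later run can retry it.
                self.tracking[video_id] = f"failed:{name}"
                return False
        self.tracking[video_id] = "downloaded"
        return True
```

Because each step is injected as a plain callable, a test can pass in stubs instead of invoking yt-dlp or ffprobe.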

📋 Requirements

  • Windows 10/11
  • Python 3.7+
  • yt-dlp.exe (in downloader/)
  • mutagen (for ID3 tagging, optional)
  • ffmpeg/ffprobe (for video validation, optional but recommended)
  • rapidfuzz (for fuzzy matching, optional, falls back to difflib)
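
The optional rapidfuzz dependency with a difflib fallback can be handled with a guarded import. The sketch below shows one way to do that; `similarity` and `is_match` are hypothetical helper names, and the project's fuzzy_matcher.py may score strings differently.

```python
import difflib

try:
    from rapidfuzz import fuzz  # optional dependency

    def similarity(a: str, b: str) -> float:
        """Similarity score from 0 to 100 using rapidfuzz."""
        return fuzz.ratio(a.lower(), b.lower())

except ImportError:

    def similarity(a: str, b: str) -> float:
        """Similarity score from 0 to 100 using the standard-library difflib."""
        return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_match(song: str, video_title: str, threshold: float = 85) -> bool:
    """True when the two strings are at least `threshold` percent similar."""
    return similarity(song, video_title) >= threshold
```

The two backends do not always produce identical scores, so a threshold tuned with one may behave slightly differently with the other.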

🚀 Quick Start

💡 Pro Tip: For a complete list of all available commands, see commands.txt - you can copy/paste any command directly into your terminal!

Download a Channel

python download_karaoke.py https://www.youtube.com/@SingKingKaraoke/videos

Download Only Songlist Songs (Fast Mode)

python download_karaoke.py --songlist-only --limit 5

Focus on Specific Playlists by Title

python download_karaoke.py --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100"

Download with Fuzzy Matching

python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85

Download Latest N Videos Per Channel

python download_karaoke.py --latest-per-channel --limit 5

Download Latest N Videos Per Channel (with fuzzy matching)

python download_karaoke.py --latest-per-channel --limit 5 --fuzzy-match --fuzzy-threshold 85

Prioritize Songlist in Download Queue

python download_karaoke.py --songlist-priority

Show Songlist Download Progress

python download_karaoke.py --songlist-status

Limit Number of Downloads

python download_karaoke.py --limit 5

Override Resolution

python download_karaoke.py --resolution 1080p

Reset/Start Over for a Channel

python download_karaoke.py --reset-channel SingKingKaraoke

Reset Channel and Songlist Songs

python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist

Clear Channel Cache

python download_karaoke.py --clear-cache SingKingKaraoke
python download_karaoke.py --clear-cache all

🧠 Songlist Integration

  • Place your prioritized song list in data/songList.json (see example format below).
  • The tool will match and prioritize these songs across all available channel videos.
  • Use --songlist-only to download only these songs, or --songlist-priority to prioritize them in the queue.
  • Use --songlist-focus to download only songs from specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100").
  • Download progress for the songlist is tracked globally in data/songlist_tracking.json.

Example data/songList.json

[
  {
    "title": "2025 - Apple Top 50",
    "songs": [
      { "artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1 },
      { "artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2 }
    ]
  },
  {
    "title": "2024 - Billboard Hot 100",
    "songs": [
      { "artist": "Taylor Swift", "title": "Cruel Summer", "position": 1 },
      { "artist": "Billie Eilish", "title": "Happier Than Ever", "position": 2 }
    ]
  }
]
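
As a rough sketch of how a file in this format could be read and narrowed to specific playlists (the behaviour described for --songlist-focus), consider the following. The function name `select_songs` and the exact-title comparison are assumptions for illustration only.

```python
import json
from typing import Dict, List, Optional


def select_songs(playlists: List[Dict], focus_titles: Optional[List[str]] = None) -> List[Dict]:
    """Flatten playlists into one song list, optionally keeping only focused playlists."""
    selected = []
    for playlist in playlists:
        if focus_titles and playlist["title"] not in focus_titles:
            continue
        # Keep chart order within each playlist.
        for song in sorted(playlist["songs"], key=lambda s: s["position"]):
            selected.append({"playlist": playlist["title"], **song})
    return selected


# Inline sample in the same shape as data/songList.json.
SAMPLE = json.loads("""
[
  {"title": "2025 - Apple Top 50",
   "songs": [{"artist": "Kendrick Lamar", "title": "Not Like Us", "position": 2},
             {"artist": "Kendrick Lamar & SZA", "title": "luther", "position": 1}]},
  {"title": "2024 - Billboard Hot 100",
   "songs": [{"artist": "Taylor Swift", "title": "Cruel Summer", "position": 1}]}
]
""")
```

Calling `select_songs(SAMPLE, ["2025 - Apple Top 50"])` returns only the two songs from that playlist, sorted by position.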

🛠️ Tracking & Caching

  • data/karaoke_tracking.json: Tracks all downloads, statuses, and formats
  • data/songlist_tracking.json: Tracks global songlist download progress
  • data/server_duplicates_tracking.json: Tracks songs found to be duplicates on the server for future skipping
  • data/channel_cache.json: Caches channel video lists for performance
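
Since these JSON files are rewritten while downloads are in progress, one common way to keep them intact if the process is interrupted is to write to a temporary file and then swap it into place. The sketch below shows that technique; it is an assumption about how such saving could be done, not a description of the project's cache_manager.py, and the helper names are hypothetical.

```python
import json
import os
import tempfile


def save_json_atomic(path: str, data: dict) -> None:
    """Write JSON to a temporary file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        os.replace(tmp_path, path)  # replaces the old file in a single rename
    except BaseException:
        # Clean up the partial temp file and leave the previous version untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_json(path: str, default: dict) -> dict:
    """Return the parsed file, or `default` when the file does not exist yet."""
    if not os.path.exists(path):
        return default
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `save_json_atomic` after every completed download means an interruption loses at most the download that was in flight.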

📂 Folder Structure

KaroakeVideoDownloader/
├── commands.txt               # Complete CLI commands reference (copy/paste ready)
├── karaoke_downloader/         # All core Python code and utilities
│   ├── downloader.py           # Main orchestrator and CLI interface
│   ├── cli.py                  # CLI entry point
│   ├── video_downloader.py     # Core video download execution and orchestration
│   ├── tracking_manager.py     # Download tracking and status management
│   ├── download_planner.py     # Download plan building and channel scanning
│   ├── cache_manager.py        # Cache operations and file I/O management
│   ├── channel_manager.py      # Channel and file management operations
│   ├── songlist_manager.py     # Songlist operations and tracking
│   ├── server_manager.py       # Server song availability checking
│   ├── fuzzy_matcher.py        # Fuzzy matching logic and similarity functions
│   ├── youtube_utils.py        # Centralized YouTube operations and yt-dlp commands
│   ├── error_utils.py          # Standardized error handling and formatting
│   ├── download_pipeline.py    # Abstracted download → verify → tag → track pipeline
│   ├── id3_utils.py            # ID3 tagging utilities
│   ├── config_manager.py       # Configuration management
│   ├── check_resolution.py     # Resolution checker utility
│   ├── resolution_cli.py       # Resolution config CLI
│   └── tracking_cli.py         # Tracking management CLI
├── data/                      # All config, tracking, cache, and songlist files
│   ├── config.json
│   ├── karaoke_tracking.json
│   ├── songlist_tracking.json
│   ├── server_duplicates_tracking.json
│   ├── channel_cache.json
│   ├── channels.txt
│   └── songList.json
├── downloads/                 # All video output
│   └── [ChannelName]/         # Per-channel folders
├── logs/                      # Download logs
├── downloader/yt-dlp.exe      # yt-dlp binary
├── tests/                     # Diagnostic and test scripts
│   └── test_installation.py
├── download_karaoke.py        # Main entry point (thin wrapper)
├── README.md
├── PRD.md
├── requirements.txt
└── download_karaoke.bat       # (optional Windows launcher)

🚦 CLI Options

📋 Complete Command Reference: See commands.txt for all available commands with examples - perfect for copy/paste!

Key Options:

  • --file <data/channels.txt>: Download from a list of channels (optional, defaults to data/channels.txt for songlist modes)
  • --songlist-priority: Prioritize songlist songs in download queue
  • --songlist-only: Download only songs from the songlist
  • --songlist-focus <PLAYLIST_TITLE1> <PLAYLIST_TITLE2>...: Focus on specific playlists by title (e.g., --songlist-focus "2025 - Apple Top 50" "2024 - Billboard Hot 100")
  • --songlist-status: Show songlist download progress
  • --limit <N>: Limit number of downloads (enables fast mode with early exit)
  • --resolution <720p|1080p|...>: Override resolution
  • --status: Show download/tracking status
  • --reset-channel <CHANNEL_NAME>: Reset all tracking and files for a channel
  • --reset-songlist: When used with --reset-channel, also reset songlist songs for this channel
  • --clear-cache <CHANNEL_ID|all>: Clear channel video cache for a specific channel or all
  • --clear-server-duplicates: Clear server duplicates tracking (allows re-checking songs against server)
  • --latest-per-channel: Download the latest N videos from each channel (use with --limit)
  • --fuzzy-match: Enable fuzzy matching for songlist-to-video matching (uses rapidfuzz if available)
  • --fuzzy-threshold <N>: Fuzzy match threshold (0-100, default 85)
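
For readers who want to see how a subset of these options fits together, here is a sketch using the standard-library argparse. The flag names and the threshold default come from the list above; everything else (the positional `url` argument, types, and the `build_parser` name) is an assumption, and the project's real cli.py may declare its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="download_karaoke.py")
    parser.add_argument("url", nargs="?", help="Channel or playlist URL")
    parser.add_argument("--file", default="data/channels.txt")
    parser.add_argument("--songlist-only", action="store_true")
    parser.add_argument("--songlist-priority", action="store_true")
    # One or more playlist titles, each quoted on the command line.
    parser.add_argument("--songlist-focus", nargs="+", metavar="PLAYLIST_TITLE")
    parser.add_argument("--latest-per-channel", action="store_true")
    parser.add_argument("--limit", type=int)
    parser.add_argument("--resolution")
    parser.add_argument("--fuzzy-match", action="store_true")
    parser.add_argument("--fuzzy-threshold", type=int, default=85)
    return parser
```

With this sketch, `build_parser().parse_args(["--songlist-only", "--limit", "10"])` yields `songlist_only=True` and `limit=10`, with `fuzzy_threshold` left at 85.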

📝 Example Usage

💡 For complete examples: See commands.txt for all command variations with explanations!

# Fast mode with fuzzy matching (no need to specify --file)
python download_karaoke.py --songlist-only --limit 10 --fuzzy-match --fuzzy-threshold 85

# Latest videos per channel
python download_karaoke.py --latest-per-channel --limit 5

# Traditional full scan (no limit)
python download_karaoke.py --songlist-only

# Channel-specific operations
python download_karaoke.py --reset-channel SingKingKaraoke
python download_karaoke.py --reset-channel SingKingKaraoke --reset-songlist
python download_karaoke.py --clear-cache all
python download_karaoke.py --clear-server-duplicates

🏷️ ID3 Tagging

  • Adds artist/title/album/genre to MP4 files using mutagen (if installed)
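
MP4 files do not carry ID3 frames; mutagen writes their metadata as iTunes-style atoms instead. The sketch below shows how artist, title, album, and genre map onto those atom names. The helper names and the "Karaoke" genre default are assumptions for illustration, not taken from id3_utils.py.

```python
from typing import Dict, List


def build_mp4_tags(artist: str, title: str, album: str = "",
                   genre: str = "Karaoke") -> Dict[str, List[str]]:
    """Map metadata to the iTunes-style atom names that MP4 files use."""
    tags = {"\xa9ART": [artist], "\xa9nam": [title]}
    if album:
        tags["\xa9alb"] = [album]
    if genre:
        tags["\xa9gen"] = [genre]
    return tags


def tag_file(path: str, tags: Dict[str, List[str]]) -> bool:
    """Write tags with mutagen; return False when mutagen is not installed."""
    try:
        from mutagen.mp4 import MP4  # optional dependency
    except ImportError:
        return False
    video = MP4(path)
    for key, value in tags.items():
        video[key] = value
    video.save()
    return True
```

Importing mutagen inside `tag_file` keeps tagging optional: without the package the function simply reports that nothing was written.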

🧹 Cleanup

  • Removes .info.json and .meta files after download
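
A cleanup step like this can be sketched in a few lines with pathlib. The function name `remove_extra_files` is hypothetical, and the sketch only looks at files directly inside the given folder.

```python
from pathlib import Path
from typing import List

EXTRA_SUFFIXES = (".info.json", ".meta")


def remove_extra_files(folder: str) -> List[str]:
    """Delete leftover metadata files in `folder` and return their names."""
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith(EXTRA_SUFFIXES):
            path.unlink()
            removed.append(path.name)
    return removed
```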

🛠️ Configuration

  • All options are in data/config.json (format, resolution, metadata, etc.)
  • You can edit this file or use CLI flags to override
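
One simple way to combine file settings with CLI overrides is to start from the file and replace only the values that were actually passed on the command line. The sketch below assumes flat keys such as "resolution" and "format"; the real structure of data/config.json is not shown in this README, so treat both the keys and the `effective_config` name as placeholders.

```python
import json
from typing import Any, Dict, Optional


def effective_config(config_text: str, overrides: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Start from config.json and apply only the CLI values that were given."""
    config = json.loads(config_text)
    for key, value in overrides.items():
        if value is not None:  # None means the flag was not passed
            config[key] = value
    return config
```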

📋 Command Reference File

commands.txt contains a comprehensive list of all CLI commands with explanations. This file is designed for easy copy/paste usage and includes:

  • All basic download commands
  • Songlist operations
  • Latest-per-channel downloads
  • Cache and tracking management
  • Reset and cleanup operations
  • Advanced combinations
  • Common workflows
  • Troubleshooting commands

🔄 Maintenance Note: The commands.txt file should be kept up to date with any CLI changes. When adding new command-line options or modifying existing ones, update this file to reflect all available commands and their usage.

🔧 Refactoring Improvements (v3.2)

The codebase has been comprehensively refactored to improve maintainability and reduce code duplication:

Key Improvements

  • Centralized yt-dlp Command Generation: Standardized command building and execution across all download operations
  • Enhanced Error Handling: Structured exception hierarchy with consistent error messages and formatting
  • Abstracted Download Pipeline: Reusable download → verify → tag → track process for consistent processing
  • Reduced Code Duplication: Eliminated duplicate code across modules through centralized utilities
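
As an illustration of what centralized yt-dlp command generation can look like, the sketch below builds the argument list in one function. It uses only standard yt-dlp options (-f, --merge-output-format, -o); the function name, the format string, and the output template are assumptions, and the project's youtube_utils.py may pass different options.

```python
from typing import List


def build_ytdlp_command(url: str, output_dir: str, max_height: int = 720,
                        binary: str = "downloader/yt-dlp.exe") -> List[str]:
    """Build the yt-dlp argument list in one place so every caller stays consistent."""
    # Prefer separate video and audio streams capped at max_height, else a single file.
    fmt = f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]"
    return [
        binary,
        "-f", fmt,
        "--merge-output-format", "mp4",
        "-o", f"{output_dir}/%(title)s.%(ext)s",
        url,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting issues with titles and URLs.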

New Utility Modules

  • youtube_utils.py: Centralized YouTube operations and yt-dlp command generation
  • error_utils.py: Standardized error handling with structured exception hierarchy
  • download_pipeline.py: Abstracted download pipeline for consistent processing
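
A structured exception hierarchy of the kind mentioned for error_utils.py might look like the sketch below. The class names and the context-formatting behaviour are hypothetical and shown only to make the idea concrete.

```python
class KaraokeDownloaderError(Exception):
    """Base class so callers can catch every tool-specific failure at once."""

    def __init__(self, message: str, **context):
        super().__init__(message)
        self.context = context  # extra details such as video_id or exit code

    def __str__(self) -> str:
        base = super().__str__()
        details = ", ".join(f"{k}={v}" for k, v in sorted(self.context.items()))
        return f"{base} ({details})" if details else base


class DownloadError(KaraokeDownloaderError):
    """yt-dlp failed or produced no file."""


class VerificationError(KaraokeDownloaderError):
    """The downloaded file did not pass validation."""
```

Attaching context as keyword arguments keeps log messages uniform without each call site formatting its own string.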

Benefits

  • Improved Maintainability: Changes to yt-dlp configuration only require updates in one place
  • Better Error Handling: Consistent error messages and better debugging context
  • Enhanced Testability: Modular components can be tested independently
  • Reduced Complexity: Single source of truth for common operations

🐞 Troubleshooting

  • Ensure yt-dlp.exe is in the downloader/ folder
  • Check logs/ for error details
  • Use python -m karaoke_downloader.check_resolution to verify video quality
  • If you see errors about ffmpeg/ffprobe, install ffmpeg and ensure it is in your PATH
  • For best fuzzy matching, install rapidfuzz with pip install rapidfuzz; without it, the tool falls back to the slower, less accurate difflib
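
The first and fourth checks above can be automated with a short script. This is a sketch only: `check_tools` is a hypothetical helper, and it assumes the yt-dlp binary lives at downloader/yt-dlp.exe relative to the current directory.

```python
import shutil
from pathlib import Path
from typing import Dict


def check_tools(ytdlp_path: str = "downloader/yt-dlp.exe") -> Dict[str, bool]:
    """Report which external tools can be found."""
    return {
        "yt-dlp": Path(ytdlp_path).is_file(),
        # shutil.which searches PATH the same way the shell would.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
    }
```

Printing the result of `check_tools()` from the project root shows at a glance which dependency is missing.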

Happy Karaoke! 🎤