Karaoke Song Library Cleanup Tool

A tool for analyzing, deduplicating, and cleaning up large karaoke song collections. It identifies duplicate songs across different file formats and generates a "skip list" so that the duplicates can be excluded from future imports.

Features

Core Functionality

  • Song Deduplication: Identifies duplicate songs based on artist + title matching
  • Multi-Format Support: Handles MP3, CDG, and MP4 files
  • CDG/MP3 Pairing: Treats CDG and MP3 files with the same base filename as single karaoke units
  • Channel Priority: For MP4 files, ranks duplicates by the channel folder names found in the file path
  • Fuzzy Matching: Configurable fuzzy matching for artist/title comparison
  • Playlist Validation: Validates playlists against your song library with exact and fuzzy matching
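
As a rough illustration of artist + title matching, the sketch below groups songs by a normalized key. The `normalize` and `group_duplicates` helpers are hypothetical and are not taken from `cli/matching.py`; the normalization rules (lowercasing, dropping punctuation, collapsing whitespace) are assumptions, and the real tool may behave differently.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace (assumed rules)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def group_duplicates(songs):
    """Return only the artist/title groups that contain more than one song."""
    groups = defaultdict(list)
    for song in songs:
        key = (normalize(song["artist"]), normalize(song["title"]))
        groups[key].append(song)
    return {key: items for key, items in groups.items() if len(items) > 1}


songs = [
    {"artist": "ABBA", "title": "Dancing Queen", "path": "a/ABBA - Dancing Queen.mp4"},
    {"artist": "Abba", "title": "Dancing  Queen!", "path": "b/ABBA - Dancing Queen.mp3"},
    {"artist": "Queen", "title": "Bohemian Rhapsody", "path": "c/Queen - Bohemian Rhapsody.mp4"},
]
duplicates = group_duplicates(songs)
for (artist, title), items in duplicates.items():
    print(artist, "/", title, "->", [item["path"] for item in items])
```

Grouping on an exact normalized key only catches spelling variants that normalize to the same string; near misses such as typos are what the fuzzy matching option is for.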

File Type Priority System

  1. MP4 files (with channel priority sorting)
  2. CDG/MP3 pairs (treated as single units)
  3. Standalone MP3 files
  4. Standalone CDG files
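
The ordering above could be expressed as a numeric rank, as in the following sketch. The `rank_files` helper and the rank constants are hypothetical. The sketch also assumes that a CDG/MP3 pair must share both folder and base filename, which the tool may or may not require.

```python
from pathlib import PurePosixPath

# Lower rank wins; the order mirrors the numbered list above.
RANK_MP4, RANK_PAIR, RANK_MP3, RANK_CDG = 1, 2, 3, 4


def rank_files(paths):
    """Map each path to its rank, pairing CDG and MP3 files by base filename."""
    # Collect the extensions that exist for every folder + base filename.
    by_stem = {}
    for p in paths:
        path = PurePosixPath(p)
        by_stem.setdefault(str(path.with_suffix("")), set()).add(path.suffix.lower())

    ranks = {}
    for p in paths:
        path = PurePosixPath(p)
        ext = path.suffix.lower()
        siblings = by_stem[str(path.with_suffix(""))]
        if ext == ".mp4":
            ranks[p] = RANK_MP4
        elif ext in {".cdg", ".mp3"} and {".cdg", ".mp3"} <= siblings:
            ranks[p] = RANK_PAIR
        elif ext == ".mp3":
            ranks[p] = RANK_MP3
        elif ext == ".cdg":
            ranks[p] = RANK_CDG
    return ranks


paths = [
    "disc1/Song.cdg",
    "disc1/Song.mp3",
    "loose/Song.mp3",
    "video/Song.mp4",
    "old/Song.cdg",
]
ranks = rank_files(paths)
best = min(paths, key=ranks.__getitem__)
```

Within a duplicate group, the file with the lowest rank is the one to keep; ties between MP4 files would then fall through to channel priority.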

Web UI Features

  • Interactive Table View: Sortable, filterable grid of duplicate songs
  • Bulk Selection: Select multiple items for batch operations
  • Search & Filter: Real-time search across artists, titles, and paths
  • Responsive Design: Mobile-friendly interface
  • Easy Startup: Automated dependency checking and browser launch

🆕 Drag-and-Drop Priority Management

  • Visual Priority Reordering: Drag and drop files within each duplicate group to change their priority
  • Persistent Preferences: Save your priority preferences for future CLI runs
  • Priority Indicators: Visual numbered indicators show the current priority order
  • Reset Functionality: Easily reset to default priorities if needed

🔄 Reset & Regenerate Feature

  • One-Click Reset: Delete all generated files and regenerate everything with a single button click
  • Complete Cleanup: Removes skipSongs.json, the reports directory, and the preferences directory
  • Automatic CLI Execution: Runs the CLI tool automatically to regenerate all data
  • Progress Feedback: Shows loading state and provides detailed feedback on completion
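
The cleanup half of this feature might look roughly like the sketch below. `reset_generated_files` is a hypothetical helper, not the function used by `web/app.py`, and the example runs against a temporary directory rather than a real `data/` folder. Re-running the CLI afterwards is left out.

```python
import shutil
import tempfile
from pathlib import Path


def reset_generated_files(data_dir):
    """Delete the generated skip list, reports, and preferences if present."""
    removed = []
    skip_list = data_dir / "skipSongs.json"
    if skip_list.exists():
        skip_list.unlink()
        removed.append(skip_list.name)
    for folder in ("reports", "preferences"):
        target = data_dir / folder
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(folder)
    return removed


# Demonstrate on a throwaway directory rather than a real library.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "songs.json").write_text("[]")
    (data_dir / "skipSongs.json").write_text("[]")
    (data_dir / "reports").mkdir()
    removed = reset_generated_files(data_dir)
    remaining = sorted(p.name for p in data_dir.iterdir())
```

Note that the input file `songs.json` is never touched; only generated output is removed.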

Installation

Prerequisites

  • Python 3.7 or higher
  • pip (Python package installer)

Installation Steps

  1. Clone the repository:

    git clone <repository-url>
    cd KaraokeMerge
    
  2. Install dependencies:

    pip install -r requirements.txt
    

    Note: The installation includes:

    • Flask for the web UI
    • fuzzywuzzy and python-Levenshtein for fuzzy matching in playlist validation
    • All other required dependencies
  3. Verify installation:

    python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
    

Migration from Previous Versions

If you're upgrading from a previous version that used allSongs.json, run the migration script:

python3 migrate_to_songs_json.py

This script will:

  • Rename allSongs.json to songs.json
  • Add data_directory configuration to config.json
  • Create backups of your original files
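
For readers who want to see what these steps amount to, here is a minimal sketch. It is not the contents of `migrate_to_songs_json.py`: the `migrate` function, the `.bak` backup naming, the location of `allSongs.json`, and the value written to `data_directory` are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path


def migrate(data_dir, config_path):
    """Back up and rename allSongs.json, then record data_directory in the config."""
    old, new = data_dir / "allSongs.json", data_dir / "songs.json"
    if old.exists() and not new.exists():
        shutil.copy2(old, old.with_name(old.name + ".bak"))  # backup name is assumed
        old.rename(new)

    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    if "data_directory" not in config:
        if config_path.exists():
            shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config["data_directory"] = str(data_dir)  # value format is assumed
        config_path.write_text(json.dumps(config, indent=2))
    return config


# Run against a temporary directory so no real files are modified.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "allSongs.json").write_text("[]")
    config_path = Path(tmp) / "config.json"
    config_path.write_text("{}")
    config = migrate(data_dir, config_path)
    files = sorted(p.name for p in data_dir.iterdir())
```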

Usage

CLI Tool

Run the main CLI tool:

python cli/main.py

Options:

  • --verbose: Enable verbose output
  • --save-reports: Generate detailed analysis reports
  • --dry-run: Show what would be done without making changes
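
The three flags could be declared with `argparse` along these lines. This is a sketch, assuming each option is a simple on/off switch; `build_parser` is a hypothetical name and `cli/main.py` may define its arguments differently.

```python
import argparse


def build_parser():
    """Declare the three documented flags as boolean switches (assumed)."""
    parser = argparse.ArgumentParser(description="Karaoke song library cleanup")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("--save-reports", action="store_true",
                        help="Generate detailed analysis reports")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be done without making changes")
    return parser


# Equivalent to running: python cli/main.py --dry-run --verbose
args = build_parser().parse_args(["--dry-run", "--verbose"])
print(args.dry_run, args.verbose, args.save_reports)
```

Combining `--dry-run` with `--verbose` is a reasonable first run on a new library, since it reports what would happen without writing anything.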

Web UI

Start the web interface:

python start_web_ui.py

The web UI will automatically:

  1. Check for required dependencies
  2. Start the Flask server
  3. Open your default browser to the interface
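
The dependency check in step 1 can be approximated with the standard library alone. The sketch below is not the code in `start_web_ui.py`; `missing_dependencies` is a hypothetical helper that only reports which modules cannot be found.

```python
import importlib.util


def missing_dependencies(module_names):
    """Return the modules that cannot be found on the current interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_dependencies(["flask", "fuzzywuzzy"])
if missing:
    print("Missing dependencies:", ", ".join(missing))
    print("Install them with: pip install -r requirements.txt")
else:
    print("All dependencies found; the server can be started.")
```

Checking with `find_spec` avoids importing the packages just to test for their presence. Opening the browser in step 3 could then be done with the standard `webbrowser.open` function once the server address is known.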

Playlist Validation

Validate your playlists against your song library:

cd cli
python playlist_validator.py

Options:

  • --playlist-index N: Validate a specific playlist by index
  • --output results.json: Save results to a JSON file
  • --apply: Apply corrections to playlists (use with caution)

Note: Playlist validation uses fuzzy matching to find potential matches. Make sure fuzzywuzzy is installed for best results.
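
To illustrate the exact-then-fuzzy approach, the sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in for fuzzywuzzy, so its scores will not match the tool's. The `find_match` helper, the "Artist - Title" entry format, and the 0.85 threshold are assumptions, not the validator's actual behavior.

```python
from difflib import SequenceMatcher


def find_match(entry, library, threshold=0.85):
    """Return (status, match) where status is 'exact', 'fuzzy', or 'missing'."""
    wanted = entry.lower().strip()
    candidates = {song.lower().strip(): song for song in library}
    if wanted in candidates:
        return "exact", candidates[wanted]

    # No exact hit: keep the library entry with the highest similarity ratio.
    best, best_score = None, 0.0
    for key, song in candidates.items():
        score = SequenceMatcher(None, wanted, key).ratio()
        if score > best_score:
            best, best_score = song, score
    if best_score >= threshold:
        return "fuzzy", best
    return "missing", None


library = ["ABBA - Dancing Queen", "Queen - Bohemian Rhapsody"]
playlist = ["abba - dancing queen", "Queen - Bohemian Rapsody", "Toto - Africa"]
results = [find_match(entry, library) for entry in playlist]
```

Fuzzy hits are suggestions rather than confirmed matches, which is why reviewing the results before using `--apply` is worthwhile.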

Priority Preferences

The web UI now supports drag-and-drop priority management:

  1. Reorder Files: Click the "Details" button for any duplicate group, then drag files to reorder them
  2. Save Preferences: Click "Save Priority Preferences" to store your choices
  3. Apply to CLI: Future CLI runs will automatically use your saved preferences
  4. Reset: Use "Reset Priorities" to restore default behavior

Your preferences are saved in data/preferences/priority_preferences.json and will be automatically loaded by the CLI tool.

Configuration

Edit config/config.json to customize:

  • Channel priorities for MP4 files
  • Matching settings (fuzzy matching, thresholds)
  • Output options
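
As an example of how channel priorities might be applied to MP4 files, the sketch below sorts paths by the first matching folder name. The `channel_priorities` list, the folder names in it, and the `channel_rank` helper are all hypothetical; check config/config.json for the actual key names and format.

```python
from pathlib import PurePosixPath

# Hypothetical setting: folder names listed from most to least preferred.
channel_priorities = ["PreferredChannel", "BackupChannel"]


def channel_rank(path, priorities):
    """Return the index of the first priority folder found in the path."""
    folders = {part.lower() for part in PurePosixPath(path).parts[:-1]}
    for index, channel in enumerate(priorities):
        if channel.lower() in folders:
            return index
    return len(priorities)  # paths with no known channel sort last


mp4_files = [
    "videos/Misc/Song.mp4",
    "videos/BackupChannel/Song.mp4",
    "videos/PreferredChannel/Song.mp4",
]
ordered = sorted(mp4_files, key=lambda p: channel_rank(p, channel_priorities))
```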

File Structure

KaraokeMerge/
├── data/
│   ├── songs.json             # Input: Your song library data
│   ├── skipSongs.json         # Output: Generated skip list
│   ├── preferences/           # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/               # Detailed analysis reports
├── config/
│   └── config.json            # Configuration settings
├── cli/
│   ├── main.py                # Main CLI application
│   ├── matching.py            # Song matching logic
│   ├── playlist_validator.py  # Playlist validation
│   ├── preferences.py         # Priority preferences manager
│   ├── report.py              # Report generation
│   └── utils.py               # Utility functions
├── web/                       # Web UI for manual review
│   ├── app.py                 # Flask web application
│   └── templates/
│       └── index.html         # Web interface template
├── migrate_to_songs_json.py   # Migration from allSongs.json
├── start_web_ui.py            # Web UI startup script
├── test_tool.py               # Validation and testing script
├── requirements.txt           # Python dependencies
├── PRD.md                     # Product Requirements Document
└── README.md                  # Project documentation

Data Requirements

Place your song library data in data/songs.json with the following format:

[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
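
Before running the tool on a large library, it can help to confirm that every entry has the three fields shown above. The following is a small sketch, not part of the project; `validate_songs` is a hypothetical helper, and the sample data is inlined instead of being read from data/songs.json.

```python
import json

REQUIRED_FIELDS = ("artist", "title", "path")


def validate_songs(songs):
    """Return (index, missing_fields) for every entry that is incomplete."""
    problems = []
    for index, song in enumerate(songs):
        missing = [field for field in REQUIRED_FIELDS if not song.get(field)]
        if missing:
            problems.append((index, missing))
    return problems


sample = json.loads("""
[
  {"artist": "Artist Name", "title": "Song Title", "path": "path/to/file.mp3"},
  {"artist": "Artist Name", "title": "Song Title"}
]
""")
problems = validate_songs(sample)
for index, missing in problems:
    print(f"Entry {index} is missing: {', '.join(missing)}")
```

To check a real library, replace the inline sample with `json.load` on an open handle to data/songs.json.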

Performance

Successfully tested with:

  • 37,015 songs
  • 12,424 duplicates (33.6% duplicate rate)
  • 10,998 unique files after deduplication

Contributing

This project follows strict architectural principles:

  • Separation of Concerns: Modular design with focused responsibilities
  • Constants and Enums: Centralized configuration
  • Readability: Self-documenting code with clear naming
  • Extensibility: Designed for future growth
  • Refactorability: Minimal coupling between components