Karaoke Song Library Cleanup Tool

A tool for analyzing, deduplicating, and cleaning up large karaoke song collections. It identifies duplicate songs across different file formats and writes a "skip list" (data/skipSongs.json) of duplicates to exclude from future imports.

Features

Core Functionality

  • Song Deduplication: Identifies duplicate songs based on artist + title matching
  • Multi-Format Support: Handles MP3, CDG, and MP4 files
  • CDG/MP3 Pairing: Treats CDG and MP3 files with the same base filename as single karaoke units
  • Channel Priority: For MP4 files, prioritizes based on folder names in the path
  • Fuzzy Matching: Configurable fuzzy matching for artist/title comparison
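
As a rough illustration of artist + title matching with a configurable threshold, the sketch below uses Python's standard `difflib.SequenceMatcher`. The function names, the normalization rules, and the 0.9 default threshold are assumptions made for this example, not the tool's actual implementation in cli/matching.py.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace anything that is not a-z, 0-9, or a space, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_same_song(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Treat two entries as duplicates when both artist and title are similar enough.

    The threshold is a hypothetical default; lower values match more loosely.
    """
    for field in ("artist", "title"):
        ratio = SequenceMatcher(None, normalize(a[field]), normalize(b[field])).ratio()
        if ratio < threshold:
            return False
    return True


first = {"artist": "The Beatles", "title": "Hey Jude"}
second = {"artist": "the beatles", "title": "Hey, Jude!"}
third = {"artist": "The Beatles", "title": "Let It Be"}

print(is_same_song(first, second))  # same song, different punctuation and case
print(is_same_song(first, third))   # same artist, different title
```

Note that this simple normalization discards accented characters, so a real matcher would likely need Unicode-aware handling.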

File Type Priority System

  1. MP4 files (with channel priority sorting)
  2. CDG/MP3 pairs (treated as single units)
  3. Standalone MP3 files
  4. Standalone CDG files
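
The ordering above can be sketched as a sort key. This is a minimal example under stated assumptions: `CHANNEL_PRIORITIES`, `rank`, and the exact folder-name comparison are hypothetical, and paths are compared case-insensitively with backslashes treated as separators.

```python
from pathlib import PurePosixPath

# Hypothetical channel folder names, highest priority first.
CHANNEL_PRIORITIES = ["ChannelA", "ChannelB"]


def _norm(path: str) -> PurePosixPath:
    """Lowercase the path and use forward slashes so comparisons are uniform."""
    return PurePosixPath(path.replace("\\", "/").lower())


def rank(path: str, group: list[str]) -> tuple[int, int]:
    """Return a sort key for one file in a duplicate group; lower sorts first."""
    p = _norm(path)
    if p.suffix == ".mp4":
        folders = p.parts[:-1]
        for i, channel in enumerate(CHANNEL_PRIORITIES):
            if channel.lower() in folders:
                return (0, i)
        return (0, len(CHANNEL_PRIORITIES))  # MP4 from an unlisted channel
    partner = {".mp3": ".cdg", ".cdg": ".mp3"}.get(p.suffix)
    if partner is None:
        return (4, 0)  # unknown file type sorts last
    siblings = {_norm(q) for q in group}
    if p.with_suffix(partner) in siblings:
        return (1, 0)  # half of a CDG/MP3 pair with the same base filename
    return (2, 0) if p.suffix == ".mp3" else (3, 0)


group = [
    "cdg/Artist - Song.cdg",
    "cdg/Artist - Song.mp3",
    "loose/Artist - Song.mp3",
    "video/ChannelB/Artist - Song.mp4",
    "video/ChannelA/Artist - Song.mp4",
]
ordered = sorted(group, key=lambda path: rank(path, group))
for path in ordered:
    print(path)
```

Because Python's sort is stable, files with the same key keep their input order, so both halves of a CDG/MP3 pair stay adjacent.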

Web UI Features

  • Interactive Table View: Sortable, filterable grid of duplicate songs
  • Bulk Selection: Select multiple items for batch operations
  • Search & Filter: Real-time search across artists, titles, and paths
  • Responsive Design: Mobile-friendly interface
  • Easy Startup: Automated dependency checking and browser launch

🆕 Drag-and-Drop Priority Management

  • Visual Priority Reordering: Drag and drop files within each duplicate group to change their priority
  • Persistent Preferences: Save your priority preferences for future CLI runs
  • Priority Indicators: Visual numbered indicators show the current priority order
  • Reset Functionality: Easily reset to default priorities if needed

Installation

  1. Clone the repository
  2. Install dependencies:
    pip install -r requirements.txt

Usage

CLI Tool

Run the main CLI tool:

python cli/main.py

Options:

  • --verbose: Enable verbose output
  • --save-reports: Generate detailed analysis reports
  • --dry-run: Show what would be done without making changes

Web UI

Start the web interface:

python start_web_ui.py

The web UI will automatically:

  1. Check for required dependencies
  2. Start the Flask server
  3. Open your default browser to the interface
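
The three startup steps could look roughly like the sketch below. It is not the contents of start_web_ui.py: the module list, the URL (Flask's default development address), and the `launch` helper are assumptions for illustration.

```python
import importlib.util
import threading
import webbrowser

REQUIRED_MODULES = ["flask"]   # assumption: the real list comes from requirements.txt
URL = "http://127.0.0.1:5000"  # assumption: Flask's default development address


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def launch(run_server, url: str = URL, delay: float = 1.5) -> None:
    """Check dependencies, schedule the browser to open, then run the server.

    run_server is any blocking callable, for example a Flask app's run method.
    """
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        raise SystemExit(
            "Missing dependencies: " + ", ".join(missing)
            + ". Run: pip install -r requirements.txt"
        )
    # Open the browser shortly after startup so the server has time to bind.
    threading.Timer(delay, webbrowser.open, args=(url,)).start()
    run_server()
```

Opening the browser from a timer avoids blocking on the server call, which does not return until the server stops.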

Priority Preferences

The web UI now supports drag-and-drop priority management:

  1. Reorder Files: Click the "Details" button for any duplicate group, then drag files to reorder them
  2. Save Preferences: Click "Save Priority Preferences" to store your choices
  3. Apply to CLI: Future CLI runs will automatically use your saved preferences
  4. Reset: Use "Reset Priorities" to restore default behavior

Your preferences are saved in data/preferences/priority_preferences.json and will be automatically loaded by the CLI tool.
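
The README does not document the layout of priority_preferences.json, so the sketch below assumes a hypothetical one: a JSON object that maps a duplicate-group key to a list of file paths in preferred order. It shows how saved preferences might be loaded and applied to a group.

```python
import json
from pathlib import Path

PREFS_PATH = Path("data/preferences/priority_preferences.json")


def load_preferences(path: Path = PREFS_PATH) -> dict:
    """Return saved preferences, or an empty dict when none have been saved yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def apply_preference(group_key: str, paths: list[str], prefs: dict) -> list[str]:
    """Order paths by the saved order; paths without a saved position go last.

    The group_key format and the list-of-paths value are assumptions.
    """
    saved = prefs.get(group_key, [])
    position = {p: i for i, p in enumerate(saved)}
    return sorted(paths, key=lambda p: position.get(p, len(saved)))


prefs = {"artist - song": ["b/song.mp4", "a/song.mp4"]}  # hypothetical saved data
print(apply_preference("artist - song", ["a/song.mp4", "c/song.mp3", "b/song.mp4"], prefs))
```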

Configuration

Edit config/config.json to customize:

  • Channel priorities for MP4 files
  • Matching settings (fuzzy matching, thresholds)
  • Output options
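
One common way to consume such a file is to overlay it on built-in defaults so that a partial config.json still works. The key names below are hypothetical placeholders for the three groups of settings listed above; check config/config.json for the real ones.

```python
import json
from pathlib import Path

# Hypothetical keys; the actual names in config/config.json may differ.
DEFAULTS = {
    "channel_priorities": [],
    "matching": {"fuzzy_matching": False, "fuzzy_threshold": 0.9},
    "output": {"save_reports": False},
}


def merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with overrides applied recursively on top of defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("config/config.json")) -> dict:
    """Read the config file and fill in any missing settings from DEFAULTS."""
    with path.open(encoding="utf-8") as f:
        return merge(DEFAULTS, json.load(f))
```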

File Structure

KaraokeMerge/
├── data/
│   ├── allSongs.json          # Input: Your song library data
│   ├── skipSongs.json         # Output: Generated skip list
│   ├── preferences/           # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/               # Detailed analysis reports
├── config/
│   └── config.json            # Configuration settings
├── cli/
│   ├── main.py                # Main CLI application
│   ├── matching.py            # Song matching logic
│   ├── preferences.py         # Priority preferences manager
│   ├── report.py              # Report generation
│   └── utils.py               # Utility functions
├── web/                       # Web UI for manual review
│   ├── app.py                 # Flask web application
│   └── templates/
│       └── index.html         # Web interface template
├── start_web_ui.py            # Web UI startup script
├── test_tool.py               # Validation and testing script
├── requirements.txt           # Python dependencies
├── PRD.md                     # Product Requirements Document
└── README.md                  # Project documentation

Data Requirements

Place your song library data in data/allSongs.json with the following format:

[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
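
Before running the tool on a large library, it can help to check that every entry has the three fields shown above. The sketch below is one way to do that; the helper names are made up for this example, and the requirement that each field be a non-empty string is an assumption.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("artist", "title", "path")


def find_invalid_entries(songs: list) -> list[tuple[int, str]]:
    """Return (index, problem) pairs for entries that do not match the format."""
    problems = []
    for i, song in enumerate(songs):
        if not isinstance(song, dict):
            problems.append((i, "not an object"))
            continue
        for field in REQUIRED_FIELDS:
            value = song.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{field}'"))
    return problems


def load_songs(path: Path = Path("data/allSongs.json")) -> list:
    """Load the library file and confirm the top level is a JSON array."""
    with path.open(encoding="utf-8") as f:
        songs = json.load(f)
    if not isinstance(songs, list):
        raise ValueError("expected a JSON array of song objects")
    return songs
```

For example, `find_invalid_entries(load_songs())` returns an empty list when every entry is well formed.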

Performance

Successfully tested with:

  • 37,015 songs
  • 12,424 duplicates (33.6% duplicate rate)
  • 10,998 unique files after deduplication

Contributing

This project follows strict architectural principles:

  • Separation of Concerns: Modular design with focused responsibilities
  • Constants and Enums: Centralized configuration
  • Readability: Self-documenting code with clear naming
  • Extensibility: Designed for future growth
  • Refactorability: Minimal coupling between components