# Karaoke Song Library Cleanup Tool

A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke song collections. The tool identifies duplicate songs across different formats and generates a "skip list" for future imports.

## Features

### Core Functionality

- **Song Deduplication**: Identifies duplicate songs based on artist + title matching
- **Multi-Format Support**: Handles MP3, CDG, and MP4 files
- **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units
- **Channel Priority**: For MP4 files, prioritizes based on folder names in the path
- **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison
- **Playlist Validation**: Validates playlists against your song library with exact and fuzzy matching

### File Type Priority System

1. **MP4 files** (with channel priority sorting)
2. **CDG/MP3 pairs** (treated as single units)
3. **Standalone MP3** files
4. **Standalone CDG** files

### Web UI Features

- **Interactive Table View**: Sortable, filterable grid of duplicate songs
- **Bulk Selection**: Select multiple items for batch operations
- **Search & Filter**: Real-time search across artists, titles, and paths
- **Responsive Design**: Mobile-friendly interface
- **Easy Startup**: Automated dependency checking and browser launch

### 🆕 Drag-and-Drop Priority Management

- **Visual Priority Reordering**: Drag and drop files within each duplicate group to change their priority
- **Persistent Preferences**: Save your priority preferences for future CLI runs
- **Priority Indicators**: Visual numbered indicators show the current priority order
- **Reset Functionality**: Easily reset to default priorities if needed

## Installation

### Prerequisites

- Python 3.7 or higher
- pip (Python package installer)

### Installation Steps

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd KaraokeMerge
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   **Note**: The installation includes:

   - **Flask** for the web UI
   - **fuzzywuzzy** and **python-Levenshtein** for fuzzy matching in playlist validation
   - All other required dependencies

3. Verify installation:

   ```bash
   python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
   ```

## Usage

### CLI Tool

Run the main CLI tool:

```bash
python cli/main.py
```

Options:

- `--verbose`: Enable verbose output
- `--save-reports`: Generate detailed analysis reports
- `--dry-run`: Show what would be done without making changes

### Web UI

Start the web interface:

```bash
python start_web_ui.py
```

The web UI will automatically:

1. Check for required dependencies
2. Start the Flask server
3. Open your default browser to the interface

### Playlist Validation

Validate your playlists against your song library:

```bash
cd cli
python playlist_validator.py
```

Options:

- `--playlist-index N`: Validate a specific playlist by index
- `--output results.json`: Save results to a JSON file
- `--apply`: Apply corrections to playlists (use with caution)

**Note**: Playlist validation uses fuzzy matching to find potential matches. Make sure fuzzywuzzy is installed for best results.

### Priority Preferences

The web UI now supports drag-and-drop priority management:

1. **Reorder Files**: Click the "Details" button for any duplicate group, then drag files to reorder them
2. **Save Preferences**: Click "Save Priority Preferences" to store your choices
3. **Apply to CLI**: Future CLI runs will automatically use your saved preferences
4. **Reset**: Use "Reset Priorities" to restore default behavior

Your preferences are saved in `data/preferences/priority_preferences.json` and will be automatically loaded by the CLI tool.
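
To illustrate how saved preferences could override the default order for a duplicate group, here is a minimal sketch. It is not the tool's actual implementation: the JSON layout (a mapping from a group key to an ordered list of paths), the `artist|title` key format, and the function names `load_preferences` and `order_group` are all assumptions made for this example. The real logic lives in `cli/preferences.py` and may differ.

```python
import json
from pathlib import Path


def load_preferences(pref_path):
    """Return saved preferences, or an empty dict when no file exists yet.

    Assumed (hypothetical) layout: {"artist|title": ["path1", "path2", ...]}
    """
    path = Path(pref_path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def order_group(group_key, default_order, preferences):
    """Order one duplicate group: saved paths first, remaining defaults after.

    Saved paths that are no longer part of the group are ignored, so stale
    preferences never introduce files that are not in the library.
    """
    saved = [p for p in preferences.get(group_key, []) if p in default_order]
    rest = [p for p in default_order if p not in saved]
    return saved + rest


# Example: the user dragged the standalone MP3 to the top of its group.
prefs = {"abba|waterloo": ["c/Waterloo.mp3"]}
default = ["a/Waterloo.mp4", "b/Waterloo.mp3", "c/Waterloo.mp3"]
ordered = order_group("abba|waterloo", default, prefs)
```

With no preferences file present, `load_preferences` returns an empty dict and `order_group` falls back to the default order, which matches the "Reset" behavior described above.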
## Configuration

Edit `config/config.json` to customize:

- Channel priorities for MP4 files
- Matching settings (fuzzy matching, thresholds)
- Output options

## File Structure

```
KaraokeMerge/
├── data/
│   ├── allSongs.json           # Input: Your song library data
│   ├── skipSongs.json          # Output: Generated skip list
│   ├── preferences/            # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/                # Detailed analysis reports
├── config/
│   └── config.json             # Configuration settings
├── cli/
│   ├── main.py                 # Main CLI application
│   ├── matching.py             # Song matching logic
│   ├── preferences.py          # Priority preferences manager
│   ├── report.py               # Report generation
│   └── utils.py                # Utility functions
├── web/                        # Web UI for manual review
│   ├── app.py                  # Flask web application
│   └── templates/
│       └── index.html          # Web interface template
├── start_web_ui.py             # Web UI startup script
├── test_tool.py                # Validation and testing script
├── requirements.txt            # Python dependencies
├── PRD.md                      # Product Requirements Document
└── README.md                   # Project documentation
```

## Data Requirements

Place your song library data in `data/allSongs.json` with the following format:

```json
[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
```

## Performance

Successfully tested with:

- 37,015 songs
- 12,424 duplicates (33.6% duplicate rate)
- 10,998 unique files after deduplication

## Contributing

This project follows strict architectural principles:

- **Separation of Concerns**: Modular design with focused responsibilities
- **Constants and Enums**: Centralized configuration
- **Readability**: Self-documenting code with clear naming
- **Extensibility**: Designed for future growth
- **Refactorability**: Minimal coupling between components
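
## Example: Skip List Sketch

The following is a rough sketch of how the `allSongs.json` format, artist + title matching, and the file type priority system fit together. It is an illustration under stated assumptions, not the code in `cli/matching.py`: the names `normalize`, `rank_songs`, and `build_skip_list` are hypothetical, matching here is exact after simple normalization (no fuzzy matching), CDG and MP3 files are assumed to appear as separate entries that pair when they share a folder and base filename, and MP4 channel priority is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Lower rank = higher priority, following the File Type Priority System.
RANK_MP4, RANK_PAIR, RANK_MP3, RANK_CDG = 0, 1, 2, 3


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def stem(path):
    """Path without its extension, used to pair CDG and MP3 files."""
    return str(PurePosixPath(path).with_suffix(""))


def rank_songs(songs):
    """Return {path: rank} for every entry in the library."""
    exts_by_stem = defaultdict(set)
    for song in songs:
        exts_by_stem[stem(song["path"])].add(PurePosixPath(song["path"]).suffix.lower())

    ranks = {}
    for song in songs:
        ext = PurePosixPath(song["path"]).suffix.lower()
        paired = {".cdg", ".mp3"} <= exts_by_stem[stem(song["path"])]
        if ext == ".mp4":
            ranks[song["path"]] = RANK_MP4
        elif paired and ext in {".cdg", ".mp3"}:
            ranks[song["path"]] = RANK_PAIR
        elif ext == ".mp3":
            ranks[song["path"]] = RANK_MP3
        else:
            # Standalone CDG; unknown extensions also land here in this sketch.
            ranks[song["path"]] = RANK_CDG
    return ranks


def build_skip_list(songs):
    """Keep the best unit per (artist, title) group; return all other paths."""
    ranks = rank_songs(songs)
    groups = defaultdict(list)
    for song in songs:
        groups[(normalize(song["artist"]), normalize(song["title"]))].append(song)

    def unit(song):
        # A CDG/MP3 pair is one unit, so both files share the same key.
        rank = ranks[song["path"]]
        key = stem(song["path"]) if rank == RANK_PAIR else song["path"]
        return (rank, key)

    skip = []
    for group in groups.values():
        keep = min(unit(s) for s in group)
        skip.extend(s["path"] for s in group if unit(s) != keep)
    return skip


library = [
    {"artist": "ABBA", "title": "Waterloo", "path": "a/Waterloo.mp4"},
    {"artist": "Abba", "title": "waterloo", "path": "b/Waterloo.mp3"},
    {"artist": "ABBA", "title": "Waterloo", "path": "b/Waterloo.cdg"},
    {"artist": "ABBA", "title": "Waterloo", "path": "c/Waterloo.mp3"},
    {"artist": "Queen", "title": "Bohemian Rhapsody", "path": "q/br.cdg"},
]
skip_list = build_skip_list(library)
```

In this example the MP4 wins its group, so the CDG/MP3 pair and the standalone MP3 go on the skip list, while the Queen entry has no duplicates and is kept. Ties within the same rank are broken alphabetically by path here, which is an arbitrary choice for the sketch.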