Karaoke Song Library Cleanup Tool
A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke song collections. The tool identifies duplicate songs across different formats and generates a "skip list" for future imports.
Features
Core Functionality
- Song Deduplication: Identifies duplicate songs based on artist + title matching
- Multi-Format Support: Handles MP3, CDG, and MP4 files
- CDG/MP3 Pairing: Treats CDG and MP3 files with the same base filename as single karaoke units
- Channel Priority: For MP4 files, ranks duplicates by channel, determined from folder names in the file path
- Fuzzy Matching: Configurable fuzzy matching for artist/title comparison
- Playlist Validation: Validates playlists against your song library with exact and fuzzy matching
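As an illustration of artist + title matching, the sketch below groups entries from a song list by a normalized key. The `normalize` and `find_duplicates` helpers are hypothetical and are not taken from cli/matching.py; the tool's actual normalization rules may differ.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def find_duplicates(songs):
    """Group songs by (artist, title) and return only groups with more than one entry."""
    groups = defaultdict(list)
    for song in songs:
        key = (normalize(song["artist"]), normalize(song["title"]))
        groups[key].append(song)
    return {key: items for key, items in groups.items() if len(items) > 1}


songs = [
    {"artist": "Artist Name", "title": "Song Title", "path": "a/song.mp4"},
    {"artist": "artist name", "title": "Song  Title", "path": "b/song.mp3"},
    {"artist": "Other Artist", "title": "Other Song", "path": "c/other.mp3"},
]
duplicates = find_duplicates(songs)
print(len(duplicates))
```

Keying on a normalized tuple keeps the grouping pass linear in the number of songs; fuzzy comparison, if enabled, would be a separate and more expensive step.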
File Type Priority System
- MP4 files (with channel priority sorting)
- CDG/MP3 pairs (treated as single units)
- Standalone MP3 files
- Standalone CDG files
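One way to read the list above is as a ranking applied within each duplicate group, with the top-ranked file kept. The sketch below assumes that reading. The `file_rank` function, the `CHANNEL_PRIORITY` list, and its folder names are hypothetical and do not come from the tool's source or config.

```python
from pathlib import PurePosixPath

# Hypothetical channel folders, highest priority first.
CHANNEL_PRIORITY = ["ChannelA", "ChannelB"]


def file_rank(path, all_paths):
    """Return a sortable tuple; lower tuples sort first (higher priority)."""
    p = PurePosixPath(path)
    ext = p.suffix.lower()
    if ext == ".mp4":
        # Rank MP4 files by the first matching channel folder in the path.
        for index, channel in enumerate(CHANNEL_PRIORITY):
            if channel in p.parts:
                return (0, index)
        return (0, len(CHANNEL_PRIORITY))
    # A CDG and an MP3 with the same base filename form one karaoke unit.
    partner_ext = {".mp3": ".cdg", ".cdg": ".mp3"}.get(ext)
    if partner_ext is not None:
        partner = str(p.with_suffix(partner_ext))
        if partner in all_paths:
            return (1, 0)
    if ext == ".mp3":
        return (2, 0)
    if ext == ".cdg":
        return (3, 0)
    return (4, 0)  # Unknown file types sort last.


paths = [
    "lib/solo.cdg",
    "lib/pair.mp3",
    "lib/pair.cdg",
    "lib/ChannelB/song.mp4",
    "lib/ChannelA/song.mp4",
    "lib/solo2.mp3",
]
path_set = set(paths)
ranked = sorted(paths, key=lambda path: file_rank(path, path_set))
print(ranked[0])
```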
Web UI Features
- Interactive Table View: Sortable, filterable grid of duplicate songs
- Bulk Selection: Select multiple items for batch operations
- Search & Filter: Real-time search across artists, titles, and paths
- Responsive Design: Mobile-friendly interface
- Easy Startup: Automated dependency checking and browser launch
🆕 Drag-and-Drop Priority Management
- Visual Priority Reordering: Drag and drop files within each duplicate group to change their priority
- Persistent Preferences: Save your priority preferences for future CLI runs
- Priority Indicators: Visual numbered indicators show the current priority order
- Reset Functionality: Easily reset to default priorities if needed
🔄 Reset & Regenerate Feature
- One-Click Reset: Delete all generated files and regenerate everything with a single button click
- Complete Cleanup: Removes skipSongs.json, reports directory, and preferences directory
- Automatic CLI Execution: Runs the CLI tool automatically to regenerate all data
- Progress Feedback: Shows loading state and provides detailed feedback on completion
Installation
Prerequisites
- Python 3.7 or higher
- pip (Python package installer)
Installation Steps
- Clone the repository:
  git clone <repository-url>
  cd KaraokeMerge
- Install dependencies:
  pip install -r requirements.txt
  Note: The installation includes:
  - Flask for the web UI
  - fuzzywuzzy and python-Levenshtein for fuzzy matching in playlist validation
  - All other required dependencies
- Verify installation:
  python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
Migration from Previous Versions
If you're upgrading from a previous version that used allSongs.json, run the migration script:
python3 migrate_to_songs_json.py
This script will:
- Rename allSongs.json to songs.json
- Add data_directory configuration to config.json
- Create backups of your original files
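The migration steps listed above can be sketched roughly as follows. This is not the contents of migrate_to_songs_json.py: the `migrate` function, the `.bak` backup suffix, and the value written for `data_directory` are assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path


def migrate(data_dir, config_path):
    """Rename allSongs.json to songs.json and record data_directory in the config."""
    data_dir, config_path = Path(data_dir), Path(config_path)
    old_file = data_dir / "allSongs.json"
    new_file = data_dir / "songs.json"

    if old_file.exists():
        # Back up the original before renaming (the .bak suffix is an assumption).
        shutil.copy2(old_file, old_file.with_name(old_file.name + ".bak"))
        old_file.rename(new_file)

    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
    # Keep an existing value; otherwise record the directory that was migrated.
    config.setdefault("data_directory", str(data_dir))
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Copying before renaming means an interrupted run still leaves a recoverable original on disk.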
Usage
CLI Tool
Run the main CLI tool:
python cli/main.py
Options:
- --verbose: Enable verbose output
- --save-reports: Generate detailed analysis reports
- --dry-run: Show what would be done without making changes
Web UI
Start the web interface:
python start_web_ui.py
The web UI will automatically:
- Check for required dependencies
- Start the Flask server
- Open your default browser to the interface
Playlist Validation
Validate your playlists against your song library:
cd cli
python playlist_validator.py
Options:
- --playlist-index N: Validate a specific playlist by index
- --output results.json: Save results to a JSON file
- --apply: Apply corrections to playlists (use with caution)
Note: Playlist validation uses fuzzy matching to find potential matches. Make sure fuzzywuzzy is installed for best results.
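To show how exact and fuzzy matching might combine during validation, here is a minimal sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy so that it runs without extra dependencies; its scores will not be identical to fuzzywuzzy's. The `validate_entry` function and the threshold of 85 are hypothetical and are not taken from playlist_validator.py.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def validate_entry(entry, library, threshold=85):
    """Classify a playlist entry as an exact match, a fuzzy match, or missing."""
    wanted = f"{entry['artist']} - {entry['title']}"
    best_score, best_song = 0, None
    for song in library:
        candidate = f"{song['artist']} - {song['title']}"
        if candidate.lower() == wanted.lower():
            return "exact", song
        score = similarity(wanted, candidate)
        if score > best_score:
            best_score, best_song = score, song
    if best_score >= threshold:
        return "fuzzy", best_song
    return "missing", None


library = [{"artist": "Artist Name", "title": "Song Title", "path": "path/to/file.mp3"}]
print(validate_entry({"artist": "Artist Name", "title": "Song Title"}, library)[0])
print(validate_entry({"artist": "Artist Name", "title": "Song Titel"}, library)[0])
print(validate_entry({"artist": "Someone Else", "title": "Unrelated"}, library)[0])
```

Fuzzy results are only candidates, which is why reviewing them before running with --apply is a sensible precaution.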
Priority Preferences
The web UI now supports drag-and-drop priority management:
- Reorder Files: Click the "Details" button for any duplicate group, then drag files to reorder them
- Save Preferences: Click "Save Priority Preferences" to store your choices
- Apply to CLI: Future CLI runs will automatically use your saved preferences
- Reset: Use "Reset Priorities" to restore default behavior
Your preferences are saved in data/preferences/priority_preferences.json and will be automatically loaded by the CLI tool.
Configuration
Edit config/config.json to customize:
- Channel priorities for MP4 files
- Matching settings (fuzzy matching, thresholds)
- Output options
File Structure
KaraokeMerge/
├── data/
│   ├── songs.json              # Input: Your song library data
│   ├── skipSongs.json          # Output: Generated skip list
│   ├── preferences/            # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/                # Detailed analysis reports
├── config/
│   └── config.json             # Configuration settings
├── cli/
│   ├── main.py                 # Main CLI application
│   ├── matching.py             # Song matching logic
│   ├── preferences.py          # Priority preferences manager
│   ├── report.py               # Report generation
│   └── utils.py                # Utility functions
├── web/                        # Web UI for manual review
│   ├── app.py                  # Flask web application
│   └── templates/
│       └── index.html          # Web interface template
├── start_web_ui.py             # Web UI startup script
├── test_tool.py                # Validation and testing script
├── requirements.txt            # Python dependencies
├── PRD.md                      # Product Requirements Document
└── README.md                   # Project documentation
Data Requirements
Place your song library data in data/songs.json with the following format:
[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
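Before running the tool on a large library, it can help to confirm that the data matches this format. The following is a minimal sketch of such a check. The `load_songs` helper and its error messages are hypothetical and not part of the tool; only the three field names come from the format shown above.

```python
import json

REQUIRED_FIELDS = ("artist", "title", "path")


def load_songs(text):
    """Parse song library JSON and check that each entry has the required fields."""
    songs = json.loads(text)
    if not isinstance(songs, list):
        raise ValueError("songs.json must contain a JSON array")
    for index, song in enumerate(songs):
        if not isinstance(song, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in song]
        if missing:
            raise ValueError(f"entry {index} is missing fields: {', '.join(missing)}")
    return songs


sample = '[{"artist": "Artist Name", "title": "Song Title", "path": "path/to/file.mp3"}]'
print(len(load_songs(sample)))
```

To check a real file, pass its contents, for example `load_songs(open("data/songs.json", encoding="utf-8").read())`.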
Performance
Successfully tested with:
- 37,015 songs
- 12,424 duplicates (33.6% duplicate rate)
- 10,998 unique files after deduplication
Contributing
This project follows strict architectural principles:
- Separation of Concerns: Modular design with focused responsibilities
- Constants and Enums: Centralized configuration
- Readability: Self-documenting code with clear naming
- Extensibility: Designed for future growth
- Refactorability: Minimal coupling between components