Karaoke Song Library Cleanup Tool
A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke song collections. The tool identifies duplicate songs across different formats and generates a "skip list" for future imports.
Features
Core Functionality
- Song Deduplication: Identifies duplicate songs based on artist + title matching
- Multi-Format Support: Handles MP3, CDG, and MP4 files
- CDG/MP3 Pairing: Treats CDG and MP3 files with the same base filename as single karaoke units
- Channel Priority: For MP4 files, ranks duplicates by the channel folder names found in the file path
- Fuzzy Matching: Configurable fuzzy matching for artist/title comparison
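As an illustration of fuzzy artist/title comparison, here is a minimal sketch using Python's standard `difflib`. The helper names (`normalize`, `similarity`, `is_same_song`) and the 0.85 threshold are assumptions made for this example; the tool's actual matching logic lives in cli/matching.py and may work differently.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_song(song_a: dict, song_b: dict, threshold: float = 0.85) -> bool:
    """Treat two songs as duplicates when artist AND title are similar enough.

    The threshold is a hypothetical default, not a value taken from config.json.
    """
    return (
        similarity(song_a["artist"], song_b["artist"]) >= threshold
        and similarity(song_a["title"], song_b["title"]) >= threshold
    )
```

Requiring both fields to clear the threshold keeps two different songs by the same artist from being merged. Lowering the threshold catches more spelling variants at the cost of more false positives.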
File Type Priority System
- MP4 files (with channel priority sorting)
- CDG/MP3 pairs (treated as single units)
- Standalone MP3 files
- Standalone CDG files
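The ordering above can be sketched as a small ranking function. This is an illustrative example only: it assumes the list runs from highest to lowest priority, that a CDG/MP3 pair means two files sharing the same path minus the extension, and it leaves out channel priority sorting among MP4 files. The name `rank_files` and the rank constants are hypothetical.

```python
from pathlib import PurePosixPath

# Lower rank means higher priority (assumed to follow the order listed above).
RANK_MP4, RANK_PAIR, RANK_MP3, RANK_CDG = 0, 1, 2, 3


def rank_files(paths: list[str]) -> list[tuple[int, str]]:
    """Return (rank, path) tuples sorted from highest to lowest priority."""
    # For each base filename (path without extension), record which extensions exist.
    stems: dict[str, set[str]] = {}
    for p in paths:
        pp = PurePosixPath(p)
        stems.setdefault(str(pp.with_suffix("")), set()).add(pp.suffix.lower())

    ranked = []
    for p in paths:
        pp = PurePosixPath(p)
        ext = pp.suffix.lower()
        # A pair exists when both .mp3 and .cdg share this base filename.
        paired = {".mp3", ".cdg"} <= stems[str(pp.with_suffix(""))]
        if ext == ".mp4":
            rank = RANK_MP4
        elif ext in (".mp3", ".cdg") and paired:
            rank = RANK_PAIR
        elif ext == ".mp3":
            rank = RANK_MP3
        elif ext == ".cdg":
            rank = RANK_CDG
        else:
            continue  # skip unsupported file types
        ranked.append((rank, p))
    return sorted(ranked)
```

In this sketch both halves of a pair receive the same rank, so they stay adjacent in the sorted result and can be handled as one unit.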
Web UI Features
- Interactive Table View: Sortable, filterable grid of duplicate songs
- Bulk Selection: Select multiple items for batch operations
- Search & Filter: Real-time search across artists, titles, and paths
- Responsive Design: Mobile-friendly interface
- Easy Startup: Automated dependency checking and browser launch
🆕 Drag-and-Drop Priority Management
- Visual Priority Reordering: Drag and drop files within each duplicate group to change their priority
- Persistent Preferences: Save your priority preferences for future CLI runs
- Priority Indicators: Visual numbered indicators show the current priority order
- Reset Functionality: Easily reset to default priorities if needed
Installation
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
Usage
CLI Tool
Run the main CLI tool:
python cli/main.py
Options:
- --verbose: Enable verbose output
- --save-reports: Generate detailed analysis reports
- --dry-run: Show what would be done without making changes
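For readers curious how boolean flags like these are typically wired up, here is a minimal `argparse` sketch. It mirrors the three documented options but is not taken from cli/main.py; the function name `build_parser` and the description string are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Karaoke song library cleanup")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("--save-reports", action="store_true",
                        help="Generate detailed analysis reports")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be done without making changes")
    return parser


# Equivalent to running: python cli/main.py --verbose --dry-run
args = build_parser().parse_args(["--verbose", "--dry-run"])
print(args.verbose, args.save_reports, args.dry_run)  # True False True
```

Note that `argparse` converts the hyphen in `--save-reports` and `--dry-run` to an underscore in the attribute name.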
Web UI
Start the web interface:
python start_web_ui.py
The web UI will automatically:
- Check for required dependencies
- Start the Flask server
- Open your default browser to the interface
Priority Preferences
The web UI now supports drag-and-drop priority management:
- Reorder Files: Click the "Details" button for any duplicate group, then drag files to reorder them
- Save Preferences: Click "Save Priority Preferences" to store your choices
- Apply to CLI: Future CLI runs will automatically use your saved preferences
- Reset: Use "Reset Priorities" to restore default behavior
Your preferences are saved in data/preferences/priority_preferences.json and will be automatically loaded by the CLI tool.
Configuration
Edit config/config.json to customize:
- Channel priorities for MP4 files
- Matching settings (fuzzy matching, thresholds)
- Output options
File Structure
KaraokeMerge/
├── data/
│ ├── allSongs.json # Input: Your song library data
│ ├── skipSongs.json # Output: Generated skip list
│ ├── preferences/ # User priority preferences
│ │ └── priority_preferences.json
│ └── reports/ # Detailed analysis reports
├── config/
│ └── config.json # Configuration settings
├── cli/
│ ├── main.py # Main CLI application
│ ├── matching.py # Song matching logic
│ ├── preferences.py # Priority preferences manager
│ ├── report.py # Report generation
│ └── utils.py # Utility functions
├── web/ # Web UI for manual review
│ ├── app.py # Flask web application
│ └── templates/
│ └── index.html # Web interface template
├── start_web_ui.py # Web UI startup script
├── test_tool.py # Validation and testing script
├── requirements.txt # Python dependencies
├── PRD.md # Product Requirements Document
└── README.md # Project documentation
Data Requirements
Place your song library data in data/allSongs.json with the following format:
[
{
"artist": "Artist Name",
"title": "Song Title",
"path": "path/to/file.mp3"
}
]
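To show how records in this format can be grouped into duplicate candidates, here is a minimal sketch using exact matching on a normalized artist + title key. The function name `find_duplicate_groups` and the sample records are made up for this example; the real tool also applies fuzzy matching and file type priorities.

```python
import json
from collections import defaultdict


def find_duplicate_groups(songs: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group paths by (artist, title) and keep only keys with more than one file."""
    groups = defaultdict(list)
    for song in songs:
        # Normalize case and surrounding whitespace before comparing.
        key = (song["artist"].strip().lower(), song["title"].strip().lower())
        groups[key].append(song["path"])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical sample data in the documented format. For a real library, use:
#   with open("data/allSongs.json", encoding="utf-8") as f: songs = json.load(f)
songs = json.loads("""
[
  {"artist": "Artist Name", "title": "Song Title", "path": "a/file.mp3"},
  {"artist": "artist name", "title": "Song Title ", "path": "b/file.mp4"},
  {"artist": "Other Artist", "title": "Other Song", "path": "c/file.cdg"}
]
""")

duplicates = find_duplicate_groups(songs)
print(duplicates)
```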
Performance
Successfully tested with:
- 37,015 songs
- 12,424 duplicates (33.6% duplicate rate)
- 10,998 unique files after deduplication
Contributing
This project follows strict architectural principles:
- Separation of Concerns: Modular design with focused responsibilities
- Constants and Enums: Centralized configuration
- Readability: Self-documenting code with clear naming
- Extensibility: Designed for future growth
- Refactorability: Minimal coupling between components