# Karaoke Song Library Cleanup Tool
A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke song collections. The tool identifies duplicate songs across different formats and generates a "skip list" for future imports.
## Features
### Core Functionality
- **Song Deduplication**: Identifies duplicate songs based on artist + title matching
- **Multi-Format Support**: Handles MP3, CDG, and MP4 files
- **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units
- **Channel Priority**: For MP4 files, ranks duplicates by the channel folder names found in each file's path (channel priorities are set in `config/config.json`)
- **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison
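
As a rough illustration of artist + title matching with an optional fuzzy threshold, here is a minimal sketch. It is not the code in `cli/matching.py`: the function names, the `0.9` threshold, and the use of `difflib.SequenceMatcher` from the standard library are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_song(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Treat two entries as duplicates when artist AND title both clear the threshold."""
    return (
        similarity(a["artist"], b["artist"]) >= threshold
        and similarity(a["title"], b["title"]) >= threshold
    )


def group_duplicates(songs: list, threshold: float = 0.9) -> list:
    """Greedy grouping: each song joins the first group whose first entry matches it."""
    groups = []
    for song in songs:
        for group in groups:
            if is_same_song(group[0], song, threshold):
                group.append(song)
                break
        else:
            groups.append([song])
    return groups


songs = [
    {"artist": "ABBA", "title": "Dancing Queen", "path": "a.mp4"},
    {"artist": "abba", "title": "Dancing  Queen", "path": "b.mp3"},
    {"artist": "Queen", "title": "Bohemian Rhapsody", "path": "c.mp4"},
]
groups = group_duplicates(songs)
```

This greedy loop compares every song against every existing group, so it is only meant to show the idea. A library with tens of thousands of entries would likely need exact-key bucketing before any fuzzy comparison.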
### File Type Priority System
1. **MP4 files** (with channel priority sorting)
2. **CDG/MP3 pairs** (treated as single units)
3. **Standalone MP3** files
4. **Standalone CDG** files
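
The ordering above, combined with CDG/MP3 pairing and channel priority, could be sketched as follows. This is an illustrative example, not the tool's implementation: the helper names, the unit labels, and the assumption that paths use forward slashes are all hypothetical.

```python
from pathlib import PurePosixPath

# Lower rank wins; mirrors the four-level ordering listed above.
TYPE_RANK = {"mp4": 0, "cdg_mp3_pair": 1, "mp3": 2, "cdg": 3}


def build_units(paths):
    """Group paths by base filename, pairing a CDG with its matching MP3."""
    by_stem = {}
    for path in paths:
        p = PurePosixPath(path)
        by_stem.setdefault(str(p.with_suffix("")), {})[p.suffix.lower()] = path

    units = []
    for exts in by_stem.values():
        if ".mp4" in exts:
            units.append(("mp4", [exts[".mp4"]]))
        if ".cdg" in exts and ".mp3" in exts:
            units.append(("cdg_mp3_pair", [exts[".cdg"], exts[".mp3"]]))
        elif ".mp3" in exts:
            units.append(("mp3", [exts[".mp3"]]))
        elif ".cdg" in exts:
            units.append(("cdg", [exts[".cdg"]]))
    return units


def channel_rank(path, channel_priorities):
    """Index of the first prioritized channel that appears as a folder in the path."""
    folders = [part.lower() for part in PurePosixPath(path).parts[:-1]]
    for rank, channel in enumerate(channel_priorities):
        if channel.lower() in folders:
            return rank
    return len(channel_priorities)  # unknown channels sort last


def rank_units(paths, channel_priorities=()):
    """Sort units by file type first, then by channel priority for MP4 files only."""
    def key(unit):
        kind, files = unit
        channel = channel_rank(files[0], channel_priorities) if kind == "mp4" else 0
        return (TYPE_RANK[kind], channel)

    return sorted(build_units(paths), key=key)


ranked = rank_units(
    [
        "lib/ChannelB/Song.mp4",
        "lib/ChannelA/Song.mp4",
        "discs/Song.cdg",
        "discs/Song.mp3",
        "misc/Other.mp3",
        "misc/Solo.cdg",
    ],
    channel_priorities=["ChannelA", "ChannelB"],
)
```

The first unit in `ranked` is the one that would be kept; the remaining units are candidates for the skip list.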
### Web UI Features
- **Interactive Table View**: Sortable, filterable grid of duplicate songs
- **Bulk Selection**: Select multiple items for batch operations
- **Search & Filter**: Real-time search across artists, titles, and paths
- **Responsive Design**: Mobile-friendly interface
- **Easy Startup**: Automated dependency checking and browser launch
### 🆕 Drag-and-Drop Priority Management
- **Visual Priority Reordering**: Drag and drop files within each duplicate group to change their priority
- **Persistent Preferences**: Save your priority preferences for future CLI runs
- **Priority Indicators**: Visual numbered indicators show the current priority order
- **Reset Functionality**: Easily reset to default priorities if needed
## Installation
1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
### CLI Tool
Run the main CLI tool:
```bash
python cli/main.py
```
Options:
- `--verbose`: Enable verbose output
- `--save-reports`: Generate detailed analysis reports
- `--dry-run`: Show what would be done without making changes
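
For readers who want to see how three boolean flags like these are typically wired up, here is a small `argparse` sketch. It is a hypothetical illustration and may not match how `cli/main.py` actually parses its arguments; only the flag names and help text come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three documented flags as on/off switches."""
    parser = argparse.ArgumentParser(
        description="Karaoke song library cleanup (illustrative sketch)"
    )
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("--save-reports", action="store_true",
                        help="Generate detailed analysis reports")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be done without making changes")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["--dry-run", "--verbose"])
print(args.dry_run, args.verbose, args.save_reports)  # True True False
```

Flags can be combined freely, so `--dry-run --verbose` would preview the run with detailed output.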
### Web UI
Start the web interface:
```bash
python start_web_ui.py
```
The web UI will automatically:
1. Check for required dependencies
2. Start the Flask server
3. Open your default browser to the interface
### Priority Preferences
The web UI now supports drag-and-drop priority management:
1. **Reorder Files**: Click the "Details" button for any duplicate group, then drag files to reorder them
2. **Save Preferences**: Click "Save Priority Preferences" to store your choices
3. **Apply to CLI**: Future CLI runs will automatically use your saved preferences
4. **Reset**: Use "Reset Priorities" to restore default behavior
Your preferences are saved in `data/preferences/priority_preferences.json` and will be automatically loaded by the CLI tool.
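
The sketch below shows one way saved preferences could be loaded and applied to a duplicate group. The JSON schema is not documented here, so the example assumes a hypothetical layout: a mapping from a song key to an ordered list of file paths. The function names are also assumptions, not the API of `cli/preferences.py`.

```python
import json
from pathlib import Path


def load_preferences(path="data/preferences/priority_preferences.json"):
    """Return saved preferences, or an empty dict when no file exists yet."""
    file = Path(path)
    if not file.exists():
        return {}
    with file.open(encoding="utf-8") as handle:
        return json.load(handle)


def apply_preferences(song_key, paths, preferences):
    """Reorder paths so that user-preferred files come first.

    Paths without a saved preference keep their original relative order,
    because sorted() is stable.
    """
    preferred = preferences.get(song_key, [])
    order = {path: index for index, path in enumerate(preferred)}
    return sorted(paths, key=lambda path: order.get(path, len(order)))


# Hypothetical preference data in the assumed schema.
preferences = {"abba|dancing queen": ["b.mp3", "a.mp4"]}
reordered = apply_preferences(
    "abba|dancing queen", ["a.mp4", "c.cdg", "b.mp3"], preferences
)
```

Groups with no saved entry are returned unchanged, which corresponds to falling back to the default priority order.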
## Configuration
Edit `config/config.json` to customize:
- Channel priorities for MP4 files
- Matching settings (fuzzy matching, thresholds)
- Output options
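
The exact keys in `config/config.json` are not listed here, so the following is only a sketch of how such a file might be loaded on top of defaults. Every key name in `DEFAULT_CONFIG` (`channel_priorities`, `matching`, `fuzzy_matching`, `fuzzy_threshold`, `output`) is a hypothetical placeholder; check the actual file for the real names.

```python
import copy
import json
from pathlib import Path

# Hypothetical defaults covering the three areas listed above.
DEFAULT_CONFIG = {
    "channel_priorities": [],
    "matching": {"fuzzy_matching": False, "fuzzy_threshold": 0.9},
    "output": {"verbose": False},
}


def load_config(path="config/config.json"):
    """Merge user settings over the defaults, one level deep."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    file = Path(path)
    if not file.exists():
        return config
    user_settings = json.loads(file.read_text(encoding="utf-8"))
    for key, value in user_settings.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)  # keep defaults for keys the user omitted
        else:
            config[key] = value
    return config
```

Merging over defaults means a partial config file stays valid: a user who only sets channel priorities still gets working matching and output settings.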
## File Structure
```
KaraokeMerge/
├── data/
│   ├── allSongs.json              # Input: Your song library data
│   ├── skipSongs.json             # Output: Generated skip list
│   ├── preferences/               # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/                   # Detailed analysis reports
├── config/
│   └── config.json                # Configuration settings
├── cli/
│   ├── main.py                    # Main CLI application
│   ├── matching.py                # Song matching logic
│   ├── preferences.py             # Priority preferences manager
│   ├── report.py                  # Report generation
│   └── utils.py                   # Utility functions
├── web/                           # Web UI for manual review
│   ├── app.py                     # Flask web application
│   └── templates/
│       └── index.html             # Web interface template
├── start_web_ui.py                # Web UI startup script
├── test_tool.py                   # Validation and testing script
├── requirements.txt               # Python dependencies
├── PRD.md                         # Product Requirements Document
└── README.md                      # Project documentation
```
## Data Requirements
Place your song library data in `data/allSongs.json` with the following format:
```json
[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
```
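
Before running the tool on a large library, it can help to confirm that every entry has the three fields shown above. The following is a minimal validation sketch; the function name and the wording of the messages are assumptions, and the tool itself may perform different checks.

```python
import json

REQUIRED_FIELDS = ("artist", "title", "path")


def validate_songs(songs):
    """Return a list of (index, problem) tuples; an empty list means the data looks fine."""
    if not isinstance(songs, list):
        return [(-1, "top-level value must be a JSON array")]

    problems = []
    for index, song in enumerate(songs):
        if not isinstance(song, dict):
            problems.append((index, "entry is not an object"))
            continue
        for field in REQUIRED_FIELDS:
            value = song.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((index, f"missing or empty '{field}'"))
    return problems


sample = json.loads(
    '[{"artist": "Artist Name", "title": "Song Title", "path": "path/to/file.mp3"},'
    ' {"artist": "Artist Name", "title": ""}]'
)
problems = validate_songs(sample)
```

To check a real library, load `data/allSongs.json` with `json.load` and pass the result to `validate_songs`.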
## Performance
Successfully tested with:
- 37,015 songs
- 12,424 duplicates (33.6% duplicate rate)
- 10,998 unique files after deduplication
## Contributing
This project follows strict architectural principles:
- **Separation of Concerns**: Modular design with focused responsibilities
- **Constants and Enums**: Centralized configuration
- **Readability**: Self-documenting code with clear naming
- **Extensibility**: Designed for future growth
- **Refactorability**: Minimal coupling between components