# Karaoke Song Library Cleanup Tool
A tool for analyzing, deduplicating, and cleaning up large karaoke song collections. It identifies duplicate songs across file formats and generates a "skip list" (`data/skipSongs.json`) for future imports.
## Features
### Core Functionality
- **Song Deduplication**: Identifies duplicate songs based on artist + title matching
- **Multi-Format Support**: Handles MP3, CDG, and MP4 files
- **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units
- **Channel Priority**: For MP4 files, ranks duplicates by the channel folder names that appear in the file path
- **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison
- **Playlist Validation**: Validates playlists against your song library with exact and fuzzy matching
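
As a rough illustration of the pairing and deduplication ideas above, the sketch below merges CDG/MP3 files that share a base filename and then groups units by artist + title. The function names and the normalization rule (lowercase, collapsed whitespace) are assumptions made for this example, not the tool's actual code in `cli/matching.py`.

```python
import os
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def pair_cdg_mp3(songs):
    """Merge .cdg and .mp3 entries that share a base filename into one unit."""
    units = {}
    for song in songs:
        base, ext = os.path.splitext(song["path"])
        # CDG and MP3 files are keyed by base filename; everything else by full path.
        key = base.lower() if ext.lower() in (".cdg", ".mp3") else song["path"]
        unit = units.setdefault(
            key, {"artist": song["artist"], "title": song["title"], "paths": []}
        )
        unit["paths"].append(song["path"])
    return list(units.values())


def group_duplicates(units):
    """Group units by normalized (artist, title); keep only groups with 2+ units."""
    groups = defaultdict(list)
    for unit in units:
        groups[(normalize(unit["artist"]), normalize(unit["title"]))].append(unit)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Made-up sample data in the songs.json shape (artist, title, path).
songs = [
    {"artist": "ABBA", "title": "Waterloo", "path": "discs/abba_waterloo.cdg"},
    {"artist": "ABBA", "title": "Waterloo", "path": "discs/abba_waterloo.mp3"},
    {"artist": "Abba", "title": "waterloo", "path": "videos/ChannelA/abba - waterloo.mp4"},
    {"artist": "Queen", "title": "Bohemian Rhapsody", "path": "videos/queen.mp4"},
]
duplicates = group_duplicates(pair_cdg_mp3(songs))
```

Here the CDG and MP3 files collapse into one unit, which then forms a duplicate group with the MP4 copy of the same song.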
### File Type Priority System
1. **MP4 files** (with channel priority sorting)
2. **CDG/MP3 pairs** (treated as single units)
3. **Standalone MP3** files
4. **Standalone CDG** files
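
The ordering above can be expressed as a sort key. This is a minimal sketch under stated assumptions: `CHANNEL_PRIORITY`, `channel_rank`, and `priority_key` are hypothetical names, and the channel list is invented for the example rather than read from `config/config.json`.

```python
import os

# Hypothetical channel folders, highest priority first (not the project's real config).
CHANNEL_PRIORITY = ["ChannelA", "ChannelB"]


def channel_rank(path):
    """Return the index of the first configured channel found among the path's folders."""
    folders = path.replace("\\", "/").split("/")[:-1]
    for rank, channel in enumerate(CHANNEL_PRIORITY):
        if channel in folders:
            return rank
    return len(CHANNEL_PRIORITY)  # unknown channels sort after all known ones


def priority_key(paths):
    """Sort key for one karaoke unit (a list of its file paths); lower sorts first."""
    exts = {os.path.splitext(p)[1].lower() for p in paths}
    if ".mp4" in exts:
        mp4 = next(p for p in paths if p.lower().endswith(".mp4"))
        return (0, channel_rank(mp4))
    if {".cdg", ".mp3"} <= exts:
        return (1, 0)  # CDG/MP3 pair
    if ".mp3" in exts:
        return (2, 0)  # standalone MP3
    return (3, 0)      # standalone CDG, or anything unrecognized


units = [
    ["a/song.cdg"],
    ["a/song.mp3"],
    ["b/song.cdg", "b/song.mp3"],
    ["videos/ChannelB/song.mp4"],
    ["videos/ChannelA/song.mp4"],
]
ranked = sorted(units, key=priority_key)
```

After sorting, the `ChannelA` MP4 comes first and the standalone CDG comes last.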
### Web UI Features
- **Interactive Table View**: Sortable, filterable grid of duplicate songs
- **Bulk Selection**: Select multiple items for batch operations
- **Search & Filter**: Real-time search across artists, titles, and paths
- **Responsive Design**: Mobile-friendly interface
- **Easy Startup**: Automated dependency checking and browser launch
### 🆕 Drag-and-Drop Priority Management
- **Visual Priority Reordering**: Drag and drop files within each duplicate group to change their priority
- **Persistent Preferences**: Save your priority preferences for future CLI runs
- **Priority Indicators**: Visual numbered indicators show the current priority order
- **Reset Functionality**: Easily reset to default priorities if needed
### 🔄 Reset & Regenerate Feature
- **One-Click Reset**: Delete all generated files and regenerate everything with a single button click
- **Complete Cleanup**: Removes `skipSongs.json`, the `reports/` directory, and the `preferences/` directory (including any saved priority preferences)
- **Automatic CLI Execution**: Runs the CLI tool automatically to regenerate all data
- **Progress Feedback**: Shows loading state and provides detailed feedback on completion
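
For readers who want to see what a reset amounts to on disk, here is a hedged sketch. It assumes the generated files live under `data/` as shown in the File Structure section; `reset_generated_files` and `regenerate` are illustrative names, not the web UI's actual handlers.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def reset_generated_files(data_dir):
    """Delete generated output under data_dir and return the paths that were removed."""
    data_dir = Path(data_dir)
    removed = []
    skip_list = data_dir / "skipSongs.json"
    if skip_list.is_file():
        skip_list.unlink()
        removed.append(skip_list)
    for name in ("reports", "preferences"):
        folder = data_dir / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder)
    return removed


def regenerate(project_root):
    """Re-run the CLI from the project root (assumes cli/main.py needs no arguments)."""
    subprocess.run([sys.executable, "cli/main.py"], cwd=project_root, check=True)
```

The input file `songs.json` is deliberately left untouched; only generated output is removed.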
## Installation
### Prerequisites
- Python 3.7 or higher
- pip (Python package installer)
### Installation Steps
1. Clone the repository:
```bash
git clone <repository-url>
cd KaraokeMerge
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
**Note**: The installation includes:
- **Flask** for the web UI
- **fuzzywuzzy** and **python-Levenshtein** for fuzzy matching in playlist validation
- All other required dependencies
3. Verify installation:
```bash
python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
```
### Migration from Previous Versions
If you're upgrading from a previous version that used `allSongs.json`, run the migration script:
```bash
python3 migrate_to_songs_json.py
```
This script will:
- Rename `allSongs.json` to `songs.json`
- Add `data_directory` configuration to `config.json`
- Create backups of your original files
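
The three steps above could look roughly like the following. This is not the contents of `migrate_to_songs_json.py`; the `.bak` backup suffix, the location of `allSongs.json`, and the value stored in `data_directory` are all assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path


def migrate(data_dir, config_path):
    """Rename allSongs.json to songs.json and record data_directory, keeping backups."""
    data_dir, config_path = Path(data_dir), Path(config_path)
    old, new = data_dir / "allSongs.json", data_dir / "songs.json"
    if old.exists() and not new.exists():
        shutil.copy2(old, old.with_name(old.name + ".bak"))  # backup before renaming
        old.rename(new)

    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    if "data_directory" not in config:
        if config_path.exists():
            shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config["data_directory"] = str(data_dir)  # assumed value: path of the data folder
        config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Running it a second time is a no-op, since both steps check whether the change has already been made.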
## Usage
### CLI Tool
Run the main CLI tool:
```bash
python cli/main.py
```
Options:
- `--verbose`: Enable verbose output
- `--save-reports`: Generate detailed analysis reports
- `--dry-run`: Show what would be done without making changes
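
To make the option semantics concrete, the sketch below declares the three flags with `argparse`. It assumes each flag is a simple on/off switch that can be combined with the others; the real parser in `cli/main.py` may differ.

```python
import argparse


def build_parser():
    """Build a parser with the three documented flags as boolean switches."""
    parser = argparse.ArgumentParser(description="Karaoke song library cleanup")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("--save-reports", action="store_true",
                        help="Generate detailed analysis reports")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be done without making changes")
    return parser


# Equivalent to running: python cli/main.py --dry-run --verbose
args = build_parser().parse_args(["--dry-run", "--verbose"])
```

With this setup, `args.dry_run` and `args.verbose` are `True` while `args.save_reports` stays `False`.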
### Web UI
Start the web interface:
```bash
python start_web_ui.py
```
The web UI will automatically:
1. Check for required dependencies
2. Start the Flask server
3. Open your default browser to the interface
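
The startup sequence above can be sketched with the standard library plus Flask. Everything named here is an assumption for illustration: the helper names, the default host and port, and the idea that `web/app.py` exposes a Flask object called `app`.

```python
import importlib.util
import threading
import webbrowser


def missing_dependencies(modules=("flask", "fuzzywuzzy")):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def open_browser_later(url, delay=1.5):
    """Open the URL after a short delay so the server has time to start."""
    timer = threading.Timer(delay, webbrowser.open, args=(url,))
    timer.daemon = True
    timer.start()
    return timer


def launch(host="127.0.0.1", port=5000):
    """Check dependencies, schedule the browser, then start the server."""
    missing = missing_dependencies()
    if missing:
        raise SystemExit("Missing packages: " + ", ".join(missing))
    from web.app import app  # assumption: web/app.py defines a Flask object named `app`
    open_browser_later(f"http://{host}:{port}")
    app.run(host=host, port=port)
```

Scheduling the browser on a timer avoids blocking on `app.run`, which does not return until the server stops.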
### Playlist Validation
Validate your playlists against your song library:
```bash
cd cli
python playlist_validator.py
```
Options:
- `--playlist-index N`: Validate a specific playlist by index
- `--output results.json`: Save results to a JSON file
- `--apply`: Apply corrections to playlists (use with caution)
**Note**: Playlist validation falls back to fuzzy matching when no exact match is found. Make sure `fuzzywuzzy` is installed for best results.
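
The exact-then-fuzzy idea can be sketched as follows. To keep the example dependency-free it uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzzywuzzy`, so scores will not be identical to the tool's; the threshold of 85 and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score from 0 to 100."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def find_match(entry, library, threshold=85):
    """Return (song, score, kind) for the best library match, or None below threshold."""
    wanted = f"{entry['artist']} - {entry['title']}"
    best, best_score = None, -1
    for song in library:
        candidate = f"{song['artist']} - {song['title']}"
        if candidate.lower() == wanted.lower():
            return song, 100, "exact"
        score = similarity(wanted, candidate)
        if score > best_score:
            best, best_score = song, score
    if best is not None and best_score >= threshold:
        return best, best_score, "fuzzy"
    return None


library = [{"artist": "Queen", "title": "Bohemian Rhapsody", "path": "videos/queen.mp4"}]
exact = find_match({"artist": "queen", "title": "bohemian rhapsody"}, library)
fuzzy = find_match({"artist": "Queen", "title": "Bohemian Rapsody"}, library)
unmatched = find_match({"artist": "ABBA", "title": "Waterloo"}, library)
```

The misspelled title still resolves to the library entry as a fuzzy match, while the unrelated song returns `None`.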
### Priority Preferences
The web UI now supports drag-and-drop priority management:
1. **Reorder Files**: Click the "Details" button for any duplicate group, then drag files to reorder them
2. **Save Preferences**: Click "Save Priority Preferences" to store your choices
3. **Apply to CLI**: Future CLI runs will automatically use your saved preferences
4. **Reset**: Use "Reset Priorities" to restore default behavior
Your preferences are saved in `data/preferences/priority_preferences.json` and will be automatically loaded by the CLI tool.
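
One way a saved ordering might be applied to a duplicate group is shown below. The structure of `priority_preferences.json` is not documented here, so this sketch simply assumes a saved list of paths in preferred order; `apply_saved_order` is a hypothetical helper.

```python
def apply_saved_order(group_paths, saved_order):
    """Put paths listed in saved_order first (in that order); keep the rest as they were."""
    position = {path: index for index, path in enumerate(saved_order)}
    # sorted() is stable, so paths without a saved position keep their default order.
    return sorted(group_paths, key=lambda path: position.get(path, len(position)))


default_order = ["a/song.mp4", "b/song.mp3", "c/song.cdg"]
reordered = apply_saved_order(default_order, ["c/song.cdg", "a/song.mp4"])
```

Files the user never reordered fall to the end in their default priority order.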
## Configuration
Edit `config/config.json` to customize:
- Channel priorities for MP4 files
- Matching settings (fuzzy matching, thresholds)
- Output options
## File Structure
```
KaraokeMerge/
├── data/
│   ├── songs.json                  # Input: Your song library data
│   ├── skipSongs.json              # Output: Generated skip list
│   ├── preferences/                # User priority preferences
│   │   └── priority_preferences.json
│   └── reports/                    # Detailed analysis reports
├── config/
│   └── config.json                 # Configuration settings
├── cli/
│   ├── main.py                     # Main CLI application
│   ├── matching.py                 # Song matching logic
│   ├── preferences.py              # Priority preferences manager
│   ├── report.py                   # Report generation
│   └── utils.py                    # Utility functions
├── web/                            # Web UI for manual review
│   ├── app.py                      # Flask web application
│   └── templates/
│       └── index.html              # Web interface template
├── start_web_ui.py                 # Web UI startup script
├── test_tool.py                    # Validation and testing script
├── requirements.txt                # Python dependencies
├── PRD.md                          # Product Requirements Document
└── README.md                       # Project documentation
```
## Data Requirements
Place your song library data in `data/songs.json` with the following format:
```json
[
  {
    "artist": "Artist Name",
    "title": "Song Title",
    "path": "path/to/file.mp3"
  }
]
```
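
Before running the tool on a large library, it can help to check that the file matches this shape. The following is a minimal sketch; `validate_songs` and `load_songs` are hypothetical helpers, and the rule that all three fields must be non-empty strings is an assumption based on the example above.

```python
import json

REQUIRED_FIELDS = ("artist", "title", "path")


def validate_songs(songs):
    """Return a list of problems; an empty list means the data looks usable."""
    if not isinstance(songs, list):
        return ["top-level JSON value must be a list of song objects"]
    problems = []
    for index, song in enumerate(songs):
        if not isinstance(song, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            value = song.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{field}'")
    return problems


def load_songs(path="data/songs.json"):
    """Load the library file and raise ValueError if any entry is malformed."""
    with open(path, encoding="utf-8") as handle:
        songs = json.load(handle)
    problems = validate_songs(songs)
    if problems:
        raise ValueError("; ".join(problems))
    return songs
```

Collecting every problem before raising makes it easier to fix a large file in one pass.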
## Performance
Successfully tested with:
- 37,015 songs
- 12,424 duplicates (33.6% duplicate rate)
- 10,998 unique files after deduplication
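
The duplicate rate quoted above follows directly from the first two figures, as this short check shows:

```python
total_songs = 37_015
duplicates = 12_424

# Share of library entries flagged as duplicates, as a percentage.
duplicate_rate = duplicates / total_songs * 100
print(f"Duplicate rate: {duplicate_rate:.1f}%")  # prints "Duplicate rate: 33.6%"
```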
## Contributing
This project follows strict architectural principles:
- **Separation of Concerns**: Modular design with focused responsibilities
- **Constants and Enums**: Centralized configuration
- **Readability**: Self-documenting code with clear naming
- **Extensibility**: Designed for future growth
- **Refactorability**: Minimal coupling between components