Compare commits


3 Commits

22 changed files with 328219 additions and 269 deletions

PRD.md

@ -51,6 +51,7 @@ These principles are fundamental to the project's long-term success and must be
- **CLI Commands Documentation:** All CLI functionality, options, and usage examples must be documented in `cli/commands.txt`
- **Code Comments:** Significant logic changes should include inline documentation
- **API Documentation:** New endpoints, functions, or interfaces must be documented
- **API Update Requirement:** Whenever a new API endpoint is added, the PRD.md, README.md, and cli/commands.txt MUST be updated to reflect the new functionality
**Documentation Update Checklist:**
- [ ] Update PRD.md with any architectural or requirement changes
@ -59,6 +60,7 @@ These principles are fundamental to the project's long-term success and must be
- [ ] Add inline comments for complex logic or business rules
- [ ] Update any configuration examples or file structure documentation
- [ ] Review and update implementation status sections
- [ ] **API Updates:** When new API endpoints are added, update PRD.md, README.md, and cli/commands.txt
**CLI Commands Documentation Requirements:**
- **Comprehensive Coverage:** All CLI arguments, options, and flags must be documented with examples
@ -68,6 +70,14 @@ These principles are fundamental to the project's long-term success and must be
- **Integration Notes:** Document how CLI integrates with web UI and other components
- **Version Tracking:** Keep version information and feature status up to date
**API Documentation Requirements:**
- **Endpoint Documentation:** All new API endpoints must be documented in the PRD.md with their purpose, parameters, and responses
- **README Integration:** API changes must be reflected in README.md with usage examples and integration notes
- **CLI Integration:** If CLI commands interact with APIs, they must be documented in cli/commands.txt
- **Version Tracking:** API versioning and changes must be tracked in documentation
- **Error Handling:** Document all possible error responses and status codes
- **Authentication:** Document any authentication requirements or API key usage
This documentation requirement is mandatory and ensures the project remains maintainable and accessible to future developers and users.
### 2.3 Code Quality & Development Standards
@ -151,7 +161,7 @@ These standards ensure the codebase remains clean, maintainable, and accessible
### 3.1 Input
- Reads from `/data/songs.json`
- Each song includes at least:
- `artist`, `title`, `path`, (plus id3 tag info, `channel` for MP4s)
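A hypothetical record satisfying these minimum fields (values invented for illustration; only `artist`, `title`, and `path` are required by the spec above):

```python
# Hypothetical songs.json entry; the values here are invented for illustration.
song = {
    "artist": "ABBA",
    "title": "Waterloo",
    "path": "library/KaraFun/ABBA - Waterloo.mp4",
    "channel": "KaraFun",  # MP4s also carry channel info derived from the folder
}

# The three fields every entry must have:
assert all(song.get(k) for k in ("artist", "title", "path"))
```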
@ -220,7 +230,7 @@ These standards ensure the codebase remains clean, maintainable, and accessible
```
KaraokeMerge/
├── data/
│   ├── songs.json # Input: Your song library data
│ ├── skipSongs.json # Output: Generated skip list
│ ├── preferences/ # User priority preferences
│ │ ├── priority_preferences.json
@ -287,6 +297,13 @@ KaraokeMerge/
- **Priority Persistence**: Save/load user priority preferences to/from JSON files
- **Priority Preferences API**: RESTful endpoints for managing priority preferences
#### **Reset & Regenerate System**
- **One-Click Reset**: Delete all generated files and regenerate everything with a single button click
- **Complete Cleanup**: Removes skipSongs.json, reports directory, and preferences directory
- **Automatic CLI Execution**: Runs the CLI tool automatically to regenerate all data
- **Progress Feedback**: Shows loading state and provides detailed feedback on completion
- **Safety Confirmation**: Requires user confirmation before performing destructive operations
#### **User Interface Enhancements**
- **Visual Status Indicators**: Color-coded cards (green for kept, red for skipped)
- **File Type Badges**: Visual indicators for MP3, MP4, and CDG files
@ -400,7 +417,7 @@ data/preferences/
### ✅ Completed Features
#### **Core CLI Functionality**
- [x] Write initial CLI tool to parse songs.json, deduplicate, and output skipSongs.json
- [x] Print CLI summary reports (with verbosity control)
- [x] Implement config file support for channel priority
- [x] Organize folder/file structure for easy expansion
@ -437,6 +454,7 @@ data/preferences/
- [x] Pattern analysis and channel optimization suggestions
- [x] Non-destructive operation (skip lists only)
- [x] Verbose and dry-run modes
- [x] Reset & regenerate functionality with one-click cleanup
### 🎯 Current Implementation


@ -10,6 +10,7 @@ A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke
- **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units
- **Channel Priority**: For MP4 files, prioritizes based on folder names in the path
- **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison
- **Playlist Validation**: Validates playlists against your song library with exact and fuzzy matching
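As a sketch of the CDG/MP3 pairing rule (assumed grouping logic; the tool's actual matcher may differ), files sharing a base filename can be grouped like this:

```python
import os
from collections import defaultdict

def pair_cdg_mp3(paths):
    """Group CDG and MP3 files that share a base filename into one karaoke unit.

    Illustrative sketch only: a .cdg/.mp3 pair becomes a single unit, while
    other file types (e.g. .mp4) stay as standalone units.
    """
    units = defaultdict(list)
    for path in paths:
        base, ext = os.path.splitext(path)
        key = base if ext.lower() in (".cdg", ".mp3") else path
        units[key].append(path)
    return list(units.values())
```

Here `pair_cdg_mp3(["x.cdg", "x.mp3", "y.mp4"])` would yield two units: the CDG+MP3 pair and the standalone MP4.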
### File Type Priority System
1. **MP4 files** (with channel priority sorting)
@ -30,14 +31,55 @@ A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke
- **Priority Indicators**: Visual numbered indicators show the current priority order
- **Reset Functionality**: Easily reset to default priorities if needed
### 🔄 Reset & Regenerate Feature
- **One-Click Reset**: Delete all generated files and regenerate everything with a single button click
- **Complete Cleanup**: Removes skipSongs.json, reports directory, and preferences directory
- **Automatic CLI Execution**: Runs the CLI tool automatically to regenerate all data
- **Progress Feedback**: Shows loading state and provides detailed feedback on completion
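Under the hood, the reset flow described above presumably amounts to deleting the generated artifacts and re-running the CLI. A minimal sketch, with the regeneration step injected as a callable so the cleanup logic stands alone (paths taken from the docs; the web UI's actual implementation may differ):

```python
import os
import shutil

def reset_and_regenerate(data_dir, regenerate):
    """Delete generated artifacts, then call the supplied regeneration step.

    Sketch only: removes skipSongs.json plus the reports/ and preferences/
    directories, mirroring the "Complete Cleanup" described in the docs.
    """
    skip_file = os.path.join(data_dir, "skipSongs.json")
    if os.path.exists(skip_file):
        os.remove(skip_file)
    for folder in ("reports", "preferences"):
        shutil.rmtree(os.path.join(data_dir, folder), ignore_errors=True)
    regenerate()  # e.g. a subprocess call to the CLI tool
```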
## Installation
### Prerequisites
- Python 3.7 or higher
- pip (Python package installer)
### Installation Steps
1. Clone the repository:
```bash
git clone <repository-url>
cd KaraokeMerge
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
**Note**: The installation includes:
- **Flask** for the web UI
- **fuzzywuzzy** and **python-Levenshtein** for fuzzy matching in playlist validation
- All other required dependencies
3. Verify installation:
```bash
python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
```
### Migration from Previous Versions
If you're upgrading from a previous version that used `allSongs.json`, run the migration script:
```bash
python3 migrate_to_songs_json.py
```
This script will:
- Rename `allSongs.json` to `songs.json`
- Add `data_directory` configuration to `config.json`
- Create backups of your original files
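The migration steps above could be sketched as follows (this is a guess at what `migrate_to_songs_json.py` does internally, not its actual source):

```python
import json
import os
import shutil

def migrate(data_dir="data", config_path="config/config.json"):
    """Sketch of the migration: rename the library file, update config, keep backups."""
    old_path = os.path.join(data_dir, "allSongs.json")
    new_path = os.path.join(data_dir, "songs.json")
    if os.path.exists(old_path):
        shutil.copy2(old_path, old_path + ".bak")  # back up before renaming
        os.rename(old_path, new_path)
    shutil.copy2(config_path, config_path + ".bak")
    with open(config_path) as f:
        config = json.load(f)
    config.setdefault("data_directory", data_dir)  # add the data_directory key
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```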
## Usage
### CLI Tool
@ -64,6 +106,21 @@ The web UI will automatically:
2. Start the Flask server
3. Open your default browser to the interface
### Playlist Validation
Validate your playlists against your song library:
```bash
cd cli
python playlist_validator.py
```
Options:
- `--playlist-index N`: Validate a specific playlist by index
- `--output results.json`: Save results to a JSON file
- `--apply`: Apply corrections to playlists (use with caution)
**Note**: Playlist validation uses fuzzy matching to find potential matches. Make sure fuzzywuzzy is installed for best results.
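Conceptually, validation checks each playlist entry for an exact artist/title match and falls back to a similarity score. The sketch below uses `difflib` from the standard library as a stand-in for fuzzywuzzy's `fuzz.ratio`; the function names and the threshold are illustrative:

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """0-100 similarity score, roughly analogous to fuzzywuzzy's fuzz.ratio."""
    return int(SequenceMatcher(None, a, b).ratio() * 100)

def find_match(entry, library, threshold=85):
    """Return (song, score) for the best library match, or (None, score)."""
    target = f"{entry['artist']} - {entry['title']}".lower()
    best, best_score = None, 0
    for song in library:
        candidate = f"{song.get('artist', '')} - {song.get('title', '')}".lower()
        if candidate == target:
            return song, 100  # exact match wins immediately
        score = ratio(target, candidate)
        if score > best_score:
            best, best_score = song, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```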
### Priority Preferences
The web UI now supports drag-and-drop priority management:
@ -87,7 +144,7 @@ Edit `config/config.json` to customize:
```
KaraokeMerge/
├── data/
│   ├── songs.json # Input: Your song library data
│ ├── skipSongs.json # Output: Generated skip list
│ ├── preferences/ # User priority preferences
│ │ └── priority_preferences.json
@ -113,7 +170,7 @@ KaraokeMerge/
## Data Requirements
Place your song library data in `data/songs.json` with the following format:
```json
[
{


@ -1,77 +1,117 @@
# Karaoke Song Library Cleanup Tool - CLI Commands Reference (v2.0)
## Overview
The CLI tool analyzes karaoke song collections, identifies duplicates, validates playlists, and generates skip lists for future imports. It supports multiple file formats (MP3, CDG, MP4) with configurable priority systems.
## Quick Start Commands
### Basic Analysis (Most Common)
```bash
cd cli
python3 main.py
```
Runs the tool with default settings:
- Input: `data/songs.json`
- Config: `config/config.json`
- Output: `data/skipSongs.json`
- Verbose: Disabled
- Reports: **Automatically generated**
### Process Everything (Recommended)
```bash
cd cli
python3 main.py --process-all
```
Complete processing including:
- Duplicate analysis and skip list generation
- Favorites processing with priority logic (MP4 over MP3)
- History processing with priority logic
- Comprehensive report generation
## Main CLI Commands (main.py)
### Basic Analysis Commands
#### Standard Analysis
```bash
python3 main.py
```
Runs the tool with default settings and generates all reports automatically.
#### Verbose Output
```bash
python3 main.py --verbose
# or
python3 main.py -v
```
Enables detailed output showing individual song processing and decisions.
#### Dry Run Mode
```bash
python3 main.py --dry-run
```
Analyzes songs without generating the skip list file. Useful for testing and previewing results.
### Configuration Commands
#### Custom Configuration File
```bash
python3 main.py --config path/to/custom_config.json
```
Uses a custom configuration file instead of the default `config/config.json`.
#### Show Current Configuration
```bash
python3 main.py --show-config
```
Displays the current configuration settings and exits.
### Input/Output Commands
#### Custom Input File
```bash
python3 main.py --input path/to/songs.json
```
Specifies a custom input file instead of the default `data/songs.json`.
#### Custom Output Directory
```bash
python3 main.py --output-dir ./custom_output
```
Saves output files to a custom directory instead of the default `data/` folder.
### Processing Commands
#### Process Favorites Only
```bash
python3 main.py --process-favorites
```
Processes favorites with priority-based logic to select best versions (MP4 over MP3).
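The "MP4 over MP3" preference can be illustrated with a tiny ranking helper (a sketch under the assumption that extension alone decides; the real tool also folds channel priorities into the choice):

```python
import os

# Lower rank = higher priority; extensions not listed sort last.
EXT_RANK = {".mp4": 0, ".mp3": 1, ".cdg": 2}

def pick_best_version(paths):
    """Choose the preferred file among several versions of the same song."""
    return min(paths, key=lambda p: EXT_RANK.get(os.path.splitext(p)[1].lower(), 99))
```

For example, `pick_best_version(["a.mp3", "a.mp4"])` picks the MP4.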
#### Process History Only
```bash
python3 main.py --process-history
```
Processes history with priority-based logic to select best versions (MP4 over MP3).
#### Process Everything
```bash
python3 main.py --process-all
```
Processes everything: duplicates, generates reports, AND updates favorites/history with priority logic.
#### Merge History Objects
```bash
python3 main.py --merge-history
```
Merges history objects that match on artist, title, and path, summing their count properties.
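The merge rule, as described, keys on artist/title/path and sums counts. A condensed sketch (the full implementation in main.py also merges boolean flags and other fields):

```python
def merge_history(entries):
    """Merge entries sharing artist/title/path, summing their counts (sketch)."""
    merged = {}
    for e in entries:
        key = (e["artist"].lower(), e["title"].lower(), e["path"].lower())
        if key in merged:
            merged[key]["count"] += int(e.get("count", 0))
        else:
            merged[key] = dict(e, count=int(e.get("count", 0)))
    return list(merged.values())
```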
### Report Generation
#### Save Detailed Reports (Legacy)
```bash
python3 main.py --save-reports
```
**Note**: Reports are now automatically generated every time you run the CLI tool. This flag is kept for backward compatibility.
Generated reports include:
- `enhanced_summary_report.txt` - Comprehensive analysis
@ -82,43 +122,244 @@ Generated reports include:
- `analysis_data.json` - Raw analysis data for further processing
- `skip_songs_detailed.json` - **Web UI data (always generated)**
## Playlist Validator Commands (playlist_validator.py)
### Basic Playlist Validation
#### Validate All Playlists
```bash
python3 playlist_validator.py
```
Validates all playlists in `data/songList.json` against the song library.
#### Validate Specific Playlist
```bash
python3 playlist_validator.py --playlist-index 0
```
Validates a specific playlist by index (0-based).
### Playlist Validator Options
#### Custom Configuration
```bash
python3 playlist_validator.py --config path/to/custom_config.json
```
Uses a custom configuration file.
#### Custom Data Directory
```bash
python3 playlist_validator.py --data-dir path/to/data
```
Uses a custom data directory.
#### Apply Changes (Disable Dry Run)
```bash
python3 playlist_validator.py --apply
```
Applies changes to playlists instead of just previewing them.
#### Output Results to File
```bash
python3 playlist_validator.py --output results.json
```
Saves validation results to a JSON file.
## Comprehensive Examples
### Complete Workflow Examples
#### 1. Full Analysis with Everything
```bash
cd cli
python3 main.py --process-all --verbose
```
Complete processing with detailed output:
- Duplicate analysis and skip list generation
- Favorites and history processing with priority logic
- Comprehensive report generation
- Verbose output for detailed processing information
#### 2. Preview Changes Before Applying
```bash
cd cli
python3 main.py --process-all --dry-run --verbose
```
Preview all changes without saving:
- Shows what would be processed
- No files are modified
- Useful for testing configuration changes
#### 3. Custom Configuration Testing
```bash
cd cli
python3 main.py --config custom_config.json --dry-run --verbose
```
Test a custom configuration:
- Uses custom configuration file
- Shows detailed processing
- No output files created
#### 4. Process Only Favorites and History
```bash
cd cli
python3 main.py --process-favorites --process-history
```
Process only favorites and history files:
- Updates favorites with best versions (MP4 over MP3)
- Updates history with best versions
- No duplicate analysis performed
#### 5. Merge History Objects
```bash
cd cli
python3 main.py --merge-history --dry-run
```
Preview history merging:
- Shows which history objects would be merged
- No files are modified
#### 6. Apply History Merging
```bash
cd cli
python3 main.py --merge-history
```
Actually merge history objects:
- Combines duplicate history entries
- Sums count properties
- Saves updated history file
### Playlist Validation Examples
#### 1. Validate All Playlists
```bash
cd cli
python3 playlist_validator.py
```
Validates all playlists and shows summary:
- Total playlists and songs
- Exact matches found
- Missing songs count
- Fuzzy matches (if available)
#### 2. Validate Specific Playlist
```bash
cd cli
python3 playlist_validator.py --playlist-index 5
```
Validates playlist at index 5:
- Shows detailed results for that specific playlist
- Lists exact matches and missing songs
#### 3. Save Validation Results
```bash
cd cli
python3 playlist_validator.py --output validation_results.json
```
Saves detailed validation results to JSON file for further analysis.
#### 4. Apply Playlist Corrections
```bash
cd cli
python3 playlist_validator.py --apply
```
Applies corrections to playlists (use with caution).
### Advanced Examples
#### 1. Custom Input/Output with Full Processing
```bash
cd cli
python3 main.py --input /path/to/songs.json --output-dir ./reports --process-all --verbose
```
Processes custom input and saves all outputs to reports directory:
- Custom input file
- Custom output location
- **All report files automatically generated**
- Full processing including favorites/history
- Verbose output
#### 2. Configuration Testing Workflow
```bash
cd cli
# Show current configuration
python3 main.py --show-config
# Test with dry run
python3 main.py --dry-run --verbose
# Test with custom config
python3 main.py --config test_config.json --dry-run --verbose
```
#### 3. Playlist Analysis Workflow
```bash
cd cli
# Validate all playlists
python3 playlist_validator.py
# Validate specific playlist
python3 playlist_validator.py --playlist-index 0
# Save detailed results
python3 playlist_validator.py --output playlist_analysis.json
```
#### 4. Complete System Analysis
```bash
cd cli
# Process everything
python3 main.py --process-all --verbose
# Validate playlists
python3 playlist_validator.py
# Show configuration
python3 main.py --show-config
```
## Command Line Options Reference
### Main CLI (main.py) Options
| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Configuration file path | `../config/config.json` |
| `--input` | Input songs file path | `../data/songs.json` |
| `--output-dir` | Output directory | `../data` |
| `--verbose, -v` | Enable verbose output | `False` |
| `--dry-run` | Analyze without generating files | `False` |
| `--save-reports` | Save detailed reports | `True` (always enabled) |
| `--show-config` | Show configuration and exit | `False` |
| `--process-favorites` | Process favorites with priority logic | `False` |
| `--process-history` | Process history with priority logic | `False` |
| `--process-all` | Process everything | `False` |
| `--merge-history` | Merge history objects | `False` |
### Playlist Validator (playlist_validator.py) Options
| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Configuration file path | `../config/config.json` |
| `--data-dir` | Data directory path | `../data` |
| `--dry-run` | Dry run mode | `True` |
| `--apply` | Apply changes (disable dry run) | `False` |
| `--playlist-index` | Validate specific playlist by index | `None` |
| `--output` | Output results to JSON file | `None` |
## File Structure Requirements
### Required Files
- `data/songs.json` - Main song library
- `config/config.json` - Configuration settings
### Optional Files
- `data/favorites.json` - Favorites list (for processing)
- `data/history.json` - History list (for processing)
- `data/songList.json` - Playlists (for validation)
### Generated Files
- `data/skipSongs.json` - Skip list for future imports
- `data/reports/` - Directory containing all analysis reports
- `data/preferences/` - Directory containing priority preferences
## Configuration File Structure
@ -148,31 +389,9 @@ The default configuration file (`config/config.json`) contains:
}
```
### Configuration Options Explained
#### Channel Priorities
- **channel_priorities**: Array of folder names for MP4 files
- Order determines priority (first = highest priority)
- Files without matching folders are marked for manual review
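For illustration, resolving a file's channel rank from its path might look like this (a sketch assuming folder-name matching; channel names are invented, and the tool's exact matching rules may differ):

```python
def channel_rank(path, channel_priorities):
    """Rank an MP4 by the first configured channel folder in its path.

    Lower rank = higher priority; files matching no channel get a sentinel
    rank so they can be flagged for manual review.
    """
    parts = [p.lower() for p in path.replace("\\", "/").split("/")]
    for rank, channel in enumerate(channel_priorities):
        if channel.lower() in parts:
            return rank
    return len(channel_priorities)
```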
#### Matching Settings
- **fuzzy_matching**: Enable/disable fuzzy string matching
- **fuzzy_threshold**: Similarity threshold (0.0-1.0) for fuzzy matching
- **case_sensitive**: Case-sensitive artist/title comparison
#### Output Settings
- **verbose**: Enable detailed output
- **include_reasons**: Include reason field in skip list
- **max_duplicates_per_song**: Maximum duplicates to process per song
#### File Type Settings
- **supported_extensions**: All supported file extensions
- **mp4_extensions**: Extensions treated as MP4 files
## Input File Formats
### Song Library Format (songs.json)
```json
[
{
@ -183,9 +402,45 @@ The tool expects a JSON array of song objects:
]
```
### Playlist Format (songList.json)
```json
[
{
"title": "Playlist Name",
"songs": [
{
"position": 1,
"artist": "Artist Name",
"title": "Song Title"
}
]
}
]
```
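Given that format, a minimal exact-match validation pass could look like this (sketch; the real validator also performs fuzzy matching and position handling):

```python
def validate_playlist(playlist, library):
    """Split a playlist's songs into exact matches and missing entries (sketch)."""
    library_keys = {(s["artist"].lower(), s["title"].lower()) for s in library}
    found, missing = [], []
    for song in playlist["songs"]:
        key = (song["artist"].lower(), song["title"].lower())
        (found if key in library_keys else missing).append(song)
    return found, missing
```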
### Favorites Format (favorites.json)
```json
[
{
"artist": "Artist Name",
"title": "Song Title",
"path": "path/to/file.mp3",
"favorite": true
}
]
```
### History Format (history.json)
```json
[
{
"artist": "Artist Name",
"title": "Song Title",
"path": "path/to/file.mp3",
"count": 5
}
]
```
## Output Files
@ -193,7 +448,7 @@ Optional fields for MP4 files:
- **skipSongs.json**: List of file paths to skip in future imports
- Format: `[{"path": "file/path.mp3", "reason": "duplicate"}]`
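A consumer of that file only needs the paths; for example, a sketch of loading them into a set for import-time filtering:

```python
import json

def load_skip_paths(skip_file="data/skipSongs.json"):
    """Load the generated skip list into a set of paths for fast lookups."""
    with open(skip_file) as f:
        return {entry["path"] for entry in json.load(f)}
```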
### Report Files (Automatically Generated)
- **enhanced_summary_report.txt**: Overall analysis and statistics
- **channel_optimization_report.txt**: Channel priority suggestions
- **duplicate_pattern_report.txt**: Duplicate detection patterns
@ -222,7 +477,7 @@ The tool provides clear error messages for:
## Performance Notes
- Successfully tested with 49,000+ songs
- Processes large datasets efficiently
- Shows progress indicators for long operations
- Memory-efficient processing
@ -245,17 +500,29 @@ The CLI tool integrates with the web UI:
### Debug Mode
```bash
cd cli
python3 main.py --verbose --dry-run --show-config
```
Complete debugging setup:
- Shows configuration
- Verbose processing
- No file changes
### Playlist Validator Debug
```bash
cd cli
python3 playlist_validator.py --dry-run --output debug_results.json
```
Debug playlist validation:
- Dry run mode
- Save results to file
- No playlist modifications
## Version Information
This commands reference is for Karaoke Song Library Cleanup Tool v2.0
- CLI: Fully functional with comprehensive options
- Web UI: Interactive priority management
- Priority System: Drag-and-drop with persistence
- Reports: Enhanced analysis with actionable insights
- Playlist Validator: Complete playlist analysis and validation

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -15,6 +15,192 @@ from matching import SongMatcher
from report import ReportGenerator
def merge_history_objects(data_dir: str, args) -> None:
    """Merge history objects that match on artist, title, and path, summing their count properties."""
    history_path = os.path.join(data_dir, 'history.json')
    if not os.path.exists(history_path):
        print(f"History file not found: {history_path}")
        return

    try:
        # Load current history
        history_items = load_json_file(history_path)
        if not history_items:
            print("No history items found to merge")
            return

        print("\n🔄 Merging history objects...")
        print(f"Processing {len(history_items):,} history entries...")

        # Create a dictionary to group items by artist, title, and path
        grouped_items = {}
        merged_count = 0
        total_merged_entries = 0

        for item in history_items:
            if not isinstance(item, dict):
                continue
            artist = item.get('artist', '').strip()
            title = item.get('title', '').strip()
            path = item.get('path', '').strip()
            if not artist or not title or not path:
                continue
            # Create a key for grouping
            key = (artist.lower(), title.lower(), path.lower())
            if key not in grouped_items:
                grouped_items[key] = []
            grouped_items[key].append(item)

        # Process groups with multiple items
        merged_items = []
        for key, items in grouped_items.items():
            if len(items) == 1:
                # Single item, keep as is
                merged_items.append(items[0])
            else:
                # Multiple items, merge them
                artist, title, path = key
                # Start with the first item as the base
                merged_item = items[0].copy()
                # Sum the counts (handle both int and string values)
                total_count = 0
                for item in items:
                    count_value = item.get('count', 0)
                    if isinstance(count_value, str):
                        try:
                            total_count += int(count_value)
                        except ValueError:
                            total_count += 0
                    else:
                        total_count += count_value
                merged_item['count'] = total_count
                # For boolean properties, if any are True, keep True
                merged_item['favorite'] = any(item.get('favorite', False) for item in items)
                merged_item['disabled'] = any(item.get('disabled', False) for item in items)
                # For other properties, keep the first non-empty value
                for prop in ['key', 'original_path', 'genre']:
                    if prop in merged_item and merged_item[prop]:
                        continue
                    for other in items[1:]:  # Skip first item since we already have it
                        if other.get(prop):
                            merged_item[prop] = other[prop]
                            break
                merged_items.append(merged_item)
                merged_count += 1
                total_merged_entries += len(items)
                if args.verbose:
                    print(f"Merged {len(items)} entries for '{artist} - {title}': total count = {total_count}")

        # Save the merged history
        if not args.dry_run:
            save_json_file(merged_items, history_path)
            print(f"✅ Merged {merged_count} groups ({total_merged_entries} total entries → {len(merged_items)} entries)")
            print(f"📁 Saved to: {history_path}")
        else:
            print(f"DRY RUN: Would merge {merged_count} groups ({total_merged_entries} total entries → {len(merged_items)} entries)")

    except Exception as e:
        print(f"Error merging history objects: {e}")
def process_favorites_and_history(matcher: SongMatcher, all_songs: List[Dict[str, Any]], data_dir: str, args) -> None:
    """Process favorites and history with priority-based logic to select best versions."""

    def process_file(file_type: str, file_path: str) -> List[Dict[str, Any]]:
        """Process a single favorites or history file."""
        try:
            items = load_json_file(file_path)
            if not items:
                print(f"No {file_type} found in {file_path}")
                return []

            print(f"\nProcessing {len(items):,} {file_type} entries...")

            # Find matching songs for each item
            processed_items = []
            updated_count = 0

            for i, item in enumerate(items):
                if not isinstance(item, dict):
                    print(f"Warning: Skipping invalid {file_type} item at index {i}")
                    continue

                artist = item.get('artist', '')
                title = item.get('title', '')
                current_path = item.get('path', '')

                if not artist or not title:
                    print(f"Warning: Skipping {file_type} item with missing artist/title at index {i}")
                    continue

                # Find all matching songs for this artist/title
                matching_songs = []
                for song in all_songs:
                    if (song.get('artist', '').lower().strip() == artist.lower().strip() and
                            song.get('title', '').lower().strip() == title.lower().strip()):
                        matching_songs.append(song)

                if not matching_songs:
                    print(f"Warning: No matching songs found for {artist} - {title}")
                    processed_items.append(item)
                    continue

                # Use the same priority logic as duplicates
                best_song, skip_songs = matcher.select_best_song(matching_songs, artist, title)

                if best_song and best_song['path'] != current_path:
                    # Update the path to the best version
                    item['path'] = best_song['path']
                    item['original_path'] = current_path  # Keep track of the original
                    updated_count += 1
                    if args.verbose:
                        print(f"Updated {artist} - {title}: {current_path} → {best_song['path']}")

                processed_items.append(item)

            # Save the updated file
            if not args.dry_run:
                save_json_file(processed_items, file_path)
                print(f"✅ Updated {updated_count:,} {file_type} entries with best versions")
                print(f"📁 Saved to: {file_path}")
            else:
                print(f"DRY RUN: Would update {updated_count:,} {file_type} entries")

            return processed_items

        except Exception as e:
            print(f"Error processing {file_type}: {e}")
            return []

    # Process favorites if requested
    if args.process_favorites:
        favorites_path = os.path.join(data_dir, 'favorites.json')
        if os.path.exists(favorites_path):
            process_file('favorites', favorites_path)
        else:
            print(f"Favorites file not found: {favorites_path}")

    # Process history if requested
    if args.process_history:
        history_path = os.path.join(data_dir, 'history.json')
        if os.path.exists(history_path):
            process_file('history', history_path)
        else:
            print(f"History file not found: {history_path}")
def parse_arguments():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
@ -27,25 +213,31 @@ Examples:
    python main.py --config custom_config.json   # Use custom config
    python main.py --output-dir ./reports        # Save reports to custom directory
    python main.py --dry-run                     # Analyze without generating files
    python main.py --process-favorites           # Process favorites with priority logic (MP4 over MP3)
    python main.py --process-history             # Process history with priority logic (MP4 over MP3)
    python main.py --process-all                 # Process everything: duplicates, reports, AND favorites/history updates
    python main.py --process-all --dry-run       # Preview changes without saving
    python main.py --merge-history               # Merge history objects that match on artist, title, and path
    python main.py --merge-history --dry-run     # Preview history merging without saving
        """
    )
    parser.add_argument(
        '--config',
        default='../config/config.json',
        help='Path to configuration file (default: ../config/config.json)'
    )

    parser.add_argument(
        '--input',
        default=None,
        help='Path to input songs file (default: auto-detected from config)'
    )

    parser.add_argument(
        '--output-dir',
        default=None,
        help='Directory for output files (default: auto-detected from config)'
    )

    parser.add_argument(
@ -72,6 +264,30 @@ Examples:
        help='Show current configuration and exit'
    )

    parser.add_argument(
        '--process-favorites',
        action='store_true',
        help='Process favorites with priority-based logic to select best versions (MP4 over MP3)'
    )

    parser.add_argument(
        '--process-history',
        action='store_true',
        help='Process history with priority-based logic to select best versions (MP4 over MP3)'
    )

    parser.add_argument(
        '--process-all',
        action='store_true',
        help='Process everything: duplicates, generate reports, AND update favorites/history with priority logic'
    )

    parser.add_argument(
        '--merge-history',
        action='store_true',
        help='Merge history objects that match on artist, title, and path, summing their count properties'
    )

    return parser.parse_args()
@ -119,137 +335,178 @@ def main():
        reporter.print_report("config", config)
        return
    # Determine data directory and input file from config or args
    data_dir = args.output_dir or config.get('data_directory', '../data')
    # Resolve relative paths from CLI directory
    if not os.path.isabs(data_dir):
        data_dir = os.path.join(os.path.dirname(__file__), '..', data_dir)
    input_file = args.input or os.path.join(data_dir, 'songs.json')

    # Load songs (only if needed for processing)
    songs = None
    matcher = None
    reporter = None

    if not args.merge_history:
        songs = load_songs(input_file)
        matcher = SongMatcher(config, data_dir)
        reporter = ReportGenerator(config)

    # Process favorites and history if requested
    if args.process_favorites or args.process_history or args.process_all:
        print("\n🎯 Processing favorites and history with priority logic...")
        print("=" * 60)

        # If --process-all is used, set both flags
        if args.process_all:
            args.process_favorites = True
            args.process_history = True

        process_favorites_and_history(matcher, songs, data_dir, args)

        print("\n" + "=" * 60)
        print("Favorites/History processing complete!")

        # If --process-all, also do the full duplicate analysis and reporting
        if args.process_all:
            print("\n🔄 Processing duplicates and generating reports...")
            print("=" * 60)
        else:
            return

    # Merge history objects if requested (separate operation)
    if args.merge_history:
        print("\n🔄 Merging history objects...")
        print("=" * 60)
        merge_history_objects(data_dir, args)
        print("\n" + "=" * 60)
        print("History merging complete!")
        return

    # If not processing favorites/history OR if --process-all, do the full analysis
    if not (args.process_favorites or args.process_history) or args.process_all:
        print("\nStarting song analysis...")
        print("=" * 60)

        # Process songs
        try:
            best_songs, skip_songs, stats = matcher.process_songs(songs)

            # Generate reports
            print("\n" + "=" * 60)
            reporter.print_report("summary", stats)
skip_entry['reason'] = skip_song['reason']
simple_skip_list.append(skip_entry)
else:
duplicate_count += 1
# Add channel priority report
if config.get('channel_priorities'):
channel_report = reporter.generate_channel_priority_report(stats, config['channel_priorities'])
print("\n" + channel_report)
save_json_file(simple_skip_list, skip_list_path)
print(f"\nSkip list saved to: {skip_list_path}")
print(f"Total songs to skip: {len(simple_skip_list):,}")
if duplicate_count > 0:
print(f"Removed {duplicate_count:,} duplicate entries from skip list")
elif args.dry_run:
print("\nDRY RUN MODE: No skip list generated")
# Always generate detailed reports (not just when --save-reports is used)
if not args.dry_run:
reports_dir = os.path.join(args.output_dir, 'reports')
os.makedirs(reports_dir, exist_ok=True)
print(f"\n📊 Generating enhanced analysis reports...")
# Analyze skip patterns
skip_analysis = reporter.analyze_skip_patterns(skip_songs)
# Analyze channel optimization
channel_analysis = reporter.analyze_channel_optimization(stats, skip_analysis)
# Generate and save enhanced reports
enhanced_summary = reporter.generate_enhanced_summary_report(stats, skip_analysis)
reporter.save_report_to_file(enhanced_summary, os.path.join(reports_dir, 'enhanced_summary_report.txt'))
channel_optimization = reporter.generate_channel_optimization_report(channel_analysis)
reporter.save_report_to_file(channel_optimization, os.path.join(reports_dir, 'channel_optimization_report.txt'))
duplicate_patterns = reporter.generate_duplicate_pattern_report(skip_analysis)
reporter.save_report_to_file(duplicate_patterns, os.path.join(reports_dir, 'duplicate_pattern_report.txt'))
actionable_insights = reporter.generate_actionable_insights_report(stats, skip_analysis, channel_analysis)
reporter.save_report_to_file(actionable_insights, os.path.join(reports_dir, 'actionable_insights_report.txt'))
# Generate detailed duplicate analysis
detailed_duplicates = reporter.generate_detailed_duplicate_analysis(skip_songs, best_songs)
reporter.save_report_to_file(detailed_duplicates, os.path.join(reports_dir, 'detailed_duplicate_analysis.txt'))
# Save original reports for compatibility
summary_report = reporter.generate_summary_report(stats)
reporter.save_report_to_file(summary_report, os.path.join(reports_dir, 'summary_report.txt'))
skip_report = reporter.generate_skip_list_summary(skip_songs)
reporter.save_report_to_file(skip_report, os.path.join(reports_dir, 'skip_list_summary.txt'))
# Save detailed duplicate report if verbose
if config['output']['verbose']:
duplicate_info = matcher.get_detailed_duplicate_info(songs)
duplicate_report = reporter.generate_duplicate_details(duplicate_info)
reporter.save_report_to_file(duplicate_report, os.path.join(reports_dir, 'duplicate_details.txt'))
reporter.print_report("duplicates", duplicate_info)
# Save analysis data as JSON for further processing
analysis_data = {
'stats': stats,
'skip_analysis': skip_analysis,
'channel_analysis': channel_analysis,
'timestamp': __import__('datetime').datetime.now().isoformat()
}
save_json_file(analysis_data, os.path.join(reports_dir, 'analysis_data.json'))
reporter.print_report("skip_summary", skip_songs)
# Save full skip list data (this is what the web UI needs)
save_json_file(skip_songs, os.path.join(reports_dir, 'skip_songs_detailed.json'))
# Save skip list if not dry run
if not args.dry_run and skip_songs:
skip_list_path = os.path.join(data_dir, 'skipSongs.json')
# Create simplified skip list (just paths and reasons) with deduplication
seen_paths = set()
simple_skip_list = []
duplicate_count = 0
for skip_song in skip_songs:
path = skip_song['path']
if path not in seen_paths:
seen_paths.add(path)
skip_entry = {'path': path}
if config['output']['include_reasons']:
skip_entry['reason'] = skip_song['reason']
simple_skip_list.append(skip_entry)
else:
duplicate_count += 1
save_json_file(simple_skip_list, skip_list_path)
print(f"\nSkip list saved to: {skip_list_path}")
print(f"Total songs to skip: {len(simple_skip_list):,}")
if duplicate_count > 0:
print(f"Removed {duplicate_count:,} duplicate entries from skip list")
elif args.dry_run:
print("\nDRY RUN MODE: No skip list generated")
print(f"✅ Enhanced reports saved to: {reports_dir}")
print(f"📋 Generated reports:")
print(f" • enhanced_summary_report.txt - Comprehensive analysis")
print(f" • channel_optimization_report.txt - Priority optimization suggestions")
print(f" • duplicate_pattern_report.txt - Duplicate pattern analysis")
print(f" • actionable_insights_report.txt - Recommendations and insights")
print(f" • detailed_duplicate_analysis.txt - Specific songs and their duplicates")
print(f" • analysis_data.json - Raw analysis data for further processing")
print(f" • skip_songs_detailed.json - Web UI data (always generated)")
elif args.dry_run:
print("\nDRY RUN MODE: No reports generated")
print("\n" + "=" * 60)
print("Analysis complete!")
except Exception as e:
print(f"\nError during processing: {e}")
sys.exit(1)
# Always generate detailed reports (not just when --save-reports is used)
if not args.dry_run:
reports_dir = os.path.join(data_dir, 'reports')
os.makedirs(reports_dir, exist_ok=True)
print(f"\n📊 Generating enhanced analysis reports...")
# Analyze skip patterns
skip_analysis = reporter.analyze_skip_patterns(skip_songs)
# Analyze channel optimization
channel_analysis = reporter.analyze_channel_optimization(stats, skip_analysis)
# Generate and save enhanced reports
enhanced_summary = reporter.generate_enhanced_summary_report(stats, skip_analysis)
reporter.save_report_to_file(enhanced_summary, os.path.join(reports_dir, 'enhanced_summary_report.txt'))
channel_optimization = reporter.generate_channel_optimization_report(channel_analysis)
reporter.save_report_to_file(channel_optimization, os.path.join(reports_dir, 'channel_optimization_report.txt'))
duplicate_patterns = reporter.generate_duplicate_pattern_report(skip_analysis)
reporter.save_report_to_file(duplicate_patterns, os.path.join(reports_dir, 'duplicate_pattern_report.txt'))
actionable_insights = reporter.generate_actionable_insights_report(stats, skip_analysis, channel_analysis)
reporter.save_report_to_file(actionable_insights, os.path.join(reports_dir, 'actionable_insights_report.txt'))
# Generate detailed duplicate analysis
detailed_duplicates = reporter.generate_detailed_duplicate_analysis(skip_songs, best_songs)
reporter.save_report_to_file(detailed_duplicates, os.path.join(reports_dir, 'detailed_duplicate_analysis.txt'))
# Save original reports for compatibility
summary_report = reporter.generate_summary_report(stats)
reporter.save_report_to_file(summary_report, os.path.join(reports_dir, 'summary_report.txt'))
skip_report = reporter.generate_skip_list_summary(skip_songs)
reporter.save_report_to_file(skip_report, os.path.join(reports_dir, 'skip_list_summary.txt'))
# Save detailed duplicate report if verbose
if config['output']['verbose']:
duplicate_info = matcher.get_detailed_duplicate_info(songs)
duplicate_report = reporter.generate_duplicate_details(duplicate_info)
reporter.save_report_to_file(duplicate_report, os.path.join(reports_dir, 'duplicate_details.txt'))
# Save analysis data as JSON for further processing
analysis_data = {
'stats': stats,
'skip_analysis': skip_analysis,
'channel_analysis': channel_analysis,
'timestamp': __import__('datetime').datetime.now().isoformat()
}
save_json_file(analysis_data, os.path.join(reports_dir, 'analysis_data.json'))
# Save full skip list data (this is what the web UI needs)
save_json_file(skip_songs, os.path.join(reports_dir, 'skip_songs_detailed.json'))
print(f"✅ Enhanced reports saved to: {reports_dir}")
print(f"📋 Generated reports:")
print(f" • enhanced_summary_report.txt - Comprehensive analysis")
print(f" • channel_optimization_report.txt - Priority optimization suggestions")
print(f" • duplicate_pattern_report.txt - Duplicate pattern analysis")
print(f" • actionable_insights_report.txt - Recommendations and insights")
print(f" • detailed_duplicate_analysis.txt - Specific songs and their duplicates")
print(f" • analysis_data.json - Raw analysis data for further processing")
print(f" • skip_songs_detailed.json - Web UI data (always generated)")
elif args.dry_run:
print("\nDRY RUN MODE: No reports generated")
print("\n" + "=" * 60)
print("Analysis complete!")
except Exception as e:
print(f"\nError during processing: {e}")
sys.exit(1)
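The skip-list deduplication step in `main()` reduces to a seen-set scan over paths; a standalone sketch with made-up entries:

```python
# Illustrative skip entries; real ones come from matcher.process_songs().
skip_songs = [
    {'path': '/karaoke/a.mp4', 'reason': 'duplicate'},
    {'path': '/karaoke/b.mp4', 'reason': 'lower priority'},
    {'path': '/karaoke/a.mp4', 'reason': 'duplicate'},   # repeated path
]

seen_paths = set()
simple_skip_list = []
duplicate_count = 0
for skip_song in skip_songs:
    path = skip_song['path']
    if path not in seen_paths:
        seen_paths.add(path)          # first occurrence wins
        simple_skip_list.append({'path': path, 'reason': skip_song['reason']})
    else:
        duplicate_count += 1          # counted, not written
```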
if __name__ == "__main__":


@@ -17,6 +17,7 @@ from utils import (
extract_consolidated_channel_from_path,
get_file_extension,
parse_multi_artist,
clean_artist_name,
validate_song_data,
find_mp3_pairs
)
@@ -63,10 +64,15 @@ class SongMatcher:
if not validate_song_data(song):
continue
# Handle multi-artist songs
artists = parse_multi_artist(song['artist'])
# Clean and handle artist names
cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists:
artists = [song['artist']]
artists = [cleaned_artist]
# Create groups for each artist variation
for artist in artists:
@@ -90,10 +96,15 @@ class SongMatcher:
if i % 1000 == 0 and i > 0:
print(f"Processing song {i:,}/{len(songs):,}...")
# Handle multi-artist songs
artists = parse_multi_artist(song['artist'])
# Clean and handle artist names
cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists:
artists = [song['artist']]
artists = [cleaned_artist]
# Try exact matching first
added_to_exact = False
@@ -117,10 +128,15 @@ class SongMatcher:
if i % 100 == 0 and i > 0:
print(f"Fuzzy matching song {i:,}/{len(ungrouped_songs):,}...")
# Handle multi-artist songs
artists = parse_multi_artist(song['artist'])
# Clean and handle artist names
cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists:
artists = [song['artist']]
artists = [cleaned_artist]
# Try to find an existing fuzzy group
added_to_group = False

File diff suppressed because it is too large

cli/playlist_validator.py (new file, 350 lines)

@@ -0,0 +1,350 @@
#!/usr/bin/env python3
"""
Playlist validation module for the Karaoke Song Library Cleanup Tool.
Validates playlist songs against the song library using exact and fuzzy matching.
"""
import json
import os
from typing import Dict, List, Any, Tuple, Optional
from collections import defaultdict
import difflib
try:
from fuzzywuzzy import fuzz
FUZZY_AVAILABLE = True
except ImportError:
FUZZY_AVAILABLE = False
from utils import (
normalize_artist_title,
extract_channel_from_path,
get_file_extension,
parse_multi_artist,
clean_artist_name,
validate_song_data
)
from matching import SongMatcher
class PlaylistValidator:
"""Validates playlist songs against the song library."""
def __init__(self, config: Dict[str, Any], data_dir: str = "../data"):
self.config = config
self.data_dir = data_dir
self.song_matcher = SongMatcher(config, data_dir)
self.fuzzy_threshold = config.get('matching', {}).get('fuzzy_threshold', 0.8)
# Load song library
self.all_songs = self._load_all_songs()
if not self.all_songs:
raise ValueError("Could not load song library from songs.json")
# Create lookup dictionaries for faster matching
self._build_lookup_tables()
def _load_all_songs(self) -> List[Dict[str, Any]]:
"""Load the song library from songs.json."""
all_songs_path = os.path.join(self.data_dir, 'songs.json')
try:
with open(all_songs_path, 'r', encoding='utf-8') as f:
return json.load(f)
except Exception as e:
print(f"Error loading song library: {e}")
return []
def _build_lookup_tables(self):
"""Build lookup tables for faster exact matching."""
self.exact_lookup = {}
self.artist_title_lookup = {}
for song in self.all_songs:
if not validate_song_data(song):
continue
# Clean and handle artist names
cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists:
artists = [cleaned_artist]
# Create exact match keys
for artist in artists:
normalized_key = normalize_artist_title(artist, song['title'], False)
if normalized_key not in self.exact_lookup:
self.exact_lookup[normalized_key] = []
self.exact_lookup[normalized_key].append(song)
# Also store by artist-title for fuzzy matching
artist_title_key = f"{artist.lower()} - {song['title'].lower()}"
if artist_title_key not in self.artist_title_lookup:
self.artist_title_lookup[artist_title_key] = []
self.artist_title_lookup[artist_title_key].append(song)
def find_exact_match(self, artist: str, title: str) -> Optional[List[Dict[str, Any]]]:
"""Find exact matches for artist/title combination."""
normalized_key = normalize_artist_title(artist, title, False)
return self.exact_lookup.get(normalized_key, [])
def find_fuzzy_matches(self, artist: str, title: str, threshold: float = None) -> List[Tuple[Dict[str, Any], float]]:
"""Find fuzzy matches for artist/title combination."""
if not FUZZY_AVAILABLE:
return []
if threshold is None:
threshold = self.fuzzy_threshold
query = f"{artist.lower()} - {title.lower()}"
matches = []
for key, songs in self.artist_title_lookup.items():
similarity = fuzz.ratio(query, key) / 100.0
if similarity >= threshold:
# Get the best song from this group using existing priority logic
best_song, _ = self.song_matcher.select_best_song(songs, artist, title)
matches.append((best_song, similarity))
# Sort by similarity score (highest first)
matches.sort(key=lambda x: x[1], reverse=True)
return matches
def validate_playlist(self, playlist: Dict[str, Any], dry_run: bool = True) -> Dict[str, Any]:
"""Validate a single playlist against the song library."""
results = {
'playlist_title': playlist.get('title', 'Unknown Playlist'),
'total_songs': len(playlist.get('songs', [])),
'exact_matches': [],
'fuzzy_matches': [],
'missing_songs': [],
'summary': {
'exact_match_count': 0,
'fuzzy_match_count': 0,
'missing_count': 0,
'needs_manual_review': 0
}
}
for song in playlist.get('songs', []):
artist = song.get('artist', '')
title = song.get('title', '')
position = song.get('position', 0)
if not artist or not title:
results['missing_songs'].append({
'position': position,
'artist': artist,
'title': title,
'reason': 'Missing artist or title'
})
results['summary']['missing_count'] += 1
continue
# Try exact match first
exact_matches = self.find_exact_match(artist, title)
if exact_matches:
# Get the best song using existing priority logic
best_song, _ = self.song_matcher.select_best_song(exact_matches, artist, title)
results['exact_matches'].append({
'position': position,
'playlist_artist': artist,
'playlist_title': title,
'found_song': best_song,
'match_type': 'exact'
})
results['summary']['exact_match_count'] += 1
else:
# Try fuzzy matching
fuzzy_matches = self.find_fuzzy_matches(artist, title)
if fuzzy_matches:
best_fuzzy_song, similarity = fuzzy_matches[0]
results['fuzzy_matches'].append({
'position': position,
'playlist_artist': artist,
'playlist_title': title,
'found_song': best_fuzzy_song,
'similarity': similarity,
'match_type': 'fuzzy',
'needs_manual_review': True
})
results['summary']['fuzzy_match_count'] += 1
results['summary']['needs_manual_review'] += 1
else:
results['missing_songs'].append({
'position': position,
'artist': artist,
'title': title,
'reason': 'No matches found'
})
results['summary']['missing_count'] += 1
return results
def validate_all_playlists(self, dry_run: bool = True) -> Dict[str, Any]:
"""Validate all playlists in songList.json."""
playlists_path = os.path.join(self.data_dir, 'songList.json')
try:
with open(playlists_path, 'r', encoding='utf-8') as f:
playlists = json.load(f)
except Exception as e:
print(f"Error loading playlists: {e}")
return {}
all_results = {
'total_playlists': len(playlists),
'playlist_results': [],
'overall_summary': {
'total_songs': 0,
'exact_matches': 0,
'fuzzy_matches': 0,
'missing_songs': 0,
'needs_manual_review': 0
}
}
for playlist in playlists:
result = self.validate_playlist(playlist, dry_run)
all_results['playlist_results'].append(result)
# Update overall summary
summary = result['summary']
all_results['overall_summary']['total_songs'] += result['total_songs']
all_results['overall_summary']['exact_matches'] += summary['exact_match_count']
all_results['overall_summary']['fuzzy_matches'] += summary['fuzzy_match_count']
all_results['overall_summary']['missing_songs'] += summary['missing_count']
all_results['overall_summary']['needs_manual_review'] += summary['needs_manual_review']
return all_results
def update_playlist_song(self, playlist_index: int, song_position: int,
new_artist: str, new_title: str, dry_run: bool = True) -> bool:
"""Update a playlist song with corrected artist/title."""
playlists_path = os.path.join(self.data_dir, 'songList.json')
try:
with open(playlists_path, 'r', encoding='utf-8') as f:
playlists = json.load(f)
except Exception as e:
print(f"Error loading playlists: {e}")
return False
if playlist_index >= len(playlists):
print(f"Invalid playlist index: {playlist_index}")
return False
playlist = playlists[playlist_index]
songs = playlist.get('songs', [])
# Find the song by position
for song in songs:
if song.get('position') == song_position:
if dry_run:
print(f"DRY RUN: Would update playlist '{playlist['title']}' song {song_position}")
print(f" From: {song['artist']} - {song['title']}")
print(f" To: {new_artist} - {new_title}")
else:
song['artist'] = new_artist
song['title'] = new_title
# Save the updated playlists
try:
with open(playlists_path, 'w', encoding='utf-8') as f:
json.dump(playlists, f, indent=2, ensure_ascii=False)
print(f"Updated playlist '{playlist['title']}' song {song_position}")
return True
except Exception as e:
print(f"Error saving playlists: {e}")
return False
break
else:
print(f"Song with position {song_position} not found in playlist")
return False
return True
def main():
"""Main function for CLI usage."""
import argparse
parser = argparse.ArgumentParser(description='Validate playlists against song library')
parser.add_argument('--config', default='../config/config.json', help='Configuration file path')
parser.add_argument('--data-dir', default='../data', help='Data directory path')
parser.add_argument('--dry-run', action='store_true', default=True, help='Dry run mode (default)')
parser.add_argument('--apply', action='store_true', help='Apply changes (disable dry run)')
parser.add_argument('--playlist-index', type=int, help='Validate specific playlist by index')
parser.add_argument('--output', help='Output results to JSON file')
args = parser.parse_args()
# Load configuration
try:
with open(args.config, 'r') as f:
config = json.load(f)
except Exception as e:
print(f"Error loading config: {e}")
return
# Create validator
validator = PlaylistValidator(config, args.data_dir)
# Determine dry run mode
dry_run = not args.apply
if args.playlist_index is not None:
# Validate specific playlist
playlists_path = os.path.join(args.data_dir, 'songList.json')
try:
with open(playlists_path, 'r', encoding='utf-8') as f:
playlists = json.load(f)
except Exception as e:
print(f"Error loading playlists: {e}")
return
if args.playlist_index >= len(playlists):
print(f"Invalid playlist index: {args.playlist_index}")
return
result = validator.validate_playlist(playlists[args.playlist_index], dry_run)
print(f"\nPlaylist: {result['playlist_title']}")
print(f"Total songs: {result['total_songs']}")
print(f"Exact matches: {result['summary']['exact_match_count']}")
print(f"Fuzzy matches: {result['summary']['fuzzy_match_count']}")
print(f"Missing: {result['summary']['missing_count']}")
print(f"Need manual review: {result['summary']['needs_manual_review']}")
else:
# Validate all playlists
results = validator.validate_all_playlists(dry_run)
print(f"\nPlaylist Validation Results:")
print(f"Total playlists: {results['total_playlists']}")
print(f"Total songs: {results['overall_summary']['total_songs']}")
print(f"Exact matches: {results['overall_summary']['exact_matches']}")
print(f"Fuzzy matches: {results['overall_summary']['fuzzy_matches']}")
print(f"Missing: {results['overall_summary']['missing_songs']}")
print(f"Need manual review: {results['overall_summary']['needs_manual_review']}")
if args.output:
try:
with open(args.output, 'w', encoding='utf-8') as f:
json.dump(results, f, indent=2, ensure_ascii=False)
print(f"\nResults saved to: {args.output}")
except Exception as e:
print(f"Error saving results: {e}")
if __name__ == '__main__':
main()
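The validator's exact-then-fuzzy lookup can be sketched in isolation. The module prefers fuzzywuzzy's `fuzz.ratio` (falling back gracefully when it is absent); this sketch substitutes stdlib `difflib` so it runs without extra installs, and uses a one-song library with the same 0.85 threshold as the config:

```python
from difflib import SequenceMatcher

# Tiny stand-in for the exact_lookup / artist_title_lookup tables.
library = {
    "queen - bohemian rhapsody": {"artist": "Queen", "title": "Bohemian Rhapsody"},
}

def lookup(artist, title, threshold=0.85):
    """Return (song, score, kind) where kind is 'exact', 'fuzzy', or 'missing'."""
    key = f"{artist.lower()} - {title.lower()}"
    if key in library:                      # exact hit, no manual review needed
        return library[key], 1.0, 'exact'
    # Fuzzy scan; SequenceMatcher.ratio() stands in for fuzz.ratio here.
    best, score = None, 0.0
    for k, song in library.items():
        s = SequenceMatcher(None, key, k).ratio()
        if s > score:
            best, score = song, s
    if score >= threshold:
        return best, score, 'fuzzy'         # flagged for manual review
    return None, score, 'missing'
```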


@@ -510,7 +510,16 @@ class ReportGenerator:
def save_report_to_file(self, report_content: str, file_path: str) -> None:
"""Save a report to a text file."""
import os
os.makedirs(os.path.dirname(file_path), exist_ok=True)
# Validate file_path
if not file_path:  # covers both None and empty string
print("Warning: Invalid file path provided, skipping report save")
return
# Get directory and create it if needed
directory = os.path.dirname(file_path)
if directory: # Only create directory if there is one
os.makedirs(directory, exist_ok=True)
with open(file_path, 'w', encoding='utf-8') as f:
f.write(report_content)
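The directory guard matters because `os.path.dirname` returns `''` for a bare filename, and `os.makedirs('')` raises. A standalone sketch of the guarded save (function name and return value are illustrative; the real method prints a warning and returns `None`):

```python
import os
import tempfile

def safe_save(report_content: str, file_path: str) -> bool:
    """Save a report, skipping empty paths and bare-filename makedirs."""
    if not file_path:          # None or '' -> nothing sensible to do
        return False
    directory = os.path.dirname(file_path)
    if directory:              # '' for bare filenames; makedirs('') would raise
        os.makedirs(directory, exist_ok=True)
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(report_content)
    return True

target = os.path.join(tempfile.mkdtemp(), 'reports', 'summary.txt')
saved = safe_save('hello', target)
```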


@@ -218,6 +218,50 @@ def extract_consolidated_channel_from_path(file_path: str, channel_priorities: L
return None
def clean_artist_name(artist_string: str) -> str:
"""Clean artist name by removing features, collaborations, etc."""
if not artist_string:
return ""
# Remove common feature/collaboration patterns. Word boundaries (\b)
# keep substrings from matching (e.g. the "ft" inside "Swift").
patterns_to_remove = [
r'\s*\bfeat\.?\s+.*$',      # "feat" / "feat." and everything after
r'\s*\bft\.?\s+.*$',        # "ft" / "ft." and everything after
r'\s*\bfeaturing\b.*$',     # "featuring" and everything after
r'\s*\bwith\b.*$',          # "with" and everything after
r'\s*\bpresents\b.*$',      # "presents" and everything after
]
# Handle comma/semicolon/slash patterns more carefully
# Only remove if they're followed by feature words
separator_patterns = [
r'\s*,\s*(feat\.?|ft\.?|featuring|with|presents).*$', # comma followed by feature words
r'\s*;\s*(feat\.?|ft\.?|featuring|with|presents).*$', # semicolon followed by feature words
r'\s*/\s*(feat\.?|ft\.?|featuring|with|presents).*$', # slash followed by feature words
]
cleaned_artist = artist_string
# Apply feature removal patterns first
for pattern in patterns_to_remove:
cleaned_artist = re.sub(pattern, '', cleaned_artist, flags=re.IGNORECASE)
# Apply separator patterns only if they're followed by feature words
for pattern in separator_patterns:
cleaned_artist = re.sub(pattern, '', cleaned_artist, flags=re.IGNORECASE)
# Clean up any trailing separators that might be left
cleaned_artist = re.sub(r'\s*[,;/]\s*$', '', cleaned_artist)
# Clean up extra whitespace
cleaned_artist = re.sub(r'\s+', ' ', cleaned_artist).strip()
return cleaned_artist
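The intent of the cleaning step can be shown with a compact standalone sketch (it compresses the pattern list above into two boundary-guarded expressions; sample inputs are made up):

```python
import re

def clean_artist_name(artist_string: str) -> str:
    """Strip feature credits; \\b guards keep names like "Swift" intact."""
    if not artist_string:
        return ""
    cleaned = artist_string
    for pattern in (
        r'\s*\b(?:feat|ft)\.?\s+.*$',                # feat./ft. credits
        r'\s*\b(?:featuring|with|presents)\b\s*.*$', # spelled-out credits
        r'\s*[,;/]\s*$',                             # trailing separators
    ):
        cleaned = re.sub(pattern, '', cleaned, flags=re.IGNORECASE)
    return re.sub(r'\s+', ' ', cleaned).strip()
```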
def parse_multi_artist(artist_string: str) -> List[str]:
"""Parse multi-artist strings with various delimiters."""
if not artist_string:


@@ -1,11 +1,12 @@
{
"data_directory": "data",
"channel_priorities": [
"Sing King Karaoke",
"KaraFun Karaoke",
"Stingray Karaoke"
],
"matching": {
"fuzzy_matching": false,
"fuzzy_matching": true,
"fuzzy_threshold": 0.85,
"case_sensitive": false
},

migrate_to_songs_json.py (new file, 144 lines)

@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Migration script to help users move from allSongs.json to songs.json
and update their configuration to use the new dynamic data directory.
"""
import os
import json
import shutil
from pathlib import Path
def load_json_file(file_path: str):
"""Load JSON file safely."""
try:
with open(file_path, 'r', encoding='utf-8') as f:
return json.load(f)
except Exception as e:
print(f"Error loading {file_path}: {e}")
return None
def save_json_file(file_path: str, data):
"""Save JSON file safely."""
try:
with open(file_path, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
return True
except Exception as e:
print(f"Error saving {file_path}: {e}")
return False
def migrate_songs_file():
"""Migrate allSongs.json to songs.json if it exists."""
old_file = 'data/allSongs.json'
new_file = 'data/songs.json'
if not os.path.exists(old_file):
print(f"⚠️ {old_file} not found - no migration needed")
return True
if os.path.exists(new_file):
print(f"⚠️ {new_file} already exists - skipping migration")
return True
print(f"🔄 Migrating {old_file} to {new_file}...")
# Load the old file
songs_data = load_json_file(old_file)
if not songs_data:
print(f"❌ Failed to load {old_file}")
return False
# Save to new file
if save_json_file(new_file, songs_data):
print(f"✅ Successfully migrated to {new_file}")
# Create backup of old file
backup_file = 'data/allSongs.json.backup'
shutil.copy2(old_file, backup_file)
print(f"📦 Created backup at {backup_file}")
return True
else:
print(f"❌ Failed to save {new_file}")
return False
def update_config():
"""Update config.json to include data_directory if not present."""
config_file = 'config/config.json'
if not os.path.exists(config_file):
print(f"{config_file} not found")
return False
print(f"🔄 Updating {config_file}...")
# Load current config
config = load_json_file(config_file)
if not config:
print(f"❌ Failed to load {config_file}")
return False
# Check if data_directory already exists
if 'data_directory' in config:
print(f"✅ data_directory already configured: {config['data_directory']}")
return True
# Add data_directory
config['data_directory'] = 'data'
# Create backup
backup_file = 'config/config.json.backup'
shutil.copy2(config_file, backup_file)
print(f"📦 Created backup at {backup_file}")
# Save updated config
if save_json_file(config_file, config):
print(f"✅ Successfully added data_directory to {config_file}")
return True
else:
print(f"❌ Failed to save {config_file}")
return False
def main():
"""Main migration function."""
print("🎤 KaraokeMerge Migration Script")
print("=" * 40)
print("This script will help you migrate to the new configuration:")
print("- Rename allSongs.json to songs.json")
print("- Add data_directory to config.json")
print()
# Check if we're in the right directory
if not os.path.exists('config') or not os.path.exists('data'):
print("❌ Please run this script from the KaraokeMerge root directory")
return False
success = True
# Migrate songs file
if not migrate_songs_file():
success = False
# Update config
if not update_config():
success = False
print()
if success:
print("✅ Migration completed successfully!")
print()
print("Next steps:")
print("1. Test the CLI tool: python cli/main.py --show-config")
print("2. Test the web UI: python start_web_ui.py")
print("3. If everything works, you can delete the backup files")
else:
print("❌ Migration failed - please check the errors above")
return False
return True
if __name__ == "__main__":
success = main()
if not success:
exit(1)
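The core of the migration is a rename-with-backup; a standalone dry run in a temp directory (the real script works on `data/allSongs.json` in the project root, and loads/re-saves the JSON rather than copying):

```python
import json
import os
import shutil
import tempfile

# Set up a throwaway "data" directory with the old file.
root = tempfile.mkdtemp()
old_file = os.path.join(root, 'allSongs.json')
new_file = os.path.join(root, 'songs.json')
with open(old_file, 'w', encoding='utf-8') as f:
    json.dump([{'artist': 'Queen', 'title': 'Bohemian Rhapsody'}], f)

# Migrate only when the old file exists and the new one does not.
if os.path.exists(old_file) and not os.path.exists(new_file):
    shutil.copy2(old_file, new_file)               # becomes songs.json
    shutil.copy2(old_file, old_file + '.backup')   # keep a safety copy
```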


@@ -1,16 +1,12 @@
# Python dependencies for KaraokeMerge CLI tool
# Core dependencies (currently using only standard library)
# No external dependencies required for basic functionality
# Core dependencies
flask>=2.0.0
# Optional dependencies for enhanced features:
# Uncomment the following lines if you want to enable fuzzy matching:
# Fuzzy matching dependencies (required for playlist validation)
fuzzywuzzy>=0.18.0
python-Levenshtein>=0.21.0
# For future enhancements:
# pandas>=1.5.0 # For advanced data analysis
# click>=8.0.0 # For enhanced CLI interface
# Web UI dependencies
flask>=2.0.0
# click>=8.0.0 # For enhanced CLI interface


@@ -10,21 +10,38 @@ import webbrowser
from time import sleep
def check_dependencies():
"""Check if Flask is installed."""
"""Check if required dependencies are installed."""
dependencies_ok = True
# Check Flask
try:
import flask
print("✅ Flask is installed")
return True
except ImportError:
print("❌ Flask is not installed")
print("Installing Flask...")
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", "flask>=2.0.0"])
print("✅ Flask installed successfully")
return True
except subprocess.CalledProcessError:
print("❌ Failed to install Flask")
return False
dependencies_ok = False
# Check fuzzywuzzy for playlist validation
try:
import fuzzywuzzy
print("✅ fuzzywuzzy is installed (for playlist validation)")
except ImportError:
print("❌ fuzzywuzzy is not installed")
print("Installing fuzzywuzzy and python-Levenshtein...")
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", "fuzzywuzzy>=0.18.0", "python-Levenshtein>=0.21.0"])
print("✅ fuzzywuzzy installed successfully")
except subprocess.CalledProcessError:
print("❌ Failed to install fuzzywuzzy")
print("⚠️ Playlist validation will work without fuzzy matching")
return dependencies_ok
def check_data_files():
"""Check if required data files exist."""
@@ -71,7 +88,7 @@ def start_web_ui():
# Start Flask app
try:
print("🌐 Web UI will be available at: http://localhost:5000")
print("🌐 Web UI will be available at: http://localhost:5002")
print("📱 You can open this URL in your web browser")
print("\n⏳ Starting server... (Press Ctrl+C to stop)")
print("-" * 60)
@@ -79,7 +96,7 @@ def start_web_ui():
# Open browser after a short delay
def open_browser():
sleep(2)
webbrowser.open("http://localhost:5000")
webbrowser.open("http://localhost:5002")
import threading
browser_thread = threading.Thread(target=open_browser)


@@ -24,7 +24,7 @@ def validate_data_files():
# Check for required files
required_files = [
'data/allSongs.json',
'data/songs.json',
'config/config.json'
]
@@ -59,7 +59,7 @@ def analyze_song_data():
"""Analyze the song data structure and provide insights."""
print("\n=== Song Data Analysis ===")
all_songs_path = 'data/allSongs.json'
all_songs_path = 'data/songs.json'
if not os.path.exists(all_songs_path):
print(f"{all_songs_path} not found - cannot analyze song data")
return
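The existence checks this validation script performs over `required_files` can be expressed as a single helper that returns the missing paths. A minimal sketch; the function name and default file list are illustrative:

```python
import os

def find_missing_files(required_files=("data/songs.json", "config/config.json")):
    """Return the required data files that do not exist on disk."""
    return [path for path in required_files if not os.path.exists(path)]
```

An empty return value means all required files are present; otherwise each missing path can be reported to the user before analysis proceeds.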

web/app.py (1022): File diff suppressed because it is too large
web/templates/favorites.html (1000, new file): File diff suppressed because it is too large
web/templates/history.html (1047, new file): File diff suppressed because it is too large

@@ -245,9 +245,141 @@
margin-top: 4px;
word-break: break-all;
}
/* Navigation */
.nav-link {
color: #6c757d;
}
.nav-link.active {
color: #007bff;
font-weight: bold;
}
/* Reset & Regenerate Button Styles */
#reset-regenerate-btn {
background: linear-gradient(135deg, #ff6b6b 0%, #ee5a24 100%);
border: none;
color: white;
font-weight: bold;
box-shadow: 0 4px 15px rgba(255, 107, 107, 0.3);
transition: all 0.3s ease;
}
#reset-regenerate-btn:hover {
background: linear-gradient(135deg, #ee5a24 0%, #ff6b6b 100%);
transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(255, 107, 107, 0.4);
}
#reset-regenerate-btn:disabled {
background: #6c757d;
transform: none;
box-shadow: none;
}
.action-buttons-section {
background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 100%);
border-radius: 10px;
padding: 1rem;
border: 1px solid #dee2e6;
}
/* Progress Modal Styles */
.progress-container {
margin: 20px 0;
}
.progress-step {
font-size: 1.1rem;
font-weight: bold;
color: #007bff;
margin-bottom: 10px;
}
.progress-bar-container {
display: flex;
align-items: center;
gap: 10px;
margin-bottom: 15px;
}
.progress-bar {
flex: 1;
height: 20px;
background-color: #e9ecef;
border-radius: 10px;
overflow: hidden;
}
.progress-bar-fill {
height: 100%;
background: linear-gradient(90deg, #007bff, #0056b3);
transition: width 0.3s ease;
}
.progress-message {
color: #6c757d;
font-style: italic;
}
.cli-output-container {
margin-top: 20px;
border-top: 1px solid #dee2e6;
padding-top: 15px;
}
.cli-output {
background-color: #f8f9fa;
border: 1px solid #dee2e6;
border-radius: 5px;
padding: 10px;
max-height: 300px;
overflow-y: auto;
font-family: 'Courier New', monospace;
font-size: 0.9rem;
white-space: pre-wrap;
}
.modal-close {
color: #aaa;
float: right;
font-size: 28px;
font-weight: bold;
cursor: pointer;
}
.modal-close:hover {
color: #000;
}
</style>
</head>
<body>
<!-- Navigation -->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark">
<div class="container-fluid">
<a class="navbar-brand" href="/">
<i class="fas fa-music"></i> Karaoke Manager
</a>
<div class="navbar-nav">
<a class="nav-link active" href="/">
<i class="fas fa-copy"></i> Duplicates
</a>
<a class="nav-link" href="/favorites">
<i class="fas fa-heart"></i> Favorites
</a>
<a class="nav-link" href="/history">
<i class="fas fa-history"></i> History
</a>
<a class="nav-link" href="/remaining-songs">
<i class="fas fa-list"></i> Remaining Songs
</a>
<a class="nav-link" href="/playlist-validation">
<i class="fas fa-list-check"></i> Playlist Validation
</a>
</div>
</div>
</nav>
<div class="container-fluid">
<!-- Header -->
<div class="row bg-primary text-white p-3 mb-4">
@@ -310,6 +442,20 @@
</div>
</div>
<!-- Action Buttons -->
<div class="row mb-4">
<div class="col-12">
<div class="action-buttons-section">
<div class="d-flex justify-content-end">
<button id="reset-regenerate-btn" class="btn btn-lg" onclick="resetAndRegenerate()"
title="Delete all generated files and run the CLI tool again to regenerate everything">
<i class="fas fa-sync-alt"></i> Reset & Regenerate
</button>
</div>
</div>
</div>
</div>
<!-- File Type Breakdown -->
<div class="row mb-4">
<div class="col-md-4">
@@ -450,7 +596,7 @@
<option value="">All Types</option>
<option value="mp4">MP4</option>
<option value="mp3">MP3</option>
<option value="mp3-only">MP3 Only (No MP4 Alternative)</option>
</select>
</div>
<div class="col-md-2">
@@ -1316,6 +1462,124 @@
}
}
async function resetAndRegenerate() {
if (confirm('⚠️ WARNING: This will delete all generated files and run the CLI tool again.\n\nThis will:\n• Delete skipSongs.json\n• Delete all files in data/reports/\n• Delete all files in data/preferences/\n• Run the CLI tool to regenerate everything\n\nAre you sure you want to continue?')) {
try {
// Show progress modal
showProgressModal();
// Disable the button
const button = document.getElementById('reset-regenerate-btn');
button.disabled = true;
// Start the reset and regenerate process
const response = await fetch('/api/reset-and-regenerate', {
method: 'POST'
});
const result = await response.json();
if (result.success) {
// Start monitoring progress
startProgressMonitoring();
} else {
hideProgressModal();
alert('❌ Error: ' + result.error);
button.disabled = false;
}
} catch (error) {
console.error('Error during reset and regenerate:', error);
hideProgressModal();
alert('❌ Error during reset and regenerate: ' + error.message);
const button = document.getElementById('reset-regenerate-btn');
button.disabled = false;
}
}
}
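On the server side, the cleanup step the confirmation dialog describes (delete skipSongs.json, clear data/reports/ and data/preferences/) could be sketched as follows. The paths come from the dialog text; the function name and `base` parameter are hypothetical:

```python
import os
import shutil

def reset_generated_files(base="."):
    """Remove generated artifacts so the CLI tool can regenerate them from scratch."""
    removed = []
    skip_path = os.path.join(base, "skipSongs.json")
    if os.path.exists(skip_path):
        os.remove(skip_path)
        removed.append(skip_path)
    for folder in ("data/reports", "data/preferences"):
        folder_path = os.path.join(base, folder)
        if os.path.isdir(folder_path):
            shutil.rmtree(folder_path)
            os.makedirs(folder_path)  # recreate empty so later writes do not fail
            removed.append(folder_path)
    return removed
```

The actual `/api/reset-and-regenerate` endpoint would run something like this before launching the CLI tool in a background thread.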
function showProgressModal() {
const modal = document.getElementById('progressModal');
modal.style.display = 'block';
// Reset progress
document.getElementById('currentStep').textContent = 'Initializing...';
document.getElementById('progressBarFill').style.width = '0%';
document.getElementById('progressText').textContent = '0%';
document.getElementById('progressMessage').textContent = 'Starting process...';
document.getElementById('cliOutput').textContent = '';
}
function hideProgressModal() {
const modal = document.getElementById('progressModal');
modal.style.display = 'none';
}
function closeProgressModal() {
hideProgressModal();
// Re-enable the button
const button = document.getElementById('reset-regenerate-btn');
button.disabled = false;
}
function startProgressMonitoring() {
// Use polling for progress updates (more reliable than SSE)
const pollInterval = setInterval(async function() {
try {
const response = await fetch('/api/progress');
const data = await response.json();
updateProgress(data);
// If process is complete or error, stop polling
if (data.status === 'completed' || data.status === 'error') {
clearInterval(pollInterval);
if (data.status === 'completed') {
setTimeout(() => {
hideProgressModal();
alert('✅ Reset and regeneration completed successfully!\n\n' + data.message);
window.location.reload();
}, 2000);
} else {
setTimeout(() => {
hideProgressModal();
alert('❌ Error: ' + data.message);
const button = document.getElementById('reset-regenerate-btn');
button.disabled = false;
}, 2000);
}
}
} catch (error) {
console.error('Error polling progress:', error);
clearInterval(pollInterval);
hideProgressModal();
alert('❌ Error: Lost connection to progress updates');
const button = document.getElementById('reset-regenerate-btn');
button.disabled = false;
}
}, 1000); // Poll every second
}
function updateProgress(data) {
// Update progress bar
const progressBar = document.getElementById('progressBarFill');
const progressText = document.getElementById('progressText');
progressBar.style.width = data.progress + '%';
progressText.textContent = data.progress + '%';
// Update current step
document.getElementById('currentStep').textContent = data.current_step;
// Update message
document.getElementById('progressMessage').textContent = data.message;
// Update CLI output
const cliOutput = document.getElementById('cliOutput');
cliOutput.textContent = data.cli_output.join('\n');
cliOutput.scrollTop = cliOutput.scrollHeight; // Auto-scroll to bottom
}
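The polling loop above expects `/api/progress` to return a JSON object with `status`, `progress`, `current_step`, `message`, and `cli_output` fields. A minimal sketch of that state on the Python side, without the Flask wiring; the field values shown are illustrative:

```python
import json

# In-memory state the web UI polls; field names match what updateProgress() reads.
progress_state = {
    "status": "running",          # "running" | "completed" | "error"
    "progress": 0,                # percentage, 0-100
    "current_step": "Initializing...",
    "message": "Starting process...",
    "cli_output": [],             # captured CLI lines, joined with '\n' in the UI
}

def progress_json():
    """Serialize the state as a /api/progress response body would carry it."""
    return json.dumps(progress_state)
```

The background worker mutates `progress_state` as the CLI runs, and each one-second poll from the browser sees the latest snapshot.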
// Video Player Functions
function normalizePath(filePath) {
// Debug logging to track path transformation - show original path first
@@ -1552,5 +1816,35 @@
</div>
</div>
</div>
<!-- Progress Modal -->
<div id="progressModal" class="modal">
<div class="modal-content" style="max-width: 800px;">
<span class="modal-close" onclick="closeProgressModal()">&times;</span>
<h3><i class="fas fa-cog fa-spin"></i> Processing...</h3>
<div class="progress-container">
<div class="progress-step">
<span id="currentStep">Initializing...</span>
</div>
<div class="progress-bar-container">
<div class="progress-bar">
<div id="progressBarFill" class="progress-bar-fill" style="width: 0%"></div>
</div>
<span id="progressText">0%</span>
</div>
<div class="progress-message">
<span id="progressMessage">Starting process...</span>
</div>
</div>
<div class="cli-output-container">
<h4>CLI Output:</h4>
<div id="cliOutput" class="cli-output"></div>
</div>
</div>
</div>
</body>
</html>

File diff suppressed because it is too large

@@ -48,9 +48,41 @@
.back-button {
margin-bottom: 1rem;
}
/* Navigation */
.nav-link {
color: #6c757d;
}
.nav-link.active {
color: #28a745;
font-weight: bold;
}
</style>
</head>
<body>
<!-- Navigation -->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark">
<div class="container-fluid">
<a class="navbar-brand" href="/">
<i class="fas fa-music"></i> Karaoke Manager
</a>
<div class="navbar-nav">
<a class="nav-link" href="/">
<i class="fas fa-copy"></i> Duplicates
</a>
<a class="nav-link" href="/favorites">
<i class="fas fa-heart"></i> Favorites
</a>
<a class="nav-link" href="/history">
<i class="fas fa-history"></i> History
</a>
<a class="nav-link active" href="/remaining-songs">
<i class="fas fa-list"></i> Remaining Songs
</a>
</div>
</div>
</nav>
<div class="container-fluid">
<!-- Header -->
<div class="row mt-3">