Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>

Matt Bruce 2025-08-10 10:48:02 -05:00
parent d184724c70
commit dd916a646a
13 changed files with 322932 additions and 115 deletions

PRD.md

@ -51,6 +51,7 @@ These principles are fundamental to the project's long-term success and must be
- **CLI Commands Documentation:** All CLI functionality, options, and usage examples must be documented in `cli/commands.txt` - **CLI Commands Documentation:** All CLI functionality, options, and usage examples must be documented in `cli/commands.txt`
- **Code Comments:** Significant logic changes should include inline documentation - **Code Comments:** Significant logic changes should include inline documentation
- **API Documentation:** New endpoints, functions, or interfaces must be documented - **API Documentation:** New endpoints, functions, or interfaces must be documented
- **API Update Requirement:** Whenever a new API endpoint is added, the PRD.md, README.md, and cli/commands.txt MUST be updated to reflect the new functionality
**Documentation Update Checklist:** **Documentation Update Checklist:**
- [ ] Update PRD.md with any architectural or requirement changes - [ ] Update PRD.md with any architectural or requirement changes
@ -59,6 +60,7 @@ These principles are fundamental to the project's long-term success and must be
- [ ] Add inline comments for complex logic or business rules - [ ] Add inline comments for complex logic or business rules
- [ ] Update any configuration examples or file structure documentation - [ ] Update any configuration examples or file structure documentation
- [ ] Review and update implementation status sections - [ ] Review and update implementation status sections
- [ ] **API Updates:** When new API endpoints are added, update PRD.md, README.md, and cli/commands.txt
**CLI Commands Documentation Requirements:** **CLI Commands Documentation Requirements:**
- **Comprehensive Coverage:** All CLI arguments, options, and flags must be documented with examples - **Comprehensive Coverage:** All CLI arguments, options, and flags must be documented with examples
@ -68,6 +70,14 @@ These principles are fundamental to the project's long-term success and must be
- **Integration Notes:** Document how CLI integrates with web UI and other components - **Integration Notes:** Document how CLI integrates with web UI and other components
- **Version Tracking:** Keep version information and feature status up to date - **Version Tracking:** Keep version information and feature status up to date
**API Documentation Requirements:**
- **Endpoint Documentation:** All new API endpoints must be documented in the PRD.md with their purpose, parameters, and responses
- **README Integration:** API changes must be reflected in README.md with usage examples and integration notes
- **CLI Integration:** If CLI commands interact with APIs, they must be documented in cli/commands.txt
- **Version Tracking:** API versioning and changes must be tracked in documentation
- **Error Handling:** Document all possible error responses and status codes
- **Authentication:** Document any authentication requirements or API key usage
This documentation requirement is mandatory and ensures the project remains maintainable and accessible to future developers and users. This documentation requirement is mandatory and ensures the project remains maintainable and accessible to future developers and users.
### 2.3 Code Quality & Development Standards ### 2.3 Code Quality & Development Standards

@ -10,6 +10,7 @@ A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke
- **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units - **CDG/MP3 Pairing**: Treats CDG and MP3 files with the same base filename as single karaoke units
- **Channel Priority**: For MP4 files, prioritizes based on folder names in the path - **Channel Priority**: For MP4 files, prioritizes based on folder names in the path
- **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison - **Fuzzy Matching**: Configurable fuzzy matching for artist/title comparison
- **Playlist Validation**: Validates playlists against your song library with exact and fuzzy matching
### File Type Priority System ### File Type Priority System
1. **MP4 files** (with channel priority sorting) 1. **MP4 files** (with channel priority sorting)
@ -32,12 +33,34 @@ A comprehensive tool for analyzing, deduplicating, and cleaning up large karaoke
## Installation ## Installation
1. Clone the repository ### Prerequisites
- Python 3.7 or higher
- pip (Python package installer)
### Installation Steps
1. Clone the repository:
```bash
git clone <repository-url>
cd KaraokeMerge
```
2. Install dependencies: 2. Install dependencies:
```bash ```bash
pip install -r requirements.txt pip install -r requirements.txt
``` ```
**Note**: The installation includes:
- **Flask** for the web UI
- **fuzzywuzzy** and **python-Levenshtein** for fuzzy matching in playlist validation
- All other required dependencies
3. Verify installation:
```bash
python -c "import flask, fuzzywuzzy; print('All dependencies installed successfully!')"
```
## Usage ## Usage
### CLI Tool ### CLI Tool
@ -64,6 +87,21 @@ The web UI will automatically:
2. Start the Flask server 2. Start the Flask server
3. Open your default browser to the interface 3. Open your default browser to the interface
### Playlist Validation
Validate your playlists against your song library:
```bash
cd cli
python playlist_validator.py
```
Options:
- `--playlist-index N`: Validate a specific playlist by index
- `--output results.json`: Save results to a JSON file
- `--apply`: Apply corrections to playlists (use with caution)
**Note**: Playlist validation uses fuzzy matching to find potential matches. Make sure fuzzywuzzy is installed for best results.
### Priority Preferences ### Priority Preferences
The web UI now supports drag-and-drop priority management: The web UI now supports drag-and-drop priority management:

@ -1,77 +1,117 @@
# Karaoke Song Library Cleanup Tool - CLI Commands Reference # Karaoke Song Library Cleanup Tool - CLI Commands Reference (v2.0)
## Overview ## Overview
The CLI tool analyzes karaoke song collections, identifies duplicates, and generates skip lists for future imports. It supports multiple file formats (MP3, CDG, MP4) with configurable priority systems. The CLI tool analyzes karaoke song collections, identifies duplicates, validates playlists, and generates skip lists for future imports. It supports multiple file formats (MP3, CDG, MP4) with configurable priority systems.
## Basic Usage ## Quick Start Commands
### Standard Analysis ### Basic Analysis (Most Common)
```bash ```bash
python cli/main.py cd cli
python3 main.py
``` ```
Runs the tool with default settings: Runs the tool with default settings:
- Input: `data/allSongs.json` - Input: `data/allSongs.json`
- Config: `config/config.json` - Config: `config/config.json`
- Output: `data/skipSongs.json` - Output: `data/skipSongs.json`
- Verbose: Disabled - Reports: **Automatically generated**
- Reports: **Automatically generated** (including web UI data)
### Verbose Output ### Process Everything (Recommended)
```bash ```bash
python cli/main.py --verbose cd cli
python3 main.py --process-all
```
Complete processing including:
- Duplicate analysis and skip list generation
- Favorites processing with priority logic (MP4 over MP3)
- History processing with priority logic
- Comprehensive report generation
## Main CLI Commands (main.py)
### Basic Analysis Commands
#### Standard Analysis
```bash
python3 main.py
```
Runs the tool with default settings and generates all reports automatically.
#### Verbose Output
```bash
python3 main.py --verbose
# or # or
python cli/main.py -v python3 main.py -v
``` ```
Enables detailed output showing: Enables detailed output showing individual song processing and decisions.
- Individual song processing
- Duplicate detection details
- File type analysis
- Channel priority decisions
### Dry Run Mode #### Dry Run Mode
```bash ```bash
python cli/main.py --dry-run python3 main.py --dry-run
``` ```
Analyzes songs without generating the skip list file. Useful for: Analyzes songs without generating the skip list file. Useful for testing and previewing results.
- Testing configuration changes
- Previewing results before committing
- Validating input data
## Configuration Options ### Configuration Commands
### Custom Configuration File #### Custom Configuration File
```bash ```bash
python cli/main.py --config path/to/custom_config.json python3 main.py --config path/to/custom_config.json
``` ```
Uses a custom configuration file instead of the default `config/config.json`. Uses a custom configuration file instead of the default `config/config.json`.
### Show Current Configuration #### Show Current Configuration
```bash ```bash
python cli/main.py --show-config python3 main.py --show-config
``` ```
Displays the current configuration settings and exits. Useful for: Displays the current configuration settings and exits.
- Verifying configuration values
- Debugging configuration issues
- Understanding current settings
## Input/Output Options ### Input/Output Commands
### Custom Input File #### Custom Input File
```bash ```bash
python cli/main.py --input path/to/songs.json python3 main.py --input path/to/songs.json
``` ```
Specifies a custom input file instead of the default `data/allSongs.json`. Specifies a custom input file instead of the default `data/allSongs.json`.
### Custom Output Directory #### Custom Output Directory
```bash ```bash
python cli/main.py --output-dir ./custom_output python3 main.py --output-dir ./custom_output
``` ```
Saves output files to a custom directory instead of the default `data/` folder. Saves output files to a custom directory instead of the default `data/` folder.
## Report Generation ### Processing Commands
### Detailed Reports (Always Generated) #### Process Favorites Only
Reports are now **automatically generated** every time you run the CLI tool. The `--save-reports` flag is kept for backward compatibility but is no longer required. ```bash
python3 main.py --process-favorites
```
Processes favorites with priority-based logic to select best versions (MP4 over MP3).
#### Process History Only
```bash
python3 main.py --process-history
```
Processes history with priority-based logic to select best versions (MP4 over MP3).
#### Process Everything
```bash
python3 main.py --process-all
```
Processes everything: duplicates, generates reports, AND updates favorites/history with priority logic.
#### Merge History Objects
```bash
python3 main.py --merge-history
```
Merges history objects that match on artist, title, and path, summing their count properties.
### Report Generation
#### Save Detailed Reports (Legacy)
```bash
python3 main.py --save-reports
```
**Note**: Reports are now automatically generated every time you run the CLI tool. This flag is kept for backward compatibility.
Generated reports include: Generated reports include:
- `enhanced_summary_report.txt` - Comprehensive analysis - `enhanced_summary_report.txt` - Comprehensive analysis
@ -82,43 +122,244 @@ Generated reports include:
- `analysis_data.json` - Raw analysis data for further processing - `analysis_data.json` - Raw analysis data for further processing
- `skip_songs_detailed.json` - **Web UI data (always generated)** - `skip_songs_detailed.json` - **Web UI data (always generated)**
## Combined Examples ## Playlist Validator Commands (playlist_validator.py)
### Full Analysis with Reports ### Basic Playlist Validation
#### Validate All Playlists
```bash ```bash
python cli/main.py --verbose python3 playlist_validator.py
``` ```
Runs complete analysis with: Validates all playlists in `data/songLists.json` against the song library.
#### Validate Specific Playlist
```bash
python3 playlist_validator.py --playlist-index 0
```
Validates a specific playlist by index (0-based).
### Playlist Validator Options
#### Custom Configuration
```bash
python3 playlist_validator.py --config path/to/custom_config.json
```
Uses a custom configuration file.
#### Custom Data Directory
```bash
python3 playlist_validator.py --data-dir path/to/data
```
Uses a custom data directory.
#### Apply Changes (Disable Dry Run)
```bash
python3 playlist_validator.py --apply
```
Applies changes to playlists instead of just previewing them.
#### Output Results to File
```bash
python3 playlist_validator.py --output results.json
```
Saves validation results to a JSON file.
## Comprehensive Examples
### Complete Workflow Examples
#### 1. Full Analysis with Everything
```bash
cd cli
python3 main.py --process-all --verbose
```
Complete processing with detailed output:
- Duplicate analysis and skip list generation
- Favorites and history processing with priority logic
- Comprehensive report generation
- Verbose output for detailed processing information - Verbose output for detailed processing information
- **Automatic comprehensive report generation**
- Skip list creation
### Custom Configuration with Dry Run #### 2. Preview Changes Before Applying
```bash ```bash
python cli/main.py --config custom_config.json --dry-run --verbose cd cli
python3 main.py --process-all --dry-run --verbose
``` ```
Tests a custom configuration without generating files: Preview all changes without saving:
- Uses custom configuration - Shows what would be processed
- No files are modified
- Useful for testing configuration changes
#### 3. Custom Configuration Testing
```bash
cd cli
python3 main.py --config custom_config.json --dry-run --verbose
```
Test a custom configuration:
- Uses custom configuration file
- Shows detailed processing - Shows detailed processing
- No output files created - No output files created
### Custom Input/Output with Reports #### 4. Process Only Favorites and History
```bash ```bash
python cli/main.py --input /path/to/songs.json --output-dir ./reports cd cli
python3 main.py --process-favorites --process-history
```
Process only favorites and history files:
- Updates favorites with best versions (MP4 over MP3)
- Updates history with best versions
- No duplicate analysis performed
#### 5. Merge History Objects
```bash
cd cli
python3 main.py --merge-history --dry-run
```
Preview history merging:
- Shows which history objects would be merged
- No files are modified
#### 6. Apply History Merging
```bash
cd cli
python3 main.py --merge-history
```
Actually merge history objects:
- Combines duplicate history entries
- Sums count properties
- Saves updated history file
### Playlist Validation Examples
#### 1. Validate All Playlists
```bash
cd cli
python3 playlist_validator.py
```
Validates all playlists and shows summary:
- Total playlists and songs
- Exact matches found
- Missing songs count
- Fuzzy matches (if available)
#### 2. Validate Specific Playlist
```bash
cd cli
python3 playlist_validator.py --playlist-index 5
```
Validates playlist at index 5:
- Shows detailed results for that specific playlist
- Lists exact matches and missing songs
#### 3. Save Validation Results
```bash
cd cli
python3 playlist_validator.py --output validation_results.json
```
Saves detailed validation results to JSON file for further analysis.
#### 4. Apply Playlist Corrections
```bash
cd cli
python3 playlist_validator.py --apply
```
Applies corrections to playlists (use with caution).
### Advanced Examples
#### 1. Custom Input/Output with Full Processing
```bash
cd cli
python3 main.py --input /path/to/songs.json --output-dir ./reports --process-all --verbose
``` ```
Processes custom input and saves all outputs to reports directory: Processes custom input and saves all outputs to reports directory:
- Custom input file - Custom input file
- Custom output location - Custom output location
- **All report files automatically generated** - Full processing including favorites/history
- Verbose output
### Minimal Output #### 2. Configuration Testing Workflow
```bash ```bash
python cli/main.py --output-dir ./minimal cd cli
# Show current configuration
python3 main.py --show-config
# Test with dry run
python3 main.py --dry-run --verbose
# Test with custom config
python3 main.py --config test_config.json --dry-run --verbose
``` ```
Runs with minimal output:
- No verbose logging #### 3. Playlist Analysis Workflow
- No detailed reports ```bash
- Only generates skip list cd cli
# Validate all playlists
python3 playlist_validator.py
# Validate specific playlist
python3 playlist_validator.py --playlist-index 0
# Save detailed results
python3 playlist_validator.py --output playlist_analysis.json
```
#### 4. Complete System Analysis
```bash
cd cli
# Process everything
python3 main.py --process-all --verbose
# Validate playlists
python3 playlist_validator.py
# Show configuration
python3 main.py --show-config
```
## Command Line Options Reference
### Main CLI (main.py) Options
| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Configuration file path | `../config/config.json` |
| `--input` | Input songs file path | `../data/allSongs.json` |
| `--output-dir` | Output directory | `../data` |
| `--verbose, -v` | Enable verbose output | `False` |
| `--dry-run` | Analyze without generating files | `False` |
| `--save-reports` | Save detailed reports | `True` (always enabled) |
| `--show-config` | Show configuration and exit | `False` |
| `--process-favorites` | Process favorites with priority logic | `False` |
| `--process-history` | Process history with priority logic | `False` |
| `--process-all` | Process everything | `False` |
| `--merge-history` | Merge history objects | `False` |
### Playlist Validator (playlist_validator.py) Options
| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Configuration file path | `../config/config.json` |
| `--data-dir` | Data directory path | `../data` |
| `--dry-run` | Dry run mode | `True` |
| `--apply` | Apply changes (disable dry run) | `False` |
| `--playlist-index` | Validate specific playlist by index | `None` |
| `--output` | Output results to JSON file | `None` |
## File Structure Requirements
### Required Files
- `data/allSongs.json` - Main song library
- `config/config.json` - Configuration settings
### Optional Files
- `data/favorites.json` - Favorites list (for processing)
- `data/history.json` - History list (for processing)
- `data/songLists.json` - Playlists (for validation)
### Generated Files
- `data/skipSongs.json` - Skip list for future imports
- `data/reports/` - Directory containing all analysis reports
- `data/preferences/` - Directory containing priority preferences
## Configuration File Structure ## Configuration File Structure
@ -148,31 +389,9 @@ The default configuration file (`config/config.json`) contains:
} }
``` ```
### Configuration Options Explained ## Input File Formats
#### Channel Priorities
- **channel_priorities**: Array of folder names for MP4 files
- Order determines priority (first = highest priority)
- Files without matching folders are marked for manual review
#### Matching Settings
- **fuzzy_matching**: Enable/disable fuzzy string matching
- **fuzzy_threshold**: Similarity threshold (0.0-1.0) for fuzzy matching
- **case_sensitive**: Case-sensitive artist/title comparison
#### Output Settings
- **verbose**: Enable detailed output
- **include_reasons**: Include reason field in skip list
- **max_duplicates_per_song**: Maximum duplicates to process per song
#### File Type Settings
- **supported_extensions**: All supported file extensions
- **mp4_extensions**: Extensions treated as MP4 files
## Input File Format
The tool expects a JSON array of song objects:
### Song Library Format (allSongs.json)
```json ```json
[ [
{ {
@ -183,9 +402,45 @@ The tool expects a JSON array of song objects:
] ]
``` ```
Optional fields for MP4 files: ### Playlist Format (songLists.json)
- `channel`: Channel/folder information ```json
- ID3 tag information (artist, title, etc.) [
{
"title": "Playlist Name",
"songs": [
{
"position": 1,
"artist": "Artist Name",
"title": "Song Title"
}
]
}
]
```
### Favorites Format (favorites.json)
```json
[
{
"artist": "Artist Name",
"title": "Song Title",
"path": "path/to/file.mp3",
"favorite": true
}
]
```
### History Format (history.json)
```json
[
{
"artist": "Artist Name",
"title": "Song Title",
"path": "path/to/file.mp3",
"count": 5
}
]
```
## Output Files ## Output Files
@ -193,7 +448,7 @@ Optional fields for MP4 files:
- **skipSongs.json**: List of file paths to skip in future imports - **skipSongs.json**: List of file paths to skip in future imports
- Format: `[{"path": "file/path.mp3", "reason": "duplicate"}]` - Format: `[{"path": "file/path.mp3", "reason": "duplicate"}]`
### Report Files (with --save-reports) ### Report Files (Automatically Generated)
- **enhanced_summary_report.txt**: Overall analysis and statistics - **enhanced_summary_report.txt**: Overall analysis and statistics
- **channel_optimization_report.txt**: Channel priority suggestions - **channel_optimization_report.txt**: Channel priority suggestions
- **duplicate_pattern_report.txt**: Duplicate detection patterns - **duplicate_pattern_report.txt**: Duplicate detection patterns
@ -222,7 +477,7 @@ The tool provides clear error messages for:
## Performance Notes ## Performance Notes
- Successfully tested with 37,000+ songs - Successfully tested with 49,000+ songs
- Processes large datasets efficiently - Processes large datasets efficiently
- Shows progress indicators for long operations - Shows progress indicators for long operations
- Memory-efficient processing - Memory-efficient processing
@ -245,17 +500,29 @@ The CLI tool integrates with the web UI:
### Debug Mode ### Debug Mode
```bash ```bash
python cli/main.py --verbose --dry-run --show-config cd cli
python3 main.py --verbose --dry-run --show-config
``` ```
Complete debugging setup: Complete debugging setup:
- Shows configuration - Shows configuration
- Verbose processing - Verbose processing
- No file changes - No file changes
### Playlist Validator Debug
```bash
cd cli
python3 playlist_validator.py --dry-run --output debug_results.json
```
Debug playlist validation:
- Dry run mode
- Save results to file
- No playlist modifications
## Version Information ## Version Information
This commands reference is for Karaoke Song Library Cleanup Tool v2.0 This commands reference is for Karaoke Song Library Cleanup Tool v2.0
- CLI: Fully functional with comprehensive options - CLI: Fully functional with comprehensive options
- Web UI: Interactive priority management - Web UI: Interactive priority management
- Priority System: Drag-and-drop with persistence - Priority System: Drag-and-drop with persistence
- Reports: Enhanced analysis with actionable insights - Reports: Enhanced analysis with actionable insights
- Playlist Validator: Complete playlist analysis and validation
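
The `--merge-history` behaviour documented above (entries that match on artist, title, and path are combined and their `count` values summed) can be sketched as follows. This is a minimal illustration and not the tool's actual code: the function name `merge_history` is hypothetical, keys are compared exactly with no case folding, and a missing `count` is treated as 0.

```python
from typing import Dict, List, Tuple


def merge_history(entries: List[dict]) -> List[dict]:
    """Merge entries sharing (artist, title, path), summing their counts."""
    merged: Dict[Tuple[str, str, str], dict] = {}
    for entry in entries:
        key = (entry["artist"], entry["title"], entry["path"])
        if key in merged:
            merged[key]["count"] += entry.get("count", 0)
        else:
            # Copy the entry so the caller's list is left untouched.
            merged[key] = {**entry, "count": entry.get("count", 0)}
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(merged.values())


if __name__ == "__main__":
    history = [
        {"artist": "Artist Name", "title": "Song Title", "path": "a.mp3", "count": 5},
        {"artist": "Artist Name", "title": "Song Title", "path": "a.mp3", "count": 2},
    ]
    print(merge_history(history))
```

Because the path is part of the key, an MP3 and an MP4 of the same song stay separate under this sketch; choosing between them is the job of the priority logic, not the merge.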

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -17,6 +17,7 @@ from utils import (
extract_consolidated_channel_from_path, extract_consolidated_channel_from_path,
get_file_extension, get_file_extension,
parse_multi_artist, parse_multi_artist,
clean_artist_name,
validate_song_data, validate_song_data,
find_mp3_pairs find_mp3_pairs
) )
@ -63,10 +64,15 @@ class SongMatcher:
if not validate_song_data(song): if not validate_song_data(song):
continue continue
# Handle multi-artist songs # Clean and handle artist names
artists = parse_multi_artist(song['artist']) cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists: if not artists:
artists = [song['artist']] artists = [cleaned_artist]
# Create groups for each artist variation # Create groups for each artist variation
for artist in artists: for artist in artists:
@ -90,10 +96,15 @@ class SongMatcher:
if i % 1000 == 0 and i > 0: if i % 1000 == 0 and i > 0:
print(f"Processing song {i:,}/{len(songs):,}...") print(f"Processing song {i:,}/{len(songs):,}...")
# Handle multi-artist songs # Clean and handle artist names
artists = parse_multi_artist(song['artist']) cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists: if not artists:
artists = [song['artist']] artists = [cleaned_artist]
# Try exact matching first # Try exact matching first
added_to_exact = False added_to_exact = False
@ -117,10 +128,15 @@ class SongMatcher:
if i % 100 == 0 and i > 0: if i % 100 == 0 and i > 0:
print(f"Fuzzy matching song {i:,}/{len(ungrouped_songs):,}...") print(f"Fuzzy matching song {i:,}/{len(ungrouped_songs):,}...")
# Handle multi-artist songs # Clean and handle artist names
artists = parse_multi_artist(song['artist']) cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists: if not artists:
artists = [song['artist']] artists = [cleaned_artist]
# Try to find an existing fuzzy group # Try to find an existing fuzzy group
added_to_group = False added_to_group = False
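
The same clean-then-split fallback now appears in three hunks of this file and once more in the playlist validator. One way to keep the copies from drifting apart is a small helper. The sketch below is only an illustration: `resolve_artists` is a hypothetical name, and the two callables stand in for `clean_artist_name` and `parse_multi_artist` from `utils` so that the block runs on its own.

```python
from typing import Callable, List


def resolve_artists(
    raw_artist: str,
    clean: Callable[[str], str],
    split: Callable[[str], List[str]],
) -> List[str]:
    """Return the artist variations used for grouping.

    Mirrors the logic in the diff: clean first, fall back to the raw
    string if cleaning returns nothing, then split, and fall back to
    the cleaned string if splitting returns nothing.
    """
    cleaned = clean(raw_artist) or raw_artist
    return split(cleaned) or [cleaned]


if __name__ == "__main__":
    # Stand-in callables; the real ones would come from utils.
    print(resolve_artists("A & B", str.strip, lambda s: s.split(" & ")))
```

Each call site would then reduce to something like `artists = resolve_artists(song['artist'], clean_artist_name, parse_multi_artist)`.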

File diff suppressed because it is too large

@ -21,6 +21,7 @@ from utils import (
extract_channel_from_path, extract_channel_from_path,
get_file_extension, get_file_extension,
parse_multi_artist, parse_multi_artist,
clean_artist_name,
validate_song_data validate_song_data
) )
@ -63,10 +64,15 @@ class PlaylistValidator:
if not validate_song_data(song): if not validate_song_data(song):
continue continue
# Handle multi-artist songs # Clean and handle artist names
artists = parse_multi_artist(song['artist']) cleaned_artist = clean_artist_name(song['artist'])
if not cleaned_artist:
cleaned_artist = song['artist'] # Fallback to original if cleaning fails
# Handle multi-artist songs (after cleaning)
artists = parse_multi_artist(cleaned_artist)
if not artists: if not artists:
artists = [song['artist']] artists = [cleaned_artist]
# Create exact match keys # Create exact match keys
for artist in artists: for artist in artists:
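
For orientation, the exact-match step that these keys feed can be sketched roughly as below. The rest of `PlaylistValidator` is not visible in this diff, so every name here (`match_key`, `build_index`, `validate_playlist`) is hypothetical, and the normalization (lower-casing and collapsing whitespace) is an assumption. The playlist shape follows the `songLists.json` format documented in `cli/commands.txt`.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]


def match_key(artist: str, title: str) -> Key:
    """Case-insensitive, whitespace-normalized key (assumed normalization)."""
    return (" ".join(artist.lower().split()), " ".join(title.lower().split()))


def build_index(library: List[dict]) -> Dict[Key, List[dict]]:
    """Group library songs by their exact-match key."""
    index: Dict[Key, List[dict]] = {}
    for song in library:
        index.setdefault(match_key(song["artist"], song["title"]), []).append(song)
    return index


def validate_playlist(playlist: dict, index: Dict[Key, List[dict]]) -> dict:
    """Split playlist entries into exact matches and missing songs."""
    found, missing = [], []
    for entry in playlist["songs"]:
        key = match_key(entry["artist"], entry["title"])
        (found if key in index else missing).append(entry)
    return {"title": playlist["title"], "exact_matches": found, "missing": missing}
```

Entries left in `missing` are the natural candidates for the fuzzy pass.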

@ -218,6 +218,50 @@ def extract_consolidated_channel_from_path(file_path: str, channel_priorities: L
return None return None
def clean_artist_name(artist_string: str) -> str:
"""Clean artist name by removing features, collaborations, etc."""
if not artist_string:
return ""
# Remove common feature/collaboration patterns (more precise)
patterns_to_remove = [
r'\s*feat\.?\s*.*$', # feat. anything after
r'\s*ft\.?\s*.*$', # ft. anything after
r'\s*featuring\s*.*$', # featuring anything after
r'\s*with\s*.*$', # with anything after
r'\s*presents\s*.*$', # presents anything after
r'\s*featuring\s*.*$', # featuring anything after
r'\s*feat\s*.*$', # feat anything after
r'\s*ft\s*.*$', # ft anything after
]
# Handle comma/semicolon/slash patterns more carefully
# Only remove if they're followed by feature words
separator_patterns = [
r'\s*,\s*(feat\.?|ft\.?|featuring|with|presents).*$', # comma followed by feature words
r'\s*;\s*(feat\.?|ft\.?|featuring|with|presents).*$', # semicolon followed by feature words
r'\s*/\s*(feat\.?|ft\.?|featuring|with|presents).*$', # slash followed by feature words
]
cleaned_artist = artist_string
# Apply feature removal patterns first
for pattern in patterns_to_remove:
cleaned_artist = re.sub(pattern, '', cleaned_artist, flags=re.IGNORECASE)
# Apply separator patterns only if they're followed by feature words
for pattern in separator_patterns:
cleaned_artist = re.sub(pattern, '', cleaned_artist, flags=re.IGNORECASE)
# Clean up any trailing separators that might be left
cleaned_artist = re.sub(r'\s*[,;/]\s*$', '', cleaned_artist)
# Clean up extra whitespace
cleaned_artist = re.sub(r'\s+', ' ', cleaned_artist).strip()
return cleaned_artist
def parse_multi_artist(artist_string: str) -> List[str]: def parse_multi_artist(artist_string: str) -> List[str]:
"""Parse multi-artist strings with various delimiters.""" """Parse multi-artist strings with various delimiters."""
if not artist_string: if not artist_string:
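
One thing to check in `clean_artist_name`: the removal patterns are not anchored to word boundaries, and every leading `\s*` can match the empty string. With `re.IGNORECASE`, `r'\s*with\s*.*$'` turns "Bill Withers" into "Bill", and `r'\s*ft\.?\s*.*$'` turns "Daft Punk" into "Da". The sketch below shows one possible tightening and is not a drop-in replacement. `clean_artist_name_strict` is a hypothetical name, and the rule that a feature word must follow whitespace or one of `, ; /` and end at a word boundary is an assumption about the intended behaviour.

```python
import re

# A feature marker must be a whole word preceded by whitespace or a
# separator, so substrings of real names are never treated as markers.
_FEATURE_RE = re.compile(
    r'(?:\s*[,;/]\s*|\s+)(?:feat|ft|featuring|with|presents)\b\.?.*$',
    re.IGNORECASE,
)


def clean_artist_name_strict(artist_string: str) -> str:
    """Strip trailing feature/collaboration credits from an artist name."""
    if not artist_string:
        return ""
    cleaned = _FEATURE_RE.sub("", artist_string)
    # Drop a separator left dangling at the end, then collapse whitespace.
    cleaned = re.sub(r'\s*[,;/]\s*$', '', cleaned)
    return re.sub(r'\s+', ' ', cleaned).strip()


if __name__ == "__main__":
    for name in ("Bill Withers", "Daft Punk", "Artist A feat. Artist B"):
        print(repr(name), "->", repr(clean_artist_name_strict(name)))
```

A single combined pattern also removes the duplicate `featuring`, `feat`, and `ft` entries in the list above. Names that genuinely contain a standalone "with" or "presents" would still be truncated, so an allow-list may be needed if the library has any.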

@ -5,7 +5,7 @@
"Stingray Karaoke" "Stingray Karaoke"
], ],
"matching": { "matching": {
"fuzzy_matching": false, "fuzzy_matching": true,
"fuzzy_threshold": 0.85, "fuzzy_threshold": 0.85,
"case_sensitive": false "case_sensitive": false
}, },
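
Since this hunk turns fuzzy matching on, the scale of `fuzzy_threshold` is worth checking. The value is on a 0.0-1.0 scale, while fuzzywuzzy's `fuzz.ratio` returns an integer from 0 to 100, so a comparison against fuzzywuzzy scores needs something like `fuzz.ratio(a, b) >= threshold * 100`. The sketch below shows how the three `matching` settings could interact. It uses the standard library's `difflib.SequenceMatcher` as a stand-in so that it runs without extra packages, and `is_fuzzy_match` is a hypothetical name, not a function from this repository.

```python
from difflib import SequenceMatcher


def is_fuzzy_match(
    a: str,
    b: str,
    threshold: float = 0.85,
    case_sensitive: bool = False,
) -> bool:
    """Return True when the similarity of a and b reaches the threshold.

    SequenceMatcher.ratio() is already on a 0.0-1.0 scale, so it can be
    compared with the configured threshold directly.
    """
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


if __name__ == "__main__":
    print(is_fuzzy_match("Bohemian Rhapsody", "bohemian rhapsody"))
    print(is_fuzzy_match("Bohemian Rhapsody", "Yesterday"))
```

The two libraries score differently, so a threshold tuned with one should be re-checked before switching to the other.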

@ -1,16 +1,12 @@
# Python dependencies for KaraokeMerge CLI tool # Python dependencies for KaraokeMerge CLI tool
# Core dependencies (currently using only standard library) # Core dependencies
# No external dependencies required for basic functionality flask>=2.0.0
# Optional dependencies for enhanced features: # Fuzzy matching dependencies (required for playlist validation)
# Uncomment the following lines if you want to enable fuzzy matching:
fuzzywuzzy>=0.18.0 fuzzywuzzy>=0.18.0
python-Levenshtein>=0.21.0 python-Levenshtein>=0.21.0
# For future enhancements: # For future enhancements:
# pandas>=1.5.0 # For advanced data analysis # pandas>=1.5.0 # For advanced data analysis
# click>=8.0.0 # For enhanced CLI interface # click>=8.0.0 # For enhanced CLI interface
# Web UI dependencies
flask>=2.0.0

@ -10,21 +10,38 @@ import webbrowser
from time import sleep from time import sleep
def check_dependencies(): def check_dependencies():
"""Check if Flask is installed.""" """Check if required dependencies are installed."""
dependencies_ok = True
# Check Flask
try: try:
import flask import flask
print("✅ Flask is installed") print("✅ Flask is installed")
return True
except ImportError: except ImportError:
print("❌ Flask is not installed") print("❌ Flask is not installed")
print("Installing Flask...") print("Installing Flask...")
try: try:
subprocess.check_call([sys.executable, "-m", "pip", "install", "flask>=2.0.0"]) subprocess.check_call([sys.executable, "-m", "pip", "install", "flask>=2.0.0"])
print("✅ Flask installed successfully") print("✅ Flask installed successfully")
return True
except subprocess.CalledProcessError: except subprocess.CalledProcessError:
print("❌ Failed to install Flask") print("❌ Failed to install Flask")
return False dependencies_ok = False
# Check fuzzywuzzy for playlist validation
try:
import fuzzywuzzy
print("✅ fuzzywuzzy is installed (for playlist validation)")
except ImportError:
print("❌ fuzzywuzzy is not installed")
print("Installing fuzzywuzzy and python-Levenshtein...")
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", "fuzzywuzzy>=0.18.0", "python-Levenshtein>=0.21.0"])
print("✅ fuzzywuzzy installed successfully")
except subprocess.CalledProcessError:
print("❌ Failed to install fuzzywuzzy")
print("⚠️ Playlist validation will work without fuzzy matching")
return dependencies_ok
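
The two `try`/`except ImportError` blocks follow the same pattern, so the check could be driven by a list of module names. The sketch below uses `importlib.util.find_spec`, which reports whether a top-level module can be found without importing it. `missing_modules` and the required/optional split are hypothetical and only mirror the way the function above treats Flask as mandatory and fuzzywuzzy as optional.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = missing_modules(["flask"])
    optional = missing_modules(["fuzzywuzzy"])
    if required:
        print("Missing required packages:", ", ".join(required))
    if optional:
        print("Missing optional packages (fuzzy matching disabled):", ", ".join(optional))
```

The pip install step would then run once per missing list, with only the required list affecting the return value.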
def check_data_files(): def check_data_files():
"""Check if required data files exist.""" """Check if required data files exist."""

@ -1449,4 +1449,4 @@ def apply_all_updates():
if __name__ == '__main__': if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0', port=5000) app.run(debug=True, host='0.0.0.0', port=5001)
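
A note on this line beyond the port change: `debug=True` enables the Werkzeug interactive debugger, and `host='0.0.0.0'` binds to every network interface, so the debugger becomes reachable from other machines on the network. One possible way to keep the convenient defaults local and make the rest opt-in is sketched below. The function name `server_settings` and the `KARAOKE_*` environment variable names are hypothetical and do not exist in this repository.

```python
import os
from typing import Mapping


def server_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for app.run() from environment variables.

    Defaults bind to localhost on port 5001 with the debugger off.
    """
    return {
        "host": env.get("KARAOKE_HOST", "127.0.0.1"),
        "port": int(env.get("KARAOKE_PORT", "5001")),
        "debug": env.get("KARAOKE_DEBUG", "0") == "1",
    }


if __name__ == "__main__":
    # In the web UI entry point this would be: app.run(**server_settings())
    print(server_settings())
```

Whatever mechanism is used, the README and startup script need to agree on the port, since this hunk moves the server from 5000 to 5001.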