Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>

This commit is contained in:
Matt Bruce 2025-08-01 08:01:07 -05:00
parent ddbc6a9ebc
commit 9124640bf4
6 changed files with 1065 additions and 630 deletions

View File

@ -2,55 +2,86 @@
## Overview ## Overview
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool creates separate output files for successful and failed songs, along with detailed processing reports.
## Basic Command Structure ## Basic Command Structure
```bash ```bash
python musicbrainz_cleaner.py <input_file> [output_file] [options] docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
``` ```
## Command Arguments ## Command Options
### Required Arguments ### Main Options
| Argument | Type | Description | Example | | Option | Type | Description | Default | Example |
|----------|------|-------------|---------| |--------|------|-------------|---------|---------|
| `input_file` | string | Path to the JSON file containing song data | `my_songs.json` | | `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
| `--output-success` | string | Output file for successful songs | `source-success.json` | `--output-success cleaned.json` |
### Optional Arguments | `--output-failure` | string | Output file for failed songs | `source-failure.json` | `--output-failure failed.json` |
| `--limit` | number | Process only first N songs | None (all songs) | `--limit 1000` |
| Argument | Type | Description | Example | | `--use-api` | flag | Force use of HTTP API instead of database | Database mode | `--use-api` |
|----------|------|-------------|---------| | `--test-connection` | flag | Test connection to MusicBrainz server | None | `--test-connection` |
| `output_file` | string | Path for the cleaned output file | `cleaned_songs.json` | | `--help` | flag | Show help information | None | `--help` |
| `--help` | flag | Show help information | `--help` | | `--version` | flag | Show version information | None | `--version` |
| `--version` | flag | Show version information | `--version` |
## Command Examples ## Command Examples
### Basic Usage ### Basic Usage (Default)
```bash ```bash
# Clean songs and save to auto-generated filename # Process all songs with default settings
python musicbrainz_cleaner.py songs.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: songs_cleaned.json # Output: data/songs-success.json and data/songs-failure.json
``` ```
### Custom Output File ### Custom Source File
```bash ```bash
# Specify custom output filename # Process specific file
python musicbrainz_cleaner.py songs.json cleaned_songs.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
```
### Custom Output Files
```bash
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
```
### Limited Processing
```bash
# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
```
### Force API Mode
```bash
# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
```
### Test Connection
```bash
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
``` ```
### Help and Information ### Help and Information
```bash ```bash
# Show help information # Show help information
python musicbrainz_cleaner.py --help docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
# Show version information # Show version information
python musicbrainz_cleaner.py --version docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
``` ```
## Input File Format ## Input File Format
@ -84,9 +115,13 @@ Any additional fields will be preserved in the output:
- `path`: File path - `path`: File path
- Any other custom fields - Any other custom fields
## Output File Format ## Output Files
The output file will contain the same structure with cleaned data and added MBID fields: The tool creates **three output files**:
### 1. Successful Songs (`source-success.json`)
Array of successfully processed songs with MBIDs added:
```json ```json
[ [
@ -103,49 +138,107 @@ The output file will contain the same structure with cleaned data and added MBID
] ]
``` ```
### Added Fields ### 2. Failed Songs (`source-failure.json`)
Array of songs that couldn't be processed (same format as source):
```json
[
{
"artist": "Unknown Artist",
"title": "Unknown Song",
"disabled": false,
"favorite": false,
"guid": "12345678-1234-1234-1234-123456789012",
"path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
}
]
```
### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)
Human-readable text report with statistics and failed song list:
```
MusicBrainz Data Cleaner - Processing Report
==================================================
Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds
SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%
DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second
OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt
FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
```
### Added Fields (Successful Songs Only)
- `mbid`: MusicBrainz Artist ID (string) - `mbid`: MusicBrainz Artist ID (string)
- `recording_mbid`: MusicBrainz Recording ID (string) - `recording_mbid`: MusicBrainz Recording ID (string)
## Command Line Options ## Processing Output
### Help Option ### Progress Indicators
```bash ```
python musicbrainz_cleaner.py --help 🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================
[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
``` ```
**Output:** ### Status Indicators
```
Usage: python musicbrainz_cleaner.py <input_file.json> [output_file.json]
MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz | Symbol | Meaning | Description |
|--------|---------|-------------|
Arguments: | ✅ | Success | Song processed successfully with MBIDs found |
input_file.json JSON file containing array of song objects | ❌ | Failure | Song processing failed (no MBIDs found) |
output_file.json Optional: Output file for cleaned data | 📈 | Progress | Progress update with statistics |
(default: input_file_cleaned.json) | 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |
Examples:
python musicbrainz_cleaner.py songs.json
python musicbrainz_cleaner.py songs.json cleaned_songs.json
Requirements:
- MusicBrainz server running on http://localhost:5001
- Python 3.6+ with requests library
```
### Version Option
```bash
python musicbrainz_cleaner.py --version
```
**Output:**
```
MusicBrainz Data Cleaner v1.0.0
```
## Error Messages and Exit Codes ## Error Messages and Exit Codes
@ -161,7 +254,7 @@ MusicBrainz Data Cleaner v1.0.0
#### File Not Found #### File Not Found
``` ```
Error: File 'songs.json' not found Error: Source file does not exist: data/songs.json
``` ```
#### Invalid JSON #### Invalid JSON
@ -171,12 +264,12 @@ Error: Invalid JSON in file 'songs.json'
#### Invalid Input Format #### Invalid Input Format
``` ```
Error: Input file should contain a JSON array of songs Error: Source file should contain a JSON array of songs
``` ```
#### Connection Error #### Connection Error
``` ```
Error searching for artist 'Artist Name': Connection refused ❌ Connection to MusicBrainz database failed
``` ```
#### Missing Dependencies #### Missing Dependencies
@ -184,112 +277,95 @@ Error searching for artist 'Artist Name': Connection refused
ModuleNotFoundError: No module named 'requests' ModuleNotFoundError: No module named 'requests'
``` ```
## Processing Output ## Environment Configuration
### Progress Indicators ### Docker Environment
``` The tool runs in a Docker container with the following configuration:
Processing 3 songs...
==================================================
[1/3] Processing: ACDC - Shot In The Dark
✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1)
✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db)
✅ Updated to: AC/DC - Shot in the Dark
[2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix
❌ Could not find artist: Bruno Mars ft. Cardi B
[3/3] Processing: Taylor Swift - Love Story
✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
✅ Updated to: Taylor Swift - Love Story
==================================================
✅ Processing complete!
📁 Output saved to: songs_cleaned.json
```
### Status Indicators
| Symbol | Meaning | Description |
|--------|---------|-------------|
| ✅ | Success | Operation completed successfully |
| ❌ | Error | Operation failed |
| 🔄 | Processing | Currently processing |
## Batch Processing
### Multiple Files
To process multiple files, you can use shell scripting:
```bash
# Process all JSON files in current directory
for file in *.json; do
python musicbrainz_cleaner.py "$file"
done
```
### Large Files
For large files, the tool processes songs one at a time with a 0.1-second delay between API calls to be respectful to the MusicBrainz server.
## Environment Variables
The tool uses the following default configuration:
| Setting | Default | Description | | Setting | Default | Description |
|---------|---------|-------------| |---------|---------|-------------|
| MusicBrainz URL | `http://localhost:5001` | Local MusicBrainz server URL | | Database Host | `db` | PostgreSQL database container |
| API Delay | `0.1` seconds | Delay between API calls | | Database Port | `5432` | PostgreSQL port |
| Database Name | `musicbrainz_db` | MusicBrainz database name |
| API URL | `http://localhost:5001` | MusicBrainz web server URL |
### Environment Variables
```bash
# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
```
## Troubleshooting Commands ## Troubleshooting Commands
### Check MusicBrainz Server Status ### Check MusicBrainz Server Status
```bash ```bash
# Test if server is running # Test if web server is running
curl -I http://localhost:5001 curl -I http://localhost:5001
# Test API endpoint # Test database connection
curl http://localhost:5001/ws/2/artist/?query=name:AC/DC&fmt=json docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
``` ```
### Validate JSON File ### Validate JSON File
```bash ```bash
# Check if JSON is valid # Check if JSON is valid
python -m json.tool songs.json python -m json.tool data/songs.json
# Check JSON structure # Check JSON structure
python -c "import json; data=json.load(open('songs.json')); print('Valid JSON array with', len(data), 'items')" python -c "import json; data=json.load(open('data/songs.json')); print('Valid JSON array with', len(data), 'items')"
``` ```
### Check Python Dependencies ### Test Tool Connection
```bash ```bash
# Check if requests is installed # Test database connection
python -c "import requests; print('requests version:', requests.__version__)" docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Install if missing # Test API connection
pip install requests docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
``` ```
## Advanced Usage ## Advanced Usage
### Custom MusicBrainz Server ### Batch Processing
To use a different MusicBrainz server, modify the script: To process multiple files, you can use shell scripting:
```python ```bash
# In musicbrainz_cleaner.py, change: # Process all JSON files in data directory
self.base_url = "http://your-server:5001" for file in data/*.json; do
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done
``` ```
### Verbose Output ### Large Files
For debugging, you can modify the script to add more verbose output by uncommenting debug print statements. For large files, the tool processes songs efficiently with:
- Direct database access for maximum speed
- Progress tracking every 100 songs
- Memory-efficient processing
- No rate limiting with database access
### Custom Processing
```bash
# Process with custom chunk size (for testing)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000
# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json
```
## Command Line Shortcuts ## Command Line Shortcuts
@ -299,18 +375,22 @@ Add these to your shell profile for convenience:
```bash ```bash
# Add to ~/.bashrc or ~/.zshrc # Add to ~/.bashrc or ~/.zshrc
alias mbclean='python musicbrainz_cleaner.py' alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='python musicbrainz_cleaner.py --help' alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
``` ```
### Usage with Aliases ### Usage with Aliases
```bash ```bash
# Using alias # Using alias
mbclean songs.json mbclean --source data/songs.json
# Show help # Show help
mbclean-help mbclean-help
# Test connection
mbclean-test
``` ```
## Integration Examples ## Integration Examples
@ -319,8 +399,8 @@ mbclean-help
```bash ```bash
# Process files and commit changes # Process files and commit changes
python musicbrainz_cleaner.py songs.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add songs_cleaned.json git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs" git commit -m "Clean song metadata with MusicBrainz IDs"
``` ```
@ -328,7 +408,7 @@ git commit -m "Clean song metadata with MusicBrainz IDs"
```bash ```bash
# Add to crontab to process files daily # Add to crontab to process files daily
0 2 * * * cd /path/to/musicbrainz-cleaner && python musicbrainz_cleaner.py /path/to/songs.json 0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
``` ```
### With Shell Scripts ### With Shell Scripts
@ -337,13 +417,18 @@ git commit -m "Clean song metadata with MusicBrainz IDs"
#!/bin/bash #!/bin/bash
# clean_songs.sh # clean_songs.sh
INPUT_FILE="$1" INPUT_FILE="$1"
OUTPUT_FILE="${INPUT_FILE%.json}_cleaned.json" OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"
python musicbrainz_cleaner.py "$INPUT_FILE" "$OUTPUT_FILE" docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
--source "$INPUT_FILE" \
--output-success "$OUTPUT_SUCCESS" \
--output-failure "$OUTPUT_FAILURE"
if [ $? -eq 0 ]; then if [ $? -eq 0 ]; then
echo "Successfully cleaned $INPUT_FILE" echo "Successfully processed $INPUT_FILE"
echo "Output saved to $OUTPUT_FILE" echo "Successful songs: $OUTPUT_SUCCESS"
echo "Failed songs: $OUTPUT_FAILURE"
else else
echo "Error processing $INPUT_FILE" echo "Error processing $INPUT_FILE"
exit 1 exit 1
@ -354,7 +439,10 @@ fi
| Command | Description | | Command | Description |
|---------|-------------| |---------|-------------|
| `python musicbrainz_cleaner.py file.json` | Basic usage | | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
| `python musicbrainz_cleaner.py file.json output.json` | Custom output | | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
| `python musicbrainz_cleaner.py --help` | Show help | | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
| `python musicbrainz_cleaner.py --version` | Show version | | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |

52
PRD.md
View File

@ -38,13 +38,16 @@ docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*)
docker-compose run --rm musicbrainz-cleaner python3 -c "from src.api.database import MusicBrainzDatabase; db = MusicBrainzDatabase(); print('Connection result:', db.connect())" docker-compose run --rm musicbrainz-cleaner python3 -c "from src.api.database import MusicBrainzDatabase; db = MusicBrainzDatabase(); print('Connection result:', db.connect())"
``` ```
### 4. Run Tests ### 4. Run the Cleaner
```bash ```bash
# Test 100 random songs # Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 test_100_random.py docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Or other test scripts # Process with custom options
docker-compose run --rm musicbrainz-cleaner python3 [script_name].py docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json --limit 1000
# Test connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
``` ```
**⚠️ Critical**: Always run scripts via Docker - the cleaner cannot connect to the database directly from outside the container. **⚠️ Critical**: Always run scripts via Docker - the cleaner cannot connect to the database directly from outside the container.
@ -119,10 +122,15 @@ Users have song data in JSON format with inconsistent artist names, song titles,
#### 6. CLI Interface #### 6. CLI Interface
- **REQ-034:** Command-line interface with argument parsing - **REQ-034:** Command-line interface with argument parsing
- **REQ-035:** Support for input and optional output file specification - **REQ-035:** Support for source file specification with smart defaults
- **REQ-036:** Progress reporting during processing - **REQ-036:** Progress reporting during processing with song counter
- **REQ-037:** Error handling and user-friendly messages - **REQ-037:** Error handling and user-friendly messages
- **REQ-038:** Option to force API mode with `--use-api` flag - **REQ-038:** Option to force API mode with `--use-api` flag
- **NEW REQ-039:** Simplified CLI with default full dataset processing
- **NEW REQ-040:** Separate output files for successful and failed songs (array format)
- **NEW REQ-041:** Human-readable text report with statistics
- **NEW REQ-042:** Configurable processing limits and output file paths
- **NEW REQ-043:** Smart defaults for all file paths and options
### ✅ Non-Functional Requirements ### ✅ Non-Functional Requirements
@ -170,7 +178,11 @@ src/
│ ├── __init__.py │ ├── __init__.py
│ └── constants.py # Constants and settings │ └── constants.py # Constants and settings
├── core/ # Core functionality ├── core/ # Core functionality
├── utils/ # Utility functions ├── tests/ # Test files and scripts
│ ├── __init__.py
│ ├── test_*.py # Unit and integration tests
│ └── debug_*.py # Debug scripts
└── utils/ # Utility functions
``` ```
### Architectural Principles ### Architectural Principles
@ -183,6 +195,7 @@ src/
- **Fallback Strategy**: Automatic fallback to API when database unavailable - **Fallback Strategy**: Automatic fallback to API when database unavailable
- **NEW**: **Database-First**: Always use live database data over static caches - **NEW**: **Database-First**: Always use live database data over static caches
- **NEW**: **Intelligent Collaboration Detection**: Distinguish band names from collaborations - **NEW**: **Intelligent Collaboration Detection**: Distinguish band names from collaborations
- **NEW**: **Test Organization**: All test files must be placed in `src/tests/` directory, not in root
### Data Flow ### Data Flow
1. Read JSON input file 1. Read JSON input file
@ -224,6 +237,13 @@ src/
- Manual configuration needed for custom artist/recording mappings - Manual configuration needed for custom artist/recording mappings
- **NEW**: Some edge cases may require manual intervention (data quality issues) - **NEW**: Some edge cases may require manual intervention (data quality issues)
### Test File Organization
- **REQUIRED**: All test files must be placed in `src/tests/` directory
- **PROHIBITED**: Test files should not be placed in the root directory
- **Naming Convention**: Test files should follow `test_*.py` or `debug_*.py` patterns
- **Purpose**: Keeps root directory clean and organizes test code properly
- **Import Path**: Tests can import from parent modules using relative imports
## Server Setup Requirements ## Server Setup Requirements
### MusicBrainz Server Configuration ### MusicBrainz Server Configuration
@ -295,6 +315,11 @@ docker-compose logs -f musicbrainz
- [x] **NEW**: Band name vs collaboration distinction - [x] **NEW**: Band name vs collaboration distinction
- [x] **NEW**: Complex collaboration parsing - [x] **NEW**: Complex collaboration parsing
- [x] **NEW**: Removed problematic known_artists cache - [x] **NEW**: Removed problematic known_artists cache
- [x] **NEW**: Simplified CLI with default full dataset processing
- [x] **NEW**: Separate output files for successful and failed songs (array format)
- [x] **NEW**: Human-readable text reports with statistics
- [x] **NEW**: Smart defaults for all file paths and options
- [x] **NEW**: Configurable processing limits and output file paths
### 🔄 Future Enhancements ### 🔄 Future Enhancements
- [ ] Web interface option - [ ] Web interface option
@ -396,14 +421,17 @@ pip install -r requirements.txt
### Usage ### Usage
```bash ```bash
# Use database access (recommended, faster) # Process all songs with default settings (recommended)
python musicbrainz_cleaner.py input.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Process specific file with custom options
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json --limit 1000
# Force API mode (slower, fallback) # Force API mode (slower, fallback)
python musicbrainz_cleaner.py input.json --use-api docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
# Test connections # Test connections
python musicbrainz_cleaner.py --test-connection docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
``` ```
## Maintenance ## Maintenance

211
README.md
View File

@ -265,8 +265,13 @@ Before running tests, verify:
## 🧪 Testing ## 🧪 Testing
Run the test suite to verify everything works correctly: ### Test File Organization
- **REQUIRED**: All test files must be placed in `src/tests/` directory
- **PROHIBITED**: Test files should not be placed in the root directory
- **Naming Convention**: Test files should follow `test_*.py` or `debug_*.py` patterns
- **Purpose**: Keeps root directory clean and organizes test code properly
### Running Tests
```bash ```bash
# Run all tests # Run all tests
python3 src/tests/run_tests.py python3 src/tests/run_tests.py
@ -288,7 +293,25 @@ python3 src/tests/run_tests.py --list
- **Integration Tests**: Test interactions between components and database - **Integration Tests**: Test interactions between components and database
- **Debug Tests**: Debug scripts and troubleshooting tools - **Debug Tests**: Debug scripts and troubleshooting tools
## 📁 Data Files ## 📁 Project Structure
```
musicbrainz-cleaner/
├── src/
│ ├── api/ # Database and API access
│ ├── cli/ # Command-line interface
│ ├── config/ # Configuration and constants
│ ├── core/ # Core functionality
│ ├── tests/ # Test files (REQUIRED location)
│ └── utils/ # Utility functions
├── data/ # Data files and output
│ ├── known_artists.json # Name variations (ACDC → AC/DC)
│ ├── known_recordings.json # Known recording MBIDs
│ └── songs.json # Source songs file
└── docker-compose.yml # Docker configuration
```
### Data Files
The tool uses external JSON files for name variations: The tool uses external JSON files for name variations:
@ -342,40 +365,61 @@ These files can be easily updated without touching the code, making it simple to
- **Numbers**: "98 Degrees", "S Club 7", "3 Doors Down" - **Numbers**: "98 Degrees", "S Club 7", "3 Doors Down"
- **Special Characters**: "a-ha", "The B-52s", "Salt-N-Pepa" - **Special Characters**: "a-ha", "The B-52s", "Salt-N-Pepa"
### 🆕 Simplified Processing
- **Default Behavior**: Process all songs by default (no special flags needed)
- **Separate Output Files**: Successful and failed songs saved to different files
- **Progress Tracking**: Real-time progress with song counter and status
- **Smart Defaults**: Sensible defaults for all file paths and options
- **Detailed Reporting**: Comprehensive statistics and processing report
- **Batch Processing**: Efficient handling of large song collections
## 📖 Usage Examples ## 📖 Usage Examples
### Basic Usage ### Basic Usage (Default)
```bash ```bash
# Clean your songs and save to auto-generated filename # Process all songs with default settings (data/songs.json)
python musicbrainz_cleaner.py my_songs.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: my_songs_cleaned.json # Output: data/songs-success.json and data/songs-failure.json
``` ```
### Custom Output File ### Custom Source File
```bash ```bash
# Specify your own output filename # Process specific file
python musicbrainz_cleaner.py my_songs.json cleaned_songs.json docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
```
### Custom Output Files
```bash
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
```
### Limit Processing
```bash
# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
``` ```
### Force API Mode ### Force API Mode
```bash ```bash
# Use HTTP API instead of database (slower but works without PostgreSQL) # Use HTTP API instead of database (slower but works without PostgreSQL)
python musicbrainz_cleaner.py my_songs.json --use-api docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
``` ```
### Test Connections ### Test Connections
```bash ```bash
# Test database connection # Test database connection
python musicbrainz_cleaner.py --test-connection docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test with API mode # Test with API mode
python musicbrainz_cleaner.py --test-connection --use-api docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
``` ```
### Help ### Help
```bash ```bash
# Show usage information # Show usage information
python musicbrainz_cleaner.py --help docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
``` ```
## 📁 Data Files ## 📁 Data Files
@ -406,7 +450,10 @@ Your JSON file should contain an array of song objects:
## 📤 Output Format ## 📤 Output Format
The tool will update your objects with corrected data: The tool creates **three output files**:
### 1. Successful Songs (`source-success.json`)
Array of successfully processed songs with MBIDs added:
```json ```json
[ [
@ -433,39 +480,123 @@ The tool will update your objects with corrected data:
] ]
``` ```
### 2. Failed Songs (`source-failure.json`)
Array of songs that couldn't be processed (same format as source):
```json
[
{
"artist": "Unknown Artist",
"title": "Unknown Song",
"disabled": false,
"favorite": false,
"guid": "12345678-1234-1234-1234-123456789012",
"path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
}
]
```
### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)
Human-readable text report with statistics and failed song list:
```
MusicBrainz Data Cleaner - Processing Report
==================================================
Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds
SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%
DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second
OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt
FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
```
## 🎬 Example Run ## 🎬 Example Run
### Basic Processing
```bash ```bash
$ python musicbrainz_cleaner.py data/sample_songs.json $ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
Processing 3 songs... 🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection Using database connection
================================================== ==================================================
[1/3] Processing: ACDC - Shot In The Dark [1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
🎯 Fuzzy match found: ACDC → AC/DC (score: 0.85) [2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1) [3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
🎯 Fuzzy match found: Shot In The Dark → Shot in the Dark (score: 0.92) [4 of 49,170] ✅ PASS: Taylor Swift - Love Story
✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db) ...
✅ Updated to: AC/DC - Shot in the Dark
[2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix 📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
🎯 Fuzzy match found: Bruno Mars → Bruno Mars (score: 1.00) 📈 Progress: 200/49,170 (0.4%) - Success: 87.5% - Rate: 3.1 songs/sec
✅ Found artist: Bruno Mars (MBID: afb680f2-b6eb-4cd7-a70b-a63b25c763d5) ...
🎯 Fuzzy match found: Finesse Remix → Finesse (remix) (score: 0.88)
✅ Found recording: Finesse (remix) (MBID: 8ed14014-547a-4128-ab81-c2dca7ae198e)
✅ Updated to: Bruno Mars feat. Cardi B - Finesse (remix)
[3/3] Processing: Taylor Swift - Love Story
🎯 Fuzzy match found: Taylor Swift → Taylor Swift (score: 1.00)
✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
🎯 Fuzzy match found: Love Story → Love Story (score: 1.00)
✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
✅ Updated to: Taylor Swift - Love Story
================================================== ==================================================
✅ Processing complete! 🎉 Processing completed!
📁 Output saved to: data/sample_songs_cleaned.json 📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
```
### Limited Processing
```bash
$ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
⚠️ Limiting processing to first 1000 songs
🚀 Starting song processing...
📊 Total songs to process: 1,000
Using database connection
==================================================
[1 of 1,000] ✅ PASS: ACDC - Shot In The Dark
[2 of 1,000] ❌ FAIL: Unknown Artist - Unknown Song
...
==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 312.5 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 856/1,000 (85.6%)
✅ Recordings found: 789/1,000 (78.9%)
❌ Failed songs: 211 (21.1%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
``` ```
## 🔧 Troubleshooting ## 🔧 Troubleshooting
@ -601,6 +732,12 @@ This tool is provided as-is for educational and personal use.
- **Fuzzy search thresholds** need tuning for different datasets - **Fuzzy search thresholds** need tuning for different datasets
- **Connection pooling** would improve performance for large datasets - **Connection pooling** would improve performance for large datasets
### CLI Design
- **Simplified interface** with smart defaults reduces complexity
- **Array format consistency** makes output files easier to work with
- **Human-readable reports** improve user experience
- **Test file organization** keeps project structure clean
--- ---
**Happy cleaning! 🎵✨** **Happy cleaning! 🎵✨**

View File

@ -63689,6 +63689,14 @@
"path": "z://MP4\\KaraFun Karaoke\\Karaoke Zoom - The Commodores.mp4", "path": "z://MP4\\KaraFun Karaoke\\Karaoke Zoom - The Commodores.mp4",
"title": "Zoom" "title": "Zoom"
}, },
{
"artist": "Kat DeLuna feat. Elephant Man",
"disabled": false,
"favorite": false,
"guid": "a5487de7-4ec6-d6bb-7e88-6ec275133a52",
"path": "z://MP4\\KaraFun Karaoke\\Kat DeLuna feat. Elephant Man - Whine Up.mp4",
"title": "Whine Up"
},
{ {
"artist": "Marillion", "artist": "Marillion",
"disabled": false, "disabled": false,
@ -69378,14 +69386,6 @@
"path": "z://MP4\\KaraFun Karaoke\\Whenever, Wherever - Shakira Karaoke Version KaraFun.mp4", "path": "z://MP4\\KaraFun Karaoke\\Whenever, Wherever - Shakira Karaoke Version KaraFun.mp4",
"title": "Whenever, Wherever" "title": "Whenever, Wherever"
}, },
{
"artist": "Whine Up",
"disabled": false,
"favorite": false,
"guid": "04f64889-07cc-2811-bf7d-3fa235859e25",
"path": "z://MP4\\KaraFun Karaoke\\Whine Up - Kat DeLuna feat. Elephant Man Karaoke Version KaraFun.mp4",
"title": "Kat DeLuna feat. Elephant Man Karaoke Version KaraFun"
},
{ {
"artist": "Michael Bublé", "artist": "Michael Bublé",
"disabled": false, "disabled": false,
@ -169868,7 +169868,7 @@
"title": "Girl Crush" "title": "Girl Crush"
}, },
{ {
"artist": "Little Mix", "artist": "Little M!x",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -169877,7 +169877,7 @@
"title": "How Ya Doin'" "title": "How Ya Doin'"
}, },
{ {
"artist": "Little Mix", "artist": "Little M!x",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -200306,6 +200306,14 @@
"path": "z://MP4\\Stingray Karaoke\\Bartender Lady Antebellum Karaoke with Lyrics.mp4", "path": "z://MP4\\Stingray Karaoke\\Bartender Lady Antebellum Karaoke with Lyrics.mp4",
"title": "Bartender" "title": "Bartender"
}, },
{
"artist": "Bastille",
"disabled": false,
"favorite": false,
"guid": "2ca64591-9d7c-e975-5340-ca1804195902",
"path": "z://MP4\\Stingray Karaoke\\Bastille - Pompeii.mp4",
"title": "Pompeii"
},
{ {
"artist": "No Doubt", "artist": "No Doubt",
"disabled": false, "disabled": false,
@ -200549,6 +200557,14 @@
"path": "z://MP4\\Stingray Karaoke\\Black Sabbath - Paranoid.mp4", "path": "z://MP4\\Stingray Karaoke\\Black Sabbath - Paranoid.mp4",
"title": "Paranoid" "title": "Paranoid"
}, },
{
"artist": "Black Sabbath",
"disabled": false,
"favorite": false,
"guid": "b1e7d35c-6682-546e-4b29-5829d3343899",
"path": "z://MP4\\Stingray Karaoke\\Black Sabbath - Snowblind.mp4",
"title": "Snowblind"
},
{ {
"artist": "Elton John", "artist": "Elton John",
"disabled": false, "disabled": false,
@ -202891,6 +202907,14 @@
"path": "z://MP4\\Stingray Karaoke\\Jingle Bell Rock Bobby Helms Karaoke with Lyrics.mp4", "path": "z://MP4\\Stingray Karaoke\\Jingle Bell Rock Bobby Helms Karaoke with Lyrics.mp4",
"title": "Jingle Bell Rock" "title": "Jingle Bell Rock"
}, },
{
"artist": "John Lennon",
"disabled": false,
"favorite": false,
"guid": "1b67db72-da96-f56f-2e38-c6ace52dfc1a",
"path": "z://MP4\\Stingray Karaoke\\John Lennon - Give Peace A Chance.mp4",
"title": "Give Peace A Chance"
},
{ {
"artist": "John Lennon", "artist": "John Lennon",
"disabled": false, "disabled": false,
@ -202987,38 +203011,6 @@
"path": "z://MP4\\Stingray Karaoke\\Justin Bieber - Mistletoe (Karaoke Version).mp4", "path": "z://MP4\\Stingray Karaoke\\Justin Bieber - Mistletoe (Karaoke Version).mp4",
"title": "Mistletoe" "title": "Mistletoe"
}, },
{
"artist": "Bastille",
"disabled": false,
"favorite": false,
"guid": "e270d6ed-4e3e-9db2-d5ff-a06e1cc6b7d3",
"path": "z://MP4\\Stingray Karaoke\\Karaoke Version Pompeii in the Style of Bastille with lyrics (no lead vocal).mp4",
"title": "Karaoke Version Pompeii"
},
{
"artist": "John Lennon",
"disabled": false,
"favorite": false,
"guid": "caa7c825-12a7-57b7-fafd-4e259851bd54",
"path": "z://MP4\\Stingray Karaoke\\Karaoke Video Give Peace A Chance in the Style of John Lennon with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Give Peace A Chance"
},
{
"artist": "Peggy Lee",
"disabled": false,
"favorite": false,
"guid": "532b8b6d-4fa3-b5b4-3879-765d6bc223e2",
"path": "z://MP4\\Stingray Karaoke\\Karaoke Video Sing Fever in the Style of Peggy Lee with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Sing Fever"
},
{
"artist": "Black Sabbath",
"disabled": false,
"favorite": false,
"guid": "4dae1287-ffd5-7d98-3b39-21cc172bebd5",
"path": "z://MP4\\Stingray Karaoke\\Karaoke Video Snowblind in the Style of Black Sabbath with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Snowblind"
},
{ {
"artist": "Katy Perry", "artist": "Katy Perry",
"disabled": false, "disabled": false,
@ -204138,6 +204130,14 @@
"path": "z://MP4\\Stingray Karaoke\\Pearl Jam - Black (Karaoke Version).mp4", "path": "z://MP4\\Stingray Karaoke\\Pearl Jam - Black (Karaoke Version).mp4",
"title": "Black" "title": "Black"
}, },
{
"artist": "Peggy Lee",
"disabled": false,
"favorite": false,
"guid": "bac4bc08-7c95-f059-8abe-a723960cd2aa",
"path": "z://MP4\\Stingray Karaoke\\Peggy Lee - Sing Fever.mp4",
"title": "Sing Fever"
},
{ {
"artist": "Pentatonix", "artist": "Pentatonix",
"disabled": false, "disabled": false,
@ -207329,6 +207329,14 @@
"path": "z://MP4\\TheKARAOKEChannel\\Birthday in the Style of The Beatles karaoke video with lyrics (no lead vocal).mp4", "path": "z://MP4\\TheKARAOKEChannel\\Birthday in the Style of The Beatles karaoke video with lyrics (no lead vocal).mp4",
"title": "Birthday" "title": "Birthday"
}, },
{
"artist": "Black Sabbath",
"disabled": false,
"favorite": false,
"guid": "b0dfc4cd-8a54-0db4-378a-266d8e14e882",
"path": "z://MP4\\TheKARAOKEChannel\\Black Sabbath - Snowblind.mp4",
"title": "Snowblind"
},
{ {
"artist": "Alannah Myles", "artist": "Alannah Myles",
"disabled": false, "disabled": false,
@ -208913,6 +208921,14 @@
"path": "z://MP4\\TheKARAOKEChannel\\Green Day - I Fought The Law.mp4", "path": "z://MP4\\TheKARAOKEChannel\\Green Day - I Fought The Law.mp4",
"title": "I Fought The Law" "title": "I Fought The Law"
}, },
{
"artist": "Gretchen Wilson",
"disabled": false,
"favorite": false,
"guid": "660cec43-1346-f165-7f30-ada5d336d123",
"path": "z://MP4\\TheKARAOKEChannel\\Gretchen Wilson - Here For The Party.mp4",
"title": "Here For The Party"
},
{ {
"artist": "Miranda Lambert", "artist": "Miranda Lambert",
"disabled": false, "disabled": false,
@ -209841,6 +209857,14 @@
"path": "z://MP4\\TheKARAOKEChannel\\John Legend - All Of Me (Lyrics).mp4", "path": "z://MP4\\TheKARAOKEChannel\\John Legend - All Of Me (Lyrics).mp4",
"title": "All Of Me" "title": "All Of Me"
}, },
{
"artist": "John Lennon",
"disabled": false,
"favorite": false,
"guid": "a9a33863-57a0-01f2-0060-386e0f3cdc32",
"path": "z://MP4\\TheKARAOKEChannel\\John Lennon - Give Peace A Chance.mp4",
"title": "Give Peace A Chance"
},
{ {
"artist": "John Lennon", "artist": "John Lennon",
"disabled": false, "disabled": false,
@ -210017,54 +210041,6 @@
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Version Pompeii in the Style of Bastille with lyrics (no lead vocal).mp4", "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Version Pompeii in the Style of Bastille with lyrics (no lead vocal).mp4",
"title": "Karaoke Version Pompeii" "title": "Karaoke Version Pompeii"
}, },
{
"artist": "Kelly Clarkson",
"disabled": false,
"favorite": false,
"guid": "ba61f0c9-cf3c-47b5-2d2f-b5de183348e9",
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video Because of You in the Style of Kelly Clarkson with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Because of You"
},
{
"artist": "John Lennon",
"disabled": false,
"favorite": false,
"guid": "d0610705-cbce-05c4-e9ce-c7e0d5dec594",
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video Give Peace A Chance in the Style of John Lennon with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Give Peace A Chance"
},
{
"artist": "Gretchen Wilson",
"disabled": false,
"favorite": false,
"guid": "0ad553d3-6816-a567-aae1-ffe72f717c95",
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video Here For The Party in the Style of Gretchen Wilson with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Here For The Party"
},
{
"artist": "Peggy Lee",
"disabled": false,
"favorite": false,
"guid": "9c448677-ebaa-f8f4-9a32-55d16451515b",
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video Sing Fever in the Style of Peggy Lee with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Sing Fever"
},
{
"artist": "Black Sabbath",
"disabled": false,
"favorite": false,
"guid": "ae8512c0-733e-b962-708f-57e74c77c7f8",
"path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video Snowblind in the Style of Black Sabbath with lyrics (no lead vocal).mp4",
"title": "Karaoke Video Snowblind"
},
{
"artist": "Jewel",
"disabled": false,
"favorite": false,
"guid": "8775ad86-a0a0-2877-99ef-5b59f928ac4c",
"path": "z://MP4\\TheKARAOKEChannel\\KarenLovesAdam performs Foolish Games for the Undercover Karaoke Challenge in the style of Jewel.mp4",
"title": "KarenLovesAdam performs Foolish Games for the Undercover Karaoke Challenge"
},
{ {
"artist": "Culture Club", "artist": "Culture Club",
"disabled": false, "disabled": false,
@ -210137,6 +210113,14 @@
"path": "z://MP4\\TheKARAOKEChannel\\Kelly Clarkson & Ariana Grande - Santa, Cant You Hear Me (Karaoke With Lyrics).mp4", "path": "z://MP4\\TheKARAOKEChannel\\Kelly Clarkson & Ariana Grande - Santa, Cant You Hear Me (Karaoke With Lyrics).mp4",
"title": "Santa, Cant You Hear Me" "title": "Santa, Cant You Hear Me"
}, },
{
"artist": "Kelly Clarkson",
"disabled": false,
"favorite": false,
"guid": "6362d98a-46df-48a6-6bf9-7a3df08a390c",
"path": "z://MP4\\TheKARAOKEChannel\\Kelly Clarkson - Because of You .mp4",
"title": "Because of You"
},
{ {
"artist": "Kenny Rogers", "artist": "Kenny Rogers",
"disabled": false, "disabled": false,
@ -211369,6 +211353,14 @@
"path": "z://MP4\\TheKARAOKEChannel\\Peaceful Easy Feeling Eagles Karaoke with Lyrics.mp4", "path": "z://MP4\\TheKARAOKEChannel\\Peaceful Easy Feeling Eagles Karaoke with Lyrics.mp4",
"title": "Peaceful Easy Feeling" "title": "Peaceful Easy Feeling"
}, },
{
"artist": "Peggy Lee",
"disabled": false,
"favorite": false,
"guid": "8c50e858-3dc6-19d9-a417-5331e763af4b",
"path": "z://MP4\\TheKARAOKEChannel\\Peggy Lee - Sing Fever.mp4",
"title": "Sing Fever"
},
{ {
"artist": "Pentatonix", "artist": "Pentatonix",
"disabled": false, "disabled": false,
@ -218141,15 +218133,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\All About Eve - Marthas Harbour.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\All About Eve - Marthas Harbour.mp4",
"title": "Martha's Harbour" "title": "Martha's Harbour"
}, },
{
"artist": "All I Want For Christmas Is My Two Front Teeth",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "f347cb5c-fbc2-2d4d-a124-741414c58939",
"path": "z://MP4\\ZoomKaraokeOfficial\\All I Want For Christmas Is My Two Front Teeth - Karaoke Version from Zoom Karaoke.mp4",
"title": "Karaoke Version from Zoom Karaoke"
},
{ {
"artist": "All Saints", "artist": "All Saints",
"disabled": false, "disabled": false,
@ -219192,7 +219175,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "b5b380d6-6699-d1b6-095b-e7721f553838", "guid": "b5b380d6-6699-d1b6-095b-e7721f553838",
"path": "z://MP4\\ZoomKaraokeOfficial\\Annie Soundtrack - Tomorrow Karaoke Version from Zoom Karaoke (1982 Version).mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Annie Soundtrack - Tomorrow Karaoke Version from Zoom Karaoke (1982 Version).mp4",
"title": "Tomorrow" "title": "Tomorrow - Karaoke Version from Zoom Karaoke (1982 Version)"
}, },
{ {
"artist": "Another Level", "artist": "Another Level",
@ -220407,7 +220390,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "892e330d-aeb9-195d-e2ac-1d2ed44b00f2", "guid": "892e330d-aeb9-195d-e2ac-1d2ed44b00f2",
"path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Lyric Fixed).mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Lyric Fixed).mp4",
"title": "Venus (Lyric Fixed)" "title": "Venus - Karaoke Version from Zoom Karaoke (Lyric Fixed)"
}, },
{ {
"artist": "Bananarama", "artist": "Bananarama",
@ -220416,7 +220399,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "3cda0061-edc5-71eb-b1ce-a592de40fed8", "guid": "3cda0061-edc5-71eb-b1ce-a592de40fed8",
"path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Old Version).mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Old Version).mp4",
"title": "Venus (Old Version)" "title": "Venus - Karaoke Version from Zoom Karaoke (Old Version)"
}, },
{ {
"artist": "Band Aid 30", "artist": "Band Aid 30",
@ -223107,7 +223090,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "708df1c8-4d05-607f-30a0-7ea1cf43c026", "guid": "708df1c8-4d05-607f-30a0-7ea1cf43c026",
"path": "z://MP4\\ZoomKaraokeOfficial\\Blondie - Union City Blue Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Blondie - Union City Blue Karaoke Version from Zoom Karaoke.mp4",
"title": "Union City Blue - Karaoke Version from Zoom Karaoke" "title": "Union City Blue"
}, },
{ {
"artist": "Blood Brothers", "artist": "Blood Brothers",
@ -225150,7 +225133,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "ee0c36f0-bf6c-4d87-78f1-356e07824609", "guid": "ee0c36f0-bf6c-4d87-78f1-356e07824609",
"path": "z://MP4\\ZoomKaraokeOfficial\\Bruce Springsteen - Sherry Darling Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Bruce Springsteen - Sherry Darling Karaoke Version from Zoom Karaoke.mp4",
"title": "Sherry Darling - Karaoke Version from Zoom Karaoke" "title": "Sherry Darling"
}, },
{ {
"artist": "Bruce Springsteen", "artist": "Bruce Springsteen",
@ -227112,7 +227095,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "89e2f35a-d52a-54f7-da46-8a590ee38f68", "guid": "89e2f35a-d52a-54f7-da46-8a590ee38f68",
"path": "z://MP4\\ZoomKaraokeOfficial\\Charli XCX - Speed Drive Karaoke Version from Zoom Karaoke (Barbie Movie).mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Charli XCX - Speed Drive Karaoke Version from Zoom Karaoke (Barbie Movie).mp4",
"title": "Speed Drive (Barbie Movie)" "title": "Speed Drive - Karaoke Version from Zoom Karaoke (Barbie Movie)"
}, },
{ {
"artist": "Charli XCX ft. Ariana Grande", "artist": "Charli XCX ft. Ariana Grande",
@ -230082,7 +230065,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "0173e449-92bc-8c75-7054-105227a56c19", "guid": "0173e449-92bc-8c75-7054-105227a56c19",
"path": "z://MP4\\ZoomKaraokeOfficial\\Darlene Love - All Alone On Christmas Karaoke Version from Zoom Karaoke (from Home Alone).mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Darlene Love - All Alone On Christmas Karaoke Version from Zoom Karaoke (from Home Alone).mp4",
"title": "All Alone On Christmas (from 'Home Alone')" "title": "All Alone On Christmas - Karaoke Version from Zoom Karaoke (from 'Home Alone')"
}, },
{ {
"artist": "Darts", "artist": "Darts",
@ -230217,7 +230200,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "a48c4cc6-7a32-57e7-48ce-14ff7692b8d1", "guid": "a48c4cc6-7a32-57e7-48ce-14ff7692b8d1",
"path": "z://MP4\\ZoomKaraokeOfficial\\Dave Edmunds - From Small Things (Big Things One Day Come) Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Dave Edmunds - From Small Things (Big Things One Day Come) Karaoke Version from Zoom Karaoke.mp4",
"title": "From Small Things (Big Things One Day Come)." "title": "From Small Things (Big Things One Day Come) - Karaoke Version from Zoom Karaoke."
}, },
{ {
"artist": "Dave Edmunds", "artist": "Dave Edmunds",
@ -232143,7 +232126,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "b119decd-7823-7e21-53b8-86d6934c9d87", "guid": "b119decd-7823-7e21-53b8-86d6934c9d87",
"path": "z://MP4\\ZoomKaraokeOfficial\\Devo - Girl U Want Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Devo - Girl U Want Karaoke Version from Zoom Karaoke.mp4",
"title": "Girl U Want - Karaoke Version from Zoom Karaoke" "title": "Girl U Want"
}, },
{ {
"artist": "Dexy's Midnight Runners", "artist": "Dexy's Midnight Runners",
@ -232550,15 +232533,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\Diplo & Miguel - Dont Forget My Love.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Diplo & Miguel - Dont Forget My Love.mp4",
"title": "Don't Forget My Love" "title": "Don't Forget My Love"
}, },
{
"artist": "Disney Villians The Musical ft. Maleficent",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "f2b524bf-d2e3-9a7e-a83a-495d79fdea59",
"path": "z://MP4\\ZoomKaraokeOfficial\\Disney Villians The Musical feat Maleficent - Karaoke Version from Zoom Karaoke.mp4",
"title": "Karaoke Version from Zoom Karaoke"
},
{ {
"artist": "Divinyls", "artist": "Divinyls",
"disabled": false, "disabled": false,
@ -238587,7 +238561,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "c9081555-9f9f-7de4-1889-e0fde67dba1c", "guid": "c9081555-9f9f-7de4-1889-e0fde67dba1c",
"path": "z://MP4\\ZoomKaraokeOfficial\\First Aid Kit - Fireworks (No Harmony For Duet) Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\First Aid Kit - Fireworks (No Harmony For Duet) Karaoke Version from Zoom Karaoke.mp4",
"title": "Fireworks (No Harmony For Duet) - Karaoke Version from Zoom Karaoke" "title": "Fireworks (No Harmony For Duet)"
}, },
{ {
"artist": "First Aid Kit", "artist": "First Aid Kit",
@ -242000,15 +241974,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\God Save The King - British National Anthem.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\God Save The King - British National Anthem.mp4",
"title": "British National Anthem" "title": "British National Anthem"
}, },
{
"artist": "God Save The Queen",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "b7d8684d-b7af-28b9-f7cd-8ec9dd16320f",
"path": "z://MP4\\ZoomKaraokeOfficial\\God Save The Queen - Karaoke Version from Zoom Karaoke British National Anthem.mp4",
"title": "Karaoke Version from Zoom Karaoke - British National Anthem"
},
{ {
"artist": "Goldfrapp", "artist": "Goldfrapp",
"disabled": false, "disabled": false,
@ -242036,15 +242001,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\Gonzalez - Havent Stopped Dancing Yet.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Gonzalez - Havent Stopped Dancing Yet.mp4",
"title": "Haven't Stopped Dancing Yet" "title": "Haven't Stopped Dancing Yet"
}, },
{
"artist": "Goodnight Campers from Hi-De-Hi",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "05afcc76-656b-9f6c-986f-c204a08e49eb",
"path": "z://MP4\\ZoomKaraokeOfficial\\Goodnight Campers from Hi-De-Hi - Karaoke Version from Zoom Karaoke.mp4",
"title": "Karaoke Version from Zoom Karaoke"
},
{ {
"artist": "Gorgon City ft. Zak Abel", "artist": "Gorgon City ft. Zak Abel",
"disabled": false, "disabled": false,
@ -244493,15 +244449,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\Imelda May - Train Kept A Rollin.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Imelda May - Train Kept A Rollin.mp4",
"title": "Train Kept A Rollin'" "title": "Train Kept A Rollin'"
}, },
{
"artist": "In The Box (The Goodbye Song)",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "0c4383a2-d65f-a16c-b4b2-1a0fa1189480",
"path": "z://MP4\\ZoomKaraokeOfficial\\In The Box (The Goodbye Song) - Karaoke Version from Zoom Karaoke Australian TV Theme.mp4",
"title": "Karaoke Version from Zoom Karaoke - Australian TV Theme"
},
{ {
"artist": "Infernal", "artist": "Infernal",
"disabled": false, "disabled": false,
@ -246111,7 +246058,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "3ba6129d-ac04-b405-044d-8b453f565165", "guid": "3ba6129d-ac04-b405-044d-8b453f565165",
"path": "z://MP4\\ZoomKaraokeOfficial\\Jeannie C Riley - Harper Valley PTA Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Jeannie C Riley - Harper Valley PTA Karaoke Version from Zoom Karaoke.mp4",
"title": "Harper Valley P.T.A. - Karaoke Version from Zoom Karaoke" "title": "Harper Valley P.T.A."
}, },
{ {
"artist": "Jedward", "artist": "Jedward",
@ -250758,13 +250705,13 @@
"title": "You And I" "title": "You And I"
}, },
{ {
"artist": "Kenny Rogers -You Decorated My Life", "artist": "Kenny Rogers",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
"guid": "dec681a1-bddc-9920-083c-79022832ae3c", "guid": "8bf25cbb-f0b1-b7d1-3095-9bc53b1fd971",
"path": "z://MP4\\ZoomKaraokeOfficial\\Kenny Rogers -You Decorated My Life - Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Kenny Rogers - You Decorated My Life.mp4",
"title": "Karaoke Version from Zoom Karaoke" "title": "You Decorated My Life"
}, },
{ {
"artist": "Kenny Rogers And Dolly Parton", "artist": "Kenny Rogers And Dolly Parton",
@ -257280,7 +257227,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "c67b0bdb-0fdb-79e6-1aa7-d9abb648df62", "guid": "c67b0bdb-0fdb-79e6-1aa7-d9abb648df62",
"path": "z://MP4\\ZoomKaraokeOfficial\\Meghan Trainor - Good To Be Alive Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Meghan Trainor - Good To Be Alive Karaoke Version from Zoom Karaoke.mp4",
"title": "Good To Be Alive - Karaoke Version from Zoom Karaoke" "title": "Good To Be Alive"
}, },
{ {
"artist": "Meghan Trainor", "artist": "Meghan Trainor",
@ -263886,7 +263833,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "33ebdbf6-adb1-bca8-a29a-499a167b53a8", "guid": "33ebdbf6-adb1-bca8-a29a-499a167b53a8",
"path": "z://MP4\\ZoomKaraokeOfficial\\Perry Como - Ave Maria Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Perry Como - Ave Maria Karaoke Version from Zoom Karaoke.mp4",
"title": "Ave Maria - Karaoke Version from Zoom Karaoke" "title": "Ave Maria"
}, },
{ {
"artist": "Perry Como", "artist": "Perry Como",
@ -265785,7 +265732,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "da7260c6-5dab-0a8b-8229-307e1d5d2b2f", "guid": "da7260c6-5dab-0a8b-8229-307e1d5d2b2f",
"path": "z://MP4\\ZoomKaraokeOfficial\\Postmodern Jukebox - We Cant Stop (Without Backing Vocals) Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Postmodern Jukebox - We Cant Stop (Without Backing Vocals) Karaoke Version from Zoom Karaoke.mp4",
"title": "We Can't Stop (Without Backing Vocals) - Karaoke Version from Zoom Karaoke" "title": "We Can't Stop"
}, },
{ {
"artist": "Postmodern Jukebox", "artist": "Postmodern Jukebox",
@ -267675,7 +267622,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "d041e962-e78a-ad5f-e419-b670e47590e4", "guid": "d041e962-e78a-ad5f-e419-b670e47590e4",
"path": "z://MP4\\ZoomKaraokeOfficial\\Rihanna - SOS Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Rihanna - SOS Karaoke Version from Zoom Karaoke.mp4",
"title": "S.O.S. - Karaoke Version from Zoom Karaoke" "title": "S.O.S."
}, },
{ {
"artist": "Rihanna", "artist": "Rihanna",
@ -271617,7 +271564,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "1cbf4200-9369-7551-005b-63420fa9b187", "guid": "1cbf4200-9369-7551-005b-63420fa9b187",
"path": "z://MP4\\ZoomKaraokeOfficial\\Sheppard - Geronimo Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Sheppard - Geronimo Karaoke Version from Zoom Karaoke.mp4",
"title": "Geronimo - Karaoke Version from Zoom Karaoke" "title": "Geronimo"
}, },
{ {
"artist": "Sherbet", "artist": "Sherbet",
@ -271977,7 +271924,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "d10abce9-d119-e81a-4156-48bf66ea9f30", "guid": "d10abce9-d119-e81a-4156-48bf66ea9f30",
"path": "z://MP4\\ZoomKaraokeOfficial\\Showaddywaddy - Why Do Lovers Break Each Others Hearts Karaoke Version From Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Showaddywaddy - Why Do Lovers Break Each Others Hearts Karaoke Version From Zoom Karaoke.mp4",
"title": "Why Do Lovers Break Each Other's Hearts - Karaoke Version From Zoom Karaoke" "title": "Why Do Lovers Break Each Other's Hearts"
}, },
{ {
"artist": "Showaddywaddy", "artist": "Showaddywaddy",
@ -275424,7 +275371,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "50bc6b77-c2b2-bbca-522a-05ec0d55d2bd", "guid": "50bc6b77-c2b2-bbca-522a-05ec0d55d2bd",
"path": "z://MP4\\ZoomKaraokeOfficial\\Take That - SOS Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Take That - SOS Karaoke Version from Zoom Karaoke.mp4",
"title": "S.O.S. - Karaoke Version from Zoom Karaoke" "title": "S.O.S."
}, },
{ {
"artist": "Take That", "artist": "Take That",
@ -281543,7 +281490,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "75c896fe-77ad-690b-f358-dccbda4b5ae8", "guid": "75c896fe-77ad-690b-f358-dccbda4b5ae8",
"path": "z://MP4\\ZoomKaraokeOfficial\\The Mavericks - Here Comes My Baby Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\The Mavericks - Here Comes My Baby Karaoke Version from Zoom Karaoke.mp4",
"title": "Here Comes My Baby - Karaoke Version from Zoom Karaoke" "title": "Here Comes My Baby"
}, },
{ {
"artist": "The Mavericks", "artist": "The Mavericks",
@ -284417,7 +284364,7 @@
"title": "You're No Good" "title": "You're No Good"
}, },
{ {
"artist": "The Tamperer featuring Maya", "artist": "The Tamperer ft. Maya",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -284426,7 +284373,7 @@
"title": "Feel It" "title": "Feel It"
}, },
{ {
"artist": "The Tamperer featuring Maya", "artist": "The Tamperer ft. Maya",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -286475,7 +286422,7 @@
"genre": "Karaoke", "genre": "Karaoke",
"guid": "6e734302-c67e-cfff-c585-65e90e57c9a4", "guid": "6e734302-c67e-cfff-c585-65e90e57c9a4",
"path": "z://MP4\\ZoomKaraokeOfficial\\Tom Jones - Ill Never Fall In Love Again Karaoke Version from Zoom Karaoke.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Tom Jones - Ill Never Fall In Love Again Karaoke Version from Zoom Karaoke.mp4",
"title": "I'll Never Fall In Love Again - Karaoke Version from Zoom Karaoke" "title": "I'll Never Fall In Love Again"
}, },
{ {
"artist": "Tom Jones", "artist": "Tom Jones",
@ -288970,15 +288917,6 @@
"path": "z://MP4\\ZoomKaraokeOfficial\\Wendy Moten - Come In Out Of The Rain.mp4", "path": "z://MP4\\ZoomKaraokeOfficial\\Wendy Moten - Come In Out Of The Rain.mp4",
"title": "Come In Out Of The Rain" "title": "Come In Out Of The Rain"
}, },
{
"artist": "Werewolves Of London/All Summer Long/Sweet Home Alabama Medley",
"disabled": false,
"favorite": false,
"genre": "Karaoke",
"guid": "fec61e62-37f3-7d62-f527-aa58a8a9cb45",
"path": "z://MP4\\ZoomKaraokeOfficial\\Werewolves Of LondonAll Summer LongSweet Home Alabama Medley - Karaoke Version from Zoom Karaoke.mp4",
"title": "Karaoke Version from Zoom Karaoke"
},
{ {
"artist": "West End ft. Sybil", "artist": "West End ft. Sybil",
"disabled": false, "disabled": false,
@ -306804,7 +306742,7 @@
"title": "Metal Postcard" "title": "Metal Postcard"
}, },
{ {
"artist": "Lovato, Demi & Joe Jonas", "artist": "Lavato, Demi & Joe Jonas",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -308673,7 +308611,7 @@
"title": "I Run To You" "title": "I Run To You"
}, },
{ {
"artist": "Lovato, Demi & Joe Jonas", "artist": "Lavato, Demi & Joe Jonas",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"genre": "Karaoke", "genre": "Karaoke",
@ -358620,7 +358558,7 @@
"title": "Green Garden" "title": "Green Garden"
}, },
{ {
"artist": "Little Mix", "artist": "Little M!x",
"disabled": false, "disabled": false,
"favorite": false, "favorite": false,
"guid": "c8a57b4d-070d-0569-f178-edc96ff7b64f", "guid": "c8a57b4d-070d-0569-f178-edc96ff7b64f",

View File

@ -348,7 +348,7 @@ class MusicBrainzDatabase:
best_recording_count = artist['recording_count'] best_recording_count = artist['recording_count']
print(f" 🎯 New best match: {artist['name']} (score: {score}, recordings: {artist['recording_count']})") print(f" 🎯 New best match: {artist['name']} (score: {score}, recordings: {artist['recording_count']})")
if best_score >= 80: if best_score >= 70: # Lower threshold for better matching
print(SUCCESS_MESSAGES['fuzzy_match_found'].format( print(SUCCESS_MESSAGES['fuzzy_match_found'].format(
original=artist_name, original=artist_name,
matched=best_match['name'], matched=best_match['name'],
@ -413,7 +413,7 @@ class MusicBrainzDatabase:
best_score = score best_score = score
best_match = recording best_match = recording
if best_score >= 80: if best_score >= 70: # Lower threshold for better matching
return (best_match['name'], best_match['gid'], best_score / 100.0) return (best_match['name'], best_match['gid'], best_score / 100.0)
else: else:
# No artist constraint - search by title only with all variations # No artist constraint - search by title only with all variations
@ -447,7 +447,7 @@ class MusicBrainzDatabase:
best_score = score best_score = score
best_match = recording best_match = recording
if best_score >= 80: if best_score >= 70: # Lower threshold for better matching
return (best_match['name'], best_match['gid'], best_score / 100.0) return (best_match['name'], best_match['gid'], best_score / 100.0)
return None return None
@ -608,15 +608,30 @@ class MusicBrainzDatabase:
import re import re
# Primary collaboration indicators # Primary collaboration indicators
primary_patterns = ['ft.', 'feat.', 'featuring'] primary_patterns = ['ft.', 'feat.', 'featuring', 'ft', 'feat']
# Secondary collaboration indicators (need more careful handling) # Secondary collaboration indicators (need more careful handling)
secondary_patterns = ['&', 'and'] secondary_patterns = ['&', 'and', ',']
# Check if this is a collaboration # Check if this is a collaboration
is_collaboration = False is_collaboration = False
split_pattern = None split_pattern = None
# Special case: Handle malformed artist names like "ft Jamie Foxx West, Kanye"
# This should be "Kanye West ft. Jamie Foxx"
if artist_string.lower().startswith(('ft ', 'feat ')):
# This is a malformed collaboration string
# Try to extract the actual artists from the rest
remaining = artist_string[artist_string.find(' ') + 1:].strip()
if ',' in remaining:
# Split on comma and reverse the order
parts = [part.strip() for part in remaining.split(',')]
if len(parts) >= 2:
# Assume the last part is the main artist
main_artist = parts[-1].strip()
collaborators = parts[:-1]
return (main_artist, collaborators)
for pattern in primary_patterns: for pattern in primary_patterns:
if pattern.lower() in artist_string.lower(): if pattern.lower() in artist_string.lower():
is_collaboration = True is_collaboration = True
@ -632,6 +647,20 @@ class MusicBrainzDatabase:
# If no primary collaboration found, check secondary patterns # If no primary collaboration found, check secondary patterns
if not is_collaboration: if not is_collaboration:
for pattern in secondary_patterns: for pattern in secondary_patterns:
if pattern == ',':
# Handle comma-separated artists (e.g., "Ariana Grande, Normani, Nicki Minaj")
if ',' in artist_string:
# Count commas to determine if this is likely a collaboration
comma_count = artist_string.count(',')
if comma_count >= 1:
# Split on comma and treat as collaboration
parts = [part.strip() for part in artist_string.split(',')]
if len(parts) >= 2:
# First artist is main, rest are collaborators
main_artist = parts[0]
collaborators = parts[1:]
return (main_artist, collaborators)
else:
# Use word boundaries to avoid splitting within words like "Orlando" # Use word boundaries to avoid splitting within words like "Orlando"
import re import re
pattern_regex = r'\b' + re.escape(pattern) + r'\b' pattern_regex = r'\b' + re.escape(pattern) + r'\b'
@ -711,45 +740,152 @@ class MusicBrainzDatabase:
def _generate_title_variations(self, title: str) -> List[str]: def _generate_title_variations(self, title: str) -> List[str]:
""" """
Generate title variations by removing parenthetical content. Generate title variations by removing parenthetical content and fixing common issues.
Returns list of title variations to try. Returns list of title variations to try.
""" """
import re import re
search_titles = [title.strip()] search_titles = [title.strip()]
# Remove complete parentheses (content) # Fix common typos and missing apostrophes
title_fixes = title.strip()
# Fix missing apostrophes in common contractions
apostrophe_fixes = [
(r'\bDont\b', "Don't"),
(r'\bCant\b', "Can't"),
(r'\bWont\b', "Won't"),
(r'\bArent\b', "Aren't"),
(r'\bIsnt\b', "Isn't"),
(r'\bWasnt\b', "Wasn't"),
(r'\bDidnt\b', "Didn't"),
(r'\bDoesnt\b', "Doesn't"),
(r'\bHavent\b', "Haven't"),
(r'\bHasnt\b', "Hasn't"),
(r'\bWouldnt\b', "Wouldn't"),
(r'\bCouldnt\b', "Couldn't"),
(r'\bShouldnt\b', "Shouldn't"),
(r'\bPhunk\b', "Funk"), # Common typo
(r'\bBout\b', "About"), # Shortened form
]
for pattern, replacement in apostrophe_fixes:
fixed_title = re.sub(pattern, replacement, title_fixes, flags=re.IGNORECASE)
if fixed_title != title_fixes:
title_fixes = fixed_title
if title_fixes not in search_titles:
search_titles.append(title_fixes)
# Comprehensive parentheses removal - try multiple approaches
# 1. Remove all complete parentheses (most aggressive)
clean_title = re.sub(r'\s*\([^)]*\)', '', title.strip()) clean_title = re.sub(r'\s*\([^)]*\)', '', title.strip())
clean_title = clean_title.strip() clean_title = clean_title.strip()
if clean_title != title.strip() and clean_title: if clean_title != title.strip() and clean_title:
search_titles.append(clean_title) search_titles.append(clean_title)
# Remove unmatched opening parenthesis at end # 2. Remove specific common patterns first, then general parentheses
specific_patterns = [
r'\s*\(Karaoke Version\)',
r'\s*\(Karaoke\)',
r'\s*\(Instrumental\)',
r'\s*\(Backing Track\)',
r'\s*\(live [^)]*\)',
r'\s*\(Live [^)]*\)',
r'\s*\(Acoustic\)',
r'\s*\(acoustic\)',
r'\s*\(Without Backing Vocals\)',
r'\s*\(Without Backing Vocals\)',
r'\s*\(Clean\)',
r'\s*\(clean\)',
r'\s*\(Remix\)',
r'\s*\(remix\)',
r'\s*\(Radio Edit\)',
r'\s*\(radio edit\)',
r'\s*\(Extended Mix\)',
r'\s*\(extended mix\)',
r'\s*\(Single Version\)',
r'\s*\(single version\)',
r'\s*\(Album Version\)',
r'\s*\(album version\)',
r'\s*\(Original Mix\)',
r'\s*\(original mix\)',
r'\s*\(John Lewis Christmas Ad \d+\)', # Specific pattern from test
r'\s*\(from the movie [^)]*\)',
r'\s*\(from the [^)]*\)',
r'\s*\(feat\. [^)]*\)',
r'\s*\(featuring [^)]*\)',
r'\s*\(ft\. [^)]*\)',
r'\s*\(duet\)',
r'\s*\(Duet\)',
r'\s*\(Two Semitones Down\)',
r'\s*\(Minus Piano\)',
r'\s*\(Cut Down\)',
r'\s*\(Boone & Speedy Vocals\)',
r'\s*\(My Heart Belongs to You\)',
]
# 3. Remove dash-separated content (like "Live At the BBC")
dash_patterns = [
r'\s*-\s*Live [^-]*$',
r'\s*-\s*live [^-]*$',
r'\s*-\s*Live At [^-]*$',
r'\s*-\s*Live At the [^-]*$',
r'\s*-\s*Live At the BBC$',
r'\s*-\s*Live From [^-]*$',
r'\s*-\s*Live In [^-]*$',
r'\s*-\s*Live On [^-]*$',
]
# Apply specific patterns first
for pattern in specific_patterns:
specific_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
specific_clean = specific_clean.strip()
if specific_clean != title.strip() and specific_clean and specific_clean not in search_titles:
search_titles.append(specific_clean)
# Apply dash patterns
for pattern in dash_patterns:
dash_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
dash_clean = dash_clean.strip()
if dash_clean != title.strip() and dash_clean and dash_clean not in search_titles:
search_titles.append(dash_clean)
# 3. Remove any remaining parentheses after specific patterns
for pattern in specific_patterns:
remaining_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
remaining_clean = re.sub(r'\s*\([^)]*\)', '', remaining_clean.strip())
remaining_clean = remaining_clean.strip()
if remaining_clean != title.strip() and remaining_clean and remaining_clean not in search_titles:
search_titles.append(remaining_clean)
# 4. Remove unmatched opening parenthesis at end
clean_title2 = re.sub(r'\s*\([^)]*$', '', title.strip()) clean_title2 = re.sub(r'\s*\([^)]*$', '', title.strip())
clean_title2 = clean_title2.strip() clean_title2 = clean_title2.strip()
if clean_title2 != title.strip() and clean_title2 and clean_title2 not in search_titles: if clean_title2 != title.strip() and clean_title2 and clean_title2 not in search_titles:
search_titles.append(clean_title2) search_titles.append(clean_title2)
# Remove unmatched closing parenthesis at start # 5. Remove unmatched closing parenthesis at start
clean_title3 = re.sub(r'^[^)]*\)\s*', '', title.strip()) clean_title3 = re.sub(r'^[^)]*\)\s*', '', title.strip())
clean_title3 = clean_title3.strip() clean_title3 = clean_title3.strip()
if clean_title3 != title.strip() and clean_title3 and clean_title3 not in search_titles: if clean_title3 != title.strip() and clean_title3 and clean_title3 not in search_titles:
search_titles.append(clean_title3) search_titles.append(clean_title3)
# Also try with specific karaoke patterns removed # 6. Try removing extra spaces and normalizing
karaoke_patterns = [ normalized_title = re.sub(r'\s+', ' ', title.strip())
r'\s*\(Karaoke Version\)', if normalized_title != title.strip() and normalized_title not in search_titles:
r'\s*\(Karaoke\)', search_titles.append(normalized_title)
r'\s*\(Instrumental\)',
r'\s*\(Backing Track\)',
]
for pattern in karaoke_patterns:
karaoke_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
karaoke_clean = karaoke_clean.strip()
if karaoke_clean != title.strip() and karaoke_clean not in search_titles:
search_titles.append(karaoke_clean)
return search_titles # 7. Apply normalization to all cleaned versions and remove duplicates
normalized_versions = []
for version in search_titles:
# Normalize spaces (replace multiple spaces with single space)
normalized = re.sub(r'\s+', ' ', version.strip())
# Remove leading/trailing spaces
normalized = normalized.strip()
if normalized and normalized not in normalized_versions:
normalized_versions.append(normalized)
return normalized_versions
def _parse_collaborators(self, collaborators_string: str) -> List[str]: def _parse_collaborators(self, collaborators_string: str) -> List[str]:
""" """

View File

@ -10,6 +10,7 @@ import time
import re import re
from pathlib import Path from pathlib import Path
from typing import Dict, Optional, Any, Tuple, List from typing import Dict, Optional, Any, Tuple, List
from datetime import datetime
# Import constants # Import constants
from ..config.constants import ( from ..config.constants import (
@ -170,10 +171,10 @@ class MusicBrainzCleaner:
import re import re
# Primary collaboration indicators # Primary collaboration indicators
primary_patterns = ['ft.', 'feat.', 'featuring'] primary_patterns = ['ft.', 'feat.', 'featuring', 'ft', 'feat']
# Secondary collaboration indicators (need more careful handling) # Secondary collaboration indicators (need more careful handling)
secondary_patterns = ['&', 'and'] secondary_patterns = ['&', 'and', ',']
# Check if this is a collaboration # Check if this is a collaboration
is_collaboration = False is_collaboration = False
@ -366,99 +367,188 @@ class MusicBrainzCleaner:
return song, False return song, False
def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Tuple[Path, List[Dict]]: def process_songs(self, source_file: Path, output_success: Path = None, output_failure: Path = None, limit: Optional[int] = None) -> Dict[str, Any]:
try: """
# Read input file Process songs from source file and save successful and failed songs to separate files.
with open(input_file, 'r', encoding='utf-8') as f: This is the main processing method that handles full dataset processing by default.
songs = json.load(f) """
if not source_file.exists():
print(f'❌ Source file not found: {source_file}')
return {}
if not isinstance(songs, list): print('🚀 Starting song processing...')
print("Error: Input file should contain a JSON array of songs")
return input_file, [] # Load songs
with open(source_file, 'r') as f:
all_songs = json.load(f)
if not isinstance(all_songs, list):
print("Error: Source file should contain a JSON array of songs")
return {}
# Apply limit if specified # Apply limit if specified
if limit is not None: if limit is not None:
songs = songs[:limit] all_songs = all_songs[:limit]
print(f"⚠️ Limiting processing to first {limit} songs") print(f"⚠️ Limiting processing to first {limit} songs")
# Determine output path total_songs = len(all_songs)
if output_file is None: print(f'📊 Total songs to process: {total_songs:,}')
output_file = input_file.parent / f"{input_file.stem}_cleaned.json" print(f'Using {"database" if self.use_database else "API"} connection')
print(f"Processing {len(songs)} songs...")
print(f"Using {'database' if self.use_database else 'API'} connection")
print(PROGRESS_SEPARATOR) print(PROGRESS_SEPARATOR)
# Clean each song # Initialize arrays for batch processing
cleaned_songs = [] successful_songs = []
failed_songs = [] failed_songs = []
success_count = 0
fail_count = 0
for i, song in enumerate(songs, 1): # Statistics tracking
cleaned_song, success = self.clean_song(song) stats = {
cleaned_songs.append(cleaned_song) 'total_processed': 0,
'artists_found': 0,
'recordings_found': 0,
'start_time': time.time()
}
if success: # Process each song
success_count += 1 for i, song in enumerate(all_songs, 1):
print(f"[{i}/{len(songs)}] ✅ PASS") try:
result = self.clean_song(song)
cleaned_song, success = result
artist_found = 'mbid' in cleaned_song
recording_found = 'recording_mbid' in cleaned_song
# Display progress with counter and status
artist_name = song.get('artist', 'Unknown')
title = song.get('title', 'Unknown')
if artist_found and recording_found:
stats['artists_found'] += 1
stats['recordings_found'] += 1
successful_songs.append(cleaned_song)
print(f'[{i:,} of {total_songs:,}] ✅ PASS: {artist_name} - {title}')
else: else:
fail_count += 1 # Keep the original song in failed_songs array (same format as source)
print(f"[{i}/{len(songs)}] ❌ FAIL") failed_songs.append(song)
# Store failed song info for report print(f'[{i:,} of {total_songs:,}] ❌ FAIL: {artist_name} - {title}')
failed_songs.append({
'index': i, stats['total_processed'] += 1
'original_artist': song.get('artist', ''),
'original_title': song.get('title', ''), # Progress update every 100 songs
'cleaned_artist': cleaned_song.get('artist', ''), if i % 100 == 0:
'cleaned_title': cleaned_song.get('title', ''), elapsed = time.time() - stats['start_time']
'has_mbid': 'mbid' in cleaned_song, rate = i / elapsed if elapsed > 0 else 0
'has_recording_mbid': 'recording_mbid' in cleaned_song success_rate = (stats['artists_found'] / i * 100) if i > 0 else 0
}) print(f' 📈 Progress: {i:,}/{total_songs:,} ({i/total_songs*100:.1f}%) - '
f'Success: {success_rate:.1f}% - Rate: {rate:.1f} songs/sec')
except Exception as e:
print(f' ❌ Error processing song {i}: {e}')
# Keep the original song in failed_songs array
failed_songs.append(song)
stats['total_processed'] += 1
# Only add delay for API calls, not database queries # Only add delay for API calls, not database queries
if not self.use_database: if not self.use_database:
time.sleep(API_REQUEST_DELAY) time.sleep(API_REQUEST_DELAY)
# Write output file # Determine output file paths
with open(output_file, 'w', encoding='utf-8') as f: if output_success is None:
json.dump(cleaned_songs, f, indent=2, ensure_ascii=False) output_success = source_file.parent / f"{source_file.stem}-success.json"
if output_failure is None:
output_failure = source_file.parent / f"{source_file.stem}-failure.json"
# Save successful songs (array format, same as source)
with open(output_success, 'w', encoding='utf-8') as f:
json.dump(successful_songs, f, indent=2, ensure_ascii=False)
# Save failed songs (array format, same as source)
with open(output_failure, 'w', encoding='utf-8') as f:
json.dump(failed_songs, f, indent=2, ensure_ascii=False)
# Calculate final statistics
total_time = time.time() - stats['start_time']
# Create human-readable text report
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
report_file = source_file.parent / f"processing_report_{timestamp}.txt"
# Generate failure report
report_file = input_file.parent / f"{input_file.stem}_failure_report.json"
with open(report_file, 'w', encoding='utf-8') as f: with open(report_file, 'w', encoding='utf-8') as f:
json.dump({ f.write("MusicBrainz Data Cleaner - Processing Report\n")
f.write("=" * 50 + "\n\n")
f.write(f"Source File: {source_file}\n")
f.write(f"Processing Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"Processing Time: {total_time:.1f} seconds\n\n")
f.write("SUMMARY\n")
f.write("-" * 20 + "\n")
f.write(f"Total Songs Processed: {stats['total_processed']:,}\n")
f.write(f"Successful Songs: {len(successful_songs):,}\n")
f.write(f"Failed Songs: {len(failed_songs):,}\n")
f.write(f"Success Rate: {(len(successful_songs)/stats['total_processed']*100):.1f}%\n\n")
f.write("DETAILED STATISTICS\n")
f.write("-" * 20 + "\n")
f.write(f"Artists Found: {stats['artists_found']:,}/{stats['total_processed']:,} ({(stats['artists_found']/stats['total_processed']*100):.1f}%)\n")
f.write(f"Recordings Found: {stats['recordings_found']:,}/{stats['total_processed']:,} ({(stats['recordings_found']/stats['total_processed']*100):.1f}%)\n")
f.write(f"Processing Speed: {stats['total_processed'] / total_time:.1f} songs/second\n\n")
f.write("OUTPUT FILES\n")
f.write("-" * 20 + "\n")
f.write(f"Successful Songs: {output_success}\n")
f.write(f"Failed Songs: {output_failure}\n")
f.write(f"Report File: {report_file}\n\n")
if len(failed_songs) > 0:
f.write("FAILED SONGS (First 50)\n")
f.write("-" * 20 + "\n")
for i, song in enumerate(failed_songs[:50], 1):
artist = song.get('artist', 'Unknown')
title = song.get('title', 'Unknown')
f.write(f"{i:3d}. {artist} - {title}\n")
if len(failed_songs) > 50:
f.write(f"... and {len(failed_songs) - 50} more failed songs\n")
# Save detailed JSON report for programmatic access
json_report_file = source_file.parent / f"processing_report_{timestamp}.json"
final_stats = {
'summary': { 'summary': {
'total_songs': len(songs), 'total_tested': stats['total_processed'],
'successful': success_count, 'artists_found': stats['artists_found'],
'failed': fail_count, 'recordings_found': stats['recordings_found'],
'success_rate': f"{(success_count/len(songs)*100):.1f}%" 'failed_count': len(failed_songs),
'artist_success_rate': (stats['artists_found'] / stats['total_processed'] * 100) if stats['total_processed'] > 0 else 0,
'recording_success_rate': (stats['recordings_found'] / stats['total_processed'] * 100) if stats['total_processed'] > 0 else 0,
'processing_time_seconds': total_time,
'songs_per_second': stats['total_processed'] / total_time if total_time > 0 else 0
}, },
'failed_songs': failed_songs 'files': {
}, f, indent=2, ensure_ascii=False) 'source': str(source_file),
'successful_songs': str(output_success),
'failed_songs': str(output_failure),
'text_report': str(report_file),
'json_report': str(json_report_file)
}
}
print(f"\n{PROGRESS_SEPARATOR}") with open(json_report_file, 'w') as f:
print(f"✅ SUCCESS: {success_count} songs") json.dump(final_stats, f, indent=2)
print(f"❌ FAILED: {fail_count} songs")
print(f"📊 SUCCESS RATE: {(success_count/len(songs)*100):.1f}%")
print(f"💾 CLEANED DATA: {output_file}")
print(f"📋 FAILURE REPORT: {report_file}")
return output_file, failed_songs print(f'\n{PROGRESS_SEPARATOR}')
print(f'🎉 Processing completed!')
print(f'📊 Final Results:')
print(f' ⏱️ Total processing time: {total_time:.1f} seconds')
print(f' 🚀 Average speed: {stats["total_processed"] / total_time:.1f} songs/second')
print(f' ✅ Artists found: {stats["artists_found"]:,}/{stats["total_processed"]:,} ({stats["artists_found"]/stats["total_processed"]*100:.1f}%)')
print(f' ✅ Recordings found: {stats["recordings_found"]:,}/{stats["total_processed"]:,} ({stats["recordings_found"]/stats["total_processed"]*100:.1f}%)')
print(f' ❌ Failed songs: {len(failed_songs):,} ({len(failed_songs)/stats["total_processed"]*100:.1f}%)')
print(f'📄 Files saved:')
print(f' ✅ Successful songs: {output_success}')
print(f' ❌ Failed songs: {output_failure}')
print(f' 📋 Text report: {report_file}')
print(f' 📊 JSON report: {json_report_file}')
except FileNotFoundError: return final_stats
print(f"Error: File '{input_file}' not found")
return input_file, []
except json.JSONDecodeError:
print(f"Error: Invalid JSON in file '{input_file}'")
return input_file, []
except Exception as e:
print(f"Error processing file: {e}")
return input_file, []
finally:
# Clean up database connection
if self.use_database and hasattr(self, 'db'):
self.db.disconnect()
def print_help() -> None: def print_help() -> None:
@ -466,25 +556,36 @@ def print_help() -> None:
MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz
USAGE: USAGE:
musicbrainz-cleaner <input_file.json> [output_file.json] [options] musicbrainz-cleaner [options]
ARGUMENTS:
input_file.json JSON file containing array of song objects
output_file.json Optional: Output file for cleaned data
OPTIONS: OPTIONS:
--source FILE Source JSON file (default: data/songs.json)
--output-success FILE Output file for successful songs (default: source-success.json)
--output-failure FILE Output file for failed songs (default: source-failure.json)
--limit N Process only the first N songs (default: all songs)
--use-api Force use of HTTP API instead of direct database access
--test-connection Test connection to MusicBrainz server
--help, -h Show this help message --help, -h Show this help message
--version, -v Show version information --version, -v Show version information
--test-connection Test connection to MusicBrainz server
--limit N Process only the first N songs (for testing)
--use-api Force use of HTTP API instead of direct database access
EXAMPLES: EXAMPLES:
musicbrainz-cleaner songs.json # Process all songs with default settings
musicbrainz-cleaner songs.json cleaned_songs.json musicbrainz-cleaner
# Process specific file
musicbrainz-cleaner --source data/my_songs.json
# Process with custom output files
musicbrainz-cleaner --source data/songs.json --output-success cleaned.json --output-failure failed.json
# Process only first 1000 songs
musicbrainz-cleaner --limit 1000
# Test connection
musicbrainz-cleaner --test-connection musicbrainz-cleaner --test-connection
musicbrainz-cleaner songs.json --limit 5
musicbrainz-cleaner songs.json --use-api # Force API mode
musicbrainz-cleaner --use-api
REQUIREMENTS: REQUIREMENTS:
- MusicBrainz server running on http://localhost:5001 - MusicBrainz server running on http://localhost:5001
@ -501,12 +602,14 @@ PERFORMANCE:
def print_version() -> None: def print_version() -> None:
version_info = """ version_info = """
MusicBrainz Data Cleaner v2.0.0 MusicBrainz Data Cleaner v3.0.0
Enhanced with: Enhanced with:
- Direct PostgreSQL database access - Direct PostgreSQL database access
- Fuzzy search for better matching - Fuzzy search for better matching
- Improved performance and accuracy - Improved performance and accuracy
- Separate output files for successful and failed songs
- Detailed progress tracking and reporting
Copyright (c) 2024 MusicBrainz Data Cleaner Contributors Copyright (c) 2024 MusicBrainz Data Cleaner Contributors
MIT License - see LICENSE file for details MIT License - see LICENSE file for details
@ -516,35 +619,89 @@ Built with Python 3.6+
print(version_info) print(version_info)
def parse_arguments(args: List[str]) -> Dict[str, Any]:
"""Parse command line arguments into a dictionary"""
parsed = {
'source': 'data/songs.json',
'output_success': None,
'output_failure': None,
'limit': None,
'use_api': False,
'test_connection': False,
'help': False,
'version': False
}
i = 0
while i < len(args):
arg = args[i]
if arg in ['--help', '-h', 'help']:
parsed['help'] = True
elif arg in ['--version', '-v', 'version']:
parsed['version'] = True
elif arg == '--test-connection':
parsed['test_connection'] = True
elif arg == '--use-api':
parsed['use_api'] = True
elif arg == '--source':
if i + 1 < len(args) and not args[i + 1].startswith('--'):
parsed['source'] = args[i + 1]
i += 1
else:
print("Error: --source requires a file path")
sys.exit(ExitCode.USAGE_ERROR)
elif arg == '--output-success':
if i + 1 < len(args) and not args[i + 1].startswith('--'):
parsed['output_success'] = args[i + 1]
i += 1
else:
print("Error: --output-success requires a file path")
sys.exit(ExitCode.USAGE_ERROR)
elif arg == '--output-failure':
if i + 1 < len(args) and not args[i + 1].startswith('--'):
parsed['output_failure'] = args[i + 1]
i += 1
else:
print("Error: --output-failure requires a file path")
sys.exit(ExitCode.USAGE_ERROR)
elif arg == '--limit':
if i + 1 < len(args) and not args[i + 1].startswith('--'):
try:
parsed['limit'] = int(args[i + 1])
if parsed['limit'] <= 0:
print("Error: --limit must be a positive number")
sys.exit(ExitCode.USAGE_ERROR)
except ValueError:
print("Error: --limit requires a valid number")
sys.exit(ExitCode.USAGE_ERROR)
i += 1
else:
print("Error: --limit requires a number")
sys.exit(ExitCode.USAGE_ERROR)
i += 1
return parsed
def main() -> int: def main() -> int:
try: try:
args = sys.argv[1:] args = sys.argv[1:]
parsed = parse_arguments(args)
# Handle help and version flags # Handle help and version flags
if not args or args[0] in ['--help', '-h', 'help']: if parsed['help']:
print_help() print_help()
return ExitCode.SUCCESS return ExitCode.SUCCESS
if args[0] in ['--version', '-v', 'version']: if parsed['version']:
print_version() print_version()
return ExitCode.SUCCESS return ExitCode.SUCCESS
# Check for API flag
use_database = '--use-api' not in args
if not use_database:
print("⚠️ Using HTTP API mode (slower than database access)")
# Handle test connection # Handle test connection
if args[0] == '--test-connection': if parsed['test_connection']:
if use_database: if parsed['use_api']:
db = MusicBrainzDatabase()
if db.test_connection():
print("✅ Connection to MusicBrainz database successful")
return ExitCode.SUCCESS
else:
print("❌ Connection to MusicBrainz database failed")
return ExitCode.ERROR
else:
api = MusicBrainzAPIClient() api = MusicBrainzAPIClient()
if api.test_connection(): if api.test_connection():
print("✅ Connection to MusicBrainz API server successful") print("✅ Connection to MusicBrainz API server successful")
@ -552,10 +709,7 @@ def main() -> int:
else: else:
print("❌ Connection to MusicBrainz API server failed") print("❌ Connection to MusicBrainz API server failed")
return ExitCode.ERROR return ExitCode.ERROR
else:
# Check for test connection flag in any position
if '--test-connection' in args:
if use_database:
db = MusicBrainzDatabase() db = MusicBrainzDatabase()
if db.test_connection(): if db.test_connection():
print("✅ Connection to MusicBrainz database successful") print("✅ Connection to MusicBrainz database successful")
@ -563,73 +717,27 @@ def main() -> int:
else: else:
print("❌ Connection to MusicBrainz database failed") print("❌ Connection to MusicBrainz database failed")
return ExitCode.ERROR return ExitCode.ERROR
else:
api = MusicBrainzAPIClient()
if api.test_connection():
print("✅ Connection to MusicBrainz API server successful")
return ExitCode.SUCCESS
else:
print("❌ Connection to MusicBrainz API server failed")
return ExitCode.ERROR
# Validate input file # Process songs (main functionality)
if not args: source_file = Path(parsed['source'])
print("Error: Input file is required") output_success = Path(parsed['output_success']) if parsed['output_success'] else None
print("Use --help for usage information") output_failure = Path(parsed['output_failure']) if parsed['output_failure'] else None
if not source_file.exists():
print(f"Error: Source file does not exist: {source_file}")
return ExitCode.USAGE_ERROR return ExitCode.USAGE_ERROR
# Parse limit argument and remove it from args if not source_file.is_file():
limit = None print(f"Error: Source path is not a file: {source_file}")
args_to_remove = []
for i, arg in enumerate(args):
if arg == '--limit':
if i + 1 < len(args) and not args[i + 1].startswith('--'):
try:
limit = int(args[i + 1])
if limit <= 0:
print("Error: Limit must be a positive number")
return ExitCode.USAGE_ERROR
args_to_remove.extend([i, i + 1])
except ValueError:
print("Error: --limit requires a valid number")
return ExitCode.USAGE_ERROR
else:
print("Error: --limit requires a number")
return ExitCode.USAGE_ERROR return ExitCode.USAGE_ERROR
# Remove limit arguments and API flag from args if source_file.suffix.lower() != '.json':
for index in reversed(args_to_remove): print(f"Error: Source file must be a JSON file: {source_file}")
args.pop(index)
# Remove API flag
args = [arg for arg in args if arg != '--use-api']
# Filter out remaining flags to get file arguments
file_args = [arg for arg in args if not arg.startswith('--')]
if not file_args:
print("Error: Input file is required")
print("Use --help for usage information")
return ExitCode.USAGE_ERROR
input_file = Path(file_args[0])
output_file = Path(file_args[1]) if len(file_args) > 1 else None
if not input_file.exists():
print(f"Error: Input file does not exist: {input_file}")
return ExitCode.USAGE_ERROR
if not input_file.is_file():
print(f"Error: Input path is not a file: {input_file}")
return ExitCode.USAGE_ERROR
if input_file.suffix.lower() != '.json':
print(f"Error: Input file must be a JSON file: {input_file}")
return ExitCode.USAGE_ERROR return ExitCode.USAGE_ERROR
# Process the file # Process the file
cleaner = MusicBrainzCleaner(use_database=use_database) cleaner = MusicBrainzCleaner(use_database=not parsed['use_api'])
result_path, failed_songs = cleaner.clean_songs_file(input_file, output_file, limit) cleaner.process_songs(source_file, output_success, output_failure, parsed['limit'])
return ExitCode.SUCCESS return ExitCode.SUCCESS