Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>

2025-08-01 08:01:07 -05:00 · 9124640bf4
commit 9124640bf4
parent ddbc6a9ebc
6 changed files with 1065 additions and 630 deletions
--- a/COMMANDS.md
+++ b/COMMANDS.md
@ -2,55 +2,86 @@
 ## Overview
-The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database.
+The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool creates separate output files for successful and failed songs, along with detailed processing reports.
 ## Basic Command Structure
 ```bash
-python musicbrainz_cleaner.py <input_file> [output_file] [options]
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
 ```
-## Command Arguments
+## Command Options
-### Required Arguments
+### Main Options
-| Argument | Type | Description | Example |
+| Option | Type | Description | Default | Example |
-|----------|------|-------------|---------|
+|--------|------|-------------|---------|---------|
-| `input_file` | string | Path to the JSON file containing song data | `my_songs.json` |
+| `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
-
+| `--output-success` | string | Output file for successful songs | `source-success.json` | `--output-success cleaned.json` |
-### Optional Arguments
+| `--output-failure` | string | Output file for failed songs | `source-failure.json` | `--output-failure failed.json` |
-
+| `--limit` | number | Process only first N songs | None (all songs) | `--limit 1000` |
-| Argument | Type | Description | Example |
+| `--use-api` | flag | Force use of HTTP API instead of database | Database mode | `--use-api` |
-|----------|------|-------------|---------|
+| `--test-connection` | flag | Test connection to MusicBrainz server | None | `--test-connection` |
-| `output_file` | string | Path for the cleaned output file | `cleaned_songs.json` |
+| `--help` | flag | Show help information | None | `--help` |
-| `--help` | flag | Show help information | `--help` |
+| `--version` | flag | Show version information | None | `--version` |
 | `--version` | flag | Show version information | `--version` |
 ## Command Examples
-### Basic Usage
+### Basic Usage (Default)
 ```bash
-# Clean songs and save to auto-generated filename
+# Process all songs with default settings
-python musicbrainz_cleaner.py songs.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
-# Output: songs_cleaned.json
+# Output: data/songs-success.json and data/songs-failure.json
 ```
-### Custom Output File
+### Custom Source File
 ```bash
-# Specify custom output filename
+# Process specific file
-python musicbrainz_cleaner.py songs.json cleaned_songs.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
 # Output: data/my_songs-success.json and data/my_songs-failure.json
 ```
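The default output names in the examples above appear to be built from the source file's name. A minimal Python sketch of that naming convention, assuming the tool simply appends `-success` and `-failure` before the extension; `default_output_paths` is a hypothetical helper for illustration, not a function from the tool:

```python
from pathlib import Path
from typing import Tuple


def default_output_paths(source: str) -> Tuple[str, str]:
    """Derive success/failure output paths from a source path (illustrative only)."""
    src = Path(source)
    # Keep the directory and extension, insert the suffix after the stem.
    success = src.with_name(f"{src.stem}-success{src.suffix}")
    failure = src.with_name(f"{src.stem}-failure{src.suffix}")
    # as_posix() keeps forward slashes regardless of platform.
    return success.as_posix(), failure.as_posix()


print(default_output_paths("data/my_songs.json"))
```

Whether the real CLI resolves these paths relative to the source directory or the working directory is not stated in this diff, so treat the directory handling here as an assumption.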
 ### Custom Output Files
 ```bash
 # Specify custom output files
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
 ```
 ### Limited Processing
 ```bash
 # Process only first 1000 songs
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
 ```
 ### Force API Mode
 ```bash
 # Use HTTP API instead of database (slower but works without PostgreSQL)
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
 ```
 ### Test Connection
 ```bash
 # Test database connection
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
 # Test API connection
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
 ```
 ### Help and Information
 ```bash
 # Show help information
-python musicbrainz_cleaner.py --help
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
 # Show version information
-python musicbrainz_cleaner.py --version
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
 ```
 ## Input File Format
@ -84,9 +115,13 @@ Any additional fields will be preserved in the output:
 - `path`: File path
 - Any other custom fields
-## Output File Format
+## Output Files
-The output file will contain the same structure with cleaned data and added MBID fields:
+The tool creates **three output files**:
 ### 1. Successful Songs (`source-success.json`)
 Array of successfully processed songs with MBIDs added:
 ```json
 [
@ -103,49 +138,107 @@ The output file will contain the same structure with cleaned data and added MBID
 ]
 ```
-### Added Fields
+### 2. Failed Songs (`source-failure.json`)
 Array of songs that couldn't be processed (same format as source):
 ```json
 [
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
 ]
 ```
 ### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)
 Human-readable text report with statistics and failed song list:
 ```
 MusicBrainz Data Cleaner - Processing Report
 ==================================================
 Source File: data/songs.json
 Processing Date: 2024-12-19 14:30:22
 Processing Time: 15263.3 seconds
 SUMMARY
 --------------------
 Total Songs Processed: 49,170
 Successful Songs: 40,692
 Failed Songs: 8,478
 Success Rate: 82.8%
 DETAILED STATISTICS
 --------------------
 Artists Found: 44,526/49,170 (90.6%)
 Recordings Found: 40,998/49,170 (83.4%)
 Processing Speed: 3.2 songs/second
 OUTPUT FILES
 --------------------
 Successful Songs: data/songs-success.json
 Failed Songs: data/songs-failure.json
 Report File: data/processing_report_20241219_143022.txt
 FAILED SONGS (First 50)
 --------------------
  1. Unknown Artist - Unknown Song
  2. Invalid Artist - Invalid Title
  3. Test Artist - Test Song
 ...
 ```
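The summary figures in the sample report can be reproduced from three inputs: the total count, the successful count, and the elapsed time. The sketch below shows that arithmetic; `summarize` and its field names are hypothetical and only illustrate how the percentages and the songs-per-second rate relate to each other:

```python
def summarize(total: int, successful: int, seconds: float) -> dict:
    """Recompute the report's summary numbers (illustrative, not the tool's code)."""
    failed = total - successful
    return {
        "failed": failed,
        "success_rate": round(100 * successful / total, 1),
        "failure_rate": round(100 * failed / total, 1),
        "songs_per_second": round(total / seconds, 1),
    }


# Values taken from the sample report above.
stats = summarize(total=49_170, successful=40_692, seconds=15_263.3)
print(stats)
```

With the sample values this yields 8,478 failed songs, an 82.8% success rate, and 3.2 songs/second, matching the report.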
 ### Added Fields (Successful Songs Only)
 - `mbid`: MusicBrainz Artist ID (string)
 - `recording_mbid`: MusicBrainz Recording ID (string)
-## Command Line Options
+## Processing Output
-### Help Option
+### Progress Indicators
-```bash
+```
-python musicbrainz_cleaner.py --help
+🚀 Starting song processing...
 📊 Total songs to process: 49,170
 Using database connection
 ==================================================
 [1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
 [2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
 [3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
  📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
 ==================================================
 🎉 Processing completed!
 📊 Final Results:
  ⏱️  Total processing time: 15263.3 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 44,526/49,170 (90.6%)
  ✅ Recordings found: 40,998/49,170 (83.4%)
  ❌ Failed songs: 8,478 (17.2%)
 📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json
 ```
-**Output:**
+### Status Indicators
 ```
 Usage: python musicbrainz_cleaner.py <input_file.json> [output_file.json]
-MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz
+| Symbol | Meaning | Description |
-
+|--------|---------|-------------|
-Arguments:
+| ✅ | Success | Song processed successfully with MBIDs found |
-  input_file.json    JSON file containing array of song objects
+| ❌ | Failure | Song processing failed (no MBIDs found) |
-  output_file.json   Optional: Output file for cleaned data
+| 📈 | Progress | Progress update with statistics |
-                     (default: input_file_cleaned.json)
+| 🚀 | Start | Processing started |
-
+| 🎉 | Complete | Processing completed successfully |
 Examples:
  python musicbrainz_cleaner.py songs.json
  python musicbrainz_cleaner.py songs.json cleaned_songs.json
 Requirements:
  - MusicBrainz server running on http://localhost:5001
  - Python 3.6+ with requests library
 ```
 ### Version Option
 ```bash
 python musicbrainz_cleaner.py --version
 ```
 **Output:**
 ```
 MusicBrainz Data Cleaner v1.0.0
 ```
 ## Error Messages and Exit Codes
@ -161,7 +254,7 @@ MusicBrainz Data Cleaner v1.0.0
 #### File Not Found
 ```
-Error: File 'songs.json' not found
+Error: Source file does not exist: data/songs.json
 ```
 #### Invalid JSON
@ -171,12 +264,12 @@ Error: Invalid JSON in file 'songs.json'
 #### Invalid Input Format
 ```
-Error: Input file should contain a JSON array of songs
+Error: Source file should contain a JSON array of songs
 ```
 #### Connection Error
 ```
-Error searching for artist 'Artist Name': Connection refused
+❌ Connection to MusicBrainz database failed
 ```
 #### Missing Dependencies
@ -184,112 +277,95 @@ Error searching for artist 'Artist Name': Connection refused
 ModuleNotFoundError: No module named 'requests'
 ```
-## Processing Output
+## Environment Configuration
-### Progress Indicators
+### Docker Environment
-```
+The tool runs in a Docker container with the following configuration:
 Processing 3 songs...
 ==================================================
 [1/3] Processing: ACDC - Shot In The Dark
  ✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1)
  ✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db)
  ✅ Updated to: AC/DC - Shot in the Dark
 [2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix
  ❌ Could not find artist: Bruno Mars ft. Cardi B
 [3/3] Processing: Taylor Swift - Love Story
  ✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
  ✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
  ✅ Updated to: Taylor Swift - Love Story
 ==================================================
 ✅ Processing complete!
 📁 Output saved to: songs_cleaned.json
 ```
 ### Status Indicators
 | Symbol | Meaning | Description |
 |--------|---------|-------------|
 | ✅ | Success | Operation completed successfully |
 | ❌ | Error | Operation failed |
 | 🔄 | Processing | Currently processing |
 ## Batch Processing
 ### Multiple Files
 To process multiple files, you can use shell scripting:
 ```bash
 # Process all JSON files in current directory
 for file in *.json; do
    python musicbrainz_cleaner.py "$file"
 done
 ```
 ### Large Files
 For large files, the tool processes songs one at a time with a 0.1-second delay between API calls to be respectful to the MusicBrainz server.
 ## Environment Variables
 The tool uses the following default configuration:
 | Setting | Default | Description |
 |---------|---------|-------------|
-| MusicBrainz URL | `http://localhost:5001` | Local MusicBrainz server URL |
+| Database Host | `db` | PostgreSQL database container |
-| API Delay | `0.1` seconds | Delay between API calls |
+| Database Port | `5432` | PostgreSQL port |
 | Database Name | `musicbrainz_db` | MusicBrainz database name |
 | API URL | `http://localhost:5001` | MusicBrainz web server URL |
 ### Environment Variables
 ```bash
 # Database configuration
 DB_HOST=db
 DB_PORT=5432
 DB_NAME=musicbrainz_db
 DB_USER=musicbrainz
 DB_PASSWORD=musicbrainz
 # Web server configuration
 MUSICBRAINZ_WEB_SERVER_PORT=5001
 ```
 ## Troubleshooting Commands
 ### Check MusicBrainz Server Status
 ```bash
-# Test if server is running
+# Test if web server is running
 curl -I http://localhost:5001
-# Test API endpoint
+# Test database connection
-curl http://localhost:5001/ws/2/artist/?query=name:AC/DC&fmt=json
+docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
 ```
 ### Validate JSON File
 ```bash
 # Check if JSON is valid
-python -m json.tool songs.json
+python -m json.tool data/songs.json
 # Check JSON structure
-python -c "import json; data=json.load(open('songs.json')); print('Valid JSON array with', len(data), 'items')"
+python -c "import json; data=json.load(open('data/songs.json')); print('Valid JSON array with', len(data), 'items')"
 ```
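For a slightly stricter check than `json.tool`, the structure can be validated in a few lines of Python. This is a sketch under assumptions: it treats `artist` and `title` as the required keys (the full input format section is not visible in this diff), and `load_songs` is a hypothetical helper rather than part of the tool:

```python
import json


def load_songs(text: str) -> list:
    """Parse JSON text and check it is an array of objects with artist and title."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("Source file should contain a JSON array of songs")
    for index, song in enumerate(data):
        # Assumed required keys; extra fields are allowed and left untouched.
        if not isinstance(song, dict) or "artist" not in song or "title" not in song:
            raise ValueError(f"Entry {index} is missing 'artist' or 'title'")
    return data


sample = '[{"artist": "ACDC", "title": "Shot In The Dark"}]'
print(len(load_songs(sample)), "song(s) look valid")
```

To check a real file, pass its contents, for example `load_songs(open("data/songs.json", encoding="utf-8").read())`.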
-### Check Python Dependencies
+### Test Tool Connection
 ```bash
-# Check if requests is installed
+# Test database connection
-python -c "import requests; print('requests version:', requests.__version__)"
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
-# Install if missing
+# Test API connection
-pip install requests
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
 ```
 ## Advanced Usage
-### Custom MusicBrainz Server
+### Batch Processing
-To use a different MusicBrainz server, modify the script:
+To process multiple files, you can use shell scripting:
-```python
+```bash
-# In musicbrainz_cleaner.py, change:
+# Process all JSON files in data directory
-self.base_url = "http://your-server:5001"
+for file in data/*.json; do
    docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
 done
 ```
-### Verbose Output
+### Large Files
-For debugging, you can modify the script to add more verbose output by uncommenting debug print statements.
+For large files, the tool processes songs efficiently with:
 - Direct database access for maximum speed
 - Progress tracking every 100 songs
 - Memory-efficient processing
 - No rate limiting with database access
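The periodic progress line can be pictured as a small formatting step that runs every 100 songs. The sketch below is an assumption about how such a line might be built, with the emoji omitted; `progress_line` and `REPORT_EVERY` are hypothetical names, not identifiers from the tool:

```python
REPORT_EVERY = 100  # assumed interval, taken from "Progress tracking every 100 songs"


def progress_line(done: int, total: int, successes: int, elapsed_seconds: float) -> str:
    """Format one progress update in the style of the sample output."""
    percent = 100 * done / total
    success_rate = 100 * successes / done
    rate = done / elapsed_seconds
    return (
        f"Progress: {done:,}/{total:,} ({percent:.1f}%) "
        f"- Success: {success_rate:.1f}% - Rate: {rate:.1f} songs/sec"
    )


line = progress_line(done=100, total=49_170, successes=85, elapsed_seconds=31.25)
print(line)
```

In a processing loop, a check such as `if done % REPORT_EVERY == 0` would decide when to print the line.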
 ### Custom Processing
 ```bash
 # Process only the first 1000 songs (for testing)
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000
 # Process with custom output files
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json
 ```
 ## Command Line Shortcuts
@ -299,18 +375,22 @@ Add these to your shell profile for convenience:
 ```bash
 # Add to ~/.bashrc or ~/.zshrc
-alias mbclean='python musicbrainz_cleaner.py'
+alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
-alias mbclean-help='python musicbrainz_cleaner.py --help'
+alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
 alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
 ```
 ### Usage with Aliases
 ```bash
 # Using alias
-mbclean songs.json
+mbclean --source data/songs.json
 # Show help
 mbclean-help
 # Test connection
 mbclean-test
 ```
 ## Integration Examples
@ -319,8 +399,8 @@ mbclean-help
 ```bash
 # Process files and commit changes
-python musicbrainz_cleaner.py songs.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
-git add songs_cleaned.json
+git add data/songs-success.json data/songs-failure.json
 git commit -m "Clean song metadata with MusicBrainz IDs"
 ```
@ -328,7 +408,7 @@ git commit -m "Clean song metadata with MusicBrainz IDs"
 ```bash
 # Add to crontab to process files daily
-0 2 * * * cd /path/to/musicbrainz-cleaner && python musicbrainz_cleaner.py /path/to/songs.json
+0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
 ```
 ### With Shell Scripts
@ -337,13 +417,18 @@ git commit -m "Clean song metadata with MusicBrainz IDs"
 #!/bin/bash
 # clean_songs.sh
 INPUT_FILE="$1"
-OUTPUT_FILE="${INPUT_FILE%.json}_cleaned.json"
+OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
 OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"
-python musicbrainz_cleaner.py "$INPUT_FILE" "$OUTPUT_FILE"
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
    --source "$INPUT_FILE" \
    --output-success "$OUTPUT_SUCCESS" \
    --output-failure "$OUTPUT_FAILURE"
 if [ $? -eq 0 ]; then
-    echo "Successfully cleaned $INPUT_FILE"
+    echo "Successfully processed $INPUT_FILE"
-    echo "Output saved to $OUTPUT_FILE"
+    echo "Successful songs: $OUTPUT_SUCCESS"
    echo "Failed songs: $OUTPUT_FAILURE"
 else
    echo "Error processing $INPUT_FILE"
    exit 1
@ -354,7 +439,10 @@ fi
 | Command | Description |
 |---------|-------------|
-| `python musicbrainz_cleaner.py file.json` | Basic usage |
+| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
-| `python musicbrainz_cleaner.py file.json output.json` | Custom output |
+| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
-| `python musicbrainz_cleaner.py --help` | Show help |
+| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
-| `python musicbrainz_cleaner.py --version` | Show version | 
+| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
 | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
 | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
 | `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version | 
--- a/PRD.md
+++ b/PRD.md
@ -38,13 +38,16 @@ docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*)
 docker-compose run --rm musicbrainz-cleaner python3 -c "from src.api.database import MusicBrainzDatabase; db = MusicBrainzDatabase(); print('Connection result:', db.connect())"
 ```
-### 4. Run Tests
+### 4. Run the Cleaner
 ```bash
-# Test 100 random songs
+# Process all songs with default settings
-docker-compose run --rm musicbrainz-cleaner python3 test_100_random.py
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
-# Or other test scripts
+# Process with custom options
-docker-compose run --rm musicbrainz-cleaner python3 [script_name].py
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json --limit 1000
 # Test connection
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
 ```
 **⚠️ Critical**: Always run scripts via Docker - the cleaner cannot connect to the database directly from outside the container.
@ -119,10 +122,15 @@ Users have song data in JSON format with inconsistent artist names, song titles,
 #### 6. CLI Interface
 - **REQ-034:** Command-line interface with argument parsing
- **REQ-035:** Support for input and optional output file specification
+- **REQ-035:** Support for source file specification with smart defaults
- **REQ-036:** Progress reporting during processing
+- **REQ-036:** Progress reporting during processing with song counter
 - **REQ-037:** Error handling and user-friendly messages
 - **REQ-038:** Option to force API mode with `--use-api` flag
 - **NEW REQ-039:** Simplified CLI with default full dataset processing
 - **NEW REQ-040:** Separate output files for successful and failed songs (array format)
 - **NEW REQ-041:** Human-readable text report with statistics
 - **NEW REQ-042:** Configurable processing limits and output file paths
 - **NEW REQ-043:** Smart defaults for all file paths and options
 ### ✅ Non-Functional Requirements
@ -170,7 +178,11 @@ src/
 │   ├── __init__.py
 │   └── constants.py    # Constants and settings
 ├── core/               # Core functionality
-├── utils/              # Utility functions
+├── tests/              # Test files and scripts
 │   ├── __init__.py
 │   ├── test_*.py       # Unit and integration tests
 │   └── debug_*.py      # Debug scripts
 └── utils/              # Utility functions
 ```
 ### Architectural Principles
@ -183,6 +195,7 @@ src/
 - **Fallback Strategy**: Automatic fallback to API when database unavailable
 - **NEW**: **Database-First**: Always use live database data over static caches
 - **NEW**: **Intelligent Collaboration Detection**: Distinguish band names from collaborations
 - **NEW**: **Test Organization**: All test files must be placed in `src/tests/` directory, not in root
 ### Data Flow
 1. Read JSON input file
@ -224,6 +237,13 @@ src/
 - Manual configuration needed for custom artist/recording mappings
 - **NEW**: Some edge cases may require manual intervention (data quality issues)
 ### Test File Organization
 - **REQUIRED**: All test files must be placed in `src/tests/` directory
 - **PROHIBITED**: Test files should not be placed in the root directory
 - **Naming Convention**: Test files should follow `test_*.py` or `debug_*.py` patterns
 - **Purpose**: Keeps root directory clean and organizes test code properly
 - **Import Path**: Tests can import from parent modules using relative imports
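The placement rule above could be enforced with a small check that looks for stray test or debug scripts in the project root. This is a sketch only; `misplaced_test_files` is a hypothetical helper, and the demo builds a throwaway directory instead of touching a real checkout:

```python
import tempfile
from pathlib import Path


def misplaced_test_files(project_root: str) -> list:
    """Return names of test_*.py / debug_*.py files sitting directly in the root."""
    root = Path(project_root)
    patterns = ("test_*.py", "debug_*.py")
    # Path.glob without "**" only matches direct children of the root.
    return sorted(p.name for pattern in patterns for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "tests").mkdir(parents=True)
    (root / "src" / "tests" / "test_ok.py").touch()  # correct location
    (root / "test_stray.py").touch()                 # violates the rule
    found = misplaced_test_files(tmp)
    print(found)
```

A non-empty result would indicate files that should be moved into `src/tests/`.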
 ## Server Setup Requirements
 ### MusicBrainz Server Configuration
@ -295,6 +315,11 @@ docker-compose logs -f musicbrainz
 - [x] **NEW**: Band name vs collaboration distinction
 - [x] **NEW**: Complex collaboration parsing
 - [x] **NEW**: Removed problematic known_artists cache
 - [x] **NEW**: Simplified CLI with default full dataset processing
 - [x] **NEW**: Separate output files for successful and failed songs (array format)
 - [x] **NEW**: Human-readable text reports with statistics
 - [x] **NEW**: Smart defaults for all file paths and options
 - [x] **NEW**: Configurable processing limits and output file paths
 ### 🔄 Future Enhancements
 - [ ] Web interface option
@ -396,14 +421,17 @@ pip install -r requirements.txt
 ### Usage
 ```bash
-# Use database access (recommended, faster)
+# Process all songs with default settings (recommended)
-python musicbrainz_cleaner.py input.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
 # Process specific file with custom options
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json --limit 1000
 # Force API mode (slower, fallback)
-python musicbrainz_cleaner.py input.json --use-api
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
 # Test connections
-python musicbrainz_cleaner.py --test-connection
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
 ```
 ## Maintenance
--- a/README.md
+++ b/README.md
@ -265,8 +265,13 @@ Before running tests, verify:
 ## 🧪 Testing
-Run the test suite to verify everything works correctly:
+### Test File Organization
 - **REQUIRED**: All test files must be placed in `src/tests/` directory
 - **PROHIBITED**: Test files should not be placed in the root directory
 - **Naming Convention**: Test files should follow `test_*.py` or `debug_*.py` patterns
 - **Purpose**: Keeps root directory clean and organizes test code properly
 ### Running Tests
 ```bash
 # Run all tests
 python3 src/tests/run_tests.py
@ -288,7 +293,25 @@ python3 src/tests/run_tests.py --list
 - **Integration Tests**: Test interactions between components and database
 - **Debug Tests**: Debug scripts and troubleshooting tools
-## 📁 Data Files
+## 📁 Project Structure
 ```
 musicbrainz-cleaner/
 ├── src/
 │   ├── api/                 # Database and API access
 │   ├── cli/                 # Command-line interface
 │   ├── config/              # Configuration and constants
 │   ├── core/                # Core functionality
 │   ├── tests/               # Test files (REQUIRED location)
 │   └── utils/               # Utility functions
 ├── data/                    # Data files and output
 │   ├── known_artists.json   # Name variations (ACDC → AC/DC)
 │   ├── known_recordings.json # Known recording MBIDs
 │   └── songs.json           # Source songs file
 └── docker-compose.yml       # Docker configuration
 ```
 ### Data Files
 The tool uses external JSON files for name variations:
@ -342,40 +365,61 @@ These files can be easily updated without touching the code, making it simple to
 - **Numbers**: "98 Degrees", "S Club 7", "3 Doors Down"
 - **Special Characters**: "a-ha", "The B-52s", "Salt-N-Pepa"
 ### 🆕 Simplified Processing
 - **Default Behavior**: Process all songs by default (no special flags needed)
 - **Separate Output Files**: Successful and failed songs saved to different files
 - **Progress Tracking**: Real-time progress with song counter and status
 - **Smart Defaults**: Sensible defaults for all file paths and options
 - **Detailed Reporting**: Comprehensive statistics and processing report
 - **Batch Processing**: Efficient handling of large song collections
 ## 📖 Usage Examples
-### Basic Usage
+### Basic Usage (Default)
 ```bash
-# Clean your songs and save to auto-generated filename
+# Process all songs with default settings (data/songs.json)
-python musicbrainz_cleaner.py my_songs.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
-# Output: my_songs_cleaned.json
+# Output: data/songs-success.json and data/songs-failure.json
 ```
-### Custom Output File
+### Custom Source File
 ```bash
-# Specify your own output filename
+# Process specific file
-python musicbrainz_cleaner.py my_songs.json cleaned_songs.json
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
 # Output: data/my_songs-success.json and data/my_songs-failure.json
 ```
 ### Custom Output Files
 ```bash
 # Specify custom output files
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
 ```
 ### Limit Processing
 ```bash
 # Process only first 1000 songs
 docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
 ```
 ### Force API Mode
 ```bash
 # Use HTTP API instead of database (slower but works without PostgreSQL)
-python musicbrainz_cleaner.py my_songs.json --use-api
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
 ```
 ### Test Connections
 ```bash
 # Test database connection
-python musicbrainz_cleaner.py --test-connection
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
 # Test with API mode
-python musicbrainz_cleaner.py --test-connection --use-api
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
 ```
 ### Help
 ```bash
 # Show usage information
-python musicbrainz_cleaner.py --help
+docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
 ```
 ## 📁 Data Files
@ -406,7 +450,10 @@ Your JSON file should contain an array of song objects:
 ## 📤 Output Format
-The tool will update your objects with corrected data:
+The tool creates **three output files**:
 ### 1. Successful Songs (`source-success.json`)
 Array of successfully processed songs with MBIDs added:
 ```json
 [
@ -433,39 +480,123 @@ The tool will update your objects with corrected data:
 ]
 ```
 ### 2. Failed Songs (`source-failure.json`)
 Array of songs that couldn't be processed (same format as source):
 ```json
 [
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
 ]
 ```
 ### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)
 Human-readable text report with statistics and failed song list:
 ```
 MusicBrainz Data Cleaner - Processing Report
 ==================================================
 Source File: data/songs.json
 Processing Date: 2024-12-19 14:30:22
 Processing Time: 15263.3 seconds
 SUMMARY
 --------------------
 Total Songs Processed: 49,170
 Successful Songs: 40,692
 Failed Songs: 8,478
 Success Rate: 82.8%
 DETAILED STATISTICS
 --------------------
 Artists Found: 44,526/49,170 (90.6%)
 Recordings Found: 40,998/49,170 (83.4%)
 Processing Speed: 3.2 songs/second
 OUTPUT FILES
 --------------------
 Successful Songs: data/songs-success.json
 Failed Songs: data/songs-failure.json
 Report File: data/processing_report_20241219_143022.txt
 FAILED SONGS (First 50)
 --------------------
  1. Unknown Artist - Unknown Song
  2. Invalid Artist - Invalid Title
  3. Test Artist - Test Song
 ...
 ```
 ## 🎬 Example Run
 ### Basic Processing
 ```bash
-$ python musicbrainz_cleaner.py data/sample_songs.json
+$ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
-Processing 3 songs...
+🚀 Starting song processing...
 📊 Total songs to process: 49,170
 Using database connection
 ==================================================
-[1/3] Processing: ACDC - Shot In The Dark
+[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
-  🎯 Fuzzy match found: ACDC → AC/DC (score: 0.85)
+[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
-  ✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1)
+[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
-  🎯 Fuzzy match found: Shot In The Dark → Shot in the Dark (score: 0.92)
+[4 of 49,170] ✅ PASS: Taylor Swift - Love Story
-  ✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db)
+...
  ✅ Updated to: AC/DC - Shot in the Dark
-[2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix
+  📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
-  🎯 Fuzzy match found: Bruno Mars → Bruno Mars (score: 1.00)
+  📈 Progress: 200/49,170 (0.4%) - Success: 87.5% - Rate: 3.1 songs/sec
-  ✅ Found artist: Bruno Mars (MBID: afb680f2-b6eb-4cd7-a70b-a63b25c763d5)
+  ...
  🎯 Fuzzy match found: Finesse Remix → Finesse (remix) (score: 0.88)
  ✅ Found recording: Finesse (remix) (MBID: 8ed14014-547a-4128-ab81-c2dca7ae198e)
  ✅ Updated to: Bruno Mars feat. Cardi B - Finesse (remix)
 [3/3] Processing: Taylor Swift - Love Story
  🎯 Fuzzy match found: Taylor Swift → Taylor Swift (score: 1.00)
  ✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
  🎯 Fuzzy match found: Love Story → Love Story (score: 1.00)
  ✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
  ✅ Updated to: Taylor Swift - Love Story
 ==================================================
-✅ Processing complete!
+🎉 Processing completed!
-📁 Output saved to: data/sample_songs_cleaned.json
+📊 Final Results:
  ⏱️  Total processing time: 15263.3 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 44,526/49,170 (90.6%)
  ✅ Recordings found: 40,998/49,170 (83.4%)
  ❌ Failed songs: 8,478 (17.2%)
 📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json
 ```
 ### Limited Processing
 ```bash
 $ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
 ⚠️  Limiting processing to first 1000 songs
 🚀 Starting song processing...
 📊 Total songs to process: 1,000
 Using database connection
 ==================================================
 [1 of 1,000] ✅ PASS: ACDC - Shot In The Dark
 [2 of 1,000] ❌ FAIL: Unknown Artist - Unknown Song
 ...
 ==================================================
 🎉 Processing completed!
 📊 Final Results:
  ⏱️  Total processing time: 312.5 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 856/1,000 (85.6%)
  ✅ Recordings found: 789/1,000 (78.9%)
  ❌ Failed songs: 211 (21.1%)
 📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json
 ```
 ## 🔧 Troubleshooting
@ -601,6 +732,12 @@ This tool is provided as-is for educational and personal use.
 - **Fuzzy search thresholds** need tuning for different datasets
 - **Connection pooling** would improve performance for large datasets
 ### CLI Design
 - **Simplified interface** with smart defaults reduces complexity
 - **Array format consistency** makes output files easier to work with
 - **Human-readable reports** improve user experience
 - **Test file organization** keeps project structure clean
 ---
 **Happy cleaning! 🎵✨**
--- a/data/songs.json
+++ b/data/songs.json
@ -63689,6 +63689,14 @@
    "path": "z://MP4\\KaraFun Karaoke\\Karaoke Zoom - The Commodores.mp4",
    "title": "Zoom"
  },
  {
    "artist": "Kat DeLuna feat. Elephant Man",
    "disabled": false,
    "favorite": false,
    "guid": "a5487de7-4ec6-d6bb-7e88-6ec275133a52",
    "path": "z://MP4\\KaraFun Karaoke\\Kat DeLuna feat. Elephant Man - Whine Up.mp4",
    "title": "Whine Up"
  },
  {
    "artist": "Marillion",
    "disabled": false,
@ -69378,14 +69386,6 @@
    "path": "z://MP4\\KaraFun Karaoke\\Whenever, Wherever - Shakira   Karaoke Version   KaraFun.mp4",
    "title": "Whenever, Wherever"
  },
  {
    "artist": "Whine Up",
    "disabled": false,
    "favorite": false,
    "guid": "04f64889-07cc-2811-bf7d-3fa235859e25",
    "path": "z://MP4\\KaraFun Karaoke\\Whine Up - Kat DeLuna feat. Elephant Man   Karaoke Version   KaraFun.mp4",
    "title": "Kat DeLuna feat. Elephant Man   Karaoke Version   KaraFun"
  },
  {
    "artist": "Michael Bublé",
    "disabled": false,
@ -169868,7 +169868,7 @@
    "title": "Girl Crush"
  },
  {
-    "artist": "Little Mix",
+    "artist": "Little M!x",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -169877,7 +169877,7 @@
    "title": "How Ya Doin'"
  },
  {
-    "artist": "Little Mix",
+    "artist": "Little M!x",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -200306,6 +200306,14 @@
    "path": "z://MP4\\Stingray Karaoke\\Bartender  Lady Antebellum   Karaoke with Lyrics.mp4",
    "title": "Bartender"
  },
  {
    "artist": "Bastille",
    "disabled": false,
    "favorite": false,
    "guid": "2ca64591-9d7c-e975-5340-ca1804195902",
    "path": "z://MP4\\Stingray Karaoke\\Bastille - Pompeii.mp4",
    "title": "Pompeii"
  },
  {
    "artist": "No Doubt",
    "disabled": false,
@ -200549,6 +200557,14 @@
    "path": "z://MP4\\Stingray Karaoke\\Black Sabbath - Paranoid.mp4",
    "title": "Paranoid"
  },
  {
    "artist": "Black Sabbath",
    "disabled": false,
    "favorite": false,
    "guid": "b1e7d35c-6682-546e-4b29-5829d3343899",
    "path": "z://MP4\\Stingray Karaoke\\Black Sabbath - Snowblind.mp4",
    "title": "Snowblind"
  },
  {
    "artist": "Elton John",
    "disabled": false,
@ -202891,6 +202907,14 @@
    "path": "z://MP4\\Stingray Karaoke\\Jingle Bell Rock  Bobby Helms   Karaoke with Lyrics.mp4",
    "title": "Jingle Bell Rock"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
    "favorite": false,
    "guid": "1b67db72-da96-f56f-2e38-c6ace52dfc1a",
    "path": "z://MP4\\Stingray Karaoke\\John Lennon - Give Peace A Chance.mp4",
    "title": "Give Peace A Chance"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
@ -202987,38 +203011,6 @@
    "path": "z://MP4\\Stingray Karaoke\\Justin Bieber - Mistletoe (Karaoke Version).mp4",
    "title": "Mistletoe"
  },
  {
    "artist": "Bastille",
    "disabled": false,
    "favorite": false,
    "guid": "e270d6ed-4e3e-9db2-d5ff-a06e1cc6b7d3",
    "path": "z://MP4\\Stingray Karaoke\\Karaoke Version Pompeii in the Style of  Bastille  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Version Pompeii"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
    "favorite": false,
    "guid": "caa7c825-12a7-57b7-fafd-4e259851bd54",
    "path": "z://MP4\\Stingray Karaoke\\Karaoke Video Give Peace A Chance in the Style of  John Lennon  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video Give Peace A Chance"
  },
  {
    "artist": "Peggy Lee",
    "disabled": false,
    "favorite": false,
    "guid": "532b8b6d-4fa3-b5b4-3879-765d6bc223e2",
    "path": "z://MP4\\Stingray Karaoke\\Karaoke Video Sing Fever in the Style of  Peggy Lee  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video Sing Fever"
  },
  {
    "artist": "Black Sabbath",
    "disabled": false,
    "favorite": false,
    "guid": "4dae1287-ffd5-7d98-3b39-21cc172bebd5",
    "path": "z://MP4\\Stingray Karaoke\\Karaoke Video Snowblind in the Style of  Black Sabbath  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video Snowblind"
  },
  {
    "artist": "Katy Perry",
    "disabled": false,
@ -204138,6 +204130,14 @@
    "path": "z://MP4\\Stingray Karaoke\\Pearl Jam - Black (Karaoke Version).mp4",
    "title": "Black"
  },
  {
    "artist": "Peggy Lee",
    "disabled": false,
    "favorite": false,
    "guid": "bac4bc08-7c95-f059-8abe-a723960cd2aa",
    "path": "z://MP4\\Stingray Karaoke\\Peggy Lee - Sing Fever.mp4",
    "title": "Sing Fever"
  },
  {
    "artist": "Pentatonix",
    "disabled": false,
@ -207329,6 +207329,14 @@
    "path": "z://MP4\\TheKARAOKEChannel\\Birthday in the Style of  The Beatles  karaoke video with lyrics (no lead vocal).mp4",
    "title": "Birthday"
  },
  {
    "artist": "Black Sabbath",
    "disabled": false,
    "favorite": false,
    "guid": "b0dfc4cd-8a54-0db4-378a-266d8e14e882",
    "path": "z://MP4\\TheKARAOKEChannel\\Black Sabbath - Snowblind.mp4",
    "title": "Snowblind"
  },
  {
    "artist": "Alannah Myles",
    "disabled": false,
@ -208913,6 +208921,14 @@
    "path": "z://MP4\\TheKARAOKEChannel\\Green Day - I Fought The Law.mp4",
    "title": "I Fought The Law"
  },
  {
    "artist": "Gretchen Wilson",
    "disabled": false,
    "favorite": false,
    "guid": "660cec43-1346-f165-7f30-ada5d336d123",
    "path": "z://MP4\\TheKARAOKEChannel\\Gretchen Wilson - Here For The Party.mp4",
    "title": "Here For The Party"
  },
  {
    "artist": "Miranda Lambert",
    "disabled": false,
@ -209841,6 +209857,14 @@
    "path": "z://MP4\\TheKARAOKEChannel\\John Legend - All Of Me (Lyrics).mp4",
    "title": "All Of Me"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
    "favorite": false,
    "guid": "a9a33863-57a0-01f2-0060-386e0f3cdc32",
    "path": "z://MP4\\TheKARAOKEChannel\\John Lennon - Give Peace A Chance.mp4",
    "title": "Give Peace A Chance"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
@ -210017,54 +210041,6 @@
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Version  Pompeii in the Style of  Bastille  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Version  Pompeii"
  },
  {
    "artist": "Kelly Clarkson",
    "disabled": false,
    "favorite": false,
    "guid": "ba61f0c9-cf3c-47b5-2d2f-b5de183348e9",
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video   Because of You  in the Style of  Kelly Clarkson  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video   Because of You"
  },
  {
    "artist": "John Lennon",
    "disabled": false,
    "favorite": false,
    "guid": "d0610705-cbce-05c4-e9ce-c7e0d5dec594",
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video  Give Peace A Chance in the Style of  John Lennon  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video  Give Peace A Chance"
  },
  {
    "artist": "Gretchen Wilson",
    "disabled": false,
    "favorite": false,
    "guid": "0ad553d3-6816-a567-aae1-ffe72f717c95",
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video  Here For The Party in the Style of  Gretchen Wilson  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video  Here For The Party"
  },
  {
    "artist": "Peggy Lee",
    "disabled": false,
    "favorite": false,
    "guid": "9c448677-ebaa-f8f4-9a32-55d16451515b",
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video  Sing Fever in the Style of  Peggy Lee  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video  Sing Fever"
  },
  {
    "artist": "Black Sabbath",
    "disabled": false,
    "favorite": false,
    "guid": "ae8512c0-733e-b962-708f-57e74c77c7f8",
    "path": "z://MP4\\TheKARAOKEChannel\\Karaoke Video  Snowblind in the Style of  Black Sabbath  with lyrics (no lead vocal).mp4",
    "title": "Karaoke Video  Snowblind"
  },
  {
    "artist": "Jewel",
    "disabled": false,
    "favorite": false,
    "guid": "8775ad86-a0a0-2877-99ef-5b59f928ac4c",
    "path": "z://MP4\\TheKARAOKEChannel\\KarenLovesAdam performs Foolish Games for the Undercover Karaoke Challenge in the style of Jewel.mp4",
    "title": "KarenLovesAdam performs Foolish Games for the Undercover Karaoke Challenge"
  },
  {
    "artist": "Culture Club",
    "disabled": false,
@ -210137,6 +210113,14 @@
    "path": "z://MP4\\TheKARAOKEChannel\\Kelly Clarkson & Ariana Grande - Santa, Can’t You Hear Me (Karaoke With Lyrics).mp4",
    "title": "Santa, Can’t You Hear Me"
  },
  {
    "artist": "Kelly Clarkson",
    "disabled": false,
    "favorite": false,
    "guid": "6362d98a-46df-48a6-6bf9-7a3df08a390c",
    "path": "z://MP4\\TheKARAOKEChannel\\Kelly Clarkson - Because of You .mp4",
    "title": "Because of You"
  },
  {
    "artist": "Kenny Rogers",
    "disabled": false,
@ -211369,6 +211353,14 @@
    "path": "z://MP4\\TheKARAOKEChannel\\Peaceful Easy Feeling  Eagles   Karaoke with Lyrics.mp4",
    "title": "Peaceful Easy Feeling"
  },
  {
    "artist": "Peggy Lee",
    "disabled": false,
    "favorite": false,
    "guid": "8c50e858-3dc6-19d9-a417-5331e763af4b",
    "path": "z://MP4\\TheKARAOKEChannel\\Peggy Lee - Sing Fever.mp4",
    "title": "Sing Fever"
  },
  {
    "artist": "Pentatonix",
    "disabled": false,
@ -218141,15 +218133,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\All About Eve - Marthas Harbour.mp4",
    "title": "Martha's Harbour"
  },
  {
    "artist": "All I Want For Christmas Is My Two Front Teeth",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "f347cb5c-fbc2-2d4d-a124-741414c58939",
    "path": "z://MP4\\ZoomKaraokeOfficial\\All I Want For Christmas Is My Two Front Teeth - Karaoke Version from Zoom Karaoke.mp4",
    "title": "Karaoke Version from Zoom Karaoke"
  },
  {
    "artist": "All Saints",
    "disabled": false,
@ -219192,7 +219175,7 @@
    "genre": "Karaoke",
    "guid": "b5b380d6-6699-d1b6-095b-e7721f553838",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Annie Soundtrack - Tomorrow Karaoke Version from Zoom Karaoke (1982 Version).mp4",
-    "title": "Tomorrow"
+    "title": "Tomorrow - Karaoke Version from Zoom Karaoke (1982 Version)"
  },
  {
    "artist": "Another Level",
@ -220407,7 +220390,7 @@
    "genre": "Karaoke",
    "guid": "892e330d-aeb9-195d-e2ac-1d2ed44b00f2",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Lyric Fixed).mp4",
-    "title": "Venus (Lyric Fixed)"
+    "title": "Venus - Karaoke Version from Zoom Karaoke (Lyric Fixed)"
  },
  {
    "artist": "Bananarama",
@ -220416,7 +220399,7 @@
    "genre": "Karaoke",
    "guid": "3cda0061-edc5-71eb-b1ce-a592de40fed8",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Bananarama - Venus Karaoke Version from Zoom Karaoke (Old Version).mp4",
-    "title": "Venus (Old Version)"
+    "title": "Venus - Karaoke Version from Zoom Karaoke (Old Version)"
  },
  {
    "artist": "Band Aid 30",
@ -223107,7 +223090,7 @@
    "genre": "Karaoke",
    "guid": "708df1c8-4d05-607f-30a0-7ea1cf43c026",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Blondie - Union City Blue  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Union City Blue -  Karaoke Version from Zoom Karaoke"
+    "title": "Union City Blue"
  },
  {
    "artist": "Blood Brothers",
@ -225150,7 +225133,7 @@
    "genre": "Karaoke",
    "guid": "ee0c36f0-bf6c-4d87-78f1-356e07824609",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Bruce Springsteen - Sherry Darling  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Sherry Darling -  Karaoke Version from Zoom Karaoke"
+    "title": "Sherry Darling"
  },
  {
    "artist": "Bruce Springsteen",
@ -227112,7 +227095,7 @@
    "genre": "Karaoke",
    "guid": "89e2f35a-d52a-54f7-da46-8a590ee38f68",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Charli XCX - Speed Drive Karaoke Version from Zoom Karaoke (Barbie Movie).mp4",
-    "title": "Speed Drive (Barbie Movie)"
+    "title": "Speed Drive - Karaoke Version from Zoom Karaoke (Barbie Movie)"
  },
  {
    "artist": "Charli XCX ft. Ariana Grande",
@ -230082,7 +230065,7 @@
    "genre": "Karaoke",
    "guid": "0173e449-92bc-8c75-7054-105227a56c19",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Darlene Love - All Alone On Christmas Karaoke Version from Zoom Karaoke (from Home Alone).mp4",
-    "title": "All Alone On Christmas (from 'Home Alone')"
+    "title": "All Alone On Christmas - Karaoke Version from Zoom Karaoke (from 'Home Alone')"
  },
  {
    "artist": "Darts",
@ -230217,7 +230200,7 @@
    "genre": "Karaoke",
    "guid": "a48c4cc6-7a32-57e7-48ce-14ff7692b8d1",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Dave Edmunds - From Small Things (Big Things One Day Come) Karaoke Version from Zoom Karaoke.mp4",
-    "title": "From Small Things (Big Things One Day Come)."
+    "title": "From Small Things (Big Things One Day Come) - Karaoke Version from Zoom Karaoke."
  },
  {
    "artist": "Dave Edmunds",
@ -232143,7 +232126,7 @@
    "genre": "Karaoke",
    "guid": "b119decd-7823-7e21-53b8-86d6934c9d87",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Devo - Girl U Want  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Girl U Want -  Karaoke Version from Zoom Karaoke"
+    "title": "Girl U Want"
  },
  {
    "artist": "Dexy's Midnight Runners",
@ -232550,15 +232533,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\Diplo & Miguel - Dont Forget My Love.mp4",
    "title": "Don't Forget My Love"
  },
  {
    "artist": "Disney Villians The Musical ft. Maleficent",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "f2b524bf-d2e3-9a7e-a83a-495d79fdea59",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Disney Villians The Musical feat Maleficent - Karaoke Version from Zoom Karaoke.mp4",
    "title": "Karaoke Version from Zoom Karaoke"
  },
  {
    "artist": "Divinyls",
    "disabled": false,
@ -238587,7 +238561,7 @@
    "genre": "Karaoke",
    "guid": "c9081555-9f9f-7de4-1889-e0fde67dba1c",
    "path": "z://MP4\\ZoomKaraokeOfficial\\First Aid Kit - Fireworks (No Harmony For Duet)  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Fireworks (No Harmony For Duet) -  Karaoke Version from Zoom Karaoke"
+    "title": "Fireworks (No Harmony For Duet)"
  },
  {
    "artist": "First Aid Kit",
@ -242000,15 +241974,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\God Save The King - British National Anthem.mp4",
    "title": "British National Anthem"
  },
  {
    "artist": "God Save The Queen",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "b7d8684d-b7af-28b9-f7cd-8ec9dd16320f",
    "path": "z://MP4\\ZoomKaraokeOfficial\\God Save The Queen - Karaoke Version from Zoom Karaoke British National Anthem.mp4",
    "title": "Karaoke Version from Zoom Karaoke - British National Anthem"
  },
  {
    "artist": "Goldfrapp",
    "disabled": false,
@ -242036,15 +242001,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\Gonzalez - Havent Stopped Dancing Yet.mp4",
    "title": "Haven't Stopped Dancing Yet"
  },
  {
    "artist": "Goodnight Campers from Hi-De-Hi",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "05afcc76-656b-9f6c-986f-c204a08e49eb",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Goodnight Campers from Hi-De-Hi - Karaoke Version from Zoom Karaoke.mp4",
    "title": "Karaoke Version from Zoom Karaoke"
  },
  {
    "artist": "Gorgon City ft. Zak Abel",
    "disabled": false,
@ -244493,15 +244449,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\Imelda May - Train Kept A Rollin.mp4",
    "title": "Train Kept A Rollin'"
  },
  {
    "artist": "In The Box (The Goodbye Song)",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "0c4383a2-d65f-a16c-b4b2-1a0fa1189480",
    "path": "z://MP4\\ZoomKaraokeOfficial\\In The Box (The Goodbye Song) - Karaoke Version from Zoom Karaoke Australian TV Theme.mp4",
    "title": "Karaoke Version from Zoom Karaoke - Australian TV Theme"
  },
  {
    "artist": "Infernal",
    "disabled": false,
@ -246111,7 +246058,7 @@
    "genre": "Karaoke",
    "guid": "3ba6129d-ac04-b405-044d-8b453f565165",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Jeannie C Riley - Harper Valley PTA  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Harper Valley P.T.A. -  Karaoke Version from Zoom Karaoke"
+    "title": "Harper Valley P.T.A."
  },
  {
    "artist": "Jedward",
@ -250758,13 +250705,13 @@
    "title": "You And I"
  },
  {
-    "artist": "Kenny Rogers -You Decorated My Life",
+    "artist": "Kenny Rogers",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
-    "guid": "dec681a1-bddc-9920-083c-79022832ae3c",
+    "guid": "8bf25cbb-f0b1-b7d1-3095-9bc53b1fd971",
-    "path": "z://MP4\\ZoomKaraokeOfficial\\Kenny Rogers -You Decorated My Life - Karaoke Version from Zoom Karaoke.mp4",
+    "path": "z://MP4\\ZoomKaraokeOfficial\\Kenny Rogers - You Decorated My Life.mp4",
-    "title": "Karaoke Version from Zoom Karaoke"
+    "title": "You Decorated My Life"
  },
  {
    "artist": "Kenny Rogers And Dolly Parton",
@ -257280,7 +257227,7 @@
    "genre": "Karaoke",
    "guid": "c67b0bdb-0fdb-79e6-1aa7-d9abb648df62",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Meghan Trainor - Good To Be Alive  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Good To Be Alive -  Karaoke Version from Zoom Karaoke"
+    "title": "Good To Be Alive"
  },
  {
    "artist": "Meghan Trainor",
@ -263886,7 +263833,7 @@
    "genre": "Karaoke",
    "guid": "33ebdbf6-adb1-bca8-a29a-499a167b53a8",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Perry Como - Ave Maria  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Ave Maria -  Karaoke Version from Zoom Karaoke"
+    "title": "Ave Maria"
  },
  {
    "artist": "Perry Como",
@ -265785,7 +265732,7 @@
    "genre": "Karaoke",
    "guid": "da7260c6-5dab-0a8b-8229-307e1d5d2b2f",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Postmodern Jukebox - We Cant Stop (Without Backing Vocals)  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "We Can't Stop (Without Backing Vocals) -  Karaoke Version from Zoom Karaoke"
+    "title": "We Can't Stop"
  },
  {
    "artist": "Postmodern Jukebox",
@ -267675,7 +267622,7 @@
    "genre": "Karaoke",
    "guid": "d041e962-e78a-ad5f-e419-b670e47590e4",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Rihanna - SOS  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "S.O.S. -  Karaoke Version from Zoom Karaoke"
+    "title": "S.O.S."
  },
  {
    "artist": "Rihanna",
@ -271617,7 +271564,7 @@
    "genre": "Karaoke",
    "guid": "1cbf4200-9369-7551-005b-63420fa9b187",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Sheppard - Geronimo  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Geronimo -  Karaoke Version from Zoom Karaoke"
+    "title": "Geronimo"
  },
  {
    "artist": "Sherbet",
@ -271977,7 +271924,7 @@
    "genre": "Karaoke",
    "guid": "d10abce9-d119-e81a-4156-48bf66ea9f30",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Showaddywaddy - Why Do Lovers Break Each Others Hearts Karaoke Version From Zoom Karaoke.mp4",
-    "title": "Why Do Lovers Break Each Other's Hearts - Karaoke Version From Zoom Karaoke"
+    "title": "Why Do Lovers Break Each Other's Hearts"
  },
  {
    "artist": "Showaddywaddy",
@ -275424,7 +275371,7 @@
    "genre": "Karaoke",
    "guid": "50bc6b77-c2b2-bbca-522a-05ec0d55d2bd",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Take That - SOS  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "S.O.S. -  Karaoke Version from Zoom Karaoke"
+    "title": "S.O.S."
  },
  {
    "artist": "Take That",
@ -281543,7 +281490,7 @@
    "genre": "Karaoke",
    "guid": "75c896fe-77ad-690b-f358-dccbda4b5ae8",
    "path": "z://MP4\\ZoomKaraokeOfficial\\The Mavericks - Here Comes My Baby Karaoke Version from Zoom Karaoke.mp4",
-    "title": "Here Comes My Baby - Karaoke Version from Zoom Karaoke"
+    "title": "Here Comes My Baby"
  },
  {
    "artist": "The Mavericks",
@ -284417,7 +284364,7 @@
    "title": "You're No Good"
  },
  {
-    "artist": "The Tamperer featuring Maya",
+    "artist": "The Tamperer ft. Maya",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -284426,7 +284373,7 @@
    "title": "Feel It"
  },
  {
-    "artist": "The Tamperer featuring Maya",
+    "artist": "The Tamperer ft. Maya",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -286475,7 +286422,7 @@
    "genre": "Karaoke",
    "guid": "6e734302-c67e-cfff-c585-65e90e57c9a4",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Tom Jones - Ill Never Fall In Love Again  Karaoke Version from Zoom Karaoke.mp4",
-    "title": "I'll Never Fall In Love Again -  Karaoke Version from Zoom Karaoke"
+    "title": "I'll Never Fall In Love Again"
  },
  {
    "artist": "Tom Jones",
@ -288970,15 +288917,6 @@
    "path": "z://MP4\\ZoomKaraokeOfficial\\Wendy Moten - Come In Out Of The Rain.mp4",
    "title": "Come In Out Of The Rain"
  },
  {
    "artist": "Werewolves Of London/All Summer Long/Sweet Home Alabama Medley",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
    "guid": "fec61e62-37f3-7d62-f527-aa58a8a9cb45",
    "path": "z://MP4\\ZoomKaraokeOfficial\\Werewolves Of LondonAll Summer LongSweet Home Alabama Medley - Karaoke Version from Zoom Karaoke.mp4",
    "title": "Karaoke Version from Zoom Karaoke"
  },
  {
    "artist": "West End ft. Sybil",
    "disabled": false,
@ -306804,7 +306742,7 @@
    "title": "Metal Postcard"
  },
  {
-    "artist": "Lovato, Demi & Joe Jonas",
+    "artist": "Lavato, Demi & Joe Jonas",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -308673,7 +308611,7 @@
    "title": "I Run To You"
  },
  {
-    "artist": "Lovato, Demi & Joe Jonas",
+    "artist": "Lavato, Demi & Joe Jonas",
    "disabled": false,
    "favorite": false,
    "genre": "Karaoke",
@ -358620,7 +358558,7 @@
    "title": "Green Garden"
  },
  {
-    "artist": "Little Mix",
+    "artist": "Little M!x",
    "disabled": false,
    "favorite": false,
    "guid": "c8a57b4d-070d-0569-f178-edc96ff7b64f",
--- a/src/api/database.py
+++ b/src/api/database.py
@ -348,7 +348,7 @@ class MusicBrainzDatabase:
                    best_recording_count = artist['recording_count']
                    print(f"  🎯 New best match: {artist['name']} (score: {score}, recordings: {artist['recording_count']})")
-            if best_score >= 80:
+            if best_score >= 70:  # Lower threshold for better matching
                print(SUCCESS_MESSAGES['fuzzy_match_found'].format(
                    original=artist_name,
                    matched=best_match['name'],
@ -413,7 +413,7 @@ class MusicBrainzDatabase:
                            best_score = score
                            best_match = recording
-                if best_score >= 80:
+                if best_score >= 70:  # Lower threshold for better matching
                    return (best_match['name'], best_match['gid'], best_score / 100.0)
            else:
                # No artist constraint - search by title only with all variations
@ -447,7 +447,7 @@ class MusicBrainzDatabase:
                            best_score = score
                            best_match = recording
-                if best_score >= 80:
+                if best_score >= 70:  # Lower threshold for better matching
                    return (best_match['name'], best_match['gid'], best_score / 100.0)
            return None
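The three hunks above lower the acceptance threshold for fuzzy matches from 80 to 70 on a 0-100 scale. A minimal sketch of how such a cutoff behaves, assuming the standard library's `difflib.SequenceMatcher` as a stand-in scorer (the scoring function the project actually uses is not shown in this diff) and a hypothetical `best_fuzzy_match` helper:

```python
from difflib import SequenceMatcher


def best_fuzzy_match(query, candidates, threshold=70):
    """Return (name, confidence) for the best candidate at or above threshold, else None.

    Stand-in scorer: SequenceMatcher ratio scaled to 0-100. This is an
    assumption for illustration, not the project's scoring function.
    """
    best_name, best_score = None, 0.0
    for name in candidates:
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio() * 100
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        # Mirror the diff: report confidence on a 0.0-1.0 scale.
        return best_name, best_score / 100.0
    return None


print(best_fuzzy_match("Little M!x", ["Little Mix", "Marillion"]))
print(best_fuzzy_match("Little M!x", ["Little Mix"], threshold=95))
```

A lower threshold accepts more near-misses, which also raises the chance of a wrong match, so it may be worth checking a sample of matches that score between 70 and 80.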
@ -608,15 +608,30 @@ class MusicBrainzDatabase:
        import re
        # Primary collaboration indicators
-        primary_patterns = ['ft.', 'feat.', 'featuring']
+        primary_patterns = ['ft.', 'feat.', 'featuring', 'ft', 'feat']
        # Secondary collaboration indicators (need more careful handling)
-        secondary_patterns = ['&', 'and']
+        secondary_patterns = ['&', 'and', ',']
        # Check if this is a collaboration
        is_collaboration = False
        split_pattern = None
        # Special case: Handle malformed artist names like "ft Jamie Foxx West, Kanye"
        # This should be "Kanye West ft. Jamie Foxx"
        if artist_string.lower().startswith(('ft ', 'feat ')):
            # This is a malformed collaboration string
            # Try to extract the actual artists from the rest
            remaining = artist_string[artist_string.find(' ') + 1:].strip()
            if ',' in remaining:
                # Split on comma and reverse the order
                parts = [part.strip() for part in remaining.split(',')]
                if len(parts) >= 2:
                    # Assume the last part is the main artist
                    main_artist = parts[-1].strip()
                    collaborators = parts[:-1]
                    return (main_artist, collaborators)
        for pattern in primary_patterns:
            if pattern.lower() in artist_string.lower():
                is_collaboration = True
@ -632,6 +647,20 @@ class MusicBrainzDatabase:
        # If no primary collaboration found, check secondary patterns
        if not is_collaboration:
            for pattern in secondary_patterns:
                if pattern == ',':
                    # Handle comma-separated artists (e.g., "Ariana Grande, Normani, Nicki Minaj")
                    if ',' in artist_string:
                        # Count commas to determine if this is likely a collaboration
                        comma_count = artist_string.count(',')
                        if comma_count >= 1:
                            # Split on comma and treat as collaboration
                            parts = [part.strip() for part in artist_string.split(',')]
                            if len(parts) >= 2:
                                # First artist is main, rest are collaborators
                                main_artist = parts[0]
                                collaborators = parts[1:]
                                return (main_artist, collaborators)
                else:
                    # Use word boundaries to avoid splitting within words like "Orlando"
                    import re
                    pattern_regex = r'\b' + re.escape(pattern) + r'\b'
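The secondary-pattern handling above can be sketched in isolation. This is a simplified illustration with a hypothetical `split_secondary` helper, not the method from the diff. One detail worth noting: `\b` only matches next to a word character, so `\b&\b` does not match an ampersand surrounded by spaces. The sketch requires whitespace around `&` instead.

```python
import re


def split_secondary(artist_string):
    """Hypothetical helper: split on '&', the word 'and', or commas.

    Returns (main_artist, collaborators); collaborators is empty when no
    separator is found.
    """
    separators = [
        r'\s+&\s+',        # ampersand with whitespace on both sides
        r'\s+\band\b\s+',  # the whole word "and", never "Orlando" or "Grande"
        r'\s*,\s*',        # comma-separated artists
    ]
    for sep in separators:
        parts = [p.strip()
                 for p in re.split(sep, artist_string, flags=re.IGNORECASE)
                 if p.strip()]
        if len(parts) >= 2:
            # First artist is treated as the main artist, the rest as collaborators.
            return parts[0], parts[1:]
    return artist_string.strip(), []


print(split_secondary("Ariana Grande, Normani, Nicki Minaj"))
print(split_secondary("Orlando"))
```

Comma splitting treats "Last, First" names as two artists, so an entry such as "Lovato, Demi & Joe Jonas" would need separate handling.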
@ -711,45 +740,152 @@ class MusicBrainzDatabase:
    def _generate_title_variations(self, title: str) -> List[str]:
        """
-        Generate title variations by removing parenthetical content.
+        Generate title variations by removing parenthetical content and fixing common issues.
        Returns list of title variations to try.
        """
        import re
        search_titles = [title.strip()]
-        # Remove complete parentheses (content)
+        # Fix common typos and missing apostrophes
        title_fixes = title.strip()
        # Fix missing apostrophes in common contractions
        apostrophe_fixes = [
            (r'\bDont\b', "Don't"),
            (r'\bCant\b', "Can't"),
            (r'\bWont\b', "Won't"),
            (r'\bArent\b', "Aren't"),
            (r'\bIsnt\b', "Isn't"),
            (r'\bWasnt\b', "Wasn't"),
            (r'\bDidnt\b', "Didn't"),
            (r'\bDoesnt\b', "Doesn't"),
            (r'\bHavent\b', "Haven't"),
            (r'\bHasnt\b', "Hasn't"),
            (r'\bWouldnt\b', "Wouldn't"),
            (r'\bCouldnt\b', "Couldn't"),
            (r'\bShouldnt\b', "Shouldn't"),
            (r'\bPhunk\b', "Funk"),  # Common typo
            (r'\bBout\b', "About"),  # Shortened form
        ]
        for pattern, replacement in apostrophe_fixes:
            fixed_title = re.sub(pattern, replacement, title_fixes, flags=re.IGNORECASE)
            if fixed_title != title_fixes:
                title_fixes = fixed_title
                if title_fixes not in search_titles:
                    search_titles.append(title_fixes)
        # Comprehensive parentheses removal - try multiple approaches
        # 1. Remove all complete parentheses (most aggressive)
        clean_title = re.sub(r'\s*\([^)]*\)', '', title.strip())
        clean_title = clean_title.strip()
        if clean_title != title.strip() and clean_title:
            search_titles.append(clean_title)
-        # Remove unmatched opening parenthesis at end
+        # 2. Remove specific common patterns first, then general parentheses
        specific_patterns = [
            r'\s*\(Karaoke Version\)',
            r'\s*\(Karaoke\)',
            r'\s*\(Instrumental\)',
            r'\s*\(Backing Track\)',
            r'\s*\(live [^)]*\)',
            r'\s*\(Live [^)]*\)',
            r'\s*\(Acoustic\)',
            r'\s*\(acoustic\)',
            r'\s*\(Without Backing Vocals\)',
            r'\s*\(Without Backing Vocals\)',
            r'\s*\(Clean\)',
            r'\s*\(clean\)',
            r'\s*\(Remix\)',
            r'\s*\(remix\)',
            r'\s*\(Radio Edit\)',
            r'\s*\(radio edit\)',
            r'\s*\(Extended Mix\)',
            r'\s*\(extended mix\)',
            r'\s*\(Single Version\)',
            r'\s*\(single version\)',
            r'\s*\(Album Version\)',
            r'\s*\(album version\)',
            r'\s*\(Original Mix\)',
            r'\s*\(original mix\)',
            r'\s*\(John Lewis Christmas Ad \d+\)',  # Specific pattern from test
            r'\s*\(from the movie [^)]*\)',
            r'\s*\(from the [^)]*\)',
            r'\s*\(feat\. [^)]*\)',
            r'\s*\(featuring [^)]*\)',
            r'\s*\(ft\. [^)]*\)',
            r'\s*\(duet\)',
            r'\s*\(Duet\)',
            r'\s*\(Two Semitones Down\)',
            r'\s*\(Minus Piano\)',
            r'\s*\(Cut Down\)',
            r'\s*\(Boone & Speedy Vocals\)',
            r'\s*\(My Heart Belongs to You\)',
        ]
        # 3. Remove dash-separated content (like "Live At the BBC")
        dash_patterns = [
            r'\s*-\s*Live [^-]*$',
            r'\s*-\s*live [^-]*$',
            r'\s*-\s*Live At [^-]*$',
            r'\s*-\s*Live At the [^-]*$',
            r'\s*-\s*Live At the BBC$',
            r'\s*-\s*Live From [^-]*$',
            r'\s*-\s*Live In [^-]*$',
            r'\s*-\s*Live On [^-]*$',
        ]
        # Apply specific patterns first
        for pattern in specific_patterns:
            specific_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
            specific_clean = specific_clean.strip()
            if specific_clean != title.strip() and specific_clean and specific_clean not in search_titles:
                search_titles.append(specific_clean)
        # Apply dash patterns
        for pattern in dash_patterns:
            dash_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
            dash_clean = dash_clean.strip()
            if dash_clean != title.strip() and dash_clean and dash_clean not in search_titles:
                search_titles.append(dash_clean)
         # 3b. Remove any remaining parentheses after specific patterns
        for pattern in specific_patterns:
            remaining_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
            remaining_clean = re.sub(r'\s*\([^)]*\)', '', remaining_clean.strip())
            remaining_clean = remaining_clean.strip()
            if remaining_clean != title.strip() and remaining_clean and remaining_clean not in search_titles:
                search_titles.append(remaining_clean)
        # 4. Remove unmatched opening parenthesis at end
        clean_title2 = re.sub(r'\s*\([^)]*$', '', title.strip())
        clean_title2 = clean_title2.strip()
        if clean_title2 != title.strip() and clean_title2 and clean_title2 not in search_titles:
            search_titles.append(clean_title2)
-        # Remove unmatched closing parenthesis at start
+        # 5. Remove unmatched closing parenthesis at start
        clean_title3 = re.sub(r'^[^)]*\)\s*', '', title.strip())
        clean_title3 = clean_title3.strip()
        if clean_title3 != title.strip() and clean_title3 and clean_title3 not in search_titles:
            search_titles.append(clean_title3)
-        # Also try with specific karaoke patterns removed
+        # 6. Try removing extra spaces and normalizing
-        karaoke_patterns = [
+        normalized_title = re.sub(r'\s+', ' ', title.strip())
-            r'\s*\(Karaoke Version\)',
+        if normalized_title != title.strip() and normalized_title not in search_titles:
-            r'\s*\(Karaoke\)',
+            search_titles.append(normalized_title)
-            r'\s*\(Instrumental\)',
-            r'\s*\(Backing Track\)',
-        ]
-        for pattern in karaoke_patterns:
-            karaoke_clean = re.sub(pattern, '', title.strip(), flags=re.IGNORECASE)
-            karaoke_clean = karaoke_clean.strip()
-            if karaoke_clean != title.strip() and karaoke_clean not in search_titles:
-                search_titles.append(karaoke_clean)
-        return search_titles
+        # 7. Apply normalization to all cleaned versions and remove duplicates
        normalized_versions = []
        for version in search_titles:
            # Normalize spaces (replace multiple spaces with single space)
            normalized = re.sub(r'\s+', ' ', version.strip())
            # Remove leading/trailing spaces
            normalized = normalized.strip()
            if normalized and normalized not in normalized_versions:
                normalized_versions.append(normalized)
        return normalized_versions
    def _parse_collaborators(self, collaborators_string: str) -> List[str]:
        """
--- a/src/cli/main.py
+++ b/src/cli/main.py
@ -10,6 +10,7 @@ import time
 import re
 from pathlib import Path
 from typing import Dict, Optional, Any, Tuple, List
 from datetime import datetime
 # Import constants
 from ..config.constants import (
@ -170,10 +171,10 @@ class MusicBrainzCleaner:
        import re
        # Primary collaboration indicators
-        primary_patterns = ['ft.', 'feat.', 'featuring']
+        primary_patterns = ['ft.', 'feat.', 'featuring', 'ft', 'feat']
        # Secondary collaboration indicators (need more careful handling)
-        secondary_patterns = ['&', 'and']
+        secondary_patterns = ['&', 'and', ',']
        # Check if this is a collaboration
        is_collaboration = False
@ -366,99 +367,188 @@ class MusicBrainzCleaner:
        return song, False
-    def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Tuple[Path, List[Dict]]:
+    def process_songs(self, source_file: Path, output_success: Optional[Path] = None, output_failure: Optional[Path] = None, limit: Optional[int] = None) -> Dict[str, Any]:
-        try:
+        """
-            # Read input file
+        Process songs from source file and save successful and failed songs to separate files.
-            with open(input_file, 'r', encoding='utf-8') as f:
+        This is the main processing method that handles full dataset processing by default.
-                songs = json.load(f)
+        """
        if not source_file.exists():
            print(f'❌ Source file not found: {source_file}')
            return {}
-            if not isinstance(songs, list):
+        print('🚀 Starting song processing...')
-                print("Error: Input file should contain a JSON array of songs")
+        
-                return input_file, []
+        # Load songs
        with open(source_file, 'r', encoding='utf-8') as f:
            all_songs = json.load(f)
        if not isinstance(all_songs, list):
            print("Error: Source file should contain a JSON array of songs")
            return {}
        # Apply limit if specified
        if limit is not None:
-                songs = songs[:limit]
+            all_songs = all_songs[:limit]
            print(f"⚠️  Limiting processing to first {limit} songs")
-            # Determine output path
+        total_songs = len(all_songs)
-            if output_file is None:
+        print(f'📊 Total songs to process: {total_songs:,}')
-                output_file = input_file.parent / f"{input_file.stem}_cleaned.json"
+        print(f'Using {"database" if self.use_database else "API"} connection')
            print(f"Processing {len(songs)} songs...")
            print(f"Using {'database' if self.use_database else 'API'} connection")
        print(PROGRESS_SEPARATOR)
-            # Clean each song
+        # Initialize arrays for batch processing
-            cleaned_songs = []
+        successful_songs = []
        failed_songs = []
-            success_count = 0
-            fail_count = 0
-            for i, song in enumerate(songs, 1):
+        # Statistics tracking
-                cleaned_song, success = self.clean_song(song)
+        stats = {
-                cleaned_songs.append(cleaned_song)
+            'total_processed': 0,
            'artists_found': 0,
            'recordings_found': 0,
            'start_time': time.time()
        }
-                if success:
+        # Process each song
-                    success_count += 1
+        for i, song in enumerate(all_songs, 1):
-                    print(f"[{i}/{len(songs)}] ✅ PASS")
+            try:
                result = self.clean_song(song)
                cleaned_song, success = result
                artist_found = 'mbid' in cleaned_song
                recording_found = 'recording_mbid' in cleaned_song
                # Display progress with counter and status
                artist_name = song.get('artist', 'Unknown')
                title = song.get('title', 'Unknown')
                if artist_found and recording_found:
                    # Both counters advance only on a full match, so the two totals always stay equal
                    stats['artists_found'] += 1
                    stats['recordings_found'] += 1
                    successful_songs.append(cleaned_song)
                    print(f'[{i:,} of {total_songs:,}] ✅ PASS: {artist_name} - {title}')
                else:
-                    fail_count += 1
+                    # Keep the original song in failed_songs array (same format as source)
-                    print(f"[{i}/{len(songs)}] ❌ FAIL")
+                    failed_songs.append(song)
-                    # Store failed song info for report
+                    print(f'[{i:,} of {total_songs:,}] ❌ FAIL: {artist_name} - {title}')
-                    failed_songs.append({
+                
-                        'index': i,
+                stats['total_processed'] += 1
-                        'original_artist': song.get('artist', ''),
+                
-                        'original_title': song.get('title', ''),
+                # Progress update every 100 songs
-                        'cleaned_artist': cleaned_song.get('artist', ''),
+                if i % 100 == 0:
-                        'cleaned_title': cleaned_song.get('title', ''),
+                    elapsed = time.time() - stats['start_time']
-                        'has_mbid': 'mbid' in cleaned_song,
+                    rate = i / elapsed if elapsed > 0 else 0
-                        'has_recording_mbid': 'recording_mbid' in cleaned_song
+                    success_rate = (stats['artists_found'] / i * 100) if i > 0 else 0
-                    })
+                    print(f'  📈 Progress: {i:,}/{total_songs:,} ({i/total_songs*100:.1f}%) - '
                          f'Success: {success_rate:.1f}% - Rate: {rate:.1f} songs/sec')
            except Exception as e:
                print(f'  ❌ Error processing song {i}: {e}')
                # Keep the original song in failed_songs array
                failed_songs.append(song)
                stats['total_processed'] += 1
            # Only add delay for API calls, not database queries
            if not self.use_database:
                time.sleep(API_REQUEST_DELAY)
-            # Write output file
+        # Determine output file paths
-            with open(output_file, 'w', encoding='utf-8') as f:
+        if output_success is None:
-                json.dump(cleaned_songs, f, indent=2, ensure_ascii=False)
+            output_success = source_file.parent / f"{source_file.stem}-success.json"
        if output_failure is None:
            output_failure = source_file.parent / f"{source_file.stem}-failure.json"
        # Save successful songs (array format, same as source)
        with open(output_success, 'w', encoding='utf-8') as f:
            json.dump(successful_songs, f, indent=2, ensure_ascii=False)
        # Save failed songs (array format, same as source)
        with open(output_failure, 'w', encoding='utf-8') as f:
            json.dump(failed_songs, f, indent=2, ensure_ascii=False)
        # Calculate final statistics
        total_time = time.time() - stats['start_time']
        # Create human-readable text report
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        report_file = source_file.parent / f"processing_report_{timestamp}.txt"
-            # Generate failure report
-            report_file = input_file.parent / f"{input_file.stem}_failure_report.json"
        with open(report_file, 'w', encoding='utf-8') as f:
-                json.dump({
+            f.write("MusicBrainz Data Cleaner - Processing Report\n")
            f.write("=" * 50 + "\n\n")
            f.write(f"Source File: {source_file}\n")
            f.write(f"Processing Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write(f"Processing Time: {total_time:.1f} seconds\n\n")
            f.write("SUMMARY\n")
            f.write("-" * 20 + "\n")
            f.write(f"Total Songs Processed: {stats['total_processed']:,}\n")
            f.write(f"Successful Songs: {len(successful_songs):,}\n")
            f.write(f"Failed Songs: {len(failed_songs):,}\n")
            f.write(f"Success Rate: {(len(successful_songs)/stats['total_processed']*100):.1f}%\n\n")
            f.write("DETAILED STATISTICS\n")
            f.write("-" * 20 + "\n")
            f.write(f"Artists Found: {stats['artists_found']:,}/{stats['total_processed']:,} ({(stats['artists_found']/stats['total_processed']*100):.1f}%)\n")
            f.write(f"Recordings Found: {stats['recordings_found']:,}/{stats['total_processed']:,} ({(stats['recordings_found']/stats['total_processed']*100):.1f}%)\n")
            f.write(f"Processing Speed: {stats['total_processed'] / total_time:.1f} songs/second\n\n")
            f.write("OUTPUT FILES\n")
            f.write("-" * 20 + "\n")
            f.write(f"Successful Songs: {output_success}\n")
            f.write(f"Failed Songs: {output_failure}\n")
            f.write(f"Report File: {report_file}\n\n")
            if len(failed_songs) > 0:
                f.write("FAILED SONGS (First 50)\n")
                f.write("-" * 20 + "\n")
                for i, song in enumerate(failed_songs[:50], 1):
                    artist = song.get('artist', 'Unknown')
                    title = song.get('title', 'Unknown')
                    f.write(f"{i:3d}. {artist} - {title}\n")
                if len(failed_songs) > 50:
                    f.write(f"... and {len(failed_songs) - 50} more failed songs\n")
        # Save detailed JSON report for programmatic access
        json_report_file = source_file.parent / f"processing_report_{timestamp}.json"
        final_stats = {
            'summary': {
-                        'total_songs': len(songs),
+                'total_tested': stats['total_processed'],
-                        'successful': success_count,
+                'artists_found': stats['artists_found'],
-                        'failed': fail_count,
+                'recordings_found': stats['recordings_found'],
-                        'success_rate': f"{(success_count/len(songs)*100):.1f}%"
+                'failed_count': len(failed_songs),
                'artist_success_rate': (stats['artists_found'] / stats['total_processed'] * 100) if stats['total_processed'] > 0 else 0,
                'recording_success_rate': (stats['recordings_found'] / stats['total_processed'] * 100) if stats['total_processed'] > 0 else 0,
                'processing_time_seconds': total_time,
                'songs_per_second': stats['total_processed'] / total_time if total_time > 0 else 0
            },
-                    'failed_songs': failed_songs
+            'files': {
-                }, f, indent=2, ensure_ascii=False)
+                'source': str(source_file),
                'successful_songs': str(output_success),
                'failed_songs': str(output_failure),
                'text_report': str(report_file),
                'json_report': str(json_report_file)
            }
        }
-            print(f"\n{PROGRESS_SEPARATOR}")
+        with open(json_report_file, 'w', encoding='utf-8') as f:
-            print(f"✅ SUCCESS: {success_count} songs")
+            json.dump(final_stats, f, indent=2)
            print(f"❌ FAILED: {fail_count} songs")
            print(f"📊 SUCCESS RATE: {(success_count/len(songs)*100):.1f}%")
            print(f"💾 CLEANED DATA: {output_file}")
            print(f"📋 FAILURE REPORT: {report_file}")
-            return output_file, failed_songs
+        print(f'\n{PROGRESS_SEPARATOR}')
        print('🎉 Processing completed!')
        print('📊 Final Results:')
        print(f'  ⏱️  Total processing time: {total_time:.1f} seconds')
        # Ratios are guarded so an empty source file or a zero elapsed time cannot raise ZeroDivisionError
        print(f'  🚀 Average speed: {(stats["total_processed"] / total_time if total_time > 0 else 0):.1f} songs/second')
        print(f'  ✅ Artists found: {stats["artists_found"]:,}/{stats["total_processed"]:,} ({(stats["artists_found"]/stats["total_processed"]*100 if stats["total_processed"] > 0 else 0):.1f}%)')
        print(f'  ✅ Recordings found: {stats["recordings_found"]:,}/{stats["total_processed"]:,} ({(stats["recordings_found"]/stats["total_processed"]*100 if stats["total_processed"] > 0 else 0):.1f}%)')
        print(f'  ❌ Failed songs: {len(failed_songs):,} ({(len(failed_songs)/stats["total_processed"]*100 if stats["total_processed"] > 0 else 0):.1f}%)')
        print('📄 Files saved:')
        print(f'  ✅ Successful songs: {output_success}')
        print(f'  ❌ Failed songs: {output_failure}')
        print(f'  📋 Text report: {report_file}')
        print(f'  📊 JSON report: {json_report_file}')
-        except FileNotFoundError:
+        return final_stats
-            print(f"Error: File '{input_file}' not found")
-            return input_file, []
-        except json.JSONDecodeError:
-            print(f"Error: Invalid JSON in file '{input_file}'")
-            return input_file, []
-        except Exception as e:
-            print(f"Error processing file: {e}")
-            return input_file, []
-        finally:
-            # Clean up database connection
-            if self.use_database and hasattr(self, 'db'):
-                self.db.disconnect()
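Review note: the new `process_songs` derives its default output paths from the source file name and reports several percentages. The sketch below isolates those two pieces so they can be checked on their own; `default_outputs` and `percent` are hypothetical helper names, not part of this commit, and the zero guard mirrors the one the JSON summary already applies.

```python
from pathlib import Path
from typing import Optional, Tuple


def default_outputs(
    source_file: Path,
    output_success: Optional[Path] = None,
    output_failure: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Fill in '<stem>-success.json' / '<stem>-failure.json' next to the source file."""
    if output_success is None:
        output_success = source_file.parent / f"{source_file.stem}-success.json"
    if output_failure is None:
        output_failure = source_file.parent / f"{source_file.stem}-failure.json"
    return output_success, output_failure


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when total is zero."""
    return part / total * 100 if total > 0 else 0.0


if __name__ == "__main__":
    success_path, failure_path = default_outputs(Path("data/songs.json"))
    print(success_path, failure_path)
    print(f"{percent(3, 4):.1f}%")
```

Routing every ratio through one guarded helper would keep the text report, the JSON report, and the console summary consistent when the source array is empty.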
 def print_help() -> None:
@ -466,25 +556,36 @@ def print_help() -> None:
 MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz
 USAGE:
-    musicbrainz-cleaner <input_file.json> [output_file.json] [options]
+    musicbrainz-cleaner [options]
 ARGUMENTS:
    input_file.json       JSON file containing array of song objects
    output_file.json      Optional: Output file for cleaned data
 OPTIONS:
    --source FILE              Source JSON file (default: data/songs.json)
    --output-success FILE      Output file for successful songs (default: source-success.json)
    --output-failure FILE      Output file for failed songs (default: source-failure.json)
    --limit N                  Process only the first N songs (default: all songs)
    --use-api                  Force use of HTTP API instead of direct database access
    --test-connection          Test connection to MusicBrainz server
    --help, -h                 Show this help message
    --version, -v              Show version information
-    --test-connection     Test connection to MusicBrainz server
-    --limit N             Process only the first N songs (for testing)
-    --use-api             Force use of HTTP API instead of direct database access
 EXAMPLES:
-    musicbrainz-cleaner songs.json
+    # Process all songs with default settings
-    musicbrainz-cleaner songs.json cleaned_songs.json
+    musicbrainz-cleaner
    # Process specific file
    musicbrainz-cleaner --source data/my_songs.json
    # Process with custom output files
    musicbrainz-cleaner --source data/songs.json --output-success cleaned.json --output-failure failed.json
    # Process only first 1000 songs
    musicbrainz-cleaner --limit 1000
    # Test connection
    musicbrainz-cleaner --test-connection
-    musicbrainz-cleaner songs.json --limit 5
+
-    musicbrainz-cleaner songs.json --use-api
+    # Force API mode
    musicbrainz-cleaner --use-api
 REQUIREMENTS:
    - MusicBrainz server running on http://localhost:5001
@ -501,12 +602,14 @@ PERFORMANCE:
 def print_version() -> None:
    version_info = """
-MusicBrainz Data Cleaner v2.0.0
+MusicBrainz Data Cleaner v3.0.0
 Enhanced with:
 - Direct PostgreSQL database access
 - Fuzzy search for better matching
 - Improved performance and accuracy
 - Separate output files for successful and failed songs
 - Detailed progress tracking and reporting
 Copyright (c) 2024 MusicBrainz Data Cleaner Contributors
 MIT License - see LICENSE file for details
@ -516,35 +619,89 @@ Built with Python 3.6+
    print(version_info)
 def parse_arguments(args: List[str]) -> Dict[str, Any]:
    """Parse command line arguments into a dictionary"""
    parsed = {
        'source': 'data/songs.json',
        'output_success': None,
        'output_failure': None,
        'limit': None,
        'use_api': False,
        'test_connection': False,
        'help': False,
        'version': False
    }
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ['--help', '-h', 'help']:
            parsed['help'] = True
        elif arg in ['--version', '-v', 'version']:
            parsed['version'] = True
        elif arg == '--test-connection':
            parsed['test_connection'] = True
        elif arg == '--use-api':
            parsed['use_api'] = True
        elif arg == '--source':
            if i + 1 < len(args) and not args[i + 1].startswith('--'):
                parsed['source'] = args[i + 1]
                i += 1
            else:
                print("Error: --source requires a file path")
                sys.exit(ExitCode.USAGE_ERROR)
        elif arg == '--output-success':
            if i + 1 < len(args) and not args[i + 1].startswith('--'):
                parsed['output_success'] = args[i + 1]
                i += 1
            else:
                print("Error: --output-success requires a file path")
                sys.exit(ExitCode.USAGE_ERROR)
        elif arg == '--output-failure':
            if i + 1 < len(args) and not args[i + 1].startswith('--'):
                parsed['output_failure'] = args[i + 1]
                i += 1
            else:
                print("Error: --output-failure requires a file path")
                sys.exit(ExitCode.USAGE_ERROR)
        elif arg == '--limit':
            if i + 1 < len(args) and not args[i + 1].startswith('--'):
                try:
                    parsed['limit'] = int(args[i + 1])
                    if parsed['limit'] <= 0:
                        print("Error: --limit must be a positive number")
                        sys.exit(ExitCode.USAGE_ERROR)
                except ValueError:
                    print("Error: --limit requires a valid number")
                    sys.exit(ExitCode.USAGE_ERROR)
                i += 1
            else:
                print("Error: --limit requires a number")
                sys.exit(ExitCode.USAGE_ERROR)
        i += 1
    return parsed
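Review note: `parse_arguments` walks `sys.argv` by hand. For comparison only, here is a minimal sketch of the same options expressed with the standard library's `argparse`. This is an alternative, not what the commit does, and it behaves differently in a few ways: `argparse` exits with status 2 on usage errors rather than `ExitCode.USAGE_ERROR`, it supplies `--help` itself, and the bare `help` / `version` words and the `--version` flag are left out here. `build_parser` and `positive_int` are hypothetical names.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type: accept only integers greater than zero."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if number <= 0:
        raise argparse.ArgumentTypeError("--limit must be a positive number")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse equivalent of the options handled by parse_arguments."""
    parser = argparse.ArgumentParser(prog="musicbrainz-cleaner")
    parser.add_argument("--source", default="data/songs.json", help="Source JSON file")
    parser.add_argument("--output-success", default=None, help="Output file for successful songs")
    parser.add_argument("--output-failure", default=None, help="Output file for failed songs")
    parser.add_argument("--limit", type=positive_int, default=None, help="Process only the first N songs")
    parser.add_argument("--use-api", action="store_true", help="Force use of HTTP API")
    parser.add_argument("--test-connection", action="store_true", help="Test connection and exit")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--limit", "5", "--use-api"])
    print(vars(args))
```

The hand-rolled loop in the diff keeps full control over messages and exit codes, which may be why it was chosen; the sketch only shows how much of the validation `argparse` would cover.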
 def main() -> int:
    try:
        args = sys.argv[1:]
        parsed = parse_arguments(args)
        # Handle help and version flags
-        if not args or args[0] in ['--help', '-h', 'help']:
+        if parsed['help']:
            print_help()
            return ExitCode.SUCCESS
-        if args[0] in ['--version', '-v', 'version']:
+        if parsed['version']:
            print_version()
            return ExitCode.SUCCESS
        # Check for API flag
        use_database = '--use-api' not in args
        if not use_database:
            print("⚠️  Using HTTP API mode (slower than database access)")
        # Handle test connection
-        if args[0] == '--test-connection':
+        if parsed['test_connection']:
-            if use_database:
+            if parsed['use_api']:
                db = MusicBrainzDatabase()
                if db.test_connection():
                    print("✅ Connection to MusicBrainz database successful")
                    return ExitCode.SUCCESS
                else:
                    print("❌ Connection to MusicBrainz database failed")
                    return ExitCode.ERROR
            else:
                api = MusicBrainzAPIClient()
                if api.test_connection():
                    print("✅ Connection to MusicBrainz API server successful")
@ -552,10 +709,7 @@ def main() -> int:
                else:
                    print("❌ Connection to MusicBrainz API server failed")
                    return ExitCode.ERROR
-        
+            else:
        # Check for test connection flag in any position
        if '--test-connection' in args:
            if use_database:
                db = MusicBrainzDatabase()
                if db.test_connection():
                    print("✅ Connection to MusicBrainz database successful")
@ -563,73 +717,27 @@ def main() -> int:
                else:
                    print("❌ Connection to MusicBrainz database failed")
                    return ExitCode.ERROR
            else:
                api = MusicBrainzAPIClient()
                if api.test_connection():
                    print("✅ Connection to MusicBrainz API server successful")
                    return ExitCode.SUCCESS
                else:
                    print("❌ Connection to MusicBrainz API server failed")
                    return ExitCode.ERROR
-        # Validate input file
+        # Process songs (main functionality)
-        if not args:
+        source_file = Path(parsed['source'])
-            print("Error: Input file is required")
+        output_success = Path(parsed['output_success']) if parsed['output_success'] else None
-            print("Use --help for usage information")
+        output_failure = Path(parsed['output_failure']) if parsed['output_failure'] else None
        if not source_file.exists():
            print(f"Error: Source file does not exist: {source_file}")
            return ExitCode.USAGE_ERROR
-        # Parse limit argument and remove it from args
+        if not source_file.is_file():
-        limit = None
+            print(f"Error: Source path is not a file: {source_file}")
        args_to_remove = []
        for i, arg in enumerate(args):
            if arg == '--limit':
                if i + 1 < len(args) and not args[i + 1].startswith('--'):
                    try:
                        limit = int(args[i + 1])
                        if limit <= 0:
                            print("Error: Limit must be a positive number")
                            return ExitCode.USAGE_ERROR
                        args_to_remove.extend([i, i + 1])
                    except ValueError:
                        print("Error: --limit requires a valid number")
                        return ExitCode.USAGE_ERROR
                else:
                    print("Error: --limit requires a number")
            return ExitCode.USAGE_ERROR
-        # Remove limit arguments and API flag from args
+        if source_file.suffix.lower() != '.json':
-        for index in reversed(args_to_remove):
+            print(f"Error: Source file must be a JSON file: {source_file}")
            args.pop(index)
        # Remove API flag
        args = [arg for arg in args if arg != '--use-api']
        # Filter out remaining flags to get file arguments
        file_args = [arg for arg in args if not arg.startswith('--')]
        if not file_args:
            print("Error: Input file is required")
            print("Use --help for usage information")
            return ExitCode.USAGE_ERROR
        input_file = Path(file_args[0])
        output_file = Path(file_args[1]) if len(file_args) > 1 else None
        if not input_file.exists():
            print(f"Error: Input file does not exist: {input_file}")
            return ExitCode.USAGE_ERROR
        if not input_file.is_file():
            print(f"Error: Input path is not a file: {input_file}")
            return ExitCode.USAGE_ERROR
        if input_file.suffix.lower() != '.json':
            print(f"Error: Input file must be a JSON file: {input_file}")
            return ExitCode.USAGE_ERROR
        # Process the file
-        cleaner = MusicBrainzCleaner(use_database=use_database)
+        cleaner = MusicBrainzCleaner(use_database=not parsed['use_api'])
-        result_path, failed_songs = cleaner.clean_songs_file(input_file, output_file, limit)
+        cleaner.process_songs(source_file, output_success, output_failure, parsed['limit'])
        return ExitCode.SUCCESS