# MusicBrainz Data Cleaner - CLI Commands Reference

## Overview

The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans and normalizes their metadata using the MusicBrainz database. It writes successfully matched songs and failed songs to separate output files, along with detailed processing reports.

## Basic Command Structure

```bash
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
```

## Command Options

### Main Options

| Option | Type | Description | Default | Example |
|--------|------|-------------|---------|---------|
| `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
| `--output-success` | string | Output file for successfully matched songs | `<source>-success.json` (e.g. `data/songs-success.json`) | `--output-success cleaned.json` |
| `--output-failure` | string | Output file for songs that could not be matched | `<source>-failure.json` (e.g. `data/songs-failure.json`) | `--output-failure failed.json` |
| `--limit` | number | Process only the first N songs | None (all songs) | `--limit 1000` |
| `--use-api` | flag | Use the HTTP API instead of direct database access | Off (database mode) | `--use-api` |
| `--test-connection` | flag | Test the connection to the MusicBrainz database (or the API, with `--use-api`) | Off | `--test-connection` |
| `--help` | flag | Show help information | Off | `--help` |
| `--version` | flag | Show version information | Off | `--version` |

## Command Examples

### Basic Usage (Default)

```bash
# Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json
```

### Custom Source File

```bash
# Process a specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
```

### Custom Output Files

```bash
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
  --source data/songs.json \
  --output-success cleaned.json \
  --output-failure failed.json
```

### Limited Processing

```bash
# Process only the first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
```

### Force API Mode

```bash
# Use the HTTP API instead of the database (slower, but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
```

### Test Connection

```bash
# Test the database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test the API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```

### Help and Information

```bash
# Show help information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help

# Show version information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
```

## Input File Format

The input file must be a valid JSON file containing an array of song objects:

```json
[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  }
]
```

### Required Fields

- `artist`: The artist name (string)
- `title`: The song title (string)

### Optional Fields

Any additional fields are preserved in the output:

- `disabled`: Boolean flag
- `favorite`: Boolean flag
- `guid`: Unique identifier
- `path`: File path
- Any other custom fields

## Output Files

The tool creates **three main output files**, plus a JSON copy of the processing report (`processing_report_YYYYMMDD_HHMMSS.json`) that is listed in the console summary:

### 1. Successful Songs (`<source>-success.json`)

Array of successfully matched songs, with the artist and title normalized and MusicBrainz IDs added:

```json
[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  }
]
```

### 2. Failed Songs (`<source>-failure.json`)

Array of songs that could not be matched, in the same format as the source (no MBIDs are added):

```json
[
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
]
```

### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)

Human-readable text report with summary statistics and a list of the first 50 failed songs:

```
MusicBrainz Data Cleaner - Processing Report
==================================================
Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds

SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%

DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second

OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt

FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
```

### Added Fields (Successful Songs Only)

- `mbid`: MusicBrainz Artist ID (string)
- `recording_mbid`: MusicBrainz Recording ID (string)

## Processing Output

### Progress Indicators

```
🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================
[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
```

### Status Indicators

| Symbol | Meaning | Description |
|--------|---------|-------------|
| ✅ | Success | Song processed successfully with MBIDs found |
| ❌ | Failure | Song processing failed (no MBIDs found) |
| 📈 | Progress | Progress update with statistics |
| 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |

## Error Messages and Exit Codes

### Exit Codes

| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |

### Common Error Messages

#### File Not Found

```
Error: Source file does not exist: data/songs.json
```

#### Invalid JSON

```
Error: Invalid JSON in file 'songs.json'
```

#### Invalid Input Format

```
Error: Source file should contain a JSON array of songs
```

#### Connection Error

```
❌ Connection to MusicBrainz database failed
```

#### Missing Dependencies

```
ModuleNotFoundError: No module named 'requests'
```

## Environment Configuration

### Docker Environment

The tool runs in a Docker container with the following configuration:

| Setting | Default | Description |
|---------|---------|-------------|
| Database Host | `db` | PostgreSQL database container |
| Database Port | `5432` | PostgreSQL port |
| Database Name | `musicbrainz_db` | MusicBrainz database name |
| API URL | `http://localhost:5001` | MusicBrainz web server URL |

### Environment Variables

```bash
# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz

# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
```

## Troubleshooting Commands

### Check MusicBrainz Server Status

```bash
# Test if the web server is running
curl -I http://localhost:5001

# Test the database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
```

### Validate JSON File

```bash
# Check that the file is valid JSON (the pretty-printed copy is discarded)
python3 -m json.tool data/songs.json > /dev/null && echo "Valid JSON"

# Check that the top level is an array and count its items
python3 -c "import json; data = json.load(open('data/songs.json')); assert isinstance(data, list), 'Top level is not a JSON array'; print('Valid JSON array with', len(data), 'items')"
```

### Test Tool Connection

```bash
# Test the database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test the API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```

## Advanced Usage

### Batch Processing

To process multiple files, use a shell loop. Skip the tool's own output files so that a second run does not re-process them:

```bash
# Process all source JSON files in the data directory
for file in data/*.json; do
  # Skip previously generated outputs and JSON reports
  case "$file" in
    *-success.json|*-failure.json|*/processing_report_*.json) continue ;;
  esac
  docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done
```

### Large Files

For large files, the tool is built to stay fast:

- Direct database access (the default mode) for maximum speed
- A progress update every 100 songs
- Memory-efficient processing
- No rate limiting when using database access

As a reference point, the sample run above processed 49,170 songs in 15,263 seconds (about 4.2 hours, or 3.2 songs/second), so use `--limit` for quick trial runs.

### Custom Processing

```bash
# Process only the first 1000 songs (useful for testing)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000

# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
  --source data/songs.json \
  --output-success my_cleaned.json \
  --output-failure my_failed.json
```

## Command Line Shortcuts

### Common Aliases

Add these to your shell profile for convenience. Because they call `docker-compose`, run them from inside the project directory:

```bash
# Add to ~/.bashrc or ~/.zshrc
alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
```

### Usage with Aliases

```bash
# Using the alias
mbclean --source data/songs.json

# Show help
mbclean-help

# Test connection
mbclean-test
```

## Integration Examples

### With Git

```bash
# Process files and commit changes
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs"
```

### With Cron Jobs

```bash
# Add to crontab to process a file daily at 02:00.
# -T disables pseudo-TTY allocation, since cron does not provide a terminal.
# --source is read inside the container, so the file must be in a directory mounted into it.
0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm -T musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
```

### With Shell Scripts

```bash
#!/bin/bash
# clean_songs.sh - usage: ./clean_songs.sh data/songs.json
set -u

if [ $# -ne 1 ]; then
  echo "Usage: $0 <source.json>" >&2
  exit 2
fi

INPUT_FILE="$1"
OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"

# Test the command's exit status directly instead of inspecting $? afterwards
if docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
    --source "$INPUT_FILE" \
    --output-success "$OUTPUT_SUCCESS" \
    --output-failure "$OUTPUT_FAILURE"; then
  echo "Successfully processed $INPUT_FILE"
  echo "Successful songs: $OUTPUT_SUCCESS"
  echo "Failed songs: $OUTPUT_FAILURE"
else
  echo "Error processing $INPUT_FILE" >&2
  exit 1
fi
```

## Command Reference Summary

| Command | Description |
|---------|-------------|
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |
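
## Pre-flight Input Validation (Sketch)

A full run can take hours (the sample report covers 49,170 songs in 15,263 seconds), so it can pay to check a file against the Input File Format rules before starting. The Python sketch below is not part of the cleaner: `validate_songs` and `validate_file` are hypothetical helper names, and the messages only mirror the documented error messages. It checks just the stated requirements, a top-level JSON array of objects with string `artist` and `title` fields, and allows any extra fields.

```python
import json
from pathlib import Path
from typing import Any, List


def validate_songs(data: Any) -> List[str]:
    """Check parsed JSON against the documented input rules.

    Returns a list of problems; an empty list means the data looks valid.
    Extra fields are allowed, since the cleaner preserves them.
    """
    if not isinstance(data, list):
        return ["Source file should contain a JSON array of songs"]

    problems: List[str] = []
    for index, song in enumerate(data):
        if not isinstance(song, dict):
            problems.append(f"item {index}: expected an object, got {type(song).__name__}")
            continue
        # Only the two required fields are checked, and only for type
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                problems.append(f"item {index}: missing or non-string '{field}'")
    return problems


def validate_file(path: str) -> List[str]:
    """Load a source file and validate it, reporting missing files and bad JSON as problems."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"Source file does not exist: {path}"]
    try:
        data = json.loads(file_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON in file '{file_path.name}': {exc}"]
    return validate_songs(data)
```

For example, `validate_file("data/songs.json")` returns an empty list for a well-formed file and a list of readable problems otherwise, which is easier to act on than a failure partway through a long run.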
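
## Checking Output Files (Sketch)

The SUMMARY section of the sample processing report is consistent with the two output files: 40,692 successful + 8,478 failed = 49,170 total, and 40,692 / 49,170 ≈ 82.8%. The Python sketch below recomputes those numbers from `<source>-success.json` and `<source>-failure.json`, which can be handy when comparing runs (for example, before committing them to Git). `summarize_results` and `summarize_files` are hypothetical helpers, not part of the tool, and the `missing_ids` count assumes the documented Added Fields (`mbid`, `recording_mbid`) should be present on every successful song.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def summarize_results(successful: List[Dict[str, Any]], failed: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Recompute the report's SUMMARY counts from the two output arrays.

    Also counts successful songs lacking either added ID field
    (assumed to be zero for a healthy run).
    """
    total = len(successful) + len(failed)
    # Success rate as a percentage rounded to one decimal, like the report
    success_rate = round(100.0 * len(successful) / total, 1) if total else 0.0
    missing_ids = sum(
        1 for song in successful
        if not song.get("mbid") or not song.get("recording_mbid")
    )
    return {
        "total": total,
        "successful": len(successful),
        "failed": len(failed),
        "success_rate": success_rate,
        "missing_ids": missing_ids,
    }


def summarize_files(success_path: str, failure_path: str) -> Dict[str, Any]:
    """Load both output files (JSON arrays) and summarize them."""
    successful = json.loads(Path(success_path).read_text(encoding="utf-8"))
    failed = json.loads(Path(failure_path).read_text(encoding="utf-8"))
    return summarize_results(successful, failed)
```

Usage: `summarize_files("data/songs-success.json", "data/songs-failure.json")`. With the sample run's counts, the computed `success_rate` would be 82.8, matching the report.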