MusicBrainz Data Cleaner - CLI Commands Reference

Overview

The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that reads JSON song data files and cleans and normalizes their metadata against the MusicBrainz database. It writes successful and failed songs to separate output files and produces a detailed processing report.

Basic Command Structure

docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]

Command Options

Main Options

| Option | Type | Description | Default | Example |
|---|---|---|---|---|
| --source | string | Source JSON file path | data/songs.json | --source data/my_songs.json |
| --output-success | string | Output file for successful songs | Derived from source (e.g. data/songs-success.json) | --output-success cleaned.json |
| --output-failure | string | Output file for failed songs | Derived from source (e.g. data/songs-failure.json) | --output-failure failed.json |
| --limit | number | Process only the first N songs | None (all songs) | --limit 1000 |
| --use-api | flag | Force use of the HTTP API instead of the database | Off (database mode) | --use-api |
| --test-connection | flag | Test the connection to the MusicBrainz server | Off | --test-connection |
| --help | flag | Show help information | Off | --help |
| --version | flag | Show version information | Off | --version |
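
To show how these options fit together, here is an argparse parser that mirrors the table. It is a sketch, not the tool's real source: the function name build_parser and the version string are assumptions, and the actual CLI may be implemented differently. If the tool is built on argparse, the exit code 2 listed under Exit Codes is exactly what argparse uses for invalid arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the tool's actual code)."""
    parser = argparse.ArgumentParser(prog="src.cli.main", description="MusicBrainz Data Cleaner")
    parser.add_argument("--source", default="data/songs.json", help="Source JSON file path")
    # None means "derive the name from --source", e.g. data/songs-success.json
    parser.add_argument("--output-success", default=None, help="Output file for successful songs")
    parser.add_argument("--output-failure", default=None, help="Output file for failed songs")
    parser.add_argument("--limit", type=int, default=None, help="Process only the first N songs")
    parser.add_argument("--use-api", action="store_true", help="Use the HTTP API instead of the database")
    parser.add_argument("--test-connection", action="store_true", help="Test connection to MusicBrainz")
    # --help is added by argparse automatically; the version text is a placeholder
    parser.add_argument("--version", action="version", version="%(prog)s (version placeholder)")
    return parser


args = build_parser().parse_args(["--limit", "1000", "--use-api"])
print(args.source, args.limit, args.use_api)
```

Dashes in option names become underscores on the parsed namespace (args.use_api, args.output_success), and a bad value such as --limit abc makes argparse print a usage message and exit with status 2.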

Command Examples

Basic Usage (Default)

# Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json

Custom Source File

# Process specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
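
Judging by these examples, the default output names appear to be formed by inserting -success or -failure before the .json extension, in the same directory as the source file. A sketch of that rule follows; default_output_paths is a hypothetical name, and how the tool treats other extensions is not documented here.

```python
from pathlib import Path


def default_output_paths(source: str) -> tuple[str, str]:
    """Derive success/failure file names next to the source file (hypothetical helper)."""
    base = Path(source).with_suffix("")  # drop ".json", keep the directory
    success = base.with_name(base.name + "-success.json")
    failure = base.with_name(base.name + "-failure.json")
    return str(success), str(failure)


print(default_output_paths("data/my_songs.json"))
```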

Custom Output Files

# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json

Limited Processing

# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000

Force API Mode

# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api

Test Connection

# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api

Help and Information

# Show help information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help

# Show version information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version

Input File Format

The input file must be a valid JSON file containing an array of song objects:

[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  }
]

Required Fields

  • artist: The artist name (string)
  • title: The song title (string)

Optional Fields

Any additional fields will be preserved in the output:

  • disabled: Boolean flag
  • favorite: Boolean flag
  • guid: Unique identifier
  • path: File path
  • Any other custom fields
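
A quick way to check a file against this format before a long run is to load it and verify the two required fields. The sketch below uses only the standard library; load_songs is a hypothetical helper, and its error messages are modelled on the tool's documented ones rather than taken from its source.

```python
import json


def load_songs(path: str) -> list[dict]:
    """Load an input file and check the documented format (illustrative sketch)."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        raise ValueError(f"Source file does not exist: {path}") from None
    except json.JSONDecodeError:
        raise ValueError(f"Invalid JSON in file '{path}'") from None
    if not isinstance(data, list):
        raise ValueError("Source file should contain a JSON array of songs")
    for i, song in enumerate(data):
        if not isinstance(song, dict):
            raise ValueError(f"Item {i} is not a JSON object")
        # artist and title are required strings; every other key is passed through untouched
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                raise ValueError(f"Item {i} is missing required string field '{field}'")
    return data
```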

Output Files

The tool creates three main output files (a JSON version of the processing report is also saved next to the text report):

1. Successful Songs (source-success.json)

Array of successfully processed songs, with artist and title normalized and MBIDs added:

[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  }
]

2. Failed Songs (source-failure.json)

Array of songs that couldn't be processed (same format as source):

[
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
]

3. Processing Report (processing_report_YYYYMMDD_HHMMSS.txt)

Human-readable text report with summary statistics and a list of failed songs:

MusicBrainz Data Cleaner - Processing Report
==================================================

Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds

SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%

DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second

OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt

FAILED SONGS (First 50)
--------------------
  1. Unknown Artist - Unknown Song
  2. Invalid Artist - Invalid Title
  3. Test Artist - Test Song
...
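
The summary figures are simple ratios of the raw counts, so they are easy to recompute when comparing runs. The sketch below reproduces the sample report's numbers and file name; summarize and report_filename are hypothetical helper names, not functions exposed by the tool.

```python
from datetime import datetime


def summarize(total: int, successful: int, artists_found: int,
              recordings_found: int, seconds: float) -> dict:
    """Recompute the report's derived figures from raw counts (hypothetical helper)."""
    def pct(part: int) -> float:
        return round(100 * part / total, 1) if total else 0.0

    failed = total - successful
    return {
        "failed": failed,
        "success_rate": pct(successful),
        "failure_rate": pct(failed),
        "artist_rate": pct(artists_found),
        "recording_rate": pct(recordings_found),
        "songs_per_second": round(total / seconds, 1) if seconds else 0.0,
    }


def report_filename(started: datetime) -> str:
    """Build a name following the processing_report_YYYYMMDD_HHMMSS.txt pattern."""
    return started.strftime("processing_report_%Y%m%d_%H%M%S.txt")


print(summarize(49_170, 40_692, 44_526, 40_998, 15263.3))
print(report_filename(datetime(2024, 12, 19, 14, 30, 22)))
```

With the sample counts this gives 8,478 failed songs, an 82.8% success rate (17.2% failures), 90.6% of artists and 83.4% of recordings found, 3.2 songs/second, and processing_report_20241219_143022.txt.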

Added Fields (Successful Songs Only)

  • mbid: MusicBrainz Artist ID (string)
  • recording_mbid: MusicBrainz Recording ID (string)
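
The split between the two output files can be pictured as follows. This sketch assumes a lookup function that returns the normalized artist and title plus both MBIDs, or None when nothing matches, and treats a song as successful only when both MBIDs are present. The names apply_match and split_results and the shape of the match dict are hypothetical; the real matching logic is not shown here. Each input record is copied so that custom fields survive, as the output examples imply.

```python
from typing import Callable, Optional


def apply_match(song: dict, match: Optional[dict]) -> tuple[bool, dict]:
    """Return (succeeded, record) for one song (hypothetical helper)."""
    if not match or not match.get("mbid") or not match.get("recording_mbid"):
        # Failed songs are written out unchanged, in the same format as the source
        return False, dict(song)
    record = dict(song)  # copy so guid, path and other custom fields are preserved
    record["artist"] = match["artist"]  # normalized name, e.g. "AC/DC"
    record["title"] = match["title"]  # normalized title
    record["mbid"] = match["mbid"]  # MusicBrainz artist ID
    record["recording_mbid"] = match["recording_mbid"]  # MusicBrainz recording ID
    return True, record


def split_results(songs: list[dict],
                  lookup: Callable[[dict], Optional[dict]]) -> tuple[list[dict], list[dict]]:
    """Partition songs into (successes, failures) using the supplied lookup."""
    successes, failures = [], []
    for song in songs:
        ok, record = apply_match(song, lookup(song))
        (successes if ok else failures).append(record)
    return successes, failures
```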

Processing Output

Progress Indicators

🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================

[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)

  📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec

==================================================
🎉 Processing completed!
📊 Final Results:
  ⏱️  Total processing time: 15263.3 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 44,526/49,170 (90.6%)
  ✅ Recordings found: 40,998/49,170 (83.4%)
  ❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json
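
The progress line is derived from three counters and the elapsed time. The sketch below reproduces the sample line exactly; progress_line, should_report and PROGRESS_EVERY are hypothetical names, and the 100-song interval is taken from the Large Files notes.

```python
PROGRESS_EVERY = 100  # interval from "progress tracking every 100 songs"


def should_report(done: int) -> bool:
    """True on every PROGRESS_EVERY-th processed song."""
    return done > 0 and done % PROGRESS_EVERY == 0


def progress_line(done: int, total: int, successes: int, elapsed: float) -> str:
    """Format a progress update in the style of the sample output (illustrative only)."""
    pct_done = 100 * done / total if total else 0.0
    success_rate = 100 * successes / done if done else 0.0
    rate = done / elapsed if elapsed else 0.0
    return (f"  📈 Progress: {done:,}/{total:,} ({pct_done:.1f}%) - "
            f"Success: {success_rate:.1f}% - Rate: {rate:.1f} songs/sec")


print(progress_line(100, 49_170, 85, 31.25))
```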

Status Indicators

| Symbol | Meaning | Description |
|---|---|---|
| ✅ | Success | Song processed successfully with MBIDs found |
| ❌ | Failure | Song processing failed (no MBIDs found) |
| 📈 | Progress | Progress update with statistics |
| 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |

Error Messages and Exit Codes

Exit Codes

| Code | Meaning | Description |
|---|---|---|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command-line arguments |
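
When the tool is driven from a larger Python pipeline, its exit status can be mapped back to this table. A minimal sketch using subprocess; run_cleaner is a hypothetical helper, and any code not listed in the table is reported as unknown.

```python
import subprocess

# Exit codes as documented in the table above
EXIT_CODES = {0: "Success", 1: "Error", 2: "Usage Error"}


def run_cleaner(cmd: list[str]) -> str:
    """Run a command and translate its exit status (hypothetical helper)."""
    result = subprocess.run(cmd)
    return EXIT_CODES.get(result.returncode, f"Unknown exit code {result.returncode}")
```

For example, run_cleaner(["docker-compose", "run", "--rm", "musicbrainz-cleaner", "python3", "-m", "src.cli.main", "--limit", "10"]) should return "Success" for a clean run and "Usage Error" if an option is mistyped.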

Common Error Messages

File Not Found

Error: Source file does not exist: data/songs.json

Invalid JSON

Error: Invalid JSON in file 'songs.json'

Invalid Input Format

Error: Source file should contain a JSON array of songs

Connection Error

❌ Connection to MusicBrainz database failed

Missing Dependencies

ModuleNotFoundError: No module named 'requests'

Environment Configuration

Docker Environment

The tool runs in a Docker container with the following configuration:

| Setting | Default | Description |
|---|---|---|
| Database Host | db | PostgreSQL database container |
| Database Port | 5432 | PostgreSQL port |
| Database Name | musicbrainz_db | MusicBrainz database name |
| API URL | http://localhost:5001 | MusicBrainz web server URL |

Environment Variables

# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz

# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
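
As an illustration of how these variables combine into connection settings, the sketch below resolves each one from the environment and falls back to the documented defaults. load_config is a hypothetical helper, and whether the tool reads the variables in exactly this way is an assumption. The database settings are assembled into a libpq keyword/value connection string, a format accepted by psql and psycopg2.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the Docker environment table and variable list above
DEFAULTS = {
    "DB_HOST": "db",
    "DB_PORT": "5432",
    "DB_NAME": "musicbrainz_db",
    "DB_USER": "musicbrainz",
    "DB_PASSWORD": "musicbrainz",
    "MUSICBRAINZ_WEB_SERVER_PORT": "5001",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from the environment, falling back to documented defaults."""
    env = os.environ if env is None else env

    def get(key: str) -> str:
        return env.get(key, DEFAULTS[key])

    # libpq keyword/value connection string
    dsn = (f"host={get('DB_HOST')} port={int(get('DB_PORT'))} "
           f"dbname={get('DB_NAME')} user={get('DB_USER')} password={get('DB_PASSWORD')}")
    api_url = f"http://localhost:{int(get('MUSICBRAINZ_WEB_SERVER_PORT'))}"
    return {"dsn": dsn, "api_url": api_url}
```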

Troubleshooting Commands

Check MusicBrainz Server Status

# Test if web server is running
curl -I http://localhost:5001

# Test database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
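
Where curl is not available (for example inside a minimal Python container), a HEAD request from the standard library gives roughly the same answer as curl -I. web_server_up is a hypothetical helper; any HTTP response, even an error status, is taken to mean the server is running.

```python
import urllib.error
import urllib.request


def web_server_up(url: str = "http://localhost:5001", timeout: float = 5.0) -> bool:
    """Send a HEAD request and report whether any HTTP server answered."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404 or 501), so it is running
        return True
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses
        return False
```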

Validate JSON File

# Check if JSON is valid
python -m json.tool data/songs.json

# Check JSON structure (the top level must be an array)
python -c "import json; data=json.load(open('data/songs.json')); assert isinstance(data, list), 'top level is not a JSON array'; print('Valid JSON array with', len(data), 'items')"

Test Tool Connection

# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api

Advanced Usage

Batch Processing

To process multiple files, you can use shell scripting:

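# Process all JSON files in the data directory, skipping outputs from earlier runs
for file in data/*.json; do
    case "$file" in
        *-success.json|*-failure.json|*/processing_report_*.json) continue ;;
    esac
    docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done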

Large Files

For large files, the tool processes songs efficiently with:

  • Direct database access for maximum speed
  • Progress tracking every 100 songs
  • Memory-efficient processing
  • No rate limiting with database access

Custom Processing

# Process only the first 1000 songs (useful for testing)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000

# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json

Command Line Shortcuts

Common Aliases

Add these to your shell profile for convenience:

# Add to ~/.bashrc or ~/.zshrc
alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'

Usage with Aliases

# Using alias
mbclean --source data/songs.json

# Show help
mbclean-help

# Test connection
mbclean-test

Integration Examples

With Git

# Process files and commit changes
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs"

With Cron Jobs

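# Add to crontab to process a file daily at 02:00
# -T disables TTY allocation (cron has no terminal); --source is resolved inside the container
0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run -T --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json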

With Shell Scripts

#!/bin/bash
# clean_songs.sh
INPUT_FILE="$1"
if [ -z "$INPUT_FILE" ]; then
    echo "Usage: $0 <songs.json>" >&2
    exit 2
fi

OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"

# Branch directly on the command's exit status instead of checking $? afterwards
if docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
    --source "$INPUT_FILE" \
    --output-success "$OUTPUT_SUCCESS" \
    --output-failure "$OUTPUT_FAILURE"; then
    echo "Successfully processed $INPUT_FILE"
    echo "Successful songs: $OUTPUT_SUCCESS"
    echo "Failed songs: $OUTPUT_FAILURE"
else
    echo "Error processing $INPUT_FILE" >&2
    exit 1
fi

Command Reference Summary

| Command | Description |
|---|---|
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main | Process all songs with defaults |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json | Process specific file |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000 | Process first 1000 songs |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection | Test database connection |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api | Force API mode |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help | Show help |
| docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version | Show version |