MusicBrainz Data Cleaner - CLI Commands Reference

Overview

The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that reads a JSON file of song entries, cleans and normalizes each song's artist and title by looking them up on a MusicBrainz server, and adds the matching MusicBrainz IDs.

Basic Command Structure

python musicbrainz_cleaner.py <input_file> [output_file] [options]

Command Arguments

Required Arguments

| Argument | Type | Description | Example |
|---|---|---|---|
| input_file | string | Path to the JSON file containing song data | my_songs.json |

Optional Arguments

| Argument | Type | Description | Example |
|---|---|---|---|
| output_file | string | Path for the cleaned output file | cleaned_songs.json |
| --help | flag | Show help information | --help |
| --version | flag | Show version information | --version |

Command Examples

Basic Usage

# Clean songs and save to auto-generated filename
python musicbrainz_cleaner.py songs.json
# Output: songs_cleaned.json
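The help text gives the default output name as `input_file_cleaned.json`. The sketch below derives that name under one assumption: it strips a trailing `.json` and appends `_cleaned.json`, mirroring the `${INPUT_FILE%.json}_cleaned.json` expansion in the shell script under Integration Examples. `default_output_path` is a hypothetical helper, not a function from the tool, and the tool's own handling of unusual names may differ.

```python
from pathlib import Path


def default_output_path(input_path: str) -> str:
    """Hypothetical helper: derive '<name>_cleaned.json' from '<name>.json'."""
    path = Path(input_path)
    name = path.name
    # Strip a trailing ".json" if present, then append the "_cleaned.json" suffix.
    stem = name[: -len(".json")] if name.endswith(".json") else name
    # with_name() keeps the original directory, so the output lands next to the input.
    return str(path.with_name(stem + "_cleaned.json"))


print(default_output_path("songs.json"))          # songs_cleaned.json
print(default_output_path("data/my_songs.json"))  # data/my_songs_cleaned.json (POSIX paths)
```

Keeping the output in the input's directory matches the behaviour shown in Basic Usage, where `songs.json` produces `songs_cleaned.json` alongside it.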

Custom Output File

# Specify custom output filename
python musicbrainz_cleaner.py songs.json cleaned_songs.json

Help and Information

# Show help information
python musicbrainz_cleaner.py --help

# Show version information
python musicbrainz_cleaner.py --version

Input File Format

The input file must be a valid JSON file containing an array of song objects:

[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  }
]

Required Fields

  • artist: The artist name (string)
  • title: The song title (string)

Optional Fields

Any additional fields will be preserved in the output:

  • disabled: Boolean flag
  • favorite: Boolean flag
  • guid: Unique identifier
  • path: File path
  • Any other custom fields
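To catch format problems before a run, you could check the two required fields yourself. The following is a minimal sketch, assuming only the rules stated above: a top-level array of objects, each with string `artist` and `title` fields, with any extra fields allowed. `validate_songs` is a hypothetical name, and the tool may apply stricter checks of its own.

```python
import json


def validate_songs(data):
    """Return a list of problems found; an empty list means the data looks valid."""
    if not isinstance(data, list):
        return ["top level must be a JSON array of song objects"]
    problems = []
    for i, song in enumerate(data):
        if not isinstance(song, dict):
            problems.append(f"item {i}: expected an object")
            continue
        # Only artist and title are required; extra fields are ignored here.
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                problems.append(f"item {i}: missing or non-string '{field}'")
    return problems


sample = json.loads(
    '[{"artist": "ACDC", "title": "Shot In The Dark", "favorite": true},'
    ' {"title": "No Artist"}]'
)
print(validate_songs(sample))  # ["item 1: missing or non-string 'artist'"]
```

For a real file, pass the result of `json.load` on the opened file, for example `validate_songs(json.load(open("songs.json", encoding="utf-8")))`.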

Output File Format

The output file will contain the same structure with cleaned data and added MBID fields:

[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  }
]

Added Fields

  • mbid: MusicBrainz Artist ID (string)
  • recording_mbid: MusicBrainz Recording ID (string)
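After a run, a quick tally of how many songs received each ID can show how well the matching went. This sketch does not depend on how unmatched songs are represented, which this document does not specify. It uses `.get()`, so a missing key, `null`, or an empty string all count as "not matched". `match_summary` is a hypothetical helper.

```python
def match_summary(songs):
    """Count how many cleaned songs received artist and recording MBIDs."""
    return {
        "total": len(songs),
        # .get() returns None for a missing key, so absent and null values both count as unmatched.
        "artist_mbid": sum(1 for s in songs if s.get("mbid")),
        "recording_mbid": sum(1 for s in songs if s.get("recording_mbid")),
    }


cleaned = [
    {"artist": "AC/DC", "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
     "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"},
    {"artist": "Bruno Mars ft. Cardi B", "title": "Finesse Remix"},
]
print(match_summary(cleaned))  # {'total': 2, 'artist_mbid': 1, 'recording_mbid': 1}
```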

Command Line Options

Help Option

python musicbrainz_cleaner.py --help

Output:

Usage: python musicbrainz_cleaner.py <input_file.json> [output_file.json]

MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz

Arguments:
  input_file.json    JSON file containing array of song objects
  output_file.json   Optional: Output file for cleaned data
                     (default: input_file_cleaned.json)

Examples:
  python musicbrainz_cleaner.py songs.json
  python musicbrainz_cleaner.py songs.json cleaned_songs.json

Requirements:
  - MusicBrainz server running on http://localhost:5001
  - Python 3.6+ with requests library

Version Option

python musicbrainz_cleaner.py --version

Output:

MusicBrainz Data Cleaner v1.0.0

Error Messages and Exit Codes

Exit Codes

| Code | Meaning | Description |
|---|---|---|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |
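When the cleaner is driven from another Python program, the exit code is the simplest success signal. The following is a minimal sketch, assuming the script is run from its own directory and that the codes are exactly those in the table above. `run_cleaner` and `describe_exit` are hypothetical wrapper names, not part of the tool.

```python
import subprocess
import sys

# Exit codes as listed in the table above.
EXIT_MEANINGS = {0: "success", 1: "general error", 2: "usage error"}


def describe_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cleaner(input_file, output_file=None, script="musicbrainz_cleaner.py"):
    """Run the cleaner in a child process and return (exit_code, meaning)."""
    cmd = [sys.executable, script, input_file]
    if output_file is not None:
        cmd.append(output_file)
    # The child's progress output goes straight to this process's stdout/stderr.
    result = subprocess.run(cmd)
    return result.returncode, describe_exit(result.returncode)
```

For example, `code, meaning = run_cleaner("songs.json")` followed by a check of `code != 0` plays the same role as `$?` in the shell examples.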

Common Error Messages

File Not Found

Error: File 'songs.json' not found

Invalid JSON

Error: Invalid JSON in file 'songs.json'

Invalid Input Format

Error: Input file should contain a JSON array of songs

Connection Error

Error searching for artist 'Artist Name': Connection refused

Missing Dependencies

ModuleNotFoundError: No module named 'requests'

Processing Output

Progress Indicators

Processing 3 songs...
==================================================

[1/3] Processing: ACDC - Shot In The Dark
  ✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1)
  ✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db)
  ✅ Updated to: AC/DC - Shot in the Dark

[2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix
  ❌ Could not find artist: Bruno Mars ft. Cardi B

[3/3] Processing: Taylor Swift - Love Story
  ✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
  ✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
  ✅ Updated to: Taylor Swift - Love Story

==================================================
✅ Processing complete!
📁 Output saved to: songs_cleaned.json

Status Indicators

| Symbol | Meaning | Description |
|---|---|---|
| ✅ | Success | Operation completed successfully |
| ❌ | Error | Operation failed |
| 🔄 | Processing | Currently processing |

Batch Processing

Multiple Files

To process multiple files, you can use shell scripting:

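# Process all JSON files in the current directory,
# skipping outputs from earlier runs (*_cleaned.json)
for file in *.json; do
    case "$file" in
        *_cleaned.json) continue ;;
    esac
    python musicbrainz_cleaner.py "$file"
done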
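A Python version of the loop above can also record which files failed, using the exit codes. This is a sketch under two assumptions: it runs from the directory containing `musicbrainz_cleaner.py`, and previous outputs follow the `*_cleaned.json` naming. `select_inputs` and `clean_directory` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path


def select_inputs(names):
    """Keep *.json names, skipping outputs of earlier runs (*_cleaned.json)."""
    return sorted(n for n in names if n.endswith(".json") and not n.endswith("_cleaned.json"))


def clean_directory(directory=".", script="musicbrainz_cleaner.py"):
    """Run the cleaner once per input file; return the names of files that failed."""
    folder = Path(directory)
    failed = []
    for name in select_inputs(p.name for p in folder.iterdir() if p.is_file()):
        result = subprocess.run([sys.executable, script, str(folder / name)])
        if result.returncode != 0:  # any non-zero exit code counts as a failure
            failed.append(name)
    return failed
```

Skipping `*_cleaned.json` matters on repeat runs. Without it, `songs_cleaned.json` would itself be cleaned into `songs_cleaned_cleaned.json`.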

Large Files

For large files, the tool processes songs one at a time with a 0.1-second delay between API calls to be respectful to the MusicBrainz server.
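The sequential, throttled pattern described above looks roughly like the sketch below. It illustrates the technique and is not the tool's code. `process_with_delay` is a hypothetical name, and `lookup` stands in for a single API call.

```python
import time


def process_with_delay(songs, lookup, delay=0.1):
    """Call `lookup` on each song in turn, sleeping `delay` seconds between consecutive calls."""
    results = []
    for i, song in enumerate(songs):
        if i > 0:
            time.sleep(delay)  # pause before every call except the first
        results.append(lookup(song))
    return results
```

The delay also puts a floor on run time. Assume two API calls per song, one artist search and one recording search, as the progress output suggests. Then 1,000 songs spend at least 1,000 × 2 × 0.1 s = 200 s in delays alone, before any network or server time.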

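Default Configuration

The tool uses the following defaults, which are defined in the script itself. To change the server URL, edit the script (see Custom MusicBrainz Server below).

| Setting | Default | Description |
|---|---|---|
| MusicBrainz URL | http://localhost:5001 | Local MusicBrainz server URL |
| API delay | 0.1 seconds | Delay between API calls |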

Troubleshooting Commands

Check MusicBrainz Server Status

# Test if server is running
curl -I http://localhost:5001

# Test API endpoint (quote the URL so the shell does not treat & as a background operator)
curl "http://localhost:5001/ws/2/artist/?query=name:AC/DC&fmt=json"
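The same endpoint check can be done from Python with only the standard library. The URL builder also percent-encodes the query, which avoids the quoting problems of the raw curl URL. This is a sketch that assumes the `/ws/2/artist/` search path and `fmt=json` parameter used in the curl example. `artist_search_url` and `search_artists` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def artist_search_url(query, base_url="http://localhost:5001"):
    """Build an artist search URL with the query string percent-encoded."""
    params = urllib.parse.urlencode({"query": query, "fmt": "json"})
    return f"{base_url}/ws/2/artist/?{params}"


def search_artists(query, base_url="http://localhost:5001", timeout=10):
    """Fetch and decode the JSON search response (requires a running server)."""
    with urllib.request.urlopen(artist_search_url(query, base_url), timeout=timeout) as resp:
        return json.load(resp)


print(artist_search_url("name:AC/DC"))
# http://localhost:5001/ws/2/artist/?query=name%3AAC%2FDC&fmt=json
```

With the server running, `[a["name"] for a in search_artists("name:AC/DC").get("artists", [])]` lists the matching artist names. A connection error at this point corresponds to the "Connection refused" message listed under Common Error Messages.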

Validate JSON File

# Check if JSON is valid
python -m json.tool songs.json

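# Check that the top level is an array and count its items
python -c "import json; data = json.load(open('songs.json')); assert isinstance(data, list), 'top level is not a JSON array'; print('Valid JSON array with', len(data), 'items')"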

Check Python Dependencies

# Check if requests is installed
python -c "import requests; print('requests version:', requests.__version__)"

# Install if missing
pip install requests
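Both requirements from the help text (Python 3.6+ and the `requests` library) can be checked in one pass. The following is a minimal sketch. It takes the version floor from the help output, and `check_requirements` is a hypothetical name.

```python
import importlib.util
import sys


def check_requirements(min_python=(3, 6)):
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    # find_spec returns None when the module cannot be found on the import path.
    if importlib.util.find_spec("requests") is None:
        problems.append("requests is not installed (pip install requests)")
    return problems
```

Printing `check_requirements()` before a run shows an empty list when everything is in place.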

Advanced Usage

Custom MusicBrainz Server

To use a different MusicBrainz server, modify the script:

# In musicbrainz_cleaner.py, change:
self.base_url = "http://your-server:5001"
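If you switch servers often, you could change that line to read the URL from an environment variable instead of editing the literal each time. This is a hypothetical modification: `MUSICBRAINZ_URL` is an invented variable name that the tool is not documented to read, and `resolve_base_url` is likewise hypothetical.

```python
import os

DEFAULT_BASE_URL = "http://localhost:5001"


def resolve_base_url(env=None):
    """Return the server URL, preferring the hypothetical MUSICBRAINZ_URL variable."""
    env = os.environ if env is None else env
    url = env.get("MUSICBRAINZ_URL", "").strip()
    # Fall back to the documented default when the variable is unset or empty,
    # and drop any trailing slash so paths like "/ws/2/artist/" join cleanly.
    return (url or DEFAULT_BASE_URL).rstrip("/")

# In musicbrainz_cleaner.py, the assignment would then become:
# self.base_url = resolve_base_url()
```

With this change, `MUSICBRAINZ_URL=http://your-server:5001 python musicbrainz_cleaner.py songs.json` would target another server without any file edits.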

Verbose Output

For debugging, you can get more verbose output by uncommenting the debug print statements in the script.
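An alternative to toggling print statements by hand is Python's standard `logging` module, where one switch controls all debug output. This is a hypothetical modification: the tool has no documented verbose flag, and `configure_logging` and the logger name are assumptions.

```python
import logging

log = logging.getLogger("musicbrainz_cleaner")


def configure_logging(verbose=False):
    """Show DEBUG messages when verbose is True, otherwise only INFO and above."""
    level = logging.DEBUG if verbose else logging.INFO
    # basicConfig only installs a handler the first time; setLevel applies on every call.
    logging.basicConfig(level=level, format="%(levelname)s %(message)s")
    log.setLevel(level)
    return level
```

Debug lines would then be written as `log.debug("Artist search URL: %s", url)` and appear only after `configure_logging(verbose=True)`.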

Command Line Shortcuts

Common Aliases

Add these to your shell profile for convenience:

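# Add to ~/.bashrc or ~/.zshrc (use the absolute path so the aliases work from any directory)
alias mbclean='python /path/to/musicbrainz-cleaner/musicbrainz_cleaner.py'
alias mbclean-help='python /path/to/musicbrainz-cleaner/musicbrainz_cleaner.py --help'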

Usage with Aliases

# Using alias
mbclean songs.json

# Show help
mbclean-help

Integration Examples

With Git

# Process files and commit changes
python musicbrainz_cleaner.py songs.json
git add songs_cleaned.json
git commit -m "Clean song metadata with MusicBrainz IDs"

With Cron Jobs

# Add to crontab to clean songs.json every day at 02:00
0 2 * * * cd /path/to/musicbrainz-cleaner && python musicbrainz_cleaner.py /path/to/songs.json

With Shell Scripts

#!/bin/bash
# clean_songs.sh: clean one file and report the result
INPUT_FILE="$1"

if [ -z "$INPUT_FILE" ]; then
    echo "Usage: $0 <input_file.json>" >&2
    exit 2
fi

OUTPUT_FILE="${INPUT_FILE%.json}_cleaned.json"

if python musicbrainz_cleaner.py "$INPUT_FILE" "$OUTPUT_FILE"; then
    echo "Successfully cleaned $INPUT_FILE"
    echo "Output saved to $OUTPUT_FILE"
else
    echo "Error processing $INPUT_FILE" >&2
    exit 1
fi

Command Reference Summary

| Command | Description |
|---|---|
| python musicbrainz_cleaner.py file.json | Basic usage |
| python musicbrainz_cleaner.py file.json output.json | Custom output |
| python musicbrainz_cleaner.py --help | Show help |
| python musicbrainz_cleaner.py --version | Show version |