MusicBrainz Data Cleaner - CLI Commands Reference
Overview
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files, cleaning and normalizing the metadata against the MusicBrainz database.
Basic Command Structure
python musicbrainz_cleaner.py <input_file> [output_file] [options]
Command Arguments
Required Arguments
| Argument | Type | Description | Example |
|---|---|---|---|
| input_file | string | Path to the JSON file containing song data | my_songs.json |
Optional Arguments
| Argument | Type | Description | Example |
|---|---|---|---|
| output_file | string | Path for the cleaned output file | cleaned_songs.json |
| --help | flag | Show help information | --help |
| --version | flag | Show version information | --version |
Command Examples
Basic Usage
# Clean songs and save to auto-generated filename
python musicbrainz_cleaner.py songs.json
# Output: songs_cleaned.json
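The auto-generated name follows the pattern given in the tool's help text (input_file_cleaned.json). As a minimal sketch, assuming the suffix is inserted before the extension and the output is written next to the input, the mapping could be reproduced as follows. The helper default_output_path is hypothetical and is not taken from the script.

```python
from pathlib import Path


def default_output_path(input_path: str) -> str:
    """Return '<stem>_cleaned<suffix>' next to the input file (assumed behaviour)."""
    path = Path(input_path)
    # with_name keeps the parent directory and swaps only the file name.
    return str(path.with_name(f"{path.stem}_cleaned{path.suffix}"))


print(default_output_path("songs.json"))  # songs_cleaned.json
```

Knowing the output name in advance is useful in wrapper scripts that need to pick up the cleaned file after the tool exits.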
Custom Output File
# Specify custom output filename
python musicbrainz_cleaner.py songs.json cleaned_songs.json
Help and Information
# Show help information
python musicbrainz_cleaner.py --help
# Show version information
python musicbrainz_cleaner.py --version
Input File Format
The input file must be a valid JSON file containing an array of song objects:
[
{
"artist": "ACDC",
"title": "Shot In The Dark",
"disabled": false,
"favorite": true,
"guid": "8946008c-7acc-d187-60e6-5286e55ad502",
"path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
}
]
Required Fields
- artist: The artist name (string)
- title: The song title (string)
Optional Fields
Any additional fields will be preserved in the output:
- disabled: Boolean flag
- favorite: Boolean flag
- guid: Unique identifier
- path: File path
- Any other custom fields
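The format rules above can be checked before running the tool. The following is a minimal sketch of such a pre-flight check, assuming only what this section states (a top-level array, with string artist and title fields on every item). The function validate_songs is a hypothetical helper, not part of musicbrainz_cleaner.py.

```python
import json


def validate_songs(data):
    """Return a list of problems found in parsed song data (empty if none)."""
    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for index, song in enumerate(data):
        if not isinstance(song, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        # Only artist and title are required; extra fields are ignored here.
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                problems.append(f"item {index}: missing or non-string '{field}'")
    return problems


sample = json.loads('[{"artist": "ACDC", "title": "Shot In The Dark", "favorite": true}]')
print(validate_songs(sample))  # an empty list means no problems were found
```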
Output File Format
The output file will contain the same structure with cleaned data and added MBID fields:
[
{
"artist": "AC/DC",
"title": "Shot in the Dark",
"disabled": false,
"favorite": true,
"guid": "8946008c-7acc-d187-60e6-5286e55ad502",
"path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
"mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
"recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
}
]
Added Fields
- mbid: MusicBrainz Artist ID (string)
- recording_mbid: MusicBrainz Recording ID (string)
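The added fields make it straightforward to see how many songs were matched. Here is a minimal sketch that summarizes a cleaned file, assuming that songs the tool could not match simply lack the corresponding field (this document does not say how unmatched songs are written). The function match_summary is a hypothetical helper.

```python
def match_summary(songs):
    """Count songs by which MusicBrainz IDs are present."""
    summary = {"artist_and_recording": 0, "artist_only": 0, "unmatched": 0}
    for song in songs:
        if song.get("mbid") and song.get("recording_mbid"):
            summary["artist_and_recording"] += 1
        elif song.get("mbid"):
            summary["artist_only"] += 1
        else:
            # Assumption: no artist MBID means the song was not matched at all.
            summary["unmatched"] += 1
    return summary


cleaned = [
    {
        "artist": "AC/DC",
        "title": "Shot in the Dark",
        "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
        "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db",
    },
    {"artist": "Bruno Mars ft. Cardi B", "title": "Finesse Remix"},
]
print(match_summary(cleaned))
```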
Command Line Options
Help Option
python musicbrainz_cleaner.py --help
Output:
Usage: python musicbrainz_cleaner.py <input_file.json> [output_file.json]
MusicBrainz Data Cleaner - Clean and normalize song data using MusicBrainz
Arguments:
input_file.json JSON file containing array of song objects
output_file.json Optional: Output file for cleaned data
(default: input_file_cleaned.json)
Examples:
python musicbrainz_cleaner.py songs.json
python musicbrainz_cleaner.py songs.json cleaned_songs.json
Requirements:
- MusicBrainz server running on http://localhost:5001
- Python 3.6+ with requests library
Version Option
python musicbrainz_cleaner.py --version
Output:
MusicBrainz Data Cleaner v1.0.0
Error Messages and Exit Codes
Exit Codes
| Code | Meaning | Description |
|---|---|---|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |
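When the tool is called from another Python program, the exit codes in the table can be mapped back to their meanings. This is a minimal sketch: the names EXIT_CODE_MEANINGS, describe_exit_code, and run_cleaner are hypothetical, and the script is assumed to be in the current working directory.

```python
import subprocess
import sys

# Meanings copied from the exit code table above.
EXIT_CODE_MEANINGS = {0: "Success", 1: "Error", 2: "Usage Error"}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into the label used in this reference."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def run_cleaner(input_file: str) -> str:
    """Run the cleaner on one file and return a readable status."""
    result = subprocess.run([sys.executable, "musicbrainz_cleaner.py", input_file])
    return describe_exit_code(result.returncode)


print(describe_exit_code(2))  # Usage Error
```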
Common Error Messages
File Not Found
Error: File 'songs.json' not found
Invalid JSON
Error: Invalid JSON in file 'songs.json'
Invalid Input Format
Error: Input file should contain a JSON array of songs
Connection Error
Error searching for artist 'Artist Name': Connection refused
Missing Dependencies
ModuleNotFoundError: No module named 'requests'
Processing Output
Progress Indicators
Processing 3 songs...
==================================================
[1/3] Processing: ACDC - Shot In The Dark
✅ Found artist: AC/DC (MBID: 66c662b6-6e2f-4930-8610-912e24c63ed1)
✅ Found recording: Shot in the Dark (MBID: cf8b5cd0-d97c-413d-882f-fc422a2e57db)
✅ Updated to: AC/DC - Shot in the Dark
[2/3] Processing: Bruno Mars ft. Cardi B - Finesse Remix
❌ Could not find artist: Bruno Mars ft. Cardi B
[3/3] Processing: Taylor Swift - Love Story
✅ Found artist: Taylor Swift (MBID: 20244d07-534f-4eff-b4d4-930878889970)
✅ Found recording: Love Story (MBID: d783e6c5-761f-4fc3-bfcf-6089cdfc8f96)
✅ Updated to: Taylor Swift - Love Story
==================================================
✅ Processing complete!
📁 Output saved to: songs_cleaned.json
Status Indicators
| Symbol | Meaning | Description |
|---|---|---|
| ✅ | Success | Operation completed successfully |
| ❌ | Error | Operation failed |
| 🔄 | Processing | Currently processing |
Batch Processing
Multiple Files
To process multiple files, you can use shell scripting:
# Process all JSON files in current directory, skipping outputs from earlier runs
for file in *.json; do
case "$file" in *_cleaned.json) continue ;; esac
python musicbrainz_cleaner.py "$file"
done
Large Files
For large files, the tool processes songs one at a time with a 0.1-second delay between API calls to avoid overloading the MusicBrainz server, so total run time grows linearly with the number of songs.
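The delay sets a lower bound on run time for a large file. The sketch below estimates that bound, assuming up to two API calls per song (one artist lookup and one recording lookup, as suggested by the progress output) and ignoring network and server time. The function estimate_min_seconds and the two-calls figure are assumptions, not values taken from the script.

```python
def estimate_min_seconds(song_count, calls_per_song=2, delay_seconds=0.1):
    """Lower bound on run time spent purely in the delay between API calls."""
    return song_count * calls_per_song * delay_seconds


# For a 10,000-song file the delays alone add up to roughly half an hour.
print(f"{estimate_min_seconds(10_000) / 60:.1f} minutes")  # 33.3 minutes
```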
Environment Variables
The tool uses the following default configuration. To change the server URL, edit the script as described under Custom MusicBrainz Server.
| Setting | Default | Description |
|---|---|---|
| MusicBrainz URL | http://localhost:5001 | Local MusicBrainz server URL |
| API Delay | 0.1 seconds | Delay between API calls |
Troubleshooting Commands
Check MusicBrainz Server Status
# Test if server is running
curl -I http://localhost:5001
# Test API endpoint (quote the URL so the shell does not treat '&' as a background operator)
curl "http://localhost:5001/ws/2/artist/?query=name:AC/DC&fmt=json"
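Characters such as ':', '/', and '&' have special meaning in URLs, so a query built from a program is safer when its parameters are percent-encoded. The following minimal sketch builds the same artist search URL with Python's standard library. The function artist_search_url is a hypothetical helper, and the endpoint path is the one shown in the curl example above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5001"  # default server URL from this document


def artist_search_url(name, base_url=BASE_URL):
    """Build an artist search URL with percent-encoded query parameters."""
    query = urlencode({"query": f"name:{name}", "fmt": "json"})
    return f"{base_url}/ws/2/artist/?{query}"


print(artist_search_url("AC/DC"))
# http://localhost:5001/ws/2/artist/?query=name%3AAC%2FDC&fmt=json
```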
Validate JSON File
# Check if JSON is valid
python -m json.tool songs.json
# Check JSON structure
python -c "import json; data=json.load(open('songs.json')); print('Valid JSON array with', len(data), 'items')"
Check Python Dependencies
# Check if requests is installed
python -c "import requests; print('requests version:', requests.__version__)"
# Install if missing
pip install requests
Advanced Usage
Custom MusicBrainz Server
To use a different MusicBrainz server, modify the script:
# In musicbrainz_cleaner.py, change:
self.base_url = "http://your-server:5001"
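If editing the script for every environment is inconvenient, one option is to read the URL from an environment variable and fall back to the default. The sketch below shows the idea. The variable name MUSICBRAINZ_URL and the function resolve_base_url are hypothetical; the script as documented here does not read them, so this would need to be wired into musicbrainz_cleaner.py by hand.

```python
import os

DEFAULT_BASE_URL = "http://localhost:5001"  # default from this document


def resolve_base_url(environ=os.environ):
    """Return the server URL, preferring the hypothetical MUSICBRAINZ_URL variable."""
    url = environ.get("MUSICBRAINZ_URL", "").strip()
    # Fall back to the documented default and drop any trailing slash.
    return (url or DEFAULT_BASE_URL).rstrip("/")


print(resolve_base_url({"MUSICBRAINZ_URL": "http://your-server:5001/"}))
# http://your-server:5001
```

With this in place, the assignment in the script could become self.base_url = resolve_base_url().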
Verbose Output
For debugging, uncomment the debug print statements in the script to get more verbose output.
Command Line Shortcuts
Common Aliases
Add these to your shell profile for convenience:
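# Add to ~/.bashrc or ~/.zshrc (use the full path so the aliases work from any directory)
alias mbclean='python /path/to/musicbrainz-cleaner/musicbrainz_cleaner.py'
alias mbclean-help='python /path/to/musicbrainz-cleaner/musicbrainz_cleaner.py --help'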
Usage with Aliases
# Using alias
mbclean songs.json
# Show help
mbclean-help
Integration Examples
With Git
# Process files and commit changes
python musicbrainz_cleaner.py songs.json
git add songs_cleaned.json
git commit -m "Clean song metadata with MusicBrainz IDs"
With Cron Jobs
# Add to crontab to process files daily
0 2 * * * cd /path/to/musicbrainz-cleaner && python musicbrainz_cleaner.py /path/to/songs.json
With Shell Scripts
#!/bin/bash
# clean_songs.sh
INPUT_FILE="$1"
OUTPUT_FILE="${INPUT_FILE%.json}_cleaned.json"
# Test the command's exit status directly instead of inspecting $? afterwards
if python musicbrainz_cleaner.py "$INPUT_FILE" "$OUTPUT_FILE"; then
echo "Successfully cleaned $INPUT_FILE"
echo "Output saved to $OUTPUT_FILE"
else
echo "Error processing $INPUT_FILE"
exit 1
fi
Command Reference Summary
| Command | Description |
|---|---|
| python musicbrainz_cleaner.py file.json | Basic usage |
| python musicbrainz_cleaner.py file.json output.json | Custom output |
| python musicbrainz_cleaner.py --help | Show help |
| python musicbrainz_cleaner.py --version | Show version |