MusicBrainz Data Cleaner - CLI Commands Reference
Overview
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool uses an interface-based architecture with dependency injection for clean, maintainable code. It creates separate output files for successful and failed songs, along with detailed processing reports.
Basic Command Structure
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
Command Options
Main Options
| Option | Type | Description | Default | Example |
|---|---|---|---|---|
| `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
| `--output-success` | string | Output file for successful songs | `source-success.json` | `--output-success cleaned.json` |
| `--output-failure` | string | Output file for failed songs | `source-failure.json` | `--output-failure failed.json` |
| `--limit` | number | Process only first N songs | None (all songs) | `--limit 1000` |
| `--use-api` | flag | Force use of HTTP API instead of database | Database mode | `--use-api` |
| `--test-connection` | flag | Test connection to MusicBrainz server | None | `--test-connection` |
| `--help` | flag | Show help information | None | `--help` |
| `--version` | flag | Show version information | None | `--version` |
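As a rough illustration of how the options in the table fit together, the sketch below rebuilds them with Python's standard `argparse` module. The function name `build_parser` and the version string are assumptions made for this example; the tool's real parser may be organized differently. `argparse` exits with status 2 on invalid arguments, which is consistent with the Usage Error exit code documented later, although that alone does not confirm the tool uses `argparse`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="src.cli.main")
    parser.add_argument("--source", default="data/songs.json",
                        help="Source JSON file path")
    # None means "derive the name from the source file" in this sketch.
    parser.add_argument("--output-success", default=None,
                        help="Output file for successful songs")
    parser.add_argument("--output-failure", default=None,
                        help="Output file for failed songs")
    parser.add_argument("--limit", type=int, default=None,
                        help="Process only first N songs")
    parser.add_argument("--use-api", action="store_true",
                        help="Force use of HTTP API instead of database")
    parser.add_argument("--test-connection", action="store_true",
                        help="Test connection to MusicBrainz server")
    # Placeholder version text; the real version is not stated in this document.
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version not specified here)")
    return parser


args = build_parser().parse_args(["--limit", "1000", "--use-api"])
print(args.source, args.limit, args.use_api)
```

`--help` needs no explicit definition because `argparse` adds it automatically.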
Command Examples
Basic Usage (Default)
# Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json
Custom Source File
# Process specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
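The default output names in the examples above appear to follow a simple rule: take the source file name and append `-success` or `-failure` before the extension. A minimal sketch of that rule follows; the helper `default_output_paths` is hypothetical and only mirrors the naming seen in the examples, not necessarily the tool's own code.

```python
from pathlib import Path


def default_output_paths(source: str) -> tuple[str, str]:
    """Derive default success/failure paths from a source path (illustrative)."""
    src = Path(source)
    # Keep the directory and extension, add a suffix to the file stem.
    success = src.with_name(f"{src.stem}-success{src.suffix}")
    failure = src.with_name(f"{src.stem}-failure{src.suffix}")
    return success.as_posix(), failure.as_posix()


print(default_output_paths("data/my_songs.json"))
# ('data/my_songs-success.json', 'data/my_songs-failure.json')
```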
Custom Output Files
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
Limited Processing
# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
Force API Mode
# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
Test Connection
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
Help and Information
# Show help information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
# Show version information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
Input File Format
The input file must be a valid JSON file containing an array of song objects:
[
{
"artist": "ACDC",
"title": "Shot In The Dark",
"disabled": false,
"favorite": true,
"guid": "8946008c-7acc-d187-60e6-5286e55ad502",
"path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
}
]
Required Fields
- `artist`: The artist name (string)
- `title`: The song title (string)
Optional Fields
Any additional fields will be preserved in the output:
- `disabled`: Boolean flag
- `favorite`: Boolean flag
- `guid`: Unique identifier
- `path`: File path
- Any other custom fields
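Before starting a long run, it can help to check a file against the format described above. The following is a minimal sketch of such a check; the function name `validate_songs` and its error wording for individual items are assumptions for illustration and are not part of the tool.

```python
import json


def validate_songs(text: str) -> list[dict]:
    """Parse JSON text and check it against the documented input format."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("Source file should contain a JSON array of songs")
    for index, song in enumerate(data):
        if not isinstance(song, dict):
            raise ValueError(f"Item {index} is not a JSON object")
        # Only artist and title are required; other fields pass through untouched.
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                raise ValueError(f"Item {index} is missing string field '{field}'")
    return data


sample = '[{"artist": "ACDC", "title": "Shot In The Dark", "favorite": true}]'
songs = validate_songs(sample)
print(len(songs), "valid song(s)")
```

To check a real file, pass its contents, for example `validate_songs(open("data/songs.json", encoding="utf-8").read())`.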
Output Files
The tool creates three main output files (the console summary also lists a JSON copy of the processing report):
1. Successful Songs (source-success.json)
Array of successfully processed songs with MBIDs added:
[
{
"artist": "AC/DC",
"title": "Shot in the Dark",
"disabled": false,
"favorite": true,
"guid": "8946008c-7acc-d187-60e6-5286e55ad502",
"path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
"mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
"recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
}
]
2. Failed Songs (source-failure.json)
Array of songs that couldn't be processed (same format as source):
[
{
"artist": "Unknown Artist",
"title": "Unknown Song",
"disabled": false,
"favorite": false,
"guid": "12345678-1234-1234-1234-123456789012",
"path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
}
]
3. Processing Report (processing_report_YYYYMMDD_HHMMSS.txt)
Human-readable text report with statistics and failed song list:
MusicBrainz Data Cleaner - Processing Report
==================================================
Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds
SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%
DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second
OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt
FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
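The summary figures in the sample report can be reproduced from three inputs: the total song count, the number of successful songs, and the elapsed time. The sketch below shows that arithmetic; the function name `summarize` and the dictionary keys are hypothetical.

```python
def summarize(total: int, successful: int, seconds: float) -> dict:
    """Recompute the report's summary figures (illustrative helper)."""
    return {
        "failed": total - successful,
        "success_rate": round(100 * successful / total, 1),  # percent
        "songs_per_second": round(total / seconds, 1),
    }


stats = summarize(total=49_170, successful=40_692, seconds=15_263.3)
print(stats)
# {'failed': 8478, 'success_rate': 82.8, 'songs_per_second': 3.2}
```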
Added Fields (Successful Songs Only)
- `mbid`: MusicBrainz Artist ID (string)
- `recording_mbid`: MusicBrainz Recording ID (string)
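A successful record is the input record with normalized names plus the two added fields, and every other field is carried over unchanged. A minimal sketch of that merge, assuming a hypothetical helper named `with_mbids` (the tool's internal function is not documented here):

```python
def with_mbids(song: dict, *, artist: str, title: str,
               artist_mbid: str, recording_mbid: str) -> dict:
    """Return a copy of the song with cleaned names and MBIDs added."""
    cleaned = dict(song)  # shallow copy keeps all custom fields
    cleaned["artist"] = artist
    cleaned["title"] = title
    cleaned["mbid"] = artist_mbid
    cleaned["recording_mbid"] = recording_mbid
    return cleaned


original = {"artist": "ACDC", "title": "Shot In The Dark", "favorite": True}
result = with_mbids(
    original,
    artist="AC/DC",
    title="Shot in the Dark",
    artist_mbid="66c662b6-6e2f-4930-8610-912e24c63ed1",
    recording_mbid="cf8b5cd0-d97c-413d-882f-fc422a2e57db",
)
print(result)
```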
Processing Output
Progress Indicators
🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================
[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
Status Indicators
| Symbol | Meaning | Description |
|---|---|---|
| ✅ | Success | Song processed successfully with MBIDs found |
| ❌ | Failure | Song processing failed (no MBIDs found) |
| 📈 | Progress | Progress update with statistics |
| 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |
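If the console output is captured to a log, the per-song lines can be parsed back into structured data. The sketch below assumes the line layout shown in the sample output above (`[N of TOTAL] <symbol> PASS|FAIL: <song>`); the function name `parse_status_line` is hypothetical, and the pattern may need adjusting if the real output differs.

```python
import re
from typing import Optional

# Matches lines such as "[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark".
STATUS_LINE = re.compile(
    r"^\[(?P<index>[\d,]+) of (?P<total>[\d,]+)\] \S+ "
    r"(?P<status>PASS|FAIL): (?P<song>.+)$"
)


def parse_status_line(line: str) -> Optional[dict]:
    """Return the parsed fields, or None for lines that are not per-song results."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "index": int(match["index"].replace(",", "")),
        "total": int(match["total"].replace(",", "")),
        "passed": match["status"] == "PASS",
        "song": match["song"],
    }


print(parse_status_line("[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song"))
```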
Error Messages and Exit Codes
Exit Codes
| Code | Meaning | Description |
|---|---|---|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |
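When the tool is driven from another program, the exit codes in the table can be mapped back to their meanings. This is a minimal sketch using `subprocess`; the names `describe_exit_code` and `run_cleaner` are hypothetical, and it assumes `docker-compose run` passes the container command's exit status through to the caller.

```python
import subprocess

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {0: "Success", 1: "Error", 2: "Usage Error"}

BASE_COMMAND = ["docker-compose", "run", "--rm", "musicbrainz-cleaner",
                "python3", "-m", "src.cli.main"]


def describe_exit_code(code: int) -> str:
    """Translate an exit status into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def run_cleaner(*options: str) -> str:
    """Run the CLI with the given options (requires Docker; not called here)."""
    result = subprocess.run([*BASE_COMMAND, *options])
    return describe_exit_code(result.returncode)


print(describe_exit_code(2))
```

A caller would then use something like `run_cleaner("--source", "data/songs.json")` and branch on the returned meaning.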
Common Error Messages
File Not Found
Error: Source file does not exist: data/songs.json
Invalid JSON
Error: Invalid JSON in file 'songs.json'
Invalid Input Format
Error: Source file should contain a JSON array of songs
Connection Error
❌ Connection to MusicBrainz database failed
Missing Dependencies
ModuleNotFoundError: No module named 'requests'
Architecture Overview
Interface-Based Design
The tool uses a clean interface-based architecture:
- `MusicBrainzDataProviderInterface`: Common protocol for data access
- `DataProviderFactory`: Creates appropriate provider (database or API)
- `SongProcessor`: Centralized processing logic using the interface
- Dependency Injection: CLI depends on interfaces, not concrete classes
Data Flow
- CLI uses `DataProviderFactory` to create data provider
- Factory returns either database or API implementation
- SongProcessor processes songs using the common interface
- Same logic works regardless of provider type
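The data flow above can be sketched in a few lines of Python. The class names `MusicBrainzDataProviderInterface`, `DataProviderFactory`, and `SongProcessor` come from this document; the method names (`find_artist_mbid`, `find_recording_mbid`, `create`, `process`) and the `StubProvider` class are assumptions made for illustration, and the stub stands in for both the database and API implementations.

```python
from typing import Optional, Protocol

# MBIDs taken from the sample output earlier in this document.
ARTIST_MBID = "66c662b6-6e2f-4930-8610-912e24c63ed1"
RECORDING_MBID = "cf8b5cd0-d97c-413d-882f-fc422a2e57db"


class MusicBrainzDataProviderInterface(Protocol):
    """Common protocol for data access (method names are assumptions)."""

    def find_artist_mbid(self, artist: str) -> Optional[str]: ...

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]: ...


class StubProvider:
    """In-memory stand-in for a database or API provider."""

    def __init__(self, mode: str) -> None:
        self.mode = mode

    def find_artist_mbid(self, artist: str) -> Optional[str]:
        return ARTIST_MBID if artist.lower() in ("acdc", "ac/dc") else None

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]:
        known = (ARTIST_MBID, "shot in the dark")
        return RECORDING_MBID if (artist_mbid, title.lower()) == known else None


class DataProviderFactory:
    """Creates a provider; a real factory would build database or API clients."""

    @staticmethod
    def create(use_api: bool = False) -> MusicBrainzDataProviderInterface:
        return StubProvider("api" if use_api else "database")


class SongProcessor:
    """Processing logic that only knows about the interface."""

    def __init__(self, provider: MusicBrainzDataProviderInterface) -> None:
        self._provider = provider  # injected, never constructed here

    def process(self, song: dict) -> Optional[dict]:
        artist_mbid = self._provider.find_artist_mbid(song["artist"])
        if artist_mbid is None:
            return None
        recording_mbid = self._provider.find_recording_mbid(artist_mbid, song["title"])
        if recording_mbid is None:
            return None
        return {**song, "mbid": artist_mbid, "recording_mbid": recording_mbid}


processor = SongProcessor(DataProviderFactory.create(use_api=False))
print(processor.process({"artist": "ACDC", "title": "Shot In The Dark"}))
```

Because `SongProcessor` receives its provider through the constructor, swapping database access for the HTTP API only changes the argument passed to the factory.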
Environment Configuration
Docker Environment
The tool runs in a Docker container with the following configuration:
| Setting | Default | Description |
|---|---|---|
| Database Host | `db` | PostgreSQL database container |
| Database Port | `5432` | PostgreSQL port |
| Database Name | `musicbrainz_db` | MusicBrainz database name |
| API URL | `http://localhost:5001` | MusicBrainz web server URL |
Environment Variables
# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
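The variables above could be read in Python as follows. This is a sketch only: the `DatabaseSettings` class and `load_settings` function are hypothetical, the fallback values are the defaults listed above, and how the tool itself reads its configuration is not stated in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    port: int
    name: str
    user: str
    password: str

    def dsn(self) -> str:
        """Build a PostgreSQL connection URI from the settings."""
        return (f"postgresql://{self.user}:{self.password}"
                f"@{self.host}:{self.port}/{self.name}")


def load_settings(env: Optional[Mapping[str, str]] = None) -> DatabaseSettings:
    """Read DB_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DatabaseSettings(
        host=env.get("DB_HOST", "db"),
        port=int(env.get("DB_PORT", "5432")),
        name=env.get("DB_NAME", "musicbrainz_db"),
        user=env.get("DB_USER", "musicbrainz"),
        password=env.get("DB_PASSWORD", "musicbrainz"),
    )


print(load_settings({}).dsn())
# postgresql://musicbrainz:musicbrainz@db:5432/musicbrainz_db
```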
Troubleshooting Commands
Check MusicBrainz Server Status
# Test if web server is running
curl -I http://localhost:5001
# Test database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
Validate JSON File
# Check if JSON is valid
python -m json.tool data/songs.json
# Check JSON structure
python -c "import json; data=json.load(open('data/songs.json')); assert isinstance(data, list), 'Top-level value is not a JSON array'; print('Valid JSON array with', len(data), 'items')"
Test Tool Connection
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
Advanced Usage
Batch Processing
To process multiple files, you can use shell scripting:
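# Process all JSON files in data directory, skipping output files from earlier runs
for file in data/*.json; do
  case "$file" in
    *-success.json|*-failure.json|*/processing_report_*.json) continue ;;
  esac
  docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done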
Large Files
For large files, the tool processes songs efficiently with:
- Direct database access for maximum speed
- Progress tracking every 100 songs
- Memory-efficient processing
- No rate limiting with database access
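The progress tracking described above can be pictured as a generator that consumes results one at a time and emits a line after every 100 songs, so no full result list has to be held in memory. This is a minimal sketch; the function name `progress_updates` is hypothetical and the line format only approximates the sample output (without the emoji and rate fields).

```python
from typing import Iterable, Iterator


def progress_updates(results: Iterable[bool], total: int,
                     every: int = 100) -> Iterator[str]:
    """Yield a progress line after each batch of `every` processed songs."""
    passed = 0
    for done, ok in enumerate(results, start=1):
        passed += ok  # True counts as 1, False as 0
        if done % every == 0:
            yield (f"Progress: {done:,}/{total:,} ({done / total:.1%}) "
                   f"- Success: {passed / done:.1%}")


# 85 passes and 15 failures out of a 49,170-song run.
outcomes = [True] * 85 + [False] * 15
for line in progress_updates(outcomes, total=49_170):
    print(line)
# Progress: 100/49,170 (0.2%) - Success: 85.0%
```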
Custom Processing
# Process only the first 1000 songs (useful for quick test runs)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000
# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json
Command Line Shortcuts
Common Aliases
Add these to your shell profile for convenience:
# Add to ~/.bashrc or ~/.zshrc
alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
Usage with Aliases
# Using alias
mbclean --source data/songs.json
# Show help
mbclean-help
# Test connection
mbclean-test
Integration Examples
With Git
# Process files and commit changes
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs"
With Cron Jobs
# Add to crontab to process files daily (-T disables TTY allocation, which cron does not provide)
0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm -T musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
With Shell Scripts
#!/bin/bash
# clean_songs.sh
INPUT_FILE="$1"
OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
  --source "$INPUT_FILE" \
  --output-success "$OUTPUT_SUCCESS" \
  --output-failure "$OUTPUT_FAILURE"
if [ $? -eq 0 ]; then
  echo "Successfully processed $INPUT_FILE"
  echo "Successful songs: $OUTPUT_SUCCESS"
  echo "Failed songs: $OUTPUT_FAILURE"
else
  echo "Error processing $INPUT_FILE"
  exit 1
fi
Command Reference Summary
| Command | Description |
|---|---|
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |