# MusicBrainz Data Cleaner - CLI Commands Reference

## Overview

The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool uses an interface-based architecture with dependency injection for clean, maintainable code. It creates separate output files for successful and failed songs, along with detailed processing reports.

## Basic Command Structure

```bash
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
```

## Command Options

### Main Options

| Option | Type | Description | Default | Example |
|--------|------|-------------|---------|---------|
| `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
| `--output-success` | string | Output file for successful songs | `source-success.json` | `--output-success cleaned.json` |
| `--output-failure` | string | Output file for failed songs | `source-failure.json` | `--output-failure failed.json` |
| `--limit` | number | Process only first N songs | None (all songs) | `--limit 1000` |
| `--use-api` | flag | Force use of HTTP API instead of database | Database mode | `--use-api` |
| `--test-connection` | flag | Test connection to MusicBrainz server | None | `--test-connection` |
| `--help` | flag | Show help information | None | `--help` |
| `--version` | flag | Show version information | None | `--version` |

## Command Examples

### Basic Usage (Default)

```bash
# Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json
```

### Custom Source File

```bash
# Process specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
```

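
The two examples above suggest that the default output names are built by appending `-success` and `-failure` to the source file's base name, in the same directory. The following is a minimal sketch of that rule, inferred only from those examples; the function name `default_output_paths` is hypothetical and is not taken from the tool's source.

```python
from pathlib import Path


def default_output_paths(source: str) -> tuple[str, str]:
    """Derive success/failure paths from the source path (assumed naming rule)."""
    src = Path(source)
    # Keep the directory and extension, and add a suffix to the base name.
    success = src.with_name(f"{src.stem}-success{src.suffix}")
    failure = src.with_name(f"{src.stem}-failure{src.suffix}")
    return success.as_posix(), failure.as_posix()


print(default_output_paths("data/my_songs.json"))
```

Passing `--output-success` or `--output-failure` explicitly overrides these derived names.
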
### Custom Output Files

```bash
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
```

### Limited Processing

```bash
# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
```

### Force API Mode

```bash
# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
```

### Test Connection

```bash
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```

### Help and Information

```bash
# Show help information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help

# Show version information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
```

## Input File Format

The input file must be a valid JSON file containing an array of song objects:

```json
[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  }
]
```

### Required Fields

- `artist`: The artist name (string)
- `title`: The song title (string)

### Optional Fields

Any additional fields will be preserved in the output:

- `disabled`: Boolean flag
- `favorite`: Boolean flag
- `guid`: Unique identifier
- `path`: File path
- Any other custom fields

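
Before a long run, it can help to check that every entry carries the two required string fields. The sketch below is an independent pre-flight check, not the tool's own validation; the function name `find_invalid_songs` and the wording of its messages are assumptions made for illustration.

```python
import json

REQUIRED_FIELDS = ("artist", "title")


def find_invalid_songs(songs):
    """Return (index, reason) pairs for entries lacking a required string field."""
    if not isinstance(songs, list):
        raise ValueError("expected a JSON array of song objects")
    problems = []
    for index, song in enumerate(songs):
        if not isinstance(song, dict):
            problems.append((index, "not an object"))
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(song.get(field), str):
                problems.append((index, f"missing or non-string '{field}'"))
    return problems


# The second entry has no title, so it is reported.
sample = json.loads('[{"artist": "ACDC", "title": "Shot In The Dark"}, {"artist": "ACDC"}]')
print(find_invalid_songs(sample))
```

Optional fields are deliberately ignored here, since the tool passes them through unchanged.
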
## Output Files

The tool creates **three output files**:

### 1. Successful Songs (`source-success.json`)

Array of successfully processed songs with MBIDs added:

```json
[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  }
]
```

### 2. Failed Songs (`source-failure.json`)

Array of songs that couldn't be processed (same format as source):

```json
[
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
]
```

### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)

Human-readable text report with statistics and failed song list:

```
MusicBrainz Data Cleaner - Processing Report
==================================================

Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds

SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%

DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second

OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt

FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
```

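
The summary figures in the sample report follow from three inputs: total songs, successful songs, and elapsed seconds. This short sketch reproduces that arithmetic; the function name `summarize` and its dictionary keys are hypothetical, and the rounding to one decimal place is assumed from the report's formatting.

```python
def summarize(total: int, successful: int, seconds: float) -> dict:
    """Recompute the summary figures shown in the sample report."""
    return {
        "failed": total - successful,
        "success_rate": round(100 * successful / total, 1),
        "songs_per_second": round(total / seconds, 1),
    }


summary = summarize(total=49170, successful=40692, seconds=15263.3)
print(summary)
```

With the sample numbers this gives 8,478 failed songs, an 82.8% success rate, and 3.2 songs per second, matching the report.
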
### Added Fields (Successful Songs Only)

- `mbid`: MusicBrainz Artist ID (string)
- `recording_mbid`: MusicBrainz Recording ID (string)

## Processing Output

### Progress Indicators

```
🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================

[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)

📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec

==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
```

### Status Indicators

| Symbol | Meaning | Description |
|--------|---------|-------------|
| ✅ | Success | Song processed successfully with MBIDs found |
| ❌ | Failure | Song processing failed (no MBIDs found) |
| 📈 | Progress | Progress update with statistics |
| 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |

## Error Messages and Exit Codes

### Exit Codes

| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |

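
A wrapper script can branch on these exit codes. The sketch below runs an arbitrary command and maps its return code to the meanings in the table above; `run_and_describe` is a hypothetical helper, and a small Python child process stands in for the real `docker-compose` invocation so the example runs anywhere.

```python
import subprocess
import sys

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {0: "Success", 1: "Error", 2: "Usage Error"}


def run_and_describe(command):
    """Run a command and return (exit code, meaning from the table)."""
    completed = subprocess.run(command)
    meaning = EXIT_MEANINGS.get(completed.returncode, "Unknown")
    return completed.returncode, meaning


# Stand-in command that exits with code 2, as a usage error would.
stand_in = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_and_describe(stand_in))
```

To use it with the tool, replace `stand_in` with the `docker-compose run ...` argument list.
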
### Common Error Messages

#### File Not Found
```
Error: Source file does not exist: data/songs.json
```

#### Invalid JSON
```
Error: Invalid JSON in file 'songs.json'
```

#### Invalid Input Format
```
Error: Source file should contain a JSON array of songs
```

#### Connection Error
```
❌ Connection to MusicBrainz database failed
```

#### Missing Dependencies
```
ModuleNotFoundError: No module named 'requests'
```

## Architecture Overview

### Interface-Based Design

The tool uses a clean interface-based architecture:

- **`MusicBrainzDataProvider` Interface**: Common protocol for data access
- **`DataProviderFactory`**: Creates appropriate provider (database or API)
- **`SongProcessor`**: Centralized processing logic using the interface
- **Dependency Injection**: CLI depends on interfaces, not concrete classes

### Data Flow

1. **CLI** uses `DataProviderFactory` to create data provider
2. **Factory** returns either database or API implementation
3. **SongProcessor** processes songs using the common interface
4. **Same logic** works regardless of provider type

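
The design above can be illustrated with a small, self-contained sketch. The class names `MusicBrainzDataProvider` and `SongProcessor` come from this document, but the method names, the `FakeProvider` stand-in, and the rule that a song fails when either lookup misses are all assumptions for illustration, not the tool's actual code.

```python
from typing import Optional, Protocol


class MusicBrainzDataProvider(Protocol):
    """Common protocol for data access (method names are hypothetical)."""

    def find_artist_mbid(self, artist: str) -> Optional[str]: ...

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]: ...


class FakeProvider:
    """In-memory stand-in for a database or API provider."""

    def __init__(self, artists: dict, recordings: dict) -> None:
        self.artists = artists
        self.recordings = recordings

    def find_artist_mbid(self, artist: str) -> Optional[str]:
        return self.artists.get(artist.lower())

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]:
        return self.recordings.get((artist_mbid, title.lower()))


class SongProcessor:
    """Depends only on the protocol, so any provider can be injected."""

    def __init__(self, provider: MusicBrainzDataProvider) -> None:
        self.provider = provider

    def process(self, song: dict) -> Optional[dict]:
        artist_mbid = self.provider.find_artist_mbid(song["artist"])
        if artist_mbid is None:
            return None  # assumed: no artist match means the song fails
        recording_mbid = self.provider.find_recording_mbid(artist_mbid, song["title"])
        if recording_mbid is None:
            return None  # assumed: no recording match means the song fails
        # Preserve every input field and add the two MBID fields.
        return {**song, "mbid": artist_mbid, "recording_mbid": recording_mbid}


provider = FakeProvider(
    artists={"acdc": "artist-1"},
    recordings={("artist-1", "shot in the dark"): "recording-1"},
)
processor = SongProcessor(provider)
print(processor.process({"artist": "ACDC", "title": "Shot In The Dark"}))
```

Because `SongProcessor` only sees the protocol, swapping the in-memory provider for a database-backed or HTTP-backed one requires no change to the processing logic.
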
## Environment Configuration

### Docker Environment

The tool runs in a Docker container with the following configuration:

| Setting | Default | Description |
|---------|---------|-------------|
| Database Host | `db` | PostgreSQL database container |
| Database Port | `5432` | PostgreSQL port |
| Database Name | `musicbrainz_db` | MusicBrainz database name |
| API URL | `http://localhost:5001` | MusicBrainz web server URL |

### Environment Variables

```bash
# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz

# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
```

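
As a rough illustration of how such variables are typically consumed, the sketch below reads the database settings with the values listed above as fallbacks. Whether the tool reads its configuration this way is an assumption; `load_db_settings` is a hypothetical helper.

```python
import os


def load_db_settings(env=None) -> dict:
    """Read database settings, falling back to the values documented above."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "db"),
        "port": int(env.get("DB_PORT", "5432")),  # ports arrive as strings
        "name": env.get("DB_NAME", "musicbrainz_db"),
        "user": env.get("DB_USER", "musicbrainz"),
        "password": env.get("DB_PASSWORD", "musicbrainz"),
    }


# Passing a dict instead of os.environ makes the override easy to see.
print(load_db_settings({"DB_HOST": "localhost"}))
```
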
## Troubleshooting Commands

### Check MusicBrainz Server Status

```bash
# Test if web server is running
curl -I http://localhost:5001

# Test database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
```

### Validate JSON File

```bash
# Check if JSON is valid
python -m json.tool data/songs.json

# Check that the top-level JSON value is an array and count its items
python -c "import json; data=json.load(open('data/songs.json')); assert isinstance(data, list), 'top-level value is not an array'; print('Valid JSON array with', len(data), 'items')"
```

### Test Tool Connection

```bash
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```

## Advanced Usage

### Batch Processing

To process multiple files, you can use shell scripting:

```bash
# Process all JSON files in data directory, skipping output files from earlier runs
for file in data/*.json; do
  case "$file" in
    *-success.json|*-failure.json) continue ;;
  esac
  docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done
```

### Large Files

For large files, the tool processes songs efficiently with:
- Direct database access for maximum speed
- Progress tracking every 100 songs
- Memory-efficient processing
- No rate limiting with database access

### Custom Processing

```bash
# Process only the first 1000 songs (useful for testing)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000

# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json
```

## Command Line Shortcuts

### Common Aliases

Add these to your shell profile for convenience:

```bash
# Add to ~/.bashrc or ~/.zshrc
alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
```

### Usage with Aliases

```bash
# Using alias
mbclean --source data/songs.json

# Show help
mbclean-help

# Test connection
mbclean-test
```

## Integration Examples

### With Git

```bash
# Process files and commit changes
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs"
```

### With Cron Jobs

```bash
# Add to crontab to process files daily at 02:00
# (-T disables pseudo-TTY allocation, since cron provides no terminal)
0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm -T musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
```

### With Shell Scripts

```bash
#!/bin/bash
# clean_songs.sh
INPUT_FILE="$1"
OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"

docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
  --source "$INPUT_FILE" \
  --output-success "$OUTPUT_SUCCESS" \
  --output-failure "$OUTPUT_FAILURE"

if [ $? -eq 0 ]; then
  echo "Successfully processed $INPUT_FILE"
  echo "Successful songs: $OUTPUT_SUCCESS"
  echo "Failed songs: $OUTPUT_FAILURE"
else
  echo "Error processing $INPUT_FILE"
  exit 1
fi
```

## Command Reference Summary

| Command | Description |
|---------|-------------|
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |