# MusicBrainz Data Cleaner - CLI Commands Reference
## Overview
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and normalizes their metadata against the MusicBrainz database. It is built on an interface-based architecture with dependency injection, so the same processing logic runs against either the database or the HTTP API. It writes separate output files for successful and failed songs, along with a detailed processing report.
## Basic Command Structure
```bash
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main [options]
```
## Command Options
### Main Options
| Option | Type | Description | Default | Example |
|--------|------|-------------|---------|---------|
| `--source` | string | Source JSON file path | `data/songs.json` | `--source data/my_songs.json` |
| `--output-success` | string | Output file for successful songs | `<source>-success.json` (derived from the source file name) | `--output-success cleaned.json` |
| `--output-failure` | string | Output file for failed songs | `<source>-failure.json` (derived from the source file name) | `--output-failure failed.json` |
| `--limit` | number | Process only first N songs | None (all songs) | `--limit 1000` |
| `--use-api` | flag | Force use of HTTP API instead of database | Database mode | `--use-api` |
| `--test-connection` | flag | Test connection to MusicBrainz server | None | `--test-connection` |
| `--help` | flag | Show help information | None | `--help` |
| `--version` | flag | Show version information | None | `--version` |
## Command Examples
### Basic Usage (Default)
```bash
# Process all songs with default settings
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json
```
### Custom Source File
```bash
# Process specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
```
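
The default output names in the examples above appear to follow a simple rule: take the source path, drop the `.json` extension, and append `-success.json` or `-failure.json`. The sketch below reproduces that rule in Python. The function name `default_output_paths` is hypothetical and the rule is inferred from the examples, not taken from the tool's source.

```python
from __future__ import annotations

from pathlib import Path


def default_output_paths(source: str) -> tuple[str, str]:
    """Derive the success/failure paths from a source path (assumed naming rule)."""
    src = Path(source)
    success = src.with_name(f"{src.stem}-success.json")
    failure = src.with_name(f"{src.stem}-failure.json")
    # as_posix() keeps forward slashes so the result matches the examples above.
    return success.as_posix(), failure.as_posix()


print(default_output_paths("data/my_songs.json"))
```

This can be handy in wrapper scripts that need to know where the results will land before the run finishes.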
### Custom Output Files
```bash
# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json
```
### Limited Processing
```bash
# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000
```
### Force API Mode
```bash
# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api
```
### Test Connection
```bash
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```
### Help and Information
```bash
# Show help information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help
# Show version information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version
```
## Input File Format
The input file must be a valid JSON file containing an array of song objects:
```json
[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  }
]
```
### Required Fields
- `artist`: The artist name (string)
- `title`: The song title (string)
### Optional Fields
Any additional fields will be preserved in the output:
- `disabled`: Boolean flag
- `favorite`: Boolean flag
- `guid`: Unique identifier
- `path`: File path
- Any other custom fields
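
Before a long run it can help to check a file against the rules above: a top-level array, with string `artist` and `title` fields on every item. The following is a minimal sketch of such a check. The function name `find_invalid_songs` and its messages are illustrative and are not part of the tool.

```python
from __future__ import annotations

import json


def find_invalid_songs(songs: object) -> list[str]:
    """Return human-readable problems; an empty list means the data looks valid."""
    if not isinstance(songs, list):
        return ["top-level JSON value must be an array of song objects"]
    problems = []
    for index, song in enumerate(songs):
        if not isinstance(song, dict):
            problems.append(f"item {index}: expected an object")
            continue
        # Only the two required fields are checked; extra fields are allowed.
        for field in ("artist", "title"):
            if not isinstance(song.get(field), str):
                problems.append(f"item {index}: missing or non-string '{field}'")
    return problems


sample = json.loads(
    '[{"artist": "ACDC", "title": "Shot In The Dark", "favorite": true},'
    ' {"artist": "ACDC"}]'
)
for problem in find_invalid_songs(sample):
    print(problem)
```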
## Output Files
The tool creates **three output files**:
### 1. Successful Songs (`source-success.json`)
Array of successfully processed songs with MBIDs added:
```json
[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  }
]
```
### 2. Failed Songs (`source-failure.json`)
Array of songs that couldn't be processed (same format as source):
```json
[
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
]
```
### 3. Processing Report (`processing_report_YYYYMMDD_HHMMSS.txt`)
Human-readable text report with statistics and failed song list:
```
MusicBrainz Data Cleaner - Processing Report
==================================================
Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds
SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%
DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second
OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt
FAILED SONGS (First 50)
--------------------
1. Unknown Artist - Unknown Song
2. Invalid Artist - Invalid Title
3. Test Artist - Test Song
...
```
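
The summary figures in the sample report can be reproduced from three inputs: total songs, successful songs, and elapsed seconds. The sketch below shows the arithmetic. The function name `summarize` and the one-decimal rounding are assumptions chosen to match the sample, not the tool's actual reporting code.

```python
from __future__ import annotations


def summarize(total: int, successful: int, seconds: float) -> dict[str, float]:
    """Recompute the report's summary figures, rounded to one decimal place."""
    if total <= 0 or seconds <= 0:
        raise ValueError("total and seconds must be positive")
    failed = total - successful
    return {
        "failed": failed,
        "success_rate": round(100 * successful / total, 1),
        "failure_rate": round(100 * failed / total, 1),
        "songs_per_second": round(total / seconds, 1),
    }


# Figures taken from the sample report above.
print(summarize(total=49170, successful=40692, seconds=15263.3))
```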
### Added Fields (Successful Songs Only)
- `mbid`: MusicBrainz Artist ID (string)
- `recording_mbid`: MusicBrainz Recording ID (string)
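
MusicBrainz IDs are UUIDs, so a quick sanity check on a success file is to confirm that both added fields parse as UUIDs. This is a minimal sketch. The helper name `has_valid_mbids` is hypothetical, and `uuid.UUID` is somewhat lenient (it also accepts forms without hyphens), so treat it as a loose check.

```python
import uuid


def has_valid_mbids(song: dict) -> bool:
    """Return True if both added fields are present and parse as UUIDs."""
    for field in ("mbid", "recording_mbid"):
        value = song.get(field)
        if not isinstance(value, str):
            return False
        try:
            uuid.UUID(value)
        except ValueError:
            return False
    return True


song = {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db",
}
print(has_valid_mbids(song))
```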
## Processing Output
### Progress Indicators
```
🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================
[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
==================================================
🎉 Processing completed!
📊 Final Results:
⏱️ Total processing time: 15263.3 seconds
🚀 Average speed: 3.2 songs/second
✅ Artists found: 44,526/49,170 (90.6%)
✅ Recordings found: 40,998/49,170 (83.4%)
❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
✅ Successful songs: data/songs-success.json
❌ Failed songs: data/songs-failure.json
📋 Text report: data/processing_report_20241219_143022.txt
📊 JSON report: data/processing_report_20241219_143022.json
```
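
If the console output is captured to a log, the per-song lines can be parsed back into structured data. The sketch below assumes the line layout shown in the sample output (`[N of TOTAL] <symbol> PASS|FAIL: <artist> - <title>`). The actual format may differ between versions, and `parse_result_line` is a hypothetical helper.

```python
from __future__ import annotations

import re
from typing import Optional

# Matches e.g. "[2 of 49,170] <symbol> FAIL: Unknown Artist - Unknown Song".
RESULT_LINE = re.compile(r"^\[([\d,]+) of ([\d,]+)\] \S+ (PASS|FAIL): (.+)$")


def parse_result_line(line: str) -> Optional[dict]:
    """Parse one per-song line; return None for any other kind of line."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    index, total, status, label = match.groups()
    return {
        "index": int(index.replace(",", "")),
        "total": int(total.replace(",", "")),
        "passed": status == "PASS",
        # The label is kept whole: splitting on " - " is ambiguous for some names.
        "song": label,
    }


print(parse_result_line("[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark"))
```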
### Status Indicators
| Symbol | Meaning | Description |
|--------|---------|-------------|
| ✅ | Success | Song processed successfully with MBIDs found |
| ❌ | Failure | Song processing failed (no MBIDs found) |
| 📈 | Progress | Progress update with statistics |
| 🚀 | Start | Processing started |
| 🎉 | Complete | Processing completed successfully |
## Error Messages and Exit Codes
### Exit Codes
| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | Processing completed successfully |
| 1 | Error | General error occurred |
| 2 | Usage Error | Invalid command line arguments |
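
A wrapper script can branch on these exit codes. The sketch below maps the codes from the table to their meanings and shows one way to invoke the CLI through `subprocess`. The names `describe_exit_code` and `run_cleaner` are hypothetical, and the sketch assumes `docker-compose run` passes the container command's exit code through to the caller.

```python
from __future__ import annotations

import subprocess

# Meanings copied from the exit code table above.
EXIT_CODE_MEANINGS = {0: "Success", 1: "Error", 2: "Usage Error"}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into the meaning listed in the table."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def run_cleaner(args: list[str]) -> str:
    """Run the CLI with extra arguments and describe how it exited."""
    command = [
        "docker-compose", "run", "--rm", "musicbrainz-cleaner",
        "python3", "-m", "src.cli.main", *args,
    ]
    # check=False: inspect the return code instead of raising on failure.
    completed = subprocess.run(command, check=False)
    return describe_exit_code(completed.returncode)


if __name__ == "__main__":
    print(describe_exit_code(2))
```

Calling `run_cleaner(["--limit", "1000"])` would then return one of the meanings above, provided Docker Compose is installed and the service is defined.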
### Common Error Messages
#### File Not Found
```
Error: Source file does not exist: data/songs.json
```
#### Invalid JSON
```
Error: Invalid JSON in file 'songs.json'
```
#### Invalid Input Format
```
Error: Source file should contain a JSON array of songs
```
#### Connection Error
```
❌ Connection to MusicBrainz database failed
```
#### Missing Dependencies
```
ModuleNotFoundError: No module named 'requests'
```
## Architecture Overview
### Interface-Based Design
The tool uses a clean interface-based architecture:
- **`MusicBrainzDataProvider` Interface**: Common protocol for data access
- **`DataProviderFactory`**: Creates appropriate provider (database or API)
- **`SongProcessor`**: Centralized processing logic using the interface
- **Dependency Injection**: CLI depends on interfaces, not concrete classes
### Data Flow
1. **CLI** uses `DataProviderFactory` to create data provider
2. **Factory** returns either database or API implementation
3. **SongProcessor** processes songs using the common interface
4. **Same logic** works regardless of provider type
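
The data flow above can be sketched in a few lines of Python. The class names `MusicBrainzDataProvider`, `DataProviderFactory`, and `SongProcessor` come from this document. Everything else (the method names, the `StubProvider` class, and the lookup data) is a hypothetical stand-in for illustration and does not reflect the tool's real implementation.

```python
from __future__ import annotations

from typing import Optional, Protocol


class MusicBrainzDataProvider(Protocol):
    """Common protocol for data access (method names are hypothetical)."""

    def find_artist_mbid(self, artist: str) -> Optional[str]: ...

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]: ...


class StubProvider:
    """Illustrative stand-in; a real provider would query PostgreSQL or the HTTP API."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._artists = {"acdc": "66c662b6-6e2f-4930-8610-912e24c63ed1"}
        self._recordings = {
            ("66c662b6-6e2f-4930-8610-912e24c63ed1", "shot in the dark"):
                "cf8b5cd0-d97c-413d-882f-fc422a2e57db",
        }

    def find_artist_mbid(self, artist: str) -> Optional[str]:
        return self._artists.get(artist.lower())

    def find_recording_mbid(self, artist_mbid: str, title: str) -> Optional[str]:
        return self._recordings.get((artist_mbid, title.lower()))


class DataProviderFactory:
    @staticmethod
    def create(use_api: bool = False) -> MusicBrainzDataProvider:
        # Both branches return a stub in this sketch; only the label differs.
        return StubProvider("api" if use_api else "database")


class SongProcessor:
    def __init__(self, provider: MusicBrainzDataProvider) -> None:
        # The processor depends on the protocol, not on a concrete provider.
        self._provider = provider

    def process(self, song: dict) -> Optional[dict]:
        """Return the song with MBIDs added, or None if a lookup fails."""
        artist_mbid = self._provider.find_artist_mbid(song["artist"])
        if artist_mbid is None:
            return None
        recording_mbid = self._provider.find_recording_mbid(artist_mbid, song["title"])
        if recording_mbid is None:
            return None
        # Name normalization (e.g. "ACDC" to "AC/DC") is omitted from this sketch.
        return {**song, "mbid": artist_mbid, "recording_mbid": recording_mbid}


processor = SongProcessor(DataProviderFactory.create(use_api=False))
print(processor.process({"artist": "ACDC", "title": "Shot In The Dark"}))
```

Because `SongProcessor` only sees the protocol, swapping the database provider for the API provider requires no change to the processing code.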
## Environment Configuration
### Docker Environment
The tool runs in a Docker container with the following configuration:
| Setting | Default | Description |
|---------|---------|-------------|
| Database Host | `db` | PostgreSQL database container |
| Database Port | `5432` | PostgreSQL port |
| Database Name | `musicbrainz_db` | MusicBrainz database name |
| API URL | `http://localhost:5001` | MusicBrainz web server URL |
### Environment Variables
```bash
# Database configuration
DB_HOST=db
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
# Web server configuration
MUSICBRAINZ_WEB_SERVER_PORT=5001
```
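
As a rough illustration of how these variables might be consumed, the sketch below reads them from the environment and falls back to the defaults listed above. The function name `database_settings` and the fallback behaviour are assumptions. The tool's own configuration code may differ.

```python
import os
from typing import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect database settings, falling back to the documented defaults."""
    return {
        "host": env.get("DB_HOST", "db"),
        # Environment values are strings, so the port is converted explicitly.
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "musicbrainz_db"),
        "user": env.get("DB_USER", "musicbrainz"),
        "password": env.get("DB_PASSWORD", "musicbrainz"),
    }


# Passing an explicit mapping makes the function easy to exercise without
# touching the real environment.
print(database_settings({"DB_HOST": "localhost"}))
```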
## Troubleshooting Commands
### Check MusicBrainz Server Status
```bash
# Test if web server is running
curl -I http://localhost:5001
# Test database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
```
### Validate JSON File
```bash
# Check if JSON is valid
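python3 -m json.tool data/songs.json
# Check JSON structure (fails if the top-level value is not an array)
python3 -c "import json; data=json.load(open('data/songs.json')); assert isinstance(data, list), 'expected a JSON array'; print('Valid JSON array with', len(data), 'items')"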
```
### Test Tool Connection
```bash
# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection
# Test API connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api
```
## Advanced Usage
### Batch Processing
To process multiple files, you can use shell scripting:
```bash
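# Process all JSON files in data directory, skipping outputs from earlier runs
for file in data/*.json; do
  case "$file" in
    *-success.json|*-failure.json|*processing_report_*) continue ;;
  esac
  docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source "$file"
done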
```
### Large Files
For large files, the tool processes songs efficiently with:
- Direct database access for maximum speed
- Progress tracking every 100 songs
- Memory-efficient processing
- No rate limiting with database access
### Custom Processing
```bash
# Process only the first 1000 songs (useful for testing)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --limit 1000
# Process with custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success my_cleaned.json --output-failure my_failed.json
```
## Command Line Shortcuts
### Common Aliases
Add these to your shell profile for convenience:
```bash
# Add to ~/.bashrc or ~/.zshrc
alias mbclean='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main'
alias mbclean-help='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help'
alias mbclean-test='docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection'
```
### Usage with Aliases
```bash
# Using alias
mbclean --source data/songs.json
# Show help
mbclean-help
# Test connection
mbclean-test
```
## Integration Examples
### With Git
```bash
# Process files and commit changes
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
git add data/songs-success.json data/songs-failure.json
git commit -m "Clean song metadata with MusicBrainz IDs"
```
### With Cron Jobs
```bash
# Add to crontab to process files daily at 02:00
# -T disables pseudo-TTY allocation, which cron does not provide
0 2 * * * cd /path/to/musicbrainz-cleaner && docker-compose run --rm -T musicbrainz-cleaner python3 -m src.cli.main --source /path/to/songs.json
```
### With Shell Scripts
```bash
#!/bin/bash
# clean_songs.sh
# Usage: ./clean_songs.sh path/to/songs.json
if [ -z "$1" ]; then
  echo "Usage: $0 <input.json>" >&2
  exit 2
fi
INPUT_FILE="$1"
OUTPUT_SUCCESS="${INPUT_FILE%.json}-success.json"
OUTPUT_FAILURE="${INPUT_FILE%.json}-failure.json"
# Test the command directly so the check cannot pick up another command's status
if docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main \
    --source "$INPUT_FILE" \
    --output-success "$OUTPUT_SUCCESS" \
    --output-failure "$OUTPUT_FAILURE"; then
  echo "Successfully processed $INPUT_FILE"
  echo "Successful songs: $OUTPUT_SUCCESS"
  echo "Failed songs: $OUTPUT_FAILURE"
else
  echo "Error processing $INPUT_FILE" >&2
  exit 1
fi
```
## Command Reference Summary
| Command | Description |
|---------|-------------|
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main` | Process all songs with defaults |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source file.json` | Process specific file |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000` | Process first 1000 songs |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |