Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>
parent 504820c8a1
commit 4bf359ee5d
34 PRD.md
@@ -374,6 +374,33 @@ python musicbrainz_cleaner.py --test-connection
 - **NEW**: Review and update band name protection list in `data/known_artists.json`
 - **NEW**: Monitor collaboration detection accuracy
 
+### Operational Procedures
+
+#### After System Reboot
+1. **Start Docker Desktop** (if auto-start not enabled)
+2. **Restart MusicBrainz services**:
+   ```bash
+   cd musicbrainz-cleaner
+   ./restart_services.sh
+   ```
+3. **Wait for database initialization** (5-10 minutes)
+4. **Test connection**:
+   ```bash
+   docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py
+   ```
+
+#### Service Management
+- **Start services**: `./start_services.sh` (full setup) or `./restart_services.sh` (quick restart)
+- **Stop services**: `cd ../musicbrainz-docker && docker-compose down`
+- **Check status**: `cd ../musicbrainz-docker && docker-compose ps`
+- **View logs**: `cd ../musicbrainz-docker && docker-compose logs -f`
+
+#### Troubleshooting
+- **Port conflicts**: Use `MUSICBRAINZ_WEB_SERVER_PORT=5001` environment variable
+- **Container conflicts**: Run `docker-compose down` then restart
+- **Database issues**: Check logs with `docker-compose logs -f db`
+- **Memory issues**: Increase Docker Desktop memory allocation (8GB+ recommended)
+
 ### Support
 - GitHub issues for bug reports
 - Documentation updates
@@ -406,3 +433,10 @@ python musicbrainz_cleaner.py --test-connection
 - **Database-first approach** ensures live data
 - **Fuzzy search thresholds** need tuning for different datasets
 - **Connection pooling** would improve performance for large datasets
+
+### Operational Insights
+- **Docker Service Management**: MusicBrainz services require proper startup sequence and initialization time
+- **Port Conflicts**: Common on macOS, requiring automatic detection and resolution
+- **System Reboots**: Services need to be restarted after system reboots, but data persists in Docker volumes
+- **Resource Requirements**: MusicBrainz services require significant memory (8GB+ recommended) and disk space
+- **Platform Compatibility**: Apple Silicon (M1/M2) works but may show platform mismatch warnings
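The reboot procedure in PRD.md asks the operator to wait a fixed 5-10 minutes for database initialization. A minimal sketch of a polling alternative follows, assuming the PostgreSQL port from the `.env` example (5432) is reachable from wherever the check runs. The helper name `wait_for_port` and its parameters are hypothetical and not part of this commit, and an open port only shows that the server accepts TCP connections, not that the MusicBrainz data is fully loaded.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, False if the timeout expires first.

    Hypothetical helper: the defaults mirror the "5-10 minutes" guidance in PRD.md.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("172.18.0.2", 5432)` could run before `quick_test_20.py`; the host value is taken from the `.env` example in this commit and may differ on other machines.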
103 README.md
@@ -39,50 +39,81 @@ A powerful command-line tool that cleans and normalizes your song data using the
 
 ## 🚀 Quick Start
 
-### 1. Install Dependencies
+### Option 1: Automated Setup (Recommended)
+
+1. **Start MusicBrainz services**:
+   ```bash
+   ./start_services.sh
+   ```
+   This script will:
+   - Check for Docker and port conflicts
+   - Start all MusicBrainz services
+   - Wait for database initialization
+   - Create environment configuration
+   - Test the connection
+
+2. **Run the cleaner**:
+   ```bash
+   docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input data/songs.json --output cleaned_songs.json
+   ```
+
+### Option 2: Manual Setup
+
+1. **Start MusicBrainz services manually**:
+   ```bash
+   cd ../musicbrainz-docker
+   MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
+   ```
+   Wait 5-10 minutes for database initialization.
+
+2. **Create environment configuration**:
+   ```bash
+   # Create .env file in musicbrainz-cleaner directory
+   cat > .env << EOF
+   DB_HOST=172.18.0.2
+   DB_PORT=5432
+   DB_NAME=musicbrainz_db
+   DB_USER=musicbrainz
+   DB_PASSWORD=musicbrainz
+   MUSICBRAINZ_WEB_SERVER_PORT=5001
+   EOF
+   ```
+
+3. **Run the cleaner**:
+   ```bash
+   docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input data/songs.json --output cleaned_songs.json
+   ```
+
+### For detailed setup instructions, see [SETUP.md](SETUP.md)
+
+## 🔄 After System Reboot
+
+After restarting your Mac, you'll need to restart the MusicBrainz services:
+
+### Quick Restart (Recommended)
 ```bash
-pip install requests psycopg2-binary fuzzywuzzy python-Levenshtein
+# If Docker Desktop is already running
+./restart_services.sh
+
+# Or manually
+cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
 ```
 
-### 2. Set Up MusicBrainz Server
-
-#### Option A: Docker (Recommended)
+### Full Restart (If you have issues)
 ```bash
-# Clone MusicBrainz Docker repository
-git clone https://github.com/metabrainz/musicbrainz-docker.git
-cd musicbrainz-docker
-
-# Update postgres.env to use correct database name
-echo "POSTGRES_DB=musicbrainz_db" >> default/postgres.env
-
-# Start the server
-docker-compose up -d
-
-# Wait for database to be ready (can take 10-15 minutes)
-docker-compose logs -f musicbrainz
+# Complete setup including Docker checks
+./start_services.sh
 ```
 
-#### Option B: Manual Setup
-1. Install PostgreSQL 12+
-2. Create database: `createdb musicbrainz_db`
-3. Import MusicBrainz data dump
-4. Start MusicBrainz server on port 8080
+### Auto-start Setup (Optional)
+1. **Enable Docker Desktop auto-start**:
+   - Open Docker Desktop
+   - Go to Settings → General
+   - Check "Start Docker Desktop when you log in"
 
-### 3. Test Connection
-```bash
-python musicbrainz_cleaner.py --test-connection
-```
+2. **Then just run**: `./restart_services.sh` after each reboot
 
-### 4. Run the Cleaner
-```bash
-# Use database access (recommended, faster)
-python musicbrainz_cleaner.py your_songs.json
-
-# Force API mode (slower, fallback)
-python musicbrainz_cleaner.py your_songs.json --use-api
-```
-
-That's it! Your cleaned data will be saved to `your_songs_cleaned.json`
+**Note**: Your data is preserved in Docker volumes, so you don't need to reconfigure anything after a reboot.
 
 ## 📋 Requirements
 
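The manual setup in README.md writes a `.env` file with `DB_*` keys, but this diff does not show how the cleaner reads it. The following is a minimal sketch under that caveat: `parse_env` and `connection_settings` are hypothetical helpers, and the keyword names in the returned dictionary follow libpq conventions as an assumption about how the values would be consumed. Note also that `DB_HOST=172.18.0.2` is a container address on a Docker bridge network, which Docker assigns at start, so it may differ between machines or restarts.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def connection_settings(env: Dict[str, str]) -> Dict[str, object]:
    """Map the DB_* keys from the README example onto connection keyword arguments."""
    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }


# Values copied from the README example in this commit.
SAMPLE_ENV = """\
DB_HOST=172.18.0.2
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
MUSICBRAINZ_WEB_SERVER_PORT=5001
"""
```

Converting `DB_PORT` to an integer at load time surfaces a malformed `.env` early rather than at the first query.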
266 SETUP.md (Normal file)
@@ -0,0 +1,266 @@
+# MusicBrainz Cleaner Setup Guide
+
+This guide will help you set up the MusicBrainz database and Docker services needed to run the cleaner.
+
+## Prerequisites
+
+- Docker Desktop installed and running
+- At least 8GB of available RAM
+- At least 10GB of free disk space
+- Git (to clone the repositories)
+
+## Step 1: Clone the MusicBrainz Server Repository
+
+```bash
+# Clone the main MusicBrainz server repository (if not already done)
+git clone https://github.com/metabrainz/musicbrainz-server.git
+cd musicbrainz-server
+```
+
+## Step 2: Start the MusicBrainz Docker Services
+
+The MusicBrainz server uses Docker Compose to run multiple services including PostgreSQL, Solr search, Redis, and the web server.
+
+```bash
+# Navigate to the musicbrainz-docker directory
+cd musicbrainz-docker
+
+# Check if port 5000 is available (common conflict on macOS)
+lsof -i :5000
+
+# If port 5000 is in use, use port 5001 instead
+MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
+
+# Or if port 5000 is free, use the default
+docker-compose up -d
+```
+
+### Troubleshooting Port Conflicts
+
+If you get a port conflict error:
+
+```bash
+# Kill any process using port 5000
+lsof -ti:5000 | xargs kill -9
+
+# Or use a different port
+MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
+```
+
+### Troubleshooting Container Conflicts
+
+If you get container name conflicts:
+
+```bash
+# Remove existing containers
+docker-compose down
+
+# Force remove conflicting containers
+docker rm -f musicbrainz-docker-db-1
+
+# Start fresh
+docker-compose up -d
+```
+
+## Step 3: Wait for Services to Start
+
+The services take time to initialize, especially the database:
+
+```bash
+# Check service status
+docker-compose ps
+
+# Wait for all services to be healthy (this can take 5-10 minutes)
+docker-compose logs -f db
+```
+
+**Important**: Wait until you see database initialization complete messages before proceeding.
+
+## Step 4: Verify Services Are Running
+
+```bash
+# Check all containers are running
+docker-compose ps
+
+# Test the web interface (if using port 5001)
+curl http://localhost:5001
+
+# Or if using default port 5000
+curl http://localhost:5000
+```
+
+## Step 5: Set Environment Variables
+
+Create a `.env` file in the `musicbrainz-cleaner` directory:
+
+```bash
+cd ../musicbrainz-cleaner
+
+# Create .env file
+cat > .env << EOF
+# Database connection (default Docker setup)
+DB_HOST=172.18.0.2
+DB_PORT=5432
+DB_NAME=musicbrainz_db
+DB_USER=musicbrainz
+DB_PASSWORD=musicbrainz
+
+# MusicBrainz web server
+MUSICBRAINZ_WEB_SERVER_PORT=5001
+EOF
+```
+
+**Note**: If you used the default port 5000, change `MUSICBRAINZ_WEB_SERVER_PORT=5001` to `MUSICBRAINZ_WEB_SERVER_PORT=5000`.
+
+## Step 6: Test the Connection
+
+```bash
+# Run a simple test to verify everything is working
+docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py
+```
+
+## Service Details
+
+The Docker Compose setup includes:
+
+- **PostgreSQL Database** (`db`): Main MusicBrainz database
+- **Solr Search** (`search`): Full-text search engine
+- **Redis** (`redis`): Caching and session storage
+- **Message Queue** (`mq`): Background job processing
+- **MusicBrainz Web Server** (`musicbrainz`): Main web application
+- **Indexer** (`indexer`): Search index maintenance
+
+## Ports Used
+
+- **5000/5001**: MusicBrainz web server (configurable)
+- **5432**: PostgreSQL database (internal)
+- **8983**: Solr search (internal)
+- **6379**: Redis (internal)
+- **5672**: Message queue (internal)
+
+## Stopping Services
+
+```bash
+# Stop all services
+cd musicbrainz-docker
+docker-compose down
+
+# To also remove volumes (WARNING: this deletes all data)
+docker-compose down -v
+```
+
+## Restarting Services
+
+```bash
+# Restart all services
+docker-compose restart
+
+# Or restart specific service
+docker-compose restart db
+```
+
+## Monitoring Services
+
+```bash
+# View logs for all services
+docker-compose logs -f
+
+# View logs for specific service
+docker-compose logs -f db
+docker-compose logs -f musicbrainz
+
+# Check resource usage
+docker stats
+```
+
+## Troubleshooting
+
+### Database Connection Issues
+
+```bash
+# Check if database is running
+docker-compose ps db
+
+# Check database logs
+docker-compose logs db
+
+# Test database connection
+docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT 1;"
+```
+
+### Memory Issues
+
+If you encounter memory issues:
+
+```bash
+# Increase Docker memory limit in Docker Desktop settings
+# Recommended: 8GB minimum, 16GB preferred
+
+# Check current memory usage
+docker stats
+```
+
+### Platform Issues (Apple Silicon)
+
+If you're on Apple Silicon (M1/M2) and see platform warnings:
+
+```bash
+# The services will still work, but you may see warnings about platform mismatch
+# This is normal and doesn't affect functionality
+```
+
+## Performance Tips
+
+1. **Allocate sufficient memory** to Docker Desktop (8GB+ recommended)
+2. **Use SSD storage** for better database performance
+3. **Close other resource-intensive applications** while running the services
+4. **Wait for full initialization** before running tests
+
+## Next Steps
+
+Once the services are running successfully:
+
+1. Run the quick test: `python3 quick_test_20.py`
+2. Run larger tests: `python3 bulk_test_1000.py`
+3. Use the cleaner on your own data: `python3 -m src.cli.main --input your_file.json --output cleaned.json`
+
+## 🔄 After System Reboot
+
+After restarting your Mac, you'll need to restart the MusicBrainz services:
+
+### Quick Restart (Recommended)
+```bash
+# Navigate to musicbrainz-cleaner directory
+cd /Users/mattbruce/Documents/Projects/musicbrainz-server/musicbrainz-cleaner
+
+# If Docker Desktop is already running
+./restart_services.sh
+
+# Or manually
+cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
+```
+
+### Full Restart (If you have issues)
+```bash
+# Complete setup including Docker checks
+./start_services.sh
+```
+
+### Auto-start Setup (Optional)
+1. **Enable Docker Desktop auto-start**:
+   - Open Docker Desktop
+   - Go to Settings → General
+   - Check "Start Docker Desktop when you log in"
+
+2. **Then just run**: `./restart_services.sh` after each reboot
+
+**Note**: Your data is preserved in Docker volumes, so you don't need to reconfigure anything after a reboot.
+
+## Support
+
+If you encounter issues:
+
+1. Check the logs: `docker-compose logs -f`
+2. Verify Docker has sufficient resources
+3. Ensure all prerequisites are met
+4. Try restarting the services: `docker-compose restart`
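Step 2 of SETUP.md checks port 5000 with `lsof` and falls back to 5001 by hand. The same decision can be sketched in Python by attempting to bind each candidate port. This is an illustration only: `port_is_free` and `choose_port` are hypothetical names, the candidate list (5000, 5001) comes from the guide, and a successful bind is a best-effort signal because another process could take the port before Docker does.

```python
import socket
from typing import Iterable, Optional


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


def choose_port(candidates: Iterable[int] = (5000, 5001), host: str = "0.0.0.0") -> Optional[int]:
    """Return the first candidate port that is currently free, or None if all are taken."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

The result of `choose_port()` would be the value to export as `MUSICBRAINZ_WEB_SERVER_PORT` before running `docker-compose up -d`.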
@@ -222,6 +222,7 @@
 "The Proclaimers",
 "The Stanley Brothers",
 "The Statler Brothers",
+"The Tamperer featuring Maya",
 "The Walker Brothers",
 "The Wilburn Brothers",
 "Thompson Twins",
108 quick_test_20.py (Normal file)
@@ -0,0 +1,108 @@
+#!/usr/bin/env python3
+"""
+Quick test script for 20 random songs
+Simple single-threaded approach
+"""
+
+import sys
+import json
+import time
+from pathlib import Path
+
+# Add the src directory to the path
+sys.path.insert(0, '/app')
+from src.cli.main import MusicBrainzCleaner
+
+def main():
+    print('🚀 Starting quick test with 20 random songs...')
+
+    # Load songs
+    input_file = Path('data/songs.json')
+    if not input_file.exists():
+        print('❌ songs.json not found')
+        return
+
+    with open(input_file, 'r') as f:
+        all_songs = json.load(f)
+
+    print(f'📊 Total songs available: {len(all_songs):,}')
+
+    # Take 20 random songs
+    import random
+    sample_songs = random.sample(all_songs, 20)
+    print(f'🎯 Testing 20 random songs...')
+
+    # Initialize cleaner
+    cleaner = MusicBrainzCleaner()
+
+    # Process songs
+    found_artists = 0
+    found_recordings = 0
+    failed_songs = []
+
+    start_time = time.time()
+
+    for i, song in enumerate(sample_songs, 1):
+        print(f' [{i:2d}/20] Processing: "{song.get("artist", "Unknown")}" - "{song.get("title", "Unknown")}"')
+
+        try:
+            # clean_song returns (cleaned_song, success_status) as of this commit
+            result, _ = cleaner.clean_song(song)
+
+            artist_found = 'mbid' in result
+            recording_found = 'recording_mbid' in result
+
+            if artist_found and recording_found:
+                found_artists += 1
+                found_recordings += 1
+                print(f' ✅ Found both artist and recording')
+            else:
+                failed_songs.append({
+                    'original': song,
+                    'cleaned': result,
+                    'artist_found': artist_found,
+                    'recording_found': recording_found,
+                    'artist_name': song.get('artist', 'Unknown'),
+                    'title': song.get('title', 'Unknown')
+                })
+                print(f' ❌ Artist: {artist_found}, Recording: {recording_found}')
+
+        except Exception as e:
+            print(f' 💥 Error: {e}')
+            failed_songs.append({
+                'original': song,
+                'cleaned': {'error': str(e)},
+                'artist_found': False,
+                'recording_found': False,
+                'artist_name': song.get('artist', 'Unknown'),
+                'title': song.get('title', 'Unknown'),
+                'error': str(e)
+            })
+
+    end_time = time.time()
+    processing_time = end_time - start_time
+
+    # Calculate success rates
+    artist_success_rate = found_artists / 20 * 100
+    recording_success_rate = found_recordings / 20 * 100
+    failed_rate = len(failed_songs) / 20 * 100
+
+    print(f'\n📊 Final Results:')
+    print(f' ⏱️ Processing time: {processing_time:.2f} seconds')
+    print(f' 🚀 Speed: {20/processing_time:.1f} songs/second')
+    print(f' ✅ Artists found: {found_artists}/20 ({artist_success_rate:.1f}%)')
+    print(f' ✅ Recordings found: {found_recordings}/20 ({recording_success_rate:.1f}%)')
+    print(f' ❌ Failed songs: {len(failed_songs)} ({failed_rate:.1f}%)')
+
+    # Show failed songs
+    if failed_songs:
+        print(f'\n🔍 Failed songs:')
+        for i, failed in enumerate(failed_songs, 1):
+            print(f' [{i}] "{failed["artist_name"]}" - "{failed["title"]}"')
+            print(f' Artist found: {failed["artist_found"]}, Recording found: {failed["recording_found"]}')
+            if 'error' in failed:
+                print(f' Error: {failed["error"]}')
+    else:
+        print('\n🎉 All songs processed successfully!')
+
+if __name__ == '__main__':
+    main()
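Because this commit changes `clean_song` to return a `(cleaned_song, success_status)` tuple, any caller has to unpack both values before inspecting the cleaned dictionary. A minimal sketch of that pattern follows. `StubCleaner` is a hypothetical stand-in that only mimics the return shape, `run_sample` is an illustrative helper rather than code from the repository, and the `min(...)` guard is an added assumption to keep `random.sample` from raising when fewer than 20 songs are available.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

Song = Dict[str, Any]


class StubCleaner:
    """Hypothetical stand-in for MusicBrainzCleaner; only the return shape is realistic."""

    def clean_song(self, song: Song) -> Tuple[Song, bool]:
        cleaned = dict(song)
        if song.get("artist") == "Known Artist":
            cleaned["mbid"] = "artist-mbid-placeholder"
            cleaned["recording_mbid"] = "recording-mbid-placeholder"
            return cleaned, True
        return cleaned, False


def run_sample(cleaner: Any, songs: List[Song], sample_size: int = 20,
               seed: Optional[int] = None) -> Dict[str, int]:
    """Clean a random sample and count artist hits, recording hits, and failures."""
    rng = random.Random(seed)
    # Never ask for more songs than exist; random.sample would raise ValueError.
    sample = rng.sample(songs, min(sample_size, len(songs)))
    stats = {"total": len(sample), "artists": 0, "recordings": 0, "failed": 0}
    for song in sample:
        cleaned, success = cleaner.clean_song(song)  # unpack the tuple
        stats["artists"] += "mbid" in cleaned
        stats["recordings"] += "recording_mbid" in cleaned
        stats["failed"] += not success
    return stats
```

Counting artist and recording hits separately keeps the two success rates independent, which the quick test reports as distinct figures.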
19 restart_services.sh (Executable file)
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# Quick restart script for after Mac reboots
+# This assumes Docker Desktop is already running
+
+echo "🔄 Restarting MusicBrainz services..."
+
+# Navigate to musicbrainz-docker
+cd ../musicbrainz-docker
+
+# Start services
+MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
+
+echo "✅ Services started!"
+echo "⏳ Database may take 5-10 minutes to fully initialize"
+echo ""
+echo "📊 Check status: docker-compose ps"
+echo "📋 View logs: docker-compose logs -f db"
+echo "🧪 Test when ready: cd ../musicbrainz-cleaner && docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py"
@@ -276,8 +276,12 @@ class MusicBrainzCleaner:
 
         return collaborators
 
-    def clean_song(self, song: Dict[str, Any]) -> Dict[str, Any]:
-        print(f"Processing: {song.get('artist', 'Unknown')} - {song.get('title', 'Unknown')}")
+    def clean_song(self, song: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
+        """
+        Clean a single song and return (cleaned_song, success_status)
+        """
+        original_artist = song.get('artist', '')
+        original_title = song.get('title', '')
 
         # Find artist MBID
         artist_mbid = self.find_artist_mbid(song.get('artist', ''))
@@ -289,13 +293,11 @@ class MusicBrainzCleaner:
         has_collaboration = len(collaborators) > 0
 
         if artist_mbid is None and has_collaboration:
-            print(f" 🎯 Collaboration detected: {song.get('artist')}")
             # Try to find recording using artist credit approach
             if self.use_database:
                 result = self.db.find_artist_credit(song.get('artist', ''), song.get('title', ''))
                 if result:
                     artist_credit_id, artist_string, recording_mbid = result
-                    print(f" ✅ Found recording: {song.get('title')} (MBID: {recording_mbid})")
 
                     # Update with the correct artist credit
                     song['artist'] = artist_string
@@ -309,11 +311,9 @@ class MusicBrainzCleaner:
                     if artist_result and isinstance(artist_result, tuple) and len(artist_result) >= 2:
                         song['mbid'] = artist_result[1]  # Set the main artist's MBID
 
-                    print(f" ✅ Updated to: {song['artist']} - {song.get('title')}")
-                    return song
+                    return song, True
                 else:
-                    print(f" ❌ Could not find recording: {song.get('title')}")
-                    return song
+                    return song, False
             else:
                 # Fallback to API method
                 recording_mbid = self.find_recording_mbid(None, song.get('title', ''))
@@ -323,37 +323,29 @@ class MusicBrainzCleaner:
                         artist_string = self._build_artist_string(recording_info['artist-credit'])
                         if artist_string:
                             song['artist'] = artist_string
-                            print(f" ✅ Updated to: {song['artist']} - {recording_info['title']}")
                         song['title'] = recording_info['title']
                         song['recording_mbid'] = recording_mbid
-                        return song
-                    else:
-                        print(f" ❌ Could not find recording: {song.get('title')}")
-                        return song
+                        return song, True
+                    return song, False
 
         # Regular case (non-collaboration or collaboration not found)
         if not artist_mbid:
-            print(f" ❌ Could not find artist: {song.get('artist')}")
-            return song
+            return song, False
 
         # Get artist info
         artist_info = self.get_artist_info(artist_mbid)
         if artist_info:
-            print(f" ✅ Found artist: {artist_info['name']} (MBID: {artist_mbid})")
             song['artist'] = artist_info['name']
             song['mbid'] = artist_mbid
 
         # Find recording MBID
         recording_mbid = self.find_recording_mbid(artist_mbid, song.get('title', ''))
         if not recording_mbid:
-            print(f" ❌ Could not find recording: {song.get('title')}")
-            return song
+            return song, False
 
         # Get recording info
         recording_info = self.get_recording_info(recording_mbid)
         if recording_info:
-            print(f" ✅ Found recording: {recording_info['title']} (MBID: {recording_mbid})")
-
             # Update artist string if there are multiple artists, but preserve the artist MBID
             if self.use_database and recording_info.get('artist_credit'):
                 song['artist'] = recording_info['artist_credit']
@@ -370,11 +362,11 @@ class MusicBrainzCleaner:
 
             song['title'] = recording_info['title']
             song['recording_mbid'] = recording_mbid
+            return song, True
 
-        print(f" ✅ Updated to: {song['artist']} - {song['title']}")
-        return song
+        return song, False
 
-    def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Path:
+    def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Tuple[Path, List[Dict]]:
         try:
             # Read input file
             with open(input_file, 'r', encoding='utf-8') as f:
@@ -382,7 +374,7 @@ class MusicBrainzCleaner:
 
             if not isinstance(songs, list):
                 print("Error: Input file should contain a JSON array of songs")
-                return input_file
+                return input_file, []
 
             # Apply limit if specified
             if limit is not None:
@@ -399,11 +391,31 @@ class MusicBrainzCleaner:
 
             # Clean each song
             cleaned_songs = []
+            failed_songs = []
+            success_count = 0
+            fail_count = 0
+
             for i, song in enumerate(songs, 1):
-                print(f"\n[{i}/{len(songs)}]", end=" ")
-                cleaned_song = self.clean_song(song)
+                cleaned_song, success = self.clean_song(song)
                 cleaned_songs.append(cleaned_song)
+
+                if success:
+                    success_count += 1
+                    print(f"[{i}/{len(songs)}] ✅ PASS")
+                else:
+                    fail_count += 1
+                    print(f"[{i}/{len(songs)}] ❌ FAIL")
+                    # Store failed song info for report
+                    failed_songs.append({
+                        'index': i,
+                        'original_artist': song.get('artist', ''),
+                        'original_title': song.get('title', ''),
+                        'cleaned_artist': cleaned_song.get('artist', ''),
+                        'cleaned_title': cleaned_song.get('title', ''),
+                        'has_mbid': 'mbid' in cleaned_song,
+                        'has_recording_mbid': 'recording_mbid' in cleaned_song
+                    })
 
                 # Only add delay for API calls, not database queries
                 if not self.use_database:
                     time.sleep(API_REQUEST_DELAY)
@@ -412,21 +424,37 @@ class MusicBrainzCleaner:
             with open(output_file, 'w', encoding='utf-8') as f:
                 json.dump(cleaned_songs, f, indent=2, ensure_ascii=False)
 
-            print(f"\n{PROGRESS_SEPARATOR}")
-            print(SUCCESS_MESSAGES['processing_complete'])
-            print(SUCCESS_MESSAGES['output_saved'].format(file_path=output_file))
+            # Generate failure report
+            report_file = input_file.parent / f"{input_file.stem}_failure_report.json"
+            with open(report_file, 'w', encoding='utf-8') as f:
+                json.dump({
+                    'summary': {
+                        'total_songs': len(songs),
+                        'successful': success_count,
+                        'failed': fail_count,
+                        'success_rate': f"{(success_count/len(songs)*100):.1f}%"
+                    },
+                    'failed_songs': failed_songs
+                }, f, indent=2, ensure_ascii=False)
 
-            return output_file
+            print(f"\n{PROGRESS_SEPARATOR}")
+            print(f"✅ SUCCESS: {success_count} songs")
+            print(f"❌ FAILED: {fail_count} songs")
+            print(f"📊 SUCCESS RATE: {(success_count/len(songs)*100):.1f}%")
+            print(f"💾 CLEANED DATA: {output_file}")
+            print(f"📋 FAILURE REPORT: {report_file}")
+
+            return output_file, failed_songs
 
         except FileNotFoundError:
             print(f"Error: File '{input_file}' not found")
-            return input_file
+            return input_file, []
         except json.JSONDecodeError:
             print(f"Error: Invalid JSON in file '{input_file}'")
-            return input_file
+            return input_file, []
         except Exception as e:
             print(f"Error processing file: {e}")
-            return input_file
+            return input_file, []
         finally:
             # Clean up database connection
             if self.use_database and hasattr(self, 'db'):
@ -601,7 +629,7 @@ def main() -> int:
|
|||||||
|
|
||||||
# Process the file
|
# Process the file
|
||||||
cleaner = MusicBrainzCleaner(use_database=use_database)
|
cleaner = MusicBrainzCleaner(use_database=use_database)
|
||||||
result_path = cleaner.clean_songs_file(input_file, output_file, limit)
|
result_path, failed_songs = cleaner.clean_songs_file(input_file, output_file, limit)
|
||||||
|
|
||||||
return ExitCode.SUCCESS
|
return ExitCode.SUCCESS
|
||||||
|
|
||||||
|
|||||||
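The failure report in the hunk above computes `success_count/len(songs)` twice, which would raise `ZeroDivisionError` if `songs` were ever empty (the broad `except Exception` would then swallow it and no report would be written). The following is a minimal sketch of the same report structure with that case guarded. The helper names `build_failure_report` and `write_failure_report` are hypothetical and not part of this commit; the dictionary keys and the `<stem>_failure_report.json` naming are taken from the diff.

```python
import json
from pathlib import Path


def build_failure_report(total, success_count, fail_count, failed_songs):
    """Build the report dict with the same keys the commit writes.

    Hypothetical helper: the commit builds this dict inline.
    """
    # Guard the empty-input case instead of dividing by zero.
    rate = (success_count / total * 100) if total else 0.0
    return {
        "summary": {
            "total_songs": total,
            "successful": success_count,
            "failed": fail_count,
            "success_rate": f"{rate:.1f}%",
        },
        "failed_songs": failed_songs,
    }


def write_failure_report(input_file, report):
    """Write the report next to the input file and return its path."""
    input_file = Path(input_file)
    report_file = input_file.parent / f"{input_file.stem}_failure_report.json"
    with open(report_file, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, ensure_ascii=False)
    return report_file
```

Computing the rate once also keeps the value in the JSON report and the value printed to the console from drifting apart.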
157
start_services.sh
Executable file
@@ -0,0 +1,157 @@
+#!/bin/bash
+
+# MusicBrainz Cleaner - Quick Start Script
+# This script automates the startup of MusicBrainz services
+
+set -e
+
+echo "🚀 Starting MusicBrainz services..."
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Function to print colored output
+print_status() {
+    echo -e "${BLUE}[INFO]${NC} $1"
+}
+
+print_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+print_warning() {
+    echo -e "${YELLOW}[WARNING]${NC} $1"
+}
+
+print_error() {
+    echo -e "${RED}[ERROR]${NC} $1"
+}
+
+# Check if Docker is running
+if ! docker info > /dev/null 2>&1; then
+    print_error "Docker is not running. Please start Docker Desktop first."
+    exit 1
+fi
+
+print_success "Docker is running"
+
+# Check if we're in the right directory
+if [ ! -f "docker-compose.yml" ]; then
+    print_error "This script must be run from the musicbrainz-cleaner directory"
+    exit 1
+fi
+
+# Check if musicbrainz-docker directory exists
+if [ ! -d "../musicbrainz-docker" ]; then
+    print_error "musicbrainz-docker directory not found. Please ensure you're in the musicbrainz-server directory."
+    exit 1
+fi
+
+# Navigate to musicbrainz-docker
+cd ../musicbrainz-docker
+
+print_status "Checking for port conflicts..."
+
+# Check if port 5000 is available
+if lsof -i :5000 > /dev/null 2>&1; then
+    print_warning "Port 5000 is in use. Using port 5001 instead."
+    PORT=5001
+else
+    print_success "Port 5000 is available"
+    PORT=5000
+fi
+
+# Stop any existing containers
+print_status "Stopping existing containers..."
+docker-compose down > /dev/null 2>&1 || true
+
+# Remove any conflicting containers
+print_status "Cleaning up conflicting containers..."
+docker rm -f musicbrainz-docker-db-1 > /dev/null 2>&1 || true
+
+# Start services
+print_status "Starting MusicBrainz services on port $PORT..."
+MUSICBRAINZ_WEB_SERVER_PORT=$PORT docker-compose up -d
+
+print_success "Services started successfully!"
+
+# Wait for database to be ready
+print_status "Waiting for database to initialize (this may take 5-10 minutes)..."
+print_status "You can monitor progress with: docker-compose logs -f db"
+
+# Check if database is ready
+attempts=0
+max_attempts=60
+while [ $attempts -lt $max_attempts ]; do
+    if docker-compose exec -T db pg_isready -U musicbrainz > /dev/null 2>&1; then
+        print_success "Database is ready!"
+        break
+    fi
+    attempts=$((attempts + 1))
+    print_status "Waiting for database... (attempt $attempts/$max_attempts)"
+    sleep 10
+done
+
+if [ $attempts -eq $max_attempts ]; then
+    print_warning "Database may still be initializing. You can check status with: docker-compose logs db"
+fi
+
+# Create .env file in musicbrainz-cleaner directory
+cd ../musicbrainz-cleaner
+
+print_status "Creating environment configuration..."
+
+cat > .env << EOF
+# Database connection (default Docker setup)
+DB_HOST=172.18.0.2
+DB_PORT=5432
+DB_NAME=musicbrainz_db
+DB_USER=musicbrainz
+DB_PASSWORD=musicbrainz
+
+# MusicBrainz web server
+MUSICBRAINZ_WEB_SERVER_PORT=$PORT
+EOF
+
+print_success "Environment configuration created"
+
+# Test connection
+print_status "Testing connection..."
+if docker-compose run --rm musicbrainz-cleaner python3 -c "
+import sys
+sys.path.insert(0, '/app')
+from src.api.database import MusicBrainzDatabase
+try:
+    db = MusicBrainzDatabase()
+    print('✅ Database connection successful')
+except Exception as e:
+    print(f'❌ Database connection failed: {e}')
+    sys.exit(1)
+" 2>/dev/null; then
+    print_success "Connection test passed!"
+else
+    print_warning "Connection test failed. Services may still be initializing."
+fi
+
+echo ""
+print_success "MusicBrainz services are now running!"
+echo ""
+echo "📊 Service Status:"
+echo " - Web Server: http://localhost:$PORT"
+echo " - Database: PostgreSQL (internal)"
+echo " - Search: Solr (internal)"
+echo ""
+echo "🧪 Next steps:"
+echo " 1. Run quick test: python3 quick_test_20.py"
+echo " 2. Run larger test: python3 bulk_test_1000.py"
+echo " 3. Use cleaner: python3 -m src.cli.main --input your_file.json --output cleaned.json"
+echo ""
+echo "📋 Useful commands:"
+echo " - View logs: cd ../musicbrainz-docker && docker-compose logs -f"
+echo " - Stop services: cd ../musicbrainz-docker && docker-compose down"
+echo " - Check status: cd ../musicbrainz-docker && docker-compose ps"
+echo ""
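The readiness loop in `start_services.sh` retries `pg_isready` up to 60 times at 10-second intervals. The same bounded-retry pattern can be sketched in Python, for example for a test script that wants to wait for the database before connecting. `wait_until_ready` and `db_is_ready` are hypothetical names that do not appear in this commit; the `docker-compose` invocation mirrors the one in the script and assumes the current directory is `musicbrainz-docker`.

```python
import subprocess
import time


def wait_until_ready(check, max_attempts=60, delay_seconds=10.0, sleep=time.sleep):
    """Call check() until it returns True or max_attempts is exhausted.

    Returns the number of attempts used on success, or None on timeout.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        # Unlike the shell loop, skip the pointless sleep after the last attempt.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return None


def db_is_ready():
    """Run the same pg_isready probe the shell script uses.

    Assumption: invoked from the musicbrainz-docker directory.
    """
    result = subprocess.run(
        ["docker-compose", "exec", "-T", "db", "pg_isready", "-U", "musicbrainz"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A caller could then write `if wait_until_ready(db_is_ready) is None:` and print a warning, which matches what the script does when it runs out of attempts.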