Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>

Matt Bruce 2025-07-31 18:07:18 -05:00
parent 504820c8a1
commit 4bf359ee5d
8 changed files with 715 additions and 71 deletions

PRD.md (+34)

@@ -374,6 +374,33 @@ python musicbrainz_cleaner.py --test-connection
- **NEW**: Review and update band name protection list in `data/known_artists.json`
- **NEW**: Monitor collaboration detection accuracy
### Operational Procedures
#### After System Reboot
1. **Start Docker Desktop** (if auto-start not enabled)
2. **Restart MusicBrainz services**:
```bash
cd musicbrainz-cleaner
./restart_services.sh
```
3. **Wait for database initialization** (5-10 minutes)
4. **Test connection**:
```bash
docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py
```
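Step 1 matters because every later command fails if the Docker daemon is not up. Below is a minimal Python sketch of the same probe `start_services.sh` uses (`docker info`, where exit status 0 means the daemon answered). `docker_is_running` and its injectable `run` parameter are hypothetical; `run` exists only so the check can be exercised without Docker.

```python
import subprocess
from typing import Any, Callable


def docker_is_running(run: Callable[..., Any] = subprocess.run) -> bool:
    """Return True if `docker info` exits with status 0 (the daemon is answering).

    Hypothetical helper mirroring start_services.sh: `docker info > /dev/null 2>&1`.
    """
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI is not installed or not on PATH.
        return False
    return result.returncode == 0
```

If it returns `False`, start Docker Desktop before running `./restart_services.sh`.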
#### Service Management
- **Start services**: `./start_services.sh` (full setup) or `./restart_services.sh` (quick restart)
- **Stop services**: `cd ../musicbrainz-docker && docker-compose down`
- **Check status**: `cd ../musicbrainz-docker && docker-compose ps`
- **View logs**: `cd ../musicbrainz-docker && docker-compose logs -f`
#### Troubleshooting
- **Port conflicts**: Use `MUSICBRAINZ_WEB_SERVER_PORT=5001` environment variable
- **Container conflicts**: Run `docker-compose down` in `../musicbrainz-docker`, then restart the services
- **Database issues**: Check the database logs with `docker-compose logs -f db` (also run from `../musicbrainz-docker`)
- **Memory issues**: Increase Docker Desktop memory allocation (8GB+ recommended)
### Support
- GitHub issues for bug reports
- Documentation updates
@@ -406,3 +433,10 @@ python musicbrainz_cleaner.py --test-connection
- **Database-first approach** ensures live data
- **Fuzzy search thresholds** need tuning for different datasets
- **Connection pooling** would improve performance for large datasets
### Operational Insights
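- **Docker Service Management**: MusicBrainz services must be started in the right order, and the database needs several minutes to initialize before the cleaner can connect
- **Port Conflicts**: Port 5000 is often already in use on macOS (AirPlay Receiver listens on it), so `start_services.sh` detects the conflict and falls back to port 5001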
- **System Reboots**: Services need to be restarted after system reboots, but data persists in Docker volumes
- **Resource Requirements**: MusicBrainz services require significant memory (8GB+ recommended) and disk space
- **Platform Compatibility**: Apple Silicon (M1/M2) works but may show platform mismatch warnings

@@ -39,50 +39,81 @@ A powerful command-line tool that cleans and normalizes your song data using the
## 🚀 Quick Start
### 1. Install Dependencies
### Option 1: Automated Setup (Recommended)
1. **Start MusicBrainz services**:
```bash
pip install requests psycopg2-binary fuzzywuzzy python-Levenshtein
./start_services.sh
```
This script will:
- Check for Docker and port conflicts
- Start all MusicBrainz services
- Wait for database initialization
- Create environment configuration
- Test the connection
2. **Run the cleaner**:
```bash
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input data/songs.json --output cleaned_songs.json
```
### 2. Set Up MusicBrainz Server
### Option 2: Manual Setup
#### Option A: Docker (Recommended)
1. **Start MusicBrainz services manually**:
```bash
# Clone MusicBrainz Docker repository
git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
cd ../musicbrainz-docker
MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
```
Wait 5-10 minutes for database initialization.
# Update postgres.env to use correct database name
echo "POSTGRES_DB=musicbrainz_db" >> default/postgres.env
# Start the server
docker-compose up -d
# Wait for database to be ready (can take 10-15 minutes)
docker-compose logs -f musicbrainz
2. **Create environment configuration**:
```bash
# Create .env file in musicbrainz-cleaner directory
cat > .env << EOF
DB_HOST=172.18.0.2
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
MUSICBRAINZ_WEB_SERVER_PORT=5001
EOF
```
#### Option B: Manual Setup
1. Install PostgreSQL 12+
2. Create database: `createdb musicbrainz_db`
3. Import MusicBrainz data dump
4. Start MusicBrainz server on port 8080
### 3. Test Connection
3. **Run the cleaner**:
```bash
python musicbrainz_cleaner.py --test-connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input data/songs.json --output cleaned_songs.json
```
### 4. Run the Cleaner
```bash
# Use database access (recommended, faster)
python musicbrainz_cleaner.py your_songs.json
### For detailed setup instructions, see [SETUP.md](SETUP.md)
# Force API mode (slower, fallback)
python musicbrainz_cleaner.py your_songs.json --use-api
## 🔄 After System Reboot
After restarting your Mac, you'll need to restart the MusicBrainz services:
### Quick Restart (Recommended)
```bash
# If Docker Desktop is already running
./restart_services.sh
# Or manually
cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
```
That's it! Your cleaned data will be saved to `your_songs_cleaned.json`
### Full Restart (If you have issues)
```bash
# Complete setup including Docker checks
./start_services.sh
```
### Auto-start Setup (Optional)
1. **Enable Docker Desktop auto-start**:
- Open Docker Desktop
- Go to Settings → General
- Check "Start Docker Desktop when you log in"
2. **Then just run**: `./restart_services.sh` after each reboot
**Note**: Your data is preserved in Docker volumes, so you don't need to reconfigure anything after a reboot.
## 📋 Requirements

SETUP.md (new file, +266)

@@ -0,0 +1,266 @@
# MusicBrainz Cleaner Setup Guide
This guide will help you set up the MusicBrainz database and Docker services needed to run the cleaner.
## Prerequisites
- Docker Desktop installed and running
- At least 8GB of available RAM
- At least 10GB of free disk space
- Git (to clone the repositories)
## Step 1: Clone the MusicBrainz Server Repository
```bash
# Clone the main MusicBrainz server repository (if not already done)
git clone https://github.com/metabrainz/musicbrainz-server.git
cd musicbrainz-server
```
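Both helper scripts and the rest of this guide assume that `musicbrainz-docker` and `musicbrainz-cleaner` sit side by side (the scripts refer to `../musicbrainz-docker`). The sketch below runs the same layout checks that `start_services.sh` performs. `layout_problems` is a hypothetical helper, not part of the cleaner.

```python
from pathlib import Path
from typing import List


def layout_problems(cleaner_dir: Path) -> List[str]:
    """Return human-readable problems with the expected directory layout (empty if OK).

    Mirrors start_services.sh: docker-compose.yml must be in the musicbrainz-cleaner
    directory, and musicbrainz-docker must be its sibling.
    """
    problems: List[str] = []
    if not (cleaner_dir / "docker-compose.yml").is_file():
        problems.append(f"{cleaner_dir}: no docker-compose.yml (run from musicbrainz-cleaner)")
    # resolve() so a relative path such as Path(".") still has a real parent
    docker_dir = cleaner_dir.resolve().parent / "musicbrainz-docker"
    if not docker_dir.is_dir():
        problems.append(f"{docker_dir}: not found (expected next to musicbrainz-cleaner)")
    return problems
```

For example, `print(layout_problems(Path.cwd()) or "Layout looks right")` from inside `musicbrainz-cleaner`.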
## Step 2: Start the MusicBrainz Docker Services
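The musicbrainz-docker setup uses Docker Compose to run several services, including PostgreSQL, Solr search, Redis, and the web server.
```bash
# Navigate to the musicbrainz-docker directory (it must sit next to musicbrainz-cleaner).
# If it does not exist yet, clone it first:
#   git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
# Check whether port 5000 is free (on macOS, AirPlay Receiver often uses it)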
lsof -i :5000
# If port 5000 is in use, use port 5001 instead
MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
# Or if port 5000 is free, use the default
docker-compose up -d
```
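The `lsof -i :5000` check above can also be done from Python. This is a minimal sketch, not part of the cleaner. The helper names `port_in_use` and `choose_port` are hypothetical. The sketch only probes the IPv4 loopback address, whereas `lsof` reports listeners on any interface. Like `start_services.sh`, it does not verify that the fallback port is free.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising, so 0 means "taken"
        return sock.connect_ex((host, port)) == 0


def choose_port(preferred: int = 5000, fallback: int = 5001) -> int:
    """Mirror start_services.sh: use the preferred port unless it is taken."""
    return fallback if port_in_use(preferred) else preferred
```

The chosen value is what goes into `MUSICBRAINZ_WEB_SERVER_PORT`.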
### Troubleshooting Port Conflicts
If you get a port conflict error:
```bash
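# Preferred: use a different port
MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
# Or force-kill whatever is using port 5000 (check what it is first; on macOS it is
# often AirPlay Receiver, which is better turned off in System Settings)
lsof -ti:5000 | xargs kill -9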
```
### Troubleshooting Container Conflicts
If you get container name conflicts:
```bash
# Remove existing containers
docker-compose down
# Force remove conflicting containers
docker rm -f musicbrainz-docker-db-1
# Start fresh
docker-compose up -d
```
## Step 3: Wait for Services to Start
The services take time to initialize, especially the database:
```bash
# Check service status
docker-compose ps
# Wait for all services to be healthy (this can take 5-10 minutes)
docker-compose logs -f db
```
**Important**: Wait until you see database initialization complete messages before proceeding.
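If you want a script to wait instead of watching the logs, the retry loop in `start_services.sh` (60 attempts, 10 seconds apart, probing with `pg_isready`) can be sketched in Python. `wait_until` and `db_ready` are hypothetical names. `db_ready` assumes the `docker-compose` CLI is installed and that `cwd` points at the `musicbrainz-docker` directory.

```python
import subprocess
import time
from typing import Callable, Optional


def wait_until(check: Callable[[], bool], max_attempts: int = 60, delay: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `max_attempts` times, sleeping `delay` seconds between failures."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        print(f"Waiting for database... (attempt {attempt}/{max_attempts})")
        if attempt < max_attempts:
            sleep(delay)
    return False


def db_ready(cwd: Optional[str] = None) -> bool:
    """Same probe as start_services.sh; pass cwd pointing at musicbrainz-docker."""
    result = subprocess.run(
        ["docker-compose", "exec", "-T", "db", "pg_isready", "-U", "musicbrainz"],
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

From `musicbrainz-cleaner`, for example: `wait_until(lambda: db_ready("../musicbrainz-docker"))`.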
## Step 4: Verify Services Are Running
```bash
# Check all containers are running
docker-compose ps
# Test the web interface (if using port 5001)
curl http://localhost:5001
# Or if using default port 5000
curl http://localhost:5000
```
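The `curl` checks above only confirm that something answers on the port. The sketch below does the same with Python's standard library, assuming the port you chose in Step 2. `web_server_up` is a hypothetical helper. It treats any HTTP response, even an error status, as proof that the server is listening, which matches how a plain `curl` without `-f` behaves.

```python
import urllib.error
import urllib.request


def web_server_up(port: int = 5001, host: str = "localhost", timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers on http://host:port/."""
    # Bypass any http_proxy settings: this is a purely local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (4xx/5xx); it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or name resolution failed.
        return False
```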
## Step 5: Set Environment Variables
Create a `.env` file in the `musicbrainz-cleaner` directory:
```bash
cd ../musicbrainz-cleaner
# Create .env file
cat > .env << EOF
# Database connection (default Docker setup)
DB_HOST=172.18.0.2
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
# MusicBrainz web server
MUSICBRAINZ_WEB_SERVER_PORT=5001
EOF
```
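**Note**: If you used the default port 5000, change `MUSICBRAINZ_WEB_SERVER_PORT=5001` to `MUSICBRAINZ_WEB_SERVER_PORT=5000`. `DB_HOST=172.18.0.2` is the address Docker assigned to the database container on its Compose network. If the connection test in Step 6 fails, confirm the container's actual address with `docker inspect musicbrainz-docker-db-1`.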
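Before running the connection test, it can help to confirm that `.env` contains every key the heredoc above writes. This is a minimal sketch that assumes the simple `KEY=VALUE` format shown, with no quoting or `${...}` expansion. `parse_env`, `check_env_file`, and `REQUIRED_KEYS` are hypothetical names. This guide does not show how the cleaner itself loads these settings, so treat the sketch as a standalone check only.

```python
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD",
                 "MUSICBRAINZ_WEB_SERVER_PORT")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_env_file(path: Path = Path(".env")) -> Dict[str, str]:
    """Load the .env file and raise ValueError if any expected key is missing."""
    env = parse_env(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return env
```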
## Step 6: Test the Connection
```bash
# Run a simple test to verify everything is working
docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py
```
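If the quick test fails, a direct database check shows whether the connection itself is the problem. The sketch below uses `psycopg2` (installed via `psycopg2-binary`) with the `DB_*` values from `.env`. `db_params` and `can_connect` are hypothetical helpers, and their defaults repeat the values from Step 5. Run the sketch inside the `musicbrainz-cleaner` container (for example through `docker-compose run`). On Docker Desktop for Mac, container IPs such as `172.18.0.2` are not reachable from the host.

```python
import os
from typing import Any, Callable, Dict, Mapping, Optional


def db_params(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build psycopg2 connection keywords from DB_* variables (defaults match Step 5)."""
    return {
        "host": env.get("DB_HOST", "172.18.0.2"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "musicbrainz_db"),
        "user": env.get("DB_USER", "musicbrainz"),
        "password": env.get("DB_PASSWORD", "musicbrainz"),
        "connect_timeout": 5,  # seconds; fail fast while the DB is still starting
    }


def can_connect(params: Dict[str, Any], connect: Optional[Callable[..., Any]] = None) -> bool:
    """Open a connection, run SELECT 1, and report whether it worked."""
    if connect is None:
        import psycopg2  # installed via psycopg2-binary

        connect = psycopg2.connect
    try:
        conn = connect(**params)
    except Exception as exc:  # psycopg2.OperationalError when the DB is unreachable
        print(f"❌ Database connection failed: {exc}")
        return False
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1;")
            return cur.fetchone() == (1,)
    finally:
        conn.close()
```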
## Service Details
The Docker Compose setup includes:
- **PostgreSQL Database** (`db`): Main MusicBrainz database
- **Solr Search** (`search`): Full-text search engine
- **Redis** (`redis`): Caching and session storage
- **Message Queue** (`mq`): Background job processing
- **MusicBrainz Web Server** (`musicbrainz`): Main web application
- **Indexer** (`indexer`): Search index maintenance
## Ports Used
- **5000/5001**: MusicBrainz web server (configurable)
- **5432**: PostgreSQL database (internal)
- **8983**: Solr search (internal)
- **6379**: Redis (internal)
- **5672**: Message queue (internal)
## Stopping Services
```bash
# Stop all services (starting from the musicbrainz-cleaner directory)
cd ../musicbrainz-docker
docker-compose down
# To also remove volumes (WARNING: this deletes all data)
docker-compose down -v
```
## Restarting Services
```bash
# Restart all services
docker-compose restart
# Or restart specific service
docker-compose restart db
```
## Monitoring Services
```bash
# View logs for all services
docker-compose logs -f
# View logs for specific service
docker-compose logs -f db
docker-compose logs -f musicbrainz
# Check resource usage
docker stats
```
## Troubleshooting
### Database Connection Issues
```bash
# Check if database is running
docker-compose ps db
# Check database logs
docker-compose logs db
# Test database connection
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT 1;"
```
### Memory Issues
If you encounter memory issues:
```bash
# Increase Docker memory limit in Docker Desktop settings
# Recommended: 8GB minimum, 16GB preferred
# Check current memory usage
docker stats
```
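To see which container is using the memory, `docker stats --no-stream` prints a one-shot snapshot. The sketch below parses its `MemUsage` column into bytes so you can compare the totals against the 8GB+ allocation recommended above. `to_bytes` and `container_memory` are hypothetical helpers. The unit table assumes the binary suffixes (`MiB`, `GiB`) that `docker stats` prints and also accepts decimal ones.

```python
import re
import subprocess
from typing import Any, Callable, Dict

_UNITS = {
    "B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
}


def to_bytes(size: str) -> int:
    """Convert a size string such as '512MiB' or '1.5GiB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", size)
    if not match:
        raise ValueError(f"Unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def container_memory(run: Callable[..., Any] = subprocess.run) -> Dict[str, int]:
    """Return {container name: bytes in use} from a one-shot `docker stats` call."""
    result = run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}|{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    )
    usage: Dict[str, int] = {}
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        name, mem = line.split("|", 1)
        used = mem.split("/")[0]  # "1.5GiB / 7.7GiB" -> "1.5GiB "
        usage[name] = to_bytes(used)
    return usage
```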
### Platform Issues (Apple Silicon)
If you're on Apple Silicon (M1/M2) and see platform warnings:
```bash
# The services will still work, but you may see warnings about platform mismatch
# This is normal and doesn't affect functionality
```
## Performance Tips
1. **Allocate sufficient memory** to Docker Desktop (8GB+ recommended)
2. **Use SSD storage** for better database performance
3. **Close other resource-intensive applications** while running the services
4. **Wait for full initialization** before running tests
## Next Steps
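Once the services are running successfully, run these from the `musicbrainz-cleaner` directory (the scripts expect the container's `/app` layout):
1. Run the quick test: `docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py`
2. Run larger tests: `docker-compose run --rm musicbrainz-cleaner python3 bulk_test_1000.py`
3. Use the cleaner on your own data: `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input your_file.json --output cleaned.json`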
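After a run, `MusicBrainzCleaner.clean_songs_file()` (in `src/cli/main.py`) also writes `<input name>_failure_report.json` next to the input file. For `data/songs.json`, that file is `data/songs_failure_report.json`. It contains a `summary` object and a `failed_songs` list. Below is a small sketch for reading that report. `summarize_failure_report` is a hypothetical helper, and the field names are taken from `clean_songs_file()`.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_failure_report(input_file: Path) -> Dict[str, Any]:
    """Print the failure report that clean_songs_file() writes next to the input file."""
    report_path = input_file.parent / f"{input_file.stem}_failure_report.json"
    report = json.loads(report_path.read_text(encoding="utf-8"))
    summary = report["summary"]
    print(f"{summary['successful']}/{summary['total_songs']} songs succeeded ({summary['success_rate']})")
    for failed in report["failed_songs"]:
        missing = [
            label
            for label, present in (("artist MBID", failed["has_mbid"]),
                                   ("recording MBID", failed["has_recording_mbid"]))
            if not present
        ]
        print(f"  #{failed['index']}: {failed['original_artist']} - {failed['original_title']}"
              f" (missing: {', '.join(missing) or 'none'})")
    return summary
```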
## 🔄 After System Reboot
After restarting your Mac, you'll need to restart the MusicBrainz services:
### Quick Restart (Recommended)
```bash
# Navigate to your musicbrainz-cleaner directory, for example:
cd /path/to/musicbrainz-server/musicbrainz-cleaner
# If Docker Desktop is already running
./restart_services.sh
# Or manually
cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
```
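You can also script the manual command above from Python. This is a minimal sketch that assumes `musicbrainz-docker` sits next to `musicbrainz-cleaner` and that the `docker-compose` CLI is installed. `restart_services` is a hypothetical helper that mirrors `restart_services.sh` rather than replacing it.

```python
import os
import subprocess
from pathlib import Path
from typing import Any, Callable


def restart_services(cleaner_dir: Path, port: int = 5001,
                     run: Callable[..., Any] = subprocess.run) -> Any:
    """Python equivalent of:
    cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
    """
    docker_dir = cleaner_dir.resolve().parent / "musicbrainz-docker"
    if not docker_dir.is_dir():
        raise FileNotFoundError(f"{docker_dir} not found (expected next to musicbrainz-cleaner)")
    # Inherit the current environment and override only the web server port.
    env = {**os.environ, "MUSICBRAINZ_WEB_SERVER_PORT": str(port)}
    return run(["docker-compose", "up", "-d"], cwd=docker_dir, env=env, check=True)
```

The database still needs its initialization time after this returns.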
### Full Restart (If you have issues)
```bash
# Complete setup including Docker checks
./start_services.sh
```
### Auto-start Setup (Optional)
1. **Enable Docker Desktop auto-start**:
- Open Docker Desktop
- Go to Settings → General
- Check "Start Docker Desktop when you log in"
2. **Then just run**: `./restart_services.sh` after each reboot
**Note**: Your data is preserved in Docker volumes, so you don't need to reconfigure anything after a reboot.
## Support
If you encounter issues:
1. Check the logs: `docker-compose logs -f`
2. Verify Docker has sufficient resources
3. Ensure all prerequisites are met
4. Try restarting the services: `docker-compose restart`

@@ -222,6 +222,7 @@
"The Proclaimers",
"The Stanley Brothers",
"The Statler Brothers",
"The Tamperer featuring Maya",
"The Walker Brothers",
"The Wilburn Brothers",
"Thompson Twins",

quick_test_20.py (new file, +108)

@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""
Quick test script for 20 random songs
Simple single-threaded approach
"""
import sys
import json
import random
import time
from pathlib import Path
# Make the project root (/app inside the container) importable so `src` resolves
sys.path.insert(0, '/app')
from src.cli.main import MusicBrainzCleaner
SAMPLE_SIZE = 20
def main():
    print(f'🚀 Starting quick test with {SAMPLE_SIZE} random songs...')
    # Load songs
    input_file = Path('data/songs.json')
    if not input_file.exists():
        print('❌ songs.json not found')
        return
    with open(input_file, 'r', encoding='utf-8') as f:
        all_songs = json.load(f)
    print(f'📊 Total songs available: {len(all_songs):,}')
    if not all_songs:
        print('❌ songs.json contains no songs')
        return
    # Take up to SAMPLE_SIZE random songs (random.sample raises ValueError if asked for more than exist)
    sample_size = min(SAMPLE_SIZE, len(all_songs))
    sample_songs = random.sample(all_songs, sample_size)
    print(f'🎯 Testing {sample_size} random songs...')
    # Initialize cleaner
    cleaner = MusicBrainzCleaner()
    # Process songs
    found_artists = 0
    found_recordings = 0
    failed_songs = []
    start_time = time.time()
    for i, song in enumerate(sample_songs, 1):
        print(f' [{i:2d}/{sample_size}] Processing: "{song.get("artist", "Unknown")}" - "{song.get("title", "Unknown")}"')
        try:
            # clean_song() mutates its argument and returns (cleaned_song, success),
            # so pass a copy to keep the original values for the report
            result, _ = cleaner.clean_song(dict(song))
            artist_found = 'mbid' in result
            recording_found = 'recording_mbid' in result
            # Count artist and recording matches independently
            found_artists += int(artist_found)
            found_recordings += int(recording_found)
            if artist_found and recording_found:
                print(' ✅ Found both artist and recording')
            else:
                failed_songs.append({
                    'original': song,
                    'cleaned': result,
                    'artist_found': artist_found,
                    'recording_found': recording_found,
                    'artist_name': song.get('artist', 'Unknown'),
                    'title': song.get('title', 'Unknown')
                })
                print(f' ❌ Artist: {artist_found}, Recording: {recording_found}')
        except Exception as e:
            print(f' 💥 Error: {e}')
            failed_songs.append({
                'original': song,
                'cleaned': {'error': str(e)},
                'artist_found': False,
                'recording_found': False,
                'artist_name': song.get('artist', 'Unknown'),
                'title': song.get('title', 'Unknown'),
                'error': str(e)
            })
    end_time = time.time()
    processing_time = end_time - start_time
    # Calculate success rates
    artist_success_rate = found_artists / sample_size * 100
    recording_success_rate = found_recordings / sample_size * 100
    failed_rate = len(failed_songs) / sample_size * 100
    print('\n📊 Final Results:')
    print(f' ⏱️ Processing time: {processing_time:.2f} seconds')
    print(f' 🚀 Speed: {sample_size / processing_time:.1f} songs/second')
    print(f' ✅ Artists found: {found_artists}/{sample_size} ({artist_success_rate:.1f}%)')
    print(f' ✅ Recordings found: {found_recordings}/{sample_size} ({recording_success_rate:.1f}%)')
    print(f' ❌ Failed songs: {len(failed_songs)} ({failed_rate:.1f}%)')
    # Show failed songs
    if failed_songs:
        print('\n🔍 Failed songs:')
        for i, failed in enumerate(failed_songs, 1):
            print(f' [{i}] "{failed["artist_name"]}" - "{failed["title"]}"')
            print(f' Artist found: {failed["artist_found"]}, Recording found: {failed["recording_found"]}')
            if 'error' in failed:
                print(f' Error: {failed["error"]}')
    else:
        print('\n🎉 All songs processed successfully!')
if __name__ == '__main__':
    main()

restart_services.sh (new executable file, +19)

@@ -0,0 +1,19 @@
#!/bin/bash
# Quick restart script for after Mac reboots
# This assumes Docker Desktop is already running
echo "🔄 Restarting MusicBrainz services..."
# Navigate to musicbrainz-docker
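cd ../musicbrainz-docker || { echo "❌ ../musicbrainz-docker not found. Run this script from the musicbrainz-cleaner directory."; exit 1; }
# Start services
MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d || { echo "❌ docker-compose up failed"; exit 1; }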
echo "✅ Services started!"
echo "⏳ Database may take 5-10 minutes to fully initialize"
echo ""
echo "📊 Check status: cd ../musicbrainz-docker && docker-compose ps"
echo "📋 View logs: cd ../musicbrainz-docker && docker-compose logs -f db"
echo "🧪 Test when ready: cd ../musicbrainz-cleaner && docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py"

@@ -276,8 +276,12 @@ class MusicBrainzCleaner:
return collaborators
def clean_song(self, song: Dict[str, Any]) -> Dict[str, Any]:
print(f"Processing: {song.get('artist', 'Unknown')} - {song.get('title', 'Unknown')}")
def clean_song(self, song: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
"""
Clean a single song and return (cleaned_song, success_status)
"""
original_artist = song.get('artist', '')
original_title = song.get('title', '')
# Find artist MBID
artist_mbid = self.find_artist_mbid(song.get('artist', ''))
@@ -289,13 +293,11 @@ class MusicBrainzCleaner:
has_collaboration = len(collaborators) > 0
if artist_mbid is None and has_collaboration:
print(f" 🎯 Collaboration detected: {song.get('artist')}")
# Try to find recording using artist credit approach
if self.use_database:
result = self.db.find_artist_credit(song.get('artist', ''), song.get('title', ''))
if result:
artist_credit_id, artist_string, recording_mbid = result
print(f" ✅ Found recording: {song.get('title')} (MBID: {recording_mbid})")
# Update with the correct artist credit
song['artist'] = artist_string
@@ -309,11 +311,9 @@ class MusicBrainzCleaner:
if artist_result and isinstance(artist_result, tuple) and len(artist_result) >= 2:
song['mbid'] = artist_result[1] # Set the main artist's MBID
print(f" ✅ Updated to: {song['artist']} - {song.get('title')}")
return song
return song, True
else:
print(f" ❌ Could not find recording: {song.get('title')}")
return song
return song, False
else:
# Fallback to API method
recording_mbid = self.find_recording_mbid(None, song.get('title', ''))
@@ -323,37 +323,29 @@ class MusicBrainzCleaner:
artist_string = self._build_artist_string(recording_info['artist-credit'])
if artist_string:
song['artist'] = artist_string
print(f" ✅ Updated to: {song['artist']} - {recording_info['title']}")
song['title'] = recording_info['title']
song['recording_mbid'] = recording_mbid
return song
else:
print(f" ❌ Could not find recording: {song.get('title')}")
return song
return song, True
return song, False
# Regular case (non-collaboration or collaboration not found)
if not artist_mbid:
print(f" ❌ Could not find artist: {song.get('artist')}")
return song
return song, False
# Get artist info
artist_info = self.get_artist_info(artist_mbid)
if artist_info:
print(f" ✅ Found artist: {artist_info['name']} (MBID: {artist_mbid})")
song['artist'] = artist_info['name']
song['mbid'] = artist_mbid
# Find recording MBID
recording_mbid = self.find_recording_mbid(artist_mbid, song.get('title', ''))
if not recording_mbid:
print(f" ❌ Could not find recording: {song.get('title')}")
return song
return song, False
# Get recording info
recording_info = self.get_recording_info(recording_mbid)
if recording_info:
print(f" ✅ Found recording: {recording_info['title']} (MBID: {recording_mbid})")
# Update artist string if there are multiple artists, but preserve the artist MBID
if self.use_database and recording_info.get('artist_credit'):
song['artist'] = recording_info['artist_credit']
@@ -370,11 +362,11 @@ class MusicBrainzCleaner:
song['title'] = recording_info['title']
song['recording_mbid'] = recording_mbid
return song, True
print(f" ✅ Updated to: {song['artist']} - {song['title']}")
return song
return song, False
def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Path:
def clean_songs_file(self, input_file: Path, output_file: Optional[Path] = None, limit: Optional[int] = None) -> Tuple[Path, List[Dict]]:
try:
# Read input file
with open(input_file, 'r', encoding='utf-8') as f:
@@ -382,7 +374,7 @@ class MusicBrainzCleaner:
if not isinstance(songs, list):
print("Error: Input file should contain a JSON array of songs")
return input_file
return input_file, []
# Apply limit if specified
if limit is not None:
@@ -399,11 +391,31 @@ class MusicBrainzCleaner:
# Clean each song
cleaned_songs = []
failed_songs = []
success_count = 0
fail_count = 0
for i, song in enumerate(songs, 1):
print(f"\n[{i}/{len(songs)}]", end=" ")
cleaned_song = self.clean_song(song)
cleaned_song, success = self.clean_song(song)
cleaned_songs.append(cleaned_song)
if success:
success_count += 1
print(f"[{i}/{len(songs)}] ✅ PASS")
else:
fail_count += 1
print(f"[{i}/{len(songs)}] ❌ FAIL")
# Store failed song info for report
failed_songs.append({
'index': i,
'original_artist': song.get('artist', ''),
'original_title': song.get('title', ''),
'cleaned_artist': cleaned_song.get('artist', ''),
'cleaned_title': cleaned_song.get('title', ''),
'has_mbid': 'mbid' in cleaned_song,
'has_recording_mbid': 'recording_mbid' in cleaned_song
})
# Only add delay for API calls, not database queries
if not self.use_database:
time.sleep(API_REQUEST_DELAY)
@@ -412,21 +424,37 @@ class MusicBrainzCleaner:
with open(output_file, 'w', encoding='utf-8') as f:
json.dump(cleaned_songs, f, indent=2, ensure_ascii=False)
print(f"\n{PROGRESS_SEPARATOR}")
print(SUCCESS_MESSAGES['processing_complete'])
print(SUCCESS_MESSAGES['output_saved'].format(file_path=output_file))
# Generate failure report
report_file = input_file.parent / f"{input_file.stem}_failure_report.json"
with open(report_file, 'w', encoding='utf-8') as f:
json.dump({
'summary': {
'total_songs': len(songs),
'successful': success_count,
'failed': fail_count,
'success_rate': f"{(success_count/len(songs)*100):.1f}%"
},
'failed_songs': failed_songs
}, f, indent=2, ensure_ascii=False)
return output_file
print(f"\n{PROGRESS_SEPARATOR}")
print(f"✅ SUCCESS: {success_count} songs")
print(f"❌ FAILED: {fail_count} songs")
print(f"📊 SUCCESS RATE: {(success_count/len(songs)*100):.1f}%")
print(f"💾 CLEANED DATA: {output_file}")
print(f"📋 FAILURE REPORT: {report_file}")
return output_file, failed_songs
except FileNotFoundError:
print(f"Error: File '{input_file}' not found")
return input_file
return input_file, []
except json.JSONDecodeError:
print(f"Error: Invalid JSON in file '{input_file}'")
return input_file
return input_file, []
except Exception as e:
print(f"Error processing file: {e}")
return input_file
return input_file, []
finally:
# Clean up database connection
if self.use_database and hasattr(self, 'db'):
@@ -601,7 +629,7 @@ def main() -> int:
# Process the file
cleaner = MusicBrainzCleaner(use_database=use_database)
result_path = cleaner.clean_songs_file(input_file, output_file, limit)
result_path, failed_songs = cleaner.clean_songs_file(input_file, output_file, limit)
return ExitCode.SUCCESS

start_services.sh (new executable file, +157)

@@ -0,0 +1,157 @@
#!/bin/bash
# MusicBrainz Cleaner - Quick Start Script
# This script automates the startup of MusicBrainz services
set -e
echo "🚀 Starting MusicBrainz services..."
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Function to print colored output
print_status() {
    echo -e "${BLUE}[INFO]${NC} $1"
}
print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}
print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}
print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}
# Check if Docker is running
if ! docker info > /dev/null 2>&1; then
    print_error "Docker is not running. Please start Docker Desktop first."
    exit 1
fi
print_success "Docker is running"
# Check if we're in the right directory
if [ ! -f "docker-compose.yml" ]; then
    print_error "This script must be run from the musicbrainz-cleaner directory"
    exit 1
fi
# Check if musicbrainz-docker directory exists
if [ ! -d "../musicbrainz-docker" ]; then
    print_error "musicbrainz-docker directory not found. It must sit next to musicbrainz-cleaner (../musicbrainz-docker)."
    exit 1
fi
# Navigate to musicbrainz-docker
cd ../musicbrainz-docker
print_status "Checking for port conflicts..."
# Check if port 5000 is available
if lsof -i :5000 > /dev/null 2>&1; then
    print_warning "Port 5000 is in use. Using port 5001 instead."
    PORT=5001
else
    print_success "Port 5000 is available"
    PORT=5000
fi
# Stop any existing containers
print_status "Stopping existing containers..."
docker-compose down > /dev/null 2>&1 || true
# Remove any conflicting containers
print_status "Cleaning up conflicting containers..."
docker rm -f musicbrainz-docker-db-1 > /dev/null 2>&1 || true
# Start services
print_status "Starting MusicBrainz services on port $PORT..."
MUSICBRAINZ_WEB_SERVER_PORT=$PORT docker-compose up -d
print_success "Services started successfully!"
# Wait for database to be ready
print_status "Waiting for database to initialize (this may take 5-10 minutes)..."
print_status "You can monitor progress with: docker-compose logs -f db"
# Check if database is ready
attempts=0
max_attempts=60
while [ $attempts -lt $max_attempts ]; do
    if docker-compose exec -T db pg_isready -U musicbrainz > /dev/null 2>&1; then
        print_success "Database is ready!"
        break
    fi
    attempts=$((attempts + 1))
    print_status "Waiting for database... (attempt $attempts/$max_attempts)"
    sleep 10
done
if [ $attempts -eq $max_attempts ]; then
    print_warning "Database may still be initializing. You can check status with: cd ../musicbrainz-docker && docker-compose logs db"
fi
# Create .env file in musicbrainz-cleaner directory
cd ../musicbrainz-cleaner
print_status "Creating environment configuration..."
cat > .env << EOF
# Database connection (default Docker setup)
DB_HOST=172.18.0.2
DB_PORT=5432
DB_NAME=musicbrainz_db
DB_USER=musicbrainz
DB_PASSWORD=musicbrainz
# MusicBrainz web server
MUSICBRAINZ_WEB_SERVER_PORT=$PORT
EOF
print_success "Environment configuration created"
# Test connection
print_status "Testing connection..."
if docker-compose run --rm musicbrainz-cleaner python3 -c "
import sys
sys.path.insert(0, '/app')
from src.api.database import MusicBrainzDatabase
try:
    db = MusicBrainzDatabase()
    print('✅ Database connection successful')
except Exception as e:
    print(f'❌ Database connection failed: {e}')
    sys.exit(1)
" 2>/dev/null; then
    print_success "Connection test passed!"
else
    print_warning "Connection test failed. Services may still be initializing."
fi
echo ""
print_success "MusicBrainz services are now running!"
echo ""
echo "📊 Service Status:"
echo " - Web Server: http://localhost:$PORT"
echo " - Database: PostgreSQL (internal)"
echo " - Search: Solr (internal)"
echo ""
echo "🧪 Next steps:"
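echo " 1. Run quick test: docker-compose run --rm musicbrainz-cleaner python3 quick_test_20.py"
echo " 2. Run larger test: docker-compose run --rm musicbrainz-cleaner python3 bulk_test_1000.py"
echo " 3. Use cleaner: docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --input your_file.json --output cleaned.json"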
echo ""
echo "📋 Useful commands:"
echo " - View logs: cd ../musicbrainz-docker && docker-compose logs -f"
echo " - Stop services: cd ../musicbrainz-docker && docker-compose down"
echo " - Check status: cd ../musicbrainz-docker && docker-compose ps"
echo ""