
🎵 MusicBrainz Data Cleaner v3.0

A powerful command-line tool that cleans and normalizes your song data using the MusicBrainz database. Now with interface-based architecture, advanced collaboration detection, artist alias handling, and intelligent fuzzy search for maximum accuracy!

🚀 Quick Start for New Sessions

If you're starting fresh or after a reboot, follow this exact sequence:

1. Start MusicBrainz Services

# Quick restart (recommended)
./restart_services.sh

# Or full restart (if you have issues)
./start_services.sh

2. Wait for Services to Initialize

  • Database: 5-10 minutes to fully load
  • Web server: 2-3 minutes to start responding
  • Check status: cd ../musicbrainz-docker && docker-compose ps

3. Verify Services Are Ready

# Test web server
curl -s http://localhost:5001 | head -5

# Test database from the musicbrainz-docker directory (should show 2.6M+ artists)
(cd ../musicbrainz-docker && docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;")

# Test cleaner connection (from the musicbrainz-cleaner directory)
docker-compose run --rm musicbrainz-cleaner python3 -c "from src.api.database import MusicBrainzDatabase; db = MusicBrainzDatabase(); print('Connection result:', db.connect())"

4. Run Tests

# Test 100 random songs
docker-compose run --rm musicbrainz-cleaner python3 test_100_random.py

# Or other test scripts
docker-compose run --rm musicbrainz-cleaner python3 [script_name].py

⚠️ Important: Always run scripts via Docker - the cleaner cannot connect to the database directly from outside the container.

📋 Troubleshooting: See TROUBLESHOOTING.md for common issues and solutions.

What's New in v3.0

  • 🏗️ Interface-Based Architecture: Clean dependency injection with common interfaces
  • 🏭 Factory Pattern: Smart data provider creation and configuration
  • 🚀 Direct Database Access: Connect directly to PostgreSQL, roughly 10x faster than the HTTP API
  • 🎯 Advanced Fuzzy Search: Intelligent matching for similar artist names and song titles
  • 🔄 Automatic Fallback: Falls back to API mode if database access fails
  • No Rate Limiting: Database queries don't have API rate limits
  • 📊 Similarity Scoring: See how well matches are scored
  • 🆕 Collaboration Detection: Intelligently handle complex collaborations like "Pitbull ft. Ne-Yo, Afrojack & Nayer"
  • 🆕 Artist Aliases: Handle name variations like "98 Degrees" → "98°" and "S Club 7" → "S Club"
  • 🆕 Sort Names: Handle "Last, First" formats like "Corby, Matt" → "Matt Corby"
  • 🆕 Edge Case Handling: Support for artists with hyphens, exclamation marks, numbers, and special characters
  • 🆕 Band Name Protection: Distinguish between band names (Simon & Garfunkel) and collaborations (Lavato, Demi & Joe Jonas)

What It Does

Before:

{
  "artist": "ACDC",
  "title": "Shot In The Dark",
  "favorite": true
}

After:

{
  "artist": "AC/DC",
  "title": "Shot in the Dark",
  "favorite": true,
  "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
  "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
}

🚀 Quick Start

Option 1: Automated Setup

  1. Start MusicBrainz services:

    ./start_services.sh
    

    This script will:

    • Check for Docker and port conflicts
    • Start all MusicBrainz services
    • Wait for database initialization
    • Create environment configuration
    • Test the connection
  2. Run the cleaner:

    docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
    # Output: data/songs-success.json and data/songs-failure.json
    

Option 2: Manual Setup

  1. Start MusicBrainz services manually:

    cd ../musicbrainz-docker
    MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d
    

    Wait 5-10 minutes for database initialization.

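  2. Create environment configuration (DB_HOST uses the Docker service name db; a hardcoded container IP such as 172.18.0.2 can change when containers restart):

    # Create .env file in musicbrainz-cleaner directory
    cat > .env << EOF
    DB_HOST=db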
    DB_PORT=5432
    DB_NAME=musicbrainz_db
    DB_USER=musicbrainz
    DB_PASSWORD=musicbrainz
    MUSICBRAINZ_WEB_SERVER_PORT=5001
    EOF
    
  3. Run the cleaner:

    docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json
    # Output: data/songs-success.json and data/songs-failure.json
    

For detailed setup instructions, see SETUP.md

🔄 After System Reboot

After restarting your Mac, you'll need to restart the MusicBrainz services:

# If Docker Desktop is already running
./restart_services.sh

# Or manually
cd ../musicbrainz-docker && MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d

Full Restart (If you have issues)

# Complete setup including Docker checks
./start_services.sh

Auto-start Setup (Optional)

  1. Enable Docker Desktop auto-start:

    • Open Docker Desktop
    • Go to Settings → General
    • Check "Start Docker Desktop when you log in"
  2. Then just run: ./restart_services.sh after each reboot

Note: Your data is preserved in Docker volumes, so you don't need to reconfigure anything after a reboot.

🚨 Common Startup Issues & Fixes

Issue 1: Database Connection Refused

Problem: Cleaner can't connect to database with error "Connection refused"
Root Cause: Database container not fully initialized or wrong host configuration
Fix:

# Wait for database to be ready (check logs)
cd ../musicbrainz-docker && docker-compose logs db | tail -10

# Verify database is accepting connections
docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"

Issue 2: Wrong Database Host Configuration

Problem: Cleaner tries to connect to 172.18.0.2 but can't reach it
Root Cause: Hardcoded IP address in database connection
Fix: Use Docker service name db instead of IP address

# In src/api/database.py, change:
host='172.18.0.2'  # ❌ Wrong
host='db'          # ✅ Correct
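Rather than editing a hardcoded host, the connection settings can be read from the .env variables shown in the manual setup (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD). This is a minimal sketch, assuming those variable names and a `db` default; `get_db_config` is a hypothetical helper, not a function from src/api/database.py.

```python
import os


def get_db_config(env=None):
    """Build psycopg2 connection kwargs from environment variables.

    Hypothetical helper: variable names follow the .env example, and the
    defaults assume the cleaner shares a Docker network with the `db` service.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "db"),  # service name, not a container IP
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "musicbrainz_db"),
        "user": env.get("DB_USER", "musicbrainz"),
        "password": env.get("DB_PASSWORD", "musicbrainz"),
    }


# Usage inside the cleaner container:
# import psycopg2
# conn = psycopg2.connect(**get_db_config())
```

Keeping the host in configuration means a container IP change never requires a code edit.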

Issue 3: Test Script Logic Error

Problem: Test shows 0% success rate despite finding artists
Root Cause: The test script checks 'mbid' in result, but result is a tuple (song_dict, success_boolean)
Fix: Unpack the song dictionary from the tuple

# Wrong:
artist_found = 'mbid' in result

# Correct:
cleaned_song, success = result
artist_found = 'mbid' in cleaned_song

Issue 4: Services Not Fully Initialized

Problem: API returns empty results even though database has data
Root Cause: MusicBrainz web server still starting up
Fix: Wait for services to be fully ready

# Check if web server is responding
curl -s http://localhost:5001 | head -5

# Wait for database to be ready
docker-compose logs db | grep "database system is ready"
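The "wait until ready" step can also be scripted. This sketch polls the web server using only the standard library; `wait_for_http` is a hypothetical helper, and the 180-second default is an arbitrary choice rather than a value from this project. Any HTTP response below 500 counts as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=180.0, interval=5.0):
    """Poll `url` until the server answers, or give up after `timeout` seconds.

    Returns True once a response with status < 500 arrives, else False.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # 2xx/3xx: the server is responding.
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # 4xx still means the server is running.
        except OSError:
            pass  # Connection refused/reset or timeout: not ready yet.
        time.sleep(interval)
    return False


# Usage: wait_for_http("http://localhost:5001")
```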

Issue 5: Port Conflicts

Problem: Port 5000 already in use
Root Cause: Another service using the port
Fix: Use alternative port

MUSICBRAINZ_WEB_SERVER_PORT=5001 docker-compose up -d

Issue 6: Container Name Conflicts

Problem: "Container name already in use" error
Root Cause: Previous containers not properly cleaned up
Fix: Remove conflicting containers

docker-compose down
docker rm -f <container_name>

🔧 Startup Checklist

Before running tests, verify (run steps 2, 3 and 5 from ../musicbrainz-docker):

  1. Docker Desktop is running
  2. All containers are up: docker-compose ps
  3. Database is ready: docker-compose logs db | grep "ready"
  4. Web server responds: curl -s http://localhost:5001
  5. Database has data: docker-compose exec db psql -U musicbrainz -d musicbrainz_db -c "SELECT COUNT(*) FROM artist;"
  6. Cleaner can connect: docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection (from the musicbrainz-cleaner directory)

📋 Requirements

  • Python 3.6+
  • MusicBrainz Server running on localhost:5001 (port set by MUSICBRAINZ_WEB_SERVER_PORT)
  • PostgreSQL Database accessible on localhost:5432
  • Dependencies: requests, psycopg2-binary, fuzzywuzzy, python-Levenshtein

🔧 Server Configuration

Database Access

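  • Host: db (Docker service name) from inside Docker, or localhost from the host machine; a container IP such as 172.18.0.2 also works but can change after a restart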
  • Port: 5432 (PostgreSQL default)
  • Database: musicbrainz_db (actual database name)
  • User: musicbrainz
  • Password: musicbrainz (default, should be changed in production)

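HTTP API (Fallback)

  • Used automatically when database access fails, or explicitly with --use-api
  • Rate limited with a 0.1s delay between requests; no fuzzy search
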
Troubleshooting

  • Database Connection Failed: Check PostgreSQL is running and credentials are correct
  • API Connection Failed: Check the MusicBrainz web server is running on port 5001 (curl -s http://localhost:5001)
  • Slow Performance: Ensure database indexes are built
  • No Results: Verify data has been imported to the database
  • NEW: Docker Networking: Use the service name db (or the container IP, e.g. 172.18.0.2) for Docker-to-Docker connections, not localhost
  • NEW: Database Name: Make sure you use musicbrainz_db, not musicbrainz

🧪 Testing

Test File Organization

  • REQUIRED: All test files must be placed in src/tests/ directory
  • PROHIBITED: Test files should not be placed in the root directory
  • Naming Convention: Test files should follow test_*.py or debug_*.py patterns
  • Purpose: Keeps root directory clean and organizes test code properly

Running Tests

# Run all tests
python3 src/tests/run_tests.py

# Run specific test categories
python3 src/tests/run_tests.py --unit          # Unit tests only
python3 src/tests/run_tests.py --integration   # Integration tests only

# Run specific test module
python3 src/tests/run_tests.py test_data_loader
python3 src/tests/run_tests.py test_cli

# List all available tests
python3 src/tests/run_tests.py --list

Test Categories

  • Unit Tests: Test individual components in isolation
  • Integration Tests: Test interactions between components and database
  • Debug Tests: Debug scripts and troubleshooting tools

📁 Project Structure

musicbrainz-cleaner/
├── src/
│   ├── api/                 # Database and API access
│   │   ├── database.py      # Direct PostgreSQL access (implements MusicBrainzDataProvider)
│   │   └── api_client.py    # HTTP API client (implements MusicBrainzDataProvider)
│   ├── cli/                 # Command-line interface
│   │   └── main.py          # Main CLI implementation (uses factory pattern)
│   ├── config/              # Configuration and constants
│   ├── core/                # Core functionality
│   │   ├── interfaces.py    # Common interfaces and protocols
│   │   ├── factory.py       # Data provider factory
│   │   └── song_processor.py # Centralized song processing logic
│   ├── tests/               # Test files (REQUIRED location)
│   └── utils/               # Utility functions
│       ├── artist_title_processing.py # Shared artist/title processing
│       └── data_loader.py   # Data loading utilities
├── data/                    # Data files and output
│   ├── known_artists.json   # Name variations (ACDC → AC/DC)
│   ├── known_recordings.json # Known recording MBIDs
│   └── songs.json           # Source songs file
└── docker-compose.yml       # Docker configuration

Data Files

The tool uses external JSON files for name variations:

  • data/known_artists.json: Contains name variations (ACDC → AC/DC, ft. → feat.)
  • data/known_recordings.json: Contains known recording MBIDs for common songs

These files can be easily updated without touching the code, making it simple to add new name variations.
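As an illustration of how such a lookup file can be applied, here is a minimal sketch. It assumes a flat JSON object such as {"ACDC": "AC/DC"}; the real data/known_artists.json also holds band names and may be structured differently, and `load_name_variations` / `apply_variation` are hypothetical names.

```python
import json
from pathlib import Path


def load_name_variations(path):
    """Load a variation -> canonical-name mapping from a JSON file.

    Assumes a flat object like {"ACDC": "AC/DC"} (hypothetical layout).
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Key on the casefolded variation so lookups ignore case.
    return {key.casefold(): value for key, value in data.items()}


def apply_variation(name, variations):
    """Return the canonical spelling if known, else the name unchanged."""
    return variations.get(name.strip().casefold(), name)
```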

🎯 Features

Artist Name Fixes

  • ACDC → AC/DC
  • Bruno Mars ft. Cardi B → Bruno Mars feat. Cardi B
  • featuring → feat.
  • 98 Degrees → 98° (artist aliases)
  • S Club 7 → S Club (numerical suffixes)
  • Corby, Matt → Matt Corby (sort names)
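The sort-name and numerical-suffix fixes can be viewed as extra search candidates tried after the literal name. A hypothetical sketch of generating them follows; the project's real matching logic is not reproduced here. Aliases such as "98 Degrees" → "98°" can't be derived from string rules and come from the database's alias data (artist_alias.name).

```python
import re

# A space followed by digits at the very end ("S Club 7"), not "Blink-182".
TRAILING_NUMBER = re.compile(r"\s+\d+$")


def candidate_names(name):
    """Return search candidates for an artist name, most literal first.

    Hypothetical sketch: flip a "Last, First" sort name and drop a
    trailing number; each candidate still needs a database match.
    """
    name = name.strip()
    candidates = [name]
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 2 and all(parts):
        candidates.append(f"{parts[1]} {parts[0]}")  # "Corby, Matt" -> "Matt Corby"
    stripped = TRAILING_NUMBER.sub("", name)
    if stripped and stripped != name:
        candidates.append(stripped)  # "S Club 7" -> "S Club"
    return candidates
```

Trying the literal name first keeps names like "3 Doors Down" or "Maroon 5" from being rewritten unless the original search fails.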

Collaboration Detection

  • Primary Patterns: "ft.", "feat.", "featuring" (always collaborations)
  • Secondary Patterns: "&", "and", "," (intelligent detection)
  • Band Name Protection: 200+ known band names from data/known_artists.json
  • Complex Collaborations: "Pitbull ft. Ne-Yo, Afrojack & Nayer"
  • Case Insensitive: "Featuring" → "featuring"
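The primary/secondary rules above can be sketched with two regular expressions. This is a hypothetical illustration, not the project's song_processor.py; `known_bands` stands in for the band names in data/known_artists.json.

```python
import re

# Primary patterns always mark a collaboration (case-insensitive).
PRIMARY = re.compile(r"\s+(?:ft\.|feat\.|featuring)\s+", re.IGNORECASE)
# Secondary separators: only used when the name is not a protected band.
SECONDARY = re.compile(r"\s*,\s*|\s+&\s+|\s+and\s+", re.IGNORECASE)


def normalize_feat(artist):
    """Rewrite "ft." / "featuring" variants as "feat."."""
    return PRIMARY.sub(" feat. ", artist)


def split_collaboration(artist, known_bands):
    """Split an artist credit into (main_artist, [other_artists]).

    Names in `known_bands` are never split on "&", "and" or ",".
    """
    protected = {band.casefold() for band in known_bands}
    artist = artist.strip()
    if artist.casefold() in protected:
        return artist, []
    parts = PRIMARY.split(artist, maxsplit=1)
    if len(parts) == 2:
        main, rest = parts
        return main.strip(), [p for p in SECONDARY.split(rest) if p]
    pieces = [p for p in SECONDARY.split(artist) if p]
    return pieces[0], pieces[1:]
```

A pure string split like this would break a sort-name credit such as "Lavato, Demi & Joe Jonas" into three pieces, which is why database confirmation still matters.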

Song Title Fixes

  • Shot In The Dark → Shot in the Dark
  • Removes (Karaoke Version), (Instrumental) suffixes
  • Normalizes capitalization and formatting
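Suffix removal can be done before the title is searched. A minimal sketch, assuming only the two suffixes named above (plus an optional "Version"); `strip_title_suffixes` is a hypothetical name. Final casing such as "Finesse Remix" → "Finesse (remix)" appears to come from the matched MusicBrainz recording rather than from a string rule, so this sketch doesn't attempt it.

```python
import re

# Trailing "(Karaoke Version)" / "[Instrumental]"-style markers (assumed list).
SUFFIXES = re.compile(
    r"\s*[\(\[](?:karaoke(?: version)?|instrumental(?: version)?)[\)\]]\s*$",
    re.IGNORECASE,
)


def strip_title_suffixes(title):
    """Remove trailing karaoke/instrumental markers, repeating until none remain."""
    previous = None
    while previous != title:
        previous = title
        title = SUFFIXES.sub("", title)
    return title.strip()
```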

Added Data

  • mbid: Official MusicBrainz Artist ID
  • recording_mbid: Official MusicBrainz Recording ID

Preserves Your Data

  • Keeps all your existing fields (guid, path, disabled, favorite, etc.)
  • Only adds new fields, never removes existing ones

Fuzzy Search

  • Intelligent Matching: Finds similar names even with typos or variations
  • Similarity Scoring: Shows how well each match scores (0.0 to 1.0)
  • Configurable Thresholds: Adjust matching sensitivity
  • Multiple Algorithms: Uses ratio, partial ratio, and token sort matching
  • Enhanced Search Fields: artist.name, artist_alias.name, artist.sort_name
  • Dash Handling: Regular dash (-) vs Unicode hyphen (‐, U+2010)
  • Substring Protection: Avoids false matches like "Sleazy-E" vs "Eazy-E"
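The tool lists fuzzywuzzy among its dependencies, which provides ratio, partial-ratio and token-sort scorers. To keep this example dependency-free, the sketch swaps in the standard library's difflib.SequenceMatcher, a different algorithm, so scores won't match fuzzywuzzy's exactly. The dash table and 0.9 threshold are illustrative assumptions. Leaving out a substring (partial) score is one simple way to keep "Sleazy-E" from matching "Eazy-E".

```python
import difflib

# Dash-like characters mapped to ASCII hyphen-minus before comparing.
DASHES = str.maketrans({
    "\u2010": "-",  # hyphen (appears in some MusicBrainz names)
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
})


def normalize(name):
    """Casefold, unify dashes and collapse whitespace."""
    return " ".join(name.translate(DASHES).casefold().split())


def similarity(a, b):
    """Score 0.0-1.0: the better of a plain ratio and a token-sorted ratio."""
    a, b = normalize(a), normalize(b)
    plain = difflib.SequenceMatcher(None, a, b).ratio()
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    token_sort = difflib.SequenceMatcher(None, sorted_a, sorted_b).ratio()
    return max(plain, token_sort)


def is_match(query, candidate, threshold=0.9):
    return similarity(query, candidate) >= threshold
```

Raising the threshold gives fewer but more accurate matches; lowering it catches more typos at the cost of false positives.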

🆕 Edge Case Support

  • Hyphenated Artists: "Blink-182", "Ne-Yo", "G-Eazy"
  • Exclamation Marks: "P!nk", "Panic! At The Disco", "3OH!3"
  • Numbers: "98 Degrees", "S Club 7", "3 Doors Down"
  • Special Characters: "a-ha", "The B-52s", "Salt-N-Pepa"

🆕 Simplified Processing

  • Default Behavior: Process all songs by default (no special flags needed)
  • Separate Output Files: Successful and failed songs saved to different files
  • Progress Tracking: Real-time progress with song counter and status
  • Smart Defaults: Sensible defaults for all file paths and options
  • Detailed Reporting: Comprehensive statistics and processing report
  • Batch Processing: Efficient handling of large song collections

📖 Usage Examples

Basic Usage (Default)

# Process all songs with default settings (data/songs.json)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
# Output: data/songs-success.json and data/songs-failure.json

Custom Source File

# Process specific file
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/my_songs.json
# Output: data/my_songs-success.json and data/my_songs-failure.json
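The default output names follow a simple pattern: the source file's stem plus "-success" or "-failure", in the same directory. A sketch of that convention using pathlib; `output_paths` is a hypothetical helper, not the CLI's own code.

```python
from pathlib import Path


def output_paths(source):
    """Derive default success/failure paths next to the source file.

    data/my_songs.json -> data/my_songs-success.json, data/my_songs-failure.json
    """
    src = Path(source)
    success = src.with_name(f"{src.stem}-success{src.suffix}")
    failure = src.with_name(f"{src.stem}-failure{src.suffix}")
    return success, failure
```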

Custom Output Files

# Specify custom output files
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --source data/songs.json --output-success cleaned.json --output-failure failed.json

Limit Processing

# Process only first 1000 songs
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000

Force API Mode

# Use HTTP API instead of database (slower but works without PostgreSQL)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api

Test Connections

# Test database connection
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection

# Test with API mode
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection --use-api

Help

# Show usage information
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help

📁 Data Files

Input Format

Your JSON file should contain an array of song objects:

[
  {
    "artist": "ACDC",
    "title": "Shot In The Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4"
  },
  {
    "artist": "Bruno Mars ft. Cardi B",
    "title": "Finesse Remix",
    "disabled": false,
    "favorite": false,
    "guid": "946a1077-ab9e-300c-3a72-b1e141e9706f",
    "path": "z://MP4\\Bruno Mars ft. Cardi B - Finesse Remix (Karaoke Version).mp4"
  }
]

📤 Output Format

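The tool creates the following output files (each run also saves a JSON copy of the processing report, processing_report_YYYYMMDD_HHMMSS.json):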

1. Successful Songs (source-success.json)

Array of successfully processed songs with MBIDs added:

[
  {
    "artist": "AC/DC",
    "title": "Shot in the Dark",
    "disabled": false,
    "favorite": true,
    "guid": "8946008c-7acc-d187-60e6-5286e55ad502",
    "path": "z://MP4\\ACDC - Shot In The Dark (Karaoke Version).mp4",
    "mbid": "66c662b6-6e2f-4930-8610-912e24c63ed1",
    "recording_mbid": "cf8b5cd0-d97c-413d-882f-fc422a2e57db"
  },
  {
    "artist": "Bruno Mars feat. Cardi B",
    "title": "Finesse (remix)",
    "disabled": false,
    "favorite": false,
    "guid": "946a1077-ab9e-300c-3a72-b1e141e9706f",
    "path": "z://MP4\\Bruno Mars ft. Cardi B - Finesse Remix (Karaoke Version).mp4",
    "mbid": "afb680f2-b6eb-4cd7-a70b-a63b25c763d5",
    "recording_mbid": "8ed14014-547a-4128-ab81-c2dca7ae198e"
  }
]

2. Failed Songs (source-failure.json)

Array of songs that couldn't be processed (same format as source):

[
  {
    "artist": "Unknown Artist",
    "title": "Unknown Song",
    "disabled": false,
    "favorite": false,
    "guid": "12345678-1234-1234-1234-123456789012",
    "path": "z://MP4\\Unknown Artist - Unknown Song.mp4"
  }
]

3. Processing Report (processing_report_YYYYMMDD_HHMMSS.txt)

Human-readable text report with statistics and failed song list:

MusicBrainz Data Cleaner - Processing Report
==================================================

Source File: data/songs.json
Processing Date: 2024-12-19 14:30:22
Processing Time: 15263.3 seconds

SUMMARY
--------------------
Total Songs Processed: 49,170
Successful Songs: 40,692
Failed Songs: 8,478
Success Rate: 82.8%

DETAILED STATISTICS
--------------------
Artists Found: 44,526/49,170 (90.6%)
Recordings Found: 40,998/49,170 (83.4%)
Processing Speed: 3.2 songs/second

OUTPUT FILES
--------------------
Successful Songs: data/songs-success.json
Failed Songs: data/songs-failure.json
Report File: data/processing_report_20241219_143022.txt

FAILED SONGS (First 50)
--------------------
  1. Unknown Artist - Unknown Song
  2. Invalid Artist - Invalid Title
  3. Test Artist - Test Song
...
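The report's figures are simple ratios; with the sample numbers, 40,692 / 49,170 ≈ 82.8% success and 49,170 songs / 15,263.3 s ≈ 3.2 songs/second. A sketch of computing them from the (cleaned_song, success) tuples described in Issue 3; `summarize` and `report_filename` are hypothetical helpers, not the project's reporting code.

```python
from datetime import datetime


def summarize(results, elapsed_seconds):
    """Compute report figures from a list of (cleaned_song, success) tuples."""
    total = len(results)
    successes = sum(1 for _, ok in results if ok)
    artists = sum(1 for song, _ in results if "mbid" in song)
    recordings = sum(1 for song, _ in results if "recording_mbid" in song)

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "total": total,
        "successful": successes,
        "failed": total - successes,
        "success_rate": pct(successes),
        "artists_found": pct(artists),
        "recordings_found": pct(recordings),
        "songs_per_second": round(total / elapsed_seconds, 1) if elapsed_seconds else 0.0,
    }


def report_filename(when):
    """Build processing_report_YYYYMMDD_HHMMSS.txt for a given datetime."""
    return when.strftime("processing_report_%Y%m%d_%H%M%S.txt")
```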

🎬 Example Run

Basic Processing

$ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main

🚀 Starting song processing...
📊 Total songs to process: 49,170
Using database connection
==================================================

[1 of 49,170] ✅ PASS: ACDC - Shot In The Dark
[2 of 49,170] ❌ FAIL: Unknown Artist - Unknown Song
[3 of 49,170] ✅ PASS: Bruno Mars feat. Cardi B - Finesse (remix)
[4 of 49,170] ✅ PASS: Taylor Swift - Love Story
...

  📈 Progress: 100/49,170 (0.2%) - Success: 85.0% - Rate: 3.2 songs/sec
  📈 Progress: 200/49,170 (0.4%) - Success: 87.5% - Rate: 3.1 songs/sec
  ...

==================================================
🎉 Processing completed!
📊 Final Results:
  ⏱️  Total processing time: 15263.3 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 44,526/49,170 (90.6%)
  ✅ Recordings found: 40,998/49,170 (83.4%)
  ❌ Failed songs: 8,478 (17.2%)
📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json

Limited Processing

$ docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --limit 1000

⚠️  Limiting processing to first 1000 songs
🚀 Starting song processing...
📊 Total songs to process: 1,000
Using database connection
==================================================

[1 of 1,000] ✅ PASS: ACDC - Shot In The Dark
[2 of 1,000] ❌ FAIL: Unknown Artist - Unknown Song
...

==================================================
🎉 Processing completed!
📊 Final Results:
  ⏱️  Total processing time: 312.5 seconds
  🚀 Average speed: 3.2 songs/second
  ✅ Artists found: 856/1,000 (85.6%)
  ✅ Recordings found: 789/1,000 (78.9%)
  ❌ Failed songs: 211 (21.1%)
📄 Files saved:
  ✅ Successful songs: data/songs-success.json
  ❌ Failed songs: data/songs-failure.json
  📋 Text report: data/processing_report_20241219_143022.txt
  📊 JSON report: data/processing_report_20241219_143022.json

🔧 Troubleshooting

"Could not find artist"

  • The artist might not be in the MusicBrainz database
  • Try checking the spelling or using a different variation
  • The search index might still be building (wait a few minutes)
  • Check fuzzy search similarity score - lower threshold if needed
  • NEW: Check for artist aliases (e.g., "98 Degrees" → "98°")
  • NEW: Check for sort names (e.g., "Corby, Matt" → "Matt Corby")

"Could not find recording"

  • The song might not be in the database
  • The title might not match exactly
  • Try a simpler title (remove extra words)
  • Check fuzzy search similarity score - lower threshold if needed
  • NEW: For collaborations, check if it's stored under the main artist

Connection errors

  • Database: Make sure PostgreSQL is running and accessible
  • API: Make sure your MusicBrainz server is running on http://localhost:5001
  • Check that Docker containers are up and running
  • Verify the server is accessible in your browser
  • NEW: For Docker-to-Docker connections, use the service name db (or the container IP, e.g. 172.18.0.2) instead of localhost

JSON errors

  • Make sure your input file is valid JSON
  • Check that it contains an array of objects
  • Verify all required fields are present
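These three checks can be automated before a long run. A minimal sketch; treating "artist" and "title" as the required fields is an assumption based on the input examples, and `validate_songs` is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("artist", "title")  # assumption based on the examples


def validate_songs(path):
    """Return a list of problems in an input songs file (empty list if OK)."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(data, list):
        return ["top-level value must be an array of song objects"]
    problems = []
    for index, song in enumerate(data):
        if not isinstance(song, dict):
            problems.append(f"item {index}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = song.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{field}'")
    return problems
```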

Performance issues

  • Use database mode instead of API mode for better performance
  • Ensure database indexes are built for faster queries
  • Check fuzzy search thresholds - higher thresholds mean fewer but more accurate matches

Collaboration detection issues

  • NEW: Check if it's a band name vs collaboration (e.g., "Simon & Garfunkel" vs "Lavato, Demi & Joe Jonas")
  • NEW: Verify the collaboration pattern is supported (ft., feat., featuring, &, and, ,)
  • NEW: Check case sensitivity - patterns are case-insensitive

Using Tests for Troubleshooting

  • FIRST STEP: Check src/tests/ directory for existing test files that might help
  • DEBUG SCRIPTS: Run python3 src/tests/debug_artist_search.py for artist search issues
  • COLLABORATION ISSUES: Check src/tests/test_failed_collaborations.py for collaboration examples
  • DATABASE ISSUES: Look at src/tests/test_simple_query.py for database connection patterns
  • WORKING EXAMPLES: Test files often contain working code that can be adapted for your issue

🎯 Use Cases

  • Karaoke Systems: Clean up song metadata for better search and organization
  • Music Libraries: Standardize artist names and add official IDs
  • Music Apps: Ensure consistent data across your application
  • Data Migration: Clean up legacy music data when moving to new systems
  • Fuzzy Matching: Handle typos and variations in artist/song names
  • NEW: Collaboration Handling: Process complex artist collaborations
  • NEW: Edge Cases: Handle artists with special characters and unusual names

📚 What are MBIDs?

MBID stands for MusicBrainz Identifier. These are unique, permanent IDs assigned to artists, recordings, and other music entities in the MusicBrainz database.

Benefits:

  • Permanent: Never change, even if names change
  • Universal: Used across many music applications
  • Reliable: Official identifiers from the MusicBrainz database
  • Linked Data: Connect to other music databases and services
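MBIDs are UUIDs written in the standard 36-character hyphenated form, so a quick sanity check on cleaned output is easy with Python's uuid module. `is_mbid` is a hypothetical helper; it checks only the format, not whether the ID exists in MusicBrainz.

```python
import uuid


def is_mbid(value):
    """True if `value` is a UUID in canonical lowercase-comparable 36-char form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False
```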

🆕 Performance Comparison

| Method | Speed | Rate Limiting | Fuzzy Search | Setup Complexity |
|---|---|---|---|---|
| Database | 10x faster | None | Yes | 🔧 Medium |
| API | 🐌 Slower | ⏱️ Yes (0.1s delay) | No | Easy |
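The API mode's 0.1s delay amounts to a fixed minimum gap between requests. A sketch of that idea; `MinIntervalLimiter` is a hypothetical name, and this is not the code in src/api/api_client.py.

```python
import time


class MinIntervalLimiter:
    """Block so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval=0.1):
        self.interval = interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage: call limiter.wait() before each HTTP request.
```

Database queries skip this wait entirely, which is part of why the database mode is faster.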

🆕 Collaboration Detection Examples

| Input | Type | Detection | Output |
|---|---|---|---|
| Bruno Mars ft. Cardi B | Collaboration | Primary pattern | Bruno Mars feat. Cardi B |
| Pitbull ft. Ne-Yo, Afrojack & Nayer | Complex Collaboration | Multiple patterns | Pitbull feat. Ne-Yo, Afrojack & Nayer |
| Simon & Garfunkel | Band Name | Protected | Simon & Garfunkel |
| Lavato, Demi & Joe Jonas | Collaboration | Comma detection | Lavato, Demi & Joe Jonas |
| Hall & Oates | Band Name | Protected | Hall & Oates |

🆕 Edge Case Examples

| Input | Type | Handling | Output |
|---|---|---|---|
| ACDC | Name Variation | Alias lookup | AC/DC |
| 98 Degrees | Artist Alias | Alias search | 98° |
| S Club 7 | Numerical Suffix | Suffix removal | S Club |
| Corby, Matt | Sort Name | Sort name search | Matt Corby |
| Blink-182 | Dash Variation | Unicode dash handling | blink‐182 (U+2010 hyphen) |
| P!nk | Special Characters | Direct search | P!nk |
| 3OH!3 | Numbers + Special | Direct search | 3OH!3 |

🤝 Contributing

Found a bug or have a feature request?

  1. Check the existing issues
  2. Create a new issue with details
  3. Include sample data if possible

📄 License

This tool is provided as-is for educational and personal use.

📝 Lessons Learned

Database Integration

  • Direct PostgreSQL access is 10x faster than API calls
  • Docker-to-Docker connections need the service name (db) or container IP, not localhost
  • Database name matters: musicbrainz_db not musicbrainz
  • Static caches cause problems: Wrong MBIDs override correct database lookups

Collaboration Handling

  • Primary patterns (ft., feat.) are always collaborations
  • Secondary patterns (&, and) require intelligence to distinguish from band names
  • Comma detection helps identify collaborations
  • Artist credit lookup is essential for preserving all collaborators

Edge Cases

  • Dash variations (regular vs Unicode) cause exact match failures
  • Artist aliases are common and important (98 Degrees → 98°)
  • Sort names handle "Last, First" formats
  • Numerical suffixes in names need special handling (S Club 7 → S Club)

Performance Optimization

  • Remove static caches for better accuracy
  • Database-first approach ensures live data
  • Fuzzy search thresholds need tuning for different datasets
  • Connection pooling would improve performance for large datasets

CLI Design

  • Simplified interface with smart defaults reduces complexity
  • Array format consistency makes output files easier to work with
  • Human-readable reports improve user experience
  • Test file organization keeps project structure clean

Happy cleaning! 🎵