Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>

Matt Bruce 2025-08-05 08:37:30 -05:00
parent f053471a76
commit 7d60d7fc47
3 changed files with 167 additions and 8 deletions


@ -463,4 +463,66 @@ fi
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --test-connection` | Test database connection |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --use-api` | Force API mode |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --help` | Show help |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main --version` | Show version |
## Artist Lookup System Commands
The MusicBrainz Data Cleaner includes an Artist Lookup System with its own CLI for managing artist data.
### Artist Lookup CLI Structure
```bash
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli [command] [options]
```
### Available Commands
#### Search for Artists
```bash
# Search for an artist in the lookup table
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Queen"
# Search with custom similarity threshold (0.0 to 1.0)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Destiny's Child" --min-score 0.8
```
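
The `--min-score` option is a similarity threshold between 0.0 and 1.0. The docs do not say which similarity metric the CLI uses, so the sketch below is only an illustration: it assumes the ratio from Python's standard `difflib.SequenceMatcher`, and both `search_artists` and its default threshold of 0.6 are hypothetical, not part of the project.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio in [0.0, 1.0]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_artists(query, names, min_score=0.6):
    """Hypothetical helper: keep names scoring at least min_score, best first."""
    scored = [(name, similarity(query, name)) for name in names]
    matches = [(name, score) for name, score in scored if score >= min_score]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = ["Queen", "Queens of the Stone Age", "Destiny's Child"]
    # A missing apostrophe still clears a 0.8 threshold under this metric.
    for name, score in search_artists("Destinys Child", names, min_score=0.8):
        print(f"{name}: {score:.2f}")
```

Raising the threshold trades recall for precision: at 1.0 only exact (case-insensitive) matches survive, while lower values admit progressively looser spellings.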
#### View Statistics
```bash
# Show lookup table statistics
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli stats
```
#### List All Artists
```bash
# List all artists in the lookup table
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli list
```
#### Add New Artists
```bash
# Add a new artist with variations
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli add \
--canonical-name "New Artist" \
--mbid "12345678-1234-1234-1234-123456789abc" \
--variations "Artist, The Artist, Artist Band" \
--notes "Description of the artist"
```
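
The `add` command takes its variations as one comma-separated string and an MBID, which in MusicBrainz is a UUID. How the CLI validates these values is not documented here; the following is a minimal sketch of the kind of input handling it implies. `parse_variations` and `is_valid_mbid` are hypothetical names, and the placeholder MBIDs mentioned elsewhere in the docs may use a format this check would reject.

```python
import uuid


def parse_variations(raw):
    """Split a comma-separated --variations string into clean, unique names."""
    seen = set()
    result = []
    for part in raw.split(","):
        name = part.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


def is_valid_mbid(value):
    """Accept only the canonical 36-character hyphenated UUID form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


if __name__ == "__main__":
    print(parse_variations("Artist, The Artist, Artist Band"))
    print(is_valid_mbid("12345678-1234-1234-1234-123456789abc"))
```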
### Artist Lookup Command Reference
| Command | Description |
|---------|-------------|
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Artist Name"` | Search for artist with fuzzy matching |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Artist Name" --min-score 0.8` | Search with custom similarity threshold |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli stats` | Show lookup table statistics |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli list` | List all artists in lookup table |
| `docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli add --canonical-name "Name" --mbid "MBID" --variations "var1, var2"` | Add new artist to lookup table |
### Artist Lookup Features
- **2,446+ Artists**: Comprehensive lookup table
- **4,950+ Variations**: Extensive name variations and aliases
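- **Fuzzy Matching**: Approximate name matching with a configurable similarity threshold (`--min-score`, 0.0 to 1.0)
- **Canonical Names**: Consistent artist name replacement
- **Automatic Integration**: Consulted automatically during song processing as a fallback for artists not found in the database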
- **CLI Management**: Full command-line interface for data management

PRD.md

@ -4,9 +4,9 @@
## Project Overview
**Product Name:** MusicBrainz Data Cleaner
**Version:** 3.0.0
**Date:** December 19, 2024
**Status:** Production Ready with Advanced Database Integration
**Version:** 3.1.0
**Date:** August 4, 2025
**Status:** Production Ready with Advanced Artist Lookup System
## 🚀 Quick Start for New Sessions
@ -65,6 +65,9 @@ Users have song data in JSON format with inconsistent artist names, song titles,
- **NEW**: Use fuzzy search for better matching of similar names
- **NEW**: Handle artist aliases and name variations (e.g., "98 Degrees" → "98°")
- **NEW**: Distinguish between band names and collaborations (e.g., "Simon & Garfunkel" vs "Lovato, Demi & Joe Jonas")
- **NEW**: Advanced Artist Lookup System with 2,446+ artists and 4,950+ variations
- **NEW**: Fallback lookup table for artists not found in database
- **NEW**: Canonical name replacement for consistent artist naming
## Target Users
@ -656,4 +659,45 @@ Test files often contain working code snippets that can be adapted:
- **Intelligent artist selection**: Tries multiple artist candidates when first choice doesn't have the recording
- **Recording-aware prioritization**: Artists with the specific recording are prioritized
- **Fallback strategy**: Up to 5 different artist candidates are tried if needed
- **Comprehensive search**: Searches names, aliases, and fuzzy matches
## Artist Lookup System
### Overview
The MusicBrainz Data Cleaner now includes an advanced Artist Lookup System that provides fallback matching for artists not found in the primary database search. This system significantly improves artist matching success rates.
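
The fallback order described above can be sketched as follows. This is an illustration only: `resolve_artist` and its two callables are hypothetical stand-ins for the project's database search and lookup-table search, whose real interfaces are not shown in this document.

```python
from typing import Callable, Optional


def resolve_artist(
    name: str,
    database_lookup: Callable[[str], Optional[str]],
    table_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the primary database first; consult the lookup table only on a miss."""
    found = database_lookup(name)
    if found is not None:
        return found
    return table_lookup(name)


if __name__ == "__main__":
    # Plain dicts stand in for the two data sources.
    database = {"Queen": "Queen"}
    lookup_table = {"98 Degrees": "98°"}
    print(resolve_artist("98 Degrees", database.get, lookup_table.get))
```

Keeping the table strictly secondary means a database hit is never overridden by a hand-maintained entry.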
### Key Features
- **2,446+ Artists**: Comprehensive lookup table with real and placeholder MBIDs
- **4,950+ Variations**: Extensive name variations and aliases
- **Fuzzy Matching**: Intelligent matching with configurable similarity thresholds
- **Canonical Names**: Consistent artist name replacement across datasets
- **Fallback System**: Secondary search when database lookup fails
### Data Structure
```json
{
"artist_variations": {
"Canonical Artist Name": {
"mbid": "real-or-placeholder-mbid",
"variations": [
"Artist Name",
"Artist Name Variation 1",
"Artist Name Variation 2"
],
"notes": "Description or status"
}
}
}
```
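
One way to use this structure for canonical name replacement is to invert it into a variation-to-canonical index. The sketch below assumes only the JSON layout shown above; `build_index`, `canonical_name`, and the sample entry are hypothetical and for illustration, and the project's actual loader may differ.

```python
import json
from typing import Optional

# Illustrative entry only, shaped like the structure documented above.
SAMPLE = """
{
  "artist_variations": {
    "98°": {
      "mbid": "placeholder-mbid",
      "variations": ["98 Degrees", "98 degrees"],
      "notes": "Illustrative entry only"
    }
  }
}
"""


def build_index(data: dict) -> dict:
    """Map every lowercased variation (and canonical name) to its canonical name."""
    index = {}
    for canonical, entry in data["artist_variations"].items():
        index[canonical.lower()] = canonical
        for variation in entry.get("variations", []):
            index[variation.lower()] = canonical
    return index


def canonical_name(name: str, index: dict) -> Optional[str]:
    """Return the canonical name for an exact, case-insensitive variation match."""
    return index.get(name.strip().lower())


if __name__ == "__main__":
    index = build_index(json.loads(SAMPLE))
    print(canonical_name("98 Degrees", index))
```

An exact-match index like this is cheap to query; fuzzy matching would only be needed for names that miss it.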
### Usage
- **Automatic Integration**: Works seamlessly with existing song processing
- **CLI Management**: Full command-line interface for managing lookup data
- **Search Capabilities**: Find artists by name with fuzzy matching
- **Statistics**: Comprehensive reporting on lookup table usage
### Benefits
- **Improved Success Rates**: Higher artist matching percentages
- **Consistent Naming**: Standardized artist names across datasets
- **Easy Management**: Simple tools for adding and updating artist data
- **Scalable**: Can be extended with additional artists and variations


@ -1,6 +1,6 @@
# 🎵 MusicBrainz Data Cleaner v3.0
# 🎵 MusicBrainz Data Cleaner v3.1
A powerful command-line tool that cleans and normalizes your song data using the MusicBrainz database. **Now with interface-based architecture, advanced collaboration detection, artist alias handling, and intelligent fuzzy search for maximum accuracy!**
A powerful command-line tool that cleans and normalizes your song data using the MusicBrainz database. **Now with interface-based architecture, advanced collaboration detection, artist alias handling, intelligent fuzzy search, and a comprehensive Artist Lookup System for maximum accuracy!**
## 🚀 Quick Start for New Sessions
@ -45,7 +45,7 @@ docker-compose run --rm musicbrainz-cleaner python3 [script_name].py
**📋 Troubleshooting**: See `TROUBLESHOOTING.md` for common issues and solutions.
## ✨ What's New in v3.0
## ✨ What's New in v3.1
- **🏗️ Interface-Based Architecture**: Clean dependency injection with common interfaces
- **🏭 Factory Pattern**: Smart data provider creation and configuration
@ -59,6 +59,9 @@ docker-compose run --rm musicbrainz-cleaner python3 [script_name].py
- **🆕 Sort Names**: Handle "Last, First" formats like "Corby, Matt" → "Matt Corby"
- **🆕 Edge Case Handling**: Support for artists with hyphens, exclamation marks, numbers, and special characters
- **🆕 Band Name Protection**: Distinguish between band names (Simon & Garfunkel) and collaborations (Lovato, Demi & Joe Jonas)
- **🆕 Artist Lookup System**: Comprehensive fallback system with 2,446+ artists and 4,950+ variations
- **🆕 Canonical Name Replacement**: Consistent artist naming across datasets
- **🆕 CLI Management Tools**: Full command-line interface for managing artist lookup data
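
As an illustration of the sort-name handling listed above, here is a minimal sketch that flips "Last, First" into "First Last". It is not the project's implementation: `from_sort_name` is a hypothetical helper, and it deliberately leaves any name containing `&` untouched, since telling band names from collaborations needs more context than a string rule can provide. Stage names that contain a single comma would be flipped incorrectly by this simple rule.

```python
def from_sort_name(name: str) -> str:
    """Turn "Last, First" into "First Last"; leave anything else unchanged."""
    # Names with "&" may be bands or collaborations, so they are not touched here.
    if "&" in name or name.count(",") != 1:
        return name
    last, first = (part.strip() for part in name.split(","))
    if not last or not first:
        return name
    return f"{first} {last}"


if __name__ == "__main__":
    for raw in ["Corby, Matt", "Simon & Garfunkel", "Queen"]:
        print(f"{raw!r} -> {from_sort_name(raw)!r}")
```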
## ✨ What It Does
@ -131,6 +134,56 @@ docker-compose run --rm musicbrainz-cleaner python3 [script_name].py
### For detailed setup instructions, see [SETUP.md](SETUP.md)
## 🎯 Artist Lookup System
The MusicBrainz Data Cleaner includes an advanced Artist Lookup System that provides fallback matching for artists not found in the primary database search.
### Features
- **2,446+ Artists**: Comprehensive lookup table with real and placeholder MBIDs
- **4,950+ Variations**: Extensive name variations and aliases
- **Fuzzy Matching**: Intelligent matching with configurable similarity thresholds
- **Canonical Names**: Consistent artist name replacement across datasets
- **Automatic Integration**: Works seamlessly with existing song processing
### Usage Examples
#### Search for Artists
```bash
# Search for an artist in the lookup table
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Queen"
# Search with custom similarity threshold
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli search "Destiny's Child" --min-score 0.8
```
#### View Statistics
```bash
# Show lookup table statistics
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli stats
```
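
The output format of `stats` is not shown here. As a rough sketch of the kind of figures it could report, the snippet below counts artists and variations, assuming the `artist_variations` JSON layout documented in PRD.md; `lookup_stats` is a hypothetical helper, not the CLI's code.

```python
def lookup_stats(data: dict) -> dict:
    """Count canonical artists and their total variations."""
    entries = data.get("artist_variations", {})
    return {
        "artists": len(entries),
        "variations": sum(len(e.get("variations", [])) for e in entries.values()),
    }


if __name__ == "__main__":
    sample = {
        "artist_variations": {
            "98°": {"mbid": "placeholder-mbid", "variations": ["98 Degrees"]},
            "Queen": {"mbid": "placeholder-mbid", "variations": []},
        }
    }
    print(lookup_stats(sample))
```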
#### List All Artists
```bash
# List all artists in the lookup table
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli list
```
#### Add New Artists
```bash
# Add a new artist with variations
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.artist_lookup_cli add \
--canonical-name "New Artist" \
--mbid "12345678-1234-1234-1234-123456789abc" \
--variations "Artist, The Artist, Artist Band" \
--notes "Description of the artist"
```
### Benefits
- **Improved Success Rates**: Higher artist matching percentages
- **Consistent Naming**: Standardized artist names across datasets
- **Easy Management**: Simple tools for adding and updating artist data
- **Scalable**: Can be extended with additional artists and variations
## 🔄 After System Reboot
After restarting your Mac, you'll need to restart the MusicBrainz services: