Signed-off-by: Matt Bruce <mbrucedogs@gmail.com>
This commit is contained in:
parent
0f33590eca
commit
b794d9dc1c
20
COMMANDS.md
20
COMMANDS.md
@ -2,7 +2,7 @@
|
||||
|
||||
## Overview
|
||||
|
||||
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool creates separate output files for successful and failed songs, along with detailed processing reports.
|
||||
The MusicBrainz Data Cleaner is a command-line interface (CLI) tool that processes JSON song data files and cleans/normalizes the metadata using the MusicBrainz database. The tool uses an interface-based architecture with dependency injection for clean, maintainable code. It creates separate output files for successful and failed songs, along with detailed processing reports.
|
||||
|
||||
## Basic Command Structure
|
||||
|
||||
@ -277,6 +277,24 @@ Error: Source file should contain a JSON array of songs
|
||||
ModuleNotFoundError: No module named 'requests'
|
||||
```
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
### Interface-Based Design
|
||||
|
||||
The tool uses a clean interface-based architecture:
|
||||
|
||||
- **`MusicBrainzDataProvider` Interface**: Common protocol for data access
|
||||
- **`DataProviderFactory`**: Creates appropriate provider (database or API)
|
||||
- **`SongProcessor`**: Centralized processing logic using the interface
|
||||
- **Dependency Injection**: CLI depends on interfaces, not concrete classes
|
||||
|
||||
### Data Flow
|
||||
|
||||
1. **CLI** uses `DataProviderFactory` to create data provider
|
||||
2. **Factory** returns either database or API implementation
|
||||
3. **SongProcessor** processes songs using the common interface
|
||||
4. **Same logic** works regardless of provider type
|
||||
|
||||
## Environment Configuration
|
||||
|
||||
### Docker Environment
|
||||
|
||||
60
PRD.md
60
PRD.md
@ -162,6 +162,7 @@ Users have song data in JSON format with inconsistent artist names, song titles,
|
||||
- **Primary:** Direct PostgreSQL database access
|
||||
- **Fallback:** MusicBrainz REST API (local server)
|
||||
- **Interface:** Command-line (CLI)
|
||||
- **Design Pattern:** Interface-based architecture with dependency injection
|
||||
|
||||
### Project Structure
|
||||
```
|
||||
@ -169,20 +170,27 @@ src/
|
||||
├── __init__.py # Package initialization
|
||||
├── api/ # API-related modules
|
||||
│ ├── __init__.py
|
||||
│ ├── database.py # Direct PostgreSQL access with fuzzy search
|
||||
│ └── api_client.py # Legacy HTTP API client (fallback)
|
||||
│ ├── database.py # Direct PostgreSQL access (implements MusicBrainzDataProvider)
|
||||
│ └── api_client.py # HTTP API client (implements MusicBrainzDataProvider)
|
||||
├── cli/ # Command-line interface
|
||||
│ ├── __init__.py
|
||||
│ └── main.py # Main CLI implementation
|
||||
│ └── main.py # Main CLI implementation (uses factory pattern)
|
||||
├── config/ # Configuration
|
||||
│ ├── __init__.py
|
||||
│ └── constants.py # Constants and settings
|
||||
├── core/ # Core functionality
|
||||
│ ├── __init__.py
|
||||
│ ├── interfaces.py # Common interfaces and protocols
|
||||
│ ├── factory.py # Data provider factory
|
||||
│ └── song_processor.py # Centralized song processing logic
|
||||
├── tests/ # Test files and scripts
|
||||
│ ├── __init__.py
|
||||
│ ├── test_*.py # Unit and integration tests
|
||||
│ └── debug_*.py # Debug scripts
|
||||
└── utils/ # Utility functions
|
||||
├── __init__.py
|
||||
├── artist_title_processing.py # Shared artist/title processing
|
||||
└── data_loader.py # Data loading utilities
|
||||
```
|
||||
|
||||
### Architectural Principles
|
||||
@ -193,13 +201,18 @@ src/
|
||||
- **Error Handling**: Graceful error handling with meaningful messages
|
||||
- **Performance First**: Direct database access for maximum speed
|
||||
- **Fallback Strategy**: Automatic fallback to API when database unavailable
|
||||
- **NEW**: **Database-First**: Always use live database data over static caches
|
||||
- **NEW**: **Intelligent Collaboration Detection**: Distinguish band names from collaborations
|
||||
- **NEW**: **Test Organization**: All test files must be placed in `src/tests/` directory, not in root
|
||||
- **Interface-Based Design**: Uses dependency injection with common interfaces
|
||||
- **Factory Pattern**: Clean provider creation and configuration
|
||||
- **Single Responsibility**: Each class has one clear purpose
|
||||
- **Database-First**: Always use live database data over static caches
|
||||
- **Intelligent Collaboration Detection**: Distinguish band names from collaborations
|
||||
- **Test Organization**: All test files must be placed in `src/tests/` directory, not in root
|
||||
|
||||
### Data Flow
|
||||
1. Read JSON input file
|
||||
2. For each song:
|
||||
1. **CLI** uses `DataProviderFactory` to create appropriate data provider (database or API)
|
||||
2. **SongProcessor** receives the data provider and processes songs using the common interface
|
||||
3. **Data Provider** (database or API) implements the same interface for consistent behavior
|
||||
4. For each song:
|
||||
- Clean artist name using name variations
|
||||
- Detect collaboration patterns
|
||||
- Use fuzzy search to find artist in database (including aliases, sort_names)
|
||||
@ -207,7 +220,7 @@ src/
|
||||
- For collaborations: find artist_credit and recording
|
||||
- For single artists: find recording by artist and title
|
||||
- Update song object with corrected data and MBIDs
|
||||
3. Write cleaned data to output file
|
||||
5. Write cleaned data to output file
|
||||
|
||||
### Fuzzy Search Implementation
|
||||
- **Algorithm**: Uses fuzzywuzzy library with multiple matching strategies
|
||||
@ -316,18 +329,23 @@ docker-compose logs -f musicbrainz
|
||||
- [x] Fuzzy search for artists and recordings
|
||||
- [x] Automatic fallback to API mode
|
||||
- [x] Performance optimizations
|
||||
- [x] **NEW**: Advanced collaboration detection and handling
|
||||
- [x] **NEW**: Artist alias and sort_name search
|
||||
- [x] **NEW**: Dash variation handling
|
||||
- [x] **NEW**: Numerical suffix handling
|
||||
- [x] **NEW**: Band name vs collaboration distinction
|
||||
- [x] **NEW**: Complex collaboration parsing
|
||||
- [x] **NEW**: Removed problematic known_artists cache
|
||||
- [x] **NEW**: Simplified CLI with default full dataset processing
|
||||
- [x] **NEW**: Separate output files for successful and failed songs (array format)
|
||||
- [x] **NEW**: Human-readable text reports with statistics
|
||||
- [x] **NEW**: Smart defaults for all file paths and options
|
||||
- [x] **NEW**: Configurable processing limits and output file paths
|
||||
- [x] Advanced collaboration detection and handling
|
||||
- [x] Artist alias and sort_name search
|
||||
- [x] Dash variation handling
|
||||
- [x] Numerical suffix handling
|
||||
- [x] Band name vs collaboration distinction
|
||||
- [x] Complex collaboration parsing
|
||||
- [x] Removed problematic known_artists cache
|
||||
- [x] Simplified CLI with default full dataset processing
|
||||
- [x] Separate output files for successful and failed songs (array format)
|
||||
- [x] Human-readable text reports with statistics
|
||||
- [x] Smart defaults for all file paths and options
|
||||
- [x] Configurable processing limits and output file paths
|
||||
- [x] **NEW**: Interface-based architecture with dependency injection
|
||||
- [x] **NEW**: Factory pattern for data provider creation
|
||||
- [x] **NEW**: Centralized song processing logic
|
||||
- [x] **NEW**: Common interfaces for database and API clients
|
||||
- [x] **NEW**: Clean separation of concerns
|
||||
|
||||
### 🔄 Future Enhancements
|
||||
- [ ] Web interface option
|
||||
|
||||
12
README.md
12
README.md
@ -1,6 +1,6 @@
|
||||
# 🎵 MusicBrainz Data Cleaner v3.0
|
||||
|
||||
A powerful command-line tool that cleans and normalizes your song data using the MusicBrainz database. **Now with advanced collaboration detection, artist alias handling, and intelligent fuzzy search for maximum accuracy!**
|
||||
A powerful command-line tool that cleans and normalizes your song data using the MusicBrainz database. **Now with interface-based architecture, advanced collaboration detection, artist alias handling, and intelligent fuzzy search for maximum accuracy!**
|
||||
|
||||
## 🚀 Quick Start for New Sessions
|
||||
|
||||
@ -47,6 +47,8 @@ docker-compose run --rm musicbrainz-cleaner python3 [script_name].py
|
||||
|
||||
## ✨ What's New in v3.0
|
||||
|
||||
- **🏗️ Interface-Based Architecture**: Clean dependency injection with common interfaces
|
||||
- **🏭 Factory Pattern**: Smart data provider creation and configuration
|
||||
- **🚀 Direct Database Access**: Connect directly to PostgreSQL for 10x faster performance
|
||||
- **🎯 Advanced Fuzzy Search**: Intelligent matching for similar artist names and song titles
|
||||
- **🔄 Automatic Fallback**: Falls back to API mode if database access fails
|
||||
@ -299,11 +301,19 @@ python3 src/tests/run_tests.py --list
|
||||
musicbrainz-cleaner/
|
||||
├── src/
|
||||
│ ├── api/ # Database and API access
|
||||
│ │ ├── database.py # Direct PostgreSQL access (implements MusicBrainzDataProvider)
|
||||
│ │ └── api_client.py # HTTP API client (implements MusicBrainzDataProvider)
|
||||
│ ├── cli/ # Command-line interface
|
||||
│ │ └── main.py # Main CLI implementation (uses factory pattern)
|
||||
│ ├── config/ # Configuration and constants
|
||||
│ ├── core/ # Core functionality
|
||||
│ │ ├── interfaces.py # Common interfaces and protocols
|
||||
│ │ ├── factory.py # Data provider factory
|
||||
│ │ └── song_processor.py # Centralized song processing logic
|
||||
│ ├── tests/ # Test files (REQUIRED location)
|
||||
│ └── utils/ # Utility functions
|
||||
│ ├── artist_title_processing.py # Shared artist/title processing
|
||||
│ └── data_loader.py # Data loading utilities
|
||||
├── data/ # Data files and output
|
||||
│ ├── known_artists.json # Name variations (ACDC → AC/DC)
|
||||
│ ├── known_recordings.json # Known recording MBIDs
|
||||
|
||||
@ -1,19 +1,20 @@
|
||||
"""
|
||||
Legacy HTTP API client for MusicBrainz Data Cleaner.
|
||||
Used as fallback when direct database access is not available.
|
||||
HTTP API client for MusicBrainz Data Cleaner.
|
||||
Implements the MusicBrainzDataProvider interface for API-based access.
|
||||
"""
|
||||
|
||||
import requests
|
||||
import time
|
||||
from typing import Dict, Optional, Any
|
||||
from typing import Dict, Optional, Any, Tuple
|
||||
from ..config.constants import (
|
||||
DEFAULT_MUSICBRAINZ_URL, API_REQUEST_DELAY, REQUEST_TIMEOUT,
|
||||
SUCCESS_MESSAGES, ERROR_MESSAGES
|
||||
)
|
||||
from ..core.interfaces import MusicBrainzDataProvider
|
||||
|
||||
|
||||
class MusicBrainzAPIClient:
|
||||
"""Legacy HTTP API client for MusicBrainz (fallback option)."""
|
||||
class MusicBrainzAPIClient(MusicBrainzDataProvider):
|
||||
"""HTTP API client for MusicBrainz implementing the data provider interface."""
|
||||
|
||||
def __init__(self, base_url: str = DEFAULT_MUSICBRAINZ_URL):
|
||||
self.base_url = base_url
|
||||
@ -30,8 +31,8 @@ class MusicBrainzAPIClient:
|
||||
print(f"API connection test failed: {e}")
|
||||
return False
|
||||
|
||||
def search_artist(self, artist_name: str) -> Optional[Dict[str, Any]]:
|
||||
"""Search for artist by name using API."""
|
||||
def fuzzy_search_artist(self, artist_name: str) -> Optional[Tuple[str, str, float]]:
|
||||
"""Search for artist by name using API. Returns (artist_name, mbid, score)."""
|
||||
try:
|
||||
url = f"{self.base_url}/ws/2/artist/?query=name:{artist_name}&fmt=json"
|
||||
response = self.session.get(url)
|
||||
@ -39,14 +40,15 @@ class MusicBrainzAPIClient:
|
||||
data = response.json()
|
||||
|
||||
if data.get('artists') and len(data['artists']) > 0:
|
||||
return data['artists'][0]
|
||||
artist = data['artists'][0]
|
||||
return (artist['name'], artist['id'], 1.0) # Perfect match for API
|
||||
return None
|
||||
except Exception as e:
|
||||
print(f"API search failed for artist '{artist_name}': {e}")
|
||||
return None
|
||||
|
||||
def search_recording(self, title: str, artist_mbid: Optional[str] = None) -> Optional[Dict[str, Any]]:
|
||||
"""Search for recording by title and optionally artist using API."""
|
||||
def fuzzy_search_recording(self, title: str, artist_mbid: Optional[str] = None) -> Optional[Tuple[str, str, float]]:
|
||||
"""Search for recording by title and optionally artist using API. Returns (recording_name, mbid, score)."""
|
||||
try:
|
||||
if artist_mbid:
|
||||
url = f"{self.base_url}/ws/2/recording/?query=arid:{artist_mbid}%20AND%20name:{title}&fmt=json"
|
||||
@ -58,7 +60,8 @@ class MusicBrainzAPIClient:
|
||||
data = response.json()
|
||||
|
||||
if data.get('recordings') and len(data['recordings']) > 0:
|
||||
return data['recordings'][0]
|
||||
recording = data['recordings'][0]
|
||||
return (recording['title'], recording['id'], 1.0) # Perfect match for API
|
||||
return None
|
||||
except Exception as e:
|
||||
print(f"API search failed for recording '{title}': {e}")
|
||||
|
||||
@ -13,10 +13,11 @@ from ..config.constants import (
|
||||
TITLE_SIMILARITY_THRESHOLD, ARTIST_SIMILARITY_THRESHOLD,
|
||||
SUCCESS_MESSAGES, ERROR_MESSAGES
|
||||
)
|
||||
from src.utils.artist_title_processing import parse_complex_collaboration, parse_collaborators, generate_title_variations
|
||||
from ..utils.artist_title_processing import parse_complex_collaboration, parse_collaborators, generate_title_variations
|
||||
from ..core.interfaces import MusicBrainzDataProvider
|
||||
|
||||
|
||||
class MusicBrainzDatabase:
|
||||
class MusicBrainzDatabase(MusicBrainzDataProvider):
|
||||
"""Direct PostgreSQL database access for MusicBrainz with fuzzy search."""
|
||||
|
||||
def __init__(self, host: str = DB_HOST, port: int = DB_PORT,
|
||||
|
||||
@ -23,35 +23,23 @@ from ..config.constants import (
|
||||
|
||||
|
||||
|
||||
# Import database and API clients
|
||||
from ..api.database import MusicBrainzDatabase
|
||||
from ..api.api_client import MusicBrainzAPIClient
|
||||
# Import core components
|
||||
from ..core.song_processor import SongProcessor
|
||||
from ..core.factory import DataProviderFactory
|
||||
|
||||
|
||||
class MusicBrainzCleaner:
|
||||
"""Enhanced MusicBrainz Cleaner with direct database access and fuzzy search."""
|
||||
"""Enhanced MusicBrainz Cleaner with interface-based data access."""
|
||||
|
||||
def __init__(self, use_database: bool = True, base_url: str = DEFAULT_MUSICBRAINZ_URL):
|
||||
self.use_database = use_database
|
||||
self.base_url = base_url
|
||||
|
||||
# Initialize database connection (primary method)
|
||||
if use_database:
|
||||
self.db = MusicBrainzDatabase()
|
||||
if not self.db.connect():
|
||||
print("⚠️ Database connection failed, falling back to API")
|
||||
self.use_database = False
|
||||
# Create data provider using factory
|
||||
self.data_provider = DataProviderFactory.create_provider(use_database, base_url)
|
||||
|
||||
# Initialize API client (fallback method)
|
||||
if not self.use_database:
|
||||
self.api = MusicBrainzAPIClient(base_url)
|
||||
|
||||
# Initialize centralized song processor
|
||||
self.song_processor = SongProcessor(
|
||||
database_client=self.db if use_database else None,
|
||||
api_client=self.api if not use_database else None
|
||||
)
|
||||
# Initialize centralized song processor with the data provider
|
||||
self.song_processor = SongProcessor(self.data_provider)
|
||||
|
||||
def clean_song(self, song: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
|
||||
"""
|
||||
@ -394,21 +382,14 @@ def main() -> int:
|
||||
|
||||
# Handle test connection
|
||||
if parsed['test_connection']:
|
||||
if parsed['use_api']:
|
||||
api = MusicBrainzAPIClient()
|
||||
if api.test_connection():
|
||||
print("✅ Connection to MusicBrainz API server successful")
|
||||
provider = DataProviderFactory.create_provider(not parsed['use_api'])
|
||||
if provider.test_connection():
|
||||
provider_type = "API server" if parsed['use_api'] else "database"
|
||||
print(f"✅ Connection to MusicBrainz {provider_type} successful")
|
||||
return ExitCode.SUCCESS
|
||||
else:
|
||||
print("❌ Connection to MusicBrainz API server failed")
|
||||
return ExitCode.ERROR
|
||||
else:
|
||||
db = MusicBrainzDatabase()
|
||||
if db.test_connection():
|
||||
print("✅ Connection to MusicBrainz database successful")
|
||||
return ExitCode.SUCCESS
|
||||
else:
|
||||
print("❌ Connection to MusicBrainz database failed")
|
||||
provider_type = "API server" if parsed['use_api'] else "database"
|
||||
print(f"❌ Connection to MusicBrainz {provider_type} failed")
|
||||
return ExitCode.ERROR
|
||||
|
||||
# Process songs (main functionality)
|
||||
|
||||
36
src/core/factory.py
Normal file
36
src/core/factory.py
Normal file
@ -0,0 +1,36 @@
|
||||
"""
|
||||
Factory for creating MusicBrainz data providers.
|
||||
Provides a clean way to instantiate the appropriate data provider based on configuration.
|
||||
"""
|
||||
|
||||
from typing import Optional
|
||||
from ..api.database import MusicBrainzDatabase
|
||||
from ..api.api_client import MusicBrainzAPIClient
|
||||
from .interfaces import MusicBrainzDataProvider
|
||||
from ..config.constants import DEFAULT_MUSICBRAINZ_URL
|
||||
|
||||
|
||||
class DataProviderFactory:
|
||||
"""Factory for creating MusicBrainz data providers."""
|
||||
|
||||
@staticmethod
|
||||
def create_provider(use_database: bool = True, base_url: str = DEFAULT_MUSICBRAINZ_URL) -> MusicBrainzDataProvider:
|
||||
"""
|
||||
Create a data provider based on configuration.
|
||||
|
||||
Args:
|
||||
use_database: Whether to use database (True) or API (False)
|
||||
base_url: Base URL for API client (only used if use_database=False)
|
||||
|
||||
Returns:
|
||||
MusicBrainzDataProvider instance
|
||||
"""
|
||||
if use_database:
|
||||
provider = MusicBrainzDatabase()
|
||||
if provider.connect():
|
||||
return provider
|
||||
else:
|
||||
print("⚠️ Database connection failed, falling back to API")
|
||||
return MusicBrainzAPIClient(base_url)
|
||||
else:
|
||||
return MusicBrainzAPIClient(base_url)
|
||||
60
src/core/interfaces.py
Normal file
60
src/core/interfaces.py
Normal file
@ -0,0 +1,60 @@
|
||||
"""
|
||||
Core interfaces for MusicBrainz Data Cleaner.
|
||||
Defines the common protocol that all data access implementations must follow.
|
||||
"""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Dict, Optional, Any, Tuple, List
|
||||
|
||||
|
||||
class MusicBrainzDataProvider(ABC):
|
||||
"""
|
||||
Abstract base class defining the interface for MusicBrainz data access.
|
||||
Both database and API implementations must implement these methods.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def test_connection(self) -> bool:
|
||||
"""Test connection to the data source."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def fuzzy_search_artist(self, artist_name: str) -> Optional[Tuple[str, str, float]]:
|
||||
"""
|
||||
Fuzzy search for artist by name.
|
||||
Returns (artist_name, mbid, similarity_score) or None.
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def fuzzy_search_recording(self, title: str, artist_mbid: Optional[str] = None) -> Optional[Tuple[str, str, float]]:
|
||||
"""
|
||||
Fuzzy search for recording by title and optionally artist.
|
||||
Returns (recording_name, mbid, similarity_score) or None.
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get_artist_info(self, mbid: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get detailed artist information by MBID."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get_recording_info(self, mbid: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get detailed recording information by MBID."""
|
||||
pass
|
||||
|
||||
|
||||
class SongProcessorInterface(ABC):
|
||||
"""
|
||||
Abstract base class defining the interface for song processing.
|
||||
This ensures consistent behavior across different implementations.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def clean_song(self, song: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
|
||||
"""
|
||||
Clean a single song.
|
||||
Returns (cleaned_song, success_status)
|
||||
"""
|
||||
pass
|
||||
@ -7,22 +7,20 @@ between CLI and database interactions to ensure consistency.
|
||||
from typing import Dict, Optional, Tuple, Any, List
|
||||
from ..utils.artist_title_processing import parse_complex_collaboration, generate_title_variations
|
||||
from ..utils.data_loader import data_loader
|
||||
from .interfaces import MusicBrainzDataProvider, SongProcessorInterface
|
||||
|
||||
|
||||
class SongProcessor:
|
||||
class SongProcessor(SongProcessorInterface):
|
||||
"""Centralized song processing with consistent logic across all interfaces."""
|
||||
|
||||
def __init__(self, database_client=None, api_client=None):
|
||||
def __init__(self, data_provider: MusicBrainzDataProvider):
|
||||
"""
|
||||
Initialize with optional database and API clients.
|
||||
Initialize with a data provider that implements MusicBrainzDataProvider.
|
||||
|
||||
Args:
|
||||
database_client: MusicBrainzDatabase instance for direct DB access
|
||||
api_client: MusicBrainzAPIClient instance for API fallback
|
||||
data_provider: Instance implementing MusicBrainzDataProvider interface
|
||||
"""
|
||||
self.db = database_client
|
||||
self.api = api_client
|
||||
self.use_database = database_client is not None
|
||||
self.data_provider = data_provider
|
||||
|
||||
def find_artist_mbid(self, artist_name: str) -> Optional[str]:
|
||||
"""
|
||||
@ -37,21 +35,10 @@ class SongProcessor:
|
||||
if not artist_name:
|
||||
return None
|
||||
|
||||
# Try database first if available
|
||||
if self.use_database:
|
||||
result = self.db.fuzzy_search_artist(artist_name)
|
||||
result = self.data_provider.fuzzy_search_artist(artist_name)
|
||||
if result and isinstance(result, tuple) and len(result) >= 2:
|
||||
return result[1] # Return MBID from tuple (artist_name, mbid, score)
|
||||
|
||||
# Fallback to API
|
||||
if self.api:
|
||||
try:
|
||||
result = self.api.search_artist(artist_name)
|
||||
if result:
|
||||
return result['id']
|
||||
except:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
def find_recording_mbid(self, artist_mbid: str, title: str, original_artist: str = None) -> Optional[str]:
|
||||
@ -83,12 +70,11 @@ class SongProcessor:
|
||||
return recording_mbid
|
||||
|
||||
# Handle collaborations using artist credit
|
||||
if self.use_database:
|
||||
# If no artist_mbid (collaboration case), try to find by title and verify artist credit
|
||||
if not artist_mbid and original_artist:
|
||||
# This is a collaboration case, try to find by title with all variations
|
||||
for variation in title_variations:
|
||||
result = self.db.fuzzy_search_recording(variation)
|
||||
result = self.data_provider.fuzzy_search_recording(variation)
|
||||
if result and isinstance(result, tuple) and len(result) >= 2:
|
||||
recording_mbid = result[1]
|
||||
|
||||
@ -113,40 +99,23 @@ class SongProcessor:
|
||||
if collaborator.lower() in recording_artists:
|
||||
return recording_mbid
|
||||
|
||||
return None
|
||||
else:
|
||||
# Regular case with artist_mbid - try all title variations
|
||||
for variation in title_variations:
|
||||
result = self.db.fuzzy_search_recording(variation, artist_mbid)
|
||||
result = self.data_provider.fuzzy_search_recording(variation, artist_mbid)
|
||||
if result and isinstance(result, tuple) and len(result) >= 2:
|
||||
return result[1] # Return MBID from tuple (recording_name, mbid, score)
|
||||
else:
|
||||
# Fallback to API - try all title variations
|
||||
for variation in title_variations:
|
||||
try:
|
||||
result = self.api.search_recording(variation, artist_mbid)
|
||||
if result:
|
||||
return result['id']
|
||||
except:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
return None
|
||||
|
||||
def get_artist_info(self, mbid: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get artist info using consistent logic."""
|
||||
if self.use_database:
|
||||
return self.db.get_artist_info(mbid)
|
||||
elif self.api:
|
||||
return self.api.get_artist_info(mbid)
|
||||
return None
|
||||
return self.data_provider.get_artist_info(mbid)
|
||||
|
||||
def get_recording_info(self, mbid: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get recording info using consistent logic."""
|
||||
if self.use_database:
|
||||
return self.db.get_recording_info(mbid)
|
||||
elif self.api:
|
||||
return self.api.get_recording_info(mbid)
|
||||
return None
|
||||
return self.data_provider.get_recording_info(mbid)
|
||||
|
||||
def _build_artist_string(self, artist_credit: list) -> str:
|
||||
"""Build artist string from artist credit list (API format)."""
|
||||
@ -189,9 +158,9 @@ class SongProcessor:
|
||||
recording_info = self.get_recording_info(recording_mbid)
|
||||
if recording_info:
|
||||
# Update with the correct artist credit and title
|
||||
if self.use_database and recording_info.get('artist_credit'):
|
||||
if recording_info.get('artist_credit'):
|
||||
song['artist'] = recording_info['artist_credit']
|
||||
elif not self.use_database and recording_info.get('artist-credit'):
|
||||
elif recording_info.get('artist-credit'):
|
||||
artist_string = self._build_artist_string(recording_info['artist-credit'])
|
||||
if artist_string:
|
||||
song['artist'] = artist_string
|
||||
@ -200,10 +169,9 @@ class SongProcessor:
|
||||
song['recording_mbid'] = recording_mbid
|
||||
|
||||
# For collaborations, try to get the main artist's MBID
|
||||
if self.use_database:
|
||||
main_artist, collaborators = parse_complex_collaboration(song.get('artist', ''))
|
||||
if main_artist:
|
||||
artist_result = self.db.fuzzy_search_artist(main_artist)
|
||||
artist_result = self.data_provider.fuzzy_search_artist(main_artist)
|
||||
if artist_result and isinstance(artist_result, tuple) and len(artist_result) >= 2:
|
||||
song['mbid'] = artist_result[1] # Set the main artist's MBID
|
||||
|
||||
@ -229,12 +197,12 @@ class SongProcessor:
|
||||
recording_info = self.get_recording_info(recording_mbid)
|
||||
if recording_info:
|
||||
# Update artist string if there are multiple artists, but preserve the artist MBID
|
||||
if self.use_database and recording_info.get('artist_credit'):
|
||||
if recording_info.get('artist_credit'):
|
||||
song['artist'] = recording_info['artist_credit']
|
||||
# Keep the original artist MBID even when updating artist name
|
||||
if 'mbid' not in song:
|
||||
song['mbid'] = artist_mbid
|
||||
elif not self.use_database and recording_info.get('artist-credit'):
|
||||
elif recording_info.get('artist-credit'):
|
||||
artist_string = self._build_artist_string(recording_info['artist-credit'])
|
||||
if artist_string:
|
||||
song['artist'] = artist_string
|
||||
|
||||
Loading…
Reference in New Issue
Block a user