MusicBrainz Data Cleaner - Tests

This directory contains all tests for the MusicBrainz Data Cleaner project, organized by type.

📁 Test Structure

src/tests/
├── unit/           # Unit tests for individual components
├── integration/    # Integration tests for database and API
├── debug/          # Debug scripts and troubleshooting tests
├── run_tests.py    # Test runner script
├── README.md       # This file
├── legacy/         # Legacy scripts moved from root directory
└── moved/          # Test files moved from root directory

Legacy Scripts (Moved from Root)

  • process_full_dataset.py - Legacy script that redirects to new CLI
  • musicbrainz_cleaner.py - Legacy entry point script

Moved Test Files (Moved from Root)

  • test_title_cleaning.py - Test title cleaning functionality
  • test_simple_query.py - Test simple database queries
  • debug_artist_search.py - Debug artist search functionality
  • test_failed_collaborations.py - Test failed collaboration cases
  • test_collaboration_debug.py - Debug collaboration parsing
  • test_100_random.py - Test 100 random songs
  • quick_test_20.py - Quick test with 20 songs

🧪 Test Categories

Unit Tests (unit/)

  • Purpose: Test individual components in isolation
  • Examples:
    • test_data_loader.py - Test data loading functionality
    • test_collaboration_patterns.py - Test collaboration detection
    • test_hyphenated_artists.py - Test artist name variations
    • test_eazy_e.py - Test specific edge cases

Integration Tests (integration/)

  • Purpose: Test interactions between components
  • Examples:
    • test_cli.py - Test command-line interface
    • direct_db_test.py - Test database connectivity
    • test_db_connection.py - Test database queries

Debug Tests (debug/)

  • Purpose: Debug scripts and troubleshooting tools
  • Examples:
    • debug_collaboration.py - Debug collaboration parsing
    • simple_debug.py - Simple debugging utilities
    • check_collaboration.py - Check collaboration handling

🚀 Running Tests

Run All Tests

python3 src/tests/run_tests.py

Run Moved Test Files

These test files were moved from the root directory into src/tests/ and can be run individually:

# Run individual moved test files
python3 src/tests/test_100_random.py
python3 src/tests/quick_test_20.py
python3 src/tests/test_title_cleaning.py
python3 src/tests/test_simple_query.py
python3 src/tests/debug_artist_search.py
python3 src/tests/test_failed_collaborations.py
python3 src/tests/test_collaboration_debug.py

Run Legacy Scripts

The legacy scripts redirect to the new CLI:

# Legacy full dataset processing (redirects to CLI)
python3 src/tests/process_full_dataset.py

# Legacy entry point (redirects to CLI)
python3 src/tests/musicbrainz_cleaner.py

Note: These legacy scripts are kept for backward compatibility, but the new CLI is preferred:

# Preferred method (new CLI)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main

Run Specific Test Categories

# Run only unit tests
python3 src/tests/run_tests.py --unit

# Run only integration tests
python3 src/tests/run_tests.py --integration

Run Specific Test Module

# Run a specific test file
python3 src/tests/run_tests.py test_data_loader
python3 src/tests/run_tests.py test_collaboration_patterns
python3 src/tests/run_tests.py test_cli

List Available Tests

python3 src/tests/run_tests.py --list

📋 Test Data Files

Some tests read their inputs and expected outputs from JSON data files:

  • unit/test_aliases.json - Test data for artist aliases
  • unit/test_sclub7.json - Test data for name variations
  • unit/test_aliases_cleaned.json - Expected output for alias tests
  • unit/test_sclub7_cleaned.json - Expected output for name variation tests
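
Pairing an input file with its expected-output file lends itself to a data-driven check. The sketch below is a minimal illustration under stated assumptions: the JSON layout (a top-level array of objects) and the `clean_record` function are hypothetical stand-ins, not the project's actual schema or API.

```python
import json
from pathlib import Path


def clean_record(record: dict) -> dict:
    """Hypothetical stand-in for the cleaner: strips whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def load_json(path: Path):
    """Load a UTF-8 encoded JSON file."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def find_mismatches(input_path: Path, expected_path: Path) -> list:
    """Clean every input record and return (index, actual, expected) per difference."""
    actual = [clean_record(record) for record in load_json(input_path)]
    expected = load_json(expected_path)
    if len(actual) != len(expected):
        raise ValueError(f"{len(actual)} cleaned records but {len(expected)} expected")
    return [(i, a, e) for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

With the real cleaner substituted for `clean_record`, a test could assert that `find_mismatches(...)` returns an empty list for an input file and its `_cleaned` counterpart.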

🔧 Test Requirements

  • Database: Some tests require a running MusicBrainz database
  • Dependencies: All Python dependencies must be installed
  • Environment: Tests should be run from the project root directory

📝 Writing New Tests

Unit Tests

  • Place in unit/ directory
  • Test individual functions or classes
  • Use mock data when possible
  • Follow naming convention: test_*.py
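
A minimal sketch of a unit test that follows these guidelines, using the standard unittest and unittest.mock modules. `resolve_artist` and the `find_artist` method are hypothetical and defined inline so the example runs on its own; a real test would import the component under test instead.

```python
import unittest
from unittest.mock import Mock


def resolve_artist(name: str, db) -> str:
    """Hypothetical component under test: return the canonical name from the
    database, or the trimmed input when the database has no match."""
    match = db.find_artist(name.strip())
    return match if match is not None else name.strip()


class TestResolveArtist(unittest.TestCase):
    def test_returns_canonical_name_when_found(self):
        db = Mock()  # mock data source, so no real database is needed
        db.find_artist.return_value = "Eazy-E"
        self.assertEqual(resolve_artist(" Eazy E ", db), "Eazy-E")
        db.find_artist.assert_called_once_with("Eazy E")

    def test_falls_back_to_input_when_not_found(self):
        db = Mock()
        db.find_artist.return_value = None
        self.assertEqual(resolve_artist("Unknown Artist", db), "Unknown Artist")
```

Saved as unit/test_resolve_artist.py (an illustrative file name), this could be run directly with `python3 -m unittest`, assuming the tests are written against unittest.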

Integration Tests

  • Place in integration/ directory
  • Test component interactions
  • May require database connection
  • Follow naming convention: test_*.py
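
Because integration tests may need a database, it can help to skip them cleanly when none is reachable rather than let them fail. The sketch below shows one way to do that; the environment variable names `DB_HOST` and `DB_PORT` and their defaults are assumptions for illustration, not the project's actual configuration.

```python
import os
import socket
import unittest

# Assumed settings: variable names and defaults are illustrative only.
DB_HOST = os.environ.get("DB_HOST", "localhost")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))


def database_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@unittest.skipUnless(database_reachable(DB_HOST, DB_PORT), "database not reachable")
class TestDatabaseIntegration(unittest.TestCase):
    def test_port_accepts_connections(self):
        # Placeholder check; a real test would run a query through the project code.
        self.assertTrue(database_reachable(DB_HOST, DB_PORT))
```

A TCP check only confirms that something is listening on the port; it does not verify credentials or that the MusicBrainz schema is loaded.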

Debug Scripts

  • Place in debug/ directory
  • Use for troubleshooting specific issues
  • Can be temporary or permanent
  • Follow naming convention: debug_*.py or check_*.py

🐛 Debugging Tests

If tests fail:

  1. Check database connection: Ensure the MusicBrainz database is running
  2. Check dependencies: Ensure all requirements are installed
  3. Check environment: Ensure you're running from the project root directory
  4. Use debug scripts: Run the scripts in the debug/ directory to troubleshoot
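
The dependency and environment checks above can be automated with a small pre-flight script. This is a sketch under assumptions: the function names are hypothetical, and the list of required modules must be supplied by the caller because this README does not enumerate the project's dependencies.

```python
import importlib.util
from pathlib import Path


def missing_modules(names) -> list:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_project_root(cwd: Path) -> bool:
    """Treat a directory as the project root if it contains src/tests/."""
    return (cwd / "src" / "tests").is_dir()


def preflight(cwd: Path, required_modules) -> list:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not in_project_root(cwd):
        problems.append(f"{cwd} does not contain src/tests/; run from the project root")
    for name in missing_modules(required_modules):
        problems.append(f"module '{name}' is not installed")
    return problems
```

For example, `preflight(Path.cwd(), ["json"])` returns an empty list when run from the project root, since `json` is part of the standard library.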

📊 Test Coverage

The test suite covers:

  • Data loading and validation
  • Artist name normalization
  • Collaboration detection
  • Database connectivity
  • CLI functionality
  • Edge cases and error handling
  • Fuzzy search algorithms
  • Recording count prioritization

🔄 Continuous Integration

Tests are automatically run:

  • On pull requests
  • Before releases
  • During development

All tests must pass before code is merged.