# MusicBrainz Data Cleaner - Tests

This directory contains all tests for the MusicBrainz Data Cleaner project, organized by type.

## 📁 Test Structure

```
src/tests/
├── unit/                # Unit tests for individual components
├── integration/         # Integration tests for database and API
├── debug/               # Debug scripts and troubleshooting tests
├── run_tests.py         # Test runner script
├── README.md            # This file
├── legacy/              # Legacy scripts moved from root directory
└── moved/               # Test files moved from root directory
```

### Legacy Scripts (Moved from Root)

- `process_full_dataset.py` - Legacy script that redirects to new CLI
- `musicbrainz_cleaner.py` - Legacy entry point script

### Moved Test Files (Moved from Root)

- `test_title_cleaning.py` - Test title cleaning functionality
- `test_simple_query.py` - Test simple database queries
- `debug_artist_search.py` - Debug artist search functionality
- `test_failed_collaborations.py` - Test failed collaboration cases
- `test_collaboration_debug.py` - Debug collaboration parsing
- `test_100_random.py` - Test 100 random songs
- `quick_test_20.py` - Quick test with 20 songs

## 🧪 Test Categories

### Unit Tests (`unit/`)

- **Purpose**: Test individual components in isolation
- **Examples**:
  - `test_data_loader.py` - Test data loading functionality
  - `test_collaboration_patterns.py` - Test collaboration detection
  - `test_hyphenated_artists.py` - Test artist name variations
  - `test_eazy_e.py` - Test specific edge cases

### Integration Tests (`integration/`)

- **Purpose**: Test interactions between components
- **Examples**:
  - `test_cli.py` - Test command-line interface
  - `direct_db_test.py` - Test database connectivity
  - `test_db_connection.py` - Test database queries

### Debug Tests (`debug/`)

- **Purpose**: Debug scripts and troubleshooting tools
- **Examples**:
  - `debug_collaboration.py` - Debug collaboration parsing
  - `simple_debug.py` - Simple debugging utilities
  - `check_collaboration.py` - Check collaboration handling

## 🚀 Running Tests

### Run All Tests

```bash
python3 src/tests/run_tests.py
```

### Running Moved Test Files

The following test files were moved from the root directory to `src/tests/`:

```bash
# Run individual moved test files
python3 src/tests/test_100_random.py
python3 src/tests/quick_test_20.py
python3 src/tests/test_title_cleaning.py
python3 src/tests/test_simple_query.py
python3 src/tests/debug_artist_search.py
python3 src/tests/test_failed_collaborations.py
python3 src/tests/test_collaboration_debug.py
```

### Running Legacy Scripts

Legacy scripts that redirect to the new CLI:

```bash
# Legacy full dataset processing (redirects to CLI)
python3 src/tests/process_full_dataset.py

# Legacy entry point (redirects to CLI)
python3 src/tests/musicbrainz_cleaner.py
```

**Note**: These legacy scripts are kept for backward compatibility but the new CLI is preferred:

```bash
# Preferred method (new CLI)
docker-compose run --rm musicbrainz-cleaner python3 -m src.cli.main
```

### Run Specific Test Categories

```bash
# Run only unit tests
python3 src/tests/run_tests.py --unit

# Run only integration tests
python3 src/tests/run_tests.py --integration
```

### Run Specific Test Module

```bash
# Run a specific test file
python3 src/tests/run_tests.py test_data_loader
python3 src/tests/run_tests.py test_collaboration_patterns
python3 src/tests/run_tests.py test_cli
```

### List Available Tests

```bash
python3 src/tests/run_tests.py --list
```

## 📋 Test Data Files

Some tests use JSON data files for testing:

- `unit/test_aliases.json` - Test data for artist aliases
- `unit/test_sclub7.json` - Test data for name variations
- `unit/test_aliases_cleaned.json` - Expected output for alias tests
- `unit/test_sclub7_cleaned.json` - Expected output for name variation tests

## 🔧 Test Requirements

- **Database**: Some tests require a running MusicBrainz database
- **Dependencies**: All Python dependencies must be installed
- **Environment**: Tests should be run from the project root directory

## 📝 Writing New Tests

### Unit Tests

- Place in `unit/` directory
- Test individual functions or classes
- Use mock data when possible
- Follow naming convention: `test_*.py`

### Integration Tests

- Place in `integration/` directory
- Test component interactions
- May require database connection
- Follow naming convention: `test_*.py`

### Debug Scripts

- Place in `debug/` directory
- Use for troubleshooting specific issues
- Can be temporary or permanent
- Follow naming convention: `debug_*.py` or `check_*.py`

## 🐛 Debugging Tests

If tests fail:

1. **Check database connection**: Ensure MusicBrainz database is running
2. **Check dependencies**: Ensure all requirements are installed
3. **Check environment**: Ensure you're running from the correct directory
4. **Use debug scripts**: Run debug scripts in `debug/` directory for troubleshooting

## 📊 Test Coverage

The test suite covers:

- ✅ Data loading and validation
- ✅ Artist name normalization
- ✅ Collaboration detection
- ✅ Database connectivity
- ✅ CLI functionality
- ✅ Edge cases and error handling
- ✅ Fuzzy search algorithms
- ✅ Recording count prioritization

## 🔄 Continuous Integration

Tests are automatically run:

- On pull requests
- Before releases
- During development

All tests must pass before code is merged.
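
As an illustration of the unit-test guidelines under "Writing New Tests" (test one function, use mock data, name the file `test_*.py`), here is a minimal sketch of what such a module might look like. It assumes the standard-library `unittest` framework. `find_artist_id` and the `fetch_artist` method are hypothetical stand-ins defined only for this example; they are not the project's actual API.

```python
import unittest
from unittest.mock import Mock


def find_artist_id(db, name):
    """Hypothetical helper: look up an artist ID through a database client.

    `db` is assumed to expose a `fetch_artist(name)` method that returns
    a dict with an "id" key, or None when nothing matches.
    """
    row = db.fetch_artist(name.strip())
    return row["id"] if row else None


class TestFindArtistId(unittest.TestCase):
    def setUp(self):
        # A Mock replaces the database client, so the test does not
        # need a running MusicBrainz database.
        self.db = Mock()

    def test_returns_id_when_artist_exists(self):
        self.db.fetch_artist.return_value = {"id": 42, "name": "Eazy-E"}
        self.assertEqual(find_artist_id(self.db, "  Eazy-E "), 42)
        # The helper should strip whitespace before querying.
        self.db.fetch_artist.assert_called_once_with("Eazy-E")

    def test_returns_none_when_artist_missing(self):
        self.db.fetch_artist.return_value = None
        self.assertIsNone(find_artist_id(self.db, "Unknown"))
```

Because the database is mocked, a test like this stays in `unit/`. A version that talks to a real database would belong in `integration/` instead.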