6.8 KiB
Music Charts Archive Scraper
A full-stack React application for scraping and visualizing music chart data from the Music Charts Archive website.
Project Structure
MusicCharts/
├── docs/
├── frontend/ # React frontend application
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── services/ # API service layer
│ │ ├── App.jsx # Main app component
│ │ ├── index.js # Entry point
│ │ └── styles.css # Global styles
│ ├── public/
│ └── package.json
└── backend/ # Node.js/Express backend API
├── src/
│ ├── routes/ # API routes
│ ├── controllers/ # Route controllers
│ ├── models/ # Data models & scraping logic
│ ├── middleware/ # Express middleware
│ └── server.js # Main server file
└── package.json
Features
- Year Selection: Choose different years to view available chart dates
- Date Selection: Browse and select specific chart dates for detailed song data
- Music Chart Display: View ranked songs with title and artist information
- Yearly Analytics: Calculate and display top songs of the year using multi-factor ranking algorithm
- Enhanced Data Export: Export chart data in JSON format with customizable properties
- JSON Viewer: Interactive JSON data inspection with copy/download functionality
- Property Selection: Choose which data properties to include in exports
- Responsive Design: Mobile-friendly interface
- Error Handling: Comprehensive error states and loading indicators
- Web Scraping: Real-time data scraping from Music Charts Archive
Frontend Components
YearSelector: Dropdown for year selectionDateList: Grid of available chart dates for selected yearChartTable: Music chart data table with rankingsYearlyTopSongs: Enhanced yearly analytics display with calculation metricsJsonViewer: Interactive JSON data viewer with property selectionDownloadButton: JSON export functionality for weekly chartsYearlyDownloadButton: JSON export functionality for yearly chartsLayout: Main layout wrapper
Backend API Endpoints
GET /api/health- Health check endpointGET /api/chart/dates/:year- Get available chart dates for a yearGET /api/chart/data/:date- Get chart data for a specific dateGET /api/chart/yearly-top/:year- Get yearly top songs ranking with calculation metrics
Setup Instructions
Prerequisites
- Node.js (v14 or higher)
- npm or yarn
Backend Setup
-
Navigate to the backend directory:
cd backend -
Install dependencies:
npm install -
Create a
.envfile in the backend directory:PORT=3001 NODE_ENV=development CORS_ORIGIN=http://localhost:3000 -
Start the development server:
npm run dev
The backend will be available at http://localhost:3001
Frontend Setup
-
Navigate to the frontend directory:
cd frontend -
Install dependencies:
npm install -
Start the development server:
npm start
The frontend will be available at http://localhost:3000
API Documentation
Get Available Chart Dates
GET /api/chart/dates/:year
Parameters:
year(number): The year to get chart dates for (1970-present)
Response:
[
{
"date": "2024-01-06",
"formattedDate": "Jan 6, 2024"
},
{
"date": "2024-01-13",
"formattedDate": "Jan 13, 2024"
}
]
Get Chart Data
GET /api/chart/data/:date
Parameters:
date(string): Date in YYYY-MM-DD format
Response:
[
{
"order": 1,
"title": "Lovin On Me",
"artist": "Jack Harlow"
},
{
"order": 2,
"title": "Cruel Summer [re-release]",
"artist": "Taylor Swift"
}
]
Get Yearly Top Songs
GET /api/chart/yearly-top/:year
Parameters:
year(number): The year to get yearly top songs for
Response:
[
{
"order": 1,
"title": "Luther",
"artist": "Kendrick Lamar",
"totalPoints": 245,
"highestPosition": 1,
"appearances": 8
},
{
"order": 2,
"title": "Die With A Smile",
"artist": "Lady Gaga",
"totalPoints": 198,
"highestPosition": 2,
"appearances": 6
}
]
Data Source
This application scrapes data from Music Charts Archive, a comprehensive database of historical music charts. The scraper extracts:
- Chart Dates: Available chart dates for each year
- Song Rankings: Top songs with their chart positions, titles, and artists
- Yearly Analytics: Multi-factor ranking algorithm calculating:
- Total Points: Reverse scoring (1st = 50 points, 50th = 1 point)
- Highest Position: Best position achieved during the year
- Appearances: Number of weeks on the charts
Development
Adding New Features
- Frontend: Add new components in
frontend/src/components/ - Backend: Add new routes in
backend/src/routes/and controllers inbackend/src/controllers/ - Scraping: Modify
backend/src/models/chartService.jsfor new data extraction - Styling: Modify
frontend/src/styles.cssfor global styles - API Integration: Update
frontend/src/services/chartApi.jsfor new endpoints - JSON Formatting: Modify
frontend/src/components/JsonViewer.jsxfor custom data formats
Testing
- Backend tests:
cd backend && npm test - Frontend tests:
cd frontend && npm test
Technologies Used
Frontend
- React 18
- Custom Hooks
- CSS3 with modern features
- Axios for API calls (with extended timeouts for yearly calculations)
Backend
- Node.js
- Express.js
- Cheerio for HTML parsing
- Axios for web scraping (with optimized timeouts)
- CORS for cross-origin requests
- Helmet for security headers
- Morgan for logging
Performance Optimizations
Timeout Management
- Frontend API Timeout: 5 minutes (300,000ms) for yearly calculations
- Backend Scraping Timeout: 30 seconds per request
- Request Delays: 100ms between requests to respect server limits
Data Processing
- Progress Logging: Real-time progress updates during yearly calculations
- Error Recovery: Graceful handling of failed requests with continuation
- Memory Optimization: Efficient data structures for large datasets
Legal Notice
This application is for educational and research purposes. Please respect the terms of service of the Music Charts Archive website and implement appropriate rate limiting and caching strategies for production use.
License
MIT License