MusicCharts/docs/PRD.md

401 lines
12 KiB
Markdown

# Product Requirements Document (PRD)
## Music Charts Archive Scraper & Analytics Platform
---
## 📋 **Executive Summary**
A full-stack web application that scrapes, analyzes, and visualizes music chart data from the Music Charts Archive website. The platform provides both weekly chart views and comprehensive yearly rankings, enabling users to explore historical music trends and download data for further analysis.
---
## 🎯 **Product Vision**
Create a comprehensive music analytics platform that transforms raw chart data into actionable insights, making historical music trends accessible and analyzable for researchers, music enthusiasts, and data analysts.
---
## 🎵 **Core Features**
### **1. Data Scraping Engine**
- **Weekly Chart Scraping**: Extract song rankings from individual chart dates
- **Yearly Data Aggregation**: Collect and analyze data across all available dates for a year
- **Real-time Data Fetching**: Live scraping from Music Charts Archive website
- **Error Handling**: Graceful handling of missing data and network issues
### **2. Weekly Chart View**
- **Date Selection**: Browse available chart dates by year
- **Chart Display**: View top 50 songs with rankings, titles, and artists
- **Data Export**: Download weekly chart data in JSON format
- **Responsive Design**: Mobile and desktop optimized interface
### **3. Yearly Analytics**
- **Yearly Rankings**: Calculate top songs of the year based on:
- Total points (position-based scoring)
- Highest achieved position
- Number of appearances
- **Smart Algorithm**: Multi-factor ranking system
- **Yearly Export**: Download comprehensive yearly data
### **4. Data Export System**
- **JSON Format**: Structured data export with metadata
- **File Naming**: Date/year-based file naming convention
- **Metadata Inclusion**: Chart date, total songs, formatted titles
---
## 🔧 **Technical Architecture**
### **Backend Requirements**
#### **Core Technologies (Current: Node.js/Express)**
- **Server Framework**: Node.js/Express, Python/Flask, Java/Spring Boot, C#/.NET, Go/Gin
- **Web Scraping**: Cheerio (Node.js), BeautifulSoup (Python), Jsoup (Java), HtmlAgilityPack (C#), goquery (Go)
- **HTTP Client**: Axios (Node.js), requests (Python), OkHttp (Java), HttpClient (C#), net/http (Go)
- **Data Processing**: JavaScript, Python, Java, C#, Go
#### **API Endpoints**
```
GET /api/health
- Purpose: Health check endpoint
- Response: Status confirmation
GET /api/chart/dates/:year
- Purpose: Get available chart dates for a year
- Parameters: year (number)
- Response: Array of date objects with date and formattedDate
GET /api/chart/data/:date
- Purpose: Get chart data for specific date
- Parameters: date (YYYY-MM-DD format)
- Response: Array of song objects with order, title, artist
GET /api/chart/yearly-top/:year
- Purpose: Get yearly top songs ranking
- Parameters: year (number)
- Response: Array of top 50 songs with order, title, artist
```
#### **Data Models**
```javascript
// Date Object
{
date: "2024-01-20",
formattedDate: "Jan 20, 2024"
}
// Song Object
{
order: 1,
title: "Lovin On Me",
artist: "Jack Harlow"
}
// Weekly Chart Export
{
title: "January 20, 2024 - Top Songs",
date: "2024-01-20",
totalSongs: 50,
songs: [Song Object Array]
}
// Yearly Chart Export
{
year: 2024,
title: "Top Songs of 2024",
songs: [Song Object Array]
}
```
#### **Scraping Logic**
1. **Date Extraction**: Parse HTML for chart date links
2. **Song Data Extraction**: Parse table rows for song information
3. **Data Validation**: Ensure data integrity and completeness
4. **Error Recovery**: Skip problematic dates and continue processing
#### **Yearly Ranking Algorithm**
```javascript
// Scoring System
- Position Points: 1st = 50 points, 50th = 1 point
- Highest Position: Track best position achieved
- Appearance Count: Number of weeks on charts
// Sorting Priority
1. Total Points (descending)
2. Highest Position (ascending)
3. Appearance Count (descending)
```
### **Frontend Requirements**
#### **Core Technologies (Current: React)**
- **UI Framework**: React, Vue.js, Angular, Svelte, Flutter (mobile), SwiftUI (iOS), Jetpack Compose (Android)
- **State Management**: React Hooks, Vuex, Redux, NgRx, Svelte stores
- **HTTP Client**: Axios, fetch API, HttpClient
- **Styling**: CSS3, SCSS, Tailwind CSS, Material-UI, Bootstrap
#### **Component Architecture**
```
App
├── Layout
├── YearSelector
├── ViewModeSelector (Weekly/Yearly)
├── DateList (Weekly mode)
├── ChartTable
├── YearlyTopSongs
├── DownloadButton
└── YearlyDownloadButton
```
#### **State Management**
```javascript
// Core State
{
selectedYear: number,
selectedDate: string | null,
viewMode: 'weekly' | 'yearly',
dates: DateObject[],
chartData: SongObject[],
yearlySongs: SongObject[]
}
// Loading States
{
datesLoading: boolean,
dataLoading: boolean,
yearlyLoading: boolean
}
// Error States
{
datesError: string | null,
dataError: string | null,
yearlyError: string | null
}
```
#### **User Interface Requirements**
- **Responsive Design**: Mobile-first approach
- **Loading States**: Visual feedback during data fetching
- **Error Handling**: User-friendly error messages
- **Accessibility**: WCAG 2.1 compliance
- **Cross-browser Compatibility**: Modern browser support
---
## 📱 **Mobile Platform Considerations**
### **iOS (Swift/SwiftUI)**
```swift
// Data Models
struct ChartDate: Codable {
let date: String
let formattedDate: String
}
struct Song: Codable {
let order: Int
let title: String
let artist: String
}
// Network Layer
class ChartAPIService {
func fetchChartDates(year: Int) async throws -> [ChartDate]
func fetchChartData(date: String) async throws -> [Song]
func fetchYearlyTopSongs(year: Int) async throws -> [Song]
}
// UI Components
struct YearSelector: View
struct ChartTableView: View
struct YearlyTopSongsView: View
struct DownloadButton: View
```
### **Android (Kotlin/Jetpack Compose)**
```kotlin
// Data Models
data class ChartDate(
val date: String,
val formattedDate: String
)
data class Song(
val order: Int,
val title: String,
val artist: String
)
// Network Layer
class ChartAPIService {
suspend fun fetchChartDates(year: Int): List<ChartDate>
suspend fun fetchChartData(date: String): List<Song>
suspend fun fetchYearlyTopSongs(year: Int): List<Song>
}
// UI Components
@Composable fun YearSelector()
@Composable fun ChartTable()
@Composable fun YearlyTopSongs()
@Composable fun DownloadButton()
```
---
## 🎵 **Data Flow**
### **Weekly Chart Flow**
1. User selects year → Fetch available dates
2. User selects date → Fetch chart data
3. Display chart table → Enable download
4. User downloads → Generate JSON file
### **Yearly Chart Flow**
1. User selects year → Fetch all dates for year
2. System scrapes all dates → Calculate rankings
3. Display yearly top songs → Enable download
4. User downloads → Generate yearly JSON file
---
## 🛠 **Development Guidelines**
### **Backend Development**
1. **Error Handling**: Implement comprehensive error handling
2. **Rate Limiting**: Respect website terms of service
3. **Caching**: Implement caching for frequently accessed data
4. **Logging**: Comprehensive logging for debugging
5. **Testing**: Unit tests for core algorithms
### **Frontend Development**
1. **Component Reusability**: Create reusable UI components
2. **State Management**: Centralized state management
3. **Performance**: Optimize for large datasets
4. **User Experience**: Smooth loading and error states
5. **Accessibility**: Screen reader and keyboard navigation support
### **Mobile Development**
1. **Offline Support**: Cache data for offline viewing
2. **Native Features**: Share functionality, file downloads
3. **Performance**: Optimize for mobile devices
4. **Platform Guidelines**: Follow iOS/Android design guidelines
---
## 📊 **Success Metrics**
### **Technical Metrics**
- **API Response Time**: < 2 seconds for chart data
- **Scraping Success Rate**: > 95% successful data extraction
- **Error Rate**: < 5% failed requests
- **Uptime**: > 99% availability
### **User Experience Metrics**
- **Page Load Time**: < 3 seconds
- **User Engagement**: Time spent on platform
- **Download Rate**: Percentage of users downloading data
- **Error Recovery**: Successful error resolution rate
---
## 🔒 **Security & Compliance**
### **Data Protection**
- **Rate Limiting**: Prevent abuse of scraping endpoints
- **User Agent**: Proper identification in requests
- **Error Handling**: No sensitive data in error messages
- **CORS**: Proper cross-origin resource sharing
### **Legal Considerations**
- **Terms of Service**: Respect Music Charts Archive terms
- **Data Usage**: Educational and research purposes
- **Attribution**: Proper credit to data source
- **Rate Limiting**: Responsible scraping practices
---
## 🚀 **Future Enhancements**
### **Phase 2 Features**
- **Historical Trends**: Visual charts showing song performance over time
- **Artist Analytics**: Artist-specific statistics and rankings
- **Genre Analysis**: Genre-based filtering and analysis
- **Advanced Filtering**: Date range, artist, song title filters
### **Phase 3 Features**
- **User Accounts**: Save favorite charts and analyses
- **Social Features**: Share charts and insights
- **API Access**: Public API for developers
- **Data Visualization**: Interactive charts and graphs
---
## 📝 **Implementation Checklist**
### **Backend Setup**
- [ ] Choose server framework and language
- [ ] Set up web scraping library
- [ ] Implement API endpoints
- [ ] Create data models
- [ ] Implement yearly ranking algorithm
- [ ] Add error handling and logging
- [ ] Set up testing framework
### **Frontend Setup**
- [ ] Choose UI framework
- [ ] Set up state management
- [ ] Create reusable components
- [ ] Implement API integration
- [ ] Add download functionality
- [ ] Implement responsive design
- [ ] Add loading and error states
### **Mobile Setup (if applicable)**
- [ ] Set up mobile development environment
- [ ] Create data models
- [ ] Implement network layer
- [ ] Create UI components
- [ ] Add native features (download, share)
- [ ] Test on devices
---
## 🎯 **Current Implementation Details**
### **Tech Stack**
- **Backend**: Node.js, Express.js, Cheerio, Axios
- **Frontend**: React 18, Custom Hooks, CSS3
- **Development**: npm, nodemon
### **Project Structure**
```
m/
├── frontend/ # React frontend application
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── services/ # API service layer
│ │ ├── App.jsx # Main app component
│ │ ├── index.js # Entry point
│ │ └── styles.css # Global styles
│ ├── public/
│ └── package.json
└── backend/ # Node.js/Express backend API
├── src/
│ ├── routes/ # API routes
│ ├── controllers/ # Route controllers
│ ├── models/ # Data models & scraping logic
│ ├── middleware/ # Express middleware
│ └── server.js # Main server file
└── package.json
```
### **Key Files**
- `start.sh` - Startup script for both services
- `README.md` - Setup and usage instructions
- `PRD.md` - This product requirements document
---
This PRD provides a comprehensive blueprint for recreating the Music Charts Archive Scraper in any technology stack, from web frameworks to mobile platforms. The modular architecture and clear data flow make it adaptable to different implementation approaches while maintaining the core functionality and user experience.