Project information
- Category: Machine Learning
- Tools: Random Forest, XGBoost, Scikit-Learn
- Adjusted R²: 0.88
- Project URL: https://github.com/Mazenasag/End-to-end-student-performance-prediction-system
Project Details
🎓 Student Performance Prediction
📌 Problem Statement
This project analyzes how students' test scores are influenced by Gender, Ethnicity, Parental Level of Education, Lunch Type, and Test Preparation Course. The goal is to build predictive models that estimate a student's score from these variables.
📊 Data Collection
Dataset Source: Kaggle - Students Performance in Exams
Features Description:
- gender: Sex of students (Male/Female)
- race/ethnicity: Ethnicity of students (Group A, B, C, D, E)
- parental level of education: Parents' highest education level
- lunch: Lunch type before test (Standard/Free or Reduced)
- test preparation course: Completion status (Completed/Not Completed)
- math score, reading score, writing score: Students' performance in each subject
Dataset Summary:
- 8 columns, 1000 rows
- Mean score per subject: between 66 and 68.05
- Standard deviation per subject: roughly 14.6 to 15.2
- Minimum scores: Math (0), Writing (10), Reading (17)
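The summary figures above can be recomputed with a few lines of pandas. The sketch below is only an illustration: it assumes the column names listed under Features Description, and it runs on three made-up rows (including assumed category spellings) rather than the real Kaggle file. The helper name `summarize_scores` is hypothetical.

```python
import io

import pandas as pd

# Toy rows in the dataset's column layout; every value here is made up for illustration.
TOY_CSV = """gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,not completed,72,72,74
male,group C,some college,free/reduced,completed,47,57,44
female,group A,high school,standard,not completed,90,95,93
"""

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def summarize_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, standard deviation, and minimum for each score column."""
    return df[SCORE_COLUMNS].agg(["mean", "std", "min"])


# With the real data, pass the path of the downloaded CSV instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(TOY_CSV))
summary = summarize_scores(df)

print(df.shape)   # (rows, columns)
print(summary)
```

Run against the full dataset, the first line should report the 1000 rows and 8 columns noted above.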
📈 Model Performance Comparison:
- Linear Regression: R² = 0.88 (best of the models compared)
- Ridge Regression: R² close to Linear Regression
- Random Forest & XGBoost: Generalized well, with R² slightly below Ridge
- Decision Tree: Overfit the training data
- Lasso: Weakest of the models compared
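The project information lists an adjusted R², while the comparison above quotes plain R². As a reminder of how the two relate, here is a minimal sketch of the standard adjustment formula. The function name `adjusted_r2` is hypothetical, and the sample size and predictor count in the usage line are illustrative assumptions (the real predictor count depends on how the categorical features are encoded), not figures from this project.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Return adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Illustrative values only: 200 evaluation rows and 19 encoded predictors are assumptions.
example = adjusted_r2(0.88, n_samples=200, n_features=19)
print(round(example, 4))
```

The adjustment penalizes extra predictors, so with a large sample and few features the adjusted value stays close to the plain R².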
🛠️ How to Run the Project:
1. Clone the repository:
git clone https://github.com/Mazenasag/End-to-end-student-performance-prediction-system.git
2. Navigate to project folder:
cd End-to-end-student-performance-prediction-system
3. Install dependencies:
pip install -r requirements.txt
4. Run the app:
python app.py
5. Open in browser:
http://127.0.0.1:5000/
🚀 Future Enhancements:
- Deploy the model using Streamlit or Flask
- Tune models with GridSearchCV
- Try deep learning techniques
- Add advanced feature engineering
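The GridSearchCV item above could look roughly like the following. This is a sketch under stated assumptions, not the project's code: it tunes Ridge regression on synthetic data standing in for the encoded student features, and the `alpha` grid is a hypothetical starting point.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (NOT the Kaggle dataset): 200 rows, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

# Hypothetical grid; sensible alpha values depend on the real, encoded features.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation, scoring each candidate by R^2.
search = GridSearchCV(Ridge(), param_grid, scoring="r2", cv=5)
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("mean CV R^2:", round(search.best_score_, 3))
```

The same pattern applies to Random Forest or XGBoost by swapping the estimator and listing that model's hyperparameters in `param_grid`.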
👥 Contributor: Mazen Asag
📜 License: MIT License
Feel free to contribute and improve the model! 🚀