Machine Learning Project Plan
Original here → https://github.com/Data-Learn/data-science/blob/main/ML-101%20Modules/Module%2001/Lesson%2003/Plan.ipynb
From the ML course by Anastasia Rizzo on datalearn.ru
General project title
«Detailed project title.»
Table of Contents
Part 0: Introduction
Overview
What this dataset is about, plus its metadata:
- Rank: ranking of overall sales
- Name: the game's name
- Year: year of the game's release
Assumptions
Explanations/clarifications
Questions:
Questions that need to be answered
- #### Question 1:
- #### Question 2:
- #### Question 3:
Part 1: Import, Settings, Load Data
- ### Import libraries, Create settings, Read data from ‘.csv’ file
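As a minimal sketch of the read step, the block below parses CSV text with Python's standard `csv` module. The inline sample, its rows, and the helper name `read_rows` are hypothetical stand-ins; only the column names come from the metadata list in Part 0. A real notebook would open the actual ‘.csv’ file instead, often through a data-frame library.

```python
import csv
import io

# Hypothetical stand-in for the real '.csv' file; the rows are made up.
SAMPLE_CSV = """Rank,Name,Year
1,Game A,2006
2,Game B,1985
3,Game C,
"""


def read_rows(source):
    """Read CSV text from a file-like object into a list of dicts (all values are strings)."""
    return list(csv.DictReader(source))


rows = read_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())
print(columns)
print(len(rows), "rows loaded")
```

For a file on disk, the same helper can be called as `read_rows(open(path, newline=""))`, where `path` is whatever location the dataset is stored at.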
Part 2: Exploratory Data Analysis
- ### Info, Head, Describe
- ### Observation of target variable «…»
- ### Missing Data
- #### List of data features with missing values (visualisation: which chart, graph, or plot do we use?)
- #### Filling missing values
- ### Numerical and Categorical features
- #### List of Numerical and Categorical features
- #### Numerical features:
- Head
- Visualisation of Numerical features (which chart, graph, or plot do we use?)
- Outliers (visualisation: which chart, graph, or plot do we use?)
- Correlation of Numerical features to the target
- #### Categorical Features:
- Head
- Visualisation of Categorical features (which chart, graph, or plot do we use?)
- Convert Categorical into Numerical features
- Drop all old Categorical features
- #### Correlation of new features to the target. Drop all features with weak correlation to the target.
- #### Visualisation of all data features with strong correlation to target (visualisation: heatmap)
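Three of the steps in this part (listing missing values, filling them, and dropping features weakly correlated with the target) can be sketched in plain Python as follows. All function names are hypothetical. Median filling is only one possible strategy, and the `threshold=0.5` cut-off for "weak" correlation is an arbitrary assumption, not a value taken from the plan.

```python
import math
import statistics


def count_missing(rows, columns):
    """Count empty-string or None values per column in a list of dict rows."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def fill_missing_with_median(values):
    """Replace None entries with the median of the present values.

    Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target reaches the threshold."""
    selected = {}
    for name, column in features.items():
        r = pearson(column, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected
```

Pearson correlation only captures linear relationships, so a feature dropped this way may still carry non-linear signal; the threshold is worth revisiting per dataset.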
Part 3: Data Wrangling and Transformation
- ### Multicollinearity
- ### Standard Scaler
- ### Creating datasets for ML part
- ### ‘Train/Test’ splitting method
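The scaling and splitting steps can be illustrated with a small stdlib-only sketch. The function names, the `test_ratio=0.2` default, and the fixed `seed` are assumptions made for illustration; ML libraries offer equivalent utilities. The scaler statistics are computed from the training part only, a common precaution against leaking test information into training.

```python
import random
import statistics


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle row indices reproducibly and split the rows into (train, test)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_ratio))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_scaler(values):
    """Return (mean, population std) of the values; a zero std is replaced by 1.0."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std or 1.0


def transform(values, mean, std):
    """Standardise values to zero mean and unit variance using the given statistics."""
    return [(v - mean) / std for v in values]


# Usage: fit on the training part only, then apply the same statistics to both parts.
data = [float(x) for x in range(10)]
train, test = train_test_split(data)
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```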
Part 4: Machine Learning
- ### ML Models
- ### Build and train models
- ### Evaluate models
- #### If Regression: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared (R²)
- #### If Classification: Classification Report and Confusion Matrix
- ### Hyperparameter tuning (if needed)
- ### Creating final predictions with Test set
- ### If Classification: AUC-ROC curve (if needed)
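To make the evaluation metrics named in this part concrete, here is a hedged sketch that computes them from scratch. The function names are hypothetical, and the confusion-matrix helper assumes binary labels encoded as 0 and 1; in practice these metrics are usually taken from an ML library rather than hand-written.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


def confusion_matrix(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed to be 0 or 1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["tn" if t == 0 else "fn"] += 1
    return counts
```

RMSE is expressed in the same units as the target, which tends to make it easier to interpret than MSE, while MAE is less sensitive to large individual errors.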
Conclusion
- ### Submission of ‘.csv’ file with predictions
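A possible shape for the submission step is sketched below with the standard `csv` module. The column names `Id` and `Prediction`, the helper name `write_submission`, and the sample values are all hypothetical; the required layout depends on where the predictions are submitted.

```python
import csv
import io


def write_submission(ids, predictions, target):
    """Write a header row plus one (id, prediction) row per item to a file-like object."""
    writer = csv.writer(target)
    writer.writerow(["Id", "Prediction"])  # hypothetical column names
    for row_id, prediction in zip(ids, predictions):
        writer.writerow([row_id, prediction])


# Usage with an in-memory buffer; the ids and predictions are made-up placeholders.
buffer = io.StringIO()
write_submission([1, 2], [0.5, 1.5], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so that line endings are not translated twice.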