Please note that the dataset has been removed from the data/ directory to prevent dataset leakage. Remember to add the dataset to this directory and to update the data path in data_processing.py before running the code.
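One way to make a missing dataset obvious is to keep the path in a single variable and fail early with a clear message. The sketch below makes several assumptions: the file name `dataset.csv`, the constant `DEFAULT_DATA_PATH` and the helper `load_dataset` are all hypothetical, and the real data_processing.py may load the data differently (for example with pandas). The default path is relative to the current working directory, so it only resolves once the working directory is set to the project root.

```python
import csv
from pathlib import Path

# Hypothetical default location: <working directory>/data/dataset.csv.
# Replace "dataset.csv" with the actual file name of the dataset.
DEFAULT_DATA_PATH = Path("data") / "dataset.csv"


def load_dataset(path=DEFAULT_DATA_PATH):
    """Read a CSV file into a list of row dicts, failing early if it is missing."""
    path = Path(path)
    if not path.is_file():
        # Point the user at both possible fixes: add the file or change the path.
        raise FileNotFoundError(
            f"Dataset not found at {path.resolve()}. "
            "Copy the CSV file into data/ or update the path in data_processing.py."
        )
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Raising `FileNotFoundError` with the resolved absolute path makes it easy to see whether the problem is a missing file or a wrong working directory.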
To clone the project:
git clone https://github.com/ACSEkevin/Industrial-Programme-with-AMRC-Sheffield.git
checkpoint/: stores model weights as HDF5 files
data/: stores the dataset CSV file
itpma3_utils/:
utils.py: wrapping functions and classes that are frequently used
models/: machine learning models
data_processing.py: data analysis, preprocessing, feature engineering
train.py: model training
evaluate.py: model evaluation
requirements.py: checks package versions and detects which packages are available
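A version and availability check of the kind requirements.py performs can be written with the standard library alone. This is a minimal sketch, not the project's actual script: the package list below is an assumption drawn from the libraries named in this README, and `check_packages` is a hypothetical helper. Import names and pip distribution names differ for some packages (`sklearn` vs `scikit-learn`), so both are recorded.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Assumed mapping of import name -> distribution (pip) name, based on the
# libraries mentioned in this README; the real requirements.py may differ.
PACKAGES = {
    "numpy": "numpy",
    "tensorflow": "tensorflow",
    "sklearn": "scikit-learn",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
}


def check_packages(packages=PACKAGES):
    """Return {import_name: (importable, installed_version_or_None)}."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec returns None for a top-level module that cannot be found.
        importable = importlib.util.find_spec(import_name) is not None
        try:
            installed = version(dist_name)
        except PackageNotFoundError:
            installed = None
        report[import_name] = (importable, installed)
    return report


if __name__ == "__main__":
    for name, (ok, ver) in check_packages().items():
        status = f"version {ver}" if ver else ("available" if ok else "NOT FOUND")
        print(f"{name:<12} {status}")
```

Using `find_spec` avoids actually importing heavy packages such as TensorFlow just to check that they exist.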
Notebook versions of data processing, model training and evaluation are also provided; they present clear overall visualizations.
NOTICE:
Please change the working directory before running the code. In Colab, the following commands might be helpful:
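# Mount Google Drive so that project files stored there are available under /content/drive
from google.colab import drive
drive.mount('/content/drive')
The models in this project are developed with Keras/TensorFlow (MLP) and Scikit-Learn (AdaBoost); XGBoost and LightGBM are used through their Scikit-Learn-compatible APIs. For any questions, please refer to: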
- Keras tutorial: building a model as a class (model subclassing)
- TensorFlow tutorial: saving model checkpoints, and saving and loading weights
- Scikit-Learn ensemble tutorial: ensemble learning and AdaBoost
- NumPy quick start
- XGBoost Scikit-Learn API tutorial
- LightGBM Scikit-Learn API tutorial
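Because AdaBoost, XGBoost and LightGBM share the Scikit-Learn `fit`/`predict` interface, one loop can train and score all of them. The sketch below assumes a classification task scored by accuracy, which this README does not state; `Estimator`, `accuracy` and `compare_models` are hypothetical names, not functions from this project.

```python
from typing import Any, Dict, Protocol, Sequence


class Estimator(Protocol):
    """The subset of the Scikit-Learn estimator API this sketch relies on."""

    def fit(self, X: Any, y: Any) -> Any: ...

    def predict(self, X: Any) -> Sequence[Any]: ...


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Fraction of predictions that equal the true labels."""
    matches = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def compare_models(
    models: Dict[str, Estimator], X_train, y_train, X_test, y_test
) -> Dict[str, float]:
    """Fit every model with identical calls and report test accuracy per model."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(list(y_test), list(model.predict(X_test)))
    return scores
```

With the libraries installed, the same function could be called as `compare_models({"adaboost": AdaBoostClassifier(), "xgboost": XGBClassifier(), "lightgbm": LGBMClassifier()}, X_train, y_train, X_test, y_test)`. The Keras MLP is left out because its `predict` typically returns raw model outputs (for example class probabilities) rather than labels, so it would need an extra conversion step.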
The project has six contributors. All page links will be refined in the future.