Decision Tree in Python : Metaprotein

Rahul Mondal edited this page Jun 2, 2021 · 16 revisions

Decision Tree

We implemented the following node-splitting criteria in our Decision Trees to compare their accuracy:

  • Gini Index
  • Information Gain (Entropy)

The raw dataset has metaproteins as rows and, in the columns, samples from patients of three types tested for the presence of each metaprotein (along with metaprotein demographic columns).

To suit our decision tree model, we removed the demographic columns from the dataset and transposed the data frame so that metaproteins become columns/variables and patients become rows.

We created a class label, "Patient type", with three factors: C, UC and CD.

We used half (1/2) of our Metaprotein Dataset as the Training Dataset and the other half (1/2) as the Testing Dataset.
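The preprocessing steps above can be sketched as follows. This is a minimal illustration on a tiny synthetic table; the column and sample names (e.g. "Description", "C_01") are hypothetical stand-ins, and the real data would be loaded from the metaprotein file instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: metaproteins as rows, patient samples as columns,
# plus a demographic column ("Description" is a hypothetical name).
raw = pd.DataFrame(
    {
        "Description": ["protA", "protB", "protC"],  # demographic column to drop
        "C_01": [1.2, 0.0, 3.4],
        "C_02": [0.9, 0.1, 2.8],
        "UC_01": [0.0, 2.5, 0.3],
        "UC_02": [0.2, 2.1, 0.4],
        "CD_01": [3.1, 0.4, 0.0],
        "CD_02": [2.7, 0.6, 0.1],
    },
    index=["MP1", "MP2", "MP3"],
)

# Drop demographics and transpose: metaproteins -> columns, patients -> rows.
df = raw.drop(columns=["Description"]).T

# Class label with three factors (assumes the type is encoded in the sample name).
df["Patient type"] = [s.split("_")[0] for s in df.index]

X = df.drop(columns=["Patient type"])
y = df["Patient type"]

# 1/2 training, 1/2 testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
```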


Gini Index

  • depth = 2
  • leaf nodes = 4
*(Figure: full decision tree, Gini index)*

Confusion Matrix: Prediction on Test Dataset

*(Figure: confusion matrix, full Gini tree)*

Accuracy = 19/24 ≈ 79.17 %
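A minimal sketch of fitting a Gini-based tree with scikit-learn. The iris data is used here only as a stand-in three-class dataset; in the actual analysis, `X` and `y` come from the preprocessed metaprotein data frame.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in three-class dataset; substitute the metaprotein features here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Full (unpruned) tree grown with the Gini index splitting criterion.
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=0)
gini_tree.fit(X_train, y_train)

pred = gini_tree.predict(X_test)
print(confusion_matrix(y_test, pred))   # rows = true class, columns = predicted
print("accuracy:", accuracy_score(y_test, pred))
```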


Information Gain (Entropy)

  • depth = 2
  • leaf nodes = 4
*(Figure: full decision tree, Information Gain/entropy)*

Confusion Matrix: Prediction on Test Dataset

*(Figure: confusion matrix, full entropy tree)*

Accuracy = 17/24 = 70.83 %
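Comparing the two criteria only requires changing the `criterion` argument; everything else stays the same. A sketch of the side-by-side comparison (again with iris as a stand-in for the metaprotein features):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fit one tree per splitting criterion and report size and test accuracy.
scores = {}
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X_train, y_train)
    scores[criterion] = tree.score(X_test, y_test)
    print(criterion,
          "depth:", tree.get_depth(),
          "leaves:", tree.get_n_leaves(),
          "accuracy:", scores[criterion])
```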


Pruning

A tree that is too large risks overfitting the training data and generalizing poorly to new samples. We therefore pruned the decision trees by tuning two parameters: "max_depth" and "max_leaf_nodes".
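One way to tune these two parameters is a cross-validated grid search. A sketch using scikit-learn's `GridSearchCV` (the parameter ranges below are illustrative assumptions, not the ones actually searched, and iris again stands in for the metaprotein data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grid over the two pruning parameters; ranges are illustrative only.
param_grid = {"max_depth": [1, 2, 3, 4], "max_leaf_nodes": [2, 3, 4, 5, 6]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```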

*(Figure: best entropy and Gini parameter tuning results)*

Pruned Decision Tree: Gini Index

  • depth = 1
  • leaf nodes = 2
*(Figure: pruned decision tree, Gini index)*

Confusion Matrix: Prediction on Test Dataset

*(Figure: confusion matrix, pruned Gini tree)*

Accuracy = 54.1 %


Pruned Decision Tree: Information Gain (Entropy)

  • depth = 2
  • leaf nodes = 3
*(Figure: pruned decision tree, Information Gain/entropy)*

Confusion Matrix: Prediction on Test Dataset

*(Figure: confusion matrix, pruned entropy tree)*

Accuracy = 70.8 %
