Decision Tree Algorithm Comparison

MinhDungDo edited this page Dec 21, 2020 · 3 revisions

Survey of Decision Tree Algorithms

Introduction

Decision Tree belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, a decision tree can be used for solving both regression and classification problems. This is one reason decision trees are often applied to medical data, where patients are classified according to their conditions.

Decision Tree Algorithms

There are several statistical algorithms that can be used to build a decision tree. Depending on our clinical data set for heart failure prediction, we will need to choose a decision tree algorithm accordingly.

The table below provides a brief comparison between decision tree algorithms. The information in this table is gathered from various articles: ResearchGate, TowardsDataScience

| Methods | CART | C4.5 | CHAID |
| --- | --- | --- | --- |
| Measure used to select input variable | Gini index | Gain ratio | Chi-square or F-tests |
| Pruning | Pre-pruning using a single-pass algorithm | Pre-pruning using a single-pass algorithm | Pre-pruning using Chi-square test for independence |
| Dependent variable | Categorical or Continuous | Categorical or Continuous | Categorical |
| Input variable | Categorical or Continuous | Categorical or Continuous | Categorical or Continuous |
| Split at each node | Split on linear combinations | Multiple | Multiple |

CHAID

The Chi-squared Automatic Interaction Detection (CHAID) algorithm creates multi-way Decision Trees. CHAID is suitable for both classification and regression tasks. The Decision Trees created by CHAID tend to be wider rather than deeper. CHAID is neither the most powerful method for detecting the smallest possible differences nor the fastest algorithm available.
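As noted in the table, CHAID selects splits using a chi-square test of independence between a candidate predictor and the target. A minimal sketch of the Pearson chi-square statistic that such a test is based on (the function name and example numbers are illustrative, not from this wiki):

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a 2D contingency table
    (rows: categories of a candidate predictor, columns: target classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence of predictor and target
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical example: a yes/no risk factor vs. a yes/no outcome.
# A larger statistic means stronger evidence of association, so CHAID
# would prefer the predictor with the most significant statistic.
print(chi_square_statistic([[30, 10], [20, 40]]))
```

In the full algorithm this statistic is converted to a p-value and compared across candidate predictors; the sketch above only shows the core computation.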

CART

CART is a Decision Tree algorithm that creates binary classification or regression trees. It can handle data in raw form and can reuse variables within the same decision tree.
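Per the table, CART picks its binary splits by minimizing the Gini index. A sketch of finding the best threshold for one continuous variable, assuming illustrative function names and made-up numbers:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(values, labels):
    """Return (threshold, weighted Gini) of the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs[:i]]
        right = [y for x, y in pairs[i:]]
        # impurity of the split = size-weighted average of child impurities
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Illustrative numbers only: a continuous measurement vs. a binary outcome.
print(best_binary_split([20, 25, 38, 45, 60], [1, 1, 0, 0, 0]))
```

A full CART implementation repeats this search over every input variable at every node and recurses on the two children; the sketch shows a single split decision.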

C4.5

C4.5 can handle both continuous and categorical data, making it able to create regression and classification trees. It can also handle missing values by ignoring instances that contain missing data. A later version of the C4.5 algorithm, called C5, offers some improvements:

  • Improved speed
  • Lower memory usage
  • Smaller decision trees
  • Additional data types
  • Winnowing
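As the table notes, C4.5 scores splits by gain ratio: information gain normalized by the entropy of the split itself, which discourages splits with many small branches. A minimal sketch, with illustrative names and a toy partition:

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(labels, groups):
    """Information gain of partitioning labels into groups,
    divided by the split information (entropy of the group sizes)."""
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups)
    return gain / split_info

# Toy example: four samples split into two equal groups that
# perfectly separate the two classes.
print(gain_ratio([1, 1, 0, 0], [[1, 1], [0, 0]]))
```

C4.5 evaluates this ratio for every candidate attribute at a node and splits on the highest-scoring one; the sketch covers only the scoring step.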