Decision Tree Algorithm Comparison

MinhDungDo edited this page Dec 21, 2020 · 3 revisions

Survey of Decision Tree Algorithms

Introduction

Decision Tree belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, a decision tree can be used for solving both regression and classification problems. This is one reason decision trees are often applied to medical data, where patients are classified according to their conditions.

Decision Tree Algorithms

There are several statistical algorithms that can be used to build a decision tree. Depending on our clinical data set for heart failure prediction, we will need to choose a decision tree algorithm accordingly.

The table below provides a brief comparison between decision tree algorithms. The information in this table is gathered from various articles: ResearchGate, TowardsDataScience

| Methods | CART | C4.5 | CHAID |
| --- | --- | --- | --- |
| Measure used to select input variable | Gini index | Gain ratio | Chi-square or F-tests |
| Pruning | Pre-pruning using a single-pass algorithm | Pre-pruning using a single-pass algorithm | Pre-pruning using Chi-square test for independence |
| Dependent variable | Categorical or Continuous | Categorical or Continuous | Categorical |
| Input variable | Categorical or Continuous | Categorical or Continuous | Categorical or Continuous |
| Split at each node | Split on linear combinations | Multiple | Multiple |

CHAID

The Chi-squared Automatic Interaction Detection (CHAID) algorithm creates multi-way Decision Trees. CHAID is suitable for both classification and regression tasks. The Decision Trees created by CHAID tend to be wider rather than deeper. CHAID is neither the most powerful method for detecting the smallest possible differences nor the fastest algorithm available.
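As noted in the table, CHAID selects splits using a chi-square test of independence between a candidate predictor and the target. A minimal sketch of the Pearson chi-square statistic that such a test is based on (the function name and example numbers are illustrative, not from this wiki):

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a 2D contingency table
    (rows: categories of a candidate predictor, columns: target classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence of predictor and target
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical example: a yes/no risk factor vs. a yes/no outcome.
# A larger statistic means stronger evidence of association, so CHAID
# would prefer the predictor with the most significant statistic.
print(chi_square_statistic([[30, 10], [20, 40]]))
```

In the full algorithm this statistic is converted to a p-value and compared across candidate predictors; the sketch above only shows the core computation.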

CART

CART is a Decision Tree algorithm that creates binary classification or regression trees. It can handle data in raw form and can reuse variables within the same decision tree.
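Per the table, CART picks its binary splits by minimizing the Gini index. A sketch of finding the best threshold for one continuous variable, assuming illustrative function names and made-up numbers:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(values, labels):
    """Return (threshold, weighted Gini) of the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs[:i]]
        right = [y for x, y in pairs[i:]]
        # impurity of the split = size-weighted average of child impurities
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Illustrative numbers only: a continuous measurement vs. a binary outcome.
print(best_binary_split([20, 25, 38, 45, 60], [1, 1, 0, 0, 0]))
```

A full CART implementation repeats this search over every input variable at every node and recurses on the two children; the sketch shows a single split decision.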

C4.5

C4.5 can handle both continuous and categorical data, making it able to create regression and classification trees. It can also handle missing values by ignoring instances that contain missing data. A later version of the C4.5 algorithm, called C5, offers some improvements:

  • Improved speed
  • Lower memory usage
  • Smaller decision trees
  • Additional data types
  • Winnowing
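As the table notes, C4.5 scores splits by gain ratio: information gain normalized by the entropy of the split itself, which discourages splits with many small branches. A minimal sketch, with illustrative names and a toy partition:

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(labels, groups):
    """Information gain of partitioning labels into groups,
    divided by the split information (entropy of the group sizes)."""
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups)
    return gain / split_info

# Toy example: four samples split into two equal groups that
# perfectly separate the two classes.
print(gain_ratio([1, 1, 0, 0], [[1, 1], [0, 0]]))
```

C4.5 evaluates this ratio for every candidate attribute at a node and splits on the highest-scoring one; the sketch covers only the scoring step.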