Decision Trees. Implemented

A simple, module-wise implementation of Decision Trees using the Gini index

Inner mechanisms

For this article, I’ll refer to the target column of the dataset as Y and the rest as X. At the very core of the system, we’ve got a three-level nested loop that on each iteration performs the following,

  1. The innermost loop runs through the candidate split values of a single column, scoring each candidate with the Gini index.
  2. The middle one repeats that for each column of X to find the one with the best split point, dubbed the best attribute. At this level, we’re gonna make the split. With the split made, we’ve got two new datasets, smaller in size and a little bit more homogeneous. We apply the same process to each new dataset until we either run out of features to split on, or the dataset reaches an absolutely pure state — every label of Y is the same.
  3. This brings us to the outermost loop, which iterates through each of the child datasets to execute the inner loops.
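The loops above can be sketched roughly like this. It’s a minimal illustration, not the article’s actual code: the names (`gini`, `best_split`, `build_tree`) are my own, the outermost loop is written as recursion over the child datasets, and X is assumed to be a pandas DataFrame with numeric columns.

```python
import numpy as np
import pandas as pd

def gini(y):
    """Gini index of a label series: 1 minus the sum of squared class shares."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Middle + innermost loops: for every column (middle) and every candidate
    value in it (innermost), score the split by the weighted Gini of the two
    resulting halves; lower is better."""
    best_col, best_value, best_score = None, None, np.inf
    for col in X.columns:               # middle loop: each column of X
        for value in X[col].unique():   # innermost loop: each split point
            left, right = y[X[col] <= value], y[X[col] > value]
            if len(left) == 0 or len(right) == 0:
                continue                # degenerate split, skip it
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_col, best_value, best_score = col, value, score
    return best_col, best_value

def build_tree(X, y):
    """Outermost loop, written as recursion: make the best split, then
    descend into each of the two child datasets and repeat."""
    if y.nunique() == 1:                # pure node: every label of Y the same
        return y.iloc[0]
    col, value = best_split(X, y)
    if col is None:                     # nothing left to split on
        return y.mode()[0]
    mask = X[col] <= value
    return {
        "attribute": col,
        "value": value,
        "left": build_tree(X[mask], y[mask]),
        "right": build_tree(X[~mask], y[~mask]),
    }

# Tiny illustrative run:
X = pd.DataFrame({"height": [1.2, 1.4, 5.0, 5.5]})
y = pd.Series(["shrub", "shrub", "tree", "tree"])
tree = build_tree(X, y)
# tree == {"attribute": "height", "value": 1.4, "left": "shrub", "right": "tree"}
```

Returning a plain dict per node keeps the sketch short; a real module would likely wrap this in a Node class and a predict method.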


Let's kick off with the imports,
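The article’s actual import list isn’t shown here, but a Gini-based implementation on tabular data would typically start with something like:

```python
# Illustrative guess at the imports; the article's own list is not shown.
import numpy as np   # array math for the Gini computations
import pandas as pd  # DataFrame handling for X and Y
```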
