Master AI Agents & Build Fully Autonomous Web Interactions!
Join our AI Agents Certification Program and learn to develop AI agents that plan, reason, and automate tasks independently.
- A hands-on, 4-week intensive program with expert-led live sessions.
- Batch size of 10, so you get personalized mentorship.
- High approval ratings from past cohorts
- Create Practical AI Agents after each session
- EMI options available
📅 Starts: 24th May | Early Bird: $1190 (Limited Spots! Price Increases to $2490 in 4 Days)
🔗 Enroll now & unlock exclusive bonuses! (Worth $500+)
Hello!!
Welcome to the new edition of Business Analytics Review!
We're diving into the fascinating world of decision trees, a cornerstone of machine learning, with a focus on the splitting criteria that drive their decision-making: the Gini index, entropy, and information gain. These concepts are key to understanding how decision trees make sense of data, and we’ll explore them with a mix of technical insights and relatable examples.
What Are Decision Trees?
Decision trees are intuitive machine learning models used for classification (e.g., predicting whether a customer will churn) and regression (e.g., estimating house prices). They work by recursively splitting the dataset into subsets based on feature values, creating a tree-like structure. Each internal node represents a decision based on a feature, each branch an outcome, and each leaf a final prediction. The challenge lies in choosing the best feature to split on at each step, which is where splitting criteria come in.
For example, imagine a bank deciding whether to approve a loan based on features like credit score, income, and debt-to-income ratio. A decision tree might split the data first on credit score, then on income, creating branches that lead to "approve" or "deny" leaves. The goal is to make these splits as informative as possible, ensuring each subset is as homogeneous as possible in terms of the target variable.
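To make this concrete, here is a minimal sketch of the loan example using scikit-learn's DecisionTreeClassifier. The feature values, labels, and the tiny dataset below are made up purely for illustration; a real loan model would need far more data and careful validation.

```python
# A toy loan-approval tree; the data is invented for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: credit_score, income (in $1000s), debt_to_income_ratio
X = [
    [720, 85, 0.20],
    [640, 40, 0.45],
    [700, 60, 0.30],
    [580, 35, 0.50],
    [760, 95, 0.15],
    [610, 50, 0.40],
]
y = ["approve", "deny", "approve", "deny", "approve", "deny"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as text, e.g. a first split on credit_score
print(export_text(tree, feature_names=["credit_score", "income", "debt_to_income_ratio"]))
```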
Diving into Splitting Criteria
Let’s break down the three main splitting criteria used in decision trees: Gini index, entropy, and information gain. Each measures the "impurity" of a node (how mixed the classes are) and helps the algorithm decide which feature to split on.
Gini Index
The Gini index, also known as Gini impurity, quantifies the probability of misclassifying a randomly chosen data point in a node. For a binary classification problem (e.g., two classes: positive and negative), the Gini index is calculated as:
\[ \text{Gini} = 1 - (p_1^2 + p_2^2) \]
where \( p_1 \) and \( p_2 \) are the proportions of the two classes in the node. For example, if a node has 60% positive and 40% negative instances, the Gini index is:
\[ \text{Gini} = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 0.48 \]
A lower Gini index indicates a purer node, where most instances belong to one class. The CART (Classification and Regression Trees) algorithm uses the Gini index to evaluate splits, choosing the feature that results in the lowest weighted average Gini index across child nodes.
The Gini index is computationally simple because it involves only squaring and summing probabilities, making it a popular choice for large datasets where speed matters.
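If you want to verify the arithmetic above, a tiny helper that computes Gini impurity from a node's class counts is enough; this is just a sketch, not tied to any particular library.

```python
# Gini impurity of a node given its class counts, e.g. [positives, negatives].
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(round(gini([60, 40]), 2))  # 0.48, matching the 60%/40% example
```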
Entropy
Entropy measures the uncertainty or disorder in a node, rooted in information theory. For a binary classification, entropy is calculated as:
\[ \text{Entropy} = -p_1 \log_2(p_1) - p_2 \log_2(p_2) \]
Using the same example (60% positive, 40% negative), the entropy is:
\[ \text{Entropy} = -0.6 \log_2(0.6) - 0.4 \log_2(0.4) = -0.6(-0.737) - 0.4(-1.322) \approx 0.971 \]
A lower entropy value indicates a purer node. For binary classification, entropy ranges from 0 (perfectly pure, all instances in one class) to 1 (maximum impurity, an even split between the two classes). Algorithms like ID3 and C4.5 rely on entropy to assess splits.
Entropy is more sensitive to changes in class probabilities than the Gini index, which can lead to different tree structures. However, calculating logarithms makes it computationally more intensive.
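The same node can be scored with entropy using Python's math.log2; this small sketch reproduces the 0.971 figure from the worked example above.

```python
# Entropy of a node given its class counts (works for binary or multi-class nodes).
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([60, 40]), 3))  # 0.971, matching the worked example
```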
Information Gain
Information gain builds on entropy, measuring how much uncertainty is reduced after splitting on a feature. It’s calculated as:
\[ \text{Information Gain} = \text{Entropy}_{\text{parent}} - \sum_{i} \frac{n_i}{n} \, \text{Entropy}_i \]
where \( \text{Entropy}_{\text{parent}} \) is the entropy of the parent node, \( n_i \) is the number of instances in child node \( i \), and \( n \) is the total number of instances in the parent node. The feature with the highest information gain is chosen for the split, as it reduces uncertainty the most.
For example, if splitting on credit score reduces entropy from 0.971 to a weighted average of 0.5 across child nodes, the information gain is 0.471, indicating a significant reduction in uncertainty.
Information gain is central to algorithms like ID3 and C4.5, which prioritize features that create the most homogeneous child nodes.
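Putting the pieces together, information gain is just the parent's entropy minus the weighted entropy of its children. A minimal sketch, reusing the entropy() helper from the previous snippet:

```python
# Information gain of a split: parent entropy minus weighted child entropy.
# Reuses entropy() from the earlier sketch; class counts are plain lists.
def information_gain(parent_counts, children_counts):
    n = sum(parent_counts)
    weighted = sum((sum(child) / n) * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted
```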
Comparing Gini Index and Entropy
While both the Gini index and entropy measure impurity, they differ in key ways:
- Computational complexity: The Gini index is faster to compute, as it avoids logarithms, making it ideal for large datasets or real-time applications.
- Sensitivity: Entropy is more sensitive to small changes in class probabilities, potentially leading to different splits. This sensitivity can be beneficial in datasets with nuanced class distributions but may not always improve performance.
- Algorithm preference: The Gini index is used in CART, while entropy (via information gain) is used in ID3 and C4.5. The choice often depends on the algorithm implemented in your machine learning library.
Research suggests that the choice between Gini index and entropy typically has a minimal impact on the final decision tree’s performance (Data Science Stack Exchange). Factors like tree pruning or dataset characteristics often play a larger role. However, if computational efficiency is critical, the Gini index is often preferred.
A Real-World Example
To bring this to life, let’s consider a dataset of 1,000 patients, where we’re predicting whether they have diabetes based on features like fasting blood sugar, BMI, and blood pressure. A decision tree might evaluate splitting on BMI:
- Without split: The parent node has 600 diabetic and 400 non-diabetic patients, with an entropy of 0.971.
- Splitting on BMI > 30: This creates two child nodes:
  - Child 1 (400 patients, 300 diabetic): entropy = 0.811
  - Child 2 (600 patients, 300 diabetic): entropy = 1.0
- Weighted average entropy: \( (400/1000) \times 0.811 + (600/1000) \times 1.0 = 0.924 \)
- Information gain: \( 0.971 - 0.924 = 0.047 \)
Now, compare this to splitting on fasting blood sugar > 120 mg/dL, which might yield a higher information gain or lower Gini index. The algorithm chooses the feature that maximizes information gain or minimizes the weighted Gini index, building the tree step by step.
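You can check the BMI walkthrough with the helpers defined earlier; the class counts below are taken straight from the example, given as [diabetic, non_diabetic].

```python
# Reproducing the BMI split from the walkthrough with the earlier helpers.
parent = [600, 400]          # entropy ≈ 0.971
child_high_bmi = [300, 100]  # BMI > 30:  entropy ≈ 0.811
child_low_bmi = [300, 300]   # BMI <= 30: entropy = 1.0

gain = information_gain(parent, [child_high_bmi, child_low_bmi])
print(round(gain, 3))  # ≈ 0.046 (0.047 in the text, which rounds the intermediate values first)
```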
This process mirrors real-world decision-making. For instance, a marketing team might use a decision tree to segment customers, splitting on features like purchase history or website visits to identify high-value prospects.
When to Choose Which Criterion
Choosing between Gini index and entropy depends on your project’s needs:
- Use the Gini index if you prioritize speed and simplicity, especially for large datasets or when using CART-based implementations like scikit-learn's DecisionTreeClassifier.
- Use entropy/information gain if you're working with algorithms like ID3 or C4.5, or if you suspect your dataset benefits from entropy's sensitivity to class distribution changes.
- Consider dataset characteristics: in highly imbalanced datasets, entropy might perform better, as it's more sensitive to minority classes (Gini vs Entropy).
In practice, many machine learning libraries default to the Gini index for its computational efficiency, but experimenting with both can help optimize your model.
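One simple way to experiment is to fit the same tree twice in scikit-learn, once per criterion, and compare the results. The synthetic dataset below is only for illustration; on most datasets the two criteria give very similar accuracy.

```python
# A quick comparison of criterion="gini" vs criterion="entropy" on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=42)
    clf.fit(X_train, y_train)
    print(criterion, round(clf.score(X_test, y_test), 3))
```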
Recommended Reads for Further Exploration
Decision Trees Explained
A comprehensive guide covering entropy, information gain, and the Gini index with practical examples.
Gini vs Entropy
An in-depth comparison of the Gini index and entropy, focusing on computational efficiency and training time impacts.
Gini Impurity and Entropy
A detailed explanation of Gini impurity and entropy, essential for understanding decision tree splits.
Trending in AI and Data Science
Let’s catch up on some of the latest happenings in the world of AI and Data Science:
Pope Leo XIV: AI as Humanity’s Defining Challenge
Pope Leo XIV identifies artificial intelligence as the main challenge for humanity, linking his papal vision and name to the ethical dilemmas of the AI era.
Jeff Bezos Leads $72M Investment in AI Data Firm Toloka
Amazon’s Jeff Bezos spearheads a $72 million investment in Toloka, an AI data company focused on blending human expertise with automation to scale high-quality AI solutions.
Safe Superintelligence Targets $20B Valuation in AI Safety
AI startup Safe Superintelligence, led by Ilya Sutskever, aims to quadruple its valuation to $20 billion as it develops advanced, secure artificial intelligence systems.
Trending AI Tool: RapidMiner
RapidMiner is a comprehensive data science and machine learning platform, offering an integrated environment for data preparation, model building, deep learning, text mining, predictive analytics, and deployment, with an emphasis on ease of use and automation. Its user-friendly interface and support for end-to-end workflows make it ideal for building decision tree models, whether you’re a beginner or a seasoned data scientist.
RapidMiner’s drag-and-drop interface allows you to experiment with decision trees, visualize splits, and compare Gini index versus entropy-based models without writing code. It’s a fantastic tool for applying the concepts we’ve discussed today.
Learn more.
Thank you for joining us on this journey! Until next time, happy analyzing!