🌳 Decision Trees & Random Forests

Master tree-based algorithms, ensemble methods, and feature importance for robust machine learning models

Decision Trees & Random Forests Curriculum

  • 12 Core Units
  • ~70 Key Concepts
  • 10+ Algorithms
  • 30+ Practical Examples

Unit 1: Introduction to Decision Trees

Understand the fundamentals of decision tree algorithms and their intuitive approach.

  • What are decision trees
  • Tree structure and terminology
  • Nodes, branches, and leaves
  • Decision tree intuition
  • Advantages and disadvantages
  • Classification vs regression trees
  • Non-parametric nature
  • Interpretability benefits

Unit 2: Tree Construction Process

Learn how decision trees are built through recursive splitting algorithms.

  • Recursive binary splitting
  • Greedy algorithm approach
  • Top-down construction
  • Stopping criteria
  • Tree depth and complexity
  • Leaf node predictions
  • Handling continuous features
  • Missing value treatment
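
As a preview of the greedy, recursive splitting this unit covers, here is a minimal sketch assuming only numpy: it scans a single feature for the threshold with the lowest weighted Gini impurity. The helper names gini and best_split are made up for illustration, not taken from any library.

import numpy as np

def gini(labels):
  # Gini impurity: 1 - sum of squared class proportions
  _, counts = np.unique(labels, return_counts=True)
  p = counts / counts.sum()
  return 1.0 - np.sum(p ** 2)

def best_split(x, y):
  # Greedy search: try every observed value of one feature as a threshold
  best_thr, best_impurity = None, np.inf
  for thr in np.unique(x):
    left, right = y[x <= thr], y[x > thr]
    if len(left) == 0 or len(right) == 0:
      continue
    # Weighted impurity of the two resulting child nodes
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    if impurity < best_impurity:
      best_thr, best_impurity = thr, impurity
  return best_thr, best_impurity

x = np.array([2.0, 3.0, 10.0, 19.0])
y = np.array([0, 0, 1, 1])
print(best_split(x, y))  # threshold 3.0 separates the two classes perfectly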

Unit 3: Splitting Criteria

Master the mathematical foundations for optimal feature splitting.

  • Information gain
  • Entropy and impurity
  • Gini impurity
  • Classification error
  • Mean squared error
  • Variance reduction
  • Gain ratio
  • Chi-square test
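
For the first few criteria in the list, a small numpy sketch of entropy, Gini impurity, and information gain on a toy split (the label arrays are chosen only for illustration):

import numpy as np

def entropy(y):
  # H = -sum(p_k * log2(p_k)) over the class proportions p_k
  _, counts = np.unique(y, return_counts=True)
  p = counts / counts.sum()
  return -np.sum(p * np.log2(p))

def gini(y):
  # G = 1 - sum(p_k^2)
  _, counts = np.unique(y, return_counts=True)
  p = counts / counts.sum()
  return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left   = np.array([0, 0, 0, 1])   # one candidate split of the parent node
right  = np.array([0, 1, 1, 1])

print("Parent entropy:", entropy(parent))  # 1.0 for a 50/50 node
print("Parent Gini:", gini(parent))        # 0.5

# Information gain = parent entropy - weighted average child entropy
children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print("Information gain:", entropy(parent) - children)  # about 0.19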

Unit 4: Overfitting and Pruning

Learn techniques to prevent overfitting and create more generalizable trees.

  • Overfitting in decision trees
  • Pre-pruning techniques
  • Post-pruning methods
  • Cost complexity pruning
  • Minimum samples split
  • Maximum depth control
  • Cross-validation for pruning
  • Bias-variance tradeoff
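
One way to see cost complexity (post-)pruning in practice is scikit-learn's cost_complexity_pruning_path, which computes the candidate alpha values, after which refitting with ccp_alpha prunes the tree; the dataset split and random_state below are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Effective alphas along the cost-complexity pruning path of a fully grown tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)

# Refit one pruned tree per alpha and compare size vs held-out accuracy
for alpha in path.ccp_alphas:
  tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train)
  print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  test accuracy={tree.score(X_test, y_test):.3f}")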

Unit 5: Classification Trees

Deep dive into decision trees for classification problems.

  • Classification tree algorithm
  • Class probability estimation
  • Majority voting
  • Multiclass classification
  • Handling imbalanced data
  • Feature importance
  • Decision boundaries
  • Tree visualization
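
A short scikit-learn sketch touching several items above: class probability estimation, impurity-based feature importance, and tree visualization. The iris data and max_depth=3 are arbitrary illustrative choices.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# Class probability estimates come from the class proportions in each leaf
print(clf.predict_proba(iris.data[:2]))

# Impurity-based feature importances sum to 1 across features
for name, importance in zip(iris.feature_names, clf.feature_importances_):
  print(f"{name}: {importance:.3f}")

# Visualize the fitted tree and its decision rules
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()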

Unit 6: Regression Trees

Apply decision trees to continuous target variable prediction.

  • Regression tree algorithm
  • Mean prediction in leaves
  • Sum of squared errors
  • Continuous feature splits
  • Piecewise constant models
  • Non-linear relationships
  • Residual analysis
  • Tree-based smoothing
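
To illustrate the piecewise constant nature of regression trees, a brief scikit-learn sketch fits a shallow tree to a noisy sine curve; the synthetic data and max_depth are arbitrary.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# A clearly non-linear target: noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Each leaf predicts the mean of its training targets,
# so the fitted function is a step function (piecewise constant) in X
reg = DecisionTreeRegressor(max_depth=3).fit(X, y)

X_grid = np.linspace(0, 5, 10).reshape(-1, 1)
print(np.round(reg.predict(X_grid), 2))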

Unit 7: Introduction to Ensembles

Understand ensemble methods and why combining models improves performance.

  • Ensemble learning principles
  • Wisdom of crowds
  • Combining weak learners
  • Diversity importance
  • Bagging vs boosting
  • Voting mechanisms
  • Ensemble benefits
  • Computational considerations
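
A minimal scikit-learn sketch of the "wisdom of crowds" idea: three different base learners combined by hard (majority) voting, compared against each member individually. The dataset and model settings are illustrative only.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three different (and individually imperfect) base learners...
models = [
  ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
  ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
  ("nb", GaussianNB()),
]

# ...combined by majority (hard) voting
ensemble = VotingClassifier(estimators=models, voting="hard")

for name, model in models + [("ensemble", ensemble)]:
  print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))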

Unit 8: Bootstrap Aggregating

Learn bagging technique as foundation for Random Forests.

  • Bootstrap sampling
  • Sampling with replacement
  • Out-of-bag samples
  • Aggregating predictions
  • Variance reduction
  • Parallel training
  • Bootstrap confidence intervals
  • Bagged decision trees
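
A small sketch of bootstrap sampling and bagged trees with scikit-learn; BaggingClassifier's default base learner is already a decision tree, and oob_score=True evaluates each tree on the rows it never saw. The toy indices and dataset are illustrative.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier

# Bootstrap sample: draw n row indices with replacement; on average about 63%
# of rows appear, and the rest are that tree's out-of-bag (OOB) rows
rng = np.random.RandomState(0)
idx = rng.choice(10, size=10, replace=True)
print("bootstrap indices:", idx)
print("out-of-bag indices:", sorted(set(range(10)) - set(idx)))

# Bagged decision trees with OOB evaluation (no separate validation set needed)
X, y = load_breast_cancer(return_X_y=True)
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", round(bag.oob_score_, 3))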

Unit 9: Random Forest Algorithm

Master the Random Forest algorithm and its key innovations.

  • Random Forest algorithm
  • Random feature selection
  • Mtry parameter
  • Bootstrap + feature randomness
  • Majority voting
  • Out-of-bag error
  • Variable importance
  • Hyperparameter tuning
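
A brief scikit-learn sketch of the two sources of randomness plus OOB error: max_features plays the role of the mtry parameter, controlling how many randomly chosen features are considered at each split. The dataset and settings are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()

# Bootstrap rows + a random subset of features ("mtry") considered at each split
forest = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                                oob_score=True, random_state=42)
forest.fit(data.data, data.target)

print("OOB accuracy:", round(forest.oob_score_, 3))

# Impurity-based variable importance, averaged over all trees in the forest
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
  print(f"{name}: {importance:.3f}")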

Unit 10: Feature Importance

Understand how to measure and interpret feature importance in tree models.

  • Gini importance
  • Permutation importance
  • Mean decrease impurity
  • Mean decrease accuracy
  • Feature ranking
  • Partial dependence plots
  • SHAP values
  • Feature selection
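
A short scikit-learn sketch contrasting the two main measures listed above: impurity-based importance (mean decrease impurity, computed on training data) and permutation importance (mean decrease accuracy on a held-out set). The dataset and split are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean decrease impurity (Gini importance), computed during training
mdi = dict(zip(data.feature_names, forest.feature_importances_))

# Permutation importance: drop in held-out score when one feature is shuffled
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)

for name, score in sorted(zip(data.feature_names, perm.importances_mean),
                          key=lambda pair: pair[1], reverse=True)[:5]:
  print(f"{name}: permutation={score:.3f}  impurity={mdi[name]:.3f}")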

Unit 11: Model Evaluation

Assess tree-based model performance using appropriate metrics and techniques.

  • Out-of-bag evaluation
  • Cross-validation strategies
  • Classification metrics
  • Regression metrics
  • Confusion matrices
  • ROC curves for forests
  • Learning curves
  • Model comparison
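
A compact scikit-learn sketch of several evaluation tools listed above: cross-validated accuracy, a confusion matrix, and ROC AUC from predicted probabilities. Dataset and settings are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated accuracy across 5 folds
print("CV accuracy:", round(cross_val_score(forest, X, y, cv=5).mean(), 3))

# Confusion matrix and ROC AUC on a held-out split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest.fit(X_train, y_train)
print(confusion_matrix(y_test, forest.predict(X_test)))
print("ROC AUC:", round(roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1]), 3))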

Unit 12: Advanced Topics

Explore advanced tree-based methods and practical implementations.

  • Extremely randomized trees
  • Gradient boosting intro
  • XGBoost and LightGBM
  • Handling categorical features
  • Missing value strategies
  • Scalability considerations
  • Implementation best practices
  • Real-world case studies
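
As a pointer toward the first two advanced topics, a sketch comparing Random Forest, Extremely Randomized Trees, and gradient boosting with cross-validation; XGBoost and LightGBM follow the same fit/predict pattern but are separate libraries. Models and dataset here are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
  "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
  # Extra Trees: random thresholds as well as random feature subsets
  "Extra Trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
  # Boosting: trees are fit sequentially to the previous ensemble's errors
  "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
  print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))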

Unit 1: Introduction to Decision Trees

Understand the fundamentals of decision tree algorithms and their intuitive approach.

What are Decision Trees

Learn the fundamental concept of decision trees as flowchart-like tree structures for decision making.

Key ideas: flowchart structure, rule-based decisions, non-parametric.
A decision tree is a flowchart-like tree structure where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or continuous value.
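
To make the definition concrete, here is a tiny hand-built tree represented as nested dictionaries; the feature names and thresholds are made up for illustration and only loosely echo the iris data.

# A tiny hand-built tree (illustrative only): internal nodes hold a feature
# test, branches hold the test outcomes, and leaves hold class labels
tiny_tree = {
  "test": "petal_length <= 2.45",
  "yes": {"leaf": "setosa"},
  "no": {
    "test": "petal_width <= 1.75",
    "yes": {"leaf": "versicolor"},
    "no": {"leaf": "virginica"},
  },
}

def classify(node, sample):
  # Follow branches until a leaf node is reached
  if "leaf" in node:
    return node["leaf"]
  feature, _, threshold = node["test"].split()
  branch = "yes" if sample[feature] <= float(threshold) else "no"
  return classify(node[branch], sample)

print(classify(tiny_tree, {"petal_length": 5.0, "petal_width": 2.1}))  # virginica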

Tree Structure and Terminology

Master the essential terminology used in decision tree algorithms.

Root Node: Top node with no incoming edges
Internal Nodes: Nodes with outgoing edges (decision nodes)
Leaf Nodes: Nodes with no outgoing edges (terminal nodes)
Branches: Edges connecting nodes
Depth: Length of longest path from root to leaf
# Decision tree terminology
tree_components = {
  "Root": "Starting point, first decision",
  "Internal Nodes": "Decision points (if-then)",
  "Branches": "Possible outcomes of decisions",
  "Leaf Nodes": "Final predictions/classifications",
  "Depth": "Number of levels in the tree",
  "Splitting": "Process of dividing a node",
  "Pruning": "Removing branches to avoid overfitting"
}

Decision Tree Intuition

Understand the intuitive logic behind how decision trees make predictions.

Decision trees mimic human decision-making by asking a series of yes/no questions. Each question splits the data into subgroups that are more homogeneous (pure) with respect to the target variable.
# Example: Should I play tennis?
# Simple decision tree logic

def should_play_tennis(weather, temperature, humidity, wind):
  # Note: temperature is accepted but unused; this simplified tree splits only on weather, humidity, and wind
  if weather == "sunny":
    if humidity == "high":
      return "No"
    else:
      return "Yes"
  elif weather == "overcast":
    return "Yes"
  elif weather == "rainy":
    if wind == "strong":
      return "No"
    else:
      return "Yes"

# This mirrors how a decision tree works!

Advantages and Disadvantages

Understand the strengths and limitations of decision tree algorithms.

Key ideas: interpretable, non-linear, prone to overfitting, instability.
# Decision trees pros and cons
advantages = [
  "Easy to understand and interpret",
  "No assumptions about data distribution",
  "Handles both numerical and categorical data",
  "Captures non-linear relationships",
  "Automatic feature selection",
  "Handles missing values naturally"
]

disadvantages = [
  "Prone to overfitting",
  "Unstable (small data changes = big tree changes)",
  "Biased toward features with more levels",
  "Can create overly complex trees",
  "Difficult to capture linear relationships",
  "Greedy algorithm (not globally optimal)"
]

Classification vs Regression Trees

Learn the differences between classification and regression decision trees.

Classification Trees: predict discrete class labels, choosing splits with impurity measures such as Gini impurity or entropy
Regression Trees: predict continuous values, choosing splits that minimize measures such as mean squared error
Both variants belong to the CART (Classification And Regression Trees) family implemented in scikit-learn.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
import numpy as np

# Classification example
X_class = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_class = np.array([0, 1, 1, 0]) # Binary classes

clf = DecisionTreeClassifier(criterion='gini')
clf.fit(X_class, y_class)
print("Classification prediction:", clf.predict([[0.5, 0.5]]))

# Regression example
X_reg = np.array([[1], [2], [3], [4], [5]])
y_reg = np.array([2.1, 3.9, 6.1, 8.0, 10.2]) # Continuous values

reg = DecisionTreeRegressor(criterion='squared_error')
reg.fit(X_reg, y_reg)
print("Regression prediction:", reg.predict([[3.5]]))

Interpretability Benefits

Understand why decision trees are considered highly interpretable machine learning models.

Decision trees provide clear, rule-based explanations for their predictions. Each path from root to leaf represents a logical rule that can be easily understood by domain experts and stakeholders.
from sklearn.tree import export_text
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train decision tree
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X, y)

# Export tree rules as text
tree_rules = export_text(clf, feature_names=list(iris.feature_names))
print(tree_rules)