MemoLearning Logistic Regression

What is Classification

Learn the fundamental concept of classification, a core supervised learning task in machine learning.

Supervised Learning, Categorical Output, Discrete Labels

Classification is a supervised learning task where the goal is to predict the category or class of new observations based on a training dataset of observations whose category membership is known.
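
As a rough illustration of this definition, the sketch below trains on observations whose class is known and then predicts the class of held-out observations. It assumes scikit-learn's bundled Iris dataset and a logistic regression model; the 80/20 split, `random_state=0`, and `max_iter=1000` are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Observations with known category membership (three Iris species)
iris = load_iris()

# Hold out 20% of the rows to play the role of "new observations"
X_train, X_new, y_train, y_new = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

# Learn from the labeled training data
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict a discrete class for each new observation
predicted = clf.predict(X_new)
predicted_names = iris.target_names[predicted]

print("First predictions:", predicted_names[:5])
print("Fraction correct on held-out data:", (predicted == y_new).mean())
```

The output is always one of the known class labels, never a value in between, which is what separates classification from predicting a number.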

Binary vs Multiclass Problems

Understand the difference between binary and multiclass classification problems.

Binary Classification: 2 classes (Yes/No, Spam/Not Spam, Fraud/Not Fraud)
Multiclass Classification: 3+ classes, exactly one per instance (Red/Green/Blue, Cat/Dog/Bird)
Multilabel Classification: Multiple labels per instance (e.g., one article tagged both Sports and Politics)

          # Examples of classification problems
          classification_types = {
            "Binary": {
              "Email": ["Spam", "Not Spam"],
              "Medical": ["Disease", "Healthy"],
              "Finance": ["Fraud", "Legitimate"]
            },
            "Multiclass": {
              "Image": ["Cat", "Dog", "Bird", "Fish"],
              "Text": ["Sports", "Politics", "Technology"],
              "Iris": ["Setosa", "Versicolor", "Virginica"]
            }
          }
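
The three problem types also differ in the shape of the target array a model is trained on. The sketch below shows one common encoding for each; the integer codes and the `tags` list are hypothetical, chosen only for illustration.

```python
import numpy as np

# Binary: one label per instance, two possible values
y_binary = np.array([0, 1, 1, 0])        # e.g. 0 = Not Spam, 1 = Spam

# Multiclass: one label per instance, three or more possible values
y_multiclass = np.array([0, 2, 1, 2])    # e.g. 0 = Cat, 1 = Dog, 2 = Bird

# Multilabel: one 0/1 indicator per label, so a row may contain several 1s
tags = ["Sports", "Politics", "Technology"]  # hypothetical tag set
y_multilabel = np.array([
    [1, 0, 0],   # Sports only
    [1, 1, 0],   # Sports and Politics
    [0, 0, 1],   # Technology only
    [0, 1, 1],   # Politics and Technology
])

print("Shapes:", y_binary.shape, y_multiclass.shape, y_multilabel.shape)
print("Labels per instance:", y_multilabel.sum(axis=1))
```

Binary and multiclass targets are one-dimensional; a multilabel target is a matrix with one column per label.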

Decision Boundaries

Understand how classifiers create decision boundaries to separate different classes.

Decision Boundary: The surface that separates different classes in the feature space. For logistic regression, this boundary is linear (a line in 2D, a hyperplane in higher dimensions): it is the set of points where the predicted probability equals 0.5, which is exactly where w·x + b = 0.

          import numpy as np
          import matplotlib.pyplot as plt
          from sklearn.linear_model import LogisticRegression

          # Create sample 2D data
          np.random.seed(42)
          X = np.random.randn(100, 2)
          y = (X[:, 0] + X[:, 1] > 0).astype(int)

          # Fit logistic regression
          model = LogisticRegression()
          model.fit(X, y)

          # Decision boundary: w[0]*x1 + w[1]*x2 + b = 0
          w = model.coef_[0]
          b = model.intercept_[0]

          # Plot decision boundary: solve the equation above for x2
          x_boundary = np.linspace(-3, 3, 100)
          y_boundary = -(w[0] * x_boundary + b) / w[1]

          plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu')
          plt.plot(x_boundary, y_boundary, 'k-', linewidth=2)
          plt.title('Logistic Regression Decision Boundary')
          plt.show()
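
To check that the plotted line really is where the predicted probability equals 0.5, the sketch below rebuilds the same synthetic data, takes a few points on the line, and evaluates the fitted model there. The choice of five points is arbitrary; `decision_function` returns the linear score w·x + b.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same synthetic data as in the plotting example
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)
w = model.coef_[0]
b = model.intercept_[0]

# A few points that satisfy w[0]*x1 + w[1]*x2 + b = 0
x1 = np.linspace(-3, 3, 5)
x2 = -(w[0] * x1 + b) / w[1]
boundary_points = np.column_stack([x1, x2])

# On the boundary the linear score is 0 and P(y=1|x) is 0.5
scores = model.decision_function(boundary_points)
probs = model.predict_proba(boundary_points)[:, 1]

print("Scores are zero:", np.allclose(scores, 0.0))
print("Probabilities are 0.5:", np.allclose(probs, 0.5))
```

Points on one side of the line have a positive score (probability above 0.5), and points on the other side have a negative score.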

Classification vs Regression

Learn the key differences between classification and regression tasks.

Discrete, Continuous, Categorical, Numerical

          # Key differences
          differences = {
            "Output Type": {
              "Classification": "Discrete/Categorical",
              "Regression": "Continuous/Numerical"
            },
            "Examples": {
              "Classification": "Email spam detection",
              "Regression": "House price prediction"
            },
            "Evaluation": {
              "Classification": "Accuracy, Precision, Recall",
              "Regression": "MSE, RMSE, MAE, R²"
            },
            "Algorithms": {
              "Classification": "Logistic Regression, SVM, Random Forest",
              "Regression": "Linear Regression, Ridge, Lasso"
            }
          }
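
The difference in evaluation can be made concrete with scikit-learn's metric functions. The labels, prices, and predictions below are made up for illustration and do not come from a real model.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    mean_squared_error, mean_absolute_error, r2_score,
)

# Classification: compare discrete labels (hypothetical spam labels, 1 = Spam)
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

accuracy = accuracy_score(y_true, y_pred)    # share of labels predicted correctly
precision = precision_score(y_true, y_pred)  # of predicted positives, share that are right
recall = recall_score(y_true, y_pred)        # of actual positives, share that were found

# Regression: compare continuous values (hypothetical house prices, in thousands)
price_true = np.array([200.0, 310.0, 150.0])
price_pred = np.array([210.0, 300.0, 155.0])

mse = mean_squared_error(price_true, price_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(price_true, price_pred)
r2 = r2_score(price_true, price_pred)

print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} Recall={recall:.3f}")
print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f} R2={r2:.3f}")
```

Classification metrics count matches between labels, while regression metrics measure how far each predicted number is from the true one.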

Real-world Applications

Explore practical applications of classification in various industries.

Healthcare, Finance, Marketing, Technology

          # Real-world classification applications
          applications = {
            "Healthcare": [
              "Disease diagnosis from symptoms",
              "Medical image classification",
              "Drug response prediction"
            ],
            "Finance": [
              "Credit approval decisions",
              "Fraud detection systems",
              "Risk assessment models"
            ],
            "Marketing": [
              "Customer segmentation",
              "Churn prediction",
              "Ad targeting optimization"
            ],
            "Technology": [
              "Email spam filtering",
              "Image recognition",
              "Natural language processing"
            ]
          }

Probabilistic Interpretation

Understand how classification can be viewed through a probabilistic lens.

Instead of hard predictions, we estimate P(y=1|X), the probability that an instance belongs to class 1 given its features X. This probabilistic approach quantifies how confident each prediction is and lets the decision threshold be chosen to fit the cost of each kind of error, rather than being fixed at 0.5.

          from sklearn.linear_model import LogisticRegression
          import numpy as np

          # Sample data
          X = np.array([[1, 2], [2, 3], [3, 1], [4, 5]])
          y = np.array([0, 0, 1, 1])

          # Fit logistic regression
          model = LogisticRegression()
          model.fit(X, y)

          # Get probability predictions
          probabilities = model.predict_proba(X)
          print("Probabilities for each class:")
          print(probabilities)

          # Get class predictions (threshold = 0.5)
          predictions = model.predict(X)
          print("Class predictions:", predictions)

          # Custom threshold
          custom_predictions = (probabilities[:, 1] > 0.3).astype(int)
          print("Custom threshold predictions:", custom_predictions)
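
For a binary problem, the probability in the second column of `predict_proba` can be reproduced by hand: compute the linear score w·x + b and pass it through the sigmoid function. The sketch below assumes the same four-row sample data and a default `LogisticRegression`; the `sigmoid` helper and the list of thresholds are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same sample data as above
X = np.array([[1, 2], [2, 3], [3, 1], [4, 5]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score for each row, then squash it into P(y=1|x)
z = X @ model.coef_[0] + model.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X)[:, 1]
print("Manual matches predict_proba:", np.allclose(p_manual, p_sklearn))

# The same probabilities give different hard labels at different thresholds
for threshold in (0.3, 0.5, 0.7):
    print(threshold, (p_manual > threshold).astype(int))
```

Lowering the threshold labels more instances as class 1 (fewer missed positives, more false alarms); raising it does the opposite.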

Logistic Regression Curriculum

Introduction to Classification

From Linear to Logistic

The Sigmoid Function

Maximum Likelihood Estimation

Cost Function and Optimization

Binary Classification

Multiclass Classification

Regularization Techniques

Model Evaluation