Linear Separability Concept
Learn what it means for data to be linearly separable and why this is important for SVMs.
Key terms: Hyperplane, Separable, Binary
A dataset is linearly separable if there exists a hyperplane that separates the data points of the different classes with zero misclassifications.
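In practice, you can probe separability empirically. A minimal sketch, assuming scikit-learn is available: fit a linear SVM with a very large C (approximating a hard margin; the value C=1e6 and the toy points below are arbitrary choices), and treat 100% training accuracy as evidence that a separating hyperplane exists. This is a heuristic check, not a proof.
import numpy as np
from sklearn.svm import LinearSVC

def looks_linearly_separable(X, y):
    # With a very large C, any training error is heavily penalized, so
    # perfect training accuracy suggests a separating hyperplane exists
    clf = LinearSVC(C=1e6, max_iter=100_000)
    clf.fit(X, y)
    return clf.score(X, y) == 1.0

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])
y = np.array([-1, -1, 1, 1])
print(looks_linearly_separable(X, y))  # True: the line x1 + x2 = 4 separates them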
Hyperplane Geometry
Understand the mathematical representation of hyperplanes in n-dimensional space.
Hyperplane Equation: w₁x₁ + w₂x₂ + ... + wₙxₙ + b = 0
Or in vector form: wᵀx + b = 0
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
# Generate data that is (very likely) linearly separable;
# class_sep=2.0 pushes the two clusters further apart than the default
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, class_sep=2.0,
                           random_state=42)
# Illustrative hyperplane parameters (hand-picked, not fitted to the data)
w = np.array([1, -1])  # Normal vector
b = 0.5                # Bias term
# Plot data and hyperplane
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu')
# Create hyperplane
x_line = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
y_line = -(w[0] * x_line + b) / w[1]
plt.plot(x_line, y_line, 'k-', linewidth=2)
plt.title('Linearly Separable Data with Hyperplane')
plt.show()
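As the comment in the snippet notes, w is the normal vector: it is orthogonal to every direction lying inside the hyperplane. A quick standalone check with the same w and b (the two points below are hand-picked to lie on the line y = x + 0.5):
import numpy as np
w = np.array([1, -1])
b = 0.5
p1 = np.array([0.0, 0.5])  # wᵀp1 + b = 0, so p1 lies on the hyperplane
p2 = np.array([1.0, 1.5])  # wᵀp2 + b = 0, so p2 lies on the hyperplane
print(np.dot(w, p2 - p1))  # 0.0: w is orthogonal to the direction along the hyperplane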
Decision Boundaries
Understand how hyperplanes serve as decision boundaries for classification.
Decision Rule: For a point x, classify as:
• Class +1 if wᵀx + b > 0
• Class -1 if wᵀx + b < 0
• On the boundary if wᵀx + b = 0
The perpendicular distance from x to the hyperplane is |wᵀx + b| / ‖w‖; the code below uses it to report how far each test point sits from the boundary.
import numpy as np

def classify_point(x, w, b):
    """Classify a point using a linear decision boundary."""
    decision_value = np.dot(w, x) + b
    if decision_value > 0:
        return +1
    elif decision_value < 0:
        return -1
    else:
        return 0  # On the boundary

# Example usage
w = np.array([1, -1])
b = 0.5
test_points = np.array([[2, 1], [1, 2], [0, 0.5]])
for point in test_points:
    prediction = classify_point(point, w, b)
    # Perpendicular distance from the point to the hyperplane
    distance = abs(np.dot(w, point) + b) / np.linalg.norm(w)
    print(f"Point {point}: Class {prediction}, Distance {distance:.2f}")
Multiple Separating Planes
Learn why there are infinitely many hyperplanes that can separate linearly separable data.
For linearly separable data, there are infinitely many hyperplanes that can achieve perfect separation. The question becomes: which one should we choose and why?
# Demonstrate multiple separating hyperplanes
import numpy as np
import matplotlib.pyplot as plt
# Simple 2D linearly separable data
class1 = np.array([[1, 1], [2, 2], [2, 1]])
class2 = np.array([[4, 4], [5, 5], [4, 5]])
plt.figure(figsize=(8, 6))
plt.scatter(class1[:, 0], class1[:, 1], c='red', marker='o', s=100, label='Class 1')
plt.scatter(class2[:, 0], class2[:, 1], c='blue', marker='s', s=100, label='Class 2')
# Multiple possible separating lines
x = np.linspace(0, 6, 100)
# Three different lines, each of which separates the two classes
y1 = -x + 6        # Hyperplane 1: x1 + x2 = 6
y2 = 0*x + 3       # Hyperplane 2: horizontal line x2 = 3
y3 = -0.5*x + 4.5  # Hyperplane 3: x2 = -0.5*x1 + 4.5
plt.plot(x, y1, 'g--', label='Hyperplane 1')
plt.plot(x, y2, 'm--', label='Hyperplane 2')
plt.plot(x, y3, '--', color='orange', label='Hyperplane 3')
plt.legend()
plt.title('Multiple Possible Separating Hyperplanes')
plt.xlabel('X1')
plt.ylabel('X2')
plt.grid(True, alpha=0.3)
plt.show()
Optimal Hyperplane Idea
Understand the motivation for finding the "best" hyperplane among all possible separating hyperplanes.
Key terms: Generalization, Robustness, Margin
# Intuition: Why maximum margin?
reasons_for_max_margin = {
"Generalization": "Better performance on unseen data",
"Robustness": "Less sensitive to small data variations",
"Confidence": "Points far from boundary are classified more confidently",
"Uniqueness": "Only one maximum margin hyperplane exists",
"Theory": "Statistical learning theory supports maximum margin"
}
# The maximum margin hyperplane maximizes the minimum
# distance from any training point to the decision boundary
def margin_width(X, y, w, b):
    """Width of the margin: twice the minimum signed distance from a
    training point to the hyperplane wᵀx + b = 0. Assumes labels y are
    in {-1, +1}; a negative result means some point is misclassified."""
    signed_distances = y * (X @ w + b) / np.linalg.norm(w)
    return 2 * signed_distances.min()
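As a quick check, here is a sketch that reuses the six toy points from the multiple-hyperplanes demo (the labels and the two candidate hyperplanes below are hand-picked for illustration): the centered hyperplane x₁ + x₂ = 6 has a visibly wider margin than the off-center x₁ + x₂ = 4.5.
import numpy as np
# Same six toy points as the multiple-hyperplanes demo, labels in {-1, +1}
X = np.array([[1, 1], [2, 2], [2, 1], [4, 4], [5, 5], [4, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])
print(margin_width(X, y, np.array([1, 1]), -6.0))   # ~2.83 (wider margin)
print(margin_width(X, y, np.array([1, 1]), -4.5))   # ~0.71 (narrower margin)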