🔍 K-Nearest Neighbors

Master instance-based learning, distance metrics, and lazy learning algorithms for classification and regression

K-Nearest Neighbors Curriculum

10 Core Units · ~55 Key Concepts · 8+ Distance Metrics · 20+ Practical Examples

Unit 1: Introduction to Instance-Based Learning

Understand the fundamentals of instance-based learning and lazy algorithms.

  • Instance-based learning concept
  • Lazy vs eager learning
  • Memory-based learning
  • No explicit model building
  • Local approximation
  • Similarity-based prediction
  • Non-parametric nature
  • Computational characteristics

Unit 2: K-NN Algorithm Fundamentals

Learn the core K-NN algorithm and its simple yet powerful approach.

  • K-NN algorithm steps
  • Nearest neighbor concept
  • K parameter significance
  • Training phase (storage)
  • Prediction phase (search)
  • Majority voting
  • Distance-based similarity
  • Algorithm simplicity

Unit 3: Distance Metrics

Master various distance measures for computing similarity between instances. A brief code preview follows the topic list.

  • Euclidean distance
  • Manhattan distance
  • Minkowski distance
  • Cosine similarity
  • Hamming distance
  • Mahalanobis distance
  • Custom distance functions
  • Distance metric properties
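
As a brief preview, here is a minimal NumPy sketch of a few of these metrics; the two vectors are arbitrary illustrative values.
# Preview sketch: a few common distance measures computed with NumPy
import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))            # L2 (straight-line) distance
manhattan = np.sum(np.abs(a - b))                    # L1 (city-block) distance
minkowski_3 = np.sum(np.abs(a - b) ** 3) ** (1 / 3)  # Minkowski distance with p=3
cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity

print(euclidean, manhattan, minkowski_3, cosine_sim)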

Unit 4: Choosing K Value

Learn strategies for selecting the optimal K parameter for your dataset. A short cross-validation sketch follows the topic list.

  • K parameter importance
  • Odd vs even K values
  • Cross-validation for K
  • Bias-variance tradeoff
  • Small K effects
  • Large K effects
  • Rule of thumb methods
  • Grid search optimization
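
As a brief preview, a minimal sketch of choosing K by cross-validated grid search with scikit-learn; the data is synthetic and purely illustrative.
# Preview sketch: picking K with 5-fold cross-validation (synthetic data)
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # illustrative labels

search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": list(range(1, 22, 2))},  # odd K values
                      cv=5)
search.fit(X, y)
print("Best K:", search.best_params_["n_neighbors"])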

Unit 5: K-NN for Classification

Apply K-NN to classification problems with majority voting strategies. A small classification example follows the topic list.

  • Classification with K-NN
  • Majority voting mechanism
  • Tie-breaking strategies
  • Weighted voting
  • Probability estimates
  • Multiclass classification
  • Decision boundaries
  • Class imbalance handling
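
As a brief preview, a minimal classification sketch with scikit-learn's KNeighborsClassifier on made-up toy data.
# Preview sketch: K-NN classification by majority vote (toy data)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3], [8, 8], [9, 9], [8, 9]])
y = np.array(["A", "A", "A", "B", "B", "B"])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # the 3 nearest neighbours vote
print(clf.predict([[2, 2], [9, 8]]))      # expected: ['A' 'B']
print(clf.predict_proba([[2, 2]]))        # vote shares doubling as class probabilities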

Unit 6: K-NN for Regression

Use K-NN for continuous value prediction through local averaging. A short regression sketch follows the topic list.

  • Regression with K-NN
  • Local averaging
  • Weighted averages
  • Distance-based weights
  • Kernel functions
  • Local linear regression
  • Outlier sensitivity
  • Smooth predictions
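
As a brief preview, a minimal regression sketch using scikit-learn's KNeighborsRegressor with distance-weighted local averaging; the sine data is purely illustrative.
# Preview sketch: K-NN regression by (weighted) local averaging
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.linspace(0, 10, 40).reshape(-1, 1)
y = np.sin(X).ravel()

reg = KNeighborsRegressor(n_neighbors=5, weights="distance")  # closer neighbours count more
reg.fit(X, y)
print(reg.predict([[2.5], [7.0]]))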

Unit 7: Curse of Dimensionality

Understand how high-dimensional data affects K-NN performance. A small demonstration follows the topic list.

  • High-dimensional challenges
  • Distance concentration
  • Nearest neighbors become equidistant
  • Volume and sparsity
  • Feature selection importance
  • Dimensionality reduction
  • Feature weighting
  • Mitigation strategies
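
As a brief preview, a small NumPy experiment that illustrates distance concentration: as the number of dimensions grows, the spread of distances shrinks relative to their mean, so "nearest" and "farthest" neighbours become hard to tell apart.
# Preview sketch: distance concentration in high dimensions
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
  points = rng.random((1000, d))                       # uniform points in the unit cube
  dists = np.linalg.norm(points - points[0], axis=1)[1:]
  spread = (dists.max() - dists.min()) / dists.mean()  # relative contrast
  print(f"d={d:5d}  relative spread of distances: {spread:.3f}")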

Unit 8: Data Preprocessing for K-NN

Learn essential preprocessing techniques for optimal K-NN performance. A brief scaling example follows the topic list.

  • Feature scaling necessity
  • Standardization vs normalization
  • Handling categorical features
  • Missing value treatment
  • Outlier impact
  • Feature engineering
  • Distance-based preprocessing
  • Data quality importance
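
As a brief preview, a minimal sketch of why feature scaling matters for distance-based methods; the two features (years of experience and salary) are made-up illustrative numbers.
# Preview sketch: unscaled vs scaled features in a distance computation
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[ 1.0, 50000.0],    # e.g. years of experience, salary
              [ 2.0, 52000.0],
              [10.0, 51000.0]])

print(np.linalg.norm(X[0] - X[1]))                 # dominated by the salary column
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))   # both features now contribute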

Unit 9: Efficient K-NN Implementation

Optimize K-NN algorithms for computational efficiency and scalability. A short neighbour-search sketch follows the topic list.

  • Brute force approach
  • K-D trees
  • Ball trees
  • LSH (Locality Sensitive Hashing)
  • Approximate nearest neighbors
  • Memory vs speed tradeoffs
  • Parallel implementations
  • Large-scale considerations
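
As a brief preview, a minimal sketch comparing scikit-learn's brute-force and tree-based neighbour-search back ends on random data.
# Preview sketch: brute force vs K-D tree vs ball tree neighbour search
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.randn(10000, 3)
queries = np.random.randn(5, 3)

for algo in ("brute", "kd_tree", "ball_tree"):
  nn = NearestNeighbors(n_neighbors=5, algorithm=algo).fit(X)
  distances, indices = nn.kneighbors(queries)
  print(algo, indices[0])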

Unit 10: Variations and Extensions

Explore advanced K-NN variants and practical applications.

  • Weighted K-NN
  • Adaptive K selection
  • Local outlier factor
  • Condensed nearest neighbor
  • Edited nearest neighbor
  • Fuzzy K-NN
  • Prototype selection
  • Real-world applications

Unit 1: Introduction to Instance-Based Learning

Understand the fundamentals of instance-based learning and lazy algorithms.

Instance-Based Learning Concept

Learn how instance-based learning differs from model-based approaches by storing training examples.

Instance-based learning is memory-based, non-parametric, and local: it stores the training instances and predicts from the similarity between a query and those stored examples, rather than building an explicit generalized model during training.
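
For orientation, here is a minimal sketch with scikit-learn's KNeighborsClassifier (the toy arrays are illustrative): fitting only stores and indexes the training set, and the similarity computation happens at prediction time.
# Sketch: "fitting" a K-NN model essentially stores the training data
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1, 2], [2, 3], [8, 9], [9, 8]])
y_train = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)      # no explicit model is built here
print(model.predict([[2, 2]]))   # the similarity search happens now, at query time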

Lazy vs Eager Learning

Understand the fundamental difference between lazy and eager learning approaches.

Lazy Learning: Defers computation until query time (K-NN, Case-Based Reasoning)
Eager Learning: Builds model during training (Decision Trees, SVM, Neural Networks)
# Comparison of lazy vs eager learning
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class LazyLearner:
  def __init__(self):
    self.training_data = []

  def fit(self, X, y):
    # Just store the data -- no model is built
    self.training_data = list(zip(X, y))
    print("Training completed: Data stored")

  def predict(self, x_query):
    # Computation happens here!
    # 1-NN prediction: return the label of the closest stored example
    print("Computing prediction using stored data...")
    distances = [np.linalg.norm(np.asarray(x) - np.asarray(x_query))
                 for x, _ in self.training_data]
    return self.training_data[int(np.argmin(distances))][1]

class EagerLearner:
  def __init__(self):
    self.model = None

  def fit(self, X, y):
    # Build an explicit model during training (a decision tree, one example of an eager learner)
    print("Training: Building model...")
    self.model = DecisionTreeClassifier().fit(X, y)
    print("Model built and ready")

  def predict(self, x_query):
    # Quick prediction using the pre-built model
    return self.model.predict([x_query])[0]
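
A short usage sketch for the two classes above, on made-up toy data, shows where each one does its work.
# Usage sketch for LazyLearner and EagerLearner (toy data)
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

lazy, eager = LazyLearner(), EagerLearner()
lazy.fit(X, y)                      # instant: just stores (x, y) pairs
eager.fit(X, y)                     # the work happens up front
print(lazy.predict([5.5, 5.0]))     # neighbour search happens now, at query time
print(eager.predict([5.5, 5.0]))    # cheap lookup in the pre-built model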

Memory-Based Learning

Explore how instance-based methods use memory to store and retrieve similar cases.

Memory-based learning assumes that similar problems have similar solutions. It relies on storing examples in memory and using similarity measures to retrieve relevant cases for new problems.
import numpy as np
from collections import Counter

class SimpleMemoryBasedLearner:
  def __init__(self):
    self.memory = [] # Store all training examples
  
  def store_example(self, features, label):
    """Store a training example in memory"""
    self.memory.append((features, label))
  
  def retrieve_similar(self, query_features, k=3):
    """Retrieve k most similar examples"""
    distances = []
    
    for stored_features, label in self.memory:
      # Calculate Euclidean distance
      dist = np.linalg.norm(query_features - stored_features)
      distances.append((dist, label))
    
    # Sort by distance and return k nearest
    distances.sort(key=lambda x: x[0])
    return distances[:k]

# Example usage
learner = SimpleMemoryBasedLearner()

# Store some examples
learner.store_example(np.array([1, 2]), "A")
learner.store_example(np.array([2, 3]), "A")
learner.store_example(np.array([8, 9]), "B")

# Query for similar examples
similar = learner.retrieve_similar(np.array([1.5, 2.5]), k=2)
print("Similar examples:", similar)

Local Approximation

Understand how instance-based learning creates local approximations rather than global models.

Local approximations are adaptive and context-sensitive: each prediction uses a small model built only from the training points nearest the query, rather than a single global fit.
# Local approximation concept
import numpy as np

# Generate sample data
np.random.seed(42)
X = np.linspace(0, 10, 50)
y = np.sin(X) + 0.1 * np.random.randn(50)

def local_approximation(x_query, X_train, y_train, k=5):
  """Create local approximation around query point"""
  
  # Find k nearest neighbors
  distances = np.abs(X_train - x_query)
  nearest_indices = np.argsort(distances)[:k]
  
  # Use only local data for prediction
  local_X = X_train[nearest_indices]
  local_y = y_train[nearest_indices]
  
  # Simple average of local neighbors
  prediction = np.mean(local_y)
  
  return prediction, local_X, local_y

# Example: Predict at x = 5.0
pred, local_X, local_y = local_approximation(5.0, X, y, k=7)

print(f"Query point: 5.0")
print(f"Local prediction: {pred:.3f}")
print(f"Using {len(local_X)} nearest neighbors")
print(f"Local X values: {local_X}")
print(f"Local y values: {local_y}")

Computational Characteristics

Learn about the computational trade-offs of instance-based learning methods.

Training Time: O(1) - no model is built; the data is simply stored
Prediction Time: O(n · d) per query - a brute-force search compares the query against all n stored instances across d features
Memory Usage: O(n · d) - the entire training set is retained
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Compare training and prediction times of a lazy vs an eager learner
def compare_algorithms(X, y, X_test):
  for name, model in [("K-NN (lazy)", KNeighborsClassifier()),
                      ("Decision tree (eager)", DecisionTreeClassifier())]:
    t0 = time.time(); model.fit(X, y); t1 = time.time()
    model.predict(X_test); t2 = time.time()
    print(f"{name}: fit {t1 - t0:.4f}s, predict {t2 - t1:.4f}s")
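
A short usage sketch with synthetic data (the sizes are chosen arbitrarily for illustration):
# Usage sketch: synthetic data, arbitrary sizes
X = np.random.randn(5000, 10)
y = (X[:, 0] > 0).astype(int)
X_test = np.random.randn(500, 10)
compare_algorithms(X, y, X_test)
# Typically K-NN fits almost instantly but predicts more slowly,
# while the decision tree spends its time in fit and predicts quickly.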