Summary
A decision boundary divides the feature space into regions labeled by class. We focus on linear decision boundaries.
Discriminant analysis models a discriminant function $\delta_k(x)$ for each class, then classifies to $\hat{G}(x) = \arg\max_k \hat{\delta}_k(x)$.
- The decision boundary between two classes is the set where their discriminant functions are equal, e.g. $\delta_1(x) = \delta_2(x)$.
- Example: fit $K$ linear regression models, one per class indicator response; the resulting decision boundaries are hyperplanes (a sketch follows this list).
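A minimal sketch of the indicator-regression example above, assuming toy 2-D data with made-up class means and seed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N points in R^2, K = 3 classes (hypothetical means).
N, K = 150, 3
X = np.vstack([rng.normal(loc=m, scale=0.7, size=(N // K, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
g = np.repeat(np.arange(K), N // K)

# Indicator response matrix Y: Y[i, k] = 1 iff observation i is in class k.
Y = np.eye(K)[g]

# Fit K linear regressions at once by least squares on [1, X].
Xa = np.hstack([np.ones((N, 1)), X])        # prepend intercept column
B, *_ = np.linalg.lstsq(Xa, Y, rcond=None)  # B holds all K coefficient vectors

# Classify to the largest fitted value: G_hat(x) = argmax_k delta_k(x).
g_hat = (Xa @ B).argmax(axis=1)
print("training accuracy:", (g_hat == g).mean())
```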
Modeling the posterior $P(G \mid X = x)$ is also a form of discriminant analysis: the posteriors can serve as the discriminant functions.
If some monotone transformation of the discriminant function or posterior is linear in $x$, then the decision boundary is linear.
- Binary example: if the posterior is $\mathrm{sigmoid}(\beta^T x)$, its logit (log-odds) transformation is $\beta^T x$, which is linear, so the decision boundary is linear: $\{x : \beta^T x = 0\}$, where the posterior equals $1/2$.
- LDA and logistic regression both yield linear log-odds (logits); the difference is how the linear function is fit to the training data (see the sketch after this list).
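A sketch of that contrast on simulated two-class Gaussian data (the scikit-learn calls and the data are illustrative assumptions, not from the notes): both fits are linear in $x$, but LDA estimates the line via Gaussian maximum likelihood while logistic regression maximizes the conditional likelihood, so the coefficients differ.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Two Gaussian classes sharing a covariance (the LDA model assumption).
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([2, 2], 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
lr = LogisticRegression().fit(X, y)

# Both produce log-odds of the form beta_0 + beta^T x; only the fitting differs.
print("LDA     :", lda.intercept_, lda.coef_)
print("logistic:", lr.intercept_, lr.coef_)
```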
An alternative to discriminant analysis is to model a linear boundary explicitly; in the two-class case this is a separating hyperplane. Two such methods (a perceptron sketch follows the list):
- Perceptron
- Optimally separating hyperplane
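A minimal perceptron sketch (the data and epoch cap are made-up illustration; the update $\beta \leftarrow \beta + y_i x_i$ on misclassified points converges when the classes are separable):

```python
import numpy as np

def perceptron(X, y, n_epochs=100):
    """Rosenblatt perceptron for labels y in {-1, +1}."""
    Xa = np.hstack([np.ones((len(X), 1)), X])  # absorb intercept into beta
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_epochs):
        updated = False
        for xi, yi in zip(Xa, y):
            if yi * (xi @ beta) <= 0:          # misclassified (or on boundary)
                beta += yi * xi
                updated = True
        if not updated:                        # a full pass with no mistakes
            return beta
    return beta                                # may not separate if cap is hit

# Separable toy data (hypothetical).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),
               rng.normal([3, 3], 0.5, size=(50, 2))])
y = np.repeat([-1, 1], 50)
print("separating hyperplane: beta =", perceptron(X, y))
```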
Generalizations: with a quadratic basis expansion (augmenting the features with their squares and cross-products), a linear decision boundary in the augmented space maps back to a quadratic decision boundary in the original space.
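A sketch of the quadratic-expansion idea, assuming a made-up two-class problem whose true boundary is the unit circle: indicator regression that is linear in the augmented features yields a boundary quadratic in the original ones.

```python
import numpy as np

def quad_expand(X):
    """Map (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

# Classes separated by a circle: not linearly separable in R^2.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(float)

# Least squares on the 0/1 indicator in the augmented space [1, h(x)].
H = np.hstack([np.ones((len(X), 1)), quad_expand(X)])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

# Thresholding the fitted value at 1/2 gives a quadratic boundary in x.
y_hat = (H @ beta > 0.5).astype(float)
print("training accuracy:", (y_hat == y).mean())
```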