4.3 Linear Discriminant Analysis - Summary

LDA and QDA

Recall the Bayes classifier: $\hat{G}(x)= \arg\max_{k} P(G = k \mid X = x) = \arg\max_{k} f_k(x)\,\pi_k$, where $f_k$ is the class-conditional density of $X$ in class $k$ and $\pi_k = P(G = k)$ is the prior. Both LDA and QDA model each class-conditional density as a multivariate Gaussian.
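
Written out, the Gaussian class-conditional density used by both is

$$f_k(x) = \frac{1}{(2\pi)^{p/2} \lvert \Sigma_k \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right).$$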

LDA arises when the classes share a common covariance matrix, $\Sigma_k = \Sigma$ for all $k$.

  • Each pairwise decision boundary is linear (a hyperplane) because the log posterior ratio is linear in $x$.
  • Maximizing the linear discriminant functions $\delta_k(x)$ gives the same decision boundaries (written out below).
  • Estimate the class priors, class means, and common covariance from the training data.
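
For reference, the linear discriminant function and the usual plug-in estimates are

$$\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k,$$

$$\hat{\pi}_k = N_k / N, \qquad \hat{\mu}_k = \sum_{g_i = k} x_i / N_k, \qquad \hat{\Sigma} = \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top / (N - K).$$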

LDA and linear regression:

  • In the two-class case, the LDA direction is proportional to the coefficient vector from linear regression on $+1$/$-1$ responses (a numerical check is sketched after this list).
  • With more than two classes, LDA avoids the masking problem that afflicts linear regression on indicator responses.
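
A quick numerical check of the two-class claim; this is a minimal sketch assuming numpy, with the data sizes, parameters, and variable names purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class data; sizes and means are arbitrary choices.
n1, n2, p = 60, 40, 3
X1 = rng.normal(loc=0.0, size=(n1, p))
X2 = rng.normal(loc=1.0, size=(n2, p))
X = np.vstack([X1, X2])
y = np.concatenate([-np.ones(n1), np.ones(n2)])  # -1 / +1 coding

# LDA direction: (pooled within-class covariance)^{-1} (mu2 - mu1).
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n1 + n2 - 2)
lda_dir = np.linalg.solve(Sw, mu2 - mu1)

# Least-squares coefficients (with intercept) for the +/-1 responses.
Xd = np.column_stack([np.ones(n1 + n2), X])
beta = np.linalg.lstsq(Xd, y, rcond=None)[0][1:]  # drop the intercept

# The two directions agree up to a scalar multiple.
print(lda_dir / np.linalg.norm(lda_dir))
print(beta / np.linalg.norm(beta))
```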

QDA arises when the class covariance matrices are not assumed equal.

  • The log posterior ratio, the pairwise decision boundaries, and the discriminant functions are quadratic in $x$ (see below).
  • Estimate a separate covariance matrix for each class.
  • Parameter count: the LDA decision boundaries need $(K - 1)(p + 1)$ parameters, whereas QDA needs $(K - 1)\{p(p + 3)/2 + 1\}$, i.e. roughly $O(K \cdot p)$ vs. $O(K \cdot p^2)$.
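
The quadratic discriminant function, for reference:

$$\delta_k(x) = -\tfrac{1}{2} \log \lvert \Sigma_k \rvert - \tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \log \pi_k.$$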

LDA and QDA often perform well because of the bias-variance tradeoff: we accept the bias of a linear (or quadratic) decision boundary because it can be estimated with much lower variance than a more flexible one.

4.3.1 Regularized Discriminant Analysis

Shrink each class covariance toward the pooled (common) covariance, with the shrinkage weight chosen by cross-validation (see below).
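
The regularized covariance estimates take the form

$$\hat{\Sigma}_k(\alpha) = \alpha \hat{\Sigma}_k + (1 - \alpha) \hat{\Sigma}, \qquad \alpha \in [0, 1],$$

giving a continuum of models between QDA ($\alpha = 1$) and LDA ($\alpha = 0$).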

4.3.2 Computations for LDA

If we sphere the data using the eigendecomposition (equivalently, the SVD) of $\hat{\Sigma}$, LDA becomes nearest centroid classification in the sphered space, after correcting for the class priors (sketched below).
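
A minimal sketch of this two-step implementation, assuming numpy; the function and variable names are illustrative, not a reference implementation:

```python
import numpy as np

def lda_fit(X, y):
    """Sphere the pooled covariance, then store sphered centroids and log-priors."""
    classes = np.unique(y)
    N, K = len(X), len(np.unique(y))
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance Sigma_hat.
    Sw = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
             for i, k in enumerate(classes)) / (N - K)
    # Eigendecomposition Sigma_hat = U D U^T; sphering matrix W = D^{-1/2} U^T.
    D, U = np.linalg.eigh(Sw)
    W = (U / np.sqrt(D)).T
    log_prior = np.log(np.array([np.mean(y == k) for k in classes]))
    return classes, W, means @ W.T, log_prior

def lda_predict(X, classes, W, centroids, log_prior):
    """Nearest sphered centroid, corrected for the class priors."""
    Xs = X @ W.T                                   # sphered inputs
    d2 = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmax(-0.5 * d2 + log_prior, axis=1)]
```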

4.3.3 Reduced-Rank LDA

LDA is nearest centroid classification in the sphered space. Since the $K$ centroids span a subspace $H_{K - 1}$ of dimension at most $K - 1$, and distances orthogonal to $H_{K - 1}$ contribute equally to every class, we can project the $p$-dimensional data onto $H_{K - 1}$ and classify there without loss of information.

We can reduce the dimension further, to $L < K - 1$, by selecting the principal component subspace of the (sphered) centroids within $H_{K - 1}$ (see the sketch below).
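
A sketch of computing these discriminant coordinates (sphere the within-class covariance, then take principal components of the sphered centroids), assuming numpy; names are illustrative:

```python
import numpy as np

def discriminant_coordinates(X, y, L):
    """Project onto the leading L discriminant directions (assumes L <= K - 1)."""
    classes = np.unique(y)
    N, K = len(X), len(classes)
    M = np.array([X[y == k].mean(axis=0) for k in classes])       # K x p centroids
    Sw = sum((X[y == k] - M[i]).T @ (X[y == k] - M[i])
             for i, k in enumerate(classes)) / (N - K)             # within-class cov
    # Whitening transform Sw^{-1/2} via the eigendecomposition of Sw.
    D, U = np.linalg.eigh(Sw)
    W_inv_half = U @ np.diag(1.0 / np.sqrt(D)) @ U.T
    # Principal components of the sphered, centered centroids.
    M_star = M @ W_inv_half
    _, _, Vt = np.linalg.svd(M_star - M_star.mean(axis=0), full_matrices=False)
    V = W_inv_half @ Vt[:L].T                                      # p x L directions
    return X @ V                                                   # N x L coordinates
```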

Prediction with reduced dimension: project a new $x$ onto the $L$-dimensional subspace and classify to the closest projected centroid, adding the $\log \hat{\pi}_k$ correction as in full LDA.

TODO: Fisher’s problem. More analysis.