Recall the Bayes classifier: $\hat{G}(x)= \arg\max_{k} P(G = k \lvert X = x) = \arg\max_{k} f_k(x)\, \pi_k$, where $f_k$ is the class-conditional density of $X$ given $G = k$ and $\pi_k = P(G = k)$ is the class prior. Both LDA and QDA model each class-conditional density as Gaussian.
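A concrete sketch of this plug-in rule with Gaussian class-conditional densities (the means, covariances, and priors below are made-up illustrative values, not estimates from any dataset):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical per-class parameters (K = 2 classes in R^2).
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]  # covariances differ: the QDA setting
priors = [0.6, 0.4]

def bayes_classify(x):
    # argmax_k  log f_k(x) + log pi_k  (the log-posterior up to a constant)
    scores = [multivariate_normal(m, c).logpdf(x) + np.log(p)
              for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

print(bayes_classify(np.array([1.5, 0.5])))  # index of the most probable class
```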
LDA arises when the class-conditional densities share a common covariance matrix.
LDA and linear regression: for two classes, the LDA direction is proportional to the coefficient vector from least-squares regression on a two-valued coding of the class labels (the intercepts generally differ, so the implied cut-points need not agree).
QDA arises when the covariances differ across classes.
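For reference, plugging the Gaussian densities into the Bayes rule and taking logs gives the usual discriminant functions. With a common covariance $\Sigma$, the quadratic term $x^T \Sigma^{-1} x$ is the same for every class and cancels, leaving the linear discriminant $\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$; with class-specific $\Sigma_k$ it no longer cancels, giving the quadratic discriminant $\delta_k(x) = -\frac{1}{2} \log \lvert \Sigma_k \rvert - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k$.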
LDA and QDA often perform well because of the bias-variance tradeoff: we accept the bias of a linear (or quadratic) decision boundary because such a boundary can be estimated with much lower variance than a more flexible one.
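A quick simulation can make the tradeoff concrete; the generating distribution and sample sizes below are arbitrary illustrative choices. The two classes share a covariance, so LDA's restriction costs no bias here, while QDA pays in variance:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)

def sample(n):
    # Two Gaussian classes with a shared covariance (the LDA model is correct).
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 10)) + y[:, None] * 0.7
    return X, y

X_train, y_train = sample(40)      # small training set: variance dominates
X_test, y_test = sample(10_000)

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    acc = model.fit(X_train, y_train).score(X_test, y_test)
    print(type(model).__name__, round(acc, 3))
# QDA must estimate a separate 10x10 covariance per class from ~20 points each,
# so it typically scores worse here despite being the more flexible model.
```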
Regularized discriminant analysis: let each class covariance be a weighted mix of the class-specific and common estimates, $\hat{\Sigma}_k(\alpha) = \alpha \hat{\Sigma}_k + (1 - \alpha) \hat{\Sigma}$, interpolating between LDA ($\alpha = 0$) and QDA ($\alpha = 1$). Choose the weight $\alpha$ via cross-validation.
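A minimal sketch of this mixing, assuming arrays `X` (n × p) and `y` (integer labels); the function names are made up, and the further shrinkage of $\hat{\Sigma}$ toward a multiple of the identity is omitted:

```python
import numpy as np
from scipy.stats import multivariate_normal

def rda_fit(X, y, alpha):
    """Per-class Gaussians with covariance alpha*Sigma_k + (1-alpha)*Sigma_pooled."""
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    covs = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    # Pooled covariance: size-weighted average of the class covariances.
    pooled = sum((y == k).sum() * covs[k] for k in classes) / len(y)
    priors = {k: (y == k).mean() for k in classes}
    mixed = {k: alpha * covs[k] + (1 - alpha) * pooled for k in classes}
    return classes, means, mixed, priors

def rda_predict(X, fit):
    classes, means, covs, priors = fit
    scores = np.column_stack([
        multivariate_normal(means[k], covs[k]).logpdf(X) + np.log(priors[k])
        for k in classes])
    return classes[np.argmax(scores, axis=1)]

# Usage: fit = rda_fit(X_train, y_train, alpha=0.5); yhat = rda_predict(X_test, fit),
# with alpha chosen by cross-validated accuracy over a grid.
```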
If we sphere the data using the SVD $\hat{\Sigma} = U D U^T$ (i.e., $X^* = D^{-1/2} U^T X$), we can implement LDA by nearest centroid classification in the sphered space, after adjusting for the class priors (adding $\log \pi_k$).
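A small numerical check of this equivalence, using the iris data from scikit-learn (chosen because its equal class priors make the argmax insensitive to how the pooled covariance is normalized):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
means = np.stack([X[y == k].mean(axis=0) for k in classes])
# Pooled within-class covariance.
pooled = sum(np.cov(X[y == k], rowvar=False) * ((y == k).sum() - 1)
             for k in classes) / (len(y) - len(classes))
priors = np.array([(y == k).mean() for k in classes])

# Sphere with Sigma = U D U^T:  x* = D^{-1/2} U^T x.
D, U = np.linalg.eigh(pooled)
W = U / np.sqrt(D)                  # W = U D^{-1/2}, applied as x* = W^T x
Xs, Ms = X @ W, means @ W

# Closest sphered centroid, corrected for class priors.
d2 = ((Xs[:, None, :] - Ms[None, :, :]) ** 2).sum(-1)
pred = classes[np.argmin(0.5 * d2 - np.log(priors), axis=1)]

lda = LinearDiscriminantAnalysis().fit(X, y)
print(np.mean(pred == lda.predict(X)))   # expect 1.0: identical predictions
```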
LDA is thus nearest centroid classification on sphered data. Since the $K$ sphered centroids span an affine subspace $H_{K - 1}$ of dimension at most $K - 1$, and the component of the distance orthogonal to $H_{K - 1}$ is the same for every centroid, we can project the $p$-dimensional data onto $H_{K - 1}$ and classify there without loss of information.
We can reduce the dimension further, to $L < K - 1$, by choosing the principal component subspace of the sphered centroids within $H_{K - 1}$; the resulting coordinates are the discriminant (canonical) coordinates.
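In scikit-learn this reduced-rank projection is exposed by `LinearDiscriminantAnalysis.transform`; a short sketch (the dataset and the choice $L = 1$ are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # p = 4, K = 3, so dim(H_{K-1}) = 2

# Project onto the top L = 1 discriminant coordinate.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
Z = lda.transform(X)                # shape (n, 1): the leading canonical variate

# Accuracy of full-rank LDA vs. nearest centroid in the 1-D projection
# (iris has equal class priors, so ignoring log pi_k is harmless).
centroids = np.stack([Z[y == k].mean(axis=0) for k in np.unique(y)])
pred = np.argmin(np.abs(Z - centroids.T), axis=1)
print(lda.score(X, y), np.mean(pred == y))
```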
TODO: Prediction with reduced dimension.
TODO: Fisher’s problem. More analysis.