3.5 Methods using Derived Input Directions - Summary
Given many (often highly correlated) inputs, regress on a small number of linear combinations of them instead of on the inputs themselves.
3.5.1 Principal Components Regression
Regress $y$ on the first $M \leq p$ principal components of the inputs; a minimal sketch follows the list below.
- Since the principal components are orthogonal, $\hat{y}$ is a sum of univariate regressions of $y$ on each component.
- Standardize the features first: the principal components depend on the scale of the inputs.
- Similar to ridge regression: ridge shrinks the coefficient along each principal direction by the factor $d_j^2 / (d_j^2 + \lambda)$, shrinking low-variance directions the most; PCR instead drops the directions with the smallest singular values entirely.
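The steps above fit in a few lines of numpy. This is a minimal illustrative sketch, not the book's code; the function name `pcr_fit` and the return convention are my own, and it assumes the first $M$ singular values are nonzero.

```python
import numpy as np

def pcr_fit(X, y, M):
    """Principal components regression: regress y on the first M
    principal components of the standardized inputs (illustrative sketch)."""
    # Standardize: the principal components depend on the scale of the features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix: rows of Vt are the principal directions.
    U, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    Z = Xs @ Vt[:M].T                 # scores on the first M components
    # Components are orthogonal with <z_m, z_m> = d_m^2, so each coefficient
    # is just a univariate regression of y on z_m.
    theta = (Z.T @ (y - y.mean())) / d[:M] ** 2
    beta = Vt[:M].T @ theta           # coefficients on the standardized inputs
    return beta, y.mean()             # predict via Xs_new @ beta + intercept
```

Setting $M = p$ recovers ordinary least squares; smaller $M$ discards the low-variance directions.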
3.5.2 Partial Least Squares
Each derived feature $z$ is a weighted sum of the inputs (each orthogonalized with respect to the previous derived feature), with weights given by the strength of each input's univariate effect on $y$. For each derived feature (see the sketch after this list):
- $z = \sum_{i=1}^{p} \operatorname{proj}(y \rightarrow x_i)$
- $\hat{y} \mathrel{+}= \operatorname{proj}(y \rightarrow z)$
- $x_i \mathrel{-}= \operatorname{proj}(x_i \rightarrow z)$
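The three steps translate almost directly into numpy. A minimal sketch, assuming standardized inputs; the name `pls_predict` is mine. Since $\operatorname{proj}(y \rightarrow x_i) = \frac{\langle x_i, y \rangle}{\langle x_i, x_i \rangle} x_i$ and standardized inputs share a common denominator $\langle x_i, x_i \rangle$ that cancels in the later projections, the code uses the inner products $\langle x_i, y \rangle$ directly as weights.

```python
import numpy as np

def pls_predict(X, y, M):
    """Partial least squares fit with M derived features (illustrative sketch)."""
    Xm = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized inputs
    yhat = np.full(len(y), y.mean())            # start from the mean of y
    for _ in range(M):
        # z = sum_i proj(y -> x_i): weight each input by its univariate
        # effect <x_i, y> on the response (common denominator dropped).
        z = Xm @ (Xm.T @ y)
        # yhat += proj(y -> z)
        yhat += ((z @ y) / (z @ z)) * z
        # x_i -= proj(x_i -> z): orthogonalize each input w.r.t. z
        Xm -= np.outer(z, (z @ Xm) / (z @ z))
    return yhat
```

Because the inputs are orthogonalized against each $z$, later derived features are orthogonal to earlier ones, so the univariate fits in the second step never need to be revisited.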
Optimization formulation: PLS seeks directions that have high variance and are highly correlated with the response.
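To make that precise, the textbook characterization (with $S$ the sample covariance matrix of the inputs and $\hat{\varphi}_\ell$ the earlier PLS directions) is that the $m$-th direction solves

$$\max_{\alpha}\ \operatorname{Corr}^2(y, X\alpha)\,\operatorname{Var}(X\alpha) \quad \text{subject to } \|\alpha\| = 1,\ \alpha^{T} S \hat{\varphi}_{\ell} = 0,\ \ell = 1, \dots, m-1.$$

By contrast, PCR maximizes $\operatorname{Var}(X\alpha)$ alone; the extra correlation factor is what ties the PLS directions to the response.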