3.5 Methods using Derived Input Directions - Summary
Given many (often highly correlated) inputs, regress on a small number of linear combinations of them instead of on the inputs themselves.
3.5.1 Principal Components Regression
Regress $y$ on the first $M \leq p$ principal components of the inputs; a minimal sketch follows the list below.
- Since the principal components are orthogonal, $\hat{y}$ is a sum of univariate regressions of $y$ on each component.
- Standardize the features first: the principal components depend on the scale of the inputs.
- Similar to ridge regression: ridge shrinks the coefficient along each principal direction by the factor $d_j^2 / (d_j^2 + \lambda)$, shrinking low-variance directions the most; PCR instead drops the directions with the smallest singular values entirely.
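The steps above fit in a few lines of numpy. This is a minimal illustrative sketch, not the book's code; the function name `pcr_fit` and the return convention are my own, and it assumes the first $M$ singular values are nonzero.

```python
import numpy as np

def pcr_fit(X, y, M):
    """Principal components regression: regress y on the first M
    principal components of the standardized inputs (illustrative sketch)."""
    # Standardize: the principal components depend on the scale of the features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix: rows of Vt are the principal directions.
    U, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    Z = Xs @ Vt[:M].T                 # scores on the first M components
    # Components are orthogonal with <z_m, z_m> = d_m^2, so each coefficient
    # is just a univariate regression of y on z_m.
    theta = (Z.T @ (y - y.mean())) / d[:M] ** 2
    beta = Vt[:M].T @ theta           # coefficients on the standardized inputs
    return beta, y.mean()             # predict via Xs_new @ beta + intercept
```

Setting $M = p$ recovers ordinary least squares; smaller $M$ discards the low-variance directions.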
3.5.2 Partial Least Squares
Each derived feature $z$ is a weighted sum of the inputs (each orthogonalized with respect to the previous derived feature), with weights given by the strength of each input's univariate effect on $y$. For each derived feature (see the sketch after this list):
- $z = \sum_{i=1}^{p} \operatorname{proj}(y \rightarrow x_i)$
- $\hat{y} \mathrel{+}= \operatorname{proj}(y \rightarrow z)$
- $x_i \mathrel{-}= \operatorname{proj}(x_i \rightarrow z)$
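The three steps translate almost directly into numpy. A minimal sketch, assuming standardized inputs; the name `pls_predict` is mine. Since $\operatorname{proj}(y \rightarrow x_i) = \frac{\langle x_i, y \rangle}{\langle x_i, x_i \rangle} x_i$ and standardized inputs share a common denominator $\langle x_i, x_i \rangle$ that cancels in the later projections, the code uses the inner products $\langle x_i, y \rangle$ directly as weights.

```python
import numpy as np

def pls_predict(X, y, M):
    """Partial least squares fit with M derived features (illustrative sketch)."""
    Xm = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized inputs
    yhat = np.full(len(y), y.mean())            # start from the mean of y
    for _ in range(M):
        # z = sum_i proj(y -> x_i): weight each input by its univariate
        # effect <x_i, y> on the response (common denominator dropped).
        z = Xm @ (Xm.T @ y)
        # yhat += proj(y -> z)
        yhat += ((z @ y) / (z @ z)) * z
        # x_i -= proj(x_i -> z): orthogonalize each input w.r.t. z
        Xm -= np.outer(z, (z @ Xm) / (z @ z))
    return yhat
```

Because the inputs are orthogonalized against each $z$, later derived features are orthogonal to earlier ones, so the univariate fits in the second step never need to be revisited.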
Optimization formulation: PLS seeks directions that have high variance and are highly correlated with the response.
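To make that precise, the textbook characterization (with $S$ the sample covariance matrix of the inputs and $\hat{\varphi}_\ell$ the earlier PLS directions) is that the $m$-th direction solves

$$\max_{\alpha}\ \operatorname{Corr}^2(y, X\alpha)\,\operatorname{Var}(X\alpha) \quad \text{subject to } \|\alpha\| = 1,\ \alpha^{T} S \hat{\varphi}_{\ell} = 0,\ \ell = 1, \dots, m-1.$$

By contrast, PCR maximizes $\operatorname{Var}(X\alpha)$ alone; the extra correlation factor is what ties the PLS directions to the response.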