7.6 Effective Number of Parameters - Summary

How do we generalize “the number of parameters” to regularized or non-linear models?

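For a linear fitting method $\hat{y} = Sy$, where the matrix $S$ depends on the inputs but not on $y$, the effective degrees of freedom (the effective number of parameters) is defined as $\text{df}(S) = \text{tr}(S)$. If $S$ is an orthogonal projection matrix onto a $d$-dimensional subspace, for example least squares with $d$ linearly independent features, then $\text{tr}(S) = d$, so the definition recovers the usual parameter count.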
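As a quick numerical check of the trace definition, here is a minimal sketch using NumPy. The random design matrix, the sizes `n` and `d`, and the ridge penalty `lam` are all assumptions made for illustration; ridge regression is used only as one familiar example of a regularized linear smoother.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5                      # hypothetical sample size and feature count
X = rng.normal(size=(n, d))       # random design matrix (illustrative only)

def ols_smoother(X):
    """Hat matrix S = X (X^T X)^{-1} X^T, an orthogonal projection."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def ridge_smoother(X, lam):
    """Ridge smoother S = X (X^T X + lam I)^{-1} X^T."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

df_ols = np.trace(ols_smoother(X))               # equals d up to rounding
df_ridge = np.trace(ridge_smoother(X, lam=10.0)) # strictly between 0 and d

print(f"OLS df:   {df_ols:.4f}")
print(f"ridge df: {df_ridge:.4f}")
```

The least squares trace matches the parameter count $d$, while the ridge penalty shrinks the trace below $d$, which is the sense in which regularization reduces the effective number of parameters.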

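More generally, define $\text{df}(\hat{y}) = \frac{\sum_i \text{Cov}(\hat{y}_i, y_i)}{\sigma^2_\varepsilon}$. Under an additive-error model $Y = f(X) + \varepsilon$ with uncorrelated errors and $\text{Var}(\varepsilon) = \sigma^2_\varepsilon$, a linear fit $\hat{y} = Sy$ gives $\text{Cov}(\hat{y}_i, y_i) = S_{ii}\,\sigma^2_\varepsilon$, so $\sum_i \text{Cov}(\hat{y}_i, y_i) = \text{tr}(S)\,\sigma^2_\varepsilon$ and this definition equals $\text{tr}(S)$.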
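The agreement between the covariance definition and the trace can also be checked by simulation. The sketch below assumes a hypothetical true function `f`, Gaussian noise with standard deviation `sigma`, and a ridge smoother with an arbitrary penalty; none of these choices come from the text, they only make the example concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 30, 4, 0.5          # hypothetical sizes and noise level
X = rng.normal(size=(n, d))

# Ridge smoother S (penalty 2.0 chosen arbitrarily); S does not depend on y.
S = X @ np.linalg.solve(X.T @ X + 2.0 * np.eye(d), X.T)

f = np.sin(X[:, 0])               # hypothetical true regression function values

reps = 20000
# Each row of Y is one independent draw y = f + eps from the additive-error model.
Y = f + sigma * rng.normal(size=(reps, n))
Y_hat = Y @ S.T                   # row-wise application of y_hat = S y

# Empirical Cov(y_hat_i, y_i) for each observation i, across the replications.
cov = ((Y_hat - Y_hat.mean(axis=0)) * (Y - Y.mean(axis=0))).sum(axis=0) / (reps - 1)

df_cov = cov.sum() / sigma**2     # Monte Carlo estimate of df(y_hat)
df_trace = np.trace(S)            # closed-form value for a linear smoother

print(f"covariance-based df: {df_cov:.3f}")
print(f"trace-based df:      {df_trace:.3f}")
```

The two numbers should agree up to Monte Carlo error. The covariance form has the advantage that it still makes sense for fitting procedures that are not linear in $y$, where no matrix $S$ exists.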

TODO: df for neural networks.

Brian's Blog
