4.4 Logistic Regression - Derivations
(4.18) Express the posteriors using the sigmoid (binary) and softmax (multi-class) functions
(Not stated in book.)
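For (4.18), a minimal numerical sketch (function names and the test value are mine, not the book's) showing that the two-class softmax collapses to the sigmoid of the log-odds:

```python
import numpy as np

def sigmoid(a):
    # Binary posterior P(G=1 | x) as a function of the log-odds a
    return 1.0 / (1.0 + np.exp(-a))

def softmax(scores):
    # Multi-class posteriors; subtract the max for numerical stability
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

# With scores (a, 0) for the two classes, softmax reduces to sigmoid(a):
a = 1.7
p = softmax(np.array([a, 0.0]))
assert np.isclose(p[0], sigmoid(a))
```

This is why the binary model is usually written with a single linear function: the second class's score can be fixed at zero without loss of generality.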
(4.21) Maximizing log-likelihood is equivalent to minimizing cross-entropy
(Not stated in book.)
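For (4.21), a one-line sketch in generic notation (the symbols $y_i$, $p_i$ are mine): with $p_i = \Pr(G = 1 \mid x_i)$ and $y_i \in \{0, 1\}$, the binary log-likelihood is

```latex
\ell(\beta) \;=\; \sum_{i=1}^{N} \bigl\{\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr\}.
```

The cross-entropy between the empirical labels and the model's probabilities is exactly $-\ell(\beta)$ (up to a constant factor of $N$ if averaged), so maximizing the log-likelihood and minimizing the cross-entropy select the same $\hat{\beta}$.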
(4.23) Why does the Newton-Raphson algorithm work?
(p121) Log-likelihood is concave
(p121) Coordinate-descent methods can be used to maximize the (multi-class) log-likelihood efficiently
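The Newton-Raphson step of (4.23) can be sketched as iteratively reweighted least squares (IRLS); this is a minimal binary-case implementation under my own naming and defaults, not the book's code:

```python
import numpy as np

def irls_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit binary logistic regression by Newton-Raphson (IRLS).

    Assumes X already contains an intercept column and y is 0/1.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        w = p * (1.0 - p)                          # diagonal of the weight matrix W
        # Newton step: (X^T W X)^{-1} X^T (y - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:             # converged
            break
    return beta
```

Each step solves a weighted least-squares problem; since the log-likelihood is concave (the p121 point), the iterates converge for well-posed data, though perfectly separated data send $\hat{\beta}$ to infinity.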
(Table 4.2) Standard error and z-score for each coefficient
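For Table 4.2, a sketch of how such columns are typically computed from the inverse observed information $(X^T W X)^{-1}$ evaluated at the MLE (the function name and inputs are my assumptions):

```python
import numpy as np

def wald_table(X, beta_hat):
    """Standard errors and Wald z-scores for a fitted logistic regression.

    Assumes beta_hat is the MLE and X includes an intercept column.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))   # (X^T W X)^{-1}
    se = np.sqrt(np.diag(cov))                    # standard errors
    z = beta_hat / se                             # z-scores
    return se, z
```

A z-score larger than about 2 in absolute value corresponds to a coefficient significant at roughly the 5% level, which is how the book reads the table.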
(Table 4.3) Subset selection using analysis of deviance
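For Table 4.3, a sketch of the deviance computation behind such a table; the labels and fitted probabilities below are made up for illustration:

```python
import numpy as np

def binomial_deviance(y, p):
    # Deviance of a fitted binary model: -2 * log-likelihood
    # (the saturated model has zero deviance for 0/1 responses)
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical fits of a smaller and a larger model on the same data
y  = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
p0 = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])   # reduced model
p1 = np.array([0.8, 0.3, 0.7, 0.9, 0.2, 0.6])   # model with extra terms

drop = binomial_deviance(y, p0) - binomial_deviance(y, p1)
# 'drop' is referred to a chi-squared distribution with df equal to the
# number of terms dropped, which is the analysis-of-deviance comparison.
```

The deviance plays the role that the residual sum of squares plays in an ANOVA table for linear regression.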
(Section 4.4.3) Implications of the least-squares connection
(4.31) L1-regularized log-likelihood is concave
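A one-line argument for (4.31), writing $\ell$ for the log-likelihood (as in the book, the intercept is usually left unpenalized):

```latex
\ell_{\lambda}(\beta) \;=\; \ell(\beta) \;-\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

$\ell$ is concave, since its Hessian $-X^{T}WX$ is negative semidefinite, and $-\lambda \sum_j \lvert \beta_j \rvert$ is concave because the $L_1$ norm is convex; a sum of concave functions is concave.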
(Section 4.4.4) Optimization methods for L1-regularized logistic regression
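The book points to nonlinear programming methods and coordinate-descent path algorithms (glmnet) here. As one concrete illustration, this is a proximal-gradient (ISTA) sketch; the method, names, and defaults are my choices, not the book's:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |.|, applied elementwise
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic_ista(X, y, lam, lr=0.1, n_iter=500):
    """Proximal-gradient sketch for L1-penalized logistic regression.

    Penalizes every coefficient; in practice the intercept is left
    unpenalized, which this sketch omits for brevity.
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = -X.T @ (y - p) / n          # gradient of the average NLL
        beta = soft_threshold(beta - lr * grad, lr * lam)
    return beta
```

The soft-thresholding step is what produces exact zeros in the solution, which is the feature-selection behavior the section emphasizes.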
(Section 4.4.5) LDA is generative and logistic regression is discriminative
(Not stated in book.)
(4.37) Maximizing full log-likelihood gives LDA parameter estimates
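For (4.37), maximizing the full log-likelihood $\sum_i \log\bigl[\phi(x_i; \mu_{g_i}, \Sigma)\,\pi_{g_i}\bigr]$ over the Gaussian parameters gives the standard LDA estimates; note the MLE for $\Sigma$ divides by $N$, while the book's pooled estimate uses the bias-corrected $N-K$:

```latex
\hat{\pi}_k = \frac{N_k}{N}, \qquad
\hat{\mu}_k = \frac{1}{N_k} \sum_{g_i = k} x_i, \qquad
\hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{K} \sum_{g_i = k}
  (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^{T}.
```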
(p128) LDA is not robust to gross outliers
(p128) Marginal likelihood as a regularizer