RSS as a Maximum Likelihood Estimator

If the residuals of a model are expected to be normally distributed, then the parameters should be chosen to minimize the RSS ¹.

Otherwise the choice of loss function is arbitrary.

Note

Linear Regression is simply the assumption that

$y \sim N (μ, σ)$

where $μ = m x + b$

Consider a linear regression:

$y = m x + b + ε, \quad \hat{y} = m x + b$

If we presume that $ε \sim N (0, σ)$ for some $σ$, we can view each observed $y_{i}$ as a value that is normally distributed around $\hat{y}_{i}$ with a variance of $σ^{2}$.

The probability of seeing any such value is:

$p (y_{i}) = \frac{1}{σ \sqrt{2 π}} \exp \left(- \frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right) = \frac{1}{σ \sqrt{2 π}} \exp \left(- \frac{(y_{i} - m x_{i} - b)^{2}}{2 σ^{2}}\right)$
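The density of a single observation can be evaluated numerically. The following is a minimal sketch, not a definitive implementation: the function name `normal_pdf` and the values of `m`, `b`, `sigma`, `x_i` and `y_i` are made up for illustration.

```python
import math

def normal_pdf(y, y_hat, sigma):
    """Density of y under a normal distribution with mean y_hat and standard deviation sigma."""
    z = (y - y_hat) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical line and a single hypothetical observation.
m, b, sigma = 2.0, 1.0, 0.5
x_i, y_i = 3.0, 7.2

y_hat_i = m * x_i + b                           # fitted value on the line
p_on_line = normal_pdf(y_hat_i, y_hat_i, sigma) # density when the residual is zero
p_obs = normal_pdf(y_i, y_hat_i, sigma)         # density at the observed value

print(p_on_line, p_obs)
```

The density is largest when the residual $y_{i} - \hat{y}_{i}$ is zero and falls away as the squared residual grows, which is what ties the likelihood to the RSS.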

The likelihood of seeing all the observations would be given by:

$L (y) = \prod_{i = 1}^{N} \left[\frac{1}{σ \sqrt{2 π}} \exp \left(- \frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right)\right]$

This function has only three parameters ($σ$, $m$ and $b$); everything else is either an observed value or a constant.
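The likelihood can be evaluated directly as a function of $m$, $b$ and $σ$. The sketch below assumes a small made-up data set and uses `statistics.NormalDist` from the Python standard library for the density. The helper names `likelihood` and `log_likelihood` are illustrative only.

```python
import math
from statistics import NormalDist

def likelihood(m, b, sigma, xs, ys):
    """Product of the normal densities of each y_i around m*x_i + b."""
    return math.prod(NormalDist(m * x + b, sigma).pdf(y) for x, y in zip(xs, ys))

def log_likelihood(m, b, sigma, xs, ys):
    """Sum of the log densities; equals log(likelihood) by log laws."""
    return sum(math.log(NormalDist(m * x + b, sigma).pdf(y)) for x, y in zip(xs, ys))

# Made-up observations lying close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

sigma = 1.0
good = likelihood(2.0, 1.0, sigma, xs, ys)  # line close to the data
bad = likelihood(0.5, 3.0, sigma, xs, ys)   # line far from the data

print(good > bad)
print(math.isclose(math.log(good), log_likelihood(2.0, 1.0, sigma, xs, ys)))
```

Working with the sum of logs rather than the raw product is also the safer choice numerically, since a product of many small densities can underflow.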

What we want to do is choose values of $m$ and $b$ that maximize this likelihood for any given $σ$ :

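$\begin{aligned}
\underset{m, b}{\arg\max} \left(\prod_{i = 1}^{N} [p (y_{i})]\right)
&= \underset{m, b}{\arg\max} \left(\log \left(\prod_{i = 1}^{N} [p (y_{i})]\right)\right) && \text{Because log is a monotone transform:} \\
&= \underset{m, b}{\arg\max} \left(\sum_{i = 1}^{N} [\log (p (y_{i}))]\right) && \text{By log laws:} \\
&= \underset{m, b}{\arg\max} \left(\sum_{i = 1}^{N} \left[\log \left(\frac{1}{σ \sqrt{2 π}} \exp \left(- \frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right)\right)\right]\right) && \text{Substituting in the probability:} \\
&= \underset{m, b}{\arg\min} \left(\sum_{i = 1}^{N} \left[\log \left(σ \sqrt{2 π} \exp \left(\frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right)\right)\right]\right) && \text{Drop the negative (arg max becomes arg min):} \\
&= \underset{m, b}{\arg\min} \left(\sum_{i = 1}^{N} \left[\log \left(\exp \left(\frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right)\right)\right]\right) && \text{Constants can be dropped from argmin:} \\
&= \underset{m, b}{\arg\min} \left(\sum_{i = 1}^{N} \left[\frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}}\right]\right) \\
&= \underset{m, b}{\arg\min} \left(\sum_{i = 1}^{N} [(y_{i} - \hat{y}_{i})^{2}]\right) && \text{Dropping constants:}
\end{aligned}$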

Thus, in order to maximize the likelihood of the observed $y_{i}$ with normally distributed residuals, it is sufficient to choose the values of $m$ and $b$ that minimize the RSS, whatever the value of $σ$.
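This equivalence can be checked numerically. The sketch below assumes a small made-up data set and a hypothetical grid of candidate $(m, b)$ pairs. It searches the same grid once for the smallest RSS and once for the largest log-likelihood, for several values of $σ$. A grid search is used only to keep the example short. It is not how a regression would normally be fitted.

```python
import math

def rss(m, b, xs, ys):
    """Residual sum of squares for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def log_likelihood(m, b, sigma, xs, ys):
    """Sum of log normal densities of each y_i around m*x_i + b."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - (m * x + b)
        total += -math.log(sigma * math.sqrt(2.0 * math.pi)) - r * r / (2.0 * sigma ** 2)
    return total

# Made-up observations lying close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

# Hypothetical candidate grid: m in [1.50, 2.50], b in [0.50, 1.50], step 0.01.
candidates = [(mi / 100, bi / 100) for mi in range(150, 251) for bi in range(50, 151)]

# Candidate with the smallest RSS.
best_rss = min(candidates, key=lambda p: rss(p[0], p[1], xs, ys))

# Candidate with the largest log-likelihood, for several values of sigma.
best_ll = {
    sigma: max(candidates, key=lambda p: log_likelihood(p[0], p[1], sigma, xs, ys))
    for sigma in (0.1, 0.5, 2.0)
}

print(best_rss)
print(best_ll)
```

The same $(m, b)$ pair is selected in every case, because $σ$ only rescales and shifts the log-likelihood and does not move the location of its maximum.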

Footnotes

¹ RSS: The Residual Sum of Squares $\left(\sum_{i = 1}^{N} [(y_{i} - \hat{y}_{i})^{2}]\right)$

Environmental Informatics (MATH3005)
