RSS as a Maximum Likelihood Estimator

If the residuals of a model are expected to be normally distributed, then the maximum-likelihood choice of parameters is the one that minimizes the RSS [1].

Without a distributional assumption of this kind, the choice of loss function has no such justification and is essentially arbitrary.

Note

Linear regression is simply the assumption that

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where

$$\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$
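To make the assumption concrete, here is a minimal Python sketch that simulates data from a straight line plus Gaussian noise. The function name `simulate_linear_data`, the symbols `beta0`, `beta1` and `sigma`, and the numeric values are illustrative assumptions, not part of the original note.

```python
import random

def simulate_linear_data(beta0, beta1, sigma, xs, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values, chosen only for illustration.
xs = [i / 10 for i in range(200)]
ys = simulate_linear_data(beta0=1.0, beta1=2.0, sigma=0.5, xs=xs)

# The residuals around the true line should average out close to zero.
residuals = [y - (1.0 + 2.0 * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
print(f"mean residual: {mean_residual:.3f}")
```

With `sigma=0.0` the simulated points fall exactly on the line, which is a quick way to sanity-check the function.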

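Consider a linear regression:

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$

If we presume that the residuals $\varepsilon_i = y_i - \hat{y}_i$ are normally distributed, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, we can view each observation $y_i$ as a value drawn from a normal distribution centred on $\hat{y}_i$ with a variance of $\sigma^2$.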

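The probability density of seeing any such value $y_i$ is:

$$f(y_i \mid x_i; \beta_0, \beta_1, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$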

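The likelihood of seeing all $n$ observations, assuming they are independent, would be given by:

$$L(\beta_0, \beta_1, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$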

This function only has three parameters ($\beta_0$, $\beta_1$ and $\sigma$); everything else is either an observed value or a constant.
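The likelihood described above can be sketched in a few lines of Python. This is an illustration under the stated assumptions (independent observations, normal residuals); the function names and the toy data are hypothetical. It also shows that the log of the product of densities equals an expression written in terms of the RSS.

```python
import math

def gaussian_density(y, mean, sigma):
    """Normal density of y around `mean` with standard deviation `sigma`."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def likelihood(beta0, beta1, sigma, xs, ys):
    """Product of the per-observation densities (independent observations)."""
    return math.prod(gaussian_density(y, beta0 + beta1 * x, sigma) for x, y in zip(xs, ys))

def log_likelihood(beta0, beta1, sigma, xs, ys):
    """The same quantity on the log scale, written in terms of the RSS."""
    n = len(xs)
    rss = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)

# Toy data, made up for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

direct = math.log(likelihood(1.0, 2.0, 0.5, xs, ys))
via_rss = log_likelihood(1.0, 2.0, 0.5, xs, ys)
print(direct, via_rss)  # the two values should agree up to rounding
```

Working on the log scale is the usual design choice in practice, because a product of many small densities underflows quickly for larger samples.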

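What we want to do is choose values of $\beta_0$ and $\beta_1$ that maximize the likelihood $L(\beta_0, \beta_1, \sigma)$ for any given $\sigma$. Because the logarithm is monotonically increasing, maximizing the likelihood is equivalent to maximizing its logarithm:

$$
\begin{aligned}
\underset{\beta_0, \beta_1}{\arg\max}\; L(\beta_0, \beta_1, \sigma)
&= \underset{\beta_0, \beta_1}{\arg\max}\; \log L(\beta_0, \beta_1, \sigma) \\
&= \underset{\beta_0, \beta_1}{\arg\max} \left[ -\frac{n}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right] \\
&= \underset{\beta_0, \beta_1}{\arg\min} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
&= \underset{\beta_0, \beta_1}{\arg\min}\; \mathrm{RSS}
\end{aligned}
$$

The first term does not depend on $\beta_0$ or $\beta_1$, and the factor $\frac{1}{2\sigma^2}$ is positive, so the maximum is reached exactly where the sum of squared residuals is smallest.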

Thus, in order to maximize the likelihood of seeing the observed values $y_i$ with normally distributed residuals, it is sufficient to choose values $\beta_0$ and $\beta_1$ that minimize the RSS.
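This conclusion can be checked numerically. The sketch below, using only the Python standard library, fits a line by the closed-form least-squares formulas and then confirms that nudging either coefficient lowers the Gaussian log-likelihood, whichever $\sigma$ is held fixed. The helper names, the simulated data and the perturbation size are assumptions made for illustration.

```python
import math
import random

def fit_least_squares(xs, ys):
    """Closed-form simple linear regression: the (beta0, beta1) minimizing the RSS."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def log_likelihood(beta0, beta1, sigma, xs, ys):
    """Gaussian log-likelihood of the data for the given parameters."""
    n = len(xs)
    rss = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)

# Simulated data with hypothetical true parameters (1.0, 2.0) and noise 0.5.
rng = random.Random(42)
xs = [i / 5 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

b0, b1 = fit_least_squares(xs, ys)

# For several fixed sigmas, every small nudge away from the RSS minimizer
# should reduce the log-likelihood.
nudges = [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]
all_lower = all(
    log_likelihood(b0 + d0, b1 + d1, sigma, xs, ys) < log_likelihood(b0, b1, sigma, xs, ys)
    for sigma in (0.1, 0.5, 2.0)
    for d0, d1 in nudges
)
print(f"beta0={b0:.3f}, beta1={b1:.3f}, perturbations always worse: {all_lower}")
```

The check is local rather than exhaustive: it only probes a few directions around the least-squares fit, which is enough to illustrate the equivalence but is not a proof.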

Footnotes

[1] RSS: the Residual Sum of Squares, $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.