RSS as a Maximum Likelihood Estimator
If the residuals of a model are expected to be normally distributed, then the parameters should be chosen to minimize the RSS [1].
Without that assumption, this argument does not single out the RSS, and the choice of loss function is far less constrained.
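Consider a linear regression:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $\varepsilon_i$ is the residual of the $i$-th observation.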
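If we presume that the residuals are normally distributed, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, we can view each observation as a value $y_i$ that is normally distributed around $\beta_0 + \beta_1 x_i$ with a variance of $\sigma^2$.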
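To make this generative view concrete, here is a minimal sketch that draws each value from a normal distribution centered on the regression line. The function name `simulate` and the specific parameter values (an intercept of 2.0, a slope of 0.5, a standard deviation of 1.5) are assumptions chosen purely for illustration.

```python
import random

def simulate(xs, beta0, beta1, sigma, seed=0):
    """Draw one y per x from Normal(beta0 + beta1 * x, sigma**2)."""
    rng = random.Random(seed)
    # random.gauss takes the mean and the standard deviation (not the variance).
    return [rng.gauss(beta0 + beta1 * x, sigma) for x in xs]

# Illustrative values only; nothing here comes from real data.
xs = [i / 10 for i in range(1000)]
ys = simulate(xs, beta0=2.0, beta1=0.5, sigma=1.5)

# The residuals around the true line should look like draws from Normal(0, sigma**2).
residuals = [y - (2.0 + 0.5 * x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / len(residuals)
var_resid = sum((r - mean_resid) ** 2 for r in residuals) / len(residuals)
print(mean_resid, var_resid)
```

With enough samples, the residual mean should sit near zero and the residual variance near the square of the chosen standard deviation.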
Given that this model is true and the residuals are indeed normal, the probability density of seeing any such value $y_i$ is:

$$p(y_i \mid x_i; \beta_0, \beta_1, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$
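Assuming the $n$ observations are independent, the likelihood of seeing all the observations would be given by the product of the individual densities:

$$L(\beta_0, \beta_1, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$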
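As a rough sketch, the likelihood can be computed directly as a product of normal densities. For large samples that product underflows, so it is common to work with its logarithm instead, which separates into a constant term and a term proportional to the RSS. The function names and the five data points below are hypothetical, picked only to show that the two routes agree.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of Normal(mean, sigma**2) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(xs, ys, beta0, beta1, sigma):
    """Product of the per-observation densities (assumes independent observations)."""
    result = 1.0
    for x, y in zip(xs, ys):
        result *= normal_pdf(y, beta0 + beta1 * x, sigma)
    return result

def log_likelihood(xs, ys, beta0, beta1, sigma):
    """Logarithm of the same product: -n * ln(sigma * sqrt(2 * pi)) - RSS / (2 * sigma**2)."""
    n = len(xs)
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - rss / (2 * sigma ** 2)

# Made-up observations lying roughly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

direct = likelihood(xs, ys, 1.0, 2.0, 0.5)
via_log = math.exp(log_likelihood(xs, ys, 1.0, 2.0, 0.5))
print(direct, via_log)
```

The two printed values should match up to floating-point rounding; the logarithmic form is the one that stays usable as the number of observations grows.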
This function only has three parameters ($\beta_0$, $\beta_1$ and $\sigma$); everything else is either an observed value or a constant.
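What we want to do is choose values of $\beta_0$ and $\beta_1$ that maximize this likelihood for any given $\sigma$. Because the logarithm is monotonically increasing, maximizing the likelihood $L$ is equivalent to maximizing $\ln L$:

$$\ln L(\beta_0, \beta_1, \sigma) = -n \ln\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

For a fixed $\sigma$, the first term is a constant and the factor $\frac{1}{2\sigma^2}$ is positive, so:

$$\underset{\beta_0, \beta_1}{\arg\max}\; \ln L = \underset{\beta_0, \beta_1}{\arg\min} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = \underset{\beta_0, \beta_1}{\arg\min}\; \mathrm{RSS}$$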
Thus, in order to maximize the likelihood of seeing any $y$ with normally distributed residuals, it is sufficient to choose values of $\beta_0$ and $\beta_1$ that minimize the RSS.
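This conclusion can be checked numerically. The sketch below fits a line by ordinary least squares on simulated data, then confirms that, for several fixed values of the standard deviation, nudging the intercept or slope away from the least-squares fit never raises the Gaussian log-likelihood. The data, the seed, the perturbation sizes, and the function names are all assumptions made for illustration; this is a spot check on a small grid, not a proof.

```python
import math
import random

def fit_least_squares(xs, ys):
    """Closed-form simple linear regression: the (beta0, beta1) minimizing the RSS."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def log_likelihood(xs, ys, beta0, beta1, sigma):
    """Gaussian log-likelihood: -n * ln(sigma * sqrt(2 * pi)) - RSS / (2 * sigma**2)."""
    n = len(xs)
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - rss / (2 * sigma ** 2)

# Simulated data: true line y = 1 + 3x with Normal(0, 2**2) noise (illustrative values).
rng = random.Random(42)
xs = [i / 5 for i in range(50)]
ys = [1.0 + 3.0 * x + rng.gauss(0.0, 2.0) for x in xs]

b0, b1 = fit_least_squares(xs, ys)

# For each fixed sigma, compare the least-squares fit against nearby parameter values.
beaten = False
for sigma in (0.5, 2.0, 10.0):
    best = log_likelihood(xs, ys, b0, b1, sigma)
    for d0 in (-0.1, 0.0, 0.1):
        for d1 in (-0.1, 0.0, 0.1):
            if log_likelihood(xs, ys, b0 + d0, b1 + d1, sigma) > best + 1e-9:
                beaten = True

print(b0, b1, beaten)
```

The flag `beaten` stays `False` because the fixed value of the standard deviation only rescales and shifts the log-likelihood; it never changes which intercept and slope come out on top.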
Footnotes
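[1] RSS: the residual sum of squares, $\sum_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value the model predicts for observation $i$.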