Deriving Multiple Linear Regression


The goal here is to maximise the probability of seeing some observation, given a mean value and a constant variance.


Consider a multiple linear regression of the form:
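As a sketch of that form, assume $n$ observations with targets $y \in \mathbb{R}^{n}$, an $n \times p$ design matrix $X$, weights $w \in \mathbb{R}^{p}$ and a noise term $\epsilon$. These symbols are assumptions chosen for illustration, not notation taken from the source:

```latex
y = Xw + \epsilon
```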

The bias for this model is included in the weights matrix, as shown in Bias Terms in Multiple Linear Regression.

Consider the probability of seeing an observation under the assumption:
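One assumption that matches the stated goal is independent Gaussian noise with constant variance. Sketched with assumed symbols ($y_i$ for an observation, $x_i$ for its feature vector, $w$ for the weights, $\sigma^2$ for the variance; none are taken from the source), each observation is then normally distributed around the linear prediction:

```latex
\epsilon_i \sim \mathcal{N}(0, \sigma^2)
\quad\Longrightarrow\quad
p(y_i \mid x_i, w, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(y_i - x_i^\top w)^2}{2\sigma^2} \right)
```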

The goal in linear regression is to find the weights such that:
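With assumed symbols $y_i$, $x_i$, $w$ and $\sigma^2$ (observation, feature vector, weights and noise variance; not notation from the source), and treating the $n$ observations as independent, this can be sketched as choosing the weights that maximise the joint likelihood:

```latex
\hat{w} = \arg\max_{w} \prod_{i=1}^{n} p(y_i \mid x_i, w, \sigma^2)
```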

The argument that maximises the likelihood of this expression is equivalent to the argument that minimises the residual sum of squares (RSS); see Maximum Likelihood is equal to Minimum RSS.
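In an assumed notation with targets $y$, design matrix $X$ and weights $w$ (illustrative symbols, not from the source), the equivalent minimisation problem might be written as a sum over the squared elements of the residual vector:

```latex
\hat{w} = \arg\min_{w} \operatorname{RSS}(w),
\qquad
\operatorname{RSS}(w) = \sum_{i=1}^{n} (y - Xw)_i^{2}
```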


Note that the squaring operation is element-wise, so more conventionally:
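Assuming targets $y$, design matrix $X$ and weights $w$ (symbols chosen here for illustration), the sum of element-wise squares of the residuals is the inner product of the residual vector with itself, that is, its squared L2 norm:

```latex
\operatorname{RSS}(w) = (y - Xw)^\top (y - Xw) = \lVert y - Xw \rVert_2^2
```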

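Because the RSS is a quadratic function of the weights, its minimum occurs where its derivative with respect to the weights is zero: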
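In symbols, with $w$ assumed to denote the weights (a label not taken from the source), the RSS is a convex quadratic in $w$, so any stationary point is a minimiser:

```latex
\frac{\partial \operatorname{RSS}(w)}{\partial w} = 0
```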

Recall the Matrix Chain Rule
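As a worked sketch of that rule, assume targets $y$, design matrix $X$, weights $w$ and residual vector $r = y - Xw$ (all illustrative symbols), and use the convention that the gradient is a column vector. The Jacobian of $r$ with respect to $w$ is $-X$ and the gradient of $r^\top r$ with respect to $r$ is $2r$, so:

```latex
\frac{\partial \operatorname{RSS}}{\partial w}
  = \left( \frac{\partial r}{\partial w} \right)^{\!\top}
    \frac{\partial (r^\top r)}{\partial r}
  = (-X)^\top (2r)
  = -2 X^\top (y - Xw)
```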

Setting the derivative to 0:
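Assuming the gradient of the RSS takes the standard form $-2 X^\top (y - Xw)$, with $y$, $X$ and $w$ as illustrative symbols for the targets, design matrix and weights, setting it to zero gives what are usually called the normal equations:

```latex
-2 X^\top (y - Xw) = 0
\quad\Longrightarrow\quad
X^\top X w = X^\top y
```

These have a unique solution when $X$ has full column rank, since $X^\top X$ is then invertible.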


So, in conclusion, for the multiple linear regression model considered here, the value of the weights that minimises the L2 norm of the residuals is [1, 2]:
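Written with an assumed design matrix $X$, targets $y$ and weights $w$ (symbols chosen for illustration), this is the familiar closed form $\hat{w} = (X^\top X)^{-1} X^\top y$, valid when $X^\top X$ is invertible. The NumPy sketch below checks it on synthetic data; the sample size, seed, noise level and true weights are all made-up values, and the bias is handled by prepending a column of ones to $X$.

```python
import numpy as np

# Synthetic data; sizes, seed and true weights are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 200, 3
features = rng.normal(size=(n, p))

# Prepend a column of ones so the bias is carried by the first weight.
X = np.column_stack([np.ones(n), features])
w_true = np.array([1.5, -2.0, 0.5, 3.0])
y = X @ w_true + rng.normal(scale=0.1, size=n)

# Closed form from the normal equations: solve (X^T X) w = X^T y.
# np.linalg.solve avoids forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimiser the gradient -2 X^T (y - X w) should be numerically zero.
gradient = -2 * X.T @ (y - X @ w_hat)

print(w_hat)
print(np.allclose(w_hat, w_lstsq))
print(np.allclose(gradient, 0.0, atol=1e-8))
```

Solving the linear system rather than inverting $X^\top X$ is a numerical-stability choice; for ill-conditioned design matrices a QR- or SVD-based solver such as `np.linalg.lstsq` is generally the safer option.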



1. "5.4 - A Matrix Formulation of the Multiple Regression Model | STAT 462." Accessed August 17, 2023.

2. Section 3.1 of Bingham, N. H., and John M. Fry. Regression: Linear Models in Statistics. Springer Undergraduate Mathematics Series. London; New York: Springer, 2010.