Deriving Multiple Linear Regression

Introduction

The goal here is to maximise the probability of seeing some observation, given a mean value and a constant variance.

Deriving Multiple Linear Regression

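Consider a multiple linear regression of the form:

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w}$$

where $\mathbf{X}$ is the matrix of inputs (one row per observation), $\mathbf{w}$ holds the weights and $\hat{\mathbf{y}}$ is the vector of predictions.

The bias for this model is included in the weights $\mathbf{w}$, paired with a column of ones in $\mathbf{X}$, as shown in Bias Terms in Multiple Linear Regression.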
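As a minimal sketch of how the bias can be folded into the weights, the snippet below prepends a column of ones to a small, made-up feature matrix so that the first weight acts as the intercept. The helper name `add_bias_column` and all numeric values are illustrative assumptions, not taken from the linked note.

```python
import numpy as np


def add_bias_column(features):
    """Prepend a column of ones so the first weight acts as the bias."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])


# Three observations of two predictors (made-up values).
raw_features = np.array([[2.0, 3.0],
                         [4.0, 5.0],
                         [6.0, 7.0]])
X = add_bias_column(raw_features)

# weights[0] is the bias; weights[1:] multiply the predictors.
weights = np.array([1.0, 0.5, -0.25])
predictions = X @ weights

# The same predictions written with an explicit bias term.
explicit_bias_predictions = raw_features @ weights[1:] + weights[0]
```

With the column of ones in place, a single matrix product gives the same predictions as the explicit "weights times inputs plus bias" form.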

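Consider the probability of seeing an observation $y_i$ under the assumption that it is normally distributed around the model's prediction $\mathbf{x}_i^\top\mathbf{w}$ with constant variance $\sigma^2$:

$$P(y_i \mid \mathbf{x}_i, \mathbf{w}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mathbf{x}_i^\top\mathbf{w})^2}{2\sigma^2}\right)$$

The goal in linear regression is to find the weights $\hat{\mathbf{w}}$ such that:

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \prod_{i=1}^{n} P(y_i \mid \mathbf{x}_i, \mathbf{w})$$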
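The link between this likelihood and a sum of squares can be sketched by taking logarithms. The block below assumes independent observations $y_i$, each Gaussian with mean $\mathbf{x}_i^\top\mathbf{w}$ and constant variance $\sigma^2$; these symbols are assumptions of this sketch rather than notation confirmed by the source.

```latex
\begin{aligned}
\log \prod_{i=1}^{n} P(y_i \mid \mathbf{x}_i, \mathbf{w})
  &= \sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left(-\frac{(y_i - \mathbf{x}_i^\top\mathbf{w})^2}{2\sigma^2}\right)\right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i^\top\mathbf{w}\right)^2
\end{aligned}
```

Only the last term depends on $\mathbf{w}$, and it enters with a negative sign, so the weights that maximise the likelihood are the ones that minimise the sum of squared residuals.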

The argument which maximises this likelihood is equivalent to the argument which minimises the residual sum of squares (RSS), see Maximum Likelihood is equal to Minimum RSS:

$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \mathrm{RSS}(\mathbf{w})$$

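where:

$$\mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{n} \left[(\mathbf{y} - \mathbf{X}\mathbf{w})^2\right]_i$$

Note that the squaring operation is element-wise, so more conventionally:

$$\mathrm{RSS}(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^\top (\mathbf{y} - \mathbf{X}\mathbf{w})$$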
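To illustrate that the element-wise and inner-product forms of the RSS agree, here is a small numerical sketch. The data are random and purely illustrative, and the variable names are assumptions of this example.

```python
import numpy as np

# Random, purely illustrative data: 20 observations, 3 weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w = rng.normal(size=3)
y = rng.normal(size=20)

residuals = y - X @ w

# Element-wise form: square each residual, then sum.
rss_elementwise = np.sum(residuals ** 2)

# Conventional form: the inner product of the residual vector with itself.
rss_inner_product = residuals @ residuals
```

Both expressions return the same scalar up to floating-point rounding; the inner-product form is the one that is convenient to differentiate.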

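Because $\mathrm{RSS}(\mathbf{w})$ is a quadratic function of $\mathbf{w}$ (and convex, since $\mathbf{X}^\top\mathbf{X}$ is positive semi-definite), its minimum lies where the derivative with respect to $\mathbf{w}$ is zero:

$$\frac{\partial\, \mathrm{RSS}(\mathbf{w})}{\partial \mathbf{w}} = \mathbf{0}$$

Recalling the Matrix Chain Rule, the derivative is:

$$\frac{\partial}{\partial \mathbf{w}} (\mathbf{y} - \mathbf{X}\mathbf{w})^\top (\mathbf{y} - \mathbf{X}\mathbf{w}) = -2\,\mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w})$$

Setting the derivative to 0:

$$-2\,\mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{0} \quad\Longrightarrow\quad \mathbf{X}^\top\mathbf{X}\,\mathbf{w} = \mathbf{X}^\top\mathbf{y}$$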
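The derivative of the RSS can be sanity-checked numerically. The sketch below compares the closed-form gradient $-2\,\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\mathbf{w})$ with a central finite-difference estimate on random data. The function names and the step size are assumptions chosen for illustration.

```python
import numpy as np


def rss(w, X, y):
    """Residual sum of squares, (y - Xw)^T (y - Xw)."""
    residuals = y - X @ w
    return residuals @ residuals


def rss_gradient(w, X, y):
    """Closed-form gradient of the RSS with respect to w."""
    return -2.0 * X.T @ (y - X @ w)


def finite_difference_gradient(f, w, step=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        offset = np.zeros_like(w)
        offset[j] = step
        grad[j] = (f(w + offset) - f(w - offset)) / (2.0 * step)
    return grad


# Random, purely illustrative data.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
w = rng.normal(size=4)

analytic = rss_gradient(w, X, y)
numeric = finite_difference_gradient(lambda v: rss(v, X, y), w)
max_abs_difference = np.max(np.abs(analytic - numeric))
```

If the two gradients agree to within rounding error, the closed-form expression is consistent with the definition of the RSS.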

Conclusion

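So, in conclusion, if $\hat{\mathbf{y}} = \mathbf{X}\mathbf{w}$, the value for $\mathbf{w}$ that minimises the $\ell_2$ norm of the residuals $\mathbf{y} - \mathbf{X}\mathbf{w}$ is [1, 2]:

$$\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$$

provided $\mathbf{X}^\top\mathbf{X}$ is invertible.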
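As a hedged illustration of the closed-form result, the sketch below fits synthetic data by solving $\mathbf{X}^\top\mathbf{X}\,\mathbf{w} = \mathbf{X}^\top\mathbf{y}$ and compares the answer with NumPy's `np.linalg.lstsq`. The data, the "true" weights and the noise level are invented for this example, and solving the linear system instead of forming the inverse explicitly is a numerical choice made here, not something the derivation requires.

```python
import numpy as np

# Synthetic data: a column of ones for the bias plus two random predictors.
rng = np.random.default_rng(42)
n_observations = 200
features = rng.normal(size=(n_observations, 2))
X = np.hstack([np.ones((n_observations, 1)), features])

# Hypothetical "true" weights (bias first) and small Gaussian noise.
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=n_observations)

# Solve the normal equations X^T X w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The RSS gradient should vanish (up to rounding) at the solution.
gradient_at_solution = -2.0 * X.T @ (y - X @ w_normal)
```

The two estimates should match closely, and both should land near the weights used to generate the data. For ill-conditioned $\mathbf{X}$, `np.linalg.lstsq` is generally the more robust choice.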

References

[1] "5.4 - A Matrix Formulation of the Multiple Regression Model | STAT 462." Accessed August 17, 2023. https://online.stat.psu.edu/stat462/node/132/.

[2] §3.1 of Bingham, N. H., and John M. Fry. Regression: Linear Models in Statistics. Springer Undergraduate Mathematics Series. London; New York: Springer, 2010.