Deriving Multiple Linear Regression

Introduction

The goal is to maximise the probability of seeing each observation $y_{i}$ given a mean value $\hat{y}_{i}$ and a constant variance $σ^{2}$.

Deriving Multiple Linear Regression

Consider a multiple linear regression of the form:

$\underset{N \times 12}{\hat{Y}} = \underset{N \times 3}{X} \; \underset{3 \times 12}{W}$

The bias for this model is included in the weights matrix, as shown in Bias Terms in Multiple Linear Regression.
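The shapes above can be checked with a small NumPy sketch. It assumes the bias is absorbed by prepending a column of ones to $X$, which is one common convention and may differ from the layout used in the linked note; the sample size `N = 5` and the random data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5  # number of observations (illustrative value)

# Assumed layout: two features plus a leading column of ones for the bias.
features = rng.normal(size=(N, 2))
X = np.column_stack([np.ones(N), features])  # shape (N, 3)

# Row 0 of W then holds the 12 bias terms, rows 1-2 the feature weights.
W = rng.normal(size=(3, 12))

Y_hat = X @ W  # shape (N, 12)

# The same prediction with the bias written out separately.
Y_hat_explicit = features @ W[1:] + W[0]

print(X.shape, W.shape, Y_hat.shape)
print(np.allclose(Y_hat, Y_hat_explicit))
```

Folding the bias into $W$ this way keeps the model a single matrix product, which is what makes the derivation that follows so compact.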

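Consider the probability of seeing an observation under the assumption that it is normally distributed about its predicted value with constant variance:

$y_{i} \sim \mathcal{N} (\hat{y}_{i}, σ^{2}) ⟹ P (y_{i}) \propto \exp \left( - \frac{(y_{i} - \hat{y}_{i})^{2}}{2 σ^{2}} \right)$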

The goal in linear regression is to find $\hat{Y} = XW$ such that:

$W = \underset{W}{\arg\max} \left[ \prod_{i = 1}^{N} P (Y_{i}) \right]$

The argument $W$ which maximises this likelihood is the same argument $W$ which minimises the residual sum of squares (RSS); see Maximum Likelihood is equal to Minimum RSS:

$W = \underset{W}{\arg\min} \left[ ε (W) \right]$

where:

$ε (W) = \sum_{i = 1}^{N} \sum_{j = 1}^{12} \left[ (XW - Y)^{\circ 2} \right]_{ij}$

Note that the squaring operation is element-wise, so in more conventional notation:

$ε (W) = \sum_{i = 1}^{N} \sum_{j = 1}^{12} \left( (XW - Y)_{ij} \right)^{2}$
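As a rough numerical illustration of the two statements above, the sketch below computes the RSS as a sum of element-wise squares and compares it with the Gaussian negative log-likelihood. It assumes independent errors across all entries and a known constant standard deviation `sigma`; the data and the candidate weight matrices are random placeholders, not values from this note.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 50, 0.7  # sigma: assumed known, constant standard deviation

X = rng.normal(size=(N, 3))
Y = X @ rng.normal(size=(3, 12)) + sigma * rng.normal(size=(N, 12))


def rss(W):
    """Sum of element-wise squared residuals over all i, j."""
    residuals = X @ W - Y
    return np.sum(residuals ** 2)


def neg_log_likelihood(W):
    """Negative log of the product of independent Gaussian densities."""
    residuals = X @ W - Y
    per_entry = 0.5 * np.log(2 * np.pi * sigma**2) + residuals**2 / (2 * sigma**2)
    return np.sum(per_entry)


candidates = [rng.normal(size=(3, 12)) for _ in range(20)]
rss_values = np.array([rss(W) for W in candidates])
nll_values = np.array([neg_log_likelihood(W) for W in candidates])

# The element-wise sum equals the squared Frobenius norm of the residuals.
W0 = candidates[0]
frobenius_sq = np.linalg.norm(X @ W0 - Y, ord="fro") ** 2

# NLL = constant + RSS / (2 sigma^2), so both criteria pick the same candidate.
constant = N * 12 * 0.5 * np.log(2 * np.pi * sigma**2)
best_by_rss = int(np.argmin(rss_values))
best_by_nll = int(np.argmin(nll_values))
print(best_by_rss == best_by_nll)
```

Because the negative log-likelihood is an increasing affine function of the RSS, maximising one is the same as minimising the other, whatever the value of `sigma`.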

Because $ε$ is a convex quadratic function of $W$, it is minimised exactly where its derivative vanishes:

$\frac{\partial ε}{\partial W} = 0 ⟺ W = \underset{W}{\arg\min} (ε)$

Recall the Matrix Chain Rule:

$\begin{aligned} \frac{\partial ε}{\partial W} &= \left( \frac{\partial \hat{Y}}{\partial W} \right)^{T} \frac{\partial ε}{\partial \hat{Y}} \\ \frac{\partial ε}{\partial \hat{Y}} &= 2 (\hat{Y} - Y) \\ \frac{\partial \hat{Y}}{\partial W} &= X \\ ⟹ \frac{\partial ε}{\partial W} &= 2 X^{T} (\hat{Y} - Y) \end{aligned}$
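The gradient $2 X^{T} (\hat{Y} - Y)$ can be sanity-checked against central finite differences. This is only a sketch on random data; the step size `h` and the array sizes are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20
X = rng.normal(size=(N, 3))
Y = rng.normal(size=(N, 12))
W = rng.normal(size=(3, 12))


def rss(W):
    """Residual sum of squares for the model Y_hat = X W."""
    return np.sum((X @ W - Y) ** 2)


# Analytic gradient from the chain rule: 2 X^T (X W - Y), shape (3, 12).
analytic = 2 * X.T @ (X @ W - Y)

# Central finite differences, perturbing one entry of W at a time.
h = 1e-6
numeric = np.zeros_like(W)
for a in range(W.shape[0]):
    for b in range(W.shape[1]):
        step = np.zeros_like(W)
        step[a, b] = h
        numeric[a, b] = (rss(W + step) - rss(W - step)) / (2 * h)

max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

The gradient has the same shape as $W$, which is a quick way to confirm that the transpose belongs on $X$.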

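Setting the derivative to 0, dividing through by 2, and assuming $X^{T} X$ is invertible:

$\begin{aligned} 0 &= X^{T} (XW - Y) \\ &= X^{T} XW - X^{T} Y \\ X^{T} XW &= X^{T} Y \\ W &= (X^{T} X)^{- 1} X^{T} Y \end{aligned}$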

Conclusion

So, in conclusion, if $\hat{Y} = XW$, the value of $W$ that minimises the residual sum of squares (the squared $\ell_{2}$ norm of the residuals) is ¹ ² :

$W = (X^{T} X)^{- 1} X^{T} Y$
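A minimal sketch of this closed form in NumPy follows, compared against `np.linalg.lstsq`. The synthetic data, the noise level, and the names `W_true`, `W_normal`, and `W_lstsq` are assumptions made for illustration. The sketch solves the linear system $X^{T} X W = X^{T} Y$ instead of forming the inverse explicitly, which gives the same result when $X^{T} X$ is invertible and is usually better behaved numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100

# Synthetic data (illustrative only): a column of ones plus two features.
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
W_true = rng.normal(size=(3, 12))
Y = X @ W_true + 0.1 * rng.normal(size=(N, 12))

# Normal equations: solve (X^T X) W = X^T Y.
W_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's least-squares routine.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the minimiser the gradient 2 X^T (X W - Y) should vanish.
gradient_at_solution = 2 * X.T @ (X @ W_normal - Y)

print(np.allclose(W_normal, W_lstsq))
print(np.max(np.abs(gradient_at_solution)))
```

When the columns of $X$ are nearly collinear, $X^{T} X$ becomes ill-conditioned, and a least-squares solver such as `np.linalg.lstsq` is generally the safer choice.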

References

"5.4 - A Matrix Formulation of the Multiple Regression Model | STAT 462." Accessed August 17, 2023. https://online.stat.psu.edu/stat462/node/132/.

§3.1 of Bingham, N. H., and John M. Fry. Regression: Linear Models in Statistics. Springer Undergraduate Mathematics Series. London; New York: Springer, 2010.

Environmental Informatics (MATH3005)
