Matrix Chain Rule
If observations are recorded along rows (row-major), the following linear regression model holds and conventional definition of the partials:
For column-major the model is:
In either case, it will be shown that the chain rule will be:
Notation
The notation, is adapted from Knuth 1 and defined as:
Solving the Product Derivatives
So we have:
Solving the Chain Rule
For the column-major example :
This of course assumes column-major where the columns of represent observations and the rows are features, in the event that these were transposed we would have
and hence:
1
Graham, Ronald L., Donald Ervin Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. 2nd ed. Reading, Mass: Addison-Wesley, 1994.