Matrix Chain Rule

If observations are recorded along rows (row-major), the following linear regression model holds and conventional definition of the partials:

For column-major the model is:

In either case, it will be shown that the chain rule will be:

Notation

The notation, is adapted from Knuth 1 and defined as:

Solving the Product Derivatives

So we have:

Solving the Chain Rule

For the column-major example :

This of course assumes column-major where the columns of represent observations and the rows are features, in the event that these were transposed we would have

and hence:

1

Graham, Ronald L., Donald Ervin Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. 2nd ed. Reading, Mass: Addison-Wesley, 1994.