# The Linear Regression Cost Function in Matrix Form

## Preliminaries

In this tutorial I will describe the implementation of the linear regression cost function in matrix form, with an example in Python using NumPy and Pandas. If you would like to jump straight to the Python code, you can find it on my GitHub page.

I will walk you through each part of the following vector product in detail to help you understand how it works:

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$$

To explain how the vectorized cost function works, let's use a simple abstract data set: a design matrix X whose first column is all ones, together with a target column vector y, each with m rows.

One more vector will be needed to help us with our calculation: a column vector of ones, o, of length m, which will let us sum the elements of the squared-error vector with a single matrix product.

## Vectorizing the Cost Function

### Calculating the Hypothesis

To produce the matrix representation of the hypothesis, the design matrix X is multiplied by the theta column vector: each row of X is multiplied against the column of theta and summed. The result is a column vector, Xθ.

### Calculating the Errors

To calculate the errors, the target vector y is subtracted element-wise from the hypothesis vector.

### Calculating the Squared Errors

To calculate the squared errors, an element-wise squaring operation is performed on the error vector.

### Calculating the Sum of the Squared Errors

To sum the elements of the squared-error vector, a 1×m row vector of ones (oᵀ) is multiplied by the m×1 squared-error vector, producing a 1×1 matrix containing the sum of all the squared errors.

### Calculating the Product of the Reciprocal 2m by the Squared Errors

Finally, to complete the cost function calculation, the sum of the squared errors is multiplied by the reciprocal of 2m.
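Before moving on, the whole chain of steps above can be sketched in a few lines of plain NumPy. This is just a compact sketch of the same calculation (the toy numbers mirror the data set used later in the tutorial, and the variable names here are my own):

```python
import numpy as np

# Design matrix with a leading column of ones, and the target vector.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 5.0]])
y = np.array([[0.0], [2.0], [4.0]])
theta = np.array([[0.0], [1.0]])

m = len(y)
errors = X @ theta - y                # hypothesis minus targets
cost = (errors.T @ errors) / (2 * m)  # equivalent to summing the squared errors
print(cost)  # [[0.33333333]]
```

Note that `errors.T @ errors` folds the element-wise squaring and the ones-vector summation into a single inner product; the result is the same 1×1 matrix.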

That's it! Now let's get our hands dirty implementing it in Python.

## Implementing the Cost Function in Python

In this section we will implement our vectorized form of the cost function with a simple (ok, contrived) dataset.

```python
# prologue
import pandas as pd
import numpy as np

X = pd.DataFrame({'x0': [1, 1, 1], 'x1': [1, 2, 5]})
```

|   | x0 | x1 |
|---|----|----|
| 0 | 1  | 1  |
| 1 | 1  | 2  |
| 2 | 1  | 5  |

```python
y = pd.DataFrame({'y': [0, 2, 4]})
```

|   | y |
|---|---|
| 0 | 0 |
| 1 | 2 |
| 2 | 4 |

### Calculating the Hypothesis

To calculate our hypothesis we will first set theta to the column vector (0, 1)ᵀ and calculate the product Xθ.

We can expect the product to result in a 3×1 matrix.

```python
theta = pd.DataFrame({'theta': [0, 1]})
```

|   | theta |
|---|-------|
| 0 | 0     |
| 1 | 1     |

```python
# NumPy's dot is used because the DataFrame .dot method requires matching column names.
xtheta = np.dot(X, theta)
# array([[1],
#        [2],
#        [5]])
```

### Calculating the Errors

```python
errors = xtheta - y
```

|   | y |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |

### Calculating the Squared Errors

```python
errorsSquared = errors**2
```

|   | y |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |

### Summing the Squared Errors

To sum the squared-error vector we multiply it by the transpose of a column vector of ones. The 1×m row of ones is multiplied element by element against the m×1 squared-error vector and the products are summed, producing a 1×1 result containing the sum of the squared errors.

```python
ones = pd.DataFrame({'ones': np.ones(3)})
```

|   | ones |
|---|------|
| 0 | 1    |
| 1 | 1    |
| 2 | 1    |

```python
sumSquaredErrors = np.dot(ones.T, errorsSquared)
# array([[ 2.]])
```
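As an aside, pandas can produce the same scalar without the ones-vector trick. This shortcut is my own addition, not part of the derivation above; the `errorsSquared` values are rebuilt here so the snippet is self-contained:

```python
import pandas as pd

# The squared-error values from the example above.
errorsSquared = pd.DataFrame({'y': [1, 0, 1]})

# Summing the values directly is equivalent to the ones-vector product.
total = errorsSquared.values.sum()
print(total)  # 2
```

The ones-vector version is still worth understanding, though, because it keeps every step expressible as a matrix product.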

### Multiplying the Sum of the Errors by the Reciprocal of 2m

Finally, the sum of the squared errors is multiplied by the reciprocal of 2m, producing the cost for the chosen theta.

```python
cost = sumSquaredErrors/(2*3)
# array([[ 0.33333333]])
```

As expected, the result is a 1×1 matrix containing one third.

### The Cost Function in Python

```python
import pandas as pd
import numpy as np


def cost(X, y, theta):
    """Calculates the squared-error cost function.

    Args:
        X (DataFrame): The design matrix
        y (DataFrame): The target vector
        theta (DataFrame): The hypothesis function's parameters

    Returns:
        np.ndarray: the calculated squared-error cost
    """
    m = len(y)
    o = pd.DataFrame(np.ones(m))
    errorSquared = (np.dot(X, theta) - y)**2
    return np.dot(o.T, errorSquared) / (2 * m)
```
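As a quick sanity check, we can call the function with the tutorial's data and confirm it reproduces the value computed step by step above (the function is repeated here so the snippet runs on its own):

```python
import pandas as pd
import numpy as np


def cost(X, y, theta):
    """Vectorized squared-error cost, as derived in this tutorial."""
    m = len(y)
    o = pd.DataFrame(np.ones(m))
    errorSquared = (np.dot(X, theta) - y)**2
    return np.dot(o.T, errorSquared) / (2 * m)


X = pd.DataFrame({'x0': [1, 1, 1], 'x1': [1, 2, 5]})
y = pd.DataFrame({'y': [0, 2, 4]})
theta = pd.DataFrame({'theta': [0, 1]})

print(cost(X, y, theta))  # [[0.33333333]]
```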
