The Linear Regression Cost Function in Matrix Form

Preliminaries

In this tutorial I will describe the implementation of the linear regression cost function in matrix form, with an example in Python with Numpy and Pandas. If you would like to jump to the python code you can find it on my github page.

$J(\theta) = \frac{1}{2m}\sum{(h_{\theta}(x^{(i)}) - y^{(i)})^2}$

I will walk you though each part of the following vector product in detail to help you understand how it works:

$J(\theta) = \frac{1}{2m}\vec{o}^{T}[(X\vec{\theta}-\vec{y})^2]$

In order to explain how the vectorized cost function works lets use a simple abstract data set described below:

$X = \begin{bmatrix} 1 & x^1_1 \\ 1 & x^2_1 \\ 1 & x^3_1 \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} y^1_1 \\ y^2_1 \\ y^3_1 \end{bmatrix}$

One more vector will be needed to help us with our calculation:

$\vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}, \quad \vec{o}^{T} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$

Vectorizing the Cost Function

Calculating the Hypothesis

To produce the matrix representation of the hypothesis function the matrix X is multiplied by the theta column vector, the operation of which consists of multiplying the rows of X by the columns of y. The result of which is a column vector.

$h_{\theta}(x) = X\vec{\theta} = \begin{bmatrix} 1 & x^1_1 \\ 1 & x^2_1 \\ 1 & x^3_1 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} \theta_0 + \theta_1x^1_1 \\ \theta_0 + \theta_1x^2_1 \\ \theta_0 + \theta_1x^3_1 \end{bmatrix}$

Calculating the Errors

To calculate the errors the elements of the target vector y is subtracted from the hypothesis vector.

$\vec{e} = X\vec{\theta} - \vec{y} = \begin{bmatrix} \theta_0 + \theta_1x^1_1 - y^1_1 \\ \theta_0 + \theta_1x^2_1 - y^2_1 \\ \theta_0 + \theta_1x^3_1 - y^3_1 \end{bmatrix}$

Calculating the Squared Errors

To calculate the square of the errors an element wise squaring operation is performed.

$\vec{e}^2 = (X\vec{\theta} - \vec{y})^2 = \begin{bmatrix} (\theta_0 + \theta_1x^1_1 - y^1_1)^2 \\ (\theta_0 + \theta_1x^2_1 - y^2_1)^2 \\ (\theta_0 + \theta_1x^3_1 - y^3_1)^2 \end{bmatrix}$

Calculating the Sum of the Squared Errors

To sum the elements of the squared error vector a 1xm row vector (o) is multiplied by the squared error vector (e) to produce a 1×1 vector containing the sum of all the squared errors.

$\sum{(h_{\theta}(x^{(i)}) - y^{(i)})^2} = \vec{o}^T \vec{e}^2$

$= \vec{o}^T(X\vec{\theta} - \vec{y})^2 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} (\theta_0 + \theta_1x^1_1 - y^1_1)^2 \\ (\theta_0 + \theta_1x^2_1 - y^2_1)^2 \\ (\theta_0 + \theta_1x^3_1 - y^3_1)^2 \end{bmatrix}$

$= \begin{bmatrix} (\theta_0 + \theta_1x^1_1 - y^1_1)^2 + (\theta_0 + \theta_1x^2_1 - y^2_1)^2 + (\theta_0 + \theta_1x^3_1 - y^3_1)^2 \end{bmatrix}$

Calculating the Product of the Reciprocal 2m by the Squared Errors

Finally to complete the cost function calculation the sum of the sqared errors is multiplied by the reciprocal of 2m.

$J(\theta) = \frac{1}{2m}\sum{(h_{\theta}(x^{(i)}) - y^{(i)})^2}$

$= \frac{1}{2m}\vec{o}^T(X\vec{\theta} - \vec{y})^2$

$= \begin{bmatrix} \frac{1}{2m}(\theta_0 + \theta_1x^1_1 - y^1_1)^2 + \frac{1}{2m}(\theta_0 + \theta_1x^2_1 - y^2_1)^2 + \frac{1}{2m}(\theta_0 + \theta_1x^3_1 - y^3_1)^2 \end{bmatrix}$

Thats it! Now lets get our hands dirty implementing it in Python.

Implementing the Cost Function in Python

In this section we will impliment our vectorized for of the cost function with a simple (ok, contrived) dataset.

#prologue
import pandas as pd
import numpy as np

X = pd.DataFrame({'x0' : [1,1,1], 'x1': [1,2,5]})
X
x0 x1
0 1 1
1 1 2
2 1 5
y = pd.DataFrame({'y' : [0,2,4]})
y
y
0 0
1 2
2 4

Calculate the Hypothesis

To calculate our hypothesis we will first set theta to: $\vec{\theta} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

and calculate: $h_{\theta}(x) = X\vec{\theta} = \begin{bmatrix} 0*1 + 1*1 \\ 0*1 + 1*2 \\ 0*1 + 1*5 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}$

We can expect the product to result in a 3×1 matrix.

#Numpy .dot is used instead because the dataframe .dot function requires matching column names.
xtheta = np.dot(X, theta)
xtheta
theta
0 0
1 1
#Numpy .dot is used instead because the dataframe .dot function requires matching column names.
xtheta = np.dot(X, theta)

# array([[1],
#  [2],
#  [5]])

Calculating the Errors

$X\vec{\theta} - \vec{y} = \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix} - \begin{bmatrix} 0 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

errors = xtheta - y
errors
y
0 1
1 0
2 1

Calculating the Squared Errors

$\vec{e} = (X\vec{\theta} - \vec{y})^2 = \begin{bmatrix} 1^2 \\ 0 \\ 1^2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

errorsSquared = errors**2
y
0 1
1 0
2 1

Summing the Errors

To sum the error vector we multiply it by a column vector of ones. The multiplication results in the row elements of ones being multiplied and summed by column vector of errors, resulting in the sum of squared errors.

$\vec{s} = \sum{(h_{\theta}(x^{(i)}) - y^{(i)})^2} = \vec{o}^T\vec{e}^2 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}= \begin{bmatrix} 1 + 0 + 1 \end{bmatrix} = \begin{bmatrix} 2\end{bmatrix}$

ones = pd.DataFrame({'ones':np.ones(3)})
ones
ones
0 1
1 1
2 1
sumSquaredErrors = np.dot(ones.T, errorsSquared)

# array([[ 2.]])

Multiplying the Sum of the Errors by the Reciprocal of 2m

Finally a reciprocal of 2m is multiplied by the sum of the squared result resulting in the cost for the chosen set of thetas.

$J(\theta) = \frac{1}{2m}\sum{(h_{\theta}(x^{(i)}) - y^{(i)})^2} = \frac{1}{2*3}\vec{s} = \frac{1}{6}\begin{bmatrix} 2\end{bmatrix} = \begin{bmatrix} \frac{1}{3}\end{bmatrix}$

error = sumSquaredErrors/(2*3)

# array([[ 0.33333333]])

As expected the result is a one dimensional matrix containing one third.

The Cost Function in Python

import pandas as pd
import numpy as np

def cost(X, y, theta):
""" Calculates the cost function.

Args:
X (DataFrame): The design matrix
y (DataFrame): The target
theta (DataFrame): The hypothesis function's independent variables

Returns:
np.Array: the calculated squared error cost
"""
m = len(y)
o = pd.DataFrame(np.ones(m))
errorSquared = (np.dot(X,theta)-y)**2
return np.dot(o.T, errorSquared)/(2*m)


2 Responses

1. This is the best explanation I’ve seen for how this is vectorized, and you make it very intuitive! I would love a similar breakdown of the vectorized gradient descent algorithm, which I still can’t wrap my head around

2. Pablo says:

Nicely explained, and I was hoping to do the gradient descent too.