
Section 8.1 Least Squares Problems

This chapter deals with linear least squares problems and their applications.

Subsection 8.1.1 Linear Least Squares Problems

Consider a system of equations \(Ax=b\) having \(m\) equations in \(n\) variables, with \(m\geq n\text{.}\) Such a system may not have a solution, so we look for the best approximate solution. If \(z\in \R^n\) is a solution of \(Ax=b\text{,}\) then \(\norm{Az-b}=0\text{.}\) In general, \(\norm{Az-b}\) measures how far \(b\) is from \(Az\text{.}\) The aim is to find the point \(z \in \R^n\) such that \(Az\) is at the smallest distance from \(b\text{.}\) Thus if such a \(z\) exists, then \(\norm{Az-b}\leq \norm{Ax-b}\) for all \(x\in \R^n\text{.}\) Does such a \(z\) always exist?
The answer to the above question is “yes”. In order to find such a \(z\text{,}\) we consider
\begin{equation*} W={\cal R}(A)=\{Ax:x\in \R^n\}\text{.} \end{equation*}
Note that \(W\) is a subspace of \(\R^m\text{.}\) We are looking for \(z\) such that \(Az\) is the point of \(W\) at the smallest distance from \(b\text{,}\) which is nothing but the orthogonal projection of \(b\) onto \(W\text{.}\) Suppose the columns of \(A\) are linearly independent. Then \(W\) is the column space of \(A\text{.}\) Hence by Eq. (6.3.2), the orthogonal projection is given by
\begin{equation*} Az = A(A^TA)^{-1}A^Tb\text{.} \end{equation*}
Here the vector \(x^*=(A^TA)^{-1}A^Tb\) is called the least squares approximation (solution) of \(Ax=b\text{.}\)
Multiplying both sides of the system \(Ax=b\) by \(A^T\text{,}\) we get
\begin{equation*} A^TAx=A^Tb \end{equation*}
which is called the normal equation. We know that the rank of \(A\) is equal to the rank of \(A^TA\text{.}\) Hence \(A^TA\) is invertible if and only if \(A\) has linearly independent columns. Moreover, \(x\) is a least squares solution of \(Ax=b\) if and only if \(x\) is a solution of the associated normal equation \(A^TAx=A^Tb\text{.}\)
Note that the least squares solution \(x^*=(A^TA)^{-1}A^Tb\) is the minimizer of the function \(f(x)=\norm{Ax-b}^2\text{.}\) This can also be obtained using calculus:
\begin{align*} f(x) \amp = \norm{Ax-b}^2 = (Ax-b)^T(Ax-b)\\ \amp = x^TA^TAx-x^TA^Tb-b^TAx+b^Tb\\ \amp = x^TA^TAx-2x^TA^Tb+b^Tb \text{.} \end{align*}
The last equality is due to the fact that \(x^TA^Tb=b^TAx\text{,}\) both sides being scalars and transposes of each other.
Hence the gradient is \(\nabla f(x)=2A^TAx-2A^Tb\text{,}\) and setting \(\nabla f(x)=0\) gives
\begin{equation} x^*=(A^TA)^{-1}A^Tb\tag{8.1.1} \end{equation}
provided \(A^TA\) is nonsingular.
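The computation in (8.1.1) is easy to carry out numerically. Below is a minimal sketch in Python/NumPy, using a small made-up overdetermined system for illustration; it solves the normal equation directly and also calls NumPy's built-in least squares routine, which avoids forming \(A^TA\) explicitly and is numerically more stable.

```python
import numpy as np

# A small made-up overdetermined system (4 equations, 2 unknowns).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0, 5.0])

# Least squares solution via the normal equation (8.1.1):
# x* = (A^T A)^{-1} A^T b, valid when A has linearly independent columns.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution via NumPy's SVD-based least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)  # the two answers agree
```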

Example 8.1.1.

Consider the inner product space \({\cal C}[-1,1]\) equipped with inner product
\begin{equation*} \inner{f}{g}=\int_{-1}^1 f(x)g(x)\,dx\text{.} \end{equation*}
Find the polynomial of degree at most 2 which is closest to the function \(f(x)=|x|\text{.}\) Here we consider the subspace \(W={\cal P}_2(\R)\text{.}\) We need to find the orthogonal projection of \(f\) onto \(W\text{.}\)
From Example 7.1.21, we know that
\begin{equation*} \left\{u_1 = \frac{1}{\sqrt{2}}, u_2 = \frac{\sqrt{6}}{2}x, u_3=\frac{3\sqrt{10}}{4}\left(x^2-\frac13\right)\right\} \end{equation*}
is an orthonormal basis of \(W\text{.}\) Hence the orthogonal projection \(p\) of \(f\) onto \(W\) is given by
\begin{equation*} p(x)=\inner{f}{u_1}u_1+\inner{f}{u_2}u_2+\inner{f}{u_3}u_3\text{.} \end{equation*}
After simplification we get \(p(x)=\frac{3}{16}(5x^2+1)\text{.}\)
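The coefficients \(\inner{f}{u_i}\) can be checked symbolically. Here is a short sketch using Python's SymPy library; it reproduces \(p(x)=\frac{3}{16}(5x^2+1)\text{.}\)

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Abs(x)

# Orthonormal basis of P_2(R) on [-1, 1] from Example 7.1.21.
u1 = 1 / sp.sqrt(2)
u2 = sp.sqrt(6) / 2 * x
u3 = 3 * sp.sqrt(10) / 4 * (x**2 - sp.Rational(1, 3))

def inner(g, h):
    # The inner product <g, h> = integral of g*h over [-1, 1].
    return sp.integrate(g * h, (x, -1, 1))

# Orthogonal projection p = <f,u1>u1 + <f,u2>u2 + <f,u3>u3.
p = sum(inner(f, u) * u for u in (u1, u2, u3))
print(sp.expand(p))  # 15*x**2/16 + 3/16, i.e. (3/16)*(5*x**2 + 1)
```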

Checkpoint 8.1.2.

Consider the inner product space \({\cal C}[-1,1]\) equipped with inner product
\begin{equation*} \inner{f}{g}=\int_{-1}^1 f(x)g(x)\,dx\text{.} \end{equation*}
Find the polynomial of degree at most 2 which is closest to the function \(f(x)=\sin x\text{.}\)

Example 8.1.3.

Consider a system of linear equations \(Ax=b\) where
\begin{equation*} A= \left(\begin{array}{rr} 1 \amp -1 \\ 1 \amp 1 \\ 2 \amp 1 \\ -2 \amp 3 \end{array} \right),\qquad b = \begin{pmatrix}-1\\3\\5\\1 \end{pmatrix} \end{equation*}
(a) Find the least square solution of \(Ax=b\text{.}\)
(b) Find the orthogonal projection of \(b\) onto the column space \(W={col}(A)\text{.}\)
(c) Write the \(b\) as \(b=w+u\) where \(w\in W\) and \(u\in W^\perp\text{.}\)
Solution: It is easy to see that the columns of \(A\) are linearly independent, hence the least squares solution exists. We have
\begin{equation*} A^TA = \left(\begin{array}{rr} 10 \amp -4 \\ -4 \amp 12 \end{array} \right),\text{ and } A^Tb=\left(\begin{array}{r}10\\12 \end{array} \right)\text{.} \end{equation*}
Hence the least squares solution of \(Ax=b\) is the solution of the normal equation \(A^TAx=A^Tb\text{,}\) which is \(x^*=\left(\begin{array}{r}21/13\\20/13 \end{array} \right)\text{.}\) The same can be obtained as \(x^* =(A^TA)^{-1}A^Tb\text{.}\)
(b) The orthogonal projection of \(b\) onto \(W\) is given by
\begin{equation*} w = A(A^TA)^{-1}A^Tb = \left(\begin{array}{rr} 1 \amp -1 \\ 1 \amp 1 \\ 2 \amp 1 \\ -2 \amp 3 \end{array} \right)\left(\begin{array}{r}21/13\\20/13 \end{array} \right)=\left(\begin{array}{r} \frac{1}{13} \\ \frac{41}{13} \\ \frac{62}{13} \\ \frac{18}{13} \end{array} \right)\text{.} \end{equation*}
(c) Since we have found \(w\text{,}\) \(u\) is given by \(b-w=\left(\begin{array}{r} -\frac{14}{13} \\ -\frac{2}{13} \\ \frac{3}{13} \\ -\frac{5}{13} \end{array} \right)\text{.}\) Hence
\begin{equation*} b = \begin{pmatrix}-1\\3\\5\\1 \end{pmatrix} =w+u=\left(\begin{array}{r} \frac{1}{13} \\ \frac{41}{13} \\ \frac{62}{13} \\ \frac{18}{13} \end{array} \right)+\left(\begin{array}{r} -\frac{14}{13} \\ -\frac{2}{13} \\ \frac{3}{13} \\ -\frac{5}{13} \end{array} \right)\text{.} \end{equation*}
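A quick numerical check of parts (a)–(c), as a minimal sketch in NumPy:

```python
import numpy as np

A = np.array([[1, -1],
              [1,  1],
              [2,  1],
              [-2, 3]], dtype=float)
b = np.array([-1, 3, 5, 1], dtype=float)

# (a) Least squares solution from the normal equation.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star * 13)        # [21. 20.], i.e. x* = (21/13, 20/13)

# (b) Orthogonal projection of b onto col(A).
w = A @ x_star
print(w * 13)             # [ 1. 41. 62. 18.]

# (c) u = b - w lies in the orthogonal complement of col(A).
u = b - w
print(np.round(A.T @ u, 12))  # [0. 0.]: u is orthogonal to every column of A
```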

Exercises

1.
Find the best approximation (least squares solution) of the system of linear equations
\begin{equation*} 3x-y = 4, x+2y=0, 2x+y = 1 \end{equation*}
2.
The average number of goals \(g\) per game scored by a football player is related linearly to two factors: (i) the number \(x_1\) of years of experience and (ii) the number \(x_2\) of goals in the preceding 10 games. The following data were collected on four players. Find the linear function \(g=a_0+a_1x_1+a_2x_2\text{.}\)
goals \((g)\) 0.8 0.7 0.6 0.5
\(x_1\) 10 8 6 3
\(x_2\) 4 4 3 2

Example 8.1.4.

The average annual temperature of Santacruz in Mumbai recorded from 1990 to 2021 is given in the following table.
Year 1990 1991 1992 1993 1994 1995 1996 1997
Temp 27.07 26.93 27.11 27.18 26.94 27.25 27.64 27.66
Year 1998 1999 2000 2001 2002 2003 2004 2005
Temp 27.75 27.65 27.61 27.26 27.82 27.46 27.00 27.36
Year 2006 2007 2008 2009 2010 2011 2012 2013
Temp 27.36 28.02 27.75 28.33 28.16 27.94 27.61 27.63
Year 2014 2015 2016 2017 2018 2019 2020 2021
Temp 28.18 28.67 28.24 28.55 28.76 28.27 28.40 28.48
Find the equation of the line that best fits these data points.
The temperature data is plotted in Figure 8.1.5.
Figure 8.1.5. Plot of the annual temperature data.
We wish to find the best fit line to the given set of data. Suppose the line is given by \(y=c+mx\text{;}\) then we wish to find \(c\) and \(m\) such that this line is the best fit. Now, what is the meaning of "best fit"? Consider the point \((x_i,y_i)\text{.}\) If it lies on the line, then \(y_i=c+mx_i\text{;}\) otherwise \(|(c+mx_i)-y_i|\) is the error at that point. We need to take all the points into account, which is achieved by minimizing the sum of the squared errors, given by
\begin{equation*} \sum_{i=1}^n [(c+mx_i)-y_i]^2 \end{equation*}
where \(n\) is the number of points. Note that the sum of squared errors can be written as
\begin{equation*} \sum_{i=1}^n [(c+mx_i)-y_i]^2 = \norm{\begin{bmatrix}1 \amp x_1 \\1 \amp x_2 \\\vdots \amp \vdots \\ 1 \amp x_n \end{bmatrix} \begin{bmatrix}c\\m \end{bmatrix} -\begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix} }^2=\norm{Ax-b}^2\text{.} \end{equation*}
Here
\begin{equation*} A = \begin{bmatrix}1 \amp x_1 \\1 \amp x_2 \\\vdots \amp \vdots \\ 1 \amp x_n \end{bmatrix},\quad x= \begin{bmatrix}c\\m \end{bmatrix},\quad b = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix}\text{.} \end{equation*}
Thus finding \(m\) and \(c\) amounts to finding the least squares solution of \(Ax=b\text{,}\) which is given by
\begin{equation*} x^*=\begin{bmatrix}c\\m \end{bmatrix} =(A^TA)^{-1}A^Tb \end{equation*}
For the given problem, we have
\begin{equation*} A = \begin{bmatrix}1 \amp 1990 \\1 \amp 1991 \\\vdots \amp \vdots \\ 1 \amp 2021 \end{bmatrix} , b = \begin{bmatrix}27.07\\26.93\\\vdots\\28.48 \end{bmatrix} \end{equation*}
We have
\begin{equation*} A^TA = \left(\begin{array}{rr} 32 \amp 64176 \\ 64176 \amp 128707696 \end{array} \right),\quad A^Tb =\left(\begin{array}{r} 888.05 \\ 1781114.56 \end{array} \right) \end{equation*}
\begin{equation*} (A^TA)^{-1}=\left(\begin{array}{rr} \frac{8044231}{5456} \amp -\frac{4011}{5456} \\ -\frac{4011}{5456} \amp \frac{1}{2728} \end{array} \right)\text{.} \end{equation*}
Hence
\begin{equation*} x^*=\begin{bmatrix}c\\m \end{bmatrix} =(A^TA)^{-1}A^Tb = \left(\begin{array}{r} -68.0279710416216 \\ 0.0477584310854127 \end{array} \right) \end{equation*}
The set of points along with the best fit line is shown in Figure 8.1.6.
Figure 8.1.6. The data points with the best fit line.
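The same coefficients can be computed numerically. Below is a minimal sketch in NumPy using the 32 data points from the table above.

```python
import numpy as np

years = np.arange(1990, 2022)
temps = np.array([27.07, 26.93, 27.11, 27.18, 26.94, 27.25, 27.64, 27.66,
                  27.75, 27.65, 27.61, 27.26, 27.82, 27.46, 27.00, 27.36,
                  27.36, 28.02, 27.75, 28.33, 28.16, 27.94, 27.61, 27.63,
                  28.18, 28.67, 28.24, 28.55, 28.76, 28.27, 28.40, 28.48])

# Design matrix: a column of ones (for c) and the years (for m).
A = np.column_stack([np.ones(len(years)), years])
c, m = np.linalg.lstsq(A, temps, rcond=None)[0]
print(c, m)  # approximately -68.02797 and 0.0477584, matching x* above
```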

Subsection 8.1.2 Fitting polynomials to a data set

Suppose we are given a set of \(n\) data points \((x_i, y_i)\) and we wish to find the best fit polynomial curve of degree \(p\text{,}\) say, \(y=a_0+a_1x+a_2x^2+\cdots +a_px^p\text{,}\) with \(a_p\neq 0\text{.}\) In this case, the error term for \((x_i,y_i)\) from the curve is \(|(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)-y_i|\text{.}\) Thus the sum of the squared errors is
\begin{equation*} \sum_{i=1}^n [(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)-y_i]^2\text{.} \end{equation*}
This sum of squared errors can be written as
\begin{equation*} \sum_{i=1}^n [y_i-(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)]^2=\norm{Ax-b}^2\text{,} \end{equation*}
where
\begin{equation*} A = \begin{bmatrix}1 \amp x_1 \amp x_1^2 \amp \cdots \amp x_1^p\\ 1 \amp x_2 \amp x_2^2 \amp \cdots \amp x_2^p\\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots\\ 1 \amp x_n \amp x_n^2 \amp \cdots \amp x_n^p \end{bmatrix},\quad b = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix},\quad x = \begin{bmatrix}a_0\\a_1\\a_2\\\vdots\\a_p \end{bmatrix}\text{.} \end{equation*}
Thus the least squares solution is given by \(x^* = (A^TA)^{-1}A^Tb\text{,}\) provided the columns of \(A\) are linearly independent.
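As a sketch, the whole procedure fits in a few lines of NumPy: build the design matrix \(A\) above (a Vandermonde-type matrix) and solve the least squares problem. The helper name polyfit_lstsq is ours, for illustration.

```python
import numpy as np

def polyfit_lstsq(xs, ys, p):
    """Least squares polynomial fit of degree p; returns [a_0, a_1, ..., a_p]."""
    # Column j of the design matrix holds xs**j, for j = 0, ..., p.
    A = np.vander(xs, N=p + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coeffs
```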

Example 8.1.7.

Find the best fit cubic \(y=a_0+a_1x+a_2x^2+a_3x^3\) to the following set of points.
\(x_i\) -3.0 -2.8 -2.5 -2.2 -2.0 -1.8 -1.5 -1.2 -1.0 -0.75
\(y_i\) 1.1 4.0 7.3 7.1 8.2 7.8 9.9 7.1 8.8 6.2
\(x_i\) -0.50 -0.25 0.00 0.25 0.50 0.75 1.0 1.2 1.5 1.8
\(y_i\) 7.0 3.7 4.7 3.4 5.6 5.8 5.3 6.6 10.0 12.0
Thus we need to find the least square solution of \(Ax=b\text{,}\) where
\begin{equation*} A = \left(\begin{array}{rrrr} 1.00 \amp -3.00 \amp 9.00 \amp -27.000\\ 1.00 \amp -2.80 \amp 7.84 \amp -21.952\\ 1.00 \amp -2.50 \amp 6.25 \amp -15.625\\ \vdots \amp \vdots\amp \vdots \amp \vdots \\ 1.00 \amp 1.80 \amp 3.24 \amp 5.832 \end{array} \right),\quad b = \left(\begin{array}{r} 1.1\\ 4.0 \\ 7.3\\ \vdots\\ 12.0 \end{array} \right) \end{equation*}
We have
\begin{equation*} A^TA = \left(\begin{array}{rrrr} 20.0 \amp -12.5\amp 49.54 \amp -83.225\\ -12.5\amp 49.54\amp -83.225 \amp 258.986725 \\ 49.54 \amp -83.225\amp 258.986725 \amp -596.29625 \\ -83.225 \amp 258.986725\amp -596.29625 \amp 1731.57619431250 \end{array} \right) \end{equation*}
\begin{equation*} A^Tb = \left(\begin{array}{r} 131.6 \\ -62.235 \\ 307.14775 \\ -352.6518375 \end{array} \right) \end{equation*}
Hence the least squares solution is
\begin{equation*} x^* = (A^TA)^{-1}A^Tb=\begin{bmatrix}a_0 \\a_1\\a_2\\a_3 \end{bmatrix} = \begin{bmatrix}4.82641673699866 \\ -1.84149774223300 \\ 1.78790889783526 \\ 0.919436026634361 \end{bmatrix} \end{equation*}
See Figure 8.1.8 for the fitted curve along with the data.
Figure 8.1.8. The data points with the fitted cubic.
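A minimal NumPy check of this cubic fit, with the twenty points typed in from the table (this mirrors the polyfit_lstsq sketch above):

```python
import numpy as np

xs = np.array([-3.0, -2.8, -2.5, -2.2, -2.0, -1.8, -1.5, -1.2, -1.0, -0.75,
               -0.50, -0.25, 0.00, 0.25, 0.50, 0.75, 1.0, 1.2, 1.5, 1.8])
ys = np.array([1.1, 4.0, 7.3, 7.1, 8.2, 7.8, 9.9, 7.1, 8.8, 6.2,
               7.0, 3.7, 4.7, 3.4, 5.6, 5.8, 5.3, 6.6, 10.0, 12.0])

# Cubic fit: design matrix with columns 1, x, x^2, x^3.
A = np.vander(xs, N=4, increasing=True)
a, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(a)  # approximately [4.8264, -1.8415, 1.7879, 0.9194]
```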

Exercises

1.
Find the least squares approximating line \(y = a_0 +a_1x\) for each of the following sets of data points.
(i) \((-2, 3), (-1, 1), (0, 0), (1, -2), (2, -4)\)
(ii) \((-1, -1), (0, 1), (1, 2), (2, 4), (3, 6)\)
2.
Find the least squares approximating quadratic \(y=a_0+a_1x+a_2x^2\) for each of the following sets of data points.
(i) \((-2, 1), (0, 0), (3, 2), (4, 3)\)
(ii) The table gives the worldwide cumulative HIV infections in millions.
Year 1980 1982 1985 1987 1990 1992 1995 1997 2000 2002 2005 2008
Infections (millions) 0.1 0.7 2.4 4.5 10 16.1 29.8 40.9 57.9 67.9 82.7 100.2

Subsection 8.1.3 Weighted Least Squares Problems