
Section 8.1 Least Squares Problems

This chapter deals with linear least squares problems and their applications.

Subsection 8.1.1 Linear Least Squares Problems

Consider a system of equations \(Ax=b\) having \(m\) equations in \(n\) variables, with \(m\geq n\text{.}\) Such a system may not have a solution, so we look for the best approximate solution. If \(z\in \R^n\) is a solution of \(Ax=b\text{,}\) then \(\norm{Az-b}=0\text{.}\) In general, \(\norm{Az-b}\) measures how far \(b\) is from \(Az\text{.}\) The aim is to find the point \(z \in \R^n\) such that \(Az\) is at the smallest distance from \(b\text{.}\) Thus if such a \(z\) exists, then \(\norm{Az-b}\leq \norm{Ax-b}\) for all \(x\in \R^n\text{.}\) Does such a \(z\) always exist?
The answer to the above question is “yes”. In order to find such a \(z\text{,}\) we consider
\begin{equation*} W={\cal R}(A)=\{Ax:x\in \R^n\}\text{.} \end{equation*}
Note that \(W\) is a subspace of \(\R^m\text{.}\) We are looking for \(z\) such that \(Az\) is the point of \(W\) at the smallest distance from \(b\text{,}\) which is nothing but the orthogonal projection of \(b\) onto \(W\text{.}\) Suppose the columns of \(A\) are linearly independent. Then \(W\) is the column space of \(A\text{.}\) Hence by Eq. (6.3.2), the orthogonal projection is given by
\begin{equation*} Az = A(A^TA)^{-1}A^Tb\text{.} \end{equation*}
Here the vector \(x^*=(A^TA)^{-1}A^Tb\) is called the least squares approximation (solution) of \(Ax=b\text{.}\)
Multiplying both sides of the system \(Ax=b\) by \(A^T\text{,}\) we get
\begin{equation*} A^TAx=A^Tb \end{equation*}
which is called the normal equation. We know that the rank of \(A\) is equal to the rank of \(A^TA\text{.}\) Hence \(A^TA\) is invertible if and only if \(A\) has linearly independent columns. Moreover, \(x\) is a least squares solution of \(Ax=b\) if and only if \(x\) is a solution of the associated normal equation \(A^TAx=A^Tb\text{.}\)
Note that the least squares solution \(x^*=(A^TA)^{-1}A^Tb\) is the minimizer of the function \(f(x)=\norm{Ax-b}^2\text{.}\) This can also be obtained using calculus:
\begin{align*} f(x) \amp = \norm{Ax-b}^2 = (Ax-b)^T(Ax-b)\\ \amp = x^TA^TAx-x^TA^Tb-b^TAx+b^Tb\\ \amp = x^TA^TAx-2x^TA^Tb+b^Tb \text{.} \end{align*}
The last equality is due to the fact that \(x^TA^Tb=b^TAx\text{,}\) both sides being scalars and transposes of each other.
Hence the gradient is \(\nabla f(x)=2A^TAx-2A^Tb\text{,}\) and setting \(\nabla f(x)=0\) gives
\begin{equation} x^*=(A^TA)^{-1}A^Tb\tag{8.1.1} \end{equation}
provided \(A^TA\) is nonsingular.
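The computation in (8.1.1) is easy to carry out numerically. Below is a minimal sketch in Python/NumPy, using a small made-up overdetermined system for illustration; it solves the normal equation directly and also calls NumPy's built-in least squares routine, which avoids forming \(A^TA\) explicitly and is numerically more stable.

```python
import numpy as np

# A small made-up overdetermined system (4 equations, 2 unknowns).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0, 5.0])

# Least squares solution via the normal equation (8.1.1):
# x* = (A^T A)^{-1} A^T b, valid when A has linearly independent columns.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution via NumPy's SVD-based least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)  # the two answers agree
```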

Example 8.1.1.

Consider the inner product space \({\cal C}[-1,1]\) equipped with inner product
\begin{equation*} \inner{f}{g}=\int_{-1}^1 f(x)g(x)\,dx\text{.} \end{equation*}
Find the polynomial of degree at most 2 which is closest to the function \(f(x)=|x|\text{.}\) Here we consider the subspace \(W={\cal P}_2(\R)\text{.}\) We need to find the orthogonal projection of \(f\) onto \(W\text{.}\)
From Example 7.1.21, we know that
\begin{equation*} \left\{u_1 = \frac{1}{\sqrt{2}}, u_2 = \frac{\sqrt{6}}{2}x, u_3=\frac{3\sqrt{10}}{4}\left(x^2-\frac13\right)\right\} \end{equation*}
is an orthonormal basis of \(W\text{.}\) Hence the orthogonal projection \(p\) of \(f\) onto \(W\) is given by
\begin{equation*} p(x)=\inner{f}{u_1}u_1+\inner{f}{u_2}u_2+\inner{f}{u_3}u_3\text{.} \end{equation*}
After simplification we get \(p(x)=\frac{3}{16}(5x^2+1)\text{.}\)
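The coefficients \(\inner{f}{u_i}\) can be checked symbolically. Here is a short sketch using Python's SymPy library; it reproduces \(p(x)=\frac{3}{16}(5x^2+1)\text{.}\)

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Abs(x)

# Orthonormal basis of P_2(R) on [-1, 1] from Example 7.1.21.
u1 = 1 / sp.sqrt(2)
u2 = sp.sqrt(6) / 2 * x
u3 = 3 * sp.sqrt(10) / 4 * (x**2 - sp.Rational(1, 3))

def inner(g, h):
    # The inner product <g, h> = integral of g*h over [-1, 1].
    return sp.integrate(g * h, (x, -1, 1))

# Orthogonal projection p = <f,u1>u1 + <f,u2>u2 + <f,u3>u3.
p = sum(inner(f, u) * u for u in (u1, u2, u3))
print(sp.expand(p))  # 15*x**2/16 + 3/16, i.e. (3/16)*(5*x**2 + 1)
```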

Checkpoint 8.1.2.

Consider the inner product space \({\cal C}[-1,1]\) equipped with inner product
\begin{equation*} \inner{f}{g}=\int_{-1}^1 f(x)g(x)\,dx\text{.} \end{equation*}
Find the polynomial of degree at most 2 which is closest to the function \(f(x)=\sin x\text{.}\)

Example 8.1.3.

Consider a system of linear equations \(Ax=b\) where
\begin{equation*} A= \left(\begin{array}{rr} 1 \amp -1 \\ 1 \amp 1 \\ 2 \amp 1 \\ -2 \amp 3 \end{array} \right),\qquad b = \begin{pmatrix}-1\\3\\5\\1 \end{pmatrix} \end{equation*}
(a) Find the least square solution of \(Ax=b\text{.}\)
(b) Find the orthogonal projection of \(b\) onto the column space \(W={col}(A)\text{.}\)
(c) Write the \(b\) as \(b=w+u\) where \(w\in W\) and \(u\in W^\perp\text{.}\)
Solution: It is easy to see that the columns of \(A\) are linearly independent, hence the least squares solution exists. We have
\begin{equation*} A^TA = \left(\begin{array}{rr} 10 \amp -4 \\ -4 \amp 12 \end{array} \right),\text{ and } A^Tb=\left(\begin{array}{r}10\\12 \end{array} \right)\text{.} \end{equation*}
Hence the least squares solution of \(Ax=b\) is the solution of the normal equation \(A^TAx=A^Tb\text{,}\) which is \(x^*=\left(\begin{array}{r}21/13\\20/13 \end{array} \right)\text{.}\) The same can be obtained as \(x^* =(A^TA)^{-1}A^Tb\text{.}\)
(b) The orthogonal projection of \(b\) onto \(W\) is given by
\begin{equation*} w = A(A^TA)^{-1}A^Tb = \left(\begin{array}{rr} 1 \amp -1 \\ 1 \amp 1 \\ 2 \amp 1 \\ -2 \amp 3 \end{array} \right)\left(\begin{array}{r}21/13\\20/13 \end{array} \right)=\left(\begin{array}{r} \frac{1}{13} \\ \frac{41}{13} \\ \frac{62}{13} \\ \frac{18}{13} \end{array} \right)\text{.} \end{equation*}
(c) Since we have found \(w\text{,}\) \(u\) is given by \(b-w=\left(\begin{array}{r} -\frac{14}{13} \\ -\frac{2}{13} \\ \frac{3}{13} \\ -\frac{5}{13} \end{array} \right)\text{.}\) Hence
\begin{equation*} b = \begin{pmatrix}-1\\3\\5\\1 \end{pmatrix} =w+u=\left(\begin{array}{r} \frac{1}{13} \\ \frac{41}{13} \\ \frac{62}{13} \\ \frac{18}{13} \end{array} \right)+\left(\begin{array}{r} -\frac{14}{13} \\ -\frac{2}{13} \\ \frac{3}{13} \\ -\frac{5}{13} \end{array} \right)\text{.} \end{equation*}
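A quick numerical check of parts (a)–(c), as a minimal sketch in NumPy:

```python
import numpy as np

A = np.array([[1, -1],
              [1,  1],
              [2,  1],
              [-2, 3]], dtype=float)
b = np.array([-1, 3, 5, 1], dtype=float)

# (a) Least squares solution from the normal equation.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star * 13)        # [21. 20.], i.e. x* = (21/13, 20/13)

# (b) Orthogonal projection of b onto col(A).
w = A @ x_star
print(w * 13)             # [ 1. 41. 62. 18.]

# (c) u = b - w lies in the orthogonal complement of col(A).
u = b - w
print(np.round(A.T @ u, 12))  # [0. 0.]: u is orthogonal to every column of A
```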

Exercises

1.
Find the best approximation (least squares solution) of the system of linear equations
\begin{equation*} 3x-y = 4, x+2y=0, 2x+y = 1 \end{equation*}
2.
The average number of goals \(g\) per game scored by a football player is related linearly to two factors: (i) the number \(x_1\) of years of experience and (ii) the number \(x_2\) of goals in the preceding 10 games. The following data were collected on four players. Find the linear function \(g=a_0+a_1x_1+a_2x_2\text{.}\)
goals \((g)\) 0.8 0.7 0.6 0.5
\(x_1\) 10 8 6 3
\(x_2\) 4 4 3 2

Example 8.1.4.

The average annual temperature of Santacruz in Mumbai recorded from 1990 to 2021 is given in the following table.
Year 1990 1991 1992 1993 1994 1995 1996 1997
Temp 27.07 26.93 27.11 27.18 26.94 27.25 27.64 27.66
Year 1998 1999 2000 2001 2002 2003 2004 2005
Temp 27.75 27.65 27.61 27.26 27.82 27.46 27.00 27.36
Year 2006 2007 2008 2009 2010 2011 2012 2013
Temp 27.36 28.02 27.75 28.33 28.16 27.94 27.61 27.63
Year 2014 2015 2016 2017 2018 2019 2020 2021
Temp 28.18 28.67 28.24 28.55 28.76 28.27 28.40 28.48
Find the equation of the line that best fits these data points.
The temperature data is plotted in Figure 8.1.5.
Figure 8.1.5. Plot of the annual temperature data.
We wish to find the best fit line to the given set of data. Suppose the line is given by \(y=c+mx\text{;}\) then we wish to find \(c\) and \(m\) such that this line is the best fit. Now, what is the meaning of "best fit"? Consider the point \((x_i,y_i)\text{.}\) If it lies on the line, then \(y_i=c+mx_i\text{;}\) otherwise \(|(c+mx_i)-y_i|\) is the error at that point. We need to take all the points into account, which is achieved by minimizing the sum of the squared errors, given by
\begin{equation*} \sum_{i=1}^n [(c+mx_i)-y_i]^2 \end{equation*}
where \(n\) is the number of points. Note that the sum of squared errors can be written as
\begin{equation*} \sum_{i=1}^n [(c+mx_i)-y_i]^2 = \norm{\begin{bmatrix}1 \amp x_1 \\1 \amp x_2 \\\vdots \amp \vdots \\ 1 \amp x_n \end{bmatrix} \begin{bmatrix}c\\m \end{bmatrix} -\begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix} }^2=\norm{Ax-b}^2\text{.} \end{equation*}
Here
\begin{equation*} A = \begin{bmatrix}1 \amp x_1 \\1 \amp x_2 \\\vdots \amp \vdots \\ 1 \amp x_n \end{bmatrix},\quad x= \begin{bmatrix}c\\m \end{bmatrix},\quad b = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix}\text{.} \end{equation*}
Thus finding \(m\) and \(c\) amounts to finding the least squares solution of \(Ax=b\text{,}\) which is given by
\begin{equation*} x^*=\begin{bmatrix}c\\m \end{bmatrix} =(A^TA)^{-1}A^Tb \end{equation*}
For the given problem, we have
\begin{equation*} A = \begin{bmatrix}1 \amp 1990 \\1 \amp 1991 \\\vdots \amp \vdots \\ 1 \amp 2021 \end{bmatrix} , b = \begin{bmatrix}27.07\\26.93\\\vdots\\28.48 \end{bmatrix} \end{equation*}
We have
\begin{equation*} A^TA = \left(\begin{array}{rr} 32 \amp 64176 \\ 64176 \amp 128707696 \end{array} \right),\quad A^Tb =\left(\begin{array}{r} 888.05 \\ 1781114.56 \end{array} \right) \end{equation*}
\begin{equation*} (A^TA)^{-1}=\left(\begin{array}{rr} \frac{8044231}{5456} \amp -\frac{4011}{5456} \\ -\frac{4011}{5456} \amp \frac{1}{2728} \end{array} \right)\text{.} \end{equation*}
Hence
\begin{equation*} x^*=\begin{bmatrix}c\\m \end{bmatrix} =(A^TA)^{-1}A^Tb = \left(\begin{array}{r} -68.0279710416216 \\ 0.0477584310854127 \end{array} \right) \end{equation*}
The set of points along with the best fit line is shown in Figure 8.1.6.
Figure 8.1.6. The data points with the best fit line.
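The same coefficients can be computed numerically. Below is a minimal sketch in NumPy using the 32 data points from the table above.

```python
import numpy as np

years = np.arange(1990, 2022)
temps = np.array([27.07, 26.93, 27.11, 27.18, 26.94, 27.25, 27.64, 27.66,
                  27.75, 27.65, 27.61, 27.26, 27.82, 27.46, 27.00, 27.36,
                  27.36, 28.02, 27.75, 28.33, 28.16, 27.94, 27.61, 27.63,
                  28.18, 28.67, 28.24, 28.55, 28.76, 28.27, 28.40, 28.48])

# Design matrix: a column of ones (for c) and the years (for m).
A = np.column_stack([np.ones(len(years)), years])
c, m = np.linalg.lstsq(A, temps, rcond=None)[0]
print(c, m)  # approximately -68.02797 and 0.0477584, matching x* above
```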

Subsection 8.1.2 Fitting polynomials to a data set

Suppose we are given a set of \(n\) data points \((x_i, y_i)\) and we wish to find the best fit polynomial curve of degree \(p\text{,}\) say, \(y=a_0+a_1x+a_2x^2+\cdots +a_px^p\text{,}\) with \(a_p\neq 0\text{.}\) In this case, the error term for \((x_i,y_i)\) from the curve is \(|(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)-y_i|\text{.}\) Thus the sum of the squared errors is
\begin{equation*} \sum_{i=1}^n [(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)-y_i]^2\text{.} \end{equation*}
This sum of squared errors can be written as
\begin{equation*} \sum_{i=1}^n [y_i-(a_0+a_1x_i+a_2x_i^2+\cdots +a_px_i^p)]^2=\norm{Ax-b}^2\text{,} \end{equation*}
where
\begin{equation*} A = \begin{bmatrix}1 \amp x_1 \amp x_1^2 \amp \cdots \amp x_1^p\\ 1 \amp x_2 \amp x_2^2 \amp \cdots \amp x_2^p\\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots\\ 1 \amp x_n \amp x_n^2 \amp \cdots \amp x_n^p \end{bmatrix},\quad b = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix},\quad x = \begin{bmatrix}a_0\\a_1\\a_2\\\vdots\\a_p \end{bmatrix}\text{.} \end{equation*}
Thus the least squares solution is given by \(x^* = (A^TA)^{-1}A^Tb\text{,}\) provided the columns of \(A\) are linearly independent.
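As a sketch, the whole procedure fits in a few lines of NumPy: build the design matrix \(A\) above (a Vandermonde-type matrix) and solve the least squares problem. The helper name polyfit_lstsq is ours, for illustration.

```python
import numpy as np

def polyfit_lstsq(xs, ys, p):
    """Least squares polynomial fit of degree p; returns [a_0, a_1, ..., a_p]."""
    # Column j of the design matrix holds xs**j, for j = 0, ..., p.
    A = np.vander(xs, N=p + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coeffs
```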

Example 8.1.7.

Find the best fit cubic \(y=a_0+a_1x+a_2x^2+a_3x^3\) to the following set of points.
\(x_i\) -3.0 -2.8 -2.5 -2.2 -2.0 -1.8 -1.5 -1.2 -1.0 -0.75
\(y_i\) 1.1 4.0 7.3 7.1 8.2 7.8 9.9 7.1 8.8 6.2
\(x_i\) -0.50 -0.25 0.00 0.25 0.50 0.75 1.0 1.2 1.5 1.8
\(y_i\) 7.0 3.7 4.7 3.4 5.6 5.8 5.3 6.6 10.0 12.0
Thus we need to find the least square solution of \(Ax=b\text{,}\) where
\begin{equation*} A = \left(\begin{array}{rrrr} 1.00 \amp -3.00 \amp 9.00 \amp -27.000\\ 1.00 \amp -2.80 \amp 7.84 \amp -21.952\\ 1.00 \amp -2.50 \amp 6.25 \amp -15.625\\ \vdots \amp \vdots\amp \vdots \amp \vdots \\ 1.00 \amp 1.80 \amp 3.24 \amp 5.832 \end{array} \right),\quad b = \left(\begin{array}{r} 1.1\\ 4.0 \\ 7.3\\ \vdots\\ 12.0 \end{array} \right) \end{equation*}
We have
\begin{equation*} A^TA = \left(\begin{array}{rrrr} 20.0 \amp -12.5\amp 49.54 \amp -83.225\\ -12.5\amp 49.54\amp -83.225 \amp 258.986725 \\ 49.54 \amp -83.225\amp 258.986725 \amp -596.29625 \\ -83.225 \amp 258.986725\amp -596.29625 \amp 1731.57619431250 \end{array} \right) \end{equation*}
\begin{equation*} A^Tb = \left(\begin{array}{r} 131.6 \\ -62.235 \\ 307.14775 \\ -352.6518375 \end{array} \right) \end{equation*}
Hence the least squares solution is
\begin{equation*} x^* = (A^TA)^{-1}A^Tb=\begin{bmatrix}a_0 \\a_1\\a_2\\a_3 \end{bmatrix} = \begin{bmatrix}4.82641673699866 \\ -1.84149774223300 \\ 1.78790889783526 \\ 0.919436026634361 \end{bmatrix} \end{equation*}
See Figure 8.1.8 for the fitted curve along with the data.
Figure 8.1.8. The data points with the fitted cubic.
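A minimal NumPy check of this cubic fit, with the twenty points typed in from the table (this mirrors the polyfit_lstsq sketch above):

```python
import numpy as np

xs = np.array([-3.0, -2.8, -2.5, -2.2, -2.0, -1.8, -1.5, -1.2, -1.0, -0.75,
               -0.50, -0.25, 0.00, 0.25, 0.50, 0.75, 1.0, 1.2, 1.5, 1.8])
ys = np.array([1.1, 4.0, 7.3, 7.1, 8.2, 7.8, 9.9, 7.1, 8.8, 6.2,
               7.0, 3.7, 4.7, 3.4, 5.6, 5.8, 5.3, 6.6, 10.0, 12.0])

# Cubic fit: design matrix with columns 1, x, x^2, x^3.
A = np.vander(xs, N=4, increasing=True)
a, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(a)  # approximately [4.8264, -1.8415, 1.7879, 0.9194]
```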

Exercises

1.
Find the least squares approximating line \(y = a_0 +a_1x\) for each of the following sets of data points.
(i) \((-2, 3), (-1, 1), (0, 0), (1, -2), (2, -4)\)
(ii) \((-1, -1), (0, 1), (1, 2), (2, 4), (3, 6)\)
2.
Find the least squares approximating quadratic \(y=a_0+a_1x+a_2x^2\) for each of the following sets of data points.
(i) \((-2, 1), (0, 0), (3, 2), (4, 3)\)
(ii) The table gives the worldwide cumulative HIV infections in millions.
Year 1980 1982 1985 1987 1990 1992 1995 1997 2000 2002 2005 2008
Infections (millions) 0.1 0.7 2.4 4.5 10 16.1 29.8 40.9 57.9 67.9 82.7 100.2

Subsection 8.1.3 Weighted Least Squares Problems