
Section 9.1 Singular Value Decomposition

In this section, we deal with one of the most important matrix factorization tools, the singular value decomposition (SVD). The SVD of a matrix \(A\) is closely related to the eigen decompositions of the matrices \(A^TA\) and \(AA^T\text{.}\) One can also think of it as a generalization of the diagonalization procedure, one that applies to any matrix, not necessarily a square one. The SVD is a computationally viable tool for a wide variety of applications. It has applications in image and signal processing, control theory, least squares problems, time series analysis, pattern recognition, dimensionality reduction, biomedical engineering, and in defining a generalized inverse of a matrix, among many others. We shall deal with a few of these applications.

Subsection 9.1.1 Singular Value Decomposition Theorem

Theorem 9.1.1.

Let \(A\) be an \(m\times n\) real matrix of rank \(r\text{.}\) Then there exist an orthogonal \(m\times m\) matrix \(U\text{,}\) an orthogonal \(n\times n\) matrix \(V\) and an \(m\times n\) matrix \(\Sigma\) whose only nonzero entries are the diagonal entries \(\sigma_1\geq \sigma_2\geq \cdots \geq \sigma_r>0\text{,}\) such that
\begin{equation} A=U\Sigma V^T\text{.}\tag{9.1.1} \end{equation}
The decomposition \(A=U\Sigma V^T\) is called a singular value decomposition of \(A\text{.}\) The diagonal entries \(\sigma_1,\sigma_2,\ldots, \sigma_r\) are called the singular values of \(A\) (hence the name singular value decomposition).
Before we prove this theorem, let us play with the equation (9.1.1). We have
\begin{equation*} A^TA = \left(V\Sigma^T U^T\right) \left(U\Sigma V^T\right) =V\Sigma^T\Sigma V^T\text{.} \end{equation*}
This implies
\begin{equation*} V^T(A^TA)V=\Sigma^T\Sigma \text{.} \end{equation*}
Hence
\begin{equation} V^T(A^TA)V = \left[\begin{array}{ccc|ccc} \sigma_1^2 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r^2 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{n\times n}\tag{9.1.2} \end{equation}
The Eqn. (9.1.2) suggests that the columns of \(V\) are eigenvectors of \(A^TA\text{,}\) and they are called right singular vectors.
Similarly,
\begin{equation*} AA^T = \left(U\Sigma V^T\right) \left(V\Sigma^T U^T\right) =U\Sigma\Sigma^T U^T \end{equation*}
This implies
\begin{equation} U^T(AA^T)U = \left[\begin{array}{ccc|ccc} \sigma_1^2 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r^2 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times m}\tag{9.1.3} \end{equation}
The Eqn. (9.1.3) suggests that the columns of \(U\) are eigenvectors of \(AA^T\text{,}\) and they are called left singular vectors.
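These two facts can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the example matrix and variable names are ours, chosen only for illustration, and the signs of the computed singular vectors may differ from a hand computation.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)          # NumPy returns V^T directly
V = Vt.T
# columns of V are eigenvectors of A^T A, columns of U are eigenvectors of A A^T
print(np.allclose(A.T @ A @ V[:, 0], s[0]**2 * V[:, 0]))   # True
print(np.allclose(A @ A.T @ U[:, 0], s[0]**2 * U[:, 0]))   # True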
The notion of right and left singular vectors suggests a way to construct the matrices \(V\) and \(U\) in the SVD. Let us see what this means. Suppose \(U=[u_1~u_2~\cdots~u_m]\) and \(V=[v_1~v_2~\cdots~v_n]\text{.}\) Then we have
\begin{equation*} A^TAv_i=\sigma_i^2 v_i \text{ and } AA^Tu_i=\sigma_i^2 u_i\text{.} \end{equation*}
Hence
\begin{equation*} (AA^T)(Av_i) = A(A^TA v_i)=A(\sigma_i^2v_i)=\sigma_i^2(Av_i)\text{.} \end{equation*}
Thus \(Av_i\) is an eigenvector of \(AA^T\text{,}\) which suggests that \(u_i\) may be defined as \(Av_i\text{.}\) However, if \(u_i=Av_i\text{,}\) then
\begin{equation*} \innprod{u_i}{u_j} = \innprod{Av_i}{Av_j}=\innprod{A^TAv_i}{v_j}=\sigma_i^2\innprod{v_i}{v_j}=\sigma_i^2\delta_{ij}\text{.} \end{equation*}
If we want \(U\) to be orthogonal, we may define \(u_i=\frac{1}{\sigma_i}Av_i\text{.}\)
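The construction sketched above (eigenvectors of \(A^TA\) as the columns of \(V\text{,}\) then \(u_i=\frac{1}{\sigma_i}Av_i\text{,}\) then a completion to an orthonormal basis) can be coded directly. The following Python sketch assumes NumPy; the function name and the tolerance are our own choices, not part of the text.

import numpy as np

def svd_from_eigen(A, tol=1e-12):
    """Build U, Sigma, V as in the text: V from eigenvectors of A^T A, u_i = (1/sigma_i) A v_i."""
    m, n = A.shape
    eigvals, V = np.linalg.eigh(A.T @ A)            # orthonormal eigenvectors of A^T A
    order = np.argsort(eigvals)[::-1]               # sort eigenvalues in decreasing order
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))    # singular values
    r = int(np.sum(sigma > tol))                    # rank of A
    Ur = (A @ V[:, :r]) / sigma[:r]                 # u_i = (1/sigma_i) A v_i
    Q, _ = np.linalg.qr(np.hstack([Ur, np.eye(m)])) # complete to an orthonormal basis of R^m
    U = np.hstack([Ur, Q[:, r:m]])
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    return U, Sigma, V

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, S, V = svd_from_eigen(A)
print(np.allclose(U @ S @ V.T, A))                  # True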
Let us look at proof of Theorem 9.1.1.

Proof.

Note that \(A^TA\) is symmetric and \(\rank{(A)}=\rank{(A^TA)}\text{.}\) Let \(\rank{(A)}=r\text{.}\) Further
\begin{equation*} \innprod{A^TAx}{x}=\innprod{Ax}{Ax}=\norm{Ax}^2\geq 0, \forall x\in \R^n\text{.} \end{equation*}
Hence \(A^TA\) is a symmetric and positive semi-definite matrix. Hence all its eigenvalues are real and non-negative. Let \(\lambda_1, \lambda_2,\ldots,\lambda_r\) be the non-zero eigenvalues of \(A^TA\) with \(\lambda_1\geq \lambda_2\geq\cdots\geq\lambda_r\text{.}\) The remaining eigenvalues of \(A^TA\) are \(\lambda_{r+1}=\lambda_{r+2}=\cdots=\lambda_n=0\text{.}\) Let us write \(\lambda_i=\sigma_i^2\) for \(i=1,\ldots, n\text{.}\) Since \(A^TA\) is a symmetric matrix, it is diagonalizable. Hence there exists an orthonormal eigenbasis \(v_1,\ldots, v_n\) of \(\R^n\) for \(A^TA\text{.}\) Let \(A^TAv_i=\sigma_i^2v_i\text{.}\) This implies
\begin{align} v_i^TA^TAv_i=\amp \sigma_i^2 \amp \text{ for } i=1,2,\ldots, r\tag{9.1.4}\\ v_i^TA^TAv_i=\amp 0 \amp \text{ for } i=r+1,\ldots, n\tag{9.1.5} \end{align}
From (9.1.5), we have
\begin{equation*} v_k^TA^TAv_k=\innprod{Av_k}{Av_k}=0 \text{ for } k=r+1,\ldots,n\text{.} \end{equation*}
This implies,
\begin{equation*} Av_k=0 \text{ for } k=r+1,\ldots,n\text{.} \end{equation*}
We define
\begin{equation} u_i:=\frac{1}{\sigma_i}Av_i \text{ for } i=1,2,\ldots, r\text{.}\tag{9.1.6} \end{equation}
We claim that \(\{u_1,\ldots, u_r\}\) is an orthonormal set. For
\begin{equation*} \innprod{u_i}{u_j}=u_i^Tu_j=\frac{1}{\sigma_i}(Av_i)^T\frac{1}{\sigma_j}Av_j =\frac{1}{\sigma_i\sigma_j}v_i^TA^TAv_j=\frac{\sigma_j^2}{\sigma_i\sigma_j}v_i^Tv_j=\delta_{ij}\text{.} \end{equation*}
Now we complete \(\{u_1,\ldots, u_r\}\) to an orthonormal basis \(\{u_1,\ldots, u_r,u_{r+1},\ldots, u_m\}\) of \(\R^m\text{.}\) Define
\begin{equation*} U:=[u_1~\ldots~ u_r~u_{r+1}~\ldots~ u_m] \text{ and } V:=[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]. \end{equation*}
We claim that \(U^TAV=\Sigma\) and hence \(A= U\Sigma V^T\text{.}\)
\begin{align*} U^TAV =\amp \begin{pmatrix} u_1^T\\\vdots \\ u_r^T\\u_{r+1}^T\\\vdots\\u_m^T\end{pmatrix} A[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]\\ =\amp \begin{pmatrix} \frac{1}{\sigma_1}v_1^TA^T\\\vdots\\\frac{1}{\sigma_r}v_r^TA^T\\u_{r+1}^T\\ \vdots\\u_m^T\end{pmatrix}A[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]\\ =\amp \begin{pmatrix} \frac{1}{\sigma_1}v_1^TA^T\\\vdots\\\frac{1}{\sigma_r}v_r^TA^T\\ u_{r+1}^T\\\vdots\\u_m^T\end{pmatrix}[Av_1~\ldots~ Av_r~Av_{r+1}~\ldots~ Av_n]\\ =\amp \diag(\sigma_1,\ldots,\sigma_r,0,\ldots,0)=\Sigma\text{.} \end{align*}
Hence
\begin{equation*} A=U\Sigma V^T\text{.} \end{equation*}

Remark 9.1.2.

The singular values of a matrix in an SVD are unique; however, the singular vectors are not unique.
Writing the product \(A=U\Sigma V^T\) out column by column gives the expansion
\begin{equation} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ru_rv_r^T\text{.}\tag{9.1.7} \end{equation}
The decomposition (9.1.7) is called a rank one decomposition of \(A\text{,}\) as the rank of each term in (9.1.7) is 1. (Why?) This is a very useful way of decomposing \(A\text{,}\) as we shall see later.
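As a quick numerical illustration of (9.1.7), the sketch below (Python, assuming NumPy; the example matrix is arbitrary) rebuilds a matrix from its rank one terms.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)
rank_one_terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
print(np.allclose(sum(rank_one_terms), A))    # True: A = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T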

Reading Questions

1.
Use SVD to show that a square matrix \(A\) is symmetric (\(A^T=A\)) if and only if \(A^TA=AA^T\text{.}\)
Hint.
\(A^TA=AA^T\) implies the right singular vectors of \(A\) are the same as the left singular vectors of \(A\text{,}\) with the same singular values. Hence \(A\) has an SVD \(A=Q\Sigma P^T\) with \(P=Q\text{.}\) Any matrix of the form \(Q\Sigma Q^T\) is symmetric.

Exercises

1.
An \(n\times n\) matrix \(A\) is a rank one matrix if and only if there exist non-zero vectors \(p,q\in \R^n\) such that \(A=pq^T\text{.}\)
2.
If \(A\) is a real symmetric matrix with eigenvalues \(\lambda_1,\ldots,\lambda_n\text{,}\) then show that the singular values of \(A\) are \(|\lambda_1|,\ldots,|\lambda_n|\text{.}\)
3.
A square matrix \(A\) is non-singular if and only if all singular values of \(A\) are non-zero.

Example 9.1.4.

Let us find a singular value decomposition of \(A= \begin{pmatrix}3 \amp 2 \amp 2 \\ 2 \amp 3 \amp -2 \end{pmatrix}\text{.}\)
We have \(A^TA=\begin{pmatrix}13 \amp 12 \amp 2 \\ 12 \amp 13 \amp -2 \\ 2 \amp -2 \amp 8 \end{pmatrix}\text{.}\) The eigenvalues of \(A^TA\) are \(\sigma_1^2=25, \sigma_2^2=9\) and \(\sigma_3^2=0\text{.}\) Eigenvectors of \(A^TA\) corresponding to the eigenvalues \(25, 9, 0\) are \(\begin{pmatrix}1\\1\\0 \end{pmatrix}\text{,}\) \(\begin{pmatrix}1\\-1\\4 \end{pmatrix}\) and \(\begin{pmatrix}1\\-1\\-\frac{1}{2} \end{pmatrix}\) respectively. Hence an orthonormal eigenbasis of \(A^TA\) is
\begin{equation*} \{v_1,v_2,v_3\}=\left\{\begin{pmatrix}1/\sqrt{2}\\1/\sqrt{2}\\0 \end{pmatrix} , \begin{pmatrix}1/\sqrt{18}\\-1/\sqrt{18}\\4/\sqrt{18} \end{pmatrix} , \begin{pmatrix}2/3\\-2/3\\-1/3 \end{pmatrix} \right\}\text{.} \end{equation*}
We define
\begin{equation*} u_1:=\frac{1}{\sigma_1}Av_1=\begin{pmatrix}1/\sqrt{2}\\1/\sqrt{2} \end{pmatrix} \text{ and } u_2:= \frac{1}{\sigma_2}Av_2=\begin{pmatrix}1/\sqrt{2}\\-1/\sqrt{2} \end{pmatrix}\text{.} \end{equation*}
Thus we have
\begin{equation*} U=\begin{pmatrix}\frac{1}{2} \, \sqrt{2} \amp \frac{1}{2} \, \sqrt{2} \\ \frac{1}{2} \, \sqrt{2} \amp -\frac{1}{2} \, \sqrt{2} \end{pmatrix} \end{equation*}
\begin{equation*} V=\begin{pmatrix}\frac{1}{2} \, \sqrt{2} \amp \frac{1}{6} \, \sqrt{2} \amp \frac{2}{3} \\\frac{1}{2} \, \sqrt{2} \amp -\frac{1}{6} \, \sqrt{2} \amp -\frac{2}{3} \\ 0 \amp \frac{2}{3} \, \sqrt{2} \amp -\frac{1}{3} \end{pmatrix} \text{ and } \Sigma = \begin{pmatrix}5 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \end{pmatrix}\text{.} \end{equation*}
It is easy to check that \(U^TAV=\Sigma\text{,}\) that is, \(A=U\Sigma V^T\text{.}\)
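The computation in this example can be cross-checked with a library routine. Below is a sketch in Python, assuming NumPy; a library may return singular vectors with opposite signs, so we only compare the singular values and the reassembled product.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                  # [5. 3.]
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))     # True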

Exercises

1.
Verify the equation (9.1.7) for Example 9.1.4. That is,
\begin{equation*} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T\text{.} \end{equation*}
2.
Find a singular value decomposition of \(A =\begin{pmatrix}1 \amp 1 \amp -1\\1 \amp 1 \amp -1 \end{pmatrix}\text{.}\)
3.
Find the SVD of \(A = \begin{pmatrix}1 \amp 1 \\2 \amp 2 \\3 \amp 1 \end{pmatrix}\) using step by step calculations.

Subsection 9.1.2 Pseudoinverse using SVD

Definition 9.1.5.

If \(A\) is an \(m\times n\) matrix, then the pseudoinverse of \(A\) is a matrix \(X\) satisfying the following properties:
  1. \(\displaystyle AXA=A\)
  2. \(\displaystyle XAX=X\)
  3. \(\displaystyle (AX)^T=AX\)
  4. \((XA)^T=XA\text{.}\)
The pseudoinverse of a matrix \(A\) is denoted by \(A^\dagger\text{.}\) The pseudoinverse is also called the generalized inverse or the Moore-Penrose pseudoinverse.
Singular value decomposition provides an effective procedure to find the pseudoinverse of a matrix.
Suppose \(A=U\Sigma V^T\) is an SVD of \(A\text{.}\) Since \(U\) and \(V\) are orthogonal matrices, they are invertible. Thus, to define the pseudoinverse of \(A\text{,}\) it is sufficient to define the pseudoinverse of the diagonal matrix \(\Sigma\text{.}\) It is natural to define the pseudoinverse of a diagonal matrix by taking the reciprocals of the nonzero diagonal entries and transposing. Thus if
\begin{equation*} \Sigma = \left[\begin{array}{ccc|ccc} \sigma_1 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times n}\text{.} \end{equation*}
Then
\begin{equation*} \Sigma^\dagger = \left[\begin{array}{ccc|ccc} 1/\sigma_1 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 1/ \sigma_r \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times n}^T\text{.} \end{equation*}
Having defined the generalized inverse of \(\Sigma\text{,}\) it is now natural to define
\begin{equation} A^\dagger:=V\Sigma^\dagger U^T\text{.}\tag{9.1.8} \end{equation}
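Formula (9.1.8) translates directly into code. Below is a sketch in Python assuming NumPy; the function name and tolerance are our own, and NumPy's built-in numpy.linalg.pinv is used only as a cross-check.

import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """A^dagger = V Sigma^dagger U^T, reciprocating only the nonzero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_dag = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_dag) @ U.T

A = np.array([[3., 2., 2.], [2., 3., -2.]])
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))    # True
print(np.allclose(A @ pinv_via_svd(A) @ A, A))            # first property of Definition 9.1.5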

Example 9.1.6.

Find the pseudoinverse of \(A=\begin{pmatrix}1\amp 1\\1\amp 1\\1\amp -1 \end{pmatrix}\text{.}\)
Note that \(A^TA=\begin{pmatrix}3\amp 1\\1\amp 3 \end{pmatrix}\text{.}\) The eigenvalues of \(A^TA\) are \(\sigma_1^2=4\) and \(\sigma_2^2=2\) with corresponding orthonormal eigenvectors \(v_1=\begin{pmatrix}\frac{1}{\sqrt{2}}\\\frac{1}{\sqrt{2}} \end{pmatrix}\) and \(v_2=\begin{pmatrix}\frac{1}{\sqrt{2}}\\-\frac{1}{\sqrt{2}} \end{pmatrix}\) respectively. Now
\begin{equation*} u_1=\frac{1}{\sigma_1}Av_1=\begin{pmatrix}\frac{1}{\sqrt{2}}\\\frac{1}{\sqrt{2}}\\0 \end{pmatrix} \text{ and } u_2=\frac{1}{\sigma_2}Av_2=\begin{pmatrix}0\\0\\1 \end{pmatrix}\text{.} \end{equation*}
Extending \(u_1,u_2\) to an orthonormal basis of \(\R^3\text{,}\) we can select \(u_3=\begin{pmatrix}\frac{1}{\sqrt{2}}\\-\frac{1}{\sqrt{2}}\\0 \end{pmatrix}\text{.}\)
Thus an SVD is given by
\begin{equation*} A=U\Sigma V^T= \begin{pmatrix}\frac{1}{\sqrt{2}} \amp 0 \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp 0 \amp -\frac{1}{\sqrt{2}}\\ 0 \amp 1 \amp 0 \end{pmatrix} \begin{pmatrix}2\amp 0\\ 0 \amp \sqrt{2}\\ 0\amp 0 \end{pmatrix} \begin{pmatrix}\frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}} \end{pmatrix}\text{.} \end{equation*}
Hence
\begin{align*} A^\dagger=\amp V\Sigma^\dagger U^T= \begin{pmatrix} \frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix}1/2\amp 0\amp 0\\0 \amp \frac{1}{\sqrt{2}}\amp 0 \end{pmatrix} \begin{pmatrix}\frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\amp 0\\0 \amp 0 \amp 1\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}}\amp 0 \end{pmatrix}\\ =\amp \begin{pmatrix}1/4\amp 1/4\amp 1/2\\1/4\amp 1/4\amp -1/2 \end{pmatrix}\text{.} \end{align*}
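The matrix \(A^\dagger\) found above can be checked against Definition 9.1.5. A small sketch in Python, assuming NumPy:

import numpy as np

A = np.array([[1., 1.], [1., 1.], [1., -1.]])
A_dag = np.linalg.pinv(A)
print(A_dag)                                   # [[0.25 0.25 0.5], [0.25 0.25 -0.5]], as computed above
# the four conditions of Definition 9.1.5
print(np.allclose(A @ A_dag @ A, A), np.allclose(A_dag @ A @ A_dag, A_dag))
print(np.allclose((A @ A_dag).T, A @ A_dag), np.allclose((A_dag @ A).T, A_dag @ A))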

Reading Questions

1.
If \(A\) is an \(m\times n\) matrix with \(m\geq n\) and \(\rank(A)=n\text{,}\) then
\begin{equation*} A^\dagger=(A^TA)^{-1}A^T\text{.} \end{equation*}
\((A^TA)^{-1}A^T\) is called the left pseudoinverse of \(A\text{.}\)
2.
If \(A\) is an \(m\times n\) matrix with \(m\leq n\) and \(\rank(A)=m\text{,}\) then
\begin{equation*} A^\dagger =A^T(AA^T)^{-1}\text{.} \end{equation*}
\(A^T(AA^T)^{-1}\) is called the right pseudoinverse of \(A\text{.}\)

Exercises

1.
If \(A\) is a square matrix, then show that \(A^TA\) and \(AA^T\) are similar.
2.
Find the SVD of a matrix \(\begin{pmatrix}1 \amp 1 \amp 1 \\2 \amp 2 \amp 2\\3 \amp 1 \amp -1 \end{pmatrix}\text{.}\)
3.
Find the least squares solution of the system of equations \(Ax=b\) where \(A =\begin{pmatrix}2 \amp 3 \amp -1\\-2 \amp 1 \amp 4\\3 \amp 1 \amp 3\\ -5 \amp 4 \amp 2\\1 \amp 1 \amp 1 \end{pmatrix}\) and \(b= \begin{pmatrix}-10\\7\\15\\8\\9 \end{pmatrix}\) using the generalized inverse.

Subsection 9.1.3 Geometry of SVD

The SVD provides an effective way to look at how a matrix transforms an object geometrically. In order to see this, let us consider the matrix \(A=\left(\begin{array}{rr} 4 \amp -2 \\ -2 \amp 3 \end{array} \right)\text{.}\)
The singular value decomposition \(A=U\Sigma V^T\) is given by
\begin{equation*} A=\left(\begin{array}{rr} -0.78820 \amp 0.61541 \\ 0.61541 \amp 0.78820 \end{array} \right)\left(\begin{array}{rr} 5.56150 \amp 0.0 \\ 0.0 \amp 1.4384 \end{array} \right)\left(\begin{array}{rr} -0.78820 \amp 0.61541 \\ 0.61541 \amp 0.78820 \end{array} \right)^T\text{.} \end{equation*}
The geometric effect of \(A\) applied to the unit circle, along with the unit vectors \(e_1\) and \(e_2\text{,}\) is shown in Figure 9.1.8.
Figure 9.1.8. Transformation of unit circle under SVD
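Numerically, the picture in Figure 9.1.8 corresponds to the following computation: the image of the unit circle under \(A\) is an ellipse whose semi-axes have lengths \(\sigma_1\) and \(\sigma_2\) and point along \(u_1\) and \(u_2\text{.}\) A sketch in Python, assuming NumPy; the sampling of the circle is our own choice.

import numpy as np

A = np.array([[4., -2.], [-2., 3.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                            # approximately [5.5616, 1.4384]

theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle
image = A @ circle                                  # V^T rotates, Sigma stretches, U rotates
radii = np.linalg.norm(image, axis=0)
print(radii.max(), radii.min())                     # close to sigma_1 and sigma_2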

Example 9.1.9.

Consider a \(3\times 3\) matrix \(A=\begin{pmatrix}2 \amp 3 \amp 1 \\ -1 \amp 2 \amp 1 \\ 0 \amp 2 \amp 3 \end{pmatrix}\text{.}\) The singular values of \(A\) are 5.107, 2.2982 and 1.2779.
Figure 9.1.10 below shows what happens to the unit sphere and the unit vectors under \(A\text{,}\) obtained using the SVD.
Figure 9.1.10. Transformation of the unit sphere under SVD

Subsection 9.1.4 Image Compression using SVD

An image stored on a computer is a collection of dots called pixels. The collection of pixels that constitutes an image can be stored as a matrix. A color image can be thought of as a 3-dimensional array.
Using Eqn. (9.1.7), we can write a matrix as a sum of rank one matrices:
\begin{equation*} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ru_rv_r^T\text{.} \end{equation*}
This property says that \(\rank{(A)}\) is equal to the number of nonzero singular values of \(A\text{.}\) Since \(\sigma_1\geq \sigma_2\geq \cdots\geq \sigma_r\text{,}\) the first term has the greatest contribution to \(A\text{,}\) followed by the second term, and so on. This property allows us to reduce noise or compress the matrix data by discarding the terms with small singular values. In particular, we can approximate a matrix by adding only the first few terms of (9.1.7).
If we let
\begin{equation*} A_k:=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ku_kv_k^T (k\leq r) \end{equation*}
then the total storage required for \(A_k\) is \(k(m+n+1)\) numbers, which is much less compared to the \(mn\) entries of \(A\text{.}\)
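The truncated sum \(A_k\) and the storage count \(k(m+n+1)\) can be illustrated with a short computation. A sketch in Python, assuming NumPy; the matrix here is random and serves only to show the bookkeeping.

import numpy as np

def best_rank_k(A, k):
    """A_k = sigma_1 u_1 v_1^T + ... + sigma_k u_k v_k^T, the first k terms of (9.1.7)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k = 200, 300, 10
A = rng.normal(size=(m, n))
A_k = best_rank_k(A, k)
print(np.linalg.matrix_rank(A_k))        # 10
print(k * (m + n + 1), m * n)            # 5010 numbers for A_k versus 60000 entries of A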
When an image (the corresponding matrix) is decomposed using the SVD, it is not compressed, but the data take a form in which the first few singular values carry most of the image information. This allows us to use only the first few terms to represent an image that is almost identical to the original.
Look at the image in Figure 9.1.11. The associated array for this image is of size \(995\times 1770\times 3\text{.}\)
Figure 9.1.11. Original Color Image
This image is converted into a gray scale image (see Figure 9.1.12). The size of the matrix associated to the gray scale image is \(995\times 1770\text{,}\) with rank 995.
Figure 9.1.12. Original Gray Image
After applying the SVD to the gray image and keeping the first 5, 10, 20, 30, 50 and 100 terms of the rank one decomposition, the approximate images are plotted in Figures 9.1.13, 9.1.14, 9.1.15, 9.1.16, 9.1.17 and 9.1.18 respectively.
Figure 9.1.13. Approximate Image with 5 terms
Figure 9.1.14. Approximate Image with 10 terms
Figure 9.1.15. Approximate Image with 20 terms
Figure 9.1.16. Approximate Image with 30 terms
Figure 9.1.17. Approximate Image with 50 terms
Figure 9.1.18. Approximate Image with 100 terms
It is quite evident that the first 100 terms alone give a very good approximation of the original gray scale image. Note that the original image has \(995\times 1770=1761150\) pixels, whereas if we keep the first 100 terms, the storage is \(100(995+1770+1)=276600\) numbers, which is quite small compared to the original size.
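A sketch of the compression procedure described above, in Python, assuming NumPy and Matplotlib; "photo.png" is a placeholder file name for the image in Figure 9.1.11, and any gray scale photo can be used in its place.

import numpy as np
import matplotlib.pyplot as plt

def compress(gray, k):
    """Keep only the first k terms sigma_i u_i v_i^T of the gray scale image."""
    U, s, Vt = np.linalg.svd(gray, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

img = plt.imread("photo.png")                        # placeholder file name
gray = img.mean(axis=2) if img.ndim == 3 else img    # average the color channels if needed
fig, axes = plt.subplots(2, 3, figsize=(12, 5))
for ax, k in zip(axes.ravel(), (5, 10, 20, 30, 50, 100)):
    ax.imshow(compress(gray, k), cmap="gray")
    ax.set_title(f"{k} terms")
    ax.axis("off")
plt.show()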