
Section 9.1 Singular Value Decomposition

In this section, we deal with one of the most important matrix factorization tools, the singular value decomposition (SVD). The SVD of a matrix \(A\) is closely related to the eigen decompositions of the matrices \(A^TA\) and \(AA^T\text{.}\) One can also think of it as a generalization of the diagonalization procedure, one that applies to any matrix, not necessarily a square one. The SVD is a computationally viable tool for a wide variety of applications. It has applications in image and signal processing, control theory, least squares problems, time series analysis, pattern recognition, dimensionality reduction, biomedical engineering, and in defining a generalized inverse of a matrix, among many others. We shall deal with a few of these applications.

Subsection 9.1.1 Singular Value Decomposition Theorem

Theorem 9.1.1.

Let \(A\) be an \(m\times n\) real matrix of rank \(r\text{.}\) Then there exist an orthogonal \(m\times m\) matrix \(U\text{,}\) an orthogonal \(n\times n\) matrix \(V\) and an \(m\times n\) matrix \(\Sigma\) whose only nonzero entries are the diagonal entries \(\sigma_1\geq \sigma_2\geq \cdots \geq \sigma_r>0\text{,}\) such that
\begin{equation} A=U\Sigma V^T\text{.}\tag{9.1.1} \end{equation}
The decomposition \(A=U\Sigma V^T\) is called a singular value decomposition of \(A\text{.}\) The diagonal entries \(\sigma_1,\sigma_2,\ldots, \sigma_r\) are called the singular values of \(A\) (hence the name singular value decomposition).
Before we prove this theorem, let us play with the equation (9.1.1). We have
\begin{equation*} A^TA = \left(V\Sigma^T U^T\right) \left(U\Sigma V^T\right) =V\Sigma^T\Sigma V^T\text{.} \end{equation*}
This implies
\begin{equation*} V^T(A^TA)V=\Sigma^T\Sigma \text{.} \end{equation*}
Hence
\begin{equation} V^T(A^TA)V = \left[\begin{array}{ccc|ccc} \sigma_1^2 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r^2 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{n\times n}\tag{9.1.2} \end{equation}
The Eqn. (9.1.2) suggests that the columns of \(V\) are eigenvectors of \(A^TA\text{,}\) and they are called right singular vectors.
Similarly,
\begin{equation*} AA^T = \left(U\Sigma V^T\right) \left(V\Sigma^T U^T\right) =U\Sigma\Sigma^T U^T \end{equation*}
This implies
\begin{equation} U^T(AA^T)U = \left[\begin{array}{ccc|ccc} \sigma_1^2 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r^2 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times m}\tag{9.1.3} \end{equation}
The Eqn. (9.1.3) suggests that the columns of \(U\) are eigenvectors of \(AA^T\text{,}\) and they are called left singular vectors.
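These two facts can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the example matrix and variable names are ours, chosen only for illustration, and the signs of the computed singular vectors may differ from a hand computation.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)          # NumPy returns V^T directly
V = Vt.T
# columns of V are eigenvectors of A^T A, columns of U are eigenvectors of A A^T
print(np.allclose(A.T @ A @ V[:, 0], s[0]**2 * V[:, 0]))   # True
print(np.allclose(A @ A.T @ U[:, 0], s[0]**2 * U[:, 0]))   # True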
The notion of right and left singular vectors suggests a way to construct the matrices \(V\) and \(U\) in the SVD. Let us see what this means. Suppose \(U=[u_1~u_2~\cdots~u_m]\) and \(V=[v_1~v_2~\cdots~v_n]\text{.}\) Then we have
\begin{equation*} A^TAv_i=\sigma_i^2 v_i \text{ and } AA^Tu_i=\sigma_i^2 u_i\text{.} \end{equation*}
Hence
\begin{equation*} (AA^T)(Av_i) = A(A^TA v_i)=A(\sigma_i^2v_i)=\sigma_i^2(Av_i)\text{.} \end{equation*}
Thus \(Av_i\) is an eigenvector of \(AA^T\text{,}\) which suggests that \(u_i\) may be defined as \(Av_i\text{.}\) However, if \(u_i=Av_i\text{,}\) then
\begin{equation*} \innprod{u_i}{u_j} = \innprod{Av_i}{Av_j}=\innprod{A^TAv_i}{v_j}=\sigma_i^2\innprod{v_i}{v_j}=\sigma_i^2\delta_{ij}\text{.} \end{equation*}
If we want \(U\) to be orthogonal, we may define \(u_i=\frac{1}{\sigma_i}Av_i\text{.}\)
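The construction sketched above (eigenvectors of \(A^TA\) as the columns of \(V\text{,}\) then \(u_i=\frac{1}{\sigma_i}Av_i\text{,}\) then a completion to an orthonormal basis) can be coded directly. The following Python sketch assumes NumPy; the function name and the tolerance are our own choices, not part of the text.

import numpy as np

def svd_from_eigen(A, tol=1e-12):
    """Build U, Sigma, V as in the text: V from eigenvectors of A^T A, u_i = (1/sigma_i) A v_i."""
    m, n = A.shape
    eigvals, V = np.linalg.eigh(A.T @ A)            # orthonormal eigenvectors of A^T A
    order = np.argsort(eigvals)[::-1]               # sort eigenvalues in decreasing order
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))    # singular values
    r = int(np.sum(sigma > tol))                    # rank of A
    Ur = (A @ V[:, :r]) / sigma[:r]                 # u_i = (1/sigma_i) A v_i
    Q, _ = np.linalg.qr(np.hstack([Ur, np.eye(m)])) # complete to an orthonormal basis of R^m
    U = np.hstack([Ur, Q[:, r:m]])
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    return U, Sigma, V

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, S, V = svd_from_eigen(A)
print(np.allclose(U @ S @ V.T, A))                  # True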
Let us look at proof of Theorem 9.1.1.

Proof.

Note that \(A^TA\) is symmetric and \(\rank{(A)}=\rank{(A^TA)}\text{.}\) Let \(\rank{(A)}=r\text{.}\) Further
\begin{equation*} \innprod{A^TAx}{x}=\innprod{Ax}{Ax}=\norm{Ax}^2\geq 0, \forall x\in \R^n\text{.} \end{equation*}
Hence \(A^TA\) is a symmetric and positive semi-definite matrix. Hence all its eigenvalues are real and non-negative. Let \(\lambda_1, \lambda_2,\ldots,\lambda_r\) be the non-zero eigenvalues of \(A^TA\) with \(\lambda_1\geq \lambda_2\geq\cdots\geq\lambda_r\text{.}\) The remaining eigenvalues of \(A^TA\) are \(\lambda_{r+1}=\lambda_{r+2}=\cdots=\lambda_n=0\text{.}\) Let us write \(\lambda_i=\sigma_i^2\) for \(i=1,\ldots, n\text{.}\) Since \(A^TA\) is a symmetric matrix, it is diagonalizable. Hence there exists an orthonormal eigenbasis \(v_1,\ldots, v_n\) of \(\R^n\) for \(A^TA\text{.}\) Let \(A^TAv_i=\sigma_i^2v_i\text{.}\) This implies
\begin{align} v_i^TA^TAv_i=\amp \sigma_i^2 \amp \text{ for } i=1,2,\ldots, r\tag{9.1.4}\\ v_i^TA^TAv_i=\amp 0 \amp \text{ for } i=r+1,\ldots, n\tag{9.1.5} \end{align}
From (9.1.5), we have
\begin{equation*} v_k^TA^TAv_k=\innprod{Av_k}{Av_k}=0 \text{ for } k=r+1,\ldots,n\text{.} \end{equation*}
This implies,
\begin{equation*} Av_k=0 \text{ for } k=r+1,\ldots,n\text{.} \end{equation*}
We define
\begin{equation} u_i:=\frac{1}{\sigma_i}Av_i \text{ for } i=1,2,\ldots, r\text{.}\tag{9.1.6} \end{equation}
We claim that \(\{u_1,\ldots, u_r\}\) is an orthonormal set. For
\begin{equation*} \innprod{u_i}{u_j}=u_i^Tu_j=\frac{1}{\sigma_i}(Av_i)^T\frac{1}{\sigma_j}Av_j =\frac{1}{\sigma_i\sigma_j}v_i^TA^TAv_j=\frac{\sigma_j^2}{\sigma_i\sigma_j}v_i^Tv_j=\delta_{ij}\text{.} \end{equation*}
Now we complete \(\{u_1,\ldots, u_r\}\) to an orthonormal basis \(\{u_1,\ldots, u_r,u_{r+1},\ldots, u_m\}\) of \(\R^m\text{.}\) Define
\begin{equation*} U:=[u_1~\ldots~ u_r~u_{r+1}~\ldots~ u_m] \text{ and } V:=[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]. \end{equation*}
We claim that \(U^TAV=\Sigma\) and hence \(A= U\Sigma V^T\text{.}\)
\begin{align*} U^TAV =\amp \begin{pmatrix} u_1^T\\\vdots \\ u_r^T\\u_{r+1}^T\\\vdots\\u_m^T\end{pmatrix} A[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]\\ =\amp \begin{pmatrix} \frac{1}{\sigma_1}v_1^TA^T\\\vdots\\\frac{1}{\sigma_r}v_r^TA^T\\u_{r+1}^T\\ \vdots\\u_m^T\end{pmatrix}A[v_1~\ldots~ v_r~v_{r+1}~\ldots~ v_n]\\ =\amp \begin{pmatrix} \frac{1}{\sigma_1}v_1^TA^T\\\vdots\\\frac{1}{\sigma_r}v_r^TA^T\\ u_{r+1}^T\\\vdots\\u_m^T\end{pmatrix}[Av_1~\ldots~ Av_r~Av_{r+1}~\ldots~ Av_n]\\ =\amp \diag(\sigma_1,\ldots,\sigma_r,0,\ldots,0)=\Sigma\text{.} \end{align*}
Hence
\begin{equation*} A=U\Sigma V^T\text{.} \end{equation*}

Remark 9.1.2.

The singular values of a matrix in an SVD are unique; however, the singular vectors are not unique.
Writing the product \(A=U\Sigma V^T\) out column by column gives the expansion
\begin{equation} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ru_rv_r^T\text{.}\tag{9.1.7} \end{equation}
The decomposition (9.1.7) is called a rank one decomposition of \(A\text{,}\) as the rank of each term in (9.1.7) is 1. (Why?) This is a very useful way of decomposing \(A\text{,}\) as we shall see later.
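As a quick numerical illustration of (9.1.7), the sketch below (Python, assuming NumPy; the example matrix is arbitrary) rebuilds a matrix from its rank one terms.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)
rank_one_terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
print(np.allclose(sum(rank_one_terms), A))    # True: A = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T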

Reading Questions

1.
Use SVD to show that a square matrix \(A\) is symmetric (\(A^T=A\)) if and only if \(A^TA=AA^T\text{.}\)
Hint.
\(A^TA=AA^T\) implies the right singular vectors of \(A\) are the same as the left singular vectors of \(A\text{,}\) with the same singular values. Hence \(A\) has an SVD \(A=Q\Sigma P^T\) with \(P=Q\text{.}\) Any matrix of the form \(Q\Sigma Q^T\) is symmetric.

Exercises

1.
An \(n\times n\) matrix \(A\) is a rank one matrix if and only if there exist non-zero vectors \(p,q\in \R^n\) such that \(A=pq^T\text{.}\)
2.
If \(A\) is a real symmetric matrix with eigenvalues \(\lambda_1,\ldots,\lambda_n\text{,}\) then show that the singular values of \(A\) are \(|\lambda_1|,\ldots,|\lambda_n|\text{.}\)
3.
A square matrix \(A\) is non-singular if and only if all singular values of \(A\) are non-zero.

Example 9.1.4.

Let us find a singular value decomposition of \(A= \begin{pmatrix}3 \amp 2 \amp 2 \\ 2 \amp 3 \amp -2 \end{pmatrix}\text{.}\)
We have \(A^TA=\begin{pmatrix}13 \amp 12 \amp 2 \\ 12 \amp 13 \amp -2 \\ 2 \amp -2 \amp 8 \end{pmatrix}\text{.}\) The eigenvalues of \(A^TA\) are \(\sigma_1^2=25, \sigma_2^2=9\) and \(\sigma_3^2=0\text{.}\) Eigenvectors of \(A^TA\) corresponding to the eigenvalues \(25, 9, 0\) are \(\begin{pmatrix}1\\1\\0 \end{pmatrix}\text{,}\) \(\begin{pmatrix}1\\-1\\4 \end{pmatrix}\) and \(\begin{pmatrix}1\\-1\\-\frac{1}{2} \end{pmatrix}\) respectively. Hence an orthonormal eigenbasis of \(A^TA\) is
\begin{equation*} \{v_1,v_2,v_3\}=\left\{\begin{pmatrix}1/\sqrt{2}\\1/\sqrt{2}\\0 \end{pmatrix} , \begin{pmatrix}1/\sqrt{18}\\-1/\sqrt{18}\\4/\sqrt{18} \end{pmatrix} , \begin{pmatrix}2/3\\-2/3\\-1/3 \end{pmatrix} \right\}\text{.} \end{equation*}
We define
\begin{equation*} u_1:=\frac{1}{\sigma_1}Av_1=\begin{pmatrix}1/\sqrt{2}\\1/\sqrt{2} \end{pmatrix} \text{ and } u_2:= \frac{1}{\sigma_2}Av_2=\begin{pmatrix}1/\sqrt{2}\\-1/\sqrt{2} \end{pmatrix}\text{.} \end{equation*}
Thus we have
\begin{equation*} U=\begin{pmatrix}\frac{1}{2} \, \sqrt{2} \amp \frac{1}{2} \, \sqrt{2} \\ \frac{1}{2} \, \sqrt{2} \amp -\frac{1}{2} \, \sqrt{2} \end{pmatrix} \end{equation*}
\begin{equation*} V=\begin{pmatrix}\frac{1}{2} \, \sqrt{2} \amp \frac{1}{6} \, \sqrt{2} \amp \frac{2}{3} \\\frac{1}{2} \, \sqrt{2} \amp -\frac{1}{6} \, \sqrt{2} \amp -\frac{2}{3} \\ 0 \amp \frac{2}{3} \, \sqrt{2} \amp -\frac{1}{3} \end{pmatrix} \text{ and } \Sigma = \begin{pmatrix}5 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \end{pmatrix}\text{.} \end{equation*}
It is easy to check that \(U^TAV=\Sigma\text{,}\) that is, \(A=U\Sigma V^T\text{.}\)
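The computation in this example can be cross-checked with a library routine. Below is a sketch in Python, assuming NumPy; a library may return singular vectors with opposite signs, so we only compare the singular values and the reassembled product.

import numpy as np

A = np.array([[3., 2., 2.], [2., 3., -2.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                  # [5. 3.]
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))     # True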

Exercises

1.
Verify the equation (9.1.7) for Example 9.1.4. That is,
\begin{equation*} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T\text{.} \end{equation*}
2.
Find a singular value decomposition of \(A =\begin{pmatrix}1 \amp 1 \amp -1\\1 \amp 1 \amp -1 \end{pmatrix}\text{.}\)
3.
Find the SVD of \(A = \begin{pmatrix}1 \amp 1 \\2 \amp 2 \\3 \amp 1 \end{pmatrix}\) using step by step calculations.

Subsection 9.1.2 Pseudoinverse using SVD

Definition 9.1.5.

If \(A\) is an \(m\times n\) matrix, then the pseudoinverse of \(A\) is a matrix \(X\) satisfying the following properties:
  1. \(\displaystyle AXA=A\)
  2. \(\displaystyle XAX=X\)
  3. \(\displaystyle (AX)^T=AX\)
  4. \((XA)^T=XA\text{.}\)
The pseudoinverse of a matrix \(A\) is denoted by \(A^\dagger\text{.}\) The pseudoinverse is also called the generalized inverse or the Moore-Penrose pseudoinverse.
Singular value decomposition provides an effective procedure to find the pseudoinverse of a matrix.
Suppose \(A=U\Sigma V^T\) is an SVD of \(A\text{.}\) Since \(U\) and \(V\) are orthogonal matrices, they are invertible. Thus, to define the pseudoinverse of \(A\text{,}\) it is sufficient to define the pseudoinverse of the diagonal matrix \(\Sigma\text{.}\) It is natural to define the pseudoinverse of a diagonal matrix by taking the reciprocals of the nonzero diagonal entries and transposing. Thus if
\begin{equation*} \Sigma = \left[\begin{array}{ccc|ccc} \sigma_1 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp \sigma_r \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times n}\text{.} \end{equation*}
Then
\begin{equation*} \Sigma^\dagger = \left[\begin{array}{ccc|ccc} 1/\sigma_1 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 1/ \sigma_r \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \vdots \amp 0 \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp \cdots \amp 0 \\ \end{array}\right]_{m\times n}^T\text{.} \end{equation*}
Having defined the generalized inverse of \(\Sigma\text{,}\) it is now natural to define
\begin{equation} A^\dagger:=V\Sigma^\dagger U^T\text{.}\tag{9.1.8} \end{equation}
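Formula (9.1.8) translates directly into code. Below is a sketch in Python assuming NumPy; the function name and tolerance are our own, and NumPy's built-in numpy.linalg.pinv is used only as a cross-check.

import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """A^dagger = V Sigma^dagger U^T, reciprocating only the nonzero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_dag = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_dag) @ U.T

A = np.array([[3., 2., 2.], [2., 3., -2.]])
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))    # True
print(np.allclose(A @ pinv_via_svd(A) @ A, A))            # first property of Definition 9.1.5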

Example 9.1.6.

Find the pseudoinverse of \(A=\begin{pmatrix}1\amp 1\\1\amp 1\\1\amp -1 \end{pmatrix}\text{.}\)
Note that \(A^TA=\begin{pmatrix}3\amp 1\\1\amp 3 \end{pmatrix}\text{.}\) The eigenvalues of \(A^TA\) are \(\sigma_1^2=4\) and \(\sigma_2^2=2\) with corresponding orthonormal eigenvectors \(v_1=\begin{pmatrix}\frac{1}{\sqrt{2}}\\\frac{1}{\sqrt{2}} \end{pmatrix}\) and \(v_2=\begin{pmatrix}\frac{1}{\sqrt{2}}\\-\frac{1}{\sqrt{2}} \end{pmatrix}\) respectively. Now
\begin{equation*} u_1=\frac{1}{\sigma_1}Av_1=\begin{pmatrix}\frac{1}{\sqrt{2}}\\\frac{1}{\sqrt{2}}\\0 \end{pmatrix} \text{ and } u_2=\frac{1}{\sigma_2}Av_2=\begin{pmatrix}0\\0\\1 \end{pmatrix}\text{.} \end{equation*}
Extending \(u_1,u_2\) to an orthonormal basis of \(\R^3\text{,}\) we can select \(u_3=\begin{pmatrix}\frac{1}{\sqrt{2}}\\-\frac{1}{\sqrt{2}}\\0 \end{pmatrix}\text{.}\)
Thus an SVD is given by
\begin{equation*} A=U\Sigma V^T= \begin{pmatrix}\frac{1}{\sqrt{2}} \amp 0 \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp 0 \amp -\frac{1}{\sqrt{2}}\\ 0 \amp 1 \amp 0 \end{pmatrix} \begin{pmatrix}2\amp 0\\ 0 \amp \sqrt{2}\\ 0\amp 0 \end{pmatrix} \begin{pmatrix}\frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}} \end{pmatrix}\text{.} \end{equation*}
Hence
\begin{align*} A^\dagger=\amp V\Sigma^\dagger U^T= \begin{pmatrix} \frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix}1/2\amp 0\amp 0\\0 \amp \frac{1}{\sqrt{2}}\amp 0 \end{pmatrix} \begin{pmatrix}\frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}}\amp 0\\0 \amp 0 \amp 1\\ \frac{1}{\sqrt{2}} \amp -\frac{1}{\sqrt{2}}\amp 0 \end{pmatrix}\\ =\amp \begin{pmatrix}1/4\amp 1/4\amp 1/2\\1/4\amp 1/4\amp -1/2 \end{pmatrix}\text{.} \end{align*}
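The matrix \(A^\dagger\) found above can be checked against Definition 9.1.5. A small sketch in Python, assuming NumPy:

import numpy as np

A = np.array([[1., 1.], [1., 1.], [1., -1.]])
A_dag = np.linalg.pinv(A)
print(A_dag)                                   # [[0.25 0.25 0.5], [0.25 0.25 -0.5]], as computed above
# the four conditions of Definition 9.1.5
print(np.allclose(A @ A_dag @ A, A), np.allclose(A_dag @ A @ A_dag, A_dag))
print(np.allclose((A @ A_dag).T, A @ A_dag), np.allclose((A_dag @ A).T, A_dag @ A))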

Reading Questions

1.
If \(A\) is an \(m\times n\) matrix with \(m\geq n\) and \(\rank(A)=n\text{,}\) then
\begin{equation*} A^\dagger=(A^TA)^{-1}A^T\text{.} \end{equation*}
\((A^TA)^{-1}A^T\) is called the left pseudoinverse of \(A\text{.}\)
2.
If \(A\) is an \(m\times n\) matrix with \(m\leq n\) and \(\rank(A)=m\text{,}\) then
\begin{equation*} A^\dagger =A^T(AA^T)^{-1}\text{.} \end{equation*}
\(A^T(AA^T)^{-1}\) is called the right pseudoinverse of \(A\text{.}\)

Exercises

1.
If \(A\) is a square matrix, then show that \(A^TA\) and \(AA^T\) are similar.
2.
Find the SVD of a matrix \(\begin{pmatrix}1 \amp 1 \amp 1 \\2 \amp 2 \amp 2\\3 \amp 1 \amp -1 \end{pmatrix}\text{.}\)
3.
Find the least squares solution of the system of equations \(Ax=b\) where \(A =\begin{pmatrix}2 \amp 3 \amp -1\\-2 \amp 1 \amp 4\\3 \amp 1 \amp 3\\ -5 \amp 4 \amp 2\\1 \amp 1 \amp 1 \end{pmatrix}\) and \(b= \begin{pmatrix}-10\\7\\15\\8\\9 \end{pmatrix}\) using the generalized inverse.

Subsection 9.1.3 Geometry of SVD

The SVD provides an effective way to look at how a matrix transforms an object geometrically. In order to see this, let us consider the matrix \(A=\left(\begin{array}{rr} 4 \amp -2 \\ -2 \amp 3 \end{array} \right)\text{.}\)
The singular value decomposition \(A=U\Sigma V^T\) is given by
\begin{equation*} A=\left(\begin{array}{rr} -0.78820 \amp 0.61541 \\ 0.61541 \amp 0.78820 \end{array} \right)\left(\begin{array}{rr} 5.56150 \amp 0.0 \\ 0.0 \amp 1.4384 \end{array} \right)\left(\begin{array}{rr} -0.78820 \amp 0.61541 \\ 0.61541 \amp 0.78820 \end{array} \right)^T\text{.} \end{equation*}
The geometric effect of \(A\) applied to the unit circle, along with the unit vectors \(e_1\) and \(e_2\text{,}\) is shown in Figure 9.1.8.
Figure 9.1.8. Transformation of unit circle under SVD
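Numerically, the picture in Figure 9.1.8 corresponds to the following computation: the image of the unit circle under \(A\) is an ellipse whose semi-axes have lengths \(\sigma_1\) and \(\sigma_2\) and point along \(u_1\) and \(u_2\text{.}\) A sketch in Python, assuming NumPy; the sampling of the circle is our own choice.

import numpy as np

A = np.array([[4., -2.], [-2., 3.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                            # approximately [5.5616, 1.4384]

theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle
image = A @ circle                                  # V^T rotates, Sigma stretches, U rotates
radii = np.linalg.norm(image, axis=0)
print(radii.max(), radii.min())                     # close to sigma_1 and sigma_2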

Example 9.1.9.

Consider a \(3\times 3\) matrix \(A=\begin{pmatrix}2 \amp 3 \amp 1 \\ -1 \amp 2 \amp 1 \\ 0 \amp 2 \amp 3 \end{pmatrix}\text{.}\) The singular values of \(A\) are 5.107, 2.2982 and 1.2779.
Figure 9.1.10 below shows what happens to the unit sphere and the unit vectors under \(A\text{,}\) obtained using the SVD.
Figure 9.1.10. Transformation of the unit sphere under SVD

Subsection 9.1.4 Image Compression using SVD

An image stored on a computer is a collection of dots called pixels. The collection of pixels that constitutes an image can be stored as a matrix. A color image can be thought of as a 3-dimensional array.
Using Eqn. (9.1.7), we can write a matrix as a sum of rank one matrices:
\begin{equation*} A=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ru_rv_r^T\text{.} \end{equation*}
This property says that \(\rank{(A)}\) is equal to the number of nonzero singular values of \(A\text{.}\) Since \(\sigma_1\geq \sigma_2\geq \cdots\geq \sigma_r\text{,}\) the first term has the greatest contribution to \(A\text{,}\) followed by the second term, and so on. This property allows us to reduce noise or compress the matrix data by discarding the terms with small singular values. In particular, we can approximate a matrix by adding only the first few terms of (9.1.7).
If we let
\begin{equation*} A_k:=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_ku_kv_k^T (k\leq r) \end{equation*}
then the total storage required for \(A_k\) is \(k(m+n+1)\) numbers, which is much less compared to the \(mn\) entries of \(A\text{.}\)
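The truncated sum \(A_k\) and the storage count \(k(m+n+1)\) can be illustrated with a short computation. A sketch in Python, assuming NumPy; the matrix here is random and serves only to show the bookkeeping.

import numpy as np

def best_rank_k(A, k):
    """A_k = sigma_1 u_1 v_1^T + ... + sigma_k u_k v_k^T, the first k terms of (9.1.7)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k = 200, 300, 10
A = rng.normal(size=(m, n))
A_k = best_rank_k(A, k)
print(np.linalg.matrix_rank(A_k))        # 10
print(k * (m + n + 1), m * n)            # 5010 numbers for A_k versus 60000 entries of A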
When an image (the corresponding matrix) is decomposed using the SVD, it is not compressed, but the data take a form in which the first few singular values carry most of the image information. This allows us to use only the first few terms to represent an image that is almost identical to the original.
Look at the image in Figure 9.1.11. The associated array for this image is of size \(995\times 1770\times 3\text{.}\)
Figure 9.1.11. Original Color Image
This image is converted into a gray scale image (see Figure 9.1.12). The size of the matrix associated to the gray scale image is \(995\times 1770\text{,}\) with rank 995.
Figure 9.1.12. Original Gray Image
After applying the SVD to the gray image and keeping the first 5, 10, 20, 30, 50 and 100 terms of the rank one decomposition, the approximate images are plotted in Figures 9.1.13, 9.1.14, 9.1.15, 9.1.16, 9.1.17 and 9.1.18 respectively.
Figure 9.1.13. Approximate Image with 5 terms
Figure 9.1.14. Approximate Image with 10 terms
Figure 9.1.15. Approximate Image with 20 terms
Figure 9.1.16. Approximate Image with 30 terms
Figure 9.1.17. Approximate Image with 50 terms
Figure 9.1.18. Approximate Image with 100 terms
It is quite evident that the first 100 terms alone give a very good approximation of the original gray scale image. Note that the original image has \(995\times 1770=1761150\) pixels, whereas if we keep the first 100 terms, the storage is \(100(995+1770+1)=276600\) numbers, which is quite small compared to the original size.
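A sketch of the compression procedure described above, in Python, assuming NumPy and Matplotlib; "photo.png" is a placeholder file name for the image in Figure 9.1.11, and any gray scale photo can be used in its place.

import numpy as np
import matplotlib.pyplot as plt

def compress(gray, k):
    """Keep only the first k terms sigma_i u_i v_i^T of the gray scale image."""
    U, s, Vt = np.linalg.svd(gray, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

img = plt.imread("photo.png")                        # placeholder file name
gray = img.mean(axis=2) if img.ndim == 3 else img    # average the color channels if needed
fig, axes = plt.subplots(2, 3, figsize=(12, 5))
for ax, k in zip(axes.ravel(), (5, 10, 20, 30, 50, 100)):
    ax.imshow(compress(gray, k), cmap="gray")
    ax.set_title(f"{k} terms")
    ax.axis("off")
plt.show()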