
Section 6.4 Orthogonal Diagonalizations

Recall the concept of diagonalization of a square matrix. We have seen that an \(n\times n\) matrix \(A\) is diagonalizable if there is an eigenbasis of \(\R^n\text{.}\) In this section, we shall explore whether we can find an eigenbasis which is also orthonormal. First of all, we shall define what is meant by an orthogonal matrix.

Proof.

Assume that \(P^{-1}=P^T\text{.}\) Then \(P^TP=PP^T=I\text{.}\) Let the columns of \(P\) be \(p_1, p_2,\ldots, p_n\text{.}\) Since \(P\) is invertible, \(\{p_1,\ldots, p_n\}\) is linearly independent. It is easy to see that the \(ij\)-th entry of \(P^TP\) is \(p_i\cdot p_j\text{.}\) Hence we have \(p_i\cdot p_j=\delta_{ij}\text{,}\) which is 1 if \(i=j\) and 0 otherwise. This proves that the columns of \(P\) are orthonormal; similarly, \(PP^T=I\) shows that the rows of \(P\) are orthonormal. The converse is proved by reversing these steps.

Definition 6.4.2.

A square matrix \(P\) is called an orthogonal matrix if it satisfies any one (and hence all) of the conditions of Theorem 6.4.1.

Example 6.4.3.

  1. The matrix \(\begin{pmatrix}\cos \theta \amp -\sin\theta\\\sin\theta \amp \cos\theta \end{pmatrix}\) is an orthogonal matrix.
  2. \(\left(\begin{array}{rrr} -\frac{1}{3} \, \sqrt{3} \amp \sqrt{\frac{2}{3}} \amp 0 \\ \frac{1}{3} \, \sqrt{3} \amp \frac{1}{2} \, \sqrt{\frac{2}{3}} \amp -\sqrt{\frac{1}{2}} \\ \frac{1}{3} \, \sqrt{3} \amp \frac{1}{2} \, \sqrt{\frac{2}{3}} \amp \sqrt{\frac{1}{2}} \end{array} \right)\) is an orthogonal matrix.
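We can quickly verify in Sage (a sketch) that these matrices are orthogonal by checking that \(PP^T=I\text{:}\)

theta = var('theta')
P1 = matrix([[cos(theta), -sin(theta)], [sin(theta), cos(theta)]])
print((P1*P1.transpose()).simplify_full())   # the 2x2 identity matrix
P2 = matrix([[-sqrt(3)/3, sqrt(2/3), 0],
             [ sqrt(3)/3, sqrt(2/3)/2, -sqrt(1/2)],
             [ sqrt(3)/3, sqrt(2/3)/2,  sqrt(1/2)]])
print((P2*P2.transpose()).simplify_full())   # the 3x3 identity matrix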

Definition 6.4.4.

An \(n\times n\) matrix \(A\) is called orthogonally diagonalizable if there exists an orthogonal matrix \(P\) such that \(P^{-1}AP=P^TAP\) is a diagonal matrix.
It is easy to see that if \(P\) and \(Q\) are orthogonal matrices then \(PQ\) is also orthogonal. (Why?)

Definition 6.4.5.

Two \(n\times n\) matrices \(A\) and \(B\) are called orthogonally similar if there exists an orthogonal matrix \(P\) such that \(B =P^{-1}AP=P^TAP\text{.}\)
Thus an orthogonally diagonalizable matrix is orthogonally similar to a diagonal matrix.
Suppose a matrix \(A\) is orthogonally diagonalizable. That is, \(P^TAP=D\text{,}\) a diagonal matrix. This means \(A=PDP^T\text{.}\) Hence
\begin{equation*} A^T=(PDP^T)^T=PD^TP^T=PDP^T=A. \end{equation*}
Thus if \(A\) is orthogonally diagonalizable then \(A\) must be symmetric.

Proof.

Since \(A\) is symmetric, we have
\begin{equation*} (\lambda_1 v_1)\cdot v_2 = (Av_1) \cdot v_2= {(Av_1)}^Tv_2=v_1^T A^Tv_2=v_1^TAv_2=v_1^T(\lambda_2 v_2)=\lambda_2(v_1\cdot v_2). \end{equation*}
This implies \((\lambda_1-\lambda_2)(v_1\cdot v_2)=0\text{.}\) Since \(\lambda_1\neq \lambda_2\text{,}\) we have \(v_1\cdot v_2=0\text{.}\)
The following theorem shows that every real symmetric matrix is orthogonally diagonalizable.

Proof.

\((1\implies 2)\)
Let \(v_1,\ldots, v_n\) be orthonormal eigenvectors of \(A\) such that \(Av_i=\lambda_i v_i\text{.}\) Then \(P=\begin{bmatrix} v_1\amp v_2\amp \cdots \amp v_n\end{bmatrix}\) is orthogonal. Hence
\begin{equation*} P^TAP={\rm diag}(\lambda_1,\ldots,\lambda_n)=D. \end{equation*}
Hence \(A\) is orthogonally diagonalizable.
\((2\implies 1)\)
Suppose there exists an orthogonal matrix \(P\) such that \(P^{-1}AP=D\text{.}\) Then \(AP=PD\text{.}\) Let \(D={\rm diag}\{\lambda_1,\ldots,\lambda_n\}\) and let \(v_1,\ldots, v_n\) be the columns of \(P\text{.}\) Then \(\beta=\{v_1,\ldots, v_n\}\) is an orthonormal basis of \(\R^n\text{.}\) Also \(AP=PD\) implies \(Av_i=\lambda_i v_i\text{.}\) Hence \(\beta\) is an orthonormal eigenbasis of \(A\text{.}\)
\((2\implies 3)\)
If \(A\) is orthogonally diagonalizable with \(P^TAP=D\) then
\begin{equation*} A^T={(PDP^T)}^T=PDP^T=A. \end{equation*}
Hence \(A\) is symmetric.
\((3\implies 2)\)
We prove this result using induction on \(n\text{.}\) For \(n=1\text{,}\) let \(A=[\alpha]\text{.}\) Then \(\{1\}\) is an orthonormal basis of \(\R\) consisting of an eigenvector of \(A\text{.}\)
Assume that the result is true for \(n-1\text{.}\) That is if \(A\) is an \((n-1)\times (n-1)\) real symmetric matrix then it is orthogonally diagonalizable.
Let us prove the result for \(n\text{.}\) Let \(A\) be an \(n\times n\) real symmetric matrix. By the fundamental theorem of algebra, we know that every real polynomial has a root in \(\mathbb{C}\text{.}\) Hence the characteristic polynomial of \(A\) has a complex root. By Theorem 5.3.4, all eigenvalues of \(A\) are real. Thus \(A\) has a real eigenvalue, say, \(\lambda\text{.}\) Let \(u\) be a unit eigenvector corresponding to the eigenvalue \(\lambda\) and \(W=\R u\text{.}\) Then \(W\) is a one dimensional subspace of \(\R^n\text{.}\) Hence \(W^\perp\) is an \((n-1)\)-dimensional subspace of \(\R^n\text{.}\) Also \(W\) is \(A\)-invariant. Hence by Checkpoint 6.3.12, \(W^\perp\) is \(A\)-invariant. Also \(\R^n=W\oplus W^\perp\text{.}\)
Let \(\beta = \{u,v_1,\ldots,v_{n-1}\}\) be an orthonormal basis of \(\R^n\) obtained by extending \(\{u\}\text{,}\) so that \(\{v_1,\ldots,v_{n-1}\}\) is an orthonormal basis of \(W^\perp\text{.}\) Let \(P=[u~v_1~\cdots~v_{n-1}]\text{,}\) the orthogonal matrix whose columns are the vectors \(u,v_1,\ldots,v_{n-1}\text{.}\) Then the matrix \(M\) of \(A\) with respect to \(\beta\) is \(P^TAP\text{,}\) which is of the form
\begin{equation*} M=\left[ \begin{array}{c|c} \lambda \amp 0 \\ \hline 0 \amp C \end{array}\right]\text{,} \end{equation*}
where \(C\) is an \((n-1)\times (n-1)\) real symmetric matrix. (Why?) Hence by the induction hypothesis, there exists an \((n-1)\times (n-1)\) orthogonal matrix \(Q\) such that \(Q^TCQ=D\text{,}\) a diagonal matrix. Hence
\begin{equation*} P^TAP = M = \left[ \begin{array}{c|c} 1 \amp 0 \\ \hline 0 \amp Q \end{array}\right] \left[ \begin{array}{c|c} \lambda \amp 0 \\ \hline 0 \amp D \end{array}\right] \left[ \begin{array}{c|c} 1 \amp 0 \\ \hline 0 \amp Q^T \end{array}\right]. \end{equation*}
This implies
\begin{equation*} A = P\left[ \begin{array}{c|c} 1 \amp 0 \\ \hline 0 \amp Q \end{array}\right] \left[ \begin{array}{c|c} \lambda \amp 0 \\ \hline 0 \amp D \end{array}\right] \left[ \begin{array}{c|c} 1 \amp 0 \\ \hline 0 \amp Q^T \end{array}\right]P^T. \end{equation*}
Define \(P_1=P\left[ \begin{array}{c|c} 1 \amp 0 \\ \hline 0 \amp Q \end{array}\right]\text{.}\) Then \(P_1\) is an orthogonal matrix and
\begin{equation*} A=P_1 \left[ \begin{array}{c|c} \lambda \amp 0 \\ \hline 0 \amp D \end{array}\right] P_1^T. \end{equation*}
The above theorem is called the spectral theorem for real symmetric matrices.

Example 6.4.9.

Consider a matrix \(A=\left(\begin{array}{rrr} 5 \amp -2 \amp -4 \\ -2 \amp 8 \amp -2 \\ -4 \amp -2 \amp 5 \end{array} \right)\text{.}\) Clearly \(A\) is symmetric and hence it is orthogonally diagonalizable. The characteristic polynomial of \(A\) is
\begin{equation*} \det{(xI-A)}=x^3 - 18x^2 + 81x=x(x-9)^2\text{.} \end{equation*}
Hence \(0, 9, 9\) are the eigenvalues of \(A\text{.}\) It is easy to find that \(v_1=(1, 1/2, 1)\) is an eigenvector corresponding to the eigenvalue 0, and \(v_2=(1, 0, -1), v_3=(0, 1, -1/2)\) are eigenvectors corresponding to the eigenvalue 9. Hence \(P:=\left(\begin{array}{rrr} 1 \amp 1 \amp 0 \\ \frac{1}{2} \amp 0 \amp 1 \\ 1 \amp -1 \amp -\frac{1}{2} \end{array} \right)\text{.}\) Then
\begin{equation*} P^{-1}AP=\left(\begin{array}{rrr} 0 \amp 0 \amp 0 \\ 0 \amp 9 \amp 0 \\ 0 \amp 0 \amp 9 \end{array} \right) \end{equation*}
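Note that \(v_2\) and \(v_3\) above are not orthogonal to each other, so this \(P\) is not an orthogonal matrix. The following Sage sketch shows one way to obtain an orthogonal diagonalizing matrix, by applying Gram–Schmidt to the eigenvectors and normalizing:

A = matrix(QQ, [[5, -2, -4], [-2, 8, -2], [-4, -2, 5]])
print(A.charpoly())                    # x^3 - 18*x^2 + 81*x
print(A.eigenvectors_right())          # (eigenvalue, eigenvectors, multiplicity)
V = matrix(QQ, [[1, 1/2, 1], [1, 0, -1], [0, 1, -1/2]])   # rows are v1, v2, v3
G, _ = V.gram_schmidt()                # rows of G are orthogonal
P = matrix(SR, [r / r.norm() for r in G.rows()]).transpose()  # orthonormal columns
print((P.transpose()*P).simplify_full())     # the identity, so P is orthogonal
print((P.transpose()*A*P).simplify_full())   # diag(0, 9, 9)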

Problem 6.4.10.

For the following matrices find an orthogonal matrix \(P\) such that \(P^{-1}AP\) is a diagonal matrix.
\begin{equation*} \begin{pmatrix}2 \amp -1 \\-1 \amp 1 \end{pmatrix} , \begin{pmatrix}1 \amp 0 \amp -1\\0 \amp 1 \amp 2\\-1 \amp 2 \amp 5 \end{pmatrix} \end{equation*}

Proof.

Distance Preserving maps in \(\R^n.\)
Suppose \(f\colon \R^n \to \R^n\) is a map that preserves distance, that is, \(\norm{f(x)-f(y)}=\norm{x-y}\) for all \(x,y\in \R^n\text{.}\) We would like to study such maps. Let us first look at a special case when \(f\) fixes the origin.

Proof.

From (1) and (2) we have
\begin{equation*} \norm{f(x)}=\norm{f(x)-f(0)}=\norm{x-0}=\norm{x}, \end{equation*}
for all \(x\in \R^n\text{.}\) Also, since \(f\) preserves distances, we have
\begin{equation*} \norm{f(x)-f(y)}^2=\norm{x-y}^2. \end{equation*}
Expanding both sides and using \(\norm{f(x)}=\norm{x}\) and \(\norm{f(y)}=\norm{y}\text{,}\) we get
\begin{equation*} f(x)\cdot f(y) = x\cdot y \end{equation*}
for all \(x,y\text{.}\) That is, \(f\) preserves the dot product. This implies that \(f\) maps an orthonormal basis of \(\R^n\) to an orthonormal basis of \(\R^n\text{.}\) In particular, \(\{f(e_i)\}\) is an orthonormal basis of \(\R^n\text{,}\) where \(\{e_i\}\) is the standard basis. Hence
\begin{equation*} f(x)=f\left(\sum x_i e_i\right)=\sum \left(f(x)\cdot f(e_i)\right) f(e_i)=\sum (x\cdot e_i)\, f(e_i)=\sum x_i f(e_i). \end{equation*}
This shows that \(f\) is a linear map. (why?)
Now using the above Lemma 6.4.12, we can identify all distance preserving maps on \(\R^n\text{,}\) which is the content of the next theorem.

Proof.

Let \(x_0:=f(0)\) and \(g(x)=f(x)-x_0\text{.}\) Then it is easy to check that \(g(0)=0\) and \(\norm{g(x)-g(y)}=\norm{x-y}\) for all \(x,y\text{.}\) Hence by Lemma 6.4.12, \(g\) is linear. By Theorem 6.4.11, \(g(x)=Ax\) for some orthogonal linear transformation \(A\text{.}\) Hence \(f(x)=Ax+x_0.\)

Definition 6.4.14.

Two \(n\times n\) matrices \(A\) and \(B\) are called simultaneously diagonalizable if there exists a nonsingular matrix \(P\) such that \(P^{-1}AP\) and \(P^{-1}BP\) are diagonal.

Checkpoint 6.4.15.

If \(A\) and \(B\) are simultaneously diagonalizable, then they commute.
Consider two matrices
\begin{equation*} A=\begin{pmatrix} 1 \amp 0 \amp 0\\ 0 \amp 1 \amp 0\\ 0 \amp 0 \amp 2 \end{pmatrix}, \qquad B=\begin{pmatrix} 1 \amp 1 \amp 0\\ 0 \amp 1 \amp 0\\ 0 \amp 0 \amp 2 \end{pmatrix}. \end{equation*}
It is easy to check that \(AB=BA\text{.}\) Also \(A\) is diagonalizable, however, \(B\) is not diagonalizable. In addition, let us assume that \(A\) and \(B\) are symmetric matrices and commute. Can we say that they are simultaneosuly diagonalizable?

Proof.

Since \(A\) is symmetric, it is orthogonally diagonalizable. Let \(Q^TAQ=D\text{,}\) where \(D={\rm diag}(\lambda_1,\ldots, \lambda_n)\) and \(Q=[v_1,\ldots,v_n]\text{.}\) That is, \(Av_i=\lambda_i v_i\) and \(v_i^Tv_j=\delta_{ij}\text{.}\)
Suppose \(v\) is an eigenvector of \(A\) and \(Av=\lambda v\text{.}\) Then
\begin{equation*} A(Bv)=B(Av)=B(\lambda v)=\lambda Bv. \end{equation*}
Hence \(Bv\) also lies in the eigenspace \(E_\lambda\) of \(A\text{.}\) This implies that \(E_\lambda\) is \(B\)-invariant. Also \({B_{\mid}}_{E_\lambda}\text{,}\) the restriction of \(B\) to \(E_\lambda\text{,}\) is symmetric, and hence \(E_\lambda\) has an orthonormal basis consisting of eigenvectors of \({B_{\mid}}_{E_\lambda}\text{.}\) Thus we can construct such an orthonormal eigenbasis on each of the distinct eigenspaces of \(A\text{.}\) Since different eigenspaces are mutually orthogonal, by taking the union of all these bases we get an orthonormal eigenbasis of \(\R^n\) for \(B\text{.}\) It is easy to see that this is a common eigenbasis of \(A\) and \(B\text{.}\)
Steps to simultaneously diagonalize symmetric commuting matrices
Let \(A\) and \(B\) be two \(n\times n\) symmetric matrices such that \(AB=BA\text{.}\)
1. Since \(A\) is symmetric, it admits an orthogonal diagonalization:
\begin{equation*} A = Q \Lambda Q^T, \end{equation*}
where \(\Lambda\) is diagonal and the columns of \(Q\) form an orthonormal basis of eigenvectors of \(A\text{.}\)
2. Consider the eigenvalue multiplicities of \(A\text{:}\)
  • If all eigenvalues of \(A\) are distinct, then each eigenspace is one-dimensional. Because \(AB=BA\text{,}\) the eigenspaces of \(A\) are invariant under \(B\text{.}\) Hence \(B\) must already be diagonal in this basis.
  • If some eigenvalues of \(A\) are repeated, then the corresponding eigenspace \(E_\lambda\) has dimension greater than one. In this case, \(B\) preserves \(E_\lambda\) and the restriction \(B|_{E_\lambda}\) is symmetric, so it can be orthogonally diagonalized within \(E_\lambda\text{.}\)
3. Replace, in each repeated eigenspace \(E_\lambda\text{,}\) the basis vectors of \(A\) by the orthonormal eigenvectors of \(B|_{E_\lambda}\text{.}\) This yields a common orthonormal eigenbasis for both \(A\) and \(B\text{.}\)
4. Let \(P\) be the orthogonal matrix formed from these common eigenvectors as columns. Then
\begin{equation*} P^T A P = D_A, \qquad P^T B P = D_B, \end{equation*}
where \(D_A\) and \(D_B\) are diagonal matrices.
Thus, \(A\) and \(B\) are simultaneously orthogonally diagonalizable.
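The following Sage sketch carries out these steps for a pair of small commuting symmetric matrices. The matrices are illustrative, and the sketch assumes all the eigenvalues involved are rational.

A = matrix(QQ, [[2, 0, 0], [0, 2, 0], [0, 0, 5]])
B = matrix(QQ, [[1, 1, 0], [1, 1, 0], [0, 0, 4]])
assert A.is_symmetric() and B.is_symmetric() and A*B == B*A
cols = []
for eigval, basis, mult in A.eigenvectors_right():
    E = matrix(QQ, basis)             # rows form a basis of the eigenspace E_eigval of A
    # matrix of B restricted to E_eigval with respect to the rows of E:
    M = (E*E.transpose()).inverse() * E * B * E.transpose()
    for mu, wbasis, m in M.eigenvectors_right():
        W = matrix(QQ, wbasis) * E    # eigenvectors of B lying inside E_eigval (as rows)
        GW, _ = W.gram_schmidt()      # make them orthogonal within this block
        cols.extend(GW.rows())
P = matrix(SR, [v / v.norm() for v in cols]).transpose()   # common orthonormal eigenbasis
print((P.transpose()*A*P).simplify_full())                 # diagonal
print((P.transpose()*B*P).simplify_full())                 # diagonal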

Proof.

Existence:
Since \(A\) is a real symmetric matrix, it is orthogonally diagonalizable. Let
\begin{equation*} A = QDQ^T, \end{equation*}
with \(D= \mathrm{diag}(\lambda_1,\dots,\lambda_n)\text{.}\) Since \(A\) is semi-positive definite, all its eigenvalues are non-negative. That is, \(\lambda_i \geq 0\) for all \(i\text{.}\) Define \(C: = \mathrm{diag}(\sqrt{\lambda_1},\dots,\sqrt{\lambda_n})\text{,}\) where we take the non-negative square roots of \(\lambda_i\text{.}\) Now define
\begin{equation*} B: = Q C Q^T. \end{equation*}
Then
\begin{equation*} B^2 = (Q C Q^T)(Q C Q^T)=QC^2Q^T=QDQ^T=A. \end{equation*}
Since \(C\) is symmetric and semi-positive definite, \(B\) is symmetric and semi-positive definite. This proves the existence.
Uniqueness:
Suppose there exist symmetric semi-positive definite matrices \(B\) and \(B_1\) such that \(B^2=A=B_1^2\text{.}\) Since \(BA=B^3=AB\text{,}\) \(B\) commutes with \(A\text{,}\) and hence \(B\) leaves every eigenspace \(E_\lambda\) of \(A\) invariant. The restriction of \(B\) to \(E_\lambda\) is symmetric, semi-positive definite, and its square is \(\lambda I\text{;}\) therefore this restriction must equal \(\sqrt{\lambda}\,I\text{.}\) The same argument applies to \(B_1\text{.}\) Thus \(B\) and \(B_1\) agree on every eigenspace of \(A\text{,}\) and since \(\R^n\) is the direct sum of these eigenspaces, \(B=B_1\text{.}\)
Proposition 6.4.17 gives a way to construct the square root of a real symmetric semi-positive definite matrix.
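Here is a Sage sketch of this construction on a small illustrative matrix (chosen so that the eigenvalues are rational):

A = matrix(QQ, [[2, 1], [1, 2]])
D, Q = A.eigenmatrix_right()            # A*Q == Q*D, columns of Q are eigenvectors
# the eigenvalues here are distinct, so the eigenvectors are already orthogonal
Q = matrix(SR, [c / c.norm() for c in Q.columns()]).transpose()   # orthonormal columns
C = diagonal_matrix([sqrt(d) for d in D.diagonal()])              # non-negative square roots
B = Q * C * Q.transpose()
print((B*B).simplify_full())            # recovers A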

Subsection 6.4.1 Applications of Affine Linear Maps

Let us look at some applications of affine linear transformations to fractals.

Example 6.4.19. Koch Curve.

The Koch curve is a classic fractal that can be described using the language of affine linear transformations.
The construction of the Koch curve begins with a single line segment from \((0,0)\) to \((1,0)\text{,}\) called the initiator.
Next we remove the middle third of the line and replace it with two lines that each have the same length (1/3 of the original) as the remaining lines on each side. This new form is called the generator, because it specifies a rule that is used to generate a new form. Note that the length of each segment is 1/3. See Figure 6.4.21.
Figure 6.4.20. Initiator
Figure 6.4.21. Generator
Next we repeat the above steps for each of the four segments in the generator. Then we get the curve as in Figure 6.4.22. Each segment now has length \(1/9\text{,}\) and the total length of the curve is \(16/9\text{.}\) If we apply the generator once again, we get the curve as in Figure 6.4.23. Each segment now has length \(1/27\text{,}\) and the total length of the curve is \(64/27\text{.}\)
Figure 6.4.22. After 2 iterations
Figure 6.4.23. After 3 iterators
If we keep applying this process, we get what is called the Koch curve (named after the mathematician Helge von Koch, who described it in 1904). After 7 iterations we get the curve as in Figure 6.4.24.
Figure 6.4.24. Koch Curve with 7 iterations.
Now let us construct the Koch curve as an application of affine linear maps. The construction begins with a single line segment from \((0,0)\) to \((1,0)\text{.}\) At each step, this segment is replaced by four smaller segments:
  1. The first third (straight, scaled by \(1/3\)).
  2. The second third (scaled by \(1/3\) and rotated by \(+60^\circ\)).
  3. The third third (scaled by \(1/3\) and rotated by \(-60^\circ\)).
  4. The last third (straight, shifted).
Each of these pieces is obtained from the original segment by applying one of four affine linear maps.
\begin{align*} T_1(z) \amp= \tfrac{1}{3}z, \\ T_2(z) \amp= \tfrac{1}{3} e^{i\pi/3} z + \tfrac{1}{3}, \\ T_3(z) \amp= \tfrac{1}{3} e^{-i\pi/3} z + \tfrac{1}{2}+\tfrac{\sqrt{3}}{6}i, \\ T_4(z) \amp= \tfrac{1}{3}z + \tfrac{2}{3}. \end{align*}
Written in real coordinates, these are of the form \(T_j(x) = A_j x + b_j\) with
  • \(A_1 = \tfrac{1}{3}I, \quad b_1 = (0,0)\text{,}\)
  • \(A_2 = \tfrac{1}{3}R_{60}, \quad b_2 = (1/3,0)\text{,}\)
  • \(A_3 = \tfrac{1}{3}R_{-60}, \quad b_3 = (1/2,\sqrt{3}/6)\text{,}\)
  • \(A_4 = \tfrac{1}{3}I, \quad b_4 = (2/3,0)\text{,}\)
where \(R_{\pm 60}\) denotes the \(2\times 2\) rotation matrix for \(\pm 60^\circ\text{.}\)
We demonstrate this in Sage.
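One possible way to do this (a sketch) is to apply the four maps above repeatedly to a list of segments, starting from the initiator:

def koch_segments(n):
    R = lambda t: matrix(RDF, [[cos(t), -sin(t)], [sin(t), cos(t)]])
    A = [identity_matrix(RDF, 2)/3, R(pi/3)/3, R(-pi/3)/3, identity_matrix(RDF, 2)/3]
    b = [vector(RDF, [0, 0]), vector(RDF, [1/3, 0]),
         vector(RDF, [1/2, sqrt(3)/6]), vector(RDF, [2/3, 0])]
    segs = [(vector(RDF, [0, 0]), vector(RDF, [1, 0]))]      # the initiator
    for _ in range(n):
        segs = [(A[j]*p + b[j], A[j]*q + b[j]) for p, q in segs for j in range(4)]
    return segs

picture = sum(line([p, q]) for p, q in koch_segments(4))     # curve after 4 iterations
picture.show(aspect_ratio=1, axes=False)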

Example 6.4.25. Sierpiński Triangle.

The Sierpinski triangle (named after the Polish mathematician Waclaw Sierpinski), also called the Sierpinski gasket, is a self-similar fractal subset \(S\) of the plane. It can be obtained by an iterative geometric construction starting from a filled equilateral triangle and applying an iterated function system (IFS) consisting of affine maps.
The construction begins with a filled equilateral triangle (stage 0). At each stage we subdivide and remove parts according to the following rules:
  • Stage 0: Start with a solid equilateral triangle of side length 1. See Figure 6.4.26.
  • Stage 1: Subdivide the triangle into four smaller equilateral triangles of side length 1/2 and remove the central one. See Figure 6.4.27.
  • Stage \(n+1\text{:}\) For each filled triangle from stage \(n\text{,}\) repeat the same process: divide into four, remove the central one. See Figure 6.4.28 and Figure 6.4.29 for two and three iterations.
Continuing indefinitely, the limit of this process is the Sierpiński triangle. See Figure 6.4.30 after 8 iterations.
Figure 6.4.26. Original Triangle
Figure 6.4.27. After One Iteration
Figure 6.4.28. After Two Iterations
Figure 6.4.29. After Three Iterations
Figure 6.4.30. Sierpinski Triangle after 8 iterations.
The Sierpiński triangle arises from three specific affine maps:
\begin{align*} T_1(x) \amp = \tfrac{1}{2} x, \\ T_2(x) \amp= \tfrac{1}{2} x + (1/2, 0), \\ T_3(x) \amp= \tfrac{1}{2} x + (1/4, \tfrac{\sqrt{3}}{4}) \end{align*}
Each of these maps scales the plane by a factor of \(1/2\) and then translates:
  • \(T_1\) shrinks towards the origin.
  • \(T_2\) shrinks and shifts right to cover the bottom-right subtriangle.
  • \(T_3\) shrinks and shifts upward to cover the top subtriangle.
If \(S\) denotes the Sierpiński triangle, then it satisfies the fundamental iterated function system equation:
\begin{equation*} S = T_1(S) \cup T_2(S) \cup T_3(S). \end{equation*}
Now let us see how we can make the Sierpiński triangle in Sage.
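A sketch of one way to do this is to apply the three maps above repeatedly to a filled triangle:

def sierpinski(triangles, n):
    maps = [lambda p: p/2,
            lambda p: p/2 + vector(RDF, [1/2, 0]),
            lambda p: p/2 + vector(RDF, [1/4, sqrt(3)/4])]
    for _ in range(n):
        triangles = [[T(v) for v in tri] for tri in triangles for T in maps]
    return triangles

start = [[vector(RDF, [0, 0]), vector(RDF, [1, 0]), vector(RDF, [1/2, sqrt(3)/2])]]
picture = sum(polygon(tri) for tri in sierpinski(start, 5))   # 5 iterations
picture.show(aspect_ratio=1, axes=False)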

Example 6.4.31. Sierpinski Carpet.

The Sierpinski carpet is the planar fractal obtained by repeatedly removing the open central square from a subdivided unit square. Equivalently, it is the unique nonempty compact set \(C\) satisfying an iterated-function system (IFS) of eight contractive affine maps, each scaling by \(1/3\text{.}\)
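The following Sage sketch draws an approximation of the carpet: the eight maps scale by \(1/3\) and translate to the eight outer subsquares of the unit square (all except the central one).

offsets = [(i/3, j/3) for i in range(3) for j in range(3) if (i, j) != (1, 1)]

def carpet(squares, n):
    for _ in range(n):
        squares = [[v/3 + vector(RDF, t) for v in sq] for sq in squares for t in offsets]
    return squares

unit_square = [[vector(RDF, [0, 0]), vector(RDF, [1, 0]),
                vector(RDF, [1, 1]), vector(RDF, [0, 1])]]
picture = sum(polygon(sq) for sq in carpet(unit_square, 3))   # 3 iterations
picture.show(aspect_ratio=1, axes=False)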

Example 6.4.32. Sierpinski Pyramid.

The Sierpinski pyramid (also called the Sierpinski tetra-pyramid when based on a triangle, or the Sierpiński square pyramid when based on a square) is a three-dimensional fractal obtained by repeatedly subdividing a pyramid into smaller self-similar pyramids. It provides a natural extension of the ideas behind the Sierpiński triangle and Sierpiński carpet to three dimensions.
The construction can be described as an application of affine linear maps. Starting from an initial pyramid \(P_0\text{,}\) we apply scaling by a factor of \(\tfrac{1}{2}\) followed by translations to position the smaller pyramids. In the square-based case, four pyramids are placed at the corners of the base, and one is placed on the top near the apex. This gives a total of five affine maps:
\begin{equation*} P = T_1(P) \cup T_2(P) \cup T_3(P) \cup T_4(P) \cup T_5(P), \end{equation*}
where each \(T_j\) is of the form \(T_j(x) = A x + b_j\text{,}\) with \(A\) being the scaling matrix and \(b_j\) the translation vector.

Subsection 6.4.2 Quadratic Forms and Conic Sections

In this subsection, we give an application of orthogonal diagonalizability to conic sections.
A general second-degree equation in two variables is given by
\begin{equation*} Q(x,y) = ax^2 + 2bxy + cy^2 + dx + ey + f = 0, \end{equation*}
where \(a,b,c,d,e,f \in \mathbb{R}\text{.}\)
This equation can be written compactly in matrix notation as
\begin{equation*} Q(x,y) = \begin{bmatrix}x \amp y\end{bmatrix} \begin{bmatrix} a \amp b \\ b \amp c \end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix}+ \begin{bmatrix} d \amp e \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + f = 0. \end{equation*}
Here,
\begin{equation*} A = \begin{bmatrix} a \amp b \\ b \amp c \end{bmatrix} \end{equation*}
is the symmetric matrix associated with the quadratic part \(ax^2+2bxy+cy^2\text{.}\)
Since \(A\) is symmetric, it is orthogonally diagonalizable. That is, there exists an orthogonal matrix \(P\) such that
\begin{equation*} P^TAP = D = \begin{pmatrix} \lambda_1 \amp 0\\ 0 \amp \lambda_2\end{pmatrix}, \end{equation*}
where \(\lambda_1,\lambda_2\) are the eigenvalues of \(A\) and the columns of \(P\) form an orthonormal basis of corresponding eigenvectors.
With the change of variables
\begin{equation*} \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} u \\ v \end{bmatrix}, \end{equation*}
the quadratic form simplifies to
\begin{equation*} Q(u,v) = \lambda_1 u^2 + \lambda_2 v^2 + \alpha u+\beta v + f = 0. \end{equation*}
Note that here
\begin{equation*} \begin{bmatrix}\alpha\amp\beta\end{bmatrix} =\begin{bmatrix}d\amp e\end{bmatrix}P. \end{equation*}
Thus, the cross term \(2bxy\) is eliminated by using the orthogonal linear transformation \(\begin{bmatrix} u \\ v \end{bmatrix}=P^T\begin{bmatrix} x \\ y \end{bmatrix}\text{,}\) and the conic aligns with its principal axes, that is, along the eigenvector directions.
Now we have various cases. If we assume that \(\lambda_1\) and \(\lambda_2\) are positive, then we can complete the square and we get
\begin{equation*} Q(u,v) = \lambda_1\left(u+\frac{\alpha}{2\lambda_1}\right)^2+ \lambda_2\left(v+\frac{\beta}{2\lambda_2}\right)^2-g, \end{equation*}
for some real number \(g\text{.}\) What is \(g\text{?}\) It is \(\left(\frac{\alpha^2}{4\lambda_1} + \frac{\beta^2}{4\lambda_2} - f\right)\text{.}\)
The center (the new origin) of this conic in \(uv\)-coordinates is
\begin{equation*} \begin{bmatrix} u_0\\v_0\end{bmatrix}= \begin{bmatrix}-\frac{\alpha}{2\lambda_1}\\-\frac{\beta}{2\lambda_2}\end{bmatrix}. \end{equation*}
Hence the center in terms of \(xy\)-coordinates is
\begin{equation*} \begin{bmatrix} x_0\\y_0\end{bmatrix}= P\begin{bmatrix}-\frac{\alpha}{2\lambda_1}\\-\frac{\beta}{2\lambda_2}\end{bmatrix}. \end{equation*}
Thus we have converted the original quadratic \(Q(x,y)\) to
\begin{equation*} Q(\tilde{x},\tilde{y})=\lambda_1\tilde{x}^2+\lambda_2\tilde{y}^2-g, \end{equation*}
and the equation \(Q=0\) becomes
\begin{equation*} \frac{\tilde{x}^2}{g/\lambda_1}+\frac{\tilde{y}^2}{g/\lambda_2}=1, \end{equation*}
which is an ellipse (assuming \(g>0\)). Here, we have
\begin{equation*} \begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix} = \begin{pmatrix} u+\frac{\alpha}{2\lambda_1}\\v+\frac{\beta}{2\lambda_2}\end{pmatrix} =\begin{pmatrix} u\\v\end{pmatrix}+\begin{pmatrix} \frac{\alpha}{2\lambda_1}\\\frac{\beta}{2\lambda_2}\end{pmatrix}= P^{-1}\begin{pmatrix}x\\y \end{pmatrix}+\begin{pmatrix} \frac{\alpha}{2\lambda_1}\\\frac{\beta}{2\lambda_2}\end{pmatrix}. \end{equation*}
The transformation \(P^{-1}\begin{pmatrix}x\\y \end{pmatrix}+\begin{pmatrix} \frac{\alpha}{2\lambda_1}\\\frac{\beta}{2\lambda_2}\end{pmatrix}\) is called an affine linear transformation. Here \(P^{-1}\) is an orthogonal linear map. Thus an affine linear transformation on \(\R^n\) is a map of the form \(T(x)=Ax+v\text{,}\) where \(A\) is a linear transformation and \(v\) is called a translation vector. When \(A\) is orthogonal, as it is here, such maps are also called isometries.
In case \(\lambda_1\) and \(\lambda_2\) are both negative, we can multiply the whole equation by \(-1\) and we get a similar expression, except that the right hand side changes its sign.
In case one of the eigenvalues is negative, say \(\lambda_2<0\text{,}\) then the conic transforms to
\begin{equation*} Q(\tilde{x},\tilde{y})=\lambda_1\tilde{x}^2-|\lambda_2|\tilde{y}^2-g, \end{equation*}
which is a hyperbola.
In case one of the eigenvalues is zero, say \(\lambda_2=0\text{,}\) then the conic transforms to
\begin{equation*} Q(\tilde{x},\tilde{y})=\lambda_1\tilde{x}^2+\beta \tilde{y} -g, \end{equation*}
which is a parabola (provided \(\beta\neq 0\)). Here \(\tilde{y}=v\) and \(g=\alpha^2/(4\lambda_1)-f\text{.}\)
Classification of Conics in two variables
Based on the above discussions, the classification of the above conic section depends on the eigenvalues of \(A\text{.}\)
  • Ellipse: If both eigenvalues \(\lambda_1, \lambda_2\) have the same sign, then the quadratic is an ellipse of the form \(x^2/a^2+y^2/b^2=1\text{.}\)
  • Circle: When \(\lambda_1 = \lambda_2\text{,}\) the quadratic is a circle.
  • Hyperbola: If eigenvalues have opposite signs, then the quadratic is a hyperbola of the form
    \begin{equation*} x^2/a^2 -y^2/b^2= 1. \end{equation*}
  • Parabola: If one eigenvalue is zero, then it is a parabola.

Example 6.4.33.

Consider the quadratic \(Q(x,y)=7 \, x^{2} - 6 \, x y + 7 \, y^{2} - 6 \, x + 8 \, y - 56\text{.}\) Let us convert this quadratic into a conic section in canonical form.
Solution.
The associated symmetric matrix of this quadratic \(A\) is given by
\begin{equation*} A = \begin{pmatrix} 7 \amp -3 \\-3 \amp 7\end{pmatrix}. \end{equation*}
It is easy to check that the eigenvalues of \(A\) are \(\lambda_1=10\) and \(\lambda_2=4\) with the corresponding eigenvectors \(v_1 = \left(\frac{1}{2} \, \sqrt{2},\,-\frac{1}{2} \, \sqrt{2}\right)\) and \(v_2=\left(\frac{1}{2} \, \sqrt{2},\,\frac{1}{2} \, \sqrt{2}\right)\text{.}\) Hence we have
\begin{equation*} D= \left(\begin{array}{rr} 10 \amp 0 \\ 0 \amp 4 \end{array}\right), P = \left(\begin{array}{rr} \frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \amp \frac{1}{\sqrt{2}} \end{array}\right). \end{equation*}
The old coordinates in terms of the new \(uv\)-coordinates are
\begin{equation*} \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} u \\ v \end{bmatrix}=\left(\begin{array}{r} \frac{1}{\sqrt{2}} u + \frac{1}{\sqrt{2}} v \\ -\frac{1}{\sqrt{2}} u + \frac{1}{\sqrt{2}} v \end{array}\right). \end{equation*}
Now substituting \(x=\frac{1}{\sqrt{2}} u + \frac{1}{\sqrt{2}} v\) and \(y=-\frac{1}{\sqrt{2}} u + \frac{1}{\sqrt{2}} v \) in the given quadratic, we get
\begin{equation*} Q(u,v)=10 \, u^{2} + 4 \, v^{2} - 7 \, \sqrt{2} u + \sqrt{2} v - 56. \end{equation*}
After completing the squares, we get
\begin{equation*} Q(u,v)=10 \left(u- \frac{7\sqrt{2}}{20}\right)^2+ 4 \left(v+ \frac{\sqrt{2}}{8}\right)^2 - 2343/40. \end{equation*}
This can be written as an equation of an ellipse. Note that here the translation vector is given by
\begin{equation*} \begin{pmatrix}x_0\\y_0\end{pmatrix}=P\begin{pmatrix} \frac{7\sqrt{2}}{20}\\\frac{-\sqrt{2}}{8}\end{pmatrix}= \begin{pmatrix}\frac{9}{40}\\ -\frac{19}{40}\end{pmatrix}. \end{equation*}
Let us explore this in Sage. Here we plot the original quadratic curve along with the transformed coordinates.
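One possible Sage sketch of these computations and the plot (the plotting window is chosen for illustration):

x, y = var('x y')
Q = 7*x^2 - 6*x*y + 7*y^2 - 6*x + 8*y - 56
A = matrix(QQ, [[7, -3], [-3, 7]])
D, P = A.eigenmatrix_right()
P = matrix(SR, [c / c.norm() for c in P.columns()]).transpose()   # orthonormal columns
print(D); print(P)
center = vector([9/40, -19/40])                    # the translation vector computed above
axis1 = line([center, center + P.column(0)], color='red')
axis2 = line([center, center + P.column(1)], color='green')
conic = implicit_plot(Q, (x, -5, 5), (y, -5, 5))
(conic + axis1 + axis2).show(aspect_ratio=1)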

Example 6.4.34.

Consider the quadratic equation \(3x^{2}+4xy + 2 y^{2} - 8 x + 6 y-3=0\text{.}\) We wish to classify this as a conic section.
Let us first plot the graph of this curve in Sage.
The symmetric matrix associated with the quadratic term is given by
\begin{equation*} A = \begin{pmatrix}3 \amp 2 \\ 2 \amp 2 \end{pmatrix}\text{.} \end{equation*}
It is easy to check that the eigenvalues are approximately \(\lambda_1=0.4384471871911698, \lambda_2=4.561552812808830\text{.}\) Since both the eigenvalues are positive, this quadratic is an ellipse. This is what the graph shows.
Now we give all the steps in Sage to plot the curve along with the new coordinate system.
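A sketch of such a sequence of steps in Sage (the plotting ranges are chosen to contain the ellipse):

x, y = var('x y')
Q = 3*x^2 + 4*x*y + 2*y^2 - 8*x + 6*y - 3
A = matrix(QQ, [[3, 2], [2, 2]])
print(A.eigenvalues())                     # both positive, so the conic is an ellipse
print(solve([diff(Q, x) == 0, diff(Q, y) == 0], x, y))    # the center: x = 7, y = -17/2
lam = [(5 - sqrt(17))/2, (5 + sqrt(17))/2]                 # the exact eigenvalues of A
dirs = [vector(RDF, [2, l - 3]).normalized() for l in lam] # eigenvector directions
center = vector(RDF, [7, -17/2])
axes = sum(line([center - 12*d, center + 12*d], color='red') for d in dirs)
(implicit_plot(Q, (x, -6, 20), (y, -21, 4)) + axes).show(aspect_ratio=1)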

Example 6.4.35.

Consider the quadratic equation given by \(-x^2+4xy-y^2-30x+y+20=0\text{.}\) Use Sage to classify this and plot the curve along with the transformed coordinate system.
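One possible starting point (a sketch; the plotting window is chosen for illustration):

x, y = var('x y')
Q = -x^2 + 4*x*y - y^2 - 30*x + y + 20
A = matrix(QQ, [[-1, 2], [2, -1]])
print(A.eigenvalues())      # 1 and -3: opposite signs, so the curve is a hyperbola
implicit_plot(Q, (x, -30, 40), (y, -30, 40)).show(aspect_ratio=1)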

Example 6.4.36.

Consider the quadratic equation \(Q(x,y) = 3 \, x^{2} - 6 \, x y + 3 \, y^{2} - 6 \, x + 8 \, y + 5\) and classify it as a conic section.
Solution.
The matrix associated with the quadratic part of the above equation is \(A = \left(\begin{array}{rr} 3 \amp -3 \\ -3 \amp 3 \end{array}\right)\text{.}\) It is easy to check that the eigenvalues of \(A\) are \(\lambda_1=6, \lambda_2=0\text{.}\) Since one of the eigenvalues is 0, this curve is a parabola. Let us draw this curve along with the transformed origin and the two new coordinate directions in Sage.
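A sketch of how this might be drawn in Sage; the new coordinate directions are the unit eigenvectors of \(A\text{,}\) and the transformed origin (the vertex) can be found by completing the square as in the text.

x, y = var('x y')
Q = 3*x^2 - 6*x*y + 3*y^2 - 6*x + 8*y + 5
A = matrix(QQ, [[3, -3], [-3, 3]])
print(A.eigenvalues())                       # 6 and 0, so the curve is a parabola
d1 = vector(RDF, [1, -1]).normalized()       # eigenvector for the eigenvalue 6
d2 = vector(RDF, [1, 1]).normalized()        # eigenvector for the eigenvalue 0
p0 = vector(RDF, [1/8, -25/24])              # the vertex, obtained by completing the square
axes = line([p0, p0 + 3*d1], color='red') + line([p0, p0 + 3*d2], color='green')
(implicit_plot(Q, (x, -8, 8), (y, -8, 8)) + axes).show(aspect_ratio=1)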

Activity 6.4.1.

For a given quadratic equation \(Q(x,y)=ax^2+2bxy+cy^2+dx+ey+f=0\text{,}\) write down the corresponding canonical conics by describing the new origin \((x_0,y_0)\) and the new coordinate vectors, considering the different cases in a tabular form.

Subsection 6.4.3 Classification of Quadratic Surfaces in Three Variables

The classification of a quadratic equation in three variables can be done in a very similar manner to what we have seen in the case of two variables in Subsection 6.4.2.
A general quadratic equation in three variables is
\begin{equation*} Q(x,y,z) = ax^2 + by^2 + cz^2 + 2dxy + 2eyz + 2fzx + gx + hy + iz + j=0 \end{equation*}
where \(a,b,c,d,e,f,g,h,i,j \in \mathbb{R}\text{.}\)
In matrix form,
\begin{equation*} Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + j, \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad A = \begin{bmatrix} a \amp d \amp f \\ d \amp b \amp e \\ f \amp e \amp c \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} g \\ h \\ i \end{bmatrix}. \end{equation*}
Since \(A\) is symmetric, there exists an orthogonal matrix \(P\) such that
\begin{equation*} P^T A P = D= \operatorname{diag}(\lambda_1,\lambda_2,\lambda_3). \end{equation*}
After an orthogonal change of variables \(\mathbf{x} = P\mathbf{u}\) and a translation to eliminate the linear terms, the quadratic form reduces to the canonical form
\begin{equation*} Q(u,v,w) = \lambda_1 u^2 + \lambda_2 v^2 + \lambda_3 w^2 + \tilde{j} = 0, \end{equation*}
for some constant \(\tilde{j}\text{.}\) (If some \(\lambda_i=0\text{,}\) a linear term in the corresponding variable may remain, as in the paraboloid and cylinder cases below.)
Classification of Quadrics
Depending on the signs of \(\lambda_1, \lambda_2, \lambda_3\text{,}\) we obtain the following surfaces:
  1. Ellipsoid: All eigenvalues positive.
    \begin{equation*} \frac{u^2}{a^2} + \frac{v^2}{b^2} + \frac{w^2}{c^2} = 1, a,b,c > 0\text{.} \end{equation*}
  2. Hyperboloid of One Sheet: Two positive eigenvalues, one negative.
    \begin{equation*} \frac{u^2}{a^2} + \frac{v^2}{b^2} - \frac{w^2}{c^2} = 1. \end{equation*}
  3. Hyperboloid of Two Sheets: One positive eigenvalue, two negative.
    \begin{equation*} -\frac{u^2}{a^2} - \frac{v^2}{b^2} + \frac{w^2}{c^2} = 1. \end{equation*}
  4. Elliptic Cone: Two positive and one negative eigenvalue, with no constant term.
    \begin{equation*} \frac{u^2}{a^2} + \frac{v^2}{b^2} - \frac{w^2}{c^2} = 0. \end{equation*}
  5. Elliptic Paraboloid: (Bowl-shaped surface) Two positive eigenvalues, one zero.
    \begin{equation*} \frac{u^2}{a^2} + \frac{v^2}{b^2} = \frac{w}{c}. \end{equation*}
  6. Hyperbolic Paraboloid: (Saddle Surface) One positive eigenvalue, one negative, one zero.
    \begin{equation*} \frac{u^2}{a^2} - \frac{v^2}{b^2} = \frac{w}{c}. \end{equation*}
  7. Elliptic Cylinder: Two positive eigenvalues, third zero.
    \begin{equation*} \frac{u^2}{a^2} + \frac{v^2}{b^2} = 1. \end{equation*}
  8. Hyperbolic Cylinder: One positive, one negative, third zero.
    \begin{equation*} \frac{u^2}{a^2} - \frac{v^2}{b^2} = 1. \end{equation*}
  9. Parabolic Cylinder: Only one nonzero eigenvalue.
    \begin{equation*} \frac{u^2}{a^2} = \frac{v}{b}. \end{equation*}
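As in the two-variable case, the classification can be read off from the signs of the eigenvalues of \(A\text{.}\) A small Sage sketch (with an illustrative symmetric matrix) that counts the signs:

A = matrix(QQ, [[2, 1, 0], [1, 2, 0], [0, 0, -1]])   # an illustrative symmetric matrix
eigs = A.eigenvalues()
print(eigs)                                          # the eigenvalues are 3, 1 and -1
pos = sum(1 for e in eigs if e > 0)
neg = sum(1 for e in eigs if e < 0)
zero = sum(1 for e in eigs if e == 0)
print(pos, neg, zero)   # 2 positive, 1 negative: a hyperboloid or a cone, depending on the constant term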