The narrative is missing for the above picture. Also, I want to make the picture of the action of $A^\top$. I already made the left-hand side picture:
The right-hand side picture should be simpler to make; it will follow soon.
Place the cursor over the image to start the animation.
Five of the above level surfaces at different levels of opacity.
The reasoning here is the same as in a multivariable calculus course when we figure out the shape of the surface given by the equation \(-x^2-y^2+5z^2 = 0\), or \[ |z| = \frac{1}{\sqrt{5}} \sqrt{x^2+y^2}. \] This cone is obtained by rotating the line \(z = \frac{1}{\sqrt{5}}x, y = 0\) about the \(z\)-axis. Or, in the language of vectors, by rotating the line spanned by the vector \(\sqrt{5} \mathbf{i}+\mathbf{k}\) about the vector \(\mathbf{k}\).
Below is a graph of the cone \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} \] with the unit vectors \(\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\) in blue and the line spanned by the vector \[ \sqrt{5} \mathbf{u}_1 + \mathbf{u}_3 = \sqrt{5} \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right] + \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] in red. It is interesting to identify the blue vector in the picture below which corresponds to \(\mathbf{u}_3\), since the rotation of the red line about this vector creates the cone. Notice that \(\mathbf{u}_3\) is the only blue vector with all three coordinates positive. Also notice that the positive directions of the $x_1$-axis and the $x_2$-axis point away from the observer.
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.
In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
Theorem. All eigenvalues of a symmetric matrix are real.
Background Knowledge. Since we are working with complex numbers, in the background knowledge we will review basic operations with complex numbers and matrices with complex entries.
BK1. Let $n\in\mathbb{N}$. An $n\times n$ matrix $A$ is said to be symmetric if all entries of $A$ are real numbers and $A^\top = A$.
BK2. In this item we state properties of the transpose operation for matrices with complex entries. We are familiar with these properties for matrices with real entries. The same properties hold for matrices with complex entries. Let $k,l,m,n \in\mathbb{N}$. Let $X$ be a $k\times l$ matrix, let $Y$ be an $l\times m$ matrix, and $Z$ be an $m\times n$ matrix.
(a) Then \[ (XY)^\top = Y^\top X^\top \quad \text{and} \quad (XYZ)^\top = Z^\top Y^\top X^\top. \] (b) If $\alpha$ is a complex number, then we have \[ (\alpha X)^\top = \alpha X^\top, \quad (\alpha X) Y = X (\alpha Y) = \alpha (XY). \]
BK3. As you can see from my summary on Complex Numbers, the conjugation is an important operation with complex numbers. Recall, first, the imaginary unit $i$ is the complex number such that $i^2 = -1$. Second, for a complex number \[ z = x + y\mkern 2mu i \quad \text{where} \quad x, y \in \mathbb{R}, \] the conjugate of $z$, denoted by $\overline{z}$, is the complex number \[ \overline{z} = x - y\mkern 2mu i. \] Notice that the complex numbers $z$ and its conjugate $\overline{z}$ are reflections of each other across the real axis, see the first picture in Complex Numbers.
Now, three important features of the conjugate.
(a) product of a complex number and its conjugate \[ z \kern 2mu \overline{z} = (x + y\mkern 2mu i) (x - y\mkern 2mu i ) = x^2 + y^2 = |z|^2 \geq 0 \] is nonnegative. In the last equality, $|z|$ is the modulus of the complex number $z$. By definition \[ |z|=\sqrt{x^2 + y^2}. \] (b) We have $z \kern 1mu \overline{z} = |z|^2 = 0$ if and only if $z = 0$.
(c) We have $z = \overline{z}$ if and only if $z \in\mathbb{R}$, that is if $z$ is a real number.
BK4. The operation of complex conjugation extends to matrices. For a matrix $X$, by $\overline{X}$ we denote the matrix in which all the entries of $X$ are replaced by their conjugates. Let $k,l,m \in\mathbb{N}$, let $X$ be a $k\times l$ matrix, let $Y$ be an $l\times m$ matrix, and let $\alpha$ be a complex number.
(a) The following algebraic rules hold for the operation of conjugation \[ \overline{\alpha \mkern 1.5mu X} = \overline{\alpha} \mkern 2mu \overline{X} \quad \text{and} \quad \overline{X Y} = \overline{X} \mkern 2mu \overline{Y}. \] (b) All the entries of a matrix $X$ are real if and only if $X = \overline{X}$.
(c) Let $n\in\mathbb{N}$ and let $\mathbf{v} \in \mathbb{C}^n$. Then for $\mathbf{v}$ and its conjugate $\overline{\mathbf{v}}$ we have \[ \mathbf{v} = \left[\!\begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array}\!\right], \qquad \overline{\mathbf{v}} = \left[\!\begin{array}{c} \overline{v_1} \\ \overline{v_2} \\ \vdots \\ \overline{v_n} \end{array}\!\right], \] and for the product of the $1\times n$ matrix $\mathbf{v}^\top$ and the $n\times 1$ matrix $\overline{\mathbf{v}}$ we have \begin{align*} \mathbf{v}^\top \overline{\mathbf{v}} & = \bigl[\!\begin{array}{cccc} v_1 & v_2 & \cdots & v_n \end{array}\!\bigr] \left[\!\begin{array}{c} \overline{v_1} \\ \overline{v_2} \\ \vdots \\ \overline{v_n} \end{array}\!\right] \\ & = v_1 \overline{v_1} + v_2 \overline{v_2} + \cdots + v_n \overline{v_n} \\ & = |v_1 |^2 + |v_2 |^2 + \cdots + | v_n |^2 \geq 0. \end{align*} If $\mathbf{v}$ is a nonzero vector, then at least one of the nonnegative numbers \[ |v_1 |, \ |v_2 |, \ \ldots , \ | v_n | \] is positive. Therefore, if $\mathbf{v}$ is a nonzero vector, then $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0$.
Proof. Let $n\in\mathbb{N}$ and let $A$ be an $n\times n$ symmetric matrix. Then $A = A^\top$ and all entries of $A$ are real. Let $\lambda$ be an eigenvalue of $A$ and let $\mathbf{v}$ be a corresponding eigenvector. Then, $\mathbf{v}$ is a nonzero vector and \[ A \mathbf{v} = \lambda \mathbf{v}. \] By BK4(a) we have \[ \overline{A} \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Since $A$ is a real matrix, by BK4(b), we have $A = \overline{A}$. Therefore, the preceding displayed equality becomes \[ A \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Let us now calculate, using BK2(a), associativity of matrix multiplication, and the fact that $A = A^\top$, \begin{equation*} (A \mathbf{v})^\top \overline{\mathbf{v}} = (\mathbf{v}^\top A^\top) \overline{\mathbf{v}} = \mathbf{v}^\top ( A^\top \overline{\mathbf{v}}) = \mathbf{v}^\top ( A \overline{\mathbf{v}}). \end{equation*} Thus $(A \mathbf{v})^\top \overline{\mathbf{v}} = \mathbf{v}^\top ( A \overline{\mathbf{v}})$. Substituting here the equalities $A \mathbf{v} = \lambda \mathbf{v}$ and $A \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}$, we obtain \[ (\lambda \mathbf{v})^\top \overline{\mathbf{v}} = \mathbf{v}^\top \bigl( \overline{\lambda} \overline{\mathbf{v}}\bigr). \] Applying BK2(b) to the last equality we get \[ \lambda (\mathbf{v}^\top \overline{\mathbf{v}}) = \overline{\lambda} (\mathbf{v}^\top \overline{\mathbf{v}}). \] Since $\mathbf{v}$ is a nonzero vector, by BK4(c) we have that $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0$. Therefore, the last equality yields \[ \lambda = \overline{\lambda} . \] By BK3(c), we deduce that $\lambda$ is real. QED.
The proof in the preceding item involves a lot of algebra with complex numbers. Here I present a proof that the eigenvalues of a $2\!\times\!2$ symmetric matrix are real. This proof is based on the quadratic formula.
Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. To calculate the eigenvalues of $A$ we solve the equation $\det(A-\lambda I) =0$ for $\lambda$. That is, we solve: \[ 0 = \left| \begin{matrix} a - \lambda & b \\ b & d -\lambda \end{matrix} \right| = (a-\lambda)(d-\lambda) - b^2 = \lambda^2 -(a+d)\lambda + ad -b^2. \] The discriminant of the preceding quadratic equation is: \begin{align*} (a+d)^2 -4(ad-b^2) &= a^2 + 2ad + d^2 - 4ad +4 b^2 \\ & = a^2 - 2ad + d^2 +4 b^2 \\ & = (a-d)^2+4b^2 \geq 0. \end{align*} Solving for $\lambda$ we get \[ \lambda_{1,2} = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a+d)^2 - 4 (ad-b^2)} \Bigr) = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since the discriminant is nonnegative, that is $(a-d)^2 + 4 b^2 \geq 0$, both eigenvalues are real. In fact, if $(a-d)^2 + 4 b^2 = 0$, then $b = 0$ and $a=d$, so our matrix is a multiple of the identity matrix. Otherwise, that is if $(a-d)^2 + 4 b^2 \gt 0$, the symmetric matrix has two distinct eigenvalues \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr) \lt \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \]
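Here is a quick numerical check of the preceding formula in Mathematica; the entries $a=3$, $b=-2$, $d=7$ are sample values of my own choosing, not taken from the text above.

(* a sample symmetric 2x2 matrix with a = 3, b = -2, d = 7 *)
A = {{3, -2}, {-2, 7}};
(* the eigenvalues according to the formula derived above *)
formula = {3 + 7 - Sqrt[(3 - 7)^2 + 4 (-2)^2], 3 + 7 + Sqrt[(3 - 7)^2 + 4 (-2)^2]}/2;
N[formula]                (* {2.17157, 7.82843} *)
Sort[Eigenvalues[N[A]]]   (* {2.17157, 7.82843}, the same two real numbers *)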
Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.
Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix. Let $\lambda$ and $\mu$ be eigenvalues of $A$ and let $\mathbf{u}$ and $\mathbf{v}$ be corresponding eigenvectors. Then $\mathbf{u} \neq \mathbf{0},$ $\mathbf{v} \neq \mathbf{0}$ and \[ A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. \] Assume that \[ \lambda \neq \mu. \] Next we calculate the same dot product in two different ways; here we use the fact that $A^\top = A$ and algebra of the dot product. The first calculation: \[ (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}). \] The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}). \end{align*} Since \[ (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}), \] we conclude that \[ \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. \] Therefore \[ ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. \] Since we assume that $ \lambda - \mu \neq 0,$ the previous displayed equality yields \[ \mathbf{u}\cdot\mathbf{v} = 0. \] This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal. QED.
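As an illustration of the theorem, here is a small Mathematica computation with a symmetric matrix of my own choosing (it does not appear in the text above); it is only an example, not a substitute for the proof.

A = {{2, 1}, {1, 2}};          (* a sample symmetric matrix *)
{vals, vecs} = Eigensystem[A];
vals                           (* the two distinct eigenvalues, 3 and 1 *)
vecs[[1]] . vecs[[2]]          (* 0, so the corresponding eigenvectors are orthogonal *)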
Theorem. A symmetric $2\!\times\!2$ matrix is orthogonally diagonalizable.
Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. We need to prove that there exists an orthogonal $2\!\times\!2$ matrix $U$ and a diagonal $2\!\times\!2$ matrix $D$ such that $A = UDU^\top.$ The eigenvalues of $A$ are \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since clearly \[ (a-d)^2 + 4 b^2 \geq 0, \] the eigenvalues $\lambda_1$ and $\lambda_2$ are real numbers.
If $\lambda_1 = \lambda_2$, then $(a-d)^2 + 4 b^2 = 0$, and consequently $b= 0$ and $a=d$; that is $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$. Hence $A = UDU^\top$ holds with $U=I_2$ and $D = A$.
Now assume that $\lambda_1 \neq \lambda_2$. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$ and let $\mathbf{u}_2$ be a unit eigenvector corresponding to $\lambda_2$. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since $A$ is symmetric, $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal, that is the matrix $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}$ is orthogonal. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are eigenvectors of $A$ we have \[ AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. \] Therefore $A=UDU^\top.$ This proves that $A$ is orthogonally diagonalizable. QED.
Second Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. If $b=0$, then an orthogonal diagonalization is \[ \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \] Assume that $b\neq0.$ For the given $a,b,d \in \mathbb{R},$ introduce three new coordinates $z \in \mathbb{R},$ $r \in (0,+\infty),$ and $\theta \in (0,\pi)$ such that \begin{align*} z & = \frac{a+d}{2}, \\ r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\ \cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}. \end{align*} The reader will notice that these coordinates are very similar to the cylindrical coordinates in $\mathbb{R}^3.$ It is now an exercise in matrix multiplication and trigonometry to calculate \begin{align*} & \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} (z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2 & (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\ (z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) & (z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} z + r \cos(2\theta) & r \sin(2\theta) \\ r \sin(2\theta) & z - r \cos(2\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \frac{a+d}{2} + \frac{a-d}{2} & b \\ b & \frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \end{align*} QED.
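The trigonometric identity used in the second proof can also be checked symbolically in Mathematica; below t plays the role of $\theta$, and $z$, $r$ are kept symbolic.

rot = {{Cos[t], -Sin[t]}, {Sin[t], Cos[t]}};
dia = {{z + r, 0}, {0, z - r}};
Simplify[rot . dia . Transpose[rot]]
(* {{z + r Cos[2 t], r Sin[2 t]}, {r Sin[2 t], z - r Cos[2 t]}}, as claimed above *)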
Theorem. For every positive integer $n$, a symmetric $n\!\times\!n$ matrix is orthogonally diagonalizable.
Proof. (You can skip this proof.) This statement can be proved by Mathematical Induction. The base case $n = 1$ is trivial. The case $n=2$ is proved above. To get a feel how mathematical induction proceeds we will prove the theorem for $n=3.$
Let $A$ be a $3\!\times\!3$ symmetric matrix. Then $A$ has an eigenvalue, which must be real. Denote this eigenvalue by $\lambda_1$ and let $\mathbf{u}_1$ be a corresponding unit eigenvector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be unit vectors such that the vectors $\mathbf{u}_1,$ $\mathbf{v}_1,$ and $\mathbf{v}_2$ form an orthonormal basis for $\mathbb R^3.$ Then the matrix $V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr]$ is an orthogonal matrix and we have \[ V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] Since $A = A^\top$, $A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ and since $\mathbf{u}_1$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$ we have \[ \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, \] and \[ \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. \] Hence, \[ \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] By the already proved theorem for $2\!\times\!2$ symmetric matrices there exists an orthogonal matrix $\begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}$ and a diagonal matrix $\begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix}$ such that \[ \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. \] Substituting this equality in (**) and using some matrix algebra we get \[ V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top. \] Setting \[ U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \] we have that $U$ is an orthogonal matrix, $D$ is a diagonal matrix and $A = UDU^\top.$ This proves that $A$ is orthogonally diagonalizable. QED.
I want to emphasize the importance of Theorem 2 in Section 7.1. The "only if" part of the theorem below is part (d) of Theorem 3. The "if" part of the theorem below is Exercise 25.
Theorem. Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then, $A$ is symmetric if and only if it is orthogonally diagonalizable.
Theorem 3(d) says: Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then: If $A$ is symmetric, then $A$ is orthogonally diagonalizable.
Exercise 25 says: Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then: If $A$ is orthogonally diagonalizable, then $A$ is symmetric.
To prove Exercise 25, we proceed as follows. Assume that $A$ is orthogonally diagonalizable. Then there exist an orthogonal matrix $U$ and a diagonal matrix $D$ such that $A=UDU^\top$. To prove that $A$ is symmetric we calculate $A^\top$: \[ A^\top = \bigl( UDU^\top \bigr)^\top = \bigl(U^\top\bigr)^\top D^\top U^\top = UDU^\top. \] In the last calculation, we used the property of the transpose that $(XY)^\top = Y^\top X^\top$ and consequently \[ (XYZ)^\top = \bigl((XY)Z\bigr)^\top = Z^\top (XY)^\top = Z^\top Y^\top X^\top. \] We also used that every diagonal matrix is symmetric, that is $D^\top = D$, and that a repeated transpose does not change a matrix, that is $\bigl(X^\top\bigr)^\top = X$.
The theorem in the first item above is essential for solving Exercise 33:
Suppose \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \). Explain why \( AB \) is also orthogonally diagonalizable.
In fact, stated more mathematically, the exercise should have been:
If \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \), prove that \( AB \) is also orthogonally diagonalizable.
Proof goes like this. Assume that \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \). By the "if" part of the theorem stated in the first item (that is by Exercise 25) we conclude that \( A \) and \( B \) are both symmetric. Now we will use the fact that $A=A^\top$ and $B=B^\top$ and the fact that $A$ and $B$ commute to prove that $AB$ is symmetric: \[ (AB)^\top = (BA)^\top = A^\top B^\top = AB. \] Thus, $AB$ is symmetric. Now the "only if" part of the theorem stated in the first item (that is Theorem 3(d)) implies that $AB$ is orthogonally diagonalizable.
(We will talk more about this in class.) We will develop an alternative way of writing a matrix $A$ as a linear combination of orthogonal projections onto the eigenspaces of $A$.
The columns of \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] form an orthonormal basis for $\mathbb{R}^3$ which consists of unit eigenvectors of $A.$
The first two columns \[ \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \] form an orthonormal basis for the eigenspace of $A$ corresponding to $-1.$ The last column \[ \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \] is an orthonormal basis for the eigenspace of $A$ corresponding to $8.$
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $-1$ is \[ P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] \]
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $8$ is \[ P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right]. \]
Since the eigenvectors that we used above form a basis for $\mathbb{R}^3$ we have \[ P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3. \]
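The calculations in the preceding items can be repeated in Mathematica. Note that the matrix $A$ to which these items refer is not displayed above; since its eigenvalues are $-1$ and $8$ with the stated eigenspaces, it can be reconstructed as $A = (-1)P_{-1} + 8\mkern 2mu P_{8}$, which is what the last line below computes.

P1 = (1/9) {{5, -4, -2}, {-4, 5, -2}, {-2, -2, 8}};   (* the projection onto the eigenspace for -1 *)
P8 = (1/9) {{4, 4, 2}, {4, 4, 2}, {2, 2, 1}};         (* the projection onto the eigenspace for 8 *)
P1 + P8 == IdentityMatrix[3]                          (* True *)
{P1 . P1 == P1, P8 . P8 == P8}                        (* {True, True}: both matrices are projections *)
-P1 + 8 P8                                            (* {{3, 4, 2}, {4, 3, 2}, {2, 2, 0}} *)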
In the pictures below, the positive direction of the vertical axes points downward for aesthetic reasons.
The intersection of the canonical rotated paraboloid $z=x^2+y^2$ and a plane $z=ax+by+c$ is an ellipse, provided that $a^2+b^2+4c>0$. The projection of this ellipse onto the $xy$-plane is always a circle. Always! I find this amazing!
Notice that the picture below is "upside-down,"
with the positive direction of the $z$-axis pointing downwards.
How would we determine the intersection of the paraboloid and the plane? Recall, the paraboloid is the set of points $(x,y, x^2 + y^2)$, while the plane is the set of points $(x,y, ax+by+c)$. For a point $(x,y,z)$ to be both on the paraboloid and on the plane we must have \[ x^2 + y^2 = a x + b y + c. \] Which points $(x,y)$ in the $xy$-plane satisfy the preceding equation? Rewrite the equation as \[ x^2 - a x + y^2 - b y = c, \] and completing the squares comes to the rescue, so we obtain the following equation: \[ \boxed{\left(x - \frac{a}{2} \right)^2 + \left(y-\frac{b}{2}\right)^2 = \frac{a^2}{4} + \frac{b^2}{4} + c.} \] The boxed equation is the equation of a circle in the $xy$-plane centered at the point $(a/2,b/2)$ with radius $\displaystyle\frac{1}{2}\sqrt{a^2+b^2+4c}$. Above this circle, in the plane $z=ax+by+c$, lies the ellipse which is also on the paraboloid $z=x^2+y^2$. We call the circle whose equation is boxed the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=ax+by+c$.
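As a concrete illustration, with the sample values $a = 2$, $b = 4$, $c = 4$ (my own choice, not taken from the text) the boxed equation becomes \[ (x-1)^2 + (y-2)^2 = 1 + 4 + 4 = 9, \] so the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=2x+4y+4$ is centered at $(1,2)$ and has radius $3 = \frac{1}{2}\sqrt{4+16+16}$.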
Below I will describe in more detail the method of finding the best fit circle for a given set of points. The method is identical to finding the least-squares fit plane to a set of given points; in this case all the given points lie on the canonical rotated paraboloid $z = x^2+y^2.$
Let $n$ be a positive integer greater than $2.$ Assume that we are given $n$ noncollinear points in $\mathbb{R}^2$: \[ (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n). \] Lifting these points to the paraboloid $z = x^2+y^2$ and asking for a plane $z = \beta_0 + \beta_1 x + \beta_2 y$ which contains all the lifted points leads to the (in general inconsistent) system \[ \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \\ \beta_2 \end{array} \right] = \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right], \] whose least-squares solution gives the best fit circle.
The normal equations for the system from the preceding item are \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \]
You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair consisting of the best circle's center and the best circle's radius.

Clear[BestCir, gpts, mX, vY, abc];
BestCir[gpts_] := Module[{mX, vY, abc},
  (* the design matrix whose rows are {x_j, y_j, 1} *)
  mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]];
  (* the right-hand side: the values x_j^2 + y_j^2 *)
  vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts;
  (* solve the normal equations by row reducing the augmented matrix [mX^T.mX | mX^T.vY] *)
  abc = Last[Transpose[RowReduce[
      Transpose[Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY]]]]];
  (* the center and the radius of the best fit circle *)
  {{abc[[1]]/2, abc[[2]]/2},
   Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]}
]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4, 3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}};
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = {{3, 1}, {2, -4}, {-2, 3}};
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} + 1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@ Range[100]));
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
Notice that the four given points $(2,3),$ $(3,2),$ $(5,1),$ $(6,0)$ form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share the midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \[ \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = 3/2. \] The long sides of this parallelogram are on the parallel lines \[ y = -\frac{2}{3}x +4 \quad \text{and} \quad y = -\frac{2}{3}x + \frac{13}{3}. \] It is natural to guess that the least-squares line is the line which is parallel to these two lines and halfway between them. That is the line \[ y = -\frac{2}{3}x + \frac{25}{6}. \] This line is the red line in the picture below. Clearly this line goes through the point $(4,3/2),$ the intersection of the diagonals of the parallelogram.
The only way to verify the guess from the preceding item is to calculate the least-squares line for these four points. Let us find the least-squares solution of the equation \[ \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] To get to the corresponding normal equations we multiply both sides by $X^\top$: \[ \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] The corresponding normal equations are \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. \] Since the inverse of the above $2\!\times\!2$ matrix is \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], \] the solution of the normal equations is unique and it is given by \[ \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]. \]
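The computation above can be confirmed with Mathematica's built-in least-squares solver:

mX = {{1, 2}, {1, 3}, {1, 5}, {1, 6}};
vY = {3, 2, 1, 0};
LeastSquares[mX, vY]    (* {43/10, -7/10}, that is, beta0 = 43/10 and beta1 = -7/10 *)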
In the image below the forest green points are the given data points.
The red line is the line which I guessed could be the least-squares line.
The blue line is the true least-squares line.
It is amazing that what we observed in the preceding example is universal.
Proposition. If the line $y = \beta_0 + \beta_1 x$ is the least-squares line for the data points \[ (x_1,y_1), \ldots, (x_n,y_n), \] then $\overline{y} = \beta_0 + \beta_1 \overline{x}$, where \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \]
The above proposition is Exercise 20 (5th edition Exercise 14) in Section 6.6.

Proof. Let \[ (x_1,y_1), \ldots, (x_n,y_n), \] be given data points and set \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \] Let $y = \beta_0 + \beta_1 x$ be the least-squares line for the given data points. Then the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ satisfies the normal equations \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] Multiplying the second matrix on the left-hand side and the third vector we get \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] The above equality is an equality of vectors with two components. The top components of these vectors are equal: \[ (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. \] Therefore \[ n \beta_0 + \beta_1 (x_1+x_2 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. \] Dividing by $n$ we get \[ \beta_0 + \beta_1 \frac{1}{n} (x_1+x_2 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). \] Hence \[ \overline{y} = \beta_0 + \beta_1 \overline{x}. \] QED.
In the image below the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the $xy$-plane. The teal points are the corresponding points in the least-squares plane.
Theorem. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then \[ \operatorname{Nul}(A) = \operatorname{Nul}\bigl(A^\top\!A\bigr). \]
Background Knowledge established in Math 204 is as follows.
BK1. Matrix multiplication is associative. Let $j, k, l, m$ be positive integers. Let $X$ be a $j\times k$ matrix, $Y$ be a $k \times l$ matrix, and $Z$ be an $l \times m$ matrix. Then $XY$ is $j\times l$ matrix, $YZ$ is $k\times m$ and $(XY)Z = X(YZ)$.
BK2. How to calculate the transpose of a product of two matrices. Let $j, k, l$ be positive integers. Let $X$ be a $j\times k$ matrix, and let $Y$ be a $k \times l$ matrix. Then $(XY)^\top = Y^\top\mkern -2mu X^\top$. A mnemonic tool to internalize this rule is to write the sizes of all the matrices that are involved in this equality: $XY$ is a $j\times l$ matrix, $(XY)^\top$ is an $l \times j$ matrix, $X^\top$ is a $k\times j$ matrix, $Y^\top$ is an $l\times k$ matrix. We need a formula for the $l \times j$ matrix $(XY)^\top$ using the $k\times j$ matrix $X^\top$ and the $l\times k$ matrix $Y^\top$. The only way to multiply $X^\top$ and $Y^\top$ is $Y^\top\mkern -2mu X^\top$ and in this way we get an $l \times j$ matrix. This is not a proof, just a mnemonic tool. For a proof, one needs to compare the entries of both matrices in the equality.
BK3. Let $n$ be a positive integer. The only vector in $\mathbb{R}^n$ whose norm is zero is the zero vector. That is for $\mathbf{v} \in \mathbb{R}^n$ we have $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}_n$.
Proof. The set equality $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$ means \[ \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). \] As with all equivalences, we prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Nul}(A)$. Then $A\mathbf{x} = \mathbf{0}$. Consequently, using BK1, we have \[ (A^\top\!A)\mathbf{x} = A^\top ( \!A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. \] Hence, $(A^\top\!A)\mathbf{x}= \mathbf{0}$, and therefore $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). \]
Step 2. In this step, we prove the converse: \[ \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). \] Assume that $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Then, $(A^\top\!\!A) \mathbf{x} = \mathbf{0}$. Multiplying the last equality by $\mathbf{x}^\top$ we get $\mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0$. Using the associativity of the matrix multiplication, that is BK1, we obtain $(\mathbf{x}^\top\!\! A^\top) (A \mathbf{x}) = 0$. Using the algebra of the transpose operation, that is BK2, we get $(A \mathbf{x})^\top\! (A \mathbf{x}) = 0$. Now recall that for every vector $\mathbf{v}$ we have $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$. Thus, we have proved that $\|A\mathbf{x}\|^2 = 0$. Now recall BK3, that the only vector whose norm is $0$ is the zero vector, to conclude that $A\mathbf{x} = \mathbf{0}_n$. This means $\mathbf{x} \in \operatorname{Nul}(A)$. This completes the proof of implication (*). The theorem is proved. QED.
(It is customary to end mathematical proofs with the abbreviation QED, see its Wikipedia entry.)
In Step 2 of the preceding proof, the idea introduced in the sentence which starts with the highlighted text is a truly brilliant one. It is a pleasure to share these brilliant mathematical vignettes with you.
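Here is a small Mathematica illustration of the theorem with one concrete matrix of my own choosing; it is only an example, not a substitute for the proof.

A = {{1, 2, 3}, {2, 4, 6}};                    (* a 2x3 matrix with a nontrivial null space *)
NullSpace[A] == NullSpace[Transpose[A] . A]    (* True: for this matrix both calls even return the same basis *)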
The preceding theorem has an important corollary.
Corollary 1. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then the columns of $A$ are linearly independent if and only if the matrix $A^\top\!A$ is invertible.
Background Knowledge established in Math 204 is as follows.
BK1. The columns of $A$ are linearly independent if and only if $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$.
BK2. Notice that the matrix $A^\top\!\! A$ is a square $m\!\times\!m$ matrix. By the Invertible Matrix Theorem, the matrix $A^\top\!\! A$ is invertible if and only if $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$.
Proof. By BK1 the columns of $A$ are linearly independent if and only if $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$. By the Theorem we have $\operatorname{Nul}(A) = \operatorname{Nul}(A^\top\!\! A )$. Therefore $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$ if and only if $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$. By BK2, $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$ if and only if the matrix $A^\top\!\! A$ is invertible.
Based on the three equivalences stated in the preceding paragraph, we have proved the corollary. QED.
Corollary 2. Let $A$ be an $n\!\times\!m$ matrix. Then $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$.
Proof 1. We established the following equalities earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \end{align*} In the above Theorem we proved that the following subspaces are equal \[ \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). \] Equal subspaces have equal orthogonal complements. Therefore \[ \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \] Since earlier we proved \[ \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, \] the last three equalities imply \[ \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). \] The corollary is proved. QED.
Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the existence of an orthogonal projection onto the column space of $A$.) The set equality $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$ means \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] We will prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Then there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\mathbf{x} = (A^\top\!\!A)\mathbf{v}.$ Since by the definition of matrix multiplication we have $(A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v})$, we have $\mathbf{x} = A^\top\!(A\mathbf{v}).$ Consequently, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top). \]
Step 2. Now we prove the converse: \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). \] Assume, $\mathbf{x} \in \operatorname{Col}(A^\top).$ By the definition of the column space of $A^\top$, there exists $\mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{x} = A^\top\!\mathbf{y}.$ Let $\widehat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Col}(A).$ That is $\widehat{\mathbf{y}} \in \operatorname{Col}(A)$ and $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}.$ Since $\widehat{\mathbf{y}} \in \operatorname{Col}(A),$ there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\widehat{\mathbf{y}} = A\mathbf{v}.$ Since $\bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top),$ the relationship $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}$ yields $A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}.$ Consequently, since $\widehat{\mathbf{y}} = A\mathbf{v},$ we deduce $A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}.$ Hence \[ \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. \] This proves that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Thus, the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \] is proved. The corollary is proved. QED.
Corollary 3. Let $A$ be an $n\!\times\!m$ matrix. The matrices $A$, $A^\top$, $A^\top\!\! A$ and $A A^\top$ have the same rank. Indeed, by Corollary 2 we have $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$, so $A^\top\!\! A$ and $A^\top$ have the same rank; since the rank of a matrix equals the rank of its transpose, $A$ and $A^\top$ have the same rank; and applying the first claim to $A^\top$ in place of $A$ shows that $A A^\top$ and $A$ have the same rank.
Question. Given two vectors in $\mathbb{R}^3$ we learned how to calculate the area of the parallelogram spanned by these two vectors. In a multivariable calculus class we used the cross product for this calculation. Given three vectors in $\mathbb{R}^3$ we learned how to calculate the volume of the parallelepiped spanned by these three vectors. We used the determinant of the $3\times 3$ matrix whose columns are the given three vectors. If we are given three vectors in $\mathbb{R}^4$, is there a way to calculate the volume of the three-dimensional parallelepiped spanned by these three vectors?
Straightforward projections onto the span. Let $\mathbf{u}_1, \ldots, \mathbf{u}_m$ be an orthonormal set of vectors in $\mathbb{R}^n$, let $\mathcal{W} = \operatorname{Span}\bigl\{\mathbf{u}_1, \ldots, \mathbf{u}_m\bigr\}$, and let $\mathbf{y} \in \mathbb{R}^n$. Then the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ is given by \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}}(\mathbf{y}) = \left( \mathbf{y}\cdot \mathbf{u}_1 \right) \mathbf{u}_1 + \left( \mathbf{y}\cdot \mathbf{u}_2 \right) \mathbf{u}_2 + \cdots + \left( \mathbf{y}\cdot \mathbf{u}_m \right) \mathbf{u}_m.} \]
The $QR$ factorization of a matrix is just the Gram-Schmidt orthogonalization algorithm for the columns of $A$ written in matrix form. The only difference is that the Gram-Schmidt orthogonalization algorithm produces orthogonal vectors which we have to normalize to obtain the matrix $ Q$ with orthonormal columns.
Example. A nice example is given by calculating $QR$ factorization of the $3\!\times\!2$ matrix \[ A = \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right]. \]
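For the record, here is one way this computation can go; the intermediate names $\mathbf{x}_1, \mathbf{x}_2, \mathbf{v}_2, \mathbf{q}_1, \mathbf{q}_2$ are mine. Applying the Gram-Schmidt algorithm to the columns $\mathbf{x}_1$ and $\mathbf{x}_2$ of $A$ gives \[ \mathbf{q}_1 = \frac{1}{\|\mathbf{x}_1\|}\, \mathbf{x}_1 = \frac{1}{3} \left[ \begin{array}{r} 1 \\ 2 \\ 2 \end{array}\right], \qquad \mathbf{v}_2 = \mathbf{x}_2 - \bigl(\mathbf{x}_2 \cdot \mathbf{q}_1\bigr)\, \mathbf{q}_1 = \left[ \begin{array}{r} 1 \\ 4 \\ 3 \end{array}\right] - 5\, \mathbf{q}_1 = \frac{1}{3} \left[ \begin{array}{r} -2 \\ 2 \\ -1 \end{array}\right], \] and $\|\mathbf{v}_2\| = 1$, so $\mathbf{q}_2 = \mathbf{v}_2$. Therefore \[ A = QR = \left[ \begin{array}{rr} \frac{1}{3} & -\frac{2}{3} \\[2pt] \frac{2}{3} & \frac{2}{3} \\[2pt] \frac{2}{3} & -\frac{1}{3} \end{array}\right] \left[ \begin{array}{rr} 3 & 5 \\[2pt] 0 & 1 \end{array}\right]. \]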
In the next example, I will demonstrate a useful simplification strategy when calculating the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and so on. Following the given formulas, calculating the vectors $\mathbf{v}_k$ will frequently involve fractions and make the arithmetic of the subsequent calculations more difficult. Recall, the objective here is to produce an orthogonal set of vectors while keeping the running spans equal.
To simplify the arithmetic, at each step of the Gram-Schmidt algorithm, we can replace a vector $\mathbf{v}_k$ by its scaled version $\alpha \mathbf{v}_k$ with a conveniently chosen $\alpha \gt 0$.
In this way we can avoid fractions in vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3.$ In the next item I present an example.
Definition. Let $\mathcal{W}$ be a subspace of $\mathbb{R}^n$ and let $\mathbf{y} \in \mathbb{R}^n.$ The vector $\widehat{\mathbf{y}} \in \mathbb{R}^n$ is called the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ if the vector $\widehat{\mathbf{y}}$ has the following two properties: ① $\widehat{\mathbf{y}} \in \mathcal{W}$ and ② $\mathbf{y} - \widehat{\mathbf{y}} \in \mathcal{W}^\perp.$
The notation for the orthogonal projection $\widehat{\mathbf{y}}$ of $\mathbf{y} \in \mathbb{R}^n$ onto $\mathcal{W}$ is: \[ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal W}(\mathbf{y}). \] The transformation $\operatorname{Proj}_{\mathcal W}: \mathbb{R}^n \to \mathcal W$ is called the orthogonal projection of $\mathbb{R}^n$ onto $\mathcal{W}$.
Another important theorem in this section is Theorem 10 which I would call The Standard Matrix of an Orthogonal Projection.
The proof of Theorem 10 given in the book is deceptively simple. Please do understand the proof in the book. Below I will give another proof of this theorem.
The proof starts here.
Let $\mathbf{y} \in \mathbb R^n$ be arbitrary. By the definition of orthogonal projection, to prove that \[ UU^\top\! \mathbf y = \operatorname{Proj}_{\mathcal W}(\mathbf{y}) \] we have to prove two claims: the first \[ UU^\top\!\mathbf{y} \in \mathcal{W} \] and, the second, \[ \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp. \] Since every vector of the form $U\mathbf{v}$ belongs to $\operatorname{Col}(U)$, we have that \[ UU^\top\! \mathbf{y} = U\bigl(U^\top\! \mathbf{y}\bigr) \in \operatorname{Col}(U). \] By BK 2 we have $\operatorname{Col}(U) = \mathcal{W}.$ Therefore $UU^\top \mathbf y \in {\mathcal W}$ is proved. This proves the first claim.
To prove the second claim, we use BK 3. By BK 3: $\mathbf{x} \in \mathcal{W}^\perp$ if and only if $U^\top\!\mathbf{x} = \mathbf{0}$. Therefore, to prove \[ \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp, \] we need to prove \[ U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) = \mathbf{0}. \] Let us calculate $U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr)$ below: \begin{align*} U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) & = U^\top \! \mathbf{y} - U^\top \! \bigl(UU^\top\!\mathbf{y} \bigr) \\ &= U^\top \! \mathbf{y} - \bigl( U^\top \! U\bigr) \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - I_m \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - U^\top\!\mathbf{y} \\ & = \mathbf{0}. \end{align*} In the preceding sequence of equalities, the first equality holds since the matrix multiplication is distributive, the second equality holds since the matrix multiplication is associative, the third equality follows from BK 1, the fourth equality follows from the definition of the matrix $I_m$, and the fifth equality holds by the definition of the opposite vector.
In conclusion, for every $\mathbf{y} \in \mathbb R^n$ we proved that the vector $UU^\top\!\mathbf{y}$ has the properties ① and ② in the definition of orthogonal projection onto $\mathcal{W}$. This proves \[ \operatorname{Proj}_{\mathcal W}(\mathbf{y}) = UU^\top\!\mathbf{y}. \]
The proof ends here.

Let $m, n \in \mathbb{N}$. Let $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ be an orthogonal set of nonzero vectors. What the last phrase means is the following: $\mathbf{v}_j \neq \mathbf{0}$ for all $j \in \{1,\ldots,m\}$, and $\mathbf{v}_j \cdot \mathbf{v}_k = 0$ whenever $j, k \in \{1,\ldots,m\}$ and $j \neq k$.
It is truly important that we always connect vectors with matrices formed by those vectors. Let $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ be an orthogonal set of nonzero vectors.
Let us form the $n\times m$ matrix $M$ whose columns are the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$: \[ M = \left[\!\begin{array}{cccc} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_m \end{array}\!\!\right]. \] Recall that $M^\top$ is an $m\times n$ matrix. Now calculate the product of $M^\top$ and $M$ as follows: \begin{align*} M^\top M & = \left[\!\begin{array}{c} \mathbf{v}_1^\top \\ \mathbf{v}_2^\top \\ \vdots \\ \mathbf{v}_m^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_m \end{array}\!\!\right] \\[6pt] & = \left[\!\begin{array}{cccc} \mathbf{v}_1\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_1\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_1\!\!\cdot\!\mathbf{v}_m \\ \mathbf{v}_2\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_m \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{v}_m\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_m \end{array}\!\!\right] \\[6pt] & = \left[\!\begin{array}{cccc} \mathbf{v}_1\!\!\cdot\!\mathbf{v}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_m \end{array}\!\!\right] \end{align*} Thus $M^\top\! M$ is a diagonal $m\times m$ matrix.
The above matrix calculation shows that the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ form an orthogonal set of nonzero vectors if and only if the matrix $M^\top\! M$ is a diagonal $m\times m$ matrix with nonzero entries on the diagonal.
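In Mathematica, this observation can be checked for the three orthogonal vectors that will be used in the example further below:

(* the columns of M are the orthogonal vectors (-8,1,4), (1,-8,4), (4,4,7) *)
M = Transpose[{{-8, 1, 4}, {1, -8, 4}, {4, 4, 7}}];
Transpose[M] . M    (* {{81, 0, 0}, {0, 81, 0}, {0, 0, 81}}, a diagonal matrix *)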
Why are orthogonal sets of nonzero vectors in $\mathbb{R}^n$ important? My answer to that question is the following three properties, for which I use the adjective straightforward. Look at the proofs in the textbook: the proofs of these properties are straightforward, using some standard algebraic manipulations with linear equations.
Straightforward linear independence. The vectors in an orthogonal set of nonzero vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ are linearly independent. This is Theorem 4 in Section 6.2.
Straightforward linear combinations. If \[ \mathbf{y} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_m \mathbf{v}_m, \] then \[ \alpha_1 = \frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}, \quad \alpha_2 = \frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}, \quad \ldots, \quad \alpha_m = \frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}. \]
In other words:
If $\mathbf{y} \in \operatorname{Span}\bigl\{\mathbf{v}_1, \ldots, \mathbf{v}_m\bigr\}$, then \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \mathbf{y} = \left(\frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \right) \mathbf{v}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2+ \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}\right) \mathbf{v}_m.} \]
This is Theorem 5 in Section 6.2.
Straightforward projections onto the span. Let $\mathcal{W} = \operatorname{Span}\bigl\{\mathbf{v}_1, \ldots, \mathbf{v}_m\bigr\}$ and let $\mathbf{y} \in \mathbb{R}^n$. Then the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ is given by \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}}(\mathbf{y}) = \left(\frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2 + \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}\right) \mathbf{v}_m.} \]
This is Theorem 8 in Section 6.3.
Consider the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right]. \]
These vectors form an orthogonal set of nonzero vectors. To verify this claim, it is sufficient to calculate \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0. \]
By Theorem 4 in Section 6.2 these vectors are linearly independent. This is an example of straightforward linear independence.
As shown in the preceding item the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], \] form an orthogonal set of nonzero vectors. By Theorem 4 in Section 6.2 they are linearly independent. Consequently they form a basis for $\mathbb{R}^3$.
To express the vector $\left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right]$ as a linear combination of the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], \] we use the straightforward linear combinations, that is Theorem 5 in Section 6.2.
We calculate \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 6, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = -3, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 33, \] and \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 81. \] Then \begin{align*} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] & = \frac{6}{81} \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] + \frac{-3}{81} \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] + \frac{33}{81} \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \\ & = \left[\! \begin{array}{r} -\frac{16}{27} \\ \frac{2}{27} \\ \frac{8}{27} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{27} \\ \frac{8}{27} \\ -\frac{4}{27} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{44}{27} \\ \frac{44}{27} \\ \frac{77}{27} \end{array} \!\right] \end{align*}
Consider the vectors \[ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right]. \] These two vectors form an orthogonal set of nonzero vectors in $\mathbb{R}^4$.
To calculate the orthogonal projection of the vector $\mathbf{y} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right]$ onto the linear span of the given two vectors, that is \[ \mathcal{W} = \operatorname{Span}\left\{ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \right\} \] we use straightforward projections onto the span, that is Theorem 8 in Section 6.3.
To calculate the projection, we need to calculate \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 30, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 10 \] and \[ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 36, \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 36. \] Then the projection is: \begin{align*} \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] & = \frac{30}{36} \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] + \frac{10}{36} \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{5}{6} \\ \frac{5}{6} \\ \frac{25}{6} \\ \frac{15}{6} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{5}{18} \\ -\frac{5}{18} \\ -\frac{15}{18} \\ \frac{25}{18} \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right]. \end{align*} To verify that this calculation is correct we calculate \[ \mathbf{y} - \widehat{\mathbf{y}} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] - \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] = \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right]. \] This vector should be orthogonal to both of the given vectors: \[ \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 0, \quad \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 0. \] Finally, to celebrate our calculation we write \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right], \] and we point out that the two vectors in the sum are orthogonal to each other. Moreover, the first vector in the sum is in the subspace $\mathcal{W}$ and the second vector in the sum is in the subspace $\mathcal{W}^\perp.$
An illustration of a Reflection across the green line
Let $A$ be a $3\times 3$ matrix. Over the last few days we studied the following matrix initial value problem: \[ \boxed{\; Y'(t) = A\mkern 1mu Y(t) \quad \text{and} \quad Y(0) = I_3 \;} \]
It is a fact from differential equations that the solution of this initial value problem is unique. The unique solution of the above boxed initial value problem is called the matrix-exponential function: \[ Y(t) = {\Large e}^{A\mkern 0.75mu t}. \]
It is important to notice that from the boxed equation evaluated at $t=0$ we obtain \[ Y'(0) = A\mkern 1mu Y(0) = A\mkern 1mu I_3 = A . \]

The exponential notation introduced in the previous item is analogous to what we learned in a calculus class. Let $a$ be a real number. The unique solution $y(t)$ of the initial value problem \[ y'(t) = a\mkern 0.75mu y(t) \quad \text{and} \quad y(0) = 1 \] is the scalar exponential function \[ y(t) = e^{a\mkern 0.75mu t}. \]
Yesterday I posted about the matrix-valued exponential function $t\mapsto e^{At}$. Here $A$ is a $3\times 3$ matrix. We defined the matrix-valued exponential function $t\mapsto e^{At}$ to be the unique solution $Y(t)$ of the initial value problem restated below.
The goal of Problem 5 on Assignment 1 is to explore the analogous problem in which the real number $a$ is replaced by a diagonalizable $3\times 3$ matrix $A$.
In analogy with the scalar case, we define the matrix-valued exponential function $t\mapsto e^{At}$ to be the unique solution $Y(t)$ of the following problem: \[ \boxed{\; Y'(t) = A\mkern 1mu Y(t) \quad \text{and} \quad Y(0) = I_3 \;} \]
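For those who like to experiment, below is a small Python sketch using scipy.linalg.expm; the particular diagonalizable matrix $A$ is my own example, not the matrix from Assignment 1. It checks numerically that $Y(0)=I_3$ and that $Y'(t) = A\mkern 1mu Y(t)$.

```python
import numpy as np
from scipy.linalg import expm

# A sample diagonalizable 3x3 matrix (my own choice, not the one from the assignment).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, -1.0]])

def Y(t):
    """The matrix-valued exponential function Y(t) = e^{At}."""
    return expm(A * t)

# Y(0) should be the 3x3 identity matrix.
print(np.allclose(Y(0.0), np.eye(3)))        # True

# Y'(t) should equal A Y(t); check with a centered difference at t = 0.7.
t, h = 0.7, 1e-6
Y_prime = (Y(t + h) - Y(t - h)) / (2 * h)
print(np.allclose(Y_prime, A @ Y(t)))        # True
```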
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
Place the cursor over the image to start the animation.
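To see the fact stated above on a small example, here is a tiny Python sketch of mine; the matrix $M$ is just a sample.

```python
import numpy as np

# M times the k-th column of the identity matrix picks out the k-th column of M.
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

I3 = np.eye(3, dtype=int)
k = 1                                           # the second column (0-based indexing)
print(M @ I3[:, k])                             # [2 5 8]
print(np.array_equal(M @ I3[:, k], M[:, k]))    # True
```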
Below I want to present a change of coordinates matrix in a vector space of polynomials which requires only the Binomial Theorem. The Binomial Theorem is the theorem that you might have seen in a college algebra class: \begin{align*} (u+v)^1 & = u+v, \\ (u+v)^2 & = u^2+2\mkern 2mu u v + v^2, \\ (u+v)^3 & = u^3+ 3\mkern 2mu u^2 v + 3\mkern 2mu u v^2 + v^3,\\ (u+v)^4 & = u^4+ 4\mkern 2mu u^3 v + 6\mkern 2mu u^2 v^2 + 4\mkern 2mu u v^3 + v^4,\\ (u+v)^5 & = u^5+ 5\mkern 2mu u^4 v + 10\mkern 2mu u^3 v^2 + 10\mkern 2mu u^2 v^3 + 5\mkern 2mu u v^4 + v^5, \end{align*} and so on.
We do not need the general version of the Binomial Theorem here, but, since we mentioned it, I want to introduce you to some important concepts related to the Binomial Theorem.
In general, if $n \in \mathbb{N}$ we have \[ (u+v)^n = \sum_{k=0}^n \binom{n}{k} \mkern 2mu u^{n-k} v^k = u^n + n \mkern 2mu u^{n-1} v + \frac{n(n-1)}{2}\mkern 2mu u^{n-2} v^2 + \cdots + \frac{n(n-1)}{2}\mkern 2mu u^{2} v^{n-2} + n\mkern 2mu u v^{n-1} + v^n. \]
In the above formula, for \( n, k \in \{0\}\cup\mathbb{N}\) with \(k \leq n \), the symbol \( \displaystyle \binom{n}{k} \) (read as "n choose k") denotes the Binomial coefficient. The definition is:
\[ \binom{n}{k} = \frac{n!}{k! \, (n-k)!}, \]
where for \( m \in \mathbb{N} \), \( m! \) (read as "m factorial") is the product of all positive integers up to \( m \). By convention \( 0! = 1 \).
The Base Case: \(0!=1\)
The Recursive Step: For all \(m\in\mathbb{N}\) we set \(m! = \bigl( (m-1)! \bigr) \mkern 2px m\).
For more details, see Factorial.
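If you like to see a recursive definition as code, here is a minimal Python sketch of exactly this recursion for the factorial.

```python
def factorial(m: int) -> int:
    """Factorial defined by the recursion above:
    the base case 0! = 1 and the step m! = ((m-1)!) * m."""
    if m == 0:                        # the base case
        return 1
    return factorial(m - 1) * m       # the recursive step

print([factorial(m) for m in range(7)])   # [1, 1, 2, 6, 24, 120, 720]
```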
The recursive definition of the binomial coefficients is as follows:
The Base Case: \begin{equation*} \text{For all} \ \ n \in \{0\}\cup\mathbb{N} \quad \text{we set} \quad \binom{n}{0} = 1 \quad \text{and} \quad \binom{n}{n} = 1. \end{equation*} The Recursive Step: \begin{equation*} \text{For all} \ \ n \in \mathbb{N} \ \ \text{and} \ \ k \in \{1,\ldots,n\} \quad \text{we set} \quad \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}. \end{equation*} Each line below applies the recursive step with specific values of \(n\) and \(k\), together with the previously evaluated binomial coefficients (that is why it is called a recursion, see the next item below):
\begin{alignat*}{2} &\text{For } n=1, \ k=1 \qquad &&\binom{2}{1} = \binom{1}{0} + \binom{1}{1} = 1 + 1 = 2, \\ &\text{For } n=2, \ k=1 &&\binom{3}{1} = \binom{2}{0} + \binom{2}{1} = 1 + 2 = 3, \\ &\text{For } n=2, \ k=2 &&\binom{3}{2} = \binom{2}{1} + \binom{2}{2} = 2 + 1 = 3, \\ &\text{For } n=3, \ k=1 &&\binom{4}{1} = \binom{3}{0} + \binom{3}{1} = 1 + 3 = 4, \\ &\text{For } n=3, \ k=2 &&\binom{4}{2} = \binom{3}{1} + \binom{3}{2} = 3 + 3 = 6, \\ &\text{For } n=3, \ k=3 &&\binom{4}{3} = \binom{3}{2} + \binom{3}{3} = 3 + 1 = 4, \\ &\text{For } n=4, \ k=1 &&\binom{5}{1} = \binom{4}{0} + \binom{4}{1} = 1 + 4 = 5, \\ &\text{For } n=4, \ k=2 &&\binom{5}{2} = \binom{4}{1} + \binom{4}{2} = 4 + 6 = 10, \\ &\text{For } n=4, \ k=3 &&\binom{5}{3} = \binom{4}{2} + \binom{4}{3} = 6 + 4 = 10, \\ &\text{For } n=4, \ k=4 &&\binom{5}{4} = \binom{4}{3} + \binom{4}{4} = 4 + 1 = 5, \\ & & & \quad \quad \mkern 12px \vdots \end{alignat*}
For more details about this recursion, see Pascal's triangle.
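Here is a minimal Python sketch of the same recursion for the binomial coefficients; it reproduces the rows of Pascal's triangle that appear in the expansions listed earlier.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def binom(n: int, k: int) -> int:
    """Binomial coefficient computed by the recursion above,
    written in the equivalent form C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    if k == 0 or k == n:                             # the base cases C(n, 0) = C(n, n) = 1
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)     # the recursive step

# Rows 0 through 5 of Pascal's triangle.
for n in range(6):
    print([binom(n, k) for k in range(n + 1)])
```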
Below is a proof that the monomials $1, x, x^2, x^3$ are linearly independent in the vector space ${\mathbb P}_3$. First we need to be specific about what we need to prove.
Let $\alpha_1,$ $\alpha_2,$ $\alpha_3,$ and $\alpha_4$ be scalars in $\mathbb{R}.$ We need to prove the following implication: If \[ \require{bbox} \bbox[5px, #88FF88, border: 1pt solid green]{\alpha_1\cdot 1 + \alpha_2 x + \alpha_3 x^2 + \alpha_4 x^3 =0 \quad \text{for all} \quad x \in \mathbb{R}}, \] then \[ \bbox[5px, #FF4444, border: 1pt solid red]{\alpha_1 = 0, \quad \alpha_2 =0, \quad \alpha_3 = 0, \quad \alpha_4 = 0}. \] Proof.
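Before reading the proof, it may help to see one standard argument; this sketch is mine and may differ from the argument presented in class. Setting $x=0$ in the green identity gives $\alpha_1 = 0$. Since the identity holds for all $x \in \mathbb{R}$, we may differentiate it, which gives \[ \alpha_2 + 2\mkern 2mu \alpha_3 x + 3\mkern 2mu \alpha_4 x^2 = 0 \quad \text{for all} \quad x \in \mathbb{R}, \] and setting $x=0$ gives $\alpha_2 = 0$. Differentiating twice more and setting $x=0$ each time gives $\alpha_3 = 0$ and $\alpha_4 = 0$, which is exactly the red conclusion.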
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called a surjection if it satisfies the following condition:
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called an injection if it satisfies the following condition:
An equivalent formulation of the preceding condition is:
Definition. A function $f:A\to B$ is called a bijection if it satisfies the following two conditions:
In other words, a function $f:A\to B$ is called a bijection if it is both an injection and a surjection.
Definition. Let $\mathcal V$ and $\mathcal W$ be vector spaces. A linear bijection $T: \mathcal V \to \mathcal W$ is said to be an isomorphism.
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal V, \] is a linear bijection between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal{V}, \] is an isomorphism between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
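To make the coordinate mapping concrete, here is a small Python sketch of mine for the vector space ${\mathbb P}_3$ with the basis $\{1, x, x^2, x^3\}$; the two polynomials are my own examples, and the point is only that the coordinate mapping respects linear combinations.

```python
import numpy as np

# Coordinate mapping for P_3 relative to the basis B = {1, x, x^2, x^3}:
# the polynomial a0 + a1*x + a2*x^2 + a3*x^3 is sent to (a0, a1, a2, a3) in R^4.
def coords(p):
    """p is a dict {power: coefficient}; return its B-coordinate vector in R^4."""
    return np.array([p.get(k, 0.0) for k in range(4)])

# Two sample polynomials: p(x) = 2 - x + 3x^3 and q(x) = 1 + 4x^2.
p = {0: 2.0, 1: -1.0, 3: 3.0}
q = {0: 1.0, 2: 4.0}

# Linearity of the coordinate mapping: [5p - 2q]_B = 5[p]_B - 2[q]_B.
combo = {k: 5 * p.get(k, 0.0) - 2 * q.get(k, 0.0) for k in range(4)}
print(coords(combo))                                                # [ 8. -5. -8. 15.]
print(np.allclose(coords(combo), 5 * coords(p) - 2 * coords(q)))    # True
```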
Corollary 1. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Corollary 2. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Each step in a row reduction can be achieved by multiplication by a matrix.
Step | the row operations | the matrix used | the matrix inverse |
---|---|---|---|
1st | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to (-2)\sideset{_o}{_1}R + \sideset{_o}{_2}R,\\ \sideset{_n}{_3}R \to (-3)\sideset{_o}{_1}R + \sideset{_o}{_3}R \end{array} $ | $M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \\ \end{array}\right]$ | $(M_1)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \\ \end{array}\right]$ |
2nd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to \sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to (-2)\sideset{_o}{_2}R + \sideset{_o}{_3}R \end{array}$ | $M_2 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array}\right]$ | $(M_2)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array}\right]$ |
The importance of carefully keeping track of each matrix is that we can calculate the single matrix which performs the above Row Reduction:
\[ M_2 M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \\ \end{array}\right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ \end{array} \right] \] You can verify that \[ \left[ \begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ \end{array} \right] \left[\!\begin{array}{rrrrr} 1 & 2 & 0 & 1 & 2 \\ 2 & 4 & 1 & 0 & 5 \\ 3 & 6 & 2 & -1 & 8 \end{array}\right] = \left[\!\begin{array}{rrrrr} 1 & 2 & 0 & 1 & 2 \\ 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]. \]

Each step in a row reduction can be achieved by multiplication by a matrix.
Step | the row operations | the matrix used | the matrix inverse |
---|---|---|---|
1st | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_3}R, \\ \sideset{_n}{_2}R \to \frac{1}{2}\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to \frac{1}{3} \sideset{_o}{_1}R \end{array}$ | $M_1 = \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right]$ | $(M_1)^{-1} = \left[\!\begin{array}{rrr} 0 & 0 & 3 \\ 0 & 2 & 0 \\ 1 & 0 & 0 \\ \end{array}\right]$ |
2nd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to (-1)\sideset{_o}{_1}R+\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to (-1)\sideset{_o}{_1}R + \sideset{_o}{_3}R \end{array}$ | $M_2 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right]$ | $(M_2)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right]$ |
3rd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R + 2 \sideset{_o}{_2}R, \\ \sideset{_n}{_2}R \to (-1)\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to -\frac{5}{3} \sideset{_o}{_2}R + \sideset{_o}{_3}R \end{array}$ | $M_3 = \left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right]$ | $(M_3)^{-1} = \left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right]$ |
4th | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to \sideset{_o}{_2}R + (-3)\sideset{_o}{_3}R,\\ \sideset{_n}{_3}R \to 6 \sideset{_o}{_3}R \end{array}$ | $M_4 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right]$ | $(M_4)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & \frac{1}{6} \end{array}\right]$ |
The importance of carefully keeping track of each matrix is that we can calculate the single matrix which performs the above Row Reduction:
\[ M_4 M_3 M_2 M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right]\left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right] \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right] = \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] \] You can verify that \[ \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] \left[\!\begin{array}{rrrrr} 3 & 1 & 5 & 1 & 2 \\ 2 & 2 & 2 & 1 & 4 \\ 1 & 2 & 0 & 1 & 5 \end{array}\right] = \left[\!\begin{array}{rrrrr} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 4 \end{array}\right]. \]

It is important to observe that forming the above three linear combinations is straightforward because of the positioning of the leading $1$s in the rows of the RREF.
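If you want to check this bookkeeping on a computer, here is a short Python sketch of mine that multiplies out $M_4 M_3 M_2 M_1$ using exact fractions and applies the result to the original matrix.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# The four elementary matrices from the table above (exact fractions).
M1 = [[F(0), F(0), F(1)], [F(0), F(1, 2), F(0)], [F(1, 3), F(0), F(0)]]
M2 = [[F(1), F(0), F(0)], [F(-1), F(1), F(0)], [F(-1), F(0), F(1)]]
M3 = [[F(1), F(2), F(0)], [F(0), F(-1), F(0)], [F(0), F(-5, 3), F(1)]]
M4 = [[F(1), F(0), F(0)], [F(0), F(1), F(-3)], [F(0), F(0), F(6)]]

# The single matrix that performs the whole row reduction.
E = matmul(M4, matmul(M3, matmul(M2, M1)))
print([[int(x) for x in row] for row in E])
# [[0, 1, -1], [-1, 2, -1], [2, -5, 4]]

# Applying E to the original matrix produces the RREF from the verification above.
A = [[3, 1, 5, 1, 2],
     [2, 2, 2, 1, 4],
     [1, 2, 0, 1, 5]]
print([[int(x) for x in row] for row in matmul(E, A)])
# [[1, 0, 2, 0, -1], [0, 1, -1, 0, 1], [0, 0, 0, 1, 4]]
```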
I wanted to provide a colorful beginning of the New Year and the Winter Quarter, so I started the class by talking about the colors in relation to linear algebra. I love the application of vectors to COLORS so much that I wrote a webpage to celebrate it: Color Cube.
It is important to point out that in the red-green-blue coloring scheme, the following eighteen colors stand out. I present them in six steps with three colors in each step.
You:
Can you please write a complete LaTeX file with instructions on using basic mathematical operations, like fractions, sums, integrals, basic functions, like cosine, sine, and exponential function, and how to structure a document and similar features? Please explain the difference between the inline and displayed mathematical formulas. Please include examples of different ways of formatting displayed mathematical formulas. Please include what you think would be useful to a mathematics student. Also, can you please include your favorite somewhat complicated mathematical formula as an example of the power of LaTeX? I emphasize I want a complete file that I can copy into the LaTeX compiler and compile into a pdf file. Please ensure that your document contains the code for the formulas you are writing, which displays both as code separately from compiled formulas. Also, please double-check that your code compiles correctly. Remember that I am a beginner and cannot fix the errors. Please act as a concerned teacher would do.
This is the LaTeX document that ChatGPT produced based on the above prompt. Here is the compiled PDF document.
You can ask ChatGPT for specific LaTeX advice. To get a good response, think carefully about your prompt. Also, you can offer ChatGPT a sample of short mathematical writing from the web or a book as a PNG file, and it will convert that writing to LaTeX. You can even try with neat handwriting. The results will of course depend on the clarity of the file, and ChatGPT makes mistakes, but I have found it incredibly useful.