The narrative is missing for the above picture. Also, I want to make the picture of the action of $A^\top$. I already made the left-hand side picture:
The right-hand side picture should be simpler to make; it will follow soon.
Place the cursor over the image to start the animation.
Five of the above level surfaces at different levels of opacity.
The reasoning here is the same as in a multivariable calculus course when we figure out the shape of the surface given by the equation \(-x^2-y^2+5z^2 = 0\), or \[ |z| = \frac{1}{\sqrt{5}} \sqrt{x^2+y^2}. \] This cone is obtained by rotating the line \(z = \frac{1}{\sqrt{5}}x, y = 0\) about the \(z\)-axis. Or, in the language of vectors, by rotating the line spanned by the vector \(\sqrt{5} \mathbf{i}+\mathbf{k}\) about the vector \(\mathbf{k}\).
Below is a graph of the cone \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} \] with the unit vectors \(\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\) in blue and the line spanned by the vector \[ \sqrt{5} \mathbf{u}_1 + \mathbf{u}_3 = \sqrt{5} \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right] + \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] in red. It is interesting to identify the blue vector in the picture below which corresponds to \(\mathbf{u}_3\), since the rotation of the red line about this vector creates the cone. Notice that \(\mathbf{u}_3\) is the only blue vector with all three coordinates positive. Also notice that the positive directions of the $x_1$-axis and the $x_2$-axis point away from the observer.
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.
In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
Theorem. All eigenvalues of a symmetric matrix are real.
Background Knowledge. Since we are working with complex numbers, in the background knowledge we will review basic operations with complex numbers and matrices with complex entries.
BK1. Let $n\in\mathbb{N}$. An $n\times n$ matrix $A$ is said to be symmetric if all entries of $A$ are real numbers and $A^\top = A$.
BK2. In this item we state properties of the transpose operation for matrices with complex entries. We are familiar with these properties for matrices with real entries. The same properties hold for matrices with complex entries. Let $k,l,m,n \in\mathbb{N}$. Let $X$ be a $k\times l$ matrix, let $Y$ be an $l\times m$ matrix, and $Z$ be an $m\times n$ matrix.
(a) Then \[ (XY)^\top = Y^\top X^\top \quad \text{and} \quad (XYZ)^\top = Z^\top Y^\top X^\top. \] (b) If $\alpha$ is a complex number, then we have \[ (\alpha X)^\top = \alpha X^\top, \quad (\alpha X) Y = X (\alpha Y) = \alpha (XY). \]
BK3. As you can see from my summary on Complex Numbers, the conjugation is an important operation with complex numbers. Recall, first, the imaginary unit $i$ is the complex number such that $i^2 = -1$. Second, for a complex number \[ z = x + y\mkern 2mu i \quad \text{where} \quad x, y \in \mathbb{R}, \] the conjugate of $z$, denoted by $\overline{z}$, is the complex number \[ \overline{z} = x - y\mkern 2mu i. \] Notice that the complex numbers $z$ and its conjugate $\overline{z}$ are reflections of each other across the real axis, see the first picture in Complex Numbers.
Now, three important features of the conjugate.
(a) product of a complex number and its conjugate \[ z \kern 2mu \overline{z} = (x + y\mkern 2mu i) (x - y\mkern 2mu i ) = x^2 + y^2 = |z|^2 \geq 0 \] is nonnegative. In the last equality, $|z|$ is the modulus of the complex number $z$. By definition \[ |z|=\sqrt{x^2 + y^2}. \] (b) We have $z \kern 1mu \overline{z} = |z|^2 = 0$ if and only if $z = 0$.
(c) We have $z = \overline{z}$ if and only if $z \in\mathbb{R}$, that is if $z$ is a real number.
BK4. The operation of complex conjugation extends to matrices. For a matrix $X$, by $\overline{X}$ we denote the matrix in which all the entries of $X$ are replaced by their conjugates. Let $k,l,m \in\mathbb{N}$, let $X$ be a $k\times l$ matrix, let $Y$ be an $l\times m$ matrix, and let $\alpha$ be a complex number.
(a) The following algebraic rules hold for the operation of conjugation \[ \overline{\alpha \mkern 1.5mu X} = \overline{\alpha} \mkern 2mu \overline{X} \quad \text{and} \quad \overline{X Y} = \overline{X} \mkern 2mu \overline{Y}. \] (b) All the entries of a matrix $X$ are real if and only if $X = \overline{X}$.
(c) Let $n\in\mathbb{N}$ and let $\mathbf{v} \in \mathbb{C}^n$. Then for $\mathbf{v}$ and its conjugate $\overline{\mathbf{v}}$ we have \[ \mathbf{v} = \left[\!\begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array}\!\right], \qquad \overline{\mathbf{v}} = \left[\!\begin{array}{c} \overline{v_1} \\ \overline{v_2} \\ \vdots \\ \overline{v_n} \end{array}\!\right], \] and for the product of the $1\times n$ matrix $\mathbf{v}^\top$ and the $n\times 1$ matrix $\overline{\mathbf{v}}$ we have \begin{align*} \mathbf{v}^\top \overline{\mathbf{v}} & = \bigl[\!\begin{array}{cccc} v_1 & v_2 & \cdots & v_n \end{array}\!\bigr] \left[\!\begin{array}{c} \overline{v_1} \\ \overline{v_2} \\ \vdots \\ \overline{v_n} \end{array}\!\right] \\ & = v_1 \overline{v_1} + v_2 \overline{v_2} + \cdots + v_n \overline{v_n} \\ & = |v_1 |^2 + |v_2 |^2 + \cdots + | v_n |^2 \geq 0. \end{align*} If $\mathbf{v}$ is a nonzero vector, then at least one of the nonnegative numbers \[ |v_1 |, \ |v_2 |, \ \ldots , \ | v_n | \] is positive. Therefore, if $\mathbf{v}$ is a nonzero vector, then $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0$.
Proof. Let $n\in\mathbb{N}$ and let $A$ be an $n\times n$ symmetric matrix. Then $A = A^\top$ and all entries of $A$ are real. Let $\lambda$ be an eigenvalue of $A$ and let $\mathbf{v}$ be a corresponding eigenvector. Then, $\mathbf{v}$ is a nonzero vector and \[ A \mathbf{v} = \lambda \mathbf{v}. \] By BK4(a) we have \[ \overline{A} \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Since $A$ is a real matrix, by BK4(b), we have $A = \overline{A}$. Therefore, the preceding displayed equality becomes \[ A \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Let us now calculate, using BK2(a), associativity of matrix multiplication, and the fact that $A = A^\top$, \begin{equation*} (A \mathbf{v})^\top \overline{\mathbf{v}} = (\mathbf{v}^\top A^\top) \overline{\mathbf{v}} = \mathbf{v}^\top ( A^\top \overline{\mathbf{v}}) = \mathbf{v}^\top ( A \overline{\mathbf{v}}). \end{equation*} Thus $(A \mathbf{v})^\top \overline{\mathbf{v}} = \mathbf{v}^\top ( A \overline{\mathbf{v}})$. Substituting here the equalities $A \mathbf{v} = \lambda \mathbf{v}$ and $A \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}$, we obtain \[ (\lambda \mathbf{v})^\top \overline{\mathbf{v}} = \mathbf{v}^\top \bigl( \overline{\lambda} \overline{\mathbf{v}}\bigr). \] Applying BK2(b) to the last equality we get \[ \lambda (\mathbf{v}^\top \overline{\mathbf{v}}) = \overline{\lambda} (\mathbf{v}^\top \overline{\mathbf{v}}). \] Since $\mathbf{v}$ is a nonzero vector, by BK4(c) we have that $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0$. Therefore, the last equality yields \[ \lambda = \overline{\lambda} . \] By BK3(c), we deduce that $\lambda$ is real. QED.
The proof in the preceding item involves a lot of algebra with complex numbers. Here I present a proof that the eigenvalues of a $2\!\times\!2$ symmetric matrix are real. This proof is based on the quadratic formula.
Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. To calculate the eigenvalues of $A$ we solve the equation $\det(A-\lambda I) =0$ for $\lambda$. That is, we solve: \[ 0 = \left| \begin{matrix} a - \lambda & b \\ b & d -\lambda \end{matrix} \right| = (a-\lambda)(d-\lambda) - b^2 = \lambda^2 -(a+d)\lambda + ad -b^2. \] The discriminant of the preceding quadratic equation is: \begin{align*} (a+d)^2 -4(ad-b^2) &= a^2 + 2ad + d^2 - 4ad +4 b^2 \\ & = a^2 - 2ad + d^2 +4 b^2 \\ & = (a-d)^2+4b^2 \geq 0. \end{align*} Solving for $\lambda$ we get \[ \lambda_{1,2} = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a+d)^2 - 4 (ad-b^2)} \Bigr) = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since the discriminant is nonnegative, that is $(a-d)^2 + 4 b^2 \geq 0$, both eigenvalues are real. In fact, if $(a-d)^2 + 4 b^2 = 0$, then $b = 0$ and $a=d$, so our matrix is a multiple of the identity matrix. Otherwise, that is if $(a-d)^2 + 4 b^2 \gt 0$, the symmetric matrix has two distinct eigenvalues \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr) \lt \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \]
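Here is a quick numerical check of the preceding formula in Mathematica; the entries $a=3$, $b=-2$, $d=7$ are sample values of my own choosing, not taken from the text above.

(* a sample symmetric 2x2 matrix with a = 3, b = -2, d = 7 *)
A = {{3, -2}, {-2, 7}};
(* the eigenvalues according to the formula derived above *)
formula = {3 + 7 - Sqrt[(3 - 7)^2 + 4 (-2)^2], 3 + 7 + Sqrt[(3 - 7)^2 + 4 (-2)^2]}/2;
N[formula]                (* {2.17157, 7.82843} *)
Sort[Eigenvalues[N[A]]]   (* {2.17157, 7.82843}, the same two real numbers *)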
Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.
Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix. Let $\lambda$ and $\mu$ be eigenvalues of $A$ and let $\mathbf{u}$ and $\mathbf{v}$ be corresponding eigenvectors. Then $\mathbf{u} \neq \mathbf{0},$ $\mathbf{v} \neq \mathbf{0}$ and \[ A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. \] Assume that \[ \lambda \neq \mu. \] Next we calculate the same dot product in two different ways; here we use the fact that $A^\top = A$ and algebra of the dot product. The first calculation: \[ (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}). \] The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}). \end{align*} Since \[ (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}), \] we conclude that \[ \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. \] Therefore \[ ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. \] Since we assume that $ \lambda - \mu \neq 0,$ the previous displayed equality yields \[ \mathbf{u}\cdot\mathbf{v} = 0. \] This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal. QED.
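As an illustration of the theorem, here is a small Mathematica computation with a symmetric matrix of my own choosing (it does not appear in the text above); it is only an example, not a substitute for the proof.

A = {{2, 1}, {1, 2}};          (* a sample symmetric matrix *)
{vals, vecs} = Eigensystem[A];
vals                           (* the two distinct eigenvalues, 3 and 1 *)
vecs[[1]] . vecs[[2]]          (* 0, so the corresponding eigenvectors are orthogonal *)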
Theorem. A symmetric $2\!\times\!2$ matrix is orthogonally diagonalizable.
Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. We need to prove that there exists an orthogonal $2\!\times\!2$ matrix $U$ and a diagonal $2\!\times\!2$ matrix $D$ such that $A = UDU^\top.$ The eigenvalues of $A$ are \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since clearly \[ (a-d)^2 + 4 b^2 \geq 0, \] the eigenvalues $\lambda_1$ and $\lambda_2$ are real numbers.
If $\lambda_1 = \lambda_2$, then $(a-d)^2 + 4 b^2 = 0$, and consequently $b= 0$ and $a=d$; that is $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$. Hence $A = UDU^\top$ holds with $U=I_2$ and $D = A$.
Now assume that $\lambda_1 \neq \lambda_2$. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$ and let $\mathbf{u}_2$ be a unit eigenvector corresponding to $\lambda_2$. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since $A$ is symmetric, $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal, that is the matrix $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}$ is orthogonal. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are eigenvectors of $A$ we have \[ AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. \] Therefore $A=UDU^\top.$ This proves that $A$ is orthogonally diagonalizable. QED.
Second Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. If $b=0$, then an orthogonal diagonalization is \[ \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \] Assume that $b\neq0.$ For the given $a,b,d \in \mathbb{R},$ introduce three new coordinates $z \in \mathbb{R},$ $r \in (0,+\infty),$ and $\theta \in (0,\pi)$ such that \begin{align*} z & = \frac{a+d}{2}, \\ r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\ \cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}. \end{align*} The reader will notice that these coordinates are very similar to the cylindrical coordinates in $\mathbb{R}^3.$ It is now an exercise in matrix multiplication and trigonometry to calculate \begin{align*} & \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} (z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2 & (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\ (z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) & (z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} z + r \cos(2\theta) & r \sin(2\theta) \\ r \sin(2\theta) & z - r \cos(2\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \frac{a+d}{2} + \frac{a-d}{2} & b \\ b & \frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \end{align*} QED.
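The trigonometric identity used in the second proof can also be checked symbolically in Mathematica; below t plays the role of $\theta$, and $z$, $r$ are kept symbolic.

rot = {{Cos[t], -Sin[t]}, {Sin[t], Cos[t]}};
dia = {{z + r, 0}, {0, z - r}};
Simplify[rot . dia . Transpose[rot]]
(* {{z + r Cos[2 t], r Sin[2 t]}, {r Sin[2 t], z - r Cos[2 t]}}, as claimed above *)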
Theorem. For every positive integer $n$, a symmetric $n\!\times\!n$ matrix is orthogonally diagonalizable.
Proof. (You can skip this proof.) This statement can be proved by Mathematical Induction. The base case $n = 1$ is trivial. The case $n=2$ is proved above. To get a feel how mathematical induction proceeds we will prove the theorem for $n=3.$
Let $A$ be a $3\!\times\!3$ symmetric matrix. Then $A$ has an eigenvalue, which must be real. Denote this eigenvalue by $\lambda_1$ and let $\mathbf{u}_1$ be a corresponding unit eigenvector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be unit vectors such that the vectors $\mathbf{u}_1,$ $\mathbf{v}_1,$ and $\mathbf{v}_2$ form an orthonormal basis for $\mathbb R^3.$ Then the matrix $V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr]$ is an orthogonal matrix and we have \[ V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] Since $A = A^\top$, $A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ and since $\mathbf{u}_1$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$ we have \[ \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, \] and \[ \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. \] Hence, \[ \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] By the already proved theorem for $2\!\times\!2$ symmetric matrices there exists an orthogonal matrix $\begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}$ and a diagonal matrix $\begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix}$ such that \[ \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. \] Substituting this equality in (**) and using some matrix algebra we get \[ V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top. \] Setting \[ U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \] we have that $U$ is an orthogonal matrix, $D$ is a diagonal matrix and $A = UDU^\top.$ This proves that $A$ is orthogonally diagonalizable. QED.
I want to emphasize the importance of Theorem 2 in Section 7.1. The "only if" part of the theorem below is part (d) of Theorem 3. The "if" part of the theorem below is Exercise 25.
Theorem. Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then, $A$ is symmetric if and only if it is orthogonally diagonalizable.
Theorem 3(d) says: Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then: If $A$ is symmetric, then $A$ is orthogonally diagonalizable.
Exercise 25 says: Let $n$ be a positive integer and let $A$ be an $n\times n$ matrix. Then: If $A$ is orthogonally diagonalizable, then $A$ is symmetric.
To prove Exercise 25, we proceed as follows. Assume that $A$ is orthogonally diagonalizable. Then there exist an orthogonal matrix $U$ and a diagonal matrix $D$ such that $A=UDU^\top$. To prove that $A$ is symmetric we calculate $A^\top$: \[ A^\top = \bigl( UDU^\top \bigr)^\top = \bigl(U^\top\bigr)^\top D^\top U^\top = UDU^\top. \] In the last calculation, we used the property of the transpose that $(XY)^\top = Y^\top X^\top$ and consequently \[ (XYZ)^\top = \bigl((XY)Z\bigr)^\top = Z^\top (XY)^\top = Z^\top Y^\top X^\top. \] We also used that every diagonal matrix is symmetric, that is $D^\top = D$, and that a repeated transpose does not change a matrix, that is $\bigl(X^\top\bigr)^\top = X$.
The theorem in the first item above is essential for solving Exercise 33:
Suppose \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \). Explain why \( AB \) is also orthogonally diagonalizable.
In fact, stated more mathematically, the exercise should have been:
If \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \), prove that \( AB \) is also orthogonally diagonalizable.
Proof goes like this. Assume that \( A \) and \( B \) are both orthogonally diagonalizable and \( AB = BA \). By the "if" part of the theorem stated in the first item (that is by Exercise 25) we conclude that \( A \) and \( B \) are both symmetric. Now we will use the fact that $A=A^\top$ and $B=B^\top$ and the fact that $A$ and $B$ commute to prove that $AB$ is symmetric: \[ (AB)^\top = (BA)^\top = A^\top B^\top = AB. \] Thus, $AB$ is symmetric. Now the "only if" part of the theorem stated in the first item (that is Theorem 3(d)) implies that $AB$ is orthogonally diagonalizable.
(We will talk more about this in class.) We will develop an alternative way of writing a matrix $A$ as a linear combination of orthogonal projections onto the eigenspaces of $A$.
The columns of \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] form an orthonormal basis for $\mathbb{R}^3$ which consists of unit eigenvectors of $A.$
The first two columns \[ \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \] form an orthonormal basis for the eigenspace of $A$ corresponding to $-1.$ The last column \[ \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \] is an orthonormal basis for the eigenspace of $A$ corresponding to $8.$
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $-1$ is \[ P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] \]
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $8$ is \[ P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right]. \]
Since the eigenvectors that we used above form a basis for $\mathbb{R}^3$ we have \[ P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3. \]
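The calculations in the preceding items can be repeated in Mathematica. Note that the matrix $A$ to which these items refer is not displayed above; since its eigenvalues are $-1$ and $8$ with the stated eigenspaces, it can be reconstructed as $A = (-1)P_{-1} + 8\mkern 2mu P_{8}$, which is what the last line below computes.

P1 = (1/9) {{5, -4, -2}, {-4, 5, -2}, {-2, -2, 8}};   (* the projection onto the eigenspace for -1 *)
P8 = (1/9) {{4, 4, 2}, {4, 4, 2}, {2, 2, 1}};         (* the projection onto the eigenspace for 8 *)
P1 + P8 == IdentityMatrix[3]                          (* True *)
{P1 . P1 == P1, P8 . P8 == P8}                        (* {True, True}: both matrices are projections *)
-P1 + 8 P8                                            (* {{3, 4, 2}, {4, 3, 2}, {2, 2, 0}} *)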
In the pictures below, the positive direction of the vertical axes points downward for aesthetic reasons.
The intersection of the canonical rotated paraboloid $z=x^2+y^2$ and a plane $z=ax+by+c$ is an ellipse, provided that $a^2+b^2+4c>0$. The projection of this ellipse onto the $xy$-plane is always a circle. Always! I find this amazing!
Notice that the picture below is "upside-down,"
with the positive direction of the $z$-axis pointing downwards.
How would we determine the intersection of the paraboloid and the plane? Recall, the paraboloid is the set of points $(x,y, x^2 + y^2)$, while the plane is the set of points $(x,y, ax+by+c)$. For a point $(x,y,z)$ to be both on the paraboloid and on the plane we must have \[ x^2 + y^2 = a x + b y + c. \] Which points $(x,y)$ in the $xy$-plane satisfy the preceding equation? Rewrite the equation as \[ x^2 - a x + y^2 - b y = c, \] and completing the squares comes to the rescue, so we obtain the following equation: \[ \boxed{\left(x - \frac{a}{2} \right)^2 + \left(y-\frac{b}{2}\right)^2 = \frac{a^2}{4} + \frac{b^2}{4} + c.} \] The boxed equation is the equation of a circle in the $xy$-plane centered at the point $(a/2,b/2)$ with radius $\displaystyle\frac{1}{2}\sqrt{a^2+b^2+4c}$. Above this circle, in the plane $z=ax+by+c$, lies the ellipse which is also on the paraboloid $z=x^2+y^2$. We call the circle whose equation is boxed the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=ax+by+c$.
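As a concrete illustration, with the sample values $a = 2$, $b = 4$, $c = 4$ (my own choice, not taken from the text) the boxed equation becomes \[ (x-1)^2 + (y-2)^2 = 1 + 4 + 4 = 9, \] so the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=2x+4y+4$ is centered at $(1,2)$ and has radius $3 = \frac{1}{2}\sqrt{4+16+16}$.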
Below I will describe in more detail the method of finding the best fit circle for a given set of points. The method is identical to finding the least-squares fit plane to a set of given points; in this case all the given points lie on the canonical rotated paraboloid $z = x^2+y^2.$
Let $n$ be a positive integer greater than $2.$ Assume that we are given $n$ noncollinear points in $\mathbb{R}^2$: \[ (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n). \] Lifting these points to the paraboloid $z = x^2+y^2$ and asking for a plane $z = \beta_0 + \beta_1 x + \beta_2 y$ which contains all the lifted points leads to the (in general inconsistent) system \[ \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \\ \beta_2 \end{array} \right] = \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right], \] whose least-squares solution gives the best fit circle.
The normal equations for the system from the preceding item are \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \]
You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair consisting of the best circle's center and the best circle's radius.

Clear[BestCir, gpts, mX, vY, abc];
BestCir[gpts_] := Module[{mX, vY, abc},
  (* the design matrix whose rows are {x_j, y_j, 1} *)
  mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]];
  (* the right-hand side: the values x_j^2 + y_j^2 *)
  vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts;
  (* solve the normal equations by row reducing the augmented matrix [mX^T.mX | mX^T.vY] *)
  abc = Last[Transpose[RowReduce[
      Transpose[Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY]]]]];
  (* the center and the radius of the best fit circle *)
  {{abc[[1]]/2, abc[[2]]/2},
   Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]}
]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4, 3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}};
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = {{3, 1}, {2, -4}, {-2, 3}};
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell (after executing the definition of BestCir above) and execute it. The result will be the following image:

mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} + 1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@ Range[100]));
cir = N[BestCir[mypts]];
Graphics[{
   {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
   {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
  GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
  GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
  Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
  Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
Notice that the four given points $(2,3),$ $(3,2),$ $(5,1),$ $(6,0)$ form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share the midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \[ \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = 3/2. \] The long sides of this parallelogram are on the parallel lines \[ y = -\frac{2}{3}x +4 \quad \text{and} \quad y = -\frac{2}{3}x + \frac{13}{3}. \] It is natural to guess that the least-squares line is the line which is parallel to these two lines and halfway between them. That is the line \[ y = -\frac{2}{3}x + \frac{25}{6}. \] This line is the red line in the picture below. Clearly this line goes through the point $(4,3/2),$ the intersection of the diagonals of the parallelogram.
The only way to verify the guess from the preceding item is to calculate the least-squares line for these four points. Let us find the least-squares solution of the equation \[ \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] To get to the corresponding normal equations we multiply both sides by $X^\top$: \[ \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] The corresponding normal equations are \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. \] Since the inverse of the above $2\!\times\!2$ matrix is \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], \] the solution of the normal equations is unique and it is given by \[ \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]. \]
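The computation above can be confirmed with Mathematica's built-in least-squares solver:

mX = {{1, 2}, {1, 3}, {1, 5}, {1, 6}};
vY = {3, 2, 1, 0};
LeastSquares[mX, vY]    (* {43/10, -7/10}, that is, beta0 = 43/10 and beta1 = -7/10 *)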
In the image below the forest green points are the given data points.
The red line is the line which I guessed could be the least-squares line.
The blue line is the true least-squares line.
It is amazing that what we observed in the preceding example is universal.
Proposition. If the line $y = \beta_0 + \beta_1 x$ is the least-squares line for the data points \[ (x_1,y_1), \ldots, (x_n,y_n), \] then $\overline{y} = \beta_0 + \beta_1 \overline{x}$, where \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \]
The above proposition is Exercise 20 (5th edition Exercise 14) in Section 6.6.

Proof. Let \[ (x_1,y_1), \ldots, (x_n,y_n), \] be given data points and set \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \] Let $y = \beta_0 + \beta_1 x$ be the least-squares line for the given data points. Then the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ satisfies the normal equations \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] Multiplying the second matrix on the left-hand side and the third vector we get \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] The above equality is an equality of vectors with two components. The top components of these vectors are equal: \[ (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. \] Therefore \[ n \beta_0 + \beta_1 (x_1+x_2 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. \] Dividing by $n$ we get \[ \beta_0 + \beta_1 \frac{1}{n} (x_1+x_2 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). \] Hence \[ \overline{y} = \beta_0 + \beta_1 \overline{x}. \] QED.
In the image below the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the $xy$-plane. The teal points are the corresponding points in the least-squares plane.
Theorem. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then \[ \operatorname{Nul}(A) = \operatorname{Nul}\bigl(A^\top\!A\bigr). \]
Background Knowledge established in Math 204 is as follows.
BK1. Matrix multiplication is associative. Let $j, k, l, m$ be positive integers. Let $X$ be a $j\times k$ matrix, $Y$ be a $k \times l$ matrix, and $Z$ be an $l \times m$ matrix. Then $XY$ is $j\times l$ matrix, $YZ$ is $k\times m$ and $(XY)Z = X(YZ)$.
BK2. How to calculate the transpose of a product of two matrices. Let $j, k, l$ be positive integers. Let $X$ be a $j\times k$ matrix, and let $Y$ be a $k \times l$ matrix. Then $(XY)^\top = Y^\top\mkern -2mu X^\top$. A mnemonic tool to internalize this rule is to write the sizes of all the matrices that are involved in this equality: $XY$ is a $j\times l$ matrix, $(XY)^\top$ is an $l \times j$ matrix, $X^\top$ is a $k\times j$ matrix, $Y^\top$ is an $l\times k$ matrix. We need a formula for the $l \times j$ matrix $(XY)^\top$ using the $k\times j$ matrix $X^\top$ and the $l\times k$ matrix $Y^\top$. The only way to multiply $X^\top$ and $Y^\top$ is $Y^\top\mkern -2mu X^\top$ and in this way we get an $l \times j$ matrix. This is not a proof, just a mnemonic tool. For a proof, one needs to compare the entries of both matrices in the equality.
BK3. Let $n$ be a positive integer. The only vector in $\mathbb{R}^n$ whose norm is zero is the zero vector. That is for $\mathbf{v} \in \mathbb{R}^n$ we have $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}_n$.
Proof. The set equality $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$ means \[ \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). \] As with all equivalences, we prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Nul}(A)$. Then $A\mathbf{x} = \mathbf{0}$. Consequently, using BK1, we have \[ (A^\top\!A)\mathbf{x} = A^\top ( \!A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. \] Hence, $(A^\top\!A)\mathbf{x}= \mathbf{0}$, and therefore $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). \]
Step 2. In this step, we prove the converse: \[ \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). \] Assume that $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Then, $(A^\top\!\!A) \mathbf{x} = \mathbf{0}$. Multiplying the last equality by $\mathbf{x}^\top$ we get $\mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0$. Using the associativity of the matrix multiplication, that is BK1, we obtain $(\mathbf{x}^\top\!\! A^\top) (A \mathbf{x}) = 0$. Using the algebra of the transpose operation, that is BK2, we get $(A \mathbf{x})^\top\! (A \mathbf{x}) = 0$. Now recall that for every vector $\mathbf{v}$ we have $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$. Thus, we have proved that $\|A\mathbf{x}\|^2 = 0$. Now recall BK3, that the only vector whose norm is $0$ is the zero vector, to conclude that $A\mathbf{x} = \mathbf{0}_n$. This means $\mathbf{x} \in \operatorname{Nul}(A)$. This completes the proof of implication (*). The theorem is proved. QED.
(It is customary to end mathematical proofs with the abbreviation QED, see its Wikipedia entry.)
In Step 2 of the preceding proof, the idea introduced in the sentence which starts with the highlighted text is a truly brilliant one. It is a pleasure to share these brilliant mathematical vignettes with you.
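Here is a small Mathematica illustration of the theorem with one concrete matrix of my own choosing; it is only an example, not a substitute for the proof.

A = {{1, 2, 3}, {2, 4, 6}};                    (* a 2x3 matrix with a nontrivial null space *)
NullSpace[A] == NullSpace[Transpose[A] . A]    (* True: for this matrix both calls even return the same basis *)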
The preceding theorem has an important corollary.
Corollary 1. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then the columns of $A$ are linearly independent if and only if the matrix $A^\top\!A$ is invertible.
Background Knowledge established in Math 204 is as follows.
BK1. The columns of $A$ are linearly independent if and only if $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$.
BK2. Notice that the matrix $A^\top\!\! A$ is a square $m\!\times\!m$ matrix. By the Invertible Matrix Theorem, the matrix $A^\top\!\! A$ is invertible if and only if $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$.
Proof. By BK1 the columns of $A$ are linearly independent if and only if $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$. By the Theorem we have $\operatorname{Nul}(A) = \operatorname{Nul}(A^\top\!\! A )$. Therefore $\operatorname{Nul}(A)=\{\mathbf{0}_m\}$ if and only if $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$. By BK2, $\operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}$ if and only if the matrix $A^\top\!\! A$ is invertible.
Based on the three equivalences stated in the preceding paragraph, we have proved the corollary. QED.
Corollary 2. Let $A$ be an $n\!\times\!m$ matrix. Then $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$.
Proof 1. We established the following equalities earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \end{align*} In the above Theorem we proved that the following subspaces are equal \[ \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). \] Equal subspaces have equal orthogonal complements. Therefore \[ \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \] Since earlier we proved \[ \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, \] the last three equalities imply \[ \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). \] The corollary is proved. QED.
Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the existence of an orthogonal projection onto the column space of $A$.) The set equality $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$ means \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] We will prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Then there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\mathbf{x} = (A^\top\!\!A)\mathbf{v}.$ Since by the definition of matrix multiplication we have $(A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v})$, we have $\mathbf{x} = A^\top\!(A\mathbf{v}).$ Consequently, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top). \]
Step 2. Now we prove the converse: \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). \] Assume, $\mathbf{x} \in \operatorname{Col}(A^\top).$ By the definition of the column space of $A^\top$, there exists $\mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{x} = A^\top\!\mathbf{y}.$ Let $\widehat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Col}(A).$ That is $\widehat{\mathbf{y}} \in \operatorname{Col}(A)$ and $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}.$ Since $\widehat{\mathbf{y}} \in \operatorname{Col}(A),$ there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\widehat{\mathbf{y}} = A\mathbf{v}.$ Since $\bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top),$ the relationship $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}$ yields $A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}.$ Consequently, since $\widehat{\mathbf{y}} = A\mathbf{v},$ we deduce $A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}.$ Hence \[ \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. \] This proves that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Thus, the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \] is proved. The corollary is proved. QED.
Corollary 3. Let $A$ be an $n\!\times\!m$ matrix. The matrices $A$, $A^\top$, $A^\top\!\! A$ and $A A^\top$ have the same rank. Indeed, by Corollary 2 we have $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$, so $A^\top\!\! A$ and $A^\top$ have the same rank; since the rank of a matrix equals the rank of its transpose, $A$ and $A^\top$ have the same rank; and applying the first claim to $A^\top$ in place of $A$ shows that $A A^\top$ and $A$ have the same rank.
Question. Given two vectors in $\mathbb{R}^3$ we learned how to calculate the area of the parallelogram spanned by these two vectors. In a multivariable calculus class we used the cross product for this calculation. Given three vectors in $\mathbb{R}^3$ we learned how to calculate the volume of the parallelepiped spanned by these three vectors. We used the determinant of the $3\times 3$ matrix whose columns are the given three vectors. If we are given three vectors in $\mathbb{R}^4$, is there a way to calculate the volume of the three-dimensional parallelepiped spanned by these three vectors?
Straightforward projections onto the span. Let $\mathbf{u}_1, \ldots, \mathbf{u}_m$ be an orthonormal set of vectors in $\mathbb{R}^n$, let $\mathcal{W} = \operatorname{Span}\bigl\{\mathbf{u}_1, \ldots, \mathbf{u}_m\bigr\}$, and let $\mathbf{y} \in \mathbb{R}^n$. Then the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ is given by \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}}(\mathbf{y}) = \left( \mathbf{y}\cdot \mathbf{u}_1 \right) \mathbf{u}_1 + \left( \mathbf{y}\cdot \mathbf{u}_2 \right) \mathbf{u}_2 + \cdots + \left( \mathbf{y}\cdot \mathbf{u}_m \right) \mathbf{u}_m.} \]
The $QR$ factorization of a matrix is just the Gram-Schmidt orthogonalization algorithm for the columns of $A$ written in matrix form. The only difference is that the Gram-Schmidt orthogonalization algorithm produces orthogonal vectors which we have to normalize to obtain the matrix $ Q$ with orthonormal columns.
Example. A nice example is given by calculating $QR$ factorization of the $3\!\times\!2$ matrix \[ A = \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right]. \]
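For the record, here is one way this computation can go; the intermediate names $\mathbf{x}_1, \mathbf{x}_2, \mathbf{v}_2, \mathbf{q}_1, \mathbf{q}_2$ are mine. Applying the Gram-Schmidt algorithm to the columns $\mathbf{x}_1$ and $\mathbf{x}_2$ of $A$ gives \[ \mathbf{q}_1 = \frac{1}{\|\mathbf{x}_1\|}\, \mathbf{x}_1 = \frac{1}{3} \left[ \begin{array}{r} 1 \\ 2 \\ 2 \end{array}\right], \qquad \mathbf{v}_2 = \mathbf{x}_2 - \bigl(\mathbf{x}_2 \cdot \mathbf{q}_1\bigr)\, \mathbf{q}_1 = \left[ \begin{array}{r} 1 \\ 4 \\ 3 \end{array}\right] - 5\, \mathbf{q}_1 = \frac{1}{3} \left[ \begin{array}{r} -2 \\ 2 \\ -1 \end{array}\right], \] and $\|\mathbf{v}_2\| = 1$, so $\mathbf{q}_2 = \mathbf{v}_2$. Therefore \[ A = QR = \left[ \begin{array}{rr} \frac{1}{3} & -\frac{2}{3} \\[2pt] \frac{2}{3} & \frac{2}{3} \\[2pt] \frac{2}{3} & -\frac{1}{3} \end{array}\right] \left[ \begin{array}{rr} 3 & 5 \\[2pt] 0 & 1 \end{array}\right]. \]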
In the next example, I will demonstrate a useful simplification strategy when calculating the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and so on. Following the given formulas, calculating the vectors $\mathbf{v}_k$ will frequently involve fractions and make the arithmetic of the subsequent calculations more difficult. Recall, the objective here is to produce an orthogonal set of vectors while keeping the running spans equal.
To simplify the arithmetic, at each step of the Gram-Schmidt algorithm, we can replace a vector $\mathbf{v}_k$ by its scaled version $\alpha \mathbf{v}_k$ with a conveniently chosen $\alpha \gt 0$.
In this way we can avoid fractions in vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3.$ In the next item I present an example.
Definition. Let $\mathcal{W}$ be a subspace of $\mathbb{R}^n$ and let $\mathbf{y} \in \mathbb{R}^n.$ The vector $\widehat{\mathbf{y}} \in \mathbb{R}^n$ is called the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ if the vector $\widehat{\mathbf{y}}$ has the following two properties: ① $\widehat{\mathbf{y}} \in \mathcal{W}$ and ② $\mathbf{y} - \widehat{\mathbf{y}} \in \mathcal{W}^\perp.$
The notation for the orthogonal projection $\widehat{\mathbf{y}}$ of $\mathbf{y} \in \mathbb{R}^n$ onto $\mathcal{W}$ is: \[ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal W}(\mathbf{y}). \] The transformation $\operatorname{Proj}_{\mathcal W}: \mathbb{R}^n \to \mathcal W$ is called the orthogonal projection of $\mathbb{R}^n$ onto $\mathcal{W}$.
Another important theorem in this section is Theorem 10 which I would call The Standard Matrix of an Orthogonal Projection.
The proof of Theorem 10 given in the book is deceptively simple. Please do understand the proof in the book. Below I will give another proof of this theorem.
The proof starts here.
Let $\mathbf{y} \in \mathbb R^n$ be arbitrary. By the definition of orthogonal projection, to prove that \[ UU^\top\! \mathbf y = \operatorname{Proj}_{\mathcal W}(\mathbf{y}) \] we have to prove two claims: the first \[ UU^\top\!\mathbf{y} \in \mathcal{W} \] and, the second, \[ \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp. \] Since every vector of the form $U\mathbf{v}$ belongs to $\operatorname{Col}(U)$, we have that \[ UU^\top\! \mathbf{y} = U\bigl(U^\top\! \mathbf{y}\bigr) \in \operatorname{Col}(U). \] By BK 2 we have $\operatorname{Col}(U) = \mathcal{W}.$ Therefore $UU^\top \mathbf y \in {\mathcal W}$ is proved. This proves the first claim.
To prove the second claim, we use BK 3. By BK 3: $\mathbf{x} \in \mathcal{W}^\perp$ if and only if $U^\top\!\mathbf{x} = \mathbf{0}$. Therefore, to prove \[ \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp, \] we need to prove \[ U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) = \mathbf{0}. \] Let us calculate $U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr)$ below: \begin{align*} U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) & = U^\top \! \mathbf{y} - U^\top \! \bigl(UU^\top\!\mathbf{y} \bigr) \\ &= U^\top \! \mathbf{y} - \bigl( U^\top \! U\bigr) \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - I_m \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - U^\top\!\mathbf{y} \\ & = \mathbf{0}. \end{align*} In the preceding sequence of equalities, the first equality holds since the matrix multiplication is distributive, the second equality holds since the matrix multiplication is associative, the third equality follows from BK 1, the fourth equality follows from the definition of the matrix $I_m$, and the fifth equality holds by the definition of the opposite vector.
In conclusion, for every $\mathbf{y} \in \mathbb R^n$ we proved that the vector $UU^\top\!\mathbf{y}$ has the properties ① and ② in the definition of orthogonal projection onto $\mathcal{W}$. This proves \[ \operatorname{Proj}_{\mathcal W}(\mathbf{y}) = UU^\top\!\mathbf{y}. \]
The proof ends here.

Let $m, n \in \mathbb{N}$. Let $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ be an orthogonal set of nonzero vectors. What the last phrase means is the following: $\mathbf{v}_j \neq \mathbf{0}$ for all $j \in \{1,\ldots,m\}$, and $\mathbf{v}_j \cdot \mathbf{v}_k = 0$ whenever $j, k \in \{1,\ldots,m\}$ and $j \neq k$.
It is truly important that we always connect vectors with matrices formed by those vectors. Let $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ be an orthogonal set of nonzero vectors.
Let us form the $n\times m$ matrix $M$ whose columns are the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$: \[ M = \left[\!\begin{array}{cccc} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_m \end{array}\!\!\right]. \] Recall that $M^\top$ is an $m\times n$ matrix. Now calculate the product of $M^\top$ and $M$ as follows: \begin{align*} M^\top M & = \left[\!\begin{array}{c} \mathbf{v}_1^\top \\ \mathbf{v}_2^\top \\ \vdots \\ \mathbf{v}_m^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_m \end{array}\!\!\right] \\[6pt] & = \left[\!\begin{array}{cccc} \mathbf{v}_1\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_1\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_1\!\!\cdot\!\mathbf{v}_m \\ \mathbf{v}_2\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_m \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{v}_m\!\!\cdot\!\mathbf{v}_1 & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_2 & \cdots & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_m \end{array}\!\!\right] \\[6pt] & = \left[\!\begin{array}{cccc} \mathbf{v}_1\!\!\cdot\!\mathbf{v}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{v}_2\!\!\cdot\!\mathbf{v}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{v}_m\!\!\cdot\!\mathbf{v}_m \end{array}\!\!\right] \end{align*} Thus $M^\top\! M$ is a diagonal $m\times m$ matrix.
The above matrix calculation shows that the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ form an orthogonal set of nonzero vectors if and only if the matrix $M^\top\! M$ is a diagonal $m\times m$ matrix with nonzero entries on the diagonal.
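In Mathematica, this observation can be checked for the three orthogonal vectors that will be used in the example further below:

(* the columns of M are the orthogonal vectors (-8,1,4), (1,-8,4), (4,4,7) *)
M = Transpose[{{-8, 1, 4}, {1, -8, 4}, {4, 4, 7}}];
Transpose[M] . M    (* {{81, 0, 0}, {0, 81, 0}, {0, 0, 81}}, a diagonal matrix *)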
Why are orthogonal sets of nonzero vectors in $\mathbb{R}^n$ important? My answer to that question is the following three properties, for which I use the adjective straightforward. Look at the proofs in the textbook: the proofs of these properties are straightforward, using some standard algebraic manipulations with linear equations.
Straightforward linear independence. The vectors in an orthogonal set of nonzero vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ are linearly independent. This is Theorem 4 in Section 6.2.
Straightforward linear combinations. If \[ \mathbf{y} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_m \mathbf{v}_m, \] then \[ \alpha_1 = \frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}, \quad \alpha_2 = \frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}, \quad \ldots, \quad \alpha_m = \frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}. \]
In other words:
If $\mathbf{y} \in \operatorname{Span}\bigl\{\mathbf{v}_1, \ldots, \mathbf{v}_m\bigr\}$, then \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \mathbf{y} = \left(\frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \right) \mathbf{v}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2+ \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}\right) \mathbf{v}_m.} \]
This is Theorem 5 in Section 6.2.
Straightforward projections onto the span. Let $\mathcal{W} = \operatorname{Span}\bigl\{\mathbf{v}_1, \ldots, \mathbf{v}_m\bigr\}$ and let $\mathbf{y} \in \mathbb{R}^n$. Then the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ is given by \[ \require{bbox} \bbox[#FFFF00, 8px, border: 2px solid #808000]{ \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}}(\mathbf{y}) = \left(\frac{\mathbf{y}\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2 + \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{v}_m}{\mathbf{v}_m \cdot \mathbf{v}_m}\right) \mathbf{v}_m.} \]
This is Theorem 8 in Section 6.3.
Consider the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right]. \]
These vectors form an orthogonal set of nonzero vectors. To verify this claim, it is sufficient to calculate \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0. \]
By Theorem 4 in Section 6.2 these vectors are linearly independent. This is an example of straightforward linear independence.
As shown in the preceding item the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], \] form an orthogonal set of nonzero vectors. By Theorem 4 in Section 6.2 they are linearly independent. Consequently they form a basis for $\mathbb{R}^3$.
To express the vector $\left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right]$ as a linear combination of the vectors \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], \] we use the straightforward linear combinations, that is Theorem 5 in Section 6.2.
We calculate \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 6, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = -3, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 33, \] and \[ \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 81. \] Then \begin{align*} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] & = \frac{6}{81} \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] + \frac{-3}{81} \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] + \frac{33}{81} \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \\ & = \left[\! \begin{array}{r} -\frac{16}{27} \\ \frac{2}{27} \\ \frac{8}{27} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{27} \\ \frac{8}{27} \\ -\frac{4}{27} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{44}{27} \\ \frac{44}{27} \\ \frac{77}{27} \end{array} \!\right] \end{align*}
Consider the vectors \[ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right]. \] These two vectors form an orthogonal set of nonzero vectors in $\mathbb{R}^4$.
To calculate the orthogonal projection of the vector $\mathbf{y} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right]$ onto the linear span of the given two vectors, that is \[ \mathcal{W} = \operatorname{Span}\left\{ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \right\} \] we use straightforward projections onto the span, that is Theorem 8 in Section 6.3.
To calculate the projection, we need to calculate \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 30, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 10 \] and \[ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 36, \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 36. \] Then the projection is: \begin{align*} \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] & = \frac{30}{36} \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] + \frac{10}{36} \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{5}{6} \\ \frac{5}{6} \\ \frac{25}{6} \\ \frac{15}{6} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{5}{18} \\ -\frac{5}{18} \\ -\frac{15}{18} \\ \frac{25}{18} \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right]. \end{align*} To verify that this calculation is correct we calculate \[ \mathbf{y} - \widehat{\mathbf{y}} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] - \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] = \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right]. \] This vector should be orthogonal to both of the given vectors: \[ \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 0, \quad \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 0. \] Finally, to celebrate our calculation we write \[ \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right], \] and we point out that the two vectors in the sum are orthogonal to each other. Moreover, the first vector in the sum is in the subspace $\mathcal{W}$ and the second vector in the sum is in the subspace $\mathcal{W}^\perp.$
An illustration of a Reflection across the green line
Let $A$ be a $3\times 3$ matrix. Over the last few days we studied the following matrix initial value problem: \[ \boxed{\; Y'(t) = A\mkern 1mu Y(t) \quad \text{and} \quad Y(0) = I_3 \;} \]
It is a fact from differential equations that the solution of this initial value problem is unique. The unique solution of the above boxed initial value problem is called the matrix-exponential function: \[ Y(t) = {\Large e}^{A\mkern 0.75mu t}. \]
It is important to notice that from the boxed equation evaluated at $t=0$ we obtain \[ Y'(0) = A\mkern 1mu Y(0) = A\mkern 1mu I_3 = A . \]

The exponential notation introduced in the previous item is analogous to what we learned in a calculus class. Let $a$ be a real number. The unique solution $y(t)$ of the initial value problem \[ y'(t) = a\mkern 0.75mu y(t) \quad \text{and} \quad y(0) = 1 \] is the scalar exponential function \[ y(t) = e^{a\mkern 0.75mu t}. \]
Yesterday I posted about the matrix-valued exponential function $t\mapsto e^{At}$. Here $A$ is a $3\times 3$ matrix. We defined the matrix-valued exponential function $t\mapsto e^{At}$ to be the unique solution $Y(t)$ of the initial value problem restated below.
The goal of Problem 5 on Assignment 1 is to explore the analogous problem in which the real number $a$ is replaced by a diagonalizable $3\times 3$ matrix $A$.
In analogy with the scalar case, we define the matrix-valued exponential function $t\mapsto e^{At}$ to be the unique solution $Y(t)$ of the following problem: \[ \boxed{\; Y'(t) = A\mkern 1mu Y(t) \quad \text{and} \quad Y(0) = I_3 \;} \]
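For those who like to experiment, below is a small Python sketch using scipy.linalg.expm; the particular diagonalizable matrix $A$ is my own example, not the matrix from Assignment 1. It checks numerically that $Y(0)=I_3$ and that $Y'(t) = A\mkern 1mu Y(t)$.

```python
import numpy as np
from scipy.linalg import expm

# A sample diagonalizable 3x3 matrix (my own choice, not the one from the assignment).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, -1.0]])

def Y(t):
    """The matrix-valued exponential function Y(t) = e^{At}."""
    return expm(A * t)

# Y(0) should be the 3x3 identity matrix.
print(np.allclose(Y(0.0), np.eye(3)))        # True

# Y'(t) should equal A Y(t); check with a centered difference at t = 0.7.
t, h = 0.7, 1e-6
Y_prime = (Y(t + h) - Y(t - h)) / (2 * h)
print(np.allclose(Y_prime, A @ Y(t)))        # True
```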
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
Place the cursor over the image to start the animation.
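To see the fact stated above on a small example, here is a tiny Python sketch of mine; the matrix $M$ is just a sample.

```python
import numpy as np

# M times the k-th column of the identity matrix picks out the k-th column of M.
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

I3 = np.eye(3, dtype=int)
k = 1                                           # the second column (0-based indexing)
print(M @ I3[:, k])                             # [2 5 8]
print(np.array_equal(M @ I3[:, k], M[:, k]))    # True
```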
Below I want to present a change of coordinates matrix in a vector space of polynomials which requires only the Binomial Theorem. The Binomial Theorem is the theorem that you might have seen in a college algebra class: \begin{align*} (u+v)^1 & = u+v, \\ (u+v)^2 & = u^2+2\mkern 2mu u v + v^2, \\ (u+v)^3 & = u^3+ 3\mkern 2mu u^2 v + 3\mkern 2mu u v^2 + v^3,\\ (u+v)^4 & = u^4+ 4\mkern 2mu u^3 v + 6\mkern 2mu u^2 v^2 + 4\mkern 2mu u v^3 + v^4,\\ (u+v)^5 & = u^5+ 5\mkern 2mu u^4 v + 10\mkern 2mu u^3 v^2 + 10\mkern 2mu u^2 v^3 + 5\mkern 2mu u v^4 + v^5, \end{align*} and so on.
We do not need the general version of the Binomial Theorem here, but, since we mentioned it, I want to introduce you to some important concepts related to the Binomial Theorem.
In general, if $n \in \mathbb{N}$ we have \[ (u+v)^n = \sum_{k=0}^n \binom{n}{k} \mkern 2mu u^{n-k} v^k = u^n + n \mkern 2mu u^{n-1} v + \frac{n(n-1)}{2}\mkern 2mu u^{n-2} v^2 + \cdots + \frac{n(n-1)}{2}\mkern 2mu u^{2} v^{n-2} + n\mkern 2mu u v^{n-1} + v^n. \]
In the above formula, for \( n, k \in \{0\}\cup\mathbb{N}\) with \(k \leq n \), the symbol \( \displaystyle \binom{n}{k} \) (read as "n choose k") denotes the Binomial coefficient. The definition is:
\[ \binom{n}{k} = \frac{n!}{k! \, (n-k)!}, \]
where for \( m \in \mathbb{N} \), \( m! \) (read as "m factorial") is the product of all positive integers up to \( m \). By convention \( 0! = 1 \).
The Base Case: \(0!=1\)
The Recursive Step: For all \(m\in\mathbb{N}\) we set \(m! = \bigl( (m-1)! \bigr) \mkern 2px m\).
For more details, see Factorial.
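If you like to see a recursive definition as code, here is a minimal Python sketch of exactly this recursion for the factorial.

```python
def factorial(m: int) -> int:
    """Factorial defined by the recursion above:
    the base case 0! = 1 and the step m! = ((m-1)!) * m."""
    if m == 0:                        # the base case
        return 1
    return factorial(m - 1) * m       # the recursive step

print([factorial(m) for m in range(7)])   # [1, 1, 2, 6, 24, 120, 720]
```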
The recursive definition of the binomial coefficients is as follows:
The Base Case: \begin{equation*} \text{For all} \ \ n \in \{0\}\cup\mathbb{N} \quad \text{we set} \quad \binom{n}{0} = 1 \quad \text{and} \quad \binom{n}{n} = 1. \end{equation*} The Recursive Step: \begin{equation*} \text{For all} \ \ n \in \mathbb{N} \ \ \text{and} \ \ k \in \{1,\ldots,n\} \quad \text{we set} \quad \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}. \end{equation*} Each line below applies the recursive step with specific values of \(n\) and \(k\), together with the previously evaluated binomial coefficients (that is why it is called a recursion, see the next item below):
\begin{alignat*}{2} &\text{For } n=1, \ k=1 \qquad &&\binom{2}{1} = \binom{1}{0} + \binom{1}{1} = 1 + 1 = 2, \\ &\text{For } n=2, \ k=1 &&\binom{3}{1} = \binom{2}{0} + \binom{2}{1} = 1 + 2 = 3, \\ &\text{For } n=2, \ k=2 &&\binom{3}{2} = \binom{2}{1} + \binom{2}{2} = 2 + 1 = 3, \\ &\text{For } n=3, \ k=1 &&\binom{4}{1} = \binom{3}{0} + \binom{3}{1} = 1 + 3 = 4, \\ &\text{For } n=3, \ k=2 &&\binom{4}{2} = \binom{3}{1} + \binom{3}{2} = 3 + 3 = 6, \\ &\text{For } n=3, \ k=3 &&\binom{4}{3} = \binom{3}{2} + \binom{3}{3} = 3 + 1 = 4, \\ &\text{For } n=4, \ k=1 &&\binom{5}{1} = \binom{4}{0} + \binom{4}{1} = 1 + 4 = 5, \\ &\text{For } n=4, \ k=2 &&\binom{5}{2} = \binom{4}{1} + \binom{4}{2} = 4 + 6 = 10, \\ &\text{For } n=4, \ k=3 &&\binom{5}{3} = \binom{4}{2} + \binom{4}{3} = 6 + 4 = 10, \\ &\text{For } n=4, \ k=4 &&\binom{5}{4} = \binom{4}{3} + \binom{4}{4} = 4 + 1 = 5, \\ & & & \quad \quad \mkern 12px \vdots \end{alignat*}
For more details about this recursion, see Pascal's triangle.
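Here is a minimal Python sketch of the same recursion for the binomial coefficients; it reproduces the rows of Pascal's triangle that appear in the expansions listed earlier.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def binom(n: int, k: int) -> int:
    """Binomial coefficient computed by the recursion above,
    written in the equivalent form C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    if k == 0 or k == n:                             # the base cases C(n, 0) = C(n, n) = 1
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)     # the recursive step

# Rows 0 through 5 of Pascal's triangle.
for n in range(6):
    print([binom(n, k) for k in range(n + 1)])
```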
Below is a proof that the monomials $1, x, x^2, x^3$ are linearly independent in the vector space ${\mathbb P}_3$. First we need to be specific about what we need to prove.
Let $\alpha_1,$ $\alpha_2,$ $\alpha_3,$ and $\alpha_4$ be scalars in $\mathbb{R}.$ We need to prove the following implication: If \[ \require{bbox} \bbox[5px, #88FF88, border: 1pt solid green]{\alpha_1\cdot 1 + \alpha_2 x + \alpha_3 x^2 + \alpha_4 x^3 =0 \quad \text{for all} \quad x \in \mathbb{R}}, \] then \[ \bbox[5px, #FF4444, border: 1pt solid red]{\alpha_1 = 0, \quad \alpha_2 =0, \quad \alpha_3 = 0, \quad \alpha_4 = 0}. \] Proof.
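Before reading the proof, it may help to see one standard argument; this sketch is mine and may differ from the argument presented in class. Setting $x=0$ in the green identity gives $\alpha_1 = 0$. Since the identity holds for all $x \in \mathbb{R}$, we may differentiate it, which gives \[ \alpha_2 + 2\mkern 2mu \alpha_3 x + 3\mkern 2mu \alpha_4 x^2 = 0 \quad \text{for all} \quad x \in \mathbb{R}, \] and setting $x=0$ gives $\alpha_2 = 0$. Differentiating twice more and setting $x=0$ each time gives $\alpha_3 = 0$ and $\alpha_4 = 0$, which is exactly the red conclusion.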
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called a surjection if it satisfies the following condition:
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called an injection if it satisfies the following condition:
An equivalent formulation of the preceding condition is:
Definition. A function $f:A\to B$ is called a bijection if it satisfies the following two conditions:
In other words, a function $f:A\to B$ is called a bijection if it is both an injection and a surjection.
Definition. Let $\mathcal V$ and $\mathcal W$ be vector spaces. A linear bijection $T: \mathcal V \to \mathcal W$ is said to be an isomorphism.
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal V, \] is a linear bijection between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal{V}, \] is an isomorphism between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
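To make the coordinate mapping concrete, here is a small Python sketch of mine for the vector space ${\mathbb P}_3$ with the basis $\{1, x, x^2, x^3\}$; the two polynomials are my own examples, and the point is only that the coordinate mapping respects linear combinations.

```python
import numpy as np

# Coordinate mapping for P_3 relative to the basis B = {1, x, x^2, x^3}:
# the polynomial a0 + a1*x + a2*x^2 + a3*x^3 is sent to (a0, a1, a2, a3) in R^4.
def coords(p):
    """p is a dict {power: coefficient}; return its B-coordinate vector in R^4."""
    return np.array([p.get(k, 0.0) for k in range(4)])

# Two sample polynomials: p(x) = 2 - x + 3x^3 and q(x) = 1 + 4x^2.
p = {0: 2.0, 1: -1.0, 3: 3.0}
q = {0: 1.0, 2: 4.0}

# Linearity of the coordinate mapping: [5p - 2q]_B = 5[p]_B - 2[q]_B.
combo = {k: 5 * p.get(k, 0.0) - 2 * q.get(k, 0.0) for k in range(4)}
print(coords(combo))                                                # [ 8. -5. -8. 15.]
print(np.allclose(coords(combo), 5 * coords(p) - 2 * coords(q)))    # True
```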
Corollary 1. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Corollary 2. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Each step in a row reduction can be achieved by multiplication by a matrix.
Step | the row operations | the matrix used | the matrix inverse |
---|---|---|---|
1st | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to (-2)\sideset{_o}{_1}R + \sideset{_o}{_2}R,\\ \sideset{_n}{_3}R \to (-3)\sideset{_o}{_1}R + \sideset{_o}{_3}R \end{array} $ | $M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \\ \end{array}\right]$ | $(M_1)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \\ \end{array}\right]$ |
2nd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to \sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to (-2)\sideset{_o}{_2}R + \sideset{_o}{_3}R \end{array}$ | $M_2 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array}\right]$ | $(M_2)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array}\right]$ |
The importance of carefully keeping track of each matrix is that we can calculate the single matrix which performs the above Row Reduction:
\[ M_2 M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \\ \end{array}\right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ \end{array} \right] \] You can verify that \[ \left[ \begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ \end{array} \right] \left[\!\begin{array}{rrrrr} 1 & 2 & 0 & 1 & 2 \\ 2 & 4 & 1 & 0 & 5 \\ 3 & 6 & 2 & -1 & 8 \end{array}\right] = \left[\!\begin{array}{rrrrr} 1 & 2 & 0 & 1 & 2 \\ 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]. \]

Each step in a row reduction can be achieved by multiplication by a matrix.
Step | the row operations | the matrix used | the matrix inverse |
---|---|---|---|
1st | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_3}R, \\ \sideset{_n}{_2}R \to \frac{1}{2}\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to \frac{1}{3} \sideset{_o}{_1}R \end{array}$ | $M_1 = \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right]$ | $(M_1)^{-1} = \left[\!\begin{array}{rrr} 0 & 0 & 3 \\ 0 & 2 & 0 \\ 1 & 0 & 0 \\ \end{array}\right]$ |
2nd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to (-1)\sideset{_o}{_1}R+\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to (-1)\sideset{_o}{_1}R + \sideset{_o}{_3}R \end{array}$ | $M_2 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right]$ | $(M_2)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right]$ |
3rd | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R + 2 \sideset{_o}{_2}R, \\ \sideset{_n}{_2}R \to (-1)\sideset{_o}{_2}R, \\ \sideset{_n}{_3}R \to -\frac{5}{3} \sideset{_o}{_2}R + \sideset{_o}{_3}R \end{array}$ | $M_3 = \left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right]$ | $(M_3)^{-1} = \left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right]$ |
4th | $\mkern 5mu \begin{array}{l} \sideset{_n}{_1}R \to \sideset{_o}{_1}R, \\ \sideset{_n}{_2}R \to \sideset{_o}{_2}R + (-3)\sideset{_o}{_3}R,\\ \sideset{_n}{_3}R \to 6 \sideset{_o}{_3}R \end{array}$ | $M_4 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right]$ | $(M_4)^{-1} = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & \frac{1}{6} \end{array}\right]$ |
The importance of carefully keeping track of each matrix is that we can calculate the single matrix which performs the above Row Reduction:
\[ M_4 M_3 M_2 M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right]\left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right] \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right] = \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] \] You can verify that \[ \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] \left[\!\begin{array}{rrrrr} 3 & 1 & 5 & 1 & 2 \\ 2 & 2 & 2 & 1 & 4 \\ 1 & 2 & 0 & 1 & 5 \end{array}\right] = \left[\!\begin{array}{rrrrr} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 4 \end{array}\right]. \]

It is important to observe that forming the above three linear combinations is straightforward because of the positioning of the leading $1$s in the rows of the RREF.
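If you want to check this bookkeeping on a computer, here is a short Python sketch of mine that multiplies out $M_4 M_3 M_2 M_1$ using exact fractions and applies the result to the original matrix.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# The four elementary matrices from the table above (exact fractions).
M1 = [[F(0), F(0), F(1)], [F(0), F(1, 2), F(0)], [F(1, 3), F(0), F(0)]]
M2 = [[F(1), F(0), F(0)], [F(-1), F(1), F(0)], [F(-1), F(0), F(1)]]
M3 = [[F(1), F(2), F(0)], [F(0), F(-1), F(0)], [F(0), F(-5, 3), F(1)]]
M4 = [[F(1), F(0), F(0)], [F(0), F(1), F(-3)], [F(0), F(0), F(6)]]

# The single matrix that performs the whole row reduction.
E = matmul(M4, matmul(M3, matmul(M2, M1)))
print([[int(x) for x in row] for row in E])
# [[0, 1, -1], [-1, 2, -1], [2, -5, 4]]

# Applying E to the original matrix produces the RREF from the verification above.
A = [[3, 1, 5, 1, 2],
     [2, 2, 2, 1, 4],
     [1, 2, 0, 1, 5]]
print([[int(x) for x in row] for row in matmul(E, A)])
# [[1, 0, 2, 0, -1], [0, 1, -1, 0, 1], [0, 0, 0, 1, 4]]
```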
I wanted to provide a colorful beginning of the New Year and the Winter Quarter, so I started the class by talking about the colors in relation to linear algebra. I love the application of vectors to COLORS so much that I wrote a webpage to celebrate it: Color Cube.
It is important to point out that in the red-green-blue coloring scheme, the following eighteen colors stand out. I present them in six steps with three colors in each step.
You:
Can you please write a complete LaTeX file with instructions on using basic mathematical operations, like fractions, sums, integrals, basic functions, like cosine, sine, and exponential function, and how to structure a document and similar features? Please explain the difference between the inline and displayed mathematical formulas. Please include examples of different ways of formatting displayed mathematical formulas. Please include what you think would be useful to a mathematics student. Also, can you please include your favorite somewhat complicated mathematical formula as an example of the power of LaTeX? I emphasize I want a complete file that I can copy into the LaTeX compiler and compile into a pdf file. Please ensure that your document contains the code for the formulas you are writing, which displays both as code separately from compiled formulas. Also, please double-check that your code compiles correctly. Remember that I am a beginner and cannot fix the errors. Please act as a concerned teacher would do.
This is the LaTeX document that ChatGPT produced based on the above prompt. Here is the compiled PDF document.
You can ask ChatGPT for specific LaTeX advice. To get a good response, think carefully about your prompt. Also, you can offer ChatGPT a sample of short mathematical writing from the web or a book as a PNG file, and it will convert that writing to LaTeX. You can even try with neat handwriting. The results will of course depend on the clarity of the file, and ChatGPT makes mistakes, but I have found it incredibly useful.