Math 304 - Winter 2023

Branko Ćurgus

Thursday, March 9, 2023

Suggested problems for Section 7.4: 3, 7, 11, 13, 14, 15, 17, 21
I hope that while teaching today I improved my presentation of the Singular Value Decomposition. I am posting the whiteboard pictures from today.

Until I implement these new ideas in more formal writing, I post notes I wrote last year below.
I believe that colors and pictures will help you internalize the process of the construction of the Singular Value Decomposition.

Example 1. The following $4\!\times\!5$ matrix is used as an example of Singular Value Decomposition on Wikipedia. \[ M = \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{array}\right]. \] Since this matrix has a lot of zero entries it should not be hard to find its SVD. Remember, SVD is not unique, so if the SVD that we find is different from what is on Wikipedia, it does not mean that it is wrong.
- In this item I will state an important principle in finding an SVD by hand. Let $A$ be an $m\!\times\!n$ matrix and let \[ A = U\Sigma V^\top \] be an SVD of $A.$ Notice that once we know an SVD of $A$ we have immediately found a Singular Value Decomposition of $A^\top$: \[ A^\top = V \Sigma^\top U^\top. \] When you write down the matrix $\Sigma^\top$ you see that the entries on the "diagonal" of this matrix are the same as the entries on the "diagonal" of $\Sigma$. Therefore the singular values of $A$ and $A^\top$ are the same. The only difference is that the matrices $U$ and $V$ change positions. Conversely, if we know a Singular Value Decomposition of $A^\top$ we immediately know a Singular Value Decomposition of $A.$
  The above observation is particularly important if the positive integer $n$ is "much" larger than the positive integer $m.$ To understand why, think of what is involved in finding an SVD of $A$: We need to find an orthogonal diagonalization of the $n\!\times\!n$ matrix $A^\top A.$ Contrast this with what is involved in finding an SVD of $A^\top$: We need to find an orthogonal diagonalization of the $m\!\times\!m$ matrix $(A^\top)^\top A^\top = AA^\top.$ Since we assume that $m$ is the smaller positive integer, it is easier to find an orthogonal diagonalization of $AA^\top$ than of $A^\top A.$
- To find a Singular Value Decomposition of $M$ from Wikipedia, we are looking for a $4\!\times\!4$ orthogonal matrix $U$, the $4\!\times\!5$ matrix $\Sigma$ with the singular values of $M$ on the "diagonal", and a $5\!\times\!5$ orthogonal matrix $V$, such that $M = U\Sigma V^\top$.
  
  As explained in the previous item, finding an SVD of $M^\top$ is easier. Thus, we proceed with finding \[ M^\top = V \Sigma^\top U^\top, \] with $U,$ $V$ and $\Sigma$ as above.
- (I) To find the singular values and right singular vectors of $M^\top$ we calculate the matrix \[ M M^\top = \left[\!\begin{array}{rrrr} 5 & 0 & 0 & 0 \\ 0 & 9 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 \end{array}\right]. \] Clearly the eigenvalues of this matrix in nonincreasing order are $9,$ $5,$ $4$ and $0.$ Thus the singular values of $M$ and $M^\top$ are $3,$ $\sqrt{5},$ and $2.$ The ranks of both $M$ and $M^\top$ are $3.$ The dimension of the null space of $M$ is $2$ and the dimension of the null space of $M^\top$ is $1$. The matrix that we need now is $\Sigma^\top$: \[ \Sigma^\top = \left[\!\begin{array}{cccc} 3 & 0 & 0 & 0 \\ 0 & \sqrt{5} & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]. \] The corresponding orthogonal matrix $U$, whose columns are unit eigenvectors of $MM^\top$ listed in the order of the eigenvalues $9,$ $5,$ $4,$ $0,$ is \[ U = \left[\!\begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right]. \]
- (II) To find a $5\!\times\!5$ orthogonal matrix $V$ we notice that the equality $M^\top = V \Sigma^\top U^\top$ implies \[ M^\top U = V \Sigma^\top. \] Thus, we calculate \[ M^\top U =\left[\!\begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \end{array}\right] \left[\!\begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] = \left[\!\begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{array}\right] = \left[\!\begin{array}{ccc} 0 & 1/\sqrt{5} & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 2/\sqrt{5} & 0 \end{array}\right] \left[\!\begin{array}{cccc} 3 & 0 & 0 & 0 \\ 0 & \sqrt{5} & 0 & 0 \\ 0 & 0 & 2 & 0 \end{array}\right]. \] The last two rows of $\Sigma^\top$ are zero rows, so the last two columns of $V$ do not contribute to the product $V\Sigma^\top$; this is why only the first three rows of $\Sigma^\top$ appear in the last factorization. Comparing with $M^\top U = V \Sigma^\top$ we see that the first three columns of $V$ are \[ \left[\!\begin{array}{ccc} 0 & 1/\sqrt{5} & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 2/\sqrt{5} & 0 \end{array}\right]. \] Notice that to find these three columns we performed a minimal amount of calculation.
- (III) The next step is to find the remaining two columns of $V.$ Since the first three columns of $V$ form an orthonormal basis for $\operatorname{Row} M$, the remaining two columns of $V$ will be two orthonormal vectors in $\operatorname{Nul} M.$ To find these vectors row-reduce $M$: \[ M = \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{array}\right] \quad \sim \quad \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]. \] Thus, the null space of $M$ is spanned by the orthogonal vectors \[ \left[\!\begin{array}{r} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{array}\right] \qquad \text{and} \qquad \left[\!\begin{array}{r} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{array}\right]. \] Normalizing the first of these vectors (the second is already a unit vector) we obtain the complete $5\!\times\!5$ matrix $V$: \[ V = \left[\!\begin{array}{ccccc} 0 & 1/\sqrt{5} & 0 & -2/\sqrt{5} & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 2/\sqrt{5} & 0 & 1/\sqrt{5} & 0 \end{array}\right]. \]
To celebrate our work we verify \[ M = \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{array}\right] = \left[\!\begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \left[\!\begin{array}{ccccc} 3 & 0 & 0 & 0 & 0 \\ 0 & \sqrt{5} & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \left[\!\begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ 1/\sqrt{5} & 0 & 0 & 0 & 2/\sqrt{5} \\ 0 & 1 & 0 & 0 & 0 \\ -2/\sqrt{5} & 0 & 0 & 0 & 1/\sqrt{5} \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]. \] Or, equivalently, we verify the equality $MV = U\Sigma$, which is easier: \[ \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{array}\right] \left[\!\begin{array}{ccccc} 0 & 1/\sqrt{5} & 0 & -2/\sqrt{5} & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 2/\sqrt{5} & 0 & 1/\sqrt{5} & 0 \end{array}\right] = \left[\!\begin{array}{rrrr} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \left[\!\begin{array}{ccccc} 3 & 0 & 0 & 0 & 0 \\ 0 & \sqrt{5} & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]. \]
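The verification in Example 1 can also be repeated numerically. The following is only a sketch in plain Python, with matrices stored as lists of rows; the helper names `matmul`, `transpose` and `close` are my own choices and are not part of the notes. It checks that $U$ and $V$ are orthogonal and that $U\Sigma V^\top$ reproduces $M$ up to rounding.

```python
from math import sqrt, isclose

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices of the same shape."""
    return all(isclose(a, b, abs_tol=tol) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

s5 = sqrt(5)
M = [[1, 0, 0, 0, 2],
     [0, 0, 3, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 2, 0, 0, 0]]
U = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
Sigma = [[3, 0, 0, 0, 0],
         [0, s5, 0, 0, 0],
         [0, 0, 2, 0, 0],
         [0, 0, 0, 0, 0]]
V = [[0, 1 / s5, 0, -2 / s5, 0],
     [0, 0, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 2 / s5, 0, 1 / s5, 0]]

# U^T U = I_4 and V^T V = I_5 say that U and V are orthogonal matrices.
u_orthogonal = close(matmul(transpose(U), U), identity(4))
v_orthogonal = close(matmul(transpose(V), V), identity(5))
# The decomposition itself: U Sigma V^T should reproduce M.
svd_holds = close(matmul(matmul(U, Sigma), transpose(V)), M)
print(u_orthogonal, v_orthogonal, svd_holds)
```

If the matrices were copied correctly, all three flags should come out `True`. The comparison uses a small tolerance because $\sqrt{5}$ is stored only approximately.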
Example 2. Here is a calculation of a singular value decomposition of the matrix \[ A = \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right]. \]
- (I) To find the singular values and right singular vectors we calculate the matrix \[ A^\top \!A = \left[\!\begin{array}{rrrr} 3 & -1 & 1 & 1 \\ -1 & 3 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] = \left[\!\begin{array}{rrr} 12 & -4 & 4 \\ -4 & 12 & 4 \\ 4 & 4 & 4 \end{array}\right] = 4 \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \end{array}\right]. \] Observe that adding the first two columns and subtracting twice the third column gives the zero vector. Hence $\lambda_3 = 0$ is an eigenvalue of $A^\top\!A$ and a corresponding eigenvector is $\bigl[ -1 \ -1 \ \ 2 \bigr]^\top$. Since each row of $A^\top\!A$ sums to $12$, $\lambda_2 = 12$ is an eigenvalue of $A^\top\!A$ and a corresponding eigenvector is $\bigl[ 1 \ \ 1 \ \ 1 \bigr]^\top$. Since the vector $\bigl[ 1 \ -1 \ \ 0 \bigr]^\top$ is orthogonal to both earlier found eigenvectors it also must be an eigenvector of $A^\top\!A$. The corresponding eigenvalue is $\lambda_1 = 16$. Thus the singular values of $A$ are $\sigma_1 = 4$ and $\sigma_2 = 2\sqrt{3}$, and the matrices $\Sigma$ and $V$ are as follows \[ \Sigma = \left[\!\begin{array}{rrr} 4 & 0 & 0 \\ 0 & 2\sqrt{3} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right] \qquad V = \left[\!\begin{array}{rrr} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{array}\right] = \bigl[ \mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3 \bigr]. \]
- (II) To find a $4\!\times\!4$ orthogonal matrix $U$ we first normalize vectors \[ A \left[\!\begin{array}{r} 1 \\ -1 \\ 0 \end{array}\right] = \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{r} 1 \\ -1 \\ 0 \end{array}\right] = \left[\!\begin{array}{r} 4 \\ -4 \\ 0 \\ 0 \end{array}\right] = 4 \left[\!\begin{array}{r} 1 \\ -1 \\ 0 \\ 0 \end{array}\right], \quad \text{hence} \quad \mathbf{u}_1 = \left[\!\begin{array}{r} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{array}\right], \] and \[ A \left[\!\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right] = \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right] = \left[\!\begin{array}{r} 3 \\ 3 \\ 3 \\ 3 \end{array}\right] = 3 \left[\!\begin{array}{r} 1 \\ 1 \\ 1 \\ 1 \end{array}\right], \quad \text{hence} \quad \mathbf{u}_2 = \left[\!\begin{array}{r} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{array}\right]. \] From the general considerations about the singular value decomposition we know that the singular values and left and right singular vectors must satisfy: $A\mathbf{v}_1 = \sigma_1 \mathbf{u}_1$ and $A\mathbf{v}_2 = \sigma_2 \mathbf{u}_2$. 
Next we verify these equalities: \[ \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{r} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{array}\right] = 4 \left[\!\begin{array}{r} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{array}\right] \quad \text{and} \quad \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{r} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array}\right] = 2\sqrt{3} \left[\!\begin{array}{r} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{array}\right] \] It has been established in class that $\mathbf{u}_1$ and $\mathbf{u}_2$ form an orthonormal basis for $\operatorname{Col}A$.
- (III) To complete the matrix $U$ we need an orthonormal basis for $\mathbb{R}^4$. Since the space $\operatorname{Nul}\bigl(A^\top\bigr)$ is the orthogonal complement of $\operatorname{Col}A$, we can simply find the null space of $A^\top$, and then find two orthonormal vectors in $\operatorname{Nul}\bigl(A^\top\bigr).$ Here we go: \[ \textstyle \left[\!\begin{array}{rrrr} 3 & -1 & 1 & 1 \\ -1 & 3 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{array}\right] \sim \left[\!\begin{array}{rrrr} 1 & 1 & 1 & 1 \\ 0 & 4 & 2 & 2 \\ 0 & -4 & -2 & -2 \end{array}\right] \sim \left[\!\begin{array}{rrrr} 1 & 1 & 1 & 1 \\ 0 & 1 & 1/2 & 1/2 \\ 0 & 0 & 0 & 0 \end{array}\right] \sim \left[\!\begin{array}{rrrr} 1 & 0 & 1/2 & 1/2 \\ 0 & 1 & 1/2 & 1/2 \\ 0 & 0 & 0 & 0 \end{array}\right] \] Thus, \[ \operatorname{Nul}\bigl(A^\top\bigr) = \left\{ s \left[\!\begin{array}{r} -1 \\ -1 \\ 0 \\ 2 \end{array}\right] + t \left[\!\begin{array}{r} -1 \\ -1 \\ 2 \\ 0 \end{array}\right] \ : \ s, t \in \mathbb{R} \right\}. \] All the vectors in $\operatorname{Nul}\bigl(A^\top\bigr)$ are orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_2$ (verify this). There are many pairs of orthonormal vectors in $\operatorname{Nul}\bigl(A^\top\bigr).$ One pair that caught my attention is obtained by taking the vector with $s=1/2$, $t=1/2$ and the vector with $s=1/2$, $t=-1/2$, and then normalizing each of them. That is the pair \[ \mathbf{u}_3 = \left[\!\begin{array}{r} -\frac{1}{2} \\ - \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{array}\right] \quad \text{and} \quad \mathbf{u}_4 = \left[\!\begin{array}{c} 0 \\ 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array}\right]. \] Finally, \[ U = \left[\!\begin{array}{rrrr} \frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} \end{array}\right]. \]
- Remark. To find the vectors $\mathbf{u}_3$ and $\mathbf{u}_4$ it might be slightly more efficient to proceed in the following way. Since we know that $\mathbf{u}_1$ and $\mathbf{u}_2$ form a basis for $\operatorname{Col} A$, we can find a basis for $(\operatorname{Col} A)^{\perp}$ by solving the system whose rows are scalar multiples of $\mathbf{u}_1^\top$ and $\mathbf{u}_2^\top$: \[ \left[\!\begin{array}{rrrr} 1 & -1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{array}\right] \left[\!\begin{array}{c} x_1 \\ x_2 \\ x_3 \\ x_4 \end{array}\right] = \left[\!\begin{array}{c} 0 \\ 0 \end{array}\right]. \] The row reduction of the matrix \[ \left[\!\begin{array}{rrrr} 1 & -1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{array}\right] \sim \cdots \sim \left[\!\begin{array}{rrrr} 1 & 0 & 1/2 & 1/2 \\ 0 & 1 & 1/2 & 1/2 \end{array}\right] \] might be simpler than the row reduction that we did in (III).
To celebrate our work we verify \[ \left[\!\begin{array}{rrr} 3 & -1 & 1 \\ -1 & 3 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}\right] = \left[\!\begin{array}{rrrr} \frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} \end{array}\right] \left[\!\begin{array}{rrr} 4 & 0 & 0 \\ 0 & 2\sqrt{3} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right] \left[\!\begin{array}{rrr} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{array}\right] . \]
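The relations used in Example 2 can be checked numerically as well. The sketch below is plain Python under my own naming (`matvec`, `dot`, `u`, `v`, `sigma` are illustrative names, not from the notes). It tests $A\mathbf{v}_i = \sigma_i \mathbf{u}_i$, the orthonormality of $\mathbf{u}_1,\ldots,\mathbf{u}_4$, and the expansion $A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^\top + \sigma_2 \mathbf{u}_2\mathbf{v}_2^\top$, which is the product $U\Sigma V^\top$ written out column by column.

```python
from math import sqrt, isclose

A = [[3, -1, 1],
     [-1, 3, 1],
     [1, 1, 1],
     [1, 1, 1]]

def matvec(M, x):
    """Matrix (list of rows) times vector (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

r2, r3, r6 = sqrt(2), sqrt(3), sqrt(6)
sigma = [4, 2 * r3, 0]                       # sigma_3 = 0 is included for v_3
v = [[1 / r2, -1 / r2, 0],
     [1 / r3, 1 / r3, 1 / r3],
     [-1 / r6, -1 / r6, 2 / r6]]
u = [[1 / r2, -1 / r2, 0, 0],
     [0.5, 0.5, 0.5, 0.5],
     [-0.5, -0.5, 0.5, 0.5],
     [0, 0, -1 / r2, 1 / r2]]

# A v_i = sigma_i u_i for i = 1, 2, 3 (for i = 3 this says A v_3 = 0).
relations_ok = all(
    isclose(lhs, s * rhs, abs_tol=1e-12)
    for s, vi, ui in zip(sigma, v, u)
    for lhs, rhs in zip(matvec(A, vi), ui)
)

# u_1, ..., u_4 should be orthonormal.
orthonormal_ok = all(
    isclose(dot(u[i], u[j]), float(i == j), abs_tol=1e-12)
    for i in range(4) for j in range(4)
)

# Rebuild A from the two nonzero singular values.
rebuilt = [[sum(sigma[i] * u[i][r] * v[i][c] for i in range(2)) for c in range(3)]
           for r in range(4)]
rebuild_ok = all(isclose(rebuilt[r][c], A[r][c], abs_tol=1e-12)
                 for r in range(4) for c in range(3))
print(relations_ok, orthonormal_ok, rebuild_ok)
```

Only $\mathbf{u}_1$, $\mathbf{u}_2$ and the two nonzero singular values enter the reconstruction; $\mathbf{u}_3$ and $\mathbf{u}_4$ are needed only to make $U$ a square orthogonal matrix.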

Monday, March 6, 2023

Suggested problems for Section 7.4: 3, 7, 11, 13, 14, 15, 17, 21
I continue with examples started on Thursday.
Example 5. In this item we consider the quadratic form \begin{align*} Q(x_1, x_2, x_3) & = 2 x_1^2+2 x_1 x_2 +2 x_1 x_3 +2 x_2^2 + 2 x_2 x_3 + 2 x_3^2 \\ & \qquad \qquad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \end{align*}
- We have \begin{align*} Q(x_1, x_2, x_3) & = \bigl[ x_1 \ \ x_2 \ \ x_3 \bigr] \left[\! \begin{array}{ccc} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \!\right] \\ & = 2 x_1^2+2 x_1 x_2 +2 x_1 x_3 +2 x_2^2 + 2 x_2 x_3 + 2 x_3^2 \\ & \qquad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \end{align*}
- Clearly the quadratic form $Q$ is not a zero form. To classify $Q$ as positive semidefinite, negative semidefinite, indefinite we orthogonally diagonalize the matrix of this quadratic form: \[ \left[\! \begin{array}{ccc} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{array} \!\right] = \left[\!\begin{array}{ccc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \end{array} \!\right] \left[\! \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{array} \!\right] \left[\!\begin{array}{ccc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \end{array} \!\right]^\top \] Let us introduce two bases \[ \mathcal{B} = \left\{ \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right] \right\} \qquad \text{and} \qquad \mathcal{E} = \left\{ \left[\! \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \!\right] \right\}. 
\] The above orthogonal diagonalization suggests a very useful change of coordinates: \[ \mathbf{y} = \left[\!\begin{array}{ccc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \end{array} \!\right]^\top \mathbf{x} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \left[\!\begin{array}{ccc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \end{array} \!\right] \mathbf{y} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}. \] The vector $\mathbf{y}$ is the coordinate vector of $\mathbf{x}$ relative to the basis $\mathcal{B}$, that is $\mathbf{y} = \bigl[\mathbf{x}\bigr]_{\mathcal{B}}.$
  
  With the change of coordinates \[ \mathbf{y} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}, \] the quadratic form $Q$ simplifies as follows \[ 2 x_1^2+2 x_1 x_2 +2 x_1 x_3 +2 x_2^2 + 2 x_2 x_3 + 2 x_3^2 = y_1^2 + y_2^2 + 4 y_3^2. \] The quadratic form $y_1^2 + y_2^2 + 4 y_3^2$ is positive definite since $y_1^2 + y_2^2 + 4 y_3^2 \geq 0$ for all $y_1, y_2, y_3 \in \mathbb{R}$ and $y_1^2 + y_2^2 + 4 y_3^2 = 0$ implies $(y_1,y_2,y_3) = (0,0,0).$ Therefore the given quadratic form $Q(\mathbf{x})$ is also positive definite.
- The above introduced change of coordinates yields the following set equality \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = c \bigr\} = \left\{ \mathbf{x} \in \mathbb{R}^3 \, : \bigl[ \mathbf{x}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{c} y_1 \\ y_2 \\ y_3\end{array} \!\right] \quad \text{and} \quad \, y_1^2 + y_2^2 + 4 y_3^2 = c \right\} \] which holds for each $c \in \mathbb{R}.$ Since \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, y_1^2 + y_2^2 + 4 y_3^2 = -1 \bigr\} \] is the empty set, the stated set equality with $c = -1$ yields that \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = -1 \bigr\} \] is the empty set.
  
  Since the set \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, y_1^2 + y_2^2 + 4 y_3^2 = 0 \bigr\} \] is a singleton set consisting of the zero vector, the stated set equality with $c = 0$ yields that \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} \] is a singleton set consisting of the zero vector.
  
  The set \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, y_1^2 + y_2^2 + 4 y_3^2 = 1 \bigr\} \] is a rotated ellipsoid. This ellipsoid is obtained as the ellipse \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, y_1^2 + 4 y_3^2 = 1, \ y_2 = 0 \bigr\} = \left\{ \left[\! \begin{array}{c} \cos\theta \\ 0 \\ \frac{1}{2} \sin\theta \end{array} \!\right] \ : \ \theta \in [0, 2 \pi) \right\}, \] which is in the $y_1y_3$-plane, rotates about the $y_3$-axis. The set equality stated at the beginning of this item with $c = 1$ yields that the set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \ : \ Q(\mathbf{x}) = 1 \bigr\} \] is also a rotated ellipsoid obtained as the ellipse \[ \left\{ (\cos\theta)\left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] + \frac{1}{2} (\sin\theta ) \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right] \ : \ \theta \in [0, 2 \pi) \right\}, \] which is in the plane spanned by the vectors \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right], \quad \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right], \] rotates about the line determined by the vector $\displaystyle \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right].$ Notice that the intersection of this ellipsoid and the plane spanned by the vectors \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] \quad \text{and} \quad \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] is the unit circle \[ \left\{ (\cos\theta) \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] + (\sin\theta) \left[\! 
\begin{array}{c} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \ : \ \theta \in [0, 2 \pi) \right\}. \]
- Since the change of coordinate matrices $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and $\displaystyle \underset{\mathcal{B}\leftarrow\mathcal{E}}{P}$ are orthogonal matrices, we have \[ \| \mathbf{y} \|^2 = \mathbf{y}^\top \mathbf{y} = \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right)^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right) = \mathbf{x}^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P}\right)^\top \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x} \|^2. \] Therefore \[ S = \bigl\{ Q(\mathbf{x}) \, : \, \mathbf{x} \in \mathbb{R}^3, \ \| \mathbf{x} \| = 1 \bigr\} = \bigl\{y_1^2 + y_2^2 + 4 y_3^2 \, : \, \| \mathbf{y} \| = 1, \ \mathbf{y} \in\mathbb{R}^3 \bigr\}. \] Since whenever $y_1^2 + y_2^2 + y_3^2 = 1$ we have \[ 1 = y_1^2 + y_2^2 + y_3^2 \leq y_1^2 + y_2^2 + 4 y_3^2 \leq 4 y_1^2 + 4 y_2^2 + 4 y_3^2 = 4, \] we deduce that $\min S = 1$ and $\max S = 4$. The form $y_1^2 + y_2^2 + 4 y_3^2$ takes the value $1$ when $(y_1,y_2,y_3) = (\cos \theta, \sin \theta, 0)$ and the value $4$ when $(y_1,y_2,y_3) = (0,0,1)$ or $(y_1,y_2,y_3) = (0,0,-1)$. Using the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ we conclude that the minimum value, $1$, of $S$ is taken at the circle on the unit sphere in $\mathbb{R}^3$ given by \[ \left\{ (\cos\theta) \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] + (\sin\theta) \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \ : \ \theta \in [0, 2 \pi) \right\}. \] The situation with the maximum value $4$ of $S$ is simpler; this value is taken at two unit vectors: \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 4 \bigr\} = \left\{\left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right], - \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right] \right\}. \]
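The conclusions of Example 5 can be probed numerically. The sketch below is plain Python; the grid of sample points on the unit sphere and the names `from_B`, `values`, `identity_ok` are my own illustrative choices. It evaluates $Q$ at points $\mathbf{x} = y_1 \mathbf{b}_1 + y_2 \mathbf{b}_2 + y_3 \mathbf{b}_3$ with $\|\mathbf{y}\| = 1$, compares each value with $y_1^2 + y_2^2 + 4 y_3^2$, and records the smallest and largest values found.

```python
from math import sqrt, sin, cos, pi, isclose

def Q(x1, x2, x3):
    """The quadratic form of Example 5 in the standard coordinates."""
    return 2*x1**2 + 2*x1*x2 + 2*x1*x3 + 2*x2**2 + 2*x2*x3 + 2*x3**2

# The vectors of the basis B from the orthogonal diagonalization.
b1 = (1 / sqrt(2), 0, -1 / sqrt(2))
b2 = (1 / sqrt(6), -2 / sqrt(6), 1 / sqrt(6))
b3 = (1 / sqrt(3), 1 / sqrt(3), 1 / sqrt(3))

def from_B(y):
    """Return x = y1*b1 + y2*b2 + y3*b3, the vector with B-coordinates y."""
    return tuple(y[0] * b1[k] + y[1] * b2[k] + y[2] * b3[k] for k in range(3))

n = 60                      # grid resolution, an arbitrary choice
values = []
identity_ok = True
for i in range(n + 1):
    phi = pi * i / n
    for j in range(2 * n):
        theta = pi * j / n
        y = (sin(phi) * cos(theta), sin(phi) * sin(theta), cos(phi))  # unit vector
        q = Q(*from_B(y))
        diagonal = y[0]**2 + y[1]**2 + 4 * y[2]**2
        identity_ok = identity_ok and isclose(q, diagonal, abs_tol=1e-12)
        values.append(q)

print(identity_ok, min(values), max(values))
```

A finite grid can only suggest the values of $\min S$ and $\max S$; the inequality argument in the text is what proves them. Here the grid contains the points $\mathbf{y} = (0,0,\pm 1)$ and points with $y_3$ numerically zero, so the printed extremes should agree with $1$ and $4$ up to rounding.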

Sunday, March 5, 2023

Suggested problems for Section 7.3: 1, 3, 5, 9, 11, 12
In Section 7.2 of the book the author does not discuss quadratic forms in three variables. Here are some animations that might help you understand the quadratic form $x_1^2 + x_2^2 - x_3^2$. Here I show the surfaces in ${\mathbb R}^3$ with equations $x_1^2 + x_2^2 - x_3^2 = c$ for different values of $c$. These surfaces are called hyperboloids. You can read more at the Wikipedia Hyperboloid page. One-sheet hyperboloids are often encountered in art; see the Wikipedia pages Hyperboloid structure and List of hyperboloid structures, and do not miss the Gallery at the bottom of the latter page.

Five of the above level surfaces at different levels of opacity.
I continue with examples posted on Thursday.
Example 4. In this item we consider the quadratic form \[ Q(x_1, x_2, x_3) = 4 x_1 x_2 +2 x_1 x_3 + 3 x_2^2+4 x_2 x_3 \quad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \]
- We have \begin{align*} Q(x_1, x_2, x_3) = \bigl[ x_1 \ \ x_2 \ \ x_3 \bigr] \left[\! \begin{array}{ccc} 0 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 0 \\ \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \!\right] & = 4 x_1 x_2 +2 x_1 x_3 + 3 x_2^2+4 x_2 x_3 \\ & \qquad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \end{align*}
- Clearly the quadratic form $Q$ is not a zero form. To classify $Q$ as positive semidefinite, negative semidefinite, indefinite we orthogonally diagonalize the matrix of this quadratic form: \[ \left[\! \begin{array}{ccc} 0 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 0 \\ \end{array} \!\right] = \left[\!\begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \end{array} \!\right] \left[\! \begin{array}{ccc} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 5 \end{array} \!\right] \left[\!\begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \end{array} \!\right]^\top \] Let us introduce two bases \[ \mathcal{B} = \left\{ \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \right\} \qquad \text{and} \qquad \mathcal{E} = \left\{ \left[\! \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \!\right] \right\}. 
\] The above orthogonal diagonalization suggests a very useful change of coordinates: \[ \mathbf{y} = \left[\!\begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \end{array} \!\right]^\top \mathbf{x} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \left[\!\begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \end{array} \!\right] \mathbf{y} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}. \] The coordinates $\mathbf{y}$ are the coordinates of $\mathbf{x}$ relative to the basis $\mathcal{B}$, that is $\mathbf{y} = \bigl[\mathbf{x}\bigr]_{\mathcal{B}}.$ With the change of coordinates \[ \mathbf{y} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}, \] the quadratic form $Q$ simplifies as follows \[ 4 x_1 x_2 +2 x_1 x_3 + 3 x_2^2+4 x_2 x_3 = - y_1^2 - y_2^2 + 5 y_3^2. \] Clearly the quadratic form $- y_1^2 - y_2^2 + 5 y_3^2$ is an indefinite form taking the value $-1$ at $(y_1,y_2,y_3) = (1,0,0)$ and the value $5$ at $(y_1,y_2,y_3) = (0,0,1).$
- The above introduced change of coordinates yields the following set equality \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = c \bigr\} = \left\{ \mathbf{x} \in \mathbb{R}^3 \, : \bigl[ \mathbf{x}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{c} y_1 \\ y_2 \\ y_3\end{array} \!\right] \quad \text{and} \quad \, - y_1^2 - y_2^2 + 5 y_3^2 = c \right\} \] which holds for each $c \in \mathbb{R}.$
  
  Since the set \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - y_1^2 - y_2^2 + 5 y_3^2 = 0 \bigr\} \] is a rotated cone, the set equality stated at the beginning of this item with $c=0$ implies that the set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} \] is also a rotated cone. This cone is obtained by the rotation of the line spanned by the vector \[ \sqrt{5} \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right] + \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] about the line spanned by the vector \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right]. \]
  
  The set \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - y_1^2 - y_2^2 + 5 y_3^2 = 1 \bigr\} \] is a rotated two-sheet hyperboloid. This two-sheet hyperboloid is obtained as the hyperbola $-y_1^2 + 5 y_3^2 = 1, y_2 = 0$ rotates about the $y_3$-axis. Consequently, the set equality stated at the beginning of this item with $c=1$ implies that the set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 1 \bigr\} \] is also a rotated two-sheet hyperboloid. This hyperboloid is obtained as a hyperbola in the plane spanned by the vectors \[ \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right], \quad \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] rotates about the line spanned by the vector \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right]. \]
  
  The set \[ \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - y_1^2 - y_2^2 + 5 y_3^2 = -1 \bigr\} \] is a rotated one-sheet hyperboloid. This hyperboloid is obtained as the hyperbola $-y_2^2 + 5 y_3^2 = -1, y_1 = 0$ rotates about the $y_3$-axis. The set equality stated at the beginning of this item with $c=-1$ implies that the set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = -1 \bigr\} \] is also a rotated one-sheet hyperboloid. This hyperboloid is obtained as a hyperbola in the plane spanned by the vectors \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right], \quad \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \] rotates about the line spanned by the vector \[ \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right]. \]
- Since the change of coordinate matrices $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and $\displaystyle \underset{\mathcal{B}\leftarrow\mathcal{E}}{P}$ are orthogonal we have \[ \| \mathbf{y} \|^2 = \mathbf{y}^\top \mathbf{y} = \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right)^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right) = \mathbf{x}^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P}\right)^\top \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x} \|^2. \] Therefore \[ S = \bigl\{ Q(\mathbf{x}) \, : \, \mathbf{x} \in \mathbb{R}^3, \ \| \mathbf{x} \| = 1 \bigr\} = \bigl\{ - y_1^2 - y_2^2 + 5 y_3^2 \, : \, \| \mathbf{y} \| = 1, \ \mathbf{y} \in\mathbb{R}^3 \bigr\}. \] Since whenever $y_1^2 + y_2^2 + y_3^2 = 1$ we have \[ -1 = - y_1^2 - y_2^2 - y_3^2 \leq - y_1^2 - y_2^2 + 5 y_3^2 \leq 5 y_1^2 + 5 y_2^2 + 5 y_3^2 = 5 \] we deduce that $\min S = -1$ and $\max S = 5$. The form $- y_1^2 - y_2^2 + 5 y_3^2$ takes the value $-1$ when $(y_1,y_2,y_3) = (\cos \theta, \sin \theta, 0)$ and the value $5$ when $(y_1,y_2,y_3) = (0,0,1)$ or $(y_1,y_2,y_3) = (0,0,-1)$. Using the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ we conclude that the minimum value $-1$ is taken at a circle on the unit sphere in $\mathbb{R}^3$. That is \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = -1 \bigr\} = \left\{ (\cos \theta) \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{array} \!\right] + (\sin \theta) \left[\! \begin{array}{c} \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{array} \!\right] \ : \ \theta \in [0, 2\pi) \right\}. \] The situation with the maximum value $5$ is simpler; this value is taken at two unit vectors: \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 5 \bigr\} = \left\{\left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right], - \left[\! \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{array} \!\right] \right\}. \]

Thursday, March 2, 2023

In the items below we consider several specific quadratic forms $Q$ and, for each of them, answer the following four questions:
- Write the quadratic form $Q$ using a symmetric matrix $A$ as $Q(\mathbf{x}) = \mathbf{x}^\top\!A \mathbf{x}$ where $\mathbf{x}\in \mathbb{R}^n.$
- Classify $Q$ using the quadruplicity stated in the post on Tuesday: zero, positive semidefinite, negative semidefinite, or indefinite. Don't forget to state whether the form is positive definite, negative definite, or neither.
- Give a detailed description of the sets \[ \bigl\{ \mathbf{x} \in \mathbb{R}^n \, : \, Q(\mathbf{x}) = -1 \bigr\}, \quad \bigl\{ \mathbf{x} \in \mathbb{R}^n \, : \, Q(\mathbf{x}) = 0 \bigr\}, \quad \bigl\{ \mathbf{x} \in \mathbb{R}^n \, : \, Q(\mathbf{x}) = 1 \bigr\}. \]
- Consider the set of real numbers \[ S = \bigl\{ Q(\mathbf{x}) \, : \ \mathbf{x}\in \mathbb{R}^n, \ \ \| \mathbf{x} \| = 1 \bigr\}. \] Determine $\min S$ and $\max S$ and describe the sets \[ \bigl\{ \mathbf{x} \in \mathbb{R}^n \, : \ \| \mathbf{x} \| = 1, \ \ Q(\mathbf{x}) = \min S \bigr\}, \qquad \bigl\{ \mathbf{x} \in \mathbb{R}^n \, : \ \| \mathbf{x} \| = 1, \ \ Q(\mathbf{x}) = \max S \bigr\}. \]
Example 1. In this item we consider the quadratic form \[ Q(x_1,x_2) = 6 x_1^2 - 4 x_2 x_1 + 3 x_2^2 \quad \text{where} \quad x_1, x_2 \in \mathbb{R}. \]
- We have \[ Q(x_1,x_2) = \bigl[ x_1 \ \ x_2 \bigr] \left[\! \begin{array}{cc} 6 & -2 \\ -2 & 3 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = 6 x_1^2 - 4 x_2 x_1 + 3 x_2^2 \quad \text{where} \quad x_1, x_2 \in \mathbb{R}. \]
- Clearly the quadratic form $Q$ is not a zero form. To classify $Q$ as positive semidefinite, negative semidefinite, or indefinite we orthogonally diagonalize the matrix of this quadratic form: \[ \left[\! \begin{array}{cc} 6 & -2 \\ -2 & 3 \end{array} \!\right] = \left[\! \begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{array} \!\right] \left[\! \begin{array}{cc} 2 & 0 \\ 0 & 7 \end{array} \!\right] \left[\! \begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{array} \!\right]^\top. \] Let us introduce two bases \[ \mathcal{B} = \left\{ \left[\! \begin{array}{c} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{array} \!\right], \left[\! \begin{array}{c} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{array} \!\right] \right\} \qquad \text{and} \qquad \mathcal{E} = \left\{ \left[\! \begin{array}{c} 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \end{array} \!\right] \right\}. \] The above orthogonal diagonalization suggests a very useful change of coordinates: \[ \mathbf{y} = \left[\! \begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{array} \!\right]^\top \mathbf{x} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \left[\! \begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{array} \!\right] \mathbf{y} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}. \] The coordinates $\mathbf{y}$ are the coordinates relative to the basis $\mathcal{B}$ which consists of the blue vectors in the next image. With the change of coordinates \[ \mathbf{y} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}, \] the quadratic form $Q$ simplifies as follows \[ 6 x_1^2 - 4 x_2 x_1 + 3 x_2^2 = 2 y_1^2 + 7 y_2^2. \] Clearly $2 y_1^2 + 7 y_2^2 \geq 0$ for all $y_1, y_2 \in \mathbb{R}$, and $2 y_1^2 + 7 y_2^2 = 0$ if and only if $(y_1,y_2) =(0,0).$ In conclusion, the given quadratic form is positive definite.
- The above introduced change of coordinates yields \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = -1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 2 y_1^2 + 7 y_2^2 = -1 \bigr\}, \] (this set is clearly an empty set) \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = 0 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 2 y_1^2 + 7 y_2^2 = 0 \bigr\} \] (this set clearly consists of the zero vector only) and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = 1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 2 y_1^2 + 7 y_2^2 = 1 \bigr\}. \] The set \[ \bigl\{ (y_1,y_2) \in \mathbb{R}^2 \, : \, 2 y_1^2 + 7 y_2^2 = 1 \bigr\} \] is an ellipse. The vertices of this ellipse in the coordinate system relative to the basis $\mathcal{B}$ are \[ \text{vertices}: \left(\frac{\sqrt{2}}{2}, 0 \right) , \ \left(-\frac{\sqrt{2}}{2}, 0 \right), \qquad \text{co-vertices}: \left(0, \frac{\sqrt{7}}{7} \right) , \ \left(0, -\frac{\sqrt{7}}{7}\right). \] To get the coordinates of these points in the original coordinate system relative to the basis $\mathcal{E}$ we apply the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$: \[ \text{vertices}: \left(\frac{\sqrt{10}}{10},\frac{\sqrt{10}}{5} \right) , \ \left(-\frac{\sqrt{10}}{10}, - \frac{\sqrt{10}}{5} \right), \] \[ \text{co-vertices}: \left(-\frac{2 \sqrt{35}}{35}, \frac{\sqrt{35}}{35} \right) , \ \left(\frac{2 \sqrt{35}}{35}, -\frac{\sqrt{35}}{35} \right). \]
- Since the change of coordinate matrices $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and $\displaystyle \underset{\mathcal{B}\leftarrow\mathcal{E}}{P}$ are orthogonal we have \[ \| \mathbf{y} \|^2 = \mathbf{y}^\top \mathbf{y} = \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right)^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right) = \mathbf{x}^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P}\right)^\top \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x} \|^2. \] Therefore \[ S = \bigl\{ Q(\mathbf{x}) \, : \, \mathbf{x} \in \mathbb{R}^2, \ \| \mathbf{x} \| = 1 \bigr\} = \bigl\{ 2 y_1^2 + 7 y_2^2 \, : \, y_1^2 + y_2^2 = 1, \ y_1, y_2 \in\mathbb{R} \bigr\}. \] Since \[ 2 = 2 y_1^2 + 2 y_2^2 \leq 2 y_1^2 + 7 y_2^2 \leq 7 y_1^2 + 7 y_2^2 = 7 \] whenever $y_1^2 + y_2^2 = 1$, we have that $\min S = 2$ and $\max S = 7$. The form $2 y_1^2 + 7 y_2^2$ takes the value $2$ when $y_1 = 1, y_2 =0$ or $y_1 = -1, y_2 =0$ and the value $7$ when $y_1 = 0, y_2 = 1$ or $y_1 = 0, y_2 = -1$. Using the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ we conclude that \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 2 \bigr\} = \left\{ \left[\! \begin{array}{c} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{array} \!\right], - \left[\! \begin{array}{c} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{array} \!\right] \right\} \] and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 7 \bigr\} = \left\{ \left[\! \begin{array}{c} \frac{-2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{2}{\sqrt{5}} \\ \frac{-1}{\sqrt{5}} \end{array} \!\right] \right\}. \]
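The computations in Example 1 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes; the function names `Q`, `diagonal_form` and `to_standard` and the sampling step are choices made here for illustration. It compares the form in the two coordinate systems and samples the unit circle to approximate $\min S$ and $\max S$.

```python
import math

def Q(x1, x2):
    """Quadratic form of Example 1 in the standard coordinates."""
    return 6 * x1**2 - 4 * x1 * x2 + 3 * x2**2

def diagonal_form(y1, y2):
    """The same form written in the coordinates relative to the basis B."""
    return 2 * y1**2 + 7 * y2**2

def to_standard(y1, y2):
    """Apply the change of coordinates matrix from B to E to (y1, y2)."""
    s = 1 / math.sqrt(5)
    return (s * y1 - 2 * s * y2, 2 * s * y1 + s * y2)

# Sample the unit circle in steps of 0.1 degree (an arbitrary choice)
# and record the smallest and the largest sampled value of Q.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
values = [Q(math.cos(t), math.sin(t)) for t in angles]
smallest, largest = min(values), max(values)
print(smallest, largest)  # expected to be close to 2 and 7
```

Sampling only approximates the extreme values; the inequality in the text is what proves that they are exactly $2$ and $7$.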
Example 2. In this item we consider the quadratic form \[ Q(x_1,x_2) = x_1^2 + 6 x_2 x_1 + x_2^2 \quad \text{where} \quad x_1, x_2 \in \mathbb{R}. \]
- We have \[ Q(x_1,x_2) = \bigl[ x_1 \ \ x_2 \bigr] \left[\! \begin{array}{cc} 1 & 3 \\ 3 & 1 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = x_1^2 + 6 x_2 x_1 + x_2^2 \quad \text{where} \quad x_1, x_2 \in \mathbb{R}. \]
- Clearly the quadratic form $Q$ is not a zero form. To classify $Q$ as positive semidefinite, negative semidefinite, or indefinite we orthogonally diagonalize the matrix of this quadratic form: \[ \left[\! \begin{array}{cc} 1 & 3 \\ 3 & 1 \end{array} \!\right] = \left[\! \begin{array}{cc} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \!\right] \left[\! \begin{array}{cc} 4 & 0 \\ 0 & -2 \end{array} \!\right] \left[\! \begin{array}{cc} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \!\right]^\top. \] Let us introduce two bases \[ \mathcal{B} = \left\{ \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \!\right], \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \!\right] \right\} \qquad \text{and} \qquad \mathcal{E} = \left\{ \left[\! \begin{array}{c} 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \end{array} \!\right] \right\}. \] The above orthogonal diagonalization suggests a very useful change of coordinates: \[ \mathbf{y} = \left[\! \begin{array}{cc} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \!\right]^\top \mathbf{x} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \left[\! \begin{array}{cc} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \!\right] \mathbf{y} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}. \] The coordinates $\mathbf{y}$ are the coordinates relative to the basis $\mathcal{B}$ which consists of the blue vectors in the next image. With the change of coordinates \[ \mathbf{y} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}, \] the quadratic form $Q$ simplifies as follows \[ x_1^2 + 6 x_2 x_1 + x_2^2 = 4 y_1^2 - 2 y_2^2. 
\] Clearly $4 y_1^2 - 2 y_2^2$ is an indefinite form taking the value $4$ at $(y_1,y_2) = (1,0)$ and the value $-2$ at $(y_1,y_2) = (0,1).$
- The above introduced change of coordinates yields \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = -1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = -1 \bigr\}, \] \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = 0 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = 0 \bigr\} \] and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, Q(\mathbf{x}) = 1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = 1 \bigr\}. \]
  
  The set \[ \bigl\{ (y_1,y_2) \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = 1 \bigr\} \] is a hyperbola. The vertices of this hyperbola in the coordinate system relative to the basis $\mathcal{B}$ are \[ \text{vertices}: \left(\frac{1}{2}, 0 \right) , \ \left(-\frac{1}{2}, 0 \right) \] and the asymptotes of this hyperbola are two lines which are determined by the vectors \[ \left[\! \begin{array}{c} \frac{1}{2} \\ \frac{\sqrt{2}}{2} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{1}{2} \\ -\frac{\sqrt{2}}{2} \end{array} \!\right]. \] To get the coordinates of the vertices in the original coordinate system relative to the basis $\mathcal{E}$ we apply the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$: \[ \text{vertices}: \left(\frac{\sqrt{2}}{4},\frac{\sqrt{2}}{4} \right) , \ \left(-\frac{\sqrt{2}}{4},-\frac{\sqrt{2}}{4} \right), \] and the asymptotes are determined by the vectors \[ \left[\! \begin{array}{c} \frac{-2+\sqrt{2}}{4} \\ \frac{2+\sqrt{2}}{4} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{2+\sqrt{2}}{4} \\ \frac{-2+\sqrt{2}}{4} \end{array} \!\right]. \]
  
  The set \[ \bigl\{ (y_1,y_2) \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = 0 \bigr\} \] is a union of two lines that go through the origin. These lines are the asymptotes of the preceding hyperbola and are determined by the vectors \[ \left[\! \begin{array}{c} \frac{1}{2} \\ \frac{\sqrt{2}}{2} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{1}{2} \\ -\frac{\sqrt{2}}{2} \end{array} \!\right], \] in the coordinates relative to the basis $\mathcal{B}$. To get the coordinates in the original coordinate system relative to the basis $\mathcal{E}$ we apply the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$: \[ \left[\! \begin{array}{c} \frac{-2+\sqrt{2}}{4} \\ \frac{2+\sqrt{2}}{4} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{2+\sqrt{2}}{4} \\ \frac{-2+\sqrt{2}}{4} \end{array} \!\right]. \]
  
  The set \[ \bigl\{ (y_1,y_2) \in \mathbb{R}^2 \, : \, 4 y_1^2 - 2 y_2^2 = -1 \bigr\} \] is a hyperbola. The vertices of this hyperbola in the coordinate system relative to the basis $\mathcal{B}$ are \[ \text{vertices}: \left(0, \frac{\sqrt{2}}{2}\right) , \ \left(0, -\frac{\sqrt{2}}{2} \right) \] and the asymptotes of this hyperbola are two lines which are determined by the vectors \[ \left[\! \begin{array}{c} \frac{1}{2} \\ \frac{\sqrt{2}}{2} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{1}{2} \\ -\frac{\sqrt{2}}{2} \end{array} \!\right]. \] To get the coordinates of the vertices in the original coordinate system relative to the basis $\mathcal{E}$ we apply the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$: \[ \text{vertices}: \left(-\frac{1}{2},\frac{1}{2} \right) , \ \left(\frac{1}{2}, -\frac{1}{2} \right), \] and the asymptotes are determined by the vectors \[ \left[\! \begin{array}{c} \frac{-2+\sqrt{2}}{4} \\ \frac{2+\sqrt{2}}{4} \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{c} \frac{2+\sqrt{2}}{4} \\ \frac{-2+\sqrt{2}}{4} \end{array} \!\right]. \]
- Since the change of coordinate matrices $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and $\displaystyle \underset{\mathcal{B}\leftarrow\mathcal{E}}{P}$ are orthogonal we have \[ \| \mathbf{y} \|^2 = \mathbf{y}^\top \mathbf{y} = \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right)^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right) = \mathbf{x}^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P}\right)^\top \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x} \|^2. \] Therefore \[ S = \bigl\{ Q(\mathbf{x}) \, : \, \mathbf{x} \in \mathbb{R}^2, \ \| \mathbf{x} \| = 1 \bigr\} = \bigl\{ 4 y_1^2 - 2 y_2^2 \, : \, y_1^2 + y_2^2 = 1, \ y_1, y_2 \in\mathbb{R} \bigr\}. \] Since \[ -2 = -2 y_1^2 - 2 y_2^2 \leq 4 y_1^2 - 2 y_2^2 \leq 4 y_1^2 + 4 y_2^2 = 4 \] whenever $y_1^2 + y_2^2 = 1$, we have that $\min S = -2$ and $\max S = 4$. The form $4 y_1^2 - 2 y_2^2$ takes the value $-2$ when $y_1 = 0, y_2 = 1$ or $y_1 = 0, y_2 =-1$ and the value $4$ when $y_1 = 1, y_2 = 0$ or $y_1 = -1, y_2 =0$. Using the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ we conclude that \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = -2 \bigr\} = \left\{ \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \!\right], \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] \right\} \] and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^2 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 4 \bigr\} = \left\{ \left[\! \begin{array}{c} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \!\right], \left[\! \begin{array}{c} -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{array} \!\right] \right\}. \]
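The special points found in Example 2 can be substituted back into the original form as a sanity check. The sketch below is an illustration in plain Python, with variable names chosen here rather than taken from the notes: the vertices should give the values $1$ and $-1$, and the vectors determining the asymptotes should give the value $0$.

```python
import math

def Q(x1, x2):
    """Quadratic form of Example 2 in the standard coordinates."""
    return x1**2 + 6 * x1 * x2 + x2**2

r2 = math.sqrt(2)

# Points copied from the text, in the coordinates relative to E.
vertex_plus = (r2 / 4, r2 / 4)                 # on the hyperbola Q = 1
vertex_minus = (-0.5, 0.5)                     # on the hyperbola Q = -1
asymptote_1 = ((-2 + r2) / 4, (2 + r2) / 4)    # direction of one asymptote
asymptote_2 = ((2 + r2) / 4, (-2 + r2) / 4)    # direction of the other one

for name, point in [("vertex_plus", vertex_plus),
                    ("vertex_minus", vertex_minus),
                    ("asymptote_1", asymptote_1),
                    ("asymptote_2", asymptote_2)]:
    print(name, Q(*point))
```

Because of floating point rounding the printed values may differ from $1$, $-1$ and $0$ in the last digits.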
Example 3. In this item we consider the quadratic form \[ Q(x_1, x_2, x_3) = x_1^2 - 4 x_1 x_2 +4 x_2 x_3 - x_3^2 \quad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \]
- We have \[ Q(x_1, x_2, x_3) = \bigl[ x_1 \ \ x_2 \ \ x_3 \bigr] \left[\! \begin{array}{ccc} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \!\right] = x_1^2 - 4 x_1 x_2 +4 x_2 x_3 - x_3^2 \quad \text{where} \quad x_1, x_2, x_3 \in \mathbb{R}. \]
- Clearly the quadratic form $Q$ is not a zero form. To classify $Q$ as positive semidefinite, negative semidefinite, indefinite we orthogonally diagonalize the matrix of this quadratic form: \[ \left[\! \begin{array}{ccc} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{array} \!\right] = \left[\!\begin{array}{ccc} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \end{array} \!\right] \left[\! \begin{array}{ccc} -3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{array} \!\right] \left[\!\begin{array}{ccc} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \end{array} \!\right]^\top \] Let us introduce two bases \[ \mathcal{B} = \left\{ \left[\! \begin{array}{c} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{array} \!\right], \left[\! \begin{array}{c} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \!\right], \left[\! \begin{array}{c} \frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3} \end{array} \!\right] \right\} \qquad \text{and} \qquad \mathcal{E} = \left\{ \left[\! \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \!\right] \right\}. \] The above orthogonal diagonalization suggests a very useful change of coordinates: \[ \mathbf{y} = \left[\!\begin{array}{ccc} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \end{array} \!\right]^\top \mathbf{x} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \left[\!\begin{array}{ccc} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \end{array} \!\right] \mathbf{y} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}. 
\] The coordinates $\mathbf{y}$ are the coordinates relative to the basis $\mathcal{B}$. With the change of coordinates \[ \mathbf{y} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}, \qquad \mathbf{x} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \mathbf{y}, \] the quadratic form $Q$ simplifies as follows \[ x_1^2 - 4 x_1 x_2 +4 x_2 x_3 - x_3^2 = - 3 y_1^2 + 3 y_2^2. \] Clearly the form $- 3 y_1^2 + 3 y_2^2$ is an indefinite form taking the value $-3$ at $(y_1,y_2,y_3) = (1,0,0)$ and the value $3$ at $(y_1,y_2,y_3) = (0,1,0).$
- The above introduced change of coordinates yields \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = -1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = -1 \bigr\}, \] \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = 0 \bigr\}, \] and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = 1 \bigr\}. \]
  
  The set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 0 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = 0 \bigr\} \] is a union of two planes. These planes are represented by the following two spans: \[ \operatorname{Span} \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \!\right] \right\} \qquad \text{and} \qquad \operatorname{Span} \left\{ \left[\! \begin{array}{c} 1 \\ -1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \!\right] \right\} \] in the coordinates relative to the basis $\mathcal{B}$. To get these spans in the original coordinate system relative to the basis $\mathcal{E}$ we apply the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and rescale the resulting vectors, which does not change the spans: \[ \operatorname{Span} \left\{ \left[\! \begin{array}{c} -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 2 \\ 1 \\ 2 \end{array} \!\right] \right\} \qquad \text{and} \qquad \operatorname{Span} \left\{ \left[\! \begin{array}{c} 1 \\ -4 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 2 \\ 1 \\ 2 \end{array} \!\right] \right\}. \]
  
  The set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = 1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = 1 \bigr\} \] is a hyperbolic cylinder. The equation $- 3 y_1^2 + 3 y_2^2 = 1$ represents a hyperbola in the plane spanned by the vectors \[ \left[\! \begin{array}{c} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{array} \!\right] \quad \text{and} \quad \left[\! \begin{array}{c} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \!\right] \quad \text{coordinates relative to} \quad \mathcal{E}. \] The cylinder is formed by the parallel lines that go through the points on the hyperbola and are orthogonal to the plane spanned by the above two vectors.
  
  The set \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, Q(\mathbf{x}) = -1 \bigr\} = \bigl\{ \mathbf{y} \in \mathbb{R}^3 \, : \, - 3 y_1^2 + 3 y_2^2 = -1 \bigr\} \] is a hyperbolic cylinder. The equation $- 3 y_1^2 + 3 y_2^2 = -1$ represents a hyperbola in the plane spanned by the vectors \[ \left[\! \begin{array}{c} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{array} \!\right] \quad \text{and} \quad \left[\! \begin{array}{c} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \!\right] \quad \text{coordinates relative to} \quad \mathcal{E}. \] The cylinder is formed by the parallel lines that go through the points on the hyperbola and are orthogonal to the plane spanned by the above two vectors.
- Since the change of coordinate matrices $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ and $\displaystyle \underset{\mathcal{B}\leftarrow\mathcal{E}}{P}$ are orthogonal we have \[ \| \mathbf{y} \|^2 = \mathbf{y}^\top \mathbf{y} = \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right)^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x}\right) = \mathbf{x}^\top \left(\underset{\mathcal{B}\leftarrow\mathcal{E}}{P}\right)^\top \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x} \|^2. \] Therefore \[ S = \bigl\{ Q(\mathbf{x}) \, : \, \mathbf{x} \in \mathbb{R}^3, \ \| \mathbf{x} \| = 1 \bigr\} = \bigl\{ -3 y_1^2 + 3 y_2^2 \, : \, y_1^2 + y_2^2 + y_3^2 = 1, \ y_1, y_2, y_3 \in\mathbb{R} \bigr\}. \] Since \[ -3 = -3 y_1^2 - 3 y_2^2 - 3 y_3^2 \leq -3 y_1^2 + 3 y_2^2 \leq 3 y_1^2 + 3 y_2^2 + 3 y_3^2 = 3 \] whenever $y_1^2 + y_2^2 + y_3^2 = 1$, we have that $\min S = -3$ and $\max S = 3$. On the unit sphere the form $-3 y_1^2 + 3 y_2^2$ takes the value $-3$ when $(y_1,y_2,y_3) = (1,0,0)$ or $(y_1,y_2,y_3) = (-1,0,0)$ and the value $3$ when $(y_1,y_2,y_3) = (0,1,0)$ or $(y_1,y_2,y_3) = (0,-1,0)$. Using the change of coordinates matrix $\displaystyle \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}$ we conclude that \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = -3 \bigr\} = \left\{ \left[\! \begin{array}{c} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{array} \!\right], -\left[\! \begin{array}{c} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{array} \!\right] \right\} \] and \[ \bigl\{ \mathbf{x} \in \mathbb{R}^3 \, : \, \| \mathbf{x} \| = 1, \ Q(\mathbf{x}) = 3 \bigr\} = \left\{ \left[\! \begin{array}{c} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \!\right], - \left[\! \begin{array}{c} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \!\right] \right\}. \]
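For Example 3 everything can be verified in exact integer arithmetic. In the sketch below, which is an illustration and not part of the original notes, the basis vectors of $\mathcal{B}$ are multiplied by $3$ to avoid fractions; the helper names `matvec`, `dot`, `is_eigenpair` and `in_zero_set` are hypothetical. The code checks the eigenvector equations and tests that $Q$ vanishes on integer combinations of the spanning vectors of the two planes.

```python
A = [[1, -2, 0],
     [-2, 0, 2],
     [0, 2, -1]]

def matvec(M, v):
    """Product of the matrix M and the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def Q(x):
    """Q(x) = x . (A x) for the matrix A of Example 3."""
    return dot(x, matvec(A, x))

# Vectors of the basis B multiplied by 3, paired with their eigenvalues.
eigenpairs = [(-3, [-1, -2, 2]), (3, [-2, 2, 1]), (0, [2, 1, 2])]

def is_eigenpair(lam, v):
    """True when A v equals lam * v exactly."""
    return matvec(A, v) == [lam * c for c in v]

def in_zero_set(v, w, coefficients=range(-3, 4)):
    """Check Q(s v + t w) = 0 on a small grid of integers s, t."""
    return all(Q([s * a + t * b for a, b in zip(v, w)]) == 0
               for s in coefficients for t in coefficients)

print(all(is_eigenpair(lam, v) for lam, v in eigenpairs))
print(in_zero_set([-1, 0, 1], [2, 1, 2]), in_zero_set([1, -4, 1], [2, 1, 2]))
```

The grid test is only evidence, not a proof; the proof is the factorization $-3y_1^2 + 3y_2^2 = 3(y_2 - y_1)(y_2 + y_1)$ in the coordinates relative to $\mathcal{B}$.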

Tuesday, February 28, 2023

Suggested problems for Section 7.2: 1, 3, 5, 7, 9, 13, 17, 19, 20, 21, 23, 25
In Sections 7.2 and 7.3 we study quadratic forms.
A quadratic form in $n$ variables is a special kind of function $Q:\mathbb{R}^n \to \mathbb{R}.$ Below are a few examples of quadratic forms.
- Below are three specific quadratic forms in two variables: \[ Q(x_1,x_2) = 6 x_1^2 - 4 x_1 x_2 + 3 x_2^2, \qquad (x_1,x_2) \in \mathbb{R}^2 \] \[ Q(x_1,x_2) = x_1^2 + 6 x_1 x_2 + x_2^2, \qquad (x_1,x_2) \in \mathbb{R}^2 \] \[ Q(x_1,x_2) = 4 x_1^2 + 4 x_1 x_2 + x_2^2, \qquad (x_1,x_2) \in \mathbb{R}^2 \] In general, a quadratic form $Q$ in two variables $x_1,x_2$ is a function defined on $\mathbb{R}^2$ with the values in $\mathbb{R}$ which can be expressed as \[ Q(x_1,x_2) = a\, x_1x_1 + b\, x_1x_2 + c\, x_2x_2, \qquad (x_1,x_2) \in \mathbb{R}^2, \] where $a, b, c$ are real coefficients.
- Below are three specific quadratic forms in three variables: \[ Q(x_1,x_2,x_3) = x_1^2 -4x_1 x_2 +4 x_2 x_3 - x_3^2, \qquad (x_1,x_2,x_3) \in \mathbb{R}^3, \] \[ Q(x_1,x_2,x_3) = 4x_1 x_2 + 2 x_1 x_3 + 3 x_2^2 + 4 x_2 x_3, \qquad (x_1,x_2,x_3) \in \mathbb{R}^3, \] \[ Q(x_1,x_2,x_3) = 2 x_1^2 + 2 x_1 x_2 + 2 x_1 x_3 + 2 x_2^2 + 2 x_2 x_3 + 2 x_3^2, \qquad (x_1,x_2,x_3) \in \mathbb{R}^3. \] In general, a quadratic form $Q$ in three variables $x_1,x_2,x_3$ is a function defined on $\mathbb{R}^3$ with the values in $\mathbb{R}$ which can be expressed as \[ Q(x_1,x_2,x_3) = a\, x_1x_1 + b\, x_1x_2 + c\, x_1x_3 + d\, x_2 x_2 + e\, x_2 x_3 + f\, x_3 x_3, \quad (x_1,x_2,x_3) \in \mathbb{R}^3, \] where $a, b, c, d, e, f$ are real coefficients.
- A quadratic form $Q$ in four variables $x_1,x_2,x_3,x_4$ is a function defined on $\mathbb{R}^4$ with the values in $\mathbb{R}$ which is a linear combination of the following ten terms \[ x_1x_1, \quad x_1x_2, \quad x_1x_3, \quad x_1 x_4, \quad x_2 x_2, \quad x_2 x_3, \quad x_2x_4, \quad x_3 x_3, \quad x_3 x_4, \quad x_4 x_4. \] In other words, a quadratic form in four variables is a polynomial in four variables which contains only terms of degree $2.$
- In general, a quadratic form in $n$ variables is a polynomial in $n$ variables which contains only terms of degree $2.$ To be more specific, for $j, k \in \{1,\ldots,n\}$ with $j \leq k$ let us define the functions $q_{jk}:\mathbb{R}^n \to \mathbb{R}$ by \[ q_{jk}(\mathbf{x}) = x_j x_k, \qquad \mathbf{x} = (x_1,\ldots,x_n) \in \mathbb{R}^n. \] Notice that there are $\binom{n+1}{2} = \frac{n(n+1)}{2}$ such functions. A linear combination of the functions $q_{jk}$, where $j, k \in \{1,\ldots,n\}$ and $j \leq k$, is called a quadratic form in $n$ variables.
- For us, the most important fact about quadratic forms is that for each quadratic form $Q$ in $n$ variables there exists a unique symmetric $n\!\times\!n$ matrix $A$ such that \[ Q(\mathbf{x}) = \mathbf{x} \cdot (A\mathbf{x}) = \mathbf{x}^\top A\mathbf{x} \quad \text{for all} \quad \mathbf{x} \in \mathbb{R}^n. \] This matrix $A$ is called the matrix of the quadratic form $Q$.
- In the above example, for all $(x_1,x_2) \in \mathbb{R}^2$ we have \[ Q(x_1,x_2) = a\, x_1x_1 + b\, x_1x_2 + c\, x_2x_2 = \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right]^\top \left(\left[\! \begin{array}{cc} a & b/2 \\ b/2 & c \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] \right) \] And for all $(x_1,x_2,x_3) \in \mathbb{R}^3$ we have \begin{align*} Q(x_1,x_2,x_3) &= a\, x_1x_1 + b\, x_1x_2 + c\, x_1x_3 + d\, x_2 x_2 + e\, x_2 x_3 + f\, x_3 x_3 \\ & = \left[\! \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \!\right]^\top \left(\left[\! \begin{array}{ccc} a & b/2 & c/2 \\ b/2 & d & e/2 \\ c/2 & e/2 & f \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \!\right] \right) \end{align*}
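The rule visible in the two matrices above (the coefficient of $x_j x_j$ goes on the diagonal, and the coefficient of $x_j x_k$ with $j \lt k$ is split evenly between the entries $(j,k)$ and $(k,j)$) can be sketched in code. The function names `matrix_of_form` and `evaluate` and the dictionary format for the coefficients are assumptions made for this illustration.

```python
from fractions import Fraction

def matrix_of_form(n, coefficients):
    """Symmetric n x n matrix of the form given by `coefficients`.

    `coefficients` maps 1-based index pairs (j, k) with j <= k to the
    coefficient of x_j * x_k; pairs that are missing count as zero.
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for (j, k), c in coefficients.items():
        if j == k:
            A[j - 1][j - 1] = Fraction(c)
        else:
            # Split the mixed coefficient evenly to keep A symmetric.
            A[j - 1][k - 1] = Fraction(c) / 2
            A[k - 1][j - 1] = Fraction(c) / 2
    return A

def evaluate(A, x):
    """Compute x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# The form x1^2 - 4 x1 x2 + 4 x2 x3 - x3^2 from the list above.
form = {(1, 1): 1, (1, 2): -4, (2, 3): 4, (3, 3): -1}
A = matrix_of_form(3, form)
print(A)
print(evaluate(A, [1, 2, 3]))
```

Exact fractions are used so that halving an odd coefficient does not introduce rounding.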
In this item I will write about polychotomies in mathematics. A polychotomy is a partition of a given set of mathematical objects into disjoint classes which are all given distinct names.
- A dichotomy is a partition of a given set of mathematical objects into two disjoint classes each of which is given a name. The following are examples of dichotomies.
  - The most important dichotomy for numbers is the partition of numbers into the singleton set $\{0\}$ consisting of only zero and the set of all nonzero numbers. A further dichotomy for the nonzero real numbers is the partition of the nonzero real numbers into positive real numbers and negative real numbers.
  - An important dichotomy for the set of real numbers is the partition into rational and irrational numbers.
  - A useful dichotomy for complex numbers is the partition of the complex numbers into the real and nonreal numbers. A complex number $z$ is said to be nonreal if the imaginary part of $z$ is nonzero.
  - Consider the set of all square matrices. A square matrix $M$ is said to be singular if $\det M = 0.$ A square matrix $M$ is said to be nonsingular if $\det M \neq 0.$ You also learned that a square matrix is invertible if and only if it is nonsingular. Thus, singular-invertible is a dichotomy for square matrices.
- A trichotomy is a partition of a given set of mathematical objects into three disjoint classes each of which is given a name. The following are examples of trichotomies.
  - The most important trichotomy for the set of real numbers is the partition of numbers into the singleton set $\{0\}$ consisting of only zero, the set of positive real numbers and the set of negative real numbers. As we mentioned before, this trichotomy arises from two dichotomies.
  - In high school you learned about the trichotomy involving quadratic equations $a x^2 + b x + c = 0$ with real coefficients and $a\neq 0.$ Such an equation can have no real solutions, exactly one real solution, or exactly two real solutions.
- A quadruplicity is a partition of a given set of mathematical objects into four disjoint classes each of which is given a name. I started writing about polychotomies because of the quadruplicity which arises with quadratic forms. I define that quadruplicity in the next item.
Let $Q : \mathbb R^n \to \mathbb R$ be a quadratic form. We distinguish the following four types of quadratic forms:
- $Q$ is said to be a zero quadratic form if $Q(\mathbf x) = 0$ for all $\mathbf x \in \mathbb R^n.$
- $Q$ is said to be a positive semidefinite quadratic form if $Q(\mathbf x) \geq 0$ for all $\mathbf x \in \mathbb R^n$ and there exists $\mathbf v \in \mathbb R^n$ such that $Q(\mathbf v) \gt 0.$
- $Q$ is said to be a negative semidefinite quadratic form if $Q(\mathbf x) \leq 0$ for all $\mathbf x \in \mathbb R^n$ and there exists $\mathbf v \in \mathbb R^n$ such that $Q(\mathbf v) \lt 0.$
- $Q$ is said to be an indefinite quadratic form if there exists $\mathbf v \in \mathbb R^n$ such that $Q(\mathbf v) \gt 0$ and there exists $\mathbf u \in \mathbb R^n$ such that $Q(\mathbf u) \lt 0.$
The above four definitions constitute a quadruplicity for the set of quadratic forms. In the textbook the author emphasizes two special kinds of semidefinite forms:
- $Q$ is said to be a positive definite quadratic form if $Q(\mathbf x) \gt 0$ for all $\mathbf x \in \mathbb R^n\!\setminus\!\{\mathbf 0\}.$
- $Q$ is said to be a negative definite quadratic form if $Q(\mathbf x) \lt 0$ for all $\mathbf x \in \mathbb R^n\!\setminus\!\{\mathbf 0\}.$
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.

In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
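Once a quadratic form has been written in a diagonal form $\lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$ by an orthogonal change of coordinates, as is done in the examples posted on Thursday, March 2, its place in the quadruplicity can be read off from the signs of the numbers $\lambda_1, \ldots, \lambda_n$. The following is a minimal sketch of that reading; the function name `classify` and the tolerance `tol` used to decide when a computed number counts as zero are assumptions of this illustration.

```python
def classify(eigenvalues, tol=1e-12):
    """Return (class in the quadruplicity, definiteness or None).

    `eigenvalues` are the coefficients of the diagonal form; numbers
    with absolute value at most `tol` are treated as zero.
    """
    has_pos = any(lam > tol for lam in eigenvalues)
    has_neg = any(lam < -tol for lam in eigenvalues)
    has_zero = any(abs(lam) <= tol for lam in eigenvalues)
    if has_pos and has_neg:
        return ("indefinite", None)
    if has_pos:
        return ("positive semidefinite",
                None if has_zero else "positive definite")
    if has_neg:
        return ("negative semidefinite",
                None if has_zero else "negative definite")
    return ("zero", None)

print(classify([2, 7]))      # coefficients of 2 y1^2 + 7 y2^2
print(classify([4, -2]))     # coefficients of 4 y1^2 - 2 y2^2
print(classify([-3, 3, 0]))  # coefficients of -3 y1^2 + 3 y2^2
```

The second component of the result records the darker regions of the picture: it is filled in only when the form is definite.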

Monday, February 27, 2023

We finished Section 7.1. Suggested problems 3, 4, 9, 11, 15, 19, 23, 24, 25, 27, 30, 33, 35.
There are several important theorems in Section 7.1. Their proofs are presented in this item.
Theorem. All eigenvalues of a symmetric matrix are real.

Proof. We will prove that the eigenvalues of a symmetric $2\!\times\!2$ matrix are real. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary symmetric $2\!\times\!2$ matrix. To calculate the eigenvalues of $A$ we solve $\det(A-\lambda I) =0$, that is \[ 0 = \left| \begin{matrix} a - \lambda & b \\ b & d -\lambda \end{matrix} \right| = (a-\lambda)(d-\lambda) - b^2 = \lambda^2 -(a+d)\lambda + ad -b^2. \] Solving for $\lambda$ we get \[ \lambda_{1,2} = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a+d)^2 - 4 (ad - b^2)} \Bigr) = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since $(a-d)^2 + 4 b^2 \geq 0$ both eigenvalues are real. In fact, if $(a-d)^2 + 4 b^2 = 0$, then $b = 0$ and $a=d$, so our matrix is a multiple of the identity matrix. Otherwise, that is if $(a-d)^2 + 4 b^2 \gt 0$, the symmetric matrix has two distinct eigenvalues \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr) \lt \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \]
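The closed formula for $\lambda_1$ and $\lambda_2$ is easy to try out on the $2\!\times\!2$ matrices from the quadratic form examples. The sketch below is only an illustration; the function name `symmetric_2x2_eigenvalues` is chosen here.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues (smaller first) of the symmetric matrix [[a, b], [b, d]]."""
    # (a - d)^2 + 4 b^2 is never negative, so the square root is real.
    root = math.sqrt((a - d) ** 2 + 4 * b ** 2)
    return ((a + d - root) / 2, (a + d + root) / 2)

print(symmetric_2x2_eigenvalues(6, -2, 3))  # matrix [[6, -2], [-2, 3]]
print(symmetric_2x2_eigenvalues(1, 3, 1))   # matrix [[1, 3], [3, 1]]
```

For these two matrices the square roots are $\sqrt{25}$ and $\sqrt{36}$, so the formula returns the eigenvalues $2, 7$ and $-2, 4$ respectively.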

Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.

Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix. Let $\lambda$ and $\mu$ be eigenvalues of $A$ and let $\mathbf{u}$ and $\mathbf{v}$ be corresponding eigenvectors. Then $\mathbf{u} \neq \mathbf{0},$ $\mathbf{v} \neq \mathbf{0}$ and \[ A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. \] Assume that \[ \lambda \neq \mu. \] Next we calculate the same dot product in two different ways; here we use the fact that $A^\top = A$ and the algebra of the dot product. The first calculation: \[ (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}). \] The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}). \end{align*} Since \[ (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}) \] we conclude that \[ \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. \] Therefore \[ ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. \] Since we assume that $ \lambda - \mu \neq 0,$ the previous displayed equality yields \[ \mathbf{u}\cdot\mathbf{v} = 0. \] This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal.

Theorem. A symmetric $2\!\times\!2$ matrix is orthogonally diagonalizable.

Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. We need to prove that there exists an orthogonal $2\!\times\!2$ matrix $U$ and a diagonal $2\!\times\!2$ matrix $D$ such that $A = UDU^\top.$ The eigenvalues of $A$ are \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since clearly \[ (a-d)^2 + 4 b^2 \geq 0, \] the eigenvalues $\lambda_1$ and $\lambda_2$ are real numbers.

If $\lambda_1 = \lambda_2$, then $(a-d)^2 + 4 b^2 = 0$, and consequently $b= 0$ and $a=d$; that is $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$. Hence $A = UDU^\top$ holds with $U=I_2$ and $D = A$.

Now assume that $\lambda_1 \neq \lambda_2$. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$ and let $\mathbf{u}_2$ be a unit eigenvector corresponding to $\lambda_2$. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since $A$ is symmetric, $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal, that is the matrix $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}$ is orthogonal. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are eigenvectors of $A$ we have \[ AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. \] Therefore $A=UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.

Second Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. If $b=0$, then an orthogonal diagonalization is \[ \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \]

Assume that $b\neq0.$ For the given $a,b,d \in \mathbb{R},$ introduce three new coordinates $z \in \mathbb{R},$ $r \in (0,+\infty),$ and $\theta \in (0,\pi)$ such that \begin{align*} z & = \frac{a+d}{2}, \\ r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\ \cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}. \end{align*} The reader will notice that these coordinates are very similar to the cylindrical coordinates in $\mathbb{R}^3.$
It is now an exercise in matrix multiplication and trigonometry to calculate \begin{align*} & \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} (z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2 & (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\ (z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) & (z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} z + r \cos(2\theta) & r \sin(2\theta) \\ r \sin(2\theta) & z - r \cos(2\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \frac{a+d}{2} + \frac{a-d}{2} & b \\ b & \frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \end{align*}
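The change of coordinates used in the second proof can also be tried on concrete numbers. The sketch below is an illustration only: the function names `rotation_diagonalization` and `rebuild` are hypothetical, and using `math.atan2` to recover $2\theta$ is my own choice for the sketch, not part of the proof.

```python
import math

def rotation_diagonalization(a, b, d):
    """Return (theta, z + r, z - r) for [[a, b], [b, d]] with b != 0."""
    z = (a + d) / 2
    r = math.hypot((a - d) / 2, b)
    # atan2 returns the angle 2*theta in (-pi, pi] with
    # cos(2 theta) = ((a - d)/2)/r and sin(2 theta) = b/r.
    two_theta = math.atan2(b, (a - d) / 2)
    if two_theta < 0:
        two_theta += 2 * math.pi   # now 2*theta is in (0, 2 pi), so theta is in (0, pi)
    return two_theta / 2, z + r, z - r

def rebuild(theta, lam_plus, lam_minus):
    """Entries of R(theta) diag(lam_plus, lam_minus) R(theta)^T."""
    c, s = math.cos(theta), math.sin(theta)
    off_diagonal = (lam_plus - lam_minus) * c * s
    return [
        [lam_plus * c * c + lam_minus * s * s, off_diagonal],
        [off_diagonal, lam_plus * s * s + lam_minus * c * c],
    ]

# Example: a = 3, b = 4, d = -3 gives z = 0 and r = 5.
theta, lam_plus, lam_minus = rotation_diagonalization(3, 4, -3)
print(lam_plus, lam_minus)                 # 5.0 -5.0
print(rebuild(theta, lam_plus, lam_minus))
```

Up to rounding, the rebuilt matrix has the entries $3,$ $4,$ $4,$ $-3,$ that is, the matrix we started with.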

Theorem. For every positive integer $n$, a symmetric $n\!\times\!n$ matrix is orthogonally diagonalizable.

Proof. (You can skip this proof.) This statement can be proved by Mathematical Induction. The base case $n = 1$ is trivial. The case $n=2$ is proved above. To get a feel for how mathematical induction proceeds we will prove the theorem for $n=3.$

Let $A$ be a $3\!\times\!3$ symmetric matrix. Then $A$ has an eigenvalue, which must be real. Denote this eigenvalue by $\lambda_1$ and let $\mathbf{u}_1$ be a corresponding unit eigenvector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be unit vectors such that the vectors $\mathbf{u}_1,$ $\mathbf{v}_1$ and $\mathbf{v}_2$ form an orthonormal basis for $\mathbb R^3.$ Then the matrix $V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr]$ is an orthogonal matrix and we have \[ V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix}. \] Since $A = A^\top$, $A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ and since $\mathbf{u}_1$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$ we have \[ \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, \] and \[ \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. \] Hence, \[ \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix}. \] By the already proved theorem for $2\!\times\!2$ symmetric matrices there exists an orthogonal matrix $\begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}$ and a diagonal matrix $\begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix}$ such that \[ \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. \] Substituting this equality in (**) and using some matrix algebra we get \[ V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top. \] Setting \[ U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \] we have that $U$ is an orthogonal matrix, $D$ is a diagonal matrix and $A = UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.

Today in class you noticed that the orthogonal matrix \[ U = \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] is also symmetric.
- The fact that this matrix is symmetric implies that its eigenvalues are real. The fact that this matrix is orthogonal implies that its transpose is its inverse. Since this matrix equals its transpose, we deduce that this matrix is its own inverse. That is $U^2 = I_3$. This implies that the eigenvalues $\lambda$ of $U$ satisfy $\lambda^2 = 1$. Thus, the only possible eigenvalues are $1$ and $-1$. Let us calculate the orthogonal diagonalization of this matrix.
- Since in the preceding item we concluded that the only possible eigenvalues of $U$ are $1$ and $-1$ we can just calculate the corresponding eigenvectors by row reducing the matrices $U- I_3$ and $U+I_3$.
- First $U- I_3$: \[ U - I_3 = \left[ \begin{array}{ccc} -\frac{5}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{5}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & -\frac{2}{3} \end{array} \right] \sim \cdots \sim \left[ \begin{array}{ccc} 1 & 0 & -\frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 0 \end{array} \right] \] Thus an eigenvector corresponding to the eigenvalue $\lambda = 1$ is $\displaystyle \left[ \begin{array}{c} 1 \\ 1 \\ 2 \end{array} \right]$.
- Now row reduce $U + I_3$: \[ U + I_3 = \left[ \begin{array}{ccc} \frac{1}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{4}{3} \end{array} \right] \sim \cdots \sim \left[ \begin{array}{ccc} 1 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right] \] Thus two orthogonal eigenvectors corresponding to the eigenvalue $\lambda = -1$ are $\displaystyle \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right]$ and $\displaystyle \left[ \begin{array}{c} 1 \\ 1 \\ -1 \end{array} \right]$.
- We conclude that the matrix $U$ leaves the vectors collinear with $\displaystyle \left[ \begin{array}{c} 1 \\ 1 \\ 2 \end{array} \right]$ unchanged and maps the vectors orthogonal to this vector into their opposites. This is exactly what we had in Problem 4 on Assignment 2. In fact, the matrix $U$ is the matrix of the reflection across the line determined by the vector $\displaystyle \left[ \begin{array}{c} 1 \\ 1 \\ 2 \end{array} \right]$.
- Thus, the orthogonal diagonalization of the matrix $U$ is \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \left[ \begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{array} \right] \left[ \begin{array}{ccc} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}& \frac{2}{\sqrt{6}} \end{array} \right] \] Or briefly $U = PDP^\top$, with $PP^\top = I_3$.
- Since this matrix has only eigenvalues $-1$ and $1$, we can write \[ \left[ \begin{array}{ccc} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{array} \right] = - I_3 + 2 \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right]. \]
- Now we calculate \[ U = PDP^\top = P\left( - I_3 + 2 \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right] \right) P^\top = -I_3 + 2 P \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right]P^\top. \] Here we used the fact that $P I_3 P^\top = I_3$.
- Further we calculate that \begin{align*} P \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right]P^\top & = \left[ \begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{array} \right] \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}& \frac{2}{\sqrt{6}} \end{array} \right] \\ &= \left[ \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{array} \right] \left[ \begin{array}{ccc} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}& \frac{2}{\sqrt{6}} \end{array} \right] \\ & = \left[ \begin{array}{ccc} \frac{1}{6} & \frac{1}{6} & \frac{1}{3} \\ \frac{1}{6} & \frac{1}{6} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{2}{3} \\ \end{array} \right] \end{align*}
- Thus, we have \[ U = - I_3 + 2 \mathbf{u} \mathbf{u}^\top \quad \text{where} \quad \mathbf{u} = \left[ \begin{array}{c} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{array} \right] \]
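- What we established in the preceding item is in some sense universal. For an arbitrary $3\times 3$ matrix $U$ which is both orthogonal and symmetric, and which is different from $I_3$ and $-I_3,$ there exists a unit vector $\mathbf{u} \in \mathbb{R}^3$ such that \[ U = - I_3 + 2 \mathbf{u} \mathbf{u}^\top \qquad \text{or} \qquad U = I_3 - 2 \mathbf{u} \mathbf{u}^\top. \] The first formula covers the case when the eigenspace corresponding to $1$ is one-dimensional, the second the case when the eigenspace corresponding to $-1$ is one-dimensional.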
- This is an excellent introduction to one problem that will appear on the final assignment.
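The identities $U^2 = I_3$ and $U = -I_3 + 2\mathbf{u}\mathbf{u}^\top$ for this particular matrix can be confirmed in exact rational arithmetic. The following is a small sketch in Python using the standard `fractions` module; the helper `matmul` and the variable names are chosen only for this illustration. Since $\mathbf{u} = \mathbf{w}/\|\mathbf{w}\|$ with $\mathbf{w} = [1 \ 1 \ 2]^\top$, the sketch uses $\mathbf{u}\mathbf{u}^\top = \mathbf{w}\mathbf{w}^\top/(\mathbf{w}\cdot\mathbf{w})$ to avoid square roots.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]
U = [[F(-2, 3), F(1, 3), F(2, 3)],
     [F(1, 3), F(-2, 3), F(2, 3)],
     [F(2, 3), F(2, 3), F(1, 3)]]

# w spans the eigenspace for the eigenvalue 1; u u^T = w w^T / (w . w).
w = [F(1), F(1), F(2)]
w_dot_w = sum(x * x for x in w)
uuT = [[w[i] * w[j] / w_dot_w for j in range(3)] for i in range(3)]

# The matrix -I_3 + 2 u u^T.
reflection = [[-I3[i][j] + 2 * uuT[i][j] for j in range(3)] for i in range(3)]

print(matmul(U, U) == I3)   # True
print(reflection == U)      # True
```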

Saturday, February 25, 2023

We started Section 7.1 Friday. Suggested problems are 3, 4, 9, 11, 15, 19, 23, 24, 25, 27, 30, 33, 35.
Let us find a spectral decomposition of the matrix \[ A = \left[\! \begin{array}{ccc} 3 & 4 & 2 \\ 4 & 3 & 2 \\ 2 & 2 & 0 \end{array} \!\right]. \]
- To find the characteristic polynomial of this matrix we calculate \begin{align*} \left| \begin{array}{ccc} 3 - \lambda & 4 & 2 \\ 4 & 3 - \lambda & 2 \\ 2 & 2 & -\lambda \end{array} \right| & = \left| \begin{array}{ccc} 3 - \lambda & 4 & 2 \\ 0 & -1 - \lambda & 2 + 2 \lambda \\ 2 & 2 & -\lambda \end{array} \right| \\ & = (1+\lambda) \left| \begin{array}{ccc} 3 - \lambda & 4 & 2 \\ 0 & -1 & 2 \\ 2 & 2 & -\lambda \end{array} \right| \\ & = (1+\lambda) \left| \begin{array}{ccc} 3 - \lambda & 0 & 10 \\ 0 & -1 & 2 \\ 2 & 0 & 4-\lambda \end{array} \right| \\ & = (-1) (1+\lambda) \left| \begin{array}{cc} 3 - \lambda & 10 \\ 2 & 4-\lambda \end{array} \right| \\ & = (-1) (1+\lambda) \bigl(\lambda^2 - 7 \lambda - 8 \bigr)\\ & = (-1) (\lambda+1)^2 (\lambda - 8)\\ & = -\lambda^3 + 6 \lambda^2 + 15 \lambda + 8. \end{align*} Thus the roots of the characteristic polynomial, that is the eigenvalues of $A$, are $-1$ and $8.$ The algebraic multiplicity of $-1$ is $2$ and the algebraic multiplicity of $8$ is $1.$ Since $A$ is symmetric, we expect that the eigenspace corresponding to $-1$ will be two-dimensional and the eigenspace corresponding to $8$ will be one-dimensional.
- Next we find the corresponding eigenspaces: \begin{align*} \operatorname{Nul}(A-(-1)I_3) & = \operatorname{Span} \left\{ \left[ \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \right], \left[ \begin{array}{r} -1 \\ 0 \\ 2 \end{array} \right] \right\}, \\ \operatorname{Nul}(A-8I_3) & = \operatorname{Span} \left\{ \left[ \begin{array}{r} 2 \\ 2 \\ 1 \end{array} \right] \right\} \end{align*}
- To find an orthogonal diagonalization of $A$ we need unit eigenvectors which are orthogonal to each other. The eigenvector $\bigl[ 2 \ 2 \ 1 \bigr]^\top$ is "nice" since its length is the integer $3$ (that is, it does not involve a square root). However, neither of the eigenvectors in the basis of the eigenspace corresponding to $-1$ has this property. But, if we add the vectors of the basis of the eigenspace corresponding to $-1$ we get $\bigl[ -2 \ 1 \ 2 \bigr]^\top$ and the length of this vector is also $3$. Now we need to find an eigenvector corresponding to $-1$ orthogonal to $\bigl[ -2 \ 1 \ 2 \bigr]^\top.$ To do that we use $\bigl[ -1 \ 1 \ 0 \bigr]^\top$ and apply the Gram-Schmidt orthogonalization to two vectors. \begin{align*} \mathbf{v}_1 & = \left[ \begin{array}{r} -2 \\ 1 \\ 2 \end{array} \right], \\ \mathbf{v}_2 & = \left[ \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \right] - \frac{3}{9} \left[ \begin{array}{r} -2 \\ 1 \\ 2 \end{array} \right] = \left[ \begin{array}{r} -1/3 \\ 2/3 \\ -2/3 \end{array} \right] \quad \text{take the opposite vector, fewer - signs!} \end{align*} Thus, three orthogonal unit eigenvectors of $A$ are \[ \frac{1}{3}\left[ \begin{array}{r} -2 \\ 1 \\ 2 \end{array} \right], \quad \frac{1}{3}\left[ \begin{array}{r} 1 \\ -2 \\ 2 \end{array} \right], \quad \frac{1}{3}\left[ \begin{array}{r} 2 \\ 2 \\ 1 \end{array} \right]. \]
- Thus, the orthogonal diagonalization of $A$ is \[ \left[ \begin{array}{ccc} 3 & 4 & 2 \\ 4 & 3 & 2 \\ 2 & 2 & 0 \end{array} \right] = \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \left[ \begin{array}{rrr} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 8 \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right]. \]
- We will talk more about this in class. We will develop an alternative way of writing matrix $A$ as a linear combination of orthogonal projections onto the eigenspaces of $A$.
- The columns of \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] form an orthonormal basis for $\mathbb{R}^3$ which consists of unit eigenvectors of $A.$
- The first two columns \[ \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \] form an orthonormal basis for the eigenspace of $A$ corresponding to $-1.$ The last column \[ \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \] is an orthonormal basis for the eigenspace of $A$ corresponding to $8.$
- The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $-1$ is \[ P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] \]
- The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $8$ is \[ P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right]. \]
- Since the eigenvectors that we used above form a basis for $\mathbb{R}^3$ we have \[ P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3. \]
- The equality in the preceding item means that for every $\mathbf{v} \in \mathbb{R}^3$ we have \[ \mathbf{v} = P_{-1} \mathbf{v} + P_8 \mathbf{v}. \] Since $P_{-1}$ is the orthogonal projection onto the eigenspace of $A$ corresponding to $-1$ we have \[ A P_{-1} \mathbf{v} = (-1) P_{-1} \mathbf{v}. \] Similarly, since $P_{8}$ is the orthogonal projection onto the eigenspace of $A$ corresponding to $8$ we have \[ A P_{8} \mathbf{v} = 8 P_{8} \mathbf{v}. \] Therefore \[ A\mathbf{v} = A P_{-1} \mathbf{v} + A P_8 \mathbf{v} = (-1)P_{-1} \mathbf{v} + 8 P_{8} \mathbf{v} = \bigl((-1)P_{-1} + 8 P_{8}\bigr) \mathbf{v}. \] Since the last equality holds for all $\mathbf{v} \in \mathbb{R}^3$, we proved that \[ A = (-1)P_{-1} + 8 P_{8}, \] or, with matrices \[ \left[ \begin{array}{ccc} 3 & 4 & 2 \\ 4 & 3 & 2 \\ 2 & 2 & 0 \end{array} \right] = (-1) \left[ \begin{array}{ccc} \frac{5}{9} & - \frac{4}{9} & - \frac{2}{9} \\ - \frac{4}{9} & \frac{5}{9} & - \frac{2}{9} \\ - \frac{2}{9} & - \frac{2}{9} & \frac{8}{9} \end{array} \right] + 8 \left[ \begin{array}{ccc} \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{2}{9} & \frac{1}{9} \end{array} \right] . \] Each of the last two equalities is called a spectral decomposition of $A.$ Although it may sound surprising, the orthogonal diagonalization of $A$ is equivalent to the preceding equality. I will talk more about this in class.
- The orthogonal projection matrix $P_{-1}$ onto $\operatorname{Nul}(A-(-1)I_3)$ and the orthogonal projection matrix $P_8$ onto $\operatorname{Nul}(A-8I_3)$ have the following properties: \[ (P_{-1})^2 = P_{-1}, \quad P_{-1}^\top = P_{-1}, \quad I_3 - P_{-1} = P_8, \quad P_{-1} P_8 = 0, \] \[ (P_{8})^2 = P_{8}, \quad P_{8}^\top = P_{8}, \quad I_3 - P_{8} = P_{-1}, \quad P_{8} P_{-1} = 0. \]
- Please enjoy: \[ I_3 = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = \left[ \begin{array}{ccc} \frac{5}{9} & - \frac{4}{9} & - \frac{2}{9} \\ - \frac{4}{9} & \frac{5}{9} & - \frac{2}{9} \\ - \frac{2}{9} & - \frac{2}{9} & \frac{8}{9} \end{array} \right] + \left[ \begin{array}{ccc} \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{2}{9} & \frac{1}{9} \end{array} \right]. \] And then just by scaling the projections with the eigenvalues we get \[ A = \left[ \begin{array}{ccc} 3 & 4 & 2 \\ 4 & 3 & 2 \\ 2 & 2 & 0 \end{array} \right] = (-1) \left[ \begin{array}{ccc} \frac{5}{9} & - \frac{4}{9} & - \frac{2}{9} \\ - \frac{4}{9} & \frac{5}{9} & - \frac{2}{9} \\ - \frac{2}{9} & - \frac{2}{9} & \frac{8}{9} \end{array} \right] + 8 \left[ \begin{array}{ccc} \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{2}{9} & \frac{1}{9} \end{array} \right] . \] We can modify the eigenvalues and define a new matrix: \[ B = \frac{1}{3} \left[ \begin{array}{ccc} 1 & 4 & 2 \\ 4 & 1 & 2 \\ 2 & 2 & -2 \end{array} \right] =(-1) \left[ \begin{array}{ccc} \frac{5}{9} & - \frac{4}{9} & - \frac{2}{9} \\ - \frac{4}{9} & \frac{5}{9} & - \frac{2}{9} \\ - \frac{2}{9} & - \frac{2}{9} & \frac{8}{9} \end{array} \right] + 2 \left[ \begin{array}{ccc} \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{4}{9} & \frac{4}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{2}{9} & \frac{1}{9} \end{array} \right] . \] Pay attention to the changes that I made and you can tell (or guess) without any calculations what is the relationship between the matrices $B$ and $A.$
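The identities for the projections $P_{-1}$ and $P_8$ can be checked without any rounding by working with exact fractions. Below is a minimal Python sketch of such a check; the helper functions `outer`, `add`, `scale`, `matmul` and the variable names are hypothetical and chosen just for this illustration. The projections are built as sums of outer products of the orthonormal eigenvectors found above.

```python
from fractions import Fraction as F

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[3, 4, 2], [4, 3, 2], [2, 2, 0]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [[0, 0, 0] for _ in range(3)]

# Orthonormal eigenvectors: q1, q2 for the eigenvalue -1, q3 for the eigenvalue 8.
q1 = [F(-2, 3), F(1, 3), F(2, 3)]
q2 = [F(1, 3), F(-2, 3), F(2, 3)]
q3 = [F(2, 3), F(2, 3), F(1, 3)]

P_minus1 = add(outer(q1, q1), outer(q2, q2))
P_8 = outer(q3, q3)

print(add(P_minus1, P_8) == I3)                       # True: P_{-1} + P_8 = I_3
print(add(scale(-1, P_minus1), scale(8, P_8)) == A)   # True: spectral decomposition
print(matmul(P_minus1, P_8) == ZERO)                  # True: P_{-1} P_8 = 0
print(matmul(P_minus1, P_minus1) == P_minus1)         # True: (P_{-1})^2 = P_{-1}
```

Changing the scalars $-1$ and $8$ in the second check to other numbers produces other symmetric matrices with the same eigenspaces.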

Thursday, February 23, 2023

Rigorous reasoning in Mathematics is based on Mathematical Logic. I wrote a short file on the basics of mathematical logic: Brief Review of Mathematical Logic.
After reading Brief Review of Mathematical Logic you can read my webpage Mathematical Rigor in the Context of Quadratic Functions.
In Discussions on Canvas, Kiana pointed out that in another class the following book is used: Book of Proof. In particular Chapters 4-6 talk about different types of proofs.

Tuesday, February 21, 2023

We talked about the second part of Section 6.8: Fourier Series. Suggested problems: 5, 6, 7, 8, 9, 11, 12.

The concept of orthogonality is essential in each inner product space. Let $\mathcal{V}$ be an inner product space with the inner product $\langle\,\cdot\,,\cdot\,\rangle.$ For completeness we state the definition of an orthogonal set of vectors in an inner product space.
Definition. A set of vectors $\{ u_1, \ldots, u_m \}$ in the inner product space $\mathcal{V}$ is said to be an orthogonal set of vectors if it has the following two properties:
- For all $j,k \in \{1,\ldots,m\}$ such that $j\neq k$ we have $\langle u_j, u_k \rangle = 0$.
- For all $k \in \{1,\ldots,m\}$ we have $\langle u_k, u_k \rangle \gt 0$. (This in fact means that all the vectors in this set are nonzero vectors.)
Here we state three important properties of an orthogonal set of vectors. Assume that $\{ u_1, \ldots, u_m \}$ is an orthogonal set of vectors in $\mathcal{V}.$
- The set $\{ {u}_1, \ldots, {u}_m \}$ is linearly independent.
- If $v \in \operatorname{Span} \{ u_1, \ldots, u_m \}$, then the solution of the vector equation \[ \require{bbox} \color{red}{\alpha_1} \color{#008000}{{u}_1} + \cdots + \color{red}{\alpha_m} \color{#008000}{{u}_m} = \color{#008000}{v} \] is given by \[ \forall \mkern+1mu k\in \{1,\ldots,m\} \quad \color{red}{\alpha_k} = \frac{\langle \color{#008000}{v}, \color{#008000}{u_k} \rangle}{\langle \color{#008000}{u_k}, \color{#008000}{u_k} \rangle}. \] Notice a beautiful separation of colors. (This property I call "easy solving of a vector equation.") This property is essential for answering question (b) in Problem 5 on Assignment 3.
- Let $v \in \mathcal{V}$ and let \[ \mathcal W = \operatorname{Span} \{ u_1, \ldots, u_m \}. \] Then \[ \operatorname{Proj}_{\mathcal W}({v}) = \left(\frac{\langle v, u_1 \rangle}{\langle u_1, u_1 \rangle} \right) {u}_1 + \left(\frac{\langle v, u_2 \rangle}{\langle u_2, u_2 \rangle} \right) {u}_2 + \cdots + \left( \frac{\langle v, u_m \rangle}{\langle u_m, u_m \rangle} \right) {u}_m \] This property I call "easy orthogonal projection." I derived this formula in class. It is proved in the book. However, in the next item below I will prove that the vector on the right-hand side of the equality above satisfies properties ① and ② in the definition of the orthogonal projection.
In this item I will prove the third property stated in the preceding item. I will prove that the vector ${w}$ defined as \[ {w} = \left(\frac{\langle v, u_1 \rangle}{\langle u_1, u_1 \rangle} \right) {u}_1 + \left(\frac{\langle v, u_2 \rangle}{\langle u_2, u_2 \rangle} \right) {u}_2 + \cdots + \left( \frac{\langle v, u_m \rangle}{\langle u_m, u_m \rangle} \right) {u}_m \] is the orthogonal projection of ${v}$ onto $\mathcal{W},$ that is, ${w} = \operatorname{Proj}_{\mathcal W}({v}).$ For the proof, we need to prove properties ① and ② in the definition of the orthogonal projection. That is, we need to prove: \[ w \in \mathcal{W} \qquad \text{and} \qquad (v - w) \perp \mathcal{W}. \]
- Since $\mathcal W$ is the span of the vectors $u_1, \ldots, u_m$ and ${w}$ is a linear combination of the vectors $u_1, \ldots, u_m$ we have that ${w} \in \mathcal{W}.$ This proves ①.
- To prove ②, that is to prove \[ ( v - w) \perp \mathcal{W} \] we recall the equivalence \[ (v - w) \perp \mathcal{W} \qquad \Leftrightarrow \qquad \forall\mkern+1mu k\in\{1,\ldots,m\} \quad \bigl\langle (v - w), u_k \bigr\rangle = 0. \]
- The proof of the right-hand side in the preceding equivalence follows. Let $k\in\{1,\ldots,m\}$ be arbitrary. We calculate \begin{align*} \bigl\langle (v - w), u_k \bigr\rangle & = \bigl\langle v , u_k \bigr\rangle - \bigl\langle w, u_k \bigr\rangle \\ &= \bigl\langle v , u_k \bigr\rangle - \left\langle \sum_{j=1}^m \frac{\langle v, u_j \rangle}{\langle u_j, u_j \rangle} \, {u}_j , {u}_k \right\rangle \\ & = \bigl\langle v , u_k \bigr\rangle - \sum_{j=1}^m \frac{\langle v, u_j \rangle}{\langle u_j, u_j \rangle} \langle u_j, u_k \rangle \\ & = \bigl\langle v , u_k \bigr\rangle - \bigl\langle v , u_k \bigr\rangle \\ & = 0. \end{align*} Since $k\in\{1,\ldots,m\}$ was arbitrary, this proves that for all $k\in\{1,\ldots,m\}$ we have $\bigl\langle (v - w), u_k \bigr\rangle = 0.$ By the equivalence in the preceding item, property ② is proved.
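To see the "easy orthogonal projection" formula at work, here is a small sketch in Python for $\mathbb{R}^3$ with the dot product as the inner product. The function names `dot` and `project`, and the choice of the example vectors, are assumptions made only for this illustration. The sketch computes $w$ from the formula and then checks property ②, that is, that $v - w$ is orthogonal to each $u_k.$

```python
from fractions import Fraction

def dot(x, y):
    """Dot product, playing the role of the inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def project(v, orthogonal_set, inner=dot):
    """Sum of (<v, u_k>/<u_k, u_k>) u_k over an orthogonal set of vectors."""
    w = [Fraction(0)] * len(v)
    for u in orthogonal_set:
        coeff = Fraction(inner(v, u), inner(u, u))
        w = [wi + coeff * ui for wi, ui in zip(w, u)]
    return w

# An orthogonal set in R^3 (integer entries keep the arithmetic exact).
u1 = [-2, 1, 2]
u2 = [1, -2, 2]
v = [1, 0, 0]

w = project(v, [u1, u2])
residual = [vi - wi for vi, wi in zip(v, w)]

print(w)                                      # entries 5/9, -4/9, -2/9
print(dot(residual, u1), dot(residual, u2))   # 0 0
```

The function `project` assumes that the given set is orthogonal; for a set that is not orthogonal the formula does not give the orthogonal projection.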
Below we will demonstrate how to use the orthogonal projection formula presented above to find Fourier approximations of functions on the interval $[-\pi,\pi].$ I recall the orthogonal projection formula first.
- Let $v \in \mathcal{V},$ let $\{ u_1, \ldots, u_m \}$ be an orthogonal set of vectors, and let \[ \mathcal W = \operatorname{Span} \{ u_1, \ldots, u_m \}. \] Then \[ \operatorname{Proj}_{\mathcal W}({v}) = \left(\frac{\langle v, u_1 \rangle}{\langle u_1, u_1 \rangle} \right) {u}_1 + \left(\frac{\langle v, u_2 \rangle}{\langle u_2, u_2 \rangle} \right) {u}_2 + \cdots + \left( \frac{\langle v, u_m \rangle}{\langle u_m, u_m \rangle} \right) {u}_m \]
To apply the above orthogonal projection formula we need a vector space with an inner product.

We consider the vector space of continuous functions on the interval $[-\pi,\pi].$ The notation for this vector space is $C[-\pi,\pi].$ The inner product in this space is given by \[ \bigl\langle f, g \bigr\rangle = \int_{-\pi}^{\pi} f(t) g(t) dt \qquad \text{where} \quad f, g \in C[-\pi,\pi]. \] When we do not have specific names for the functions that we are considering we will write functions using the variable. For example, we write \[ \bigl\langle t^2, \cos(n t) \bigr\rangle \] for the inner product of the square function and the cosine function of frequency $n.$
The set of functions \[ 1, \cos(t), \sin(t), \cos(2 t), \sin(2 t), \cos(3t), \sin(3t), \ \ldots, \ \cos(n t), \sin(n t), \ \ldots \] is an orthogonal set of functions. To prove this claim we calculate \begin{align*} \bigl\langle 1, \cos(n t) \bigr\rangle &= \int_{-\pi}^{\pi} 1 \, \cos(n t) dt = 0, \\ \bigl\langle 1, \sin(n t) \bigr\rangle &= \int_{-\pi}^{\pi} 1 \, \sin(n t) dt = 0, \\ \bigl\langle \cos(m t), \sin(n t) \bigr\rangle &= \int_{-\pi}^{\pi} \cos(m t) \, \sin(n t) dt = 0, \\ \bigl\langle \cos(m t), \cos(n t) \bigr\rangle &= \int_{-\pi}^{\pi} \cos(m t) \, \cos(n t) dt = 0 \quad \text{whenever} \quad m\neq n, \\ \bigl\langle \sin(m t), \sin(n t) \bigr\rangle &= \int_{-\pi}^{\pi} \sin(m t) \, \sin(n t) dt = 0 \quad \text{whenever} \quad m\neq n, \\ \bigl\langle 1, 1 \bigr\rangle &= \int_{-\pi}^{\pi} 1^2 dt = 2 \pi, \\ \bigl\langle \cos(n t), \cos(n t) \bigr\rangle &= \int_{-\pi}^{\pi} \bigl(\cos(n t) \bigr)^2 dt = \pi, \\ \bigl\langle \sin(n t), \sin(n t) \bigr\rangle &= \int_{-\pi}^{\pi} \bigl(\sin(n t) \bigr)^2 dt = \pi. \end{align*} These integrals are probably done in Math 125. Some of these integrals are not difficult, some require product-to-sum trigonometric identities.
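The orthogonality relations for the trigonometric functions can also be checked numerically. The following Python sketch approximates the inner product on $C[-\pi,\pi]$ by the midpoint rule; the function names `inner`, `cos_k`, `sin_k`, the number of subintervals, and the choice of the midpoint rule are all assumptions made for this illustration. It is a numerical check, not a proof.

```python
import math

def inner(f, g, n=2000):
    """Midpoint-rule approximation of the integral of f(t) g(t) over [-pi, pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi + (i + 0.5) * h   # midpoint of the i-th subinterval
        total += f(t) * g(t)
    return total * h

def one(t):
    return 1.0

def cos_k(k):
    return lambda t: math.cos(k * t)

def sin_k(k):
    return lambda t: math.sin(k * t)

# Distinct functions from the list: the inner products are 0 up to rounding.
print(abs(inner(cos_k(2), cos_k(3))) < 1e-9)   # True
print(abs(inner(cos_k(2), sin_k(2))) < 1e-9)   # True
print(abs(inner(one, sin_k(5))) < 1e-9)        # True

# Inner product of a function with itself: 2 pi for 1, pi for the others.
print(abs(inner(one, one) - 2 * math.pi) < 1e-9)       # True
print(abs(inner(sin_k(3), sin_k(3)) - math.pi) < 1e-9) # True
```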
As you can see, to calculate inner products in the vector space $C[-\pi,\pi]$ we have to calculate many definite integrals. Doing these calculations "by hand" is an interesting, but time-consuming process. It is much more convenient to calculate these integrals using technology. There is another reason why using technology is inevitable in this setting: When a formula for a Fourier approximation is found, the only way to visually check how well it approximates a given function is to plot both the given function and the approximation that is found. My technology tool is Wolfram Mathematica, an amazing piece of software. To get you started with Mathematica I created my Mathematica website.
As an alternative to the integral calculations presented in Math 125, here is one example of Mathematica code that will confirm what you did in your Calculus integration class:
```
Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}]
```
Mathematica responds with an expression equivalent to \[ \frac{2 m \cos(n \pi) \sin(m \pi) - 2 n \cos(m \pi) \sin(n \pi)}{m^2 - n^2}. \] We immediately see that the above formula does not hold for $m=n.$ Next, we exercise our knowledge that for $m, n \in \mathbb{N}$ we have $\sin(m \pi) = 0$ and $\sin(n \pi) = 0$ to verify that \[ \int_{-\pi}^{\pi} \cos(m t) \, \cos(n t) dt = 0 \quad \text{whenever} \quad m\neq n. \] Warning: Mathematica has powerful commands Simplify[] and FullSimplify[] in which we can place assumptions and ask Mathematica to algebraically simplify mathematical expressions. For example,
```
FullSimplify[
    Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}],
                And[n \[Element] Integers, m \[Element] Integers]
                  ]
```
Unfortunately, Mathematica's response to this command is 0. This is clearly wrong when $m$ and $n$ are equal, as shown by evaluating
```
FullSimplify[
    Integrate[Cos[n t]*Cos[n t], {t, -Pi, Pi}],
                And[n \[Element] Integers]
                  ]
```
So, Mathematica is powerful, but one has to exercise critical thinking.
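One way to guard against this pitfall is to evaluate the integral for concrete positive integers instead of symbolic $m$ and $n.$ The following is a minimal sketch; the range of frequencies from 1 to 3 is an arbitrary choice made for illustration.

```
(* Inner products <cos(m t), cos(n t)> for concrete integer frequencies *)
Table[
  Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}],
  {m, 1, 3}, {n, 1, 3}
] // MatrixForm
```

The off-diagonal entries of the resulting matrix are $0$ and the diagonal entries are $\pi,$ in agreement with the integrals listed earlier.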
Next, I will present a calculation of the projection of the square function $t^2$ onto the span \[ \mathcal{W} = \operatorname{Span} \{ 1, \cos(t), \sin(t), \cos(2 t), \sin(2 t), \cos(3t), \sin(3t), \ \ldots, \ \cos(n t), \sin(n t) \}, \] where $n\in\mathbb{N}.$
Recall the projection formula \[ \operatorname{Proj}_{\mathcal W}({v}) = \left(\frac{\langle v, u_1 \rangle}{\langle u_1, u_1 \rangle} \right) {u}_1 + \left(\frac{\langle v, u_2 \rangle}{\langle u_2, u_2 \rangle} \right) {u}_2 + \cdots + \left( \frac{\langle v, u_m \rangle}{\langle u_m, u_m \rangle} \right) {u}_m \] Here the role of the vector $v$ is played by the function $t^2.$ The role of the vectors $u_1,\ldots,u_m$ is played by the functions \[ 1, \cos(t), \sin(t), \cos(2 t), \sin(2 t), \cos(3t), \sin(3t), \ \ldots \] In this setting the coefficients $\displaystyle \frac{\langle v, u_k \rangle}{\langle u_k, u_k \rangle}$ are called the Fourier coefficients.
To calculate the Fourier coefficients for the function $t^2$ we need to calculate the integrals \begin{align*} \bigl\langle t^2, 1 \bigr\rangle & = \int_{-\pi}^{\pi} t^2 dt = \frac{2}{3} \pi^3, \\ \bigl\langle t^2, \sin(k t) \bigr\rangle &= \int_{-\pi}^{\pi} t^2 \sin(k t) dt = 0, \\ \bigl\langle t^2, \cos(k t) \bigr\rangle &= \int_{-\pi}^{\pi} t^2 \cos(k t) dt = (-1)^k\frac{4}{k^2} \pi , \\ \end{align*} where $k \in \mathbb{N}.$ Hence, the Fourier coefficients for $t^2$ are \begin{align*} \frac{\bigl\langle t^2, 1 \bigr\rangle}{\bigl\langle 1, 1 \bigr\rangle} & = \frac{\frac{2}{3} \pi^3}{2\pi} = \frac{1}{3} \pi^2, \\ \frac{\bigl\langle t^2, \sin(k t) \bigr\rangle}{\bigl\langle \sin(k t), \sin(k t) \bigr\rangle} &= \frac{0}{\pi} = 0, \\ \frac{\bigl\langle t^2, \cos(k t) \bigr\rangle}{\bigl\langle \cos(k t), \cos(k t) \bigr\rangle} &= \frac{ (-1)^k\frac{4}{k^2} \pi}{\pi} = (-1)^k \frac{4}{k^2}. \\ \end{align*}
Let $n \in \mathbb{N}$ be a fixed number. The projection (or the best approximation) of the function $t^2$ onto the span of the trigonometric functions with integer frequencies \[ 1, \cos(t), \sin(t), \cos(2 t), \sin(2 t), \ \ldots, \ \cos(nt), \sin(nt) \] is \[ \frac{1}{3} \pi^2 + \sum_{k=1}^n (-1)^k \frac{4}{k^2} \cos(k t). \]
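The cosine Fourier coefficients of $t^2$ can be double-checked in Mathematica. The following is a minimal sketch; the upper limit $5$ for $k$ is an arbitrary choice made for illustration.

```
(* Fourier coefficients <t^2, cos(k t)> / <cos(k t), cos(k t)> for k = 1, ..., 5 *)
Table[
  Integrate[t^2*Cos[k t], {t, -Pi, Pi}]/Integrate[Cos[k t]^2, {t, -Pi, Pi}],
  {k, 1, 5}
]
```

The resulting values are $-4,$ $1,$ $-\frac{4}{9},$ $\frac{1}{4},$ $-\frac{4}{25},$ that is $(-1)^k \frac{4}{k^2}$ for $k = 1, \ldots, 5.$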
The power of the approximation found in the preceding item is illustrated with a plot of the function $t^2$ and the approximation with a specific value of $n.$ For example, with $n=3$ the approximation of $t^2$ is \[ \frac{1}{3}\pi ^2 - 4 \cos (t) + \cos (2 t) - \frac{4}{9} \cos (3 t). \] To illustrate this in Mathematica execute this command
```
nn = 3; Plot[
{t^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]}, {t, -3 Pi, 3 Pi},
  PlotPoints -> {100, 200},
 PlotStyle -> {
      {RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}
          },
 Ticks -> {Range[-2 Pi, 2 Pi, Pi/2], Range[-14, 14, 2]},
 PlotRange -> {{-Pi - 0.1, Pi + 0.1}, {-1, Pi^2 + 0.2}},
 AspectRatio -> 1/GoldenRatio
        ]
```
Increasing the value of nn in the above Mathematica expression gives a better approximation.
Since the trigonometric functions used above are periodic with period $2\pi$ it is common to consider the periodic extension of the square function from the interval $[-\pi,\pi]$ to the entire real line. To plot the periodic extension of functions in Mathematica I use the Mathematica function Mod[t,2 Pi, -Pi]. To plot Mod[t,2 Pi, -Pi] I use the Mathematica code
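To quantify how the approximation improves, one can compute the distance between $t^2$ and its approximation in the norm induced by the inner product on $C[-\pi,\pi].$ The following is a sketch only: the helper names approx and err are made up for this illustration, and the values $1, 3, 10$ of nn are arbitrary.

```
(* The approximation of t^2 with nn cosine terms *)
approx[nn_, t_] := Pi^2/3 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}];

(* Norm of the difference t^2 - approx, computed by numerical integration *)
err[nn_] := Sqrt[NIntegrate[(t^2 - approx[nn, t])^2, {t, -Pi, Pi}]];

Table[{nn, err[nn]}, {nn, {1, 3, 10}}]
```

Since the span grows with nn, the distance from $t^2$ to its projection cannot increase, so the second entries in the output should decrease.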
```
Plot[
{Mod[t, 2 Pi, -Pi]}, {t, -3 Pi, 3 Pi}, PlotStyle -> {{RGBColor[0, 0, 0], Thickness[0.005]}},
       Ticks -> {Range[-5 Pi, 5 Pi, Pi/2], Range[-5 Pi, 5 Pi, Pi/2]},
         PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-Pi - 1, Pi + 1}},
           GridLines -> {Range[-5 Pi, 5 Pi, Pi/4], Range[-5 Pi, 5 Pi, Pi/4]},
 AspectRatio -> Automatic, ImageSize -> 600
      ]
```
with the following output

The periodic extension of the square function is just the square of the preceding Mod[t,2 Pi, -Pi]. The periodic extension and its approximation are illustrated as follows


```
nn = 10;  Plot[
      {Mod[t, 2 Pi, -Pi]^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]},
              {t, -4 Pi, 4 Pi}, PlotPoints -> {100, 200},
  PlotStyle -> {{RGBColor[0, 0, 0.5],
     Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}},
  Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-14, 14, 2]},
  PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-1, Pi^2 + 1}},
  AspectRatio -> Automatic, ImageSize -> 600
        ]
```

Applying the above procedure to the linear function $t$ we get the following approximation with 10 sine functions of integer frequencies: \begin{multline*} 2 \sin(t)-\sin (2 t) + \frac{2}{3} \sin(3 t) - \frac{1}{2} \sin(4 t) + \frac{2}{5} \sin (5 t) \\ - \frac{1}{3} \sin(6 t) + \frac{2}{7} \sin(7 t) - \frac{1}{4} \sin(8 t)+\frac{2}{9} \sin(9 t) - \frac{1}{5} \sin(10 t) \end{multline*}


```
nn = 10;  Plot[
{Mod[t, 2 Pi, -Pi], Sum[-((2 (-1)^k)/k)*Sin[k t], {k, 1, nn}]},
       {t, -4 Pi, 4 Pi}, PlotPoints -> {100, 200},
 PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}},
 Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-4 Pi, 4 Pi, Pi/2]},
 GridLines -> {Range[-4 Pi, 4 Pi, Pi/4], Range[-4 Pi, 4 Pi, Pi/4]},
 PlotRange -> {{-3 Pi - 0.5, 3 Pi + 0.5}, {-Pi - 1, Pi + 1}},
 AspectRatio -> Automatic, ImageSize -> 600
            ]
```
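The sine coefficients in the approximation of the function $t$ can be reproduced in the same way as the coefficients for $t^2.$ The following is a minimal sketch that computes the ratios $\langle t, \sin(k t) \rangle / \langle \sin(k t), \sin(k t) \rangle$ for $k = 1, \ldots, 10.$

```
(* Fourier coefficients <t, sin(k t)> / <sin(k t), sin(k t)> for k = 1, ..., 10 *)
Table[
  Integrate[t*Sin[k t], {t, -Pi, Pi}]/Integrate[Sin[k t]^2, {t, -Pi, Pi}],
  {k, 1, 10}
]
```

The resulting values are $2,$ $-1,$ $\frac{2}{3},$ $-\frac{1}{2},$ $\ldots,$ $-\frac{1}{5},$ that is $(-1)^{k+1} \frac{2}{k}.$ The cosine coefficients and the constant coefficient are all zero since $t$ is an odd function.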

Here is the Mathematica notebook that I used for the above calculations.

Monday, February 20, 2023

On Friday we discussed Section 6.7 Inner Product Spaces. Suggested problems for Section 6.7: 1, 2, 3, 5, 7, 9, 10, 13, 16, 17, 19, 20, 21, 23, 25
In this item, I recall the definition of an abstract inner product. In the definition below $\times$ denotes the Cartesian product between two sets.

Definition. Let $\mathcal{V}$ be a vector space over $\mathbb R.$ A function \[ \langle\,\cdot\,,\cdot\,\rangle : \mathcal{V}\times\mathcal{V} \to \mathbb{R} \] is called an inner product on $\mathcal{V}$ if it satisfies the following four axioms.

IPC. For all $u, v \in \mathcal{V}$ we have $\langle u,v\rangle = \langle v,u\rangle$.

IPA. For all $u, v, w \in \mathcal{V}$ we have $\langle u + v, w \rangle = \langle u,w\rangle + \langle v,w\rangle$.

IPS. For all $u, v \in \mathcal{V}$ and all $\alpha, \beta \in \mathbb{R}$ we have $\langle \alpha u , \beta v \rangle = \alpha \beta \langle u, v \rangle$.

IPP. For all $v \in \mathcal{V}$ we have \[ \langle v , v \rangle \geq 0 \qquad \text{and} \qquad \langle v , v \rangle = 0 \quad \Leftrightarrow \quad v = 0_{\mathcal{V}}. \]

Explanation of the abbreviations: IPC--inner product is commutative, IPA--inner product respects addition, IPS--inner product respects scaling, IPP--inner product is positive definite. The abbreviations are made up by me as cute mnemonic tools.
The preceding definition means that for every two vectors $u \in \mathcal{V}$ and $v \in \mathcal{V}$ there exists a unique real number $\langle u,v\rangle \in \mathbb{R}$ which is called the inner product of $u$ and $v$ and the inner product $\langle u,v\rangle$ has the algebraic properties IPC (inner product is commutative), IPA (inner product respects addition), IPS (inner product respects scaling), IPP (inner product is positive definite).
The most important abstract inner products are inner products given by the Riemann integral in vector spaces of functions. I illustrate this with the inner product \[ \langle p, q \rangle = \int_{-1}^{1} p(t) q(t) dt \] in the vector space of polynomials. We can restrict ourselves to the space $\mathbb{P}_4.$ The standard basis in $\mathbb{P}_4$ is the basis which consists of monomials: \[ p_0(t) = 1, \quad p_1(t) = t, \quad p_2(t) = t^2, \quad p_3(t) = t^3, \quad p_4(t) = t^4. \] Set \[ \mathcal M = \bigl\{ p_0, p_1, p_2, p_3, p_4 \bigr\}. \] $\mathcal M$ is not an orthogonal basis for $\mathbb{P}_4.$ In fact it is useful to calculate \begin{alignat*}{5} \langle p_0, p_0 \rangle & = 2, & \quad \langle p_0, p_1 \rangle & = 0, & \quad \langle p_0, p_2 \rangle & = \frac{2}{3}, & \quad \langle p_0, p_3 \rangle & = 0, & \quad \langle p_0, p_4 \rangle & = \frac{2}{5}, \\ \langle p_1, p_0 \rangle & = 0, & \quad \langle p_1, p_1 \rangle & = \frac{2}{3}, & \quad \langle p_1, p_2 \rangle & = 0, & \quad \langle p_1, p_3 \rangle & = \frac{2}{5}, & \quad \langle p_1, p_4 \rangle & = 0, \\ \langle p_2, p_0 \rangle & = \frac{2}{3}, & \quad \langle p_2, p_1 \rangle & = 0, & \quad \langle p_2, p_2 \rangle & = \frac{2}{5}, & \quad \langle p_2, p_3 \rangle & = 0, & \quad \langle p_2, p_4 \rangle & = \frac{2}{7}, \\ \langle p_3, p_0 \rangle & = 0, & \quad \langle p_3, p_1 \rangle & = \frac{2}{5}, & \quad \langle p_3, p_2 \rangle & = 0, & \quad \langle p_3, p_3 \rangle & = \frac{2}{7}, & \quad \langle p_3, p_4 \rangle & = 0, \\ \langle p_4, p_0 \rangle & = \frac{2}{5}, & \quad \langle p_4, p_1 \rangle & = 0, & \quad \langle p_4, p_2 \rangle & = \frac{2}{7}, & \quad \langle p_4, p_3 \rangle & = 0, & \quad \langle p_4, p_4 \rangle & = \frac{2}{9}. \\ \end{alignat*}
One conclusion from the above table is that monomials of even degree are orthogonal to the monomials of odd degree. We will use this fact in the calculations in the next item.
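The table of inner products of the monomials can be generated in Mathematica. The following is a minimal sketch; the names ip and mono are made up for this illustration.

```
(* Inner product on polynomials: integral of the product over [-1, 1] *)
ip[f_, g_] := Integrate[f*g, {t, -1, 1}];

(* The monomial basis 1, t, t^2, t^3, t^4 *)
mono = {1, t, t^2, t^3, t^4};

(* The 5 x 5 matrix of all inner products <p_i, p_j> *)
Table[ip[mono[[i]], mono[[j]]], {i, 1, 5}, {j, 1, 5}] // MatrixForm
```

The entry in row $i$ and column $j$ (both counted from $0$) is $\frac{2}{i+j+1}$ when $i+j$ is even and $0$ when $i+j$ is odd, which is the checkerboard pattern of zeros behind the conclusion about monomials of even and odd degree.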
We can apply the Gram-Schmidt orthogonalization algorithm to the basis $\mathcal M$ to obtain an orthogonal basis \[ \mathcal A = \bigl\{q_0, q_1, q_2, q_3, q_4 \bigr\} \] for $\mathbb{P}_4:$ \begin{align*} q_0(t) & = 1 \\ q_1(t) & = t \\ q_2(t) & = t^2 - \frac{\langle p_2, q_0 \rangle }{\langle q_0, q_0 \rangle} 1 = t^2 - \frac{1}{3} \\ q_3(t) & = t^3 - \frac{\langle p_3, q_1 \rangle }{\langle q_1, q_1 \rangle} t = t^3 - \frac{3}{5} t \\ q_4(t) & = t^4 - \frac{\langle p_4, q_0 \rangle }{\langle q_0, q_0 \rangle} 1 - \frac{\langle p_4, q_2 \rangle }{\langle q_2, q_2 \rangle} \left(t^2 - \frac{1}{3}\right) = t^4 - \frac{6}{7} t^2 + \frac{3}{35} \\ \end{align*} In the above calculation we used that \[ \langle q_0, q_0 \rangle = 2, \quad \langle q_1, q_1 \rangle =\frac{2}{3}, \quad \langle q_2, q_2 \rangle = \frac{8}{45} \] and \[ \langle p_2, q_0 \rangle = \frac{2}{3}, \quad \langle p_3, q_1 \rangle = \frac{2}{5}, \quad \langle p_4, q_0 \rangle = \frac{2}{5}, \quad \langle p_4, q_2 \rangle = \frac{16}{105}. \]
It is common to normalize the polynomials $q_0, q_1, q_2, q_3, q_4$ so that they have the value $1$ at $t=1.$ First calculate \[ q_0(1) = 1, \quad q_1(1) = 1, \quad q_2(1) = \frac{2}{3}, \quad q_3(1) = \frac{2}{5}, \quad q_4(1) = \frac{8}{35}. \] Dividing each $q_k$ by $q_k(1)$ we obtain the polynomials \begin{alignat*}{2} P_0(t) & = 1 & & \\ P_1(t) & = t & & \\ P_2(t) & = \frac{1}{2} \left( 3 t^2 -1 \right) & & = \frac{3}{2} q_2(t) \\ P_3(t) & = \frac{1}{2} \left( 5 t^3 -3 t \right) & & = \frac{5}{2} q_3(t) \\ P_4(t) & = \frac{1}{8} \left( 35 t^4 - 30 t^2 + 3 \right) & & = \frac{35}{8} q_4(t) \\ \end{alignat*} The polynomials $P_0, P_1, P_2, P_3, P_4$ are the first five of the sequence of orthogonal polynomials called Legendre polynomials.
There are many examples of other sequences of orthogonal polynomials. Legendre polynomials are just one example, which is presented here since the inner product in which they are orthogonal is particularly simple.
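The Gram-Schmidt calculation and the normalization at $t=1$ can be reproduced in Mathematica. The following is a sketch under the assumption that the symbol t has no assigned value; the names ip, mono and qs are made up for this illustration, while LegendreP is the built-in Mathematica function for Legendre polynomials.

```
(* Inner product on polynomials: integral of the product over [-1, 1] *)
ip[f_, g_] := Integrate[f*g, {t, -1, 1}];
mono = {1, t, t^2, t^3, t^4};

(* Gram-Schmidt: subtract from each monomial its projections onto the earlier q's *)
qs = {};
Do[
  AppendTo[qs, Expand[p - Total[Table[ip[p, q]/ip[q, q]*q, {q, qs}]]]],
  {p, mono}
];
qs

(* Rescale each q to have value 1 at t = 1 and compare with the built-in LegendreP *)
Table[
  Simplify[qs[[k + 1]]/(qs[[k + 1]] /. t -> 1) - LegendreP[k, t]] === 0,
  {k, 0, 4}
]
```

The list qs should consist of the polynomials $1,$ $t,$ $t^2 - \frac{1}{3},$ $t^3 - \frac{3}{5} t,$ $t^4 - \frac{6}{7} t^2 + \frac{3}{35},$ and the final table should consist of five True values.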

Sunday, February 19, 2023

I post several pictures related to Problem 4 on Assignment 3.
The picture below illustrates Problem 4(b)(iii) and Problem 4(c):

The intersection of the canonical rotated paraboloid $z=x^2+y^2$ and a plane $z = ax+by+c$ is an ellipse (provided that $a^2+b^2 + 4c \gt 0$). The projection of that ellipse onto the $xy$-plane is a circle.

Notice that the picture below is "upside-down." The positive direction of the $z$-axis is downwards.

How would we determine the intersection of the paraboloid and the plane? Recall, the paraboloid is the set of points $(x,y, x^2 + y^2)$, while the plane is the set of points $(x,y, ax+by+c)$. For a point $(x,y,z)$ to be both, on the paraboloid, and on the plane we must have \[ x^2 + y^2 = a x + b y + c. \] Which points $(x,y)$ in the $xy$-plane satisfy the preceding equation? Rewrite the equation as \[ x^2 - a x + y^2 - b y = c, \] and completing the squares comes to the rescue, so we obtain the following equation: \[ \boxed{\left(x - \frac{a}{2} \right)^2 + \left(y-\frac{b}{2}\right)^2 = \frac{a^2}{4} + \frac{b^2}{4} + c.} \] The boxed equation is the equation of a circle in $xy$-plane centered at the point $(a/2,b/2)$ with the radius $\displaystyle\frac{1}{2}\sqrt{a^2+b^2+4c}$. Above this circle, in the plane $z=ax+by+c$ is the ellipse which is also on the paraboloid $z=x^2+y^2$. The circle whose equation is boxed, we will call the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=ax+by+c$.
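Here is a small worked example with values made up for illustration. Take the plane $z = 2x + 4y + 4,$ that is $a = 2,$ $b = 4,$ $c = 4.$ Then $a^2 + b^2 + 4c = 36 \gt 0$ and the equation $x^2 + y^2 = 2x + 4y + 4$ becomes \[ (x - 1)^2 + (y - 2)^2 = 1 + 4 + 4 = 9. \] So, the circle determined by the paraboloid and this plane is centered at $(1,2)$ and its radius is $\frac{1}{2}\sqrt{36} = 3.$ As a check, the point $(4,2)$ is on this circle, and both the paraboloid and the plane give the same height above it: $4^2 + 2^2 = 20$ and $2\cdot 4 + 4 \cdot 2 + 4 = 20.$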
The picture below illustrates Problem 4(a). The orange points are the points $P_1$, $P_2$, $P_3$, $P_4$. The points $Q_1$, $Q_2$, $Q_3$, $Q_4$ are the corresponding points on the rotated paraboloid. I colored them red, but these red points don't show well on the paraboloid. The gray line segments are $P_1Q_1$, $P_2Q_2$, $P_3Q_3$, $P_4Q_4$. These line segments connect the points $P_1$, $P_2$, $P_3$, $P_4$ in the $xy$-plane and the corresponding points $Q_1$, $Q_2$, $Q_3$, $Q_4$ on the rotated paraboloid.
The picture below illustrates the solution of Problem 4(a). The yellow plane is the least-squares fit plane that best fits the points $Q_1$, $Q_2$, $Q_3$, $Q_4$.
The pictures below illustrate the solution of Problem 4(c). The black circle in $xy$-plane is, in some sense, the best fit circle for the points $P_1$, $P_2$, $P_3$, $P_4$.
Below I will describe in more detail the method of finding the best fit circle to a given set of points. The method is identical to finding the least-squares fit plane to a set of given points. In this case all the given points lie on the canonical rotated paraboloid $z = x^2+y^2.$
- The standard equation for a circle in $\mathbb{R}^2$ centered at the point $(a,b)$ with the radius $r \gt 0$ is \[ (x-a)^2 + (y-b)^2 = r^2. \] Expanding the squares and grouping terms, the preceding equality is equivalent to \[ x^2 + y^2 - (2a) x - (2b) y - (r^2 - a^2 - b^2) = 0. \] Substituting \[ \beta_0 = r^2 - a^2 - b^2, \quad \beta_1 = 2a, \quad \beta_2 = 2b, \] we rewrite the preceding equality as \[ \require{bbox} \bbox[5px, #FFFF00, border: 1pt solid #888800]{\beta_0 + \beta_1 x + \beta_2 y = x^2 + y^2}. \] Although the last equation does not look like an equation of a circle, it is an equation of a circle, provided that \[ \beta_0 + \bigl(\beta_1/2\bigr)^2 + \bigl(\beta_2/2\bigr)^2 \gt 0. \]
- Let $n$ be a positive integer greater than $2.$ Assume that we are given $n$ noncollinear points in $\mathbb{R}^2$: \[ (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n). \]
- We want to find an equation of a circle which fits these points the best. We use the yellow highlighted equation of a circle. The unknown quantities are $\color{red}{\beta_0},$ $\color{red}{\beta_1},$ $\color{red}{\beta_2}.$ The linear equations that we need to solve are \begin{align*} \color{red}{\beta_0}\cdot 1 + \color{red}{\beta_1} x_1 + \color{red}{\beta_2} y_1 &= (x_1)^2 + (y_1)^2 \\ \color{red}{\beta_0}\cdot 1 + \color{red}{\beta_1} x_2 + \color{red}{\beta_2} y_2 &= (x_2)^2 + (y_2)^2 \\ & \ \ \vdots \\ \color{red}{\beta_0}\cdot 1 + \color{red}{\beta_1} x_n + \color{red}{\beta_2} y_n &= (x_n)^2 + (y_n)^2 \end{align*}
- In matrix form the above system is: \[ \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \] This system is identical to the system for finding the least-squares plane that best fits the points \[ \bigl(x_1, y_1, (x_1)^2 + (y_1)^2 \bigr), \ \ \bigl(x_2, y_2, (x_2)^2 + (y_2)^2\bigr), \ \ \ldots, \ \bigl(x_n, y_n, (x_n)^2 + (y_n)^2\bigr). \] These points are all on the canonical rotated paraboloid $z=x^2+y^2,$ as explained at the beginning of today's post.
- The normal equations for the system from the preceding item are \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \]
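Carrying out the matrix multiplications in the normal equations, and writing all sums over $k = 1, \ldots, n,$ the normal equations take the following explicit form: \[ \left[\begin{array}{ccc} n & \sum x_k & \sum y_k \\ \sum x_k & \sum (x_k)^2 & \sum x_k y_k \\ \sum y_k & \sum x_k y_k & \sum (y_k)^2 \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{c} \sum \bigl( (x_k)^2 + (y_k)^2 \bigr) \\ \sum x_k \bigl( (x_k)^2 + (y_k)^2 \bigr) \\ \sum y_k \bigl( (x_k)^2 + (y_k)^2 \bigr) \end{array} \right]. \] Since the given points are noncollinear, the three columns of the $n\!\times\!3$ matrix of the original system are linearly independent, so the $3\!\times\!3$ matrix above is invertible and the normal equations have a unique solution. Once $\color{red}{\beta_0},$ $\color{red}{\beta_1},$ $\color{red}{\beta_2}$ are found, the center of the best fit circle is $\bigl(\beta_1/2, \beta_2/2\bigr)$ and its radius is $\sqrt{\beta_0 + (\beta_1/2)^2 + (\beta_2/2)^2}.$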

In the items below I will present the Mathematica code that I wrote to automate finding the best fit circle. The argument of this function is a given set of points, and the output of this function is the center and the radius of the best fit circle.

Now that we have a linear algebra method for finding the best fit circle to a set of points, I wrote the Mathematica command BestCir[] to automate finding of the best fit circle:


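```
Clear[BestCir, gpts, mX, vY, abc];
BestCir[gpts_] := Module[
  {mX, vY, abc},
  (* design matrix with columns x, y, 1 *)
  mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]];
  (* right-hand side: x^2 + y^2 for each given point *)
  vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts;
  (* solve the normal equations by row reducing the augmented matrix;
     abc = {beta1, beta2, beta0} since the column of ones comes last *)
  abc = Last[
    Transpose[
     RowReduce[
         Transpose[
       Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY]
                        ]
                     ]
                   ]
                 ];
  (* the center and the radius of the best fit circle *)
  {{abc[[1]]/2, abc[[2]]/2}, Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]}
                              ]
```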

You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair of the best circle's center and the best circle's radius.
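For comparison, here is a sketch of an alternative version that uses the built-in Mathematica function LeastSquares instead of row reducing the normal equations. The name BestCirLS is made up for this illustration, and the unknowns are ordered $\beta_0, \beta_1, \beta_2$ as in the yellow highlighted equation. It is tested on four points of the unit circle.

```
(* Hypothetical alternative to BestCir, based on the built-in LeastSquares *)
BestCirLS[gpts_] := Module[
  {mX, vY, b0, b1, b2},
  (* design matrix with columns 1, x, y *)
  mX = {1, #[[1]], #[[2]]} & /@ gpts;
  (* right-hand side: x^2 + y^2 for each given point *)
  vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts;
  (* least-squares solution of mX . {b0, b1, b2} = vY *)
  {b0, b1, b2} = LeastSquares[mX, vY];
  (* the center and the radius of the best fit circle *)
  {{b1/2, b2/2}, Sqrt[b0 + (b1/2)^2 + (b2/2)^2]}
];

(* Four points on the unit circle centered at the origin *)
BestCirLS[{{1, 0}, {0, 1}, {-1, 0}, {0, -1}}]
```

For these four points the system has the exact solution $\beta_0 = 1,$ $\beta_1 = 0,$ $\beta_2 = 0,$ so the output should be the center $(0,0)$ and the radius $1.$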

To get you started with Mathematica I created my Mathematica website. At the end of the Mathematica website you will find a section about Linear Algebra in Mathematica.

Below is an example for the above command. First I name a set of points mypts, then I use the above command to find the best circle's center and radius, which are named cir:


```
mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4,
    3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}};

cir = N[BestCir[mypts]];

Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0],
   Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015],
   Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False,
 PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
```

You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:

In this example, we use our command to obtain the exact circle through three noncollinear points. First I name a set of points mypts, then I use the above command to find the best circle's center and radius, which are named cir:


```
mypts = {{3, 1}, {2, -4}, {-2, 3}};

cir = N[BestCir[mypts]];

Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0],
   Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015],
   Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False,
 PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
```

You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:

In this example, we randomly generate 100 points close to the circle centered at the origin with radius 4. Then we use our command to obtain the best fit circle. We name the set of one hundred points mypts, then we use the above command to find the best circle's center and radius, which are named cir:


```
mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} +
       1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@
      Range[100]));

cir = N[BestCir[mypts]];

Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0],
   Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015],
   Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False,
 PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
```

You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:

Wednesday, February 15, 2023

Suggested problems for Section 6.6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 16
Exercise 4 in Section 6.6 is a simple interesting problem. In this exercise we are given four data points \[ ( 2,3), \ \ (3,2), \ \ (5,1), \ \ (6,0), \] and we are asked to find the least-squares line that best fits the given data points. (We will call this line simply the least-squares line.)
- Notice that these four points form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share the midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \[ \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = 3/2. \] The long sides of this parallelogram are on the parallel lines $y = -2x/3 +4$ and $y = -2x/3 + 13/3.$ It is natural to guess that the least-squares line is the line which is parallel to these two lines and half-way between them. That is the line $y = -2x/3 + 25/6.$ This line is the red line in the picture below. Clearly this line goes through the point $(4,3/2),$ the intersection of the diagonals of the parallelogram.
  
  The only way to verify this guess is to calculate the least-squares line for these four points. We did that by finding the least-squares solution of the equation \[ \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] To get to the corresponding normal equations we multiply both sides by $X^\top$ \[ \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] The corresponding normal equations are \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. \] Since the inverse of the above $2\!\times\!2$ matrix is \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], \] the solution of the normal equations is unique and it is given by \[ \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]. \] Hence, the least-squares line for the given data points is \[ y = -\frac{7}{10}x + \frac{43}{10}. \] This line is the blue line in the picture below. The picture below strongly indicates that the blue line also goes through the point $(4,3/2).$ This is easily confirmed: \[ \frac{3}{2} = -\frac{7}{10}\cdot 4 + \frac{43}{10}. \]
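The calculation for these four data points can be checked in Mathematica. The following is a minimal sketch; the names mX, vY and beta are made up for this illustration.

```
(* Design matrix and observation vector for the points (2,3), (3,2), (5,1), (6,0) *)
mX = {{1, 2}, {1, 3}, {1, 5}, {1, 6}};
vY = {3, 2, 1, 0};

(* Solve the normal equations (X^T X) beta = X^T y *)
beta = LinearSolve[Transpose[mX] . mX, Transpose[mX] . vY]

(* The least-squares line goes through the point of averages (4, 3/2) *)
beta[[1]] + beta[[2]]*Mean[mX[[All, 2]]] == Mean[vY]
```

The first output is the vector with the components $\frac{43}{10}$ and $-\frac{7}{10},$ and the second output is True.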
In the image below the forest green points are the given data points. The red line is the line which I guessed could be the least-squares line. The blue line is the true least-squares line.
It is amazing that what we observed in the preceding example is universal. (I proved this fact in class by using a completely different method.)

Proposition. If the line $y = \beta_0 + \beta_1 x$ is the least-squares line for the data points \[ (x_1,y_1), \ldots, (x_n,y_n), \] then $\overline{y} = \beta_0 + \beta_1 \overline{x}$, where \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \]
The above proposition is Exercise 14 in Section 6.6.
Proof. Let \[ (x_1,y_1), \ldots, (x_n,y_n), \] be given data points and set \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \] Let $y = \beta_0 + \beta_1 x$ be the least-squares line for the given data points. Then the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ satisfies the normal equation \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] Multiplying the second matrix on the left-hand side and the third vector we get \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] The above equality is an equality of vectors with two components. The top components of these vectors are equal: \[ (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. \] Therefore \[ n \beta_0 + \beta_1 (x_1+x_2 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. \] Dividing by $n$ we get \[ \beta_0 + \beta_1 \frac{1}{n} (x_1+x_2 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). \] Hence \[ \overline{y} = \beta_0 + \beta_1 \overline{x}. \] QED.
Do the following problem: Consider the following four data points \[ ( 0, 0, 5), \ \ (3, 0, 6), \ \ (3, 3, 14), \ \ (0, 3, 9). \]
- Find the equation $z = \beta_0 + \beta_1 x +\beta_2 y$ of the least-squares plane that best fits the data points.
- Find the coordinates of the dark green points and the teal points in the picture below.
- Calculate the residual vector and the least-squares error.
- Find the equation of the plane through the data points \[ ( 0, 0, 5), \ \ (3, 0, 6), \ \ (0, 3, 9). \] Show that the least-squares error is larger for this plane than the error for the least-squares plane.
In this image the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the $xy$-plane. The teal points are the corresponding points in the least-squares plane.

Monday, February 13, 2023

We finished Section 6.5 today. Suggested problems for Section 6.5: 1, 3, 6, 7, 9, 13, 16, 17, 19, 20, 21, 22.

Saturday, February 11, 2023

Yesterday in class I presented a theorem which is very important in Section 6.5. In fact, it is important in this context in general. For example, after the thick divider below in this post, I use this theorem to deduce a formula for an orthogonal projection onto the column space of a matrix with linearly independent columns. This important theorem is presented as Exercise 19 in Section 6.5.
Exercise 19 in Section 6.5 is the following theorem:

Theorem. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then \[ \operatorname{Nul}(A) = \operatorname{Nul}\bigl(A^\top\!A\bigr). \]

Proof. The set equality $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$ means \[ \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). \] As with all equivalences, we prove this equivalence in two steps.

Step 1. Assume that $\mathbf{x} \in \operatorname{Nul}(A)$. Then $A\mathbf{x} = \mathbf{0}$. Consequently, \[ (A^\top\!A)\mathbf{x} = A^\top ( \!A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. \] Hence, $(A^\top\!A)\mathbf{x}= \mathbf{0}$, and therefore $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). \]

Step 2. In this step, we prove the converse: \[ \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). \] Assume, $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Then, $(A^\top\!\!A) \mathbf{x} = \mathbf{0}$. Multiplying the last equality by $\mathbf{x}^\top$ we get $\mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0$. Using the associativity of the matrix multiplication we obtain $(\mathbf{x}^\top\!\! A^\top) (A \mathbf{x}) = 0$. Using the algebra of the transpose operation, specifically $\mathbf{x}^\top\!\! A^\top = (A \mathbf{x})^\top$, we get $(A \mathbf{x})^\top\! (A \mathbf{x}) = 0$. Now recall that for every vector $\mathbf{v}$ we have $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$. Thus, we have proved that $\|A\mathbf{x}\|^2 = 0$. Now recall that the only vector whose norm is $0$ is the zero vector, to conclude that $A\mathbf{x} = \mathbf{0}$. This means $\mathbf{x} \in \operatorname{Nul}(A)$. This completes the proof of implication (*). The theorem is proved. □

In Step 2 of the preceding proof, the idea introduced in the sentence which started with the highlighted text, is a truly brilliant idea. It is a pleasure to share these brilliant mathematical vignettes with you.
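The theorem can be illustrated in Mathematica on a small example. The following is a minimal sketch; the $3\!\times\!2$ matrix below, whose second column is twice the first, is made up for this illustration.

```
(* A 3 x 2 matrix with linearly dependent columns *)
A = {{1, 2}, {2, 4}, {3, 6}};

(* A basis for Nul(A) *)
NullSpace[A]

(* A basis for Nul(A^T A); here A^T A = {{14, 28}, {28, 56}} *)
NullSpace[Transpose[A] . A]
```

Both commands should return a basis for the same one-dimensional subspace of $\mathbb{R}^2,$ the subspace spanned by the vector with the components $-2$ and $1.$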

Corollary 1. Let $m$ and $n$ be positive integers and let $A$ be an $n\!\times\!m$ matrix. Then the columns of $A$ are linearly independent if and only if the matrix $A^\top\!A$ is invertible.

Please provide your own proof using what was proved in the above Theorem: $\operatorname{Nul}(A) = \operatorname{Nul}(A^\top\!\! A )$.

Corollary 2. Let $A$ be an $n\!\times\!m$ matrix. Then $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$.

Proof 1. The following equalities we established earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp \end{align*} In the above Theorem we proved the following subspaces are equal \[ \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). \] Equal subspaces have equal orthogonal complements: \[ \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \] Since earlier we proved \[ \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, \] the last three equalities imply \[ \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). \]

Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the existence of an orthogonal projection onto the column space of $A$.) The set equality $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$ means \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] We will prove this equivalence in two steps.

Step 1. Assume that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Then there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\mathbf{x} = (A^\top\!\!A)\mathbf{v}.$ Since by the definition of matrix multiplication we have $(A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v})$, we have $\mathbf{x} = A^\top\!(A\mathbf{v}).$ Consequently, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top). \]

Step 2. Now we prove the converse: \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). \] Assume, $\mathbf{x} \in \operatorname{Col}(A^\top).$ By the definition of the column space of $A^\top$, there exists $\mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{x} = A^\top\!\mathbf{y}.$ Let $\widehat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Col}(A).$ That is $\widehat{\mathbf{y}} \in \operatorname{Col}(A)$ and $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}.$ Since $\widehat{\mathbf{y}} \in \operatorname{Col}(A),$ there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\widehat{\mathbf{y}} = A\mathbf{v}.$ Since $\bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top),$ the relationship $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}$ yields $A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}.$ Consequently, since $\widehat{\mathbf{y}} = A\mathbf{v},$ we deduce $A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}.$ Hence \[ \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. \] This proves that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Thus, the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \] is proved. The corollary is proved. □

Corollary 3. Let $A$ be an $n\!\times\!m$ matrix. The matrices $A$, $A^\top$ and $A^\top\!\! A$ have the same rank.

Corollary 1 in the previous item is stated in Exercises 20 and 21 in Section 6.5. Corollary 2 in the previous item is implicitly stated in Theorem 13 in Section 6.5. Corollary 3 in the previous item is stated in Exercise 22 in Section 6.5. Corollary 3 is a simple consequence of Corollary 2. The hint given in Exercise 22 will result in a different proof of Corollary 3.
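One way to see that Corollary 3 follows from Corollary 2 is the following chain of equalities. It is only a sketch, and it assumes two standard facts: the rank of a matrix is the dimension of its column space, and a matrix and its transpose have the same rank. \[ \operatorname{rank}(A^\top\!\! A) = \dim \operatorname{Col}(A^\top\!\! A ) = \dim \operatorname{Col}(A^\top) = \operatorname{rank}(A^\top) = \operatorname{rank}(A). \] The second equality is Corollary 2.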

In this item I will deduce a formula for $\operatorname{Proj}_{\mathcal W}(\mathbf{y})$ where $\mathbf{y} \in \mathbb{R}^n$ and $\mathcal{W} = \operatorname{Col}(A)$ where $A$ is an $n\!\times\!m$ matrix with linearly independent columns.
- First, analyze what our objective is. Our objective is to find $\color{red}{\mathbf{w}} \in \mathbb{R}^n$ such that
  - ① $\color{red}{\mathbf{w}} \in \operatorname{Col}(\color{green}{A})$
  - and
  - ② $\color{green}{\mathbf{y}} -\color{red}{\mathbf{w}} \in \bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp$
- Next, recall our background knowledge about $\operatorname{Col}(\color{green}{A})$ and $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp.$
  
  Our background knowledge about $\operatorname{Col}(\color{green}{A})$ is: \[ \mathbf{b} \in \operatorname{Col}({A}) \qquad \text{if and only if} \qquad \exists\, \mathbf{x} \in \mathbb{R}^m \ \ \text{such that} \ \ \mathbf{b} = A \mathbf{x}. \]
  
  Our background knowledge about $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp$ is: \[ \bigl(\operatorname{Col}({A})\bigr)^\perp = \operatorname{Nul}\bigl(A^\top\bigr). \]
- Based on our background knowledge we can rewrite our objective as: Find $\color{red}{\mathbf{x}} \in \mathbb{R}^m$ such that
  - ① $\color{red}{\mathbf{w}} = \color{green}{A} \color{red}{\mathbf{x}}$
  - and
  - ② $\color{green}{A}^\top\bigl(\color{green}{\mathbf{y}} - \color{green}{A} \color{red}{\mathbf{x}} \bigr) = \mathbf{0}_m$
- Property ② in the preceding item is truly promising! The equation \[ \color{green}{A}^\top\bigl(\color{green}{\mathbf{y}} - \color{green}{A} \color{red}{\mathbf{x}}\bigr) = \mathbf{0}_m \] is a linear matrix equation in $\color{red}{\mathbf{x}}.$ (The zero vector here is in $\mathbb{R}^m$ since $\color{green}{A}^\top$ is an $m\!\times\!n$ matrix.) It is solvable for $\color{red}{\mathbf{x}}.$ Using the rules of matrix algebra, the preceding equation is equivalent to \begin{align*} \color{green}{A}^\top\color{green}{\mathbf{y}} - \color{green}{A}^\top\color{green}{A} \color{red}{\mathbf{x}} &= \mathbf{0}_m \\ \color{green}{A}^\top\color{green}{\mathbf{y}} &= \color{green}{A}^\top\color{green}{A} \color{red}{\mathbf{x}} \\ \bigl(\color{green}{A}^\top\color{green}{A}\bigr) \color{red}{\mathbf{x}} &= \color{green}{A}^\top\color{green}{\mathbf{y}} \end{align*}
- Now recall that in Corollary 1 above we stated that $A^\top \!A$ is invertible if and only if the columns of $A$ are linearly independent. Since we assume that the columns of $A$ are linearly independent, we have that $A^\top \!A$ is invertible. Hence, the last equation in the preceding item is solvable as \[ \color{red}{\mathbf{x}} = \bigl(\color{green}{A}^\top\color{green}{A}\bigr)^{-1} \color{green}{A}^\top\color{green}{\mathbf{y}}. \]
- By finding $\color{red}{\mathbf{x}}$ we have found $\color{red}{\mathbf{w}}$, that is, we have achieved our objective: \[ \color{red}{\mathbf{w}} = \color{green}{A}\bigl(\color{green}{A}^\top\color{green}{A}\bigr)^{-1} \color{green}{A}^\top\color{green}{\mathbf{y}}. \]
Notice that in Problem 5 on Assignment 2 I am not asking you to repeat the reasoning above. It is true that the reasoning above proves that the vector \[ \color{green}{A}\bigl(\color{green}{A}^\top\color{green}{A}\bigr)^{-1} \color{green}{A}^\top\color{green}{\mathbf{y}} \] is the orthogonal projection of $\color{green}{\mathbf{y}}$ onto $\operatorname{Col}(\color{green}{A}).$ However, just proving that the vector \[ \color{green}{A}\bigl(\color{green}{A}^\top\color{green}{A}\bigr)^{-1} \color{green}{A}^\top\color{green}{\mathbf{y}} \] is the orthogonal projection of $\color{green}{\mathbf{y}}$ onto $\operatorname{Col}(\color{green}{A})$ is much easier: you only need to verify two defining properties of the orthogonal projection.
Notice that in Problem 5 on Assignment 2 I also ask you to apply the formula that we just derived to verify your calculations in Problem 1.
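The formula $\mathbf{w} = A\bigl(A^\top\! A\bigr)^{-1} A^\top \mathbf{y}$ can be sketched in Python as follows. This is a minimal illustration, not part of the assignment: the function names and the sample $A$ and $\mathbf{y}$ are assumptions made for the sketch. Instead of forming the inverse, the code solves the equation $\bigl(A^\top\! A\bigr)\mathbf{x} = A^\top \mathbf{y}$ exactly and then sets $\mathbf{w} = A\mathbf{x}$.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def solve(M, b):
    """Solve M x = b for a square invertible M by Gauss-Jordan elimination."""
    size = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(size):
        # Assumes M is invertible, so a nonzero pivot always exists.
        pivot = next(r for r in range(c, size) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(size):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def project_onto_col(A, y):
    """Orthogonal projection of y onto Col(A); columns of A must be independent."""
    At = transpose(A)
    AtA = matmul(At, A)                                   # A^T A
    Aty = [sum(a * yi for a, yi in zip(row, y)) for row in At]  # A^T y
    x = solve(AtA, Aty)                                   # (A^T A) x = A^T y
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]  # w = A x

A = [[1, 1], [2, 4], [2, 3]]
y = [1, 0, 0]
w = project_onto_col(A, y)
residual = [yi - wi for yi, wi in zip(y, w)]
# Property ②: y - w must be orthogonal to every column of A.
check = [sum(a * r for a, r in zip(col, residual)) for col in transpose(A)]
```

The list `check` contains the dot products of $\mathbf{y} - \mathbf{w}$ with the columns of $A$; all of them being $0$ confirms property ② for this sample.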

Thursday, February 9, 2023

Today we finished Section 6.4: Gram-Schmidt Orthogonalization Algorithm (and $QR$ factorization). I made up this name. It seems that it is an appropriate modern name. But, I think that mathematically the most fitting name would be: Gram-Schmidt Orthogonalization Recursion. It is a recursive formula. The formula for the next vector is given in terms of all the previously calculated vectors. Suggested problems for Section 6.4: 2, 3, 5, 7, 9, 13, 15, 17, 19, 20.
The presentation of the $QR$ factorization in the textbook somewhat obscures the direct connection between the Gram-Schmidt orthogonalization algorithm and the $QR$ factorization. Below I will demonstrate the connection.
Let $m, n \in \mathbb{N}$. Let $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ be linearly independent vectors in $\mathbb{R}^n$.
- The Gram-Schmidt orthogonalization algorithm produces the mutually orthogonal vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ defined as follows: \begin{align*} \mathbf{v}_1 & = \mathbf{a}_1 \\ \mathbf{v}_2 & = \mathbf{a}_2 - \left(\frac{\mathbf{a}_2\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 \\ \mathbf{v}_3 & = \mathbf{a}_3 - \left(\frac{\mathbf{a}_3\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 - \left(\frac{\mathbf{a}_3\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2 \\ & \ \ \vdots \\ \mathbf{v}_m & = \mathbf{a}_m - \left(\frac{\mathbf{a}_m\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \right) \mathbf{v}_1 - \cdots - \left(\frac{\mathbf{a}_m\cdot \mathbf{v}_{m-1}}{\mathbf{v}_{m-1} \cdot \mathbf{v}_{m-1}} \right) \mathbf{v}_{m-1} \\ \end{align*}
- We can rewrite the above vector equations as \begin{align*} \mathbf{a}_1 & = \mathbf{v}_1 \\ \mathbf{a}_2 & = \left(\frac{\mathbf{a}_2\cdot \mathbf{v}_{1}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \right) \mathbf{v}_1 + \mathbf{v}_2 \\ \mathbf{a}_3 & = \left( \frac{\mathbf{a}_3\cdot \mathbf{v}_{1}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}}\right) \mathbf{v}_1 + \left( \frac{\mathbf{a}_3\cdot \mathbf{v}_{2}}{\mathbf{v}_{2} \cdot \mathbf{v}_{2}} \right) \mathbf{v}_2 + \mathbf{v}_3 \\ & \ \ \vdots \\ \mathbf{a}_m & = \left( \frac{\mathbf{a}_m\cdot \mathbf{v}_{1}}{\mathbf{v}_{1} \cdot \mathbf{v}_{1}} \right) \mathbf{v}_1 + \cdots + \left( \frac{\mathbf{a}_m\cdot \mathbf{v}_{m-1}}{\mathbf{v}_{m-1} \cdot \mathbf{v}_{m-1}} \right) \mathbf{v}_{m-1} + \mathbf{v}_m \\ \end{align*}
- Now we normalize the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. That is we introduce the unit vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m$ defined as follows: \[ \mathbf{u}_k = \frac{1}{\|\mathbf{v}_k\|} \mathbf{v}_k \quad \text{for all} \quad k \in \{1,\ldots,m\}. \] We use the fact that $\mathbf{v}_k \cdot \mathbf{v}_k = \|\mathbf{v}_k\|^2$ to rewrite the vectors $\mathbf{a}_1,\dots, \mathbf{a}_m$ in terms of the orthonormal vectors $\mathbf{u}_1,\ldots,\mathbf{u}_m$: \begin{align*} \mathbf{a}_1 & = \|\mathbf{v}_1\| \, \mathbf{u}_1 \\ \mathbf{a}_2 & = \left( \mathbf{a}_2\cdot \mathbf{u}_{1} \right) \mathbf{u}_1 + \|\mathbf{v}_2\| \, \mathbf{u}_2 \\ \mathbf{a}_3 & = \left( \mathbf{a}_3\cdot \mathbf{u}_{1} \right) \mathbf{u}_1 + \left( \mathbf{a}_3\cdot \mathbf{u}_{2} \right) \mathbf{u}_2 + \|\mathbf{v}_3\| \, \mathbf{u}_3 \\ & \ \ \vdots \\ \mathbf{a}_m & = \left( \mathbf{a}_m\cdot \mathbf{u}_{1} \right) \mathbf{u}_1 + \cdots + \left( \mathbf{a}_m\cdot \mathbf{u}_{m-1} \right) \mathbf{u}_{m-1} + \|\mathbf{v}_m\| \, \mathbf{u}_m \end{align*}
- Now set \[ \alpha_{j,k} = \mathbf{a}_k\cdot \mathbf{u}_{j} \quad \text{for} \quad j \in \{1,\ldots,k-1\}, \ \ k \in \{2,\ldots,m\} \] and the above equations can be rewritten as \begin{align*} \mathbf{a}_1 & = \|\mathbf{v}_1\| \, \mathbf{u}_1 \\ \mathbf{a}_2 & = \alpha_{1,2} \, \mathbf{u}_1 + \|\mathbf{v}_2\| \, \mathbf{u}_2 \\ \mathbf{a}_3 & = \alpha_{1,3} \, \mathbf{u}_1 + \alpha_{2,3} \, \mathbf{u}_2 + \|\mathbf{v}_3\| \, \mathbf{u}_3 \\ & \ \ \vdots \\ \mathbf{a}_m & = \alpha_{1,m} \, \mathbf{u}_1 + \cdots + \alpha_{m-1,m} \, \mathbf{u}_{m-1} + \|\mathbf{v}_m\| \, \mathbf{u}_m \\ \end{align*}
- The preceding vector equations can be written in matrix form as \[ \Bigl[\begin{array}{ccccc} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \cdots & \mathbf{a}_m \end{array} \Bigr] = \Bigl[\begin{array}{ccccc} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \cdots & \mathbf{u}_m \end{array} \Bigr] \left[\begin{array}{ccccc} \|\mathbf{v}_1\| & \alpha_{1,2} & \alpha_{1,3} & \cdots & \alpha_{1,m} \\ 0 & \|\mathbf{v}_2\| & \alpha_{2,3} & \cdots & \alpha_{2,m} \\ 0 & 0 & \|\mathbf{v}_3\| & \cdots & \alpha_{3,m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \|\mathbf{v}_m\| \\ \end{array} \right] \]
- The preceding matrix equation is the $QR$ factorization of $A$: \[ A = QR \] with \begin{equation*} A = \left[\begin{array}{ccccc} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \cdots & \mathbf{a}_m \end{array} \right] \quad \text{and} \quad Q = \left[\begin{array}{ccccc} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \cdots & \mathbf{u}_m \end{array} \right] \end{equation*} and the matrix $R$ is an upper triangular matrix with positive terms on the diagonal. Since the vectors $\mathbf{u}_1, \mathbf{u}_2,\ldots, \mathbf{u}_m$ are orthonormal, we have $Q^{\top} Q = I_m$. Therefore the upper triangular $m\!\times\!m$ matrix $R$ can be calculated as \[ R = Q^{\top} A. \]
Next I will state the $QR$ factorization of a matrix with linearly independent columns as a theorem.

Theorem. Every $n\times m$ matrix $A$ with linearly independent columns can be written as a product $A = QR$ where $Q$ is an $n\times m$ matrix whose columns form an orthonormal basis for the column space of $A$ and $R$ is an $m\times m$ upper triangular invertible matrix with positive entries on its diagonal.
The $QR$ factorization of a matrix is just the Gram-Schmidt orthogonalization process for the columns of $A$ written in matrix form. The only difference is that a Gram-Schmidt orthogonalization process produces orthogonal vectors which we have to normalize to obtain the matrix $Q$ with orthonormal columns.

A nice simple example is given by calculating $QR$ factorization of the $3\!\times\!2$ matrix \[ A = \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right]. \]
- Denote the columns of $A$ by $\mathbf{a}_1$ and $\mathbf{a}_2$.
- The Gram-Schmidt orthogonalization of the vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ leads to vectors \[ \mathbf{v}_1 = \left[ \begin{array}{r} 1 \\[2pt] 2 \\[2pt] 2 \end{array}\right], \quad \mathbf{v}_2 = \left[ \begin{array}{r} -2/3 \\[2pt] 2/3 \\[2pt] -1/3 \end{array}\right]. \] These vectors are calculated as \[ \tag{*} \mathbf{v}_1 = \mathbf{a}_1, \quad \mathbf{v}_2 = \mathbf{a}_2 - \left(\frac{\mathbf{a}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \right) \mathbf{v}_1 = \mathbf{a}_2 - \frac{5}{3} \mathbf{v}_1. \]
- Next we normalize the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. The norm of vector $\mathbf{v}_1$ is $3$ and the norm of $\mathbf{v}_2$ is $1$. Hence the following vectors are orthonormal: \[ \mathbf{u}_1 = \frac{1}{3} \left[ \begin{array}{r} 1 \\[2pt] 2 \\[2pt] 2 \end{array}\right], \quad \mathbf{u}_2 = \frac{1}{3} \left[ \begin{array}{r} -2 \\[2pt] 2 \\[2pt] -1 \end{array}\right]. \]
- We can rewrite equalities (*) using the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ as follows \[ \tag{**} \mathbf{a}_1 = 3\, \mathbf{u}_1 = 3\, \mathbf{u}_1 + 0\, \mathbf{u}_2, \quad \mathbf{a}_2 = \frac{5}{3} \, 3\, \mathbf{u}_1 + \mathbf{u}_2 = 5\,\mathbf{u}_1 + \mathbf{u}_2. \] In the matrix form the equalities (**) can be written as \[ \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right] = \frac{1}{3} \left[ \begin{array}{rr} 1 & -2 \\[2pt] 2 & 2 \\[2pt] 2 & -1 \end{array}\right] \left[ \begin{array}{rr} 3 & 5 \\[2pt] 0 & 1 \end{array}\right], \] where \[ Q = \frac{1}{3} \left[ \begin{array}{rr} 1 & -2 \\[2pt] 2 & 2 \\[2pt] 2 & -1 \end{array}\right] \] is a matrix with orthonormal columns and its column space is identical to the column space of the matrix $A$. Here \[ R = \left[ \begin{array}{rr} 3 & 5 \\[2pt] 0 & 1 \end{array}\right]. \]
- Notice that on the diagonal of the matrix $R$ are the norms of the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ which we obtained by the Gram-Schmidt orthogonalization algorithm. Since the matrix $Q$ has orthonormal columns we have $Q^\top Q = I_2$. Therefore the matrix $R$ can be calculated as \[ R = Q^\top A. \] That is \[ \left[ \begin{array}{rr} 3 & 5 \\[2pt] 0 & 1 \end{array}\right] = \frac{1}{3} \left[ \begin{array}{rrr} 1 & 2 & 2 \\[2pt] -2 & 2 & -1 \end{array}\right] \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right]. \] This might be simpler than making adjustments to the coefficients of the Gram-Schmidt orthogonalization algorithm as we did in this simple example. However, it is good to know that $R$ is closely related to those coefficients.
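The connection between the Gram-Schmidt recursion and the $QR$ factorization can be sketched in Python. This is a minimal illustration in floating-point arithmetic; the function name `qr_gram_schmidt` and the representation of a matrix as a list of its columns are assumptions made for the sketch. The entries above the diagonal of $R$ are the coefficients $\alpha_{j,k} = \mathbf{a}_k\cdot \mathbf{u}_{j}$ and the diagonal entries are the norms $\|\mathbf{v}_k\|$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(columns):
    """Return (q_cols, R) for a list of linearly independent column vectors.

    q_cols holds the orthonormal vectors u_1, ..., u_m and R is the
    m x m upper triangular matrix with A = Q R.
    """
    m = len(columns)
    q_cols = []
    R = [[0.0] * m for _ in range(m)]
    for k, a in enumerate(columns):
        v = list(a)
        for j, u in enumerate(q_cols):
            R[j][k] = dot(a, u)                       # alpha_{j,k} = a_k . u_j
            v = [vi - R[j][k] * ui for vi, ui in zip(v, u)]
        R[k][k] = math.sqrt(dot(v, v))                # ||v_k||, positive
        q_cols.append([vi / R[k][k] for vi in v])     # u_k = v_k / ||v_k||
    return q_cols, R

# Columns of the 3 x 2 matrix A from the example above.
a_cols = [[1, 2, 2], [1, 4, 3]]
q_cols, R = qr_gram_schmidt(a_cols)

# Rebuild each column a_k as the combination sum_j R[j][k] * u_j.
rebuilt = [[sum(R[j][k] * q_cols[j][i] for j in range(len(q_cols)))
            for i in range(len(a_cols[0]))]
           for k in range(len(a_cols))]
```

Up to rounding, `rebuilt` agrees with `a_cols`, which is the statement $A = QR$ written column by column.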
In the next example, I will demonstrate a useful simplification strategy when calculating the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and so on. Following the given formulas, calculating the vectors $\mathbf{v}_k$ will frequently involve fractions and make the arithmetic of the subsequent calculations more difficult. Recall, the objective here is to produce an orthogonal set of vectors keeping the running spans equal.

To simplify the arithmetic, at each step of the Gram-Schmidt algorithm, we can replace a vector $\mathbf{v}_k$ by its scaled version $\alpha \mathbf{v}_k$ with a conveniently chosen $\alpha \gt 0$.

In this way we can avoid fractions in vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3.$ In the next item I present an example.
Calculate the $QR$-factorization of the matrix: \[ \left[\! \begin{array}{ccc} 6 & 6 & 1 \\ 3 & 6 & 1 \\ 2 & 1 & 1 \end{array}\! \right]. \]
- We first apply the Gram-Schmidt algorithm to the vectors \[ \left[\! \begin{array}{c} 6 \\ 3 \\ 2 \end{array}\! \right], \quad \left[\! \begin{array}{c} 6 \\ 6 \\ 1 \end{array}\! \right], \quad \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array}\! \right]. \]
- We first calculate \begin{align*} \mathbf{v}_1 & = \left[\! \begin{array}{c} 6 \\ 3 \\ 2 \end{array}\!\right], \\ \mathbf{v}_2 & = \left[\! \begin{array}{c} 6 \\ 6 \\ 1 \end{array}\!\right] - \frac{8}{7} \left[\!\begin{array}{c} 6 \\ 3 \\ 2 \end{array}\!\right] = \left[\!\begin{array}{c} -6/7 \\ 18/7 \\ -9/7 \end{array}\!\right] = \frac{3}{7} \left[\!\begin{array}{r} -2 \\ 6 \\ -3 \end{array}\!\right], \quad \\ & \phantom{\hspace{5cm}} \text{continue with} \ \mathbf{v}_2 = \left[\!\begin{array}{r} -2 \\ 6 \\ -3 \end{array}\!\right] \\ \mathbf{v}_3 & = \left[\!\begin{array}{c} 1\\ 1\\ 1 \end{array}\!\right] - \frac{11}{49}\left[\! \begin{array}{c} 6 \\ 3 \\ 2 \end{array}\!\right] - \frac{1}{49} \left[\!\begin{array}{r} -2 \\ 6 \\ -3 \end{array}\!\right] = \frac{1}{49} \left[\!\begin{array}{r} 49 - 66 + 2 \\ 49 - 33 - 6 \\ 49 - 22 + 3 \end{array}\!\right] = \frac{5}{49} \left[\!\begin{array}{r} -3 \\ 2 \\ 6 \end{array}\!\right], \\ & \phantom{\hspace{8cm}} \text{continue with} \ \mathbf{v}_3 = \left[\!\begin{array}{r} -3 \\ 2 \\ 6 \end{array}\!\right]. \end{align*}
- Next we normalize the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$: \[ \frac{1}{7} \left[\! \begin{array}{r} 6 \\ 3 \\ 2 \end{array}\! \right], \quad \frac{1}{7} \left[\! \begin{array}{r} -2 \\ 6 \\ -3 \end{array}\! \right], \quad \frac{1}{7} \left[\! \begin{array}{r} -3 \\ 2 \\ 6 \end{array}\! \right]. \] The preceding unit vectors are the columns of $Q$.
- Next we calculate $R = Q^\top A$: \[ \frac{1}{7} \left[\! \begin{array}{rrr} 6 & 3 & 2 \\ -2 & 6 & -3 \\ -3 & 2 & 6 \end{array}\! \right]\left[\! \begin{array}{ccc} 6 & 6 & 1 \\ 3 & 6 & 1 \\ 2 & 1 & 1 \end{array}\! \right] = \left[\! \begin{array}{rrr} 7 & 8 & 11/7 \\ 0 & 3 & 1/7 \\ 0 & 0 & 5/7 \end{array}\! \right] \]
- Thus \[ \left[\! \begin{array}{ccc} 6 & 6 & 1 \\ 3 & 6 & 1 \\ 2 & 1 & 1 \end{array}\! \right] = \frac{1}{7} \left[\! \begin{array}{rrr} 6 & -2 & -3 \\ 3 & 6 & 2 \\ 2 & -3 & 6 \end{array}\! \right] \left[\! \begin{array}{rrr} 7 & 8 & 11/7 \\ 0 & 3 & 1/7 \\ 0 & 0 & 5/7 \end{array}\! \right] \]
- The matrix \[ \frac{1}{7} \left[\! \begin{array}{rrr} 6 & -2 & -3 \\ 3 & 6 & 2 \\ 2 & -3 & 6 \end{array}\! \right] = \left[\! \begin{array}{rrr} 6/7 & -2/7 & -3/7 \\ 3/7 & 6/7 & 2/7 \\ 2/7 & -3/7 & 6/7 \end{array}\! \right] \] is a $3\!\times\!3$ matrix with orthonormal columns. A square matrix with orthonormal columns is called an orthogonal matrix. An orthogonal matrix is special since its inverse equals its transpose. Verify that for the above $Q$: \[ \left(\frac{1}{7} \left[\! \begin{array}{rrr} 6 & 3 & 2 \\ -2 & 6 & -3 \\ -3 & 2 & 6 \end{array}\! \right]\right) \left(\frac{1}{7} \left[\! \begin{array}{rrr} 6 & -2 & -3 \\ 3 & 6 & 2 \\ 2 & -3 & 6 \end{array}\! \right]\right) = \frac{1}{49} \left[\! \begin{array}{rrr} 49 & 0 & 0 \\ 0 & 49 & 0 \\ 0 & 0 & 49 \end{array}\! \right] = I_3. \]
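The scaling strategy used in this example can be sketched in Python with exact rational arithmetic. This is only one possible choice of the positive scalar $\alpha$, made for this sketch: after each Gram-Schmidt step the vector is rescaled so that its entries are integers with no common factor. The function names are hypothetical, and the input vectors are assumed to be linearly independent with integer entries.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lcm2(a, b):
    return a * b // gcd(a, b)

def clear_fractions(v):
    """Scale v by a positive rational so that its entries become
    integers with greatest common divisor 1 (v must be nonzero)."""
    denom = reduce(lcm2, (x.denominator for x in v))
    ints = [int(x * denom) for x in v]
    g = reduce(gcd, ints)
    return [x // g for x in ints]

def scaled_gram_schmidt(vectors):
    """Gram-Schmidt with a positive rescaling after every step."""
    vs = []
    for a in vectors:
        v = [Fraction(x) for x in a]
        for w in vs:
            c = Fraction(dot(a, w), dot(w, w))   # (a . w) / (w . w)
            v = [vi - c * wi for vi, wi in zip(v, w)]
        vs.append(clear_fractions(v))            # "continue with" the scaled v
    return vs

# Columns of the 3 x 3 matrix from the example above.
vs = scaled_gram_schmidt([[6, 3, 2], [6, 6, 1], [1, 1, 1]])
```

For the columns of the matrix in this example, the sketch returns the same integer vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ that were obtained by hand. Since every scaling factor is positive, the normalized vectors, and therefore $Q$ and $R$, are not affected by the rescaling.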
For practice, find $QR$ factorizations of the following matrices \[ \left[ \begin{array}{ccc} -1 & -1 & 3 \\ 1 & 5 & -1 \\ 1 & 1 & 3 \\ -1 & -5 & 7 \end{array} \right] \quad \left[ \begin{array}{ccc} 6 & 8 & 7 \\ 3 & 6 & 0 \\ 2 & 2 & 0 \end{array} \right] \quad \left[ \begin{array}{ccc} 2 & 2 & 1 \\ 1 & 2 & 8 \\ 2 & 3 & 1 \end{array} \right] \quad \left[ \begin{array}{ccc} 4 & -1 & -7 \\ 2 & 8 & 7 \\ 2 & 4 & -8 \\ 1 & 5 & 5 \end{array} \right] \] \[ \left[ \begin{array}{ccc} 2 & -6 & 4 \\ -5 & 9 & 1 \\ 4 & 4 & 9 \\ 2 & -4 & 5 \end{array} \right] \]

Tuesday, February 7, 2023

We finished Section 6.3 today. Suggested problems for Section 6.3 are: 1, 2, 4, 5, 7, 10, 11, 13, 15, 16, 17, 19, 20, 21, 23.
The main question in this section is the following:
Given a subspace $\mathcal W$ of $\mathbb{R}^n$ and a vector $\mathbf{y} \in \mathbb{R}^n$ find a vector $\mathbf{w} \in \mathbb{R}^n$ such that:
- ① $\mathbf{w} \in \mathcal W$
- and
- ② $\mathbf{y} -\mathbf{w}$ is orthogonal to $\mathcal{W}$
In other words, for a given subspace $\mathcal W$ of $\mathbb{R}^n$ and a vector $\mathbf{y} \in \mathbb{R}^n$ we seek $\mathbf{w} \in \mathcal W$ and $\mathbf{z} \in \mathcal W^\perp$ such that \[ \mathbf{y} = \mathbf{w} + \mathbf{z}. \]
An important consequence of the preceding boxed statement is:

For a given subspace $\mathcal W$ of $\mathbb{R}^n$ and a vector $\mathbf{y} \in \mathbb{R}^n$, the properties ① and ② determine the vector $\mathbf{w} \in \mathbb{R}^n$ uniquely.

The proof of the last claim goes as follows.
- Assume that \[ \mathbf{w}_1,\mathbf{w}_2 \in \mathcal W \quad \text{and} \quad \mathbf{z}_1, \mathbf{z}_2 \in \mathcal W^\perp \] and \[ \mathbf{y} = \mathbf{w}_1 + \mathbf{z}_1 \quad \text{and} \quad \mathbf{y} = \mathbf{w}_2 + \mathbf{z}_2. \]
- Subtracting the preceding two equalities we get \[ \mathbf{0} = \mathbf{w}_1 - \mathbf{w}_2 + \mathbf{z}_1 - \mathbf{z}_2. \]
- Since $\mathcal{W}$ and $\mathcal{W}^\perp$ are subspaces we have \[ \mathbf{w}_1 - \mathbf{w}_2 \in \mathcal W \quad \text{and} \quad \mathbf{z}_1 - \mathbf{z}_2 \in \mathcal W^\perp. \] Therefore the vectors $\mathbf{w}_1 - \mathbf{w}_2$ and $\mathbf{z}_1 - \mathbf{z}_2$ are orthogonal.
- The last two items allow us to apply the Pythagorean Theorem to the equality \[ \mathbf{0} = \mathbf{w}_1 - \mathbf{w}_2 + \mathbf{z}_1 - \mathbf{z}_2 \] and conclude \[ 0 = \|\mathbf{0}\|^2 = \|\mathbf{w}_1 - \mathbf{w}_2\|^2 + \|\mathbf{z}_1 - \mathbf{z}_2\|^2. \]
- From the properties of the norm we have \[ \|\mathbf{w}_1 - \mathbf{w}_2\|^2 \geq 0 \qquad \text{and} \qquad \|\mathbf{z}_1 - \mathbf{z}_2\|^2 \geq 0. \]
- Since the sum of two nonnegative numbers equals $0$ if and only if both numbers equal $0$, from the preceding two items we deduce that \[ \|\mathbf{w}_1 - \mathbf{w}_2\|^2 = 0 \qquad \text{and} \qquad \|\mathbf{z}_1 - \mathbf{z}_2\|^2 = 0. \]
- Since the only vector whose norm is $0$ is the zero vector $\mathbf{0},$ we conclude that $\mathbf{w}_1 - \mathbf{w}_2 = \mathbf{0}.$ That is \[ \mathbf{w}_1 = \mathbf{w}_2. \] Thus, $\mathbf{w}$ is uniquely determined by the properties ① and ②.
Definition. Let $\mathcal{W}$ be a subspace of $\mathbb{R}^n$ and let $\mathbf{y} \in \mathbb{R}^n.$ The vector $\mathbf{w} \in \mathbb{R}^n$ is called the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}$ if $\mathbf{w}$ has the following two properties:
- ① $\mathbf{w} \in \mathcal{W},$
- ② $\mathbf{y} - \mathbf{w} \in \mathcal{W}^{\perp}.$
The notation for the orthogonal projection $\mathbf{w}$ of $\mathbf{y} \in \mathbb{R}^n$ onto $\mathcal{W}$ is: \[ \mathbf{w} = \operatorname{Proj}_{\mathcal W}(\mathbf{y}). \] The transformation $\operatorname{Proj}_{\mathcal W}: \mathbb{R}^n \to \mathcal W$ is called the orthogonal projection of $\mathbb{R}^n$ onto $\mathcal{W}$.
The next question is: How do we calculate the orthogonal projection of $\mathbf{y} \in \mathbb{R}^n$ onto a subspace $\mathcal{W}$ of $\mathbb{R}^n?$
The answer to this question depends on the way how subspace $\mathcal{W}$ is defined. We will consider two cases
- $\mathcal W = \operatorname{Span} \{ \mathbf u_1, \ldots, \mathbf u_m \}$ where the set $\{ \mathbf u_1, \ldots, \mathbf u_m \}$ is an orthogonal set of vectors.
- $\mathcal W = \operatorname{Col}(A)$ where $A$ is $n\!\times\!m$ matrix with linearly independent columns.
For completeness we state the definition of an orthogonal set of vectors.
Definition. A set of vectors $\{ \mathbf u_1, \ldots, \mathbf u_m \}$ in $\mathbb{R}^n$ is said to be an orthogonal set of vectors if it has the following two properties:
- For all $j,k \in \{1,\ldots,m\}$ such that $j\neq k$, we have $\mathbf u_j \cdot \mathbf u_k = 0$.
- For all $k \in \{1,\ldots,m\}$ we have $\mathbf u_k \cdot \mathbf u_k \gt 0$. (This in fact means that all the vectors in this set are nonzero vectors.)
Here we state three important properties of an orthogonal set of vectors. Assume that $\{ \mathbf{u}_1, \ldots, \mathbf{u}_m \}$ is an orthogonal set of vectors in $\mathbb{R}^n.$
- The set $\{ \mathbf{u}_1, \ldots, \mathbf{u}_m \}$ is linearly independent.
- If $\mathbf{b} \in \operatorname{Span} \{ \mathbf u_1, \ldots, \mathbf u_m \}$, then the solution of the vector equation \[ \alpha_1 \mathbf{u}_1 + \cdots + \alpha_m \mathbf{u}_m = \mathbf{b} \] is given by \[ \forall \mkern+1mu k\in \{1,\ldots,m\} \quad \alpha_k = \frac{\mathbf{b}\cdot \mathbf{u}_k}{\mathbf{u}_k\cdot \mathbf{u}_k}. \] (This property I call "easy solving of a vector equation.")
- Let $\mathbf{y} \in \mathbb{R}^n$ and let \[ \mathcal W = \operatorname{Span} \{ \mathbf u_1, \ldots, \mathbf u_m \}. \] Then \[ \operatorname{Proj}_{\mathcal W}(\mathbf{y}) = \left(\frac{\mathbf{y}\!\cdot\!\mathbf{u}_1}{\mathbf{u}_1\!\cdot\!\mathbf{u}_1} \right) \mathbf{u}_1 + \left(\frac{\mathbf{y}\!\cdot\!\mathbf{u}_2}{\mathbf{u}_2\!\cdot\!\mathbf{u}_2} \right) \mathbf{u}_2 + \cdots + \left( \frac{\mathbf{y}\!\cdot\!\mathbf{u}_m}{\mathbf{u}_m\!\cdot\!\mathbf{u}_m} \right) \mathbf{u}_m \] This property I call "easy orthogonal projection." I derived this formula in class. It is proved in the book. However, in the next item below I will prove that the vector on the right-hand side of the equality above satisfies properties ① and ② in the definition of the orthogonal projection.
In this item I will prove the third property stated in the preceding item. I will prove that the vector $\mathbf{w}$ defined as \[ \mathbf{w} = \left(\frac{\mathbf{y}\!\cdot\!\mathbf{u}_1}{\mathbf{u}_1\!\cdot\!\mathbf{u}_1} \right) \mathbf{u}_1 + \left(\frac{\mathbf{y}\!\cdot\!\mathbf{u}_2}{\mathbf{u}_2\!\cdot\!\mathbf{u}_2} \right) \mathbf{u}_2 + \cdots + \left( \frac{\mathbf{y}\!\cdot\!\mathbf{u}_m}{\mathbf{u}_m\!\cdot\!\mathbf{u}_m} \right) \mathbf{u}_m \] is the orthogonal projection of $\mathbf{y}$ onto $\mathcal{W}.$ That is, we will prove $\mathbf{w} = \operatorname{Proj}_{\mathcal W}(\mathbf{y}).$ To prove the preceding equality we need to prove properties ① and ② in the definition of the orthogonal projection. That is, we need to prove: \[ \mathbf{w} \in \mathcal{W} \qquad \text{and} \qquad (\mathbf{y} - \mathbf{w}) \perp \mathcal{W}. \]
- Since $\mathcal W$ is the span of the vectors $\mathbf u_1, \ldots, \mathbf u_m$ and $\mathbf{w}$ is a linear combination of the vectors $\mathbf u_1, \ldots, \mathbf u_m$ we have that $\mathbf{w} \in \mathcal{W}.$ This proves ①.
- To prove ②, that is to prove \[ (\mathbf{y} - \mathbf{w}) \perp \mathcal{W} \] we recall the equivalence \[ (\mathbf{y} - \mathbf{w}) \perp \mathcal{W} \qquad \Leftrightarrow \qquad \forall\mkern+1mu k\in\{1,\ldots,m\} \quad (\mathbf{y} - \mathbf{w})\cdot \mathbf{u}_k = 0. \]
- The proof of the right-hand side in the preceding equivalence follows. Let $k\in\{1,\ldots,m\}$ be arbitrary. We calculate \begin{align*} (\mathbf{y} - \mathbf{w})\cdot \mathbf{u}_k & = \mathbf{y}\cdot \mathbf{u}_k - \mathbf{w}\cdot \mathbf{u}_k \\ &= \mathbf{y}\cdot \mathbf{u}_k - \left( \sum_{j=1}^m \frac{\mathbf{y}\!\cdot\!\mathbf{u}_j}{\mathbf{u}_j\!\cdot\!\mathbf{u}_j} \, \mathbf{u}_j \right)\cdot \mathbf{u}_k \\ & = \mathbf{y}\cdot \mathbf{u}_k - \sum_{j=1}^m \frac{\mathbf{y}\!\cdot\!\mathbf{u}_j}{\mathbf{u}_j\!\cdot\!\mathbf{u}_j} (\mathbf{u}_j\cdot \mathbf{u}_k) \\ & = \mathbf{y}\cdot \mathbf{u}_k - \mathbf{y} \cdot \mathbf{u}_k \\ & = 0. \end{align*} Since $k\in\{1,\ldots,m\}$ was arbitrary, this proves that for all $k\in\{1,\ldots,m\}$ we have $(\mathbf{y} - \mathbf{w})\cdot \mathbf{u}_k = 0.$ By the equivalence in the preceding item property ② is proved.
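The "easy orthogonal projection" formula can be sketched in Python with exact arithmetic. The function name `project_onto_span` and the sample vectors are assumptions made for this sketch; the vectors in `us` must form an orthogonal set, which the code does not check.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_span(y, orthogonal_vectors):
    """Sum of (y . u_k)/(u_k . u_k) u_k over an orthogonal set of vectors."""
    w = [Fraction(0)] * len(y)
    for u in orthogonal_vectors:
        c = Fraction(dot(y, u), dot(u, u))       # coefficient of u_k
        w = [wi + c * ui for wi, ui in zip(w, u)]
    return w

# A hypothetical orthogonal set in R^3 and a vector y.
us = [[1, 1, 0], [1, -1, 0]]
y = [2, 3, 5]

w = project_onto_span(y, us)
residual = [yi - wi for yi, wi in zip(y, w)]
# Property ②: (y - w) . u_k = 0 for every k.
residual_dots = [dot(residual, u) for u in us]
```

Here `w` equals $(2,3,0)$ and each entry of `residual_dots` is $0$, which is exactly the calculation $(\mathbf{y} - \mathbf{w})\cdot \mathbf{u}_k = 0$ carried out in the proof for one concrete choice of vectors.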

Talking about orthogonal projections, a nice application of an orthogonal projection is converting a color picture to a black&white picture. To understand how this works, one first needs to understand that colors are in fact vectors in $\mathbb{R}^3,$ not all vectors in $\mathbb{R}^3,$ but only vectors confined to the unit cube \[ [0,1]^3 = \left\{ \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] \, : \, r, g, b \in [0,1] \right\}. \] Here $[0,1]$ denotes the closed unit interval of real numbers defined as \[ [0,1] = \bigl\{x \in \mathbb{R} \, : \, 0 \leq x \ \ \text{and} \ \ x \leq 1 \bigr\}. \] I always use the so-called RGB color model. I like using RGB triplets with entries between $0$ and $1$, including $0$ and $1$.
I love the application of vectors to COLORS so much that I wrote a webpage to celebrate it: Color Cube.
In class, I explained how the orthogonal projection onto the vector $\left[\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right]$ can be used to convert a color image into a black&white image. The formula for this projection is: \[ \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] \mapsto \frac{1}{3} \left[\begin{array}{c} r+g+b \\[-2pt] r+g+b \\[-1pt] r+g+b \end{array}\right] \]
In the animation below, the shades of Gray are on the diagonal of the unit cube joining the corners $(0,0,0)$ (Black) and $(1,1,1)$ (White). For the hundred shades of Gray $(a,a,a)$ with $a \in [0,1],$ I present a polygon of all the colors that are projected to that shade of Gray. The shade of Gray can be identified by the dot in the center of the polygon of colors. I present both a three-dimensional picture in the Color Cube and a two-dimensional orthogonal projection onto the plane which is orthogonal to the vector $\left[\begin{array}{c}1 \\ 1 \\ 1 \end{array}\right].$ Hover the cursor over the image for the animation to start.
What are the practical results of Linear Algebra on colors with a real life picture? I took a small picture of me, imported it into Wolfram Mathematica and did three linear algebra operations on the colors in that picture.
- The first picture from the left is the original picture.
- The second picture from the left is obtained from the original picture by replacing each color vector by its projection onto the line containing all shades of Gray: \[ \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] \mapsto \frac{1}{3} \left[\begin{array}{c} r+g+b \\[-2pt] r+g+b \\[-1pt] r+g+b \end{array}\right]. \]
- The third picture from the left is obtained from the original picture by replacing each color vector by its darkened version: \[ \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] \mapsto \frac{1}{2} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]. \] In fact, a linear algebra definition of a dark color is as follows: A dark version of a color represented by its vector is the color represented by scaling that vector by $1/2.$ All dark shades of that color are obtained by scaling that vector by a scalar between $0$ and $1.$
- The fourth picture from the left is obtained from the original picture by replacing each color vector by its lightened version, that is by the vector which is halfway between the color vector and white: \[ \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] \mapsto \frac{1}{2} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] + \frac{1}{2} \left[\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right]. \] In fact, a linear algebra definition of a light color is as follows: A light version of a color represented by its vector is the color represented by the linear combination of that vector and the white vector with coefficients equal to $1/2.$ All light shades of a color are obtained by the following formula \[ (1-s) \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right] + s \left[\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right] \qquad \text{where} \qquad s \in (0,1). \]
- Justifications for the definitions in the preceding two items are the pictures below:
  - The first picture above from the left is the original picture.
  - In the second picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{3}{4} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right].$
  - In the third picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{1}{2} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right].$
  - In the fourth picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{1}{4} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right].$
  - The first picture above from the left is the original picture.
  - In the second picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{3}{4} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]+\dfrac{1}{4} \left[\begin{array}{c} 1 \\[-2pt] 1 \\[-2pt] 1 \end{array}\right].$
  - In the third picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{1}{2} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]+\dfrac{1}{2} \left[\begin{array}{c} 1 \\[-2pt] 1 \\[-2pt] 1 \end{array}\right].$
  - In the fourth picture, each color vector $\left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]$ is replaced by $\dfrac{1}{4} \left[\begin{array}{c} r \\[-4pt] g \\ b \end{array}\right]+\dfrac{3}{4} \left[\begin{array}{c} 1 \\[-2pt] 1 \\[-2pt] 1 \end{array}\right].$
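
The three color maps above can be sketched in a few lines of Python. This is only a minimal sketch, not the Mathematica code used to produce the pictures: it assumes that each color is an RGB triple with components between $0$ and $1$, and the names `to_gray`, `darken`, `lighten` and the sample color are made up for illustration.

```python
from fractions import Fraction

# The white vector (1, 1, 1); an assumption: RGB components lie in [0, 1].
WHITE = (Fraction(1), Fraction(1), Fraction(1))

def to_gray(color):
    """Orthogonal projection of (r, g, b) onto the line spanned by (1, 1, 1)."""
    mean = sum(color) / 3
    return (mean, mean, mean)

def darken(color, t=Fraction(1, 2)):
    """Scale the color vector by t, where 0 < t < 1 (t = 1/2 in the notes)."""
    return tuple(t * c for c in color)

def lighten(color, s=Fraction(1, 2)):
    """The combination (1 - s) * color + s * white, where 0 < s < 1."""
    return tuple((1 - s) * c + s * w for c, w in zip(color, WHITE))

# A hypothetical sample color, chosen only for illustration.
color = (Fraction(3, 5), Fraction(1, 5), Fraction(2, 5))
gray = to_gray(color)
dark = darken(color)
light = lighten(color)
```

Calling `darken` with $t = 3/4$ or $t = 1/4$, and `lighten` with $s = 1/4$ or $s = 3/4$, corresponds to the other pictures in the two rows described above.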

Today we also started Section 6.4: Gram-Schmidt Orthogonalization Algorithm. I made up this name. It seems that it is an appropriate modern name. But, I think that mathematically the most fitting name would be: Gram-Schmidt Orthogonalization Recursion. It is a recursive formula. The formula for the next vector is given in terms of all the previously calculated vectors. Suggested problems for Section 6.4: 2, 3, 5, 7, 9, 13, 15, 17, 19, 20.
Let $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ be given linearly independent vectors in $\mathbb{R}^n$. At the end of the class, I deduced the formulas for orthogonal vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ which have the same spans as the given vectors. In fact we have \[ \text{for all} \quad k \in \{1,\ldots,m\} \quad \text{we have} \quad \operatorname{Span} \{\mathbf{a}_1, \ldots, \mathbf{a}_k\} = \operatorname{Span} \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}. \] The formulas for mutually orthogonal vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ are as follows: \begin{align*} \mathbf{v}_1 & = \mathbf{a}_1 \\ \mathbf{v}_2 & = \mathbf{a}_2 - \left(\frac{\mathbf{a}_2\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 \\ \mathbf{v}_3 & = \mathbf{a}_3 - \left(\frac{\mathbf{a}_3\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right) \mathbf{v}_1 - \left(\frac{\mathbf{a}_3\cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right) \mathbf{v}_2 \\ & \ \ \vdots \\ \mathbf{v}_m & = \mathbf{a}_m - \left(\frac{\mathbf{a}_m\cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \right) \mathbf{v}_1 - \cdots - \left(\frac{\mathbf{a}_m\cdot \mathbf{v}_{m-1}}{\mathbf{v}_{m-1} \cdot \mathbf{v}_{m-1}} \right) \mathbf{v}_{m-1} \\ \end{align*} The above formulas are called the Gram-Schmidt orthogonalization algorithm. Starting from linearly independent vectors the Gram-Schmidt orthogonalization algorithm produces mutually orthogonal vectors.
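
The recursion above translates almost line by line into code. The following is a minimal Python sketch under stated assumptions: the input vectors are linearly independent, exact rational arithmetic is used to avoid rounding, and the function name `gram_schmidt` and the three sample vectors are hypothetical, not taken from the textbook.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Return mutually orthogonal v_1, ..., v_m built from independent a_1, ..., a_m."""
    orthogonal = []
    for a in vectors:
        v = [Fraction(x) for x in a]
        # Subtract from a_k its component along each previously computed v_j.
        for w in orthogonal:
            coeff = dot(a, w) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        orthogonal.append(v)
    return orthogonal

# Hypothetical linearly independent vectors in R^4, chosen for illustration.
a_vectors = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
v_vectors = gram_schmidt(a_vectors)
```

The coefficient for each subtraction is computed from the original vector $\mathbf{a}_k$, exactly as in the formulas above. No vector is normalized, so the output is orthogonal but not orthonormal.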

Friday, February 3, 2023

We started Section 6.2 today. Suggested problems are: 2, 3, 5, 8, 9, 11, 13, 15, 17, 19, 21, 23, 25, 26, 27, 29.
The most important theorems in Section 6.2 are Theorem 4 and Theorem 5. In fact, Theorem 4 is a consequence of Theorem 5. Please understand what I mean by this claim.
We also covered Theorem 8 from Section 6.3. And this is one of the most important theorems of this section.
In this item I will illustrate Theorem 8 with the column space of the following $4\times 2$ matrix \[ A = \left[\begin{array}{cc} -1 & -1 \\ 1 & -1\\ 1 & 1 \\ -1 & 1 \end{array}\right]. \]
- We encountered these vectors yesterday. We have seen that the columns of the matrix $A$ form an orthogonal set of vectors. To verify this claim again, we calculate \[ A^\top A = \left[ \begin{array}{cccc} -1 & 1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{array} \right] \left[\begin{array}{cc} -1 & -1 \\ 1 & -1\\ 1 & 1 \\ -1 & 1 \end{array}\right] = \left[ \begin{array}{cc} 4 & 0 \\ 0 & 4 \end{array} \right]. \] Denote the columns of $A$ by $\mathbf{u}_1$ and $\mathbf{u}_2$. The above matrix calculation shows that \[ \mathbf{u}_1\cdot \mathbf{u}_1 = 4, \quad \mathbf{u}_1\cdot \mathbf{u}_2 = 0, \quad \mathbf{u}_2\cdot \mathbf{u}_2 = 4. \]
- Given a vector $\mathbf{v}=\left[\!\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}\!\right]$, we want to calculate the orthogonal projection of $\mathbf{v}$ onto the column space of $A$. As we calculated in class, we need the coefficients \[ \mathbf{v}\cdot \mathbf{u}_1 = 0, \quad \mathbf{v}\cdot \mathbf{u}_2 = 4. \] Then the orthogonal projection of $ \mathbf{v}$ onto $\operatorname{Col}(A)$ is calculated as \[ \operatorname{proj}_{\operatorname{Col}(A)}\mathbf{v} = \frac{ \mathbf{v}\cdot \mathbf{u}_1}{ \mathbf{u}_1\cdot \mathbf{u}_1} \mathbf{u}_1 + \frac{ \mathbf{v}\cdot \mathbf{u}_2}{ \mathbf{u}_2\cdot \mathbf{u}_2} \mathbf{u}_2 = \frac{0}{4} \left[\!\begin{array}{c} -1\\ 1 \\ 1 \\ -1 \end{array}\!\right]+\frac{4}{4} \left[\!\begin{array}{c} -1\\ -1 \\ 1 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{c} -1\\ -1 \\ 1 \\ 1 \end{array}\!\right]. \]
- What makes the projection special is the following orthogonality relation: \[ \bigl(\mathbf{v} - \operatorname{proj}_{\operatorname{Col}(A)}\mathbf{v} \bigr) \cdot \operatorname{proj}_{\operatorname{Col}(A)}\mathbf{v} = 0. \] Let us verify this property: \[ \left( \left[\!\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}\!\right] - \left[\!\begin{array}{c} -1\\ -1 \\ 1 \\ 1 \end{array}\!\right] \right) \cdot \left[\!\begin{array}{c} -1\\ -1 \\ 1 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{c} 2\\ 3 \\ 2 \\ 3 \end{array}\!\right] \cdot \left[\!\begin{array}{c} -1\\ -1 \\ 1 \\ 1 \end{array}\!\right] = 0. \]
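
The projection computed in this item can be checked with a short Python sketch. The numbers are the ones from the example; the helper name `project_onto_orthogonal_set` is made up for illustration, and the function assumes that the given spanning vectors are mutually orthogonal and nonzero.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(u, v))

def project_onto_orthogonal_set(v, basis):
    """Orthogonal projection of v onto the span of mutually orthogonal nonzero vectors."""
    proj = [Fraction(0)] * len(v)
    for u in basis:
        coeff = Fraction(dot(v, u), dot(u, u))
        proj = [p + coeff * ui for p, ui in zip(proj, u)]
    return proj

u1 = [-1, 1, 1, -1]   # first column of A
u2 = [-1, -1, 1, 1]   # second column of A
v = [1, 2, 3, 4]

proj = project_onto_orthogonal_set(v, [u1, u2])
residual = [vi - pi for vi, pi in zip(v, proj)]
```

Here `residual` is $\mathbf{v} - \operatorname{proj}_{\operatorname{Col}(A)}\mathbf{v}$. Its dot product with `proj`, and with each column of $A$, is zero.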

Recall from yesterday how important it is to be able to view matrix multiplication as a collection of dot products. Let $k,m$ and $n$ be positive integers, let $A$ be a $k\!\times\!m$ matrix and let $B$ be an $m\!\times\!n$ matrix. Then $A$ has $k$ rows and each row of $A$ is the transpose of a vector in $\mathbb{R}^m$. Similarly, $B$ has $n$ columns and each column of $B$ is a vector in $\mathbb{R}^m$. Now introduce the notation: \[ \mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_k \in \mathbb{R}^m \quad \text{are the transposes of the rows of} \quad A \] \[ \mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n \in \mathbb{R}^m \quad \text{are the columns of} \quad B \] So, I can write the matrices $A$ and $B$ as \[ A = \left[\!\begin{array}{c} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_k^\top \end{array}\!\!\right], \qquad B = \left[\!\begin{array}{cccc} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{array}\!\!\right]. \] Now we calculate the product of $A$ and $B$ as follows: \[ A B = \left[\!\begin{array}{c} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_k^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{array}\!\!\right] = \left[\!\begin{array}{cccc} \mathbf{r}_1\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_n \\ \mathbf{r}_2\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{r}_k\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_n \end{array}\!\!\right] \]
The above way of thinking is particularly important when we work with a matrix that has orthogonal columns. I will show this with an $m\!\times\!4$ matrix $B$ whose columns are orthogonal (as in the above example of $A$ with two columns), and assume that the columns are nonzero. That is \[ B = \bigl[\!\begin{array}{cccc} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \mathbf{u}_4 \end{array}\!\!\bigr]. \] We assume that for all $i,j \in \{1,2,3,4\}$ we have \[ \mathbf{u}_j\!\cdot\!\mathbf{u}_j \neq 0 \quad \text{and} \quad \mathbf{u}_i\!\cdot\!\mathbf{u}_j = 0 \quad \text{whenever} \quad i\neq j. \] In this case the product $B^\top B$ is magical: all of its off-diagonal entries are zero. \begin{align*} B^\top B & = \left[\!\begin{array}{c} \mathbf{u}_1^\top \\ \mathbf{u}_2^\top \\ \mathbf{u}_3^\top \\ \mathbf{u}_4^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \mathbf{u}_4 \end{array}\!\right] \\ &= \left[\!\begin{array}{cccc} \mathbf{u}_1\!\!\cdot\!\mathbf{u}_1 & \mathbf{u}_1\!\!\cdot\!\mathbf{u}_2 & \mathbf{u}_1\!\!\cdot\!\mathbf{u}_3 & \mathbf{u}_1\!\!\cdot\!\mathbf{u}_4 \\ \mathbf{u}_2\!\!\cdot\!\mathbf{u}_1 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_2 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_3 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_4 \\ \mathbf{u}_3\!\!\cdot\!\mathbf{u}_1 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_2 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_3 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_4 \\ \mathbf{u}_4\!\!\cdot\!\mathbf{u}_1 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_2 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_3 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_4 \end{array}\!\!\right] \\ &= \left[\!\begin{array}{cccc} \mathbf{u}_1\!\!\cdot\!\mathbf{u}_1 & 0 & 0 & 0 \\ 0 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_2 & 0 & 0 \\ 0 & 0 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_3 & 0 \\ 0 & 0 & 0 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_4 \end{array}\!\!\right] \end{align*}
Why is the above calculation important? Let us say that we know that $\mathbf{y} \in \operatorname{Col}B$ and we want to solve the equation \[ B \mathbf{x} = \mathbf{y}, \] where $B$ is an $m\!\times\!4$ matrix whose columns are orthogonal and nonzero, that is \[ B = \bigl[\!\begin{array}{cccc} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \mathbf{u}_4 \end{array}\!\!\bigr], \] $\mathbf{x} \in \mathbb{R}^4$ and $\mathbf{y} \in \mathbb{R}^m$. That is, we want to find $x_1,x_2,x_3,x_4 \in \mathbb{R}$ such that \[ x_1 \mathbf{u}_1 + x_2 \mathbf{u}_2 + x_3 \mathbf{u}_3 + x_4 \mathbf{u}_4 = \mathbf{y}. \] The solution is easy. Just multiply both sides of \[ B \mathbf{x} = \mathbf{y} \] by $B^\top$ to get \[ B^\top B \mathbf{x} = B^\top \mathbf{y}. \] Since \[ B^\top B = \left[\!\begin{array}{cccc} \mathbf{u}_1\!\!\cdot\!\mathbf{u}_1 & 0 & 0 & 0 \\ 0 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_2 & 0 & 0 \\ 0 & 0 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_3 & 0 \\ 0 & 0 & 0 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_4 \end{array}\!\!\right] \] and \[ B^\top \mathbf{y} = \left[\!\begin{array}{c} \mathbf{u}_1^\top \\ \mathbf{u}_2^\top \\ \mathbf{u}_3^\top \\ \mathbf{u}_4^\top \end{array}\!\!\right] \mathbf{y} = \left[\!\begin{array}{c} \mathbf{u}_1\!\cdot\!\mathbf{y} \\ \mathbf{u}_2\!\cdot\!\mathbf{y} \\ \mathbf{u}_3\!\cdot\!\mathbf{y} \\ \mathbf{u}_4\!\cdot\!\mathbf{y} \end{array}\!\!\right], \] the equation \[ B^\top B \mathbf{x} = B^\top \mathbf{y} \] becomes \[ \left[\!\begin{array}{cccc} \mathbf{u}_1\!\!\cdot\!\mathbf{u}_1 & 0 & 0 & 0 \\ 0 & \mathbf{u}_2\!\!\cdot\!\mathbf{u}_2 & 0 & 0 \\ 0 & 0 & \mathbf{u}_3\!\!\cdot\!\mathbf{u}_3 & 0 \\ 0 & 0 & 0 & \mathbf{u}_4\!\!\cdot\!\mathbf{u}_4 \end{array}\!\!\right] \left[\!\begin{array}{c} x_1 \\ x_2 \\ x_3 \\ x_4 \end{array}\!\!\right] = \left[\!\begin{array}{c} \mathbf{u}_1\!\cdot\!\mathbf{y} \\ \mathbf{u}_2\!\cdot\!\mathbf{y} \\ \mathbf{u}_3\!\cdot\!\mathbf{y} \\ \mathbf{u}_4\!\cdot\!\mathbf{y} \end{array}\!\!\right]. \] The last equation is super simple to solve: \[ x_1 = \frac{\mathbf{u}_1\!\cdot\!\mathbf{y}}{\mathbf{u}_1\!\!\cdot\!\mathbf{u}_1}, \quad x_2 = \frac{\mathbf{u}_2\!\cdot\!\mathbf{y}}{\mathbf{u}_2\!\!\cdot\!\mathbf{u}_2}, \quad x_3 = \frac{\mathbf{u}_3\!\cdot\!\mathbf{y}}{\mathbf{u}_3\!\!\cdot\!\mathbf{u}_3}, \quad x_4 = \frac{\mathbf{u}_4\!\cdot\!\mathbf{y}}{\mathbf{u}_4\!\!\cdot\!\mathbf{u}_4}. \] In other words, if $\mathbf{y} \in \operatorname{Col}B$, then it must be that \[ \mathbf{y} = \left( \frac{\mathbf{u}_1\!\cdot\!\mathbf{y}}{\mathbf{u}_1\!\!\cdot\!\mathbf{u}_1} \right) \mathbf{u}_1 + \left( \frac{\mathbf{u}_2\!\cdot\!\mathbf{y}}{\mathbf{u}_2\!\!\cdot\!\mathbf{u}_2} \right) \mathbf{u}_2 + \left( \frac{\mathbf{u}_3\!\cdot\!\mathbf{y}}{\mathbf{u}_3\!\!\cdot\!\mathbf{u}_3} \right) \mathbf{u}_3 + \left( \frac{\mathbf{u}_4\!\cdot\!\mathbf{y}}{\mathbf{u}_4\!\!\cdot\!\mathbf{u}_4} \right) \mathbf{u}_4 . \]
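
As an illustration of the last formula, here is a minimal Python sketch. The four orthogonal columns and the vector $\mathbf{y}$ are chosen only for illustration, and the function name `coordinates_in_orthogonal_columns` is hypothetical. The function assumes that the columns are mutually orthogonal and nonzero and that $\mathbf{y}$ lies in their span.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def coordinates_in_orthogonal_columns(columns, y):
    """Solve x_1 u_1 + ... + x_n u_n = y, assuming y is in the span of the orthogonal u_j."""
    return [Fraction(dot(u, y), dot(u, u)) for u in columns]

# Four mutually orthogonal vectors in R^4 (sample data for this sketch).
columns = [[1, 1, 1, 1], [-1, 1, 1, -1], [-1, -1, 1, 1], [-1, 1, -1, 1]]
y = [1, 2, 3, 4]

x = coordinates_in_orthogonal_columns(columns, y)
# Rebuild y as the linear combination x_1 u_1 + x_2 u_2 + x_3 u_3 + x_4 u_4.
rebuilt = [sum(xj * u[i] for xj, u in zip(x, columns)) for i in range(len(y))]
```

No row reduction is needed: each coordinate is a single quotient of dot products. If $\mathbf{y}$ were not in the span of the columns, `rebuilt` would be the orthogonal projection of $\mathbf{y}$ onto that span rather than $\mathbf{y}$ itself.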

Thursday, February 2, 2023

We started Section 6.1 today. Suggested problems: 1, 5, 7, 8, 9-12, 13, 15-18, 20, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32 (do this problem by hand), 33 (do this problem by hand).
Here is a proof of the Law of Cosines and its connection to dot product.
Here is a proof of the classical Pythagorean Theorem.
It is important for you to internalize that we have been working with the dot product all along when multiplying matrices. Let $k,m$ and $n$ be positive integers, let $A$ be a $k\!\times\!m$ matrix and let $B$ be an $m\!\times\!n$ matrix. Then $A$ has $k$ rows and each row of $A$ is the transpose of a vector in $\mathbb{R}^m$. Similarly, $B$ has $n$ columns and each column of $B$ is a vector in $\mathbb{R}^m$. Now introduce the notation: \[ \mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_k \in \mathbb{R}^m \quad \text{are the transposes of the rows of} \quad A \] \[ \mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n \in \mathbb{R}^m \quad \text{are the columns of} \quad B \] So, I can write the matrices $A$ and $B$ as \[ A = \left[\!\begin{array}{c} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_k^\top \end{array}\!\!\right], \qquad B = \left[\!\begin{array}{cccc} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{array}\!\!\right]. \] Now we calculate the product of $A$ and $B$ as follows: \[ A B = \left[\!\begin{array}{c} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_k^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{array}\!\!\right] = \left[\!\begin{array}{cccc} \mathbf{r}_1\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_n \\ \mathbf{r}_2\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{r}_k\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_n \end{array}\!\!\right] \]
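
The formula for $AB$ above can be written as a two-line Python function. This is a minimal sketch with matrices stored as lists of rows; the names `dot` and `matmul` and the two small sample matrices are made up for illustration.

```python
def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """Entry (i, j) of AB is the dot product of row i of A with column j of B."""
    columns_of_B = list(zip(*B))
    return [[dot(row, col) for col in columns_of_B] for row in A]

# Hypothetical sample matrices: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
AB = matmul(A, B)
```

The expression `zip(*B)` lists the columns of `B`, so the nested list comprehension mirrors the array of dot products $\mathbf{r}_i \cdot \mathbf{c}_j$ in the display above.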
The most important theorem in Section 6.1 is Theorem 3. This theorem states: Let $m$ and $n$ be positive integers. Let $A$ be an $m\!\times\!n$ matrix. Then \[ \require{bbox} \bbox[#FFFF00, 8px, border: 3px solid #808000]{ \bigl(\operatorname{Row}(A)\bigr)^\perp = \operatorname{Nul}(A) } \quad \text{and} \quad \bbox[#FFFF00, 8px, border: 3px solid #808000]{ \bigl(\operatorname{Col}(A)\bigr)^\perp = \operatorname{Nul}(A^\top)}. \] And also \[ \bbox[#FFFF00, 8px, border: 3px solid #808000]{ \bigl(\operatorname{Nul}(A)\bigr)^\perp = \operatorname{Row}(A)} \quad \text{and} \quad \bbox[#FFFF00, 8px, border: 3px solid #808000]{\bigl(\operatorname{Nul}(A^\top)\bigr)^\perp = \operatorname{Col}(A)}. \]
The theorem from the previous item is useful for a problem like this: Find the orthogonal complement of the following span: \[ \operatorname{Span}\left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ 1 \\ -1 \end{array} \!\right]\right\}. \]
- We first observe that \[ \operatorname{Span}\left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ 1 \\ -1 \end{array} \!\right]\right\} = \operatorname{Row}\left( \left[\! \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{array} \!\right] \right). \] Therefore, by Theorem 3 in Section 6.1 we deduce \[ \left( \operatorname{Span}\left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ 1 \\ -1 \end{array} \!\right]\right\} \right)^\perp = \operatorname{Nul}\left( \left[\! \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{array} \!\right] \right). \]
- To find the preceding null space Row Reduce to RREF: \[ \left[\! \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{array} \!\right] \ \ \sim \quad \cdots \quad \sim \ \ \left[\! \begin{array}{cccc} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{array} \!\right]. \] From the last RREF matrix we deduce that \[ \operatorname{Nul}\left( \left[\! \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{array} \!\right] \right) = \operatorname{Span}\left\{ \left[\! \begin{array}{c} -1 \\ 0 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ -1 \\ 1 \\ 0 \end{array} \!\right]\right\} \]
- Therefore \[ \left( \operatorname{Span}\left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ 1 \\ -1 \end{array} \!\right]\right\} \right)^\perp = \operatorname{Span}\left\{ \left[\! \begin{array}{c} -1 \\ 0 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ -1 \\ 1 \\ 0 \end{array} \!\right]\right\} \]
- It is a curious fact that \[ \operatorname{Span}\left\{ \left[\! \begin{array}{c} -1 \\ 0 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ -1 \\ 1 \\ 0 \end{array} \!\right]\right\} = \operatorname{Span}\left\{ \left[\! \begin{array}{c} -1 \\ -1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ - 1 \\ 1 \end{array} \!\right]\right\}. \] The preceding claim is a consequence of the fact that \[ \operatorname{Span} \bigl\{ \mathbf{u}, \mathbf{v} \bigr\} = \operatorname{Span} \bigl\{ \mathbf{u} + \mathbf{v}, \mathbf{u} - \mathbf{v} \bigr\}, \] and the fact that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent if and only if $\mathbf{u}+\mathbf{v}$ and $\mathbf{u}-\mathbf{v}$ are linearly independent. (Prove these facts as an exercise.)
- As a consequence of the preceding two items we have \[ \left( \operatorname{Span}\left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ 1 \\ -1 \end{array} \!\right]\right\} \right)^\perp = \operatorname{Span}\left\{ \left[\! \begin{array}{c} -1 \\ -1 \\ 1 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} -1 \\ 1 \\ -1 \\ 1 \end{array} \!\right]\right\}. \]
- The four vectors that we used in the preceding item are four remarkable vectors in $\mathbb{R}^4.$ One can see how remarkable they are by putting these four vectors as the columns of a $4\!\times\!4$ matrix \[ U = \left[\! \begin{array}{cccc} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & 1 & 1 \end{array} \!\right] \] and calculating \[ U^\top U = \left[\! \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & 1 & -1 & 1 \end{array} \!\right] \left[\! \begin{array}{cccc} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & 1 & 1 \end{array} \!\right] = \left[\! \begin{array}{cccc} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{array} \!\right]. \] The preceding matrix equality tells us in a matrix form that the columns of the matrix $U$ are orthogonal to each other and each column as a vector in $\mathbb{R}^4$ has norm (length) equal to $2.$
- Below I will present a specific matrix multiplication calculation presented through dot products of rows and columns, as I explained above: \[ A B = \left[\!\begin{array}{c} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_k^\top \end{array}\!\!\right] \left[\!\begin{array}{cccc} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{array}\!\!\right] = \left[\!\begin{array}{cccc} \mathbf{r}_1\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_1\!\!\cdot\!\mathbf{c}_n \\ \mathbf{r}_2\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_2\!\!\cdot\!\mathbf{c}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{r}_k\!\!\cdot\!\mathbf{c}_1 & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_2 & \cdots & \mathbf{r}_k\!\!\cdot\!\mathbf{c}_n \end{array}\!\!\right] \]
- As an example, I want to recall the matrix multiplication related to the Reduced Row Echelon Form of a matrix. Below is a $4\times 5$ matrix and its reduced row echelon form: \[ A = \left[\! \begin{array}{rrrrr} 1 & 3 & 2 & 2 & 2 \\ 2 & 0 & -2 & 1 & 1 \\ 2 & 1 & -1 & 1 & 2 \\ 1 & 4 & 3 & 2 & 3 \end{array} \!\right] \ \sim \quad \cdots \quad \sim \ \left[\! \begin{array}{rrrrr} 1 & 0 & -1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ \end{array} \!\right] \]
  - Whenever the reduced row echelon form (RREF) is found we should observe the following remarkable identity: The matrix product of the $4\times 3$ matrix consisting of the pivot columns of $A$ and the $3\times 5$ matrix consisting of the nonzero rows of the RREF of $A$ results in the matrix $A$: \[ \require{bbox} \bbox[#7FBFBF, 8px, border: 3px solid teal]{ \left[\! \begin{array}{rrr} 1 & 3 & 2 \\ 2 & 0 & 1 \\ 2 & 1 & 1 \\ 1 & 4 & 2 \end{array} \!\right] \left[\! \begin{array}{rrrrr} 1 & 0 & -1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -1 \\ \end{array} \!\right] = \left[\! \begin{array}{rrrrr} 1 & 3 & 2 & 2 & 2 \\ 2 & 0 & -2 & 1 & 1 \\ 2 & 1 & -1 & 1 & 2 \\ 1 & 4 & 3 & 2 & 3 \end{array} \!\right] = A }. \] I decided to color this identity in teal since it is beautealful.
  - Notice that the columns of the $4\!\times\!3$ matrix in the teal identity form a basis for $\operatorname{Col}(A).$ Similarly, the rows of the $3\!\times\!5$ matrix in the teal identity form a basis for $\operatorname{Row}(A).$
  - By the rules of matrix multiplication the teal identity tells us that the columns of the matrix $A$ are linear combinations of the columns of the $4\!\times\!3$ matrix in the teal identity. The coefficients in these linear combinations are the columns of $3\!\times\!5$ matrix in the teal identity.
  - By the rules of matrix multiplication the teal identity tells us that the rows of the matrix $A$ are linear combinations of the rows of the $3\!\times\!5$ matrix in the teal identity. The coefficients in these linear combinations are the rows of $4\!\times\!3$ matrix in the teal identity.
- In this item I will illustrate the multiplication of two matrices below by emphasising the dot products between $\mathbb{R}^3$ vectors: \[ \require{bbox} \bbox[#7FBFBF, 8px, border: 3px solid teal]{ \left[\! \begin{array}{rrr} 1 & 3 & 2 \\ 2 & 0 & 1 \\ 2 & 1 & 1 \\ 1 & 4 & 2 \end{array} \!\right] \left[\! \begin{array}{rrrrr} 1 & 0 & -1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -1 \\ \end{array} \!\right] = \left[\! \begin{array}{rrrrr} 1 & 3 & 2 & 2 & 2 \\ 2 & 0 & -2 & 1 & 1 \\ 2 & 1 & -1 & 1 & 2 \\ 1 & 4 & 3 & 2 & 3 \end{array} \!\right] }. \] Here the left matrix in the product is a $4\times\!3$ matrix with the rows: \[ \left[\! \begin{array}{r} 1 \\ 3 \\ 2 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 2 \\ 0 \\ 1 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 2 \\ 1 \\ 1 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 1 \\ 4 \\ 2 \end{array} \!\right], \] and right matrix in the product is a $3\times\!5$ matrix with the columns: \[ \left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right], \qquad \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right], \qquad \left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right]. \] It is useful to think of the matrix multiplication as follows (as you read through the matrix-vector arithmetic below always look for dot products in different forms, always): \begin{align*} & \left[\!\! \begin{array}{r} \phantom{\biggl|\biggr.} \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr] \\ \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr] \\ \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr] \\ \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr] \end{array} \!\right] \left[\! \begin{array}{rrrrr} \left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right] & \left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right] & \left[\! 
\begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & \left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right] & \left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right] \end{array} \!\right] \\ & = \left[\! \begin{array}{rrrrr} \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 3 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right] \\ \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 0 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right] \\ \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr]\!\left[\! 
\begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 2 & 1 & 1 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right] \\ \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \!\right] & \bigl[\!\begin{array}{ccc} 1 & 4 & 2 \end{array} \!\bigr]\!\left[\! \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \!\right] \\ \end{array} \!\right] \\ & = \left[\! \begin{array}{ccccc} 1\!\cdot\!1 + 3\!\cdot\!0 + 2\!\cdot\!0 & 1\!\cdot\!0 + 3\!\cdot\!1 + 2\!\cdot\!0 & 1\!\cdot\!(-1) + 3\!\cdot\!1 + 2\!\cdot\!0 & 1\!\cdot\!0 + 3\!\cdot\!0 + 2\!\cdot\!1 & 1\!\cdot\!1 + 3\!\cdot\!1 + 2\!\cdot\!(-1) \\ 2\!\cdot\!1 + 0\!\cdot\!0 + 1\!\cdot\!0 & 2\!\cdot\!0 + 0\!\cdot\!1 + 1\!\cdot\!0 & 2\!\cdot\!(-1) + 0\!\cdot\!1 + 1\!\cdot\!0 & 2\!\cdot\!0 + 0\!\cdot\!0 + 1\!\cdot\!1 & 2\!\cdot\!1 + 0\!\cdot\!1 + 1\!\cdot\!(-1) \\ 2\!\cdot\!1 + 1\!\cdot\!0 + 1\!\cdot\!0 & 2\!\cdot\!0 + 1\!\cdot\!1 + 1\!\cdot\!0 & 2\!\cdot\!(-1) + 1\!\cdot\!1 + 1\!\cdot\!0 & 2\!\cdot\!0 + 1\!\cdot\!0 + 1\!\cdot\!1 & 2\!\cdot\!1 + 1\!\cdot\!1 + 1\!\cdot\!(-1) \\ 1\!\cdot\!1 + 4\!\cdot\!0 + 2\!\cdot\!0 & 1\!\cdot\!0 + 4\!\cdot\!1 + 2\!\cdot\!0 & 1\!\cdot\!(-1) + 4\!\cdot\!1 + 2\!\cdot\!0 & 1\!\cdot\!0 + 4\!\cdot\!0 + 2\!\cdot\!1 & 1\!\cdot\!1 + 4\!\cdot\!1 + 2\!\cdot\!(-1) \end{array} \!\right] \\ &= \left[\! \begin{array}{rrrrr} 1 & 3 & 2 & 2 & 2 \\ 2 & 0 & -2 & 1 & 1 \\ 2 & 1 & -1 & 1 & 2 \\ 1 & 4 & 3 & 2 & 3 \end{array} \!\right]. \end{align*}
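
The teal identity can also be checked by machine. The sketch below is one possible Python implementation of row reduction with exact fractions, written only for illustration; the function names `rref` and `matmul` are made up. It computes the RREF of the matrix $A$ from the example, extracts the pivot columns of $A$ and the nonzero rows of the RREF, and multiplies them.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and the pivot column indices."""
    R = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0  # index of the row where the next pivot will be placed
    for c in range(cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot_row is None:
            continue
        R[r], R[pivot_row] = R[pivot_row], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        # Clear column c in every other row.
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [x - factor * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots

def matmul(A, B):
    """Each entry of AB is a dot product of a row of A with a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 3, 2, 2, 2],
     [2, 0, -2, 1, 1],
     [2, 1, -1, 1, 2],
     [1, 4, 3, 2, 3]]

R, pivots = rref(A)
pivot_columns = [[row[c] for c in pivots] for row in A]          # 4 x 3
nonzero_rows = [row for row in R if any(x != 0 for x in row)]    # 3 x 5
product = matmul(pivot_columns, nonzero_rows)
```

Note that `pivots` uses zero-based indices, so the pivot columns $1, 2, 4$ of $A$ appear as `[0, 1, 3]`. The comparison `product == A` is `True`, which is the teal identity.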

Wednesday, February 1, 2023

Consider the $3\times 3$ matrix \[ A = \left[\! \begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \!\right]. \] You might be able to figure out the geometric action of this matrix by looking at what this matrix does to the coordinate vectors. But I want to demonstrate how to use the eigenvalues and eigenvectors to figure that out.
- First calculate the eigenvalues by finding the characteristic polynomial of $A$ \begin{align*} \det\left( A - \lambda I_3 \right) & = \left| \begin{array}{ccc} -\lambda & 0 & 1\\ 1 & -\lambda & 0 \\ 0 & 1 & -\lambda \end{array} \right| \\ & = (-\lambda) \left| \begin{array}{cc} -\lambda & 0 \\ 1 & -\lambda \end{array} \right| + \left| \begin{array}{cc} 1 & -\lambda \\ 0 & 1 \end{array} \right| \\ & = -\lambda^3 + 1. \end{align*} To find all the roots of the cubic equation $\lambda^3 - 1 = 0$, we factor it \[ \lambda^3 - 1 = (\lambda -1) (\lambda^2 + \lambda + 1), \] and find out that \[ \lambda_1 = 1, \quad \lambda_2 = -\frac{1}{2} - i \frac{\sqrt{3}}{2}, \quad \lambda_3 = -\frac{1}{2} + i \frac{\sqrt{3}}{2} \] are the eigenvalues of $A$.
- Corresponding eigenvectors : \begin{alignat*}{3} \lambda_1 & = 1 \quad && \text{a corresponding eigenvector is} \quad \mathbf{v}_1 &&= \left[\! \begin{array}{c} 1 \\ 1 \\ 1\end{array} \!\right] \\ \lambda_2 & = -\frac{1}{2} - i \frac{\sqrt{3}}{2} \quad && \text{a corresponding eigenvector is} \quad \mathbf{v}_2 && = \left[\! \begin{array}{c} -1 \\ -1 \\ 2 \end{array} \!\right] + i \left[\! \begin{array}{c} \sqrt{3} \\ -\sqrt{3} \\ 0 \end{array} \!\right] \\ \lambda_3 & = -\frac{1}{2} + i \frac{\sqrt{3}}{2} \quad && \text{a corresponding eigenvector is} \quad \mathbf{v}_3 && = \left[\! \begin{array}{c} -1 \\ -1 \\ 2 \end{array} \!\right] - i \left[\! \begin{array}{c} \sqrt{3} \\ -\sqrt{3} \\ 0\end{array} \!\right] \end{alignat*}
- Homework: Verify that the above calculated eigenvalues and eigenvectors are correct.
- The significance of the eigenvector $\left[\! \begin{array}{c} 1 \\ 1 \\ 1\end{array} \!\right]$ is that it remains unchanged under $A$. Not only that, but any scalar multiple of this eigenvector remains unchanged under $A$. For all $x\in \mathbb{R}$ we have: \[ \left[\! \begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \!\right] \left[\! \begin{array}{c} x \\ x \\ x\end{array} \!\right] = \left[\! \begin{array}{c} x \\ x \\ x \end{array} \!\right] \]
- Below we will discover the significance of the real and the imaginary part of the eigenvector $\mathbf{v}_2$: \[ \left[\! \begin{array}{c} -1 \\ -1 \\ 2 \end{array} \!\right], \quad \left[\! \begin{array}{c} \sqrt{3} \\ -\sqrt{3} \\ 0 \end{array} \!\right]. \]
- Recall the following equivalences that we established in the post on Saturday: \[ A (\mathbf{x} + i \mkern 1mu \mathbf{y}) = (a-i \mkern 1mu b) (\mathbf{x} + i \mkern 1mu \mathbf{y}) \quad \Leftrightarrow \quad \begin{array}{l} A\mathbf{x} = \phantom{-} a \mathbf{x} + b \mathbf{y} \\ A\mathbf{y} = -b \mathbf{x} + a \mathbf{y} \end{array} \quad \Leftrightarrow \quad A \left[\! \begin{array}{cc} \mathbf{x} & \mathbf{y} \end{array} \!\right] = \left[\! \begin{array}{cc} \mathbf{x} & \mathbf{y} \end{array} \!\right] \left[\! \begin{array}{cc} a & -b \\ b & a \end{array} \!\right]. \]
- Applying the preceding equivalence to the complex eigenvalue $\lambda_2$ and its eigenvector we found above, we get \[ \left[\! \begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \!\right] \left[\! \begin{array}{cc} -1 & \sqrt{3} \\ -1 & -\sqrt{3} \\ 2 & 0 \end{array} \!\right] = \left[\! \begin{array}{cc} -1 & \sqrt{3} \\ -1 & -\sqrt{3} \\ 2 & 0 \end{array} \!\right] \left[\! \begin{array}{cc} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{array} \!\right]. \] Since \[ \cos\bigl(2\pi/3\bigr)= -\frac{1}{2} \qquad \text{and} \qquad \sin\bigl(2\pi/3\bigr)= \frac{\sqrt{3}}{2}, \] the matrix \[ \left[\! \begin{array}{cc} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{array} \!\right] \] represents the counterclockwise rotation by $\theta = 2 \pi/3.$
- The identity that we discovered in the preceding item tells us that the matrix $A$ acts as the rotation by $\theta = 2 \pi/3$ in the plane spanned by the vectors \[ \left[\! \begin{array}{c} -1 \\ -1 \\ 2 \end{array} \!\right], \quad \left[\! \begin{array}{c} \sqrt{3} \\ -\sqrt{3} \\ 0 \end{array} \!\right] \]
- Now the question arises: how do we incorporate the eigenvalue $\lambda_1 = 1$ and its eigenvector $\left[\! \begin{array}{c} 1 \\ 1 \\ 1\end{array} \!\right]$ to obtain an identity corresponding to the hidden dilation-rotation for a $3\times 3$ matrix? The following identity holds: \[ \left[\! \begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \!\right] \left[\! \begin{array}{ccc} 1 &-1 & \sqrt{3} \\ 1 & -1 & -\sqrt{3} \\ 1 & 2 & 0 \end{array} \!\right] = \left[\! \begin{array}{ccc} 1 &-1 & \sqrt{3} \\ 1 & -1 & -\sqrt{3} \\ 1 & 2 & 0 \end{array} \!\right] \left[\! \begin{array}{ccc} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{array} \!\right]. \] Now calculate \[ \left[\! \begin{array}{ccc} 1 &-1 & \sqrt{3} \\ 1 & -1 & -\sqrt{3} \\ 1 & 2 & 0 \end{array} \!\right] ^{-1} = \frac{1}{6} \left[\! \begin{array}{ccc} 2 & 2 & 2 \\ -1 & -1 & 2 \\ \sqrt{3} & -\sqrt{3} & 0 \end{array} \!\right]. \] Thus, the hidden dilation-rotation formula for the $3\times 3$ matrix $A$ is \[ \left[\! \begin{array}{ccc} 0 & 0 & 1\\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \!\right] = \left[\! \begin{array}{ccc} 1 &-1 & \sqrt{3} \\ 1 & -1 & -\sqrt{3} \\ 1 & 2 & 0 \end{array} \!\right] \left[\! \begin{array}{ccc} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{array} \!\right] \left[\! \begin{array}{ccc} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ -\frac{1}{6} & -\frac{1}{6} & \frac{1}{3} \\ \frac{\sqrt{3}}{6} & -\frac{\sqrt{3}}{6} & 0 \end{array} \!\right] . \]
- Homework: Explain the meaning of the formula in the preceding item using the concept of change of coordinates.
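The identities in the items above can be checked numerically. Here is a minimal sketch in plain Python; the helper names `matmul` and `close` are my own choices for illustration, and the comparison uses floating-point arithmetic, so it confirms the identities only up to rounding.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def close(X, Y, tol=1e-12):
    """True if two matrices of the same shape agree entrywise up to tol."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = math.sqrt(3)
A = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
# Columns of P: the eigenvector for 1, then the real and imaginary parts of v_2.
P = [[1, -1, s], [1, -1, -s], [1, 2, 0]]
# Block matrix: 1 in the corner, rotation by 2*pi/3 in the lower-right block.
C = [[1, 0, 0], [0, -0.5, -s / 2], [0, s / 2, -0.5]]
P_inv = [[1 / 3, 1 / 3, 1 / 3], [-1 / 6, -1 / 6, 1 / 3], [s / 6, -s / 6, 0]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(close(matmul(A, P), matmul(P, C)))          # A P = P C
print(close(matmul(P, P_inv), I3))                # the stated inverse is correct
print(close(matmul(matmul(P, C), P_inv), A))      # A = P C P^{-1}
```

Each of the three printed values should be `True` if the formulas are entered as in the text.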

Monday, January 30, 2023

Today we did more of Section 5.5. Suggested problems for Section 5.5: 1-6, 7-12, 13, 16, 17, 18, 21, 25, 26.
I presented the abstract part of what I posted on Saturday. The main result in this section is the Hidden Rotation-Dilation Theorem:

Theorem. Let $A$ be a real $2\!\times\!2$ matrix with a nonreal eigenvalue $a-i b$ and a corresponding eigenvector $\mathbf{u} + i \mathbf{w}$. Here $a, b \in \mathbb{R},$ $b\neq 0$ and $\mathbf{u}, \mathbf{w}\in \mathbb{R}^2.$ Then the $2\!\times\!2$ matrix \[ P = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \] is invertible and \[ A = \alpha P \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right] P^{-1}, \] where $\alpha = \sqrt{a^2 + b^2}$ and $\theta \in [0, 2\pi)$ is such that \[ \cos \theta = \frac{a}{\sqrt{a^2 + b^2}}, \quad \sin \theta = \frac{b}{\sqrt{a^2 + b^2}}. \]

In the above theorem the matrix \[ \alpha \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right] = \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] \] is the Hidden Rotation-Dilation: $\alpha$ dilates and $\theta$ rotates.
On Friday, I illustrated the Hidden Rotation-Dilation Theorem with the matrix: \[ \left[\! \begin{array}{cc} 1 & 4 \\ -8 & -7 \end{array} \!\right]. \]
You can work out the details for the matrices \[ \left[\! \begin{array}{cc} -1 & 2 \\ -1 & 1 \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{cc} 2 & -5 \\ 1 & -2 \end{array} \!\right]. \]
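If you want to check your hand calculations for matrices like these, here is a minimal sketch in plain Python. The function names `hidden_rotation_dilation` and `rebuild` are hypothetical, and the particular eigenvector chosen in the code is just one convenient choice; any nonzero complex multiple of it works equally well in the theorem.

```python
import math

def hidden_rotation_dilation(A):
    """Return (a, b, u, w, alpha, theta) for a real 2x2 matrix A with nonreal eigenvalues.

    The eigenvalue used is a - i*b with b > 0, and u + i*w is one corresponding eigenvector.
    """
    (p, q), (r, s) = A
    trace, det = p + s, p * s - q * r
    disc = trace * trace - 4 * det
    if disc >= 0:
        raise ValueError("the eigenvalues of A are real")
    a, b = trace / 2, math.sqrt(-disc) / 2
    # v = (q, lambda - p) solves the first row of (A - lambda I) v = 0;
    # q is nonzero because disc < 0 forces q*r < 0.
    u, w = (q, a - p), (0.0, -b)
    alpha = math.hypot(a, b)
    theta = math.atan2(b, a) % (2 * math.pi)
    return a, b, u, w, alpha, theta

def rebuild(u, w, alpha, theta):
    """Compute alpha * P * R(theta) * P^{-1} with P = [u w]."""
    c, s = math.cos(theta), math.sin(theta)
    det = u[0] * w[1] - w[0] * u[1]
    P = [[u[0], w[0]], [u[1], w[1]]]
    R = [[c, -s], [s, c]]
    P_inv = [[w[1] / det, -w[0] / det], [-u[1] / det, u[0] / det]]
    mul = lambda X, Y: [[sum(X[i][k] * Y[k][j] for k in range(2))
                         for j in range(2)] for i in range(2)]
    return [[alpha * x for x in row] for row in mul(mul(P, R), P_inv)]

A = [[1, 4], [-8, -7]]
a, b, u, w, alpha, theta = hidden_rotation_dilation(A)
print(a, b, alpha, theta)
print(rebuild(u, w, alpha, theta))  # should reproduce A up to rounding
```

For the matrix from Friday the characteristic polynomial is $\lambda^2 + 6\lambda + 25$, so the sketch should report $a = -3,$ $b = 4$ and $\alpha = 5.$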
For practice problems in linear algebra we often need small matrices with small integer entries, small integer eigenvalues, and eigenvectors whose entries are also small integers. I programmed Mathematica to print many such matrices for me. I call such matrices the Beautiful Matrices.

The Book of Beautiful Matrices consists of two-by-two matrices whose entries and eigenvalues are integers between -9 and 9. I consider only matrices with a nonnegative top-left entry. In addition, I consider only matrices whose entries are relatively prime. To get the matrices that are omitted in this way you just multiply one of the given matrices by an integer. You need to adjust the eigenvalues by multiplying them by the same integer. The eigenvectors remain unchanged.
I divided the Book into three volumes: Volume 1 contains matrices with real distinct eigenvalues, Volume 2 contains matrices with non-real eigenvalues (whose real and imaginary parts are integers between -9 and 9) and Volume 3 contains matrices with a repeated eigenvalue. The eigenvalues and a corresponding eigenvector (and a root vector for repeated eigenvalues) are given for each matrix. Three volumes in pdf format are here:
- There are 4292 matrices in Volume 1. Here you can find Volume 1 ordered by eigenvalues.
- There are 1164 matrices in Volume 2. Here you can find Volume 2 ordered by the complex eigenvalues.
- There are 270 matrices in Volume 3. Here you can find Volume 3 ordered by repeated eigenvalues.

Saturday, January 28, 2023

Yesterday we did Section 5.5. Suggested problems for Section 5.5: 1-6, 7-12, 13, 16, 17, 18, 21, 25, 26.
We work with complex numbers in this section. Recall the properties of complex conjugation. If $z = a + i b,$ with $a, b \in \mathbb{R},$ then the complex conjugate $\overline{z}$ of $z$ is the complex number $\overline{z} = a - i b.$ The following algebra holds for complex conjugation. For complex numbers $z \in \mathbb{C}$ and $w \in \mathbb{C}$ we have \[ \overline{z + w} = \overline{z} + \overline{w} \qquad \text{and} \qquad \overline{z \, w} = \overline{z}\mkern 4mu \overline{w}. \] To verify these algebraic properties of conjugation we set $z = a + i b$ and $w = c + i d$ with $a, b, c, d \in \mathbb{R}.$
- We calculate \begin{align*} z + w & = (a+c) + i (b + d), \\ \overline{z + w} & = (a+c) - i (b + d), \\ \overline{z} & = a - i b, \\ \overline{w} & = c - i d, \\ \overline{z} + \overline{w} & = (a+c) + i (-b - d) = (a+c) - i (b + d). \end{align*} Comparing the second equality and the last equality above, we deduce $\overline{z + w} = \overline{z} + \overline{w}.$
- We calculate \begin{align*} z \, w & = (a+ib) (c +i d) = (ac - bd) + i (ad + bc) \\ \overline{z \, w} & = (ac - bd) - i (ad + bc) \\ \overline{z} & = a - i b, \\ \overline{w} & = c - i d, \\ \overline{z} \, \overline{w} & = (a - i b) (c - i d) = (ac - bd) + i (-ad-bc) = (ac - bd) - i (ad+bc). \end{align*} Comparing the second equality and the last equality above, we deduce $\overline{z \, w} = \overline{z} \mkern 3mu \overline{w}.$
- Let $z = a + i b,$ with $a, b \in \mathbb{R}.$ Then $\overline{z} = z$ if and only if $a - i b = a + i b.$ The last equality is equivalent to $2 i b = 0$ and this equality is equivalent to $b=0.$ And $b=0$ means $z = a \in \mathbb{R}.$ Therefore, for $z \in \mathbb{C}$ we have $\overline{z} = z$ if and only if $z \in \mathbb{R}.$
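The three properties of conjugation verified above can also be spot-checked on a computer. The sketch below uses Python's built-in `complex` type and tests the rules on a small grid of complex numbers with integer real and imaginary parts, for which the arithmetic is exact. This is only a sanity check on finitely many values, not a proof, and the variable names are my own.

```python
from itertools import product

# All z = a + i b with integers a, b between -3 and 3.
values = [complex(a, b) for a, b in product(range(-3, 4), repeat=2)]

# Conjugate of a sum equals the sum of the conjugates.
sum_rule = all((z + w).conjugate() == z.conjugate() + w.conjugate()
               for z in values for w in values)

# Conjugate of a product equals the product of the conjugates.
prod_rule = all((z * w).conjugate() == z.conjugate() * w.conjugate()
                for z in values for w in values)

# The numbers fixed by conjugation are exactly the real ones.
fixed = [z for z in values if z.conjugate() == z]
real_rule = all(z.imag == 0 for z in fixed) and len(fixed) == 7

print(sum_rule, prod_rule, real_rule)
```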
A summary of Section 5.5 is as follows. Let $A$ be a real $n\!\times\!n$ matrix. Assume that $A$ has a complex eigenvalue $\lambda = a - i b$, where $a,b \in \mathbb{R}$ with $b\neq0$, and that a corresponding eigenvector is \[ \mathbf{v} = \mathbf{u} + i \mathbf{w} \quad \text{where} \quad \mathbf{u}, \mathbf{w} \in \mathbb{R}^n. \] That is we assume \[ A \mathbf{v} = \lambda \mathbf{v}, \quad \mathbf{v} \neq \mathbf{0}, \quad \operatorname{Im}(\lambda) = -b \neq 0. \] Next we write the preceding equality with the real and imaginary components: \[ A (\mathbf{u} + i \mathbf{w}) = (a-i b) (\mathbf{u} + i \mathbf{w}). \] Now taking the conjugate of both sides of this equation we get \[ \overline{A (\mathbf{u} + i \mathbf{w})} = \overline{(a-i b) (\mathbf{u} + i \mathbf{w})}. \] Next, using the rules for the conjugation we get \[ \overline{A} \, \overline{(\mathbf{u} + i \mathbf{w})} = \overline{(a-i b)} \, \overline{(\mathbf{u} + i \mathbf{w})}. \] Since $A$ consists of real numbers we have $\overline{A} = A.$ Consequently, the preceding equality reads \[ A (\mathbf{u} - i \mathbf{w}) = (a + i b) (\mathbf{u} - i \mathbf{w}). \] The last equality means that $\overline{\lambda} = a + i b$ is an eigenvalue of $A$ and its corresponding eigenvector is $\overline{\mathbf{v}} = \mathbf{u} - i \mathbf{w}.$
The conclusion from the previous item is: If $\lambda = a - i b$, where $a,b \in \mathbb{R}$ with $b\neq0$, is a complex eigenvalue of a real matrix $A$ and a corresponding eigenvector is \[ \mathbf{v} = \mathbf{u} + i \mathbf{w} \quad \text{where} \quad \mathbf{u}, \mathbf{w} \in \mathbb{R}^n, \] then the conjugate $\overline{\lambda} = a + i b$ is also an eigenvalue of $A$ and a corresponding eigenvector is the conjugate vector of $\mathbf{v},$ that is $\overline{\mathbf{v}} = \mathbf{u} - i \mathbf{w}.$
Assume that a real matrix $A$ has a complex eigenvalue $\lambda = a - i b$, where $a,b \in \mathbb{R}$ with $b\neq0$, and that a corresponding eigenvector is \[ \mathbf{v} = \mathbf{u} + i \mathbf{w} \quad \text{where} \quad \mathbf{u}, \mathbf{w} \in \mathbb{R}^n. \] This means that \[ A \bigl(\mathbf{u} + i \mathbf{w}\bigr) = ( a - i b) \bigl(\mathbf{u} + i \mathbf{w}\bigr). \] Using the linearity of the matrix-vector multiplication and algebra with vectors we can rewrite the preceding equality as \[ A \mathbf{u} + i A \mathbf{w} = (a \mathbf{u} + b \mathbf{w}) + i (- b \mathbf{u} + a \mathbf{w}). \] Since the vectors $A \mathbf{u}$, $A \mathbf{w}$ and $a \mathbf{u} + b \mathbf{w}$, $- b \mathbf{u} + a \mathbf{w}$ are vectors with real entries the preceding equality implies that \begin{align*} A \mathbf{u} & = \ \ a \mathbf{u} + b \mathbf{w} \\ A \mathbf{w} & = - b \mathbf{u} + a \mathbf{w} . \end{align*} The last two vector equalities can be rewritten as one matrix equality \[ A \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] = \bigl[ A\mathbf{u} \ \ A\mathbf{w} \bigr] = \bigl[ a \mathbf{u} + b \mathbf{w} \ \ \ - b \mathbf{u} + a \mathbf{w} \bigr]. \] The last matrix can be factored as \[ \bigl[ a \mathbf{u} + b \mathbf{w} \ \ \ - b \mathbf{u} + a \mathbf{w} \bigr] = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right]. \] Finally, the last two equalities yield \[ A \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right]. \] Notice that both sides of the preceding equality are $n\!\times\!2$ matrices.
Let $A$ be a real $n\!\times\!n$ matrix. Assume that $A$ has a complex eigenvalue $\lambda = a - i b$, where $a,b \in \mathbb{R}$ with $b\neq0$, and that a corresponding eigenvector is \[ \mathbf{v} = \mathbf{u} + i \mathbf{w} \quad \text{where} \quad \mathbf{u}, \mathbf{w} \in \mathbb{R}^n. \] In class we will prove the following implication: If \[ A (\mathbf{u} + i \mathbf{w}) = (a - i b) (\mathbf{u} + i \mathbf{w}), \quad \mathbf{u} + i \mathbf{w} \neq \mathbf{0}, \quad b \neq 0, \] then \[ \mathbf{u} \quad \text{and} \quad \mathbf{w} \qquad \text{are linearly independent.} \]
Now assume that $A$ is a real $2\!\times\!2$ matrix. In this case, since the real vectors $\mathbf{u}$ and $\mathbf{w}$ are linearly independent, the $2\!\times\!2$ matrix $\bigl[ \mathbf{u} \ \ \mathbf{w} \bigr]$ is invertible. Therefore, the equality \[ A \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] \] can be rewritten as \[ A = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr]^{-1}. \] The matrix \[ \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] \] is a composition of a scaling and a rotation. To see this, factor out $\sqrt{a^2+b^2}$ from the preceding matrix: \[ \sqrt{a^2+b^2} \left[\! \begin{array}{rr} \frac{a}{\sqrt{a^2+b^2}} & -\frac{b}{\sqrt{a^2+b^2}} \\ \frac{b}{\sqrt{a^2+b^2}} & \frac{a}{\sqrt{a^2+b^2}} \end{array} \!\right]. \] Since \[ \left( \frac{a}{\sqrt{a^2+b^2}} \right)^2 + \left( \frac{b}{\sqrt{a^2+b^2}} \right)^2 = 1, \] there exists an angle $\theta \in [0, 2\pi)$ such that \[ \cos \theta = \frac{a}{\sqrt{a^2+b^2}} \quad \text{and} \quad \sin \theta = \frac{b}{\sqrt{a^2+b^2}} . \] Thus, with $\alpha = \sqrt{a^2+b^2}$ we have \[ \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] = \alpha \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right]. \]
The above considerations are summarized in the following theorem which is called the Hidden Rotation-Dilation Theorem:

Theorem. Let $A$ be a real $2\!\times\!2$ matrix with a nonreal eigenvalue $a-i b$ and a corresponding eigenvector $\mathbf{u} + i \mathbf{w}$. Here $a, b \in \mathbb{R},$ $b\neq 0$ and $\mathbf{u}, \mathbf{w}\in \mathbb{R}^2.$ Then the $2\!\times\!2$ matrix \[ P = \bigl[ \mathbf{u} \ \ \mathbf{w} \bigr] \] is invertible and \[ A = \alpha P \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right] P^{-1}, \] where $\alpha = \sqrt{a^2 + b^2}$ and $\theta \in [0, 2\pi)$ is such that \[ \cos \theta = \frac{a}{\sqrt{a^2 + b^2}}, \quad \sin \theta = \frac{b}{\sqrt{a^2 + b^2}}. \]

In the above theorem the matrix \[ \alpha \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right] = \left[\! \begin{array}{rr} a & -b \\ b & a \end{array} \!\right] \] is the Hidden Rotation-Dilation: $\alpha$ dilates and $\theta$ rotates.
Here is an example of the above procedure. Consider the matrix \[ \left[\! \begin{array}{rr} 1 & -3 \\ 6 & 7 \end{array} \!\right]. \] The eigenvalues of this matrix are \[ 4 - 3i \qquad \text{and} \qquad 4+3i. \] The corresponding eigenvectors are \[ \left[\! \begin{array}{r} 1 \\ -1 \end{array} \!\right] + i \left[\! \begin{array}{r} 0 \\ 1 \end{array} \!\right] \qquad \text{and} \qquad \left[\! \begin{array}{r} 1 \\ -1 \end{array} \!\right] - i \left[\! \begin{array}{r} 0 \\ 1 \end{array} \!\right]. \] For the matrix $\left[\! \begin{array}{rr} 1 & -3 \\ 6 & 7 \end{array} \!\right]$, the identity that we established in the previous items reads \[ \left[\! \begin{array}{rr} 1 & -3 \\ 6 & 7 \end{array} \!\right] \left[\! \begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array} \!\right] = \left[\! \begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array} \!\right] \left[\! \begin{array}{rr} 4 & -3 \\ 3 & 4 \end{array} \!\right]. \] Since $\sqrt{4^2+3^2} = 5$ we have \[ \left[\! \begin{array}{rr} 4 & -3 \\ 3 & 4 \end{array} \!\right] = 5 \left[\! \begin{array}{rr} \frac{4}{5} & -\frac{3}{5} \\ \frac{3}{5} & \frac{4}{5} \end{array} \!\right] = 5 \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right], \quad \text{where} \quad \theta = \arccos \frac{4}{5} \approx 0.643501. \] Thus \[ \left[\! \begin{array}{rr} 1 & -3 \\ 6 & 7 \end{array} \!\right] = 5 \left[\! \begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array} \!\right] \left[\! \begin{array}{rr} \frac{4}{5} & -\frac{3}{5} \\ \frac{3}{5} & \frac{4}{5} \end{array} \!\right] \left[\! \begin{array}{rr} 1 & 0 \\ 1 & 1 \end{array} \!\right]. \]
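The numbers in this example can be confirmed with exact arithmetic. The following is a minimal sketch in plain Python; it uses the built-in `complex` type for the eigenvector check and `fractions.Fraction` for the factorization, and the names `matmul`, `eigen_ok` and `rebuilt` are my own.

```python
import math
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, -3], [6, 7]]

# Check A v = lambda v for lambda = 4 - 3i and v = (1, -1) + i (0, 1).
lam = complex(4, -3)
v = [complex(1, 0), complex(-1, 1)]
Av = [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]
eigen_ok = Av == [lam * vk for vk in v]

# Check A = 5 P R P^{-1} with exact fractions.
P = [[1, 0], [-1, 1]]
R = [[F(4, 5), F(-3, 5)], [F(3, 5), F(4, 5)]]
P_inv = [[1, 0], [1, 1]]
rebuilt = [[5 * x for x in row] for row in matmul(matmul(P, R), P_inv)]

theta = math.acos(4 / 5)
print(eigen_ok, rebuilt == A, theta)
```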
Here is another example of the above procedure. Consider the matrix \[ \left[\! \begin{array}{rr} -1 & 2 \\ -1 & 1 \end{array} \!\right]. \] For this matrix it is interesting to calculate its square \[ \left[\! \begin{array}{rr} -1 & 2 \\ -1 & 1 \end{array} \!\right] \left[\! \begin{array}{rr} -1 & 2 \\ -1 & 1 \end{array} \!\right] = \left[\! \begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array} \!\right] \] and then \[ \left[\! \begin{array}{rr} -1 & 2 \\ -1 & 1 \end{array} \!\right]^4 = \left[\! \begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array} \!\right] \left[\! \begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array} \!\right] = \left[\! \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \!\right]. \] Explain why the fourth power of the given matrix is the identity matrix by using the method presented in this post.

Thursday, January 26, 2023 (updated with three more colors)

Inspired by a question that I got in the class that I teach before this one, I started the class by talking about the colors in relation to linear algebra. I love the application of vectors to COLORS so much that I wrote a webpage to celebrate it: Color Cube.

I emphasized in class that in the red-green-blue coloring scheme, the following eighteen colors stand out. I present them in six steps with three colors in each step.
- Step One:
  - Black (vector $(0,0,0)$),
  - White (vector $(1,1,1)$) and the color between them,
  - Gray (vector $(1/2,1/2,1/2)$) which can be considered as dark White;
- Step Two: the coordinate colors,
  - Red (vector $(1,0,0)$),
  - Green (vector $(0,1,0)$),
  - Blue (vector $(0,0,1)$);
- Step Three: the complementary colors to RGB which are CMY,
  - Cyan (vector $(0,1,1)$) absence of Red,
  - Magenta (vector $(1,0,1)$) absence of Green,
  - Yellow (vector $(1,1,0)$) absence of Blue;
- Step Four: the dark RGB colors,
  - Maroon (vector $(1/2,0,0)$) is the dark Red,
  - Forest (vector $(0,1/2,0)$) is the dark Green,
  - Navy (vector $(0,0,1/2)$) is the dark Blue;
- Step Five: the dark CMY colors,
  - Teal (vector $(0,1/2,1/2)$) is the dark Cyan,
  - Purple (vector $(1/2,0,1/2)$) is the dark Magenta,
  - Olive (vector $(1/2,1/2,0)$) is the dark Yellow.
- Step Six: three (out of a total of six) colors between neighbouring RGB and CMY. The logic of our choice here is more complicated. R neighbours with M and Y. G neighbours with C and Y. B neighbours with C and M. It turns out that only three out of these six colors have names.
  - Orange (vector $(1,1/2,0)$) is the color in the middle between Red and Yellow,
  - Chartreuse (vector $(1/2,1,0)$) is the color in the middle between Green and Yellow,
  - Violet (vector $(1/2,0,1)$) is the color in the middle between Blue and Magenta.
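The relations among the eighteen colors listed above are plain vector arithmetic, which the following minimal sketch in Python makes explicit. The helper names (`scale`, `add`, `sub`, `midpoint`, `to_hex`) are my own, and the conversion to a hexadecimal code on the 0 to 255 scale is an assumed convention that is not part of the list above.

```python
from fractions import Fraction as F

HALF = F(1, 2)
colors = {
    "Black": (0, 0, 0), "White": (1, 1, 1), "Gray": (HALF, HALF, HALF),
    "Red": (1, 0, 0), "Green": (0, 1, 0), "Blue": (0, 0, 1),
    "Cyan": (0, 1, 1), "Magenta": (1, 0, 1), "Yellow": (1, 1, 0),
    "Maroon": (HALF, 0, 0), "Forest": (0, HALF, 0), "Navy": (0, 0, HALF),
    "Teal": (0, HALF, HALF), "Purple": (HALF, 0, HALF), "Olive": (HALF, HALF, 0),
    "Orange": (1, HALF, 0), "Chartreuse": (HALF, 1, 0), "Violet": (HALF, 0, 1),
}

def scale(t, v):
    """Scalar multiple t*v of a color vector."""
    return tuple(t * x for x in v)

def add(v, w):
    return tuple(x + y for x, y in zip(v, w))

def sub(v, w):
    return tuple(x - y for x, y in zip(v, w))

def midpoint(v, w):
    """The color in the middle between v and w."""
    return scale(HALF, add(v, w))

def to_hex(v):
    """Hex code on the 0..255 scale (assumed convention; halves are rounded by round())."""
    return "#" + "".join(f"{round(255 * x):02x}" for x in v)

# "Dark" means multiplying by 1/2; "absence of Red" means White minus Red.
print(scale(HALF, colors["Red"]) == colors["Maroon"])
print(sub(colors["White"], colors["Red"]) == colors["Cyan"])
print(midpoint(colors["Red"], colors["Yellow"]) == colors["Orange"])
print(to_hex(colors["Red"]))
```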
Whenever you calculate with complex numbers the goal is to express a complex number in the form $a + i b$ where $a$ and $b$ are real numbers. For example, the instruction to calculate \[ \frac{3+2 i}{4-i} \] is in fact asking us to find the real numbers $a$ and $b$ such that \[ \frac{3+2 i}{4-i} = a + i b. \] One way to do this is to convert it to a linear algebra problem. Multiply both sides by $4-i$ and use the multiplication rules to get \[ 3+2i = (a + i b) (4 - i) = (4a + b) + i (4b-a). \] Therefore, to get $a$ and $b$ we solve the linear system \begin{align*} 4a + \ b & = 3 \\ -a + 4b & = 2. \end{align*} Or, in matrix form \[ \left[\!\begin{array}{cc} 4 & 1 \\ -1 & 4 \end{array}\!\right] \left[\!\begin{array}{c} a \\ b \end{array}\!\right] = \left[\!\begin{array}{c} 3 \\ 2 \end{array}\!\right]. \] Since \[ \left[\!\begin{array}{cc} 4 & 1 \\ -1 & 4 \end{array}\!\right]^{-1} = \frac{1}{17} \left[\!\begin{array}{cc} 4 & -1 \\ 1 & 4 \end{array}\!\right] \] we have \[ \frac{3+2 i}{4-i} = a + i b = \frac{10}{17} + \frac{11}{17} i . \]
The calculation that we did above indicates that the matrix \[ \left[\!\begin{array}{cc} 4 & 1 \\ -1 & 4 \end{array}\!\right] \qquad \text{is related to the complex number} \qquad 4- i. \] This is explained as follows: \[ \left[\!\begin{array}{cc} 4 & 1 \\ -1 & 4 \end{array}\!\right] = 4 \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right] + (-1) \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right]. \] Notice that \[ \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right]^2 = \left[\!\begin{array}{cc} -1 & 0 \\ 0 & -1 \end{array}\!\right] = - \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right]. \] Therefore the matrix \[ \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right] \qquad \text{plays the role of the imaginary unit} \qquad i. \]
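The same method works for any quotient $(c + i d)/(p + i q)$ with $p + i q \neq 0$: the unknowns $a, b$ solve a $2\!\times\!2$ linear system with the matrix $\left[\!\begin{array}{cc} p & -q \\ q & p \end{array}\!\right].$ Here is a minimal sketch in Python with exact fractions; the function name `divide` is hypothetical.

```python
from fractions import Fraction as F

def divide(c, d, p, q):
    """Return (a, b) with (c + i d) / (p + i q) = a + i b.

    Multiplying out (a + i b)(p + i q) = c + i d gives the linear system
        p a - q b = c
        q a + p b = d,
    which is solved here with the inverse of the 2x2 coefficient matrix.
    """
    det = p * p + q * q
    if det == 0:
        raise ZeroDivisionError("division by the complex number zero")
    a = F(p * c + q * d, det)
    b = F(-q * c + p * d, det)
    return a, b

a, b = divide(3, 2, 4, -1)
print(a, b)

# Comparison with Python's built-in complex division (floating point).
print(complex(3, 2) / complex(4, -1))
```

For the quotient in the text, $p = 4$ and $q = -1,$ so the coefficient matrix is exactly the matrix used above.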
In fact all the algebra with complex numbers $a + i b$ with $a$ and $b$ real numbers can be replicated with matrices \[ a \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right] + b \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right] = \left[\!\begin{array}{cc} a & -b \\ b & a \end{array}\!\right]. \]
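To illustrate the claim for multiplication, the following minimal Python sketch compares the product of two such matrices with the matrix of the product of the corresponding complex numbers. Complex numbers are stored as pairs $(a, b)$ of integers so that the arithmetic is exact; the names `as_matrix`, `matmul2` and `cmul` are my own.

```python
def as_matrix(a, b):
    """The 2x2 matrix a*I + b*J that represents a + i b."""
    return [[a, -b], [b, a]]

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cmul(z, w):
    """Product of complex numbers given as pairs (real part, imaginary part)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

z, w = (3, 2), (4, -1)
product_of_matrices = matmul2(as_matrix(*z), as_matrix(*w))
matrix_of_product = as_matrix(*cmul(z, w))
print(product_of_matrices == matrix_of_product)

# The matrix J of the imaginary unit satisfies J^2 = -I.
J = as_matrix(0, 1)
print(matmul2(J, J) == as_matrix(-1, 0))
```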
In Math 204 we encountered rotation matrices which have the structure of the matrix from the previous item. The rotation matrix for the angle $\theta$ in counterclockwise direction is \[ \left[\!\begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\!\right] = (\cos\theta) \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right] + (\sin \theta) \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right]. \]
Notice that \[ \frac{d}{d\theta} \left[\!\begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\!\right] = \left[\!\begin{array}{cc} -\sin\theta & -\cos\theta \\ \cos\theta & -\sin\theta \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right] \left[\!\begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\!\right] \] and \[ \left[\!\begin{array}{cc} \cos 0 & -\sin 0 \\ \sin 0 & \cos 0 \end{array}\!\right] = \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right]. \]
Thus the matrix function \[ Y(\theta) = \left[\!\begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\!\right] \quad \text{satisfies} \quad \frac{d}{d\theta} Y(\theta) = \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right] Y(\theta) \quad \text{and} \quad Y(0) = \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right]. \] In Problem 5 on Assignment 1 we made the case that such a function is $Y(\theta) = e^{A \theta}$ where $\displaystyle A = \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right].$ Hence \[ e^{\left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right] \theta} = \left[\!\begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\!\right] = (\cos\theta) \left[\!\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\!\right] + (\sin \theta) \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right]. \]
Since we identify the matrix $\displaystyle \left[\!\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\!\right]$ with the imaginary unit $i,$ the last equation in the preceding item can be restated for the complex numbers as \[ e^{i \theta} = (\cos\theta) + i (\sin \theta), \] which is exactly Euler's identity.
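The matrix form of Euler's identity can be tested numerically. The sketch below does not use the differential equation argument from above; instead it assumes the power series $e^{X} = I + X + \frac{1}{2!}X^2 + \cdots$ as the definition of the matrix exponential and compares a partial sum with the rotation matrix. The function names and the choice of 30 terms are my own.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(X, terms=30):
    """Partial sum I + X + X^2/2! + ... with the given number of terms."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term becomes X^k / k! by multiplying the previous term by X / k.
        term = [[x / k for x in row] for row in matmul2(term, X)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

theta = 2 * math.pi / 3
J_theta = [[0.0, -theta], [theta, 0.0]]
approx = exp_series(J_theta)
exact = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
max_error = max(abs(approx[i][j] - exact[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # expected to be at the level of floating-point rounding
```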

Tuesday, January 24, 2023

Today in class we studied a particular example of an evaluation mapping $T: \mathbb{P}_2 \to \mathbb{R}^3$ for polynomials. Below I will present a different example involving an evaluation mapping for polynomials. At the end of today's post I link to my notes about complex numbers.
Consider the linear mapping $T: \mathbb{P}_3 \to \mathbb{R}^4$ defined by \[ T\bigl(p\bigr) = \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] \quad \text{for all} \quad p(x) \in \mathbb{P}_3. \] A mapping like this is sometimes called an evaluation mapping. The goal in this item is to find a matrix representation for $T$ relative to the standard bases for $\mathbb{P}_3$ and $\mathbb{R}^4.$
- Recall that the standard basis in $\mathbb{P}_3$ is the set of all monomials $\mathcal{M} = \bigl\{1,x,x^2,x^3\bigr\}$ and the standard basis for $\mathbb{R}^4$ is the set of the columns of the identity matrix $I_4.$ We denote this basis by $\mathcal{E}.$
- The matrix representation for $T$ relative to the basis $\mathcal{M}$ of $\mathbb{P}_3$ and the basis $\mathcal{E}$ of $\mathbb{R}^4$ is the matrix $M$ with the following property \[ M \bigl[p\bigr]_{\mathcal{M}} = \bigl[Tp\bigr]_{\mathcal{E}} \quad \text{for all} \quad p(x) \in \mathbb{P}_3. \] In this case we can figure out the matrix $M$ directly, based on the definition.
- Let $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3.$ Then \[ \bigl[Tp\bigr]_{\mathcal{E}} = Tp = \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] = \left[\!\begin{array}{l} a_0 - a_1 + a_2 - a_3 \\ a_0 \\ a_0 + a_1 + a_2 + a_3 \\ a_0 + 2 a_1 + 4 a_2 + 8 a_3 \end{array} \!\right]. \] Thus we need a $4\times 4$ matrix $M$ such that \[ \left[\!\begin{array}{cccc} \Box & \Box & \Box & \Box \\ \Box & \Box & \Box & \Box \\ \Box & \Box & \Box & \Box \\ \Box & \Box & \Box & \Box \end{array} \!\right]\left[\!\begin{array}{c} a_0 \\ a_1 \\ a_2 \\ a_3 \end{array} \!\right] = \left[\! \begin{array}{l} a_0 - a_1 + a_2 - a_3 \\ a_0 \\ a_0 + a_1 + a_2 + a_3 \\ a_0 + 2 a_1 + 4 a_2 + 8 a_3 \end{array} \!\right]. \] By the definition of the action of a matrix on a vector we can reconstruct the matrix $M$ \[ \left[\!\begin{array}{cccc} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \end{array} \!\right]\left[\!\begin{array}{c} a_0 \\ a_1 \\ a_2 \\ a_3 \end{array} \!\right] = \left[\! \begin{array}{l} a_0 - a_1 + a_2 - a_3 \\ a_0 \\ a_0 + a_1 + a_2 + a_3 \\ a_0 + 2 a_1 + 4 a_2 + 8 a_3 \end{array} \!\right]. \]
- Let us introduce a notation for the monomials in $\mathcal{M}$: \[ q_0(x) = 1, \quad q_1(x) = x, \quad q_2(x) = x^2, \quad q_3(x) = x^3, \qquad x\in \mathbb{R}. \] By formula (4) in Section 5.4 we have \[ M = \Bigl[ \bigl[ Tq_0\bigr]_{\mathcal{E}} \ \, \bigl[Tq_1\bigr]_{\mathcal{E}} \ \, \bigl[Tq_2\bigr]_{\mathcal{E}} \ \, \bigl[ Tq_3 \bigr]_{\mathcal{E}} \ \, \Bigr] = \left[\!\begin{array}{cccc} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \end{array} \!\right] \] which, of course, coincides with what we got above.
- The determinant of $M$ is equal to $12.$ Therefore $M,$ and so $T,$ is invertible. Let us calculate the matrix representation for $T^{-1}.$ We will do that not by inverting $M,$ but by considering polynomials.
- We will use formula (4) in Section 5.4 to determine $M^{-1}.$ For convenience we use the standard notation for the columns of the identity matrix $I_4$ \[ \mathbf{e}_1 = \left[\!\begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array} \!\right], \quad \mathbf{e}_2 = \left[\!\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array} \!\right], \quad\mathbf{e}_3 = \left[\!\begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \!\right], \quad\mathbf{e}_4 = \left[\!\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \!\right]. \] Then \[ M^{-1} = \Biggl[ \Bigl[ T^{-1} \mathbf{e}_1 \Bigr]_{\mathcal{M}} \quad \Bigl[ T^{-1} \mathbf{e}_2 \Bigr]_{\mathcal{M}} \quad \Bigl[ T^{-1} \mathbf{e}_3 \Bigr]_{\mathcal{M}} \quad \Bigl[ T^{-1} \mathbf{e}_4 \Bigr]_{\mathcal{M}} \Biggr] \] So, to find $M^{-1}$ we need to calculate the polynomials \[ T^{-1} \left[\!\begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array} \!\right], \quad T^{-1} \left[\!\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array} \!\right], \quad T^{-1} \left[\!\begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \!\right], \quad T^{-1} \left[\!\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \!\right]. \]
- What does \[ p(x) = T^{-1} \left[\!\begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array} \!\right] \] mean? This means that \[ \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] = \left[\!\begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array} \!\right]. \] What we learned about polynomials makes it "easy" to find $p(x)$ such that \[ p(0) = 0, \quad p(1) = 0, \quad p(2) = 0. \] A possible $p(x)$ is \[ p(x) = x(x-1)(x-2). \] However, for this $p(x)$ we have $p(-1) = -6.$ Since we need $p(-1) = 1,$ the needed $p(x)$ is \[ p(x) = - \frac{1}{6} x(x-1)(x-2) = 0\cdot 1 - \frac{1}{3} x + \frac{1}{2} x^2 - \frac{1}{6} x^3. \]
- What does \[ p(x) = T^{-1} \left[\!\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array} \!\right] \] mean? This means that \[ \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] = \left[\!\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array} \!\right]. \] What we learned about polynomials makes it "easy" to find $p(x)$ such that \[ p(-1) = 0, \quad p(1) = 0, \quad p(2) = 0. \] A possible $p(x)$ is \[ p(x) = (x+1)(x-1)(x-2). \] However, for this $p(x)$ we have $p(0) = 2.$ Since we need $p(0) = 1,$ the needed $p(x)$ is \[ p(x) = \frac{1}{2} (x+1)(x-1)(x-2) = 1 - \frac{1}{2} x - x^2 + \frac{1}{2} x^3. \]
- What does \[ p(x) = T^{-1} \left[\!\begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \!\right] \] mean? This means that \[ \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] = \left[\!\begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array} \!\right]. \] What we learned about polynomials makes it "easy" to find $p(x)$ such that \[ p(-1) = 0, \quad p(0) = 0, \quad p(2) = 0. \] A possible $p(x)$ is \[ p(x) = (x+1)x(x-2). \] However, for this $p(x)$ we have $p(1) = -2.$ Since we need $p(1) = 1,$ the needed $p(x)$ is \[ p(x) = - \frac{1}{2} (x+1)x(x-2) = 0\cdot 1 + x + \frac{1}{2} x^2 - \frac{1}{2} x^3. \]
- What does \[ p(x) = T^{-1} \left[\!\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \!\right] \] mean? This means that \[ \left[\! \begin{array}{c} p(-1) \\ p(0) \\ p(1) \\ p(2) \end{array} \!\right] = \left[\!\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \!\right]. \] What we learned about polynomials makes it "easy" to find $p(x)$ such that \[ p(-1) = 0, \quad p(0) = 0, \quad p(1) = 0. \] A possible $p(x)$ is \[ p(x) = (x+1)x(x-1). \] However, for this $p(x)$ we have $p(2) = 6.$ Since we need $p(2) = 1$, the needed $p(x)$ is \[ p(x) = \frac{1}{6} (x+1)x(x-1) = 0\cdot 1 - \frac{1}{6} x + 0 \cdot x^2 + \frac{1}{6} x^3. \]
- The last four items give us $M^{-1}$ as follows \[ M^{-1} = \left[\!\begin{array}{cccc} 0 & 1 & 0 & 0 \\ -\frac{1}{3} & -\frac{1}{2} & 1 & -\frac{1}{6} \\ \frac{1}{2} & -1 & \frac{1}{2} & 0 \\ -\frac{1}{6} & \frac{1}{2} & -\frac{1}{2} & \frac{1}{6} \end{array}\!\right] \]
- It remains to verify: \[ M \, M^{-1} = \left[\!\begin{array}{cccc} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \end{array} \!\right] \left[\!\begin{array}{cccc} 0 & 1 & 0 & 0 \\ -\frac{1}{3} & -\frac{1}{2} & 1 & -\frac{1}{6} \\ \frac{1}{2} & -1 & \frac{1}{2} & 0 \\ -\frac{1}{6} & \frac{1}{2} & -\frac{1}{2} & \frac{1}{6} \end{array}\!\right] = \left[\!\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \!\right] \] It is really amazing that a calculation with polynomials gave us the inverse of $M.$
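The construction in the items above can be automated. The following minimal sketch in Python with exact fractions builds $M$ from the evaluation points, builds each column of $M^{-1}$ from the polynomial that equals $1$ at one point and $0$ at the other three, and checks that the product is the identity. The names `nodes`, `basis_coeffs` and `matmul` are my own choices for illustration.

```python
from fractions import Fraction as F

nodes = [-1, 0, 1, 2]
n = len(nodes)

# Row i of M evaluates a_0 + a_1 x + a_2 x^2 + a_3 x^3 at x = nodes[i].
M = [[x ** j for j in range(n)] for x in nodes]

def basis_coeffs(j):
    """Coefficients (constant term first) of the cubic that is 1 at nodes[j]
    and 0 at the other nodes."""
    coeffs = [F(1)]
    value_at_node = F(1)
    for k, r in enumerate(nodes):
        if k == j:
            continue
        # Multiply the current polynomial by (x - r).
        new = [F(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] -= r * c
            new[i + 1] += c
        coeffs = new
        value_at_node *= nodes[j] - r
    # Rescale so that the value at nodes[j] is 1.
    return [c / value_at_node for c in coeffs]

# Column j of M^{-1} is the coordinate vector of T^{-1} e_{j+1}.
columns = [basis_coeffs(j) for j in range(n)]
M_inv = [[columns[j][i] for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
print(matmul(M, M_inv) == identity)
for row in M_inv:
    print([str(x) for x in row])
```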
Today we also talked about complex numbers, see Appendix B Complex Numbers in the textbook. I wrote my own introduction to complex numbers.
Whenever you calculate with complex numbers the goal is to express a complex number in the form $a + i b$ where $a$ and $b$ are real numbers. The deepest result in this spirit is Euler's identity: \[ e^{i t} = (\cos t) + i (\sin t) \qquad \text{for all} \qquad t \in \mathbb{R}. \] In my introduction to complex numbers I present an explanation for Euler's identity using the differentiation rules.

Monday, January 23, 2023

Today we started Section 5.4. Suggested problems are: 1, 3 - 13, 17, 19 - 23, 27, 28.
The content of Section 5.4 can be used to provide an alternative way to obtain the standard matrix of a reflection.
- In the picture below we show the reflection across the green line. The unit vector along the green line and a vector orthogonal to it are colored teal: \[ \color{teal}{\mathbf{u}_1} = \color{teal}{\left[\! \begin{array}{c} \cos \theta \\ \sin \theta \end{array} \!\right]}, \ \color{teal}{\mathbf{u}_2} = \color{teal}{\left[\! \begin{array}{r} - \sin \theta \\ \cos \theta \end{array} \!\right]}. \]
- In the picture below we denote the reflection of a vector $\color{purple}{\mathbf{v}}$ by $T\color{purple}{\mathbf{v}}$.
  An illustration of a Reflection across the green line
- Let \[ \color{teal}{\mathcal B} = \bigl\{ \color{teal}{\mathbf{u}_1}, \color{teal}{\mathbf{u}_2} \bigr\} \quad \text{and} \quad \mathcal E = \bigl\{ \mathbf{e}_1 , \mathbf{e}_2 \bigr\}. \] As we learned in Chapter 4, the change-of-coordinates matrices are \[ \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} = \left[\! \begin{array}{rr} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{array} \!\right] \quad \text{and} \quad \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} = \left[\! \begin{array}{rr} \cos \theta & \sin \theta \\ - \sin \theta & \cos \theta \end{array} \!\right]. \]
- It is clear that the matrix of the reflection $T$ relative to the teal basis $\color{teal}{\mathcal B}$ is \[ \bigl[ T \bigr]_{\color{teal}{\mathcal B}} = \left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right]. \] This means that if we have the coordinates of a vector $\color{purple}{\mathbf{v}}$ relative to the teal basis $\color{teal}{\mathcal B}$ then we can easily calculate the coordinates of its reflection $T\color{purple}{\mathbf{v}}$ relative to the teal basis $\color{teal}{\mathcal B}:$ \[ \bigl[ T \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}} = \left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right] \bigl[ \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}}. \]
- Now recall the power of the change of coordinates matrix: \[ \bigl[ \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \bigl[ \color{purple}{\mathbf{v}} \bigr]_{\mathcal E} = \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \color{purple}{\mathbf{v}} \] and that \[ T \color{purple}{\mathbf{v}} = \bigl[ T \color{purple}{\mathbf{v}} \bigr]_{\mathcal E} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \bigl[ T \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}} \]
- Now put together the preceding three displayed relations: \[ T \color{purple}{\mathbf{v}} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P} \bigl[ T \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}\left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right] \bigl[ \color{purple}{\mathbf{v}} \bigr]_{\color{teal}{\mathcal B}} = \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}\left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right] \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} \color{purple}{\mathbf{v}}. \] Thus, the matrix of the reflection is \begin{align*} \underset{\mathcal{E}\leftarrow\mathcal{B}}{P}\left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right] \underset{\mathcal{B}\leftarrow\mathcal{E}}{P} &= \left[\! \begin{array}{rr} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{array} \!\right] \left[\! \begin{array}{rr} 1 & 0 \\ 0 & -1 \end{array} \!\right] \left[\! \begin{array}{rr} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{array} \!\right] \\ & = \left[\! \begin{array}{rr} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{array} \!\right] \left[\! \begin{array}{rr} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{array} \!\right] \\ & = \left[\! \begin{array}{cc} (\cos \theta)^2 - (\sin\theta)^2 & 2 (\sin \theta)(\cos\theta) \\ 2 (\sin \theta)(\cos\theta) & (\sin\theta)^2 - (\cos \theta)^2 \end{array} \!\right] \\ & = \color{blue}{\left[\! \begin{array}{rr} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{array} \!\right]} \end{align*}
- Thus, the standard matrix of the reflection across the green line which makes the angle $\theta$ with the positive $x$-axis is \[ \color{blue}{\left[\! \begin{array}{rr} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{array} \!\right]}. \]
- As an exercise consider a similar setting in $\mathbb{R}^3.$ Consider the plane in $\mathbb{R}^3$ given as \[ \operatorname{Span} \left\{ \left[\! \begin{array}{r} 1 \\ - 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{r} 0 \\ 1 \\ -1 \end{array} \!\right] \right\}. \] Your task is to find the matrix of the reflection in $\mathbb R^3$ across this plane. To help you out I will point out that the vector \[ \left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \!\right] \] is orthogonal to the given plane. Thus, the reflection across the given plane, call it $R,$ leaves the vectors $\left[\! \begin{array}{r} 1 \\ - 1 \\ 0 \end{array} \!\right]$ and $ \left[\! \begin{array}{r} 0 \\ 1 \\ -1 \end{array} \!\right]$ unchanged and reflects the vector $\left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \!\right]$ to its opposite vector $-\left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \!\right].$ That is, \[ R \left[\! \begin{array}{r} 1 \\ - 1 \\ 0 \end{array} \!\right] = \left[\! \begin{array}{r} 1 \\ - 1 \\ 0 \end{array} \!\right], \quad R\left[\! \begin{array}{r} 0 \\ 1 \\ -1 \end{array} \!\right] = \left[\! \begin{array}{r} 0 \\ 1 \\ -1 \end{array} \!\right], \quad R\left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \!\right] = -\left[\! \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \!\right]. \]
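The planar computation above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `matmul`, `reflection_via_change_of_basis` and `reflection_closed_form`, as well as the test angle, are illustrative choices. It multiplies the two change-of-coordinates matrices around the diagonal matrix $[T]_{\mathcal B}$ and compares the product with the matrix built from $\cos(2\theta)$ and $\sin(2\theta)$.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def reflection_via_change_of_basis(theta):
    """Return P_{E<-B} [T]_B P_{B<-E} for the line making angle theta with the x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    P_EB = [[c, -s], [s, c]]   # columns are u_1 and u_2
    T_B = [[1, 0], [0, -1]]    # keep u_1, flip u_2
    P_BE = [[c, s], [-s, c]]   # inverse (here also the transpose) of P_EB
    return matmul(matmul(P_EB, T_B), P_BE)

def reflection_closed_form(theta):
    """The matrix with entries cos(2 theta) and sin(2 theta)."""
    return [[math.cos(2 * theta), math.sin(2 * theta)],
            [math.sin(2 * theta), -math.cos(2 * theta)]]

theta = 0.7  # an arbitrary test angle (hypothetical value)
lhs = reflection_via_change_of_basis(theta)
rhs = reflection_closed_form(theta)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)  # expected to be at the level of rounding error
```

The same pattern (change to a convenient basis, apply a diagonal matrix, change back) is what the exercise in $\mathbb{R}^3$ asks for; there the change-of-coordinates matrix is not orthogonal, so its inverse has to be computed rather than obtained by transposing.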
Here I will present some interesting linear transformations of the vector space $\mathbb{P}_4$ of polynomials of degree $\leq 4$. In these examples I always calculate the matrix of a linear transformation with respect to the basis of this space which consists of monomials: \[ \mathcal{M} =\bigl\{1, t, t^2, t^3,t^4 \bigr\}. \] Notice that this basis has 5 elements, so the space $\mathbb{P}_4$ is a five-dimensional vector space. Sometimes it is convenient to introduce notation for monomials: We set \[ \phi_0(t) = 1, \quad \phi_1(t) = t, \quad \phi_2(t) = t^2, \quad \phi_3(t) = t^3, \quad \phi_4(t) = t^4. \]
- Let $D: \mathbb P_4 \to \mathbb P_4$ be the linear transformation of taking the derivative with respect to $t$. That is, for every $p(t) \in \mathbb P_4$ we set $(Dp)(t) = p'(t).$ Then the matrix representation of $D$ relative to $\mathcal M$ is the following $5\!\times\!5$ matrix \[ \left[\! \begin{array}{ccccc} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 \\ \end{array}\!\right]. \]
- Let $R: \mathbb P_4 \to \mathbb P_4$ be the linear transformation defined for every $p(t) \in \mathbb P_4$ by \[ (Rp)(t) = t^4 p(1/t). \] Then the matrix representation of $R$ relative to $\mathcal M$ is the following $5\!\times\!5$ matrix \[ Z_{5} = \left[\! \begin{array}{ccccc} 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \end{array}\!\right]. \]
- Let $T: \mathbb P_4 \to \mathbb P_4$ be the linear transformation defined for every $p(t) \in \mathbb P_4$ by \[ (Tp)(t) = (1-t^2)p''(t) - tp'(t). \] Then the matrix representation of $T$ relative to $\mathcal M$ is the following $5\!\times\!5$ matrix \[ \left[ \begin{array}{rrrrr} 0 & 0 & 2 & 0 & 0 \\ 0 & -1 & 0 & 6 & 0 \\ 0 & 0 & -4 & 0 & 12 \\ 0 & 0 & 0 & -9 & 0 \\ 0 & 0 & 0 & 0 & -16 \end{array} \right]. \]
  
  The linear transformation introduced here is related to the Chebyshev differential equation and Chebyshev polynomials of the first kind, see my web-page about this topic using linear algebra.
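The three matrices above can be reproduced mechanically: apply the transformation to each monomial and record the coefficients of the result as a column. Below is a minimal Python sketch under the assumption that a polynomial in $\mathbb{P}_4$ is stored as a list of five coefficients, with entry `k` the coefficient of $t^k$. All function names are illustrative and not from the notes.

```python
def derivative(p):
    """Coefficients of p'(t); p[k] is the coefficient of t^k."""
    return [k * p[k] for k in range(1, len(p))]

def multiply(p, q):
    """Coefficients of the product p(t) q(t)."""
    out = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pad(p, n=5):
    """Pad a coefficient list with zeros to length n (degree must stay <= 4)."""
    p = list(p) + [0] * (n - len(p))
    assert all(c == 0 for c in p[n:]), "polynomial left P_4"
    return p[:n]

def D(p):
    return pad(derivative(p))

def R(p):
    # t^4 p(1/t) reverses the order of the five coefficients
    return list(reversed(pad(p)))

def T(p):
    d1 = derivative(p)
    d2 = derivative(d1)
    first = pad(multiply([1, 0, -1], d2))   # (1 - t^2) p''(t)
    second = pad(multiply([0, 1], d1))      # t p'(t)
    return [a - b for a, b in zip(first, second)]

def matrix_of(L):
    """Matrix of L relative to the monomial basis: column k holds L(t^k)."""
    columns = [L([1 if i == k else 0 for i in range(5)]) for k in range(5)]
    return [list(row) for row in zip(*columns)]

for name, L in (("D", D), ("R", R), ("T", T)):
    print(name)
    for row in matrix_of(L):
        print(row)
```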

Friday, January 20, 2023

Today we did a remarkable application of eigenvalues and eigenvectors to a famous sequence of positive integers: Fibonacci numbers.
Recall that $\mathbb{N}$ denotes the set of positive integers. The Fibonacci numbers are the elements of the sequence \[ f_0, f_1, f_2, \ldots, f_n, \ldots \] recursively defined by \[ f_0 = 0, \quad f_1 = 1, \quad \text{and} \quad f_{n+1} = f_n + f_{n-1} \quad \text{for} \quad n \in \mathbb{N}. \] Since we are given $f_0 = 0$ and $f_1 = 1,$ using the recursion $f_{n+1} = f_n + f_{n-1}$ with $n=1$ we get $f_2 = 1+0 = 1.$ Using the recursion $f_{n+1} = f_n + f_{n-1}$ repeatedly with $n=2$, then $n=3$, and so on, we get \[ f_0 = 0, \ f_1 = 1, \ f_2 = 1, \ f_3 = 2, \ f_4 = 3, \ f_5 = 5, \ f_6 = 8, \ f_7 = 13, \ f_8 = 21, \ f_9 = 34, \ f_{10} = 55, \ f_{11} = 89, \ \ldots \ \] It is clear that with enough patience we can calculate $f_{100}$ by calculating all Fibonacci numbers preceding it. Computers are really good at carrying out recursively defined operations. I wrote a webpage on how to do this in Mathematica. If you want to try using Mathematica on WWU computers please let me know. I can help you with that. By the way, \[ f_{100} = 354,224,848,179,261,915,075. \]
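For readers without access to Mathematica, the recursion can be sketched in a few lines of Python, whose integers have arbitrary precision. This is an illustrative sketch only; the function name `fibonacci` is a choice made here.

```python
def fibonacci(n):
    """Return f_n using f_0 = 0, f_1 = 1, f_{k+1} = f_k + f_{k-1}."""
    a, b = 0, 1              # the pair (f_0, f_1)
    for _ in range(n):
        a, b = b, a + b      # advance (f_k, f_{k+1}) to (f_{k+1}, f_{k+2})
    return a

print([fibonacci(k) for k in range(12)])
print(fibonacci(100))
```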
Since in Mathematics we like to be able to approach mathematical concepts in diverse ways, we are interested in finding a formula for the $n$-th Fibonacci number without calculating all the preceding Fibonacci numbers; a formula that will use only $n$ and algebraic operations. Amazingly, Linear Algebra offers a way to do that. The next items will illustrate how that comes about.
The first step is to write the recursion \[ f_{n+1} = f_n + f_{n-1} \] using a matrix: \[ \left[\!\begin{array}{l} f_{n} \\ f_{n+1} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{n-1} \\ f_{n} \end{array}\!\right]. \] Thus, we can obtain the Fibonacci sequence by repeated application of the matrix $\displaystyle\left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]$ as follows \begin{align*} \left[\!\begin{array}{l} f_{1} \\ f_{2} \end{array}\!\right] & = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_0 \\ f_{1} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ \left[\!\begin{array}{l} f_{2} \\ f_{3} \end{array}\!\right] & = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{1} \\ f_{2} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^2 \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ \left[\!\begin{array}{l} f_{3} \\ f_{4} \end{array}\!\right] &= \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{2} \\ f_{3} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]\left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^2 \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^3 \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ \left[\!\begin{array}{l} f_{4} \\ f_{5} \end{array}\!\right] &= \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{3} \\ f_{4} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^3 \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^4 \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ & \ \vdots \\ \left[\!\begin{array}{l} f_{n} \\ f_{n+1} \end{array}\!\right] &= \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{n-1} \\ f_{n} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^{n-1} \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ \left[\!\begin{array}{l} f_{n+1} \\ f_{n+2} \end{array}\!\right] &= \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{l} f_{n} \\ f_{n+1} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^{n} \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^{n+1} \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \end{align*}
In the preceding item we saw that \[ \left[\!\begin{array}{l} f_{n} \\ f_{n+1} \end{array}\!\right] = \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right]. \] Therefore \[ f_n = \bigl[ 1 \ \ 0 \bigr] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right]. \] We could stop here and pronounce that this is a sufficiently good formula for $f_n$ which uses only $n$ and matrix algebra. However, we want a formula for $f_n$ which uses only algebra with specific numbers; without matrices. To obtain such a formula we will calculate the eigenvalues and eigenvectors of the matrix $\displaystyle\left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]$.
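The matrix formula for $f_n$ is already useful computationally, since a matrix power can be computed by repeated squaring with far fewer multiplications than $n$. The following Python sketch illustrates this; the names `mat_mul`, `mat_pow` and `fib_matrix` are hypothetical helpers introduced here, and repeated squaring is a swapped-in technique not discussed in the notes.

```python
def mat_mul(A, B):
    """Product of two 2x2 integer matrices given as lists of rows."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]

def mat_pow(M, n):
    """M^n for a nonnegative integer n, by repeated squaring."""
    result = [[1, 0], [0, 1]]
    while n > 0:
        if n % 2 == 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        n //= 2
    return result

def fib_matrix(n):
    """f_n = [1 0] M^n [0 1]^T with M = [[0, 1], [1, 1]]."""
    Mn = mat_pow([[0, 1], [1, 1]], n)
    # M^n [0 1]^T is the second column of M^n; [1 0] selects its top entry
    return Mn[0][1]

print([fib_matrix(k) for k in range(12)])
print(fib_matrix(100))
```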
First we calculate the characteristic polynomial of $\displaystyle\left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]$: \[ \det\left( \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right] - \left[\!\begin{array}{cc} \lambda & 0 \\ 0 & \lambda \end{array}\!\right] \right) = \left|\!\begin{array}{cc} -\lambda & 1 \\ 1 & 1-\lambda \end{array}\!\right| = -\lambda(1-\lambda) - 1 = \lambda^2 - \lambda - 1. \] The roots of the characteristic polynomial are \[ \lambda_1 = \frac{1+\sqrt{5}}{2} = \varphi \quad \text{and} \quad \lambda_2 = \frac{1-\sqrt{5}}{2} = \psi. \] The number $\varphi$ is the famous Golden ratio. The Greek letter $\varphi$ is the standard notation for the Golden ratio. We introduce $\psi$ since we will use it several times below.
An eigenvector of $\displaystyle\left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]$ corresponding to $\varphi$ is $\displaystyle\left[\!\begin{array}{l} 1 \\ \varphi \end{array}\!\right]$ and an eigenvector corresponding to $\psi$ is $\displaystyle\left[\!\begin{array}{l} 1 \\ \psi \end{array}\!\right]$. Please verify that \[ \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]\left[\!\begin{array}{l} 1 \\ \varphi \end{array}\!\right] = \varphi \left[\!\begin{array}{l} 1 \\ \varphi \end{array}\!\right] \quad \text{and} \quad \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]\left[\!\begin{array}{l} 1 \\ \psi \end{array}\!\right] = \psi \left[\!\begin{array}{l} 1 \\ \psi \end{array}\!\right]. \] To verify the preceding vector equalities you will use that $\varphi$ and $\psi$ are the roots of the characteristic polynomial, that is \[ \varphi^2 - \varphi - 1 = 0 \quad \text{and} \quad \psi^2 - \psi - 1 = 0. \] One of the important properties of eigenvectors is that it is easy to calculate the action of the powers of the matrix on eigenvectors: \[ \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] = \varphi^n \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] \quad \text{and} \quad \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right] = \psi^n \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right]. \]
To improve the formula \[ f_n = \bigl[ 1 \ \ 0 \bigr] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{c} 0 \\ 1 \end{array}\!\right] \] we will represent the vector $\displaystyle\left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right]$ as a linear combination of the eigenvectors: \[ \left[\!\begin{array}{c} 0 \\ 1 \end{array}\!\right] = x_1 \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] + x_2 \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right]. \] We do not need to do row reduction to solve this system. Since $x_1+x_2 = 0$ we deduce that $x_2 = -x_1$. Then we have \[ x_1 (\varphi - \psi) = 1. \] Since $\varphi - \psi = \sqrt{5}$ we have \[ \left[\!\begin{array}{c} 0 \\ 1 \end{array}\!\right] = \frac{1}{\sqrt{5}} \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] - \frac{1}{\sqrt{5}} \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right]. \] Therefore, \begin{align*} f_n & = \bigl[ 1 \ \ 0 \bigr] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{l} 0 \\ 1 \end{array}\!\right] \\ & = \bigl[ 1 \ \ 0 \bigr] \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left( \frac{1}{\sqrt{5}} \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] - \frac{1}{\sqrt{5}} \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right] \right) \\ & = \bigl[ 1 \ \ 0 \bigr] \left( \frac{1}{\sqrt{5}} \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{c} 1 \\ \varphi \end{array}\!\right] - \frac{1}{\sqrt{5}} \left[\!\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\!\right]^n \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right] \right) \\ & = \bigl[ 1 \ \ 0 \bigr] \left( \frac{1}{\sqrt{5}} \varphi^n \left[\!\begin{array}{l} 1 \\ \varphi \end{array}\!\right] - \frac{1}{\sqrt{5}} \psi^n \left[\!\begin{array}{c} 1 \\ \psi \end{array}\!\right] \right) \\ & = \frac{1}{\sqrt{5}} \bigl[ 1 \ \ 0 \bigr] \left( \left[\!\begin{array}{c} \varphi^n \\ \varphi^{n+1} 
\end{array}\!\right] - \left[\!\begin{array}{c} \psi^{n} \\ \psi^{n+1} \end{array}\!\right] \right) \\ & = \frac{1}{\sqrt{5}} \bigl[ 1 \ \ 0 \bigr] \left[\!\begin{array}{c} \varphi^n - \psi^{n} \\ \varphi^{n+1} -\psi^{n+1} \end{array}\!\right]\\ & = \frac{1}{\sqrt{5}} \bigl( \varphi^n - \psi^{n} \bigr). \end{align*} Thus, finally and amazingly, \[ f_n = \frac{1}{\sqrt{5}} \bigl( \varphi^n - \psi^{n} \bigr) = \frac{1}{\sqrt{5}} \left( \frac{(1+\sqrt{5})^n}{2^n} - \frac{(1-\sqrt{5})^n}{2^n} \right) = \frac{ (1+\sqrt{5})^n - (1-\sqrt{5})^n }{2^n \sqrt{5}} \] for all nonnegative integers $n$. A formula like this in which $f_n$ is given in terms of $n$ and standard functions is called a closed form expression. A lot of effort in mathematics has been put in finding closed form expressions for different mathematical objects.
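The closed form can be tested without any floating point arithmetic. The sketch below is an illustration with hypothetical function names. It stores a number $a + b\sqrt{5}$ as the integer pair `(a, b)`. If $(1+\sqrt5)^n = a_n + b_n\sqrt5$, then $(1-\sqrt5)^n = a_n - b_n\sqrt5$, so the numerator of the closed form equals $2 b_n \sqrt{5}$ and $f_n = 2b_n/2^n$.

```python
def power_one_plus_sqrt5(n):
    """Return integers (a, b) with (1 + sqrt 5)^n = a + b * sqrt 5."""
    a, b = 1, 0
    for _ in range(n):
        # (a + b sqrt5)(1 + sqrt5) = (a + 5b) + (a + b) sqrt5
        a, b = a + 5 * b, a + b
    return a, b

def fib_closed_form(n):
    """Evaluate ((1+sqrt5)^n - (1-sqrt5)^n) / (2^n sqrt5) exactly."""
    _, b = power_one_plus_sqrt5(n)
    numerator = 2 * b               # the numerator is 2 b sqrt5; sqrt5 cancels
    assert numerator % 2**n == 0    # the quotient is an integer
    return numerator // 2**n

def fib_recursive(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(all(fib_closed_form(n) == fib_recursive(n) for n in range(200)))
```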
This and the following item are related to the Fibonacci numbers but not to Linear Algebra. I post them since this is interesting mathematics that you can understand and might be interested in. Without knowing what we derived above, if somebody showed you the expression \[ \frac{ (1+\sqrt{5})^n - (1-\sqrt{5})^n }{2^n \sqrt{5}} \] with $n$ a positive integer, how would you know that this is an integer? For small $n$ we could calculate, and we would get the Fibonacci numbers, but proving that it is always the case is an interesting exercise.
One can use the Binomial theorem to expand the numerator \[ (1+\sqrt5)^n-(1-\sqrt5)^n=\sum_{k=0}^{n} \binom{n}{k}(\sqrt{5})^k-\sum_{k=0}^{n} \binom{n}{k}(-1)^k (\sqrt{5})^k. \] In the above sums we can write the even and odd terms separately (with the convention that $\binom{n}{k} = 0$ whenever $k > n$) and get \[ \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j}(\sqrt{5})^{2j} + \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1}(\sqrt{5})^{2j+1} - \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j}(\sqrt{5})^{2j} + \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1}(\sqrt{5})^{2j+1}. \] Hence \[ (1+\sqrt5)^n-(1-\sqrt5)^n=2 \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1}(\sqrt{5})^{2j+1} = 2 \sqrt{5} \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1} 5^{j}. \] Dividing by $2^n\sqrt{5}$ we get \[ \frac{ (1+\sqrt{5})^n - (1-\sqrt{5})^n }{2^n \sqrt{5}} = \frac{2\sqrt{5}}{2^n \sqrt{5}}\sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1} 5^{j} = \frac{1}{2^{n-1}} \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1} 5^{j}. \] This gives us another interesting formula for the Fibonacci numbers \[ f_n = \frac{1}{2^{n-1}} \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1} 5^{j} \quad \text{where} \quad n \in\mathbb{N}. \] But it would be interesting to prove that the number \[ \frac{1}{2^{n-1}} \sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n}{2j+1} 5^{j} \quad \text{where} \quad n \in\mathbb{N} \] is always an integer. I do not know how to do that directly, without all the work done in this post.
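The binomial formula is easy to test by computer for many values of $n$. The sketch below is illustrative only and the function names are choices made here. It relies on Python's `math.comb`, which returns $0$ when the lower index exceeds the upper one, matching the convention needed for the top term of the sum when $n$ is even.

```python
from math import comb

def fib_binomial(n):
    """f_n = 2^{-(n-1)} * sum_j C(n, 2j+1) 5^j for a positive integer n."""
    total = sum(comb(n, 2 * j + 1) * 5**j for j in range(n // 2 + 1))
    quotient, remainder = divmod(total, 2**(n - 1))
    assert remainder == 0   # the sum is divisible by 2^(n-1)
    return quotient

def fib_recursive(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib_binomial(n) for n in range(1, 12)])
print(all(fib_binomial(n) == fib_recursive(n) for n in range(1, 300)))
```

A computer check like this is evidence, not a proof; it only confirms the divisibility by $2^{n-1}$ for the values of $n$ that were tried.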

Thursday, January 19, 2023

Proofs are an important aspect of mathematics. Therefore, today, I presented a proof of the important Theorem 2 in Section 5.1. This theorem simply says: Eigenvectors corresponding to distinct eigenvalues are linearly independent. Since the proof of this theorem in the book is somewhat cryptic, I present a different proof of this theorem here.
On Tuesday, I presented how a diagonalization of a matrix $A$ can be used to define the matrix $e^{At}:$ \[ e^{At} = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} e^t & 0 & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{2t} \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] = \left[ \begin{array}{ccc} -e^t + 2 e^{2 t} & -e^t + e^{2 t} & e^t-e^{2 t} \\ -e^t + e^{2 t} & -e^t + 2 e^{2 t} & e^t-e^{2 t} \\ -3 e^t + 3 e^{2 t} & -3 e^t + 3 e^{2 t} & 3 e^t-2 e^{2 t} \end{array} \right]. \]
- We can use the matrix algebra to rewrite the last matrix as follows: \begin{align*} \left[ \begin{array}{ccc} -e^t + 2 e^{2 t} & -e^t + e^{2 t} & e^t-e^{2 t} \\ -e^t + e^{2 t} & -e^t + 2 e^{2 t} & e^t-e^{2 t} \\ -3 e^t + 3 e^{2 t} & -3 e^t + 3 e^{2 t} & 3 e^t-2 e^{2 t} \end{array} \right] & = \left[ \begin{array}{ccc} -e^t& -e^t & e^t \\ -e^t & -e^t & e^t \\ -3 e^t & -3 e^t& 3 e^t \end{array} \right] + \left[ \begin{array}{ccc} 2 e^{2 t} & e^{2 t} & -e^{2 t} \\ e^{2 t} &2 e^{2 t} & -e^{2 t} \\ 3 e^{2 t} &3 e^{2 t} &-2 e^{2 t} \end{array} \right] \\ & = \left[ \begin{array}{ccc} -1& -1 & 1 \\ -1 & -1 & 1 \\ -3 & -3 & 3 \end{array} \right] e^t + \left[ \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 &-2 \end{array} \right] e^{2 t}. \\ \end{align*}
- It might be a little mysterious where the matrices \[ \left[ \begin{array}{ccc} -1& -1 & 1 \\ -1 & -1 & 1 \\ -3 & -3 & 3 \end{array} \right], \qquad \left[ \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 &-2 \end{array} \right] \] come from. However, mysteries in math can be resolved. The resolution here is that these matrices come from the diagonalization in the following way: Instead of the diagonal matrix, we put the identity matrix in the diagonalization. Then we split the identity into as many pieces as we have distinct eigenvalues, and then we multiply through: \begin{align*} I_3 & = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left( \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \!\right] + \left[\! \begin{array}{rrr} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \!\right] \right) \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & \phantom{xxxxxxxxx} +\left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = \left[ \begin{array}{ccc} -1& -1 & 1 \\ -1 & -1 & 1 \\ -3 & -3 & 3 \end{array} \right] + \left[ \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 &-2 \end{array} \right]. \end{align*} Here the first product uses the first column of the left matrix and the first row of the right matrix, while the second product uses the remaining two columns and the remaining two rows. Thus \[ \left[ \begin{array}{ccc} -1& -1 & 1 \\ -1 & -1 & 1 \\ -3 & -3 & 3 \end{array} \right] = \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] , \quad \left[ \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 &-2 \end{array} \right] = \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] . \]
- Further \begin{align*} e^{At} & = \left[ \begin{array}{ccc} -1& -1 & 1 \\ -1 & -1 & 1 \\ -3 & -3 & 3 \end{array} \right] e^t + \left[ \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 &-2 \end{array} \right] e^{2 t} \\ & = e^t \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + e^{2 t} \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \end{align*} Hence, \[ e^{At} = e^t \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + e^{2 t} \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right]. \]
- Since the vectors \[ \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right], \quad \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ 0 \\ 1 \end{array} \!\right], \] are eigenvectors of $A$ corresponding to the eigenvalues $1,$ $2,$ $2,$ respectively, we can calculate \begin{align*} A e^{At} & = e^t A \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + e^{2 t} A \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = e^t \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + 2 e^{2 t} \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right]. \end{align*}
- Also, it is easy to calculate the derivative \begin{align*} \frac{d}{dt} e^{At} & = \frac{d}{dt} \bigl( e^t \bigr) \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + \frac{d}{dt} \bigl( e^{2 t} \bigr) \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \\ & = e^t \left[\! \begin{array}{r} 1 \\ 1 \\ 3 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \end{array} \!\right] + 2 e^{2 t} \left[\! \begin{array}{rr} -1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right]. \end{align*}
- Thus, in the last two items, we have demonstrated that \[ \frac{d}{dt} e^{At} = A e^{At}. \]
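The identities in this list can be confirmed numerically. The Python sketch below is an illustration with hypothetical helper names. It takes $A$ from the diagonalization found on January 13, takes the coefficient matrices of $e^t$ and $e^{2t}$ from the decomposition of $e^{At}$ above (called `E1` and `E2` here), checks the exact integer identities, and compares a central-difference approximation of the derivative with $A e^{At}$ at one sample value of $t.$

```python
import math

A = [[3, 1, -1], [1, 3, -1], [3, 3, -1]]
E1 = [[-1, -1, 1], [-1, -1, 1], [-3, -3, 3]]   # coefficient of e^t
E2 = [[2, 1, -1], [1, 2, -1], [3, 3, -2]]      # coefficient of e^{2t}

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def exp_At(t):
    """e^{At} = e^t E1 + e^{2t} E2."""
    return [[math.exp(t) * E1[i][j] + math.exp(2 * t) * E2[i][j]
             for j in range(3)] for i in range(3)]

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(3) for j in range(3))

# Exact identities: A E1 = E1, A E2 = 2 E2, E1 + E2 = I
print(matmul(A, E1) == E1)
print(matmul(A, E2) == [[2 * x for x in row] for row in E2])
print([[E1[i][j] + E2[i][j] for j in range(3)] for i in range(3)])

# Central difference approximation of d/dt e^{At} at a sample t
t, h = 0.3, 1e-5   # sample point and step size (hypothetical values)
plus, minus = exp_At(t + h), exp_At(t - h)
derivative = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)]
              for i in range(3)]
error = max_abs_diff(derivative, matmul(A, exp_At(t)))
print(error)   # small, limited by the finite-difference step
```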

Wednesday, January 18, 2023

To help you with Problem 1 on Assignment 1, below I review a specific example of a column space and how to explore it. For a geometric illustration of what is presented here see the post on January 10, 2022 on Math 204 website.
Consider the following matrix \[ A = \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right]. \] Recall that \[ \operatorname{Col}(A) = \operatorname{Span}\left\{ \left[\! \begin{array}{r} 1 \\ -3 \\ 1 \end{array} \!\right], \left[\! \begin{array}{r} -4 \\ 1 \\ 2 \end{array} \!\right] \right\}. \] Recall that \[ \left[\! \begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array} \!\right] \in \operatorname{Col}(A) \qquad \text{if and only if} \qquad \text{there exists} \ \ \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] \ \ \text{such that} \ \ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\! \begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array} \!\right]. \]
Given three "random" vectors \[ \left[\! \begin{array}{r} 2 \\ 1 \\ -4 \end{array} \!\right], \quad \left[\!\begin{array}{r} 6 \\ -7 \\ 0\end{array}\! \right], \quad \left[\!\begin{array}{r} -7 \\ -1 \\ 5\end{array} \!\right] \] we can ask the question: Which of these vectors are in $\operatorname{Col}(A)$?

To answer the question stated in the preceding item we have to solve the following three vector equations:

Equation 1	Equation 2	Equation 3
\[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\! \begin{array}{r} 2 \\ 1 \\ -4 \end{array} \!\right] \]	\[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} 6 \\ -7 \\ 0\end{array}\! \right] \]	\[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} -7 \\ -1 \\ 5\end{array} \!\right] \]

These three equations are solved by row reducing the augmented matrices. But, this is huge, we can do three row reductions in one go: \begin{align*} \left[\!\begin{array}{rr|rrr} 1 & -4 & 2 & 6& -7 \\ -3 & 1 & 1 & -7& -1 \\ 1 & 2 & -4& 0& 5 \end{array}\! \right] & \sim \left[\!\begin{array}{rr|rrr} 1 & -4 & 2 & 6& -7 \\ 0 & -11 & 7 & 11 & -22 \\ 0 & 6 & -6& -6 & 12 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|rrr} 1 & -4 & 2 & 6& -7 \\ 0 & 1 & -\frac{7}{11} & -1 & 2 \\ 0 & 1 & -1& -1 & 2 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|rrr} 1 & -4 & 2 & 6& -7 \\ 0 & 1 & -\frac{7}{11} & -1 & 2 \\ 0 & 0 & -\frac{4}{11}& 0 & 0 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|rrr} 1 & -4 & 2 & 6& -7 \\ 0 & 1 & -\frac{7}{11} & -1 & 2 \\ 0 & 0 & 1 & 0 & 0 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|rrr} 1 & -4 & 0 & 6& -7 \\ 0 & 1 & 0 & -1 & 2 \\ 0 & 0 & 1 & 0 & 0 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|rrr} 1 & 0 & 0 & 2 & 1 \\ 0 & 1 & 0 & -1 & 2 \\ 0 & 0 & 1 & 0 & 0 \end{array}\! \right] \qquad \text{this matrix is in RREF} \end{align*}
Conclusions:
- Since the first augmented column of the RREF is a pivot column and this augmented column corresponds to Equation 1, we conclude that Equation 1 is inconsistent. Therefore, it is true that \[ \left[\begin{array}{r} 2 \\ 1 \\ -4 \end{array} \right] \not\in \operatorname{Col} \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] . \]
- Since the second augmented column of the RREF is not a pivot column and this augmented column corresponds to Equation 2, we conclude that Equation 2 is consistent. From the RREF we can read even more \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\begin{array}{r} 2 \\ -1 \end{array} \right] = \left[\begin{array}{r} 6 \\ -7 \\ 0 \end{array} \right]. \] Therefore, it is true that \[ \left[\begin{array}{r} 6 \\ -7 \\ 0 \end{array} \right] \in \operatorname{Col} \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right]. \]
- Since the third augmented column of the RREF is not a pivot column and this augmented column corresponds to Equation 3, we conclude that Equation 3 is consistent. From the RREF we can read even more \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\begin{array}{r} 1 \\ 2 \end{array} \right] = \left[\begin{array}{r} -7 \\ -1 \\ 5 \end{array} \right] . \] Therefore, it is true that \[ \left[\begin{array}{r} -7 \\ -1 \\ 5 \end{array} \right] \in \operatorname{Col} \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right]. \]
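The simultaneous row reduction above can be reproduced with exact rational arithmetic. The following Python sketch is an illustration under the assumption that Gauss-Jordan elimination with the first available nonzero pivot is used; the function name `rref` is a choice made here. It returns the reduced matrix together with the list of pivot columns, so consistency of each equation can be read off as in the conclusions.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        pv = M[r][c]
        M[r] = [x / pv for x in M[r]]            # scale the pivot to 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivot_cols

# Columns 0-1 hold A; columns 2, 3, 4 hold the three right-hand sides
augmented = [[1, -4, 2, 6, -7],
             [-3, 1, 1, -7, -1],
             [1, 2, -4, 0, 5]]
reduced, pivots = rref(augmented)
for row in reduced:
    print([str(x) for x in row])
for k, col in enumerate((2, 3, 4), start=1):
    status = "inconsistent" if col in pivots else "consistent"
    print("Equation", k, "is", status)
```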
The questions which are answered in the previous items can be answered by stating a universal problem:

Find a relationship between the coordinates $b_1,$ $b_2$ and $b_3$ such that the following relationship is satisfied: \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array} \!\right] \] for some $\displaystyle\left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right]\in\mathbb{R}^2.$

The answer to this question is very similar to what we did previously.
- Instead of specific coordinates in the augmented column we do the row reduction with $b_1,$ $b_2$ and $b_3:$ \begin{align*} \left[\!\begin{array}{rr|l} 1 & -4 & b_1 \\ -3 & 1 & b_2 \\ 1 & 2 & b_3 \end{array}\! \right] & \sim \left[\!\begin{array}{rr|l} 1 & -4 & b_1 \\ 0 & -11 & 3b_1 + b_2 \\ 0 & 6 &-b_1 + b_3 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|l} 1 & -4 & b_1 \\ 0 & 1 & -\frac{3}{11} b_1 -\frac{1}{11} b_2 \\ 0 & 6 &-b_1 + b_3 \end{array}\! \right] \\ & \sim \left[\!\begin{array}{rr|l} 1 & 0 & -\frac{1}{11} b_1 - \frac{4}{11} b_2 \\ 0 & 1 & -\frac{3}{11} b_1 - \frac{1}{11} b_2 \\ 0 & 0 & \frac{7}{11}b_1 + \frac{6}{11} b_2 + b_3 \end{array}\! \right] \\ \end{align*}
- The last matrix is in row echelon form. For the last matrix to be in reduced row echelon form we must have \[ \frac{7}{11}b_1 + \frac{6}{11} b_2 + b_3 = \frac{1}{11} \bigl(7 b_1 + 6 b_2 + 11 b_3\bigr) = 0. \] If \[ \frac{7}{11}b_1 + \frac{6}{11} b_2 + b_3 = \frac{1}{11} \bigl(7 b_1 + 6 b_2 + 11 b_3\bigr) \neq 0, \] then the augmented column of the last matrix is a pivot column and the corresponding system is inconsistent. Therefore we can make the following claim: \[ \left[\begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array} \right] \in \operatorname{Col} \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \quad \text{if and only if} \quad 7b_1 + 6 b_2 + 11 b_3 = 0. \]
- Assume that $7b_1 + 6 b_2 + 11 b_3 = 0$. Then the last matrix is in reduced row echelon form. From this matrix we can read the solution of the nonhomogeneous vector equation as follows: \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\!\begin{array}{c} \bigl(-\frac{1}{11} b_1 - \frac{4}{11} b_2\bigr) \\ \bigl(-\frac{3}{11} b_1 - \frac{1}{11} b_2\bigr) \end{array}\!\right] = \left[\!\begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array}\!\right]. \] For example, if $b_1 = -7,$ $b_2 = -1,$ $b_3 = 5,$ then $7(-7) + 6 (-1) + 11 \cdot 5 = 0$ and the solutions are $x_1 = -\frac{1}{11} (-7) - \frac{4}{11} (-1) = 1$ and $x_2 = -\frac{3}{11} (-7) - \frac{1}{11} (-1) = 2,$ as we have already calculated earlier.
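The criterion and the solution formulas derived above can be tried out numerically. What follows is only a minimal sketch in Python with exact fractions; the function names `in_column_space`, `solve` and `apply` are hypothetical helpers introduced for this illustration, not something used in class.

```python
from fractions import Fraction as F

A = [[1, -4], [-3, 1], [1, 2]]

def in_column_space(b):
    """Check the condition 7*b1 + 6*b2 + 11*b3 = 0 derived from the row reduction."""
    b1, b2, b3 = b
    return 7 * b1 + 6 * b2 + 11 * b3 == 0

def solve(b):
    """Return (x1, x2) read off from the RREF, or None when b is not in Col A."""
    if not in_column_space(b):
        return None
    b1, b2, _ = b
    x1 = F(-1, 11) * b1 - F(4, 11) * b2
    x2 = F(-3, 11) * b1 - F(1, 11) * b2
    return x1, x2

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

b = [-7, -1, 5]
x = solve(b)
print(x == (1, 2))          # the solution found earlier
print(apply(A, x) == b)     # A x reproduces b
print(solve([1, 0, 0]))     # None: this vector is not in Col A
```

The third call illustrates the inconsistent case: for $b_1 = 1,$ $b_2 = 0,$ $b_3 = 0$ the expression $7b_1 + 6b_2 + 11b_3$ equals $7 \neq 0.$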

Tuesday, January 17, 2023

Proofs are an important aspect of mathematics.
- An important theorem of Section 5.1 is Theorem 2, which simply says: Eigenvectors corresponding to distinct eigenvalues are linearly independent. Since the proof of this theorem in the book is somewhat cryptic, I present a different proof of this theorem here.
- An important theorem of Section 5.3 is Theorem 5, which simply says: An $n\times n$ matrix $A$ is diagonalizable if and only if there exists a basis for $\mathbb{R}^n$ which consists of eigenvectors of $A$.
For Section 5.2 do 1-8, 11, 12, 14, 15, (in all these problems you can find eigenvectors as well) 9, 13, 18, 19, 20, 21, 24, 25, 27.
For Section 5.3 do 2, 3, 5, 8, 9, 12, 13, 16, 18, 20, 23, 24.
On Friday, January 13 we deduced a diagonalization of the matrix $A = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right]$ as follows: \[ A = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \]
Based on the diagonalization from the preceding item we can calculate a matrix $X$ such that $X^2 = A.$ Such a matrix $X$ is called a square root of $A.$ One such $X$ is as follows \[ X = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2} \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] = \left[ \begin{array}{ccc} -1 + 2 \sqrt{2} & -1 + \sqrt{2} & 1-\sqrt{2} \\ -1 + \sqrt{2} & -1 + 2 \sqrt{2} & 1-\sqrt{2} \\ -3 + 3 \sqrt{2} & -3 + 3 \sqrt{2} & 3-2 \sqrt{2} \\ \end{array} \right]. \] Another such matrix is \[ X = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2} \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] = \left[ \begin{array}{ccc} 1+2 \sqrt{2} & 1+\sqrt{2} & -1-\sqrt{2} \\ 1+\sqrt{2} & 1+2 \sqrt{2} & -1-\sqrt{2} \\ 3+3 \sqrt{2} & 3+3 \sqrt{2} & -3-2 \sqrt{2} \\ \end{array} \right]. \]
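The claim $X^2 = A$ can be checked numerically. The following is a minimal floating-point sketch in Python; the helper names `matmul`, `root_from_signs` and `max_error` are hypothetical and chosen only for this illustration. It builds $X$ from the diagonalization with a chosen sign for each square root on the diagonal and reports how far $X^2$ is from $A.$

```python
import math

A = [[3, 1, -1], [1, 3, -1], [3, 3, -1]]
P = [[1, -1, 1], [1, 1, 0], [3, 0, 1]]
P_inv = [[-1, -1, 1], [1, 2, -1], [3, 3, -2]]

def matmul(X, Z):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Z[k][j] for k in range(len(Z)))
             for j in range(len(Z[0]))] for i in range(len(X))]

def root_from_signs(s1, s2, s3):
    """P * diag(s1*1, s2*sqrt(2), s3*sqrt(2)) * P^{-1}; each sign is +1 or -1."""
    r = math.sqrt(2)
    D_half = [[s1 * 1.0, 0, 0], [0, s2 * r, 0], [0, 0, s3 * r]]
    return matmul(matmul(P, D_half), P_inv)

def max_error(X):
    """Largest entrywise difference between X*X and A."""
    X2 = matmul(X, X)
    return max(abs(X2[i][j] - A[i][j]) for i in range(3) for j in range(3))

for signs in [(1, 1, 1), (-1, 1, 1)]:
    print(signs, max_error(root_from_signs(*signs)) < 1e-12)
```

Since squaring the diagonal factor removes the signs, every one of the eight sign choices produces a square root of $A$; the two matrices displayed above correspond to the sign choices $(1,1,1)$ and $(-1,1,1).$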
Based on the diagonalization from a preceding item we can calculate the matrix-valued function $Y(t)$ which has the properties \[ Y'(t) = A Y(t) \quad \text{and} \quad Y(0) = I_3. \] This matrix is denoted by $e^{At}.$ It is calculated as follows \[ e^{At} = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} e^t & 0 & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{2t} \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] = \left[ \begin{array}{ccc} -e^t + 2 e^{2 t} & -e^t + e^{2 t} & e^t-e^{2 t} \\ -e^t + e^{2 t} & -e^t + 2 e^{2 t} & e^t-e^{2 t} \\ -3 e^t + 3 e^{2 t} & -3 e^t + 3 e^{2 t} & 3 e^t-2 e^{2 t} \end{array} \right]. \] If you are patient, you can verify that \[ \frac{d}{dt}\left[ \begin{array}{ccc} -e^t + 2 e^{2 t} & -e^t + e^{2 t} & e^t-e^{2 t} \\ -e^t + e^{2 t} & -e^t + 2 e^{2 t} & e^t-e^{2 t} \\ -3 e^t + 3 e^{2 t} & -3 e^t + 3 e^{2 t} & 3 e^t-2 e^{2 t} \end{array} \right] = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] \left[ \begin{array}{ccc} -e^t + 2 e^{2 t} & -e^t + e^{2 t} & e^t-e^{2 t} \\ -e^t + e^{2 t} & -e^t + 2 e^{2 t} & e^t-e^{2 t} \\ -3 e^t + 3 e^{2 t} & -3 e^t + 3 e^{2 t} & 3 e^t-2 e^{2 t} \end{array} \right]. \]
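For those who are less patient, the two properties of $e^{At}$ can be tested numerically. The sketch below is only an illustration under stated assumptions: it uses Python floating point, it approximates the derivative by a central difference with a hypothetical step size `h`, and the function names are made up for this example. It is a sanity check, not a proof.

```python
import math

A = [[3, 1, -1], [1, 3, -1], [3, 3, -1]]

def Y(t):
    """The closed form of e^{At} displayed above."""
    a, b = math.exp(t), math.exp(2 * t)
    return [[-a + 2 * b, -a + b, a - b],
            [-a + b, -a + 2 * b, a - b],
            [-3 * a + 3 * b, -3 * a + 3 * b, 3 * a - 2 * b]]

def matmul(X, Z):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Z[k][j] for k in range(len(Z)))
             for j in range(len(Z[0]))] for i in range(len(X))]

def derivative(t, h=1e-6):
    """Central-difference approximation of Y'(t)."""
    Yp, Ym = Y(t + h), Y(t - h)
    return [[(Yp[i][j] - Ym[i][j]) / (2 * h) for j in range(3)] for i in range(3)]

def residual(t):
    """Largest entry of Y'(t) - A Y(t), with Y' approximated numerically."""
    dY, AY = derivative(t), matmul(A, Y(t))
    return max(abs(dY[i][j] - AY[i][j]) for i in range(3) for j in range(3))

print(Y(0.0) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # Y(0) = I_3
print(residual(0.5) < 1e-5)                          # Y' = A Y up to numerical error
```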

Friday, January 13, 2023

Read Section 5.1. Suggested problems for Section 5.1: 1, 3, 4, 5, 6, 8, 11, 15, 16, 17, 19, 20, 24-27, 29, 30, 31.
A related Wikipedia link: Eigenvalue, eigenvector and eigenspace.
Below are animations of different matrices in action. In each scene the navy blue vector is the image of the sea green vector under the multiplication by a matrix $A$. For easier visualization of the action the heads of vectors leave traces.
Just looking at the movies you can guess what the eigenvalues and eigenvectors of the featured matrix are. In particular it is easy to see whether an eigenvalue is positive, negative, zero, or complex, ...
Moreover, looking at the movies, you can also SEE what the matrix used in each movie is. This is done by using what I call "matrix surgery":
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
Each movie starts with the sea green vector at position $\displaystyle \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$ Since this vector is the first column of the $2\times2$ identity matrix, the corresponding navy blue vector is the first column of the matrix $A$ used in that movie. The featured still picture from each movie presents the sea green vector at position $\displaystyle \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$ Since this vector is the second column of the $2\times2$ identity matrix, the corresponding navy blue vector is the second column of the matrix $A$ used in that movie.
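The "matrix surgery" rule can be illustrated with a few lines of Python. This is only a sketch: the matrix `M` below is a hypothetical example, not one of the matrices used in the movies, and the function name `column_by_surgery` is made up for this illustration. Note that Python counts columns from 0, so `k = 0` corresponds to the first column.

```python
def column_by_surgery(M, k):
    """Multiply M by the k-th column of the identity matrix (k counted from 0)."""
    n = len(M)
    e_k = [1 if j == k else 0 for j in range(n)]
    return [sum(M[i][j] * e_k[j] for j in range(n)) for i in range(n)]

M = [[2, -1],
     [1, 3]]          # hypothetical matrix, chosen only for illustration

print(column_by_surgery(M, 0))   # [2, 1], the first column of M
print(column_by_surgery(M, 1))   # [-1, 3], the second column of M
```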

Place the cursor over the image to start the animation.

For Section 5.2 do 1-8, 11, 12, 14, 15, (in all these problems you can find eigenvectors as well) 9, 13, 18, 19, 20, 21, 24, 25, 27.
Two examples follow.
Example 1. In this item I will illustrate how to calculate eigenvalues and the corresponding eigenspaces of a specific $3\!\times\!3$ matrix. Consider the matrix \[ A = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] . \]
- First we find the characteristic polynomial of this matrix. The characteristic polynomial is the determinant of the following matrix: \[ A - \lambda I_3 = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] - \left[\! \begin{array}{rrr} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{array} \!\right] = \left[\! \begin{array}{ccc} 3-\lambda & 1 & -1 \\ 1 & 3-\lambda & -1 \\ 3 & 3 & -1-\lambda \end{array} \!\right] \] Next we calculate this determinant: \begin{align*} \left|\! \begin{array}{ccc} 3-\lambda & 1 & -1 \\ 1 & 3-\lambda & -1 \\ 3 & 3 & -1-\lambda \end{array} \!\right| &= \left|\! \begin{array}{ccc} 2-\lambda & -2+\lambda & 0 \\ 1 & 3-\lambda & -1 \\ 3 & 3 & -1-\lambda \end{array} \!\right| \\ &= \left|\! \begin{array}{ccc} 2-\lambda & 0 & 0 \\ 1 & 4-\lambda & -1 \\ 3 & 6 & -1-\lambda \end{array} \!\right| \\ &= (2-\lambda) \bigl( (4-\lambda)(-1-\lambda) + 6 \bigr) \\ & = (2-\lambda)\bigl(\lambda^2 - 3 \lambda + 2\bigr) \\ & = -(\lambda - 2)^2 ( \lambda - 1) \end{align*} (At the first equality sign, we subtracted the second row from the first. At the second equality sign, we added the first column to the second. These operations do not change the value of a determinant.)
- Thus the eigenvalues of the matrix $A$ are $1$ and $2.$
- Next we find the eigenspace corresponding to the eigenvalue $1.$ For that we need to find the nullspace of the matrix \[ A - 1 I_3 = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] - \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \!\right] = \left[\! \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right]. \] So, we row reduce the preceding matrix and find its nullspace: \[ \left[\! \begin{array}{ccc} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \sim \left[\! \begin{array}{ccc} 1 & 2 & -1 \\ 0 & 3 & -1 \\ 0 & 3 & -1 \end{array} \!\right] \sim \left[\! \begin{array}{ccc} 1 & 2 & -1 \\ 0 & 1 & -1/3 \\ 0 & 0 & 0 \end{array} \!\right] \sim \left[\! \begin{array}{ccc} 1 & 0 & -1/3 \\ 0 & 1 & -1/3 \\ 0 & 0 & 0 \end{array} \!\right]. \] Thus, the eigenspace corresponding to the eigenvalue $1$ is the subspace \[ \left\{ \left[\! \begin{array}{c} s/3 \\ s/3 \\ s \end{array} \!\right] \ : \ s \in \mathbb{R} \right\} = \operatorname{Span} \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 3 \end{array} \!\right] \right\}. \] Hence one eigenvector is $\left[\! \begin{array}{c} 1 \\ 1 \\ 3 \end{array} \!\right].$
- Next we find the eigenspace corresponding to the eigenvalue $2.$ For that we need to find the nullspace of the matrix \[ A - 2 I_3 = \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] - \left[\! \begin{array}{rrr} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{array} \!\right] = \left[\! \begin{array}{rrr} 1 & 1 & -1 \\ 1 & 1 & -1 \\ 3 & 3 & -3 \end{array} \!\right]. \] So, we row reduce the preceding matrix and find its nullspace: \[ \left[\! \begin{array}{rrr} 1 & 1 & -1 \\ 1 & 1 & -1 \\ 3 & 3 & -3 \end{array} \!\right] \sim \left[\! \begin{array}{rrr} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \!\right]. \] Thus, the eigenspace is the subspace \[ \left\{ \left[\! \begin{array}{c} -s + t \\ s \\ t \end{array} \!\right] \ : \ s, t \in \mathbb{R} \right\} = \operatorname{Span} \left\{ \left[\! \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \!\right] \right\}. \] Hence two linearly independent eigenvectors corresponding to the eigenvalue $2$ are $\left[\! \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \!\right]$ and $\left[\! \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \!\right].$
- The magic of what we found by now is that we found a basis of $\mathbb{R}^3$ which consists of eigenvectors of $A:$ \[ \left[\! \begin{array}{c} 1 \\ 1 \\ 3 \end{array} \!\right], \quad \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right], \quad \left[\! \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \!\right]. \]
- It is easy to verify whether these are really eigenvectors: \begin{align*} \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] \left[\! \begin{array}{c} 1 \\ 1 \\ 3 \end{array} \!\right] & = 1 \left[\! \begin{array}{c} 1 \\ 1 \\ 3 \end{array} \!\right], \\ \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right] & = 2 \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \!\right], \\ \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right]\left[\! \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \!\right] & = 2 \left[\! \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \!\right]. \end{align*} Yes, they are.
- I also made the claim that the three eigenvectors are linearly independent. Let us verify that as well. \[ \left[\! \begin{array}{crc|ccc} 1 & -1 & 1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0 & 1 & 0 \\ 3 & 0 & 1 & 0 & 0 & 1 \end{array} \!\right] \sim \cdots \sim \left[\! \begin{array}{ccc|rrc} 1 & 0 & 0 & -1 & -1 & 1\\ 0 & 1 & 0 & 1 & 2 & -1 \\ 0 & 0 & 1 & 3 & 3 & -2 \end{array} \!\right]. \] We know that the right-hand side matrix in the Reduced Row Echelon Form is the inverse of the matrix whose columns are the eigenvectors. To verify the row reduction above, we calculate: \[ \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] = \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \!\right]. \]
- Conclusion. Since the given matrix $A$ has three linearly independent eigenvectors it is diagonalizable. The following equality is called the diagonalization of $A.$ \[ \left[\! \begin{array}{rrr} 3 & 1 & -1 \\ 1 & 3 & -1 \\ 3 & 3 & -1 \end{array} \!\right] = \left[\! \begin{array}{rrr} 1 & -1 & 1 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{array} \!\right] \left[\! \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{array} \!\right] \left[\! \begin{array}{rrr} -1 & -1 & 1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{array} \!\right] \] Please verify the above equality.
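The hand calculations of Example 1 can be double-checked with exact integer arithmetic. The following Python sketch uses hypothetical helper names (`det3`, `char_poly_at`, `factored`, `apply`). It relies on the fact that two cubic polynomials which agree at more than three points are identical, so comparing $\det(A - \lambda I_3)$ with $-(\lambda-2)^2(\lambda-1)$ at seven integer values of $\lambda$ confirms the factorization.

```python
A = [[3, 1, -1], [1, 3, -1], [3, 3, -1]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_at(lam):
    """Value of det(A - lam*I) for a given number lam."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)

def factored(lam):
    """The factored form -(lam - 2)^2 (lam - 1) found by hand."""
    return -(lam - 2) ** 2 * (lam - 1)

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

print(all(char_poly_at(lam) == factored(lam) for lam in range(-3, 4)))

# Each pair is (eigenvalue, eigenvector) as found in the example.
pairs = [(1, [1, 1, 3]), (2, [-1, 1, 0]), (2, [1, 0, 1])]
print(all(apply(A, v) == [lam * x for x in v] for lam, v in pairs))
```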
Example 2. In this item I will illustrate how to calculate eigenvalues and the corresponding eigenspaces of a specific $4\!\times\!4$ matrix. The purpose is to demonstrate a matrix that is not diagonalizable. Consider the matrix \[ A = \left[\! \begin{array}{rrrr} 0 & 0 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 & 2 & 1 \\ -2 & -1 & -1 & 0 \end{array} \!\right] . \]
- First we find the characteristic polynomial of this matrix. The characteristic polynomial is the determinant of the following matrix: \[ A - \lambda I_4 = \left[\! \begin{array}{rrrr} 0 & 0 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 & 2 & 1 \\ -2 & -1 & -1 & 0 \end{array} \!\right] - \left[\! \begin{array}{rrrr} \lambda & 0 &0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \lambda \end{array} \!\right] = \left[\! \begin{array}{cccc} -\lambda & 0 & -1 & -1 \\ -1 & -\lambda & 0 & 0 \\ 2 & 1 & 2-\lambda & 1 \\ -2 & -1 & -1 & -\lambda \end{array} \!\right] \] Next we calculate the determinant of the preceding matrix: \begin{align*} \left|\! \begin{array}{cccc} -\lambda & 0 & -1 & -1 \\ -1 & -\lambda & 0 & 0 \\ 2 & 1 & 2-\lambda & 1 \\ -2 & -1 & -1 & -\lambda \end{array} \!\right| & = \left|\!\begin{array}{cccc} -\lambda & \lambda^2 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 - 2 \lambda & 2-\lambda & 1 \\ -2 & -1 + 2\lambda & -1 & -\lambda \end{array}\!\right| \\[6pt] & = \left|\!\begin{array}{ccc} \lambda^2 & -1 & -1 \\ 1 - 2 \lambda & 2-\lambda & 1 \\ -1 + 2\lambda & -1 & -\lambda \end{array}\!\right| \\[6pt] & = \left|\!\begin{array}{ccc} \lambda^2 & -1 & -1 \\ 1 -2 \lambda & 2-\lambda & 1 \\ 0 & 1-\lambda & 1 -\lambda \end{array}\!\right| \\[6pt] & = (1-\lambda) \left|\!\begin{array}{ccc} \lambda^2 & -1 & -1 \\ 1 -2 \lambda & 2-\lambda & 1 \\ 0 & 1 & 1 \end{array}\!\right| \\[6pt] & = (1-\lambda) \left|\!\begin{array}{ccc} \lambda^2 & 0 & 0 \\ 1 -2 \lambda & 2-\lambda & 1 \\ 0 & 1 & 1 \end{array}\!\right| \\[6pt] & = \lambda^2 (1-\lambda) \left|\!\begin{array}{cc} 2-\lambda & 1 \\ 1 & 1 \end{array}\!\right| \\[6pt] & = \lambda^2 (1-\lambda) (1-\lambda) \end{align*} (At the first equality sign, we add the first column multiplied by $-\lambda$ to the second column. At the second equality sign, we perform the cofactor expansion along the second row. At the third equality sign, we add the second row to the third. At the fourth equality sign, we factor out the common factor $(1-\lambda)$ from the third row. At the fifth equality sign, we add the third row to the first. At the sixth equality sign, we perform the cofactor expansion along the first row. At the last equality sign, we calculate the $2\!\times\!2$ determinant.)
- Thus the eigenvalues of the matrix $A$ are $0$ and $1.$ The algebraic multiplicity of each eigenvalue is $2.$ Next we calculate the geometric multiplicities of these eigenvalues.
- Next we find the eigenspace corresponding to the eigenvalue $0.$ For that we need to find the nullspace of the matrix \[ A - 0 I_4 = \left[\! \begin{array}{rrrr} 0 & 0 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 & 2 & 1 \\ -2 & -1 & -1 & 0 \end{array} \!\right] \]
- So, we row reduce the matrix $A:$ \[ \left[\! \begin{array}{rrrr} 0 & 0 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 & 2 & 1 \\ -2 & -1 & -1 & 0 \end{array} \!\right] \sim \cdots \sim \left[\! \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{array} \!\right] \] The nullspace of the preceding matrix is the eigenspace corresponding to the eigenvalue $0$; which we calculate to be \[ \operatorname{Nul}(A-0 I_4) = \left\{ \left[\! \begin{array}{c} 0 \\ s \\ -s \\ s \end{array} \!\right] \ : \ s \in \mathbb{R} \right\} = \operatorname{Span} \left\{ \left[\! \begin{array}{r} 0 \\ 1 \\ -1 \\ 1 \end{array} \!\right] \right\}. \]
- Thus, the eigenspace corresponding to the eigenvalue $0$ is one-dimensional.
- Next we find the eigenspace corresponding to the eigenvalue $1.$ For that we need to find the nullspace of the matrix \[ A - 1 I_4 = \left[\! \begin{array}{rrrr} 0 & 0 & -1 & -1 \\ -1 & 0 & 0 & 0 \\ 2 & 1 & 2 & 1 \\ -2 & -1 & -1 & 0 \end{array} \!\right] - \left[\! \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \!\right] = \left[\! \begin{array}{rrrr} -1 & 0 & -1 & -1 \\ -1 & -1 & 0 & 0 \\ 2 & 1 & 1 & 1 \\ -2 & -1 & -1 & -1 \end{array} \!\right] \]
- So, we row reduce the preceding matrix: \[ \left[\! \begin{array}{rrrr} -1 & 0 & -1 & -1 \\ -1 & -1 & 0 & 0 \\ 2 & 1 & 1 & 1 \\ -2 & -1 & -1 & -1 \end{array} \!\right] \sim \cdots \sim \left[\! \begin{array}{rrrr} 1 & 0 & 1 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \!\right] \] Thus, the eigenspace corresponding to the eigenvalue $1$ is the subspace \[ \operatorname{Nul}(A-1 I_4) = \left\{ \left[\! \begin{array}{c} -s-t \\ s+t \\ s \\ t \end{array} \!\right] \ : \ s, t \in \mathbb{R} \right\} = \operatorname{Span} \left\{ \left[\! \begin{array}{r} -1 \\ 1 \\ 1 \\ 0 \end{array} \!\right], \left[\! \begin{array}{r} -1 \\ 1 \\ 0 \\ 1 \end{array} \!\right] \right\}. \]
- Thus, the eigenspace corresponding to the eigenvalue $1$ is two-dimensional.
- Conclusion. Since we found all eigenspaces of the $4\!\times\!4$ matrix $A$ and these eigenspaces have dimensions $1$ and $2$, we conclude that we can have at most three linearly independent eigenvectors. Consequently, we cannot have a basis for $\mathbb R^4$ which consists of eigenvectors of $A.$ This shows that the matrix $A$ is not diagonalizable.
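The dimensions of the two eigenspaces in Example 2 can be confirmed by computing ranks, since $\dim \operatorname{Nul}(A - \lambda I_4) = 4 - \operatorname{rank}(A - \lambda I_4).$ The Python sketch below is a minimal illustration with exact fractions; `rank` and `geometric_multiplicity` are hypothetical helper names, and the elimination is a plain forward elimination rather than a full RREF.

```python
from fractions import Fraction

A = [[0, 0, -1, -1],
     [-1, 0, 0, 0],
     [2, 1, 2, 1],
     [-2, -1, -1, 0]]

def rank(M):
    """Number of pivots found by forward elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                   # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def geometric_multiplicity(lam):
    """Dimension of Nul(A - lam*I) computed as n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

print(geometric_multiplicity(0), geometric_multiplicity(1))   # 1 2
```

The sum of the two geometric multiplicities is $3 \lt 4,$ in agreement with the conclusion that $A$ is not diagonalizable.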

Thursday, January 12, 2023

The computer algebra system Mathematica will be very useful for the assignments in this class. You can start getting familiar with it. To get started with Mathematica see my Mathematica page. Please watch the videos that are on my Mathematica page. Watching the videos is very helpful to get started with Mathematica efficiently! Mathematica is available in the computer labs in BH 215 and BH 209.
During the class I created this small Mathematica notebook in which I did a few linear algebra calculations. You can download this notebook and start using Mathematica. Here is the pdf printout of the same notebook, so that you can view how Mathematica commands work without having access to Mathematica.
Today we proved that the functions in the following set are linearly independent: \[ \mathcal{B} = \bigl\{ 1, (\cos t)^2, (\cos t)^4, (\cos t)^6 \bigr\}. \] What does this mean? This means that the only linear combination of the functions in $\mathcal{B}$ which gives the zero function is the linear combination with zero coefficients. That is, we have to prove \[ \forall t \in\mathbb{R} \quad \alpha_0 + \alpha_1 (\cos t)^2 + \alpha_2 (\cos t)^4 + \alpha_3 (\cos t)^6 = 0 \quad \Rightarrow \quad \alpha_0 = 0, \ \alpha_1 = 0, \ \alpha_2 = 0, \ \alpha_3 = 0. \] To prove this implication we assume \[ \forall t \in\mathbb{R} \quad \alpha_0 + \alpha_1 (\cos t)^2 + \alpha_2 (\cos t)^4 + \alpha_3 (\cos t)^6 = 0. \] We will present two proofs. The first proof is suggested in the textbook. The second proof uses differentiation which you learned in Calculus.
- Proof 1. Since we assume that the last displayed identity holds for all $t\in\mathbb{R},$ substituting $t=\pi/2$ we obtain \[ \alpha_0 = 0. \] Therefore, the assumption becomes \[ \forall t \in\mathbb{R} \quad \alpha_1 (\cos t)^2 + \alpha_2 (\cos t)^4 + \alpha_3 (\cos t)^6 = 0. \] Substituting \[ t= 0, \quad t=\pi/3,\quad \text{and} \quad t=\pi/6, \] we obtain \begin{alignat*}{6} \alpha_1 &&+ && \alpha_2 &&+ && \alpha_3 && = 0 & \\ \tfrac{1}{4}\alpha_1&&+ && \tfrac{1}{16} \alpha_2 && + && \tfrac{1}{64} \alpha_3 && = 0 & \\ \tfrac{3}{4} \alpha_1 && + && \tfrac{9}{16} \alpha_2 && + && \tfrac{27}{64} \alpha_3 && = 0 & \\ \end{alignat*} Or, equivalently \begin{alignat*}{6} \alpha_1 &&+ && \alpha_2 &&+ && \alpha_3 && = 0 & \\ \alpha_1&&+ && \tfrac{1}{4} \alpha_2 && + && \tfrac{1}{16} \alpha_3 && = 0 & \\ \alpha_1 && + && \tfrac{3}{4} \alpha_2 && + && \tfrac{9}{16}\alpha_3 && = 0 & \\ \end{alignat*} To solve this system, we row reduce the matrix \[ \left[\!\begin{array}{rrr} 1 & 1 & 1 \\ 1 & \tfrac{1}{4} & \tfrac{1}{16} \\ 1 & \tfrac{3}{4} & \tfrac{9}{16} \end{array}\!\right] \sim \left[\!\begin{array}{rrr} 1 & 1 & 1 \\ 0 & - \tfrac{3}{4} & - \tfrac{15}{16} \\ 0 & - \tfrac{1}{4} & - \tfrac{7}{16} \end{array}\!\right] \sim \left[\!\begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & \tfrac{5}{4} \\ 0 & 0 & -\tfrac{1}{8} \end{array}\!\right] \sim \left[\!\begin{array}{rrr} 1 & 0 & -\tfrac{1}{4} \\ 0 & 1 & \tfrac{5}{4} \\ 0 & 0 & 1 \end{array}\!\right] \sim \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\!\right] \] Thus, the only solution of the system for $\alpha_1$, $\alpha_2$, $\alpha_3$ is $\alpha_1=0$, $\alpha_2=0$, $\alpha_3=0.$ This completes the proof.
- Proof 2. Since we assume that the last displayed identity holds for all $t\in\mathbb{R},$ substituting $t=\pi/2$ we obtain \[ \alpha_0 = 0. \] Therefore, the assumption becomes \[ \forall t \in\mathbb{R} \quad \alpha_1 (\cos t)^2 + \alpha_2 (\cos t)^4 + \alpha_3 (\cos t)^6 = 0. \] Substituting \[ s = \cos t, \] we obtain \[ \forall s \in [-1,1] \quad \alpha_1 s^2 + \alpha_2 s^4 + \alpha_3 s^6 = 0. \] Differentiating the preceding identity twice with respect to $s \in (-1,1)$ we obtain \[ \forall s \in (-1,1) \quad 2 \alpha_1 + 12 \alpha_2 s^2 + 30 \alpha_3 s^4 = 0. \] Differentiating the preceding identity twice with respect to $s \in (-1,1)$ we obtain \[ \forall s \in (-1,1) \quad 24 \alpha_2 + 360 \alpha_3 s^2 = 0. \] Differentiating the preceding identity twice with respect to $s \in (-1,1)$ we obtain \[ \forall s \in (-1,1) \quad 720 \alpha_3 = 0. \] Substituting $s=0$ in the preceding three equalities we obtain \[ \alpha_1 = 0, \quad \alpha_2 = 0, \quad \alpha_3 = 0. \] This completes the proof.
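The linear system in Proof 1 can also be settled with a determinant: the system has only the trivial solution exactly when its coefficient matrix has a nonzero determinant. The following Python sketch, with hypothetical names `u_values`, `M` and `det3`, builds the first version of the system (before the rows are rescaled) from the exact values of $(\cos t)^2$ at $t = 0, \pi/3, \pi/6$ and computes the determinant with exact fractions.

```python
from fractions import Fraction as F

# Exact values of (cos t)^2 at t = 0, pi/3, pi/6.
u_values = [F(1), F(1, 4), F(3, 4)]

# Row for each t: coefficients of alpha_1, alpha_2, alpha_3,
# that is (cos t)^2, (cos t)^4, (cos t)^6.
M = [[u, u ** 2, u ** 3] for u in u_values]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

print(det3(M))        # 9/512
print(det3(M) != 0)   # True: only the trivial solution exists
```

Rescaling the second and third rows, as done in the proof, multiplies the determinant by nonzero constants, so the conclusion is the same for both versions of the system.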
Set \[ \mathcal{H} = \operatorname{Span}\bigl\{ 1, (\cos t)^2, (\cos t)^4, (\cos t)^6 \bigr\} \] and \[ \mathcal{B} = \bigl\{ 1, (\cos t)^2, (\cos t)^4, (\cos t)^6 \bigr\}. \] In the preceding item we proved that $\mathcal{B}$ is a basis for $\mathcal{H}.$
We discussed the following three trigonometric identities \begin{align*} \cos(2t) & = - 1 + 2 (\cos t)^2\\ \cos(4t) & = \phantom{-} 1 - 8 (\cos t)^2 + 8 (\cos t)^4 \\ \cos(6t) & = - 1 + 18 (\cos t)^2 - 48 (\cos t)^4 + 32 (\cos t)^6 \end{align*} In the language of linear algebra, these identities mean \[ \cos(2t) \in \mathcal{H}, \quad \cos(4t) \in \mathcal{H}, \quad\cos(6t) \in \mathcal{H} \] and \[ \bigl[ 1 \bigr]_{\mathcal{B}}=\left[\begin{array}{r} 1 \\ 0 \\ 0 \\ 0 \end{array}\right], \quad \bigl[\cos(2t)\bigr]_{\mathcal{B}}=\left[\begin{array}{r} -1 \\ 2 \\ 0 \\ 0 \end{array}\right], \quad \bigl[\cos(4t)\bigr]_{\mathcal{B}}=\left[\begin{array}{r} 1 \\ -8 \\ 8 \\ 0 \end{array}\right], \quad \bigl[\cos(6t)\bigr]_{\mathcal{B}}=\left[\begin{array}{r} -1 \\ 18 \\ -48 \\ 32 \end{array}\right]. \] It follows from Theorem 8 in Section 4.4 in the textbook that the functions $1,$ $\cos(2t),$ $\cos(4t),$ $\cos(6t)$ are linearly independent if and only if their coordinate vectors are linearly independent. That is, the functions $1,$ $\cos(2t),$ $\cos(4t),$ $\cos(6t)$ are linearly independent if and only if the vectors \[ \left[\begin{array}{r} 1 \\ 0 \\ 0 \\ 0 \end{array}\right], \quad \left[\begin{array}{r} -1 \\ 2 \\ 0 \\ 0 \end{array}\right], \quad \left[\begin{array}{r} 1 \\ -8 \\ 8 \\ 0 \end{array}\right], \quad \left[\begin{array}{r} -1 \\ 18 \\ -48 \\ 32 \end{array}\right] \] are linearly independent. Since \[ \det \left[\begin{array}{rrrr} 1 & -1 & 1 & -1 \\ 0 & 2 & -8 & 18 \\ 0 & 0 & 8 & -48 \\ 0 & 0 & 0 & 32 \end{array}\right] = 1\cdot 2 \cdot 8 \cdot 32 = 512 \neq 0 \] the coordinate vectors are linearly independent. Therefore, the functions $1,$ $\cos(2t),$ $\cos(4t),$ $\cos(6t)$ are linearly independent. Consequently, the functions $1,$ $\cos(2t),$ $\cos(4t),$ $\cos(6t)$ form a basis for the vector space $\mathcal{H}.$
Set \[ \mathcal{C} = \bigl\{ 1, \cos(2 t), \cos(4 t), \cos(6 t) \bigr\}. \] Then $\mathcal{C}$ is a basis for the vector space $\mathcal{H}.$ Read the post of Monday, January 9. Based on that post we deduce that \[ \underset{\mathcal{B}\leftarrow\mathcal{C}}{P} = \left[\begin{array}{rrrr} 1 & -1 & 1 & -1 \\ 0 & 2 & -8 & 18 \\ 0 & 0 & 8 & -48 \\ 0 & 0 & 0 & 32 \end{array}\right]. \] Therefore \[ \underset{\mathcal{C}\leftarrow\mathcal{B}}{P} = \left[\begin{array}{rrrr} 1 & -1 & 1 & -1 \\ 0 & 2 & -8 & 18 \\ 0 & 0 & 8 & -48 \\ 0 & 0 & 0 & 32 \end{array}\right]^{-1} = \left[\!\begin{array}{cccc} 1 & \frac{1}{2} & \frac{3}{8} & \frac{5}{16} \\ 0 & \frac{1}{2} & \frac{1}{2} & \frac{15}{32} \\ 0 & 0 & \frac{1}{8} & \frac{3}{16} \\ 0 & 0 & 0 & \frac{1}{32} \\ \end{array}\!\right] = \left[ \bigl[ 1 \bigr]_{\mathcal{C}} \ \ \bigl[ (\cos t)^2 \bigr]_{\mathcal{C}} \ \ \bigl[ (\cos t)^4 \bigr]_{\mathcal{C}} \ \ \bigl[ (\cos t)^6 \bigr]_{\mathcal{C}} \right]. \] Therefore, for example, \[ (\cos t)^6 = \frac{5}{16} + \frac{15}{32} \cos (2 t)+\frac{3}{16} \cos (4 t)+\frac{1}{32} \cos (6 t). \] Thus, we have discovered a trigonometric identity using linear algebra only.
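The inverse matrix and the resulting identity for $(\cos t)^6$ can be reproduced with a short computation. The Python sketch below is only an illustration: `inverse` is a hypothetical Gauss-Jordan helper written for this example, working with exact fractions, and the final check of the identity is a floating-point spot check at a few sample values of $t,$ not a proof.

```python
from fractions import Fraction
import math

P_BC = [[1, -1, 1, -1],
        [0, 2, -8, 18],
        [0, 0, 8, -48],
        [0, 0, 0, 32]]

def inverse(M):
    """Invert a square invertible matrix by row reducing [M | I] exactly."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

P_CB = inverse(P_BC)
coeffs = [row[3] for row in P_CB]      # coordinates of (cos t)^6 relative to C
print([str(c) for c in coeffs])        # ['5/16', '15/32', '3/16', '1/32']

def rhs(t):
    """Right-hand side of the identity: sum of coeffs[k] * cos(2 k t)."""
    return sum(float(c) * math.cos(2 * k * t) for k, c in enumerate(coeffs))

print(all(abs(math.cos(t) ** 6 - rhs(t)) < 1e-12 for t in [0.0, 0.3, 1.0, 2.5]))
```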
The linked pdf file contains four problems from the textbook that relate to vector spaces of trigonometric functions. I hope that we reviewed this content well enough that you can do these four problems.
There is much more about trigonometric functions than what is presented in these four problems. I wrote about it on this webpage. The linked webpage explores the relationship between the powers of the cosine function and the multiple angle cosine functions. In the future I will continue with the powers of the sine function and the multiple angle sine functions. One can also combine sines and cosines.

Tuesday, January 10, 2023

On Thursday we will discuss four problems from the textbook that relate to vector spaces of trigonometric functions. I created a pdf page with these four problems. Try these problems on your own before the class.
There is much more about trigonometric functions than what is presented in these four problems. I wrote about it on this webpage. The linked webpage explores the relationship between the powers of the cosine function and the multiple angle cosine functions. In the future I will continue with the powers of the sine function and the multiple angle sine functions. One can also combine sines and cosines.

Monday, January 9, 2023

Today in class we explored a problem inspired by the picture below. No numbers are given, just the picture. In the picture below we are given two bases of $\mathbb{R}^2$, one blue and one purple: \[ \color{blue}{\mathcal B} = \bigl\{ \color{blue}{\mathbf{b}_1}, \color{blue}{\mathbf{b}_2} \bigr\}, \quad \color{purple}{\mathcal C} = \bigl\{ \color{purple}{\mathbf{c}_1}, \color{purple}{\mathbf{c}_2} \bigr\} \] For each basis a coordinate grid is shown in the corresponding color. The coordinate grids are drawn in increments of 1/10, with the multiples of 1/2 emphasized with slightly thicker lines. Based on the information provided in the picture give good estimates for the change of coordinates matrices: \[ \underset{\color{purple}{\mathcal C}\leftarrow\color{blue}{\mathcal{B}}}{P}, \qquad \underset{\color{blue}{\mathcal{B}}\leftarrow\color{purple}{\mathcal C}}{P}. \] Why are the red points in the picture exceptional? How can you use the red points to verify whether your change of coordinates matrices are correct?
Suggested exercises for Section 4.7: Change of Basis are 2, 3, 4, 6, 8, 9, 11, 12, 19, 20.
A brief review of the Change of Coordinates Matrix follows. Let $m, n \in \mathbb{N}$ and $m\leq n$. Let $\mathcal{H}$ be a subspace of $\mathbb{R}^n$ and let \[ \mathcal{A} = \bigl\{\mathbf{a}_1,\ldots,\mathbf{a}_m\bigr\} \] and \[ \mathcal{B} = \bigl\{\mathbf{b}_1,\ldots,\mathbf{b}_m\bigr\} \] be two bases of $\mathcal{H}.$ By definition of a basis this implies that \[ \mathcal{H} = \operatorname{Span}\bigl\{\mathbf{a}_1,\ldots,\mathbf{a}_m\bigr\} = \operatorname{Span}\bigl\{\mathbf{b}_1,\ldots,\mathbf{b}_m\bigr\} \] and both \[ \mathcal{A} = \bigl\{\mathbf{a}_1,\ldots,\mathbf{a}_m\bigr\} \quad \text{and} \quad \mathcal{B} = \bigl\{\mathbf{b}_1,\ldots,\mathbf{b}_m\bigr\} \] are linearly independent. We proved in class that the change of coordinates matrix $\displaystyle\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}$ is given by \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} = \Bigl[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{B}} \ \cdots \ \bigl[ \mathbf{a}_m\bigr]_{\mathcal{B}} \Bigr] \] and analogously \[ \underset{\mathcal{A}\leftarrow\mathcal{B}}{P} = \Bigl[ \bigl[\mathbf{b}_1\bigr]_{\mathcal{A}} \ \cdots \ \bigl[ \mathbf{b}_m\bigr]_{\mathcal{A}} \Bigr]. \] But, how to calculate \[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{B}}, \ldots,\bigl[ \mathbf{a}_m\bigr]_{\mathcal{B}}? \] Let us look at \[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{B}} = \left[\!\begin{array}{c} x_1 \\ \vdots \\ x_m \end{array}\!\right]. \] To find the real numbers $x_1, \ldots, x_m$ we have to solve the nonhomogeneous vector equation \[ x_1 \mathbf{b}_1 + x_2 \mathbf{b}_2 + \cdots + x_m \mathbf{b}_m = \mathbf{a}_1. \] To solve the preceding equation we row reduce \[ \Bigl[\!\begin{array}{cccc|c} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_m & \mathbf{a}_1\end{array}\!\Bigr]. 
\] Since the vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$ are linearly independent, the Reduced Row Echelon Form of the preceding augmented matrix has the following form \[ \left[\!\begin{array}{cccc|c} 1 & 0 & \cdots & 0 & \text{the solution for} \ x_1 \\ 0 & 1 & \cdots & 0 & \text{the solution for} \ x_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & \text{the solution for} \ x_m \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{array}\!\right]. \] Notice that in the preceding matrix the bottom zero rows are present only in the case when $n \gt m.$ If $n \gt m$, then there are exactly $n-m$ rows of zeros. Also notice that the above system must be consistent since the vector $\mathbf{a}_1$ is in the span of the vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m.$
To solve the nonhomogeneous vector equations \[ x_1 \mathbf{b}_1 + x_2 \mathbf{b}_2 + \cdots + x_m \mathbf{b}_m = \mathbf{a}_2, \quad \ldots, \quad x_1 \mathbf{b}_1 + x_2 \mathbf{b}_2 + \cdots + x_m \mathbf{b}_m = \mathbf{a}_m, \] we just build the bigger augmented matrix: \[ \Bigl[\!\begin{array}{cccc|cccc} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_m & \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_m \end{array}\!\Bigr]. \] Since the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_m$ are linearly independent, the RREF of the matrix whose columns are the vectors of $\mathcal{B}$ consists of the identity matrix $I_m$ and $n-m$ zero rows at the bottom if $n\gt m.$ Therefore \[ \Bigl[\!\begin{array}{ccc|ccc} \mathbf{b}_1 & \cdots & \mathbf{b}_m & \mathbf{a}_1 & \cdots & \mathbf{a}_m \end{array}\!\Bigr] \sim \cdots \sim \left[\! \begin{array}{c|c} I_m & \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} \\ 0 & 0 \end{array} \!\right]. \] In the preceding RREF, the zero matrices at the bottom are present only if $n-m \gt 0.$ Then, if $n-m \gt 0,$ these matrices are of the size $(n-m)\!\times\!m;$ they both have $m$ columns and $n-m$ rows consisting of zeros.
In the next example we are given two bases of a two-dimensional subspace of $\mathbb{R}^4$ and we are asked to find the change of coordinates matrices between these two bases: \[ \mathcal{H} = \operatorname{Span}\left\{\left[\!\begin{array}{c} 1 \\ 2 \\ 1 \\ 3 \end{array}\!\right], \left[\!\begin{array}{c} 2 \\ 3 \\ 1 \\ 5 \end{array}\!\right]\right\} = \operatorname{Span}\left\{\left[\!\begin{array}{c} 2 \\ 5 \\ 3 \\ 7 \end{array}\!\right], \left[\!\begin{array}{r} 1 \\ 0 \\ -1 \\ 1 \end{array}\!\right]\right\}. \] Set \[ \mathcal{A} = \left\{\left[\!\begin{array}{c} 1 \\ 2 \\ 1 \\ 3 \end{array}\!\right], \left[\!\begin{array}{c} 2 \\ 3 \\ 1 \\ 5 \end{array}\!\right]\right\}, \qquad \mathcal{B} = \left\{\left[\!\begin{array}{c} 2 \\ 5 \\ 3 \\ 7 \end{array}\!\right], \left[\!\begin{array}{r} 1 \\ 0 \\ -1 \\ 1 \end{array}\!\right]\right\}. \] To calculate $\displaystyle\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}$ we need to row reduce the matrix \[ \left[\!\begin{array}{cr|cc} 2 & 1 & 1 & 2 \\ 5 & 0 & 2 & 3 \\ 3 & -1 & 1 & 1 \\ 7 & 1 & 3 & 5 \end{array}\!\right] \] The RREF of the preceding matrix will certainly include fractions. Therefore we would rather find $\displaystyle\underset{\mathcal{A}\leftarrow\mathcal{B}}{P},$ for which we need to row reduce (without fractions) \[ \left[\!\begin{array}{cc|cr} 1 & 2 & 2 & 1 \\ 2 & 3 & 5 & 0 \\ 1 & 1 & 3 & -1 \\ 3 & 5 & 7 & 1 \end{array}\!\right] \sim \cdots \sim \left[\!\begin{array}{cc|rr} 1 & 0 & 4 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\!\right]. \] Hence \[ \underset{\mathcal{A}\leftarrow\mathcal{B}}{P} = \left[\!\begin{array}{rr} 4 & -3 \\ -1 & 2 \end{array}\!\right]. \] Let us verify this calculation. 
Is it true that: \[ \left[\!\begin{array}{c} 2 \\ 5 \\ 3 \\ 7 \end{array}\!\right] = (4) \left[\!\begin{array}{c} 1 \\ 2 \\ 1 \\ 3 \end{array}\!\right] + (-1)\left[\!\begin{array}{c} 2 \\ 3 \\ 1 \\ 5 \end{array}\!\right], \qquad \left[\!\begin{array}{r} 1 \\ 0 \\ -1 \\ 1 \end{array}\!\right] = (-3) \left[\!\begin{array}{c} 1 \\ 2 \\ 1 \\ 3 \end{array}\!\right] + (2) \left[\!\begin{array}{c} 2 \\ 3 \\ 1 \\ 5 \end{array}\!\right]? \] Yes. Therefore the following equalities are correct: \[ \bigl[ \mathbf{b}_1\bigr]_{\mathcal{A}} = \left[\!\begin{array}{r} 4 \\ -1 \end{array}\!\right], \qquad \bigl[ \mathbf{b}_2\bigr]_{\mathcal{A}} = \left[\!\begin{array}{r} -3 \\ 2 \end{array}\!\right]. \] Hence, $\displaystyle\underset{\mathcal{A}\leftarrow\mathcal{B}}{P}$ is correct.

We have \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} = \left(\underset{\mathcal{A}\leftarrow\mathcal{B}}{P}\right)^{-1} = \frac{1}{5} \left[\!\begin{array}{rr} 2 & 3 \\ 1 & 4 \end{array}\!\right]. \] Verify this: \[ \left[\!\begin{array}{c} 1 \\ 2 \\ 1 \\ 3 \end{array}\!\right] = \frac{2}{5}\left[\!\begin{array}{c} 2 \\ 5 \\ 3 \\ 7 \end{array}\!\right] + \frac{1}{5} \left[\!\begin{array}{r} 1 \\ 0 \\ -1 \\ 1 \end{array}\!\right], \qquad \left[\!\begin{array}{c} 2 \\ 3 \\ 1 \\ 5 \end{array}\!\right] = \frac{3}{5} \left[\!\begin{array}{c} 2 \\ 5 \\ 3 \\ 7 \end{array}\!\right] + \frac{4}{5} \left[\!\begin{array}{r} 1 \\ 0 \\ -1 \\ 1 \end{array}\!\right]. \] True. Therefore the following equalities are correct: \[ \bigl[ \mathbf{a}_1\bigr]_{\mathcal{B}} = \frac{1}{5}\left[\!\begin{array}{r} 2 \\ 1 \end{array}\!\right], \qquad \bigl[ \mathbf{a}_2\bigr]_{\mathcal{B}} = \frac{1}{5} \left[\!\begin{array}{r} 3 \\ 4 \end{array}\!\right]. \] Hence, $\displaystyle\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}$ is correct.
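If you would like to check this example on a computer, the following is a minimal sketch in Python, using only exact fractions from the standard library. The function names `rref` and `change_of_coordinates` are my own illustrative choices, not notation from the textbook, and the sketch assumes that the vectors of the first basis are linearly independent and that both bases span the same subspace.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        # Scale the pivot row so that the pivot equals 1.
        pivot = m[pivot_row][col]
        m[pivot_row] = [x / pivot for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

def change_of_coordinates(new_basis, old_basis):
    """Row reduce [new | old] and read off the block to the right of the identity."""
    k = len(new_basis)
    # Each basis vector becomes a column of the augmented matrix.
    augmented = [list(entries) for entries in zip(*new_basis, *old_basis)]
    reduced = rref(augmented)
    return [row[k:] for row in reduced[:k]]

a1, a2 = [1, 2, 1, 3], [2, 3, 1, 5]    # the basis A
b1, b2 = [2, 5, 3, 7], [1, 0, -1, 1]   # the basis B

P_A_from_B = change_of_coordinates([a1, a2], [b1, b2])
P_B_from_A = change_of_coordinates([b1, b2], [a1, a2])
print(P_A_from_B)
print(P_B_from_A)
```

The two printed matrices should agree with the hand calculation above; the second one is where the fractions with denominator 5 appear.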

Friday, January 6, 2023

Yesterday and today we reviewed the Row Reduction Algorithm and the concept of the Reduced Row Echelon Form of a matrix. See the webpage

An Ode to Reduced Row Echelon Form
On the webpage linked in the preceding item we considered the matrix $A$ given below and its RREF: \begin{equation*} \require{bbox} A = \left[\! \begin{array}{rrrrrr} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} & \bbox[lightblue]{\begin{array}{c} 4 \\ 3 \\ 2 \\ 1 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} & \bbox[lightblue]{\begin{array}{c} 6 \\ 4 \\ 8 \\ 6 \end{array}} \end{array} \!\right] \sim \cdots \sim \left[\! \begin{array}{rrrrrr} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array}} & \bbox[yellow]{\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array}} & \bbox[lightblue]{\begin{array}{r} -1 \\ 5 \\ 0 \\ 0 \end{array}} & \bbox[yellow]{\begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array}} & \bbox[lightblue]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 0 \end{array}} \end{array} \!\right]. \end{equation*} The yellow columns of $A$ are called the pivot columns of $A.$
A remarkable universal property of the Reduced Row Echelon Form of a matrix is the following:
The Reduced Row Echelon Form of a matrix has the same number of pivot columns and nonzero rows.

Why is this property important? Later on we will see that the dimension of the column space of $A$ equals the number of the pivot columns of $A$ and that the dimension of the row space of $A$ equals the number of the nonzero rows of the RREF of $A.$

Thus, the property stated above implies that for an arbitrary matrix $M$ we have

\[ \dim \operatorname{Col}(M) = \dim \operatorname{Row}(M) \]
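As a quick numerical illustration of this equality, here is a small sketch that counts pivots for the matrix $A$ considered above and for its transpose. The helper name `rank` is an assumption of this sketch; it performs forward elimination with exact fractions and counts the pivots it finds.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column at or below the current pivot row.
        src = next((r for r in range(pivots, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivots], m[src] = m[src], m[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, len(m)):
            factor = m[r][col] / m[pivots][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[pivots])]
        pivots += 1
    return pivots

A = [[1, 1, 4, 1, 6],
     [2, 1, 3, 0, 4],
     [3, 1, 2, 1, 8],
     [4, 1, 1, 0, 6]]
A_T = [list(col) for col in zip(*A)]  # the transpose of A

dim_col = rank(A)    # number of pivot columns of A
dim_row = rank(A_T)  # number of pivot columns of the transpose
print(dim_col, dim_row)
```

Both counts come out the same for this matrix, as the displayed equality predicts.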
A particularly powerful consequence of the properties of the RREF of $A$ is the following matrix product: \begin{equation*} \require{bbox} \left[\! \begin{array}{rrr} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \left[\! \begin{array}{rrrrr} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 0 \end{array}} & \bbox[yellow]{\begin{array}{c} 0 \\ 1 \\ 0 \end{array}} & \bbox[lightblue]{\begin{array}{r} -1 \\ 5 \\ 0 \end{array}} & \bbox[yellow]{\begin{array}{r} 0 \\ 0 \\ 1 \end{array}} & \bbox[lightblue]{\begin{array}{c} 1 \\ 2 \\ 3 \end{array}}\end{array} \!\right] = \left[\! \begin{array}{rrrrr} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} & \bbox[lightblue]{\begin{array}{c} 4 \\ 3 \\ 2 \\ 1 \end{array}} & \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} & \bbox[lightblue]{\begin{array}{c} 6 \\ 4 \\ 8 \\ 6 \end{array}} \end{array} \!\right] = A. \end{equation*} In words: the matrix product of the $4\!\times\!3$ matrix consisting of the pivot columns of $A$ and the $3\!\times\!5$ matrix consisting of the nonzero rows of the RREF of $A$ equals the given matrix $A.$
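The product above is easy to confirm by machine. The sketch below multiplies the matrix of pivot columns by the matrix of nonzero rows of the RREF using plain Python lists; the names `C`, `R` and `matmul` are labels chosen for this sketch only.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# The pivot columns a1, a2, a4 of A, written as the columns of C.
C = [[1, 1, 1],
     [2, 1, 0],
     [3, 1, 1],
     [4, 1, 0]]

# The nonzero rows of the RREF of A.
R = [[1, 0, -1, 0, 1],
     [0, 1, 5, 0, 2],
     [0, 0, 0, 1, 3]]

A = [[1, 1, 4, 1, 6],
     [2, 1, 3, 0, 4],
     [3, 1, 2, 1, 8],
     [4, 1, 1, 0, 6]]

product = matmul(C, R)
print(product == A)
```

Reading the product column by column expresses each column of $A$ through the pivot columns; reading it row by row expresses each row of $A$ through the nonzero rows of the RREF.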
All vector relations that we use in the items below can be read from the matrix product in the previous item.
Let us introduce notation for the columns of $A$: \[ \require{bbox} \bbox[yellow]{\mathbf{a}_1} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} \end{array} \!\right], \quad \bbox[yellow]{\mathbf{a}_2} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \quad \bbox[lightblue]{\mathbf{a}_3} = \left[\! \begin{array}{r} \bbox[lightblue]{\begin{array}{c} 4 \\ 3 \\ 2 \\ 1 \end{array}} \end{array} \!\right], \quad \bbox[yellow]{\mathbf{a}_4} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right], \quad \bbox[lightblue]{\mathbf{a}_5} = \left[\! \begin{array}{r} \bbox[lightblue]{\begin{array}{c} 6 \\ 4 \\ 8 \\ 6 \end{array}} \end{array} \!\right] \] Recall that the column space of $A$ is the span of the columns of $A$: \[ \operatorname{Col}(A) = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[lightblue]{\mathbf{a}_3}, \bbox[yellow]{\mathbf{a}_4}, \bbox[lightblue]{\mathbf{a}_5} \bigr\}. \]

Recall that \[ \bbox[lightblue]{\mathbf{a}_3} = (-1)\bbox[yellow]{\mathbf{a}_1} + 5 \bbox[yellow]{\mathbf{a}_2} + 0 \bbox[yellow]{\mathbf{a}_4}, \quad \bbox[lightblue]{\mathbf{a}_5} = 1\bbox[yellow]{\mathbf{a}_1} + 2 \bbox[yellow]{\mathbf{a}_2} + 3 \bbox[yellow]{\mathbf{a}_4}. \] A consequence of the preceding two equalities is the following equality for two spans: \[ \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[lightblue]{\mathbf{a}_3}, \bbox[yellow]{\mathbf{a}_4}, \bbox[lightblue]{\mathbf{a}_5} \bigr\} = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\}. \] Hence the pivot columns of $A$ span the column space of $A$: \[ \operatorname{Col}(A) = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\}. \]

Since the pivot columns of $A$ are linearly independent and the pivot columns of $A$ span $\operatorname{Col}(A),$ the pivot columns of the matrix $A$ form a basis for the column space of $A.$ Let us introduce the notation for this basis: \[ \mathcal{C} = \bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\} \] Since a basis for $\operatorname{Col}(A)$ has three elements, $\operatorname{Col}(A)$ is a three-dimensional vector space. That is, \[ \dim \operatorname{Col}(A) = 3. \]

Review the concept of the coordinates of a vector relative to a basis introduced in Section 4.4 Coordinate Systems of the textbook. The notation for the coordinates of a vector $\mathbf{v}$ relative to the basis $\mathcal{B}$ is $[\mathbf{v} ]_{\mathcal{B}}.$

Using the concept of the coordinates of a vector relative to a basis we can write \[ \bigl[\bbox[yellow]{\mathbf{a}_1}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[yellow]{\mathbf{a}_2}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[lightblue]{\mathbf{a}_3}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} -1 \\ 5 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[yellow]{\mathbf{a}_4}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array}\!\right], \quad \bigl[\bbox[lightblue]{\mathbf{a}_5}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array}\!\right] \]
In this item we list the relationship of the rows of the matrix $A$ and the nonzero rows of the RREF of $A.$

We introduce the notation for the rows of $A$ and the nonzero rows of the RREF of $A.$ We consider the rows of $A$ as vectors in $\mathbb{R}^5.$ That is, we identify rows with their transposes. We introduce the following notation for the rows of $A$: \[ \mathbf{r}_1 = \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \quad \mathbf{r}_2 = \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \quad \mathbf{r}_3 = \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right], \quad \mathbf{r}_4 = \left[\! \begin{array}{r} 4 \\ 1 \\ 1 \\ 0 \\ 6 \end{array} \!\right]. \] We introduce the following notation for the nonzero rows of the RREF of $A$: \[ \mathbf{q}_1 = \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \quad \mathbf{q}_2 = \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \quad \mathbf{q}_3 = \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right]. \]

It has been explained in An Ode to Reduced Row Echelon Form that the nonzero rows of the RREF of $A,$ that is, the vectors $\mathbf{q}_1,$ $\mathbf{q}_2,$ and $\mathbf{q}_3,$ are linearly independent and that they span the row space of $A$: \[ \operatorname{Row}(A) = \operatorname{Span}\bigl\{{\mathbf{r}_1}, {\mathbf{r}_2}, {\mathbf{r}_3}, {\mathbf{r}_4} \bigr\} = \operatorname{Span}\bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \operatorname{Span} \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}. \] Since the vectors ${\mathbf{q}_1},$ ${\mathbf{q}_2},$ and ${\mathbf{q}_3}$ are linearly independent and they span $\operatorname{Row}(A),$ these vectors form a basis for $\operatorname{Row}(A).$ Denote this basis by $\mathcal{B}:$ \[ \mathcal{B} = \bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}. \] Since a basis for $\operatorname{Row}(A)$ has three elements, $\operatorname{Row}(A)$ is a three-dimensional vector space. That is, \[ \dim \operatorname{Row}(A) = 3. \]

For the first row of $A$ we have: \begin{equation*} \mathbf{r}_1 = \left[\! \begin{array}{r}{\begin{array}{c}1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array}}\end{array} \!\right] = {(1)} \left[\! \begin{array}{r}{\begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array}}\end{array} \!\right] = (1) \mathbf{q}_1 + (1) \mathbf{q}_2 + (1) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_1}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 1 \\ 1 \\ 1 \end{array}\!\right] \]

For the second row of $A$ we have: \begin{equation*} \mathbf{r}_2 = \left[\! \begin{array}{r}{\begin{array}{c} 2 \\ 1 \\ 3 \\ 0 \\ 4\end{array}}\end{array} \!\right] = {(2)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(0)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (2) \mathbf{q}_1 + (1) \mathbf{q}_2 + (0) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_2}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 2 \\ 1 \\ 0 \end{array}\!\right] \]

For the third row of $A$ we have: \begin{equation*} \mathbf{r}_3 = \left[\! \begin{array}{r}{\begin{array}{c}3 \\ 1 \\ 2 \\ 1 \\ 8\end{array}}\end{array} \!\right] = {(3)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (3) \mathbf{q}_1 + (1) \mathbf{q}_2 + (1) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_3}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 3 \\ 1 \\ 1 \end{array}\!\right] \]

For the fourth row of $A$ we have: \begin{equation*} \mathbf{r}_4 = \left[\! \begin{array}{r}{\begin{array}{c}4 \\ 1 \\ 1 \\ 0 \\ 6\end{array}}\end{array} \!\right] = {(4)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(0)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (4) \mathbf{q}_1 + (1) \mathbf{q}_2 + (0) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_4}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 4 \\ 1 \\ 0 \end{array}\!\right] \]
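The four coordinate vectors above can be checked with a few lines of Python. Since each $\mathbf{q}_j$ has a $1$ in its own pivot position and $0$ in the other two pivot positions, the coordinates of a row of $A$ relative to $\mathcal{B}$ are simply its entries in positions 1, 2 and 4. The sketch below uses that observation; the helper names `coordinates` and `combine` are made up for this illustration.

```python
# The basis B (nonzero rows of the RREF of A) and the rows of A.
q = [[1, 0, -1, 0, 1],
     [0, 1, 5, 0, 2],
     [0, 0, 0, 1, 3]]
r = [[1, 1, 4, 1, 6],
     [2, 1, 3, 0, 4],
     [3, 1, 2, 1, 8],
     [4, 1, 1, 0, 6]]

pivot_positions = [0, 1, 3]  # zero-based indices of the pivot columns 1, 2, 4

def coordinates(row):
    """Read the coordinates relative to B from the pivot positions of the row."""
    return [row[p] for p in pivot_positions]

def combine(coeffs, vectors):
    """Form the linear combination of the vectors with the given coefficients."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

coords = [coordinates(row) for row in r]
rebuilt = [combine(c, q) for c in coords]
print(coords)
print(rebuilt == r)
```

The list `coords` reproduces the four coordinate vectors found above, and `rebuilt` confirms that each linear combination gives back the corresponding row of $A$.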
Recall that the transpose of $A$ is the matrix in which the columns of $A$ are the rows of $A^\top$, or, equivalently, the rows of $A$ are the columns of $A^\top.$ Therefore, \[ \operatorname{Row}(A^\top) = \operatorname{Col}(A) \quad \text{and} \quad \operatorname{Col}(A^\top) = \operatorname{Row}(A) \]
Next we calculate the RREF of $A^\top:$ \[ A^\top = \left[\!\begin{array}{cccc} 1 & 2 & 3 & 4 \\ 1 & 1 & 1 & 1 \\ 4 & 3 & 2 & 1 \\ 1 & 0 & 1 & 0 \\ 6 & 4 & 8 & 6 \\ \end{array}\!\right] \quad \sim \quad \cdots \quad \sim \quad \left[\!\begin{array}{cccr} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\!\right] \]
Next we read information about the spaces \[ \operatorname{Row}(A^\top) = \operatorname{Col}(A) \quad \text{and} \quad \operatorname{Col}(A^\top) = \operatorname{Row}(A) \] from the RREF of $A^\top.$
- The first three columns of $A^\top,$ that is $\mathbf{r}_1,$ $\mathbf{r}_2,$ and $\mathbf{r}_3,$ are the pivot columns of $A^\top.$ Therefore the set \[ \mathcal{A} = \bigl\{ \mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3 \bigr\} = \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right] \right\} \] is a basis for \[ \operatorname{Col}(A^\top) = \operatorname{Row}(A). \]
- The nonzero rows of the RREF of $A^\top$ form a basis for the row space of $A^\top.$ That is, the set \[ \mathcal{D} = \left\{ \left[\! \begin{array}{r}1 \\ 0 \\ 0 \\ -1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 1 \\ 0 \\ 1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 0 \\ 1 \\ 1 \end{array} \!\right] \right\} \] is a basis for \[ \operatorname{Row}(A^\top) = \operatorname{Col}(A). \]
In conclusion, we found two bases for the column space of $A$: \[ \mathcal{C} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}}\end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{D} = \left\{ \left[\! \begin{array}{r}1 \\ 0 \\ 0 \\ -1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 1 \\ 0 \\ 1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 0 \\ 1 \\ 1 \end{array} \!\right] \right\}. \] For each of these bases we can calculate the coordinates of each of the columns of $A$. You can do this as an exercise.

We also found two bases for the row space of $A$: \[ \mathcal{B} = \bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{A} = \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right] \right\}. \] For each of these bases we can calculate the coordinates of each of the rows of $A$. You can do this as an exercise.
The final challenge in this context is to calculate the following four change of coordinates matrices: \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P}, \qquad \underset{\mathcal{A}\leftarrow\mathcal{B}}{P}, \qquad \underset{\mathcal{D}\leftarrow\mathcal{C}}{P}, \qquad \underset{\mathcal{C}\leftarrow\mathcal{D}}{P}. \] This is covered in Section 4.7 Change of Basis in the textbook.

Tuesday, January 3, 2023

The information sheet
We start with a review. Please review
- To celebrate the concept of reduced row echelon form of a matrix and the full power of its utility I wrote the webpage
  An Ode to Reduced Row Echelon Form
- The definition of an abstract vector space in Section 4.1, page 192.
- The definition of a linearly independent set and the definition of a basis in Section 4.3; Examples 3, 4, 5, 6 and 10; Practice Problems 1, 2, 3, Exercises 1-8 and 38.
- Section 4.4: Theorem 7 (the unique representation theorem), the definition of coordinates with respect to a basis, the definition of a change-of-coordinates matrix on page 249 and the definition and the properties of a coordinate mapping; Examples 1, 2, 4, 5, 6; Practice Problems 1, 2; Exercises 3, 4, 5, 7, 9, 10, 11, 13, 18, 21, 32.
- Section 4.5: Theorem 10, the definition of a finite-dimensional vector space and its dimension and the Basis Theorem; Examples 1, 2, 3, 4; Practice Problems 1, 2; Exercises 2, 3, 7, 22, 24, and 34.
- Section 4.7 Change of Basis. Suggested exercises are 2, 3, 4, 6, 8, 9, 11, 12, 15, 16, 19 (you do not need a calculator for this problem).
What is the oldest linear algebra problem?
- Clay tablet VAT 8389 from the Old Babylonian period, from 2000 to 1600 BC, contains what is believed to be the earliest word problem that can be interpreted as a system of linear equations:
  
  Total area of two fields is 1800 sar, the rent for one is 2 silà of grain per 3 sar, for the other is 1 silà per 2 sar, the total rent on the first exceeds that on the other by 500 silà. What is the area of each plot?
  
  This blog has the picture of clay tablet VAT 8389 and more details about it.
  
  A translation of this word problem into a system of linear equations is as follows: \begin{alignat*}{4} &x_1 & &\ + &x_2 & = 1800 \\ \tfrac{2}{3} &x_1 & &- \tfrac{1}{2} &x_2 & = \phantom{1}500. \end{alignat*}
- Problem 40 of the Rhind papyrus which is dated to 1550 BC is:
  
  Divide 100 hekats of barley among 5 men so that the common difference is the same and so that the sum of the two smallest is 1/7 the sum of the three largest.
  
  Since the Rhind papyrus was copied by the scribe Ahmes from a now-lost text from the period around 1850 BC, and this lost text might have been copied from an even older text from around 2500 BC, the above problem could be by far the oldest known linear algebra problem.
  
  Denote by $x_1$ the smallest number and by $x_2$ the common difference. After simplification the above problem translates into the following system of linear equations: \begin{alignat*}{5} 5 &x_1 & & + 10 &x_2 & = 100 \\ \tfrac{11}{7} &x_1 & & - \phantom{1}\tfrac{2}{7} &x_2 & = \phantom{10}0. \end{alignat*}
- Most importantly for us, the oldest known treatment of systems of linear equations from antiquity which resembles the methods that we will use in this class is in Chapter 8 of the Chinese textbook The Nine Chapters on the Mathematical Art, which is at least 1800 years old.
  
  From 3 top-grade rice paddies, 2 medium-grade, and 1 low-grade, the combined yield is 39 dou of grain. From 2 top-grade, 3 medium-grade, and 1 low-grade, the combined yield is 34 dou of grain. From 1 top-grade, 2 medium-grade, and 3 low-grade, the combined yield is 26 dou of grain. How much dou does one bundle of each grade yield?
  
  Denote by $x_1$ the yield of the top-grade rice paddy, by $x_2$ the yield of the medium-grade, and by $x_3$ the yield of the low-grade rice paddy. Then the above problem translates into the following system of linear equations: \begin{alignat*}{7} 3 &x_1 & & + 2 &x_2 & + & x_3 = 39 \\ 2 &x_1 & & + 3 &x_2 & + & x_3 = 34 \\ &x_1 & & + 2 &x_2 & + 3 & x_3 = 26 \end{alignat*}
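For the curious, the three ancient systems above can be solved exactly with a short elimination routine. This is only a sketch: the function name `solve` is my own, it works with exact fractions from the Python standard library, and it assumes a square system with a unique solution, which is the case for all three problems.

```python
from fractions import Fraction

def solve(augmented):
    """Gauss-Jordan elimination for a square system with a unique solution."""
    m = [[Fraction(x) for x in row] for row in augmented]
    n = len(m)
    for col in range(n):
        # Choose a row with a nonzero entry in this column and move it up.
        src = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[src] = m[src], m[col]
        # Scale the pivot to 1, then clear the rest of the column.
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # The last column now holds the solution.
    return [row[-1] for row in m]

# Augmented matrices of the three systems stated above.
babylonian = solve([[1, 1, 1800],
                    [Fraction(2, 3), Fraction(-1, 2), 500]])
rhind = solve([[5, 10, 100],
               [Fraction(11, 7), Fraction(-2, 7), 0]])
nine_chapters = solve([[3, 2, 1, 39],
                       [2, 3, 1, 34],
                       [1, 2, 3, 26]])

print(babylonian)
print(rhind)
print(nine_chapters)
```

The Babylonian fields come out as whole numbers of sar, while the other two problems have fractional answers, which is why exact fractions are used instead of floating-point numbers.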
If the history of mathematics might inspire you to study mathematics with more enthusiasm, below I link to some websites with more about the history of Linear Algebra.
- Early History of Linear Algebra by Roger Hart
- History of matrices
- History of abstract vector spaces
- Solving a System of Linear Equations Using Ancient Chinese Methods by Mary Flagg
My comment on the history of mathematics:

Different civilizations have created mathematical knowledge throughout history and sometimes passed that knowledge among themselves. The most significant aspect of the growth of mathematical knowledge was that succeeding civilizations recognized the value of the knowledge created by preceding civilizations and used it as an inspiration for expanding that knowledge.
